HW #5
Due dates
The assignment is due Monday, November 20 by the end of the day for both sections.
Description
In this assignment, we’ll explore restaurant review data available through the Yelp Dataset Challenge. The dataset includes Yelp data for user reviews and business information for many metropolitan areas. I’ve already downloaded this dataset (8 GB total!) and extracted the data files for reviews and restaurants in Philadelphia. I’ve placed these data files in the data directory in this repository.
This assignment is broken into two parts:
Part 1: Analyzing correlations between restaurant reviews and census data
We’ll explore the relationship between restaurant reviews and the income levels of the restaurant’s surrounding area.
Part 2: Exploring the impact of fast food restaurants
We’ll run a sentiment analysis on reviews of fast food restaurants and estimate income levels in neighborhoods with fast food restaurants. We’ll test how well our sentiment analysis works by comparing the number of stars to the sentiment of reviews.
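To make the Part 2 check concrete, here is a minimal sketch of comparing star ratings to review sentiment. It is not the method used in the notebook: the tiny word lists, the sentiment_score and pearson helpers, and the four sample reviews are all made up for illustration and do not come from the Yelp data. A real analysis would use a proper sentiment library and the review files in the data directory.

```python
import math

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"great", "good", "fresh", "friendly", "fast"}
NEGATIVE = {"bad", "cold", "slow", "rude", "stale"}


def sentiment_score(text):
    """Return (positive words - negative words) / total words, or 0.0 for empty text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Made-up (stars, text) pairs, NOT taken from the Yelp dataset.
reviews = [
    (5, "Great burgers and friendly staff!"),
    (4, "Good fries, fast service."),
    (2, "Slow line and cold food."),
    (1, "Rude cashier, stale bun, bad experience."),
]

stars = [s for s, _ in reviews]
scores = [sentiment_score(t) for _, t in reviews]

# A correlation near +1 would suggest the sentiment score tracks the star rating.
r = pearson(stars, scores)
print(f"correlation between stars and sentiment: {r:.2f}")
```

If the sentiment measure is working, higher-star reviews should tend to have higher scores, so the correlation should be clearly positive. A weak or negative value would suggest the sentiment measure is missing what reviewers mean.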
Background readings
- Does sentiment analysis work?
- The Geography of Taste: Using Yelp to Study Urban Culture
Assignment details
A skeleton Jupyter notebook is available in this repository that will walk you through the steps of the assignment. The completed notebook should be submitted as your assignment.
Submission
We’ll be using GitHub for assignment submission again. You can set up your own private repository on GitHub for this assignment using the link below.
The invitation link will depend upon your section:
Section 402: https://classroom.github.com/a/eUYIUqNa
The assignment should be added to this GitHub repository before the deadline. You can add files to the repository through the web (github.com) interface or using the command line locally on your machine.