Final Project
The final project is to replicate the pipeline approach on a dataset (or datasets) of your choosing.
The final deliverable will be a web-based data visualization and accompanying description including a summary of the results and the methods used in each step of the process (collection, analysis and visualization).
Due dates
Note: Due dates are tentative and subject to change
Written proposal due date: Monday, December 4th (or earlier)
Project due date: Wednesday, December 20th at 11:59 PM
Final projects must receive prior approval in the form of a written proposal.
Deliverable
The final deliverable must include both of the following items:
A Quarto website that summarizes the project and main results, using a combination of text, images, and interactive visualizations.
The website should be hosted on GitHub Pages and should be generated from the Quarto website template repository. The website should be a multi-page document that explains in depth all aspects of the project’s implementation as well as the final results.
All code/notebooks/spreadsheets/datasets used. These materials should be submitted to your own specific GitHub repository, which can be created using the link below, depending on your section:
Section 401: https://classroom.github.com/a/iA1Z_0Dv
Section 402: https://classroom.github.com/a/FX7-CcSS
In this repository, you should include all of the code used to collect, analyze, and visualize the data. You should also include a README file that includes the URL for the Quarto website described in the first deliverable item above.
Important
- Be sure to include the URL for your Quarto website in the README of the main submission repository.
- Be sure to include the names of everyone who worked on the final project somewhere in the README!
Group projects
Group projects are permitted, with a maximum of three members per group. You are also permitted to combine this assignment with one you are working on for another course. Keep in mind that if you choose either of these options, the expectations for the project’s scope will be adjusted accordingly.
If you combine this assignment with one from another course, the portion that you are submitting for this final project must be a clearly defined addition to the original project. In such a case, you will be graded only on the portion submitted for this course, not on the entire project.
Guidelines
The project is open-ended. The topic and technologies used are up to you. However, the project must be sufficiently complex and challenging. Below is the list of possible requirements for the project. The number of requirements you must satisfy depends on the number of people in your group:
- 1 person: 2 requirements
- 2 people: 3 requirements
- 3 people: 4 requirements
The list of requirements is:
- Data is collected through a means more sophisticated than downloading (e.g. scraping, API).
- At least one of the datasets contains more than 1,000,000 rows.
- It combines data collected from 3 or more different sources.
- The analysis of the data is reasonably complex, involving multiple steps (geospatial joins/operations, data shaping, data frame operations, etc).
- You use osmnx or pandana to perform an analysis of street network data.
- You use scikit-learn to perform a clustering analysis.
- You analyze raster data using rasterio, rasterstats, or xarray.
- You perform a machine learning analysis with scikit-learn.
- The project includes a deployed Panel dashboard.
- The project includes multiple interactive visualizations that include a significant interactive component (cross-filtering, interactive widgets, etc).
As a rough guideline, you should shoot for something that is 3-4 times as involved as the required assignments.
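When planning a proposal, it may help to check your intended scope against the group-size rule above. The following is a minimal sketch only, not an official grading tool: the function names and the shorthand requirement labels are hypothetical, and only the size-to-count mapping comes from the guidelines.

```python
# Required number of items from the requirements list, keyed by group size
# (taken from the guidelines: 1 person -> 2, 2 people -> 3, 3 people -> 4).
REQUIRED_BY_GROUP_SIZE = {1: 2, 2: 3, 3: 4}


def requirements_needed(group_size: int) -> int:
    """Return how many requirements a group of this size must satisfy."""
    if group_size not in REQUIRED_BY_GROUP_SIZE:
        raise ValueError("Groups must have between 1 and 3 members.")
    return REQUIRED_BY_GROUP_SIZE[group_size]


def meets_minimum(group_size: int, satisfied: set[str]) -> bool:
    """Check whether the satisfied requirements reach the group's minimum."""
    return len(satisfied) >= requirements_needed(group_size)


# Hypothetical two-person project; the labels are illustrative shorthand
# for items in the requirements list, not official names.
planned = {"api-collection", "three-sources", "street-network"}
print(meets_minimum(2, planned))  # True: 3 satisfied, 3 needed
```

Meeting the count is only a minimum; the proposal still needs approval, and the project is also graded on concept, implementation, visualization, and writeup.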
Grading
The final project is worth 45% of the final grade and will be graded on four criteria:
- Concept: Is it sufficiently complex/challenging/sophisticated? Is the final product useful/interesting/creative?
- Technical implementation: Was it well thought out? Was each step done correctly? Does it work as described? Is it consistent with the proposal?
- Visualization: How well does the data visualization serve its purpose? Does it tell a clear story? Are the colors/layout/titles well-chosen?
- Writeup: Is all of the above explained clearly? The writeup should be a multi-page document that explains in depth all aspects of the project’s implementation as well as the final results.
Examples from Past Semesters
Note: Previous semesters used a different web-based template for presenting project results, so past projects will look different. This semester uses the Quarto website template.
- An Analysis System for Taxi Data: A series of planning, visualization and prediction tools around taxi ridership.
- Hospitality in the Era of Airbnb: An analysis of the impact of Airbnb on the hospitality industry in New York.
- A Look at Reddit’s Values, IRL Events, and Possible Bad Actors: An analysis of Reddit data to determine what redditors “value”.
- Orange Line Shutdown Effects on Bluebike Ridership: An analysis examining the effect of the Orange Line shutdown on Bluebike ridership in Boston, MA.