Analyzing Air Quality Data Using Distributed Cloud Services and Spark
You may use any dataset of your choice as long as it has more than 1,000 observations and at least 2 numeric predictors. The dataset should be relevant to your interests or to your research (should you pursue a thesis).
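Before you build the pipeline, you can confirm that a candidate dataset meets these thresholds with a single pass over the file. The sketch below is a minimal example under stated assumptions. It assumes a local CSV file with a header row and one response column, passed as `target`; the column names used with it (for example `pm25`) are hypothetical. It also assumes that a "numeric predictor" is any other column whose non-empty values all parse as numbers.

```python
import csv


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float (e.g. "12.5", "-3")."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_dataset(path: str, target: str, min_rows: int = 1000, min_numeric: int = 2) -> dict:
    """Scan a CSV once and report whether it meets the project's size thresholds.

    Assumptions (not part of the assignment text): the file has a header row,
    `target` names the response column, and a column counts as a numeric
    predictor if every non-empty value parses as a float and at least one does.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        predictors = [c for c in (reader.fieldnames or []) if c != target]
        all_numeric = {c: True for c in predictors}
        seen_value = {c: False for c in predictors}
        n_rows = 0
        for row in reader:
            n_rows += 1
            for c in predictors:
                value = (row.get(c) or "").strip()
                if not value:
                    continue  # treat blanks as missing values, not as non-numeric
                seen_value[c] = True
                if all_numeric[c] and not _is_number(value):
                    all_numeric[c] = False

    numeric_predictors = [c for c in predictors if all_numeric[c] and seen_value[c]]
    return {
        "rows": n_rows,
        "numeric_predictors": numeric_predictors,
        # "more than 1,000 observations" means strictly greater than min_rows
        "ok": n_rows > min_rows and len(numeric_predictors) >= min_numeric,
    }
```

For example, `check_dataset("air_quality.csv", target="pm25")` returns the row count, the names of the numeric predictor columns, and an `ok` flag. If the data already sits in cloud storage, you can run the same checks in Spark instead, for instance with `df.count()` and the column types reported by `df.dtypes`.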
This project should have the following core components:
1) A data lake or data warehouse – defend your choice. Explain why you chose to store your data in one rather than the other.
2) Connect the data lake or data warehouse to a distributed cloud service such as AWS, Azure, or GCP.
3) Run your Spark application on those distributed services.
4) Documentation – This is the most CRUCIAL STEP. The final product should be a report (NO FEWER than 3 pages) detailing the steps you took to reach your results and what those final results are.
5) The report should also explain your goal, the datasets you used, and the technical approach you took to reach the end result.
6) THINK OF IT THIS WAY: I should be able to follow the approach described in your report and replicate everything you did.
Criteria:
1- Includes data storage
2- Includes distributed cloud service
3- Includes final report
