ADTA – 5240
Team B: [Name every document and screenshot ‘Team-B’.]
Overview:
This week you will work with your group on your final project.
Objectives
- Apply concepts learned about the Hadoop ecosystem
- Apply concepts learned in data preparation to preprocess data
- Construct an external table using basic SQL commands in BigQuery
- Develop queries in BigQuery
- Construct a well-defined schema using basic HQL commands in Hive
- Develop queries in Hive
- Develop queries in Spark
Instructions
Each group will research its assigned use case and select a static dataset and a streaming data source from the approved list provided, or locate other sources and obtain the instructor’s approval.
Each group will create an executive summary of 400 to 550 words, not including the title page, references, or other supporting documents. It should read like a summary of your presentation: introduce the use case, step through the data lifecycle, identify the tools and applications used in each phase of the lifecycle, and conclude with the next steps for the data science or analyst teams. The executive summary must use Times New Roman, 12-point font, with one-inch margins.
Each group will create a document with screenshots showing the project and storage they created for their use case in GCP, the setup of their Hadoop ecosystem, their processing of the static and streaming data, and the queries they ran in BigQuery, Hive, and Spark to ensure data quality for the data science or analyst teams. At each step, the team will take screenshots of their work and present them in a Word document with a brief explanation of each screenshot. Each description should state the application used, the task performed, and why it was performed. Do not include how-to instructions.
Each group will create a presentation that tells a story using the data lifecycle as a guide and will present its work at the designated time. You may be creative with the PowerPoint, but it must be a professional business presentation. Each member of the group should speak. After the presentation, the group will take questions from the audience. The presentation should be 10 to 15 minutes long.
Meeting_Notes_Template:
I have provided ‘Meeting_Notes_Template.docx’; please fill in the template.
Approved Data Sources:
I have provided ‘Approved Data Sources.pdf’; please select two datasets from the approved data sources listed for any of the use cases in the PDF.
PPT and Word:
- Topic: Use cases from the discussion post
- Data: Use approved data sources (two or more)
- Executive Summary (25%): This paper should be between 400 and 550 words, not including the title page, code, and references.
- Screenshots (25%): These screenshots should show how you applied what you learned. Create a new project in GCP for this use case.
- Presentation (50%): The group will present its work
- Grading: This project is worth 20% of your final course grade. The Executive Summary makes up 25% of this grade, the screenshots 25%, and the presentation 50%.
- Document Type: Word and PPT
Executive Summary Requirements:
- 400 to 550 words, not counting the title page, references, or supporting documents.
- Title page: Organization Name, Logo, Use case, group number, and group members
- Introduction: Introduce the use case and its purpose (Example: Data Engineering Request)
- Body: Step through the data lifecycle with your use case and the tasks you did
- Conclusion: Summarize and discuss the next steps for the data science and analyst teams
- Double-spaced Word Document
- References
Application Screenshots Requirements:
- GCP project & storage
- Hadoop
- OpenRefine
- BigQuery
- Hive
- Spark
Include an explanation (3-10 sentences) with each screenshot stating the application used and the task performed.
Supporting Documents:
- Reference page
- Meeting notes or Task board
- Data Sheet: List of data sources and any additional information, such as the website address
- Other documents
- Word Document
Meeting Notes Template:
- Date:
- Start and End time:
- Attendees:
- Note-taker:
- Notes:
- Decisions:
- Action Items:
Task board:
- Create a task board using MS Teams (Planner), Excel, or Word
- Task board columns:
  - To-Do
  - In Progress
  - Review
  - Done
- Task Info: Description, Owner, Due Date
Presentation Requirements:
- Business casual
- TELL A STORY
- Every group member must present
- 10-15 minutes to present
- 2 minutes for questions
- 10-20 PowerPoint Slides
- Title page: Organization Name, Logo, Use case, group number, and group members
- Outline or Agenda
- Every step of the data lifecycle – No definitions
- Hive and Spark SQL comparison chart
- A few of your screenshots (No more than 5)
- Cite the source on the slide if not your own words
See the Word document for the rubric instructions.