Descriptive Analysis using Spark SQL
ICS 574 – HW #2 Descriptive Analytics

1. Solve the following questions on Google Colab or Databricks using Spark SQL.
   a. [4 pts] Search the internet for a big dataset of at least 0.5 GB.
   b. [4 pts] Create a DataFrame from the dataset.
   c. Using the DataFrame, implement the following aggregation functions.
      i. [4 pts] Aggregation with grouping
      ii. [4 pts] Aggregation with pivoting
      iii. [4 pts] Aggregation with rollups and cubes
   d. Spark SQL supports the following window functions. Apply these functions to the DataFrame.
      i. [10 pts] Ranking functions
         1. rank
         2. dense_rank
         3. percent_rank
         4. row_number
         5. ntile
      ii. [10 pts] Analytic functions
         1. cume_dist
         2. first_value
         3. last_value
         4. lag
         5. lead

Deliverables
• One PDF file that contains the following:
  o A cover page that includes your KFUPM ID, name, HW number, and date
  o A description of the big dataset and its source
  o Each SQL statement and a snapshot of its output
  o Any problems you faced

Note:
• Submit the homework before 11:59 pm on Saturday, April 27, 2024.
• There are many YouTube videos that teach how to use Spark in Databricks or Colab.
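As a rough illustration of what the ranking and analytic functions in part (d) compute, the sketch below runs a single window query over a tiny made-up table. It is not Spark: it uses Python's built-in sqlite3 module purely as a stand-in engine so it runs without a cluster, on the assumption that the bundled SQLite is version 3.25 or newer (the first release with window functions). The table name `sales`, its columns, the rows, and the helper `run_window_demo` are all hypothetical and have nothing to do with the dataset the assignment asks for.

```python
import sqlite3

# Hypothetical toy data: (region, rep, amount). Bob and Cid tie on purpose
# so the difference between rank and dense_rank is visible.
ROWS = [
    ("East", "Ann", 300),
    ("East", "Bob", 200),
    ("East", "Cid", 200),
    ("East", "Dee", 100),
    ("West", "Eve", 500),
    ("West", "Fay", 400),
]

# Three named windows over the same partitions:
#   w: ordered by amount only, so tied rows are peers (rank-style functions)
#   o: adds rep as a tie-breaker, so row positions are deterministic
#   f: same order as o, but the frame spans the whole partition, which
#      last_value needs in order to see the final row
QUERY = """
SELECT region, rep, amount,
       rank()           OVER w AS rnk,
       dense_rank()     OVER w AS dense_rnk,
       percent_rank()   OVER w AS pct_rnk,
       cume_dist()      OVER w AS cume,
       row_number()     OVER o AS row_num,
       ntile(2)         OVER o AS half,
       lag(amount)      OVER o AS prev_amount,
       lead(amount)     OVER o AS next_amount,
       first_value(rep) OVER f AS top_rep,
       last_value(rep)  OVER f AS bottom_rep
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY amount DESC),
       o AS (PARTITION BY region ORDER BY amount DESC, rep),
       f AS (PARTITION BY region ORDER BY amount DESC, rep
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY region, amount DESC, rep
"""


def run_window_demo():
    """Load the toy rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
    result = [dict(row) for row in conn.execute(QUERY)]
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_window_demo():
        print(row)
```

The ten function names and the `OVER (PARTITION BY ... ORDER BY ...)` form are standard SQL, so the statement should carry over to Spark SQL with little or no change, though that has not been checked here. In a notebook with a SparkSession named `spark` and a DataFrame `df`, one would presumably register the data with `df.createOrReplaceTempView("sales")` and run the statement with `spark.sql(QUERY).show()`. The explicit frame on window `f` matters in either engine: with the default frame, `last_value` only looks as far as the current row.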