Develop an Apache Spark application, following the specifications below, that works with the Crunchbase Open Data Map (ODM) organizations dataset download, using PySpark in Google Colab.
Details
Use the Week 11 Class Exercise downloads as a reference:
- Create a new notebook in Google Colab
- Download the Crunchbase ODM Orgs CSV file and upload it to the "Files" section of your Colab notebook (the upload may take a few minutes)
- Read the Crunchbase Orgs dataset into a Spark DataFrame
Implement PySpark code using DataFrames, RDDs or Spark UDF functions:
- Find all entities whose name starts with the letter "F" (e.g. Facebook):
  - print the count and show() the resulting Spark DataFrame
- Find all entities located in New York City:
  - print the count and show() the resulting Spark DataFrame
- Add a "Blog" column to the DataFrame with the row entries set to 1 if the "domain" field contains "blogspot.com", and 0 otherwise:
  - show() only the records with the "Blog" field marked as 1
- Find all entities whose names are palindromes (the name reads the same forward and backward, e.g. madam):
  - print the count and show() the resulting Spark DataFrame