Cleaning and Profiling Code
Use only Hadoop MapReduce in this part of your project.
Do not use anything else.
You must write and submit 2 separate MapReduce jobs:
MR Job 1. Data profiling – to explore your data
– Name the files: CountRecs.java, CountRecsMapper.java, CountRecsReducer.java
(Please use these exact names for your classes)
– This MR job counts the number of records in a dataset
– Run it on the original dataset, before cleaning, and output the number of records
– Run it on the cleaned dataset (the result of MR Job 2, described below) and output the number of records
– If the two record counts do not match, figure out why
– Re-submit a schema if it has changed.
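The counting logic behind MR Job 1 can be sketched in plain Java, without any Hadoop classes, so it can be run and checked on its own. This is only an illustration under assumptions: the class name CountRecsSketch, the key "Total records", and the helper countRecords are hypothetical, and the submitted job still needs the real CountRecs, CountRecsMapper, and CountRecsReducer classes built on Hadoop MapReduce. The idea is that the map step emits the same key with a value of 1 for every input line, and the reduce step sums those values.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the counting logic; no Hadoop classes are used here.
class CountRecsSketch {
    // Hypothetical constant key, so every record lands in the same reduce group.
    static final String COUNT_KEY = "Total records";

    // Map step: emit (COUNT_KEY, 1) for each input line, ignoring its content.
    static Map.Entry<String, Integer> map(String line) {
        return new AbstractMap.SimpleEntry<>(COUNT_KEY, 1);
    }

    // Reduce step: sum all the values emitted for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: map every line, then reduce the emitted values.
    static int countRecords(List<String> lines) {
        List<Integer> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.add(map(line).getValue());
        }
        return reduce(emitted);
    }
}
```

Note that every input line counts as one record in this sketch, including a header row or an empty line if the file has one.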
MR Job 2. Data cleaning – to avoid nasty exceptions later on in your analytic
– Name the files: Clean.java, CleanMapper.java, CleanReducer.java
(Please use these exact names for your classes)
– This MR job cleans the data – for example, by dropping columns you don’t need.
– It should write out a new file with only the columns you will use in your analytic.
– The selected columns for your data schema
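The per-record cleaning step of MR Job 2 might look like the plain-Java sketch below. Everything specific in it is an assumption made for illustration: a comma-delimited input with 4 fields, of which columns 0 and 2 are kept, and the names CleanSketch, cleanLine, and cleanAll. In the actual job, logic like cleanLine would sit inside CleanMapper's map method, and the column choices would follow your own schema.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the cleaning logic; no Hadoop classes are used here.
class CleanSketch {
    // Hypothetical schema: 4 comma-separated fields, of which columns 0 and 2 are kept.
    static final int EXPECTED_FIELDS = 4;
    static final int[] KEEP = {0, 2};

    // Map step: return the cleaned line, or null to drop a malformed record.
    // A naive split like this does not handle commas inside quoted fields.
    static String cleanLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != EXPECTED_FIELDS) {
            return null; // wrong number of columns
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < KEEP.length; i++) {
            String value = fields[KEEP[i]].trim();
            if (value.isEmpty()) {
                return null; // a column needed later is missing
            }
            if (i > 0) {
                out.append(',');
            }
            out.append(value);
        }
        return out.toString();
    }

    // Stand-in for the framework: clean every line and keep only the survivors.
    static List<String> cleanAll(List<String> lines) {
        List<String> cleaned = new ArrayList<>();
        for (String line : lines) {
            String result = cleanLine(line);
            if (result != null) {
                cleaned.add(result);
            }
        }
        return cleaned;
    }
}
```

Because this sketch drops malformed rows as well as unused columns, the cleaned dataset can end up with fewer records than the original, which is one possible reason the two counts from MR Job 1 differ.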
FOR FULL CREDIT, PROVIDE THE CLASSES FOR EACH JOB
