College Pal
Connecting to a pal for your paper
November 29, 2023

You will identify a data driven business problem that requires preparation of the data. This preparation involves Extracting data (from 3 or more sources), Transforming (or cleaning)

computer science

 

Tasks to complete

Goal: This project integrates concepts developed in all the assignments in the second half of this class. You will identify a data-driven business problem that requires preparation of the data. This preparation involves Extracting data (from 3 or more sources) and Transforming (or cleaning) the data before Loading it into a database for analysis. In other words, you will experience, first-hand, the ETL process of data management: preparing the data for further analyses.

Options: You can take this project in one of two directions: (1) Identify a large file, clean the data, and normalize it into three or more tables OR (2) Identify three or more large data sources, clean the data, and merge them into a denormalized table for analysis. In either case, you will need to identify what you plan to learn from the cleaned and loaded data. BOTTOM LINE: Can you do the analyses WITHOUT going through this ETL process? If so, what's the point?!

Resource: This article.

In preparation for your project this term, I need you to do some digging to identify sources and ideas for a decent project.

There are a couple of decisions that have to be made, so I am making part of the project a "deliverable" so you can begin mulling it over. Most ETL tasks involve cleaning and integration. For integration, it is vital that you have an attribute that is common across all three data sets.

Cleaning

Cleaning is one of the most important steps as it ensures the quality of the data in the data warehouse. Cleaning should perform basic data unification rules, such as:

  • Make identifiers unique (sex categories Male/Female/Unknown, M/F/null, Man/Woman/Not Available are translated to the standard Male/Female/Unknown)
  • Convert null values into a standardized Not Available/Not Provided value
  • Convert phone numbers and ZIP codes to a standardized form
  • Validate address fields and convert them to consistent naming, e.g. Street/St/St./Str./Str
  • Validate address fields against each other (State/Country, City/State, City/ZIP code, City/Street)
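
The unification rules above can be sketched in a few small functions. This is only an illustration in plain Python, not the course's prescribed tooling: the mapping table, the function names, the placeholder text, and the chosen phone format are all assumptions to adapt to your own sources.

```python
import re

# Hypothetical mapping table; extend it with the spellings found in your own sources.
SEX_MAP = {
    "male": "Male", "m": "Male", "man": "Male",
    "female": "Female", "f": "Female", "woman": "Female",
}

def standardize_sex(value):
    """Map the many spellings of a sex category onto Male/Female/Unknown."""
    if value is None:
        return "Unknown"
    return SEX_MAP.get(str(value).strip().lower(), "Unknown")

def standardize_null(value, placeholder="Not Available"):
    """Replace None, empty strings, and common null markers with one placeholder."""
    if value is None or str(value).strip().lower() in {"", "null", "n/a", "na", "none"}:
        return placeholder
    return value

def standardize_zip(value):
    """Return a 5-digit ZIP code, or None when the input cannot be interpreted."""
    digits = re.sub(r"\D", "", str(value))
    if len(digits) == 9:          # ZIP+4: keep only the first five digits
        digits = digits[:5]
    if 0 < len(digits) <= 5:
        return digits.zfill(5)    # restore leading zeros lost in numeric columns
    return None

def standardize_phone(value):
    """Format a 10-digit US number as (XXX) XXX-XXXX; return None otherwise."""
    digits = re.sub(r"\D", "", str(value))
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]       # drop the US country code
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
```

Returning None for values that cannot be interpreted, instead of guessing, keeps bad records visible so they can be reviewed or routed to a reject table.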

Transform

The transform step applies a set of rules to transform the data from the source to the target. This includes:

  • Converting any measured data to the same dimension (i.e. a conformed dimension) using the same units so that they can later be joined
  • Generating surrogate keys or FKs so that you can join data from several sources
  • Generating aggregates
  • Deriving new calculated values
  • Adding columns to create PKs and/or FKs
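
As a rough sketch of these transform rules, the snippet below conforms a measure to one unit, assigns surrogate keys, and builds an aggregate. The rows, column names, and values are made up for illustration; in the project itself the same steps would normally be done with SSIS transformations or SQL.

```python
from collections import defaultdict

# Hypothetical rows from two sources: one reports coverage as a percentage,
# the other as a proportion between 0 and 1. All values are invented.
rows = [
    {"age_group": "0-4 yrs", "sex": "All Sexes", "coverage": 12.5, "unit": "percent", "count": 40},
    {"age_group": "0-4 yrs", "sex": "All Sexes", "coverage": 0.20, "unit": "proportion", "count": 25},
    {"age_group": "05-11 yrs", "sex": "All Sexes", "coverage": 30.0, "unit": "percent", "count": 10},
]

def conform_coverage(row):
    """Express coverage as a proportion regardless of the source unit."""
    if row["unit"] == "percent":
        return row["coverage"] / 100.0
    return row["coverage"]

def assign_surrogate_keys(rows, natural_key_fields, key_name="group_key"):
    """Give each distinct natural-key combination a sequential integer key."""
    lookup = {}
    for row in rows:
        natural_key = tuple(row[field] for field in natural_key_fields)
        # setdefault returns the existing key, or stores and returns the next one
        row[key_name] = lookup.setdefault(natural_key, len(lookup) + 1)
    return lookup

for row in rows:
    row["coverage_proportion"] = conform_coverage(row)   # derived, conformed value

key_lookup = assign_surrogate_keys(rows, ["age_group", "sex"])

# Aggregate: total count per surrogate key.
totals = defaultdict(int)
for row in rows:
    totals[row["group_key"]] += row["count"]
```

The lookup returned by assign_surrogate_keys doubles as the content of a small dimension table: each natural key appears once, next to the integer key that the other rows reference.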

Data Integration

It is at this stage that you get the most value for the project. This typically means you are adding some attribute from a related set that adds 'Color' to the data. Perhaps Census data to labor data or other demographic data. The challenge is to locate data that are relatable.

Project direction: You will need to complete a datamart with significant pre-processing (ETL) activities. 

Requirements:

  1. Problem being solved:  What do you propose to learn from this data? List several of these business questions and show how your project solution (data set) could answer them.
  2. Tools: You must complete the entire project using Visual Studio, OR you can use some other ETL tool of your choice, such as Power BI or Tableau.
  3. Volume: Total result data set must add up to at least 5k records, but not more than 100k.
  4. Destination: SQL server table(s). Depending on the direction you are taking, you can move all the data to a single CSV file and dump it into SQL server at the end or direct the final destination tables to SQL server.
  5. Transformation: it must include TWO new columns (for each final destination): (a) one populated with the current date and time, so you know when that data was brought into the final dataset, and (b) one populated with the source file name, so you know where the data came from. This may be done through SSIS or in SQL Server.
    Note: Filename capturing works only when the source is a flat file. So, if your source is NOT a flat file, you may want to make a CSV file an intermediate destination and then use this file as the source. (Hint: Use a derived column transformation to add a column.)
    In addition, it must include at least 3 of the following transformations: data conversion, derived column, data split, lookup, merge, merge join, multicast, union all, fuzzy lookup, or any of the transforms not covered in class.
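
The two lineage columns in requirement 5 would normally be added with an SSIS derived column or in SQL Server. Purely to illustrate the idea, here is a minimal Python sketch; the function name, the column names load_timestamp and source_file, and the data/ folder are assumptions, and an in-memory string stands in for the flat file.

```python
import csv
import io
import os
from datetime import datetime

def add_lineage_columns(rows, source_path, loaded_at=None):
    """Return copies of the rows with a load timestamp and source file name added."""
    loaded_at = loaded_at or datetime.now()
    stamp = loaded_at.strftime("%Y-%m-%d %H:%M:%S")
    file_name = os.path.basename(source_path)   # keep only the file name, not the folder
    return [dict(row, load_timestamp=stamp, source_file=file_name) for row in rows]

# In-memory stand-in for a flat file so the sketch runs without any data on disk.
sample = io.StringIO("date,Age Group,Count\n9/16/2023,0-17 yrs,6\n9/23/2023,0-17 yrs,187\n")
rows = list(csv.DictReader(sample))

tagged = add_lineage_columns(rows, "data/COVID-19_Vaccination_Coverage__Citywide.csv")
```

Taking the timestamp once per load, not once per row, means every row from the same run carries the same value, which makes a load easy to identify or roll back later.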

Data sources: You are welcome to use datasets from work that have been sufficiently "anonymized". In fact, anonymizing is itself a valuable transformation task that you can use to protect your data and make it available for additional analysis/exploration. There are many public data sets that can be used (see the "data sources" tab).

 

Project ideas & Data sets [for ETL]

Goal: Explore various datasets (see below) to see what is missing in any of the data and how you can enhance it by combining info from other seemingly unconnected data (industry, education, poverty and liquor shops?). The links below serve as a starting point for your exploration. Get started!

Expectation: You can take this project in one of two directions: (1) Identify three or more large data sources, clean the data and merge them into a denormalized table for analysis.  OR (2) Identify a large file, clean the data and normalize it into three or more tables so that when you rejoin them, you get more accurate answers to your questions. Sometimes this process may require you to get “reference sources” so your dimension tables (destinations in Model Y above) are more complete/accurate.  
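
Direction (2), normalizing one flat file into several tables and rejoining them, can be sketched with Python's built-in sqlite3 module standing in for SQL Server. The table and column names are hypothetical, the four rows are copied from the vaccination coverage excerpt in this post, and INSERT OR IGNORE is SQLite syntax that would need a different idiom in SQL Server.

```python
import sqlite3

# (week end, age group, race/ethnicity, count) taken from the coverage excerpt.
flat_rows = [
    ("9/16/2023", "0-4 yrs", "Latinx", 0),
    ("9/23/2023", "0-4 yrs", "Latinx", 3),
    ("9/16/2023", "05-11 yrs", "Latinx", 0),
    ("9/23/2023", "05-11 yrs", "Unknown", 5),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE age_group (age_group_id INTEGER PRIMARY KEY, label TEXT UNIQUE);
    CREATE TABLE race_ethnicity (race_id INTEGER PRIMARY KEY, label TEXT UNIQUE);
    CREATE TABLE vaccination_fact (
        week_end TEXT,
        age_group_id INTEGER REFERENCES age_group(age_group_id),
        race_id INTEGER REFERENCES race_ethnicity(race_id),
        vaccinated_count INTEGER
    );
""")

for week_end, age, race, count in flat_rows:
    # Each distinct label is stored once in its dimension table.
    conn.execute("INSERT OR IGNORE INTO age_group (label) VALUES (?)", (age,))
    conn.execute("INSERT OR IGNORE INTO race_ethnicity (label) VALUES (?)", (race,))
    # The fact row stores the two foreign keys instead of the repeated text.
    conn.execute(
        """INSERT INTO vaccination_fact
           SELECT ?, a.age_group_id, r.race_id, ?
           FROM age_group a, race_ethnicity r
           WHERE a.label = ? AND r.label = ?""",
        (week_end, count, age, race),
    )

# Rejoining the three tables reproduces the original flat rows.
rejoined = conn.execute("""
    SELECT f.week_end, a.label, r.label, f.vaccinated_count
    FROM vaccination_fact f
    JOIN age_group a ON a.age_group_id = f.age_group_id
    JOIN race_ethnicity r ON r.race_id = f.race_id
    ORDER BY f.rowid
""").fetchall()
```

Because each label lives in exactly one dimension row, a spelling fix or a "reference source" correction is made once and applies to every fact row that points to it.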

In either case, you will need to identify what you plan to learn from the cleaned and loaded data.

There are two main ideas to keep in mind: (1) Cleaning badly prepared data and (2) integrating data from multiple sources. An ETL project usually involves BOTH of these.

When integrating data from more than one source, you need to make sure that they can be linked in the first place. In other words, is there something in common between the two data sets? Some kind of identifier like we use as PK and FK? If not, can you create it?
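
One way to test whether two data sets can be linked is to compare their distinct key values before attempting a join. The sketch below does that for a composite key; the function name, column names, and sample rows are invented for illustration.

```python
def key_overlap(left_rows, right_rows, key_fields):
    """Count distinct key combinations that are shared or unique to each side."""
    left_keys = {tuple(row[f] for f in key_fields) for row in left_rows}
    right_keys = {tuple(row[f] for f in key_fields) for row in right_rows}
    return {
        "shared": len(left_keys & right_keys),
        "left_only": len(left_keys - right_keys),
        "right_only": len(right_keys - left_keys),
    }

# Hypothetical rows: the second source spells the age group differently.
left = [
    {"week_end": "2023-09-16", "age_group": "0-4 yrs"},
    {"week_end": "2023-09-23", "age_group": "0-4 yrs"},
    {"week_end": "2023-09-23", "age_group": "05-11 yrs"},
]
right = [
    {"week_end": "2023-09-16", "age_group": "0-4 yrs"},
    {"week_end": "2023-09-23", "age_group": "0-4 years"},
]

report = key_overlap(left, right, ["week_end", "age_group"])
```

A low shared count does not always mean the sources are unrelated. Here the mismatch comes from "yrs" versus "years", which is exactly the kind of label that the cleaning step should standardize before integration.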

As you review the following sources for ideas, look for files that can be linked. Otherwise, all you have is data!

Note: You don’t have to get ALL your data from a single source. As long as they are related, you can draw from multiple sources.

I ALREADY HAVE THE DATA SOURCES AND PROJECT BACKGROUND TO WORK WITH. YOU JUST HAVE TO DO THE PROJECT ETL AND PRESENTATION. FOR THIS PROJECT YOU NEED TO USE VISUAL STUDIO 2019 (PREFERRED), POWER BI, OR TABLEAU. I NEED IT IN A SHORT TIME, SO PLEASE BID ONLY IF YOU ARE SURE YOU CAN DO IT. I WILL BE UPLOADING THE DATA FILES HERE.

  • attachment

    PROJECT_background.docx

  • attachment

    COVID-19_Vaccination_Coverage__Citywide.csv

  • attachment

    Archive__COVID-19_Vaccination_and_Case_Trends_by_Age_Group__United_States.csv

  • attachment

    COVID-19_Outcomes_by_Vaccination_Status.csv

"Comprehensive COVID-19 Data Analysis: A Deep Dive into Vaccination, Case Trends, and Outcomes"

The ETL project aims to provide insights into the effectiveness and the effects of COVID-19 vaccinations across demographic groups. By combining and analysing different datasets, the project will reveal trends and differences in vaccination rates and outcomes, such as infection and hospitalization rates, among age groups, ethnicities, and genders. This analysis is essential for informing public health decisions and strategies for dealing with current and future health crises.

Cleaning: For missing values, use data imputation strategies. Standardise and normalise data formats such as date and time, as well as categorical labels.

Transforming: Measurements should be converted to a unified scale. Surrogate keys should be used for seamless data integration. Using ETL tools, automate the addition of metadata columns (such as source and timestamp).

Data Usage and Sources

Data Used:

  • COVID-19 Vaccination Coverage, Citywide
  • COVID-19 Outcomes by Vaccination Status
  • COVID-19 Vaccination and Case Trends by Age Group

Sources: https://data.gov/

Total rows: Approximately 9,660 (738 + 3,591 + 5,331 across the three files).

Keys:

  • Primary Keys (PK): Composite keys likely formed by 'Week End', 'Age Group', and other demographic fields.
  • Foreign Keys (FK): Used for linking datasets, possibly through common fields like 'Week End' and 'Age Group'.

Decision Support and Its Relationship to Excel

Decision Support: Analyses COVID-19 vaccination effectiveness and outcomes to inform public health strategies. Identifies demographic groups that are at higher risk or have lower vaccination rates in order to target interventions.

In comparison to Excel: Excel can perform basic analysis but is limited in its ability to process large datasets and complex ETL operations. For complex datasets, Excel lacks solid data integration and transformation capabilities.

Benefits of This Approach:

  • Increased data processing power for large datasets.
  • More sophisticated ETL capabilities for cleaning, transforming, and integrating diverse data sources.
  • Support for more complex analyses and visualizations, which are required for thorough decision-making.
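
The composite key mentioned in the project background ('Week End', 'Age Group') could be used to merge the files roughly as sketched below. This is an assumption-heavy illustration: the column names week_end, age_group, vaccinated, and cases are hypothetical, the cases value is invented, and the join assumes each key appears at most once on the right-hand side.

```python
from datetime import datetime

def to_iso_date(text):
    """Convert M/D/YYYY text (as in the sample rows) to ISO YYYY-MM-DD."""
    return datetime.strptime(text, "%m/%d/%Y").date().isoformat()

def left_join(left_rows, right_rows, key_fields):
    """Left-join two lists of dicts on a composite key.

    Assumes each key occurs at most once in right_rows; unmatched left rows
    are kept with only their own columns.
    """
    index = {tuple(row[f] for f in key_fields): row for row in right_rows}
    joined = []
    for row in left_rows:
        match = index.get(tuple(row[f] for f in key_fields), {})
        joined.append({**match, **row})   # left values win on shared column names
    return joined

# Dates are standardized first so that both sources use the same key format.
coverage = [
    {"week_end": to_iso_date("9/16/2023"), "age_group": "0-17 yrs", "vaccinated": 6},
    {"week_end": to_iso_date("9/23/2023"), "age_group": "0-17 yrs", "vaccinated": 187},
]
outcomes = [
    {"week_end": "2023-09-16", "age_group": "0-17 yrs", "cases": 12},  # invented value
]

merged = left_join(coverage, outcomes, ["week_end", "age_group"])
```

If a key can repeat on the right-hand side, for example one row per race/ethnicity, that field has to become part of the composite key, or the right-hand rows must be aggregated before joining.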


date,Season,Measure,Age Group,Race/Ethnicity,Sex,Population Size,Count,Percent
9/16/2023,2023-2024,UpToDate_2023_2024,0-17 yrs,All Race/Ethnicities,All Sexes,545173,6,0
9/23/2023,2023-2024,UpToDate_2023_2024,0-17 yrs,All Race/Ethnicities,All Sexes,545173,187,0
9/30/2023,2023-2024,UpToDate_2023_2024,0-17 yrs,All Race/Ethnicities,All Sexes,545173,633,0
9/16/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,All Race/Ethnicities,All Sexes,149804,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,All Race/Ethnicities,All Sexes,149804,44,0
9/30/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,All Race/Ethnicities,All Sexes,149804,157,0
9/16/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Asian, non-Latinx",All Sexes,8903,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Asian, non-Latinx",All Sexes,8903,8,0
9/30/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Asian, non-Latinx",All Sexes,8903,20,0
10/7/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Asian, non-Latinx",All Sexes,8903,38,0
10/14/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Asian, non-Latinx",All Sexes,8903,112,0
10/21/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Asian, non-Latinx",All Sexes,8903,188,0
10/28/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Asian, non-Latinx",All Sexes,8903,235,0
9/16/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Black, non-Latinx",All Sexes,43530,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Black, non-Latinx",All Sexes,43530,1,0
9/30/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Black, non-Latinx",All Sexes,43530,7,0
10/7/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Black, non-Latinx",All Sexes,43530,17,0
10/14/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Black, non-Latinx",All Sexes,43530,51,0
9/16/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Latinx,All Sexes,52737,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Latinx,All Sexes,52737,3,0
9/30/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Latinx,All Sexes,52737,13,0
10/7/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Latinx,All Sexes,52737,39,0
10/14/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Latinx,All Sexes,52737,133,0
10/21/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Latinx,All Sexes,52737,237,0
11/4/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Latinx,All Sexes,52737,369,0
9/16/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Other, non-Latinx",All Sexes,8900,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Other, non-Latinx",All Sexes,8900,0,0
9/30/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Other, non-Latinx",All Sexes,8900,2,0
10/7/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Other, non-Latinx",All Sexes,8900,8,0
10/14/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Other, non-Latinx",All Sexes,8900,32,0
10/21/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Other, non-Latinx",All Sexes,8900,73,0
10/28/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Other, non-Latinx",All Sexes,8900,109,0
9/16/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Unknown,All Sexes,,0,
9/23/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Unknown,All Sexes,,9,
9/30/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Unknown,All Sexes,,33,
10/7/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Unknown,All Sexes,,58,
10/14/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Unknown,All Sexes,,136,
10/21/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Unknown,All Sexes,,209,
10/28/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Unknown,All Sexes,,220,
11/4/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Unknown,All Sexes,,235,
9/16/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"White, non-Latinx",All Sexes,35734,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"White, non-Latinx",All Sexes,35734,23,0
9/30/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"White, non-Latinx",All Sexes,35734,82,0
10/21/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"White, non-Latinx",All Sexes,35734,1024,0
11/4/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"White, non-Latinx",All Sexes,35734,1560,0
9/16/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,All Race/Ethnicities,All Sexes,209779,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,All Race/Ethnicities,All Sexes,209779,46,0
9/30/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,All Race/Ethnicities,All Sexes,209779,259,0
10/7/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,All Race/Ethnicities,All Sexes,209779,769,0
10/14/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,All Race/Ethnicities,All Sexes,209779,1933,0
9/16/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Asian, non-Latinx",All Sexes,10628,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Asian, non-Latinx",All Sexes,10628,6,0
9/30/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Asian, non-Latinx",All Sexes,10628,38,0
10/7/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Asian, non-Latinx",All Sexes,10628,104,0
10/14/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Asian, non-Latinx",All Sexes,10628,206,0
10/21/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Asian, non-Latinx",All Sexes,10628,343,0
10/28/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Asian, non-Latinx",All Sexes,10628,446,0
11/4/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Asian, non-Latinx",All Sexes,10628,518,0
9/16/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Black, non-Latinx",All Sexes,66473,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Black, non-Latinx",All Sexes,66473,4,0
9/30/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Black, non-Latinx",All Sexes,66473,20,0
10/7/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Black, non-Latinx",All Sexes,66473,66,0
10/14/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Black, non-Latinx",All Sexes,66473,143,0
10/21/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Black, non-Latinx",All Sexes,66473,255,0
10/28/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Black, non-Latinx",All Sexes,66473,321,0
11/4/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Black, non-Latinx",All Sexes,66473,419,0
9/16/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Latinx,All Sexes,85789,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Latinx,All Sexes,85789,4,0
9/30/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Latinx,All Sexes,85789,36,0
10/7/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Latinx,All Sexes,85789,115,0
10/14/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Latinx,All Sexes,85789,369,0
9/16/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Other, non-Latinx",All Sexes,9113,0,0
9/23/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Other, non-Latinx",All Sexes,9113,0,0
9/30/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Other, non-Latinx",All Sexes,9113,10,0
10/7/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Other, non-Latinx",All Sexes,9113,24,0
10/14/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"Other, non-Latinx",All Sexes,9113,63,0
9/16/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Unknown,All Sexes,,0,
9/23/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Unknown,All Sexes,,5,
9/30/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Unknown,All Sexes,,21,
10/7/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Unknown,All Sexes,,51,
10/14/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Unknown,All Sexes,,117,
10/21/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Unknown,All Sexes,,176,
10/28/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Unknown,All Sexes,,189,
11/4/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,Unknown,All Sexes,,201,
9/16/2023,2023-2024,UpToDate_2023_2024,05-11 yrs,"White, non-Latinx",All Sexes,37776,0,0
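
Every Percent value in the excerpt above is shown as 0, possibly because of rounding, so one candidate for a "derived column" is a coverage percentage recomputed from Count and Population Size. The sketch below re-types three excerpt rows as comma-separated text. Treating the single number in the Unknown row as Count, with Population Size and Percent left blank, is an assumption, as are the function name and the new column name.

```python
import csv
import io

# Three rows from the excerpt, re-typed as CSV. The Unknown row is assumed to
# carry only a Count, with Population Size and Percent blank.
excerpt = io.StringIO(
    "date,Season,Measure,Age Group,Race/Ethnicity,Sex,Population Size,Count,Percent\n"
    "9/30/2023,2023-2024,UpToDate_2023_2024,0-17 yrs,All Race/Ethnicities,All Sexes,545173,633,0\n"
    '9/30/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,"Asian, non-Latinx",All Sexes,8903,20,0\n'
    "9/30/2023,2023-2024,UpToDate_2023_2024,0-4 yrs,Unknown,All Sexes,,33,\n"
)

def coverage_percent(row):
    """Return Count / Population Size as a percentage, or None if population is missing."""
    population = row["Population Size"].strip()
    if not population or int(population) == 0:
        return None
    return round(100.0 * int(row["Count"]) / int(population), 2)

rows = list(csv.DictReader(excerpt))
for row in rows:
    row["Percent Derived"] = coverage_percent(row)   # new derived column
```

Quoting "Asian, non-Latinx" matters: without the quotes, the comma inside the label would split it into two fields and shift every later column.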
