Question 1 (3 points)
Use the vgsales data from the file vgsales.xlsx. For each game, calculate the total global sales as well as the total sales (still for each game) in each of North America, Europe, Japan, and other parts of the world.
Put these data together as one table where the left-most column is the game name, the middle 4 columns are the total sales in NA, EU, JP, and other regions, and the right-most column is the total global sales. Sort the data in descending order of total global sales and show only the top 10 rows (i.e., the 10 games with the highest total global sales).
It is fine to take a screenshot of your data in RStudio, provided that the font is large enough that the TA can read it; you do not need to export the data from R and create a “pretty” table in another program.
Hint re importing data: To import the vgsales data into R, you can first convert the data to a CSV file and import the data as demonstrated in class with read_csv(). Or you can use the read_excel() function from the readxl package.
Hint re calculations: once the data are in R, you should be able to create the requested table with one set of piped-together commands. This is not a requirement; you will be awarded full credit as long as you create the requested table in R in any way you like.
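A minimal dplyr sketch of the "one set of piped-together commands" approach. It assumes the columns are named Name, NA_Sales, EU_Sales, JP_Sales, and Other_Sales (the usual vgsales layout, but check names(vgsales) on your copy), and it uses a tiny made-up data frame in place of vgsales.xlsx so that it runs on its own. The same game can appear once per platform, which is why the rows are grouped by Name before summing.

```r
library(dplyr)

# In the assignment you would import the real file, e.g.:
# vgsales <- readxl::read_excel("vgsales.xlsx")
# Made-up stand-in data; column names are assumed, not confirmed.
vgsales <- tibble(
  Name        = c("Game A", "Game A", "Game B", "Game C"),
  Platform    = c("X", "Y", "X", "Y"),
  NA_Sales    = c(1.0, 0.5, 2.0, 0.2),
  EU_Sales    = c(0.8, 0.2, 1.0, 0.1),
  JP_Sales    = c(0.1, 0.3, 0.5, 0.0),
  Other_Sales = c(0.1, 0.1, 0.2, 0.1)
)

top_games <- vgsales %>%
  group_by(Name) %>%                      # one row per game, summed across platforms
  summarise(
    NA_Sales    = sum(NA_Sales),
    EU_Sales    = sum(EU_Sales),
    JP_Sales    = sum(JP_Sales),
    Other_Sales = sum(Other_Sales),
    .groups = "drop"
  ) %>%
  mutate(Global_Sales = NA_Sales + EU_Sales + JP_Sales + Other_Sales) %>%  # right-most column
  arrange(desc(Global_Sales)) %>%         # largest total global sales first
  slice_head(n = 10)                      # keep only the top 10 games

top_games
```

If your file already has a Global_Sales column, summing it with sum(Global_Sales) inside summarise() is an alternative to adding up the four regional totals.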
Question 2 (5 points)
Import the Order and OrderDetail datasets from order.csv and orderdetail.csv. Use these datasets to calculate the total revenue (in millions) per shipping region. Also calculate the percent of revenue for each shipping region. Order the rows by total revenue such that the shipping region with the largest total revenue is at the top.
Revenue can be calculated as Unit Price * Quantity * (1 – Discount).
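As a quick check of the formula, here it is worked through with made-up values (unit price 18.00, quantity 10, discount 0.15), assuming the discount is stored as a fraction rather than as a percentage:

```latex
\begin{aligned}
\text{Revenue} &= \text{Unit Price} \times \text{Quantity} \times (1 - \text{Discount}) \\
               &= 18.00 \times 10 \times (1 - 0.15) \\
               &= 180.00 \times 0.85 = 153.00
\end{aligned}
```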
Your table should have nine rows (one per shipping region) and three columns (the shipping region, the total revenue in millions, and revenue per region as a percent of all revenue).
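A minimal sketch of the join-then-aggregate steps, using small made-up tables. The column names orderid, shipregion, unitprice, quantity, and discount are hypothetical; substitute the names that read_csv() reports for order.csv and orderdetail.csv.

```r
library(dplyr)

# In the assignment you would import the real files, e.g.:
# orders       <- readr::read_csv("order.csv")
# order_detail <- readr::read_csv("orderdetail.csv")
# Made-up stand-in data with hypothetical column names:
orders <- tibble(
  orderid    = c(1, 2, 3),
  shipregion = c("North", "South", "North")
)
order_detail <- tibble(
  orderid   = c(1, 1, 2, 3),
  unitprice = c(10, 20, 50, 5),
  quantity  = c(2, 1, 4, 10),
  discount  = c(0, 0.5, 0, 0.2)
)

region_revenue <- order_detail %>%
  inner_join(orders, by = "orderid") %>%                       # attach each line item's ship region
  mutate(revenue = unitprice * quantity * (1 - discount)) %>%  # revenue per line item
  group_by(shipregion) %>%
  summarise(revenue_millions = sum(revenue) / 1e6, .groups = "drop") %>%
  mutate(pct_revenue = 100 * revenue_millions / sum(revenue_millions)) %>%  # share of all revenue
  arrange(desc(revenue_millions))                              # largest region first

region_revenue
```

The inner_join() keeps only line items whose order appears in the Order table; the line-item data frame with the revenue column (before group_by()) is the "unaggregated" data that question 3 builds on.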
Question 3 (5 points)
Continue to use the Order and OrderDetail data from question 2, as well as the revenue values you calculated. Use the “unaggregated” dataset with 621,883 rows (each row is a line item from an order). Drop the 73 rows with a missing (i.e., NA) value for the shipping year.
Then use facet_grid() to create a grid of histograms on this line-item data:
- Each plot in the grid should be a histogram of log(revenue). Revenue has a strongly positively (right-) skewed distribution, so we plot the natural log of revenue (R's log() function calculates the natural log).
- Each row of the grid should be a shipping region.
- Each column of the grid should be the year in which an order was shipped.
Before creating the plot, you will need to create a ship_year variable. To do that, use the following code to help you:
mutate(ship_date = as.Date(shippeddate, "%m/%d/%Y"),
       ship_year = lubridate::year(ship_date))
This code converts the shippeddate variable into a date using the as.Date() function. We have to tell as.Date() how the date is currently written, so we pass the format string "%m/%d/%Y" (month/day/year, with a four-digit year). Then we use the year() function from the lubridate package to extract the year values.
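A small base-R sketch of what the mutate() call does, run on a made-up vector of date strings that includes one NA. It uses format(ship_date, "%Y") as a stand-in for lubridate::year() so that it runs without lubridate; both extract the same year values, and a missing date stays missing.

```r
# Made-up month/day/year strings; the NA mimics a line item that was never shipped.
shippeddate <- c("07/16/2016", "12/31/2017", NA)

ship_date <- as.Date(shippeddate, "%m/%d/%Y")     # second argument is the format string
ship_year <- as.integer(format(ship_date, "%Y"))  # base-R equivalent of lubridate::year()

ship_date
ship_year
```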
You can use code like filter(!is.na(ship_year)) to remove rows where the shipping year has a missing value.
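Putting the steps together, a minimal ggplot2 sketch on a few made-up line items. The column names shipregion, shippeddate, and revenue are hypothetical stand-ins for your joined question 2 data. The formula shipregion ~ ship_year puts regions on the rows and years on the columns of the grid. Note that a line item with zero revenue would give log(0) = -Inf, which ggplot2 drops with a warning.

```r
library(dplyr)
library(ggplot2)

# Made-up line-item data with hypothetical column names.
line_items <- tibble(
  shipregion  = c("North", "North", "South", "South", "South"),
  shippeddate = c("01/15/2016", "03/02/2017", "06/30/2016", NA, "11/11/2017"),
  revenue     = c(120, 45.5, 980, 60, 310)
)

plot_data <- line_items %>%
  mutate(ship_date = as.Date(shippeddate, "%m/%d/%Y"),
         ship_year = as.integer(format(ship_date, "%Y"))) %>%  # or lubridate::year(ship_date)
  filter(!is.na(ship_year))                                     # drop rows with no shipping year

p <- ggplot(plot_data, aes(x = log(revenue))) +  # natural log reduces the right skew
  geom_histogram(bins = 30) +
  facet_grid(shipregion ~ ship_year) +           # rows = shipping region, columns = ship year
  labs(x = "log(revenue)", y = "Count")

p
```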
Question 4 (5 points)
Use the smartphone customer dataset. Scale the 6 phone-use variables (gaming, chat, maps, video, social, and reading). Then run k-means on all 6 variables with K=3. Answer the following two questions:
- How many customers are in each cluster?
- What is the within-cluster sum of squares value?
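A minimal sketch of the scale-then-cluster steps with base R's scale() and kmeans(). The six variable names come from the question, but the data below are simulated (made up) because the smartphone file's name and layout are not given here. nstart = 25 is an illustrative choice that runs k-means from several random starts and keeps the best fit; set.seed() makes the result reproducible.

```r
# Simulated stand-in for the smartphone customer data: minutes per phone-use category.
set.seed(42)
phone <- data.frame(
  gaming  = rexp(300, rate = 1/30),
  chat    = rexp(300, rate = 1/20),
  maps    = rexp(300, rate = 1/10),
  video   = rexp(300, rate = 1/40),
  social  = rexp(300, rate = 1/25),
  reading = rexp(300, rate = 1/15)
)

vars   <- c("gaming", "chat", "maps", "video", "social", "reading")
scaled <- scale(phone[, vars])   # center each variable to mean 0 and scale to sd 1

set.seed(1)                      # k-means starts from random centers
km <- kmeans(scaled, centers = 3, nstart = 25)

table(km$cluster)   # number of customers in each cluster
km$withinss         # within-cluster sum of squares, one value per cluster
km$tot.withinss     # total within-cluster sum of squares (sum of the three values)
```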
Question 5 (2 points)
Plot gaming vs. reading minutes as a scatter plot and color the points according to their cluster assignment from question 4. Why do the clusters “overlap” in this plot (i.e., the points “mix” near the cluster boundaries) when the clusters did not overlap in the k-means example we did in class?
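A minimal ggplot2 sketch of the requested scatter plot. To stay self-contained it re-creates a question 4 style clustering on simulated (made-up) data for the six variables; with your real data you would reuse your own km object. Converting the cluster numbers with factor() gives each cluster a distinct color instead of a continuous color gradient.

```r
library(ggplot2)

# Simulated stand-in for the six phone-use variables (made-up data).
set.seed(42)
vars  <- c("gaming", "chat", "maps", "video", "social", "reading")
phone <- as.data.frame(matrix(rexp(300 * 6, rate = 1/20), ncol = 6,
                              dimnames = list(NULL, vars)))

km <- kmeans(scale(phone), centers = 3, nstart = 25)  # clusters use all six scaled variables

phone$cluster <- factor(km$cluster)                   # discrete colors per cluster
p <- ggplot(phone, aes(x = gaming, y = reading, color = cluster)) +
  geom_point(alpha = 0.7) +
  labs(x = "Gaming minutes", y = "Reading minutes", color = "Cluster")

p
```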
