Problem 4: Statistical Description of Multivariate Data for a Real-World Dataset [40 points]
To complete this task, use the crx.data file, which contains data collected from credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. The dataset was downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.php).
This dataset is interesting because it has a good mix of attributes: continuous, nominal with small numbers of values, and nominal with larger numbers of values. There are also a few missing values. Read the data into R using the following command.
data <- read.table("path/crx.data", sep = ",");
Here, replace path with the location of the file crx.data on your computer. After loading the data in R, you can access each column using data[ , 1], data[ , 2], … , data[ , 15]. You can view the data using the View(data) command. All the data will be in character format when you load it from crx.data, so you will have to convert the numeric columns from character to numeric using the as.numeric() function as follows.
attribute1 <- as.numeric(data[ , 2])
For missing values, NAs will be introduced by coercion.
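The coercion behaviour can be checked on a small vector before applying it to a real column. The sketch below is only an illustration: the vector raw and its values are made up, and it assumes missing entries are written as "?", which is how they appear in the UCI copy of the file.

```r
# Hypothetical values standing in for one column of crx.data;
# "?" marks a missing entry (assumption stated above).
raw <- c("30.83", "58.67", "?", "24.50")

# as.numeric() turns strings it cannot parse into NA and emits a warning.
# suppressWarnings() only hides that warning; the NA is still produced.
values <- suppressWarnings(as.numeric(raw))

is.na(values)        # FALSE FALSE  TRUE FALSE
sum(is.na(values))   # 1
```

Note that a nominal column is never passed through as.numeric(), so a "?" in it stays a literal string rather than NA. One possible way to handle both cases at once is the na.strings = "?" argument of read.table(), although the assignment itself does not ask for it.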
There are 16 columns in the data: the first 15 columns are the attributes and the 16th column is the label. You only have to analyze the attributes.
- Find which attributes are the nominal attributes and which are continuous attributes.
- Identify the attribute(s) with missing values (NA). Drop the attributes with missing values from the data.
- Calculate the central tendency of the remaining attributes. Remember that for a nominal attribute you can only calculate the mode.
- Calculate the five-number summary of the numeric attributes.
- Show box plots for the numeric attributes and identify the attributes having outliers.
- Show pairwise scatter plots of the numeric attributes. Inspect the scatter plots and state whether the attributes in each pair are negatively correlated, positively correlated, or uncorrelated.
*Do not forget to label the axes of the plots.
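The R functions involved in the steps above can be tried on a toy data frame first. This is a minimal sketch, not a solution for crx.data: the data frame toy, its columns a, x, y, z and the helper stat_mode() are all hypothetical names and made-up values chosen for illustration.

```r
# Hypothetical toy data: one nominal column (a), two complete numeric
# columns (x, z) and one numeric column with a missing value (y).
toy <- data.frame(
  a = c("b", "a", "b", "b", "a", "b"),
  x = c(1.0, 2.0, 2.5, 3.0, 3.5, 20.0),
  y = c(10, 9, 8, 7, 6, NA),
  z = c(12, 11, 9, 8, 7, 1),
  stringsAsFactors = FALSE
)

# Drop every attribute that contains at least one NA.
has_na <- sapply(toy, function(col) any(is.na(col)))
kept <- toy[, !has_na, drop = FALSE]

# R has no built-in statistical mode, so count values with table().
# On ties this returns the first value in table()'s sorted order.
stat_mode <- function(v) {
  counts <- table(v)
  names(counts)[which.max(counts)]
}

mode_a   <- stat_mode(kept$a)          # "b"
mean_x   <- mean(kept$x)
median_x <- median(kept$x)             # 2.75

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum.
five_x <- fivenum(kept$x)              # 1 2 2.75 3.5 20

# Points beyond 1.5 times the hinge spread are what boxplot() draws as outliers.
outliers_x <- boxplot.stats(kept$x)$out   # 20

# Labelled plots.
boxplot(kept$x, ylab = "x", main = "Box plot of x")
plot(kept$x, kept$z, xlab = "x", ylab = "z", main = "x versus z")

# The sign of the correlation coefficient supports the visual reading.
r_xz <- cor(kept$x, kept$z)
```

For more than two numeric attributes, pairs() draws all pairwise scatter plots in one figure, and summary() gives a quartile-based summary that can differ slightly from the hinges returned by fivenum().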