1. Drills with R on K-NN models
This problem relates to the nearest neighbors classifiers described in Section 9.5 of “Modern Statistics with R” (https://modernstatisticswithr.com). Fit a kNN classification model to the wine data, using pH, alcohol, fixed.acidity, and residual.sugar as explanatory variables. Evaluate its performance with 10-fold cross-validation, using AUC to choose the best k.
To solve the problem, you’ll need to load the data and libraries with:
# Import data about white and red wines:
white <- read.csv("https://tinyurl.com/winedata1", sep = ";")
red <- read.csv("https://tinyurl.com/winedata2", sep = ";")
# Add a type variable:
white$type <- "white"
red$type <- "red"
# Merge the datasets:
wine <- rbind(white, red)
wine$type <- factor(wine$type)
# Install (only needed once) and load caret:
install.packages("caret", dependencies = TRUE)
library(caret)
# To visualize the results, you also need MLeval:
install.packages("MLeval", dependencies = TRUE)
library(MLeval)
For the submission:
1. Provide, in plain text, the commands you used to solve the problem.
2. Attach the figure produced by the command: plots$roc
3. Provide the output of the command: plots$optres[[1]][13,]
4. Attach the figure produced by the command: plots$cc
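The commands below are a minimal sketch of one way to approach this task, not a definitive solution. It assumes caret and MLeval are already installed and that the data URLs above are reachable. The seed, the centering and scaling step, and tuneLength = 15 are assumptions that the assignment does not specify. Only the object name plots is implied by the submission items.

```r
library(caret)
library(MLeval)

# Rebuild the wine data if it is not already in the workspace
if (!exists("wine")) {
  white <- read.csv("https://tinyurl.com/winedata1", sep = ";")
  red <- read.csv("https://tinyurl.com/winedata2", sep = ";")
  white$type <- "white"
  red$type <- "red"
  wine <- rbind(white, red)
  wine$type <- factor(wine$type)
}

set.seed(123)  # assumption: any fixed seed, so the folds are reproducible

# 10-fold cross-validation; class probabilities and saved predictions
# are required for the ROC-based summary and for MLeval
tc <- trainControl(method = "cv", number = 10,
                   summaryFunction = twoClassSummary,
                   classProbs = TRUE,
                   savePredictions = TRUE)

# kNN with k chosen by cross-validated AUC (caret calls this metric "ROC")
m <- train(type ~ pH + alcohol + fixed.acidity + residual.sugar,
           data = wine,
           method = "knn",
           metric = "ROC",
           preProcess = c("center", "scale"),  # assumption: kNN is scale-sensitive
           tuneLength = 15,                    # assumption: try 15 values of k
           trControl = tc)

m$bestTune  # the k with the highest cross-validated AUC

# Evaluation plots and summary requested for the submission
plots <- evalm(m, gnames = "kNN")
plots$roc
plots$optres[[1]][13, ]
plots$cc
```

Standardizing the predictors is a design choice worth stating in the paper: kNN relies on distances, so variables measured on larger scales would otherwise dominate the neighbor search.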
2. Dissimilarities between data objects
This project demonstrates how to measure similarities between data objects. These topics are mostly described in Chapter 6, Statistical Machine Learning, of ‘Practical Statistics for Data Scientists’. Cover the following in the project:
Find some data examples and show how to calculate:
    Euclidean distance
    L1 distance
Prove or disprove that the Euclidean and L1 distances satisfy:
    Positivity: d(x,y) >= 0 for all x and y, and d(x,y) == 0 only if x == y.
    Symmetry: d(x,y) == d(y,x) for all x and y.
    Triangle inequality: d(x,z) <= d(x,y) + d(y,z) for all points x, y, and z.
Explain why it is or is not possible to:
    rearrange data so that Euclidean distance carries the same meaning as Hamming distance
    show that the measure d = 1 - cos(x,y) satisfies positivity, symmetry, and the triangle inequality
Draw conclusions about what matters when choosing a distance measure for evaluating dissimilarities between data objects.
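As a starting point for the items above, the following R sketch computes each measure on small made-up vectors. The helper names (euclidean, manhattan, hamming, cosine_dissim) and all data values are hypothetical and chosen only for illustration. Numeric checks like these can expose a counterexample, but they do not replace a proof.

```r
# Hypothetical helper functions; the names are illustrative only
euclidean <- function(x, y) sqrt(sum((x - y)^2))
manhattan <- function(x, y) sum(abs(x - y))   # L1 distance
hamming <- function(x, y) sum(x != y)         # number of differing positions
cosine_dissim <- function(x, y) {
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Made-up numeric example
x <- c(1, 2, 3)
y <- c(4, 6, 3)
d_euc <- euclidean(x, y)  # sqrt(9 + 16 + 0) = 5
d_l1 <- manhattan(x, y)   # 3 + 4 + 0 = 7

# On 0/1-coded data, squared Euclidean and L1 both count mismatches,
# which is exactly the Hamming distance
a <- c(1, 0, 1, 1, 0)
b <- c(0, 0, 1, 0, 1)
h_ab <- hamming(a, b)        # 3
e2_ab <- euclidean(a, b)^2   # 3, up to floating-point rounding
l1_ab <- manhattan(a, b)     # 3

# Counterexample for 1 - cos(x, y): the triangle inequality fails
p <- c(1, 0)
q <- c(1, 1)
r <- c(0, 1)
d_pr <- cosine_dissim(p, r)  # 1, because p and r are orthogonal
d_pq <- cosine_dissim(p, q)  # 1 - 1/sqrt(2)
d_qr <- cosine_dissim(q, r)  # 1 - 1/sqrt(2)
triangle_violated <- d_pr > d_pq + d_qr  # TRUE

# It is also zero for distinct but parallel vectors,
# so "d == 0 only if x == y" fails as well
d_parallel <- cosine_dissim(c(1, 1), c(2, 2))  # 0, up to rounding
```

The binary example suggests why recoding categorical data as 0/1 indicators can make Euclidean distance order pairs of objects the same way Hamming distance does, which is one direction the paper could develop.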
Assignments 1 and 2 are to be submitted as two separate papers in APA format.
Requirements: 5-7 pages