September 5, 2025

Unit 3 Self-Check Assignment 3 Diabetes Forecasting

Follow the attached instructions to complete this work.

This assignment builds on your previous work and introduces you to predictive analytics through a forecasting method called a binary classifier. You will then work on how to visualize and understand a binary classifier.

· Receive an introduction to binary classifiers, logistic regression, and the results, including true-positive, false-positive, true-negative, and false-negative results
· Run a binary classification algorithm on our diabetes data
· Visualize the results in Tableau

For the following assignment, please review the instructions on how to download a blocked file in Chrome before you start your assignment.

Assignment Instructions: Unit 3 Self-Check Assignment 3 Diabetes Forecasting
Google Colab Notebook: Diabetes_Classifier
Dataset: Diabetes
How to Download a Blocked File in Chrome

Can it be done in 4 hours?

Unit 3: Self-Check Assignment 3: Diabetes Forecasting

This assignment builds on all of our previous work and introduces you to predictive analytics through a forecasting method called a binary classifier. We will then work on how to visualize and understand a binary classifier.

In this assignment, you will:

· Receive an introduction to binary classifiers, logistic regression, and the results, including true-positive, false-positive, true-negative, and false-negative results

· Run a binary classification algorithm on our diabetes data

· Visualize the results in Tableau

For this assignment, follow these steps:

1) Download the diabetes dataset if you need it

2) Learn about binary classifiers

3) Perform binary classification using a logistic regression in Python (this has been written for you; all you need to do is press ‘run’ in Colab)

4) Download the results

5) Visualize the results in Tableau

Attachments:

· Diabetes_Classifier.ipynb

· Diabetes.csv dataset

Download the Diabetes Dataset

If you need to download the dataset again, click on the following link:

Pima Indians Diabetes Database

(We just used this dataset in a previous assignment, so you very well may already have it handy.)

Learn About Binary Classifiers

The word “binary” in this context means “just two options.” Some common binary outcomes could be whether a consumer will respond to direct marketing outreach (binary outcomes: they buy or they don’t buy), whether a streaming subscriber will like a certain movie (binary outcomes: they give it thumbs-up or thumbs-down), or whether an attempted financial transaction is legitimate (binary outcomes: it’s legitimate, or it’s a fraud). The important part of a binary outcome is that there are exactly two options.

A classifier is an algorithm that takes as its input one or more input variables and, as its output, makes a prediction about the value of a different variable. The prediction values are constrained to be on a pre-selected list.

A binary classifier, then, is an algorithm that takes as its input one or more variables and, as its output, classifies the results into one of two mutually exclusive categories:

Problem Domain	Possible Input Variables (can have lots)	Binary Output Variable (2 values only)
Direct marketing	Age, income, gender of the consumer	Consumer buys or does not buy
Streaming subscriptions	Other movies they like, age of streamer, subscription price	Thumbs-up or thumbs-down for this movie
Financial transactions	Dollar amount of transaction, country of origin, frequency of transaction, whether or not the person has bought from this vendor before	Transaction is marked as legitimate, or transaction is flagged as fraudulent

Question 1: Understanding the Problem

In the diabetes dataset, what is/are the possible input variable(s)? (Input variables are the things we will use to make our prediction.) Select all that apply.

A. Glucose

B. Insulin

C. BMI

D. Age

E. Blood pressure

F. Outcome

Question 2: Understanding the Problem

In the diabetes dataset, what is/are the possible output variable(s)? (An output variable is the thing we want to predict.) Select all that apply.

A. Glucose

B. Insulin

C. BMI

D. Age

E. Blood Pressure

F. Outcome

There are many algorithms that can be used in data science for classification. Exactly how to determine which algorithm should be used, and how to evaluate its results, is beyond the scope of this course. But we will give you a very basic overview of how predictive analytics models work here. In the learning resources for this unit, we have provided a video from StatQuest about logistic regression. The example in that video, predicting obesity in mice, is very close to what we are doing here.
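To make the idea of logistic regression a little more concrete, here is a minimal sketch of the S-shaped curve it relies on. The one-variable model, its intercept, and its slope are made-up numbers for illustration only; they are not taken from the diabetes data or from the course notebook.

```python
import math

def sigmoid(log_odds: float) -> float:
    """Convert log odds into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical one-variable model: log_odds = INTERCEPT + SLOPE * bmi.
# Both constants are invented for this illustration.
INTERCEPT = -6.0
SLOPE = 0.15

def predicted_probability(bmi: float) -> float:
    """Probability of the positive outcome under the made-up model."""
    return sigmoid(INTERCEPT + SLOPE * bmi)

for bmi in (20, 40, 80):
    print(bmi, round(predicted_probability(bmi), 3))
```

The straight line lives in the log odds, and the sigmoid squeezes it into the range 0 to 1, so the predicted probability never goes above 100% no matter how large the input gets.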

Question 3: What We Are Trying to Do Here with Logistic Regression

Which statement most closely resembles what we are trying to do here with our logistic regression binary classifier?

A. We want to predict whether or not a person will have diabetes (our binary outcome). We want to use some combination of glucose, insulin, BMI, and other data, and we realize that the relationship might not be linear. If you double the BMI, you might not double the chances of having diabetes.

B. We want to predict whether or not a person will have diabetes (our binary outcome). We want to use some combination of glucose, insulin, BMI, and other data, and we expect that the relationship will be linear for all variables. In other words, if you double glucose, you will double the diabetes. If you double insulin, you will double the diabetes. And if you double glucose and insulin, you will have four times the diabetes.

C. We want to predict the BMI of a person based on their diabetes status. We want to use the logistic regression S-curve to determine what the 25th, 50th, 75th, and 99th percentiles of BMI for diabetic and non-diabetic people in this sample are.

D. We want to predict the S-curve-shaped interrelationships between BMI, age, glucose, pregnancies, and other data. We want to be able to see, as age goes up, what happens to BMI, glucose, and pregnancies with a valid regression with a solid P-value.

E. We want to predict the log odds of having diabetes because mathematically, this will solve the problem that a straight-line linear relationship will often exceed 100%, especially when some numbers are outliers (like age of 80+ years or BMI at age 50+).

With binary classifiers, we typically build the model on our training data and then test the model (to see how good the predictions actually were) on the testing data. We then collect the results of our testing in a confusion matrix. You will find a learning resource about confusion matrices from StatQuest.
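As a rough sketch of what "collecting the results" means, the snippet below tallies the four kinds of results from paired actual and predicted labels. The five patients are made up, and the function name `confusion_counts` is our own, not something from the course notebook.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="Diabetes"):
    """Count TP, FP, FN, and TN for paired actual/predicted labels."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # The model said "positive": right (TP) or wrong (FP)?
            counts["TP" if a == positive else "FP"] += 1
        else:
            # The model said "negative": wrong (FN) or right (TN)?
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Made-up labels for five test patients (not taken from the real dataset).
actual = ["Diabetes", "Not Diabetes", "Diabetes", "Not Diabetes", "Not Diabetes"]
predicted = ["Diabetes", "Not Diabetes", "Not Diabetes", "Diabetes", "Not Diabetes"]

counts = confusion_counts(actual, predicted)
print(dict(counts))
```

The four counts always add up to the number of people in the testing data, which is a handy sanity check.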

Question 4: Our Diabetes Model Confusion Matrix

Let’s say we want to predict whether a person has diabetes, and we are using the following confusion matrix:

	Person actually has diabetes	Person actually does not have diabetes
Person is predicted to have diabetes	A	B
Person is predicted to not have diabetes	C	D

Match the cell with its label

(True positive, or TP)

(False positive, or FP)

(False negative, or FN)

(True negative, or TN)

Question 5: Practicing Our TP/TN/FP/FN Terminology

Let’s say we have a person with a glucose of 136, insulin of 130, and BMI of 28.3, and they are 42 years old. Our logistic regression model predicts that this person will not have diabetes. However, their medical records indicate that they do indeed have diabetes. Which phrase should be used to describe this situation?

A True positive

B False positive

C False negative

D True negative

Perform Binary Classification Using Logistic Regression in Python

Now we are going to run a binary classification predictive analytics algorithm in Python and review the results. You won’t have to write any code, but you will be running code which has been written for you.

1. Go to your browser and set up a new instance of Google Colab at Welcome to Colaboratory.

2. Upload two files:

a. Upload the “Diabetes_Classifier.ipynb” as a notebook:

[Image: a screenshot of a computer]

b. Upload the “diabetes.csv” as a file uploaded to session storage:

[Image: Google Colab]

3. Run the first cell, the classifier model. You can ask ChatGPT to explain this to you more fully, but basically what we are doing here with this code is:

a. Importing a bunch of other code written by other people to help us build the model

b. Reading in the diabetes.csv dataset

c. Splitting the data into a training dataset (which we will use to build our logistic regression prediction model) and a testing dataset (which we will use to tell how good our model really was)

d. Running the model on our training data

e. Evaluating the model on our testing data
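If you are curious what steps c through e look like in code, here is a minimal sketch. It is not the code in Diabetes_Classifier.ipynb, which presumably relies on libraries written by other people. This version uses only the Python standard library, swaps in synthetic data for diabetes.csv so it can run anywhere, and fits the model with plain gradient descent. All function names, the 0.3 test fraction, and the training settings are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def split_train_test(rows, test_fraction, rng):
    """Step c: shuffle, then hold out a fraction of the rows for testing."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_logistic(rows, epochs=500, learning_rate=0.1):
    """Step d: fit weights and a bias by plain gradient descent."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in rows:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

def predict(model, features):
    """Return 1 if the predicted probability is at least 50%, else 0."""
    weights, bias = model
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Stand-in for step b: synthetic rows instead of reading diabetes.csv.
rng = random.Random(0)
rows = []
for _ in range(200):
    signal = rng.uniform(-2.0, 2.0)          # hypothetical input related to the outcome
    unrelated = rng.uniform(-2.0, 2.0)       # hypothetical input unrelated to the outcome
    label = 1 if signal + rng.gauss(0.0, 0.5) > 0 else 0
    rows.append(([signal, unrelated], label))

train_rows, test_rows = split_train_test(rows, test_fraction=0.3, rng=rng)
model = fit_logistic(train_rows)

# Step e: check the predictions against rows the model never saw.
correct = sum(predict(model, x) == y for x, y in test_rows)
accuracy = correct / len(test_rows)
print(f"{len(train_rows)} training rows, {len(test_rows)} testing rows")
print(f"accuracy on testing data: {accuracy:.2f}")
```

The key design choice is that the testing rows are set aside before the model is fit, so the accuracy reflects how the model does on data it was not built on.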

4. When the code in this cell has finished running, it gives a little confusion matrix. (Note this confusion matrix has its labels switched from the way StatQuest did them. If you are keeping close track of these things, you will notice that the matrix printed from this code has the actual values on the left and the predicted values on the top. If you are not keeping close track of these things, you don’t need to keep close track of this switch either.)

[Image: StatQuest]

5. Run the next cell to generate the output file we will use to visualize the results in Tableau. Your output should look something like this, and you should have a "diabetes_predicted.csv" file available for download. It may take a minute or two to run and another minute or two to refresh, and you can click the "refresh" icon if you want to see the output file the very minute it is available:

[Image: Classifier]

6. Let’s just look at the "diabetes_predicted.csv" file before we download it:

[Image: csv file]

a. Here, let’s look at the first row, Patient_ID 767. This person has a glucose of 126, BMI of 30.1, and an age of 47. This person also had an actual outcome of Diabetes (fourth column) but was predicted to have Not Diabetes (fifth column). The Model Results column classified this as a False Negative for this person (sixth column).
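One way the Model Results text could be derived from the actual and predicted columns is sketched below. The function name `model_result` is hypothetical, and we are assuming the label text follows the "False Negative" pattern described above; the notebook may build this column differently.

```python
def model_result(actual: str, predicted: str, positive: str = "Diabetes") -> str:
    """Combine correctness (True/False) with the prediction (Positive/Negative)."""
    correctness = "True" if actual == predicted else "False"
    polarity = "Positive" if predicted == positive else "Negative"
    return f"{correctness} {polarity}"

# The Patient_ID 767 row described above.
row = {
    "Patient_ID": 767,
    "Actual Outcome Text": "Diabetes",
    "Predicted Outcome Text": "Not Diabetes",
}
print(model_result(row["Actual Outcome Text"], row["Predicted Outcome Text"]))
# prints: False Negative
```

Notice that the second word always comes from the prediction, and the first word says whether that prediction matched reality.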

Question 6: Interpreting the Output File

Look further through the diabetes_predicted.csv file. For Patient_ID 526, what was their outcome?

A True positive

B False positive

C False negative

D True negative

7. Download the diabetes_predicted.csv file to your computer. We are now ready to visualize it using Tableau.

Visualize the Results in Tableau

We can see that these sorts of output files can be difficult to interpret. Let’s use Tableau to help visualize them.

1. Fire up Tableau and import your diabetes_predicted.csv data file. Be sure the file you import has both Actual Outcome Text and Predicted Outcome Text fields in it.

2. Check: You should have 231 total rows in this data source.

3. First, let’s make a basic bar graph: How many model results were true positives? False positives? Other values?

a. Drag the Model Results to the Columns bar and the diabetes_predicted.csv (Count) to the Rows. It should look a little bit like the skeleton below—but you should have bar charts here.

[Image: csv file]

Question 7: Interpreting the Output File

How did the model do? Of the 231 people in this dataset, what was the most frequent model result?

A True positive: 49% of the results were true positive

B False positive: 18 people had a false-positive result

C False negative: 32% of the results were a false negative

D True negative: 132 people had a true-negative result

4. Let’s take another look at these results, this time in a layout more akin to the confusion matrix we saw earlier.

a. Go to another worksheet

b. Put the Actual Outcome Text in the Rows area, and the Predicted Outcome Text in the Columns area:

[Image: outcome]

c. Then drag the diabetes_predicted.csv (Count) to the area with the “Abc” in it:

[Image: csv file]

d. You will now have the numbers of the actual and predicted outcomes summed up for you:

[Image: predicted outcomes]

e. Let’s get the Marks a bit fancier: Drag the diabetes_predicted.csv (Count) to the Size, and drag it once more to the Label. Then drag the Model Results to the Label as well, and expand your graphic so you can see the whole thing. You will get something that should look like this:

[Image: predicted csv]

Question 8: Interpreting the Visual Confusion Matrix

Look at your visual matrix. Which statements would you agree with? Select all that apply.

A If a person actually has diabetes, their results would be found on the top row.

B If a person actually does not have diabetes, their results would be found on the bottom row.

C If the model predicts diabetes, the majority of the people in this category will turn out to have diabetes

D If the model predicts not diabetes, the majority of the people in this category will not turn out to have diabetes

E If a person has diabetes, the model is not great at predicting this; there will be a lot of incorrect predictions given

F If a person does not have diabetes, the model is not great at predicting this; there will be a lot of incorrect predictions given

5. Sometimes we want to see how a model’s predictions vary as certain variables change. Does this model predict differently for people of different ages?

a. Go to a new worksheet and make a histogram of the age. Set the bin size to 10. It should look like this:

[Image: bar graph]

b. Add the Predicted Outcome Text in front of the Age (bin). You will now see histograms, but they are split by predictions:

[Image: bar graph]
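If it helps to see what Tableau is doing behind the scenes, the split histogram can be sketched in a few lines of Python. The six (age, prediction) pairs are made up, and we are assuming each bin is labeled by its lower edge, so an age of 47 lands in the 40–49 bin.

```python
from collections import Counter

def age_bin(age: int, size: int = 10) -> int:
    """Return the lower edge of the age bin, e.g. 47 -> 40."""
    return (age // size) * size

# Made-up (age, predicted outcome) pairs, not taken from the real output file.
rows = [
    (21, "Not Diabetes"),
    (29, "Not Diabetes"),
    (33, "Diabetes"),
    (47, "Not Diabetes"),
    (45, "Diabetes"),
    (52, "Diabetes"),
]

# Count people per (prediction, age bin): one bar per key.
histogram = Counter((predicted, age_bin(age)) for age, predicted in rows)

for (predicted, start), count in sorted(histogram.items()):
    print(f"{predicted:12s} {start}-{start + 9}: {count}")
```

Each key in `histogram` corresponds to one bar: the prediction picks the panel, and the bin picks the position along the age axis.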

Question 9: Interpreting the Split Histograms

Look at these two histograms. Which statements would you agree with? Select all that apply.

A Among those who are predicted not to have diabetes, the age distribution has a lot of younger people in it.

B In the age group 40–49, the model is predicting approximately the same number of people with and without diabetes.

C In the age group 40–49, the model is predicting approximately the same percentage of people with and without diabetes.

D In the group which is predicted to have diabetes, the ages are relatively evenly distributed between people in their 20s, 30s, 40s, and 50s, with a sharp drop-off at age 60 and older.

6. Sometimes the total head count does not give the whole picture, and a percentage is a better way to go. Let’s try to get our histograms to show us percentages of total.

a. Duplicate your paired Age histograms to a new sheet.

b. Under the Rows, CNT(Age), pull down the right arrow and Add Table Calculation.

[Image: histogram]

c. For your Table Calculation, choose Percent of Total, and have it compute using Table(down):

[Image: table]

d. Then put the Model Results on the Color so you can see what percentage of each age group has what sorts of model results:

[Image: graph]

e. The final touch: Often, culturally, we see green as “good/correct” and red as “bad/error.” Let’s go through and set the colors so the “true” outcomes are in the green family and the “false” outcomes are in the red family.

[Image: graph]

f. Now we can look at – for example – a person in their 20s who is predicted not to have diabetes. Do they need to worry?

i. The prediction is not diabetes, so we want the graph on the right (blue and red).

ii. Find the bar which represents people in their 20s who are not predicted to have diabetes

[Image: graph]

iii. Let’s look at this bar a little more closely. We can drag the diabetes_predicted.csv (Count) onto the labels to have it show us the total number of people here. We can see that it does pretty well (lots of true model outcomes) for people in their 20s who are predicted not to have diabetes.

[Image: graph]
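The percentage view can be sketched the same way. Here we assume each bar is one combination of prediction and age bin, and that the percentages are taken within that bar, so the colored segments of a bar add up to 100%. The rows are made up, and `percent_by_result` is a hypothetical helper, not part of Tableau or the notebook.

```python
from collections import Counter

def percent_by_result(rows):
    """rows: iterable of (predicted, age_bin_start, model_result) tuples.

    Returns {(predicted, age_bin_start, model_result): percent of that bar}.
    """
    rows = list(rows)
    bar_totals = Counter((predicted, start) for predicted, start, _ in rows)
    segments = Counter(rows)
    return {
        key: 100.0 * count / bar_totals[key[:2]]
        for key, count in segments.items()
    }

# Made-up rows for people in their 20s (not taken from the real output file).
rows = [
    ("Not Diabetes", 20, "True Negative"),
    ("Not Diabetes", 20, "True Negative"),
    ("Not Diabetes", 20, "True Negative"),
    ("Not Diabetes", 20, "False Negative"),
    ("Diabetes", 20, "True Positive"),
    ("Diabetes", 20, "False Positive"),
]

percents = percent_by_result(rows)
for key, pct in sorted(percents.items()):
    print(key, f"{pct:.0f}%")
```

Keep the head counts in view alongside the percentages: in this made-up example, the 50% in the predicted-diabetes bar rests on only two people.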

Question 10: Interpreting the Stacked Percentage Bar Charts

Look at these charts. Which statements are accurate? Select all that apply.

A For people in their 40s (age 40–49), a model prediction of “no diabetes” is very good news because the model is nearly always correct, and they probably don’t have diabetes.

B For very elderly people (age 80–89), there is only one person in the dataset of this age. Because the model predicts “diabetes” for this person, it will always predict “diabetes” for all people in this age group, regardless of their BMI, glucose, or other variables.

C Say you have 10 people in their 20s who receive a model prediction of “diabetes.” Approximately 7 of those people will actually have diabetes, but 3 will be incorrectly predicted to have diabetes.

D Say you have 10 people in their 20s who receive a model prediction of “diabetes.” Approximately 4 of those people will actually have diabetes, and these are the false positives.

E There are relatively few people in either category (predicted diabetes, predicted no diabetes) who are age 60–69, so we should be cautious about interpreting these percentages for a broader population.


Pregnancies	Glucose	BloodPressure	SkinThickness	Insulin	BMI	DiabetesPedigreeFunction	Age	Outcome
6	148	72	35	0	33.6	0.627	50	1
1	85	66	29	0	26.6	0.351	31	0
8	183	64	0	0	23.3	0.672	32	1
1	89	66	23	94	28.1	0.167	21	0
0	137	40	35	168	43.1	2.288	33	1
5	116	74	0	0	25.6	0.201	30	0
3	78	50	32	88	31	0.248	26	1
10	115	0	0	0	35.3	0.134	29	0
2	197	70	45	543	30.5	0.158	53	1
8	125	96	0	0	0	0.232	54	1
4	110	92	0	0	37.6	0.191	30	0
10	168	74	0
