Boston University Answer the code in Jupyter
MET AD599 Introduction to Python and SQL for Business Analytics
Assignment #3

For this assignment, use a Python Jupyter Notebook to answer the following questions.

Part 1:
1. Download the Titanic dataset from here: https://www.kaggle.com/brendan45774/test-file
2. Import the dataset into Jupyter Notebook. If there are NaNs in the “Age” column, replace them with the median of Age. (0.35 pts)
3. Use a Markdown cell to answer this question: Do you think Gender/Sex matters for whether a passenger survived the Titanic incident? Why? Do you have evidence, or have you heard some stories? (0.35 pts)
4. Use a Markdown cell to answer this question: Which model will you use on this dataset, and why did you choose it? Please note that the output should be the “Survive” column. (0.35 pts)
5. Convert the “Sex” column from “object” to “numeric”. (0.35 pts)
6. Use a Markdown cell to answer this question: Which variables will you choose as inputs to build the model? Why? (0.35 pts)
7. Check all the variables you chose in step 6. If there are any NaNs, replace them with the average of the column. (0.35 pts)
8. Now build the model. Remember to split the data into a train set and a valid set first, then use the train set to build the model. (0.35 pts)
9. Use a Markdown cell to answer this question: Does this model have an “accuracy rate”? Explain why. (0.35 pts)
10. Use the .predict function to predict the results for the valid set. What is the accuracy rate of the model on the valid set? Is it good? (0.35 pts)
11. Make up an imaginary individual, and use Markdown cells to give a brief introduction of this individual (such as Sex, Age, Fare, etc.). Will this individual survive? (0.35 pts)

Part 2:
1. Download the house dataset from here: https://www.kaggle.com/thomasnibb/amsterdamhouse-price-prediction
2. This dataset is plain and simple: the output should be “Price”, and the inputs should be “Area”, “Room”, “Lon”, and “Lat”. Check whether there are any NaNs in these variables; if there are, replace them with the mean. (0.35 pts)
3. Split the dataset into train and valid sets. Calculate the CV score on the train set for Linear Regression, Polynomial Regression (degree 1 to 4), Lasso Regression, Ridge Regression, KNN (k-nearest neighbors) Regression, Decision Tree Regression, and Random Forest Regression. (2.1 pts)
4. Use a Markdown cell to answer this question: Which model will you choose and why? (0.35 pts)
5. Make up an imaginary house, and use the .predict function to predict its price with the model you chose. (0.7 pts)
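
The workflow in Part 2 (fill NaNs with the mean, split, compare CV scores, predict) could be sketched roughly as follows, assuming pandas and scikit-learn are installed. Because the Kaggle file is not available here, the sketch builds a synthetic DataFrame that only borrows the column names “Area”, “Room”, “Lon”, “Lat”, and “Price”. The synthetic values, the 80/20 split, cv=5, the default hyperparameters, and the imaginary house are all illustrative assumptions, not requirements of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the downloaded CSV (assumption: made-up values,
# only the column names follow the assignment text).
rng = np.random.default_rng(0)
n = 300
houses = pd.DataFrame({
    "Area": rng.uniform(30, 250, n),
    "Room": rng.integers(1, 8, n).astype(float),
    "Lon": rng.uniform(4.75, 5.05, n),
    "Lat": rng.uniform(52.29, 52.43, n),
})
houses["Price"] = (6000 * houses["Area"] + 20000 * houses["Room"]
                   + rng.normal(0, 50000, n))
houses.loc[::25, "Area"] = np.nan  # inject a few NaNs to exercise step 2

features = ["Area", "Room", "Lon", "Lat"]

# Step 2: replace NaNs in the inputs with each column's mean.
houses[features] = houses[features].fillna(houses[features].mean())

# Step 3: split first, then cross-validate on the train set only.
X_train, X_valid, y_train, y_valid = train_test_split(
    houses[features], houses["Price"], test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(max_iter=10_000),
    "Ridge": Ridge(),
    "KNN": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
}
for degree in range(1, 5):
    # Polynomial regression = polynomial feature expansion + linear model.
    models[f"Polynomial (degree {degree})"] = make_pipeline(
        PolynomialFeatures(degree), LinearRegression()
    )

# Mean 5-fold CV score per model; for regressors the default score is R^2.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in models.items()
}
for name, score in sorted(cv_scores.items(), key=lambda item: -item[1]):
    print(f"{name:25s} mean CV R^2 = {score:.3f}")

# Steps 4-5: refit the best-scoring model and predict a hypothetical house.
best_name = max(cv_scores, key=cv_scores.get)
best_model = models[best_name].fit(X_train, y_train)
print("Chosen model:", best_name)
print("Valid-set R^2:", round(best_model.score(X_valid, y_valid), 3))

imaginary_house = pd.DataFrame(
    [{"Area": 85.0, "Room": 3.0, "Lon": 4.90, "Lat": 52.37}]
)
predicted_price = best_model.predict(imaginary_house)
print("Predicted price:", round(float(predicted_price[0])))
```

With the real data, the synthetic frame would be replaced by a pd.read_csv(...) call on the downloaded file. Picking the model with the highest mean CV score is only one reasonable criterion; the spread across folds and the model's simplicity may also be worth discussing in the Markdown answer.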