Tidy Data and More on Data Transformation Worksheet
Instructions.
•Use R Markdown to create an HTML document with the homework tasks.
•As always, any plots should have appropriate axis and overall labels.
1 Data
Data for this homework assignment come from a randomized experiment to study the efficacy of acupuncture for treating headaches. Results of the trial were published in the British Medical Journal in 2004. You may view the paper at the following link: http://www.bmj.com/content/328/7442/744.full. The data set includes 301 cases: 140 control (no acupuncture) and 161 treated (acupuncture). Participants were randomly assigned to groups.
Variable names and descriptions are as follows:
•age; age in years
•sex; male = 0, female = 1
•migraine; diagnosis of migraines = 1, diagnosis of tension-type headaches = 0
•chronicity; number of years of headache disorder at baseline
•acupuncturist; ID for acupuncture provider
•group; acupuncture treatment group = 1, control group = 0
•pk1; headache severity rating at baseline
•pk5; headache severity rating 1 year later
Import the data using read_csv() and call it acu. Note that the data have a header row.
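A minimal sketch of the import step, assuming the readr package is installed. The file name in the comment is hypothetical, and the inline rows are made up for illustration; they are not values from the trial.

```r
library(readr)

# With the real file, the call would look something like this
# (the file name is an assumption; use the name of the file you were given):
# acu <- read_csv("acupuncture.csv")

# Self-contained illustration: read_csv() treats text wrapped in I() as
# literal data. These rows are invented and only mimic the column layout
# described above. The first row is used as the header (col_names = TRUE
# is the default).
toy_csv <- I("age,sex,migraine,chronicity,acupuncturist,group,pk1,pk5
45,1,1,20,3,1,25.5,12.0
32,0,0,5,7,0,18.0,17.5
")

acu_toy <- read_csv(toy_csv, show_col_types = FALSE)
```

Printing the result or calling str() on it is a quick way to confirm that each column was read with the type you expect.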
Homework problems:
1. Create a new version of the data called acu2 that are sorted by treatment group, age, and baseline headache severity (pk1), in that order.
2. Create a subset of the data called acu3 that only includes participants who were in the acupuncture group and were over 30 years of age.
3. Plot baseline vs. one-year headache severity in a scatterplot using ggplot2, with a different color and a separate regression line for each treatment group. What do the regression lines suggest about the efficacy of the acupuncture treatment?
4. Note that pk1 and pk5 are both measures of the same outcome variable taken at two different times (baseline and one year). Pivot the data from wide to long format so that pk1 and pk5 appear in a single column called severity. When pivoting, you should create a new variable called time with values 0 or 1 depending on whether the observation was taken at baseline (= 0) or at 1 year (= 1). Note that it's OK to do this in multiple steps or with piped mutate() calls; both will work. For example, if you use names_prefix = "pk" when you pivot, you will get a character column with the values "1" and "5". You would then need to convert these to the numeric values 0 and 1.
5. We only covered pivot_longer(). Figure out how to use pivot_wider() to get your data back from long format into wide format (i.e., restore them to their original form).
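The wide-to-long hint in problem 4 and the round trip in problem 5 can be sketched on a small made-up tibble. This is one possible approach, assuming the dplyr and tidyr packages; the id column and all values are hypothetical and are not part of the acupuncture data. The id is included so that each participant's two long-format rows can be matched up again when pivoting back.

```r
library(dplyr)
library(tidyr)

# Made-up data; `id` is a hypothetical row identifier, not a worksheet column.
toy <- tibble(
  id    = 1:3,
  group = c(0, 1, 1),
  pk1   = c(20.0, 31.5, 12.0),
  pk5   = c(18.5, 14.0, 9.5)
)

# Wide to long: names_prefix strips "pk" from the column names, leaving the
# character values "1" and "5", which are then recoded to numeric 0 and 1.
toy_long <- toy %>%
  pivot_longer(
    cols         = c(pk1, pk5),
    names_to     = "time",
    names_prefix = "pk",
    values_to    = "severity"
  ) %>%
  mutate(time = if_else(time == "1", 0, 1))  # baseline = 0, one year = 1

# Long to wide: rebuild the original column names from the 0/1 codes,
# then spread severity back into one column per time point.
toy_wide <- toy_long %>%
  mutate(time = if_else(time == 0, "pk1", "pk5")) %>%
  pivot_wider(names_from = time, values_from = severity)
```

Without a column that uniquely identifies each participant, two participants who happen to share every other value cannot be told apart in long format, so adding a row identifier before pivoting is a reasonable precaution.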