Please make sure that the report is your own work. Please watch out for spelling and grammar errors. Please read the study guide. Please use APA 7th edition format; the report should be 4 pages long, including tables and figures.
Book reference: Fox, J. (2017). Using the R Commander: A point-and-click interface for R. CRC Press. https://online.vitalsource.com/#/books/9781498741934
Report Descriptive Statistics and Normality
A unique data set for this course has been provided to each student by the instructor. Refer to the data set for these tests:
- Report descriptive statistics for the data set.
- Test the distribution of the leadership variable (ldrship) using the Shapiro-Wilk test.
- Test the distribution of the aptitude variable using the Anderson-Darling test.
Results to be included are the descriptive statistics for the data set, the results of the Shapiro-Wilk test of the distribution of the leadership variable (ldrship), the results of the Anderson-Darling test of the distribution of the aptitude variable, and a bar chart. A brief narrative explaining the results should be included.
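The three assigned tasks can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the course data set is not reproduced here, so the data frame `dat` and its simulated values are hypothetical stand-ins, and only the column name `ldrship` is taken from the assignment text (`aptitude` is an assumed column name). `shapiro.test()` is in base R's stats package; `ad.test()` comes from the add-on nortest package, which has to be installed separately.

```r
# Hypothetical stand-in for the instructor-provided data set (simulated values).
set.seed(123)
dat <- data.frame(
  ldrship  = rnorm(50, mean = 50, sd = 10),
  aptitude = rnorm(50, mean = 100, sd = 15)   # assumed column name
)

# Descriptive statistics: minimum, quartiles, median, mean, maximum.
print(summary(dat))

# Sample size, mean, and standard deviation for each variable.
desc <- data.frame(
  n    = sapply(dat, function(x) sum(!is.na(x))),
  mean = sapply(dat, mean, na.rm = TRUE),
  sd   = sapply(dat, sd, na.rm = TRUE)
)
print(desc)

# Shapiro-Wilk test of normality for the leadership variable.
sw <- shapiro.test(dat$ldrship)
print(sw)

# Anderson-Darling test of normality for the aptitude variable
# (only run if the nortest package is available).
if (requireNamespace("nortest", quietly = TRUE)) {
  ad <- nortest::ad.test(dat$aptitude)
  print(ad)
} else {
  message("Install the nortest package to run the Anderson-Darling test.")
}
```

In both tests the null hypothesis is that the variable is normally distributed, so a small p-value (conventionally below .05) is evidence against normality.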
A Quick Tour of the R Commander
This chapter introduces the R Commander graphical user interface (GUI) by demonstrating its use for a simple problem: constructing a contingency table to examine the relationship between two categorical variables. In developing the example, I explain how to start the R Commander, describe the structure of the R Commander interface, show how to read data into the R Commander, how to modify data to prepare them for analysis, how to draw a graph, how to compute numerical summaries of data, how to create a printed report of your work, how to edit and re-execute commands generated by the R Commander, and how to terminate your R and R Commander session: in short, the typical workflow of data analysis using the R Commander. I also explain how to customize the R Commander interface.
In the course of this chapter, you’ll get an overview of the operation of the R Commander. Later in the book, I’ll return in more detail to many of the topics addressed in the chapter.
I assume that you have installed R and the Rcmdr package, as described in the preceding chapter. As well, if you haven’t read Chapter 1, now is a good time to do so: Chapter 1 explains some typographical conventions used in this book, discusses the general characteristics and origin of R and the R Commander, and introduces the web site for the book.
Start R in the normal manner for your computer, for example, by double-clicking on the R desktop icon in Windows, by double-clicking on R.app in the Mac OS X Applications folder, or by clicking on the R icon in the Mac OS X Launchpad. On a Linux or Unix machine, you’d normally start R by typing R at the command prompt in a terminal window.
Once R starts up, type the command library(Rcmdr) at the R > command prompt, and then press the Enter or Return key. This command should load the Rcmdr package and, after a brief delay, start the R Commander GUI, as shown in Figure 3.1 for Windows or Figure 3.2 for Mac OS X. If you encounter a problem in starting R or the R Commander, see the sections on troubleshooting in Chapter 2 (Section 2.2.1 for Windows, 2.3.4 for Mac OS X, or 2.4.1 for Linux/Unix).
Under Windows, the R Commander (Figure 3.1) looks like a standard program. In contrast, under Mac OS X (Figure 3.2), the R Commander has its own main menu bar, unlike a standard application, which would use the menu bar at the top of the Mac OS X desktop.
As you can see, the main R Commander window looks very similar under Windows and Mac OS X. After this introductory chapter, I will show R Commander dialog boxes as they appear under Windows 10. As well, all dialogs and graphs in the text are rendered in monochrome (gray-scale) rather than in color.
At the top of the R Commander window there is a menu bar with the following top-level menus:
• File contains menu items for opening and saving various kinds of files, and for changing the R working directory, that is, the folder or directory in your file system where R will look for and write files by default.
• Edit contains common menu items for editing text, such as Copy and Paste, along with specialized items for R Markdown documents (discussed in Section 3.6.2).
• Data contains menu items and submenus for importing, exporting, and manipulating data (see in particular Sections 3.3 and 3.4, and Chapter 4).
• Statistics contains submenus for various kinds of statistical data analysis (discussed in several subsequent chapters), including fitting statistical models to data (Chapter 7).
• Graphs contains menu items and submenus for creating common statistical graphs (see in particular Chapter 5).
• Models contains menu items and submenus for performing various operations on statistical models that have been fit to data (see Chapter 7).
• Distributions contains a menu item for setting the R random-number-generator seed for simulations, and submenus for computing, graphing, and sampling from a variety of common (and not so common) statistical distributions (see Chapter 8).
• Tools contains menu items for loading R packages and R Commander plug-in packages (see Chapter 9), for setting and saving R Commander options (see Section 3.9), for installing optional auxiliary software (see Section 2.5), and, under Mac OS X, for managing app nap for R.app (see Section 2.3.3).
• Help contains menu items for obtaining information about the R Commander and R, including links to a brief introductory manual and to the R Commander and R web sites; information about the active data set; and a link to a web site with detailed instructions for using R Markdown to create reports (see Section 3.6).
The complete R Commander menu tree is shown in the appendix to this book (starting on page 199).
FIGURE 3.1: The R Console and R Commander windows at startup under Windows 10.
FIGURE 3.2: The R.app and R Commander windows at startup under Mac OS X.
Below the menus is a toolbar, with a button showing the name of the active data set (displaying <No active dataset> at startup), buttons to edit and view the active data set, and a button showing the active statistical model (displaying <No active model> before a statistical model has been fit to data in the active data set). The Data set and Model buttons may also be used to choose from among multiple data sets and associated statistical models if more than one data set or model resides in the R workspace, the region of your computer’s main memory where R stores data sets, statistical models, and other objects.
Below the toolbar there is a window pane with two tabs, labelled respectively R Script and R Markdown, that collect the R commands generated during your R Commander session. The contents of the R Script and R Markdown tabs can be edited, saved, and reused (as described in Section 3.6), and commands in the R Script tab can be modified and re-executed by selecting a command or commands with the mouse (left-click and drag the mouse cursor over the command or commands) and pressing the Submit button below the R Script tab. If you know how, you can also type your own commands into the R Script tab and execute them with the Submit button (see Section 3.7). The R Markdown tab, initially behind the R Script tab, also accumulates the R commands that are generated during a session, but in a dynamic document that you can edit and elaborate to create a printed report of your work (as described in Section 3.6.2).
The R Commander Output pane appears next: The Output pane collects R commands generated by the R Commander along with associated printed output. The text in the Output pane is also editable, and it can be copied and pasted into other programs (as described in Section 3.6.1).
Finally, at the bottom of the R Commander window, the Messages pane records messages generated by R and the R Commander—numbered and color-coded notes (dark blue), warnings (green), and error messages (red). For example, the startup note indicates the R Commander version, along with the date and time at the start of the session.
Once you have started the R Commander GUI, you can safely minimize the R Console window. This window occasionally reports messages, such as when the R Commander causes other R packages to be loaded, but these messages are incidental to the use of the R Commander and can almost always be safely ignored.
3.3 Reading Data into the R Commander
Statistical data analysis in the R Commander is based on an active data set in the form of an R data frame. A data frame is a rectangular data set in which the rows (running horizontally) represent cases (often individuals) and the columns (running vertically) represent variables descriptive of those cases. Columns in data frames can contain various forms of data: numeric variables, character-string variables (with values such as “Yes”, “No”, or “Maybe”), logical variables (with values TRUE or FALSE), and factors, which are the standard representation of categorical data in R. Typically, data frames used in the R Commander consist of numeric variables and factors, and character and logical variables, if present, are treated as factors.
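A small example may make the column types concrete. The data frame `survey` below and all of its values are made up for illustration; the explicit conversion with `factor()` only mimics by hand what it means for a character variable to be treated as a factor.

```r
# A made-up data frame: rows are cases, columns are variables.
survey <- data.frame(
  age      = c(34, 51, 27),              # numeric variable
  answer   = c("Yes", "No", "Maybe"),    # character-string variable
  employed = c(TRUE, FALSE, TRUE),       # logical variable
  stringsAsFactors = FALSE               # keep character data as character
)
str(survey)

# Convert the character variable to a factor (categorical variable).
survey$answer <- factor(survey$answer)
print(class(survey$answer))
print(levels(survey$answer))   # levels are sorted alphabetically by default
```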
R and the R Commander permit you to have as many data frames in your workspace as will fit, but only one is active at any given time. You can read data into data frames from several sources using the R Commander menus: See the Data > Import data submenu, and the Data > Data in packages > Read data set from an attached package menu item and associated dialog. If more than one data frame resides in your workspace, you can choose among them by pressing the Data set button in the toolbar or via the menus: Data > Active data set > Select active data set.
One convenient source of data is a plain-text (“ASCII”) file with one line per case, variable names in the first line, and values in each line separated by a simple delimiter such as spaces or a comma. An example of a plain-text data file with comma-separated values, GSS.csv, is shown in Figure 3.3.
The data in the file GSS.csv are drawn from the U.S. General Social Survey (GSS), and were collected between 1972 and 2012. The GSS is a periodic cross-sectional sample survey of the U.S. population conducted by the National Opinion Research Center at the University of Chicago. Many of the questions in the GSS are repeated in each survey, while other questions are repeated at intervals. To compile the GSS data set, I selected instances of the GSS that asked the question, “There’s been a lot of discussion about the way morals and attitudes about sex are changing in this country. If a man and a woman have sex relations before marriage, do you think it is always wrong, almost always wrong, wrong only sometimes, or not wrong at all?” I also included information about the year of the survey, and the respondents’ gender, education, and religion. Table 3.1 shows the definition of the variables in the GSS data set.
FIGURE 3.3: The GSS.csv file, with comma-delimited data from the U.S. General Social Survey from 1972 to 2012. Only a few of the 33,355 lines in the file are shown; the widely spaced ellipses (…) represent elided lines. The first line in the file contains variable names.
TABLE 3.1: Variables in the GSS data set.
Variable | Values
year | numeric, year of survey, between 1972 and 2012
gender | character, female or male
premarital.sex | character, always wrong, almost always wrong, sometimes wrong, or not wrong at all
education | character, less than high school, high school, or post-secondary
religion | character, Protestant, Catholic, Jewish, other, or none
This is a natural point at which to explain how objects, including data sets and variables, are named in R: Standard R names are composed of lower- and upper-case letters (a–z, A–Z), numerals (0–9), periods (.), and underscores (_), and must begin with a letter or a period. As well, R is case sensitive; so, for example, the names education, Education, and EDUCATION are all distinct.
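The naming rules can be checked from the R command line. The sketch below relies on the base R function `make.names()`, which rewrites a string into a syntactically valid name; a string that comes back unchanged was already valid. The helper `is_valid_name()` is a hypothetical convenience defined here, not part of R or the R Commander, and it also rejects R's reserved words (such as `if`), which goes slightly beyond the rules stated above.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("education", "GSS", "GSS data", "premarital.sex",
                "2nd_wave", "_x")
valid <- is_valid_name(candidates)
print(data.frame(name = candidates, valid = valid))

# R is case sensitive: these are three distinct objects.
education <- 1
Education <- 2
EDUCATION <- 3
print(c(education, Education, EDUCATION))
```

Here "GSS data" fails because of the embedded blank, and "2nd_wave" and "_x" fail because they do not begin with a letter or a period.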
In order to keep this introductory example as simple as possible, when I compiled the GSS data set from the original source, I eliminated cases with missing values for any of the four substantive variables (of course, there were no missing values for the year of the survey). In R, missing values are represented by NA (“not available”), and in the R Commander, NA is the default missing-data code for text-data input, although another missing-data code (such as ?, ., or 99) can be specified. This and some other complications and variations are discussed in Chapter 4 on reading and manipulating data in the R Commander.
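As a rough illustration of a non-default missing-data code, the sketch below reads a few made-up lines of comma-separated text in which ? marks a missing value, and then drops the incomplete cases. The rows are invented for the example and are not taken from GSS.csv; `na.strings` is the `read.table()` argument that names the missing-data code.

```r
# Made-up comma-separated text in which "?" marks a missing value.
raw <- "year,gender,education
1972,female,high school
1974,?,post-secondary
1975,male,?"

d <- read.table(text = raw, header = TRUE, sep = ",",
                na.strings = "?", stringsAsFactors = TRUE)
print(d)

# Number of missing values (NA) in each variable.
print(colSums(is.na(d)))

# Keep only the cases with no missing values on any variable.
complete <- d[complete.cases(d), ]
print(nrow(complete))
```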
To read simply formatted data in plain-text files into the R Commander, you can use Data > Import data > from text file, clipboard, or URL. As the name of this menu item implies, the data can be copied to the clipboard (e.g., from a suitably formatted spreadsheet) or read from a file on the Internet, but most often the data will reside in a file on your computer.
The resulting dialog box is shown in Figure 3.4. This is a comparatively simple R Commander dialog box (for example, it doesn’t have multiple tabs), but it nevertheless illustrates several common elements of R Commander dialogs:
FIGURE 3.4: The Read Text Data dialog as it appears on a Windows computer (left) and under Mac OS X (right).
• There is a check box to indicate whether variable names are included with the data, as they are in the GSS.csv data file.
• There are radio buttons for selecting one of several choices—here, where the data are located, how data values are separated, and what character is used for decimal points (e.g., commas are used in France and the Canadian province of Québec).
• There are text fields into which the user can type information—here, the name of the data set, the missing-data indicator, and possibly the data-field separator.
I’ve taken all of the defaults in this dialog box, with the following two exceptions: I changed the default data set name, which is Dataset, to the more descriptive GSS. Recall the rules, explained above, for naming R objects. For example, GSS data, with an embedded blank, would not be a legal data set name. I also changed the default field separator from White space (one or more spaces or a tab) to Commas, as is appropriate for the comma-separated-values file GSS.csv.
The Read Text Data dialog also has buttons at the bottom that are standard in R Commander dialogs:
• The Help button opens an R help page in a web browser, documenting either the use of the dialog or the use of an R command that the dialog invokes. In this case, pressing the Help button opens up the help page for the R read.table function, which is used to input simple plain-text data. R help pages are hyper-linked, so clicking on a link will open another, related help page in your browser. (Try it!)
FIGURE 3.5: The Open file dialog with the data file GSS.csv selected.
• Pressing the OK button generates and executes an R command (or, in the case of some dialogs, a sequence of R commands). These commands are usually entered into the R Script and R Markdown tabs, and the commands and associated printed output appear in the Output pane. If graphical output is produced, it appears in a separate R graphics-device window.
Clicking OK in the Read Text Data dialog brings up a standard Open file dialog box, as shown in Figure 3.5. I navigated to the location of the data file on my computer and selected the GSS.csv file. Notice that files of type .csv, .txt, and .dat (and their upper-case analogs) are listed by default; these are common file types associated with plain-text data files.
Clicking OK causes the data to be read from GSS.csv, creating the data frame GSS, and making it the active data set in the R Commander. The read.table command invoked by the dialog converts character data in the input file to R factors (here, the variables gender, premarital.sex, education, and religion).
• Clicking the Cancel button simply dismisses the Read Text Data dialog.
As is apparent, the order of the buttons at the bottom of the dialog box is different in Windows and in Mac OS X, reflecting differing GUI conventions on these two computing platforms.
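The settings chosen in the Read Text Data dialog correspond to arguments of read.table. The sketch below is an approximation, not the exact command the dialog generates (which may differ between R Commander versions): it writes three invented rows to a temporary file so that it is self-contained, whereas in practice the file would be the real GSS.csv on your computer. `stringsAsFactors = TRUE` is given explicitly because recent versions of R no longer convert character data to factors by default.

```r
# Write a few made-up rows to a temporary CSV file (not the real GSS data).
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "year,gender,premarital.sex,education,religion",
  "1972,female,not wrong at all,post-secondary,Jewish",
  "1972,male,always wrong,less than high school,Protestant",
  "1974,female,sometimes wrong,high school,Catholic"
), csv_file)

# Variable names in the first line, commas as the field separator,
# NA as the missing-data code, and character data converted to factors.
GSS <- read.table(csv_file, header = TRUE, sep = ",",
                  na.strings = "NA", dec = ".", strip.white = TRUE,
                  stringsAsFactors = TRUE)

str(GSS)
print(sapply(GSS, class))

unlink(csv_file)   # remove the temporary file
```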
FIGURE 3.6: The R Commander data-set viewer displaying the GSS data set.
3.4 Examining and Recoding Variables
Having read data into the R Commander from an external source, it’s generally a good idea to take a quick look at the data, if only to confirm that they’ve been read properly. Clicking the View data set button in the R Commander toolbar brings up the data-viewer window shown in Figure 3.6. Variable names remain at the top of the display as the rows are scrolled using the scrollbar at the right of the data-viewer window. Row numbers appear to the left of the data; if the rows of the data set were named, the row names would appear here (and row numbers or names remain at the left if it’s necessary to scroll the data viewer horizontally). You may leave the data-viewer window open on your desktop as you continue to work in the R Commander, or you may close the data viewer. If you leave it open, the data viewer will be automatically updated if you make subsequent changes to the active data set.
Although the GSS data set contains a moderately large number of cases (with n = 33,354 rows), there are only five variables, and so I request a summary of all the variables in the data set, invoked by Statistics > Summaries > Active data set. The result is shown in Figure 3.7:
• R commands generated in the R Commander session are accumulated in the R Script tab (and in the R Markdown tab, which is currently behind the R Script tab and consequently isn’t visible).
• These commands, along with associated printed output, appear in the Output pane; the scrollbar at the right of the pane allows you to examine previous input and output that has scrolled out of view. If some printed material is wider than the pane, you can similarly use the horizontal scrollbar at the bottom to inspect it. The R Commander makes an effort to fit output to the width of the Output pane, but it isn’t always successful.
• Notice that the Messages pane now includes a note about the dimensions of the GSS data set, generated when the data set was read, and which appears below the initial start-up message.
The output produced by the summary(GSS) command includes a “five-number summary” for the numeric variable year, reporting the minimum, first quartile, median, third quartile, and maximum values of the variable, along with the mean. The other variables are factors, and the count in each level (category) of the factor is shown.
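The behaviour of summary() on the two kinds of variables can be seen with a small made-up data frame (the values below are invented and are not GSS data). The same quantities are also computed one at a time with `quantile()`, `mean()`, and `table()`.

```r
# Made-up data: one numeric variable and one factor.
d <- data.frame(
  year   = c(1972, 1974, 1975, 1977, 1978, 1982, 1983),
  gender = factor(c("female", "male", "female", "female",
                    "male", "male", "female"))
)

# Five-number summary plus mean for year; counts per level for gender.
print(summary(d))

# The same pieces computed individually.
q      <- quantile(d$year)   # minimum, quartiles (incl. median), maximum
m      <- mean(d$year)
counts <- table(d$gender)    # number of cases in each level
print(q)
print(m)
print(counts)
```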
By default, the levels of a factor are ordered alphabetically. This is inconsequential in the case of gender, with levels “female” and “male”, but the levels of premarital.sex and education have natural orderings different from the alphabetic orderings. Although the categories of religion are unordered, I’d still prefer an ordering different from alphabetic, for example, putting the categories “other” and “none” after the others.
I won’t use all of the variables in the GSS data set in this chapter, but to illustrate reordering the levels of a factor, I’ll put the levels of education into their natural order. Clicking on Data > Manage variables in active data set > Reorder factor levels produces the dialog box at the left of Figure 3.8. I select education in the variable list box in the dialog, leave the name for the factor at its default value <same as original>, and keep the Make ordered factor box unchecked. Because the variable name is unchanged, the new education variable will replace the original variable in the GSS data frame, and so the R Commander will ask for confirmation when I click the OK button.
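At the command line, reordering the levels amounts to calling factor() with an explicit levels argument. The sketch below uses a short made-up vector rather than the GSS data frame, and it is an approximation of what the dialog does, not necessarily the exact command it generates.

```r
# A made-up factor with the three education categories.
education <- factor(c("high school", "post-secondary",
                      "less than high school", "high school"))
print(levels(education))   # default ordering is alphabetical

# Put the levels into their natural order; the data values are unchanged.
education <- factor(education,
                    levels = c("less than high school",
                               "high school",
                               "post-secondary"))
print(levels(education))
print(table(education))    # counts now appear in the natural order
```

Passing ordered = TRUE to factor() would instead create an ordered factor, which presumably corresponds to checking the Make ordered factor box in the dialog.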
Variable list boxes are a common feature of R Commander dialogs:
• In general, left-clicking on a variable in an R Commander variable list selects the variable.
• If more than one variable is to be selected—which is not the case in the Reorder Factor Levels dialog—you can Ctrl-left-click to choose additional variables—that is, simultaneously hold down the Ctrl (Control) key on your keyboard and click the left mouse button.
• Ctrl-clicking “toggles” a selection, so if a variable is already selected, Ctrl-clicking on its name will de-select it.
• The Ctrl key is used in the same way on Macs and on PCs, although on a Mac keyboard, the key is named control. You cannot use the Mac command key here instead of control.
• Similarly, Shift-clicking may be used to select a contiguous range of variables in a list: Click on a variable at one end of the desired range and then Shift-click on the variable at the other end.
• Finally, you can use the scrollbar in a variable list if the list is too long to show all of the variables simultaneously, and pressing a letter key scrolls to the first variable whose name begins with that letter. It’s unnecessary to scroll the variable list here because there are only four factors in the data set.
FIGURE 3.7: The R Commander window after summarizing the active data set.