Statistics II
Purpose of Course
This course will introduce you to a number of statistical tools and techniques that are routinely used by modern statisticians for a wide variety of applications. First, we will review basic knowledge and skills that you learned in MA121: Introduction to Statistics. Units 2 through 5 will introduce you to new ways to design experiments and to test hypotheses, including multiple and nonlinear regression and nonparametric statistics. You will learn to apply these methods to building models to analyze complex, multivariate problems. You will also learn to write scripts to carry out these analyses in R, a powerful statistical programming language. The last unit is designed to give you a grand tour of several advanced topics in applied statistics.
Course Information
Course Designer: Tuan Dinh
Primary Resources: This course comprises a range of different free, online materials. However, the course makes primary use of the following:
 MIT: Dmitry Panchenko’s “Statistics for Applications” Lecture Notes;
 University of Florida: Alan Agresti’s “Statistical Methods for the Social Sciences II” course; and
 Jeff Miller and Patricia Haden’s “Statistical Analysis with the General Linear Model”.
Time Commitment: The materials for this course will take approximately 188 hours to complete. Note that the time advisory for each unit also contains a time advisory for creating the element of courseware described in that unit.
Tips/Suggestions: You may not understand everything in the readings the first time around. Do not get frustrated. These are complex topics in statistics. Completing the assignments will often shed new light on the topics and make the readings more understandable.
Learning Outcomes
 apply statistical hypothesis testing for one population;
 conduct statistical hypothesis testing and estimation for two populations;
 apply multiple regression analysis to analyze a multivariate problem;
 analyze the outputs for a multiple regression model and interpret the regression results;
 test hypotheses about the significance of a multiple regression model and the significance of the independent variables in the model;
 select appropriate multiple regression models using automatic model selection, forward selection, backward elimination, and stepwise selection;
 recognize and address issues when using multiple regression analysis;
 identify situations when nonparametric tests are appropriate;
 conduct nonparametric tests; and
 explain the principles underlying General Linear Models, Multilevel Modeling, Data Mining, Machine Learning, Bayesian Belief Networks, Neural Networks, and Support Vector Machines.
Course Requirements
√ have access to a computer;
√ have continuous broadband Internet access;
√ have the ability/permission to install plugins or software (e.g., Adobe Reader or Flash);
√ have the ability to download and save files and documents to a computer;
√ be able to download and install R;
√ have the ability to open Microsoft files and documents (.doc, .ppt, .xls, etc.);
√ be competent in the English language;
√ have read the Saylor Student Handbook; and
√ have completed the following course: MA121: Introduction to Statistics.
Unit Outline

Unit 1: Review
In this unit, we will review some of the basic concepts you learned in Introduction to Statistics (MA121), including probability distribution, hypothesis testing, analysis of variance (ANOVA), basic regression, and correlation. This unit serves as the foundation for the subsequent units. Feel free to skim through subunits in which you are confident of a thorough understanding of the topics covered.
In this unit, you will also learn to use R, a powerful statistical programming language. The New York Times has a fascinating article about R here. R is widely used by engineers, statisticians, and scientists for data analysis. The main factor distinguishing R from other statistical and numerical languages like SAS, Matlab, Mathematica, or S-PLUS is that R is freely available to everybody. It is developed for scientists and maintained by scientists. Since its inception in 1996, the R project has been constantly upgraded by many contributors, who expand the capabilities of the language through add-on packages.

1.1 Overview of Probability Distributions
 Reading: MIT: Dmitry Panchenko’s “Statistics for Applications” Lecture Notes: Lecture 1—“Overview of Some Probability Distributions”
Link: MIT: Dmitry Panchenko’s “Statistics for Applications” Lecture Notes: Lecture 1—“Overview of Some Probability Distributions” (PDF)
Instructions: Download the PDF file of the lecture notes for section #1, “Overview of Some Probability Distributions.” The reading provides an overview of several important discrete and continuous probability distributions, including the binomial, exponential, Poisson, and normal distributions. Under which conditions can a binomial distribution or a Poisson distribution be approximated by a normal distribution?
Reading this lecture should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Activity: The Comprehensive R Archive Network’s “Getting Started with R”
Link: The Comprehensive R Archive Network’s “Getting Started with R” (HTML)
Instructions: Click on the link above, which will take you to the official website for R: The Comprehensive R Archive Network. Download and install the appropriate version of R for your computer. The installation procedure is very straightforward. Once R is installed, go back to the website and click on “Manuals” under “Documentation,” which will take you to various R manuals. Click on “An Introduction to R” (either HTML or PDF) and read it. This manual will give you basic commands in R. Pay careful attention to chapters 7 and 8. Be sure to test out the R examples in the manual in the R command window to get used to R syntax.
Completing this activity should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Activity: Dr. Alastair Sanderson’s “An Introduction to Using R”
Link: Dr. Alastair Sanderson’s “An Introduction to Using R” (HTML)
Instructions: Click on the above link. This webpage is a detailed tutorial for getting started with R. Copy and paste the R code from the webpage to the R command window. Make sure that you obtain results similar to those shown on the webpage.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
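The Lecture 1 reading asks when a binomial distribution can be approximated by a normal distribution. The following is a minimal Python sketch of that comparison, not the method used in the lecture notes. The function names and the example values of n, p, and k are hypothetical, and the continuity correction (adding 0.5) is a common convention assumed here. A widely quoted rule of thumb is that the approximation works well when both np and n(1 - p) are reasonably large.

```python
import math

def binomial_pmf(k, n, p):
    """Exact binomial probability P(X = k) for n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """Exact binomial probability P(X <= k)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), using a continuity correction of 0.5."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mean, sd)

# Hypothetical case 1: large n, p near 0.5 (approximation expected to be close).
exact = binomial_cdf(55, 100, 0.5)
approx = normal_approx_cdf(55, 100, 0.5)
print("n=100, p=0.5:", round(exact, 4), "vs", round(approx, 4))

# Hypothetical case 2: small n, p near 0 (approximation expected to be poor).
exact_small = binomial_cdf(0, 10, 0.05)
approx_small = normal_approx_cdf(0, 10, 0.05)
print("n=10, p=0.05:", round(exact_small, 4), "vs", round(approx_small, 4))
```

Comparing the two printed pairs shows how the quality of the approximation depends on n and p.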
 1.2 Overview of Tests for Statistical Significance

1.2.1 Basic Concepts
 Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 7: Tests of Statistical Significance: Three Overarching Concepts”
Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 7: Tests of Statistical Significance: Three Overarching Concepts” (HTML)
Instructions: Click on “Table of Contents” and read “Chapter 7: Tests of Statistical Significance: Three Overarching Concepts.” The reading will provide an overview of basic concepts in testing for statistical significance, including mean chance expectation, null hypothesis and research hypothesis, directional versus nondirectional research hypotheses, and one-way versus two-way tests of significance. What is the main difference between the null hypothesis and the research hypothesis? Think of a situation in which you would prefer two-way testing over one-way testing. This section highlights the importance of coming up with the right question before figuring out which tool would be appropriate for answering the question.
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

1.2.2 Chi-Square Procedures
 Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 8: Chi-Square Procedures for the Analysis of Categorical Frequency Data”
Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 8: Chi-Square Procedures for the Analysis of Categorical Frequency Data” (HTML)
Instructions: Click on “Table of Contents” and read “Chapter 8: Chi-Square Procedures for the Analysis of Categorical Frequency Data.” The reading will provide an overview of applications of chi-square procedures. Chi-square procedures are often used for tests of goodness of fit and tests of independence. What are the limitations of chi-square procedures? Under which conditions is a Fisher Exact Probability Test more suitable than a chi-square test?
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
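The goodness-of-fit use of chi-square mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the die-roll counts are hypothetical, and the statistic is the standard sum of (observed - expected)^2 / expected over the categories.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 120 rolls of a die, tested against a fair-die expectation.
observed = [18, 22, 20, 25, 15, 20]
expected = [120 / 6] * 6

stat = chi_square_statistic(observed, expected)
df = len(observed) - 1  # degrees of freedom for a goodness-of-fit test

# The critical value for df = 5 at the 0.05 level is about 11.07,
# so a statistic below that does not reject the fair-die hypothesis.
print("chi-square =", stat, "with", df, "degrees of freedom")
print("reject at 0.05 level:", stat > 11.07)
```

The statistic is compared with a chi-square critical value for the appropriate degrees of freedom; when expected counts are small, the approximation behind that comparison becomes unreliable, which is the situation the Fisher question in the reading points to.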

1.2.3 Student’s t-tests
 Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 10. t-Procedures for Estimating the Mean of a Population,” “Chapter 11. t-Test for Two Independent Samples,” and “Chapter 12. t-Test for Two Correlated Samples”
Links: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 10. t-Procedures for Estimating the Mean of a Population,” “Chapter 11. t-Test for Two Independent Samples,” and “Chapter 12. t-Test for Two Correlated Samples” (HTML)
Instructions: Click on “Table of Contents” and read “Chapter 10: t-Procedures for Estimating the Mean of a Population,” “Chapter 11: t-Test for Two Independent Samples,” and “Chapter 12: t-Test for Two Correlated Samples.” These readings will provide an overview of Student’s t-distribution and applications of t-tests for independent and correlated samples. Student’s t-distribution arises when estimating the mean of a normally distributed population from a small sample when the “true” standard deviation is unknown.
Reading these chapters should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 14”
Link: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 14” (HTML)
Instructions: Click on the link above and answer all questions in the quiz. Select your answer from choices given for each question. Click on “Submit Answers” at the bottom of the webpage when you have answered all the questions. The webpage will tell you whether your answer is correct and what the correct answer is.
Completing this quiz should take less than 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “One- and Two-Sample Problems: Student’s T-Tests”
Link: The Saylor Foundation’s “One- and Two-Sample Problems: Student’s T-Tests” (PDF)
Instructions: Complete the linked assessment, titled “One- and two-sample problems: Student’s t-tests.” When you are done, check your work against The Saylor Foundation’s “Answer Key for One- and two-sample problems: Student’s t-tests” in unit 1.2.
Completing this assessment should take you no longer than 4 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for computer assessments.
 Assessment: The Saylor Foundation’s “Subunit 1.2.3 Assessment”
Link: The Saylor Foundation’s “Subunit 1.2.3 Assessment”
Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.
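The one-sample and correlated-samples t-tests covered in this subunit both reduce to the same statistic: a sample mean divided by its estimated standard error. Below is a minimal Python sketch under that reading. The function names and all data values are hypothetical, and the sketch stops at the t statistic and degrees of freedom rather than computing a p-value.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for testing whether the population mean equals mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation (divides by n - 1)
    standard_error = sd / math.sqrt(n)   # estimated standard error of the mean
    return (mean - mu0) / standard_error, n - 1

def paired_t(before, after):
    """Return (t, df) for correlated samples by testing the mean difference against 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0.0)

# Hypothetical measurements tested against a hypothesized mean of 5.0.
t_one, df_one = one_sample_t([5.1, 4.9, 5.6, 5.8, 6.0], 5.0)
print("one-sample: t =", round(t_one, 3), "df =", df_one)

# Hypothetical before/after scores for the same four subjects.
t_pair, df_pair = paired_t([10, 12, 9, 11], [12, 14, 10, 13])
print("paired: t =", round(t_pair, 3), "df =", df_pair)
```

The resulting t value is then compared with the t-distribution for the given degrees of freedom, for example with a table or with R's built-in functions.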

1.3 Overview of Analysis of Variance (ANOVA)
This subunit provides an overview of one-way and two-way analyses of variance (ANOVAs). The purpose of ANOVA is to test for significant differences between means. To test for statistical significance between means, we actually analyze variances, hence the name “analysis of variance.” ANOVAs will enable you to examine the amount of variability in a response variable and understand where the variability is coming from. This subunit will specifically teach you how to use the one-way ANOVA to test for differences between the means of several groups, and how to use the two-way ANOVA and interpret the interaction effect.

1.3.1 Basic Concepts
 Lecture: YouTube: Medical College of Wisconsin: Sergey Tarima’s “ANOVA: Comparing More Than Two Treatments”
Link: YouTube: Medical College of Wisconsin: Sergey Tarima’s “ANOVA: Comparing More Than Two Treatments” (YouTube). This content is also available in PDF format on the Medical College of Wisconsin's website “ANOVA: Comparing More Than Two Treatments” (PDF)
Instructions: Click on the link for the video for the lecture on “ANOVA: Comparing More Than Two Treatments.” You may also want to download the presentation (in PDF) used for the lecture. In this lecture, Sergey Tarima discusses applications of one-way and two-way ANOVA in medical research. The lecture will cover content for subunits 1.3.1-1.3.3.
Watching this lecture and pausing to take notes should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 13: Conceptual Introduction to the Analysis of Variance”
Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 13: Conceptual Introduction to the Analysis of Variance” (HTML)
Instructions: Click on “Table of Contents” and read “Chapter 13: Conceptual Introduction to the Analysis of Variance.” The reading will provide an overview of basic concepts in analysis of variance. The fundamental idea of ANOVA is that the observed variance of a variable can be decomposed into different sources of variation. Different ways of partitioning sources of variation are referred to as “statistical models.” ANOVA depends on the F-statistic, that is, the ratio of the variance between the group means to the variance within the samples. What is the relationship between the F-distribution and the t-distribution?
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
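The decomposition described in the Chapter 13 instructions can be written compactly. The notation below is an assumption for illustration (k groups, N observations in total, independent samples) and may differ from the symbols used in the reading. The last line records the standard link between the two distributions for the two-group case.

```latex
\[
SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}
\]
\[
F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
\]
\[
T \sim t_{\nu} \;\Longrightarrow\; T^{2} \sim F_{1,\,\nu}
\]
```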

1.3.2 One-Way ANOVA
Note: This subunit is covered by the video lecture assigned beneath subunit 1.3.1. The video lecture will provide an example illustrating the use of one-way ANOVA in medical research.
 Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 15: One-Way Analysis of Variance for Correlated Samples”
Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 15: One-Way Analysis of Variance for Correlated Samples” (HTML)
Instructions: Click on “Table of Contents” and read “Chapter 15: One-Way Analysis of Variance for Correlated Samples.” The reading provides an overview of one-way ANOVA for correlated samples, which can be considered an extension of the correlated-samples t-test.
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 14: One-Way Analysis of Variance for Independent Samples”
Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 14: One-Way Analysis of Variance for Independent Samples” (HTML)
Instructions: Click on “Table of Contents” and read “Chapter 14: One-Way Analysis of Variance for Independent Samples.” The reading will provide an overview of using one-way ANOVA to compare the means of two or more independent samples using the F-distribution. The ANOVA tests the null hypothesis that the samples in two or more groups are drawn from the same population. What are the basic assumptions of one-way ANOVA? Why should the variance of the group means be lower than the variance of the samples?
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
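For independent samples, the one-way ANOVA F ratio can be computed directly from the between-group and within-group sums of squares. The following Python sketch assumes that standard formulation. The function name and the three small groups are hypothetical, and no p-value is computed.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical scores for three treatment groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_anova_f(groups)
print("F =", f_ratio, "with df =", (df_between, df_within))
```

A large F ratio means the group means differ by more than the within-group variability alone would suggest.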

1.3.3 Two-Way ANOVA
Note: This subunit is covered by the video lecture assigned beneath subunit 1.3.1. The second half of the video focuses on the use of two-way ANOVA in medical research.
 Reading: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 16: Two-Way Analysis of Variance for Independent Samples”
Link: Vassar College: Richard Lowry’s Concepts and Applications of Inferential Statistics: “Chapter 16: Two-Way Analysis of Variance for Independent Samples” (HTML)
Instructions: Click on “Table of Contents” and read “Chapter 16: Two-Way Analysis of Variance for Independent Samples.” Two-way ANOVA is an extension of the one-way ANOVA to examine the influence of different categorical independent variables on one dependent variable.
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 1”
Link: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 1” (PDF)
Instructions: Click on the link above and scroll down to Homework 1 to complete this assignment. Follow the instructions for the problems closely, particularly the R-based assignments. The solutions to the homework are in the PDF file and the R code.
Completing this assessment should take approximately 3 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 10”
Link: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 10” (HTML)
Instructions: Click on the link above and answer all questions in the quiz. Select your answers from choices given for each question. Click on “Submit Answers” at the bottom of the webpage when you have answered all the questions. The webpage will tell you whether your answer is correct and what the correct answer is.
Completing this quiz should take less than 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Activity: John M Quick’s “R Tutorial Series: Two-Way ANOVA with Simple Interactions with Simple Main Effects”
Link: John M Quick’s “R Tutorial Series: Two-Way ANOVA with Simple Interactions with Simple Main Effects” (HTML)
Instructions: Click on the link and follow the instructions. This webpage contains a detailed tutorial on how to perform two-way ANOVA in R; it is attributed to John M Quick. Copy and paste the R code from the webpage to the R command window. Make sure that you obtain results similar to those shown on the webpage.
Optional: Feel free to further explore other ANOVA tutorials:
 One-Way Omnibus ANOVA (HTML)
http://rtutorialseries.blogspot.com/2010/10/rtutorialseriesonewayomnibusanova.html
 One-Way ANOVA with Comparisons (HTML)
http://rtutorialseries.blogspot.com/2011/01/rtutorialseriesonewayanovawith.html
 Two-Way ANOVA with Unequal Sample Sizes (HTML)
http://rtutorialseries.blogspot.com/2011/02/rtutorialseriestwowayanovawith_28.html
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Subunit 1.3.3 Assessment”
Link: The Saylor Foundation’s “Subunit 1.3.3 Assessment”
Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.

1.4 Overview of Regression
This subunit provides an overview of linear regression, the method of using one variable to predict another variable with a linear function (i.e., a straight line). You will learn to calculate regression coefficients, make inferences about the slope and correlation coefficient, estimate mean values, and predict individual values.

1.4.1 Regression Basics
 Reading: Global Text: Thomas K. Tiemann’s Introductory Business Statistics: “Chapter 8: Regression Basics”
Link: Global Text: Thomas K. Tiemann’s Introductory Business Statistics: “Chapter 8: Regression Basics” (PDF)
Instructions: Download the PDF file. Open the file and browse to “Chapter 8: Regression Basics.” Read pages 70-79. Linear regression models the relationship between a dependent variable (the response) and an independent variable (the cause) by fitting a linear equation to observed data. Define outliers and influential observations in a sample. How do you use the F-score for testing a regression?
Reading this chapter should take approximately 1 hour.
Terms of Use: The above book is released under a Creative Commons Attribution 3.0 License (HTML). It is attributed to Thomas K. Tiemann, and the original version can be found here.
 Web Media: YouTube: Perdisco’s Introductory Statistics Textbook: “Chapter 10: Regression”
Link: YouTube: Perdisco’s Introductory Statistics Textbook: “Chapter 10: Regression” (YouTube)
Instructions: This video provides a brief overview of regression. This lecture is optional if you already have a good understanding of regression.
Watching this video should take approximately 10 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
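Fitting a linear equation to observed data, as described in this subunit, is usually done by least squares. The sketch below is a minimal Python illustration under that assumption, not the procedure from the textbook. The function names and data are hypothetical. It also computes the F ratio for a one-predictor regression, which compares explained variation with residual variation.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def regression_f(xs, ys):
    """F ratio for simple regression: (SSR / 1) / (SSE / (n - 2))."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)      # explained variation
    ss_error = sum((y - p) ** 2 for y, p in zip(ys, predicted))    # residual variation
    return ss_regression / (ss_error / (len(xs) - 2))

# Hypothetical observations that lie close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
f_ratio = regression_f(xs, ys)
print("intercept =", round(intercept, 3), "slope =", round(slope, 3))
print("F =", round(f_ratio, 1))
```

In R, the same fit is available through the built-in `lm` function, which is what the course assignments use.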

1.4.2 Correlation and Covariance
 Reading: Global Text: Thomas K. Tiemann’s Introductory Business Statistics: “Chapter 8: Regression Basics”
Link: Global Text: Thomas K. Tiemann’s Introductory Business Statistics: “Chapter 8: Regression Basics” (PDF)
Instructions: Download the PDF file. Open the file and browse to “Chapter 8: Regression Basics.” Read pages 79-82. Covariance measures how much two random variables change together. How does correlation relate to covariance? What are the connections among regression, correlation, and covariance?
Reading this chapter should take approximately 30 minutes.
Terms of Use: The above book is released under a Creative Commons Attribution 3.0 License (HTML). It is attributed to Thomas K. Tiemann, and the original version can be found here.
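The connections among covariance, correlation, and regression asked about above can be made concrete with a short Python sketch. The function names and data are hypothetical. The sketch uses the sample (n - 1) definitions and relies on two standard identities: correlation is covariance divided by the product of the standard deviations, and the least-squares slope is covariance divided by the variance of x.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance of two equally long sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

def regression_slope(xs, ys):
    """Least-squares slope of y on x: covariance divided by the variance of x."""
    return sample_covariance(xs, ys) / sample_covariance(xs, xs)

# Hypothetical data with an exact linear relationship y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

r = correlation(xs, ys)
slope = regression_slope(xs, ys)
print("covariance =", sample_covariance(xs, ys))
print("correlation =", r, "slope =", slope)
```

Unlike covariance, the correlation is unit-free and always lies between -1 and 1.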

1.4.3 Traps and Pitfalls of Regression
 Reading: University of Otago: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 13: Traps and Pitfalls of Regression Analysis”
Link: University of Otago: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 13: Traps and Pitfalls of Regression Analysis” (PDF)
Instructions: Click on the link “Download the book as a PDF file” to download and save the textbook. You will use this textbook throughout the course. Read “Chapter 13: Traps and Pitfalls of Regression Analysis.” What does “regression towards the mean” actually mean?
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: Massachusetts Institute of Technology: Cynthia Rudin’s “Statistical Thinking and Data Analysis Exam 4”
Link: Massachusetts Institute of Technology: Cynthia Rudin’s “Statistical Thinking and Data Analysis Exam 4” (PDF)
Instructions: Answer question 3, on page 4 of the exam.
Answering this question should take approximately 1 hour.
Terms of Use: These articles are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. They are attributed to Massachusetts Institute of Technology, and the original versions can be found here.
 Assessment: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 11”
Link: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 11” (HTML)
Instructions: Click on the link above and answer all questions in the quiz. Select your answer from the choices given for each question. Click on “Submit Answers” at the bottom of the webpage when you have answered all the questions. The webpage will tell you whether your answer is correct and what the correct answer is.
Completing this quiz should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Simple Linear Regression”
Link: The Saylor Foundation’s “Simple Linear Regression” (PDF)
Instructions: Complete the linked assessment, titled “Simple Linear Regression.” When you are done, solutions can be found here, under “Exercise2RSolutions.pdf”.
Completing this assessment should take you no longer than 4 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

Unit 1 Assessment
 Assessment: The Saylor Foundation’s “Unit 1 Assessment”
Link: The Saylor Foundation’s “Unit 1 Assessment”
Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.

Unit 2: Multiple Regression and Correlation
The term “multiple regression” was first introduced by Pearson in 1908. Since then, multiple regression has evolved to be a powerful statistical technique used in all fields of research, ranging from biology, sociology, and psychology to engineering. Multiple regression enables you to learn about the relationship between several independent or predictor variables and one dependent variable. For instance, multiple regression has been used to understand how housing price depends on location, the size (in square feet), the number of bedrooms, the number of bathrooms, the architecture style, the average income of the respective neighborhood, the crime statistics, and so forth. In this unit, you will learn several fundamental concepts of multiple regression, including R^{2} and partial correlation. You will also learn to perform multiple regression and interpret its results.

2.1 Introduction to Multiple Regression
 Lecture: YouTube: EconDrD’s “Multiple Regression Lecture Notes 1”
Link: YouTube: EconDrD’s “Multiple Regression Lecture Notes 1” (YouTube)
Instructions: Click on the link for “Multiple Regression Lecture Notes 1”. The video will introduce you to multiple regression.
Watching this video and pausing to take notes should take approximately 30 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 14: Multiple Regression”
Link: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 14: Multiple Regression” (PDF)
Instructions: Click on the link “Download the book as a PDF file” to download and save the textbook. You will use this textbook throughout the course. For many problems, it is unreasonable to expect that there is only one independent variable that influences the behavior of the dependent variable. Multiple regression is designed to address this type of problem. What are the basic steps of multiple regression? Compare these steps to linear regression.
Reading this chapter should take approximately 3 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 11 draft.” Open the file and read section 11.1, “The Multiple Regression Model,” and section 11.2, “Example with Multiple Regression Computer Output.” This reading will supplement the reading from Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model book. How do you define degrees of freedom?
Reading this chapter should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Lecture: YouTube: EconDrD’s “Multiple Regression for Dummy Variables”
Link: YouTube: EconDrD’s “Multiple Regression for Dummy Variables” (YouTube)
Instructions: Click on the link for “Multiple regression dummy variables”. The lecture will introduce you to multiple regression for categorical and qualitative variables. Sometimes we want to include a categorical variable (e.g., gender or education level) in our model. These types of variables can be represented by dummy variables, that is, variables with only two values, 0 and 1. The number of dummy variables for each categorical variable is k - 1, where k is the number of levels of the original variable.
Watching this video and pausing to take notes should take approximately 30 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 5”
Link: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 5” (PDF)
Instructions: Click on the link above, which will take you to the website of BUS 41100. Scroll down to Homework 5 and complete all problems in the homework. Follow the instructions for the problems closely. The solutions to the homework are in the R code. The comments in the R code are helpful for understanding the results.
Completing this assessment should take approximately 3 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

2.2 Multiple Correlation and R²
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 11 draft.” Open the file and read section 11.3, “Multiple Correlation and R-Squared.” This reading extends the concept of bivariate correlation that you reviewed in Unit 1 to multiple variables. The multiple correlation for a regression model describes the correlation between the observed values and the predicted values of the response. What are the properties of R²? What is multicollinearity?
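As a compact reminder of the quantities this reading asks about, the relationship between the multiple correlation R and R² can be written as follows. The notation (TSS for the total sum of squares, SSE for the sum of squared errors, ŷ for the predicted values) is an assumption made for this sketch and should be checked against the symbols used in the chapter.

```latex
R^2 = \frac{TSS - SSE}{TSS}, \qquad
TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
R = \operatorname{corr}(y, \hat{y}) = +\sqrt{R^2}
```

For a least-squares fit with an intercept, R² lies between 0 and 1 and cannot decrease when another predictor is added to the model.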
Reading this chapter should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

2.3 Inference for Multiple Regression and Coefficients
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 11 draft.” Open the file and read section 11.4, “Inference for Multiple Regression and Coefficients.”
Reading this chapter should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

2.4 Relationships between Predictors
 Reading: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 16: Relationships among Predictors”
Link: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 16: Relationships among Predictors” (PDF)
Instructions: Click on “Download the book as a PDF file” to download and save the textbook. You will use this textbook throughout the course. Read “Chapter 16: Relationships among Predictors.” This chapter addresses an important aspect of regression analysis: the relationships between the different predictor variables. You will learn about the “context effect,” a situation in which the usefulness of a predictor depends on which other predictor variables are included in the model. You will also learn to use Venn diagrams to visualize the relationships between predictors that lead to redundancy, error reduction, and suppressor effects.
Studying this chapter should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 11 draft.” Open the file and read section 11.5, “Interaction between Predictors in Their Effects.” This reading will supplement your reading from Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model.
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

2.5 Partial Correlation
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 11: Multiple Regression and Correlation” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 11 draft.” Open the file and read section 11.6, “Partial Correlation.”
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 12”
Link: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 12” (HTML)
Instructions: Select your answer from the choices given for each question and click on “Submit Answers” at the bottom of the webpage when you have answered all the questions. The webpage will tell you whether each answer is correct and what the correct answer is.
Completing this quiz should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Multiple Regression Model”
Link: The Saylor Foundation’s “Multiple Regression Model” (PDF)
Instructions: Complete the linked assessment, titled “Multiple Regression Model”. When you are done, check your work against The Saylor Foundation’s “Answer Key for Multiple Regression Model” (PDF) in subunit 2.5.
Completing this assessment should take you no longer than 4 hours. If you have not done so already, please click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

Unit 2 Assessment
 Assessment: The Saylor Foundation’s “Unit 2 Assessment”
Link: The Saylor Foundation’s “Unit 2 Assessment”
Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.

Unit 3: Model Building with Multiple Regression
This unit will provide you with a framework for building and evaluating regression models. You will learn how to select regression models using automatic procedures such as forward selection and backward elimination. You will also learn to use diagnostic tools to evaluate the accuracy of a regression model and to measure effects of multicollinearity.

3.1 Model Selection Procedures
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 14 draft.” Open the file and read section 14.1, “Model Selection Procedures.” This reading will supplement Chapter 17 of Jeff Miller and Patricia Haden’s book. What are the general rules of thumb for selecting explanatory variables? How do we avoid multicollinearity? Will different automatic selection methods lead to different results?
Reading this chapter should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 17: Finding the Best Model”
Link: Jeff Miller and Patricia Haden’s Statistical Analysis with the General Linear Model: “Chapter 17: Finding the Best Model” (PDF)
Instructions: Click on the link “Download the book as a PDF file” to save and download the textbook. You will use this textbook throughout the course. Read “Chapter 17: Finding the Best Model,” which will provide an overview of best practices for selecting models. You will also learn how to use automatic model selection procedures, including forward selection, backward elimination, and stepwise procedures.
Reading this chapter should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 6”
Link: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 6” (PDF)
Instructions: Click on the link above, which will take you to the website of BUS 41100. Scroll down to Homework 6 and complete all problems in the homework. Follow the instructions for the problems closely. The solutions to the homework are in the R code. The comments in the R code are helpful for understanding the results.
Completing this assessment should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

3.2 Regression Diagnostics
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 14 draft.” Open the file and read section 14.2, “Regression Diagnostics.” How do we know a regression model fits the data well or whether the data conform to the basic assumptions of a regression model? In this reading, you will learn a number of tools that allow you to “diagnose” your regression models for potential errors and limitations. What is homoscedasticity?
Studying this chapter should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

3.3 Effects of Multicollinearity
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 14 draft.” Open the file and read section 14.3, “Effects of Multicollinearity.” What are the main effects of multicollinearity? How do you measure inflation of variance due to multicollinearity? What are the methods to reduce or to remove the effects of multicollinearity?
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Web Media: YouTube: How2Stats’ “Multicollinearity—Explained Simply (part 1)” and “Multicollinearity—Explained Simply (part 2)”
Link: YouTube: How2Stats’ “Multicollinearity—Explained Simply (part 1)” and “Multicollinearity—Explained Simply (part 2)” (YouTube)
Instructions: Watch the two videos, which will give an overview of the effects of multicollinearity on variance as well as methods to measure multicollinearity, including tolerance and variance inflation factor (VIF). What happens when VIF exceeds 10? What can you do to correct for multicollinearity?
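The two measures named above are simple functions of R²ⱼ, the R² obtained when predictor j is regressed on the other predictors: tolerance is 1 − R²ⱼ and VIF is its reciprocal. The sketch below uses plain Python rather than the R used in the course, and it relies on a simplifying assumption: with only two predictors, R²ⱼ equals the squared Pearson correlation between them. The data and function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance(r_squared_j):
    """Tolerance = 1 - R^2_j."""
    return 1.0 - r_squared_j


def vif(r_squared_j):
    """Variance inflation factor = 1 / (1 - R^2_j)."""
    return 1.0 / (1.0 - r_squared_j)


# Hypothetical, nearly collinear predictors (x2 is roughly 2 * x1).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

# Two-predictor case only: R^2_j is the squared correlation between x1 and x2.
r_squared = pearson_r(x1, x2) ** 2

print("tolerance:", tolerance(r_squared))
print("VIF:", vif(r_squared))
```

With more than two predictors, R²ⱼ has to come from an actual regression of predictor j on all the others; the formulas for tolerance and VIF stay the same.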
Watching the two videos should take approximately 15 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 7”
Link: University of Chicago: Robert Brandon Gramacy’s “Applied Regression Analysis: Homework 7” (PDF)
Instructions: Click on the link above, which will take you to the website of BUS 41100. Select Homework 7 and complete all problems in the assignment. Follow the instructions for the problems closely. The solutions to the homework are in the R code. The comments in the R code are helpful for understanding the results.
Completing this assessment should take approximately 5 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Regression Diagnostics”
Link: The Saylor Foundation’s “Regression Diagnostics” (PDF)
Instructions: Complete the linked assessment, titled “Regression Diagnostics.” When you are done, check your work against The Saylor Foundation’s “Answer Key for Regression Diagnostics” (PDF) in subunit 3.3.
This assessment should take you no longer than 4 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

Unit 3 Assessment
 Assessment: The Saylor Foundation’s “Unit 3 Assessment”
Link: The Saylor Foundation’s “Unit 3 Assessment”
Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.

Unit 4: Generalized Linear Model and Nonlinear Regression
In this unit, you will be introduced to several new regression models, including logistic regression, polynomial regression, and exponential regression. These statistical models will give you a toolbox for analyzing a wide range of problems. For instance, logistic regression enables you to predict the outcome of a binary dependent variable (e.g., “success” vs. “failure”). This model is often used in medicine to predict the risk of developing a disease within a predefined period. You will also learn about generalized linear models, which were formulated as a way to unify various statistical models, including linear regression and logistic regression.

4.1 Generalized Linear Models
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 14 draft.” Open the file and read section 14.4, “Generalized Linear Models.” The generalized linear model (GLM) is a generalization of linear regression that accommodates non-normal and categorical responses. What is a link function?
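To make the question about the link function concrete, a GLM relates the mean of the response to a linear predictor through a function g. The symbols below (μ for the mean response, α and β for the coefficients, k predictors) are assumptions made for this sketch and may differ from the chapter's notation.

```latex
g(\mu) = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k, \qquad \mu = E(y)

\text{identity link (ordinary regression):} \quad g(\mu) = \mu
\text{logit link (logistic regression):}    \quad g(\mu) = \log\frac{\mu}{1 - \mu}
\text{log link (e.g., count responses):}    \quad g(\mu) = \log \mu
```

Choosing the identity link with a normally distributed response gives ordinary linear regression as a special case.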
Studying this chapter should take approximately 3 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

4.2 Nonlinearity: Polynomial Regression
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 14 draft.” Open the file and read section 14.5, “Nonlinear Relationships: Polynomial Regression.”
Studying this chapter should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

4.3 Exponential Regression and Log Transforms
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 14: Model Building with Multiple Regression” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 14 draft.” Open the file and read section 14.6, “Exponential Regression and Log Transforms.”
Studying this chapter should take approximately 3 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
4.4 Logistic Regression

4.4.1 Logistic Regression Basics
 Lecture: Medical College of Wisconsin: Sergey Tarima’s “Logistic Regression”
Link: Medical College of Wisconsin: Sergey Tarima’s “Logistic Regression” (YouTube and PDF)
Instructions: Click on the video link for the lecture on “Logistic Regression,” dated June 24, 2010, and watch the video. You may also want to download the presentation in PDF format. In this lecture, Sergey Tarima discusses logistic regression using medical examples. You will learn about odds, odds ratios, and how to interpret the results of a simple logistic regression model.
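The link between odds, odds ratios, and the slope of a simple logistic regression model can be sketched numerically. The example below uses plain Python rather than the R used in the course, and the coefficient values `alpha` and `beta` are hypothetical, not taken from the lecture.

```python
import math


def logistic(z):
    """Inverse of the logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of an event that has probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients of a simple logistic regression model:
# logit(P(y = 1)) = alpha + beta * x
alpha, beta = -2.0, 0.5

p_at_1 = logistic(alpha + beta * 1)  # probability when x = 1
p_at_2 = logistic(alpha + beta * 2)  # probability when x = 2

# The ratio of the odds for a one-unit increase in x equals exp(beta).
odds_ratio = odds(p_at_2) / odds(p_at_1)

print("odds ratio for a one-unit increase in x:", odds_ratio)
print("exp(beta):", math.exp(beta))
```

The odds ratio stays the same for every one-unit step in x, even though the change in probability itself depends on where x starts.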
Watching this lecture and pausing to take notes should take approximately 1 hour and 15 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 15: Logistic Regression”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 15: Logistic Regression” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 15 draft.” Open the file and read section 15.1, “Logistic Regression.”
Studying this chapter should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

4.4.2 Multiple Logistic Regression
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 15: Logistic Regression”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 15: Logistic Regression” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 15 draft.” Open the file and read section 15.2, “Multiple Logistic Regression.”
Studying this chapter should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

4.4.3 Inference for Logistic Regression Models
 Reading: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 15: Logistic Regression”
Link: University of Florida: Alan Agresti’s Statistical Methods for the Social Sciences II: “Chapter 15: Logistic Regression” (PDF)
Instructions: Browse to the bottom of the webpage and download the PDF file for “Chapter 15 draft.” Open the file and read section 15.3, “Inference for Logistic Regression Models.”
Studying this chapter should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: Carnegie Mellon University: Cosma Shalizi’s “Advanced Data Analysis: Homework 7”
Link: Carnegie Mellon University: Cosma Shalizi’s “Advanced Data Analysis: Homework 7” (PDF)
Instructions: Click on the link above and download the PDF file for Homework 7 (hw07.pdf). Follow the instructions for the problems closely. The solutions to the homework are in the PDF file, solutions07.pdf.
Completing this assignment should take approximately 5 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Logistic Regression”
Link: The Saylor Foundation’s “Logistic Regression” (PDF)
Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” (PDF) for subunit 4.4 in R.
If you have not done so already, click on the following link to download and install R on your computer: http://cran.r-project.org/. R will be used throughout the course for assignments.
Completing this assessment should take you no longer than 3 hours.

Unit 4 Assessment
 Assessment: The Saylor Foundation’s “Unit 4 Assessment”
Link: The Saylor Foundation’s “Unit 4 Assessment”
Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.

Unit 5: Nonparametric Statistics
Many methods in statistics assume that your population distribution is normal. However, there are many scenarios in which this is not true. In these cases, nonparametric statistics provide alternative methods because they do not assume that the population follows a particular parametric distribution. Upon completion of this unit, you should be able to distinguish between parametric and nonparametric test procedures and perform basic nonparametric tests such as the sign test and the Wilcoxon signed-ranks test. You will also learn to identify situations for which nonparametric tests are more appropriate.

5.1 Introduction to Nonparametric Tests
 Reading: Pennsylvania State University Center for Astrostatistics’s “Summer School in Statistics for Astronomers VI”
Link: Pennsylvania State University Center for Astrostatistics’s “Summer School in Statistics for Astronomers VI” (PDF)
Instructions: Browse to the schedule for June 10, 2010 and download the PDF file for the lecture on “Nonparametric Statistics,” given by Dr. Thomas Hettmansperger. Open the file and read the entire lecture.
Studying this section should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

5.2 Nonparametric Rank-Based Tests
 Lecture: Medical College of Wisconsin: John Klein’s “Uses and Abuses of Nonparametric Statistics in Medical Research”
Link: Medical College of Wisconsin: John Klein’s “Uses and Abuses of Nonparametric Statistics in Medical Research” (PDF and YouTube)
Instructions: Click on the lecture “Uses and Abuses of Nonparametric Statistics” and watch the video. You may also want to download the presentation (in PDF) used for the lecture. In this video, John Klein discusses the use of nonparametric tests, including the sign test and the Wilcoxon signed-ranks test. You will also learn when to use nonparametric statistics. What is the most significant advantage of nonparametric statistics?
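As a rough illustration of the sign test mentioned above, the sketch below computes an exact two-sided p-value from paired differences under the null hypothesis that positive and negative differences are equally likely. It uses plain Python rather than the R used in the course; the function name and the data are hypothetical, and zero differences are dropped, which is one common convention rather than the only one.

```python
from math import comb


def sign_test_p_value(differences):
    """Exact two-sided sign test p-value for paired differences.

    Zero differences are discarded; under the null hypothesis the number of
    positive signs follows a Binomial(n, 1/2) distribution.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    n_positive = sum(1 for d in nonzero if d > 0)
    k = min(n_positive, n - n_positive)
    # Probability of k or fewer of the less frequent sign, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired differences (after minus before).
differences = [1.2, 0.8, -0.3, 2.1, 0.5, 1.7, 0.9, 0.0, 1.1, 0.4]
p_value = sign_test_p_value(differences)
print("two-sided p-value:", p_value)
```

Only the signs of the differences enter the calculation, which is why the test needs no assumption about the shape of the distribution.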
Watching this lecture should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: Lakehead University: Bruce Weaver’s “Chapter 3: Nonparametric Test”
Link: Lakehead University: Bruce Weaver’s “Chapter 3: Nonparametric Test” (PDF)
Instructions: Download the PDF file “nonpar.pdf” from the webpage (bullet #2 under “My own notes”) and read the whole chapter. In this reading you will learn about several nonparametric tests, including the sign test, the Wilcoxon signed-ranks test, the Mann-Whitney U test, and the Kruskal-Wallis H test.
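Unlike the sign test, the Wilcoxon signed-ranks test also uses the sizes of the differences, through their ranks. The sketch below computes only the two rank sums, not a p-value. It uses plain Python rather than the R used in the course; the function name and data are hypothetical, zero differences are dropped, and tied absolute values receive the average of their ranks, which are common conventions that the reading may treat differently.

```python
def signed_rank_sums(differences):
    """Return (W_plus, W_minus), the rank sums of positive and negative differences.

    Absolute differences are ranked from smallest to largest; ties share the
    average of the ranks they occupy. Zero differences are discarded.
    """
    ordered = sorted((d for d in differences if d != 0), key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of tied absolute values starting at i.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical paired differences with one zero and one tie in absolute value.
differences = [3, -1, 2, -2, 5, 0]
w_plus, w_minus = signed_rank_sums(differences)
print("W+ =", w_plus, "W- =", w_minus)
```

The two sums always add up to n(n + 1)/2 for n nonzero differences, which gives a quick check on the ranking.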
Reading this chapter should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: Carnegie Mellon University: Judy Huixia Wang’s “Applied Nonparametric Statistics: Homework 5”
Link: Carnegie Mellon University: Judy Huixia Wang’s “Applied Nonparametric Statistics: Homework 5” (PDF)
Instructions: Download the PDF file for Homework 5, and solve problem 1. The solution to the homework can be found on the same webpage.
Completing this assessment should take approximately 6 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 13”
Link: McGraw Hill: Bowerman, O’Connell, Schermer, and Adcock’s “Business Statistics in Practice: Multiple Choice Quiz for Chapter 13” (HTML)
Instructions: Click on the link above and answer all questions in the quiz. Select your answers from the choices given for each question. Click on “Submit Answers” at the bottom of the webpage when you have answered all the questions. The webpage will tell you whether each answer is correct and what the correct answer is.
Completing this quiz should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Nonparametric Statistics”
Link: The Saylor Foundation’s “Nonparametric Statistics” (PDF)
Instructions: Complete the linked assessment, titled “Nonparametric Statistics.” When you are done, check your work against The Saylor Foundation’s “Answer Key for Nonparametric Statistics” (PDF) in Unit 5.
Completing this assessment should take you no longer than 4 hours. If you have not done so already, click on the following link http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

Unit 5 Assessment
 Assessment: The Saylor Foundation’s “Unit 5 Assessment”
Link: The Saylor Foundation’s “Unit 5 Assessment”
Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.

Unit 6: Introduction to Advanced Topics
This unit introduces you to a wide range of techniques used in modern statistics to address the rapidly increasing need for analyzing large-scale data and making predictions. Advances in information technology have led to an explosion of data in all aspects of our lives. Consequently, statistics and predictive analytics have become a major driving force for scientific discoveries, technological advances, and improvements in quality of life in the twenty-first century. You will learn about survival analysis, time series analysis, principal component analysis, structural equation models, and support vector machines. While each of the topics discussed in this unit deserves a separate course, the objective of this unit is to acquaint you with these topics and provide some basic guidance so that you can study them in more depth in the future.

6.1 Factor Analysis
 Reading: Lakehead University: Bruce Weaver’s “Chapter 7: Factor Analysis”
Link: Lakehead University: Bruce Weaver’s “Chapter 7: Factor Analysis” (PDF)
Instructions: Download the PDF file “pcafa.pdf” from the webpage (bullet #12 under “My own notes”) and read the whole chapter. In this reading, you will learn about factor analysis and principal component analysis.
Reading this chapter should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: Johns Hopkins School of Public Health: Elizabeth Garrett-Mayer’s “Lecture 8: Factor Analysis I” and “Lecture 9: Factor Analysis II”
Link: Johns Hopkins School of Public Health: Elizabeth Garrett-Mayer’s “Lecture 8: Factor Analysis I” and “Lecture 9: Factor Analysis II” (PDF)
Instructions: Download the PDF files for “Lecture 8: Factor Analysis I” and “Lecture 9: Factor Analysis II.” In these readings, you will learn to identify when a factor analysis is appropriate and when it is not, perform one-factor and multi-factor analyses, and interpret the results of a factor analysis. These readings supplement Dr. Bruce Weaver’s lecture notes on factor analysis.
Reading through these lectures should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: University of Minnesota: Niels Waller’s “The Foundations of Factor Analysis: Factor Analysis in R”
Link: University of Minnesota: Niels Waller’s “The Foundations of Factor Analysis: Factor Analysis in R” (PDF)
Instructions: Download the PDF file for “Factor Analysis in R” located in the first row and third column of the table at the bottom of the webpage. In this reading, you will explore several R functions related to factor analysis.
Reading this assignment should take approximately 2 hours and 30 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Lecture: Stanford University: Andrew Ng’s “Artificial Intelligence/Machine Learning: Lecture 14: Factor Analysis”
Link: Stanford University: Andrew Ng’s “Artificial Intelligence/Machine Learning: Lecture 14: Factor Analysis” (YouTube, iTunes, and MP4)
Instructions: Please watch the video for Lecture 14. In this video, Andrew Ng discusses the derivation for factor analysis as well as principal component analysis. You may also download the transcript of the lecture (PDF or HTML).
Watching this lecture should take approximately 1 hour and 30 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
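Illustration: The first principal component is the direction along which the data vary the most, which is the leading eigenvector of the sample covariance matrix. The following is a minimal sketch of that idea, written in Python rather than R only so that it is self-contained. The function names and the sample data are hypothetical, and power iteration is just one of several ways to approximate the leading eigenvector.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_principal_component(rows, iterations=200):
    """Approximate the leading eigenvector and eigenvalue by power iteration.

    Assumes the covariance matrix is nonzero and that the starting vector
    is not orthogonal to the leading eigenvector.
    """
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data along the direction v (the Rayleigh quotient).
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                   for i in range(p))
    return v, variance

# Hypothetical data lying exactly on the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, variance = first_principal_component(data)
```

For this sample data the direction is proportional to (1, 2), since all of the variation lies along that line. In R, the built-in prcomp function performs a full principal component analysis.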
 Assessment: The Saylor Foundation’s “Factor Analysis Assessment”
Link: The Saylor Foundation’s “Factor Analysis Assessment” (PDF)
Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.1 (PDF).
Completing this assessment should take less than 2 hours.
 Assessment: The Saylor Foundation’s “Principal Component Analysis”
Link: The Saylor Foundation’s “Principal Component Analysis” (PDF)
Instructions: Complete the linked assessment, titled “Principal Component Analysis.” When you are done, check your work against The Saylor Foundation’s “Answer Key for Principal Component Analysis” (PDF) in subunit 6.2.
Completing this assessment should take you no longer than 5 hours. If you have not done so already, go to http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.

6.2 Structural-Equation Models
 Reading: McMaster University: John Fox’s “Structural Equation Models”
Link: McMaster University: John Fox’s “Structural Equation Models” (PDF)
Instructions: Browse to “Lecture Notes and R Scripts” at the bottom of the webpage and download the PDF file for “Structural Equation Models.” This reading will provide an overview of structural-equation models (SEMs). An important feature of SEMs is their ability to deal with a variety of models for the analysis of latent variables. SEMs incorporate independent and dependent variables and latent constructs that clusters of observed variables might represent. SEM enables hypothesis testing when experiments are not possible.
Reading this lecture should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: Carnegie Mellon University: Cosma Shalizi’s “Advanced Data Analysis: Homework 9”
Link: Carnegie Mellon University: Cosma Shalizi’s “Advanced Data Analysis: Homework 9” (PDF)
Instructions: Click on the link above, download the PDF file for Homework 9 (hw09.pdf), and complete all problems in the homework. Follow the instructions for the problems closely. The solutions to the homework are in the PDF file, solutions09.pdf.
Completing this assessment should take approximately 6 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Principal Component Analysis”
Link: The Saylor Foundation’s “Principal Component Analysis” (PDF)
Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key for Principal Component Analysis” (PDF) in subunit 6.2.
Completing this assessment should take you no longer than 5 hours. If you have not done so already, go to http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.
 Assessment: The Saylor Foundation’s “Structural Equation Modeling Assessment”
Link: The Saylor Foundation’s “Structural Equation Modeling Assessment” (PDF)
Instructions: Complete the linked assignment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.2 (PDF).
Completing this assignment should take you no longer than 2 hours.

6.3 Survival Analysis
 Lecture: Medical College of Wisconsin: John Klein’s “Introduction to Survival Analysis”
Link: Medical College of Wisconsin: John Klein’s “Introduction to Survival Analysis” (YouTube and PDF)
Instructions: Click on the lecture “Uses and Abuses of Nonparametric Statistics” and watch the video. You may also want to download the presentation (in PDF) used for the lecture. In this video, John Klein provides an overview of survival analysis, including Kaplan-Meier estimation and competing risks.
Watching this lecture should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Reading: McMaster University: John Fox’s “Survival Analysis”
Link: McMaster University: John Fox’s “Survival Analysis” (PDF)
Instructions: Scroll down to “Lecture Notes and R Scripts” at the bottom of the webpage and download the PDF file for “Survival Analysis.”
Reading these lecture notes should take approximately 3 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Survival Analysis”
Link: The Saylor Foundation’s “Survival Analysis” (PDF)
Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key for Survival Analysis” (PDF) in subunit 6.3.
Completing this assessment should take you no longer than 5 hours. If you have not done so already, go to http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.
 Assessment: The Saylor Foundation’s “Unit 6.3 Assessment”
Link: The Saylor Foundation’s “Unit 6.3 Assessment” (PDF)
Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.3 (PDF).
Completing this assessment should take you no longer than 2 hours.
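Illustration: The Kaplan-Meier estimate covered in this subunit multiplies, over each observed event time, the fraction of at-risk subjects who survive that time: S(t) is the product of (1 - d_i / n_i), where d_i is the number of events and n_i the number still at risk at time t_i. The following is a minimal sketch in Python (not R), with a hypothetical function name and made-up data. It ignores confidence intervals and competing risks.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Observed (uncensored) events exactly at time t.
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical data: six subjects, two of them censored (event = 0).
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored subjects contribute to the at-risk counts up to their censoring time but never trigger a drop in the curve. In R, the survfit function in the survival package computes the same estimate.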
6.4 Multilevel (Hierarchical) Models

6.4.1 Introduction to Multilevel Modeling
 Reading: Harvey Goldstein’s Multilevel Statistical Models: “Chapter 1: Introduction”
Link: Harvey Goldstein’s Multilevel Statistical Models: “Chapter 1: Introduction” (PDF)
Instructions: Click on the link for “Chapter 1: Introduction” and download the PDF file. This chapter will introduce you to multilevel models, which are also known as hierarchical linear models, nested models, mixed models, random-coefficient models, or random-effects models. These types of models account for variation in parameters at multiple levels.
Reading this chapter should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.4.2 A Basic Linear Multilevel Model
 Reading: Harvey Goldstein’s Multilevel Statistical Models: “Chapter 2: The basic linear multilevel model and its estimation”
Link: Harvey Goldstein’s Multilevel Statistical Models: “Chapter 2: The basic linear multilevel model and its estimation” (PDF)
Instructions: Click on the link and download the PDF file for Chapter 2. This chapter will walk you through a simple two-level linear regression model.
Reading this chapter should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Multilevel Modeling”
Link: The Saylor Foundation’s “Multilevel Modeling” (PDF)
Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key for Multilevel Modeling” (PDF) in subunit 6.5.
Completing this assessment should take you no longer than 4 hours. If you have not done so already, go to http://cran.r-project.org to download and install R on your computer. R will be used throughout the course for assignments.
 Assessment: The Saylor Foundation’s “Meta-Analysis Assessment”
Link: The Saylor Foundation’s “Meta-Analysis Assessment” (PDF)
Instructions: Complete the linked assignment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.4 (PDF).
Completing this assignment should take you no longer than 2 hours.
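Illustration: A simple two-level linear regression model of the kind introduced in this subunit can be written as follows. This is a sketch in generic notation, which is an assumption and may differ from the symbols used in the assigned chapter: i indexes level-1 units (for example, students) and j indexes level-2 units (for example, schools).

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2), \\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}.
\end{aligned}
```

Here u_j is a random intercept shared by all units in group j, e_{ij} is the individual-level residual, and the two are assumed independent. The intraclass correlation \rho is the share of the total residual variance that lies between groups.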
6.5 Longitudinal Data Analysis

6.5.1 Overview of Longitudinal Data Analysis
 Reading: Marie Davidian’s “Introduction to Modeling and Analysis of Longitudinal Data”
Link: Marie Davidian’s “Introduction to Modeling and Analysis of Longitudinal Data” (PDF)
Instructions: Click on the link for “Introduction to Modeling and Analysis of Longitudinal Data” (Introductory Lecture Session at the 2006 ENAR Spring Meeting, March 2006) and download the PDF file. The slide deck will provide an overview of longitudinal data analysis.
Reading this lecture should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.5.2 Time-Series Analysis
 Reading: Engineering Statistics Handbook: “Section 6.4: Introduction to Time Series Analysis”
Link: Engineering Statistics Handbook: “Section 6.4: Introduction to Time Series Analysis” (HTML)
Instructions: Read sections 6.4.1 through 6.4.3 on the webpage.
Reading this section should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Activity: Middle East Technical University: G.P. Nason’s “Introduction to R for Times Series Analysis”
Link: Middle East Technical University: G.P. Nason’s “Introduction to R for Times Series Analysis” (PDF)
Instructions: Click on the link above and download the PDF file “Introduction to R for Times Series Analysis” in the “R Help” section. This is a tutorial on time series analysis in R. Follow the instructions given in the tutorial. Pay careful attention to Section 3, which will guide you step by step through fitting autoregressive integrated moving average (ARIMA) models.
Completing this tutorial should take approximately 4 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
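Illustration: An ARIMA(p, d, q) model differences a series d times to remove trend and then fits an autoregressive moving average model to the result. The following is a minimal sketch in Python (not R) of two of those ingredients only: differencing and a least-squares fit of a first-order autoregressive term. The function names and data are hypothetical, and the sketch omits the moving-average part and the likelihood-based estimation that a full ARIMA fit would use.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def fit_ar1(series):
    """Least-squares estimates (intercept, phi) for x_t = c + phi * x_{t-1} + e_t."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    sxy = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = sxy / sxx
    return mean_next - phi * mean_prev, phi

# Hypothetical series with a quadratic trend; differencing twice leaves a constant.
trend = [t * t for t in range(1, 7)]  # 1, 4, 9, 16, 25, 36
stationary = difference(trend, d=2)

# Hypothetical noise-free AR(1) series in which each value is half the previous one.
decay = [16.0, 8.0, 4.0, 2.0, 1.0, 0.5]
intercept, phi = fit_ar1(decay)
```

With real data the differenced series would still contain noise, and the estimate of phi would only approximate the underlying coefficient. In R, the built-in arima function fits the full model.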

6.5.3 Case Study: Nonlinear Mixed Effects Models for Pharmacokinetic and Pharmacodynamic Analysis
 Reading: Marie Davidian’s “An Introduction to Nonlinear Mixed Effects Models and PK/PD Analysis”
Link: Marie Davidian’s “An Introduction to Nonlinear Mixed Effects Models and PK/PD Analysis” (PDF)
Instructions: Click on the link for “An Introduction to Nonlinear Mixed Effects Models and PK/PD Analysis” (ASA Biopharmaceutical Section webinar, April 2010) and download the PDF file. The slide deck will introduce you to the use of multilevel modeling for analyzing longitudinal data in the areas of pharmacokinetics and pharmacodynamics. This case study draws on what you learned in subunit 6.4 and subunit 6.5.1.
Reading this study should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
6.6 Data Mining and Machine Learning

6.6.1 Introduction
 Reading: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 1: Introduction”
Link: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 1: Introduction” (PowerPoint)
Instructions: Click on the link for “Chapter 1: Introduction” and download the slide deck. The slide deck will provide an overview of data mining.
Reading through this slide show should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Lecture: Stanford University: Andrew Ng’s Artificial Intelligence/Machine Learning: “Lecture 1”
Link: Stanford University: Andrew Ng’s Artificial Intelligence/Machine Learning: “Lecture 1” (YouTube, iTunes, and MP4)
Instructions: Watch the video for Lecture 1. In this video, Andrew Ng discusses the basic concepts and applications of machine learning. You may also download the transcript of the lecture (PDF or HTML). This lecture series is one of the best resources on machine learning available on the web. Consider watching other videos in the series if you want to learn more about machine learning.
Watching this lecture should take approximately 1 hour and 15 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.6.2 Classification: Overview and Basics
 Reading: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 8: Classification: Basic Concepts”
Link: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 8: Classification: Basic Concepts” (PowerPoint)
Instructions: Click on the link for “Chapter 8: Classification: Basic Concepts” and download the slide deck. The slide deck will provide an overview of the concepts of classification used in data mining, including supervised versus unsupervised learning and classification versus numeric prediction. You will also learn about basic classification techniques such as decision tree induction and Bayes classification.
Reading this chapter should take approximately 2 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.

6.6.3 Advanced Classification Methods: Bayesian Belief Networks, Neural Networks, and Support Vector Machines (SVMs)
 Reading: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 9: Classification: Advanced Methods”
Link: Jiawei Han, Micheline Kamber, and Jian Pei’s Data Mining: Concepts and Techniques: “Chapter 9: Classification: Advanced Methods” (PowerPoint)
Instructions: Click on the link for “Chapter 9: Classification: Advanced Methods” and download the slide deck. The slide deck will provide an overview of advanced statistical methods developed for classification, including neural networks, support vector machines (SVMs), the k-nearest-neighbor (kNN) algorithm, and genetic algorithms. As you read, consider the following questions: Which scenarios are best for training Bayesian networks? What are the strengths and weaknesses of neural networks as classifiers? In SVM, what is the marginal hyperplane?
Reading this chapter should take approximately 1 hour.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Activity: Technical University of Wien: David Meyer’s “Tutorial for Support Vector Machines in R”
Link: Technical University of Wien: David Meyer’s “Tutorial for Support Vector Machines in R” (PDF)
Instructions: Click on the link above and download the PDF file “svmdoc.pdf”. This is a tutorial on support vector machines in R. Follow the instructions given in the tutorial closely. You will need to install and load the “e1071” package for this tutorial.
Completing this tutorial should take approximately 3 hours.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
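Illustration: For linearly separable training data with class labels y_i in {-1, +1}, the idea of a separating hyperplane and its margin can be sketched as follows. The notation is generic and is an assumption; it may differ from the notation in the slide deck and the tutorial.

```latex
\begin{aligned}
&\text{Separating hyperplane:} && \mathbf{w} \cdot \mathbf{x} + b = 0 \\
&\text{Sides of the margin:} && \mathbf{w} \cdot \mathbf{x} + b = +1
  \quad\text{and}\quad \mathbf{w} \cdot \mathbf{x} + b = -1 \\
&\text{Constraints:} && y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1
  \quad \text{for all } i \\
&\text{Margin width:} && \frac{2}{\lVert \mathbf{w} \rVert} \\
&\text{Training problem:} && \min_{\mathbf{w},\, b} \; \tfrac{1}{2} \lVert \mathbf{w} \rVert^2
  \quad \text{subject to the constraints above}
\end{aligned}
```

Training points that lie exactly on either side of the margin are the support vectors. Minimizing the norm of w under these constraints is equivalent to maximizing the margin width.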

6.6.4 Introduction to Bayesian Inference
 Lecture: Cambridge University: Christopher Bishop’s “Introduction To Bayesian Inference”
Link: Cambridge University: Christopher Bishop’s “Introduction To Bayesian Inference” (Adobe Flash)
Instructions: Watch the video for Lecture 1. In this video, Christopher Bishop provides a brief overview of the past, present, and future of machine learning. You will also learn about Bayesian inference, the foundation of the third generation of machine learning techniques.
Watching this lecture should take approximately 1 hour and 30 minutes.
Terms of Use: Please respect the copyright and terms of use displayed on the webpage above.
 Assessment: The Saylor Foundation’s “Classification Techniques”
Link: The Saylor Foundation’s “Classification Techniques” (PDF)
Instructions: Complete the linked assessment. When you are done, check your work against The Saylor Foundation’s “Answer Key” (PDF) for subunit 6.6 (which was written in R).
Completing this assessment should take you no longer than 4 hours.
 Assessment: The Saylor Foundation’s “Unit 6.6 Assessment”
Link: The Saylor Foundation’s “Unit 6.6 Assessment” (PDF)
Instructions: Complete the linked assignment. When you are done, check your work against The Saylor Foundation’s “Answer Key” for subunit 6.6 (PDF).
Completing this assignment should take you no longer than 2 hours.

Unit 6 Assessment
 Assessment: The Saylor Foundation’s “Unit 6 Assessment”
Link: The Saylor Foundation’s “Unit 6 Assessment”
Instructions: Complete this assessment to gauge your understanding of the materials covered thus far in this course. When you click “submit,” you will be shown the correct answers.

Final Exam
 Final Exam: The Saylor Foundation’s MA251 Final Exam
Link: The Saylor Foundation’s MA251 Final Exam
Instructions: You must be logged into your Saylor Foundation School account in order to access this exam. If you do not yet have an account, you will be able to create one, free of charge, after clicking the link.