General Information
Time: Tu-Th 8:00-9:30 AM
Location: Berkeley Way West 1213
Sections: F 10-11 BWW 1213
Note: Students must come to class and section with a laptop.
Instructor: Theunissen, Frédéric
Office Hours: Th 9:45-10:45 or by appointment.
GSI: Foushee, Ruthe
Office Hours: M 2:30-3:30 or by appointment.
Course Description.
The course is an introduction to modern statistical modeling with a focus on applications in the Social Sciences. Topics covered include: basic probability theory, distributions, modeling, parameter fitting, error estimation, statistical significance and cross-validation. In addition, the course will cover all statistical tests that are part of the generalized mixed-effects models: n-way analysis of variance (ANOVA), multiple regression, analysis of covariance, logistic regression, between-subjects, within-subjects, and mixed designs, and designs with random factors. Students will also be introduced to statistical programming using the computer language R.
Course Logistics, Assignments and Tests.
The course meets on Tuesdays and Thursdays 8-9:30 AM in 1213 Berkeley Way West for sessions that will combine lecture material and hands-on computer tutorials. These computer tutorials will be “unfinished” and your weekly homework will be to finish them. Additional pencil-and-paper assignments will also be given. Tutorials/homeworks will be assigned on Friday (or before) and will be due the following Friday (with some exceptions as indicated in the syllabus or bCourses announcements).
Section meets on Fridays from 10:00 to 11:00 AM, also in 1213 Berkeley Way West. Attending one of these sections is mandatory. During section you will receive additional tips and guidance for your homework. If you are efficient, you will be able to finish the homework during section.
Tests include two midterms and one final exam. The midterms will be held during class time and last 90 minutes. The final will be given during finals week and will last 3 hours.
There will be a final project that involves a detailed analysis of a data set chosen by the student.
Grades will be based on homework (25%), tests (50%: each midterm 10%, final 30%), final project (20%) and class participation (5%).
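As a rough illustration of how these weights combine, here is a minimal R sketch. The weights come from the breakdown above; the scores are made up, and the 0-100 scale for each component is an assumption, not course policy.

```r
# Weights taken from the grading breakdown (they sum to 1).
weights <- c(homework = 0.25, midterm1 = 0.10, midterm2 = 0.10,
             final = 0.30, project = 0.20, participation = 0.05)
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Hypothetical scores on an assumed 0-100 scale, for illustration only.
scores <- c(homework = 90, midterm1 = 80, midterm2 = 85,
            final = 75, project = 88, participation = 100)

# Weighted sum; indexing by name keeps weights and scores aligned.
course_grade <- sum(weights * scores[names(weights)])
course_grade  # 84.1
```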
The psychology department conducts teaching evaluations online. The evaluations will be available between weeks 14 and 15. The evaluations are an important tool for improving the quality of our teaching, and all students are expected to participate.
Learning Goals.
This quantitative requirement for Psychology Honors students addresses facets of each of the seven program learning goals of the Psychology Major at UC Berkeley, with particular emphasis on Program Learning Goals #4, 5 and 6.
- Understand basic concepts that characterize psychology as a field of scientific inquiry, and appreciate the various subfields that form the discipline as well as things that differentiate it from other related disciplines. Scientific inquiry in the Social Sciences, and in Psychology in particular, is based on the formulation of statistical models. Each scientific hypothesis corresponds to a particular model, and hypothesis testing involves comparing models in terms of their predictive power. The field of psychology, because of the complexity of the data it attempts to explain, relies heavily (and more so than other biological disciplines) on statistical modeling and other quantitative approaches. Students who desire to pursue a scientific career in psychology need to be well trained in these methods.
- Develop an understanding of the central questions/issues in contemporary psychology as well as a historical perspective of psychological theories and key empirical data. In this class, students will learn the current approaches in statistical modeling, but these will be related to the more traditional statistics that have been used in the field in prior years, giving the students a historical perspective.
- Develop a thorough understanding of one of the major content areas of psychology (i.e., Social/Personality, Developmental, Clinical, Cognitive, Biological). Although we may use examples from different areas of psychology, students will not gain a thorough understanding of these content areas in this course.
- Develop skills to critically evaluate the presentation of scientific ideas and research in original scientific papers as well as in the popular media. In this course, students will learn not only how to formulate competing hypotheses and generate the corresponding statistical models but also how to best interpret the results from these quantitative analyses so that they can be communicated in written form in publication format or in spoken form for presentations. These skills are critical for evaluating the scientific work and conclusions of experts in the field and others.
- Become familiar with research methods used in psychological research, and become proficient in basic concepts of statistical analyses and familiar with more advanced methods in data analyses and modeling. This is the central learning goal of this class.
- Learn to develop, articulate, and communicate, both orally and in written form, a testable hypothesis, or an argument drawing from an existing body of literature. Students will not make a formal oral presentation during this class but will be asked during lecture and section to orally explain their results and reasoning. The final written project is designed to teach how to write up quantitative analyses and statistical reasoning within a longer manuscript analyzing a particular question in the field of psychology.
- Apply a psychological principle to an everyday problem, or take an everyday problem and identify the relevant psychological mechanisms/issues. This learning goal will not be emphasized in the class, but students will learn how to formulate a psychological principle in terms of a particular model. This formulation is key to identifying how particular problems observed in the student’s everyday life could be analyzed.
Academic Integrity.
https://sa.berkeley.edu/conduct/integrity
Prerequisites.
Undergraduate Statistics for Psychology. Very basic elements of calculus and linear algebra will be used in the course, but they will be reintroduced as needed.
Textbooks.
Required:
An R Companion to Applied Regression. John Fox. Sage Publications. Third Edition. (electronic version is available).
Recommended:
Generalized Linear Models with Examples in R. P. Dunn, G. Smyth. Springer.
All of Statistics: A Concise Course in Statistical Inference. L. Wasserman. Springer.
Applied Regression Analysis and Generalized Linear Models. John Fox. Sage Publications. Third Edition.
Class Schedule
Week 1.
Tuesday. Jan 21st. Class Logistics: Syllabus, Installing R, RStudio, Jupyter. Math Review.
Thursday. Jan 23rd. Basics of Probability Theory. Likelihood.
Homework: Read Ch. 1 of Fox for week 2.
Week 2.
Tuesday Jan 28th. Important Probability Distributions. Intro to R (Ch 1)
Thursday Jan 30th. Standard Error of measurements. Intro to resampling in R.
Homework: Tutorial 1. SEM exercise (due week 3).
Week 3.
Tuesday Feb 4th. Maximum Likelihood estimation for linear models. Derivation of Normal Equation for bivariate regression. Introduction to the General Linear Model.
Thursday Feb 6th. General form of Normal Equation. Interpretation of parameters in Multivariate Regression and ANCOVA.
Reading: Fox R Ch. 4. Fitting linear models (Sections 4.1-4.2)
Fox T. Ch 5 and Ch 6.
Homework: Tutorial 2 and Homework 1. (due week 4)
Week 4.
Tuesday Feb 11th. Theory and Practice in R with lm(): ANOVA, MANOVA, ANCOVA.
Thursday Feb 13th. Theory and Practice in R with lm(): Non-linear fits. Interpreting coefficients (review) and probability distributions (review).
Reading: Fox R Ch. 4. (Section 4.3) Analysis of Variance Models
Fox T Ch. 7 and Ch 8
Homework: Tutorial 3.
Week 5.
Tuesday Feb 18th. Goodness of fit of linear models. R², adjusted R², Cross-validation. Hypothesis testing and model comparison. Nested models. Likelihood Ratio test. F-test.
Thursday Feb 20th. Finishing F-test.
Reading: Same as last week.
Homework: Tutorial 4.
Week 6.
Tuesday Feb 25th. Cross-validation.
Thursday Feb 27th. Midterm 1.
Reading: Fox T. Ch. 21.
Week 7.
Tuesday March 3rd. Cross-validation, Bootstrap and Jackknife.
Thursday March 5th. Permutation Tests. Comments on overfitting and Bayesian Regression (TO BE CONTINUED).
Week 8.
Tuesday March 10th. Introduction to the Generalized Linear Model. Poisson and Binomial Distribution.
Thursday March 12th. GLM: Logistic Regression. Interpreting coefficients. glm() in R.
Reading: Fox R. Ch. 5. Fitting Generalized Linear Models.
Fox T. Ch 14. Logit and Probit Models
Week 9.
Tuesday March 17th. Model Validation in glm(). Deviance.
Thursday March 19th. Model Comparison in glm(). Likelihood Ratio test and Chi-square test. Review of lm() and glm(). Example of Poisson Regression.
Reading: Same as week 8.
Week 10.
Tuesday March 31st. Midterm 2.
Thursday April 2nd. Introduction to Hierarchical Linear Models. Mixed or Random Effects.
Reading: Additional reading will be assigned for this section.
Week 11. April 7th-9th. Finishing Hierarchical Linear Models. The R psych package.
Week 12. April 14-16. Introduction to Multivariate Statistics. Principal Component Analysis, Independent Component Analysis, Factor Analysis.
Additional reading will be assigned for this section.
Week 13. April 21-23. Supervised and Unsupervised Classifiers. LDA (MANOVA), QDA, Random Forest, K-means, Mixture of Gaussians.
Additional reading will be assigned for this section.
Week 14. April 28-30. Intro to latent variable modeling. Review.
RRR Week. May 4th-8th.
Final: Thursday May 14, 7-10 PM.
Final Project.
The final project involves modeling and analyzing a data set of your choice, ideally your own or one from your lab. The final project is due at the beginning of the final examination.
Final Project Outline.
Introduction. State the question/hypothesis that led to the data acquisition, with a brief theoretical framework. Finish this introduction with a statement that says how this new data set addresses the question. This introduction should be one or two paragraphs and include 2-5 critical references. (10 points)
Methods. Describe how the data were collected and the subject pool. One short paragraph. (5 points)
Results/Data Display. Create figures to illustrate the major patterns in the data. The axes of the figures should be labeled and each figure needs a title. Each figure will also need a short figure legend that you can write in your word processing software. I expect 2 to 5 figures. Embed the figures in your text. (20 points including the R code, which will be in the Appendix)
Model Description. State what your competing models are, both in English and with equations. The competing models can be just the null model (the mean) and a particular model that corresponds to your hypothesis, as in a classical hypothesis testing procedure. But you can also decide that it is more relevant to compare different complex models. (10 points)
Model Fitting. Fit the parameters in your models and give short interpretations for the values of the coefficients. (10 points including R code)
Classical Statistical Analysis. Perform the classical statistical analysis using the appropriate R commands. You might combine this section with the previous one, but in that case remember to clearly separate the interpretation of the parameters from the statistical analysis. Additional figures might be helpful but are not required. (10 points including R code)
Statistical Analysis by Resampling. Use resampling or cross-validation in R to perform a second statistical test. Additional figures might be helpful here as well but are not required. (25 points including R code)
Conclusion. Write a brief conclusion of your modeling efforts, the statistical analysis and the implications for the question that was raised in the introduction. One or two paragraphs. (10 points)
Appendix. R code.
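To give a sense of how the Model Fitting, Classical Statistical Analysis, and Statistical Analysis by Resampling steps might fit together, here is a minimal R sketch. It is only one possible approach under stated assumptions: the data are simulated placeholders (not from any real study), the variable names x and y are hypothetical, and a permutation test on a regression slope is just one of several resampling methods you could choose.

```r
# Hypothetical example data: simulated, NOT from any real study.
set.seed(1)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 0.5 * dat$x + rnorm(n)

# Model fitting: null model (the mean) vs. a model with one predictor.
fit_null <- lm(y ~ 1, data = dat)
fit_full <- lm(y ~ x, data = dat)
coef(fit_full)  # intercept and slope estimates to interpret

# Classical analysis: F-test comparing the two nested models.
classical <- anova(fit_null, fit_full)
p_classical <- classical[["Pr(>F)"]][2]

# Resampling analysis: permutation test on the slope.
# Shuffling y breaks any relation to x, which gives a null distribution
# of slopes to compare against the observed slope.
n_perm <- 2000
obs_slope <- coef(fit_full)[["x"]]
perm_slopes <- replicate(n_perm, {
  coef(lm(sample(y) ~ x, data = dat))[["x"]]
})
# Two-sided p-value; the +1 terms count the observed slope itself.
p_perm <- (sum(abs(perm_slopes) >= abs(obs_slope)) + 1) / (n_perm + 1)

c(classical = p_classical, permutation = p_perm)
```

With your own data, you would replace the simulated data frame with your data set and the formulas with the competing models from your Model Description.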