Peace.berkeley.edu



Introduction to Probability and Statistics in Biology and Public Health
PH142 (4 units), Fall 2018
Course Syllabus

Instructors

Corinne Riddell, MSc, PhD
Assistant Adjunct Professor of Epidemiology and Biostatistics
University of California, Berkeley
2121 Berkeley Way, Berkeley, CA, 94720
Email: c.riddell@berkeley.edu

Alan Hubbard, PhD
Professor of Biostatistics
University of California, Berkeley
2121 Berkeley Way, Berkeley, CA, 94720
Email: hubbard@berkeley.edu

Office hours with instructors are by appointment.

Graduate Student Instructors (GSIs)

- Sarah Johnson (Head GSI): sarah_johnson@berkeley.edu
- Edie Espejo: espejo@berkeley.edu
- Asem Berkalieva: asem_berkalieva@berkeley.edu
- Phillippe Boileau: phillippe_boileau@berkeley.edu
- Naomi Wilcox: nwilcox01@berkeley.edu

Office hours: Date/time/location to be added to the syllabus during the second week of instruction.

Course Description

This course is an introduction to statistics and data science, primarily for MPH and undergraduate public health majors, and others interested in public health topics. The course can be divided into three parts. In Part I, we will focus on learning to use R to explore and summarize univariate and bivariate distributions. Specifically, we will use the dplyr and ggplot2 packages. Part II of the course introduces classical problems in probability and the Normal, binomial, and Poisson distributions. The most important topic we will cover in Part II is the Central Limit Theorem. In Part III, we introduce statistical inference, the process of estimating statistics from samples to make inference about populations. Throughout the course, we will follow the PPDAC model, which stands for “Problem, Plan, Data, Analysis, and Conclusion”.

Learning Objectives

On the first midterm you may be tested on your ability to:
- Describe distributions of variables visually and calculate summary statistics for measures of centrality and spread
- Determine the appropriate graphic to plot distributions and provide code snippets to manipulate and visualize data frames
- Interpret output from a simple linear regression model

On the second midterm you may be tested on your ability to:
- Compute probabilities using the general rules
- Identify and describe binomial and Poisson random variables
- Use basic properties of the Normal distribution to compute probabilities
- Describe the central limit theorem
- Estimate means, proportions, and differences between means and proportions, compute their confidence intervals, and perform statistical tests
- State the assumptions and importance of the assumptions for statistical tests

On the final exam you may be tested on your ability to:
- Perform a simple chi-squared test
- Perform a matched t-test
- Describe and check the assumptions for simple linear regression. Interpret the confidence interval and statistical test of regression intercept and slope coefficients
- Describe ANOVA, including the null and alternative hypotheses, and interpret output
- Describe when bootstrapping can be used
- Describe a permutation test
- Demonstrate knowledge that has been used throughout the term, in terms of data visualization and data manipulation

Scheduling

Lecture (CC# 28928): MWF 8:10am-9am, 245 Li Ka Shing

Lab

| Section No. | CC# | Date/Time | Location | GSI |
|---|---|---|---|---|
| 101B | 29031 | Th 5:00P-5:59P | 124 Wheeler | Edie |
| 102B | 29032 | W 1:00P-1:59P | 242 Hearst Gym | Naomi |
| 103B | 29033 | Th 5:00P-5:59P | 185 Barrows | Asem |
| 104B | 29034 | W 3:00P-3:59P | 126 Wheeler | Asem |
| 105B | 29035 | W 4:00P-4:59P | NEW LOCATION: 141 Giannini | Sarah |
| 106B | 29036 | W 5:00P-5:59P | NEW LOCATION: 141 Giannini | Sarah |
| 107B | 29037 | Th 8:00A-8:59A | 179 Dwinelle | Phil |
| 108B | 29038 | Th 9:00A-9:59A | 124 Wheeler | Edie |
| 110B | 29040 | Th 4:00P-4:59P | 136 Barrows | Asem |
| 109B | 29039 | Th 12:00P-12:59P | 220 Wheeler | REMOVED |

Discussion

| Section No. | CC# | Date/Time | Location | GSI |
|---|---|---|---|---|
| 101 | 28929 | Th 5:00P-5:59P | 2038 Valley Life Sciences | Naomi |
| 102 | 28930 | Th 4:00P-4:59P | 220 Wheeler | Naomi |
| 103 | 28931 | Th 4:00P-4:59P | 185 Barrows | Phil |
| 104 | 28932 | F 12:00P-12:59P | 242 Hearst Gym | Phil |
| 105 | 28934 | F 1:00P-1:59P | 215 Dwinelle | Naomi |
| 106 | 28987 | F 1:00P-1:59P | 223 Dwinelle | Phil |
| 107 | 28988 | F 1:00P-1:59P | 229 Dwinelle | Sarah |
| 108 | 28989 | F 2:00P-2:59P | 385 LeConte | Asem |
| 109 | 28990 | F 2:00P-2:59P | 124 Wheeler | Sarah |
| 110 | 28991 | F 2:00P-2:59P | 126 Wheeler | Edie |

Software

We will be using R, a statistical programming language, and RStudio, an integrated development environment, on the RStudio Cloud. Here is the link to join our course on the cloud: . Sign up for an RStudio Cloud account today, or before your first discussion section. Use of the cloud is required for homework assignments and lab exercises. Using the cloud requires an internet connection and web browser.

Lecture

The course schedule is included below. Any changes to the schedule will be updated in the syllabus on bCourses. Lecture slides that are created in R markdown will be available on RStudio Cloud, ideally at least 12 hours in advance of the lecture, but this is not guaranteed. Occasionally, slide sets in other formats (pdf and pptx) will be distributed on bCourses, and in this case an announcement will notify you of this. Many lectures, including all of Part I of the course, encourage the use of RStudio Cloud during lecture, lab, and discussion section. Please bring your laptop to class if possible, though take steps to minimize distractions (e.g., turning off notifications on your laptops/phones). Specifically, do not use laptops for entertainment and do not display any material on the laptop which may be distracting or offensive to your fellow students.

Lectures will be webcast, with recordings posted to CalCentral on the Class Page for PH142. All official students will have access to this page via the My Academics tab.

Readings

The course textbook is “The practice of statistics in the life sciences” by Brigitte Baldi and David S. Moore.
The 4th edition is the latest one, but previous editions are fine. The course textbook is on reserve for two-hour time intervals at the Biosciences, Natural Resources, and Public Health Library (2101 Valley Life Sciences Building).

Discussion section and accompanying assignments

Most often, the discussion section will be used to go over the problem sets that will be distributed on RStudio Cloud on Tuesday night or Wednesday morning. While group work is encouraged, students must submit their own code and completed answers for marking. Instructions for submitting assignments will be provided on RStudio Cloud in the first assignment’s folder when it is posted. Discussion section attendance will not be monitored, though it will be your best opportunity for working on the assignment and getting assignment questions answered. Students are encouraged to go to their registered section but may attend other sections if necessary.

Lab section

The lab section will be used to practice R programming skills taught in the previous week and prepare for the assignment. Lab attendance will be monitored, and additional credit may be given to students with high lab attendance if they have done poorly on some graded aspects of the course. Students are encouraged to go to their registered section but may attend other sections if necessary.

Midterms and Final Exam

There are two in-class midterms (September 17 and October 22) and one final exam (December 10th, 7-10pm). If you have a conflict with any of the exam dates, please email the instructor by September 1st so that we can discuss possible accommodations, such as taking the exam early. Please note that only in extremely rare circumstances such as illness (with a doctor's note) will the in-class midterm be given to individual students after the scheduled examination date.
Exams will cover the material presented in lecture, discussion, and lab sections, including R coding syntax, unless otherwise noted.

You can bring one sheet of notes that are hand-written or typed. If typed, the font size must not be less than 10 points. We will attempt to return graded examinations within two weeks after the exam date. Occasionally, students request a review of a graded exam question. Such requests must be detailed and submitted in writing to the GSIs no later than three days after the graded exams have been returned. Note that if you request reconsideration of a graded question, instructors may reconsider grades on other exam questions.

Appropriate accommodations for the midterm will be made for those with disabilities (please refer to the “Disabilities” section, below).

Grading

Course grades are based on the following activities:
- Homework assignments: 20%
- Midterm Exam 1 (Monday, September 17, 8:10am-9am): 20%
- Midterm Exam 2 (Monday, October 22, 8:10am-9am): 20%
- Infographic (due on Friday, November 30th): 10%
- Final Exam (December 10, 7pm-10pm): 30%

S/U (satisfactory/unsatisfactory) grading is permitted for this course. There are no differences in the course requirements or the grading for students who choose this option. “S” will appear on transcripts for grades of “B-” or above.

Student Questions and Concerns

This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the GSIs, and me. Rather than emailing questions to the teaching staff, please post your questions on Piazza. If you have any problems or feedback for the developers, email team@. Find our class page at: .

Posted questions will receive a response within 24 hours. Please do not email course content questions directly to the instructor or GSIs. GSIs will respond to Piazza questions up until 24 hours before exams.
However, students may continue to post and answer each other’s questions in the last 24 hours before an exam.

Questions during lecture, discussion, and lab section are strongly encouraged. If something is unclear to you, it is probably unclear to many others in the room. There may be times, however, when the instructor or the GSI decides that a particular question or discussion is not helpful to the entire class or will take too long to address satisfactorily. In these cases, we may defer the question to be answered outside of class time. If important questions are answered outside of class, the answers will be shared with the entire class on Piazza.

Class Announcements and Resources

Class announcements will be made at the beginning of lecture or discussion sections and on Piazza or bCourses. bCourses will be used to post material that is not posted on RStudio Cloud, such as some lecture slides, and the syllabus. If you need any bCourses material, we will include an announcement to let you know.

Information on bCourses

You can access the bCourses course site through . Enter your CalNet ID and passphrase as your username and password and click Login. Click on the Courses menu tab and then click on PH142 to access the course material. If you are not officially enrolled in the class, please ask your GSI about gaining permission to the site.

Harassment policy

We are all responsible for creating an environment that is welcoming, civil, safe, and tolerant. UC Berkeley does not tolerate harassment of PH142 students, GSIs, or instructors. Instructors and GSIs will act to stop acts of harassment in the classroom.

Students experiencing harassment can contact the Office for the Prevention of Harassment and Discrimination. To file a report, you can email ask_ophd@berkeley.edu or call them at (510) 643-7984. For more information, see: .
Please note that Instructors and GSIs are Responsible Employees and must report incidents of sexual violence and harassment to the Office for the Prevention of Harassment and Discrimination. Please see this website for confidential reporting resources: 

Etiquette

Please arrive on time for class and navigate to RStudio Cloud so that we can begin the lecture promptly at 8:10am. If you need to arrive late or leave early, please sit near an exit to minimize distraction. The instructors will strive to end the lecture exactly at 9:00am so you have time to commute to your next class. If we happen to go over by a minute or two, please do not begin packing up until lecture is finished, as this means the material being covered is important. All email communication should be polite and respectful.

Misconduct

As a member of the campus community, you are expected to demonstrate integrity in all of your academic endeavors and will be evaluated on your own merits. The consequences of cheating and academic misconduct - including failure of the course and a formal discipline file - are simply not worth it. You are expected to abide by the UC Berkeley Honor Code: “As a member of the UC Berkeley community, I act with honesty, integrity, and respect for others.”

Disabilities: The Disabled Students' Program (DSP)

The mission of the Disabled Students' Program (DSP) is to ensure that all students with disabilities have equal access to educational opportunities at UC Berkeley. The DSP offers a wide range of services, accommodations, and auxiliary services for students with disabilities. These services are individually designed and based on the specific needs of each student as identified by DSP's Specialists.

We will accommodate disabled students’ needs according to DSP documentation; please notify the DSP if you require such accommodation (DSP will then contact the instructor).
Note that this may take several weeks, so please initiate this process ASAP so that any accommodations can be implemented in time for the first midterm exam. Steps to the application process: 

Disabled Students’ Program
260 César E. Chávez Student Center, #4250
Berkeley, CA 94720-4250
dsp@berkeley.edu
Voice: (510) 642-0518
TTY: (510) 642-6376
Fax: (510) 643-9686

Course schedule

| Lecture | Date | Topic | Readings |
|---|---|---|---|
| 1 | Wednesday, Aug 22 | Introduction to the course, the cloud, and PPDAC | None |
| 2 | Friday, Aug 24 | Working with data in R and RStudio (dplyr package) | None |
| 3 | Monday, Aug 27 | Visualizing data in R and RStudio (ggplot2 package) | None |
| 4 | Wednesday, Aug 29 | Visualizing distributions for one variable, numerically summarizing spread and central tendency | Chapter 1 & 2 |
| 5 | Friday, Aug 31 | Exploring relationships between two variables | Chapter 3 |
| -- | Monday, Sept 3 | Holiday | -- |
| 6 | Wednesday, Sept 5 | Introduction to Regression | Chapter 4 |
| 7 | Friday, Sept 7 | Two-way tables (Relationships between two categorical variables) | Chapter 5 |
| 8 | Monday, Sept 10 | Samples and observational studies | Chapter 6 |
| 9 | Wednesday, Sept 12 | Designing Experiments | Chapter 7 |
| 10 | Friday, Sept 14 | TBD: Catchup or review | |
| 11 | Monday, Sept 17 | MIDTERM 1 | |
| 12 | Wednesday, Sept 19 | Introduction to probability | Chapter 9 |
| 13 | Friday, Sept 21 | General rules of probability | Chapter 10 |
| 14 | Monday, Sept 24 | The Normal distribution | Chapter 11 |
| 15 | Wednesday, Sept 26 | Discrete probability distributions (binomial and Poisson) | Chapter 12 |
| 16 | Friday, Sept 28 | Sampling distributions for a mean and proportion | Chapter 13 |
| 17 | Monday, Oct 1 | The central limit theorem | Chapter 13 |
| 18 | Wednesday, Oct 3 | Intro to confidence intervals and significance testing | Chapter 14 |
| 19 | Friday, Oct 5 | Power, type I and type II error, sample size | Chapter 15 |
| 20 | Monday, Oct 8 | Inference for a population mean | Chapter 17 |
| 21 | Wednesday, Oct 10 | Comparing two means | Chapter 18 |
| 22 | Friday, Oct 12 | Inference for a population proportion | Chapter 19 |
| 23 | Monday, Oct 22 | MIDTERM 2 | |
| 24 | Wednesday, Oct 24 | Comparing two proportions | Chapter 20 |
| 25 | Friday, Oct 26 | Matched comparisons | Chapter 18 |
| 26 | Monday, Oct 29 | Bootstrapping confidence intervals | None |
| 27 | Wednesday, Oct 31 | Comparing two proportions part II (RR, OR, RRR, ARR, NNT) | Chapter 20 |
| 28 | Friday, Nov 2 | The Chi-square test for goodness of fit | Chapter 21 |
| 29 | Monday, Nov 5 | The Chi-square test for two-way tables | Chapter 22 |
| 30 | Wednesday, Nov 7 | Permutation tests | None |
| 31 | Friday, Nov 9 | Inference for regression I | Chapter 23 |
| -- | Monday, Nov 12 | Holiday | -- |
| 32 | Wednesday, Nov 14 | Inference for regression II | Chapter 23 |
| 33 | Friday, Nov 16 | Comparison of many means (ANOVA) | Chapter 24 |
| 34 | Monday, Nov 19 | ANOVA II/Tukey's HSD | Chapter 24 |
| -- | Wednesday, Nov 21 | Holiday | -- |
| -- | Friday, Nov 23 | Holiday | -- |
| 35 | Monday, Nov 26 | Review for the final examination | |
| 36 | Wednesday, Nov 28 | Review for the final examination | |
| 37 | Friday, Nov 30 | Review for the final examination | |