Support hours

KN95 mask required

Prof Bailey

Tuesdays, 2 – 4 p.m.

Thursdays, 10:15 – 11:15 a.m.

SMUD 403

SDS Fellows

Sunday through Thursday, 7 – 9 p.m.

SCCE E208

Class meetings

KN95 mask required

Section 1

TR 8:30 – 9:50 a.m.

WEBS 102

Section 2

TR 11:30 a.m. – 12:50 p.m.

WEBS 102

Description

Computational data analysis is an essential part of modern statistics and data science. This course provides a practical foundation for students to think with data by participating in the entire data analysis cycle. Students will generate statistical questions and then address them through data acquisition, cleaning, transforming, modeling, and interpretation. This course will introduce students to tools for data management, wrangling, and databases that are common in data science and will apply those tools to real-world applications. Students will undertake practical analyses of large, complex, and messy data sets leveraging modern computing tools.

Prerequisite: STAT 111 or STAT 135 (Intro Stat) and COSC 111 (Intro CS), or instructor consent.

Learning goals

By the end of this course, you will be able to

  • demonstrate comprehensive knowledge of data wrangling using a novel dataset, including gathering, reshaping, and cleaning data, with a reproducible workflow

  • uncover patterns and narratives across a variety of data types including spatial data, textual data, and network data

  • articulate a novel question that you can address with data and recognize when a question cannot be answered with data (or cannot be answered with the data at hand)

  • communicate data narratives via effective visualizations, writing, and oral presentation, including acknowledging the limitations of the data and what the data cannot tell us

  • support your peers, serve as resources for one another, and recognize the value of collaboration and teamwork

  • identify some of the ethical considerations in data science, and contribute thoughtful opinions to current discussions in this area

Approach

This is an active learning course that requires wrestling with the material both in and out of class. Only 3 of the 12 hours that students are expected to dedicate to this course occur in person, so we will dedicate those precious few hours to active engagement with the material and timely feedback from me.

The typical flow will be:

  1. complete assigned material, take notes on your own before synchronous sessions, and ask clarifying questions in office hours, on the class discussion board, or in class

  2. actively problem-solve with peers on in-class labs

  3. demonstrate individual comprehension through homework, reflections, and projects

Required material

Textbook

We will use Modern Data Science with R (2nd Edition) by Baumer, Kaplan, and Horton as our primary source for learning outside of class time. You can access the text for free online or purchase a new or used copy from an online book vendor.

Three copies of the textbook are available on reserve at the circulation desk of the Keefe Science Library (to your left when you enter the first floor of the Science Center) and can be checked out for 4 hours at a time.

Computer

We will be working in RStudio all semester, and this work cannot be done on a tablet or cell phone (Chromebooks may also pose a challenge). Please bring your computer and charger to class each day so you are prepared to work in RStudio. If you are struggling with connectivity or other device issues (e.g., broken computer and need a loaner) please reach out to me as soon as possible and we will work with IT to get you what you need.

Technology

GitHub:

Our primary tool for communication, assignment distribution, collaborative coding, and file version control. There are many GitHub interfaces, and this class will focus primarily on using GitHub within RStudio. Guidance will be provided for getting set up and using GitHub.

  • All questions/comments should be posted in the GitHub Discussion Board unless they cannot be shared with class members.

  • Only use email or private messages for things that cannot be shared with the class (accommodations, grades, absences, interpersonal concerns, etc.). When emailing, please include “STAT 231” at the start of the subject line.

Gradescope:

Our primary tool for assignment submission and feedback. Log in via SSO.

RStudio:

We will use R software with the RStudio interface throughout the course for statistical analysis. While it is possible to use R through a version of RStudio accessible on the web at r.amherst.edu, I expect you to install R and RStudio on your own machine, if possible. R, RStudio, and the necessary TeX package are all freely available. Check the Resources menu for more.

Course components

Component        Percent   Evaluation
Reading Sets     5         Completion
Labs             5         Participation
Practice Sets    15        Correctness
Project 1        20        Rubric provided
Project 2        25        Rubric provided
Project 3        30        Rubric provided
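
As a rough illustration of how these percentages could combine, here is a minimal Python sketch. It assumes the final grade is a simple weighted average of component scores, each on a 0 to 100 scale, which the table implies but does not state. The function name `weighted_grade` and the example scores are hypothetical and are not part of the course materials.

```python
# Weights copied from the Course components table (percent of final grade).
WEIGHTS = {
    "Reading Sets": 5,
    "Labs": 5,
    "Practice Sets": 15,
    "Project 1": 20,
    "Project 2": 25,
    "Project 3": 30,
}


def weighted_grade(scores):
    """Return the weighted average of component scores (each 0-100).

    Assumption: the final grade is a plain weighted average of the
    components; the syllabus lists the weights but not the formula.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())  # 100 for the table above
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Hypothetical scores, for illustration only.
example_scores = {
    "Reading Sets": 100,
    "Labs": 100,
    "Practice Sets": 90,
    "Project 1": 80,
    "Project 2": 90,
    "Project 3": 85,
}

print(weighted_grade(example_scores))  # prints 87.5
```

Under this assumption, the three projects together account for 75 percent of the grade, so project scores dominate the result.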

Active reading (1–3 hours)

Active reading, note-taking, and constructive learning strategies are expected of each student. To assist with these efforts, we will discuss and emphasize strategies for active reading, effective note-taking, and metacognitive learning, and additional resources for success will be available on the course website.

After actively reading the assigned material, check your initial understanding of the material by completing the Reading Set for the week, a set of problems designed to promote active reading of the textbook. The Reading Sets should be done on your own as you initially wrestle with the material.

Assigned readings and Reading Sets should be completed by 10 p.m. ET on Mondays. This will give you the foundation you need to work through our labs on Tuesdays and Thursdays.

Labs (class time)

There are a number of ways to stay engaged with the course (some less optional than others), including maintaining an active presence in class and on GitHub Discussions (engaging with peers by asking and answering questions, sharing content), regularly pulling and pushing content on GitHub, completing assignments, and interacting with me or the SDS Fellows.

In-class time may involve active problem-solving, data analysis, project work, or discussion of assigned readings. There will be a mixture of small-group activities, whole-class discussions, and demo-based sessions led by me.

Practice sets (3–5 hours)

During the week, get further practice with the material by working through the Practice Set, a set of problems designed to give you practice beyond the examples produced in the text. You may work through these problems with peers, but all work must be completed by you (see Honor Code) and you must indicate who you worked with. Even then, the best approach here is to try the problems on your own before discussing them with peers, and then write your final solutions yourself.

Complete any assigned Practice Sets by Fridays at 10 p.m. ET.

  • I encourage you to start the assignments the day they are available. You can resubmit your work on Gradescope as many times as you want before the deadline, so you should not wait until the last minute to submit some version of your work.

  • When you upload your work to Gradescope, make sure you have selected all pages that correspond to each problem. You will not get credit for work that is not assigned to a particular problem.

  • After feedback has been provided, you should review and reflect on the feedback, perhaps in consultation with any provided solutions, me, and/or the SDS Fellows to help you figure out how to improve.

Projects

There will be three projects throughout this class: one individual and two group projects. Details will be provided.

Weekly planning

Here is how you might plan your time on this course in a typical week.

Friday through Monday

  • 2–5 hours: Actively read the textbook chapter(s) and work on the Reading Set for the week
  • Turn in the Reading Set by 10 p.m. ET on Monday

Tuesday

  • 80 minutes: Attend the class session
  • Begin working on the Practice Set for the week

Wednesday

  • Continue working on the Practice Set for the week

Thursday

  • 80 minutes: Attend the class session
  • Continue working on the Practice Set for the week

Friday

  • Turn in the Practice Set by 10 p.m. ET on Friday
  • Reflect on what you have learned so far and what you are still struggling with

Support and resources

My pledge to support you

I strive to make this course a place where all students are welcome and all students can thrive. I look forward to working with you to understand your needs and support your academic success. If you would like to discuss your learning needs with me, please schedule a meeting.

Your mental and physical health are foundational to your overall success and take priority over academic performance. We can work together to make adjustments to the course so that you can take care of personal needs while also demonstrating your knowledge of the material. Please let me know as soon as something arises so we can work together to ensure your success in the course. I do not expect you to disclose personal details—just make me aware when flexibility might be needed so we can work something out. It is much harder to make adjustments weeks or months after the fact, and I may not be able to do so.

Additional support services are available on campus, including your Class Dean, the Counseling Center, and student resource centers.

College accommodations

The college will provide accommodations and services after a student has completed the interactive process with Accessibility Services and is deemed eligible. Accommodations include but are not limited to flexibility with exams (e.g., extra time, reduced distraction, breaks, staggering), assistive devices and services (e.g., note takers), alternative formats for course materials (e.g., large print, braille), and flexibility with attendance. You can reach Accessibility Services via email at accessibility@amherst.edu or via phone at 413-542-2337. Please meet with me to discuss the best implementation of your accommodations once you have them in place. College accommodations can only be implemented after you elect them for the course and notify the instructor.

Honor code and academic integrity

“Every person’s education is the product of their intellectual effort and participation in a process of critical exchange. Amherst College cannot educate those who are unwilling to submit their own work and ideas to critical assessment. Nor can it tolerate those who interfere with the participation of others in the critical process. Therefore, the College considers it a violation of the requirements of intellectual responsibility to submit work that is not one’s own or otherwise to subvert the conditions under which academic work is performed by oneself or by others.”

This course is filled with collaborative work, and you are expected and encouraged to work together with a partner or in small groups to study, complete assignments, and prepare for exams. However, everything you write must be your own. Copying or “slightly rewording” sentences, paragraphs, or blocks of R code from another student is not acceptable and will receive a penalty. No interaction with anyone but the instructor is allowed on any exams or quizzes. Cases of dishonesty, plagiarism, etc., will be reported, per the full statement of the Amherst College Honor Code. If you are not sure if you have plagiarized any material, a handy diagram is available to help you out.

Note on recording

Audio and/or video recording of classes without advance approval from the instructor or an approved disability accommodation is prohibited under the Student Code of Conduct. Any other audio and/or video recording of any individual without that individual’s knowledge or permission (see Massachusetts General Law Part 4, Title I, Chapter 272, Section 99) is also not allowed under the code.