Statistics & Machine Learning

SML 201: Introduction to Data Science
Introduction to Data Science provides a practical introduction to the burgeoning field of data science. The course introduces students to the essential tools for conducting data-driven research, including the fundamentals of programming techniques and the essentials of statistics. Students will work with real-world datasets from various domains; write computer code to manipulate, explore, and analyze data; use basic techniques from statistics and machine learning to analyze data; learn to draw conclusions using sound statistical reasoning; and produce scientific reports. No prior knowledge of programming or statistics is required.

COS 302/SML 305: Mathematics for Numerical Computing and Machine Learning
This course provides a comprehensive and practical background for students interested in continuous mathematics for computer science. The goal is to prepare students for higher-level subjects in artificial intelligence, machine learning, computer vision, natural language processing, graphics, and other topics that require numerical computation. This course is intended for students who wish to pursue these more advanced topics, but who have not taken (or do not feel comfortable with) university-level multivariable calculus (e.g., MAT 201/203) and probability (e.g., ORF 245 or ORF 309). See "Other Information".

SML 310: Research Projects in Data Science
A project-based seminar course in which students work individually or in small teams to tackle data science and machine learning problems, working with real-world datasets. The course emphasizes critical thinking about experiments and large dataset analysis and the ability to clearly communicate one's research. This course is intended to support students in developing the analytical skills necessary for quantitative independent work; students should consult with their home department about how this course could appropriately complement, but not replace, their independent work requirements.

SML 480: Pedagogy of Data Science
In this seminar, we will explore the pedagogy of introductory data science. Students in the seminar will be required to work as undergraduate course assistants in SML 201: Introduction to Data Science. SML 201 topics will be discussed in more depth in the seminar, with a view to teaching the basic material. We will discuss literature in the pedagogy of computer science and statistics. Discussion topics will include teaching programming using the functional programming paradigm, the design of the dplyr package, simulation-based inference, teaching statistics using simulation-based inference, the grammar of graphics, and causal inference.

SML 510: Graduate Research Seminar
This course is for graduate students enrolled in the CSML Graduate Certificate Program and is part of the certificate requirements. Students enrolled in the certificate program must enroll, attend, and present their research during at least one semester. Each week features a presentation by a student, an invited faculty member, or an external visitor. All students are required to read materials prior to the workshop and come prepared to engage in conversation. Each week, one student presents, a second student introduces the speaker and gives background on the work, and a third student moderates the post-presentation discussion.

SML 515/AST 515: Topics in Statistics and Machine Learning: Statistical Data Analysis
The course provides an introduction to modern data analysis and data science. It addresses the central question, "What should I do if these are my data and this is what I want to know?" The course covers basic and advanced statistical descriptions of data. It also introduces the computational means and software packages to explore data and infer underlying structural parameters from them. The topics are exemplified by real-world applications. Prerequisites are linear algebra, multivariate analysis, and a familiarity with basic statistics and programming (ideally in Python).