Statistics & Machine Learning
- COS 513/SML 513: Foundations of Probabilistic Modeling
  This course covers fundamental topics in probabilistic modeling, an important area of machine learning research. We learn how to model data and develop algorithms to learn the structure underlying these data for the purpose of prediction and decision-making. We cover several model classes (conditional and unconditional models) and several inference algorithms, including variational inference, the algorithm behind variational auto-encoders. At the end of the course, students should be well-equipped to come up with a probabilistic model and inference algorithm for their data, and use the fitted model for tasks of interest.
- PHI 543/SML 543: Machine Learning: A Practical Introduction for Humanists and Social Scientists
  Machine learning, especially deep learning, is opening new horizons for research in the humanities and social sciences. This course offers a practical introduction to deep learning for graduate students, without assuming calculus, linear algebra, or prior experience with coding. By the end of the course, students are able to code a variety of models themselves, including language and image recognition models, and gain an appreciation for the uses of machine learning in the humanities and social sciences. The course thus aims to support graduate students' professional development and is correspondingly offered in partnership with GradFUTURES.
- SML 201: Introduction to Data Science
  Introduction to Data Science provides a practical introduction to the burgeoning field of data science. The course introduces students to the essential tools for conducting data-driven research, including the fundamentals of programming techniques and the essentials of statistics. Students will work with real-world datasets from various domains; write computer code to manipulate, explore, and analyze data; use basic techniques from statistics and machine learning to analyze data; learn to draw conclusions using sound statistical reasoning; and produce scientific reports. No prior knowledge of programming or statistics is required.
- SML 301: Data Intelligence: Modern Data Science Methods
  This course provides the training for students to be independent in modern data analysis. The course emphasizes the rigorous treatment of data and the programming skills and conceptual understanding required for dealing with modern datasets. The course examines data analysis through the lens of statistics and machine learning methods. Students verify their understanding by working with real datasets. The course also covers supporting topics such as experiment design, ethical data use, best practices for statistical and machine learning methods, reproducible research, writing a quantitative research paper, and presenting research results.
- SML 310: Research Projects in Data Science (A)
  A project-based seminar course in which students work individually or in small teams to tackle data science and machine learning problems, working with real-world datasets. The course emphasizes critical thinking about experiments and large dataset analysis and the ability to clearly communicate one's research. This course is intended to support students in developing the analytical skills necessary for quantitative independent work; students should consult with their home department about how this course could appropriately complement, but not replace, their independent work requirements.
- SML 312: Research Projects in Data Science (B)
  A project-based course in which students work individually or in small teams to tackle data science and machine learning problems, working with real-world datasets. The course emphasizes critical thinking about experiments and dataset analysis and the ability to clearly communicate one's research. Programming components are taught in Python. Experience in only one of the two programming languages (R and Python) is required. This course is intended to support students in developing the analytical skills for quantitative independent work; students should consult with their home department about how this course could complement, not replace, their independent work requirements.
- SML 320: Bayesian Analysis
  This course provides an introduction to Bayesian analysis, a powerful statistical framework for making inferences and modeling uncertainty in a wide range of applications. Students will explore the fundamental principles of Bayesian statistics, probability theory, Bayesian inference, and practical applications of Bayesian modeling. The course will cover both the theory and hands-on implementation using data science software and the R programming language.
- SML 354/PHI 354: Artificial Intelligence: A Hands-on Introduction from Basics to ChatGPT
  This course offers an introduction to deep learning, the core technology behind most modern AI applications, aimed at students with minimal coding experience or mathematical background. Emphasis will be placed on gaining a conceptual understanding of deep learning models and on practicing the basic coding skills required to use them in simple contexts. By the end of the course, students will be able to understand, code, and train a variety of basic deep learning models, including basic neural nets, image recognition models, and natural language processing models. As a capstone, students will build their own tiny GPT-style text generator.
- SML 505/AST 505: Modern Statistics
  The course provides an introduction to modern statistics and data analysis. It addresses the question, "What should I do if these are my data and this is what I want to know?" The course adopts a model-based, largely Bayesian, approach. It introduces the computational means and software packages to explore data and infer underlying parameters from them. Emphasis will be put on streamlining model specification and evaluation by leveraging probabilistic programming frameworks. The topics are exemplified by real-world applications drawn from across the sciences.
- SML 510: Graduate Research Seminar
  This course is for graduate students enrolled in the CSML Graduate Certificate Program and is part of the certificate requirements. Students enrolled in the certificate must enroll, attend, and present their research during at least one semester. Each week features a presentation by a student, invited faculty, or external visitors. All students are required to read materials prior to the workshop and come prepared to engage in conversation. In weeks when a student presents, a second student introduces the speaker and gives background on the work, and a third student moderates the post-presentation discussion.