Chapter 1: Introduction
Machine Learning from Human Preferences explores the challenge of efficiently and effectively eliciting preferences from individuals, groups, and societies and embedding them within AI systems.
We focus on statistical and conceptual foundations and strategies for interactively querying humans to elicit information that improves learning and applications.
This class is not exhaustive!
Feedback can be included at any step of training
Feedback-Update Taxonomy
| | Dataset Update | Loss Function Update | Parameter Space Update |
|---|---|---|---|
| Domain | Dataset modification, Augmentation, Preprocessing, Data generation from constraint, Fairness, Weak supervision, Use unlabeled data, Check synthetic data | Constraint specification, Fairness, Interpretability, Resource constraints | Model editing, Rules, Weights, Model selection, Prior update, Complexity |
| Observation | Active data collection, Add data, Relabel data, Reweighting data, Collecting expert labels, Passive observation | Constraint elicitation, Metric learning, Human representations, Collecting contextual information, Generative factors, Concept representations, Feature attributions | Feature modification, Add/remove features, Engineering features |
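
The three columns of the taxonomy can be made concrete with a toy example. The sketch below is illustrative only and is not taken from the course materials: the function `fit_slope`, its parameters, and the data are all hypothetical. It fits a one-parameter model y ≈ w·x and shows one possible hook for each kind of update: per-example weights (dataset update, as in "Reweighting data"), a penalty term (loss function update, as in "Constraint specification"), and bounds on the parameter (parameter space update, as in "Rules").

```python
def fit_slope(xs, ys, weights=None, l2_penalty=0.0, bounds=None):
    """Fit y ~ w * x by weighted, penalized least squares (closed form).

    All three optional arguments are hypothetical feedback hooks:
      weights    -- dataset update: how much to trust each example
      l2_penalty -- loss function update: extra term l2_penalty * w**2
      bounds     -- parameter space update: (low, high) limits on w
    """
    if weights is None:
        weights = [1.0] * len(xs)
    # Minimizer of sum_i s_i * (y_i - w * x_i)**2 + l2_penalty * w**2.
    numerator = sum(s * x * y for s, x, y in zip(weights, xs, ys))
    denominator = sum(s * x * x for s, x in zip(weights, xs)) + l2_penalty
    w = numerator / denominator
    if bounds is not None:
        # For a one-dimensional convex loss, clipping the unconstrained
        # minimizer to the interval gives the constrained minimizer.
        low, high = bounds
        w = min(max(w, low), high)
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 0.0]  # assume a human flags the last label as unreliable

baseline = fit_slope(xs, ys)
reweighted = fit_slope(xs, ys, weights=[1.0, 1.0, 0.0])
penalized = fit_slope(xs, ys, weights=[1.0, 1.0, 0.0], l2_penalty=5.0)
bounded = fit_slope(xs, ys, weights=[1.0, 1.0, 0.0], bounds=(0.0, 1.5))

print(baseline, reweighted, penalized, bounded)
```

Each call changes a different part of the learning problem while leaving the others fixed, which is the distinction the table draws. Real systems would use richer models and feedback signals; the closed-form fit is chosen here only to keep the example short.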
Builds on research studying human feedback in language
Harpale, Sarawagi, and Chakrabarti (2004)


He et al. (2016)
Ouyang et al. (2022)
OpenAI Experiments with RLHF


Stiennon et al. (2020)
We have not figured out how to do it quite right, or we need new approaches


Santurkar et al. (2023)
Personalize therapy


Why elicit metric preferences?
Robertson, Haupt, and Koyejo (2023)

Knox and Stone (2008)

Christiano et al. (2017)
Flying helicopters using imitation learning and inverse reinforcement learning (IRL)
Coates, Abbeel, and Ng (2008)
Biyik and Sadigh (2018)
Biyik, Talati, and Sadigh (2022)
Reward hacking
Designing tools for eliciting feedback from humans often requires trading off several factors
CS 221, CS 229, or equivalent. You are expected to:
Our textbook is available online at: mlhp.stanford.edu