Machine Learning from Human Preferences

Authors

Sang T. Truong

Andreas Haupt

Sanmi Koyejo

Updated

February 13, 2026

Introduction

Machine learning is increasingly shaping various aspects of our lives, from education and healthcare to scientific discovery. A key challenge in developing trustworthy intelligent systems is ensuring they align with human preferences. Learning from human feedback offers a promising solution to this challenge. This book introduces the foundations and practical applications of machine learning from human preferences. Instead of manually predefining the learning goal, the book presents preference-based learning that incorporates human feedback to guide the learning process, drawing insights from related fields such as economics, psychology, and human-computer interaction. Throughout, we emphasize not only the methods themselves but also their assumptions, limitations, and the conditions under which they can and cannot be applied—understanding when a model fails is as important as understanding when it works. By the end of this book, readers will be equipped with the key concepts and tools needed to design systems that effectively align with human preferences.

The book is intended for researchers, practitioners, and students who are interested in integrating machine learning with human-centered applications. We assume some basic knowledge of probability and statistics, but provide sufficient background and references for readers to follow the main ideas. The book also provides illustrative program examples and datasets. The field of machine learning from human preferences is a vibrant area of research and practice with many open challenges and opportunities, and we hope that this book will inspire readers to further explore and advance this exciting field.

With this book, we hope both to enable broader use of human preferences in machine learning and to support new data modalities as artificial intelligence systems become increasingly important.

Stanford, May 2025, THK

Structure of this book

The book has three parts which introduce foundational models, present learning paradigms, and discuss societal considerations.

Part 1: Foundations

Chapter 1 lays the mathematical groundwork for the rest of the book. It covers random preference models, types of comparison data (binary rankings, accept-reject, lists), and deterministic and stochastic utility models including the Rasch model, Bradley-Terry, and Gaussian Processes. A central theme is the Independence of Irrelevant Alternatives (IIA) axiom: how it simplifies modeling, its connection to modern methods like DPO and Elo, and its key limitations due to population heterogeneity.
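As a taste of the models developed in Chapter 1, here is a minimal sketch of the Bradley-Terry model, which turns a utility gap between two items into a win probability (the function name and scale are ours, for illustration only):

```python
import math

def bt_win_probability(u_a: float, u_b: float) -> float:
    """Bradley-Terry: P(A is preferred to B) is a logistic
    function of the difference in latent utilities."""
    return 1.0 / (1.0 + math.exp(-(u_a - u_b)))

# Equal utilities give a coin flip; a one-unit gap favors A.
print(bt_win_probability(0.0, 0.0))  # 0.5
print(bt_win_probability(1.0, 0.0))  # ~0.731
```

Chapter 1 builds on this basic form, relating it to the Rasch model and to nonparametric utility models such as Gaussian processes.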

Part 2: Learning

The second part introduces approaches to learning from and acting on comparisons.

  • Chapter 2 studies how to infer utility from preference data. It covers maximum likelihood estimation, Bayesian inference via MCMC and Laplace approximation, and online learning through Elo ratings as stochastic gradient descent. The chapter also addresses regularization, model selection via cross-validation, and modern optimization methods, with a case study on LLM preference learning.

  • Chapter 3 considers active data collection with the goal of efficient preference elicitation. Fisher information quantifies how much each comparison teaches us, enabling optimal query selection via A-optimal, D-optimal, and E-optimal design criteria. The chapter includes applications to robotic trajectory learning and active DPO for language model alignment.

  • Chapter 4 studies how learned preferences guide sequential decisions under uncertainty. It introduces Thompson Sampling for both linear and nonlinear objectives, dueling bandits with distinct winner concepts (Condorcet, Borda, von Neumann), preferential Bayesian optimization, and human-agent cooperation through CIRL.
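The connection in Chapter 2 between Elo ratings and stochastic gradient descent can be sketched in a few lines: the familiar Elo update is a stochastic-gradient step on the Bradley-Terry log-likelihood, with the K-factor playing the role of the learning rate (variable names and the conventional 400-point scale here are illustrative):

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Bradley-Terry win probability on the conventional Elo scale."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """One game: outcome is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw.
    The correction k * (outcome - expected) is a stochastic-gradient
    step on the Bradley-Terry log-likelihood with learning rate k."""
    p = expected_score(r_a, r_b)
    return r_a + k * (outcome - p), r_b - k * (outcome - p)

# Two equally rated players; A wins, so A gains half the K-factor.
r_a, r_b = elo_update(1500.0, 1500.0, 1.0)
print(r_a, r_b)  # 1516.0 1484.0
```

Chapter 2 derives this correspondence formally and uses it to connect online rating systems to the batch estimation methods covered earlier in the chapter.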

Part 3: Society

The final part of the book addresses the societal dimensions of preference learning: how to aggregate diverse preferences and whose preferences should count.

  • Chapter 5 examines preference aggregation when multiple stakeholders disagree. It introduces social choice theory, Arrow’s and Gibbard-Satterthwaite’s impossibility theorems, and ways to escape these impossibilities through domain restrictions and scoring rules. It connects the Borda count to DPO, discusses nosy preferences and the liberal paradox, analyzes Community Notes as a case study, and covers mechanism design for incentive-compatible preference elicitation.

  • Chapter 6 asks a normative question: whose preferences matter? It shows how technical design choices at each stage of the preference learning pipeline embed value judgments. The chapter covers individual and group fairness, how unfairness compounds across pipeline stages, and design principles for responsible preference learning systems, including human values, AI alignment, and human-centered design.
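As a small taste of the aggregation tools in Chapter 5, here is a minimal Borda count over complete rankings (a sketch only; ties, partial rankings, and the connection to DPO are treated in the chapter itself):

```python
from collections import defaultdict

def borda_count(rankings):
    """Each ranking lists candidates best-first; a candidate in position i
    of an m-candidate ranking earns m - 1 - i points."""
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for i, candidate in enumerate(ranking):
            scores[candidate] += m - 1 - i
    return dict(scores)

votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(borda_count(votes))  # {'a': 5, 'b': 3, 'c': 1}
```

Chapter 5 places scoring rules like this one against the backdrop of Arrow's impossibility theorem, where they serve as one route around the negative results.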

Chapter 7 concludes the book by reflecting on its interdisciplinary approach and discussing open challenges, including moving beyond pairwise comparisons, scalable oversight, heterogeneity and personalization, fairness, and foundation model alignment.

How to engage with this book

There are four modes of reading, and teaching with, this book. Chapter 1 is foundational to all of the book and is therefore part of every pathway.

  • For practitioners and those teaching applied AI content, we recommend Chapters 1, 2, and 4, which can be used as a sequence in an early graduate course on Machine Learning. This sequence covers foundations, learning methods, and decision-making with human preferences.

  • For readers with a background in discrete choice, we propose skimming Chapter 1 and studying Chapters 2 and 4. This pathway allows readers to integrate machine learning into their work on discrete choice, demand models, and Industrial Organization.

  • For those with a deep background in machine learning, we propose studying Chapters 2–4. These chapters maximize the machine learning coverage and are suitable for a machine learning course with a deep learning focus.

  • For those interested in the societal and theoretical foundations of machine learning from comparisons, we recommend Chapters 1, 5, and 6. Chapter 1 establishes the modeling foundations, Chapter 5 studies aggregation and mechanism design, and Chapter 6 examines fairness and value alignment. This pathway is suitable for a course on Computation and Society.

For instructors

This book is designed to support course instruction, and each chapter includes lecture plan callouts with suggested timing. A few notes for instructors adopting this book:

  • Pacing: The book can be covered in a 10-week quarter (as in Stanford’s CS329H) or a 15-week semester. For a quarter: allocate approximately 2 weeks each for Chapters 1–2, 1.5 weeks for Chapter 3, 2.5 weeks for Chapter 4, and 1 week each for Chapters 5–6. For a semester, expand each chapter proportionally.
  • Core vs. optional material: The “Beyond Bradley-Terry” sections in Chapter 1 and the RL foundations section in Chapter 4 (marked “Optional”) can be skipped without loss of continuity. All other material is core.
  • Assumptions and limitations: Throughout the book, callout boxes marked with warnings highlight key assumptions and their limitations. We encourage instructors to treat these as first-class content—understanding when models fail is as pedagogically important as understanding when they work.
  • Cross-chapter connections: Several results connect across chapters in non-obvious ways: Elo is SGD on the Bradley-Terry likelihood (Ch1–2); DPO implicitly optimizes the Borda score (Ch2–5); Fisher information drives both active learning (Ch3) and fairness auditing (Ch6). Emphasizing these connections reinforces the book’s unified perspective.
  • Assessment: Each chapter includes exercises at three difficulty levels, discussion questions for class participation, and quick check questions for self-assessment. Problem sets are available on the Teaching Materials page.

Prior knowledge

The book assumes knowledge of the fundamentals of statistics, linear algebra, and machine learning. Many example code excerpts are written in Python, so experience with the Python programming language is valuable for readers.

Additional Materials

Every chapter has problems for readers and slides for teaching the material. See the Teaching Materials page for all slides, problem sets, and suggested course pathways.

Acknowledgments

Initial versions of this book were compiled as lecture notes for the class CS329H: Machine Learning from Human Preferences at Stanford University, taught in Fall 2023 and Fall 2024. We thank Rehaan Ahmad, Ahmed Ahmed, Jirayu Burapacheep, Michael Byun, Akash Chaurasia, Andrew Conkey, Tanvi Deshpande, Eric Han, Laya Iyer, Adarsh Jeewajee, Shreyas Kar, Arjun Karanam, Jared Moore, Aashiq Muhamed, Bidipta Sarkar, William Shabecoff, Stephan Sharkov, Max Sobol Mark, Kushal Thaman, Joe Vincent, Yibo Zhang, Duc Nguyen, Grace Sodunke, Ky Nguyen, and Mykel Kochenderfer for their early contributions and feedback.

Citation

Thanks for reading our book! We hope you find it useful in your research and teaching.

BibTeX citation:
@book{mlhp,
  author    = {Truong, Sang and Haupt, Andreas and Koyejo, Sanmi},
  title     = {{Machine Learning from Human Preferences}},
  year      = {2025},
  publisher = {Stanford University}
}
For attribution, please cite this work as:
S. Truong, A. Haupt, and S. Koyejo. 2025. Machine Learning from Human Preferences. Stanford University.