Machine Learning from Human Preferences
Introduction
Machine learning is increasingly shaping various aspects of our lives, from education and healthcare to scientific discovery. A key challenge in developing trustworthy intelligent systems is ensuring they align with human preferences. Learning from human feedback offers a promising solution to this challenge. This book introduces the foundations and practical applications of machine learning from human preferences. Instead of manually predefining the learning goal, the book presents preference-based learning that incorporates human feedback to guide the learning process, drawing insights from related fields such as economics, psychology, and human-computer interaction. By the end of this book, readers will be equipped with the key concepts and tools needed to design systems that effectively align with human preferences.
The book is intended for researchers, practitioners, and students who are interested in integrating machine learning with human-centered applications. We assume some basic knowledge of probability and statistics, but provide sufficient background and references for readers to follow the main ideas. The book also provides illustrative program examples and datasets. The field of machine learning from human preferences is a vibrant area of research and practice with many open challenges and opportunities, and we hope that this book will inspire readers to further explore and advance this exciting field.
With this book, we hope both to enable wider use of human preferences in machine learning and to embrace new data modalities as Artificial Intelligence systems become increasingly important.
Stanford, May 2025, THK
Structure of this book
The book has three parts, which introduce fundamental models, present learning paradigms, and discuss assumptions.
Background
We provide background on the axioms underlying comparisons in Chapter 1 and identify key modeling assumptions. The chapter covers random preference models, the Independence of Irrelevant Alternatives (IIA), and types of comparison data (binary rankings, accept-reject feedback, and lists). It also discusses the main limitations of IIA arising from heterogeneity.
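To make these assumptions concrete, the sketch below implements a Bradley-Terry-style random preference model, one canonical example of the random preference models covered in the chapter; the item strengths are illustrative values of our own choosing. It shows how IIA manifests: the odds of choosing one item over another are unchanged when a third alternative is added.

    # A minimal sketch of a random preference model (Bradley-Terry style).
    # The item strengths are illustrative values, not from the book.
    strengths = {"a": 2.0, "b": 1.0, "c": 0.5}

    def choice_prob(item, choice_set):
        """Probability of choosing `item` from `choice_set` under the model."""
        return strengths[item] / sum(strengths[j] for j in choice_set)

    # IIA: the odds of choosing a over b do not change when c is added.
    odds_pair = choice_prob("a", {"a", "b"}) / choice_prob("b", {"a", "b"})
    odds_triple = choice_prob("a", {"a", "b", "c"}) / choice_prob("b", {"a", "b", "c"})
    print(odds_pair, odds_triple)  # both 2.0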
Learning
The second part introduces several approaches to learning from comparisons.
Chapter 2 considers a setting where comparison data is given and studies both maximum likelihood and posterior-based learning of comparison models, using case studies from language modeling and robotics. We discuss the challenges of learning multimodal or heterogeneous rewards that fail to satisfy IIA.
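As a small taste of this material, the following sketch fits a Bradley-Terry comparison model by maximum likelihood; the comparison data is synthetic and of our own making, not from the book’s case studies.

    import numpy as np
    from scipy.optimize import minimize

    # A minimal sketch of maximum likelihood estimation for a Bradley-Terry
    # comparison model. The data is synthetic and illustrative: each pair
    # (i, j) records that item i was preferred to item j.
    comparisons = [(0, 1), (1, 0), (0, 1), (1, 2), (2, 0), (0, 2), (2, 1)]
    n_items = 3

    def neg_log_likelihood(theta):
        # -log P(i beats j), with P(i beats j) = sigmoid(theta_i - theta_j)
        return sum(np.log1p(np.exp(-(theta[i] - theta[j]))) for i, j in comparisons)

    result = minimize(neg_log_likelihood, x0=np.zeros(n_items))
    print(result.x)  # estimated utilities, identifiable only up to a shift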
Chapter 3 considers active data collection of comparisons, with the goal of optimal inference on comparison models. Various strategies are explored, including reducing the learner’s variance and exploiting ambiguity and domain knowledge in ranking, with a case study from robotics.
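The sketch below illustrates one such strategy in miniature: it selects the query pair whose outcome is most ambiguous under the current utility estimates. The estimates shown are illustrative placeholders, not values from the book.

    import numpy as np
    from itertools import combinations

    # A minimal sketch of ambiguity-driven query selection: ask about the
    # pair whose outcome is most uncertain under the current utility
    # estimates. The estimates below are illustrative placeholders.
    theta = np.array([0.9, 0.2, -0.4, 0.15])

    def outcome_entropy(i, j):
        p = 1.0 / (1.0 + np.exp(-(theta[i] - theta[j])))  # P(i beats j)
        return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

    # Query the pair with the highest outcome entropy (p closest to 0.5).
    query = max(combinations(range(len(theta)), 2), key=lambda ij: outcome_entropy(*ij))
    print(query)  # (1, 3): the items with the closest estimated utilities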
Chapter 4 studies processes where comparisons are used to guide decisions. We first set up the bandit approach to recommending maximal objects with respect to comparisons and discuss dueling bandits. We then consider reinforcement learning from human feedback (RLHF) to align language models that decide which text to generate. We highlight the role of uncertainty quantification and exploration in decision-making.
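As a toy illustration of the dueling-bandit setting (a sketch under simplifying assumptions, not one of the algorithms developed in the chapter), the loop below duels random pairs of arms, records noisy preference outcomes drawn from synthetic probabilities, and recommends the arm with the best empirical win rate.

    import numpy as np

    # A toy dueling-bandit loop: duel random pairs of arms, record noisy
    # preference outcomes, and recommend the arm with the best empirical
    # win rate. The preference probabilities are synthetic.
    rng = np.random.default_rng(0)
    true_p = np.array([[0.5, 0.7, 0.8],
                       [0.3, 0.5, 0.6],
                       [0.2, 0.4, 0.5]])  # true_p[i, j] = P(arm i beats arm j)
    wins = np.zeros((3, 3))
    plays = np.ones((3, 3))  # start at one to avoid division by zero

    for _ in range(2000):
        i, j = rng.choice(3, size=2, replace=False)  # uniform exploration
        winner, loser = (i, j) if rng.random() < true_p[i, j] else (j, i)
        wins[winner, loser] += 1
        plays[i, j] += 1
        plays[j, i] += 1

    scores = (wins / plays).mean(axis=1)  # Borda-style empirical score
    print(scores.argmax())  # recovers the Condorcet winner (arm 0) w.h.p.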
Chapter 5 considers decision-making in the presence of heterogeneity. We first focus on dealing with heterogeneity to maximize average utility through personalization. We then discuss voting-based aggregation mechanisms and decisions that are independent of certain features of the outcome.
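As a small illustration of voting-based aggregation (a toy sketch with synthetic rankings, not one of the book’s mechanisms), the snippet below combines heterogeneous individual rankings with a Borda count.

    from collections import Counter

    # A minimal sketch of voting-based aggregation across heterogeneous
    # users: combine individual rankings with a Borda count. The rankings
    # are synthetic and illustrative.
    rankings = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]

    scores = Counter()
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] += len(ranking) - 1 - position  # points by rank

    print(scores.most_common())  # [('a', 5), ('b', 3), ('c', 1)]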
Reflection
The final part of the book discusses limitations of comparison data and opportunities arising from stated preference data.
Chapter 6 critiques machine learning from comparisons. It takes different disciplinary lenses, from social psychology, philosophy, and critical studies, to highlight where comparisons fall short in expressing human preferences and what the alternatives are.
Chapter 7 considers feedback that is broader than the comparisons in our model, much of which we can think of as stated preferences. These are models in which value judgments are given in terms of Likert scales or textual descriptions. We propose ways in which such feedback can be merged with comparison data to better express preferences.
How to engage with this book
There are three modes of reading, and teaching with, this book. Chapter 1 underlies all of the book, and so is part of all of these pathways.
For practitioners and those teaching applied AI content, we recommend reading Chapters 1, 2, 4, and 7, which can be used as a sequence in an early graduate course on Machine Learning. It allows instructors to highlight human data sources in an introductory machine learning course.
For readers with a background in discrete choice, we propose skimming Chapter 1 and studying Chapters 2 and 4. These chapters allow readers to integrate machine learning into their studies of discrete choice, demand models, and Industrial Organization.
For those with a deep background in machine learning, we propose studying Chapters 2–4 and 7. These chapters maximize the amount of machine learning covered and are suitable for a deep-learning-based machine learning course.
For those interested in the methodological and theoretical foundations of machine learning from comparisons, we recommend reading Chapters 1, 5, 6, and 7. Chapters 1 and 5 study the underpinnings of revealed preferences and aggregation, Chapter 6 critiques these assumptions, and Chapter 7 looks at broader ways of eliciting preferences. This pathway is suitable for critical study in a course on Computation and Society.
Prior knowledge
The book assumes knowledge of the fundamentals of statistics, linear algebra, and machine learning. Many example code excerpts are written in Python, so experience with the Python programming language is valuable for readers.
Additional Materials
Every chapter has problems for readers and slides for teaching the material, all available on the book’s website.