Chapter 6: Whose Preferences?
The design choices in a preference learning pipeline are not merely technical decisions — they are fairness decisions with real consequences.
Every preference learning system follows four stages (elicitation, modeling, aggregation, deployment), each embedding values.
Bias at each stage compounds through feedback loops.
Setting: Large ML conference, 10,000 papers, 5,000 reviewers. An AI assistant learns from past reviews to help reviewers write better ones.
Clear stakeholders, measurable outcomes, real fairness concerns.
Tracing the pipeline stage by stage
Three design options for elicitation policy \(\mathcal{E}\):
Value embedded: Options 2 & 3 prioritize efficiency over representation
This seems purely technical — but has profound fairness consequences.
Observable behavior \(\neq\) underlying mental state:
\[ P(B \mid M, C) \neq P(B \mid M) \]
A detailed review could reflect expertise (\(M\)) or available time (\(C\)).
Naive interpretation: Reviewer A writes 800-word reviews, B writes 400-word reviews \(\Rightarrow\) A is more thorough \(\Rightarrow\) query A more
Reality: A is a senior professor with a light teaching load. B is a postdoc with 60-hour weeks, writing reviews at midnight.
Result: Active learning queries A more \(\rightarrow\) model learns A’s style \(\rightarrow\) assists A well \(\rightarrow\) A uses it more \(\rightarrow\) more data. B gets poor assistance \(\rightarrow\) disengages \(\rightarrow\) less data. The gap widens.
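The confound in this story can be checked with a toy simulation (all numbers are illustrative assumptions): two reviewers with identical expertise but different available time produce systematically different review lengths, so length alone cannot identify thoroughness.

```python
import numpy as np

rng = np.random.default_rng(0)

def review_length(expertise, hours, n=10_000):
    # Toy model: length grows with expertise but is capped by available time.
    return np.minimum(200 * expertise, 100 * hours) + rng.normal(0, 20, n)

reviewer_a = review_length(expertise=4, hours=20)  # senior, light load
reviewer_b = review_length(expertise=4, hours=4)   # same expertise, 60-hour weeks

# Identical mental state M, different context C: the observable behavior B
# differs, i.e. P(B | M, C) != P(B | M). Querying by length queries A more.
```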
Optimizing for revealed behavior systematically misunderstands groups whose context differs.
If we query productive reviewers more, we systematically undersample junior reviewers, overburdened reviewers, and caregivers: those whose context, not expertise, limits their output.
The AI assistant becomes good at helping senior, privileged reviewers and poor at helping those already disadvantaged — this is disparate impact.
Ensure representation from diverse reviewer populations:
The tradeoff: Efficiency vs. fairness
There is no “right” answer — but you must choose consciously and transparently.
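One way to make the representation requirement operational is a per-group floor on queries. The sketch below is an assumed design, not a prescribed one: it reserves part of the query budget for each group before filling the rest by informativeness score.

```python
import numpy as np

def allocate_queries(scores, groups, budget, min_per_group):
    """Pick `budget` reviewers to query, guaranteeing each group a floor."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    chosen = []
    # Satisfy each group's quota with its own top-scoring members first.
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        top = idx[np.argsort(scores[idx])[::-1][:min_per_group]]
        chosen.extend(top.tolist())
    # Fill the remaining budget purely by score.
    rest = [int(i) for i in np.argsort(scores)[::-1] if i not in chosen]
    chosen.extend(rest[: budget - len(chosen)])
    return sorted(chosen[:budget])

# Senior reviewers (group 0) look more "informative", but a pure top-k
# policy would never query group 1; the quota keeps them in the loop.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
groups = [0, 0, 0, 1, 1, 1]
picked = allocate_queries(scores, groups, budget=4, min_per_group=1)  # [0, 1, 2, 3]
```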
Bradley-Terry assumes Independence of Irrelevant Alternatives (IIA):
\[ \frac{P(\text{choose } j \mid \{j, k\})}{P(\text{choose } k \mid \{j, k\})} = \frac{P(\text{choose } j \mid \{j, k, \ell\})}{P(\text{choose } k \mid \{j, k, \ell\})} \]
Reduces the model from \(M!\) ranking probabilities to \(M\) item utilities — this is what makes learning tractable.
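IIA is easy to see in code: under a Bradley-Terry / softmax choice model, the odds between two items do not change when a third is added (toy utilities below).

```python
import numpy as np

def choice_probs(utilities):
    # Softmax choice over whatever set is offered.
    e = np.exp(utilities - np.max(utilities))
    return e / e.sum()

u = np.array([2.0, 1.0, 0.5])  # utilities for items j, k, l (toy values)

p_pair = choice_probs(u[:2])   # choice set {j, k}
p_triple = choice_probs(u)     # choice set {j, k, l}

odds_pair = p_pair[0] / p_pair[1]
odds_triple = p_triple[0] / p_triple[1]
# Both equal exp(2.0 - 1.0): adding l leaves the j-vs-k odds untouched.
```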
But IIA is violated in peer review — and these violations have fairness consequences.
Complementarity: Already reviewed 2 similar papers \(\rightarrow\) less interested in a third. Preference changes with choice set.
Framing effects: Paper C is exceptional quality \(\rightarrow\) Paper A now looks mediocre by comparison. Reference point shifts preference.
Reviewer load: Heavy assignment load \(\rightarrow\) tolerance for marginal papers decreases. Preferences reflect current state, not intrinsic quality.
By assuming IIA, we say: these are “irrational” and we’ll ignore them.
Who is disadvantaged by assuming IIA?
Groups with context-dependent preferences:
These groups overlap with disadvantaged demographics: junior reviewers, non-Western reviewers, caregivers.
IIA works well for low-load senior reviewers. It works poorly for overburdened reviewers — this is individual unfairness.
Model context-dependent preferences explicitly:
\[ H_{ij} = U_i^\top V_j + f(C_i) \]
Example with reviewer load \(n_i\):
\[ H_{ij} = U_i^\top V_j - \lambda n_i \]
Tradeoff: More data needed (additional parameters) vs. fairer model for all groups.
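A minimal sketch of the load-adjusted model; the embedding dimensions, the penalty \(\lambda\), and the load values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_reviewers, n_papers, d = 4, 5, 3
U = rng.normal(size=(n_reviewers, d))   # reviewer embeddings U_i
V = rng.normal(size=(n_papers, d))      # paper embeddings V_j
U[3] = U[0]                             # same taste as reviewer 0, heavier load
load = np.array([1.0, 2.0, 8.0, 12.0])  # current assignment counts n_i
lam = 0.1                               # load penalty lambda (assumed value)

# H_ij = U_i^T V_j - lambda * n_i: utility falls as reviewer load rises.
H = U @ V.T - lam * load[:, None]

# Reviewers 0 and 3 have identical taste, but 3's heavier load lowers
# every entry of their row: tolerance for marginal papers decreases.
```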
DPO (Chapter 3) maximizes:
\[ \mathcal{L}_{\text{DPO}} = -\mathbb{E}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right] \]
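The objective can be written in a few lines. A minimal numpy sketch, assuming the per-sequence log-probabilities under the policy and the frozen reference have already been computed (in practice this runs on a framework's autodiff tensors):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss from sequence log-probs of chosen (w) and rejected (l) responses."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return float(-np.mean(np.log(1.0 / (1.0 + np.exp(-margin)))))

# If the policy already prefers y_w more strongly than the reference does,
# the margin is positive and the loss drops below log(2) (the zero-margin value).
loss = dpo_loss(
    logp_w=np.array([-5.0]), logp_l=np.array([-9.0]),
    ref_logp_w=np.array([-6.0]), ref_logp_l=np.array([-8.0]),
)
```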
Key question: Who contributes to \(\pi_{\text{ref}}\)?
If trained on all past reviews, reviewers who write more reviews get more weight — volume-weighted aggregation.
High-volume reviewers tend to be senior and well-resourced, like Reviewer A rather than Reviewer B.
Volume-weighting encodes status quo bias — the system learns to reproduce existing inequities.
Value embedded: Past data reflects “true” preferences worth replicating. But past data also reflects historical inequalities.
Instead of volume-weighting, consider equal-weighted or group-stratified aggregation:
Tradeoff:
| Strategy | Variance | Bias | Fairness |
|---|---|---|---|
| Volume-weighted | Low | High (status quo) | Poor |
| Equal-weighted | Higher | Low | Better |
| Group-stratified | Medium | Low | Best |
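The three strategies differ only in how per-review weights are set. In the sketch below (a toy example with assumed counts), equal weighting gives every annotator the same total influence, and stratification gives every group the same total influence:

```python
import numpy as np

def aggregation_weights(counts, groups, strategy):
    """Weight applied to each annotator's reviews under each strategy."""
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    if strategy == "volume":        # every review counts once
        return np.ones_like(counts)
    if strategy == "equal":         # every annotator counts once
        return 1.0 / counts
    if strategy == "stratified":    # every group counts once
        totals = {g: counts[groups == g].sum() for g in np.unique(groups)}
        return np.array([1.0 / totals[g] for g in groups])
    raise ValueError(strategy)

counts = np.array([50, 5, 45, 10])  # reviews contributed per annotator
groups = np.array([0, 1, 0, 1])     # 0 = high-volume seniors, 1 = juniors

influences = {}
for s in ("volume", "equal", "stratified"):
    w = aggregation_weights(counts, groups, s)
    influences[s] = float((w * counts)[groups == 1].sum() / (w * counts).sum())
# Junior share of total influence: ~0.14 volume-weighted, 0.5 otherwise.
```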
Should the AI respect stated preferences or sometimes override them?
Drawing on Amartya Sen’s framework:
Liberal assistance: Respects non-nosy preferences — helps each reviewer follow their own style
Illiberal assistance: Enforces community standards — nudges toward constructive feedback
Liberal: Reviewer A writes harsh reviews \(\rightarrow\) assistant helps them write consistently harsh reviews
Illiberal: Assistant nudges Reviewer A toward constructive feedback per community norms
When is illiberal justified? (Sen’s framework)
Liberal assistance may be unfair if:
Illiberal assistance may be unfair if:
No easy answer — this is the core tension in fairness.
How small biases multiply through feedback loops
Each pipeline stage introduces bias — and they compound:
Feedback: Juniors find assistant unhelpful \(\rightarrow\) use less \(\rightarrow\) less data \(\rightarrow\) even worse assistance \(\rightarrow\) the gap widens exponentially
Let \(q_t^{(g)}\) = queries to group \(g\) at time \(t\), \(a_t^{(g)}\) = assistant quality for group \(g\)
Elicitation: Queries proportional to past usage: \[q_{t+1}^{(g)} \propto a_t^{(g)}\]
Learning + Aggregation: Quality depends on data: \[a_{t+1}^{(g)} = f\!\left(q_{t+1}^{(g)}\right)\]
Combined: \(a_{t+1}^{(g)} = f(a_t^{(g)})\)
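Iterating the combined map makes the compounding visible. In the toy simulation below, \(f\) and the query budget are illustrative assumptions chosen so that data helps superlinearly at this scale; a small initial quality gap then drives the junior group's query share toward zero.

```python
def f(q):
    # Data -> assistant quality; superlinear at this scale (toy assumption),
    # so losing query share hurts more than proportionally.
    return q ** 1.5 / 40.0

a = {"senior": 0.6, "junior": 0.5}  # small initial quality gap
gap = []
for t in range(20):
    total = sum(a.values())
    q = {g: 10.0 * a[g] / total for g in a}  # elicitation follows quality
    a = {g: f(q[g]) for g in a}              # quality follows data
    gap.append(a["senior"] - a["junior"])

# gap[0] is under 0.08; gap[-1] is near 0.79: the loop amplifies the
# initial disparity until juniors receive essentially no queries.
```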
No single-stage fix prevents compounding — interventions are needed at multiple stages: elicitation quotas, reweighted aggregation, and ongoing audits.
The tradeoff: All interventions sacrifice some efficiency for fairness. You cannot optimize both simultaneously.
Fundamental tensions with no perfect resolution
Definition (Dwork et al., 2012): Similar individuals should be treated similarly.
\[ d(x_i, x_j) \leq \epsilon \implies d(f(x_i), f(x_j)) \leq \delta(\epsilon) \]
In peer review: Papers with similar topics and quality should get similar-quality reviewers. If Papers A and B both study transformers on low-resource languages, they should get comparable reviewers.
Definition (Demographic parity): Protected groups should have equal average outcomes.
\[ \mathbb{E}[f(x) \mid G = g_1] = \mathbb{E}[f(x) \mid G = g_2] \]
In peer review: Papers from mainstream ML subfields vs. small subfields should receive equally qualified reviewers on average.
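Both definitions can be measured directly. The toy example below (hypothetical scores) is individually fair, since similar papers get similar reviewer quality, yet it fails demographic parity across subfields:

```python
import numpy as np

def individual_violations(X, y, eps, delta):
    """Count pairs within eps in feature space whose outcomes differ by > delta."""
    n = len(X)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if np.linalg.norm(X[i] - X[j]) <= eps and abs(y[i] - y[j]) > delta
    )

def parity_gap(y, groups):
    """Absolute difference in mean outcome between groups 0 and 1."""
    y, groups = np.asarray(y, dtype=float), np.asarray(groups)
    return abs(y[groups == 0].mean() - y[groups == 1].mean())

X = np.array([[0.0], [0.1], [5.0], [5.1]])  # paper topic features (toy)
y = np.array([0.9, 0.9, 0.5, 0.5])          # assigned reviewer quality
groups = np.array([0, 0, 1, 1])             # mainstream vs. small subfield

iv = individual_violations(X, y, eps=0.5, delta=0.1)  # 0: individually fair
dp = parity_gap(y, groups)                            # 0.4: group-unfair
```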
Dwork et al. (2012): Individual and group fairness are often mutually incompatible.
Individual fairness says:
Similar topics \(\rightarrow\) similar reviewers. Small subfield papers get less-expert reviewers (few experts exist).
Group fairness says:
Both groups get equally expert reviewers on average. Must assign small subfield papers to less-matched (but expert) reviewers.
These conflict: to satisfy group fairness, we must violate individual fairness (similar topics receive different review quality depending on group). This is a structural incompatibility, not an implementation failure.
Another fundamental tension:
| | Process Fairness | Outcome Fairness |
|---|---|---|
| Definition | Same rules for all | Equitable results |
| In review | All bids weighted equally | Adjust weights for junior reviewers |
| Philosophy | Procedural justice | Distributive justice |
| Connection | Liberal assistance | Illiberal assistance |
The tension: Equal treatment of unequal groups perpetuates inequality. But adjusting for demographics means unequal treatment.
Process-fair allocation: Senior reviewers bid strategically on high-quality papers \(\rightarrow\) get better assignments. Juniors bid less strategically \(\rightarrow\) worse papers.
Outcome-fair allocation: Boost junior bids to compensate for less strategic bidding \(\rightarrow\) equalized outcomes.
Key insight: Process and outcome fairness are fundamentally in tension when groups have unequal starting positions or differential access to strategic information.
Two proven incompatibilities constrain every system: individual vs. group fairness, and process vs. outcome fairness.
These aren’t engineering failures — they are impossibility results. No system satisfies all fairness criteria. You must choose which to prioritize.
Eight actionable guidelines for fairer systems
Document what values are prioritized, what’s sacrificed, who benefits, who is harmed.
Example template:
| Field | Entry |
|---|---|
| Decision | DPO with volume-weighted \(\pi_{\text{ref}}\) |
| Value prioritized | Statistical efficiency |
| Value sacrificed | Group fairness (equal influence) |
| Who benefits | High-volume annotators |
| Who is harmed | Low-volume annotators |
| Mitigation | Minimum sampling quotas |
Claiming “technical neutrality” hides choices and prevents accountability.
Don’t optimize for clicks, bids, or engagement without asking if they reflect true preferences.
\[P(B \mid M, C) \neq P(B \mid M)\]
Red flag: Using revealed preferences (clicks, purchases) as ground truth without modeling context commits the inversion fallacy.
Map the full pipeline. Simulate over time. Does disparity grow or shrink?
Audit checklist:
Set circuit breakers: If disparity exceeds threshold (e.g., 2x gap in quality), trigger automatic review.
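A circuit breaker is a few lines of monitoring code. Sketch below; the metric, the grouping, and the 2x threshold are deployment-specific assumptions:

```python
def circuit_breaker_tripped(quality_by_group, max_ratio=2.0):
    """True if the best/worst group quality ratio exceeds the threshold."""
    best = max(quality_by_group.values())
    worst = min(quality_by_group.values())
    return best / worst > max_ratio

ok = circuit_breaker_tripped({"senior": 0.8, "junior": 0.5})       # False: 1.6x gap
tripped = circuit_breaker_tripped({"senior": 0.9, "junior": 0.4})  # True: 2.25x gap
```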
Accept impossibilities. Choose based on domain and stakes:
| Context | Prioritize | Rationale |
|---|---|---|
| High-stakes (hiring, healthcare) | Outcome fairness | Must correct systemic disparities |
| Well-defined merit (expertise matching) | Individual fairness | Similar entities \(\rightarrow\) similar treatment |
| Requiring diversity (research) | Group fairness | All perspectives represented |
| Low-stakes personal (shopping) | Process fairness | Autonomy paramount |
Make the choice, document it, and measure violations of non-prioritized criteria.
Report performance per subgroup, not just average. Set minimum thresholds:
\[ \max_{\theta} \text{Utility}(\theta) \quad \text{s.t.} \quad \min_{g \in \mathcal{G}} \text{Quality}_g(\theta) \geq \tau \]
Optimization on average hides disparities. Stratified reporting reveals them.
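A stratified report is straightforward to produce alongside the average. The sketch below uses accuracy as the quality metric for illustration and checks the \(\min_g \text{Quality}_g \geq \tau\) constraint from the formula above:

```python
import numpy as np

def stratified_report(y_true, y_pred, groups, tau):
    """Per-group accuracy, overall average, and the min-quality constraint."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    per_group = {
        str(g): float((y_true[groups == g] == y_pred[groups == g]).mean())
        for g in np.unique(groups)
    }
    return {
        "per_group": per_group,
        "average": float((y_true == y_pred).mean()),
        "constraint_met": min(per_group.values()) >= tau,
    }

# A healthy-looking 80% average hides a 60% subgroup below the 70% floor.
y_true = [1] * 10 + [1] * 10
y_pred = [1] * 10 + [1] * 6 + [0] * 4
groups = ["a"] * 10 + ["b"] * 10
report = stratified_report(y_true, y_pred, groups, tau=0.7)
```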
Involve affected communities early and throughout:
Designers often lack lived experience of disadvantaged groups. Participatory design surfaces invisible issues.
Log decisions and context for auditing. Enable external verification.
Privacy note: Logging demographics raises concerns — use differential privacy, aggregate before sharing.
Without observability, fairness claims are unverifiable.
Default to respecting preferences. Override with structured paternalism:
Override (illiberal) when:
Respect (liberal) when:
Always: transparency (explain overrides), justification (document harm), recourse (allow appeals).
Warning signs that indicate fairness problems
Monitoring: Fairness audits before deployment, at launch, and monthly ongoing.
Important
Every technical choice in the preference learning pipeline is a value choice about whose preferences matter and how they should be weighted.
You cannot avoid making these choices. You can only choose whether to make them consciously and transparently.
