5  Aggregation

5.1 Social Choice Theory and Implications for AI Preference Aggregation

In many applications, human preferences must be aggregated across multiple individuals to determine a collective decision or ranking. This process is central to social choice theory, which provides a mathematical foundation for preference aggregation. Unlike individual preference modeling, which focuses on how a single person makes decisions, social choice theory addresses the challenge of combining multiple preference profiles into a single coherent outcome. A social welfare function (SWF) takes as input each individual’s preference ranking over a set of alternatives and produces a social ranking of those alternatives. A related concept is a social choice function (SCF), which selects a single winning alternative given individuals’ preferences. Many voting rules can be seen as social choice functions that aim to reflect the group’s preferences. Formally, let \(N=\{1,2,\dots,n\}\) be a set of \(n\) voters (agents) and \(A=\{a_1,\dots,a_m\}\) a set of \(m\) alternatives (with \(m \ge 3\)). Each voter \(i\) has a preference order \(\succ_i\) over \(A\). A social choice function is a mapping \(f: (\succ_1,\dots,\succ_n)\mapsto A\) that picks a winning alternative for each possible profile of individual preferences. A social welfare function is a mapping that produces a complete societal ranking \(\succ^*\) of the alternatives. The central question is: can we design an aggregation rule that faithfully represents individual preferences while satisfying certain fairness or rationality axioms?

Many common voting rules illustrate different methods of aggregation, each with its own merits and vulnerabilities (a short computational sketch follows the list):

  • Plurality: Each voter names their top choice; the alternative with the most votes wins.
  • Borda Count: Voters rank all alternatives, and points are assigned based on the position in each ranking. For example, with \(m\) alternatives, a voter’s top-ranked alternative gets \(m-1\) points, the second-ranked gets \(m-2\), and so on down to 0. The Borda score of an alternative is the sum of points from all voters, and the winner is the alternative with the highest total score.
  • Single Transferable Vote (STV): Voters rank choices, and the count proceeds in rounds. In each round, the alternative with the fewest votes is eliminated and those votes are transferred to the next preferred remaining alternative on each ballot, until one candidate has a majority.
  • Condorcet Methods: These look for a candidate that wins in all pairwise majority contests against other alternatives (the Condorcet winner), if such an alternative exists.
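
To make these rules concrete, the sketch below computes plurality, Borda, and Condorcet winners. The seven-voter profile is a hypothetical example chosen so that the rules disagree; it is an illustration, not a canonical benchmark.

```python
from collections import Counter

# A preference profile: each ballot ranks alternatives from most to least preferred.
profile = (
    [("A", "B", "C")] * 3 +   # 3 voters: A > B > C
    [("B", "C", "A")] * 2 +   # 2 voters: B > C > A
    [("C", "B", "A")] * 2     # 2 voters: C > B > A
)

def plurality(ballots):
    """Winner = alternative with the most first-place votes."""
    return Counter(b[0] for b in ballots).most_common(1)[0][0]

def borda(ballots):
    """Winner = alternative with the highest Borda score (m-1 points for 1st, ..., 0 for last)."""
    m = len(ballots[0])
    scores = Counter()
    for b in ballots:
        for pos, alt in enumerate(b):
            scores[alt] += (m - 1) - pos
    return scores.most_common(1)[0][0]

def condorcet_winner(ballots):
    """Return the alternative that beats every other in pairwise majority, or None."""
    alts = set(ballots[0])
    for x in alts:
        if all(sum(b.index(x) < b.index(y) for b in ballots) > len(ballots) / 2
               for y in alts - {x}):
            return x
    return None

print(plurality(profile))          # A  (most first-place votes)
print(borda(profile))              # B  (highest total score)
print(condorcet_winner(profile))   # B  (beats A and C head-to-head)
```

On this profile, plurality elects A while Borda and the pairwise contests both favor B, illustrating how the choice of rule shapes the outcome.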

However, preference aggregation is not always straightforward. The Condorcet paradox illustrates that majority preferences can be cyclic (rock-paper-scissors style), so that no single alternative is majority-preferred to all others, violating transitivity. Different voting rules can yield different winners on the same profile, highlighting how the choice of rule influences the outcome. To guide the design of social choice functions, several desirable properties or axioms have been proposed. Three classical fairness criteria are:

  1. Unanimity (Pareto efficiency): If all individuals strictly prefer one alternative \(x\) over another \(y\) (i.e. \(x \succ_i y\) for every voter \(i\)), then the group ranking should prefer \(x\) over \(y\) as well (\(x \succ^* y\)).
  2. Independence of Irrelevant Alternatives (IIA): The social preference between any two alternatives \(x\) and \(y\) should depend only on the individual preferences between \(x\) and \(y\). In other words, if we change individuals’ rankings of other “irrelevant” alternatives (not \(x\) or \(y\)) in any way, the group’s relative ordering of \(x\) and \(y\) should remain unchanged.
  3. Non-dictatorship: The aggregation should not simply follow a single individual’s preference regardless of others. There is no voter \(i\) who always gets their top choice as the social choice (or whose rankings always become the social ranking), irrespective of other voters’ preferences.

Additionally, we assume an unrestricted domain (universal admissibility): individuals may have any transitive preference ordering over the \(m\) alternatives (no restrictions like single-peaked preferences unless explicitly imposed). One might hope that a fair voting rule exists that satisfies all the above properties for three or more alternatives. Surprisingly, a seminal negative result shows this is impossible.

Arrow’s Impossibility Theorem (Arrow 1951) is a cornerstone of social choice theory. It states that when there are three or more alternatives (\(m\ge 3\)), no social welfare function can simultaneously satisfy Unanimity, IIA, and Non-dictatorship. In other words, any aggregation mechanism that is not dictatorial will inevitably violate at least one of the other fairness criteria. The theorem is usually proven by contradiction: assuming a social welfare function satisfies all conditions, one can show that one voter’s preferences always decide the outcome, hence the rule is dictatorial. Intuitively, Arrow’s theorem is driven by the possibility of preference cycles in majority voting. Even if individual preferences are transitive, aggregated majorities can prefer \(A\) to \(B\), \(B\) to \(C\), and \(C\) to \(A\) in a cycle, as in the Condorcet paradox. Under Unanimity and IIA, the social ranking must locally match these pairwise preferences, but this produces a contradiction with transitivity unless one voter’s ranking is given overriding authority. A sketch of Arrow’s proof is as follows: one shows that under the axioms, the social ranking between any two alternatives \(x\) and \(y\) must agree with some particular voter’s preference (the “pivotal” voter for that pair). With IIA, the identity of the pivotal voter must be the same across all pairs of alternatives; otherwise, by cleverly constructing profiles one can derive a conflict. This single pivotal voter then effectively dictates the entire social order, violating Non-dictatorship. Hence, the axioms are incompatible.

Arrow’s Impossibility Theorem has profound implications: it formalizes the inherent trade-offs in designing any fair aggregation scheme. In practice, different voting rules relax one or more of Arrow’s conditions. For instance, Borda count violates IIA (since introducing or removing an irrelevant alternative can change the point totals), while a dictatorship violates fairness blatantly. The theorem suggests that every practical voting system must sacrifice at least one of the ideal fairness criteria. It also motivated the exploration of alternative frameworks (such as allowing interpersonal comparisons of utility or cardinal preference aggregation) to escape the impossibility by weakening assumptions.

Complementing Arrow’s theorem, the Gibbard–Satterthwaite theorem focuses on incentives and strategic manipulation in voting systems (Gibbard 1973; Satterthwaite 1975). It considers any deterministic social choice function \(f\) that chooses a single winner from the set of \(m\ge 3\) alternatives. The theorem states that if \(f\) is strategy-proof (incentive compatible) and onto (its range of outcomes is the entire set of alternatives), then \(f\) must be dictatorial. Strategy-proofness (also called truthfulness or dominant-strategy incentive compatibility) means that no voter can ever benefit by misrepresenting their true preferences, regardless of what others do. In other words, reporting their genuine ranking is a (weakly) dominant strategy for each voter. The theorem implies that for any realistic voting rule where every alternative can possibly win, either one voter effectively decides the outcome (a dictatorship) or else the rule is susceptible to strategic manipulation by voters. The Gibbard–Satterthwaite theorem tells us that every non-dictatorial voting rule for 3 or more alternatives is manipulable: there will exist some election scenario where a voter can gain a more preferred outcome by voting insincerely (i.e. not according to their true preferences). For example, in a simple plurality vote, a voter whose true favorite is a long-shot candidate might vote for a more viable candidate to avoid a worst-case outcome (“lesser of two evils” voting). In a Borda count election, voters might strategically raise a competitor in their ranking to push down an even stronger rival. The only way to avoid all such strategic voting incentives is to have a dictatorship or limit the choice set to at most two alternatives.
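
The following brute-force sketch makes this concrete for the Borda rule. Both the three-voter profile and the alphabetical tie-break are illustrative assumptions, not part of the theorem statement; the search simply looks for a unilateral misreport that a voter strictly prefers.

```python
from itertools import permutations

ALTS = ("A", "B", "C")

def borda_winner(ballots):
    """Borda rule with a fixed alphabetical tie-break."""
    scores = {a: 0 for a in ALTS}
    for b in ballots:
        for pos, alt in enumerate(b):
            scores[alt] += len(ALTS) - 1 - pos
    return max(sorted(scores), key=lambda a: scores[a])

# A hypothetical truthful profile (a Condorcet cycle: every Borda score ties,
# so the alphabetical tie-break makes A the honest winner).
truthful = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
honest = borda_winner(truthful)

for i, true_ballot in enumerate(truthful):
    for lie in permutations(ALTS):
        outcome = borda_winner(truthful[:i] + [lie] + truthful[i + 1:])
        # Voter i gains if the new winner sits above the honest winner on their TRUE ballot.
        if true_ballot.index(outcome) < true_ballot.index(honest):
            print(f"voter {i} benefits by reporting {lie}: {honest} -> {outcome}")
```

Running it finds that the middle voter can flip the winner from A to C, an alternative they truly prefer: exactly the kind of manipulation Gibbard–Satterthwaite guarantees must exist somewhere for any such rule.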

The proof of Gibbard–Satterthwaite is non-trivial, but one can outline the idea: given a non-dictatorial and onto rule \(f\), one shows there exist at least three distinct outcomes that can result from some preference profiles. By carefully constructing profiles and using the onto property, one finds a situation where a single voter can change the outcome by switching their order of two candidates, demonstrating manipulability. The theorem is robust: even if we allow ties or weaker conditions, similar impossibility results hold (Gibbard’s 1977 extension handles randomized rules). The practical takeaway is that all meaningful voting protocols encourage tactical voting in some situations. Nonetheless, certain systems are considered “harder to manipulate” or more resistant due to complexity or uncertainty. For instance, while ranked rules such as STV (ranked-choice voting) can be manipulated in theory, determining a beneficial strategic vote can be NP-hard in the worst case, which arguably provides some practical deterrence to manipulation (Bartholdi, Tovey, and Trick 1989).

Arrow’s and Gibbard–Satterthwaite’s theorems highlight the limitations any preference aggregation method must face. In domains like reinforcement learning from human feedback (RLHF) and AI alignment, we also aggregate preferences – often preferences of multiple human evaluators or preferences revealed in pairwise comparisons – to guide machine learning systems. While these settings sometimes use cardinal scores or learned reward functions (escaping the strict ordinal framework of Arrow’s theorem), the spirit of these impossibility results still applies: there is no perfect way to aggregate human opinions without trade-offs.

For example, aggregating human feedback to train a model may run into inconsistencies analogous to preference cycles, especially when feedback comes from diverse individuals with different values. A simple majority vote over preferences might yield unstable or unfair outcomes if some annotators are systematically in the minority. Weighting votes by some credibility or expertise (weighted voting) can improve outcomes but raises the question of how to set the weights without introducing dictator-like influence. Recent research has proposed methods like jury learning – which integrates dissenting voices by having a panel (“jury”) of models or human subgroups whose aggregated judgment guides the learning (Gordon et al. 2022) – to ensure minority preferences are not entirely ignored. Another perspective is social choice in AI alignment, which suggests using social choice theory to design AI systems that respect a plurality of human values instead of collapsing everything into a single objective. In pluralistic value alignment, instead of forcing a single “best” solution, an AI might present a diverse set of options or behave in a way that reflects a distribution of values. This approach aims to preserve the diversity of human preferences rather than always aggregating to one monolithic preference. For instance, a conversational AI might be designed to recognize multiple acceptable responses (each aligning with different value systems) rather than one canonical “aligned” response for a given query.

These considerations are especially relevant in generative AI and large language models, where training involves human preference data. If we aggregate feedback naively, we might overfit to the majority preference and lose minority perspectives (a form of tyranny of the majority). On the other hand, trying to satisfy everyone can lead to indecision or an incoherent objective. The impossibility results remind us there is no free lunch: we must carefully decide which properties to prioritize (e.g. giving more weight to expert annotators versus preserving broader fairness, or balancing consistency vs inclusivity). Designing aggregation mechanisms for AI that reflect collective human values is an ongoing challenge. It often involves insights from traditional voting theory (to understand trade-offs and failure modes) combined with machine learning techniques (to model and learn from preference data). In summary, social choice theory provides cautionary guidance as we build systems that learn from human preferences: we need to be conscious of which fairness criteria we relax and be transparent about the compromises being made in any preference aggregation pipeline.

5.2 Mechanism Design

While voting rules aggregate ordinal rank preferences to select a social outcome, another class of preference aggregation occurs in economic settings like auctions and general mechanism design. Here individuals reveal their valuations (numerical utilities) for outcomes, and the mechanism chooses an outcome (such as an allocation of goods) and possibly payments. Mechanism design asks: how can we design rules so that rational agents, acting in their own interest, end up revealing information that leads to a socially desirable outcome? A central concept is incentive compatibility – the mechanism should be designed so that each participant’s best strategy is to act according to their true preferences (e.g. bid their true value). In this section, we focus on auctions as a prime example of preference aggregation with money, and highlight classical results including Vickrey–Clarke–Groves (VCG) mechanisms and Myerson’s optimal auction.

Consider a single-item auction with one item for sale and \(n\) bidders. Bidder \(i\) has a private valuation \(v_i\) for the item (how much the item is worth to them). Each bidder’s goal is to maximize their own utility, defined as \(v_i - p_i\) if they win and pay price \(p_i\), or \(0\) if they do not win (assuming quasilinear utility where money is the transferable utility). The auction’s task is to allocate the item to one of the bidders and possibly determine payments. We can think of an auction as a mechanism that asks each bidder for a “message” (typically a bid representing how much they are willing to pay), then selects a winner and a price based on the bids. A key objective might be social welfare maximization – allocate the item to the bidder who values it most (maximizing \(v_i\) of the winner). Another possible objective is revenue maximization for the seller – choose the allocation and price to maximize the seller’s expected payment.

A classic result in auction theory is that to maximize social welfare in a single-item private-value setting, one should award the item to the highest valuer – and this can be done in an incentive-compatible way by using a second-price auction. The Vickrey second-price auction works as follows: (1) All bidders submit sealed bids \(b_1, b_2, \ldots, b_n\). (2) The bidder with the highest bid wins the item. (3) The price paid by the winner is the second-highest bid. For example, if the bids are \((2,\, 6,\, 4,\, 1)\) (in some currency units), the highest bid is \(6\) (by bidder 2, say) and the second-highest is \(4\). Bidder 2 wins the item and pays \(4\).

Under this mechanism, it turns out that bidding truthfully \(b_i = v_i\) is a dominant strategy for each bidder. In other words, the auction is dominant-strategy incentive compatible (DSIC): no matter what others do, a bidder maximizes their expected utility by reporting their true valuation. The intuition is as follows. If bidder \(i\) bids lower than their true value (i.e. \(b_i < v_i\)), and if their true value was actually the highest, they risk losing the item even though they value it more than the price they would have paid – a missed opportunity for positive utility. Bidding higher than their value (\(b_i > v_i\)) cannot help them win in any situation where bidding truthfully wouldn’t (it could only make a difference if their true \(v_i\) wasn’t the highest but they tried to win anyway); and if they do win with an inflated bid, they might end up paying the second-highest bid which could be above their true value, yielding negative utility. By bidding exactly \(v_i\), if they win, it means all other bids were lower, so \(v_i\) is at least as high as the second-highest bid \(p\) they pay – guaranteeing non-negative utility \(v_i - p \ge 0\). If they lose, it means someone else had a higher bid (hence higher value, if others are truthful), so bidder \(i\) wouldn’t have gained anyway. This argument, made rigorous by Vickrey (Vickrey 1961), establishes that truth-telling is a dominant strategy in the second-price auction. As a consequence, when everyone bids truthfully, the item is allocated to the bidder with the highest \(v_i\), achieving maximum social surplus (allocative efficiency). The second-price auction is thus an elegant mechanism that aligns individual incentives with social welfare maximization.
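
As a sanity check, here is a minimal Monte Carlo sketch (hypothetical uniform valuations; the helper names are ours) confirming that no random unilateral deviation ever beats truthful bidding in a second-price auction:

```python
import random

def second_price(bids):
    """Return (winner index, price = second-highest bid)."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return order[0], bids[order[1]]

random.seed(0)
for _ in range(10_000):
    values = [random.random() for _ in range(4)]
    others = values[1:]          # fix everyone except the focal bidder (index 0)

    def utility(bid):
        winner, price = second_price([bid] + others)
        return values[0] - price if winner == 0 else 0.0

    deviation = random.random()  # an arbitrary alternative bid
    assert utility(values[0]) >= utility(deviation) - 1e-12

print("truthful bidding was never beaten by a random deviation")
```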

It is worth contrasting this with a first-price auction, where the winner pays their own bid. In a first-price auction, bidders have an incentive to bid below their true value, trading off a lower chance of winning against a positive profit when they do win; the resulting Nash equilibrium involves bid shading. The first-price auction can still allocate to the highest valuer in equilibrium, but only through strategic behavior, and it is not DSIC. By charging the second-highest bid, the Vickrey auction removes the incentive to shade bids, since the price does not depend on one’s own bid beyond determining whether one wins or loses.

So far, we have discussed auctions aimed at maximizing social welfare. In many cases, the auctioneer (seller) is instead interested in maximizing revenue. A foundational result by Roger Myerson provides a characterization of optimal auctions (those that maximize the seller’s expected revenue) for single-item settings under certain assumptions (Myerson 1981). The problem can be formulated as follows: suppose each bidder’s private value \(v_i\) is drawn independently from a known distribution \(F_i\) (for simplicity, assume an identical distribution \(F\) for all bidders, i.i.d.). We seek a mechanism (allocation rule and payment rule) that maximizes the seller’s expected payment, subject to incentive compatibility and individual rationality (participants should not expect negative utility from truthful participation).

Myerson’s theorem states that the optimal auction in such a setting is a threshold auction characterized by virtual valuations. Define the virtual value for a bidder with value \(v\) as \(\varphi(v) = v - \frac{1-F(v)}{f(v)}\), where \(f\) is the probability density function of \(F\) (assuming it is continuous). An assumption called regularity (which holds for many distributions) is that \(\varphi(v)\) is non-decreasing in \(v\). Myerson showed that the revenue-maximizing strategy is: treat \(\varphi(v)\) as the effective “score” of a bid, allocate the item to the bidder with the highest non-negative virtual value (if all virtual values are negative, allocate to no one), and charge them the smallest value they could have such that they would still win (the payment is essentially the critical bid where \(\varphi\) of that bid equals the second-highest virtual value or the zero cutoff). In practice, for i.i.d. bidders, this reduces to: there is an optimal reserve price \(r\) such that you sell to the highest bidder if and only if their bid \(b_{\max} \ge r\); if sold, the price is the max of the second-highest bid and \(r\).

In the case of \(n\) bidders with values i.i.d. uniform on \([0,1]\) (a regular distribution), one can compute the optimal reserve price. The virtual value function for uniform \([0,1]\) is \(\varphi(v) = v - \frac{1-v}{1} = 2v - 1\). Setting \(\varphi(v)\ge 0\) gives \(v \ge 0.5\). So Myerson’s mechanism says: don’t sell the item if all bids are below 0.5; otherwise, sell to the highest bidder at a price of at least 0.5. This is exactly a second-price auction with a reserve of \(r=0.5\). Consider two uniform\((0,1)\) bidders under this auction; we can break down the cases:

  • With probability \(1/4\), both bidders have values below \(0.5\) (each is below 0.5 with probability 1/2), in which case nobody wins and revenue is 0.
  • With probability \(1/4\), both bidders have \(v > 0.5\). The auction then sells to the highest bidder at the max of the second-highest value and 0.5, which here is the second-highest value. Conditional on both values exceeding 0.5, the expected second-highest value is \(2/3\) (the order statistics of two uniforms on \([0.5,1]\) give an expected minimum of \(2/3\)), so the expected price is about 0.667.
  • With probability \(1/2\), one bidder is above 0.5 and the other below. In that case, the bidder above 0.5 wins at a price equal to the reserve 0.5 (since the second-highest bid is below the reserve).

Taking the expectation, the seller’s expected revenue is \(0 \cdot \tfrac{1}{4} + \tfrac{2}{3} \cdot \tfrac{1}{4} + \tfrac{1}{2} \cdot \tfrac{1}{2} = 0 + \tfrac{1}{6} + \tfrac{1}{4} = \tfrac{5}{12} \approx 0.417\). This is higher than the expected revenue without a reserve: in a plain second-price auction with two uniform \([0,1]\) bidders, the expected revenue is the expectation of the second-highest (here, the lower) of the two values, which is \(1/3 \approx 0.333\). Thus, the reserve has increased revenue. Myerson’s theory tells us that the second-price auction with an optimally chosen reserve indeed maximizes revenue among all DSIC mechanisms for this setting. A notable special case result is that when bidder distributions are i.i.d. and regular, an optimal auction is essentially “allocatively efficient with a reserve price” – i.e. aside from possibly excluding low-value bidders via a reserve, it allocates to the highest remaining bid.
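
These numbers are easy to verify by simulation; the sketch below (assuming two i.i.d. uniform \([0,1]\) bidders) estimates the expected revenue with and without the reserve:

```python
import random

def revenue(values, reserve=0.0):
    """Second-price auction with a reserve: sell to the highest bidder
    iff their bid clears the reserve; price = max(second-highest bid, reserve)."""
    v = sorted(values, reverse=True)
    if v[0] < reserve:
        return 0.0
    return max(v[1], reserve)

random.seed(0)
n, trials = 2, 200_000
no_reserve = sum(revenue([random.random() for _ in range(n)]) for _ in range(trials)) / trials
with_reserve = sum(revenue([random.random() for _ in range(n)], 0.5) for _ in range(trials)) / trials
print(f"no reserve:  {no_reserve:.3f}  (theory: 1/3 ~ 0.333)")
print(f"reserve 0.5: {with_reserve:.3f}  (theory: 5/12 ~ 0.417)")
```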

Myerson’s work also highlighted the gap between revenue maximization and welfare maximization. The price of optimality (in revenue) is that the seller might sometimes forego efficient allocation (e.g. not selling despite a willing buyer, in order to preserve a high reserve price strategy). In contrast, Vickrey’s auction always allocates efficiently but may not maximize revenue.

An interesting insight in auction theory is that increasing competition can yield more revenue than fine-tuning the auction mechanism. The Bulow–Klemperer theorem (Bulow and Klemperer 1996) demonstrates that, under certain regularity assumptions, a simple welfare-maximizing auction with one extra bidder outperforms the optimal auction with fewer bidders. Specifically, for i.i.d. bidders with a regular distribution \(F\), the expected revenue of a second-price auction with \(n+1\) bidders is at least as high as the expected revenue of the Myerson-optimal auction with \(n\) bidders. In formula form:

\[ \mathbb{E}_{v_1,\ldots,v_{n+1} \sim F}[\text{Rev}^{\text{(second-price)}}(n+1 \text{ bidders})] \geq \mathbb{E}_{v_1,\ldots,v_n \sim F}[\text{Rev}^{\text{(optimal)}}(n \text{ bidders})] \,. \tag{4.1}\label{eq-eq3.64} \]

This result suggests that, in practice, having more participants (competition) is often more valuable than exploiting detailed knowledge of bidder distributions. As a corollary, a policy recommendation is that a seller is usually better off using a simple auction design (like a Vickrey auction or other transparent mechanism) and putting effort into attracting more bidders, rather than using a complex optimal mechanism that might discourage participation.
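
A quick Monte Carlo sketch (again with uniform values, a hypothetical but regular distribution) illustrates the theorem for \(n = 2\): a plain second-price auction with three bidders beats the optimal two-bidder auction in expected revenue.

```python
import random

random.seed(1)
TRIALS = 200_000

def second_price_rev(n, reserve=0.0):
    """Average revenue of a second-price auction with an optional reserve."""
    total = 0.0
    for _ in range(TRIALS):
        v = sorted(random.random() for _ in range(n))
        total += 0.0 if v[-1] < reserve else max(v[-2] if n > 1 else 0.0, reserve)
    return total / TRIALS

# Myerson-optimal auction with n = 2 uniform bidders: second price with reserve 0.5.
print(second_price_rev(2, reserve=0.5))  # ~0.417 (= 5/12)
# Simple second-price auction with one extra bidder and no reserve:
print(second_price_rev(3))               # ~0.5, which is larger
```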

Vickrey’s second-price auction can be generalized to multiple items and more complex outcomes by the Vickrey–Clarke–Groves (VCG) mechanism. The VCG mechanism is a cornerstone of mechanism design that provides a general solution for implementing socially efficient outcomes (maximizing total stated value) in dominant strategies, for a broad class of problems. It works for any scenario where agents have quasilinear utilities and we want to maximize the sum of valuations.

In a general mechanism design setting, let \(\Omega\) be the set of possible outcomes. Each agent \(i\) has a private valuation function \(v_i(\omega)\) for outcomes \(\omega \in \Omega\) (the amount of utility, in money terms, that \(i\) gets from outcome \(\omega\)). Agents report bids \(b_i(\omega)\) (which we hope equal \(v_i(\omega)\) if they are truthful). The mechanism then chooses an outcome \(\omega^* \in \Omega\) to maximize the reported total value:

\[ \omega^* = \arg\max_{\omega \in \Omega} \sum_{i=1}^n b_i(\omega) \,, \]

i.e. \(\omega^*\) is the outcome that would be socially optimal if the \(b_i\) were true values. To induce truth-telling, VCG sets payments such that each agent pays the externality they impose on others by their presence. Specifically, one convenient form of the VCG payment for agent \(i\) is:

\[ p_i(b) = \max_{\omega \in \Omega} \sum_{j \neq i} b_j(\omega)\;-\;\sum_{j \neq i} b_j(\omega^*) \,, \]

which can be interpreted as: what would the total value of others be if \(i\) were not present (first term, maximizing without \(i\)) minus the total value others actually get in the chosen outcome \(\omega^*\). Equivalently, we can write the payment as the agent’s bid for the chosen outcome minus a rebate term:

\[ p_i(b) = b_i(\omega^*) \;-\; \Big[\sum_{j=1}^n b_j(\omega^*) - \max_{\omega \in \Omega} \sum_{j \neq i} b_j(\omega)\Big] \,. \tag{4.2}\label{eq-eq3.67} \]

This formula (which in the single-item auction reduces to second-price logic) ensures that, under truthful reporting, each agent’s net payoff is \(v_i(\omega^*) - p_i = \sum_{j=1}^n v_j(\omega^*) - \max_{\omega} \sum_{j\neq i} v_j(\omega)\). The second term does not depend on agent \(i\)’s report at all, so agent \(i\)’s utility equals the total social welfare at the chosen outcome minus a constant. The agent can therefore influence their payoff only through the choice of \(\omega^*\), and that payoff is maximized precisely when \(\omega^*\) maximizes the total welfare \(\sum_j v_j(\omega)\) evaluated at their true value; reporting \(b_i = v_i\) makes the mechanism select exactly this outcome, so misreporting cannot increase their utility. Thus truthful reporting is a dominant strategy. VCG is dominant-strategy incentive compatible (DSIC) and produces an outcome that maximizes \(\sum_i v_i(\omega)\), achieving social welfare maximization.
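
A compact generic implementation is a useful way to internalize the payment rule. The sketch below is our own minimal formulation (the `outcomes` and `valuations` structures are illustrative); it implements the Clarke pivot payments and recovers the second-price auction as a special case.

```python
def vcg(outcomes, valuations):
    """Generic VCG sketch. outcomes: list of outcome labels; valuations[i][w]:
    agent i's reported value for outcome w. Returns (chosen outcome, payments)."""
    n = len(valuations)

    def welfare(agents, omega):
        return sum(valuations[i][omega] for i in agents)

    everyone = range(n)
    best = max(outcomes, key=lambda w: welfare(everyone, w))
    payments = []
    for i in everyone:
        others = [j for j in everyone if j != i]
        best_without_i = max(welfare(others, w) for w in outcomes)
        payments.append(best_without_i - welfare(others, best))  # externality imposed by i
    return best, payments

# Single-item auction as a special case: outcome k means "bidder k wins the item".
bids = [2, 6, 4, 1]
outcomes = list(range(len(bids)))
valuations = [{w: (bids[i] if w == i else 0) for w in outcomes} for i in range(len(bids))]
print(vcg(outcomes, valuations))  # (1, [0, 4, 0, 0]): bidder 1 wins and pays the 2nd-highest bid
```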

VCG provides a powerful existence result: under broad conditions, there is a mechanism that achieves efficient allocation with truth-telling (in fact, VCG is essentially the unique one, aside from adding harmless constant transfers). However, implementing VCG in practice can be difficult. One challenge is computational: finding \(\arg\max_{\omega}\sum_i b_i(\omega)\) can be NP-hard if \(\Omega\) is a combinatorially large space (as in many combinatorial auctions). Another issue is budget balance and revenue: VCG payments might not yield any revenue to the mechanism designer in some cases (or even require subsidies in complex settings), and they can be low or zero in certain environments, which is problematic if the seller needs revenue. VCG is also vulnerable to collusion or the presence of fake identities (sybil attacks) – the mechanism assumes each participant is a separate entity; if one bidder can split into two identities, they might game the outcome.

Nonetheless, for many domains, VCG or variants have been successfully used or at least studied. Notably, combinatorial auctions (where multiple items are up for sale and bidders have valuations for bundles of items) can in theory be handled by VCG: just let \(\Omega\) be all possible allocations of items to bidders, and have bidders report \(b_i(S)\) for each bundle \(S\) of items. VCG would allocate the items in the way that maximizes total reported value and charge each bidder the opportunity cost their presence imposes on others. In practice, as mentioned, combinatorial auctions face exponential complexity in preference reporting (each bidder potentially has to specify a value for every subset of items) and winner determination (solving an NP-hard combinatorial optimization). Heuristic or restricted approaches (like limiting the kinds of bundles or using iterative bidding with query learning of preferences) are used to make the problem tractable. Additionally, pure VCG in combinatorial settings can have undesirable properties: for example, in some cases adding more bidders can cause VCG prices to drop to zero (the so-called “threshold problem” or revenue monotonicity failure), and bidders may collude to manipulate their bids collectively.

One high-stakes application of combinatorial auctions is spectrum auctions for selling licenses of electromagnetic spectrum to telecom companies. Governments have used multi-round combinatorial auctions to allocate spectrum, with billions of dollars at stake. Designing these auctions requires balancing efficiency with simplicity and robustness to strategic behavior. Early spectrum auctions that used simpler formats (like sequential auctions or one-shot sealed bids for each license) ran into problems like the exposure problem – a bidder valuing a combination of items (say complementary licenses in adjacent regions) risks winning only part of the combination at a high price, which could be bad for them if the items are worth much less separately. The simultaneous multi-round auction (SMRA) was an innovation that allowed bidding on all items at once in rounds, giving bidders some price discovery to mitigate the exposure problem. Even so, strategic issues like demand reduction (bidders deliberately not bidding on too many items to keep prices low) and tacit collusion through signaling bids have been observed. These practical complications underscore that while VCG is a beautiful theoretical ideal, real-world mechanism design often involves compromises and tweaks.

5.2.1 Case Study 1: Mechanism for Peer Grading

To illustrate an application of mechanism design beyond auctions, consider a classroom setting where students grade each other’s work (peer assessment). The goal is to design a system (a “mechanism”) that produces fair and accurate grades while incentivizing students to put effort into grading. Hartline and colleagues studied such a scenario, examining how to optimize scoring rules for peer grading (Hartline et al. 2020). In this setting, students are both agents (who might strategize to maximize their own grade or minimize their effort) and graders. The “outcome” we want is a set of final grades for students, ideally reflecting the true quality of their work.

One idea is to use proper scoring rules to evaluate the peer graders. A proper scoring rule is a concept from forecast evaluation that gives highest expected score for truthful reporting of probabilities. In peer grading, one might try to reward students based on how close their grading is to some ground truth or to the TA’s grades. However, a naive application of proper scoring can backfire. Hartline et al. observed a “lazy peer grader” problem: if students figure out that always giving an average score (say 80%) yields a decent reward under the scoring rule, they might not bother to carefully distinguish good and bad work. In one experiment, giving all peers an 80% could yield a 96% accuracy score for the grader under a certain scoring rule (Hartline et al. 2023). This clearly undermines the goal – the grader is basically cheating the system by always predicting the class average.
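
A toy computation makes the problem vivid: under a quadratic (Brier-style) scoring rule, a lazy grader who always reports the class average scores nearly as well as a diligent one. All numbers below are hypothetical illustrations, not figures from Hartline et al.

```python
import random

random.seed(0)
true_quality = [random.betavariate(8, 2) for _ in range(2000)]  # class mean around 0.8

def quadratic_reward(report, truth):
    # Proper scoring: expected reward is maximized by reporting the truth.
    return 1 - (report - truth) ** 2

# Diligent grader: reads the work, reports quality up to small noise.
diligent = sum(quadratic_reward(q + random.gauss(0, 0.05), q) for q in true_quality)
# Lazy grader: reports the class average for every submission.
lazy = sum(quadratic_reward(0.8, q) for q in true_quality)

print(f"diligent grader's average reward: {diligent / len(true_quality):.3f}")
print(f"lazy grader's average reward:     {lazy / len(true_quality):.3f}")
```

The two averages come out nearly identical, which is exactly the weak-incentive problem described above.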

To combat this, the mechanism designers sought a scoring rule that maximizes the difference in reward between a diligent grading and a lazy strategy, thereby incentivizing effort. They formulated this as an optimization problem: design the reward function for peer graders such that truthful, careful grading yields a strictly higher expected score than any degenerate strategy like always giving the average. By analyzing data and grader behavior models, they adjusted the scoring rules to penalize obviously lazy patterns and reward variance when warranted. The resulting mechanism improved the accuracy of peer grading by aligning the incentives of student graders (who want a high score for their grading job) with the objective of accurate assessment. This case study highlights how ideas of incentive compatibility and mechanism design apply even in social/educational contexts: the “payments” are points towards one’s own grade, and the mechanism must account for strategic behavior to ensure a reliable outcome.

In conclusion, mechanism design provides a toolkit for aggregating preferences (or signals, like grades or bids) in a principled way, by explicitly accounting for individual incentives. Whether in auctions, peer grading, or other domains, the design of rules (allocation algorithms, payment or scoring schemes) crucially determines whether people feel encouraged to be truthful or to game the system. The theories of VCG and Myerson give us optimal baselines for efficiency and revenue in auctions, while impossibility results like Gibbard–Satterthwaite warn us of the limitations in voting. Real-world implementations often have to grapple with complexity and approximate these ideals. While learning from individual human preferences is a powerful approach, it too faces aggregation challenges. If the human feedback is inconsistent or if different annotators have different preferences, the reward model may end up capturing an average that satisfies no one perfectly. There is active research on scalable oversight: techniques to gather and aggregate human feedback on tasks that are too complex for any single person to evaluate reliably. This includes approaches like recursive reward modeling, iterated amplification (Christiano, Shlegeris, and Amodei 2018), and AI-assisted debate (Irving, Christiano, and Amodei 2018), where AI systems help humans provide better feedback or break down tasks. The goal of scalable oversight is to leverage human preferences and principles in guiding AI even as AI systems tackle increasingly complex or open-ended tasks, while mitigating the human burden and bias in evaluation.

In summary, preference aggregation in machine learning spans from simple models like Bradley–Terry for pairwise comparisons to elaborate RLHF pipelines for training large models. The deep mathematical foundations – whether Arrow’s theorem or Myerson’s auction theory – remind us that whenever we aggregate preferences or signals from multiple sources, we must consider incentive effects, fairness criteria, and the possibility of inconsistency. By combining insights from social choice, economics, and statistical learning, we aim to build AI systems that not only learn from human preferences but do so in a principled, robust, and fair manner. The next chapter will delve further into aligning AI with human values, building on the mechanisms and learning algorithms discussed here to ensure AI systems remain beneficial and in line with what people truly want.

5.2.2 Case Study 2: Incentive-Compatible Online Learning

We now construct a model of recommendation with strategic users, first outlining the key criteria the model must satisfy. The model revolves around repeated interactions between a planner (the system) and multiple agents (the users). Each agent, upon arrival in the system, is presented with a set of available options to choose from. These options could vary widely depending on the application of the model, such as routes in a transportation network, a selection of hotels in a travel booking system, or entertainment choices in a streaming service. The interaction process is straightforward but crucial: agents arrive, select an action from the provided options, and then report feedback based on their experience. This feedback is vital, as it forms the basis upon which the planner improves and evolves its recommendations.

The agents in this model are strategic; they aim to maximize their reward based on the information available to them. This acknowledges the real-world scenario where users are typically self-interested and seek to optimize their own outcomes. The planner, on the other hand, has a broader objective: it aims to learn which alternatives are best in a given context and works to maximize the overall welfare of all agents. This involves a complex balancing act. The planner must accurately interpret feedback from a diverse set of agents, each with their own preferences and biases, and use this information to refine and improve the set of options available. The ultimate goal of the planner is to create a dynamic, responsive system that not only caters to the immediate needs of individual agents but also enhances the collective experience over time, leading to a continually improving recommendation ecosystem.

Here, we seek to address the inherent limitations faced by the planner, particularly in scenarios where monetary transfers are not an option, and the only tool at its disposal is the control over the flow of information between agents. This inquiry aims to understand the extent to which these limitations impact the planner’s ability to effectively guide and influence agent behavior. A critical question is whether the planner can successfully induce exploration among agents, especially in the absence of financial incentives. This involves investigating strategies to encourage users to try less obvious or popular options, thus broadening the scope of feedback and enhancing the system’s ability to learn and identify the best alternatives. Another question is understanding the rate at which the planner learns from agent interactions. This encompasses examining how different agent incentives, their willingness to explore, and their feedback impact the speed and efficiency with which the planner can identify optimal recommendations.

The model can be extended in several directions, each raising its own set of questions.

1.  Multiple Agents with Interconnected Payoffs: When multiple agents arrive simultaneously, their choices and payoffs become interconnected, resembling a game. The research question here focuses on how these interdependencies affect individual and collective decision-making.

2.  Planner with Arbitrary Objective Function: Investigating scenarios where the planner operates under an arbitrary objective function, which might not align with maximizing overall welfare or learning the best alternative.

3.  Observed Heterogeneity Among Agents: This involves situations where differences among agents are observable and known, akin to contextual bandits in machine learning. The research question revolves around how these observable traits can be used to tailor recommendations more effectively.

4.  Unobserved Heterogeneity Among Agents: This aspect delves into scenarios where differences among agents are not directly observable, necessitating the use of causal inference techniques to understand and cater to diverse user needs.

In our setup, there is a “planner,” which aims to increase exploration, and many independent “agents,” which act selfishly (in a way that they believe will maximize their individual reward) (Mansour, Slivkins, and Syrgkanis 2019; Mansour et al. 2021). Under our model shown in Figure 1.1, there are \(K\) possible actions that all users can take, and each action has some mean reward \(\mu_i \in [0, 1]\). In addition, there is a common prior belief on each \(\mu_i\) across all users. The \(T\) agents, or users, arrive sequentially. When the \(t\)-th user arrives, they are recommended an action \(I_t\) by the planner, which they are free to follow or not. After taking whichever action \(i\) they choose, the user experiences some realized reward \(r_i \in [0, 1]\), drawn i.i.d. with mean \(\mu_i\), and reports this reward back to the planner.

So far, the model we have defined is equivalent to a multi-armed bandit model. Under this model, well-known results in economics, operations research, and computer science show that \(O(\sqrt{T})\) regret is achievable (Russo and Van Roy 2015; Auer, Cesa-Bianchi, and Fischer 2002; Lai and Robbins 1985) with algorithms such as Thompson sampling and UCB. However, our agents are strategic and aim to maximize their own rewards. If they observe the rewards gained from actions taken by previous users, they will simply take the action they believe will yield the highest reward given those observations; they would prefer to benefit from exploration done by other users rather than take the risk of exploring themselves. Therefore, exploration on an individual level, which the planner would like to facilitate, is not guaranteed under this paradigm.

In light of this, we also require that our model satisfy incentive compatibility: taking the action recommended by the planner must have an expected utility at least as high as any other action the agent could take. Formally, \(\forall i, i': \; \mathbb{E}[\mu_i \mid I_t = i] \geq \mathbb{E}[\mu_{i'} \mid I_t = i]\). Note that this incentivizes the agents to actually take the actions recommended by the planner; if incentive compatibility is not satisfied, agents would simply ignore the planner and take whatever action they think will lead to the highest reward.

At a high level, the key to achieving incentive compatibility while still creating a policy for the planner that facilitates exploration is information asymmetry. Under this paradigm, the users only have access to their previous recommendations, actions, and rewards, and not to the recommendations, actions, and rewards of other users. Therefore, they cannot tell whether, after other users take certain actions and receive certain rewards, arms that they might have initially considered worse in fact outperform arms that they initially considered better. Only the planner has access to the previous actions and rewards of all users; the user only has access to their own recommendations and overall knowledge of the planner’s policy. The main question we aim to answer in the rest of this section is whether, given this new constraint of incentive compatibility, \(O(\sqrt{T})\) regret is still achievable. We illustrate such an algorithm in the following.

The main result here is a black-box reduction that turns any bandit algorithm into an incentive-compatible one with only a constant-factor increase in Bayesian regret. Since, as mentioned earlier, there are bandit algorithms with \(O(\sqrt{T})\) Bayesian regret, the black-box reduction also yields incentive-compatible algorithms with \(O(\sqrt{T})\) regret. The idea is to simulate \(T\) steps of any bandit algorithm in an incentive-compatible way in \(cT\) steps, for some constant \(c\). This allows us to design incentive-compatible recommendation systems by taking any bandit algorithm and adapting it. Consider the following setting: there are two possible actions, \(A_1\) and \(A_2\). Assume deterministic rewards, where action 1 has reward \(\mu_1\) with prior \(U[1/3, 1]\) and mean \(\mathbb{E}[\mu_1] = 2/3\), and action 2 has reward \(\mu_2\) with prior \(U[0, 1]\) and mean \(\mathbb{E}[\mu_2] = 1/2\). Without planner intervention and with full observability, users would simply always pick \(A_1\); how, then, can the planner incentivize users to play \(A_2\)?

The key insight is going to be to hide exploration in a pool of exploitation. The users are only going to receive a recommendation from the planner, and no other observations. After deterministically recommending the action with the highest expected reward (\(A_1\)), the planner will pick one guinea pig to recommend the exploratory action of \(A_2\). The users don’t know whether they are the guinea pig, so intuitively, as long as the planner picks guinea pigs uniformly at random and at low enough frequencies, the optimal decision for the users is still to follow the planner’s recommendation, even if it might go against their interest. The planner will pick the user who will be recommended the exploratory action uniformly at random from the \(L\) users that come after the first one (which deterministically gets recommended the exploitation action). Under this setting (illustrated in Figure 1.2), it is optimal for users to always follow the option that is recommended for them. More formally, if \(I_t\) is the recommendation that a user receives at time \(t\), then we have that:

\[ \begin{split} \mathbb{E}[\mu_1 - \mu_2 \mid I_t = 2]\,\Pr[I_t = 2] &= \frac{1}{L}\,\big(\mathbb{E}[\mu_1] - \mathbb{E}[\mu_2]\big) \quad \text{(gain if you are the unlucky guinea pig)}\\ &\quad+ \Big(1 - \frac{1}{L}\Big)\, \mathbb{E}[\mu_1 - \mu_2 \mid \mu_1 < \mu_2]\,\Pr[\mu_1 < \mu_2] \quad \text{(loss if you are not and $\mu_1 < \mu_2$)}\\ &\leq 0 \end{split} \]

This holds when \(L \geq 12\). It means that the expected gain from deviating to \(A_1\) when recommended \(A_2\) is non-positive, so users should always follow the recommendation.

So far we have considered the case where rewards are deterministic, but what about stochastic rewards? We now consider the case where rewards are drawn i.i.d. from some distribution, with each action \(A_i\) having reward distribution \(r_i^t \sim D_i\), \(\mathbb{E}[r_i^t] = \mu_i\). Returning to the case of two actions, we adapt the guinea-pig algorithm to the stochastic reward setting. Since one reward observation is no longer enough to fully determine \(\mu_1\), we instead observe the outcome of the first action \(M\) times to form a strong posterior \(\mathbb{E}[\mu_1 \mid r_1^1, \ldots, r_1^M]\). As before, we pick one guinea pig uniformly at random from the next \(L\) users and use the reward we get as the exploratory signal. In a very similar manner, we can generalize this algorithm from two actions to the general multi-armed bandit problem. Suppose we have a general multi-armed bandit algorithm \(A\). We wrap \(A\) in our black-box reduction to make it incentive compatible: every decision that \(A\) would make is padded with exactly \(L-1\) recommendations of the action believed to be best so far. This guarantees that the expected rewards for the users not chosen as guinea pigs are at least as good as \(A\)’s reward at phase \(n\).
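
The following sketch is our simplified rendering of the deterministic two-action phase described earlier (the constant \(L = 12\) matches the bound above); it simulates one exploration phase of the scheme.

```python
import random

# A minimal sketch of "hiding exploration in exploitation" for two actions with
# deterministic rewards; priors follow the text: mu1 ~ U[1/3, 1], mu2 ~ U[0, 1].
random.seed(0)
L = 12                                   # phase length; the analysis above needs L >= 12
mu1, mu2 = random.uniform(1 / 3, 1), random.uniform(0, 1)

recommendations = [1]                    # user 0 is deterministically told the a-priori-best action
guinea_pig = random.randrange(1, L + 1)  # one of the next L users is the hidden explorer
for t in range(1, L + 1):
    recommendations.append(2 if t == guinea_pig else 1)

# After the phase, the planner has observed both (deterministic) rewards and can
# recommend the truly better action to everyone who follows.
best = 1 if mu1 >= mu2 else 2
print(f"recommendations: {recommendations}")
print(f"guinea pig was user {guinea_pig}; best action going forward: A_{best}")
```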

5.3 Mutual Information Paradigm

In this section we discuss an influential framework for designing peer prediction mechanisms, the Mutual Information Paradigm (MIP) introduced by Kong and Schoenebeck (2019). Traditional peer prediction approaches typically rely on scoring rules and correlation between agents’ signals. However, these methods often struggle with issues like uninformed equilibria, where agents can coordinate on uninformative strategies that yield higher payoffs than truth-telling. The core idea of MIP is to reward agents based on the mutual information between their report and the reports of other agents. We consider a setting with \(n\) agents, each possessing a private signal \(\Psi_i\) drawn from some set \(\Sigma\). The mechanism asks each agent to report their signal, which we denote as \(\hat{\Psi}_i\). For each agent \(i\), the mechanism randomly selects a reference agent \(j \neq i\). Agent \(i\)’s payment is then calculated as \(MI(\hat{\Psi}_i; \hat{\Psi}_j)\), where \(MI\) is an information-monotone mutual information measure. An information-monotone \(MI\) measure must satisfy the following properties:

  • Symmetry: \(MI(X; Y) = MI(Y; X)\).

  • Non-negativity: \(MI(X; Y) \geq 0\), with equality if and only if \(X\) and \(Y\) are independent.

  • Data processing inequality: For any transition probability \(M\), if \(Y\) is independent of \(M(X)\) conditioned on \(X\), then \(MI(M(X); Y) \leq MI(X; Y)\).

Two important families of mutual information measures that satisfy these properties are \(f\)-mutual information and Bregman mutual information. The \(f\)-mutual information is defined as \(MI_f(X; Y) = D_f(U_{X,Y}, V_{X,Y})\), where \(D_f\) is an \(f\)-divergence, \(U_{X,Y}\) is the joint distribution of \(X\) and \(Y\), and \(V_{X,Y}\) is the product of their marginal distributions. The Bregman mutual information is defined as \(BMI_{PS}(X; Y) = \mathbb{E}_{X} [D_{PS}(U_{Y|X}, U_Y)]\), where \(D_{PS}\) is a Bregman divergence based on a proper scoring rule \(PS\), \(U_{Y|X}\) is the conditional distribution of \(Y\) given \(X\), and \(U_Y\) is the marginal distribution of \(Y\). The MIP framework can be applied in both single-question and multi-question settings. In the multi-question setting, the mechanism can estimate the mutual information empirically from multiple questions. In the single-question setting, additional techniques like asking for predictions about other agents’ reports are used to estimate the mutual information. A key theoretical result of the MIP framework is that when the chosen mutual information measure is strictly information-monotone with respect to agents’ priors, the resulting mechanism is both dominantly truthful and strongly truthful. This means that truth-telling is a dominant strategy for each agent and that the truth-telling equilibrium yields strictly higher payoffs than any other non-permutation strategy profile. As research continues to address practical implementation challenges of designing truthful mechanisms, MIP-based approaches have significant potential to improve preference elicitation and aggregation in real-world applications lacking verifiable ground truth.
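
To make the paradigm concrete, the sketch below estimates an \(f\)-mutual information (using the total-variation divergence) from paired reports over many questions, in the spirit of the multi-question setting. The binary signal model, accuracy numbers, and function names are illustrative assumptions.

```python
import numpy as np

def tvd_mutual_information(x, y):
    """MI_f with the total-variation f-divergence: D_tv(joint, product of marginals)."""
    x, y = np.asarray(x), np.asarray(y)
    xs, ys = np.unique(x), np.unique(y)
    joint = np.zeros((len(xs), len(ys)))
    for a, b in zip(x, y):
        joint[np.searchsorted(xs, a), np.searchsorted(ys, b)] += 1
    joint /= joint.sum()
    product = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return 0.5 * np.abs(joint - product).sum()

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=1000)                           # latent ground truth per question
honest_i = np.where(rng.random(1000) < 0.9, truth, 1 - truth)   # informative reports (90% accurate)
honest_j = np.where(rng.random(1000) < 0.9, truth, 1 - truth)
random_j = rng.integers(0, 2, size=1000)                        # uninformative strategy

print(tvd_mutual_information(honest_i, honest_j))  # high: correlated through the truth
print(tvd_mutual_information(honest_i, random_j))  # near zero: independent reports
```

Informative reports correlate through the latent ground truth and earn a high payment, while an uninformative strategy is nearly independent of the reference agent and earns almost nothing; this is what deters uninformed equilibria.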

5.4 Exercises

5.4.1 Question 1: Pairwise Feedback Mechanisms for Digital Goods

Consider a marketplace for digital goods (such as personalized articles, artwork, or AI-generated data), where the exact utility derived from these goods is only revealed to buyers after the goods have been generated and delivered. To elicit truthful preferences from buyers who find it difficult to precisely quantify their valuations beforehand, the marketplace implements a pairwise feedback mechanism, inspired by the work of Robertson and Koyejo (2023).

Formally, each buyer requests a personalized digital good and, upon receiving the good, provides feedback by indicating whether their realized utility is higher or lower than a randomly chosen reference price \(c \in [0,1]\). The mechanism utilizes this binary feedback to estimate valuations and allocate future goods accordingly.

Answer the following:

  1. Formalize the Problem: Let \(u_i\) denote the true valuation of buyer \(i\), and let \(r_i(c)\) denote the buyer’s reported feedback (\(r_i(c) = 1\) if \(u_i \geq c\), 0 otherwise). Prove that, under uniform random selection of the reference price \(c\), the expected value \(\mathbb{E}[r_i(c)]\) is equal to the true valuation \(u_i\).

  2. Incentive Compatibility Analysis: Discuss conditions under which this feedback-based mechanism is incentive compatible, i.e., buyers have no incentive to misreport their preferences. Specifically, analyze why strategic misreporting (reporting \(r_i(c)\) incorrectly for some reference prices) would not increase a buyer’s expected payoff.

  3. Regret Analysis: Suppose the mechanism estimates buyers’ utilities from past feedback and allocates future goods using an epsilon-greedy strategy (exploration rate \(\eta_t\)). Provide an informal discussion of the trade-off involved in choosing the exploration rate, and how it affects the social welfare and revenue of the marketplace over time.

  4. Practical Implications: Suggest one practical scenario outside the digital-goods marketplace where such a feedback-driven, pairwise comparison approach would be beneficial. Briefly justify your choice, mentioning challenges and benefits.

5.4.2 Question 2: Scalable Oversight in Complex Decision-Making

In scenarios involving complex or high-dimensional outcomes (such as summarizing lengthy texts, assessing the quality of detailed AI-generated reports, or reviewing scientific papers), evaluating the quality of outputs can become infeasible for a single human overseer. One practical solution is scalable oversight, where the evaluation task is decomposed and distributed among multiple human evaluators or even assisted by AI agents. Consider a scalable oversight scenario inspired by recursive reward modeling, where complex evaluations are hierarchically decomposed into simpler tasks. Specifically, suppose you want to evaluate a lengthy report generated by an AI system. Answer the following:

(a) Decomposition of the Task: Propose a formal recursive decomposition strategy to evaluate a long AI-generated report of length \(N\) paragraphs. Specifically, describe a hierarchical evaluation method that decomposes the original evaluation into simpler subtasks at multiple hierarchical levels. Clearly describe how many subtasks you have at each level and how the final aggregated evaluation is computed.

(b) Statistical Aggregation Method: Suppose each evaluation subtask yields a binary score \(s_i \in \{0,1\}\), where \(1\) indicates acceptable quality and \(0\) indicates unacceptable quality. Propose a simple statistical aggregation method (e.g., majority voting, threshold voting, weighted aggregation, etc.) to combine subtask evaluations into a single global quality assessment at the top level. Justify your choice mathematically.

(c) Computational Simulation: Implement a Python simulation of your hierarchical decomposition and aggregation method described in parts (a) and (b). Assume each subtask is evaluated with some fixed probability \(p\) of being correct (representing human evaluators with bounded accuracy).

Specifically, your simulation should:

  • Implement a hierarchical evaluation scheme (e.g., binary-tree decomposition).
  • Assume evaluators have accuracy \(p = 0.8\) (i.e., probability of correctly identifying paragraph quality).
  • Simulate how evaluator accuracy at the leaf nodes affects the reliability of the global evaluation at the root node.
  • Plot how the reliability of the top-level evaluation (accuracy at the root) varies as you increase the depth of the hierarchy for a report of fixed length (e.g., \(N = 64\) paragraphs).

(d) Practical Discussion: Briefly discuss advantages and potential drawbacks of scalable oversight approaches such as recursive decomposition in the context of AI alignment. Include considerations such as evaluator fatigue, consistency, cost, and vulnerability to manipulation or collusion.

References

Arrow, Kenneth J. 1951. Social Choice and Individual Values. John Wiley & Sons.
Auer, Peter, Nicolò Cesa-Bianchi, and Paul Fischer. 2002. “Finite-Time Analysis of the Multiarmed Bandit Problem.” Machine Learning 47 (2). https://doi.org/10.1023/A:1013689704352.
Bartholdi, John J., Craig A. Tovey, and Michael A. Trick. 1989. “The Computational Difficulty of Manipulating an Election.” Social Choice and Welfare 6 (3): 227–41.
Bulow, Jeremy, and Paul Klemperer. 1996. “Auctions Versus Negotiations.” The American Economic Review 86 (1): 180–94. http://www.jstor.org/stable/2118262.
Christiano, Paul, Buck Shlegeris, and Dario Amodei. 2018. “Supervising Strong Learners by Amplifying Weak Experts.” arXiv Preprint arXiv:1810.08575.
Gibbard, Allan. 1973. “Manipulation of Voting Schemes: A General Result.” Econometrica 41 (4): 587–601.
Gordon, Mitchell L., Michelle S. Lam, Joon Sung Park, Kayur Patel, Jeff Hancock, Tatsunori Hashimoto, and Michael S. Bernstein. 2022. “Jury Learning: Integrating Dissenting Voices into Machine Learning Models.” In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. ACM.
Hartline, Jason D., Yingkai Li, Liren Shan, and Yifan Wu. 2020. “Optimization of Scoring Rules.” CoRR abs/2007.02905. https://arxiv.org/abs/2007.02905.
Hartline, Jason D., Liren Shan, Yingkai Li, and Yifan Wu. 2023. “Optimal Scoring Rules for Multi-Dimensional Effort.” In Proceedings of Thirty Sixth Conference on Learning Theory, edited by Gergely Neu and Lorenzo Rosasco, 195:2624–50. Proceedings of Machine Learning Research. PMLR. https://proceedings.mlr.press/v195/hartline23a.html.
Irving, Geoffrey, Paul Christiano, and Dario Amodei. 2018. “AI Safety via Debate.” arXiv Preprint arXiv:1805.00899.
Kong, Yuqing, and Grant Schoenebeck. 2019. “An Information Theoretic Framework for Designing Information Elicitation Mechanisms That Reward Truth-Telling.” ACM Trans. Econ. Comput. 7 (1). https://doi.org/10.1145/3296670.
Lai, T. L., and Herbert Robbins. 1985. “Asymptotically Efficient Adaptive Allocation Rules.” Advances in Applied Mathematics 6 (1): 4–22. https://doi.org/10.1016/0196-8858(85)90002-8.
Mansour, Yishay, Aleksandrs Slivkins, and Vasilis Syrgkanis. 2019. “Bayesian Incentive-Compatible Bandit Exploration.” https://arxiv.org/abs/1502.04147.
Mansour, Yishay, Aleksandrs Slivkins, Vasilis Syrgkanis, and Zhiwei Steven Wu. 2021. “Bayesian Exploration: Incentivizing Exploration in Bayesian Games.” https://arxiv.org/abs/1602.07570.
Myerson, Roger B. 1981. “Optimal Auction Design.” Mathematics of Operations Research 6 (1): 58–73.
Russo, Daniel, and Benjamin Van Roy. 2015. “An Information-Theoretic Analysis of Thompson Sampling.” https://arxiv.org/abs/1403.5341.
Satterthwaite, Mark Allen. 1975. “Strategy-Proofness and Arrow’s Conditions: Existence and Correspondence Theorems for Voting Procedures and Social Welfare Functions.” Journal of Economic Theory 10 (2): 187–217.
Vickrey, William. 1961. “Counterspeculation, Auctions, and Competitive Sealed Tenders.” Journal of Finance 16 (1): 8–37.