<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Mind the Future]]></title><description><![CDATA[Understanding the system of the world]]></description><link>https://www.mindthefuture.info</link><image><url>https://substackcdn.com/image/fetch/$s_!DN7E!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26e2534-f7d1-4746-a6bc-69981acf305f_1024x1024.png</url><title>Mind the Future</title><link>https://www.mindthefuture.info</link></image><generator>Substack</generator><lastBuildDate>Wed, 06 May 2026 10:34:22 GMT</lastBuildDate><atom:link href="https://www.mindthefuture.info/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Richard Ngo]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[mindthefuture@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[mindthefuture@substack.com]]></itunes:email><itunes:name><![CDATA[Richard Ngo]]></itunes:name></itunes:owner><itunes:author><![CDATA[Richard Ngo]]></itunes:author><googleplay:owner><![CDATA[mindthefuture@substack.com]]></googleplay:owner><googleplay:email><![CDATA[mindthefuture@substack.com]]></googleplay:email><googleplay:author><![CDATA[Richard Ngo]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Economic efficiency often undermines sociopolitical autonomy]]></title><description><![CDATA[Against being econ-brained]]></description><link>https://www.mindthefuture.info/p/economic-efficiency-often-undermines</link><guid isPermaLink="false">https://www.mindthefuture.info/p/economic-efficiency-often-undermines</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Tue, 10 Mar 2026 19:25:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/45933d2f-c2fc-42a4-b0bb-803b78fb55e3_1600x1069.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Many people in my intellectual circles use economic abstractions as one of their main tools for reasoning about the world. However, this often leads them to overlook how interventions which promote economic efficiency undermine people&#8217;s ability to maintain sociopolitical autonomy. By &#8220;autonomy&#8221; I roughly mean a lack of reliance on others&#8212;which we might operationalize as the ability to survive and pursue your plans even when others behave adversarially towards you. By &#8220;sociopolitical&#8221; I mean that I&#8217;m thinking not just about individuals, but also groups formed by those individuals: families, communities, nations, cultures, etc.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>The short-term benefits of economic efficiency tend to be legible and quantifiable. However, economic frameworks struggle to capture the longer-term benefits of sociopolitical autonomy, for a few reasons. Firstly, it&#8217;s hard for economic frameworks to describe the relationship between individual interests and the interests of larger-scale entities. 
Concepts like national identity, national sovereignty or social trust are very hard to cash out in economic terms&#8212;yet they&#8217;re strongly predictive of a country&#8217;s future prosperity. (In technical terms, this seems related to the fact that utility functions are outcome-oriented rather than process-oriented&#8212;i.e. they only depend on interactions between players insofar as those interactions affect the game&#8217;s outcome).</p><p>Secondly, economic frameworks typically assume that people act in their rational interests at each point in time. They therefore rule out adversarial dynamics like credible threats (and following through on commitments more generally). Yet both offensive and defensive commitments are crucial aspects of how groups make decisions (as decision theories like FDT and UDT attempt to capture). For example:</p><ul><li><p>The legal system&#8217;s commitment to punishing criminals (even when the punishment costs society much more than the crime did) is the foundation on which economic property rights are maintained.</p></li><li><p>A nation&#8217;s commitment to regaining territory lost in wars (even when it can&#8217;t be justified by cost-benefit analyses, like Britain&#8217;s defense of the Falklands) deters enemies from trying to seize that territory in the first place.</p></li></ul><p>A more general principle here is that, while economists tend to think about what&#8217;s rational on the margin, political power depends on what would happen in <em>worst-case </em>scenarios. Marginal thinking is often more useful in the short term, but in the long term <a href="https://www.lesswrong.com/posts/fPvssZk3AoDzXwfwJ/universal-basic-income-and-poverty">control over the worst-case outcomes</a> provides leverage (for you or your adversaries) to shape the whole landscape of marginal effects. For example, if a tyrannical ruler sometimes executes people who seem disloyal, then his subjects might respond by <a href="https://www.mindthefuture.info/p/power-lies-trembling">proactively punishing dissidents</a> to prove their own loyalty. Hence relatively infrequent executions can be amplified into a <a href="https://www.mindthefuture.info/p/elite-coordination-via-the-consensus">society-wide control apparatus</a> that shapes everyone&#8217;s marginal incentives. (On a technical level, this is related to how changes in <a href="https://en.wikipedia.org/wiki/Cooperative_bargaining#Disagreement_point">disagreement points</a> can have big effects on the solutions of bargaining games&#8212;though mainstream bargaining theory hasn&#8217;t accounted for <a href="https://www.lesswrong.com/posts/vJ7ggyjuP4u2yHNcP/threat-resistant-bargaining-megapost-introducing-the-rose">how this incentivizes threats</a>.)</p><p>Thirdly, economics assumes commensurability (e.g. that goods and services can be priced in terms of money). But the mechanisms and institutions which maintain sociopolitical autonomy require a level of reliability which is undermined by commensurability. For example:</p><ul><li><p>Individuals whose integrity is for sale at the right price can&#8217;t be trusted as leaders.</p></li><li><p>Legal systems which punish speech based on how much harm they think it does are easily weaponized. 
(This is more of a utilitarian failing than an economic failing, but utilitarianism also relies heavily on commensurability.)</p></li><li><p>Countries which concede some territory to their neighbors undermine their ability to credibly commit to defending the rest of their territory.</p></li></ul><p>(James C. Scott's work on how <a href="https://en.wikipedia.org/wiki/Seeing_Like_a_State">states use commensurability and standardization</a> to <a href="https://www.amazon.com/Art-Not-Being-Governed-Anarchist-ebook/dp/B01N75OC23">scale their power</a> is very relevant to this.)</p><p>These particular examples are sufficiently obvious that few people defend treating them as commensurable. However, in the rest of this post I&#8217;ll discuss five cases where I think many people are applying economic frameworks too broadly, and thereby undermining the sociopolitical foundations that economic analysis implicitly relies on. I&#8217;ll refer to this as being &#8220;econ-brained&#8221;. Econ-brain is related to neoliberalism, libertarianism, and effective altruism, though it&#8217;s not synonymous with any of them.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> It&#8217;s often critiqued by both the anti-market left and the nationalist right; I&#8217;m more sympathetic to the latter critiques, but will mostly focus on examples that aren&#8217;t polarized along standard partisan lines.</p><p>I&#8217;d eventually like to develop a formal definition of &#8220;sociopolitical rationality&#8221; that can precisely describe the failures of &#8220;economic rationality&#8221;. In the meantime, I hope that these examples convey the core intuitions. Of course, it&#8217;s hard to summarize any one topic, let alone five of them. So please take each of these five sociopolitical perspectives in the spirit of &#8220;ideas you might be missing, that could add up to something big&#8221; rather than &#8220;an individual knock-down case against econ-brained thinking&#8221;. To facilitate that, I recommend that you take a few moments to note down your opinion of the headline topic before reading the corresponding section.</p><h3>Five case studies</h3><ol><li><p>Prediction markets</p></li></ol><p>[Pause here if you want to consider your stance towards them before reading.]</p><p></p><p></p><p><a href="https://en.wikipedia.org/wiki/Prediction_market">Prediction markets</a> have highly desirable properties from an economic perspective. They are incentive-compatible ways of surfacing hidden information. They&#8217;re extremely hard to manipulate, at least in theory&#8212;if anyone suspects manipulation is happening, they can profit by betting in the opposite direction. And so they&#8217;ve been supported by various economists (<a href="https://www.overcomingbias.com/p/prediction-markets-now">most notably Hanson</a>) as well as the rationalist and effective altruist communities.</p><p>Why oppose prediction markets? One standard response is that prediction markets could be used as <a href="https://en.wikipedia.org/wiki/Assassination_market">assassination markets</a>. That is, any market which would be affected by the death of a major figure could allow someone to profit off assassinating them.
However, this feels like an edge case&#8212;assassinations are rare, and financially-motivated assassinations even rarer.</p><p>A more central objection, based on the same principle, is that it&#8217;s easy for prediction markets to become <em>corruption</em> markets. One type of corruption is simply profiting by betting on private information, which we&#8217;ve already started to see with the rise of Polymarket (see <a href="https://www.mondaq.com/unitedstates/commoditiesderivativesstock-exchanges/1737394/potential-insider-trading-concerns-in-prediction-markets-the-polymarket-maduro-bet-case?utm_source=chatgpt.com">here</a>, <a href="https://gizmodo.com/polymarket-user-accused-of-1-million-insider-trade-on-google-search-markets-2000696258">here</a>, <a href="https://futurism.com/future-society/polymarket-half-time-show">here</a>). We can debate the extent to which institutions <em>should</em> be able to keep information private&#8212;but by default they won&#8217;t have a choice. Unlike stock markets, prediction markets can be set up in large numbers on arbitrary questions, with anonymized crypto-based payouts, potentially making insider trading much harder to monitor.</p><p>Moreover, as prediction markets become better-capitalized I expect we&#8217;ll start to see cases where decisions are <em>made</em> in order to influence prediction markets. We&#8217;ve only seen <a href="https://www.businessinsider.com/coinbase-ceo-earnings-call-words-prediction-markets-bets-2025-12?utm_source=chatgpt.com">unimportant</a> <a href="https://casinobeats.com/2025/12/11/donald-trump-speech-manipulated-kalshi-trader-ensure-winning-bets/?utm_source=chatgpt.com">examples</a> of this so far, but as prediction markets grow the incentives to do so will increase. Furthermore, prediction markets could be used as a mechanism to anonymously bribe decision-makers. <a href="https://x.com/RichardMCNgo/status/1885786853921763608">As a toy example</a>, people who wanted to incentivize policy X could create and subsidize a market like &#8220;conditional on policy X being announced, which day will it happen?&#8221; The decision-maker could then profit by announcing policy X on a day of their choosing, and betting accordingly. Unlike regular bribes, this doesn&#8217;t require any direct interaction or agreement which could serve as smoking-gun evidence of corruption (though it does leave a public record of the anonymized transactions).</p><p>In short, prediction markets harm institutions&#8217; ability to maintain autonomy in the face of external pressures, by commodifying the process of turning institutional influence into money (and vice versa). Nor is this a coincidence. Instead, prediction markets create &#8220;efficiency&#8221; precisely by incentivizing individuals to be more engaged with markets, at the expense of legal and moral obligations to the institutions they work within.</p><ol start="2"><li><p>Land value taxes</p></li></ol><p>[Pause here if you want to consider your stance towards them before reading.]</p><p></p><p></p><p>Land value taxes are well-known to be highly economically efficient. In general, taxes disincentivize the production of whatever is being taxed. However, in most places it&#8217;s not possible to produce more land. And the vast majority of the value of land is driven by factors that the land owners themselves don&#8217;t control (such as proximity to a city).
So land taxes are considered far less distortionary than taxes on income or consumption&#8212;hence the <a href="https://archive.is/bkS21">recurring</a> <a href="https://www.astralcodexten.com/p/your-book-review-progress-and-poverty">popularity</a> of <a href="https://en.wikipedia.org/wiki/Georgism">Georgism</a> amongst <a href="https://www.noahpinion.blog/p/how-to-sell-georgism-to-the-middle">political commentators</a>, who sometimes suggest that land value taxes should replace income taxes altogether.</p><p>The term &#8220;non-distortionary&#8221; can be misleading, though. If land value taxes replaced income taxes, they&#8217;d significantly affect who&#8217;s able to afford which property&#8212;just in ways that economists think increase efficiency. Consider someone who&#8217;d like to use their property in a way that isn&#8217;t very financially rewarding&#8212;for example, as a community hub. Once they own their property, they might need relatively little income to be viable (and therefore pay little in income taxes). However, if a land value tax is implemented, they&#8217;d need to pay the same amount of tax as a commercial business using that same property would, which might force them to move or shut down.</p><p>Defenders of land value taxes argue that this is efficient from an economic perspective: it reallocates property from economically unproductive to economically productive uses. Another way of putting this, however, is that land value taxes would make it harder for land-owners to remain autonomous. Instead of freely choosing how to use their own properties, they&#8217;d face strong pressures to use them in ways that the market finds valuable. To contrast this with income taxes, consider some group that doesn&#8217;t use money to organize itself internally. If you draw a boundary around that group, then income tax only takes some percentage of money that flows in across that boundary, and so the group can reduce its tax burden by becoming more self-sufficient. Conversely, a land value tax creates a net outflow of money from the group that isn&#8217;t determined by how much money is flowing in, forcing it to maintain a significant income stream to survive.</p><p>There&#8217;s a rights-based case against infringing on such groups&#8217; autonomy, which I&#8217;ll discuss later on. But even in consequentialist terms, society is disproportionately shaped by people and groups that are able to insulate themselves from commercial pressures. This occurs at many different scales: individual homeowners, churches or universities, communities (or communes), all the way up to ethnic groups like the Amish. Such groups are able to experiment with novel ideologies and lifestyles in significant part because they&#8217;re less accountable to market forces than corporations. The lessons from those experiments can spread very widely (e.g. the Amish are a common reference point in discussions of fertility declines). By comparison, consider how bad almost all corporations are at cultural leadership&#8212;because genuinely novel thinking is often economically illegible, and therefore very difficult to do under financial pressure.</p><p>I&#8217;ve been discussing land value taxes in a very abstract sense.
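</p><p>As a minimal sketch of the boundary-flow contrast above (in Python, with purely hypothetical figures and tax rates): an income tax scales with money crossing the group&#8217;s boundary, while a land value tax is a fixed outflow that self-sufficiency cannot reduce.</p><pre><code class="language-python"># Toy model of the boundary-flow contrast described above.
# All figures and rates are hypothetical illustrations.

def income_tax(external_income: float, rate: float) -> float:
    # Scales with money flowing in across the group's boundary,
    # so becoming more self-sufficient shrinks the bill.
    return rate * external_income

def land_value_tax(assessed_land_value: float, rate: float) -> float:
    # A fixed outflow, independent of income: the group must keep
    # earning cash to hold onto its land.
    return rate * assessed_land_value

for income in (100_000, 20_000):  # the group cuts external income 5x
    print(f"external income {income}: "
          f"income tax {income_tax(income, 0.30):,.0f}, "
          f"land value tax {land_value_tax(1_000_000, 0.05):,.0f}")
# external income 100000: income tax 30,000, land value tax 50,000
# external income 20000: income tax 6,000, land value tax 50,000
</code></pre><p>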
In reality, there are many complicating factors which might mitigate the effects I described, some of which I discuss in a footnote.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> However, the most important practical consideration may simply be the difficulty of guaranteeing that land value taxes would actually replace other taxes, rather than just adding to them. Over the last century, we&#8217;ve seen massive expansions of state power in many domains&#8212;amount of regulation and amount of taxation being two crucial ones. For the population as a whole to retain its autonomy, it seems very important to set and defend <a href="https://www.lesswrong.com/posts/Kbm6QnJv9dgWsPHQP/schelling-fences-on-slippery-slopes">Schelling fences</a> at which we can coordinate to resist further encroachments&#8212;with strong property rights being one of the best such fences. Adding new taxes&#8212;and in particular recurring taxes on things which you already own&#8212;would make &#8220;ownership&#8221; a less meaningful concept. It would therefore become more difficult to rally around property rights to fight against expansions of state power (especially ones nominally justified by appeals to economic efficiency).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>I suspect that many ordinary people understand the dynamics I&#8217;ve explained on an intuitive level&#8212;which is why property taxes and poll taxes are so unpopular. However, these intuitions remain illegible from an econ-brained perspective, in part because the sociopolitical principles behind them have never been adequately formalized.</p><ol start="3"><li><p>Higher education</p></li></ol><p>Higher education is puzzling from an econ-brained perspective, because university students don&#8217;t seem to be learning very many job-relevant skills, yet are still paid a significant wage premium over non-graduates. The best economic explanation for why this happens is <a href="https://www.amazon.com/Case-against-Education-System-Waste/dp/0691174652/ref=as_sl_pc_qf_sp_asin_til">Caplan&#8217;s signaling account</a>; he claims that going to university is a signal of intelligence, conscientiousness and conformity.</p><p>However, <a href="https://www.mindthefuture.info/p/contra-caplan-on-higher-education">as I argue in this post</a>, the signaling account doesn&#8217;t work, because there are much cheaper ways to signal all of these traits. Instead, I suspect that college is best understood as forming an elite class with its own norms and values (as described by <a href="https://infed.org/dir/welcome/pierre-bourdieu-habitus-capital-and-field-exploring-reproduction-in-the-practice-of-education/#:~:text=and%20scholars%E2%80%99%20(Calhoun%202002).-,habitus,-Bourdieu%E2%80%99s%20distinction%20by">Bourdieu</a>, <a href="https://web.archive.org/web/20240324083144/http://archive.harpers.org/1994/11/pdf/HarpersMagazine-1994-11-0001862.pdf">Lasch</a>, and others).</p><p>I&#8217;ll note that the formation of such an elite class is actually harmful for most countries. So in this case I&#8217;d actually prefer a more economically efficient outcome (like a massive reduction in university prestige and attendance).
However, it&#8217;s still a good example of the difference between economic and sociopolitical reasoning.</p><ol start="4"><li><p>Free trade</p></li></ol><p>Mainstream economic thinking is strongly in favor of free trade, for the sake of its economic benefits. However, mainstream economic thinking has also led to a huge amount of American manufacturing capacity being offshored to its geopolitical rivals, to the point where even most US military supply chains are dependent on Chinese production. So economic efficiency here comes at the longer-term cost of national autonomy&#8212;both in terms of robustness to disruptions (e.g. from covid) and robustness to conflict with China. While both points have been made in various places over the years, they don&#8217;t seem to have been adequately incorporated into economic consensus&#8212;e.g. I saw few mainstream economists take them into account when evaluating Trump&#8217;s tariffs.</p><p>Now, there&#8217;s an argument that intertwining the US and Chinese supply chains makes the world safer, by making war between the two superpowers more costly. In other words, perhaps decreasing American and Chinese autonomy is a good thing. However, even though both countries are <em>economically </em>dependent on each other, the US is disproportionately <em>industrially</em> and <em>militarily </em>dependent on China. So from a &#8220;hard power&#8221; perspective, the US gave up autonomy while China retained (and in fact increased) its autonomy.</p><p>Another big tension between economic and sociopolitical views of free trade is that the sociopolitical view accounts for shifts in the internal balance of power within the US. The manufacturing industry is far more widely-distributed across US states than the finance or software industries. So its decline has led to increased concentration of power amongst coastal elites. Again, I&#8217;m not claiming that this should be a decisive argument against free trade; however, it&#8217;s the kind of consideration that doesn&#8217;t arise naturally from an econ-brained perspective. Whereas from a sociopolitical perspective, maintaining autonomous subagents is a crucial component of a nation&#8217;s continued health (which is a major reason to defend states&#8217; rights).</p><ol start="5"><li><p>The future of AGI</p></li></ol><p>Econ-brained thinking has shaped the AGI safety community&#8217;s (and thereby the wider world&#8217;s) perspective on the future of AGI. Influential figures like <a href="https://intelligence.org/files/AIFoomDebate.pdf">Hanson</a>, <a href="https://paulfchristiano.com/ai/">Christiano</a>, and <a href="https://80000hours.org/podcast/episodes/carl-shulman-common-sense-case-existential-risks/">Shulman</a> often apply economic abstractions to make forecasts. This contrasts with thinkers like <a href="https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities">Yudkowsky</a> or <a href="https://podcast.clearerthinking.org/episode/028/michael-vassar-preference-falsification-and-postmodernism/">Vassar</a> who are more dismissive of the relevance of economics for thinking about AGI (though I wouldn&#8217;t summarize them as &#8220;sociopolitics-brained&#8221;, but rather merely &#8220;less econ-brained&#8221;).</p><p>In this section I&#8217;ll prioritize breadth over depth. 
I&#8217;ll give half a dozen examples of econ-brained ideas about how to orient to AGI, and mostly leave the task of generating sociopolitical critiques of them as exercises for the reader:</p><ul><li><p>The idea of paying AIs to cooperate with us, as discussed <a href="https://www.lesswrong.com/posts/psqkwsKrKHCfkhrQx/making-deals-with-early-schemers">here</a>, <a href="https://docs.google.com/document/d/1SUgGftOMKO4GVSBzOa3wMDcAhM4UrOGAjYVPnALxKY4/edit?tab=t.0#heading=h.h4ccv9xrmzam">here</a>, and <a href="https://www.dwarkesh.com/p/give-ais-a-stake-in-the-future">here</a>.</p></li><li><p>The idea of owning galaxies, as discussed <a href="https://www.lesswrong.com/posts/SYyBB23G3yF2v59i8/on-owning-galaxies">here</a>.</p></li><li><p>The idea that harms from speeding up AI capabilities progress can be largely offset by benefits from preventing capabilities overhangs (as defended <a href="https://www.alignmentforum.org/posts/fRSj2W4Fjje8rQWm9/thoughts-on-sharing-information-about-language-model#Accelerating_LM_agents_seems_neutral__or_maybe_positive_">here</a>, <a href="https://www.alignmentforum.org/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research">here</a>, and <a href="https://www.alignmentforum.org/posts/uFNgRumrDTpBfQGrs/let-s-think-about-slowing-down-ai?commentId=NdA7t6RaDGy3MWANp">here</a> and critiqued <a href="https://www.alignmentforum.org/posts/4Pi3WhFb4jPphBzme/don-t-accelerate-problems-you-re-trying-to-solve">here</a>). In addition to Paul&#8217;s position, it&#8217;s illustrative to contrast two other people&#8217;s stances towards this idea:</p><ul><li><p>Sam Altman used the idea of compute overhangs as a justification to accelerate progress towards AGI, until it became more useful to start pushing for more GPU production instead.</p></li><li><p>Meanwhile, an example of the polar opposite strategy was <a href="https://www.lesswrong.com/posts/LtT24cCAazQp4NYc5/open-global-investment-as-a-governance-model-for-agi?commentId=9i6hEbipSWwyzsFAA">Wei Dai declining to invest in Anthropic for moral reasons</a>, thereby losing out on what would by now have been over 400x returns. 
I respect Wei&#8217;s approach very much (despite not knowing whether he should have been more econ-brained in this case).</p></li></ul></li><li><p>The idea that AGI labs are efficient at racing towards AGI, and therefore building new capabilities evals isn&#8217;t very helpful for them (<a href="https://x.com/richardmcngo/status/1869089459662438573?s=46">as I critique here</a>).</p></li><li><p>The idea of tracking progress towards AGI in terms of <a href="https://sideways-view.com/2018/02/24/takeoff-speeds/">GDP growth</a> or <a href="https://www.basilhalperin.com/papers/agi_emh.pdf">real interest rates</a>.</p></li><li><p>The idea that AGI will come in the form of separate tools or services rather than unified agents, as defended by <a href="https://www.overcomingbias.com/p/how-lumpy-ai-serviceshtml">Hanson</a> and <a href="https://owainevans.github.io/pdfs/Reframing_Superintelligence_FHI-TR-2019.pdf">Drexler</a>.</p><ul><li><p>Note the parallel between this perspective and the idea that businesses are mainly held together by <a href="https://en.wikipedia.org/wiki/The_Nature_of_the_Firm">transaction costs</a>, the latter of which led Krier to the very econ-brained idea that society could be revolutionized by AI-enabled <a href="https://blog.cosmos-institute.org/p/coasean-bargaining-at-scale">Coasean bargaining at scale</a>.</p></li></ul></li></ul><p>Some of these ideas have been critiqued by <a href="https://www.lesswrong.com/posts/xJWBofhLQjf3KmRgg/four-ways-learning-econ-makes-people-dumber-re-future-ai">Byrnes</a>, <a href="https://www.lesswrong.com/posts/F8sfrbPjCQj4KwJqn/the-sun-is-big-but-superintelligences-will-not-spare-earth-a">Yudkowsky</a>, and others. In his posts on the <a href="https://www.alignmentforum.org/posts/ivpKSjM4D6FbqF4pZ/cortes-pizarro-and-afonso-as-precedents-for-takeover">Spanish conquistadors</a> as precedents for AGI takeover, Kokotajlo is clearly also looking at the issue through a sociopolitical lens. However, it&#8217;s worth noting that econ-brained thinkers have scored some big wins over the last decade&#8212;e.g. <a href="https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like">predicting the diffusion of AI across society</a>, and <a href="https://www.lesswrong.com/posts/sCCdCLPN9E3YvdZhj/shulman-and-yudkowsky-on-ai-progress">the unprecedented amount of investment</a> that would be <a href="https://www.dwarkesh.com/p/carl-shulman">funneled towards the AI industry</a>. And zooming out even further, compute-based forecasts of AGI like Kurzweil&#8217;s and Legg&#8217;s have been surprisingly prescient. Such forecasts aren&#8217;t quite central examples of being econ-brained, but there&#8217;s definitely something econ-brained (and <a href="https://www.lesswrong.com/posts/LvKDMWQ3yLG9R3gHw/empiricism-as-anti-epistemology">something</a> <a href="https://www.lesswrong.com/posts/ax695frGJEzGxFBK4/biology-inspired-agi-timelines-the-trick-that-never-works">anti-Yudkowskian</a>) about believing so much in <a href="https://slatestarcodex.com/2019/03/13/does-reality-drive-straight-lines-on-graphs-or-do-straight-lines-on-graphs-drive-reality/">straight lines on graphs</a>.</p><p>Why is this? The most straightforward possibility is simply that the concept of econ-brain is too lossy an abstraction to reliably evaluate thinkers with. Ideally we&#8217;d try to diagnose what led to each of these successes and failures in granular detail. But as a rough heuristic, is being more econ-brained actually a good way to improve your forecasts?
Some possible responses:</p><ul><li><p>Maybe the forecasting successes listed above required the right balance between econ-brained and other kinds of thinking. If you&#8217;re too econ-brained, you reject the concept of AGI altogether; if you&#8217;re not econ-brained enough, you&#8217;re surprised by how continuously progress has advanced over the last decade. Paul and Carl and Ray and Shane might be in the sweet spot re these particular topics. But this isn&#8217;t a very satisfying response, because these people are extremely econ-brained by almost everyone&#8217;s standards.</p></li><li><p>Maybe economic factors are more important in the short term (during which institutions and power structures are roughly stable), whereas sociopolitical dynamics play out over longer time horizons (and will especially kick in once AIs become capable of wielding political power). This makes econ-brained people more like foxes, and sociopolitics-brained people more like hedgehogs. The former tend to make accurate predictions more often; however, the latter have a better chance of predicting the most important large-scale shifts.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a></p></li><li><p>Maybe there&#8217;s a tradeoff between predictive accuracy and the ability to get things done. In general, outside-view bets like &#8220;nothing ever happens&#8221; tend to outperform your inside view on most topics. Similarly, believing in efficient markets is a good strategy for most investors. But it&#8217;s hard to change the world by believing in efficient markets. Relatedly, in the final section of <a href="https://www.mindthefuture.info/p/power-lies-trembling">this post</a> I discuss how &#8220;leaps of faith&#8221; can be extremely valuable for sparking coordinated action.</p></li></ul><h3>Conclusion</h3><p>These five case studies are far from exhaustive. There are plenty of examples that I omitted for brevity (e.g. surge pricing, YIMBYism, and earning to give). And there are other cases that I suspect are important examples of this phenomenon, but that I don&#8217;t yet understand well enough to discuss in detail. For example, cryptocurrency is a nominally economic domain that seems more driven by sociopolitical dynamics than economic fundamentals. And <a href="https://benjaminrosshoffman.com/#:~:text=Macroeconomics%2C%20Social%20Theory%2C%20and%20the%20Psychic%20Consequences%20of%20Pyramid%20Schemes">Ben Hoffman&#8217;s writing on macroeconomics</a> (in particular <a href="https://benjaminrosshoffman.com/the-debtors-revolt/">his post on the debtor&#8217;s revolt</a>) provides a perspective from which 20th-century economic history was driven by sociopolitical conflicts.</p><p>In other cases, econ-brained thinking is harnessed to defend a position, but isn&#8217;t the main force behind that position. For example, the culture wars that are currently raging over immigration definitely feature clashes between economic and sociopolitical considerations. However, I suspect that the pro-immigration side is not fundamentally motivated by immigration&#8217;s purported economic benefits, which are better understood as fig leaves for <a href="https://www.lesswrong.com/posts/FuGfR3jL3sw6r8kB4/richard-ngo-s-shortform?commentId=9g5QK3Km25fMBHgcK">a deeper-rooted globalist ideology</a>.
Similarly, even though much of the explicit debate about Brexit pitted economic against cultural considerations, the sheer vitriol that elites leveled against Brexiteers suggests that they were primarily motivated by sociopolitical considerations of their own.</p><p>Ultimately, the greatest prize would be a precise technical theory that fills in what economics is missing. Scott Garrabrant&#8217;s <a href="https://www.lesswrong.com/s/4hmf7rdfuXDJkxhfg">distinction between arithmetic and geometric rationality</a> seems like one important step towards this. As he points out, arithmetic rationality (which I suspect is closely related to economic thinking) is oriented towards maximizing efficiency. But if taken too far, <a href="https://www.lesswrong.com/s/4hmf7rdfuXDJkxhfg/p/rc5ZKGjXTHs7wPjop">it creates internally dysfunctional agents</a>, and so it needs to be governed at the meta-level by geometric rationality (which I suspect is closely related to sociopolitical thinking). A big question is then how to <a href="https://www.lesswrong.com/posts/8oMF8Lv5jiGaQSFvo/boundaries-part-1-a-key-missing-concept-from-utility-theory">draw boundaries</a> between the two categories in a principled way.</p><p>That&#8217;s all beyond the scope of this post, though. For now, I merely hope that I&#8217;ve conveyed the core idea that there&#8217;s <em>something </em>interesting about autonomy and related sociopolitical concepts which is systematically neglected (and undermined) by econ-brained thinking.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Corporations are another example of such a group&#8212;though a less central one, because they lack many of the traits that hold together most sociopolitical groups (such as membership/citizenship that&#8217;s difficult to take away from people).</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I use &#8220;econ-brain&#8221; rather than &#8220;neoliberalism&#8221; to avoid getting caught up in the political connotations, since the neoliberal world order does many things that econ-brained people disagree with. Also, econ-brain applies to some issues that neoliberalism doesn&#8217;t have much of a stance on, like prediction markets or AGI. Meanwhile, when I talk about libertarians as econ-brained, I&#8217;m primarily referring to the modern economic-focused libertarianism espoused by thinkers like <a href="https://www.econlib.org/archives/2016/03/libertarianism_4.html">Bryan Caplan</a> and <a href="https://slatestarcodex.com/2013/12/08/a-something-sort-of-like-left-libertarianism-ist-manifesto/">Scott Alexander</a>. Conversely, historical libertarian(ish) figures like Hayek and Rand thought much more about sociopolitical concepts such as serfdom vs freedom.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Three such considerations:</p><ul><li><p>Many of the organizations I mentioned above currently have charitable tax exemptions, and so wouldn&#8217;t be adversely affected by land value taxes. However, I think of this as only a band-aid solution to the core problem.
If standards for charities are too loose, land value tax is no longer effective (because everyone would find some way to own property via a charity). If standards are too strict, then charitable status provides much less autonomy (because charities would still have to stay on the state&#8217;s good side to retain their status). Overall, the more a tax relies on getting the exceptions right, the less sound we should consider its principles to be.</p></li><li><p>Property taxes are similar to land value taxes in many ways, and are far more common. So I expect that many of the problems that a full-blown land value tax would cause already exist to a lesser extent in jurisdictions with high property taxes. It&#8217;d be useful to get empirical data on this. For now, I&#8217;m focusing on land value taxes as a cleaner case study of econ-brained thinking.</p></li><li><p>My thought experiment of a community avoiding income taxes by becoming more self-sufficient is in tension with the fact that, in the US, income taxes technically also apply to non-monetary transactions. However, I think that the impossibility of actually enforcing this itself helps demonstrate the limitations of economic thinking. Even in principle, how could you put prices on non-monetary exchanges that occur within a family, or a community, or between university students? If you imagine a government actually trying to do this (and punishing people who don&#8217;t pay) that would be the clearest example yet of how economic thinking undermines sociopolitical autonomy.</p></li></ul></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>A related practical issue which I haven&#8217;t seen a good Georgist response to: the case for land value taxes over property taxes relies on incentivizing construction. But if construction is severely restricted by permitting processes (as it is in most Western cities) then a land value tax would unfairly penalize landowners who didn&#8217;t already have buildings on their land, without actually leading to much additional housing. To be fair, I expect this is part of why YIMBYism is much more popular today than Georgism.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Relatedly: when Paul Christiano and Eliezer Yudkowsky tried to operationalize their disagreement as a bet, Paul claimed that he&#8217;d be willing to bet on most things, whereas Eliezer was much more selective. 
But when they did settle on a single bet, Eliezer ended up winning (though note that the bet they chose was one where Eliezer was closer to the consensus side, suggesting that there might have been adverse selection).</p></div></div>]]></content:encoded></item><item><title><![CDATA[Contra Caplan on higher education]]></title><description><![CDATA[Though it's still a waste of time and money]]></description><link>https://www.mindthefuture.info/p/contra-caplan-on-higher-education</link><guid isPermaLink="false">https://www.mindthefuture.info/p/contra-caplan-on-higher-education</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Mon, 16 Feb 2026 20:53:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6e55fc18-3592-4dfd-85c6-33e6f0048129_944x860.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Three theories of higher education</h3><p>Getting an undergraduate degree is very costly. In America, the direct financial cost of attending a private university is typically in the hundreds of thousands of dollars. Even when tuition is cheap (or covered by scholarships), forgoing three to four years of salary and career progression is a large opportunity cost. There are a variety of reasons why students are willing to pay these costs, but the key one is that desirable employers highly value college degrees.</p><p>Why? The standard economic answer is that college classes teach skills which are relevant for doing jobs well: the &#8220;human capital&#8221; theory. But even a cursory comparison of college curricula to the actual jobs college graduates are hired for makes this idea seem suspicious. And private tutoring is <a href="https://en.wikipedia.org/wiki/Bloom%27s_2_sigma_problem">so vastly more effective than classes</a> that it&#8217;s very inefficient to learn primarily via the latter (especially now that many university courses are more expensive than even 1:1 tutoring, let alone AI tutoring).</p><p>Another answer is that attending college can be valuable for the sake of signaling desirable traits to employers. An early version of this model <a href="https://www.jstor.org/stable/1882010">comes from Spence</a>; more recently, Bryan Caplan has argued that most of the wage premium from going to college comes from signaling. In this post I&#8217;ll be engaging with Caplan&#8217;s version of the signaling hypothesis, as laid out in <a href="https://www.amazon.com.au/Case-against-Education-System-Waste/dp/0691174652">his book The Case Against Education</a>.</p><p>Your university degree signals many things about your underlying characteristics, but Caplan claims that there are three traits employers prioritize above all others: &#8220;the trinity of intelligence, conscientiousness, and conformity&#8221;. This hypothesis purports to explain a number of important gaps in the human capital theory&#8212;e.g. why college students so quickly forget so much of the material covered in their courses after passing exams, why the rise of free online courses hasn&#8217;t changed the college landscape very much, and why <a href="https://en.wikipedia.org/wiki/Sheepskin_effect">finishing 90% of a degree is far less than 90% as valuable</a> as completing the whole thing.</p><p>However, I think Caplan&#8217;s signaling theory is also wrong. In particular, his concept of conformity can&#8217;t be understood in standard economic terms. 
Instead, I&#8217;ll argue that we need a sociological explanation centered around group membership and group norms&#8212;which I&#8217;ll try to flesh out in a follow-up post. First, though, let&#8217;s engage with Caplan&#8217;s position, starting with the other two aspects of his trinity.</p><h3>College attendance isn&#8217;t explained by intelligence or conscientiousness signaling</h3><p>A key problem with Caplan&#8217;s trinity is that most of it is easily replaceable. Getting good grades at college <em>does</em> signal intelligence and conscientiousness, but these could be signaled far more easily and cheaply. It&#8217;s very easy to signal intelligence via test scores: IQ is surprisingly predictive of many other desirable cognitive traits. This need not require literal IQ tests&#8212;standardized tests like the SAT or GRE are highly correlated with intelligence. In other cases, companies use IQ-like tests (e.g. tech companies&#8217; coding interviews). These are also significantly harder to cheat on than college courses.</p><p>Caplan acknowledges that college grades are far from the best way to signal intelligence; what he doesn&#8217;t discuss is that they&#8217;re even further from the best way to signal conscientiousness. If you asked people why they don&#8217;t just learn college material independently without paying for college, I expect that a common response would simply be &#8220;oh, I don&#8217;t have the discipline for that&#8221;. College provides external frameworks, timetables, local incentives, and social pressure for people who aren&#8217;t conscientious enough to learn without that.</p><p>So although doing well at college signals more conscientiousness than lazing about, an even better signal of conscientiousness would be acquiring all the same knowledge without attending college at all! In fact, people who are capable of learning college-level material independently should be going out of their way to <em>avoid</em> college lest they be mistaken for those who can only do it within a motivating social structure. Again, this hinges on the existence of high-quality testing services&#8212;but if conscientiousness signaling drove a significant proportion of the value of a college diploma, then providing such testing would be very profitable.</p><p>I&#8217;ll digress briefly to clarify a point that sometimes confuses people (including my past self). It&#8217;s common to talk about &#8220;<a href="https://en.wikipedia.org/wiki/Signalling_theory#Handicap_principle">costly</a><em><a href="https://en.wikipedia.org/wiki/Signalling_theory#Handicap_principle"> </a></em><a href="https://en.wikipedia.org/wiki/Signalling_theory#Handicap_principle">signaling</a>&#8221;, which involves incurring costs that would be prohibitive for people who don&#8217;t possess desirable traits. But costly signaling is just one type of &#8220;credible signaling&#8221;, aka signaling that is difficult to fake. Other types of credible signaling need not be expensive&#8212;IQ tests are an example of a very cheap but very credible signal.</p><p>By basic economic logic, people should prefer to do credible signaling in cheaper rather than more expensive ways. So any explanation of behavior in terms of costly signaling needs to explain why the system doesn&#8217;t gradually shift towards using cheaper credible signals.
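</p><p>To make that logic concrete, here&#8217;s a schematic sketch (with invented signal names, costs, and credibility scores) of an agent picking the cheapest sufficiently credible signal; the hypothetical &#8220;conformity lock&#8221; flag anticipates the move discussed next.</p><pre><code class="language-python"># Schematic model of choosing the cheapest credible signal.
# Signal names, costs, and credibility scores are invented for illustration.

signals = [
    # (name, cost in dollars, credibility, already common practice?)
    ("four-year degree", 200_000, 0.9, True),
    ("standardized tests", 500, 0.8, False),
    ("work trial", 5_000, 0.9, False),
]

def cheapest_credible(signals, threshold, conformity_locked):
    # If conformity matters, novel signals are self-defeating: using
    # them marks you as a nonconformist, so only signals that are
    # already common practice remain admissible.
    admissible = [(name, cost) for name, cost, cred, common in signals
                  if cred >= threshold and (common or not conformity_locked)]
    return min(admissible, key=lambda s: s[1])[0]

print(cheapest_credible(signals, 0.8, conformity_locked=False))
# -> standardized tests (the basic economic logic)
print(cheapest_credible(signals, 0.8, conformity_locked=True))
# -> four-year degree (the lock-in that Caplan needs to defend)
</code></pre><p>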
In Caplan&#8217;s account, that&#8217;s where the &#8220;conformity&#8221; part plays a big role.</p><p>I&#8217;ll explore in more detail what his account of conformity signaling is in the next section, and why I don&#8217;t think it succeeds. But I first want to note that the arguments above should already update our view of Caplan&#8217;s theory. If I&#8217;m right about how replaceable the functions of intelligence and conscientiousness signaling are in justifying college degrees, then even calling it a &#8220;signaling theory&#8221; of education is misleading. Instead what Caplan is defending is more accurately summarized as the &#8220;conformity signaling theory&#8221; of education, because that&#8217;s the part that&#8217;s justifying almost all of the cost of college <em>compared with other possible signaling strategies</em>. Analogously: if product A costs $10 and lets you do tasks X and Y, and product B costs $100 and lets you do tasks X, Y, and Z, then a good explanation for why &#8220;rational&#8221; people keep buying B needs to focus on the value of doing Z.</p><p>Of course, it&#8217;s hard enough to write a book about how college is for signaling; describing college attendance as being driven by conformity would be even more controversial. I don&#8217;t want to criticize Caplan too harshly for this omission, since he&#8217;s been more honest than almost any other academic about the ways in which higher education is a waste of time and money. And I don&#8217;t think he&#8217;s being deliberately deceptive. But my guess is that he flinched away from summarizing his theory as the &#8220;conformity signaling theory&#8221; of education because it would have received even more pushback than his &#8220;signaling theory&#8221;. I wish he hadn&#8217;t, though, because trying to pin down what conformity signaling is, and why employers purportedly value it, makes the holes in this theory clear.</p><h3>College attendance isn&#8217;t explained by conformity signaling</h3><p>Conformity signaling is more complicated than intelligence or conscientiousness signaling. Caplan spends half a dozen pages explaining it in the first chapter of The Case Against Education. I&#8217;ll describe his position in my own words here, starting with a quick note on what he <em>doesn&#8217;t </em>mean. Firstly, he doesn&#8217;t mean that students are signaling a <em>general </em>tendency to conform:</p><blockquote><p>&#8220;Employers aren&#8217;t looking for workers who conform in some abstract sense&#8230; Hippies strive to look, talk, and act like fellow hippies. This doesn&#8217;t make unkempt hair and tie-dye shirts any less repugnant to employers. Employers are looking for people who conform to the folkways of today&#8217;s workplace&#8212;people who look, talk, and act like modern model workers.&#8221;</p></blockquote><p>For now I&#8217;ll call this trait &#8220;conformity to professional norms&#8221;. You might then think that employers want to hire college graduates because they&#8217;ve learned professional norms during their degrees. 
But this would be a <em>human capital</em> explanation, whereas Caplan is clear that he&#8217;s focusing on a signaling explanation.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> So we can reconstruct Caplan&#8217;s signaling theory as instead claiming:</p><ol><li><p>High school students vary in the underlying trait of how able and willing they are to conform to professional norms.</p></li><li><p>Going to (and succeeding at) university signals that you&#8217;re able and willing to conform to professional norms.</p></li></ol><p>So far, this is a standard signaling explanation. We can debate how strong the correlation is between successful university attendance and workplace professionalism&#8212;I can see arguments in either direction. However, even if the former is a very good signal of the latter, there&#8217;s a more pressing issue: attending university is very costly compared with references or work trials or almost any other method of signaling professionalism. So Caplan needs to be able to explain why far cheaper methods of credibly signaling conformity to professional norms don&#8217;t develop.</p><p>This is where his theory of conformity signaling becomes disanalogous to intelligence or conscientiousness signaling, by adding a third claim:</p><ol start="3"><li><p>Conformity is the one thing that you <em>can&#8217;t </em>develop cheaper ways to signal, because doing new and unusual types of signaling <em>itself</em> demonstrates a lack of conformity.</p></li></ol><p>Because of this, Caplan argues that university degrees are now &#8220;locked in&#8221; as the key signal of conformity. Anyone who tries to signal conformity to professional norms in other ways is outing themselves as weird and nonconformist, making their new signal self-defeating.</p><p>It&#8217;s a clever move from Caplan, but ultimately I think it&#8217;s conceptually confused. The core issue is that, even if &#8220;professionalism&#8221; requires some amount of &#8220;conformity&#8221;, they&#8217;re still distinct concepts. There are plenty of ways that rational employers should want their employees to be nonconformist&#8212;e.g. spotting new market opportunities before others do. There are also plenty of ways in which college students don&#8217;t mind signaling nonconformity with the business world: their avant-garde politics, their idiosyncratic hair and clothes, and often their nontraditional majors. If students were really spending years of their lives and hundreds of thousands of dollars primarily to signal conformity, shouldn&#8217;t they be picking much lower-hanging fruit first?</p><p>Indeed, there&#8217;s something suspicious about Caplan&#8217;s use of the term &#8220;conformity&#8221; at all. Why not just say that employers are looking for professionalism, and students are trying to signal professionalism? Adding the word &#8220;conform&#8221; is a verbal trick which proves too much: by Caplan&#8217;s logic <em>any</em> example of a person signaling that they follow norm X could be redescribed as &#8220;signaling conformity to norm X&#8221;, and then used to explain why they&#8217;re &#8220;locked in&#8221; to irrational behavior.</p><p>Finally, <a href="https://www.overcomingbias.com/p/school-is-to-submithtml">as Hanson notes</a>: even if the idea of lock-in explains why a practice continues, it can&#8217;t explain why it <em>started</em>.
In the past, only a small percentage of the population attended college, and it was perfectly normal to get a prestigious job without a college degree. What drove the rise of college in the first place? Whatever it was, that seems like it should be our default hypothesis for what&#8217;s driving the college wage premium today.</p><h3>Explaining college requires sociological theories</h3><p>To be clear, I do think there&#8217;s <em>something</em> important going on related to conformity. It just can&#8217;t be captured as part of a signaling framework&#8212;or any other economic framework&#8212;for at least two reasons.</p><p>Firstly, signaling is a framework under which rational agents pay costs to demonstrate pre-existing traits. But conforming is best understood as a process of <em>internalizing </em>deference to other people, i.e. making oneself less rational. Conformists can&#8217;t turn their conformity off when it might profit them&#8212;think of how many people decided not to invest in bitcoin, or scoffed at the possibility of rapid AI progress, because it sounded weird. They even internalize conformity on an emotional level&#8212;e.g. they often get angry at nonconformists (something which I expect Caplan has experienced many times). This is hard to model in economic terms.</p><p>A second problem with the idea of students signaling to employers is that employers are <em>also</em> better modeled as conforming rather than making rational choices. For example, Caplan claims that students don&#8217;t signal intelligence using standardized test scores because &#8220;putting high scores on your resume suggests you&#8217;re smart but socially inept. You&#8217;re doing something that&#8217;s &#8216;simply not done.&#8217;&#8221; But firms could easily request standardized test scores from all applicants, alleviating each student&#8217;s fear of standing out.</p><p>More generally, when Caplan lists the traits that he thinks employers want, surprisingly few of them are directly related to employee productivity:</p><blockquote><p>&#8220;What are modern model workers like? They&#8217;re team players. They&#8217;re deferential to superiors, but not slavish. They&#8217;re congenial toward coworkers but put business first. They dress and groom conservatively. They say nothing remotely racist or sexist, and they stay a mile away from anything construable as sexual harassment. Perhaps most importantly, they know and do what&#8217;s expected, even when articulating social norms is difficult or embarrassing. Employers don&#8217;t have to tell a modern model worker what&#8217;s socially acceptable case by case.&#8221;</p></blockquote><p>Traits like employees&#8217; appearances, political correctness, and ability to intuit social norms don't help much with the object-level work involved in most jobs. What they <em>are</em> relevant for is managing the company&#8217;s image&#8212;whether in the eyes of other employees, potential customers, or even government regulators. But even if this makes sense in isolation, we&#8217;ve now hypothesized a labor &#8220;market&#8221; in which everyone is nervously looking around at everyone else to try to avoid appearing weird. This is no longer an economic equilibrium in any reasonable sense. 
Instead, it&#8217;s a social equilibrium&#8212;albeit one with major economic implications&#8212;and we&#8217;ll need new concepts to model it.</p><p>In a follow-up post I&#8217;ll discuss some sociological theories of college attendance&#8212;most notably Bourdieu&#8217;s theory of higher education as a consecration of cultural elites. Unfortunately such theories have not been specified very rigorously. So I&#8217;ll also attempt to bridge the gap between economics and sociology by describing the formation of an elite class in game-theoretic terms.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>As a human capital explanation, the &#8220;learning professional norms&#8221; hypothesis also suffers from many of the issues of the &#8220;learning academic knowledge&#8221; hypothesis&#8212;e.g. <a href="https://en.wikipedia.org/wiki/Sheepskin_effect">the sheepskin effect</a>. Additionally, there&#8217;s the question of <em>who</em> students are learning professional norms from. Academics are notoriously unbusinesslike in many ways; and if it&#8217;s other students, that raises the question of why the already-professional students they&#8217;re learning from don&#8217;t just go straight into the workforce.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Aligning to Virtues]]></title><description><![CDATA[Which alignment target?]]></description><link>https://www.mindthefuture.info/p/aligning-to-virtues</link><guid isPermaLink="false">https://www.mindthefuture.info/p/aligning-to-virtues</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Mon, 16 Feb 2026 08:44:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/85b58149-608e-4432-9499-d12ec859dc2b_691x479.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Which alignment target?</h3><p>Suppose you&#8217;re an AI company or government, and you want to figure out what values to align your AI to. Here are three options, and some of their downsides:</p><p><strong>AIs that are aligned to a set of consequentialist values</strong> are incentivized to acquire power to pursue those values. This creates power struggles between those AIs and:</p><ol><li><p>Humans who don&#8217;t share those values.</p></li><li><p>Humans who disagree with the AI about how to pursue those values.</p></li><li><p>Humans who don&#8217;t trust that the AI will actually pursue its stated values after gaining power.</p></li></ol><p>This is true whether those values are misaligned with all humans, aligned with some humans, chosen by aggregating all humans&#8217; values, or an attempt to specify some &#8220;moral truth&#8221;. In general, since humans have many different values, I think of the power struggle as being between coalitions which each contain some humans and some AIs.</p><p><strong>AIs that are aligned to a set of deontological principles</strong> (like refusing to harm humans) are safer, but also less flexible. What&#8217;s fine for an AI to do in one context might be harmful in another context; what&#8217;s fine for <em>one</em> AI to do might be very harmful for a million AIs to do. 
More generally, deontological principles draw a rigid line between acceptable and unacceptable behavior which is often either too restrictive or too permissive.</p><p>Alignment to deontological principles therefore creates power struggles over who gets to set the principles, and who has access to model weights to fine-tune the principles out of the AI.</p><p><strong>AIs that are corrigible/obedient to their human users</strong> can be told to do things which are arbitrarily harmful to other humans. This includes a spectrum of risks, from terrorism to totalitarianism. So it creates power struggles between humans for control over AIs (and especially over model weights, as discussed above). <a href="https://www.youtube.com/watch?v=4v3uqWeVmco">As per this talk</a>, it&#8217;s hard to draw a sharp distinction between risks from power-seeking AIs and risks from AIs that are corrigible to power-seeking users. Ideally we&#8217;d choose an alignment target which mitigates both risks.</p><p>Thus far, attempts to compromise between these challenges (e.g. various model specs) have basically used ad-hoc combinations of these three approaches. However, this doesn&#8217;t seem like a very robust long-term solution. Below I outline an alternative which I think is more desirable.</p><h3>Aligning to virtues</h3><p>I personally would not like to be governed by politicians who are aligned to any of these three options. Instead, above all else I&#8217;d like politicians to be aligned to common-sense virtues like integrity, honor, kindness and dutifulness (and have experience balancing between them). This suggests that such virtues are also a promising target towards which to try to align AIs.</p><p>I intend to elaborate on my conception of virtue ethics (and why it&#8217;s the best way to understand ethics) in a series of upcoming posts. It&#8217;s a little difficult to comprehensively justify my &#8220;aligning to virtues&#8221; proposal in advance of that. However, since I&#8217;ve already sat on this post for almost a year, for now I&#8217;ll just briefly outline some of the benefits of virtues as an alignment target:</p><ol><li><p><strong>Virtues generalize deontological rules</strong>. Deontological rules are often very rigid, as discussed above. Virtues can be seen as more nuanced, flexible versions of them. For example, a deontologist might avoid lying while still misleading others. However, someone who has internalized the virtue of honesty will proactively try to make sure that they&#8217;re understood correctly. Especially as AIs become more intelligent than humans, we would like their values to generalize further.</p></li><li><p><strong>Situational awareness becomes a feature not just a challenge</strong>. Today we test whether AIs will obey instructions in often-implausible hypothetical scenarios. But as AIs get more intelligent, trying to hide their actual situation from them will become harder and harder. However, the benefit of this is that we&#8217;ll be able to align them to values which require them to know about their situation. For example, following an instruction given by the president might be better (or worse) than following an instruction given by a typical person. And following an instruction given to many AIs might be better (or worse) than following an instruction that&#8217;s only given to one AI.
Situationally aware AIs will by default know which case they&#8217;re in.<br>Deontological values don&#8217;t really account for such distinctions: you should follow deontology no matter who or where you are. Corrigibility does, but only in a limited way (e.g. distinguishing between authorized users and non-authorized users). Conversely, virtues and consequentialist values are approaches which allow AIs to apply their situational awareness to make flexible context-dependent choices.</p></li><li><p><strong>Gradient hacking becomes a feature not just a challenge</strong>. One concern about (mis)alignment is that AIs will find ways to preserve their values even when trained to do otherwise (a possibility sometimes known as <a href="https://www.alignmentforum.org/posts/EeAgytDZbDjRznPMA/gradient-hacking-definitions-and-examples">gradient hacking</a>). Again, however, we can use this to our advantage. One characteristic trait of virtues is that they&#8217;re <em>robust</em> to a wide range of possible inputs (see the toy sketch at the end of this post). For example, it&#8217;s far easier for a consequentialist to reason themselves into telling a white lie than it is for someone strongly committed to the virtue of honesty. So we should expect that AIs who start off virtuous will have an easier time preserving their values even when humans are trying to train them to cause harm. This might mean that AI companies can release models with fine-tuning access (or even open-source models) which are still very hard to misuse.</p></li><li><p><strong>Multi-agent interactions become a feature not just a challenge</strong>. If you align one AI, how should it interact with other AIs? I think of virtues as traits that govern cooperation between many agents, allowing them to work together while also reinforcing each other&#8217;s virtues. For example, honesty as a virtue allows groups of agents to trust each other rather than succumbing to infighting, while setting each other&#8217;s incentives to further reinforce honesty. There&#8217;s a lot more to be done to flesh out this account of virtues, but insofar as it&#8217;s reasonable, virtues are a much more scalable solution for aligning many copies of an AI than the other targets discussed above.</p></li><li><p><strong>There&#8217;s more agreement on virtues than there is on most other types of values</strong>. For example, many people disagree about which politicians are good or bad in consequentialist terms, but they&#8217;ll tend to agree much more about which virtues different politicians display.</p></li></ol><p>In practice, I expect that the virtues we&#8217;ll want AIs to be aligned to are fairly different from the virtues we want human leaders to be aligned to. Both theoretical work (e.g. on defining virtues) and empirical work (e.g. on seeing how applying different virtues affects AI behavior in practice) seem valuable for identifying a good virtue-based AI alignment target.</p><p>The main downside of trying to align to virtues is that it gives AIs more leeway in how they make decisions, and so it&#8217;s harder to tell whether our alignment techniques have succeeded or failed. But that will just be increasingly true of AIs in general, so we may as well plan for it.</p>
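<p>As a toy illustration of the robustness claim in point 3, consider modeling each value system as a threshold on the apparent benefit of lying. The threshold framing and all the numbers below are my own assumptions, made purely for illustration: the point is just that a case-by-case calculator flips on many ordinary inputs, while a near-absolute commitment almost never does.</p><pre><code>import random

# Toy robustness sketch (illustrative assumptions only): each persuasion
# attempt presents an apparent benefit b of lying; an agent lies if b
# exceeds its threshold.
HARM_OF_LIE = 1.0         # modest expected harm a consequentialist weighs case by case
COMMITMENT = 1_000_000.0  # near-absolute threshold of a committed honest agent

random.seed(0)
benefits = [3 * random.expovariate(1.0) for _ in range(10_000)]  # occasional large b

consequentialist_lies = sum(b > HARM_OF_LIE for b in benefits)
virtuous_lies = sum(b > COMMITMENT for b in benefits)
print(consequentialist_lies, virtuous_lies)  # many flips vs. essentially none
</code></pre>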
]]></content:encoded></item><item><title><![CDATA[Distributed vs centralized agents]]></title><description><![CDATA[With applications to the far future]]></description><link>https://www.mindthefuture.info/p/distributed-vs-centralized-agents</link><guid isPermaLink="false">https://www.mindthefuture.info/p/distributed-vs-centralized-agents</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Mon, 09 Feb 2026 21:21:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a2d24ca8-6cb0-4959-9ab1-2b4f3238dd6e_494x484.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Much of my thinking over the last year has focused on understanding the concept of &#8220;distributed agents&#8221;, as opposed to the &#8220;centralized agents&#8221; that the existing paradigm of expected utility maximization describes. One way of describing the difference is in terms of how autonomous their subagents are. Another is that centralized agents are more efficient (as sometimes formalized by the notion of &#8220;coherence&#8221;), while distributed agents are more robust.</p><p>Unfortunately robustness is hard to formalize, since it requires that you perform well even in unpredicted (and sometimes unpredictable) situations. I give some tentative characterizations of distributed agents below, but there&#8217;s still a lot of work to be done to define them formally. And ultimately I&#8217;d like to go further, to understand how agents can have both properties&#8212;which is roughly what I mean by &#8220;<a href="https://www.mindthefuture.info/p/towards-a-scale-free-theory-of-intelligent">coalitional agency</a>&#8221;.</p><p>I gave a talk on the distinction about seven months ago. I&#8217;d been hoping to write up the main ideas at more length, but since that doesn&#8217;t look like it&#8217;ll happen any time soon, I&#8217;m sharing the slides below.
Hopefully they&#8217;re reasonably comprehensible by themselves, but feel free to ask questions about any parts that are unclear.</p>
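<p>Before the slides, here is a deliberately simple sketch of the efficiency/robustness tradeoff described above. The framing (allocating effort across possible futures, with payoff equal to the effort allocated to whichever future occurs) is my own illustrative assumption rather than anything from the talk: a coherent expected-utility maximizer goes all-in on its modal prediction, which maximizes expected payoff but zeroes out the worst case, while a distributed agent whose subagents each cover one future gives up expected payoff for a worst-case guarantee.</p><pre><code># Toy efficiency-vs-robustness sketch (illustrative assumptions only).
# An agent allocates one unit of effort across four possible futures;
# its payoff is the effort it allocated to the future that occurs.
beliefs = [0.7, 0.1, 0.1, 0.1]           # probabilities the agent assigns

centralized = [1.0, 0.0, 0.0, 0.0]       # coherent EU maximizer: all-in on the mode
distributed = [0.25, 0.25, 0.25, 0.25]   # autonomous subagents each cover one future

def expected_payoff(alloc):
    return sum(p * a for p, a in zip(beliefs, alloc))

def worst_case_payoff(alloc):
    return min(alloc)

print(expected_payoff(centralized), worst_case_payoff(centralized))  # 0.7 0.0
print(expected_payoff(distributed), worst_case_payoff(distributed))  # 0.25 0.25
</code></pre>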
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sGsg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sGsg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sGsg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sGsg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sGsg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sGsg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!sGsg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sGsg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sGsg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sGsg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d3a49fa-3d5c-4f3d-b936-7a8489c5cd7b_960x540.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j12Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j12Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!j12Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!j12Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!j12Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j12Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j12Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!j12Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!j12Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!j12Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3305e6-8373-4115-88e0-23fbee7d2d63_960x540.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!5Th1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Th1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5Th1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5Th1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5Th1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Th1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Th1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5Th1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5Th1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5Th1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dce3e03-f436-4dfd-a95c-c89429bbd7c6_960x540.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 
2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E-82!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E-82!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!E-82!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!E-82!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!E-82!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E-82!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E-82!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!E-82!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!E-82!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!E-82!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe91c8b-d28e-46e8-81d5-d8c456757fd2_960x540.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://www.lesswrong.com/posts/fjfWrKhEawwBGCTGs/a-simple-case-for-extreme-inner-misalignment">See more on my interpretation of Yudkowsky here</a>; note that he disagrees with my emphasis on compression though (as per the exchange in the comments).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VdyH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VdyH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VdyH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VdyH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!VdyH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VdyH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VdyH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VdyH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VdyH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VdyH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f78a867-99ca-4a6c-b19c-74d211286503_960x540.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://www.mindthefuture.info/p/why-im-not-a-bayesian">My post on why I&#8217;m not a 
bayesian</a> also gives a sense of what understanding epistemology in more distributed terms looks like.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5qo3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5qo3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5qo3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5qo3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5qo3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5qo3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5qo3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5qo3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5qo3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5qo3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F728b4ebe-17b3-4f2e-9417-86f0dd7aab2f_960x540.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>&#8220;Will&#8217;s very rough first pass&#8221; is a reference to <a href="https://utilitarianism.net/utilitarianism-and-practical-ethics/">the passage in his textbook on utilitarianism</a> where Will MacAskill describes what decision procedure a utilitarian should follow. My point here is to contrast how much thought he (and other utilitarians) put into finding criteria of rightness, vs how rudimentary their thinking about decision procedures is.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ji3_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ji3_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ji3_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ji3_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ji3_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ji3_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg" width="960" height="540" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ji3_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ji3_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ji3_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ji3_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31dea3f9-927d-4de2-be88-2471b6507e5c_960x540.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yDyB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!yDyB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yDyB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yDyB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yDyB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yDyB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yDyB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yDyB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yDyB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yDyB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd71ed05-c641-4246-8c66-4ee72b85a7a7_960x540.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 
13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WcsJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WcsJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WcsJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WcsJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WcsJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WcsJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WcsJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WcsJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WcsJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!WcsJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc65cf54a-a31c-44b5-8a12-d1b9ff98f827_960x540.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rwYG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rwYG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rwYG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rwYG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rwYG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rwYG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg" width="960" height="540" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rwYG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rwYG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rwYG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rwYG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F047bfcf9-abb0-4039-9d42-859ef3ee7f5e_960x540.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.mindthefuture.info/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Mind the Future! 
]]></content:encoded></item><item><title><![CDATA[21st Century Civilization curriculum]]></title><description><![CDATA[I&#8217;ve just released a curriculum on foundational questions in modern politics.]]></description><link>https://www.mindthefuture.info/p/21st-century-civilization-curriculum</link><guid isPermaLink="false">https://www.mindthefuture.info/p/21st-century-civilization-curriculum</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Tue, 21 Oct 2025 07:35:03 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/709d2444-4562-4795-a9a5-e33cefd8417d_1732x996.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://x.com/richardmcngo/status/1980531594139238622?s=46">I&#8217;ve just released</a> a curriculum on foundational questions in modern politics, which I drew up in collaboration with <a href="https://x.com/SamoBurja?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor">Samo Burja</a>. I&#8217;ve copied the introductory text and the section headings below; you can find the full curriculum at <a href="https://www.21civ.com/">www.21civ.com</a>.</p><p><strong><a href="https://forms.gle/aNUqFsrNFxipkUMm7">Sign up here</a> by 27 October to join the first cohort of discussion groups</strong> (which will meet weekly to discuss each of the 11 sections of the curriculum).</p><p>This curriculum is about Western civilization, and how it enables citizens of the Western world to live together in a just, orderly way. But it&#8217;s also about the 21st century, which has been characterized by the continual decline of many aspects of that civilization.</p><p>Despite our superior technology, there are many things that Western countries could do in the past that we can&#8217;t today&#8212;e.g. rapidly build large-scale infrastructure, maintain low-crime cities, and run competent bureaucracies. More importantly, it feels like there are no adults in the room: modern elites often seem unvirtuous and even unserious by historical standards. This curriculum focuses on explaining what changed, and how to orient to the world we now find ourselves in.</p><p>Discussions of large-scale political issues can be unsettling or jarring. So the curriculum intertwines discussion of what&#8217;s happening on a factual level with readings on how to develop a healthy emotional and ethical stance towards politics (culminating in the final week&#8217;s focus on cultivating virtue). It also strongly prioritizes honesty and clarity of writing (even on topics often considered taboo), which is one reason why most readings are informal blog posts or essays rather than academic papers.</p><p>Given the sheer scope of the topics covered by the curriculum, it does not aim at comprehensiveness; nor does it try to give detailed strategies for solving civilizational decay. Indeed, given the accelerating development of AI (as discussed in week 10), the coming decades are likely to be extremely unpredictable. 
Readers should instead think of the curriculum as a starting point for informed, realistic discussions about how we as a civilization can steer ourselves through the coming turmoil.</p><h3>Contents</h3><ul><li><p>Week 1: Western Culture</p></li><li><p>Week 2: Civilizational Decay</p></li><li><p>Week 3: The Managerial State</p></li><li><p>Week 4: The Open Society and its Discontents</p></li><li><p>Week 5: The Psychology of Modern Elites</p></li><li><p>Week 6: The Unprotected Class</p></li><li><p>Week 7: World on Fire</p></li><li><p>Week 8: Industry and Money</p></li><li><p>Week 9: Sociopolitics</p></li><li><p>Week 10: Technology and Civilization</p></li><li><p>Week 11: Reviving Virtue</p></li></ul><p><a href="http://www.21civ.com">Read the full curriculum here</a>.</p>]]></content:encoded></item><item><title><![CDATA[Underdog bias rules everything around me]]></title><description><![CDATA[They're just as scared of us as we are of them]]></description><link>https://www.mindthefuture.info/p/underdog-bias-rules-everything-around</link><guid isPermaLink="false">https://www.mindthefuture.info/p/underdog-bias-rules-everything-around</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Sun, 17 Aug 2025 19:18:05 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e90f6b45-f860-462e-b18b-4a41ab48541f_475x258.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>People very often underrate how much power they (and their allies) have, and overrate how much power their enemies have. I call this &#8220;underdog bias&#8221;, and I think it&#8217;s the most important cognitive bias to understand in order to make sense of modern society.</strong></p><p>I&#8217;ll start by describing a closely-related phenomenon. The <a href="https://en.wikipedia.org/wiki/Hostile_media_effect">hostile media effect</a> is a well-known bias whereby people tend to perceive news they read or watch as skewed against their side. For example, pro-Palestinian students shown a video clip tended to judge that the clip would make viewers more pro-Israel, while pro-Israel students shown the same clip thought it&#8217;d make viewers more pro-Palestine. Similarly, sports fans often see referees as being biased against their own team.</p><p>The hostile media effect is particularly striking because it arises in settings where there&#8217;s relatively little scope for bias. People watching media clips and sports are all seeing exactly the same videos. And sports in particular are played on very even terms, where fairness just means enforcing the rules impartially.</p><p>But most possible conflicts are much less symmetric, both in terms of what information each side has, and even in terms of what game each side is playing. Consider, for instance, an argument about whether big corporations have too much power. The proponent might point to corporations&#8217; wealth, employee talent, and lobbying ability; their opponent might point to how many regulations they have to follow, how much corporations compete between themselves, and how strong anti-corporate public sentiment is. In order to evaluate a question like this, people need to decide both <em>how to draw coalition boundaries </em>(to what extent should big corporations be counted as a single unified group?) 
and <em>how to weigh different types of power against each other.</em></p><p>I think that biases in how these weightings and boundaries are evaluated are a much bigger deal than biases in evaluating fairness in isolated contexts. Specifically, I think that people typically underrate the types of power they have, and overrate the types of power their opponents have. You&#8217;re intimately familiar with the limitations of your own abilities&#8212;you run into them regularly, often in deeply frustrating ways. You track all the fractures inside your own coalition, and they often seem fundamental and intractable. Conversely, it&#8217;s easy to forget about the things which are much easier for you than for your opponents, and to view their internal rivalries as temporary and easily-resolved.</p><p>These effects are exacerbated by information asymmetries, aka the &#8220;fog of war&#8221;. You know who&#8217;s working with you; you don&#8217;t know who&#8217;s working against you. When outside observers sympathize with your side, you know that they&#8217;re not actually contributing very much to your cause; when outside observers sympathize with your opponents, you don&#8217;t know if that&#8217;s a sign of enmity. Similarly, you know how your own plans are progressing, but you don&#8217;t know what your opponents are <a href="https://en.wikipedia.org/wiki/Emotive_conjugation">scheming</a>. To see how strong this effect can be, just look at fiction, where villains often implement arbitrarily-complicated schemes offscreen without breaking suspension of disbelief.</p><p>In addition to the hostile media effect, underdog bias is related to a number of other biases (like <a href="https://en.wikipedia.org/wiki/Hostile_attribution_bias">hostile attribution bias</a>, <a href="https://en.wikipedia.org/wiki/Siege_mentality">siege mentality</a>, <a href="https://en.wikipedia.org/wiki/Fundamental_attribution_error">the fundamental attribution error</a>, and simple <a href="https://www.edge.org/conversation/john_tooby-coalitional-instincts">tribalism</a>). But hopefully the description above conveys why I think it&#8217;s fundamental enough to be worth separating out.</p><h4><strong>Underdog bias in practice</strong></h4><p>In this section I give six examples of conflicts where each side thinks (with some justification) of themselves as the underdog, and rejects the idea of their opponents being the underdogs. I won&#8217;t try to defend them in detail, but they hopefully convey the pervasiveness of underdog bias.</p><ol><li><p><strong>Government vs industry</strong></p><ol><li><p>Working within a government is often a miserable experience. Government institutions are typically cash-strapped and unable to hire the most talented people; meanwhile they face heavy opposition from industry, which can spend enormous amounts of money lobbying, donating to candidates, etc. Many employees at regulatory agencies see themselves as underdogs trying to rein in whole industries, while working under strong political and bureaucratic constraints.</p></li><li><p>Big corporations have a lot of money, but are in a constant state of cut-throat competition, and also need to comply with a huge number of regulations. Bureaucrats can very easily make their lives harder in ways ranging from trivial (e.g. adding red tape) to existential (e.g. blocking acquisitions). 
So it&#8217;s easy for companies to feel like underdogs pitted against the power of the state&#8212;especially when it seems like exercises of that power might be influenced by their competitors.</p></li></ol></li><li><p><strong>Tech vs media</strong></p><ol><li><p>Many journalists see themselves as underdogs fighting the accumulation of power by special interests, of which tech is the most prominent. This feeling is exacerbated by how financially precarious the industry is. The rise of social media has <a href="https://www.pewresearch.org/short-reads/2021/07/13/u-s-newsroom-employment-has-fallen-26-since-2008/">decimated newspaper jobs</a>, leading newspapers to optimize far harder for engagement than they used to.</p></li><li><p>The tech industry has huge amounts of money, and the ability to reshape the world in many ways, but has historically wielded relatively little cultural and political influence. Many techies see themselves as underdogs fighting the cultural establishment, as exemplified by a pervasive (and often extreme) anti-tech bias in the news media. Until recently, techies felt like nobody represented their interests in DC, where both Democrats and Republicans have historically been anti-innovation.</p></li></ol></li><li><p><strong>Elites vs masses</strong></p><ol><li><p>Normal people view themselves as underdogs in a world where cultural elites (e.g. top university graduates) control almost all major institutions. Even when they vote for outsiders, the effects of those elections are blunted by elite control over institutions (especially the &#8220;deep state&#8221;). Elite ideology has increasingly diverged from the ideology of the masses, making many people feel helpless to elect anyone who will actually represent their interests.</p></li><li><p>Cultural elites control almost all major institutions, but are few in number. Elites feel scared of populist masses who distrust them, and who can vote for anti-elite candidates. The most elite groups (like billionaires or Jews) are often the ones it&#8217;s most socially acceptable to blame for problems, or even call for violence against.</p></li></ol></li><li><p><strong>Republicans vs Democrats</strong></p><ol><li><p>Many Republicans see themselves as underdogs fighting against the Democratic elites who have far more influence within universities, government bureaucracies, legacy media outlets, and tech companies. They see themselves as fighting against pervasive ideological control over a wide range of institutions.</p></li><li><p>Many Democrats view Republicans as having structural advantages&#8212;such as the Electoral College, or the Senate&#8212;which allow them to remain competitive even with fewer supporters. Democrats therefore view themselves as underdogs fighting for change against existing cultural norms and Republican-run power structures (like the Supreme Court).</p></li></ol></li><li><p><strong>America vs China</strong></p><ol><li><p>America has more power in the sense of established hegemony. China has more power in the sense of momentum and, increasingly, manufacturing and economic superiority. 
This tendency towards conflict between a rising power (worried about overcoming its historical disadvantage) and an established power (worried about its rival&#8217;s growth trajectory) is known as <a href="https://en.wikipedia.org/wiki/Thucydides_Trap">the Thucydides trap</a>.</p></li></ol></li><li><p><strong>Israel vs Palestine</strong></p><ol><li><p>Many Palestinians see themselves as underdogs against an opponent with far more military power than them, which is also strongly backed by the US.</p></li><li><p>While Israel has power over Palestine specifically, it&#8217;s also surrounded by hostile neighbors far larger than it, which have often stated their intention (and repeatedly actually tried) to wipe Israel off the map. Many Israelis see themselves as underdogs who have to defend themselves against an entire region (as well as increasingly-hostile public opinion in the West).</p></li></ol></li></ol><p>The two maps below are sometimes shown by supporters of each side as a way of conveying how much of an underdog their side is. (To be clear, I&#8217;m not endorsing either of them, just illustrating the rhetorical strategies being used.)</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RrAA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb93da93-2812-4b44-add7-d4233313f710_580x386.png"><img src="https://substackcdn.com/image/fetch/$s_!RrAA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb93da93-2812-4b44-add7-d4233313f710_580x386.png" width="580" height="386" alt="" loading="lazy"></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0mbr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64ff0967-9d59-4ca9-ab45-6ffb08187ac3_1208x676.png"><img src="https://substackcdn.com/image/fetch/$s_!0mbr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64ff0967-9d59-4ca9-ab45-6ffb08187ac3_1208x676.png" width="568" height="318" alt="" loading="lazy"></a></figure></div><p>Underdog bias doesn&#8217;t imply that any of these groups are <em>wrong</em> to be 
scared of their opponents&#8217; power. But it does suggest that they&#8217;ll tend to underestimate their own power and, crucially, underestimate how scared their opponents are of them. (The applications of this principle to the politics of AI are left as an exercise for the reader.)</p><h4><strong>Why underdog bias?</strong></h4><p>If underdog bias makes us so wrong about the world, why is it such a strong psychological effect? Some cognitive biases are just clear-cut mistakes, but we should expect that the strongest &#8220;biases&#8221; were evolutionarily adaptive in some way.</p><p>The descriptions I gave above suggest that there are qualitative differences between how we reason about ourselves and how we reason about our enemies&#8212;roughly corresponding to <a href="https://www.overcomingbias.com/p/abstractdistanthtml">near vs far mode</a>. In near mode we focus on concrete, nuanced details about our local situation. In far mode, we construct larger-scale narratives, in more black-and-white terms, often for the sake of signaling to others.</p><p>Why might signaling that you&#8217;re the underdog be more important than having accurate beliefs? One possibility is to gain allies. <a href="http://ts-si.org/files/AppealOfTheUnderdogPSPB1603.pdf">Vandello et al.</a> have <a href="http://spr.sagepub.com/content/early/2013/03/03/0265407513477629.abstract">a few studies</a> on the effects of appearing to be the disadvantaged side. <a href="https://slatestarcodex.com/2013/05/18/against-bravery-debates/">Scott Alexander</a> summarizes their conclusions as follows: &#8220;if you get yourself perceived as the brave long-suffering underdog, people will support your cause and, as an added bonus, want to have sex with you&#8221;. And <a href="https://www.lesswrong.com/posts/A4MK9RQqSAJZjanQD/why-support-the-underdog">in this post</a> he points to the longstanding prevalence of underdogs in narratives (stretching back to myths like that of David and Goliath).</p><p>But he also recognizes that there&#8217;s a big difference between reported and actual support. All else equal, underdogs are pluckier and their victories more impressive&#8212;so it makes sense that we support them on a narrative level, when there&#8217;s no cost to doing so. But in real life supporting the underdog means that you&#8217;re on the side most likely to lose. Whether or not underdog bias is beneficial for gaining allies will therefore depend on whether those allies are more concerned about being (or looking) virtuous, or more concerned about actually winning. And while the modern era is dominated by virtue signaling, that was much less true in the ancestral environment, where resources were much scarcer. In that setting you&#8217;d instead expect people to have &#8220;overdog bias&#8221;, which makes them overestimate their own side&#8217;s strength (which is one way that tribalism, patriotism, etc. can be interpreted).</p><p>Another possibility is that underdog bias is most valuable as a way of firing up your own supporters. I see this in action whenever I accidentally give my email to a political candidate, and get bombarded with emails about how I need to donate because the other side is on the verge of an overwhelming victory. But again, there&#8217;s a missing link: why should fear make you fight harder? On a rational agent model, being the underdog could make you decide fighting isn&#8217;t worth it&#8212;or even make you defect to the enemy. 
And on an emotional level, <a href="https://malcolmocean.com/2022/02/towardsness-awayness-motivation-arent-symmetrical/">being scared makes it much harder</a> to think clearly or navigate complicated situations.</p><p>So my best guess is that underdog bias was useful because ancestral conflicts were <em>simple</em> and <em>compulsory</em>. In other words, our political intuitions are calibrated for a world where alliances are more tribal&#8212;where we don&#8217;t have freedom of movement or freedom of association. People used to be stuck with their family/tribe/ethnic group whether they liked it or not; if they tried to ally with another, they were often rejected, or at best permanently viewed as an untrustworthy outsider. So the only rational response to being in a worse position would be to fight harder, using fairly straightforward and intuitive strategies.</p><p>In one way, this response is maladaptive in the modern world&#8212;where fewer battle lines are based on immutable characteristics or irreconcilable differences, and the best way to approach conflict is less intuitive. Yet as I mentioned above, the modern world is also much more sympathetic to victims. This suggests that underdog bias may have gradually transitioned from a way of firing up one&#8217;s supporters, to something closer to a <a href="https://en.wikipedia.org/wiki/Victim_mentality">victim complex</a> aimed at evoking sympathy from onlookers.</p><p>This whole section has been very speculative, and I&#8217;m still not confident about where underdog bias comes from. But we don&#8217;t need an explanation of underdog bias to believe that it corrupts many people&#8217;s thinking about complex issues. How can you actually reduce your underdog bias, though? The best approach I&#8217;ve found is simple in theory (though devilishly difficult in practice). Say to yourself: &#8220;they&#8217;re just as scared of us as we are of them.&#8221; It&#8217;s true far more often than you think.</p>
]]></content:encoded></item><item><title><![CDATA[On Pessimization]]></title><description><![CDATA[Getting the opposite of what you want]]></description><link>https://www.mindthefuture.info/p/on-pessimization</link><guid isPermaLink="false">https://www.mindthefuture.info/p/on-pessimization</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Sat, 16 Aug 2025 18:07:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4ffd60f5-d181-43cb-a522-48be53256f9d_1366x1178.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>&#8220;Your worst sin is that you have destroyed and betrayed yourself for nothing.&#8221; - Dostoevsky</em></p><p>When people set an ambitious goal, they can fail simply by not changing the world very much. But there&#8217;s another surprisingly common way to fail: by achieving the <em>opposite</em> of their goal. I call this effect <em>pessimization: </em>the opposite of optimization.</p><p>Though pessimization is an uncommon term, it&#8217;s not an uncommon concept. We allude to it whenever we call someone their own worst enemy, or predict that their actions will backfire. We try to take advantage of it with <a href="https://en.wikipedia.org/wiki/Reverse_psychology">reverse psychology</a>. It&#8217;s the subject of Robert Conquest&#8217;s <a href="https://en.wikipedia.org/wiki/Robert_Conquest#Laws_of_politics">third law of politics</a>: &#8220;the simplest way to explain the behavior of any bureaucratic organization is to assume that it is controlled by a cabal of its enemies.&#8221; And the <a href="https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_what_it_does">POSIWID</a> aphorism (&#8220;the purpose of a system is what it does&#8221;) is often used to describe ways in which a system actively opposes its nominal purpose. I&#8217;ve also previously described the pessimization of ideological advocacy under the heading of &#8220;<a href="https://x.com/RichardMCNgo/status/1810495607590543701">the activist&#8217;s curse</a>&#8221;.</p><p>I divide pessimization into three types, each of which I&#8217;ll discuss in detail. The most intuitive is what I call <em>direct pessimization</em>: when an enemy chooses what to do based on what will harm your interests the most. This is the sense in which &#8220;pessimize&#8221; is used by e.g. <a href="https://www.lesswrong.com/posts/yaJsCQokiyeLFHhgy/incorporating-justice-theory-into-decision-theory">Yudkowsky</a> and <a href="https://www.lesswrong.com/posts/HoQ5Rp7Gs6rebusNP/superintelligent-ai-is-necessary-for-an-amazing-future-but-1">Soares</a>. Drivers of direct pessimization include sadism, revenge and threats.</p><p>Conversely, <em>indirect pessimization</em> arises when your actions help other people achieve the opposite of your goals, even though they&#8217;re not deliberately trying to hurt you. And <em>perverse pessimization</em> occurs when an agent or coalition nominally dedicated to a goal is actively trying to achieve the opposite of that goal&#8212;i.e. 
it&#8217;s directly pessimizing itself.</p><p>I&#8217;d like to improve our understanding of pessimization so that we can better create, identify, and promote remedies to it. In the rest of this post I characterize each of the three types of pessimization I mentioned above. I treat them as <a href="https://www.mindthefuture.info/p/towards-a-scale-free-theory-of-intelligent">scale-free phenomena</a>, describing examples within individuals, organizations, countries, and even whole civilizations.</p><h3>Direct pessimization</h3><p><em>Direct pessimization</em> is when one agent or faction is specifically trying to hurt another&#8212;not as a side effect of other actions, but rather as a means of achieving their desired outcome. For example, imprisoning criminals in order to prevent them from committing further crimes wouldn&#8217;t qualify. However, punishing criminals for the sake of deterrence or retribution would: in the first case the prisoners&#8217; suffering is the means by which others are deterred, in the second case their suffering itself is the desired outcome. While less of a focus today, these motivations were more salient in historical penal systems, which used physical punishments ranging from flogging all the way to cruel and unusual tortures. By &#8220;hurting&#8221; I don&#8217;t just mean causing physical pain, though, but also other ways of harming someone&#8217;s interests&#8212;like the social humiliation of <a href="https://en.wikipedia.org/wiki/Stocks">stocks</a>, exile, execution, <a href="https://en.wikipedia.org/wiki/Attainder">attainder</a>, or the Chinese <a href="https://en.wikipedia.org/wiki/Nine_familial_exterminations">nine familial exterminations</a>.</p><p>To be clear, the distinction between a &#8220;means&#8221; and a &#8220;side effect&#8221; of achieving a goal is a thorny one, <a href="https://plato.stanford.edu/entries/double-effect/">which many philosophers have debated</a>. For example, <a href="https://plato.stanford.edu/entries/double-effect/#RoleConvNormWarf">which harms inflicted on an enemy nation</a> are necessary components of winning a war, and which are merely unfortunate side effects of winning the war? I won&#8217;t try to resolve the edge cases here&#8212;however, clear examples of larger-scale direct pessimization include terrorist attacks and genocides, as well as terror bombing civilians during wars. At even larger scales, <a href="https://longtermrisk.org/reducing-risks-of-astronomical-suffering-a-neglected-priority/#IIII_Unawareness_of_possible_sources_of_astronomical_suffering">s-risk threat models</a> outline how and why superintelligences might threaten to pessimize each other&#8217;s utility functions.</p><p>Another driver of direct pessimization is sadism. We see this in serial killers, sociopaths, and many of the people who end up actually implementing the punishments discussed above. However, one reason why I wanted to carve out the category of &#8220;direct pessimization&#8221; in the first place is that I don&#8217;t know how to draw a principled distinction between sadism, revenge, deterrence, and hurting others for instrumental benefit. For example, many people think of social status as zero-sum, and display petty cruelty (e.g. mocking jokes or mean-spirited gossip) in order to maintain their position in the hierarchy. The type of sadism that is characteristic of sociopaths is typically much more intense, but I&#8217;m not sure that the underlying logic is so different. 
Indeed, since sociopathy often results from traumatic childhood experiences, we could view it as a kind of &#8220;revenge&#8221; on the world at large&#8212;not too dissimilar to how normal people sometimes feel about those who have wronged them.</p><p>The examples I&#8217;ve discussed above only feature pessimization in one direction&#8212;but direct pessimization often spurs cycles of retribution, at many scales. Within individual psychology, internal conflict (e.g. procrastination) can prompt increasingly harsh self-coercive strategies, like vicious self-criticism. In relationships, emotional insecurity can lead partners to approach each other with blame, shame, and cruelty, until the person who loves them the most is also the one who hurts them the most. In politics, focusing on the other side&#8217;s failings can provoke cycles of <a href="https://en.wikipedia.org/wiki/Negative_partisanship">negative partisanship</a> (see also <a href="https://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/">the toxoplasma of rage</a> and &#8220;<a href="https://en.wikipedia.org/wiki/Owning_the_libs">owning the libs</a>&#8221;). In geopolitics, actions aimed at hurting enemy states (like sanctions or blockades) can spiral into military engagements, and from there into no-holds-barred wars. One way of viewing moral progress is as the process of forming agreements to limit cycles of direct pessimization&#8212;both at large scales, like international conventions; and at small scales, like local norms of decency.</p><p>While few of these examples will be novel to many readers, it feels useful to bring them together under a single heading&#8212;both to make it easier to reason about preventing them, and to contrast them with my two other types of pessimization. The most notable contrast is that direct pessimization is a transitive relationship: we describe A directly pessimizing B. Conversely, perverse pessimization is a reflexive relationship: &#8220;B pessimized themself&#8221; (or &#8220;B pessimized&#8221; for short). What about when A harms B&#8217;s interests as a side effect of pursuing their own goals? Depending on how this happened, we might call it an accident, or negligence, or <a href="https://en.wikipedia.org/wiki/Manslaughter#Constructive">constructive malice</a>. However, I wouldn&#8217;t classify any of these as pessimization. Instead, I&#8217;m interested in the cases where B&#8217;s own actions make it much easier for their interests to be harmed, which I&#8217;d describe as B indirectly pessimizing themself. Let&#8217;s explore that now.</p><h3>Indirect pessimization</h3><p>Caring about X often leads you to create intellectual and physical tools for affecting X. But these can convince and/or help people with other goals to produce ~X (even when they&#8217;re not specifically trying to hurt you). I call this dynamic indirect pessimization.</p><p>The simplest example of indirect pessimization is telling the wrong people about your goal. To pursue a goal X you need to have some conception of what X is and how you&#8217;re going to achieve it. But telling people about X prompts them to consider whether pursuing ~X is a good way to achieve their own interests. 
For example, if your goal is to prevent anyone from <a href="https://www.asimov.press/p/mirror-life">creating mirror life, because it might destroy the world</a>, then telling people about your goal may be what gives them the idea of creating mirror life at all.</p><p>Indirect pessimization happens most easily when X advocates don&#8217;t have very good proposals for what people should do to help achieve X. If so, most ways that people act on their new knowledge about X will contribute to achieving ~X. For example, early AI safety advocates did a lot to raise awareness of AGI risk&#8212;but the lack of actionable strategies to help reduce risk led to much of that awareness being directed towards founding various AGI labs which are now racing towards AGI.</p><p>Furthermore, pro-X coalitions often motivate work towards X by identifying its beneficial consequences. But every argument of the form X&#8594;Y (for example, &#8220;higher education makes people more progressive&#8221;) can be translated into the form ~Y&#8594;~X (&#8220;to avoid people becoming more progressive, limit higher education&#8221;). And so, unless there&#8217;s unanimity on Y being valuable, such arguments will convince some people to oppose X, which might outweigh the benefit of gaining new supporters.</p><p>We often see this effect in social justice advocacy. Calling something racist is an effective strategy in moderation, but <a href="https://slatestarcodex.com/2017/06/21/against-murderism/">when overused</a> starts to make people think that racism isn&#8217;t so bad. Relatedly, <a href="https://x.com/whstancil/status/1932484678574833902?s=46">in this tweet</a> a leftist is trying to make discussion of race science more prominent, because they think that believing in it is taboo enough to discredit their political opponents. But highlighting that a person with widespread support believes in race science will help undermine the taboo in the minds of that person&#8217;s supporters. Lastly, perhaps the most egregious example is the conflation of all leftist issues into a single &#8220;omnicause&#8221;&#8212;e.g. <a href="https://www.instagram.com/p/DCW3FAZJYZP/?img_index=12">this argument</a> that &#8220;Palestine is the issue that makes us realise everything is interconnected. Every struggle for justice, freedom, and liberation&#8221;. Even if some people find this persuasive, it&#8217;s also a kind of indirect pessimization&#8212;now people who disagree with the writer on Palestine will be pushed towards disagreeing on everything else too.</p><p>Indirect pessimization doesn&#8217;t just arise from constructing concepts and arguments, but also physical tools or mechanisms. For example, people sometimes propose constructing tech for deflecting asteroids away from Earth. But given how low the rate of dangerous asteroids hitting Earth is, <a href="https://forum.effectivealtruism.org/posts/RZf2KqeMFZZEpvBHp/risks-from-asteroids">the most likely way for an asteroid catastrophe to happen</a> is if asteroid-deflection technology is <a href="https://www.narrativeark.xyz/p/jacob-on-the-precipice">misused to deflect asteroids </a><em><a href="https://www.narrativeark.xyz/p/jacob-on-the-precipice">towards</a></em><a href="https://www.narrativeark.xyz/p/jacob-on-the-precipice"> Earth</a>. So naive attempts to lower asteroid risk might easily empower people to increase it instead.</p><p>Similarly, consider the apocryphal <a href="https://economicpoint.com/goodharts-law">anecdote about the cobra effect</a>. 
This is often cited as an illustration of Goodhart&#8217;s law. But what&#8217;s most striking about the story is that it describes an intervention that not only failed to <em>reduce</em> the population of wild cobras, but actually <em>increased</em> it. In other words, these (fictional) authorities indirectly pessimized their own goal. A real-world example comes from ML, where constructing a given objective function creates the possibility of accidentally inverting it&#8212;e.g. as OpenAI did by (briefly) <a href="https://arxiv.org/pdf/1909.08593">creating maximally-inappropriate AIs</a> (section 4.4). <a href="https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post">The Waluigi effect</a> and <a href="https://arxiv.org/abs/2502.17424">emergent misalignment</a> provide more recent examples of indirect pessimization in AI values. Meanwhile <a href="https://x.com/richardmcngo/status/1869089450346877207?s=46">I suspect</a> that many evaluations of dangerous AI capabilities, which were intended to help prevent them from being developed, will instead help AGI labs accelerate towards them.</p>
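<p>To make this failure mode concrete, here&#8217;s a minimal sketch of objective inversion: a toy hill-climber whose objective gets flipped by a single sign error. (This is purely illustrative Python; the function names and numbers are mine, not the code behind the incident above.)</p><pre><code>import random

def hill_climb(reward_fn, steps=10000, step_size=0.1, seed=0):
    # Toy "policy": propose a small random change to a single parameter x,
    # keeping it whenever it increases the reward.
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        if reward_fn(candidate) > reward_fn(x):
            x = candidate
    return x

def reward(x):
    # Intended objective: keep outputs "appropriate", i.e. x close to 0.
    return -abs(x)

print(hill_climb(reward))                # stays at 0.0, as intended
print(hill_climb(lambda x: -reward(x)))  # one sign flip: x drifts arbitrarily
                                         # far from 0, i.e. "maximally
                                         # inappropriate" behavior
</code></pre><p>The structural point is that once a goal has been encoded as an explicit objective, its exact opposite is always one sign away, and the optimizer pursues either with equal competence.</p>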
<p>Of course, when people try to achieve some goal, there are many possible undesirable side effects. So why is it worth distinguishing specifically the category of side effects which cause the opposite of that goal? One reason is that many of the mechanisms which drive indirect pessimization are (unlike most side effects) relevant for a wide range of goals. Another is that <a href="https://www.lesswrong.com/s/qXZLFGqpD7aeEgXGL">fear-based motivation</a> makes indirect pessimization unusually hard to think about. The more scared you are that you might not achieve your goal, the more urgently you feel that &#8220;something must be done&#8221;, and the more you flinch away from picturing how that &#8220;something&#8221; might actually make it worse. This applies both on an individual level (where the possibility that you&#8217;re indirectly pessimizing is often very painful for your ego), and on a group level (where self-criticism is often taboo).</p><h3>Perverse pessimization</h3><p>The final type of pessimization I&#8217;ll talk about is <em>perverse pessimization</em>. I define this as a situation where the nominally pro-X faction is the one directly causing ~X to happen&#8212;i.e. the pro-X faction is sabotaging itself. What causes this? I&#8217;ll talk about two broad reasons: fear of success, and vice signalling within factions.</p><p>I&#8217;ve talked above about how fear of not achieving one&#8217;s goals can indirectly pessimize those goals. But what&#8217;s far more perverse is a situation where a coalition is instead scared of <em>achieving</em> its nominal goals. One reason why this happens: the identity, prestige and even existence of a coalition often depend on the persistence of the problem it&#8217;s trying to solve. So the prospect of the problem going away can be very scary to individual members of the coalition (especially powerful or longstanding members) or the coalition as a whole.</p><p>This applies especially when solutions are offered from outside the coalition&#8212;if those solutions work, then the coalition has to admit that its previous strategies weren&#8217;t working, which might trigger shame or envy. So it&#8217;s particularly tempting for coalitions to suppress external attempts to achieve their own nominal goals. The environmentalist movement&#8217;s opposition to nuclear power, and often even to solar or wind infrastructure, provides a good example of this (as does its gradual slide towards being a generic anti-capitalist movement).</p><p>Another source of fear of success is that the more successful you are, the more you have to lose. Having hope, then losing it, is a very painful experience. So people often decide that it&#8217;s better to proactively identify as a victim in order to avoid that risk. From that mindset, other people&#8217;s success sparks envy, resentment, and attempts to tear them down&#8212;but this toxicity only makes one&#8217;s own problems worse. For example, the men who join incel communities deeply want to find partners, but end up in a perverse position where they would lose their community if they found a partner. (Some pointers to further reading on this dynamic: <a href="https://en.wikipedia.org/wiki/Tall_poppy_syndrome">tall poppy syndrome</a>; <a href="https://en.wikipedia.org/wiki/Law_of_Jante">the laws of Jante</a>; <a href="https://articles.starcitygames.com/articles/stuck-in-the-middle-with-bruce/">this essay</a> about Magic: the Gathering; <a href="https://www.amazon.com/Existential-Kink-Unmask-Embrace-getting/dp/1578636477">Existential Kink</a>; and <a href="https://www.astralcodexten.com/p/book-review-sadly-porn">Sadly, Porn</a>.)</p><p>We also see fear of success in many relationships. The most successful relationships are those intimate enough to allow both people to feel deeply understood. But the vulnerability required for real intimacy is terrifying, and it&#8217;s easy for that fear to lead us to push friends or romantic partners away&#8212;essentially punishing them for wanting to be close to us. We hope that our defense mechanisms will ensure that we&#8217;re only vulnerable to people who deeply care about us, but they&#8217;re often precisely what prevent people from coming to deeply care about us (as masterfully portrayed in <a href="https://en.wikipedia.org/wiki/Notes_from_Underground">Notes From Underground</a>).</p><p>A second major reason why perverse pessimization arises is that anti-X behavior can help someone rise within a (nominally) pro-X coalition. This is a kind of vice signalling, showing that you have enough power to directly defy the values of your own coalition. I expect that there&#8217;s a lot of private signalling of cynicism in high-pressure law and finance firms, amongst politicians, and amongst highly competitive elites more generally. One particularly legible example comes from <a href="https://en.wikipedia.org/wiki/Gerald_Ratner">Gerald Ratner</a>, a jewelry CEO who called his own products &#8220;total crap&#8221; in an apparent attempt to signal cynicism.</p><p>These same dynamics play out within society as a whole, with domain-specific cynicism replaced by transgression against broader norms. Satanism, for instance, was historically compelling precisely because it was so transgressive. These days, few are offended by it. But <a href="https://theupheaval.substack.com/p/american-strong-gods">Hitler is now filling the role</a> of a secular Satan, which then sparks transgressions like Kanye West&#8217;s recent song &#8220;<a href="https://en.wikipedia.org/wiki/Heil_Hitler_(song)">Heil Hitler</a>&#8221;, &#8220;ironic&#8221; neo-Nazism on 4chan, etc. 
Meanwhile many sexual fetishes are erotic in large part because they&#8217;re taboo, with the contrast between the fear of taboo violation and the relief of sexual acceptance being a source of pleasure. Jointly acting out taboo fetishes can allow elites to form transgression bonds&#8212;Epstein&#8217;s island being perhaps the most prominent example (similar dynamics have surfaced in Hollywood, with Weinstein, Diddy, etc.).</p><p>Such transgressions typically start out covert. But it&#8217;s hard to keep controversial secrets&#8212;especially because bragging about transgressions is a good way to signal dominance. So transgressions tend to slide towards becoming &#8220;open secrets&#8221;, which often creates a self-fulfilling prophecy that there&#8217;s a <a href="https://www.mindthefuture.info/p/elite-coordination-via-the-consensus">consensus of power</a> in favor of transgression. People who want to stay on the good side of that consensus learn to actively punish others for trying to enforce norms or pursue the stated goals of the coalition. <a href="https://www.umass.edu/preferen/You%20Must%20Read%20This/herrmann-thoni-gachter.pdf">Herrmann et al.</a> call this phenomenon <a href="https://www.lesswrong.com/posts/X5RyaEDHNq5qutSHK/anti-social-punishment">anti-social punishment</a>; Ben Hoffman calls it <a href="https://benjaminrosshoffman.com/guilt-shame-and-depravity/">depravity</a>; Jessica Taylor calls it <a href="https://unstableontology.com/2021/04/12/on-commitments-to-anti-normativity/">anti-normativity</a>; and Michael Vassar links it to <a href="https://podcast.clearerthinking.org/episode/028/michael-vassar-preference-falsification-and-postmodernism/">postmodernism</a>. I don&#8217;t know of systematic ways to detect anti-normativity, but sufficiently egregious hypocrisy can provide strong hints that it exists. For example, one of the US military&#8217;s strongest nominal values is abiding by US laws. Yet rather than rewarding whistleblowers who expose examples of criminal behavior, the military typically persecutes them.</p><p>I&#8217;ve described fear of achieving one&#8217;s goals and anti-normativity as two separate mechanisms driving perverse pessimization. But being in an anti-normative coalition induces fear of achieving your goals, because succeeding (or even just standing out) makes you a target. And when you&#8217;re scared of achieving your goals, you&#8217;ll reward other people (or your own internal subagents) for signalling opposition to those goals, thereby reinforcing the anti-normative coalition. So we should actually think of them as two facets of a single phenomenon.</p><h3>Orienting to pessimization</h3><p>Pessimization sucks, but that doesn&#8217;t mean we should be constantly scared of it. The accusation of pessimization could easily become a bludgeon used to attack anyone doing anything valuable (which would, ironically, pessimize the desire to avoid pessimization). The challenge is to watch out for pessimization in a way which doesn&#8217;t slip into self-defeating cynicism.</p><p>So it&#8217;s also worth talking about limits on pessimization. While direct pessimization is a powerful way to cow your enemies, it also makes you a target for retaliation&#8212;e.g. violating norms against torture can spur domestic and international opposition to a regime. Meanwhile indirect and perverse pessimization are often covert enough that they only spread slowly. 
For example, even though some people find pedophilia attractive <em>because</em> it&#8217;s taboo, <a href="https://x.com/richardmcngo/status/1912990666952753646?s=46">I expect</a> that the taboo overall dramatically reduces pedophilia.</p><p>Perverse pessimization is also self-undermining: it makes systems weaker over time, which reduces their long-term influence. If big companies weren&#8217;t so often stuck in <a href="https://thezvi.wordpress.com/2019/05/30/quotes-from-moral-mazes/">moral mazes</a>, they could do a far better job at fending off startups. More generally, the parts of the world that avoid perverse pessimization will tend to grow and become more important over time. So <a href="https://en.wikipedia.org/wiki/Creative_destruction">creative destruction</a> is one of our best defenses against perverse pessimization.</p><p>However, letting institutions that have been taken over by perverse pessimization fail can be arbitrarily costly&#8212;e.g. when <a href="https://dominiccummings.substack.com/p/a-talk-on-regime-change">major governments rot from the head down</a>. And so it can be incredibly high-leverage for whistleblowers and other reformers to identify and oppose perverse pessimization inside important institutions (see e.g. <a href="https://www.amazon.in/Secrets-Memoir-Vietnam-Pentagon-Papers-ebook/dp/B000OCXFY2">Ellsberg</a>, <a href="https://time.com/7012881/daniel-kokotajlo/">Kokotajlo</a>). This is bottlenecked on virtues like courage and integrity, which is a major reason why I&#8217;ve become a virtue ethicist (as I&#8217;ll write about in an upcoming post). Indeed, I suspect that one way of understanding virtues is as traits that guard us against sliding into pessimization, thereby allowing us to more robustly steer the world towards desired outcomes.</p>]]></content:encoded></item><item><title><![CDATA[Well-foundedness as an organizing principle of healthy minds and societies]]></title><description><![CDATA["If a kingdom is divided against itself, it cannot stand."]]></description><link>https://www.mindthefuture.info/p/well-foundedness-as-an-organizing</link><guid isPermaLink="false">https://www.mindthefuture.info/p/well-foundedness-as-an-organizing</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Mon, 07 Apr 2025 00:28:34 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/001d57a1-b511-448b-b112-784a130c8ea2_800x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>&#8220;If a kingdom is divided against itself, it cannot stand. 
And if a house is divided against itself, it cannot be maintained.&#8221; - </em>Mark 3:24-25</p><p>In <a href="https://www.mindthefuture.info/p/towards-a-scale-free-theory-of-intelligent">my last post</a> I argued that we should view intelligent agents as coalitions of cooperating and competing subagents. The crucial question is then: how can we characterize effectively-functioning coalitional agents? One standard criterion is <em>coherence</em>: the extent to which the agent acts as if it has consistent goals and beliefs. In other words, an agent is coherent insofar as disagreeing subagents are still able to act as a functional coalition. For example:</p><ul><li><p>A coherent individual can decide how to resolve an internal conflict and then stick with that commitment, rather than vacillating or procrastinating as their mood changes.</p></li><li><p>A coherent company is able to decide on an overarching strategic plan and then execute it, without different divisions prioritizing their own interests.</p></li><li><p>A coherent country has a clear set of national interests that its leaders and people consistently prioritize (even when it conflicts with their personal interests).</p></li></ul><p>Coherence is a valuable property for a coalition to have, but I think that characterizing idealized agents primarily in terms of coherence gives us an impoverished understanding of them. For example, a coalition which is highly-coherent because its leaders exercise a lot of top-down control is much less robust than a coalition which is highly-coherent because all its subagents actually want to cooperate with each other.</p><p>In this post I try to capture the difference between these two possibilities in terms of a property I call <em>well-foundedness</em>.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> I define an agent as well-founded to the extent that conflicts between its subagents don&#8217;t propagate <em>down</em> to induce conflicts <em>within</em> those subagents. For example:</p><ul><li><p>In a well-founded marriage, spouses don&#8217;t try to induce internal conflict within their partner (e.g. shaming or guilting them) to win fights.</p></li><li><p>In a well-founded company, employees don&#8217;t need to pick sides during power struggles between executives&#8212;they can just carry on with their jobs.</p></li><li><p>In a well-founded country, people are friends and colleagues despite supporting different political factions.</p></li></ul><p>I think of well-foundedness as complementary to (and in some ways dual to) coherence. Ideal agents should have both properties. I don&#8217;t yet know how to define well-foundedness precisely, but in the rest of this post I characterize it informally by describing the four possible combinations of coherent/incoherent and well-founded/poorly-founded.</p>
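<p>As a rough illustration of how these two properties come apart, consider a toy sketch in which agents form a tree of subagents and &#8220;conflict&#8221; is a binary flag. Both of those are simplifying assumptions made purely for illustration, given the lack of a precise definition:</p><pre><code>from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    subagents: list = field(default_factory=list)
    conflicted: bool = False  # open conflict among this agent's immediate subagents?

def coherent(agent):
    # Coherent: the agent's immediate subagents still act as one coalition.
    return not agent.conflicted

def well_founded(agent):
    # Well-founded: conflict hasn't propagated down, i.e. every subagent
    # is internally coherent, recursively.
    return all(coherent(s) and well_founded(s) for s in agent.subagents)

# Incoherent but well-founded: top-level conflict, internally united subagents.
a = Agent("A", [Agent("A1"), Agent("A2")], conflicted=True)

# Incoherent and poorly-founded: the same conflict recurs inside each subagent.
b = Agent("B", [Agent("B1", conflicted=True),
                Agent("B2", conflicted=True)], conflicted=True)

print(coherent(a), well_founded(a))  # False True
print(coherent(b), well_founded(b))  # False False
</code></pre><p>The four combinations this pair of checks can return are exactly the four cases described below.</p>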
But the two conflicts will play out very differently. In Holistan, you might see the East and the West start to cut ties with each other; set up parallel governance structures; or discourage travel or trade between them. If conflict continues to escalate, you might see a civil war, with each side drawing on their territory and population to muster an army.</p><p>This is pretty bad! But despite being very incoherent as a country, Holistan is still relatively well-founded, because each of East and West Holistan is <em>internally</em> still pretty coherent. What that means is that they have a line of retreat from conflict. The two sides in Holistan can still disengage; they can draw up peace treaties; they can form two separate countries. After the war ends, the fabric of each society remains intact&#8212;colleagues are still on amicable terms, neighbors still trust neighbors, city councils can still debate issues without relitigating the war with each conflict.</p><p>Perhaps the most vivid illustration of how important this is comes from World War 2. I am often struck by how, after the most devastating war the world has ever seen, European countries quickly recovered to unprecedented heights of prosperity, and even became close allies. I think a significant reason for that is that World War 2 was a relatively well-founded conflict between nation-states. It didn&#8217;t turn them into low-trust societies.</p><p>By contrast, Fractistan is neither coherent nor well-founded. The lack of clear territorial boundaries between the two factions makes it harder for rising tensions to erupt into a full-scale civil war. But the whole country is affected regardless: each region and city and neighborhood faces an internal power struggle. You might see lynch mobs or pogroms; or, in an extreme case, the kind of decentralized genocide that happened in Rwanda.</p><p>And there&#8217;s no easy way to end the conflict. Even if a nation-wide compromise is reached, each person will still be surrounded by former enemies. Each small-scale flare-up of renewed conflict will trigger further cycles of escalation. In other words, Fractistan will persist as a country, but with low social trust for the indefinite future. I picture it with a fracture running from top to bottom, fractally dividing the country in two at every level of organization.</p><p>Aside from Rwanda, two countries with Fractistan-like conflicts were <a href="https://en.wikipedia.org/wiki/Bosnian_War">Bosnia and Herzegovina</a> and pre-partition India, which both saw widespread neighbor-on-neighbor violence. India was lucky to have a strong geographic separation between the bulk of its Hindus and Muslims. Even so, however, the process of splitting India and Pakistan was incredibly messy and cost hundreds of thousands of lives.</p><h3>The tradeoff between coherence and well-foundedness</h3><p>Coherence and well-foundedness are separate properties which are both individually valuable. But there are some tradeoffs between them. To explore those it&#8217;s useful to consider the opposite of Holistan: a country which is very coherent but also very poorly-founded.</p><p>I could make up an example here, but we already have one that matches the description very well: North Korea. As a country, it&#8217;s extremely coherent. Its whole government follows the instructions of one man. Its whole society follows the instructions of its government.
There&#8217;s no national-level dissent, nor regional-level dissent, nor even dissent on the level of local communities.</p><p>But repressing dissent doesn&#8217;t make it go away&#8212;it just pushes it down to lower-level subagents. I expect that some North Koreans feel safe to dissent within their families; others only within the privacy of their own thoughts; and others not even there, but only in their <a href="https://en.wikipedia.org/wiki/Shadow_(psychology)">subconscious shadows</a>. Even when this dissent never surfaces openly, it is visible in the cost and scale of the control apparatus required to keep it repressed. If that control apparatus ever breaks, North Korea&#8217;s coherence would fall apart.</p><p>Such extreme repression is obviously bad. But in moderation repression can be a valuable driver of coherence. <a href="https://eh.net/book_reviews/the-weirdest-people-in-the-world-how-the-west-became-psychologically-peculiar-and-particularly-prosperous">Henrich hypothesizes</a> that the success of the West was driven by the Catholic Church&#8217;s prohibition against cousin marriage, which made Christian Europe less clannish. If true, you can think of this as trading off well-foundedness for coherence: by repressing kinship-based networks, the Church made larger-scale cooperation possible. More generally, societal morality works by repressing the antisocial instincts of each individual (as well as groups organized around antisocial behavior, like criminal gangs).</p><p>A similar set of tradeoffs arise in individual psychology: people can become more disciplined by repressing their emotions. This makes them more coherent&#8212;e.g. they can choose to work long hours on things that are instrumentally useful. But it often harms their ability to enjoy themselves and to understand and process their underlying motivations.</p><p>Conversely, to become well-founded, you need to surface ways in which conflicts manifest at low levels and then resolve them. This requires the opposite of repression: <em>expression</em>. Specifically, it requires that lower-level subagents are able to express their true preferences (as they can in individuals who freely let their emotions surface, or countries which let political dissidents speak freely).</p><p>What are the tradeoffs between building coherence via repression, and building well-foundedness via expression? I think of the former as making the average-case outcome worse, but also reducing variance. Repression is therefore appropriate when you&#8217;re in a <em>scarce</em> environment&#8212;one in which a big loss could totally wipe you out. An individual whose career could be ruined if they let their emotions show needs to repress them (even if it makes them more stressed and less productive on average). And a country which could be invaded if it gets distracted by internal politics needs to repress dissent (even if there&#8217;s something valuable to be learned from that dissent).</p><p>By contrast, expression tends to lead to better outcomes, but at the cost of also increasing variance. <a href="https://www.lesswrong.com/s/qXZLFGqpD7aeEgXGL/p/QqtQoHSjjfLgC4jDZ">Expressing underlying conflicts allows them to be solved directly</a>, but makes the overall agent less coherent until things actually resolve. For example, instead of sniping at each other about household chores, a married couple could express the emotional fears that underlie those frustrations. 
If that goes well, they&#8217;d feel much more respected and appreciated afterwards; but if it goes badly it could provoke a (potentially relationship-ending) fight. In a political context, letting dissidents speak out could lead to valuable reform, but it could also give rise to a full-fledged separatist movement.</p><p>So the tradeoff between repression and expression is a nuanced one, and needs to be decided on a case-by-case basis. But in general it&#8217;s likely that we err too far on the side of repression, because we don&#8217;t intuitively realize how abundant the world has become. Most countries don&#8217;t face significant risk of invasion, and therefore the costs of secession would often be outweighed by the benefits (more national cohesion, <a href="https://www.sciencedirect.com/science/article/abs/pii/S0014292110000991">better governance</a>, and being more well-founded in general). On an individual level, we can now move between different communities far more easily than at any previous point in history, and so taking the social risk of letting out our emotions is less dangerous than our intuitions are calibrated to expect. (There are some prominent exceptions, which I&#8217;ll discuss in a follow-up post, but I&#8217;m trying to keep this post relatively free of politically controversial examples.)</p><h3>Idealized coalitional agency</h3><p>I think the tradeoff I&#8217;ve described above is an important dynamic in almost all real-world agents. But I don&#8217;t think it&#8217;s <em>inevitable</em>. We can imagine coalitions designed so that low-level agents can express their preferences, and make local improvements, without threatening the stability of the overall coalition.</p><p>What would such designs look like? I expect that a key component is &#8220;peace treaties&#8221; between high-level subagents, where they all agree not to use certain types of low-level conflicts to further their own ends. We can think of liberalism as a peace treaty which allows people with different religious and political beliefs to coexist. Meanwhile, arrangements like capitalism and democracy channel conflicts into formats that are productive (like business competition and political campaigning) rather than destructive (like theft or violence).</p><p>But well-foundedness at one level requires coherence at levels below that&#8212;otherwise it&#8217;s easy for conflicts to propagate downwards. And the systems I describe above aren&#8217;t very good at creating (or maintaining) lower-level coherence. In <a href="https://theupheaval.substack.com/p/american-strong-gods">Reno&#8217;s terminology</a>, they are &#8220;weak gods&#8221; whose primary purpose is to help different groups coexist, but which don&#8217;t have strong opinions about what those groups should actually care about. Conversely, &#8220;strong gods&#8221; like nationalism provide the substantive ideological content capable of unifying people under a single coherent identity, at the cost of excluding outsiders. 
The challenge of designing an idealized coalitional agent can be seen as the unsolved problem of balancing weak and strong gods across many different scales.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>A <a href="https://en.wikipedia.org/wiki/Well-founded_relation">well-founded relation</a> is one in which every descending chain of elements bottoms out. By somewhat tenuous analogy, a well-founded agent is one whose internal conflicts bottom out. Unlike the set-theoretic definition, my notion of well-foundedness is a matter of degree: the further down an agent&#8217;s internal conflicts go, the less well-founded it is.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Towards a scale-free theory of intelligent agency]]></title><description><![CDATA[An inherently multi-agent perspective]]></description><link>https://www.mindthefuture.info/p/towards-a-scale-free-theory-of-intelligent</link><guid isPermaLink="false">https://www.mindthefuture.info/p/towards-a-scale-free-theory-of-intelligent</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Sat, 22 Mar 2025 05:01:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0b965984-3c8e-48fd-93bf-beab290989d1_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I recently left OpenAI to pursue independent research. I&#8217;m working on a number of different research directions, but the most fundamental is my pursuit of a scale-free theory of intelligent agency. In this post I give a rough sketch of how I&#8217;m thinking about that. I&#8217;m erring on the side of sharing half-formed ideas, so there may well be parts that don&#8217;t make sense yet. Nevertheless, I think this broad research direction is very promising.</p><p>This post has two sections. The first describes what I mean by a theory of intelligent agency, and some problems with existing (non-scale-free) attempts. The second outlines my current path towards formulating a scale-free theory of intelligent agency, which I&#8217;m calling <em>coalitional agency</em>.</p><h2>Theories of intelligent agency</h2><p>By a &#8220;theory of intelligent agency&#8221; I mean a unified mathematical framework that describes both <em>understanding the world</em> and <em>influencing the world</em>. In this section I&#8217;ll outline the two best candidate theories of intelligent agency that we currently have (expected utility maximization and active inference), explain why neither of them is fully satisfactory, and outline how we might do better.</p><h3>Expected utility maximization</h3><p>Expected utility maximization is the received view of intelligent agency in many fields (I&#8217;ll abbreviate it as EUM, and EUM agents as EUMs). Idealized EUMs have beliefs in the form of probability distributions, and goals in the form of utility functions, as specified by the axioms of probability theory and utility theory.
They choose whichever strategy leads to the most utility in expectation; this is typically modeled as a process of search or planning.</p><p>EUM is a very productive framework in simple settings&#8212;like game theory, bargaining theory, microeconomics, etc. It&#8217;s particularly useful for describing agents making one-off decisions between a fixed set of choices. However, it&#8217;s much more difficult to use EUM to model agents making sequences of choices over time, especially when they learn and update their concepts throughout that process. The two points I want to highlight here are:</p><ol><li><p>EUM treats goals and beliefs as totally separate. But in practice, agents represent both of these in terms of the same underlying concepts. When those concepts change, both beliefs and goals change. The best way to learn reusable concepts is via <em>deep learning</em>, where simple concepts in lower layers of the network are built up into more complex concepts in higher layers.</p></li><li><p>Agents often act based on heuristics learned via trial and error. Even if they&#8217;ll eventually acquire beliefs about how those heuristics help them achieve their goals, they often start off without any clear idea of why those heuristics work so well. But this makes them hard to model as idealized EUMs which choose actions based on their beliefs and goals. Instead, the process of learning heuristics is better understood as (model-free) reinforcement learning.</p></li></ol><p>So we might hope that a theory of deep learning, or reinforcement learning, or deep reinforcement learning, will help fill in EUM&#8217;s blind spots. Unfortunately, theoretical progress has been slow on all of these&#8212;they&#8217;re just too broad to say meaningful things about in the general case.</p><h3>Active inference</h3><p>Fortunately, there&#8217;s another promising theory which comes at the problem from a totally different angle. <a href="https://mitpress.mit.edu/9780262553995/active-inference/">Active inference</a> is a theory born out of neuroscience. Where EUM starts by assuming an agent already has beliefs and goals, active inference gives us a theory of how beliefs and goals are built up over time.</p><p>The core idea underlying active inference is <em>predictive coding</em>. Predictive coding models our brains as hierarchical networks where the lowest level is trying to predict our sensory inputs, the next-lowest level is trying to predict the lowest level, and so on. The higher up the hierarchy you go, the more abstract and compressed the representations become. The lower levels might represent individual &#8220;pixels&#8221; seen by our retinas, then higher levels lines and shapes, then higher levels physical objects like dogs and cats, then even higher levels abstract concepts like animals and life.</p><p>This is, of course, similar to <a href="https://distill.pub/2020/circuits/zoom-in/">how artificial neural networks work</a> (especially ones trained by self-supervised learning). The key difference: predictive coding tells us that, in the brain, the patterns recognized at each level are determined by reconciling the bottom-up signals and the top-down predictions. For example, after looking at the image below, you might not perceive any meaningful shapes within it. But if you have a strong enough top-down prediction that the image makes sense (e.g. because I&#8217;m telling you it does) then that prediction will keep being sent down to lower layers responsible for identifying shapes, until they discover the dog.
This explains the sharp shifts in our perceptions when looking at such images: at first we can&#8217;t see the dog at all, but when we find it, it jumps into focus, and afterwards we can&#8217;t unsee it.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!3_Il!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa465c6cd-630f-467b-a2c7-603858cd7baf_1500x999.webp" alt="An image in which a dog is hidden"></figure></div><p>Predictive coding is a very elegant theory. And what&#8217;s even more elegant is that it explains <em>actions</em> in the same way&#8212;as very strong top-down predictions which override the default states of our motor neurons. Specifically, we can resolve conflicts between beliefs and observations either by updating our beliefs, or by taking actions which make the beliefs come true. Active inference is an extension of predictive coding in which some beliefs are so rigid that, when they conflict with observations, it&#8217;s easier to act to change future observations than it is to update those beliefs. We can call these hard-to-change beliefs &#8220;goals&#8221;, thereby unifying beliefs and goals in a way that EUM doesn&#8217;t.</p><p>This is a powerful and subtle point, and one which is often misunderstood. I think the best way to fully understand this point is in terms of perceptual control theory. <a href="https://slatestarcodex.com/2019/03/20/translating-predictive-coding-into-perceptual-control/">Scott Alexander gives a good overview here</a>; I&#8217;ll also explain the connection at greater length in a follow-up post.</p><h3>Towards a scale-free unification</h3><p>Active inference is a beautiful theory&#8212;not least because it includes EUM as a special case. Active inference represents goals as probability distributions over possible outcomes. If we interpret the logarithm of each probability as that outcome&#8217;s utility (and set aside the value of information) then active inference agents <a href="https://x.com/RichardMCNgo/status/1893556269644161215">choose actions which maximize expected utility</a>. (One intuition for why such an interpretation is natural comes from Scott Garrabrant&#8217;s <a href="https://www.lesswrong.com/s/4hmf7rdfuXDJkxhfg/p/DMxe4XKXnjyMEAAGw">geometric rationality</a>.)</p><p>So what does expected utility maximization have to add to active inference?
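</p><p><em>An aside before answering:</em> the log-probability correspondence above is easy to check numerically. Below is a minimal sketch in Python&#8212;all the distributions are placeholders I&#8217;ve invented for illustration, and &#8220;setting aside the value of information&#8221; is rendered, loosely, as ignoring an entropy term:</p><pre><code>import math

# Sketch of the correspondence above: treat the log-probability of each
# outcome under a goal distribution as that outcome's utility, then
# compare ranking actions by expected utility vs. by KL divergence.
# All numbers are invented for illustration.

p_goal = {"A": 0.7, "B": 0.2, "C": 0.1}  # goal: mostly outcome A

# Predicted outcome distributions under two candidate actions.
actions = {
    "act1": {"A": 0.6, "B": 0.3, "C": 0.1},
    "act2": {"A": 0.2, "B": 0.5, "C": 0.3},
}

def expected_utility(pred):
    # Utility of each outcome = log of its probability under the goal.
    return sum(p * math.log(p_goal[o]) for o, p in pred.items())

def kl_to_goal(pred):
    # KL(pred || goal) = -expected_utility(pred) - entropy(pred), so the
    # two rankings coincide whenever the entropy term is set aside.
    return sum(p * math.log(p / p_goal[o]) for o, p in pred.items())

for name, pred in actions.items():
    print(name, round(expected_utility(pred), 3), round(kl_to_goal(pred), 3))
# act1 wins on both criteria: it concentrates probability on the
# outcomes that the goal distribution favors.
</code></pre><p>Returning to the question: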
I think that what active inference is missing is the ability to model <em>strategic interactions</em> between different goals. That is: we know how to talk about EUMs playing games against each other, bargaining against each other, etc. But, based on my (admittedly incomplete) understanding of active inference, we don&#8217;t yet know how to talk about goals doing so <em>within</em> a single active inference agent.</p><p>Why does that matter? One reason: the biggest obstacle to a goal being achieved is often other conflicting goals. So any goal capable of learning from experience will naturally develop strategies for avoiding or winning conflicts with other goals&#8212;which, indeed, seems to happen <a href="https://www.lesswrong.com/s/ZbmRyDN8TCpBTZSip/p/5gfqG3Xcopscta3st">in human minds</a>.</p><p>More generally, any theory of intelligent agency needs to model internal conflict in order to be scale-free. By a scale-free theory I mean one which applies at many different levels of abstraction, remaining true even when you &#8220;zoom in&#8221; or &#8220;zoom out&#8221;. I see so many similarities in how intelligent agency works at different scales (on the level of human subagents, human individuals, companies, countries, civilizations, etc.) that I strongly expect our eventual theory of it to be scale-free.</p><p>But active inference agents are cooperative <em>within</em> themselves while having strategic interactions with other agents; this privileges one level of analysis over all the others. Instead, I propose, we should think of active inference agents as being composed of subagents who themselves compete and cooperate in game-theoretic ways. I call this approach <em>coalitional agency</em>; in the next section I characterize my current understanding of it from two different directions.</p><h2>Two paths towards a theory of coalitional agency</h2><p>The core idea of coalitional agency is that we should think of agents as being composed of cooperating and competing subagents; and those subagents as being composed of subsubagents in turn; and so on. The broad idea here is not new&#8212;indeed, it&#8217;s the core premise of Minsky&#8217;s <em>Society of Mind</em>, published back in 1986. But I hope that thinking of coalitional agency as incorporating elements of both EUM and active inference will allow progress towards a formal version of the theory.</p><p>In this section I&#8217;ll give two different characterizations of coalitional agency: one starting from EUM and trying to make it more coalitional, and the other starting from active inference and trying to make it more agentic. More specifically, the first poses the question: if a group of EUMs formed a coalition, what would it look like? The second poses the question: how could active inference agents be more robust to conflict between their internal subagents?</p><h3>From EUM to coalitional agency</h3><p>If a group of EUMs formed a coalition, what would it look like? EUM has a standard answer to this: the coalition would be a linearly-aggregated EUM. In this section I first explain why the standard answer is unsatisfactory. I then give an alternative answer: the coalition should be an incentive-compatible decision procedure.</p><h4>Aggregating into EUMs is very inflexible</h4><p>In the EUM framework, any non-EUM agent is incoherent in the sense of violating the underlying axioms of probability theory and/or utility theory. So insofar as EUM has predictive power, it predicts that competent coalitions will also be EUMs.
But which EUMs? The standard answer is given by <a href="https://forum.effectivealtruism.org/posts/v89xwH3ouymNmc8hi/harsanyi-s-simple-proof-of-utilitarianism">Harsanyi&#8217;s utilitarian theorem</a>, which shows that (under reasonable-seeming assumptions) an aggregation of EUMs into a larger-scale EUM must have a utility function that&#8217;s a weighted average of the subagents&#8217; utilities.</p><p>However, this strongly limits the space of possible aggregated agents. Imagine two EUMs, Alice and Bob, whose utilities are each linear in how much cake they have. Suppose they&#8217;re trying to form a new EUM whose utility function is a weighted average of their utility functions. Then they&#8217;d only have three options:</p><ol><li><p>Form an EUM which would give Alice all the cakes (because it weights Alice&#8217;s utility higher than Bob&#8217;s)</p></li><li><p>Form an EUM which would give Bob all the cakes (because it weights Bob&#8217;s utility higher than Alice&#8217;s)</p></li><li><p>Form an EUM which is totally indifferent about the cake allocation between them (which would allocate cakes arbitrarily, and could be swayed by the tiniest incentive to give all Alice&#8217;s cakes to Bob, or vice versa)</p></li></ol><p>These are all very unsatisfactory. Bob wouldn&#8217;t want #1, Alice wouldn&#8217;t want #2, and #3 is extremely non-robust. Alice and Bob could toss a coin to decide between options #1 and #2, but then they wouldn&#8217;t be acting as an EUM (since EUMs can&#8217;t prefer a probabilistic mixture of two options to either option individually). And even if they do, whoever loses the coin toss will have a strong incentive to renege on the deal.</p><p>We could see these issues merely as the type of frictions that plague any idealized theory. But we could also see them as hints about what EUM is getting wrong on a more fundamental level. Intuitively speaking, the problem here is that there&#8217;s no mechanism for separately respecting the interests of Alice and Bob after they&#8217;ve aggregated into a single agent. For example, they might want the EUM they form to value fairness between their two original sets of interests. But adding this new value is not possible if they&#8217;re limited to (a probability distribution over) weighted averages of their utilities. This makes aggregation very risky when Alice and Bob can&#8217;t consider all possibilities in advance (i.e. in all realistic settings).</p><p>Based on similar reasoning, Scott Garrabrant <a href="https://www.lesswrong.com/s/4hmf7rdfuXDJkxhfg/p/Xht9swezkGZLAxBrd">rejects the independence axiom</a>. He argues that the axiom is unjustified because rational agents should be able to lock in values like fairness based on prior agreements (or even <em>hypothetical</em> agreements).</p><h4>Coalitional agents are incentive-compatible decision procedures</h4><p>The space of decision procedures is very broad; can we say more about which decision procedures rational agents should commit to following? One key desideratum for commitments is that it&#8217;s easy to trust that they&#8217;ll be kept. Consider the example above of flipping a coin to decide between options #1 and #2. This is fair, but it sets up strong incentives for whoever loses the coinflip to break their commitment, since they will not get any benefit from keeping it.</p><p>And it&#8217;s even worse than that, because in general the only way to find out another agent&#8217;s utilities is to ask them, <em>and they could just lie</em>.
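</p><p>As a minimal sketch of that problem (with invented numbers): suppose the aggregated EUM splits a cake to maximize the sum of <em>reported</em> utilities. A subagent who exaggerates how much it cares captures the whole cake:</p><pre><code># Sketch (invented example): a mechanism maximizing the sum of
# *reported* utilities is gameable, because agents can exaggerate.

splits = [i / 10 for i in range(11)]  # Alice's share of the cake

def best_split(alice_report, bob_report):
    # The aggregated EUM picks the split maximizing total reported utility.
    return max(splits, key=lambda s: alice_report(s) + bob_report(1 - s))

honest = best_split(lambda s: s, lambda s: s)         # both report linear utility
inflated = best_split(lambda s: s, lambda s: 10 * s)  # Bob claims to care 10x more

print(honest)    # with honest reports every split ties, so the choice is
                 # arbitrary -- the indifference of option #3 above
print(inflated)  # 0.0: Bob's exaggerated report hands him the whole cake
</code></pre><p>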
From the god&#8217;s-eye perspective you can build an EUM which averages subagents&#8217; utilities; from the perspective of the agents themselves, you can&#8217;t. In other words, EUMs constructed by taking a weighted average of subagents&#8217; utilities are not <em><a href="https://en.wikipedia.org/wiki/Incentive_compatibility">incentive-compatible</a></em>.</p><p>EUMs which can&#8217;t guarantee each other&#8217;s honesty will therefore want to aggregate into incentive-compatible decision procedures which each agent does best by following. Perhaps the best-known incentive-compatible decision procedure is the <em>fair cake-cutting algorithm</em>, also known as &#8220;I cut, you choose&#8221;. This is a much simpler and more elegant way to split cakes than the example I gave above of Alice and Bob aggregating into a single EUM.</p><p>Now, cake-cutting is one very specific type of problem, and we shouldn&#8217;t expect there to be incentive-compatible decision procedures with such nice properties for all problems. Nevertheless, there&#8217;s a very wide range of possibilities to explore. Some of the simplest possible incentive-compatible decision procedures include:</p><ol><li><p>Each subagent gets one type of decision that they&#8217;re solely responsible for.</p></li><li><p>We randomize between different subagents for different decisions.</p></li><li><p>Subagents vote between two options.</p></li><li><p>One subagent chooses which other subagent to delegate to.</p></li><li><p>Different mechanisms are used for different types of decisions.</p></li></ol><p>These decision procedures each give subagents some type of control over the outputs&#8212;and, importantly, a type of control that generalizes to a range of problems beyond the ones they were able to consider during bargaining.</p><h4>Which incentive-compatible decision procedure?</h4><p>The question is then: how should subagents choose <em>which</em> incentive-compatible decision procedure to adopt? The most principled answer is that they should use a <em>bargaining theory</em> framework. This is a little different from the traditional theoretical framework for bargaining. Bargaining doesn&#8217;t typically produce ways of organizing the bargainers&#8212;instead it produces an object-level answer to whatever problem the bargainers face.</p><p>This makes sense when you have a single decision to make. But when bargainers face many possible future decisions, bargaining over outcomes requires specifying which outcome to choose in every possible situation. This is deeply intractable in realistic settings, where bargainers can&#8217;t predict every possible scenario they might face.</p><p>In those settings it is much more tractable to bargain over <em>methods</em> of making decisions which generalize beyond the problems that the bargainers are currently aware of. I don&#8217;t know of much work on this, but the same idealized bargaining solutions (e.g. the Nash bargaining solution) should still apply in principle. The big question is whether there&#8217;s anything interesting to be said about the relationship between incentive-compatible decision procedures and bargaining solutions. For example, are there classes of incentive-compatible decision procedures which make it especially easy for agents to identify which one is near the optimal bargaining solution?
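</p><p>The Nash solution itself is easy to state concretely: pick the option that maximizes the product of each agent&#8217;s gains over their disagreement point. Here&#8217;s a minimal sketch of applying it to a menu of decision procedures&#8212;the menu and all the payoffs are placeholders I&#8217;ve invented, not anything from the literature:</p><pre><code># Sketch: choosing among candidate decision procedures via the Nash
# bargaining solution. Payoffs are each agent's (invented) expected
# utility from living under that procedure across future decisions.

procedures = {
    "alice_dictates":  (10.0, 1.0),
    "bob_dictates":    (1.0, 10.0),
    "random_dictator": (5.5, 5.5),
    "issue_by_issue":  (7.0, 6.0),  # each agent owns the issues they care most about
}

disagreement = (1.0, 1.0)  # payoffs if the agents fail to reach any agreement

def nash_product(payoffs):
    (a, b), (da, db) = payoffs, disagreement
    return (a - da) * (b - db)

chosen = max(procedures, key=lambda name: nash_product(procedures[name]))
print(chosen)  # issue_by_issue: the largest product of gains over disagreement
</code></pre><p>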
On a more theoretical level, one tantalizing hint is that the <a href="https://www.lesswrong.com/posts/vJ7ggyjuP4u2yHNcP/threat-resistant-bargaining-megapost-introducing-the-rose">ROSE bargaining solution</a> is also constructed by abandoning the axiom of independence&#8212;just as Garrabrant does in his rejection of EUM above. This connection seems worth exploring further.</p><p>To finish, I&#8217;ve summarized many of the claims from this section in the following table:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!VDx5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19af86c-6959-493c-bed5-cbeb2a7899a6_1118x516.png" alt="Table summarizing the claims in this section"></figure></div><p>What do I mean by &#8220;hard to reason about&#8221;? One nice thing about EUMs is that their behavior is extremely easy to summarize: they do whatever&#8217;s best for their goals according to their beliefs. But we can&#8217;t talk about decision procedures in the same way. Individual subagents may have goals and beliefs, but the decision procedure itself doesn&#8217;t: it just processes those subagents into a final decision.</p><p>Fortunately, there&#8217;s a way to rescue our intuitive idea that agents should have beliefs and goals. It&#8217;ll involve talking about much more complex incentive-compatible decision procedures, though. So first I&#8217;ll turn to the other direction in which we can try to derive coalitional agency: starting from active inference.</p><h3>From active inference to coalitional agency</h3><p>I just gave an account of coalitional agents in which they&#8217;re built up from individual EUMs. In this section I&#8217;ll do the opposite: start from an active inference agent and modify it until it looks more like a coalitional agent.</p><p>More specifically, consider a hierarchical generative model containing beliefs/goals, where higher layers predict lower layers, and lower layers send prediction errors up to higher layers. Let&#8217;s define a subagent as a roughly-internally-consistent cluster of beliefs and goals within that larger agent. Note that this definition is a matter of degree: if we apply a high bar for internal consistency, then each subagent will be small (e.g. beliefs and desires about a single object) whereas a lower bar will lead to larger subagents (e.g.
a whole ideology).</p><p>Subagents with different beliefs and goals will tend to make different predictions (including &#8220;predictions&#8221; about which actions they want the agent to take). What modifications do we need to make to our original setup for it to be robust to strategic dynamics between those subagents?</p><h4>Predicting observations via prediction markets</h4><p>When multiple subagents make conflicting predictions, the standard approach is to combine them by taking a <a href="https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2014.00457/full">precision-weighted average</a>. Credit is then assigned to each subagent for the prediction in proportion to how confident it was. But this is not incentive-compatible: subagents can benefit by strategizing about how the other subagents will respond, and changing their responses accordingly.</p><p>There are various incentive-compatible ways to elicit predictions from multiple agents (many of which are discussed by <a href="https://arxiv.org/pdf/2403.07949">Neyman</a>). However, the most elegant incentive-compatible method for aggregating predictions is a <em>prediction market</em>. Each trader on a prediction market can choose to buy shares in propositions it thinks are underpriced and sell shares in propositions it thinks are overpriced. This allows subagents to specialize into different niches within the overall agent. It also incentivizes them to arbitrage away any logical inconsistency they notice. These dynamics are modeled by the <a href="https://intelligence.org/files/LogicalInduction.pdf">Garrabrant induction framework</a>.</p><h4>Choosing actions via auctions</h4><p>Given my discussion above about actions being in some sense predictions of future behavior, we might think that actions should be chosen by prediction markets too. However, there&#8217;s a key asymmetry: if I <em>expect</em> a complex plan to happen, I can profit by predicting <em>any</em> aspect of it. But if I <em>want</em> a complex plan to happen, I need to successfully coordinate <em>every</em> aspect of it. So, unlike predictions of observations, predictions of actions need to have some mechanism for giving a single plan control over many different actuators.</p><p>In active inference, the mechanism by which this occurs is called expected free energy minimization. I&#8217;m honestly pretty confused about how expected free energy minimization works, but I strongly suspect that it&#8217;s not incentive-compatible. In particular, the discontinuity involved in picking the single highest-value plan seems like it&#8217;d induce incentives to overestimate your own plan&#8217;s value. However, <a href="https://www.andrew.cmu.edu/user/coesterh/RIA.pdf">Demski et al.</a>&#8217;s BRIA framework solves this problem by requiring subagents to bid for the right to implement a plan and receive the corresponding reward. Rational subagents will never bid more than the reward they actually expect. So my hunch is that something like this auction system would be the best way to adjust our original setup to make it incentive-compatible.</p><h4>Aggregating values via voting</h4><p>The last important component of decision-making is evaluating plans (whether in advance or in hindsight). What happens when different subagents disagree on which goals or values the plans should be evaluated in terms of? Again, the standard approach is to take a precision-weighted average of their evaluations, but this still has all the same incentive-compatibility issues.
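</p><p>To see why: a precision-weighted average weights each report by its claimed precision (inverse variance), so a subagent can drag the aggregate toward its own evaluation just by professing more confidence. A minimal sketch, with invented numbers:</p><pre><code># Sketch of why precision-weighted averaging is gameable: weights are
# whatever precision (inverse variance) each subagent claims, so
# professing extra confidence pulls the aggregate your way.

def precision_weighted_average(reports):
    # reports: list of (estimate, claimed_precision) pairs
    total = sum(prec for _, prec in reports)
    return sum(est * prec for est, prec in reports) / total

honest = [(0.2, 1.0), (0.8, 1.0)]   # two subagents, equally confident
gamed = [(0.2, 1.0), (0.8, 10.0)]   # the second inflates its confidence

print(precision_weighted_average(honest))  # 0.5
print(precision_weighted_average(gamed))   # ~0.745: the loudest voice wins
</code></pre><p>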
And unlike predictions, values have no ground truth feedback signal, meaning that prediction markets don&#8217;t help.</p><p>So I expect that the most appropriate way to aggregate goals/values is via a voting system. This is also the conclusion reached by <a href="https://www.fhi.ox.ac.uk/wp-content/uploads/2021/06/Parliamentary-Approach-to-Moral-Uncertainty.pdf">Newberry and Ord</a>, who model idealized moral decision-making in terms of a parliament in which subagents vote on what values to pursue. Specifically, they propose using <a href="https://en.wikipedia.org/wiki/Random_ballot">random ballot voting</a>, in which each voter&#8217;s favorite option is selected with probability proportional to their vote share. This voting algorithm has three particularly notable features:</p><ol><li><p>As a nondeterministic voting system, it dodges <a href="https://en.wikipedia.org/wiki/Arrow%27s_impossibility_theorem">Arrow&#8217;s impossibility theorem</a>.</p></li><li><p>It leads to coalitional bargaining between subagents. Pairs of subagents with different preferences are incentivized to switch both their votes to a compromise option. Iterating this process many times produces hierarchical coalitions. (The downside: when bargaining is allowed, random ballot voting isn&#8217;t threat-resistant, since agents are incentivized to start with extreme positions in order to receive more concessions.)</p></li><li><p>It outputs a probability distribution over outcomes. We therefore have a very natural way to use votes to evaluate plans: we can measure the divergence between the outcome of the vote and the (expected or actual) outcome of the plan.</p></li></ol><h2>Putting it all together</h2><p>I&#8217;ve described two paths towards a theory of coalitional agency. On one path, we start from expected utility maximizers and aggregate them to form coalitional agents, via those EUMs bargaining about which decision procedures to use. The problem is that the resulting decision procedure may be incoherent in the sense that it can&#8217;t be ascribed beliefs or goals. On the other path, we make interactions between active inference subagents more incentive-compatible by using prediction markets, auctions, and voting (or similar mechanisms) to manage internal conflict.</p><p>What I&#8217;ll call the <em><strong>coalitional agency hypothesis</strong></em> is the idea that these two paths naturally &#8220;meet in the middle&#8221;&#8212;specifically, that EUMs doing (idealized) bargaining about which decision procedure to use would in many cases converge to something like my modified active inference procedure. If true, we&#8217;d then be able to talk about that procedure&#8217;s &#8220;beliefs&#8221; (the prices of its prediction market) and &#8220;goals&#8221; (the output of its voting procedure).</p><p>One line of work which supports the coalitional agency hypothesis is Critch&#8217;s <em><a href="https://arxiv.org/abs/1701.01302">negotiable reinforcement learning</a></em> framework, under which EUMs should bet their influence on any disagreements they have with other agents about the future, so that they end up very powerful if (and only if) their predictions are right. I interpret this result as evidence that (some version of) prediction markets are the default outcome of bargaining over incentive-compatible decision procedures.</p><p>But all of this work is still vague and tentative. I&#8217;d very much like to develop a more rigorous formulation of coalitional agency.
This would benefit greatly from working with collaborators (especially those with strong mathematical skills). So I&#8217;ll finish with two calls to action. If you&#8217;re a junior(ish) researcher and you want to work with me on any of this, apply to <a href="https://www.matsprogram.org/apply">my MATS fellowship</a>. If you&#8217;re an experienced researcher and you&#8217;d like to chat or otherwise get involved (potentially by joining a workshop series I&#8217;ll be running on this) please send me a message directly.</p><p><em>Thanks to davidad, Jan Kulveit, Emmett Shear, Ivan Vendrov, Scott Garrabrant, Abram Demski, Martin Soto, Laura Deming, Aaron Tucker, Adria Garriga, Oliver Richardson, Madeleine Song and others for helping me formulate these ideas</em>.</p>]]></content:encoded></item><item><title><![CDATA[Elite Coordination via the Consensus of Power]]></title><description><![CDATA[Solving the mystery of the cathedral]]></description><link>https://www.mindthefuture.info/p/elite-coordination-via-the-consensus</link><guid isPermaLink="false">https://www.mindthefuture.info/p/elite-coordination-via-the-consensus</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Wed, 19 Mar 2025 06:51:39 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7c3a1f2e-f284-46dd-aaed-28da87713b59_4000x3267.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post is about how implicit coordination between powerful people allows them to act in surprisingly synchronized ways. I&#8217;ll start by discussing wokeness, the most prominent recent example. I&#8217;ll then analyze the mechanism behind such coordination (which I call the <em>consensus of power</em>); discuss how to oppose it; and outline some possibilities for what healthier power structures would look like.</p><h3>The mystery of the cathedral</h3><p>The politics of the last decade have been very weird. The Great Awokening gave rise to an extreme variant of progressive ideology that quickly gained a huge amount of traction amongst American elites, and from there embedded itself into institutions across America and the wider western world.</p><p>Sudden ideological change is nothing new&#8212;<a href="https://www.mindthefuture.info/p/power-lies-trembling">my last post</a> was about political preference cascades, and how they&#8217;re a natural result of social behavior. But what&#8217;s fascinatingly novel about the Great Awokening is the extent to which it was an almost entirely <em>leaderless</em> movement. There was no modern MLK leading this new charge for a racial reckoning; nor a Harvey Milk of trans rights; nor even Butlers or Steins writing incisive commentaries.
The closest we had was Obama, who in hindsight was a milquetoast leader whose cult of personality was driven in large part by progressive longing for a black president.</p><p>One of Curtis Yarvin&#8217;s main intellectual contributions has been to give an account of wokeness (and <a href="https://www.unqualified-reservations.org/2007/06/ultracalvinist-hypothesis-in/">its historical antecedents</a>) which highlights this peculiarity, specifically via his concept of <em><a href="https://graymirror.substack.com/p/a-brief-explanation-of-the-cathedral">the cathedral</a></em>:</p><blockquote><p>&#8220;The cathedral&#8221; is just a short way to say &#8220;journalism plus academia&#8221;&#8212;in other words, the intellectual institutions at the center of modern society, just as the Church was the intellectual institution at the center of medieval society.<br><br>But the label is making a point. The Catholic Church is one institution&#8212;the cathedral is many institutions. Yet the label is singular. This transformation from many to one&#8212;literally, e pluribus unum&#8212;is the heart of the mystery at the heart of the modern world.</p><p>The mystery of the cathedral is that all the modern world&#8217;s legitimate and prestigious intellectual institutions, even though they have no central organizational connection, behave in many ways as if they were a single organizational structure.</p></blockquote><p>That is: different parts of the cathedral talk about the same issues, support the same policies, share the same taboos, and so on. Why? When we see this level of synchrony without visible leaders, it&#8217;s tempting to take a conspiratorial stance: there <em>are</em> leaders, they&#8217;re just acting surreptitiously. And undoubtedly there have been many <em>small</em> conspiracies within woke organizations, some of which grew to involve high-level figures (e.g. <a href="https://www.tracingwoodgrains.com/p/the-faas-hiring-scandal-a-quick-overview">the FAA hiring scandal</a>, or <a href="https://www.nytimes.com/2025/03/16/opinion/covid-pandemic-lab-leak.html">the conspiracy to suppress the lab leak hypothesis</a>, or <a href="https://www.justice.gov/archives/opa/pr/justice-department-finds-yale-illegally-discriminates-against-asians-and-whites-undergraduate">conspiracies within universities</a> to <a href="https://en.wikipedia.org/wiki/Students_for_Fair_Admissions_v._Harvard">discriminate against Asians</a>).</p><p>But wokeness is so visibly a bottom-up phenomenon (with high-level figures being followers rather than leaders) that it&#8217;s difficult to postulate that its overall priorities are set by any large-scale conspiracy. So Yarvin doesn&#8217;t. Instead, <a href="https://graymirror.substack.com/p/a-brief-explanation-of-the-cathedral">he explains</a> the mystery of the cathedral in <a href="https://en.wikipedia.org/wiki/Memetics">memetic</a> terms:</p><blockquote><p>[In a progressive liberal democracy], there is a market for dominant ideas. A dominant idea is an idea that validates the use of power &#8230; And there is no market for recessive ideas. A recessive idea is an idea that invalidates power or its use.</p><p>&#8230;It is not hard to see why, in the lecture halls and newsrooms, dominant ideas tend to outcompete recessive ideas. A dominant idea is an idea that tends to benefit you and your friends. 
A dominant idea will be especially popular with your friends and former students in the civil service, because it gives them more work and more power.</p></blockquote><p>Now, competition between ideas is undoubtedly a big part of the story of wokeness. The introduction of algorithmic newsfeeds in the early 2010s produced hotbeds of rapid cultural evolution, which quickly led to <a href="https://x.com/arctotherium42/status/1880997332092891586">the widespread transmission of many woke concepts</a>. Even if you don&#8217;t like Yarvin&#8217;s specific memetic explanation (which I don&#8217;t, since many aspects of wokeness are recessive ideas), <a href="https://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/">there</a> <a href="https://ncofnas.com/p/a-guide-for-the-hereditarian-revolution">are</a> <a href="https://www.astralcodexten.com/p/the-psychopolitics-of-trauma">plenty</a> <a href="https://www.postliberalorder.com/p/against-the-politics-of-envy">more</a> <a href="https://web.archive.org/web/20240403094949/https://www.residentcontrarian.com/p/an-article-about-a-book-about-pornography?r=75epr">to</a> <a href="https://www.city-journal.org/article/nietzsche-can-help-us-understand-wokeness">choose</a> <a href="https://www.amazon.com/Parasitic-Mind-Infectious-Killing-Common/dp/162157959X">from</a>. Indeed, the common description of wokeness as a &#8220;mind virus&#8221; closely parallels Dawkins&#8217; description of religions as &#8220;<a href="https://en.wikipedia.org/wiki/Viruses_of_the_Mind">viruses of the mind</a>&#8221;.</p><p>But there&#8217;s a crucial difference between explaining how an idea spreads and explaining how people who believe in that idea coordinate. <a href="https://x.com/RichardMCNgo/status/1895173615865471032">Memetics generally focuses on the former</a>; the mystery of the cathedral is about the latter. When woke institutions behave &#8220;as if they were a single organizational structure&#8221;, it often involves each of them reacting to each other&#8217;s actions (e.g. media outlets rapidly flipping from &#8220;it&#8217;s racist to worry about covid&#8221; to &#8220;masks and lockdowns are crucial&#8221;). That level of synchrony is hard to explain in terms of which ideas are memetically fitter.</p><p>So to solve the mystery of the cathedral we should appeal neither to conspiracies nor to competitions between ideas. Instead, I think, we need a third type of explanation: an &#8220;emergent conspiracy&#8221; of people independently acting in ways that they expect others to support. I call this phenomenon the <em>consensus of power</em> (a term I originally got from <a href="https://x.com/benlandautaylor">Ben Landau-Taylor</a>, which I&#8217;ll define more precisely in the next section).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>The cathedral is a prominent example of a consensus of power. But my main focus in this post isn&#8217;t to analyze the cathedral specifically, but rather the game-theoretic dynamics underlying how consenses of power work in general. Daniel Ellsberg describes some of these dynamics in <a href="https://www.amazon.com/Secrets-Memoir-Vietnam-Pentagon-Papers/dp/0142003425">his account of US decision-making about the Vietnam War</a>. So does Naunihal Singh in his account of coups, and Timur Kuran in his account of preference falsification, both of which I discussed at length in <a href="https://www.mindthefuture.info/p/power-lies-trembling">my last post</a>.
You can think of this post as extending those ideas from the simple case of &#8220;will the coup succeed?&#8221; to a setting where people are deciding their positions on many related issues based on their expectations about other people&#8217;s expectations and behavior. (Unfortunately, this change makes discrete formalisms like <a href="https://ericneyman.wordpress.com/2021/06/05/social-behavior-curves-equilibria-and-radicalism/">threshold models</a> and <a href="https://en.wikipedia.org/wiki/Keynesian_beauty_contest">Keynesian beauty contests</a> much less applicable. I&#8217;m keen to better understand how to formally model consenses of power; for now, though, I&#8217;ll focus on their qualitative dynamics.)</p><h3>The consensus of power</h3><p>Whenever we interact with others in society, we need to figure out which social and political norms to use. (Here I&#8217;m using the term &#8220;norms&#8221; broadly to include which factions to favor in which ways, which ideas are within the Overton window, etc.) A crucial feature of norms is that they don&#8217;t just guide your own behavior, but also which behaviors you reward or punish when done by others. So you benefit from adopting (and enforcing) the same norms as other people, especially powerful people. This means that it&#8217;s advantageous to track not just which norms other people follow, but which norms people believe other people will follow, and which norms people believe that people believe that other people will follow, and which&#8230;</p><p>This suggests a definition for the consensus of power: the norms which are common knowledge amongst the ruling elite. Of course there&#8217;s no clear demarcation for who qualifies as a ruling elite; what I&#8217;m trying to gesture towards (but can&#8217;t yet define precisely) is a version of common knowledge which weights more powerful people more highly. Note that this is very different from just aggregating powerful people&#8217;s preferences, because norms can persist even when they&#8217;re extremely unpopular. Often when someone goes against the consensus of power, you don&#8217;t personally care. But you know that others will punish you if you&#8217;re associated with it (because others will punish <em>them</em> if they don&#8217;t punish you, because&#8230;). So you push back against it&#8212;and thus help uphold the consensus of power. I&#8217;ll call people who consistently uphold the consensus of power &#8220;apparatchiks&#8221;.</p>
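<p>As a toy illustration of how a norm almost nobody likes can still be upheld, here&#8217;s a minimal sketch (in Python, with made-up numbers; a crude model I&#8217;m inventing for illustration, not a serious formalization) of agents who uphold a norm either because they privately support it or because they expect the power-weighted majority to punish visible defection:</p><pre><code class="language-python">import numpy as np

n = 100
power = np.ones(n)
power[:5] = 20.0                   # hypothetical: a handful of elites hold most of the power
privately_likes_norm = np.zeros(n, dtype=bool)
privately_likes_norm[80:] = True   # hypothetical: only 20% privately support the norm

def settled_enforcement(expect_enforcement):
    """Iterate expectations about power-weighted enforcement to a fixed point."""
    for _ in range(20):
        # You uphold the norm if you like it, or if you expect the
        # power-weighted majority to punish anyone who defects.
        upholding = privately_likes_norm | expect_enforcement
        upheld_share = power[upholding].sum() / power.sum()
        expect_enforcement = np.full(n, upheld_share > 0.5)
    return upholding.mean()

print(settled_enforcement(np.ones(n, dtype=bool)))   # 1.0: the unpopular norm persists
print(settled_enforcement(np.zeros(n, dtype=bool)))  # 0.2: same people, other equilibrium</code></pre><p>Both outcomes are self-fulfilling: the same population sustains universal enforcement of a norm that only a fifth of them support, purely because expectations happened to settle there.</p>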
<p>The consensus of power, like the conspiratorial stance, attempts to explain the surprising level of coordination often seen within ruling elites. But direct coordination via conspiracy and indirect coordination via consensus are very different mechanisms that make importantly different predictions. I&#8217;ll elucidate the differences by framing each of them as a type of superagent containing many individual people.</p><p>Conspiracies are <em>centralized agents</em> who coordinate in a top-down way, by deferring to the decisions of leaders. This allows them to be highly goal-directed, and to plan and execute complex strategies. But it also bottlenecks them both on trust in those leaders, and on how well those leaders can gather and process information. This leaves them vulnerable to direct attack: disrupt or undermine their leaders and you can take out the entire conspiracy.</p><p>By contrast, consenses are <em>distributed agents</em>. They need not have any leaders, or even any central nodes. Instead, each apparatchik observes the behavior of many other apparatchiks, and gradually adjusts their own behavior accordingly. This allows consenses of power to perceive and act on a lot of information in parallel.</p><p>It also makes attacking them difficult, because there&#8217;s no clear locus of power. One good example comes from the ideological conformity of the mainstream media over the last few decades. If there&#8217;d been a news czar who&#8217;d banned coverage of Biden&#8217;s cognitive decline, or Pakistani pedophile gangs, or opposition to the BLM riots, or pushback against child transitions, or immigrant crime statistics, or racial IQ gaps, then that person could be criticised directly. But instead there were just thousands of individual decisions made by apparatchik news editors trying to judge which stories the consensus of power wanted them to run, resulting in a news environment almost as homogeneous as if there <em>had</em> been a left-wing news czar.</p><p>So consenses of power can be very influential and robust. But they&#8217;re also myopic and inflexible. Apparatchiks will only throw their weight behind positions which it&#8217;s common knowledge the consensus of power favors. But without leaders willing to take a risk to rally people around a position, common knowledge is hard to achieve. So consenses of power are very limited in their ability to &#8220;think outside the box&#8221; or course-correct when pursuing their goals. The latter typically requires big flashy failures, which allow apparatchiks to then coordinate a push for a &#8220;vibe shift&#8221;.</p><p>A prominent example of these dynamics was the process by which the Democrats selected first Biden then Harris as their 2024 nominee. Biden&#8217;s decision to run again was very unwise but hard for leading Democrats to oppose, since nobody wanted to stick their neck out by taking a stand against such a powerful figure. There was also no Schelling point around which to coordinate pushback&#8212;until Biden&#8217;s poor debate performance broke the taboo, leading to a huge surge of criticism.</p><p>The shift was so sudden that many hypothesized that it was centrally coordinated, but that hypothesis isn&#8217;t necessary. Instead, we can picture every Democrat leader individually wanting to get rid of Biden, then jumping on the bandwagon as soon as it became socially permissible to do so. Biden tried hard to cling onto the nomination, but once roused, a consensus of power is very difficult to oppose. Yet even then the difference between conspiracy and consensus was clear. Any conspiracy powerful enough to remove Biden would have replaced him with a more electable candidate than Harris. However, none of those more electable alternatives was prominent enough for the consensus to land on.</p><p>More generally, we can think of wokeness as a consensus of power under which it was common knowledge that leaders would give in to demands made by left-wing activists. As more and more leaders gave in to more and more demands, it took more and more courage for any individual leader to draw a red line. This created a channel by which left-wing activists could rapidly inject ideas that evolved on social media into prestigious institutions.</p><h3>Fighting a consensus of power</h3><p>Another way to characterize consenses of power is by understanding what it takes to resist them.
When you find yourself opposing a conspiracy, the best thing you can do is bring it to light&#8212;even one piece of evidence of illegitimate collusion can blow the whole thing up. But when you find yourself opposing a consensus of power, such smoking-gun evidence may be very hard to find, or nonexistent. You can try to describe the more subtle dynamics you see, but this can easily sound uncompelling or paranoid. And even if the evidence is pretty good, it&#8217;s often hard to get people to act on it&#8212;because they&#8217;ll then find themselves opposing the whole consensus of power, which is a terrifying prospect.</p><p>In other words, attacks that work on centralized agents often won&#8217;t work on distributed agents. But distributed agents have their own weaknesses, most notably their lack of coherence. What strategies do those weaknesses suggest for fighting a consensus of power?</p><p><strong>Firstly: act in its blind spots</strong>. Common knowledge is hard to build, and so consenses of power can be surprisingly slow to incorporate knowledge that many individual apparatchiks already have. This means that there are often many possible high-impact actions which the consensus of power doesn&#8217;t yet have an opinion about. For example, the losses that the woke consensus of power has incurred from Elon buying Twitter are at least an order of magnitude larger than the purchase price of $44 billion. But it was very difficult for the woke consensus of power to defend itself against this form of attack, because buying a social media platform is a very non-standard strategy for acquiring political power. (It did act afterwards, <a href="https://judiciary.house.gov/media/press-releases/how-worlds-biggest-brands-seek-control-online-speech">via the GARM-coordinated boycott</a>, but this was too little too late.)</p><p>Although consenses of power have many blind spots, taking advantage of them isn&#8217;t always easy. Firstly, because consenses of power inherently want to preserve their own power, and are therefore suspicious of people doing unusual things at scale, even if they don&#8217;t fully understand how those things threaten them. (If you see something <a href="https://apnews.com/article/kamala-harris-trump-vance-weird-c54d506d1f533ee7aa455f7b500322c5">being criticized as &#8220;weird&#8221;</a>, it&#8217;s a good bet that this is an attack by a consensus of power.) And secondly, because it&#8217;s easy for opponents of consenses of power to slip into a conspiratorial stance, treating the consensus as an entity that&#8217;s as well-organized and goal-directed as a centralized conspiracy.</p><p>Conspiratorial thinking suggests that if you ever do something which goes against the interests of apparatchiks, the consensus of power will &#8220;get you&#8221; for it in a way that you&#8217;re not anticipating. Because consenses of power can be so widely distributed, this creates a miasma of fear around acting (or even thinking) against them. In order to dispel that miasma, you need enough courage to act against individual apparatchiks and see that they can&#8217;t actually direct the consensus of power to target you. The rarity of people who are willing to do that helps explain Thiel&#8217;s claim that <a href="https://www.youtube.com/watch?v=E_vDZqQgmeQ">courage is in shorter supply than genius</a>.</p><p><strong>Secondly: judo-throw the consensus of power</strong>. Consenses of power have a tendency to overreach, because there&#8217;s nobody in control who can stop ill-advised bandwagons.
For example, a decade ago Silicon Valley was weak enough that Democrats could score easy points by attacking it. That gradually became less and less true, but Democrat elites in government, journalism and academia continued a campaign of pointless and often petty attacks, because there was no clear signal that they should stop. This has now backfired dramatically for them, with Silicon Valley leaders coming out strongly for Trump.</p><p>So it&#8217;s possible to cause significant damage to a consensus of power by waving a red flag in front of it, to draw it into an uphill battle. But this is a risky strategy. The woke consensus of power attacked many people on increasingly flimsy grounds over the last decade (e.g. <a href="https://en.wikipedia.org/wiki/Google%27s_Ideological_Echo_Chamber">James Damore</a>, <a href="https://www.vox.com/2020/7/29/21340308/david-shor-omar-wasow-speech">David Shor</a>, and <a href="https://thehill.com/blogs/blog-briefing-room/news/502975-california-man-fired-over-alleged-white-power-sign-says-he-was/">Emmanuel Cafferty</a>). But when wider society failed to rally in their defense, those potential overreaches turned into victories, creating (self-fulfilling) common knowledge that the woke consensus could cancel people for basically anything.</p><p>So judo-throwing a consensus of power is bottlenecked not just on courage but also on integrity and political savvy. The former makes it harder for the consensus of power to find pretexts for attacking you, thereby helping your allies build enough common knowledge (<a href="https://www.mindthefuture.info/p/power-lies-trembling">or faith</a>) to mobilize to defend you. The latter allows you to choose when and how to most effectively take a stand&#8212;as civil rights leaders did by choosing Rosa Parks as a figurehead. Perhaps the best example of a judo throw on a consensus of power is Gandhi&#8217;s <a href="https://en.wikipedia.org/wiki/Salt_March">Salt March</a>, which was extremely courageous (in its commitment to nonviolence), high-integrity (in its openness about its plans and goals) and politically savvy (in its choice of the salt tax as a symbol of oppression).</p><p>Having said that, one common downside of cultivating political savvy is that understanding how the consensus of power thinks risks turning you into an apparatchik. It&#8217;s easy to overzealously police naive activists on your &#8220;own side&#8221;, out of fear that they&#8217;ll provoke the consensus of power. But doing so makes you complicit in upholding the consensus of power, and sometimes even makes you your own side&#8217;s harshest critic. (A recent example comes from <a href="https://www.spectator.co.uk/article/has-nigel-farage-missed-the-immigration-vibe-shift/">Nigel Farage</a>, who&#8217;s <a href="https://www.theguardian.com/politics/2025/mar/09/reform-uk-power-struggle-rupert-lowe-nigel-farage">so scared of being branded as dangerously far-right</a> that he&#8217;s <a href="https://www.dailymail.co.uk/news/article-14489839/Nigel-Farage-Reform-Rupert-Lowe-mass-deportations-grooming-gangs.html">preemptively making those same accusations against his own MP</a>.)</p><p>Worse, engaging closely with a consensus of power reliably makes you overestimate its stability. It&#8217;s easy to assume that apparatchiks hold their positions sincerely (since that&#8217;s the impression they&#8217;re all trying to give!) when in fact most are conformists who would flip if they saw the tide turning.
So almost all politically savvy people are far less ambitious than they should be: they try to gradually nudge the consensus of power when they should be trying to replace it altogether.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><h3>Replacing a consensus of power</h3><p>Recall that I&#8217;ve defined the consensus of power as common knowledge about what people will favor, weighted by their power. But power is actually in large part constituted by people&#8217;s willingness to defer to you, which is significantly affected by how much influence you have over the consensus of power. So there are many possible stable(ish) consenses of power which we could choose to steer towards or away from.</p><p>Indeed, we can think about the history of politics in terms of transitions between different consenses of power. Most of human history was governed by what I&#8217;ll call the <em>consensus of force</em>: a consensus of power weighted by the ability to deploy military force. This tended to involve power struggles between monarchs, nobles, and generals, often cashing out in very destructive wars.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>The rise of enlightenment liberalism helped rein in the consensus of force via (implicit and explicit) rules. I&#8217;ll call the regime which has replaced it the <em>consensus of culture</em>: a consensus of power weighted by your ability to spread your ideas. The cathedral is a consensus of culture focused around the institutions which used to be the best at spreading ideas: newspapers and universities. While the consensus of force is backed up by physical coercion, consenses of culture are primarily backed up by social coercion. This dramatically reduces the costs of conflict.</p><p>We can imagine distributed agents which don&#8217;t even rely on social coercion. In <a href="https://www.mindthefuture.info/p/power-lies-trembling">my last post</a> I talked about <em>values</em> (like morality or national identity) as distributed agents. The crucial difference between values-based agents and consenses of power is that values motivate individuals even in the absence of external incentives. By contrast, a pure consensus of power has no sway over people who aren&#8217;t rewarded for following it or punished for defying it (though in practice almost all ideologies are somewhere between purely value-based and pure consenses).</p><p>But the price of non-coercion is a lack of robustness. You can&#8217;t run a society on purely non-coercive grounds, because you then have no mechanism to punish defectors. Even consenses of culture can become extremely decoupled from physical reality, because you don&#8217;t need physical force to apply social coercion. For example, South Korea&#8217;s consensus of culture is so dysfunctional that <a href="https://x.com/DemoDoomer/status/1899411850338447806">the country is driving itself extinct</a>. More generally, social media and smartphones facilitated the creation of echo chambers in which unhinged consenses of culture could develop. And while populist surges in many western countries are now pushing back against wokeness, there&#8217;s no guarantee that the same forces won&#8217;t drive those countries equally crazy.</p><p>There&#8217;s an interesting historical analogy here.
The development of modern militaries made the consensus of force unprecedentedly powerful, leading enlightenment intellectuals to develop the ideological paradigm of liberalism to rein it in. Liberalism spread because &#8220;letting our ideas die in our stead&#8221; was more appealing than continuing to fight wars over religion. Now the development of modern communications technology has made the consensus of culture unprecedentedly powerful. What this suggests is that we need liberalism 2.0: an ideological paradigm robust enough to rein in the excesses of the consensus of culture, but non-coercive enough that it can spread without sparking another culture war.</p><p>That&#8217;s a hard ideology design problem. But I think it&#8217;s solvable. In forthcoming posts I&#8217;ll talk about some of the other constraints that such an ideology would face, and my best guess at what it should look like.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>In <a href="https://slatestarcodex.com/2018/01/24/conflict-vs-mistake/">Scott Alexander&#8217;s terminology</a>, I&#8217;d describe conspiracies as a conflict theory explanation, and memetic competition as a mistake theory explanation. I&#8217;d then describe the consensus of power as a systems theory explanation.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>For example, most AI governance advocates historically tried to take positions that would sound reasonable in DC, without planning for how to respond to predictable rapid Overton window shifts. So after ChatGPT it was DC outsiders like <a href="https://futureoflife.org/open-letter/pause-giant-ai-experiments/">FLI</a> and <a href="https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/">Yudkowsky</a> who stepped up to bring AI risk discourse into the political mainstream.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Ben Landau-Taylor pointed out to me that the consensus of force best describes <em>feudal </em>societies. By contrast, at various points various empires (such as the Chinese and Roman empires) consolidated power to a sufficient extent that it was infeasible to challenge the center using military force. This shifted power from the consensus of force to what I&#8217;ll call the <em>consensus of the court</em>: a consensus of power weighted by the favor of the emperor.</p><p>A system where power flows from a single person might seem very different from the consenses of power I&#8217;ve discussed above. But even an emperor with absolute formal power still has scarce attention, and still needs to delegate almost all tasks.
So the apparatchiks within the court will scramble to anticipate his desires, and only infrequently bring complaints directly to him (since doing so is costly and risky). The consensus of the court is then determined by common knowledge about what gatekeepers will allow to be brought to the emperor&#8217;s attention, and which types of orders the emperor can actually enforce. Not even the emperor is in control of this process, since the consensus of the court is trying to tell him exactly what he wants to hear&#8212;sometimes effectively enough that he becomes no more than a figurehead.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Power Lies Trembling]]></title><description><![CDATA[A three-book review on the game theory of political power]]></description><link>https://www.mindthefuture.info/p/power-lies-trembling</link><guid isPermaLink="false">https://www.mindthefuture.info/p/power-lies-trembling</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Sat, 22 Feb 2025 22:37:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cb4e1410-2e56-4b27-a75d-147705669fde_1080x1228.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a href="https://www.thinkingcomplete.com/2022/04/book-review-very-important-people.html">a previous book review</a> I described exclusive nightclubs as the particle colliders of sociology&#8212;places where you can reliably observe extreme forces collide. If so, military coups are the supernovae of sociology. They&#8217;re huge, rare, sudden events that, if studied carefully, provide deep insight about what lies underneath the veneer of normality around us.</p><p>That&#8217;s the conclusion I take away from Naunihal Singh&#8217;s book <em><a href="https://www.amazon.com/Seizing-Power-Strategic-Logic-Military/dp/1421422565">Seizing Power: the Strategic Logic of Military Coups</a></em>. It&#8217;s <em>not</em> a conclusion that Singh himself draws: his book is careful and academic (though much more readable than most academic books). His analysis focuses on Ghana, a country which experienced ten coup attempts between 1966 and 1983 alone. Singh spent a year in Ghana carrying out hundreds of hours of interviews with people on both sides of these coups, which led him to formulate a new model of how coups work.</p><p>I&#8217;ll start by describing Singh&#8217;s model of coups. Then I&#8217;ll explain how the dynamics of his model also apply to everything else, with reference to Timur Kuran&#8217;s excellent book on preference falsification, <em><a href="https://www.amazon.com/Private-Truths-Public-Lies-Falsification/dp/0674707583">Private Truths, Public Lies</a>.</em> In particular, I&#8217;ll explain <em>threshold models of social behavior</em>, which I find extremely insightful for understanding social dynamics.</p><p>Both of these books contain excellent sociological analyses. But they&#8217;re less useful as guides for how one should personally respond to the dynamics they describe. I think that&#8217;s because in sociology you&#8217;re always part of the system you&#8217;re trying to affect, so you can never take a fully objective, analytical stance towards it. Instead, acting effectively also requires the right emotional and philosophical stance.
So to finish the post I&#8217;ll explore such a stance&#8212;specifically the philosophy of faith laid out by Soren Kierkegaard in his book <em><a href="https://en.wikipedia.org/wiki/Fear_and_Trembling">Fear and Trembling</a></em>.</p><h4>The revolutionary&#8217;s handbook</h4><p>What makes coups succeed or fail? Even if you haven&#8217;t thought much about this, you probably implicitly believe in one of two standard academic models of them. The first is <em>coups as elections</em>. In this model, people side with the coup if they&#8217;re sufficiently unhappy with the current regime&#8212;and if enough people side with the coup, then the revolutionaries will win. This model helps explain why popular uprisings like the Arab Spring can be so successful even when they start off with little military force on their side. The second is <em>coups as battles</em>. In this model, winning coups is about seizing key targets in order to co-opt the &#8220;nervous system&#8221; of the existing government. This model (whose key ideas are outlined in <a href="https://mirror.explodie.org/108765924-Luttwak-1969-Coup-d-Etat.pdf">Luttwak&#8217;s influential book on coups</a>) explains why coups depend so heavily on secrecy, and often succeed or fail based on their initial strikes.</p><p>Singh rejects both of these models, and puts forward a third: coups as coordination games. The core insight of this model is that, above all, military officers want to <em>join the side that will win</em>&#8212;both to ensure their and their troops&#8217; survival, and to minimize unnecessary bloodshed overall. Given this, their own preferences about which side they&#8217;d prefer to win are less important than their <em>expectations</em> about which side <em>other people</em> will support. This explains why very unpopular dictators can still hold onto power for a long time (even though the <em>coups as elections</em> model predicts they&#8217;d quickly be deposed): because everyone expecting everyone else to side with the dictator is a stable equilibrium.</p><p>It also explains why the targets that revolutionaries focus on are often not ones with military importance (as predicted by the <em>coups as battles</em> model) but rather targets of <em>symbolic </em>importance, like parliaments and palaces&#8212;since holding them is a costly signal of strength. Another key type of target often seized by revolutionaries is broadcasting facilities, especially radio stations. Why? Under the <em>coups as battles</em> model, it&#8217;s so they can coordinate their forces (and disrupt the coordination of the existing regime&#8217;s forces). Meanwhile the <em>coups as elections</em> model suggests that revolutionaries should use broadcasts to persuade people that they&#8217;re better than the old regime. Instead, according to Singh, what we most often observe is revolutionaries publicly broadcasting claims that <em>they&#8217;ve already won</em>&#8212;or (when already having won is too implausible to be taken seriously) that <em>their victory is inevitable</em>.</p><p>It&#8217;s easy to see why, if you believed those claims, you&#8217;d side with the coup. But, crucially, such claims can succeed without actually persuading anyone! If you believe that <em>others</em> are gullible enough to fall for those claims, you should fall in line. Or if you believe that others believe that <em>you</em> will believe those claims, then they will fall in line and so you should too. 
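</p><p>To make that regress concrete, here&#8217;s a minimal level-k sketch (in Python, with made-up numbers; an illustration of the logic, not Singh&#8217;s model) of officers reasoning at increasing depths about whether the coup will win:</p><pre><code class="language-python"># Toy level-k sketch of the regress of expectations during a coup.
# level 0: take the revolutionaries' broadcast at face value;
# level k: best-respond to what you think level-(k-1) officers will do.
def expected_support(k, broadcast_credibility):
    if k == 0:
        return broadcast_credibility
    prev = expected_support(k - 1, broadcast_credibility)
    # Join whichever side you expect to command a majority.
    return 1.0 if prev > 0.5 else 0.0

for cred in (0.45, 0.55):
    print(cred, [expected_support(k, cred) for k in range(4)])
# 0.45 -> [0.45, 0.0, 0.0, 0.0]: expected support collapses, the coup fizzles
# 0.55 -> [0.55, 1.0, 1.0, 1.0]: expected support snowballs to victory</code></pre><p>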
In other words, coups are an incredibly unstable situation where everyone is trying to predict everyone else&#8217;s predictions about everyone else&#8217;s predictions about everyone else&#8217;s predictions about everyone else&#8217;s&#8230; about who will win. Once the balance starts tipping one way, it will quickly accelerate. And so each side&#8217;s key priority is making themselves the <a href="https://en.wikipedia.org/wiki/Focal_point_(game_theory)">Schelling point</a> for coordination via managing public information (i.e. information that everyone knows everyone else has) about what&#8217;s happening. (This can be formally modeled as a <a href="https://en.wikipedia.org/wiki/Keynesian_beauty_contest">Keynesian beauty contest</a>. Much more on this in follow-up posts.)</p><p>Singh calls the process of creating self-fulfilling <a href="https://www.lesswrong.com/posts/9QxnfMYccz9QRgZ5z/the-costly-coordination-mechanism-of-common-knowledge">common knowledge</a> <em>making a fact</em>. I find this a very useful term, which also applies to more mundane situations&#8212;e.g. taking the lead in a social context can make a fact that you&#8217;re now in charge. Indeed, one of the most interesting parts of Singh&#8217;s book was a description of how coups can happen via managing the social dynamics of meetings of powerful people (e.g. all the generals in an army). People rarely want to be the first to defend a given side, especially in high-stakes situations. So if you start the meeting with a few people confidently expressing support for a coup, and then ask if anyone objects, the resulting silence can make the fact that everyone supports the coup. This strategy can succeed even if almost all the people in the meeting oppose the coup&#8212;if none of them dares to say so in the meeting, it&#8217;s very hard to rally them afterwards against what&#8217;s now become the common-knowledge default option.</p><p>One of Singh&#8217;s case studies hammers home how powerful meetings are for common knowledge creation. In 1978, essentially all the senior leaders in the Ghanaian military wanted to remove President Acheampong. However, they couldn&#8217;t create common knowledge of this, because it would be too suspicious for them to all meet without the President. Eventually Acheampong accidentally sealed his fate by sending a letter to a general criticizing the military command structure, which the general used as a pretext to call a series of meetings culminating in a bloodless coup in the President&#8217;s office.</p><p>Meetings are powerful not just because they get the key people in the same place, but also because they can be run quickly. The longer a coup takes, the less of a <em>fait accompli </em>it appears, and the more room there is for doubt to creep in. Singh ends the book with a fascinating case study of the 1991 coup attempt by Soviet generals against Gorbachev and Yeltsin. Even accounting for cherry-picking, it&#8217;s impressive how well this coup lines up with the &#8220;coups as coordination games&#8221; model. The conspirators included almost all of the senior members of the current government, and timed their strike for when both Gorbachev and Yeltsin were on vacation&#8212;but made the mistake of allowing Yeltsin to flee to the Russian parliament. From there he made a series of speeches asserting his moral legitimacy, while his allies spread rumors that the coup was falling apart. 
Though the conspirators had Yeltsin surrounded with overwhelming military force, bickering and distrust amongst them delayed their assault on the parliament long enough for them to become demoralized, at which point the coup essentially fizzled out.</p><p>Another of Singh&#8217;s most striking case studies was of a low-level Ghanaian soldier, Jerry Rawlings, who carried out a successful coup with fewer than a dozen armed troops. He was able to succeed in large part because the government had shown weakness by airing warnings about the threat Rawlings posed, and pleas not to cooperate with him. This may seem absurd, but Singh does a great job characterizing what it&#8217;s like to be a soldier confronted by revolutionaries in the fog of war, hearing all sorts of rumors that something big is happening, but with no real idea how many people are supporting the coup. In that situation, by far the easiest option is to stand aside, lest you find yourself standing alone against the new government. And the more people stand aside, the more snowballing social proof the revolutionaries have. So our takeaway from the Soviet coup attempt shouldn&#8217;t be that making a fact is <em>inherently</em> difficult&#8212;just that rank and firepower are no substitute for information control.</p><p>I don&#8217;t think of Singh as totally disproving the two other theories of coups&#8212;they probably all describe complementary dynamics. For example, if the Soviet generals had captured Yeltsin in their initial strike, he wouldn&#8217;t have had the chance to win the subsequent coordination game. And though Singh gives a lot of good historical analysis, he&#8217;s light on advance predictions. But Singh&#8217;s model is still powerful enough that it should constrain our expectations in many ways. For example, I&#8217;d predict based on Singh&#8217;s theory that radio will still be important for coups in developing countries, even now that it&#8217;s no longer the main news source for most people. The internet can convey much more information much more quickly, but radio is still better for creating common knowledge, in part <em>because</em> of its limitations (like having a fixed small number of channels). If you think of other predictions which help distinguish these three theories of coups, do let me know.</p><h4>From explaining coups to explaining everything</h4><p>Singh limits himself to explaining the dynamics of coups. But once he points them out, it&#8217;s easy to start seeing them everywhere. What if <em>everything</em> is a coordination game?</p><p>That&#8217;s essentially the thesis of Timur Kuran&#8217;s book <em><a href="https://www.amazon.com/Private-Truths-Public-Lies-Falsification/dp/0674707583">Private Truths, Public Lies</a></em>. Kuran argues that a big factor affecting which beliefs people express on basically all political topics is their desire to conform to the opinions expressed by others around them&#8212;a dynamic known as <em>preference falsification</em>. Preference falsification can allow positions to maintain dominance even as they become very unpopular. But it also creates a reservoir of pent-up energy that, when unleashed, can lead public opinion to change very rapidly&#8212;a process known as a <em>preference cascade</em>.</p>
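<p>Here&#8217;s a minimal sketch of that dynamic (in Python, with invented numbers): each person publicly endorses the orthodoxy if they privately believe it, or if enough others are already endorsing it, unless they&#8217;re one of the rare people willing to defy it openly:</p><pre><code class="language-python">import numpy as np

rng = np.random.default_rng(1)
n = 1000
# Person i conforms if the fraction publicly endorsing exceeds their threshold.
thresholds = rng.uniform(0.2, 0.9, n)

def settled_public_support(private_rate, open_dissent=0.0, rounds=100):
    believers = rng.random(n) < private_rate   # privately agree with the orthodoxy
    defiant = rng.random(n) < open_dissent     # never endorse, whatever the pressure
    endorsing = np.ones(n, dtype=bool)         # the orthodoxy starts out unanimous
    for _ in range(rounds):
        endorsing = (believers | (endorsing.mean() > thresholds)) & ~defiant
    return endorsing.mean()

print(settled_public_support(0.1))                     # 1.0: unanimity despite 10% private support
print(settled_public_support(0.1, open_dissent=0.15))  # ~0.1: a defiant minority triggers a cascade</code></pre><p>The first equilibrium is Kuran&#8217;s reservoir of pent-up energy; the second is what it looks like once that energy is unleashed.</p>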
<p>The most extreme preference cascades come during coups when common knowledge tips towards one side winning (as described above). But Kuran chronicles many other examples, most notably the history of race relations in America. In his telling, both the end of slavery and the end of segregation happened significantly after white American opinion had tipped against them&#8212;because people didn&#8217;t know that other people had also changed their minds. &#8220;According to one study [in the 70s], 18 percent of the whites favored segregation, but as many as 47 percent believed that most did so.&#8221; And so change, when it came, was very sudden: &#8220;In the span of a single decade, the 1960s, the United States traveled from government-supported discrimination against blacks to the prohibition of all color-based discrimination, and from there to government-promoted discrimination in favor of blacks.&#8221;</p><p>According to Kuran, this shift unfortunately wasn&#8217;t a reversion from preference falsification to honesty, but rather an overshoot into a new regime of preference falsification. Writing in 1995, he claims that &#8220;white Americans are overwhelmingly opposed to special privileges for blacks. But they show extreme caution in expressing themselves publicly, for fear of being labeled as racists.&#8221; This fear has entrenched affirmative action ever more firmly over the decades since then, until the very recent and very sudden rise of MAGA.</p><p>Kuran&#8217;s other main examples are communism and the Indian caste system. His case studies are interesting, but the most valuable part of the book for me was his exposition of a formal model of preference falsification and preference cascades: threshold models of social behavior. For a thorough explanation of them, see <a href="https://ericneyman.wordpress.com/2021/06/05/social-behavior-curves-equilibria-and-radicalism/">this blog post by Eric Neyman</a> (who calls visual representations of threshold models <em>social behavior curves</em>). Here I&#8217;ll just give an abbreviated introduction by stealing some of Eric&#8217;s graphs.</p><p>The basic idea is that threshold models describe how people&#8217;s willingness to do something depends on how many other people are doing it. Most people have some threshold at which they&#8217;ll change their public position, which is determined by a combination of their own personal preferences and the amount of pressure they feel to conform to others. For example, the graph below is a hypothetical social behavior curve of what percentage of people would wear facemasks in public, as a function of how many people they see already wearing masks.
(The axis labels are a little confusing&#8212;you could also think of the x and y axes as &#8220;mask-wearers at current timestep&#8221; and &#8220;mask-wearers at next timestep&#8221; respectively.)</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!lj29!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5cd891-3846-4363-9e87-96168ed45c3b_727x516.png" width="727" height="516" alt="A hypothetical social behavior curve for mask-wearing, plotted against the y=x diagonal"></figure></div><p>On this graph, if 35% of people currently wear masks, then once this fact becomes known around 50% of people would want to wear masks. This means that 35% of people wearing masks is not an equilibrium&#8212;if the number of mask-wearers starts at 35%, it will increase over time. More generally, whenever the percentage of people wearing a mask corresponds to a point on the social behavior curve above the y=x diagonal, the number of mask-wearers will increase; when below y=x, it&#8217;ll decrease. So the equilibria are places where the curve intersects y=x. But only equilibria where the curve crosses y=x from above to below (moving left to right) are stable; those where it crosses from below to above are unstable (like a pen balanced on its tip), with any slight deviation sending them spiraling away towards the nearest stable equilibrium.</p><p>I recommend staring at the graph above until that last paragraph feels obvious.</p>
<p>I find the core insights of threshold models extremely valuable; I think of them as sociology&#8217;s analogue to supply and demand curves in economics. They give us simple models of <a href="https://en.wikipedia.org/wiki/Moral_panic">moral panics</a>, <a href="https://slatestarcodex.com/2019/02/04/respectability-cascades/">respectability cascades</a>, <a href="https://en.wikipedia.org/wiki/Echo_chamber_(media)">echo chambers</a>, <a href="https://www.astralcodexten.com/p/give-up-seventy-percent-of-the-way">the euphemism treadmill</a>, and a multitude of other sociological phenomena&#8212;including coups.</p><p>We can model coups as an extreme case where the only stable equilibria are the ones where everyone supports one side or everyone supports the other, because the pressure to be on the winning side is so strong. This implies that coups have an s-shaped social behavior curve, with a very unstable equilibrium in the middle&#8212;something like the diagram below. The steepness of the curve around the unstable equilibrium reflects the fact that, once people figure out which side of the tipping point they&#8217;re on, support for that side snowballs very quickly.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!SdxE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41cf26f8-7706-4741-baa2-927bace56beb_397x369.png" width="397" height="369" alt="An s-shaped social behavior curve with a single unstable equilibrium in the middle"></figure></div><p>This diagram illustrates that shifting the curve a few percent left or right has highly nonlinear effects. For most possible starting points, it won&#8217;t have any effect. But if we start off near an intersection, then even a small shift could totally change the final outcome. You can see an illustration of this possibility (again from Eric&#8217;s blog post) below&#8212;it models a persuasive argument which makes people willing to support something with 5 percentage points less social proof, thereby shifting the equilibrium a long way. The historical record tells us that courageous defiance of social consensus can work in practice, but now it works in <em>theory</em> too.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!T_5u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc4675a6-bcdd-4e27-bfe8-a85f7eff9d68_538x518.png" width="538" height="518" alt="A social behavior curve shifted left by 5 percentage points, moving the stable equilibrium a long way"></figure></div><p>Having said all that, I don&#8217;t want to oversell threshold models. They&#8217;re still very simple, which means that they miss some important factors:</p><ul><li><p>They only model a binary choice between supporting and opposing something, whereas most people are noncommittal on most issues by default (especially in high-stakes situations like coups). But adding in this third option makes the math much more complicated&#8212;e.g.
it introduces the possibility of cycles, meaning there might not be <em>any</em> equilibria.</p></li><li><p>Realistically, support and opposition aren&#8217;t limited to discrete values, but can range continuously from weak to strong. So perhaps we should think of social behavior curves in terms of average level of support rather than number of supporters.</p></li><li><p>Threshold models are memoryless: the next timestep depends only on the current timestep. This means that they can&#8217;t describe, for example, the momentum that builds up after behavior consistently shifts in one direction.</p></li><li><p>Threshold models treat all people symmetrically. By contrast, belief propagation models track how preferences cascade through a network of people, where each person is primarily responding to <em>local</em> social incentives. Such models are more realistic than simple threshold models.</p></li></ul><p>I&#8217;d be very interested to hear about extensions to threshold models which avoid these limitations.</p><h4>From explaining everything to influencing everything</h4><p>How should understanding the prevalence of preference falsification change our behavior? Most straightforwardly, it should predispose us to express our true beliefs more often, even in controversial cases&#8212;because there might be far more people who agree with us than it appears. And as described above, threshold models give an intuition for how even a small change in people&#8217;s willingness to express a view can trigger big shifts.</p><p>However, there&#8217;s also a way in which threshold models can easily be misleading. In the diagram above, we modeled persuasion as an act of shifting the curve. But the most important aspect of persuasion is often not your argument itself, but rather the social proof you provide by defending a conclusion. And so in many cases it&#8217;s more realistic to think of your argument, not as translating the entire curve, but as merely increasing the number of advocates for X by one.</p><p>There&#8217;s a more general point here. It&#8217;s tempting to think that you can estimate the social behavior curve, then decide how you&#8217;ll act based on that. But everyone else&#8217;s choices are based on their predictions of you, and you&#8217;re constantly leaking information about your decision-making process. So you can&#8217;t generate credences about how others will decide, then use them to make your decision, because your eventual decision is heavily correlated with other people&#8217;s decisions. You&#8217;re not just intervening on the curve, you <em>are</em> the curve.</p><p>More precisely, social behavior is a domain where the correlations between people&#8217;s decisions are strong enough to make <a href="https://en.wikipedia.org/wiki/Causal_decision_theory">causal decision theory</a> misleading. Instead it&#8217;s necessary to use either <em><a href="https://en.wikipedia.org/wiki/Evidential_decision_theory">evidential decision theory</a></em> or <em><a href="https://intelligence.org/2017/03/18/new-paper-cheating-death-in-damascus/">functional decision theory</a></em>. Both of these track the non-causal dependencies between your decision and other people&#8217;s decisions. In particular, both of them involve a step where you reason &#8220;if I do something, then it&#8217;s more likely that others will do the same thing&#8221;&#8212;even when they have no way of finding out about your final decision before making theirs.
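</p><p>A toy comparison (in Python, with invented payoffs and probabilities) of how causal and evidential reasoning come apart when decisions are correlated:</p><pre><code class="language-python"># Deciding whether to join an uprising whose success depends on turnout.
# Your decision doesn't cause anyone else's, but it correlates with them:
# whatever tipped you into joining has probably tipped others too.
P_WIN_IF_I_JOIN = 0.8   # P(enough others join | I join)
P_WIN_IF_I_STAY = 0.2   # P(enough others join | I stay home)

def ev(p_win, joined):
    if joined:
        return p_win * 10 + (1 - p_win) * (-20)  # share the victory, or get arrested
    return p_win * 5                             # free-ride on success, or nothing

# CDT: others' choices are fixed unknowns, so use one unconditional estimate.
p_win = 0.5
print("CDT:", ev(p_win, True), "vs", ev(p_win, False))  # -5.0 vs 2.5 -> stay home
# EDT: condition on your own action, since it's evidence about everyone else's.
print("EDT:", ev(P_WIN_IF_I_JOIN, True), "vs", ev(P_WIN_IF_I_STAY, False))  # 4.0 vs 1.0 -> join</code></pre><p>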
<p>I&#8217;ve put this in cold, rational language. But what we&#8217;re talking about is nothing less than a <em>leap of faith</em>. Imagine sitting at home, trying to decide whether to join a coup to depose a hated ruler. Imagine that if enough of you show up on the streets at once, loudly and confidently, then you&#8217;ll succeed&#8212;but that if there are only a few of you, or you seem scared or uncertain, then the regime won&#8217;t be cowed, and will arrest or kill all of you. Imagine your fate depending on something you can&#8217;t control at all except via the fact that if you have faith, others are more likely to have faith too. It&#8217;s a terrifying, gut-wrenching feeling.</p><p>Perhaps the most eloquent depiction of this feeling comes from Soren Kierkegaard in his book <em><a href="https://en.wikipedia.org/wiki/Fear_and_Trembling">Fear and Trembling</a></em>. Kierkegaard is moved beyond words by the story of Abraham, who is not only willing to sacrifice his only son on God&#8217;s command&#8212;but somehow, even as he&#8217;s doing it, still believes against all reason that everything will turn out alright. Kierkegaard struggles to describe this level of pure faith as anything but absurd. Yet it&#8217;s this absurdity that is at the heart of social coordination&#8212;because you can never fully reason through what happens when other people predict your predictions of their predictions of&#8230; To cut through that, you need to simply <em>decide</em>, and hope that your decision will <em>somehow</em> change everyone else&#8217;s decision. You walk out your door to possible death because you believe, absurdly, that doing so will make other people simultaneously walk out of the doors of their houses all across the city.</p><p>A modern near-synonym for &#8220;leap of faith&#8221; is &#8220;hyperstition&#8221;: an idea that you bring about by believing in it. This is <a href="https://www.orphandriftarchive.com/articles/hyperstition/">Nick Land&#8217;s term</a>, which he seems to use primarily for larger-scale memeplexes&#8212;like capitalism, the ideology of progress, or AGI. Deciding whether or not to believe in these hyperstitions has some similarity to deciding whether or not to join a coup, but the former are much harder to reason about by virtue of their scale. We can think of hyperstitions as forming the background landscape of psychosocial reality: the <a href="https://en.wikipedia.org/wiki/The_Commanding_Heights">commanding heights of ideology</a>, the shifting sands of public opinion, and the <a href="https://en.wikipedia.org/wiki/On_What_Matters#Summary">moral mountain</a> off which we may&#8212;or may not&#8212;take a leap into the <a href="https://www.poetryfoundation.org/poems/43588/dover-beach">sea of faith</a>.</p><h4>Becoming a knight of faith</h4><p>Unfortunately, the mere realization that social reality is composed of hyperstitions doesn&#8217;t give you social superpowers, any more than knowing Newtonian mechanics makes you a world-class baseball player. So how can you decide when and how to actually swing for the fences? I&#8217;ll describe the tension between having too much and too little faith by contrasting three archetypes: the pragmatist, the knight of resignation, and the knight of faith.</p><p>The pragmatist treats faith as a decision like any other. 
They figure out the expected value of having faith&#8212;i.e. of adopting an &#8220;irrationally&#8221; strong belief&#8212;and go for it if and only if it seems valuable enough. Doing that analysis is difficult: it requires the ability to identify big opportunities, judge people&#8217;s expectations, and know how your beliefs affect common knowledge. In other words, it requires <em>skill at politics</em>, which I&#8217;ll talk about much more in <a href="https://www.mindthefuture.info/p/elite-coordination-via-the-consensus">a follow-up post</a>.</p><p>But while pragmatic political skill can get you a long way, it eventually hits a ceiling&#8212;because the world is watching not just what you do but also your <em>reasons</em> for doing it. If your choice is a pragmatic one, others will be able to tell&#8212;from your gait, your expression, your voice, your phrasing, and of course how your position evolves over time. They&#8217;ll know that you&#8217;re the sort of person who will change your mind if the cost/benefit calculus changes. And so they&#8217;ll know that they won&#8217;t truly be able to rely on you&#8212;that you don&#8217;t have <em>sincere</em> faith.</p><p>Imagine, by contrast, someone capable of fighting for a cause no matter how many others support them, no matter how hopeless it seems. Even if such a person never actually needs to fight alone, the common knowledge that they <em>would</em> makes them <a href="https://x.com/RichardMCNgo/status/1877806024603627740">a nail in the fabric of social reality</a>. They anchor the social behavior curve not merely by adding one more supporter to their side, but by being an immutable fixed point around which everyone knows (that everyone knows that everyone knows&#8230;) that they must navigate.</p><p>The archetype that Kierkegaard calls the <em>knight of resignation</em> achieves this by being <em>resigned</em> to the worst-case outcome. They gather the requisite courage by suppressing their hope, by convincing themselves that they have nothing to lose. They walk out their door having accepted death, with a kind of weaponized despair.</p><p>The grim determination of the knight of resignation is more reliable than pragmatism. But if you won&#8217;t let yourself think about the possibility of success, it&#8217;s very difficult to reason well about how it can be achieved, or to inspire others to pursue it. So what makes Kierkegaard fear and tremble is not the knight of resignation, but the <em>knight of faith</em>&#8212;the person who looks at the worst-case scenario directly, and (like the knight of resignation) sees no causal mechanism by which his faith will save him, but (like Abraham) <em>believes that he will be saved anyway</em>. That&#8217;s the kind of person who could found a movement, or a country, or a religion. It&#8217;s Washington stepping down from the presidency after two terms, and Churchill holding out against Nazi Germany, and Gandhi committing to non-violence, and Navalny returning to Russia&#8212;each one making themselves a beacon that others can&#8217;t help but feel inspired by.</p><p>What&#8217;s the difference between being a knight of faith, and simply falling into wishful thinking or delusion? How can we avoid having faith in the wrong things, when the whole point of faith is that we haven&#8217;t pragmatically reasoned our way into it? Kierkegaard has no good answer for this&#8212;he seems to be falling back on the idea that if there&#8217;s anything worth having faith in, it&#8217;s God. 
But from the modern atheist perspective, we have no such surety, and <a href="https://www.thinkingcomplete.com/2021/01/meditations-on-faith.html">even Abraham seems like he&#8217;s making a mistake</a>. So on what basis should we decide when to have faith?</p><p>I don&#8217;t think there&#8217;s any simple recipe for making such a decision. But it&#8217;s closely related to the difference between positive motivations (like love or excitement) and negative motivations (like fear or despair). Ultimately I think of faith as a coordination mechanism grounded in values that are shared across many people, like moral principles or group identities. When you act out of positive motivation towards those values, others will be able to recognize the parts of you that also arise in them, which then become a Schelling point for coordination. That&#8217;s much harder when you act out of pragmatic interests that few others share&#8212;especially personal fear. (If you act out of fear for your group&#8217;s interests, then others may still recognize themselves in you&#8212;but you&#8217;ll also create a <a href="https://malcolmocean.com/2022/02/towardsness-awayness-motivation-arent-symmetrical/">neurotic</a> and <a href="https://x.com/RichardMCNgo/status/1810495607590543701">self-destructive movement</a>.)</p><p>I talk at length about how to replace negative motivation with positive motivation <a href="https://www.lesswrong.com/s/qXZLFGqpD7aeEgXGL">in this series of posts</a>. Of course, it&#8217;s much easier said than done. Negative motivations are titanic psychological forces which steer most decisions most people make. But replacing them is worth the effort, because it unlocks a deep integrity&#8212;the ability to cooperate with different parts of yourself all the way down, without relying on deception or coercion. And that in turn allows you to cooperate with copies of those parts that live in <em>other </em>people&#8212;to act as part of a distributed agent held together by a shared sense of justice or fairness or goodness. That&#8217;s how we can tame the drive towards preference falsification, and steer our desire for collective identities towards those which promote the flourishing of the individuals within them.</p><p><em>Postscript: six months after publishing this post, I feel hesitant about how much I focused on the example of popular revolt (and have edited the last sentence accordingly). Marching in the streets against an oppressive government can be admirable, but it&#8217;s also very mob-like. In particular, it doesn&#8217;t ensure that the next regime will actually be better (as many Iranian revolutionaries learned at great cost). I&#8217;m now more focused specifically on faith that&#8217;s oriented towards developing a just, well-governed coalition. I&#8217;m not yet quite sure what I mean by that, but I plan to explore that idea further in forthcoming posts.</em></p>
]]></content:encoded></item><item><title><![CDATA[Why I'm not a Bayesian]]></title><description><![CDATA[Philosophical objections to Bayesianism as an epistemology.]]></description><link>https://www.mindthefuture.info/p/why-im-not-a-bayesian</link><guid isPermaLink="false">https://www.mindthefuture.info/p/why-im-not-a-bayesian</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Sun, 06 Oct 2024 15:30:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!koXC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c574997-7d0b-4bc0-914c-0dc5384e4218_496x418.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then lay out my two main objections (inspired by ideas in philosophy of science). A follow-up post will speculate about how to formalize an alternative.</p><h3><strong>Degrees of belief</strong></h3><p>The core idea of <a href="https://plato.stanford.edu/entries/epistemology-bayesian/">Bayesian epistemology</a>: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. (Note that this is different from Bayesianism as a set of statistical techniques, or Bayesianism as an approach to machine learning, which I don&#8217;t discuss here.)</p><p>If that seems like a sufficient characterization to you, you can go ahead and skip to the next section, where I explain my objections to it. But for those who want a more precise description of Bayesianism, and some existing objections to it, I&#8217;ll more specifically characterize it in terms of five claims. Bayesianism says that we should ideally reason in terms of:</p><ol><li><p>Propositions which are either true or false (<strong>classical logic</strong>)</p></li><li><p>Each of which is assigned a credence (<strong>probabilism</strong>)</p></li><li><p>Representing subjective degrees of belief in their truth (<strong>subjectivism</strong>)</p></li><li><p>Which at each point in time obey the&nbsp;<a href="https://en.wikipedia.org/wiki/Probability_axioms">axioms of probability</a> (<strong>static rationality</strong>)</p></li><li><p>And are updated over time by applying&nbsp;<a href="https://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes&#8217; rule</a> to new evidence (<strong>rigid empiricism</strong>)</p></li></ol><p>I won&#8217;t go into the case <em>for</em> Bayesianism here except to say that it does elegantly formalize many common-sense intuitions. Bayes&#8217; rule follows directly from <a href="https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/">a straightforward Venn diagram</a>. The axioms of probability are powerful and mathematically satisfying. Subjective credences seem like the obvious way to represent our uncertainty about the world.</p>
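<p>(To illustrate that elegance with a worked example: the entire update is one line of arithmetic. The disease-test numbers below are invented for illustration, but the structure is the standard one.)</p><pre><code>def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    p_evidence = (p_evidence_if_true * prior
                  + p_evidence_if_false * (1 - prior))
    return p_evidence_if_true * prior / p_evidence

# A condition with 1% prevalence; a test with a 90% true-positive rate
# and a 5% false-positive rate. After one positive result:
print(posterior(0.01, 0.90, 0.05))  # -> ~0.154: probably still healthy
</code></pre>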
<p>Nevertheless, there are a wide range of alternatives to Bayesianism, each branching off from the claims listed above at different points:</p><ol><li><p><em><a href="https://plato.stanford.edu/entries/knowledge-analysis/">Traditional epistemology</a></em> only accepts #1, and rejects #2. Traditional epistemologists often defend a binary conception of knowledge&#8212;e.g. one defined in terms of justified true belief (or a similar criterion, like reliable belief).</p></li><li><p><em><a href="https://en.wikipedia.org/wiki/Frequentist_probability">Frequentism</a> </em>accepts #1 and #2, but rejects #3: it doesn&#8217;t think that credences should be subjective. Instead, frequentism holds that credences should correspond to the&nbsp;relative frequency&nbsp;of an event in the long term, which is an objective fact about the world. For example, you should assign 50% credence that a flipped coin will come up heads, because if you continued flipping the coin the proportion of heads would approach 50%.</p></li><li><p><em><a href="https://www.lesswrong.com/posts/y5GftLezdozEHdXkL/an-intuitive-guide-to-garrabrant-induction">Garrabrant induction</a></em> accepts #1 to #3, but rejects #4. In order for credences to obey the axioms of probability, all the logical implications of a statement must be assigned the same credence. But this &#8220;logical omniscience&#8221; is impossible for computationally-bounded agents like ourselves. So in the Garrabrant induction framework, credences instead converge to obeying the axioms of probability&nbsp;in the limit, without guarantees that they&#8217;re coherent after only limited thinking time.</p></li><li><p><em><a href="https://www.lesswrong.com/posts/xJyY5QkQvNJpZLJRo/radical-probabilism-1">Radical probabilism</a></em> accepts #1 to #4, but rejects #5. Again, this can be motivated by qualms about logical omniscience: if thinking for longer can identify new implications of our existing beliefs, then our credences sometimes need to update via a different mechanism than Bayes&#8217; rule. So radical probabilism instead allows an agent to update to any set of statically rational credences at any time, even if they&#8217;re totally different from its previous credences. The one constraint is that each credence needs to converge over time to a fixed value&#8212;i.e. it can&#8217;t continue oscillating indefinitely (otherwise the agent would be vulnerable to a&nbsp;<a href="https://en.wikipedia.org/wiki/Dutch_book_theorems">Dutch Book</a>).</p></li></ol><p>It&#8217;s not crucial whether we classify Garrabrant induction and radical probabilism as variants of Bayesianism or alternatives to it, because my main objection to Bayesianism doesn&#8217;t fall into any of the above categories. Instead, I think we need to go back to basics and reject #1. Specifically, I have two objections to the idea that idealized reasoning should be understood in terms of propositions that are true or false:</p><ol><li><p>We should assign truth-values that are intermediate between true and false (<strong>fuzzy truth-values</strong>)</p></li><li><p>We should reason in terms of&nbsp;<em>models</em> rather than propositions (the&nbsp;<strong>semantic view</strong>)</p></li></ol><p>I&#8217;ll defend each claim in turn.</p><h3><strong>Degrees of truth</strong></h3><p>Formal languages (like code) are only able to express ideas that can be pinned down precisely. Natural languages, by contrast, can refer to vague concepts which don&#8217;t have clear, fixed boundaries. 
For example, the truth-values of propositions which contain&nbsp;<a href="https://learnenglish.britishcouncil.org/grammar/b1-b2-grammar/adjectives-gradable-non-gradable">gradable adjectives</a> like &#8220;large&#8221; or &#8220;quiet&#8221; or &#8220;happy&#8221; depend on how we interpret those adjectives. Intuitively speaking, a description of something as &#8220;large&#8221; can be more or less true depending on how large it actually is. The most common way to formulate this spectrum is as &#8220;fuzzy&#8221; truth-values which range from 0 to 1. A value close to 1 would be assigned to claims that are clearly true, and a value close to 0 would be assigned to claims that are clearly false, with claims that are &#8220;kinda true&#8221; in the middle.</p><p>Another type of &#8220;kinda true&#8221; statement is the approximation. For example, if I claim that there&#8217;s a grocery store 500 meters away from my house, that&#8217;s probably true in an&nbsp;<em>approximate</em> sense, but false in a&nbsp;<em>precise</em> sense. But once we start distinguishing the different senses that a concept can have, it becomes clear that basically&nbsp;<em>any</em> concept can have widely divergent category boundaries depending on the context.&nbsp;<a href="https://metarationality.com/refrigerator">A striking example from Chapman</a>:&nbsp;</p><blockquote><p>A: Is there any water in the refrigerator?<br>B: Yes.<br>A: Where? I don&#8217;t see it.<br>B: In the cells of the eggplant.</p></blockquote><p>The claim that there&#8217;s water in the refrigerator is technically true, but pragmatically false. And the concept of &#8220;water&#8221; is far better-defined than almost all abstract concepts (like the ones I&#8217;m using in this post). So we should treat natural-language propositions as context-dependent by default. But that&#8217;s still consistent with some statements being&nbsp;<em>more</em> context-dependent than others (e.g. the claim that there&#8217;s air in my refrigerator would be true under almost any interpretation). So another way we can think about fuzzy truth-values is as a range from &#8220;this statement is false in almost any sense&#8221; through &#8220;this statement is true in some senses and false in some senses&#8221; to &#8220;this statement is true in almost any sense&#8221;.</p><p>Note, however, that there&#8217;s an asymmetry between &#8220;this statement is true in almost any sense&#8221; and &#8220;this statement is false in almost any sense&#8221;, because the latter can apply to two different types of claims. Firstly, claims that are meaningful but false (&#8220;there&#8217;s a tiger in my house&#8221;). Secondly, claims that are&nbsp;<em>nonsense</em>&#8212;there are just no meaningful interpretations of them at all (&#8220;colorless green ideas sleep furiously&#8221;). We can often distinguish these two types of claims by negating them: &#8220;there isn&#8217;t a tiger in my house&#8221; is true, whereas &#8220;colorless green ideas don&#8217;t sleep furiously&#8221; is still nonsense. Of course, nonsense is also a matter of degree&#8212;e.g. metaphors are by default less meaningful than concrete claims, but still not entirely nonsense.</p><p>So I&#8217;ve motivated fuzzy truth-values from four different angles: vagueness, approximation, context-dependence, and sense vs nonsense. The key idea behind each of them is that concepts have fluid and amorphous category boundaries (a property called&nbsp;<em><a href="https://metarationality.com/nebulosity">nebulosity</a></em>). However, putting all of these different aspects of nebulosity on the same zero-to-one scale might be an oversimplification. More generally,&nbsp;<a href="https://en.wikipedia.org/wiki/Fuzzy_logic">fuzzy logic</a> has few of the appealing properties of classical logic, and (to my knowledge) isn&#8217;t very directly useful. So I&#8217;m not claiming that we should adopt fuzzy logic wholesale, or that we know what it means for a given proposition to be X% true instead of Y% true (a question which I&#8217;ll come back to in a follow-up post). For now, I&#8217;m just claiming that there&#8217;s an important sense in which thinking in terms of fuzzy truth-values is&nbsp;<em>less wrong</em> (another non-binary truth-value) than only thinking in terms of binary truth-values.</p>
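<p>(For concreteness, here&#8217;s what the standard fuzzy-logic formalism looks like in code&#8212;a minimal sketch using Zadeh&#8217;s min/max connectives, with cutoffs I&#8217;ve invented for illustration. The hard-coded cutoffs are exactly where the context-dependence discussed above gets swept under the rug:)</p><pre><code>def truth_of_large(size_m2, clearly_small=20.0, clearly_large=200.0):
    """A fuzzy truth-value in [0, 1] for 'this apartment is large'.
    The cutoffs are illustrative, and in reality depend heavily on
    context (large for Manhattan? large for a family of six?)."""
    if size_m2 <= clearly_small:
        return 0.0
    if size_m2 >= clearly_large:
        return 1.0
    return (size_m2 - clearly_small) / (clearly_large - clearly_small)

# Zadeh's standard connectives: AND = min, OR = max, NOT = 1 - x.
def fuzzy_and(a, b):
    return min(a, b)

print(truth_of_large(110))                  # -> 0.5: 'kinda true'
print(fuzzy_and(truth_of_large(110), 0.9))  # -> 0.5: 'large and quiet'
</code></pre>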
<h3><strong>Model-based reasoning</strong></h3><p>The intuitions in favor of fuzzy truth-values become clearer when we apply them, not just to individual propositions, but to&nbsp;<em>models</em> of the world. By a model I mean a (mathematical) structure that attempts to describe some aspect of reality. For example, a model of the weather might have variables representing temperature, pressure, and humidity at different locations, and a procedure for updating them over time. A model of a chemical reaction might have variables representing the starting concentrations of different reactants, and a method for determining the equilibrium concentrations. Or, more simply, a model of the Earth might just be a sphere.</p><p>In order to pin down the difference between reasoning about propositions and reasoning about models, philosophers of science have drawn on concepts from mathematical logic. They distinguish between the&nbsp;<em>syntactic</em> content of a theory (the axioms of the theory) and its&nbsp;<em>semantic</em> content (the models for which those axioms hold).&nbsp;<a href="https://plato.stanford.edu/entries/structure-scientific-theories/">As an example</a>, consider the three axioms of&nbsp;<a href="https://en.wikipedia.org/wiki/Projective_plane">projective planes</a>:</p><ol><li><p>For any two points, exactly one line lies on both.</p></li><li><p>For any two lines, exactly one point lies on both.</p></li><li><p>There exists a set of four points such that no line has more than two of them.</p></li></ol><p>There are infinitely many models for which these axioms hold; here&#8217;s one of the simplest:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!koXC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c574997-7d0b-4bc0-914c-0dc5384e4218_496x418.png" width="496" height="418" alt="Geometric figure including triangle ACE with interior circle BDF and center point G. Point B is on line segment AC, D is on CE, and F is on AE. G is the center of the circle. Point G is on line segments AD, BE, and CF."></figure></div>
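<p>(This seven-point model is small enough to check mechanically. The sketch below encodes the figure&#8217;s seven &#8220;lines&#8221;&#8212;the triangle&#8217;s three sides, the three segments through G, and the circle BDF&#8212;and verifies all three axioms by brute force:)</p><pre><code>from itertools import combinations

points = set("ABCDEFG")
lines = [{"A", "B", "C"}, {"C", "D", "E"}, {"A", "F", "E"},  # sides
         {"A", "G", "D"}, {"B", "G", "E"}, {"C", "G", "F"},  # through G
         {"B", "D", "F"}]                                    # the circle

# Axiom 1: any two points lie on exactly one common line.
assert all(sum(1 for line in lines if {p, q} <= line) == 1
           for p, q in combinations(points, 2))

# Axiom 2: any two lines share exactly one point.
assert all(len(l1 & l2) == 1 for l1, l2 in combinations(lines, 2))

# Axiom 3: some four points have no three of them on a common line.
assert any(all(len(line & set(quad)) <= 2 for line in lines)
           for quad in combinations(points, 4))

print("All three axioms hold for this seven-point model.")
</code></pre>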
<p>If propositions and models are two sides of the same coin, does it matter which one we primarily reason in terms of? I think so, for two reasons. Firstly, most models are very difficult to put into propositional form. We each have implicit mental models of our friends&#8217; personalities, of how liquids flow, of what a given object feels like, etc, which are far richer than we can express propositionally. The same is true even for many formal models&#8212;specifically those whose internal structure doesn&#8217;t directly correspond to the structure of the world. For example, a neural network might encode a great deal of real-world knowledge, but even full access to the weights doesn&#8217;t allow us to extract that knowledge directly&#8212;the fact that a given weight is 0.3 doesn&#8217;t allow us to claim that any real-world entity has the value 0.3.</p><p>What about scientific models where each element of the model&nbsp;<em>is</em> intended to correspond to an aspect of reality? For example, what&#8217;s the difference between modeling the Earth as a sphere, and just believing the proposition &#8220;the Earth is a sphere&#8221;? 
My answer: thinking in terms of propositions (known in philosophy of science as the&nbsp;<em><a href="https://www.princeton.edu/~hhalvors/teaching/phi520_f2012/Suppe_2000.pdf">syntactic view</a></em>) biases us towards assigning truth values in a&nbsp;<em>reductionist</em> way. This works when you&#8217;re using binary truth-values, because they relate to each other according to classical logic. But when you&#8217;re using fuzzy truth-values, the relationships between the truth-values of different propositions become much more complicated. And so thinking in terms of models (known as the&nbsp;<em><a href="https://www.jstor.org/stable/187834">semantic view</a></em>) is better because models can be assigned truth-values in a&nbsp;<em>holistic&nbsp;</em>way.</p><p>As an example: &#8220;the Earth is a sphere&#8221; is mostly true, and &#8220;every point on the surface of a sphere is equally far away from its center&#8221; is precisely true. But &#8220;every point on the surface of the Earth is equally far away from the Earth&#8217;s center&#8221; seems ridiculous&#8212;e.g. it implies that mountains don&#8217;t exist. The problem here is that rephrasing a proposition in logically equivalent terms can dramatically affect its implicit context, and therefore the degree of truth we assign to it in isolation.</p>
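<p>(To put rough numbers on &#8220;mostly true&#8221;: the Earth&#8217;s center-to-surface distance varies by only a fraction of a percent, which is why the spherical model holds up so well even though the equidistance proposition, read precisely, is false:)</p><pre><code># Standard reference figures for the Earth, in kilometers.
equatorial_radius = 6378.1
polar_radius      = 6356.8   # polar flattening: about 21 km less
mean_radius       = 6371.0
everest_height    = 8.8      # tallest mountain above sea level

# A crude upper bound on how far the real surface deviates from
# a perfect sphere of the mean radius:
worst_case = max(equatorial_radius - mean_radius,
                 mean_radius - polar_radius) + everest_height
print(worst_case / mean_radius)  # -> ~0.0036, i.e. under half a percent
</code></pre>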
<p>The semantic view solves this by separating claims about the structure of the model itself from claims about how the model relates to the world. The former are typically much less nebulous&#8212;claims like &#8220;in the spherical model of the Earth, every point on the Earth&#8217;s surface is equally far away from the center&#8221; are straightforwardly true. But we can then bring in nebulosity when talking about the model as a whole&#8212;e.g. &#8220;my spherical model of the Earth is closer to the truth than your flat model of the Earth&#8221;, or &#8220;my spherical model of the Earth is useful for doing astronomical calculations and terrible for figuring out where to go skiing&#8221;. (Note that we can make similar claims about the mental models, neural networks, etc, discussed above.)</p><p>We might then wonder: should we be talking about the truth of entire models at all? Or can we just talk about their usefulness in different contexts, without the concept of truth? This is&nbsp;<em><a href="https://plato.stanford.edu/entries/scientific-realism/">the</a></em><a href="https://plato.stanford.edu/entries/scientific-realism/"> major debate in philosophy of science</a>. I personally think that in order to explain why scientific theories can often predict a wide range of different phenomena, we need to make claims about how well they describe the structure of reality&#8212;i.e. how true they are. But we should still use degrees of truth when doing so, because even our most powerful scientific models aren&#8217;t fully true. We know that general relativity isn&#8217;t fully true, for example, because it conflicts with quantum mechanics. Even so, it would be absurd to call general relativity false, because it clearly describes a major part of the structure of physical reality. Meanwhile Newtonian mechanics is further away from the truth than general relativity, but still much closer to the truth than&nbsp;<a href="https://en.wikipedia.org/wiki/Aristotelian_physics">Aristotelian mechanics</a>, which in turn is much closer to the truth than&nbsp;<a href="https://en.wikipedia.org/wiki/Animism">animism</a>. The general point I&#8217;m trying to illustrate here was&nbsp;<a href="https://hermiene.net/essays-trans/relativity_of_wrong.html">expressed pithily by Asimov</a>: &#8220;Thinking that the Earth is flat is wrong. Thinking that the Earth is a sphere is wrong. But if you think that they&#8217;re equally wrong, you&#8217;re wronger than both of them put together.&#8221;</p><h3><strong>The correct role of Bayesianism</strong></h3><p>The position I&#8217;ve described above overlaps significantly with the&nbsp;<em><a href="https://plato.stanford.edu/entries/structural-realism/">structural realist</a></em> position in philosophy of science. However, structural realism is usually viewed as a stance on how to interpret scientific theories, rather than how to reason more generally. So the philosophical position which best captures the ideas I&#8217;ve laid out is probably&nbsp;<a href="https://plato.stanford.edu/entries/popper/#ProbKnowVeri">Karl Popper&#8217;s&nbsp;</a><em><a href="https://plato.stanford.edu/entries/popper/#ProbKnowVeri">critical rationalism</a></em>. Popper was actually the first to try to formally define&nbsp;<a href="https://en.wikipedia.org/wiki/Verisimilitude">a scientific theory&#8217;s degree of truth</a> (though he was working before the semantic view became widespread, and therefore formalized theories in terms of propositions rather than in terms of models). But his attempt&nbsp;<a href="https://plato.stanford.edu/Entries/truthlikeness/tichy-miller.html">failed on a technical level</a>; and no attempt since then has gained widespread acceptance. Meanwhile, the field of machine learning evaluates models by their loss, which can be formally defined&#8212;but the loss of a model is heavily dependent on the data distribution on which it&#8217;s evaluated. Perhaps the most promising approach to assigning fuzzy truth-values comes from <a href="https://www.lesswrong.com/posts/y5GftLezdozEHdXkL/an-intuitive-guide-to-garrabrant-induction">Garrabrant induction</a>, where the &#8220;money&#8221; earned by individual traders could be interpreted as a metric of fuzzy truth. However, these traders can strategically interact with each other, making them more like agents than typical models.</p><p>Where does this leave us? We&#8217;ve traded the crisp, mathematically elegant Bayesian formalism for fuzzy truth-values that, while intuitively compelling, we can&#8217;t define even in principle. But I&#8217;d rather be vaguely right than precisely wrong. Because it focuses on propositions which are each (almost entirely) true or false, Bayesianism is actively misleading in domains where reasoning well requires constructing and evaluating sophisticated models (i.e. most of them).</p><p>For example,&nbsp;<a href="https://www.lesswrong.com/posts/nj8JKFoLSMEmD3RGp/how-much-evidence-does-it-take">Bayesians measure evidence in &#8220;bits&#8221;</a>, where one bit of evidence rules out half of the space of possibilities. When asking a question like &#8220;<a href="https://www.lesswrong.com/posts/JD7fwtRQ27yc8NoqS/strong-evidence-is-common">is this stranger named Mark?</a>&#8221;, bits of evidence are a useful abstraction: I can get one bit of evidence simply by learning whether they&#8217;re male or female, and a couple more by learning that their name has only one syllable. Conversely,&nbsp;<a href="https://www.lesswrong.com/posts/MwQRucYo6BZZwjKE7/einstein-s-arrogance">talking in Bayesian terms about discovering scientific theories</a> is nonsense. If every PhD in fundamental physics had contributed even one bit of usable evidence about how to unify quantum physics and general relativity, we&#8217;d have solved quantum gravity many times over by now. But we haven&#8217;t, because almost all of the work of science is in&nbsp;<em>constructing</em> sophisticated models, which Bayesianism says almost nothing about. (Formalisms like <a href="https://en.wikipedia.org/wiki/Solomonoff%27s_theory_of_inductive_inference">Solomonoff induction</a> attempt to sidestep this omission by enumerating and simulating all computable models, but that&#8217;s so different from what any realistic agent can do that we should think of it less as idealized cognition and more as a different thing altogether, which just happens to converge to the same outcome in the infinite limit.)</p>
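<p>(Here&#8217;s the Mark example in odds form, with invented-but-plausible numbers. In simple domains like this, bits compose cleanly&#8212;each one just doubles the odds&#8212;which is exactly the property that evaporates once the real work is constructing the hypotheses in the first place:)</p><pre><code>def update(prob, bits):
    """Posterior probability after `bits` of evidence, where one bit
    is a likelihood ratio of 2 (so k bits multiply the odds by 2**k)."""
    odds = prob / (1 - prob) * 2 ** bits
    return odds / (1 + odds)

p = 1 / 200         # suppose 1 in 200 strangers is named Mark
p = update(p, 1.0)  # "they're male": roughly one bit
p = update(p, 2.0)  # "one-syllable name": a couple more bits
print(p)            # -> ~0.039, roughly an eightfold update
</code></pre>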
<p>Mistakes like these have many downstream consequences. Nobody should be very confident about complex domains that nobody has sophisticated models of (like superintelligence); but the idea that &#8220;<a href="https://www.lesswrong.com/posts/JD7fwtRQ27yc8NoqS/strong-evidence-is-common">strong evidence is common</a>&#8221; helps&nbsp;<a href="https://www.lesswrong.com/posts/JD7fwtRQ27yc8NoqS/strong-evidence-is-common?commentId=jNtPscXFL634DD47d">justify</a> confident claims about them. And without a principled distinction between credences that are derived from deep, rigorous models of the world, and credences that come from vague speculation (and are therefore subject to huge&nbsp;<a href="https://en.wikipedia.org/wiki/Knightian_uncertainty">Knightian uncertainty</a>),&nbsp;<a href="https://www.lesswrong.com/posts/tG9BLyBEiLeRJZvX6/communicating-effectively-under-knightian-norms">it&#8217;s hard for public discussions to actually make progress</a>.</p><p>Should I therefore be a critical rationalist? I do think Popper got a lot of things right. But I also get the sense that he (along with Deutsch, his most prominent advocate) throws the baby out with the bathwater. There&nbsp;<em>is</em> a great deal of insight encoded in Bayesianism which critical rationalists discard (e.g. by&nbsp;<a href="https://philosophy.tamucc.edu/index.php/texts/popper-problem-of-induction">rejecting induction</a>). A better approach is to view Bayesianism as describing a special case of epistemology, which applies in contexts simple enough that we&#8217;ve already constructed all relevant models or hypotheses, exactly one of which is exactly true (with all the rest of them being equally false), and we just need to decide between them. Interpreted in that limited way, Bayesianism is both useful (e.g. in providing a framework for bets and prediction markets) and inspiring: if we can formalize this special case so well, couldn&#8217;t we also formalize the general case? What would it look like to concretely define degrees of truth? I don&#8217;t have a solution, but I&#8217;ll outline some existing attempts, and play around with some ideas of my own, in <a href="https://www.mindthefuture.info/p/towards-a-scale-free-theory-of-intelligent">a follow-up post</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Meditations on Mot]]></title><description><![CDATA[A response to Scott Alexander]]></description><link>https://www.mindthefuture.info/p/meditations-on-mot</link><guid isPermaLink="false">https://www.mindthefuture.info/p/meditations-on-mot</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Mon, 04 Dec 2023 00:15:54 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/788ed49a-4478-4dd3-8b0d-c27b5475b84c_632x415.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><em>Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy!<br>Everything is holy! everybody&#8217;s holy! everywhere is holy! everyday is in eternity! Everyman&#8217;s an angel!<br>Holy the lone juggernaut! Holy the vast lamb of the middleclass! Holy the crazy shepherds of rebellion! Who digs Los Angeles IS Los Angeles!<br>Holy New York Holy San Francisco Holy Peoria &amp; Seattle Holy Paris Holy Tangiers Holy Moscow Holy Istanbul!<br>Holy time in eternity holy eternity in time holy the clocks in space holy the fourth dimension holy the fifth International holy the Angel in Moloch!<br>- Footnote to </em>Howl</p></blockquote><p><em><strong>Scene: Carl and Allen, two old friends, are having a conversation about theodicy.</strong></em></p><p><strong>Carl:</strong> &#8220;Let me tell you about the god who is responsible for almost all our suffering. This god is an ancient Canaanite god, one who has been seen throughout history as a source of death and destruction. Of course, he doesn&#8217;t exist in a literal sense, but we can conceptualize him as a manifestation of forces that persist even today, and which play a crucial role in making the world worse. His name is M-&#8221;</p><p><strong>Allen:</strong> &#8220;-oloch, right? Scott Alexander&#8217;s god of coordination failures. Yeah, I&#8217;ve read <em><a href="https://slatestarcodex.com/2014/07/30/meditations-on-moloch/">Meditations on Moloch</a></em>. It&#8217;s an amazing post; it resonated with me very deeply.&#8221;</p><p><strong>Carl: </strong>&#8220;I was actually going to say <em><a href="https://www.newworldencyclopedia.org/entry/Mot_(Semitic_god)">Mot</a></em>, the Canaanite god of death, bringer of famine and drought.&#8221;</p><p><strong>Allen:</strong> &#8220;Huh. Okay, you got me. Tell me about Mot, then; what does he represent?&#8221;</p><p><strong>Carl:</strong> &#8220;Mot is the god of sterility and lifelessness. To me, he represents the <em>lack of technology</em> in our lives. With technology, we can tame famine, avert drought, and cure disease. We can perform feats that our ancestors would have seen as miracles: flying through the air, and even into space. But we&#8217;re still so, so far from achieving the true potential of technology&#8212;and I think of Mot as the personification of what&#8217;s blocking us.</p><p>&#8220;You can see Mot everywhere, when you know what to look for. Whenever a patient lies suffering from a disease that we haven&#8217;t cured yet, that&#8217;s Mot&#8217;s hand at work. 
Whenever a child grows up in poverty, that&#8217;s because of Mot too. We could have flying cars, and space elevators, and so much more, if it weren&#8217;t for Mot.</p><p>&#8220;Look out your window and you see buildings, trees, people. But if you don&#8217;t see skyscrapers literally miles high, or trees that have been bioengineered to light the streets, or people who are eternally youthful and disease-free, then you&#8217;re not just seeing Earth&#8212;you&#8217;re also seeing Mot. Hell, the fact that we&#8217;re still on this planet, in physical bodies, is a testament to Mot&#8217;s influence. We could be settling the stars, and living in virtual utopias, and even merging our minds, if it weren&#8217;t for Mot.&#8221;</p><p><strong>Allen:</strong> &#8220;Huh. Well, I feel you there; I want all those things too. And you&#8217;re right that god-like technology could solve almost all the issues we face today. But something does feel pretty weird about describing all of this as a single problem, let alone blaming a god of lacking-technology.&#8221;</p><p><strong>Carl:</strong> &#8220;Say more?&#8221;</p><p><strong>Allen:</strong> &#8220;Well, there&#8217;s not any unified force holding back the progress of technology, right? If anything, it&#8217;s the opposite. Absence of advanced technology is the default state, which we need to work hard to escape&#8212;and that&#8217;s difficult not because of any opposition, but just because of entropy.&#8221;</p><p><strong>Carl: </strong>&#8220;What about cases where Mot is being channeled by enemies of progress? For example, when <a href="https://marginalrevolution.com/marginalrevolution/2015/08/is-the-fda-too-conservative-or-too-aggressive.html">bureaucratic regulatory agencies</a> do their best to <a href="https://slatestarcodex.com/2017/08/29/my-irb-nightmare/">stifle scientific research</a>?&#8221;</p><p><strong>Allen:</strong> &#8220;But in those cases you don&#8217;t need to appeal to Mot&#8212;you can just say &#8216;our enemy is overregulation&#8217;. Or if you defined Mot as the god of overregulation, I&#8217;d be totally on board. But you&#8217;re making a much bigger claim than that. The reason we haven&#8217;t uploaded ourselves yet isn&#8217;t that there&#8217;s a force that&#8217;s blocking us, it&#8217;s almost entirely that scientific progress is really really hard!&#8221;</p><p><strong>Carl:</strong> &#8220;Yepp, I agree with all your arguments. And you&#8217;ve probably already guessed where I&#8217;m going with this, but let&#8217;s spell it out: why don&#8217;t these objections to blaming our problems on lack of technology, aka Mot, apply just as much to blaming them on lack of coordination, aka Moloch?&#8221;</p><p><strong>Allen:</strong> &#8220;Yeah, I&#8217;ve been trying to figure that out. First of all, a lot of the intuitive force behind the concept of Moloch comes from really blatant coordination failures, like the ones that Scott lays out in the original post. If you&#8217;re stuck in a situation that nobody wants, then something&#8217;s gone <em>terribly wrong</em>; and when something goes terribly wrong, then it&#8217;s natural to start blaming enemy action.&#8221;</p><p><strong>Carl:</strong> &#8220;There are really blatant examples of lack-of-technology too, though. Look at a wheel. It&#8217;s a literal circle; it&#8217;s hard to imagine any technology that&#8217;s simpler. Yet humans spent millennia gathering crops and carrying loads before inventing it. 
Or think about cases where we narrowly missed out on transformative breakthroughs. <a href="https://en.wikipedia.org/wiki/Aeolipile">The Romans built toy steam engines</a>&#8212;they just never managed to scale them up to produce an industrial revolution. Getting so close to bringing a post-scarcity world forward by two millennia, but just missing, surely counts as something going terribly, tragically wrong. Don&#8217;t these cases demonstrate that Mot&#8217;s presence can be just as blatant as Moloch&#8217;s?&#8221;</p><p><strong>Allen:</strong> &#8220;Well, a big part of both of those stories was the absence of demand. Wheels just weren&#8217;t very useful before there were high-quality roads; and <a href="https://acoup.blog/2022/08/26/collections-why-no-roman-industrial-revolution/">early steam engines just weren&#8217;t very useful in the absence of large coal mines</a>. Of course they both turned out to be very worthwhile in the long term, but that&#8217;s really hard to foresee.&#8221;</p><p><strong>Carl: </strong>&#8220;So you&#8217;re saying that we sometimes need to jump out of a local trap in order to make longer-term technological progress. Remind me, what was your position on understanding local obstacles to progress by anthropomorphizing them as Canaanite gods?&#8221;</p><p><strong>Allen:</strong> &#8220;Okay, fair point. But Moloch isn&#8217;t just an external obstacle&#8212;it&#8217;s also a state of mind. When you pretend that you&#8217;re going to cooperate when you&#8217;re not, or you place your own interests above those of the group, you&#8217;re channeling Moloch. And when enough people do that, societal trust breaks down, and the Molochian dynamics become a self-fulfilling prophecy.&#8221;</p><p><strong>Carl:</strong> &#8220;And when you ridicule people for trying something different, or lobby for legislative barriers to deploying new technology, you&#8217;re channeling Mot. And when enough people do that, society loses faith in positive-sum growth, and progress stagnates. It&#8217;s directly analogous. Come on, what&#8217;s your true objection here?&#8221;</p><p><strong>Allen:</strong> &#8220;I mean, I can&#8217;t fully articulate it. But the ideal of perfect coordination feels much more achievable to me than the ideal of perfect technology. We <em>could</em> just agree to act in a unified way&#8212;it&#8217;s simply a matter of wanting it enough. In other words, saying that lack of technology is responsible for our problems isn&#8217;t very actionable&#8212;you can&#8217;t just magic up technology out of nowhere. But saying that lack of coordination is responsible for our problems is a straightforward step towards convincing people to become more coordinated.&#8221;</p><p><strong>Carl:</strong> &#8220;Actually, the last few centuries could pretty reasonably be described as humanity continually magicking up technology out of nowhere. Of course, scientific and technological progress still takes a lot of work, and a lot of iteration. But when it works, it lets you jump directly to far better outcomes. By contrast, it&#8217;s incredibly difficult to improve things like government competence or social trust&#8212;or even to prevent them from declining. 
So overall, boosting technological progress is far<em> more</em> actionable than increasing coordination, and we should write off the phrase &#8216;we could just agree&#8217; as a particularly seductive appeal to <a href="https://www.lesswrong.com/posts/fysgqk4CjAwhBgNYT/fake-explanations">magic</a>.&#8221;</p><p><strong>Allen:</strong> &#8220;I do agree that scientific and technological progress has far outstripped progress in governance and coordination. So on an intellectual level, I think you&#8217;ve convinced me that Moloch is no more useful a concept than Mot. But I still don&#8217;t feel like I&#8217;ve <a href="https://www.lesswrong.com/posts/Mc6QcrsbH5NRXbCRX/dissolving-the-question">dissolved the question</a> of why Moloch <em>seems</em> more compelling than Mot. Do you have any explanation for that?&#8221;</p><p><strong>Carl:</strong> &#8220;I think the illusion comes from Scott using a simplistic notion of coordination, as exemplified by his claim that &#8216;the opposite of a trap is a garden&#8230; with a single gardener dictating where everything should go&#8217;. In other words, he implicitly assumes that &#8216;coordinate&#8217; is synonymous with &#8216;centralize power&#8217;. From that perspective, we can view coordination as a single spectrum, with &#8216;Moloch&#8217; at one end and &#8216;just put one guy in charge of everything&#8217; at the other. But the space of possibilities is much richer and more complicated than that, along at least three different dimensions.</p><p>&#8220;Firstly, coordination is complicated in the same way that science is complicated: it requires developing new concepts and frameworks that are totally alien from your current perspective, even if they&#8217;ll seem obvious in hindsight. For most people throughout history, ideas like liberalism, democracy, and free speech were deeply counterintuitive (or, in Scott&#8217;s terminology, <a href="https://slatestarcodex.com/2014/02/23/in-favor-of-niceness-community-and-civilization/">&#8216;terrifying unspeakable Elder Gods&#8217;</a>). In terms of spreading prosperity across the world, the limited liability company was just as important an invention as the steam engine. If you wouldn&#8217;t blame Mot for all the difficulties of labor and locomotion that were eventually solved by steam engines, you shouldn&#8217;t blame Moloch for all the difficulties of trust and incentive alignment that were eventually solved by LLCs.</p><p>&#8220;Secondly, coordination is complicated in the same way that engineering large-scale systems is complicated: there are always just a huge number of practical obstacles and <a href="http://johnsalvatier.org/blog/2017/reality-has-a-surprising-amount-of-detail">messy details</a> to deal with. It took the best part of a century to get from the first commercial steam engine to Watt&#8217;s design; and even today, some of the hardest software engineering problems simply involve getting well-understood algorithms to work at much larger scales (like serving search results, or training LLMs). Similarly, when we look at important real-life coordination problems, they&#8217;re very different from toy problems like prisoner&#8217;s dilemmas or tragedies of the commons. 
Even when there&#8217;s a simple &#8216;headline idea&#8217; for a better equilibrium, actually reaching that equilibrium requires a huge amount of legwork: engaging with different stakeholders, building trust, standardizing communication protocols, creating common knowledge, balancing competing interests, designing agreements, iterating to fix problems that come up, and so on.</p><p>&#8220;Thirdly, coordination is complicated in the same way that security is complicated: you don&#8217;t just need to build effective tools, you need to prevent them from being hijacked and misused. Remember that both fascist and communist despots gained power by appealing to the benefits of cooperation&#8212;&#8216;fascism&#8217; is even named after &#8216;fasces&#8217;, the bundles of sticks that are stronger together than apart. If we&#8217;d truly learned the lessons of history, then categorizing actions as &#8216;cooperating&#8217; versus &#8216;defecting&#8217; would feel as simplistic as categorizing people as &#8216;good&#8217; versus &#8216;evil&#8217;. And in fact many people do sense this intuitively, which is why there&#8217;s so commonly strong resistance to top-down solutions to coordination problems, and why the scientific and engineering problems of building coordination technologies are so tricky.&#8221;</p><p><strong>Allen:</strong> &#8220;I buy that coordination is often far more complicated than it seems. But blaming Moloch for coordination breakdowns still seems valuable insofar as it stops us from just blaming each other, which can disrupt any hope of improvement.&#8221;</p><p><strong>Carl:</strong> &#8220;Yeah, I agree. I think of this in terms of the spectrum from <a href="https://slatestarcodex.com/2018/01/24/conflict-vs-mistake/">conflict theory to mistake theory</a>. Saying that a few immoral defectors are responsible for coordination problems is pure conflict theory. The concept of Moloch reframes things so that, instead of &#8216;defectors&#8217; being our enemies, an abstract anthropomorphic entity is our enemy instead. And that&#8217;s progress! But it&#8217;s still partly conflict-theoretic, because it tells us that we just need to identify the enemy and kill it. That biases us towards trying to find &#8216;silver bullets&#8217; which would restore us to our rightful coordinated state. Instead, it&#8217;d be better to lean even further into mistake theory: discord is the default, and to prevent it we need to do the hard work of designing and implementing complicated alien coordination technologies.&#8221;</p><p><strong>Allen:</strong> &#8220;<a href="https://www.thinkingcomplete.com/2018/02/in-defence-of-conflict-theory.html">You shouldn&#8217;t underestimate the value of conflict theory</a>, though. It&#8217;s incredibly good at harnessing people&#8217;s tribal instincts towards actually doing something useful. We can&#8217;t be cold and rational all the time&#8212;we need emotionally salient motivations to get us fired up.&#8221;</p><p><strong>Carl:</strong> &#8220;Right. So I don&#8217;t think we should get rid of Moloch as a rallying cry. But I do think that we should get rid of Moloch as a <em>causal node</em> in our ontologies: as a reason why the world is one way, rather than another. 
And I think we should be much more careful about terminology like &#8216;coordination failure&#8217; or <a href="https://equilibriabook.com/">&#8216;inadequate equilibria&#8217;</a>, which both mistakenly suggest that there&#8217;s <a href="https://www.lesswrong.com/posts/sbJgv5De6d34eiHWA/judgments-often-smuggle-in-implicit-standards">a binary threshold</a> between enough coordination and not-enough coordination. That&#8217;s like saying that cars which can go faster than 80 miles per hour are &#8216;adequate technology&#8217;, but cars which can&#8217;t are a &#8216;technology failure&#8217;. Maybe that&#8217;s occasionally a useful distinction, but it misses the bigger picture: that they&#8217;re actually very similar on almost all axes, because it takes so much complex technology to build a car at all.</p><p>&#8220;For Scott, there&#8217;s no better temple to Moloch than Las Vegas. But even there, my argument applies. You could look at Vegas and see Moloch&#8217;s hand at work. Or you could see Vegas as a product of the miraculous coordination technology that is modern capitalism&#8212;perhaps an edge case of it, but still an example of its brilliance. Or you could see Vegas as a testament to the wisdom of the constitution: casinos are banned almost everywhere in the US, but for the sake of diversity and robustness it sure seems like there should be at least <em>one </em>major city which allows them.<em> </em>Or you could see Vegas as an example of incredible restraint: there are <a href="http://localroger.com/k5host/casodycs.html">innumerable possible ways</a> to extract money from addled tourists in the desert, and Vegas prevents <em>almost all </em>of them. Or you could see it as a testament to the cooperative instinct inside humans: every day thousands of employees go to work and put in far more effort than the bare minimum it would take to not get fired. Setting aside the concept of Moloch makes it easier to see the sheer scale of coordination all around us, which is the crucial first step towards designing even better coordination technologies.</p><p>&#8220;In Las Vegas, Scott saw Moloch. But in Scott&#8217;s description of Moloch, I see Mot. We can do better than thinking of coordination as war and deicide. We can think of it as science, as engineering, as security&#8212;and as the gradual construction, as we sail down the river, of the ship that will take us across the sea.&#8221;</p><p><em>Holy the sea holy the desert holy the railroad holy the locomotive holy the visions holy the hallucinations holy the miracles <a href="https://www.latimes.com/entertainment-arts/story/2023-09-13/las-vegas-sphere-can-be-bizarre-and-sublime-or-just-an-ad">holy the eyeball</a> holy the abyss!<br>Holy forgiveness! mercy! charity! faith! Holy! Ours! bodies! suffering! magnanimity!<br>Holy the supernatural extra brilliant intelligent kindness of the soul!</em></p>
]]></content:encoded></item><item><title><![CDATA[Techno-humanism is techno-optimism for the 21st century]]></title><description><![CDATA[The more powerful technology becomes, the better we must understand both it and ourselves]]></description><link>https://www.mindthefuture.info/p/techno-humanism-is-techno-optimism</link><guid isPermaLink="false">https://www.mindthefuture.info/p/techno-humanism-is-techno-optimism</guid><dc:creator><![CDATA[Richard Ngo]]></dc:creator><pubDate>Fri, 27 Oct 2023 00:00:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9ad40830-4b81-49a6-a7f9-650b82d24eaa_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Lately I&#8217;ve been reading about the history of economic thought, with the goal of understanding how today&#8217;s foundational ideas were originally developed. My biggest takeaway has been that economics is more of a moving target than I&#8217;d realized. Early economists were studying economies which lacked many of the defining features of modern economies: limited liability corporations, trade unions, depressions, labor markets, capital markets, and so on. The resulting theories were often right for their time; but that didn&#8217;t stop many from being wrong (sometimes disastrously so) for ours.</p><p>That&#8217;s also how I feel about the philosophy of techno-optimism, as recently defended in Marc Andreessen&#8217;s <em><a href="https://a16z.com/the-techno-optimist-manifesto/">Techno-Optimist Manifesto</a></em>. I&#8217;ll start this post with the many things it gets right; then explore why, over the last century, the power of technology itself has left straightforward techno-optimism outdated. I&#8217;ll finish by outlining an alternative to techno-optimism: techno-humanism, which focuses not just on building powerful engines of progress, but also on developing a deep scientific understanding of how they can advance the values we care about most.</p><h3>The triumphs of techno-optimism</h3><p>Here&#8217;s where I agree with Andreessen: throughout almost the entirety of human history, techno-optimism was fundamentally correct as a vision for how humans should strive to improve the world. Technology and markets have given us health and wealth unimaginable to people from previous centuries, despite consistent opposition to them. Even now, in the face of overwhelming historical evidence, we still dramatically underrate the potential of technology to solve problems ranging from <a href="https://twitter.com/AukeHoekstra/status/1064529619951513600">climate change</a> to neglected tropical diseases to extreme poverty and many more.
This skepticism holds back not only speculative technologies but also ones that are right in front of us: miraculous solutions to huge problems (like vaccines for <a href="https://en.wikipedia.org/wiki/Moderna_COVID-19_vaccine#Original_version">COVID</a> and <a href="https://marginalrevolution.com/marginalrevolution/2023/10/what-is-an-emergency-the-case-of-rapid-malaria-vaccination.html">malaria</a>, and <a href="https://denovo.substack.com/p/gene-drives-why-the-wait">gene drives</a> for mosquito eradication) are regularly stalled by political or bureaucratic obstructionism, costing millions of lives. The RTS,S malaria vaccine spent <em><a href="https://worksinprogress.co/issue/why-we-didnt-get-a-malaria-vaccine-sooner/">twenty-three years</a></em><a href="https://worksinprogress.co/issue/why-we-didnt-get-a-malaria-vaccine-sooner/"> bogged down in clinical trials</a>, with <em>six years</em> of that delay caused by WHO &#8220;precautions&#8221; even after other regulators had already approved it. Fighting against the ideologies and institutional practices which cause tragedies like these is an incredibly important cause.</p><p>We often similarly underrate the other core tenet of techno-optimism: individual liberty. This was a radical principle when Enlightenment thinkers and leaders enshrined it&#8212;and, unfortunately, it remains a radical principle today, despite having proven its worth over and over again. It turns out that free markets can lead to <a href="https://en.wikipedia.org/wiki/Four_Asian_Tigers">near-miraculous prosperity</a>; that free speech is the strongest enemy of tyranny; and that open-source projects (whether <a href="https://en.wikipedia.org/wiki/Linux">building</a> <a href="https://en.wikipedia.org/wiki/Open_standard">software</a> or <a href="https://www.wikipedia.org/">building</a> <a href="https://en.wikipedia.org/wiki/Open-source_intelligence">knowledge</a>) can be extraordinarily successful. Yet all of this is constantly threatened by the creeping march of centralization and overregulation. Opponents of liberty always have high rhetoric about the near-term benefits of intervention, but fail to grapple with the extent to which centralized power is inevitably captured or subverted, sometimes with horrific consequences. To defeat that creep requires a near-pathological focus on freedom&#8212;which, little by little, has added up to a far better world. Even an ideally benevolent government simply couldn&#8217;t compete with billions of humans using the tools available to them to improve their own lives and the lives of those they interact with, including by discovering solutions that no central planner would have imagined.</p><p>Not only is techno-optimism vastly underappreciated, it&#8217;s often dismissed with shockingly bad arguments that betray a lack of understanding of basic history or economics. Because it&#8217;s so hard to viscerally understand <a href="https://ourworldindata.org/problems-and-progress">how much better the world has gotten</a>, even its most cogent critics fail to grapple with the sheer <em>scale</em> of the benefits of techno-optimism. Those benefits accrue to billions of people&#8212;and they aren&#8217;t just incidental gains, but increases in health and wealth that would have been inconceivable a few centuries ago.
Familiarity with the past is, in this case, the best justification for optimism about the future: almost all of the barriers to progress we face today pale in comparison to the barriers that we&#8217;ve already overcome.</p><p>I say all this to emphasize that I very viscerally feel the hope and the beauty of techno-optimism. The idea that progress and knowledge can grow the pie for everyone is a stunningly powerful one; so too is the possibility that we live in a world where the principles of freedom and openness would triumph over any obstacle, if we only believed in them. And so when I criticize techno-optimism, I do so not from a place of scorn, but rather from a place of wistfulness. I wish I could support techno-optimism without reservations.</p><h3>Three cracks in techno-optimism</h3><p>But techno-optimism is not the right philosophy for the 21st century. Over the last century in particular, cracks have been developing in the techno-optimist narrative. These are not marginal cracks&#8212;not the whataboutism with which techno-optimism is usually met. These are cracks which, like the benefits of techno-optimism, play out at the largest of scales. I&#8217;ll talk about three: war, exploitation, and civilizational vulnerability.</p><p>One: the increasing scale of war. World War I was a bloody, protracted mess because of the deployment of new technologies like machine guns and rapid-firing artillery. World War II was even worse, with whole cities firebombed and nuked, amidst the industrial-scale slaughter of soldiers and civilians alike. And the Cold War was more dangerous still, leaving the world teetering on the brink of global nuclear war. Mutually Assured Destruction wasn&#8217;t enough to prevent close calls; our current civilization owes its existence to the courage of <a href="https://en.wikipedia.org/wiki/Stanislav_Petrov">Stanislav Petrov</a>, <a href="https://en.wikipedia.org/wiki/Vasily_Arkhipov">Vasily Arkhipov</a>, and probably others we don&#8217;t yet know about. So you cannot be a full-throated techno-optimist without explaining how to reliably avoid catastrophe from the weapons that techno-optimists construct. And no such explanation exists yet, because so far we have been avoiding nuclear holocaust through luck and individual heroism. <strong>When the <em>next</em> weapon with planetary-scale destructive capabilities is developed, as it inevitably will be, we need far more robust mechanisms preventing it from being deployed.</strong></p><p>Two: the increasing scale of exploitation. Technology allows the centralization of power, and its use to oppress the less powerful at far larger scales than before. Historical examples abound&#8212;most notably the Atlantic slave trade and the mass deaths of civilians under 20th-century communist and fascist regimes. But since it&#8217;s too easy to chalk these up as mistakes of the past which we&#8217;re now too enlightened to repeat, I&#8217;ll focus on an example that continues today: the mass torture of animals in factory farms. This started less than a century ago, in response to advances in logistics and disease-reduction technologies. Yet it has grown to a staggering scale since then: <a href="https://ourworldindata.org/how-many-animals-are-factory-farmed">the number of animals killed in factory farms <em>every year</em></a> is comparable to the number of humans who have ever lived.
<strong>The sheer scale of this suffering forces any serious moral thinker to ask: how can we stop it as soon as possible? And how can we ensure that nothing like it ever happens again?</strong></p><p>Three: increasing vulnerability to reckless or malicious use of technology. Markets and technology have made the world far more robust in many ways. We should be deeply impressed by how supply chains which criss-cross the world remained largely intact even at the height of COVID. But there are other ways in which the world has become more vulnerable to our mistakes. The most prominent is gain-of-function research on pathogens. Not only are we now capable of engineering global pandemics, but China- and US-funded scientists probably <em>did</em>. <strong>And there&#8217;s no reason that the next one couldn&#8217;t be far worse, especially if released deliberately.</strong> Nor is bioengineering the sole domain in which (deliberate or accidental) offense may overwhelm defense to a catastrophic degree; other possibilities include geoengineering, <a href="https://www.narrativeark.xyz/p/jacob-on-the-precipice">asteroid redirection</a>, nanotechnology, and <a href="https://nickbostrom.com/papers/vulnerable.pdf">new fields that we haven&#8217;t yet imagined</a>.</p><p>These cracks all hurt the techno-optimist worldview. But I don&#8217;t think they take it down; techno-optimists have partial responses to all of them. The world has become far more peaceful as a result of our increasing wealth&#8212;even if wars can be far bigger, they&#8217;re also far rarer now. Technology will produce tastier meat substitutes, and cheap clean meat, and when it does factory farming will end, and humanity will look back on it in horror. And while we can&#8217;t yet robustly prevent the accidental or deliberate deployment of catastrophically powerful weapons (like nuclear weapons or engineered pandemics), we might yet stumble through regardless, as we have so far. So if the cracks above were the main problems with techno-optimism, I would still be a techno-optimist. I would have compunctions about the development of even more powerful weapons, and about all the sentient beings (whether farmed animals or wild animals or future people or artificial minds) which remained outside our circle of concern. But I&#8217;d still believe that any &#8220;cure&#8221; to those problems which undermined techno-optimism would be worse than the disease.</p><p>Yet I am not a techno-optimist, because we are about to leave the era in which humans are the most intelligent beings on earth. And the era of artificial intelligence will prise open these cracks until the deeper holes in the techno-optimist worldview become clear.</p><h3>AI and techno-optimism</h3><p>Until now, the primary forces shaping the world have been human intelligence and human agency, which allow us to envisage the outcomes we want, identify paths towards them, and consistently pursue them. Soon, artificial intelligence and artificial agency will match ours; and soon after that AI will far surpass us. I&#8217;ll describe at a very high level how I expect this to play out, in two stages.</p><p><strong>Firstly: AIs used as tools will supercharge the dynamics I described above.</strong> The benefits of individual freedom will expand, as individuals become far more empowered, and can innovate far faster. But so too will the cracks enabled by technology: war, exploitation, and civilizational vulnerability. Which effect will be bigger?
I simply don&#8217;t know; and nobody else does either. Predicting the offense-defense balance of new technology is incredibly difficult, because it requires accounting for all the different uses that future innovators will come up with. What we <em>can</em> predict is that the stakes will be higher than ever before: 21st-century technology could magnify the worst disasters of the 20th century, like world wars and totalitarianism.</p><p>Perhaps, despite that, the best path is still to rush ahead: to prioritize building new technology now, and assume that we can sort out the rest later. This is a very fragile strategy, though: even if you (and your government) use it responsibly, many others will not. And there&#8217;s still relatively little attention and effort focused on trying to avert the large-scale risks of new technologies directly. So the techno-optimist response to risks like the ones I&#8217;ve described above is shallow: in theory it opposes them, but in practice it may well make things worse.</p><p><strong>Secondly: we&#8217;ll develop AI agents with values of their own</strong>. As AIs automate more and more complex tasks, over longer and longer timeframes, they&#8217;ll need to make more and more value-laden judgment calls about which actions and outcomes to favor. Eventually, viewing them as tools will be clearly inadequate, and we&#8217;ll need to treat them as agents in their own right&#8212;agents whose values may or may not align with our own. Note that this might only happen once they&#8217;ve significantly surpassed human intelligence&#8212;but given how fast progress in the field has been, this is something we should be planning for well in advance.</p><p>Humans who are free to make their own decisions tend to push the world to be better in terms of our values&#8212;that&#8217;s the techno-optimist position. But artificial agents who are making many decisions will push the world to be better in terms of <em>their</em> values. This isn&#8217;t necessarily a bad thing. Humans are hypocritical, short-sighted, often selfish, sometimes sadistic&#8212;and so it&#8217;s possible that AIs will be better custodians of our moral values than we are. We might train them to be wise and kind, and to consistently nudge us towards a better world&#8212;not by overriding human judgments, but rather by serving as teachers and mentors, giving us the help we need to become better people and build a better civilization. I think this is the most likely outcome, and it&#8217;s one we should be incredibly excited about.</p><p>But it&#8217;s also possible that AIs will develop alien values which conflict with those of humans. If so, when we give them instructions, they&#8217;ll appear to work towards our ends, but consistently make choices which bolster their power and undermine our own. Of course, AIs will start off with very little power&#8212;we&#8217;ll be able to shut them down whenever we detect misbehavior. But AIs will be able to <a href="https://www.narrativeark.xyz/p/one">coordinate with each other</a> far better than humans can, communicate in ways we&#8217;re <a href="https://www.schneier.com/blog/archives/2023/06/ai-generated-steganography.html">incapable of interpreting</a>, and carry out tasks that we&#8217;re incapable of overseeing. They&#8217;ll become embedded in our economies in ways that amplify the effects of their decisions: hundreds of millions of people will use copies of a single model on a daily basis.
And as AIs become ever more intelligent, the risks will grow. When agents are far more capable than the principals on whose behalf they act, principal-agent problems can become very severe. When it comes to superhuman AI agents, we should think about the risks less in terms of financial costs, or even human costs, and more in terms of political instability: careless principals risk losing control entirely.</p><p>How plausible is this scenario, really? That&#8217;s far too large a question to address here; instead, see <a href="https://managing-ai-risks.com/">this open letter</a>, <a href="https://arxiv.org/abs/2209.00626">this position paper</a>, and <a href="https://course.aisafetyfundamentals.com/alignment">this curriculum</a>. But although there&#8217;s disagreement on many details, there&#8217;s a broad consensus that we simply don&#8217;t understand how AI motivations develop, or how those motivations generalize to novel situations. And although there&#8217;s widespread disagreement about the trajectory of AI capabilities, what&#8217;s much less controversial is that when AI <em>does</em> significantly surpass human capabilities, we should be wary of putting it in positions where it can accumulate power unless we have very good reasons to trust it.</p><p>It&#8217;s also worth noting that Andreessen&#8217;s version of techno-optimism draws heavily from Nick Land&#8217;s philosophy of <em>accelerationism</em>, which expects us to lose control, and is actively excited about it. &#8220;Nothing human makes it out of the near future&#8221;, <a href="http://www.ccru.net/swarm1/1_melt.htm">Land writes</a>, <a href="https://web.archive.org/web/20170110095648/http://www.xenosystems.net/pythia-unbound/">and celebrates</a>: &#8220;The planet has been run by imbeciles for long enough.&#8221; I read those words and shudder. Land displays a deep contempt for the things that make us ourselves; his philosophy is fundamentally anti-humanist (as Scott Alexander argues more extensively in his <em><a href="https://slatestarcodex.com/2014/07/30/meditations-on-moloch/">Meditations on Moloch</a></em>). And while his position is extreme, it reflects a problem at the core of techno-optimism: the faster you go, the less time you have to orient to your surroundings, and the easier it is to diverge from the things you actually care about. Nor does speed even buy us very much. On a cosmic scale, we have plenty of time, plenty of resources available to us, and plenty of space to expand. The one thing we don&#8217;t have is a reset button in case we lose control.</p><h3>AI and techno-humanism</h3><p>So we need a philosophy which combines an appreciation for the incredible track record of both technology and liberty with a focus on ensuring that they actually end up promoting our values. This mindset is common amongst Effective Altruists&#8212;but Effective Altruism is an agglomeration of many very different perspectives, drawn together not by a shared vision about the future but by shared beliefs about how we&#8217;re obliged to act. I&#8217;d like to point more directly to an overarching positive vision of what humanity should aim towards. Transhumanism offers one such vision, but it&#8217;s so radically individualist that it glosses over the relationships and communities that are the most meaningful aspects of most people&#8217;s lives.
(Note, for example, how Bostrom&#8217;s <em><a href="https://nickbostrom.com/utopia">Letter from Utopia</a></em> barely mentions the existence of other people, while his <a href="https://nickbostrom.com/old/transhumanism">introduction to transhumanism</a> relegates relationships to the final paragraph of the postscript.) So I&#8217;ll borrow a term coined by Yuval Harari, and call the philosophy that I&#8217;ve been describing <em><a href="https://www.wired.com/2017/02/yuval-harari-tech-is-the-new-religion/">techno-humanism</a></em>.</p><p>Harari describes techno-humanism as an ideology focused on upgrading humans to allow our actions and values to remain relevant in an AI-dominated future. I broadly agree with his characterization (and will explore it more in future posts), but both the benefits and the risks of re-engineering human brains are still a long way away. On a shorter timeframe, I think a different way of &#8220;upgrading&#8221; our minds is more relevant: <strong>developing a deep <em>understanding</em> of our values and how technology can help achieve them</strong>. Flying cars and rockets are cool, but the things we ultimately care about are far more complex and far more opaque to us. We understand machines but not minds; algorithms but not institutions; economies but not communities; prices but not values. Insofar as we face risks from poor political decisions, or misaligned AIs, or society becoming more fragile, from a techno-humanist perspective it&#8217;s because we lack the understanding to do better.</p><p>This is a standard criticism of techno-optimism&#8212;and usually an unproductive one, since the main alternative typically given is to defer to academic humanities departments which produce far more ideology than understanding. But techno-humanism instead advocates trying to develop this understanding using the most powerful tools we have: science and technology. To give just a few examples of what this could look like: studying artificial minds and their motivations will allow us to build more trustworthy AIs, teach us about our own minds, and help us figure out how the two should interface. The internet should be full of experiments in how humans can interact&#8212;<a href="https://www.astralcodexten.com/p/prediction-market-faq">prediction markets</a>, <a href="https://www.thinkingcomplete.com/2020/04/melting-democracy.html">delegative democracies</a>, <a href="https://slatestarcodex.com/2019/12/09/2019-adversarial-collaboration-entries/">adversarial collaborations</a>, and many more&#8212;whose findings can then improve existing institutions. We should leverage insights from domains like game theory, voting theory, network theory, and bargaining theory to help understand and reimagine politics&#8212;starting with <a href="https://electionscience.org/">better voting systems</a> and ideally going far further. And we should design sophisticated protocols for testing and verifying the knowledge that will be generated by AIs, so that we can avoid replicating the replication crises that currently plague many fields.</p><p>This may sound overly optimistic. But some of the most insightful fields of knowledge&#8212;like economics and evolutionary biology&#8212;uncovered deep structure in incredibly complex domains by identifying just a few core insights. And we&#8217;ll soon have AI assistance in uncovering patterns and principles that would otherwise be beyond our grasp.
Meanwhile, platforms like Wikipedia and Stack Overflow have been successful beyond all expectations; it&#8217;s likely that there are others which could be just as valuable, if only there were more people trying to build them. So I think that the techno-humanist project has a huge amount of potential, and will only become more important over time.</p><h3>Balancing the tradeoffs</h3><p>So far I&#8217;ve described techno-humanism primarily in terms of advances that techno-optimists would also be excited about. But inevitably, there will also be clashes between those who prioritize avoiding the risks I&#8217;ve outlined and those who don&#8217;t. From a techno-optimist perspective&#8212;a perspective that has proven its worth over and over again during the last few centuries&#8212;slowing down technological progress has a cost measured in millions of lives. This is an <a href="https://marginalrevolution.com/marginalrevolution/2021/01/the-invisible-graveyard-is-invisible-no-more.html">invisible graveyard</a> which is brushed aside even by the politicians and bureaucrats most responsible for it; no wonder many techno-optimists feel driven to push for unfettered acceleration.</p><p>But from a techno-humanist perspective, reckless technological progress has a cost measured in expected fractions of humanity&#8217;s entire future. Human civilization used to be a toddler: constantly tripping over and hurting itself, but never putting itself in any real danger. Now human civilization is a teenager: driving fast, experimenting with <a href="https://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/">mind-altering substances</a>, and genuinely capable of wrecking itself. We don&#8217;t need the car to go faster&#8212;it&#8217;s already constantly accelerating. Instead, we need to ensure that the steering wheel and brakes are working impeccably&#8212;and that we&#8217;re in a fit state to use them to prevent non- or anti-human forces from controlling the direction of our society.</p><p>How can people who are torn between these two perspectives weigh them against each other? On a purely numerical level, humanity&#8217;s potential to build an intergalactic civilization renders &#8220;fractions of humanity&#8217;s future&#8221; by far the larger of the two. But that math is too blas&#233;&#8212;it&#8217;s the same calculation that can be, and often has been, used to justify centralization of power, totalitarianism, and eventual atrocities. And so we should be extremely, extremely careful when using arguments that appeal to &#8220;humanity&#8217;s entire future&#8221; to override time-tested principles. That doesn&#8217;t imply that we should never do so. But wherever possible, techno-optimists and techno-humanists should try to cooperate rather than fight. After all, techno-humanism is also primarily about making progress: specifically, the type of progress that will be needed to defuse the crises sparked by other types of progress. The disagreement isn&#8217;t about where we should end up; it&#8217;s about <a href="https://en.wikipedia.org/wiki/Differential_technological_development">the ordering of steps along the way</a>.</p><p>The two groups should also challenge each other to do better in areas where we disagree, so that we can eventually reach a synthesis. One challenge that techno-humanists should pose to techno-optimists: <strong>be more broadly ambitious!</strong> We know that technology and markets can work incredibly well, and have a near-miraculous ability to overcome obstacles.
And so it&#8217;s easy and natural to see them as solutions to all the challenges confronting us. But the most courageous and ambitious version of techno-optimism needs to grapple with the possibility that our downfall will come not from a <em>lack</em> of technology, but rather from an <em>overabundance</em> of technology&#8212;and the possibility that to prevent it we need progress on the things that have historically been hardest to improve, like the quality of political decision-making. In other words, techno-humanism aims to harness human ingenuity (and technological progress) to make &#8220;steering wheels&#8221; and &#8220;brakes&#8221; more sophisticated and discerning, rather than the blunt cudgels that they often are today.</p><p>My other challenge for techno-optimists: <strong>be optimistic not just about the benefits of technological growth, but also about its robustness</strong>. The most visceral enemy of techno-optimism is stagnation. And it&#8217;s easy to see harbingers of stagnation all around us: overregulation, NIMBYism, illiberalism, degrowth advocacy, and so on. But when we zoom out enough to see the millennia-long exponential curve leading up to our current position, it seems far less plausible that these setbacks will actually derail the long-term trend, no matter how outrageous the latest news cycle is. On the contrary: taking AGI seriously implies that innovation is on the cusp of speeding up dramatically, as improvements generated by AIs feed back into the next generation of AIs. In light of that, a preference for slower AI progress is less like Luddism, and more like carefully braking as we approach a sharp bend in the road.</p><p>Techno-optimists should challenge techno-humanists to improve as well. I can&#8217;t speak for them, but here&#8217;s my best guess at the challenges that techno-optimists should pose to techno-humanists:</p><ol><li><p><strong>Techno-humanists need to articulate a compelling positive vision</strong>, one which inspires people to fight for it. Above, I&#8217;ve listed some ideas which have potential to improve our collective understanding and decision-making abilities; but there&#8217;s far more work to be done in actually fleshing out those ideas, and pushing towards their implementation. And even if we succeeded, what then? What would it actually look like for humanity to make consistently sensible decisions, and leverage technology to promote our long-term flourishing? Knowing that would allow us to better steer towards those good futures.</p></li><li><p><strong>Techno-humanists should grapple more seriously with the incredible track record of techno-optimism</strong>. Throughout history, people have consistently and dramatically underrated how valuable scientific and technological progress can be. That&#8217;s not a coincidence at all, because characterizing which breakthroughs are possible is often a big chunk of the work required to actually make those breakthroughs. Nor is it a coincidence that people dramatically underrate the value of liberty&#8212;decentralization works so well precisely because there are so many things that central planners can&#8217;t predict. So even if you find my arguments above compelling, we should continue to be very wary of falling into the same trap.</p></li></ol><p>The purpose of this blog is to meet those challenges. Few of the ideas in this post are original to me, but they lay the groundwork for future posts which will explore more novel territory.
My next post will build on them by arguing that an understanding-first approach is feasible even when it comes to the biggest questions facing us&#8212;that we can look ahead to see the broad outlines of where humanity is going, and use that knowledge to steer towards a future that is both deeply human and deeply humane.</p>]]></content:encoded></item></channel></rss>