“Your worst sin is that you have destroyed and betrayed yourself for nothing.” - Dostoevsky
When people set an ambitious goal, they can fail simply by not changing the world very much. But there’s another surprisingly common way to fail: by achieving the opposite of their goal. I call this effect pessimization: the opposite of optimization.
Though pessimization is an uncommon term, it’s not an uncommon concept. We allude to it whenever we call someone their own worst enemy, or predict that their actions will backfire. We try to take advantage of it with reverse psychology. It’s the subject of Robert Conquest’s third law of politics: “the simplest way to explain the behavior of any bureaucratic organization is to assume that it is controlled by a cabal of its enemies.” And the POSIWID aphorism (“the purpose of a system is what it does”) is often used to describe ways in which a system actively opposes its nominal purpose. I’ve also previously described the pessimization of ideological advocacy under the heading of “the activist’s curse”.
I divide pessimization into three types, each of which I’ll discuss in detail. The most intuitive is what I call direct pessimization: when an enemy chooses what to do based on what will harm your interests the most. This is the sense in which “pessimize” is used by e.g. Yudkowsky and Soares. Drivers of direct pessimization include sadism, revenge, and threats.
Conversely, indirect pessimization arises when your actions help other people achieve the opposite of your goals, even though they’re not deliberately trying to hurt you. And perverse pessimization occurs when an agent or coalition nominally dedicated to a goal is actively trying to achieve the opposite of that goal—i.e. it’s directly pessimizing itself.
I’d like to improve our understanding of pessimization so that we can better create, identify, and promote remedies to it. In the rest of this post I characterize each of the three types of pessimization I mentioned above. I treat them as scale-free phenomena, describing examples within individuals, organizations, countries, and even whole civilizations.
Direct pessimization
Direct pessimization is when one agent or faction is specifically trying to hurt another—not as a side effect of other actions, but rather as a means of achieving their desired outcome. For example, imprisoning criminals in order to prevent them from committing further crimes wouldn’t qualify. However, punishing criminals for the sake of deterrence or retribution would: in the first case the prisoners’ suffering is the means by which others are deterred; in the second, their suffering itself is the desired outcome. While less of a focus today, these motivations were more salient in historical penal systems, which used physical punishments ranging from flogging to cruel and unusual tortures. By “hurting” I don’t just mean causing physical pain, though, but also other ways of harming someone’s interests—like the social humiliation of the stocks, exile, execution, attainder, or the Chinese nine familial exterminations.
To be clear, the distinction between a “means” and a “side effect” of achieving a goal is a thorny one, which many philosophers have debated. For example, which harms inflicted on an enemy nation are necessary components of winning a war, and which are merely unfortunate side effects of winning the war? I won’t try to resolve the edge cases here—however, clear examples of larger-scale direct pessimization include terrorist attacks and genocides, as well as terror bombing civilians during wars. At even larger scales, s-risk threat models outline how and why superintelligences might threaten to pessimize each other’s utility functions.
Another driver of direct pessimization is sadism. We see this in serial killers, sociopaths, and many of the people who end up actually implementing the punishments discussed above. However, one reason why I wanted to carve out the category of “direct pessimization” in the first place is because I don’t know how to draw a principled distinction between sadism, revenge, deterrence, and hurting others for instrumental benefit. For example, many people think of social status as zero-sum, and display petty cruelty (e.g. mocking jokes or mean-spirited gossip) in order to maintain their position in the hierarchy. The type of sadism that is characteristic of sociopaths is typically much more intense, but I’m not sure that the underlying logic is so different. Indeed, since sociopathy often results from traumatic childhood experiences, we could view it as a kind of “revenge” on the world at large—not too dissimilar to how normal people sometimes feel about those who have wronged them.
The examples I’ve discussed above only feature pessimization in one direction—but direct pessimization often spurs cycles of retribution, at many scales. Within individual psychology, internal conflict (e.g. procrastination) can prompt increasingly harsh self-coercive strategies, like vicious self-criticism. In relationships, emotional insecurity can lead partners to approach each other with blame, shame, and cruelty, until the person who loves them the most is also the one who hurts them the most. In politics, focusing on the other side’s failings can provoke cycles of negative partisanship (see also the toxoplasma of rage and “owning the libs”). In geopolitics, actions aimed at hurting enemy states (like sanctions or blockades) can spiral into military engagements, and from there into no-holds-barred wars. One way of viewing moral progress is the process of forming agreements to limit cycles of direct pessimization—both at large scales, like international conventions; and at small scales, like local norms of decency.
While few of these examples will be novel to most readers, it feels useful to bring them together under a single heading—both to make it easier to reason about preventing them, and to contrast them with my two other types of pessimization. The most notable contrast is that direct pessimization is a transitive relationship: we describe A directly pessimizing B. Conversely, perverse pessimization is a reflexive relationship: “B pessimized themself” (or “B pessimized” for short). What about when A harms B’s interests as a side effect of pursuing their own goals? Depending on how this happened, we might call it an accident, or negligence, or constructive malice. However, I wouldn’t classify any of these as pessimization. Instead, I’m interested in the cases where B’s own actions make it much easier for their interests to be harmed, which I’d describe as B indirectly pessimizing themself. Let’s explore that now.
Indirect pessimization
Caring about X often leads you to create intellectual and physical tools for affecting X. But these can convince and/or help people with other goals to produce ~X (even when they’re not specifically trying to hurt you). I call this dynamic indirect pessimization.
The simplest example of indirect pessimization is telling the wrong people about your goal. To pursue a goal X you need some conception of what X is and how you’re going to achieve it. But telling people about X prompts them to consider whether pursuing ~X is a good way to achieve their own interests. For example, if your goal is to prevent anyone from creating mirror life, because it might destroy the world, then telling people about your goal may be what gives them the idea of creating mirror life at all.
Indirect pessimization happens most easily when X advocates don’t have very good proposals for what people should do to help achieve X. If so, most ways that people act on their new knowledge about X will contribute to achieving ~X. For example, early AI safety advocates did a lot to raise awareness of AGI risk—but the lack of actionable strategies to help reduce risk led to much of that awareness being directed towards founding various AGI labs which are now racing towards AGI.
Furthermore, pro-X coalitions often motivate work towards X by identifying its beneficial consequences. But every argument of the form X→Y (for example, “higher education makes people more progressive”) can be translated into the form ~Y→~X (“to avoid people becoming more progressive, limit higher education”). And so, unless there’s unanimity on Y being valuable, such arguments will convince some people to oppose X, which might outweigh the benefit of gaining new supporters.
We often see this effect in social justice advocacy. Calling something racist is an effective strategy in moderation, but when overused starts to make people think that racism isn’t so bad. Relatedly, in this tweet a leftist is trying to make discussion of race science more prominent, because they think that believing in it is taboo enough to discredit their political opponents. But highlighting that a person with widespread support believes in race science will help undermine the taboo in the minds of that person’s supporters. Lastly, perhaps the most egregious example is the conflation of all leftist issues into a single “omnicause”—e.g. this argument that “Palestine is the issue that makes us realise everything is interconnected. Every struggle for justice, freedom, and liberation”. Even if some people find this persuasive, it’s also a kind of indirect pessimization—now people who disagree with the writer on Palestine will be pushed towards disagreeing on everything else too.
Indirect pessimization doesn’t just arise from constructing concepts and arguments, but also physical tools or mechanisms. For example, people sometimes propose constructing tech for deflecting asteroids away from Earth. But given how low the rate of dangerous asteroids hitting Earth is, the most likely way for that to happen is if asteroid-deflection technology is misused to deflect asteroids towards Earth. So naive attempts to lower asteroid risk might easily empower people to increase it instead.
Similarly, consider the apocryphal anecdote about the cobra effect. This is often cited as an illustration of Goodhart’s law. But what’s most striking about the story is that it describes an intervention not just failing to reduce the population of wild cobras, but actually increasing it. In other words, these (fictional) authorities indirectly pessimized their own goal. A real-world example comes from ML, where constructing a given objective function creates the possibility of accidentally inverting it—e.g. as OpenAI did by (briefly) creating maximally-inappropriate AIs (section 4.4). The Waluigi effect and emergent misalignment provide more recent examples of indirect pessimization in AI values. Meanwhile I suspect that many evaluations of dangerous AI capabilities, which were intended to help prevent those capabilities from being developed, will instead help AGI labs accelerate towards them.
Of course, when people try to achieve some goal, there are many possible undesirable side-effects. So why is it worth distinguishing specifically the category of side effects which cause the opposite of that goal? One reason is that many of the mechanisms which drive indirect pessimization are (unlike most side effects) relevant for a wide range of goals. Another is that fear-based motivation makes indirect pessimization unusually hard to think about. The more scared you are that you might not achieve your goal, the more urgently you feel that “something must be done”, and the more you flinch away from picturing how that “something” might actually make it worse. This applies both on an individual level (where the possibility that you’re indirectly pessimizing is often very painful for your ego), and on a group level (where self-criticism is often taboo).
Perverse pessimization
The final type of pessimization I’ll talk about is perverse pessimization. I define this as a situation where the nominally pro-X faction is the one directly causing ~X to happen—i.e. the pro-X faction is sabotaging itself. What causes this? I’ll talk about two broad reasons: fear of success, and vice signalling within factions.
I’ve talked above about how fear of not achieving one’s goals can indirectly pessimize those goals. But what’s far more perverse is a situation where a coalition is instead scared of achieving its nominal goals. One reason why this happens: the identity, prestige and even existence of a coalition often depend on the persistence of the problem it’s trying to solve. So the prospect of the problem going away can be very scary to individual members of the coalition (especially powerful or longstanding members) or the coalition as a whole.
This applies especially when solutions are offered from outside the coalition—if those solutions work, then the coalition has to admit that its previous strategies weren’t working, which might trigger shame or envy. So it’s particularly tempting for coalitions to suppress external attempts to achieve their own nominal goals. The environmentalist movement’s opposition to nuclear power, and often even to solar or wind infrastructure, provides a good example of this (as does its gradual slide towards being a generic anti-capitalist movement).
Another source of fear of success is that the more successful you are, the more you have to lose. Having hope, then losing it, is a very painful experience. So people often decide that it’s better to proactively identify as a victim in order to avoid that risk. From that mindset, other people’s success sparks envy, resentment, and attempts to tear them down—but this toxicity only makes one’s own problems worse. For example, the men who join incel communities deeply want to find partners, but end up in a perverse position where they would lose their community if they found a partner. (Some pointers to further reading on this dynamic: tall poppy syndrome; the laws of Jante; this essay about Magic: the Gathering; Existential Kink; and Sadly, Porn.)
We also see fear of success in many relationships. The most successful relationships are those intimate enough to allow both people to feel deeply understood. But the vulnerability required for real intimacy is terrifying, and it’s easy for that fear to lead us to push friends or romantic partners away—essentially punishing them for wanting to be close to us. We hope that our defense mechanisms will ensure that we’re only vulnerable to people who deeply care about us, but they’re often precisely what prevent people from coming to deeply care about us (as masterfully portrayed in Notes From Underground).
A second major reason why perverse pessimization arises is that anti-X behavior can help someone rise within a (nominally) pro-X coalition. This is a kind of vice signalling, showing that you have enough power to directly defy the values of your own coalition. I expect that there’s a lot of private signalling of cynicism in high-pressure law and finance firms, amongst politicians, and amongst highly competitive elites more generally. One particularly legible example comes from Gerald Ratner, a jewelry CEO who called his own products “total crap” in an apparent attempt to signal cynicism.
These same dynamics play out within society as a whole, with domain-specific cynicism replaced by transgression against broader norms. Satanism, for instance, was historically compelling precisely because it was so transgressive. These days, few are offended by it. But Hitler now fills the role of a secular Satan, which sparks transgressions like Kanye West’s recent song “Heil Hitler”, “ironic” neo-Nazism on 4chan, etc. Meanwhile many sexual fetishes are erotic in large part because they’re taboo, with the contrast between the fear of taboo violation and the relief of sexual acceptance being a source of pleasure. Jointly acting out taboo fetishes can allow elites to form transgression bonds—Epstein’s island being perhaps the most prominent example, alongside Hollywood cases like Weinstein and Diddy.
Such transgressions typically start out covert. But it’s hard to keep controversial secrets—especially because bragging about transgressions is a good way to signal dominance. So transgressions tend to slide towards becoming “open secrets”, which often creates a self-fulfilling prophecy that there’s a consensus of power in favor of transgression. People who want to stay on the good side of that consensus learn to actively punish others for trying to enforce norms or pursue the stated goals of the coalition. Herrmann et al. call this phenomenon anti-social punishment; Ben Hoffman calls it depravity; Jessica Taylor calls it anti-normativity; and Michael Vassar links it to postmodernism. I don’t know of systematic ways to detect anti-normativity, but sufficiently egregious hypocrisy can provide strong hints that it exists. For example, one of the US military’s strongest nominal values is abiding by US laws. Yet rather than rewarding whistleblowers who expose examples of criminal behavior, the military typically persecutes them.
I’ve described fear of achieving one’s goals and anti-normativity as two separate mechanisms driving perverse pessimization. But being in an anti-normative coalition induces fear of achieving your goals, because succeeding (or even just standing out) makes you a target. And when you’re scared of achieving your goals, you’ll reward other people (or your own internal subagents) for signalling opposition to those goals, thereby reinforcing the anti-normative coalition. So we should actually think of them as two facets of a single phenomenon.
Orienting to pessimization
Pessimization sucks, but that doesn’t mean we should be constantly scared of it. The accusation of pessimization could easily become a bludgeon used to attack anyone doing anything valuable (which would, ironically, pessimize the desire to avoid pessimization). The challenge is to watch out for pessimization in a way which doesn’t slip into self-defeating cynicism.
So it’s also worth talking about limits on pessimization. While direct pessimization is a powerful way to cow your enemies, it also makes you a target for retaliation—e.g. violating norms against torture can spur domestic and international opposition to a regime. Meanwhile indirect and perverse pessimization are often covert enough that they only spread slowly. For example, even though some people find pedophilia attractive because it’s taboo, I expect that the taboo overall dramatically reduces pedophilia.
Perverse pessimization is also self-undermining: it makes systems weaker over time, which reduces their long-term influence. If big companies weren’t so often stuck in moral mazes, they could do a far better job at fending off startups. More generally, the parts of the world that avoid perverse pessimization will tend to grow and become more important over time. So creative destruction is one of our best defenses against perverse pessimization.
However, letting institutions that have been taken over by perverse pessimization fail can be arbitrarily costly—e.g. when major governments rot from the head down. And so it can be incredibly high-leverage for whistleblowers and other reformers to identify and oppose perverse pessimization inside important institutions (see e.g. Ellsberg, Kokotajlo). This is bottlenecked on virtues like courage and integrity, which is a major reason why I’ve become a virtue ethicist (as I’ll write about in an upcoming post). Indeed, I suspect that one way of understanding virtues is as traits that guard us against sliding into pessimization, thereby allowing us to more robustly steer the world towards desired outcomes.