50 Comments
Rafael Kaufmann

The "Bayesianism" being argued against here seems to actually be "Yudkowskianism" and clicking through your references digs up a lot of LessWrong posts. If so, I mostly agree with the substance of your criticism. In particular, I agree that attempting to tag probabilities onto propositions without making explicit and understanding the model that translates context into application of Bayes' rule (which I call the "Yudkowskian vice" by analogy with the "Ricardian vice" of economics) makes many if not most attempts at Internet rationalism fail before they have even started. However, common Bayesian practice outside of Internet forums (going back at least to Jaynes's seminal book, I won't make claims about the past further than that, I'm not a historian of science) does give models exactly the primacy you mention. (This is also not a philosophical innovation, but rather an ipsis litteris implementation of Quinean holism, an idea from the 1940s. And Quine was a *popular philosopher!*.) Indeed, for a few decades already we've gone past that and into explicitly conditioning first-order variables on model-valued variables, and then performing higher-order inference on model space, which indeed lets us solve (approximately) any kind of problem. For a particularly clear exposition of modern Bayesianism, check out Richard McElreath's "Statistical Rethinking", both the book and the accompanying lectures on YouTube.

On your first gripe, with "degrees of truth", I claim it's a fundamentally misguided concept. "P='The Earth is a sphere' is mostly true" is not a statement of a scalar attribute of a proposition that's just waiting to be quantified, it's a statement about the applicability of the proposition -- under which conditions it's OK to make this approximation. It's a convenient way to say "If you're trying to use the truth value of P to make claims/decisions about astronomy, then it's True; if you're trying to use it to make claims/decisions about some specific kind of engineering that cares about the exact distances to the center of the Earth, gravity, etc, then it's False." To say that "P is 99.99% true" may be logically possible in principle by somehow summing over model space, but it has no usefulness, because it misses the all-important fact that makes the "mostly true" statement useful -- under which conditions it's to be taken as true!
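To make the applicability reading concrete, here is a minimal sketch (the context names are purely illustrative): the truth value is exposed as a function of the modeling context, rather than stored as a scalar attribute of the proposition.

```python
# A toy rendering (my construction): the truth value of "the Earth is a
# sphere" is a function of the modeling context it's used in, not a
# scalar waiting to be quantified.
def earth_is_a_sphere(context: str) -> bool:
    # contexts where the spherical approximation is adequate (illustrative list)
    adequate = {"astronomy", "day_night_cycle", "basic_navigation"}
    return context in adequate

print(earth_is_a_sphere("astronomy"))        # True
print(earth_is_a_sphere("geodetic_survey"))  # False: needs the oblate spheroid
```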

Richard Ngo

Ty for the helpful comment. A few quick responses:

1. I've changed the second paragraph to read "The core idea of Bayesian epistemology: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. (Note that this is different from Bayesianism as a set of statistical techniques, or Bayesianism as an approach to machine learning, which I don’t discuss here.)" Bayesian epistemology now links to the corresponding Stanford Encyclopedia Article - while this post was inspired by Yudkowskyism, I think academic philosophers make similar mistakes.

2. After skimming the first few chapters of the McElreath book, it seems like he focuses on using Bayesian methods to learn model parameters. But my focus is on the interactions *between* models (since humans don't learn models using Bayesian methods).

3. "To say that "P is 99.99% true" ... has no usefulness". Well, I'd certainly say it's useful to know that "the earth is a sphere" is more true than "the earth is a cube". You might then say that we can't pin down any real numbers; I won't address that here, but will try to do so in a follow-up post.

Rafael Kaufmann

You're welcome, and likewise thank you for the responses. In order:

1. I haven't tracked the development of academic Bayesian epistemology, but if a philosopher makes such mistakes, then it seems to me like yet another case of using plausible-looking math as a new varnish on bad old philosophy (some kind of naive empiricism?). When looking at it from the other direction -- at what modern information and probability theory really seems to be saying about epistemology -- it seems a lot more compatible with a Quinean holism, instrumentalism a la Dennett and others, etc. (Note that one can indeed prove theorems that *assume* that there exists a certain real generative process in the world that generates observations about a system, and show that under certain conditions on its observability etc, Bayesian updating (on models, not atomic propositions!) will converge on a truthful representation of this process. However, this assumption is not at all required for operationalizing the theory, of course.)

2. McElreath covers a lot of hierarchical inference (where higher-level parameters are effectively parameterizing distributions over lower-level models) in later chapters, but he indeed does not cover some key parts of "structure learning" (where your candidate models come from, how to combine models). Is that your issue, or were you talking about something else?

3. Saying "P is 99.99% true" without conditioning on a model means that it is true after simultaneously marginalizing over every possible logical antecedent. I can give you a distribution over models such that "The Earth is a square" has a higher posterior probability than "The Earth is a sphere". In order to say this is wrong, you need to condition the allowable distribution of models on a wide set of definitional and observational parameters and then marginalize them away. This easily becomes uncomputable. The alternative is to a priori constrain your model set, and the interesting question is which specific degrees of freedom make a difference.

Kaiser Basileus

The Earth is a sphere to a certain degree of approximation, sufficient for most purposes. There are no exact spheres in physical reality, only more or less spherical depending on what resolution of understanding is required.
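For concreteness, the degree of non-sphericity is easy to quantify from the standard WGS84 reference radii:

```python
# How non-spherical is the Earth? Flattening computed from WGS84 radii.
a, b = 6378.137, 6356.752      # equatorial and polar radii, km
print((a - b) / a)             # ~0.00335, i.e. about 1 part in 298
```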

Richard Futrell

Regarding vagueness and pragmatic interpretation of things like "large" and "water in the fridge", it's worth pointing out that the best models of these phenomena (in terms of accurately predicting how people will interpret utterances) are in fact probabilistic Bayesian models. For example, as outlined here https://www.problang.org/chapters/05-vagueness.html

These are models where utterance interpretation is a process whereby a listener does Bayesian inference about what an *informative* speaker would say, where that speaker is reasoning about a listener, who is in turn reasoning about a speaker, and so on recursively. The recursion bottoms out in a base case consisting of truth-value semantics; for example, for a word like "large" the semantics is "x is large if x is larger than a threshold θ", where θ is a free parameter whose value ends up being inferred probabilistically as part of the recursive reasoning process. The effect is that the (distribution on the) threshold θ ends up being set with respect to a reference class that would make the utterance informative based on the world models of the simulated listeners. For example, if I say "Jupiter is large" then you will end up inferring a threshold θ that would make sense for planets, and if I say "my thumb is large" you'll end up inferring a threshold θ that would make sense for bodily appendages.

Which is all to say: a more complex, but still fully Bayesian and probabilistic process, grounding out in definite truth values, actually provides a very good model for how people use vague expressions like this. There's a pretty big academic literature on models like this, of which the size threshold above is just one example.
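For readers who want the mechanics, here is a minimal sketch of that recursion in the style of the linked problang chapter (the toy sizes, priors, and rationality parameter are placeholders, not values from the literature):

```python
import numpy as np

sizes = np.array([1.0, 2.0, 3.0, 4.0, 5.0])         # possible object sizes
size_prior = np.array([0.1, 0.3, 0.3, 0.2, 0.1])    # listener's prior over sizes
thetas = sizes                                       # candidate thresholds for "large"
theta_prior = np.full(len(thetas), 1 / len(thetas))  # uniform prior over thresholds
utterances = ["large", "null"]                       # "null" = say nothing
alpha = 4.0                                          # speaker rationality

def literal_listener(utterance, theta):
    """L0: condition the size prior on the truth-value semantics."""
    truth = (sizes > theta).astype(float) if utterance == "large" \
        else np.ones(len(sizes))                     # null utterance is always true
    p = size_prior * truth
    return p / p.sum() if p.sum() > 0 else p

def speaker(size_idx, theta):
    """S1: utterance probabilities ~ exp(alpha * informativeness under L0)."""
    scores = np.array([alpha * np.log(literal_listener(u, theta)[size_idx] + 1e-10)
                       for u in utterances])
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()

def pragmatic_listener(utterance):
    """L1: jointly infer size and threshold theta from the observed utterance."""
    u = utterances.index(utterance)
    joint = np.array([[size_prior[i] * theta_prior[j] * speaker(i, t)[u]
                       for j, t in enumerate(thetas)]
                      for i in range(len(sizes))])
    joint /= joint.sum()
    return joint.sum(axis=1), joint.sum(axis=0)      # marginals over size, theta

size_post, theta_post = pragmatic_listener("large")
print("P(size | 'large'):", size_post.round(3))      # mass shifts to larger sizes
print("P(theta | 'large'):", theta_post.round(3))    # theta adapts to the prior
```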

Abram Moats

I cannot help but think that were Wittgenstein alive today he’d fire off 3 tweets in response to this that would occupy scholars for a century or more.

Kaiser Basileus

Bayesian training is essentially the claim that, as long as you iterate properly when faced with new information, you will always approach truth.
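A toy check of that claim in the easy case, where the hypothesis space contains the truth:

```python
# Beta-binomial updating on coin flips: when the true bias is inside the
# model family, the posterior mean converges on it. (A toy illustration of
# the "iterate properly and approach truth" claim; convergence is not
# guaranteed once the truth lies outside the model family.)
import numpy as np

rng = np.random.default_rng(1)
true_p = 0.7
a, b = 1.0, 1.0                        # uniform Beta(1, 1) prior over the bias
for heads in rng.random(1000) < true_p:
    a, b = (a + 1, b) if heads else (a, b + 1)
print(a / (a + b))                     # posterior mean, close to 0.7
```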

Thesmara

“Of course, nonsense is also a matter of degree—e.g. metaphors are by default less meaningful than concrete claims, but still not entirely nonsense.”

Very much disagree. Nonsense is when one cannot make sense of a proposition, so that the proposition couldn't convey anything (like "this sentence is false" or "colorless green ideas sleep furiously"). Nothing gets conveyed by truly nonsensical statements.

Metaphors convey significant meaning, and there is no reason that metaphors containing a higher-level meaning are automatically less meaningful than literal statements. We even understand much of the world through metaphor rather than direct claims. Not to mention that all of art and literature is in a sense metaphorical, which I wouldn't say is less meaningful than statements of fact or logic.

Nebu Pookins

I suspect that Ngo believes there's an interesting "dimension" we can use to justify why fuzzy truth values may be more useful than binary truth values, one that is "orthogonal" to "vagueness", "approximation", etc., and Ngo chooses to label that dimension (sense vs.) "nonsense". This may or may not align with one's intuition about what the word "nonsense" means in a more general context. In essence, they are introducing a term that they intend to use in a limited technical sense for the purpose of their argument, and they are (implicitly) providing a definition for this term.

And so I'm worried that perhaps you have some pre-existing intuition of what the term "nonsense" might mean (which is fine, probably almost all English speakers do) and then you're getting stuck on this part of Ngo's argument because your intuition about what the word means doesn't match what you've inferred Ngo's definition to be. This part is "not fine" if your goal is to "understand" Ngo's argument.

I think it's analogous to a situation where a somewhat technical blog post (more formal than casual conversation but less formal than an academic paper) might say something like "I want to define the term 'sphere' to mean the set of all points equidistant from some other given point, regardless of the number of dimensions we are working in, and regardless of how we measure distance. So for example, in a 2D world where we use taxicab distance, a 'sphere' looks like a jagged-edged diamond https://en.wikipedia.org/wiki/Taxicab_geometry#/media/File:TaxicabGeometryCircle.svg "

In my analogy, the "right" thing to do, if the goal is to understand the argument that the blog post is presenting, is to just (temporarily) accept the definition the author chose, and see whether their argument makes sense when this definition is taken as an axiom. It's "not useful" to object "but that's not what a 'sphere' is!"
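As an aside, the taxicab example is easy to render concretely (a throwaway sketch):

```python
# Points at taxicab (L1) distance r from the origin on an integer grid:
# the "sphere" from the example above comes out as a diamond.
r = 3
sphere = [(x, y) for x in range(-r, r + 1) for y in range(-r, r + 1)
          if abs(x) + abs(y) == r]
print(sphere)   # (0,±3), (±3,0), (±1,±2), (±2,±1): a diamond of 12 points
```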

Kaiser Basileus

All things that are fully understood can be explained in simple physical metaphors, which express basic relationships and are the foundation of language.

Dmitry Erkin

Fantastic post! Thank you!

I am very happy that someone (you!) made the point about the model-based view of the world. I intuitively tried to argue for it but was never able to do so successfully. And you did it! Thank you!

Dmitry Erkin

How would you respond to a "Bayesian network", a model that could theoretically scale to describe the general case?

Moddy

My objection to Bayesianism and to Fuzzy Logic is that we're trying to force a model (i.e. that truth values are numbers between 0 and 1) onto the real world, where people don't actually think in those terms. Assigning numbers to a proposition gives the false impression that you can ask which of the following sentences is more true: "I can speak Spanish" or "the Sopranos was a great TV show".

Bayesianism is by itself a model, and can be useful or not according to what you want to achieve.

If you assume there's a real world, then "everything I know" is a model of the world. "Everything I can express in English" is a model of the world. "Newtonian Physics" is a model of the world, and they are all false since they don't represent the world exactly, and they are all true since they are useful in some scenarios.

I think the relationship between models is one of reduction: logic is a reduction of language, and astronomy is a reduction of physics, etc. One problem is that there is a process of transforming a question from a model to a reduced model in order to solve it, and it is often done without much consideration. Say I want to work out how much food to make: I count the guests and multiply by the amount a person would eat, and thus reduce a real-world question to mathematics. We do it automatically, and almost never try to justify it. We can make an error in the reduction (e.g. by choosing the wrong model) or by misinterpreting the results. My favorite example of the latter is claiming that gambling is irrational because the house always wins.
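Spelled out as code (numbers invented), the entire modeling judgment lives outside the arithmetic:

```python
# The reduction described above: "how much food should I make?" becomes a
# multiplication. Choosing this model (and the per-person estimate) is the
# unexamined step; the math itself is trivial.
guests = 12
grams_per_person = 350            # assumed, not derived: a modeling choice
print(guests * grams_per_person)  # 4200 g; the model, not the math, can be wrong
```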

But my real objection to Bayesianism is confirmation bias. Once you have an opinion, more and more evidence will not necessarily move you towards the "truth", but might just as easily do the opposite.

Peter Gerdes

I'd argue that the only charitable way to understand the claim that one should be a Bayesian is in terms of a certain model/idealization, as is true of classic epistemology and all the options you mentioned -- which seems to fit with your great points about model-based reasoning. But then I'm confused as to why you seem to think the idealizations made in these different epistemic approaches are right/wrong or in tension, rather than merely useful or not in various situations.

I mean, in a fully literal sense of course one 'should not' be a Bayesian given that it is clearly impossible. Logical omniscience for one. But the same can be said for any normative account of how we should reason. Almost certainly our brain/world states plus the laws of physics guarantee we won't in fact so reason. At the very deepest level the whole idea of beliefs or credences is an idealization, a good one but one that can break down if you look too closely at what's going on in the brain. All talk of how we should reason seems to inherently be a certain kind of idealization/model that ignores certain facts and attends to others.

Part of the way the Bayesian model idealizes is by assuming we can describe things in terms of sets of outcomes that do or don't materialize. That's a useful idealization in many situations and elsewhere we might find another useful.

So I guess I don't really understand the sense in which you are saying Bayesianism is wrong rather than just saying sometimes it's not the most useful idealization. Are you suggesting there is a fully literal sense in which some other model of epistemic reasoning is correct which isn't making any idealizations at all? That some other model subsumes Bayesianism in the way that QM seemingly subsumes classical EM? Some third thing?

Blake Putney

Excellent piece, Richard—your critique of Bayesian epistemology as propositional credences hits exactly why it feels brittle for real-world reasoning: the vagueness, context-dependence, nebulosity, and heavy lifting of model construction that crisp priors and updates simply don’t address.

I believe John Boyd’s OODA loop offers a cleaner, more powerful alternative for operating in that same uncertain terrain. Start from zero: everything is unknown and provisional. You Observe raw signals, Orient by synthesizing (and ruthlessly destroying) mental models—implicitly fuzzy and holistic, not proposition-by-proposition—then Decide and Act. The loop immediately feeds back, revising goals, models, or both.

The decisive variable isn’t the precision of your credences; it’s tempo. Cycle faster than the environment (or your adversary) changes, and you converge on reality—or reshape the problem—while others are still updating priors. Boyd proved this in air combat and strategy: the side that iterates quickest wins by getting inside the other’s loop.

This directly tackles the gaps you flag—Knightian uncertainty, sophisticated model-building, “vaguely right over precisely wrong.” OODA doesn’t calculate degrees of belief in advance; it tests and evolves them through action in real time. No logical omniscience required.

Strongly recommend it as the practical epistemology for anyone building in fast-moving, high-stakes domains. Would love your take on whether OODA’s implicit guidance/orientation phase could formalize some of the fuzzy/model-based reasoning you’re pointing toward.
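For concreteness, one might caricature the loop's structure like this (all stubs hypothetical; this is an illustration, not Boyd's formulation):

```python
# A purely illustrative caricature of the OODA loop: observe noisy signals,
# revise the model, act, and let the environment feed back into the next cycle.
import random

def observe(env):       return env + random.gauss(0, 0.1)  # noisy raw signal
def orient(model, obs): return 0.7 * model + 0.3 * obs     # revise the model
def decide(model):      return -0.5 * model                # corrective action
def act(env, action):   return env + action                # world responds

env, model = 5.0, 0.0                  # env = gap between belief and reality
for _ in range(40):                    # tempo: cycles per unit time
    model = orient(model, observe(env))
    env = act(env, decide(model))
print(round(env, 2), round(model, 2))  # the gap shrinks as the loop iterates
```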

Inner Peace Arts

I think the core point — not confusing confidence in propositions with structural adequacy of models — is sound. But that doesn’t overturn Bayesian reasoning so much as clarify its scope. Bayesian updating is powerful within a defined hypothesis space; model construction and evaluation operate at a different layer. Most serious practitioners already distinguish between calibrated belief and representational adequacy. So the critique feels less like a revolution and more like a reminder against overreach. It sharpens an important boundary, but it doesn’t undermine the strengths of probabilistic reasoning where it actually applies.

citrit

updating my priors based on this post

Cephalo Monk

I don’t know if it’s in this paper linked below, or in his book “Surfing Uncertainty”, but even Andy Clark knows that Predictive Processing isn’t a GUT (Grand Unifying Theory). He says something to the effect that it’s a good umbrella under which to start stitching these phenomena together. I like and can appreciate this perspective. A place at which we can start to stitch seems promising, as it allows incremental growth and decay of the theory when necessary.

https://www.fil.ion.ucl.ac.uk/~karl/Whatever%20next.pdf

JS036215

In the examples you supply to argue for fuzzy logic because of linguistic ambiguity or imprecision, you confuse words with their referents by calling words "concepts", and you confuse elliptical statements with false statements by equating alternative interpretations of elliptical statements. For example, if I refer to the tallest building in the world as "tall", you know that I mean "tall in comparison to other buildings" and not "tall in comparison to mountains", but you claim that my statement's truth is fuzzy. No, what I meant was either true or false, but that can only be decided relative to a correct interpretation of what I meant (what concepts I intended to convey) with the words that I used.

You can correctly or incorrectly interpret a statement and, independently, the product of your interpretation can be true or false. This is reality for everyone.

Model-based reasoning offers no special context of truth value determination beyond the decision whether to test the model's predictions or to test the truth of its theoretical basis.

How you use the model's predictions determines its value to you. The issue of truth comes up when you examine the model's theoretical basis. You ask, "Are the explanations offered for the model's usefulness true?" The model has a degree of usefulness independent of whether the explanations for the model's utility are true but the explanations will either be true or false, not a degree in between.

There is a different sense of "partly true" that people commonly use. For example, if I say that I went to a party and wore a suit but I actually went to a party and wore casual clothes, then you could say that my statements were partly true or had a degree of truth, but this is not an example of fuzzy truth as you intend to describe it.

Anyway, I am also not a Bayesian.

Devadatta

Probably the model of Earth used in calculations by most people, and accepted by scientifically educated people, is a point, which is the most reductionistic model possible. I'm talking about students learning Newton's law of gravity. So, which is the better model of Earth: flat or point-like?
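And the point model earns its keep: using standard constants, it reproduces surface gravity almost exactly.

```python
# Treating the Earth as a point mass at its center (per Newton's shell
# theorem) reproduces surface gravity to within a fraction of a percent.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M = 5.972e24         # Earth's mass, kg
r = 6.371e6          # mean radius, m
print(G * M / r**2)  # ~9.82 m/s^2
```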

Paulin

Is the idea that if we manage to quantify the vagueness, approximation, context-dependence, and sense of a given proposition, we can then imagine a probability distribution over this 4D space?

This is getting tough to imagine but I think it makes sense

Manjari Narayan

Thanks for bringing these ideas together. I didn't know about Popper's degrees of truth formulation before.

Your comment about Bayesianism being useful for choosing between models reminds me a bit of how statisticians who study model selection procedures evaluate these claims under three different settings — M-open, M-closed and M-complete problems. The M-closed scenario corresponds to a situation where you have a finite number of models and one of them is actually true. It is when scientific problems don't correspond to M-closed problems that classical Bayesian methods don't work well. Many Bayesian statisticians have moved towards making Bayes/non-Bayes compromises by adopting frequentist methods like cross-validation in order to tackle M-open problems.

https://link.springer.com/article/10.1007/s42113-018-0020-6
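To make the M-open point concrete, here is a toy sketch (my own construction, not from the linked paper): candidate models are scored by cross-validated predictive fit when the true process lies outside the candidate set.

```python
# M-open toy example: the true process (a sine) is not in the candidate
# set (polynomials), so we compare candidates by held-out predictive fit
# rather than by posterior model probabilities.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 80)
y = np.sin(x) + rng.normal(0, 0.3, 80)   # true process: outside the model set

def cv_log_score(degree, k=5):
    """Mean held-out Gaussian log density for a degree-d polynomial model."""
    fold = np.arange(len(x)) % k
    total = 0.0
    for f in range(k):
        tr, te = fold != f, fold == f
        coef = np.polyfit(x[tr], y[tr], degree)
        sigma = (y[tr] - np.polyval(coef, x[tr])).std() + 1e-9
        resid = y[te] - np.polyval(coef, x[te])
        total += np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - resid**2 / (2 * sigma**2))
    return total / len(x)

for d in (1, 3, 7):
    print(f"degree {d}: held-out log score {cv_log_score(d):.3f}")
```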