The "Bayesianism" being argued against here seems to actually be "Yudkowskianism" and clicking through your references digs up a lot of LessWrong posts. If so, I mostly agree with the substance of your criticism. In particular, I agree that attempting to tag probabilities onto propositions without making explicit and understanding the model that translates context into application of Bayes' rule (which I call the "Yudkowskian vice" by analogy with the "Ricardian vice" of economics) makes many if not most attempts at Internet rationalism fail before they have even started. However, common Bayesian practice outside of Internet forums (going back at least to Jaynes's seminal book, I won't make claims about the past further than that, I'm not a historian of science) does give models exactly the primacy you mention. (This is also not a philosophical innovation, but rather an ipsis litteris implementation of Quinean holism, an idea from the 1940s. And Quine was a *popular philosopher!*.) Indeed, for a few decades already we've gone past that and into explicitly conditioning first-order variables on model-valued variables, and then performing higher-order inference on model space, which indeed lets us solve (approximately) any kind of problem. For a particularly clear exposition of modern Bayesianism, check out Richard McElreath's "Statistical Rethinking", both the book and the accompanying lectures on YouTube.
On your first gripe, with "degrees of truth", I claim it's a fundamentally misguided concept. "P='The Earth is a sphere' is mostly true" is not a statement of a scalar attribute of a proposition that's just waiting to be quantified, it's a statement about the applicability of the proposition -- under which conditions it's OK to make this approximation. It's a convenient way to say "If you're trying to use the truth value of P to make claims/decisions about astronomy, then it's True; if you're trying to use it to make claims/decisions about some specific kind of engineering that cares about the exact distances to the center of the Earth, gravity, etc, then it's False." To say that "P is 99.99% true" may be logically possible in principle by somehow summing over model space, but it has no usefulness, because it misses the all-important fact that makes the "mostly true" statement useful -- under which conditions it's to be taken as true!
Ty for the helpful comment. A few quick responses:
1. I've changed the second paragraph to read "The core idea of Bayesian epistemology: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. (Note that this is different from Bayesianism as a set of statistical techniques, or Bayesianism as an approach to machine learning, which I don’t discuss here.)" Bayesian epistemology now links to the corresponding Stanford Encyclopedia of Philosophy article - while this post was inspired by Yudkowskianism, I think academic philosophers make similar mistakes.
2. After skimming the first few chapters of the McElreath book, it seems like he focuses on using Bayesian methods to learn model parameters. But my focus is on the interactions *between* models (since humans don't learn models using Bayesian methods).
3. "To say that "P is 99.99% true" ... has no usefulness". Well, I'd certainly say it's useful to know that "the earth is a sphere" is more true than "the earth is a square". You might then say that we can't pin down any real numbers; I won't address that here, but will try to do so in a follow-up post.
You're welcome, and likewise thank you for the responses. In order:
1. I haven't tracked the development of academic Bayesian epistemology, but if a philosopher makes such mistakes, then it seems to me like yet another case of using plausible-looking math as a new varnish on bad old philosophy (some kind of naive empiricism?). When looking at it from the other direction -- at what modern information and probability theory really seems to be saying about epistemology -- it seems a lot more compatible with a Quinean holism, instrumentalism a la Dennett and others, etc. (Note that one can indeed prove theorems that *assume* that there exists a certain real generative process in the world that generates observations about a system, and show that under certain conditions on its observability etc, Bayesian updating (on models, not atomic propositions!) will converge on a truthful representation of this process. However, this assumption is not at all required for operationalizing the theory, of course.)
2. McElreath covers a lot of hierarchical inference (where higher-level parameters effectively parameterize distributions over lower-level models) in later chapters, but he indeed does not cover some key parts of "structure learning" (where your candidate models come from, how to combine models). Is that your issue, or were you talking about something else?
3. Saying "P is 99.99% true" without conditioning on a model means that it is true after simultaneously marginalizing over every possible logical antecedent. I can give you a distribution over models such that "The Earth is a square" has a higher posterior probability than "The Earth is a sphere". In order to say this is wrong, you need to condition the allowable distribution of models on a wide set of definitional and observational parameters and then marginalize them away. This easily becomes uncomputable. The alternative is to a priori constrain your model set, and the interesting question is which specific degrees of freedom make a difference.
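Regarding the parenthetical in point 1: here is a minimal toy sketch of that convergence claim (my own construction, not something from the post or from McElreath). We do Bayesian updating over a small set of candidate models of a coin, and because the true generative process is assumed to be among the candidates, the posterior over models concentrates on it as data accumulate:

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = np.array([0.3, 0.5, 0.7, 0.9])   # candidate models: possible coin biases
log_post = np.log(np.full(len(candidates), 1 / len(candidates)))  # uniform prior over models

true_p = 0.7  # the real generative process, assumed to be inside the model set
for n_seen, flip in enumerate(rng.random(500) < true_p, start=1):
    # Bayes' rule in log space: add the log-likelihood of this flip under each model.
    log_post += np.log(candidates if flip else 1 - candidates)
    if n_seen in (10, 100, 500):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        print(n_seen, "flips:", dict(zip(candidates.tolist(), post.round(3).tolist())))
# The posterior mass piles onto the model p = 0.7 as flips accumulate.
```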
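And regarding point 3, a toy illustration (again my own, with made-up priors) of why the unconditional "probability that P is true" carries so little information: it is just the model-prior-weighted sum of P's truth value under each candidate model, so any number in [0, 1] can be produced by reweighting the prior over models.

```python
# Hypothetical candidate models, each assigning a definite truth value to the
# proposition "the Earth is a sphere". The priors are made up for illustration.
models = {
    "naive_astronomy": {"earth_is_sphere": True,  "prior": 0.6},
    "geodesy":         {"earth_is_sphere": False, "prior": 0.3},  # oblate spheroid: "sphere" false at this resolution
    "flat_earth":      {"earth_is_sphere": False, "prior": 0.1},
}

def prob_true(proposition, models):
    # Marginalize the proposition over the model prior: P(P) = sum_m P(P | m) P(m).
    return sum(m["prior"] for m in models.values() if m[proposition])

print(prob_true("earth_is_sphere", models))  # 0.6 here; any value is reachable by changing the prior
```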
The Earth is a sphere to a certain degree of approximation, sufficient for most purposes. There are no exact spheres in physical reality, only more or less spherical depending on what resolution of understanding is required.
Regarding vagueness and pragmatic interpretation of things like "large" and "water in the fridge", it's worth pointing out that the best models of these phenomena (in terms of accurately predicting how people will interpret utterances) are in fact probabilistic Bayesian models. For example, as outlined here https://www.problang.org/chapters/05-vagueness.html
These are models where utterance interpretation is a process whereby a listener does Bayesian inference about what an *informative* speaker would say, where that speaker is reasoning about a listener, who is in turn reasoning about a speaker, and so on recursively. The recursion bottoms out in a base case consisting of truth-value semantics, for example for a word like "large" the semantics is "x is large if x is larger than a threshold θ", where θ is a free parameter whose value ends up being inferred probabilistically as part of the recursive reasoning process. The effect is that the (distribution on the) threshold θ ends up being set with respect to a reference class that would make the utterance informative based on the world models of the simulated listeners. For example if I say "Jupiter is large" then you will end up inferring a threshold θ that would make sense for planets, and if I say "my thumb is large" you'll end up inferring a threshold θ that would make sense for bodily appendages.
Which is all to say: a more complex, but still fully Bayesian and probabilistic process, grounding out in definite truth values, actually provides a very good model for how people use vague expressions like this. There's a pretty big academic literature on models like this, of which the size threshold above is just one example.
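For readers who want to see the mechanics, here is a heavily simplified sketch of a threshold model of this kind. It is my own toy version of the setup described at problang.org, with made-up sizes, thresholds, and rationality parameter (not code from that site): the pragmatic listener jointly infers the object's size and the threshold θ from the utterance "large".

```python
import numpy as np

sizes = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical sizes in the reference class
prior = np.full(len(sizes), 1 / len(sizes))  # listener's prior over the object's size
thetas = sizes                               # candidate thresholds for "large"
utterances = ["large", "null"]               # "null" = saying nothing informative

def literal_listener(utt, theta):
    # L0: condition the size prior on the literal semantics "size > theta".
    truth = (sizes > theta).astype(float) if utt == "large" else np.ones(len(sizes))
    p = truth * prior
    return p / p.sum() if p.sum() > 0 else p

def speaker(size_idx, theta, alpha=4.0):
    # S1: prefer utterances that lead the literal listener to the true size.
    util = np.array([np.log(literal_listener(u, theta)[size_idx] + 1e-10) for u in utterances])
    p = np.exp(alpha * util)
    return p / p.sum()

def pragmatic_listener(utt):
    # L1: jointly infer the size and the threshold theta from the utterance.
    joint = np.array([[prior[i] * speaker(i, theta)[utterances.index(utt)]
                       for theta in thetas] for i in range(len(sizes))])
    return joint / joint.sum()

post = pragmatic_listener("large")
print("P(size | 'large'): ", post.sum(axis=1).round(3))   # marginal over theta
print("P(theta | 'large'):", post.sum(axis=0).round(3))   # inferred threshold distribution
```

The semantics at the bottom is a hard true/false cutoff; the gradedness of "large" emerges entirely from uncertainty over θ, which is the point being made above.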
“Of course, nonsense is also a matter of degree—e.g. metaphors are by default less meaningful than concrete claims, but still not entirely nonsense.”
Very much disagree. Nonsense is where one could not make sense of a proposition, so that the proposition couldn’t convey anything (like “this sentence is false” or “colorless green ideas sleep furiously”). Nothing gets conveyed from truly nonsensical statements.
Metaphors convey significant meaning, and there is no reason that metaphors which carry a higher-level meaning should automatically be deemed less meaningful than literal statements. We even understand much of the world through metaphor rather than direct claims. Not to mention that all of art and literature is in a sense metaphorical, which I wouldn’t say is less meaningful than statements of fact or logic.
All things that are fully understood can be explained in simple physical metaphors, which express basic relationships and are the foundation of language.
I suspect that Ngo believes there's an interesting "dimension" that we can use to justify why fuzzy truth values may be more useful than binary truth values, one that is "orthogonal" to "vagueness", "approximation", etc., and Ngo chooses to label that dimension (sense vs) "nonsense". This may or may not align with one's intuition about what the word "nonsense" means in a more general context. In essence, they are introducing a term that they intend to use in a limited technical sense for the purpose of their argument, and they are (implicitly) providing a definition for this term.
And so I'm worried that perhaps you have some pre-existing intuition of what the term "nonsense" might mean (which is fine, probably almost all English speakers do) and then you're getting stuck on this part of Ngo's argument because your intuition about what the word means doesn't match what you've inferred Ngo's definition to be. This part is "not fine" if your goal is to "understand" Ngo's argument.
I think it's analogous to a situation where a somewhat technical blog post (more formal than casual conversation but less formal than an academic paper) might say something like "I want to define the term 'sphere' to mean the set of all points equidistant from some other given point, regardless of the number of dimensions we are working in, and regardless of how we measure distance. So for example, in a 2D world where we use taxicab distance, a 'sphere' looks like a jagged-edged diamond https://en.wikipedia.org/wiki/Taxicab_geometry#/media/File:TaxicabGeometryCircle.svg "
In my analogy, the "right" thing to do, if the goal is to understand the argument that the blog post is presenting, is to just (temporarily) accept the definition the author chose, and see whether their argument makes sense when this definition is taken as an axiom. It's "not useful" to object "but that's not what a 'sphere' is!"
I'd argue that the only charitable way to understand the claim that one should be a Bayesian is in terms of a certain model/idealization as is true of classic epistemology and all the options you mentioned -- which seems to fit with the great points about model based reasoning -- but then I'm confused as to why you seem to think the idealizations made in these different epistemic approaches are right/wrong or in tension rather than merely useful or not in various situations.
I mean, in a fully literal sense of course one 'should not' be a Bayesian given that it is clearly impossible. Logical omniscience for one. But the same can be said for any normative account of how we should reason. Almost certainly our brain/world states plus the laws of physics guarantee we won't in fact so reason. At the very deepest level the whole idea of beliefs or credences is an idealization, a good one but one that can break down if you look too closely at what's going on in the brain. All talk of how we should reason seems to inherently be a certain kind of idealization/model that ignores certain facts and attends to others.
Part of the way the Bayesian model idealizes is by assuming we can describe things in terms of sets of outcomes that do or don't materialize. That's a useful idealization in many situations and elsewhere we might find another useful.
So I guess I don't really understand the sense in which you are saying Bayesianism is wrong rather than just saying it's sometimes not the most useful idealization. Are you suggesting there is a fully literal sense in which some other model of epistemic reasoning is correct which isn't making any idealizations at all? That some other model subsumes Bayesianism in the way that QM seemingly subsumes classical EM? Some third thing?
Bayesian training is essentially the claim that as long as you iterate properly when faced with new information, you will always approach truth.
Fantastic post! Thank you!
I am very happy that someone- you- made a point about model view of the world. I intuitively tried to argue that but never was able to do it successfully. And you did it! Thank you!
How would you respond to a “Bayesian network”, a model that theoretically could scale to describe the general case?
In the examples you supply to argue for fuzzy logic because of linguistic ambiguity or imprecision, you confuse words with their referents by calling words "concepts", and you confuse elliptical statements with false statements by equating alternative interpretations of elliptical statements. For example, if I refer to the tallest building in the world as "tall", you know that I mean "tall in comparison to other buildings" and not "tall in comparison to mountains", but you claim that my statement's truth is fuzzy. No, what I meant was either true or false, but that can only be decided from a correct interpretation of what I meant (what concepts I intended to convey) with the words that I used.
You can correctly or incorrectly interpret a statement and, independently, the product of your interpretation can be true or false. This is reality for everyone.
Model-based reasoning offers no special context of truth value determination beyond the decision whether to test the model's predictions or to test the truth of its theoretical basis.
How you use the model's predictions determines its value to you. The issue of truth comes up when you examine the model's theoretical basis. You ask, "Are the explanations offered for the model's usefulness true?" The model has a degree of usefulness independent of whether the explanations for the model's utility are true but the explanations will either be true or false, not a degree in between.
There is a different sense of "partly true" that people commonly use. For example, if I say that I went to a party and wore a suit but I actually went to a party and wore casual clothes, then you could say that my statements were partly true or had a degree of truth, but this is not an example of fuzzy truth as you intend to describe it.
Anyway, I am also not a Bayesian.
Probably the model of Earth used in calculations by most people, and accepted by scientifically educated people, is a point, which is the most reductionistic model possible. I'm talking about students learning Newton's law of gravity. So, which is the best model of Earth, flat or pointlike?
Is the idea that if we manage to quantify the vagueness, approximation, context-dependence, and sense of a given proposition, we can then imagine a probability distribution over this 4D space?
This is getting tough to imagine but I think it makes sense
Thanks for bringing these ideas together. I didn't know about Popper's degrees of truth formulation before.
Your comment about Bayesianism being useful for choosing between models reminds me a bit of how statisticians who evaluate model selection procedures evaluate these claims under three different settings — M-open, M-closed and M-complete problems. The M-closed scenario corresponds to a situation where you have a finite number of models and one of them is actually true. It is when scientific problems don't correspond to M-closed problems that classical Bayesian methods don't work well. Many Bayesian statisticians have moved towards making Bayes/non-Bayes compromises by adopting frequentist methods like cross-validation in order to tackle M-open problems.
https://link.springer.com/article/10.1007/s42113-018-0020-6
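As a toy illustration of the M-open point (my own example, with made-up candidate models and a data-generating process deliberately outside both of them): one can score models by leave-one-out predictive performance rather than by posterior model probability. The fits below are quick point estimates rather than full posteriors, purely to show the cross-validation logic.

```python
import numpy as np
from scipy.stats import norm, t as student_t

rng = np.random.default_rng(0)
y = 2.0 * rng.standard_t(df=3, size=80)  # true process: heavy-tailed, outside both candidates below

def loo_log_score(fit_and_score):
    # Leave-one-out cross-validation: fit on all points except i, score point i.
    return np.mean([fit_and_score(np.delete(y, i), y[i]) for i in range(len(y))])

def gaussian_model(train, y_i):
    mu, sigma = train.mean(), train.std(ddof=1)
    return norm(mu, sigma).logpdf(y_i)

def heavier_tailed_model(train, y_i):
    mu = np.median(train)
    scale = 1.4826 * np.median(np.abs(train - mu))  # robust scale estimate
    return student_t(df=3, loc=mu, scale=scale).logpdf(y_i)

print("Gaussian LOO log score:      ", round(loo_log_score(gaussian_model), 3))
print("Heavier-tailed LOO log score:", round(loo_log_score(heavier_tailed_model), 3))
```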
That’s a shame you’re not.
I’d prefer to keep updating my evidence and probabilities for everything 🙂👌
I’d prefer not to assume I know everything and understand the whole situation/distribution…
The reason why we have so many problems in the world is because it’s dominated by frequentists who assume they know everything and don’t need to update their evidence 😁
We need more Bayesians and Bayesian thinking, people open to new evidence 🙂👌
The notion of degrees of truth (taken literally rather than symbolically) implies the rejection of the law of excluded middle, and is therefore a priori, provably false. Taken symbolically, it may imply certainty about some mathematical function/model applied to a data-set, which may or may not have objective validity, and is therefore only a sophisticated assumption, not knowledge of objective reality. The phrase “mostly true” implies, logically, totally false. The qualifier “mostly” applies to the model, not to the reality that the model purports to represent, hence an implicit category mistake. Perhaps the problem here is just the inconsistent use of language, which confuses us about what is meant. When we speak in modelling terms (make scientific claims about the world as we know it), rationality demands that we signify only the implications of the model, not purport to make claims about objective reality. Nevertheless, models that do not cohere cannot possibly be true in the same world; the most coherent model is the current “world as we know it”, but this is no guarantee of objective reality or universal truth.
> The notion of degrees of truth (taken literally rather than symbolically) implies the rejection of the law of excluded middle, and is therefore a priori, provably false.
I mean, it's only a priori provably false if you assume the law of excluded middle as one of your axioms. If you don't assume it as one of your axioms, then it's (probably) not a priori provably false. And furthermore, if you assume the negation of the law of the excluded middle as one of your axioms, then it's a priori, provably true!
Perhaps you know this already, but just in case... there *are* systems of logic which are "taken seriously" which reject the law of excluded middle. https://en.wikipedia.org/wiki/Intuitionistic_logic is one example.
> The qualifier “mostly” applies to the model, not to the reality that the model purports to represent, hence an implicit category mistake.
What's an example of a qualifier that *does* apply to the reality as opposed to the model? Because, in so far as you use a language of some sort (even some sort of non-text-based language) to express such a qualifier, you can only possibly succeed at expressing it to the degree that the qualifier refers to some concept which is expressible, and thus is part of a *model* of reality, and not necessarily reality itself.
Like, if you talk about the "mass" of "particles" or whatever, we don't know whether the underlying reality actually has this concept of "mass", or if that's just your model, and reality is something more complicated of which "mass" is some sort of over-simplified emergent phenomenon.
Or if you talk about "existence" and "objectively shared", like "either an objective shared reality exists or an objective shared reality doesn't exist", again, that's just a model, and maybe reality is neither "objectively shared" nor "not objectively shared", and neither "exists" nor "doesn't exist", but is something more complicated, and a model which assumes "existence" and "objective-sharedness" are meaningful concepts is an oversimplification of whatever reality *actually* is.
The law of excluded middle is not an optional axiom, because it is one of the fundamental laws of sense, which are mutually conditional; to reject any of the laws implies rejecting them all. You cannot speak or think meaningfully without complying with it, and one cannot meaningfully defend “intuitionist logic” or any system without implicitly affirming the law of excluded middle. Even just to say that ‘intuitionist logic is valid’ implies acceptance of the law (it does not say 56% valid, which would be nonsense). For example, the expression ‘X is 99% true’ (which taken literally is nonsense, a confusion of logical types that can be formally shown to imply contradiction) is typically used to signify that, in a particular model, the proposition ‘the probability of X is 99%’ is (100%) true. Yes, humans take a lot of nonsense ‘seriously’, and suffer the consequences. All these common questions about logic are carefully answered here, with formal proofs where necessary: https://www.amazon.com/dp/1763717216
> You cannot speak or think meaningfully without complying with it, and one cannot meaningfully defend “intuitionist logic” or any system without implicitly affirming the law of excluded middle. Even just to say that ‘intuitionist logic is valid’ implies acceptance of the law (it does not say 56% valid, which would be nonsense)
I think your reasoning is incorrect.
The Law of Excluded Middle (LoEM) basically states that for all propositions, that proposition is either true or false.
You're claiming that *because* a statement like "Intuitionistic logic is valid" assigns a truth value of "true" to some proposition (e.g. the proposition that "intuitionistic logic is valid"), this "affirms" the LoEM.
It seems like perhaps you're assuming that if some logical system rejects LoEM, it may therefore never assign a truth value of "true" or "false" to any proposition, because the mere act of assigning "true" or "false" to some proposition is proof that LoEM is true.
This is a fallacy. A logical system that rejects LoEM may assign values of "true" or "false" to some or even the vast majority of propositions, but assigns some other value to a small subset of propositions.
Therefore, the mere fact that we can utter "Intuitionistic logic is valid" (or the mere fact that we might assign the truth value of "true" to the proposition "Intuitionistic logic is valid") is not sufficient to prove that the LoEM is true. You must additionally prove that no other proposition has any truth value *other* than "true" or "false".
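To make that concrete, here is a tiny sketch of one such system, Kleene's strong three-valued logic: most propositions remain plainly true or false, and "p or not-p" fails to be a tautology only for propositions carrying the third value.

```python
# Truth values: T, F, and U ("undetermined").
T, F, U = "T", "F", "U"

def neg(p):
    return {T: F, F: T, U: U}[p]

def disj(p, q):
    # Kleene's strong disjunction: true if either disjunct is true,
    # false only if both are false, undetermined otherwise.
    if T in (p, q):
        return T
    if p == F and q == F:
        return F
    return U

for p in (T, F, U):
    print(f"{p} or not-{p} =", disj(p, neg(p)))
# T or not-T = T ; F or not-F = T ; U or not-U = U  (LEM holds except at U)
```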
I am saying that rejecting LEM in any case implies its rejection as a universal law, which in turn implies contradiction. This can be formally proven.
Informally, without LEM all words lose their meaning, as they could mean anything in-between their affirmation and negation, and if one were to make additional statements affirming a definite (binary) truth-value of individual words in the primary statement, such additional statements would themselves be indefinite as to their truth value, therefore infinite regress.
Thank you for clarifying, I had misunderstood your position.
I don't think that rejecting LEM necessarily implies its contradiction (unless you (inconsistently) also assume LEM).
It seems coherent to me to say that you're going to try to develop a logical system where you're initially unsure whether LEM is true or not (but you do have some other axioms), and then see how far you can get. Maybe you'll end up proving LEM anyway in that system, or maybe you'll prove its negation, or maybe it'll turn out that your system is consistent with both LEM and not-LEM.
And I think you might be (unintentionally?) sneaking in a subtle but important equivocation error in here. One can "reject LEM" in the sense of "I'm using a logical system that takes, as an axiom, 'LEM is false'", and then one can deduce "Since (in my logic system) 'LEM is false', I can deduce that 'LEM (is true)' is false", all without relying on LEM (and thus avoiding proving that LEM is true). Specifically, your logical system can rely on "If a proposition has some truth value X, it cannot simultaneously have a different truth value Y where X ≠ Y" without relying on there only being two possible values (i.e. "true" and "false") for X and Y. That latter part is necessary for LEM, but logical systems can be coherent without it (and thus without LEM).
> Informally, without LEM all words lose their meaning, as they could mean anything in-between their affirmation and negation
I don't think this is true, empirically. Again, you can assign the definite truth values "true" and "false" to propositions in a logical system without relying on LEM. LEM requires that *all* propositions have either "true" or "false" as their truth value. Your assertion assumes that rejecting LEM means that *no* proposition has "true" or "false" as its truth value, but that is incorrect reasoning. It's sufficient for only one proposition to have a value other than "true" or "false" to produce a system that rejects LEM. All other propositions could have a truth value of either "true" or "false", and thus be "meaningful" as per your criterion for that concept.
> if one were to make additional statements affirming a definite (binary) truth-value of individual words in the primary statement, such additional statements would themselves be indefinite as to their truth value, therefore infinite regress.
I guess you're taking as an axiom that this is a bad thing? It's not obvious to me that this is necessarily bad. I agree that it's certainly much more *complicated* than classical logic, but it might be "better" than classical logic in the sense that I think it's a more accurate and realistic system for describing how intellectually honest humans actually reason.
Like, in so far as you think what you just said is coherent, what you're describing could be seen as a desirable property for some logicians working in certain domains, and thus gives a motivation for why one would *want* a logical system that rejects LEM.
The proof is rather straightforward. You can start with the formal premise that there is an X for which LEM is false, which implies that there is a Y that is both not X and not not-X, and this resolves a priori to contradiction. Or one could proceed from the law of identity instead, showing that any X for which LEM is not true is not identical to itself, or show the logical interdependence of the three laws: I did all these in the book and explained them informally as well as I could.
I am happy that I see someone besides the members of my university research circle talking about fuzzy logic.
I am not a mathematician nor a statistician. I do have about seven years of graduate study in math, specializing in dynamical systems theory. One of the things that most annoys me about the Bayesian vs frequentist debate is that the possibility of non-stationary processes is not even mentioned. Since the 1990s, with the emergence of the Santa Fe Institute and complex adaptive systems theory, why are so many statisticians stuck in the early 20th century? How can you talk about logic-based truth with a straight face? Are you all bereft of any awareness of the complexity of relationships, where logic is often tossed into the garbage? Now, if all such statisticians are Sheldon Cooper clones who haven’t an iota of common sense about how deep relationships can go, then it all makes sense.
Pardon? The FEP crowd applies the Bayesian framework to non-stationary processes all day long; see https://arxiv.org/abs/2205.11543
Good post. Some thoughts it brought up for me:
1. The point about talking in Bayesian terms about discovering scientific theories being nonsense is right on. You can actually mathematically see the point of failure. Namely, nearly all Bayesian reasoning is done on spaces of possibilities (like a space of possibly true scientific hypotheses) where to do basic things like marginalization, you assume the possibilities are exclusive and exhaustive. But science is an especially egregious area where the exhaustiveness assumption is often obviously wrong. But this problem also applies generally. You may add to the space a 'not one of the other possibilities' proposition but how do you reason with that?
Moreover, there's the problem of whether the exclusivity assumption applies to statistical models: one model might say the probability of getting heads on a coin flip is 0.5 and another 0.6; are these two models inconsistent?
2. I think thinking about Bayesianism as a special case with a range of validity is the way to think about it. I like to think about it in analogy with science where you start with theories with small ranges of validity and progressively try to expand the validity, often by taking what is valuable in the current theory and loosening the assumptions. Galilean Relativity -> Special Relativity for example.
My personal bias is that you won't get very far trying to come up with a better theory that tries to formalize fuzzy logic or properly incorporate approximating models. My bet is that it's better to start with a theory that tries for universal inference but doesn't directly codify such things. Rather, think of models and such as approximations to something underlying that, because of their approximate nature, shouldn't be expected to fit in a precise epistemology. Once you have an underlying theory, then you can potentially show when and where different approximation schema work or fail.
3. Language by its very nature seems imprecise. What if our attempts at trying to formalize language are the problem? For what it's worth, here are my ideas about it https://philpapers.org/rec/HASACT-4
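Regarding the marginalization step in point 1, spelled out: for hypotheses $H_1, \dots, H_n$ assumed mutually exclusive and exhaustive,

$$P(D) = \sum_{i=1}^{n} P(D \mid H_i)\, P(H_i), \qquad \sum_{i=1}^{n} P(H_i) = 1.$$

If the true hypothesis is not among the $H_i$, the decomposition simply has no term for it; and a catch-all "none of the above" hypothesis has no well-defined likelihood $P(D \mid H_{\text{catch-all}})$ to plug in, which is exactly the difficulty raised above.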
1. I don't understand how you imply the problem you describe should be solved in any framework "better" than Bayesianism; it seems that "solving" that problem would be equivalent to solving the Frame Problem, which is known to be Turing non-computable.
Bayesians approach this as a model sampling problem, which in turn can be also seen from the Bayesian perspective: see e.g. works on GFlowNets from Yoshua Bengio's group.
Yeah, I expect any such theory to be Turing non-computable, but that's not a deal-breaker. In particle physics, none of the sufficiently useful interacting quantum field theories have been exactly solved; they are too complex. In fact, a few mathematicians believe they have proven that all non-trivial interacting quantum field theories do not mathematically exist. But this hasn't stopped physicists from making the most accurate and precise predictions in all of science with tractable approximate calculations.
Moreover, the approximate calculations couldn't have been made without the equations describing the exact theory and defining the contours of the solution.
>I personally think that in order to explain why scientific theories can often predict a wide range of different phenomena, we need to make claims about how well they describe the structure of reality—i.e. how true they are.
Have you read Laudan's "A Confutation of Convergent Realism"? (https://www.jstor.org/stable/187066)
I find his arguments against this view very convincing. The evidence against both reference -> more progress AND progress -> more reference seems really strong to me. Plus, how would you ever empirically test this theory?