Possibly a Serious Possibility

(kucharski.substack.com)

217 points | by samclemens 13 hours ago ago

95 comments

  • chaps 12 hours ago ago

    The City of Chicago's lawyers went the opposite direction in response to @tptacek's affidavit that the release of table/column names would have "marginal value" to an attacker. The city latched onto that to get a trial that eventually went to the IL Supreme Court, which we lost.

        [I]n my affidavit, I wrote that SQL schemas would provide “only marginal value” to an attacker. Big mistake. Chicago jumped on those words and said “see, you yourself agree that a schema is of some value to an attacker.” Of course, I don’t really believe that; “only marginal value” is just self-important message-board hedging. I also claimed on the stand that “only an incompetently built application” could be attacked with nothing but its schema. Even I don’t know what I meant by that.
    
    His post: https://sockpuppet.org/blog/2025/02/09/fixing-illinois-foia/

    My post: https://mchap.io/losing-a-5yr-long-illinois-foia-lawsuit-for...
    • snitty 10 hours ago ago

      >The City of Chicago's lawyers went the opposite direction

      Not really.

      >I wrote that SQL schemas would provide “only marginal value” to an attacker. Big mistake. Chicago jumped on those words and said “see, you yourself agree that a schema is of some value to an attacker.”

      The City of Chicago's argument was that something of ANY value, no matter how insignificant, would help an attacker exploit their system, and could therefore be kept secret under the FOIA law.

      • tptacek 6 hours ago ago

        You can just read the posts before trying to rebut the plaintiff in the case. The City of Chicago argued a bunch of stuff, but what matters is what the judges decided. Chicago's "no matter how insignificant" argument failed in Chancery Court and wasn't revived either in Appeals Court or at the Supreme Court.

        Ultimately, we lost because the Illinois Supreme Court interpreted the statute such that "file layouts" were per se exempt, regardless of how dangerous they were(n't), and then decided SQL schemas were "file layouts".

        (SQL schemas are basically the opposite of file layouts, but whatever).

        • xnorswap 2 hours ago ago

          You shut down someone disagreeing because:

          > [...] what matters is what the judges decided

          But then say

          > SQL schemas are basically the opposite of file layouts

          Which is you disagreeing with what a judge has decided?

          It seems hypocritical to shut down someone arguing about one aspect of the case on that basis, only to end with your own disagreement with a judge's decision.

          • tptacek 2 hours ago ago

            No, I think you're mistaken. That the case didn't turn on how marginal a security risk was isn't a matter of opinion. That a SQL schema isn't a file layout is (though: there's clearly a right answer to that: mine).

      • fc417fc802 10 hours ago ago

        Such a literal interpretation isn't reasonable. There are all sorts of patterns that can be indirectly leaked through supposedly unrelated data. Yet FOIA exists and is obviously intended to be useful.

        So obviously there must be some threshold for the value to an attacker. He attempted to communicate that schemas are clearly below such a threshold and they used his wording to attempt to argue the opposite.

      • numpad0 9 hours ago ago

        > “only marginal value” to an attacker

        > “see, you yourself agree that a schema is of some value to an attacker.”

        IANAL, but it appears justice systems universally interpret this type of "technically yes, if that makes you happy, but honestly unlikely" statement as "yes, with a technical bonus", not as a "no with extra steps" at all, and it has to be shortened to just "unlikely from my professional perspective" or something lawyer-approved to have the intended effect. Courts are weird.

        • tptacek 5 hours ago ago

          To be clear: I think it was dumb of me to have written those hedges in my testimony, but they didn't really impact the case.

      • chaps 9 hours ago ago

        Yes really. Our argument, upheld by a judge, was that there was no value to an attacker. Their point stands legally, but nothing else.

        Despite all that, Chicago still pushes back aggressively. Here's a fun one from a recent denial letter they sent for data within the same database:

            "When DOF referred to reviewing over 300 variable CANVAS pages, these are not analog sequential book style pages of data. Instead, they are 300 different webpages with unique file layouts for which there is no designated first page."
        
        This is after I requested every field reflected within the 300 different pages because it would be unduly burdensome to go through them. I'm waiting for the city's response for the TOP page rather than the FIRST page. It's asinine that we have to do this in order to understand how these systems can blindly ruin the lives of many.

        They also argued the same 7(1)(g) exemption despite me being explicit about not wanting the column names. That effectively turns their argument into a claim that the release of any information within a database, full stop, is exempt because it could be used to figure out what data exists within the database. That's against the spirit of IL FOIA, which includes this incredibly direct statutory language:

            Sec. 1.2. Presumption. All records in the custody or possession of a public body are presumed to be open to inspection or copying. Any public body that asserts that a record is exempt from disclosure has the burden of proving by clear and convincing evidence that it is exempt.
        
        https://www.documentcloud.org/documents/25930500-foia-burden...

        https://www.documentcloud.org/documents/25930501-foia-burden...

        • tptacek 6 hours ago ago

          Upheld by several judges, in fact. :)

      • mcphage 10 hours ago ago

        > The City of Chicago's argument was that something of ANY value, no matter how insignificant, would help an attacker exploit their system, and was therefore possible to keep secret under the FOIA law.

        I’m glad that argument lost, since it totally subverts the purpose and intention of the FOIA. Any piece of information could be of value to some attacker, but that doesn’t outweigh the need for transparency.

  • hlieberman 12 hours ago ago

    It’s not just the UK that has standardized this language; the U.S. intelligence community also has a list of required terminology to use for different confidence levels and different likelihoods — and for distinguishing between the two. It’s all laid out in ICD-203, publicly available at https://www.dni.gov/files/documents/ICD/ICD-203.pdf

    I’ve found it very helpful in the same vein as RFC 2119 terminology (MUST, SHOULD, MAY, etc.): when you need your meaning to be understood by a counterparty, it helps to agree on a common language.

    • bo1024 9 hours ago ago

      Interesting. This terminology really makes no sense without more shared context, in my view. For example, I would not describe something that happens to me every month as a "remote possibility". Yet for a 3% chance event, repeated every day, monthly occurrences are what we expect. Similarly, someone who describes events as "nearly certain" would surely be embarrassed when one of the first 20 fails to happen, no?
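
      A quick back-of-the-envelope check of that intuition (a sketch in Python, assuming independent days):

          p_daily = 0.03
          p_month = 1 - (1 - p_daily) ** 30   # chance of at least one occurrence in a 30-day month
          print(round(p_month, 2))            # ~0.6, with 0.03 * 30 = 0.9 occurrences expected per month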

      • randallsquared 8 hours ago ago

        A 1-in-a-million chance seems quite remote, yet if you are rolling the dice once a millisecond...

        This applies to any repeated chance, so it probably doesn't need to be called out again when translating odds to verbal language.

        • bo1024 7 hours ago ago

          I'm not only talking about repeated events, though. If someone told me they were almost certain about 20 different events, and one failed to happen, I would doubt their calibration.

          • Nevermark 5 hours ago ago

            19 out of 20 could easily mean they get 119 right out of every 120 “almost certainties” and they just rolled a 1 on a six-sided die for the “luck” component.
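
            A rough check of that (a sketch in Python, assuming independent events and a true hit rate of 119/120):

                p_hit = 119 / 120
                p_any_miss = 1 - p_hit ** 20   # chance of at least one miss across 20 "almost certainties"
                print(round(p_any_miss, 2))    # ~0.15, roughly that 1-in-6 unlucky roll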

            One-off, 1-out-of-N errors are really hard to interpret, even with clear objective standards.

            And it emphasizes why mapping regular language to objective meanings is a necessity for anything serious, but can still lead to problematic interactions.

            Probability assessments are will almost always sometimes certainly could be considered likely hard!

  • senderista 11 hours ago ago

    I was so frustrated when I tried to get doctors to quantify their assessment of risk for a surgery my sister was about to undergo. They simply wouldn't give me a number, not even "better or worse than even odds". Finally an anesthesiologist privately told me she thought my sister had maybe a one-third chance of dying on the table and that was enough for me. I'm not sure how much fear of liability had to do with this reluctance, or if it was just a general aversion to discussing risk in quantitative terms (which isn't that hard, gamblers do it all the time!).

    • chychiu 10 hours ago ago

      Doctor here

      1. It’s generally difficult to quantify such risks in any meaningful manner

      2. Provision of any number adds liability, and puts you in a damned-if-it-does, damned-if-it-doesn’t-work-out situation

      3. The operating surgeon is not the best to quantify these risks - the surgeon owns the operation, and the anaesthesiologist owns the patient / theatre

      4. Gamblers quantify risk because they make money from accurate assessment of risk. Doctors are in no way incentivised to do so

      5. The returned chance of 1/3 probably had an error margin of +/-33% itself

      • Jach 7 hours ago ago

        Not a lawyer but I do wonder if refusal to provide any number also adds liability, especially if it can be demonstrated to a court later that a reasonable estimate was known or was trivial to look up, and the deciding party would not have gone through with the action that ended in harm if they had been provided said number. I'm also not seeing how giving a number and then the procedure working out results in increased risk, perhaps you can expand on that? Like, where's the standing for a lawsuit if everything turned out fine but in one case you said the base rate number for a knee replacement surgery was around 1/1000 for death at the hospital and 1/250 for all-cause death within 90 days, but in another case you refused to quantify?

      • fc417fc802 10 hours ago ago

        > It’s generally difficult to quantify such risks in any meaningful manner

        According to the literature 33 out of 100 patients who underwent this operation in the US within the past 10 years died. 90% of those had complicating factors. You [ do / do not ] have such a factor.

        Who knows if any given layman will appreciate the particular quantification you provide but I'm fairly certain that data exists for the vast majority of serious procedures at this point.

        I've actually had this exact issue with the veterinarian. I've worked in biomed. I pulled the literature for the condition. I had lots of different numbers but I knew that I didn't have the full picture. I'm trying to quantify the possible outcomes between different options being presented to me. When I asked the specialist, who handles multiple such cases every day, I got back (approximately) "oh I couldn't say" and "it varies". The latter is obviously true but the entire attitude is just uncooperative bullshit.

        > puts you in a damned-if-does, damned-if-it-doesn’t-work-out situation

        Not really. Don't get me wrong, I understand that a litigious person could use just about anything to go after you and so I appreciate that it might be sensible to simply refuse to answer. But from an academic standpoint the future outcome of a single sample does not change the rigor of your risk assessment.

        > Doctors are in no way incentivised to do so

        Don't they use quantifications of risk to determine treatment plans to at least some extent? What's the alternative? Blindly following a flowchart? (Honest question.)

        > The returned chance of 1/3 probably had an error margin of +/-33% itself

        What do you mean by this? Surely there's some error margin on the assessment itself but I don't see how any of us commenting could have any idea what it might have been.

        • munificent 6 hours ago ago

          > According to the literature 33 out of 100 patients who underwent this operation in the US within the past 10 years died. 90% of those had complicating factors. You [ do / do not ] have such a factor.

          Everyone has complicating factors. Age, gender, ethnicity, obesity, comorbidities, activity level, current infection status, health history, etc. Then you have to factor in the doctor's own previous performance statistics, plus the statistics of the anaesthesiologist, nursing staff, the hospital itself (how often do patients get MRSA, candidiasis, etc.?).

          And, of course, the more factors you take into account, the fewer relevant cases you have in the literature to rely on. If the patient is a woman, how do you correctly weight data from male patients that had the surgery? What are the error bars on your weighting process?

          It would take an actuary to chew through all the literature and get a maximally accurate estimate based on the specific data that is known for that patient at that point in time.

          • fc417fc802 4 hours ago ago

            No one said anything about a maximally accurate estimate. This is exactly the sort of obtuse attitude I'm objecting to.

            By complicating factors I was referring to things that are known to have a notable impact on the outcome of this specific procedure. This is just summarizing what's known. It explicitly does not take into account the performance of any particular professional, team, or site.

            Something like MRSA is entirely separate. "The survival rate is 98 out of 100, but in this region of the country people recovering from this sort of thing have been exhibiting a 10% risk of MRSA. Unfortunately our facility is no exception to that."

            If the recipients of a procedure are predominately female and the patient is a male then you simply indicate that to them. "The historical rate is X out of Y, but you're a bit unusual in that only 10% of past recipients are men. I'm afraid I don't know what the implications of that fact might be."

            You provide the known facts and make clear what you don't know. No weasel words - if you don't know something then admit that you don't know it but don't use that as an excuse to hide what you do know. It's utterly unhelpful.

          • chipsrafferty 6 hours ago ago

            > Don't they use quantifications of risk to determine treatment plans to at least some extent?

            • fiddlerwoaroof a minute ago ago

              I doubt doctors do: my guess would be that most doctors follow a list of best practices devised by people like malpractice actuaries, plus their own sense of outcomes from experience.

      • ekianjo 3 hours ago ago

        > It’s generally difficult to quantify such risks in any meaningful manner

        It's not for lack of data, that's for sure...

      • garrickvanburen 7 hours ago ago

        I would rather not have a surgeon considering failure rates ahead of any operation they're about to conduct.

        • kmoser 5 hours ago ago

          On the off chance you're not being facetious: why? Isn't it part of their job description to weigh the ups and downs of any operation before conducting it? I'd imagine failure to do so would open them to liability.

    • lwo32k 5 hours ago ago

      Gamblers are a poor example. Their decisions hardly affect anyone else, or institutions, or nations.

      Increase the cost of the fallout of a decision (your relationships, your boss's job, your org's existence, the economy, national security, etc.) and the real fun starts.

      People, no matter what they say about other people's risk avoidance, all start behaving the same way as the cost increases.

      This is why we end up with Trump-like characters up the hierarchy everywhere you look, because no one capable of appreciating the odds wants to be sitting in those chairs and being held responsible for all kinds of things outside their control.

      It's also the reason why we get elaborate signalling (costumes/rituals/pageantry/ribbons and medals/imposing buildings/PR/marketing, etc.) to shift focus away from quantifying anything. See Theory of the Leisure Class. Society hasn't found better ways to keep groups together while handling complexity the group is incapable of handling. Even small groups will unravel if there is too much focus on the low odds of a solution.

  • nickm12 6 hours ago ago

    Anyone else find the standard "probability yardstick" very misleading on the "unlikely" side? I know the whole point of the article is that English speakers can interpret these phrases differently, but calling a 1-in-3 chance "unlikely" seems off. I would shift that whole side down—30% as "a possibility", 10% as "unlikely", 5% as "highly unlikely".

    • irjustin 6 hours ago ago

      You're right, whatever they pick will be wrong, but that's "missing the forest for the trees".

      The goal is to remove uncertainty in the language when documenting/discussing situations for the state.

      It doesn't matter that it's wrong colloquially or "feels wrong". It's that when you're reading or talking about a subject with the government, you need to use a specific definition (and thus change your mental model, because everyone else is doing the same) so that no one gets misunderstood.

      Would it be better to always use raw numbers? Honestly I don't know.

      • rendaw 4 hours ago ago

        Or new words unrelated to vernacular? Like "This is a cat-3 risk". When repurposing words in common usage they're always going to be fighting against intuition and triggering confusion because of it.

        • fc417fc802 4 hours ago ago

          I was thinking this while reading it as well. Why go to all this trouble when we know in advance there will still be issues? Standardize on not using casual wording for quantified risks and instead provide the actual numbers. Something like "40 (out of 100) error 10 (plus or minus)". No more ambiguity. My offhand example even abbreviates nicely as scientific notation ie 40e10. I'm sure someone who actually spent some time on this could come up with something better.

  • christiangenco 11 hours ago ago

    I've had the same sort of difficulty with phrases like "most" or "almost all" or "hardly any"—I want these to map to unambiguous numbers like the probability yardstick referenced in this article.

    I spun up a quick survey[1] that I sent out to friends and family to try to get some numbers on these sorts of phrases. Results so far are inconclusive.

    1. https://www.vaguequantifiers.com/

    • SAI_Peregrinus 10 hours ago ago

      "Almost all" is an interesting one, because it has family of mathematical definitions in addition to any informal definitions. If X is a set, "almost all elements of X" means "all elements of X except those in a negligible subset of X", where "negligible" depends on context but is well-defined.

      If there's a finite subset of an infinite set, almost all members of the infinite set are not in the finite set. E.g. Almost all integers are not 5: the set of integers equal to five is finite and the set of integers not equal to five is countably infinite.

      Likewise for two infinite sets of different size: Almost all real numbers are not integers.

      Etc.
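
      One common formalization is the measure-theoretic one (notation here is mine, with \mu standing for whatever measure fits the context):

          \text{``almost all } x \in X \text{ satisfy } P\text{''} \iff \mu(\{x \in X : \lnot P(x)\}) = 0

      For the examples above, \{5\} is finite (hence negligible among the integers), and the rationals have Lebesgue measure zero inside the reals.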

    • mannykannot 10 hours ago ago

      The more precisely they are defined, the less frequently will you see them used correctly.

    • jbaber 11 hours ago ago

      "Almost all" in math can mean "except at every integer or fraction" :)

      • tejtm 10 hours ago ago

        I would expect almost NO numbers are rational (integer or fraction) with an infinite number of Reals between each.

        • noqc 10 hours ago ago

          in between any two real numbers, there is a rational number, and vice versa.

          • concordDance 17 minutes ago ago

            And yet somehow there are infinity times more reals than rationals...

            Very hard to get your head around!

        • JadeNB 10 hours ago ago

          > I would expect almost NO numbers are rational (integer or fraction) with an infinite number of Reals between each.

          You're right (technically correct, which is the best etc.)! That is why "almost all" can mean everything except rational numbers.

      • layer8 11 hours ago ago

        The semantics are almost always reasonable: https://en.wikipedia.org/wiki/Almost_all

      • dullcrisp 10 hours ago ago

        Sure but that’s because 100% of real numbers, by any standard measure, aren’t integers or fractions. It bothers me if it’s used to mean 95% of something though.

      • JadeNB 10 hours ago ago

        > "Almost all" in math can mean "except at every integer or fraction" :)

        I am a mathematician, but, even so, I think that this is one of those instances where we have to admit that we have mangled everyday terminology when appropriating it, and so non-measure theoretic users should just ignore our definition. (Similarly with "group," where, at the risk of sounding tongue-in-cheek because it's so obvious, if I were trying to analyze how people usually understand its everyday meaning I wouldn't include the requirement that every element have an inverse.)

  • dejobaan 12 hours ago ago

    That was a good read (and short, with a cool graph—I want to know who tagged "Almost No Chance" as 95% likely; a would-be Pratchett fan, perhaps). In biz, that's part of why I like to separate out goals ("we'll focus on growing traffic") and concrete objectives ("25% audience growth between now and June 1st").

    • patrickmay 12 hours ago ago

      But is it EXACTLY a million to one chance?

      • forrestthewoods 11 hours ago ago

        I hate “one in a million” because its meaning depends on how many times you’re rolling the die!

        I’ll never forget old World of Warcraft discussions about crit probability. If a particular sequence is “one in a million” and there are 10 million players and each player encounters hundreds or thousands of sequences per day, then “one in a million” is pretty effing common!

        • pmontra 4 hours ago ago

          One in a million is more than rolling 4 doubles in a row in backgammon (it's played with two 6-sided dice). So if a backgammon app or server starts having about 10 thousand players, it's not uncommon that every single month (or day) there is such a sequence. Some players will eventually write in a review or in a support forum that the server, the bot, the app cheats against them because of the impossible odds of what just happened. The support staff have to explain the math, with dubious results, which is ironic because every single decision in backgammon should be made with probabilities in mind.

        • gpcz 11 hours ago ago

          In functional safety, probabilities are usually clamped to an hour of use.

        • JadeNB 10 hours ago ago

          > I hate “one in a million” because its meaning depends on how many times you’re rolling the die!

          I'd argue that it doesn't depend on that at all. That is, its meaning is the same whether you're performing the trial once, a million times, ten million times, or whatever. It's rather whether the implication is "the possibility may be disregarded" or "this should be expected to happen occasionally" that depends on how many times you're performing the trial.

    • ModernMech 10 hours ago ago

      My feeling is it's a measure of the number of people who read the question wrong.

  • hunter2_ 11 hours ago ago

    "Rare" versus "common" is an interesting one. They sound like antonyms, but I don't think the typical probabilities are really symmetrical. Maybe something like 0%-10% for rare (although some sources say 5%) and something like 40%-100% for common.

    • konstantinua00 10 hours ago ago

      "common" has such a large spread because meaning behind it is sort of "at least one in each sample", where that sample can be anything (graspable)

      if you're a teacher and one student per class does the same thing - it's common. Even though it's only 1/25 or 1/30 of all students

    • Macha 10 hours ago ago

      Maybe it's the amount of video games I played in childhood that influenced this, but common and rare are just two points on a spectrum (with at least "uncommon" in between)

  • chipsrafferty 9 hours ago ago

    Why not just actually list the number you have in mind so everyone's on the same page: "we consider it a serious possibility - about 60% - that bla bla bla"?

    • andrewflnr 8 hours ago ago

      Almost no one making these statements has an actual number in mind, or they would just say it. Probably not even in intelligence, definitely not in popular usage.

      • qznc 25 minutes ago ago

        Using actual numbers requires a little bit of training but not much. I believe many would benefit from doing it.

      • kmoser 5 hours ago ago

        But if there's a standardized chart that maps the phrase to a number, certainly you'd expect whoever is writing the phrase to know what number it maps to. For the sake of simplicity, then, why not just use the number to avoid all doubt?

        • andrewflnr 4 hours ago ago

          The biggest reason is probably just that your boss told you to. But also downthread we were talking about the illusion of quantitative thought that comes from using specific numbers, which would be a slightly better reason to use words instead.

        • qmr 4 hours ago ago

          I don't think there is a standardized chart. Rather the chart shows people's subjective mental mapping of the description to a probability.

          • andrewflnr 4 hours ago ago

            The chart at the bottom of the article is explicitly a standard adopted by UK intelligence. (ed: well, proposed for adoption, it's not clear from the article how far it actually got.)

    • tasuki 9 hours ago ago

      Because then, when it doesn't happen, (dumb) people will say "see, you were wrong".

  • SoftTalker 11 hours ago ago

    I have a habit of saying "almost definitely" which I have tried to break but I still fall back to it occasionally. And I know several people who will say something is "definitely possible" or "certainly a possibility" or something along those lines. It's all insecure language we use to avoid making a statement that might turn out to be wrong.

    • rekenaut 11 hours ago ago

      I often say "definitely possible" when I am not sure what the chance of something happening is but I ought to acknowledge that it is possible. It is definitely possible that I should choose better language to communicate this.

      • smitty1e 10 hours ago ago

        When they won't quit asking, I'm "willing to commit to a definite maybe".

  • Macha 10 hours ago ago

    Who are the people that have a small bump of believing "better than even" is 10-20%? Why?

    • tempestn 9 hours ago ago

      You also see the opposite bump for most of the negative assessments. My assumption is that they're likely reading the question backwards. ie. "how unlikely" vs "how likely" or similar.

      • vpribish 8 hours ago ago

        maybe something like dyslexia but for semantics. I did some searching and couldn't find a term for this.

  • Zanni 3 hours ago ago

    I don't understand the point of standardizing language around specific numerical ranges when they could just use numbers.

    • inejge 2 hours ago ago

      They would have to use ranges, though, and I think that the non-numeric phrases flow better, which should aid comprehension.

  • didgetmaster 10 hours ago ago

    "The odds are more like a million to one!"

    "So...you're telling me there is a chance!"

  • bmurray7jhu 10 hours ago ago

    Text of NIE 29-51 "Probability of an Invasion of Yugoslavia in 1951"

    Partial HTML: https://history.state.gov/historicaldocuments/frus1951v04p2/...

    Full text PDF scan: https://www.cia.gov/readingroom/docs/CIA-RDP79R01012A0007000...

  • tempestn 9 hours ago ago

    Interesting. Two things that jumped out to me were 1) why do the regions of the standardization line not overlap or at least meet? And 2) What's up with the small but clear minority of people who took all the 'unlikely' phrasings to mean somewhere in the realm of 90 to 100%? My guess would be they're misreading the question and that is their estimate of unlikelihood?

    • pictureofabear 9 hours ago ago

      Because many people cannot or will not accept ambiguity. Charitably, I suppose this comes from a desire to logically deduce risk by multiplying the severity of the consequences by the chance that something will happen. Uncharitably, it gives decisionmakers a scapegoat should they need one.

  • jMyles 12 hours ago ago

    > Since then, some governments have tried to clean up the language of probability. After the Iraq War—which was influenced by misinterpreted intelligence

    While I laud the gracious application of Hanlon's Razor here, I also think that, for at least some actors, the imprecision was the feature they needed, rather than the bug they mistakenly implemented.

  • mempko 12 hours ago ago

    It's strange to map language to probability ranges. The guidance should be to just say the probability range. No ambiguity. Clear. Actionable and also measurable.

    • throwaway81523 12 hours ago ago

      It's still a subjective estimate, but Samotsvety (the forecasting group) does seem to work that way, and HPMOR suggested something similar. Basically, assign probabilities to less complex unknowns using numbers pulled out of your butt if that's all you can do. Then you can compute conditional probabilities of various more complicated events using those priors. At least then, a consistent set of numbers has carried through the calculation, even if those numbers were wrong at the outset. It's supposed to help your mental clarity. I guess you can also perturb the initial numbers to guess something like a hyperdistribution at the other end.

      I haven't tried this myself and haven't run across a situation to apply it to lately, but I thought it was interesting.
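
      A minimal sketch of that workflow (toy numbers only, pulled out of thin air just like the priors would be):

          # rough priors for the simpler unknowns (illustrative only)
          p_a = 0.30              # P(A)
          p_b_given_a = 0.60      # P(B | A)
          p_b_given_not_a = 0.10  # P(B | not A)

          # compound event via the law of total probability
          p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
          print(round(p_b, 2))    # 0.25 with these numbers

          # perturb the shakiest prior to get a feel for the spread at the other end
          for p in (0.15, 0.30, 0.45):
              print(p, round(p * p_b_given_a + (1 - p) * p_b_given_not_a, 2))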

      • Jach 7 hours ago ago

        If you don't perturb the initial numbers to see what changes to help construct a richer model (as well as one with tighter bounds on what it predicts or forbids), you're leaving out a lot of the benefits of the exercise. Sometimes the prior doesn't matter that much because you find you already have or can easily collect sufficient data to overcome various arbitrary priors. Sometimes different priors will result in surprisingly different conclusions, some of them you can even be more confident in ruling out because of absence of data in their predictions. (Mathematically, absence of evidence is evidence of absence, though of course it's just an inequality so the proof says nothing on whether it's weak or strong evidence.) And of course some priors are straight from the butt but others are more reasonably estimated even if still quite uncertain; in any case much like unit testing of boundary conditions you can still work through a handful of choices to see the effects.

        Someone mentioned fermi calculations, a related fun exercise in this sort of logic is the work on grabby aliens: https://grabbyaliens.com/
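
        The identity behind that parenthetical, for anyone who wants it spelled out (standard probability, assuming 0 < P(E) < 1):

            P(H) = P(H \mid E)\,P(E) + P(H \mid \lnot E)\,P(\lnot E)

        So P(H) is a weighted average of the two conditionals: if observing E would raise the probability of H, then observing \lnot E must lower it, by an amount that depends on how strongly E was expected in the first place.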

      • andrewflnr 12 hours ago ago

        > a consistent set of numbers has carried through the calculation, even if those numbers were wrong at the outset

        I kind of see how this might be useful, but what I've actually seen is an illusion of certainty from looking at numbers and thinking that means you're being quantitative instead of, you know, pulling things out of your butt. Garbage in, garbage out still applies.

        • throwaway81523 12 hours ago ago

          Yes, the potential illusion is most dangerous if you show someone else the numbers and they take them seriously. If they're only for your own calculations then you can remember what they are made of.

          • plorkyeran 11 hours ago ago

            In practice people seem to be very bad at remembering that. Pretty universally people act as though doing math on made up numbers makes them less erroneous rather than more.

            • widforss 11 hours ago ago

              That's the whole point of Fermi estimates. Find a plausible number given uncertain inputs.

            • chrisweekly 11 hours ago ago

              Yeah, mistaking precision for accuracy is a common fallacy.

              • JadeNB 10 hours ago ago

                > Yeah, mistaking precision for accuracy is a common fallacy.

                I remember an appealing description of the difference being that a precise archer might sink all their arrows at the same spot on the edge of the target, whereas an accurate archer might sink all their arrows near the bull's eye without always hitting the same spot.

    • Muromec 12 hours ago ago

      That's the other way around -- there was no probability range to begin with.

    • layer8 11 hours ago ago

      How would you possibly measure the “Probability of an Invasion of Yugoslavia in 1951”, in March 1951?

    • csours 11 hours ago ago

      Or use a histogram.

  • a3w 11 hours ago ago

    How was this not on lesswrong.com? They are all about ]0..1[.

  • photochemsyn 12 hours ago ago

    This problem crops up everywhere, especially when it's a consequential claim. E.g., when the US Department of Energy says with 'low confidence' that the SARS-CoV-2 outbreak and pandemic was 'most likely' the result of a laboratory leak, what number does that translate to on the certainty scale?

    Also, what likelihood can we assign to claims that the virus was deliberately modified at the furin cleavage site as part of a gain-of-function research program aimed at assessing the risks of species-jumping behavior in bat coronaviruses? This is a separate question from the lab escape issue, which could have involved either a collected wild-type virus or one that had been experimentally modified.

    Perhaps experts in the field 'misinterpreted the evidence' back in the early months of the pandemic, much as happened with the CIA and its 'intelligence on Iraq'?

    https://interestingengineering.com/health/us-doe-says-covid-...

    • nightpool 12 hours ago ago

      I highly recommend that you read through https://www.astralcodexten.com/p/practically-a-book-review-r... and watch the underlying debate (starting here https://www.youtube.com/watch?v=Y1vaooTKHCM), it does a really good job of laying out the arguments for and against lab leak in a very thorough and evidence-based way like you're asking for here.

      • photochemsyn 9 hours ago ago

        "Viral" by Alina Chan and Matt Ridley is worth reading. But I don't think there's much doubt now that Sars-CoV2 was the result of reckless gain-of-function research conducted jointly between China's Wuhan Institute of Virology, America's Baric Lab in North Carolina, and facilitated by funding through EcoHealth Alliance, the NIH and the NIAID. Whoopsie.

        • nightpool 6 hours ago ago

          There's a lot of doubt, actually. Peter responds to this possibility in a lot of detail in the debate I linked:

              Even if WIV did try to create COVID, they couldn’t have. As Yuri said, COVID looks like BANAL-52 plus a furin cleavage site. But WIV didn’t have BANAL-52. It wasn’t discovered until after the COVID pandemic started, when scientists scoured the area for potential COVID relatives. WIV had a more distant COVID relative, RATG-13. But you can’t create COVID from RATG-13; they’re too different. You would need BANAL-52, or some as-yet-undiscovered extremely close relative. WIV had neither.
          
              Are we sure they had neither? Yes. Remember, WIV’s whole job was looking for new coronaviruses. They published lists of which ones they had found pretty regularly. They published their last list in mid-2019, just a few months before the pandemic. Although lab leak proponents claimed these lists showed weird discrepancies, this was just their inability to keep names consistent, and all the lists showed basically the same viruses (plus a few extra on the later ones, as they kept discovering more). The lists didn’t include BANAL-52 or any other suitable COVID relatives - only RATG-13, which isn’t close enough to work.
          
              Could they have been keeping their discovery of BANAL-52 secret? No. Pre-pandemic, there was nothing interesting about it; our understanding of virology wasn’t good enough to point this out as a potential pandemic candidate. WIV did its gain-of-function research openly and proudly (before the pandemic, gain-of-function wasn’t as unpopular as it is now) so it’s not like they wanted to keep it secret because they might gain-of-function it later. Their lists very clearly showed they had no virus they could create COVID from, and they had no reason to hide it if they did.
          
              COVID’s furin cleavage site is admittedly unusual. But it’s unusual in a way that looks natural rather than man-made. Labs don’t usually add furin cleavage sites through nucleotide insertions (they usually mutate what’s already there). On the other hand, viruses get weird insertions of 12+ nucleotides in nature. For example, HKU1 is another emergent Chinese coronavirus that caused a small outbreak of pneumonia in 2004. It had a 15 nucleotide insertion right next to its furin cleavage site. Later strains of COVID got further 12 - 15 nucleotide insertions. Plenty of flus have 12 to 15 nucleotide insertions compared to other earlier flu strains.
          
          ....

              COVID’s furin cleavage site is a mess. When humans are inserting furin cleavage sites into viruses for gain-of-function, the standard practice is RRKR, a very nice and simple furin cleavage site which works well. COVID uses PRRAR, a bizarre furin cleavage site which no human has ever used before, and which virologists expected to work poorly. They later found that an adjacent part of COVID’s genome twisted the protein in an unusual way that allowed PRRAR to be a viable furin cleavage site, but this discovery took a lot of computer power, and was only made after COVID became important. The Wuhan virologists supposedly doing gain-of-function research on COVID shouldn’t have known this would work. Why didn’t they just use the standard RRKR site, which would have worked better? Everyone thinks it works better! Even the virus eventually decided it worked better - sometime during the course of the pandemic, it mutated away from its weird PRRAR furin cleavage site towards a more normal form.
          
              COVID is hard to culture. If you culture it in most standard media or animals, it will quickly develop characteristic mutations. But the original Wuhan strains didn’t have these mutations. The only ways to culture it without mutations are in human airway cells, or (apparently) in live raccoon-dogs. Getting human airway cells requires a donor (ie someone who donates their body to science), and Wuhan had never done this before (it was one of the technologies only used at the superior North Carolina site).
          • photochemsyn 6 hours ago ago

            It's equally likely that the Wuhan Institute of Virology was testing constructs created in the Baric Lab in their bat and mice models, and this was initiated during the 2014-2017 ban on gain-of-function research in the USA that Fauci vocally opposed.

            The reality here is that there are thousands of mammalian viruses that don't infect humans that could be modified to infect humans via specific modifications of their target mammalian cell-surface receptor proteins, as was done in this specific case of a bat coronavirus modified at its furin cleavage site to make it human-targetable. Any modern advanced undergrad student in molecular biology could explain this to you, if you bothered to listen.

            So first, we need an acknowledgement of Covid that vastly embarrasses China and the USA, and second we need a global treaty banning this generation of novel human pathogens from wild mammalian viral types... I guess I won't hold my breath.

            • fc417fc802 3 hours ago ago

              > there are thousands of mammalian viruses that don't infect humans that could be modified to infect humans via specific modifications of their target mammalian cell-surface receptor proteins

              Are you claiming that's what happened here? What virus do you propose was modified in such a manner? Where was it sourced from? Why was it not on the published lists of discovered viruses?

              > as was done in this specific case of a bat coronavirus modified at its furin cleavage site to make it human-targetable

              Did you even read the comment you're responding to? It details why the furin cleavage site does not resemble something that you would expect humans to have produced.