48 comments

  • intalentive 13 hours ago

    This essay is missing the words “cause” and “causal”. There is a difference between discovering causes and fitting curves. The search for causes guides the design of experiments, and with luck, the derivation of formulae that describe the causes. Norvig seems to be confusing the map (data, models) for the territory (causal reality).

    • gsf_emergency_6 12 hours ago

      A related* essay (2010) by a statistician on the goals of statistical modelling that I've been procrastinating on:

      https://www.stat.berkeley.edu/~aldous/157/Papers/shmueli.pdf

      To Explain Or To Predict?

      Nice quote

      We note that the practice in applied research of concluding that a model with a higher predictive validity is “truer,” is not a valid inference. This paper shows that a parsimonious but less true model can have a higher predictive validity than a truer but less parsimonious model.

      Hagerty+Srinivasan (1991)

      *like TFA it's a sorta review of Breiman

    • tripletao 10 hours ago

      This essay frequently uses the word "insight", and its primary topic is whether an empirically fitted statistical model can provide that (with Norvig arguing for yes, in my opinion convincingly). How does that differ from your concept of a "cause"?

      • musicale 10 hours ago

        > I agree that it can be difficult to make sense of a model containing billions of parameters. Certainly a human can't understand such a model by inspecting the values of each parameter individually. But one can gain insight by examing (sic) the properties of the model—where it succeeds and fails, how well it learns as a function of data, etc.

        Unfortunately, studying the behavior of a system doesn't necessarily provide insight into why it behaves that way; it may not even provide a good predictive model.

        • tripletao 7 hours ago

          Norvig's textbook surely appears on the bookshelves of researchers, including those building today's top LLMs. So it's odd to say that such an approach "may not even provide a good predictive model". As of today, it is unquestionably the best known predictive model for natural language, by a huge margin. I don't think that's for lack of trying, with billions of dollars or more at stake.

          Whether that model provides "insight" (or a "cause"; I still don't know if that's supposed to mean something different) is a deeper question, and e.g. the topic of countless papers trying to make sense of LLM activations. I don't think the answer is obvious, but I found Norvig's discussion to be thoughtful. I'm surprised to see it viewed so negatively here, dismissed with no engagement with his specific arguments and examples.

          • atomicnature 7 hours ago

            You can look into Judea Pearl's definitions of causality for more information.

            Pearl defines a ladder of causation:

            1. Seeing (association)
            2. Doing (intervention)
            3. Imagining (counterfactuals)

            In his view, most ML algorithms are at level 1: they look at data and draw associations. "Agents" have started taking some steps into level 2, doing.

            The smartest humans operate mostly at level 3: they see things, gain experience, and later build up a "strong causal model" of the world, becoming capable of answering "what if" questions.
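            A toy simulation (setup and numbers entirely made up) makes the rung-1 vs rung-2 distinction concrete: a hidden confounder Z drives both X and Y, so the association P(Y=1 | X=1) seen in the data differs from the interventional P(Y=1 | do(X=1)):

```python
import random

def p_y_given_x1(do_x=None, n=100_000, seed=0):
    """Estimate P(Y=1 | X=1), optionally forcing X via do(X)."""
    rng = random.Random(seed)
    hits, total = 0, 0
    for _ in range(n):
        z = rng.random() < 0.5                       # hidden confounder
        if do_x is None:
            x = rng.random() < (0.9 if z else 0.1)   # Z drives X...
        else:
            x = do_x                                 # ...unless we intervene
        y = rng.random() < (0.8 if z else 0.2)       # Z drives Y; X never does
        if x:
            hits += y
            total += 1
    return hits / total

print(p_y_given_x1())            # association: ~0.74, X "predicts" Y via Z
print(p_y_given_x1(do_x=True))   # intervention: ~0.50, X has no causal effect
```

            Seeing (rung 1) reads off the ~0.74 from passive data; doing (rung 2) requires actually setting X, which reveals that X has no effect at all.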

  • codeulike 7 hours ago

    (this is from 2017)

    • atomicnature 7 hours ago

      It was available earlier. Here's the HN history:

      https://hn.algolia.com/?query=Chomsky%20and%20the%20Two%20Cu...

      The oldest submission is from 15 years ago, that is, 2010.

      I resubmitted it thinking that, with the success of LLMs, it was worth revisiting from a "how real-world scientific progress works" point of view.

    • cubefox 6 hours ago

      No, it is from 2011. The text mentions an event in 2011, so it couldn't have been written earlier, and the first HN submission [1] was in 2011, so it also wasn't written later.

      The title should say (2011), otherwise the whole piece is confusing.

      1: https://news.ycombinator.com/item?id=2591154

  • barrenko 11 hours ago

    Is this bayesian vs. frequentist?

    • tgv 9 hours ago

      In one word: no.

      In more detail: Chomsky is/was not concerned with the models themselves, but rather with the distinction between statistical modelling in general (and "clean slate" models in particular) on the one hand, and structural models discovered through human insight on the other.

      With "clean slate" I mean models that start with as little linguistically informed structure as possible. E.g., Norvig mentions hybrid models: these can start out as classical rule based models, whose probabilities are then learnt. A random neural network would be as clean as possible.

  • tripletao 11 hours ago

    Here's Chomsky quoted in the article, from 1969:

    > But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.

    He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. That seems to have been a fundamental theoretical failing rather than the computational limits of the time, if he couldn't imagine any framework in which a novel sentence had probability other than zero.

    I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see.

    • thomassmith65 10 hours ago

      I agree that Chomsky's influence, especially in this century, has done more harm than good.

      There's no point minimizing his intelligence and achievements, though.

      His linguistics work (e.g. formal grammars) is still relevant in computer science, and his cynical view of the West has merit in moderation.

      • tripletao 10 hours ago

        If Chomsky were known only as a mathematician and computer scientist, then my view of him would be favorable for the reasons you note. His formal grammars are good models for languages that machines can easily use, and that many humans can use with modest effort (i.e., computer programming languages).

        The problem is that they're weak models for the languages that humans prefer to use with each other (i.e., natural languages). He seems to have convinced enough academic linguists otherwise to doom most of that field to uselessness for his entire working life, while the useful approach moved to the CS department as NLP.

        As to politics, I don't think it's hard to find critics of the West's atrocities with less history of denying or excusing the West's enemies' atrocities. He's certainly not always wrong, but he's a net unfortunate choice of figurehead.

        • thomassmith65 4 hours ago

          I have the feeling we're focusing on different time periods.

          Chomsky already was very active and well-known by 1960.

          He pioneered areas in Computer Science, before Computer Science was a formal field, that we still use today.

          His political views haven't changed much, but they were beneficial back when America was more naive. They are harmful now only because we suffer from an absurd excess of cynicism.*

          How would you feel about Chomsky and his influence if we ignored everything past 1990 (two years after Manufacturing Consent)?

          ---

          * Just imagine if Nixon had been president in today's environment... the public would say "the tapes are a forgery!" or "why would I believe establishment shills like Woodward and Bernstein?" Too much skepticism is as bad as too little.

          • thomassmith65 3 hours ago

            I wrote "when America was more naive" but that isn't entirely correct. Americans are more naive today in certain areas. If my comment weren't locked, I would change that sentence to something like "when Americans believed most of what they read in the newspaper"

          • jeremyjh 2 hours ago

            > Just imagine if Nixon had been president in today's environment... the public would say "the tapes are a forgery!" or "why would I believe establishment shills like Woodward and Bernstein?" Too much skepticism is as bad as too little.

            Today it would not matter in the least if the president were understood to have covered up a conspiracy to break into the DNC headquarters. Much worse things have been dismissed or excused. Most of his party would approve of it and the rest would support him anyway so as not to damage "their side".

    • techsystems 9 hours ago

      He did say "any known" back in 1969, though, so judging it against today's knowledge is not a fair test of how well the idea has aged.

      • tripletao 8 hours ago

        Shannon first proposed Markov processes to generate natural language in 1948. That's inadequate for the reasons discussed extensively in this essay, but it seems like a pretty significant hint that methods beyond simply counting n-grams in the corpus could output useful probabilities.
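        Shannon's scheme can be sketched in a few lines (the toy corpus below is my own, purely illustrative): count bigram transitions in a corpus, then use the conditional frequencies to score or sample continuations:

```python
import random
from collections import Counter, defaultdict

# Toy corpus; a real corpus works the same way, just bigger.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Count bigram transitions: counts[w1][w2] = times w2 followed w1.
counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def next_word_prob(w1, w2):
    """Estimate P(w2 | w1) from the bigram counts."""
    total = sum(counts[w1].values())
    return counts[w1][w2] / total if total else 0.0

def generate(start, n, seed=0):
    """Sample up to n more words by walking the bigram chain."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        options = counts[out[-1]]
        if not options:          # dead end: word never seen mid-corpus
            break
        words, weights = zip(*options.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

print(next_word_prob("the", "cat"))  # 0.25: "the" precedes cat/mat/dog/rug
print(generate("the", 6))
```

        Even this crude model assigns a nonzero, graded probability to each next word, which is the sense in which "probability of a sentence" can be made useful.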

        In any case, do you see evidence that Chomsky changed his view? The quote from 2011 ("some successes, but a lot of failures") is softer but still quite negative.

    • dleeftink 11 hours ago

      > novel sentence

      The question then becomes one of actual novelty versus the learned joint probabilities of internalised sentences/phrases/etc.

      Generation or regurgitation? Is there a difference to begin with..?

      • tripletao 9 hours ago

        I'm not sure what you mean? As the length of a sequence increases (from word to n-gram to sentence to paragraph to ...), the probability that it actually ever appeared (in any corpus, whether that's a training set on disk, or every word ever spoken by any human even if not recorded, or anything else) quickly goes to exactly zero. That makes it computationally useless.

        If we instead define perplexity in the usual NLP way, the model-assigned probability still shrinks as the sequence gets longer, but it does so smoothly and never reaches exactly zero. This makes it useful for sequences of arbitrary length. The latter metric is so obviously better that it seems ridiculous to me to reject all statistical approaches based on the former. That's with the benefit of hindsight for me; but enough of Chomsky's less famous contemporaries judged correctly at the time that I get that benefit, that LLMs exist, etc.
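        Concretely (with a hypothetical model and made-up per-token probabilities): the joint probability of a sequence shrinks geometrically with its length, while perplexity, the length-normalised version, stays bounded:

```python
import math

def sequence_stats(token_probs):
    """Return (joint probability, perplexity) for per-token probabilities."""
    log_prob = sum(math.log(p) for p in token_probs)
    joint = math.exp(log_prob)                            # -> 0 as length grows
    perplexity = math.exp(-log_prob / len(token_probs))   # length-normalised
    return joint, perplexity

# A hypothetical model that assigns each token probability 0.5:
for n in (10, 100, 1000):
    joint, ppl = sequence_stats([0.5] * n)
    print(f"{n:5d} tokens: joint = {joint:.3e}, perplexity = {ppl:.2f}")
```

        For 1000 tokens the joint probability is on the order of 1e-301 while the perplexity is still 2.0, which is why per-token measures are the standard way to compare language models.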

        • dleeftink 8 hours ago

          My point is, that even in the new paradigm where probabilistic sequences do offer a sensible approximation of language, would novelty become an emergent feature of said system, or would such a system remain bound to the learned joint probabilities to generate sequences that appear novel, but are in fact (complex) recombinations of existing system states?

          And again the question being, whether there is a difference at all between the two? Novelty in the human sense is also often a process of chaining and combining existing tools and thought.

    • agumonkey 10 hours ago

      Wasn't his grammar classification revolutionary at the time? It seems it influenced parsing theory later on.

      • eru 10 hours ago

        His grammar classification is really useful for formal grammars of formal languages. Like what computers and programming languages do.

        It's of rather limited use for natural languages.

        • koolala 3 hours ago

          "BNF itself emerged when John Backus, a programming language designer at IBM, proposed a metalanguage of metalinguistic formulas ... Whether Backus was directly influenced by Chomsky's work is uncertain."

          https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form

          I'm not sure it required Chomsky's work.

        • adamddev1 3 hours ago

          It's incredibly useful for natural languages.

        • ogogmad 6 hours ago

          Don't you think people would have figured it out by themselves the moment programmers started writing parsers? I'm not sure his contribution was particularly needed.

  • bo1024 12 hours ago

    Is this essay from 2011?

  • cubefox 6 hours ago

    (2011)

  • pmkary 7 hours ago

    I have many books by Chomsky, and I want to throw them away because it disgusts me to have them. Then I think, why should I throw away things I spent so much on? That makes me angrier. So I have piled them up somewhere while I figure out what to do with them, and each time I walk past the pile I feel sad that I ever engaged with his work.

    • IndySun 27 minutes ago

      Make sure to vet your entire circle - friends, relatives, books, movies, everything... it's going to take a while. In the meantime you'll stop learning/growing too.

      My suggestion is as ludicrous as damning someone by association.

    • rixed 2 hours ago

      Are you reacting with as much intensity when you walk past any scientific work older than 20 years?

    • eucyclos 4 hours ago

      There's an interview with Dan Schmachtenberger where he talks about the worst book ever written (his opinion is that it's 'The 48 Laws of Power'). He made the point that being consistently wrong is actually pretty impressive, and that there are worthwhile lessons in watching someone get taken seriously despite being wrong. Maybe you could revisit them with that approach.

      • aleph_minus_one 2 hours ago

        > There's an interview with Dan Schmachtenberger where he talks about the worst book ever written (his opinion is that it's 'The 48 Laws of Power').

        Could it be this?

        > https://www.youtube.com/watch?v=eIzRV4TxHo8

      • malvim 2 hours ago

        I don’t think they’re disgusted by Chomsky’s work because it’s wrong. They’re disgusted because of the recently surfaced ties with Epstein.

        Not sure the approach holds.

    • f1shy 6 hours ago

      I assume this comes from his political views and/or his association with people like Epstein. I would say that, independent of that, some of his works can be very valuable. A person's private life and their work are better kept in entirely separate contexts, not mixed.

      • darubedarob 4 hours ago

        Is that a Wernher von Braun quote?

      • spwa4 5 hours ago

        The thing is, none of the usual mitigating arguments apply to Chomsky. What he did was most certainly not a normal thing to do in his time, unlike what one might say about George Washington, or even further back, Clovis. By today's standards they were morally wrong, but not by the standards of their time, and they advanced morals. They made things better.

        Chomsky is wrong by the standards of his own time and made things worse rather than better.

        It was also very much the opposite of Chomsky's ideology. So it additionally means he's fake, on both his morals and his politics/activism, from both sides (i.e. both helping a paedophile and helping/entertaining a billionaire).

        So it's (yet another) case of an important figure who supposedly stands for something not just demonstrating that he stands for nothing at all, but being a disgusting human being as well.

        • mikojan 5 hours ago

          > It was very much the opposite of Chomsky's ideology as well.

          On the contrary. Chomsky was open about his civil-libertarian principles: If you are convicted, and you complete your court-ordered obligations, you have a clean slate.

          • spwa4 2 hours ago

            Tell me, did that attitude extend to helping billionaires who were having sex with minors? Because that's what he did. Is that what this ideology stands for?

            • mikojan 20 minutes ago

              Yes, of course. It is the whole point. Nobody cares about your 20 year old parking tickets.

    • andyjohnson0 6 hours ago

      I don't understand. What is it about Chomsky's work that disgusts you? Or is this a reference to his political opinions?

      • darubedarob 3 hours ago

        His support for Russian imperialism and his broad rejection of the Eastern European civilian uprisings against the communist project. Like many idealists he took a utopian, idealizing view and ran with it, reality and the real suffering it caused be damned. Like many idealists he offered, in effect, an API for sociopaths: something to be hijacked and used as a useful idiot against humanity. That path predictably leads to ruin and ashes as a legacy, and it did so for him. The Epstein connection is just the cherry on top.

        • wanderlust123 3 hours ago

          Sounds like a bit of an over-reaction, if I am being honest.

          Some of his books are deeply insightful even if you decide to draw the opposite conclusion. I wouldn't say anything in them would create disgust unless you had a conclusion you wanted supported before reading the book.

          Regarding the Epstein thing, it's bizarre to bring that up when discussing his works; it seems like you hate him on a personal level.

      • cubefox 6 hours ago

        Read the article above. There is a link at the top of this submission to an essay by Peter Norvig, arguing (correctly, in retrospect) that Chomsky's approach to language modelling is mistaken.

        • andyjohnson0 5 hours ago

          Obviously I did read the article. And I know how the hn site works.

          I have a passing familiarity with the debate over Chomsky's theories of universal grammar etc. I didn't notice anything in the article that would cause disgust, and so I wondered what I was failing to understand.

          • cubefox 5 hours ago

            If you have read many books by Chomsky, it might make you angry that you have wasted so much time on what turned out to be a fundamentally mistaken theory.