How can AI ID a cat?

(quantamagazine.org)

104 points | by sonabinu 3 days ago

26 comments

  • reilly3000 6 minutes ago

    Long have I wanted a cat door that would only open for my cats, not the mean neighborhood one that eats their food. I can’t be the only one. I’ve been meaning to try to build one with a camera, rPi and Google Coral, but never got around to it. There’s the matter of the locking mechanism and more.

  • bdcravens 4 hours ago

    I have six animals, and Apple Photos does a great job of recognizing them by name after I labeled them the first time (the office dog as well). Two of them, however, are gray tabbies (brothers), and it can't distinguish them, so I had to name them with an ampersand ("Harley & Ralph Lauren").

    I'm impressed that it does as well as it does; I just find that amusing.

    • javchz 2 hours ago

      Same with Google Photos: it groups similar-looking cats as just one. Fun fact: it does the same for human twins.

    • mshockwave 3 hours ago

      Came to say Apple also did a great job tagging my bois, who are both grey-ish cats, even in pictures where they faced backward. No idea how it did that.

      • dhosek 3 hours ago

        What I found impressive was that Apple Photos, given pictures of my cousins when they were 50 or more years old, was able to identify pictures of them as kids. On the other hand, it could never consistently distinguish between my two older brothers (although to be fair, they were identical twins). It also insists that a beagle I once owned was a cat. I mean, sure, he sometimes slept on his back with his paws in the air like a cat, but he was all dog.

  • bc569a80a344f9c 5 hours ago

    An interesting follow-up is using various xAI (explainable AI) techniques to then investigate what features in an image the classifier uses to make its decisions. Saliency maps work great for images. When I was playing around with it, the binary classifier I trained from scratch to distinguish cats from dogs ended up basically only looking at eyes. Enough images in the dataset featured cats with visible, open eyes, and the vertical slit is an excellent predictor. It was an interesting lesson that also emphasized how much the training data matters.
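    A minimal sketch of the gradient-based saliency idea, assuming a toy logistic-regression "classifier" over a flattened 3x3 image rather than a trained CNN. The weights are hypothetical, chosen so the model leans on two "eye" pixels. Saliency here is the magnitude of d(score)/d(pixel), estimated with central finite differences instead of backpropagation, so it works (slowly) for any black-box scoring function.

```python
import math

def toy_cat_score(pixels, weights, bias=0.0):
    """Hypothetical toy classifier: logistic regression over flattened pixels."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def saliency_map(score_fn, pixels, eps=1e-4):
    """Gradient-magnitude saliency via central finite differences:
    |d score / d pixel_i| for each pixel i."""
    grads = []
    for i in range(len(pixels)):
        up = list(pixels)
        up[i] += eps
        down = list(pixels)
        down[i] -= eps
        grads.append(abs(score_fn(up) - score_fn(down)) / (2 * eps))
    return grads

# 3x3 "image" flattened row-major; hypothetical weights make the model
# rely almost entirely on the two "eye" pixels in the middle row.
weights = [0.0, 0.0, 0.0,
           2.0, 0.1, 2.0,
           0.0, 0.0, 0.0]
image = [0.5] * 9

sal = saliency_map(lambda px: toy_cat_score(px, weights), image)
top = sorted(range(9), key=lambda i: sal[i], reverse=True)[:2]
print(sorted(top))  # [3, 5] -> the "eye" pixels dominate the decision
```

    With a real network you would get the same quantity from autograd in a single backward pass; the finite-difference loop is only for illustration.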

    • krackers 2 hours ago

      This article seemed really basic, no insight other than "it learns the high dimensional manifold on which cat images lie, thus separating cats from non-cats" (not that simple explanations are bad, but Quanta articles seem to be getting more watered down over time).

      The real question is whether we can get some insight into how exactly it's able to do this. For convolutional neural networks, it turns out that you can isolate and study the behavior of individual circuits and try to understand what "traditional image processing" function they perform, and that gives some decent intuition: https://distill.pub/2020/circuits/ - CNNs become less mysterious when you decompose them into "edge detectors, curve detectors, shape classifiers, etc."

      For LLMs it's a bit harder, but Anthropic did some research in this vein.
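      To make the "edge detector" intuition concrete, here is a sketch of what a single convolutional filter computes, assuming a hand-set Sobel kernel rather than a learned one (early CNN layers often learn filters that look broadly similar). The function is a "valid" 2D cross-correlation, which is what deep-learning libraries compute under the name "convolution".

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation (what CNN layers compute), pure Python."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            # Slide the kernel over the image and sum elementwise products.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# Sobel kernel that responds to left-to-right intensity changes (vertical edges).
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4x6 image: dark left half, bright right half, so one vertical edge.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

response = conv2d(image, sobel_x)
print(response)  # [[0, 36, 36, 0], [0, 36, 36, 0]]
```

      The response is large only in the columns that straddle the dark-to-bright boundary and zero over flat regions; stacking many such filters, then filters over their outputs, is roughly how the curve and shape detectors in the Distill work arise.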

    • cco 5 hours ago

      ExAI feels like a better shortening, both for clarity and given that xAI is a company already.

  • trjordan an hour ago

    One of the funny things about LLMs and modern AI is that "the ability to recognize a cat" isn't a trained behavior anymore, as described here. It's an emergent property of training a model to predict a lot of things, and cats happen to be present enough in the data that they're one of the things you can ask a larger model about and have it work.

    My favorite work on digging into the models to explain this is Golden Gate Claude [0]. Basically, the folks at Anthropic went digging into the many-level, many-parameter model and found the neurons associated with the Golden Gate Bridge. Dialing it up to 11 made Claude bring up the bridge in response to literally everything.

    I'm super curious to see how much of this "intuitive" model of neural networks can be backed out effectively, and what that does to how we use it.

    [0] https://www.anthropic.com/news/golden-gate-claude
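    A toy sketch of what "dialing a feature up" can mean, under a big simplifying assumption: a feature is treated as a unit direction in a single activation vector, and clamping sets the activation's component along that direction to a fixed value while leaving the orthogonal part alone. Anthropic's actual work used features learned by a sparse autoencoder inside a real model; the vectors and the `clamp_feature` helper below are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def clamp_feature(activation, direction, target):
    """Hypothetical helper: return a copy of `activation` whose component
    along the unit vector `direction` equals `target`; the part of the
    activation orthogonal to `direction` is unchanged."""
    current = dot(activation, direction)
    return [a + (target - current) * d for a, d in zip(activation, direction)]

# Hypothetical activation vector and "Golden Gate" feature direction.
activation = [1.0, 2.0, 3.0]
direction = normalize([3.0, 4.0, 0.0])

print(round(dot(activation, direction), 6))  # 2.2 (feature barely active)
steered = clamp_feature(activation, direction, target=10.0)
print(round(dot(steered, direction), 6))     # 10.0 (feature clamped high)
```

    In a real model this edit would be applied at some layer on every forward pass, which is why the steered concept leaks into every answer.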

  • cmpalmer52 3 hours ago

    Just an anecdote, but back in college, I had an algorithms professor who gave us a classifier problem like the square and triangle boundary problem. His English was poor and nobody understood the problem as he stated it. I got an okay score on it, but never understood it very well.

    Anyway, it’s 40 years later, and I just read this article and said, “Oh! Now I get it.” A little too late for Dr. Hippe’s class.
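    For anyone else who never quite got that kind of assignment: assuming the problem is two classes of 2D points (say "squares" and "triangles") that a straight line can separate, the classic perceptron learns such a boundary. The data below is made up for illustration and is not the article's example.

```python
def train_perceptron(points, labels, epochs=100, lr=1.0):
    """Classic perceptron: learn w, b so that sign(w.x + b) matches the
    labels (+1 / -1). Converges if the classes are linearly separable."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            # A point on the wrong side (or on the line) triggers an update
            # that nudges the boundary toward classifying it correctly.
            if label * (w[0] * x + w[1] * y + b) <= 0:
                w[0] += lr * label * x
                w[1] += lr * label * y
                b += lr * label
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

def predict(w, b, point):
    x, y = point
    return 1 if w[0] * x + w[1] * y + b > 0 else -1

# Hypothetical data: "squares" (+1) upper right, "triangles" (-1) lower left.
points = [(2.0, 3.0), (3.0, 2.5), (2.5, 4.0),
          (-1.0, -2.0), (-2.0, -0.5), (-1.5, -1.5)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(points, labels)
print([predict(w, b, p) for p in points])  # [1, 1, 1, -1, -1, -1]
```

    The perceptron is only guaranteed to converge when such a line exists; curved boundaries, or the high-dimensional pixel data a cat classifier sees, need richer models.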

  • BobbyTables2 2 hours ago

    Wasn’t it “The Hitchhiker’s Guide to the Galaxy” that humorously described an AI-controlled train system failing because it was looking at the clock instead of the trains?

    Seems extremely prescient…

  • isopede 5 hours ago

    Neat. Anyone know what is used to make the animations? I like the graphic design!

    • cwmoore 5 hours ago

      Small but effective visual cues, smooth and carefully chromatic.

      I am struck by the conceptual framework of classification tasks so snappily rendering clear categories from such fuzziness.

  • StrandedKitty 5 hours ago

    For some reason I thought this article would explain how to ID a specific cat, that is, basically facial recognition for cats.

    Is this even something that's possible with current tech? Like, surely cats have some facial features that can be used to uniquely identify them? It would be cool to have a global database of all cats that users would be able to match their photos against. Imagine taking a picture of a cat you see on the street, and it immediately tells you the owner's details and whether it's missing.

    • tanelpoder 5 hours ago

      I wrote the CatBench vector search playground toy app exactly for this reason! [1] ("cat-similarity search for recommendation engines and cat-fraud detection"). I built it both for learning & fun, but it's also useful for demoing vector search functionality plugged into regular RDBMS application schemas in a business context. I used cats & dogs because they're something everyone understands, instead of diving deep into some narrow, industry-vertical-specific use case.

      [1]: https://tanelpoder.com/posts/catbench-vector-search-query-th...
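      A minimal sketch of the core of cat-similarity search, assuming each photo has already been turned into an embedding vector by some image model. The three-dimensional vectors and names below are hypothetical (real embeddings have hundreds of dimensions), and this is not CatBench's implementation: it simply ranks stored embeddings by cosine similarity to a query, by brute force.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, catalog, k=2):
    """Brute-force nearest neighbours: score every stored embedding
    against the query and return the names of the k most similar."""
    scored = [(cosine_similarity(query, emb), name) for name, emb in catalog.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical embeddings for a few known animals.
catalog = {
    "tabby_1":      [0.9, 0.1, 0.3],
    "tabby_2":      [0.85, 0.15, 0.35],
    "neighbor_cat": [0.1, 0.9, 0.2],
    "beagle":       [0.2, 0.3, 0.9],
}

# Embedding of a new photo of an unknown cat (also hypothetical).
query = [0.88, 0.12, 0.32]
print(top_k(query, catalog))  # ['tabby_1', 'tabby_2']
```

      Note how close the two tabbies score, which mirrors the Harley & Ralph Lauren problem upthread. Production vector search, whether in a dedicated vector database or an RDBMS extension, typically replaces the linear scan with an approximate nearest-neighbor index such as HNSW.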

    • dhosek 3 hours ago

      I imagine when they run out of other sensors to add to our phones, they’ll add chip readers so you can just scan for the implanted microchip on a cat you encounter. (said semi-sarcastically, since the tech requires close proximity between animal and reader, which most cats you encounter on the street will not countenance)

      • ch4s3 2 hours ago

        > which most cats you encounter on the street will not countenance

        Maybe not with you ;)

  • Veliladon 4 hours ago

    I have a Finnish Lapphund, and from the right angle AI thinks it's a cat.

  • spacecadet 2 hours ago

    Many years ago one of our cats got out; she was gone for 3 weeks, and we tracked her down using 6 game cameras. Long story short, I have 200,000 images of "wild life"... Last year I used a VLM to catalog all of the images by generating detailed descriptions. I was able to find images of our cat in 3 searches: the same images we had originally used to identify her, which back then took hours each day of combing through thousands of images.
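    The comment doesn't say how the search itself worked, so this is only a sketch under assumptions: the VLM-generated descriptions are already stored as plain text keyed by filename (the filenames and captions here are invented, and the VLM call is omitted), and a search returns every image whose description contains all of the query words. Embedding-based semantic search would be a natural alternative.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search(descriptions, query):
    """Return filenames whose description contains every query word,
    sorted by filename."""
    terms = tokenize(query)
    return sorted(img for img, desc in descriptions.items() if terms <= tokenize(desc))

# Hypothetical VLM-generated captions for a few game-camera frames.
descriptions = {
    "cam3_0412.jpg": "A small gray cat with a white chest crossing a gravel path at night.",
    "cam1_0098.jpg": "A raccoon investigating a trash can under infrared light.",
    "cam5_1203.jpg": "Deer grazing near the tree line at dawn.",
    "cam3_0415.jpg": "Gray cat sitting beside a fence post, eyes reflecting the flash.",
}

print(search(descriptions, "gray cat"))  # ['cam3_0412.jpg', 'cam3_0415.jpg']
```

    Even this crude matching turns "comb through thousands of frames" into a lookup, because the expensive visual understanding happened once, up front, when the captions were generated.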

  • busymom0 5 hours ago

    Probably one of the first articles on this topic that I've read to the end and fully understood. Thanks.