Square Minus Square – A coding agent benchmark

(aedm.net)

13 points | by Topfi 7 days ago

1 comment

  • wariatus an hour ago

    Have you tried giving those agents access to a grounded vision model to analyse that image?

    In my experience, most models can’t understand such input properly.

    I am now experimenting with Molmo2, and it looks promising.