Square Minus Square – A coding agent benchmark

(aedm.net)

13 points | by Topfi 7 days ago

1 comment

wariatus an hour ago
Have you tried giving those agents access to a grounded vision model to analyse that image?
In my experience, most models can’t understand such input properly.
I am now experimenting with Molmo2, and it looks promising.