Why SWE-bench Verified no longer measures frontier coding capabilities

(openai.com)

96 points | by kmdupree 4 hours ago

72 comments