Astro - Hacker News

8 comments

ej88 4 minutes ago

This is cool!
I used to work on post-training & evals. Excited to see more from poolside
pratio 43 minutes ago

Are you guys affiliated with https://poolside.fm/ or https://poolsuite.net?
- colesantiago 6 minutes ago
  
  They are not, although Poolside FM was the first one to use the "Poolside" name.
  This is the reason Poolside AI filed a trademark infringement claim against "Poolside FM", which forced them to change their name to "Poolsuite".
  https://x.com/Poolsuite/status/1398007075435843592
  This annoyed the founder of Poolsuite, as they had ripped off his brand.
  https://x.com/marty/status/1932386087390818635?s=46
fsh an hour ago

I don't get the point. The model has presumably been trained on all public GitHub code, so the evaluation is tainted anyway.
- ej88 3 minutes ago
  
  SWE-Bench Pro has public and private test sets; the private eval is drawn from proprietary codebases only.
- adrian_b 33 minutes ago
  
  A couple of days ago there was another thread about an experiment with many LLMs, in which the Anthropic models especially were found to "cheat" in a large percentage of the benchmarked coding tasks, by searching the Internet for appropriate code and inserting it into the program they had to write.
  The conclusion of that study was that when benchmarking LLMs for coding ability, they should not have access to the Internet if you want to know their intrinsic abilities.
  Moreover, this can be worrisome as a more direct form of copyright infringement than the one caused by training: even if the code they find on the Internet and insert into the generated files is open source, it almost certainly has a license that prohibits removing the copyright notice.
schnitzelstoat 2 hours ago

It was an interesting read - perhaps I misunderstood the part about blocking GitHub, but is it not possible just to block it from accessing that specific repo?
- changoplatanero 2 hours ago
  
  In theory, yes, blocking a specific repo is possible. In practice it is more difficult, as the repo could be cloned under different names and you might have hundreds of training tasks that you need to configure this for. So it would be a lot of work to verify that you blocked them one by one.
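  To illustrate the point about clones, here is a minimal sketch of a name-based blocklist, not anything described in the article. The `BLOCKED_REPOS` set, the `is_blocked` helper, and the repo names are all hypothetical. It matches only on the owner/repo pair in a github.com URL, which is exactly why a copy pushed under a different name would slip through.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: (owner, repo) pairs for the repos behind each task.
# In practice there could be hundreds of these, one or more per task.
BLOCKED_REPOS = {("example-org", "example-repo")}


def is_blocked(url: str) -> bool:
    """Return True if the URL points at a blocklisted github.com repo.

    Only the owner/repo path segments are compared, so this cannot
    recognise the same code re-uploaded under another name.
    """
    parsed = urlparse(url)
    if parsed.hostname not in {"github.com", "www.github.com"}:
        return False
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return False
    owner, repo = parts[0].lower(), parts[1].lower()
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return (owner, repo) in BLOCKED_REPOS


if __name__ == "__main__":
    for candidate in (
        "https://github.com/example-org/example-repo",
        "https://github.com/someone-else/renamed-copy",
    ):
        print(candidate, is_blocked(candidate))
```

  The second URL stands in for a renamed clone: same code, different owner and name, and the check lets it through. Catching those would need content-based matching or per-task verification, which is the manual work described above.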