It's worth noting here that the author came up with a handful of good heuristics to guide Claude and a very specific goal, and the LLM did a good job given those constraints. Most seasoned reverse engineers I know have found similar wins with those in place.
What LLMs are (still?) not good at is one-shot reverse engineering for understanding by a non-expert. If that's your goal, don't blindly use an LLM. People already know that blindly getting an LLM to write prose or code is risky, but it's worth remembering that doing this for decompilation is even harder :)
Are they not performing well because they are trained to be more generic, or is the task too complex? It seems like a cheap problem to fine-tune.
Sounds like a more agentic pipeline task. Decompile, assess, explain.
For anyone else who was initially confused by this, useful context is that Snowboard Kids 2 is an N64 game.
I also wasn't familiar with this terminology:
> You hand it a function; it tries to match it, and you move on.
In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
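To make that concrete, here's a minimal Python sketch of the final step of a match check: comparing the bytes your compiled C produced against the bytes extracted from the original ROM, word by word. Real N64 decomp projects automate the compile-and-compare cycle with dedicated tooling; this only illustrates the byte-for-byte comparison, and the example bytes are arbitrary.

```python
# Sketch of a decomp "match" check: a candidate function matches when its
# compiled bytes are identical to the bytes of the original function in the
# ROM. MIPS instructions are 4 bytes, so we compare 32-bit words.

def diff_words(candidate: bytes, target: bytes) -> list[int]:
    """Return the byte offsets of mismatched 32-bit instruction words."""
    mismatches = []
    n = max(len(candidate), len(target))
    for off in range(0, n, 4):
        if candidate[off:off + 4] != target[off:off + 4]:
            mismatches.append(off)
    return mismatches

# An empty diff means the function "matches":
original = b"\x27\xbd\xff\xe8" * 2   # e.g. addiu $sp, $sp, -24 twice
assert diff_words(original, original) == []
```

A non-empty result tells you which instructions the compiler emitted differently, which is exactly the feedback loop a human (or an LLM) iterates on.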
Makes me wonder if decompilation could eventually become so trivial that everything would become de facto open source.
It would be "source available", if anything, not "open source".
> An open-source license is a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified or shared (with or without modification) under defined terms and conditions.
https://en.wikipedia.org/wiki/Open_source
Companies have been really abusing what open source means: claiming something is "open source" because they share the code, then attaching a license that says you can't use any part of it in any way.
Similarly, if you've ever used that software, or depending on where you downloaded it from, you might have agreed not to decompile or read the source code. Using that code is a gamble.
But clean room reverse engineered code can have its own license, no?
Yeah, I think it can. I'm reminded of the thing in the 80s when Compaq reverse engineered and reimplemented the IBM BIOS by having one team decompile it and write a spec which they handed to a separate team who built a new implementation based on the spec.
I expect that for games the more important piece will be the art assets - like how the Quake game engine was open source but you still needed to buy a copy of the game in order to use the textures.
That's definitely a possible future, and one aspect of the future of technology I'm excited about.
First we get to tackle all of the small ideas and side projects we haven't had time to prioritize.
Then, we start taking ownership of all of the software systems that we interact with on a daily basis; hacking in modifications and reverse engineering protocols to suit our needs.
Finally our own interaction with software becomes entirely boutique: operating systems, firmware, user interfaces that we have directed ourselves to suit our individual tastes.
This day will arrive.
And it will be great for retro game preservation.
Having more integrated tools and tutorials on this would be awesome.
I wonder when you'll simply never run expensive software on your own CPU.
It'll either all be in the cloud, so you never run the code...
Or it'll be on a chip, in a hermetically sealed usb drive, that you plug in to your computer.
When decompilation like that is trivial, so is recreation without decompilation. It implies the LLM knows exactly how things work.
Yes, I believe it will. What I predict will happen is that most commercial software will be hosted and provided through "trusted" platforms with limited access, making reverse engineering impossible.
This deserves a discussion
We're very far away from this.
I've used LLMs to help with decompilation since the original release of GPT-4. They're excellent at recognizing the purpose of functions and refactoring IDA or Ghidra pseudo-C into readable code.
How does it do on things that were originally written in assembly?
This is typically easier because the code was written for humans already.
Someone please try this on an original (early 1980s) IBM-PC BIOS.
More than an overview, a step by step tutorial on this would be awesome!
If you aren't using LLMs for your reverse engineering tasks, you're missing out, big time. Claude kicks ass.
It's good at cleaning up decompiled code, at figuring out what functions do, at uncovering weird assembly tricks and more.
The article is a useful resource for setting up automated flows, and Claude is great at assembly. Codex less so, Gemini is also good at assembly. Gemini will happily hand roll x86_64 bytecode. Codex appears optimized for more "mainstream" dev tasks, and excels at that. If only Gemini had a great agent...
I've been using Claude for months with Ghidra. It is simply amazing.
Makes sense because LLMs are quite good at translating between natural languages.
Anyway, we're reaching the point where documentation can be generated by LLMs and this is great news for developers.
Documentation is one place where humans should have input. If an LLM can generate documentation, why would I want you to generate it when I can do so myself (probably with a better, newer model)?
That's great if those humans are around to have that input.
Not so much when you have a lot of code built around an obscure SDK from 6 years ago, and you have to figure out how it works, and the documentation is both incredibly sparse and in Chinese.
I definitely want documentation that a project expert has reviewed. I've found LLMs are fantastic at writing documentation about how something works, but they have a nasty tendency to take guesses at WHY - you'll get occasional sentences like "This improves the efficiency of the system".
I don't want invented rationales for changes, I want to know the actual reason a developer decided that the code should work that way.
I stumbled across a fun trick this week. After making some API changes, I had CC “write a note to the FE team with the changes”.
I then pasted this to another CC instance running the FE app, and it made the counter part.
Yes, I could have CC running against both repos and sometimes do, but I often run separate instances when tasks are complex.
Maybe documentation meant for other llms to ingest. Their documentation is like their code, it might work, but I don't want to have to be the one to read it.
Although of course if you don't vibe document but instead just use them as a tool, with significant human input, then yes go ahead.
Although with code it's implementing functions that don't exist yet, and with documentation it's describing functions that don't exist yet.
I've been experimenting with running Claude in headless mode + a continuous loop to decompile N64 functions and the results have been pretty incredible. (This is despite already using Claude in my decompilation workflow).
I hope that others find this similarly useful.
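For anyone curious what such a loop might look like, here's a hedged sketch: feed Claude Code one unmatched function at a time through its non-interactive `-p`/`--print` mode and let it iterate. The function names, prompt wording, and the idea of verifying the match afterwards are my assumptions about a setup like this, not the commenter's actual configuration.

```python
# Sketch of a "headless loop" over unmatched N64 functions, driving the
# Claude Code CLI in non-interactive mode via subprocess. The prompt text
# and function names are illustrative placeholders.
import subprocess

def build_prompt(func_name: str) -> str:
    """Assemble the per-function instruction handed to the agent."""
    return (f"Decompile the MIPS function {func_name} into matching C. "
            "Run the project's diff tool and iterate until it matches.")

def decomp_loop(funcs: list[str]) -> None:
    for name in funcs:
        # `claude -p` runs a single prompt without the interactive UI.
        subprocess.run(["claude", "-p", build_prompt(name)], check=False)
        # A real loop would re-run the build here and confirm the function
        # now matches before moving on to the next one.
```

The appeal is that each iteration is cheap to restart: if one function stalls, you skip it and let the loop keep grinding through the rest.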
This sounds interesting! Do you have some good introduction to N64 decompilation? Would you recommend using Claude right from the start, or rather trying to get to know the ins and outs of N64 decomp first?
What game are you working on?
Last sentence of the first paragraph says it’s Snowboard Kids 2.
In his defense, it is missing a "Tell HN".
And it isn't always obvious when the commenter is the submitter (no [S] tag like you see on other sites).
whoops, I did indeed miss that this was OP
This is super cool! I would be curious to see how Gemini 3 fares… I've found it to be even more effective than Opus 4.5 at technical analysis (in another domain).
I've been waiting for decompilation to show up in this space.
Are there any similar specialized decompilation LLM models available to be used locally?
This is a refreshingly practical demonstration of an LLM adding value. More of this please.