Maybe with the source code, I'd be able to figure out what the hell happened in the last ~2 hours of the game.
My favorite tidbit of the series (but it's in Metal Gear Solid 3): in the torture scene where Volgin interrogates Snake in an attempt to discover who is the spy in his ranks, of all the persons present in the room (and if I got everything right from memory):
- Volgin is not an agent
- Snake is an agent (USA spy)
- The Boss is a double agent (Soviet/USA spy)
- Eva is a triple agent (USA/Soviet/Chinese spy)
- Ocelot is a quadruple agent (GRU/KGB/CIA/Philosophers spy)
And of course, after that Volgin then kills Granin thinking he was the spy, but he wasn't.
Not much, just accurately predicted the next 30 years exactly
Including the cardboard boxes: https://taskandpurpose.com/news/marines-ai-paul-scharre/
Try these classic analyses: https://www.deltaheadtranslation.com/MGS2/ and https://www.aumaan.org/form1/tus1/features/dreaming1.htm.
Kojima saw the writing on the wall, so to speak, and told us what the future held in a series of metaphors and dense monologues.
I need scissors, 61!
Dunkey MGS Explained https://www.youtube.com/watch?v=aaLiLRVeaZA
That’s great, now do kingdom hearts
Dunkey Kingdom Hearts explained https://www.youtube.com/watch?v=8o1ieehttdA
Holy shit
Thread:
https://boards.4chan.org/vr/thread/12541637/metal-gear-solid...
The re-constructed URL:
https://pixeldrain.com/l/aPyoCBax
Now do Red Alert 2 and Yuri's Revenge!
Minecraft Legacy Console Edition apparently leaked on 4chan recently, too: https://github.com/MCLCE/MinecraftConsoles
Almost no coverage on HN or mainstream media though. Surprising, considering the popularity of this game.
>this remains a tremendous milestone for games preservation
Clearly if it was able to be leaked it already was being preserved. It is shameful that such a publication tries to celebrate copyright infringement like this.
In any sane world not completely captured by corporate interests, the game would already be in the public domain after 25 years. The harm is non-existent.
> Clearly if it was able to be leaked it already was being preserved
Preserved by whom? Many leaks are done by old or ex-employees who quietly kept a shall we say 'backup' of their work. More than one 'official' re-release has been rumored to be an embarrassed company quietly filing the serial numbers off a rogue leak because they realized way too late that their archival practices were inadequate.
Anti-emulation Nintendo was caught repacking a pirated ROM.
https://www.eurogamer.net/did-nintendo-download-a-mario-rom-...
Also how WarioWare: Smooth Moves shows their in-house developers using third-party emulators to source graphics for their first-party nostalgia bait: https://tcrf.net/WarioWare:_Smooth_Moves#Punch-Out (not said derisively; I love WarioWare!)
If someone breaks into a warehouse and makes off with a pallet of cartridges, and then those carts are recovered, would it be strange if Nintendo resold those carts? It's their property at the end of the day.
Aside from that thought exercise, like many "internet facts" this one also might not be true, and repeating it doesn't really help either "side."
https://medium.com/@AberrantWolf/mario-illegal-roms-and-medi...
It’s, what, 25 years old? There have been many sequels, prequels, remastering. The economic benefits of this IP are largely exhausted; that it is now leaked to the commons isn’t an alarming thing.
The game just had an update to support the Switch 2 only 2 months ago. It is still being used commercially.
Konami willing, they'll drag the IP to their grave. Lest we forget MGS3's first remaster... for Pachinko parlors: https://www.youtube.com/watch?v=VsJ4QgBpQN8
Recompilation efforts raise the bar for future re-releases, and incentivize proper remaster efforts like MGS Delta instead of the half-assed Master collection. I would love to see Konami thrive as a company and get more people interested in MGS, but their recent re-releases don't deserve to be priced at $60. Their monopoly of the source code feels like an existential threat to both future preservation and high-quality MGS remakes, it's healthier for Konami to simply let it go at this point.
Crowdsourcing the preservation means the one UND ONLY ONE copy can't be destroyed by fire, flood, disk failure, ransomware, whatever else.
> It is shameful that such a publication tries and celebrate copyright infringement like this.
Oh no.
Anyway.
copyright infringement is awesome
shrugs Not always.
This
Copyrights were only intended to be secure for a _limited_ time. Originally 20 years. Konami has been granted at least 2 decades of FBI backed security of their property. I'd say they got a plenty good deal and have nothing to complain about here.
because intellectual property laws are inherently worthy of respect and they are never used against consumers ever
I wonder if it’s a real leak or just an agent recreation of the source from machine code.
I’ve been having fun lately with agents and decompilation. You can literally point them at any game and ask them to decompile the game and structure and format as if it was the original source code. Asking them to ensure it compiles works fine.
Some proof: i made online save game editors for jagged alliance 3; grandcheaten.com and news tower; thedailycheat.com (.com domains are only $10 so i figured why not).
You can do this with any game i’ve found. Older games work best due to the forced simplicity of the source code though.
There is no way you could recreate a convincing enough 90s era codebase of a japanese videogame + its associated tools + scripts and commented out codepaths with current ai tools.
I wouldn't be too sure about that. The original decompilations of Mario 64 and Ocarina of Time were done mostly by hand because LLMs weren't really around yet, but these kinds of projects seem perfectly suited for handing the gritty work off to AI: There is a clear output (exact binary recreation) and a straightforward path to get there (look at this assembly code and produce some C code from it). The decompilation of Twilight Princess jumped from very little to basically 100% of core code in the past year alone: https://github.com/zeldaret/tp
I have no doubt that this would be possible for MGS2 as well.
Keep your eyes open for Sonic R too. Sadly a lot of the online Sonic community has been toxic to the dev for being transparent about using Claude for the majority of the disassembly, even though he's a very talented developer with lots of credits to his name, and it only took a few weeks compared to a year+ if fully manual.
Having followed his bsky during his announcement, he started off preemptively dissing his haters that... didn't even exist yet. Constantly posting memes about how everyone was dissing him and how AI was totally superior (and then posting his angry sessions with Claude when it got something wrong) when most other users were just "that's cool man". The thing that made him quit bsky was a (now-deleted) thread someone posted criticizing the weird crash-outs. I think if he had been more... normal about the whole thing, people would have received the project quite a bit more positively.
I don't think it's impossible, but it would take a lot of time and a lot of money; likely more time than good enough models have been commercially available.
I have been working on an incremental decompilation-based reimplementation (basically how OpenRCT2 was done) of Worms Armageddon for the past 2 months with a lot of help from LLM tools; primarily Claude Code and Ghidra MCP. I've worked on it almost every day, reaching Claude Code Max 5x's 5 hour session limit multiple times every day. Suffice to say as a software rendered, sprite-based 90s PC game, Worms Armageddon is several orders of magnitude simpler than MGS2. Despite that, I think it will be 2-3 more months of work before I can compile a fully independent version of the game.
This is despite the game being an almost ideal candidate for automated RE, as it uses deterministic game logic with built-in checksum checks in replays and multiplayer. I've downloaded all the speedruns I could find for the game (as replay files) and I've retrofitted the replay system into a massively parallel test framework, which simulates over 600 games in about 30 seconds. So Claude can port all game logic independently without much need for manual testing; the replay tests can almost guarantee perfect correctness.
MGS2 doesn't have anything like that, so every ported function requires extensive manual testing. Even with LLM tools, an accurate decomp could take years (unless you're willing to spend thousands of $currency per month on it).
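For anyone curious what a replay-driven regression suite like the one described above might look like, here is a minimal sketch. Everything in it is an assumption for illustration: the `Replay` record, the toy `simulate` update rule, and the CRC32 checksum are hypothetical stand-ins, not the actual Worms Armageddon replay format or OpenWA code. The only idea it tries to capture is "feed recorded inputs into deterministic logic, then compare against the checksum the original game produced".

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Replay:
    """Hypothetical replay record: recorded inputs plus the original game's checksum."""
    name: str
    inputs: tuple
    expected_checksum: int


def simulate(inputs):
    """Toy stand-in for the ported game logic; it must be fully deterministic."""
    state = 0
    for value in inputs:
        # Arbitrary deterministic update rule with 32-bit wraparound.
        state = (state * 31 + value) & 0xFFFFFFFF
    return state


def checksum(state):
    """Checksum of the final state (CRC32 is an assumption, not the game's real scheme)."""
    return zlib.crc32(state.to_bytes(4, "little"))


def run_replay(replay):
    """Re-simulate one replay and report whether it matches the recorded checksum."""
    return replay.name, checksum(simulate(replay.inputs)) == replay.expected_checksum


def run_suite(replays, workers=8):
    """Run all replays concurrently and return the names of the ones that diverged."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_replay, replays))
    return [name for name, passed in results if not passed]
```

An empty list from `run_suite` means every replay reproduced its recorded checksum. A real harness would presumably use processes rather than threads for CPU-bound simulation; threads just keep the sketch short.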
This is really cool! Your process is compelling, and your choice of game is excellent. I'd like to read a long blog post about your entire journey from the beginning to a working binary once you get there.
For those wondering, there is a public Git repository at https://github.com/paavohuhtala/OpenWA.
As it happens I do have the habit of writing very long blog posts - though none on OpenWA so far. The OpenWA readme file serves as a bit of an introduction, though it's already a month old.
Decompilation to C (and even C++!) has been done automatically for 2-3 decades at least. I am not sure what has changed in recent years other than people playing fast and loose with copyright (and GitHub allowing it, likely because their LLMs also stand to benefit). Introducing LLMs here is only going to introduce errors, delays and likely push you away from a reliable result.
The challenge here is readability. Reading the TP source leak you link I think it's even behind the current state of the art, as it's barely above assembly. This is where I suspect even the smallest of LLMs may help, since you don't care that much if it introduces errors.
My take was more along the lines of: it wouldn't be convincing enough, if anything it would be too clean and perfect.
Does the TP decomp use AI to achieve their speed?
That's pre-2026 thinking. At this point, with the ability to lash IDA or similar tools to an agentic harness, there is no longer any such thing as a closed-source binary.
What is the state of the art of compilers here? What size of project are we talking about here?
What is the experience with faulty decompilation, and with the existence of bugs in the binary?
Could one decompile a binary to a more modern language than C?
Absolutely. This is just some delusions of a vibe coder at best. Not with just current generation of AI tools but essentially never. The conversion from C, C++, Rust or whatever, through post-processing (macros etc), through IR generation, through compile time optimizations, through link time optimizations, to the generated machine code is a one way street for low level languages. You can get a pretty close higher level approximation that matches the flow/logic/structure - but the code will never be anywhere near close to the original source code. I could write the same C++ program in 3 different ways and get identical assembly, how do you go back to the exact source? The answer is that you don't.
Here's the same simple program, written in 3 different ways, producing identical binary compatible code: https://godbolt.org/z/qWrc8fEnn
How does the AI know whether it should produce back the snippet #1, #2 or #3? It does not. It cannot.
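The many-to-one point can be reproduced without a C++ toolchain. The sketch below is only an analogy, and a much weaker one than the godbolt example: it uses CPython bytecode instead of optimized machine code, and the three snippets differ only in names, comments, parentheses, and layout. The helper names `compiled_body` and `all_identical` are made up for this illustration.

```python
# Three ways to write the same function; they differ in parameter name,
# comments, parentheses, and line layout only.
SOURCES = [
    "def f(a):\n    return a + 1\n",
    "def f(value):\n    # bump the counter\n    return (value + 1)\n",
    "def f(x): return x+1\n",
]


def compiled_body(source):
    """Compile a snippet defining f and return the raw bytecode of f."""
    namespace = {}
    exec(compile(source, "<snippet>", "exec"), namespace)
    return namespace["f"].__code__.co_code


def all_identical(sources):
    """True if every snippet compiles to exactly the same bytecode for f."""
    return len({compiled_body(source) for source in sources}) == 1
```

If `all_identical(SOURCES)` returns true, then nothing in the instruction bytes alone says which of the three spellings the author typed. That is the information a decompiler, human or LLM, has to guess.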
Who cares? Who said anything about recreating the exact code? You will get usable, compilable, and surprisingly readable source code, in your language of choice, that yields the functional equivalent of the binary.
Barring obvious edge cases that could show up but don't usually, like intentional race conditions. Timing is the one area where things get iffy.
> Who said anything about recreating the exact code?
The person I'm replying to? Who said you will get the same code as if it were the original source?
It’s the real code; there is code for known removed content (tanker escape scene and the 9/11 removed cutscene). Also AI can’t do what you’re theorizing yet.
>and ask them to decompile the game and structure and format as if it was the original source code. Asking them to ensure it compiles works fine
A lot of people claim this; the end result is the AI downloading an emulator and a ROM.
>Also AI can’t do what you’re theorizing yet.
Did you try the above links? I haven’t shared the full source, but all game mechanics are listed in the JA3 guide, including code snippets where helpful.
> It’s the real code there is code for known removed content (tanker escape scene and the 9/11 removed cutscene). Also AI can’t do what you’re theorizing yet.
There are lots of decompilation community efforts for N64 games, etc.
Someone should train a model on this. Giving the decompiled symbols good names, etc.
De-minification and de-obfuscation while we're at it.
It should be easy to generate a ton of "synthetic" (actually real) training data for this by simply compiling sources and using that as (input, output) pairs.
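A rough sketch of that pair-generation idea, under loud assumptions: it uses CPython's own `compile` and `dis` as a stand-in for a real C compiler plus disassembler, so that it runs anywhere, and `disassemble` and `make_pairs` are hypothetical helper names. A real pipeline would presumably shell out to a compiler at several optimization levels instead.

```python
import dis
import io


def disassemble(source):
    """Compile source and return its textual disassembly (the model's input side)."""
    code = compile(source, "<sample>", "exec")
    buffer = io.StringIO()
    dis.dis(code, file=buffer)  # write the listing into the buffer instead of stdout
    return buffer.getvalue()


def make_pairs(sources):
    """Build (input, output) training pairs: disassembly in, original source out."""
    return [(disassemble(source), source) for source in sources]
```

Each pair has the lossy form on the left and the known-good original on the right, which is why the data is "actually real" rather than synthetic: the target side is code a human wrote.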
Whoa, since when is there a Jagged Alliance 3? Is it any good? JA2 is one of my favorite games of all time
It’s playable but nowhere near as good and a lot of criticism is warranted. A true ‘70%’ game.
Gunplay is weak. Accuracy drops off waaaay too fast based on maximum range of the gun and burst fire has arbitrary damage reduction per bullet. So short range guns almost always missed (mechanics documented from source in the above guide) and if they hit they did little damage. It means the only viable weapons are long range weapons. Rifles and assault rifles. A submachine gun is worse than a sniper rifle even at close range.
The plot has a key gameplay changing moment that triggers waaay too early, meaning you have to work to see much of the game content. Everyone tries to avoid the trigger on the second playthrough, which is a silly thing to do game design wise. A desire to teleport across the map was the original motivation for the above from my point of view.
Enemies are bullet sponges in the late game too. A lame way to balance weak AI and gunplay.
It could have been as good as ja2 but they just didn’t refine the above enough.
check for console headers. those aren't that easy to get out of LLMs
It's (probably) a real leak. There are original comments in Japanese describing cut content and game logic that was scrapped in the final release.
Raw assets are probably the better tell
How do you verify that everything is correct?
I’m sure the builds from doing what I’ve been doing won’t generate identical bytecode, but it’s fun for the sake of messing with the game or understanding it (e.g. the checksum logic for News Tower's save games was copy-pastable, as was the whole save game structure formatting itself, and it clearly matches the game - it works!). Likewise with all the JA3 mechanics documented in that linked guide.