SimCity had a read-after-free bug that Microsoft patched in Windows 95. That was a lot easier for customers than having Maxis fix it, which could have required exchanging copies of the game.
I think we're starting to see more of this sort of thing happening now with Proton and Wine gaining prominence in the Linux community. Some games (Elden Ring comes to mind) have bad enough PC ports when they come out that the compatibility layer can incorporate a hotfix to improve performance, while users of the software on the original platform still have to suffer.
Fairly sure GPU drivers do the same thing where they include a ton of per game tweaks to make them run faster. It does feel like a fragile way of doing things where an external component that should be agnostic to the software running ends up including a handful of junk trying to fix stuff that should have been fixed by the consumer of the driver.
It goes the other way too, sometimes you trigger some optimization silliness in the driver and the game needs to adapt to avoid it.
GPU driver packages are already a huge collection of workarounds for bad game engine coding.
An Nvidia employee once told me that one of the easiest ways to squeeze out a few extra frames on your old machine is to rename the game executable to hl2.exe.
> to rename the game executable to hl2.exe
This seems genuinely unbelievable. Does anyone have a technical explanation for this?
GPU drivers detect games, among other things, by looking at executable names.
The driver then "optimizes" its behavior, sometimes dishonestly (reducing precision), sometimes honestly (working around game engine stupidity).
This sounds like a really interesting story; I would like to read more. Why Half-Life 2 specifically? The game itself was pretty well optimized and ran on really low-end hardware even back in the day.
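For anyone wondering what "detecting a game by executable name" might look like, here is a minimal sketch in C. It is not taken from any real driver: the struct, the table contents, and the names `app_profile` and `find_profile` are all hypothetical, and the flag values are purely illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-application profile. Real drivers keep far richer,
   vendor-specific data; every name and value here is made up. */
struct app_profile {
    const char *exe_name;          /* executable file name to match */
    int         reduce_precision;  /* a "dishonest" tweak: trade quality for speed */
    int         engine_workaround; /* an "honest" tweak: dodge an engine quirk */
};

/* Illustrative table only; the flags do not describe real driver behavior. */
static const struct app_profile profiles[] = {
    { "hl2.exe",    0, 1 },
    { "quake3.exe", 1, 0 },
};

/* Return the file-name part of a path, accepting '/' and '\\' separators. */
static const char *base_name(const char *path)
{
    const char *name = path;
    for (const char *p = path; *p != '\0'; ++p)
        if (*p == '/' || *p == '\\')
            name = p + 1;
    return name;
}

/* Look up a profile by executable name; NULL means "no special casing".
   The match is case-sensitive to keep the sketch short. */
const struct app_profile *find_profile(const char *exe_path)
{
    const char *name = base_name(exe_path);
    for (size_t i = 0; i < sizeof profiles / sizeof profiles[0]; ++i)
        if (strcmp(name, profiles[i].exe_name) == 0)
            return &profiles[i];
    return NULL;
}
```

With a lookup like this, renaming a binary changes which row matches, which would explain why a rename can change performance in either direction.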
Because everyone reported performance metrics using it as a benchmark. Higher number = more sales.
If you go back 5 years, everyone was using Quake 3 Arena as the benchmark. ATI got in some hot water because if you renamed quake3.exe to quack3.exe, your FPS would drop by 15%, because they were silently reducing quality to juice their benchmark numbers.
A big portion of GPU driver updates are actually just that, same with Windows updates.
Windows 95 patched a bug in SimCity just to get it to work.
To be fair it is possible that the developer enabled a special "unroll all loops, no matter what" optimisation flag during compilation.
I agree it would be stupid for a compiler to even support such a flag, but those were the 1980s/90s.
Ahh... Good old funrollloops...
https://www.shlomifish.org/humour/by-others/funroll-loops/Ge...
> Anyway, my colleague found that there was one program that needed to allocate around 64KB of memory on the stack and initialize it. The standard way of doing this is to perform a stack probe to ensure that 64KB of memory is available, then subtracting 65536 from the stack pointer, and then initializing the memory in a small, tight loop.
Actually, the standard way of allocating 64 kB of memory on the stack is to just assume you can do it, subtract 64k from the stack pointer, and hope for the best.
Most stack allocations in the wild are not checked.
Arguably more of an optimization, rather than a fix. Looks like un-unrolling a loop, or better, rolling a loop. Or rolling straight line code?
This reminds me of a story from 15 years ago, where I was developing a technology to download games on demand by hooking into the OS calls.
There was a particular game that was superslow when this tech was applied. Original game loading took around 15-20 seconds, whereas once the tech was applied it took easily 3-5 min, even with all data already downloaded.
When I started digging into it, I realized the reason was the game was using something like
fread(data, 1, 65536, fptr);
instead of
fread(data, 65536, 1, fptr);
Which basically expanded, back in the day, to 65k reads of 1 byte each for a file of several MB. Each fread translated to 65k calls to the ReadFile Windows API. Since my code was hooking the ReadFile system call, and my hook was heavier than ReadFile itself, game loading felt really slow. Unusable. It would not have been fun for players.
The easy fix was to swap the arguments for certain calls. The long fix required an internal cache to account for these cases, so that the hooked ReadFile was faster when the data was already on disk.
Funny thing is that as we started rolling out the tech and applying it to more and more games we realized lots of games did this. We went for the cache fix and games ended up loading faster than before. Honestly, games could have loaded all the data in a couple of seconds by just swapping the args. I'm guessing developers did this on purpose so that games seemed like they were loading a lot of stuff, although you never know.
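One caveat on "just swap the args", as a rough sketch in C: the two fread forms ask for the same bytes but report different return values, so any caller that checks the result has to change too. The helper names (`make_temp_file`, `read_as_items_of_one`, `read_as_one_block`) are made up for illustration. Whether either form turns into many small OS reads depends on the C runtime and its buffering, and this sketch does not measure that.

```c
#include <stdio.h>
#include <stddef.h>

enum { CHUNK = 65536 };

/* Hypothetical helper: create a temporary file holding `len` bytes. */
static FILE *make_temp_file(size_t len)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return NULL;
    for (size_t i = 0; i < len; ++i) {
        if (fputc((int)(i & 0xFF), f) == EOF) {
            fclose(f);
            return NULL;
        }
    }
    rewind(f);
    return f;
}

/* Request CHUNK items of size 1. fread returns how many items (bytes)
   were read. Returns -1 if the temp file could not be set up. */
long read_as_items_of_one(size_t file_len)
{
    static unsigned char buf[CHUNK];
    FILE *f = make_temp_file(file_len);
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, CHUNK, f);
    fclose(f);
    return (long)n;
}

/* Request 1 item of size CHUNK. fread returns 1 only if the whole block
   was read; a short file yields 0 even though some bytes were consumed.
   Returns -1 if the temp file could not be set up. */
long read_as_one_block(size_t file_len)
{
    static unsigned char buf[CHUNK];
    FILE *f = make_temp_file(file_len);
    if (f == NULL)
        return -1;
    size_t n = fread(buf, CHUNK, 1, f);
    fclose(f);
    return (long)n;
}
```

For a full 64 KB file the first form reports 65536 and the second reports 1. For a shorter file the first form reports the byte count while the second reports 0, which is probably why a blind argument swap was only "the easy fix for certain calls".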
I used to be a graphics card/chip architect for Macs in the early/mid 90s. Our chips were the fastest, but some programs were resistant because they did stupid stuff: PageMaker invalidated the font cache every time it went through its main loop, Quark with ATM did an n*2 thing every time it wrote text, etc. We had special hardware to accelerate text drawing and it did nothing because the software pissed it away. We considered creating a plugin that fixed all these things, but it would have been hard to maintain. In the end we travelled around to the people who made these apps and talked them through their problems.
To be fair, Excel would erase the places it wanted to write to white up to 9 times before it drew any black pixels. We made that very fast! We didn't tell them :-)
At the time, 24-bit framebuffers were so slow that, before we built graphics acceleration hardware, people would switch back to 8-bit to get stuff done. Making 24-bit/true colour your daily driver was a big step forward.
Betting Alpha was the native architecture in question. It seemed to have the best support.
heh, when Raymond Chen dunks on the MSVC team =)
> they fixed it during emulation
It means the fix was applied to run during the emulation loop execution, not that the fix was found and applied while the emulation loop was running.
Which would have made it an emulation code escape.
People from Transmeta told me stories about how their translators were full of special case optimizations to fix horrors they discovered in Microsoft Windows itself.
Couldn't they just turn the optimization off for this loop?
They didn't have the code for the offensive program, they were creating the emulator to run it on a different architecture.
> offensive program
Agreed.
Which optimizer replaces a 64k loop with 64k instructions?
Ah, yes. Microsoft's!
There is no indication that the compiler that produced the code was Microsoft's. Actually the article hints otherwise ("[...] whatever compiler was used to compile this code").
> All in all, it took this program 256 kilobytes of code to initialize 64 kilobytes of data.
solidity sweating profusely
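The 256 KB figure works out if you assume one store instruction per byte and fixed-width 4-byte instructions. Both of those are my assumptions, not something stated in this thread. A tiny sketch in C, with a made-up function name:

```c
#include <stddef.h>

/* Code size of a fully unrolled initializer.
   Assumptions (not from the thread): every store writes `bytes_per_store`
   bytes and every instruction occupies `instr_bytes` bytes.
   Returns 0 if bytes_per_store is 0. */
size_t unrolled_code_bytes(size_t data_bytes,
                           size_t bytes_per_store,
                           size_t instr_bytes)
{
    if (bytes_per_store == 0)
        return 0;
    /* Round up so a trailing partial store still counts as one instruction. */
    size_t stores = (data_bytes + bytes_per_store - 1) / bytes_per_store;
    return stores * instr_bytes;
}
```

With 65536 bytes of data, 1 byte per store and 4 bytes per instruction, that is 65536 * 4 = 262144 bytes, i.e. 256 KB. A loop doing the same job would be a handful of instructions.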