Symbolic Forest : 2022 : October

Know your limits!

Or remember that computers are still not boxes of infinite resource, whatever you might think

Published at 4:03 pm on October 15th, 2022

Sometimes, given that I often work with people who are twenty years or so younger than me, I feel old. I mean, the archives of this blog go back over twenty years now: these are serious, intelligent colleagues, and when I started writing my first blog posts they were likely still toddlers.

Sometimes, though, that has an advantage. I was thinking of this when debugging some code a colleague had written, which worked fine up to a point, but failed if its input file was more than, say, a few tens of megabytes. When the input reached that size, the whole thing crashed with OutOfMemoryException even on a computer with multiple gigabytes of memory, a hundred times more memory than the hundred-megabyte example file the client had sent.

When I was younger, you see, that would have seemed a ridiculous amount of data, unimaginable to fit in one file. Even when I had my first PC, the thought of a file too big to fit on even a superfloppy like a Zip disk was a little bit mindblowing, even though the PC seemed massive compared to what I’d experienced before.

Back when I was at school, I’d tried to teach myself how to code on an Amstrad CPC, a mid-1980s 8-bit machine with a 64k address space and a floppy disk drive of 180k capacity. It was the second-generation of 8-bit home machine really, more powerful than a C64 or a Sinclair Spectrum despite sharing the same CPU as the latter. Unlike those, it had a fully-bitmapped screen with individual pixels all fully addressable; however, that took up 16k of the 64k address space, so the actual code on it had to be pretty damn tight to fit. The programmers’ Firmware Manual—what we’d now call the API reference documentation—is of course scanned and online; one of the reasons I was never very succssful coding on the machine itself^* was that in the 1980s and 90s copies of it were almost impossible to find once Amstrad’s print run was exhausted. On the CPC, every byte you used counted; a lot of software development houses ended up cross-assembling their code purely because for a large program it was difficult to fit the source code itself onto the machine.^** That’s the background I came from, and it makes me wary still nowadays not to waste too much memory or resources. I’m the sort of developer who will pass an expected size parameter to the List<T> constructor if it’s known, to avoid unnecessary reallocations, who doesn’t add ToList() automatically by reflex to the end of every LINQ operation—which is a good idea in any case, as long as you know when you do need to.

Returning to the present: what had my team member done, then, that he was provoking a machine into running out of memory when in theory he had plenty to play with? Well, there were two problems at work.

Firstly, yes, we’re talking about someone who has never tried building code on a tiny tiny environment. The purpose of this particular code was to take an input zip file, open it, modify some of its content, recompress it, and send it off to an API elsewhere. Moreover, this had been done re-using existing internal code, some of which wanted to operate on a Stream and some of which, for whatever reason I don’t know, wanted to operate on a byte[]. We had ended up with code that received the data in a MemoryStream, unzipped it in memory, and copied the contents out into more MemoryStream objects. Each of those was being copied into a byte array which was being passed to a routine that immediately copied its input into a new MemoryStream, before deserializing…well, you get the idea. The whole thing ended up with many, many copies of the input data in memory, either in essentially its original format, or in a slightly modified form, and all of these copies were still in memory at the end of the process.

Secondly, there was another issue that was not quite so much the developer’s responsibility. This .NET code was being combined in “Portable” form, and the server was, again for reasons best known to itself, deciding that it should run it with the 32-bit runtime. Therefore, although there should have been 16Gb of memory on the server instance, we were working with a 2Gb memory ceiling.

I did dig in and rewrite as much of the code as I thought I needed to. Some of the copying could be elided altogether; and as this wasn’t a time-critical piece of code, I changed a lot of the rest to use a temporary file instead of memory. The second issue had an easy, lazy fix: compile the thing as 64-bit only, so the server would have no choice of runtime. As a result I never did get to the bottom of why it was preferring the 32-bit runtime, but I had working, shippable, code at the end of the day, and that’s what mattered here.

What I couldn’t help thinking, though, was that the rewriting might not have been needed to begin with. A young developer—who’s never worked on a genuinely small system—has spent so much time, though, never worrying about working anywhere near the boundaries of what their virtual servers can cope with, that when they do hit those boundaries, it comes as a nasty, sudden shock. They have no idea at all what to do, or even where to start: an OutOfMemoryException may as well be an act of the gods. Maybe when I’m helping train people up, I should give them all an Amstrad CPC emulator and see what the result is.

* My high point was successfully cloning Minesweeper, but with keyboard controls.

** Some software was shipped on 16k ROMs, to go along with third-party ROM socket boxes that attached to the expansion bus; this kept the assembler and editor code out of the main address space, but it could still be difficult to fit the source code and the assembler output in memory at the same time. The ROMs were scanned on boot and each declared named entrypoints which could then be accessed as BASIC commands. At least one game I can remember—The Bard’s Tale—crashed if too many ROMs were attached, because each ROM could reserve an area of RAM for its own bookkeeping, and the game found itself without enough memory available.

Keyword noise: software, software development, resources, coding.

Symbolic Forest

Blog : Posts from October 2022

Know your limits!

Contact

Post Categories

Common Tags

Previously, on Symbolic Forest…

Links elsewhere