
Symbolic Forest

A homage to loading screens.


Modern technology

Or, keeping the site up to date

Well, hello there! This site has been on something of a hiatus since last summer, for one reason and another. There’s plenty to write about, there’s plenty going on, but somehow I’ve always been too busy, too distracted, with too many other things going on, to sit down and want to write a blog post. Moreover, there have been some more technical issues that I felt I needed to get resolved.

This site has never been a “secure site”. By that I mean, the connection between the website’s server and your browser has never been encrypted; anyone with access to the network in-between can see what you’re looking at. Alongside that, there’s no way for you to be certain that you’re looking at my genuine site, that the connection from your browser or device is actually going to me, not just to someone pretending to be me. Frankly, I’d never thought, for the sort of nonsense I post here, that it was very important. You’re not going to be sending me your bank details or your phone number; since the last big technical redesign, all of four years ago now, you haven’t been able to send me anything at all because I took away the ability to leave a comment. After that redesign was finished, “turn the site into a secure site” was certainly on the to-do list, but never very near the top of it. For one thing, I doubt anyone would ever want to impersonate me.

That changed a bit, though, in the last few months. There has been a concerted effort from the big browser companies to push users away from accessing sites that don’t use encryption. This website won’t show up for you in search results any more. Some web browsers will show you an error page if you go to the site, and you have to deliberately click past a warning telling you, in dire terms, that people might interfere with your traffic or capture your credit card number. That’s not really a risk for this site, but the general trend has been to push non-technical users towards thinking that all non-encrypted sites are extremely dangerous to the same degree. It might be a bit debatable, but it’s easy and straightforward for them to do, and it at least avoids any confusion for users, saving them from having to make any sort of value judgement about a technical issue they don’t properly understand. The side effect: it puts a barrier in front of actually viewing this site. To get over that barrier, I’d have to implement TLS security.

After I did the big rewrite, switching this site over from Wordpress to a static site generator back in 2020, I wrote a series of blog posts about the generation engine I was using and the work pipeline I came up with. What I didn’t talk about very much was how the site was actually hosted. It was hosted using Azure Storage, which lets you expose a set of files to the internet as a static website very cheaply. Azure Storage supports using TLS encryption for your website, and it supports you hosting it under a custom domain like symbolicforest.com. Unfortunately, it doesn’t very easily let you do both at the same time; you have to put a Content Delivery Network in front of your Storage container, and terminate your TLS connection on the CDN. It’s certainly possible to do, and if this was the day job then I’d happily put the parts together. For this site, though, a weird little hobby site that I sometimes don’t update for months or years at a time, it felt like a fiddly and expensive way to go.

During the last four years, though, Microsoft have introduced a new Azure product which falls somewhere in-between the Azure Storage web-hosting functionality and the fully-featured hosting of Azure App Service. This is Azure Static Web Apps, which can host static files in a similar way to Azure Storage, but with a control panel interface more like Azure App Service. Moreover, Static Web Apps feature TLS support for custom domains, out of the box, for free. This is a far cry from 20-something years ago, when I remember having to get a solicitor to prove my identity before someone would issue me with a (very expensive) TLS certificate; according to the documentation, it Just Works with no real configuration needed at all. Why not, I thought, give it a bit of a try?

With Azure Storage, you dump the files you want to serve as objects in an Azure Blob Storage container and away you go. With an App Service, you can zip up the files that form your website and upload them. Azure Static Web Apps are a bit more complex than this: they only support deployment via a CI/CD pipeline from a supported source repository hosting service. For, say, Github, Azure tries to automate it as much as possible: you link the Static Web App to your Github account, specify the repository, and Azure will create an Action which is run on updates to the main branch, and which uses Microsoft Oryx to build your site and push the build artefacts into the web app. I’m sure you could manually pull apart what Oryx does, get the web app’s security token from the Azure portal, and replicate this whole process manually, but the goal is clearly that you use a fully automated workflow.

My site had never been set up with an automated workflow: that was another “nice to have” which had never been that high on the priority list. Instead, my deployment technique was all very manual: once I had a version of the site I wanted to deploy in my main branch—whose config was set up for local running—I would merge that branch into a deploy branch which contained the production config, manually run npm run clean && npm run build in that branch, and then use a tool to upload any and all new or changed files to the Azure Storage container. Making sure this all worked inside a Github Action took a little bit of work: changing parts of the site templates, for example, to make sure that all paths within the site were relative so that a single configuration file could handle both local and production builds. I also had to make sure that the top-level npm run build script also called npm install for each subsite, including for the shared Wintersmith plugins, so that the build would run on a freshly-cloned repository without any additional steps. With a few other little tweaks to match what Oryx expected—such as the build output directory being within the source directory instead of alongside it—everything would build cleanly inside a Github action runner.

It was here I hit the major issue. One of the big attractions of Azure Static Web Apps is that they’re free! Assuming you only want a personal site, with a couple of domain names, they’re free! Being from Northern England, I definitely liked that idea. However, free Static Web Apps also have a size limit of 250Mb. Oryx was hitting it.

This site is an old site, after all. There are just over a thousand posts on here, at the time of writing,* some of them over twenty years old. You can go back through all of them, ten at a time, from the home page; or you can go through them all by category; or month by month; or there are well over 3,000 different tags. Because this site is hosted through static pages, that means the text of each post is repeated inside multiple different HTML files, as well as each post having its own individual page. All in all, that adds up to about 350Mb of data to be hosted. I have to admit, that’s quite a lot. An average of 350Kb or so per post—admittedly, there are images in there which bump that total up a bit.

In the short term, this is fixable, in theory. Azure Static Web Apps offer two Hosting Plans at present: the free one, with its 250Mb limit, and a paid one. The paid one has a 500Mb limit, which should be enough for now. In the longer term, I might need to look at solutions to reduce the amount of space per post, but for now it would work. It wasn’t that expensive, either, so I signed up. And found that…Oryx still fell over. Instead of clearly hitting a size limit, I was getting a much vaguer error message: “Failure during content distribution”. That’s not really very helpful; but I could see two things. Firstly, this only occurred when Oryx was deploying to my production environment, not to the staging environment, so the issue wasn’t in my build artefacts. Secondly, it always occurred just as the deployment step passed the five-minute-runtime mark—handily, it printed a log message every 15 seconds, which made that nice and easy to spot. The size of the site seemed to be causing a timeout.

The obvious place to try to fix this was with the tag pages, as they were making up over a third of the total file size. For comparison, all of the images included in articles were about half, and the remaining sixth, roughly speaking, covered everything else including the individual article pages. I tried cutting the article text out of the tag pages, assuming readers would think to click through to the individual articles if they wanted to read them, but the upload still failed. However, I did find a hint in a Github issue, suggesting that the issue could also occur for uploads which changed lots of content. I built the site with no tag pages at all, and the upload worked. I rebuilt it with them added in again, and it still worked.

Cutting the article text out of the tag pages has only really reduced the size to about 305Kb per post, so for the long term, I am definitely going to have to do more to ensure that I can keep blogging for as long as I like without hitting that 500Mb size limit. I have a few ideas for things to do on this, but I haven’t really measured how successful they are likely to be. Also, the current design requires pretty much every single page on the site to change when a new post is added, because of the post counts on the by-month and by-category archive pages. That was definitely a nuisance when I was manually uploading the site after building it locally; if it causes issues with the apparent 5-minute timeout, it may well prove to be a worse problem for a Static Web App. I have an idea on how to work around this, too; hopefully it will work well.

Is this change a success? Well, it’s a relatively simple way to ensure the site is TLS-secured whilst still hosting it relatively cheaply, and it didn’t require too much in the way of changes to fit it into my existing site maintenance processes. The site feels much faster and more responsive, subjectively to me, than the Azure Storage version did. There are still more improvements to do, but they are improvements that would likely have been a good idea in any case; this project is just pushing them further to the top of the heap. Hopefully it will be a while before I get near the next hosting size limit; and in the meantime, this hosting change has forced me to improve my own build-and-deploy process and make it much smoother than it was before. You never know…maybe I’ll even start writing more blog posts again.

* If I’ve added up correctly, this is post 1,004.

No more cookies!

Or, rather, no more analytics

Regular readers—or, at least, people who have looked at this site before the last month or two—might remember that it used to have a discreet cookie consent banner at the top of the page, asking if you consented to me planting a tracking cookie that I promised not to send to anyone else. It would pop up again about once a year, just to make sure you hadn’t changed your mind. If you clicked yes, you appeared on my Google Analytics dashboard. If you clicked no, you didn’t.

What you probably haven’t noticed is that it isn’t there any more. A few weeks ago now, I quietly stripped it out. This site now puts no cookies of any sort on your machine, necessary or otherwise, so there’s no need for me to ask to do it.

When I first started this site’s predecessor, twenty-something years ago, I found it quite fascinating looking at the statistics, and in particular, looking at what search terms had brought people to the site. If you look back in the archives, it used to be a common topic for posts: “look what someone was searching for and it led them to me!” What to do when you find a dead bat was one common one; and the lyrics to the children’s hymn “Autumn Days When The Grass Is Jewelled”. It was, I thought—and I might not have been right about this—an interesting topic to read about, and it was certainly a useful piece of filler back in the days of 2005 when I was aiming to publish a post on this site every day, rather than every month. If you go back to the archives for 2005, there’s a lot of filler.

Now, though? Hopefully there’s not as much filler on the site as there was back then. But the logs have changed. Barely anything reaches this site through “organic search” any more—“organic search” is the industry term for “people entering a search phrase in their browser and hitting a link”. Whether this means Google has got better or worse at giving people search results I don’t know—personally, for the searches I make, Google has got a lot worse for the sort of searches where I don’t know what site I want to go to beforehand, but for the sort of lazy searches where I already know where I want to go, it’s got better. I suspect the first sort were generally the sort that brought people here. Anyway, all the traffic to this site comes from people who follow me on social media and so follow the link when I tell them there’s a new blog post up.

Given that the analytics aren’t very interesting, I hadn’t looked at them for months. And, frankly, do I write this site in order to generate traffic to it? No, I don’t. I write this site to scratch an itch, to get things off my chest, because there’s something I want to say. I write this site in order to write this site, not to drive my income or to self-promote. I don’t really need a hit counter in order to do that. Moreover, I realised that in all honesty I couldn’t justify the cutesy “I’m only setting a cookie to satisfy my own innate curiosity” message I’d put in the consent banner, because although I was just doing that, I had no idea what Google were doing with the information that you’d been here. The less information they can gather on us, the better. It’s an uphill struggle, but it’s a small piece in the jigsaw.

So, no more cookies, no more consent banner and no more analytics, until I come up with the itch to write my own on-prem cookie-based analytics engine that I can promise does just give me the sort of stats that satisfy my own nosiness—which I’m not likely to do, because I have more than enough things ongoing to last me a lifetime already. This site is that little bit more indie, that little bit more Indieweb, because I can promise I’m not doing anything at all to harvest your data and not sending any of it to any third parties. The next bit to protect you will be setting up an SSL certificate, which has been on the to-do list for some months now; for this site, given that you can’t send me any data, all SSL will really do is guarantee that I’m still me and haven’t been replaced, which isn’t likely to be anything you’re particularly worried about. It will come, though, probably more as a side-piece to some other aspect of improving the site’s infrastructure than anything else. This site is, always has been, proudly independent, and I hope it always will be.

Ongoing projects

As soon as something finishes, I start two more

The crafting project I mentioned in my last post is finished! Well, aside from blocking it and framing it, that is.

An actually completed cross stitch project of a Gothenburg tram

Me being me though, I couldn’t resist immediately starting two more. And then, of course, there’s the videos still to produce. I will get to the end of the list, eventually. In the meantime, here’s some photos of a few of the things in progress.

An in-progress Lego project all set up for filming

An in-progress crochet creation; this photo is from a few months ago but I still haven't produced the video about it

Frame from another in-progress Lego build which will probably be the first of these to hit YouTube

At some point, I promise, all of these projects will be complete and will have videos to go with them! Better make a start…

Self-promotion

A couple of Yuletide videos

It’s still the Yuletide season, although we’re now very much into the time-between-the-years when everybody is grazing on snacks and leftovers, has battened down the hatches against the storms, and has completely forgotten what day of the week it is.*

As it is still Yuletide, though, I thought I’d post a couple of the seasonal Lego videos I put up on my YouTube channel last week, before the holiday season had really got under way. Both of them are Lego build videos, for some seasonal sets that I picked up earlier in the month.

Firstly, a “Winter Holiday Train”…

…and, secondly, Santa’s Workshop

In a few days they’ll be going away ready for next year, but for now, I hope you enjoy them whilst you’re still feeling a little bit seasonal!

* No, don’t ask me either.

Know your limits!

Or, remember that computers are still not boxes of infinite resource, whatever you might think

Sometimes, given that I often work with people who are twenty years or so younger than me, I feel old. I mean, the archives of this blog go back over twenty years now: these are serious, intelligent colleagues, and when I started writing my first blog posts they were likely still toddlers.

Sometimes, though, that has an advantage. I was thinking of this when debugging some code a colleague had written, which worked fine up to a point, but failed if its input file was more than, say, a few tens of megabytes. When the input reached that size, the whole thing crashed with an OutOfMemoryException, even on a computer with multiple gigabytes of memory, a hundred times more memory than the hundred-megabyte example file the client had sent.

When I was younger, you see, that would have seemed a ridiculous amount of data, unimaginable to fit in one file. Even when I had my first PC, the thought of a file too big to fit on even a superfloppy like a Zip disk was a little bit mindblowing, even though the PC seemed massive compared to what I’d experienced before.

Back when I was at school, I’d tried to teach myself how to code on an Amstrad CPC, a mid-1980s 8-bit machine with a 64k address space and a floppy disk drive of 180k capacity. It was the second generation of 8-bit home machines, really, more powerful than a C64 or a Sinclair Spectrum despite sharing the same CPU as the latter. Unlike those, it had a fully-bitmapped screen with individual pixels all fully addressable; however, that took up 16k of the 64k address space, so the actual code on it had to be pretty damn tight to fit. The programmers’ Firmware Manual—what we’d now call the API reference documentation—is of course scanned and online; one of the reasons I was never very successful coding on the machine itself* was that in the 1980s and 90s copies of it were almost impossible to find once Amstrad’s print run was exhausted. On the CPC, every byte you used counted; a lot of software development houses ended up cross-assembling their code purely because for a large program it was difficult to fit the source code itself onto the machine.** That’s the background I came from, and it still makes me wary, nowadays, of wasting too much memory or resources. I’m the sort of developer who will pass an expected size parameter to the List<T> constructor if it’s known, to avoid unnecessary reallocations, who doesn’t add ToList() automatically by reflex to the end of every LINQ operation—which is a good idea in any case, as long as you know when you do need to.
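To give a trivial, standalone illustration of the sort of thing I mean (a made-up sketch, not code from any real project), both of those habits are about not allocating memory you never actually needed:

int expectedCount = 1000;

// Giving List<T> its expected size up front avoids the repeated internal
// reallocations that happen each time the list outgrows its current capacity.
List<int> squares = new List<int>(expectedCount);
for (int i = 0; i < expectedCount; i++)
{
    squares.Add(i * i);
}

// Enumerating the query directly, rather than reflexively calling ToList(),
// avoids materialising an intermediate copy that is only ever read once.
IEnumerable<int> evens = squares.Where(n => n % 2 == 0);
foreach (int n in evens)
{
    Console.WriteLine(n);
}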

Returning to the present: what had my team member done, then, that he was provoking a machine into running out of memory when in theory he had plenty to play with? Well, there were two problems at work.

Firstly, yes, we’re talking about someone who has never tried building code on a tiny tiny environment. The purpose of this particular code was to take an input zip file, open it, modify some of its content, recompress it, and send it off to an API elsewhere. Moreover, this had been done re-using existing internal code, some of which wanted to operate on a Stream and some of which, for reasons I don’t know, wanted to operate on a byte[]. We had ended up with code that received the data in a MemoryStream, unzipped it in memory, and copied the contents out into more MemoryStream objects. Each of those was being copied into a byte array which was being passed to a routine that immediately copied its input into a new MemoryStream, before deserializing…well, you get the idea. The whole thing ended up with many, many copies of the input data in memory, either in essentially its original format, or in a slightly modified form, and all of these copies were still in memory at the end of the process.
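In rough outline, the difference looks something like the sketch below. The names (inputStream, ProcessEntry) are stand-ins of my own, not the real code, and it assumes the standard System.IO.Compression ZipArchive API:

// (Two alternative shapes of the same job, not meant to run one after the other.)

// The copy-heavy shape: the incoming stream gets unzipped into more
// MemoryStreams, each of those into a byte[], and each byte[] back into yet
// another MemoryStream, and every copy stays reachable until the end.
using (var archive = new ZipArchive(inputStream, ZipArchiveMode.Read))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        using var extracted = new MemoryStream();       // copy of the entry
        using (Stream entryStream = entry.Open())
        {
            entryStream.CopyTo(extracted);
        }
        byte[] bytes = extracted.ToArray();             // another copy
        ProcessEntry(new MemoryStream(bytes));          // and another, and so on
    }
}

// The streaming shape: hand each entry's own stream straight to the consumer,
// so only one entry's worth of data needs to be in flight at any one time.
using (var archive = new ZipArchive(inputStream, ZipArchiveMode.Read))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        using Stream entryStream = entry.Open();
        ProcessEntry(entryStream);
    }
}

Even the streaming version still holds the whole input zip in memory, of course, which is why some of the real code eventually ended up spilling to a temporary file instead, as I mention below.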

Secondly, there was another issue that was not quite so much the developer’s responsibility. This .NET code was being compiled in “Portable” form, and the server was, again for reasons best known to itself, deciding that it should run it with the 32-bit runtime. Therefore, although there should have been 16Gb of memory on the server instance, we were working with a 2Gb memory ceiling.

I did dig in and rewrite as much of the code as I thought I needed to. Some of the copying could be elided altogether; and as this wasn’t a time-critical piece of code, I changed a lot of the rest to use a temporary file instead of memory. The second issue had an easy, lazy fix: compile the thing as 64-bit only, so the server would have no choice of runtime. As a result I never did get to the bottom of why it was preferring the 32-bit runtime, but I had working, shippable, code at the end of the day, and that’s what mattered here.

What I couldn’t help thinking, though, was that the rewriting might not have been needed to begin with. A young developer who has never worked on a genuinely small system has spent so much time never worrying about working anywhere near the boundaries of what their virtual servers can cope with that, when they do hit those boundaries, it comes as a nasty, sudden shock. They have no idea at all what to do, or even where to start: an OutOfMemoryException may as well be an act of the gods. Maybe when I’m helping train people up, I should give them all an Amstrad CPC emulator and see what the result is.

* My high point was successfully cloning Minesweeper, but with keyboard controls.

** Some software was shipped on 16k ROMs, to go along with third-party ROM socket boxes that attached to the expansion bus; this kept the assembler and editor code out of the main address space, but it could still be difficult to fit the source code and the assembler output in memory at the same time. The ROMs were scanned on boot and each declared named entrypoints which could then be accessed as BASIC commands. At least one game I can remember—The Bard’s Tale—crashed if too many ROMs were attached, because each ROM could reserve an area of RAM for its own bookkeeping, and the game found itself without enough memory available.

But first, a quick commercial break

Or, links to things going on elsewhere.

It’s been quiet around here lately, partly because I’ve been trying to hide from the various summer heatwaves, and partly because I’ve been beavering away at something else in the background. I’ve set up a YouTube channel, and have posted my first proper video, the start of a Lego build. It’s only small, and I’m still learning, but one thing I’ve already learned is that coming up with the idea, shooting all the footage, writing the narration, recording it, editing the whole thing together…well, it’s a lot more work than just writing a blog post.

It makes me think, actually: years and years and years ago, Radio Scotland had a documentary about blogging, and included posts from me, read by an actor. I wonder if the actor who played me found it as much effort.

Incidentally, after the previous post on the Perseids, I did go outside for a while each night last weekend, lie down on the grass, and watch for meteors. There were a few, each night, streaking across the sky; and lying on my back looking up seemed to be the best, most comfortable way to get a full view of as much of the sky as I could. The grass is much nicer for lying on, at this time of year, than it will be for the big meteor showers of winter.

Summer astronomy news (this year's edition)

The calendar comes around to the Perseids again

Just as it was this time last year, it’s Astronomy News time because we’re coming into the season of the best and biggest meteor shower of the year, the Perseids, which reach their peak next weekend. This year the peak coincides roughly with the full moon, which is in the early hours of Friday morning, but hopefully the brightest meteors will still stand out—or you can always wait a few days into the following week, because, like most meteor showers, you can still see plenty of meteors in the few days either side of the Perseids’ peak. Get a chair you can lean back in, sit outside on a clear night, and watch the sky until you see them flash across it.

Incidentally, Saturn is also at its largest in the sky at the moment, as we’re as close to it as we will be this year. I might be tempted, if there’s a clear sky, to get the telescope out and have a look, to see how well I can spot its rings. Of course, annoyingly, it will also be close in the sky to the full moon next weekend just because that’s how the geometry of the solar system works. The moon is full when it’s directly opposite the sun from us. The outer planets are closest to us when we’re directly between them and the sun—which is the same thing. At least the moon moves relatively quickly in the sky, day to day, so even one day after the full moon it should be far enough away from Saturn to not be too much of a problem. I’ll just have to hope the skies are clear.

Flatlander

Or, a trip to the Crowle Peatlands Railway

There are so many preserved and heritage railways in the UK—there must be something around a hundred at the moment, depending on your definition—that it’s very difficult to know all of them intimately, or even to visit them all. It doesn’t help that still, around 55 years after the “great contraction” of the railway network in a quixotic attempt to make it return to profitability, new heritage railways occasionally appear, like mushrooms out of the ground after rain.

Which is why, a few weeks ago, I decided to pop over to Another Part Of The Forest and visit one of the very newest: the Crowle Peatland Railway. It’s still only about three and a half years since the CPR first started laying track; only about ten years since the railway’s founders first conceived of the idea, and less than a year since passengers have been able to ride on it.

The CPR was built to preserve the memory of a very specific, industrial type of narrow-gauge railway, one that is associated more with Ireland than with Britain. The Crowle Peatlands are part of one of the largest lowland bogs in England, the Thorne Moors, on the boundary between North Lincolnshire and South Yorkshire. Drained from natural wetland by the engineer Cornelis Vermuyden in the late 1620s, from the mid-19th century the area started to be mined for peat, with a network of railways bringing the peat from the moors to factories at their edges for processing and transshipment. For many decades these railways used horse haulage, but from around 1950 the peat company switched to petrol and diesel locomotives.

Part of Thorne Moors in the early 1950s

This map shows the Moors at around the time the railways started using locomotive power: you can see the lines of railways across the moors, following the drainage canals, making sharp, almost-hairpin corners. Even in the 1980s there were still between 15 and 20 miles of narrow gauge railway across Thorne Moors, remaining in use until peat mining ended around 2000; the newest locomotive on the railway was bought new as late as 1991.

The preserved railway is in the area shown on that map as “Ribbon Row”, west of Crowle, reached along a narrow, dead-straight lane across the flat landscape of the Moors. As soon as you are outside the town, it feels disconnected, remote, outside the normal world entirely. As I drove, in my mirrors, I saw a young deer crossing the road behind me.

To date, the railway has a café with very nice home-made cake, a maintenance shed, and a straight line of track stretching out across the moor. The shed is full of the sort of small diesel locomotives that worked on the moors from the 1950s onwards—and also, a Portuguese tram.

Inside the maintenance shed

I think the loco on the left is Schöma 5130 of 1990; the diamond-shaped plate is the builder’s plate of Alan Keef of Ross-on-Wye, to mark the loco being rebuilt in the late 1990s with a more powerful engine, which you can see here.

Under the bonnet

The newest loco to run on the peat railways was named after a retired member of staff in the early 1990s, shortly after it was built.

The Thomas Buck

The railway’s other locomotive is a Motor-Rail Simplex built in 1967 and abandoned around 1996 due to worn bearings.

The Simplex loco Little Peat

It’s a nice little engine, much less powerful than the newer ones, but I’m not sure what I think of lime green as a locomotive colour.

On a map—if it had made it onto any maps yet—the railway would no doubt appear as a dead straight line; but on the ground it follows gentle curves and undulations. At 3ft gauge the track feels wide for a narrow-gauge line, especially given the small size of the powered trolley that takes you out onto the moor. On my run I was the only passenger, with a driver and guard to look after me, as we pottered out along the track, surrounded by wildflowers pressing close up to the track, fat bumblebees buzzing close to the car as we went.

The driver looks out at the road ahead

No stations; no platforms; no sidings. No signals other than a fixed distant sign warning that the track will run out soon. Just a single line of track, which stops. Back we go; with me and the guard having the best view this time, although I tried to make myself thinner and not block the driver’s line of sight.

Heading back towards the shed

That photo shows practically the whole railway, with the shed visible in the distance. We trundled back in, feeling far faster than the handful of miles per hour we were actually doing. Retracing our steps, back to the yard.

Coming in to the shed

The Crowle Peatland Railway isn’t somewhere I’ll be going back to again that soon, because for now, I’ve seen all there is to see. But it is a nice little place to visit, a reminder that everyone starts somewhere, and that sometimes a railway is just a stretch of line running out into a field. It’s open and running trains about one weekend a month at the moment, and it’s worth visiting for the cakes. Or, indeed, if you just want to be out in a landscape where the horizon is straight as a ruler, and there are few noises beyond the wind blowing the grass.

Going through things one by one

Or, a coding exercise

One of my flaws is that as soon as I’m familiar with something, I assume it must be common knowledge. I love tutoring and mentoring people, but I’m bad at pitching exactly where their level might be, and in working out what they might not have come across before. Particularly, in my career, software development is one of those skills where beyond a certain base level nearly all your knowledge is picked up through osmosis and experience, rather than through formal training. Sometimes, when I’m reviewing my team’s code I come across things that surprise me a little. That’s where this post comes from, really: a few months back I spotted something in a review and realised it wouldn’t work.

This post is about C#, so apologies to anyone with no interest in coding in general or C# in particular; I’ll try to explain this at a straightforward level, so that even if you don’t know the language you can work out what’s going on. First, though, I have to explain a few basics. That’s because there’s one particular thing in C# (in .NET, in fact) that you can’t do, that people learn very early on that you can’t do, and you have to find workarounds for. This post is about a very similar situation, which doesn’t work for the same reason, but that isn’t necessarily immediately obvious even to an experienced coder. In order for you to understand that, I’m going to explain the well-known case first.

Since its first version over twenty years ago, C# has had the concept of “enumerables” and “enumerators”. An enumerable is essentially something that consists of a set of items, all of the same type, that you can process or handle one-by-one. An enumerator is a thing that lets you do this. In other words, you can go to an enumerable and say “can I have an enumerator, please”, and you should get an enumerator that’s linked to your enumerable. You can then keep saying to the enumerator: “can I have the next thing from the enumerable?” until the enumerator tells you there’s none left.

This is all expressed in the methods IEnumerable<T>.GetEnumerator()* and IEnumerator<T>.MoveNext(), not to mention the IEnumerator<T>.Current property, which nobody ever actually uses. In fact, the documentation explicitly recommends you don’t use them, because they have easier wrappers. For example, the foreach statement.

List<string> someWords = new List<string>() { "one", "two", "three" };
foreach (string word in someWords)
{
    Process(word);
}

Under the hood, this is equivalent** to:

List<string> someWords = new List<string>() { "one", "two", "three" };
IEnumerator<string> wordEnumerator = someWords.GetEnumerator();
while (wordEnumerator.MoveNext())
{
    string word = wordEnumerator.Current;
    Process(word);
}

The foreach statement is essentially using a hidden enumerator that the programmer doesn’t need to worry about.

The thing that developers generally learn very early on is that you can’t modify the contents of an enumerable whilst it’s being enumerated. Well, you can, but your enumerator will be rendered unusable. On your next call to the enumerator, it will throw an exception.

// This code won't work
List<string> someWords = new List<string>() { "one", "two", "three" };
foreach (string word in someWords)
{
    if (word.Contains('e'))
    {
        someWords.Remove(word);
    }
}

This makes sense, if you think about it: it’s reasonable for an enumerator to be able to expect that it’s working on solid ground, so to speak. If you try to jiggle the carpet underneath it, it falls over, because it might not know where to step next. If you want to do this using a foreach, you will need to do it some other way, such as by making a copy of the list.

List<string> someWords = new List<string>() { "one", "two", "three" };
List<string> copy = someWords.ToList();
foreach (string word in copy)
{
    if (word.Contains('e'))
    {
        someWords.Remove(word);
    }
}

So, one of my colleagues was in this situation, and came up with what seemed like a nice, clean way to handle this. They were going to use the LINQ API to both make the copy and do the filtering, in one go. LINQ is a very helpful API that gives you filtering, projection and aggregate methods on enumerables. It’s a “fluent API”, which means it’s designed for you to be able to chain calls together. In their code, they used the Where() method, which takes an enumerable and returns an enumerable containing the items from the first enumerable which matched a given condition.

// Can you see where the bug is?
List<string> someWords = new List<string>() { "one", "two", "three" };
IEnumerable<string> filteredWords = someWords.Where(w => w.Contains('e'));
foreach (string word in filteredWords)
{
    someWords.Remove(word);
}

This should work, right? We’re not iterating over the enumerable we’re modifying, we’re iterating over the new, filtered enumerable. So why does this crash with the same exception as the previous example?

The answer is that LINQ methods—strictly speaking, here, we’re using “LINQ-To-Objects”—don’t return the same type of thing as their parameter. They return an IEnumerable<T>, but they don’t guarantee exactly what implementation of IEnumerable<T> they might return. Moreover, in general, LINQ prefers “lazy evaluation”. This means that Where() doesn’t actually do the filtering when it’s called—that would be a very inefficient strategy on a large dataset, because you’d potentially be creating a second copy of the dataset in memory. Instead, it returns a wrapper object, which doesn’t actually evaluate its filter until something tries to enumerate it.

In other words, when the foreach loop iterates over filteredWords, filteredWords isn’t a list of words itself. It’s an object that, at that point, goes to its source data and thinks: “does that match? OK, pass it through.” And the next time: “does that match? No, next. Does that match? Yes, pass it through.” So the foreach loop is still, ultimately, triggering one or more enumerations of someWords each time we go around the loop, even though it doesn’t immediately appear to be used.
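If you want to see the deferred evaluation on its own, without the exception getting in the way, a little standalone example shows it:

List<string> someWords = new List<string>() { "one", "two", "three" };
IEnumerable<string> filteredWords = someWords.Where(w => w.Contains('e'));

// Nothing has been filtered yet; filteredWords is just a wrapper around someWords.
someWords.Add("theme");

// The filter runs now, against the current contents of the list, so "theme"
// appears even though it was added after the Where() call.
foreach (string word in filteredWords)
{
    Console.WriteLine(word);    // prints "one", "three", "theme"
}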

What’s the best way to fix this? Well, in this toy example, you really could just do this:

someWords = someWords.Where(w => !w.Contains('e')).ToList();

which gets rid of the loop completely. If you can’t do that for some reason—and I can’t remember why we couldn’t do that in the real-world code this is loosely based on—you can add a ToList() call onto the line creating filteredWords, forcing evaluation of the filter at that point. Or, you could avoid a foreach loop a different way by converting it to a for loop, which is a bit more flexible than a foreach and in this case would save memory slightly; the downside is a bit more typing and that your code becomes prone to subtle off-by-one errors if you don’t think it through thoroughly. There’s nearly always more than one way to do something like this, and they all have their own upsides and downsides.
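For what it’s worth, here’s roughly what the for loop version might look like: counting down through the list is the easiest way to dodge the index-shifting trap.

List<string> someWords = new List<string>() { "one", "two", "three" };

// Counting down means that removing an item never shifts the position of any
// item we haven't visited yet, so the index never goes stale.
for (int i = someWords.Count - 1; i >= 0; i--)
{
    if (someWords[i].Contains('e'))
    {
        someWords.RemoveAt(i);
    }
}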

As I said at the start, I spotted the issue here straightaway just by reading the code, not by trying to run it. If I hadn’t spotted it inside somebody else’s code, I wouldn’t even have thought to write a blog post on something like this. There are always going to be people, though, who didn’t realise that the code would behave like this because they hadn’t really thought about how LINQ works; just as there are always developers who go the other way and slap a ToList() on the end of the LINQ chain because they don’t understand how LINQ works but have come across this problem before and know that ToList() fixed it. Hopefully, some of the people who read this post will now have learned something they didn’t know before; and if you didn’t, I hope at least you found it interesting.

* Note: for clarity I’m only going to use the generic interface in this post. There is also a non-generic interface, but as only the very first versions of C# didn’t support generics, we really don’t need to worry about that. If you write your own enumerable you’re still required to support the non-generic interface, but you can usually do so with one line of boilerplate: IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

** In recent versions of C#, at any rate. In earlier versions, the equivalence was slightly different. The change was a subtle but potentially breaking one, causing a change of behaviour in cases where the loop variable was captured by a lambda expression.

The Paper Archives (part three)

The title of this series is maybe not quite as suitable as it was

The previous post in this series is here.

Sometimes, sorting through the accumulated junk that fills my mother’s house, I come across things that I remember from my childhood. For example: alongside the stack of modern radio transceivers that my dad used to speak to random strangers over the airwaves, is the radio I remember being my Nanna’s kitchen radio, sitting on top of the fridge.

The old kitchen radio

It’s a big, clunky thing for a portable, its frame made of leather-covered plywood. I know it has valves (or tubes) inside, not transistors, because I remember my dad having to source spare valves for it and plug them in back when my Nanna still used it daily—he was the only person in the family who knew how to work out which of the valves had popped when it stopped working.

With only a vague idea how old it might be, I looked at the tuning dial to see if it would give me any clues.

The tuning dial

Clearly from before the Big BBC Renaming of the late 1960s. I’m not sure how much it can be trusted for dating, though, as Radio Athlone officially changed to Radio Éireann in the 1930s, but I was fairly sure the radio probably wasn’t quite that old. Of course, I should really have been looking at the bottom.

The makers' plate

And of course the internet can tell you exactly when a Murphy BU183M was first sold: 1956, a revision of the 1952 BU183, which had the same case. The rather more stylish B283 model came out the following year, so I suspect not that many of the BU183M were made.

I’m intrigued by the wide range of voltages it can run off: nowadays that sort of input voltage range is handled simply and automatically by power electronics, but in the 1950s you had to open your radio up and make sure the transformer was set correctly before you tried to plug it in, just in case you were about to blow yourself up otherwise. I suppose this is what radio shops were for, to do that for you, and potentially to hire out the large, chunky high-voltage batteries you might need if you didn’t have mains electricity. This radio is from the last years of the valve radio: low-voltage transistor sets were about to enter the marketplace and completely change how we listened to music. This beast—or the B283, which at least looks like an early transistor radio—needed a 90-volt battery to heat up the valves if you wanted to run them without mains power, not the sort of battery you can easily carry around in your handbag. The world has changed a lot in seventy years.