+++*

Symbolic Forest

A homage to loading screens.

Blog : Posts tagged with ‘development’

Going through things one by one

Or, a coding exercise

One of my flaws is that as soon as I’m familiar with something, I assume it must be common knowledge. I love tutoring and mentoring people, but I’m bad at pitching exactly where their level might be, and in working out what they might not have come across before. Particularly, in my career, software development is one of those skills where beyond a certain base level nearly all your knowledge is picked up through osmosis and experience, rather than through formal training. Sometimes, when I’m reviewing my team’s code I come across things that surprise me a little. That’s where this post comes from, really: a few months back I spotted something in a review and realised it wouldn’t work.

This post is about C#, so apologies to anyone with no interest in coding in general or C# in particular; I’ll try to explain this at a straightforward level, so that even if you don’t know the language you can work out what’s going on. First, though, I have to explain a few basics. That’s because there’s one particular thing in C# (in .NET, in fact) that you can’t do, that people learn very on that you can’t do, and you have to find workarounds for. This post is about a very similar situation, which doesn’t work for the same reason, but that isn’t necessarily immediately obvious even to an experienced coder. In order for you to understand that, I’m going to explain the well-known case first.

Since its first version over twenty years ago, C# has had the concept of “enumerables” and “enumerators”. An enumerable is essentially something that consists of a set of items, all of the same type, that you can process or handle one-by-one. An enumerator is a thing that lets you do this. In other words, you can go to an enumerable and say “can I have an enumerator, please”, and you should get an enumerator that’s linked to your enumerable. You can then keep saying to the enumerator: “can I have the next thing from the enumerable?” until the enumerator tells you there’s none left.

This is all expressed in the methods IEnumerable<T>.GetEnumerator()* and IEnumerator<T>.MoveNext(), not to mention the IEnumerator<T>.Current property, which nobody ever actually uses. In fact, the documentation explicity recommends you don’t use them, because they have easier wrappers. For example, the foreach statement.

List<string> someWords = new List<string>() { "one", "two", "three" };
foreach (string word in someWords)
{
    Process(word);
}

Under the hood, this is equivalent** to:

List<string> someWords = new List<string>() { "one", "two", "three" };
IEnumerator<string> wordEnumerator = someWords.GetEnumerator();
while (wordEnumerator.MoveNext())
{
    string word = wordEnumerator.Current;
    Process(word);
}

The foreach statement is essentially using a hidden enumerator that the programmer doesn’t need to worry about.

The thing that developers generally learn very early on is that you can’t modify the contents of an enumerable whilst it’s being enumerated. Well, you can, but your enumerator will be rendered unusable. On your next call to the enumerator, it will throw an exception.

// This code won't work
List<string> someWords = new List<string>() { "one", "two", "three" };
foreach (string word in someWords)
{
    if (word.Contains('e'));
    {
        someWords.Remove(word);
    }
}

This makes sense, if you think about it: it’s reasonable for an enumerator to be able to expect that it’s working on solid ground, so to speak. If you try to jiggle the carpet underneath it, it falls over, because it might not know where to step next. If you want to do this using a foreach, you will need to do it some other way, such as by making a copy of the list.

List<string> someWords = new List<string>() { "one", "two", "three" };
List<string> copy = someWords.ToList();
foreach (string word in copy)
{
    if (word.Contains('e'));
    {
        someWords.Remove(word);
    }
}

So, one of my colleagues was in this situation, and came up with what seemed like a nice, clean way to handle this. They were going to use the LINQ API to both make the copy and do the filtering, in one go. LINQ is a very helpful API that gives you filtering, projection and aggregate methods on enumerables. It’s a “fluent API”, which means it’s designed for you to be able to chain calls together. In their code, they used the Where() method, which takes an enumerable and returns an enumerable containing the items from the first enumerable which matched a given condition.

// Can you see where the bug is?
List<string> someWords = new List<string>() { "one", "two", "three" };
IEnumerable<string> filteredWords = someWords.Where(w => w.Contains('e'));
foreach (string word in filteredWords)
{
    someWords.Remove(word);
}

This should work, right? We’re not iterating over the enumerable we’re modifying, we’re iterating over the new, filtered enumerable. So why does this crash with the same exception as the previous example?

The answer is that LINQ methods—strictly speaking, here, we’re using “LINQ-To-Objects”—don’t return the same type of thing as their parameter. They return an IEnunerable<T>, but they don’t guarantee exactly what implementation of IEnumerable<T> they might return. Moreover, in general, LINQ prefers “lazy evaluation”. This means that Where() doesn’t actually do the filtering when it’s called—that would be a very inefficient strategy on a large dataset, because you’d potentially be creating a second copy of the dataset in memory. Instead, it returns a wrapper object, which doesn’t actually evaulate its filter until something tries to enumerate it.

In other words, when the foreach loop iterates over filteredWords, filteredWords isn’t a list of words itself. It’s an object that, at that point, goes to its source data and thinks: “does that match? OK, pass it through.” And the next time: “does that match? No, next. Does that match? Yes, pass it through.” So the foreach loop is still, ultimately, triggering one or more enumerations of someWords each time we go around the loop, even though it doesn’t immediately appear to be used.

What’s the best way to fix this? Well, in this toy example, you really could just do this:

someWords = someWords.Where(w => !w.Contains('e')).ToList();

which gets rid of the loop completely. If you can’t do that for some reason—and I can’t remember why we couldn’t do that in the real-world code this is loosely based on—you can add a ToList() call onto the line creating filteredWords, forcing evaluation of the filter at that point. Or, you could avoid a foreach loop a different way by converting it to a for loop, which are a bit more flexible than a foreach and in this case would save memory slightly; the downside is a bit more typing and that your code becomes prone to subtle off-by-one errors if you don’t think it through thoroughly. There’s nearly always more than one way to do something like this, and they all have their own upsides and downsides.

I said at the start, I spotted the issue here straightaway just by reading the code, not by trying to run it. If I hadn’t spotted it inside somebody else’s code, I wouldn’t even have thought to write a blog post on something like this. There are always going to be people, though, who didn’t realise that the code would behave like this because they hadn’t really thought about how LINQ works; just as there are always developers who go the other way and slap a ToList() on the end of the LINQ chain because they don’t understand how LINQ works but have come across this problem before and know that ToList() fixed it. Hopefully, some of the people who read this post will now have learned something they didn’t know before; and if you didn’t, I hope at least you found it interesting.

* Note. for clarity I’m only going to use the generic interface in this post. There is also a non-generic interface, but as only the very first versions of C# didn’t support generics, we really don’t need to worry about that. If you write your own enumerable you’re still required to support the non-generic interface, but you can usually do so with one line of boilerplate: public IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

** In recent versions of C#, at any rate. In earlier versions, the equivalence was slightly different. The change was a subtle but potentially breaking one, causing a change of behaviour in cases where the loop variable was captured by a lambda expression.

Curious problem

In which we have an obscure font problem, in annoyingly specific circumstances

Only a day after the new garden blog went live, I found myself with a problem. This morning, I noticed a problem with it, on K’s PC. Moreover, it was only a problem on K’s PC. On her PC, in Firefox and in IE, the heading font was hugely oversized compared to the rest of the page. In Chrome, everything was fine.

Now, I’d tested the site in all of my browsers. On my Windows PC, running Window 7 just like K’s, there were no problems in any of the browsers I’d tried. On my Linux box, all fine; on my FreeBSD box, all fine. But on K’s PC, apart from in Chrome, the heading font was completely out. Whether I tried setting an absolute size or a relative size, the heading font was completely out.

All of the fonts on the new site are loaded through the Google Webfonts API, because it’s nice and simple and practically no different to self-hosting your fonts. Fiddling around with it, I noticed something strange: it wasn’t just a problem specific to K’s PC, it was a problem specific to this specific font. Changing the font to anything else: no problems at all. With the font I originally chose: completely the wrong size on the one PC. Bizarre.

After spending a few hours getting more and more puzzled and frustrated, I decided that, to be frank, I wasn’t that attached to the specific font. So, from day 2, the garden blog is using a different font on its masthead. The old one – for reference, “Love Ya Like A Sister” by Kimberly Geswein – was abandoned, rather than wrestle with getting it to render at the right size on every computer out there. The replacement – “Cabin Sketch” by Pablo Impallari – does that reliably, as far as I’ve noticed;* and although it’s different it fits in just as well.

* this is where someone writes in and says it looks wrong on their Acorn Archimedes, or something along those lines.

Not In My Back Garden

In which we talk about redevelopment and green space

Having just moved house, we’re very aware right now that in the south-west, affordable housing is hard to find. It might be getting harder, too. Yesterday’s news included an announcement that local councils will be able to block developments on garden land.

Note that the article there is rather optimistic as to whether that type of development will be stopped. It won’t be; the decision on whether to allow it will be devolved to local government, which is in democratic terms a Good Thing that’s hard to argue against. In practical terms, though, it means that developments will be stopped in areas where residents have the means and inclination to be influential and to lean on their councillors; and will be concentrated in areas where nobody’s going to complain. In other words, another polarisation policy, to increase the economic differentiation of our towns and suburbs.

At first sight, I thought, it sounds like it might be a good idea. After all, I grew up in a leafy suburb, built in a time and place when housing plots included reasonable gardens, and so I quite enjoy tree-lined avenues and verdant cul-de-sacs that help you forget you’re in a city. But, thinking about it, I’m not so sure it’s a good idea. Verdant cul-de-sacs are nice, but affordable housing is better. A blanket ban on building over gardens isn’t what’s needed; what would be more useful is a more general control on maximum density of housing. If the planning regulations included a rule that every X square metres of new housing must include Y square metres of private or public garden space, then developers would be as free as they liked to demolish old houses and replace them with flats; the open space and the greenery would be preserved, just in a slightly different form.

It doesn’t take much, after all, to give an area the greenery it needs. Symbolic Towers, from the front, has no green space at all, one house in a line of terrace with virtually every front yard concreted, tiled or gravelled over.* At the back, we only have a small square of garden, too. But despite its small size, the garden and the gardens alongside are a quiet, peaceful, green space, sheltered from the inner city with trees and bushes.

It’s easy to forget, when a development is fresh and harsh, how time mellows a landscape. As I said, I grew up in a tree-lined surburban estate, and that’s how it is in my memory. When I look back at photos from my childhood, though, I’m shocked by how bare it looks. There’s hardly any greenery to be seen: it’s a stark landscape of red-brick houses, bare, plain lawns and sticky saplings staked into the ground here and there. In my memory it’s always as it is now, those saplings all fleshed out into fully-grown trees, and gardens grown up to fill in the spaces.** We forget that gardens take time to grow and mature; we forget, indeed, that Britain has no such thing as natural countryside at all, even our “ancient woodlands” being to some extent man-made.*** Developing on garden space isn’t necessarily a bad thing, so long as some green space remains; and it’s easy for “we don’t want to lose the green space next door” to be a cover for “we don’t want flats that just anyone can afford next door!” If we have rules that ensure that some green space will remain, we can redevelop our cities in a sensible and healthy way. And in thirty years time, those new flats will be surrounded by greenery, and people will wonder that their street was ever any different.

* Do not ask about the gravel. Unless, that is, you would like some free gravel.

** Memo to my parents, 30 years ago: think twice about moving into a house with a horse chestnut sapling planted at the end of the driveway, because before it’s a third fully-grown it will already have buggered up the drains.

*** They are still ancient, of course. But pollen analysis shows firstly that their mixture of trees is rather different to the genuine primaeval forest that grew up between the end of the recent ice age and the start of farming; and, secondly, that we probably have rather more woodland today than we did 2,000 years or so ago.

Masochism

In which we go back to BASICs

No, I’m not a masochist.

I take a strange, geeky, masochistic pleasure, though, in making things hard for myself. In doing computer-based things the long way round. In solving the problems that are probably easy for some people, but hard for me. In learning new things just because it’s a new challenge.

Today, I was wrestling with a piece of Basic code in an Excel spreadsheet. I’ve not touched Basic since it had line numbers, which is a long long time ago, and I barely know any of it. I forced myself to work out, though, how to do what I wanted.* It was mentally hard work, and meant a lot of looking back and forth to the help pages, but I got it done in the end. It might not be written in the best way, the most efficient way, or the most idiomatic way.** But doing it was, strangely, fun.

* or, rather, what the consultant I was assisting wanted.

** for non-geeks: every computer language or system has its own programming idioms, which fit certain ways of programming particular problems. Someone used to language A will, on switching to language Z, often keep on programming in language A’s style even if this produces ugly and inefficient code in the other language.