+++*

Symbolic Forest

A homage to loading screens.

Blog : Posts tagged with ‘.NET Framework’

Going through things one by one

Or, a coding exercise

One of my flaws is that as soon as I’m familiar with something, I assume it must be common knowledge. I love tutoring and mentoring people, but I’m bad at pitching exactly where their level might be, and in working out what they might not have come across before. Particularly, in my career, software development is one of those skills where beyond a certain base level nearly all your knowledge is picked up through osmosis and experience, rather than through formal training. Sometimes, when I’m reviewing my team’s code I come across things that surprise me a little. That’s where this post comes from, really: a few months back I spotted something in a review and realised it wouldn’t work.

This post is about C#, so apologies to anyone with no interest in coding in general or C# in particular; I’ll try to explain this at a straightforward level, so that even if you don’t know the language you can work out what’s going on. First, though, I have to explain a few basics. That’s because there’s one particular thing in C# (in .NET, in fact) that you can’t do, that people learn very on that you can’t do, and you have to find workarounds for. This post is about a very similar situation, which doesn’t work for the same reason, but that isn’t necessarily immediately obvious even to an experienced coder. In order for you to understand that, I’m going to explain the well-known case first.

Since its first version over twenty years ago, C# has had the concept of “enumerables” and “enumerators”. An enumerable is essentially something that consists of a set of items, all of the same type, that you can process or handle one-by-one. An enumerator is a thing that lets you do this. In other words, you can go to an enumerable and say “can I have an enumerator, please”, and you should get an enumerator that’s linked to your enumerable. You can then keep saying to the enumerator: “can I have the next thing from the enumerable?” until the enumerator tells you there’s none left.

This is all expressed in the methods IEnumerable<T>.GetEnumerator()* and IEnumerator<T>.MoveNext(), not to mention the IEnumerator<T>.Current property, which nobody ever actually uses. In fact, the documentation explicity recommends you don’t use them, because they have easier wrappers. For example, the foreach statement.

List<string> someWords = new List<string>() { "one", "two", "three" };
foreach (string word in someWords)
{
    Process(word);
}

Under the hood, this is equivalent** to:

List<string> someWords = new List<string>() { "one", "two", "three" };
IEnumerator<string> wordEnumerator = someWords.GetEnumerator();
while (wordEnumerator.MoveNext())
{
    string word = wordEnumerator.Current;
    Process(word);
}

The foreach statement is essentially using a hidden enumerator that the programmer doesn’t need to worry about.

The thing that developers generally learn very early on is that you can’t modify the contents of an enumerable whilst it’s being enumerated. Well, you can, but your enumerator will be rendered unusable. On your next call to the enumerator, it will throw an exception.

// This code won't work
List<string> someWords = new List<string>() { "one", "two", "three" };
foreach (string word in someWords)
{
    if (word.Contains('e'));
    {
        someWords.Remove(word);
    }
}

This makes sense, if you think about it: it’s reasonable for an enumerator to be able to expect that it’s working on solid ground, so to speak. If you try to jiggle the carpet underneath it, it falls over, because it might not know where to step next. If you want to do this using a foreach, you will need to do it some other way, such as by making a copy of the list.

List<string> someWords = new List<string>() { "one", "two", "three" };
List<string> copy = someWords.ToList();
foreach (string word in copy)
{
    if (word.Contains('e'));
    {
        someWords.Remove(word);
    }
}

So, one of my colleagues was in this situation, and came up with what seemed like a nice, clean way to handle this. They were going to use the LINQ API to both make the copy and do the filtering, in one go. LINQ is a very helpful API that gives you filtering, projection and aggregate methods on enumerables. It’s a “fluent API”, which means it’s designed for you to be able to chain calls together. In their code, they used the Where() method, which takes an enumerable and returns an enumerable containing the items from the first enumerable which matched a given condition.

// Can you see where the bug is?
List<string> someWords = new List<string>() { "one", "two", "three" };
IEnumerable<string> filteredWords = someWords.Where(w => w.Contains('e'));
foreach (string word in filteredWords)
{
    someWords.Remove(word);
}

This should work, right? We’re not iterating over the enumerable we’re modifying, we’re iterating over the new, filtered enumerable. So why does this crash with the same exception as the previous example?

The answer is that LINQ methods—strictly speaking, here, we’re using “LINQ-To-Objects”—don’t return the same type of thing as their parameter. They return an IEnunerable<T>, but they don’t guarantee exactly what implementation of IEnumerable<T> they might return. Moreover, in general, LINQ prefers “lazy evaluation”. This means that Where() doesn’t actually do the filtering when it’s called—that would be a very inefficient strategy on a large dataset, because you’d potentially be creating a second copy of the dataset in memory. Instead, it returns a wrapper object, which doesn’t actually evaulate its filter until something tries to enumerate it.

In other words, when the foreach loop iterates over filteredWords, filteredWords isn’t a list of words itself. It’s an object that, at that point, goes to its source data and thinks: “does that match? OK, pass it through.” And the next time: “does that match? No, next. Does that match? Yes, pass it through.” So the foreach loop is still, ultimately, triggering one or more enumerations of someWords each time we go around the loop, even though it doesn’t immediately appear to be used.

What’s the best way to fix this? Well, in this toy example, you really could just do this:

someWords = someWords.Where(w => !w.Contains('e')).ToList();

which gets rid of the loop completely. If you can’t do that for some reason—and I can’t remember why we couldn’t do that in the real-world code this is loosely based on—you can add a ToList() call onto the line creating filteredWords, forcing evaluation of the filter at that point. Or, you could avoid a foreach loop a different way by converting it to a for loop, which are a bit more flexible than a foreach and in this case would save memory slightly; the downside is a bit more typing and that your code becomes prone to subtle off-by-one errors if you don’t think it through thoroughly. There’s nearly always more than one way to do something like this, and they all have their own upsides and downsides.

I said at the start, I spotted the issue here straightaway just by reading the code, not by trying to run it. If I hadn’t spotted it inside somebody else’s code, I wouldn’t even have thought to write a blog post on something like this. There are always going to be people, though, who didn’t realise that the code would behave like this because they hadn’t really thought about how LINQ works; just as there are always developers who go the other way and slap a ToList() on the end of the LINQ chain because they don’t understand how LINQ works but have come across this problem before and know that ToList() fixed it. Hopefully, some of the people who read this post will now have learned something they didn’t know before; and if you didn’t, I hope at least you found it interesting.

* Note. for clarity I’m only going to use the generic interface in this post. There is also a non-generic interface, but as only the very first versions of C# didn’t support generics, we really don’t need to worry about that. If you write your own enumerable you’re still required to support the non-generic interface, but you can usually do so with one line of boilerplate: public IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

** In recent versions of C#, at any rate. In earlier versions, the equivalence was slightly different. The change was a subtle but potentially breaking one, causing a change of behaviour in cases where the loop variable was captured by a lambda expression.

Technical advice post of the week

Or, what to do with a particular compilation problem

This week, Microsoft released .NET 5, and it reminded me I’ve been meaning to post a piece of technical advice that has bitten me a few times but which doesn’t seem to be very well-documented or well-described online. It’s a piece of technical advice, though, that will slowly be fading away in relevance because it’s advice on .NET Framework; so I thought I should put it up here whilst it is still helpful to people.

(Note for non-technical readers, who are used to photos of trains and cemeteries and probably won’t find this post very interesting: .NET 5 is the latest version of .NET Core, which is the replacement for .NET Framework, hence Microsoft have dropped “Core” from the name to try to make that clearer. .NET 5 is the successor to .NET Core 3, because there were many very popular versions of .NET Framework 4.x which were and are heavily used for a long time, so Microsoft thought reusing the number 4 would be just Too Confusing. Are you less confused now?)

This problem, too, is pretty much specific to people working in teams. It only happens (well, I’ve only seen it happen) if all of the following apply:

  • you’re working in parallel in a team, on a complex system, probably that has solutions containing a relatively large number of projects
  • you’re using the MSBuild tool as part of your continuous integration pipeline, deployment process, or similar
  • you’re using Git as your version control system

The symptoms of the problem are:

  • You can open the solution in Visual Studio and build it with no problems
  • When MSBuild tries to build the solution, it immediately errors, claiming that the solution file has a syntax error on line 2.

Spoiler: there is no syntax error on line 2.

Another note for non-technical readers who are still here: what you might think of loosely as a programming project, in any kind of .NET flavour, has a primary file called a “solution” (its name ends in .sln). The solution contains one or more “projects”; each project contains code. Visual Studio can open your solution file and your project files and turn the projects into some sort of output product, such as a program, a website, a code library or whatever. However, you don’t have to use Visual Studio to do this. .NET Framework has a program called MSBuild that does the same thing. If you have automated your build process (which if you’re working in a team you probably should) and you’re using .NET Framework, your build process will probably use MSBuild to do its work. What happens here is one of a range of problems called “well it worked on my machine”. A developer has code that seems to be in a happy, working state, they upload their code to the team’s server, the automated build process runs, and the automated build process falls over and says it doesn’t work.

The cause of the problem is: two people on the team have added different projects to the solution, in parallel. Now, Git is often quite good, when two people change code at the same time, at either working out how to merge the changes together, or at least, asking you to sort the situation out manually. This, though, is a situation where Git does the wrong thing and breaks your solution file—but it breaks it in a way that only MSBuild notices, and that Visual Studio happily ignores.

The reason this happens is down to the syntax of solution files. The part which lists the projects they contain looks a bit like this:

Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Important.Project.Library", "Important.Project.Library\Important.Project.Library.csproj", "{E6FF8E04-A41D-446B-9F8A-CCFAF4B08AD2}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Important.Project", "Important.Project\Important.Project.csproj", "{9A7E2940-50B8-4F3A-A535-AB6220E6CE3A}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Important.Project.Tests", "Important.Project.Tests\Important.Project.Tests.csproj", "{68035DDB-1C24-407C-B6B3-32CEC1D964E5}"
EndProject

Don’t worry too much about what each line says: the important thing to spot is that each project has a pair of lines: a Project(...) = line that contains the important information, and an EndProject line that, er, doesn’t. The projects are in a fairly arbitrary order, too; on your screen in Visual Studio they get sorted alphabetically, but that isn’t reflected in the file, where they are in the order they were added in.

The real cause of the problem is that Git doesn’t know that every Project... has to be followed by an EndProject. So, imagine two people have added new, different projects to the solution file. Git sees this and thinks: Alice has added Project... to line 42, and Bob has added a different Project... to line 42. So I’ll make those into lines 42 and 43. Alice added EndProject to line 43, and so did Bob, so I’ll just pop that in as line 44. So you get this:

Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Alices.Library", "Alices.Library\Alices.Library.csproj", "{0902233A-3857-4E5E-99F4-54F3F5E695E5}"
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Bobs.Library", "Bobs.Library\Bobs.Library.csproj", "{56ABE9BB-1373-43D3-B1C5-1526E443AD73}"
EndProject

Visual Studio is quite unpeturbed by this. MSBuild, however, doesn’t like it at all. It reads the file, realises there’s a Project without a matching EndProject, and falls over. For some reason, it always complains that the error is on line 2, even though it isn’t anywhere near line 2.

The fix for this, as you might have guessed, is to open up the solution file in a text editor and manually enter that missing EndProject line after Alice’s project. And that’s it. Or, if you don’t feel comfortable going in and hacking your solution file directly, remember I said that Visual Studio is completely unfazed by this? You can make some sort of small change in Visual Studio that will imply a different change to the solution file: for example, tell it not to build one of the projects in one of the build configurations. Visual Studio doesn’t just change that bit, it will write out the whole file from scratch, so the problem gets silently fixed for you. Which one is less work depends on which one you’re happier doing, to be honest.

That’s the abstruse technical post over for now. Next time I write one, I’ll see if I can find something even more technically obscure.