+++*

Symbolic Forest

A homage to loading screens.

Blog

We can rebuild it! We have the technology! (part one)

Or, how many different ways can you host a website?

I said the other day I’d write something about how I rebuilt the site, what choices I made and what coding was involved. I’ve a feeling this might end up stretched into a couple of posts or so, concentrating on different areas. We’ll start, though, by talking about the tech I used to redevelop the site with, and, indeed, how websites tend to be structured in general.

Back in the early days of the web, 25 or 30 years ago now, to create a website you wrote all your code into files and pushed it up to a web server. When a user went to the site, the server would send them exactly the files you’d written, and their browser would display them. The server didn’t do very much at all, and nor did the browser, but sites like this were a pain to maintain. If you look at this website, aside from the text in the middle you’re actually reading, there’s an awful lot of stuff which is the same on every page. We’ve got the header at the top and the sidebar down below (or over on the right, if you’re reading this on a desktop PC). Moreover, look at how we show the number of posts I’ve written each month, or the number in each category. One new post means every single page has to be updated with the new count. Websites from the early days of the web didn’t have that sort of feature, because they would have been ridiculous to maintain.

The previous version of this site used Wordpress, technology from the next generation onward. With Wordpress, the site’s files contain a whole load of code that’s actually run by the web server itself: most of it written by the Wordpress developers, some of it written by the site developer. The code contains templates that control how each kind of page on the site should look; the content itself sits in a database. Whenever someone loads a page from the website, the web server runs the code for that template; the code finds the right content in the database, merges the content into the template, and sends it back to the user. This is the way that most Content Management Systems (CMSes) work, and is really good if you want your site to include features that are dynamically-generated and potentially different on every request, like a “search this site” function. However, it means your webserver is doing much more work than if it’s just serving up static and unchanging files. Your database is doing a lot of work, too, potentially. Databases are seen as a bit of an arcane art by a lot of software developers; they tend to be a bit of a specialism in their own right, because they can be quite unintuitive to get the best performance from. The more sophisticated your database server is, the harder it is to tune it to get the best performance from it, because how the database is searching for your data tends to be unintuitive and opaque. This is a topic that deserves an essay in its own right; all you really need to know right now is that database code can have very different performance characteristics when run against different sizes of dataset, not just because the data is bigger, but because the database itself will decide to crack the problem in an entirely different way. Real-world corporate database tuning is a full-time job; at the other end of the scale, you are liable to find that as your Wordpress blog gets bigger as you add more posts to it, you suddenly pass a point where pages from your website become horribly slow to load, and unless you know how to tune the database manually yourself you’re not going to be able to do much about it.

I said that’s how most CMSes work, but it doesn’t have to be that way. If you’ve tried blogging yourself you might have heard of the Movable Type blogging platform. This can generate each page on request like Wordpress does, but in its original incarnation it didn’t support that. The software ran on the webserver like Wordpress does, but it wasn’t needed when a user viewed the website. Instead, whenever the blogger added a new post to the site, or edited an existing post, the Movable Type software would run and generate all of the possible pages that were available so they could be served as static pages. This takes a few minutes to do each time, but that’s a one-off cost that isn’t particularly important, whereas serving pages to users becomes very fast. Where this architecture falls down is if that costly regeneration process can be triggered by some sort of end-user action. If your site allows comments, and you put something comment-dependent into the on-every-page parts of your template - the number of comments received next to links to recent posts, for example - then only small changes in the behaviour of your end-users hugely increase the load on your site. I understand Movable Type does now support dynamically-generated pages as well, but I haven’t played with it for many years so can’t tell you how the two different architectures are integrated together.

Nowadays most heavily-used sites, including blogs, have moved towards what I supposed you could call a third generation of architectural style, which offloads the majority of the computing and rendering work onto the user’s browser. The code is largely written using JavaScript frameworks such as Facebook’s React, and on the server side you have a number of simple “microservices” each carefully tuned to do a specific task, often a particular database query. Your web browser will effectively download the template and run the template on your computer (or phone), calling back to the microservices to load each chunk of information. If I wrote this site using that sort of architecture, for example, you’d probably have separate microservice calls to load the list of posts to show, the post content (maybe one call, maybe one per post), the list of category links, the list of month links, the list of popular tags and the list of links to other sites. The template files themselves have gone full-circle: they’re statically-hosted files and the webserver sends them back just as they are. This is a really good system for busy, high-traffic sites. It will be how your bank’s website works, for example, or Facebook, Twitter and so on, because it’s much more straightforward to efficiently scale a site designed this way to process high levels of traffic. Industrial-strength hosting systems, like Amazon Web Services or Microsoft Azure, have moved in ways to make this architecture very efficiently hostable, too. On the downside, your device has to download a relatively large framework library, and run its code itself. It also then has to make a number of round-trips to the back-end microservices, which can take some time on a high-latency connection. This is why sometimes a website will start loading, but then you’ll just have the website’s own spinning wait icon in the middle of the screen.

Do I need something quite so heavily-engineered for this site? Probably not. It’s not as if this site is intended to be some kind of engineering portfolio; it’s also unlikely ever to get a huge amount of traffic. With any software project, one of the most important things to do to ensure success is to make sure you don’t get distracted from what your requirements actually are. The requirements for this site are, in no real order, to be cheap to run, easy to update, and fun for me to work on; which also implies I need to be able to just sit back and write, rather than spend long periods of time working on site administration or fighting with the sort of in-browser editor used by most CMS systems. Additionally, because this site does occasionally still get traffic to some of the posts I wrote years ago, if possible I want to make sure posts retain the same URLs as they did with Wordpress.

With all that in mind, I’ve gone for a “static site generator”. This architecture works in pretty much the same way as the older versions of Movable Type I described earlier, except that none of the code runs on the server. Instead, all the code is stored on my computer (well, I store it in source control, which is maybe a topic we’ll come back to at another time) and I run it on my computer, whenever I want to make a change to the site. That generates a folder full of files, and those files then all get uploaded to the server, just as if it was still 1995, except nowadays I can write myself a tool to automate it. This gives me a site that is hopefully blink-and-you’ll-miss-it fast for you to load (partly because I didn’t incorporate much code that runs on your machine), that I have full control over, and that can be hosted very cheaply.

There are a few static site generators you can choose from if you decide to go down this architectural path, assuming you don’t want to completely roll your own. The market leader is probably Gatsby, although it has recently had some well-publicised problems in its attempt to repeat Wordpress’s success in pivoting from being a code firm to a hosting firm. Other popular examples are Jekyll and Eleventy. I decided to go with a slightly less-fashionable but very flexible option, Wintersmith. It’s not as widely-used as the others, but it is very small and slim and easily extensible, which for me means that it’s more fun to play with and adapt, to tweak to get exactly the results I want rather than being forced into a path by what the software can do. As I said above, if you want your project to be successful, don’t be distracted away from what your requirements originally were.

The downside to Wintersmith, for me, is that it’s written in CoffeeScript, a language I don’t know particularly well. However, CoffeeScript code is arguably just a different syntax for writing JavaScript,* which I do know, so I realised at the start that if I did want to write new code, I could just do it in JavaScript anyway. If I familiarised myself with CoffeeScript along the way, so much the better. We’ll get into how I did that; how I built this site and wrote my own plugins for Wintersmith to do it, in the next part of this post.

The next part of this post, in which we discuss how to get Wintersmith to reproduce some of the features of Wordpress, is here

* This sort of distinction—is this a different language or is this just a dialect—is the sort of thing which causes controversies in software development almost as much as it does in the natural languages. However, CoffeeScript’s official website tries to avoid controversy by taking a clear line on this: “The golden rule of CoffeeScript is: ’it’s just JavaScript’”.