+++*

Symbolic Forest

A homage to loading screens.

Blog

Modern technology

Or, keeping the site up to date

Well, hello there! This site has been on somethng of a hiatus since last summer, for one reason and another. There’s plenty to write about, there’s plenty going on, but somehow I’ve always been too busy, too distracted, too many other things going on to sit down and want to write a blog post. Moreover, there are more technical reasons that I’ve felt I needed to get resolved too.

This site has never been a “secure site”. By that I mean, the connection between the website’s server and your browser has never been encrypted; anyone with access to the network in-between can see what you’re looking at. Alongside that, there’s no way for you to be certain that you’re looking at my genuine site, that the connection from your browser or device is actually going to me, not just to someone pretending to be me. Frankly, I’d never thought, for the sort of nonsense I post here, that it was very important. You’re not going to be sending me your bank details or your phone number; since the last big technical redesign, all of four years ago now, you haven’t been able to send me anything at all because I took away the ability to leave a comment. After that redesign was finished, “turn the site into a secure site” was certainly on the to-do list, but never very near the top of it. For one thing, I doubt anyone would ever want to impersonate me.

That changed a bit, though, in the last few months. There has been a concerted effort from the big browser companies to push users away from accessing sites that don’t use encryption. This website won’t shop up for you in search results any more. Some web browsers will show you an error page if you go to the site, and you have to deliberately click past a warning telling you, in dire terms, that people might interfere with your traffic or capture your credit card number. That’s not really a risk for this site, but the general trend has been to push non-technical users towards thinking that all non-encrypted sites are all extremely dangerous to the same degree. It might be a bit debateable, but it’s easy and straightforward for them to do, and it does at least avoid any confusion for the users, avoids them having to make any sort of value judgement about a technical issue they don’t properly understand. The side effect: it puts a barrier in front of actually viewing this site. To get over that barrier, I’d have to implement TLS security.

After I did the big rewrite, switching this site over from Wordpress to a static site generator back in 2020, I wrote a series of blog posts about the generation engine I was using and the work pipeline I came up with. What I didn’t talk about very much was how the site was actually hosted. It was hosted using Azure Storage, which lets you expose a set of files to the internet as a static website very cheaply. Azure Storage supports using TLS encryption for your website, and it supports you hosting it under a custom domain like symbolicforest.com. Unfortunately, it doesn’t very easily let you do both at the same time; you have to put a Content Delivery Network in front of your Storage container, and terminate your TLS connection on the CDN. It’s certainly possible to do, and if this was the day job then I’d happily put the parts together. For this site, though, a weird little hobby site that I don’t sometimes update for months or years at a time, it felt like a fiddly and expensive way to go.

During the last four years, though, Microsoft have introduced a new Azure products which falls somewhere in-between the Azure Storage web-hosting functionality and the fully-featured hosting of Azure App Service. This is Azure Static Web Apps, which can host static files in a similar way to Azure Storage, but with a control panel interface more like Azure App Service. Moreover, Static Web Apps feature TLS support for custom domains, out of the box, for free. This is a far cry from 20-something years ago, when I remember having to get a solicitor to prove my identity before someone would issue me with a (very expensive) TLS certificate; according to the documentation, it Just Works with no real configuration needed at all. Why not, I thought, give it a bit of a try?

With Azure Storage, you dump the files you want to serve as objects in an Azure Blob Storage container and away you go. With an App Service, you can zip up the files that form your website and upload them. Azure Static Web Apps are a bit more complex than this: they only support deployment via a CI/CD pipeline from a supported source repository hosting service. For, say, Github, Azure tries to automate it as much as possible: you link the Static Web App to your Github account, specify the repository, and Azure will create an Action which is run on updates to the main branch, and which uses Microsoft Oryx to build your site and push the build artefacts into the web app. I’m sure you could manually pull apart what Oryx does, get the web app’s security token from the Azure portal, and replicate this whole process manually, but the goal is clearly that you use a fully automated workflow.

My site had never been set up with an automated workflow: that was another “nice to have” which had never been that high on the priority list. Instead, my deployment technique was all very manual: once I had a version of the site I wanted to deploy in my main branch—whose config was set up for local running—I would merge that branch into a deploy branch which contained the production config, manually run npm run clean && npm run build in that branch, and then use a tool to upload any and all new or changed files to the Azure Storage container. Making sure this all worked inside a Github Action took a little bit of work: changing parts of the site templates, for example, to make sure that all paths within the site were relative so that a single configuration file could handle both local and production builds. I also had to make sure that the top-level npm run build script also called npm install for each subsite, including for the shared Wintersmith plugins, so that the build would run on a freshly-cloned repository without any additional steps. With a few other little tweaks to match what Oryx expected—such as the build output directory being within the source directory instead of alongside it—everything would build cleanly inside a Github action runner.

It was here I hit the major issue. One of the big attractions of Azure Static Web Apps is that they’re free! Assuming you only want a personal site, with a couple of domain names, they’re free! Being from Northern England, I definitely liked that idea. However, free Static Web Apps also have a size limit of 250Mb. Oryx was hitting it.

This site is an old site, after all. There are just over a thousand posts on here, at the time of writing,* some of them over twenty years old. You can go back through all of them, ten at a time, from the home page; or you can go through them all by category; or month by month; or there are well over 3,000 different tags. Because this site is hosted through static pages, that means the text of each post is repeated inside multiple different HTML files, as well as each post having its own individual page. All in all, that adds up to about 350Mb of data to be hosted. I have to admit, that’s quite a lot. An average of 350Kb or so per post—admittedly, there are images in there which bump that total up a bit.

In the short term, this is fixable, in theory. Azure Static Web Apps offer two Hosting Plans at present. The free one, with its 250Mb limit, and a paid one. The paid one has a 500Mb limit, which should be enough for now. In the longer term, I might need to look at solutions to reduce the amount of space per post, but for now it would work. It wasn’t that expensive, either, so I signed up. And found that…Oryx still fell over. Instead of clearly hitting a size limit, I was getting a much vaguer error message. Failure during content distribution. That’s not really very helpful; but I could see two things. Firstly, this only occurred when Oryx was deploying to my production environment, not to the staging environment, so the issue wasn’t in my build artefacts. Secondly, it always occurred just as the deployment step passed the five-minute-runtime mark—handily, it printed a log message every 15 seconds which made that nice and easy to spot. The size of the site seemed to be causing a timeout.

The obvious place to try to fix this was with the tag pages, as they were making up over a third of the total file size. For comparison, all of the images included in articles were about half, and the remaining sixth, roughly speaking, covered everything else including the individual article pages. I tried cutting the article text out of the tag pages, assuming readers would think to click through to the indivdual articles if they wanted to read them, but the upload still failed. However, I did find a hint in a Github issue, suggesting that the issue could also occur for uploads which changed lots of content. I built the site with no tag pages at all, and the upload worked. I rebuilt it with them added in again, and it still worked.

Cutting the article text out of the tag pages has only really reduced the size to about 305Kb per post, so for the long term, I am definitely going to have to do more to ensure that I can keep blogging for as long as I like without hitting that 500Mb size limit. I have a few ideas for things to do on this, but I haven’t really measured how successful they will be likely to be. Also, the current design requires pretty much every single page on the site to change when a new post is added, because of the post counts on the by-month and by-category archive pages. That was definitely a nuisance when I was manually uploading the site after building it locally; if it causes issues with the apparent 5-minute timeout, it may well prove to be a worse problem for a Static Web App. I have an idea on how to work around this, too; hopefully it will work well.

Is this change a success? Well, it’s a relatively simple way to ensure the site is TLS-secured whilst still hosting it for a relatively cheap cost, and it didn’t require too much in the way of changes to fit it in to my existing site maintenance processes. The site feels much faster and more responsive, subjectively to me, than the Azure Storage version did. There are still more improvements to do, but they are improvements that would likely have been a good idea in any case; this project is just pushing them further to the top of the heap. Hopefully it will be a while before I get near the next hosting size limit; and in the meantime, this hosting change has forced me to improve my own build-and-deploy process and make it much smoother than it was before. You never know…maybe I’ll even start writing more blog posts again.

* If I’ve added up correctly, this is post 1,004.