+++*

Symbolic Forest

A homage to loading screens.

Blog : Posts tagged with ‘Wordpress’

We can rebuild it! We have the technology! (part one)

Or, how many different ways can you host a website?

I said the other day I’d write something about how I rebuilt the site, what choices I made and what coding was involved. I’ve a feeling this might end up stretched into a couple of posts or so, concentrating on different areas. We’ll start, though, by talking about the tech I used to redevelop the site with, and, indeed, how websites tend to be structured in general.

Back in the early days of the web, 25 or 30 years ago now, to create a website you wrote all your code into files and pushed it up to a web server. When a user went to the site, the server would send them exactly the files you’d written, and their browser would display them. The server didn’t do very much at all, and nor did the browser, but sites like this were a pain to maintain. If you look at this website, aside from the text in the middle you’re actually reading, there’s an awful lot of stuff which is the same on every page. We’ve got the header at the top and the sidebar down below (or over on the right, if you’re reading this on a desktop PC). Moreover, look at how we show the number of posts I’ve written each month, or the number in each category. One new post means every single page has to be updated with the new count. Websites from the early days of the web didn’t have that sort of feature, because they would have been ridiculous to maintain.

The previous version of this site used Wordpress, technology from the next generation onward. With Wordpress, the site’s files contain a whole load of code that’s actually run by the web server itself: most of it written by the Wordpress developers, some of it written by the site developer. The code contains templates that control how each kind of page on the site should look; the content itself sits in a database. Whenever someone loads a page from the website, the web server runs the code for that template; the code finds the right content in the database, merges the content into the template, and sends it back to the user. This is the way that most Content Management Systems (CMSes) work, and is really good if you want your site to include features that are dynamically-generated and potentially different on every request, like a “search this site” function. However, it means your webserver is doing much more work than if it’s just serving up static and unchanging files. Your database is doing a lot of work, too, potentially. Databases are seen as a bit of an arcane art by a lot of software developers; they tend to be a bit of a specialism in their own right, because they can be quite unintuitive to get the best performance from. The more sophisticated your database server is, the harder it is to tune it to get the best performance from it, because how the database is searching for your data tends to be unintuitive and opaque. This is a topic that deserves an essay in its own right; all you really need to know right now is that database code can have very different performance characteristics when run against different sizes of dataset, not just because the data is bigger, but because the database itself will decide to crack the problem in an entirely different way. Real-world corporate database tuning is a full-time job; at the other end of the scale, you are liable to find that as your Wordpress blog gets bigger as you add more posts to it, you suddenly pass a point where pages from your website become horribly slow to load, and unless you know how to tune the database manually yourself you’re not going to be able to do much about it.

I said that’s how most CMSes work, but it doesn’t have to be that way. If you’ve tried blogging yourself you might have heard of the Movable Type blogging platform. This can generate each page on request like Wordpress does, but in its original incarnation it didn’t support that. The software ran on the webserver like Wordpress does, but it wasn’t needed when a user viewed the website. Instead, whenever the blogger added a new post to the site, or edited an existing post, the Movable Type software would run and generate all of the possible pages that were available so they could be served as static pages. This takes a few minutes to do each time, but that’s a one-off cost that isn’t particularly important, whereas serving pages to users becomes very fast. Where this architecture falls down is if that costly regeneration process can be triggered by some sort of end-user action. If your site allows comments, and you put something comment-dependent into the on-every-page parts of your template - the number of comments received next to links to recent posts, for example - then only small changes in the behaviour of your end-users hugely increase the load on your site. I understand Movable Type does now support dynamically-generated pages as well, but I haven’t played with it for many years so can’t tell you how the two different architectures are integrated together.

Nowadays most heavily-used sites, including blogs, have moved towards what I supposed you could call a third generation of architectural style, which offloads the majority of the computing and rendering work onto the user’s browser. The code is largely written using JavaScript frameworks such as Facebook’s React, and on the server side you have a number of simple “microservices” each carefully tuned to do a specific task, often a particular database query. Your web browser will effectively download the template and run the template on your computer (or phone), calling back to the microservices to load each chunk of information. If I wrote this site using that sort of architecture, for example, you’d probably have separate microservice calls to load the list of posts to show, the post content (maybe one call, maybe one per post), the list of category links, the list of month links, the list of popular tags and the list of links to other sites. The template files themselves have gone full-circle: they’re statically-hosted files and the webserver sends them back just as they are. This is a really good system for busy, high-traffic sites. It will be how your bank’s website works, for example, or Facebook, Twitter and so on, because it’s much more straightforward to efficiently scale a site designed this way to process high levels of traffic. Industrial-strength hosting systems, like Amazon Web Services or Microsoft Azure, have moved in ways to make this architecture very efficiently hostable, too. On the downside, your device has to download a relatively large framework library, and run its code itself. It also then has to make a number of round-trips to the back-end microservices, which can take some time on a high-latency connection. This is why sometimes a website will start loading, but then you’ll just have the website’s own spinning wait icon in the middle of the screen.

Do I need something quite so heavily-engineered for this site? Probably not. It’s not as if this site is intended to be some kind of engineering portfolio; it’s also unlikely ever to get a huge amount of traffic. With any software project, one of the most important things to do to ensure success is to make sure you don’t get distracted from what your requirements actually are. The requirements for this site are, in no real order, to be cheap to run, easy to update, and fun for me to work on; which also implies I need to be able to just sit back and write, rather than spend long periods of time working on site administration or fighting with the sort of in-browser editor used by most CMS systems. Additionally, because this site does occasionally still get traffic to some of the posts I wrote years ago, if possible I want to make sure posts retain the same URLs as they did with Wordpress.

With all that in mind, I’ve gone for a “static site generator”. This architecture works in pretty much the same way as the older versions of Movable Type I described earlier, except that none of the code runs on the server. Instead, all the code is stored on my computer (well, I store it in source control, which is maybe a topic we’ll come back to at another time) and I run it on my computer, whenever I want to make a change to the site. That generates a folder full of files, and those files then all get uploaded to the server, just as if it was still 1995, except nowadays I can write myself a tool to automate it. This gives me a site that is hopefully blink-and-you’ll-miss-it fast for you to load (partly because I didn’t incorporate much code that runs on your machine), that I have full control over, and that can be hosted very cheaply.

There are a few static site generators you can choose from if you decide to go down this architectural path, assuming you don’t want to completely roll your own. The market leader is probably Gatsby, although it has recently had some well-publicised problems in its attempt to repeat Wordpress’s success in pivoting from being a code firm to a hosting firm. Other popular examples are Jekyll and Eleventy. I decided to go with a slightly less-fashionable but very flexible option, Wintersmith. It’s not as widely-used as the others, but it is very small and slim and easily extensible, which for me means that it’s more fun to play with and adapt, to tweak to get exactly the results I want rather than being forced into a path by what the software can do. As I said above, if you want your project to be successful, don’t be distracted away from what your requirements originally were.

The downside to Wintersmith, for me, is that it’s written in CoffeeScript, a language I don’t know particularly well. However, CoffeeScript code is arguably just a different syntax for writing JavaScript,* which I do know, so I realised at the start that if I did want to write new code, I could just do it in JavaScript anyway. If I familiarised myself with CoffeeScript along the way, so much the better. We’ll get into how I did that; how I built this site and wrote my own plugins for Wintersmith to do it, in the next part of this post.

The next part of this post, in which we discuss how to get Wintersmith to reproduce some of the features of Wordpress, is here

* This sort of distinction—is this a different language or is this just a dialect—is the sort of thing which causes controversies in software development almost as much as it does in the natural languages. However, CoffeeScript’s official website tries to avoid controversy by taking a clear line on this: “The golden rule of CoffeeScript is: ’it’s just JavaScript’”.

Performance

In which things turn to treacle

I’ve noticed, over the past few months or so, that sometimes this site seems to load rather slowly. The slow periods didn’t seem to match any spikes in my own traffic, though, so I didn’t see that there was necessarily much I could do about it; moreover, as it wasn’t this site’s traffic that seemed to be causing the problem, I wasn’t under any obligation to do anything about it.

As I’ve mentioned before, a few months back I switched to Google Analytics for my statistics-tracking. Which is all well and good; it has a lot more features than I had available previously. Its only limitation is: it uses cookies and Javascript to do its work. Because of that, it only logs visits by real people, using real browsers,* and not spiders, robots, RSS readers or nasty cracking attempts. Often, especially if you’re a marketing person, that’s exactly what you want. If you’re into the geekery, though, it can cover up what’s exactly going on, traffic-wise, at the server level.

Searching my logs, rather than looking at the Google statistics, showed that I was getting huge numbers of hits for very long URLs, consisting of valid paths joined together by lots of directories named ‘&’:

Logfile extract

That’s a screenshot of a single request in the logfile – the whole thing being about 850 characters long. ‘%26′ is an encoded ‘&’ character. Because of the way WordPress works, these things are valid URLs, and requests for them were coming in at a pretty fast rate. Before long, the request rate was faster than the page generation time – and that’s when the problem really starts to build up, because from there things snowball until nobody gets served.

All these requests were coming from a single IP address, an ordinary consumer type of address in Italy.** Moreover, the user-agent was being disguised. Each hit was coming in from the same IP address, but with a different-but-plausible-looking user-agent string, so the hits looked like a normal, ordinary browser with a real person behind it.

The problem was solved fairly easily, to be honest; and the site was soon behaving itself again. It should still be behaving itself now. But if you came here yesterday afternoon and thought the site didn’t seem to be working very well, that’s why it was. I’m going to have to keep an eye on things, to see if it starts happening again.

* and only if they have Javascript enabled, at that, although I know that covers 99% of the known world nowadays.

** which made me think to myself: “I know I’ve pissed people off … but none of them are Italian!

Classification

In which we discuss tagging and filksonomies

Another design point that’s come up as part of the Grand Redesign I keep promising you: tagging. The little bundle of links at the bottom of each post that I didn’t really think did very much.

I was a latecomer to tagging. When this site first started, it didn’t have any for the first month or so. After a while I started adding them, pointing them to Technorati. Back then, the site was running on WordPress version 1.5.something, and it didn’t have any built-in tagging support. I was trying to avoid using too many WordPress plugins, and I didn’t think that tag management (as distinct from tagging per se) mattered all that much; so I wrote all the tags manually. Like this, at the end of each post, with the <a> element repeated for each tag:

<small>Keyword noise: <a class="tag" rel="tag" href="…">tag1</a>, …</small>

Which worked, quite well; there was a visually distinct “tag” class, because I wanted tag links – which all led to Technorati back then – to be visually distinct from regular links which would go to whatever they were about.

Things move on, though, and WordPress has since gained built-in tagging functionality. Given that I’m redesigning the whole site, and putting in new built-from-scratch layout templates, I thought I may as well switch to using a more organising tagging system. For one thing, it means less typing each time I write a post. All that code up above is replaced by one little chunk in the template:

<p id="thetags"><small><?php the_tags('Keyword noise: ', ', ' ,");?></small></p>

This one covers all the tags, calling a Wordpress API function to pull them out of the database and convert them into HTML. I know all those commas and quotes look a bit confusing; but really they’re not that bad. And the point is: this is in the template, not in each post. That bit of code there only has to be written once; the previous chunk had to be typed out every time. The most awkward part is that WordPress isn’t flexible enough to let you set the class of each link individually, hence the <p class="…"> at the start.

The big change this leads to, though, is that the tag links no longer point to Technorati. Now, they point back to the site itself: tag page requests generate a page containing every post marked with that tag. And, already, that’s shown that people do indeed click on the tags. People, particularly people coming from searches, do seem to use them. Whether they find them useful or not is another matter, of course, now that they point back within the site and not to a broader variety of opinions on the matter; but they do get used.

Doing it this way means that I put more tags on each post, simply because there’s much less typing to do. Conversion, though, is going to be a bit of a job. Right now there are 760-odd posts on this site, all of which I’m having to reread and re-tag. It’s going to take a while, but hopefully the majority of it will be done by the time the new design is finished.* The only problem with this transitional phase is that: the current template is, because of its age, completely unaware of tags. So it doesn’t really know what a tag-based archive page is; so when you click on a tag, there’s no explanation as to what you’re looking at. I’m not sure if this is going to be a problem for you readers or not; and, hopefully, it’s only going to be a short-lived situation.

The word “folksonomy” has often been used to describe this sort of tagging system. I’m not sure it’s an ideal term for what I’m doing, though. “Filksonomy” might be more relevant: a bit like a folksonomy, but rather more whimsical and silly.

*** In any case, there are plenty of other parts of the new design that also need each post checking and potentially editing.

Up to date

In which we make sure everything is shiny-new

And on the subject of procrastination, I’ve finally got around to making sure this site is running on the latest version of WordPress. Hurrah! I’m normally slightly reluctant to upgrade, on the grounds that the upgrade procedure is very long and detailed, and involves deleting most of the site to reinstall the new one. So you have to take the site down,* and if anything goes wrong it might stay down. I know that doesn’t really matter for a frivolous site like this, but it makes me wary.

I’m thinking of introducing Guest Contributions to the site, partly to make sure I can keep to one post per day whilst still having time off occasionally. So, if you’d like to write maybe one blog post per month that would fit in with the style of this place, get in touch.

In other news, I’ve discovered** that classic 80s popular-science TV series The Secret Life Of Machines can now be found on Google Video. Written and presented by engineer and cartoonist Tim Hunkin, it really was a very good show that inspired an awful lot of modern “let’s see if we can make one in our garden shed” science telly. I’ve you’ve never heard of it before, go and find it.

* The latest press release from the Symbolic Forest Militant Invective Laboratories says: “Stop taking your site down – take it out for a drink instead! CaitlinMIL tests prove that 87% of .php files prefer to be taken to a quiet bar for a drink, rather than be taken offline in the ordinary way. Most .php files prefer to drink rum and coke, although plain .html will generally choose gin and tonic as its tipple.”

** Thanks to boingboing

Update, August 27th 2020: Who remembers Google Video? The Secret Life Of Machines, nowadays, has all been put onto YouTube by Mr Hunkin, with links from his website, here

Grumble grumble

In which we have problems

Well, in addition to not being able to find any of the Christmas presents I want to buy in the shops; the computer has started misbehaving. It crashed in the middle of an update, and hasn’t been working right ever since. For those of you who have been on the internet since the early 90s: I’m posting this using the text-only browser Lynx. because it was the only one I could get working quickly whilst getting the rest of the machine back on its feet.

So if anything in this post looks a bit strange, that’s because I can’t really see what I’m doing, because the text-entry widget in Lynx is a bit…

The Plain People Of The Internet: So what was the explanation for all those other posts looking a bit strange, then?

Me: Har har.

More whining posts tomorrow; or if I’m in a good mood, I’ll tell you about the pantomime I went to see.

Update: Although Lynx lets you create posts in WordPress, it doesn’t seem to like you editing them. Grrr, again.

The Expert

In which I provide blogging advice

A few weeks ago, after I’d just bought the hosting and domain name for this site, one of the friendlier managers at work came up to me in the office…

“Do you know much about website hosting?”

“Well, funny you should ask that, because…”

It was nothing to do with work at all; he’d decided he wanted to set up a family website, with a [surname].org address, to keep in touch with friends, family, and the distant relatives in Australia.* And I told him that I’d been looking into it, the prices I’d found, how much I’d paid, and so on.

This morning, he came up to me again:

“I’ve bought some webspace from the same place that you did, and I was trying to work out how to put stuff on it. I was wondering … do you know anything about setting up a blog? I was looking at WordPress, and I was wondering if you knew much about it?”

“Well, funny you should ask that, because…”

So, it looks like – at least as far as this manager is concerned – I’m going to be the office WordPress expert in future, even though I only have a couple of weeks’ experience with it. Lucky for me that it’s easy to set up, I suppose.

* If you’re not British or Australian you might not know this, but everyone in Britain seems to have some distant relatives in Australia that they never get in touch with more than once per decade. I’ve got several separate lots, apparently. Presumably if you’re Australian than the reverse applies.