Migrating a Drupal Site to JAMStack with Eleventy
Some time ago I set out to rebuild a Drupal site on the JAMStack. I had two primary motivations. First, I needed to do something with the site: it was four major versions behind on Drupal and at least two major versions behind on PHP. Upgrading PHP would require upgrading Drupal, and the audio plugin the site used for MP3 files was no longer supported in the current Drupal version. I could have updated the plugin for the newer PHP version, but PHP is not my strong suit and I would rather work with the JAMStack. Second, what I had heard and read about the JAMStack resonated with me, and since I was already using it at work I wanted to deepen my knowledge of it; this seemed like a great project for that.
The site I was migrating is a church web site that is updated at least once a week and sometimes more often. It contains a collection of sermon audios, with outlines as body content and binary attachments, as well as blogs and a few standard pages. Each week at least one new sermon is posted, and the pastor adds blog posts periodically.
The Drupal audio plugin provided some nice features, including reading the MP3 metadata and creating pages to view audios by title, album, artist, and year, as well as a random audio block (which displayed a link to a random audio entry on each page view) and a latest audio block (which displayed a list of the latest X audio entries, where X was configurable). Since Drupal is a server-rendered web application, these “pages” were generated on the fly by querying the database: first to generate the list of links and then to generate the full-page content. Since I would be moving the site to the JAMStack, this would not be possible in the same way.
I wanted as much of the site as possible to be pre-rendered and served statically from a commodity web server without any backend application server, including the ability to serve the site from an AWS S3 bucket. In other words, I wanted a purely serverless architecture. I am an application developer, and in that role I have never enjoyed patching or updating operating systems or application server runtimes. I’ve been in the IT space for over two decades, and I can and have done that type of work, but for a simple web site it is a lot of extra overhead that I’d prefer to do without. I wanted to update this site’s architecture once and have it last another twenty or more years. I realize, of course, that such a statement is a bit of a pipe dream in the tech space, since things change so quickly, but with the JAMStack and serverless I could hopefully minimize the necessary change. That’s the goal. I guess in twenty years I’ll be able to let you know how well that went.
Things to Consider
These are some of the considerations I had in migrating the Drupal site to the JAMStack.
- Which static site generator to use
- Where to store the data
- How to structure the data
- Migrating all existing content
- Retaining existing functionality
- Supporting a multi-user editorial experience (CRUD)
- Tagging
- Displaying the latest sermons on the home page
- Paginating sermons and blogs
- Multi-faceted content viewing (by author, title, album/series, year)
- Streaming MP3 audio files
- Site search
- Applying template changes to the whole site
Content
The site has over 500 sermons with audio files and a couple hundred blogs.
Stripping away the web application backend presents some challenges. For starters, the Drupal site routes every request through a single PHP file that renders each page on the fly based on the requested URI: it looks up the URI in the database and assembles the content according to the content type and the configuration of the page (e.g. which blocks to display, what layout to use, etc.). Drupal also allows the administrator to choose a different theme and apply it to the whole site in a single click. This is a nice feature, but not important for this site. What is important is the ability for me, as the webmaster, to change the design as needed without having to manually touch every page.
Throwback to the old days
Back in the day when I got started creating web sites, the cool tech for maintaining a consistent layout and design was Server Side Includes (SSI). Using SSI you could maintain, for example, a header, a footer, and a side menu each in a single file and then include them with a one-liner on every page of your site. This works by dynamically including the contents of the referenced file in place of the include statement, so that when the page is rendered the contents of the referenced file become part of the markup. This obviously requires a server (hence the Server in Server-Side Includes), something the target architecture will not have. (Technically, of course, it will have a server, but not in the traditional sense.)
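For anyone who missed that era, an SSI-based page looked roughly like this (the include paths are illustrative):

```html
<!-- Apache mod_include directives; the included paths are illustrative -->
<!--#include virtual="/includes/header.html" -->
<main>
  <p>Page-specific content goes here.</p>
</main>
<!--#include virtual="/includes/footer.html" -->
```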
Enter Static Site Generation (SSG), a process whereby web sites are generated at build time (also called pre-rendering). Wait a minute, what is build time? In the case of a Drupal site there is no such concept, at least not in the normal process of running one. (If you’re a developer who works on Drupal itself, there is a build time when each release is cut, but most people who run Drupal sites never deal with that.) Build time can be roughly equated to publish time in the context of a Drupal site. We’ll deal more with this later, but for now suffice it to say that in static site generation the output is fully rendered markup* (i.e. the content is merged with the layout and design as if it had been hard-coded). (* Technically speaking the markup may not be fully rendered - it could be enhanced by runtime Javascript, but that’s a detail to take up later.)
The Publish Process
When content is published to a Drupal site, it becomes visible immediately since it is just a record in the database that is now marked as published, and thus when a request comes in for the matching URI it is found in the database and returned to the user. No new files are generated on the web server. When content is published to a statically generated site, a file has to be generated that combines the content with the template, resulting in the final markup, and then written to the server. When a request comes in, it is looked up on the web server by URI (without a database) and returned as-is.
That is the simplest case. But on a typical (Drupal) site, common navigation is shared across all pages, and that navigation can be updated once and immediately reflected on every page of the site (notwithstanding caching). Additionally, Drupal supports the creation of arbitrary categories (i.e. topics) that can be used to collect various posts on a similar topic and display them in one location. This provides a great deal of flexibility in authoring and finding content.
On the site at hand, the sermon posts are organized along various categories, including the speaker, the series, the year, and the title. Each category has a top-level page that contains the index to all the entries in that category. Depending on the category, there may be sub-categories, or the entries may be linked directly from the category page itself. For example, the Sermons by Title page is an index to every sermon on the site by its title, whereas the Sermons by Year page has links to every year in which sermons are available, and each year is a page with an index to the sermons preached in that year.
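As a rough sketch of how one of these indexes can be produced with 11ty, a small filter can group the sermon data by year for the Sermons by Year page (the filter name and sermon fields here are my own illustrative choices, not necessarily what the real site uses):

```js
// .eleventy.js - a minimal sketch; "groupByYear" and the sermon
// fields are illustrative names.
module.exports = function (eleventyConfig) {
  // Group an array of sermon objects by the year of their date field,
  // returning [year, sermons[]] pairs with the newest year first.
  eleventyConfig.addFilter("groupByYear", (sermons) => {
    const byYear = {};
    for (const sermon of sermons) {
      const year = new Date(sermon.date).getFullYear();
      (byYear[year] = byYear[year] || []).push(sermon);
    }
    return Object.entries(byYear).sort((a, b) => b[0] - a[0]);
  });
};
```

A template can then loop over the grouped pairs to render the year index and the per-year pages.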
All of this means that every sermon will be linked from multiple pages throughout the site, not just one. Therefore, when a new sermon is posted in a static site generation model, we have to update (i.e. regenerate) multiple pages to add the new link (in addition to generating the new content page itself).
The approach I took here is a two-part generation process. The first part generates a new page for the content that was published, and the second part regenerates the entire site to ensure that the new content is linked across all of those categories (and anywhere else that is appropriate based on tagging or date, e.g. the most recent 25 sermons are displayed on the home page).
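In broad strokes, the publish flow looks something like the sketch below. The function and table names are hypothetical, and the same wiring could just as easily use SNS or EventBridge instead of direct Lambda invocations:

```js
// publish-handler.js - a sketch of the two-part publish process.
// All resource names (table, functions) are hypothetical.
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, PutCommand } = require("@aws-sdk/lib-dynamodb");
const { LambdaClient, InvokeCommand } = require("@aws-sdk/client-lambda");

const db = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const lambda = new LambdaClient({});

exports.handler = async (event) => {
  const item = JSON.parse(event.body); // e.g. { type: "sermon", id: "...", ... }

  // Persist the new content item.
  await db.send(new PutCommand({ TableName: "site-content", Item: item }));

  // Part one: generate just the new page so the content is live quickly.
  await lambda.send(new InvokeCommand({
    FunctionName: "generate-single-page",
    InvocationType: "Event", // fire and forget
    Payload: Buffer.from(JSON.stringify({ type: item.type, id: item.id })),
  }));

  // Part two: regenerate the entire site so every index page
  // (categories, home page, etc.) picks up the new link.
  await lambda.send(new InvokeCommand({
    FunctionName: "site-rebuild",
    InvocationType: "Event",
  }));

  return { statusCode: 202 };
};
```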
Let’s look first at how the pages are generated.
Static Site Generation
There are (now) many different choices available for static site generation. I had recently learned about 11ty (Eleventy), and after reviewing it briefly it resonated with me. It is very flexible, very fast, supports plugins, and can get content from anywhere. I’m not going to try to convince anyone to use it; I’m simply going to say that’s what I chose and it works very well. I wanted something that would run on (and could be customized via) Node.js, because with Node.js supporting Javascript on the server, and Javascript being the programming language of the web, we now have a single programming language for both the front end and the back end. (Technically this is not exactly the case yet, but we are getting closer and closer. It is certainly more the case with Javascript than with any other language, since Javascript is ubiquitous in web browsers and nothing else is even common.)
One of the key aspects of 11ty that made it fit this use case very well is that it can be run in an AWS Lambda function. This means it can be run in response to an event of one kind or another, including an SNS topic, an SQS queue, an EventBridge event, a DynamoDB stream, an API call, etc.
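A minimal rebuild handler along these lines, assuming Eleventy’s programmatic API, might look like this (the bucket name and source path are placeholders):

```js
// rebuild-handler.js - a sketch of running 11ty inside Lambda.
// The bucket name and source path are placeholders.
const Eleventy = require("@11ty/eleventy");
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");

const s3 = new S3Client({});

exports.handler = async () => {
  // Render the whole site; the data files pull content from DynamoDB.
  const elev = new Eleventy("./src", "/tmp/_site");
  const pages = await elev.toJSON(); // [{ url, inputPath, content }, ...]

  // Write each rendered page to the bucket that serves the site.
  for (const page of pages) {
    const key = (page.url.endsWith("/") ? `${page.url}index.html` : page.url)
      .replace(/^\//, "");
    await s3.send(new PutObjectCommand({
      Bucket: "example-site-bucket", // placeholder
      Key: key,
      Body: page.content,
      ContentType: "text/html",
    }));
  }
  return { generated: pages.length };
};
```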
Another key aspect is the fact that the data can come from anywhere, including the file system (for the ultimate in static site generation, where static files in a Git repo are your content and no database or other content management system is required). In my case, I wanted to store the content in DynamoDB because it is serverless, extremely fast, and extremely cheap. The trick with DynamoDB is deciding how to store the data, and that is a very important factor for this site: we need to be able to retrieve all the data for the site at once, and we also need to be able to retrieve each individual entry.
Data Model
The data model I decided to use in DynamoDB is a simple one where the partition key is the type of content (e.g. page, sermon, blog, etc.) and the sort key is the unique ID of the content. This enables efficient retrieval of all the items of a given type, since DynamoDB can perform query operations against the partition key alone, while also enabling efficient retrieval of any individual item, since DynamoDB can query against the partition key and sort key together. DynamoDB delivers single-digit-millisecond reads when items are looked up by their primary key. I won’t elaborate more on the model here, as it is not core to the topic. Suffice it to say that the data is stored in DynamoDB and corresponding data sources are defined in 11ty to read all content of a particular type and generate the static pages.
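As a sketch of what such a data source can look like, an 11ty global data file can page through everything under a given partition key (the table name and attribute names below are illustrative):

```js
// _data/sermons.js - 11ty runs this at build time and exposes the
// result to templates as `sermons`. Table and key names are illustrative.
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, QueryCommand } = require("@aws-sdk/lib-dynamodb");

const db = DynamoDBDocumentClient.from(new DynamoDBClient({}));

module.exports = async function () {
  const items = [];
  let lastKey;
  // Page through every item whose partition key is "sermon".
  do {
    const res = await db.send(new QueryCommand({
      TableName: "site-content",
      KeyConditionExpression: "#t = :type",
      ExpressionAttributeNames: { "#t": "type" },
      ExpressionAttributeValues: { ":type": "sermon" },
      ExclusiveStartKey: lastKey,
    }));
    items.push(...res.Items);
    lastKey = res.LastEvaluatedKey;
  } while (lastKey);
  return items;
};
```

Because the file exports an async function, 11ty waits for the query to finish before it starts rendering templates.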
Content Templates
Defining content templates in 11ty is quite easy. The first decision is which template language to use, thanks to 11ty supporting eleven (now twelve) different template languages, including the ability to mix and match multiple template languages in a single project.
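As one example, a single Nunjucks template combined with 11ty’s pagination feature can emit one page per sermon from a data source like the one above (the field names and permalink scheme are illustrative):

```njk
---
pagination:
  data: sermons
  size: 1
  alias: sermon
permalink: "/sermons/{{ sermon.id }}/"
---
{# One output page is generated per item in the sermons data set. #}
<h1>{{ sermon.title }}</h1>
<audio controls src="{{ sermon.audioUrl }}"></audio>
{{ sermon.body | safe }}
```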
To be continued…