On Migrating Los Techies to GitHub Pages


We recently migrated Los Techies from a multi-site installation of WordPress to GitHub Pages, so I thought I’d share some of the more unusual parts of the process. For a straightforward walkthrough of migrating from WordPress to GitHub Pages, Tomomi Imura has published an excellent guide that covers exporting content, setting up a new Jekyll site (Jekyll is the static site engine GitHub Pages uses), porting the comments, and DNS configuration. The purpose of this post is just to cover the aspects that were specific to our particular installation.

Step 1: Exporting Content

Having recently migrated my personal blog from WordPress to GitHub Pages using the aforementioned guide, I thought the process of doing the same for Los Techies would be relatively easy. Unfortunately, because our installation of WordPress was woefully out of date, migrating Los Techies proved to be a bit problematic. First, the WordPress to Jekyll Exporter plugin wasn’t compatible with our version of WordPress. Additionally, our installation of WordPress couldn’t be upgraded in place for various reasons. As a result, I ended up taking a rather labor-intensive path. I exported each author’s content using the default WordPress XML export. Then, for each author, I imported the content into an up-to-date installation of WordPress on the host where I previously hosted my personal blog, exported the posts using the Jekyll Exporter plugin, and deleted the posts in preparation for the next iteration. This resulted in a collection of zipped, mostly ready posts for each author.

Step 2: Configuring Authors

Our previous platform used the multi-site features of WordPress to provide a single site with multiple contributors. By default, Jekyll looks for content within a special folder in the root of the site named _posts, but there are several issues with trying to represent multiple contributors within the _posts folder. Fortunately, Jekyll has a feature called Collections, which allows you to set up groups of posts that can each have their own associated configuration properties. Once each author’s posts were copied to the corresponding collection folder, a series of scripts was written to create author-specific index.html, archive.html, and tags.html files, which are used by a custom post layout. Additionally, due to the way the WordPress content was exported, the permalinks generated for each post did not reflect the author’s subdirectory, so another script was written to strip out all the generated permalinks.
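
The original scripts aren’t shown here, but the permalink-stripping step could be sketched roughly as follows in Python. This is a minimal sketch, assuming each exported post starts with a YAML front matter block delimited by `---` lines and that the generated permalink is a single top-level `permalink:` line; the function name is hypothetical.

```python
import re

# Matches a leading YAML front matter block delimited by '---' lines.
FRONT_MATTER = re.compile(r"\A---\n(.*?\n)---\n", re.DOTALL)


def strip_permalink(text: str) -> str:
    """Remove any top-level 'permalink:' line from a post's front matter.

    Hypothetical helper: text without a front matter block is returned
    unchanged, and the body of the post is never modified.
    """
    match = FRONT_MATTER.match(text)
    if match is None:
        return text
    kept = [
        line
        for line in match.group(1).splitlines(keepends=True)
        if not line.startswith("permalink:")
    ]
    return "---\n" + "".join(kept) + "---\n" + text[match.end():]
```

Running a function like this over every file in each author’s collection folder lets Jekyll fall back to whatever permalink pattern is configured for that collection.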

Step 3: Correcting Liquid Errors

Jekyll uses a language called Liquid as its templating engine. Once all the content was in place, any post containing double curly braces was treated as containing Liquid commands, which broke the build process. To fix that, each offending post had to be edited to wrap the content in the Liquid directives {% raw %} … {% endraw %}, which keep the content from being interpreted by the Liquid parser. There were also a few other oddities causing issues (such as posts with non-breaking space characters), for which more scripts were written to replace the offending content.
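
For posts that can be fixed mechanically, the wrapping could be automated along these lines. This is only a sketch under assumptions: the function names are hypothetical, the whole body is wrapped rather than just the offending snippet, and posts that already contain a raw tag are left alone for manual review.

```python
RAW_OPEN = "{% raw %}"
RAW_CLOSE = "{% endraw %}"


def split_front_matter(text):
    """Split text into (front matter including its '---' delimiters, body)."""
    if text.startswith("---\n"):
        end = text.find("\n---\n", 3)
        if end != -1:
            cut = end + len("\n---\n")
            return text[:cut], text[cut:]
    return "", text


def protect_liquid(text):
    """Wrap a post body in raw/endraw tags if it contains '{{'.

    The front matter is left untouched so Jekyll can still read it.
    """
    front, body = split_front_matter(text)
    if "{{" not in body or RAW_OPEN in body:
        return text
    return front + RAW_OPEN + "\n" + body.rstrip("\n") + "\n" + RAW_CLOSE + "\n"
```

Wrapping the entire body is the bluntest option; it would also disable any Liquid tags the post legitimately needs, so it only suits posts that contain no intentional Liquid.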

Step 4: Enabling Disqus

The next step was to get Disqus comments working for the posts. By default, Disqus uses the page URL as the page identifier, so as long as the paths match, enabling Disqus should just work. The WordPress Disqus plugin we were using, however, used a unique post id and guid as the Disqus page identifier, so the Disqus JavaScript had to be configured to use these properties. These values were preserved by the Jekyll exporter, but unfortunately the generated id property in the Jekyll front matter was being overridden internally by Jekyll, so another script had to be written to rename the properties used for these values in all the posts. Properties were also added to the Collection configuration in the main _config.yml to designate the Disqus shortname for each author and to let people toggle whether Disqus is enabled for their posts.
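
The renaming step might look something like the following Python sketch. The replacement name `wordpress_id` is purely hypothetical, since the actual names chosen aren’t stated above; other properties can be renamed by passing a different mapping.

```python
# Hypothetical mapping: the real replacement names are not given in the post.
DEFAULT_RENAMES = {"id": "wordpress_id"}


def rename_front_matter_keys(text, renames=DEFAULT_RENAMES):
    """Rename top-level front matter keys, leaving the post body untouched."""
    lines = text.split("\n")
    if not lines or lines[0] != "---":
        return text  # no front matter block to edit
    for i in range(1, len(lines)):
        if lines[i] == "---":
            break  # end of front matter; stop before the body
        key, sep, rest = lines[i].partition(":")
        if sep and key in renames:
            lines[i] = renames[key] + sep + rest
    return "\n".join(lines)
```

The Disqus embed code in the post layout would then read the identifier from the renamed property instead of from id.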

Step 5: Converting Gists

Many authors at Los Techies used a Gist WordPress plugin to embed code samples within their posts. GitHub Pages supports a jekyll-gist plugin, so another script was written to modify all the posts to use Liquid syntax to denote the gists. This mostly worked, but there were still a number of posts which had to be manually edited to deal with the different ways people were denoting their gists. In retrospect, it would have been better to use JavaScript rather than the Jekyll gist plugin due to the size of the Los Techies site. Every plugin you use adds time to the overall build process, which can become problematic, as we’ll touch on next.
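
As an illustration, a conversion script could be as small as the sketch below. It assumes the WordPress plugin used a shortcode of the form `[gist id=1234]` (optionally with quotes around the id), which is an assumption rather than something stated above; anything that doesn’t match that exact form is left untouched for manual editing.

```python
import re

# Assumed shortcode form: [gist id=1234] or [gist id="1234"].
# Shortcodes carrying extra attributes are deliberately not matched.
GIST_SHORTCODE = re.compile(r"""\[gist\s+id=["']?([0-9A-Za-z]+)["']?\s*\]""")


def convert_gists(text):
    """Rewrite matching gist shortcodes to the jekyll-gist Liquid tag."""
    return GIST_SHORTCODE.sub(r"{% gist \1 %}", text)
```

Leaving non-matching variants alone, rather than guessing at them, makes it easy to search the converted posts for any remaining `[gist` text afterwards.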

Step 6: Excessive Build-time Mitigation

The first iteration of the conversion used Liquid to generate the sidebar content, which lists recent site-wide posts, recent author-specific posts, and the contributing authors. This resulted in extremely long build times, but it worked, and who cares once the site is rendered, right? Well, what I found out was that GitHub has a hard cutoff of 10 minutes for Jekyll site builds. If your site doesn’t build within 10 minutes, the process gets killed. At first I thought “Oh no! After all this effort, GitHub just isn’t going to support a site our size!” I then realized that rather than having every page loop over all the content, I could create a Jekyll template to generate the JSON content one time and then use JavaScript to retrieve it and dynamically generate the sidebar DOM elements. This sped up the build significantly, taking it from close to half an hour down to just a few minutes.
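
The saving comes from doing the expensive loop once instead of once per page. The actual fix was a Jekyll template plus JavaScript, neither of which is reproduced here; the Python sketch below only illustrates the idea of precomputing a single JSON payload, and the function name and the post keys (title, url, author, date) are hypothetical.

```python
import json


def sidebar_json(posts, limit=5):
    """Build the sidebar payload in one pass over all posts.

    `posts` is a list of dicts with hypothetical keys 'title', 'url',
    'author', and 'date'. Dates are assumed to be ISO 8601 strings, so
    sorting them as strings sorts them chronologically.
    """
    newest_first = sorted(posts, key=lambda p: p["date"], reverse=True)
    payload = {
        "recent": [
            {"title": p["title"], "url": p["url"]} for p in newest_first[:limit]
        ],
        "authors": sorted({p["author"] for p in posts}),
    }
    return json.dumps(payload)
```

With a payload like this generated once per build, each rendered page only needs a small script that fetches the JSON and fills in the sidebar, so the per-page cost no longer grows with the number of posts.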

Step 8: Converting WordPress Uploaded Content

Another headache was how WordPress represented uploaded content. Every image and download that anyone had ever uploaded to the site for use in their posts was stored in a cryptic folder structure. Each folder had to be inspected to see which files belonged to which author, the folder structure had to be reworked to fit the layout of the Jekyll site, and more scripts had to be written to edit everyone’s posts to point to the new locations. Of course, the scripts only worked for about 95% of the posts. The rest had to be edited manually to fix things like non-printable characters in file names.
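
Once the old-to-new folder mapping has been worked out by hand, the path rewriting itself is mechanical. The sketch below assumes that mapping is available as a dictionary of path prefixes; the function name and the example paths in the usage note are made up for illustration and are not the real Los Techies layout.

```python
import re


def rewrite_upload_paths(text, path_map):
    """Replace old upload path prefixes with their new locations.

    All prefixes are matched in a single pass, longest first, so a more
    specific mapping wins and already-rewritten text is never rewritten
    a second time.
    """
    if not path_map:
        return text
    prefixes = sorted(path_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(prefix) for prefix in prefixes))
    return pattern.sub(lambda match: path_map[match.group(0)], text)
```

For example, with a hypothetical mapping of `/wp-content/blogs.dir/7/files/` to `/img/some-author/`, an image reference under the old folder would come out under the author’s new image folder with the rest of its path intact.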

Step 9: Handling Redirects

The final step to get the initial version of the conversion complete was to handle redirects, which were formerly handled by .htaccess. The Los Techies site started off using Community Server prior to migrating to WordPress, and redirects were set up using .htaccess to maintain the paths to all the previous content locations. GitHub Pages doesn’t support .htaccess, but it does support a Jekyll redirect plugin. Unfortunately, the plugin requires adding a redirect property to each post that needs a redirect, and we had several thousand, so I had to write another script to read the .htaccess file and figure out which post went with each line. Another unfortunate aspect of using the Jekyll redirect plugin is that it adds overhead to the build time, which, as discussed earlier, can become an issue.
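
A script of this kind could start from something like the sketch below. It makes two assumptions that aren’t confirmed above: that the redirects were written as plain Apache `Redirect [status] /old /new` lines (rewrite rules are simply skipped), and that the plugin in question is jekyll-redirect-from, whose front matter key is `redirect_from`. The function names are hypothetical.

```python
from collections import defaultdict


def parse_redirects(htaccess_text):
    """Map each redirect target to the list of old paths that point at it.

    Only lines of the form 'Redirect [status] /old /new' are understood;
    comments, rewrite rules, and anything else are ignored.
    """
    targets = defaultdict(list)
    for line in htaccess_text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0].lower() != "redirect":
            continue
        old, new = parts[-2], parts[-1]
        if not old.startswith("/"):
            continue  # e.g. 'Redirect gone /path' has no old/new pair
        targets[new].append(old)
    return dict(targets)


def redirect_front_matter(old_paths):
    """Render the front matter lines listing a post's old paths."""
    return "\n".join(["redirect_from:"] + [f"  - {path}" for path in old_paths])
```

The remaining work, matching each target URL to the post file that now lives at that URL and inserting the rendered lines into its front matter, depends on the site’s permalink scheme and is left out of the sketch.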

Step 10: Enabling Aggregation

Once the conversion was complete, I decided to dedicate some time to figuring out how we might add the ability to aggregate posts from external feeds. The first step was finding a service that could aggregate feeds together. You might think there would be a number of services that do this, and while I did find at least a half-dozen, only a couple of them allowed you to maintain a single aggregated feed while adding and removing source feeds. Most seemed to only allow a one-time aggregation. I settled on a site named feed.informer.com. Next, I replaced the landing page with JavaScript that dynamically builds the page from the aggregated feed, did the same for the recent author posts section, and added a special external template capable of making an individual post look like it’s actually hosted on Los Techies. The final result is a site that displays a mixture of local and aggregated content.

Conclusion

Overall, the conversion was way more work than I anticipated, but I believe worth the effort. The site is now much faster than it used to be and we aren’t having to pay a hosting service to host our site.
