Automating website creation using Jekyll

Guilherme Müller
Published in Level Up Coding · Jun 21, 2020


A case study on building a solution that generates websites from scraped web content and API requests, using Jekyll with Liquid templates.

The real-world case

Imagine that you have a vast and complex website, with lots of pages and content that, most of the time, overwhelm the user with information and visual stimuli. Now you’re moving to a slimmer information structure, with smaller websites focused on your products. Your time frame to deploy a solution is short, and the real challenge lies in compiling a vast amount of information and automating the creation of those smaller websites.

A user-centric solution

Remember that you’re trying to solve a problem for someone. The technical side is amazing, but at the end of the day, your solution must solve a business problem. In this case, we needed to create smaller, more direct websites containing content from a bigger website, organized in a way that made more sense to the end user. Working directly with your design and business teams is essential: understanding which pieces of information matter most, and how, in which order, and with what priority they should be displayed, will save you headaches later.

Understand the context

Before choosing the tool, understanding the context of your information is crucial. A few questions that will play a key role in your decision are:

  • How often does this data change?
  • How critical is it that your information is up to date and correct, for business and compliance reasons?
  • How much of it is standardized, and how much is unique to each case?
  • Does this content fit a templating model?
  • What are the most important features of the project?

These questions will help you understand how often you will need to update your data, and whether that update must be automated or can be done manually when needed. They will also show how many exceptional cases can occur and what can be standardized and templated. And remember, don’t over-engineer solutions for problems you don’t have: if your data changes once a year, you most probably won’t need a daily update routine.

The right tool for the right job

In this case, after understanding that the data wouldn’t change constantly and that speed and page weight were key to the success of the project (it was focused on emerging markets, where low-end mobile phones rule and internet speeds are not the best), we chose Jekyll for static content generation. The content would be fetched by a simple web scraping application (since HTML is a structured markup language, you can extract a lot of information consistently) and by an API, and both outputs would be saved in Jekyll’s _data folder.
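To make the scraping step concrete, here is a minimal sketch using only Python’s standard library (html.parser). It assumes, purely for illustration, that the useful content of a source page is its <title> and its <h2> section headings; the function name scrape_to_data_file, the record fields, and the data_<key>.json file name are hypothetical and not taken from the original project.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser
from pathlib import Path


class PageParser(HTMLParser):
    """Collects the text of the <title> tag and of every <h2> heading."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.headings: list[str] = []
        self._inside: str | None = None  # "title", "h2", or None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._inside = "title"
        elif tag == "h2":
            self._inside = "h2"
            self.headings.append("")  # start a new heading entry

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Text from nested tags (e.g. <b> inside <h2>) joins the same heading.
        if self._inside == "title":
            self.title = f"{self.title} {text}".strip()
        elif self._inside == "h2":
            self.headings[-1] = f"{self.headings[-1]} {text}".strip()


def scrape_to_data_file(html: str, key: str, data_dir: Path) -> Path:
    """Parse one page and save it as <data_dir>/data_<key>.json (hypothetical layout)."""
    parser = PageParser()
    parser.feed(html)
    record = {"key": key, "title": parser.title, "sections": parser.headings}
    data_dir.mkdir(parents=True, exist_ok=True)
    out = data_dir / f"data_{key}.json"
    out.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

Because the output is plain JSON inside _data, Jekyll can read it at build time without knowing where it came from; an API client would simply write its responses to the same folder.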

The overall process (Icons by Flaticon)

Scalability in mind

If you’re thinking about scaling and automating the creation of your websites, keep in mind that simple data modeling will help you a lot.

Have keys for your data and items

Jekyll lets you use your filesystem as a database. Use keys to link your items, pages, and data; this helps you identify your content and enables automated data fetching and merging. For example, if a page has the key “1” in its front matter and the matching file inside the _data folder is called “data_1.json”, you can use variables, where filters, if statements, and other Liquid operators to locate the data that belongs to any specific item.
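Inside a template, that lookup would be written in Liquid, for instance by building the name "data_" plus page.key and reading site.data with it. The sketch below shows the same key-based join from the side of an external Python script, which is useful when the automation needs to check which pages have data. It assumes the hypothetical data_<key>.json naming convention from the paragraph above and uses a deliberately tiny front matter parser that only understands flat "name: value" lines; a real project would use a YAML library.

```python
from __future__ import annotations

import json
from pathlib import Path


def read_front_matter(page_text: str) -> dict[str, str]:
    """Parse flat `name: value` pairs between the leading '---' lines (sketch only)."""
    lines = page_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip().strip('"')
    return fields


def data_for_page(page_text: str, data_dir: Path) -> dict | None:
    """Follow the page's `key` to <data_dir>/data_<key>.json, or return None."""
    key = read_front_matter(page_text).get("key")
    if key is None:
        return None
    path = data_dir / f"data_{key}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

The design choice is that the key is the only shared information: pages never embed data, and data files never reference pages, so either side can be regenerated independently.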

Structure your project

After you understand your data context, think of your project as a stack of layers and map those layers onto your tool’s architecture.

Data layer

Think of your data layer as your database. In Jekyll, the _data folder is the location for your .csv, .yml, .json, and other data files. Inside it, create subfolders to organize your data so you can understand and maintain it. An external application can fetch data from your sources at intervals and update the files inside this folder.
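Jekyll exposes subfolders of _data as nested objects, so _data/products/data_1.json becomes site.data.products.data_1 in Liquid. When an external updater rewrites these files on a schedule, a build could start while a file is only half written. One common safeguard, sketched below as an assumption rather than as part of the original project, is to write to a temporary file in the same folder and then swap it into place with os.replace; the folder and file names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_data_file(data_root: Path, folder: str, name: str, payload) -> Path:
    """Write payload to <data_root>/<folder>/<name>.json atomically.

    The JSON goes to a temporary file in the same directory first and is then
    moved into place with os.replace, so readers see either the old file or the
    complete new one, never a partial write.
    """
    target_dir = data_root / folder
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}.json"
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, ensure_ascii=False)
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)  # clean up the temporary file on any failure
        raise
    return target
```

For example, write_data_file(Path("_data"), "products", "data_1", record) would produce the file that Liquid reads as site.data.products.data_1.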

Layouts layer

Here you define the layouts your website will have. You might need a layout for your home page, one for a landing page, one for a single-column content page, and another for a two-column content page, for example. Alternatively, you can organize your layouts by the type of content they display: a layout for news, a layout for your product categories, and a layout for individual products.

Content layer

Now that you have your layouts (a macro view of your website) and your data (a micro view of your content), it’s time to structure your content. Jekyll lets you create custom content folders through collections, which help you keep that content organized.

Here comes the first trick

Your process will become more iterative: build one page for each type of content your website will have. Now you have your complete model: the layout has been tested, the data applied to the page has been validated, and you can see the final result, for this one case at least. Now, let’s take a step back.

You have your sample page, so let’s break it into components that are easier to maintain and will help a lot when implementing an automation layer. This way, your automation only needs to reference each component by name and supply its data, and all the fine-tuning happens inside the component.
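Once a page is just a sequence of components, the automation can emit it as a list of Jekyll include tags. The sketch below is one way to do that; the component names (Product/header, Common/footer, and so on) are hypothetical. Parameter values are inserted verbatim as Liquid expressions, so you pass a variable such as page.title, or an already-quoted literal, and the component reads it as include.<name>.

```python
from __future__ import annotations


def include_tag(component: str, params: dict[str, str] | None = None) -> str:
    """Render a Jekyll include tag for _includes/<component>.html.

    Values are inserted verbatim, so pass Liquid expressions such as
    `page.title` or an already-quoted literal such as '"Buy now"'.
    """
    pieces = ["{%", "include", f"{component}.html"]
    for name, expr in (params or {}).items():
        pieces.append(f"{name}={expr}")
    pieces.append("%}")
    return " ".join(pieces)


def page_body(components: list[tuple[str, dict[str, str]]]) -> str:
    """One include tag per line, in the order the components should appear."""
    return "\n".join(include_tag(name, params) for name, params in components)
```

For instance, page_body([("Product/header", {"title": "page.title"}), ("Common/footer", {})]) yields two lines: an include of Product/header.html with title=page.title, followed by an include of Common/footer.html. All markup and styling stay inside the included files.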

Components layer

Your components will go into the _includes folder. Maintaining a naming convention will help you a lot. Create a folder inside _includes for each type of page or model. For example, for your “ProductPage”, create a folder called “Product” inside _includes and put all the components for those pages there. You can also use a Common folder for components that appear across multiple types of pages. You will end up with a structure like this:

Structure for your includes folder.
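Because the convention is mechanical (one folder per page type plus Common), it can also be scaffolded by a script, so new page types start with the same layout. This is a hedged sketch: the group and component names passed in are placeholders, and each file only receives a comment stub to be filled in by hand.

```python
from __future__ import annotations

from pathlib import Path


def scaffold_includes(site_root: Path, components: dict[str, list[str]]) -> list[Path]:
    """Create _includes/<Group>/<component>.html for every group and component.

    `components` maps a group folder (one per page type, plus "Common") to the
    component names it holds. Existing files are left untouched.
    """
    created: list[Path] = []
    for group, names in components.items():
        folder = site_root / "_includes" / group
        folder.mkdir(parents=True, exist_ok=True)
        for name in names:
            path = folder / f"{name}.html"
            if not path.exists():
                path.write_text(f"<!-- {group}/{name} component -->\n", encoding="utf-8")
                created.append(path)
    return created
```

Calling scaffold_includes(Path("."), {"Product": ["header", "price"], "Common": ["footer"]}) would create _includes/Product/header.html, _includes/Product/price.html, and _includes/Common/footer.html, and running it again creates nothing new.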

While creating your components, keep in mind that null cases might occur in the data: some items may be missing data, or a field may not exist at all in some cases. Make if/else statements your friends for those cases.
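One detail makes those if/else checks easier to trust: in Liquid, only nil and false are falsy, so an empty string or an empty array still passes a check like if item.price. A reasonable precaution, sketched here as an assumption about how the data layer could be prepared, is to normalize scraped records before saving them, so blank values are stored as JSON null (which Liquid sees as nil). The field names in the usage example are hypothetical.

```python
def normalize_record(record: dict, required: list[str]) -> dict:
    """Return a cleaned copy of a scraped record.

    Strings are trimmed and blank strings become None, empty lists become None,
    and every required field is present (None when missing). Saved as JSON,
    None becomes null, which Liquid treats as nil in if-checks.
    """
    clean = {}
    for name, value in record.items():
        if isinstance(value, str):
            value = value.strip() or None
        elif isinstance(value, list) and not value:
            value = None
        clean[name] = value
    for name in required:
        clean.setdefault(name, None)
    return clean
```

For example, normalize_record({"name": "Product A", "price": "  ", "tags": []}, ["name", "price", "image"]) returns a record where price, tags, and image are all None, so each component’s else branch handles them consistently.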

After creating all the components, update your page so it uses the components from _includes (the Jekyll documentation on includes explains how to pass parameters to them). Your page will now be much simpler: each component carries its own behavior and styling, and you can easily pass it the data it needs. Now we can move on to the automation layer and create all the pages we still need.

Automation layer

Now, let’s create content in a scalable way. Your pages use includes, you know where to store and fetch content within your _data folder, and you have the structure for your content folder. Here comes a key component: an application that fetches the results from your data sources and saves the contents where they belong inside your Jekyll project. Based on the data you got from the web scraper or the API endpoint, it creates a new page from your template and makes all the required substitutions (this is where the keys play an important role), and you end up with new pages that are ready for deployment.
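The page-generation step might look like the sketch below. It assumes, hypothetically, that records live in _data/products/data_<key>.json, that pages go into a _products collection folder, and that the Product components from the previous sections exist. Python’s string.Template is used because its $placeholders do not collide with Liquid’s braces, so the include tags pass through unchanged; the title is serialized with json.dumps so quotes inside it still produce a valid double-quoted front matter value.

```python
from __future__ import annotations

import json
from pathlib import Path
from string import Template

# Hypothetical page template: only $key and $title are substituted; the Liquid
# include tags are copied into the page as-is.
PAGE_TEMPLATE = Template("""---
layout: product
key: "$key"
title: $title
---
{% include Product/header.html title=page.title %}
{% include Product/details.html item=site.data.products.data_$key %}
""")


def generate_pages(data_dir: Path, out_dir: Path) -> list[Path]:
    """Create one collection page per data_<key>.json file found in data_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for data_file in sorted(data_dir.glob("data_*.json")):
        record = json.loads(data_file.read_text(encoding="utf-8"))
        key = data_file.stem[len("data_"):]  # the key links page and data
        page = PAGE_TEMPLATE.substitute(
            key=key,
            title=json.dumps(record.get("title", ""), ensure_ascii=False),
        )
        target = out_dir / f"{key}.md"
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

Running generate_pages(Path("_data/products"), Path("_products")) after each fetch keeps one page per data file; the page itself holds almost nothing beyond the key and the component list.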

The process, now generating pages and content (Icons by Flaticon)

Update layer

To finish, you can automate deployment with your favorite CI/CD workflow, or even with a scheduler application on an on-site server: fetch all data from the sources, build a new version of your website, and voilà.
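A scheduled run boils down to two ordered steps. The sketch below assumes a Bundler-managed Jekyll project (so the build command is bundle exec jekyll build, which writes the site to _site by default) and a hypothetical fetch_data.py script that runs the scraper and API client. It could be invoked from a CI job or from cron on an on-site server.

```python
from __future__ import annotations

import subprocess
import sys


def pipeline_commands(fetch_script: str = "fetch_data.py") -> list[list[str]]:
    """Commands for one scheduled run: refresh _data, then rebuild the site.

    `fetch_data.py` is a hypothetical name for the scraper/API script.
    """
    return [
        [sys.executable, fetch_script],
        ["bundle", "exec", "jekyll", "build"],
    ]


def run_pipeline(commands: list[list[str]]) -> None:
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit, so a failed
        # fetch stops the run before the site is rebuilt with incomplete data.
        subprocess.run(cmd, check=True)
```

A scheduler would call run_pipeline(pipeline_commands()) and then publish the generated _site folder with whatever deployment step your hosting uses.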

Key learnings

While reading this article, you might have asked yourself: “Why not spend a little more time planning the whole structure from the beginning and make development more straightforward?” As in any process that involves many stakeholders from different areas (you will probably work with designers, product owners, business analysts, marketing staff, and so on), the specifications change quickly, and you’ll need to adapt the architecture to those changes. So, here’s the key learning (in my case):

Take a vertical approach to your project. Make it work across all the layers in the simplest possible way first; then you will understand the structure and be able to adapt it to changes more easily and quickly.

The result

After all this, you will end up with a statically generated website that draws on different data sources. Because it is static, it is more secure, lighter, and faster. Sample results can be viewed at way.feevale.br/ and digital.feevale.br/. In both cases, the data comes from different sources and the content pages are statically generated using Jekyll.


A Brazilian software developer and DevOps engineer who also likes communication and research.