Using Puppeteer Sharp as a headless static HTML component generator

Guilherme Müller
Published in Level Up Coding, Apr 19, 2020


Puppeteer Sharp is a fantastic C# port of the Puppeteer library that lets your solution drive an integrated Chromium web browser. It runs in both .NET Framework and .NET Core projects. With it, you can serve static HTML components, rendered on the server from near-real-time data.

Faster and lighter web content is a must, especially in developing countries, where slow mobile and wired internet connections and low-end smartphones and computers make up the biggest part of the market. This is the story of a solution that builds visually complex content from dynamic data and JavaScript using the D3.js library, and exports it as HTML, PDF (scalable vector) and images (both JPG and PNG, at large resolutions). The approach kept the data in a secure environment and gave end users faster load times.

The project's scope covered about 10 complex infographics, rendered with dynamic data from a data source. Each of them had to be:
- An interactive web component with multiple states;
- Exportable in vector format, for use in Adobe Illustrator or other software;
- Exportable as JPG or PNG images at large resolutions, with each state or interaction exported as a separate image;
- Compatible with a legacy ASP.NET application.

Sample of one of the components’ structures.

Architectural overview:

After a lot of study and several prototypes, the solution architecture that met the required specifications was the following:

The application architecture, in broad strokes. (Icons by FlatIcon)

The core application manages an update routine. This routine starts a Puppeteer Sharp process, loads the web component through its entry point file, and retrieves the data needed to build the static version (through standard API calls or other web requests). Once Puppeteer Sharp has finished loading the data and running the component's build logic, the page context holds a statically generated component (you can also inject a function that processes the styles for server-side rendering). With the content built and available in the page context, you can define an output strategy for each required output: saving a file or a series of files, sending the result to another application, or anything else you need.
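As a rough illustration, one pass of this routine might look like the sketch below. It is written against a recent Puppeteer Sharp release (the parameterless `BrowserFetcher.DownloadAsync()` belongs to the newer API; older releases took a revision argument). It also assumes that the entry point is a local `index.html` and that the component's build script sets a hypothetical `window.__componentReady` flag when the static markup is complete. These names are illustrative and do not come from the original project.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using PuppeteerSharp;

// Sketch of one pass of the update routine (not the original implementation).
// Assumption: the component's build script sets window.__componentReady = true
// once the static markup has been generated (hypothetical flag).
public static class UpdateRoutine
{
    // Converts a local file path into a file:// URL that Chromium can open.
    public static string ToFileUrl(string path) =>
        new Uri(Path.GetFullPath(path)).AbsoluteUri;

    public static async Task<string> RenderComponentAsync(string entryPointPath)
    {
        // Downloads a compatible browser build if it is not present yet.
        await new BrowserFetcher().DownloadAsync();

        var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        try
        {
            var page = await browser.NewPageAsync();
            await page.GoToAsync(ToFileUrl(entryPointPath));

            // Wait until the build logic signals that the static markup is in the DOM.
            await page.WaitForFunctionAsync("() => window.__componentReady === true");

            // The page context now holds the statically generated component.
            return await page.GetContentAsync();
        }
        finally
        {
            // Close Chromium even if the build fails, so no process is left running.
            await browser.CloseAsync();
        }
    }
}
```

Returning the HTML as a string keeps the routine independent of the outputs: whatever consumes the string decides whether to save it, convert it or send it elsewhere.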

The web component structure:

Start from the technical documentation, the layouts and a sample of your component data. Standardize the component structure in folders, so every component can be loaded into Puppeteer Sharp the same way.

Each component receives its own folder with a standardized structure. (Icons by FlatIcon)

Splitting the component “build logic” from the “interaction logic”:

So that the component generation process runs only on the server, the build logic (the scripts and styles that build the static component from the data) can be split from the interaction logic (the usual front-end .js and .css content) into separate files. You can then deploy only the interaction logic to an externally accessible path.
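One way to enforce this split is to give each component folder separate subfolders for the two kinds of logic and to publish only the interaction files. The sketch below assumes a hypothetical layout (`index.html`, `build/`, `interaction/`); the project's actual folder structure may differ. It uses only the base class library and needs .NET Core 2.0 or later for `Path.GetRelativePath`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical folder layout for each component:
//   components/<name>/index.html    entry point loaded by Puppeteer Sharp (server only)
//   components/<name>/build/        build logic: scripts and styles that produce the static markup (server only)
//   components/<name>/interaction/  interaction logic: front-end .js and .css (deployed publicly)
public static class ComponentAssets
{
    // Copies only the interaction files of one component into publicRoot/<name>/,
    // so the build logic never leaves the server. Returns the published file paths.
    public static List<string> PublishInteractionLogic(string componentDir, string publicRoot)
    {
        var name = new DirectoryInfo(componentDir).Name;
        var source = Path.Combine(componentDir, "interaction");
        var target = Path.Combine(publicRoot, name);
        var published = new List<string>();

        if (!Directory.Exists(source)) return published; // nothing to deploy

        foreach (var file in Directory.GetFiles(source, "*", SearchOption.AllDirectories))
        {
            // Keep the relative structure found inside interaction/.
            var relative = Path.GetRelativePath(source, file);
            var destination = Path.Combine(target, relative);
            Directory.CreateDirectory(Path.GetDirectoryName(destination)!);
            File.Copy(file, destination, overwrite: true);
            published.Add(destination);
        }
        return published;
    }
}
```

Keeping the two kinds of logic in different folders makes the deployment rule mechanical: whatever sits under `interaction/` is public, everything else stays with the update routine.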

Key learnings from the code:

You can inject the data into the page context as a string of structured JSON (or another format). This keeps the data provider decoupled from the component, so different providers or queries can feed the same web component directly.
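A minimal sketch of the injection step, assuming a recent Puppeteer Sharp release (which exposes the `IPage` interface) and a component that exposes a hypothetical global `window.buildComponent(data)` function. The data is serialized with `System.Text.Json` (.NET Core 3.0 or later, or its NuGet package) and passed to `EvaluateFunctionAsync` as a plain string, then parsed inside the page.

```csharp
using System.Text.Json;
using System.Threading.Tasks;
using PuppeteerSharp;

// Sketch: inject component data into the page context as a JSON string.
// Assumption: the component defines window.buildComponent(data) (hypothetical name).
public static class DataInjection
{
    // Serializes any data object (from an API call, a query result, etc.) to JSON.
    public static string ToJson<T>(T data) => JsonSerializer.Serialize(data);

    public static Task InjectAndBuildAsync<T>(IPage page, T data)
    {
        // The JSON travels as a plain string argument and is parsed inside the page,
        // so the data provider does not need to know anything about the component code.
        return page.EvaluateFunctionAsync(
            @"(json) => {
                const data = JSON.parse(json);
                window.buildComponent(data);
            }",
            ToJson(data));
    }
}
```

Because the component only ever receives a JSON string, the data source can change (a REST API, a database query, a file) without touching the component's scripts.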

You can also include HTML tags that must be removed before the component is exported. After the build logic executes, you can remove those tags by injecting a removal function (written with jQuery or vanilla JavaScript).
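A vanilla JavaScript removal function, run through Puppeteer Sharp, could look like the sketch below. It assumes a recent Puppeteer Sharp release (`IPage`) and that build-only tags carry a hypothetical `data-remove-before-export` attribute; any selector convention works the same way.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Sketch: strip build-only markup before exporting the component.
// Assumption: removable tags are marked with data-remove-before-export (hypothetical convention).
public static class ExportCleanup
{
    public const string RemovableSelector = "[data-remove-before-export]";

    // Evaluated in the page context; returns how many elements were removed.
    // querySelectorAll returns a static list, so removing nodes does not change its length.
    public const string RemovalFunction =
        @"(selector) => {
            const nodes = document.querySelectorAll(selector);
            nodes.forEach(node => node.remove());
            return nodes.length;
        }";

    public static async Task<string> RemoveTagsAndGetHtmlAsync(IPage page)
    {
        var removed = await page.EvaluateFunctionAsync<int>(RemovalFunction, RemovableSelector);
        System.Console.WriteLine($"Removed {removed} build-only element(s).");

        // What remains in the DOM is the static component, ready for export.
        return await page.GetContentAsync();
    }
}
```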

When launching a Browser with Puppeteer Sharp, remember to set the user data directory for the session Puppeteer creates. Otherwise, it will be created in the system's temp directory and will probably not be removed after use.
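A sketch of that setup, assuming a recent Puppeteer Sharp release (for the `IBrowser` interface) and a hypothetical `chromium-profile` folder under the application's base directory. Each browser gets its own subfolder through `LaunchOptions.UserDataDir`, and the subfolder is deleted once Chromium has closed.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using PuppeteerSharp;

// Sketch: keep Chromium's user data inside the application instead of the system temp folder.
// The "chromium-profile" folder name is hypothetical.
public static class BrowserSession
{
    // Creates a unique per-instance folder under a fixed parent inside the application.
    public static string CreateUserDataDir(string appBaseDir)
    {
        var dir = Path.Combine(appBaseDir, "chromium-profile", Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        return dir;
    }

    public static async Task RunAsync(Func<IBrowser, Task> work)
    {
        var userDataDir = CreateUserDataDir(AppContext.BaseDirectory);
        var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            UserDataDir = userDataDir, // session data goes here, not to the system temp folder
        });
        try
        {
            await work(browser);
        }
        finally
        {
            // Close Chromium first; its profile folder can only be deleted after it exits.
            await browser.CloseAsync();
            Directory.Delete(userDataDir, recursive: true);
        }
    }
}
```

A unique subfolder per instance under one fixed parent keeps concurrent browsers from sharing a profile (which Chromium generally does not support) while keeping every profile in a known place that is easy to clean up. On Windows, Chromium child processes may hold files for a moment after `CloseAsync`, so the delete may need a retry.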

Key learnings from the project:
- For multiple export outputs, use the Strategy Pattern.
- Watch the Chromium life cycle: close all tabs and instances, or they will become a memory hog.
- Set the temporary data folder of your Chromium instance to a fixed folder within your application. Otherwise, Chromium creates random folders in your machine's temporary storage and, most of the time, does not delete them, filling up your hard disk within a few days.
- Create a loop routine for content updates, so your content is refreshed gradually over time, or limit the number of Chromium instances your application runs. Otherwise, memory and CPU consumption can spike very quickly.
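The Strategy Pattern and the throttled update loop from this list can be combined as in the sketch below. It uses only the base class library (.NET 5 or later, for `record`). All type names are hypothetical: the renderer is passed in as a delegate (in practice it would wrap Puppeteer Sharp), and `HtmlFileExport` stands in for PDF or image strategies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Result of rendering one component (in practice, produced by Puppeteer Sharp).
public record RenderedComponent(string Name, string Html);

// Strategy Pattern: each output format is an interchangeable strategy.
public interface IExportStrategy
{
    Task ExportAsync(RenderedComponent component);
}

// Example strategy: save the static HTML to a folder.
public class HtmlFileExport : IExportStrategy
{
    private readonly string _outputDir;
    public HtmlFileExport(string outputDir) => _outputDir = outputDir;

    public Task ExportAsync(RenderedComponent component)
    {
        Directory.CreateDirectory(_outputDir);
        return File.WriteAllTextAsync(Path.Combine(_outputDir, component.Name + ".html"), component.Html);
    }
}

public class UpdateLoop
{
    private readonly Func<string, Task<RenderedComponent>> _render; // hypothetical renderer delegate
    private readonly IReadOnlyList<IExportStrategy> _strategies;
    private readonly SemaphoreSlim _slots; // caps concurrent renders (and so Chromium instances)

    public UpdateLoop(Func<string, Task<RenderedComponent>> render,
                      IReadOnlyList<IExportStrategy> strategies,
                      int maxConcurrentRenders)
    {
        _render = render;
        _strategies = strategies;
        _slots = new SemaphoreSlim(maxConcurrentRenders, maxConcurrentRenders);
    }

    // One pass over all components; at most maxConcurrentRenders run at the same time.
    public Task RunOnceAsync(IEnumerable<string> componentNames) =>
        Task.WhenAll(componentNames.Select(async name =>
        {
            await _slots.WaitAsync();
            try
            {
                var rendered = await _render(name);
                foreach (var strategy in _strategies)
                    await strategy.ExportAsync(rendered);
            }
            finally
            {
                _slots.Release();
            }
        }));

    // Gradual refresh: repeats the pass on a fixed interval.
    // Cancelling the token makes Task.Delay throw an OperationCanceledException.
    public async Task RunForeverAsync(IEnumerable<string> componentNames, TimeSpan interval, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            await RunOnceAsync(componentNames);
            await Task.Delay(interval, ct);
        }
    }
}
```

Adding an output format means adding one `IExportStrategy` class; the loop itself does not change. The `SemaphoreSlim` is what keeps the number of simultaneous renders, and therefore Chromium's memory and CPU usage, under a fixed limit.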

Pros:
- Secure data environment, since the data is never accessed by the user.
- Flexible content output, allowing for different outputs as required.
- A way to protect the component construction logic, which in some cases can become a valuable business secret.
- Easily extensible.

Cons:
- Not updated in real time. You can render in real time, but if you instantiate a new Browser, or even a new Page, for each request, you will end up chewing through your server resources.
- Chromium consumes a lot of memory and other resources, so you will need a powerful machine or careful management of your update process.

A sample project is available at https://github.com/gmullernh/puppeteersharpStaticRenderer


A Brazilian software developer and devops engineer, who also likes communication and research.