# Rich Link Previews in Eleventy
A few months ago, I rebuilt this website using the static site generator Eleventy. While migrating over all of my blog posts, I realized that for some posts, I wanted to be able to include rich link previews like the one below:
> **How Do You Create a Product Roadmap?**
> *Modern product roadmaps should not be a list of features and dates. How should you go about creating a roadmap in a lean, agile product…*
> www.jefago.com
These link previews resemble the ones used in social networks like Twitter or Facebook, as well as on publishing platforms like Medium and LinkedIn. In this post, I will walk through how to create them in Eleventy using Nunjucks as the templating language. I chose Nunjucks because it supports asynchronous filters, and most of the modules that extract metadata from websites work asynchronously. To extract the metadata itself, I used html-metadata because it supports many different kinds of metadata.
The complete resulting code for these link previews is available on GitHub.
## Initial setup
Let's get started by setting everything up, installing eleventy as well as html-metadata:
```sh
npm init -y
npm install --save @11ty/eleventy
npm install --save html-metadata
```
We will also start with a pretty straightforward setup in `.eleventy.js`:
```js
module.exports = function(config) {
  return {
    // We process everything we know how to handle, plus images, css
    templateFormats: [
      "md",
      "njk",
      "html",
      "css",
      "png",
      "jpg",
      "gif"
    ],
    // Use Nunjucks for all templates
    markdownTemplateEngine: "njk",
    htmlTemplateEngine: "njk",
    dataTemplateEngine: "njk",
    passthroughFileCopy: true,
    dir: {
      input: ".",
      includes: "_includes",
      data: "_data",
      output: "_site"
    }
  };
};
```
## Setting up the link preview function
Next, we set up the link preview as an asynchronous Nunjucks filter inside `.eleventy.js`. You can read more about asynchronous Nunjucks filters in the Eleventy documentation and the Nunjucks documentation.
Asynchronous Nunjucks filters get passed a `callback(err, res)` function that the filter should call with `err` set to an error (or `null` if the filter executed successfully) and `res` set to the result of the filter. A very simple version of the filter, without any actual scraping, may look as follows:
```js
const linkPreview = (link, callback) => {
  setTimeout(function () {
    callback(null, `<a href="${link}">${link}</a>`);
  }, 100);
};

module.exports = function(config) {
  // Add Nunjucks asynchronous filter
  config.addNunjucksAsyncFilter("linkPreview", linkPreview);
  // further configuration here...
}
```
This asynchronous filter can now be called from a simple `index.md` file. Note that we are calling the `linkPreview` filter followed by the `safe` filter, which avoids the output of the `linkPreview` filter being escaped (which would break the HTML, of course):
```md
# Rich link previews in Eleventy

This is a demonstration of rich link previews in Eleventy

{{"https://www.jefago.com/" | linkPreview | safe}}
```
You should now be able to build the page using `npx eleventy` or run the local test server using `npx eleventy --serve`.
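For reference, both commands run against the current directory and write the generated site to `_site` (per the `dir` settings above):

```sh
npx eleventy          # build the site once into _site
npx eleventy --serve  # build, watch for changes, and serve locally
```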
## Scraping link metadata
So far, the `linkPreview` function just returns a plain `<a>` link. Let's get html-metadata to work and actually create a link preview that contains some metadata. To do this, we change the `linkPreview` function as follows:
```js
const scrape = require('html-metadata');

// Helper function to escape HTML
const escape = (unsafe) => {
  return (unsafe === null) ? null :
    unsafe.replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;")
      .replace(/'/g, "&#039;");
}

const linkPreview = (link, callback) => {
  // Helper function to format links
  const format = (metadata) => {
    // Extract some helpful metadata that we are going to use
    let domain = link.replace(/^http[s]?:\/\/([^\/]+).*$/i, '$1');
    let title = escape((metadata.openGraph ? metadata.openGraph.title : null) || metadata.general.title || "");
    let author = escape(((metadata.jsonLd && metadata.jsonLd.author) ? metadata.jsonLd.author.name : null) || "");
    let image = escape((metadata.openGraph && metadata.openGraph.image) ? (Array.isArray(metadata.openGraph.image) ? metadata.openGraph.image[0].url : metadata.openGraph.image.url) : null);
    let description = escape(((metadata.openGraph ? metadata.openGraph.description : "") || metadata.general.description || "").trim());
    if (description.length > 140) {
      description = description.replace(/^(.{0,140})\s.*$/s, '$1') + '…';
    }
    // Construct the return HTML, stripping any newlines so the snippet stays on one line
    return (`<p class="lp"><a class="lp-img" href="${link}">` +
      '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 67.733 67.733"><path fill="#d0d0d0" d="M0 0h67.733v67.733H0z"/><path fill="#fff" d="M33.867 13.547a20.32 20.32 0 00-20.32 20.32 20.32 20.32 0 0020.32 20.32 20.32 20.32 0 0020.32-20.32H50.8A16.933 16.933 0 0133.867 50.8a16.933 16.933 0 01-16.934-16.933 16.933 16.933 0 0116.934-16.934z"/><path fill="#fff" d="M26.383 36.361l4.99 4.99 19.955-19.957 4.99 4.99V11.415H41.35l4.99 4.99L26.382 36.36"/></svg>' +
      (image ? `<img src="${image}" alt="${title}">` : '') +
      `</a><a class="lp-met" href="${link}"><strong class="lp-ttl">${title}<br></strong><em class="lp-dsc">${description}</em>` +
      (author ? `<span class="lp-by">${author}</span>` : ``) +
      `<span class="lp-dom">${domain}</span></a></p>`).replace(/[\n\r]/g, ' ');
  }

  // Asynchronously scrape the link, calling the callback on success or failure
  scrape(link).then((metadata) => {
    if (!metadata) return callback("No metadata", `<div style="color:#ff0000; font-weight:bold">ERROR: Did not receive metadata</div>`);
    callback(null, format(metadata));
  });
}
// file continues...
```
There's a lot going on here, so let's step through it one by one. The `escape` function merely makes sure that any metadata we've extracted doesn't contain HTML (so that we don't get any HTML injected in our page), by replacing the characters `&`, `<`, `>`, `"`, and `'` with their respective HTML entities.
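To see it in action, here is a quick sketch of input and output (the example string is made up):

```js
escape(`He said "5 < 7 & 7 > 5", didn't he?`);
// => 'He said &quot;5 &lt; 7 &amp; 7 &gt; 5&quot;, didn&#039;t he?'
```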
Inside of `linkPreview`, we define a helper function `format` that takes the metadata extracted using `html-metadata` and constructs an HTML snippet for the link preview. The function is defined inside of `linkPreview` mostly because it is a closure over the variable `link`, which is local to the function `linkPreview`. In other words, we want to be able to access the `link` variable from inside the `format` function, and we can't pass it as a parameter (since the parameters of the function are defined by html-metadata).
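If the closure part sounds abstract, here is a minimal, self-contained sketch of the same pattern (all names here are made up for illustration):

```js
// `inner` only receives `metadata` as a parameter, yet it can still
// read `link` because it closes over the enclosing function's scope.
const outer = (link) => {
  const inner = (metadata) => `${metadata.title} (${link})`;
  return inner({ title: "Example" });
};

console.log(outer("https://example.com")); // "Example (https://example.com)"
```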
You can see how the `format` function looks at the different types of metadata that html-metadata can extract. For example, to define the variable `title`, it looks at both `metadata.openGraph.title` and `metadata.general.title`, depending on whether the linked page contains OpenGraph information.
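To make these fallbacks easier to follow, here is roughly the shape of a scraped metadata object. The categories (`general`, `openGraph`, `jsonLd`) are the ones the code reads; the concrete values are purely illustrative:

```js
// Abridged, illustrative example of what scrape(link) can resolve to.
// Real objects contain more categories and fields, and any of these
// may be missing depending on what the linked page declares.
{
  general:   { title: "Title from the <title> tag", description: "Meta description…" },
  openGraph: { title: "og:title value", description: "og:description value",
               image: { url: "https://example.com/cover.png" } },
  jsonLd:    { author: { name: "Author Name" } }
}
```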
For the actual HTML generation, the snippet contains an inline SVG, which is used as a fallback if the linked page provides no image. You can of course remove or replace it.
The actual call to html-metadata happens in this very short piece of code. In case html-metadata doesn't return any metadata, the callback is called with the error set to `No metadata`:
```js
// Asynchronously scrape the link, calling the callback on success or failure
scrape(link).then((metadata) => {
  if (!metadata) return callback("No metadata", `<div style="color:#ff0000; font-weight:bold">ERROR: Did not receive metadata</div>`);
  callback(null, format(metadata));
});
```
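One caveat worth knowing: if the linked page can't be fetched at all, the promise rejects and the callback is never invoked, which can stall the build. Here is a sketch of the same call with a `.catch` added (the plain-link fallback is my choice, not part of the original code):

```js
scrape(link)
  .then((metadata) => {
    if (!metadata) return callback("No metadata", `<div style="color:#ff0000; font-weight:bold">ERROR: Did not receive metadata</div>`);
    callback(null, format(metadata));
  })
  .catch(() => {
    // Degrade to a plain link so one unreachable URL doesn't break the build.
    callback(null, `<a href="${link}">${link}</a>`);
  });
```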
In order for the generated HTML snippet to look like anything, you also need to create and include a CSS file, like the following bare-bones template. For it to be included, you will also need to create a layout and reference it in the `index.md` file (refer to the Eleventy documentation for details; a minimal layout sketch follows the stylesheet below).
```css
.lp {
  border: 1px solid #d0d0d0;
  display: flex;
  flex-flow: row;
  margin-bottom: 20px;
  margin-top: 20px;
}

.lp svg {
  width: 100px;
  height: 100px;
}

.lp a, .lp a:hover, .lp a:link, .lp a:active, .lp a:visited {
  text-decoration: none;
  color: black;
}

.lp a:hover *, .lp a:active * {
  text-decoration: underline;
}

.lp-img {
  display: block;
  flex: 0 0 100px;
  background-color: #d0d0d0;
  min-height: 100px;
  border-right: 1px solid #d0d0d0;
  position: relative;
}

.lp-img img {
  position: absolute;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  object-fit: cover;
}

.lp-met {
  display: block;
  flex: 1 0px;
  padding: 4px 8px;
  overflow: hidden;
  margin: 0;
  font-size: 12px;
  line-height: 16px;
}

.lp-ttl {
  display: block;
  font-weight: 500;
  font-size: 14px;
  max-height: 32px;
  overflow: hidden;
}

.lp-dsc {
  font-style: italic;
  max-height: 80px;
  overflow: hidden;
}

.lp-by, .lp-dom, .lp-dsc {
  display: block;
  margin-top: 8px;
}

.lp-by::before {
  content: "by ";
}
```
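As mentioned above, you also need a layout that pulls this stylesheet in. Here is a minimal sketch, assuming the stylesheet is saved as `style.css` in the project root and the layout lives at `_includes/layout.njk` (both names are my choice, not prescribed by Eleventy):

```html
<!-- _includes/layout.njk: a minimal example layout -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>{{ title }}</title>
  <link rel="stylesheet" href="/style.css">
</head>
<body>
  {{ content | safe }}
</body>
</html>
```

To use it, reference it from the front matter of `index.md`, e.g. `layout: layout.njk`.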
## Caching link metadata
While the above approach works perfectly fine, it will scrape metadata from the link every time the site is built. This increases build time (especially if you have a lot of rich link previews), and also makes builds more fickle if one of the linked pages is unavailable at build time. It is therefore a good idea to cache the results.
The idea here is actually very simple: we will just store the result of the `scrape` call as a JSON file. You can even check those files into your version control system; wherever you then check out and build the site, the cached version of the link metadata will be used.
We will use some additional modules for the caching, so update the beginning of your `.eleventy.js` file as follows:
```js
const fs = require("fs");
const crypto = require("crypto");
const path = require("path");
```
We will store the cached metadata in files in the directory `_links`, so create that directory first. To implement the caching, we extend the `linkPreview` function as follows:
```js
const linkPreview = (link, callback) => {
  // Helper function to format links
  const format = (metadata) => {
    // ...
  }

  // Hash the link URL (using SHA1) and create a file name from it
  let hash = crypto.createHash('sha1').update(link).digest('hex');
  let file = path.join('_links', `${hash}.json`);

  if (fs.existsSync(file)) {
    // File with cached metadata exists
    console.log(`[linkPreview] Using persisted data for link ${link}.`);
    fs.readFile(file, (err, data) => {
      if (err) return callback("Reading persisted metadata failed", `<div style="color:#ff0000; font-weight:bold">ERROR: Reading persisted metadata failed</div>`);
      // Parse file as JSON, pass it to the format function to format the link
      callback(null, format(JSON.parse(data.toString('utf-8'))));
    });
  } else {
    // No cached metadata exists
    console.log(`[linkPreview] No persisted data for ${link}, scraping.`);
    scrape(link).then((metadata) => {
      if (!metadata) return callback("No metadata", `<div style="color:#ff0000; font-weight:bold">ERROR: Did not receive metadata</div>`);
      // First, store the metadata returned by scrape in the file
      fs.writeFile(file, JSON.stringify(metadata, null, 2), (err) => { /* Ignore errors, worst case we parse the link again */ });
      // Then, format the link
      callback(null, format(metadata));
    });
  }
}
```
Again, quite a bit going on here. Let's step through it one by one. In the first two lines, we create a SHA1 hash of the link URL, which we use (in hex encoding) as the file name. Hashing ensures that we don't have any trouble with slashes and other special characters in the link, which we would if we just used the URL itself as the file name. At the same time, two URLs producing the same SHA1 hash is close to impossible. (Note: while SHA1 is no longer considered cryptographically secure, it's completely fine for this purpose.)

Another thing worth noting is that small differences in the URL will lead to two different files being created. Say you use one link preview pointing to `https://www.jefago.com` and another pointing to `https://www.jefago.com/`; although they point to the same web page, their hashes would be different.
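If that bothers you, one mitigation is to normalize the URL before hashing it. A small sketch using Node's built-in `URL` class (this handles the trailing-slash case shown above, but is not a full normalization):

```js
// new URL(...).href canonicalizes the URL, e.g. it adds the trailing
// slash for an empty path, so both spellings hash to the same file.
const normalize = (link) => new URL(link).href;

normalize("https://www.jefago.com");  // => "https://www.jefago.com/"
normalize("https://www.jefago.com/"); // => "https://www.jefago.com/"

// let hash = crypto.createHash('sha1').update(normalize(link)).digest('hex');
```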
Next, we're checking whether the file with that name already exists, using the synchronous function `fs.existsSync()`. (It would also be possible to use the asynchronous function here instead, since the filter is already asynchronous.) If the file exists, we try to read it. If reading fails, we pass an error to the callback; if it succeeds, we simply parse the data as JSON and pass it to the `format()` function. This is possible since the data we store in the file is just the JSON-serialized result of the `scrape()` call.
In case the file doesn't exist, we basically have the `scrape()` call we had before. The only difference is that we also write the `metadata` to the file, using `JSON.stringify`. If writing the file is unsuccessful, we simply ignore the error (since we can always scrape the link again next time). This also means that if you didn't create a directory called `_links`, saving the file will fail silently.
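To guard against that, you can make sure the directory exists before the first write; a one-liner near the top of `.eleventy.js` does the trick:

```js
// Create the cache directory if needed; with `recursive: true` this is
// a no-op when the directory already exists.
fs.mkdirSync("_links", { recursive: true });
```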
## Lazy loading preview images
The last improvement that I implemented is lazy loading the preview images; otherwise, a page with a lot of link previews can get very slow to load. For that, I used the small lazy-loading library bLazy, which is loaded and initialized in an HTML template like this:
```html
<script defer type="text/javascript" src="//cdn.jsdelivr.net/blazy/latest/blazy.min.js"></script>
<script type="text/javascript">
  window.addEventListener('DOMContentLoaded', () => {
    var bLazy = new Blazy({
      selector: ".lp img",
      offset: 300,
      success: (element) => { element.style.backgroundColor = "white"; }
    });
  });
</script>
```
This loads and initializes bLazy, looking for any images matching the CSS selector `.lp img` (and, on success, setting the image background to white so the gray background doesn't show through transparent images). In order for this to work, we just have to change one line in the `format()` function:
```js
// This line...
// (image ? `<img src="${image}" alt="${title}">` : '') +
// gets replaced by this line
(image ? `<img src="data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs=" data-src="${image}" alt="${title}">` : '') +
```
This changes the `<img>` element so that its `src` is a tiny base64-encoded inline transparent GIF, while the actual image URL is set in the attribute `data-src`, which bLazy uses to lazy load the image.
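The rendered markup then looks roughly like this (the image URL is illustrative; the title is the one from the example preview above):

```html
<img src="data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs="
     data-src="https://example.com/cover.png"
     alt="How Do You Create a Product Roadmap?">
```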
That's all! I hope this was helpful. As mentioned above, the full code is available on GitHub.