Insight Delivery @ Cloudinary

Cloudinary
Cloudinary Engineering Blog
Feb 13, 2019

By Daniel Cohen

Source Code
Qlik mailing: This is the open-source project from our codebase.

Arguably, the last mile in the data-engineering pipeline is generating and delivering insight reports. For companies that aim to be “data-centric,” email is an efficient way of disseminating knowledge. Distributing the reports generated by our business intelligence (BI) tool, Qlik, over email promotes employee awareness and increases engagement with our data platform.

During the process, we found that Qlik’s NPrinting did not meet our needs. Thankfully, Tal Admon of our Customer Success team introduced us to a cool feature in Qlik called mashups, with which we can generate custom embeddable reports. Subsequently, we devised an email data-reporting utility similar to contemporary email-authoring systems such as Mailchimp.

However, one problem with sending HTML pages by email is that email clients support only a subset of the HTML spec: they cannot parse complex HTML layouts, and they do not run the JavaScript or render the richer CSS that robust presentations rely on, all of which is key for BI reports. As a solution, we architected a system that replaces selected parts of the email’s HTML page with an image: a screenshot taken in place.

The Design

A BI email system has the following prerequisites:

  • A scheduler
  • A WYSIWYG editor
  • The ability to:
    * Manage tasks
    * Scrape raw HTML pages and convert them to an email-compliant HTML format with screenshots
    * Upload, store, and deliver those screenshots

For the orchestration (scheduling, configuration, browser automation, and third-party integration), we used Node.js to build a tool on top of its rich npm ecosystem.

For editing, we leveraged Qlik Mashup’s WYSIWYG editor.

For scraping and taking screenshots, Puppeteer and Node.js are a natural fit. Puppeteer is a Node.js library from Google that automates Google Chrome: with it, you can open a page, navigate, and click links, all in code.

For image management, Cloudinary was our obvious choice for uploading, storing, and delivering the screenshots.

Utility

The entry point is `bin/mailinary`, in which the service loads all the scheduled email tasks from the `jobs` folder. That folder can contain many task files, each describing one scheduled email: its schedule, the addressee list, the URL from which to generate the email, and other configuration options.
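To make the loading step concrete, here is a minimal sketch of how task files could be read from a folder. It is an illustration under stated assumptions, not the project's actual code: the helper name `loadJobs`, the `.json` file extension, and the validation rule are hypothetical, and the four field names are taken from the example task file in this section.

```javascript
const fs = require('fs');
const path = require('path');

// Field names as they appear in the example task file in this post.
const REQUIRED_FIELDS = ['to', 'subject', 'url', 'schedule'];

// Hypothetical helper: parse every .json file in jobsDir into a job object.
// Assumes one job per file; throws if a file lacks a required field.
function loadJobs(jobsDir) {
  return fs
    .readdirSync(jobsDir)
    .filter((name) => name.endsWith('.json'))
    .sort()
    .map((name) => {
      const file = path.join(jobsDir, name);
      const job = JSON.parse(fs.readFileSync(file, 'utf8'));
      const missing = REQUIRED_FIELDS.filter((field) => !(field in job));
      if (missing.length > 0) {
        throw new Error(`${name} is missing: ${missing.join(', ')}`);
      }
      // Keep the file name so that errors and logs can point back to it.
      return { name, ...job };
    });
}
```

Failing fast on a malformed task file at startup is a design choice of this sketch; a service could just as reasonably log the problem and skip that file.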

Here is a short example of a scheduled email-task file:

{
  "to": "daniel@example.com",
  "subject": "email subject",
  "url": "https://my-bi-tool.com/daily_report.html",
  "schedule": "0 10 * * *"
}

When a job’s schedule comes due, the process initializes a new `Scraper` with the scheduled `Job`, awaits the loading of the report, finalizes the in-place images, and ships the report by email.
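The `schedule` value in the example task file, `0 10 * * *`, is a five-field cron expression (minute, hour, day of month, month, day of week), which reads as "10:00 every day." The sketch below shows one way to decide whether such an expression matches a given moment. It is deliberately simplified and is not the scheduler the project uses: `fieldMatches` and `cronMatches` are hypothetical names, only `*`, single numbers, and comma-separated lists are supported, and all five fields must match (standard cron treats the two day fields more leniently).

```javascript
// Does one cron field (for example "*", "10", or "0,30") accept this value?
function fieldMatches(field, value) {
  if (field === '*') return true;
  return field.split(',').some((part) => Number(part) === value);
}

// Does a five-field cron expression match the given Date, in local time?
function cronMatches(expression, date) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;
  return (
    fieldMatches(minute, date.getMinutes()) &&
    fieldMatches(hour, date.getHours()) &&
    fieldMatches(dayOfMonth, date.getDate()) &&
    fieldMatches(month, date.getMonth() + 1) && // JavaScript months are 0-based
    fieldMatches(dayOfWeek, date.getDay()) // 0 = Sunday
  );
}
```

A timer that fires once a minute could call `cronMatches(job.schedule, new Date())` for each loaded job and hand the matching ones to the scraper. In practice, an established scheduling library that also handles ranges, steps, and time zones is likely the better choice.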

The Scraper launches a headless Chrome instance through Puppeteer, navigates to the URL, and waits for the page to load. If it hits a login wall, the Scraper auto-fills the credential fields and continues to process the report. Once all the graphs are loaded and rendering is complete, the Scraper searches for the elements, specified in the job, whose content must be replaced with a screenshot. For each element, the Scraper creates a screenshot, uploads it to Cloudinary, and replaces the element content with an image tag such as `<img src="https://res.cloudinary.com/image_url">`.
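The screenshot-and-replace loop described above can be sketched roughly as follows. This is an illustration rather than the project's `Scraper`: the function name `replaceWithScreenshots`, the `selectors` parameter, and the injected `uploadImage` callback are assumptions, and the login step is omitted. The `page` argument is expected to behave like a Puppeteer `Page`; the calls used on it (`goto`, `$$`, `evaluate`, `content`, and `screenshot` on an element handle) are part of Puppeteer's API.

```javascript
// page:        a Puppeteer Page (or any object with the same methods)
// url:         the report URL from the job
// selectors:   CSS selectors of the elements to replace (hypothetical job field)
// uploadImage: async (imageBuffer) => hosted image URL, for example a thin
//              wrapper around the Cloudinary SDK (hypothetical callback)
async function replaceWithScreenshots(page, url, selectors, uploadImage) {
  // Wait for the network to go idle so that charts have a chance to render.
  await page.goto(url, { waitUntil: 'networkidle0' });

  for (const selector of selectors) {
    const elements = await page.$$(selector);
    for (const element of elements) {
      const png = await element.screenshot({ type: 'png' });
      const imageUrl = await uploadImage(png);
      // Runs inside the browser: swap the element's content for an <img> tag.
      await page.evaluate(
        (el, src) => {
          el.innerHTML = `<img src="${src}">`;
        },
        element,
        imageUrl
      );
    }
  }

  // The email-ready HTML, with screenshots in place of the live elements.
  return page.content();
}
```

In real use, `page` would come from `puppeteer.launch()` followed by `browser.newPage()`, and `uploadImage` could wrap the Cloudinary Node.js SDK's `uploader.upload_stream`, resolving with the `secure_url` of the upload result. Passing both in as arguments keeps the loop easy to exercise with stand-ins, without a browser or network access.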

Finally, the process passes the augmented HTML body of the report to the `mailer.js` module, which sends it with Mandrill (a Mailchimp email API) over the Simple Mail Transfer Protocol (SMTP).
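As a rough idea of what this last step involves, the sketch below maps a job and its rendered HTML onto an email message. It does not reproduce `mailer.js`: the function name `sendReport` and the `from` parameter are hypothetical, and the `transport` argument is assumed to expose a Nodemailer-style `sendMail(message)` method.

```javascript
// transport: any object with a Nodemailer-style sendMail(message) method
// from:      sender address (hypothetical; not part of the example job file)
// job:       a job with the "to" and "subject" fields from the task file
// html:      the augmented report HTML produced by the scraper
async function sendReport(transport, from, job, html) {
  const message = {
    from,
    to: job.to,
    subject: job.subject,
    html, // sent as the HTML body of the email
  };
  return transport.sendMail(message);
}
```

With Nodemailer, the transport could be created with `nodemailer.createTransport`, using the SMTP host, port, and credentials from the Mandrill account.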
