Announcing Scumblr and Sketchy — Search, Screenshot, and Reclaim the Internet
Netflix is pleased to announce the open source release of two security-related web applications: Scumblr and Sketchy!
Scumbling The Web
Many security teams need to stay on the lookout for Internet-based discussions, posts, and other content that may affect the organizations they protect. These teams then take a variety of actions depending on the nature of what they find. Netflix’s security team has these same requirements, and today we’re releasing some of the tools that help us in these efforts.
Scumblr is a Ruby on Rails web application that allows searching the Internet for sites and content of interest. Scumblr includes a set of built-in libraries for creating searches against common sites like Google, Facebook, and Twitter. For other sites, it is easy to create plugins that perform targeted searches and return results. Once you have Scumblr set up, you can run the searches manually or automatically on a recurring basis.
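To make the plugin idea concrete, here is a small Python sketch of a search-provider registry. It is not Scumblr's code (Scumblr is written in Ruby); the names `SearchResult`, `register`, `run_search`, and the `example_site` provider are hypothetical, and the provider returns a canned result instead of calling a real site.

```python
"""Illustrative search-plugin registry (hypothetical; not Scumblr's Ruby API)."""
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SearchResult:
    """A single hit returned by a search provider."""
    title: str
    url: str
    provider: str


# A provider is any callable that takes a query string and returns results.
SearchProvider = Callable[[str], List[SearchResult]]

# Registry mapping a provider name to its implementation.
PROVIDERS: Dict[str, SearchProvider] = {}


def register(name: str) -> Callable[[SearchProvider], SearchProvider]:
    """Decorator that adds a provider to the registry under `name`."""
    def wrap(fn: SearchProvider) -> SearchProvider:
        PROVIDERS[name] = fn
        return fn
    return wrap


@register("example_site")
def search_example_site(query: str) -> List[SearchResult]:
    # A real plugin would call the target site's API here; this stub
    # returns one deterministic result so the flow can be exercised.
    slug = query.lower().replace(" ", "-")
    return [SearchResult(f"Post about {query}", f"https://example.com/{slug}", "example_site")]


def run_search(provider: str, query: str) -> List[SearchResult]:
    """Run one saved search; a scheduler could call this on a recurring basis."""
    return PROVIDERS[provider](query)


if __name__ == "__main__":
    for r in run_search("example_site", "Acme Leak"):
        print(r.url)  # prints https://example.com/acme-leak
```

Adding support for a new site then means writing one function and decorating it; the code that runs saved searches never changes.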
Scumblr leverages a gem called Workflowable (which we are also open sourcing) that allows setting up flexible workflows that can be associated with search results. These workflows can be customized so that different types of results go through different workflow processes, depending on how you want to act on them. Workflowable also has a plugin architecture that allows triggering custom automated actions at each step of the process.
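The sketch below models the workflow idea in Python: each workflow defines stages, the allowed transitions between them, and actions that fire automatically when a result enters a stage. It is an illustration under our own assumptions, not Workflowable's API; the `takedown` workflow, its stage names, and `notify_legal` are hypothetical.

```python
"""Minimal workflow sketch: stages, allowed transitions, per-stage actions.

Hypothetical model of the idea described for Workflowable; names and
structure are illustrative, not the gem's actual API.
"""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Workflow:
    name: str
    initial: str
    transitions: Dict[str, Set[str]]
    # Actions run automatically when a result enters a given stage.
    actions: Dict[str, List[Callable[[dict], None]]] = field(default_factory=dict)

    def advance(self, result: dict, target: str) -> None:
        """Move a result to `target` if allowed, then fire that stage's actions."""
        current = result.setdefault("stage", self.initial)
        if target not in self.transitions.get(current, set()):
            raise ValueError(f"{self.name}: cannot move {current!r} -> {target!r}")
        result["stage"] = target
        for action in self.actions.get(target, []):
            action(result)


def notify_legal(result: dict) -> None:
    # Placeholder for a custom automated action (e.g. opening a ticket).
    result.setdefault("log", []).append(f"legal notified about {result['url']}")


takedown = Workflow(
    name="takedown",
    initial="new",
    transitions={
        "new": {"triage"},
        "triage": {"takedown_requested", "closed"},
        "takedown_requested": {"closed"},
    },
    actions={"takedown_requested": [notify_legal]},
)

# Different result types can be routed to different workflows.
WORKFLOW_FOR_TYPE = {"piracy": takedown}

if __name__ == "__main__":
    hit = {"url": "https://example.com/leak", "type": "piracy"}
    wf = WORKFLOW_FOR_TYPE[hit["type"]]
    wf.advance(hit, "triage")
    wf.advance(hit, "takedown_requested")
    print(hit["stage"], hit["log"])
```

Rejecting transitions that the workflow does not define keeps analysts from skipping steps, while the per-stage action lists are where automation plugs in.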
Scumblr also integrates with Sketchy, which allows automatic screenshot generation of identified results to provide a snapshot-in-time of what a given page and result looked like when it was identified.
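One way to think about a snapshot in time is as a record that pairs the capture with a UTC timestamp and a digest of the page content, so later captures of the same result can be compared. The `Snapshot` class, its fields, and `page_changed` below are hypothetical and do not reflect Sketchy's actual capture schema.

```python
"""Sketch of a snapshot-in-time record attached to a search result.

Field names are hypothetical; they illustrate why a capture should carry a
UTC timestamp and a content digest alongside the screenshot.
"""
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    url: str
    captured_at: datetime
    html_sha256: str

    @classmethod
    def from_html(cls, url: str, html: str) -> "Snapshot":
        """Record when the page was seen and a digest of what it contained."""
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        return cls(url, datetime.now(timezone.utc), digest)


def page_changed(old: Snapshot, new: Snapshot) -> bool:
    """True if the page content differs between two captures."""
    return old.html_sha256 != new.html_sha256
```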
Scumblr makes use of the following components:
- Ruby on Rails 4.0.9
- Backend database for storing results
- Redis + Sidekiq for background tasks
- Workflowable for workflow creation and management
- Sketchy for screenshot capture
We’re shipping Scumblr with built-in search libraries for seven common services including Google, Twitter, and Facebook.
Getting Started with Scumblr and Workflowable
One of the features we wanted to see in Scumblr was the ability to collect screenshots and text content from potentially malicious sites, so that security analysts can preview Scumblr results without the risk of visiting the sites directly. We wanted this collection system to be isolated from Scumblr and resilient to sites that may perform malicious actions. We also wanted an API that we could use in other applications outside of Scumblr. Although a variety of tools and frameworks exist for taking screenshots, we ran into a number of edge cases that made taking reliable screenshots difficult: capturing screenshots of AJAX-heavy sites, cut-off images with virtual X drivers, and SSL and compression issues in the PhantomJS driver for Selenium, to name a few. To solve these challenges, we decided to build on the best tools available and create an API framework that provides reliable, scalable, and easy-to-use screenshot and text scraping capabilities. Sketchy to the rescue!
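AJAX-heavy pages are hard to capture because the initial load event fires before scripts have finished fetching content. A common generic technique (not necessarily what Sketchy does internally) is to wait until there have been no in-flight network requests for a quiet period, bounded by an overall timeout. The sketch below simulates that decision on a list of request start/finish events; `ready_time` and the event format are assumptions for illustration.

```python
"""Illustrative "network idle" wait used for lazy rendering.

Assumption: the headless browser reports request start/finish events;
here those events are simulated as (time_ms, delta) pairs.
This is a generic technique, not Sketchy's actual implementation.
"""
from typing import Iterable, Optional, Tuple


def ready_time(events: Iterable[Tuple[int, int]], quiet_ms: int,
               timeout_ms: int) -> Optional[int]:
    """Return the time (ms) at which a capture is safe, or None.

    events: (time_ms, delta) pairs in time order, where delta is +1 when a
    request starts and -1 when it finishes. A capture is safe once there
    have been zero in-flight requests for `quiet_ms` continuous ms.
    """
    in_flight = 0
    idle_since: Optional[int] = 0  # the page starts with nothing in flight
    for t, delta in events:
        if idle_since is not None and t - idle_since >= quiet_ms:
            break  # a full quiet window elapsed before this event
        in_flight += delta
        idle_since = t if in_flight == 0 else None
    if idle_since is None:
        return None  # requests never drained: the page did not settle
    ready = idle_since + quiet_ms
    return ready if ready <= timeout_ms else None  # None: hit the timeout


if __name__ == "__main__":
    # Two XHRs finish by 450 ms; a later poll at 2000 ms is ignored because
    # the 500 ms quiet window already closed at 950 ms.
    events = [(0, 1), (120, 1), (300, -1), (450, -1), (2000, 1), (2100, -1)]
    print(ready_time(events, quiet_ms=500, timeout_ms=10_000))  # 950
```

A request that starts inside the quiet window resets the wait, which is exactly the case where capturing on the load event would have produced a half-rendered page.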
At a high level, Sketchy contains the following components:
- Python + Flask to serve Sketchy
- PhantomJS to take lazy captures of AJAX-heavy sites
- Celery to manage jobs, and Redis to schedule and store job results
- Backend database to store capture records (by leveraging SQLAlchemy)
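The value of the Celery and Redis layer is that slow captures run in workers, separate from the web process that accepts requests. The sketch below imitates that split with only the standard library: a `ThreadPoolExecutor` stands in for Celery workers, a dict stands in for the Redis result store, and `capture_page` is a hypothetical stub for the real rendering work.

```python
"""Stand-in for a Celery/Redis job layer, using only the stdlib.

Assumption: a ThreadPoolExecutor and an in-memory dict replace Celery
workers and the Redis result backend; capture_page is a hypothetical stub.
"""
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Iterable

results: Dict[str, str] = {}  # plays the role of the result backend
_lock = threading.Lock()


def capture_page(url: str) -> str:
    # Placeholder for the slow work: render, screenshot, scrape text.
    return f"captured:{url}"


def _on_done(url: str, fut: Future) -> None:
    # Runs when a job finishes, like a task-completion hook.
    with _lock:
        results[url] = fut.result()


def enqueue_all(urls: Iterable[str], workers: int = 4) -> None:
    """Submit one capture job per URL and record each result as it completes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url in urls:
            fut = pool.submit(capture_page, url)
            fut.add_done_callback(lambda f, u=url: _on_done(u, f))
    # Leaving the `with` block waits for every submitted job to finish.
```

Scaling this design means raising `workers`; with Celery, the equivalent is starting more worker processes, possibly on other machines, while the API process stays unchanged.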
At its core, Sketchy provides a scalable, task-based framework to capture screenshots, scrape page text, and save HTML through a simple-to-use API. These captures can be stored locally or in an AWS S3 bucket. Optionally, token authentication can be configured, and callbacks can be used if required. Sketchy uses PhantomJS with lazy rendering to ensure AJAX-heavy sites are captured correctly. Sketchy also uses the Celery task management system, which lets users scale Sketchy by adding workers and handle time-intensive captures of large sites.
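A client asks a Sketchy-style service for a capture with a JSON POST. The sketch below only builds the request with `urllib.request`, so it can be inspected without a running server. The endpoint path, the `url` and `callback` field names, and especially the token header name are assumptions to check against the Sketchy wiki, not a confirmed description of its API.

```python
"""Build (but do not send) a capture request for a Sketchy-style API.

Assumptions to verify against the Sketchy wiki: the endpoint path,
the JSON field names, and the token header name are hypothetical here.
"""
import json
import urllib.request
from typing import Optional

API_PATH = "/api/v1.0/capture"  # assumed path
TOKEN_HEADER = "X-Auth-Token"   # hypothetical header name


def build_capture_request(base_url: str, target_url: str,
                          callback: Optional[str] = None,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Return a POST request asking the service to capture `target_url`."""
    payload = {"url": target_url}
    if callback:
        # Optional: a URL the service could notify when the capture finishes.
        payload["callback"] = callback
    headers = {"Content-Type": "application/json"}
    if token:
        headers[TOKEN_HEADER] = token
    return urllib.request.Request(
        base_url.rstrip("/") + API_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Against a running instance, `urllib.request.urlopen(req)` would send the request; keeping construction separate from sending makes the payload easy to test.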
Getting Started with Sketchy
Sketchy is available now on the Netflix Open Source site, and setup is straightforward. We’ve also created a Docker image for Sketchy for interested users. Please visit the Sketchy wiki for documentation on how to get started.
Scumblr and Sketchy are helping the Netflix security team keep an eye on potential threats to our environment every day. We hope that the open source community can find new and interesting uses for the newest additions to the Netflix Open Source Software initiative. Scumblr, Sketchy, and the Workflowable gem are all available on our GitHub site now!
— Andy Hoernecke and Scott Behrens (Netflix Cloud Security Team)
Originally published at techblog.netflix.com on August 25, 2014.