Don’t Be Fooled: NotAMagazine Is Not a Real Magazine, but It Features a Lot of News About Design
NotAMagazine is another experiment in the field of data aggregators. This fully automatically generated magazine features the most popular news about the many fields of design that we touch every day at frog.
In an ever-evolving field like design and innovation, being up to date isn’t just cool, it is necessary. Twitter is an excellent source of news, but in some ways it lacks focus. First, Twitter updates are so fast that you cannot possibly read everything. Second, being a social network, Twitter contains a lot of “noise” if your main goal is simply to get informed about something very specific. So why not leverage the power and reach of Twitter and build something on top of it that removes the noise and returns a better-categorised set of information? Why not build an automatically generated magazine?
The first idea was to regularly search Twitter for specific keywords, analyse the results, and create a database of resources. The rationale was: if someone finds a link interesting, she will share it on Twitter; hence, if I find a tweet containing a link, I can consider that link worth reading. After a few weeks I discovered that this isn’t always the case: my database was full of spam articles, fake news, and harsh comments. The solution that had worked for the Trending Topics project (NotAMagazine’s “grandpa”), a blacklist of users and websites, was neither scalable enough to keep spam out nor effective enough to fill the database with meaningful resources. The reverse approach, a whitelist of trusted sites, worked like a charm.
How it works
As mentioned above, the main source of information is Twitter. I structured the data this way:
- Collections (i.e. the main navigation categories you can browse on the site);
- Topics (i.e. the set of keywords that belong to each Collection);
- Items (i.e. the articles themselves).
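The three-level structure above can be sketched as a small data model. This is a minimal Python sketch, not the author’s PHP/MySQL schema: the class and field names (Collection, Topic, Item, imported_at) are hypothetical, as is the example data.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Item:
    """A single article discovered through a tweet (field names are hypothetical)."""
    url: str
    title: str
    description: str
    imported_at: datetime


@dataclass
class Topic:
    """A search keyword; each Topic belongs to exactly one Collection."""
    keyword: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Collection:
    """A top-level navigation category on the site."""
    name: str
    topics: list[Topic] = field(default_factory=list)

    def all_items(self) -> list[Item]:
        # Flatten the Items of every Topic into one list for the Collection page.
        return [item for topic in self.topics for item in topic.items]
```

For example, Collection("UX", [Topic("usability"), Topic("interaction design")]) would model a hypothetical category searched through two keywords; the Items attached to either keyword all show up under that Collection.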
Every half an hour, a bot searches Twitter for each Topic belonging to each Collection and performs a first clean-up of the received data: does the tweet contain a link? Does the link point to a whitelisted site? If not, the tweet is discarded; if it meets both requirements, it is saved in a database for further analysis.
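The first clean-up can be sketched as a pure filtering function. This is a minimal Python sketch, not the author’s PHP bot. It assumes a tweet shaped like a Twitter API v1.1 tweet object, where shortened t.co links are expanded in entities.urls[].expanded_url. The whitelist domains are hypothetical placeholders, since the real list of trusted sites is not published. Matching on the host name, including subdomains, rather than on a substring of the URL keeps look-alike domains out.

```python
from urllib.parse import urlparse

# Hypothetical whitelist; the real list of trusted sites is not published.
WHITELIST = {"example-design-blog.com", "news.example.org"}


def is_whitelisted(url: str, whitelist: set[str] = WHITELIST) -> bool:
    """True if the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)


def first_clean_up(tweet: dict) -> list[str]:
    """Return the whitelisted links in a tweet; an empty list means discard it."""
    # Assumes the v1.1 tweet layout: entities -> urls -> expanded_url.
    urls = tweet.get("entities", {}).get("urls", [])
    links = [u["expanded_url"] for u in urls if u.get("expanded_url")]
    return [link for link in links if is_whitelisted(link)]
```

A tweet without links, or whose links all point outside the whitelist, yields an empty list and is dropped; anything else would be queued for scraping.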
Every minute or so, another bot reads the list of saved links and starts analysing them: a scraper connects to each URL and grabs its Open Graph tags (if available) or its HTML meta tags. Once all this information has been collected, a new Item is created and linked to the correct Topic.
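The Open Graph lookup with its meta-tag fallback could look roughly like this. The original scraper fetches pages with Guzzle in PHP; this Python sketch assumes the HTML has already been downloaded and uses only the standard library’s html.parser. The returned field names (title, description, image) are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects Open Graph tags, plain meta tags and the <title> text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.og: dict[str, str] = {}    # e.g. og:title -> "title"
        self.meta: dict[str, str] = {}  # e.g. <meta name="description">
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        attributes = {k: (v or "") for k, v in attrs}
        if tag == "meta":
            content = attributes.get("content", "")
            prop = attributes.get("property", "")
            name = attributes.get("name", "").lower()
            if prop.startswith("og:"):
                self.og.setdefault(prop[3:], content)
            elif name:
                self.meta.setdefault(name, content)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_item_fields(html: str) -> dict[str, str]:
    """Prefer Open Graph tags, falling back to plain HTML meta tags."""
    parser = MetaTagParser()
    parser.feed(html)
    return {
        "title": parser.og.get("title") or parser.title.strip(),
        "description": parser.og.get("description") or parser.meta.get("description", ""),
        "image": parser.og.get("image", ""),
    }
```

Each field is resolved independently, so a page that has og:title but only a classic description meta tag still produces a complete Item.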
The process is pretty straightforward; it is just a little heavy on server resources, especially during scraping. This is why links are placed in a queue, and the scraping and analysis are performed asynchronously on a small set of links at each run. Every day, articles imported more than five days earlier are removed from the database, but nothing prevents them from being imported again in the future if someone shares the same link again.
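The queue and the five-day retention rule can be sketched as two small functions. This is a Python sketch under stated assumptions: an in-memory deque stands in for the database-backed link queue, BATCH_SIZE = 10 is an invented value for “a small set of links”, and scrape is any callable that fetches and parses one URL.

```python
from collections import deque
from datetime import datetime, timedelta

BATCH_SIZE = 10                # hypothetical size of a "small set" of links
RETENTION = timedelta(days=5)  # articles older than five days are removed


def process_batch(queue: deque, scrape, batch_size: int = BATCH_SIZE) -> list:
    """Scrape at most batch_size queued links; the rest wait for the next run."""
    results = []
    for _ in range(min(batch_size, len(queue))):
        url = queue.popleft()
        results.append(scrape(url))
    return results


def purge_old_items(items: list[dict], now: datetime) -> list[dict]:
    """Keep only items imported within the retention window."""
    cutoff = now - RETENTION
    return [item for item in items if item["imported_at"] >= cutoff]
```

A scheduler such as cron could call process_batch every minute and purge_old_items once a day; the scheduling mechanism is an assumption, as the original only states the frequencies. Because purged links are simply deleted, a later tweet with the same URL passes through the pipeline as a new Item.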
The Twitter bot and the scraper are written in PHP. I have already written elsewhere on this site about why I like PHP so much, so I won’t repeat it here. Everything is built on top of my own PHP Boilerplate, a framework I built to quickly start PHP applications. The Twitter API is consumed using Abraham Williams’ TwitterOAuth class, and the scraper leverages Guzzle, an HTTP client for PHP. The database is MySQL. The approved Items saved in the database are exposed to the frontend through a JSON-based API.
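On the API side, a response essentially serializes the approved Items of a Collection. The payload shape below (keys collection, count, items) is a hypothetical Python sketch, not the actual NotAMagazine API, which is served by the PHP backend.

```python
import json
from datetime import datetime


def items_to_json(collection: str, items: list[dict]) -> str:
    """Serialize a Collection's Items, newest first, into a JSON payload."""
    ordered = sorted(items, key=lambda i: i["imported_at"], reverse=True)
    payload = {
        "collection": collection,
        "count": len(ordered),
        "items": [
            {
                "url": i["url"],
                "title": i["title"],
                # datetime is not JSON-serializable, so emit ISO 8601 strings.
                "imported_at": i["imported_at"].isoformat(),
            }
            for i in ordered
        ],
    }
    return json.dumps(payload)
```

Sorting newest first on the server keeps the frontend simple: it can render the list as received and append fresh articles when a background update arrives.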
The frontend is a Progressive Web App built following the “Offline First” approach: users whose browser supports Progressive Web Apps can interact with the page almost immediately, while page updates (i.e. new articles) are loaded in the background.
I plan to redesign the entire frontend sometime soon. My idea is to get rid of the current frontend stack and rewrite everything using ReactJS. Article analysis will also be improved with automatic tagging powered by Wit.ai, but I honestly don’t have a due date for this. Also part of the NotAMagazine family is Miss Fletcher, the demo chatbot I built on top of NotAMagazine’s database. She is also expected to receive some improvements, in the form of third-party service integrations (mainly Slack and Twitter).
My name is Simone Lippolis. After spending almost ten years as a Design Technologist at frog, I am now with Cisco as a Data Visualization Expert. This article is part of my online portfolio, which you can access at simonelippolis.com.