Tapping into the Marvel API
Because who *wouldn’t* want all the character info in a CSV?
Singer is all about helping you work with data more easily. And we mean any data source. Want to get API data into a CSV? Cool. A SaaS data set into a Google Sheet? No problemo. Or maybe data from an e-commerce platform like Magento into a BI tool, or a random database into something that manages your data pipeline, like Stitch. Yep. It can do that.
Best of all: Singer is open source. A script that pulls data out of a source is what Singer calls a “Tap”, and a script that sends that data to a destination is a “Target”. You can write a Tap for pretty much any data source, and because Taps and Targets all speak the same format, any Tap can feed any Target.
To illustrate, I wanted to walk through creating a Tap and loading its output into a CSV. Pretty reasonable use case. For this example I chose the Marvel API. I’ve been wanting to work with it for a while now (pretty sure this is your fault, Keith). Anyway, it seemed cool and might lead to some fun data mashups down the line.
First: Lé Marvel documentation
Before we dive into the code it might be a good idea to check out a few places to familiarize ourselves with what’s going on.
For this example we’ll call the characters endpoint (/v1/public/characters) and sort the results by name with its orderBy parameter, both of which you can see in the interactive Marvel Swagger documentation. It might also be helpful to check out the API result structure just to get acquainted.
And then, of course, if you’d like to play along you’ll need your own API keys (a public key and a private key), which you can get by registering on the Marvel developer site.
Next: Introducing Singer
The next thing that would be helpful to check out is the Singer getting started documentation.
Let’s take a look at the part about developing a Python Tap:
There are two important bits here. This is the meat & potatoes of Singer:
Singer needs a SCHEMA, which describes what the data will look like. The singer-python library writes it out in the right format with write_schema:
It also needs RECORDs, the actual rows of data, which it writes out with write_records:
For write_schema, Singer uses something called JSON Schema to describe the data. This basically says “Hey, data source! I’m gonna pull data (in this case with a GET from the Marvel API), and here’s exactly the shape it’s gonna come out in, real pretty and easy to consume.” Isn’t that nice?
And then write_records says “OK, now that we know what our data should look like, here it comes,” and prints each record to standard output, where the Target picks it up and sends it wherever you want. Coolness.
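To make those two message types concrete, here’s a minimal stdlib-only sketch of what gets printed. It builds SCHEMA and RECORD messages as newline-delimited JSON, following the Singer spec’s message shape. The write_schema/write_records functions below are rough stand-ins written for illustration; with singer-python installed you’d call singer.write_schema(stream_name, schema, key_properties) and singer.write_records(stream_name, records) instead, which handle more than this.

```python
import json
import sys


def schema_message(stream, schema, key_properties):
    """Build a SCHEMA message: tells the Target what the stream's records look like."""
    return {"type": "SCHEMA", "stream": stream, "schema": schema, "key_properties": key_properties}


def record_message(stream, record):
    """Build a RECORD message: one row of actual data for the stream."""
    return {"type": "RECORD", "stream": stream, "record": record}


def emit(message, out=sys.stdout):
    """Singer messages are JSON objects written one per line to stdout."""
    out.write(json.dumps(message) + "\n")


def write_schema(stream, schema, key_properties):
    # Rough stand-in for singer.write_schema (illustration only).
    emit(schema_message(stream, schema, key_properties))


def write_records(stream, records):
    # Rough stand-in for singer.write_records (illustration only).
    for record in records:
        emit(record_message(stream, record))


if __name__ == "__main__":
    schema = {"type": "object", "properties": {"id": {"type": "integer"}, "name": {"type": "string"}}}
    write_schema("characters", schema, ["id"])
    write_records("characters", [{"id": 1, "name": "Example Hero"}])
```

The SCHEMA goes out once, before any RECORDs for that stream, so the Target knows the shape of what’s coming.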
Finally: The actual Tap
To get started, check out the Marvel Tap on GitHub.
And let’s walk through the marvel.py file:
Lines 1–12 handle the imports, getting everybody invited to the party:
This next bit handles two things:
- Loading our API keys from a config file (check out config_example.json)
- Parsing command-line arguments. This is important because to run the Tap we need to tell it where the config file is, and then pipe its output into whichever Target we want. We’ll see this at the end:
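Here’s a rough sketch of that config-and-arguments step using only argparse and json. The key names public_key and private_key are hypothetical; check config_example.json in the repo for the names the Tap actually expects. (singer-python also ships a singer.utils.parse_args helper that does something similar.)

```python
import argparse
import json

# Hypothetical key names; the real config_example.json may use different ones.
REQUIRED_KEYS = ("public_key", "private_key")


def build_parser():
    """Accept `-c/--config PATH`, the flag used in the run command."""
    parser = argparse.ArgumentParser(description="Marvel Tap (sketch)")
    parser.add_argument("-c", "--config", required=True, help="path to a JSON config file with your API keys")
    return parser


def missing_keys(config):
    """List any required keys the config file forgot, in REQUIRED_KEYS order."""
    return [key for key in REQUIRED_KEYS if key not in config]


def load_config(argv=None):
    """Parse arguments, read the config file, and fail early if keys are missing."""
    args = build_parser().parse_args(argv)
    with open(args.config) as f:
        config = json.load(f)
    missing = missing_keys(config)
    if missing:
        raise SystemExit(f"config is missing keys: {', '.join(missing)}")
    return config
```

Failing before the first API call keeps a typo in the config file from turning into a confusing authorization error later.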
And now the Marvel API part of the script. You’ll recall from the documentation that Marvel requires certain parameters for server-side calls: a timestamp (ts), our public key (apikey), and a hash, which is an MD5 digest of the timestamp, private key, and public key strung together. Like this:
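A minimal sketch of building those three parameters with hashlib, following the md5(ts + privateKey + publicKey) recipe from Marvel’s authorization docs. The function name auth_params is just for illustration and isn’t necessarily what marvel.py calls it.

```python
import hashlib
import time


def auth_params(public_key, private_key, ts=None):
    """Return the ts/apikey/hash query parameters for a server-side Marvel call."""
    # Any string that changes from request to request works as ts; a Unix timestamp is handy.
    ts = str(int(time.time())) if ts is None else str(ts)
    # hash = md5(ts + private key + public key), as a lowercase hex string.
    digest = hashlib.md5((ts + private_key + public_key).encode("utf-8")).hexdigest()
    return {"ts": ts, "apikey": public_key, "hash": digest}
```

Since the timestamp changes, it’s simplest to build a fresh set of parameters for every request.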
There are also limits on calls (for instance, a single request can return at most 100 results), which we’re going to address:
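One way to respect the per-request cap is to always ask for at most 100 results and step an offset forward. This sketch builds the paging parameters (limit, offset, orderBy) for the characters endpoint and works out how many calls a full pull takes; the helper names are hypothetical.

```python
import math

MAX_LIMIT = 100  # Marvel won't return more than 100 results per request


def page_params(offset, limit=MAX_LIMIT, order_by="name"):
    """Query parameters for one page of /v1/public/characters, sorted by name."""
    return {"limit": min(limit, MAX_LIMIT), "offset": offset, "orderBy": order_by}


def calls_needed(total, limit=MAX_LIMIT):
    """How many paged requests it takes to fetch `total` results."""
    return math.ceil(total / min(limit, MAX_LIMIT))
```

For example, the 2,991 characters in the final CSV work out to calls_needed(2991) == 30 requests, which is also worth keeping in mind against the API’s daily call allowance.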
Next, on to Singer. You’ll recall the schema lets us say “Hey API, put the info in this particular structure, please.” This sets up our JSON Schema:
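For a feel of what that schema might contain, here’s a hypothetical subset of character fields, plus a helper that flattens one raw API result to match it. The field choices (including joining the thumbnail’s path and extension into an image URL and pulling out comics.available) are assumptions for illustration, not necessarily what marvel.py does.

```python
# Hypothetical subset of character fields; the real Tap may pick different ones.
CHARACTER_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "description": {"type": ["null", "string"]},
        "modified": {"type": ["null", "string"]},
        "thumbnail": {"type": ["null", "string"]},  # flattened image URL
        "comics_available": {"type": "integer"},
    },
}


def to_row(character):
    """Flatten one Marvel character result into a dict matching CHARACTER_SCHEMA."""
    thumb = character.get("thumbnail") or {}
    has_image = "path" in thumb and "extension" in thumb
    return {
        "id": character["id"],
        "name": character["name"],
        "description": character.get("description"),
        "modified": character.get("modified"),
        # Marvel splits image URLs into a path and a file extension.
        "thumbnail": f"{thumb['path']}.{thumb['extension']}" if has_image else None,
        "comics_available": (character.get("comics") or {}).get("available", 0),
    }
```

Flattening nested objects into plain columns up front keeps every record a simple row, which maps neatly onto a CSV.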
And now the fun stuff. Let’s actually call the API.
The tricky part is that the Marvel API limits the number of results we get back per call. So we’re going to use a while loop. This way we can keep asking the API for the next batch of rows and sending them on to the CSV until there aren’t any more.
Inside the while loop, we first call the Marvel API with all the parameters it asks for. Then we write the records:
Finally, let’s tell Python to run our main function when the script is executed directly:
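Here’s a sketch of that loop, written so it can run without network access. fetch_page(offset, limit) is a hypothetical callable that returns one parsed Marvel response (in the real script it could wrap requests.get(url, params=...).json(), rebuilding the auth parameters each time), and write_records is whatever writes Singer records, such as singer.write_records. It relies on Marvel’s response wrapping results in data.results along with a data.total count.

```python
def sync(fetch_page, write_records, stream="characters", limit=100):
    """Page through the API with a while loop, writing each batch as Singer records.

    Returns the number of records written.
    """
    offset = 0
    while True:
        data = fetch_page(offset, limit)["data"]
        results = data["results"]
        if not results:
            break  # nothing came back, so there's nothing left to ask for
        write_records(stream, results)
        offset += len(results)
        if offset >= data["total"]:
            break  # we've now seen every character the API says exists


    return offset
```

Checking for an empty page as well as the total guards against looping forever if the API ever reports a total it doesn’t actually deliver.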
Now let’s run this puppy! Head over to the terminal and run the script, piping its output into target-csv:
python3 marvel.py -c config.json | target-csv
Voilà! You should now have a mess of character data in a characters.csv file.
There are 2,991 rows of fun stuff in there:
There are links to images of characters I have never in my life seen:
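If you’re curious what happens on the other side of that pipe, here’s a toy Target: it reads Singer messages line by line, uses the SCHEMA’s property names as the CSV header, and writes one row per RECORD. This is a simplified illustration of the idea, assuming a single stream, and not how target-csv is actually implemented.

```python
import csv
import io
import json
import sys


def messages_to_csv(lines):
    """Toy Target: turn Singer SCHEMA/RECORD lines for one stream into CSV text."""
    buf = io.StringIO()
    writer = None
    for line in lines:
        if not line.strip():
            continue
        message = json.loads(line)
        if message["type"] == "SCHEMA":
            # The schema's property names become the CSV header.
            fields = list(message["schema"]["properties"])
            writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
        elif message["type"] == "RECORD" and writer is not None:
            writer.writerow(message["record"])
        # Other message types (e.g. STATE) are ignored in this sketch.
    return buf.getvalue()


if __name__ == "__main__":
    # e.g. python3 marvel.py -c config.json | python3 toy_target.py > characters.csv
    sys.stdout.write(messages_to_csv(sys.stdin))
```

Because the Tap only ever writes to stdout, swapping this for target-csv (or any other Target) doesn’t require touching marvel.py at all.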
And here’s a super quick data visualization using Raw to show the types of data we can use:
This circle pack maps each character’s name to the number of comics available for that character. There are quite a few Wolverine comics, which isn’t exactly surprising. However, I’d never even heard of Squadron Sinister or Strong Guy, and there are a decent number of comics out there featuring them:
And then, just to show another type of visualization, here is a tree map of the first 50 rows. This maps each character’s name to the number of issues in the collection. Sweet.
In summary, we now have a treasure trove of Marvel nerdiness in an easily consumable CSV ready to be imported, analyzed, chopped, mashed, and whatever else your heart might desire.