Exploring mget: More fun with Web page metadata

Picture by Mickaël Rémond. Taken during the final game of French LOL Open Tour

In a previous post, I explained what led me to write mget, as a tool to explore web page metadata. In this post, I will explain how to easily discover Twitter page properties to download related images.


Web pages contain important pieces of information as embedded metadata. Combined with typical Unix command-line tools, mget is handy for tasks that would otherwise require parsing the HTML yourself.

I will show you how to download a Twitter image using just its Twitter URL, which starts with pic.twitter.com and is followed by an ID.

Usually those links load the Twitter page, rendering the full tweet inside the Twitter interface. There is no simple way to directly download the image from that link.

However, the Twitter page for a tweet contains Open Graph metadata. With mget, you can extract that information in JSON format, filter the value with jq, and download the image with wget.


As a demonstration, we will use this image URL: https://pic.twitter.com/eIDZytxQVJ

You can check the available metadata with mget:

$ ./mget https://pic.twitter.com/eIDZytxQVJ
{
  "properties": {
    "og:description": "“#gowin LOL open tour finale”",
    "og:image": "https://pbs.twimg.com/media/DsyMaG6WkAE1I5Q.jpg:large",
    "og:site_name": "Twitter",
    "og:title": "Mickaël Rémond on Twitter",
    "og:type": "article",
    "og:url": "https://twitter.com/mickael/status/1066381597702803458",
    "title": "Mickaël Rémond sur Twitter : \"#gowin LOL open tour finale… \""
  }
}

As you can see, the value you need is in the og:image property. Let's extract it with jq. You can install jq with Homebrew on macOS:

brew install jq

You can then filter the JSON output of mget to keep only the value you need. The -r flag makes jq print the raw string, without the surrounding quotes:

$ ./mget https://pic.twitter.com/eIDZytxQVJ | jq '.properties."og:image"' -r 
https://pbs.twimg.com/media/DsyMaG6WkAE1I5Q.jpg:large
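Not every page exposes an og:image property. As a minimal sketch of how the filter could handle that case, the snippet below runs jq on a trimmed copy of the JSON shown above instead of calling mget, so it works offline. The extract_image function and the variable names are hypothetical helpers introduced for this illustration; they are not part of mget.

```shell
# Sample mget output, copied from the example above and trimmed to two properties.
sample='{"properties":{"og:image":"https://pbs.twimg.com/media/DsyMaG6WkAE1I5Q.jpg:large","og:site_name":"Twitter"}}'

# Hypothetical helper: read mget-style JSON on stdin and print the og:image value.
# The `// empty` alternative makes jq print nothing, rather than the string
# "null", when the property is absent.
extract_image() {
  jq -r '.properties["og:image"] // empty'
}

image_url=$(printf '%s' "$sample" | extract_image)
missing=$(printf '%s' '{"properties":{}}' | extract_image)

echo "found: ${image_url:-<none>}"
echo "missing: ${missing:-<none>}"
```

With an empty result, a script can skip the download step instead of handing the literal text "null" to the next command.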

Now extend the command to pass the extracted URL to wget, which downloads the image:

./mget https://pic.twitter.com/eIDZytxQVJ | jq '.properties."og:image"' -r | xargs wget -O image.jpg

The image will be saved in a file named image.jpg. You will end up with the image shown in the header of this post.
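If you would rather keep the original file name than a fixed image.jpg, the name can be derived from the URL itself. The following is a small sketch using only shell parameter expansion; it assumes the URL ends with a file name optionally followed by a size suffix such as ":large", as in the example above. The variable names are illustrative.

```shell
# URL as printed by the jq filter in this post.
image_url='https://pbs.twimg.com/media/DsyMaG6WkAE1I5Q.jpg:large'

# Keep only the last path segment: DsyMaG6WkAE1I5Q.jpg:large
last_segment=${image_url##*/}

# Drop a trailing ":size" suffix such as ":large", if there is one.
file_name=${last_segment%%:*}

echo "$file_name"

# The download step would then be (not run here, since it needs network access):
#   wget -O "$file_name" "$image_url"
```

The expansion leaves the name unchanged when the URL has no size suffix, so the same lines should also work for plain image URLs.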

mget is available as part of the Data Portability Kit project.

I hope you enjoyed this tip and that it makes you want to explore Web metadata in more depth.