Google Cloud - Community
A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Using Dataflow in Clojure to process Google’s huge new WikiReading dataset

Yesterday I was exploring the new WikiReading dataset and managed to get its 208GB of uncompressed JSON down to about 50GB by simplifying the structure of the objects, basically removing a bunch of…