Parsing Hangouts.json

Recently I started working on a reference Laravel / AngularJS application after it became clear that something like this was highly desired on the r/php subreddit. I came up with what I think is a cool idea: take all your Hangouts data, put it in a database, and generate some cool reports for users on the front end. I am not finished with that yet (sorry to those who are waiting), but I am here to talk about a small side project that came out of it.

After setting up the database, seeders, and models and creating the CRUD methods for the User table, it was time to actually get the data into the database. I looked around and found where I could download my Google Hangouts data, unsure at this point of what format it would even come out in. Google Takeout can be found here for others who want this data. Out came a 200MB JSON file. Great…

Although PHP is likely one of the worst languages you can choose for parsing a file like this, I was determined to keep the application relatively simple. To me, that meant not bringing in other languages, since others would be using this as a reference and that would get needlessly confusing. I figured this was something that had been done 100 times already, but when I looked, all I came up with was the following. First I found a PHP extension that looks awesome but requires the user to compile the source and add the extension in php.ini. Ext-jsonreader is located here for people who want to give it a shot. Next I found a well-written Composer package, but after looking over how it parses files I believe it would take way too long, since it handles a ton of cases I do not care about. It also requires you to create an entire class to handle the parsing of the object. You can check out Salsify's JSON parser on GitHub.

After I posted to reddit, a few people wondered why I did not use Salsify's JSON parser, beyond the reasons given above. First, there was the initial issue that it did not handle my JSON file, which I am working on with them right now. Second, I plan on fine-tuning my parser to speed it up even more, although initial tests have already shown a time of 127s vs 157s. So that is 30 seconds saved right away, with more optimization to come. This also isn't a perfect test, as I had to modify the code in Salsify's JSON parser to completely skip some of its logic so it would pass the test.

Not finding anything suitable for my needs, I started looking over the Hangouts.json file to find the best way to parse it. I quickly found that the JSON is structured around one large array called conversation_state. For me this contains 152 different objects, with the largest being just over 10MB. Every one of them is small enough to be read in individually with json_decode, which makes it very easy to deal with the rest of the object.
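To make the idea concrete, here is a minimal sketch of that approach: scan the file in small chunks, cut out one element of the array at a time, and hand only that piece to json_decode. This is not jmem's actual code. The function name streamArrayObjects and its parameters are made up for illustration, and it assumes that every element of the array is a JSON object and that the quoted key does not show up earlier in the file inside a string value. It relies on a generator (yield), which is available from PHP 5.5.

```php
<?php
// Hypothetical helper for illustration only, not part of jmem.
// Yields each object of the array stored under $key, decoding one
// element at a time so the whole file never has to sit in memory.
function streamArrayObjects($path, $key, $chunkSize = 8192)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $needle = '"' . $key . '"';
    $needleLength = strlen($needle);
    $found = false;    // true once the quoted key has been seen
    $window = '';      // last few bytes read, used to match the key
    $depth = 0;        // brace depth inside the element being collected
    $inString = false; // true while inside a JSON string
    $escaped = false;  // true right after a backslash inside a string
    $buffer = '';      // raw text of the element being collected

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $length = strlen($chunk);
        for ($i = 0; $i < $length; $i++) {
            $char = $chunk[$i];

            if (!$found) {
                // Slide a window over the input until it equals the key.
                $window = substr($window . $char, -$needleLength);
                $found = ($window === $needle);
                continue;
            }

            if ($depth === 0) {
                if ($char === '{') {
                    // Start of the next element.
                    $depth = 1;
                    $buffer = '{';
                } elseif ($char === ']') {
                    // End of the array: stop reading.
                    break 2;
                }
                continue;
            }

            $buffer .= $char;
            if ($inString) {
                // Braces inside strings must not change the depth.
                if ($escaped) {
                    $escaped = false;
                } elseif ($char === '\\') {
                    $escaped = true;
                } elseif ($char === '"') {
                    $inString = false;
                }
            } elseif ($char === '"') {
                $inString = true;
            } elseif ($char === '{') {
                $depth++;
            } elseif ($char === '}' && --$depth === 0) {
                // One complete element collected: decode just this piece.
                yield json_decode($buffer);
                $buffer = '';
            }
        }
    }

    fclose($handle);
}
```

Looping over streamArrayObjects("Hangouts.json", "conversation_state") with foreach would then hand you one decoded object per iteration. Going byte by byte in PHP is slow, but memory use stays close to the size of the largest single element rather than the size of the whole file.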

Knowing this, I set out to create an extremely easy-to-use package called jmem. Jmem assumes a few things before parsing your file in order to be able to parse it in any reasonable amount of time. First, that the file you are passing in is indeed valid JSON. Second, that you are using PHP 5.5, because you really should be at this point… right?

After creating a messy prototype, which you can see in my initial commits, I brought it into the new age by following the PSR-0 standard and adding it to the Composer package repository. Ultimately I came up with this…

$gen = new Jmem\JsonLoader("Hangouts.json", "conversation_state");
foreach ($gen->parse()->start() as $obj) {
    // $obj is one object from the conversation_state array
    $obj->stream;
}

Check it out at https://github.com/michaeljs1990/jmem.

P.S. If you are writing a package, please document it in some kind of sane way. The number of packages I came across that lacked even a simple example of how to start using them was astounding, although it's your free time to do with as you please ☺