Parsing a Spreadsheet Into a JSON File Using Python

Aisha Ayoub
Coinmonks
6 min readJul 17, 2018

--

Written by Aisha Ayoub, Abeer AlShubat

The following article explains how to parse data from a .csv file and a .xls file into .json file using python with multiple levels of dependency.

Pasring a CSV file

We first prepared a CSV spreadsheet with a number of redundant data we will represent in a nested json format.

.csv file with multipe redundant data

The way we want to represent this data is by creating multiple nested levels, so the structure we have is:

A frame will contain an Id, name, protocol used and multiple segments.

A segment will contain a segment Id, name, src/dest ports and multiple packets.

A packet will contain a packet id, src/dest MAC, src/dest IP and receivers.

Now that we have our data ready, we will start writing our Python script:

We’ll start by importing the required libraries to interact with the file formats

then we’ll use the built-in function open() to open the csv file

DictReader() will load the contents of the .csv file. And we’ll be using the fieldnames property to add an appropriate field name to each column of our spreadsheet.

then we’ll make two lists, the list store will collect the objects we’ll be parsing. As for the list framenames, this will help us keep track of what has been stored in our list of objects.

we’ll start the parsing by adding the first level of indent our (frame). While going throughout the sheet, we’ll create an object in frame level and add the frame contents, but we’ll keep the list of segments empty as it will filled in the next level of indent.

After that, we’ll check if the frame has been added to the list of framenames. If the name of that frame is not in the list, then we’ll add it to the list. Then we’ll append the new frame object to our store list to be added in our json file.

for our second level of indentation, we’ll create a segment object and add the segmentname attribute with “” value

Now, we’ll loop through the frames we stored in the first indentation, and in each frame we’ll reopen the spreadsheet to go through the rows and find the next segment within the frame.

so if the new segment name doesn’t match the frame’s segment name, we’ll add a new segment object with all its attributes to that frame

And then add the segment object to the frame object and finish our second indent level.

Otherwise, we’ll create a packet object and fill its attribute. Afterwards we’ll add the packet object to the segment object and thus our third indent level.

Lastly, to store our data in a JSON format, we’ll create a JSON file

and write the contents of list store to it using the function json.dumps

Parsing an Excel file

The steps of parsing the excel sheet is almost the same as with the csv file except for a few diffrences we will discuss throughly

our data will be the same as .csv file except for the file format; it will be .xls format

.xls file with redundant data

in our script, we’ll import the required library to interact with the new file format

then we’ll use the function xlrd.open_workbook() to open the Excel file

after that we’ll choose the sheet containg our data using the function sheet_by_index()

Again, we’ll make two lists, the list store will collect the objects we’ll be parsing. and framenames, this list will help us keep track of what has been stored in our list of objects.

We’ll start the parsing by adding the first level of indent (frame). While going throughout the sheet, we’ll check if the frame has been added to our list framenames. If the name of that frame is not on the list, then we’ll add it. And afterwards, we’ll create an object in frame level and add its contents, but we’ll keep the list of segments empty as it will be filled in the next level of indent. Lastly, we’ll append the new frame object to our store list which will be added to our json file.

For the second level of indentation, we’ll create a segment object and add the segment name attribute with “” value.

Then we’ll loop through the frames we stored in the first indentation, and in each frame we’ll reread the spreadsheet to go through the rows and find the next segment within the frame.

so if the new segment name does’nt match the object’s segment name, we’ll add a new segment object with all its attributes

and then add the segment object to the frame object and finish the second indent level.

Otherwise, we’ll create a packet object and fill its attribute. Afterwards we’ll add the packet object to the segment object and thus our third indent level.

Lastly, to store our data in a JSON format, we’ll create a JSON file

and write the contents of list store to it using the function json.dumps

as we’ve seen, the parsing process is quite straight forward, but when you have a huge number of redundant data you need to be able to understand how they connect and to what level of indentation can your code handle without breaking.

To get access to the full Python script, visit: https://github.com/aiayoub/Spreadsheet-Parser.git

Join Coinmonks Telegram Channel and Youtube Channel get daily Crypto News

Also, Read

--

--