Parsing a Spreadsheet Into a JSON File Using Python

Published in

Coinmonks

6 min readJul 17, 2018

The following article explains how to parse data from a .csv file and a .xls file into .json file using python with multiple levels of dependency.

Pasring a CSV file

We first prepared a CSV spreadsheet with a number of redundant data we will represent in a nested json format.

The way we want to represent this data is by creating multiple nested levels, so the structure we have is:

A frame will contain an Id, name, protocol used and multiple segments.

A segment will contain a segment Id, name, src/dest ports and multiple packets.

A packet will contain a packet id, src/dest MAC, src/dest IP and receivers.

Now that we have our data ready, we will start writing our Python script:

We’ll start by importing the required libraries to interact with the file formats

import csv, json

then we’ll use the built-in function open() to open the csv file

f = open( 'filename.csv', 'rU' )

DictReader() will load the contents of the .csv file. And we’ll be using the fieldnames property to add an appropriate field name to each column of our spreadsheet.

reader = csv.DictReader( f, fieldnames = ("Protocol","Source Port","Destination Port","Source IP", "Destination IP", "Source MAC", "Destination MAC", "segment ID", "Frame name", "Frame ID", "segment name", "packet ID", "Receivers"))

then we’ll make two lists, the list store will collect the objects we’ll be parsing. As for the list framenames, this will help us keep track of what has been stored in our list of objects.

store = []
framenames = []

we’ll start the parsing by adding the first level of indent our (frame). While going throughout the sheet, we’ll create an object in frame level and add the frame contents, but we’ll keep the list of segments empty as it will filled in the next level of indent.

# Store frame names in a listfor row in reader:
   frame = {"FrameName":row["Frame name"],
   "FrameID": row["Frame ID"],
   "protocol": row["Protocol"],
   "segments":[]}

After that, we’ll check if the frame has been added to the list of framenames. If the name of that frame is not in the list, then we’ll add it to the list. Then we’ll append the new frame object to our store list to be added in our json file.

   if row["Frame name"] not in framenames:
      framenames.append(row["Frame name"])
      store.append(frame)

for our second level of indentation, we’ll create a segment object and add the segmentname attribute with “” value

# Create Objects for Frames, segments and packets
segment = {"segmentName": "" }

Now, we’ll loop through the frames we stored in the first indentation, and in each frame we’ll reopen the spreadsheet to go through the rows and find the next segment within the frame.

for frame in store:
   f = open( csvfile, 'rU' )
   reader = csv.DictReader( f, fieldnames = ( "Protocol","Source Port","Destination Port","Source IP", "Destination IP", "Source MAC", "Destination MAC", "segment ID","Frame name", "Frame ID", "segment name", "packet ID", "Receivers"))

so if the new segment name doesn’t match the frame’s segment name, we’ll add a new segment object with all its attributes to that frame

for row in reader:
      if frame["FrameName"] == row["Frame name"]:
         if segment["segmentName"] != row["segment name"]:
            segment = {
            "segmentID":row["segment ID"],
            "segmentName": row["segment name"],
            "srcPort":row["Source Port"],
            "destPort":row["Destination Port"],
            "packets":[{
                "packetID":row["packet ID"],
                "Receivers":row["Receivers"],
                "destIP":row["Destination IP"],
                "destMAC":row["Destination MAC"],
                "srcIP": row["Source IP"],
                "srcMAC":row["Source MAC"]
            }]
         }

And then add the segment object to the frame object and finish our second indent level.

            frame["segments"].append(segment)

Otherwise, we’ll create a packet object and fill its attribute. Afterwards we’ll add the packet object to the segment object and thus our third indent level.

         else:
            packet = {
            "packetID":row["packet ID"],
            "Receivers":row["Receivers"],
            "destIP":row["Destination IP"],
            "destMAC":row["Destination MAC"],
            "srcIP": row["Source IP"],
            "srcMAC":row["Source MAC"]
            }
            segment["packets"].append(packet)

Lastly, to store our data in a JSON format, we’ll create a JSON file

# Save the JSON
f = open( 'data.json', 'w')

and write the contents of list store to it using the function json.dumps

out = json.dumps(store, indent=4)
f.write(out)

Parsing an Excel file

The steps of parsing the excel sheet is almost the same as with the csv file except for a few diffrences we will discuss throughly

our data will be the same as .csv file except for the file format; it will be .xls format

in our script, we’ll import the required library to interact with the new file format

import xlrd

then we’ll use the function xlrd.open_workbook() to open the Excel file

book = xlrd.open_workbook('xlsfile.xls')

after that we’ll choose the sheet containg our data using the function sheet_by_index()

sh1 = book.sheet_by_index(0)

Again, we’ll make two lists, the list store will collect the objects we’ll be parsing. and framenames, this list will help us keep track of what has been stored in our list of objects.

store = []
framenames = []

We’ll start the parsing by adding the first level of indent (frame). While going throughout the sheet, we’ll check if the frame has been added to our list framenames. If the name of that frame is not on the list, then we’ll add it. And afterwards, we’ll create an object in frame level and add its contents, but we’ll keep the list of segments empty as it will be filled in the next level of indent. Lastly, we’ll append the new frame object to our store list which will be added to our json file.

# Store frame names in a list
for rx in range(1, sh1.nrows):
   if sh1.row(rx)[8].value not in framenames:
      framenames.append(sh1.row(rx)[8].value)
      frame = {
        "frameName": sh1.row(rx)[8].value,
        "frameID": sh1.row(rx)[9].value,
        "protocol": sh1.row(rx)[0].value,
        "segments":[]
      }
      store.append(frame)

For the second level of indentation, we’ll create a segment object and add the segment name attribute with “” value.

segment = {"segmentName": "" }

Then we’ll loop through the frames we stored in the first indentation, and in each frame we’ll reread the spreadsheet to go through the rows and find the next segment within the frame.

for frame in store:
   for rx in range(1, sh1.nrows):
      if frame["frameName"] == sh1.row(rx)[8].value:

so if the new segment name does’nt match the object’s segment name, we’ll add a new segment object with all its attributes

         if segment["segmentName"] != sh1.row(rx)[10].value:
            segment = {
            "segmentID":sh1.row(rx)[7].value,
            "segmentName": sh1.row(rx)[10].value,
            "srcPort":int(sh1.row(rx)[1].value),
            "destPort":int(sh1.row(rx)[2].value),
            "packets":[{
               "packetID":sh1.row(rx)[11].value,
               "Receivers":sh1.row(rx)[12].value,
               "srcMAC": sh1.row(rx)[5].value,
               "srcIP": sh1.row(rx)[3].value,
               "destIP":sh1.row(rx)[4].value,
               "destMAC":sh1.row(rx)[6].value
               }]
            }

and then add the segment object to the frame object and finish the second indent level.

            frame["segments"].append(segment)

Otherwise, we’ll create a packet object and fill its attribute. Afterwards we’ll add the packet object to the segment object and thus our third indent level.

        else:
           packet = {
           "packetID":sh1.row(rx)[11].value,
           "Receivers":sh1.row(rx)[12].value,
           "srcMAC": sh1.row(rx)[5].value,
           "srcIP": sh1.row(rx)[3].value,
           "destIP":sh1.row(rx)[4].value,
           "destMAC":sh1.row(rx)[6].value
        }
        segment["packets"].append(packet)

Lastly, to store our data in a JSON format, we’ll create a JSON file

# Save the JSONf = open( 'data.json', 'w')

and write the contents of list store to it using the function json.dumps

#rewrite JSON
out = json.dumps(store, indent=4)
f.write(out)

as we’ve seen, the parsing process is quite straight forward, but when you have a huge number of redundant data you need to be able to understand how they connect and to what level of indentation can your code handle without breaking.

To get access to the full Python script, visit: https://github.com/aiayoub/Spreadsheet-Parser.git

Join Coinmonks Telegram Channel and Youtube Channel get daily Crypto News

Also, Read

Copy Trading | Crypto Tax Software
Grid Trading | Crypto Hardware Wallet
Best Crypto Exchange | Best Crypto Exchange in India
Best Crypto APIs for Developers
Crypto Telegram Signals | Crypto Trading Bot
Best Crypto Lending Platform
An ultimate guide to Leveraged Token
Best VPNs for Crypto Trading
Crypto Trading Signals for Huobi | HitBTC Review
TraderWagon Review | Kraken vs Gemini vs BitYard
How to trade Futures on FTX Exchange
OKEx vs KuCoin | Celsius Alternatives | How to Buy VeChain
3Commas vs. Pionex vs. Cryptohopper
How to use Cornix Trading Bot
Bitget Review | Gemini vs BlockFi cmd| OKEx Futures Trading
10 Best Places to Buy Crypto with Credit Card

Parsing a Spreadsheet Into a JSON File Using Python

Pasring a CSV file

Now that we have our data ready, we will start writing our Python script:

Parsing an Excel file

Also, Read

Written by Aisha Ayoub