Parsing a Spreadsheet Into a JSON File Using Python
Written by Aisha Ayoub, Abeer AlShubat
The following article explains how to parse data from a .csv file and a .xls file into .json file using python with multiple levels of dependency.
Pasring a CSV file
We first prepared a CSV spreadsheet with a number of redundant data we will represent in a nested json format.
The way we want to represent this data is by creating multiple nested levels, so the structure we have is:
A frame will contain an Id, name, protocol used and multiple segments.
A segment will contain a segment Id, name, src/dest ports and multiple packets.
A packet will contain a packet id, src/dest MAC, src/dest IP and receivers.
Now that we have our data ready, we will start writing our Python script:
We’ll start by importing the required libraries to interact with the file formats
import csv, json
then we’ll use the built-in function open() to open the csv file
f = open( 'filename.csv', 'rU' )
DictReader() will load the contents of the .csv file. And we’ll be using the fieldnames property to add an appropriate field name to each column of our spreadsheet.
reader = csv.DictReader( f, fieldnames = ("Protocol","Source Port","Destination Port","Source IP", "Destination IP", "Source MAC", "Destination MAC", "segment ID", "Frame name", "Frame ID", "segment name", "packet ID", "Receivers"))
then we’ll make two lists, the list store will collect the objects we’ll be parsing. As for the list framenames, this will help us keep track of what has been stored in our list of objects.
store = []
framenames = []
we’ll start the parsing by adding the first level of indent our (frame). While going throughout the sheet, we’ll create an object in frame level and add the frame contents, but we’ll keep the list of segments empty as it will filled in the next level of indent.
# Store frame names in a listfor row in reader:
frame = {"FrameName":row["Frame name"],
"FrameID": row["Frame ID"],
"protocol": row["Protocol"],
"segments":[]}
After that, we’ll check if the frame has been added to the list of framenames. If the name of that frame is not in the list, then we’ll add it to the list. Then we’ll append the new frame object to our store list to be added in our json file.
if row["Frame name"] not in framenames:
framenames.append(row["Frame name"])
store.append(frame)
for our second level of indentation, we’ll create a segment object and add the segmentname attribute with “” value
# Create Objects for Frames, segments and packets
segment = {"segmentName": "" }
Now, we’ll loop through the frames we stored in the first indentation, and in each frame we’ll reopen the spreadsheet to go through the rows and find the next segment within the frame.
for frame in store:
f = open( csvfile, 'rU' )
reader = csv.DictReader( f, fieldnames = ( "Protocol","Source Port","Destination Port","Source IP", "Destination IP", "Source MAC", "Destination MAC", "segment ID","Frame name", "Frame ID", "segment name", "packet ID", "Receivers"))
so if the new segment name doesn’t match the frame’s segment name, we’ll add a new segment object with all its attributes to that frame
for row in reader:
if frame["FrameName"] == row["Frame name"]:
if segment["segmentName"] != row["segment name"]:
segment = {
"segmentID":row["segment ID"],
"segmentName": row["segment name"],
"srcPort":row["Source Port"],
"destPort":row["Destination Port"],
"packets":[{
"packetID":row["packet ID"],
"Receivers":row["Receivers"],
"destIP":row["Destination IP"],
"destMAC":row["Destination MAC"],
"srcIP": row["Source IP"],
"srcMAC":row["Source MAC"]
}]
}
And then add the segment object to the frame object and finish our second indent level.
frame["segments"].append(segment)
Otherwise, we’ll create a packet object and fill its attribute. Afterwards we’ll add the packet object to the segment object and thus our third indent level.
else:
packet = {
"packetID":row["packet ID"],
"Receivers":row["Receivers"],
"destIP":row["Destination IP"],
"destMAC":row["Destination MAC"],
"srcIP": row["Source IP"],
"srcMAC":row["Source MAC"]
}
segment["packets"].append(packet)
Lastly, to store our data in a JSON format, we’ll create a JSON file
# Save the JSON
f = open( 'data.json', 'w')
and write the contents of list store to it using the function json.dumps
out = json.dumps(store, indent=4)
f.write(out)
Parsing an Excel file
The steps of parsing the excel sheet is almost the same as with the csv file except for a few diffrences we will discuss throughly
our data will be the same as .csv file except for the file format; it will be .xls format
in our script, we’ll import the required library to interact with the new file format
import xlrd
then we’ll use the function xlrd.open_workbook() to open the Excel file
book = xlrd.open_workbook('xlsfile.xls')
after that we’ll choose the sheet containg our data using the function sheet_by_index()
sh1 = book.sheet_by_index(0)
Again, we’ll make two lists, the list store will collect the objects we’ll be parsing. and framenames, this list will help us keep track of what has been stored in our list of objects.
store = []
framenames = []
We’ll start the parsing by adding the first level of indent (frame). While going throughout the sheet, we’ll check if the frame has been added to our list framenames. If the name of that frame is not on the list, then we’ll add it. And afterwards, we’ll create an object in frame level and add its contents, but we’ll keep the list of segments empty as it will be filled in the next level of indent. Lastly, we’ll append the new frame object to our store list which will be added to our json file.
# Store frame names in a list
for rx in range(1, sh1.nrows):
if sh1.row(rx)[8].value not in framenames:
framenames.append(sh1.row(rx)[8].value)
frame = {
"frameName": sh1.row(rx)[8].value,
"frameID": sh1.row(rx)[9].value,
"protocol": sh1.row(rx)[0].value,
"segments":[]
}
store.append(frame)
For the second level of indentation, we’ll create a segment object and add the segment name attribute with “” value.
segment = {"segmentName": "" }
Then we’ll loop through the frames we stored in the first indentation, and in each frame we’ll reread the spreadsheet to go through the rows and find the next segment within the frame.
for frame in store:
for rx in range(1, sh1.nrows):
if frame["frameName"] == sh1.row(rx)[8].value:
so if the new segment name does’nt match the object’s segment name, we’ll add a new segment object with all its attributes
if segment["segmentName"] != sh1.row(rx)[10].value:
segment = {
"segmentID":sh1.row(rx)[7].value,
"segmentName": sh1.row(rx)[10].value,
"srcPort":int(sh1.row(rx)[1].value),
"destPort":int(sh1.row(rx)[2].value),
"packets":[{
"packetID":sh1.row(rx)[11].value,
"Receivers":sh1.row(rx)[12].value,
"srcMAC": sh1.row(rx)[5].value,
"srcIP": sh1.row(rx)[3].value,
"destIP":sh1.row(rx)[4].value,
"destMAC":sh1.row(rx)[6].value
}]
}
and then add the segment object to the frame object and finish the second indent level.
frame["segments"].append(segment)
Otherwise, we’ll create a packet object and fill its attribute. Afterwards we’ll add the packet object to the segment object and thus our third indent level.
else:
packet = {
"packetID":sh1.row(rx)[11].value,
"Receivers":sh1.row(rx)[12].value,
"srcMAC": sh1.row(rx)[5].value,
"srcIP": sh1.row(rx)[3].value,
"destIP":sh1.row(rx)[4].value,
"destMAC":sh1.row(rx)[6].value
}
segment["packets"].append(packet)
Lastly, to store our data in a JSON format, we’ll create a JSON file
# Save the JSONf = open( 'data.json', 'w')
and write the contents of list store to it using the function json.dumps
#rewrite JSON
out = json.dumps(store, indent=4)
f.write(out)
as we’ve seen, the parsing process is quite straight forward, but when you have a huge number of redundant data you need to be able to understand how they connect and to what level of indentation can your code handle without breaking.
To get access to the full Python script, visit: https://github.com/aiayoub/Spreadsheet-Parser.git
Join Coinmonks Telegram Channel and Youtube Channel get daily Crypto News
Also, Read
- Copy Trading | Crypto Tax Software
- Grid Trading | Crypto Hardware Wallet
- Best Crypto Exchange | Best Crypto Exchange in India
- Best Crypto APIs for Developers
- Crypto Telegram Signals | Crypto Trading Bot
- Best Crypto Lending Platform
- An ultimate guide to Leveraged Token
- Best VPNs for Crypto Trading
- Crypto Trading Signals for Huobi | HitBTC Review
- TraderWagon Review | Kraken vs Gemini vs BitYard
- How to trade Futures on FTX Exchange
- OKEx vs KuCoin | Celsius Alternatives | How to Buy VeChain
- 3Commas vs. Pionex vs. Cryptohopper
- How to use Cornix Trading Bot
- Bitget Review | Gemini vs BlockFi cmd| OKEx Futures Trading
- 10 Best Places to Buy Crypto with Credit Card