Simple Data Serialization And Extraction From JSON Using Python
I want to be able to extract information provided to me in JSON format. Here is the example:
{
"address": "vince0x@code.com",
"username": "vince0x",
"domain": "code.com",
"suggestion": "",
"validFormat": true,
"deliverable": true,
"fullInbox": false,
"hostExists": true,
"catchAll": false,
"gravatar": false,
"role": false,
"disposable": false,
"free": false
}
This is raw, unprocessed data that contains the information we want to extract. JSON's syntax is derived from JavaScript object literals, but the format is language-independent, and Python's standard library can read it directly. JSON supports primitive values (strings, numbers, booleans, and null) and two container types that can be nested (arrays and objects). To begin extracting the data from JSON, it must first be parsed (deserialized) into Python objects.
Parsing and serializing are both handled by functions in the json module of the standard library. We can list the names the module provides from the interactive interpreter.
>>> import json
>>> dir(json)
['JSONDecodeError', 'JSONDecoder', 'JSONEncoder', '__all__', '__author__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_default_decoder', '_default_encoder', 'codecs', 'decoder', 'detect_encoding', 'dump', 'dumps', 'encoder', 'load', 'loads', 'scanner']
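Of the names listed, four do most of the work: load and loads parse JSON, while dump and dumps produce it. Here is a minimal sketch of the difference, assuming an in-memory buffer (io.StringIO) as a stand-in for a real file and a shortened version of the sample record:

```python
import io
import json

# Shortened sample record (illustrative subset of the fields shown earlier)
text = '{"username": "vince0x", "deliverable": true}'

# loads: JSON string -> Python object
record = json.loads(text)

# dumps: Python object -> JSON string
round_trip = json.dumps(record)

# dump and load do the same through a file-like object
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)
from_file = json.load(buffer)

print(record)
print(round_trip)
print(from_file == record)
```

The trailing "s" in loads and dumps stands for "string"; the versions without it expect an open file or another file-like object.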
Suppose we want to simply get all the unformatted data within the JSON file. First import the json module. Then open the JSON data source as a file. Parse it into a variable ‘data’ using the ‘json.load’ method. This deserializes the JSON text into a Python object, in this case a dictionary. Finally, use the dumps() method to serialize the data back into a string and print it to our output or display.
import json

# Parse the file into a Python dictionary
with open('data.json') as json_file:
    data = json.load(json_file)

# Serialize the dictionary back into a JSON string
b = json.dumps(data)
print(b)
This results in the following:
{"address": "vince0x@code.com", "username": "vince0x", "domain": "code.com", "suggestion": "", "validFormat": true, "deliverable": true, "fullInbox": false, "hostExists": true, "catchAll": false, "gravatar": false, "role": false, "disposable": false, "free": false}
Well that was just the preparation phase. Now that the data has been parsed, it can be processed to extract the information needed.
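It helps to see what the parsed result actually is before pulling fields out of it. Here is a minimal sketch, assuming a few of the sample fields are held in a string (sample_text is an illustrative name) rather than read from data.json:

```python
import json

# A few fields from the sample record, held in a string for the example
sample_text = (
    '{"address": "vince0x@code.com", "suggestion": "", '
    '"validFormat": true, "fullInbox": false}'
)

data = json.loads(sample_text)

# A JSON object becomes a dict; JSON true/false become Python True/False
print(type(data).__name__)
for key, value in data.items():
    print(key, type(value).__name__, repr(value))
```

Because the result is an ordinary dictionary, the usual dictionary operations (indexing by key, iterating, membership tests) all apply.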
In this case, we already know the data contained within the JSON file. Let us say for example that we want to extract only one particular field. In this case I want the e-mail address that is contained in the “address” field. Once load() has parsed the file, I can look the value up by its key in the resulting dictionary.
import json

with open('data.json') as json_file:
    data = json.load(json_file)

# Look up a single field by its key
print('"address":', data['address'])
We can select what we want, and this prints "address": vince0x@code.com, the value of the “address” field. Likewise, we can apply the same code but change the key to a different field.
import json

with open('data.json') as json_file:
    data = json.load(json_file)

# Same lookup, different key
print('"validFormat":', data['validFormat'])
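This prints “validFormat”: True. Note that JSON's true appears as Python's True, because the value was converted to a Python bool when the file was parsed. In more advanced code, the data can be processed further, written to a file, and exported to another application.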
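As a rough illustration of that last step, the sketch below selects a few fields and writes them to a new file. The field list, the output name summary.json, and the use of a temporary directory are all assumptions made for the example. dict.get() is used so that a missing field yields None instead of raising KeyError.

```python
import json
import os
import tempfile

# Subset of the sample record, written as the dict that json.load would return
record = {
    "address": "vince0x@code.com",
    "validFormat": True,
    "deliverable": True,
    "disposable": False,
}

# Hypothetical selection of fields to export; "score" is deliberately absent
wanted = ["address", "deliverable", "score"]
summary = {key: record.get(key) for key in wanted}

# Write the selection to a file (temporary directory used for the example)
out_path = os.path.join(tempfile.mkdtemp(), "summary.json")
with open(out_path, "w") as out_file:
    json.dump(summary, out_file, indent=2)

# Read it back to confirm the round trip
with open(out_path) as in_file:
    print(json.load(in_file))
```

The None stored for the missing field is written out as JSON null, so another application reading summary.json can tell "absent" apart from an empty string.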