Web Scraping, JSON, Dictionary and Pandas: Part 1

Mukesh Chaudhary
5 min read · Sep 11, 2020


Let’s take an in-depth look at web scraping, JSON, dictionaries and pandas, with examples and in a simple way.

In this blog, we will look at how to understand web scraping, JSON, dictionaries and pandas, both as a student and as a developer, because I think some confusion arises between studying these topics and implementing them in real life. Please don’t treat web scraping and JSON as the same concept, even though we use the same Python library, such as requests, for both. They are completely different from each other. Put simply, JSON is a format for storing and exchanging data, while web scraping is a method for capturing the contents of a web page as data.

While learning, JSON and pandas look very simple, but we face many challenges in real projects. Many problems come up when trying to get data from the internet, such as security restrictions, permissions, and missing library support. I think this is a very important topic because we work in a world of data, and as students working on real projects we can’t easily get data from the internet. In this internet era we can find almost everything online, so we have to learn these tools well before moving on to real-life projects.

I have divided this blog into two parts. This first part shows how to use JSON and pandas in a simple way, just to understand the core concept of JSON. The second part, which will be my next blog, covers how to handle JSON data from the internet and the difference between web scraping and JSON. Please check out part 2 too. Let’s start simple.

1. Dictionary:

A dictionary is a very important core concept in Python, used to store a collection of key-value pairs (unordered in older versions; since Python 3.7 a dictionary keeps insertion order). If somebody asked me what Python is in one word, I would say Python is the dictionary. Similarly, for Java I would say a group of classes. I think every programming language shares concepts such as variables and loops, but each has something unique that sets it apart from the others. In Python, that something is the dictionary. If you know Python, you probably know dictionaries, and I assume here that you do. Here is an example of a dictionary:

{ "product":
{ "0":"Desktop Computer",
"1":"Laptop",
"2":"iphone",
"3":"Tablet"},
"price":
{ "0":"500",
"1":"650",
"2":"800",
"3":"400"}
}
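As a quick refresher on how such a nested dictionary is used, here is a minimal sketch that looks values up by key. The variable names (laptop_name, pairs and so on) are my own choices for illustration, not part of any library.

```python
# The same nested dictionary as above: outer keys name a field,
# inner keys ("0".."3") identify a row.
product = {
    "product": {"0": "Desktop Computer", "1": "Laptop", "2": "iphone", "3": "Tablet"},
    "price": {"0": "500", "1": "650", "2": "800", "3": "400"},
}

# Look up a value by chaining keys: outer key first, then inner key.
laptop_name = product["product"]["1"]
laptop_price = product["price"]["1"]
print(laptop_name, laptop_price)  # Laptop 650

# Pair each product name with its price by walking the shared inner keys.
pairs = {product["product"][k]: product["price"][k] for k in product["product"]}
print(pairs)
```

Note that the prices are stored as strings here, so they would need int() or float() before any arithmetic.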

2. JSON:

JSON is a very popular standardized data format that’s commonly used to transmit data. JSON stands for “JavaScript Object Notation”, a lightweight data-interchange format that consists of key-value pairs.

Given JSON’s remarkable readability and its object-like structure, it has been widely used in web development and other software development settings. Let’s look at a first example of the JSON format:

'{ "product":
{ "0":"Desktop Computer",
"1":"Laptop",
"2":"iphone",
"3":"Tablet"},
"price":
{ "0":"500",
"1":"650",
"2":"800",
"3":"400"}
}'

Oh! It looks like a dictionary. Indeed, it has the same structure, but it is in text format: the surrounding single quotes make it a string. Now we can imagine how easy it is to manage this data in Python. That is the beauty of using JSON with Python. Let’s dive a little deeper into JSON and look at some basic functionality for reading and writing JSON data in Python.
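The difference between JSON text and a dictionary can be checked directly. The sketch below uses a shortened version of the example (two items instead of four, my own simplification) and the standard json.loads and json.dumps functions.

```python
import json

# JSON text: to Python this is just a string.
json_text = '{"product": {"0": "Desktop Computer", "1": "Laptop"}, "price": {"0": "500", "1": "650"}}'
print(type(json_text))  # <class 'str'>

# json.loads parses the text into a real dictionary.
data = json.loads(json_text)
print(type(data))  # <class 'dict'>
print(data["product"]["1"])  # Laptop

# json.dumps goes the other way: dictionary back to JSON text.
text_again = json.dumps(data)
print(type(text_again))  # <class 'str'>
```

Until the text is parsed, none of the dictionary operations (key lookup, iteration over items) are available on it.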

Data Type conversion:

When JSON data is transmitted, it is in the form of text, that is, a string. But when we prepare a JSON object, there are five valid data types: String, Number, Boolean, Array, and Object. In addition, there is a special type called Null, which denotes an empty value.

We can handle JSON data in Python using its native data types, namely dict, list, tuple, str, int, float, bool, and NoneType. How do we convert JSON data to and from Python data? The json module uses the following conversion table:

Python          JSON
dict            object
list, tuple     array
str             string
int, float      number
True            true
False           false
None            null
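A quick way to see these conversions is to encode a Python dictionary and decode it again. In this sketch the record and its field names (rating, in_stock, tags, discount) are made up for illustration; only json.dumps and json.loads come from the standard library.

```python
import json

# A hypothetical record that uses one Python type per JSON type.
record = {
    "name": "Laptop",           # str   -> JSON string
    "price": 650,               # int   -> JSON number
    "rating": 4.5,              # float -> JSON number
    "in_stock": True,           # bool  -> JSON true
    "tags": ("work", "home"),   # tuple -> JSON array
    "discount": None,           # None  -> JSON null
}

encoded = json.dumps(record)
print(encoded)

# Decoding maps JSON types back to Python types.
decoded = json.loads(encoded)
print(type(decoded["tags"]))  # <class 'list'>
print(decoded["discount"])    # None
```

One detail worth noticing: the tuple does not survive the round trip. JSON has only one array type, so it comes back as a list.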

3. Pandas & JSON:

To work with pandas and JSON, we need to import two libraries, json and pandas.

import json
import pandas as pd

The json module is a built-in Python module dedicated to handling JSON data; it provides functions to read and write JSON data.

The Pandas module is a very popular Python library that provides all important functionalities needed for data processing and analysis, especially for structured data, mainly tabular data.

3.1 Reading JSON data:

The process of reading and decoding JSON data is called deserialization. Assume we have a product file with the .json extension, product.json. We can read it with the json module, or load it into a pandas DataFrame. Let’s see how this looks in code.

import json
import pandas as pd

# use the built-in json module
with open('product.json') as json_file:
    product = json.load(json_file)

print("Use the built-in json module:\n")
print(product)

# use the pandas module
# orient='index' treats the top-level keys ("product", "price") as row labels
with open('product.json') as json_file:
    product_pd = pd.read_json(json_file, orient='index')

print("\nUse the pandas module:\n")
print(product_pd)
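To make the effect of the orient argument visible without needing a file on disk, here is a minimal sketch that reads the same kind of JSON text from an in-memory buffer. The shortened three-item data set and the variable names are assumptions for illustration; the point is only how pandas interprets the top-level keys.

```python
import io

import pandas as pd

json_text = (
    '{"product": {"0": "Desktop Computer", "1": "Laptop", "2": "Tablet"},'
    ' "price": {"0": "500", "1": "650", "2": "400"}}'
)

# Default orient ("columns"): top-level keys become column names.
by_columns = pd.read_json(io.StringIO(json_text))
print(by_columns.shape)          # (3, 2)
print(list(by_columns.columns))  # ['product', 'price']

# orient="index": top-level keys become row labels instead.
by_index = pd.read_json(io.StringIO(json_text), orient="index")
print(by_index.shape)            # (2, 3)
print(list(by_index.index))      # ['product', 'price']
```

So for a file shaped like product.json, the default orient gives one row per item, while orient='index' gives the transposed table. Which one is wanted depends on how the file was written.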


3.2 Writing JSON data:

Converting Python data into JSON is called serialization. Let’s see how.

product = { "product":
{ "0":"Desktop Computer",
"1":"Laptop",
"2":"iphone",
"3":"Tablet"},
"price":
{ "0":"500",
"1":"650",
"2":"800",
"3":"400"}
}
# use the built-in json module
with open("product.json", 'w+') as file:
    json.dump(product, file)

# use the pandas module
df = pd.DataFrame(product)
print(df)
# orient='index' writes one JSON object per row, keyed by the row label
with open('product_pd.json', 'w+') as file:
    df.to_json(file, orient='index')
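It can help to look at what the two orientations actually produce. The sketch below relies on the fact that to_json returns the JSON text as a string when no file is given, and parses that text back with json.loads for inspection. The variable names by_columns and by_index are my own.

```python
import json

import pandas as pd

product = {
    "product": {"0": "Desktop Computer", "1": "Laptop", "2": "iphone", "3": "Tablet"},
    "price": {"0": "500", "1": "650", "2": "800", "3": "400"},
}
df = pd.DataFrame(product)

# With no path or file argument, to_json returns the JSON text as a string.
by_columns = json.loads(df.to_json(orient="columns"))
by_index = json.loads(df.to_json(orient="index"))

# orient="columns" reproduces the original layout: field first, then row label.
print(by_columns["price"]["1"])  # 650

# orient="index" flips it: row label first, then field.
print(by_index["1"])  # {'product': 'Laptop', 'price': '650'}
```

This means product.json and product_pd.json from the code above hold the same data but nest it in opposite order, so the reader of each file needs the matching orient.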

Conclusion:

This blog is a simple introduction to dictionaries, JSON and pandas together. It also shows how to read JSON data and how to dump data to a JSON file using both the built-in json module and the pandas module. Pandas is a very effective library that makes it easy to read and write JSON files, and it is widely used.

