Creating a Purchase Agent Database with MySQL, Selenium, PyInstaller and 17track

Lester So
Analytics Vidhya
Published in
7 min readNov 12, 2020

用MySQL, Selenium, PyInstaller和17track建立一個代購資料庫

1. Display of result

1.1 Youtube video

*****************************NOTE ***************************

If you do not feel comfortable with coding, actually none of it is required. In the scenarios, both coding method and graphical interface method are included. Coding can enhance the experience of playing with the data but I promise you that you can just manipulate the database with the software as if it is an advanced Excel (the reasons for this software being superior to Office Excel are listed in Part 2).

*****************************NOTE ***************************

Please download MySQL Workbench, which is the user interface of this tool: https://dev.mysql.com/downloads/installer/

The Github link of the project content: https://github.com/lester1027/order_form_system

You may download the tools and sample data in this project from the link.

The code used in the video is shown as follows.

Scenario 1: Insert a company named Armani into the ‘companies’ table.

INSERT INTO companies
(company_name)
VALUES (‘Armani’)

Scenario 2: Insert a customer named Ann Tsukamoto in the ‘customer’ table.

INSERT INTO customers
(first_name,
last_name)
VALUES (‘Ann’,
‘Tsukamoto’)

Scenario 3: Insert an item named ‘Small, palmellato leather La Prima bag’ priced at CNY 10600 produced by Armani to the ‘items’ table.

INSERT INTO items
(item_name,
price_cny,
company_id)
VALUES (‘Small, palmellato leather La Prima bag’,
10600,
8)

Scenario 4: Insert a sale record in which Ann Tsukamoto bought 2 ‘Small, palmellato leather La Prima bag’ on 9-Nov-2020, the parcel ID is EK465326285HK.

INSERT INTO sales
(purchase_date,
customer_id,
item_id,
quantity,
parcel_id)
VALUES (‘2020–11–09’,
13,
22,
2,
‘EK465326285HK’)

For other common manipulation statements like UPDATE and DELETE, please refer to the links https://www.w3schools.com/sql/sql_update.asp and https://www.w3schools.com/sql/sql_delete.asp. This website is a good source when you want to check the syntax of a statement quickly.

1.2 Tableau

A Tableau dashboard based on the sample data of this project

Tableau link: https://public.tableau.com/views/order_form_system_dashboard/Dashboard1?:language=zh-Hant&:display_count=y&publish=yes&:origin=viz_share_link

The query for sheet 1’s data:

SELECT

DATE_FORMAT(s.purchase_date, ‘%M-%Y’) as purchase_month,

c.company_name,

i.company_id,

COUNT(s.quantity) AS quantity

FROM

sales s

INNER JOIN

items i ON s.item_id = i.item_id

INNER JOIN

companies c ON i.company_id = c.company_id

GROUP BY s.purchase_date , i.company_id , c.company_name;

The query for sheet 2’s data:

SELECT

CONCAT(c.first_name, ‘ ‘, c.last_name) AS full_name,

SUM((v.quantity * v.price_record_cny)) AS revenue

FROM

v_track_prices v

INNER JOIN

customers c ON v.customer_id = c.customer_id

GROUP BY c.first_name , c.last_name

ORDER BY revenue DESC

LIMIT 5;

2. Motivation

Purchase agents are a group of people who make a living by helping others to buy products across different regions in the world. When the amount of orders increases, there is a need for a systematic approach to handle the trades. The relational database is a good choice. You may ask why they cannot just use Office Excel, which is indeed the prevalent tool used by most of the purchase agents currently. This question can be answered by 3 aspects, data size, data integrity and data analysis.

i) Data size

Theoretically, Office Excel can have a maximum of 1,048,576 rows of record stored but the software can be extremely laggy far before this number is reached. Moreover, the record limit can be troublesome or even dangerous sometimes. Covid: how Excel may have caused loss of 16,000 test results in England is a vivid example. On the other hand, relational database is explicitly for storing data. It can store a huge number of records with ease. Even with the large amount of data stored, it can still maintain high efficiency when manipulating data because of the underlying optimizer determining how to handle tasks properly.

ii) Data integrity

A database pre-sets the type of data contained in a certain field when it is built. When the wrong data type is input (e.g. ‘apple’ is input into a field with type INT), an error will be returned. This property minimizes the risk of entering wrong data. Office Excel is good for fast calculation but when it comes to inputting a large amount of data, the flexibility turns out to be a downside. It does not require the users to pre-define the data types of a spreadsheet. The user may not be able to notice errors. In some more serious situations, Excel changes the data itself, like the case in Scientists rename human genes to stop Microsoft Excel from misreading them as dates.

Spreadsheets of a purchase agent. The structure is too flexible for data integrity.

Setting key constraints is a mature property in relational databases. These constraints also prevent users from inputting improper data. For example, in this project, if the item_id has not been added to the ‘items’ table first, it cannot be added to the ‘sales’ table.

iii) Data analysis

When users of relational databases want to acquire insight from the data, they can easily write SQL queries to get the output in the structure desired. SQL is a domain-specific language for relational database. This means it is specially designed for relational databases. Instead of a series of filtering, sorting, copying and pasting in Excel, relational databases allow users to get data under complex criteria in a short time.

3. Goal

The goal of this project is to develop an integrated database system for convenient data recording, manipulation and analysis. It should fulfill the needs for

i) Data tables with relational connections

Relational constraints make a database easy to update and analyze. The data integrity is more robust with constraints.

ii) Automatic price history update

After inserting a new item or updating the price of a current item, a change should be induced in the price tracking table automatically. It allows the users to check the price change of items easily.

iii) Automatic shipping information acquisition

If the database can integrate with the information of parcel shipping information, the users can track the items easily without using a separate tool.

iv) Easy data analysis

The data in the database should be readily for further analysis. Data mining is important for business decisions like adjusting advertising strategies and planning discount events.

4. Methods

i) Relational schema: data tables with relational connections

Relational Schema of the Database

The graph above is the relational schema of the database in this project. A relational schema is a figure illustrating the structure of a database. ‘PK’ stands for ‘primary key’. The primary keys in a table is unique for every record. ‘FK’ stands for ‘foreign key’. Foreign keys lay out the relational connections of the database. For example, in the above figure, the customer_id in ‘sales’ is a FK pointing toward the PK item_id in ‘items’. This means the customer_id in ‘sales’ is referencing to the PK item_id in ‘items’. If an item_id does not exist in ‘items’, it cannot exist in ‘sales’.

The code creating the database is create_database.sql.

ii) MySQL triggers: automatic price history update

MySQL trigger is a type of stored program, associated with a table. It will be automatically activated once a specific event related to the table of association occurs.

Trigger ‘price_update_insert’ is used to update the ‘price_history’ table when there is a new record of an item in the ‘items’ table.

Code:

Trigger ‘price_update_update’ is used to update the ‘price_history’ table when there is a change in price in the existing records of the ‘items’ table.

Code:

iii) Selenium, PyMySQL and PyInstaller: automatic shipping info acquisition

When I wanted to acquire shipping information automatically, the first idea that came across my mind was using API, naturally. However, most of the tracking APIs are not free while the free ones have few request quota. I tried to communicate with the team of 17track, which is one of the most popular tracking systems currently. However, it seems that their API is not for public use. They required me to give them detailed information about my company, which I do not have. Even if I successfully get an API access, I cannot develop a project for general use with it because other people do not have the access.

The email reply from 17track
API Application Form

That’s why I thought of a method to bypass this limitation. I found that 17track has a page which is originally used for letting people test how they can embed the tracking console into their own website.

17track’s Website

I used Chrome to inspect the elements in this page and used Selenium, a web driver to enter parcel ID, press the ‘TRACK’ button and acquire the return result.

Selenium’s Actions

Then I used PyMySQL library for Python to connect to the database developed previously and update the shipping_info column in the ‘sales’ table.

PyMySQL’s Actions

Lastly, I used PyInstaller to convert the whole script (acquire_shipping_info.py) into an executable program (acquire_shipping_info.exe) such that others can use the updating function easily.

iv) Tableau: easy data analysis

The data can be easily manipulated and exported as csv files. The csv files are good for data mining. The above interactive Tableau dashboard shown in Part 1 demonstrated how to combine SQL queries and Tableau sheets to gain insight from the data.

--

--