How to insert data from CSV file into a SQLite Database using Python

Using SQLAlchemy and Pandas to load CSV file data into database

Saurabh Kulkarni
The Startup
5 min read · Aug 4, 2019



Why write this post?

I am working on a business problem where I have to build a self-sustaining dashboard from scratch that will display visualizations of key performance indicators for different products. By self-sustaining I mean that it should be sharable with team members, who can view and interact with the visualizations, and that it should update itself on a schedule.

Currently, my team uses Excel files that contain data for various products. These files are stored on network drives. However, since we build a lot of products, it is cumbersome to process such a large number of files for data analysis and visualization. I am planning to use either Tableau (paid visualization software) or Flask (free) to build the dashboard. Either way, I want the data stored in a SQLite database rather than in a large number of CSV files on a network drive.

After spending hours on Stack Overflow and going through tutorials and multiple solutions, I finally got a working solution for inserting data from CSV files into a SQLite database. It took me some time because I had not worked with SQLAlchemy before. There are a lot of different solutions on the web, but I found them described in bits and pieces. So I thought I would write a short tutorial that describes the process from start to end.

What did I want to achieve?

I have 200+ CSV files. Each CSV file contains a table with 50–200 rows. The column headers and the number of columns are the same in every file. The data in each CSV file is different and describes a particular product. The objective was to create a database containing tables derived from each CSV file (one table per product).

Code

I use Jupyter Notebook since it is easy to prototype, especially when you are experimenting like me.

Here is the Github link to the repo. It contains:

  1. code_for_blog.ipynb: Jupyter Notebook file containing code
  2. product_data.csv: Sample CSV file
  3. product_dbase.sqlite: Sample output of the program, showing what the resulting database looks like

I have added lots of comments in the code to make it easier to understand what each chunk is doing.

Code Explanation

The first few lines of code import the required modules: Pandas, SQLAlchemy, glob and os.

Importing Required Modules
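The notebook itself lives in the repo, but the imports described above would look something like this (a sketch; the module list comes from the post, the aliasing is my own convention):

```python
import glob  # to collect all CSV file paths in a folder
import os    # for filesystem path handling

import pandas as pd
from sqlalchemy import Column, MetaData, String, Table, create_engine
from sqlalchemy.orm import sessionmaker
```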

The next part deals with the SQLAlchemy setup. If you are not familiar with SQLAlchemy, I would recommend reading about it; I came across a great tutorial, which you can see here.

SQL Alchemy Setup
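As a rough sketch of that setup (the database filename here matches the sample file in the repo; whether the notebook names its variables this way is my assumption):

```python
from sqlalchemy import MetaData, create_engine
from sqlalchemy.orm import sessionmaker

# Engine pointing at a SQLite file; SQLite creates the file on first connect.
engine = create_engine("sqlite:///product_dbase.sqlite")

# MetaData collects the Table objects we define later.
metadata = MetaData()

# Session factory bound to the engine (used when inserting via the ORM).
Session = sessionmaker(bind=engine)
```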

We then define the structure of our table using a class. The class “product_table” contains attributes that correspond to the columns of our table; we create one attribute for each column.

Class to define structure of table
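With SQLAlchemy's classical (non-declarative) mapping, the class is a plain Python container. A sketch with hypothetical column names (the real ones depend on your CSV headers):

```python
class product_table:
    """Plain class to be mapped to a product table via SQLAlchemy's
    classical mapper(); one attribute per CSV column.
    The column names here are hypothetical stand-ins."""

    def __init__(self, product_id, product_name, price):
        self.product_id = product_id
        self.product_name = product_name
        self.price = price
```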

Now, we define a function that contains our table definition. We specify the type of each column; in our example, all columns are “String”. The function also creates and maps our table.

Function to define table
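A sketch of the table-building part of such a function, using SQLAlchemy Core (column names are hypothetical; the post's version additionally maps the class to the table with the classical mapper, discussed below):

```python
from sqlalchemy import Column, MetaData, String, Table, create_engine

metadata = MetaData()

def table_definition(table_name, metadata):
    """Define a table whose columns are all String and return it."""
    return Table(
        table_name,
        metadata,
        Column("product_id", String, primary_key=True),
        Column("product_name", String),
        Column("price", String),
    )

# Example: define one table and create it in an in-memory database.
engine = create_engine("sqlite:///:memory:")
product_a = table_definition("product_a", metadata)
metadata.create_all(engine)
```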

My function loops through the CSV files and creates multiple tables (one for each CSV file). However, when it mapped the second table, my program hit the error “Class already has a primary mapper defined”.

I was not able to wrap my head around this error, but I did come up with a workaround. I am sure there are other ways of side-stepping this problem, but this one worked for me.

Creating a dummy table to avoid “Class already has a primary mapper defined” error

I created a dummy table that has the same structure as my product_table class. I mapped that table, and it became the “primary mapper”. In the function table_definition, you will notice that the mapper method has non_primary set to True.
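My reading of that workaround, sketched below: the first mapper() call on a class becomes its primary mapper, so a throwaway table absorbs that role, and every per-product table is then mapped with non_primary=True. Note that classical mapper() and its non_primary flag are SQLAlchemy 1.x APIs (removed in 2.0), so that step is version-guarded here; the table and column names are hypothetical.

```python
import sqlalchemy
from sqlalchemy import Column, MetaData, String, Table

class product_table:
    pass

metadata = MetaData()

# Dummy table whose only job is to hold the class's primary mapper.
dummy = Table("dummy_table", metadata,
              Column("product_id", String, primary_key=True))

# A real per-product table with the same structure.
product_a = Table("product_a", metadata,
                  Column("product_id", String, primary_key=True))

if sqlalchemy.__version__.startswith("1."):
    from sqlalchemy.orm import mapper
    mapper(product_table, dummy)  # first call: becomes the primary mapper
    # Every later table must be mapped as non-primary; mapping it as
    # primary raises "Class ... already has a primary mapper defined".
    mapper(product_table, product_a, non_primary=True)
```

In SQLAlchemy 2.0 this pattern is gone entirely; there, it is simpler to skip the ORM mapping and insert through Core Table objects instead.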

Lastly, the function create_table loops through all the CSV files in the folder. It uses Pandas to read each CSV file and converts it into a list.

csv_data is a list of lists: each inner list is one row from the CSV file.

We are using dynamic table names here. It means that each table that we create in this database (from each CSV file) has a different name.

By calling table_definition function, we define our table.

Finally, we loop through each row (each list in our lists of lists, csv_data) and insert data into our newly created table.
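Pulling the pieces together, here is a self-contained sketch of such a loop. It uses Core inserts rather than the repo's ORM mapping (which sidesteps the primary-mapper issue), and the demo CSV at the end is hypothetical:

```python
import glob
import os
import tempfile

import pandas as pd
from sqlalchemy import Column, MetaData, String, Table, create_engine

def create_tables(csv_folder, engine):
    """For each CSV file in csv_folder, create a table named after the
    file and insert every row of the file into it."""
    metadata = MetaData()
    for file_name in glob.glob(os.path.join(csv_folder, "*.csv")):
        # Dynamic table name: the CSV filename without its extension.
        table_name = os.path.splitext(os.path.basename(file_name))[0]

        # Read the CSV and convert it into a list of lists (one per row).
        df = pd.read_csv(file_name, dtype=str)
        csv_data = df.values.tolist()

        # Define the table: one String column per CSV header.
        table = Table(table_name, metadata,
                      *(Column(col, String) for col in df.columns))
        metadata.create_all(engine)

        # Insert each row into the newly created table.
        with engine.begin() as conn:
            conn.execute(table.insert(),
                         [dict(zip(df.columns, row)) for row in csv_data])

# Tiny demo with one hypothetical CSV file.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "product_a.csv"), "w") as f:
    f.write("product_id,product_name\n1,widget\n2,gadget\n")

engine = create_engine("sqlite:///:memory:")
create_tables(folder, engine)
```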

Conclusion and final comments

I hope this serves as one of many ways of inserting data from CSV files into a SQLite3 database. I am sure there are other ways of doing this, and I would love to hear how this solution can be improved, both in terms of compactness and efficiency.

Special note

We extract this name from the CSV filename using “split”. Please note that in the line “table_name_from_file=file_name.split(‘/’)[8][:-4]”, the part in bold will be different for you.

file_name is a string that contains the path of the CSV file currently being read. Depending on the operating system you are using, the path separator will be either ‘/’ or ‘\\’.

For example, I am using Ubuntu. So my file path looks like: “/home/saurabh/Documents/filename.csv”

On my Windows 10 workstation, the separator is “\\”, so you need to change the file_name.split line accordingly.

Additionally, the split() method gives us a list of the path components. Depending on your path and file name, you will also need to modify the index in the bold part “file_name.split(‘/’)[8][:-4]”: the position of the file name within your path (here [8]) may change.
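If you would rather not hard-code the separator and the index at all, the standard library's os.path functions extract the name portably (a small alternative to the split-and-slice approach above):

```python
import os

file_name = "/home/saurabh/Documents/product_data.csv"  # example path

# basename() strips the directory part; splitext() drops the ".csv".
table_name_from_file = os.path.splitext(os.path.basename(file_name))[0]
print(table_name_from_file)  # product_data
```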
