Encrypted pandas DataFrames for secure storage and sharing in Python
Introducing cryptpandas — a lightweight tool for dataframe encryption
Modern data science often involves working with confidential information. Data encryption in Python can, however, be cumbersome and involve several convoluted steps. Most data scientists simply need a fast, efficient, and reliable way to store data and share it with collaborators, whether over non-secure channels or within a team where not all members have clearance to access sensitive data.
Introducing cryptpandas
cryptpandas is a lightweight Python tool for encryption (and decryption) of pandas dataframes. We can simply install it with pip install cryptpandas, and we are set and ready to encrypt and decrypt our pandas dataframes.
Encrypting DataFrames
First, let’s start by defining a simple pandas dataframe:
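A minimal sketch of such a dataframe; the column names and values are made up purely for illustration.

```python
import pandas as pd

# Illustrative data only: two columns, three rows
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["one", "one", "four"],
})

print(df)
```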
Then, we can simply import cryptpandas and easily encrypt our dataframe with a familiar syntax:
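A minimal sketch of the encryption step, assuming cryptpandas exposes a to_encrypted function that takes the dataframe plus password and path keyword arguments (worth checking against the documentation of the installed version). The helper name encrypt_dataframe, the sample data, and the password are placeholders, not part of the library.

```python
import pandas as pd


def encrypt_dataframe(df, path="file.crypt", password="mypassword123"):
    """Hypothetical helper: write `df` to `path`, encrypted with `password`."""
    # Imported inside the function so the sketch still loads where
    # cryptpandas is not installed (pip install cryptpandas).
    import cryptpandas as crp

    # Assumed call: to_encrypted(df, password=..., path=...)
    crp.to_encrypted(df, password=password, path=path)


# Illustrative data only
sample_df = pd.DataFrame({"A": [1, 2, 3], "B": ["one", "one", "four"]})
```

Calling encrypt_dataframe(sample_df) would then write file.crypt to the working directory.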
The file file.crypt is now encrypted and cannot be read without the set password.
Decrypting DataFrames
Decrypting files is now very easy:
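A minimal round-trip sketch, assuming cryptpandas provides a read_encrypted function with path and password keyword arguments, alongside a to_encrypted function for writing. The helper name roundtrip and the default values are hypothetical.

```python
def roundtrip(df, path="file.crypt", password="mypassword123"):
    """Hypothetical helper: encrypt `df` to `path`, then read it back."""
    # Lazy import so the sketch loads without cryptpandas installed.
    import cryptpandas as crp

    # Assumed call: to_encrypted(df, password=..., path=...)
    crp.to_encrypted(df, password=password, path=path)

    # Assumed call: read_encrypted(path=..., password=...)
    return crp.read_encrypted(path=path, password=password)
```

For a simple dataframe df, the result of roundtrip(df) should compare equal to df, while reading the file with a wrong password would be expected to fail rather than return data.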
Salt, for an additional layer of security
What we have described so far allows any user in possession of the password to read the encrypted file. However, for an additional layer of safety, users can also generate their own salt, a random piece of data that is combined with the password, so that the password alone is no longer sufficient to decrypt the file. By default, a salt common to all users is stored in cryptpandas.SALT, but a new one can be generated via the built-in function cryptpandas.make_salt. Here is an example:
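A minimal sketch of using a custom salt. It assumes that make_salt accepts the salt size in bytes and returns random bytes, and that to_encrypted and read_encrypted both accept a salt keyword argument; these signatures are assumptions to verify against the installed version. The helper name and default values are hypothetical.

```python
def roundtrip_with_salt(df, path="file.crypt", password="mypassword123"):
    """Hypothetical helper: encrypt and decrypt `df` with a custom salt."""
    # Lazy import so the sketch loads without cryptpandas installed.
    import cryptpandas as crp

    # Assumed call: make_salt(size_in_bytes) returns a fresh random salt
    my_salt = crp.make_salt(32)

    # Assumed: both functions accept a `salt` keyword argument
    crp.to_encrypted(df, password=password, path=path, salt=my_salt)
    decrypted = crp.read_encrypted(path=path, password=password, salt=my_salt)
    return my_salt, decrypted
```

The salt returned here has to be kept and shared with the intended readers, since the file cannot be decrypted later without it.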
Now in order to read the encrypted file, both password and salt need to be provided.
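To see why the salt matters, here is a standard-library illustration of password-based key derivation. It uses hashlib.pbkdf2_hmac as a stand-in and is not a description of cryptpandas internals; the function name derive_key, the iteration count, and the salt size are arbitrary choices for the example.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key from a password and a salt (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


password = "mypassword123"
salt_a = os.urandom(16)  # one user's salt
salt_b = os.urandom(16)  # a different, independently generated salt

key_a = derive_key(password, salt_a)
key_b = derive_key(password, salt_b)

# The same password and salt reproduce the key; a different salt does not.
print(key_a == derive_key(password, salt_a))
print(key_a == key_b)
```

A reader who knows the password but not the salt therefore ends up with a different key, which is why both must be supplied.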
Conclusions
Python users in need of a simple and reliable tool for encrypting their pandas dataframes can now add cryptpandas to their toolkit. In this brief tutorial we have shown how to easily encrypt and decrypt dataframes using cryptpandas.
Feel free to leave comments, suggestions for edits, or ask questions in the dedicated section below!