Read CSV Files from your Delta.Storage using Pandas & Delta.Storage API

Marvin Jay Reyes
Nov 30, 2023


Image by @frimufilms on Freepik

As a Data Engineer, I’ve always loved the idea of integrating multiple tools into my Data Engineering stack. With new tools popping up here and there, I asked myself:

“Is it possible to actually manipulate and transform data stored on a DApp (Decentralized Application)?”

The common struggle with DApps is that data can sometimes take a while to come through due to the limitations of the blockchain.

But as I played around with Delta.Storage, I noticed that one of its key features is the ability to retrieve the contents of your files via its API. So, being the curious cat that I am, I uploaded a CSV file through the UI, then used the API to retrieve it and display the data in a Pandas DataFrame.

For those of you who haven’t read my introduction to Delta.Storage, please read about it here!

Data from Kaggle

So here’s how I did it.

1. Get your API key from the Delta.Storage UI. Go to the UI, click “API Keys”, and then click “New API Key”. You can grant the key access to only the API endpoints you need; for this article, I simply ticked all of them.
API Key Creation

2. Open your preferred IDE and create a Python notebook. In my case, I’m using Visual Studio Code.

3. Get the File ID of your CSV file in Delta.Storage. To do this, navigate to your file in the Delta.Storage UI, click the settings button, and select “View Details”.

This brings up a pop-up box with all the details of your file. NOTE: You only need the “File ID” for this article.

4. Install the packages (the code below uses both requests and pandas):

python -m pip install requests pandas

5. Build the POST request. This is what lets you communicate with the Delta.Storage API.

The endpoint that retrieves a decrypted file by its file ID:

https://api.delta.storage/files/decrypt/{id}

Build out your POST Request:

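import io

import pandas as pd
import requests

# Replace with the File ID you copied from the "View Details" pop-up
file_id = "YOUR_FILE_ID"

# f-string, so {file_id} is substituted into the endpoint URL
url = f"https://api.delta.storage/files/decrypt/{file_id}"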

Build out the Headers:

headers = {
    "accept": "application/octet-stream",
    "content-type": "application/json",
    "authorization": "YOUR_API_KEY",  # the key you created in step 1
}
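Hard-coding the API key in a notebook makes it easy to leak when you share or commit the file. Here is a minimal sketch that reads the key from an environment variable instead. DELTA_STORAGE_API_KEY is a hypothetical name I picked, not something Delta.Storage requires, and the header values are copied from the dictionary above.

```python
import os

# Hypothetical environment variable name; choose whatever suits your setup.
API_KEY_ENV_VAR = "DELTA_STORAGE_API_KEY"


def build_headers(api_key_env_var: str = API_KEY_ENV_VAR) -> dict:
    """Build the Delta.Storage request headers, reading the key from the environment."""
    api_key = os.environ.get(api_key_env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {api_key_env_var} environment variable to your Delta.Storage API key."
        )
    return {
        "accept": "application/octet-stream",
        "content-type": "application/json",
        # Sent as-is, matching the hard-coded headers in the article.
        "authorization": api_key,
    }
```

To use it, run export DELTA_STORAGE_API_KEY=... in your shell before launching the notebook, then call headers = build_headers().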

Send POST Request to Delta.Storage:

response = requests.post(url, headers=headers)
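The request above works, but without a timeout it can wait indefinitely if the API is slow, and a failed request (for example, a wrong key or file ID) would still give you some response text that pandas would then try to parse as CSV. A minimal, more defensive sketch of the same call follows. The function name fetch_file_text and the 30-second timeout are my own choices, and decoding as UTF-8 assumes your CSV was saved in that encoding.

```python
import requests

# Endpoint from the article; {file_id} is filled in per request.
DECRYPT_URL = "https://api.delta.storage/files/decrypt/{file_id}"


def fetch_file_text(file_id: str, api_key: str, timeout: float = 30.0) -> str:
    """POST to the decrypt endpoint and return the file body as a string."""
    response = requests.post(
        DECRYPT_URL.format(file_id=file_id),
        headers={
            "accept": "application/octet-stream",
            "content-type": "application/json",
            "authorization": api_key,
        },
        timeout=timeout,  # give up instead of hanging forever
    )
    # Raise requests.HTTPError on a 4xx/5xx status rather than
    # passing an error message on to pandas.
    response.raise_for_status()
    # Assumption: the uploaded CSV is UTF-8. Decoding the raw bytes explicitly
    # avoids relying on requests' encoding guess for an octet-stream body.
    return response.content.decode("utf-8")
```

Usage: text = fetch_file_text("YOUR_FILE_ID", "YOUR_API_KEY"), then carry on with the tab-stripping step.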

Once that runs, you can display the content of your file with the following code:

response = response.text
print(response)

Response

As you can see, when the Delta.Storage API retrieves your file, it returns the file’s whole content as text. Since we want to load this into a Pandas DataFrame, we first remove the tab characters (\t) from the response so that it parses cleanly.

response = response.replace('\t','')
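To see what that replace() call does in isolation, here is a small sketch run on a made-up, tab-padded string (the sample is hypothetical, not the actual API output). Keep in mind that it removes every tab, including any that appear inside a field value, so it suits files whose fields never contain tabs.

```python
def strip_tabs(text: str) -> str:
    """Remove every tab character from the response text."""
    return text.replace("\t", "")


# Hypothetical tab-padded response, standing in for the real one.
sample = "\tid;product\n\t1;apple\n\t2;banana\n"
cleaned = strip_tabs(sample)
print(cleaned)
```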

Once that’s done, the response text should look much more like the contents of the CSV file you uploaded.

6. Load the response into a Pandas DataFrame. Now that we’ve retrieved the contents of our file, we want to load them into a Pandas DataFrame.

df = pd.read_csv(io.StringIO(response), sep=";")

We use read_csv() to load the data into the DataFrame since, technically, the data we have is in CSV format.

Next, we wrap the response in io.StringIO, which creates an in-memory file object for read_csv() to read from. Passing the raw string directly would make pandas treat it as a file path.

Finally, we pass sep=";" so that pandas knows which delimiter separates the fields in our response; this file uses semicolons rather than commas.


Once you’ve done this, you should be able to display the contents of your Delta.Storage file as a Pandas DataFrame.
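To check the parsing logic without calling the API, here is a minimal offline sketch of the same steps. The raw_text string is a made-up stand-in for the response, assumed to be semicolon-delimited with tab padding; swap in the text returned by your request to use it for real.

```python
import io

import pandas as pd

# Hypothetical response text standing in for the Delta.Storage output:
# semicolon-delimited and padded with tabs.
raw_text = "\tid;product;price\n\t1;apple;0.5\n\t2;banana;0.25\n"

# Step 1: strip the tabs, as in the article.
cleaned = raw_text.replace("\t", "")

# Step 2: wrap the string in an in-memory file object and parse it,
# telling pandas that fields are separated by semicolons.
df = pd.read_csv(io.StringIO(cleaned), sep=";")

print(df.head())  # first rows of the DataFrame
print(df.dtypes)  # the column types pandas inferred
```

df.head() is a quick way to eyeball the first few rows before doing any transformations.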

As a Data Engineer, I’m always looking for ways to innovate in my field. One of them is getting my hands dirty with Web3 projects.

Delta.Storage’s solution is quite easy to work with, and it has made it easier for us to set foot in Web3!

Since Delta.Storage is still in its early stages, some files may not work properly yet. But this showcases how we, as Data Engineers, can start expanding our stack into the Web3 space.

Important Note: Delta.Storage is still in its early stages. The team has been working extremely hard on new features that will let users interact more with the application.

Thanks for reading! If you liked what you’ve read, please feel free to subscribe to my latest articles about Data Engineering & Web3.
