Working With IBM Cloud Object Storage In Python

IBM Cloud Object Storage

Data Science Experience (DSX) comes with IBM Cloud Object Storage as a flexible storage option. When you create a project in DSX, you get two storage options.

  1. IBM Cloud Object Storage: Data stored using the COS option is encrypted and dispersed across multiple geographic locations. This data can be accessed over HTTP using a REST API.
  2. Object Storage (Swift API): You have to use the Swift API to interact with these storage accounts. You can learn more about how to interact with this storage option here. If you are still using Object Storage (Swift API), we encourage you to start working with IBM Cloud Object Storage.

This blog is focused on how to use IBM Cloud Object Storage in Python, but you can also easily load data in DSX using the UI.

Import Credentials

To access IBM Cloud Object Storage you need credentials. You can get these credentials using the Insert credentials option in a DSX notebook. To insert credentials, you first need to upload some data to DSX using the browse functionality.

Insert credentials in the notebook
# @hidden_cell
# The following code contains the credentials for a file in your IBM Cloud Object Storage.
# You might want to remove those credentials before you share your notebook.
credentials = {
    'IBM_API_KEY_ID': '*******************************',
    'IAM_SERVICE_ID': '*******************************',
    'ENDPOINT': '*******************************',
    'IBM_AUTH_ENDPOINT': '*******************************',
    'BUCKET': '*******************************',
    'FILE': '*******************************'
}
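
If you do plan to share the notebook, one option (plain Python, not specific to DSX) is to keep the values out of the code entirely and read them from environment variables. A minimal sketch, assuming you have exported these variables beforehand (the variable names here are purely illustrative):

import os

# Assumes the COS_* variables are set in the environment; names are illustrative.
credentials = {
    'IBM_API_KEY_ID': os.environ['COS_API_KEY_ID'],
    'IAM_SERVICE_ID': os.environ['COS_IAM_SERVICE_ID'],
    'ENDPOINT': os.environ['COS_ENDPOINT'],
    'IBM_AUTH_ENDPOINT': os.environ['COS_IBM_AUTH_ENDPOINT'],
    'BUCKET': os.environ['COS_BUCKET'],
    'FILE': os.environ['COS_FILE']
}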

The ibm_boto3 library provides complete access to the IBM Cloud Object Storage API. We need to create a low-level client using the above credentials.

from ibm_botocore.client import Config
import ibm_boto3
cos = ibm_boto3.client(service_name='s3',
                       ibm_api_key_id=credentials['IBM_API_KEY_ID'],
                       ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
                       ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
                       config=Config(signature_version='oauth'),
                       endpoint_url=credentials['ENDPOINT'])

Upload Files

To upload a file to COS, we use the upload_file function. It takes three parameters: the local file name (along with its path), the bucket name, and a key. The key can be different from the local file name; your file will be identified by this name within the bucket. When you create a project with the IBM Cloud Object Storage option, DSX creates a bucket for your project. You can find the bucket corresponding to your project in the credentials (the 'BUCKET' entry).

cos.upload_file(Filename='wine/wine.csv', Bucket=credentials['BUCKET'], Key='wine_data.csv')

We can also use this function to upload zip files or pickled objects. Here I have a Gradient Boosting Classifier saved as a pickle object.

# Upload zip file
cos.upload_file('wine.gz', credentials['BUCKET'], 'wine.gz')
# Upload pickle object
cos.upload_file('GB_Classification_model.pkl', credentials['BUCKET'], 'GB_Classification_model.pkl')
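
In case you are wondering where such a pickle file comes from, here is a minimal sketch of how one might be produced (assuming scikit-learn is installed; the dataset and model here are purely illustrative):

import pickle
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier

# Train an illustrative model and serialize it to disk
X, y = load_wine(return_X_y=True)
model = GradientBoostingClassifier().fit(X, y)
with open('GB_Classification_model.pkl', 'wb') as f:
    pickle.dump(model, f)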

To help you upload files quickly, here is an upload_file_cos function. You need to pass your credentials, the local file name, and the key as parameters to this function.

from ibm_botocore.client import Config
import ibm_boto3
def upload_file_cos(credentials, local_file_name, key):
    cos = ibm_boto3.client(service_name='s3',
                           ibm_api_key_id=credentials['IBM_API_KEY_ID'],
                           ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
                           ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
                           config=Config(signature_version='oauth'),
                           endpoint_url=credentials['ENDPOINT'])
    try:
        cos.upload_file(Filename=local_file_name, Bucket=credentials['BUCKET'], Key=key)
    except Exception as e:
        print(e)
    else:
        print('File Uploaded')
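
For example, assuming the wine.csv file from earlier, you would call the helper like this:

upload_file_cos(credentials, 'wine/wine.csv', 'wine_data.csv')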

Download Files To Local Machine

Once you have your file in IBM Cloud Object Storage, you can download it to your local machine.

  1. Click on your project.
  2. Click the Find and add data icon in the upper right-hand panel.
  3. Select the file and click download.

Get Data From COS Into Notebook

To download a file from COS into the notebook, we use the download_file function. It takes the same parameters as above. Here I am downloading the file wine.csv to the data folder and saving it with the name wine1.csv.

cos.download_file(Bucket=credentials['BUCKET'], Key='wine.csv', Filename='data/wine1.csv')
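
To quickly check that the download worked, you could read the file back, for example with pandas (assuming pandas is installed and the file is a regular CSV):

import pandas as pd

# Load the downloaded file and peek at the first few rows
df = pd.read_csv('data/wine1.csv')
print(df.head())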

Here is a download_file_cos function for quick use.

from ibm_botocore.client import Config
import ibm_boto3
def download_file_cos(credentials, local_file_name, key):
    cos = ibm_boto3.client(service_name='s3',
                           ibm_api_key_id=credentials['IBM_API_KEY_ID'],
                           ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
                           ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
                           config=Config(signature_version='oauth'),
                           endpoint_url=credentials['ENDPOINT'])
    try:
        cos.download_file(Bucket=credentials['BUCKET'], Key=key, Filename=local_file_name)
    except Exception as e:
        print(e)
    else:
        print('File Downloaded')
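
Again, assuming the key used above, usage looks like this:

download_file_cos(credentials, 'data/wine1.csv', 'wine.csv')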

If you want to upload or download a file-like object instead of a file, you can use upload_fileobj and download_fileobj. For uploads, the object must implement the read method and return bytes; for downloads, it must implement the write method and accept bytes.

with open('wine.csv', 'rb') as data:
    cos.upload_fileobj(data, credentials['BUCKET'], 'wine_bytes')
with open('wine_copy.csv', 'wb') as data:
    cos.download_fileobj(credentials['BUCKET'], 'wine_bytes', data)
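
The file-like object does not have to be an actual file on disk. As a small sketch, you could round-trip data that only lives in memory using io.BytesIO (the key name and sample bytes here are just illustrative):

import io

# Upload an in-memory bytes buffer
buffer = io.BytesIO(b'fixed_acidity,volatile_acidity\n7.4,0.7\n')
cos.upload_fileobj(buffer, credentials['BUCKET'], 'wine_in_memory.csv')

# Download straight back into memory
downloaded = io.BytesIO()
cos.download_fileobj(credentials['BUCKET'], 'wine_in_memory.csv', downloaded)
print(downloaded.getvalue())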

The credentials you get in DSX using the Insert credentials option are scoped to a single bucket, i.e. they only allow you to interact with your project's bucket. If you want to interact with other buckets, you will have to create new credentials with the appropriate access permissions.

Create New Service Credentials

To create new credentials, you need to follow these steps.

  1. Go to IBM Cloud and click on your IBM Cloud Object Storage service under the Services section.
  2. Select Service credentials from the left-hand panel and click the New credential button.
  3. Give it a name, choose an appropriate role based on your requirements, and hit the Add button.

You should now be able to see this credential under Service credentials.

Just copy these credentials from the View credentials option and create the cos client object.

cos_credentials = {
    "apikey": "***********************",
    "endpoints": "***********************",
    "iam_apikey_description": "***********************",
    "iam_apikey_name": "***********************",
    "iam_role_crn": "***********************",
    "iam_serviceid_crn": "***********************",
    "resource_instance_id": "***********************"
}
auth_endpoint = 'https://iam.bluemix.net/oidc/token'
service_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
cos = ibm_boto3.client('s3',
                       ibm_api_key_id=cos_credentials['apikey'],
                       ibm_service_instance_id=cos_credentials['resource_instance_id'],
                       ibm_auth_endpoint=auth_endpoint,
                       config=Config(signature_version='oauth'),
                       endpoint_url=service_endpoint)

List Buckets

Using the list_buckets function, we can list all buckets.

for bucket in cos.list_buckets()['Buckets']:
    print(bucket['Name'])

Create & Delete Buckets

The create_bucket and delete_bucket functions will help you create and delete buckets.

cos.create_bucket(Bucket='bucket1-test')
cos.delete_bucket(Bucket='bucket1-test')

There are many more functions you can use to manage your IBM Cloud Object Storage. In this blog, we have covered the basic functions to make your job easier while working with IBM Cloud Object Storage in Data Science Experience using Python.

For more information, you can refer to this documentation. For the Python code, refer to this notebook.