How to use AWS S3 as Document storage for IBM B2B Integrator

Satyajit Paul
Mar 2, 2019

If you are running IBM B2B Integrator and are mired in the challenges of handling high-volume data, one option that might have crossed your mind is the AWS S3 object store. Wouldn’t it be cool if you could use AWS S3 as Document storage for B2Bi/SFG?

This article looks into the options you have for using AWS S3 and highlights some of the pros and cons.

First, let’s take a look at the out-of-the-box B2Bi service, AWS S3 Service, supported starting with the v6.0 release. This service runs as part of Business Process execution and is good for lightweight usage. By design, it is not meant to move Document storage or Mailbox payload storage to AWS S3.

If you need AWS S3 as Document storage for adapters and services (i.e. BPs and Mailboxes), IBM B2Bi doesn’t support that natively yet.

The good news is that AWS offers a service that lets you map an S3 bucket as a mounted drive on the local OS of your VM. You can then use the mounted drive as the Document Directory (jdbcService.document_dir) for the B2Bi instance. This configuration works whether you run B2Bi/SFG on premise or on AWS Cloud.

To achieve this, we will use the AWS Storage Gateway service. At a high level, the service has three options: File Gateway, Volume Gateway and Tape Gateway. We will use File Gateway to map an S3 bucket as a locally mounted drive. I know it’s a little weird to see the phrase “File Gateway” here :), but functionally it is very different from IBM File Gateway (SFG).

For detailed configuration steps, please refer to AWS Blog — File Interface to AWS Storage Gateway. It’s well documented there, so I won’t cover all those details in this article.

AWS File Gateway gives you a couple of different options for mounting your S3 bucket. All of them require a dedicated VM that works as the File Gateway. You can use an EC2 instance if you are running B2Bi on AWS Cloud, or the VMware ESXi or Microsoft Hyper-V option if you are running B2Bi on premise. Finally, there is a hardware appliance option too. Please note that the AWS blog mentioned above doesn’t list Microsoft Hyper-V or the hardware appliance, as they were added later.

Screenshot from AWS Storage Gateway

For this article, I chose an EC2 instance to map the S3 bucket. I configured one EC2 instance to be used as the AWS File Gateway, backed by the intended S3 bucket, b2bi-s3-demo. Once the AWS File Gateway is configured, this EC2 instance acts as the File Gateway and is used by other EC2 instances to mount the S3 drive. This can be done on as many EC2 client instances as we want within the same AWS ‘region’.

Here “File-Gateway-Instance” is the one mapped to the S3 bucket “b2bi-s3-demo”. The other two EC2 instances (File-Gateway-Client-1 & 2) will host the B2Bi/SFG application.

At a high level, there are five distinct activities:

Step 1: Configure the EC2 instance to be used as File Gateway

Step 2: Configure the File Gateway that uses the EC2 instance configured above

Step 3: Configure File Shares that use the configured File Gateway

Step 4: Mount the File Share in your EC2 instances hosting B2Bi/SFG

Step 5: Configure B2Bi/SFG to use the mounted S3 bucket

I will cover some of the high level configuration steps and highlight some of the critical steps. For details please refer to AWS Documentation.

Configure the EC2 Instance to be used as File Gateway

While configuring the EC2 instance for the File Gateway, I recommend using one of the AWS Storage Gateway AMIs.

Second, make sure you add an additional storage drive of at least 150 GB to the File Gateway EC2 instance, to be used for caching.

Another important point: please change the security group’s default settings to allow inbound traffic. Unless you do this, the public IP of the EC2 instance will not be reachable from the File Gateway configuration UI. I spent a lot of time before I figured it out :(.
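As a rough sketch of the security group change described above, the inbound rules could be added with the AWS CLI. The group ID and CIDR ranges are hypothetical placeholders, and the port choice (80 for gateway activation, 2049 for NFS clients) is an assumption that is not covered in this article, so please check it against the AWS Storage Gateway port requirements. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: open inbound ports on the File Gateway's security group.
# SG_ID, ADMIN_CIDR and CLIENT_CIDR are hypothetical placeholders.
# Assumption: port 80 is used for gateway activation, port 2049 by NFS clients.

SG_ID="${SG_ID:-sg-0123456789abcdef0}"
ADMIN_CIDR="${ADMIN_CIDR:-203.0.113.10/32}"   # machine running the console browser
CLIENT_CIDR="${CLIENT_CIDR:-172.30.2.0/24}"   # subnet of the B2Bi/SFG instances

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

open_gateway_ports() {
  run aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port 80 --cidr "$ADMIN_CIDR"
  run aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port 2049 --cidr "$CLIENT_CIDR"
}
```

Calling `open_gateway_ports` prints the two commands for review; `DRY_RUN=0 open_gateway_ports` would actually run them with your configured AWS credentials. Restricting the CIDR ranges, rather than opening the ports to everyone, keeps the gateway from being exposed more than needed.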

Once the EC2 instance for the File Gateway is ready, you need to create the File Gateway instance.

Configure the File Gateway that uses the EC2 instance configured above

The public IP of the File Gateway EC2 instance is used in the configuration above.

Configure File Shares that use the configured File Gateway

Once the File Share is created, you get the commands/steps for mounting it on your EC2 instances.

Mount the File Share in your EC2 instances hosting your Application

Next, I mounted the S3 drive on File-Gateway-Client-1 & 2 by running a simple mount command, as shown below:

mount -t nfs -o nolock,hard 172.30.2.163:/b2bi-s3-demo /data/s3

172.30.2.163 is the private IP of the EC2 instance that acts as File Gateway.
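The mount command above does not survive a reboot. One possible way to make it persistent is an /etc/fstab entry, sketched below. The `_netdev` option is an assumption on my part (it asks the OS to wait for networking before mounting); the other options mirror the mount command above. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build an /etc/fstab line for the File Gateway NFS share so the
# mount can be restored after a reboot. _netdev is an assumption; the
# nolock,hard options are taken from the mount command in this article.

fstab_entry() {
  local gateway_ip="$1" share="$2" mount_point="$3"
  printf '%s:/%s %s nfs nolock,hard,_netdev 0 0\n' \
    "$gateway_ip" "$share" "$mount_point"
}

# Succeeds only if the given directory is currently a mount point.
is_mounted() {
  mountpoint -q "$1"
}

# Example: print the entry for the setup used in this article.
fstab_entry 172.30.2.163 b2bi-s3-demo /data/s3
```

The printed line can be reviewed and appended to /etc/fstab by hand. After a reboot, `is_mounted /data/s3` gives a quick check before B2Bi/SFG is started.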

With these steps, files in the S3 bucket “b2bi-s3-demo” were available on the other two EC2 instances as regular files, and regular file operations can be done by Java classes using standard Java file I/O.

The same files are available in the S3 bucket as well.

Any change made on the mounted drive from either EC2 instance is reflected immediately on the other EC2 instance as well as in S3. However, changes made directly in the S3 bucket are not reflected on the File Gateway immediately. For that, you can call the AWS Storage Gateway API, or use the console, to refresh the File Gateway cache whenever an object is added or deleted directly in the S3 bucket, i.e. without going through the AWS File Gateway. For B2Bi/SFG use cases, I don’t expect anyone to write directly into the S3 bucket. The S3 bucket should be treated as a system resource, and only B2Bi/SFG should use it, via the AWS File Gateway, which to B2Bi/SFG is nothing but the mounted drive.
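If something does write to the bucket directly, the refresh mentioned above can be triggered from the AWS CLI. This is a minimal sketch: the file share ARN is a hypothetical placeholder (the real one is shown for your File Share in the Storage Gateway console), and by default the script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: ask the File Gateway to re-read the bucket after objects were
# changed directly in S3. FILE_SHARE_ARN below is a hypothetical placeholder.

FILE_SHARE_ARN="${FILE_SHARE_ARN:-arn:aws:storagegateway:us-east-1:111122223333:share/share-EXAMPLE}"

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

refresh_file_share() {
  run aws storagegateway refresh-cache --file-share-arn "$FILE_SHARE_ARN"
}
```

Running `DRY_RUN=0 refresh_file_share` issues the refresh request. The refresh is only a request to the gateway, so newly added objects may take a while to show up on the mounted drive.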

Please note that the S3 bucket, the File Gateway EC2 instance and the application-hosting EC2 instances must all be in the same region. When I used “US East (N. Virginia)” for everything, it worked fine. I tried cross-region as well, but that didn’t work.

Next, let’s complete the configuration on the B2Bi/SFG side.

Configure B2Bi/SFG to use the mounted S3 bucket

To use S3 for Document storage by B2Bi adapters and services, please configure the following two parameters in customer_overrides.properties (or use the new Customization UI available starting with v6.0):

jdbcService.document_dir=/data/s3
jdbcService.defaultDocumentStorageType=FS

Additionally, if you want a separate directory where document payloads can be restored, please configure the following property in customer_overrides.properties:

jdbcService.RESTORE_DOCUMENT_DIR=/data/s3/<a sub directory>
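Before restarting B2Bi/SFG with these properties, it may be worth confirming that the mounted directory is really writable. The function below is a hypothetical helper, not part of B2Bi: it writes a small probe file, reads it back and deletes it.

```shell
#!/usr/bin/env bash
# Sketch: confirm a directory is usable as jdbcService.document_dir by
# writing, reading back and deleting a small probe file.
# check_document_dir is a hypothetical helper name.

check_document_dir() {
  local dir="$1" probe
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  probe="$(mktemp "$dir/.b2bi_probe.XXXXXX")" || {
    echo "cannot write to $dir" >&2
    return 1
  }
  echo "probe" > "$probe"
  if [ "$(cat "$probe")" = "probe" ]; then
    rm -f "$probe"
    echo "ok: $dir is readable and writable"
  else
    rm -f "$probe"
    echo "read-back failed in $dir" >&2
    return 1
  fi
}
```

For the setup in this article that would be `check_document_dir /data/s3`, ideally run as the same OS user that runs B2Bi/SFG so that permission problems show up early.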

For on-prem customers: I have not tried configuring the VMware ESXi or Microsoft Hyper-V images. However, I don’t expect the behaviour and functionality to be much different from what we covered above. For detailed steps on VMware ESXi setup, you may refer to this article.

Now, before you take the decision, here are a few points to consider:

  1. You can move to the file system for Document storage without worrying about the file system getting flooded. Your new file system has the scalability benefits of AWS S3.
  2. Moving to the file system frees up your database and helps avoid some DB performance issues.
  3. Note that AWS File Gateway needs its own disks for use as cache storage. Usually that is 20% of your overall storage need on S3.
  4. On the performance front, let me quote from the AWS FAQ: “The performance you experience depends on what host platform (hardware appliance, virtual machine, Amazon EC2 instance) you are using to run Storage Gateway software and a number of other factors.”
  5. Java clients, i.e. B2Bi/SFG, will write to the File Gateway (part of Storage Gateway) using SMB or NFS, so there will be some overhead due to this.
  6. I suggest that you go through this blog to understand the performance impact of the different options for integrating with AWS S3.
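The cache sizing in point 3 can be turned into a quick calculation. This sketch combines the 20% rule of thumb with the 150 GB minimum mentioned earlier in this article. Both numbers are rough guidance taken from this article, and `cache_size_gb` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: estimate the File Gateway cache disk size (in GB) from the
# expected S3 footprint, using 20% of the total with a 150 GB floor.

cache_size_gb() {
  local s3_total_gb="$1"
  local estimate=$(( (s3_total_gb * 20 + 99) / 100 ))   # 20%, rounded up
  if [ "$estimate" -lt 150 ]; then
    estimate=150
  fi
  echo "$estimate"
}

cache_size_gb 2000   # prints 400
cache_size_gb 500    # prints 150
```

So a 2 TB document store would call for roughly a 400 GB cache disk, while anything under about 750 GB of S3 data simply stays at the 150 GB floor.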

Here are a few more points about AWS File Gateway, almost all of them from the AWS documentation.

Please take a look at the architecture diagram of File Gateway below.

AWS File Gateway asynchronously updates the objects in Amazon S3 as you change the files. This should help avoid delays due to network latency etc.

The service optimizes data transfer between the gateway and AWS using multipart parallel uploads or byte-range downloads, to better use the available bandwidth. A local cache is maintained to provide low-latency access to recently accessed data and to reduce data egress charges.

On the security front, objects are encrypted with Amazon S3 server-side encryption keys (SSE-S3), irrespective of whether you choose the Encryption option in B2Bi/SFG. This is configurable at the AWS S3 bucket level. All data transfer is done over HTTPS.

That’s all for now. You are all set.

Please leave your comments below.
