Automating the Restore of Archived Objects through AWS Storage Gateway

Hetul Sheth
ScaleCapacity
5 min read · Aug 31, 2020

Prerequisites:

  1. You should have a File Gateway set up in AWS. You can refer to this link to set up a file gateway using an EC2 instance. You can also set up a file gateway on on-premises servers.

Scenario:

We have a File Gateway set up in our AWS account and mounted on our client’s local server. Some older files in the backing S3 bucket have transitioned to the Glacier storage class, so whenever someone tries to access those files they don’t show up. For audit purposes, we need to restore these files using one of the Glacier retrieval tiers (Expedited, Standard, or Bulk). We also want to automate this task so that whenever the client tries to access such a file, they do not have to request the restore manually; the restore request is issued automatically and the file becomes available as soon as the restore completes.

Please go through the above blog for more details.
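For context, this is what happens when an application tries to read an archived object directly: S3 rejects the GET until the object has been restored. Below is a minimal sketch of that behaviour; the bucket and key names are placeholders used only for illustration.

========================================

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

# Hypothetical bucket and key, for illustration only.
bucket, key = 'my-archive-bucket', 'reports/2019/q1.pdf'

# The object is still listed, and HEAD shows its storage class.
print(s3.head_object(Bucket=bucket, Key=key).get('StorageClass'))  # e.g. 'GLACIER'

try:
    s3.get_object(Bucket=bucket, Key=key)
except ClientError as e:
    # Archived objects return InvalidObjectState until they are restored.
    print(e.response['Error']['Code'])

========================================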

Procedure:

To automate the task:

  1. Create a CloudWatch log group for the File Gateway (if you already created one while setting up the file gateway, note it down; it will be used later).

To configure a CloudWatch log group for your file gateway

While in the Gateway Log Group wizard, choose the Create new Log Group link to create a new log group. You are directed to the CloudWatch console to create one. If you already have a CloudWatch log group that you want to use to monitor your gateway, choose that group for Gateway Log Group.

If you create a new log group, choose the refresh button to view the new log group in the list.

If your gateway is deployed on a VMware host that is enabled for VMware High Availability (HA) cluster, you’re prompted to verify and test the VMware HA configuration. In this case, choose Verify VMware HA. Otherwise, choose Save and Continue.
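If you prefer to script this step, the sketch below creates a log group with boto3 and associates it with the gateway. The gateway ARN, account ID, region, and log group name are placeholders.

========================================

import boto3

logs = boto3.client('logs')
sgw = boto3.client('storagegateway')

# Placeholder name; use whatever naming convention you follow.
log_group_name = '/aws/storagegateway/file-gateway-logs'
logs.create_log_group(logGroupName=log_group_name)

# Associate the log group with the gateway (placeholder ARNs shown).
gateway_arn = 'arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-EXAMPLE'
log_group_arn = ('arn:aws:logs:us-east-1:111122223333:log-group:'
                 + log_group_name + ':*')
sgw.update_gateway_information(GatewayARN=gateway_arn,
                               CloudWatchLogGroupARN=log_group_arn)

========================================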

2. Create AWS Lambda execution role:

Create an IAM role that the AWS Lambda function will assume when it runs. The function calls the S3 API to request object restores, so it needs permission to act on the S3 bucket on top of the usual Lambda basic execution role.

  • Go to the AWS Identity and Access Management (IAM) console and select Roles.
  • Click Create role
  • Choose Lambda and click Next: Permissions
  • Choose one of the AWS managed policies, here AmazonS3FullAccess. If you create your own policy instead, allow the “s3:RestoreObject” action against the relevant resources (see the sketch after this list).
  • In addition to the above Amazon S3 policy, also attach the AWSLambdaBasicExecutionRole so that Lambda can operate as normal and generate a CloudWatch Logs stream.
  • Select your policy and click on Next: Tags.
  • Add any optional tags to the role, if none, click Next: Review.
  • Provide a role name and make a note of this for later (e.g. myLambdaS3RestoreRole), and click Create Role
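The same role can be created with boto3. A minimal sketch follows: the role name matches the example above, the bucket ARN is a placeholder, and the inline policy is the least-privilege alternative to AmazonS3FullAccess mentioned in the list.

========================================

import json
import boto3

iam = boto3.client('iam')
role_name = 'myLambdaS3RestoreRole'

# Trust policy so the Lambda service can assume the role.
trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'lambda.amazonaws.com'},
        'Action': 'sts:AssumeRole'
    }]
}
iam.create_role(RoleName=role_name,
                AssumeRolePolicyDocument=json.dumps(trust_policy))

# Basic execution role so the function can write its own CloudWatch Logs.
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn='arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole')

# Least-privilege inline policy: allow restores against the relevant bucket only.
restore_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Action': 's3:RestoreObject',
        'Resource': 'arn:aws:s3:::my-archive-bucket/*'   # placeholder bucket
    }]
}
iam.put_role_policy(RoleName=role_name,
                    PolicyName='AllowS3RestoreObject',
                    PolicyDocument=json.dumps(restore_policy))

========================================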

3. Create AWS Lambda function

Next, we need to build our Lambda function that will initiate our object restores for us.

  • Go to the AWS Lambda Console
  • Click Create function
  • Select Author From Scratch
  • Choose a name
  • Select the Runtime as Python 2.7
  • Expand Choose or create an execution role, select Use an existing role, and choose the role you created in Step 2
  • Click Create function
  • Scroll down to the Function Code window pane and replace the code in the editor with the following:

========================================

import os
import json
import gzip
import boto3
from botocore.exceptions import ClientError
from StringIO import StringIO


def lambda_handler(event, context):
    # CloudWatch Logs delivers subscribed log events as a base64-encoded,
    # gzip-compressed JSON payload.
    cw_data = str(event['awslogs']['data'])
    cw_logs = gzip.GzipFile(fileobj=StringIO(cw_data.decode('base64', 'strict'))).read()
    log_events = json.loads(cw_logs)
    for log_entry in log_events['logEvents']:
        result = process_recall(log_entry)
        print(result)
    return {
        'statusCode': 200,
        'body': result
    }


def process_recall(log_entry):
    print("message contents: " + log_entry['message'])
    message_json = json.loads(log_entry['message'])
    print(message_json)

    # Only act on the InaccessibleStorageClass errors emitted by File Gateway.
    if 'type' in message_json:
        print("Found ErrorType")
        error_type = message_json['type']
        print("ErrorType = " + error_type)
        if message_json['type'] != "InaccessibleStorageClass":
            return "Unexpected error: not related to storage class"
    else:
        return "error: no type entry"

    # The log message carries the bucket and key of the archived object.
    if 'bucket' in message_json:
        print("Found Bucket")
        s3_bucket = message_json['bucket']
        print("Bucket = " + s3_bucket)
    else:
        return "error: no bucket"

    if 'key' in message_json:
        print("Found Key")
        s3_key = message_json['key']
        print("Key = " + s3_key)
    else:
        return "error: no key"

    # Request a temporary restore of the archived object from S3 Glacier,
    # using the tier and duration set in the environment variables.
    s3 = boto3.resource('s3')
    s3_object = s3.Object(s3_bucket, s3_key)
    try:
        restore_days = int(os.environ['RestoreDays'])
        result = s3_object.restore_object(
            RestoreRequest={
                'Days': restore_days,
                'GlacierJobParameters': {'Tier': os.environ['RecallTier']}
            }
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'RestoreAlreadyInProgress':
            return e.response['Error']['Code']
        else:
            return "Unexpected Error whilst attempting to recall object"

    print(result)
    return result

=====================================

Scroll down to the Environment variables section and create two environment variables as per the key-value pairs below:
Key = RecallTier | Value = “Expedited” or “Standard” (this value specifies the S3 Glacier retrieval tier)

Key = RestoreDays | Value = Integer (this value defines how many days the restored object will be made temporarily available, e.g. 1)

On the top of the screen click Save.
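If you want to exercise the handler before wiring it to CloudWatch Logs, you can feed it a synthetic event shaped like the payload a subscription filter delivers. The sketch below is Python 2 to match the runtime above, and assumes the handler code is saved locally as lambda_function.py and that AWS credentials are configured; the bucket and key are placeholders, so the restore call itself will simply report an error unless they point at a real archived object.

========================================

import os, json, gzip, base64
from StringIO import StringIO

# Environment variables the handler expects.
os.environ['RestoreDays'] = '1'
os.environ['RecallTier'] = 'Expedited'

from lambda_function import lambda_handler

# Build the same payload shape CloudWatch Logs delivers: a gzip-compressed,
# base64-encoded JSON document with a logEvents list.
message = {'type': 'InaccessibleStorageClass',
           'bucket': 'my-archive-bucket',      # hypothetical bucket
           'key': 'reports/2019/q1.pdf'}       # hypothetical key
payload = {'logEvents': [{'message': json.dumps(message)}]}

buf = StringIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(json.dumps(payload))

event = {'awslogs': {'data': base64.b64encode(buf.getvalue())}}
print(lambda_handler(event, None))

========================================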

4. Connect CloudWatch Logs to the Lambda function:

Finally, we need to connect the CloudWatch Logs group to our Lambda function so that it can process our File Gateway logs.

  • Open the Lambda function from Step 3 in the console
  • In the Designer, click Add trigger
  • Under Trigger configuration choose CloudWatch Logs
  • In the Log Group field, select the log group that you created in Step 1
  • Add a Filter Name
  • In Filter pattern add: “InaccessibleStorageClass”
  • Ensure the Enable Trigger box is checked
  • Click Add to continue
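The same wiring can be scripted with boto3 if you prefer. A minimal sketch, with placeholder names and ARNs for the function and the log group from Step 1:

========================================

import boto3

lambda_client = boto3.client('lambda')
logs = boto3.client('logs')

function_name = 'myS3RestoreFunction'                      # placeholder function name
log_group_name = '/aws/storagegateway/file-gateway-logs'   # log group from Step 1
log_group_arn = ('arn:aws:logs:us-east-1:111122223333:log-group:'
                 + log_group_name + ':*')

# Allow CloudWatch Logs to invoke the function.
lambda_client.add_permission(
    FunctionName=function_name,
    StatementId='file-gateway-logs-invoke',
    Action='lambda:InvokeFunction',
    Principal='logs.amazonaws.com',
    SourceArn=log_group_arn)

# Subscribe the function to the log group, filtering on the error type
# the function acts on.
function_arn = lambda_client.get_function(
    FunctionName=function_name)['Configuration']['FunctionArn']
logs.put_subscription_filter(
    logGroupName=log_group_name,
    filterName='InaccessibleStorageClassFilter',
    filterPattern='InaccessibleStorageClass',
    destinationArn=function_arn)

========================================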

5. Test:

  • Try to access a file (through a File Gateway share) that you know is in the S3 Glacier storage class.
    Note: If the file is already in the local File Gateway cache, the file will be returned from cache and this new workflow will not execute
  • You should receive an initial IO error
  • Navigate to the CloudWatch console and select Logs from the left hand column
  • Select the File Gateway log group you had previously created
  • Under the Log Streams column you should see an entry similar to share-xyz123, which is your File Gateway file share ID. If you don’t see that entry, note that it can take up to 5 minutes for the log group to receive data from the File Gateway
  • Once the log stream from your file gateway share is visible, click on it
  • Click on the error message and look for the type:InaccessibleStorageClass — here you can also view the File (key) that you tried to access, along with the S3 bucket details
  • Open the Lambda console in a separate window and check the function’s log group for a successful restore request (HTTP 202 response)
  • If you have enabled SNS notifications, once the S3 Glacier restore has completed you will receive an email whose body contains “eventName”: “ObjectRestore:Completed”. At this point you can access the file through the File Gateway again.
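While you wait, you can also poll the object’s restore status directly; S3 reports it in the Restore field of a HEAD response. A minimal sketch (the bucket and key are placeholders):

========================================

import boto3

s3 = boto3.client('s3')
resp = s3.head_object(Bucket='my-archive-bucket', Key='reports/2019/q1.pdf')

# While the restore is in progress: 'ongoing-request="true"'
# When it completes: 'ongoing-request="false", expiry-date="..."'
print(resp.get('Restore'))
print(resp.get('StorageClass'))   # remains 'GLACIER'; the restored copy is temporary

========================================

Once the Restore field shows ongoing-request=”false”, the file can be read through the file share again for the number of days set in RestoreDays.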
