Amazon Simple Storage Service (AWS S3) operations from Julia

Manu Francis
5 min read · Sep 19, 2023


Introduction

Amazon Simple Storage Service, commonly known as Amazon S3, is a cornerstone of Amazon Web Services (AWS) and a pioneer in the world of cloud storage. Since its launch in 2006, Amazon S3 has revolutionized data storage, making it simple, scalable, and cost-effective. This article delves into the essence of Amazon S3, its importance in cloud computing, and its pivotal role in microservices development.

Amazon S3 is an object storage service provided by AWS, designed to store and retrieve vast amounts of data, or objects, over the internet. Objects in Amazon S3 can be virtually anything: documents, images, videos, backups, and more. Each object is stored in a container called a “bucket.” Although S3’s namespace is flat rather than hierarchical, object keys can contain slashes, so a bucket can be browsed as if it held directories. Buckets can be public or private, offering granular control over data accessibility.

Amazon S3 in Microservices Development

Microservices architecture has gained significant popularity due to its flexibility and scalability in building complex applications. Amazon S3 plays a crucial role in microservices development for several reasons:

  1. Data Storage: Microservices often need a centralized location to store and retrieve data, configurations, and assets. Amazon S3 serves as an ideal data repository for microservices, allowing them to share and access data easily.
  2. Decoupled Services: In a microservices architecture, services should be loosely coupled. Storing shared data and assets in Amazon S3 promotes decoupling, as services can access this data without direct dependencies on one another.
  3. Scalability: Microservices must handle varying workloads and traffic. Amazon S3’s scalability ensures that microservices can access data without performance bottlenecks, even during traffic spikes.
  4. Resilience: In a microservices ecosystem, individual services can fail without affecting the entire system. S3’s durability and availability ensure that data remains accessible, even if some microservices experience downtime.
  5. Event-Driven Architecture: Amazon S3 can trigger AWS Lambda functions or other services when objects are created, updated, or deleted. This capability enables event-driven architectures, allowing microservices to respond to changes in S3 data automatically.
  6. Cost Efficiency: Microservices can optimize costs by using Amazon S3’s tiered storage classes. Data that is infrequently accessed can be moved to lower-cost storage classes like Glacier or Glacier Deep Archive.
  7. Global Reach: For microservices with a global user base, Amazon S3 can be combined with Amazon CloudFront to distribute data globally, reducing latency and improving the user experience.

Amazon S3 and Julia: Empowering Microservices and Data Science

Amazon Simple Storage Service (S3) stands as a foundational pillar in the AWS ecosystem, offering versatility and reliability. When coupled with the capabilities of the Julia programming language, S3 becomes an indispensable asset for both microservices development and data science applications.

In the realm of microservices, S3 excels in data sharing and storage, enabling services to seamlessly access shared data without tight coupling. Its scalability ensures that Julia-powered microservices can handle fluctuating workloads, promoting flexibility and resilience. S3’s durability and event-driven capabilities, combined with Julia, facilitate automatic responses to changes in data, enhancing system robustness. Moreover, the cost-efficiency of S3’s tiered storage classes, coupled with Julia’s automation capabilities, optimizes resource utilization. For microservices serving a global user base, the integration of S3 with Amazon CloudFront, managed via Julia scripts, reduces latency and elevates the user experience.

In the realm of data science, S3, paired with Julia, proves invaluable for data storage, versioning, collaboration, and backup. Large datasets can be securely stored and efficiently accessed through S3, complementing Julia’s data analysis capabilities. Versioning in S3 ensures data tracking and reproducibility. Julia simplifies collaborative data analysis, fetching data from S3 and enabling seamless sharing within teams. S3 serves as a reliable backup and recovery solution, further fortified by Julia’s automation for backup processes. In machine learning endeavors, S3 stores training datasets and models, and Julia interacts with it to facilitate scalability in training and inference.

AWS S3 Interactions with Julia

The AWS.jl package aims to provide a convenient and efficient interface for interacting with Amazon Web Services (AWS) from the Julia programming language.

With the AWS.jl package, developers can easily access AWS services such as Amazon S3, Amazon EC2, and Amazon SQS, among others. The package provides a set of functions and methods that make it easy to interact with these services from within Julia code, allowing developers to seamlessly integrate AWS functionality into their applications.

Some specific use cases for the AWS.jl package include:

  • Storing and retrieving data using Amazon S3
  • Launching and managing virtual machines on Amazon EC2
  • Sending messages to and from Amazon SQS queues
  • Working with other AWS services such as Amazon DynamoDB, Amazon RDS, and Amazon Kinesis

Install AWS.jl in Julia

using Pkg
Pkg.add("AWS")

Configure AWS Credentials:

You need to configure your AWS credentials to access the S3 bucket. You can do this by setting environment variables or using AWS CLI configuration. AWS.jl can also read credentials from the AWS configuration file, but setting environment variables is a straightforward method for this example. You can set the following environment variables:

  • AWS_ACCESS_KEY_ID: Your AWS access key ID.
  • AWS_SECRET_ACCESS_KEY: Your AWS secret access key.
  • AWS_DEFAULT_REGION: The AWS region where your S3 bucket is located.

You can set these environment variables in your Julia script or in your system environment.

In Julia REPL, you can set environment variables as below:

ENV["AWS_ACCESS_KEY_ID"] = "your_aws_access_key"
ENV["AWS_SECRET_ACCESS_KEY"] = "your_aws_secret_access_key"
ENV["AWS_DEFAULT_REGION"] = "your_aws_default_region"

You can also set these environment variables permanently by following the AWS documentation: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html

Import the AWS.jl Library:

In your Julia script, import the AWS.jl library to use its functionality:

using AWS

Load the S3 Service:

The @service macro loads the API bindings for a given AWS service, making its operations available under the S3 module:

@service S3

Create AWS Client:

To perform S3 operations, we need an AWS configuration. It holds the credentials, the region, and other settings:

aws_access_key_id = get(ENV,"AWS_ACCESS_KEY_ID","")
aws_secret_access_key = get(ENV, "AWS_SECRET_ACCESS_KEY","")
aws_region = get(ENV,"AWS_DEFAULT_REGION","us-east-1")
creds = AWSCredentials(aws_access_key_id, aws_secret_access_key)

You can store this configuration in a global variable for later use:

const AWS_GLOBAL_CONFIG = Ref{AWS.AWSConfig}()
AWS_GLOBAL_CONFIG[] = AWS.global_aws_config(region=aws_region, creds=creds)

List Objects in the S3 Bucket:

Use the list_objects function to retrieve a list of objects in the S3 bucket. Replace "your-bucket-name" with the name of your S3 bucket:

data = S3.list_objects("your-bucket-name"; aws_config=AWS_GLOBAL_CONFIG[])
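The call returns the parsed ListObjects response as a nested Dict. As a sketch (assuming the typical response shape, where the bucket name and object-key values are hypothetical), the object keys can be pulled out of the "Contents" entries:

```julia
# Hypothetical parsed response, shaped like a typical ListObjects result
data = Dict(
    "Name" => "your-bucket-name",
    "Contents" => [
        Dict("Key" => "reports/january.csv", "Size" => "1024"),
        Dict("Key" => "reports/february.csv", "Size" => "2048"),
    ],
)

# "Contents" may be a Vector of Dicts when the bucket holds several objects,
# but a single Dict when it holds exactly one — normalize to a Vector first
contents = get(data, "Contents", [])
contents isa Vector || (contents = [contents])

# Collect the key of every object in the listing
keys_in_bucket = [obj["Key"] for obj in contents]
println(keys_in_bucket)
```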

Put Objects in the S3 Bucket:

Use the put_object function to upload objects to the S3 bucket. Replace "your-bucket-name" with the name of your S3 bucket. The example below uploads an image; replace your-file-name.png with the actual file name.

image_data = read("your-file-name.png")
S3.put_object(
    "your-bucket-name", "your-file-name.png",
    Dict("body" => image_data);
    aws_config=AWS_GLOBAL_CONFIG[]
)

We have to specify the bucket, the object key, and the object body. The body is always sent as raw bytes, which is why we read the image file as binary data using Julia's read function.
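As a purely local illustration (no AWS call involved), read returns the file's raw bytes, and a string body can be converted to the same representation explicitly:

```julia
# Write a small file, then read it back as raw bytes
path = tempname()
write(path, "hello S3")

bytes = read(path)    # Vector{UInt8}, suitable as an object body
println(typeof(bytes))
println(length(bytes))

# A string can be converted to the same byte representation
body = Vector{UInt8}(codeunits("hello S3"))
println(bytes == body)
```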

Moreover, we can pass additional parameters such as tags, metadata, and an ACL while uploading the object.

NOTE: tag values must be URI-encoded. We can use the escapeuri function from the URIs.jl package.
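As a quick illustration (assuming URIs.jl is installed), escapeuri turns a Dict of tags into a URL-encoded query string, with unsafe characters such as spaces percent-encoded:

```julia
using URIs

# A tag value containing a space, which must be escaped for x-amz-tagging
tags = Dict("team" => "data science")
encoded = URIs.escapeuri(tags)
println(encoded)
```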

You can add URIs.jl package as below:

using Pkg
Pkg.add("URIs")

using URIs

acl = "private"
metadata = Dict("example" => "metadata")
tags = Dict("example" => "tag")
data_string_to_put = """{key: value}"""

# Custom metadata is sent as x-amz-meta-* headers
meta = Dict("x-amz-meta-$k" => v for (k, v) in metadata)
head = merge!(
    Dict(
        "x-amz-acl" => acl,
        "x-amz-tagging" => URIs.escapeuri(tags),
        "Content-Encoding" => "",
        "Content-Type" => "",
    ),
    meta,
)

# Pass the extra headers under the "headers" key of the params Dict
S3.put_object(
    "your-bucket-name", "your-object-key",
    Dict("body" => data_string_to_put, "headers" => head);
    aws_config=AWS_GLOBAL_CONFIG[]
)

You can find sample code and notebooks on GitHub: https://github.com/efmanu/aws-julia-examples


Manu Francis

Researcher, Machine Learning Engineer, Software Developer