AWS Java SDK 2.x at half the cost

Doug Tangren
Published in Making Meetup, Jan 19, 2024
Photo by Nathan Dumlao on Unsplash

At Meetup, two of our core engineering principles are to be cost conscious and use technologies that are proven to scale.

On Meetup, every time you click an attend button, schedule an event, create a group, start a conversation, decide to join the local puppy group in your neighborhood, or perform any other activity, your request is guaranteed to pass through multiple JVMs and, within those, likely half a dozen AWS services along the way, which are themselves often sitting in front of multiple JVMs. Both the JVM and AWS APIs are considered proven, rock-solid, and scalable technologies at Meetup. This is why we’re heavily invested in Java and AWS for our core platform services, as well as in keeping both up to date.

Like many companies, AWS included, we’ve completed the spiritual journey of migrating our largest primary platform codebase from Java 8 to 11, then to 17, and most recently to 21. The renaissance happening in the Java community has been wonderful and has unlocked a number of options for us, one being the subject of this post.

As a 20-year-strong, engineering-focused company, we’ve accumulated a lot of learnings about the cost of code dependencies. We’ve learned that it’s much easier to add dependencies than to remove them, and that the simplest way to avoid the future maintenance tax and security vulnerabilities attached to dependencies is not to invite them to the party in the first place. Because we understand that long-term cost, we’re relatively conservative when evaluating new dependencies.

When evaluating the new v2 AWS SDK for Java, the first surprise we encountered was that it was approaching twice the size of the equivalent v1 SDK clients. We started our v2 migration journey a relatively long time ago, and when we recently revisited it this was still more or less the case. In this post we’ll compare the AWS SDK SQS client versions v1.12.637 and v2.23.3, respectively.

Here is an example Gradle build file that allows you to download each version into a separate directory for scientific comparison.

plugins {
    base
}

repositories {
    mavenCentral()
}

val sqsV1 by configurations.creating
val sqsV2 by configurations.creating

dependencies {
    //👇 attach the v1 and v2 sqs dependencies to separate configurations
    sqsV1("com.amazonaws:aws-java-sdk-sqs:1.12.637")
    sqsV2("software.amazon.awssdk:sqs:2.23.3")
}

tasks.register<Copy>("downloadV1") {
    from(sqsV1)
    into("libV1")
}

tasks.register<Copy>("downloadV2") {
    from(sqsV2)
    into("libV2")
}

$ ./gradlew downloadV1 downloadV2
# 👇 compare the relative amount of weight of each version
$ du -h libV1 libV2
6.0M libV1
9.7M libV2 # <= 🙅‍♀️ same great SQS API, but now much heavier?

The hike in dependency size largely has to do with the fact that the v2 SDK comes pre-bundled with a few batteries-included defaults and their transitive dependencies. These include, but are not limited to, Netty, which itself brings in a bevy of jars that we also depend on, but unfortunately at different and conflicting versions.

# 👇 compare the breadth of transitive dependencies
$ ls libV1 | wc -l
13
$ ls libV2 | wc -l
40 # <= 🥶

It is my personal philosophy that the key to building cost-effective, sustainable software systems is simply to use less, especially when it comes to what we don’t actually need. This blends well with our general cost mindset. It is not useful or sustainable to invest in and maintain a catalog of multiple libraries which all do the same thing, namely libraries that are effectively HTTP clients.

This is why one new feature of the AWS v2 SDK for Java in particular, the ability to BYOHC or bring your own HTTP client, caught our attention.

The AWS Java SDK team put a lot of thought into the v2 SDK client design, including making it very straightforward to implement an HTTP client for the SDK yourself. You just implement the SdkAsyncHttpClient interface, which only requires you to implement two methods: one an extremely trivial bookkeeping method that declares a client name, and one that does the actual work of executing a request.

Before we moved forward with the v2 SDK, we wanted to see if we could reduce the cost of upgrading and of bringing on more dependencies by shaving a little off the top. Below is an example of how to depend on the SDK but not its pre-bundled HTTP clients.

plugins {
    base
}

repositories {
    mavenCentral()
}

val sqsV1 by configurations.creating
val sqsV2 by configurations.creating
val sqsV2Slim by configurations.creating

dependencies {
    sqsV1("com.amazonaws:aws-java-sdk-sqs:1.12.637")
    sqsV2("software.amazon.awssdk:sqs:2.23.3")
    //👇 a slim version of the above
    sqsV2Slim("software.amazon.awssdk:sqs:2.23.3") {
        //👇 exclude the default batteries included http clients
        exclude(group = "software.amazon.awssdk", module = "netty-nio-client")
        exclude(group = "software.amazon.awssdk", module = "apache-client")
    }
}

tasks.register<Copy>("downloadV1") {
    from(sqsV1)
    into("libV1")
}

tasks.register<Copy>("downloadV2") {
    from(sqsV2)
    into("libV2")
}

tasks.register<Copy>("downloadV2Slim") {
    from(sqsV2Slim)
    into("libV2Slim")
}

The effect of this is that your AWS SDK dependency footprint is cut by more than half!

$ ./gradlew downloadV1 downloadV2 downloadV2Slim
# 👇 compare the relative amount of weight of each version
$ du -h libV1 libV2 libV2Slim
6.0M libV1
9.7M libV2
4.3M libV2Slim # <= 😏 reduced the weight by more than half
# 👇 compare the breadth of transitive dependencies
$ ls libV2 | wc -l
40
$ ls libV2Slim | wc -l
24 # <=🏌️reduced breadth of transitive dependencies by 40%

The effect of the above is also that your code will compile but fail at runtime, because the SDK now has no HTTP client to execute requests with. Oops! Let’s address that.

A virtue of keeping your Java version up to date is being rewarded with faster runtimes and new APIs, one being the very nice, out-of-the-box standard library HttpClient. The AWS SDK provides a URLConnection-based implementation, but it doesn’t natively support async IO. Java’s newer native HttpClient does!
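To make the async capability concrete, here is a minimal sketch of the standard library HttpClient's sendAsync on its own, with no AWS SDK involved. The class name, the /ping path and the throwaway local server (built with the JDK's com.sun.net.httpserver, purely so the example runs without external network access) are assumptions for illustration and are not part of our client.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

public class JdkHttpClientAsyncSketch {

  // Starts a throwaway local server, sends one async GET to it, and returns the response body.
  static String fetchFromLocalServer() {
    HttpServer server;
    try {
      // Port 0 asks the OS for any free port.
      server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    server.createContext("/ping", exchange -> {
      byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    try {
      HttpClient client = HttpClient.newHttpClient();
      HttpRequest request = HttpRequest.newBuilder()
          .uri(URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/ping"))
          .GET()
          .build();
      // sendAsync returns immediately; the future completes when the response arrives.
      CompletableFuture<String> body = client
          .sendAsync(request, HttpResponse.BodyHandlers.ofString())
          .thenApply(HttpResponse::body);
      // Block only here, at the edge of the example, to hand back a plain value.
      return body.join();
    } finally {
      server.stop(0);
    }
  }

  public static void main(String[] args) {
    System.out.println(fetchFromLocalServer());
  }
}
```

The CompletableFuture returned by sendAsync is the same shape the SDK expects back from an async HTTP client, which is a large part of why the two fit together with so little glue.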

Below is the full code of a client implementation, sans logging and javadoc comments, and with imports wildcarded for brevity. Hats off to the AWS SDK team, as this fits neatly into a single file. I’ll walk through a few notable details that weren’t initially obvious to us.

package com.meetup.awssdk.jdk;

import static software.amazon.awssdk.http.HttpMetric.HTTP_CLIENT_NAME;

import java.net.http.*;
import java.nio.ByteBuffer;
import java.time.Duration;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;
import org.reactivestreams.*;
import software.amazon.awssdk.http.*;
import software.amazon.awssdk.http.async.*;
import software.amazon.awssdk.utils.AttributeMap;

public class JdkAsyncHttpClient implements SdkAsyncHttpClient {
  private static final Set<String> RESTRICTED_HEADERS = Set.of("Content-Length", "Host", "Expect");
  private static final String CLIENT_NAME = JdkAsyncHttpClient.class.getSimpleName();
  private static final AttributeMap.Key<HttpClient> CLIENT_KEY =
      new AttributeMap.Key<HttpClient>(HttpClient.class) {};

  private final HttpClient client;
  private final Optional<Duration> timeout;

  private JdkAsyncHttpClient(AttributeMap attributes) {
    this.client =
        Optional.ofNullable(attributes.get(CLIENT_KEY))
            .orElseGet(
                () ->
                    HttpClient.newBuilder()
                        // 👇 use Java 21 virtual threads for new ✨ cheap, light-weight concurrency ✨
                        .executor(Executors.newVirtualThreadPerTaskExecutor())
                        .connectTimeout(
                            attributes.get(SdkHttpConfigurationOption.CONNECTION_TIMEOUT))
                        .version(
                            version(attributes.get(SdkHttpConfigurationOption.PROTOCOL)))
                        .build());
    this.timeout =
        Optional.ofNullable(attributes.get(SdkHttpConfigurationOption.READ_TIMEOUT));
  }

  @Override
  public String clientName() {
    return CLIENT_NAME;
  }

  // 👇 execute the request using your implementation's http client
  @Override
  public CompletableFuture<Void> execute(AsyncExecuteRequest request) {
    request
        .metricCollector()
        .ifPresent(metrics -> metrics.reportMetric(HTTP_CLIENT_NAME, CLIENT_NAME));
    return this.client
        .sendAsync(
            // 👇 convert aws request to std lib request
            convertRequest(request),
            info -> {
              // 👇 convert std lib response to aws sdk headers
              request.responseHandler().onHeaders(convertResponse(info));
              return HttpResponse.BodySubscribers.ofByteArray();
            })
        .thenCompose(
            resp -> {
              var fut = new CompletableFuture<Void>();
              // 👇 convert std lib response body to aws response stream
              request
                  .responseHandler()
                  .onStream(
                      (subscriber) -> {
                        var consumedOnce = new AtomicBoolean();
                        subscriber.onSubscribe(
                            new Subscription() {
                              @Override
                              public void request(long n) {
                                if (consumedOnce.getAndSet(true)) {
                                  subscriber.onComplete();
                                  // 👇 we're done
                                  fut.complete(null);
                                  return;
                                }
                                try {
                                  subscriber.onNext(ByteBuffer.wrap(resp.body()));
                                } catch (Exception e) {
                                  // 👇 request failed!
                                  fut.completeExceptionally(e);
                                }
                              }

                              @Override
                              public void cancel() {}
                            });
                      });
              return fut;
            });
  }

  @Override
  public void close() {}

  HttpRequest.BodyPublisher body(AsyncExecuteRequest request) {
    return switch (request.request().method()) {
      case PATCH, PUT, POST -> new HttpRequest.BodyPublisher() {
        @Override
        public void subscribe(Flow.Subscriber<? super ByteBuffer> subscriber) {
          // 👇 adapt reactive streams subscriber to std lib flow subscriber
          request.requestContentPublisher().subscribe(FlowAdapters.toSubscriber(subscriber));
        }

        @Override
        public long contentLength() {
          return request.requestContentPublisher().contentLength().orElse(-1L);
        }
      };
      default -> HttpRequest.BodyPublishers.noBody();
    };
  }

  HttpRequest convertRequest(AsyncExecuteRequest request) {
    var sdkRequest = request.request();
    var builder =
        request.request().headers().entrySet().stream()
            .filter(entry -> !RESTRICTED_HEADERS.contains(entry.getKey()))
            .reduce(
                HttpRequest.newBuilder()
                    .uri(sdkRequest.getUri())
                    .method(sdkRequest.method().name(), body(request)),
                (res, entry) ->
                    entry.getValue().stream()
                        .reduce(
                            res,
                            (res2, value) -> res2.header(entry.getKey(), value),
                            (prev, next) -> next),
                (prev, next) -> next);
    this.timeout.ifPresent(builder::timeout);
    return builder.build();
  }

  SdkHttpResponse convertResponse(HttpResponse.ResponseInfo response) {
    return SdkHttpResponse.builder()
        .statusCode(response.statusCode())
        .headers(response.headers().map())
        .build();
  }


  // 👇 typical aws sdk v2 pattern, provide a builder() and create() based interface
  public static Builder builder() {
    return new DefaultBuilder();
  }

  public static SdkAsyncHttpClient create() {
    return new DefaultBuilder().build();
  }

  // 👇 implement typical sdk builder interface
  public static interface Builder extends SdkAsyncHttpClient.Builder<JdkAsyncHttpClient.Builder> {
    Builder client(HttpClient client);
  }

  public static class DefaultBuilder implements Builder {
    private final AttributeMap.Builder standardOptions = AttributeMap.builder();

    private DefaultBuilder() {}

    @Override
    public Builder client(HttpClient client) {
      standardOptions.put(CLIENT_KEY, client);
      return this;
    }

    @Override
    public SdkAsyncHttpClient buildWithDefaults(AttributeMap serviceDefaults) {
      return new JdkAsyncHttpClient(
          standardOptions
              .build()
              .merge(serviceDefaults)
              .merge(SdkHttpConfigurationOption.GLOBAL_HTTP_DEFAULTS));
    }
  }

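  // 👇 Version is nested in HttpClient, so the java.net.http.* wildcard alone does not import it
  private static HttpClient.Version version(software.amazon.awssdk.http.Protocol awsProto) {
    return switch (awsProto) {
      case HTTP2 -> HttpClient.Version.HTTP_2;
      case HTTP1_1 -> HttpClient.Version.HTTP_1_1;
    };
  }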
}

The first thing worth noting about the code above is that it attempts to follow a “make it blend” principle of software design in order to make it feel familiar to consumers of the APIs. The goal was to make this client feel like any of the other default AWS SDK v2 HTTP clients, which means following a few design conventions.

The first design convention was to use a builder interface. Nearly everything in the AWS SDK v2 client is accompanied by a builder interface in the form of Foo.builder().build(), plus a shorthand version for defaults, Foo.create(). These provide a consistent and uniform API across a large surface area, making it easy to guess your way through the API efficiently. These builders tend to be an inner class implementing some interface. With the AWS SDK v2 HTTP clients, that interface is called SdkAsyncHttpClient.Builder<T>. In our case, that translated to SdkAsyncHttpClient.Builder<JdkAsyncHttpClient.Builder>. These builders are typically configured using a typed map of properties called AttributeMap, which is able to guarantee some invariants about the types of values at runtime through a type-safe interface. A note on this for implementors: most Java HTTP clients, the standard library’s included, come with their own configuration interfaces. A thoughtful addition to your implementation will likely include a client configuration option that accepts a preconfigured HTTP client.
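For readers who haven't met this convention before, the sketch below shows its general shape using only the standard library. Everything in it (Key, TypedOptions, Widget and the option names) is a hypothetical, simplified stand-in that I've made up for illustration; it is not the SDK's AttributeMap or any real SDK class.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class BuilderConventionSketch {

  // Hypothetical stand-in for a typed key: the key remembers the type of its value.
  static final class Key<T> {
    private final Class<T> type;

    Key(Class<T> type) {
      this.type = type;
    }
  }

  // Hypothetical stand-in for a typed map: values are cast through the key's type on read.
  static final class TypedOptions {
    private final Map<Key<?>, Object> values = new HashMap<>();

    <T> void put(Key<T> key, T value) {
      values.put(key, value);
    }

    <T> Optional<T> get(Key<T> key) {
      return Optional.ofNullable(key.type.cast(values.get(key)));
    }
  }

  static final Key<Duration> CONNECT_TIMEOUT = new Key<>(Duration.class);
  static final Key<String> NAME = new Key<>(String.class);

  static final class Widget {
    final Duration connectTimeout;
    final String name;

    private Widget(TypedOptions options) {
      // Fall back to defaults when the caller configured nothing.
      this.connectTimeout = options.get(CONNECT_TIMEOUT).orElse(Duration.ofSeconds(2));
      this.name = options.get(NAME).orElse("default");
    }

    // Foo.builder().build() for custom configuration...
    static Builder builder() {
      return new Builder();
    }

    // ...and Foo.create() as the shorthand for defaults.
    static Widget create() {
      return builder().build();
    }

    static final class Builder {
      private final TypedOptions options = new TypedOptions();

      Builder connectTimeout(Duration value) {
        options.put(CONNECT_TIMEOUT, value);
        return this;
      }

      Builder name(String value) {
        options.put(NAME, value);
        return this;
      }

      Widget build() {
        return new Widget(options);
      }
    }
  }

  public static void main(String[] args) {
    Widget custom = Widget.builder().name("custom").connectTimeout(Duration.ofSeconds(5)).build();
    System.out.println(Widget.create().name + " / " + custom.name + " " + custom.connectTimeout);
  }
}
```

The typed key is what lets a single map hold a Duration next to a String without callers casting by hand, which is roughly the property the SDK's own configuration map gives you.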

The next thing to note is that the AWS SDK really leans into a specific flavor of operating on streams of data, in particular Reactive Streams (the org.reactivestreams interfaces), which aims to be a service provider interface for other implementations. This is notable because if you’re going to write your own AWS SDK HTTP client, you’re likely going to bump into the need to adapt your client to these APIs.

What’s notable is that the standard library HttpClient leans into the standard library’s own reactive Flow API. Thankfully, the Reactive Streams library authors provide adapters (FlowAdapters) for Java 9+ runtimes that translate between the two sets of interfaces, which made it simpler to adapt to the Reactive Streams APIs required by the AWS SDK.

It’s worth noting that in Java 9+ a version of the Reactive Streams interfaces is now part of the standard library as the Flow type. Because the AWS Java team needs to support millions of customers, they can’t easily drop support for the portion of customers running applications on Java 8. If they were able to use the standard library Flow types here, it would have been a better choice, as it would mean one fewer third-party dependency exposed in their public APIs.
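To give a feel for what these interfaces look like in practice, here is a small sketch that exposes an already-buffered byte array as a single-element stream using only the standard library's java.util.concurrent.Flow types. The class and method names are made up for illustration. Note that this is a variation on the approach in the client above, not a copy of it: it signals completion immediately after the single element rather than waiting for a second request, which Reactive Streams permits since onComplete does not require outstanding demand.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicBoolean;

public class OneShotPublisherSketch {

  // Publishes the whole body as one buffer on the first positive request, then completes.
  static Flow.Publisher<ByteBuffer> oneShot(byte[] body) {
    return subscriber -> subscriber.onSubscribe(new Flow.Subscription() {
      private final AtomicBoolean done = new AtomicBoolean();

      @Override
      public void request(long n) {
        if (n <= 0) {
          subscriber.onError(new IllegalArgumentException("demand must be positive"));
          return;
        }
        // Only the first request emits; later requests are no-ops.
        if (done.compareAndSet(false, true)) {
          subscriber.onNext(ByteBuffer.wrap(body));
          subscriber.onComplete();
        }
      }

      @Override
      public void cancel() {
        // Marking the stream as done prevents any later emission.
        done.set(true);
      }
    });
  }

  // Subscribes with unbounded demand and records the signals it receives, in order.
  static List<String> collect(byte[] body) {
    List<String> events = new ArrayList<>();
    oneShot(body).subscribe(new Flow.Subscriber<ByteBuffer>() {
      @Override
      public void onSubscribe(Flow.Subscription subscription) {
        subscription.request(Long.MAX_VALUE);
      }

      @Override
      public void onNext(ByteBuffer item) {
        events.add("next:" + StandardCharsets.UTF_8.decode(item));
      }

      @Override
      public void onError(Throwable error) {
        events.add("error");
      }

      @Override
      public void onComplete() {
        events.add("complete");
      }
    });
    return events;
  }

  public static void main(String[] args) {
    System.out.println(collect("hello".getBytes(StandardCharsets.UTF_8)));
  }
}
```

The org.reactivestreams interfaces have the same four signals with the same semantics, so code written against one set translates mechanically to the other.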

The next thing to note is that we ran into a few issues with HTTP headers that Java’s native HTTP client restricts but which the AWS SDK client request supplies. Unfortunately these header names are not documented, but we’ve collected the set we ran into issues with and included them in the code above so you don’t have to guess yourself.
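The behavior is easy to reproduce in isolation. The sketch below, which uses only the standard library, shows the JDK request builder rejecting a restricted header and one way to filter such headers out before building a request. The class name, helper names and sample header values are assumptions for illustration, and unlike the Set.of lookup in the client above this variant compares names case-insensitively.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RestrictedHeadersSketch {

  // The same three names the client above filters, held in a case-insensitive set.
  static final Set<String> RESTRICTED = caseInsensitive("Content-Length", "Host", "Expect");

  static Set<String> caseInsensitive(String... names) {
    Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    set.addAll(List.of(names));
    return set;
  }

  // Returns true when the JDK's request builder refuses to accept the header name.
  static boolean rejectedByJdk(String name) {
    try {
      HttpRequest.newBuilder().header(name, "value");
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  // Copies every non-restricted header onto a new request; the JDK client sets the rest itself.
  static HttpRequest build(URI uri, Map<String, List<String>> headers) {
    HttpRequest.Builder builder = HttpRequest.newBuilder().uri(uri);
    headers.forEach((name, values) -> {
      if (!RESTRICTED.contains(name)) {
        values.forEach(value -> builder.header(name, value));
      }
    });
    return builder.build();
  }

  public static void main(String[] args) {
    HttpRequest request = build(
        URI.create("https://example.com/"),
        Map.of(
            "host", List.of("example.com"),
            "X-Amz-Target", List.of("AmazonSQS.SendMessage")));
    System.out.println(request.headers().map().keySet());
    System.out.println("Host rejected by the JDK builder: " + rejectedByJdk("Host"));
  }
}
```

Filtering is safe here because the JDK client computes headers such as Host and Content-Length itself from the URI and the body publisher.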

The final thing to note is how SDK users are going to expect to wire in your client. There are two paths here: an implicit path (you’ll find this with the default SDK HTTP clients) and an explicit path.

For the implicit path, the SDK supports Java’s native ServiceLoader capability to dynamically resolve a list of implementations for a given interface. To participate in this, you’ll need to define an additional class, which serves mainly as a supplier of instances of your implementation, and a resource file that registers that class with the SDK’s lookup path.

package com.meetup.awssdk.jdk;

import software.amazon.awssdk.http.async.SdkAsyncHttpClient;
import software.amazon.awssdk.http.async.SdkAsyncHttpService;

public class JdkSdkAsyncHttpService implements SdkAsyncHttpService {
  @Override
  public SdkAsyncHttpClient.Builder<JdkAsyncHttpClient.Builder> createAsyncHttpClientFactory() {
    return JdkAsyncHttpClient.builder();
  }
}

In a separate resource file named after the interface in a conventional location, simply include the fully qualified name of your supplier.

$ cat src/main/resources/META-INF/services/software.amazon.awssdk.http.async.SdkAsyncHttpService
com.meetup.awssdk.jdk.JdkSdkAsyncHttpService

While our implementation supports this, we prefer to be explicit, to make it clear that we’re intentionally using a custom HTTP client in our repository. Otherwise, were we to discover a bug in our implementation, callers would never realize they were depending on a custom implementation! In practice this looks as follows:

var sqs = SqsAsyncClient.builder().httpClient(JdkAsyncHttpClient.create()).build();

You may have noticed that our goal here was to reduce the cost of upgrading, and with it our dependency footprint, but that this seemed to come at the expense of writing a bit of code. This might not seem immediately intuitive, because now we have new code to maintain, and no one likes maintaining code, right?

This is true. However, the amount of code is small in comparison to what we’d be bringing in otherwise, and in the process of writing it we’ve gained extensive experience with, and knowledge of, how the AWS SDK v2 works that we would likely not have otherwise. The code we did need to write will likely not need to change, as it’s extremely focused on a stable API and itself has zero transitive dependencies outside of the standard library. This is important, as we rely on AWS SDKs for all of our essential core platform features. We believe that building a foundation of knowledge of what you depend on reduces your engineering costs over time, especially when it comes to understanding your dependencies.

This exercise has reduced our upgrade cost in more ways than one. We are now not only more operationally efficient but also more mindful of how our systems work.
