HTTP Caching in Ballerina - Part I

Ballerina 0.970.0 introduced HTTP caching support. It does so by adding a caching layer on top of the plain HTTP client in Ballerina. The caching layer, complies with the RFC 7234 specification (it is not a complete implementation of the spec though). In addition to this, APIs are also provided for handling caching at the server side. In this post, we’ll take a look at the client side of caching using Ballerina 0.975.0.

Prerequisites

A Simple Caching Example

The following program depicts a simple service which sends a GET request to the backend (which in this case happens to be the /HTTP/cc.html resource of https://jigsaw.w3.org).

Caching is enabled by default for HTTP client endpoints. So yes, you are utilizing HTTP caching whenever you use an HTTP client endpoint in Ballerina. The cache attribute of the HTTP client endpoint configuration can be used to configure the caching behaviour of the endpoint. The following configurations are available in the cache configuration:

  • enabled - Enables/disables HTTP caching for the endpoint (default val: true)
  • isShared - Marks the endpoint as a shared cache if set to true (default val: false)
  • policy - Sets the caching policy to follow when using caching. Currently the users have two options: CACHE_CONTROL_AND_VALIDATORS and RFC_7234. The former caches responses only when they contain the Cache-Control header and a validator (either the ETag header or the Last-Modified header). This is also the default policy used and gives a more predictable caching behaviour. The latter option just simply follows the RFC7234 specification when caching and serving responses.
  • expiryTimeMillis - Sets the cache expiry time for the underlying cache used to store HTTP responses (default val: 86400)
  • capacity - Sets the maximum capacity of the underlying cache used in bytes (default val: 8MB)
  • evictionFactor - Sets the percentage of cache entries to evict when the cache is full. This is a value between 0 and 1 (default val: 0.2)

The last 3 configurations are for the underlying cache used for storing the responses. The cache used is an in-memory caching implementation available in the Ballerina standard library (ballerina/cache).

Running the Caching Service

To get a better understanding of what’s going on in here, let’s run this with debug logs and HTTP trace logs on.

$ ballerina run caching-client.bal -e b7a.log.level=DEBUG -e b7a.http.tracelog.path=./http.trace

The above command will publish the debug logs to the console and the HTTP trace log to a file in the current directory, named http.trace. Now invoke the service using an HTTP client:

$ curl -v http://localhost:9090/cache

Note that the response the client receives contains several headers used in caching, including the ETag and Cache-Control headers.

Cache-Control: max-age=120
Etag: "j366kj:q0efehi8"

The Cache-Control header only contains the max-age directive. The max-age directive tells the HTTP cache that the response is valid for 120 seconds (i.e., After 120 seconds, the response will become stale and it will have to be validated by the origin server before it can be served again. The ETag is used to uniquely identify the resource and will be used in the conditional requests sent to the origin server to validate the cached responses.

The following is a sample debug log generated while sending some requests to the service.

# Request 1
2018-06-30 16:52:58,998 DEBUG [ballerina/http] - Cached response not found for: 'GET /HTTP/cc.html'
2018-06-30 16:52:58,998 DEBUG [ballerina/http] - Sending new request to: /HTTP/cc.html
2018-06-30 16:53:00,767 DEBUG [ballerina/http] - Adding new cache entry for: GET /HTTP/cc.html
# Request 2
2018-06-30 16:53:24,986 DEBUG [ballerina/http] - Cached response found for: 'GET /HTTP/cc.html'
2018-06-30 16:53:24,999 DEBUG [ballerina/http] - Serving a cached fresh response without validating with the origin server
# Request 3
2018-06-30 16:55:41,727 DEBUG [ballerina/http] - Cached response found for: 'GET /HTTP/cc.html'
2018-06-30 16:55:41,730 DEBUG [ballerina/http] - Validating a stale response for '/HTTP/cc.html' with the origin server.
2018-06-30 16:55:41,730 DEBUG [ballerina/http] - Sending validation request for a stale response
2018-06-30 16:55:42,888 DEBUG [ballerina/http] - Response for validation request received
2018-06-30 16:55:42,912 DEBUG [ballerina/http] - Updating response headers using validation response.
2018-06-30 16:55:42,914 DEBUG [ballerina/http] - 304 response received, with a strong validator. Response(s) updated
# Request 4
2018-06-30 16:56:03,181 DEBUG [ballerina/http] - Cached response found for: 'GET /HTTP/cc.html'
2018-06-30 16:56:03,183 DEBUG [ballerina/http] - Serving a cached fresh response without validating with the origin server

Since the cache is empty initially, the first request results in getting an actual response from the upstream server and storing it in the cache.

For the second request, a cached response is found and it’s still fresh. Therefore this is served to the user without any involvement of the upstream server.

The third request is sent 2 minutes after the first request. By this time the cached response has become stale (since the max-age directive was set to 120 seconds), but it’s still in the cache. However, it cannot be served without first confirming with the origin server that the response is still valid. For this, the cache will make a conditional request to the origin server and if the requested resource hasn’t changed, the origin server will respond with a 304 Not Modified response. In this case, the resource hasn’t changed, so the cached response is “freshened” and the cache can serve it without validating for another 2 minutes.

Since the response is freshened, the fourth request is again served straight from the cache.

The HTTP trace log (the http.trace file) will have finer details (headers of the requests and responses) of the HTTP traffic coming into and going out of the caching service. For the first request, the HTTP traffic would contain a set of headers similar to the following:

[2018-06-30 16:52:58,972] TRACE {http.tracelog.downstream} - [id: 0x2455ebae, correlatedSource: n/a, host:/127.0.0.1:9090 - remote:/127.0.0.1:35538] INBOUND: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET /cache HTTP/1.1
Host: localhost:9090
User-Agent: curl/7.58.0
Accept: */*
[2018-06-30 16:53:00,440] TRACE {http.tracelog.upstream} - [id: 0x195aaaf8, correlatedSource: 0x2455ebae, host:/192.168.1.5:48804 - remote:jigsaw.w3.org/128.30.52.21:443] OUTBOUND: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET /HTTP/cc.html HTTP/1.1
host: jigsaw.w3.org
user-agent: ballerina/0.975.0
connection: keep-alive
[2018-06-30 16:53:00,715] TRACE {http.tracelog.upstream} - [id: 0x195aaaf8, correlatedSource: 0x2455ebae, host:/192.168.1.5:48804 - remote:jigsaw.w3.org/128.30.52.21:443] INBOUND: DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
Cache-Control: max-age=120
Date: Sat, 30 Jun 2018 11:23:01 GMT
Content-Length: 551
Content-Language: en
Content-Type: text/html
Etag: "j366kj:q0efehi8"
Expires: Sat, 30 Jun 2018 11:25:01 GMT
Last-Modified: Tue, 20 Jun 2000 22:47:14 GMT
Server: Jigsaw/2.3.0-beta2
Strict-Transport-Security: max-age=15552015; includeSubDomains; preload
Public-Key-Pins: pin-sha256="cN0QSpPIkuwpT6iP2YjEo1bEwGpH/yiUn6yhdy+HNto="; pin-sha256="WGJkyYjx1QMdMe0UqlyOKXtydPDVrk7sl2fV+nNm1r4="; pin-sha256="LrKdTxZLRTvyHM4/atX2nquX9BeHRZMCxg3cf4rhc2I="; max-age=864000
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
[2018-06-30 16:53:00,797] TRACE {http.tracelog.downstream} - [id: 0x2455ebae, correlatedSource: n/a, host:localhost/127.0.0.1:9090 - remote:localhost/127.0.0.1:35538] OUTBOUND: DefaultFullHttpResponse(decodeResult: success, version: HTTP/1.1, content: CompositeByteBuf(ridx: 0, widx: 551, cap: 551, components=1))
HTTP/1.1 200 OK
Cache-Control: max-age=120
Date: Sat, 30 Jun 2018 11:23:01 GMT
Content-Language: en
Content-Type: text/html
Etag: "j366kj:q0efehi8"
Expires: Sat, 30 Jun 2018 11:25:01 GMT
Last-Modified: Tue, 20 Jun 2000 22:47:14 GMT
Strict-Transport-Security: max-age=15552015; includeSubDomains; preload
Public-Key-Pins: pin-sha256="cN0QSpPIkuwpT6iP2YjEo1bEwGpH/yiUn6yhdy+HNto="; pin-sha256="WGJkyYjx1QMdMe0UqlyOKXtydPDVrk7sl2fV+nNm1r4="; pin-sha256="LrKdTxZLRTvyHM4/atX2nquX9BeHRZMCxg3cf4rhc2I="; max-age=864000
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
server: Jigsaw/2.3.0-beta2
content-length: 551

There are 2 sets for both request and response headers since for the first request, the origin server is contacted and the response it sends is forwarded to the client. But for subsequent requests until the 120 second time period expires, the origin server is not contacted, and as such, there would only be traffic from the client to the caching service.

[2018-06-30 16:53:24,983] TRACE {http.tracelog.downstream} - [id: 0x379afa3c, correlatedSource: n/a, host:/127.0.0.1:9090 - remote:/127.0.0.1:35558] INBOUND: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET /cache HTTP/1.1
Host: localhost:9090
User-Agent: curl/7.58.0
Accept: */*
[2018-06-30 16:54:35,017] TRACE {http.tracelog.downstream} - [id: 0xe948cfbb, correlatedSource: n/a, host:localhost/127.0.0.1:9090 - remote:localhost/127.0.0.1:35588] OUTBOUND: DefaultFullHttpResponse(decodeResult: success, version: HTTP/1.1, content: CompositeByteBuf(ridx: 0, widx: 551, cap: 551, components=1))
HTTP/1.1 200 OK
Cache-Control: max-age=120
Date: Sat, 30 Jun 2018 11:23:01 GMT
Content-Language: en
Content-Type: text/html
Etag: "j366kj:q0efehi8"
Expires: Sat, 30 Jun 2018 11:25:01 GMT
Last-Modified: Tue, 20 Jun 2000 22:47:14 GMT
Strict-Transport-Security: max-age=15552015; includeSubDomains; preload
Public-Key-Pins: pin-sha256="cN0QSpPIkuwpT6iP2YjEo1bEwGpH/yiUn6yhdy+HNto="; pin-sha256="WGJkyYjx1QMdMe0UqlyOKXtydPDVrk7sl2fV+nNm1r4="; pin-sha256="LrKdTxZLRTvyHM4/atX2nquX9BeHRZMCxg3cf4rhc2I="; max-age=864000
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
age: 94
server: Jigsaw/2.3.0-beta2
content-length: 551

As it can be seen an Age header has been added to the response which depicts the time since the response was last validated. In this case, the response has been there in the cache for approximately 94 seconds, without being validated by the origin server.

Then, for the first request the caching service receives after the 120 second time period since the original request, the headers would look something like below:

[2018-06-30 16:55:41,723] TRACE {http.tracelog.downstream} - [id: 0xddcb6413, correlatedSource: n/a, host:/127.0.0.1:9090 - remote:/127.0.0.1:35620] INBOUND: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET /cache HTTP/1.1
Host: localhost:9090
User-Agent: curl/7.58.0
Accept: */*
[2018-06-30 16:55:42,630] TRACE {http.tracelog.upstream} - [id: 0x8b30b744, correlatedSource: 0xddcb6413, host:/192.168.1.5:48886 - remote:jigsaw.w3.org/128.30.52.21:443] OUTBOUND: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET /HTTP/cc.html HTTP/1.1
if-none-match: "j366kj:q0efehi8"
if-modified-since: Tue, 20 Jun 2000 22:47:14 GMT
host: jigsaw.w3.org
user-agent: ballerina/0.975.0
connection: keep-alive
[2018-06-30 16:55:42,884] TRACE {http.tracelog.upstream} - [id: 0x8b30b744, correlatedSource: 0xddcb6413, host:/192.168.1.5:48886 - remote:jigsaw.w3.org/128.30.52.21:443] INBOUND: DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 304 Not Modified
Cache-Control: max-age=120
Date: Sat, 30 Jun 2018 11:25:43 GMT
Etag: "j366kj:q0efehi8"
Expires: Sat, 30 Jun 2018 11:27:43 GMT
Server: Jigsaw/2.3.0-beta4
Strict-Transport-Security: max-age=15552015; includeSubDomains; preload
Public-Key-Pins: pin-sha256="cN0QSpPIkuwpT6iP2YjEo1bEwGpH/yiUn6yhdy+HNto="; pin-sha256="WGJkyYjx1QMdMe0UqlyOKXtydPDVrk7sl2fV+nNm1r4="; pin-sha256="LrKdTxZLRTvyHM4/atX2nquX9BeHRZMCxg3cf4rhc2I="; max-age=864000
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
[2018-06-30 16:55:42,916] TRACE {http.tracelog.downstream} - [id: 0xddcb6413, correlatedSource: n/a, host:localhost/127.0.0.1:9090 - remote:localhost/127.0.0.1:35620] OUTBOUND: DefaultFullHttpResponse(decodeResult: success, version: HTTP/1.1, content: CompositeByteBuf(ridx: 0, widx: 551, cap: 551, components=1))
HTTP/1.1 200 OK
Content-Language: en
Content-Type: text/html
Last-Modified: Tue, 20 Jun 2000 22:47:14 GMT
Cache-Control: max-age=120
Date: Sat, 30 Jun 2018 11:25:43 GMT
Etag: "j366kj:q0efehi8"
Expires: Sat, 30 Jun 2018 11:27:43 GMT
Public-Key-Pins: pin-sha256="cN0QSpPIkuwpT6iP2YjEo1bEwGpH/yiUn6yhdy+HNto="; pin-sha256="WGJkyYjx1QMdMe0UqlyOKXtydPDVrk7sl2fV+nNm1r4="; pin-sha256="LrKdTxZLRTvyHM4/atX2nquX9BeHRZMCxg3cf4rhc2I="; max-age=864000
Strict-Transport-Security: max-age=15552015; includeSubDomains; preload
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
age: 1
server: Jigsaw/2.3.0-beta2
content-length: 551

There are 2 sets of request and response headers here as well, since to validate the stale response the cache needs to contact the origin server. This time, the request sent to the origin server contains two additional headers added by the cache:

if-none-match: "j366kj:q0efehi8"
if-modified-since: Tue, 20 Jun 2000 22:47:14 GMT

Since the response contained both an ETag and a last modified date, the validation request sent contains conditions for both validators. The conditions specified through these headers allow the origin server to check and verify whether the representation of the resource has changed from the representation the caching service has. Since it has not changed, the origin server sends a 304 response.

The headers of the 304 response is used to freshen the stale response (e.g., compare the Date and Expires headers from the previous responses with the above responses). Once it is freshen, the response can be served for another 120 seconds without being validated with the origin server. Note how the Age header has reset in the response sent to the client.

So, that’s about it for the client side of caching in Ballerina. In the next part of this series, we’ll take a look at the server side APIs and constructs provided in Ballerina for making the resources of your service cache-able.