Building a Data Layer with Kotlin and DynamoDB
We are building MailPlusPlus, an application to ingest, process and route emails. Users will be able to create and manage any number of email addresses. They can create rules, with optional machine learning, to process, route and store incoming messages.
We use Kotlin for its expressiveness, and because it is a stable JVM-based language. DynamoDB was an obvious choice for the database since we are hosting this application on AWS. I am using Spring Boot to implement REST APIs that support the user interface and serve the AWS Lambda functions that handle mail processing.
Kotlin with DynamoDB is not the best documented combination. There are a few quirks, but once I got past them, the tech stack worked admirably. I hope that my thoughts here will be helpful to someone.
Designing the Schema
When designing a data model, the first question I ask myself is “what queries will I need to support?” This application has two distinct sets of queries, one for the user interface and one for mail processing.
On the user interface side, we are configuring email processing rules. We need to be able to query the rules that exist for each account. We also need to be able to access account data, as well as data on stored emails.
On the mail processing side, we need to be able to get the rules associated with an address for incoming mail, as well as account data. There is a small complication: we need to search for rules given an account, and we also need to search for accounts using a rule. We accomplish this by putting the data in two tables. This requires transactions, because you want to ensure the data in the two tables stays consistent.
Since we are using DynamoDB, there are a few relevant features for this design.
- In DynamoDB an item is identified by either a simple key (a single ‘partition’ key) or a composite key which combines a ‘partition’ key with a second ‘sort’ key. Obviously the fastest way to access the data is with the key.
- You can query based on the initial characters of a sort key. A Query has to name the partition key exactly, but it can restrict the sort key with begins_with, for example to every sort key starting with “NE-” (which may be useful for geographic data). You can also query on the partition part of a composite key alone and get all the items for that partition.
- DynamoDB supports both PutItem and UpdateItem. PutItem replaces the item (overwriting or removing existing fields). UpdateItem allows you to replace or delete certain fields while leaving the rest of the item intact (which turns out to be important).
- DynamoDB supports transactional writes across tables. This means I can update two tables at the same time knowing they will succeed together or fail together. This solves the rules-account problem I mentioned above.
- DynamoDB (like most NoSQL databases) allows you to put different types of items, with different fields, in the same table. This allows us to use polymorphism (more on that later).
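The transactional write from the list above can be sketched with the AWS SDK for Kotlin roughly as follows. This is a minimal sketch, not the project's actual code: the function names are hypothetical, the key attribute names are taken from the snippets later in this article, and it assumes that “emailAddresses” and “ruleIds” are stored as string sets so that the ADD action can append to them.

```kotlin
import aws.sdk.kotlin.services.dynamodb.DynamoDbClient
import aws.sdk.kotlin.services.dynamodb.model.AttributeValue
import aws.sdk.kotlin.services.dynamodb.model.TransactWriteItem
import aws.sdk.kotlin.services.dynamodb.model.TransactWriteItemsRequest
import aws.sdk.kotlin.services.dynamodb.model.Update

// Hypothetical helper: builds one transaction that records the link between
// a rule and an email address in both tables.
fun linkAddressToRuleRequest(account: String, ruleId: String, emailAddress: String) =
    TransactWriteItemsRequest {
        transactItems = listOf(
            // Rules table: remember which address the rule applies to.
            TransactWriteItem {
                update = Update {
                    tableName = "rules"
                    key = mapOf(
                        "account" to AttributeValue.S(account),
                        "ruleId" to AttributeValue.S(ruleId),
                    )
                    // Assumption: emailAddresses is a string set.
                    updateExpression = "ADD emailAddresses :address"
                    expressionAttributeValues =
                        mapOf(":address" to AttributeValue.Ss(listOf(emailAddress)))
                }
            },
            // Addresses table: remember which rule to run for the address.
            TransactWriteItem {
                update = Update {
                    tableName = "emailAddresses"
                    key = mapOf("emailAddress" to AttributeValue.S(emailAddress))
                    // Assumption: ruleIds is a string set.
                    updateExpression = "ADD ruleIds :rule"
                    expressionAttributeValues =
                        mapOf(":rule" to AttributeValue.Ss(listOf(ruleId)))
                }
            },
        )
    }

// Both updates are applied together, or neither is.
suspend fun linkAddressToRule(
    ddb: DynamoDbClient,
    account: String,
    ruleId: String,
    emailAddress: String,
) {
    ddb.transactWriteItems(linkAddressToRuleRequest(account, ruleId, emailAddress))
}
```

Keeping the request construction separate from the network call makes the transaction easy to inspect in a unit test without a running database.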
The data design consists of the following tables.
The rules table has account and ruleID (a UUID) as the composite key. There are currently two types of rules with different sets of fields. Each item in the rules table has an “emailAddresses” field which keeps track of associated addresses.
The emailAddresses table has emailAddress as the primary key. AccountId is an indexed field (it can be queried quickly). There is also a ruleIds field for each item that contains a list of rules to run for incoming mail.
The emails table has emailAddress as the partition key, and emailId as the sort key. I create the emailId by prefixing the date (e.g. ‘2023-07-04’) to the message ID assigned to the incoming email by Amazon SES. This allows me to restrict a query by date. The raw emails are stored on S3. Items in this table will contain the results of processing (e.g. with ML models) and the status of rules.
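To make the date-prefixed sort key concrete, here is a minimal sketch of how such a key could be built and then used to fetch one day of mail for an address. The “#” separator and both function names are assumptions for illustration; the table and attribute names follow the description above.

```kotlin
import aws.sdk.kotlin.services.dynamodb.model.AttributeValue
import aws.sdk.kotlin.services.dynamodb.model.QueryRequest
import java.time.LocalDate

// Hypothetical separator between the date and the SES message ID.
const val SORT_KEY_SEPARATOR = "#"

// LocalDate.toString() yields ISO-8601 (yyyy-MM-dd), which sorts
// chronologically when compared as text.
fun emailSortKey(received: LocalDate, sesMessageId: String): String =
    "$received$SORT_KEY_SEPARATOR$sesMessageId"

// Builds a query for every email one address received on a given day:
// exact match on the partition key, prefix match on the sort key.
fun emailsOnDateRequest(address: String, day: LocalDate) = QueryRequest {
    tableName = "emails"
    keyConditionExpression = "emailAddress = :address AND begins_with(emailId, :day)"
    expressionAttributeValues = mapOf(
        ":address" to AttributeValue.S(address),
        ":day" to AttributeValue.S(day.toString()),
    )
}
```

For a span of several days, the same idea works with BETWEEN on the sort key in place of begins_with.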
Developing Locally
The first thing I did was get DynamoDB running on my laptop. DynamoDB will be hosted on the AWS cloud in any production environment, but the ability to run a local instance of the database while developing is nice to have.
To start a local instance of DynamoDB, I run this Docker command with no issues.
docker run -p 8000:8000 amazon/dynamodb-local -jar DynamoDBLocal.jar -sharedDb
The ‘-sharedDb’ option ensures that the CLI and the SDK see the same data. This makes playing around (and debugging) much easier.
Then you can run aws CLI commands using ‘--endpoint-url http://localhost:8000’ to point to your local database (in a Docker container).
This is my Bash script to create a new database table and then list tables. The AWS access key and secret are dummies. With the local DynamoDB instance they need to be there, but then they are ignored.
# Dummy credentials: DynamoDB Local requires them to be set but ignores the values.
export AWS_ACCESS_KEY_ID=X
export AWS_SECRET_ACCESS_KEY=X

# The sort key is named ruleId to match the attribute used in the Kotlin code.
aws dynamodb create-table \
    --table-name rules \
    --attribute-definitions AttributeName=account,AttributeType=S AttributeName=ruleId,AttributeType=S \
    --key-schema AttributeName=account,KeyType=HASH AttributeName=ruleId,KeyType=RANGE \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
    --endpoint-url http://localhost:8000 \
    --region us-east-1

aws dynamodb --endpoint-url http://localhost:8000 --region us-east-1 list-tables
The thrill of victory and the agony of dependencies
Kotlin and DynamoDB dependencies are even less well documented than usual, and the advice on the internet conflicts. I ended up needing the following dependencies. This is Gradle with the Groovy DSL; you can adapt it to your particular build system.
implementation 'aws.sdk.kotlin:aws-core:0.25.0-beta'
implementation 'aws.sdk.kotlin:dynamodb:0.25.0-beta'
implementation 'com.squareup.okhttp3:okhttp:5.0.0-alpha.11'
implementation 'javax.servlet:javax.servlet-api:4.0.1'
It took me several hours to tease these out from some weird runtime errors. I hope this is helpful to someone.
Configuring the DynamoDB client bean in Spring Boot
The main configuration task is to build a DynamoDbClient to autowire into your services. I use Spring Boot profiles to specify two beans: one for “local” and the other for “!local”. I provide fake access credentials for the local DynamoDB instance.
When I run this application as “not local” it is running on an EC2 instance in AWS. In this case the access credentials aren’t needed. I define an IAM role for the EC2 instance that gives the necessary permissions.
My configuration file has the following two functions.
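import aws.sdk.kotlin.runtime.auth.credentials.StaticCredentialsProvider
import aws.sdk.kotlin.services.dynamodb.DynamoDbClient
import aws.smithy.kotlin.runtime.net.Url // this package has moved between SDK releases
import org.springframework.beans.factory.annotation.Value
import org.springframework.beans.factory.config.ConfigurableBeanFactory
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Profile
import org.springframework.context.annotation.Scope

// The members below live inside a @Configuration class.

// Optional: when the property is absent, the SDK's default endpoint is used.
@Value("\${dynamodb.endpoint:#{null}}")
var endpoint: String? = null

@Value("\${dynamodb.region:us-east-1}")
var dbregion: String = "us-east-1"

// On AWS the credentials come from the IAM role of the EC2 instance.
@Bean
@Profile("!local")
@Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
fun dynamoDbClientBuilder(): DynamoDbClient.Builder {
    val builder = DynamoDbClient.builder()
    builder.config.region = dbregion
    endpoint?.let { builder.config.endpointUrl = Url.parse(it) }
    return builder
}

@Bean
@Profile("local")
@Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
fun dynamoDbClientBuilderLocal(): DynamoDbClient.Builder {
    val builder = DynamoDbClient.builder()
    builder.config.region = dbregion
    // Fake credentials for the local profile: DynamoDB Local ignores the values.
    builder.config.credentialsProvider = StaticCredentialsProvider {
        accessKeyId = "AAABBB"
        secretAccessKey = "BBBAAA"
    }
    endpoint?.let { builder.config.endpointUrl = Url.parse(it) }
    return builder
}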
Adding data: Put versus update
To put data into a DynamoDB table you have two options. PutItem is allegedly for adding a new item to the table. UpdateItem is allegedly for modifying an existing item. In reality, both operations can add a new item and modify an existing one.
The difference is that PutItem will replace every non-key field in an item including removing fields that aren’t in the request. UpdateItem will only modify non-key fields that are specifically mentioned in the request (leaving the other fields as they were).
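The difference is easiest to see on a toy model. The sketch below does not use the SDK at all: it treats an item as a plain map of strings and imitates the two behaviours, so the type alias and both function names are purely illustrative.

```kotlin
// Items are modelled as plain maps. In an update, a null value means
// "delete this field".
typealias Item = Map<String, String>

// PutItem semantics: whatever was stored before is replaced wholesale,
// so the existing item is ignored.
fun putItem(existing: Item?, newItem: Item): Item = newItem

// UpdateItem semantics: start from the stored item (or from just the key if
// nothing is stored yet) and touch only the fields named in the request.
fun updateItem(existing: Item?, key: Item, changes: Map<String, String?>): Item {
    val result = (existing ?: key).toMutableMap()
    for ((field, value) in changes) {
        if (value == null) result.remove(field) else result[field] = value
    }
    return result
}
```

Given a stored rule that has a “timesUsed” field, putting a new version without that field loses it, while updating only “description” keeps it.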
The AttributeValue class is used in a PutItemRequest. It lets us specify the data type as well as the value. For example:
val ruleName = AttributeValue.S("DoctorsNotes")
Here “S” marks ruleName to be stored as a String.
So, we define an item as a Map; the keys are field names, the values are AttributeValues. Here is the code to store a few fields in an item in the “rules” table.
The runBlocking function is required because we are using a suspend function (read about Kotlin coroutines if this doesn’t mean anything). The dynamoDbClientBuilder bean can be autowired from the Configuration beans I list above.
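import aws.sdk.kotlin.services.dynamodb.model.AttributeValue
import aws.sdk.kotlin.services.dynamodb.model.PutItemRequest
import kotlinx.coroutines.runBlocking

val itemValues = mutableMapOf<String, AttributeValue>()
itemValues["account"] = AttributeValue.S("123")
itemValues["ruleId"] = AttributeValue.S("S3295")
itemValues["description"] = AttributeValue.S("Respond to Bitcoin offer")

val request = PutItemRequest {
    tableName = "rules"
    item = itemValues
}

// dynamoDbClientBuilder is the autowired builder bean; use {} closes the client afterwards.
val response = runBlocking {
    dynamoDbClientBuilder.build().use { ddb ->
        ddb.putItem(request)
    }
}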
This has the effect of removing any additional fields (other than “account” and “ruleId”) that were in an existing version of the item. If you had “lastReceivedDate” or “timesUsed” fields, they are now gone. The alternative is an UpdateItemRequest. Instead of AttributeValue objects, we now use AttributeValueUpdate objects. Each specifies an AttributeValue as well as an AttributeAction, typically Put or Delete (Add also exists, for numbers and sets). Setting up a field in the item map would now look like this.
itemValues["description"] = AttributeValueUpdate {
    value = AttributeValue.S("Respond to Bitcoin Offer")
    action = AttributeAction.Put
}
The benefit of this is that I can now easily write functions to modify data based on REST requests from the user API. Any additional data I store as part of this item, such as associated email addresses or the number of messages processed, won’t be touched when the user interface modifies data.
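For completeness, here is a minimal sketch of a whole UpdateItemRequest built this way. The function names are hypothetical. Note that the key attributes identify the item and go in the separate key map, not in the map of updates.

```kotlin
import aws.sdk.kotlin.services.dynamodb.DynamoDbClient
import aws.sdk.kotlin.services.dynamodb.model.AttributeAction
import aws.sdk.kotlin.services.dynamodb.model.AttributeValue
import aws.sdk.kotlin.services.dynamodb.model.AttributeValueUpdate
import aws.sdk.kotlin.services.dynamodb.model.UpdateItemRequest

// Hypothetical helper: changes only the description of one rule.
fun updateDescriptionRequest(account: String, ruleId: String, newDescription: String) =
    UpdateItemRequest {
        tableName = "rules"
        // Key attributes select the item to update.
        key = mapOf(
            "account" to AttributeValue.S(account),
            "ruleId" to AttributeValue.S(ruleId),
        )
        // Only the fields listed here are modified; all others are left alone.
        attributeUpdates = mapOf(
            "description" to AttributeValueUpdate {
                value = AttributeValue.S(newDescription)
                action = AttributeAction.Put
            },
        )
    }

suspend fun updateDescription(
    ddb: DynamoDbClient,
    account: String,
    ruleId: String,
    newDescription: String,
) {
    ddb.updateItem(updateDescriptionRequest(account, ruleId, newDescription))
}
```

The DynamoDB API reference describes AttributeUpdates as a legacy parameter and recommends UpdateExpression for new code, so it may be worth migrating at some point. The map-based form is shown here because it matches the approach in this article.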
Implementing DAOs
I spent half a day playing with Spring Data using DynamoDB and decided against it. This was partially because good, complete documentation is hard to find, and things that were easy with Spring Data JPA seemed much more difficult with Spring Data DynamoDB. I am also implementing polymorphic data classes (that is, storing multiple classes in one table), which complicates things. There are articles on Baeldung and other places if you wish to go the Spring Data path.
We need a way to convert maps with AttributeValues into data classes, and data classes into maps of AttributeValues or AttributeValueUpdates. Kotlin extension functions turn out to work great. These transformations are easy to write and manage. In my code to write data into a polymorphic table, the transformation looks something like this.
fun MailRule.asAttributeValueUpdateList(): Map<String, AttributeValueUpdate> {
    val itemValues = mutableMapOf<String, AttributeValueUpdate>()
    // Fields shared by every rule type
    itemValues["type"] = attributeValuePutString(type.toString())
    itemValues["description"] = attributeValuePutString(description)
    // Fields specific to each rule type
    when (type) {
        RuleType.ROUTING -> with(this as RoutingMailRule) {
            itemValues["emailAddresses"] = attributeValuePutOrRemoveListString(emailAddresses)
            itemValues["active"] = attributeValuePutBool(active)
        }
        RuleType.DEADMANS -> with(this as DMMailRule) {
            itemValues["pingDays"] = attributeValuePutInt(pingDays)
            itemValues["sendDays"] = attributeValuePutInt(sendDays)
            itemValues["pingEmails"] = attributeValuePutOrRemoveListString(pingEmails.toList())
        }
        RuleType.UNKNOWN -> TODO() // throws NotImplementedError at runtime
    }
    return itemValues
}
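The attributeValuePut… helpers used above are small wrappers. Their real implementations are not shown in this article, so the following is only a guess at what they might look like, assuming string lists are stored with the L (list) type and that an empty list means the field should be removed. It also includes a sketch of the opposite direction, reading an item back into a data class; RuleSummary and toRuleSummary are hypothetical names.

```kotlin
import aws.sdk.kotlin.services.dynamodb.model.AttributeAction
import aws.sdk.kotlin.services.dynamodb.model.AttributeValue
import aws.sdk.kotlin.services.dynamodb.model.AttributeValueUpdate

fun attributeValuePutString(s: String) = AttributeValueUpdate {
    value = AttributeValue.S(s)
    action = AttributeAction.Put
}

// DynamoDB transports numbers as strings inside the N type.
fun attributeValuePutInt(n: Int) = AttributeValueUpdate {
    value = AttributeValue.N(n.toString())
    action = AttributeAction.Put
}

fun attributeValuePutBool(b: Boolean) = AttributeValueUpdate {
    value = AttributeValue.Bool(b)
    action = AttributeAction.Put
}

// Assumption: an empty list removes the field instead of storing an empty list.
// A Delete action without a value removes the whole attribute.
fun attributeValuePutOrRemoveListString(items: List<String>) =
    if (items.isEmpty()) {
        AttributeValueUpdate { action = AttributeAction.Delete }
    } else {
        AttributeValueUpdate {
            value = AttributeValue.L(items.map { AttributeValue.S(it) })
            action = AttributeAction.Put
        }
    }

// Hypothetical read-side data class holding the fields common to all rules.
data class RuleSummary(
    val account: String,
    val ruleId: String,
    val description: String?,
    val emailAddresses: List<String>,
)

// Converts an item returned by GetItem or Query into the data class.
fun Map<String, AttributeValue>.toRuleSummary() = RuleSummary(
    account = getValue("account").asS(),
    ruleId = getValue("ruleId").asS(),
    description = this["description"]?.asSOrNull(),
    emailAddresses = this["emailAddresses"]?.asLOrNull()
        ?.mapNotNull { it.asSOrNull() }
        ?: emptyList(),
)
```

Keeping these conversions as small top-level functions means the per-type branches in the extension function stay readable, and each helper can be tested without a database.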
Why we are building this
This project is primarily a portfolio project to develop and show off skills as we look for new opportunities. There are two of us on the team right now: I handle the backend, and Rupali Solanki is developing a nice ReactJS-based front end.
This project is an application I want to use. Lots of sites want my email address. I am getting an ever-increasing amount of spam. MailPlusPlus will let me have 100 or 1,000 email addresses I can give out. I can turn them on or off if I want. I can forward messages. I can store them. I can process messages with AWS Comprehend (an ML product) and make decisions based on tokens generated by AI. I already have plans to integrate with other scripting platforms like IFTTT.
Ultimately we would like this project to turn into professional work. If this application can become commercially successful… that would be cool. We are mainly looking to develop opportunities.
If you are a hiring manager looking for talented engineers, let’s talk. I am open to contract positions for anything involving Java/Kotlin, data processing or AWS. We can create a fully hosted cloud application to your specifications. And, I am open to the right full time permanent position.
If MailPlusPlus sounds like something you would like to use, I would definitely like to hear from you. We need user suggestions that would turn this into a potentially marketable product. Let me know in the comments what you would like to see.
Email me at eb.n5sqo@mailplusplus.com. You can see the progress of the project at http://mailplusplus.com