That means we can now focus on building the business logic. In this case, we are going to implement the logic explained in this repository.
We are going to build a microservice to index room information coming from another service (a crawler). This service will be responsible for indexing that information into Elasticsearch.
Here you can see a table with the endpoints we need to build:
Building the first endpoint
Updating the OpenAPI Spec
In this first endpoint, we are going to index rooms into Elasticsearch. For that reason, we need to update the OpenAPI Spec to include this new endpoint with the proper configuration.
As you can see, we include a new post action on the room endpoint. The operationId defines which
package.module.instance.method is going to execute the action.
In this endpoint, we need to receive a payload via the body with the structure of the room. In the OpenAPI Spec, this is declared as a definition with its properties. Here you can see how to create this new definition.
This is a fairly large example of how you can nest definitions to build complex structures, but let me explain a few other tricks in this specification.
required lists the properties of this definition that must exist in the payload. For instance, if we make a request without specifying the room name, it will fail with an HTTP 400 Bad Request error.
It is possible to split those definitions into different YAML files; I just put everything together for convenience.
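As a sketch, the new post action and a trimmed-down Room definition could look like this (the operationId target, field names, and nesting are assumptions; adapt them to your project):

```yaml
paths:
  /room:
    post:
      # package.module.method that Connexion will call for this action
      operationId: api.room.post
      parameters:
        - name: room
          in: body
          required: true
          schema:
            $ref: '#/definitions/Room'
      responses:
        201:
          description: Room indexed

definitions:
  Room:
    type: object
    required:
      - name
      - url
    properties:
      name:
        type: string
      url:
        type: string
      location:
        $ref: '#/definitions/Location'
  Location:
    type: object
    properties:
      latitude:
        type: number
      longitude:
        type: number
```

A request missing name or url would be rejected with a 400 before our code even runs, because both appear in the required list.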
Setting up your configuration
In the previous article, we defined a docker-compose file with a couple of dependencies. Now we need somewhere to keep the configuration for connecting to Elasticsearch.
To do so, we are going to create a
.env file to store some environment variables. This is a good practice for storing configuration and propagating it to your scripts. It is also great because you can read these settings from almost any programming language.
At the moment, we only need the configuration to connect to Elasticsearch; in the next article, we will add configuration for RabbitMQ too.
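For instance, a minimal .env could look like this (the variable names are assumptions; use whatever names your code reads):

```
ELASTICSEARCH_HOST=localhost
ELASTICSEARCH_PORT=9200
ELASTICSEARCH_INDEX=room
```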
Setting up Elasticsearch
Before jumping into building the endpoint, we need a way to connect to Elasticsearch and be able to inject it into our controller.
For this purpose, we have the dependency
elasticsearch in our requirements.txt. This is the official package from Elasticsearch, and it is quite straightforward to set up. I preferred to build a Factory class that creates a new Elasticsearch client on demand. It looks like this:
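A minimal sketch of such a factory, assuming the .env variable names from above (ELASTICSEARCH_HOST, ELASTICSEARCH_PORT):

```python
import os


class ElasticSearchFactory:
    """Creates Elasticsearch clients from environment configuration."""

    def __init__(self, host=None, port=None):
        # Fall back to the environment variables set via the .env file.
        self.host = host or os.environ.get('ELASTICSEARCH_HOST', 'localhost')
        self.port = int(port or os.environ.get('ELASTICSEARCH_PORT', 9200))

    def create(self):
        # Imported lazily so the factory can be constructed (and unit
        # tested) without the client library being available.
        from elasticsearch import Elasticsearch
        return Elasticsearch([{'host': self.host, 'port': self.port}])
```

The factory keeps the connection details in one place, so swapping hosts between environments is just a matter of changing the .env file.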
After that, I will create another class that will handle a few actions:
- Make the connection to Elasticsearch; it connects just once and checks whether the index needs to be created.
- Index a new room given the payload.
- Check whether a certain room already exists, given a URL.
Basically, for each action you have to define which index you want to query and which document type is being used. Let me briefly explain some of these concepts.
An index is stored in a set of shards, which are themselves Lucene indices. This already gives you a glimpse of the limits of using a new index all the time: Lucene indices have a small yet fixed overhead in terms of disk space, memory usage and file descriptors used. For that reason, a single large index is more efficient than several small indices: the fixed cost of the Lucene index is better amortized across many documents.
This is where types help: types are a convenient way to store several types of data in the same index, in order to keep the total number of indices low for the reasons exposed above. In terms of implementation it works by adding a “_type” field to every document that is automatically used for filtering when searching on a specific type. One nice property of types is that searching across several types of the same index comes with no overhead compared to searching a single type: it does not change how many shard results need to be merged.
You can find more information in this article.
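A sketch of that wrapper class, written against the ES 6 era client this article uses (the doc_type parameter and the integer hits.total are specific to that version; index and type names are assumptions):

```python
class ElasticSearchIndex:
    """Wraps an Elasticsearch client for the room index."""

    INDEX = 'room'
    DOC_TYPE = 'room'

    def __init__(self, factory, mapping=None):
        self.factory = factory
        self.mapping = mapping
        self._client = None

    @property
    def client(self):
        # Connect just once; create the index with its mapping if missing.
        if self._client is None:
            self._client = self.factory.create()
            if not self._client.indices.exists(index=self.INDEX):
                self._client.indices.create(index=self.INDEX, body=self.mapping)
        return self._client

    def index(self, payload):
        # Index a new room document with the given payload.
        result = self.client.index(
            index=self.INDEX, doc_type=self.DOC_TYPE, body=payload)
        return result.get('result') == 'created'

    def url_exists(self, url):
        # Check whether a room with this exact URL is already indexed.
        query = {'query': {'term': {'url': url}}}
        result = self.client.search(
            index=self.INDEX, doc_type=self.DOC_TYPE, body=query)
        return result['hits']['total'] > 0
```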
Something this class expects is the mapping. With Elasticsearch it is not strictly necessary to define a mapping for your data; ES can do it automatically, which is called Dynamic Mapping. But it is always better to do it yourself, giving ES some extra help to index the data the right way.
It is also a good exercise to create our own mapping and learn more about the different data structures ES supports.
The first part is the settings for the index. We can define the number of shards and replicas we need. If you do not know those concepts, read this article.
A simple explanation is that ES allows us to split an index into smaller sub-indices distributed among the nodes; each of those is called a shard. We can replicate each shard as many times as we need; this is useful for redundancy, but it only pays off when running an ES cluster.
The second part defines the structure of our document, in this case a Room. To keep it simple, I just copied the same structure we had in the OpenAPI Spec definition. In other cases it would not be necessary to keep the same schema; we could split the information differently for a different purpose.
For each attribute of the Room we declare its type; ES supports many different types, but two of them are particularly handy here:
- geo_point, which is useful when we need to index coordinates. A geo_point stores a latitude and a longitude.
- keyword, which is used when we need to match against an exact value, for instance a category name, or in this example a URL.
You can read more about the supported data structures in this article.
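Putting those pieces together, a mapping for the room index could look roughly like this (the field names mirror the Room definition; the exact schema and the single-node settings are assumptions):

```json
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "room": {
      "properties": {
        "name":     {"type": "text"},
        "url":      {"type": "keyword"},
        "location": {"type": "geo_point"}
      }
    }
  }
}
```

Note that url is a keyword rather than text, so the duplicate check can match the exact value instead of an analyzed version of it.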
Elasticsearch is a powerful tool for searching, and I will probably have to write a dedicated article just for it. But I think this is enough to cover some basic ES concepts.
Binding Elasticsearch in the container
Finally, with our Elasticsearch dependencies set up, we can start working on injecting the
ElasticSearchIndex class into our post method.
If you remember from the first article, we are using Flask-Injector to handle all our dependencies. To inject, we only need to create our container as follows:
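A sketch of that container module; the class names follow the article, but their bodies here are stand-ins, and the exact constructor arguments are assumptions:

```python
class ElasticSearchFactory:          # stand-in for the real factory
    def create(self):
        ...


class ElasticSearchIndex:            # stand-in for the real service
    def __init__(self, factory):
        self.factory = factory


# Build the service once and bind that instance, so every request
# shares the same Elasticsearch connection.
elasticsearch_index = ElasticSearchIndex(ElasticSearchFactory())


def configure(binder):
    # Flask-Injector calls this module with its binder at startup:
    #   FlaskInjector(app=app, modules=[configure])
    binder.bind(ElasticSearchIndex, to=elasticsearch_index)
```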
Now we have our
ElasticSearchIndex in the container, and it can be injected into any action like this:
Building the post action
We are getting closer; this is the last step. We need to create the post action that will receive the Room payload in the body and index it into Elasticsearch.
This is the code we have in our
In this case, I wanted to use a class, but that is not necessary; this can be simplified even further using
def post(indexer, room) without having to instantiate a Room object.
This is very simple:
- We inject the ElasticSearchIndex service into this method.
- We validate that the room URL does not already exist in the index. If it exists, we return a
409 (Conflict) status code.
- If the room has been successfully indexed, we return a
201 (Created) status code.
- If the room could not be indexed, we return a
400 (Bad Request) status code.
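The steps above can be sketched as a plain function, assuming Flask-Injector supplies the indexer and Connexion passes the validated request body (parameter names are assumptions):

```python
def post(indexer, room: dict):
    """Index a room, guarding against duplicates by URL."""
    # 409 Conflict: a room with this URL is already in the index.
    if indexer.url_exists(room['url']):
        return 'Room already indexed', 409
    # 201 Created: the room was indexed successfully.
    if indexer.index(room):
        return 'Room indexed', 201
    # 400 Bad Request: Elasticsearch refused the document.
    return 'Room could not be indexed', 400
```

Because the function only relies on the indexer's url_exists and index methods, it is trivial to unit test with a fake indexer, which we will take advantage of in the next article.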
We are building a RESTful API, so it is very important to always return the proper status code to help the client understand what is going on. You can find a full list of status codes here.
Now we can execute a POST request to the 127.0.0.1:9090/room endpoint, passing a Room payload as the body (you can check the OpenAPI docs to see how it should be defined). If it does not already exist, that room will be indexed. In a later article, we will build a service to search all those rooms.
In this article, we learned how to create a complex OpenAPI Spec configuration with several nested definitions. We introduced some basic Elasticsearch concepts and finally included a more complex dependency in our container.
As you can see, building the endpoint itself is pretty straightforward; most of the work is defining the services we need and working on the specifications.
But this work is not in vain: having docs from the beginning is a great thing. It will help you scale to multiple services, with multiple teams working on them, much more quickly.
In the next article, we are going to learn how to write unit and integration tests for this endpoint and its services, and how to configure Travis to run the test suite every time we push a new commit to the repository.
This is a very interesting book to learn how to build a production-ready Elasticsearch solution.