Ingesting real-time data into Elasticsearch with Node.js.

Mark Mayfield
6 min read · Jun 29, 2020


Elasticsearch is a powerful RESTful search and analytics engine capable of addressing a growing number of use cases. It centrally stores your data for lightning-fast search, fine-tuned relevancy, and powerful analytics that scale with ease. There are a ton of resources on how to ingest data into Elasticsearch using the Elastic Stack (a.k.a. the ELK stack), but resources seem to be thin when it comes to alternative methods of ingestion.

Ingesting data with Node.js is a great option if your programming language of choice is JavaScript and you need to ingest data from a third-party application using RESTful API methods. You can also host the server to have it continually ingest the data in real time. This demo will show you how to set up a Node.js + Express.js server that ingests data into Elasticsearch in real time, where it can then be analyzed and acted upon in a meaningful way.

For this demo we will be using publicly available worldwide earthquake data published in real time by the USGS.

Prerequisites:

  1. Create an Elastic Cloud cluster.

For simplicity, we will be using the free-trial version of the Elastic Cloud service provided by Elastic, which lets us quickly get a hosted cluster up and running. Sign up for the free trial on the Elastic Cloud site and follow the steps to deploy an Elasticsearch cluster.

2. Set up the Node.js + Express.js server.

Create a project folder called earthquakes and open it in Visual Studio Code. Inside the folder create a file called package.json and insert the following JSON:
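A minimal package.json along these lines will do; the exact dependency versions are illustrative, not prescriptive:

{
  "name": "earthquakes",
  "version": "1.0.0",
  "description": "Ingest real-time USGS earthquake data into Elasticsearch",
  "main": "server.js",
  "scripts": {
    "server": "nodemon server.js"
  },
  "dependencies": {
    "@elastic/elasticsearch": "^7.8.0",
    "axios": "^0.19.2",
    "express": "^4.17.1",
    "log-timestamp": "^0.3.0"
  },
  "devDependencies": {
    "nodemon": "^2.0.4"
  }
}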

In the terminal run the command:

npm install

This will install all the necessary dependencies to run our application.

In the root directory of the project folder, create another file called server.js and insert the following code:
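A minimal server.js sketch that produces the startup message shown below:

// server.js - minimal Express server listening on port 5000
const express = require('express');

const app = express();
app.use(express.json());

const PORT = 5000;
app.listen(PORT, () => console.log(`Server Started On ${PORT}`));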

Save. In the terminal, run the following command:

npm run server

If successful, you will see the following message in the terminal:

Server Started On 5000

Congratulations. You are now running a production-ready Express server.

3. Add Elasticsearch credentials to the server.

Now, we need to add our Elasticsearch credentials to enable communication between our server and our Elasticsearch cluster. In the root directory of our application, create a folder called elasticsearch and within it a file called connection.js. Add the following code to that file:
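A typical connection.js using the official @elastic/elasticsearch client looks roughly like this; the cloud ID and password values are placeholders for your own credentials:

// elasticsearch/connection.js - Elasticsearch client configured for Elastic Cloud
const { Client } = require('@elastic/elasticsearch');

const client = new Client({
  cloud: {
    id: '<your-cloud-id>' // found on your deployment page in Elastic Cloud
  },
  auth: {
    username: 'elastic', // default user created with the deployment
    password: '<your-password>' // password generated when the deployment was created
  }
});

module.exports = client;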

Insert your own generated password and the Cloud ID, which can be found on your deployment page in Elastic Cloud.

4. Test that the Elastic Cloud endpoint is working.

We now need to test that our Elastic Cloud API endpoint is working. Back in server.js insert the following code and restart the server:
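A sketch of that addition, assuming the client exported from connection.js above:

// server.js - ping Elastic Cloud to verify the connection
const client = require('./elasticsearch/connection');

client
  .ping()
  .then(() => console.log('Elasticsearch is connected'))
  .catch(() => console.log('Elasticsearch is not connected'));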

We have now added an Elasticsearch method that pings our Elastic Cloud endpoint to let us know whether the connection is successful. Save to restart the server, and if everything is connected properly you should see the following message displayed in the terminal:

Elasticsearch is connected

If not, double-check your credentials in connection.js.

5. Set up a RESTful API call to retrieve data from the source.

Now that our server is running and Elasticsearch is connected, we need to test the API call to the USGS in order to receive the initial data. In the root directory, create a folder called routes with a subfolder called api. Inside the api folder, create a file called data.js and add the following code:
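A sketch of that route; the URL points at the USGS GeoJSON feed of earthquakes from the past hour, which is an assumption about the specific feed used:

// routes/api/data.js - fetch earthquake data from the USGS GeoJSON feed
const express = require('express');
const axios = require('axios');
require('log-timestamp'); // prepends a timestamp to every console.log

const router = express.Router();

// USGS real-time GeoJSON feed (the "all_hour" feed is an assumption; other feeds work the same way)
const URL = 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.geojson';

router.get('/earthquakes', async (req, res) => {
  console.log('Fetching earthquake data from USGS');
  try {
    const response = await axios.get(URL);
    res.json(response.data.features); // the array of earthquake objects
  } catch (err) {
    console.log(err);
    res.status(500).send('Error retrieving earthquake data');
  }
});

module.exports = router;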

The code above makes an asynchronous API call to the USGS earthquake API using the npm package Axios. Once the data is received, it is returned as JSON. You can also see that we import a dependency called log-timestamp at the top of the file; this adds a timestamp to every console.log.

We also need to add the newly created API route to our server.js:
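In server.js, that looks something like:

// server.js - mount the earthquake data route
app.use('/api/data', require('./routes/api/data'));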

Now it is time to run the API query. In your browser or Postman, go to the link:

localhost:5000/api/data/earthquakes

It should then return an array of earthquake objects. Success!

6. Create the Elasticsearch index and geo_point mapping.

Elasticsearch will automatically infer most data types, such as strings and integers, upon ingestion, but data types such as geo_point need to be declared manually before any data is ingested.

To do this, we will use the Dev Tools interface in Kibana. Back in Elastic Cloud, open up the link to your Kibana instance, go to Dev Tools, and enter the following:
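A request along these lines creates the index with an explicit geo_point mapping; the field names location and time are assumptions that match the custom object built in the next step:

PUT /earthquakes
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" },
      "time": { "type": "date" }
    }
  }
}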

7. Map through the data received and create a custom data object.

We now need to map through the array of data we have been receiving and create our very own custom object that we will then ingest into Elasticsearch. Back in our data.js we will need to add the following code:
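Building on the route above, the handler can be reworked roughly like this; the field names chosen for the custom object are assumptions, apart from location and time, which match the mapping created earlier:

// routes/api/data.js - shape each USGS feature into a custom earthquake object
router.get('/earthquakes', async (req, res) => {
  console.log('Fetching earthquake data from USGS');
  try {
    const response = await axios.get(URL);
    const earthquakes = response.data.features.map((feature) => ({
      id: feature.id, // USGS id, reused later as the Elasticsearch document id
      place: feature.properties.place,
      magnitude: feature.properties.mag,
      time: feature.properties.time, // epoch milliseconds, accepted by the date mapping
      location: {
        lat: feature.geometry.coordinates[1],
        lon: feature.geometry.coordinates[0]
      }
    }));
    res.json(earthquakes);
  } catch (err) {
    console.log(err);
    res.status(500).send('Error retrieving earthquake data');
  }
});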

Restart the server and run the route again:

localhost:5000/api/data/earthquakes

Great! Now we are getting our custom array of earthquake objects.

8. Ingest the object data into Elasticsearch.

Let’s add the final piece to data.js that will index our data into Elasticsearch. Alongside the indexing call, we also add a setInterval() method that continually checks for new data every 2 minutes.
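A sketch of that final piece: pull the fetch-and-shape logic into a function, index each object with client.index() using the USGS id as the document id, and call the function on an interval (the function name and, as before, the field names are assumptions):

// routes/api/data.js - index the custom earthquake objects and re-check the feed every 2 minutes
const client = require('../../elasticsearch/connection');

async function indexEarthquakes() {
  const response = await axios.get(URL);
  const earthquakes = response.data.features.map((feature) => ({
    id: feature.id,
    place: feature.properties.place,
    magnitude: feature.properties.mag,
    time: feature.properties.time,
    location: {
      lat: feature.geometry.coordinates[1],
      lon: feature.geometry.coordinates[0]
    }
  }));

  for (const quake of earthquakes) {
    await client.index({
      index: 'earthquakes',
      id: quake.id, // reuse the USGS id so re-ingesting the same quake only bumps its version
      body: quake
    });
  }
  console.log(`Indexed ${earthquakes.length} earthquakes`);
  return earthquakes;
}

// Check for new data every 2 minutes
setInterval(indexEarthquakes, 2 * 60 * 1000);

The route handler from the previous step can then simply call indexEarthquakes() and return its result, so hitting the route both responds with the data and writes it into the index.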

In a browser or Postman run the route again:

localhost:5000/api/data/earthquakes

Your data should now be ingesting automatically in real-time!

Every document you store in Elasticsearch has an associated version number. That version number is a positive number between 1 and 2^63 − 1 (inclusive). When you index a document for the very first time, it gets version 1, and you can see that in the response Elasticsearch returns.

Since we reuse the static IDs the USGS assigns to each earthquake as the document IDs in Elasticsearch, we don't have to worry about duplicate entries. Re-indexing an existing document simply creates a new version, and the old version is marked for deletion, so there is no need to worry about over-ingesting redundant data.
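For example, indexing the same document id twice only increments the version (the id below is a hypothetical USGS-style id, and the versions shown assume the document did not already exist):

// Re-indexing the same id bumps _version instead of creating a duplicate
const client = require('./elasticsearch/connection');

async function versionDemo() {
  const first = await client.index({ index: 'earthquakes', id: 'us7000abcd', body: { magnitude: 5.1 } });
  const second = await client.index({ index: 'earthquakes', id: 'us7000abcd', body: { magnitude: 5.1 } });
  console.log(first.body._version); // 1
  console.log(second.body._version); // 2
}

versionDemo();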

9. Check Kibana for data and create an analytics dashboard.

In the Stack Management section of Kibana, click on Create index pattern. You should then see the earthquakes index. Give the index pattern the name earthquakes and click Next.

Select the time field as the timestamp field. Click Next again, and the index pattern should be created. Now go to the Discover section and you should see the data coming in periodically as new data is created.

Voila! Data is being ingested.

10. Create visuals.

Now that data is being ingested, we can create visuals. This process deserves pretty much a whole tutorial to itself, but with the geo_point mapping in place you can, for example, plot each earthquake on a Kibana map or chart magnitudes over time.

Now that the data is being ingested, play around with the visuals in Kibana and see what you can come up with!
