Building a real-time search engine using Elasticsearch and Python

Build and deploy a website using Flask and DigitalOcean

In this article, we are going to build a course finder search engine using Elasticsearch, Python, and Flask. We will then containerize our application and push it to Docker Hub using Travis CI, so that every time you push changes to GitHub, Travis CI rebuilds the container image and pushes it to Docker Hub. We will use Nginx as a reverse proxy in front of the application. Once the website works, we will deploy it on a DigitalOcean server so that everyone on the internet can see what you’ve created, and we will create a free SSL certificate for the web application using Let’s Encrypt.

Step 1 — Create Docker Compose File

Docker Compose is a tool for defining and running multi-container Docker applications.

We will use the docker-compose.yml configuration file for creating application’s services. It’s a simple YAML file. After creating compose configuration file, we will start all the services from the configuration with only one command.

Now let’s write a docker-compose file to run Elasticsearch container.

version: '3'
services:
  elasticsearch:
    image: "elasticsearch:5"
    networks:
      - frontend
    restart: always
    volumes:
      - ./ES_DATA:/usr/share/elasticsearch/data
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"

networks:
  frontend:

This Compose file defines a single service, elasticsearch, which:

  • Uses an Elasticsearch image from Docker Hub.
  • Forwards the exposed port 9200 on the container to port 9200 on the host machine.
  • Joins the frontend network defined under the networks key; using network tags we can share one network across different containers and projects, so external containers that attach to frontend can also reach the services in it.
  • Persists the indexed data by mounting the ES_DATA folder in our project directory into the container. The basic syntax for mounting volumes is /host/path:/container/path.

We will start Elasticsearch service using the following command in our project directory.

docker-compose up

Now open localhost:9200 in your web browser to check whether the Elasticsearch container is running.

Elasticsearch

Step 2 — Create new Elasticsearch indices

We will create two indices, autocomplete and hacker. The hacker index will use a template called search_engine_template.

Index templates let you define settings and mappings that are automatically applied when new indices are created, along with a simple name pattern that controls which new indices the template is applied to.
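The original create_new_index.py is embedded as a gist. Here is a minimal sketch of what such a script could look like, using the plain REST API via requests; the template pattern, mappings, and the edge n-gram analyzer below are illustrative assumptions, not necessarily the repo's exact bodies.

```python
# create_new_index.py -- illustrative sketch; the template and analyzer
# bodies are assumptions, not the repo's exact mappings.
import requests

ES = "http://localhost:9200"

def search_engine_template_body():
    # Applied automatically to any new index whose name matches "hacker*".
    return {
        "template": "hacker*",
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {
            "courses": {
                "properties": {
                    "title":   {"type": "text"},
                    "topic":   {"type": "keyword"},
                    "upvotes": {"type": "integer"},
                    "url":     {"type": "keyword"},
                    "tags":    {"type": "keyword"},
                }
            }
        },
    }

def autocomplete_body():
    # Edge n-gram analyzer for search-as-you-type suggestions.
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "autocomplete_filter": {
                        "type": "edge_ngram", "min_gram": 1, "max_gram": 20
                    }
                },
                "analyzer": {
                    "autocomplete": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "autocomplete_filter"],
                    }
                },
            }
        }
    }

if __name__ == "__main__":
    try:
        requests.put("%s/_template/search_engine_template" % ES,
                     json=search_engine_template_body(), timeout=10)
        print("Created a new template: search_engine_template")
        requests.put("%s/hacker" % ES, timeout=10)
        print("Created an index: hacker")
        requests.put("%s/autocomplete" % ES, json=autocomplete_body(), timeout=10)
        print("Created a new index: autocomplete")
    except requests.ConnectionError as exc:
        print("Elasticsearch is not reachable:", exc)
```

The template is registered first so that creating the hacker index immediately picks up its settings and mappings.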

When you run create_new_index.py you will see the following output.

Created a new template: search_engine_template
Created an index: hacker
Created a new index: autocomplete

Let’s check whether new indices are created or not.

http://localhost:9200/_aliases

Step 3 — Scraping websites with Python and Beautiful Soup and ingesting the data into Elasticsearch

We will scrape the hackr.io website for each course’s Title, Topic, Upvotes, URL, and Tags, then ingest the scraped data into Elasticsearch.

Here is our Python scraper, which scrapes the data from hackr.io and ingests it into Elasticsearch:
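The scraper itself is embedded as a gist. As a stand-in, here is a self-contained sketch using requests and Beautiful Soup; the CSS selectors and the hackr.io URL path are hypothetical assumptions for demonstration, since the real site markup differs.

```python
# scraper.py -- illustrative sketch. The CSS selectors and URL below are
# assumptions for demonstration; the real hackr.io markup differs.
import requests
from bs4 import BeautifulSoup

ES = "http://localhost:9200"

def parse_courses(html):
    """Extract course dicts (title, topic, upvotes, url, tags) from a page."""
    soup = BeautifulSoup(html, "html.parser")
    courses = []
    for card in soup.select("div.course"):  # hypothetical selector
        courses.append({
            "title": card.select_one("a.title").get_text(strip=True),
            "topic": card.get("data-topic", ""),
            "upvotes": int(card.select_one("span.upvotes").get_text(strip=True)),
            "url": card.select_one("a.title")["href"],
            "tags": [t.get_text(strip=True) for t in card.select("span.tag")],
        })
    return courses

def ingest(courses):
    """Index each course into 'hacker' and its title into 'autocomplete'."""
    for i, course in enumerate(courses):
        requests.put("%s/hacker/courses/%d" % (ES, i), json=course, timeout=10)
        requests.put("%s/autocomplete/titles/%d" % (ES, i),
                     json={"title": course["title"]}, timeout=10)

if __name__ == "__main__":
    try:
        # hypothetical listing page
        page = requests.get("https://hackr.io/tutorials/learn-python", timeout=10)
        ingest(parse_courses(page.text))
    except requests.ConnectionError as exc:
        print("Network problem:", exc)
```

Keeping the parsing in a pure function makes it easy to test against saved HTML before pointing the script at the live site.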

When you run this scraper you will be able to see the data being scraped and ingested into Elasticsearch.

Scraper

After the script finishes, we can check whether the data was ingested into our Elasticsearch indices by hitting http://localhost:9200/autocomplete/_search and http://localhost:9200/hacker/_search

http://localhost:9200/hacker/_search

Step 4 — Building a Course Finder Search Engine from our Scraped Data

The directory structure of our Flask application looks like this:

├── app.py
├── create_new_index.py
├── docker-compose.yml
├── Dockerfile
├── gunicorn_config.py
├── requirements.txt
├── routes
│   ├── __init__.py
│   ├── __pycache__
│   │   └── search.cpython-36.pyc
│   └── search.py
├── scraper.py
├── static
│   ├── css
│   ├── fonts
│   ├── images
│   ├── js
│   └── scss
└── templates
    ├── index.html
    └── __init__.py

First, let’s install the dependencies needed to run our Flask application, which are listed in the requirements.txt file.

# requirements.txt
Flask==1.0.2
requests>=2.20.0
gunicorn

Install requirements using pip command:

pip install -r requirements.txt

Our Python Flask application will render our HTML files using Jinja templates.

In my case my application is named app.py:

app.py
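The original app.py is embedded as a gist. Here is a minimal self-contained sketch of what it does; in the real project the blueprint is imported from routes/search.py, but it is stubbed inline here (an assumption for this sketch) so the file runs on its own.

```python
# app.py -- minimal sketch of the application entry point.
from flask import Flask, render_template, Blueprint

# In the real project: from routes.search import search_blueprint
search_blueprint = Blueprint("search", __name__)  # inline stand-in for this sketch

app = Flask(__name__)
app.register_blueprint(search_blueprint)

@app.route("/")
def index():
    # Render the search page from templates/index.html
    return render_template("index.html")

if __name__ == "__main__":
    # threaded=True lets the development server handle concurrent requests.
    app.run(host="0.0.0.0", port=8005, threaded=True)
```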

We have set threaded=True to support multithreading in our Flask application, and we registered a blueprint in our app.py file.

A blueprint defines a collection of views, templates, static files and other elements that can be applied to an application. For example, let’s imagine that we have a blueprint for an admin panel. This blueprint would define the views for routes like /admin/login and /admin/dashboard.

Our route module is routes/search.py:

search.py
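The routes/search.py gist is not shown inline, so here is a hedged sketch of what such a blueprint could look like: the endpoint paths and the query bodies (match_phrase_prefix for suggestions, multi_match for results) are assumptions, not necessarily the repo's exact code.

```python
# routes/search.py -- illustrative sketch; endpoint paths and query
# bodies are assumptions, not the repo's exact code.
import requests
from flask import Blueprint, jsonify, request

search_blueprint = Blueprint("search", __name__)
ES = "http://localhost:9200"

def suggest_query(prefix):
    # match_phrase_prefix gives cheap search-as-you-type suggestions.
    return {"query": {"match_phrase_prefix": {"title": prefix}}, "size": 5}

@search_blueprint.route("/suggest")
def suggest():
    body = suggest_query(request.args.get("q", ""))
    hits = requests.post("%s/autocomplete/_search" % ES, json=body,
                         timeout=10).json()
    return jsonify([h["_source"]["title"] for h in hits["hits"]["hits"]])

@search_blueprint.route("/search")
def search():
    body = {"query": {"multi_match": {
        "query": request.args.get("q", ""),
        "fields": ["title", "topic", "tags"]}}}
    hits = requests.post("%s/hacker/_search" % ES, json=body,
                         timeout=10).json()
    return jsonify([h["_source"] for h in hits["hits"]["hits"]])
```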

Our index page is templates/index.html.

If everything goes according to plan, you should be able to run your application, and it will listen on port 8005.

When you access your endpoint on port 8005, you should see the main screen, which looks like this:

Search Suggest
Search Results

Step 5 — Creating a Dockerfile for your Flask Application

We will use the Gunicorn web server to deploy our application, and we can set the number of workers in the Gunicorn configuration file.

First, let us create a configuration for our Gunicorn web server called gunicorn_config.py

gunicorn_config.py
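As a stand-in for the embedded gist, here is a minimal sketch of such a configuration; the worker formula is a common rule of thumb (2 × CPU cores + 1), not necessarily the article’s exact values.

```python
# gunicorn_config.py -- sketch; values are common defaults, not
# necessarily the repo's exact settings.
import multiprocessing

bind = "0.0.0.0:8005"  # the same port the compose file publishes
workers = multiprocessing.cpu_count() * 2 + 1  # common rule of thumb
```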

Let’s create our Dockerfile:

Dockerfile
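The Dockerfile is embedded as a gist; a sketch of what it might contain, assuming a Python 3.6 base image (the tree above shows a cpython-36 cache file) and that Gunicorn loads gunicorn_config.py:

```dockerfile
# Dockerfile -- sketch; base image and paths are assumptions.
FROM python:3.6-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8005
CMD ["gunicorn", "-c", "gunicorn_config.py", "app:app"]
```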

Step 6 — Create a new repository on Docker Hub

Go to https://hub.docker.com/ and create a new repository, to which we will push our Docker image using Travis CI.

Step 7 — Using Travis CI to containerize our Flask application and push it to Docker Hub

First, create a public repository on GitHub. Then sign up at https://travis-ci.org/.

We have to activate our repository on the Travis CI site so that we can use the Travis CI/CD pipeline.

Go to https://travis-ci.org/account/repositories and activate your repository.

Activate the repository which needs to use Travis.

We then need to write a .travis.yml file in our project directory, which contains the instructions to build our application image and push it to Docker Hub.

.travis.yml
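The .travis.yml itself is embedded as a gist. A hedged sketch of what it could look like: the image tag matches the compose file’s dineshsonachalam/hacker:1.0.0, while the branch condition and login flow are assumptions.

```yaml
# .travis.yml -- sketch; branch condition and login flow are assumptions.
language: python
services:
  - docker
script:
  - docker build -t dineshsonachalam/hacker:1.0.0 .
deploy:
  provider: script
  script: >-
    echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_ID" --password-stdin &&
    docker push dineshsonachalam/hacker:1.0.0
  on:
    branch: master
```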

I have used the environment variables $DOCKER_ID and $DOCKER_PASSWORD which can be set on your Travis repository page.

Choose More options →Settings to set Environment variables in Travis CI.

Setting environmental variables in Travis CI

Now push your application to GitHub; Travis CI will build a Docker container for your application and push it to Docker Hub.

Step 8 — Adding our Flask application service to our docker-compose file

Here is our updated docker-compose file:

version: '3'
services:
  elasticsearch:
    image: "elasticsearch:5"
    networks:
      - frontend
    restart: always
    volumes:
      - ./ES_DATA:/usr/share/elasticsearch/data
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"

  hacker:
    image: "dineshsonachalam/hacker:1.0.0"
    networks:
      - frontend
    restart: always
    ports:
      - "8005:8005"

networks:
  frontend:

Here the hacker service contains our Flask application image, which is pulled from Docker Hub and runs on port 8005.

We can run both the Elasticsearch and hacker services with the following command, which starts the application.

docker-compose up

Step 10 — Deploy our application to DigitalOcean

Step 10.1 — Creating a DigitalOcean Account

Create a DigitalOcean account by following this link.

Step 10.2 — Create a Droplet in DigitalOcean

Create a droplet in DigitalOcean by following this link.

Step 10.3 — Connecting to the remote server using SSH

Connect to your remote server using the following command.

ssh <<USERNAME>>@<<IP Address>>

Next, clone the application repository.

git clone https://github.com/dineshsonachalam/Building-a-search-engine-using-Elasticsearch
cd Building-a-search-engine-using-Elasticsearch

Follow this link to install Docker on your server machine and install docker-compose by following this link.

After installing them, run docker-compose up to start your container services.

Now hit <<IP>>:8005 in your browser to see your application running.


Step 11 — Buying a domain name from Freenom

After buying a domain name from Freenom, go to My Domains → Management Tools → Nameserver → Change Nameserver → Use custom Name server.

Add your domain name in DigitalOcean, then copy the DNS record values of type NS (Name Server) from DigitalOcean and paste them into the Freenom custom name server fields.

I have bought a new domain name, contentsea.tk, and mapped it to DigitalOcean.

Now hitting contentsea.tk:8005 serves my container services; in your case, <YOUR_DOMAIN_NAME>:8005 will serve yours.

Step 12 — Using an Nginx reverse proxy to run our application on our domain name

In our Nginx configuration, the server listens on the default HTTP port 80. We specify that requests for the contentsea.tk domain should be proxied to our hacker container service, which runs on port 8005.

nginx.conf
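The nginx.conf gist is not shown inline; here is a sketch of such a configuration. The server_name uses the domain from this article, and "hacker" resolving as a hostname assumes Nginx runs on the same Compose network as the hacker service.

```nginx
# nginx.conf -- sketch; proxy headers are common defaults.
events {}
http {
    server {
        listen 80;
        server_name contentsea.tk;

        location / {
            # "hacker" resolves to the Flask container on the Compose network
            proxy_pass http://hacker:8005;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```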

Update your docker-compose.yml file so that it also runs an Nginx container service.

docker-compose.yml
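The updated compose file is embedded as a gist. A sketch of the service that could be added alongside elasticsearch and hacker, assuming nginx.conf sits next to the compose file:

```yaml
# docker-compose.yml (excerpt) -- the added Nginx service; mount path
# and image tag are assumptions.
  nginx:
    image: "nginx:latest"
    networks:
      - frontend
    restart: always
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "80:80"
```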

Then run your application using sudo docker-compose up. You can see your application when you hit your domain name.

contentsea.tk

Step 13 — Creating a free SSL certificate for your application using Let’s Encrypt

Go to https://www.sslforfree.com, choose manual verification (DNS), and follow the instructions. Then download the SSL certificate files: ca_bundle.crt, certificate.crt, and private.key.

Now copy the contents of ca_bundle.crt and paste them at the bottom of certificate.crt, so that the file contains your certificate followed by the CA bundle.

On your server, create a directory named certs in the root directory. Inside it, create a new file called certificate.crt and paste in the combined certificate contents, then create a file called private.key and paste in the private key contents.

root@ubuntu-s-1vcpu-1gb-blr1-01:~# pwd
/root
root@ubuntu-s-1vcpu-1gb-blr1-01:~# mkdir certs
root@ubuntu-s-1vcpu-1gb-blr1-01:~# cd certs/
root@ubuntu-s-1vcpu-1gb-blr1-01:~/certs# touch certificate.crt
root@ubuntu-s-1vcpu-1gb-blr1-01:~/certs# vi certificate.crt
root@ubuntu-s-1vcpu-1gb-blr1-01:~/certs# vi private.key
root@ubuntu-s-1vcpu-1gb-blr1-01:~/certs# ls
certificate.crt private.key

Nginx is an extremely efficient and quite flexible web server, and it gives you several ways to configure a redirect, so you can choose the one that suits you best. Here, whenever a request arrives on port 80, it is redirected to port 443, where HTTPS listens.

Here is our updated docker-compose file with HTTPS support.
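Since that file is embedded as a gist, here is a sketch of how the Nginx service could change: it mounts the certs directory created above and publishes port 443 (the container-side certificate path is an assumption that must match what nginx.conf references).

```yaml
# docker-compose.yml (excerpt) -- Nginx service with TLS; the container
# certificate path is an assumption.
  nginx:
    image: "nginx:latest"
    networks:
      - frontend
    restart: always
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - /root/certs:/etc/nginx/certs:ro   # certificate.crt and private.key
    ports:
      - "80:80"
      - "443:443"
```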

Now run docker-compose up to start the application, and you will see HTTPS when you hit your domain name.

Finally, check whether your HTTPS certificate is valid by looking at the status code returned when you hit your domain name. If the certificate is valid, it should return status code 200.

Here is a simple Python script to check the status code of your application.

check_status_code.py
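As a stand-in for the embedded gist, a small sketch; note that requests verifies the TLS certificate by default, so an invalid certificate raises an SSLError rather than returning a status code.

```python
# check_status_code.py -- sketch; requests verifies the TLS certificate
# by default, so an invalid certificate raises SSLError.
import requests

def certificate_ok(status):
    # A valid certificate and a healthy app should answer with HTTP 200.
    return status == 200

def check(url):
    return requests.get(url, timeout=10).status_code

if __name__ == "__main__":
    try:
        code = check("https://contentsea.tk")
        print(code, "OK" if certificate_ok(code) else "unexpected status")
    except requests.exceptions.SSLError as exc:
        print("Certificate problem:", exc)
    except requests.ConnectionError as exc:
        print("Could not reach the site:", exc)
```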

Thanks for reading and good luck!