<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Maanav Shah on Medium]]></title>
        <description><![CDATA[Stories by Maanav Shah on Medium]]></description>
        <link>https://medium.com/@maanavshah?source=rss-4ae26f7ee9c------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*IvWFlkWKvDdk6PrracHgJw.jpeg</url>
            <title>Stories by Maanav Shah on Medium</title>
            <link>https://medium.com/@maanavshah?source=rss-4ae26f7ee9c------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 16 May 2026 13:15:55 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@maanavshah/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[What is hot cache vs cold cache?]]></title>
            <link>https://medium.com/@maanavshah/what-is-hot-cache-vs-cold-cache-7b00b4329893?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/7b00b4329893</guid>
            <category><![CDATA[cache]]></category>
            <category><![CDATA[memory-improvement]]></category>
            <category><![CDATA[inode]]></category>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Sun, 26 Mar 2023 11:55:28 GMT</pubDate>
            <atom:updated>2023-03-26T11:55:28.979Z</atom:updated>
            <content:encoded><![CDATA[<p>A cache is a structure that holds some values (inodes, memory pages, disk blocks, etc.) for faster lookup.</p><p>A cache works by storing short references in a fast search data structure (hash table, B+ tree) or on faster access media (RAM vs. HDD, SSD vs. HDD).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/678/0*l4fM78fvWwzwk2mb.jpg" /><figcaption>Cache vs Memory</figcaption></figure><p>To be able to do this fast search, your cache needs to hold values. Let’s look at an example.</p><p>Say you have a Linux system with some filesystem. To access a file in the filesystem you need to know where your file starts on the disk. This information is stored in the <strong>inode</strong>. For simplicity, we say that the inode table is stored somewhere on disk.</p><p>Now imagine that you need to read the file <em>/etc/fstab</em>. To do this you need to read the inode table from disk (10 ms), parse it to get the starting block of the file, and then read the file itself (10 ms). Total: ~20 ms.</p><p>This is a lot of work for a single read. So you add a cache in the form of a hash table in RAM. RAM access takes about 10 ns — roughly a million times faster than a 10 ms disk read. Each row in that hash table holds two values.</p><pre>(inode number or filename) : (starting disk block)</pre><p>But the problem is that at the start your cache is empty — such a cache is called a <strong>cold cache</strong>. To exploit the benefits of your cache you need to fill it with some values. How does that happen? When you look for some file, you first look in your inode cache. If you don’t find the inode in the cache (a <em>cache miss</em>), you shrug and do the full read cycle: read the inode table, parse it, and read the file itself. But after the parsing step, you save the inode number and the parsed starting disk block in your cache. And so it goes on and on — you try to read another file, you look in the cache, you get a cache miss (your cache is cold), you read from disk, you add a row to the cache.</p><p>So a cold cache doesn’t give you any speedup, because you are still reading from disk. In some cases a cold cache even makes your system slower, because you’re doing extra work (the extra lookup in the table) to warm up your cache.</p><p>After some time you’ll have some values in your cache, and at some point you try to read a file, you look in the cache and BAM! you find the inode (a <em>cache hit</em>)! Now you have the starting disk block, so you skip reading the inode table and start reading the file itself! You have just saved 10 ms!</p><p>Such a cache is called a <strong>warm cache</strong> (or <strong>hot cache</strong>) — a cache holding values that give you cache hits.</p>
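<p>A minimal Python sketch of this warm-up behavior (read_inode_table is a hypothetical stand-in for the slow ~10 ms disk path):</p><pre>inode_cache = {}  # filename : starting disk block; starts empty (cold)<br><br>def lookup_block(filename):<br>    if filename in inode_cache:            # cache hit: ~10 ns<br>        return inode_cache[filename]<br>    block = read_inode_table(filename)     # cache miss: ~10 ms disk read<br>    inode_cache[filename] = block          # warm the cache for next time<br>    return block</pre><p><strong>TL;DR</strong> Think of the cold and warm engine of a car. A cold cache doesn’t hold any values and can’t give you any speedup because, well, it’s empty. A warm cache holds some values and can give you that speedup.</p><p>That’s it. Hope this helps.</p><p><em>If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend or sharing it on LinkedIn or Twitter. Thank you!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7b00b4329893" width="1" height="1" alt="">]]></content:encoded>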
        </item>
        <item>
            <title><![CDATA[Upgrade Boto for Python 3 and Django 2.2]]></title>
            <link>https://awstip.com/upgrade-boto-for-python-3-and-django-2-2-ae89b35d801e?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/ae89b35d801e</guid>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[django]]></category>
            <category><![CDATA[boto3]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Fri, 24 Feb 2023 16:35:37 GMT</pubDate>
            <atom:updated>2023-03-13T18:15:11.813Z</atom:updated>
            <content:encoded><![CDATA[<p>I recently wrote a blog post on upgrading your Python and Django services. This post is an extension of that one, for when you also need to upgrade the Boto package. <br><a href="https://medium.com/@maanavshah/upgrading-your-django-and-python-microservice-9cb480541907">https://medium.com/@maanavshah/upgrading-your-django-and-python-microservice-9cb480541907</a></p><h3>Boto</h3><p>Boto is the AWS package our app had been using while running on Python 2.7.</p><p>It is used for all our Amazon S3-related operations, like:</p><ul><li>Upload file</li><li>Upload data (JSON/string)</li><li>Download file</li><li>Generate a signed URL of an S3 file</li></ul><p>However, the boto package is not fully supported on Python 3, and AWS now promotes its successor, boto3, which has extensive support for Python 3. Hence, the shift to boto3 was inevitable.</p><h4>What major changes were made code-wise?</h4><p>The change to boto3 is quite straightforward, and one can refer to the documentation <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/index.html">here</a>.</p><p>However, below are the major changes observed when migrating the above-mentioned S3 operations:</p><ul><li>The boto3 <strong>client/resource</strong> object now also expects the AWS region (the region_name param) along with the aws_access_key_id and aws_secret_access_key of the bucket to perform any operation. <br>This isn’t a breaking change per se, but if region_name isn’t provided, boto3 tries to perform the operation against the default AWS region. For example,</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ceCHGQBUa_7gsMW8kmd6TQ.png" /></figure><ul><li>Earlier, with boto, you would need to create a new S3Connection object, get a bucket object from it, generate the remote key, and then perform the actual operation. For example,</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*1pneZKu6Fd2lPH-CiWi5hw.png" /></figure><p>Now, with boto3, all one needs is an instance of the <strong>client/resource</strong>, after which one can begin performing the operation. For example,</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/802/1*nvsNdR3GN-j2YSYQdUDMTQ.png" /></figure>
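<p>Since the examples above are screenshots, here is a rough text equivalent of both flows (bucket, key, and region names are placeholders, and ACCESS_KEY/SECRET_KEY are assumed to be defined elsewhere):</p><pre># boto (old)<br>from boto.s3.connection import S3Connection<br><br>conn = S3Connection(ACCESS_KEY, SECRET_KEY)<br>bucket = conn.get_bucket(&#39;my-bucket&#39;)<br>key = bucket.new_key(&#39;reports/report.csv&#39;)<br>key.set_contents_from_filename(&#39;report.csv&#39;)<br><br># boto3 (new)<br>import boto3<br><br>s3 = boto3.client(<br>    &#39;s3&#39;,<br>    region_name=&#39;ap-south-1&#39;,<br>    aws_access_key_id=ACCESS_KEY,<br>    aws_secret_access_key=SECRET_KEY,<br>)<br>s3.upload_file(&#39;report.csv&#39;, &#39;my-bucket&#39;, &#39;reports/report.csv&#39;)<br><br># generate a signed URL for the uploaded object<br>url = s3.generate_presigned_url(<br>    &#39;get_object&#39;,<br>    Params={&#39;Bucket&#39;: &#39;my-bucket&#39;, &#39;Key&#39;: &#39;reports/report.csv&#39;},<br>    ExpiresIn=3600,<br>)</pre><p><a href="https://boto3.amazonaws.com/v1/documentation/api/latest/index.html">Boto3 1.42.9 documentation</a></p><h3>That’s it. Hope this helps.</h3><p><em>If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend or sharing it on LinkedIn or Twitter. Thank you!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ae89b35d801e" width="1" height="1" alt=""><hr><p><a href="https://awstip.com/upgrade-boto-for-python-3-and-django-2-2-ae89b35d801e">Upgrade Boto for Python 3 and Django 2.2</a> was originally published in <a href="https://awstip.com">AWS Tip</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>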
        </item>
        <item>
            <title><![CDATA[Upgrading your Django and Python microservice]]></title>
            <link>https://medium.com/@maanavshah/upgrading-your-django-and-python-microservice-9cb480541907?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/9cb480541907</guid>
            <category><![CDATA[django]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[upgrade]]></category>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Mon, 20 Feb 2023 12:17:03 GMT</pubDate>
            <atom:updated>2023-02-20T15:30:03.765Z</atom:updated>
            <content:encoded><![CDATA[<p>We at Blinkit have many services in Django. We decided to upgrade to Python 3.8 (from 2.7) and Django 2.2 (from 1.8). While it&#39;s a complicated process, upgrading to the latest version of Django has several benefits:</p><ul><li>New features, improvements, better performance, and bug fixes.</li><li>Older versions do not receive any security updates.</li><li>Upgrading to a newer Django makes future upgrades as smooth as possible.</li></ul><h3>Prerequisite</h3><ul><li>Explore the differences between the Python and Django versions involved. You should familiarize yourself with the changes that were made in the newer versions.</li><li>Choose your tool for maintaining compatibility between different Python versions.</li><li>Explore coverage tools to measure your test coverage; this will help ensure you still have a working service.</li></ul><h3>Dependencies</h3><p>Along with Python, in most cases it will be necessary to upgrade your Django-related dependencies to their latest versions as well.</p><h4>Caniusepython3</h4><p>Use <a href="https://pypi.org/project/caniusepython3">caniusepython3</a> to find out which of your dependencies are blocking your use of Python 3.</p><pre>python2 -m pip install caniusepython3</pre><p>Once you have identified the blocking dependencies, you can upgrade each of them to its latest release that supports Python 3.</p><h3>Start Upgrading</h3><p>Now you need to convert your Python codebase from 2 to 3. For this, you can use conversion tools that will make your life easier. We also needed to pause active tasks/features and update all requirements in one go.</p><h4>Futurize</h4><p>We used <a href="https://python-future.org/automatic_conversion.html#futurize-quick-start-guide">futurize</a> to make our codebase Python 2/3 compatible.</p><pre>pip install future</pre><p>You can <a href="https://python-future.org/automatic_conversion.html#futurize-quick-start-guide">futurize</a> your codebase in the following stages:</p><p><strong>Step 1:</strong></p><p>The goal of this step is to modernize the Python 2 code without introducing any new dependencies (on future or e.g. six) at this stage. The command below writes all changes to the original files and keeps a backup copy of each modified file. If you don’t want the changes written automatically, remove -w; the changes will only be displayed in the terminal.</p><pre>futurize --stage1 -w **/*.py</pre><p><strong>Step 2:</strong></p><p>The goal of this step is to get the tests passing first on Python 3 and then on Python 2 again with the help of the future package:</p><p><strong>a. </strong>To preview the changes without writing them, run the following command.</p><pre>futurize --stage2  **/*.py</pre><p>To apply the changes, add the -w argument. If you would like futurize to import all the changed builtins to have their Py3 semantics on Py2, invoke it like this:</p><pre>futurize --stage2 --all-imports myfolder/*.py</pre><p><strong>b</strong>. Re-run your tests on Py3 now. Make changes until your tests pass on Py3.</p><p><strong>c</strong>. Now run your tests on Py2 and note the errors. Add wrappers from the future package to re-enable Python 2 compatibility. See the <a href="https://python-future.org/compatible_idioms.html#compatible-idioms">Cheat Sheet: Writing Python 2–3 compatible code</a> cheat sheet and <a href="https://python-future.org/what_else.html#what-else">What else you need to know</a> for more info.</p><p><strong>d</strong>. After each change, re-run the tests on Py3 and Py2 to ensure they pass on both.</p><p><strong>e</strong>. Don’t forget to include future as a dependency in your requirements.txt.</p><p><strong>f</strong>. You’re done! Celebrate! Commit your changes and push your code.</p><blockquote>Note: It is very important to review the changes after converting from Python 2 to 3 (use the <a href="https://python-future.org/compatible_idioms.html#compatible-idioms">cheat sheet</a>). Some of the compatibility functions have extra memory overhead, and a few are inefficient on Python 3.</blockquote><h4>2to3</h4><p>We can also use the <a href="https://docs.python.org/3/library/2to3.html">2to3</a> package to convert our codebase from Python 2 to Python 3.</p><ul><li>Install <a href="https://docs.python.org/3/library/2to3.html">2to3</a> and run it on *.py files</li><li><a href="https://docs.python.org/3/library/2to3.html">2to3</a> will overwrite the existing Python files, creating backup files for the files it modifies.</li></ul><h3>Python caveats</h3><p>Although futurize and 2to3 will make your code compatible with Python 3, you still have to manually check your entire repository for some subtle errors.</p><h4>Division operator</h4><p>The division operator works differently in Python 2 and Python 3: with two integers, / performs floor division in Python 2 but true (float) division in Python 3.</p><pre># python 2<br>x = 10/5<br>print x # 2<br><br># python3<br>x = 10/5<br>print(x) # 2.0<br><br># python 2<br>for i in range(product_qty/10):<br><br># python 3<br>for i in range(int(product_qty/10)):</pre><p>Make sure nothing in your repository ends up doing float division where an integer is expected, especially inside iteration like range().</p><h4>Rounding decimals</h4><pre># python 2<br><br>from decimal import Decimal<br>round(Decimal(&quot;10.121&quot;), 2) # 10.12<br><br># python 3<br><br>from decimal import Decimal<br>round(Decimal(&quot;10.121&quot;), 2) # Decimal(&quot;10.12&quot;)</pre><h4>Unicode str</h4><pre># python 2<br>from hashlib import md5<br>md5(&quot;abcd&quot;).hexdigest() # &#39;e2fc714c4727ee9395f324cd2e7f331f&#39;<br><br># python 3<br>from hashlib import md5<br>md5(&quot;abcd&quot;).hexdigest() # TypeError: Unicode-objects must be encoded before hashing</pre>
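<p>The fix for the hashing error is to encode the string before hashing it:</p><pre># python 3<br>from hashlib import md5<br>md5(&quot;abcd&quot;.encode(&quot;utf-8&quot;)).hexdigest() # &#39;e2fc714c4727ee9395f324cd2e7f331f&#39;</pre>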
<p>Check out this <a href="https://portingguide.readthedocs.io/en/latest/strings.html">porting guide</a> for more caveats on str/Unicode.</p><h3>Django Changes</h3><p>The following are the changes and challenges we faced after upgrading from Django 1.8 to Django 2.2.</p><h4>On Delete</h4><p>Django 2.2 expects an <em>on_delete</em> argument when declaring a ForeignKey or <em>OneToOneField</em> on Django models. So we added an <em>on_delete=models.DO_NOTHING</em> argument to the existing ForeignKey declarations. The same change also needs to be made in the existing Django migration files, and making it generates a new migration. However, since there should be no alteration to the database, we decided to fake that migration.</p>
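<p>A minimal sketch of the change (the model and field names here are illustrative):</p><pre>from django.db import models<br><br>class Order(models.Model):<br>    # Django &gt;= 2.0 makes on_delete mandatory for ForeignKey/OneToOneField<br>    customer = models.ForeignKey(&#39;Customer&#39;, on_delete=models.DO_NOTHING)</pre><p>The generated migration can then be marked as applied without running any SQL using <em>python manage.py migrate --fake</em>.</p>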
<h4>Transaction hooks and Database Client</h4><p>In Django 1.8, we used <em>transaction_hooks.backends.mysql</em> as our default MySQL database engine. In Django 2.2, transaction hooks are included in Django’s db package by default, so we updated the engine to <em>django.db.backends.mysql</em> for both the master and the slave configuration. Also, the MySQL client we used, MySQL-python, is not supported on Python 3; we upgraded to mysqlclient instead.</p><h4>404 Handler</h4><p>If you’ve written a <a href="https://docs.djangoproject.com/en/2.2/ref/views/#django.views.defaults.page_not_found">custom</a> 404 handler view, the newer Django version requires your view to accept the exception as a <em>required argument</em>.</p><h4>Backward incompatible changes</h4><p>Depending on which Django version you’re moving away from, you should go through the release notes thoroughly for incompatible changes.<br>- <a href="https://docs.djangoproject.com/en/1.11/releases/1.11/#backwards-incompatible-changes-in-1-11">Migrating</a> from Django 1.8 to Django 1.11<br>- <a href="https://docs.djangoproject.com/en/2.2/releases/2.2/#backwards-incompatible-2-2">Migrating</a> from Django 1.11 to Django 2.2</p><h3>Django Rest Framework</h3><p>You also need to upgrade DRF, because older DRF releases do not support newer Django versions.</p><p>Some changes we made after upgrading DRF to 3.12.2:</p><h4>Serializer Fields</h4><p>When defining a serializer, newer DRF expects the fields to be declared explicitly in the Meta class. So, for the existing serializers, we added the following code to resolve the issue:</p><pre>class Meta:<br>  fields = &#39;__all__&#39;</pre><h4>list_route and detail_route decorators</h4><ul><li>Replace <em>detail_route</em> uses with <em>@action(detail=True)</em></li><li>Replace <em>list_route</em> uses with <em>@action(detail=False)</em></li></ul><h3>Celery and RabbitMQ</h3><p>A lot has changed in both Celery and Python in recent years.</p><ul><li>async and await are now reserved keywords in Python (fully reserved since 3.7), which caused a lot of trouble for some <a href="https://github.com/celery/celery/issues/4500">Celery versions</a></li><li>celery&gt;=4.0 <a href="https://docs.celeryproject.org/en/stable/userguide/calling.html#serializers">changed its default</a> task_serializer from <strong>pickle</strong> to <strong>json</strong>, which means something like <em>def task(model_instance)</em> might work on your older Celery but won’t work on newer ones unless you explicitly set pickle as your task_serializer in your settings.py</li><li>We didn&#39;t want to interfere with the old Celery queues, so we created a new virtual host (vhost) in RabbitMQ while keeping the Celery queue names the same. With this, we ensured that new workers pick tasks from the new queues, and old workers (running on old pods) keep consuming from the old queues.</li><li>Check your CELERY_BROKER_URL to find out which user and vhost are used for the RabbitMQ connection. To create a new vhost in RabbitMQ, do the following:</li></ul><pre>ssh &lt;rabbitmq_instance&gt;<br>sudo rabbitmqctl add_vhost v1<br>sudo rabbitmqctl set_permissions -p &quot;v1&quot; &quot;user&quot; &quot;.*&quot; &quot;.*&quot; &quot;.*&quot;</pre><h3>Redis</h3><p>The two stacks use different pickle protocol versions: Django 1.8 on Python 2 writes pickle protocol 2, while Django 2.2 on Python 3 writes pickle protocol 4. Reading one format from the other throws an unsupported pickle protocol ValueError.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/614/1*N7op6G0EHvdkVi8zcFeu7w.png" /></figure><p>Also, Python 3 stores strings as Unicode by default, while Python 2 stores them as bytes. So, when using set and get across the Python 2 and Python 3 environments, we will read different values, which will break our system; for example, a value set from the Python 3 environment will not read back correctly in the Python 2 environment.</p><p>So our solution was to use a <strong>different namespace</strong> (database) in Redis. This separates the two stacks’ data and avoids the pickle version collision. For example, to use DB 10 as the Redis namespace you can use the following configuration.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GnYW0IrDSmxUGVfrGrxKug.png" /></figure>
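<p>A rough text equivalent of the configuration in the screenshot above, assuming the django-redis-cache backend linked in the references (host and database number are illustrative):</p><pre># settings.py<br>CACHES = {<br>    &#39;default&#39;: {<br>        &#39;BACKEND&#39;: &#39;redis_cache.RedisCache&#39;,<br>        &#39;LOCATION&#39;: &#39;localhost:6379&#39;,<br>        &#39;OPTIONS&#39;: {<br>            &#39;DB&#39;: 10,  # separate Redis database for the new stack<br>        },<br>    },<br>}</pre>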
<h3>That’s it. Hope this helps.</h3><p><em>If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend or sharing it on LinkedIn or Twitter. Thank you!</em></p><h4>References</h4><p><a href="https://python-future.org/compatible_idioms.html">https://python-future.org/compatible_idioms.html</a></p><p><a href="https://docs.djangoproject.com/en/2.2/releases/2.2/#backwards-incompatible-2-2">https://docs.djangoproject.com/en/2.2/releases/2.2/#backwards-incompatible-2-2</a></p><p><a href="https://docs.djangoproject.com/en/1.11/releases/1.11/#backwards-incompatible-changes-in-1-11">https://docs.djangoproject.com/en/1.11/releases/1.11/#backwards-incompatible-changes-in-1-11</a></p><p><a href="https://docs.celeryproject.org/en/stable/whatsnew-5.0.html#upgrading-from-celery-4-x">https://docs.celeryproject.org/en/stable/whatsnew-5.0.html#upgrading-from-celery-4-x</a></p><p><a href="https://django-redis-cache.readthedocs.io/en/latest/advanced_configuration.html">https://django-redis-cache.readthedocs.io/en/latest/advanced_configuration.html</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9cb480541907" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Performance optimization for LIKE queries in PostgreSQL]]></title>
            <link>https://medium.com/swlh/performance-optimization-for-like-queries-in-postgresql-514ba73d9244?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/514ba73d9244</guid>
            <category><![CDATA[postgres]]></category>
            <category><![CDATA[index]]></category>
            <category><![CDATA[like]]></category>
            <category><![CDATA[performance]]></category>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Tue, 28 Jun 2022 14:47:47 GMT</pubDate>
            <atom:updated>2022-07-06T08:07:38.431Z</atom:updated>
            <content:encoded><![CDATA[<p>Here at blinkit, we’re trying to make sure that we query data quickly and efficiently. Poor queries mean we’re wasting both time and expensive resources.</p><p>One of the most common searches on an e-commerce website is the product catalog, where users search by brand or product name. For example, if you want to buy an Apple MacBook Pro, you may search for either Apple or MacBook.</p><p>For querying such a column we generally use the <strong>LIKE/ILIKE</strong> and <strong>~/~*</strong> operators. A PostgreSQL query will look like this:</p><pre>SELECT * FROM product WHERE product_name LIKE &#39;Apple%&#39;;</pre><pre>SELECT * FROM product WHERE product_name ILIKE &#39;%macbook%&#39;;</pre><p>And it is super slow if we don’t add an index. So if I just add an index here, PostgreSQL will create a <strong>B-tree index</strong>.</p><p>A <strong>B-tree</strong> is a self-balancing tree that maintains sorted data and allows operations in logarithmic time. B-trees can handle range queries on sorted data <em>(&lt;, ≤, &gt;, ≥, between, in, is null, is not null)</em>.</p><pre>CREATE INDEX product_idx_1 ON product (product_name);</pre><p>A B-tree index can also be used for queries that involve the pattern-matching operators <strong>LIKE or ~</strong>, if the pattern is a constant and the anchor is at the beginning of the pattern. For example, queries matching column_name LIKE &#39;Apple%&#39; or column_name ~ &#39;^Apple&#39; can use the index.</p>
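<p>One caveat worth adding: if your database uses a non-C locale (the usual case), a plain B-tree index cannot serve even these anchored LIKE queries; you need to create the index with the text_pattern_ops operator class (the index name below is illustrative):</p><pre>CREATE INDEX product_idx_pattern ON product (product_name text_pattern_ops);</pre>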
<p>But querying &#39;%macbook%&#39; or &#39;%pro&#39; will still not be efficient. For such queries, the query planner resorts to a sequential scan of the full table, which is not optimized.</p><h3>Enter, GIN indexes.</h3><p><em>GIN stands for Generalized Inverted Index.</em></p><p>We can create a GIN index to speed up text searches:</p><pre>CREATE INDEX index_name ON table_name USING GIN (to_tsvector(&#39;english&#39;, column_name));</pre><p>The query above specifies that the english configuration will be used to parse and normalize the strings. For the searching part, a simple query to select each row whose column contains the search term is:</p><pre>SELECT * FROM table_name WHERE to_tsvector(&#39;english&#39;, column_name) @@ to_tsquery(&#39;english&#39;, &#39;text_to_search&#39;);</pre><p>This will also find related words: for example, if you search for <em>friend</em>, it will also match words such as <em>friends</em> and <em>friendly</em>, since all of these are reduced to the same normalized lexeme.</p><p>To test this out, I created a <strong>movies</strong> table with ~1 million records:<br>(Please find the repo here: <a href="https://github.com/maanavshah/gin-index-101">https://github.com/maanavshah/gin-index-101</a>)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gBp1CyvWr1go7pKmbU1Ntg.png" /><figcaption>table structure — movies</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IabMsjzLC5WF9SeZxxGPGg.png" /><figcaption>select * from movies limit 10;</figcaption></figure><p>We will compare the performance of the following queries:</p><h4>Query 1: beginning of the pattern</h4><pre>EXPLAIN ANALYSE SELECT * FROM movies WHERE title LIKE &#39;Pirate%&#39;;</pre><h4>Query 2: beginning of the pattern (case-insensitive)</h4><pre>EXPLAIN ANALYSE SELECT * FROM movies WHERE title ILIKE &#39;Pirate%&#39;;</pre><h4>Query 3: contains pattern in the middle</h4><pre>EXPLAIN ANALYSE SELECT * FROM movies WHERE title ILIKE &#39;%sea%&#39;;</pre><h3>No index performance</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TBqyEvKhYUA5sN84JD2goQ.png" /><figcaption>Execution Time: 51.886 ms</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*C03_bjSIVXCuN9jmdvFeZQ.png" /><figcaption>Execution Time: 109.932 ms</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-qzpZr0eSLh64-36EbKb_A.png" /><figcaption>Execution Time: 152.286 ms</figcaption></figure><h4>B-tree index performance</h4><p>We can observe that after creating a B-tree index, the performance of the beginning-of-the-pattern query improved from <strong>51.8 ms</strong> to <strong>2.6 ms</strong>; however, performance for queries 2 and 3 did not improve.</p><pre>CREATE INDEX movies_name_idx_0 ON movies (title);</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eVGCX5VfJUI237_KFpjokw.png" /><figcaption>Execution Time: 2.629 ms</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*bS7tQ6_yq1KTkfsb9WPWoA.png" /><figcaption>Execution Time: 123.529 ms</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YywPo_81DWqFMoHcMQyJrw.png" /><figcaption>Execution Time: 129.386 ms</figcaption></figure><pre>DROP INDEX movies_name_idx_0;</pre><h4>GIN index performance</h4><pre>CREATE INDEX movies_name_idx_1 ON movies USING GIN (to_tsvector(&#39;english&#39;, title));</pre><p>The syntax for running queries 2 and 3 is as follows:</p><pre>EXPLAIN ANALYZE SELECT * FROM movies WHERE to_tsvector(&#39;english&#39;, title) @@ to_tsquery(&#39;english&#39;, &#39;Pirate&#39;);</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*1sYB_EloYoWEJQ2fDM8tPA.png" /><figcaption>Execution Time: 2.294 ms</figcaption></figure><pre>EXPLAIN ANALYZE SELECT * FROM movies WHERE to_tsvector(&#39;english&#39;, title) @@ to_tsquery(&#39;english&#39;, &#39;sea&#39;);</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gX9ZaN3tZJsldJFeavWXkg.png" /><figcaption>Execution Time: 6.330 ms</figcaption></figure><pre>DROP INDEX movies_name_idx_1;</pre><p><strong>Conclusion:</strong></p><p>We can observe that after creating the GIN index, performance for queries 2 and 3 improved significantly compared with the B-tree index: for a match in the middle of the string, query time dropped from <strong>129.3 ms</strong> with the B-tree index to <strong>6.3 ms</strong> with the GIN index.</p><h4>Reference:</h4><p><a href="https://pganalyze.com/blog/gin-index">https://pganalyze.com/blog/gin-index<br></a><a href="https://www.postgresql.org/docs/13/textsearch-tables.html">https://www.postgresql.org/docs/13/textsearch-tables.html</a></p><p>You can also use <strong>pg_trgm</strong> for similar use cases.<br><a href="https://niallburkley.com/blog/index-columns-for-like-in-postgres/">https://niallburkley.com/blog/index-columns-for-like-in-postgres/</a></p>
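<p>For reference, a sketch of the pg_trgm alternative mentioned above: a trigram GIN index lets the planner serve plain LIKE/ILIKE queries directly, with no to_tsvector rewrite (the index name is illustrative):</p><pre>CREATE EXTENSION IF NOT EXISTS pg_trgm;<br>CREATE INDEX movies_title_trgm_idx ON movies USING GIN (title gin_trgm_ops);<br>-- queries like the one below can now use the index directly<br>SELECT * FROM movies WHERE title ILIKE &#39;%sea%&#39;;</pre>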
src="https://cdn-images-1.medium.com/max/1024/1*gX9ZaN3tZJsldJFeavWXkg.png" /><figcaption>Execution Time: 6.330 ms</figcaption></figure><pre>DROP INDEX movies_name_idx_1;</pre><p><strong>Conclusion:</strong></p><p>We can observe that on creating the GIN index, performance for queries 2 and 3 has significantly improved when compared with B-tree. The time taken by the GIN index was reduced from <strong>129.3 ms</strong> to <strong>6.3 ms</strong> over B-tree for matching anchor in the middle.</p><h4>Reference:</h4><p><a href="https://pganalyze.com/blog/gin-index">https://pganalyze.com/blog/gin-index<br></a><a href="https://www.postgresql.org/docs/13/textsearch-tables.html">https://www.postgresql.org/docs/13/textsearch-tables.html</a></p><p>You can also use <strong>pg_trgm</strong> for similar use cases.<br><a href="https://niallburkley.com/blog/index-columns-for-like-in-postgres/">https://niallburkley.com/blog/index-columns-for-like-in-postgres/</a></p><p>That’s it. Hope this helps.</p><p><em>If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend or sharing it on LinkedIn or Twitter. Thank you!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=514ba73d9244" width="1" height="1" alt=""><hr><p><a href="https://medium.com/swlh/performance-optimization-for-like-queries-in-postgresql-514ba73d9244">Performance optimization for LIKE queries in PostgreSQL</a> was originally published in <a href="https://medium.com/swlh">The Startup</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why use Nginx for Flask/Django/RoR?]]></title>
            <link>https://medium.com/analytics-vidhya/why-use-nginx-for-flask-django-ror-d31a00de2141?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/d31a00de2141</guid>
            <category><![CDATA[ruby-on-rails]]></category>
            <category><![CDATA[web-server]]></category>
            <category><![CDATA[flask]]></category>
            <category><![CDATA[nginx]]></category>
            <category><![CDATA[application-server]]></category>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Fri, 17 Jul 2020 21:18:37 GMT</pubDate>
            <atom:updated>2020-07-21T05:23:07.143Z</atom:updated>
            <content:encoded><![CDATA[<h3>Why use Nginx? Flask, Django, RoR, and NodeJS are not production servers.</h3><h4>Why do we use Nginx, a web server, in front of an application such as Flask, Django, Ruby on Rails, NodeJS, etc.?</h4><p>I will be talking about <strong>Flask</strong> here, but this applies to all frameworks such as <strong>Django, Ruby on Rails, NodeJS, etc.</strong></p><p>If you are interested in how to deploy Flask applications with uWSGI and Nginx, please check <a href="https://medium.com/@maanavshah/deploy-flask-applications-with-uwsgi-and-nginx-on-ubuntu-18-04-2a47f378c3d2">this</a> out.</p><p>You’ve built your Flask web app and are working on deploying the site. It’s your first, small app and you kinda expected that setting <strong>debug</strong> to <strong>False</strong> on the <strong>app.run</strong> should be enough. Maybe enable <strong>threaded</strong> too?</p><p>You really shouldn’t rely on that. The official docs <a href="https://flask.palletsprojects.com/en/1.1.x/deploying/">agree</a>: they clearly state that <strong>Flask’s built-in server is not suitable for production.</strong></p><p>What now? Well, no need to be confused. All is fine, you just need to understand what the Flask development web server is meant for, what it lacks, and what to use instead.</p><h3>Flask’s Built-In Web Server</h3><p>The built-in Flask web server is provided for development convenience.</p><p>With it, you can make your app accessible on your local machine without having to set up other services and make them play together nicely. However, it is only meant to be used by one person at a time, and it is built that way. It can also serve static files, but does so <em>very slowly</em> compared to tools that are built to do it quickly. This does not matter when only one person is accessing it, so it’s perfect for what it is meant for.</p><p>When running a web app in production, you want it to be able to handle multiple users and many requests, without those fine people having to wait noticeable amounts of time for the pages and static files to load.</p><h3>A Production Stack</h3><p>A production setup usually consists of multiple components, each designed and built to be really good at one specific thing. They are fast, reliable and very focused.</p><p>Communication with the whole thing, as in the case of the built-in web server, happens via HTTP. A request comes in and arrives at the first component — a dedicated <strong>web server</strong>. It is great at reading static files from disk (your CSS files, for example) and handling multiple requests. When a request is not for a static file but for your app, it gets passed on down the stack.</p><p>The <strong>application server</strong> gets those fancy requests and converts the information from them into Python objects which are usable by frameworks. How this is supposed to happen is described by a specification people agreed on — <a href="https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface"><strong>WSGI</strong></a>.</p><p>Your Flask app does not actually <em>run</em> as you would think a server would — waiting for requests and reacting to them. It can be seen as a function that the application server calls, passing in the request object.</p>
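<p>To make this concrete, here is a minimal sketch of what the WSGI contract boils down to; a Flask app object is, at its core, a callable of this shape:</p><pre>def application(environ, start_response):<br>    # environ: a dict describing the request (path, headers, etc.)<br>    # start_response: a callback used to set the status line and headers<br>    start_response(&#39;200 OK&#39;, [(&#39;Content-Type&#39;, &#39;text/plain&#39;)])<br>    return [b&#39;Hello from a WSGI app!&#39;]</pre>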
<p>The output of running your app is then packaged up into an HTTP response by the application server and passed back to the web server, which delivers it to the user.</p><h3>Conclusion</h3><p>If you want to run Flask in production, be sure to use a production-ready web server like <strong>Nginx</strong>, and let your app be handled by a <strong>WSGI</strong> application server like <strong>Gunicorn</strong>.</p><p>If you plan on running on Heroku, a web server is provided implicitly. You just need to specify a command to run the application server (again, Gunicorn is fine) in the Procfile.</p><p>That’s it. Hope this helps.</p><p><em>If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend or sharing it on Twitter or Facebook. Thank you!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d31a00de2141" width="1" height="1" alt=""><hr><p><a href="https://medium.com/analytics-vidhya/why-use-nginx-for-flask-django-ror-d31a00de2141">Why use Nginx for Flask/Django/RoR?</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Deploy Flask Applications With uWSGI and Nginx on Ubuntu 18.04]]></title>
            <link>https://medium.com/swlh/deploy-flask-applications-with-uwsgi-and-nginx-on-ubuntu-18-04-2a47f378c3d2?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/2a47f378c3d2</guid>
            <category><![CDATA[nginx]]></category>
            <category><![CDATA[deployment]]></category>
            <category><![CDATA[flask]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[wsgi]]></category>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Tue, 07 Jul 2020 10:37:12 GMT</pubDate>
            <atom:updated>2020-07-31T15:50:06.783Z</atom:updated>
            <content:encoded><![CDATA[<p>If you are interested, here is <a href="https://medium.com/@maanavshah/why-use-nginx-for-flask-django-ror-d31a00de2141">why we use Nginx in front of an application such as Flask, Django, Ruby on Rails, NodeJS, etc.</a></p><p>You can also read about <a href="https://www.fullstackpython.com/wsgi-servers.html">why WSGI is necessary</a>.</p><p>When you have an Ubuntu (or any other Linux) server and want to set up a Flask application using Nginx and uWSGI, begin by logging in as your non-root user:</p><pre>ssh ip_address</pre><h3>Step 1 — Installing Nginx</h3><p>Because Nginx is available in Ubuntu’s default repositories, it is possible to install it from these repositories using the apt packaging system.</p><p>Since this is our first interaction with the apt packaging system in this session, we will update our local package index so that we have access to the most recent package listings. Afterward, we can install nginx:</p><pre>sudo apt-get update<br>sudo apt-get install nginx</pre><h3>Step 2 — Checking your Web Server</h3><p>At the end of the installation process, Ubuntu 18.04 starts Nginx. The web server should already be up and running.</p><p>We can check with the systemd init system to make sure the service is running by typing:</p><pre>sudo systemctl status nginx</pre><p>As you can see below, the service appears to have started successfully.</p><pre>Output<br>● nginx.service - A high performance web server and a reverse proxy server<br>   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)<br>   Active: active (running) since Tue 2020-07-07 07:50:48 UTC; 53min ago<br>     Docs: man:nginx(8)<br> Main PID: 10441 (nginx)<br>    Tasks: 2 (limit: 4373)<br>   Memory: 2.9M<br>   CGroup: /system.slice/nginx.service<br>           ├─10441 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;<br>           └─10443 nginx: worker process</pre><p>However, the best way to test this is to actually request a page from Nginx.</p><p>When you have your server’s IP address, enter it into your browser’s address bar:</p><pre><a href="http://your_server_ip">http://your_server_ip</a></pre><p>You should see the default Nginx landing page:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*czQirWf-0g69Sd_sb_XaIw.png" /></figure><p>This page is included with Nginx to show you that the server is running correctly.</p><h3>Step 3 — Managing the Nginx Process</h3><p>Now that you have your web server up and running, let’s review some basic management commands.</p><p>To stop your web server, type:</p><pre>sudo systemctl stop nginx</pre><p>To start the web server when it is stopped, type:</p><pre>sudo systemctl start nginx</pre><p>To stop and then start the service again, type:</p><pre>sudo systemctl restart nginx</pre><p>If you are simply making configuration changes, Nginx can often reload without dropping connections. To do this, type:</p><pre>sudo systemctl reload nginx</pre><p>By default, Nginx is configured to start automatically when the server boots. If this is not what you want, you can disable this behavior by typing:</p><pre>sudo systemctl disable nginx</pre><p>To re-enable the service to start up at boot, you can type:</p><pre>sudo systemctl enable nginx</pre><h3>Step 4 — Installing the Components from the Ubuntu Repositories</h3><p>Our first step will be to install all of the pieces that we need from the Ubuntu repositories.
We will install pip, the Python package manager, to manage our Python components. We will also get the Python development files necessary to build uWSGI.</p><p>First, let’s update the local package index and install the packages that will allow us to build our Python environment. These include python3-pip, along with a few more packages and development tools necessary for a robust programming environment:</p><pre>sudo apt install python3-pip python3-dev build-essential libssl-dev libffi-dev python3-setuptools</pre><p>With these packages in place, let’s move on to creating a virtual environment for our project.</p><h3>Step 5 — Creating a Python Virtual Environment</h3><p>Next, we’ll set up a virtual environment in order to isolate our Flask application from the other Python files on the system.</p><p>Start by installing virtualenv, which we will use to create the environment:</p><pre>pip3 install virtualenv</pre><p>Next, let’s make a parent directory for our Flask project. Move into the directory after you create it:</p><pre>mkdir ~/myproject<br>cd ~/myproject</pre><p>Create a virtual environment to store your Flask project’s Python requirements by typing:</p><pre>python3 -m virtualenv myprojectenv</pre><p>This will install a local copy of Python and pip into a directory called myprojectenv within your project directory.</p><p>Before installing applications within the virtual environment, you need to activate it. Do so by typing:</p><pre>source myprojectenv/bin/activate</pre><p>Your prompt will change to indicate that you are now operating within the virtual environment. It will look something like this: (myprojectenv)user@host:~/myproject$.</p><h3>Step 6 — Setting Up a Flask Application</h3><p>Now that you are in your virtual environment, you can install Flask and uWSGI and get started on designing your application.</p><p>First, let’s install wheel with the local instance of pip to ensure that our packages will install even if they are missing wheel archives:</p><pre>pip install wheel</pre><p>Next, let’s install Flask and uWSGI:</p><pre>pip install uwsgi flask</pre><h3>Creating a Sample App</h3><p>Now that you have Flask available, you can create a simple application. Flask is a microframework. It does not include many of the tools that more full-featured frameworks might, and exists mainly as a module that you can import into your projects to assist you in initializing a web application.</p><p>While your application might be more complex, we’ll create our Flask app in a single file, called myproject.py:</p><pre>vi ~/myproject/myproject.py</pre><p>The application code will live in this file. It will import Flask and instantiate a Flask object. You can use this to define the functions that should be run when a specific route is requested:</p><pre>from flask import Flask<br>app = Flask(__name__)</pre><pre>@app.route(&quot;/&quot;)<br>def hello():<br>    return &quot;&lt;h1 style=&#39;color:blue&#39;&gt;Hello There!&lt;/h1&gt;&quot;</pre><pre>if __name__ == &quot;__main__&quot;:<br>    app.run(host=&#39;0.0.0.0&#39;)</pre><p>This basically defines what content to present when the root domain is accessed.
Save and close the file when you’re finished.</p><p>Now, you can test your Flask app by typing:</p><pre>python myproject.py</pre><p>You will see output like the following, including a helpful warning reminding you not to use this server setup in production:</p><pre>Output<br>* Serving Flask app “myproject” (lazy loading)<br> * Environment: production<br> WARNING: Do not use the development server in a production environment.<br> Use a production WSGI server instead.<br> * Debug mode: off<br> * Running on <a href="http://0.0.0.0:5000/">http://0.0.0.0:5000/</a> (Press CTRL+C to quit)</pre><p>When you are finished, hit CTRL-C in your terminal window to stop the Flask development server.</p><h3>Creating the WSGI Entry Point</h3><p>Next, let’s create a file that will serve as the entry point for our application. This will tell our uWSGI server how to interact with it.</p><p>Let’s call the file wsgi.py:</p><pre>vi ~/myproject/wsgi.py</pre><p>In this file, let’s import the Flask instance from our application and run it:</p><pre>from myproject import app</pre><pre>if __name__ == &quot;__main__&quot;:<br>    app.run()</pre><p>Save and close the file when you are finished.</p>
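<p>Before moving on, you can quickly check that uWSGI is able to serve the application directly (port 5000 here is just an arbitrary test port, assuming it is reachable on your server):</p><pre>uwsgi --socket 0.0.0.0:5000 --protocol=http -w wsgi:app</pre><p>Visit http://your_server_ip:5000 in your browser; once you see your application’s output, stop uWSGI with CTRL-C.</p>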
<p>We’re now done with our virtual environment, so we can deactivate it:</p><pre>deactivate</pre><p>Any Python commands will now use the system’s Python environment again.</p><h3>Creating a uWSGI Configuration File</h3><p>You have tested that uWSGI is able to serve your application, but ultimately you will want something more robust for long-term usage. You can create a uWSGI configuration file with the relevant options for this.</p><p>Let’s place that file in our project directory and call it myproject.ini:</p><pre>vi ~/myproject/myproject.ini</pre><p>Add the following content to the configuration file:</p><pre>[uwsgi]<br>module = wsgi:app</pre><pre>master = true<br>processes = 5</pre><pre>socket = myproject.sock<br>chmod-socket = 660<br>vacuum = true</pre><pre>die-on-term = true</pre><pre>logto = /home/maanav/myproject/myproject.log</pre><p>When you are finished, save and close the file.</p><p><strong><em>Note:</em></strong><em> please remember to change </em>maanav<em> to your username.</em></p><h3>Step 7 — Creating a systemd Unit File</h3><p>Next, let’s create a systemd service unit file. Creating a systemd unit file will allow Ubuntu’s init system to automatically start uWSGI and serve the Flask application whenever the server boots.</p><p>Create a unit file ending in .service within the /etc/systemd/system directory to begin:</p><pre>sudo vi /etc/systemd/system/myproject.service</pre><p>Add the following content to the service file:</p><pre>[Unit]<br>Description=uWSGI instance to serve myproject<br>After=network.target</pre><pre>[Service]<br>User=maanav<br>Group=www-data<br>WorkingDirectory=/home/maanav/myproject<br>Environment=&quot;PATH=/home/maanav/myproject/myprojectenv/bin&quot;<br>ExecStart=/home/maanav/myproject/myprojectenv/bin/uwsgi --ini myproject.ini</pre><pre>[Install]<br>WantedBy=multi-user.target</pre><p>With that, our systemd service file is complete. Save and close it now.</p><p>We can now start the uWSGI service we created and enable it so that it starts at boot:</p><pre>sudo systemctl start myproject<br>sudo systemctl enable myproject</pre><p>Let’s check the status:</p><pre>sudo systemctl status myproject</pre><p>You should see output like this:</p><pre>Output<br>● myproject.service - uWSGI instance to serve myproject<br>   Loaded: loaded (/etc/systemd/system/myproject.service; enabled; vendor preset: enabled)<br>   Active: active (running) since Fri 2018-07-13 14:28:39 UTC; 46s ago<br> Main PID: 30360 (uwsgi)<br>    Tasks: 6 (limit: 1153)<br>   CGroup: /system.slice/myproject.service<br>           ├─30360 /home/maanav/myproject/myprojectenv/bin/uwsgi --ini myproject.ini<br>           ├─30378 /home/maanav/myproject/myprojectenv/bin/uwsgi --ini myproject.ini<br>           ├─30379 /home/maanav/myproject/myprojectenv/bin/uwsgi --ini myproject.ini<br>           ├─30380 /home/maanav/myproject/myprojectenv/bin/uwsgi --ini myproject.ini<br>           ├─30381 /home/maanav/myproject/myprojectenv/bin/uwsgi --ini myproject.ini<br>           └─30382 /home/maanav/myproject/myprojectenv/bin/uwsgi --ini myproject.ini</pre><p>If you see any errors, be sure to resolve them before continuing with the tutorial.</p><h3>Step 8 — Configuring Nginx to Proxy Requests</h3><p>Our uWSGI application server should now be up and running, waiting for requests on the socket file in the project directory. Let’s configure Nginx to pass web requests to that socket using the uwsgi protocol.</p><p>Begin by creating a new server block configuration file in Nginx’s sites-available directory. Let’s call this myproject to keep in line with the rest of the guide:</p><pre>sudo vi /etc/nginx/sites-available/myproject</pre><p>Open up a server block and tell Nginx to listen on the default port 80.
Let’s also tell it to use this block for requests for our server’s domain name:</p><pre>server {<br>    listen 80;<br>    server_name your_domain www.your_domain;</pre><pre>location / {<br>        include uwsgi_params;<br>        uwsgi_pass unix:/home/maanav/myproject/myproject.sock;<br>    }<br>}</pre><p>If you do not have a registered domain, you can use the server’s IP address as the server name:</p><pre>server {<br>    listen 80;<br>    server_name ip_address;</pre><pre>location / {<br>        include uwsgi_params;<br>        uwsgi_pass unix:/home/maanav/myproject/myproject.sock;<br>    }<br>}</pre><p>Save and close the file when you’re finished.</p><p>To enable the Nginx server block configuration you’ve just created, link the file to the sites-enabled directory:</p><pre>sudo ln -s /etc/nginx/sites-available/myproject /etc/nginx/sites-enabled</pre><p>With the file in that directory, we can test for syntax errors by typing:</p><pre>sudo nginx -t</pre><p>If this returns without indicating any issues, restart the Nginx process to read the new configuration:</p><pre>sudo systemctl restart nginx</pre><p>You should now be able to navigate to your server’s domain name (or IP address) in your web browser:</p><p><a href="http://your_domain">http://your_domain</a></p><p><a href="http://ip_address">http://ip_address</a></p><p>You should see your application output:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/568/1*7qELrDk6YVRi-fCdNl_Ddw.png" /></figure><h3>Step 9 — Managing the application process</h3><p>Now that you have your application up and running, let’s review some basic management commands.</p><p>To stop your application, type:</p><pre>sudo systemctl stop myproject</pre><p>To start the application when it is stopped, type:</p><pre>sudo systemctl start myproject</pre><p>To stop and then start the service again, type:</p><pre>sudo systemctl restart myproject</pre><p>To check the status of the application:</p><pre>sudo systemctl status myproject</pre><h3>Logs</h3><p><strong>Application Logs</strong></p><p>/home/maanav/myproject/myproject.log: Every application request is recorded in this log file.</p><p><strong>Server Logs</strong></p><p>/var/log/nginx/access.log: Every request to your web server is recorded in this log file unless Nginx is configured to do otherwise.<br>/var/log/nginx/error.log: Any Nginx errors will be recorded in this log.</p><h3>Conclusion</h3><p>In this guide, you created and deployed a simple Flask application from within a Python virtual environment. You created a WSGI entry point so that any WSGI-capable application server can interface with it, and then configured the uWSGI app server to provide this function. Afterward, you created a systemd service file to automatically launch the application server on boot.</p><p>That’s it. Hope this helps.</p><p><em>If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend or sharing it on Twitter or Facebook. Thank you!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=2a47f378c3d2" width="1" height="1" alt=""><hr><p><a href="https://medium.com/swlh/deploy-flask-applications-with-uwsgi-and-nginx-on-ubuntu-18-04-2a47f378c3d2">Deploy Flask Applications With uWSGI and Nginx on Ubuntu 18.04</a> was originally published in <a href="https://medium.com/swlh">The Startup</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Dog Breed Classifier using CNN]]></title>
            <link>https://medium.com/@maanavshah/dog-breed-classifier-using-cnn-f480612ac27a?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/f480612ac27a</guid>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Sat, 25 Apr 2020 09:23:02 GMT</pubDate>
            <atom:updated>2020-07-03T17:39:00.660Z</atom:updated>
            <content:encoded><![CDATA[<p>Imagine you are on your weekend jog or walk in the park and you see a really cute dog. Have you ever wondered which breed the dog belonged to? I have…</p><p>There are 266 individual breeds of dog pictured on the website <a href="https://dogtime.com/dog-breeds/profiles">dog time</a>. If you are like me, you can identify no more than 10–15 of them.</p><p>So, when I was given a choice of a few different projects for the Data Scientist Nanodegree by Udacity, I chose the ‘Dog Breed Classifier Project’. This is a very popular project across the machine learning and artificial intelligence Nanodegree programs offered by <a href="https://www.udacity.com/">Udacity</a>.</p><h3>Overview</h3><p>The aim of the project in the Data Scientist Nanodegree was to create a web application that is able to <strong>identify a breed of dog</strong> if given a photo or image as input. If the photo or image contains a human face (or alien face), then the application will return the breed of dog that most resembles this person.</p><p>The project uses Convolutional Neural Networks (CNNs)! A pipeline is built to process real-world, user-supplied images. Given an image of a dog, the algorithm will produce an estimate of the canine’s breed. If supplied with an image of a human, the code will identify the most resembling dog breed.</p><p>The steps that were followed to work through the project were the following:</p><ul><li>Step 0: Import Datasets</li><li>Step 1: Detect Humans</li><li>Step 2: Detect Dogs</li><li>Step 3: Create a CNN to classify Dog Breeds (from scratch)</li><li>Step 4: Use a CNN to classify Dog Breeds (using Transfer Learning)</li><li>Step 5: Create a CNN to classify Dog Breeds (using Transfer Learning)</li><li>Step 6: Write an algorithm</li><li>Step 7: Test the algorithm</li></ul><p>In this project, I experimented with both Keras and Fast.AI to build the Convolutional Neural Network (CNN) that makes the dog predictions.</p><p>I set myself a target test accuracy of 90% for the CNN, i.e., the model correctly identifies the dog breed 9 times out of 10. We will be using the accuracy metric on the testing dataset to measure the performance of our models.</p><p>To follow along with the steps you can download or clone the notebook from my <a href="https://github.com/maanavshah/dog-breed-classifier">GitHub</a> repository. The repository features ‘dog_breed_classifier.ipynb’, which runs on the GPU provided for free by Google Colab.</p><p><strong>Step 0: Import Datasets</strong></p><p>The datasets were provided by Udacity.</p><ul><li>Dog Images — The dog images provided are available in the repository within the Images directory, further organized into train, valid, and test subfolders</li><li>Human Faces — An exhaustive dataset of faces of celebrities has also been added to the repository in the lfw folder</li><li>Haarcascades — An ML-based approach where a cascade function is trained on a lot of positive and negative images and then used to detect objects in other images. The algorithm uses the Haar frontal-face cascade to detect humans, so it expects an image with the frontal features clearly defined</li><li>Test Images — A folder with certain test images has been added to be able to check the effectiveness of the algorithm</li><li>Pre-computed features for networks currently available in Keras (i.e. VGG19, InceptionV3, and Xception) will be made available from S3</li><li>Any other downloads to ensure the smooth running of the notebook are available in the repository.</li></ul><p>Load all the libraries and packages required through the notebook.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*cy9Rxg_wTpxJMRBs.png" /></figure><p>The libraries required can be categorized as follows:</p><ul><li>Utility libraries — random (for random seeding), timeit (to calculate execution time), os, pathlib, glob (for folder and path operations), tqdm (for execution progress), sklearn (for loading datasets), requests and io (to load files from the web)</li><li>Image processing — OpenCV (cv2), PIL</li><li>Keras and Fastai for creating CNNs</li><li>Matplotlib for viewing plots/images and NumPy for tensor processing</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/0*ykVxIzVc-Xk5XOkq.png" /></figure><p>Use the dataset-loading function from sklearn to import our datasets for dog breed model training, as sketched below. Create the lists of training, validation, and test filenames and the dog breed labels, plus a few paths that will be used later.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*tK2ul-6W0Bd-3dSD.png" /></figure>
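<p>For reference, a sketch of the dataset-loading step shown in the screenshot above, along the lines of the Udacity starter code (the dogImages paths depend on where you unpack the data):</p><pre>import numpy as np<br>from sklearn.datasets import load_files<br>from keras.utils import np_utils<br><br>def load_dataset(path):<br>    data = load_files(path)<br>    dog_files = np.array(data[&#39;filenames&#39;])<br>    # one-hot encode the 133 breed labels<br>    dog_targets = np_utils.to_categorical(np.array(data[&#39;target&#39;]), 133)<br>    return dog_files, dog_targets<br><br>train_files, train_targets = load_dataset(&#39;dogImages/train&#39;)<br>valid_files, valid_targets = load_dataset(&#39;dogImages/valid&#39;)<br>test_files, test_targets = load_dataset(&#39;dogImages/test&#39;)</pre>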
<p><strong>Dataset stats</strong></p><p>The dog_names variable stores a list of the class names to use in our prediction model. Based on the path names, we see a total of 8351 images of dogs belonging to 133 different dog breeds, split into 6680, 835, and 836 images for training, validation, and testing.</p><h3>Step 1: Detect Humans based on OpenCV Haar cascade classifiers</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*MNrLlE45PiaSOpaK.png" /></figure><p>Object detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their 2001 paper, “<a href="https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf">Rapid Object Detection using a Boosted Cascade of Simple Features</a>”. It is a machine learning-based approach where a cascade function is trained on a lot of positive and negative images, which is then used to detect objects in other images.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/842/0*i62QezreWHb-j0zz.png" /></figure><p>We use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub. Before using any of the face detectors, it is standard procedure to convert the images to grayscale. The detectMultiScale function executes the classifier stored in face_cascade and takes the grayscale image as a parameter. The face_detector function takes a string-valued file path to an image as input and returns True if a human face is detected in the image and False otherwise. While testing the human face detector, all 100 sample human faces were detected as human faces, while 11 of the 100 dog faces were also (incorrectly) detected as human faces.</p>
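<p>A sketch of the face_detector just described (the cascade XML path depends on where the haarcascades folder lives in your clone):</p><pre>import cv2<br><br>face_cascade = cv2.CascadeClassifier(&#39;haarcascades/haarcascade_frontalface_alt.xml&#39;)<br><br>def face_detector(img_path):<br>    img = cv2.imread(img_path)<br>    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # cascades expect grayscale<br>    faces = face_cascade.detectMultiScale(gray)<br>    return len(faces) &gt; 0</pre>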
<p>ImageNet contains over 10 million URLs, each linking to an image containing an object from one of <a href="https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a">1000 categories</a>. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object contained in the image.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/615/0*aUpPDe5jIqoXITau.png" /></figure><p>When using TensorFlow as the backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with shape (nb_samples, rows, columns, channels), where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*mDhwahGRyz5FQByp.png" /></figure><p>Create tensor input from paths to images</p><p>The path_to_tensor function takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224×224 pixels. Next, the image is converted to an array, which is then reshaped into a 4D tensor. In this case, since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape (1, 224, 224, 3).</p><p>The paths_to_tensor function takes a NumPy array of string-valued image paths as input and returns a 4D tensor with shape (nb_samples, 224, 224, 3). Here, nb_samples is the number of samples, or number of images, in the supplied array of image paths. It is best to think of nb_samples as the number of 3D tensors (where each 3D tensor corresponds to a different image).</p><p>In addition, ResNet-50 requires some extra preprocessing, such as reordering the channels from RGB to BGR and normalizing the pixels, which is done using preprocess_input.</p><p>The model is then used to extract the predictions. The predict method returns an array whose 𝑖-th entry is the model&#39;s predicted probability that the image belongs to the 𝑖-th ImageNet category. This is implemented in the ResNet50_predict_labels function below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/807/0*bw4nBvM0AETxhI7d.png" /></figure><p>The categories corresponding to dogs appear in an uninterrupted sequence, corresponding to keys 151–268 inclusive, covering all categories from &#39;Chihuahua&#39; to &#39;Mexican hairless&#39;. So, if the function returns any number between 151 and 268, the supplied image is that of a dog.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/945/0*hy1E5snRcFi_ZL-m.png" /></figure><p>The dog_detector function above returns True if a dog is detected in an image (and False if not). As expected, none of the sample human images was detected as a dog, and all sample dog images were.</p><h3>Step 3: Create a CNN to Classify Dog Breeds</h3><p>The model that I selected had a CNN architecture of 4 convolutional layers alternating with max-pooling layers, with 10% dropout and batch normalization. The filters used were 16, 32, 64 and 128.</p>
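<p>Roughly, this from-scratch architecture looks like the following in Keras. This is a sketch: the kernel sizes, padding and exact placement of the dropout and batch-normalization layers are assumptions, not the notebook’s literal code.</p><pre>from keras.models import Sequential<br>from keras.layers import Conv2D, MaxPooling2D, BatchNormalization<br>from keras.layers import Dropout, GlobalAveragePooling2D, Dense<br><br>model = Sequential()<br>model.add(Conv2D(16, (3, 3), activation=&#39;relu&#39;, padding=&#39;same&#39;,<br>                 input_shape=(224, 224, 3)))<br>model.add(MaxPooling2D(2))<br>model.add(BatchNormalization())<br>for filters in [32, 64, 128]:<br>    model.add(Conv2D(filters, (3, 3), activation=&#39;relu&#39;, padding=&#39;same&#39;))<br>    model.add(MaxPooling2D(2))<br>    model.add(BatchNormalization())<br>model.add(Dropout(0.1))  # 10% dropout to curb over-fitting<br>model.add(GlobalAveragePooling2D())<br>model.add(Dense(133, activation=&#39;softmax&#39;))  # one output per breed<br>model.compile(optimizer=&#39;rmsprop&#39;,<br>              loss=&#39;categorical_crossentropy&#39;, metrics=[&#39;accuracy&#39;])</pre>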
<p>The drop-outs were used to reduce the possibility of over-fitting. The convolutional stack is followed by a global average pooling layer and then a dense layer that identifies the 133 breeds.</p><p>The model takes a 4D tensor with shape (1, 224, 224, 3) and outputs an array of 133 probabilities. The optimizer used was RMSProp and the metric used was accuracy. The model was run for 10 epochs and achieved an accuracy of 6.69%.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/852/0*TO5LgtBvOMxXHDn5.png" /></figure><p>CNN model from scratch</p><h3>Step 4: Use a CNN to Classify Dog Breeds</h3><p>I used VGG16 to demonstrate the use of Transfer Learning. Bottleneck features come from taking a pre-trained model and chopping off the top classifying layers, then providing this “chopped” VGG16 as the first layer of our model.</p><p>The bottleneck features are the last activation maps in VGG16 (the fully-connected layers for classifying have been cut off), making it an effective feature extractor. The bottleneck features were obtained from a URL where they are stored as a .npz file, using the BytesIO library along with requests for the download.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*EVEYiq3i70XPyT4R.png" /></figure><p>The pre-trained VGG-16 model was then used as a fixed feature extractor, where the last convolutional output of VGG-16 is fed as input to our model. The shape of the VGG16 bottleneck features was (6680, 7, 7, 512), i.e. a (7, 7, 512) activation map for each of the 6680 training samples. On top of this sit a global average pooling layer and a fully connected layer, where the latter contains one node for each dog category and is equipped with a softmax. Running this model for 20 epochs increased the accuracy to 47%. This demonstrates the benefit of leveraging Transfer Learning from pre-trained models.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/955/0*WfTwLgKSVMAH4iqY.png" /></figure><h3>Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)</h3><p>The model was built using Keras, leveraging Transfer Learning. I tried 4 different models: VGG19, ResNet50, InceptionV3, and Xception.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*6ufthOs4Q6k-B1bQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*pJKDHowoJ1jXM3Uz.png" /></figure><p>The bottleneck feature shapes are VGG19: (6680, 7, 7, 512), ResNet50: (6680, 1, 1, 2048), InceptionV3: (6680, 5, 5, 2048) and Xception: (6680, 7, 7, 2048). It took about 160 seconds to load all the Transfer Learning models.</p><p>Each of these models was then topped with a global average pooling layer and a dropout layer, followed by a fully connected layer (with softmax), and run for 20 epochs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Xu9ppBwm7EZcAisO.png" /></figure><p>Training the models took less than a minute in each of these cases.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/385/0*AXoyaJOCrtEf19Jm.png" /></figure><p>Training Time in seconds</p><p>Accuracy for Xception was ~85%, while VGG19 was ~46%.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/460/0*VekTrnlNS1jRuTTa.png" /></figure>
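<p>The head added on top of each set of bottleneck features is tiny. A sketch of the idea (variable names such as train_vgg19 are hypothetical placeholders for the pre-computed features and targets, and the dropout rate is an assumption):</p><pre>from keras.models import Sequential<br>from keras.layers import GlobalAveragePooling2D, Dropout, Dense<br><br># train_vgg19 holds pre-computed bottleneck features, e.g. shape (6680, 7, 7, 512)<br>model = Sequential()<br>model.add(GlobalAveragePooling2D(input_shape=train_vgg19.shape[1:]))<br>model.add(Dropout(0.2))<br>model.add(Dense(133, activation=&#39;softmax&#39;))<br>model.compile(optimizer=&#39;rmsprop&#39;,<br>              loss=&#39;categorical_crossentropy&#39;, metrics=[&#39;accuracy&#39;])<br>model.fit(train_vgg19, train_targets,<br>          validation_data=(valid_vgg19, valid_targets),<br>          epochs=20, batch_size=32)</pre>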
<p>I then explored options for increasing the accuracy. I used fastai to see if we could leverage transfer learning there and obtain a higher accuracy.</p><p>The data bunch was created and normalized.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/795/0*GQ22UzGMsYTDzDmC.png" /></figure><p>A cnn_learner was created with the resnet34 model and run for two cycles. The accuracy was up to 86%. An optimal learning rate seems to lie between 1e-6 and 1e-4.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*EZNN_YPRDY96Gi5X.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1020/0*o3l6jGfKmaB0c9Pt.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/815/0*W54ZSVqidCmn6n70.png" /></figure><p>After unfreezing and refitting the model for 10 epochs, an accuracy of up to 89.8% is obtained, meaning roughly 9 out of 10 images are classified accurately.</p><p>Based on the analysis of the various models that we have fit, the learn_resnet34 model provides the best accuracy. It is saved and exported as a pickle file for classification.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*7iC3cFrasl3kR23-.png" /></figure><h3>Step 6: Write an algorithm to provide an output breed based on an image</h3><p>Given an image path, the bottleneck features of our pre-trained model are applied to the image, which is then processed through our trained fully-connected model. This yields a predicted_breed, the category index, and the probability tensor. The predict_breed function takes a file_path as input and outputs the breed of the dog.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*-vp89gGdM0vS1Rsu.png" /></figure><p>Our algorithm accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,</p><ul><li>if a <strong>dog</strong> is detected in the image, return the predicted breed.</li><li>if a <strong>human</strong> is detected in the image, return the resembling dog breed.</li><li>if <strong>neither</strong> is detected in the image, provide an output that indicates an error.</li></ul><p>The algorithm leverages the CNN built in Step 5, along with the previously created functions, to come up with an output.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/740/0*VNzUBtetW2wtL9by.png" /></figure><p>The algo function determines whether the provided file_path contains a dog, a human or neither, and returns the species along with the predicted breed for the image.</p><p>The provide_output function outputs a greeting based on the predicted species and dog breed.</p><h3>Step 7: Test Your Algorithm</h3><p>The six dogs that were sampled to check the algorithm were all correctly identified as dogs. The breeds of 5 of the 6 were accurate too. 
Only 1 dog (a Rajapalayam, a native Indian breed) was identified as a Great Dane, possibly because the Rajapalayam is not one of the 133 breeds in the training dataset.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/447/0*hdoYzxoDhpnqUS5d.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/365/0*dhLaB-O1mwEcBp0S.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/397/0*u3R2CY2Zb7CWUCeA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/360/0*_C7u1YAS7jOShovq.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/355/0*_jgKytG_5xwfAAI-.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/345/0*t9VYpoEJIGjNWtdN.png" /></figure><p>The humans were also identified as humans, and a resembling dog breed was predicted for each; incidentally, both were predicted as Dogue_de_bordeaux.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/430/0*Al4NGn2P8uZmma5f.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/405/0*zZw-WeHD3wcqd9tM.png" /></figure><h3>Reflection</h3><p>At the start, my objective was to create a CNN with <strong>90%</strong> testing accuracy. Our final model obtained <strong>89.8%</strong> testing accuracy.</p><p>A few breeds are virtually identical or are sub-breeds of one another. Some images may also be blurred or contain too much noise, and additional image manipulation could enhance their quality.</p><p>By addressing the above areas, I’m confident we could increase the testing accuracy of the model to above 90%.</p><p>A simple web application in Flask could be built to leverage the model to predict breeds from user-input images.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f480612ac27a" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Fixing python import error “No module named appengine”]]></title>
            <link>https://medium.com/@maanavshah/fixing-python-import-error-no-module-named-appengine-ebcb540e7f18?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/ebcb540e7f18</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[app-engine]]></category>
            <category><![CDATA[google-cloud-platform]]></category>
            <category><![CDATA[importerror]]></category>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Fri, 22 Nov 2019 14:55:59 GMT</pubDate>
            <atom:updated>2019-11-22T14:55:59.649Z</atom:updated>
            <content:encoded><![CDATA[<p>Have you ever tried importing a package from the Google App Engine library?</p><p>If it throws an import error, it means that either the App Engine SDK is not installed, or the Python runtime cannot find it.</p><p>For example,</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2AjGWARhTXeMI98KXAuLWQ.png" /></figure><p>You can check whether the google <em>__path__ </em>is correctly linking to the App Engine SDK. For example, in my case, it is pointing to site-packages in the virtual environment.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/976/1*0gt0ovFrdzMJYruxkEQseA.png" /></figure><p>You can execute the following steps in order to solve the ImportError:</p><ol><li><strong>Install the App Engine SDK</strong><br>Ensure App Engine is correctly installed on your system. Read and follow the instructions here:<br><a href="https://cloud.google.com/appengine/downloads#Google_App_Engine_SDK_for_Python">SDK for App Engine</a></li><li><strong>Install Google pip packages<br></strong>You should have the <strong>google-api-core</strong> package installed. You can install it using the following command:<br><em>pip install google-api-core</em></li><li><strong>Configure google <em>__path__ </em>in the Python shell<br></strong>The following snippet points google.__path__ at the App Engine SDK:</li></ol><pre>import google<br>import sys</pre><pre># make the SDK&#39;s packages visible on the google package&#39;s search path<br>google.__path__.append(&#39;/path/to/appengine_sdk/google_appengine/google&#39;)<br>sys.path.insert(0, &#39;/path/to/appengine_sdk/google_appengine&#39;) # might not be necessary</pre><pre>import google.appengine # now it&#39;s on your import path</pre><p>For example, you can see that the google <em>__path__ </em>now correctly points to the App Engine SDK, and we can also import the library.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*d_GhnOrnT2e9noEUifhDdQ.png" /></figure><p>That’s it. Hope this helps.</p><p><em>If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend or sharing it on Twitter or Facebook. Thank you!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ebcb540e7f18" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Uploading multiple files to Google Cloud Storage with Python]]></title>
            <link>https://medium.com/swlh/uploading-multiple-files-to-google-cloud-storage-with-python-7780aefa1569?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/7780aefa1569</guid>
            <category><![CDATA[google-cloud-storage]]></category>
            <category><![CDATA[google-cloud-platform]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[angular]]></category>
            <category><![CDATA[file-upload]]></category>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Mon, 19 Aug 2019 14:30:54 GMT</pubDate>
            <atom:updated>2019-09-20T11:46:24.529Z</atom:updated>
            <content:encoded><![CDATA[<p>You can use <a href="https://cloud.google.com/storage/"><strong>Google Cloud Storage</strong></a> in your Google App Engine applications to upload, store and serve files. These files can be uploaded directly to Google Cloud Storage.</p><p>Let’s see how to implement this in a simple Python and AngularJS application.</p><p>The steps are pretty easy, as is the code.</p><h4>Activate Google Cloud Storage and install the client library</h4><p>Activate Google Cloud Storage for your project by selecting the option in the Google Developers Console. <br>You can have a look at this <a href="https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/activate"><strong>introduction</strong></a> for a list of steps.<br>At the end of the process, you will have a <em>bucket </em>for your project. The bucket is a sort of virtual folder in GCS and is the place where the uploaded files will be stored and read. <br>You also need to install the Python <a href="https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/setting-up-cloud-storage"><strong>Storage library</strong></a> in your project; the library contains the module that you will import in your backend code to work with GCS. For this, you need to set up a lib folder in your Google Cloud project using:<br><a href="https://cloud.google.com/appengine/docs/standard/python/tools/using-libraries-python-27">https://cloud.google.com/appengine/docs/standard/python/tools/using-libraries-python-27</a></p><p>The following command installs the storage and authentication dependencies in the lib folder.</p><pre>pip install -t lib google-cloud-storage oauth2client GoogleAppEngineCloudStorageClient</pre><p>You also need to provide the correct access to the storage bucket. These permissions come under <em>roles/storage. </em>You can check out the storage <a href="https://cloud.google.com/storage/docs/access-control/iam-roles#legacy-roles">legacy roles</a> and grant the correct access to the bucket.</p><h4>Create the upload page</h4><p>In my code, I created an HTML page to upload the files. The page includes a simple select button to pick the multiple files to upload, and some extra code to show the upload progress and result.</p><pre>&lt;div ng-app=&quot;uploadModule&quot; ng-controller=&quot;uploadController&quot;&gt;<br>  &lt;h1&gt; Upload Files &lt;/h1&gt;<br>  &lt;form&gt;<br>    &lt;input type=&quot;file&quot; id=&quot;files&quot; name=&quot;files[]&quot; multiple/&gt;<br>    &lt;button type=&quot;submit&quot; onclick=&quot;angular.element(this).scope().upload_file(this)&quot;&gt;<br>      Upload<br>    &lt;/button&gt;<br>    &lt;div&gt;<br>      &lt;br&gt;<br>      {{ upload_status }}<br>      &lt;br&gt;<br>    &lt;/div&gt;<br>  &lt;/form&gt;<br>&lt;/div&gt;</pre><p>The AngularJS controller (<em>uploadController</em>) for the page is in controller.js. 
The controller implements a callback for the upload button that collects the selected files into a FormData object and sends them with an XMLHttpRequest.</p><p>What is important here is to specify the URL that will process the uploaded files (in my case, api/file/upload); this endpoint should contain the backend code that responds to the POST requests containing the file data and saves the files in Google Cloud Storage.</p><pre>var uploadModule = angular.module(&quot;uploadModule&quot;, []);</pre><pre>uploadModule.controller(&quot;uploadController&quot;, [&quot;$scope&quot;, &quot;$http&quot;, &quot;$window&quot;, function ($scope, $http, $window) {</pre><pre>$scope.upload_status = &quot;STATUS: Please select files. &quot;;</pre><pre>$scope.upload_file = function (e) {<br>    $scope.upload_status = &quot;STATUS: Uploading ...&quot;;</pre><pre>var formdata = new FormData(); // FormData object<br>    var fileInput = document.getElementById(&#39;files&#39;);</pre><pre>var selectedFiles = fileInput.files.length;<br>    if (selectedFiles &lt; 1) {<br>      alert(&#39;Please select files!&#39;)<br>    } else {<br>      // Iterating through each file selected in fileInput<br>      for (var i = 0; i &lt; fileInput.files.length; i++) {<br>        console.log(fileInput.files[i].name)<br>        <br>        // Appending each file to the FormData object<br>        formdata.append(&#39;files[]&#39;, fileInput.files[i], fileInput.files[i].name);<br>      }</pre><pre>      // Creating an XMLHttpRequest and sending<br>      var xhr = new XMLHttpRequest();<br>      var url = encodeURI(&quot;/api/file/upload/&quot;);<br>      xhr.open(&#39;POST&#39;, url);<br>      xhr.send(formdata);<br>      xhr.onreadystatechange = function () {<br>        if (xhr.readyState != 4) return;<br>        if (xhr.status == 200) {<br>          console.log(&quot;Success&quot;);<br>          $scope.upload_status = &quot;Status: Upload successful.&quot;;<br>          $scope.$apply();<br>        } else {<br>          console.log(&quot;Error&quot;);<br>          $scope.upload_status = &quot;Status: Error while uploading.&quot;;<br>          $scope.$apply();<br>        }<br>      }<br>    }<br>  };<br>}]);</pre><h4><strong>Create the service account for external page authentication</strong></h4><p>We need to create a service account that we will use for external authentication. This allows us to remove the dependency on Google Auth by simply providing the path to the service account’s JSON key file. You can follow this guide to do so: <a href="https://cloud.google.com/video-intelligence/docs/common/auth">https://cloud.google.com/video-intelligence/docs/common/auth</a>. This will help us in setting up a service account for external page access.</p><h4>Create the backend code for storing the files</h4><p>Now, we create the Python code for the backend that receives the uploaded files and stores them in Google Cloud Storage.</p><p>Change your app.yaml to create an endpoint for the upload page created in step 2. 
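In my case this is api/file/upload and, as we saw, it is the URL used by the upload function in the controller.</p><p>A minimal handler entry for this endpoint could look like the following sketch (it assumes the webapp2 application object is exposed as app in a module named main.py):</p><pre># app.yaml (sketch)<br>handlers:<br>- url: /api/file/upload/.*<br>  script: main.app</pre><p>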
Then, define a request handler for a POST request to that address (I use webapp2 as the web framework, but the same concepts also apply to Django or Flask):</p><p>We need to import the following libraries.</p><pre>import json<br>import logging<br>import os<br><br>import webapp2<br>from google.cloud import storage<br>from google.oauth2 import service_account</pre><p>Following is the FileUpload class, which uploads all the files to Google Cloud Storage.</p><pre>class FileUpload(webapp2.RequestHandler):<br>    &#39;&#39;&#39;Handles Upload requests.&#39;&#39;&#39;<br>    def post(self):<br>        response = {}<br>        try:<br>            files = self.request.POST<br>            # authenticate with the service account&#39;s JSON key file<br>            file_path = os.path.join(os.path.dirname(__file__),<br>                                     &#39;path_to_service_account.json&#39;)<br>            credentials = service_account.Credentials.from_service_account_file(<br>                file_path)<br>            storage_client = storage.Client(<br>                credentials=credentials, project=&#39;project-name&#39;)<br>            bucket = storage_client.get_bucket(&#39;bucket-name&#39;)</pre><pre>            # upload each posted file to the bucket under its own name<br>            for file in files.values():<br>                filename = file.filename<br>                file_blob = bucket.blob(filename)<br>                file_blob.upload_from_file(file.file)<br>            response[&#39;success&#39;] = True<br>        except Exception as ex:<br>            logging.error(&#39;Error while uploading files: %s&#39;, ex)<br>            response[&#39;message&#39;] = str(ex)<br>            response[&#39;success&#39;] = False<br>        self.response.content_type = &quot;application/json&quot;<br>        self.response.write(json.dumps(response))</pre><h4>Setting up the Sockets API</h4><p>But as we are going to allow this page to be accessed externally, without Google Authentication (<a href="https://developers.google.com/identity/protocols/OAuth2">Google OAuth</a>2), we need to set up the Sockets API. This enables us to remove the dependency on Google accounts and Google Auth; we can simply use a separate authentication library, or create one of our own if need be.</p><p>We need to use the Sockets API for this purpose, which consists of adding the following code to your app.yaml file:</p><pre>env_variables:<br>  GAE_USE_SOCKETS_HTTPLIB: &#39;true&#39;</pre><pre>libraries:<br>- name: ssl<br>  version: latest</pre><p>We need to add this because of a known issue between urllib3 and Google App Engine, as described in <a href="https://urllib3.readthedocs.io/en/latest/advanced-usage.html#google-app-engine">this link</a>. Once you have added these lines, the page can be accessed directly using the Sockets API.<br>You can find the <a href="https://cloud.google.com/appengine/docs/standard/python/sockets/">official Google Sockets Python API docs here</a>.</p><p>After that, you can just redeploy and connections should work.</p><p>That’s it. Hope this helps.</p><p><em>If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend or sharing it on Twitter or Facebook. 
Thank you!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7780aefa1569" width="1" height="1" alt=""><hr><p><a href="https://medium.com/swlh/uploading-multiple-files-to-google-cloud-storage-with-python-7780aefa1569">Uploading multiple files to Google Cloud Storage with Python</a> was originally published in <a href="https://medium.com/swlh">The Startup</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Change the default MySQL Data Directory in Linux]]></title>
            <link>https://medium.com/@maanavshah/change-the-default-mysql-data-directory-in-linux-43b813b48e46?source=rss-4ae26f7ee9c------2</link>
            <guid isPermaLink="false">https://medium.com/p/43b813b48e46</guid>
            <category><![CDATA[default-directory]]></category>
            <category><![CDATA[mysql]]></category>
            <category><![CDATA[command-line]]></category>
            <category><![CDATA[change-default-directory]]></category>
            <category><![CDATA[linux]]></category>
            <dc:creator><![CDATA[Maanav Shah]]></dc:creator>
            <pubDate>Thu, 01 Aug 2019 08:20:43 GMT</pubDate>
            <atom:updated>2019-08-01T08:20:43.946Z</atom:updated>
            <content:encoded><![CDATA[<p>After installing a MySQL database on a production server, we may want to change the default data directory of MySQL to a different directory. This is the case when the directory is expected to grow due to heavy usage; otherwise, the filesystem where /var is stored may fill up at some point, causing the entire system to fail. Another scenario for changing the default directory is when we have a dedicated network share that we want to use to store our actual data. MySQL uses the /var/lib/mysql directory as the default data directory on Linux-based systems.</p><p>In order to change the default directory, we need to check the available storage. We can use the df command to discover drive space on Linux. The output of df -H will report how much space is used and available, the percentage used, and the mount point of every disk attached to your system.</p><p>We are going to assume that our new data directory is /mnt/mysql-data. It is important to note that this directory should be owned by mysql:mysql.</p><pre>mkdir -p /mnt/mysql-data</pre><p>For simplicity, I’ve divided the procedure into 4 simple steps.</p><h3>Step 1: Identify the Current MySQL Data Directory</h3><p>To identify the current data directory, use the following command.</p><pre>mysql -u username -p -e “SELECT @@datadir”</pre><p>We need to identify the current MySQL data directory because it may have been changed in the past. Let’s assume the current data directory is /var/lib/mysql</p><h3>Step 2: Copy the MySQL Data Directory to the desired location</h3><p>To avoid data corruption, stop the service if it is currently running before proceeding, and check its status.</p><pre>service mysqld stop<br>service mysqld status</pre><p>Then recursively copy the contents of /var/lib/mysql to /mnt/mysql-data, preserving the original permissions and timestamps:</p><pre>cp -rap /var/lib/mysql/* /mnt/mysql-data</pre><p>Change the ownership of the directory, as its owner should be mysql:mysql. We can use the following command:</p><pre>chown -R mysql:mysql /mnt/mysql-data</pre><h3>Step 3: Configure the new MySQL Data Directory</h3><p>Edit the MySQL default configuration file <strong>/etc/my.cnf</strong> and update the values of <strong>mysqld</strong> and <strong>client</strong>.</p><pre><em># Change From:</em><br>[mysqld]<br>datadir=/var/lib/mysql<br>socket=/var/lib/mysql/mysql.sock<br><br><em># Change To:</em><br>[mysqld]<br>datadir=/mnt/mysql-data<br>socket=/mnt/mysql-data/mysql.sock</pre><p>If there is no <strong>client </strong>section, add it; otherwise, update it to:</p><pre>[client]<br>port=3306<br>socket=/mnt/mysql-data/mysql.sock</pre><h3>Step 4: Enable the MySQL Service and confirm the directory change</h3><p>Restart the MySQL service using the following command:</p><pre>service mysqld start</pre><p>Now, use the same command as before to verify the new location of the data directory:</p><pre>mysql -u username -p -e “SELECT @@datadir”</pre><p>If you face any issues during MySQL startup, check the MySQL log file <strong>/var/log/mysqld.log</strong> for errors.</p><p>That’s it. Hope this helps.</p><p><em>If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend or sharing it on Twitter or Facebook. Thank you!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=43b813b48e46" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>