Password Strength Classifier using ML Algorithms

Anjali Agarwal
5 min read · Jul 24, 2020

In this article, I will walk through the steps to build a Password Strength Classifier that classifies the strength of a given password using Machine Learning algorithms. Of the algorithms I tried, Random Forest gave the best result, with an accuracy of 94%. Each password is classified into one of three categories:

  • Strong
  • Moderate
  • Weak

If you deploy this project on Heroku, you will get the following result. Heroku is a platform as a service (PaaS) that lets developers build, run, and operate applications entirely in the cloud. To learn how to deploy your projects on Heroku, visit: https://devcenter.heroku.com/articles/getting-started-with-python

Password Strength?

Password strength is a numerical measure of how effective a password is. In other words, it indicates how easily a password could be hacked, based on its length and the kinds of characters it contains (letters, digits, special characters). A standard US keyboard can type 95 printable ASCII characters, counting every symbol and the space.
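A common way to turn length and character set into a single number is the brute-force entropy estimate, length × log2(pool size), where the pool is the set of character classes the password uses. The sketch below assumes that textbook heuristic. It is not the classifier built in this article, and the function name `estimate_entropy_bits` is hypothetical.

```python
import math
import string


def estimate_entropy_bits(password: str) -> float:
    """Rough brute-force entropy: len(password) * log2(pool size).

    The pool size is estimated from the character classes that appear in
    the password. Characters outside printable ASCII are ignored.
    """
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation or c == " " for c in password):
        pool += 33  # 32 ASCII punctuation symbols plus the space
    if pool == 0:
        return 0.0
    return len(password) * math.log2(pool)


for pw in ["abc123", "Xk9#pL2$vQ"]:
    print(pw, round(estimate_entropy_bits(pw), 1))
```

For example, "abc123" draws on a pool of 36 characters (lowercase letters and digits) and scores about 31 bits. A password that mixes all four classes uses the full 95-character pool. This formula does not account for dictionary words or common passwords, which is one reason to learn strength from real data instead.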

Required Libraries

Flask==1.1.1
gunicorn==19.9.0
itsdangerous==1.1.0
Jinja2==2.10.1
MarkupSafe==1.1.1
Werkzeug==0.15.5
numpy>=1.9.2
scipy>=0.15.1
scikit-learn>=0.18
matplotlib>=1.4.3
pandas>=0.19
seaborn==0.10.0

Let's Start

We will use two Machine Learning algorithms to predict the strength of a password. The dataset contains approximately 669,880 passwords, and cleaning the data reduces this to 669,639. We then split it into a training set (80%) and a test set (20%) with the open-source Python library scikit-learn.

Figure : Train-Test Split
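The cleaning and splitting step is sketched below on a tiny in-memory stand-in for the dataset. The article does not say exactly what cleaning involved, so dropping rows with a missing password and exact duplicate rows is an assumption. The column names `password` and `strength` and the label encoding (0 = weak, 1 = moderate, 2 = strong) are also assumed.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real dataset: one password per row with an
# integer strength label (assumed: 0 = weak, 1 = moderate, 2 = strong).
df = pd.DataFrame({
    "password": ["abc123", "qwerty", "hello1", "Passw0rd", "kzde5577",
                 "monkey22", "Xk9#pL2$vQ", "T7!mzR@41q", "zY8&uW3*eR",
                 None, "abc123"],
    "strength": [0, 0, 0, 1, 1, 1, 2, 2, 2, 1, 0],
})

# Assumed cleaning: drop rows without a password, then exact duplicate rows.
clean = df.dropna(subset=["password"]).drop_duplicates()

X = clean["password"].to_numpy()
y = clean["strength"].to_numpy()

# 80% training / 20% test; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(clean), len(X_train), len(X_test))
```

On the full dataset you might also pass `stratify=y` so that each strength class keeps the same proportion in both sets.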

We apply a TF-IDF vectorizer with a custom tokenizer that splits each password into its individual characters. The features therefore describe the characters a password contains, instead of treating the entire password as a single token. To improve the model, you can also add features such as the length of the password and the number of special characters, digits, uppercase letters and lowercase letters.

Figure : Tf-idf
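The extra features suggested above (length and character-class counts) can be computed with plain Python. The function `password_features` and its key names are hypothetical and are not part of the original model. "Special" here means ASCII punctuation only.

```python
import string


def password_features(password: str) -> dict:
    """Hypothetical hand-crafted features: length and character-class counts."""
    return {
        "length": len(password),
        "digits": sum(c.isdigit() for c in password),
        "uppercase": sum(c.isupper() for c in password),
        "lowercase": sum(c.islower() for c in password),
        "special": sum(c in string.punctuation for c in password),
    }


print(password_features("Passw0rd!"))
# {'length': 9, 'digits': 1, 'uppercase': 1, 'lowercase': 6, 'special': 1}
```

To use these features alongside TF-IDF, you could turn them into a numeric matrix and stack it next to the sparse TF-IDF matrix, for example with `scipy.sparse.hstack`.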

Below is the output of the custom tokenizer:

Figure : Output
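A character-level tokenizer and its use with scikit-learn's `TfidfVectorizer` could look like the sketch below. The function name `char_tokenizer` and the sample passwords are hypothetical. Setting `lowercase=False` is a deliberate choice made here: the vectorizer's default `lowercase=True` would fold "A" and "a" into one feature and discard case information.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def char_tokenizer(password):
    """Split a password into a list of its individual characters."""
    return list(password)


passwords = ["abc123", "Passw0rd!", "Xk9#pL2$vQ"]

# token_pattern=None: the default regex is not used when a custom tokenizer
# is supplied. lowercase=False keeps upper- and lowercase letters distinct.
vectorizer = TfidfVectorizer(tokenizer=char_tokenizer, token_pattern=None,
                             lowercase=False)
X = vectorizer.fit_transform(passwords)

print(char_tokenizer("Pa$1"))  # ['P', 'a', '$', '1']
print(X.shape)                 # (number of passwords, number of distinct characters)
```

A similar effect is available without a custom function through `TfidfVectorizer(analyzer="char")`, which also accepts `ngram_range` if you want character pairs or triples as features.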

Logistic Regression

Next, we apply Logistic Regression, a supervised classification algorithm that predicts the probability of each class of the target variable (here, the strength label of a password). With Logistic Regression I achieved an accuracy of 81%.

Figure : Logistic Regression
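The sketch below shows one way to wire the character tokenizer and Logistic Regression together with a scikit-learn pipeline. The toy training and test data are hypothetical, so the accuracy it prints says nothing about the 81% reported above. `max_iter=1000` is an assumed setting that gives the solver room to converge.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline


def char_tokenizer(password):
    """Split a password into its individual characters."""
    return list(password)


# Hypothetical toy data (assumed labels: 0 = weak, 1 = moderate, 2 = strong).
X_train = ["abc123", "qwerty", "hello1", "123456",
           "Passw0rd", "kzde5577", "monkey22", "sunShine7",
           "Xk9#pL2$vQ", "T7!mzR@41q", "zY8&uW3*eR", "G5^hN2!rT9"]
y_train = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
X_test = ["pass12", "dragon99X", "Q3@wE7#rT1"]
y_test = [0, 1, 2]

# Vectorizer and classifier chained so raw password strings go straight in.
model = make_pipeline(
    TfidfVectorizer(tokenizer=char_tokenizer, token_pattern=None, lowercase=False),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
# One probability per class; columns follow model.classes_.
print(model.predict_proba(["Xk9#pL2$vQ"]))
```

Keeping the vectorizer and classifier in one pipeline means a single object turns a raw password into a prediction, which simplifies saving and serving the model later.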

Random Forest

Let's try another Machine Learning algorithm: Random Forest (also called random decision forests), an ensemble learning method that trains many decision trees and combines their votes to classify. The Random Forest classifier reached an accuracy of 94%, far better than the 81% of Logistic Regression. You can also apply other Machine Learning algorithms and compare the results.

Figure : Random Forest Classifier
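Swapping the classifier is a one-line change in the same pipeline, as the sketch below shows. It uses hypothetical toy data again, and `n_estimators=100` and `random_state=42` are assumed values, not settings taken from the article.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline


def char_tokenizer(password):
    """Split a password into its individual characters."""
    return list(password)


# Hypothetical toy data (assumed labels: 0 = weak, 1 = moderate, 2 = strong).
X_train = ["abc123", "qwerty", "hello1", "123456",
           "Passw0rd", "kzde5577", "monkey22", "sunShine7",
           "Xk9#pL2$vQ", "T7!mzR@41q", "zY8&uW3*eR", "G5^hN2!rT9"]
y_train = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
X_test = ["pass12", "dragon99X", "Q3@wE7#rT1"]
y_test = [0, 1, 2]

# Same TF-IDF features; an ensemble of 100 decision trees votes on the class.
model = make_pipeline(
    TfidfVectorizer(tokenizer=char_tokenizer, token_pattern=None, lowercase=False),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
```

Because both models share the same feature step, you can compare algorithms fairly by changing only the last element of the pipeline.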

You can summarize the performance of the model above by plotting a confusion matrix for the multi-class classification. It makes it easy to calculate precision and recall from the labels the model predicts.

To learn more about the confusion matrix, visit: https://machinelearningmastery.com/confusion-matrix-machine-learning/

Figure : Confusion Matrix
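To show how precision and recall are read off a confusion matrix, the sketch below builds a 3×3 matrix in plain Python from hypothetical true and predicted labels. Rows are true classes and columns are predicted classes, which is the same orientation that `sklearn.metrics.confusion_matrix` uses. For class k, precision is the diagonal entry divided by column k's sum, and recall is the diagonal entry divided by row k's sum.

```python
LABELS = [0, 1, 2]  # assumed encoding: 0 = weak, 1 = moderate, 2 = strong


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count (true, predicted) pairs; rows = true class, columns = predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def precision_recall(matrix):
    """Per-class (precision, recall) computed from a square confusion matrix."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                           # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[1, 1, 0], [0, 2, 1], [1, 0, 2]]
for label, (p, r) in zip(LABELS, precision_recall(cm)):
    print(label, round(p, 2), round(r, 2))
```

In practice, `sklearn.metrics.classification_report(y_true, y_pred)` prints the same per-class precision and recall, along with F1 scores.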

Now, save your trained model to a pickle file. Python's pickle module serializes the fitted model and writes the serialized form to a file. Later, you can load this file to deserialize the model and use it to make new predictions.

Figure : Pickle
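A minimal save-and-reload round trip for a fitted pipeline is sketched below. The toy data is hypothetical, and the filename `model.pkl` is an assumption (here it is written to a temporary directory). The tokenizer is a named module-level function because pickle stores functions by reference and cannot pickle a lambda. The same function must also be importable under the same name wherever the file is loaded.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


def char_tokenizer(password):
    """Named module-level function so the fitted vectorizer can be pickled."""
    return list(password)


# Hypothetical toy data (assumed labels: 0 = weak, 1 = moderate, 2 = strong).
passwords = ["abc123", "qwerty", "Passw0rd", "kzde5577", "Xk9#pL2$vQ", "T7!mzR@41q"]
labels = [0, 0, 1, 1, 2, 2]

model = make_pipeline(
    TfidfVectorizer(tokenizer=char_tokenizer, token_pattern=None, lowercase=False),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(passwords, labels)

path = os.path.join(tempfile.mkdtemp(), "model.pkl")  # hypothetical filename

# Serialize the whole fitted pipeline (vectorizer + classifier) to disk.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Deserialize it later, e.g. inside the web app, and predict as before.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.predict(["Tr0ub4dor&3"]))
```

Only unpickle files you trust. Loading a pickle can execute arbitrary code, so never load a model file from an untrusted source.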

To build the web application, we use Flask, a lightweight Python web framework. Open app.py in your favorite editor and add the application code. Once all the files are in place, you can run the project on your local server. Run the app:

$ python app.py

To learn more about Flask, visit: https://pypi.org/project/Flask/

You should then see the app running at http://localhost:5000/. Stop the server (Ctrl+C) when you are done. You can also deploy the same app on Heroku.
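The article's app.py is not reproduced here. The following is only a sketch of what a prediction endpoint could look like, assuming the pickled pipeline from the pickle step, a POST route named `/predict` that reads a form field called `password`, and the label mapping 0/1/2 → Weak/Moderate/Strong. All of these names are hypothetical. The tokenizer is redefined in the app module so that pickle can resolve it when the model is loaded.

```python
import pickle

from flask import Flask, jsonify, request

# Assumed mapping from numeric label to a readable name.
LABEL_NAMES = {0: "Weak", 1: "Moderate", 2: "Strong"}


def char_tokenizer(password):
    """Must match the tokenizer used at training time so the pickle can load."""
    return list(password)


def load_model(path):
    """Load a fitted model from a pickle file (only load files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def create_app(model):
    """Build a Flask app around an already-fitted model (anything with .predict)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        password = request.form.get("password", "")
        if not password:
            return jsonify(error="password is required"), 400
        label = int(model.predict([password])[0])
        return jsonify(password_strength=LABEL_NAMES.get(label, str(label)))

    return app
```

To serve it with python app.py, end the file with an if __name__ == "__main__": guard that calls create_app(load_model("model.pkl")).run(), which listens on http://localhost:5000/ by default. Keeping model loading out of create_app also makes the route easy to test with a stub model.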

The complete code for this project is at the following GitHub location. Please star the repository if it helped you.

CONCLUSION

In this article, I explained how to build a Password Strength Classifier, which is a multi-class classification problem. By trying different Machine Learning algorithms, you can experiment and aim for a more effective model. We have already discussed a few possible improvements above. Pull the code and try it out yourself.

Please click the 👏 button if you liked it, and share it to help others find it. Stay tuned for more interesting blogs related to Machine Learning and Computer Vision ❤️

Follow me on LinkedIn: https://in.linkedin.com/in/anjaliagarwal98. To learn more about me, check out my website: anjaliagarwal.tech 😄

Anjali Agarwal

A Data Science enthusiast who loves Coding and Mathematics. Currently pursuing MCA from Amity University.