We will use the Transformers and bert-extractive-summarizer libraries to fetch a pre-trained model and then summarize a Reddit comment section.

Image by Simona Sergi

Reddit users spend an average of 11 minutes on Reddit, and for about 3 of those minutes they read the same content.

No one reads the whole comment section of a popular post; users read only the popular comments on popular posts.

Most comments never appear on your dashboard, either. You may learn about the trending topics, but you miss the topics that are not trending. Even within trending topics, you might read only the best or hot posts and their comments.

So, how can you avoid wasting time on Reddit?
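The article relies on a pre-trained BERT model through bert-extractive-summarizer. To show what "extractive" means without any model download, here is a much simpler stand-in: a minimal sketch that scores each sentence by how frequent its words are across all comments and keeps the top sentences in their original order. The function name `summarize_comments`, the tiny stop-word list, and the sample comments are hypothetical, and this frequency heuristic is not the BERT approach.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this sketch.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to", "in",
              "it", "this", "that", "i", "for", "on", "with"}


def summarize_comments(comments, num_sentences=2):
    """Frequency-based extractive summary (a stand-in, not BERT).

    Joins the comments, splits them into sentences, scores each sentence by
    the average corpus frequency of its non-stop words, and returns the
    top-scoring sentences in their original order.
    """
    text = " ".join(comments)
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence positions by score, keep the best ones, restore reading order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


comments = [
    "The battery life on this phone is great.",
    "Battery life matters more than the camera.",
    "I had pizza today.",
]
print(summarize_comments(comments, num_sentences=2))
```

The off-topic comment scores lowest because its words appear only once, which is the intuition extractive summarizers build on; a BERT-based summarizer replaces word counts with sentence embeddings.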

We will discuss spacy-langdetect, pycld2, TextBlob, and Googletrans for language detection.

Photo by e on Unsplash

Are you sure that all the input text for your model is in English? No one can be sure, because no one will read through some 20k records of text.

So, how will non-English text affect a model trained on English text?

Pick any non-English text and pass it as input to your English-trained classification model. You will see that the model still assigns it a category.

But why?

The job of a text classification model is to classify, and it will do that job whether or not its input is in English. …
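The article compares spacy-langdetect, pycld2, TextBlob, and Googletrans, and those are the tools to use in practice. The sketch below is only a dependency-free illustration of where a language filter sits in the pipeline: it flags records whose share of common English function words falls below a threshold, so they can be set aside before reaching the classifier. The word list, the 0.2 threshold, and the function names are assumptions, and this heuristic will misjudge very short or mixed-language texts.

```python
import re

# Hypothetical list of very common English function words.
ENGLISH_STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "is", "are", "was", "were",
    "of", "to", "in", "on", "for", "with", "it", "this", "that", "i",
    "you", "he", "she", "we", "they", "not", "be", "have", "has", "at",
}


def english_ratio(text):
    """Share of word tokens that are common English function words."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in ENGLISH_STOP_WORDS for t in tokens) / len(tokens)


def filter_english(records, threshold=0.2):
    """Split records into (likely English, likely non-English).

    The 0.2 threshold is an assumption; tune it on your own data or,
    better, use a proper language detector.
    """
    english, other = [], []
    for text in records:
        (english if english_ratio(text) >= threshold else other).append(text)
    return english, other


records = [
    "This is a great product and I love it.",
    "Ce produit est vraiment génial.",
    "Das ist ein sehr gutes Produkt.",
]
english, other = filter_english(records)
print(english)
print(other)
```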

Data Science

Learn to build a clean, interesting bar graph using various Matplotlib features

Source: Brett Zeck

The bar graph is one of the most widely used charts in data science. Charts help you connect your data with your stakeholders, managers, or audience; they tell the story of your results. Your graph should not look messy. It should be clear and easy to understand.

Bar charts are popular for representing data with multiple categories. Build your bar graph so that it creates a meaningful picture of the data in your audience’s mind.

In this article, we discuss best practices for creating a highly effective bar graph. Bar graphs can represent time-series data, rankings, counts of different categories, distributions, and deviations in data.
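As an illustration of a few common practices (sorted bars, horizontal bars for readable category labels, value labels on the bars, and fewer spines), here is a minimal Matplotlib sketch. The helper name `ranked_bar_chart` and the category counts are hypothetical, and `Axes.bar_label` plus list-indexing of `ax.spines` assume Matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt


def ranked_bar_chart(counts, title):
    """Draw a sorted horizontal bar chart from a {category: value} dict."""
    # Sort ascending so the largest bar ends up at the top of a barh chart.
    items = sorted(counts.items(), key=lambda kv: kv[1])
    labels = [k for k, _ in items]
    values = [v for _, v in items]

    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.barh(labels, values, color="steelblue")
    ax.bar_label(bars, padding=3)                    # print each value at the bar end
    ax.spines[["top", "right"]].set_visible(False)   # remove chart clutter
    ax.set_title(title)
    ax.set_xlabel("Count")
    fig.tight_layout()
    return fig, ax


fig, ax = ranked_bar_chart({"Books": 5, "Games": 12, "Music": 8}, "Items sold by category")
fig.savefig("ranked_bar_chart.png")
```

Sorting the bars turns the chart into a ranking the reader can scan from top to bottom, which is usually more meaningful than alphabetical or insertion order.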

We implement moving averages, item ranking, and cumulative sums with the aggregate functions sum, average, min, and max.

Naven Krcmarek (Unsplash)

SQL analytic functions summarize a large dataset into a simple report. The data summaries these functions produce are easy to visualize, and they help data analytics professionals analyze complex data with ease.

In this article, we combine pandas aggregate and analytic functions to implement SQL analytic functions.

There are many categories of SQL analytic functions, and we will go through them one by one. But first, let’s look at the data we use in this article.

Retail Dataset

We will use a Kaggle dataset. Here is an explanation of each column of the dataset.

This dataset has sales dates from 2010-02-05 to 2012-11-01. …
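Each SQL window function mentioned above has a direct pandas counterpart. A minimal sketch, assuming a toy table with hypothetical `store`, `date`, and `sales` columns (the actual Kaggle column names are not listed here), might look like this; the SQL each line mirrors is in the comment above it.

```python
import pandas as pd

# Hypothetical toy data; the real Kaggle retail columns may differ.
df = pd.DataFrame({
    "store": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-02-19"] * 2),
    "sales": [100.0, 120.0, 90.0, 200.0, 180.0, 220.0],
})
df = df.sort_values(["store", "date"])
g = df.groupby("store")["sales"]

# SUM(sales) OVER (PARTITION BY store ORDER BY date): running total
df["cum_sales"] = g.cumsum()

# AVG(sales) OVER (PARTITION BY store ORDER BY date
#                  ROWS BETWEEN 1 PRECEDING AND CURRENT ROW): 2-row moving average
df["moving_avg_2"] = g.transform(lambda s: s.rolling(window=2, min_periods=1).mean())

# RANK() OVER (PARTITION BY store ORDER BY sales DESC)
df["sales_rank"] = g.rank(method="min", ascending=False).astype(int)

# MIN / MAX / AVG(sales) OVER (PARTITION BY store): partition-wide values on every row
df["store_min"] = g.transform("min")
df["store_max"] = g.transform("max")
df["store_avg"] = g.transform("mean")

print(df)
```

Using `transform` rather than `agg` keeps one output row per input row, which is exactly what distinguishes an SQL window function from a plain `GROUP BY`.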

Find patterns in data, and perform data discovery and data cleaning, using measures of central tendency and measures of dispersion.

Image by Author

The first step in a data science project is to summarize, describe, and visualize the data. Try to learn about the different aspects of the data and its attributes. The best models are built by those who understand their data.

Explore the features of the data and its attributes using descriptive statistics. The insights and numerical summaries you get from descriptive statistics help you understand the data better and handle it more efficiently in machine learning tasks.

So, in this article, I will explain the attributes of the dataset using descriptive statistics. …
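Python's standard `statistics` module covers the common measures of central tendency and dispersion. The sketch below uses a hypothetical `describe` helper and made-up values for one numeric attribute; pandas users would typically reach for `DataFrame.describe()` instead, which reports a similar but not identical set of figures.

```python
import statistics


def describe(values):
    """Measures of central tendency and dispersion for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    return {
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Dispersion
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "stdev": statistics.stdev(values),        # sample standard deviation
        "iqr": q3 - q1,                           # interquartile range
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

A large gap between the mean and the median, or a range much wider than the IQR, is a quick hint of skew or outliers worth inspecting during data cleaning.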

We will discuss problems related to infrastructure, teachers, education policy, and reservation systems in the Indian Education System.

Source: Jaikishan Patel (Unsplash)

Children are the future of the world. We try our best to provide everything our future needs, and that includes education.

Children’s education is a top priority for governments, yet we are still lagging in this field. Many countries face numerous challenges in education, and India is one of them.

In this article, I will discuss four main problems in the Indian education system.

Challenges faced by Teachers

What do you consider the backbone of the Indian Education System (IES)? Schools, universities, tuition, coaching centers, Gross Domestic Product (GDP), or teachers? I would say teachers. …

Exploring Reuters articles with K-Means, n-grams, TF-IDF, bar graphs, word clouds, NER, and other methods

Stuck behind the paywall? Read this article with my friend link here.

Source: Jaredd Craig (Unsplash)

What would you do if I asked you to explain textual data? What steps would you take to build a visualization story from text? Here, I am not going to explain how to create a visualization story.

But this article will help you get the information you need to build that story and explain the textual data.

Insights from textual data help us discover connections between articles and detect trends and patterns. …
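Two of the building blocks named above, n-grams and TF-IDF, are short enough to sketch with the standard library. This is a minimal sketch, assuming hypothetical headline-style documents and the plain weighting tf = count / document length and idf = log(N / df); libraries such as scikit-learn use a smoothed IDF by default, so their numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (letters only)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n=2):
    """Consecutive n-token tuples, e.g. bigrams for n=2."""
    return list(zip(*(tokens[i:] for i in range(n))))


def tf_idf(docs):
    """Per-document TF-IDF weights: tf = count / len(doc), idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({t: (c / len(toks)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


docs = ["Oil prices rise", "Oil exports fall", "Wheat prices fall"]
w = tf_idf(docs)
print(ngrams(tokenize(docs[0])))
print({i: max(doc_w, key=doc_w.get) for i, doc_w in enumerate(w)})
```

Terms shared by many documents (such as "oil" here) are down-weighted, so the highest-weighted term in each document tends to be what makes that article distinctive.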


Learn why data is stored in different sources and how to retrieve it using Python.


Did you know that in 2020, around 147 GB of data was generated per day? And we have already stored around 40 trillion GB of data. Not all of this stored data is the same: data types such as text and numbers have different formats. That explains why we have different types of data sources.

When you are working with data, you should know how to ingest the data from different sources. …
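As a minimal sketch of ingesting several source types with only the standard library, the example below reads CSV text, a JSON document, and a SQLite table into one list of records. The in-memory strings, the `users` table, and the loader function names are hypothetical stand-ins for real files, API responses, and databases.

```python
import csv
import io
import json
import sqlite3

# Hypothetical in-memory sources standing in for a file, an API response, and a database.
CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"
JSON_TEXT = '[{"id": 3, "name": "Carol"}]'


def load_csv(text):
    """csv.DictReader yields each row as a dict of strings; cast id to int."""
    return [{"id": int(r["id"]), "name": r["name"]}
            for r in csv.DictReader(io.StringIO(text))]


def load_json(text):
    """JSON arrays of objects map directly to lists of dicts."""
    return json.loads(text)


def load_sqlite(conn):
    """sqlite3.Row lets each result row be converted to a dict."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    return [dict(r) for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (4, 'Dan')")

records = load_csv(CSV_TEXT) + load_json(JSON_TEXT) + load_sqlite(conn)
print(records)
```

Normalizing every source into the same record shape early makes the downstream cleaning and analysis code independent of where the data came from.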

Extract person names, locations, and organisations from the subreddit r/Wordnews using a pre-trained BERT model

Source: Luis Villasmil from Unsplash

Named-entity recognition (NER) is a process that extracts information from unstructured text. It is also known as entity extraction. NER extracts information such as times, places, currencies, organizations, medical codes, and person names. We can attach these extracted entities to articles or documents as tags.

But what do we achieve by extracting entities from the text? Do these tags reduce the time it takes to search for articles?

Tags on articles or documents can save a lot of time by improving search, and they help us categorize text documents. …
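Why tags speed up search becomes clearer with an inverted index: once entities are extracted, each tag maps straight to the articles that carry it, so a query becomes a dictionary lookup and a set intersection instead of a scan over every article. This minimal sketch assumes the NER step has already run; the article ids, the entities, and the PER/LOC/ORG type labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical NER output: article id -> set of (entity text, entity type) tags.
ARTICLE_TAGS = {
    "a1": {("Paris", "LOC"), ("UNESCO", "ORG")},
    "a2": {("Angela Merkel", "PER"), ("Berlin", "LOC")},
    "a3": {("Paris", "LOC"), ("Angela Merkel", "PER")},
}


def build_tag_index(article_tags):
    """Inverted index: tag -> sorted list of article ids carrying that tag."""
    index = defaultdict(set)
    for article_id, tags in article_tags.items():
        for tag in tags:
            index[tag].add(article_id)
    return {tag: sorted(ids) for tag, ids in index.items()}


def search(index, *tags):
    """Articles carrying every requested tag (set intersection)."""
    hits = [set(index.get(t, [])) for t in tags]
    return sorted(set.intersection(*hits)) if hits else []


index = build_tag_index(ARTICLE_TAGS)
print(search(index, ("Paris", "LOC")))                            # ['a1', 'a3']
print(search(index, ("Paris", "LOC"), ("Angela Merkel", "PER")))  # ['a3']
```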

We will build a sentiment analysis model using a pre-trained RoBERTa model to discover the sentiment of a Reddit subgroup.

Source: Jeffrey Grospe (Unsplash)

How do you feel when you log in to your social media accounts and read the first post? Does it put a smile on your face, or make you sad or angry? My experience is mixed, but most of the time social media posts make me happy. How? Well, we can’t control what other people post, but we can control what we see on our social media accounts.

If you join a group with a lot of negative comments, you will read those comments more often, and that makes you angry and sad. …
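A pre-trained RoBERTa sentiment classifier outputs one raw score (logit) per label for each comment. The sketch below shows only the post-processing step that turns those scores into a subgroup-level picture: a softmax per comment, the most likely label, and the share of each label across the subgroup. The label order and the sample logits are assumptions; in a real pipeline the logits would come from the model, and the label order must be read from the configuration of the checkpoint you load.

```python
import math
from collections import Counter

# Assumed label order; check the config of the checkpoint you actually use.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_of(logits):
    """Most likely label and its probability for one comment."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def subgroup_sentiment(all_logits):
    """Share of comments per predicted label across a subgroup."""
    counts = Counter(label_of(logits)[0] for logits in all_logits)
    n = len(all_logits)
    return {label: counts[label] / n for label in LABELS}


# Hypothetical model outputs for four comments.
sample_logits = [[2.0, 0.1, -1.0], [-1.0, 0.2, 3.0], [0.0, 1.5, 0.3], [-0.5, 0.0, 2.5]]
print(subgroup_sentiment(sample_logits))
```

A high negative share for a subgroup is the kind of signal that could prompt you to mute or leave it.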


Manmohan Singh

Searching clue in DATA | Linkedin: https://www.linkedin.com/in/manmohan-singh-9570758a/
