Happiness is not pleasure. Happiness is the expansion of possibilities — Scott Young
I often wonder what the one thing I am most passionate about is. I know I like fitness, sports, e-commerce, and the advertising space, but I cannot come up with one answer. Is it because I know I am not good at it, or am I scared to actually make a career out of it? I thought completing grad school and finding a job would help me find my passion; it has definitely made me happy, but I am far from finding my passion. I tried digging…
In 2019, there were a total of 409 natural disasters worldwide. The irony is that we are right now in the middle of a global pandemic due to COVID-19. During or following a disaster, millions of people reach out, directly or via social media, for help from the government or from disaster relief and recovery services. If an affected person tweets or even sends a message to a helpline service, chances are that the message will be lost among the thousands of messages received. …
In this project, a mail-order sales company in Germany is interested in identifying segments of the general population to target with its marketing in order to grow. Demographics information has been provided (by Arvato Financial Solutions through Udacity) for both the general population at large and for prior customers of the mail-order company, in order to build a model of the company's customer base. The target dataset contains demographics information for recipients of a mailout marketing campaign.
The objective is to identify which individuals are most likely to respond to the campaign and become customers of the mail-order company.
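To make the segmentation idea concrete, here is a minimal sketch of clustering people by demographic features. This is not the Arvato data or the project's actual model; the rows are made up, and a real pipeline would use far more features and a library implementation.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its members."""
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, clusters

# Hypothetical (age, income in k EUR) rows with two obvious segments.
people = [(22, 25), (25, 28), (23, 30), (58, 80), (61, 85), (60, 78)]
centroids, clusters = kmeans(people, k=2)
```

On well-separated toy data like this, the two clusters recover the young/low-income and older/high-income segments, which is the kind of grouping the marketing campaign would then target.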
It is said that Data Scientists spend 80% of their time preprocessing data, so let's deep-dive into the data preprocessing pipeline, also known as the ETL (Extract, Transform, Load) pipeline, and find out which stage takes the most time. In this blog post, we will learn how to extract data from different data sources. Let's take a real-life dataset so it's easier to follow.
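As a sketch of the Extract step, here is the same kind of record arriving from two different source formats (CSV and JSON) and being normalized into one list of dicts. The country/year/value rows are illustrative stand-ins, not the actual World Bank files.

```python
import csv
import io
import json

# Hypothetical samples of two data sources carrying similar records.
csv_text = "country,year,value\nBrazil,2017,3.2\nChina,2017,6.9\n"
json_text = '[{"country": "India", "year": "2017", "value": "6.8"}]'

def extract_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def extract_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)

# Extract from both sources into one uniform structure.
records = extract_csv(csv_text) + extract_json(json_text)
```

With the records in one shape, the Transform and Load stages can treat them identically regardless of where they came from.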
This lesson uses data from the World Bank. The data comes from two sources:
The buzz around Data Science in 2020 is like Tesla stock: it keeps increasing every day. The field is so hot that everyone from mechanical engineers to doctors wants to be a data scientist. But how do you break into Data Science? Join a DS bootcamp? Do two or three MOOCs? Compete in Kaggle competitions? The list is endless. I am not refuting the advantages of MOOCs or even Kaggle competitions; they are incredible places to learn Data Science.
However, the issue is that everyone is doing it! How frequently have we seen someone post about their…
Wage analysis is the process of comparing salaries based on the attributes attached to each employee. Of course, several factors, such as company and location, contribute to the wage. Here, we will analyze the Mid-Atlantic wage dataset, which is available here.
For execution, I have used PySpark with the Apache Spark Jupyter Notebook Docker image, but you can use Python with scikit-learn or any other packages.
Let's read our data and see what it looks like:
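In PySpark the read itself is one line, e.g. `spark.read.csv(path, header=True, inferSchema=True)` followed by `df.show(5)`. If you want to peek at the file without a Spark session, a plain-Python equivalent looks like this; the two sample rows below are a hypothetical slice of the wage data, not the real file.

```python
import csv
import io

# Hypothetical sample of the wage CSV; the real dataset has more columns.
sample = (
    "year,age,education,wage\n"
    "2006,18,1. < HS Grad,75.0\n"
    "2004,24,4. College Grad,70.5\n"
)

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows[:5]:  # peek at the first few rows, like df.show(5)
    print(row)
```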
To understand the key topics of text summarization, I highly recommend you read Text Summarization-Key Concepts.
Have you ever condensed a lengthy document into a short paragraph? How long did it take? Manually producing a summary can be time-consuming and tedious. Automatic text summarization promises to overcome such challenges and helps you generate the key ideas in a piece of writing effectively. Or have you ever tried the mobile application Inshorts? It's an innovative news app that converts news articles into a 60-word summary. And that is exactly what we will learn in this project: Text Summarization. Text summarization is the technique of generating a concise and precise summary…
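To give a feel for the extractive flavor of summarization, here is a naive word-frequency scorer: sentences whose words occur often in the document are kept, the rest are dropped. This is a toy sketch, not the method the project builds; the example text is made up.

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Naive extractive summarization: score each sentence by the
    average document frequency of its words, keep the top scorers."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(sentences, key=score, reverse=True)
    keep = set(ranked[:n_sentences])
    # Emit kept sentences in their original order.
    return " ".join(s for s in sentences if s in keep)

text = ("Spark makes big data simple. Spark jobs run on clusters. "
        "My cat sleeps all day.")
summary = summarize(text)
```

The off-topic sentence scores lowest because its words appear only once each, so it is the one dropped.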
In this blog post, we are going to develop an SMS spam detector using logistic regression and PySpark. We will predict whether an SMS text is spam or not. Spam detection was one of the first use cases of data science and is still widely used to filter emails.
Dataset: The text file can be downloaded from here. This is what our dataset looks like:
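Since the dataset preview is omitted here, below is a self-contained sketch of the idea with a handful of made-up messages. The post itself uses PySpark; this stand-in trains logistic regression on bag-of-words counts with plain stochastic gradient descent so you can see the mechanics without a Spark session.

```python
import math
from collections import Counter

# Hypothetical labeled messages: 1 = spam, 0 = ham.
train = [("win a free prize now", 1),
         ("free cash win win", 1),
         ("claim your free prize", 1),
         ("are we meeting for lunch", 0),
         ("see you at the gym", 0),
         ("call me when you get home", 0)]

vocab = sorted({w for msg, _ in train for w in msg.split()})

def featurize(msg):
    """Bag-of-words count vector over the training vocabulary."""
    counts = Counter(msg.split())
    return [counts[w] for w in vocab]

X = [featurize(m) for m, _ in train]
y = [label for _, label in train]

# Logistic regression trained by stochastic gradient descent.
w = [0.0] * len(vocab)
b = 0.0
lr = 0.5
for _ in range(200):
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        p = 1 / (1 + math.exp(-z))      # sigmoid
        err = p - yi                    # gradient of log-loss wrt z
        w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
        b -= lr * err

def predict(msg):
    z = sum(wj * xj for wj, xj in zip(w, featurize(msg))) + b
    return 1 / (1 + math.exp(-z)) > 0.5

print(predict("free prize waiting"))   # spam-looking words
print(predict("lunch at home"))        # ham-looking words
```

In the PySpark version, `Tokenizer`, `CountVectorizer`, and `LogisticRegression` from `pyspark.ml` play the roles of the hand-rolled pieces above.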
Some statistical models 𝑓(𝑥) are learned by optimizing a loss function 𝐿(Θ) that depends on a set of parameters Θ. There are several ways of finding the optimal Θ for the loss function, one of which is to iteratively update following the gradient:
Then, compute the update:
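The formulas themselves do not survive in this excerpt; presumably they are the standard gradient-descent rule, which written out is:

```latex
\nabla L(\Theta) =
\left[ \frac{\partial L}{\partial \theta_1}, \dots,
       \frac{\partial L}{\partial \theta_n} \right]^{\top},
\qquad
\Theta_{t+1} = \Theta_t - \alpha \, \nabla L(\Theta_t)
```

where \(\alpha\) is the learning rate and the minus sign moves the parameters downhill, against the gradient of the loss.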