
It is widely known that data is highly valuable and crucial to the decision-making process. At the same time, handling data and deriving value from it require a deep skill set and expertise. Digital talent has always been scarce, and the challenge is even greater in government, where attracting tech talent is not as straightforward as it is for prominent technology companies. This article presents several concrete strategies and initiatives from Jakarta Smart City to attract and nurture digital talent, including university students and fresh graduates. Lastly, I personally have a deep passion for education (e.g., giving lectures, advising curriculum updates, recommending education policies). …



A telecommunications company was losing customers (its churn rate was 49.9%) and wanted to identify why. Using data-driven Artificial Intelligence (AI), the key reasons customers were leaving the business (churning) were identified, and a proactive retention campaign was developed to keep them.

Example: Churn Classification

A logistic regression model was used to build a data-driven churn model. The dataset contained 2,404 customers, and twenty-three predictor variables were identified as relevant customer data for determining the drivers of churn and which customers were likely to leave.

1. First, we import the Python libraries for training our churn model (a minimal sketch of this setup follows below). …
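What follows is a minimal, hypothetical sketch of this setup, not the article's actual code: the file name `telecom_churn.csv`, the `churn` column, and the model settings are assumptions for illustration.

```python
# A minimal sketch of the truncated steps, assuming a pandas DataFrame
# with numeric predictors and a binary "churn" target column; the file
# name and column names here are hypothetical, not from the article.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load the customer dataset (2,404 rows, 23 predictors plus the target).
df = pd.read_csv("telecom_churn.csv")
X = df.drop(columns=["churn"])   # the 23 predictor variables
y = df["churn"]                  # 1 = churned, 0 = retained

# Hold out a test set so the model is evaluated on unseen customers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the logistic regression churn model.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Report precision/recall for the churn class.
print(classification_report(y_test, model.predict(X_test)))
```

With a trained model in hand, the fitted coefficients can then be inspected to rank which of the twenty-three predictors drive churn the most.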



On November 4, California passed Proposition 24, also known as the California Privacy Rights Act. The law will expand the California Consumer Privacy Act (CCPA), which took effect in January. California, already a leader in online privacy, will now have some of the world’s tightest data privacy regulations.

Unlike Europe, with its General Data Protection Regulation (GDPR), the U.S. has taken relatively few steps to regulate online privacy. California has been a notable exception, leading the way by passing the CCPA in 2018. …



The AI field of Natural Language Processing, or NLP, through its gigantic language models — yes, GPT-3, I’m watching you — presents what is perceived as a revolution in machines’ ability to perform the most diverse language tasks.

Because of this, public perception is split: some believe these new language models will pave the way to a Skynet-type technology, while others dismiss them as hype-fueled technologies that will end up on dusty shelves, or on dusty hard drives, in little to no time.

[Free download: Natural Language Processing Guide: 30 Free ODSC Resources to Learn…



The temporal difference learning algorithm was introduced by Richard S. Sutton in 1988. The reason the temporal difference learning method became popular was that it combined the advantages of dynamic programming and the Monte Carlo method. But what are those advantages?

This article is an excerpt from the book Deep Reinforcement Learning with Python, Second Edition by Sudharsan Ravichandiran — a comprehensive guide for beginners to become proficient in implementing state-of-the-art RL and deep RL algorithms.

Let’s quickly recap the advantages and disadvantages of DP and the MC method.

Dynamic programming — the advantage of the DP method is that it uses the Bellman equation to compute the value of a state. According to the Bellman equation, the value of a state can be obtained as the sum of the immediate reward and the discounted value of the next state. This means that to compute the value of a state, we don’t have to wait until the end of the episode; we can estimate it just from the value of the next state. This is called bootstrapping. …
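To make bootstrapping concrete, here is a minimal sketch (not from the book) of the TD(0) update that temporal difference learning builds on: the estimate V(s) is nudged toward the immediate reward plus the discounted value estimate of the next state.

```python
# A minimal sketch of the TD(0) value update. The states "A" and "B",
# the reward, and the step size are illustrative assumptions.
from collections import defaultdict

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference step: V(s) <- V(s) + alpha * (target - V(s))."""
    td_target = r + gamma * V[s_next]  # bootstrapped one-step Bellman target
    td_error = td_target - V[s]        # how far the current estimate is off
    V[s] += alpha * td_error
    return V

V = defaultdict(float)                       # value estimates start at zero
V = td0_update(V, s="A", r=1.0, s_next="B")  # one step of experience
print(V["A"])                                # 0.1 after a single update
```

Unlike DP, this update needs no model of the environment (the reward and next state come from experience), and unlike the MC method, it does not wait for the episode to end.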



2020 has been a unique year for public health, professional life, the economy, and just about every other aspect of daily life. While some doors are closing and other companies are pivoting their business models, businesses that haven’t taken a hit are a rare breed. Even so, some sectors are thriving, and it’s not just virtual conferencing or healthcare.

Natural Language Processing (NLP) is one of those areas. In fact, the NLP market size is expected to grow from $10.2 billion in 2019 to $26.4 billion by 2024, according to research from MarketsandMarkets™. With use cases in assisting patients and practitioners in a healthcare setting, easing customer service queries, or even virtual assistance to help shoppers, there are several growth factors driving this uptick in NLP technology. …



In the wake of COVID-19, an unprecedented acceleration in digital adoption has significantly reshaped consumer behavior. The world has experienced a decade’s worth of digital acceleration in the last few months, leading to significant changes in the way customers interact with brands and how brands interact with their customers.

Companies are realizing the importance of enhanced customer interactions and personalization to increase engagement and revenue. In fact, based on a survey conducted by Business2Community, 74% of online shoppers get frustrated when content is not relevant to their interests. …



Netflix is not only one of the most recognized names in the world, but also one of the most recognized names in data science, and Dr. Becky Tucker, a data scientist at Netflix, knows all too well the power of data and what it can tell us. Whether it is recommendations for our next big binge-watch or how people will most likely react in a crisis, data is one of the most valuable tools available when developing strategy, making decisions, and deciphering your next move.

In times like these, uncertainty and instability are everywhere. It seems as though fear, worry, and despair have taken the reins, and rationality and sound judgment are nowhere to be found. But this is where data comes in. What can we learn in these unstable environments, and how can we pivot according to what these data models are telling us? …



We invite you to learn more about the powerful, open-source HPCC Systems. Our comprehensive, dedicated data lake platform makes combining different types of data easier and faster than competing platforms — even data stored in massive, mixed schema data lakes — and it scales very quickly as your data needs grow.

[Related article: Gain Insight Into the COVID-19 Pandemic with the HPCC Systems Data Lake Platform]

HPCC Systems is a mature platform that has been heavily used in commercial applications for almost two decades, predating the development of Hadoop. Created by LexisNexis Risk Solutions, an innovative pioneer in big data processing, and open source for nearly a decade now, HPCC Systems features a vibrant development community that continues to push the boundaries of big data. This powerful, versatile platform makes it easier for developers to see the data they’re working with and manipulate it as needed. Flexible information delivery makes it easier for your clients to query and find the data they need — and it runs analysis and queries faster than alternatives such as SQL-based systems or Hadoop. …



Analyzing and classifying data is often tedious work for many data scientists when massive amounts of data are involved. It can consume most of their time and reduce their efficiency. Data scientists need to be smart, use cutting-edge technologies, take calculated risks, and uncover meaningful insights via supervised learning use cases that can reveal opportunities to expand the business and maximize profits.

Data scientists and machine learning engineers rely on supervised, unsupervised, and reinforcement learning. These methods help them classify and analyze data more quickly and accurately.

Introduction

Supervised learning is the process of training an algorithm to map an input to a specific output. In this method, developers select the kind of information to feed into the algorithm to get the desired results. The algorithm receives both inputs and outputs, and the next step is learning rules that map the inputs to the outputs. The training process continues until the desired level of performance is achieved. …
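As a minimal illustration of this input-to-output mapping (not from the original article; the synthetic dataset and classifier choice here are assumptions):

```python
# Supervised learning in miniature: the algorithm sees both inputs (X)
# and desired outputs (y) and learns a rule mapping one to the other.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic labeled data: 200 samples, 4 features, binary labels.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Training: the model learns rules that map inputs to outputs.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Evaluation: performance on inputs the model has never seen.
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```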
