EDA on Google Africa Developer Scholarship phase-one 2020…

Image credit: Pluralsight

Google has a mission for Africans… Let’s hear what Andela, the principal organizing partner for this scholarship program, has to say…

In line with Google’s goal of training 100,000 Africans, we have, over the last 3 years, executed 5 programs in partnership with Google, Udacity, and Pluralsight, training a cumulative of 60,000+ learners across 54 Countries in Africa.

This is indeed a highly commendable program from a company that needs no introduction, and for me, it gets personal. The first GADS program, some three years ago, was my first serious attempt at coding and building applications. …


Data Science

Exploring Sentiments, Key-Phrase-Extraction, and Inferences …

Image credit: rev.com

2020 has been one ‘hell-of-a-year’, and we’re only in the eleventh month.

It’s that time again for Americans to take to the polls.

If you’ve lived long enough, you recognize the patterns…

Each opposing political side shades the other, scandals and leaks may pop up, shortcomings are magnified, critics make the news, promises are doled out ‘rather convincingly’, and there’s an overwhelming sense of ‘nationality and togetherness’ touted by both sides…

But for the most part, we’re not buying the BS! And often, we simply choose the ‘lesser of two evils’, because, candidly, one is not significantly better than the other.

So today, I’m going to analyze the presidential debates of President Trump and Vice-President…
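As a rough illustration of the kind of sentiment scoring such an analysis might start with (the article’s actual pipeline isn’t shown in this excerpt), here is a minimal sketch using NLTK’s VADER analyzer on placeholder lines, not real debate transcript:

```python
# A minimal sentiment-scoring sketch using NLTK's VADER analyzer.
# The transcript lines below are placeholders, not actual debate text.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon

analyzer = SentimentIntensityAnalyzer()

debate_lines = [
    "We have built the greatest economy in history.",
    "That is simply not true, and the numbers prove it.",
]

for line in debate_lines:
    scores = analyzer.polarity_scores(line)  # keys: neg / neu / pos / compound
    print(f"{scores['compound']:+.3f}  {line}")
```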


Hard-coding the most popular text-embedding Algorithm…


Term Frequency-Inverse Document Frequency is a numerical statistic that is intended to reflect how important a word is to a document, in a collection or corpus.

Simply put, TF-IDF shows the relative importance of a word or words to a document, given a collection of documents.

Note that before we can do text-classification, the text must be translated into some form of numerical representation, a process known as text-embedding. The resulting numerical representation, usually in the form of vectors, can then be used as input to a wide range of classification models.

TF-IDF is the most popular approach to embed texts into numerical vectors for modelling, information retrieval and text-mining. …
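To make the idea concrete, here is a minimal, hand-rolled TF-IDF sketch on a toy corpus; the article’s own implementation may differ in its exact weighting and smoothing choices:

```python
# A minimal, hand-rolled TF-IDF sketch (illustrative only; weighting and
# smoothing choices vary between implementations).
import math

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
docs = [doc.split() for doc in corpus]

def tf(term, doc):
    """Term frequency: count of the term divided by total terms in the document."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency with add-one smoothing on the document count."""
    n_containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / (1 + n_containing)) + 1

def tfidf_vector(doc, docs, vocab):
    """Embed one document as a TF-IDF vector over the shared vocabulary."""
    return [tf(term, doc) * idf(term, docs) for term in vocab]

vocab = sorted({term for d in docs for term in d})
vectors = [tfidf_vector(d, docs, vocab) for d in docs]
print(vocab)
print([round(v, 3) for v in vectors[0]])
```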


Data Science

Proof that Testing Accuracy is Simply Not Enough…

Image source: Pixabay

Intro:

Probability is the cornerstone of Artificial Intelligence. The management of uncertainty is key to many applications of AI, such as machine learning, filtering, robotics, computer vision, NLP, search and so on.

Probability is the machinery through which we manage uncertainties…

And in no other sector is the management of uncertainty as crucial as it is in the health sector.

  • Imagine being allergic to a specific drug, but an allergy test falsely indicates you’re not allergic (false negative).
  • Or imagine feeling sick and getting tested for HIV, only to be found positive while you actually don’t have HIV (false positive).

At first glance, the false negative seems more devastating. Of course, a false allergy test result could lead a GP to administer a drug that causes life-threatening complications. …
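A small worked example with assumed numbers shows why raw test accuracy is not enough on its own: even a test that is 99% sensitive and 99% specific produces mostly false positives when the condition is rare.

```python
# A worked Bayes' rule example with assumed numbers: a test that is 99%
# sensitive and 99% specific, applied to a disease with 0.1% prevalence.
prevalence = 0.001     # P(disease)
sensitivity = 0.99     # P(positive | disease)     -> low false-negative rate
specificity = 0.99     # P(negative | no disease)  -> low false-positive rate

p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disease_given_pos = sensitivity * prevalence / p_pos

print(f"P(positive test)           = {p_pos:.4f}")
print(f"P(disease | positive test) = {p_disease_given_pos:.2%}")
# ~9%: despite the test's high 'accuracy', most positives are false positives.
```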


Machine Learning

A huge, Self-Learning Algorithm usually performs much better…


Huge Self-Supervised Models are Strong Semi-Supervised Learners…

Table of Contents:

  1. Introduction
  2. Key insight
  3. Results
  4. Why it matters
  5. I’m thinking
The era of Computer Vision is upon us… | Img_Credit

Introduction:

The long-standing problem in computer vision, where models struggle to learn from only a few labeled examples while making use of large amounts of unlabelled data for training, may be coming to an end.

The SimCLR framework
Researchers on the Google Research Brain Team, including Ting Chen, Geoffrey Hinton, and a few others, built the SimCLR framework. SimCLR is a simple framework for contrastive learning of visual representations. …
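At the heart of contrastive learning is an objective that pulls two augmented views of the same image together and pushes views of different images apart. The sketch below is a toy NumPy version of that NT-Xent-style loss, not the full SimCLR pipeline (which uses a ResNet encoder, a projection head, and very large batches):

```python
# A toy NumPy sketch of an NT-Xent-style contrastive objective:
# embeddings of two views of the same image should be similar,
# while views of different images are pushed apart.
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, d) L2-normalised embeddings of two views of N images."""
    z = np.concatenate([z1, z2], axis=0)            # (2N, d)
    sim = z @ z.T / temperature                     # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                  # exclude self-similarity
    n = z1.shape[0]
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), positives].mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8)); z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
z2 = z1 + 0.05 * rng.normal(size=(4, 8)); z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
print(f"NT-Xent loss: {nt_xent_loss(z1, z2):.3f}")
```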


EDA with Stacked and Unstacked Histograms


Hello and welcome to Part Five of this mini-series on data visualization with the most popular Python visualization library called matplotlib.

The goal is to take you from beginner to expert in data visualization via matplotlib, without unnecessary details that you don’t need to know.

They say a picture is worth a thousand words, but when it comes to Data, a chart is worth a thousand lines…

This is a beginner-friendly roadmap that is designed for everyone interested in data visualization. The only requirement is basic programming experience with Python and some interaction with pandas or numpy.

In part one, we explored the matplotlib architecture, created plots with the three layers, and used 26 different plot styles. In part two, we explored the matplotlib-pandas synergy via the plot() function. In part three, we went deeper into intermediate pandas for data visualization. In part four, we went deep into the most common plots: Line and Area plots. We explored stacked and unstacked Area plots and played with colour-maps and the 148 colours of matplotlib.
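As a quick taste of what this part covers, here is a minimal pandas/matplotlib sketch of unstacked vs. stacked histograms on randomly generated placeholder data:

```python
# A quick sketch of unstacked vs. stacked histograms with pandas + matplotlib,
# using randomly generated data (the series names are placeholders).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Group A": rng.normal(60, 10, 500),
    "Group B": rng.normal(75, 8, 500),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
df.plot(kind="hist", bins=20, alpha=0.6, ax=ax1, title="Unstacked")
df.plot(kind="hist", bins=20, stacked=True, ax=ax2, title="Stacked")
plt.tight_layout()
plt.show()
```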


Cloud Computing, Machine Learning

Understanding how to choose the best ML models

A common question often asked in Data Science is:

Which machine learning algorithm should I use?

While there is no Magic-Algorithm that solves all business problems with zero errors, the algorithm you select should depend on two distinct parts of your Data Science scenario…

  1. What do you want to do with your data? Specifically, what business question do you want to answer by learning from your past data?
  2. What are the requirements of your Data Science scenario? Specifically, what accuracy, training time, linearity, number of parameters, and number of features does your solution need to support?

The ML Algorithm cheat sheet helps you choose the best machine learning algorithm for your predictive analytics solution. Your decision is driven by both the nature of your data and the goal you want to achieve with your data. …
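As a rough illustration of weighing those requirements (not a recommendation of any particular algorithm), the sketch below compares two common classifiers on accuracy and training time using a synthetic dataset:

```python
# An illustrative comparison of two candidate models on accuracy and training
# time, two of the requirements mentioned above (synthetic data, not real).
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    acc = model.score(X_test, y_test)
    print(f"{model.__class__.__name__:24s} accuracy={acc:.3f}  train_time={elapsed:.2f}s")
```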


Starting EDA with impactful Visualizations…


Hello and welcome to Part Four of this mini-series on data visualization with the most popular Python visualization library called matplotlib.

The goal is to take you from beginner to expert in data visualization via matplotlib, without unnecessary details that you don’t need to know.

They say a picture is worth a thousand words, but when it comes to Data, a chart is worth a thousand lines…

This is a beginner-friendly roadmap that is designed for everyone interested in data visualization. The only requirement is basic programming experience with Python and some interaction with pandas or numpy.

In part one, we explored the matplotlib architecture, created plots with the three layers and 26 different plot styles. In part two we explored the matplotlib-pandas synergy via the plot() function. In part three we went deeper into intermediate pandas for data visualization. …
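As the Part Five recap above notes, this part goes deep into Line and Area plots. Here is a minimal pandas sketch on toy data with placeholder column names:

```python
# A minimal pandas sketch of a line plot vs. a stacked area plot
# (toy data; the column names are placeholders).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

idx = pd.date_range("2020-01-01", periods=12, freq="MS")
df = pd.DataFrame(
    np.random.default_rng(1).integers(50, 150, size=(12, 3)),
    index=idx, columns=["Product A", "Product B", "Product C"],
)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
df.plot(ax=ax1, title="Line plot")
df.plot(kind="area", stacked=True, alpha=0.6, ax=ax2, title="Stacked area plot")
plt.tight_layout()
plt.show()
```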


Artificial Intelligence, Opinion

A few minutes exploring AI…

Artificial Intelligence (AI) is such a buzzword these days, and one thing about buzzwords is… ‘They often get lost in translation’.


Ask any Data Scientist (including yours truly) about AI, and you’re likely to hear about Machine Learning (ML) algorithms or Deep Learning (DL) and their fantastic applications, such as AlphaGo… where a neural network, trained through reinforcement learning, defeated the Go world champion, making AlphaGo arguably the strongest Go player in history… These are all applicable responses.

But I think it’s time we all take a deep breath, exhale, pause… And realize that AI is a well-founded discipline in its own right. …

About

Lawrence Alaso Krukrubo

Fair and Explainable AI | Data Science | Machine Learning | Deep Learning | Writer@towardsAI | AI-Nanodegree
