Learning DATA SCIENCE

Alex Souza
Published in blog do zouza
Jan 21, 2022 · 9 min read

This material gathers a set of posts related to Machine Learning. We start with concepts, scenarios, and predictions for Artificial Intelligence (AI), along with some basic concepts of Statistics. Next come some tools that help in day-to-day Machine Learning work. We also take a brief look at Data and Big Data, including Non-Relational Databases (NoSQL).

With this foundation in place, we move on to the main subject: Machine Learning. Several materials detail algorithms, techniques, libraries, and more, with greater focus on Classification Algorithms and Natural Language Processing (NLP) and, last but not least, the Metrics that can be applied.

Then we talk about Data Science, an area that has been growing and tends to keep growing: what it is, what a Data Scientist does, the tools used, and a post with several videos showing the techniques and tools used in a Data Scientist’s daily work.

Finally, we will see some applications that use Machine Learning.

That’s it. I hope you like this compilation of posts and that it helps your studies in some way! This post will be updated constantly, and I count on everyone’s feedback to improve this material even further. If you want to suggest articles, let me know and I can add them here; the idea is for this to be a source for study.

BEFORE YOU BEGIN

Some traits of professionals in the area:

  • Curious
  • Love to learn (always learning)
  • Problem solver (this point is very important: everything we do is meant to solve some situation or problem, so focusing on it is crucial. It all starts with understanding the problem; knowing what you really need to do to solve it is an excellent start.)

DATA and BIG DATA

Data Concepts, where to find data sources, Big Data, NOSQL and also SQL.

DATA
Data are the raw material of information, that is, information that has not yet been processed. Data carry one or more meanings that, on their own, cannot convey a message or represent knowledge.

DATA SOURCES: WHERE TO FIND THEM?

HOW TO BECOME A COMPANY DRIVEN BY FACTS AND DATA

MCKINSEY: COMPETING IN A DATA-DRIVEN WORLD

DATA-DRIVEN MUST BE A CULTURE AND NOT A PROJECT.

IN 60 SECONDS (what happens on the Internet in 60 seconds, with historical data since 2016)

DATA and AI Landscape

BIG DATA — INFOGRAPHIC

KNOW THE POWER OF BIG DATA — INFOGRAPHIC

EXTRACTING BUSINESS VALUE FROM THE 4 V’S OF BIG DATA

BIG DATA HOW TO TRANSFORM A DATABASE INTO STRATEGY

NOSQL Non-Relational Databases

MYSQL JSON DOCUMENT STORE

SQL stands for “Structured Query Language”. It was created so that communication with a database could happen in a simple, agile way that developers can easily understand and learn.
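
To make the idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The posts table, its columns, and the sample rows are made up for illustration; they do not come from any real database.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented only for this example.
conn.execute("CREATE TABLE posts (title TEXT, topic TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        ("Intro to NoSQL", "data", 5),
        ("Your first ML project", "ml", 12),
        ("ROC curve explained", "ml", 4),
    ],
)

# A declarative query: we describe the result we want
# (posts and total reading time per topic), not how to compute it.
rows = conn.execute(
    "SELECT topic, COUNT(*), SUM(minutes) "
    "FROM posts GROUP BY topic ORDER BY topic"
).fetchall()

for topic, n_posts, total_minutes in rows:
    print(topic, n_posts, total_minutes)

conn.close()
```

The query reads almost like a sentence, which is a large part of why SQL is considered easy to learn.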

MACHINE LEARNING

Everything you need to know to get started in Machine Learning, check it out…

A LITTLE HISTORY… A Brief History of Machine Learning

WHAT YOU NEED TO KNOW ABOUT MACHINE LEARNING…

HOW MACHINE LEARNING EVOLVED OVER THE PERIOD

DIFFERENCE BETWEEN DATA MINING AND MACHINE LEARNING

FUNDAMENTALS OF MACHINE LEARNING ALGORITHMS (WITH PYTHON AND R CODE)

YOUR FIRST MACHINE LEARNING PROJECT IN PYTHON (STEP BY STEP)

Machine Learning Tutorial with the Titanic dataset (Mario Filho, video series)

MACHINE LEARNING YEARNING (Excellent book)

14 DIFFERENT TYPES OF LEARNING IN MACHINE LEARNING
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Semi-Supervised Learning
- Self-Supervised Learning
- Multi-Instance Learning
- Inductive Learning
- Deductive Inference
- Transductive Learning
- Multi-Task Learning
- Active Learning
- Online Learning
- Transfer Learning
- Ensemble Learning

AN ESSENTIAL GUIDE FOR NUMPY TO MACHINE LEARNING IN PYTHON

7 TECHNIQUES FOR DIMENSIONALITY REDUCTION

DATA REPRESENTATION IN MACHINE LEARNING

BEST LIBRARIES FOR PYTHON AND NATURAL LANGUAGE PROCESSING

Classification Algorithms, Regression, Neural Networks, Clustering …

DECISION TREES AND RANDOM FORESTS FOR CLASSIFICATION AND REGRESSION

IMBALANCED CLASSIFICATION

LABELING WITH ACTIVE LEARNING

INTRODUCTION TO THE K-NEAREST NEIGHBOUR ALGORITHM (PYTHON CODE)

SVM ALGORITHM (SUPPORT VECTOR MACHINE) FROM EXAMPLES AND CODE (PYTHON AND R)

WHAT IS AN ARTIFICIAL NEURAL NETWORK? (PYTHON CODE)

METRICS

Most used model evaluation metrics in Machine Learning
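
As a rough illustration of what such metrics measure, the sketch below computes accuracy, precision, recall, and F1 for a binary classifier in plain Python. The function name classification_metrics and the sample labels are assumptions made up for this example; in practice a library such as scikit-learn provides equivalent, better-tested functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute basic binary classification metrics from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading on imbalanced data, which is why precision and recall are usually reported alongside it.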

CROSS VALIDATION: CONCEPT AND EXAMPLE IN R

INTERPRETING MACHINE LEARNING MODELS (EN)

EVALUATION OF THE CLASSIFICATION MODEL

TUNE HYPERPARAMETERS

ROC CURVE EXPLAINED IN AN IMAGE

DATA SCIENCE

Data science is a term that escapes any single complete definition, which makes it difficult to use, especially if the goal is to use it correctly. Most articles and publications use the term loosely, with the assumption that it is universally understood. However, data science — its methods, goals, and applications — evolves with time and technology. Data science 25 years ago referred to collecting and cleaning data sets and applying statistical methods to that data. In 2018, data science has grown into a field that encompasses data analytics, predictive analytics, data mining, business intelligence, machine learning, and more.

More definitions, job roles, methods, packages, videos, and much more about Data Science below…

DEFINE DATA SCIENCE: WHAT, WHERE AND HOW IS DATA SCIENCE

DATA SCIENTIST SKILLS

COMPARISON — JOBS IN DATA SCIENCE

10 MACHINE LEARNING METHODS EVERY DATA SCIENTIST SHOULD KNOW

5 PYTHON PACKAGES A DATA SCIENTIST CAN’T LIVE WITHOUT

45 TECHNIQUES USED BY DATA SCIENTISTS

12 ALGORITHMS EVERY DATA SCIENTIST SHOULD KNOW

TOP USED DATA SCIENCE LIBRARIES FOR PYTHON, R AND SCALA

DATA SCIENCE RESOURCES: CHEAT SHEETS (R, PYTHON…)

[CHEAT SHEET] PYTHON BASICS FOR DATA SCIENCE

LEARN DATA SCIENCE WITH DATA MINING
Video Collection on Data Science (Algorithms, Graphs, Tips, Models in Production)

10 SCENARIOS OF HOW TO DELIVER A MACHINE LEARNING PROJECT

HOW TO PRICE A DATA SCIENCE, MACHINE LEARNING OR AI PROJECT

DATA VISUALIZATION

Below is a book recommendation that shows how to build better charts and more attractive dashboards. What is the point of a good data analysis if we cannot present it in a clean and clear way? Recommended reading: Storytelling with Data: A Guide to Data Visualization for Business Professionals

“Storytelling with Data is admirably well written, a masterful display of a rare art in the business world. Cole Nussbaumer Knaflic possesses a unique skill, a gift, in telling stories using data. At JPMorgan Chase, she helped improve our ability to explain complicated analyses to executive management and the regulators we work with. Cole’s book brings her talents together in an easy-to-read guide, with excellent examples anyone can learn from to spur smarter decision making.”
Mark R. Hillis, JPM Chase’s Chief Mortgage Risk Officer

Some tools you can use for Data visualization:
- Microsoft Power BI
- Tableau
- Google Data Studio

Useful links (tips from Prof. Grimaldo)

Storytelling:

Example of what not to do…

HOW TO GET STARTED IN THE AREA OF DATA SCIENCE

This is a much-discussed topic, and several questions come up, such as:

  • What course should I take to become a data scientist?
  • Are there job openings in the market?
  • How do I become more visible and compete for a data science position?
  • How do I earn BRL 20,000.00 or more per month, as I saw on television?

Here are some personal comments regarding the above questions:

  • No single course will make you a data scientist, but there are several good courses in this area that will help you on the way to your goal. The tip is to take a course and absorb as much content as possible, write down the points you need to improve (for example: programming, statistics, linear algebra…), and research and study those points on your own (if you prefer, take specific courses in those areas);
  • Yes, there are many openings in the market, both for Data Scientists and for Data Engineers. The Data Engineer lays the foundation for the Data Scientist, and it is an area that is also growing a lot. In my view, companies with large Artificial Intelligence projects should start in this area before bringing in a Data Scientist, although this varies from company to company, depending on the size of the projects and their goals;
  • If you don’t have one yet, build a portfolio (on GitHub, for example, or on a blog) covering the work you have already done. It can be university work, personal projects, and so on; always try to take a project from beginning to end (deployment). In other words, show your potential! This will make you more visible to the market;
  • Don’t be fooled: this is not the typical salary, especially here in Brazil. Of course there are large institutions that pay this amount or even more, but they are the exception, not the rule! Still, the tip is: study, and you can reach that amount!

More information at: Getting Started in Data Science.

That’s it folks, I hope you liked it and I count on everyone’s support by sending suggestions for materials that we can put here and make the material even better!

Thanks for reading!!!

Read in Portuguese…
