“It is a capital mistake to theorize before one has data.” — Sherlock Holmes

Data, you read that right. DATA!!!

None of us in these times can ever get over data, can we?

Data is the biggest flex of our generation. To put things into perspective, consider the grains of sand on this earth: as a matter of fact, we know we could never count them all. Here’s an interesting fact, though:

There are approximately 400,000 bytes of data for every grain of sand on earth.

We need data, vast volumes of it, on a daily basis. Make no mistake: working with data at this scale goes well beyond what we normally handle. It's one heck of a job to analyze and organize all this data so that it becomes useful and predictive. One study suggests that less than 0.5% of data is ever analyzed; the rest stays unorganized and wasted.

Now, here’s another interesting fact to keep you curious as we dive deep into the topic:

If you burned all of the data created in just one day onto DVDs, you could stack them on top of each other and reach the moon — twice.

That’s a mammoth volume of data, isn't it?

Let’s talk about Big Data:

According to the book, Big Data is a collection of data that is huge in volume and growing exponentially with time. Its size and complexity are so large that traditional data management tools cannot store or process it efficiently.

But, umm, it's so much more than this. Before studying that, let's look at some fascinating data-related fields first:

  • Data Analytics
  • Data Analysis
  • Data Structures
  • Data Algorithms
  • Data Sciences

And each of these branches out into many sub-fields on the tech side.

Here’s a bit about data structures and the types of algorithms:

Popular types of Data Structures:

Here is a list of the types of Algorithms:

  • Brute Force algorithm
  • Greedy algorithm
  • Recursive algorithm
  • Backtracking algorithm
  • Divide & Conquer algorithm
  • Dynamic programming algorithm
  • Randomized algorithm
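To make a few of these paradigms concrete, here is a minimal Python sketch. It is illustrative only: the function names (`fib_recursive`, `fib_dp`, `greedy_coin_change`, `merge_sort`) and the coin set are my own assumptions, not taken from any library. It contrasts plain recursion with dynamic programming on Fibonacci numbers, makes a greedy choice for coin change, and sorts with divide & conquer.

```python
from functools import lru_cache


def fib_recursive(n: int) -> int:
    """Plain recursion: recomputes the same subproblems, so time grows exponentially."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


@lru_cache(maxsize=None)
def fib_dp(n: int) -> int:
    """Dynamic programming (top-down memoization): each subproblem is solved once."""
    if n < 2:
        return n
    return fib_dp(n - 1) + fib_dp(n - 2)


def greedy_coin_change(amount: int, coins=(25, 10, 5, 1)) -> list:
    """Greedy: always take the largest coin that still fits.
    Optimal for this coin set, but not for every coin set."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    return result


def merge_sort(items: list) -> list:
    """Divide & conquer: split in half, sort each half, then merge the sorted halves."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(fib_recursive(10), fib_dp(50))   # 55 12586269025
print(greedy_coin_change(63))          # [25, 25, 10, 1, 1, 1]
print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The greedy strategy happens to be optimal for this coin set. With coins (4, 3, 1) and an amount of 6, it returns 4 + 1 + 1 while 3 + 3 uses fewer coins, which is exactly the kind of case where dynamic programming takes over.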

Machine Learning

Machine learning algorithms are commonly classified into four types:

  • Supervised Learning
  • Unsupervised Learning
  • Semi-supervised Learning
  • Reinforcement Learning

Below is the list of commonly used Machine Learning Algorithms:

  • Linear regression
  • Logistic regression
  • Decision tree
  • SVM algorithm
  • Naive Bayes algorithm
  • KNN algorithm
  • K-means
  • Random forest algorithm
  • Dimensionality reduction algorithms
  • Gradient boosting algorithm and AdaBoost algorithm
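Here is a rough, dependency-free sketch of two algorithms from the list, picked because they sit on either side of the supervised/unsupervised split: KNN learns from labelled examples, while k-means finds groups without any labels. The toy points, the "small"/"large" labels, and the fixed starting centroids are made up for illustration. Real projects would more likely use a library such as scikit-learn.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Supervised learning: classify `query` by majority vote of its k nearest labelled points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_means(points, centroids, iterations=10):
    """Unsupervised learning: group unlabelled points around k centroids.
    Starting centroids are passed in (not random) so the result is repeatable."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            closest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[closest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Toy, made-up data
labelled = [((1, 1), "small"), ((1, 2), "small"), ((2, 1), "small"),
            ((8, 8), "large"), ((8, 9), "large"), ((9, 8), "large")]
print(knn_predict(labelled, (2, 2)))  # small

points = [p for p, _ in labelled]  # same points, labels thrown away
centers, groups = k_means(points, centroids=[(0, 0), (10, 10)])
print(groups)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
```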

Here are the math topics you need to be good at data science:

  • Descriptive Statistics
  • Hypothesis Testing
  • Regression Analysis
  • Probability Distributions
  • Conditional Probability
  • Sampling and Central Limit Theorem
  • Bayes Theorem
  • Vectors and Matrix Properties
  • Matrix Transpose and Inverse
  • Determinants
  • Dot Product
  • Eigenvalues and Eigenvectors
  • Matrix Factorization
  • Principal Component Analysis
  • Orthogonality
  • Differential and Integral Calculus
  • Limit, Continuity, and Partial derivatives
  • Step, Sigmoid, Logit, and ReLU Function
  • Maxima and Minima of a Function
  • Product and Chain Rule
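Several of these topics fit in a handful of lines of Python. This sketch uses only the standard `statistics` and `math` modules; the sample values and the test probabilities in the Bayes example are hypothetical numbers chosen for easy arithmetic, and the helper names (`bayes`, `dot`, `numerical_derivative`) are my own.

```python
import math
import statistics

# Descriptive statistics on a small, made-up sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
# mean 5, median 4.5, population standard deviation 2.0
print(statistics.mean(sample), statistics.median(sample), statistics.pstdev(sample))


def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B),
    with P(B) computed from the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate
print(round(bayes(0.99, 0.01, 0.05), 3))  # 0.167


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


print(dot([1, 2, 3], [4, 5, 6]))  # 32

# The chain rule gives sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); check it numerically
x = 0.5
print(math.isclose(numerical_derivative(sigmoid, x), sigmoid(x) * (1 - sigmoid(x)), abs_tol=1e-6))  # True
```

The Bayes line is a classic surprise: even with a 99% sensitive test, a positive result here means only about a 1 in 6 chance of actually having the condition, because the condition is rare.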

Alright, don’t be overwhelmed by all those topics. They are necessary and interesting parts of a data tech stack.

Apart from these, it helps to have some knowledge of computer networking (the seven layers of the OSI model and the TCP/IP model) and of various operating systems.

Now, of course, you need to be fluent in some programming languages to get a firm grip on data.

Here are a few languages that are popular for data work these days:

  • Python
  • JavaScript
  • Scala
  • R
  • SQL
  • Julia

Coming back to our main topic, let's dig deep into data analytics now:

Analysis is a detailed examination of something in order to understand its nature or determine its essential features. Data analysis is the process of compiling, processing, and analyzing data so that you can use it to make decisions.

Analytics is the systematic analysis of data. Data analytics is the specific analytical process being applied.

There are five main types of data analytics: prescriptive, predictive, diagnostic, descriptive, and cognitive analytics.
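As a rough illustration of how the first few types differ, here is a toy Python sketch on made-up monthly sales figures. Descriptive analytics summarizes what happened, a greatly simplified diagnostic step looks for where things went wrong, and predictive analytics fits a least-squares trend line to forecast the next month. Prescriptive and cognitive analytics need far more context and are not shown; the `linear_trend` helper is my own.

```python
import statistics

# Made-up monthly sales, month 1 through month 6
sales = [100, 120, 110, 130, 140, 150]

# Descriptive: what happened?
total = sum(sales)
average = statistics.mean(sales)  # 125 for this data

# Diagnostic (greatly simplified): in which months did sales drop?
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
drop_months = [i + 2 for i, change in enumerate(changes) if change < 0]  # 1-based month numbers


# Predictive: least-squares trend line y = intercept + slope * x, extrapolated one month ahead
def linear_trend(ys):
    xs = list(range(len(ys)))
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return y_mean - slope * x_mean, slope


intercept, slope = linear_trend(sales)
forecast = intercept + slope * len(sales)  # x = 6 corresponds to month 7

print(total, drop_months, round(forecast, 1))  # 750 [3] 158.0
```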

Big data is often described in terms of five Vs: volume, value, variety, velocity, and veracity.

  • When businesses have more data than they are able to process and analyze, they have a volume problem.
  • When businesses need rapid insights from the data they are collecting, but the systems in place simply cannot meet the need, there’s a velocity problem.
  • When your business becomes overwhelmed by the sheer number of data sources to analyze and you cannot find systems to perform the analytics, you know you have a variety problem.
  • When massive volumes of data support only a few golden insights, you may be missing the value of your data.
  • When your data is ungoverned, comes from numerous dissimilar systems, and cannot be curated in meaningful ways, you know you have a veracity problem.

There are three broad classifications of data source types:

  • Structured data is organized and stored in the form of values that are grouped into rows and columns of a table.
  • Semistructured data is often stored in a series of key-value pairs that are grouped into elements within a file.
  • Unstructured data is not structured in a consistent way. Some of it may have structure similar to semi-structured data, while some may contain only metadata.
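Here is a small Python sketch of the three kinds of source, using only the standard library. The CSV, JSON, and free-text snippets are invented examples.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (CSV)
structured = "id,name,amount\n1,Alice,30\n2,Bob,45\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total_amount = sum(int(row["amount"]) for row in rows)

# Semi-structured: key-value pairs whose fields can vary from record to record (JSON)
semi_structured = '[{"id": 1, "name": "Alice", "tags": ["vip"]}, {"id": 2, "name": "Bob"}]'
records = json.loads(semi_structured)
tagged = [rec["name"] for rec in records if "tags" in rec]

# Unstructured: free text with no schema; any structure has to be extracted by you
unstructured = "Alice called about a late delivery. Bob asked for a refund."
word_count = len(unstructured.split())

print(total_amount, tagged, word_count)  # 75 ['Alice'] 11
```

Notice how the effort shifts: the CSV can be summed directly, the JSON needs a check for optional keys, and the free text only yields something as crude as a word count without further processing.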

As for types of databases, we have:

Online transaction processing (OLTP) databases, often called operational databases, logically organize data into tables with the primary focus being on the speed of data entry. These databases are characterized by a large number of insert, update, and delete operations.

Online analytical processing (OLAP) databases, often called data warehouses, logically organize data into tables with the primary focus being the speed of data retrieval through queries. These databases are characterized by a relatively low number of write operations and few, if any, update and delete operations.
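To see the two workload shapes side by side, here is a sketch using Python's built-in `sqlite3` module with a hypothetical `orders` table. SQLite itself is neither an OLTP server nor a data warehouse; it simply stands in here so that the contrast between many small transactional writes and one read-heavy aggregate query is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style: many small inserts, updates, and deletes, each committed as its own transaction
for region, amount in [("north", 120.0), ("south", 80.0), ("north", 50.0), ("south", 40.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (60.0, 3))
with conn:
    conn.execute("DELETE FROM orders WHERE id = ?", (2,))

# OLAP-style: a read-only query that scans and aggregates many rows
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('north', 2, 180.0), ('south', 1, 40.0)]
conn.close()
```

In a real setup, the transactional writes would typically land in an operational database and be loaded into a warehouse in batches, where queries like the aggregate above run over far more rows.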

Now, here are a few of the jobs you can pursue once you have a working knowledge of data analytics:

It's important to observe that data, and everything related to it, moves in and around its own universe. Before aiming for a high-profile, well-paying job, take the time to understand how these topics relate to one another and how solid your grasp of them really is.

“Without big data, you are blind and deaf and in the middle of a freeway.” — Geoffrey Moore

With this thought, I’ll leave you with a ton of curiosity about data.

Until next time, keep learning, and keep improving.



