All About the Data

Gaurav Chauhan
8 min read · Aug 27, 2018


This is part of The ULTIMATE Curriculum in Data Science, which you can refer to for more topics related to Data Science.

The goal is to turn data into information, and information into insight. — Carly Fiorina

Before learning Data Science, you first need to understand the data itself: its existence, its use, and why we are thinking about it.

If you search for "data" on Google, you will get the basic answer.

basic definition of data

In a sense this definition is true for a general audience, but as you know, we are adding some science to data. So, in terms of Data Science,

Data is any information that can be processed digitally so that an individual or an organisation can use it to visualize, predict, recommend and find insights.

This definition applies purely from a Data Science point of view.

Basically data is information that has been translated into a form that is efficient for movement or processing. Relative to today’s computers and transmission media, data is information converted into binary digital form.
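To see what "converted into binary digital form" means in practice, here is a minimal sketch in Python. It assumes the text is stored as UTF-8, which is one common encoding, and the helper names `to_bits` and `from_bits` are made up for this illustration.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 and show each byte as a group of eight bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: turn space-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = to_bits("data")
print(encoded)             # the word "data" as the computer stores it
print(from_bits(encoded))  # and converted back into readable text
```

The same idea applies to numbers, images and audio: each is mapped to bits by some agreed encoding before a computer can move or process it.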

In this curriculum, you will understand the whole concept of data science as easily as reading a storybook.

Coming to the data, it is immensely important to understand him: his interests, his beliefs, advantages, disadvantages, specialities and many other factors. If you are wondering why, just think of any hero who has fought the villain with bare hands. Technically you can fight him, but it will be very difficult, and sometimes impossible, for you to win the fight. Worse, it will take more time to get good results.

So to be on the same page you should know something about a Data Scientist.

What is a data scientist — curiosity and training

The Mindset

A common personality trait of data scientists is that they are deep thinkers with intense intellectual curiosity.

Data science is all about being inquisitive: asking new questions, making new discoveries, and learning new things. Ask the data scientists most obsessed with their work what drives them in their job, and they will not say “money”. The real motivator is being able to use their creativity and ingenuity to solve hard problems and constantly indulge their curiosity. Deriving complex reads from data goes beyond just making an observation; it is about uncovering the truth that lies hidden beneath the surface. Problem solving is not a task, but an intellectually stimulating journey to a solution. Data scientists are passionate about what they do, and reap great satisfaction from taking on challenges.

Training

There is a glaring misconception out there that you need a science or math Ph.D. to become a legitimate data scientist. That view misses the point that data science is multidisciplinary. Highly focused study in academia is certainly helpful, but it doesn’t guarantee that graduates have the full set of experiences and abilities to succeed. For example, a Ph.D. statistician may still need to pick up a lot of programming skills and gain business experience to complete the trifecta.

In fact, data science is such a relatively new and rising discipline that universities have not caught up in developing comprehensive data science degree programs, meaning that no one can really claim to have “done all the schooling” to become a data scientist. Where does much of the training come from? The unyielding intellectual curiosity of data scientists pushes them to be motivated autodidacts, driven to self-learn the right skills, guided by their own determination.

And you, my friends, are autodidacts.

The main pillars of Data Science

Three pillars of Data Science

Mathematics Expertise

At the heart of mining data insight and building data products is the ability to view the data through a quantitative lens. There are textures, dimensions, and correlations in data that can be expressed mathematically. Finding solutions utilizing data becomes a brain teaser of heuristics and quantitative technique. Solutions to many business problems involve building analytic models grounded in hard math, where being able to understand the underlying mechanics of those models is key to success in building them.

Also, a misconception is that data science is all about statistics. While statistics is important, it is not the only type of math utilized. First, there are two branches of statistics: classical statistics and Bayesian statistics. When most people refer to stats they are generally referring to classical stats, but knowledge of both types is helpful. Furthermore, many inferential techniques and machine learning algorithms lean on knowledge of linear algebra. For example, a popular method to discover hidden characteristics in a data set is singular value decomposition (SVD), which is grounded in matrix math and has much less to do with classical stats. Overall, it is helpful for data scientists to have breadth and depth in their knowledge of mathematics. And if you follow my curriculum, you will definitely gain a full understanding of all of these.
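To make the SVD mention a little more concrete, here is a small sketch using NumPy's `np.linalg.svd`. The `ratings` matrix is invented toy data (rows as users, columns as items), and the helper `low_rank` is a name chosen only for this illustration, so treat it as a rough demonstration rather than a recipe.

```python
import numpy as np

# Hypothetical toy data: 4 users (rows) rating 4 items (columns).
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

# Factor the matrix into U * diag(s) * Vt; s holds the singular values,
# sorted from largest to smallest.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)


def low_rank(k: int) -> np.ndarray:
    """Rebuild the matrix using only the k strongest singular values."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


approx = low_rank(2)
error = np.linalg.norm(ratings - approx)  # Frobenius norm of what was lost

print("singular values:", np.round(s, 2))
print("rank-2 approximation:\n", np.round(approx, 2))
print("reconstruction error:", round(float(error), 2))
```

The few largest singular values usually capture most of the pattern in the data, which is why keeping only those is a common way to surface "hidden characteristics" such as groups of users with similar tastes.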

Technology and Hacking

First, let’s clarify that we are not talking about hacking as in breaking into computers. We’re referring to the tech programmer subculture meaning of hacking, i.e., creativity and ingenuity in using technical skills to build things and find clever solutions to problems.

Why is hacking ability important? Because data scientists utilize technology in order to wrangle enormous data sets and work with complex algorithms, and it requires tools far more sophisticated than Excel. Data scientists need to be able to code — prototype quick solutions, as well as integrate with complex data systems. Core languages associated with data science include SQL, Python, R, and SAS. On the periphery are Java, Scala, Julia, and others. But it is not just knowing language fundamentals. A hacker is a technical ninja, able to creatively navigate their way through technical challenges in order to make their code work.

Along these lines, a data science hacker is a solid algorithmic thinker, having the ability to break down messy problems and recompose them in ways that are solvable. This is critical because data scientists operate within a lot of algorithmic complexity. They need to have a strong mental comprehension of high-dimensional data and tricky data control flows, and full clarity on how all the pieces come together to form a cohesive solution.

Strong Business Acumen

It is important for a data scientist to be a tactical business consultant. Working so closely with data, data scientists are positioned to learn from data in ways no one else can. That creates the responsibility to translate observations to shared knowledge, and contribute to strategy on how to solve core business problems. This means a core competency of data science is using data to cogently tell a story. No data-puking — rather, present a cohesive narrative of problem and solution, using data insights as supporting pillars, that lead to guidance.

Having this business acumen is just as important as having acumen for tech and algorithms. There needs to be clear alignment between data science projects and business goals. Ultimately, the value doesn’t come from data, math, and tech itself. It comes from leveraging all of the above to build valuable capabilities and have strong business influence.

Now that you see that data is so interesting and fun, let's start learning the basic types of data that we will use in Data Science.

Types of Data

This image gives you a good understanding, but here are the definitions as well.

Structured data is data that can be easily organized. As a result, this type of data is easy to analyze.

Unstructured data refers to information that either does not have a predefined data model or is not organized in a predefined manner. Unstructured data are not easy to analyze. A primary goal of a data scientist is to extract structure from unstructured data.
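As a rough illustration of extracting structure from unstructured data, the sketch below uses Python's built-in `re` module to pull fields out of free-text notes. The notes, the field names and the helper `extract_records` are all hypothetical, made up just for this example.

```python
import re

# Hypothetical free-text notes; the wording is an assumption for illustration.
notes = [
    "Order 1042 shipped to Pune on 2018-08-21",
    "Order 1043 shipped to Delhi on 2018-08-23",
    "Customer called about a late delivery",
]

# Named groups describe the structure we hope to find inside the text.
pattern = re.compile(
    r"Order (?P<order_id>\d+) shipped to (?P<city>\w+) "
    r"on (?P<date>\d{4}-\d{2}-\d{2})"
)


def extract_records(lines):
    """Return one dict per line that matches the pattern; skip the rest."""
    records = []
    for line in lines:
        match = pattern.search(line)
        if match:
            record = match.groupdict()
            record["order_id"] = int(record["order_id"])  # text -> number
            records.append(record)
    return records


records = extract_records(notes)
for record in records:
    print(record)
```

Once the text has been turned into records with fixed fields, it behaves like structured data and can be loaded into a table and analyzed. Real-world text is far messier, so a single pattern like this is only a starting point.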

Internal sources of data reflect those data that are under the control of the business. These data are housed in financial reporting systems, operational systems, HR systems and CRM systems, to name a few. Business leaders have a large say in the quality of internal data; they are essentially a byproduct of the processes and systems the leaders use to run the business and generate/store the data.

External sources of data, on the other hand, are any data generated outside the walls of the business. These data sources include social media, online communities, open data sources and more. Because they originate outside the business, external sources of data are under less control by the business than internal sources are. These data are collected by other companies, each using its own unique systems and processes.

For any company that wishes to enhance its business, data is the secret sauce. Most of you became interested in Data Science through one story or another claiming that Data Science is the future. I agree with that in some respects, but if you have no real interest in it, my friend, then no matter how many times you try, you cannot succeed. And even if you get a job, it is very important to keep updating yourself in this new field, as no one knows exactly what the future of Data Science holds in the coming years.

So go on, pack your bags and ready your sword, because for the next few weeks you are going on a quest to find out all about the data.

And if you want to learn and become a Data Scientist, take this curriculum.

For the latest updates and tips, or if you want anything or have an issue, just post in the comments.

Till then….

Happy coding :)

And don’t forget to clap clap clap…
