The Importance of Data in AI

Nathan Ciantar
4 min read · Jan 29, 2024


Completing an online course on AI is achievable, yet it won’t give you the experience of the challenges an AI professional faces day to day when creating solutions.

Many developers underestimate such solutions, and with the help of Generative AI tools many assume that they can be achieved easily. Previously, an AI professional had to dive into the mathematics behind each algorithm to assess how each model works and where it is best suited. Nowadays, with libraries such as scikit-learn in Python, these models and mathematical functions come out of the box, and without prior knowledge of the mathematics behind them one can easily implement an AI model by leveraging input from Generative Models.

Having knowledge of how and where a Machine Learning model is best used will have a great impact on the outcome of the solution to your problem.

Another key element when creating such models is knowing how to acquire data, and what data to acquire, for the specific use case at hand. Many courses I have encountered skip the data acquisition part, which is a very important part of creating an AI model. As one of my lecturers said during a lecture, ‘90% of the work is probably finding and processing the data’, and I later confirmed he was right (through hours of trying to find data for the specific case I needed).

Data Acquisition

Data is the fuel of an AI model and essentially the key to the era of data-driven decisions we are currently in, and it can be acquired from various sources. Essentially everything around us can be considered data. If you think about it, imagine how much data a human being produces simply by living: for example, the number of calories he/she burns, how much energy is consumed to perform a single movement, or even the amount of gas used on a trip to the grocery store and back.

Another source where a lot of data can be found is the internet. Every click or transaction done on the internet produces some kind of data, be it buying something from Amazon or simply visiting a website to find a recipe for dinner.

Even though this provides a rich source of data, obtaining it is still one of the biggest challenges. Before obtaining such data, the user must identify the relevant features which would make sense to use when taking a decision. Once the features have been selected and the process of obtaining the data starts, the user must take into account the ethical concerns the data source might raise, for example the bias which may be found in the data, as this will have a great impact on the classifications or predictions the model will make, especially when such models are used publicly.
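One simple way to start looking for this kind of bias is to compare how the labels are distributed across groups in the data before any model is trained. The snippet below is only a minimal sketch: the records, the `region` and `approved` fields, and the `label_shares` helper are all hypothetical and invented for illustration, and a real bias audit would need far more than this.

```python
from collections import Counter


def label_shares(records, group_key, label_key):
    """For each group, return the share of each label among that group's rows."""
    counts = {}
    for row in records:
        counts.setdefault(row[group_key], Counter())[row[label_key]] += 1
    return {
        group: {label: n / sum(c.values()) for label, n in c.items()}
        for group, c in counts.items()
    }


# Hypothetical loan-application records, invented purely for illustration.
records = [
    {"region": "north", "approved": "yes"},
    {"region": "north", "approved": "yes"},
    {"region": "north", "approved": "yes"},
    {"region": "north", "approved": "no"},
    {"region": "south", "approved": "yes"},
    {"region": "south", "approved": "no"},
    {"region": "south", "approved": "no"},
    {"region": "south", "approved": "no"},
]

shares = label_shares(records, "region", "approved")
for region, share in sorted(shares.items()):
    print(region, share)
```

If one group is approved far more often than another in the raw data, a model trained on it is likely to reproduce that gap, so it is worth asking where the difference comes from before going any further.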

Data Processing

Processing data to extract useful information is another key skill which is essential to an AI professional. This isn’t an easy task to learn, as the data which needs to be processed will vary from one solution to another, meaning that the kind of processing required will change as well.

This process involves combining data from different sources with varying formats, structures and semantics, which can be complex, as well as data cleaning, transformation and normalization to ensure compatibility and consistency across the dataset.
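To make these steps a little more concrete, here is a small sketch using only the Python standard library. Everything in it is an assumption made for illustration: the two sources, their field names (`weight_kg`, `weight_lb`, `date`, `day`) and the helper functions are hypothetical. It combines two sources with different date formats and units, drops a row with a missing value, and applies min-max normalization.

```python
from datetime import datetime

# Two hypothetical sources describing the same measurement in different shapes.
source_a = [
    {"date": "2024-01-05", "weight_kg": "70.5"},
    {"date": "2024-01-06", "weight_kg": ""},  # missing value, removed during cleaning
]
source_b = [
    {"day": "07/01/2024", "weight_lb": 154.0},
    {"day": "08/01/2024", "weight_lb": 156.2},
]

LB_TO_KG = 0.45359237  # pounds to kilograms


def from_a(row):
    """Cleaning and transformation for source A: skip empty values, parse text."""
    if not row["weight_kg"]:
        return None
    return {
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        "weight_kg": float(row["weight_kg"]),
    }


def from_b(row):
    """Transformation for source B: day/month/year dates, pounds to kilograms."""
    return {
        "date": datetime.strptime(row["day"], "%d/%m/%Y").date(),
        "weight_kg": row["weight_lb"] * LB_TO_KG,
    }


def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Combine both sources into one consistent structure.
combined = [r for r in map(from_a, source_a) if r is not None]
combined += [from_b(r) for r in source_b]

normalized = min_max_normalize([r["weight_kg"] for r in combined])
for row, value in zip(combined, normalized):
    print(row["date"], round(row["weight_kg"], 2), round(value, 3))
```

Even in this tiny case, each source needs its own handling, which is why the kind of processing changes from one solution to another.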

Conclusion

Data Manipulation, Data Learning, Knowledge Discovery and Creating Visualizations are not only for data scientists; they are essential tools an AI professional should master.

An AI model learns in a way very similar to how we humans learn. It uses all the data it has seen throughout its learning phase and tries to predict the next value or classify based on the input. If the training data is incorrect, the AI model will predict incorrect values: basically Garbage In, Garbage Out. Therefore, having the correct data is important when creating such tools, especially when they are used in a public environment, which raises ethical implications.
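The Garbage In, Garbage Out effect can be shown with a toy example. The sketch below uses a hand-written nearest-neighbor classifier, chosen here only because it is easy to read and not because any particular model is implied, and a made-up task where small numbers are "low" and large numbers are "high". The same classifier is trained once on correct labels and once on data where two labels were recorded wrongly.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of x as the label of the closest training point."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)


# Hypothetical toy task: values below 5 are "low", the rest are "high".
clean_train = [(1, "low"), (2, "low"), (3, "low"),
               (7, "high"), (8, "high"), (9, "high")]

# The same points, but the labels for 3 and 7 were recorded wrongly.
garbage_train = [(1, "low"), (2, "low"), (3, "high"),
                 (7, "low"), (8, "high"), (9, "high")]

test = [(0, "low"), (4, "low"), (6, "high"), (10, "high")]

print("clean training data:  ", accuracy(clean_train, test))
print("garbage training data:", accuracy(garbage_train, test))
```

The model and the test set are identical in both runs; only the quality of the training data differs, and the accuracy drops with it.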

In the age of artificial intelligence, data reigns supreme. Its abundance, quality, and diversity underpin the advancements and innovations that are shaping our digital world. By recognizing the importance of data and embracing best practices for its collection, management, and utilization, we can harness the full potential of AI to drive progress, empower individuals, and address some of society’s most pressing challenges.
