My Journey Into Data Science
NASA’s rovers Curiosity and Opportunity landed on Mars, one of the best-known planets in our solar system. I was always curious to see their images, movements and landing techniques, but at first that curiosity was only about the adventure and magic of modern science. I was a Computer Science student trying to relate to modern technologies, and I kept wondering about the role of rovers in modern Information Technology. I kept asking questions such as what they would send back to NASA and how they fit into modern computer science. I visited NASA’s website and learnt that the rovers would send data about geological conditions, such as a wide range of rocks and soil, water and microbiological life.
The data from NASA’s Curiosity rover made me curious, and the vision of the Opportunity rover led me to seek my own opportunity in Data Science.
Data Pipeline made me Excited
I started working with different data sets using Python, Pandas, NumPy, etc. I applied techniques such as data cleaning, reshaping, aggregation, synchronization, querying data from different sources, handling structured and unstructured data, pushing data to different data sources, and visualization.
In one meeting I learnt about Data Pipelines from a highly skilled team member. The concept of the modern Data Pipeline made me curious, and I found it very interesting. My team members gave me the opportunity to learn, design and implement one as an individual contributor. I worked hard to pick up the following skills in a short time:
- Architecture and Concept of Data Pipeline
- Importance and real-time usages of Data Pipeline
- Job automation tools like Apache Airflow, Domino Data Lab, etc.
- AWS Redshift and S3 Bucket.
- REST API consuming mechanisms.
- Bash Scripts.
- Integration with several components and Data sources in the Data Pipeline.
- Importance of memory when processing jobs and handling large data.
- How a Data Pipeline provides data to ML models.
Now, I am an expert in designing the architecture of an efficient Data Pipeline.
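To illustrate the point about memory when handling large data, here is a minimal sketch of processing a CSV in chunks with Pandas so that only one chunk is held in memory at a time. The function name `sum_by_key`, the column names and the sample data are my own assumptions for illustration; they do not come from any particular project.

```python
import io

import pandas as pd


def sum_by_key(source, key, value, chunksize=2):
    """Sum `value` per `key`, reading the CSV in fixed-size chunks."""
    totals = {}
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        partial = chunk.groupby(key)[value].sum()
        # Fold each chunk's partial sums into the running totals.
        for k, v in partial.items():
            totals[k] = totals.get(k, 0) + int(v)
    return totals


# Hypothetical in-memory CSV standing in for a large file on disk or in S3.
sample = io.StringIO("sensor,reading\na,1\nb,2\na,3\nb,4\na,5\n")
result = sum_by_key(sample, "sensor", "reading")
print(result)
```

In a real job the chunk size would be much larger and tuned to the memory available to the job agent. The idea stays the same: aggregate piece by piece instead of loading everything at once.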
Efficient Python Script saved me from Sleepless Nights
I learnt to write efficient Python scripts using libraries like Pandas, NumPy, Flask, etc. I wrote several scripts for the Data Science project and integrated them with automated Job Agents. Those Job Agents in the Pipeline execute efficiently and perform their operations as per requirements.
I learnt and implemented the following Data Science techniques in scripts:
- Data Cleaning and De-Duplication
- Data Aggregation, Reshaping, Concatenation and Merging
- Normalizing Data
- Data Encoding
- Handling CSV, Excel and other data sources
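The techniques listed above can be sketched with Pandas roughly as follows. This is only an illustration under my own assumptions: the tables, the column names and the choice of min-max scaling and one-hot encoding are hypothetical examples, not the actual project scripts.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
orders = pd.DataFrame({
    "customer": [" alice", "Bob", "Bob", "carol ", None],
    "region": ["east", "west", "west", "east", "west"],
    "amount": [10.0, 20.0, 20.0, 40.0, 5.0],
})
regions = pd.DataFrame({"region": ["east", "west"], "manager": ["Dana", "Eli"]})


def clean(df):
    """Drop rows without a customer, tidy the names, remove exact duplicates."""
    out = df.dropna(subset=["customer"]).copy()
    out["customer"] = out["customer"].str.strip().str.title()
    return out.drop_duplicates().reset_index(drop=True)


# Data cleaning and de-duplication
cleaned = clean(orders)

# Aggregation: total amount per region
totals = cleaned.groupby("region", as_index=False)["amount"].sum()

# Merging: attach the region manager to each order
merged = cleaned.merge(regions, on="region", how="left")

# Normalizing: min-max scaling of the amount into the range [0, 1]
low, high = merged["amount"].min(), merged["amount"].max()
merged["amount_scaled"] = (merged["amount"] - low) / (high - low)

# Encoding: one-hot encode the categorical region column
encoded = pd.get_dummies(merged, columns=["region"])

print(encoded)
```

The same steps apply to data read with `pd.read_csv` or `pd.read_excel` instead of the inline tables used here.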
Machine Learning Models showed me the importance of Data Cleaning
Data cleaning is an important step in Data Science for Machine Learning because real-world data can be either structured or unstructured. A Machine Learning model expects highly accurate training datasets; otherwise you can’t expect good scores and accuracy.
I learnt to label data by adding meaningful tags, labels or classes to the observations or rows. These tags can come from the observations themselves or from asking people or specialists about the data.
Data cleaning is very important for businesses too, because their data generally holds a lot of important information about the business, employees, customers or clients, etc. So, we must ensure that the personal information held by the business is kept safe and organized.
Learnt the Importance of Analytics and Visualization
Analytics is closely related to Data Science and depends on data. It is used to analyze raw data and draw conclusions from it. So, I learnt how to automate the processing of raw data so that the business or an expert can apply analytics and visualize the data.
I came to know the responsibilities of a Data Analyst:
- Interpreting and analyzing data and applying statistical techniques.
- Maintaining statistical efficiency and quality.
- Capturing data from primary and secondary data sources and maintaining databases.
Conclusion
Data Science is very important and is successfully adding value to all kinds of business models by applying techniques like statistics, machine learning, deep learning, etc. I have started my journey in Data Science and am moving ahead, achieving many things with hard work and dedication.