We were excited and honored to receive an invitation to participate in the Geo-ICT Data Science Training. Why were we invited? You may have that question in mind. We ranked first in the Data Visualization Challenge conducted by Resilience Academy.
The four-day training was very useful: it gave us knowledge and skills for making data-driven decisions, such as using Big Data in our program. For any solution to work as intended, data must play an important role in providing insights and shining a light on which solution should work best. We also shared our innovative idea, and we were open to criticism from others.
As a team, we first introduced ourselves to the teachers and other attendees. The first session, an introduction to data science, was led by Mika Murtojärvi. He gave a short history of data and of the various techniques used to process it and produce output (solutions to problems using data). He also explained how technology has evolved so that there is more data present in our society than ever before. With improved technology, it is now possible to process this data, which was not feasible when technology was less advanced.
There are specific steps to follow in any data science project:
- Project Understanding: Setting the goals of the project and gathering domain knowledge
- Data Understanding: What kind of data do we have, and is it suitable and of high enough quality? Illustrations: simple examples of data visualization.
- Data Preparation: Transforming the data into a form suitable for data analysis methods (often matrix-like). Data quality checks and quality improvement. Illustrations: outliers and missing data, dealing with categorical data.
- Modeling: Choosing the right model. Illustrations: examples of models for regression and classification.
- Evaluation: Once one or more models have been built that appear to be of high quality according to the selected loss functions, they need to be tested to ensure that they generalize to unseen data and that all key business issues have been sufficiently considered. The end result is the selection of the champion model(s). If evaluation is sloppy, one may choose a poor model and believe that it is good. Overfitting is a common problem that can be discovered by proper model evaluation.
- Deployment: Putting the model into practical use and monitoring that it still works adequately, i.e. that it generalizes well to new, unseen data.
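To make the Data Preparation and Evaluation steps above more concrete, here is a minimal sketch in plain Python. It is not code from the training: the data is synthetic, and every name in it (`impute_median`, `one_hot`, `fit_line`, `fit_nearest`, `mse`, the example columns) is a hypothetical helper made up for illustration. It fills in missing values, encodes a categorical column, and then compares a simple straight-line model with a 1-nearest-neighbour model that memorizes its training data.

```python
import random
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def one_hot(categories):
    """Turn a categorical column into 0/1 indicator columns (labels sorted)."""
    labels = sorted(set(categories))
    return [[1 if c == label else 0 for label in labels] for c in categories]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """1-nearest-neighbour model: it simply memorizes the training points."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Data preparation on two made-up columns.
ages = impute_median([23.0, None, 31.0, 27.0])
districts = one_hot(["north", "south", "north", "east"])

# Synthetic data (an assumption for this sketch): y = 2x + 1 plus noise.
rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2 * x + 1 + rng.gauss(0, 2) for x in xs]

# Hold out the last 20 points as unseen test data.
train_x, test_x = xs[:40], xs[40:]
train_y, test_y = ys[:40], ys[40:]

line = fit_line(train_x, train_y)
nearest = fit_nearest(train_x, train_y)
for name, model in [("line", line), ("1-NN", nearest)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 2),
          "test MSE:", round(mse(model, test_x, test_y), 2))
```

The 1-nearest-neighbour model scores a perfect training error because it has memorized the noise in the training set, yet its error on the held-out points is not zero. Looking only at the training error would make it appear to be the better model, which is the kind of mistake that careful evaluation on unseen data is meant to catch.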
On the second day of training, Mr. Charles Bundu introduced a new concept: data visualization. The lesson covered the types of data visualization, the intents of data visualization (which answer the question of why we visualize data), and the factors that influence the choice of a particular way of presenting data.
Characteristics of Good and Bad Data Visualization
Examples of Bad Data Visualization
Tips for Good Data Visualization
1. Choose the chart that tells the story and can be easily interpreted.
2. Remove anything that doesn’t support the story.
3. Keep chart and graph headers simple and to the point.
4. Don’t use distracting fonts or elements.
5. Don’t use more than 6 colors in a single layout.
6. Always choose the most efficient visualization.
Introduction to Data Visualization using Tableau
Tableau is one of the simplest ways of presenting data, including building different graphs, charts, histograms and maps, and it helps people see and understand data. Organizations everywhere, from non-profits to global enterprises, and across all industries and departments, are empowering their people with data. With Tableau, they are finding opportunities in their business that they have never seen before.
The session covered the following steps:
- Connect to your sample Data Source.
- Drag and drop to take a first look.
- Focus your results.
- Explore your data geographically.
- Drill down into the details.
- Build a dashboard to show your insights.
- Build a story to present.
- Share your findings.
As a result of the above steps, each participant was able to create their own account and publish their findings.