How I developed my Data Science skills during the corona lockdowns
Stuck in lockdown and fighting the emerging boredom by learning new skills? Yes, I did it. Here is what I created during the lockdown(s).
2020 was a difficult year. Like most of us, I never thought I would witness a worldwide pandemic. Then came lockdown: working from home with reduced hours and homeschooling. But how to fight the emerging boredom?
The pandemic produced a lot of data that is freely available, and I decided to build up my Data Science knowledge. I completed a few small projects to learn about new tools and technologies.
This article is a brief overview of the things I have personally tried and learned. It’s not about Corona but about how to learn new things during a pandemic.
I’ve built a Twitter bot
My first project was a Twitter bot which regularly tweets current figures and a chart. I intended to learn how to set up a Twitter bot, how to collect data from openly available sources, and how to visualize some key facts about the disease.
More details on how I proceeded are given in a Medium article I wrote.
What I’ve learned
I’ve learned three things:
- How to create a Twitter bot which tweets regularly
- How to collect data and clean it for later use
- How to visualize my data insights
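The core of such a bot is small: fetch the latest figures, compose a short status text, and post it. Here is a minimal sketch of the composing step with made-up figures and a hypothetical function name (my actual bot code differs); the posting itself is a Tweepy call, hinted at in the comments.

```python
def compose_tweet(date: str, total: int, new: int) -> str:
    """Build a tweet text from the day's figures, keeping it under 280 chars."""
    text = (
        f"COVID-19 update for {date}:\n"
        f"Total confirmed cases: {total:,}\n"
        f"New cases since yesterday: {new:+,}\n"
        "#COVID19 #dataviz"
    )
    assert len(text) <= 280, "tweet too long"
    return text

# Made-up numbers, just for illustration:
tweet = compose_tweet("2020-11-01", 532_000, 12_400)

# Posting would then be a Tweepy call (credentials omitted), e.g.:
# api = tweepy.API(auth)
# api.update_status(status=tweet)
```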
Plotly Dashboard
While fiddling around with Pandas and Matplotlib for my Twitter bot, I stumbled upon Plotly for interactive visualizations. And finally, I found Dash from Plotly. Dash allows building browser-based dashboards from Plotly graphs. You don’t have to fiddle with HTML, JavaScript or CSS: Dash’s approach is declarative, and you describe your layout entirely in Python code. I took a deep dive into Dash, and it resulted in an article for a German developer magazine called „Entwickler“.
What I’ve learned
I learned a lot about Plotly and Dash. For my article, I collected data from two sources and combined them into one dashboard. Not least, I published an article in a print magazine! And finally, I put this knowledge to use for a project at the company I work for.
Sentiment Analysis of tweets
NLP — Natural Language Processing. I was curious about this topic, so I decided to run a sentiment analysis on tweets about the German corona app. The app was, and still is, controversially discussed, and I wanted to find out how the sentiment would change over time. I collected tweets at launch, after one week, and after two weeks. Finally, I visualized my findings with Tableau.
What I’ve learned
I learned how to collect large numbers of tweets. Furthermore, I figured out how to clean natural language for analysis: removing stop words, links, and entities like hashtags, and finally lemmatizing the tweets so a sentiment analysis can be run on them.
Oh, and I got into Tableau for the visualization. Which leads me to the next learning.
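As an illustration, the cleaning steps can be sketched with nothing but the standard library. This is a simplified version of such a pipeline: the stop word list is just a stub, and the lemmatization step (e.g. with spaCy) is omitted here.

```python
import re

# A tiny stub; a real pipeline would use a full stop word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of"}

def clean_tweet(text: str) -> list[str]:
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[@#]\w+", " ", text)       # remove entities (hashtags, mentions)
    tokens = re.findall(r"[a-zäöüß]+", text)   # keep word tokens
    return [t for t in tokens if t not in STOP_WORDS]

print(clean_tweet("The #CoronaWarnApp is useless! https://t.co/xyz @rki_de"))
# -> ['useless']
```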
Tableau Visualization
After tinkering with Tableau for my sentiment analysis, I decided to try some more analysis and visualization with it. This time I collected climate data for the region of Germany I live in. Finally, I used the visuals for a Twitter thread explaining my findings.
Another Tableau project was a chart visualizing the folly of reopening schools in Germany after the first lockdown, even though infection rates were higher than before the lockdown.
What I’ve learned
Finally, I got into Tableau. This feature-rich monster is easier to learn than I feared. The public version is free to use, so you can learn an enterprise visualization tool at no cost. And I learned that climate change is already visible. Scary!
Collecting data from PDF files
The county where I live publishes its corona figures as a PDF file. Easy to read, but difficult to analyze. Furthermore, they generate the PDF only once a week. My plan was to collect the data over time to run a time series analysis. But how do you extract tabular data from a PDF? I found tabula-py, whose read_pdf function returns the tables in a PDF as Pandas DataFrames. I tinkered with a Jupyter notebook and could read the data. Unfortunately, the PDF’s format changed from week to week, and I had to abandon my plan of collecting the data regularly.
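The extraction step itself, assuming tabula-py and a Java runtime are installed (the file and column names below are made up, not the county’s actual report), looks roughly like this — followed by the usual Pandas cleanup, which is simulated here on a hand-written frame:

```python
# The tabula-py call itself (commented out, as it needs Java and a real PDF):
#
#   import tabula
#   tables = tabula.read_pdf("corona_report.pdf", pages="all")
#
# read_pdf returns a list of DataFrames, one per detected table. Those
# still need cleaning, e.g. parsing German dates and number formats:
import pandas as pd

raw = pd.DataFrame({
    "Datum": ["01.11.2020", "08.11.2020"],
    "Fälle gesamt": ["1.234", "1.502"],  # German thousands separator
})

cleaned = raw.assign(
    Datum=pd.to_datetime(raw["Datum"], format="%d.%m.%Y"),
    total=raw["Fälle gesamt"].str.replace(".", "", regex=False).astype(int),
).drop(columns=["Fälle gesamt"]).set_index("Datum")
```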
What I’ve learned
I once again learned how powerful Pandas is, in particular when combined with external readers like tabula-py’s read_pdf. And I learned that some projects just don’t work out.
Using complex Excel sheets with Pandas
Talking about readers for Pandas: I took a deep dive into Pandas’ Excel reader. My goal was to get the data out of a complex Excel sheet and into a DataFrame, and I found that Pandas indeed has superpowers when reading Excel sheets. My article shows them in detail.
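A small round-trip sketch of what “complex” can mean here: the table does not start at cell A1, which read_excel’s skiprows and usecols parameters handle. The file and sheet names are hypothetical, and the openpyxl engine is assumed to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

path = Path(tempfile.gettempdir()) / "report.xlsx"

# Build a workbook whose table starts at row 3, column B.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"Datum": ["2020-11-01"], "Fälle": [1234]}).to_excel(
        writer, sheet_name="Daten", startrow=2, startcol=1, index=False
    )

df = pd.read_excel(
    path,
    sheet_name="Daten",
    skiprows=2,     # skip the empty rows above the header
    usecols="B:C",  # the table does not start in column A
)
```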
What I’ve learned
Pandas is great. Nothing less, nothing more.
Providing data and visualizations for free
Last but not least, I once again found some sources providing corona data for my county and started collecting it on a daily basis. Using the data, I created a visualization with Datawrapper, an awesome web-based tool for data visualization. The best part: it has a free tier. When I published my insights on Twitter, people asked if I would share the collected data. A short search for a good home brought me to Qri, a service like GitHub, just for data. Awesome! I have summarized this in an article.
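The daily collection itself needs little more than the standard library: a small script, run once a day (e.g. via cron), that appends the new figures to a CSV and skips days it has already seen. A sketch with made-up file and field names:

```python
import csv
import tempfile
from pathlib import Path

def record_daily(path: Path, day: str, cases: int) -> bool:
    """Append one row per day; return False if the day was already recorded."""
    if path.exists():
        with path.open(newline="") as f:
            if any(row and row[0] == day for row in csv.reader(f)):
                return False
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["date", "cases"])  # header on first run
        writer.writerow([day, cases])
    return True

demo = Path(tempfile.gettempdir()) / "county_corona.csv"
demo.unlink(missing_ok=True)
first = record_daily(demo, "2021-01-05", 321)  # appended
again = record_daily(demo, "2021-01-05", 321)  # duplicate, skipped
```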
What I’ve learned
Again, I learned how to collect data regularly. And I learned how to use existing tools and services for a specific task: Datawrapper is awesome and widely used by newspapers and online magazines, and with Qri everybody can share datasets in an open-source manner. You don’t need to implement everything on your own; take a look at the tools that already exist and use them.
(My) Conclusion
2020 and the beginning of 2021 were wild. But it’s relatively easy to use the circumstances for your personal growth. COVID-19 generated tons of free data to play with, and of course there are other data sources available, too. Just take the data, be curious, and set yourself challenges. Make the most of it, train yourself, try new tools, and maybe step out of your comfort zone. Nothing is so bad that it isn’t good for something! I learned many great things, and it also distracted me a bit from the complexity of the situation we are still in. Personally, I picked up many new tools and technologies, practiced a lot, and sharpened my Data Science skills.