Turning a New Page: Learning Python and Data Science in Twelve Weeks

Giuditta Parolini
Published in Learn IT, Girl
8 min read · Dec 11, 2019
Photo by Daniel Schludi on Unsplash

Now that I think about it, I made an implicit commitment to lifelong learning quite early in my life. When I was a teenager, I spent several hours a day translating Latin and Greek texts into my mother tongue, Italian, but when it was time to go to university, I decided that studying physics was more challenging than becoming a classical philologist. Five years later, with my hard-earned degree in hand, I felt I needed tools to explain to non-experts what science is about and what scientists do, and I trained as a science writer. I sincerely enjoyed writing about science for non-expert readers, adults and children alike, but after a while I realised that, if I wanted to be a really convincing writer, I needed to understand better how science develops and to place it in a historical perspective. For this reason, I returned to academia to study the history of science and I have been an academic historian for a few years now. It has been a stimulating intellectual adventure, but I now feel that I am missing something and that I need to turn a new page.

As a historian, I try to understand the past. I do so by visiting archives and libraries to study primary sources, such as scientists’ personal papers and research notebooks. I assess the value and reliability of these sources, collect oral histories when possible, and read scientific publications that give me an idea of how technical concepts developed. But even if I am studying the past, I am still living in the present, and I cannot overlook the increasingly relevant role that digital data and technologies are acquiring in it.

Photo by Andrew Neel on Unsplash

More and more, coding and data literacy are becoming essential skills for everyone, no different from reading, writing, and arithmetic. Even ‘historical data’ are increasingly digital, as libraries, research institutions, and even private companies are promoting large-scale digitisation projects of archival collections and old scientific publications. These projects are making available corpora of digitised documents that offer unprecedented opportunities for historical research. But mining a large data set requires adding new skills to the historian’s toolbag, because close reading, i.e. reading page after page of a manuscript or a published article, is not feasible when you have to examine thousands of records. You need distant reading, which in turn requires knowledge of coding and quantitative data analysis.

I am grateful to the IT community (to meetups and study groups in Berlin and to other volunteer-based initiatives online, such as the study groups of the Anita Borg community) for the support that I have received in acquiring these skills. Learn IT, Girl!, the program that allowed me to learn Python and data science last spring, is one example of the effort that the IT community is making to teach coding skills to beginners, especially to people from underrepresented groups. Learn IT, Girl! is a program run by volunteers that focuses on increasing coding literacy among women. Its distinctive feature and, I believe, the reason for its success is that the program is project-based. Each mentee is paired with a mentor and, during the twelve-week program, the mentee pursues a project of her own under the supervision of the mentor, learning coding skills along the way.

Mining historical data sets with digital tools

Photo by Jan Kolar / VUI Designer on Unsplash

I am researching the history of agricultural meteorology, the scientific field that studies how weather and climate affect farming. As part of this research, I am compiling a bibliography of scientific publications printed between 1900 and 1950 in this field. The data set includes references to journal articles, books, reports, and PhD theses published in over twenty languages. During the project, last spring, my bibliography had about 3,300 entries. It has now grown to more than 4,300 and it is constantly expanding. I applied to the Learn IT, Girl! program to learn Python and data science, because I wanted to mine this data set. I was interested in understanding who the key authors and journals in agricultural meteorology were, which topics were most popular, and which scientific communities did research in this field and what they each contributed. With Pandas, a Python library for data analysis and data visualisation, I was able to extract information about the language and time distribution of the publications in my bibliography, extract author names and generate co-author lists, and distribute journal articles across disciplinary categories, such as agriculture, meteorology, and geography. With spaCy, a Python library for Natural Language Processing, I have also analysed the full text of a few articles to extract geographical entities, and examined the titles of the English articles to identify the most common themes in each disciplinary category.

My bibliography is stored in a Zotero library and the data will become accessible to everyone once the data set is complete. I hope that other historians, and perhaps also natural scientists, may find an interest in this large collection of publications on agricultural meteorology, especially considering today’s increasing concerns about climate change and agricultural sustainability. The code I have written may be a useful template even for people who have very different research questions in mind. As Zotero is popular open-source software, my code can also be re-used by people who want to mine different data sets created with Zotero. You can find all the materials I developed for Learn IT, Girl! in my GitHub repository.
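
To give a flavour of what counting publications by language and by period can look like in Pandas, here is a minimal sketch. It is an illustration only, not the code from the repository: the five records are invented placeholders, and the column names (`title`, `year`, `language`) are assumptions rather than the fields of a real Zotero export.

```python
import pandas as pd

# Invented placeholder records; the column names are assumptions and do not
# reflect the structure of any real Zotero export.
records = [
    {"title": "Sample title 1", "year": 1912, "language": "German"},
    {"title": "Sample title 2", "year": 1924, "language": "English"},
    {"title": "Sample title 3", "year": 1927, "language": "English"},
    {"title": "Sample title 4", "year": 1933, "language": "Italian"},
    {"title": "Sample title 5", "year": 1938, "language": "English"},
]
bibliography = pd.DataFrame(records)

# Language distribution: number of publications per language.
language_counts = bibliography["language"].value_counts()

# Time distribution: group the publication years into decades and count.
bibliography["decade"] = (bibliography["year"] // 10) * 10
decade_counts = bibliography["decade"].value_counts().sort_index()

print(language_counts)
print(decade_counts)
```

With a real export, the list of placeholder records would be replaced by a call such as `pd.read_csv(...)` on the exported file, and the column names adjusted to match it.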

“I Hear and I Forget, I See and I Remember, I Do and I Understand” (Confucian tradition)

Photo by LUM3N on Unsplash

I was very lucky in being accepted into the program and in being assigned a mentor, Laura Fernández Gallardo, who is a professional data scientist. She constantly supported and encouraged me throughout the program, answered all my questions, suggested books and tutorials, and advised me on best practices in data science. Thanks to her, my project repository has a clear structure and my code is understandable and well annotated. I especially appreciate her mentorship, because she always listened to me, understood what I wanted to do, and helped me achieve my objectives, but, at the same time, she also realised that a beginner needs some time to think things through. She never ‘took over the keyboard’, a frustrating experience that I have sometimes encountered when learning to code with software professionals. Instead, she allowed me to make my own mistakes and find my own solutions. I really thank her for this approach, which has helped me to build not only my coding skills, but also my confidence in my own ability to code. Thank you, Laura, I hope you will continue to mentor in the next edition of Learn IT, Girl!

What I have learned during the program

Photo by Helena Hertz on Unsplash

1. It is better to ‘Learn the Hard Way’

I used Zed Shaw’s Learn Python the Hard Way to acquire enough knowledge of Python to start using the Pandas library. I can only recommend this book to all beginners who want to learn Python. Shaw’s recipe is easy and effective. You read the code, retype it in your own editor, and run it from the command line. Questions and exercises help you to think about the piece of code you have reproduced. Learning a programming language feels a bit like learning to speak a new language: you start with easy sentences that teach you key concepts (nouns, verbs, prepositions, etc.) and then, when you are familiar with these concepts, you begin to understand the formal structure of the language and you can start writing your own sentences.

2. Do not be afraid of jargon and technical documents

It took me a while to understand that ‘foo’ is not a magic formula able to make every piece of code work, but just a placeholder. In this case, I sincerely regretted that software developers do not like explanatory footnotes as much as historians do. It would have helped. As a beginner, I often find software documentation not very user-friendly and, in many cases, I feel more comfortable checking Stack Overflow questions and answers. I like reading about other people’s coding problems and their possible solutions. Comparing and contrasting the answers in each thread gives me the opportunity to evaluate different coding approaches and to choose the solution that works best for me.
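
For readers who are as puzzled by ‘foo’ as I was, the point can be shown in a few lines. This is only an illustrative sketch: the function names and what they compute are made up for the example.

```python
# 'foo' and 'bar' are conventional placeholder names with no special meaning
# in Python. They stand in for "some function" and "some value".
def foo(bar):
    return bar * 2


# The same function with descriptive names behaves identically.
def double(number):
    return number * 2


print(foo(21))
print(double(21))
```

Both calls print the same result, because the names carry no meaning for the interpreter. Descriptive names only help the human reader.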

3. Work on what matters to you

Working on my own bibliographic data set and tackling my historical research questions made a huge difference for me during Learn IT, Girl! I certainly would not have felt so engaged if I had used a dummy data set to learn Python and data science. As I was analysing my own research data, I felt constantly motivated and I also got immediate feedback on what I was accomplishing. This made learning Python much easier and it gave me the determination to overcome the challenges I encountered. I still remember how much I struggled to extract information on co-authors, i.e. the scientists who published one or more papers together. This is important information, because it is a first step towards a network analysis of my bibliographic data, and so, no matter how hard it was, I persisted and eventually succeeded.
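
The idea behind extracting co-authors can be sketched as follows. This is a simplified illustration using only the Python standard library, not the Pandas code in the repository. The author names are invented, and the assumption that names within one record are separated by semicolons is made only for the example.

```python
from collections import Counter
from itertools import combinations

# Invented author fields, one string per publication. The semicolon
# separator is an assumption made for this example.
author_fields = [
    "Author A; Author B",
    "Author A; Author B; Author C",
    "Author C",
    "Author B; Author A",
]


def coauthor_pairs(author_field):
    """Return every unordered pair of co-authors for one publication."""
    names = sorted(
        {name.strip() for name in author_field.split(";") if name.strip()}
    )
    # Sorting first means ("Author A", "Author B") and ("Author B", "Author A")
    # are counted as the same pair.
    return list(combinations(names, 2))


# Count how many publications each pair of authors wrote together.
pair_counts = Counter(
    pair for field in author_fields for pair in coauthor_pairs(field)
)

for (first, second), count in pair_counts.most_common():
    print(f"{first} and {second}: {count} joint publication(s)")
```

Each pair with its count can then serve as a weighted edge between two authors, which is the starting point for a network analysis.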

4. Twelve weeks and no more

Compared to the average duration of academic projects, twelve weeks seemed an incredibly short time to achieve any meaningful result, but Learn IT, Girl! taught me that this is a perfectly suitable duration for a coding project. In this limited time motivation remains high, and it is necessary to work on the project every day to complete tasks, so there is no risk of forgetting what has been done previously. At the same time, twelve weeks still give the opportunity to write quite complex code and even to start thinking about how to improve and extend it. Unless required otherwise, I will continue to apply the twelve-week rule to my future digital projects.

5. With comfortable shoes you can go anywhere

At the conclusion of Learn IT, Girl! I not only had working code that was helpful for my historical research, but I also had the feeling that I knew enough to write more code autonomously. Python was not the first programming language that I tried to learn, but it is certainly the first one I felt comfortable with and that made me curious to go beyond the beginner’s stage. I find the logic behind Python very clear and understandable, and data science methods implemented in Python resonate with the practices of academic research. The Italian writer Italo Calvino argued that placing your feet in a comfortable position is essential to secure concentration while reading. In the same way, I think that you need to wear comfortable shoes to solve complex problems, and coding in Python really feels like wearing a pair of sneakers that can take you anywhere, if you want.
