Want More Meaningful Data Science?
Stand On the Shoulders of Giants, Like Yeshimabeit Milner!
One of the biggest struggles many data science students face is connecting the hard technical skills they learn from the vast internet of resources to meaningful social, economic, environmental, and business problems.
Is this a soft skill? Is it a skill captured by “communication” or “data storytelling”?
I think it is none of the above. In my mind, making the connection between hard data science skills and real-world value is the hardest skill of all. It is hard because it requires more than just communication or storytelling skills. It also requires prior knowledge, inspiration to motivate your questions and search for deeper solutions, and creativity to explore the realm of the possible.
One way to get inspired is to “stand on the shoulders of giants” and learn about the meaningful efforts of others, examining how their passions materialize through data science. One such person, and my focus here, is Yeshimabeit Milner.
Milner is known for her work as co-founder and director of Data 4 Black Lives.
The Data 4 Black Lives organization is working to identify gaps in data on Black people that distort our understanding of important social and health-related outcomes. When gaps in data exist, the models we build to understand, predict, and ultimately help solve those problems are necessarily biased.
For example, Data 4 Black Lives has created a data set showing which states have failed to accurately report and share data on COVID-19 infections by race. Understanding and addressing the COVID pandemic in those states is nearly impossible without complete data.
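To make the idea of a reporting gap concrete, here is a minimal Python sketch of one way to measure it, assuming you have, for each state, the total number of reported cases and the number of those cases with race recorded. The state names, counts, and the 80% threshold are hypothetical placeholders, not Data 4 Black Lives' data or methodology.

```python
# Hypothetical per-state counts: total reported cases and cases with race recorded.
# These numbers are invented for illustration only.
cases = {
    "State A": {"total": 10_000, "race_known": 9_200},
    "State B": {"total": 8_000, "race_known": 3_100},
    "State C": {"total": 5_000, "race_known": 0},
}


def race_completeness(counts):
    """Return the share of cases with race recorded, per state."""
    return {
        state: (c["race_known"] / c["total"] if c["total"] else 0.0)
        for state, c in counts.items()
    }


def flag_incomplete(counts, threshold=0.8):
    """List states whose race-reporting completeness falls below the threshold.

    The 0.8 default is an arbitrary assumption for this sketch.
    """
    shares = race_completeness(counts)
    return sorted(state for state, share in shares.items() if share < threshold)


if __name__ == "__main__":
    for state, share in race_completeness(cases).items():
        print(f"{state}: {share:.0%} of cases have race recorded")
    print("Below threshold:", flag_incomplete(cases))
```

Any state flagged this way is one where a model trained on the reported data would see only part of the picture.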
On the social front, the organization has penned an open letter to Facebook asking it to commit to sharing anonymized data, collaborating with researchers, and hiring Black data scientists.
Each initiative offers a model for data scientists looking to develop meaningful projects. Milner’s focus is inspiring in itself, but it is also informative: it highlights that major health and social problems often lack the essential data needed to solve them.
And for any data science project, the most important aspect of a solution is the ability to gather the right data.
Going a bit deeper into the Data 4 Black Lives site, the Blog section includes a story about the concept of Data Weapons. According to Milner, Data Weapons are data and AI models that are used to justify further targeting of socially vulnerable communities.
Data weapons include predictive models that use historical crime data to predict where crime is likely to occur and who is most likely to commit new crimes, as well as facial recognition systems that match faces against past criminal records. The problem is that the data behind these models are themselves biased, because the policing that generated the data was biased. Thus, the models become self-fulfilling prophecies, and we must make it common knowledge which professionals are using and selling these solutions.
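A toy simulation helps show why such a loop can be self-fulfilling. This minimal Python sketch assumes two hypothetical neighborhoods with identical true crime rates, historical records skewed 60/40 by past policing, a single patrol sent each day to the neighborhood with the most recorded crime, and incidents recorded only where the patrol goes. It is a deliberately simplified illustration, not a model of any real system.

```python
def simulate_feedback(recorded, true_rate, days):
    """Toy runaway-feedback loop (hypothetical setup, for illustration only).

    recorded:  neighborhood -> historically recorded incidents
    true_rate: incidents that actually occur per day, identical everywhere here
    days:      number of days to simulate
    """
    counts = dict(recorded)  # copy so the caller's dict is not modified
    for _ in range(days):
        # The "predictive model": patrol wherever recorded crime is highest.
        target = max(counts, key=counts.get)
        # Crime happens everywhere at the same rate, but it is only
        # recorded in the neighborhood that was patrolled.
        counts[target] += true_rate
    return counts


if __name__ == "__main__":
    start = {"North": 60, "South": 40}  # skewed by past policing, not by true crime
    end = simulate_feedback(start, true_rate=1, days=100)
    share_north = end["North"] / sum(end.values())
    print(end, f"North share of recorded crime: {share_north:.0%}")
```

After 100 simulated days North has 160 recorded incidents and South still has 40, so North's share of recorded crime rises from 60% to 80% even though both neighborhoods had the same true rate. The data end up "confirming" the bias that produced them.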
The #NoMoreDataWeapons goals also give data scientists interested in these issues some really good ideas for building solutions that support those goals. Even if you are not interested in this specific issue, the data science ideas I provide below may inspire you to pursue more meaningful data science projects of your own.
1. Promote Black Self-Determination
a. Goal: Public education on what the various Data Weapons are and how to organize against them
b. Data Science Ideas
i. Develop a series of Google Alerts based on data science in policing (e.g., “data science policing,” “machine learning surveillance,” “facial recognition AI in policing”) and have them delivered to a dummy email address. Scrape the results, develop NLP analytics that reveal trends in this space, and then write about the results or collaborate with other researchers.
ii. Identify police departments that rely more heavily on technology and pull data for their areas to compare them with less tech-enabled police departments.
2. Shift the National Narrative
a. Goal: Document and promote storytelling by Black communities and individuals directly impacted by data weapons
b. Data Science Ideas
i. Identify public comment boards online where minority communities can report on police experiences, and build data science analytics to identify potential data weapons.
ii. Collect text from sources in minority communities and in majority communities, and use NLP tools to show how differently the two speak about the same issues.
3. Data Consolidation and Collection
a. Goals: Establish a dataset of carceral and surveillance technologies in use, organized by US location. Inform and support policy innovation by cultivating and supporting targeted campaigns to move policy concerning the use of data weapons.
b. Data Science Ideas
i. Make sure the data collection you develop for your idea stays “hot”: keep collecting data and putting it into a useful analytical form (e.g., a table).
ii. Communicate through blogs, outreach, and collaborations to raise awareness and build a value story around your efforts.
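For idea 2.b.ii, one simple starting point is to compare how often each word appears in two collections of text. This minimal Python sketch scores words with a smoothed log ratio of relative frequencies. The choice of technique, the tokenizer, and the sample sentences are my own placeholders, not real community data, and a real project would need much larger, carefully sourced corpora.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and keep runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(docs):
    """Count tokens across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def log_ratio(docs_a, docs_b, alpha=1.0):
    """Smoothed log ratio of each word's relative frequency in A versus B.

    Positive scores lean toward corpus A, negative toward corpus B.
    alpha is add-alpha smoothing so a word missing from one corpus
    does not produce log(0).
    """
    a, b = word_counts(docs_a), word_counts(docs_b)
    vocab = set(a) | set(b)
    total_a = sum(a.values()) + alpha * len(vocab)
    total_b = sum(b.values()) + alpha * len(vocab)
    return {
        w: math.log((a[w] + alpha) / total_a) - math.log((b[w] + alpha) / total_b)
        for w in vocab
    }


def top_words(scores, n=3):
    """Return the n highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


if __name__ == "__main__":
    # Invented placeholder sentences, not real community posts.
    group_a = ["patrols stopped us again near the school",
               "cameras watch the block every night"]
    group_b = ["patrols keep the block safe",
               "cameras help the police solve crime"]
    scores = log_ratio(group_a, group_b)
    print("Leans A:", top_words(scores))
    print("Leans B:", top_words({w: -s for w, s in scores.items()}))
```

Sorting by score gives a first look at which words each group leans on when talking about the same topic; with real data, you would follow up by reading the passages behind the top-scoring words.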
I hope you have found this exercise both informative and inspiring for your own pursuit of meaningful data science projects.