5 Short Courses to Boost your Data Science Skills [Part 6]
Boosting your data science skills with these 5 short courses [Data ethics and literacy version]
Data science is a very broad field that requires a wide spectrum of skills and tools to advance your career. It is therefore important to build a habit of continuous learning and skill-building throughout your career. As James Clear writes in his amazing book Atomic Habits:
It is so easy to overestimate the importance of one defining moment and underestimate the value of making small improvements on a daily basis. Too often, we convince ourselves that massive success requires massive action. Whether it is losing weight, building a business, writing a book, winning a championship, or achieving any other goal, we put pressure on ourselves to make some earth‑shattering improvement that everyone will talk about.
Meanwhile, improving by 1 percent isn’t particularly notable — sometimes it isn’t even noticeable — but it can be far more meaningful, especially in the long run. The difference a tiny improvement can make over time is astounding. Here’s how the math works out: If you can get 1 percent better each day for one year, you’ll end up thirty‑seven times better by the time you’re done. Conversely, if you get 1 percent worse each day for one year, you’ll decline nearly down to zero. What starts as a small win or a minor setback accumulates into something much more.
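James Clear’s numbers are easy to verify: compounding a 1 percent daily change over a year is just exponentiation. Here is a minimal Python sketch (the figures 1.01, 0.99, and 365 come straight from the quote above):

```python
# Compound a 1% daily improvement (or decline) over one year.
days = 365
better = 1.01 ** days  # 1% better each day
worse = 0.99 ** days   # 1% worse each day

print(f"1% better every day for a year: {better:.2f}x")  # ~37.78x
print(f"1% worse every day for a year:  {worse:.4f}x")   # ~0.0255x
```

So "thirty-seven times better" is 1.01^365 ≈ 37.8, and "nearly down to zero" is 0.99^365 ≈ 0.03.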
Inspired by this mentality, I will be recommending five short courses every month, each of which can be finished in a few hours (3–5), to help you build learning habits that will steadily grow your data science skills over time.
This month’s edition focuses on data ethics and literacy. As a data scientist, you will be working with data on a daily basis, so it is important to understand the ethics of using data and applying AI algorithms to it, in order to avoid incidents that could harm you, your company, and, if you are working with customer data, the customers themselves.
1. Data Science Ethics — University of Michigan

Course Description:
What are the ethical considerations regarding the privacy and control of consumer information and big data, especially in the aftermath of recent large-scale data breaches? This course provides a framework to analyze these concerns as you examine the ethical and privacy implications of collecting and managing big data. Explore the broader impact of the data science field on modern society and the principles of fairness, accountability, and transparency as you gain a deeper understanding of the importance of a shared set of ethical values. You will examine the need for voluntary disclosure when leveraging metadata to inform basic algorithms and/or complex artificial intelligence systems, while also learning best practices for responsible data management, the significance of the Fair Information Practice Principles (FIPPs), and the laws concerning the “right to be forgotten.” This course will help you answer questions such as who owns data, how we value privacy, how to obtain informed consent, and what it means to be fair. Data scientists, and anyone beginning to use or expand their use of data, will benefit from this course. No particular previous knowledge is needed.
Estimated duration: 12 hours
Difficulty Level: Beginner
Instructor: H.V. Jagadish [Bernard A Galler Collegiate Professor]
Course Topics: Below are the modules covered in the course.
- What are Ethics: Module 1 of this course establishes a basic foundation in the notion of simple utilitarian ethics we use for this course. The lecture material and the quiz questions are designed to get most people to come to an agreement about right and wrong, using the utilitarian framework taught here. If you bring your own moral sense to bear or think hard about possible counter-arguments, it is likely that you can arrive at a different conclusion. But that discussion is not what this course is about. So resist that temptation, so that we can jointly lay a common foundation for the rest of this course.
- History, Concept of Informed Consent: Early experiments on human subjects were conducted by scientists intent on advancing medicine, to the benefit of all humanity, with disregard for the welfare of individual human subjects. Often these experiments were performed by white scientists on black subjects. In this module, we will talk about the laws that govern the Principle of Informed Consent. We will also discuss why informed consent doesn’t work well for retrospective studies, or for the customers of electronic businesses.
- Data Ownership: Who owns data about you? We’ll explore that question in this module through a few examples involving personal data: copyrights for biographies, ownership of photos posted online, reviews on Yelp and TripAdvisor, public data capture, and data sales. We’ll also explore the limits on the recording and use of data.
- Privacy: Privacy is a basic human need. Privacy means the ability to control information about yourself, not necessarily the ability to hide things. We have seen the rise of different value systems with regard to privacy. Kids today are more likely to share personal information on social media, for example. So while values are changing, this doesn’t remove the fundamental need to be able to control personal information. In this module, we’ll examine the relationship between the services we are provided and the data we provide in exchange: for example, the location of a cell phone. We’ll also compare and contrast “data” against “metadata”.
- Anonymity: Certain transactions can be performed anonymously. But many cannot, including where there is physical delivery of the product. Two examples related to anonymous transactions we’ll look at are “blockchains” and “bitcoin”. We’ll also look at some of the drawbacks that come with anonymity.
- Data Validity: Data validity is not a new concern. All too often, we see the inappropriate use of data science methods leading to erroneous conclusions. This module points out common errors, in language suited for a student with limited exposure to statistics. We’ll focus on the notion of a representative sample: opinionated customers, for example, are not necessarily representative of all customers.
- Algorithmic Fairness: What could be fairer than a data-driven analysis? Surely the dumb computer cannot harbor prejudice or stereotypes. While the analysis technique itself may be completely neutral, the assumptions, the model, the training data, and the other boundary conditions are all set by humans, who may reflect their biases in the analysis result, possibly without even intending to do so. Only recently have people begun to think about how algorithmic decisions can be unfair; the issue has been covered in outlets such as the New York Times. This module discusses this cutting-edge issue.
- Societal Consequences: In the last chapter, we consider societal consequences of data science that we should be concerned about even if there are no issues with fairness, validity, anonymity, privacy, ownership, or human subjects research. These “systemic” concerns are often the hardest to address, yet just as important as the other issues discussed before. For example, we consider ossification, or the tendency of algorithmic methods to learn and codify the current state of the world and thereby make it harder to change. Information asymmetry has long been exploited for the advantage of some and to the disadvantage of others. Information technology makes the spread of information easier, and hence generally decreases asymmetry. However, big data sets and sophisticated analyses increase asymmetry in favor of those with the ability to acquire and access them.
- Code of Ethics: This module ties all the issues we have considered together into a simple, two-point code of ethics for the practitioner.
- Attributions: This module contains lists of attributions for the external audio-visual resources used throughout the course.
2. Practical Data Ethics — fast.ai

Data ethics covers an incredibly broad range of topics, many of which are urgent, making headlines daily, and causing harm to real people right now. A meta-analysis of over 100 tech ethics syllabi, titled “What do we teach when we teach tech ethics?”, found huge variation in which topics are covered across tech ethics courses (law & policy, privacy & surveillance, philosophy, justice & human rights, environmental impact, civic responsibility, robots, disinformation, work & labor, design, cybersecurity, research ethics, and more — far more than any one course could cover). These courses were taught by professors from a variety of fields. The area with the most unity was outcomes: the abilities to critique, spot issues, and make arguments were among the most commonly desired outcomes for a tech ethics course.
In this course, the focus is on topics that are both urgent and practical. In keeping with this teaching philosophy, we will begin with two active, real-world areas (disinformation and bias) to provide context and motivation, before stepping back into Lesson 3 to dig into the foundations of data ethics and practical tools. From there we will move on to additional subject areas: privacy & surveillance, the role of the Silicon Valley ecosystem (including metrics, venture growth, & hypergrowth), and algorithmic colonialism. I realize this course still just covers a slice of what is a sprawling field, and I hope that it will be a helpful entry point for continued exploration.
This class was originally taught in person at the University of San Francisco Data Institute in January-February 2020, for a diverse mix of working professionals from a range of backgrounds (as an evening certificate course). There are no prerequisites for the course. This course is in no way intended to be exhaustive but hopefully will provide useful context about how data misuse is impacting society, as well as practice in critical thinking skills and questions to ask.
Estimated duration: 12 hours
Difficulty Level: Intermediate
Instructor: Rachel Thomas [Cofounder of Fast.ai]
Course Topics: The course consists of six lessons. Below are the covered topics in each lesson.
- Disinformation: From deepfakes being used to harass women, widespread misinformation about coronavirus (labeled an “infodemic” by the WHO), fears about the role disinformation could play in the 2020 election, and news of extensive foreign influence operations, disinformation is in the news frequently and is an urgent issue. It is also indicative of the complexity and interdisciplinary nature of so many data ethics issues: disinformation involves tech design choices, bad actors, human psychology, misaligned financial incentives, and more.
- Bias & Fairness: Unjust bias is an increasingly discussed issue in machine learning and has even spawned its own field as the primary focus of Fairness, Accountability, and Transparency (FAccT). We will go beyond a surface-level discussion and cover questions of how fairness is defined, different types of bias, steps towards mitigating it, and complicating factors.
- Ethical Foundations & Practical Tools: Now that we’ve seen a number of concrete, real-world examples of ethical issues that arise with data, we will step back and learn about some ethical philosophies and lenses to evaluate ethics through, as well as consider how ethical questions are chosen. We will also cover the Markkula Center’s Tech Ethics Toolkit, a set of concrete practices to be implemented in the workplace.
- Privacy and surveillance: Huge amounts of data are being collected about us: apps on our phones track our location, dating sites sell intimate details, facial recognition in schools records students, and police use large, unregulated databases of faces. Here, we discuss real-world examples of how our data is collected, sold, and used. There are also concerning patterns of how surveillance is used to suppress dissent and to further harm those who are already marginalized.
- How did we get here? The Ecosystem: News stories understandably often focus on one instance of a particular ethics issue at a particular company. Here, the course will step back and consider some of the broader trends and factors that have resulted in the types of issues we are seeing. These include our over-emphasis on metrics, the inherent design of many of the platforms, venture capital’s focus on hypergrowth, and more.
- Algorithmic Colonialism, and Next Steps: When corporations from one country develop and deploy technology in many other countries, extracting data and profits, often with little awareness of local cultural issues, a number of ethical issues can arise. Here we will explore algorithmic colonialism. We will also consider the next steps for how students can continue to engage around data ethics and take what they’ve learned back to their workplaces.
3. Ethics of AI — University of Helsinki
Artificial intelligence is already a part of our daily lives. When we post pictures to social media, search online, or ask chatbots questions, we’re interacting with AI. Authorities such as cities rely on AI to provide public services, and governments are seeking solutions to global problems using algorithmically produced knowledge. The goal of this course is to help you develop your own skills for ethical thinking. You can complete the course at your own pace.
Estimated duration: 4 hours
Difficulty Level: Beginner
Course Topics: The course consists of seven chapters. Below are the covered topics in each chapter.
- What is AI ethics: What does AI ethics mean and what role do values and norms play? We’ll also look at the principles of AI ethics that we will follow in this course.
- Non-maleficence: What do the principles of beneficence (do good) and non-maleficence (do no harm) mean for AI, and how do they relate to the concept of the “common good”?
- Accountability: What does accountability actually mean, and how does it apply to AI ethics? We’ll also discuss what moral agency and responsibility mean and the difficulty of assigning blame.
- Transparency: Why is transparency in AI important and what major issues are affected by transparency — and what are some of the risks associated with transparency in AI systems?
- Human rights: What are human rights, and how do they tie into the current ethical guidelines and principles of AI? We’ll also look more closely at three rights of particular importance to AI: the right to privacy, security, and inclusion.
- Fairness: What does fairness mean in relation to AI, how does discrimination manifest through AI — and what can we do to make these systems less biased?
- AI ethics in practice: What are some of the current challenges for AI ethics, what role do AI guidelines play in shaping the discussion, and how might things develop in the future?
4. Data Literacy — DataCamp

Data literacy involves articulating a problem that can potentially be solved using data. Most importantly, it is about interpreting the results of an analysis and making decisions based on the insights gained. A data-literate person has the ability to understand and check the adequacy of the data involved, and ultimately to read and derive valuable information from data. This course covers the importance of data literacy: understanding the language of data allows you to use and interpret it effectively. If you want a successful career path, you will most likely need these skills.
Estimated duration: 5 hours
Difficulty Level: Beginner
Instructor: Olivier Maugain
Course Topics: The course consists of five chapters. Below are the covered topics in each chapter.
- Introduction to Data Literacy
- Understanding Data
- Using Data
- Reading Data
- Interpreting Data
5. Data Fluency: Exploring and Describing Data — LinkedIn Learning

Data analysis isn’t just for specialists who need to make sense of massive datasets. Decision-makers in every industry can benefit from a basic understanding of the goals and concepts of applied data analysis. In this course, join Barton Poulson as he focuses on the fundamentals of data fluency, or the ability to work with data to extract insights and determine your next steps. Barton shows how exploring data with graphs and describing data with statistics can help you reach your goals and make better decisions. Instead of focusing on particular tools, he concentrates on general procedures that can help you solve specific problems. Find out how to prepare data, explore it visually, and use statistical methods to describe it.
Estimated duration: 5 hours
Difficulty Level: Beginner
Instructor: Barton Poulson [Professor, Designer, Data Analytics Expert]
Course Topics: The course consists of four lessons. Below are the covered topics in each lesson.
- Think with Data
- Prepare Data
- Explore Data
- Describe Data