The Future of Commercial Deep Learning
How do we balance its benefits and integrity going forward?
Artificial intelligence (AI) is surprisingly ubiquitous. It powers innumerable positive applications, from automated medical diagnoses to self-driving cars.
Underlying modern AI is deep learning, algorithms through which computers learn to perform intelligent tasks without being explicitly programmed. These algorithms train artificial neural networks, which iteratively learn relationships between inputs and outputs through copious examples.
Recently, commercial deep learning has soared in popularity because companies are selling easy-to-use, neural network-powered services. For instance, tech giants like Microsoft and Google make pre-trained neural networks for facial recognition easily accessible to clients across the world through their cloud technology. Hence, clients no longer need a technical background in deep learning to rapidly deploy their own artificial intelligence applications.
However, commercial deep learning is increasingly employed in critical, human-centered applications like policing. In human-centered deep learning, neural networks classify individuals based on their gender, race, and even potential for exhibiting criminal behavior.
But these neural networks often produce disparate outputs for the majority group and marginalized communities, so human-centered deep learning can have harmful consequences like unfair incarceration and unfounded police targeting. Despite these issues, commercial deep learning services are usually not legally subject to audits. Critics argue that they should undergo accountability checks, given their wide reach.
My solution to preserving the benefits of commercial deep learning while prioritizing its integrity is three-fold:
- Commercial deep learning companies should be legally required to transparently report the results of their services on their own performance and fairness assessments prior to sale.
- Companies ought to educate their clients about biases in deep learning and ethics.
- Companies should rely on community-based approaches to improve their services.
Background, Technology, and Ethics
Pros and Cons of Deep Learning
Deep learning can improve the efficiency of society and quality of life of its members through a variety of applications.
For instance, deep learning enables smart personalized content recommendation on platforms like Netflix. This allows users to quickly discover new content they like based on past preferences. The worst harm a Netflix user could suffer is not being recommended a film they would enjoy; thus, this application is arguably benign.
Deep learning also enhances smart voice assistants like Siri and Cortana. Intelligent voice assistants can better serve members of society with visual impairments, a limited knowledge of English, or minimal experience with technology than can laptops and mobile devices.
More recently, deep learning has also entered autonomous vehicles, in which neural networks process sensory data to make “immediate and accurate driving decisions.” Notably, autonomous vehicles have the potential to eliminate poor decisions resulting from “fatigue, misperception, and intoxication,” which will protect the lives of humans. However, deep learning-powered decisions in critical situations, e.g. when harm is inevitable, can be unexpected and unexplainable. For example, in a popular autonomous vehicle training set compiled by Udacity, “thousands of vehicles, hundreds of pedestrians, and dozens of cyclists were not labelled.” Hence, an engineer may properly train an autonomous vehicle on this dataset yet witness undesirable decisions, such as hitting pedestrians.
This raises concerns about how often undetectable, undesired errors, imbalances, and correlations in immense amounts of training data may significantly affect the performance of a deep learning system.
Beyond self-driving cars, deep learning can assist the analysis of knee MRIs. Manual analysis of knee MRIs is “time-intensive and subject to diagnostic error,” but a deep learning, automated system for interpreting knee MRIs can “prioritize high-risk patients and assist clinicians in making diagnoses.” This ultimately makes healthcare more accurate and rapid, which improves the wellbeing of members of society.
Additionally, a DeepMind deep learning system “consistently achieved a 40 percent reduction in the amount of energy used for cooling” in Google’s data centers. This mitigates the environmental harms produced by these centers, which helps reduce pollution and protect health worldwide.
Ultimately, all of these beneficial deep learning applications exhibit only mild harms and have minimal drawbacks in ideal situations.
Commercial Deep Learning ➡️ Increased Accessibility
Positive deep learning applications are beginning to proliferate thanks to commercial deep learning. Complex, deep learning-powered services are becoming accessible to individuals without a technical background in the field because companies are selling them in easy-to-use packages.
Since its inception, deep learning has largely been an academic discipline, primarily accessible to researchers and practitioners with an advanced degree in a computational field. Given barriers in education systems and limited opportunities to learn computer science, deep learning has suffered from inaccessibility, especially to underserved communities and underrepresented minorities.
Recently, TensorFlow and PyTorch have allowed more individuals to implement and train neural networks without extensive technical knowledge. However, implementation and training are still not trivial tasks; moreover, PyTorch and TensorFlow are only amenable to those who have unconstrained access to computing power.
A popular cloud-based commercial deep learning platform is Microsoft’s Azure Cognitive Services (ACS). In contrast to TensorFlow and PyTorch, the deep learning services offered by ACS are engineered by advanced researchers, pre-trained, and easily requested and served over a network. Hence, commercial deep learning enables even deep learning novices to create significant beneficial applications.
In 2019, Bryan Chiang, a Computer Science student at UCLA, leveraged ACS to develop EasyGlucose, “a non-invasive blood glucose level monitor” that uses deep learning to analyze iris morphological variation in an eye image. Chiang’s invention can greatly improve the treatment and quality of life of diabetes patients globally.
With commercial deep learning tearing down accessibility barriers, more members of society can develop world-changing applications that advance humanity.
The Dark Side of Commercial Deep Learning
Despite the successes of commercial deep learning, it can enable unfounded discrimination.
In “Gender Shades,” Dr. Timnit Gebru and Joy Buolamwini’s pioneering work on assessing commercial deep learning models, the authors found that darker-skinned females were the most misclassified of any intersectional group by three commercial gender classification systems; darker-skinned females faced error rates of up to 34.7%, while the maximum error rate for lighter-skinned males was 0.8%.
Their experiments suggest that deep learning models suffer from sample size disparities and limited features, especially for certain intersectional groups like darker-skinned females.
When these models are trained on unbalanced training sets with incorrect or incomplete data for a minority segment of a population, the models’ predictions are less accurate for this minority group than for the majority segment. This happens, and can go unnoticed, because a model’s performance is gauged by its overall accuracy on an evaluation dataset without conditioning on any particular population segment, so errors concentrated in a small group barely move the reported score.
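To make this concrete, here is a minimal sketch of how an aggregate accuracy score can hide a large gap between groups. The data, group names, and the `accuracy_by_group` helper are all hypothetical and chosen only for illustration; they are not taken from “Gender Shades” or from any commercial system.

```python
from collections import defaultdict


def accuracy_by_group(labels, predictions, groups):
    """Return overall accuracy and a dict of per-group accuracies."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat, g in zip(labels, predictions, groups):
        total[g] += 1
        correct[g] += int(y == y_hat)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Hypothetical evaluation set: 90 majority-group examples, 10 minority-group examples.
groups = ["majority"] * 90 + ["minority"] * 10
labels = [1] * 100
# Assumed model behavior: 2 errors on the majority group, 4 errors on the minority group.
predictions = [0] * 2 + [1] * 88 + [0] * 4 + [1] * 6

overall, per_group = accuracy_by_group(labels, predictions, groups)
print(f"overall accuracy: {overall:.2f}")
for name, acc in per_group.items():
    print(f"{name} accuracy: {acc:.2f}")
```

In this toy setup the overall accuracy is 94%, yet the minority group sees only 60% accuracy. Reporting the per-group numbers alongside the aggregate is what surfaces the disparity.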
The gender classification systems examined by Gebru and Buolamwini perform poorly on darker-skinned women because the systems’ training data lacked darker-skinned individuals, and lacked darker-skinned women to an even larger extent. Extrapolating to commercial facial recognition systems, sample size disparities and limited features may yield poor model performance for darker-skinned individuals, which makes them more susceptible to being “wrongfully accused of a crime based on erroneous misidentification.”
Although facial recognition systems purportedly allow cops to efficiently identify criminals without a background in deep learning, this lack of background means that police may overtrust the performance and fairness of these systems, not understanding that they are racially biased. Thus, if cops act in accordance with these systems’ outputs, they may disproportionately and erroneously accuse darker-skinned individuals of crimes.
Hence, commercial facial recognition technology enables unwarranted racial discrimination with dire consequences. Commercial deep learning needs to face greater regulation to minimize the harms it poses to minorities and marginalized communities.
Another Example: Predictive Policing
In addition to facial recognition technology, deep learning-powered tools for predictive policing are being sold to police departments across the nation.
The company PredPol sells deep learning models that enable cops to predict where crimes will occur and prevent them. According to PredPol’s website, their per-city models predict crime using “historical event datasets,” and the models are updated every day based on new crime data.
However, PredPol’s models amplify social biases due to skewed sampling. If, due to an initial bias, the model for a city predicts there is a high probability of crime in a specific area, police will increase surveillance of this area and minimize surveillance of other areas.
Consequently, future observations of crime in the heavily-surveilled area will confirm the model’s predictions and there will exist fewer opportunities for observations of crime in other areas to contradict the model’s predictions. This causes the model’s initial bias to compound.
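The feedback loop described above can be illustrated with a deliberately simplified, deterministic toy simulation. This is not PredPol’s algorithm and not Lum and Isaac’s methodology; the two areas, the patrol split (`top_share`), and the rule that recorded crime scales with patrol presence are all assumptions made for illustration.

```python
def simulate_feedback(initial_counts, true_rate, top_share, days):
    """Toy predictive-policing loop.

    Every area has the same true daily crime rate. Each day, the area with the
    most recorded crime receives `top_share` of patrols and the rest split the
    remainder. Recorded crime in an area is the true rate scaled by its patrol
    share, so crime is only recorded where police are looking.
    """
    counts = list(initial_counts)
    n = len(counts)
    for _ in range(days):
        # The "model" flags the area with the highest recorded count so far.
        hot = max(range(n), key=lambda i: counts[i])
        for i in range(n):
            share = top_share if i == hot else (1 - top_share) / (n - 1)
            counts[i] += true_rate * share
    return counts


# Hypothetical starting point: area A has slightly more recorded crime than
# area B, even though both have the same true crime rate.
initial = [12, 10]
counts = simulate_feedback(initial, true_rate=10, top_share=0.8, days=30)

initial_share_a = initial[0] / sum(initial)
final_share_a = counts[0] / sum(counts)
print(f"area A share of recorded crime: {initial_share_a:.2f} -> {final_share_a:.2f}")
```

Under these assumptions, area A’s share of recorded crime grows from about 55% to about 78% in 30 days, although the underlying crime rates never differ. The small initial gap is enough to lock in where patrols go and, therefore, where crime is recorded.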
In 2016, Professor Kristian Lum and Dr. William Isaac analyzed the effects of the PredPol model on a digital simulation based on Oakland Police crime data. They projected the spatial distribution of drug arrests in Oakland and found that “arrests are concentrated in neighborhoods with predominantly nonwhite and low-income populations”. Furthermore, Black individuals were twice as likely to be targeted by cops as white individuals.
The historical crime data used by PredPol to train its models are likely tainted by racism, e.g. the disproportionate targeting of Black people. This produces models with an initial bias that compounds as they are updated. Lum and Isaac’s analysis confirms this: the Oakland model reinforced inequalities over time, with the model learning from crime data influenced by crime predictions from the model itself.
Thus, predictive policing amplifies social biases and enables unfounded racial discrimination. Commercial predictive policing systems ought to be subject to greater regulation to minimize the harms they pose to minorities and marginalized communities.
Proposed Solutions and Recommendations
Ultimately, while commercial deep learning enables individuals without a background in the field to invent revolutionary technologies that positively impact the world, it can also engender racial discrimination in human-centered applications.
Furthermore, cloud-based commercial deep learning is wide-reaching, which means developers across the globe can leverage its power. However, this also means that biased technology is rapidly accessible worldwide.
Ethically, commercial deep learning is only as righteous as the most harmful application for which it could be used. Yet there is currently minimal oversight of it.
The FDA recently proposed a regulatory framework to validate medical products that use AI before they are made available to hospitals. However, no federal laws currently address commercial uses of facial recognition. Companies are primarily held accountable by independent assessments by academics and watch groups. For instance, in response to Buolamwini and Gebru’s research, IBM improved its training data and increased the accuracy of its Watson Visual Recognition for facial analysis, additionally reporting error metrics for its deep learning model’s performance on individuals from a spectrum of skin tones across six nations. Furthermore, facing growing criticism this year, IBM stopped selling general-purpose facial recognition software “out of concern it could be used for mass surveillance or racial profiling.”
However, PredPol simply published a rebuttal to Lum and Isaac’s study, failing to reform its technology. Because academic and social pressures are not sufficient to keep commercial deep learning companies in check, government intervention is necessary.
Recommendation 1: Transparent Results and Model Cards
I propose that commercial deep learning companies be legally required to transparently report the results of their services on their own performance and fairness assessments prior to sale.
In this way, companies, driven by public pressure, hold themselves accountable to only offering deep learning services that have a low likelihood of harming minority and marginalized groups. Notably, I do not recommend that commercial deep learning services be required to pass government-set assessments to be sold. This is because companies will perpetually improve the strength of their own assessments, driven by competition from other companies to gain the trust of clients. Moreover, companies are more capable of creating high-quality assessments specific to their services than is the government. Lastly, legally banning the sale of commercial deep learning services until they pass government-set assessments may stifle the development of inventions with a positive impact on humanity.
Commercial deep learning companies could draw inspiration for their own assessments from “Gender Shades.” This study urges companies to perform intersectional error analysis of facial detection. This analysis could mitigate harms due to limited features and sample size disparities by informing companies to compile training datasets with more balanced phenotypic and demographic representation and higher-quality features. Buolamwini and Gebru also suggest that companies transparently report the errors of human-centered deep learning models on relevant subgroups, such as on Black women for gender classification.
In addition, I recommend that companies publish the phenotypic and demographic composition of their training and testing datasets to increase accountability. Building upon “Gender Shades,” the paper “Model Cards for Model Reporting” insists that companies release model cards with their services. These model cards detail neural network architecture and training, evaluations performed across “different cultural, demographic, or phenotypic groups […] and intersectional groups,” and an explanation of the evaluation metrics, data, and caveats. For instance, a company might transparently assess their speech recognition service by:
1) disclosing that they employed an LSTM neural network architecture, trained their model using the Voices dataset with batch gradient descent and the Adam optimizer, and randomly searched for their hyperparameters
1.1) detailing the phenotypic, demographic, and intersectional composition of the Voices dataset
2) reporting their model’s accuracies, true positive rates, and false positive rates on intersectional subgroups of speakers of different languages with various accents on the evaluation dataset
2.1) explaining the significance of the accuracies, true positive rates, and false positive rates
2.2) detailing the phenotypic, demographic, and intersectional composition of the evaluation dataset
2.3) hypothesizing potential pitfalls of this evaluation procedure
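The steps above can be sketched in code. The following is only an illustrative outline, not the format prescribed by “Model Cards for Model Reporting”: the `ModelCard` fields, the `subgroup_metrics` helper, the subgroup names, and the toy labels are all hypothetical, and the speech task is simplified to a binary decision so that true and false positive rates are well defined.

```python
from dataclasses import dataclass, field


def subgroup_metrics(labels, predictions, subgroups):
    """Compute accuracy, true positive rate, and false positive rate per subgroup.

    Labels and predictions are binary (0 or 1).
    """
    cells = {}
    for y, y_hat, g in zip(labels, predictions, subgroups):
        c = cells.setdefault(g, {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
        if y == 1:
            c["tp" if y_hat == 1 else "fn"] += 1
        else:
            c["fp" if y_hat == 1 else "tn"] += 1
    report = {}
    for g, c in cells.items():
        pos, neg = c["tp"] + c["fn"], c["fp"] + c["tn"]
        report[g] = {
            "accuracy": (c["tp"] + c["tn"]) / (pos + neg),
            "tpr": c["tp"] / pos if pos else None,  # undefined without positives
            "fpr": c["fp"] / neg if neg else None,  # undefined without negatives
        }
    return report


@dataclass
class ModelCard:
    """Hypothetical, minimal model card structure."""
    architecture: str
    training_data: str
    intended_use: str
    caveats: list = field(default_factory=list)
    subgroup_report: dict = field(default_factory=dict)


# Toy evaluation data for two hypothetical accent subgroups.
labels = [1, 1, 0, 0, 1, 1, 0, 0]
predictions = [1, 1, 0, 0, 1, 0, 1, 0]
subgroups = ["accent_a"] * 4 + ["accent_b"] * 4

card = ModelCard(
    architecture="LSTM",
    training_data="Voices dataset (composition reported separately)",
    intended_use="Illustrative example only",
    caveats=["Tiny toy evaluation set; metrics are not meaningful in practice."],
    subgroup_report=subgroup_metrics(labels, predictions, subgroups),
)

for name, metrics in card.subgroup_report.items():
    print(name, metrics)
```

Keeping the subgroup report inside the same structure as the architecture, data, and caveats mirrors the idea that a reader should see performance disparities in the same place they learn what the model is and what it is meant for.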
Model cards allow the government and independent organizations to reproduce companies’ assessments, which ensures factual reporting. Model cards also list the intended applications of deep learning services, to prevent clients from using the services in harmful or inappropriate ways.
Recommendation 2: Educate Clients about AI Ethics
I urge commercial deep learning companies to additionally educate their clients about biases in deep learning and ethics.
It is insufficient to simply provide intended applications of services, as clients will not have the knowledge to decide on gray-area use cases.
Companies should create engaging tutorials on using AI responsibly that are accessible to all clients. Clients should evaluate the intended uses, unintended uses, and the potential negative implications of intentional misuse and failure of their deep learning service.
Through this activity, police departments, for example, may reconsider how or if they will continue to use facial recognition and predictive policing technology.
Recommendation 3: Community-Based Approaches
Lastly, companies ought to rely on community-based approaches.
In human-centered deep learning, minority and marginalized groups are often adversely impacted. However, the voices of these groups are usually not considered by companies in discussions about how to improve their services. Community-based approaches “bring those who interact with and are affected by an algorithmic system into the design process.”
For instance, oncology nurses and doctors may guide the creation of a deep learning model for detecting tumors. Furthermore, Black individuals from communities in which police use facial recognition or predictive policing technology can raise their concerns and provide anecdotes about the harms the technology poses, to drive its improvement.
However, community-based approaches often exploit the free labor of minority and marginalized groups. I recommend that, in return for their consultation work, these groups receive financial compensation and long-term support from companies. This support could include sponsoring a deep learning fundamentals and ethics curriculum for these groups’ local public schools and supporting these groups’ civic battles against the unjust usage of commercial deep learning. For example, I teach students at underserved schools in Los Angeles about AI ethics to prepare them to have informed conversations about AI with friends and family and fight against algorithmic injustices in their own communities. Unfortunately, research substantiating the benefits of providing a deep learning education to minority and marginalized groups is still in its infancy, though it is a promising direction for future study.
To summarize, my proposed solution preserves the benefits of commercial deep learning while improving accountability for its integrity. Commercial deep learning companies should be legally required to transparently report the results of their services on their own performance and fairness assessments prior to sale. Additionally, companies should educate their clients about biases in deep learning and ethics, and rely on community-based approaches.
I strongly encourage everyone to learn more about the drawbacks of commercial deep learning and educate their friends and family. Furthermore, I urge everyone to advocate for federal legislation in line with my solution that holds companies accountable for minimizing harms to minority and marginalized groups.
Hallinan B, Striphas T. 2014. Recommended for you: The Netflix Prize and the production of algorithmic culture. New Media & Society 18:117–137.
Cunneen M, Mullins M, Murphy F. 2019. Autonomous Vehicles and Embedded Artificial Intelligence: The Challenges of Framing Machine Driving Decisions. Applied Artificial Intelligence 33:706–731.
Quach K. 2020. Please check your data: A self-driving car dataset failed to label hundreds of pedestrians, thousands of vehicles. The Register. https://www.theregister.com/2020/02/17/self_driving_car_dataset/
Bien N, Rajpurkar P, Ball R, Irvin J, Park A, Jones E, Bereket M, Patel B, Yeom K, Shpanskaya K et al. 2018. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet. PLOS Medicine 15:e1002699.
Evans R, Gao J. 2020. DeepMind AI Reduces Google Data Centre Cooling Bill by 40%. DeepMind. https://deepmind.com/blog/article/deepmind-ai-reduces-google-data-centre-cooling-bill-40
UCLA CS Department. 2019. CS freshman Bryan Chiang wins the 2019 Microsoft ImagineCup World Championship | CS. Cs.ucla.edu. https://www.cs.ucla.edu/cs-freshman-bryan-chiang-wins-the-2019-microsoft-imaginecup-world-championship/
Buolamwini J, Gebru T. 2018. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of Machine Learning Research: Conference on Fairness, Accountability, and Transparency 81:1–15. http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
Greenberg P. 2020. Facial Recognition Gaining Measured Acceptance. Ncsl.org. https://www.ncsl.org/research/telecommunications-and-information-technology/facial-recognition-gaining-measured-acceptance-magazine2020.aspx
Lum K, Isaac W. 2016. To predict and serve? Significance 13:14–19.
Benbouzid B. 2019. To predict and to manage. Predictive policing in the United States. Big Data & Society 6:205395171986170.
Hao K. 2020. The FDA wants to regulate machine learning in health care. MIT Technology Review. https://www.technologyreview.com/2019/04/04/65911/the-fda-wants-to-regulate-machine-learning-in-healthcare/
Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, Spitzer E, Raji I, Gebru T. 2019. Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19).
Sloane M. 2020. Participation-washing could be the next dangerous fad in machine learning. MIT Technology Review. https://www.technologyreview.com/2020/08/25/1007589/participation-washing-ai-trends-opinion-machine-learning/