The Toxic Loops within Mathematical Models

Connor Sutton · Published in The Startup · Feb 23, 2021

A Discussion of the Dangers of Big Data as Expressed by Cathy O’Neil

A model is a simplified, abstracted representation of some process, and human beings carry countless models within them. Everyone holds internal representations, or models, of other people and places. Imagine drafting a list of the likes and dislikes of your friends and acquaintances. For those you know best, a lengthy list would be easy to write up. For others, perhaps acquaintances from work or school, it might be hard to write down even one or two items with confidence. The difference lies in the strength of your model of that person, built up over time through experiences that let you internalize their personality and character. You have models of places, foods, animals, events, ideas, and more. Sometimes, these models can be expressed mathematically.

Take, for example, the ads Google recommends to you, or the movies recommended by a service like Netflix. These companies build a model of your interests by accumulating relevant data. As more data is fed in, the model is refined, and its recommendations improve as it comes to represent you more accurately. Models like these may be, for the most part, harmless; after all, what danger could such a model pose?
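Neither Google nor Netflix publishes the details of its recommenders, but a minimal sketch of the general idea, in which a user profile is nudged toward each item the user engages with, might look something like this (all names, features, and numbers here are hypothetical):

```python
import numpy as np

class PreferenceModel:
    """Toy user-interest model: a feature vector nudged toward each
    item the user engages with. Illustrative only, not any company's
    actual algorithm."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.profile = np.zeros(n_features)  # current model of the user
        self.lr = learning_rate              # how fast new data reshapes it

    def update(self, item_features: np.ndarray) -> None:
        # Move the profile a small step toward the item just consumed.
        self.profile += self.lr * (item_features - self.profile)

    def score(self, item_features: np.ndarray) -> float:
        # Higher score means a closer match to accumulated interests.
        return float(self.profile @ item_features)

# Each watched movie (as a feature vector) refines the model, so
# recommendations sharpen as data accumulates.
model = PreferenceModel(n_features=3)
for watched in [np.array([1.0, 0.0, 0.2]), np.array([0.9, 0.1, 0.0])]:
    model.update(watched)
print(model.score(np.array([1.0, 0.0, 0.1])))  # candidate to recommend
```

Real systems use far richer models, but the structure is the same: the profile, and therefore the recommendations, strengthen as data accumulates.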

In Weapons of Math Destruction, Cathy O'Neil gives several examples of models actively increasing racial inequality and class polarization. Of course, this is not their purpose, but rather a byproduct of poor design and biased data, something incredibly difficult to avoid. O'Neil gives the example of the LSI-R, a model used to assess a convicted person's risk of reoffending. The person fills out a questionnaire, and based on the answers, the model outputs a risk score that is then used in deciding their fate. Questions like "How many prior convictions have you had?", "Is this your first time being involved with the police?", and "What part did drugs and alcohol play?" will clearly tend to favor, say, a white teen from a wealthy suburb who never had to interact with the police and was never surrounded by illegal drugs, over a black teen from the inner city who may be surrounded by these things through no fault of their own. In this situation, the model would perpetuate racial and class discrimination, creating a toxic feedback loop.
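The LSI-R's actual items and weights are proprietary, but the feedback loop O'Neil describes can be caricatured in a few lines. The sketch below is a hypothetical linear score, not the real instrument; it only illustrates how inputs that track circumstance rather than conduct can feed back on themselves:

```python
# Hypothetical illustration of the feedback loop. This is NOT the real
# LSI-R; the inputs and weights below are invented for illustration.

def risk_score(prior_convictions: int, police_contacts: int,
               drug_exposure: int) -> float:
    # Each input tracks neighborhood and class as much as individual
    # conduct, but the model cannot tell the difference.
    return 2.0 * prior_convictions + 1.5 * police_contacts + 1.0 * drug_exposure

# Two first-time defendants charged with the same offense.
suburban   = {"prior_convictions": 0, "police_contacts": 0, "drug_exposure": 0}
inner_city = {"prior_convictions": 0, "police_contacts": 3, "drug_exposure": 2}

for label, person in (("suburban", suburban), ("inner city", inner_city)):
    for year in range(3):
        score = risk_score(**person)
        print(f"{label}, year {year}: score = {score}")
        if score > 3.0:
            # A high score brings a harsher outcome (a longer sentence,
            # denied parole), which raises the very inputs the model
            # will see next time. The loop closes.
            person["police_contacts"] += 1
            person["prior_convictions"] += 1
```

The inner-city defendant's score climbs year after year not because of anything they did, but because the model's own output changed its future inputs. That is the toxic loop of the title.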

Models like the LSI-R are being used in more and more fields and contexts because of their low cost, scalability, and supposed efficiency; they now help decide who is admitted to a college, who can take out a loan, who gets hired, and who gets fired. This places an enormous responsibility on the creators of these models to understand the data being used and to be transparent about that data and the decisions the model drives. Users of models must be similarly responsible and cognizant of the risks.
