Cheap Sugar: Correlations in Price and Sugar Content in Mainstream Yogurts

Nikita Bogdanov
Published in The Startup
Sep 26, 2019

Background

I have for a long time been suspicious of cheap food — whether in grocery stores, gas stations, restaurants, or other such establishments — primarily because of what in my experience is a direct relationship between price and quality. To be sure, some products, despite being manufactured at the same plant and out of the same raw ingredients, are marked up simply because of their brand label; yet in other cases, higher prices do seem to track something like quality. At base they might reflect more expensive ingredients, themselves elevated in price because of, say, their organic provenance, the more stringent environmental standards of their manufacturers, or their purity.

Were quality a measure simply of something like aesthetic sophistication, such price divergences would be of little practical concern; quality, however, is tied to much more than abstract ideals, most notably being quite intimately related to nutritional and health value, broadly defined. That is, quality, and any pricing differential attached to it, is of great interest precisely because of its significance to health. If, for example, as indeed often seems to be the case, cheap foods are overwhelmingly unhealthy, those with fewer financial resources will inevitably, through their greater consumption of such foods, expose themselves to greater health risks. In addition to its questionable ethics, this disparity is also economically suspect, as lower-income families and individuals are more likely to rely on government-funded healthcare, the program eventually responsible for covering their costs for the management of diabetes and heart disease, among other illnesses.

A separate and equally interesting dynamic influences food pricing further: consumer preference. It may well be that, while a company could increase the quality of its product’s ingredients without significantly affecting its price, doing so would decrease the appeal to its target population of the new product’s flavor profile. An especially sweet ice cream, for example, were it to be made less sweet and to be made of healthier ingredients, may no longer satisfy the tastes of exactly that population which is the predominant purchaser of the product. Given that disparities in health education break down along lines of class (and race and ethnicity), and that health education may prompt consumers to opt for healthier even if less immediately delicious products, this dynamic may drive unhealthier foods to be cheap and healthier ones to be more expensive; it is worth noting, too, that eating healthy may be culturally associated with being white and upper-middle class, in such a way as to make it less appealing to those of other cultures, a subtle form of (culinary) stereotype threat. On this model, as supply and demand would predict, companies would for each product first select those ingredients that create a taste appealing to their target population and second set that product’s price as high as they think their target consumer can justify paying. While the result is the same, namely that cheaper foods are less healthy and more expensive foods more so, this explanation looks not to external constraints but to the consumer as the primary mechanism of such cost sorting.

The complex of mechanisms in fact underlying any relationship between price and nutritional and health value is no doubt complicated and is likely, at the very least, a combination of the above mechanisms. Moreover, the relationships are made more complex by the likely interdependence of preferences and prices: each affects the other, to unknown and probably unpredictable degrees. The above, therefore, should be taken only as hypothesis.

The Problem

Aware that much of the above assessment is pure speculation, and especially frustrated with the amount of sugar present in so many of the yogurts I see in grocery stores, I was prompted by a recent trip to the store to put my core question to the test, or at least to a limited trial, neither broad nor statistically sophisticated but interesting nonetheless: for yogurt, would there be any relationship between the amount of sugar per serving (standardized at one cup) and the price?
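Standardizing sugar to a one-cup serving is a simple rescaling. The sketch below is illustrative only and is not the code used for this analysis; it assumes that each label reports sugar in grams per serving and the serving size in ounces, and it treats one cup as 8 ounces. Both the constant and the function name are assumptions.

```python
CUP_OZ = 8.0  # assumption: one cup is treated as 8 oz


def sugar_per_cup(sugar_g_per_serving: float, serving_size_oz: float) -> float:
    """Rescale a label's sugar amount (grams) to a one-cup serving."""
    if serving_size_oz <= 0:
        raise ValueError("serving size must be positive")
    return sugar_g_per_serving * CUP_OZ / serving_size_oz


# Hypothetical label: 15 g of sugar in a 6 oz serving.
print(sugar_per_cup(15, 6))
```

Rescaling in this way lets a small single-serve cup and a large tub be compared on the same footing.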

I began by installing a third-party, web-page-scraping extension to Chrome and by tuning it to extract product information from the results of a search for “yogurt” on the Safeway website; specifically, I went after the product’s name, sale and original prices, ingredients list, and nutritional facts. From there, in addition to cleaning up all of this data, I further extracted the number of items per sale unit (e.g., 6 individual cups of yogurt per package; or 1 large tub) and the size of each individual unit. Finally, I screened the data for completeness and relevance, removing some 20 products because they did not qualify as yogurt and another 70 because they did not have complete nutrition information; out of an original 452 individual products, only 362 were relevant and complete and only these made it into the final analysis.
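The extraction and screening steps can be sketched roughly as follows. This is a hedged illustration, not the code used here: it assumes that product names end in a size string such as "4-5.3 Oz" (four 5.3 oz cups) or "32 Oz" (one tub), and the field names in `REQUIRED` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed name format: an optional "<count>-" prefix, then a size in ounces.
_SIZE = re.compile(
    r"(?:(\d+)\s*-\s*)?(\d+(?:\.\d+)?)\s*(?:fl\.?\s*)?oz\b", re.IGNORECASE
)

# Hypothetical names for the nutrition fields a product must have.
REQUIRED = ("calories", "sugar_g", "fat_g", "protein_g")


def parse_package(name: str) -> Optional[Tuple[int, float]]:
    """Return (items per sale unit, size of each item in oz), or None."""
    match = _SIZE.search(name)
    if match is None:
        return None
    count = int(match.group(1)) if match.group(1) else 1
    return count, float(match.group(2))


def is_complete(product: dict) -> bool:
    """True when every required nutrition field is present and not None."""
    return all(product.get(field) is not None for field in REQUIRED)


print(parse_package("Example Greek Yogurt Vanilla - 4-5.3 Oz"))
print(parse_package("Example Plain Yogurt - 32 Oz"))
```

Names that do not match the assumed pattern return `None` and would need to be reviewed by hand.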

Knowing that I would be interested in plotting whatever relationship came out of this analysis, I then jumped over from Excel to Python to analyze and visualize all of this data. Expanding some on the original question, I was especially interested in the relationship between sugar and both the item price and unit price, between price and other nutritional variables such as the amount of fat and protein and the calorie density, between container size and the amount of sugar, and between all of the prior variables and the sale and discount price.
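To make these variables concrete, here is a minimal sketch of how the item price, the unit price, and the discount amount might be derived for one product. The definitions are assumptions rather than the ones necessarily used in this analysis: unit price is taken as price per ounce, and discount as the original price minus the sale price. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Product:
    original_price: float  # shelf price of the whole sale unit, in dollars
    sale_price: float      # discounted price of the whole sale unit
    count: int             # items per sale unit (e.g. 6 cups per package)
    size_oz: float         # size of each individual item, in ounces

    @property
    def total_oz(self) -> float:
        return self.count * self.size_oz

    def unit_price(self, on_sale: bool = False) -> float:
        """Price per ounce, from either the sale or the original price."""
        price = self.sale_price if on_sale else self.original_price
        return price / self.total_oz

    @property
    def discount(self) -> float:
        """Dollar amount taken off the original price (assumed definition)."""
        return self.original_price - self.sale_price


pack = Product(original_price=6.00, sale_price=4.50, count=6, size_oz=4.0)
print(pack.unit_price(), pack.unit_price(on_sale=True), pack.discount)
```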

Motivating this expanded set of questions was an interest in 1) product health quality as defined beyond the narrow measure of the amount of sugar present; and 2) the relationship, broadly, between the amount of sugar present and a product’s discount status. At worst, products with the most sugar, the least protein, and the most fat would be both the cheapest and the most likely to be on sale, and cheap products would be only sugary products; at best, my browsing experiences would be incorrect and the relationship between sugar and price would be insignificant. Even if purely correlative, that is, even if not the result of explicit motivation in product design and planning, the former finding, were it to be stable across products and time periods, would be of interest at least to public health advocates.

Results and Discussion

The results are best described as mixed. For example, while definite patterns exist, product variability is so large that even the strongest correlation coefficient falls below 0.32; on a basic analysis, only the trend lines for Figures 1, 2, and 5 are statistically significant. Further, while there are some socially significant trends in sugar content and calorie density by (unit) price, the relationship between the unit price and the amount of fat and protein, like those between the container size and the amount of sugar and between the discount amount and the amount of sugar, is statistically weak. (Statistical significance is determined by taking the null hypothesis to be that the regression slope is zero.)
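A test of the null hypothesis that the regression slope is zero can be sketched with the standard library alone. This is an illustration, not the code behind the figures: the t statistic uses the standard identity t = r * sqrt((n - 2) / (1 - r^2)), but the p-value comes from a normal approximation to the t distribution, which is an assumption that is reasonable only for samples in the hundreds. The function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean


def slope_test(x, y):
    """Return (slope, r, t, approximate two-sided p) for the least-squares line."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    if 1 - r * r <= 0:  # perfect linear fit
        return slope, r, math.copysign(math.inf, r), 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # Assumption: the normal approximation to the t distribution is adequate.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return slope, r, t, p


# Hypothetical toy data, not taken from the yogurt data set.
print(slope_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

For small samples an exact t-distribution p-value, such as a statistics library would provide, is the safer choice.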

A secondary trend of note is especially evident in Figures 1 and 2: as the unit price increases, the range of sugar amounts decreases, which is to say that among cheaper yogurts, as compared to more expensive ones, product diversity is relatively greater. While the present analysis does not go to the level of investigating causation, that more expensive products are spread across a narrower nutritional profile can support the theory that those products are narrowly tailored to appeal to a particular population, one that is at once financially capable and interested in some measure of health; specifically, this trend might support the conclusion that, while the preferences of working- and middle-class consumers are diverse, those of upper-middle- and upper-class consumers are relatively more narrow. That this trend does not as clearly obtain in Figure 4 may indicate further that the latter consumers are concerned more with sugar content than with calorie density.
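One way to put a number on this narrowing is to sort products by unit price, split them into equal-sized groups, and compare the range of sugar amounts within each group. The sketch below illustrates that idea only; the function name, the use of equal-count bins, and the choice of range as the measure of spread are all assumptions, not part of the original analysis.

```python
def sugar_range_by_price_bin(unit_prices, sugars, bins=4):
    """Range (max - min) of sugar within each equal-count unit-price bin."""
    pairs = sorted(zip(unit_prices, sugars))  # cheapest products first
    n = len(pairs)
    if bins < 1 or n < bins:
        raise ValueError("need at least one product per bin")
    ranges = []
    for i in range(bins):
        chunk = pairs[i * n // bins:(i + 1) * n // bins]
        values = [sugar for _, sugar in chunk]
        ranges.append(max(values) - min(values))
    return ranges


# Hypothetical toy data: the cheaper half shows the wider spread.
print(sugar_range_by_price_bin([1, 2, 3, 4], [5, 30, 12, 15], bins=2))
```

A list of ranges that shrinks from the first bin to the last would be consistent with the pattern described above, though the range is sensitive to outliers and to how many products fall in each bin.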

Finally, it is worth noting that comparing the amount of sugar to the sale unit price versus to the non-sale unit price yielded only a marginal difference, suggesting, jointly with the low correlation between discount amount and amount of sugar, that high-sugar items are not meaningfully more or less likely to be on sale than low-sugar items.

Such an analysis ventures into the territory at once of attempting retroactively to extract consumer preference by economic class and, in turn, of looking to assess the impact of this spread of preferences on the ability of lower- and higher-income consumers to purchase healthy products. The data collected herein suggest that while product prices and nutritional profiles may reflect consumer preferences to some degree, they do not, at least in the domain of yogurt, appear to have a meaningful effect on the ability of all consumers to purchase healthy foods; nor do the data suggest that grocery stores are specifically targeting lower-income consumers by more regularly putting high-sugar items on sale. In short, you can be on a budget and still find a reasonably healthy yogurt.

Limitations and Considerations for Future Inquiry

Conceptually, this project’s understanding of “healthy food” is rather narrow and anecdotal: rather than assessing the nutritional profile in its entirety, it focuses only upon sugar, fat, and protein and operates off of the overly simplistic assumption that sugar and fat are unhealthy and that protein is at the very least neutral.

Statistically, the project does not rigorously consider statistical significance, nor does it take multiple snapshots of the same data set across time to assess, for example, how sale prices fluctuate throughout the year. For this reason, the above conclusions, especially as they relate to the findings regarding discount amount, are informative only for the slice of time captured in this data set, most narrowly for August 30th, 2019.

Regarding the dataset itself, it is worthwhile to note that it comes from stock information available on the website of Safeway, Inc.; other stores may price the same items differently, or may have a reduced in-store stock, affecting the accuracy and conclusions of the above analysis. This fact prompts me to mention further that by pulling pricing data from a grocery store, this analysis brings together the results of more or less strictly on-site decisions, such as whether to have a sale on a certain item, as well as of pricing decisions made by the producers of all of the individual products.

Future work might consider the ingredients of each product, available in the linked data files in this post, asking, for example, whether cheaper yogurts are more likely to have more or fewer ingredients, to have sugar as a primary ingredient, and to have more processed ingredients; a marketing analysis could further examine the labels and sentiments associated with each product and with each brand. Although such information no doubt exists internally within each product’s marketing and development teams, an aggregate view of the larger ecosystem could provide interesting insight into the overall behavior and interaction of companies and consumers of different priorities and financial abilities.

See the original publication and download the relevant files at https://nikita-bogdanov.com/2019/09/02/cheap-sugar-correlation-price-sugar-content-yogurt/.

Nikita holds a BA in philosophy from Stanford University and is currently an MA student in English literature at Columbia University.