Stories by Pooja Mahajan on Medium

Quantization

Pooja Mahajan — Wed, 19 Feb 2025 17:22:04 GMT

Model compression techniques play a crucial role in optimizing deep learning models for deployment, especially in edge computing and real-time applications, and even more so in the era of LLMs. By reducing model size and improving inference efficiency, these techniques make it possible to deploy even complex models like LLMs effectively without significantly compromising performance.

Among these techniques, Quantization stands out as a popular and effective approach to reducing inference latency. The key idea is to reduce model size by using fewer bits to represent the model parameters. It is easy to implement, requires no special architectural modifications, and can generalize across different model architectures.

Before delving into quantization, let’s look at a few more details about the datatypes commonly used in this scenario.

Floating-Point Precision and Quantization Shifts

FP32: 1 sign bit, 8 exponent bits, 23 fraction bits
FP16: 1 sign bit, 5 exponent bits, 10 fraction bits
BF16: 1 sign bit, 8 exponent bits, 7 fraction bits (BF stands for brain float; this is quite a popular choice!)

The key point to note in this comparison is that BF16 has a larger range than FP16 (it keeps the 8 exponent bits of FP32), although its precision is lower than FP16 (7 fraction bits instead of 10).
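To make the range versus precision trade-off concrete, here is a small sketch that derives the largest finite value and the machine epsilon directly from the bit layouts listed above. It assumes the usual IEEE-style conventions (exponent bias of 2^(e-1) - 1, all-ones exponent reserved for inf/NaN); the helper name float_format_stats is made up for this illustration.

```python
def float_format_stats(exponent_bits: int, fraction_bits: int) -> dict:
    """Largest finite value and machine epsilon for an IEEE-style binary float."""
    bias = 2 ** (exponent_bits - 1) - 1      # 15 for FP16, 127 for FP32 and BF16
    max_exponent = bias                      # the all-ones exponent is reserved for inf/NaN
    largest = (2 - 2 ** -fraction_bits) * 2.0 ** max_exponent
    epsilon = 2.0 ** -fraction_bits          # gap between 1.0 and the next representable value
    return {"max": largest, "epsilon": epsilon}


formats = {"FP32": (8, 23), "FP16": (5, 10), "BF16": (8, 7)}
stats = {name: float_format_stats(e, f) for name, (e, f) in formats.items()}

for name, s in stats.items():
    print(f"{name}: max = {s['max']:.4g}, epsilon = {s['epsilon']:.4g}")
```

BF16 reaches almost the same maximum as FP32, while FP16 tops out at 65504; in exchange, the step size around 1.0 is eight times coarser for BF16 than for FP16.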

Comparison of Float datatypes

Common lower-precision datatypes: FP16, BF16, INT16, INT8.

Common quantization scenarios: FP32 -> FP16, FP32 -> BF16, FP32 -> INT8

Pros and Cons of Model Quantization

Pros

Ease of implementation
Reduced memory footprint
Faster training
Faster inferencing

Cons

Rounding to a coarser number scale can degrade model performance.
Overflow and underflow can distort values: numbers outside the representable range overflow, and numbers too small to represent are flushed to 0, which can change the model's behaviour.

Types of Quantization

Quantization-Aware Training: Quantization is taken into account during training itself, so the model learns weights that hold up in lower precision. When training runs in lower precision, each parameter needs less memory, which allows you to train larger models on the same hardware.
Post-Training Quantization: Quantization is applied after model training, at inference time, to optimize efficiency without retraining.

In this post, we will look more closely at post-training quantization approaches.

Approaches to Post-Training Quantization

Approach 1 — Downcasting

Downcasting involves converting a model’s parameters to a more compact data type, such as BF16, to reduce memory usage. During inference, computations are performed in this lower-precision format.
While downcasting generally works well with BF16, using smaller data types may lead to performance degradation. Notably, converting the model to an integer type like INT8 will likely cause incompatibility issues.

Model Casting

model.to(DATATYPE): Converts the model’s parameters to the specified datatype.
model.half(): Converts the model’s parameters to FP16 (half precision).
model.bfloat16(): Converts the model’s parameters to BF16 precision.

Example using “to” method
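A minimal sketch of such a cast, assuming a small toy model (the layer sizes and the size_in_bytes helper are made up for illustration and are not from the original example):

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for a real network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))


def size_in_bytes(module: nn.Module) -> int:
    """Total memory taken by the module's parameters."""
    return sum(p.numel() * p.element_size() for p in module.parameters())


fp32_bytes = size_in_bytes(model)           # parameters start out as torch.float32

model_bf16 = model.to(torch.bfloat16)       # casts the parameters and returns the same module
bf16_bytes = size_in_bytes(model_bf16)

print(next(model_bf16.parameters()).dtype)  # torch.bfloat16
print(fp32_bytes, "->", bf16_bytes, "bytes")

# Inference now runs in BF16, so the input is cast to the same dtype.
x = torch.randn(2, 16).to(torch.bfloat16)
with torch.no_grad():
    y = model_bf16(x)
```

The parameter memory halves, since every value goes from 4 bytes to 2 bytes.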

Approach 2 — Linear quantization

Linear quantization reduces the model size by storing only the quantized weights along with the scale and zero-point values required for the transformation.
It lets the quantized model stay much closer to the original model’s performance by converting from the compressed data type back to the original data type (FP32) during inference, which preserves performance while optimizing memory usage.
So when the model makes a prediction, the matrix multiplications are performed in FP32 and the activations are in FP32. This makes it possible to quantize the model to data types smaller than BF16, such as INT8, as well.

Image from — Deeplearning.ai

As per the above image, during linear quantization FP32 values are mapped to INT8, and the scale and zero-point parameters are saved. Dequantization (i.e. mapping back to FP32) can be performed using these linear mapping parameters (scale and zero point). When mapping back to FP32, we might not get exactly the same tensors, as linear quantization loses some information; this loss is known as quantization error.
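The mapping described above can be sketched in NumPy as follows. This is only one common way to pick the scale and zero point (asymmetric, per-tensor, from the min and max of the tensor); the function names are my own, and it assumes the tensor is not constant and spans both negative and positive values.

```python
import numpy as np


def linear_quantize(x: np.ndarray, dtype=np.int8):
    """Map a float tensor to integers: q = round(x / scale + zero_point)."""
    q_min, q_max = np.iinfo(dtype).min, np.iinfo(dtype).max
    r_min, r_max = float(x.min()), float(x.max())
    scale = (r_max - r_min) / (q_max - q_min)      # assumes r_max > r_min
    zero_point = int(np.clip(round(q_min - r_min / scale), q_min, q_max))
    q = np.clip(np.round(x / scale + zero_point), q_min, q_max).astype(dtype)
    return q, scale, zero_point


def linear_dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map integers back to floats: x is approximately scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)


rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)   # stand-in for FP32 weights

q, scale, zero_point = linear_quantize(weights)
restored = linear_dequantize(q, scale, zero_point)

print("scale:", scale, "zero point:", zero_point)
print("max quantization error:", float(np.max(np.abs(restored - weights))))
```

Only q (INT8), scale and zero_point need to be stored; the difference between restored and weights is the quantization error, which under these assumptions stays within about half a scale step.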

For more details and a practical implementation, refer to the code repo here.

So finally we have made it to the end of this blog, and this is just the tip of the quantization iceberg!


Batch vs Online Prediction

Pooja Mahajan — Mon, 11 Nov 2024 13:19:54 GMT

In this article, we will discuss one of the key aspects of the model deployment phase: choosing the prediction strategy, online or batch. The choice depends on factors such as latency requirements, prediction frequency, error tolerance, type of use case and end-user requirements. We will explore the key differences between the two and outline when each is most appropriate.

This blog is a continuation of my previous blog, Scratching the surface of ML Deployment.

AI Generated Image from Pixlr.com

Batch Prediction

In batch prediction, predictions are computed for multiple data samples at once on a schedule, then stored in a database and retrieved on request. During computation, prediction requests can be sent directly to the model resource without deploying it to an endpoint. It is also known as asynchronous prediction, as predictions are precomputed in advance, independently of the requests.
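As a rough sketch of this flow, the scheduled job scores every known user and stores the result, and serving a request is just a lookup. The toy_model function and the dictionary used as a "database" are hypothetical stand-ins.

```python
from typing import Callable, Dict, Iterable


def run_batch_job(user_ids: Iterable[str],
                  predict: Callable[[str], float],
                  store: Dict[str, float]) -> None:
    """Scheduled job: score every known user and persist the results."""
    for user_id in user_ids:
        store[user_id] = predict(user_id)


def serve(user_id: str, store: Dict[str, float]) -> float:
    """Request time: no model call, just a lookup of the precomputed score."""
    return store[user_id]


# Hypothetical stand-ins for a real model and a real database table.
def toy_model(user_id: str) -> float:
    return len(user_id) / 10.0


prediction_store: Dict[str, float] = {}
run_batch_job(["alice", "bob", "carol"], toy_model, prediction_store)
print(serve("bob", prediction_store))
```

Note that every user gets scored whether or not anyone ever asks for their prediction.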

Pros

Primarily optimized for throughput (i.e. processing large volumes of data in a single run).
Generates multiple predictions at once and can make use of distributed computing (e.g. Spark).
Can be scheduled at off-peak hours for efficient resource usage.

Cons

Generates predictions even for users or queries that may never be requested, resulting in unnecessary compute and storage usage.

When to use Batch Prediction?

Batch prediction is ideal when immediate results are not required, as in use cases like recommender systems, customer segmentation, etc., where predictions are needed at some interval (daily, weekly, monthly).
Batch prediction is useful when input queries are known in advance, allowing predictions to be generated all at once. For example, predicting which customers will buy a product can be run for your entire customer base. However, in cases like language translation, chatbots, or virtual assistants, where queries are unpredictable, batch prediction is not feasible.

Companies using batch prediction: Netflix uses batch prediction for recommendations, and DoorDash uses batch prediction for recommending restaurants.

Online Prediction

Online prediction, also known as on-demand prediction, generates predictions as and when requests arrive. Prediction is performed on a single observation of data rather than a batch. It is also called synchronous prediction, as predictions are generated in sync with requests. It can also be done at any time of the day, in real time, rather than on a schedule as in batch prediction.

Pros

Optimized for low latency (i.e. it processes the input and returns predictions almost instantly after receiving it).
Predicts only for the requests that actually arrive, unlike batch prediction, where predictions are made for all users, including those who may never need them.

Cons

Online prediction requires significant computational resources to handle real-time requests (for example, dynamic scaling), leading to higher operational costs.
It requires constant monitoring and maintenance to ensure optimal performance and quick issue resolution.

When to use Online Prediction? When predictions are needed immediately, as in real-time use cases like fraud detection, machine failure detection, self-driving vehicles, etc., where there is lower error tolerance. Online prediction is better suited to models with fast inference times, i.e. models that are less complex or, if they are complex, have been optimized.

Companies using online prediction: YouTube uses online prediction for video recommendations based on viewing history, and Amazon uses online prediction to recommend products to users in real time.

How about combining both?

While both online and batch prediction have advantages and disadvantages, combining the two is practical in many scenarios. For example, in ad targeting and personalization use cases, online prediction can be used to predict which ad to show a user based on their most recent behavior (searches, clicks, etc.), and batch prediction can be used to periodically update user profiles and segmentation by analyzing historic browsing habits and purchase history.
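A toy sketch of that combination might look like this. The segment names, the pick_ad function and its "latest event wins" rule are all invented for illustration; a real system would call a model in the online step.

```python
from typing import Dict, List

# Refreshed periodically by a batch job (hypothetical user segments).
batch_segments: Dict[str, str] = {"u1": "sports_fan", "u2": "bargain_hunter"}


def pick_ad(user_id: str, recent_events: List[str]) -> str:
    """Online step: combine the precomputed segment with the latest behaviour."""
    segment = batch_segments.get(user_id, "unknown")   # batch output, simple lookup
    if recent_events:                                  # fresh signal wins when present
        return f"ad_for_{recent_events[-1]}"
    return f"ad_for_{segment}"


print(pick_ad("u1", []))
print(pick_ad("u1", ["running_shoes"]))
```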

So that’s all folks!

We have made it to the end of this article. I hope it helped you grasp the key differences between the two prediction strategies and will be useful in guiding your decisions for future use cases.


Scratching the surface of ML Deployment

Pooja Mahajan — Sat, 14 Aug 2021 13:19:23 GMT

In this article we will discuss some deployment considerations and patterns. Deployment, being the last step of the machine learning cycle, has gained a lot of traction recently, so let’s try to get the hang of some of the concepts.

We will discuss some of the common deployment patterns, i.e. how to start consuming your new algorithm in production. Apart from this, we will touch on some concepts related to changing data distributions that affect model predictions.

Image Courtesy — Unsplash

Deployment Patterns

By deployment pattern we mean how to start consuming your new algorithm in production. It can be a replacement for an old algorithm, or a replacement for old methods (manual/conventional), etc.

A) Shadow deployment :

The aim of shadow deployment is to deploy a new algorithm and evaluate its performance without using it for real-time predictions.
The currently used method still serves the actual predictions, and its performance is compared with the new one (e.g. comparison with another deployed model, or with conventional methods using human predictions).
It’s a decent and risk-free way to judge a new model.
There is no impact on current production, and the new algorithm can be tested under production load.
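A minimal sketch of the idea, assuming both models are plain Python callables and the comparison log is an in-memory list (both are simplifications made for this example):

```python
from typing import Callable, List, Tuple

shadow_log: List[Tuple[float, float, float]] = []   # (input, current output, shadow output)


def handle_request(x: float,
                   current_model: Callable[[float], float],
                   shadow_model: Callable[[float], float]) -> float:
    served = current_model(x)            # this is what the caller receives
    try:
        shadowed = shadow_model(x)       # evaluated on the same input, never served
        shadow_log.append((x, served, shadowed))
    except Exception:
        pass                             # a failing shadow model must not affect production
    return served


print(handle_request(2.0, lambda v: v * 2, lambda v: v * 3))
print(shadow_log)
```

The logged pairs can then be compared offline to judge the new model before it serves any real prediction.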

B) Canary deployment:

The main idea of canary deployment is to serve predictions for a small proportion of traffic using the new algorithm and evaluate its performance.
It helps to monitor and spot problems early, if any.
It can be ramped up gradually (i.e. incremental increase in the traffic proportion for newer algorithm).
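The traffic split can be sketched with a simple random router. The ramp-up schedule below (5%, 25%, 100%) is a made-up example, and the seed is there only to make the sketch reproducible.

```python
import random


def route(canary_fraction: float, rng: random.Random) -> str:
    """Send roughly `canary_fraction` of requests to the new model."""
    return "new" if rng.random() < canary_fraction else "current"


rng = random.Random(0)                  # seeded only to make the sketch reproducible
for fraction in (0.05, 0.25, 1.0):      # hypothetical ramp-up schedule
    hits = sum(route(fraction, rng) == "new" for _ in range(10_000))
    print(f"target {fraction:.0%} -> observed {hits / 10_000:.1%}")
```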

C) Blue green deployment:

Here blue signifies the current prediction service and green the new prediction service.
We can set up the new prediction service separately, without stopping the current one, and then shift traffic to it. If it doesn’t go well, we can point back to blue.
Rollback is easier and there is no downtime, although running both services adds cost and operational overhead.
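In code, the switch boils down to changing a pointer while both services stay up. The PredictionRouter class below is a hypothetical illustration, not a real deployment tool.

```python
from typing import Callable, Dict


class PredictionRouter:
    """Holds both services and a pointer to the one that is live."""

    def __init__(self, blue: Callable[[float], float], green: Callable[[float], float]):
        self.services: Dict[str, Callable[[float], float]] = {"blue": blue, "green": green}
        self.live = "blue"                       # the current service keeps serving at first

    def switch_to(self, name: str) -> None:
        if name not in self.services:
            raise ValueError(f"unknown service: {name}")
        self.live = name                         # cut-over and rollback are the same operation

    def predict(self, x: float) -> float:
        return self.services[self.live](x)


router = PredictionRouter(blue=lambda x: x + 1, green=lambda x: x + 2)
router.switch_to("green")       # go live with the new service
router.switch_to("blue")        # roll back if it does not go well
```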

Image — by Author

Deployment considerations

Once the model is deployed it is important to understand changes in statistical distribution of the data being used. There are two aspects that should be considered.

A) Data drift

It arises when the distribution of the underlying variables changes, which degrades model predictions; in simple terms, when the distribution of the independent variables changes.
This drift can arise from changes in underlying business logic, data quality challenges, etc. For example, in image recognition data drift can arise from setup changes like lighting or device, and in speech recognition systems from a microphone upgrade.
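One simple way to check a single numeric feature for data drift is to compare its distribution at training time with what is seen in production. The sketch below uses the two-sample Kolmogorov-Smirnov statistic written directly in NumPy; this particular statistic and the synthetic data are my own choices for illustration, and the alert threshold would be a project decision.

```python
import numpy as np


def ks_statistic(reference: np.ndarray, current: np.ndarray) -> float:
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    grid = np.sort(np.concatenate([reference, current]))
    cdf_ref = np.searchsorted(np.sort(reference), grid, side="right") / reference.size
    cdf_cur = np.searchsorted(np.sort(current), grid, side="right") / current.size
    return float(np.max(np.abs(cdf_ref - cdf_cur)))


rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)     # what the model was trained on
same_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)      # production data, no drift
shifted_feature = rng.normal(loc=1.0, scale=1.0, size=5_000)   # production data after a shift

print("no drift:", ks_statistic(train_feature, same_feature))
print("drifted: ", ks_statistic(train_feature, shifted_feature))
```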

B) Concept drift

It arises when the relation between the predictor and target variables changes, resulting in degraded model predictions; simply put, when the mapping x -> y changes (x being the independent variable and y the dependent variable).
A common example of concept drift is the online shopping patterns of customers before and after COVID, as there have been a lot of changes in consumer buying behavior. Another example is a price prediction model where the relation to the target variable (price) has changed due to inflation.

So that’s it, we have made it to the end of this article. It’s just the tip of the deployment iceberg; there is a lot underneath!


Scratching the surface of ML Deployment was originally published in The Startup on Medium, where people are continuing the conversation by highlighting and responding to this story.

Avoid for loops with Vectorization

Pooja Mahajan — Fri, 30 Jul 2021 13:25:00 GMT

Vectorization in Python

In this article, we will discuss how to speed up your code using NumPy’s vectorization.

NumPy arrays enable us to express batch operations on data without writing any explicit for loops; this is referred to as vectorization. It is one of the key optimization features of NumPy, apart from broadcasting.

Let’s dig deep into vectorization!

Vectorized array operations execute faster than their pure Python equivalents and thus have a high impact on numerical computations. NumPy is efficient because its operations are mapped to highly optimized C code.

Image — Unsplash.com

In deep learning practice, we often train on relatively large datasets, so it’s important that our code is optimized to execute fast. In these scenarios vectorization is quite useful: it enables us to take much better advantage of parallelism and to do our computations much faster on CPUs as well as GPUs.

Examples to showcase impact of Vectorization

Example 1 — Dot product operation

Approach 1 — Using for Loop

# Import libraries
import math
import time

import numpy as np

# 1. Using a for loop
n = 1000000
arr1 = np.random.rand(n)
arr2 = np.random.rand(n)

# Dot product using a for loop
start_time_1 = time.time()
output = 0
for i in range(n):
    output += arr1[i] * arr2[i]

# time.time() returns seconds, so the elapsed time is reported in seconds
print("time taken using for loop approach", time.time() - start_time_1, "s")
print("Output", output)

Approach 2 — Using NumPy

# 2. Using vectorization
start_time_2 = time.time()
output1 = np.dot(arr1, arr2)
# Elapsed time is in seconds, because time.time() returns seconds
print("time taken using vectorization", time.time() - start_time_2, "s")
print("Output", output1)

In the above example, the same dot product is computed both ways, and the difference is clearly visible: the ‘for’ loop takes about 0.6 while vectorization takes merely 0.003 (both values are in seconds, since time.time() measures seconds).

Example 2 : Exponent operation

n = 100000000
arr1 = np.random.rand(n)
output = np.zeros(n)

# Approach 1: explicit for loop
print("Using for loop")
start_time = time.time()
for i in range(n):
    output[i] = math.exp(arr1[i])

# time.time() returns seconds, so the elapsed time is in seconds
print("time taken", time.time() - start_time, "s")
print("Output - first 3 values", output[:3])

# Approach 2: vectorized NumPy call
print("Numpy implementation")
start_time = time.time()
output1 = np.exp(arr1)
print("time taken", time.time() - start_time, "s")
print("Output - first 3 values", output1[:3])

The time taken by the ‘for’ loop is far higher than with vectorization.

In the run above, the ‘for’ loop reported a time of about 70 (in seconds, since time.time() measures seconds). That may still feel small, but vectorization shows its core strength when working with big datasets, where it executes the same computations at a much faster speed!


Understanding Avoidable Bias!

Pooja Mahajan — Tue, 18 May 2021 11:52:26 GMT

In this article, we will compare our model’s accuracy with human-level performance and discuss concepts like avoidable bias and how to tackle it!


What is avoidable bias?

The difference between human error (approximation of Bayes error) and the training error is termed avoidable bias.

The perfect level of accuracy may not always be 100%. Bayes optimal error is the lowest error that is theoretically achievable; no function mapping inputs to outputs can do better.

Comparison with human-level performance

As long as our model is doing worse than humans, we can use certain tactics to improve it. Knowing about bias and variance helps, and it turns out that knowing how well humans can do on a task helps us understand how much we should try to reduce bias!

Scenario 1: You have built a cat classifier and the error percentages for the training and dev sets are 8% and 10% respectively. Let’s say human error is 0.5% for this dataset.

In this example:

Avoidable Bias is 7.5% and variance is 2%.
The next steps should focus on reducing the avoidable bias.

How to reduce Avoidable bias?

Train a bigger neural network.
Train longer or with better optimization algorithms.
Find a better neural network architecture.

Scenario 2: You have built a cat classifier and the error percentages for the training and dev sets are 8% and 10% respectively, and human error for this dataset is 7.2% (the images were of such poor quality that even human eyes find it difficult to classify a cat in them).

In this example:

Avoidable Bias is 0.8% and variance is 2%.
The next steps should focus on reducing the variance.
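The arithmetic behind both scenarios is simple enough to capture in a few lines. The diagnose function below is a hypothetical helper; its rule of "work on whichever gap is larger" is a simplification of the reasoning above.

```python
def diagnose(human_error: float, train_error: float, dev_error: float) -> dict:
    """All errors in percent; human error is used as a proxy for Bayes error."""
    avoidable_bias = round(train_error - human_error, 2)
    variance = round(dev_error - train_error, 2)
    focus = "bias" if avoidable_bias > variance else "variance"
    return {"avoidable_bias": avoidable_bias, "variance": variance, "focus": focus}


print(diagnose(human_error=0.5, train_error=8.0, dev_error=10.0))   # Scenario 1
print(diagnose(human_error=7.2, train_error=8.0, dev_error=10.0))   # Scenario 2
```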

How to reduce Variance?

Get more training data.
Regularisation (Dropout, L2 regularisation, Batch Normalisation, etc.)
Data Augmentation
Early stopping
Find a better neural network architecture.

If you want to read more about these pointers mentioned above refer to my previous post.

So that’s it! You have made it to the end of the post. For more information about these concepts check out the references below!


AI in Healthcare -Closer Look!

Pooja Mahajan — Wed, 07 Apr 2021 15:58:35 GMT

Photo by Kendal on Unsplash

In this article, I will be taking you through some of the practical considerations and challenges of using AI in Medical diagnosis. The criticality involved in this domain makes it important to pay heed to these challenges apart from the core AI/ML practices.

Dataset considerations

1. Imbalanced dataset: There are usually many more data samples without disease than with disease, which results in imbalanced datasets and poses issues when training algorithms for medical analysis.

Solution:

a) Resampling data (Oversampling, Undersampling, etc.)

b) Modifying the loss function, i.e. using a weighted loss to account for the imbalanced classes.

2. Small training dataset: Labeled data is often absent or scarce, especially for image analysis tasks like thoracic disease detection using chest X-rays, brain tumor detection using MRI scans, etc.

Solution:

a) Transfer Learning: Transfer learning implies adapting a network trained for one problem to a different problem. With transfer learning, we can build good classifiers with a few hundred images. Find more info on transfer learning here.

b) Data Augmentation: We can apply different transformations to the data to increase the breadth of the training data. With medical images, the one thing to be careful about is choosing transformations such that the Y-label (ground truth) remains true after the transformation.

For example, a horizontal flip of a chest X-ray makes the heart appear on the opposite side, which implies a different kind of medical condition. Find more info on data augmentation here.
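Coming back to the weighted loss suggested for imbalanced datasets, here is a minimal NumPy sketch. The weighting scheme (each class weighted by the frequency of the other class) is one common choice that I am assuming here; other weightings are possible.

```python
import numpy as np


def weighted_bce(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy where each class is weighted by the other class's frequency."""
    pos_freq = y_true.mean()                  # fraction of positive (disease) samples
    w_pos, w_neg = 1.0 - pos_freq, pos_freq   # the rare class gets the larger weight
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    loss = -(w_pos * y_true * np.log(y_prob)
             + w_neg * (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(loss.mean())


labels = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)   # 10% positive
uncertain = np.full(10, 0.5)                                      # model outputs 0.5 everywhere

print(weighted_bce(labels, uncertain))
```

With these weights, the single positive sample contributes as much to the loss as the nine negative samples together.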


Model Testing considerations

Patient overlap: While creating training, validation and test sets, make sure that there is no patient overlap among them, as having the same patients in the training and test sets can produce over-optimistic test results. Splitting by patient resolves this issue.
Deciding ground truth: Deciding on the ground truth for a sample is a different challenge. It can arise from interobserver disagreement, where one practitioner’s opinion differs from another’s, which is quite common in medicine. Consensus voting, where a group of human experts determines the ground truth, is one way to tackle this issue.
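A patient-level split can be sketched as below: we shuffle patient IDs rather than individual images, so all images of one patient land on the same side. The record format and the function name are assumptions made for this illustration.

```python
import random
from typing import Dict, List, Tuple


def split_by_patient(records: List[Tuple[str, str]], test_fraction: float, seed: int = 0
                     ) -> Dict[str, List[Tuple[str, str]]]:
    """records are (patient_id, image_name) pairs; whole patients go to one side."""
    patients = sorted({patient_id for patient_id, _ in records})
    random.Random(seed).shuffle(patients)
    n_test = max(1, int(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    split: Dict[str, List[Tuple[str, str]]] = {"train": [], "test": []}
    for record in records:
        split["test" if record[0] in test_patients else "train"].append(record)
    return split


# Hypothetical records: some patients have more than one image.
records = [("p1", "img_a"), ("p1", "img_b"), ("p2", "img_c"),
           ("p3", "img_d"), ("p3", "img_e"), ("p4", "img_f")]
split = split_by_patient(records, test_fraction=0.25)
train_ids = {pid for pid, _ in split["train"]}
test_ids = {pid for pid, _ in split["test"]}
print(train_ids & test_ids)   # set()
```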

Practical Considerations

Different populations’ characteristics: Achieving reliable generalization is a challenge for AI models using medical data. For example, chest X-ray data from India may look quite different from US data, so a model built on US patients’ data may not be very effective for other geographies.
External validation: AI models are built on historical data, but how they perform on real-world data determines whether a model can be used directly or needs to be fine-tuned on new data. To understand the utility of AI models in the real world, they need to be applied to real-world (prospective) data.

So, we have reached the end of this post and learned about dataset challenges, model testing, and practical implementation challenges that can be examined for healthcare use cases.


AI in Healthcare -Closer Look! was originally published in CodeX on Medium, where people are continuing the conversation by highlighting and responding to this story.

Transfer Learning using PyTorch

Pooja Mahajan — Tue, 16 Mar 2021 16:51:09 GMT


Transfer learning implies adapting a network trained for one problem to a different problem. It is common to pre-train a CNN on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use that either as an initialization or a fixed feature extractor for the task of interest.

Why use Transfer Learning?

Using large networks that were trained with vast datasets for our new tasks reduces time and computation requirements.
It is relatively rare to have a dataset of sufficient size. With transfer learning, we can build good classifiers with a few hundred images.

Types of Transfer Learning:

Finetuning — Starting with a pre-trained model and updating all of the model’s parameters for the new task and thus retraining the whole model.
Feature Extraction — Starting with a pre-trained model and only updating the final layer weights from which predictions will be derived. Pre-trained CNN is used as a fixed feature-extractor, hence the name.

Transfer learning using the pre-trained model

The models in PyTorch’s torchvision.models come with weights pre-trained on the 1000-class ImageNet dataset. In the example below, I implement feature-extraction transfer learning: we load the pre-trained model and update only the final layer. I am using the CIFAR-10 dataset and the pre-trained ResNet18 model.

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.
The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]

Code snippet showcasing Image resize to (224,224) and Normalisation using suggested values
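As a rough sketch of what the normalisation step does to an image that has already been resized to 224 x 224, here is the arithmetic in plain NumPy. Using NumPy instead of torchvision transforms is my own simplification to keep the example dependency-light, and the random image is a stand-in for a real one.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_image(image_uint8: np.ndarray) -> np.ndarray:
    """(H, W, 3) uint8 RGB image -> (3, H, W) float32 array, normalised per channel."""
    scaled = image_uint8.astype(np.float32) / 255.0          # load into [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD     # broadcast over the channel axis
    return normalized.transpose(2, 0, 1)                     # channels first, as PyTorch expects


# Hypothetical stand-in for a CIFAR-10 image already resized to 224 x 224.
fake_image = np.random.default_rng(0).integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
tensor_like = normalize_image(fake_image)
print(tensor_like.shape)  # (3, 224, 224)
```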

Instantiating a pre-trained model downloads its weights to a cache directory. After that, we freeze the model weights by setting requires_grad to False.

When we print the model, we see that the last layer is a fully connected layer.

So we reinitialize model.fc so that it has 10 output features (corresponding to our task).
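The freeze-then-replace pattern can be sketched without downloading any weights. TinyBackbone below is a hypothetical stand-in for the pre-trained ResNet18; the only thing it shares with the real model is a final fully connected layer named fc with 512 input features.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pre-trained network whose last layer is `fc`."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 512), nn.ReLU())
        self.fc = nn.Linear(512, 1000)       # 1000 ImageNet-style outputs

    def forward(self, x):
        return self.fc(self.features(x))


model = TinyBackbone()

for param in model.parameters():             # freeze every existing weight
    param.requires_grad = False

# A freshly created layer has requires_grad=True, so only it will be trained.
model.fc = nn.Linear(model.fc.in_features, 10)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
```

When building the optimizer, only the parameters with requires_grad set to True need to be passed in.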

A glimpse of model summary (last few layers) — We can see the last layer has 10 output features as updated in the previous step.

Model Training: Within only a few epochs, you can see that the train and test accuracies have reached 80%.

You can find the source code for the example shown here.

You can find more about some common rules of thumb for when and how to finetune here.


Transfer Learning using PyTorch was originally published in Towards Dev on Medium, where people are continuing the conversation by highlighting and responding to this story.

Broadcasting in Python

Pooja Mahajan — Wed, 18 Nov 2020 13:25:51 GMT

In this post, we will discuss ‘Broadcasting’ using NumPy. It is also widely used when implementing neural networks, as broadcast operations are efficient in both memory and computation.

So let’s understand what Broadcasting means followed by a few examples!

Broadcasting describes the way NumPy treats arrays with different shapes during arithmetic operations. The smaller array is broadcast across the larger array so that they have compatible shapes. It provides a way to vectorize array operations, thus leading to efficient implementations.

Broadcasting Examples- Image Source

The light-bordered boxes represent the broadcast values. This extra memory is not actually allocated during the operation, but it can be useful conceptually to imagine that it is.

Normal arithmetic operation and broadcasting comparison

In the above example, the output is the same in both cases, but the second case uses broadcasting and is more efficient. NumPy uses the original scalar value without actually making copies of it, which is what makes broadcasting operations more efficient.
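A comparison along these lines can be written as follows (the array values are my own, chosen only for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

# Case 1: build an array of the same shape by hand, then add element-wise.
b_full = np.array([2.0, 2.0, 2.0])
without_broadcasting = a + b_full

# Case 2: let NumPy broadcast the scalar.
with_broadcasting = a + 2.0

print(without_broadcasting)  # [3. 4. 5.]
print(with_broadcasting)     # [3. 4. 5.]
```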

More Broadcasting examples

Example 1: Arithmetic operation between two arrays where the first array has shape (m,n) and the second array has shape (m,1).


Final Output shape is (2,3) corresponding to ‘a’

Example 2: Arithmetic operation between two arrays where the first array has shape (m,n) and the second array has shape (1,n).


Final Output shape is (2,3) corresponding to ‘a’

Example 3: Arithmetic operation between an array and a scalar value where the array has shape (1,n).


Final Output shape is (1,4) corresponding to ‘a’

Example 4: Arithmetic operation between an array and a scalar value where the array has shape (m,1).


Final Output shape is (4,1) corresponding to ‘a’
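The four cases above can be reproduced with a short script like this one. The array values and variable names are my own; only the shapes follow the examples.

```python
import numpy as np

# Example 1: (m, n) + (m, 1)
a = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
b = np.array([[10], [20]])                # shape (2, 1), stretched across the columns
ex1 = a + b

# Example 2: (m, n) + (1, n)
c = np.array([[100, 200, 300]])           # shape (1, 3), stretched across the rows
ex2 = a + c

# Example 3: (1, n) + scalar
row = np.array([[1, 2, 3, 4]])            # shape (1, 4)
ex3 = row + 5

# Example 4: (m, 1) + scalar
col = np.array([[1], [2], [3], [4]])      # shape (4, 1)
ex4 = col + 5

for name, result in [("ex1", ex1), ("ex2", ex2), ("ex3", ex3), ("ex4", ex4)]:
    print(name, result.shape)
```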

In the examples above I have used the ‘+’ operation to demonstrate broadcasting; the same works for other arithmetic operators in a similar fashion.

So that's it! You have made it to the end of this quick demo of Broadcasting in Python.


Image Filters with Convolutions

Pooja Mahajan — Thu, 12 Nov 2020 12:25:56 GMT

Using Python’s scipy library for implementation

In this post, I will discuss convolutions and how they act as image filters, by implementing the convolution operation with a few edge detection kernels.

So let’s start!

What are Convolutions?

A convolution is a mathematical operation on two functions that produces a third function expressing how one is modified by the other.

To give an example, the first function can be the image and the second a matrix (the kernel) that slides over the image and transforms it. The kernel works by multiplying, element by element, the patch of the image that matches the kernel’s size, summing up the result, and then sliding over the image to repeat the same process.
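The multiply, sum and slide procedure can be written out by hand in a few lines. This sketch assumes no padding and a stride of 1, and it applies the kernel as is; a strict mathematical convolution would flip the kernel first, which makes no difference for the symmetric kernel used here.

```python
import numpy as np


def slide_kernel(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Multiply each kernel-sized patch by the kernel and sum (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
kernel = np.ones((2, 2))                           # hypothetical kernel: sums each 2x2 patch
result = slide_kernel(image, kernel)
print(result)
```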


Thus, Convolution operation can help in extracting features from images with the help of kernels. For more details on Convolutions and Kernels, please refer to my previous blog here.

Image Filters with Convolutions

In order to demonstrate how the convolution operation can be used as an image filter, I have used the signal.convolve2d function from Python’s scipy library, which outputs the convolution of two 2D arrays.

Importing required libraries

Loading and displaying images

Defining a function to detect features from the image using specified kernel

Detecting horizontal edges

Transformed image after convolution operation using a horizontal edge detector kernel

Detecting vertical edges

Transformed image after convolution operation using a vertical edge detector kernel

Detecting diagonal edges

Transformed image after convolution operation using a diagonal detector kernel
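The steps above can be sketched end to end on a synthetic image. The detect_features helper, the Prewitt-style kernels and the tiny 6x6 image are assumptions made for this illustration, not the exact code or kernels behind the images in this post.

```python
import numpy as np
from scipy import signal


def detect_features(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve a 2D grayscale image with a kernel, keeping only fully covered positions."""
    return signal.convolve2d(image, kernel, mode="valid")


# Synthetic 6x6 image: dark top half, bright bottom half (a single horizontal edge).
image = np.zeros((6, 6))
image[3:, :] = 1.0

horizontal_kernel = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])   # Prewitt-style, assumed
vertical_kernel = horizontal_kernel.T

horizontal_edges = detect_features(image, horizontal_kernel)
vertical_edges = detect_features(image, vertical_kernel)

print(np.abs(horizontal_edges).max(), np.abs(vertical_edges).max())
```

The horizontal kernel responds only on the rows around the edge, while the vertical kernel gives no response at all, since this image has no vertical edge.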

This was a quick demo to show that when a kernel convolves over an image, it can act as a filter that extracts a particular feature from it.

You can find the corresponding codes here.

Fully Connected vs Convolutional Neural Networks

Pooja Mahajan — Fri, 23 Oct 2020 05:45:12 GMT

Implementation using Keras

In this post, we will cover the differences between a Fully connected neural network and a Convolutional neural network. We will focus on understanding the differences in terms of the model architecture and results obtained on the MNIST dataset.

Fully connected neural network

A fully connected neural network consists of a series of fully connected layers, each of which connects every neuron in one layer to every neuron in the next layer.
The major advantage of fully connected networks is that they are “structure agnostic” i.e. there are no special assumptions needed to be made about the input.
While being structure agnostic makes fully connected networks very broadly applicable, such networks do tend to have weaker performance than special-purpose networks tuned to the structure of a problem space.

Multilayer Deep Fully Connected Network, Image Source

Convolutional Neural Network

CNN architectures make the explicit assumption that the inputs are images, which allows encoding certain properties into the model architecture.
A simple CNN is a sequence of layers, and every layer of a CNN transforms one volume of activations to another through a differentiable function. Three main types of layers are used to build CNN architecture: Convolutional Layer, Pooling Layer, and Fully-Connected Layer.

To know more about the basic fundamentals related to CNN, check out my earlier blogs on Convolutions and Pooling.

Simple Convolutional architecture, Image Source

Dataset Used

MNIST (Modified National Institute of Standards and Technology database) is a dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.
It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

Model Implementation

A) Using Fully Connected Neural Network Architecture

Model Architecture

For the fully-connected architecture, I have used a total of three hidden layers with ‘relu’ activation function apart from input and output layers.

Model Summary

The total number of trainable parameters is around 0.3 million. In a fully-connected layer, for n inputs and m outputs, the number of weights is n*m. Additionally, you have a bias for each output node, so total (n+1)*m parameters.
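The (n+1)*m rule is easy to check with a few lines of Python. The layer sizes below are hypothetical (a flattened 28x28 input followed by three hidden layers), not the exact model from this post.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Weights (n * m) plus one bias per output node."""
    return (n_inputs + 1) * n_outputs


# Hypothetical layer sizes for a flattened 28x28 input; not the exact model from this post.
layer_sizes = [784, 256, 128, 64, 10]
per_layer = [dense_params(n, m) for n, m in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

Most of the parameters sit in the first layer, because every input pixel is connected to every neuron of the first hidden layer.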

Model Accuracy

On training the fully connected model for five epochs with a batch size of 128 and a validation split of 0.3, we got a training accuracy of 98.6% and a validation accuracy of 96.07%. Moreover, after the 2nd epoch, the plot shows the train and validation accuracies starting to move apart.

Accuracy on Test data

On test data with 10,000 images accuracy for the fully connected neural network is 96%.

B) Using Convolutional Neural Network Architecture

Model Architecture

For Convolutional Neural network architecture, we added 3 convolutional layers with activation as ‘relu’ and a max pool layer after the first convolutional layer.

Model Summary

With the CNN, the differences you can notice in the summary are the output shapes and the number of parameters. Compared to the fully connected neural network model, the total number of parameters is much lower, i.e. 0.1 million.
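The lower count comes from weight sharing: a convolutional layer only learns one small filter per output channel, regardless of the image size. A quick sketch of the standard count for a 2D convolutional layer, using hypothetical 3x3 layers rather than the exact ones from this post:

```python
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, out_channels: int) -> int:
    """One (kernel_h x kernel_w x in_channels) filter plus a bias per output channel."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


# Hypothetical 3x3 layers on a grayscale input; not the exact model from this post.
print(conv2d_params(3, 3, 1, 32))    # 320
print(conv2d_params(3, 3, 32, 64))   # 18496
```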

Model Accuracy

On training the CNN for five epochs with a batch size of 128 and a validation split of 0.3, we got a training accuracy of 99.19% and a validation accuracy of 99.63%. Moreover, unlike the fully connected model, the train and validation accuracies do not move as far apart.

Accuracy on the Test dataset

On test data with 10,000 images, accuracy for the convolutional neural network is 98.9%.

Final Thoughts

Although fully connected networks make no assumptions about the input, they tend to perform worse and aren’t good at feature extraction. They also have a higher number of weights to train, which results in longer training time. CNNs, on the other hand, are trained to identify and extract the best features from the images for the problem at hand, with relatively fewer parameters to train.

Please find the relevant codes used in this blog here. On similar lines, you can find the implementation of CNNs on the FMNIST dataset using PyTorch here.


Fully Connected vs Convolutional Neural Networks was originally published in The Startup on Medium, where people are continuing the conversation by highlighting and responding to this story.