Progressive Summarization V: The Faster You Forget, The Faster You Learn
In Part I, I introduced Progressive Summarization, a method for easily creating highly discoverable notes. In Part II, I gave you examples and metaphors of the method in action. Part III included my top recommendations for how to perform it effectively. Part IV showed how to apply the technique to non-text media.
In Part V, I’ll show you how Progressive Summarization directly contributes to the ultimate outcome we’re seeking with our information consumption: learning.
The burden of perfect memory
In traditional schooling, the ability to recall something from memory is taken as the clearest evidence that someone has learned it. This is the regurgitation model of learning — the more accurately you can reproduce what you were taught, without adding any of your own interpretation or creativity, the higher your mark.
But in the real world, perfect recall is far from ideal.
This New York Times article tells the fascinating story of the 60 or so people known to have a condition called Highly Superior Autobiographical Memory (HSAM). They can remember most of the days of their lives as clearly as the rest of us remember yesterday. Ask one of them what they were doing on the afternoon of March 16, 1996, and within just a few seconds they’ll be able to describe that day in vivid detail.
These are people who have achieved the holy grail of recall — perfect memory. And yet, they often describe it as a burden:
“Everyone has those forks in the road, ‘If I had just done this and gone here, and nah nah nah,’ everyone has those,” she told me. “Except everyone doesn’t remember every single one of them.” Her memory is a map of regrets, other lives she could have lived. “I do this a lot: what would be, what would have been, or what would be today,” she said….“I’m paralysed, because I’m afraid I’m going to fuck up another whole decade,” she said. She has felt this way since 30 March, 2005, the day her husband, Jim, died at the age of 42. Price bears the weight of remembering their wedding on Saturday, 1 March 2003, in the house she had lived in for most of her life in Los Angeles, just before her parents sold it, as heavily as she remembers seeing Jim’s empty, wide-open eyes after he suffered a major stroke, had fallen into a coma and been put on life support on Friday, 25 March 2005.
It seems that perfect memory isn’t quite the blessing you’d expect.
The importance of forgetting
I propose that forgetting is just as important to the process of learning as recall. As the world changes faster and more unpredictably, attachment to ideas and paradigms of the past becomes more and more of a liability.
Contrast this with most books and courses on “accelerated learning,” which tend to offer two kinds of approaches:
#1 Increase the flow of information entering the brain
This leads to techniques like spritzing, listening to audiobooks at 2x speed, speed reading, focusing on already highly condensed sources, blocking distractions, deep focus, and binaural beats.
#2 Improve memory and recall of this information
This leads to techniques like spaced repetition, memory palaces, mnemonics, music and rhyming, acronyms, and mindmapping.
All these techniques work. And they completely miss the point. Both approaches operate from the same misguided metaphor: the mind as an empty vessel. You fill it with information like filling a jug with water, which you can then retrieve and put to use later. With this framing, your goal is to maximize how much you can get in and how much you can take out.
But there’s a fundamental difference between a mind and a static container like a jug of water or a filing cabinet: a mind doesn’t just store things; it can take action. And taking action is where true learning actually takes place.
Here’s the problem: the more we optimize for storage, the more we interfere with action. The more information we try to consume, meticulously catalogue, and obsessively review, the less time and space remain for the actions that matter: application, implementation, experimentation, conversation, immersion, experience, collaboration, making mistakes.
Learning is not an activity, process, or outcome that you can dial in and optimize to perfection. It is an emergent phenomenon, like consciousness, attention, or love. These states become harder and harder to achieve by trying to force them, a phenomenon known as hyper-intention.
The truth is, we don’t need to “accelerate” or “improve” the way our mind learns — that is what it evolved to do. All day, all night, whether you’re working or resting, talking or listening, focused or mind-wandering — your brain never stops drawing relationships, making connections, and noticing correlations. You couldn’t stop learning if you wanted to.
Knowing that our brain is continuously collecting information, we can switch our goal from remembering as much as possible to forgetting as much as possible.
The information bottleneck
Compare this dim view of perfect memory with this article on new deep learning techniques in artificial intelligence, and specifically a new theory called the “information bottleneck.”
The basic question researchers were trying to answer was, how do you decide which are the most relevant features of a given piece of information? When you hear someone speak a sentence, how do you know to ignore their accent, breathing sounds, background noise, and even words you didn’t quite catch, and still receive the gist of the message? It is a problem fundamental to artificial intelligence research, since computers will tend to give equal weight to all these inputs, and thus end up thoroughly confused.
It turns out, our highly constrained bandwidth for absorbing information is not a hindrance, but key to our ability to perform this feat. What our brain does is discard as much of the incoming noisy data as possible, reducing the amount of data it has to track and process. In other words, our brain’s ability to “forget” as much information as quickly as possible is what allows us to focus on the core message.
This is also how advanced new deep learning techniques work. Take for example an algorithm being trained to recognize images of dogs. A set of training data (thousands of dog photos) is fed into the algorithm, and a cascade of firing activity sweeps upward through layers of artificial neurons. When the signal reaches the top layer, the final firing pattern is compared to the correct label for the image — “dog” or “no dog.” Any difference between the final pattern and the correct pattern is “back-propagated” down the layers. Like a teacher correcting an exam and handing it back, the algorithm strengthens or weakens the network’s connections to make it better at producing the correct label next time.
This process unfolds in two phases: in an initial “fitting” phase, the algorithm “memorizes” the training data, learning everything it can about how to assign the correct labels. This is followed by a much longer “compression” phase, during which it gets better at generalizing what it has learned to new images it hasn’t seen before.
The key to this compression phase is the rapid shedding of noisy data, holding onto only the strongest correlations. For example, over time the algorithm will weaken connections between photos of dogs and houses, since most photos don’t contain both. It might at the same time strengthen connections between “dogs” and “fur,” since that is a stronger correlation. It is the “forgetting of the specifics,” the researchers argue, that enables the algorithm to learn general concepts, not just memorize millions of photos. Experiments show that deep learning algorithms rapidly improve their performance at generalization only in the compression phase.
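To make the mechanism concrete, here is a minimal sketch of that training loop in Python (using PyTorch). It is not the researchers’ actual setup — the data, model size, and “dog / no dog” labels are all stand-ins — but it shows the cycle described above: activity sweeps up through the layers, the output is compared to the correct label, and the difference is back-propagated to strengthen or weaken connections.

```python
# A minimal sketch, not the researchers' actual experiment: a tiny classifier
# trained by back-propagation. The data, model size, and "dog" labels are
# invented for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
inputs = torch.randn(1000, 64)                    # stand-ins for image features
labels = (inputs[:, 0] > 0).float().unsqueeze(1)  # arbitrary "dog / no dog" labels

# A small stack of layers of "artificial neurons"
model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

loss_fn = nn.BCEWithLogitsLoss()  # compares the network's output to the correct label
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    logits = model(inputs)          # firing activity sweeps upward through the layers
    loss = loss_fn(logits, labels)  # difference between final pattern and correct label
    optimizer.zero_grad()
    loss.backward()                 # the difference is "back-propagated" down the layers
    optimizer.step()                # connections are strengthened or weakened
```

The fitting-versus-compression distinction isn’t visible in a loop this simple — in the information-bottleneck experiments, the researchers track how much information about the raw input each layer retains over many such epochs — but the loop itself is all the machinery involved.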
The key to generalizing the information we consume — to learning — is strictly limiting how much information we take in in the first place, AND then forgetting as much of the extraneous detail as possible, as soon as we can. Sure, we lose some detail, but detail is not what the brain is best at anyway. It is best at making meaning, at finding order in chaos, at seeing the signal in the noise.
This paper on the role of forgetting in learning used problem-solving algorithms to determine exactly how much forgetting was optimal. Using a series of experiments testing different hypotheses, they found that the optimal strategy involved learning a large body of knowledge initially, followed by random forgetting of approximately 90% of the knowledge acquired. In other words, performance improved as knowledge was forgotten, right up until the 90% mark, after which it rapidly deteriorated.
Strikingly, they found that this was true even if that 90% included problem-solving routines known to be correct and useful. Trying to “forget” only the least useful knowledge also didn’t help — random forgetting performed far better. The researchers used these results to argue for the existence of “knowledge of negative value” — forgetting it actually adds value.
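The core of that procedure is simple enough to sketch. The toy Python functions below illustrate the two forgetting strategies compared above — random forgetting versus deliberately dropping the “least useful” items — with the roughly 90% rate from the paper as the default. The knowledge items and usefulness scores are invented for illustration; this is not the paper’s actual problem-solving system.

```python
# A toy sketch of the two forgetting strategies described above; the items and
# usefulness scores are illustrative, not the paper's problem-solving routines.
import random

def forget_randomly(knowledge, forget_rate=0.9, seed=None):
    """Drop each item with probability forget_rate, regardless of its value."""
    rng = random.Random(seed)
    return [item for item in knowledge if rng.random() > forget_rate]

def forget_least_useful(knowledge, usefulness, forget_rate=0.9):
    """Targeted forgetting: keep only the items with the highest usefulness scores."""
    n_keep = max(1, int(len(knowledge) * (1 - forget_rate)))
    ranked = sorted(knowledge, key=lambda item: usefulness[item], reverse=True)
    return ranked[:n_keep]

# Example: 1,000 learned "rules", each with a made-up usefulness score
acquired = [f"rule_{i}" for i in range(1000)]
scores = {rule: random.random() for rule in acquired}

print(len(forget_randomly(acquired, seed=42)), "rules survive random forgetting")
print(len(forget_least_useful(acquired, scores)), "rules survive targeted forgetting")
```

The counterintuitive finding, per the paper, was that the first strategy outperformed the second.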
Progressive Summarization is not a method for remembering as much as possible — it is a method for forgetting as much as possible. For offloading as much of your thinking as possible, leaving room for imagination, creativity, and mind-wandering. Preserving the lower layers provides a safety net that gives you the confidence to reduce a text by an order of magnitude with each pass. You are free to strike out boldly on the trail of a hidden core message, knowing that you can walk it back to previous layers if you make a mistake or get lost.
Minimizing cognitive load
How does Progressive Summarization help you offload as much of your thinking as possible? By minimizing the cognitive burden of interacting with information at all stages — initial consumption, review, and retrieval.
Cognitive load theory (CLT) was developed in the late 1980s by John Sweller while studying problem solving and learning in children. He looked at how different kinds of tasks placed different demands on people’s working memory. The more complex and difficult the task, the higher the “cognitive load” it placed on the learner, and the greater the perceived mental effort required to complete it. He believed that better design of educational materials could greatly reduce the cognitive load on learners, a belief that went on to drive major advances in instructional design.
CLT proposes that there are three kinds of cognitive load when it comes to learning:
- Intrinsic cognitive load: the inherent difficulty of the topic itself (adding 2+2 vs. solving a differential equation, for example)
- Extraneous cognitive load: the load imposed by the design or presentation of instructional materials (showing a student a picture of a square vs. trying to explain it verbally, for example)
- Germane cognitive load: the effort put into creating a permanent store of knowledge (such as notes, outlines, diagrams, categories, or lists)
Instructional design, inspired by CLT, focuses on two goals:
- Reducing intrinsic load by breaking information into small parts that can be learned in isolation, and then reassembled into larger wholes
- Redirecting extraneous load into germane load (i.e., focusing learners’ attention on the construction of permanent stores of knowledge)
Progressive Summarization accomplishes both objectives.
It reduces intrinsic load by removing the need to understand the topic you’re reading about completely upfront. Instead, it treats each paragraph as a small, self-contained unit. Your only goal is to surface the key point in each “chunk” — each chapter, section, paragraph, and sentence — leaving it to your future self to figure out how to string those insights together.
It also redirects extraneous load into germane load by saving all these chunks in a permanent store of knowledge, such as your note-taking software. You no longer have to hold every previous point of a text in your head and fit each new point into that structure on the fly. You dedicate your effort to constructing small chunks of permanent knowledge, which are saved for later review.
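As a rough illustration of what that permanent store can look like, here is a minimal sketch in Python. The field names and layer labels are my own shorthand for the method’s layers, not any particular app’s format.

```python
# A minimal sketch (invented field names, not any particular app's format) of a
# chunked, layered store of notes: each paragraph becomes a self-contained chunk,
# and each pass of summarization adds a denser layer while the layers beneath
# it are preserved.
chunk = {
    "source": "Article title, chapter 3",
    "layer_1_excerpt": "The full captured paragraph goes here...",
    "layer_2_bolded": ["the one or two key phrases worth bolding"],
    "layer_3_highlighted": ["the single most striking phrase"],
}

knowledge_store = [chunk]  # the permanent store, reviewed later chunk by chunk,
                           # so nothing has to be held in your head while reading
```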
But reducing cognitive load isn’t just about making learning easier. As learning becomes easier, it also becomes faster, better, deeper, and stronger.
Recall as inhibition
Why is minimizing cognitive load so important to making learning deeper and stronger?
Because new learning can be impaired when a reader is trying to remember too many things at once. The more bandwidth being used for remembering and memorizing, the less bandwidth is available for understanding, analyzing, interpreting, contextualizing, questioning, and absorbing in any given period of time. Just as a nearly full hard drive slows down a computer with even the fastest RAM, a brain crammed full of facts and figures slows down even the smartest person.
This blog post describes recent research on what is known as “proactive inhibition of memory formation”: offloading our thinking to an external tool lowers the brain’s workload as it encounters new information. In the experiments it describes, telling participants they didn’t have to remember a first list of items enhanced their memory for a second list.
At first, offloading your thinking seems to cause you to remember less. Especially if you do it immediately, as you read, such as with highlighting. The ideas seem to jump directly from the page to your notes, barely touching your brain. But in the long run, you actually end up remembering more. Because you can frictionlessly hand off highlighted passages to an external tool, free of the anxiety that comes with keeping many balls in the air, you can meet the next idea with an empty mind. If it’s compelling, it will stick, regardless of any fancy memorization techniques you may think you need.
Your attachment to what you already know may actually interfere with your ability to understand new ideas. Clinging to our notecards, diagrams, and memorization schemes, we may be missing out on simply being present. Carefree immersion is, after all, how children learn. And they are the best learners in the world.
Training your intuition
Technology has given us the ability to “remember everything.” Coming from a legacy of information scarcity, this feels like a huge blessing. But it’s clear the blessing has become a curse. Our brains and our bodies are breaking under the strain of constant, high-volume, 24/7 information flows.
We must transition from knowledge hoarders to knowledge curators. We must learn how to frame our choices about what to read, watch, and review in a way that restricts what we pay attention to, so we can see clearly instead of being overwhelmed.
What is being called into question is the very purpose of learning. What is learning for, now that we can access any knowledge on demand?
Learning is no longer about accumulating data points, but training our algorithm. Our algorithm is our intuition — our felt sense about what matters, what is relevant, what is interesting, and what is important, even if we’ve never seen it before and can’t explain why we like it.
What’s interesting is that, just like the deep learning experiments mentioned above, we still need massive amounts of data for the initial training phase. In other words, we need diverse, intense, personal experience. But 90% of the data we collect through these experiences can be ignored, discarded, or forgotten. What is left over is wisdom — the distilled nuggets of insight that, when deployed in the real world by someone who knows how to use them, can uncompress into dazzling feats of accomplishment. These nuggets of wisdom apply across a wide range of situations, can be communicated from person to person, and even last for centuries as timeless works of art.
Progressive Summarization is about using the information you consume as training data for your intuition. You can consume a lot more, because you’re able to continuously offload it. But more importantly, even if you lost all that data, you would still be left with the greatest prize: who you’ve become and what you’re sensitive to as a result of the diversity and depth of your personal experience.
The new purpose of learning is to enable you to adapt, as the pace of change continues to accelerate and the amount of uncertainty in the world continues to spiral upward. This occurs at every level: adapting your lifestyle to fit changing societal conditions; adapting your productivity to fit changing workplace norms; adapting your communication style to fit new kinds of collaboration; adapting your thinking process to fit new ways of solving problems. It applies right down to the most narrow tasks — the hardest part about writing this article was the mental gymnastics I had to perform to not get stuck on my assumptions about what I was trying to say.
Making a dent in a universe that keeps changing shape increasingly requires working on projects and problems that are FAR bigger than you can hold in your head. The challenges of our time are vast and cross the disciplinary boundaries that experts limit themselves to. We need people who can hold the context of two or more completely different fields in their heads at once, and then apply their highly trained intuition to finding patterns and hidden connections.
A lot of people sense this intuitively, but their attempts to memorize and recall all this context are futile. There’s simply way too much to know, and in the meantime they get frazzled, overwhelmed, and isolated in the attempt. This is how we lose some of our best and brightest minds, lost in their organizational systems as the world falls to pieces.
What we need is people who know how to recruit networks to “know” for them. Networks of people, objects, images, computers, communities, relationships, and places. To connect, unite, inspire, and facilitate collaboration between these networks.
And what does that take? It takes courage, to let go of the security of knowing everything ourselves. It takes vulnerability, to depend on others for our progress and success. It takes presence, noticing what we notice and being willing to bet on it before we know exactly why. It takes curiosity, being willing to ask questions that don’t yet have answers, or any reasonable path to an answer. It takes pushing through our assumptions about how learning should look to get what we know in the hands of someone who needs it, right now.