Ultralearning Ch. 10 - Principle 7 - Retention

Tue 12 May 2026 in Meta-learning

#learning #how-to-learn #ultralearning #books

Don’t Fill a Leaky Bucket

Nigel Richards is a multiple-time world champion of Scrabble, a game that demands a voluminous memory of a language’s words. He doesn’t speak French, yet he won the French-language World Scrabble Championship.

What is Nigel Richards’s Secret?

Richards loved cycling. He once cycled 200 miles through the night, without sleeping, to take part in a tournament. Afterward, he declined rides home and cycled back through the night again. Cycling, of course, isn’t a great mnemonic technique. However, it does illustrate a common theme in Richards’ personality that overlaps with that of other ultralearners the author has encountered: an obsessive intensity that exceeds what is considered a normal investment of effort.

Richards’ cycling, it turns out, also lines up well with the only other clues the author has been able to uncover about his methods: he reads lists, long lists of words, starting with two-letter words and then moving up. "The cycling helps," he explains, "I can go through lists in my mind." He reads the dictionary, focusing exclusively on combinations of letters and ignoring definitions, tenses, and plurals. Then, drawing from memory, he repeats the words over and over as he cycles for hours. This also corresponds with a method that is common to other ultralearners and that has shown up in earlier principles of learning: active recall and rehearsal. By retrieving words, Richards likely takes his already impressive memory and makes it unassailable through active practice.

There are other clues about Richards’ performance: he focuses on memory, not anagramming (rearranging the tiles to create words); he works forward and backward, starting from small words, going on to big ones and back again; he claims to recall the words visually, as he cannot remember words when they’re spoken. All of these clues provide glimpses into Richards’ mind, but they leave out even more than they reveal. How many times does he have to read the words from his list before he can rehearse them mentally? Are the words organized in some way or just listed alphabetically?

Is he a savant with exceptional abilities and lower-than-normal general intelligence or an all-round genius for whom memorizing Scrabble words is just one of many impressive abilities? Maybe his intelligence is quite average and his dominance in Scrabble represents his extreme dedication to the game. We might never know the answers to those questions.

Part of the author suspects that Richards’ intense, obsessive personality, which enables him to cycle for hours reviewing lists mentally, might form at least a partial explanation. For whatever it’s worth, Richards himself argues for this explanation: "It’s hard work, you have to have dedication to learn," elsewhere adding "I’m not sure there is a secret, it’s just a matter of learning the words."

Scrabble words may not be important to your life. However, memory is essential to learning things well. Programmers must remember the syntax for the commands in their code. Accountants need to memorize ratios, rules, and regulations. Lawyers must remember precedents and statutes.

Memory is essential, even when it is wrapped up in bigger ideas such as understanding, intuition, or practical skill. Being able to understand how something works or how to perform a particular technique is useless if you cannot recall it. Retention depends on employing strategies so the things you learn don’t leak out of your mind. Before discussing strategies of retention, however, let’s take a look at why remembering things is so difficult.

Why is it so hard to remember things?

How can you retain all of the things you learn? How can you defend against forgetting hard-won facts and skills? How can you store the knowledge you’ve acquired so that it can be easily retrieved exactly when you need it? In order to understand learning, you need to understand how and why you forget.

Research on memory has uncovered the forgetting curve: we forget things incredibly quickly after learning them, with knowledge decaying roughly exponentially and the loss steepest right after learning. However, the forgetting tapers off, and less is lost as time goes on. Our minds are a leaky bucket, but most of the holes are near the top, so the water that remains at the bottom leaks out more slowly.
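To make the shape of the curve concrete, here is a small sketch using one common simplification, an exponential model R(t) = exp(-t / S). The stability value S is a made-up assumption, and researchers have fitted other shapes (power-law curves, for instance), so treat this as an illustration rather than the book’s model. It shows the "holes near the top" idea: the absolute amount lost each day shrinks as time goes on.

```python
import math

def retention(t_days: float, stability: float) -> float:
    """Fraction of learned material still recallable after t_days.

    Uses the simplified exponential model R(t) = exp(-t / S), where S
    ("stability", in days) is an assumed parameter: a larger S means
    slower forgetting.
    """
    return math.exp(-t_days / stability)

S = 2.0  # hypothetical stability; real values vary by person and material
previous = retention(0, S)
for day in range(1, 8):
    current = retention(day, S)
    # Absolute loss shrinks each day: the steepest drop comes first.
    print(f"day {day}: {current:.2f} retained ({previous - current:.2f} lost since day {day - 1})")
    previous = current
```

Changing S shifts how fast the bucket drains, but under this model the biggest loss always happens right after learning.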

Though the jury is still out on the exact mechanism underlying human long-term memory, three ideas (decay, interference, and forgotten cues) likely form part of the explanation for why we forget and, conversely, provide insight into how we might better retain what we’ve learned.

Decay: Forgetting with time

The first theory of forgetting is that memories simply decay with time. This idea does seem to match common sense. However, it has flaws as a complete explanation. Many of us can vividly recall events from early childhood, even if we can’t remember what we ate for breakfast last Tuesday. Vivid, meaningful things are more easily recalled than banal or arbitrary information. Even if there is a component to our forgetting that is simply decay, it seems exceedingly unlikely that this is the only factor.

I believe the brain is intelligent enough to remember what needs to be remembered and forget what isn’t important. Even if something isn’t important now, perhaps a cue will later bring that memory back.

Interference: Overwriting Old Memories with New Ones

Interference suggests that our memories, unlike the files of a computer, overlap one another in how they are stored in the brain. In this way, memories that are similar but distinct can compete with one another. There are at least two flavors of this: proactive interference and retroactive interference. Proactive interference occurs when previously learned information makes acquiring new knowledge harder. This can happen when you want to learn the definition of a word but have difficulty because that word already has a different association in your mind. Retroactive interference is the opposite: learning something new "erases" or suppresses an old memory.

Forgotten Cues: A locked box with no key

This theory says that many memories we have aren’t actually forgotten but simply inaccessible. Each memory is linked to cues that we use to retrieve it. If a cue is no longer accessible, then the linked memory might not be accessible either. However, if that cue were restored, or if an alternative path to the information could be found, we would remember much more than is currently accessible to us.

This explanation also has some advantages. Intuitively, it seems somewhat true. It might also suggest that relearning things is much faster than learning them initially, because relearning is closer to repair work, while original learning is completely new construction. Cue forgetting seems highly likely to be a partial, if not complete, explanation for forgetting many things.

I really like the idea of relearning being faster because it’s closer to repair work. Since I already have a bachelor’s degree in computer science, I’m basically relearning the fundamentals. Now that I’m going through OSSU and Teach Yourself CS, I feel like I’m learning quickly, so this point aligns with my experience. I already had this sense intuitively.

Cue forgetting as a complete explanation for our memory woes isn’t without its problems, however. Many memory researchers now believe that the act of remembering is not a passive process. In recalling facts, events, or knowledge, we’re engaging in a creative process of reconstruction. The memories themselves are often modified, enhanced, or manipulated in the process of remembering. It may be, then, that "lost" memories that are retrieved through new cues are actually fabrications. This seems especially likely in the case of "recovered" witness testimony from traumatic events, as experiments have shown that even highly vivid memories that feel completely authentic to the subject can be untrue.

That casts some doubt on my own memories. Am I fabricating some of them? Should I write down my most important memories right away so that decay or forgotten cues don’t lead to fabrications later? This makes me want to write more about big events in my life. Perhaps I will.

How can you prevent forgetting?

Forgetting is the default, not the exception, so the ultralearners the author encountered had devised various strategies for coping with this fact of life. The first set of methods deals with retention during the ultralearning project itself: How can you retain the things you learned in the first week, so that you don’t need to relearn them by the last week? This is particularly important for memory-intensive ultralearning efforts.

The second set of methods, in contrast, has to do with the longevity of the skills and knowledge acquired after the project has been completed: Once a language has been learned to a level you’re satisfied with, how can you keep yourself from forgetting it completely a couple years later?

You need to pick a retention system that both accomplishes your goals and is simple enough to stick to. For example, during intense periods of language learning, the author found that the sheer volume of vocabulary made spaced-repetition systems helpful. Other times, he preferred having conversations to maintain his speaking ability, even though this method is less precise. With other subjects, he’s happy to allow some degree of forgetting, as long as he continuously practices the skills he needs to use and has the ability to relearn.

I think I’m at a similar point: I’m okay with allowing some forgetting, as long as I keep practicing through the exercises and assignments in the courses and books I work through. I can always relearn a previously learned concept if I run into something that requires reviewing it.

The author admits that his approaches may not reach a theoretical ideal, but they may end up working better because they have fewer possibilities for error and can be sustained more easily. Regardless of the exact system used, however, all systems seemed to work according to one of four mechanisms: spacing, proceduralization, overlearning, or mnemonics. Let’s look at each of these mechanisms first, in order to make sense of the quite different and idiosyncratic ways they show up in different ultralearning projects.

Memory Mechanism 1 - Spacing: Repeat to Remember

One of the pieces of studying advice that is best supported by research is that if you care about long-term retention, don’t cram. Spreading learning over more sessions across longer periods of time tends to cause somewhat lower performance in the short run (because there is a chance for forgetting between sessions) but much better performance in the long run. This was something the author needed to be careful about during the MIT Challenge. After his first few classes, he switched from doing one class at a time to doing a few in parallel, to minimize the impact that crammed study time would have on his memory.

If you have ten hours to learn something, therefore, it makes more sense to spend ten days studying one hour each than to study for ten hours in one burst. Obviously, however, if the time between study sessions gets longer and longer, the short-term effects start to outweigh the long-term ones. If you learn something with a decade separating study sessions, it’s quite possible that you’ll completely forget whatever you had learned before you reach the second session.

Finding the exact trade-off point between too long and too short has been a minor obsession for some ultralearners. Space your study sessions too closely, and you lose efficiency; space them too far apart, and you forget what you’ve already learned. This has led many ultralearners to apply what are known as spaced-repetition systems (SRS) as a tool for trying to retain the most knowledge with the least effort. Software such as Anki is the preferred tool of more extreme ultralearners who want to squeeze out a little more performance.

SRS is an amazing tool, but it tends to have quite focused applications. Learning facts, trivia, vocabulary words, or definitions is ideally suited for flash card software, which presents knowledge in terms of a question with a single answer. It’s more difficult to apply to more complicated domains of knowledge, which rely on complex information associations that are built up only through real-world practice.
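To see how an SRS trades off "too close" against "too far," here is a minimal sketch of the Leitner box system, an older and simpler spaced-repetition scheme for flash cards. Anki’s own scheduler is more elaborate, so this is not how Anki works; the box intervals below are illustrative assumptions. Each card you recall moves to a box with a longer gap before its next review, and each card you miss drops back to the first box.

```python
from dataclasses import dataclass

# Review gap (in days) for each Leitner box. These values are
# illustrative assumptions, not Anki's actual scheduling parameters.
BOX_INTERVALS = [1, 2, 4, 8, 16]

@dataclass
class Card:
    question: str
    answer: str
    box: int = 0      # index into BOX_INTERVALS
    due_day: int = 0  # day number on which the card is next shown

def review(card: Card, today: int, recalled: bool) -> None:
    """Promote the card one box on successful recall (longer gap);
    send it back to the first box on failure (shortest gap)."""
    if recalled:
        card.box = min(card.box + 1, len(BOX_INTERVALS) - 1)
    else:
        card.box = 0
    card.due_day = today + BOX_INTERVALS[card.box]

def due_cards(deck: list[Card], today: int) -> list[Card]:
    """Cards whose next review falls on or before today."""
    return [card for card in deck if card.due_day <= today]

if __name__ == "__main__":
    deck = [Card("What does SRS stand for?", "spaced-repetition system")]
    review(deck[0], today=0, recalled=True)
    print(deck[0].box, deck[0].due_day)
```

Well-known cards drift toward long gaps and cost almost no review time, while shaky cards keep coming back quickly. This also shows why SRS suits single-answer facts: each card needs a clear "recalled or not" judgment.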

Another strategy for applying spacing, which can work better for more elaborate skills that are harder to integrate into your daily habits, is to semi-regularly do refresher projects. The author leaned toward this approach for the things he learned during the MIT Challenge, since the skill he wanted most to retain was writing code, which is tricky to do on only half an hour per week. This approach has the disadvantage of sometimes deviating quite a lot from optimal spacing; however, if you’re prepared to do a little bit of relearning to compensate, it can still be a better approach than completely giving up practice. Scheduling this kind of maintenance in advance can also be helpful, as it will remind you that learning isn’t something done once and then ignored but a process that continues for your entire life.

I’m not sure whether I want to incorporate a spaced-repetition system like Anki into my current learning journey. I do want to consistently practice active recall, or retrieval, though, so I should be deliberate about setting up a routine for it. One method: in the evening before bed, look at my time tracker app to see what I learned two days ago, then try my best to recall it on a blank page in Apple Notes. Another method uses Anki: first, do a free recall without any cues, then go to Anki and use the cards there to continue the retrieval process. For that, I need to look into Anki’s timing intervals and figure out which facts or concepts to make cards for.
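The first evening routine could be partly automated. The sketch below assumes a hypothetical CSV export from the time tracker with `date` (YYYY-MM-DD) and `topic` columns; real apps export different formats, so the column names are placeholders. It lists only the topic names from two days ago, leaving the actual content to be recalled from memory before checking notes.

```python
import csv
import io
from datetime import date, timedelta

def topics_from(log_csv: str, target: date) -> list[str]:
    """Distinct topics logged on `target`, in first-seen order.
    Assumes hypothetical 'date' and 'topic' columns in the export."""
    topics: list[str] = []
    for row in csv.DictReader(io.StringIO(log_csv)):
        if date.fromisoformat(row["date"]) == target and row["topic"] not in topics:
            topics.append(row["topic"])
    return topics

def recall_prompt(log_csv: str, today: date, days_back: int = 2) -> str:
    """Build a blank recall page: topic names as headings, nothing else."""
    target = today - timedelta(days=days_back)
    lines = [f"Free recall for {target.isoformat()} (write from memory, then check notes):"]
    lines += [f"- {topic}:" for topic in topics_from(log_csv, target)]
    return "\n".join(lines)

if __name__ == "__main__":
    sample_log = "date,topic\n2026-05-10,recursion\n2026-05-10,big-O notation\n"
    print(recall_prompt(sample_log, today=date(2026, 5, 12)))
```

The topic names act as minimal cues; for a fully cue-free recall, skip the prompt and write first, then compare against the log.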

As for maintenance projects, they may be a good idea for my programming. I could implement a chat server or some other standard project I haven’t done before to keep my skills sharp. I could also do LeetCode every now and then, but only if I have an interview coming up. It doesn’t make sense to practice for interviews if I’m not going to be interviewing soon.

Since SRS tools tend to have focused applications, I’m not sure it’ll be worth the effort to set up and follow this system for my computer science learning journey. I’ll have to think about it more. Maybe I’ll experiment with it for my next OSSU session. I might even make Anki cards of the concepts here from Ultralearning.

Memory Mechanism 2 — Proceduralization: Automatic Will Endure

There’s evidence that procedural skills, such as riding a bicycle, are stored in a different way from declarative knowledge, such as knowing the Pythagorean theorem or the Sine Rule for triangles. This difference between knowing how and knowing that may also have different implications for long-term memory. Procedural skills, such as the ever-remembered bicycling, are much less susceptible to being forgotten than knowledge that requires explicit recall to retrieve. This finding can actually be used to our advantage.

One dominant theory of learning suggests that most skills proceed through stages — starting declarative but ending up procedural as you practice more. Procedural knowledge is quite robust and tends to be retained much longer than declarative knowledge.

This may suggest a useful heuristic. Instead of learning a large volume of knowledge or skills evenly, you might practice a core set of information much more frequently, so that it becomes procedural and is retained far longer.

For programming, practicing the procedure of writing algorithms emphasizes a core set of skills and knowledge.

Most skills we learn are incompletely proceduralized. We may be able to do some of them automatically, but other parts require us to actively search our minds. Perhaps, owing to their nature, some skills cannot be completely automated and will always require some conscious thought. This creates an interesting mix of knowledge, with some things retained quite stably over longer periods of time and others susceptible to being forgotten.

One strategy for applying this concept might be to ensure that a certain amount of knowledge is completely proceduralized before practice concludes. For example, you could take some knowledge from a programming course and find a way to incorporate it into your process for writing code, perhaps as a question to keep in mind or a checklist step. Another approach might be to spend extra effort proceduralizing some skills, which will then serve as cues or access points for other knowledge.

You may aim to completely proceduralize the process you use to start working on a new programming project, for example, so that you can get over that hump in the process of writing a new program. These strategies are somewhat speculative, but the author thinks there are lots of potential ways clever ultralearners might apply the declarative-to-procedural transition of knowledge in the future.

I have proceduralized a process for writing code. The mental models, thought patterns, knowledge, and experience that come with it get extra emphasis and are more likely to stick in memory.

Memory Mechanism 3 — Overlearning: Practice Beyond Perfect

Additional practice, beyond what is required to perform adequately, can increase how long memories are retained. Overlearning dovetails nicely with the principle of directness. Because using a skill directly frequently involves overpracticing certain core abilities, that kernel is usually quite resistant to forgetting, even years later. In contrast, academically learned subjects tend to spread practice evenly, bringing each area only to a minimum level of competency. For example, those who study languages through years of formal schooling have much more impressive vocabularies or knowledge of grammatical nuances than the author does. However, those same people may trip over fairly basic phrases, because they learned every fact and skill evenly rather than overlearning the smaller subset of very common patterns.

I feel like Systematic Program Design had us overlearn the design method, which is a good thing. I’ll be on the lookout for overlearned topics in future courses.

There seem to be two main methods the author has encountered for applying overlearning. The first is core practice, continually practicing and refining the core elements of a skill. This approach often works well paired with some kind of immersion or working on extensive (as opposed to intensive) projects, after the initial ultralearning phase has been completed. The shift from learning to doing here may actually involve a deeper, subtler form of learning, which shouldn’t be discounted as simply applying previously learned knowledge.

Core practice of writing algorithms, applied to a coding project after my ultralearning session, could be a way for me to overlearn.

The second strategy is advanced practice, going one level above a certain set of skills so that core parts of the lower-level skills are overlearned as one applies them in a more difficult domain. Perhaps doing a substantial programming project can help here.

Memory Mechanism 4 — Mnemonics: A Picture Retains a Thousand Words

The final tool common to many ultralearners whom the author encountered was mnemonics. There are many mnemonic strategies, and covering them all is outside the scope of the book. What they have in common is, first, that they tend to be hyperspecific: they are designed to remember very specific patterns of information.

Second, they usually involve translating abstract or arbitrary information into vivid pictures or spatial maps. Mnemonics work well, and with practice, anyone can do them. However, based on the author’s experience with them, their applications are quite a bit narrower than they first appear, and in many real-world settings they simply aren’t worth the hassle. The example given is that a man used mnemonics to memorize 70,000 digits of pi.

There are two disadvantages to mnemonics. The first is that the most impressive systems also require a considerable up-front investment. The second disadvantage is that recalling from mnemonics is often not as automatic as directly remembering something. The author feels that mnemonics tend to serve more as cool tricks than as a foundation you should base your learning efforts on. Still, there is a devoted subset of ultralearners who are fiercely committed to applying these techniques.

For me personally, I won’t be using mnemonics. The other learning methods will have me covered.

Winning the war against forgetting

To retain knowledge is ultimately to combat the inevitable human tendency to forget. This process occurs in all of us, and there’s no way to avoid it completely. However, certain strategies (spacing, proceduralization, overlearning, and mnemonics) can counteract your short- and long-term rates of forgetting and end up making a huge difference in how much you remember.

Real life tends to reward a different kind of memory: one that integrates knowledge into a deep understanding of things. In the next principle, we’ll look at going from memory to intuition.