(First of all, this story isn’t new. I wrote about it on Google+ about a year and a half ago. Since that platform is going away, I won’t bother linking to my original post and creating another broken link. I gave a lightning talk on this at the February meeting of the Atlambda Haskathon, as well. That’s when I realized I need to write about this again or risk losing it to defunct hosting.)

There are two concepts in Haskell that seem to have an interesting relationship to each other: lenses and zippers.

The zipper is one of the classical functional pearls: a data structure that was originally created for practical purposes, and then, out of the blue, turned out to exhibit some insane abstract relationships!

The original “zipper” is a way of working with linked lists.

Suppose that instead of traversing from the beginning, you’d like to navigate through the list, keeping track of a current position, and only making modifications around that current position. For example, the list might be a list of characters in a text editor, or a list of windows in a window manager. You want to be able to insert, delete, and modify the elements around the cursor efficiently.

The trick is to reverse the section of the list that comes before the cursor. You end up with two linked lists — one reversed list of values before the cursor, and one forward list of values after it — and one value at the cursor itself.

Think of the element at the cursor as sitting in a hole. Surrounding this hole is the *context*, which is the surrounding data structure from the point of view of the cursor at that hole. In this representation, changes to the focused element are easy, since it’s separate from the context. Changes to the context near the cursor are also easy: to insert or remove an element adjacent to the cursor, you push or pop from one of the two lists. These are all constant-time operations. Mission accomplished.
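Concretely, a minimal list zipper might be sketched like this (the type and function names are my own, not from any particular library):

```haskell
-- A list zipper: values before the cursor (stored in reverse), the
-- focused element, and values after the cursor.
data Zipper a = Zipper [a] a [a] deriving Show

-- Move the cursor one position left or right, if possible.
left, right :: Zipper a -> Maybe (Zipper a)
left  (Zipper (l:ls) x rs) = Just (Zipper ls l (x:rs))
left  _                    = Nothing
right (Zipper ls x (r:rs)) = Just (Zipper (x:ls) r rs)
right _                    = Nothing

-- Replace the focused element.
modify :: (a -> a) -> Zipper a -> Zipper a
modify f (Zipper ls x rs) = Zipper ls (f x) rs

-- Insert a new element just after the cursor.
insertAfter :: a -> Zipper a -> Zipper a
insertAfter y (Zipper ls x rs) = Zipper ls x (y:rs)
```

Each operation touches only the heads of the two lists, which is why they all run in constant time.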

Zippers don’t just work for linked lists, though. In fact, most data structures containing elements also have associated zipper types, which separate the structure into a single focused element, and a context that describes the rest of the structure. For a linked list, the context consists of a left and right half, but for something like a binary tree, the context includes left and right sub-trees, as well as the chain of parent nodes and their other children. It’s less obvious what this might look like. But there’s a trick! You can calculate these zipper types from the algebraic structure of the tree.

For this to make sense, we need to think about data types as things that have standard algebraic operations on them. In particular, we want to know how to add types, multiply types, and raise them to powers.

The intuition here is to think about the number of values that a type has, and find operations on the type that have the same effect on the number of values. (Of course, types often have infinitely many possible values… but let’s agree to ignore that for now.)

Here are the basic operations on types:

- Addition corresponds to a tagged union. That is, A + B is a type that can contain *either* a value of type A, *or* a value of type B, *and* (unlike unions in C, for example) you can find out which one.
- Multiplication corresponds to tuples with fields. That is, A * B is a type that contains *both* a value of type A, *and* a value of type B.
- Exponents correspond to *functions*. That is, A^B is a type of functions (and I mean mathematical functions, so they must be referentially transparent with no side effects) that take values of type B as arguments, and return values of type A.
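In Haskell, these three operations already have familiar representatives:

```haskell
-- A + B: a tagged union; you can always tell which side you have.
type Sum a b = Either a b

-- A * B: a pair containing both.
type Product a b = (a, b)

-- A^B: functions from B to A.
type Power a b = b -> a

-- Counting values: Bool has 2 values and () has 1, so
-- Either () Bool has 1 + 2 = 3 values, and (Bool, Bool) has 2 * 2 = 4.
allSums :: [Either () Bool]
allSums = [Left (), Right False, Right True]

allProducts :: [(Bool, Bool)]
allProducts = [(a, b) | a <- [False, True], b <- [False, True]]
```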

If this is a new point of view, it’s worth taking the time to convince yourself that the normal mathematical properties of these operations still hold! For example:

- Is it still true that X² = X * X?

  Here, “2” can be seen as referring to a Boolean type, since it has two possible values. Choosing a pure function *f* from Boolean to X is the same thing as choosing a value of X to return for *f*(true), and another for *f*(false).

- Does the distributive property hold, so A * (B + C) = A * B + A * C?

  Suppose you have an A, *and* you have either a B or a C. Then either you have both an A and a B, or you have both an A and a C. That works, too.

With this in mind, the question of how to build a zipper is answered in an amazing way by Conor McBride in an unpublished but well-known paper. The idea is that the context we saw above is just the *derivative* of the data type!

Let’s return to the example of a binary tree. A binary tree containing values of type X can be:

- A leaf, which is just a single value that terminates the tree.
- A branch, which consists of an X value, a left sub-tree, and a right sub-tree.

Expressing this algebraically, if T is the type of binary trees, then we can write T = 1 + X * T². The 1 refers to the nil value used for the leaves, and the X * T² is the branch node, which consists of an X value, and two sub-trees.
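The algebraic equation reads off directly as a Haskell declaration (using my own names):

```haskell
-- T = 1 + X * T²: a leaf contributes the 1, and a branch contributes
-- an X together with two sub-trees (the X * T² term).
data Tree x
  = Leaf
  | Branch x (Tree x) (Tree x)
  deriving Show
```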

We can compute the derivative by differentiating both sides of that equation with respect to X. Applying the product rule, power rule, and chain rule, we get this:

dT/dX = T² + 2 * X * T * dT/dX

Do you see why this makes sense as the zipper type for a tree? Remember, the zipper type is the type of binary trees with one missing X value. The missing X value can be either:

- *At the root:* In this case, since the X is missing, all that remains is to say what sub-trees live to the left and right. This is the T² term.
- *In one of the sub-trees of the root:* In this case, you need to know which sub-tree (a choice between two values), what X value is at the root, what the sub-tree looks like that’s not missing an X, and another context for the sub-tree that is missing a value. This is the 2 * X * T * dT/dX term.
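The derivative can be transliterated into a Haskell type, term by term (a sketch; the constructor names are my own):

```haskell
data Tree x = Leaf | Branch x (Tree x) (Tree x)
  deriving (Eq, Show)

-- dT/dX = T² + 2 * X * T * dT/dX, read off term by term:
data Ctx x
  = Root (Tree x) (Tree x)        -- T²: the hole is at the root
  | Step Bool x (Tree x) (Ctx x)  -- 2 * X * T * dT/dX: which side (the 2),
                                  -- the X at this node, the sibling tree
                                  -- (T), and the context for the sub-tree
                                  -- containing the hole
  deriving (Eq, Show)

-- Filling the hole with an X recovers a complete tree.
plug :: x -> Ctx x -> Tree x
plug x (Root l r) = Branch x l r
plug x (Step goLeft y sibling ctx)
  | goLeft    = Branch y (plug x ctx) sibling
  | otherwise = Branch y sibling (plug x ctx)
```

The plug function witnesses the claim that a context plus a value for the hole carries (at least) the information of a whole tree.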

Well, that’s interesting, if you haven’t seen it before. But let’s not stop there.

One thing that changed here is that while the original zipper by Huet reversed the point of view so that values near the cursor are easier to access, this new zipper doesn’t do so. That is an important point for practical uses of the zipper type, but not so much for this observation, which is entirely about the *information content* of the zipper, and not its efficiency.

Thinking about what information is contained in a zipper, you might note that a zipper type, dT/dX, contains *almost* a complete value of type T, but it’s missing an X. In practice, like we saw with the list zippers above, you’ll often keep a zipper (for context) alongside a specific X value that’s currently being edited. But that’s just another product!

You may be tempted to write an equation like X * dT/dX = T. But that’s wrong. Having an X and a dT/dX doesn’t just give you a value of the type T. It *also* gives you a location of a specific X inside of that T. How would that be represented?

Many programming languages tackle the question of how to identify specific places where one value fits inside the other. The answer usually comes down to two operations: *get*, which retrieves the value in some location from the larger structure, and *set*, which puts a new value into that location.

Entirely separate from zippers, functional programming languages have developed an elaborate theory of what basically amounts to getter/setter pairs for values. The result is the *lens*, which for any structure type T and field type X, is a new type I’ll call L(T, X), which says:

- For any T, how to get an X out of the T.
- How to put a new X back into the T as well.

In some intuitive sense, you can think of a lens as a *locator* for the X inside of the T. So you may be tempted to write an equation like this:

X * dT/dX = T * L(T, X)

Let’s take a simple example, and think about it. Suppose you have the type T = X³, which is just tuples of three X values. Then dT/dX = 3X² (by the power rule), and you might expect that L(T, X) = 3, since a lens just tells you which of the three X values inside the T you are looking at. Then indeed, the above equation holds!
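As a sketch, a lens can be represented as a plain getter/setter pair (real lens libraries use a more sophisticated encoding, but the information content is the same), and for X³ the three obvious lenses look like this:

```haskell
-- A lens from T to X: a getter and a setter.
data Lens t x = Lens { get :: t -> x, set :: x -> t -> t }

-- T = X³ is just a triple of X values.
type Triple x = (x, x, x)

-- The three obvious lenses, one per position.
first3, second3, third3 :: Lens (Triple x) x
first3  = Lens (\(a, _, _) -> a) (\a (_, b, c) -> (a, b, c))
second3 = Lens (\(_, b, _) -> b) (\b (a, _, c) -> (a, b, c))
third3  = Lens (\(_, _, c) -> c) (\c (a, b, _) -> (a, b, c))
```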

That’s exciting, but the excitement won’t last. Thinking further about this leads to a few sobering realizations.

First, if you try this for something like a list, it doesn’t work out. If the type T is a list of X, then dT/dX = T², as we saw above. But L(T,X) doesn’t even make sense, because it’s impossible to find an X inside an arbitrary list (which might be empty). For the equation to work out, you would need L(T,X) = X * T, and that seems clearly wrong. If you play around with this, you’ll discover that this problem comes up any time you’ve got a sum type. Types that are pure products, like X³, work just fine, but as soon as different values might have different shapes, everything stops working.

The reason is some subtlety in what we mean by identifying the location of an X inside a T. A lens knows how to locate an X inside *any* possible value of type T. But the extra information we have from X * dT/dX is the location of an X in one *specific* value of type T. Basically, we have a quantifier in the wrong place: trying to use a lens here consists of saying “for all…” when we meant “there exists…”.

There’s a second problem, too: in general, it’s not true that L(X³, X) = 3 as assumed in the previous section. There are three *obvious* ways to find an X inside a value of type X³, but depending on the type X, there might be all sorts of other ways! For example, if X is the type of 32-bit integers, you could read out an integer by combining the lower-order 16 bits of the first one with the higher-order 16 bits of the second. This is a perfectly good lens, but not one of the three we anticipated. Parametricity sometimes solves this problem if we quantify over possible values of X (because you can’t construct an X out of pieces if you don’t know what type it is, so you’re forced to use one that occurs intact in the data structure!), but that’s another accommodation needed.

In fact, it’s even worse. We can’t even safely say that X * dT/dX contains a T at all! If T is a list of X, for example, the type X * dT/dX can only contain non-empty lists (plus the location info), since dT/dX only represents contexts that leave a hole for some X value in the first place.

As I warned in the beginning, there’s not really a satisfying wrap-up here. We looked at some ideas that initially appear to be related. In some limited situations — when the zipper types are constructed over non-empty product-only types, and the lenses are restricted to being natural enough — things seemed to work out. But we also saw why it’s important to be precise about what these kinds of abstractions mean.

I’ve been quiet on social media lately, but it’s been an active time for the CodeWorld project. Here’s a brief summary of some of the things happening in the last few months.

There have been some significant updates lately in the CodeWorld platform. Here are some of the big ones.

Together with Ashish Myles and with some good advice from Fernando Alegre, I started building the CodeWorld Requirements Framework. This is a language embedded into CodeWorld for expressing requirements for assigned student code. For example, you might ask students to write at least 4 functions in their project, to use all five built-in picture transformations (translation, rotation, dilation, non-uniform scaling, and coloring), and to make sure there are no unused parameters in their functions. The requirements language lets you express all of these.

The framework implements a few one-off rules for some common requirements, but based on some inspired advice from Fernando, it also includes a pattern matcher over Haskell syntax, which lets you define patterns that should (or should not) occur in the code. By abusing Template Haskell syntax, you can write a pattern like f $var = sqrt $any, and this will match any function called f whose body is defined by applying sqrt to any expression. It’s surprising how many rules can be expressed in this form. Ashish and I alpha-tested the framework when teaching functions to middle school students, and it was very successful.

That said, though, the Requirements Framework is still in experimental stages. There are a lot more rule types I’d like to implement, including common lint rules, UX improvements, and even some ambitious projects like integrating runtime validation testing and compile-time inspection testing into the framework. The status of the project is at https://github.com/google/codeworld/projects/9.

Artem Kanev has recently been contributing to the tool support in CodeWorld, and has made a lot of progress. You might notice that CodeWorld’s education mode now offers documentation for known variables when you hover over them, or when you select them in the autocomplete UI. There’s also a lot more work done to include symbols defined in the current module in these features. These simple changes make a huge difference to students as they use the platform.

I know Artem has big plans coming up, to continue working on the student programming tools and turn the CodeWorld environment into a world-class user interface (but without sacrificing its essential simplicity!). Coming soon are continuous highlighting of compile errors, better error message presentation, and deferred type errors. After that, who knows? You can definitely try out the new features and send bugs and feature requests with your feedback.

As we learn more from teaching with CodeWorld, we continue to tweak the exported API so that future classes go better. A few API changes are now in progress.

First, CodeWorld is moving toward using pattern synonyms for a lot more constants. This includes the built-in colors, and the constant pi, for example. The reason for this is that there’s a nice educational progression from listing inputs and outputs of a function in a table, then writing them in a sequence of equations for specific inputs, and finally generalizing to a pattern with a formal parameter. But that doesn’t work when the input isn’t a literal. Bidirectional pattern synonyms act like literals in most ways, so students can write equations like *f*(Blue) = 5, and they act as they are supposed to.
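The general mechanism looks something like this (a hypothetical sketch with my own color type, not CodeWorld’s actual definitions):

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Hypothetical color representation for illustration only.
data Color = RGB Double Double Double deriving (Eq, Show)

-- A bidirectional pattern synonym: Blue can be used both as a value
-- in an expression and on the left-hand side of an equation, just
-- like a literal would be.
pattern Blue :: Color
pattern Blue = RGB 0 0 1

-- So a student can write equations for specific inputs:
f :: Color -> Int
f Blue = 5
f _    = 0
```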

This is a big change, especially for colors, and the old names for colors will be around for at least a year and a half, and possibly longer. But I encourage you to teach with the new names in the future.

At the same time, I’m continuing to update documentation and curriculum. I’ve taken another pass at the beginning of the guide over the last several months, and expanded much of the content. There are also new folders in the CodeWorld Drive folder with worksheets and resources I’ve created for my new classes. This is still very much a work in progress, but things are getting better.

I’m also resuming work on the CodeWorld comic book. This was a cute project I began many years ago, to convey some of the humor and programming culture, as well as introduce the language of error messages, in a fun and casual way. The story is finished, but we’re filling out the book with creative and personality-filled reference pages that students can use as they build their own projects. Stay tuned!

Of course, none of this would be worth it without the kids! I’m also continuing to teach. I’ve finished a class for 8th grade students in the previous semester, and I’m taking on 9th graders for the first time this semester. It’s a new experience for me, so keep your fingers crossed for me.

Others have also continued their teaching. In particular, I’m still working with Louisiana State University to investigate results measured in our classes, and I know others are teaching everywhere from elementary school through university, in classes around the world.

To stay up to date on CodeWorld, feel free to subscribe to the codeworld-discuss mailing list. We’ve got a lot of cool things coming up. I’m hopeful that we can work with another student or two in the upcoming Google Summer of Code season, and share some of our research results over the next several months. Stay tuned!

Just to set expectations from the beginning:

- This is not research; it’s just exposition of things that are already widely known in programming language theory. If you eat denotational semantics for breakfast, nothing here will be new.
- This is also not an applied post. You won’t learn (at least directly) how to build better software in Haskell. You might, though, learn a new way of thinking about what your existing Haskell code
*means*.

If you’re still with me, let’s go!

If you have a function from any set to itself, then a **fixpoint** of that function is any input that maps to *itself*. On a standard graph, you can think of these as being values that fall on the line *x* = *y*.

Here are a few examples:

- Consider the function *f*(*x*) = *x*². It has two fixpoints: 0 and 1. These are the two real numbers that don’t change when you square them. (Negative numbers become positive, so they can’t work. Numbers between 0 and 1 become smaller. Numbers larger than 1 get larger.)
- Now consider the cosine function, with its argument in radians. You may have unknowingly calculated the fixpoint of this function when you were bored in high school math class. If you start with any number, and repeatedly press the cosine button on your calculator, you’ll notice that the answer converges to a number around 0.739. This is the only fixpoint of the cosine function.
- Finally, consider *g*(*x*) = *x* + 1. This function has no fixpoints, since if *g*(*x*) = *x*, then subtracting *x* from both sides, you could conclude that 0 = 1.

In general, deciding whether a function has fixpoints or not can be difficult. So there are many different fixpoint theorems. For example, Brouwer’s fixpoint theorem says that any continuous function from a closed ball into itself in Euclidean space must have a fixpoint. Another, which will be important in a bit, is the Knaster-Tarski theorem, which says that any order-preserving function on a complete lattice has a complete lattice of fixpoints. But more on that in a bit.

That’s what a fixpoint is, but why should we care about it? Are fixpoints more interesting than a children’s amusement with the cosine button of a calculator? In fact, they are.

A really beautiful sequence of the MIT text *Structure and Interpretation of Computer Programs* deals with progressively more abstract representations of algorithms for computing a square root. Heron of Alexandria proposed an algorithm as follows. To compute the square root of *n*: (a) start with a guess, *a*; (b) compare *a* with *n* / *a*; (c) if they are close enough to each other, then stop; otherwise, average the two and repeat with that as your guess.

For example, if we start by guessing that the square root of 10 is 5, then we go through these steps:

- *a* = 5, but *n*/*a* = 2. Those are not close, so our new guess is 7/2.
- *a* = 7/2 (3.5), but *n*/*a* = 20/7 (about 2.86). Those are not close, so our new guess is 89/28.
- *a* = 89/28 (about 3.18), but *n*/*a* = 280/89 (about 3.15). Those are not quite close enough, so our new guess is 15761/4984.
- *a* = 15761/4984 (about 3.16), and *n*/*a* = 49840/15761 (also about 3.16). Those are pretty close, so we can stop.

See what’s happening? Just like with the cosine button, we’re repeating a computation to make a guess better until it converges. The point where it converges is a fixpoint. But this time, the computation was chosen so that its fixpoint is the square root of *n*, giving a technique for calculating square roots.

To express this, we can write a Haskell implementation built around the notion of searching for a fixpoint.
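Here is one way such an implementation might look (a sketch with my own names):

```haskell
-- Search for a fixpoint of f by iterating from a starting guess,
-- stopping when consecutive values are close enough together.
findFixpoint :: (Double -> Double) -> Double -> Double
findFixpoint f guess
  | abs (next - guess) < 1e-9 = next
  | otherwise                 = findFixpoint f next
  where next = f guess

-- Heron's method: the square root of n is a fixpoint of the function
-- x ↦ (x + n / x) / 2, and that function converges toward it.
heronSqrt :: Double -> Double
heronSqrt n = findFixpoint (\x -> (x + n / x) / 2) 1
```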

You can play with this code using CodeWorld, and compute your own square roots.

This is sort of a funky way to compute a fixpoint, though.

- First, it requires a starting value, which is a wart in the API. A function’s fixpoints don’t really depend on any particular starting value.
- Second, it relies on the function to converge — to produce a value closer to a fixpoint in its result than its input. This is the reason the average is there at all, instead of just using *n*/*a*, which has the right fixpoint but doesn’t *converge* to that fixpoint.
- Third, the function may not even have a fixpoint at all! Here, a fixpoint exists for Double because it has finite precision. But the function *f*(*x*) = (*x* + 10/*x*) / 2 has no fixpoint in the rational numbers, so there’s no chance this would also work with Rational.

These problems are hard to overcome in the current form, but it turns out that by creatively modifying some definitions, we can distill the essence of this into a more abstract combinator, which is Haskell’s fix.

In the Data.Function module of Haskell’s base library, there’s a surprising function.

fix :: (a -> a) -> a

This function computes a fixpoint of *any* Haskell function. It does it without any of the disadvantages of the version above, and even without any constraints on the types! But how can this be? First of all, even demonstrating the existence of fixpoints is so difficult that it requires heavy mathematical theorems. And even worse, some functions (like *g*(*x*) = *x* + 1, above) have no fixpoints at all!

The trick is that Haskell functions aren’t just ordinary functions on ordinary sets. Haskell functions can throw exceptions, and they can loop indefinitely. To reconcile this with the world of mathematical functions, we say that Haskell functions work not just with the sets of ordinary values of their types, but with the sets of all *partially defined values*.

Here’s how that works. In math, the set of integers is {0, 1, -1, 2, -2, …}. That’s it! There is nothing else in that set. But in Haskell, we say that the type Integer includes one more value: ⊥. This special value is used to indicate that the function doesn’t produce a true number at all, instead either running forever in an infinite loop, or throwing an exception. In essence, ⊥ means “I don’t know.”

Once ⊥ is taken into account, *g*(*x*) = *x* + 1 does actually have a fixpoint, because *g*(⊥) = ⊥. That is, if you don’t know what number is passed in, you also don’t know what number comes back out. And this is, in fact, the fixpoint you get from fix. The expression fix g loops forever instead of producing a value, so we say its value is ⊥.

In fact, any time that *f*(⊥) = ⊥, fix f will produce ⊥. (This is unfortunate if we’re looking for actual numbers as fixpoints like we are above. But later, we’ll be able to recover our original notion of fixpoint in terms of this new one!) This might make you wonder why fix is useful at all! After all, if you don’t know the input to a function, can you ever know the output? Are there any functions for which ⊥ is *not* a fixpoint? Well, yes there are, and it’s all because of laziness. Consider a constant function, like *h*(x) = 42. Because it’s a non-strict language, Haskell will produce an output of 42 without even looking at the input. In other words, *h*(⊥) = 42. ⊥ is not a fixpoint. And, in fact, fix h is 42 for this function.
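You can check this behavior directly. Since a constant function never inspects its argument, fix returns immediately instead of looping:

```haskell
import Data.Function (fix)

-- h ignores its argument, so h(⊥) = 42, and ⊥ is not a fixpoint.
-- fix h = h (h (h ...)) = 42, with no need to evaluate the nesting.
h :: Integer -> Integer
h _ = 42
```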

With functions on simple types like Double, these are the only two possibilities: if the function is constant, then fix will produce the constant value, and otherwise it will produce ⊥. (The word for these simple types is “flat”; either you know nothing about the value, or you know everything.) But the result is more interesting when the types have more structure. Given a linked list, for example, there are many possible values where different parts of the value are unknown:

- ⊥ means that nothing about the list is known.
- ⊥ : ⊥ means that the list is known to be non-empty, but neither the first element nor the rest of the list are known. This is more information than you have about ⊥ (which might be an empty list).
- 3 : ⊥ means that the list starts with a 3, but you don’t know what (if anything) comes after that. This is strictly more information than ⊥ : ⊥.
- ⊥ : [] (also known as [⊥]) means that the list is known to only contain one element, but it’s not known what that element is. This is again strictly more information than ⊥ : ⊥, but it’s incomparable to 3 : ⊥. They are both more info than just whether the list is empty, but neither is strictly more informative than the other. They provide different information.
- 3 : [] (also known as [3]) is strictly more information than
*either*of ⊥ : []*or*3 : ⊥.

This defines a partial order on Haskell values. It’s different from the order given by Ord, which is the usual meaningful order for that type. This order sorts the values by how much we *know* about them. So in this order, neither of 3 or 4 is less than the other (they are both completely defined, just different); but ⊥ is less than ⊥ : ⊥, which is in turn less than either 3 : ⊥ or ⊥ : [], and each of these is in turn less than 3 : [].

In fact, any Haskell type has a complete semilattice of values in this definedness order, which just means that given any nonempty collection of values, there is always some unique greatest lower bound, which is the most-defined possible value that’s still less defined than, but compatible with, all of them. For example, for lists of numbers, the greatest lower bound of the set {[3, 2], [3, 1, 5], [3, 4, 5, ⊥]} is 3 : ⊥ : ⊥, since all three lists are compatible with this value, but making it any more defined would have to exclude some of them.

An important attribute of Haskell functions is that they *preserve* this definedness order. In other words, if *x* is less than or equal to *y* in this order (meaning that *x* is compatible with *y* but possibly less defined), then *f*(*x*) is also less than or equal to *f*(*y*). If you think about it, this makes sense. If knowing a little bit about the input gives you some knowledge about the output, then learning *more* about the input might tell you *more* about the output, but it should never contradict or take away from what you already knew.

Now here’s where everything starts to come together. The fix combinator always produces the *least* fixpoint in this definedness ordering. This least fixpoint will be guaranteed to exist by the Knaster-Tarski theorem mentioned earlier, which says that any *order-preserving function* on a *complete semilattice* must also have a complete semilattice of fixpoints — and in particular, there must be a least one of them. Definedness is a complete semilattice, and all Haskell functions are order-preserving, and that’s good enough to guarantee that the least fixpoint exists.

Let’s look at another example. Define threeAnd list = 3 : list. A fixpoint here is a list that is not changed at all by prepending a 3 onto it. It can’t be ⊥, because 3 : ⊥ is definitely not the same as ⊥. The answer is an *infinite list* of 3s, so fix threeAnd gives you exactly that: an infinite list of 3s. We can check this with GHCi.

ghci> import Data.Function

ghci> fix (3 :)

[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ^C

Interrupted

The reason that fixpoints play such a dominant role in functional programming is that they are intricately related to recursion, and that relationship is an important bridge between the operational realm (understanding what the program *does*) and the denotational realm (understanding what the program *means*).

In the operational sense, Haskell’s fixpoints can be defined using recursion, like this:

fix f = x where x = f x

This definition seems almost too cute to work, but it does! It just starts with an *x*, and then keeps replacing every *x* with *f*(*x*). After an infinite number of substitutions, we find that the least fixpoint of *f* is just *f*(*f*(*f*(*f*(*f*(*f*(*f*(*f*(…)))))))). In other words, any time we need to refer to the input of the function, we just substitute another function application instead.

In the opposite direction, though, general recursion can be defined in terms of the fixpoint combinator. Suppose you have a programming language with no direct recursion allowed, but you are given a fixpoint combinator. Then there’s a simple syntactic sugar for recovering general recursion. Just take your recursive definition, like x = ... x ..., and rewrite it with an extra parameter that takes the place of recursive uses: x1 x0 = ... x0 .... Notice that x1 is *not* defined recursively. But setting x = fix x1 satisfies the equation given for x.
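For example, here is the factorial function recovered from fix in exactly this way:

```haskell
import Data.Function (fix)

-- The recursive definition fact n = if n == 0 then 1 else n * fact (n - 1),
-- rewritten with the recursive use pulled out as an extra parameter.
-- Note that fact1 is not itself recursive.
fact1 :: (Integer -> Integer) -> Integer -> Integer
fact1 rec n = if n == 0 then 1 else n * rec (n - 1)

-- Tying the knot with fix satisfies the original equation.
fact :: Integer -> Integer
fact = fix fact1
```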

In the untyped lambda calculus, fix is known as the Y combinator, and it’s a crucial step to showing that the untyped lambda calculus is computationally universal. That’s because lambda calculus doesn’t have general recursion as an a priori language feature. But Y combinators can be built within the untyped lambda calculus itself, so that general recursion can be simulated with fixpoints as described above.

Once you add types, things change a little bit. It’s impossible to define a Y combinator directly in most sound typed lambda calculi, because it would allow for nontermination, which typed lambda calculi usually try to avoid. When these calculi are used as the basis for general programming languages, the first step is usually to introduce some kind of general recursion (such as a recursive *let*) to allow for nonterminating programs to be written.

There’s still an important place for a fixpoint combinator, though! If we care only about what our programs do, then adding recursion might be sufficient. But recursion literally means defining things in terms of themselves, and that’s not the safest thing to do if you would like words to keep having meanings that make sense. So what does a recursive definition even mean?

Fortunately, the Knaster-Tarski theorem guarantees that any order-preserving function has a least fixpoint in the definedness order! So, at least for the pure fragment of Haskell, it’s okay to just *define* the meaning of a recursive definition to be the least fixpoint of the related non-recursive function obtained by pulling recursive uses into a parameter, exactly as we did above. This ensures that any recursive definition can be given a solid meaning, even though on face value it’s just a circular definition.

(Incidentally, fixpoints are a nice way out of all kinds of self-referential paradoxes like this — what Hofstadter called “strange loops”. So next time you’re debating your science fiction geek friends about time travel and the integrity of the space-time continuum, remember that there’s no paradox in time travel as long as the universe is a fixpoint of the function obtained by pulling the back references out as parameters.)

Looping back, then, we could now redefine the fixpoint function from the introduction — using iterative refinement to close in on the right value of the square root — using the least fixpoint of a Haskell function.
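Here is one way that might look (a sketch with my own names): the iterative search takes its own recursive continuation as an extra parameter, and fix ties the knot:

```haskell
import Data.Function (fix)

-- The square root of n via Heron's method, where the iteration itself
-- is the least fixpoint of a non-recursive step function.
heronSqrt :: Double -> Double
heronSqrt n = go 1
  where
    go = fix (\rec guess ->
      let next = (guess + n / guess) / 2
      in if abs (next - guess) < 1e-9 then next else rec next)
```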

Putting together all that we’ve talked about, here’s a question. Suppose you get frustrated with fixing your Haskell code, and type the following into GHCi:

ghci> import Data.Function

ghci> fix error

What happens next? Give it some thought before reading on.

To solve the puzzle, let’s first explore the types. The error function in Haskell has the type String -> a. It cannot reasonably have a fixpoint unless the domain and codomain are the same, so the type specializes to String -> String. Then fix error has the type String.

What is this magical string that fixes all errors? Well, it must be a fixpoint of the error function, but the result of the error function is *always* ⊥. That is its whole purpose in life! So fix error must be ⊥, too. Not so special after all?

Ah, but on the operational side, there are many kinds of ⊥. Some are just infinite loops, while others throw exceptions. If they throw exceptions, then there’s an exception value to think about. These are all invisible to the pure fragment of Haskell, but immensely important to what the program does! To understand what’s going on here, recall that fix error = error (error (error ...))), an infinite nesting of error functions. The result is quite interesting.

ghci> fix error

"*** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception: *** Exception:^C

Interrupted

Let’s just say it did not fix any errors. Instead, it threw an exception, with an associated exception message that itself throws an exception when printed, and that exception has another exploding error message, and so on as long as you allow it to continue!

**First of all**, if you’re reading this and will be at ICFP, please consider coming to dinner on Monday night, Sep 24, to talk with other interested people about Haskell (and Haskell-like languages) in K-12 education. Please RSVP here as soon as you can.

There have been a number of changes made to the CodeWorld platform lately.

- It’s the end of the summer, and if you missed Krystal Maughan’s write-up of her summer contributions for Google Summer of Code, you should really check it out. Krystal added some great new debugging features, including an implementation of the time-traveling debugger made famous by Elm, tools for running programs in slow motion or fast forward, and support for zooming and changing the view during debugging.
- Richard Cook also contributed some code to offer a local authentication database as an alternative to Google OAuth integration. This should help anyone who is setting up their own CodeWorld server and wants to create accounts manually or with a script.

CodeWorld is being taught in quite a few places now!

- In Louisiana, a pilot program is continuing to teach computational thinking with CodeWorld, organized by a group at LSU. This program reaches 350 students, taught by teachers at many schools who were trained over the summer.
- In a separate effort in Louisiana, middle school students have seen a brief introduction to CodeWorld as part of a survey course that also includes p5.js, Arduino, and web programming.
- In California, Brooklyn Cook is teaching CodeWorld to a new class, keeping it alive there even after my move to New York.
- Together with a coworker and a local teacher, I’m leading a class at The Clinton School in New York. This is CodeWorld’s first entry into the New York area. Schools here start quite a bit later in the year, so we’ve just had our first class today.

I’ve been working on updating the Guide to match our current best understanding of how to teach CodeWorld. Every time I do this, though, it gets quite a lot longer. My plan is to try to keep up with my own class this school year, so I’ll have a good chunk of the updates done by the end of the year. I’ve also switched the Guide over to markdeep, a powerful variant of markdown that makes for a surprisingly complete publishing platform. I’ve extended markdeep locally with some features like collapsible notes that can be expanded for more information. Just seeing the results makes me excited to be writing the guide again!

Fernando Alegre at Louisiana State University is also building a separate curriculum focused on 9th grade, with different elements and API entry points.

*You would be wrong, though.*

I *am* still hopeful that this works out for the best. It might raise enough awareness, and motivate and fund enough school districts, to offset the harm of the standards themselves. But the move to quickly adopt standards for K-12 computer science is deeply misguided, and I believe we will likely come to regret the push for early standardization.

This is an unusual thing for me to say. The push for updated standards like the Common Core, NGSS, etc., has been a huge part of the education reform movement over the past decade, and it’s admittedly been largely successful. I certainly don’t align myself with those naysayers who see the Common Core as government overreach or argue that children need more drills on long division. But computer science is different, in a few ways.

It might seem like computer science has been around forever, but it’s pretty new by any measure. The first computers themselves were built barely 70 years ago — the blink of an eye compared to the storied millennia of mathematics, language, science, and history. And yet even that 70 years is misleading. Software in 1948 was a very different beast from what it looks like today. It doesn’t look like that pace of change is slowing. If anything, the advent of new fundamental ways of looking at computing is proceeding at breakneck speed.

Computer science shows telltale signs of immaturity. The common university computer science curriculum looks like a list of all the things we thought computer programmers needed to do 30 years ago. There are courses in writing operating systems, implementing compilers, assembly language, databases, network programming, graphics, and artificial intelligence. Compare this to mathematics or science, and you’ll find a quite different story, with a mature field organized around key ideas, not tasks.

We aren’t talking about computer science, though, but about computer science education. And that is far, far newer and less certain than the topic as a whole. The current push for computer science education is maybe 15 years old. There have been older generations of the effort, to be sure, but without a lot of success to show for it.

The current thinking in computer science education, led by Scratch and Hour of Code, is that CS can be made accessible to younger children with drag-and-drop tools and game-like learning environments. Yet most success stories come without real learning objectives, much less authentic transfer of skills. Researchers often measure *interest* instead of *outcomes*, and focus on success or failure rather than building a theory of how students make meaning and learn.

The standards themselves show telltale signs of this. If you compare with the Common Core, the first and most striking difference is how much of California’s new CS standards are about auxiliary concerns — they talk a lot about making sure students know to respect copyright and can talk about the social implications of computing (something that is important, but belongs in a civics curriculum!), but fall very short in enumerating the skills students need to build toward robust technical competency. There’s a reason for that: we don’t really understand those early technical skills even remotely well enough to say what they are, much less where a student should be at which grade levels!

A third challenge is that the professional practice of computer programming is a bewildering mixture of different notations, languages, libraries, paradigms, and tools. Great debates can be found, both in professional practice and in the education space, about the choice of tools and notations. Little if anything is known about how well children transfer expertise in one tool (especially drag-and-drop languages, for example) to others, and the early hints we do have don’t look so promising.

Standardization implies that we’ve converged on some consensus, and it’s time to get everyone on the same page. In computer science today, what page is that? By reading the standards, one can only conclude that the page we’re aiming at is a very shallow surface understanding of technical content, in an effort to be as vague as possible about what that content might be.

Finally, it’s important to look back at why we’re doing this. Not every student will develop computer software for a living. But we have taken as an article of faith (and I agree with this wholeheartedly) that easy and ubiquitous computation has changed the world, creating new opportunities for learning, and new ways of thinking and understanding. These are important not just for computer programmers, but for educated citizens in the world at large.

If we believe this, though, why does our computer science education, even in elementary school, look so much like the profession of software engineering? Why are we using languages that either are built for industry (like Python, JavaScript, and later Java), or mimic their structure and organization (like Scratch or Hour of Code)? There’s been much debate in recent years over how we define computational thinking in a way that keeps faith with its meaning as a universal skill. A great answer is that it’s about using decomposition, abstraction, and modeling to tackle the kinds of enormously complex tasks that arise when the grunt work of computation becomes trivial. Today’s standards don’t reflect that answer.

One could ask, even if these aren’t the best standards, why *not* have standards? Even before adoption, my colleagues in California have already been asked The Question. “Is your activity aligned with the proposed CS standards?” If the standards talked about syntax, this question would exclude using block-based languages. They don’t, though, because there are powerful groups working on block-based languages. This is a huge gap, since learning to be creative in formal notations is a huge part of what students stand to gain from learning computer science. The standards do talk about loops and mutating variables, so this question makes it harder to justify functional programming — despite the fact that Bootstrap, a curriculum based on functional programming, shows the best research evidence I’ve seen of transfer learning between computer science and mathematics.

There’s an even bigger problem: computer science is an inherently creative field, and the way standards are used in STEM stands in the way of students’ ability to create.

Maybe you’ve decided to do some weight-lifting. You might join a gym and try a circuit on the weight machines. *Big mistake*, say almost all experts! Instead, they advise you to use free weights. Weight machines isolate and develop a small set of major muscles. But lifting free weights also builds a bunch of little muscles that you need for balance and versatility.

In a very similar way, building interesting and complex things in computer science develops a huge number of small skills, without which you might pass an exam, but you will struggle to build new things on your own. Standards are like the major muscles: if you are missing one in your training routine, it’s worth fixing. But it’s not sufficient to isolate just the standards, and train each one in isolation. There’s a whole second tier of skills that, while each one may not be critical, leaves students in a precarious position if they don’t pick up any!

Unfortunately, the current educational climate is about lessons carefully aligned to standards: the educational equivalent of weight machines. When they learn to read, students read a selection of books that give them broad exposure to examples of literary value. But almost everywhere else, teachers are expected to choose resources, tools, and curriculum that are stripped down to spend time only on the specific skills mentioned in the standards document. This laser-like focus on specific standards doesn’t do students any good if we want them to apply the content independently and creatively!

The push for California to adopt computer science standards was supported by most of the CS education community. I’d suggest this was not at all about a feeling that there’s too little standardization! Instead, it’s just because issuing standards is *what the board of education does* to encourage things like CS education. It was about recognizing the *need* for CS, not controlling the content.

But standards don’t fix budget problems. They don’t fix teacher education. And they don’t replace high-quality curriculum. Instead of pushing for the state to *declare* what teachers should be doing about computer science, I much prefer to help teachers do what they already know is good for their students.

This meeting is for anyone around the world with an interest in teaching Haskell and related languages at the K-12 (that is, pre-university) level. We hope to find a critical mass of people to build a connected and supportive community. In this community, we can share experiences and resources, ask for and receive help, and encourage each other as we strive to bring functional programming to a new generation.

Personally, I’ve been working for six years on using Haskell to build algebraic thinking skills in middle school students. I am not alone. Emmanuel Schanzer and many others have explored the same approach with Bootstrap. Christopher Anand and colleagues have done the same with Elm at McMaster University as early as 4th grade, and the Gordon Cain Center at Louisiana State University has trained teachers using CodeWorld for a 9th grade computational thinking class for hundreds of students. Just as importantly, the Haskell community is built of thousands of individuals who have something unique to contribute to your own communities and children.

You’re invited to come discuss our common goals, how we can organize and teach, which tools and libraries we find helpful, what technology is missing that could help the effort move forward, and even how to scale our efforts to reach as many children as possible around the world.

Let’s meet for dinner about 7:00 pm on Monday the 24th. Please RSVP here. I hope to see you there!

Personally, I think when people talk about computational thinking, they mostly have the right set of ideas in mind — decomposition, abstraction, and modeling. There’s another bit, though, sort of underlying all of these, that’s harder to name. It’s the sense that, instead of trying to *first* understand and *then* formalize, we can build understanding through the creation of a formally validated symbolic representation. (There’s a beautiful section of the SICP text in which the application of more and more abstraction to a procedure for computing square roots leads to exploring its connections with important ideas like fixpoints, signal damping, and derivatives. In lecturing on this at IBM, Abelson commented on how this lets us comprehend the meaning of Heron of Alexandria’s process, in a way that even Heron could not access.)
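To make that SICP passage concrete, here is a rough Haskell transcription of the idea (the names fixedPoint and averageDamp echo SICP’s, but the details and tolerances here are my own sketch):

```haskell
-- Iterate f from a starting guess until successive values agree.
fixedPoint :: (Double -> Double) -> Double -> Double
fixedPoint f guess
  | abs (next - guess) < 1e-9 = next
  | otherwise                 = fixedPoint f next
  where next = f guess

-- Average damping: the map y -> x/y oscillates between two values,
-- but averaging each output with its input makes iteration converge.
averageDamp :: (Double -> Double) -> (Double -> Double)
averageDamp f y = (y + f y) / 2

-- sqrt x is, by definition, a fixpoint of y -> x / y.
sqrtViaFixpoint :: Double -> Double
sqrtViaFixpoint x = fixedPoint (averageDamp (\y -> x / y)) 1.0
```

Each abstraction step (fixpoints, then damping) is exactly the kind of move the paragraph describes: the formal representation is what lets us see that Heron’s method and fixpoint search are the same process.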

The validation is necessary, by the way, as a crutch, because we’ve left the realm in which intuition works well as a guide. Having some intuition is a good thing, but when trying to state things formally, we always make mistakes, and they are usually not even subtle ones. They are glaringly obvious things that make our work nonsense even if we know what we meant to build. All computation environments do some amount of validation, even if it’s just checking syntax, or making the bad behaviors of our broken models obvious and observable. Some do more than that, such as checking consistent use of types or abstraction boundaries.

This, I think, is one way in which the computer is relevant to computational thinking. Without constant validation, we still go through qualitatively the same abstraction process, but each step toward abstraction introduces a new possibility of error. The work of carefully verifying each step grows to overwhelm the invention of new abstractions, limiting its feasible scope. Similarly, computer scientists are by no means the first to approach difficult tasks by structured decomposition. Yet, when you are designing a bridge, the work of the engineer is necessarily dwarfed by the hundreds of laborers pouring concrete and welding beams, not to mention mining iron or milling parts! It gives quite a different perspective when coping with complexity is the *main* determinant of success or failure.

The problem is that people say all these things, but then they teach what they know. And for many of us who were trained as computer programmers, what we know is the *practice* of computer programming. Some of that practice is relevant to computational thinking, but much of it is an accident of history, or a consequence of which company had more marketing money in which decades. Even when a skill is well justified as technology, it’s a separate question whether it is part of that *core* that gets at the essence of how computation opens up new worlds.

Computer programmers are attached to their tools, but the UNIX command line, integrated development environments, and revision control systems are surely not part of that core. Maybe there’s not much debate on that point. Loops, though, are a main player in even elementary school computing activities. Should they be? They seem to me to have been invented entirely for the practice of computer programming, and are not at all important to abstraction, decomposition, or modeling.

Alas, it seems I have written a blog post in this comment. I might recycle a lot of it in my later recap after having thought this over more!

Without further ado, my list of reflection prompts:

- Since Jeannette Wing’s 2006 introduction, the phrase “computational thinking” has resurrected the notion that within computer programming lies a universally applicable skill. Since then, politicians and other social leaders have jumped on the bandwagon of learning to code. Why has computational thinking become such a rallying cry for CS education? What are its implications? What are people looking for that they didn’t find in “programming”, “coding”, or “computer science”?
- Jeannette Wing defines computational thinking as the *thought processes involved in formulating problems and their solutions so that the solutions are represented in a form that can be carried out by an information processing agent.* There is disagreement on this definition. What should computational thinking mean? How is it different from programming? From computer science? How is it related to abstraction, decomposition, algorithmic thinking, mathematical thinking, logical thinking, computational modeling, or digital literacy?
- Computational thinking is described as a universal skill set that is applicable to everyone, and not just computer scientists. Is this fair? If the current generation of computational thinking curriculum looks a lot like computer science, how are we living up to that standard? If the skill set is so universal, where are the examples from before computer science existed as a topic of study?
- The skills needed to both use and create computational artifacts are changing rapidly. How does this impact the meaning of computational thinking? Suppose that, in a few decades, most computer software will be created by non-specialists using informal language and NLP; what does that mean for computational thinking?
- The authors compare computational thinking with computational literacy, which includes social concerns and computation as a medium for math, science, or art. What can we say about the relationship between the two? What, if anything, distinguishes these ideas?
- The elements of computational thinking are listed as abstraction, information processing, symbolic representations, algorithmic flow control, decomposition, procedural (iterative, recursive, and parallel) thinking, conditional logic, efficiency and performance, and debugging. How do these topics really fit together? How would we see them differently if computer science were not involved?
- There are hundreds of very different competing programming languages and tools vying for use in computing classes. How do we identify a universal body of knowledge in such an environment? What does this do to the possibility of transfer learning? Can we assess computational thinking in the absence of assumptions about languages and tools?
- The authors point out that Scratch, the most popular K-8 programming language, is missing procedural abstraction, even though abstraction is “indisputably” the cornerstone of computational thinking. What other important ideas are missing from common tools? What can we do about it?
- The authors claim that few efforts in computational thinking take into account socio-cultural and situated learning (i.e., how we learn through apprenticeships and relationships with others), distributed and embodied cognition (i.e., how cognitive processes take place across the whole body and through communication between multiple people), or activity, interaction, and discourse analyses (i.e., study of the way communication happens in the classroom). How would these perspectives change our view of computational thinking?