
A Hacker's Introduction to Partial Evaluation

by Darius Bacon

created August 2002
updated 16 September 2002

Abstract: A partial evaluator automatically specializes a program with respect to some of its input. This article shows how the idea comes up naturally when you write program generators by hand, then builds a basic online partial evaluation library and puts it to work transforming an interpreter into a compiler.

Cite: Darius Bacon. A Hacker's Introduction to Partial Evaluation. The Lisp Magazine at lisp-p.org, 1, August 2002. http://www.lisp-p.org/peval/peval.cgi.

0. Introduction

Mainstream programmers think of writing an interpreter or a compiler as a major job, worth doing only for a major problem. They know this because the languages they use every day have big, serious implementations, and the compiler class they took in school had them write just one big, semi-serious compiler. Lispers know better: all the textbooks show how to write a Lisp interpreter in about a page of code. While their underlying Lisp system is just as big and serious as the mainstream compiler, they know how and why to whip up a little special-purpose minilanguage.

But even in the Lisp world, partial evaluation is more obscure. For most hackers, if they've heard of it at all, it was through either a fancy general-purpose partial evaluator or a toy example. This tutorial article tries to introduce the subject in the same style as our familiar one-page interpreters: small examples that both show the ideas in a tractable context and suggest how to use them on real problems. After studying this, hopefully, you'll be able to not just apply a general-purpose partial evaluator like PGG, but also write a small custom one targeted to your problem, as needed.

These ideas aren't new, not even the idea of popularizing them this way: see the further reading for original sources. (That's especially recommended if you don't know Scheme, as I present a lot of Scheme code without explanation. Some helper functions for this code are defined in Appendix A.)

Section 1 sets the stage by presenting a typical hand-written code generation function, the sort of thing thoroughly familiar to anyone who's written Lisp macros. We go through it in detail to bring out why we'd want to automate the job better. Sections 2 and 3 refactor this example and reformulate what we've done as symbolic execution. Section 4 applies symbolic execution to a bigger program: the rewrite-rule interpreter from Paradigms of AI Programming, converting it into a compiler almost automatically. Section 5 reflects on the result and the principles it illustrates.

1. Generating specialized code

Say we have a power function:

(define (power x n)
  (cond ((= n 0) 1)
        ((odd? n) (* x (power x (- n 1))))
        (else (square (power x (/ n 2))))))

(define (square x)
  (* x x))
and for speed we want custom versions of it for particular n, like (* x x) when n=2, (* x (* x x)) when n=3, and so on. Here's a first cut at the solution:

(define (emit-power x n)
  (cond ((= n 0) 1)
        ((odd? n) `(* x ,(emit-power x (- n 1))))
        (else `(square ,(emit-power x (/ n 2))))))

> (emit-power 'x 5)
; (* x (square (square (* x 1))))
(Interactive input appears after a > prompt; the response follows on a line beginning with a semicolon.) The emitted code looks right, but we wanted to make it as efficient as possible -- so, first off, the (* x 1) has got to go:

(define (emit-power x n)
  (cond ((= n 0) 1)
        ((= n 1) 'x)
        ((odd? n) `(* x ,(emit-power x (- n 1))))
        (else `(square ,(emit-power x (/ n 2))))))

> (emit-power 'x 5)
; (* x (square (square x)))
Better, but there are still those runtime calls to square. Let's inline them:

(define (emit-power x n)
  (cond ((= n 0) 1)
        ((= n 1) 'x)
        ((odd? n) `(* x ,(emit-power x (- n 1))))
        (else (emit-square (emit-power x (/ n 2))))))

(define (emit-square x)
  `(* ,x ,x))

> (emit-power 'x 5)
; (* x (* (* x x) (* x x)))
Wait a minute, we wound up duplicating the (* x x). We need to factor it explicitly:

(define (emit-square x)
  `(let ((y ,x))
     (* y y)))

> (emit-power 'x 5)
; (* x (let ((y (let ((y x)) (* y y))))
;        (* y y)))
Argh, too many lets! Get rid of the redundant one:

(define (emit-square x)
  (if (symbol? x)
      `(* ,x ,x)
      `(let ((y ,x))
         (* y y))))

> (emit-power 'x 5)
; (* x (let ((y (* x x))) (* y y)))
And we're done! Or are we? There's been a bug all along -- emit-power ignores its x parameter:

> (emit-power 'input 2)
; (* x x)
We need to unquote the x:

(define (emit-power x n)
  (cond ((= n 0) 1)
        ((= n 1) x)
        ((odd? n) `(* ,x ,(emit-power x (- n 1))))
        (else (emit-square (emit-power x (/ n 2))))))

> (emit-power 'input 2)
; (* input input)
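For good measure, here's a quick check (a transcript added here, not in the original) that the n=5 case now uses the right variable as well:

> (emit-power 'input 5)
; (* input (let ((y (* input input))) (* y y)))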

2. Symbolic execution

That wasn't hard, but it was kind of annoying and fiddly. Surely there's a better way? The code would have been simpler if we could depend on our Scheme compiler to optimize out the redundant let and the (* x 1) for us. In general, though, the compiler won't know all the optimizations we need; this article covers techniques that go beyond relying on the underlying compiler for everything. The next best thing is to push the optimizations into an underlying layer we can forget about, as if it were part of the compiler. So let's try gathering all the emission code into a new function, emit-*, where we'll put the optimizations:

(define (emit-power x n)
  (cond ((= n 0) 1)
        ((odd? n) (emit-* x (emit-power x (- n 1))))
        (else (emit-square (emit-power x (/ n 2))))))

(define (emit-square x)
  (emit-* x x))

(define (emit-* x y)
  `(* ,x ,y))

> (emit-power 'x 5)
; (* x (* (* (* x 1) (* x 1)) (* (* x 1) (* x 1))))
We're back to a completely unoptimized output. We need to restore the optimizations, putting them in emit-* this time:

(define (emit-* x y)
  (cond ((eqv? y 1) x)
        ((and (equal? x y)
              (not (symbol? x)))
         `(let ((y ,x))
            (* y y)))
        (else
         `(* ,x ,y))))

> (emit-power 'x 5)
; (* x (let ((y (* x x))) (* y y)))
Now we can use the original code unchanged to emit its specialized version, just by rebinding the * it uses:

(define (emit-power x n)
  (let ((* emit-*))

    (define (power x n)
      (cond ((= n 0) 1)
            ((odd? n) (* x (power x (- n 1))))
            (else (square (power x (/ n 2))))))

    (define (square x)
      (* x x))

    (power x n)))
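Note that only * needs rebinding: the other operations (=, odd?, -, /) apply to the statically known n. A quick check (an added transcript) gives the same output as before:

> (emit-power 'x 5)
; (* x (let ((y (* x x))) (* y y)))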
Looking back over this result suggests another way to think about what we're doing: to emit specialized code, run the original program on symbolic values instead of numbers. Where an operation takes a symbolic value, have it return code that computes the runtime value, instead of returning a value immediately. Running the program generates a trace of the operations to be done.

If this is truly general it should work on another example, so let's try something else, specializing it the `right' way from the start. We evaluate a polynomial represented by a list of coefficients, highest-order first:

(define (poly-value coeffs x)
  (foldl (lambda (value coeff)
           (+ (* value x) coeff))
         0
         coeffs))

> (poly-value '(5 0 1) 2)
; 21
If we specialize poly-value for particular polynomials, the residual code will use + and *. We need to define an emit-+, and while we're at it we'll factor out common code between the emitters and invest in some more optimizations -- defined in Appendix B.

(define (emit-poly-value coeffs x)
  (let ((* emit-*) (+ emit-+))
    (foldl (lambda (value coeff)
             (+ (* value x) coeff))
           0
           coeffs)))

> (emit-poly-value '(5 0 1) 'x)
; (+ (* (* 5 x) x) 1)
It's rather nice that we could write poly-value in as high-level a style as we liked, using foldl instead of a loop, without seeing any trace of that abstraction in the output code. This works for the same reason we rarely need to worry about the speed of macroexpansion, only we didn't even have to make the decision up front to write a macro instead of ordinary code.
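To actually run the residual code, we can wrap it in a lambda and eval it, just as section 4 will do with its compiled rules. (A sketch added here; poly-5-0-1 is a made-up name.)

(define poly-5-0-1
  (eval `(lambda (x) ,(emit-poly-value '(5 0 1) 'x))))

> (poly-5-0-1 2)
; 21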

3. Eliminating common subexpressions

There are several directions we could go from here: we could cover more of Scheme by writing emitters for more of its primitives; we could generate better residual code, in particular avoiding duplicated work; and we could try the technique on a bigger program (the subject of section 4).

We could tackle the first by writing emitters for all Scheme functions, but if we did that we'd find that it works poorly in general -- the let hack used in emit-* and emit-+ is too weak. If instead of square we'd started from something like

(define (foo x)
  (* x (- x)))
then our emit-* would not have noticed the code duplication. Here's a general common subexpression eliminator to solve this problem and simultaneously attack the second item above.

(define (with-cse receiver)
  (let ((bindings '()))

    (define (cseify emitter)
      (lambda operands
        (let ((exp (apply emitter operands)))
          (cond ((or (symbol? exp) (number? exp)) exp)
                ((assoc exp bindings) => cadr)
                (else (let ((name (fresh-name)))
                        (set! bindings (cons (list exp name) bindings))
                        name))))))

    (let ((exp (receiver cseify)))
      `(let* ,(reverse (map reverse bindings))
         ,exp))))
(with-cse receiver) returns an expression produced by calling (receiver cseify) but with common subexpressions eliminated. cseify is a function that transforms an emitter into an equivalent one that remembers and reuses the expressions produced, to avoid redundant code. receiver should return an expression built partly out of calls to emitters built with cseify.

This is clearer with an example:

(define (emit-poly-value coeffs x)
  (with-cse
   (lambda (cseify)
     (let ((* (cseify emit-*))
           (+ (cseify emit-+)))
       (foldl (lambda (value coeff)
                (+ (* value x) coeff))
              0
              coeffs)))))

> (emit-poly-value '(5 0 1) 'x)
; (let* ((x0 (* 5 x)) (x1 (* x0 x)) (x2 (+ x1 1))) x2)
Exercise: Improve with-cse so it doesn't bind the redundant variable x2 in the output code above.

Exercise: Specialize the n-point Fast Fourier Transform for constant n. It's possible to write the FFT in a straightforward, inefficient style (using O(n^2) operations) and let our specialization library optimize it to O(n log n) automatically. You'll still need to use the Danielson-Lanczos lemma: see Appendix C. (The answer is in Appendix D.)

Exercise: Extend the code to handle impure primitives like set-car! and write. This might work like:

> (with-cse (lambda (cseify cseify-impure)
>             (let ((write (cseify-impure 'write)))
>               (for-each write '(x y z)))))
; (let* ((x0 (write x)) (x1 (write y)) (x2 (write z))) #f)
Exercise: Instead of using with-cse, write a let macro shadowing the normal one. That is, when you use a costly computation more than once, it's normal for your code to have a let naming that computation, so you should be able to exploit this information instead of reconstructing it with with-cse. What other special forms would need to be shadowed to make this work transparently?

4. From interpreter to compiler

For our next trick, we'll take an interpreter and turn it into a compiler. The example interpreter is based on the rewrite-rule interpreter in Peter Norvig's Paradigms of AI Programming -- it's a nice example that also comes with a hand-coded compiler we can compare ours to. His code is in Common Lisp, so we reconstruct a simplified version in Scheme first. (Alternatively I could've ported the symbolic execution code to Common Lisp, which is probably simpler, but I just didn't feel like it. And using two different Lisp dialects in this article would narrow its audience even further.)

We can use rewrite rules to simplify algebraic expressions. For example,

> (simplify '(+ (* 3 x) (* x 3)))
; (* 6 x)
This works by applying a list of rules to all parts of the subject expression repeatedly until no more simplifications are possible:

(define *simplification-rules*
  '(((+ ?x ?x)          (* 2 ?x))
    ((* ?s ?n)          (* ?n ?s))
    ((* ?n (* ?m ?x))   (* (* ?n ?m) ?x))
    ((* ?x (* ?n ?y))   (* ?n (* ?x ?y)))
    ((* (* ?n ?x) ?y)   (* ?n (* ?x ?y)))))
The left hand column has patterns to match, while the right hand holds responses. The first rule says, if you see (+ foo foo), rewrite it into (* 2 foo). Variables like ?x can match anything, while ?m and ?n can only match numbers. (Norvig uses lots more rules; this is just a sample.)

So here's the code. I won't explain it in any detail, since this article isn't about writing interpreters.

;; Return exp simplified until no more rules apply.
(define (simplify exp)
  (if (pair? exp)
      (simplify-exp (map simplify exp))
      exp))

;; Simplify an expression whose subexpressions are already simplified.
(define (simplify-exp exp)
  (or (arithmetic-eval exp)
      (rule-based-translator exp *simplification-rules*
                             (lambda (env response)
                               (simplify (sublis env response))))
      exp))

;; Find a rule that matches input, and apply action to the matching
;; bindings and the response part of the matching rule.  If none
;; matches, return #f.
(define (rule-based-translator input rules action)
  (let checking ((rules rules))
    (and (not (null? rules))
         (let ((pattern (rule-pattern (car rules)))
               (response (rule-response (car rules))))
           (cond ((pat-match pattern input '())
                  => (lambda (env) (action env response)))
                 (else (checking (cdr rules))))))))

;; The fields of a rule.
(define rule-pattern car)
(define rule-response cadr)

;; Try to match pattern against input given existing bindings in
;; env, returning an extended env.  If this is impossible, return
;; #f. env may be #f already, and then no match is possible.
(define (pat-match pattern input env)
  (cond ((not env) #f)
        ((pair? pattern)
         (and (pair? input)
              (pat-match (cdr pattern) (cdr input)
                         (pat-match (car pattern) (car input)
                                    env))))
        ((variable? pattern)
         (match-variable pattern input env))
        ((equal? pattern input)
         env)
        (else #f)))

;; Try to match a variable var against input given env.
(define (match-variable var input env)
  (cond ((assq var env)                 ; already bound
         => (lambda (pair)
              (and (equal? input (cdr pair))
                   env)))
        ((memq var '(?m ?n))            ; numeric variables
         (and (number? input)
              (acons var input env)))
        (else (acons var input env))))

;; Return true iff x is a variable.
(define (variable? x)
  (and (symbol? x)
       (< 0 (string-length (symbol->string x)))
       (char=? #\? (string-ref (symbol->string x) 0))))

;; Perform the substitutions in env upon exp.
(define (sublis env exp)
  (cond ((null? exp) '())
        ((pair? exp)
         (cons (sublis env (car exp))
               (sublis env (cdr exp))))
        ((assq exp env) => cdr)
        (else exp)))
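For orientation, here are a couple of quick checks of the matcher (a transcript added here, not in the original):

> (pat-match '(+ ?x ?x) '(+ a a) '())
; ((?x . a))
> (pat-match '(* ?s ?n) '(* a b) '())
; #f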
Besides the rule-based translator, we also simplify with arithmetic-eval which just reduces constant expressions like (+ 2 3) to numbers.

;; Try to reduce exp as a constant expression; return #f if impossible.
(define (arithmetic-eval exp)
  (and (all number? (exp-args exp))
       (or (memq (exp-op exp) '(+ - * /))
           (and (eq? (exp-op exp) '^)
                (integer? (cadr (exp-args exp)))))
       (apply (eval-primitive-name (exp-op exp))
              (exp-args exp))))

(define (eval-primitive-name op)
  (cadr (assq op op-meanings)))

(define op-meanings
  `((+ ,+) (- ,-) (* ,*) (/ ,/) (^ ,expt)))

;; The parts of a compound expression: operator and arguments.
(define exp-op car)
(define exp-args cdr)
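For example (a couple of checks added here):

> (arithmetic-eval '(+ 2 3))
; 5
> (arithmetic-eval '(+ 2 x))
; #f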
Norvig sped up this simplifier by compiling out what rule-based-translator does with *simplification-rules* -- converting a list of rules into Lisp code that tests a subject expression and builds a new expression out of the response of the matching rule, if any. This is just what we'd get by partially evaluating rule-based-translator against *simplification-rules* -- can we do that, instead of hand-coding a compiler?

Let's look at which operations depend on the dynamic input. Each such operation will have to show up in the compiled code.

(define (rule-based-translator input rules action)
  (let checking ((rules rules))
    (and (not (null? rules))
         (let ((pattern (rule-pattern (car rules)))
               (response (rule-response (car rules))))
           (cond ((pat-match pattern input '())
                  => (lambda (env) (action env response)))
                 (else (checking (cdr rules))))))))
Here, all the rules are static -- only input and env vary dynamically. (env is an interesting case: we know that if we succeed in matching a particular pattern like (+ ?x ?x), we'll get an env like ((?x . something)) back, where something depends on dynamic input. This is called a partially static structure.) The cond is dynamic -- it depends on the match against dynamic data, and therefore we'll need a cond or equivalent in the compiled code.

(define (pat-match pattern input env)
  (cond ((not env) #f)
        ((pair? pattern)
         (and (pair? input)
              (pat-match (cdr pattern) (cdr input)
                         (pat-match (car pattern) (car input)
                                    env))))
        ((variable? pattern)
         (match-variable pattern input env))
        ((equal? pattern input)
         env)
        (else #f)))
For pat-match, the dynamic variables are input and env. Operations on them are (not env), (pair? input), (car input), (cdr input), and (equal? pattern input). All but the first are reasonable things to have to do at runtime, but why are we testing env for failure? env will be false only if some earlier step in the matching failed, and then we should simply return #f for the whole match. Instead, this code will discover failure at some point in the pattern tree, then repeatedly check for #f at each parent node back to the root. What was convenient in the interpreter leads to lousy code in the compiler, the root cause being that it unnecessarily let a dynamic dependence leak into its control flow. We can factor that dependence out with a continuation we'll name succeed:

;; Try to match pattern against input given existing bindings in
;; env, returning (succeed env2) where env2 is an extended env.
;; If instead we fail to match, return #f.
(define (pat-match pattern input env succeed)
  (cond ((pair? pattern)
         (and (pair? input)
              (pat-match (car pattern) (car input) env
                         (lambda (env)
                           (pat-match (cdr pattern) (cdr input) env
                                      succeed)))))
        ((variable? pattern)
         (match-variable pattern input env succeed))
        (else
         (and (equal? pattern input)
              (succeed env)))))
Note that we're adding the new parameter to match-variable as well. We need to change rule-based-translator to supply the new parameter:

(define (rule-based-translator input rules action)
  (let checking ((rules rules))
    (and (not (null? rules))
         (let ((pattern (rule-pattern (car rules)))
               (response (rule-response (car rules))))
           (or (pat-match pattern input '()
                          (lambda (env) (action env response)))
               (checking (cdr rules)))))))
match-variable changes in the same way. It adds another dynamic operation, (number? input):

(define (match-variable var input env succeed)
  (cond ((assq var env)
         => (lambda (pair)
              (and (equal? (cdr pair) input)
                   (succeed env))))
        ((memq var '(?m ?n))
         (and (number? input)
              (succeed (acons var input env))))
        (else (succeed (acons var input env)))))
Finally, sublis adds a cons operation. What about the association list functions acons and assq? These operate on partially static structures here, as mentioned above. assq will look up a static variable yielding a dynamic result. As long as we're happy with that dynamic expression appearing in the code wherever the assq generates it, we're home free -- we don't need to build the association list at runtime. Where is it safe for assq to move that expression? Anywhere, as long as it stays within the scope of the variables the expression references. (If the code also used side effects and recursion, we'd have to worry about evaluation order and termination, too.) But adding the succeed continuation didn't just eliminate a test, it also moved our build-a-result operation into the scope of the pattern match -- just what we needed! Hurray.

Concretely, for a rule like (foo ?x) rewriting to (bar ?x), the succeed made the difference between output code like:

(let ((env
       (and (pair? subject)
            (let ((t1 (car subject)) (t2 (cdr subject)))
              (and (eq? t1 'foo)
                   (pair? t2)
                   (null? (cdr t2))
                   (acons '?x (car t2) '()))))))
  (and env
       (list 'bar (cdr (assq '?x env)))))
and the much better:

(and (pair? subject)
     (let ((t1 (car subject)) (t2 (cdr subject)))
       (and (eq? t1 'foo)
            (pair? t2)
            (null? (cdr t2))
            (list 'bar (car t2)))))
Now we're ready to wrap the interpreter inside our partial evaluator, except for a couple of things: first, dynamic conditionals need special handling, because they interact with common subexpression elimination; second, we need a way to tell static values and dynamic values apart.

Tackling the second issue first: we leave static values alone and attach a unique tag to dynamic values. (Could we do it the other way around? Should we?)

(define dynamic-tag            (list 'dynamic))
(define (emit code)            (list dynamic-tag code))
(define (dynamic.code dynamic) (cadr dynamic))

(define (dynamic? obj)  (and (pair? obj) (eq? dynamic-tag (car obj))))
(define (static? obj)   (not (dynamic? obj)))

(define (as-code obj)
  (cond ((dynamic? obj)
         (dynamic.code obj))
        ((or (null? obj) (pair? obj) (symbol? obj))
         (list 'quote obj))
        (else obj)))

;; Generic emitter maker with constant folding
(define (make-emitter op op-name)
  (lambda operands
    (if (all static? operands)
        (apply op operands)
        (emit `(,op-name ,@(map as-code operands))))))
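To see the representation in action, here's a quick transcript (added here; the tag prints as (dynamic)):

> ((make-emitter car 'car) '(a b))
; a
> ((make-emitter car 'car) (emit 'subject))
; ((dynamic) (car subject))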
Dynamic conditionals interact with common subexpression elimination: if you generate code like (if foo (bar ...) (baz ...)), you don't want to eliminate subexpressions of (baz ...) that happen to match a subexpression of (bar ...). We'll handle this by having with-cse supply a new procedure, (run thunk), that calls (thunk) while keeping its subexpressions in a new subcontext of the current context.

(define (with-cse receiver)
  (let ((bindings '()))

    (define (run emitter-thunk)
      (let* ((old-bindings bindings)
             (value (emitter-thunk))
             (result (wrap-let* (list-difference bindings old-bindings)
                                value)))
        (set! bindings old-bindings)
        result))

    (define (cseify emitter)
      (lambda operands
        (let ((value (apply emitter operands)))
          (if (static? value)
              value
              (let ((code (dynamic.code value)))
                (cond ((symbol? code) value)
                      ((assoc code bindings)
                       => (lambda (pair) (emit (cadr pair))))
                      (else (let ((name (fresh-name)))
                              (set! bindings (cons (list code name) bindings))
                              (emit name)))))))))

    (define (wrap-let* bindings exp)
      (emit `(let* ,(reverse (map reverse bindings))
               ,(as-code exp))))

    (let ((value (receiver cseify run)))
      (as-code (wrap-let* bindings value)))))
Ready to roll!

(define (compile-rules)
  (with-cse
   (lambda (cse run)
     (let ((cons    (cse (make-emitter cons 'cons)))
           (car     (cse (make-emitter car 'car)))
           (cdr     (cse (make-emitter cdr 'cdr)))
           (pair?   (cse (make-emitter pair? 'pair?)))
           (number? (cse (make-emitter number? 'number?)))
           (equal?  (cse (make-emitter equal? 'equal?)))
           (%and    (lambda (test then-proc)
                      (emit-and test (run then-proc))))
           (%or     (lambda (test else-proc)
                      (emit-or test (run else-proc)))))

       (define (rule-based-translator input rules action)
         (let checking ((rules rules))
           (and (not (null? rules))
                (let ((pattern (rule-pattern (car rules)))
                      (response (rule-response (car rules))))
                  (%or (pat-match pattern input '()
                                  (lambda (env) (action env response)))
                       (lambda ()
                         (checking (cdr rules))))))))

       (define (pat-match pattern input env succeed)
         (cond ((pair? pattern)
                (%and (pair? input)
                      (lambda ()
                        (pat-match (car pattern) (car input) env
                                   (lambda (env)
                                     (pat-match (cdr pattern) (cdr input) env
                                                succeed))))))
               ((variable? pattern)
                (match-variable pattern input env succeed))
               (else
                (%and (equal? pattern input)
                      (lambda () (succeed env))))))

       (define (match-variable var input env succeed)
         (cond ((assq var env)
                => (lambda (pair)
                     (%and (equal? (cdr pair) input)
                           (lambda () (succeed env)))))
               ((memq var '(?m ?n))
                (%and (number? input)
                      (lambda () (succeed (acons var input env)))))
               (else (succeed (acons var input env)))))

       (define (sublis a-list exp)
         (cond ((null? exp) '())
               ((pair? exp)
                (cons (sublis a-list (car exp))
                      (sublis a-list (cdr exp))))
               ((assq exp a-list) => cdr)
               (else exp)))

       (rule-based-translator (emit 'subject)
                              *simplification-rules*
                              sublis)))))
All we've done to the body is change dynamic instances of and and or to %and and %or with thunked arguments; we could have avoided even that change by using macros (a sketch follows the emitters below). We still need emitters for those dynamic conditionals:

(define (emit-and . operands)
  (emit `(and ,@(map as-code operands))))

(define (emit-or . operands)
  (emit `(or ,@(map as-code operands))))
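As for the macro idea mentioned above, here's a sketch (an assumption, not the article's code). If the interpreter body sits inside a let-syntax like this one -- within compile-rules, where %and, %or, and static? are visible -- its two-argument and and or forms get their second argument thunked automatically, while static tests still short-circuit as usual, so the body needs no textual change:

(let-syntax ((and (syntax-rules ()
                    ((_ test then)
                     (let ((t test))
                       (if (static? t)
                           (if t then #f)                  ; static test: ordinary and
                           (%and t (lambda () then)))))))  ; dynamic test: emit code
             (or  (syntax-rules ()
                    ((_ test else)
                     (let ((t test))
                       (if (static? t)
                           (if t t else)                   ; static test: ordinary or
                           (%or t (lambda () else))))))))
  ;; ... pat-match, match-variable, and rule-based-translator as originally
  ;; written, followed by the same final call as before ...
  (rule-based-translator (emit 'subject) *simplification-rules* sublis))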
We also need a new top-level function to call the compiled code:

(define (simplify-exp exp)
  (cond ((arithmetic-eval exp))
        ((apply-rules exp) => simplify)
        (else exp)))

(define apply-rules
  (eval `(lambda (subject) ,(compile-rules))))
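With the compiled matcher plugged in, the simplifier behaves just as before; for instance (a check added here, repeating the transcript from the start of this section):

> (simplify '(+ (* 3 x) (* x 3)))
; (* 6 x)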

5. Reflections

How does this compiler compare to Norvig's? The structure is essentially the same, but his was built by looking at some hypothetical sample output and then writing new code by hand that would generate it. The continuation-passing came in as a eureka step. Our continuation was kind of eurekaish, too, but at least it wasn't needed to get a working compiler. A fancy partial evaluator might have found that optimization on its own, as a ``continuation-based binding-time improvement''.

Norvig does two optimizations we don't: first, his compiler assumes the subject expressions are arithmetic expressions with a fixed number of arguments, so the pattern-matching makes fewer tests on it. We could do the same with changes to the pattern matcher.

Second, different rules share tests they hold in common -- for example, if two successive patterns are (+ u v) and (+ x y), the compiled code only checks for a + once. This is a form of common subexpression elimination across branches, implemented with rewrite rules on the output code like (or (and x e1) (and x e2)) --> (and x (or e1 e2)). This is a reusable kind of optimization we could profitably add to our library.
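In the rule language from section 4, that optimization might be written as a rewrite rule over the output code -- a sketch, where the rule set name and the variables ?e1 and ?e2 are made up here:

(define *output-optimization-rules*
  '(((or (and ?x ?e1) (and ?x ?e2))   (and ?x (or ?e1 ?e2)))))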

Project: Generalize the code emission library, letting users supply custom optimization rules in the rewrite rule language we just implemented. Look up Norvig's and/or optimizations and implement them here; you'll find you need to fix with-cse to give variables canonical names instead of monotonically-increasing ones (otherwise pattern-matching on output code would miss opportunities when equivalent code uses different variables). Provide a way to generate output code with loops and recursions.

More important than the quality of the code is the principle we've just drawn out: a compiler can be derived from an interpreter automatically by analyzing how the interpreter depends on runtime input. (There's an implication for how to write interpreters: write so this analysis stays simple. Don't unnecessarily mix binding times.) More generally, we can think of many, many code optimization problems as moving binding times around, where the same logic applies.

We've also seen how simple, quickly understandable tools can deliver a significant fraction of the benefit of a sophisticated partial evaluator.

Exercise: Our simplifier still has a significant interpretive component: we only compiled away the rule-based-translator part. Write a compiler from a set of rules to a standalone Scheme program. Benchmark it against the simplifier above.

Further reading

Notes

It might seem like I'm implying Norvig should have written his match compiler my way, but that's not so at all -- hand-coding it made perfect sense in a book where this material would be a lengthy detour. The book also included a reference to the Berlin/Weise paper above.

Thanks to Richard Uhtenwoldt, Attila Lendvai, and Gene Michael Stover for problem reports and suggestions.

Appendix A: Helper Functions

;; Extend an association list.
(define (acons x y a-list)
  (cons (cons x y) a-list))

;; Return true iff test? is true of each element in ls.
(define (all test? ls)
  (or (null? ls)
      (and (test? (car ls))
           (all test? (cdr ls)))))

(define pi (* 2 (atan 1 0)))

(define fresh-name-counter -1)

(define (fresh-name)
  (set! fresh-name-counter (+ fresh-name-counter 1))
  (string->symbol
   (string-append "x" (number->string fresh-name-counter))))

;; Return the list of integers from 0 to p - 1.
(define (iota p)
  (let loop ((p p) (ls '()))
    (if (= p 0)
        ls
        (loop (- p 1)
              (cons (- p 1) ls)))))

(define (foldl fn acc ls)
  (if (null? ls)
      acc
      (foldl fn
             (fn acc (car ls))
             (cdr ls))))

;; Pre: small is eq? to some tail of big.
;; Return the head of big before that tail.
(define list-difference
  (lambda (big small)
    (let loop ((ls big))
      (if (eq? ls small)
          '()
          (cons (car ls)
                (loop (cdr ls)))))))

Appendix B: Emitters Mark II

(define (emit-* x y)
  (cond ((or (eqv? x 0) (eqv? y 0)) 0)
        ((eqv? x 1) y)
        ((eqv? y 1) x)
        (else (emit-binary-op * '* x y))))

(define (emit-+ x y)
  (cond ((eqv? x 0) y)
        ((eqv? y 0) x)
        (else (emit-binary-op + '+ x y))))

(define (emit-binary-op op op-name x y)
  (cond ((and (number? x) (number? y))
         (op x y))
        ((and (equal? x y)
              (not (symbol? x)))
         `(let ((y ,x))
            (,op-name y y)))
        (else
         `(,op-name ,x ,y))))
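Appendix D's solution also uses an emit-- which isn't shown in the original listings; a minimal version in the same style (an assumption, consistent with the output in Appendix D) would be:

(define (emit-- x y)
  (cond ((eqv? y 0) x)
        (else (emit-binary-op - '- x y))))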

Appendix C: Discrete Fourier Transform

An n-point Fourier transform evaluates a polynomial of order n at n points (the complex nth roots of 1), for a total of O(n^2) complex multiplications. The time can be reduced to O(n log n) when n is a power of 2, using a clever, tightly coded algorithm that exploits commonalities among the n evaluations. Or, if we're feeling lazy, we can just specialize the simple algorithm for a particular n and let the common subexpression eliminator take care of it all. Ours isn't smart enough to do that unless we help it along a little with the Danielson-Lanczos lemma:

for a polynomial p(x) = c0 + c1*x + c2*x^2 + ...
we rewrite it as p(x) = q(x^2) + x*r(x^2)
where q(x) = c0 + c2*x + ... (using the even-numbered coeffs)
and r(x) = c1 + c3*x + ... (using the odd-numbered coeffs)

This recursion is the central insight of the Fast Fourier Transform; we leave the remaining coding details for the partial evaluator to work out.

;; Pre: (length inputs) is a power of 2.
(define (dft inputs)
  (let ((n (length inputs)))
    (map (lambda (i) (poly inputs i n))
         (iota n))))

;; Evaluate the complex polynomial with the given coefficients
;; at argument (root-of-unity p n).
;; Pre: (length coeffs) is a power of 2.
(define (poly coeffs p n)
  (if (null? (cdr coeffs))
      (car coeffs)
      (alternate coeffs
                 (lambda (evens odds)
                   (let ((p2 (modulo (* 2 p) n)))
                     (complex+ (poly evens p2 n)
                               (complex* (root-of-unity p n)
                                         (poly odds p2 n))))))))

;; Return (receiver x y), where x is the elements of ls at
;; even positions, and y is the elements at odd positions.
;; E.g. for ls = '(A B C D), return (receiver  '(A C) '(B D)).
;; Pre: (length ls) is even.
(define (alternate ls receiver)
  (do ((ls     ls   (cddr ls))
       (evens  '()  (cons (car ls) evens))
       (odds   '()  (cons (cadr ls) odds)))
      ((null? ls)
       (receiver (reverse evens) (reverse odds)))))

;; Complex numbers - constructor and accessors.
(define (make-complex re im) (list re im))
(define re car)
(define im cadr)

(define (complex+ a b)
  (make-complex (+ (re a) (re b))
                (+ (im a) (im b))))

;; (a + ib)(x + iy) = ax - by + i(bx + ay)
(define (complex* a x)
  (make-complex (- (* (re a) (re x)) (* (im a) (im x)))
                (+ (* (im a) (re x)) (* (re a) (im x)))))

;; Return exp(2*pi*i*k/n) = cos(2*pi*k/n) + i*sin(2*pi*k/n).
(define (root-of-unity k n)
  (let ((theta (* (* 2 pi) (/ k n))))
    (make-complex (cos theta) (sin theta))))

Appendix D: Solution to the FFT Exercise

(define (emit-fft inputs)
  (with-cse
   (lambda (cse)
     (let ((* (cse emit-*))
           (+ (cse emit-+))
           (- (cse emit--)))

       ;; Return the discrete fourier transform of a list of complex numbers
       ;; (whose length must be a power of 2).
       (define (dft inputs)
         (let ((n (length inputs)))
           (map (lambda (i) (poly inputs i n))
                (iota n))))

       ;; Evaluate the complex polynomial with the given coefficients
       ;; at argument (root-of-unity p n).
       (define (poly coeffs p n)
         (if (null? (cdr coeffs))
             (car coeffs)
             (alternate coeffs
                        (lambda (evens odds)
                          (let ((p2 (modulo (* 2 p) n)))
                            (complex+ (poly evens p2 n)
                                      (complex* (root-of-unity p n)
                                                (poly odds p2 n))))))))

       ;; Return (receiver x y), where x is the elements of ls at
       ;; even positions, and y is the elements at odd positions.
       ;; E.g. for ls = '(A B C D), return (receiver  '(A C)  '(B D)).
       (define (alternate ls receiver)
         (do ((ls     ls   (cddr ls))
              (evens  '()  (cons (car ls) evens))
              (odds   '()  (cons (cadr ls) odds)))
             ((null? ls)
              (receiver (reverse evens) (reverse odds)))))

       (define (complex+ a b)
         (make-complex (+ (re a) (re b))
                       (+ (im a) (im b))))

       (define (complex* a b)
         (make-complex (- (* (re a) (re b)) (* (im a) (im b)))
                       (+ (* (im a) (re b)) (* (re a) (im b)))))

       (dft inputs)))))

(define (make-complex real imag) (list real imag))
(define re car)
(define im cadr)

;; Return exp(i*2*pi*k/n).
(define (root-of-unity k n)
  (let ((theta (* (* 2 pi) (/ k n))))
    (make-complex (cos theta) (sin theta))))

> (emit-fft (list (make-complex 'r0 'i0) (make-complex 'r1 'i1)))
; (let* ((x0 (+ r0 r1))
;        (x1 (+ i0 i1))
;        (x2 (* -1.0 r1))
;        (x3 (* 1.2246063538223772e-16 i1))
;        (x4 (- x2 x3))
;        (x5 (* 1.2246063538223772e-16 r1))
;        (x6 (* -1.0 i1))
;        (x7 (+ x5 x6))
;        (x8 (+ r0 x4))
;        (x9 (+ i0 x7)))
;    ((x0 x1) (x8 x9)))
Well, that didn't quite work, because (sin pi) is not 0 in our floating-point math, and similarly for cos. A hack around this:

;; Return exp(2*pi*i*k/n), with sin and cos patched to give exact zeros at the angles that matter.
(define (root-of-unity k n)
  (let ((theta (* (* 2 pi) (/ k n))))
    (make-complex (cosine theta) (sine theta))))

(define (sine x)
  (cond ((= x pi) 0)
        (else (sin x))))

(define (cosine x)
  (cond ((= x (/ pi 2)) 0)
        ((= x (* 1.5 pi)) 0)
        (else (cos x))))
More interesting is the way we cheated, above, by using our ordinary runtime representation of complex numbers, but with symbolic values in the real and imaginary components. We could have written emitters for make-complex and its accessors, and strictly speaking we should have, since there are residual make-complex operations at the end -- the last line of the output ought to be

;    ((make-complex x0 x1) (make-complex x8 x9)))
But we didn't bother because the emitters are just extra bureaucracy in this case -- leaving them out gives us the convenience of pretending the intermediate objects are ordinary complex numbers, and now we can patch up the last line at our leisure.
© 2002 by Darius Bacon