Be careful -- this doesn't fully memoize recursive functions unless you force the computation of intermediate results, since the recursive calls don't use memoization:
(define memo-fib (memoize fib)) ;; example from SICP
(memo-fib 40)
;; long wait
I would love to see a good general-purpose 'memoize but rather doubt it's possible, though I think I remember seeing one in Common Lisp in "Paradigms of Artificial Intelligence Programming" that exploited CL's weird namespacing rules for functions to make recursive functions like 'fib run fast.
You can solve the problem with the parent's memoization by writing the original fib using open recursion and closing it with a fixed-point operator. Doesn't help for library functions or things like that, though.
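Something like this Racket sketch, for instance (fib-open, memo-fib, and the hash table are all made up here; any caching wrapper would do): write fib in open-recursive style, then tie the knot through the memoized wrapper so the recursive calls also hit the table.

(define (fib-open self n)            ; "open" fib: recursion goes through self
  (if (< n 2) n (+ (self (- n 1)) (self (- n 2)))))

(define memo-fib                     ; close the recursion through a memoizing wrapper
  (let ([table (make-hash)])
    (letrec ([f (lambda (n)
                  (hash-ref! table n (lambda () (fib-open f n))))])
      f)))

(memo-fib 40) ; fast: every recursive call now goes through the table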
I think this neglects a performance consideration: dynamic programming is often simpler for a compiler to optimize, since it's generally a set of nested loops. Sometimes you can vectorize operations, but often you can just exploit locality to keep the whole active set in L1 cache (not a compiler optimization per se).
Memoization, however, is much more of a black-box. Very few compilers will handle the branching present in memoized function calls in the same way they'd handle a loop (which is also a branch, but a more predictable one).
That all being said, memoization is often simpler to reason about and great for infrequent computations.
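To make the "nested loops" point concrete, here's a rough Racket sketch (grid-paths is a made-up example that counts monotone lattice paths in a grid): the whole bottom-up computation is two loops walking a flat vector in order, which is exactly the kind of access pattern that stays in cache and that compilers handle well.

(define (grid-paths rows cols)
  ;; one flat vector as the DP table; the first row and column stay 1
  (define table (make-vector (* rows cols) 1))
  (define (ref r c) (vector-ref table (+ (* r cols) c)))
  (for* ([r (in-range 1 rows)]
         [c (in-range 1 cols)])
    (vector-set! table (+ (* r cols) c)
                 (+ (ref (- r 1) c) (ref r (- c 1)))))
  (ref (- rows 1) (- cols 1)))

(grid-paths 3 3) ; => 6, the number of monotone paths through a 3x3 grid of nodes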
This won't help everywhere, but sometimes you can give the compiler a hint that a certain branch is less likely to be taken. The computation branch is going to be slower anyway, and only happen once, so tell the compiler to assume the lookup branch will be taken most of the time.
Right, but then we'd be talking about the difference between a "directed tree" and a DAG...or a "rooted tree" and a DAG. But yea, I'm sure grand-parent gets the point by now
Yep, thanks. I realize that in graph theory trees aren't directed, but in programming, saying "tree" almost always implies a parent-child relationship (i.e., directedness).
I was missing that nodes in a DAG can have two "parents", that makes sense.
In programming lingo a tree is usually a directed acyclic graph in which every node has exactly one incoming edge (one parent) except for the root node, which has no parents. A "cycle" in a directed graph is a loop that connects a node to itself along directed edges, so "acyclic" just means there aren't any loops. In this case, that means no circular dependencies among the computations.
There's also a mathematical definition of tree which is occasionally used in theoretical CS, so you have to be careful about getting them mixed up, but that kind of tree is an undirected acyclic graph. The kind of tree they're talking about in the article is directed, because it represents computational dependencies. (If A is connected to B, then either A depends on B or B depends on A. The dependency only goes one way.)
The defining difference between a tree and a directed acyclic graph is that a node in a DAG can have more than one incoming edge. So a DAG is like a tree in which nodes can have more than one parent. Here's the simplest illustration. I can't draw arrows, but imagine each edge being directed from top to bottom, so that A is the root of the tree:
    A          (a tree, therefore also a DAG)
   / \
  B   C
 /
D

    A          (a DAG, but not a tree)
   / \
  B   C
   \ /
    D
In the tree, D can have only one incoming edge (only one parent) so there can only be one path from A to D along the directed edges. In the DAG, D can have multiple incoming edges (from B and C in this case) so there can be multiple paths from A to D. Direction is important to keep in mind (and I really wish I could draw it.) Note that in the DAG, A-B-D-C-A isn't a loop because D-C-A goes against the direction of the edges: A->B->D<-C<-A. Also note that if the tree were not directed, it would be useless for representing dependencies, because it would not tell us whether C depends on A and D, or D depends on C and A depends on C, or maybe D depends on C which depends on A which depends on B. The direction of the edges is what tells us that A depends on C and not vice-versa.
Here's what the article means by converting a tree into a DAG for computation. In the following illustration, I'm going to use letters to label the nodes, but different nodes in the same graph are different, even if they have the same label. Here's the tree:
        A
       / \
      B   C
     /   / \
    D   E   D
     \       \
      F       F
Suppose the node labels represent computations, and the edges represent dependencies. A depends on the results of B and C, B depends on the result of D, C depends on the results of E and D, and D depends on the result of F. If you perform all the computations as they are represented in this tree, D and F will be computed twice. This might be inefficient. So you take nodes with the same labels and identify them, make them the same. That gives you a different graph which is no longer a tree:
        A
       / \
      B   C
       \ / \
        D   E
         \
          F
This DAG represents the same dependencies that the tree above does, and since nodes with identical labels have been combined, each computation is represented once. The tree representation is easier to create, because you don't have to worry about finding and combining duplicate nodes, but the DAG is more efficient to compute.
The idea of memoization is that each node in the tree should be the name of a computation, and the computation itself should be looked up by name. That way even if a name occurs multiple times in the tree, the computation it names will only occur once. The computations named "B" and "C" both depend on a computation named "D" which they will look up by name. They don't have to know they share a dependency, and they don't even have to reference the same copy of the name "D". This extra layer of indirection is the "black box of memoization" that implicitly turns the tree into a DAG to avoid replicating computations D and F.
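As a rough Racket sketch of that indirection (deps, cache, and compute are invented names, and the "result" of each computation here is just its own name):

(define deps   ; hypothetical dependency table mirroring the tree above
  (hash 'A '(B C) 'B '(D) 'C '(E D) 'D '(F) 'E '() 'F '()))

(define cache (make-hash))

(define (compute name)
  (hash-ref! cache name                 ; look the result up by name...
             (lambda ()                 ; ...and compute it only on a miss
               (for-each compute (hash-ref deps name))
               (printf "computing ~a\n" name)
               name)))                  ; stand-in for the real result

(compute 'A) ; "computing D" and "computing F" each appear once, despite the two paths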
I should add: a consequence of allowing more than one incoming edge is that a DAG can have more than one "root," like this:
A   B
 \ /
  C
With trees, if you have two roots, you'll have two disjoint trees, because there's no way for their descendants to meet without some node having multiple parents.
A tree is a DAG that is connected: if a and b are nodes of a tree, there is a path between a and b. But in general a DAG doesn't have to be connected. So all trees are DAGs but not all DAGs are trees: some are forests i.e. unions of trees.
Trees are, in general, undirected (although it is often convenient to treat edges in rooted trees as being all directed towards or away from the root; to determine whether or not a graph is a tree it is necessary to consider the edges as undirected).
A tree is a connected undirected acyclic graph (or, equivalently, an undirected graph in which there is exactly one path between any pair of vertices).
A DAG may represent a tree (usually rooted) or it may not. Not all connected DAGs are trees, e.g. the diamond-shaped DAG drawn upthread, where D has two parents but the graph is still connected.
This is a really interesting way to think about computations. I mostly think of memoization for I/O things like database lookups, or long computations. I'd never thought of it in comparison to dynamic programming algorithms.
Memoization and dynamic programming come together in top down DP algorithms. These are also referred to as recursion with memoization. The recursive function often looks something like:
f(arguments) {
    result = memoTable[arguments]
    if (result)
        return result              // cache hit: reuse the stored answer
    result = ...                   // expensive recursive calls to f
    memoTable[arguments] = result
    return result
}
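A concrete, if toy, Racket version of that pattern, with fib as the stock example (fib-table is just an illustrative name):

(define fib-table (make-hash))

(define (fib n)
  (cond
    [(hash-has-key? fib-table n)
     (hash-ref fib-table n)]                      ; cache hit: reuse the stored answer
    [else
     (let ([result (if (< n 2)
                       n
                       (+ (fib (- n 1)) (fib (- n 2))))]) ; expensive recursive calls
       (hash-set! fib-table n result)             ; remember it for next time
       result)]))

(fib 40) ; => 102334155, with each subproblem solved only once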
As far as I have understood, this is just regular memoization. I mean, often in functional languages one just turns a regular function into a memoized version by calling memoize(function). Sometimes that works to create a function that does what your code does, and sometimes it does not. When it does not I think that's a shortcoming of the language/implementation.
> I'd never thought of it in comparison to dynamic programming algorithms.
The blog post is using different terminology. Memoization isn't part of dynamic programming - it's just that top-down dynamic programming benefits from memoization.
The top-down Fibonacci is defined as fib(n) = fib(n-1) + fib(n-2). Since fib(n-1) and fib(n-2) are overlapping subproblems (they have to overlap for it to be an example of dynamic programming), memoization reduces the number of computations. But that's not to say that a non-memoizing solution isn't an example of dynamic programming.
The bottom-up dynamic programming is the iterative solution.
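For contrast, a sketch of the bottom-up version in Racket, which just iterates from the base cases upward and never needs a memo table:

(define (fib-bottom-up n)
  (let loop ([i 0] [a 0] [b 1])   ; invariant: a = fib(i), b = fib(i+1)
    (if (= i n)
        a
        (loop (+ i 1) b (+ a b)))))

(fib-bottom-up 40) ; => 102334155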
The blog post refers to top-down DP as memoization and to bottom-up DP as dynamic programming. I don't think this terminology is common (or correct).
I don't know that the terminology is common (or correct), but I find it pretty useful. It also corresponds intuitively to the way people in practice use these terms. I say keep it.
TLDR for OP: in both "DP" and "memoization" you construct a name-value table for a given function. In "memoization" you construct it reactively, in "DP" proactively - where "reactively" means "lazily as we need it," and "proactively" means "any way that isn't reactively."
(A DAG being just one way of storing this data structure. Generally it is just a cache of function results - which can be stored as a table, derivation graph, or whatever.)
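A tiny Racket sketch of that distinction (the squaring function is only a stand-in):

;; reactive ("memoization"): entries appear in the table only when asked for
(define reactive-table (make-hash))
(define (lookup n)
  (hash-ref! reactive-table n (lambda () (* n n))))

;; proactive ("DP"): fill the whole table up front, then only read from it
(define proactive-table
  (for/hash ([n (in-range 10)])
    (values n (* n n))))

(lookup 7)                    ; computes and caches 49 on first use
(hash-ref proactive-table 7)  ; 49 was already there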
Dynamic programming is an optimization technique based on divide and conquer, whereas memoization is an added optimization within the dynamic programming paradigm. One optimization we make while calculating the Fibonacci sequence is memoization, to avoid walking down the tree to the leaves every time.
#lang racket
(define memo-table (box empty))
(define-struct memo (key ans))

(define (memoize f)
  (lambda args
    (local ([define lookup (filter (lambda (v) (equal? args (memo-key v)))
                                   (unbox memo-table))])
      (if (empty? lookup)
          (local ([define ans (apply f args)])
            (begin
              (set-box! memo-table (cons (make-memo args ans) (unbox memo-table)))
              ans))
          (memo-ans (first lookup))))))
Edit: I guess HN doesn't like my formatting. Here's a readable version http://pastie.org/4591751
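For reference, hypothetical usage, tying back to the top of the thread: repeated calls with the same arguments hit memo-table, but (as noted there) the recursive calls inside a plain fib still bypass the wrapper.

(define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
(define memo-fib (memoize fib))
(memo-fib 25) ; computed the slow way the first time
(memo-fib 25) ; answered from memo-table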