Be careful -- this doesn't fully memoize recursive functions unless you force the computation of intermediate results, since the recursive calls don't use memoization:
(define memo-fib (memoize fib)) ;; example from SICP
(memo-fib 40)
;; long wait
I would love to see a good general-purpose 'memoize but rather doubt it's possible, though I think I remember seeing one in Common Lisp in "Paradigms of Artificial Intelligence Programming" that exploited CL's weird namespacing rules for functions to make recursive functions like 'fib run fast.
You can solve the problem with the parent's memoization by writing the original fib using open recursion and closing it with a fixed-point operator. Doesn't help for library functions or things like that, though.
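Something like this Racket sketch, for instance (fib-open, memo-fib, and the hash table are all made up here; any caching wrapper would do): write fib in open-recursive style, then tie the knot through the memoized wrapper so the recursive calls also hit the table.

(define (fib-open self n)            ; "open" fib: recursion goes through self
  (if (< n 2) n (+ (self (- n 1)) (self (- n 2)))))

(define memo-fib                     ; close the recursion through a memoizing wrapper
  (let ([table (make-hash)])
    (letrec ([f (lambda (n)
                  (hash-ref! table n (lambda () (fib-open f n))))])
      f)))

(memo-fib 40) ; fast: every recursive call now goes through the table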
I think this neglects a performance consideration: dynamic programming is often simpler for a compiler to optimize, since it's generally a set of nested loops. Sometimes you can vectorize operations, but often you can just exploit locality to keep the whole active set in L1 cache (not a compiler optimization per se).
Memoization, however, is much more of a black-box. Very few compilers will handle the branching present in memoized function calls in the same way they'd handle a loop (which is also a branch, but a more predictable one).
That all being said, memoization is often simpler to reason about and great for infrequent computations.
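To make the "nested loops" point concrete, here's a rough Racket sketch (grid-paths is a made-up example that counts monotone lattice paths in a grid): the whole bottom-up computation is two loops walking a flat vector in order, which is exactly the kind of access pattern that stays in cache and that compilers handle well.

(define (grid-paths rows cols)
  ;; one flat vector as the DP table; the first row and column stay 1
  (define table (make-vector (* rows cols) 1))
  (define (ref r c) (vector-ref table (+ (* r cols) c)))
  (for* ([r (in-range 1 rows)]
         [c (in-range 1 cols)])
    (vector-set! table (+ (* r cols) c)
                 (+ (ref (- r 1) c) (ref r (- c 1)))))
  (ref (- rows 1) (- cols 1)))

(grid-paths 3 3) ; => 6, the number of monotone paths through a 3x3 grid of nodes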
This won't help everywhere, but sometimes you can give the compiler a hint that a certain branch is less likely to be taken. The computation branch is going to be slower anyway, and only happen once, so tell the compiler to assume the lookup branch will be taken most of the time.
Right, but then we'd be talking about the difference between a "directed tree" and a DAG...or a "rooted tree" and a DAG. But yea, I'm sure grand-parent gets the point by now
Yep, thanks. I realize that in graph theory trees aren't directed, but in programming, saying "tree" almost always implies a parent-child relationship (i.e., directedness).
I was missing that nodes in a DAG can have two "parents", that makes sense.
In programming lingo a tree is usually a directed acyclic graph in which every node has exactly one incoming edge (one parent) except for the root node, which has no parents. A "cycle" in a directed graph is a loop that connects a node to itself along directed edges, so "acyclic" just means there aren't any loops. In this case, that means no circular dependencies among the computations.
There's also a mathematical definition of tree which is occasionally used in theoretical CS, so you have to be careful about getting them mixed up, but that kind of tree is an undirected acyclic graph. The kind of tree they're talking about in the article is directed, because it represents computational dependencies. (If A is connected to B, then either A depends on B or B depends on A. The dependency only goes one way.)
The defining difference between a tree and a directed acyclic graph is that a node in a DAG can have more than one incoming edge. So a DAG is like a tree in which nodes can have more than one parent. Here's the simplest illustration. I can't draw arrows, but imagine each edge being directed from top to bottom, so that A is the root of the tree:
    A          (a tree, therefore also a DAG)
   / \
  B   C
 /
D

    A          (a DAG, but not a tree)
   / \
  B   C
   \ /
    D
In the tree, D can have only one incoming edge (only one parent) so there can only be one path from A to D along the directed edges. In the DAG, D can have multiple incoming edges (from B and C in this case) so there can be multiple paths from A to D. Direction is important to keep in mind (and I really wish I could draw it.) Note that in the DAG, A-B-D-C-A isn't a loop because D-C-A goes against the direction of the edges: A->B->D<-C<-A. Also note that if the tree were not directed, it would be useless for representing dependencies, because it would not tell us whether C depends on A and D, or D depends on C and A depends on C, or maybe D depends on C which depends on A which depends on B. The direction of the edges is what tells us that A depends on C and not vice-versa.
Here's what the article means by converting a tree into a DAG for computation. In the following illustration, I'm going to use letters to label the nodes, but different nodes in the same graph are different, even if they have the same label. Here's the tree:
        A
       / \
      B   C
     /   / \
    D   E   D
     \       \
      F       F
Suppose the node labels represent computations, and the edges represent dependencies. A depends on the results of B and C, B depends on the result of D, C depends on the results of E and D, and D depends on the result of F. If you perform all the computations as they are represented in this tree, D and F will be computed twice. This might be inefficient. So you take nodes with the same labels and identify them, make them the same. That gives you a different graph which is no longer a tree:
        A
       / \
      B   C
       \ / \
        D   E
         \
          F
This DAG represents the same dependencies that the tree above does, and since nodes with identical labels have been combined, each computation is represented once. The tree representation is easier to create, because you don't have to worry about finding and combining duplicate nodes, but the DAG is more efficient to compute.
The idea of memoization is that each node in the tree should be the name of a computation, and the computation itself should be looked up by name. That way even if a name occurs multiple times in the tree, the computation it names will only occur once. The computations named "B" and "C" both depend on a computation named "D" which they will look up by name. They don't have to know they share a dependency, and they don't even have to reference the same copy of the name "D". This extra layer of indirection is the "black box of memoization" that implicitly turns the tree into a DAG to avoid replicating computations D and F.
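As a rough Racket sketch of that indirection (deps, cache, and compute are invented names, and the "result" of each computation here is just its own name):

(define deps   ; hypothetical dependency table mirroring the tree above
  (hash 'A '(B C) 'B '(D) 'C '(E D) 'D '(F) 'E '() 'F '()))

(define cache (make-hash))

(define (compute name)
  (hash-ref! cache name                 ; look the result up by name...
             (lambda ()                 ; ...and compute it only on a miss
               (for-each compute (hash-ref deps name))
               (printf "computing ~a\n" name)
               name)))                  ; stand-in for the real result

(compute 'A) ; "computing D" and "computing F" each appear once, despite the two paths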
I should add: a consequence of allowing more than one incoming edge is that a DAG can have more than one "root," like this:
A   B
 \ /
  C
With trees, if you have two roots, you'll have two disjoint trees, because there's no way for their descendants to meet without some node having multiple parents.
A tree is a DAG that is connected: if a and b are nodes of a tree, there is a path between a and b. But in general a DAG doesn't have to be connected. So all trees are DAGs but not all DAGs are trees: some are forests i.e. unions of trees.
Trees are, in general, undirected (although it is often convenient to treat edges in rooted trees as being all directed towards or away from the root; to determine whether or not a graph is a tree it is necessary to consider the edges as undirected).
A tree is a connected undirected acyclic graph (or, equivalently, an undirected graph in which there is exactly one path between any pair of vertices).
A DAG may represent a tree (usually rooted) or it may not. Not all connected DAGs are trees, e.g. the diamond-shaped DAG drawn upthread, where D has two parents but the graph is still connected.
This is a really interesting way to think about computations. I mostly think of memoization for I/O things like database lookups, or long computations. I'd never thought of it in comparison to dynamic programming algorithms.
Memoization and dynamic programming come together in top down DP algorithms. These are also referred to as recursion with memoization. The recursive function often looks something like:
f(arguments) {
    result = memoTable[arguments]
    if (result)
        return result              // cache hit: reuse the stored answer
    result = ...                   // expensive recursive calls to f
    memoTable[arguments] = result
    return result
}
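A concrete, if toy, Racket version of that pattern, with fib as the stock example (fib-table is just an illustrative name):

(define fib-table (make-hash))

(define (fib n)
  (cond
    [(hash-has-key? fib-table n)
     (hash-ref fib-table n)]                      ; cache hit: reuse the stored answer
    [else
     (let ([result (if (< n 2)
                       n
                       (+ (fib (- n 1)) (fib (- n 2))))]) ; expensive recursive calls
       (hash-set! fib-table n result)             ; remember it for next time
       result)]))

(fib 40) ; => 102334155, with each subproblem solved only once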
As far as I have understood, this is just regular memoization. I mean, often in functional languages one just turns a regular function into a memoized version by calling memoize(function). Sometimes that works to create a function that does what your code does, and sometimes it does not. When it does not I think that's a shortcoming of the language/implementation.
> I'd never thought of it in comparison to dynamic programming algorithms.
The blog post is using different terminology. Memoization isn't part of dynamic programming - it's just that top-down dynamic programming benefits from memoization.
The top-down Fibonacci is defined as fib(n) = fib(n-1) + fib(n-2). Since fib(n-1) and fib(n-2) are overlapping subproblems (they have to overlap for it to be an example of dynamic programming), memoization reduces the number of computations. But that's not to say that a non-memoizing solution isn't an example of dynamic programming.
The bottom-up dynamic programming is the iterative solution.
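For contrast, a sketch of the bottom-up version in Racket, which just iterates from the base cases upward and never needs a memo table:

(define (fib-bottom-up n)
  (let loop ([i 0] [a 0] [b 1])   ; invariant: a = fib(i), b = fib(i+1)
    (if (= i n)
        a
        (loop (+ i 1) b (+ a b)))))

(fib-bottom-up 40) ; => 102334155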
The blog post refers to top-down DP as memoization and to bottom-up DP as dynamic programming. I don't think this terminology is common (or correct).
I don't know that the terminology is common (or correct), but I find it pretty useful. It also corresponds intuitively to the way people in practice use these terms. I say keep it.
TLDR for OP: in both "DP" and "memoization" you construct a name-value table for a given function. In "memoization" you construct it reactively, in "DP" proactively - where "reactively" means "lazily as we need it," and "proactively" means "any way that isn't reactively."
(A DAG being just one way of storing this data structure. Generally it is just a cache of function results - which can be stored as a table, derivation graph, or whatever.)
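A tiny Racket sketch of that distinction (the squaring function is only a stand-in):

;; reactive ("memoization"): entries appear in the table only when asked for
(define reactive-table (make-hash))
(define (lookup n)
  (hash-ref! reactive-table n (lambda () (* n n))))

;; proactive ("DP"): fill the whole table up front, then only read from it
(define proactive-table
  (for/hash ([n (in-range 10)])
    (values n (* n n))))

(lookup 7)                    ; computes and caches 49 on first use
(hash-ref proactive-table 7)  ; 49 was already there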
Dynamic programming is an optimization technique based on divide and conquer, whereas memoization is an added optimization within the dynamic programming paradigm. One optimization we make while calculating the Fibonacci sequence is memoization, to avoid walking down the tree to the leaves every time.
#lang racket
(define memo-table (box empty))
(define-struct memo (key ans))

(define (memoize f)
  (lambda args
    (local ([define lookup (filter (lambda (v) (equal? args (memo-key v)))
                                   (unbox memo-table))])
      (if (empty? lookup)
          (local ([define ans (apply f args)])
            (begin
              (set-box! memo-table (cons (make-memo args ans) (unbox memo-table)))
              ans))
          (memo-ans (first lookup))))))
Edit: I guess HN doesn't like my formatting. Here's a readable version http://pastie.org/4591751
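For reference, hypothetical usage, tying back to the top of the thread: repeated calls with the same arguments hit memo-table, but (as noted there) the recursive calls inside a plain fib still bypass the wrapper.

(define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
(define memo-fib (memoize fib))
(memo-fib 25) ; computed the slow way the first time
(memo-fib 25) ; answered from memo-table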