Friday, February 22, 2013

Dynamo: Preventing Unnecessary Evaluation (Part 2)

In my last blog post, I spoke about how we could use graph analysis in Dynamo to avoid duplicating code when compiling Dynamo workflows down to the FScheme engine. I am pleased to announce that this has now been implemented and is live on GitHub!

Of course, nothing ever goes right the first time. Hours after pushing all the new code to GitHub, during a moment of quiet reflection, I discovered why the approach I originally outlined was flawed. As with most issues in programming--I find--the problem has to do with mutation.

So what's wrong?

Let's use the following workflow as an example:



If we were to optimize this using the original approach, the resulting FScheme code would look like this:

(λ (path)
  (let ([a (read-file path)])
    (list (begin (write-file path "text")
                a
                path)
          a)))

(Note that this is a definition for a new node, so it gets compiled down to a function. Hence the λ at the beginning.)

At first glance, the problem may not be obvious. The Read File node is connected to two inputs, Perform All and List. We use the lowest single ancestor algorithm to find where we can place a let binding: in this case, it's around the List node. We evaluate the Read File node, store the result in a new identifier a, and then refer to a in both places where it's used.

...so what's wrong?

FScheme--like most Schemes--is an eager language. This means that it will evaluate all that it can as soon as it can. When traversing the above code, FScheme will first evaluate (read-file path) and store the result in a. Then it will evaluate the body of the let binding, where (write-file path "text") will be evaluated. In other words, the file is read before it is written, so a holds the file's old contents. The problem here is that Perform All (which compiles to the begin expression) must evaluate its inputs in the order they are listed; otherwise, side-effects occur in the wrong order. Since the order in which side-effects occur matters a lot, this is a big problem.
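To make the ordering problem concrete, here is a minimal Python sketch of what the let-based code does. It is only an illustration: the in-memory files dictionary and the names read_file, write_file, and eager_node are hypothetical stand-ins, not part of Dynamo or FScheme.

```python
# Hypothetical in-memory stand-in for the file system.
files = {"notes.txt": "old"}

def read_file(path):
    return files[path]

def write_file(path, text):
    files[path] = text
    return path

def eager_node(path):
    # Mirrors the let binding: the read is evaluated before the body runs.
    a = read_file(path)
    # Mirrors (begin (write-file path "text") a path):
    # evaluate each input in order and keep the last value.
    write_file(path, "text")
    last = path
    # Mirrors (list <begin result> a).
    return [last, a]

result = eager_node("notes.txt")
print(result)              # the second element is the stale "old" contents
print(files["notes.txt"])  # the file itself now contains "text"
```

The write did happen, but a was captured too early, so the node reports contents that no longer match the file.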

What we really want is to not evaluate (read-file path) until we normally would, after (write-file path "text"). Then, we want to store that value somewhere and reference it later when we need it. In more traditional imperative programming languages, this is familiar. Take a look at the following snippet of C#:

delegate (string path)
{
  string a;
  writeFile(path, "text");
  a = readFile(path);
  return new List<string>() { path, a };
};

Notice how we store a after we perform writeFile, and then later we just refer to a again. Well, we can do something similar in Scheme:

(λ (path)
  (let ([a (begin)])
    (list (begin (write-file path "text")
                 (begin (set! a (read-file path))
                        a)
                 path)
          a)))

This may look strange, but we're following the same pattern as in C#: we declare a new identifier a that's uninitialized (the (begin) expression returns a value that will result in an error if it is used). Then, where a would normally first be referenced, we evaluate (set! a (read-file path)), which stores an actual value in a. In all other places, we can just refer to a, which by then contains an actual value.

Putting it all together

The full algorithm is as follows:
  1. For each node X that has multiple outputs connected:
    1. Assign a unique string ID to be used as a storage variable
    2. Look up the lowest single ancestor LSA(X)
  2. Compile the Dynamo graph starting from the entry point (the node with no connected output).
    1. When reaching LSA(X), insert a let binding that binds ID to (begin), leaving it uninitialized
    2. The first time X is reached, insert a begin that first assigns the compiled form of X to ID using set!, and then returns ID.
    3. All subsequent times X is reached, simply insert ID
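
The steps above can be sketched in Python as a tiny compiler that emits FScheme-like text. This is an illustration under stated assumptions, not Dynamo's actual implementation: the Node class, compile_graph, and the tmp0-style IDs are hypothetical, and the lowest single ancestor of each shared node is assumed to be computed already and passed in as a dictionary.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    op: str                                     # function name, e.g. "read-file"
    inputs: list = field(default_factory=list)  # Node instances or literal strings

def compile_graph(entry, lsa):
    """lsa maps each node X with multiple connected outputs to LSA(X) (assumed precomputed)."""
    # Step 1: assign a unique storage ID to every shared node.
    ids = {x: f"tmp{i}" for i, x in enumerate(lsa)}
    seen = set()

    def emit(node):
        if isinstance(node, str):      # literal or parameter name
            return node
        if node in ids:
            name = ids[node]
            if node in seen:           # Step 2.3: later uses just refer to ID
                return name
            seen.add(node)             # Step 2.2: first use assigns, then returns ID
            return f"(begin (set! {name} {emit_plain(node)}) {name})"
        return emit_plain(node)

    def emit_plain(node):
        # Inputs are compiled left to right, matching evaluation order.
        args = " ".join(emit(i) for i in node.inputs)
        body = f"({node.op} {args})" if args else f"({node.op})"
        # Step 2.1: at LSA(X), wrap the body in a let with an uninitialized ID.
        for x, ancestor in lsa.items():
            if ancestor is node:
                body = f"(let ([{ids[x]} (begin)]) {body})"
        return body

    return emit(entry)

# The workflow from this post: Read File feeds both Perform All and List.
read = Node("read-file", ["path"])
write = Node("write-file", ["path", '"text"'])
perform_all = Node("begin", [write, read, "path"])
lst = Node("list", [perform_all, read])

code = compile_graph(lst, {read: lst})
print(code)
```

For this graph the sketch prints (let ([tmp0 (begin)]) (list (begin (write-file path "text") (begin (set! tmp0 (read-file path)) tmp0) path) tmp0)), which has the same shape as the corrected FScheme code, minus the surrounding λ.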
