Sunday, February 17, 2013

Dynamo: Preventing Unnecessary Evaluation

Hello again! Today I'm pleased to present some progress with automatic removal of duplicate code, an issue that has plagued Dynamo workflows for a long time.

What's the problem?

Take a look at the following workflow:


Notice that several of the nodes in the workflow have their outputs connected to multiple places. Therein lies the heart of the problem: as a user, one would expect the output of a node to be passed to all inputs connected to it. Dynamo does this, but up until now, it did this by duplicating code.

You can see this in the compiled form of the above workflow:

(list (+ (square 3) 3)
      (+ (square 3) 3) 
      (* (square 3) (square 4)) 
      (+ (square 4) (square 4)))

When Dynamo evaluates this expression, it will run all of the duplicated code: (square 3)for example, will be run 3 times. This is two more times than necessary.


That doesn't seem like big deal

It's true that for basic calculations such as in this example, the code that's duplicated doesn't make a big difference. However, if the code that's duplicated takes a long time to run--or worse, performs a mutation such as adding geometry to a Revit document--then the duplication turns out to be a very big deal.

Fortunately there was a way to work around this: the only nodes where duplication never really mattered were ones that had no inputs themselves, such as numbers or variables. This is because it's just as fast to "calculate" the output of these nodes as it would be to lookup a cached result. So what you could do is create a new node and pass in the results you want to cache as inputs. The results are then cached as the node's variables, and then you can access them as many times as you like.

To remove duplication from the above workflow using this method, the compiled expression would look like this:

((lambda (a)
  ((lambda (b)
     ((lambda (c)
        ((lambda (d)
           (list c c (* b d) (+ d d)))
         (square 4))) 
      (+ b a))) 
   (square a))) 
 3)

In the actual Dynamo UI, this would require four separate user-defined nodes, and is obviously too painful to have to do. Ideally, all of this would be done under the hood for us.

There is a better way

Scheme has a built-in construct called let for dealing with caching values, without having to use intermediate helper functions. Using let* (shorthand for nested lets), the above can be rewritten as:

(let* ([a 3]
       [b (square a)]
       [c (+ b a)]
       [d (square 4)])
  (list c c (* b d) (+ d d)))

It's actually significantly easier to have Dynamo generate a let* then having to create all of the intermediate functions. When passed to the underlying FScheme engine, it will eventually all be converted to these helper functions anyway.

So how does this happen?

Lots of graph analysis. When compiling the Expression used for evaluation, Dynamo will now check for duplicated code by looking for nodes with an output being used in more than one place. It figures out dependencies among these nodes (so that they are ordered properly in the let*) and calculates the extent of where the code will be used (to determine the "entry-point" of the let*).

For any node with more than one output, in order to determine where a let* will be placed, the lowest single ancestor for the node must be calculated. In graph theory, the LSA for a node v is another node l that lies on all paths from the root of a directed acyclic graph (which is what a Dynamo workflow is) to v, and none of l's descendants also have this property (otherwise the root node would always satisfy the problem). I found a great paper by Fischer and Huson outlining how to implement an LSA lookup.

No comments:

Post a Comment