Dataflow programming method

Dataflow programming is a style of programming that models computation as the flow of data through interconnected nodes. It differs from traditional procedural programming because it does not rely on explicit control structures such as loops or branches to determine the order in which code should be executed. Instead, data flows between nodes and is processed by them according to their internal algorithm. This makes it easier for developers to visualize and understand complex computations since they can trace all operations back to input values.

Introduction


This document describes the Proavatar’s dataflow programming[1] method which can be used to implement mathematical algorithms that process timestamped sensor data and generate specific output values that are to be reported or displayed to a user.


Figure 1: Basic set-up of a sensor data processing algorithm.

When such algorithms would be part of a user-facing application on for example a smartphone, it would not be a good approach to implement these algorithms hard coded as this would require updates of the app. With Proavatar’s dataflow programming method a diagram can be specified as a JSON text file that can be easily shared and exchanged.

In such a diagram, the algorithm is implemented by specifying the inputs and outputs and constructing a sequence of interconnected functions to calculate the outputs.

While theoretically a dataflow diagram could be written in a text-editor, Proavatar offers a graphical tool "Dataflowed" to visualize dataflow diagrams and construct these in a simple and intuitive manner.

Algorithm architecture


Consider an algorithm that takes one or more sensor data streams as input and calculates some kind of output. Such an algorithm is constructed - or implemented - by creating a dataflow diagram that uses input streams and a number of interconnected functions to calculate the value of one or more variable outputs of specific types (variable types are discussed in the next chapter). These can then be used by a connected reporting system like a graphical user interface (GUI) for example. Any settings of the diagram – that can be changed by the executing application – are implemented by using constants.

The architecture of an algorithm is given in figure 2.


Figure 2: Algorithm architecture.

In the following sections the components of a dataflow diagram are discussed.

Variables


In a dataflow diagram several timestamped variables can be used.

In table 1 the variable types are defined that can be used in a dataflow diagram.

Table 1: Defined dataflow variable types.

Name

Description

Symbol

Float[2]

Floating point value, used for angles and other values.

f

Integer

Integer value, used for indices and numbers.

i

Vector

A 3D position (x, y, z).

v

Boolean

Logical value, i.e. true or false.

b

String

A piece of text.

$

Orientation

An orientation, expressed as a quaternion (x, y, z, w).

q

Array

An array of floats, i.e. [f0 , … , fN-1].

a

Sequence

An array of timestamped floats, i.e. [(t0 , f0) , … , (tN-1 , fN-1)].

s

Generic

A generic timestamped variable will not hold any value, but will be converted into a specific type once a value is set.

?

A timestamped variable has a number of relevant properties which are given in table 2 and that are used during the execution of the diagram (discussed in “Diagram execution”).

Table 2: Relevant properties of a timestamped variable.

Property

Description

value

The actual value for which the type is defined by the type of the variable itself as give in table 1.

updated

Boolean indicating whether the value was updated.

checked

Boolean indicating whether the timestamped variable can be evaluated as its updated state will not change during the ongoing execution.

timestamp

Timestamp at which the variable was last updated.

Nodes


All the elements in a diagram are called nodes. A node has one or more connectors, depending on the type. The possible node types are given in table 3.

Table 3: Node types.

Type

Description

Input connectors

Output connector

Function

Generates a certain output by performing an internal calculation using its inputs.

Multiple

Constant

Stores a constant value.

x

Input streams

Generates an output when it is updated by an external process.

x

Variable outputs

Store the result of the execution of the dataflow diagram.

1

x

Container

A node that can be constructed by the diagram editor to assist in visually simplifying the diagram and can contain a number of internal functions, constant nodes or nested containers.

Multiple

For visualization purposes, a node in the diagram can be given a specific color. The following options are available:

blue (default), red, black, brown, cyan, gray, green, magenta, orange, purple, yellow

Also, for all nodes, except functions, a ‘description’ can be edited which appears in the body of the node when displayed. For functions the description specifies the internal algorithm that it implements.

Functions


The core components of a dataflow diagram are functions. A function is a dataflow node that can be selected from a list and has one or more inputs and a single output. In figure 3 an example of such a function is given with two input connectors and one output connector.


Figure 3: Example of a function.

The inputs and output are of specific variable types. For example, the function in figure 3 implements the functionality to round a float (f) to a specified number of decimal places, specified by the integer (i) input.

Functions can also have generic connectors for which the type is not specified yet prior to connecting it. For example, the function in figure 4 performs an addition of two values, whereby the type will be specified as soon as one of the connectors is connected (also see “Connections”).


Figure 4: Example of a generic function.

Since the concept of a data ‘flow’ is used, in general a function calculates (i.e. updates) its output when all inputs are updated. In the concept of dataflow programming, the dataflow ‘assumes’ sequential updates of the inputs. Therefore, the function will ‘wait’ for the updates of all inputs. When the output is calculated, the inputs are set back to their “not updated” state. This process is illustrated in figure 5.

Figure 5: Function update process.

Note that when one of the input updates a number of times consecutively, these updates are not used (i.e. lost) unless the other input(s) are also updated. Note that the timestamp of the generated output of a function will always have the timestamp of the last updated input value.

There are also functions that use the value of the first input to determine how the other inputs are to be used. The simplest example is the “Gate” function as illustrated in figure 6.


Figure 6: “Gate” function.

The “Gate” function will forward an updated second (generic) input only when the first input is true. This means that the output is ‘calculated’ even if the first input is not updated during the current update round, which is different from the behavior of standard functions.

Constants


A constant node can hold a fixed value of a certain variable type which is specified when the dataflow diagram is designed. In figure 10 an example of a constant node is given.

In the visualization of a constant node, the value is displayed above the node except when the constant represents an array or sequence because these would be impractical to display as a string. The description “Float” can be changed, for example to describe the purpose of the constant.


Figure 10: Constant data flow diagram node.

A constant can be connected to the input of one or more functions. When this is done, a function will obviously not ‘wait’ for that input to be updated but calculate its output as soon as all the inputs are updated that are not connected to a constant.

Typical usages of constants could be the specification of axes, size of buffers, indication of indices or other ‘settings’ of the algorithm. In figure 11 an example is given where a vector constant is used to determine the direction of the z-axis as a result of the rotation of the left upper leg (made available via an input stream which node type is discussed in the next chapter).


Figure 11: Example of a vector constant.

Input streams


An input stream offers a certain data stream on its output. An example of input streams is given in figure 12 where the input stream with the name “Left upper leg” supplies a timestamped orientation (q) and the one with “Right knee” supplies a position (v).


Figure 12: Input stream nodes.

Because the input stream is the beginning of the dataflow diagram, it has no inputs. Instead, the stream is updated by the application whenever new data is available, e.g. at each timestep.

There is an input stream node for all variable types.

Variable outputs


To be able to present the results of the execution of a dataflow diagram (also see “Diagram execution”), a variable output node is defined. There is a variable output node for each variable type.

In figure 13 the setup of a variable output node is shown. To be able to identify the value, the description “Float” can be changed.


Figure 13: Variable output node.

When the input of a variable output node is updated, the connected system is informed with the description (or ‘label’) and the value. It is then up to the implementation of the connected system to use (or ignore) the value.

Note that a variable output can also be of great help when debugging algorithms by supplying intermediate values, e.g. much like a debug statement in normal coding practice.

Connections


To implement the functionality of creating a chain of executed function operations, i.e. to let data ‘flow’, the output of a node can be connected to the input of another node in case of matching variable types or either one being a generic connector.

An example of interconnected functions is given in figure 7.


Figure 7: Example of interconnected functions where the data ‘flows’ from left to right.

To create a connection an output and input connector must be selected that are of the same type or either one is a generic connector. For example, in figure 8 the last connection will be created when the float input connector of the variable output is selected.


Figure 8: Connecting nodes.

An input can only be connected to one output, but an output can be connected to any number of inputs.

An editor also prevents the user from creating connections that would result in feedback loops as shown in figure 9.


Figure 9: Feedback loop prevented.

Once a connection has been made the update of an output during execution of the diagram (see “Diagram execution”), will result in the update of all the connected inputs.

Containers


Potentially, dataflow diagrams can become quite large. This makes it difficult to quickly assess the functionality. It would be helpful if certain functionality which is implemented in a set of interconnected nodes could be collapsed into a ‘container’. Just like a regular function, such a container would have a single output and one or more inputs.

Containers can be created by selecting a number of interconnected nodes, being functions, constants or other containers.

The set of selected nodes must comply with the following requirements:

In the diagram in figure 15 an example is given of a set of selected nodes (indicated by the red border) that comply with these requirements.


Figure 15: Set of nodes that can be collapsed into a container.

The set of selected nodes can be collapsed to a container. The default description “[Container]” can be changed to something more descriptive.


Figure 16: Newly created container node.

A container is only for visual simplification of the diagram. It has no other effect on the data processing. As such, the container still contains the interconnected nodes and can be expanded again if so required[3].

Subdiagrams


Another method to get a better overview of the algorithm is to be able to divide the diagram in subdiagrams that can be displayed separately.

To do this, a subdiagram can be specified with a name describing the functionality implemented and the nodes that are contained in it.

Similarly to containers, subdiagrams only serve visualization purposes and have no other effect on the data processing.

Diagram loops


The application is responsible for setting the values of the input streams and having the diagram calculate the variable outputs. However a special situation occurs when variable outputs are specified that have the same description and variable type as one or more of the input streams. An example of such a diagram is shown in figure 14.


Figure 14: Diagram with an internal looped variable.

In the diagram in figure 14 the external sensor system could update the input streams labeled “Left upper arm” and “Left lower arm” first and when the execution of the diagram (see “Diagram execution”) results in an update of the variable output labeled “Angle”, the input stream with the same label and type is updated as well.

A more detailed description is given in the next chapter.

Diagram execution


To determine the output of a dataflow diagram at a certain timestamp, the diagram must be executed, which is started by the external application updating those input streams for which new data is available at a certain timestamp. This will update all the connected inputs.

The next step that is performed is updating the outputs of the constant nodes, which in turn will also update all the connected inputs.

Next, a loop is started that iterates over a list of function nodes and performs the following actions:

  1. It checks if all inputs of the current node are checked.
  2. If all inputs are not checked, it sets the ‘allChecked’ variable to false and continues to the next iteration of the for loop.
  3. If all inputs are checked, it checks if all relevant inputs of the current node are updated.
  4. If all relevant inputs are updated, it sets the ‘updated’ variable to true and runs the internal algorithm of the current node.
  5. After the for-loop has completed its iterations, it checks the values of the ‘updated’ and ‘allChecked’ variables. If either of these variables is false, it repeats the loop. If both variables are true, it ends the loop.

This process can be written as pseudo-code:

updated = true

allChecked = false

WHILE updated OR NOT allChecked

    updated = false

    allChecked = true

    FOR node IN nodes["Function"]

        # check if all inputs of the function node are checked

        inputsChecked = checkInputs(node)

        IF NOT inputsChecked THEN

            allChecked = false

            CONTINUE

        END IF

        # check if all relevant inputs of the function node are updated

        relevantInputsUpdated = checkRelevantInputs(node)

        IF relevantInputsUpdated THEN

            updated = true

            # run the function node's internal algorithm

            result = runAlgorithm(node)

        END IF

    END FOR

END WHILE

Note that running the function node’s internal algorithm uses the input values. Therefore it is necessary to check the input values first before using them. This is to support special functions that do not wait for all their inputs to be updated. For these function nodes, checking the input values ensures that its current value is up-to-date.

Depending on the result of running the function node’s internal algorithm, the ‘value’ property of the timestamped variable associated with the output of the node could be set and when that happens, the generic part is called in which the ‘updated’ property of the node's output is set to true. It then enters a for-loop that iterates over the node's inputs. For each input, it sets the ‘updated’ property to false and updates the ‘timestamp’ property of the node's output to the maximum of its current value and the ‘timestamp’ property of the input.

After the first for-loop has completed its iterations, another for-loop is started that iterates over the node's connected inputs. For each connected input, it sets the ‘value’ and ‘timestamp’ properties to the corresponding properties of the node's output and sets the ‘updated’ property to true.

These actions can be specified by the following pseudo-code:

node.output.updated = true

node.output.timestamp = 0

FOR input IN node.inputs

    input.updated = false

    node.output.timestamp = MAX( node.output.timestamp, input.timestamp )

END FOR

FOR input IN node.connectedInputs

    input.value = node.output.value

    input.timestamp = node.output.timestamp

    input.updated = true

END FOR

The process will continue until there are no more updates. At that point it is checked whether the input of any of the variable outputs was updated. If this is the case, the execution process is repeated, but now with the values of the updated variable outputs as input.

This allows for the implementation of diagram loops as discussed in “Diagram loops”.

To prevent infinite loops, variable outputs will only be used once in this way.

Dataflow diagram file format


A dataflow diagram is stored in a file with the ‘dfd’ extension which is formatted as a JSON file. It has the following sections:

Section

Description

Type

nodes

Array of generic node information.

[NodeInfo]

properties

Array of properties that can differ per node class.

[PropertyInfo]

connections

Connections between nodes.

[ConnectionInfo]

positions

Positions of the nodes.

[PositionInfo]

containers

Array of the information of the defined containers.

[ContainerInfo]

manualLines

Optional array of strings containing the lines describing for example the purpose of this diagram.

[String]

subdiagrams

Optional list of subdiagrams.

[SubdiagramInfo]

Node that the information contained in the “positions” and “containers” section is only required for visualization but not for execution.

The custom types specified above are described in the following sections.

NodeInfo

Parameter

Description

Type

id

Id of the node, unique within the file.

Int

classString

The class of the node, i.e. “Input stream”, “Constant”, “Function” or “Variable output”.

String

typeString

The type of the node within the class, for example “Add

String

outputType

Optional parameter used to indicate the variable type of the output for functions with generic connectors.

String?

PositionInfo

Parameter

Description

Type

id

Id of the node for which this property must be set.

Int

x

X-coordinate of the position of the node.

Float

y

Y-coordinate of the position of the node,

Float

ContainerInfo

Parameter

Description

Type

description

Description given to the container.

String

nodeIds

Array of ids of the contained nodes.

[Int]

color

Optional parameter specifying the color.

String?

x

X-coordinate of the position of the container.

Float

y

Y-coordinate of the position of the container,

Float

containers

List of nested containers.

[ContainerInfo]

ConnectionInfo

Parameter

Description

Type

outputNodeId

Id of the node for which the output connector is connected.

Int

inputNodeId

Id of the node for which one of the input connectors is connected.

Int

inputIndex

Index of the input connector that is connected.

Int

PropertyInfo

Parameter

Description

Type

id

Id of the node for which this property must be set.

Int

propertyString

Identifier of the property.

String

valueString

Textual representation of the value.

String

The following propertyStrings are defined:

SubdiagramInfo

Parameter

Description

Type

description

Description given to the container.

String

nodeIds

Array of ids of the contained nodes.

[Int]

Example

Consider the following diagram.

The contents of the “Joint angle” container is given below.

The contents of the “Angle” container is given below.

JSON

The corresponding diagram file formatted as a JSON string is as follows.

{"connections":[{"inputIndex":0,"outputNodeId":1,"inputNodeId":3},{"inputIndex":1,"outputNodeId":2,"inputNodeId":3},{"inputIndex":0,"outputNodeId":3,"inputNodeId":5},{"inputIndex":0,"outputNodeId":4,"inputNodeId":1},{"inputIndex":0,"outputNodeId":4,"inputNodeId":2},{"inputIndex":0,"outputNodeId":5,"inputNodeId":0},{"inputIndex":1,"outputNodeId":6,"inputNodeId":1},{"inputIndex":1,"outputNodeId":7,"inputNodeId":2},{"inputIndex":0,"outputNodeId":8,"inputNodeId":4}],"properties":[{"id":0,"propertyString":"Description","valueString":"Angle"},{"id":0,"propertyString":"Color","valueString":"purple"},{"id":6,"propertyString":"Description","valueString":"Left upper arm"},{"id":7,"propertyString":"Description","valueString":"Left lower arm"},{"id":8,"propertyString":"Description","valueString":"z-axis"},{"id":8,"propertyString":"OutputValue","valueString":"(0.0, 0.0, 1.0)"}],"containers":[{"nodeIds":[8],"x":430,"y":210,"containers":[{"nodeIds":[1,2,3,4],"x":765,"y":545,"containers":[],"description":"Angle"}],"description":"Joint angle"}],"manualLines":["This is an example of a dataflow diagram with nested containers."],"positions":[{"id":0,"x":940,"y":210},{"id":1,"x":980,"y":520},{"id":2,"x":980,"y":660},{"id":3,"x":1220,"y":590},{"id":4,"x":730,"y":490},{"id":5,"x":690,"y":210},{"id":6,"x":140,"y":150},{"id":7,"x":140,"y":260},{"id":8,"x":615,"y":415}],"nodes":[{"id":0,"classString":"Variable output","typeString":"Float"},{"id":1,"classString":"Function","typeString":"Rotate vector"},{"id":2,"classString":"Function","typeString":"Rotate vector"},{"id":3,"classString":"Function","typeString":"Vector angle"},{"id":4,"classString":"Function","typeString":"Pass-through","outputType":"Vector"},{"id":5,"classString":"Function","typeString":"Radians to degrees"},{"id":6,"classString":"Input stream","typeString":"Orientation"},{"id":7,"classString":"Input stream","typeString":"Orientation"},{"id":8,"classString":"Constant","typeString":"Vector"}]}


[1] Wikipedia - Dataflow programming

[2] A float is a data type composed of a number that is not an integer, because it includes a fraction represented in decimal format.

[3] In the “Dataflowed” app this is for example done by double tapping the container.