Template Task Graph User Guide

Contents

Your First TTG Program

The following code creates four tasks, one of type A, two of type B (B(0) and B(1)), and one of type C, and ensures that A runs before both B tasks, and that both B tasks run before C.

Simple diamond DAG

#include <ttg.h>

To implement a TTG program, the user just needs to include ttg.h. The selection of the task backend is usually done at compile time through a compiler definition. However, before include ttg.h, the user could #define TTG_USE_PARSEC 1 or #define TTG_USE_MADNESS 1. Note that only one of the backends can be defined, and the recommended way is to define which backend is used when invoking the compiler (see Compiling Your First TTG Program below).

static void a(std::tuple<ttg::Out<int, double>> &out) {
ttg::print("Called task A ");
ttg::send<0>(0, 1.0, out);
ttg::send<0>(1, 2.0, out);
}
void send(const keyT &key, valueT &&value, ttg::Out< keyT, valueT > &t)
Sends a task id and a value to the given output terminal.
Definition: func.h:158
void print(const T &t, const Ts &... ts)
atomically prints to std::cout a sequence of items (separated by ttg::print_separator) followed by st...
Definition: print.h:130

This is the function that implements tasks of type A. Tasks of this type have no key to identify them, and receive no input data. Their only output terminal is defined to receive a key of type int and a value of type double.

We use ttg::print to printout information, as a convenience function that also avoids messages from multiple threads to interfere with each other.

The task sends the value 1.0 to B(0) by outputing the key of type int and value 0 with the data of type double and value 1.0 on the output terminal of index <0>, and the key of type int and value 1 with the data of type double and value 2.0 on the same output terminal.

Because the two keys are different, these two ttg::send instantiate two different target tasks. Which task depends on how the function is wrapped into a template task (TT), and how the terminals of this template task are connected to other terminals of template tasks (see below).

static void b(const int &key, const double &input, std::tuple<ttg::Out<void, double>, ttg::Out<void, double>> &out) {
ttg::print("Called task B(", key, ") with input data ", input);
if (key == 0) ttg::sendv<0>(input + 1.0, out);
else ttg::sendv<1>(input + 1.0, out);
}
void sendv(valueT &&value, ttg::Out< void, valueT > &t)
Sends a value (without an accompanying task id) to the given output terminal.
Definition: func.h:179

This function defines the behavior of tasks of type B.

This time, tasks of type B have an integer identifier (key), an input value (value), of type double, and two output terminals. Both output terminals have no identifier (keys of type void), and carry a data of type double.

The task sends to different terminals depending on the value of key: tasks with a key of 0 output on the terminal of index <0>, while tasks with another key output on the terminal of index <1>. They also output different values on these edges.

Because the output terminals do not define a task identifier (their keys are of type void), one cannot use ttg::send, but needs to use ttg::sendv. ttg::sendv differs from ttg::send only in the fact that ttg::send requires a key identifier for the destination task, while ttg::sendv does not.

static void c(const double &b0, const double &b1, std::tuple<> &out) {
ttg::print("Called task C with inputs ", b0, " from B(0) and ", b1, " from B(1)");
}

Tasks of type C are implemented with this function. It's a sink task: no ttg::send are emitted by this task that takes no task identifier, defines no output terminals, and only an input value of type double.

int main(int argc, char **argv) {
ttg::initialize(argc, argv, -1);
void initialize(int argc, char **argv, int num_threads=-1, RestOfArgs &&...)
int main(int argc, char **argv)
Definition: simple.cc:23

The code needs to initialize ttg before any other ttg-related calls.

ttg::Edge<void, double> B_C0("B->C0");
ttg::Edge<void, double> B_C1("B->C1");
Edge is used to connect In and Out terminals.
Definition: edge.h:25

We define 3 edges, to connect the different tasks together.

They have different prototypes: A_B carries an identifier of type int and a value of type double, while B_C0 and B_C1 carry no identifier (void) and a value of type double.

To help debugging, we give unique meaningful names to these edges in the constructor argument.

We need only three edges, because these edges define connections between the template tasks, not connections between tasks. Their instantiation by ttg::send or ttg::sendv define the actual edges between tasks.

There are two edges connecting B to C because C has two input terminals, and if we used the same edge between B and C, sending on that edge would trigger C twice.

auto wa(ttg::make_tt<void>(a, ttg::edges(), ttg::edges(A_B), "A", {}, {"to B"}));
auto wb(ttg::make_tt(b, ttg::edges(A_B), ttg::edges(B_C0, B_C1), "B", {"from A"}, {"to 1st input of C", "to 2nd input of C"}));
auto wc(ttg::make_tt(c, ttg::edges(B_C0, B_C1), ttg::edges(), "C", {"From B", "From B"}, {}));
auto make_tt(funcT &&func, const std::tuple< ttg::Edge< keyT, input_edge_valuesT >... > &inedges=std::tuple<>{}, const std::tuple< output_edgesT... > &outedges=std::tuple<>{}, const std::string &name="wrapper", const std::vector< std::string > &innames=std::vector< std::string >(sizeof...(input_edge_valuesT), "input"), const std::vector< std::string > &outnames=std::vector< std::string >(sizeof...(output_edgesT), "output"))
Factory function to assist in wrapping a callable with signature.
Definition: make_tt.h:560
auto edges(inedgesT &&...args)
Make a tuple of Edges to pass to.
Definition: func.h:147

We now define the three template tasks wa, wb, and wc, using the ttg::make_tt helper.

ttg::make_tt takes as parameters the function that implements the task, the list of input edges that are connected to its input terminals, the list of output edges that are connected to its output terminals, the name of the task, the list of names for the input terminals, and the list of names for the output terminals.

These TTs and the edges define the template task graph that will then be instantiated as a DAG of tasks by the execution.

if (wa->get_world().rank() == 0) wa->invoke();
std::enable_if_t<(std::is_convertible_v< decltype(*(std::declval< TTBasePtrs >))), TTBase & > bool make_graph_executable(TTBasePtrs &&...tts)
Definition: func.h:80

Before executing the first tasks, the template task graph must be made executable by calling ttg::make_graph_executable() on each source TT of the graph. This signals to the runtime system that all edges that connect TTs are defined, computes internal state necessary to track all dependencies, and registers active message handles for each template task type.

We need to start the DAG of tasks by invoking the initial tasks with ttg::TTBase::invoke(). In this simple DAG, there is a single initial task, the task A, which runs on rank 0.

void execute(ttg::World world)
Starts the execution in the given execution context.
Definition: run.h:74
void fence(ttg::World world)
Returns when all tasks associated with the given execution context have finished on all ranks.
Definition: run.h:81
ttg::World & get_default_world()
Definition: world.h:80

We can then start the execution of the DAG of tasks. This will enable the compute-threads in the ttg library, and start instantiating tasks as the execution unfolds.

With ttg::fence(), we wait for the completion of all DAGs started.

return EXIT_SUCCESS;
}
std::enable_if_t<!meta::is_void_v< keyT >, void > finalize(const keyT &key, ttg::Out< out_keyT, out_valueT > &t)
Finalize streaming input terminals connecting to the given output terminal for tasks identified by ke...
Definition: func.h:545

And finally, we can shut down the ttg library and return from the application.

Full first example

Compiling Your First TTG Program

The recommended way to compile a TTG program is to use CMake.

Below, you will find a minimal CMakeLists.txt file to compile the first example above with both the PaRSEC and the MADNESS driver.

cmake_minimum_required(VERSION 3.19)
project(TTG-Example CXX)
find_package(ttg REQUIRED)
add_executable(first-parsec first.cc)
target_compile_definitions(first-parsec PRIVATE TTG_USE_PARSEC=1)
target_link_libraries(first-parsec PRIVATE ttg-parsec)
add_executable(first-mad first.cc)
target_compile_definitions(first-mad PRIVATE TTG_USE_MADNESS=1)
target_link_libraries(first-mad PRIVATE ttg-mad)

This CMakeLists.txt uses find_package(ttg) to define the different ttg targets. find_package uses the ttg_DIR CMake variable as a hint where to find configuration files. So, if you installed ttg in /path/to/ttg, you can point find_package to the appropriate directory by calling CMake as follows:

cd /path/to/your/builddir
cmake -Dttg_DIR=/path/to/ttg/lib/cmake/ttg /path/to/your/sourcedir

find_package(ttg) defines the following CMake targets:

  • ttg-parsec: the PaRSEC backend for TTG
  • ttg-mad: the MADNESS backend for TTG

When source code #include <ttg.h>, it needs to define which backend it uses. In this example, we do that from the command line, by adding the compile-definition TTG_USE_PARSEC=1 or TTG_USE_MADNESS=1.

It is then sufficient to tell CMake that the executable depends on the corresponding TTG target to add the appropriate include path and link commands.

Data Dependent Program

We now extend the first example to illustrate a data-dependent application behavior. Consider now that the tasks of type C can dynamically decide to iterate over the simple DAG of tasks before, depending on the values received as input.

To make the example simple, we will simply define a threshold: if the data sent by B(0) plus the data sent by B(1) is lower than this threshold, then the DAG should be iterated, otherwise the application is completed.

One way of representing this behavior is denoted by the graph below:

Iterative diamond DAG

First, because each task in the DAG needs to be uniquely identified, and there are potentially many tasks of type A or C, tasks of these kinds now need to get an identifier. Second, tasks of type B are not only identified by 0 or 1, but also need another identifier that denotes to which task of A or C it is connected. We extend the identifier type of B to Key2, which is a std::pair<int, int> to do this simply.

Second, the function that implements the task for C needs to decide dynamically if it continues iterating or not. This is done by conditionally calling ttg::send in this function. If the function does not call ttg::send, then the no more task is discovered, and the whole operation will complete.

#include <ttg.h>

The inclusion of the ttg/serialization/std/pair.h file is necessary to import the serialization mechanisms for the task identifiers of tasks of type A or C.

const double threshold = 100.0;
const double threshold
Definition: iterative.cc:5

We define the threshold as a globally visible constant.

using Key2 = std::pair<int, int>;
namespace std {
std::ostream &operator<<(std::ostream &os, const Key2 &key) {
os << "{" << std::get<0>(key) << ", " << std::get<1>(key) << "}";
return os;
}
} // namespace std
std::ostream & operator<<(std::ostream &os, ttg::device::Device device)
Definition: device.h:68
std::pair< int, int > Key2
Definition: distributed.cc:6

We define the key type as a std::pair<int, int>, and extend the std::operator<< to printout an object of type Key2

static void a(const int &k, const double &input, std::tuple<ttg::Out<Key2, double>> &out) {
ttg::print("Called task A(", k, ")"); ttg::send <0>(Key2{k, 0}, 1.0 + input, out);
ttg::send <0>(Key2{k, 1}, 2.0 + input, out);
}

Tasks of type A now take an integer key, and an input value; the output is modified to take a Key2 (as tasks of type B have keys of type Key2).

static void b(const Key2 &key, const double &input, std::tuple<ttg::Out<int, double>, ttg::Out<int, double>> &out) {
ttg::print("Called task B(", key, ") with input data ", input);
if (std::get<1>(key) == 0) ttg::send <0>(std::get<0>(key), input + 1.0, out);
else ttg::send <1>(std::get<0>(key), input + 1.0, out);
}

Tasks of type B now take a key of type Key2, and the output is modified to take an integer key. We then use ttg::send instead of ttg::sendv, because ttg::sendv is only used to send to a task that does not have a key identifier.

static void c(const int &k, const double &b0, const double &b1, std::tuple<ttg::Out<int, double>> &out) {
ttg::print("Called task C(", k, ") with inputs ", b0, " from B(", k, " 0) and ", b1, " from B(", k, " 1)");
if (b0 + b1 < threshold) {
ttg::print(" ", b0, "+", b1, "<", threshold, " so continuing to iterate"); ttg::send <0>(k + 1, b0 + b1, out);
} else {
ttg::print(" ", b0, "+", b1, ">=", threshold, " so stopping the iterations");
}
}

Tasks of type C are modified the same way, and the function that implements the task holds the dynamic decision to continue in the DAG or not.

int main(int argc, char **argv) {
ttg::initialize(argc, argv, -1);
ttg::Edge<Key2, double> A_B("A(k)->B(k)");
ttg::Edge<int, double> B_C0("B(k)->C0(k)");
ttg::Edge<int, double> B_C1("B(k)->C1(k)");
ttg::Edge<int, double> C_A("C(k)->A(k)");
int main(int argc, char **argv)
Definition: iterative.cc:39

We update the edges types to reflect the new tasks prototypes, and add a new edge, that loops from C(k) to A(k+1) (note that the value of the key is decided in the function itself, this has no impact on this part of the code).

auto wa(ttg::make_tt(a, ttg::edges(C_A), ttg::edges(A_B), "A", {"from C"}, {"to B"}));
auto wb(ttg::make_tt(b, ttg::edges(A_B), ttg::edges(B_C0, B_C1), "B", {"from A"},
{"to 1st input of C", "to 2nd input of C"}));
auto wc(ttg::make_tt(c, ttg::edges(B_C0, B_C1), ttg::edges(C_A), "C", {"From B", "From B"}, {"to A"}));

The ttg::make_tt calls are also updated to reflect the new task prototypes, and include the edge from C(k) to A(k+1).

if (wa->get_world().rank() == 0) wa->invoke(0, 0.0);

When invoking A(0, 0.0), one needs to provide the key for the task and the input value for each input that A now defines.

return EXIT_SUCCESS;
}

Full iterative diamond example

Streaming Terminals

Now, consider that for a given k, there can be a large amount of tasks of type B, and that the number of such tasks depends on some computation. This means that the input of tasks of type C is not fixed, but variable.

To express such construct, it is possible to do it by building a sub-DAG of tasks that combine the outputs of the different tasks of class B before passing the combination to task C.

TTG provides a more synthetic construct to do so easily: the streaming terminals.

DAG of the iterative diamond of arbitary width

The begining of the program remains identical to the iterative case above: we still use a std::pair<int, int> that we alias as Key2 to define the task identifiers of tasks of class B, and we use the standard serialization provided by TTG for those.

#include <ttg.h>
const double threshold = 100.0;
using Key2 = std::pair<int, int>;
namespace std {
std::ostream &operator<<(std::ostream &os, const Key2 &key) {
os << "{" << std::get<0>(key) << ", " << std::get<1>(key) << "}";
return os;
}
} // namespace std
const double threshold
Definition: reducing.cc:5

The code for tasks of type A will be inlined as a lambda function, because it needs to access other parts of the DAG that need to be defined before. The code for tasks of type B becomes simpler: we always send the updated input that tasks of type B receive to the single input terminal of tasks of task C, so we don't need to differentiate between the keys to decide on which output terminal to provide the data.

static void b(const Key2 &key, const double &input, std::tuple<ttg::Out<int, double>> &out) {
ttg::print("Called task B(", key, ") with input data ", input); ttg::send <0>(std::get<0>(key), input + 1.0, out);
}

Tasks of type C have been simplified too: they now take a single input, and it's the input terminal that will do the sum operation.

static void c(const int &k, const double &sum, std::tuple<ttg::Out<int, double>> &out) {
ttg::print("Called task C(", k, ") with input ", sum);
if (sum < threshold) {
ttg::print(" ", sum, "<", threshold, " so continuing to iterate"); ttg::send <0>(k + 1, sum, out);
} else {
ttg::print(" ", sum, ">=", threshold, " so stopping the iterations");
}
}

The main program that builds the DAG starts similarly to the simple iterative diamond example. Edge types have been simplified, because there is less unique edges (but edges of type C_A will be extended to include the streaming capability).

int main(int argc, char **argv) {
ttg::initialize(argc, argv, -1);
ttg::Edge<Key2, double> A_B("A(k)->B(k, i)");
ttg::Edge<int, double> B_C("B(k, i)->C(k)");
ttg::Edge<int, double> C_A("C(k)->A(k)");
int main(int argc, char **argv)
Definition: reducing.cc:30

Tasks of type C are defined first, because we need to expose those to the code of tasks of type A.

auto wc(ttg::make_tt(c, ttg::edges(B_C), ttg::edges(C_A), "C", {"From B"}, {"to A"}));

Now, we define the input reducer function to apply to the input terminal 0 of tasks of type C. The set_input_reducer function takes two references to elements of the appropriate type, a and b. The operation goal is to aggregate the values as they are sent to the input terminal. The first time a data is sent to this input terminal, it is copied onto the current aggregated value. Every other data sent to the same input terminal (and for the same destination task) is reduced into the aggregator value via this lambda. a is a reference to the (mutable) aggregator value, while b is a reference to the (constant) value to add.

Here, the function we define simply adds the value of b to a.

wc->set_input_reducer <0>(

We can now define the tasks of type A. Instead of passing the function to call, we define it in a lambda expression, which allows us to capture the TT of type C (wc). The prototype of this lambda is the one expected for tasks of the A. After displaying its name, the task calls set_argstream_size on the first input (<0>) of wc. This function takes two arguments: a task identifier (k), and the number of elements that are expected as input of the streaming terminal <0>. That counter can be data depndent, in this case we set it to k+1.

auto wa(ttg::make_tt(
[&](const int &k, const double &input, std::tuple<ttg::Out<Key2, double>> &out) {
ttg::print("Called task A(", k, ")");
wc->set_argstream_size<0>(k, k + 1);

The task can then create as many tasks of type B as is needed, and since each task of type B will output their value into the streaming terminal of the corresponding C, we instantiate k+1 tasks of type B by sending them input data.

The other parameters are the usual parameters of ttg::make_tt.

for (int i = 0; i < k + 1; i++) {
ttg::send <0>(Key2{k, i}, 1.0 + k + input, out);
}
},
ttg::edges(C_A), ttg::edges(A_B), "A", {"from C"}, {"to B"}));

Tasks of type B are created according to the new prototype, and the rest of the code is unchanged.

auto wb(ttg::make_tt(b, ttg::edges(A_B), ttg::edges(B_C), "B", {"from A"}, {"to C"}));
if (wa->get_world().rank() == 0) wa->invoke(0, 0.0);
return EXIT_SUCCESS;
}

Full iterative diamond of arbitrary width example

Distributed Computing

Any TTG program is a parallel application. In the current backends, TTG applications are also MPI applications. Tasks are distributed between the MPI ranks following a process keymap. The default process keymap hashes the task identifiers and distributes the hashes in a round-robin way. The user can control the task distribution by setting a user-defined keymap for Task Templates.

In the iterative diamond of arbitrary width, we can easily provide a suitable keymap by pinning tasks of type A and C (which are the first and last task of each diamond) onto the rank 0, while distributing the tasks of type B between the ranks using the second element in the key of those tasks.

This gives the code below, almost identical to the previous example, except for the keymap definition, and displaying on which rank each task executes.

#include <ttg.h>
const double threshold = 100.0;
using Key2 = std::pair<int, int>;
namespace std {
std::ostream &operator<<(std::ostream &os, const Key2 &key) {
os << "{" << std::get<0>(key) << ", " << std::get<1>(key) << "}";
return os;
}
} // namespace std
static void b(const Key2 &key, const double &input, std::tuple<ttg::Out<int, double>> &out) {
ttg::print("Called task B(", key, ") on rank", ttg::ttg_default_execution_context().rank(), "with input data ", input); ttg::send<0>(std::get<0>(key), input + 1.0, out);
}
static void c(const int &k, const double &sum, std::tuple<ttg::Out<int, double>> &out) {
ttg::print("Called task C(", k, ") on rank", ttg::ttg_default_execution_context().rank(), "with input ", sum);
if (sum < threshold) {
ttg::print(" ", sum, "<", threshold, " so continuing to iterate");
ttg::send<0>(k + 1, sum, out);
} else {
ttg::print(" ", sum, ">=", threshold, " so stopping the iterations");
}
}
int main(int argc, char **argv) {
ttg::initialize(argc, argv, -1);
ttg::Edge<Key2, double> A_B("A(k)->B(k, i)");
ttg::Edge<int, double> B_C("B(k, i)->C(k)");
ttg::Edge<int, double> C_A("C(k)->A(k)");
auto wc(ttg::make_tt(c, ttg::edges(B_C), ttg::edges(C_A), "C", {"From B"}, {"to A"}));
wc->set_input_reducer<0>([](double &a, const double &b) { a += b; });
auto wa(ttg::make_tt([&](const int &k, const double &input, std::tuple<ttg::Out<Key2, double>> &out) {
ttg::print("Called task A(", k, ") on rank", ttg::ttg_default_execution_context().rank());
wc->set_argstream_size<0>(k, k+1);
for(int i = 0; i < k+1; i++) {
ttg::send<0>(Key2{k, i}, 1.0 + k + input, out);
}
}, ttg::edges(C_A), ttg::edges(A_B), "A", {"from C"}, {"to B"}));
auto wb(ttg::make_tt(b, ttg::edges(A_B), ttg::edges(B_C), "B", {"from A"}, {"to C"}));
wa->set_keymap([&](const int &k) { return 0; });
wb->set_keymap([&](const Key2 &k) { return std::get<1>(k) % wb->get_world().size(); });
wc->set_keymap([&](const int &k) { return 0; });
if (wa->get_world().rank() == 0) wa->invoke(0, 0.0);
return EXIT_SUCCESS;
}
const double threshold
Definition: distributed.cc:5
int main(int argc, char **argv)
Definition: distributed.cc:31
ttg::World ttg_default_execution_context()
Definition: ttg.h:136
int rank(World world=default_execution_context())
Definition: run.h:85

Full iterative diamond of arbitrary width example with user-defined keymap