Relations Overview

The relations API functions enable you to associate instances of named entities with other instances. By making associations among the entities in your trace, you enable the Intel® Platform Analyzer to distill your detailed trace information into higher-level concepts that are implicit in the design of your code.

For example, in some parallel processing environments, a single operation can lead to dozens, if not hundreds, of tasks on other threads. To understand the performance impact of such an operation, you would otherwise need to sum the durations of these hundreds of tasks manually. The relations API functions help the Intel® Platform Analyzer automate that process for you.

Think of all your markers, tasks and task groups as nodes in a graph. Relations enable you to create edges of certain types between nodes, say a “Parent” edge from X to Y to signify that Y is the parent of X.
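To make the graph picture concrete, here is a minimal sketch in plain C of the bookkeeping that relations let a tool automate. It does not call the ITT API, and it is not how the Intel® Platform Analyzer is implemented: the task_node struct, the NO_PARENT constant, and the descends_from and total_cost functions are hypothetical names introduced only for illustration. Each node stores its own duration and a "Parent" edge, and the total cost of an operation is its own duration plus that of every node that reaches it through "Parent" edges.

```c
#include <stddef.h>

#define NO_PARENT (-1)

/* Hypothetical node: one marker, task, or task group in the trace. */
typedef struct {
    int    parent;    /* index of the parent node, or NO_PARENT */
    double duration;  /* time spent in this node alone */
} task_node;

/* Returns 1 if `node` is `ancestor` itself or has it somewhere up its
   chain of "Parent" edges, and 0 otherwise. */
int descends_from(const task_node *nodes, int node, int ancestor)
{
    while (node != NO_PARENT) {
        if (node == ancestor)
            return 1;
        node = nodes[node].parent;
    }
    return 0;
}

/* Sums the duration of `root` and of every node that reaches `root`
   by following "Parent" edges, directly or transitively. */
double total_cost(const task_node *nodes, size_t n, int root)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (descends_from(nodes, (int)i, root))
            sum += nodes[i].duration;
    }
    return sum;
}
```

For a parallel-for node with duration 1.0 and two child tasks with durations 2.0 and 3.0, total_cost on the parallel-for node returns 6.0, which is the sum you would otherwise compute by hand.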

To create these connections between entities, you need to assign instance identifiers (IDs) to them. An ID enables you to refer to an instance before or after it has actually run. In the Intel ITT API, IDs are represented by the __itt_id structure.

To see this in action, let’s work through a parallel-for implementation, and the challenge of measuring the total cost of the parallel-for. Start with the actual parallel-for task:

 

#define ParForNS (void*)3

__itt_string_handle* pH1 = __itt_string_handle_create(L"ParallelFor");
__itt_string_handle* pH2 = __itt_string_handle_create(L"WaitForCompletion");
__itt_domain* domain = __itt_domain_create(L"MyDomain");

void ParallelFor(int base, int count, void(*doWork)(int index))
{
    // Get a unique identifier for this specific parallel-for call
    static int sParallelForCounter = 0;
    int id = sParallelForCounter++;

    // Establish the beginning of this ID's existence. Without calling this,
    // all references to this ID are invalid.
    __itt_id_create(domain, __itt_id_make(ParForNS, id));

    // Now, begin the actual task and associate it with an ID
    __itt_task_begin(domain, __itt_id_make(ParForNS, id), __itt_null, pH1);

    // numTasksOutstanding is used to decide whether the parallel-for has completed
    int numTasksOutstanding = 0;

    // Now do the actual ParallelFor work: kick off the child tasks...
    int countPerTask = count / GetNumWorkerThreads();
    // Guarantee forward progress when there are fewer items than worker threads
    if(countPerTask == 0) countPerTask = 1;
    int curBase = base;
    int remaining = count;
    while(remaining > 0)
    {
        int curCount = MIN(remaining, countPerTask);

        // Out of work
        if(curCount == 0) break;

        // Add a task to the parallel-for status
        AtomicIncrement(&numTasksOutstanding);
        EnqueueTask(ParallelFor_DoWork, &numTasksOutstanding, id, doWork, curBase, curCount);
        curBase += curCount; remaining -= curCount;
    }

    // Our parallel-for kickoff has completed, but the parallel-for work hasn't.
    // Make a subtask to measure how long we wait for completion.
    __itt_task_begin(domain, __itt_null, __itt_null, pH2);
    while(numTasksOutstanding > 0)
        YieldTimeslice();
    __itt_task_end(domain);

    // The parallel-for, including all its child work, is complete.
    // We can end the root parallel-for task.
    __itt_task_end(domain);

    // Now, we MUST destroy the ID that was used for this parallel-for.
    // We can destroy the ID here because all the child tasks are known to be
    // done, due to the wait loop above.
    __itt_id_destroy(domain, __itt_id_make(ParForNS, id));
}

 

In addition to instrumenting the master parallel-for task, instrument the worker task:

 

__itt_string_handle* pH = __itt_string_handle_create(L"ParallelFor");

void ParallelFor_DoWork(int* numTasksOutstanding, int parallelForID, void (*doWork)(int index), int base, int count)
{
    // Make a task for this worker task. It does not need an ID itself.
    __itt_task_begin(domain, __itt_null, __itt_null, pH);

    // Mark this task as a child of the parent ParallelFor task created earlier
    __itt_relation_add_to_current(domain, __itt_relation_is_child_of, __itt_id_make(ParForNS, parallelForID));

    // Do the work for this task: indices base through base + count - 1
    for(int i = base; i < base + count; ++i) {
        doWork(i);
    }

    // The parallel-for implementation above needs to know when all worker
    // tasks have completed their work
    AtomicDecrement(numTasksOutstanding);

    // We're done, mark us as done
    __itt_task_end(domain);
}

 

All relationships in ITT are symmetric, in the sense that where there is a relation indicating a relationship X from A to B, there will be another implied relationship Y that goes from B to A. For example, if you define a __itt_relation_is_child_of relationship from A to B, there will be an implied __itt_relation_is_parent_of relationship from B to A.
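The implied-relation rule can be sketched as a lookup from a relation to its counterpart. This is an illustration in plain C, not the ITT API: relation_kind and implied_relation are hypothetical names, and the enumerators only mirror ITT relation names. Only the child/parent pairing is stated in the text above; the other pairings are assumptions inferred from the relation names.

```c
/* Hypothetical local mirror of relation kinds; not the ITT enum itself. */
typedef enum {
    REL_ABSENT = 0,
    REL_IS_DEPENDENT_ON,
    REL_IS_SIBLING_OF,
    REL_IS_PARENT_OF,
    REL_IS_CONTINUATION_OF,
    REL_IS_CHILD_OF,
    REL_IS_CONTINUED_BY,
    REL_IS_PREDECESSOR_TO
} relation_kind;

/* Given a relation from A to B, returns the relation implied from B to A.
   Pairings other than child/parent are assumptions of this sketch. */
relation_kind implied_relation(relation_kind r)
{
    switch (r) {
    case REL_IS_CHILD_OF:        return REL_IS_PARENT_OF;
    case REL_IS_PARENT_OF:       return REL_IS_CHILD_OF;
    case REL_IS_CONTINUATION_OF: return REL_IS_CONTINUED_BY;
    case REL_IS_CONTINUED_BY:    return REL_IS_CONTINUATION_OF;
    case REL_IS_DEPENDENT_ON:    return REL_IS_PREDECESSOR_TO;
    case REL_IS_PREDECESSOR_TO:  return REL_IS_DEPENDENT_ON;
    case REL_IS_SIBLING_OF:      return REL_IS_SIBLING_OF;  /* its own counterpart */
    default:                     return REL_ABSENT;
    }
}
```

Applying implied_relation twice returns the original relation, which is the sense in which the relations come in pairs.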

The most difficult part about using relations is destroying IDs at the right moment. You must not destroy an ID until all tasks that might reference that ID have also completed. In the example earlier, the ID could be retired in the same function because the root ParallelFor function blocked on completion of all its child tasks.
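One way to reason about the lifetime rule is to track how many tasks may still reference an ID and refuse to retire it while that count is nonzero. The sketch below models this in plain C. It does not call the ITT API, and tracked_id and its functions are hypothetical names. It is also single-threaded for brevity; a real task system would need atomic operations on the counter.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one ID's lifetime. */
typedef struct {
    bool live;         /* true between create and destroy */
    int  outstanding;  /* tasks that may still reference this ID */
} tracked_id;

/* Starts the ID's existence; references are only valid after this. */
void tracked_id_create(tracked_id *id)
{
    id->live = true;
    id->outstanding = 0;
}

/* Registers a task that will reference the ID. Fails if the ID is not live. */
bool tracked_id_acquire(tracked_id *id)
{
    if (!id->live)
        return false;
    id->outstanding++;
    return true;
}

/* Called when a referencing task has completed. */
void tracked_id_release(tracked_id *id)
{
    if (id->outstanding > 0)
        id->outstanding--;
}

/* Retires the ID. Refuses while any task might still reference it. */
bool tracked_id_destroy(tracked_id *id)
{
    if (!id->live || id->outstanding > 0)
        return false;
    id->live = false;
    return true;
}
```

In the parallel-for example, the wait loop plays the role of the outstanding-count check: the ID is destroyed only after every child task has decremented the counter.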

See Also

__itt_relation_add
__itt_relation_add_to_current
__itt_relation
