Hi everybody,
I have finally implemented a class which performs the communication among threads for a given interface.

The original MPI version is implemented in communicator.hh. Here you can find a slightly modified version

At the beginning, I tried to generalize this class for a generic parallel paradigm, but it turned out to be very complicated since the MPI and the thread implementations have very little in common. Therefore I decided to write a completely different class, Dune::ThreadCommunicator, to manage the communication among threads. This class is implemented in the following header

The public methods are exactly the same as those of the MPI Dune::BufferedCommunicator, except for the absence of the build methods. Apart from that, the new Dune::ThreadCommunicator works completely differently. I will summarize the logic behind it very briefly.

The constructor calls the private method computeColoring(), which computes an optimal disjoint partitioning (a coloring) of the graph representing the threads. A more accurate description can be found in my previous post. I have chosen this approach to minimize the locking time.
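To make the idea concrete, here is a hypothetical sketch of what a coloring step could look like: a greedy coloring of the graph whose vertices are the threads and whose edges connect threads that share an interface. Threads of the same color have no common interface and can therefore run concurrently. (The function name and the algorithm are illustrative; the actual computeColoring() in Dune::ThreadCommunicator may proceed differently.)

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Greedily color the thread graph: adjacency[v] holds the threads
// that share an interface with thread v. Two threads with the same
// color never touch the same interface, so they can communicate
// concurrently without locking each other.
std::vector<int>
greedyColoring(const std::vector<std::set<std::size_t>>& adjacency)
{
  const std::size_t n = adjacency.size();
  std::vector<int> color(n, -1);
  for (std::size_t v = 0; v < n; ++v) {
    std::set<int> used;
    for (std::size_t w : adjacency[v])
      if (color[w] >= 0)
        used.insert(color[w]);
    int c = 0;
    while (used.count(c)) ++c;  // smallest color unused by a neighbor
    color[v] = c;
  }
  return color;
}
```

For a path 0-1-2 this yields two colors (threads 0 and 2 may run together), while a triangle needs three.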

It is worth noting that, since the ThreadCommunicator will be used inside the threads, it is not possible to create a lock-free mechanism using join() (as done in the above-mentioned post). Instead, I use a barrier which mimics MPI_Barrier().

I have implemented the barrier and all the communication facilities among threads in the class ThreadCollectiveCommunication present in the header

It uses a double counter and a std::mutex to create a self-resetting barrier (with a single counter it is not safe to reset the counter).

Moreover, ThreadCollectiveCommunication provides all the facilities to set/get pointers to a variable defined in thread scope. In this way a thread can safely read a variable (of the same type) owned by another thread.

To perform this communication, a common (shared) std::array is created, containing one element of user-defined type per thread. The allocation is performed by a single thread while all the others wait, since we need to be sure that the buffer exists before inserting the elements. The method which sets the elements uses the barrier at the end, in order to be sure that the array is completely filled before it is used. This may seem very expensive, but we are only inserting one pointer per thread, so it is very fast. Finally, the deallocation is equivalent to the allocation from a locking point of view.

All the public methods of the class ThreadCommunicator which perform the forward or backward communication of the values simply call the private method sendRecv.

This method in turn uses the facilities provided by ThreadCollectiveCommunication to communicate the values among the threads. More precisely, each element of the array is a std::pair containing a pointer to the target data and a pointer to the interface map.

Therefore each thread has access to all the interfaces and to the local structures where the data need to be scattered. To avoid race conditions, only the threads which share the same color run concurrently, while all the others wait on the barrier.

You can find the two equivalent examples, which use the two different parallel paradigms, here

Therefore we now have full support for threads in dune-common.

Stay tuned!
Marco.