Hi,
I will describe very briefly the infrastructure which allows DUNE to run in parallel, based on Indexsets, which can
be found in the module dune-common.
The module dune-common contains the basic classes used by all DUNE modules. It provides some infrastructural classes for debugging and exception handling as well as a library to handle dense matrices and vectors. Moreover it provides an abstraction for parallel computing and it implements all the classes for syncing distributed indexed data structures.
I have implemented a toy example to understand how the Indexsets works and to get practice with this framework. All the original DUNE modules are mirrored on github and you can find here the source of dune-common. For a detailed description of the different classes, here we have the web doxygen documentation. My examples is directly taken from the paper C++ Components Describing Parallel Domain Decomposition and Communication, Markus Blatt and Peter Bastian, which describes all the design behind the Indexsets.
I have forked the dune-common repository here and I have created a branch called threads. You can find the toy example is available here paralleltest.cc. The code basically initialize MPI and check if the code was run by more than 2 processes, if not create a local vector, al, on each process of length 7:
- on rank 0 al = {0, 1, 2, 3, 4, 5, 0}
- on rank 1 al = {0, 6, 7, 8, 9, 10, 11}
where on rank 0 al[6] ( where the indexing starts from 0 ) is the ghost of element al[0] on rank 1, while on rank 1 al[0] is the ghost of element al[6] on rank 0.
After printing for each process the contents of al, it adds 10 on rank 0 to all the elements of the vector and 20 on rank 1. Finally it perform the communication and output the finals local vectors:
- on rank 0 al = {10, 11, 12, 13, 14, 15, 26}
- on rank 1 al = {15, 26, 27, 28, 29, 30, 31}
Looking at the code, we can notice how simple is to implement this communication without the need of using any MPI API, a part from MPI_Init() and MPI_Finalize(). All the parallelization process is automatically carried out by DUNE. The user needs just to provide the correct Indexsets.
The ParallelLocalIndex contains the local index (where local means referred to the process) and the type (in this case if is a ghost or not); ParallelIndexSet coupled the local index to a global ID (which must be unique). Finally RemoteIndices maps two ParallelIndexSet, source and target, and it is passed to the Interface which is the communication interface. The actual communication, according to the interface created is carried out by BufferedCommunicator.
In the paper mentioned above there are all the rigorous definitions of the classes. Moreover, it is important to know that the overhead is almost null respect to a direct use of MPI and, in case of buffered communication, there is also a performance gain.
Now, that we have taken a look on the parallel structure of DUNE we can understand better the SOCIS project. Indeed Indesets provide both an abstract interface for generic parallel communication and a specialization for MPI. The aim of my work is to add the support for threads to have an hybrid approach shared memory (threads) and distributed memory (MPI).
In the next post I will describe very briefly what is a threads and which threads model I will use.
Questions are more than welcomed.
Stay tuned!
Marco.