BoxLib: Download
To get a copy of the latest version of the BoxLib repository using git, please visit our Downloads page. If you do not have git and would like us to send you a tarball, please contact Mike Lijewski of CCSE.
BoxLib: User's Guide
The BoxLib User's Guide is available in the BoxLib git repository in BoxLib/Docs/ (type 'make' in that directory to build it). This document contains step-by-step instructions for running simulations in parallel with multiple levels of refinement, with accompanying tutorial applications in BoxLib/Tutorials/.
BoxLib: Tutorials
We have created tutorials in the BoxLib release that give examples of how to use the extensive functionality that is available. Some of the tutorials available in the download describe:
BoxLib: Summary of Key Features
BoxLib is the block-structured AMR framework that is the basis for many of CCSE's codes.
BoxLib contains all the functionality needed to write a parallel, block-structured AMR application. The fundamental parallel abstraction is the MultiFab, which holds the data on the union of grids at a level. A MultiFab is composed of FABs; each FAB is an array of data on a single grid. During each MultiFab operation, the FABs composing that MultiFab are distributed among the cores. MultiFabs at each level of refinement are distributed independently.
The software supports two data distribution schemes, as well as a dynamic switching scheme that decides which approach to use based on the number of grids at a level and the number of processors. The first scheme is based on a heuristic knapsack algorithm; the second is based on a Morton-ordering space-filling curve.
MultiFab operations are performed with an owner-computes rule, with each processor operating independently on its local data. For operations that require data owned by other processors, the MultiFab operations are preceded by a data exchange between processors. Each processor holds the meta-data needed to fully specify the geometry and processor assignments of the MultiFabs. At a minimum, this requires storing an array of boxes specifying the index-space region for each AMR level of refinement. The meta-data can thus be used to dynamically evaluate the communication patterns needed to share data among processors, which enables us to optimize communication patterns within the algorithm. One advantage of computing with fewer, larger grids in the hybrid OpenMP-MPI approach (see below) is that the size of the meta-data is substantially reduced.
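The two distribution schemes described above can be illustrated with a minimal Python sketch. This is not BoxLib's implementation: the function names (`knapsack_assign`, `morton_key`, `sfc_assign`), the use of cell counts as the work estimate, and the largest-first greedy rule are all assumptions chosen for illustration. The switching rule between the two schemes is not shown, since its criteria are not given here.

```python
import heapq

def knapsack_assign(grid_sizes, nprocs):
    """Greedy knapsack heuristic (an assumed variant): give the largest
    remaining grid to the processor with the least work so far."""
    heap = [(0, p) for p in range(nprocs)]      # (current load, processor)
    heapq.heapify(heap)
    owner = [None] * len(grid_sizes)
    for g in sorted(range(len(grid_sizes)), key=lambda i: -grid_sizes[i]):
        load, p = heapq.heappop(heap)
        owner[g] = p
        heapq.heappush(heap, (load + grid_sizes[g], p))
    return owner

def morton_key(i, j, k, bits=10):
    """Interleave the low `bits` bits of (i, j, k) into one Morton key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b + 2)
    return key

def sfc_assign(grid_corners, grid_sizes, nprocs):
    """Order grids along the Morton curve, then cut the curve into
    nprocs contiguous pieces of roughly equal work."""
    order = sorted(range(len(grid_corners)),
                   key=lambda g: morton_key(*grid_corners[g]))
    total = sum(grid_sizes)
    owner = [None] * len(grid_corners)
    running = 0
    for g in order:
        # The midpoint of this grid's work interval picks its processor.
        midpoint = running + grid_sizes[g] / 2
        owner[g] = min(nprocs - 1, int(midpoint * nprocs / total))
        running += grid_sizes[g]
    return owner
```

The knapsack variant balances work without regard to where the grids sit in space, whereas the space-filling-curve variant keeps grids that are close in index space on the same processor, at some cost in balance when grid sizes vary widely.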
BoxLib: Hybrid Parallelism
The basic parallelization strategy uses a hierarchical programming approach for multicore architectures based on both MPI and OpenMP. In the pure-MPI instantiation, at least one grid at each level is distributed to each core, and each core communicates with every other core using only MPI. In the hybrid approach, where each socket has n cores that all access the same memory, we can instead have one larger grid per socket, with the work associated with that grid distributed among the n cores using OpenMP.
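As a rough worked example of the difference between the two strategies, the following Python sketch counts the fewest grids per level needed to keep every core busy. The function name and the machine size (1024 cores, 8 cores per socket) are hypothetical, and the count assumes exactly one grid per core or per socket.

```python
def min_grids_per_level(total_cores, cores_per_socket, hybrid):
    """Fewest grids per level that gives every core some work."""
    if total_cores % cores_per_socket != 0:
        raise ValueError("total_cores must be a multiple of cores_per_socket")
    if hybrid:
        # One larger grid per socket; OpenMP spreads its work over the cores.
        return total_cores // cores_per_socket
    # Pure MPI: at least one grid per core.
    return total_cores

total_cores, n = 1024, 8   # hypothetical machine
pure_mpi_grids = min_grids_per_level(total_cores, n, hybrid=False)
hybrid_grids = min_grids_per_level(total_cores, n, hybrid=True)
print(pure_mpi_grids, hybrid_grids)
```

Under these assumptions the hybrid approach needs a factor of n fewer grids per level, so the array of boxes that each processor stores as meta-data shrinks by the same factor.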
BoxLib: Parallel I/O
Data for checkpoints and analysis are written in a self-describing format that consists of a directory for each time step written. Checkpoint directories contain all the data necessary to restart the calculation from that time step. Plotfile directories contain data for postprocessing, visualization, and analytics, which can be read using VisIt or AmrVis, a customized visualization package developed at LBNL for visualizing data on AMR grids. Within each checkpoint or plotfile directory is an ASCII header file and a subdirectory for each AMR level. The header describes the AMR hierarchy, including the number of levels, the grid boxes at each level, the problem size, the refinement ratio between levels, the step time, etc. Each level directory contains the MultiFab files for that AMR level. Checkpoint and plotfile directories are written at user-specified intervals.
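The directory-per-time-step layout described above can be sketched in Python as follows. This is a toy layout only: the directory and file names (`step_00010`, `header.txt`, `level_0`, `multifab.dat`) and the header syntax are hypothetical and do not reproduce BoxLib's actual file format.

```python
from pathlib import Path

def write_step_directory(root, step, levels):
    """Write one self-describing directory for a time step.

    levels: list of dicts with keys 'ref_ratio' and 'boxes', where each
    box is a (lo, hi) pair of index tuples. All names are hypothetical.
    """
    step_dir = Path(root) / f"step_{step:05d}"
    step_dir.mkdir(parents=True)
    header = [f"num_levels {len(levels)}"]
    for lev, info in enumerate(levels):
        header.append(
            f"level {lev} ref_ratio {info['ref_ratio']} "
            f"num_boxes {len(info['boxes'])}"
        )
        for lo, hi in info["boxes"]:
            header.append(f"box {lo} {hi}")
        level_dir = step_dir / f"level_{lev}"
        level_dir.mkdir()
        # Empty placeholder standing in for the per-level MultiFab files.
        (level_dir / "multifab.dat").write_bytes(b"")
    # ASCII header describing the hierarchy sits at the top of the directory.
    (step_dir / "header.txt").write_text("\n".join(header) + "\n")
    return step_dir
```

Because the header is plain text and sits at the top of each directory, a reader can discover the hierarchy before opening any of the per-level data files.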
BoxLib: Parallel Restart
Restarting a calculation can present some difficult issues for reading data efficiently. In the worst case, all processors would need data from all files. If multiple processors try to read from the same file at the same time, performance problems can result, with extreme cases causing file system thrashing. Since the number of files is generally not equal to the number of processors and each processor may need data from multiple files, input during restart is coordinated so that the data is read efficiently. Each data file is opened by only one processor at a time. The IOProcessor creates a database for mapping files to processors, coordinates the read queues, and interleaves reading its own data. Each processor reads all the data it needs from the file it currently has open. The code tries to keep the number of input streams equal to the number of files at all times. Checkpoints and plotfiles are portable to machines with a different byte ordering and precision from the machine that wrote the files. Byte order and precision translations are done automatically, if required, when the data is read.
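The read coordination described above can be mimicked with a small serial simulation in Python. This is a sketch under stated assumptions, not the BoxLib code: the function `schedule_reads`, the round-based model, and the first-come greedy choice are all hypothetical, and a greedy choice does not always keep every file busy.

```python
def schedule_reads(needs):
    """Plan restart reads in rounds.

    needs: dict mapping processor -> set of files it must read.
    Returns a list of rounds; each round maps file -> processor, so a
    file is open by at most one processor and each processor has at
    most one file open at a time.
    """
    remaining = {p: set(files) for p, files in needs.items()}
    rounds = []
    while any(remaining.values()):
        assignment = {}                      # file -> processor, this round
        for p in sorted(remaining):
            for f in sorted(remaining[p]):
                if f not in assignment:      # file not yet open this round
                    assignment[f] = p
                    break
        for f, p in assignment.items():
            # A processor reads everything it needs from f while it is open.
            remaining[p].discard(f)
        rounds.append(assignment)
    return rounds

needs = {0: {"a", "b"}, 1: {"a"}, 2: {"b", "c"}}
rounds = schedule_reads(needs)
```

In this example processor 1 has to wait during the first round because file `a` is already open on processor 0; it reads `a` in the second round, when all three files are in use.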
The plotfile format generated by BoxLib can be read by VisIt, AmrVis, and yt.
Further information about BoxLib can be found by contacting Mike Lijewski of CCSE.