Newsletter August 2016

We are proud to announce that today our efforts to improve the Deposit toolkit culminate in the release of Deposit 3. The video below shows a simulation depositing 2000 AlphaNPD molecules within 8 hours. Because of the grid size, which is large by Deposit 2 standards, the equivalent simulation with the previous Deposit 2 release would have taken up to two weeks to achieve the same result.

Each frame shows the deposition of one or several AlphaNPD molecules. The deposition pauses intermittently to show the execution of a single deposition cycle, then resumes until the box is completely filled.

Reductions in Deposit 3's disk I/O lead to large performance gains, especially on HPC cluster architectures. In a sample deposition of 2000 AlphaNPD molecules, only 60 MB of disk I/O was performed; a comparable simulation in Deposit 2 would have generated more than 300 GB of file I/O.


Reduced file I/O By re-engineering Deposit 3 around an optimized in-memory architecture, we were able to reduce the file I/O for the deposition of 2000 AlphaNPD molecules in a 4.2x4.2x16 nm³ box from approximately 300 GB in Deposit 2 to 60 MB in Deposit 3. The reduced load, especially on heavily used cluster filesystems, accelerates Deposit by one order of magnitude even for serial, single-CPU depositions.
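To give a feel for the kind of change involved, here is a minimal sketch of the general idea of keeping the grid in memory and writing it out only once, instead of reading and writing grid files on a shared filesystem every cycle. The class, method names, grid resolution and update region below are hypothetical and only illustrate the pattern; they are not taken from the Deposit 3 code.

```python
import numpy as np

class InMemoryGrid:
    """Hypothetical illustration: the potential grid lives in RAM and is updated in place."""

    def __init__(self, shape=(42, 42, 160)):            # e.g. 1 Å voxels in a 4.2x4.2x16 nm box
        self.potential = np.zeros(shape, dtype=np.float32)

    def update(self, contribution, region):
        """Add the local forcefield contribution of a newly deposited molecule (no disk access)."""
        self.potential[region] += contribution

    def dump(self, path):
        """Write the grid to disk once, at the end of the run."""
        np.save(path, self.potential)

grid = InMemoryGrid()
for _ in range(2000):                                    # one update per deposited molecule
    region = (slice(0, 10), slice(0, 10), slice(0, 10))  # placeholder region around the molecule
    grid.update(np.ones((10, 10, 10), dtype=np.float32), region)
grid.dump("final_grid.npy")                              # a single write instead of thousands of per-step dumps
```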

Runtime scaling of the new grid update algorithm. a) Flat runtime of the grid update procedure for a 4.2x4.2x16 nm³ grid. b) Speedup of the grid update procedure. Even at high core counts of up to 64 CPUs, the speedup increases linearly up to a factor of 50, allowing for the simulation of very large grids.
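As a back-of-the-envelope check (this is simply Amdahl's law applied to the quoted numbers, not an analysis from the Deposit 3 code), a speedup of about 50 on 64 cores corresponds to roughly 99.6% of the grid update work being parallelized:

```python
# Amdahl's law: S(N) = 1 / ((1 - p) + p / N), with p the parallel fraction of the work.
# Solving for p with the quoted figures S = 50 at N = 64 cores:
S, N = 50.0, 64
p = (1 - 1 / S) / (1 - 1 / N)   # ≈ 0.996, i.e. almost the entire update scales with the core count
print(f"parallel fraction ≈ {p:.3f}")
```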


Parallelization of grid forcefield generation The use of large grids of up to 30x30x30 nm³ was infeasible in Deposit 2, not only because of the I/O constraints mentioned above, but in particular because of sub-optimal parallelization. We implemented a near-optimally scaling algorithm for the grid update procedure. With the new algorithm, Deposit 3 runs on SMP machines with up to 64 cores and updates a 30x30x30 nm³ grid in under 8 seconds for the deposition of a single Pentacene molecule.
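The sketch below illustrates the general idea of decomposing such an SMP grid update into independent slabs, one per core. It is not the actual Deposit 3 algorithm; the grid resolution, function names and the empty placeholder kernel are assumptions made purely for illustration.

```python
from multiprocessing import Pool
import numpy as np

NX, NY, NZ = 300, 300, 300   # hypothetical 30x30x30 nm grid at 1 Å resolution
N_CORES = 64

def update_slab(args):
    """Placeholder kernel: recompute the forcefield contribution of a new molecule for one z-slab."""
    z_lo, z_hi, molecule_position = args
    # ... evaluate the grid points in z_lo:z_hi against the newly deposited molecule ...
    return z_lo, np.zeros((NX, NY, z_hi - z_lo), dtype=np.float32)

def parallel_grid_update(grid, molecule_position):
    # One contiguous z-slab per core; the slabs are independent of each other,
    # which is what allows a near-linear speedup.
    bounds = np.linspace(0, NZ, N_CORES + 1, dtype=int)
    tasks = [(bounds[i], bounds[i + 1], molecule_position) for i in range(N_CORES)]
    with Pool(N_CORES) as pool:
        for z_lo, slab in pool.map(update_slab, tasks):
            grid[:, :, z_lo:z_lo + slab.shape[2]] += slab
    return grid

if __name__ == "__main__":
    grid = np.zeros((NX, NY, NZ), dtype=np.float32)
    parallel_grid_update(grid, molecule_position=(150.0, 150.0, 20.0))
```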

Evolution of the runtime per molecule as a function of morphology size (red: runtime of the grid update procedure, green: runtime of the Monte Carlo procedure). In contrast to Deposit 2, Deposit 3's deposition time flattens after the deposition of 100 molecules, leading to reliable deposition times even for large morphologies. Despite the large number of MC steps, a single deposition completes within about 13 s. Simulation parameters: XY-periodic box of 7x7x21 nm³, 32 SA cycles of 120,000 Monte Carlo steps each (3.84 million MC steps in total).
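For orientation, the quoted figures translate into the following rough Monte Carlo throughput (a simple calculation on the numbers above, not an additional measurement):

```python
# 32 SA cycles x 120,000 MC steps = 3.84 million MC steps per deposited molecule,
# completed in roughly 13 s.
sa_cycles, mc_steps_per_cycle, seconds_per_molecule = 32, 120_000, 13
total_mc_steps = sa_cycles * mc_steps_per_cycle            # 3,840,000
print(total_mc_steps / seconds_per_molecule)               # ≈ 295,000 MC steps per second
```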


MPI parallelization of simulated annealing cycles Deposit 3 runs all simulated annealing cycles in parallel using an MPI-SMP shared-memory protocol. For example, on a 32-core machine, 32 simulated annealing cycles can be performed simultaneously, yielding yet another speedup of up to 32 compared to Deposit 2. Combining all of these optimizations enables the deposition of a single molecule in under 10 s on a 32-core machine using 32 SA cycles of 50,000 MC steps each. The in-memory implementation allows the fast construction and updating of large grids and therefore facilitates the generation of sufficiently large structures. Parallelizing the forcefield construction and the simulated annealing cycles makes it possible to increase both the number of SA cycles and the number of MC steps per cycle, providing the statistics required for subsequent KMC simulations.
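As a rough illustration of running independent SA cycles under MPI, here is a hedged sketch using mpi4py. The toy energy function, acceptance rule and cooling schedule are placeholders invented for this example; the actual Deposit 3 protocol, including its shared-memory grid handling, is more involved and not reproduced here.

```python
import math
import random
from mpi4py import MPI

def simulated_annealing_cycle(seed, n_steps=50_000):
    """Hypothetical stand-in for one SA cycle: returns (best_energy, best_state)."""
    rng = random.Random(seed)
    state = rng.uniform(-1.0, 1.0)           # placeholder 1-D "conformation"
    energy = state * state                   # placeholder energy function
    best_energy, best_state = energy, state
    temperature = 1.0
    for _ in range(n_steps):
        trial = state + rng.uniform(-0.1, 0.1)            # trial MC move
        trial_energy = trial * trial
        if trial_energy < energy or rng.random() < math.exp(-(trial_energy - energy) / temperature):
            state, energy = trial, trial_energy           # Metropolis-style acceptance
        if energy < best_energy:
            best_energy, best_state = energy, state
        temperature *= 0.9999                             # simple geometric cooling
    return best_energy, best_state

comm = MPI.COMM_WORLD
# One independent SA cycle per MPI rank, e.g. 32 cycles at once on a 32-core node.
my_result = simulated_annealing_cycle(seed=comm.Get_rank())
# Collect the results of all cycles and keep the lowest-energy conformation.
results = comm.gather(my_result, root=0)
if comm.Get_rank() == 0:
    best_energy, best_state = min(results)
    print(f"best energy over {comm.Get_size()} SA cycles: {best_energy:.6f}")
```

Launched with, for example, mpirun -n 32, each rank runs one annealing cycle with its own random seed, so all 32 cycles proceed at the same time and only the final comparison of results requires communication.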