Issue 091 – April 9, 2022

Scotch, anyone?

Hey there,

It’s Robin from CFD Engine & I’ve strayed deep into the OpenFOAM weeds this week – I couldn’t help it, I nerd-sniped myself.

I routinely mesh & solve using scotch decomposition, mainly because I’m too lazy to figure out the hierarchical coefficients every time I want to run on a different machine. But having written that scotch is slower than hierarchical in snappyHexMesh, I just had to try it out.
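
For reference, the laziness looks something like this in decomposeParDict – scotch only needs a core count, whereas hierarchical wants to be told how to slice the domain up (the numbers below are just placeholders & the product of n has to match numberOfSubdomains):

    // the lazy option
    numberOfSubdomains  64;
    method              scotch;

    // ...versus hierarchical, which needs the split per direction
    numberOfSubdomains  64;
    method              hierarchical;
    hierarchicalCoeffs
    {
        n       (4 4 4);    // 4 x 4 x 4 = 64, must match numberOfSubdomains
        order   xyz;
    }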

Is my laziness costing me time & money & if so, how much?

I ran a few cases to see what’s up – here’s what I found…

Background

ICYMI: I recently summarised a blog post from AWS on tips for running cost-effective OpenFOAM simulations. It wasn’t just AWS stuff – most of the recommendations were OpenFOAM-based.

The one that stuck out for me was: “use hierarchical decomposition for meshing.” In their test case they found that snappyHexMesh took up to twice as long to complete when using scotch decomposition 😶

I’m always up for a change of that magnitude, but my cases are typically larger than their 8M cell test case. What kind of speedup could I expect?

I ran a few test meshes (including a 37M cell model from a current client project) & in all cases hierarchical decomposition was faster than scotch. I never got to the realms of twice as quick, but it was always quicker.

What’s going on?

As far as I can tell this is all about balancing. As snappyHexMesh adds new cells it periodically pauses & redistributes the mesh across all the available processors, trying to maintain a balanced workload.

Almost all of the re-balancing occurs during the refinement phase, and it uses the decomposition method specified in your decomposeParDict.

It turns out that re-balancing using the hierarchical method is much quicker than scotch & this is where we get the speedup.

Your mileage may vary

If you mesh using scotch, you can get an idea of what switching to hierarchical might yield by taking a look inside a recent log.snappyHexMesh.

Here are a couple of grep commands to grab the important lines:

  • grep ^Balanced log.snappyHexMesh will show you all the times SHM paused to re-balance the mesh & how long those individual pauses were (sum these up);
  • grep "Mesh refined in" log.snappyHexMesh will tell you how long the mesh refinement phase took.

In my 37M cell client-project case, SHM spent 23mins refining the mesh, of which 11mins were spent re-balancing with scotch.

Switching to hierarchical shaved 9mins off this.

The other meshing phases (snapping & layering) saw no speedup, but 9mins off a total meshing time of 50mins isn’t too shabby.

Extras

I also played with a couple of balancing-related settings in snappyHexMeshDict. Increasing maxLoadUnbalance & reducing maxLocalCells both made things quicker, but nowhere near the impact of switching to hierarchical.
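
For reference, both of those knobs live in castellatedMeshControls in snappyHexMeshDict – something like this (values are illustrative, not recommendations):

    castellatedMeshControls
    {
        // per-processor cell limit during refinement
        maxLocalCells       100000;

        // fraction of imbalance tolerated before a re-balance is triggered
        maxLoadUnbalance    0.10;

        // ...plus all your usual castellatedMeshControls settings
    }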

I narrowly avoided falling all the way down the decomposition rabbit hole, but I did note that:

  • the choice of coefficients for the hierarchical method made a small difference to meshing time, although this is most likely to be case specific;
  • the kahip method was faster than scotch (and also doesn’t need you to specify coefficients) but it still wasn’t as quick as hierarchical.

Finally, many thanks to Ivan for introducing me to the multiLevel decomposition method, which offers a way to tailor your decomposition to your hardware 🙏 Worth exploring for you HPC-heads?
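
I haven’t tested it properly, but a multiLevel setup looks something like the sketch below – split across machines first, then across the cores within each machine (the sub-dictionary names & numbers are just placeholders):

    method  multiLevel;
    multiLevelCoeffs
    {
        nodes   // first level: one chunk per machine
        {
            numberOfSubdomains  4;
            method              scotch;
        }
        cores   // second level: across the cores of each machine
        {
            numberOfSubdomains  16;
            method              scotch;
        }
    }
    // overall numberOfSubdomains = 4 x 16 = 64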

TL;DR

I’m not sure whether this was time well spent…

AWS were right, snappyHexMesh goes faster when you use hierarchical decomposition instead of scotch.

And I shortened an 8hr run by 9mins (or 30¢) 🙈

A bit of an anti-climax.

I think it’s probably time for a real Scotch 🥃

Until next week, stay safe,

Signed Robin K