Borrowing from ML

Hey there,

It’s Robin from CFD Engine & I’m borrowing stuff from our machine learning friends this week 🤫

I don’t know much about machine learning, but I do know that it has a huge community, lots of research & plenty of investment. It even crosses over with CFD, but that’s not where I’m going today.

Instead I wanted to take a look at the tooling that’s springing up around machine learning & see whether we can re-purpose any of it for CFD.

It seems like there’s some overlap between what we both do – we both make computational models of stuff, make changes to those models to see how they respond & ideally improve some key metrics along the way.

There are loads of tools that are trying to make this process easier for ML folks. They even have a word for it – welcome to the world of MLOps.

MLOps?

Machine Learning Operations (or MLOps) is pretty much everything involved in the running of machine learning experiments, including managing (& versioning) everything (data, models, experiments, parameters, etc), validating & testing results, plus provisioning the resources that it all runs on.

It’s a busy space, with plenty of sub-domains, but here are three that I thought overlapped with CFD & that might have some useful tools (&/or ideas) that we could ~~steal~~ borrow.

Pipelines

This one is a double-borrow, coming from ML & from genomics (well known for its complex simulation setups).

These tools are essentially scripting languages that boost your Allrun with super-powers, things like restarts, logging, cleanup, task dependencies, parallel scaling & notifications.

Here’s a HUGE list of pipeline tools but if you don’t fancy digging through that, then I like the look of Nextflow & Bpipe.

The former looks like it could be overkill, but the latter looks quite approachable & would probably be my starting point.

I like the idea of adding more logic to my run scripts, especially around restarts & checking that tasks ran properly. If that can be done with something that feels like my current Allrun script, then I’m interested 🤔
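To make "restarts & checking that tasks ran properly" a bit more concrete, here’s a rough sketch of the idea in plain Python. It isn’t Nextflow or Bpipe syntax, just the standard library. The `logs` folder, the `.done` marker files & the function names are all my own assumptions, and the three OpenFOAM commands in `STEPS` are only an example sequence, so swap in whatever your Allrun actually does.

```python
import subprocess
from pathlib import Path


def run_step(name, command, log_dir="logs"):
    """Run one step unless it already finished.

    Returns True if the step was skipped, False if it ran.
    Raises RuntimeError if the command exits with a non-zero code.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    done_marker = log_dir / f"{name}.done"  # hypothetical marker file
    log_path = log_dir / f"log.{name}"

    # Restart logic: a marker from an earlier run means we can skip this step.
    if done_marker.exists():
        print(f"[skip] {name}")
        return True

    # Capture stdout & stderr in one log file per step.
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)

    # Check that the task ran properly before moving on.
    if result.returncode != 0:
        raise RuntimeError(
            f"{name} failed with exit code {result.returncode}, see {log_path}"
        )

    done_marker.touch()
    print(f"[done] {name}")
    return False


# Example step list (an assumption, not a recommended workflow).
STEPS = [
    ("blockMesh", ["blockMesh"]),
    ("snappyHexMesh", ["snappyHexMesh", "-overwrite"]),
    ("simpleFoam", ["simpleFoam"]),
]


def run_all(steps=STEPS, log_dir="logs"):
    """Run the steps in order, stopping at the first failure."""
    for name, command in steps:
        run_step(name, command, log_dir)
```

Calling `run_all()` from the case directory runs the steps in order. If the solver falls over, fix the problem & call it again, and the steps that already finished get skipped. The dedicated pipeline tools go a lot further than this, but it’s roughly the behaviour I’d want from them.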

Versioning

It can be useful to think of your CFD runs as unique combinations of input geometry & OpenFOAM dictionaries – changes to either will change the result.

I can track changes to my OpenFOAM dictionaries using version control tools like Git or Fossil, much the same as developers track changes to their source code.

But tracking changes to geometry files is a little more difficult. Git can tell me who messed up my fvSchemes but it doesn’t really like working with .obj or .stl files, so I won’t know who tampered with them whilst I wasn’t looking.

Turns out, the machine learning folks have a similar problem, especially when it comes to tracking big images & videos. They haven’t quite solved it but there are a couple of interesting tools worth checking out.

DVC is conceptually similar to Git, but has some extra tricks for handling big files. It’s a bit of a one-stop-shop as it also has pipeline tools & experiment tracking – the docs are a good starting point.

There’s also an extension to Git which could help – Git LFS (Large File Storage), which claims to be able to handle objects up to a couple of GB.

Both options should be enough to track changes to your .obj & .stl files, but they would require changes to your day-to-day workflow. Your Quality Manager would probably like these tools though 🤓
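As far as I can tell, both tools lean on the same basic trick: keep the big file out of the normal Git history & track a small text pointer containing a hash of its contents instead. Here’s a low-tech sketch of that idea in plain Python, which is not how either tool is actually implemented. It hashes every .stl & .obj file in a folder & writes the results to a small JSON manifest that Git is perfectly happy to track. The function names & the manifest layout are my own assumptions.

```python
import hashlib
import json
from pathlib import Path

GEOMETRY_SUFFIXES = {".stl", ".obj"}


def file_hash(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so big geometry files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(geometry_dir):
    """Map each geometry file (path relative to geometry_dir) to its hash."""
    geometry_dir = Path(geometry_dir)
    return {
        p.relative_to(geometry_dir).as_posix(): file_hash(p)
        for p in sorted(geometry_dir.rglob("*"))
        if p.is_file() and p.suffix.lower() in GEOMETRY_SUFFIXES
    }


def changed_files(old, new):
    """Names that were added, removed or modified between two manifests."""
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )


def save_manifest(manifest, path):
    """Write the manifest as sorted, indented JSON so Git diffs stay readable."""
    Path(path).write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")


def load_manifest(path):
    """Read a manifest previously written by save_manifest."""
    return json.loads(Path(path).read_text())
```

Point `build_manifest` at wherever your geometry lives (constant/triSurface is the usual spot), commit the saved manifest alongside your dictionaries & a changed hash in the Git diff tells you that a file was altered. It won’t store the old versions for you, which is the part that DVC & Git LFS take care of.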

Experiment Tracking

Essentially these tools record the relationships between your runs, revealing the family tree of your project. Something to replace those ropey spreadsheets, you know the ones, with run titles like:

R20: As R10 with a Fancy New Bit

where:

R10: Combination of R02 & R08

and so on, getting steadily more confusing until we get right back to the baseline 🙈
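For what it’s worth, the family tree itself is easy enough to sketch in a few lines of Python. This isn’t taken from any of the tools below, it’s just a dictionary where each run lists its parents, plus a function that walks back to the baseline. R20 & R10 come from the example above, while R01 as the baseline, the parents of R02 & R08 and their notes are all made up for illustration.

```python
# Each run records its parents & a short note.
# R01, and the parents/notes for R02 & R08, are hypothetical.
RUNS = {
    "R01": {"parents": [], "note": "Baseline"},
    "R02": {"parents": ["R01"], "note": "Hypothetical change A"},
    "R08": {"parents": ["R01"], "note": "Hypothetical change B"},
    "R10": {"parents": ["R02", "R08"], "note": "Combination of R02 & R08"},
    "R20": {"parents": ["R10"], "note": "As R10 with a Fancy New Bit"},
}


def lineage(run_id, runs):
    """Return every ancestor of run_id, nearest first, each listed once."""
    ancestors = []
    queue = list(runs[run_id]["parents"])
    while queue:
        current = queue.pop(0)
        if current in ancestors:
            continue  # already visited via another branch
        ancestors.append(current)
        queue.extend(runs[current]["parents"])
    return ancestors


def describe(run_id, runs):
    """One line per run, from run_id back through its ancestors."""
    return [f"{r}: {runs[r]['note']}" for r in [run_id] + lineage(run_id, runs)]
```

With the data above, `lineage("R20", RUNS)` gives `["R10", "R02", "R08", "R01"]`, so you can see at a glance everything that fed into a run without scrolling back up the spreadsheet.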

I like the idea of something “better than a spreadsheet but more user friendly than a raw database” but most of the ML tools I found were either over-powered or enforced a project structure that didn’t really fit with OpenFOAM cases.

Honorable mentions go to DVC (again), Neptune.ai & MLflow – did I miss a better one?

Anything else?

Have you borrowed tools from any other domains for your CFD toolkit? I’ve seen FEA tools & 3D-printing tools that are immediately useful, but is there anything else (or any other domains) that we should be looking at?

Have you used any of these tools? How did you find them? Useful? Or perhaps a bit too much hassle for jobs which can be done with a spreadsheet & a bash script?

Always keen to hear your thoughts – drop me a note 🙏

Until next week, stay safe,