Hey there,
It’s Robin from CFD Engine. I’ve been testing OpenFOAM on ARM on AWS this week, and I’m left wondering why I’m so late to the party.
I’d read some good things about running OpenFOAM on the AWS ARM instances. Someone who knows about these things told me that I should check ’em out. I’d even written about them – but I still hadn’t gotten around to testing them out 😳
I needed to build a new AMI anyway, so I thought I’d take an ARM instance for a spin while I was there – now I’m wishing I’d done this six months ago.
What are ARM processors?
You’ll be familiar with Intel & AMD (x86) processors, but you may never have heard of ARM – perhaps because they don’t actually make anything.
Instead, ARM licence processor designs – a recipe book for building your own chips – & those chips are everywhere. They’re in your phone, your tablet, your TV, maybe even your fridge – the kind of places where an Intel or AMD chip might not be a good fit.
They’re also on AWS. Amazon created their own server-grade ARM CPUs that form the basis for several new instance families. Some of these ARM instances are interesting for CFD – a good number of cores, decent memory bandwidth & competitively priced.
So, why hadn’t I tested them?
What’s the hold up?
Honestly, it sounded like a faff. You need to compile OpenFOAM specifically for ARM & I’m not into wrestling with compilers – I’m more of a binaries guy.
Secondly, for some reason I had the impression that the ARM instances were slower than the usual instances, but priced to make them competitive.
And whilst cheaper is usually better, I’m relatively happy with my turnaround times.
As it turns out, I’d missed something rather important.
Cores vs VCPUs
AWS offers a huge menu of different instance types – different combinations of CPU, RAM, storage & networking to meet different needs. You pick the one closest to what you need, rather than specifying all of the individual ingredients.
One of the preset items is the Virtual CPU count, i.e. the number of hardware threads an instance’s processor has. For CFD we can chop the VCPU count in half, because we’re interested in the number of physical cores available, not threads (don’t run on all the threads – it doesn’t speed things up).
Then the ARM instances turned up & showed why it pays to read the small print.
ARM instances don’t have threads, each ARM VCPU is a physical core.
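To make that concrete, here’s the vCPU arithmetic as a tiny sketch (the instance specs are AWS’s published figures for c5.9xlarge & c6g.16xlarge):

```python
def physical_cores(vcpus: int, threads_per_core: int) -> int:
    """Convert an instance's advertised vCPU count into physical cores."""
    return vcpus // threads_per_core

# Intel c5.9xlarge: 36 vCPUs, hyper-threaded (2 threads per core)
intel_cores = physical_cores(36, 2)   # 18 usable cores for CFD

# ARM c6g.16xlarge: 64 vCPUs, no SMT (1 thread per core)
arm_cores = physical_cores(64, 1)     # all 64 vCPUs are real cores
```

Same “vCPU” label on the menu, very different number of cores you can actually run on.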
So hang on a minute – they’re cheaper AND they have way more cores – this is starting to look more interesting than I first thought.
What I did
The original benchmark blog post looked at running OpenFOAM on large instances (64 ARM cores vs 36 or 48 Intel cores) & found ARM to be slower.
Those instances have a lot of RAM & RAM is expensive on AWS. Plus OpenFOAM doesn’t scale linearly on single instances due to memory bandwidth issues. As such, those large instances are expensive & don’t go that much quicker.
How would the comparison stack up on the smaller, more cost-efficient instances?
To find out I…
- Booted a mid-sized AWS ARM instance (c6g.4xlarge) with Ubuntu 20.04;
- Compiled OpenFOAM v2012 on it;
- Ran a few test cases (~7M, ~13M & ~22M cells);
- Repeated those tests across a few other ARM & Intel instances.
What happened
Compilation
This bit was very uneventful, no changes were needed to compile OpenFOAM v2012 for ARM 👍
I followed the steps in the build guide, used system libraries for openmpi etc. & it compiled first time, taking about 40 minutes on 16 cores.
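For the curious, the build boiled down to something like the sketch below. Treat it as a rough outline rather than a recipe – the package list & tarball URL are what worked for me on Ubuntu 20.04 at the time, so check the current OpenFOAM.com build guide before copying:

```shell
# Prerequisites from the Ubuntu repos (system openmpi, etc.)
sudo apt-get update
sudo apt-get install -y build-essential flex cmake zlib1g-dev \
    libopenmpi-dev openmpi-bin

# Fetch & unpack the v2012 source
wget https://dl.openfoam.com/source/v2012/OpenFOAM-v2012.tgz
tar -xzf OpenFOAM-v2012.tgz

# Load the OpenFOAM environment & build on all 16 cores
source OpenFOAM-v2012/etc/bashrc
cd OpenFOAM-v2012
./Allwmake -j 16
```

No ARM-specific flags, no patches – the stock build scripts just worked.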
Good to go – first hurdle cleared.
Benchmark Tests
The tests were a bit more surprising – the ARM instances smashed the “equivalent” Intel instances.
For example: 16 ARM cores vs 18 Intel cores
- with ~7M & ~13M cell models ARM was ~30% faster & ~75% cheaper
- with ~22M cells it was a little closer, with ARM ~15% faster & ~70% cheaper
On a cost basis, I could run the ~22M cell test case on 64 ARM cores for less than the cost of 18 Intel cores AND have it done in almost half the time.
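To put rough numbers on that claim, here’s the cost arithmetic. The hourly rates are my approximate on-demand prices (us-east-1, at the time of writing) & the ARM runtime is normalised to the Intel run per the “almost half the time” result above – illustrative figures, not gospel:

```python
def run_cost(price_per_hour: float, runtime_hours: float) -> float:
    """Cost of one run: hourly instance price times wall-clock runtime."""
    return price_per_hour * runtime_hours

# ~22M cell case, Intel runtime normalised to 1 hour
intel_18_core = run_cost(1.53, 1.0)    # c5.9xlarge, 18 physical cores
arm_64_core = run_cost(2.176, 0.55)    # c6g.16xlarge, done in ~55% of the time

print(f"Intel 18-core: ${intel_18_core:.2f}  ARM 64-core: ${arm_64_core:.2f}")
```

Despite the higher hourly rate, the 64-core ARM run finishes so much sooner that it comes out cheaper overall.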
Much more compelling than it first seemed.
Cheaper-per-hour AND faster than their Intel counterparts, making them much cheaper overall – why didn’t I try this earlier?
I still need to figure out how spot pricing and spot availability factor into this equation, but there’s definitely something interesting here.
This was all on single instances but it has some implications for running on clustered instances too – check out Simon’s blog post for some more info on clustering ARM instances.
I need to check out the AMD EPYC-based instances too.
Key takeaways
I had (not unreasonably) assumed that the ARM instances on AWS were slower than their Intel counterparts, but priced such that they were cost-competitive.
It seems that might just be for the big instances. The smaller instances can be faster and much cheaper than the Intel alternatives.
The key takeaways here are:
- Read the small print – ARM VCPUs are PHYSICAL CORES
- Test things yourself
I’m keen to hear if you’ve tested these instances and whether you saw similar results on your models.
Perhaps there’s something else CFD-related that you’ve tried out recently & wished you’d got to it sooner?
Drop me a note, it’s always good to hear about little CFD wins.
Until next week, stay safe,