Hey there,
It’s Robin from CFD Engine & I’ve been playing with AWS instances again. Specifically, a super-simple, single-instance CFD shootout: AMD vs Intel vs ARM.
A naïve attempt at figuring out which instances are good for CFD & which ones we should avoid, based on a ~20-million-cell test case.
There were a couple of surprises in there but, spoiler alert, the ARM instances are still really good.
Background
When I started using AWS there was essentially just one good CFD instance, the cc2.8xlarge, boasting 16 cores & 60GB of RAM. Things have changed a bit since then.
So many instances are now capable of running CFD, and with new dedicated HPC instances in the works, it’s becoming increasingly difficult to make sense of it all.
And that’s just on AWS; Azure are playing the same game & their upcoming Milan-X-based instances look really interesting for CFD.
But let’s stick to AWS for the moment, specifically their compute-optimised family of instances.
The current generation of that one family includes eight distinct sub-groups. If you include ye olde c4 instances, there are 80+ different instances to choose from: far too many.
For this shootout I wanted to focus on the 3 different clans within the compute-optimised family: those featuring AMD EPYC chips (c5a), those featuring Intel Ice Lake chips (c6i) & those based on AWS’ own Graviton2 ARM chips (c6g). They can all run CFD, so what’s the difference? Is there a “best” choice? Perhaps there’s one to avoid?
Let’s see…
What I did
It was all pretty simple: I ran a 22.5-million-cell version of the windAroundBuildings tutorial for 500 iterations on 16-core, 32-core, 48-core & 64-core instances from each of the three chip clans & timed ’em (one exception: there’s currently no 64-core AMD instance).
Some details (if you’re interested):
- single instances only (no clustering, this time);
- all done in OpenFOAM v2106;
- the Intel & AMD instances used the pre-compiled Ubuntu binaries;
- the ARM code was compiled without any tweaks & used system libraries;
- the model was meshed in advance & decomposed (using the scotch method) to match the physical cores on the instance;
- timings are just for the simpleFoam solve phase, including the read-in & a single write at the end (no post-pro or additional functionObjects were used);
- prices were for on-demand Linux instances, in the EU-Ireland region, this week.
You can find the full data here, but I’ll boil it down to a few takeaways.
TL;DR
Here are the headlines:
- ARM instances (c6g) were cheapest across the board, often less than half the cost of the alternatives;
- ARM instances didn’t get meaningfully quicker beyond 32 cores;
- Ice Lake instances (c6i) were quickest across the board, especially above 32 cores;
- the 64-core instances weren’t meaningfully quicker than their 48-core siblings (both for Ice Lake & ARM);
- the EPYC instances (c5a) were the worst choice in all cases (slowest & most expensive, roughly 3x the cost of running on ARM).
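For the cost comparisons, each run’s price tag is just the on-demand hourly rate multiplied by the wall-clock solve time. A quick sketch of that arithmetic, with placeholder numbers rather than my measured values:

```bash
# Cost per solve = on-demand hourly price x wall-clock hours
# Both values below are placeholders, not the measured numbers
PRICE_PER_HOUR=1.00    # on-demand Linux rate, EU-Ireland (placeholder)
RUNTIME_SECONDS=3600   # simpleFoam wall-clock time (placeholder)
echo "scale=2; $PRICE_PER_HOUR * $RUNTIME_SECONDS / 3600" | bc
```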
This all boils down to memory bandwidth, chip architectures & virtualisation, all things I’m not qualified to talk about, but at a superficial level…
I was surprised that the EPYC chips weren’t better; there has been a lot of noise about them for CFD, but I think I’ll pass, at least until the next generation arrives.
I was surprised that the ARM chips saturated at (or before) 32 cores. I’d probably only use the larger ARM instances if I needed the RAM (note: they have less RAM than their x86 cousins).
I think the reason the Ice Lake instances looked so good at 48 cores is that they have 2 CPUs (rather than one big one) & therefore have more effective memory bandwidth. The 64-core variant didn’t really go any faster, so I probably wouldn’t use that one.
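If you want to check an instance’s socket layout for yourself, lscpu will report it (run this on the instance):

```bash
# Report sockets, cores-per-socket & CPU model for this machine
lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Model name'
```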
Conclusion
Whilst it’s pretty easy to pick holes in my “methodology”, I think it reflects what many people want to do with CFD on AWS: solve a decent-sized job in a reasonable time frame, without spending more than they need to.
So, if you want to solve on a single instance (on AWS) then:
- an ARM instance will give you the best bang for your buck, but avoid the big ones (> 32 cores) unless you need the RAM;
- a 64-core Ice Lake instance will solve this case the quickest, but you could save a little money by using the 48-core version & probably not notice the extra run time;
- give the EPYC chips a miss, for now.
This also has implications for clustering instances, but I’ll leave that for the next shootout.
What’s your experience of these instances? Do you have a go-to instance? Have you found the sweet spot for your jobs? Always keen to hear what you’ve discovered.
Until next week, stay safe,