Issue 074 – November 27, 2021

AWS CFD Shootout: Single Instances

Hey there,

It’s Robin from CFD Engine & I’ve been playing with AWS instances again. Specifically, a super-simple, single-instance CFD shootout – AMD vs Intel vs ARM.

A naïve attempt at figuring out which instances are good for CFD & which ones we should avoid, based on a ~20 million cell test case.

There were a couple of surprises in there but, spoiler alert, the ARM instances are still really good 👍

Background

When I started using AWS there was essentially just one good CFD instance, the cc2.8xlarge boasting 16 cores & 60GB RAM – things have changed a bit since then.

So many instances are now capable of running CFD, and with new dedicated HPC-instances in the works, it’s becoming increasingly difficult to make sense of it all 🤯

And that’s just on AWS – Azure are playing the same game & their upcoming Milan-X-based instances look really interesting for CFD.

But let’s stick to AWS for the moment, specifically their compute-optimised family of instances.

The current generation of that one family includes eight distinct sub-groups. If you include ye olde c4 instances, there are 80+ different instances to choose from – far too many.

For this shootout I wanted to focus on the 3 different clans within the compute-optimised family. Those featuring AMD EPYC chips (c5a), those featuring Intel Ice Lake chips (c6i) & those based on AWS’ own Graviton2 ARM chips (c6g). They can all run CFD, so what’s the difference? Is there a “best” choice? Perhaps there’s one to avoid?

Let’s see…

What I did

It was all pretty simple – I ran a 22.5 million cell version of the windAroundBuildings tutorial for 500 iterations on 16-core, 32-core, 48-core & 64-core instances from each of the three chip clans & timed ’em (one exception – there’s currently no 64-core AMD instance).

Some details (if you’re interested):

  • single instances only (no clustering, this time);
  • all done in OpenFOAM v2106;
  • the Intel & AMD instances used the pre-compiled Ubuntu binaries;
  • the ARM code was compiled without any tweaks & used system libraries;
  • the model was meshed in advance & decomposed (using the scotch method) to match the physical cores on the instance;
  • timings are just for the simpleFoam solve phase, including the read-in & a single write at the end (no post-pro or additional functionObjects were used);
  • prices were for on-demand Linux instances, in the EU-Ireland region, this week;
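For the curious, here’s roughly how the timing for each run could be pulled out afterwards – a minimal sketch, assuming a standard OpenFOAM solver log (the log-line format is real, but the numbers below are made-up placeholders, not my measured data):

```shell
#!/bin/sh
# Fake a final solver-log line so the sketch is self-contained –
# simpleFoam writes one of these every iteration (numbers are invented)
printf 'ExecutionTime = 1234.5 s  ClockTime = 1236 s\n' > log.simpleFoam

# The last ExecutionTime in the log is the total solver wall time,
# i.e. the figure I compared across instances
awk '/^ExecutionTime/ {t=$3} END {print t " s"}' log.simpleFoam
# prints: 1234.5 s
```

In practice you’d point the awk one-liner at the real `log.simpleFoam` from each run, rather than the faked log above.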

You can find the full data here, but I’ll boil it down to a few takeaways.

TL;DR

Here are the headlines:

  • ARM instances (c6g) were cheapest across the board, often less than half the cost of the alternatives;
  • ARM instances didn’t get meaningfully quicker beyond 32 cores;
  • Ice Lake instances (c6i) were quickest across the board, especially above 32 cores;
  • The 64-core instances weren’t meaningfully quicker than their 48-core siblings (both for Ice Lake & ARM);
  • The EPYC instances (c5a) were the worst choice in all cases (slowest & most expensive, roughly 3x the cost of running on ARM);
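The cost comparison behind that last bullet is just runtime × on-demand price. Here’s the arithmetic as a one-liner – the runtimes & prices below are illustrative placeholders chosen to show the shape of the calculation, not the measured figures (those are in the linked data):

```shell
#!/bin/sh
# cost per solve = runtime (hr) x on-demand price ($/hr)
# runtimes (s) & prices ($/hr) below are placeholders, not real data
awk 'BEGIN {
  arm  = (1500/3600) * 0.70   # hypothetical ARM run
  epyc = (2100/3600) * 1.50   # hypothetical EPYC run
  printf "EPYC/ARM cost ratio = %.1fx\n", epyc/arm
}'
# prints: EPYC/ARM cost ratio = 3.0x
```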

This all boils down to memory bandwidth, chip architectures & virtualisation, all things I’m not qualified to talk about, but at a superficial level…

I was surprised that the EPYC chips weren’t better, there has been a lot of noise about them for CFD, but I think I’ll pass, at least until the next generation arrives.

I was surprised that the ARM chips saturated at (or before) 32 cores. I’d probably only use the larger ARM instances if I needed the RAM (note: they have less RAM than their x86 cousins).

I think the reason the Ice Lake instances looked so good at 48 cores is that they have 2 CPUs (rather than one big one) & therefore have more effective memory bandwidth. The 64-core variant didn’t really go any faster, so I probably wouldn’t use that one.

Conclusion

Whilst it’s pretty easy to pick holes in my “methodology”, I think it reflects what many people want to do with CFD on AWS – solve a decent sized job in a reasonable time frame, without spending more than they need to.

So, if you want to solve on a single instance (on AWS) then:

  • an ARM instance will give you the best bang for your buck, but avoid the big ones (> 32 cores) unless you need the RAM;
  • a 64-core Ice Lake instance will solve this case the quickest, but you could save a little money by using the 48-core version & probably not notice the extra run time;
  • give the EPYC chips a miss, for now.

This also has implications for clustering instances, but I’ll leave that for the next shootout.

What’s your experience of these instances? Do you have a go-to instance? Have you found the sweet spot for your jobs? Always keen to hear what you’ve discovered.

Until next week, stay safe,

Robin