With more than 100 options, how do you choose which EC2 instance type to run your OpenFOAM simulations on?
Which instances are good for CFD? How much does it cost to run CFD on AWS? And how does the performance compare to your local machines?
I’ll answer all of those questions & provide a few rules-of-thumb to help you choose an EC2 instance to run OpenFOAM.
I’ll also share the test cases so you can run them on your own machines. Benchmarking your current hardware should give you an idea of what to expect when you run your own simulations on AWS. I’ll also share all of the test data so you can build-your-own EC2 Rosetta Stone – translating EC2 performance into local performance & giving you an idea of cost.
What this article ISN’T about
This isn’t an article about how to extract the ultimate compute performance from AWS. There are OpenFOAM settings and compiler flags that would probably slash the run time. This article doesn’t dig into them & it definitely doesn’t get into anything exotic like GPU acceleration.
I don’t look at clustering instances either. You can get a surprising amount of CFD done with a single instance, partly because you can launch as many of them as you have simulations to run. You can build clusters of EC2 instances, but it’s perhaps not the place to start your AWS adventure?
Finally, I don’t go into detail on spot instances – the AWS market where unused compute capacity can be bought at a deep (∼70%) discount. Whilst awesome, the pricing is fluid, making it difficult to compare instances on a cost basis. Whichever instance you choose, know that you can drop your compute costs significantly by using spot instances. There are a few tricks to properly take advantage of spot instances, which deserve their own article.
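If you’re curious what that discount looks like right now, the AWS CLI can show you. A minimal sketch, assuming you have the CLI installed & your credentials configured:

```bash
# Show the current spot price for c5.18xlarge in each availability zone
aws ec2 describe-spot-price-history \
    --instance-types c5.18xlarge \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
    --query "SpotPriceHistory[*].[AvailabilityZone,SpotPrice]" \
    --output table
```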
With this article I wanted to give you an easy-to-replicate series of tests that would give a good indication of the out-of-the-box performance of both OpenFOAM and AWS.
Perhaps I’ll save the other stuff for future articles?
Introducing EC2 instance types
On the off chance that you’re not familiar with how AWS EC2 instances work – you choose from a menu of pre-configured instance types with fixed amounts of CPU, RAM & base storage. Each instance is priced by the hour & charged by the minute. You can’t mix-and-match the hardware options, so you need to choose the instance type that most closely fits your use case.
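For a flavour of how that menu works in practice, launching an instance is a single call to the AWS CLI. A minimal sketch – the AMI ID & key-pair name below are placeholders, not real values:

```bash
# Launch one c5.4xlarge – substitute a real Ubuntu AMI ID for your region
aws ec2 run-instances \
    --instance-type c5.4xlarge \
    --image-id ami-0123456789abcdef0 \
    --key-name my-key-pair \
    --count 1
```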
Sounds straightforward. Except that, depending on your AWS region, there are 100+ different instance types to choose from (and that’s just the current generation of instances) – ranging from a single vCPU with 500MB of RAM all the way up to 224 cores with 12TB (yes – terabytes) of RAM, all priced accordingly.
Amazon helps us out (a little) by breaking the list into “families” of instances that target different compute workloads – general purpose, compute optimised, memory optimised, accelerated computing & storage optimised.
Nice, but it still leaves us with 100+ instances to choose from. So, for this article I concentrated on the “compute optimised” & “general purpose” families. The other instance families would certainly run CFD, but you’re likely to be paying for hardware you’re not using – and 12TB RAM isn’t cheap.
That brings our list down to just 30+ instances. I needed to knock out a few more to make this workable.
The shortlist of instances to test
I wanted to run a range of model sizes (< 200K cells up to ∼25M cells), so I excluded the instances without enough RAM to hold the largest model – anything under 25GB, at roughly 1GB per million cells. I also excluded the instances with fewer than 8 physical cores (a slightly arbitrary choice).
I finished off by excluding instances with the “d” name suffix – `c5d.9xlarge`, for example. They have the same compute performance as their non-suffix siblings, but faster storage & an associated higher price tag.
This left me with the following instance shortlist – 3 x General Purpose & 3 x Compute Optimised (priced in the US Virginia region):
| Name | vCPU | ECU | Memory | $ Per Hour | Type |
|---|---|---|---|---|---|
| m4.10xlarge | 40 | 125 | 160 GB | $2.000 | General Purpose |
| m5.12xlarge | 48 | 173 | 192 GB | $2.304 | General Purpose |
| m5.24xlarge | 96 | 345 | 384 GB | $4.608 | General Purpose |
| c5.4xlarge | 16 | 68 | 32 GB | $0.680 | Compute Optimised |
| c5.9xlarge | 36 | 141 | 72 GB | $1.530 | Compute Optimised |
| c5.18xlarge | 72 | 281 | 144 GB | $3.060 | Compute Optimised |
You might be wondering what the vCPU and ECU columns represent – I’ll get to that – bear with me.
The Test Cases
So, I’ve got my shortlist of instances to test – now what?
Rather than just run a benchmarking code, I wanted to run something that resembled a production CFD workload. A real OpenFOAM case that (if you squinted a bit) looked like one of your simulations. I also wanted something that you could run on your own machine.
I settled on the windAroundBuildings tutorial from OpenFOAM v6 – a model of a simplified cityscape, meshed with `snappyHexMesh` & solved with `simpleFoam` for 400 iterations.
The standard version of the tutorial is < 200K cells – great for a quick tutorial, not so great for benchmarking the performance of the instances. As a simple workaround I stepped through several `blockMesh` resolutions to create the following test suite of models (the relevant `blockMeshDict` entry is sketched after the table).
| Test Case | Model Size | blockMesh Resolution |
|---|---|---|
| STD | ∼185K Cells | 25 - 50 - 10 |
| x2 | ∼1.1 Million Cells | 50 - 40 - 20 |
| x3 | ∼3.3 Million Cells | 75 - 60 - 30 |
| x4 | ∼7.0 Million Cells | 100 - 80 - 40 |
| x5 | ∼13.3 Million Cells | 125 - 100 - 50 |
| x6 | ∼22.5 Million Cells | 150 - 120 - 60 |
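For reference, the resolution lives in the blocks entry of `system/blockMeshDict`. This is a sketch of the relevant line only – the vertex numbering assumes the standard single-block layout & may differ slightly from the actual tutorial file:

```
blocks
(
    // STD case – 25 - 50 - 10; the x5 case would read (125 100 50)
    hex (0 1 2 3 4 5 6 7) (25 50 10) simpleGrading (1 1 1)
);
```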
I could’ve gone bigger and I realise that many of you run models much larger than this. But for many users, < 25 million cells is representative of where they are with their day-to-day CFD.
So that’s the test matrix – 6 instances × 6 model sizes – 36 data points.
How were the tests done?
The testing method was pretty simple (a scripted sketch of it follows the list):
- Start the required instance type, running the official Ubuntu 16.04 machine image
- Install the Foundation release of OpenFOAM v6
- Download the test cases
- Step through the 6 test cases, measuring the time taken to complete each version (mesh + 400 iterations)
- Shut down the instance
- Repeat for next instance type
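If you’d like to script something similar, here’s a rough sketch of the per-instance run. The install steps follow the openfoam.org Ubuntu instructions; the repository URL is a placeholder & I’m assuming each case directory keeps a tutorial-style Allrun script:

```bash
#!/bin/bash
# Install the Foundation release of OpenFOAM 6 (per openfoam.org)
sudo sh -c "wget -O - http://dl.openfoam.org/gpg.key | apt-key add -"
sudo add-apt-repository http://dl.openfoam.org/ubuntu
sudo apt-get update
sudo apt-get -y install openfoam6
source /opt/openfoam6/etc/bashrc

# Fetch the test cases (placeholder URL)
git clone https://github.com/example/ec2-openfoam-tests.git
cd ec2-openfoam-tests

# Time each case end-to-end (mesh + 400 iterations)
for case in STD x2 x3 x4 x5 x6; do
    start=$SECONDS
    ( cd "$case" && ./Allrun )
    echo "$case: $(( SECONDS - start )) s" | tee -a results.txt
done
```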
Apart from changing the `blockMesh` resolution, the only differences between the test cases & the tutorial that ships with OpenFOAM were (both changes are sketched below):
- It was meshed & solved in parallel using the maximum number of physical cores on the instance, so a `decomposeParDict` was added:
  - `numberOfSubdomains` was set to the number of physical cores on the instance (see note below)
  - `method` was set to `scotch` to avoid having to specify a decomposition layout for each different core count
- The write parameters in `controlDict` were changed to reduce the amount of storage required:
  - `writeFormat` set to `binary`
  - `purgeWrite` set to `1`
The test cases are available here if you’d like to run them on your local machine.
vCPUs vs ECUs vs Cores
A quick sidebar – AWS lists the number of vCPUs and ECUs in their descriptions of instance types, but what are they?
ECUs (EC2 Compute Units) are a benchmark figure intended to score instances so that different processor types can be ranked against each other.
vCPUs are the virtual CPUs on the instance. They represent the number of threads available, NOT the physical cores. On our instance types, the number of cores is half the vCPU count – `c5.18xlarge`, for example, has 72 vCPUs and 36 physical cores.
As mentioned, all of these test cases were run on the maximum number of physical cores of each instance, NOT the number of vCPUs.
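If you want to replicate that on your own machine, here’s a sketch of one way to pin the run to physical cores on Linux (set `numberOfSubdomains` to the same value before decomposing):

```bash
# Count physical cores (not hyper-threads)
CORES=$(lscpu -p=CORE | grep -v '^#' | sort -u | wc -l)

decomposePar                                 # splits the case per decomposeParDict
mpirun -np "$CORES" simpleFoam -parallel     # one rank per physical core
```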
“But wouldn’t running on the threads speed things up?”
I thought you might ask that so I ran a whole host of tests and the short answer is no.
The headlines of those VCPU/scaling tests were:
- Running on the maximum number of physical cores produced the shortest run time, for all of the tested model sizes;
- Using the available threads is slower (although not by much);
- Run time doesn’t scale linearly with core count (likely memory-bandwidth limited). For example: 18 cores is less than twice as quick as 6 cores;
An example of the scaling is shown below, from a `c5.9xlarge` instance with 36 vCPUs & 18 physical cores. You can slice and dice the data from these tests for yourself – available here.
Which instances stood out?
I’m not going to go through every test case on every machine. Instead, I’ll draw out some of the key takeaways along with a few rules-of-thumb.
If you’d like, you can play with &/or download all the data for these tests via Airtable.
As an example, the chart below shows how the tested instances ranked against each other for the 13 million cell case – cost vs. time taken.
Cheapest Instance Type – c5.4xlarge
Across all tested model sizes, the cheapest solution came from the `c5.4xlarge` instance (8 cores & 32GB RAM). It’s also the slowest. But if time isn’t an issue (& with cloud CFD it may not be) then this instance was the cheapest way to solve the test cases.
Fastest Instance Type – m5.24xlarge
On all but the smallest test case, the fastest solution came from the `m5.24xlarge` instance (48 cores & 384GB RAM). It was roughly 3.5x quicker than the cheapest solution at roughly twice the overall cost. A reasonable ratio, but not the best on offer.
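As a sanity check on those numbers, using the on-demand prices from the table: $4.608 / $0.680 ≈ 6.8x the hourly price; divide by the ∼3.5x speed-up and the cost-to-solve ratio comes out at ≈ 1.9x – hence “roughly twice the cost”.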
Best Value – c5.18xlarge
If you’re sensitive to both cost and speed then the `c5.18xlarge` (36 cores & 144GB RAM) is an interesting option. On all but the smallest cases it was roughly 3x faster than the cheapest option, but roughly 60% more expensive.
How to select an EC2 instance for CFD
If the instances above don’t seem like they’d be a fit for your workload then try these rules-of-thumb to help you select your own.
Filter the instance list by RAM
If your simulation won’t fit in the available RAM then things won’t be at all happy – start with this.
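Recent versions of the AWS CLI can do this filtering for you. A sketch listing current-generation instance types with at least 64GB of RAM:

```bash
aws ec2 describe-instance-types \
    --filters "Name=current-generation,Values=true" \
    --query 'InstanceTypes[?MemoryInfo.SizeInMiB >= `65536`].[InstanceType, VCpuInfo.DefaultCores, MemoryInfo.SizeInMiB]' \
    --output table
```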
Choose from “C” & “M” families
Only look outside the Compute Optimised (“C”) & General Purpose (“M”) families if you can’t get the RAM you need. The other instance families are much more expensive, for the same crunch power, thanks to their added RAM or storage.
Prefer C-family over M-family…
…unless you need results ASAP (or you need the RAM). But, if you do need the RAM, then you’re likely running a MUCH larger case than I’ve tested here.
Ignore the “d” suffix instances
These instances have the same crunch power as their non-suffix siblings but are much more expensive thanks to their large solid-state drives. It’s unlikely that your cases read & write enough data to justify the added expense.
Over to you
The datasets are available for you to play with (if you’re interested) – EC2 instance data plus the VCPU/Scaling results.
You can also grab the test cases from GitHub & run them on your local machine. This should give you an idea of how your machines compare to AWS in terms of speed. From there you should be able to estimate how quickly (& hence at what cost) your own OpenFOAM cases would run on EC2.
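As a worked example (with hypothetical numbers): if a case takes 10 hours on your workstation, and your benchmark runs show that workstation at roughly half the speed of a c5.18xlarge, you’d expect ∼5 hours on EC2 – around 5 × $3.06 ≈ $15 on-demand, and substantially less on spot.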
It wouldn’t surprise me if your local machine is faster, but that’s not the only draw for running your CFD in the cloud.
If you’d like to find out more about how to get the most out of AWS for OpenFOAM, then check out my online course. It walks you through building your own OpenFOAM cloud on AWS, including how to take advantage of those deeply-discounted spot instances.