With more than 100 options, how do you choose which EC2 instance type to run your OpenFOAM simulations on?
Which instances are good for CFD? How much does it cost to run CFD on AWS? And how does the performance compare to your local machines?
I’ll answer all of those questions & provide a few rules-of-thumb to help you choose an EC2 instance to run OpenFOAM.
I’ll also share the test cases so you can run them on your own machines. Benchmarking your current hardware should give you an idea of what to expect when you run your own simulations on AWS. I’ll also share all of the test data so you can build-your-own EC2 Rosetta Stone – translating EC2 performance into local performance & giving you an idea of cost.
What this article ISN’T about
This isn’t an article about how to extract the ultimate compute performance from AWS. There are OpenFOAM settings and compiler flags that would probably slash the run time. This article doesn’t dig into them & it definitely doesn’t get into anything exotic like GPU acceleration.
I don’t look at clustering instances either. You can get a surprising amount of CFD done with a single instance, partly because you can launch as many of them as you have simulations to run. You can build clusters of EC2 instances, but it’s perhaps not the place to start your AWS adventure?
Finally, I don’t go into detail on spot instances – the AWS market where unused compute capacity can be bought at a deep (∼70%) discount. Whilst awesome, the pricing is fluid, making it difficult to compare instances on a cost basis. Whichever instance you choose, know that you can drop your compute costs significantly by using spot instances. There are a few tricks to properly take advantage of spot instances, which deserve their own article.
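If you’re curious what that discount looks like right now, the AWS CLI can show you. A minimal sketch, assuming you have the CLI installed & your credentials configured:

```bash
# Show the current spot price for c5.18xlarge in each availability zone
aws ec2 describe-spot-price-history \
    --instance-types c5.18xlarge \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
    --query "SpotPriceHistory[*].[AvailabilityZone,SpotPrice]" \
    --output table
```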
With this article I wanted to give you an easy-to-replicate series of tests that would give a good indication of the out-of-the-box performance of both OpenFOAM and AWS.
Perhaps I’ll save the other stuff for future articles?
Introducing EC2 instance types
On the off chance that you’re not familiar with how AWS EC2 instances work – you choose from a menu of pre-configured instance types with fixed amounts of CPU, RAM & base storage. Each instance is priced by the hour & charged by the minute. You can’t mix-and-match the hardware options, so you need to choose the instance type that most closely fits your use case.
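For a flavour of how that menu works in practice, launching an instance is a single call to the AWS CLI. A minimal sketch – the AMI ID & key-pair name below are placeholders, not real values:

```bash
# Launch one c5.4xlarge – substitute a real Ubuntu AMI ID for your region
aws ec2 run-instances \
    --instance-type c5.4xlarge \
    --image-id ami-0123456789abcdef0 \
    --key-name my-key-pair \
    --count 1
```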
Sounds straightforward. Except that, depending on your AWS region, there are 100+ different instance types to choose from (and that’s just the current generation of instances) – ranging from a single vCPU with 500MB of RAM all the way up to 224 cores with 12TB (yes – terabytes) of RAM, all priced accordingly.
Amazon helps us out (a little) by breaking the list into “families” of instances that target different compute workloads – general purpose, compute optimised, memory optimised, accelerated computing & storage optimised.
Nice, but it still leaves us with 100+ instances to choose from. So, for this article I concentrated on the “compute optimised” & “general purpose” families. The other instance families would certainly run CFD, but you’re likely to be paying for hardware you’re not using – and 12TB RAM isn’t cheap.
That brings our list down to just 30+ instances. I needed to knock out a few more to make this workable.
The shortlist of instances to test
I wanted to run a range of model sizes (< 200K cells up to ∼25M cells), so I excluded the instances without enough RAM to hold the largest model – anything under 25GB, at roughly 1GB per million cells. I also excluded the instances with fewer than 8 physical cores (a slightly arbitrary choice).
I finished off by excluding instances with the “d” name suffix – `c5d.9xlarge`, for example. They have the same compute performance as their non-suffix siblings, but faster storage & an associated higher price tag.
This left me with the following instance shortlist – 3 x General Purpose & 3 x Compute Optimised (priced in the US Virginia region):
| Name | vCPU | ECU | Memory | $ Per Hour | Type |
|---|---|---|---|---|---|
| m4.10xlarge | 40 | 125 | 160 GB | $2.000 | General Purpose |
| m5.12xlarge | 48 | 173 | 192 GB | $2.304 | General Purpose |
| m5.24xlarge | 96 | 345 | 384 GB | $4.608 | General Purpose |
| c5.4xlarge | 16 | 68 | 32 GB | $0.680 | Compute Optimised |
| c5.9xlarge | 36 | 141 | 72 GB | $1.530 | Compute Optimised |
| c5.18xlarge | 72 | 281 | 144 GB | $3.060 | Compute Optimised |
You might be wondering what the vCPU and ECU columns represent – I’ll get to that – bear with me.
The Test Cases
So, I’ve got my shortlist of instances to test – now what?
Rather than just run a benchmarking code, I wanted to run something that resembled a production CFD workload. A real OpenFOAM case that (if you squinted a bit) looked like one of your simulations. I also wanted something that you could run on your own machine.
I settled on the windAroundBuildings tutorial from OpenFOAM v6 – a model of a simplified cityscape, meshed with `snappyHexMesh` & solved with `simpleFoam` for 400 iterations.
The standard version of the tutorial is < 200K cells – great for a quick tutorial, not so great for benchmarking the performance of the instances. As a simple workaround I stepped through several `blockMesh` resolutions to create the following test suite of models (the relevant `blockMeshDict` entry is sketched after the table).
| Test Case | Model Size | blockMesh Resolution |
|---|---|---|
| STD | ∼185K Cells | 25 - 50 - 10 |
| x2 | ∼1.1 Million Cells | 50 - 40 - 20 |
| x3 | ∼3.3 Million Cells | 75 - 60 - 30 |
| x4 | ∼7.0 Million Cells | 100 - 80 - 40 |
| x5 | ∼13.3 Million Cells | 125 - 100 - 50 |
| x6 | ∼22.5 Million Cells | 150 - 120 - 60 |
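For reference, the resolution lives in the blocks entry of `system/blockMeshDict`. This is a sketch of the relevant line only – the vertex numbering assumes the standard single-block layout & may differ slightly from the actual tutorial file:

```
blocks
(
    // STD case – 25 - 50 - 10; the x5 case would read (125 100 50)
    hex (0 1 2 3 4 5 6 7) (25 50 10) simpleGrading (1 1 1)
);
```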
I could’ve gone bigger and I realise that many of you run models much larger than this. But for many users, < 25 million cells is representative of where they are with their day-to-day CFD.
So that’s the test matrix – 6 instances × 6 model sizes – 36 data points.
How were the tests done?
The testing method was pretty simple (a scripted sketch of it follows the list):
- Start the required instance type, running the official Ubuntu 16.04 machine image
- Install the Foundation release of OpenFOAM v6
- Download the test cases
- Step through the 6 test cases, measuring the time taken to complete each version (mesh + 400 iterations)
- Shut down the instance
- Repeat for next instance type
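If you’d like to script something similar, here’s a rough sketch of the per-instance run. The install steps follow the openfoam.org Ubuntu instructions; the repository URL is a placeholder & I’m assuming each case directory keeps a tutorial-style Allrun script:

```bash
#!/bin/bash
# Install the Foundation release of OpenFOAM 6 (per openfoam.org)
sudo sh -c "wget -O - http://dl.openfoam.org/gpg.key | apt-key add -"
sudo add-apt-repository http://dl.openfoam.org/ubuntu
sudo apt-get update
sudo apt-get -y install openfoam6
source /opt/openfoam6/etc/bashrc

# Fetch the test cases (placeholder URL)
git clone https://github.com/example/ec2-openfoam-tests.git
cd ec2-openfoam-tests

# Time each case end-to-end (mesh + 400 iterations)
for case in STD x2 x3 x4 x5 x6; do
    start=$SECONDS
    ( cd "$case" && ./Allrun )
    echo "$case: $(( SECONDS - start )) s" | tee -a results.txt
done
```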
Apart from changing the `blockMesh` resolution, the only differences between the test cases & the tutorial that ships with OpenFOAM were (both changes are sketched below):
- It was meshed & solved in parallel using the maximum number of physical cores on the instance, so a `decomposeParDict` was added:
  - `numberOfSubdomains` was set to the number of physical cores on the instance (see note below)
  - `method` was set to `scotch` to avoid having to specify a decomposition layout for each different core count
- The write parameters in `controlDict` were changed to reduce the amount of storage required:
  - `writeFormat` set to `binary`
  - `purgeWrite` set to `1`
The test cases are available here if you’d like to run them on your local machine.
vCPUs vs ECUs vs Cores
A quick sidebar – AWS lists the number of vCPUs and ECUs in their descriptions of instance types, but what are they?
ECUs (EC2 Compute Units) are a benchmark figure intended to score instances so that different processor types can be ranked against each other.
vCPUs are the virtual CPUs on the instance. They represent the number of threads available, NOT the physical cores. On our instance types, the number of cores is half the vCPU count – `c5.18xlarge`, for example, has 72 vCPUs and 36 physical cores.
As mentioned, all of these test cases were run on the maximum number of physical cores of each instance, NOT the number of vCPUs.
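If you want to replicate that on your own machine, here’s a sketch of one way to pin the run to physical cores on Linux (set `numberOfSubdomains` to the same value before decomposing):

```bash
# Count physical cores (not hyper-threads)
CORES=$(lscpu -p=CORE | grep -v '^#' | sort -u | wc -l)

decomposePar                                 # splits the case per decomposeParDict
mpirun -np "$CORES" simpleFoam -parallel     # one rank per physical core
```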
“But wouldn’t running on the threads speed things up?”
I thought you might ask that so I ran a whole host of tests and the short answer is no.
The headlines of those VCPU/scaling tests were:
- Running on the maximum number of physical cores produced the shortest run time, for all of the tested model sizes;
- Using the available threads is slower (although not by much);
- Run time doesn’t scale linearly with core count (likely memory-bandwidth limited). For example: 18 cores is less than twice as quick as 6 cores;
An example of the scaling is shown below, from a `c5.9xlarge` instance with 36 vCPUs & 18 physical cores. You can slice and dice the data from these tests for yourself – available here.
Which instances stood out?
I’m not going to go through every test case on every machine. Instead, I’ll draw out some of the key takeaways along with a few rules-of-thumb.
If you’d like, you can play with &/or download all the data for these tests via Airtable.
As an example, the chart below shows how the tested instances ranked against each other for the 13 million cell case – cost vs. time taken.
Cheapest Instance Type – c5.4xlarge
Across all tested model sizes, the cheapest solution came from the `c5.4xlarge` instance (8 cores & 32GB RAM). It’s also the slowest. But if time isn’t an issue (& with cloud CFD it may not be) then this instance was the cheapest way to solve the test cases.
Fastest Instance Type – m5.24xlarge
On all but the smallest test case, the fastest solution came from the `m5.24xlarge` instance (48 cores & 384GB RAM). It was roughly 3.5x quicker than the cheapest solution at roughly twice the overall cost. A reasonable ratio, but not the best on offer.
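As a sanity check on those numbers, using the on-demand prices from the table: $4.608 / $0.680 ≈ 6.8x the hourly price; divide by the ∼3.5x speed-up and the cost-to-solve ratio comes out at ≈ 1.9x – hence “roughly twice the cost”.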
Best Value – c5.18xlarge
If you’re sensitive to both cost and speed then the `c5.18xlarge` (36 cores & 144GB RAM) is an interesting option. On all but the smallest cases it was roughly 3x faster than the cheapest option, but roughly 60% more expensive.
How to select an EC2 instance for CFD
If the instances above don’t seem like they’d be a fit for your workload then try these rules-of-thumb to help you select your own.
Filter the instance list by RAM
If your simulation won’t fit in the available RAM then things won’t be at all happy – start with this.
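Recent versions of the AWS CLI can do this filtering for you. A sketch listing current-generation instance types with at least 64GB of RAM:

```bash
aws ec2 describe-instance-types \
    --filters "Name=current-generation,Values=true" \
    --query 'InstanceTypes[?MemoryInfo.SizeInMiB >= `65536`].[InstanceType, VCpuInfo.DefaultCores, MemoryInfo.SizeInMiB]' \
    --output table
```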
Choose from “C” & “M” families
Only look outside the Compute Optimised (“C”) & General Purpose (“M”) families if you can’t get the RAM you need. The other instance families are much more expensive, for the same crunch power, thanks to their added RAM or storage.
Prefer C-family over M-family…
…unless you need results ASAP (or you need the RAM). But, if you do need the RAM, then you’re likely running a MUCH larger case than I’ve tested here.
Ignore the “d” suffix instances
These instances have the same crunch power as their non-suffix siblings but are much more expensive thanks to their large solid-state drives. It’s unlikely that your cases read & write enough data to justify the added expense.
Over to you
The datasets are available for you to play with (if you’re interested) – EC2 instance data plus the VCPU/Scaling results.
You can also grab the test cases from GitHub & run them on your local machine. This should give you an idea of how your machines compare to AWS in terms of speed. From there you should be able to estimate how quickly (& hence at what cost) your own OpenFOAM cases would run on EC2.
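As a worked example (with hypothetical numbers): if a case takes 10 hours on your workstation, and your benchmark runs show that workstation at roughly half the speed of a c5.18xlarge, you’d expect ∼5 hours on EC2 – around 5 × $3.06 ≈ $15 on-demand, and substantially less on spot.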
It wouldn’t surprise me if your local machine is faster, but that’s not the only draw for running your CFD in the cloud.
If you’d like to find out more about how to get the most out of AWS for OpenFOAM, then check out my online course. It walks you through building your own OpenFOAM cloud on AWS, including how to take advantage of those deeply-discounted spot instances.