Issue 125 – December 3, 2022

AWS compute for CFD: a mini-guide

Hey there,

It’s Robin from CFD Engine & I’m giving you a whistle-stop tour of AWS’ compute services this week – which ones are useful for CFD, what do they offer & who might find them useful?

I wrote a mini-guide to AWS storage a while ago, so I thought I’d make it a pair.

It’s just a casual overview of the main CFD-friendly compute services, but hopefully you’ll get something out of it, even if you’re not an AWS user 🤞

A mini-guide to AWS compute for CFD

I don’t even know how many ways there are to buy compute from AWS, but I’m going to stick to the 5 or 6 services that are most relevant for CFD.

I’ll give you a quick overview of everything from a plain-ol’ server to a fully-managed serverless platform.

A minute on each, with the minimum of jargon & hopefully some insight.

Let’s do this…

EC2

Elastic Compute Cloud (or EC2) is the basic building block for all of AWS’ compute services – pay-as-you-go access to a server (or lots of servers) where you’re in charge of everything.

These are just like your local machines – you choose the software, the security, the storage, the networking – it’s just that these machines are in one of AWS’ datacenters.

You don’t specify how much RAM or how many cores you want though. Instead, you pick from a menu of hundreds of different server types (or instances in AWS-speak), all helpfully organised into “families” targeting different workloads.

The “compute optimised” & “HPC optimised” families (see here) are probably the most relevant for CFD, with instances featuring up to 96 cores & 384GB RAM.

There are cheaper/smaller instances too, which can often be more cost-effective, so it’s worth trying a few to see which ones fit your simulations. Don’t forget, you’re charged by the minute – if it’s running, you’re paying.
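
If you want to browse that menu programmatically, here’s a rough sketch using boto3 (the AWS Python SDK) that prints one family’s core counts & RAM – the region & family pattern are placeholders, swap in whatever you’re weighing up:

    # Rough sketch: list the "c6i" compute-optimised family with cores & RAM.
    # Region & family pattern are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    resp = ec2.describe_instance_types(
        Filters=[{"Name": "instance-type", "Values": ["c6i.*"]}]
    )
    for it in sorted(resp["InstanceTypes"], key=lambda t: t["VCpuInfo"]["DefaultVCpus"]):
        print(f'{it["InstanceType"]:16s}'
              f'{it["VCpuInfo"]["DefaultCores"]:4d} cores '
              f'{it["MemoryInfo"]["SizeInMiB"] // 1024:5d} GB RAM')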

Golden rule of EC2: Turn off unused instances

So, grab an Ubuntu instance with a few cores, install OpenFOAM & you’re away. This is your basic, bog standard CFD machine on AWS.
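
If you’d rather script the launch than click through the console, it might look something like this in boto3 – treat it as an untested sketch, the AMI ID & key pair are placeholders:

    # Rough sketch: launch a single compute-optimised instance for CFD.
    # Use an Ubuntu AMI from your region & your own key pair.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder Ubuntu AMI
        InstanceType="c6i.8xlarge",        # 16 cores / 32 vCPUs
        KeyName="my-keypair",              # placeholder key pair for SSH
        MinCount=1,
        MaxCount=1,
        BlockDeviceMappings=[{
            "DeviceName": "/dev/sda1",
            "Ebs": {"VolumeSize": 100, "VolumeType": "gp3"},  # 100GB root volume
        }],
    )
    instance_id = resp["Instances"][0]["InstanceId"]
    print("Launched", instance_id)

    # The golden rule, in code – stop it when you're done:
    # ec2.stop_instances(InstanceIds=[instance_id])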

ParallelCluster

If you want something with a bit more crunch-power, then you might want to build a cluster of instances. You could do this manually, but you shouldn’t. Let AWS take the strain & use ParallelCluster instead.

Create a custom CFD environment (featuring your preferred OpenFOAM & ParaView versions, plus any other tools you might need) & let ParallelCluster turn it into a scalable cluster, complete with shared storage, a head/management node & a queue for submitting jobs.
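
For a flavour of what that involves, here’s a rough sketch – a minimal cluster config handed to the pcluster CLI (wrapped in Python to keep these snippets consistent). The region, subnet, key pair & instance choices are placeholders & the exact schema depends on your ParallelCluster version, so check the docs rather than copy-pasting:

    # Rough sketch: write a minimal ParallelCluster 3 config & create the cluster.
    # All the IDs & names below are placeholders.
    import subprocess
    from pathlib import Path

    config = """\
    Region: eu-west-1                          # pick a region with your instance types
    Image:
      Os: ubuntu2004
    HeadNode:
      InstanceType: c6i.xlarge                 # small head node - it does no CFD
      Networking:
        SubnetId: subnet-0123456789abcdef0     # placeholder
      Ssh:
        KeyName: my-keypair                    # placeholder
    Scheduling:
      Scheduler: slurm
      SlurmQueues:
        - Name: cfd
          ComputeResources:
            - Name: hpc6a
              InstanceType: hpc6a.48xlarge     # 96 cores per node
              MinCount: 0                      # scale to zero when idle
              MaxCount: 10
          Networking:
            SubnetIds:
              - subnet-0123456789abcdef0       # placeholder
    """

    Path("cluster-config.yaml").write_text(config)
    subprocess.run(
        ["pcluster", "create-cluster",
         "--cluster-name", "cfd-cluster",
         "--cluster-configuration", "cluster-config.yaml"],
        check=True,
    )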

When you do submit a job, ParallelCluster automatically provisions the nodes you need to run it & sorts out networking, access to the fast interconnects & storage.

It’s equally happy running hundreds of single instance jobs, or a single job spread across hundreds (or thousands) of instances.

ParallelCluster is free – you just pay for the underlying compute & storage (including the management/head node, which doesn’t do any actual CFD).

I’d suggest playing with EC2 instances first, but if/when you need a cluster, this should be your first stop.

AWS Batch

As the name suggests, this is a service for running batch jobs (i.e. self-contained work units that don’t need any intervention).

It’s Docker-based 🙈 which means you bundle your CFD environment into a Docker image & upload it somewhere that Batch can grab it.

Next you define what a “job” looks like – i.e. what compute resources it needs, which Docker image to use & where to store the results.

Then, when you submit a new job, Batch will provision the compute, grab your custom Docker image, add your case data & run it for you (shutting everything down afterwards).
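
In boto3 terms, submitting one of those jobs looks roughly like this – it assumes you’ve already set up a compute environment, a job queue & a job definition pointing at your Docker image (all the names below are placeholders):

    # Rough sketch: submit a containerised CFD run to an existing Batch setup.
    # The queue, job definition & S3 path are placeholders.
    import boto3

    batch = boto3.client("batch", region_name="eu-west-1")

    resp = batch.submit_job(
        jobName="pitzDaily-0042",
        jobQueue="cfd-queue",            # placeholder job queue
        jobDefinition="openfoam-run",    # placeholder job definition (your Docker image)
        containerOverrides={
            "command": ["./Allrun"],     # whatever your image expects
            "environment": [
                {"name": "CASE_S3_PATH",
                 "value": "s3://my-bucket/cases/pitzDaily-0042"},  # placeholder
            ],
        },
    )
    print("Submitted job:", resp["jobId"])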

The whole thing is entirely hands-off, there’s no interactivity & no logging-in to check on progress – CFD jobs go in one end, results come out of the other.

If you’re cranking through lots of very similar jobs (& you’ve optimised your workflow into a black box that doesn’t need/allow any interaction) then you might be a good match for AWS Batch.

As before, I’d figure things out in EC2 & then implement it in Batch.

AWS ECS & EKS

I’m including these two for completeness because they can be used to run CFD & they can manage insanely complex workflows, but you don’t need them (yet).

Both tools provide what’s known as “container orchestration” i.e. juggling Docker-based tasks across lots of computers – they’re the pro-elite version of AWS Batch.

You can safely ignore ECS & EKS until you’ve outgrown Batch.

AWS Lambda

This one is a little different – AWS Lambda is a serverless code runner – you give it some code (or a script) & it handles the rest (finding somewhere to run it, running it, returning a result & shutting down).

You can’t run a full CFD job on it (there’s a 15min hard limit on run time) but you could use it for stand-alone, automated pre/post-processing tasks.

For example, you could use a Lambda to:

  • process your forces & store them in a database
  • do some geometry cleanup, using a MeshLab script
  • generate some batch post-pro images, using pvbatch
  • do some pre-run validation checks on a new case or geometry

Pretty much any little task that doesn’t need to be run on a cluster & doesn’t really warrant starting a new instance.
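
To make that concrete, here’s a hedged sketch of the first idea on that list – a Python Lambda that fires when a forces file lands in S3, grabs the last timestep & stashes it in DynamoDB. The bucket layout, table & file format are all assumptions, so adjust to match your own setup:

    # Rough sketch: S3 "object created" event -> read the forces file ->
    # store the final timestep in DynamoDB. The table name, key layout & the
    # whitespace-separated file format (time fx fy fz) are all assumptions.
    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("cfd-forces")   # placeholder table

    def handler(event, context):
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = record["object"]["key"]                # e.g. run-0042/forces.dat

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode()
        lines = [ln for ln in body.splitlines() if ln and not ln.startswith("#")]
        final = lines[-1].split()                    # last timestep's values

        table.put_item(Item={
            "run": key.split("/")[0],                # assumes "run" is the partition key
            "time": final[0],
            "fx": final[1], "fy": final[2], "fz": final[3],
        })
        return {"stored": key}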

Lambda is charged based on how much RAM you need & how long your task takes to run – your first 400,000 GB-seconds each month are free 🤯
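
For scale, 400,000 GB-seconds works out at roughly 111 hours of a 1GB function (or ~55 hours at 2GB) every month, before you pay a penny.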

I’m sure there’s something CFD-related that you could do within that allowance 🤔 have a play.

What are you using?

I know most of you aren’t using AWS to run your simulations, but I hope this overview was somewhat useful 🤞

If you have a local cluster with a queue in front of it (like SLURM, Torque or SGE), then I reckon you’d be totally at home using ParallelCluster on AWS.

Batch, ECS, EKS & Lambda are a little bit different, but you can (with some effort) build cost-effective, scalable CFD number-crunchers with them.

The key point with these options is having a CFD workflow that doesn’t need attention, interaction or “babysitting” – are you there yet?

Let me know about your experiences using AWS (or any of the public clouds) to run your simulations. Or, if you have a question about any of the above, drop me a note & I’ll do my best to answer it (or at least make something plausible up) 😉

Until next week, stay safe,

Robin