Hey there,
It’s Robin from CFD Engine, back with your weekly CFD note. Grab a coffee, I’m going long(ish) on building CFD clusters on AWS ☕️
tl;dr: clustering instances on AWS isn’t that hard & it works quite well.
You might’ve noticed I have a thing about running my CFD on a one-instance-per-case basis. However, I often get asked (by more sensible CFDinhos) about running on clusters of instances. A good question & one that I didn’t have a good answer for, until recently.
I never fancied the hassle of firing up multiple instances, getting them to talk to each other, submitting jobs and then tearing it all down afterwards. Pro Tip: NEVER leave an instance running with the intention of turning it off later – “later” can get VERY expensive.
Thankfully, AWS have an open-source, command-line tool that should do all the heavy-lifting, admin-y bits of building a cluster for us 🤞
Welcome to AWS ParallelCluster.
The Basics
The basic process looks like this (there’s a rough command sketch after the list):
- Build a custom ParallelCluster AMI: grab one of their existing AMIs, install your favourite CFD bits, compile OpenFOAM against the provided MPI libraries & save it as your very own custom AMI. Only needs to be done once & then you’re set;
- Install ParallelCluster: a couple of Python commands & a handful of multiple-choice questions & you’re ready to roll;
- Launch Your First Cluster: a single command fires up a head/management node (& optionally your compute nodes), configures networking & shared storage etc - takes about 5 mins;
- Connect to the head node: a single command has you SSH’d into your management node;
- Upload / Grab your case files: from S3 or wherever & stash them in the shared storage;
- Submit your job to the queue: there used to be a choice of schedulers, but it’s SLURM from now on;
- Grab a cocktail 🍹: the head/management node will spin up the required number of instances & launch your job. Once complete, it will (depending on how you configured it) shut down any instances that are no longer required;
- Rinse & repeat steps 5-7 (as required): go easy on the cocktails, but you get the idea;
- Tear it all down: a single command terminates the head node & tears down all the supporting bits & pieces that were created in step 3 (takes about 5 mins) 👏
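To give you a flavour, here’s roughly what that looks like at the command line. This is a sketch, not a recipe – I’m assuming ParallelCluster 3.x syntax, a cluster called `cfd-cluster` & placeholder key/bucket/path names, so check the ParallelCluster docs for the exact flags for your version.

```bash
# install the CLI & answer the multiple-choice questions (writes cluster-config.yaml)
pip3 install aws-parallelcluster
pcluster configure --config cluster-config.yaml

# fire up the head node & the supporting infrastructure (~5 mins)
pcluster create-cluster --cluster-name cfd-cluster \
  --cluster-configuration cluster-config.yaml

# SSH onto the head node
pcluster ssh --cluster-name cfd-cluster -i ~/.ssh/my-key.pem

# (on the head node) grab a case from S3 & stash it on the shared storage
aws s3 cp s3://my-bucket/windAroundBuildings.tgz /shared/
cd /shared && tar xzf windAroundBuildings.tgz && cd windAroundBuildings

# submit the job & keep an eye on the queue
sbatch run.sh
squeue

# (back on your machine) tear it all down when you're finished (~5 mins)
pcluster delete-cluster --cluster-name cfd-cluster
```

And `run.sh` is just a bog-standard SLURM job script. Something like this, assuming the 4 x c5n.2xlarge (16-core) cluster & that OpenFOAM was baked into the custom AMI in step 1 – the paths & core counts are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=windAroundBuildings
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4           # 4 physical cores per c5n.2xlarge
#SBATCH --exclusive

# load the OpenFOAM you compiled into your custom AMI (adjust the path)
source /path/to/OpenFOAM/etc/bashrc

cd "$SLURM_SUBMIT_DIR"
decomposePar -force                   # numberOfSubdomains 16 in decomposeParDict
mpirun -np 16 simpleFoam -parallel > log.simpleFoam 2>&1
reconstructPar -latestTime
```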
My Tests
I kept things super-simple & modest.
I didn’t dig into the fast networking (Elastic Fabric Adapter), the fast parallel filesystem (FSx for Lustre) or any of that “advanced” stuff.
I wanted to get a feel for how it all goes together & what a small group of cheaper instances could do as a team. So, I built little clusters, just 4 nodes each, using some of the smaller instance types (there’s a config sketch after the list):
- 4 x c5n.2xlarge = 16 cores total
- 4 x c5n.4xlarge = 32 cores total
- 4 x c5n.9xlarge = 72 cores total
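For reference, that cluster shape gets pinned down in the ParallelCluster config (the file that `pcluster configure` writes for you). Here’s a sketch of the relevant bit, assuming ParallelCluster 3.x YAML – the queue/subnet names are placeholders & note that the core counts above are physical cores, i.e. hyperthreading switched off:

```yaml
# cluster-config.yaml (excerpt) – a sketch, not a complete config
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: cfd
      CapacityType: ONDEMAND          # or SPOT – see the gotchas below
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0  # placeholder
      ComputeResources:
        - Name: c5n9xl
          InstanceType: c5n.9xlarge
          MinCount: 0
          MaxCount: 4                 # the 4-node, 72-core cluster
          DisableSimultaneousMultithreading: true   # 18 physical cores per node
```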
I used a few of my windAroundBuildings test cases ranging from 7 million to 22.5 million cells as the jobs.
The Headlines
I saw a speed-up of 3.87 out of a possible 4 on my largest cluster with my largest case (22.5 million cells on 72 cores). On the smallest case it was slightly better, 3.98 out of 4, & I saw similar results on all of the cases & configs I ran. Not too shabby considering the effort required to get there, i.e. not much.
I didn’t quite get to the point where “more cores = more cheaper” but I got much closer than I thought I would.
It’s still going to be a time vs. cost trade-off, but clustered instances seem to work well enough to give you some interesting options beyond just the plain single instances. They’re slightly more expensive than a single instance, but way faster.
Gotchas
If you’re planning on taking this for a spin, here are a couple of gotchas you might want to look out for.
You don’t run CFD on the head node…
…so don’t use an expensive instance. It would be nice if the management side were serverless, so you didn’t need a dedicated head/management node at all.
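In the meantime, the cluster config lets you pick something small & cheap for the head node. A sketch, again assuming ParallelCluster 3.x YAML with placeholder subnet & key names:

```yaml
# cluster-config.yaml (excerpt)
HeadNode:
  InstanceType: t3.medium              # modest – it only schedules jobs & serves the shared storage
  Networking:
    SubnetId: subnet-0123456789abcdef0 # placeholder
  Ssh:
    KeyName: my-key                    # placeholder
```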
Beware NAT gateway $$$
One of the setup steps innocently asks if you’d like to run your clustered instances in a private subnet behind a NAT gateway. Sure, that sounds like a good idea. Until you get the bill. In my limited tests, the NAT gateway cost almost as much as the compute 🤯 & apparently that’s not unusual. Choose the other option (compute nodes in a public subnet).
Spot instances
If you’re running your cluster on spot instances (& you probably should) then you’re left with how to handle one or more of your instances being terminated without much notice. SLURM seems to handle pulling the job off the queue when this happens & you can probably get it to re-queue it with some SLURM-magic. But, I haven’t figured it out yet, so I can’t say how good it is (or how easy it might be to get into a mess).
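If you want somewhere to start experimenting, SLURM has a built-in requeue mechanism & OpenFOAM can be told to restart from its latest saved timestep. No promises that this is the whole story, but something along these lines is the obvious first thing to try:

```bash
# in the SLURM job script – ask for the job to go back on the queue if its nodes vanish
#SBATCH --requeue

# & in system/controlDict, so the restarted job carries on from the last write:
#   startFrom       latestTime;
#   writeInterval   <something reasonably frequent>;
```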
Final Thoughts
I’m not using this in production yet. But, should I need to run a job on a cluster of instances, I’d jump on it immediately. For me, it just feels a bit more difficult to manage than my current setup. Maybe I’m misreading “different” as “difficult” - we’ll see as I get to grips with it.
I’ll put together some step-by-step notes at some point. But, in the meantime, do you use this in production? What’s your experience been like? Is this one of the missing pieces in your cloud CFD puzzle? Keen to hear your thoughts, as always, drop me a note anytime.
Until next week, CFD safely 🙏