Do you really need HPC for your CFD?
Let’s get this straight, right out of the gate — there are many, many CFD workflows & mine isn’t going to suit everyone. Some of you will be very attached to your HPC and unlikely to give any of this a second thought. Equally, some of you will be perfectly happy running on a single workstation. This won’t offer much to you guys either. So this isn’t about how you should do CFD without HPC. But just how I do cloud CFD without HPC.
What is HPC?
InsideHPC have a pretty good definition of what they consider HPC (High Performance Computing) to be:
“High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.”
High Performance Computing is a pretty broad church that is sometimes mistaken for supercomputing. But I’m going to concentrate on cluster computing, which is much more accessible to most of us than supercomputing. And by cluster computing I mean:
“collocated, interconnected collections of individual computers that are orchestrated and controlled to work like one large machine”.
But why does CFD need so much compute power?
CFD simulations can be BIG. Problem sets routinely run to many millions (if not billions) of cells. And, as many (computer) hands make light work, we’re always hungry for compute power. More compute nodes please. Especially if you want your results in a timely manner. But it’s not just about the number of nodes or cores. Simulations are hungry for memory too.
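To put some rough numbers on that, here’s a back-of-envelope sketch. The ~1 kB-per-cell figure is an assumed rule of thumb for a steady, incompressible finite-volume run, not a measurement, and the real number varies a lot with solver and schemes:

```python
# Back-of-envelope memory estimate (a sketch, not a benchmark).
KB_PER_CELL = 1.0   # assumed rule of thumb, not a measured value

for cells in (1e6, 40e6, 1e9):
    gb = cells * KB_PER_CELL / 1e6   # kB -> GB (decimal)
    print(f"{cells:>13,.0f} cells ~ {gb:,.0f} GB of RAM")
```

A million cells is workstation territory; a billion cells is around a terabyte of RAM, which is firmly cluster territory.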
A large CFD simulation typically can’t fit into the memory of a single machine. So it’s chopped into smaller chunks and shared amongst the nodes of the compute cluster. Unfortunately the simulation doesn’t just sit in memory, its data needs to flow around the cluster. And, as any fluids dude knows, big pipes help fast flow. This is why compute clusters usually include fast, high-bandwidth interconnects (big pipes) which allow information to pass between the cluster nodes as efficiently as possible.
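If you want to see what that data flow looks like, here’s a minimal sketch of a halo (ghost-cell) exchange using mpi4py. It isn’t any particular solver’s code, just an illustration of the message-passing pattern that hits the interconnect at least once per iteration:

```python
# Minimal 1D domain-decomposition sketch: each MPI rank owns a chunk of the
# domain and swaps boundary ("halo") values with its neighbours.
# Run with e.g. `mpirun -np 4 python halo.py` (mpi4py and numpy assumed).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n_local = 1_000_000                          # cells owned by this rank
field = np.full(n_local + 2, rank, float)    # +2 ghost cells, one per neighbour

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# One halo exchange: send my boundary values, receive my neighbours'.
# A real CFD solver does this (at least) every iteration, which is why
# clusters need fast, low-latency interconnects.
comm.Sendrecv(sendbuf=field[1:2], dest=left, recvbuf=field[-1:], source=right)
comm.Sendrecv(sendbuf=field[-2:-1], dest=right, recvbuf=field[0:1], source=left)
```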
We aren’t done yet though: the compute nodes need big internal pipes too. The processor in each node needs quick access to its own memory, which is why they’re cuddled up on the same motherboard. However, if the amount of memory on a board and the pipe connecting it to the processor aren’t matched, the machine becomes memory-bandwidth limited and accessing that memory turns into a bottleneck. It was historically tight memory bandwidth that drove the growth of CFD cluster sizes.
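If you suspect you’re bandwidth-limited rather than compute-limited, a crude STREAM-style measurement gives you a feel for it. This is a sketch, not a proper benchmark; the array size is an arbitrary assumption and numpy makes two passes over the data, which the byte count accounts for:

```python
# Rough memory-bandwidth estimate via a STREAM-flavoured triad: a = b + s*c.
import time
import numpy as np

n = 50_000_000                    # ~400 MB per float64 array (arbitrary choice)
b = np.random.rand(n)
c = np.random.rand(n)
a = np.empty_like(b)
s = 3.0

t0 = time.perf_counter()
np.multiply(c, s, out=a)          # pass 1: a = s*c      (read c, write a)
np.add(a, b, out=a)               # pass 2: a = b + s*c  (read a & b, write a)
elapsed = time.perf_counter() - t0

bytes_moved = 5 * n * 8           # five array traversals across the two passes
print(f"effective memory bandwidth ~ {bytes_moved / elapsed / 1e9:.1f} GB/s")
```

Compare that number with your machine’s quoted peak; if your solver’s per-core throughput stalls as you add cores, memory bandwidth (not flops) is probably the wall.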
Parkinson’s Law of CFD
Are you familiar with Parkinson’s Law, the observation that “work expands so as to fill the time available for its completion”? Well, if Cyril Parkinson had been a CFD engineer, rather than a naval historian, he might have noted that:
“CFD models expand so as to fill the hardware available for their solution.”
If you’ve got the hardware, you might as well use it, right? But what happens if you don’t have the hardware? What if you use on-demand computing and pay-as-you-go? Do CFD models have the same tendency to inflate when you can put an exact figure on the dollar cost of a solution? Or does that change how you think about your CFD models and hardware?
My workflow
I can’t say whether I adopted this workflow to fit the resources available or chose my resources to fit this workflow (which came first, the chicken or the egg?). It’s unusual to talk about how few compute resources you use, but here goes.
My CFD projects are design driven. Design iterations usually arrive (or are produced by me) in batches. My aim is to turn these batches around overnight (an aim — not a service level agreement or guarantee). This leads to very peaky demand for compute power. I’ll frequently have no runs for a few days & then 30+ runs to be completed by tomorrow.
Simulations run independently, starting as soon as they’re ready to go. Each simulation requests a single 16-core (2x8) node with 60GB RAM and SSD storage. The resource is then terminated on completion. My models are pretty modest, weighing in at less than 40 million cells, so it’s quite possible to turn them around overnight on that hardware.
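For the curious, the pattern is simple enough to sketch. This isn’t my actual tooling, and the AMI, region and instance type below are placeholders (the instance type is just an example of the 16-physical-core, 60GB class), but the shape of it looks something like this with boto3 on AWS EC2:

```python
# One fresh on-demand node per job: launch it, let its startup script run the
# solver and shut the machine down, and have EC2 terminate it on shutdown.
import boto3

USER_DATA = """#!/bin/bash
# placeholder: fetch the case, run the solver, push results to storage
# aws s3 cp s3://my-bucket/case-042.tgz . && ./run_case.sh && aws s3 cp ...
shutdown -h now     # shutting down triggers termination (see below)
"""

ec2 = boto3.client("ec2", region_name="eu-west-1")   # region is an assumption

ec2.run_instances(
    ImageId="ami-xxxxxxxx",        # placeholder AMI with the solver pre-installed
    InstanceType="c3.8xlarge",     # illustrative 16-physical-core / 60GB class
    MinCount=1,
    MaxCount=1,
    UserData=USER_DATA,
    InstanceInitiatedShutdownBehavior="terminate",  # node disappears when the job ends
)
```

The key detail is setting InstanceInitiatedShutdownBehavior to “terminate”: when the job’s startup script shuts the machine down, the node (and its bill) goes away.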
As Lean CFD has its Minimum Viable Model, this is my Minimum Viable Infrastructure.
Why not use traditional HPC?
Short answer — I don’t need it (yet).
Longer answer — it doesn’t add enough value in these important areas of my workflow:
Scalability vs Performance
HPC rules when performance is important. But in my workflow, scalability trumps performance every time. For a given cost, I choose to run lots of jobs relatively quickly rather than one job very quickly. I’d choose 30 jobs over a single job at 30x speed any day.
No Queue
I’m a Brit & therefore I love queuing. But there is no queue in my workflow. Jobs start on a dedicated machine as soon as they’re ready to go. Therefore, no job is dependent on (or can get in the way of) another job. If a node fails or a job hangs, it only affects itself and not whatever jobs were to follow. But much more importantly, no client is ever waiting for another client’s jobs to be processed. This makes it easier to maintain overnight processing for all clients, irrespective of workload.
What’s the value of going faster?
I can’t process jobs faster without incurring higher costs, so what is the value of a faster simulation? In my case, with my clients, overnight is more than sufficient. They can’t act on results faster than that. Maybe their design process is too long, or their resources are too tight, or they’re just juggling too many plates. Reducing the simulation time wouldn’t help them. So it offers little additional value to them & significant extra cost for me. For example, running (slightly less than) twice as fast across twice as many nodes doubles your hourly bill without quite halving your runtime. Is it worth it? On-demand (cloud) computing may be contentious in some areas but it’s excellent at rubbing the dollar cost of performance in your face.
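Here’s a toy calculation to make that concrete. All of the numbers are made up and the parallel-efficiency model is deliberately crude; the point is only that on-demand billing multiplies nodes by hours, while imperfect scaling means the hours don’t shrink as fast as the node count grows:

```python
# Toy cost-of-speed illustration: cost = nodes x hours x hourly rate.
RATE = 1.50        # $/node-hour, placeholder
T1 = 10.0          # hours on one node, placeholder
EFFICIENCY = 0.9   # assumed parallel efficiency penalty per extra node, crude

for nodes in (1, 2, 4):
    hours = T1 / (nodes * EFFICIENCY ** (nodes - 1))   # runtime shrinks sub-linearly
    cost = nodes * hours * RATE
    print(f"{nodes} node(s): ~{hours:4.1f} h, ~${cost:6.2f}")
```

The answer lands earlier, but every extra node-hour is paid for in full. If overnight is already fast enough, those extra dollars buy nothing.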
Besides, there’s more to speed than just hardware. Solver changes can have a dramatic impact at very little cost. I’m thinking of the recent 3.5x speedup from the introduction of SIMPLEC in OpenFOAM.
pitzDaily example case now runs with SIMPLEC in #OpenFOAM-dev; with optimized relaxation factors, get 3.5x speed up: https://t.co/zimGY6KZZa
— CFD Direct OpenFOAM (@CFDdirect) June 25, 2015
How much would a 3.5x speedup cost as a hardware upgrade?
Why do you do it your way?
I know that this (along with a lot of my process) won’t be a good fit for everyone. Some of you will need more cells, more memory, more nodes or shorter run times than I do. Some of you would be waiting months for your simulations to come back if you only used 16 cores. I’m not suggesting you adopt my workflow. But I am suggesting you inspect yours. If you can’t answer the question “why do you run on that number of cores?” then it should be a useful exercise. Try also asking yourself “what’s my minimum viable infrastructure?” There might be a better way.
Then again, if your boss comes knocking with some year-end budget to spend, grab yourself some extra nodes. But beware. While you’re not looking, your models will slowly inflate to fill the new nodes & you’ll be back to where you started #justsaying