Hey there,
It’s Robin from CFD Engine, back with another slice of CFD. This week, I’m sifting through the 175 😲 AWServices to highlight the 5 that I actually use & what I use ’em for.
I’ve had this idea of “the 5 pillars of cloud CFD” kicking around since I wrote my OF on AWS course a few years ago. They represent the minimum we need to do CFD in the cloud & look something like this:
- Compute Power – somewhere to mesh, solve, post-pro etc
- Storage – somewhere to store our data – before, during & after our simulations
- Access – we need to be able to get at our machines & data easily
- Data Transfer – we need to minimise the amount of data we’re transferring but, when we do want to transfer data, we want to do it conveniently & securely
- Security – we don’t want just anyone being able to access our data, or use our compute power
It helped structure the course, but it also works as a framework to outline the AWServices that I use & how they fit into the bigger CFD picture…
Compute Power
The most obvious of all 5 pillars.
AWS has loads of options for compute & I don’t just mean the different instance types.
There’s the one that you can use to stand up a WordPress instance. There are ones for managing & automating the running of containers (Docker & friends). A couple of serverless ones where you upload your code & they take care of running it on an appropriate machine.
All good, but maybe not for CFD? We’re most interested in the core service behind it all, Elastic Compute Cloud, also known as EC2.
With EC2 you select your compute from a menu of different pre-configured instances & pick the one closest to your needs (as opposed to requesting a custom mix of cores, memory & storage).
The different instance types are aimed at different use cases. The ones loosely aimed at ours are the compute-optimised family: many cores, a decent hunk of memory & relatively fast interconnects. You can even string them together to make your own virtual cluster, complete with queuing & auto-scaling.
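For the curious, here’s roughly what firing one up looks like from the command line – a minimal sketch, where the AMI, key pair & security group IDs are placeholders you’d swap for your own:

```bash
# Launch a single compute-optimised instance for a solve.
# c5n.18xlarge = 72 vCPUs & 100 Gbps networking (good for tightly-coupled CFD).
# The AMI, key pair & security group IDs are placeholders – swap in your own.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type c5n.18xlarge \
  --key-name my-cfd-key \
  --security-group-ids sg-0123456789abcdef0 \
  --count 1
```

If you want the cluster version of this (queuing, auto-scaling & friends), AWS ParallelCluster is the tool that wires it all together.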
Storage
If there’s one thing that we CF-Do-ers are good at, it’s generating more data than we know what to do with. AWS has us covered for places to put it, with the biggest (pun intended) being their Simple Storage Service aka S3.
S3 is an object store – a place to put your files that looks a bit like a filesystem, but isn’t really one. It doesn’t make a good place to run your simulations from – too slow (& potentially expensive) for that.
I typically upload a new case to S3 before I run it, copy it onto the EC2 instance for the duration of the simulation & then archive everything back to S3 once the job has finished.
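If you’re curious what that round trip looks like in practice, here’s a sketch (the bucket name & paths are made up):

```bash
# 1. From my desk: pack up the case & push it to S3
tar czf myCase.tar.gz myCase/
aws s3 cp myCase.tar.gz s3://my-cfd-bucket/cases/myCase.tar.gz

# 2. On the instance: pull the case down & unpack it onto fast local storage
aws s3 cp s3://my-cfd-bucket/cases/myCase.tar.gz .
tar xzf myCase.tar.gz

# 3. Once the job finishes: archive everything back to S3
aws s3 sync myCase/ s3://my-cfd-bucket/archive/myCase/
```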
There are a few options for storage on EC2, from solid-state drives connected directly to the instance, through shared network drives, to fast parallel storage – depending on how much you’re writing & how often.
Top Tip: Storage on S3 is essentially bottomless, so be careful what you keep & how long you keep it. Things can get expensive pretty quickly when storing CFD-sized data.
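One way to keep yourself honest (a sketch on my part – the bucket name & retention period are invented) is an S3 lifecycle rule that expires old results automatically:

```bash
# Expire anything under results/ after 90 days – adjust to taste
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-results",
      "Status": "Enabled",
      "Filter": { "Prefix": "results/" },
      "Expiration": { "Days": 90 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-cfd-bucket \
  --lifecycle-configuration file://lifecycle.json
```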
Access
I recommend using the AWS command-line interface to interface with AWS from the command-line 😜 Joking aside, you spend plenty of time on the command-line anyway, so why not access AWS from there too?
You can do most of your day-to-day tasks with this toolkit, including uploading/downloading data from S3 and starting/stopping instances. I use their managed encryption keys to ensure that my data is encrypted in transit & at rest.
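A flavour of the day-to-day, with a placeholder instance ID & bucket – the --sse flag is what asks S3 to encrypt objects at rest with the AWS-managed KMS key:

```bash
# Spin an existing instance up & down (placeholder instance ID)
aws ec2 start-instances --instance-ids i-0123456789abcdef0
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# Copy data to/from S3, encrypted at rest with the AWS-managed KMS key
# (transfers already go over TLS, which covers the in-transit half)
aws s3 cp myCase.tar.gz s3://my-cfd-bucket/cases/ --sse aws:kms
aws s3 cp s3://my-cfd-bucket/results/forces.dat .
```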
I use SSH (or MOSH) to connect to my instances using a private key. For long-running instances, I use the AWS console to turn off external incoming traffic once the simulation is running. That way, no-one (including me) can connect to them.
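You can do the same lockdown trick from the CLI too – a sketch, assuming SSH (port 22) was open to the world on a placeholder security group:

```bash
# Close the SSH door once the solver is chugging along...
aws ec2 revoke-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 0.0.0.0/0

# ...and re-open it when you want back in
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 0.0.0.0/0
```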
Data Transfer
As I mentioned above, I use the command-line tools to copy data to & from S3. Data transfer can be expensive on all cloud platforms. Particularly so in our case where we might want to grab multiple multi-GB files. I try to keep this kind of file transfer to the absolute minimum.
A new OpenFOAM case directory (control files only) is typically less than a MB when tar-ed & zipped, trivial to upload. The geometry files are the big bit. I maintain a library of geometry files on S3 to avoid having to upload geometries multiple times per project.
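In practice that means excluding the geometry from the upload & pulling it from the library on the instance instead – a sketch with invented names (OpenFOAM keeps surface geometry under constant/triSurface):

```bash
# Pack the case without the heavy geometry...
tar --exclude='*.stl' -czf myCase.tar.gz myCase/
aws s3 cp myCase.tar.gz s3://my-cfd-bucket/cases/

# ...then, on the instance, fetch the geometry from the S3 library
aws s3 cp s3://my-cfd-bucket/geometry/car-v2.stl \
  myCase/constant/triSurface/car-v2.stl
```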
For downloads, I usually only download the post-processing output (animations ~30MB) and have the job email me a force summary. If needed, I’ll grab the log files from S3 and occasionally I’ll bring home the exported surfaces to load into a local ParaView (< 200MB).
I almost never download the whole case (multiple GB). If I need to dig into the volume data, then I’ll use ParaView remotely (client-server style).
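If you’ve not tried client-server ParaView, the gist (a sketch, with a placeholder key & host) is: run pvserver on the instance, tunnel its default port over SSH & point your local ParaView at localhost:

```bash
# On the instance: start the headless server (11111 is pvserver's default port)
pvserver --server-port=11111

# On my machine: tunnel that port over SSH (placeholder key & host)...
ssh -i my-cfd-key.pem -L 11111:localhost:11111 ubuntu@my-ec2-host

# ...then connect the local GUI: File > Connect > localhost:11111
```

One gotcha: the local & remote ParaView versions need to match.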
Security
I use AWS Identity & Access Management (IAM) to control who can do what with my data & compute resources. It’s probably the most complicated (& frustrating) bit of the whole stack to get to grips with. Everything seems to take multiple iterations to get it how you want it (including occasionally locking yourself out of stuff 😳). But, once you’ve cracked it, you don’t need to touch it too often.
It’s worth the effort (as opposed to just doing everything with the equivalent of root access 😬) even if it’s just for the peace of mind that you’ve at least tried to keep things tight.
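To make that concrete, a deliberately narrow IAM policy might scope a user down to a single bucket – a sketch with placeholder names; a real setup usually needs a few more statements (bucket listing, EC2 actions etc):

```bash
# A minimal policy: read/write objects in one bucket & nothing else
cat > cfd-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-cfd-bucket/*"
    }
  ]
}
EOF
aws iam put-user-policy \
  --user-name cfd-runner \
  --policy-name cfd-s3-access \
  --policy-document file://cfd-s3-policy.json
```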
Anything else?
I think that’s it. I might have missed something, but if I have, it’s because it’s a now-and-again tool, rather than part of the core toolkit.
I’m starting to look at the developer tools (continuous integration/deployment), the lighter-weight databases & the API tools for a big CFD Engine update (more on that another time 🤫).
Other than that, is there a service that I’m missing out on? What do you use that I should check out?
How about the other main public clouds? I’m sure these services are “table stakes” for all the big players (Azure, Google etc). If you’ve used these platforms, do they have something extra that might turn a CF-Do-er’s head?
It’s been good to hear from some of you after recent issues. Drop me a note anytime, on anything – my inbox is always open.
Until next time, stay safe