Issue 107 – July 30, 2022

Data Hoarding

Hey there,

It’s Robin from CFD Engine & this week I realised I’ve got a problem – I’m a data hoarder & it’s time to do something about it.

The catalyst for this revelation was a message from a friend saying they’d just discovered their IT department has been auto-deleting files older than 90 days, including their CFD results 😨

My internal monologue went something like…

  • “Ha, I’m glad I don’t have an IT department. No-one is deleting my cases”
  • “Anyway, I do all my CFD in the cloud, I’ve got unlimited storage”
  • “Hold up, no one’s deleting my cases & I have unlimited storage – how much is in there?”

So I took a look & I have 8.4TB of CFD data on-hand 😶 Maybe it’s time to revamp my storage “strategy” a bit?

What I’m supposed to do

I archive each run so that I can quickly revisit the solution without having to re-run it – great for when you need to do a little extra post-pro or quickly check something.

My cases are typically steady-state, so I’ll archive the mesh, the dictionaries & the converged solution as a single bundle. I’ll also archive the post-processing, forces, logs & input geometry separately, but they’re too small to get excited about.
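As a rough sketch of that bundling step (not my actual script), here's how the mesh, dictionaries & converged solution could be rolled into one tarred & zipped file with Python's standard library. The directory names `constant` & `system`, the `solution_dir` argument & the `bundle_case` function itself are all assumptions for illustration, so swap in whatever layout your solver uses.

```python
import tarfile
from pathlib import Path


def bundle_case(case_dir, archive_path, solution_dir):
    """Tar & gzip the mesh, dictionaries & one solution directory of a case.

    Returns the list of paths stored in the archive.
    """
    case_dir = Path(case_dir)
    # Assumed layout (hypothetical): constant/ holds the mesh, system/ holds
    # the dictionaries, solution_dir names the converged solution directory.
    parts = [case_dir / name for name in ("constant", "system", solution_dir)]

    # Check everything exists before writing, so we never leave a partial bundle
    missing = [str(p) for p in parts if not p.is_dir()]
    if missing:
        raise FileNotFoundError("missing case directories: " + ", ".join(missing))

    with tarfile.open(archive_path, "w:gz") as tar:
        for part in parts:
            # Store paths relative to the case's parent, e.g. my_case/system/...
            tar.add(part, arcname=f"{case_dir.name}/{part.name}")

    # Re-open the bundle to confirm it is readable & report its contents
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Calling something like `bundle_case("my_case", "my_case.tar.gz", "1000")` (hypothetical names) would leave a single file ready to upload, with the small stuff (forces, logs, geometry) archived separately as described above.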

I’ll keep the archived cases around for a year after a project finishes (at least) just in case a client needs something. After that, old cases can be deleted.

What I actually do

I never delete anything.

The cost of adding a new case to the archive is so low that I don’t even think about it – here are some real numbers from a recent project…

One 37M cell case (mesh & solution – tarred & zipped) weighed in at 6.1GB and costs me around 14¢ per month to store it on the most expensive (i.e. the highest availability) tier of AWS’ Simple Storage Service (S3).

For context, it would cost between $6.50 & $13.10 to re-run that case from scratch (depending on how good my spot instance discount was). Therefore, I could keep the case in storage for roughly 4 years before the cost to store it outweighed the cost to re-run it. By that point I’ll have moved on to something else & forgotten all about it, unlike the billing robots.
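If you want to run the same store-or-re-run sums for your own cases, here's a minimal sketch using only the figures quoted above. The function & variable names are mine, and it ignores request, retrieval & transfer charges, so treat it as back-of-envelope only. With these numbers the cheaper re-run breaks even at about four years.

```python
# Figures quoted in the text above
ARCHIVE_GB = 6.1
MONTHLY_STORAGE_USD = 0.14          # "around 14 cents per month"
RERUN_COST_USD = (6.50, 13.10)      # best & worst spot instance discount


def break_even_months(rerun_cost_usd, monthly_storage_usd):
    """Months of storage that cost the same as re-running the case once."""
    return rerun_cost_usd / monthly_storage_usd


# Storage price implied by the two quoted figures (not an official price)
implied_usd_per_gb_month = MONTHLY_STORAGE_USD / ARCHIVE_GB
print(f"implied price: ${implied_usd_per_gb_month:.4f} per GB per month")

for rerun in RERUN_COST_USD:
    months = break_even_months(rerun, MONTHLY_STORAGE_USD)
    print(f"re-run at ${rerun:.2f}: break-even after "
          f"{months:.0f} months ({months / 12:.1f} years)")
```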

The monthly cost creeps up very slowly &, as I’m not forced to delete anything, I don’t pay enough attention to what I’m keeping & how long it’s been there. I’m hoarding data that I’ll never look at again – Marie Kondo would not approve.

What I’m going to do

I reckon the IT department were onto something – it’s time for my own version of the rolling delete that started this introspection.

I’m not brave enough to go for a 90-day period though, instead I will…

  • transition the cases to a cheaper storage tier after 60 days for a 50% cost saving. They’ll still be on-hand if needed, albeit with a small access charge.
  • delete cases from the archive after 12 months 😬 long enough for a client to come back if they need something, or to grab an old case to start a new project.

This can all be done automagically using S3 lifecycle rules – taking me out of the loop, and guaranteeing the changes will actually happen 🤞
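For the curious, the two rules above might look something like this as a lifecycle configuration. It's a sketch under assumptions rather than my live setup: the rule ID, the `cases/` prefix & the bucket name are made up, and `STANDARD_IA` is just one candidate for the "cheaper tier" (I haven't named the tier here). The snippet only builds & prints the configuration; the commented lines show how it could be handed to boto3's `put_bucket_lifecycle_configuration`.

```python
import json

# Hypothetical names: the rule ID & the "cases/" prefix are illustrative only.
# STANDARD_IA is an assumed choice of cheaper tier, not one named in the text.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-cases",
            "Status": "Enabled",
            "Filter": {"Prefix": "cases/"},
            # Move objects to the cheaper tier 60 days after creation
            "Transitions": [{"Days": 60, "StorageClass": "STANDARD_IA"}],
            # Delete objects 12 months (365 days) after creation
            "Expiration": {"Days": 365},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))

# With boto3 installed & credentials configured, it could be applied with:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-cfd-archive",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle_configuration,
#   )
```

One thing to watch: that call replaces any lifecycle configuration already on the bucket, so include every rule you want to keep.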

Maybe this will be the first step towards that rolling 90-day delete, but for now it feels like it strikes a balance between having data on-hand & reducing my ongoing storage costs.

What have I missed?

You’ve probably got your own storage strategy. After all, it’s not much good churning out results if you’ve got nowhere to store them.

Is someone deleting your old cases on a rolling loop, or do you have regular Friday afternoon pruning sessions where you clear out old cases to make space for your weekend jobs? (Nobody wants to be the one who filled the disk in the early hours of Saturday & crashed the queue.)

Perhaps you’re continually expanding your storage to be able to hoard it all (like me)?

I’m keen to hear from you – what does your storage strategy look like?

Have you managed to curb your data hoarding? If so, tell me how you did it.

Or drop me a cautionary tale about the time you dug a case out of the archive & saved a project – maybe I won’t delete those cases after all 🤔

Let me know 🙏

Until next week, stay safe,

Signed Robin K