This is a bit overblown. Is Iceberg "easy" to set up? No. Can you get set up in ...

simlevesque · 2025-03-06T16:04:19 1741277059

I think Iceberg can work in real time but the current implementations make it impossible.

I have a vision for a way to make it work. I made another comment here. Your blog posts were helpful; I dug a bit into the Duck Takes Flight code in Python and Rust.

whalesalad · 2025-03-06T17:34:30 1741282470

Heads up: the logo on your site needs to be 2x'd in pixel density; it comes across as blurry on hidpi displays. Or convert it to an SVG/vector.

mritchie712 · 2025-03-06T19:43:21 1741290201

fixed!

pid-1 · 2025-03-06T17:19:51 1741281591

If you're already in AWS, why wouldn't you use AWS Glue Catalog + AWS SDK for pandas + Athena?

You can set up a data lake, save data and start doing queries in like 10 minutes with this setup.
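For anyone curious what that looks like, here is a rough sketch of the Glue Catalog + AWS SDK for pandas (`awswrangler`) + Athena flow. The function name `write_and_query` and the bucket, database, and table arguments are hypothetical placeholders, and it assumes AWS credentials and permissions are already configured; treat it as an illustration, not a vetted setup.

```python
def write_and_query(df, bucket, database, table):
    """Write a pandas DataFrame to S3 as Parquet, register it in Glue, query it with Athena.

    All argument values are hypothetical; nothing here is specific to any real account.
    """
    # Imported lazily so the sketch can be read and loaded without the SDK installed.
    import awswrangler as wr

    # Create the Glue database if it does not exist yet.
    wr.catalog.create_database(database, exist_ok=True)

    # Write Parquet files to S3 and register (or update) the table in the Glue Catalog.
    wr.s3.to_parquet(
        df=df,
        path=f"s3://{bucket}/{table}/",
        dataset=True,
        database=database,
        table=table,
        mode="append",
    )

    # Run a query through Athena and get the result back as a DataFrame.
    return wr.athena.read_sql_query(
        f'SELECT * FROM "{table}" LIMIT 10',
        database=database,
    )
```

Calling it would look something like `write_and_query(df, "my-bucket", "my_lake", "events")`, with all three names made up for the example.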

thedougd · 2025-03-06T17:30:47 1741282247

These days you can 'just' create an S3 tables bucket. https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tab...

tsss · 2025-03-06T19:18:00 1741288680

Athena is really expensive though and you will often run into a hard limit on the size of your query.

pid-1 · 2025-03-07T12:44:02 1741351442

Like most things serverless, Athena is cheap as long as you don't use it.

My company has 100s of data pipelines that are executed infrequently.

For this use case Athena is ridiculously cheap and easy to use vs most other solutions.

fifilura · 2025-03-07T03:18:30 1741317510

I never found Athena expensive. Compared to employment cost it will be minuscule.

And sometimes, if your query is CPU intensive but the queried data size is not huge, you can get ridiculous value for money, like many CPU-days in 10 minutes for just $5 if your query covers 1TB after partitioning.
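To make the arithmetic concrete, a back-of-the-envelope sketch, assuming the $5 per TB scanned rate implied above (actual pricing can vary by region and over time, and treating 1 TB as 1024^4 bytes is also an assumption). The helper name `athena_query_cost` is made up for illustration.

```python
BYTES_PER_TB = 1024 ** 4      # assumption: binary terabyte
PRICE_PER_TB_USD = 5.0        # assumption: the $5/TB figure from the comment above


def athena_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimate the cost in USD of a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# A query that scans 1 TB after partition pruning:
one_tb_cost = athena_query_cost(BYTES_PER_TB)

# The same query when partitioning cuts the scan to a tenth of that:
pruned_cost = athena_query_cost(BYTES_PER_TB // 10)
```

The point is that the bill tracks bytes scanned, not CPU time, so a compute-heavy query over well-partitioned data costs the same as a trivial one over the same bytes.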

Query size limits are also configurable.

Obviously it depends on what data you are working on, but not having to set up and pay for a computational cluster is a huge cost saving.

mritchie712 · 2025-03-06T17:31:58 1741282318

Agreed.

A lot of people would worry about "vendor lock-in" here, but it's certainly convenient.