
Yes, and actually, it is simpler than expected. The cached segments are written into the local filesystem, and therefore we get both the cache on SSD and the page cache in memory "for free". As ClickHouse is OLAP, the read/written segments are not that small (more than 1 MB is expected), so it works fairly well (the same would be trickier for OLTP).
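
A minimal sketch of that read-through idea (not the actual ClickHouse cache code; the cache directory and the remote_store helper are made up for illustration):

    # Read-through segment cache on the local filesystem. The OS page cache keeps
    # hot segments in memory automatically, so repeated reads of the same segment
    # often never touch the SSD at all.
    import os

    CACHE_DIR = "/var/cache/segments"          # hypothetical cache location

    def read_segment(remote_store, key):
        path = os.path.join(CACHE_DIR, key.replace("/", "_"))
        if os.path.exists(path):               # cache hit: read from SSD, or straight
            with open(path, "rb") as f:        # from the OS page cache if it's hot
                return f.read()
        data = remote_store.get(key)           # cache miss: fetch from object storage
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:             # write-through to the local filesystem
            f.write(data)
        os.rename(tmp, path)                   # atomic publish of the cached segment
        return data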

It does require some care, though. For example, the linear read/write performance of the local SSDs on typical c5d/m5d/r5d instances is only around 4 GB/sec, which is faster than a 25 Gbit network but slower than a 40 Gbit network. You get a 25 Gbit network on the larger regular instances in AWS; otherwise you need the "n" (network-optimized) instance types.

S3 read performance is higher: we achieved 50 Gbit/sec from a single server, and going beyond that requires either more CPU or incompressible data (the data is usually compressed "too well" for a 100 Gbit network to become the bottleneck). But S3 is a tricky beast, and it won't give good performance unless you read in parallel, with carefully selected ranges, from multiple files. S3 latencies are also higher: expect around 50 ms unless you are being throttled (and you will be throttled), which is worse than both local SSDs and every type of EBS. S3 is cheap and saves the cost of cross-AZ traffic... unless you make too many requests. So, a lot of potential trouble and a lot of things to improve.
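
For illustration, parallel ranged reads look roughly like this with boto3 (the bucket, key, and range size are placeholders; a real reader also has to pick ranges that align with the data it actually needs):

    # Issue many concurrent HTTP range requests against one object. A single
    # sequential GET stream will not come close to 50 Gbit/sec.
    from concurrent.futures import ThreadPoolExecutor
    import boto3

    s3 = boto3.client("s3")
    BUCKET, KEY = "my-bucket", "table/part-0000.bin"    # placeholders
    RANGE_SIZE = 8 * 1024 * 1024                        # ~8 MB per request

    def fetch_range(offset):
        resp = s3.get_object(Bucket=BUCKET, Key=KEY,
                             Range=f"bytes={offset}-{offset + RANGE_SIZE - 1}")
        return offset, resp["Body"].read()

    size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
    with ThreadPoolExecutor(max_workers=32) as pool:
        chunks = dict(pool.map(fetch_range, range(0, size, RANGE_SIZE)))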



You won’t be throttled if you partition your data “correctly” and have a reasonable amount of data in your bucket. That’s difficult to do, but more than possible.


We already partition our data correctly, as the AWS solution architects recommend, but there are limits on:

- total throughput (100 Gbit/sec);

- the number of requests.
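
For a sense of scale, a rough calculation (the 8 MB range size per GET is an assumption; ~5,500 GET/s is the documented per-prefix request limit):

    # With large ranged reads, the 100 Gbit/sec throughput ceiling is reached long
    # before the per-prefix request limit, so adding prefixes doesn't help past it.
    throughput_gbit = 100
    bytes_per_sec = throughput_gbit / 8 * 1e9       # 12.5 GB/s
    range_size = 8 * 1024 * 1024                    # 8 MB per GET (assumed)
    print(bytes_per_sec / range_size)               # ~1,490 requests/s, well under 5,500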

For example, at this moment I'm running an experiment: creating 10, 100, and 500 ClickHouse servers and reading data from S3, either from a MergeTree table or from a set of files.
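
Very roughly, each instance in such a test issues queries like the following (a sketch using clickhouse-driver; the bucket path, file format, and table name are placeholders, not the actual benchmark setup):

    from clickhouse_driver import Client

    client = Client("localhost")

    # Read a set of files directly via the s3 table function...
    rows_files = client.execute(
        "SELECT count() FROM s3("
        "'https://my-bucket.s3.amazonaws.com/data/*.parquet', 'Parquet')"
    )

    # ...or read from a MergeTree table whose data is stored on S3.
    rows_mt = client.execute("SELECT count() FROM default.hits")
    print(rows_files, rows_mt)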

Ten instances saturate the 100 Gbit/sec of bandwidth, and there is no further improvement beyond that.

JFYI, 100 Gbit/sec is less than what one PCIe slot with a few M.2 SSDs can give.
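
Rough arithmetic behind that comparison (the per-drive figure is an approximate PCIe 4.0 x4 number, not a measurement):

    network_gbs = 100 / 8                 # 100 Gbit/s network = 12.5 GB/s
    nvme_gbs = 7.0                        # one PCIe 4.0 x4 M.2 SSD, ~7 GB/s reads
    print(2 * nvme_gbs > network_gbs)     # True: two such SSDs already exceed it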


Ahh yes, sorry, you are running it on a single instance. That’s capped, but the aggregate throughput across instances can be a lot higher.

There are request limits, but these are per partition. There is also an undocumented hard cap on LIST requests per second, which is much lower than the one for GET object.


No, I'm creating 10, 100, and 500 different EC2 instances and measuring the aggregate throughput.


Worth noting that ClickHouse is a distributed MPP DBMS; it can scale up to thousands of bare-metal servers and process terabytes (not terabits) of data per second.

It also works on a laptop, or even without installation.


100 instances of m5.8xlarge, 500 instances of m5.2xlarge


Something must be wrong; at my previous job we were able to achieve a much higher aggregate throughput. Are you spreading them across AZs, and are you using a VPC endpoint?

We did use multiple VPCs; that might make a difference.


Yes, the current experiment was run from a single VPC. That should explain the difference.


Why not randomise VPCs between instances? It’s fairly easy to do with a fixed pool and random selection.
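
Something along these lines, assuming the VPCs and one subnet per VPC already exist (a sketch with boto3; all IDs are placeholders):

    # Launch each instance into a subnet picked at random from a fixed pool,
    # with the subnets spread across the pre-created VPCs.
    import random
    import boto3

    ec2 = boto3.client("ec2")
    SUBNET_POOL = ["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"]  # one per VPC

    def launch(ami, instance_type):
        return ec2.run_instances(
            ImageId=ami,
            InstanceType=instance_type,
            MinCount=1,
            MaxCount=1,
            SubnetId=random.choice(SUBNET_POOL),   # spread instances across VPCs
        )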





