qoega's comments | Hacker News

It is meant for a single reader/writer workload, so it is not meant to be used as a service.


In ClickHouse it is just `INSERT INTO t FROM INFILE 'data.csv.gz'`. Any supported format and compression, autodetected from the file name and a sample of the data to get column types, delimiters, etc. Separate tools to convert CSV are not necessary if you can just import into the DB and export as SQL statements.

echo "name,age,city John,30,New York Jane,25,Los Angeles" > example.csv

clickhouse local -q "SELECT * FROM file('example.csv') FORMAT SQLInsert"

INSERT INTO table (`name`, `age`, `city`) VALUES ('John', 30, 'New York'), ('Jane', 25, 'Los Angeles');
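For completeness, a minimal sketch of the INFILE path mentioned above, assuming a running server reachable by clickhouse-client; the table `t` and its schema are hypothetical, chosen to match the example.csv created here:

# a hypothetical target table for the example.csv created above
clickhouse client -q "CREATE TABLE t (name String, age UInt8, city String) ENGINE = MergeTree ORDER BY name"

# compress the sample file; format and compression are inferred from the file name
gzip -k example.csv
clickhouse client -q "INSERT INTO t FROM INFILE 'example.csv.gz' FORMAT CSVWithNames"

# export back out as SQL statements
clickhouse client -q "SELECT * FROM t FORMAT SQLInsert"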


Are you guys comparing 16 vCPU/32 GB vs 8 vCPU/32 GB and saying yours is only 1.6x faster?


Hi there, CEO of Tablespace here. No, we are comparing the 16 CPU/32 GB shape that Tablespace uses to the 24 CPU/96 GiB shape that ClickHouse Cloud uses. Tablespace is 1.6x faster running the benchmark even though we are using a much smaller compute shape.


Now you rarely use the basic MapReduce primitives; you have another layer of abstraction that runs on the infrastructure that used to run MR jobs. This infrastructure can efficiently allocate compute resources for "long"-running tasks in a large cluster with respect to memory/CPU/network and other constraints. So basically the schedulers of MapReduce jobs and the cluster management tools became that good because the MR methodology had trivial abstractions but required an efficient implementation to make it work seamlessly.

Abstraction layers on top of this infrastructure can now optimize the pipeline as a whole by merging several steps into one when possible and adding combiners (partial reduce before the shuffle). This requires the whole processing pipeline to be defined in more specific operations. Some of these layers propose using SQL to formulate the task, but it can be done with other primitives. And given such a pipeline, it is easy to implement optimizations that make the whole system much more user-friendly and efficient than MapReduce, where the user has to think about all the optimizations and implement them inside individual map/reduce/(combine) operations.
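As a hedged illustration of that point: the classic word count, which in raw MapReduce needs a hand-written mapper, combiner, and reducer, collapses into one declarative query that the layer can plan with partial aggregation before the shuffle. The sketch below uses ClickHouse's SQL dialect, and `docs` with a String column `line` is a hypothetical input table:

-- Word count expressed declaratively; the engine decides where to
-- map (tokenize), combine (partial counts per node), shuffle, and reduce.
SELECT word, count(*) AS cnt
FROM
(
    SELECT arrayJoin(splitByChar(' ', line)) AS word
    FROM docs
)
GROUP BY word
ORDER BY cnt DESC;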


You can even migrate your ZooKeeper to ClickHouse Keeper. It requires a small downtime, but you will have all your ZooKeeper data inside, and your clients will just work once your Keeper is back up.
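Roughly, the migration goes through the keeper converter tool. A minimal sketch, assuming the usual default ZooKeeper data paths; check the current ClickHouse docs for the exact flags and the full procedure:

# stop ZooKeeper so its snapshot/log dirs are consistent, then convert them
# into a ClickHouse Keeper snapshot (paths below are common defaults)
clickhouse-keeper-converter \
    --zookeeper-logs-dir /var/lib/zookeeper/version-2 \
    --zookeeper-snapshots-dir /var/lib/zookeeper/version-2 \
    --output-dir /var/lib/clickhouse/coordination/snapshots

# start clickhouse-keeper pointed at that snapshot directory; clients
# reconnect to the same host:port and see the migrated data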


Did not expect to see an issue I created here.


Open-source ClickHouse also handles both real-time and large historical data.


QuestDB, kdb+, and the others mentioned are more geared toward time-series workloads, while ClickHouse is more toward OLAP. There are also exciting solutions on the streaming side of things, such as RisingWave.


I think atwong is just promoting his product: https://news.ycombinator.com/threads?id=atwong


It is nice to have an image of the expected dashboard in the README.


Passion and experience

