Ask HN: Free and open source distributed database written in C++ or C
50 points by _448 on May 17, 2022 | 58 comments
Any recommendation for a free and open source distributed database that is written in C++ or C?



ClickHouse for OLAP and YDB for OLTP are both written in C++. Both are used at production scale at Yandex. ClickHouse has been around for a long time, while YDB is pretty new.

Clickhouse - https://github.com/ClickHouse/ClickHouse

ydb - https://github.com/ydb-platform/ydb


Now this looks much better than the MySQL/Postgres cluster solutions. Do you know of any comparisons with those two bad ones?


Here's a query on Carnegie Mellon's "Database of Databases" that might help you find some answers:

https://dbdb.io/browse?programming=c&programming=cplusplus&s...

(note that this doesn't filter on license -- I couldn't see a way to add a negative-filter on proprietary licensing -- so you may have to inspect a few of the results to confirm)


Yes, I need to fix the search. It's quite primitive right now.


You could post it on DoltHub… that would give you a web interface with SQL support, version tracking and a way to receive community edits.

https://www.dolthub.com/


Went searching for https://redpanda.com/ and couldn't find it in the DB, fyi :)

Not affiliated, just saw the Jepsen report posted here a few weeks ago.


HN would probably come up with better answers if you told us the problem you're trying to solve instead of the solution you seek.


How about assuming that op is asking exactly the thing they want instead of thinking they have asked the wrong question.


I never said they asked the wrong question. I just said we could give them better answers if we had the context of what problem they are solving.


I personally wish people would not ask that question. Rarely is context required. The question is often very clear. And when one does provide context, the most common response is a value judgement on the question -- "why would you want that". It's honestly a waste of everyone's time.

Wish people would just move on if they see a question that they don't know the answer to.


The person asked an easily google-able question, so I assume they want an answer deeper than a simple list. The only way to give that answer is with more context.


https://www.yugabyte.com - Postgres fronted distributed OLTP database

I work there (300+ ppl). It's primarily open source. It makes money through support contracts and cloud hosting, but all core development is done in the open.


Never used it before but check out https://github.com/scylladb/scylla

I have a lot of experience with Cassandra though.


Scylla's main competitor in replacing C* with native code.

https://github.com/aerospike/aerospike-server


Yugabyte is an open source distributed database that has a Postgres compatible query layer as well as a Cassandra compatible one.

https://www.yugabyte.com/


It is like Cloud Spanner, but open source.


What are you trying to do, and why the C++ requirement?


Isn't a requirement. OP stated C is acceptable.



Erlang's Mnesia has parts in C; well the underlying databases ets and dets are in C, the distributed part is in Erlang.


WhatsApp used (still uses?) Mnesia.



MySQL/Postgres? What am I missing?


The "distributed" part.

While distributed could mean a lot of different things, I'm guessing the OP wants either horizontal scalability, or bidirectional replication between multiple regions/datacenters. And while it's technically possible to do those with mysql or PostgreSQL, they definitely don't work out of the box, and require a lot of careful architecting, and it isn't really how they were originally designed to be used.


Also technically Redis? (Its kind of distributed might not be the author's version of distributed.)


Redis has clustering capabilities right out of the box. So do MySQL and MariaDB. Postgres needs an extension like Citus, however.


I don’t think they’re distributed without third party extensions?


Does PostgreSQL not fit your needs?


Don't forget the spatial extensions => https://postgis.net/


you're getting downvoted, but Citus would probably fit the bill here.


MySQL InnoDB Cluster? https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-cluster... targets C++17 nowadays



PostgreSQL with Patroni and Citus


What are the other aspects?


Surprised MongoDB hasn't been mentioned yet. It's C++


It is not FOSS.


Yes it is




I don’t think you read that entire article (or even looked at the sidebar). SSPL is copyleft, but not FOSS.

From that article:

> In January 2021, following the re-licensing move by Elastic, OSI released a statement declaring that the SSPL does not comply with its Open Source Definition because it discriminates against specific fields of endeavor, describing it as a "fauxpen" source license.

> Debian FSG compatible No

> OSI approved No

> GPL compatible No


I read it

I don't hold to the extremes OSI does as to whether something fits their esoteric view of "open source", however


It's not just OSI. Debian, FSF, etc. all don't see that as free software specifically because it restricts the ways that you can use the software - thereby defeating the point of having the source code in the first place.

That's no different than some EULA saying you can't do xyz with this software without purchasing a $xxx,xxx license from the vendor or whatever: it's the same effect. It's an artificial boundary put in place that restricts the user from doing something.

I don't find that particular view "esoteric" and I'm curious about why you do.


Exactly. It is fine to call it "source available" or something.




Looks abandoned. The site with the documentation is down. Here's the docs on archive.org, last up in 2012...: https://web.archive.org/web/20210109022446/https://github.co...


Redis?


Check out the CMU Database Group on YouTube; they feature a lot of databases.


Don't most of them match your specs?


Doesn't exist.


I have one in Java: https://github.com/tinspin/rupy

(Which is better for a server: Java does not crash, and it can do atomic shared parallelism between cores with OO code, something you would need to use arrays to achieve in C/C++ without VM + GC.)

Here is the 2000 lines of code of the entire database: http://root.rupy.se/code?path=/Root.java&space

Darkmode: http://edit.rupy.se/?path=/Root.java&space

And here you can try it out: http://root.rupy.se


> Java does not crash and it can do atomic shared parallelism between cores with OO code, something you would need to use arrays to achieve in C/C++ without VM + GC.

There is no universe where any of this is true. It doesn't even make sense in any coherent way.

https://en.cppreference.com/w/cpp/atomic/atomic
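For anyone following along, here is a minimal sketch of what the std::atomic page linked above provides: an ordinary C++ object with an atomic member, shared by several threads, with no VM or GC involved. HitCounter and count_in_parallel are hypothetical names made up for this illustration, not code from anyone in the thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical example class: a plain object whose state is one atomic member.
class HitCounter {
public:
    // Atomically increments the counter; safe to call from many threads at once.
    void record() { hits_.fetch_add(1, std::memory_order_relaxed); }
    long total() const { return hits_.load(std::memory_order_relaxed); }

private:
    std::atomic<long> hits_{0};
};

// Hypothetical helper: starts `threads` workers that each call record()
// `per_thread` times on one shared HitCounter, then returns the total.
long count_in_parallel(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    HitCounter counter;
    std::vector<std::thread> workers;
    workers.reserve(threads);
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.record();
            }
        });
    }
    // join() synchronizes with each worker, so the final load sees every increment.
    for (auto& w : workers) {
        w.join();
    }
    return counter.total();
}
```

Calling count_in_parallel(4, 10000) returns 40000 on every run, because each increment is a single atomic read-modify-write on the shared object.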


Have you even used Java?

"While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing"." - Martin Fowler https://martinfowler.com/bliki/OOPSLA2005.html

"Many lock-free structures offer atomic-free read paths, notably concurrent containers in garbage collected languages, such as ConcurrentHashMap in Java. Languages without garbage collection have fewer straightforward options, mostly because safe memory reclamation is a hard problem..." - Travis Downs https://travisdowns.github.io/blog/2020/07/06/concurrency-co...

"Inspired by the apparent success of Java's new memory model, many of the same people set out to define a similar memory model for C++, eventually adopted in C++11." - https://research.swtch.com/plmm


Copy and paste actual lines that can't be done in C++. Show me something that is 12 times faster in java or explain in detail the underlying mechanics of something that can't be done in C++.

You said C++ can't do atomics, which is like staring at paint and saying it can't paint in the color that it is. What is it that you think the atomic C++ functions do?

This is all before even talking about whatever you were saying about "needing arrays in C++" or needing VMs, which is so bizarre I barely even know where to start.

Why do you say things like this if you can't actually explain them?


Because I have not written my own memory model, why are you refuting them when you cannot explain why they are wrong?

What I mean with C++ not being able to do atomic parallelism on OO is that you need a VM with GC to do that.

The best way to do atomic (and avoid cache misses in one go) is to use primitive arrays (char/int/float) so C++ has zero value, I'm going back to C.

The guy quoted in the first quote is the professor that rewrote the whole JVM with a new memory model to add the concurrent package in 1.5; if you don't believe him, you are on your own.

I'm still waiting for C++ guys to actually use Java on the server and still claim C++ is better. crickets


> Because I have not written my own memory model,

What does that even mean? A memory model isn't a library.

> why are you refuting them when you cannot explain why they are wrong?

There is no evidence to refute. You keep making claims with no explanation, no reasoning and no evidence at all. This isn't how reality works. When you claim something the burden of proof is on you. Extraordinary claims (like java being 12 times faster than C or C++) require extraordinary evidence. You gave zero evidence.

> The best way to do atomic (and avoid cache misses in one go) is to use primitive arrays (char/int/float)

This is you repeating the same thing (even though it makes no sense), this isn't evidence.

This isn't even how CPUs or their instructions work. Data types only matter in atomic math instructions on integers, but you lumped floats in there for some reason too. Arrays have nothing to do with it (atomic should have been a clue). Why do you believe this?
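As a rough illustration of that point, here is a sketch, assuming a C++11 or later compiler, that updates a single float atomically with a compare-exchange retry loop and no array anywhere. atomic_add and sum_in_parallel are hypothetical names for this example only; C++20 also offers fetch_add on std::atomic<float>, which this sketch does not rely on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: atomically adds `delta` to `target` by retrying a
// compare-exchange until no other thread has changed the value in between.
float atomic_add(std::atomic<float>& target, float delta) {
    float expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(expected, expected + delta,
                                         std::memory_order_relaxed)) {
        // On failure `expected` has been refreshed with the current value; retry.
    }
    return expected + delta;
}

// Hypothetical helper: `threads` workers each add 1.0f `per_thread` times
// to one shared atomic float, then the final value is returned.
float sum_in_parallel(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<float> total{0.0f};
    std::vector<std::thread> workers;
    workers.reserve(threads);
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&total, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                atomic_add(total, 1.0f);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return total.load();
}
```

With small whole-number totals such as sum_in_parallel(4, 1000), every intermediate value is exactly representable as a float, so the result is 4000.0f with no lost updates.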

> so C++ has zero value, I'm going back to C.

What is it that makes any difference here? Why can't you demonstrate anything with an actual program? Use godbolt.org and you can prove what you're saying.

> The guy quoted in the first quote is the professor that rewrote the whole JVM with a new memory model to add the concurrent package in 1.5; if you don't believe him, you are on your own.

I believe actual evidence. You seem to have so little understanding of these things that if this person knows what they are talking about, you must not understand what they are saying. Show something instead of making insane claims.

> I'm still waiting for C++ guys to actually use Java on the server and still claim C++ is better. crickets

I use C++ 'on the server' but it's bizarre that you think this has anything to do with the conversation. The things you keep saying are not just technically incorrect, they have no connection to how anything actually works.


The reason I'm preaching without proof is that I have run a service with 350.000 customers (1100 concurrent) that proved all this.

I don't need to prove anything to myself, when I know it is true because I witnessed it.

You have sunk cost, so obviously you want me to prove that I'm right, but I don't have the time. It's too complex, and time will solve all proofs automatically.


Thanks. I was looking for something written in Java. I'll check it out.



