
hivacruz · 2025-01-23T10:27:54 1737628074

I don't think it is "focused" on the JSON part of Redis; they have added a lot of commands from Redis, not only JSON. They are also working on storing vectors, for instance.

As for compatible drivers, if you mean "clients", in our case we use the official Python Redis client with Kvrocks and it works perfectly with the commands we use.
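
To make the client point concrete, here is a minimal sketch of what that can look like. The host and port are assumptions (6666 is the port in the sample Kvrocks config as far as I know, so adjust to your setup), `connect` and `smoke_test` are hypothetical helper names, and it only exercises SET/GET/DEL rather than everything a real workload would use.

```python
# Sketch: the official redis-py client pointed at a Kvrocks server.
# Assumptions: redis-py is installed and Kvrocks listens on localhost:6666.


def connect(host="localhost", port=6666):
    """Return a redis-py client; the import is local so the module loads without redis-py."""
    import redis  # pip install redis

    return redis.Redis(host=host, port=port, decode_responses=True)


def smoke_test(client, key="demo:greeting", value="hello"):
    """Write a key, read it back, delete it; return True if the value round-tripped."""
    client.set(key, value)
    got = client.get(key)
    client.delete(key)
    # redis-py returns bytes unless decode_responses=True was passed.
    if isinstance(got, bytes):
        got = got.decode()
    return got == value
```

Calling `smoke_test(connect())` against a running instance should return `True` if the basic commands behave like they do on Redis.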

hivacruz · 2025-01-23T10:22:26 1737627746

We have been using it in production for a year now, and so far so good. We could no longer cope with running huge instances with a lot of RAM just to hold data in Redis.

It looks like a normal database to me: disk storage for most of the data, and some cache in memory to speed up read queries. Everything is customizable.

I'm still waiting for a nice way to deploy it in a Kubernetes cluster, like a Helm Chart to easily set up a cluster with primaries and replicas. Also, the lack of key eviction such as LRU is problematic for us in some cases; it would be a nice addition.

mnutt · 2025-01-23T13:49:54 1737640194

I’ve been testing kvrocks as a replacement for a Scylla use case where we have TBs of data and uniform TTLs. In many of these LSM dbs, compaction is the thing that really kills throughput, making TTLs (or really any updates/deletes) difficult.

KVRocks doesn’t expose it directly right now so it requires code changes, but so far I have had good success with FIFO compaction. When an sstable gets old enough, it just gets dropped.

hivacruz · 2025-01-01T08:49:52 1735721392

I guess a senior engineer might be "linked" to a single kind of task (backend, frontend, etc.) while a staff engineer has knowledge in a lot of domains and can be the "bridge" for projects that need people from many different teams.

throw5959 · 2025-01-01T09:07:56 1735722476

Nonsense.

hivacruz · 2024-12-30T11:10:20 1735557020

Instead of taking an image from the video every 5 seconds and embedding it, you could detect when there are enough changes between frames to decide whether to embed or not. One frame, one scene, one vector.

For instance, FFmpeg can do that with the filter `select=gt(scene,0.3)`. It selects the frames whose scene detection score is greater than 0.3 (scene change detection scores are values between 0 and 1).

https://ffmpeg.org//ffmpeg-filters.html#select_002c-aselect
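
A rough sketch of how that filter could be driven from a script, assuming `ffmpeg` is on the PATH. The function names and the output pattern are made up for illustration, and `-vsync vfr` is my addition to avoid duplicated frames in the output (newer FFmpeg builds prefer `-fps_mode vfr`).

```python
import subprocess


def scene_frames_cmd(src, out_pattern="scene_%04d.png", threshold=0.3):
    """Build an ffmpeg command that writes one image per frame passing the scene filter."""
    return [
        "ffmpeg",
        "-i", src,
        # The inner single quotes protect the comma from the filtergraph parser.
        "-vf", f"select='gt(scene,{threshold})'",
        # Variable frame rate output, so dropped frames are not padded with duplicates.
        "-vsync", "vfr",
        out_pattern,
    ]


def extract_scene_frames(src, **kwargs):
    """Run the command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(scene_frames_cmd(src, **kwargs), check=True)
```

Each image written this way would then be embedded once, instead of sampling on a fixed 5-second interval.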

sunnybeetroot · 2024-12-30T12:15:23 1735560923

I didn’t know this existed but it sounds very handy, thanks!

okigan · 2024-12-31T07:12:07 1735629127

Don’t you need to apply filtering to the frame selection based on scene score?

Otherwise you’d select frames with 0.3, 0.7, 1.0, 0.7, 0.3 - selecting 5 frames instead of 1?

Two pass with sobel filter comes to mind.
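
One simple way to handle that case, if per-frame scores are available as a list: keep only the peak of each run of consecutive frames above the threshold. This is just a sketch of that idea, not the two-pass Sobel approach, and `pick_scene_frames` is a hypothetical helper.

```python
def pick_scene_frames(scores, threshold=0.3):
    """Return one frame index per run of consecutive scores above threshold (the run's peak)."""
    picks = []
    best = None  # index of the highest score seen in the current run
    for i, score in enumerate(scores):
        if score > threshold:
            if best is None or score > scores[best]:
                best = i
        elif best is not None:
            # The run just ended: keep its peak and start over.
            picks.append(best)
            best = None
    if best is not None:
        picks.append(best)  # run that lasts until the final frame
    return picks
```

For scores like 0.3, 0.7, 1.0, 0.7, 0.3 this keeps only the frame with score 1.0, since 0.3 itself does not pass a strict "greater than 0.3" test.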

Beefin · 2024-12-30T22:17:04 1735597024

thought the same thing and wrote this: https://blog.mixpeek.com/dynamic-video-chunking-scene-detect...

which uses https://github.com/Breakthrough/PySceneDetect

under the hood i'm sure it's the same ffmpeg method ;)

hivacruz · on May 29, 2024

Did you use the same method, i.e. split each article into chunks and vectorize each chunk?

dudus · on May 29, 2024

That's the only way to do it. You can't index the whole thing. The challenge is chunking. There are several different algorithms to chunk content for vectorization with different pros and cons.
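
As an illustration of the simplest of those algorithms, here is a sketch of fixed-size chunking with overlap, counted in whitespace-separated words. The sizes are arbitrary assumptions; real pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into chunks of up to `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

The overlap is the usual trade-off: it costs extra vectors but lowers the chance that a sentence is cut in half at a chunk boundary.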

minimaxir · on May 29, 2024

You can do much bigger chunks with models that support RoPE embeddings, such as nomic-embed-text-v1.5 which has an 8192 context length: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5

In theory this would be an efficiency boost but the performance math can be tricky.

qudat · on May 29, 2024

As far as I understand it, context length degrades llm performance, so just because an llm "supports" a large context length it basically just clips a top and bottom chunk and skips over the middle bits.

rahimnathwani · on May 29, 2024

Why would you want chunks that big for vector search? Wouldn't there be too much information in each chunk, making it harder to match a query to a concept within the chunk?

nostrebored · on May 30, 2024

The problem is that often semantic meaning depends on state multiple paragraphs or sections away.

This is a coarse way to tackle that

gfourfour · on May 29, 2024

hivacruz · on April 10, 2024

PHP is a great language to learn OOP, classes, interfaces, abstract classes, traits, managing dependencies and unit tests. I'm not using it anymore but I learned basically everything with it a decade ago. Thanks PHP!

zelphirkalt · on April 10, 2024

A good language to learn OOP is Pharo or Smalltalk.

hivacruz · on April 4, 2024

How does it compare to Kvrocks, which uses RocksDB as the storage backend too?

https://github.com/apache/kvrocks/

SableDb · on April 4, 2024

Also, it's worth mentioning that kvrocks is more mature and supports many more commands than what SableDb currently supports.

SableDb · on April 4, 2024

It performs better and uses different design choices (for example: SableDb uses tokio's local task per connection, and in general it uses green threads to make the code more readable and easy to maintain).

I will release some design documents later on (hopefully this month). Remember that this is a one-man project (hopefully, not for long), so it takes time to organize everything :)

ChocolateGod · on April 4, 2024

I did some rookie testing between KVRocks and sableDB using Redis Benchmark

KVRocks

  PING_INLINE: 171821.30 requests per second, p50=0.183 msec
  PING_MBULK: 173310.22 requests per second, p50=0.191 msec
  SET: 115074.80 requests per second, p50=0.399 msec
  GET: 163398.70 requests per second, p50=0.271 msec
  INCR: 110741.97 requests per second, p50=0.415 msec
  LPUSH: 89847.26 requests per second, p50=0.487 msec
  RPUSH: 94428.70 requests per second, p50=0.487 msec
  LPOP: 86880.97 requests per second, p50=0.535 msec
  RPOP: 88339.23 requests per second, p50=0.527 msec

SableDB

  PING_INLINE: 90744.10 requests per second, p50=0.279 msec
  PING_MBULK: 90826.52 requests per second, p50=0.279 msec
  SET: 85763.29 requests per second, p50=0.311 msec
  GET: 87336.24 requests per second, p50=0.295 msec
  INCR: 68775.79 requests per second, p50=0.663 msec
  LPUSH: 36589.83 requests per second, p50=1.031 msec
  RPUSH: 38299.50 requests per second, p50=1.135 msec
  LPOP: 38051.75 requests per second, p50=1.191 msec
  RPOP: 37383.18 requests per second, p50=1.143 msec

KVRocks seems faster but certainly not a bad start
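
To put a number on "faster", here is a small sketch that parses lines in the format above and computes per-command throughput ratios. Only three lines from each run are pasted in as sample input, and `parse` and `ratios` are hypothetical helper names.

```python
import re

# Matches lines such as "  SET: 115074.80 requests per second, p50=0.399 msec"
LINE = re.compile(r"^\s*(\w+):\s+([\d.]+) requests per second")

KVROCKS = """
  SET: 115074.80 requests per second, p50=0.399 msec
  GET: 163398.70 requests per second, p50=0.271 msec
  LPUSH: 89847.26 requests per second, p50=0.487 msec
"""

SABLEDB = """
  SET: 85763.29 requests per second, p50=0.311 msec
  GET: 87336.24 requests per second, p50=0.295 msec
  LPUSH: 36589.83 requests per second, p50=1.031 msec
"""


def parse(report):
    """Map command name to requests per second for every matching line."""
    result = {}
    for line in report.splitlines():
        match = LINE.match(line)
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


def ratios(a, b):
    """Throughput of `a` divided by throughput of `b`, for commands present in both."""
    return {cmd: a[cmd] / b[cmd] for cmd in a if cmd in b}


for cmd, ratio in ratios(parse(KVROCKS), parse(SABLEDB)).items():
    print(f"{cmd}: {ratio:.2f}x")
```

A single run like this says little about tail latency or about build and thread settings, so the ratios are only a rough first impression.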

SableDb · on April 4, 2024

It would be worth sharing the build configuration (e.g. did you make sure to build `sabledb` in release mode?), the thread configuration, etc.

vlovich123 · on April 4, 2024

How well does raw Redis and/or raw RocksDB perform on your machine?

theossuary · on April 4, 2024

I like the idea of doing thread-local execution of Tokio tasks; I assume that means SableDb is mostly single-threaded? Was this to reduce complexity, or for some other reason? I'm looking forward to the design doc on this!

SableDb · on April 4, 2024

It is multi-threaded (configurable: you can set it to a specific number in the configuration file, or use the magic value 0, where SableDb decides based on the number of cores divided by 2).

Each incoming connection is assigned to a worker thread, and two tokio tasks are created for the connection (one for reading and another for writing).

Using tokio allowed me to use the `async` code without using "callback hell" so the code looks clean and readable in a single glance without the need to follow callbacks
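
For readers who don't know tokio, the shape of "one reader task and one writer task per connection" can be sketched with Python's asyncio. This is only an analogy under my own assumptions, not SableDb's Rust code: the request list stands in for a socket and the function names are invented.

```python
import asyncio


async def read_loop(incoming, outbox):
    """Reader task: take each request and queue a reply for the writer."""
    for request in incoming:
        await outbox.put(f"+{request.upper()}")
        await asyncio.sleep(0)  # yield so the writer task can run
    await outbox.put(None)  # sentinel: no more replies


async def write_loop(outbox, sent):
    """Writer task: drain the queue in order until the sentinel arrives."""
    while True:
        reply = await outbox.get()
        if reply is None:
            break
        sent.append(reply)  # a real server would write to the socket here


async def handle_connection(incoming):
    """Run both tasks for one connection and return what was 'sent'."""
    outbox = asyncio.Queue()
    sent = []
    await asyncio.gather(read_loop(incoming, outbox), write_loop(outbox, sent))
    return sent
```

Running `asyncio.run(handle_connection(["ping", "get"]))` walks both loops to completion; each one reads top to bottom with no callbacks, which is the readability point being made above.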

super_user · on April 4, 2024

Hi SableDb. I am looking for a tech cofounder in databases. Probably not the best place to ask for a cofounder. :-) Regardless, would you be interested?

scottlamb · on April 4, 2024

You might as well post it to the discussion of this article about why you won't find a technical co-founder. <https://news.ycombinator.com/item?id=39902372>

vlovich123 · on April 4, 2024

I’m potentially interested in a cofounder for my DB. Can you ping me on gmail to connect (username in profile)?

hivacruz · on April 2, 2024

Not affiliated (not my first comment about this) but we are using KVRocks[1] for now at work, which is based on RocksDB by Meta and it works nicely. Developers are nice and reactive and the Redis commands support is large.

We picked this project because of our RAM usage that was exploding with Redis.

The only downside for us right now is the Kubernetes support. There is an operator and a controller being made, but no Helm Chart yet to easily deploy Kvrocks with a master and replicas. That would be awesome.

[1]: https://github.com/apache/kvrocks

hivacruz · on March 31, 2024

For a few hires, we asked the candidates to create a front-end app like this with React. It was quite nice as we could quickly see how they use the library, what they know, etc.

Simple app but funny game.

hivacruz · on March 31, 2024

As an alternative YouTube Front-end, there is also Piped[0][1]

[0]: https://github.com/TeamPiped/Piped

[1]: https://piped.video