
While not strictly for RDBMS, I think this book is pretty close!

https://www.databass.dev/


Just wanted to say thank you for this article - I've read and shared this a few times over the years!


When Breath Becomes Air by Paul Kalanithi.

A fascinating memoir by a philosopher turned brain surgeon facing a terminal cancer diagnosis: a person who spent his entire life pondering what makes life meaningful, suddenly confronted with his own mortality.

I reread it once a year, at minimum. A deeply moving book.


I was a labmate of Paul when I was starting my PhD. He was an incredible human, and such a fantastic writer.


It's a fantastic memoir indeed, very moving.

Love this quote from the book: "You can't ever reach perfection, but you can believe in an asymptote toward which you are ceaselessly striving".


Isn't that just a more verbose way of saying "You can reach for perfection"?


Everything good’s already been said. All that’s left is just a wordier retelling.


Like Dr. Wilson from Dr. House.


My systems programming teacher spent an entire week of a 12-week semester (3 classes) going over how to use gdb/vim for one of our projects.

Still my favorite class and professor from my time at school.


Can anybody speak to how Vespa compares to some other vector database solutions? It seems like there are so many options today.


Disclaimer: I work on Vespa.

If you're looking for just pure vector similarity search, there are many alternatives. But Vespa's tensor support, multi-vector indexing, and the ability to express models like ColBERT (1) or cross-encoders make it stand out if you need to move beyond pure vector search.

Plus, for RAG use cases, it's a full-blown text search engine as well, allowing hybrid ranking combinations. Also, with many pure vector databases like Pinecone, you cannot describe an object with more than one vector: if you have different vector models for the object, you need different indexes, and then you have to duplicate metadata across those indexes (if you need filtering + vector search).
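
For a flavor of what a hybrid query looks like, here's a minimal Python sketch against Vespa's HTTP search API. The schema name, "embedding" field, and "hybrid" rank profile are illustrative, not from a real deployment:

    import requests

    # Combine text matching (userQuery) with approximate nearest-neighbor
    # search over a vector field; a rank profile blends the two scores.
    response = requests.post(
        "http://localhost:8080/search/",
        json={
            "yql": "select * from doc where userQuery() or "
                   "({targetHits: 100}nearestNeighbor(embedding, q))",
            "query": "pretrained transformer language models",
            "input.query(q)": [0.1] * 384,  # stand-in for a real embedding
            "ranking": "hybrid",            # assumed rank profile name
        },
    )
    print(response.json())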

1 https://blog.vespa.ai/pretrained-transformer-language-models...


You might want to add a \ before the asterisk.


You could also guess that the blackout has changed how their normal traffic patterns behave, causing issues with autoscaling/hot partitions.


Yeah, my assumption was that something in some layer of their application isn't well optimized when asked to return posts from a subreddit that has "gone dark" in whatever fashion the mods chose to do that.

For example, maybe it causes reads from the database to take a lot longer than they normally would, locking up the database or causing the process to crash (again, that's pure speculation).


One thing I've been wondering about is user overview pages. People use those a lot (it's actually my bookmark for getting onto reddit), and yesterday I noticed that a post I made wasn't in my overview; it's because that sub had gone dark early.

What happens when a user has 99% of their posts in subs that are now hidden, and the API is programmed to produce a fixed 30 comments of history on the overview page? The answer is extremely deep database pulls (see the sketch at the end of this comment)... you might pull a year of comment history to get 30 comments that aren't hidden. And depending on how they do that, it may actually pull the whole comment history for that timespan, since most of the time posts aren't hidden like this.

I worked on a backend team with some very overburdened legacy tables in mongo, and this is the kind of thing we'd think about. Yeah, you can use an index, but then you have to maintain the index for every record and change it every time a sub goes private/public (we were literally hitting practical limits on how many indexes we could keep; we finally instituted a 1-in-1-out rule). And how often does that happen? Even deleted comments are probably a small enough minority overall that indexes don't matter. But this is relational data: you have to know which subreddits are closed before you can filter the results, and mongo sucks at joins. And the mongo instance can become a hotspot, so you just filter it in the application instead for those "rare" instances. Even if they are doing it in mongo, the index/collection they're joining may suddenly be 100x the size, which could blow stuff up anyway.

edit: for me, one overview page is now taking me back one month in comment history. And I comment a lot on subs that are currently closed, so it could easily be throwing away 5-10 comments for every comment it displays.
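
To make the "deep pull" shape concrete, here's a rough Python sketch of the application-side filtering I mean, in pseudo-pymongo (the collection and field names are made up):

    # Assumed schema: a comments collection with author/subreddit/created
    # fields; hidden_subs is the set of subs that have gone private.
    def visible_overview(db, user_id, hidden_subs, page_size=30, batch_size=100):
        visible, offset = [], 0
        while len(visible) < page_size:
            batch = list(db.comments.find({"author": user_id})
                                    .sort("created", -1)
                                    .skip(offset)
                                    .limit(batch_size))
            if not batch:
                break  # ran out of comment history entirely
            visible += [c for c in batch if c["subreddit"] not in hidden_subs]
            offset += batch_size
        # If 99% of recent comments are hidden, this scans roughly
        # 100x page_size rows before returning: the deep pull.
        return visible[:page_size]

And skip() itself gets more expensive the deeper it goes, which compounds the problem.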


I'm guessing a hit on an open subreddit mostly goes directly out of the caching layer, while a hit on a private one incurs a DB hit to check whether the user belongs there.
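
Something like this cache-aside shape, I'd guess (all the names here are hypothetical, not Reddit's actual code):

    def fetch_listing(cache, db, subreddit, user_id):
        entry = cache.get(subreddit)  # shared cache entry, e.g. memcached
        if entry is not None and not entry["private"]:
            return entry["posts"]  # open sub: cheap, shared across users
        # Private (or uncached) sub: a shared cached answer won't do, so
        # pay a DB round trip for a per-user membership check first.
        if not db.is_member(subreddit, user_id):
            raise PermissionError("private subreddit")
        return db.load_posts(subreddit)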



I like this viewpoint when you're a small company searching for PMF and your entire backend can fit into a small DB.

At some point, once that phase is over, you'll have to reckon with IO and storage costs, and then you have no choice but to exploit data locality more heavily.

Just saying, YMMV depending on how much data is in your database.


As somebody who has done both contracting and worked in tech, I hope this comment makes sense.

Most homeowners (product managers) want the cheapest (fastest) option that fixes (delivers) a working home (feature). When given the options, that's generally what people choose.

Sure, some contractors (software engineers) always cut corners, but I believe many take pride in their work and, given the option, would prefer a solution they can stand behind.


I trust you have the numbers for this, and we can debate them until the world ends, but I think having a mental fallback of "yeah, some people suck" is valuable for your mental well-being.

Life's too short to let the few negative people out there bring you down. Whether it's 1% or 5%, whatever number lets you realize that and move on.


> I find it more efficient to work in topic-scoped batches, so I can load context on a protocol and codebase once and use it to land multiple changes.

This is my favorite way of writing software as well. My current gig has a ton of microservices, and when a feature comes up that requires changing one of them, I much prefer to bundle in a couple of other, smaller changes that help keep the service operational and easier to maintain.

One issue is that this oftentimes brings out the yak-shaving, but I think it's a fair tradeoff and helps reduce the time burden of doing large migrations.


If yak-shaving in this case refers to polishing, I don't believe it's a bad thing per se for a crypto library - or any standard library, for that matter.

