A fascinating memoir by a philosopher turned brain surgeon facing a terminal cancer diagnosis. A person who spent their entire life pondering the meaning of life and mortality, suddenly confronted with his own.
I reread it once a year, at minimum. A deeply moving book.
If you're looking for pure vector similarity search, there are many alternatives. But Vespa's tensor support, multi-vector indexing, and the ability to express models like ColBERT (1) or cross-encoders make it stand out if you need to move beyond pure vector search.
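To make the multi-vector point concrete, here is a rough numpy sketch of ColBERT-style late interaction (the MaxSim operator), where a document is represented by one vector per token rather than a single vector. This is illustrative only, not Vespa's actual ranking expression syntax:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token vector,
    take the best-matching document token vector, then sum over the query.

    query_vecs: (num_query_tokens, dim)
    doc_vecs:   (num_doc_tokens, dim)
    """
    # Similarity between every (query token, doc token) pair.
    sim = query_vecs @ doc_vecs.T          # (num_query_tokens, num_doc_tokens)
    # MaxSim: best document token per query token, summed over the query.
    return float(sim.max(axis=1).sum())

# Toy usage with random unit vectors standing in for token embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(80, 128))
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```

The point is that scoring needs all of the document's token vectors, which is exactly what single-vector-per-object stores can't represent.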
Plus, for RAG use cases, it's a full-blown text search engine as well, allowing hybrid ranking combinations. Also, with many pure vector databases like Pinecone, you cannot describe an object with more than one vector: if you have different vector models for the object, you need different indexes, and then you have to duplicate metadata across those indexes (if you need filtering + vector search).
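As a sketch of what "hybrid ranking combinations" means in practice: blend a lexical score (e.g. BM25) with a vector similarity per candidate. The weights and field names below are made up for illustration; an engine like Vespa lets you express this kind of expression in a rank profile instead of application code.

```python
def hybrid_score(bm25_score: float, vector_similarity: float, alpha: float = 0.7) -> float:
    """Weighted blend of a lexical score and a vector score.
    Note: BM25 is unbounded while cosine similarity is roughly [0, 1],
    so real systems usually normalize the scores before mixing."""
    return alpha * vector_similarity + (1.0 - alpha) * bm25_score

# Hypothetical candidates with both scores already computed upstream.
candidates = [
    {"id": "doc1", "bm25": 12.3, "cos": 0.81},
    {"id": "doc2", "bm25": 4.1,  "cos": 0.93},
    {"id": "doc3", "bm25": 15.0, "cos": 0.42},
]
ranked = sorted(candidates, key=lambda c: hybrid_score(c["bm25"], c["cos"]), reverse=True)
print([c["id"] for c in ranked])
```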
Yeah, my assumption was that something in some layer of their application isn’t well optimized when asked to return posts from a subreddit that has “gone dark” in whatever fashion the mods chose to do that.
For example, maybe it causes reads from the database to take a lot longer than they normally would, locking up the database or causing the process to crash (again, that’s just pure speculation).
One I've been wondering about is user overview pages. People use those a lot (it's actually my bookmark for getting onto Reddit), and yesterday I noticed that a post I made wasn't in my overview; it's because that sub had gone dark early.
What happens when a user has 99% of their posts in subs that are now hidden, and the API is programmed to produce a fixed 30 comments of history on the overview page? The answer is extremely deep database pulls... you might pull a year of comment history to get 30 comments that aren't hidden. And depending on how they do that, it may actually pull the whole comment history for that timespan, since most of the time posts aren't hidden like this.
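A rough sketch of that failure mode, purely hypothetical and assuming the hidden-sub filter happens in application code after fetching from storage: the page keeps pulling batches until it has 30 visible comments, so the number of rows scanned explodes when most of a user's history sits in dark subs.

```python
PAGE_SIZE = 30     # comments shown on the overview page
BATCH_SIZE = 100   # rows fetched from storage per round trip

def build_overview(fetch_batch, hidden_subs: set, page_size: int = PAGE_SIZE):
    """fetch_batch(offset, limit) -> list of {"sub": ..., "body": ...} dicts,
    newest first. Filtering happens *after* the fetch, so the number of rows
    scanned grows as more of the user's subs go dark."""
    visible, offset, scanned = [], 0, 0
    while len(visible) < page_size:
        batch = fetch_batch(offset, BATCH_SIZE)
        if not batch:
            break                       # ran out of history entirely
        scanned += len(batch)
        visible += [c for c in batch if c["sub"] not in hidden_subs]
        offset += BATCH_SIZE
    return visible[:page_size], scanned

# Toy data: 99% of comments are in a sub that has gone dark.
history = [{"sub": "darksub" if i % 100 else "opensub", "body": f"comment {i}"}
           for i in range(5000)]
fetch = lambda off, lim: history[off:off + lim]
page, scanned = build_overview(fetch, hidden_subs={"darksub"})
print(len(page), "visible comments after scanning", scanned, "rows")
```

With 99% of the history hidden, filling a 30-comment page means scanning on the order of 3,000 rows.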
I worked on a backend team with some very overburdened legacy tables in Mongo, and this is the kind of thing we'd think about. Yeah, you can use an index, but then you have to maintain the index for every record and update it every time a sub goes private/public (and we were literally hitting practical limits on how many indexes we could keep; we finally instituted a 1-in-1-out rule). And how often does that happen? Even deleted comments are probably a small enough minority overall that an index doesn't pay for itself.

But this is relational data: you have to know which subreddits are closed before you can filter their results, and Mongo sucks at joins. The Mongo instance can become a hotspot, so you just filter it in the application instead for those "rare" instances. Even if they are doing it in Mongo, the index/collection they're joining may suddenly be 100x the size, which could blow stuff up anyway.
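To make that tradeoff concrete, here's a hypothetical pymongo sketch (not Reddit's actual schema; the collection and field names like `sub_is_hidden` are made up): denormalizing a visibility flag onto each comment makes reads a single indexed query, but flipping a sub private/public then rewrites every one of its comments, versus skipping the index and filtering in the application.

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient()            # assumes a local MongoDB for the example
comments = client.forum.comments  # hypothetical collection

# Option A: denormalize visibility onto each comment and index it.
# Cheap to read, but the flag must be rewritten whenever a sub toggles.
comments.create_index(
    [("user_id", ASCENDING), ("sub_is_hidden", ASCENDING), ("created", DESCENDING)]
)

def set_sub_visibility(sub: str, hidden: bool) -> int:
    """Flip the denormalized flag for every comment in the sub.
    This is the write amplification the index approach pays for."""
    result = comments.update_many({"sub": sub}, {"$set": {"sub_is_hidden": hidden}})
    return result.modified_count

def overview_with_index(user_id: str, limit: int = 30):
    return list(
        comments.find({"user_id": user_id, "sub_is_hidden": False})
                .sort("created", DESCENDING)
                .limit(limit)
    )

# Option B: no extra index; pull recent comments and filter in the application
# against a set of hidden subs fetched separately (the "join" done by hand).
def overview_in_app(user_id: str, hidden_subs: set, limit: int = 30):
    out = []
    for c in comments.find({"user_id": user_id}).sort("created", DESCENDING):
        if c["sub"] not in hidden_subs:
            out.append(c)
            if len(out) == limit:
                break
    return out
```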
edit: for me, one overview page is now taking me back one month in comment history. And I comment a lot on subs that are currently closed, so it could easily be throwing away 5-10 comments for every comment it displays.
I'm guessing a hit on an open subreddit mostly goes straight out of the caching layer, while a hit on a private one incurs a DB hit to check whether the user belongs there.
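A tiny sketch of that guess, with entirely made-up helper names (`is_private`, `is_member`, `fetch_listing`): open subs can be served from a shared cache, while private ones need a per-user membership check against the database.

```python
listing_cache: dict[str, list] = {}   # subreddit -> cached, pre-rendered listing

def get_listing(sub: str, user_id: str, db) -> list:
    """Speculative read path: open subs are cache hits, private subs always
    touch the database, since the response depends on the requesting user."""
    if not db.is_private(sub):
        if sub in listing_cache:
            return listing_cache[sub]          # cheap, shared cache hit
        listing = db.fetch_listing(sub)
        listing_cache[sub] = listing
        return listing
    # Private sub: per-user authorization check, so a DB round trip every time.
    if not db.is_member(sub, user_id):
        raise PermissionError("not approved for this subreddit")
    return db.fetch_listing(sub)
```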
As somebody who has both done contracting and worked in tech, I hope this comment makes sense.
Most homeowners (product managers) want the cheapest (fastest) thing that fixes (delivers) a working home (feature) to them. When given the options those are generally what people choose.
Sure, some contractors (software engineers) always cut corners, but I believe many take pride in their work and, given the option, would prefer a solution they can take pride in.
I trust you have the numbers for this. And we can debate them until the world ends, but I think having a mental fallback of “yeah, some people suck” is valuable for your mental well-being.
Life’s too short to let the few negative people out there bring you down. Whether it’s 1% or 5%, whatever the number, realizing that lets you move on.
> I find it more efficient to work in topic-scoped batches, so I can load context on a protocol and codebase once and use it to land multiple changes.
This is my favorite way of writing software as well. My current gig has a ton of microservices, and when a feature comes up that requires changing one of them, I much prefer to bundle in a couple of other, smaller changes that help keep the service operational and easier to maintain.
One issue is that this often brings out the yak-shaving, but I think it's a fair tradeoff and helps reduce the time burden of doing large migrations.
If yak-shaving in this case refers to polishing, I don't believe it's a bad thing per se for a crypto library - or any standard library, for that matter.
https://www.databass.dev/