
I've built https://search.marginalia.nu/ from scratch, as a solo hobby project. It's literally just a computer in my living room.

Hardware investment is about $3-4k as a one-time cost, and I estimate I'll need a new 1 TB SSD every couple of years, as the server chews through them with great appetite.

My monthly operational costs are $15 in power and $20 for Cloudflare, because I kept getting DDoSed by botnets.

As for development time, dunno. I've been working on it in my spare time since sometime this spring. Generously estimated, that's 30 h/week x 30 weeks, so the upper bound is 900 hours, but it's probably closer to 600 hours as I have other projects as well, and I'm not always feeling it.

I don't think off-the-shelf search solutions or databases are viable. They are too flexible, which means they can't be fast and space-efficient enough to keep costs down. They're meant to run in a data center, not on a single computer, so your operational costs will be prohibitive.
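To illustrate the flexibility-versus-space trade-off, here is a minimal sketch comparing a self-describing encoding with a fixed binary layout. The record fields (`doc_id`, `position`, `flags`) and their widths are made up for the example; they are not the actual index format.

```python
import json
import struct

# Hypothetical index record; field names and widths are assumptions.
record = {"doc_id": 123456789, "position": 42, "flags": 3}

# Flexible encoding: the field names travel along with every single record.
as_json = json.dumps(record).encode("utf-8")

# Fixed layout: 4-byte doc id, 2-byte position, 1-byte flags,
# little-endian, no padding. The schema lives in the code, not the data.
RECORD = struct.Struct("<IHB")
as_packed = RECORD.pack(record["doc_id"], record["position"], record["flags"])


def bytes_saved(num_records):
    """Total bytes saved by the fixed layout across num_records records."""
    return (len(as_json) - len(as_packed)) * num_records


if __name__ == "__main__":
    print(len(as_json), "bytes as JSON vs", len(as_packed), "bytes packed")
    print(bytes_saved(1_000_000_000) / 1e9, "GB saved over a billion records")
```

The packed form gives up the ability to add or reorder fields freely, which is exactly the flexibility a general-purpose database pays for in space.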

It's required a lot of old-fashioned wizardry to build, though: bit-twiddling and demoscene-esque hacks to coax a lot of data into a minimal amount of space. It's the type of micro-optimization that is usually a waste of time, except the data set is so large that saving a single byte in an object encoding often translates to saving multiple gigabytes. If you aren't at least fairly comfortable with building custom compression algorithms, memory-mapped hash tables, things like that, it's gonna be a rough project. If I didn't have a background in low-level programming, this would have been nearly impossible.
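As a rough idea of what that kind of byte-shaving can look like, here is a sketch of delta encoding plus variable-length integers for a sorted list of document IDs. This is a common textbook technique chosen for illustration, not necessarily what the engine uses, and the function names are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative int using 7 data bits per byte.

    The high bit of each byte signals that more bytes follow.
    """
    if n < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while True:
        low_bits = n & 0x7F
        n >>= 7
        if n:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)  # last byte
            return bytes(out)


def compress_ids(sorted_ids):
    """Store each ID as the varint-encoded gap from the previous ID."""
    out = bytearray()
    prev = 0
    for doc_id in sorted_ids:
        out += encode_varint(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decompress_ids(data):
    """Inverse of compress_ids: rebuild the IDs from the encoded gaps."""
    ids = []
    gap = 0
    shift = 0
    prev = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            ids.append(prev)
            gap = 0
            shift = 0
    return ids


if __name__ == "__main__":
    ids = [1000, 1003, 1010, 5000]
    packed = compress_ids(ids)
    print(len(packed), "bytes instead of", 4 * len(ids), "as 32-bit ints")
```

Small gaps between neighbouring IDs fit in a single byte, so dense lists shrink the most.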

Beyond that, most of this stuff you can pick up along the way. I didn't really know shit about building search engines before I started. I just threw together a design that made sense and built... something, and iterated upon that. With every iteration it's gotten faster, smaller, better, smarter. I think the upcoming release is gonna be yet another huge improvement.


