
First of all, the fact that you're asking this question puts you ahead of most engineers I know. There's a well-known saying that goes something like "Make it work, make it right, make it fast."

One of the simplest ways to think about scale is to think in terms of speed. This is a very very gross oversimplification and glosses over a lot of really important concepts, but at its core, you can say "if it's fast enough, it'll scale."

In a very simple mathematical sense, consider the idea that you have a single-instance, single-threaded application with no concurrency. If a request takes 1000ms to run, then you can do, at most, 1 request per second. If the request takes 100ms, you can do 10 requests per second. If it takes 10ms, you can do 100 requests per second, and if it takes 1ms, you can do 1000 requests per second.
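In code, that arithmetic is just a reciprocal. A toy sketch (not a benchmark, and the function name is made up for illustration):

```python
# Back-of-the-envelope: for a single-instance, single-threaded app with no
# concurrency, max throughput is the reciprocal of per-request latency.
def max_requests_per_second(latency_ms: float) -> float:
    return 1000.0 / latency_ms

for latency in (1000, 100, 10, 1):
    print(f"{latency:>5} ms/request -> {max_requests_per_second(latency):.0f} req/s")
```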

See? Speed is throughput is scale.

But that is, obviously, an oversimplification of the problem. Real applications are multi-threaded, multi-instance, and offer concurrency. So now the problem is identifying your bottlenecks and fixing them. But again, at its core, the main idea is speed. How can you make things as fast as possible?

(Note: There is a need to consider concurrency and parallelism, plus certain data stores have inherent speed limitations that may need to be overcome, and those things can offset poor speed, but the simplest path to scalability is speed and optimizing throughput.)
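To sketch how concurrency changes that math: with N workers, ideal throughput scales linearly until some shared resource caps it. The `bottleneck_rps` parameter here is an invented stand-in for whatever that resource is (a database, a lock, a disk), not a real metric from any tool:

```python
# With N concurrent workers (threads or instances), ideal throughput is
# N * (1000 / latency_ms) requests/sec -- but a shared bottleneck
# (database, lock, disk) puts a hard ceiling on it.
def max_throughput(latency_ms: float, workers: int, bottleneck_rps: float) -> float:
    ideal = workers * 1000.0 / latency_ms
    return min(ideal, bottleneck_rps)

# 8 workers at 100 ms each would ideally do 80 req/s,
# but a database capped at 50 req/s becomes the real limit.
print(max_throughput(100, 8, bottleneck_rps=50))
```

This is why profiling matters: adding workers past the bottleneck buys you nothing, which is the "identify your bottlenecks and fix them" point above.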

The analogy I like to use is the grocery store. Imagine you own a grocery store, and you want to make as much money as possible. Well, the best way to do that is to make sure your customers can get their food and check out as fast as possible. That means making sure the food is easy to find (i.e., read access is fast!), that they don't have to wait to check out (i.e., queue depth is low), and that checking out is fast (i.e., writes are fast). The faster your customers can walk in the door and back out again, the more customers you can sustain over a period of time.

On the other hand, if your customers take too long to find their groceries, or they spend too long waiting in line, or they have to write checks instead of swiping a smartphone, then you wind up with a backlog. And the larger the backlog, the longer it takes for money to hit your bank account.

So in this sense, time is literally money. The faster they can get through your system, the better.

I mentioned three different ways of thinking about speed: reads, writes, and queue depth.

Keeping with our grocery store analogy, consider how to improve each of those things. How do you make sure your customers can find what they're looking for as fast as possible? You "index" things. You put signs on the aisle, you organize your content in a way that is intuitive and puts related things near each other. If you want spaghetti, the pasta and the sauce and the parmesan cheese are all right next to each other. If you want breakfast, the eggs and milk and cinnamon rolls are right next to each other. In and out.

Similarly, your data needs to be organized smartly so that the user can get in and out as fast as possible. In a database, this means optimizing data structures, adding indices, and optimizing queries. Reduce expensive queries, keep cheap fast queries. Find ways to cache hot data. Make it easy to find what you need.
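The aisle-signs idea can be shown with a toy "table" in Python. Everything here (`products`, `sku`, the dict index) is invented for illustration, not any real database API:

```python
# Toy illustration of what a database index buys you: instead of scanning
# every row (O(n), like walking every aisle), keep a keyed structure and
# jump straight to the match (O(1), like following the aisle signs).
products = [{"sku": f"SKU{i}", "name": f"item {i}"} for i in range(100_000)]

def find_without_index(sku):
    # Full table scan: check every row until we hit the one we want.
    for row in products:
        if row["sku"] == sku:
            return row
    return None

# Build the "index" once up front; lookups afterward are constant time.
index = {row["sku"]: row for row in products}

def find_with_index(sku):
    return index.get(sku)

assert find_without_index("SKU99999") == find_with_index("SKU99999")
```

Real databases do this with B-trees rather than hash maps, but the trade-off is the same: a little extra work at write time to make reads fast.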

For writes, how do you speed up writes? One way is to make things asynchronous. Throw things that can be eventually consistent into queues and let an asynchronous job handle it outside the normal flow. The customer experiences minimal latency, and you've introduced concurrency to keep the data flowing while the customer is doing something else. This is, in part, why those little screens at the checkout counter ask you so many questions. They're distracting you while the cashier is scanning your groceries.
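A minimal sketch of that write-behind pattern using Python's standard-library queue; the job names and the `checkout` function are invented for illustration:

```python
import queue
import threading

# "Write-behind": acknowledge the customer immediately and let a background
# worker drain a queue of eventually-consistent work.
jobs = queue.Queue()
processed = []

def worker():
    while True:
        job = jobs.get()
        if job is None:           # sentinel: shut the worker down
            break
        processed.append(job)     # e.g. send the receipt email, update analytics
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

def checkout(order_id: str) -> str:
    # The hot path just enqueues and returns -- minimal customer-facing latency.
    jobs.put(f"receipt-email:{order_id}")
    return "order accepted"

print(checkout("A42"))
jobs.put(None)
t.join()
print(processed)
```

In production this queue would be something durable (RabbitMQ, SQS, Kafka, or Oban in the Elixir world) so jobs survive a crash, but the shape is the same.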

Queue depth optimization is important as well. If the queue gets really long at the grocery store, how do you improve that? You add more cashiers! The more cashiers you have, the more concurrent customers you can handle. But does it make sense to have 1 cashier per customer? Probably not. Now you've overscaled and you're spending too much money.
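The cashier-sizing question can be roughed out numerically. This naive version (the function name is mine) just divides arrival rate by per-worker capacity:

```python
import math

# Rough capacity planning: how many "cashiers" (workers) keep up with a
# given arrival rate, when each request takes service_ms to handle?
def workers_needed(arrivals_per_sec: float, service_ms: float) -> int:
    per_worker_rps = 1000.0 / service_ms   # each worker's max throughput
    return math.ceil(arrivals_per_sec / per_worker_rps)

# 200 req/s at 50 ms each -> 10 workers, in the ideal case.
print(workers_needed(200, service_ms=50))
```

Caveat: this has zero headroom. Real queueing theory (Little's Law, M/M/c models) says queues blow up as utilization approaches 100%, so in practice you'd provision comfortably above this number, just not one cashier per customer.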

As you can see, this is a complex operation, and again, my analogy is overly simplified and very dumb, but I hope this gives you a decent idea of how to visualize a scalability problem.

I'm not familiar with Elixir, but frankly the concepts should translate to any language, although the details may vary.

My suggestion? Learn how to do profiling, identify bottlenecks, and target the biggest bang for your buck. The big risk here is micro-optimization, so fight for changes that give you order of magnitude improvements. Saving 50 microseconds isn't worth your time, but shaving off 1500 milliseconds almost certainly is.
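A sketch of that workflow with Python's built-in profiler, with stand-in functions (`slow_part`, `fast_part`, `handler`) invented to show the idea:

```python
import cProfile
import io
import pstats

# Profile first, then attack the biggest contributor -- don't guess.
def slow_part():
    return sum(i * i for i in range(200_000))

def fast_part():
    return 42

def handler():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # slow_part will dominate the cumulative column
```

Whatever the language (Elixir has `:fprof` and friends), the principle is the same: measure, sort by cost, and spend your effort on the top of the list.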

Best of luck.




As someone working in an extremely small scale world I don’t get opportunities to tackle scaling problems, so this was a very nice read. Thanks!


Even at small scales, speed is extremely important! I don't know if it's still true, but I talked to an engineer at SmartyStreets several years ago, and they said they could serve up 100,000 requests per second on a Raspberry Pi. At the time I didn't believe them, but since then I've developed several systems that could absolutely do that.

At small companies, every dollar counts. You could save a bajillion dollars by serving up all your traffic on an AWS micro or small instance, instead of larger machines!

So even at small scales, it's worth figuring out how to make things fast. That makes scaling later much easier!


Thanks! Why would you want fast read access in the customer example? More customers looking for items means a smaller queue.


You mean "customers in the aisles means shorter lines at checkout"? That would be true if you're optimizing for short lines at checkout, but what a business is really optimizing for is sales over time, and so you want as many people to come in, buy their stuff, and leave in as short a period as you can. The more people you can get in and out of your store, the more money you make.

The more time they spend wandering the aisles (or browsing your site), the more opportunity for them to say "I can get this somewhere else faster/easier/cheaper." Every second they aren't punching in payment details is a moment for the baby to cry, for someone to call, for the boss to give them an assignment, for a bathroom break. Any of those things can totally break the moment and cause the customer to abandon their cart and leave.

Solution? Don't let them hang around. Get them to their goal as fast as possible and keep the total transaction time -- from landing on the site to checking out -- as short as you possibly can.

To give a more concrete example, imagine a company like Instacart. If it takes you more than an hour to fill up your cart on the site -- for whatever reason! whether bad organization, slow response times, whatever -- then you might as well just go to your local grocery store yourself! You can almost certainly be in and out of your local store in less than an hour.

The value prop that Instacart has is "you never have to go to the grocery store again, because it's easier to order from home." But if it's harder to order from home, then what's the value of Instacart? (Again, I'm oversimplifying here, but this is the gist of the value prop. Instacart doesn't sell groceries -- it sells TIME. It sells that hour of your life back so you can spend it with your family or playing video games or arguing with me on HN.)

And so in terms of scalability, Instacart wants you to land on the site, add everything to your cart, and get the order placed as fast as possible. And to do that, everything needs to be fast. The category pages need to load fast, the product detail pages need to load fast, your cart needs to load fast, the checkout pages need to load fast. The faster everything is, the quicker -- and better! -- your experience is.

There are numerous studies out there that show that as little as 500ms latency can cost millions of dollars for a company. It's really important to keep everything fast!

I have more thoughts about this, but this is the gist of the answer to your question: because the goal isn't short queues, but rather faster total trips.





