We actually do a lot more than 20B (not sure what the number is actually pulled from), but it's also important to note that even 8000 events/sec isn't free.
- 40k per event (average, it gets into 2mb territory for some)
I hear you, at the same time you have embarassingly parallel problem that can be easily sharded per customer. (Do I understand your service correctly?)
Processing 8k * 40k is a feat but I’d think you have more then a few dozens of nodes.
Then it gets into ~100s ev/s per node territory. Quite manageable even for e.g. complex search engines to ingress.
Of course, it all depends on where your bottlenecks are.
> I hear you, at the same time you have embarassingly parallel problem that can be easily sharded per customer. (Do I understand your service correctly?)
The actual event ingestion from an external consumer to our queue is indeed very boring. What makes Sentry internally complex is how incredibly complicated those events can get. For instance before we can make sense of most iOS (as wlel as other native) or JavaScript events we need to go through a fairly complex event processing system. A lot of thought went into making this process very efficient.
Also obviously even if a problem like ours can be easily solved in parallel, you don't just want to add more servers you also want to run a reasonably lean operation. Instead of just throwing more machines at the issue we want to make sure we internally do not do completely crazy things but optimize our processes.
This is an interesting line of discussion. How do you know you are optimally utilizing the hardware you do have? For example, is each machine running at near 100% cpu? Is the network card saturated? Is the database the bottleneck? How about the search indexing operation? What metrics are you tracking and how are you tracking them? If you're at liberty to discuss, it would be highly enlightening!
> How do you know you are optimally utilizing the hardware you do have?
That's probably one better to answer for actual ops people. However what we generally do is looking at our metrics to see where we see room for improvement. Two obvious cases we identified and resolved through improving our approaches have been JavaScript sourcemap processing and debug symbol handling.
There we knew we were slow based on how far p95 timing was from the average and median. We looked specifically at what problematic event types looked like and what we can do to handle them better. We now use just better approaches to dealing with this data, multiple layers of more intelligent caches and that cut down out event processing time and with it the amount of actual hardware we need.
So the answer at least as far as event processing goes: for some cases we spent time on we were CPU bound and knew we can do better and did. I wrote a bit about what we did on our blog where we moved sourcemap processing from Python to Rust. The latests big wins in the processing department have been a rewrite of our dSYM handling where we now use a custom library to preprocess DWARF files and various symbol tables into a more compact format so we can do much faster lookups for precisely the data we are about.
> Is the network card saturated?
In other cases that has been an issue, but that's less of a problem on the actual workers which are relevant to that number we are discussing here. In particular as far as the load balancers goes there is a ton more data coming in but not all will make it to the system. A non uncommon case is someone deploys our code into a mobile app SDK and then stops paying. The apps will still spam us, but we want to block this off before it hits the expensive parts of the system etc.
> Is the database the bottleneck?
When is a database not a concern ;)
What we are tracking is a ton, how we are tracking is a lot of datadog.
> I’d be curious to see how hardware stacks up against events/second end2end as well.
That heavily depends on which part of the stack you are looking at. (Event ingestion, processing, load balancers, databases etc.). There is a lot less hardware there than one would imagine ;)
All event processing (currently) takes place on three 32 core Xeon machines. But it has seen much worse days when we had many more before we optimized critical paths. Likewise what we have now is pretty chill, but we need to also consider the case where a worker drops entirely from the pool so these are over-provisioned for the amount of work they each take.
The data layer is probably more interesting from a scaling point of view in general since it's harder to scale, but it's also not exactly a point we're particularly happy with. There is already a comment from david (zeeg) on this post going into our plans there.
96 core to process 10K complex event/s sounds quite impressive given the kind of analysis you seem to have to do. Did sentry used to have all the stack trace, local variable data originally or that kind of extensive processing a more recent development? I am more used to Java where the basic stack trace is provided by the VM and I assume processing those are much simpler than some of the user kind of events?
> All event processing (currently) takes place on three 32 core Xeon machines. But it has seen much worse days when we had many more before we optimized critical paths.
Finally I could understand what kind of efficiency you guys have. Not bad at all I’d say though not exactly “high load miracle” the article title would imply.
I'm not sure what the article how the article implies a "high load miracle". But we are quite proud about the efficiency gains and some of our solutions for the problems we encounter I think.
- 40k per event (average, it gets into 2mb territory for some)
- streaming processing of everything on the fly
- storage/querying of data