“Dataflow” and the open source ecosystem in that neighborhood (Flink, Spark, Beam, Kafka, that family of tools) is a much more powerful way to look at logs in real time than indexing them into storage and then querying. There just isn’t anything off the shelf with that architecture as easy to deploy as ElasticSearch, that I’m aware of. (There should be!) Once you make the mental leap that logs are just events, and start looking at fun architectures like event sourcing, you start realizing streaming analysis is a pretty powerful way of thinking.

I’ve extracted insight from millions of log records per second on a single node with a similar setup, in real time, with much room to scale. The key to scaling log analysis is to get past textual parsing, which means using something structured, which usually negates the reason you were using ElasticSearch in the first place.
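To make “structured” concrete, here is a minimal Python sketch of the difference; the JSON shape and the field names are made-up examples, not anything from a real system:

    import json
    import re

    line = '{"status": 500, "latency_ms": 42, "path": "/checkout"}'

    # Textual parsing: a fragile regex over the rendered string,
    # paid for on every single event.
    m = re.search(r'"status":\s*(\d+)', line)
    status_from_text = int(m.group(1)) if m else None

    # Structured parsing: decode once, then read typed fields directly.
    event = json.loads(line)
    status_from_struct = event["status"]

    assert status_from_text == status_from_struct == 500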

Google’s first Dataflow talk from I/O and the accompanying paper should give you an idea of what log analysis can look like once you get past the full-text-indexing mindset. There’s nothing wrong with ELK, but you will run into scaling difficulty far sooner than you’d expect if you try to index every log event as it arrives. It’s also a bad thing to discover when you get slashdotted and your ES cluster whimpers trying to keep up. One thing streaming often buys you in that situation is latency instead of death, since you’re probably building atop a distributed log (and falling behind is less bad than falling over).

The key here is: are you looking for specific log events or are you counting? You’re almost always counting. Why index individual events, then?
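As a rough sketch of what “just count” looks like in the Beam model; the 60-second windows, the status-code key, and the in-memory stand-in source are illustrative assumptions, not anything specific from this thread:

    import apache_beam as beam
    from apache_beam.transforms.window import FixedWindows, TimestampedValue

    # Stand-in for a real streaming source (Kafka, Pub/Sub, ...);
    # "ts" is seconds since the epoch.
    events = [
        {"ts": 0, "status": 200},
        {"ts": 10, "status": 500},
        {"ts": 70, "status": 200},
    ]

    with beam.Pipeline() as p:
        (p
         | beam.Create(events)
         | beam.Map(lambda e: TimestampedValue(e, e["ts"]))
         | beam.WindowInto(FixedWindows(60))      # one count per minute
         | beam.Map(lambda e: (e["status"], 1))   # key by status code
         | beam.CombinePerKey(sum)                # aggregate; nothing is indexed
         | beam.Map(print))

The output is a per-minute count per status code; the individual events are never stored, which is exactly the trade being described above.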




I don't think this is "always" true. The power of keeping the data as individual records is that you can trivially pivot and slice and dice it in different ways. Once it's aggregated into a count, I lose that ability. When trying to debug something that's happening, I find it far easier to have the entire records.

As for scaling, it scales very well (you can read through the Elastic blog and use cases for plenty of examples). That's not to say it will handle every level of scale, but I would venture that for 99% of the people out there, it will solve their problems very well.


Would it be accurate to say that your ability to count something new, starting from a previous point in time (i.e. not just from 'now'), depends on how easily you can replay your stream (I'm thinking in Kafka terms)? Or is there something in your architecture that lets you consume the stream from a (relative? absolute?) point in time (again, Kafka thinking leaking into my question)?
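For concreteness, this is the sort of thing I have in mind, as a minimal kafka-python sketch (the "logs" topic, the broker address, and the timestamp are placeholders, and seeking by absolute time assumes brokers that index records by timestamp):

    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
    tp = TopicPartition("logs", 0)
    consumer.assign([tp])

    # Map an absolute point in time (epoch millis) to the earliest
    # offset whose record timestamp is >= that time, then seek there.
    start_ms = 1400000000000  # placeholder timestamp
    offsets = consumer.offsets_for_times({tp: start_ms})
    if offsets[tp] is not None:
        consumer.seek(tp, offsets[tp].offset)

    # Replay from that point forward and recompute the new count.
    for record in consumer:
        print(record.offset, record.value)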

Thanks.



