I have been running vector in several k8s clusters since July. I'm doing some non-trivial transforms over several different types of sources and sinks.
The config makes this easy, but my favorite part is the fact that the CPU and MEM of the vector processes barely even register on my metrics charts. I can't even tell you what their actual and requested resources are because I haven't bothered to look in a while.
It's one thing I never have to worry about. I could use more of those.
The world of observability seems to be converging on some key ideas. I think Vector's data model, where logs are converted into structured events, is a key idea that wasn't always obvious. I think the remaining big idea is what to do about high-cardinality data. Most solutions pre-aggregate the data, which doesn't tolerate high-cardinality tags. Solutions like Honeycomb and Datadog logs are stream-based and tolerate high-cardinality tags, but with limitations on what can be done with them. It will be interesting to see whether the stream-based solutions become the final standard. Vector warns about high-cardinality labels, just like others do. I'm not sure if that is a limitation of Vector or just the sink.
I think the interesting part is the overlap between observability metrics for operations and expanding into BI metrics with the same tools.
> I think the interesting part is the overlap between observability metrics for operations and expanding into BI metrics with the same tools.
I built Logflare so you could get your structured logs into BigQuery directly, so now you can do this :)
Hopefully we'll have a Vector sink soon, but until then I believe Vector supports POSTing batches to HTTP endpoints, so you can do that today. We'll take any JSON and send it straight to BQ after migrating your schema for you (automatically, based on the incoming payload shape).
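If you want to try that before a dedicated sink exists, a minimal sketch using Vector's generic `http` sink might look like the following. The Logflare URI, source ID, and API key header here are placeholders I'm assuming, and exact option names vary a bit between Vector versions:

```toml
# Sketch only: ship JSON log batches from Vector to Logflare over HTTP.
# The URI, source ID, and API key below are placeholders, not real values.
[sinks.logflare]
  type           = "http"
  inputs         = ["my_logs"]          # whatever source/transform feeds this
  uri            = "https://api.logflare.app/logs?source=YOUR_SOURCE_ID"
  encoding.codec = "ndjson"             # newline-delimited JSON batches

  [sinks.logflare.headers]
  "X-API-KEY" = "${LOGFLARE_API_KEY}"   # pulled from the environment
```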
Vector still insists that Kafka is for logs only. Vector is a fantastic project and we have so many use cases for it, but the thing that trips us up is not being able to send metrics via Kafka without transforming them to logs.
Edit: Awesome, my complaint that you couldn't scrape the Prometheus federation endpoint has been fixed.
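For anyone curious, that scrape config now looks roughly like this (a sketch only; the source name and options differ between Vector versions, and the Prometheus address and any `match[]` parameters are whatever your federation setup needs):

```toml
# Sketch: pull metrics from a Prometheus federation endpoint.
[sources.prom_federation]
  type                 = "prometheus_scrape"
  endpoints            = ["http://prometheus:9090/federate"]  # placeholder address
  scrape_interval_secs = 15
```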
Thanks for pointing this out! That limitation is largely a holdover from when the Kafka sink was written and our support for accepting multiple data types was not as good as it is now. As things stand today, it should be a pretty simple change to enable this.
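In the meantime, one possible workaround is converting metric events into log events just before the Kafka sink. A rough sketch, assuming the `metric_to_log` transform is available in your version, with placeholder component and broker names:

```toml
# Workaround sketch: metrics -> log events -> Kafka.
[transforms.metrics_as_logs]
  type   = "metric_to_log"
  inputs = ["my_metrics"]               # hypothetical metrics source

[sinks.kafka_metrics]
  type              = "kafka"
  inputs            = ["metrics_as_logs"]
  bootstrap_servers = "kafka-1:9092"    # placeholder broker address
  topic             = "metrics"
  encoding.codec    = "json"
```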
I'll go ahead and open an issue to get that addressed, but in the future please feel free to do so yourself for anything that's tripping you up! We really value this kind of feedback and try to address it as promptly as possible.
Thank you, that's great. We really do love having Vector available to us; for such a young project, it's amazing that it's such a stable and solid piece of software.
I did that, and it's still just some boardroom speak about cost reduction and novel data enrichment, not what it actually does. What do I use this for?
I get that you want to sell this to executives, but the people who point executives to products to buy are developers like me, and if I don't understand what it does, I can't see executives finding your product.
Just give examples of the on-the-ground problems it solves: does it convert kafka messages to s3 objects, or what?
Lack of novel data enrichment is not a problem that I have, but I could do with something that streams k8s pod logs to kafka or s3 or whatever. Does this solve that?
Yes, streaming k8s pod logs to Kafka and/or S3 is a great example of when you could use Vector.
The "collect, transform, and route all your logs, metrics, and traces" bit is our most succinct explanation of what Vector does, but I'll admit it's still not as clear as we'd like. To expand it slightly, Vector is a tool to collect observability data (logs, metrics, etc) from wherever it's generated, optionally process that data in-flight, and then forward it to whatever upstream system you'd like to consume it. It does this by providing a variety of different components that you configure into whatever pipeline you need. In your example, you could use our new k8s source and plug it into our Kafka sink, our S3 sink, or both.
Vector is designed to work well with systems like Kafka, not to replace them. While it does have a _very_ simple durable queue in the optional disk buffer feature, it is nowhere near the durability, fault tolerance, or performance of a full Kafka cluster, and we would not recommend thinking of them as the same type of system.
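For context, opting a sink into that disk buffer is just a few lines. A sketch, using a hypothetical sink named `kafka` and a size in bytes:

```toml
# Sketch: swap a sink's default in-memory buffer for Vector's disk buffer.
[sinks.kafka.buffer]
  type      = "disk"
  max_size  = 1073741824   # ~1 GiB of on-disk buffering, in bytes
  when_full = "block"      # apply backpressure rather than dropping events
```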
That being said, we do know of a few cases where Vector's in-flight processing and routing capabilities were enough that a full Kafka-based pipeline was no longer needed. This ability to push computation out to the edge and reduce the need for centralized infrastructure is one of the aspects of Vector that we're most excited about.