Streaming ingestion support in the collector #212

yurishkuro · 2017-06-16T01:05:51Z

@hzariv you mentioned a lack of streaming support as an issue, could you please elaborate? Do you mean streaming ingestion of spans (e.g. from Kafka), or a stream coming out of collectors?

Note that we actually have support for both (subject to the exact format) that we can open source if there's demand. Streaming ingestion somewhat goes against the design of Jaeger, since we want active bi-directional communication between clients, agent, and collector, but it's certainly possible to add.

hzariv · 2017-06-16T01:19:10Z

I mean as a transport, similar to Zipkin (https://github.com/openzipkin/zipkin-reporter-java). I understand Jaeger uses a UDP-based agent, but we already use Kafka for centralized logging, and it would be nice to use the same infrastructure to report spans. Hope this clarifies :-)

yurishkuro · 2017-06-16T20:35:48Z

@hzariv there are a few reasons we decided against a streaming ingestion model, Kafka or others:

Dependencies: if a Jaeger client library needs to write spans to a UDP or HTTP server, it usually needs nothing more than the standard library of the respective language. Writing to a streaming platform, on the other hand, requires extra dependencies in the library, e.g. a Kafka client. These extra dependencies make the library harder to integrate into applications, due to a larger footprint and version mismatches.
Broker configuration: Jaeger tracers can be instantiated with nothing but a service name argument, because they talk to a known UDP port. If they were to talk to a streaming platform, they would need a lot more configuration, like a Kafka broker address, which would differ between environments / data centers, again making deployment more complicated.
Overall dependency on extra infra: as one of the basic observability tools the goal was to make Jaeger depend on as few other infrastructure components as possible. Not every organization uses Kafka, so making it a requirement to run Jaeger is a downside. Granted, we still have another dependency on some service discovery or routing layer for network calls between agent and collector, but in the microservices architecture that problem generally needs to be solved anyway.
Bi-directional data flow: unlike Zipkin, which was originally designed to have clients "fire and forget" the spans, Jaeger was architected with a feedback loop between clients and the collector cluster. The feedback loop gives us the ability to implement adaptive sampling, centralized baggage key whitelisting, and more features in the future. This bi-directional data flow design means that clients always need to be able to talk to collectors, which makes any intermediary messaging bus redundant.
Advanced adaptive sampling: the current version of adaptive sampling that we have only controls probabilities in the clients. A more advanced version of it will allow the clients to capture a lot more data that will be cached in the agents and in most cases discarded shortly as not interesting. Because the communication between clients and agents happens on the local host, it is relatively cheap and does not affect performance as much as sending the data over the network or writing to disk. Having to write all data to a messaging solution would effectively prevent these advanced sampling techniques for performance reasons.
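
To illustrate the feedback loop described in the points above, here is a minimal sketch in Python. It is not Jaeger's implementation: `ProbabilisticSampler`, `FakeCollector`, `poll_once`, and the "target traces per second" policy are all hypothetical names and assumptions made up for this example. It only shows the shape of the idea, where the client makes sampling decisions locally and periodically asks the backend for an updated probability.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ProbabilisticSampler:
    """Client-side sampler whose probability can be replaced at runtime (hypothetical)."""
    rate: float = 0.001
    rng: random.Random = field(default_factory=random.Random)

    def is_sampled(self) -> bool:
        # random() returns a value in [0.0, 1.0), so rate=1.0 always samples
        # and rate=0.0 never samples.
        return self.rng.random() < self.rate

    def update(self, new_rate: float) -> None:
        if not 0.0 <= new_rate <= 1.0:
            raise ValueError("sampling rate must be within [0, 1]")
        self.rate = new_rate


class FakeCollector:
    """In-memory stand-in for the backend side of the loop (an assumption)."""

    def __init__(self, target_traces_per_second: float) -> None:
        self.target = target_traces_per_second

    def sampling_rate_for(self, service: str, observed_rps: float) -> float:
        # Assumed policy: aim for a fixed number of sampled traces per second.
        if observed_rps <= 0:
            return 1.0
        return min(1.0, self.target / observed_rps)


def poll_once(sampler: ProbabilisticSampler, collector: FakeCollector,
              service: str, observed_rps: float) -> float:
    """One iteration of the feedback loop: fetch a new rate and apply it."""
    sampler.update(collector.sampling_rate_for(service, observed_rps))
    return sampler.rate
```

In this sketch the client still needs a request/response channel to the backend to learn its new rate, which is the part a one-way message bus would not provide on its own.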

As I mentioned, we do have the ability to stream spans out of collectors. We are building a post-collection data pipeline that will provide various aggregates on top of standalone trace data.

Internally we do ingest some Kafka streams that contain certain enrichment data generated by tools other than Jaeger clients, such as haproxy/routing logs, or tracing data from mobile apps.

At some point we can implement (or better, accept a PR) ingestion of spans from Kafka, mostly for compatibility with existing Zipkin instrumentation / installation.

fgcui1204 · 2017-06-23T09:56:11Z

Looking forward to support for this.

yurishkuro · 2019-03-11T18:20:43Z

This is supported by jaeger-ingester.

yurishkuro changed the title ~~Streaming support in the collector~~ Streaming ingestion support in the collector Jun 16, 2017

vprithvi mentioned this issue Jul 17, 2017

Kafka for trace storage #274

Closed

yurishkuro mentioned this issue Aug 16, 2017

Migrating from Zipkin Kafka stream #171

Closed

jpkrohling added enhancement question area/storage labels Jul 18, 2018

yurishkuro closed this as completed Mar 11, 2019
