For us, what we are striving to do differently is:
1/ DFS vs. BFS. We are planning on rolling out connectors slowly rather than building tens / hundreds / thousands of connectors to try to attract a broader audience. Anyone who has tried to replicate data between OLTP and OLAP systems knows how painful it is, and we really want to solve this pain point before we move on to new sources. In addition, we're planning on providing more value than just replicating data from source to destination. We're planning on integrating our telemetry library [1] with Datadog so that customers can (see the sketch after this list):
* Centralize all metrics. See Artie metrics without coming to our dashboard; instead, they show up in your existing tools.
* Use cookie-cutter monitors for anomaly detection.
* Get better table quality checks.
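To make the Datadog point concrete, here's a rough sketch of what surfacing replication metrics through DogStatsD could look like using the standard datadog-go client. This is not Artie's actual telemetry API (see [1] for that); the metric names, tags, and values below are hypothetical placeholders.

```go
package main

import (
	"log"
	"time"

	"github.com/DataDog/datadog-go/v5/statsd"
)

func main() {
	// Connect to the local Datadog agent over DogStatsD (default port 8125).
	client, err := statsd.New("127.0.0.1:8125",
		statsd.WithNamespace("artie."), // prefix every metric
		statsd.WithTags([]string{"env:prod", "pipeline:postgres_to_snowflake"}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Hypothetical metric names -- the real ones come from Artie's telemetry docs.
	client.Count("transfer.rows_processed", 1250, []string{"table:orders"}, 1)
	client.Gauge("transfer.replication_lag_ms", 420, []string{"table:orders"}, 1)
	client.Timing("transfer.flush_duration", 1800*time.Millisecond, []string{"table:orders"}, 1)
}
```

Because the metrics land in your Datadog account rather than a vendor dashboard, you can reuse whatever monitors and alerting you already have.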
2/ We do not want to change user behavior. We use pretty standard tools to solve this problem, such as Kafka, Pub/Sub, and Red Hat's Debezium. If you are already using one of these tools, we can just integrate with it rather than standing up a whole suite of services just for the data pipeline.
* If you already emit CDC events for an event-driven architecture, we'll skip deploying Debezium and just deploy our consumer.
* If you have Kafka running and also want to consume CDC events, we'll deploy Debezium to publish to your Kafka instead of ours (a minimal consumer sketch follows below).
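For context, this is roughly what consuming Debezium change events from your own Kafka looks like. It is a minimal sketch, not Artie's consumer: the broker address, consumer group, and topic name (Debezium's default `<server>.<schema>.<table>` convention) are assumptions for illustration.

```go
package main

import (
	"context"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
)

// debeziumEnvelope mirrors the standard Debezium change-event payload:
// "before"/"after" row images plus an "op" code (c=create, u=update, d=delete).
type debeziumEnvelope struct {
	Payload struct {
		Before map[string]any `json:"before"`
		After  map[string]any `json:"after"`
		Op     string         `json:"op"`
	} `json:"payload"`
}

func main() {
	// Point the reader at your existing Kafka cluster and the topic Debezium
	// already publishes to (default naming: <server>.<schema>.<table>).
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka:9092"},
		GroupID: "artie-consumer", // hypothetical consumer group
		Topic:   "dbserver1.public.orders",
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		var evt debeziumEnvelope
		if err := json.Unmarshal(msg.Value, &evt); err != nil {
			log.Printf("skipping malformed event: %v", err)
			continue
		}
		log.Printf("op=%s after=%v", evt.Payload.Op, evt.Payload.After)
	}
}
```

The point is that the pipeline plugs into infrastructure you already run; nothing about your existing Kafka or Debezium setup has to change.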
3/ Ease of use. This goes without saying, but plenty of tools out there have broken links in their documentation, extremely confusing UIs, and so on. We really look up to Fivetran in this regard, and we try to make onboarding a new connector as simple as possible.
Do you think anything is missing from the other CDC or data replication tools on the market? Let me know and I'm happy to see how we can help!
[1] https://docs.artie.so/telemetry/overview