A lot of people want a product like this, or at least think they do, and many have attempted to build parts of it. The issue with this product, of course, is that there is no visible evidence of a product, nor any way to assess it short of a salesperson presumably emailing you.
Questions I think would be important to answer:
- What types of queries are supported?
- Does data size or update frequency have performance implications?
- Is discovery embedded in this product?
- Is it available on-prem or only as an API?
- What are the data security guarantees?
- What workflows are easy to do in the UI?
- Does this support alerting on queries?
I could go on, but I think the point is made. There’s a reason a lot of data tooling companies use a freemium model — users want to feel the system before investing in it, maybe only reaching out when they’re ready to load test.
The old guard is very enterprise-y for a reason. Their products are complex monoliths that require mid-six-figure investments to get moving. Either you have the staff/dollars/organizational momentum to buy their crap and follow through, or it's not gonna work.
I've spent more time than I'm comfortable admitting with Gartner's favorites in the Data Governance and Data Quality areas, and they all suck. Desperately looking forward to someone not tied to legacy technology entering this space.
If you're looking for assistance developing the tool and/or selling to enterprise, reach out (username minus the 2 at gmail), I've got a virtual Rolodex full of F500 clients looking for better answers in this space.
Disclosure: I am part of the DataHub team at LinkedIn
There have been a number of recent open source offerings in this space, one of them being DataHub [1] which is a product of our evolution in the metadata space over the years.
An important lesson we learnt as part of this evolution is that monolithic/centralized architecture just doesn't scale with new data and users. Individual teams/owners must have the power and flexibility to decide what they care about while still being able to tap into the global metadata.
Aren't these abstractions often so complex, company-specific, and leaky that most companies are better off building their own in-house solutions? At least that's what I've seen at most FAANG companies, but then again they're notorious for NIH syndrome and have resources your regular IT shop might not.
IME the issue is that startup data modelling ends up in some ad-hoc framework that enforces a specific paradigm. When they try to adopt a standardized tool, they don't ask "does this produce the right answer?" but "can this do exactly what the old tool did?", and usually it can't. Nobody can actually reason about the whole set of data from first principles anymore, so they're forced to mechanically repeat the same exact process to get consistent results. Hell, the results might be wrong, or nobody uses them, but you don't have the tooling to detect that, and you're too afraid to adopt it.
Would be curious to better understand your needs in this space. I work at an early stage VC firm and have met probably every single catalogue/metadata startup - shoot me an email at davis (at) innovationendeavors.com if you want to share notes on the space. I can potentially recommend a few options.
Looks interesting. I'm finding that using dbt and the automatically generated docs helps with quite a lot of this. There are other sources of metadata though, so I can see why some might want a tool like this.
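For context, the dbt doc generation mentioned here is a short CLI workflow. A minimal sketch, assuming an already-configured dbt project (with `dbt_project.yml` and a working `profiles.yml`):

```shell
# Build the documentation artifacts (manifest.json and catalog.json)
# from the project's models, sources, and schema descriptions.
dbt docs generate

# Serve the generated documentation site locally for browsing
# model lineage, column descriptions, and other metadata.
dbt docs serve --port 8080
```

This covers metadata that lives inside the dbt project; as the comment notes, metadata from sources outside dbt would still need a separate tool.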
What sort of pricing is this targeting?