Curious, does AWS Glue or SchemaCrawler do any type inference beyond basic data types? For example, string analysis to automatically mark fields as shipping tracking numbers, IP addresses, ISO countries/cities, the correct date format, etc.?
We use this to infer the schema. Has anyone thought about taking this and creating something like an open-source TAMR? Would love to hear ideas around it if someone has.
TAMR is a neat tool, but what is the price? I’m wary of tools that require a demo to even understand if this is a $1M tool or $100M tool.
I'm interested in sustainable ways to map out data across the enterprise. But the vendor space is hard for me to analyze at a greenfield level, because it's full of pretty heavy tools that require implementation and consulting work just to set up. I was unaware of TAMR until your post, but I tried going through Gartner's analysis of data management platforms.
Are there even any open-source tools or communities in this space? For example, I started evaluating Talend's open-source metadata management stack to see whether it would help me, but gave up after their demos wouldn't run in a few environments I tried.
I don't know TAMR's price point, but I'm aware it's expensive. A couple of tools come to mind: Metacat (as someone pointed out), WhereHows (LinkedIn), and Apache Atlas (not sure who contributed it). The challenge is also to look beyond RDBMSes to RDF and graph stores, and then abstract them all through a semantic layer.
Their sales team must be good to land a French bank with this name: "tamr" reads as "ta mère" ("your mom"), which in French is usually used on its own only as an insult.
I also use SchemaSpy. Two things I like about it: 1) it recognises Markdown in your database comment fields, and 2) you can merge documentation from a text file with metadata pulled from your database. It's a neat way of keeping database documentation.
This is a great tool. We use it to generate an Entity Relationship Diagram from our canonical DDL file checked into our repo.
Here's the basic recipe:
1. Spin up a fresh Postgres instance on Docker using -P to claim an available ephemeral TCP port
2. Use `docker inspect` to read the Postgres port
3. Run DDL script on the fresh instance
4. Run the SchemaCrawler Docker container with --network host so it can connect to Postgres, and with -v so it can save a schema image to the host filesystem
This entire process is a `/bin` script checked into our repo, so we can update `/doc/db-schema.png` any time. It takes about 15s total since we have to pause for the Postgres instance to come online.
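A minimal sketch of such a script, assuming a checked-in DDL file at `db/schema.sql`, an output path of `doc/db-schema.png`, and the `schemacrawler/schemacrawler` image with its bundled `schemacrawler.sh` launcher; the exact image tags, credentials, and paths here are illustrative, not the poster's actual script:

```shell
#!/usr/bin/env bash
# Sketch of the recipe above; image tags, paths, and credentials are assumptions.
set -euo pipefail

# 1. Spin up a fresh Postgres; -P publishes its exposed port on an ephemeral host port
CID=$(docker run -d -P -e POSTGRES_PASSWORD=postgres postgres:16)

# 2. Use docker inspect to read back the host port Docker mapped to 5432
PORT=$(docker inspect -f \
  '{{ (index (index .NetworkSettings.Ports "5432/tcp") 0).HostPort }}' "$CID")

# Pause until the instance comes online (this is most of the ~15s)
until docker exec "$CID" pg_isready -U postgres >/dev/null 2>&1; do sleep 1; done

# 3. Run the canonical DDL against the fresh instance
PGPASSWORD=postgres psql -h localhost -p "$PORT" -U postgres -f db/schema.sql

# 4. Run SchemaCrawler with host networking so it can reach Postgres,
#    mounting doc/ so the rendered diagram lands on the host filesystem
docker run --rm --network host -v "$PWD/doc:/out" schemacrawler/schemacrawler \
  /opt/schemacrawler/bin/schemacrawler.sh \
  --server=postgres --host=localhost --port="$PORT" \
  --database=postgres --user=postgres --password=postgres \
  --info-level=standard --command=schema \
  --output-format=png --output-file=/out/db-schema.png

# Tear the throwaway instance down
docker rm -f "$CID" >/dev/null
```

The `-P`/`docker inspect` dance avoids hard-coding a port, so the script can't collide with a locally running Postgres.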
I'd also used wwwsqldesigner[0] (possibly a different fork) with some custom hacks to infer relationships by naming conventions where foreign keys were not present. It produced a quick ERD for getting started on a project. I've always wanted a more complete (non-PHP) version of this tool; perhaps there is one in these comments.
This tool is fantastic. In a previous life, I used it to dynamically analyze and extract users from a multi-tenant database and determine the proper sort order for reinsertion in a different database on a potentially different (JDBC-compatible) platform.
There's quite a bit of detail about command-line options, etc., under the Features menu. It could be better organized, but I don't think it's as bad as you suggested.
The similarity really is in name only. Web crawlers scrape web pages and follow links to find additional pages to scrape. A tool like this inspects your database and determines your schema, relationships, etc.
Disclosure: I work on AWS Glue