Curious, does AWS Glue or SchemaCrawler do any type inference beyond basic data types? For example, string analysis to automatically mark fields as shipping tracking numbers, IP addresses, ISO countries/cities, the correct date format, etc.?
We use this to infer the schema. Has anyone thought about taking this and creating something like an open-source TAMR? Would love to hear ideas around it if someone has.
TAMR is a neat tool, but what is the price? I’m wary of tools that require a demo to even understand if this is a $1M tool or $100M tool.
I'm interested in sustainable ways to map out data across the enterprise. But the vendor space is hard for me to analyze at a greenfield level, because it's full of pretty heavy tools that require implementation and consulting work just to set up. I was unaware of TAMR until your post, but I tried going through Gartner's analysis of data management platforms.
Are there even any open-source tools or communities in this space? For example, I started evaluating Talend's open-source metadata management stack to see whether it would help me, but gave up after their demos wouldn't run in a few environments I tried.
I don't know TAMR's price point, but I'm aware it's expensive. A couple of tools come to mind: Metacat (as someone pointed out), WhereHows (LinkedIn), and Apache Atlas (not sure who contributed it). The challenge is also to look beyond RDBMSes to RDF and graph stores, and then abstract them all through a semantic layer.
Their sales team must be good to land a French bank with this name: "tamr" reads as "ta mère" ("your mom"), which in French is usually used on its own only as an insult.
I also use SchemaSpy. Two things I like about it: 1) it recognises Markdown in your database comment fields, and 2) you can merge documentation from a text file with metadata pulled from your database. It's a neat way of keeping database documentation.
This is a great tool. We use it to generate an Entity Relationship Diagram from our canonical DDL file checked into our repo.
Here's the basic recipe:
1. Spin up a fresh Postgres instance on Docker using -P to claim an available ephemeral TCP port
2. Use `docker inspect` to read the Postgres port
3. Run DDL script on the fresh instance
4. Run the SchemaCrawler Docker container with --network host so it can connect to Postgres, and with -v so it can save a schema image to the host filesystem
This entire process is a `/bin` script checked into our repo, so we can update `/doc/db-schema.png` any time. It takes about 15s total since we have to pause for the Postgres instance to come online.
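A minimal sketch of such a script, assuming a checked-in DDL file at `db/schema.sql`, an output path of `doc/db-schema.png`, and the `schemacrawler/schemacrawler` image with its bundled `schemacrawler.sh` launcher; the exact image tags, credentials, and paths here are illustrative, not the poster's actual script:

```shell
#!/usr/bin/env bash
# Sketch of the recipe above; image tags, paths, and credentials are assumptions.
set -euo pipefail

# 1. Spin up a fresh Postgres; -P publishes its exposed port on an ephemeral host port
CID=$(docker run -d -P -e POSTGRES_PASSWORD=postgres postgres:16)

# 2. Use docker inspect to read back the host port Docker mapped to 5432
PORT=$(docker inspect -f \
  '{{ (index (index .NetworkSettings.Ports "5432/tcp") 0).HostPort }}' "$CID")

# Pause until the instance comes online (this is most of the ~15s)
until docker exec "$CID" pg_isready -U postgres >/dev/null 2>&1; do sleep 1; done

# 3. Run the canonical DDL against the fresh instance
PGPASSWORD=postgres psql -h localhost -p "$PORT" -U postgres -f db/schema.sql

# 4. Run SchemaCrawler with host networking so it can reach Postgres,
#    mounting doc/ so the rendered diagram lands on the host filesystem
docker run --rm --network host -v "$PWD/doc:/out" schemacrawler/schemacrawler \
  /opt/schemacrawler/bin/schemacrawler.sh \
  --server=postgres --host=localhost --port="$PORT" \
  --database=postgres --user=postgres --password=postgres \
  --info-level=standard --command=schema \
  --output-format=png --output-file=/out/db-schema.png

# Tear the throwaway instance down
docker rm -f "$CID" >/dev/null
```

The `-P`/`docker inspect` dance avoids hard-coding a port, so the script can't collide with a locally running Postgres.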
I'd also used wwwsqldesigner[0] (possibly a different fork) with some custom hacks to infer relationships by naming conventions where foreign keys were not present. It produced a quick ERD for getting started on a project. I've always wanted a more complete (non-PHP) version of this tool; perhaps there is one in these comments.
This tool is fantastic. In a previous life, I used it to dynamically analyze and extract users from a multi-tenant database and determine the proper sort order for reinsertion in a different database on a potentially different (JDBC-compatible) platform.
There's quite a bit of detail about command-line options, etc., under the Features menu. It could be better organized, but I don't think it's as bad as you suggested.
The similarity really is in name only. Web crawlers scrape web pages and follow links to find additional pages to scrape. A tool like this inspects your database and determines your schema, relationships, etc.
Disclosure: I work on AWS Glue