So, I'm an application developer, for the most part. I write stuff and I don't know a whole lot about databases other than how to create some tables/collections, do some basic indexing, and read the slow query log. Even running the explain commands, I have to spend half a day reading the docs and then I've forgotten everything after a few weeks.
I want to know: Can someone tell me how/where to focus my efforts to learn more? In my day job, I use mostly MongoDB these days, but also some MySQL. I have a few resources that I'm aware of:
I don't know when to use a relational database and when to use NoSQL. I don't know the differences between the various NoSQL databases. I don't know the ins-and-outs of any of the systems and I don't know how to determine what path to go for high availability, geo-distribution, replication, etc.
I find it all very overwhelming and would like to learn more. I'd like to be able to pull plugs in things and see my systems stay up (which, I'm sure would require intimate knowledge of at least one system). Does anybody have any suggestions? How valuable do you think this stuff is?
The book "Seven Databases in Seven Weeks" may be a good start. It may not answer all of your questions, but it does a good job summarizing various types of databases. Honestly you can read through it in a day or two - I don't think there's much need to install the DB's and do the exercises unless you're looking to explore a certain DB in more detail.
I originally skipped over the first chapter on Postgres but went back and read it and learned a few things (too bad there's no mention of the recently added json and jsonb data types).
Hey! So I'm not an expert, but I'm learning more every day.
It sounds like you're conflating three (or more) problems -- knowing if your data is relational in its nature, understanding how to efficiently retrieve the data you're looking for and understanding system administration/architecture tasks for a particular system.
The reason why I bring this up: my first job out of college was as a (very) junior Oracle DBA. At that particular job, that meant knowing some stuff about data retrieval but knowing much more about performance tuning and the architectural idiosyncrasies of Oracle and lots of Linux stuff.
Later I moved into startupland and necessarily learned quite a bit more about the data storage and retrieval patterns popular in web applications.
My point is: you might want to pick a particular aspect of databases that you want to learn about and really focus on that -- just like you would for a new programming language or framework. The way I usually do it is by finding a project that seems just a little too hard to be easily in reach and learning everything I can to make it happen.
i dont have much experience with NoSQL, but the typical use is for document storage where structures can have varying levels of depth and differing schemas or different schema versions. personally, i think a lot of the NoSQL benefits are hype for most use-cases. relational databases will get you very far when used correctly.
a good method of learning is to use some really large dataset and come up with some questions you want answered about the set, then write sql to get that information out. if your queries take a long time, begin indexing and looking at how the queries can be rewritten to return results faster.
i cannot recommend a good, large interconnected/relational data source, but something like zipcode databases [1] can get you started. you can also play with geo-spatial indexing to find zipcodes within x miles of each other. [2]
I want to know: Can someone tell me how/where to focus my efforts to learn more? In my day job, I use mostly MongoDB these days, but also some MySQL. I have a few resources that I'm aware of:
- http://aphyr.com/tags/Jepsen - Have browsed this and found it interesting.
- MySQL reference manual http://dev.mysql.com/doc/#manual
- MongoDB Manual docs.mongodb.org/manual/
- High Performance MySQL - I purchased this book but never got past the first chapter. Not to say that I can't...
- use-the-index-luke.com/ - Rarely read this, but have given it a half-hearted start from time to time.
- http://www.mysqlperformanceblog.com/ - ditto
I don't know when to use a relational database and when to use NoSQL. I don't know the differences between the various NoSQL databases. I don't know the ins-and-outs of any of the systems and I don't know how to determine what path to go for high availability, geo-distribution, replication, etc.
I find it all very overwhelming and would like to learn more. I'd like to be able to pull plugs in things and see my systems stay up (which, I'm sure would require intimate knowledge of at least one system). Does anybody have any suggestions? How valuable do you think this stuff is?