I am having trouble understanding what a "Serverless Database" is. When I search for the term, I get hype, not a definition. For example:
"What is Serverless Database?
Serverless Database is a prerequisite for Serverless Computing. These are specially designed for the workloads which are unpredictable and can change rapidly. What’s more? This allows you to pay only for the database resources you use, on a second-by-second basis. ..."
I would really appreciate it if someone would give us a clear definition of this term.
Basically, I understand serverless in this context as abstracting away everything related to managing and maintaining the database, i.e. VMs, containers, the OS, DB processes, etc.
So you can set up a PlanetScale MySQL DB and use it just like a normal MySQL instance, but keep adding data, from a small set of records all the way up to petabyte volumes, without doing anything beyond sending the data through your MySQL connector.
In theory it should just keep performing well, from 100 user records for your new startup up to the scale of running parts of Slack. No choosing bigger and bigger AWS RDS instances as you scale, no autoscaling strategies for traffic surges, no worrying about replicas for performance, etc.
As someone who is honestly quite resistant to parts of the serverless paradigm, I find this offering actually appealing. I prefer running my own fleet of VMs and a traditional PHP/Nginx-type stack, but I have already moved to AWS RDS to abstract away some of the replication complexity needed for a highly available DB with minimum hassle. This seems like the logical next step, and despite being allergic to the kind of hype you mention finding, this is something I'd definitely try out before moving other parts of my infrastructure onto anything like Lambda.
Now we need separate access controls, separate networking tools, separate monitoring and diagnostics. It's becoming apparent to me that this kind of stuff is the scam of the century.
There is no clear definition with universal agreement. It's a hype-y term applied with... varying levels of rigor.
However, roughly speaking, "serverless" rolls together 3 features:
1. Fine grained pay-per-use (e.g. pay for a query by the number of rows scanned)
2. The pricing dial goes down to zero when usage is small enough.
3. You generally don't control VM/instance-level scaling, but something closer to the abstraction level of the product being claimed as "serverless". For example, in PlanetScale you get no control over how many MySQL instances actually run your queries. This is great for reducing operational complexity but not so great for controlling performance. Performance tends to be quite opaque: for example, there's nothing I can find in PlanetScale's docs about latency and throughput. The operational benefits are real, though. It's a tradeoff.
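To make features 1 and 2 concrete, here is a minimal Python sketch of pay-per-use billing under an assumed price per million rows scanned. The rate, the `Query` type and `usage_bill` are hypothetical illustrations, not PlanetScale's or any other provider's actual pricing, and storage charges are ignored.

```python
from dataclasses import dataclass

# Hypothetical rate for illustration; real providers publish their own units and prices.
PRICE_PER_MILLION_ROWS_READ = 1.00  # dollars (assumed)


@dataclass
class Query:
    sql: str
    rows_scanned: int


def usage_bill(queries: list[Query]) -> float:
    # Feature 1: the charge is proportional to work done (rows scanned).
    # Feature 2: no queries means no compute charge, so the dial reaches zero.
    total_rows = sum(q.rows_scanned for q in queries)
    return total_rows / 1_000_000 * PRICE_PER_MILLION_ROWS_READ


print(usage_bill([]))  # 0.0
print(usage_bill([Query("SELECT ...", 250_000), Query("SELECT ...", 750_000)]))  # 1.0
```

Contrast this with an instance-based bill, which is a function of hours the machine exists rather than of the queries sent to it.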
This is good info, thanks. I have some cloud infra experience, so I am interested in knowing how they keep the data stored and remove the "query" servers when not in use.
Possibly some kind of EBS-equivalent storage that is attached to the VM when it boots up? I wonder whether that trades more points of failure for operational simplicity?
It's just a database service. You don't run the servers. You pay someone to do that and you just connect to it and use it and someone else maintains the servers, storage, scaling, backups, patches, etc., according to some SLA and other terms.
There are indeed servers somewhere. "Serverless" is misleading cloud speak for the otherwise easily understood concept of a service.
They almost never shut down the actual VMs; they are simply reallocated. (Semi-)automatic scaling exists behind the scenes, but once a provider becomes popular enough, the VMs become more expensive to stop.
Amazon Aurora has an HTTP endpoint too, for its serverless offering.
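One way to picture "reallocated rather than shut down" is a warm pool of already-running nodes that are lent to databases and handed back when idle. This is an illustrative Python model under that assumption only; `WarmPool`, `acquire` and `release` are hypothetical names, and providers do not publish their internals in this form.

```python
from collections import deque


class WarmPool:
    """Toy model (an assumption, not any provider's actual design): a shared
    pool of running compute nodes that are lent to databases and returned
    when idle instead of being shut down."""

    def __init__(self, node_ids: list[str]) -> None:
        self.idle: deque[str] = deque(node_ids)  # running but unassigned nodes
        self.assigned: dict[str, str] = {}        # database name -> node id

    def acquire(self, db: str) -> str:
        # Reuse the node already serving this database, if there is one.
        if db in self.assigned:
            return self.assigned[db]
        if not self.idle:
            # A real system would presumably boot more nodes here.
            raise RuntimeError("warm pool exhausted")
        node = self.idle.popleft()
        self.assigned[db] = node
        return node

    def release(self, db: str) -> None:
        # The node keeps running and goes back to the pool; only the
        # assignment ends. The database's data is assumed to live in
        # separate, persistent storage.
        self.idle.append(self.assigned.pop(db))


pool = WarmPool(["vm-1", "vm-2"])
print(pool.acquire("db-a"))  # vm-1
pool.release("db-a")         # db-a goes idle; vm-1 stays up
print(pool.acquire("db-b"))  # vm-2 (vm-1 is now at the back of the queue)
```

In this model, "scaling to zero" for a customer costs the provider nothing extra, because the node is simply handed to someone else.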
I use S3 with DuckDB files in it as a sharded OLAP database of sorts. No transactions or joins, though, but my workload is write once, read plenty. Hoping Databricks' serverless SQL is a good fit once it emerges from beta.
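A sharded layout like this needs a stable rule for which file a record lives in. Below is a minimal sketch assuming hash partitioning on a string key; the bucket name, shard count and `shard-NN.duckdb` naming are hypothetical, and how the files are then opened from DuckDB is left out.

```python
import hashlib

NUM_SHARDS = 16                      # assumed shard count
BUCKET = "example-analytics-bucket"  # hypothetical bucket name


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash: Python's built-in hash() is randomized per process for
    # strings, so it can't be used to route data consistently across runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_uri(key: str) -> str:
    # One DuckDB file per shard; with a write-once, read-many workload each
    # file can be built once and then opened read-only by many readers.
    return f"s3://{BUCKET}/shards/shard-{shard_for(key):02d}.duckdb"


print(shard_uri("customer-42"))
```

Hash partitioning spreads keys evenly but means a query that isn't keyed on the partition key has to touch every shard, which is tolerable for read-heavy OLAP scans.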
You don't provision and manage the infrastructure associated with your DB. Someone else does it for you and you just focus on querying/storing the data and whatever else your business requires you to do.
Not provisioning & managing is part of it, but the pricing model is important too. e.g. RDS and Mongo Atlas are managed database services so you're not provisioning and managing the infra, but you are paying for dedicated machines and their sizes, etc.
>but you are paying for dedicated machines and their sizes
I see this as provisioning and managing as well, but yes, I see your point. Ultimately it's about the user caring only about their data and nothing else (to the extent possible).
With a dedicated Mongo DB cluster (server-full), you are paying for a certain cluster size, per hour. It doesn't matter if you read or write any data to it. You're paying for a machine with a specific amount of storage capacity and cpu. Use it or lose it.
With DynamoDB (considered serverless), you're charged based on the read and write throughput and how much you have stored (GB-month). If you don't store any data, you're not charged for unused storage capacity the way you are with a dedicated cluster. You don't think about the instance size, amount of memory, etc.
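The two billing models described above can be written down directly. This sketch uses made-up placeholder rates (not actual MongoDB Atlas or DynamoDB prices), an approximate 730-hour month, and simplified request counting that ignores per-item size rules.

```python
# All rates are made-up placeholders, not real MongoDB Atlas or DynamoDB prices.
CLUSTER_PRICE_PER_HOUR = 0.50    # dedicated cluster, billed whether used or not
PRICE_PER_MILLION_READS = 0.25
PRICE_PER_MILLION_WRITES = 1.25
PRICE_PER_GB_MONTH = 0.25
HOURS_PER_MONTH = 730            # approximate


def dedicated_monthly() -> float:
    # "Use it or lose it": cost depends only on time, not on traffic or data.
    return CLUSTER_PRICE_PER_HOUR * HOURS_PER_MONTH


def usage_monthly(reads: int, writes: int, stored_gb: float) -> float:
    # Cost scales with requests made and with data actually stored.
    return (reads / 1e6 * PRICE_PER_MILLION_READS
            + writes / 1e6 * PRICE_PER_MILLION_WRITES
            + stored_gb * PRICE_PER_GB_MONTH)


print(dedicated_monthly())                           # 365.0
print(usage_monthly(0, 0, 0))                        # 0.0
print(usage_monthly(10_000_000, 2_000_000, 20))      # 10.0
```

With these assumed numbers a light workload is far cheaper on the usage model, but because that cost grows without bound as traffic grows, a sufficiently busy, steady workload eventually costs more than the flat cluster.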
Serverless means infrastructure as code and you're in as much control of the system as they want, which can be near 0, and they'll tell you it's for your benefit to have no control. Why use a bank vault when you can have a crypto wallet.
I'm curious about what kind of workloads people would be using a serverless DB for?
If I understand it correctly, say my DB is only used to process 1M transactions from 9am to 1pm: I'm basically only paying for the load during that time, whereas with a managed DB I'm paying for it to be on 24/7. With most serverless offerings there is a penalty, though: cold starts.
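The 9am-1pm scenario can be simulated with a toy scale-to-zero model: compute stays warm for an assumed idle timeout after each request, is billed only while warm, and the first request after a pause is a cold start. `simulate` and the 300-second timeout are assumptions for illustration, not any particular provider's behavior.

```python
from collections.abc import Iterable


def simulate(request_times: Iterable[float], idle_timeout: float = 300.0) -> tuple[float, int]:
    """Return (billed seconds, cold starts) for a toy scale-to-zero database."""
    billed = 0.0
    cold_starts = 0
    start = end = None  # current warm interval [start, end]
    for t in sorted(request_times):
        if end is None or t > end:
            # Compute had paused (or never started): close out the previous
            # warm interval and pay a cold start for this request.
            if end is not None:
                billed += end - start
            cold_starts += 1
            start = t
        end = t + idle_timeout  # each request keeps compute warm a while longer
    if end is not None:
        billed += end - start
    return billed, cold_starts


# One request per minute from 09:00 to 13:00 (seconds since midnight).
billed, cold = simulate(range(9 * 3600, 13 * 3600 + 1, 60))
print(billed / 3600, cold)  # about 4.08 billed hours and 1 cold start, versus 24 hours always-on
```

If the same requests were spread thinly across the whole day with gaps longer than the timeout, every request would be a cold start, which is the latency penalty the comment mentions.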
So is it purely an economic play for esoteric DB workloads? If my DB is realistically going to be churning for 24/7 anyway, why would I ever use serverless DBs?
Lower operational burden - if you're using a serverless db maintaining servers is one less thing to think about. Same with any infrastructure - you don't have to worry about orchestration, load balancing, authentication, etc.
As I understand it, they've separated the storage and compute parts of the database and are running them in a scalable way such that they can automatically add more compute or storage as needed.
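A rough illustration of why separating the layers helps: compute capacity can be sized from current load alone, independently of how much data sits in storage. The per-unit throughput, the bounds and `desired_compute_units` are assumed values and a hypothetical name, not a description of any specific product.

```python
import math

# Assumed numbers for illustration only.
QPS_PER_COMPUTE_UNIT = 500  # queries/second one compute unit can serve
MIN_UNITS = 0               # a floor of 0 is what lets compute scale to zero
MAX_UNITS = 64              # cap on how far compute scales out


def desired_compute_units(current_qps: float) -> int:
    # Compute is sized from load alone. Stored data volume does not enter
    # the formula, because storage is a separate layer that grows (and is
    # billed) on its own.
    units = math.ceil(current_qps / QPS_PER_COMPUTE_UNIT)
    return max(MIN_UNITS, min(MAX_UNITS, units))


for qps in (0, 1_200, 1_000_000):
    print(qps, desired_compute_units(qps))
```

With these assumptions, zero load maps to zero units, 1,200 QPS to 3 units, and very high load is capped at 64.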