Timescale Announces New Database Cloud (timescale.com)
119 points by manigandham on Oct 5, 2021 | 54 comments


Clickhouse went corp a couple weeks ago, Timescale goes fully managed, Snowflake, Dremio, DBX ... :popcorn:

Apache Arrow, K8s, ML analytics have given rise to another DB War.

The end of NoSQL was the realization that SQL had a good reason for existing in most cases. Now we have massively distributed SQL in many flavours. I wonder what the hard lessons will be this time?

I'll wager small data companies will be spending good money on vastly overpowered engines... I wonder what else tho?


Exciting things are happening, but to set the record straight, we (Timescale) went "all in" on the cloud over a year ago :-)

[0] https://blog.timescale.com/blog/building-open-source-busines...


Using NoSQL is fast, but it requires you to know how you want to read the data before you start writing it. That's not the case with analytics, where new queries pop up all the time.

As a result, NoSQL never really caught on with the analytics/BI crowd. SQL is always king there, if you discount Excel :)
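To make the access-pattern point concrete, here is a minimal Python sketch with made-up event data. A NoSQL-style layout keyed by user answers "events per user" with one lookup, but an unplanned question like "total per day" has to scan every key. The same rows in SQL (stdlib `sqlite3` standing in for any SQL engine) answer the new question with just a new `GROUP BY`.

```python
import sqlite3
from collections import defaultdict

# Made-up events: (user_id, day, amount)
EVENTS = [
    ("alice", "2021-10-01", 5),
    ("bob", "2021-10-01", 3),
    ("alice", "2021-10-02", 7),
    ("carol", "2021-10-02", 2),
]

# NoSQL-style layout: data is grouped under the key we planned to read by.
by_user = defaultdict(list)
for user, day, amount in EVENTS:
    by_user[user].append((day, amount))

def user_total(user):
    """The planned query: a single key lookup."""
    return sum(amount for _, amount in by_user[user])

def day_totals_kv():
    """An unplanned query: nothing is keyed by day, so scan every user."""
    totals = defaultdict(int)
    for items in by_user.values():
        for day, amount in items:
            totals[day] += amount
    return dict(totals)

# SQL: the same rows serve both questions; a new question is just new SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, day TEXT, amount INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)

def day_totals_sql():
    rows = conn.execute(
        "SELECT day, SUM(amount) FROM events GROUP BY day ORDER BY day"
    ).fetchall()
    return dict(rows)
```

In a real NoSQL store the usual fix is a second, denormalized copy keyed by day, which is exactly the "know your reads up front" cost.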


InfluxDB was doing a project in Rust/Arrow as well, right?

I think the lesson might be: be careful whom you take funding from, and when. All these managed services smack of VCs looking for ARR growth and a big exit...


We are yes - InfluxDB IOx. It will be available in our cloud platform.


I’d love to see some overlap/collab with Supabase here. If Timescale is providing databases that I don’t have to monkey with and that scale to my workload, and Supabase is providing Firebase-like ergonomics and DX, there’s some strong synergy here.

Supabase as an add-on to Timescale Cloud, or Timescale as a supported alternate provider for Supabase, would be compelling for get-it-done devs and teams out there.


There are a lot of interesting Postgres front ends that could be useful: Supabase, Metabase, PostgREST, Hasura.

https://supabase.io/

https://www.metabase.com/

https://postgrest.org/

https://hasura.io/

Of course Grafana too, but that goes beyond just PostgreSQL.


Unfortunately Timescale's license is the blocker - https://news.ycombinator.com/item?id=25299128


yes, we'd love to offer Timescale as an extension, but still waiting for buy-in from the Timescale team on this.

If we can't manage that, we'll eventually do something else for time-series data; e.g., we could do a Foreign Data Wrapper to ClickHouse tables.


Why use FDW if you can work with ClickHouse directly? What is the use case?

By the way, I can see there are some already, but I don't use them: https://github.com/Infinidat/infi.clickhouse_fdw/ https://github.com/adjust/clickhouse_fdw


> Why use FDW

It would be beneficial to couple transactional data (Postgres) with OLAP data (CH). There are a few other things coming in PG which might mitigate the need for CH too (pluggable storage, zheap), so we're not going to rush this part.


We have 2 tables that are good candidates for Timescales; the others are fine with Postgres. We run join queries across those 2 tables and others. What do you suggest for this? Migrate everything to Timescale, or have two databases (Timescale for the 2 tables and PG for the rest)?


TimescaleDB is a PostgreSQL extension, so you don't have to choose. Just convert these two tables to hypertables and leave the rest as is.


As far as I know, you can seamlessly join Timescale tables with normal Postgres tables in a query. Timescale is activated on a per-table basis.
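A minimal sketch of what that conversion could look like, assuming hypothetical table names (`sensor_readings`, `device_events`, `devices`) and a `time` column. It only builds the SQL strings. `create_hypertable(...)` with `migrate_data => true` is the TimescaleDB 2.x form for a table that already holds rows, but check the docs for your version first, since migrating a large table can take a while and locks it.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def _check(name: str) -> str:
    """Reject anything that isn't a plain SQL identifier (no quoting tricks)."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name

def hypertable_sql(table: str, time_column: str = "time") -> str:
    """SQL to convert an existing, already-populated table into a hypertable.

    Any unique index or primary key on the table must include the time column.
    """
    return (
        f"SELECT create_hypertable('{_check(table)}', '{_check(time_column)}', "
        "migrate_data => true);"
    )

# Hypothetical schema: the two time-series tables become hypertables,
# everything else (e.g. devices) stays a regular Postgres table.
MIGRATION = ["CREATE EXTENSION IF NOT EXISTS timescaledb;"] + [
    hypertable_sql(t) for t in ("sensor_readings", "device_events")
]

# Joins work as usual: the hypertable is queried like any other table.
JOIN_QUERY = """
SELECT d.name, avg(r.value) AS avg_value
FROM sensor_readings r
JOIN devices d ON d.id = r.device_id
WHERE r.time > now() - interval '1 day'
GROUP BY d.name;
"""
```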


> good candidates for Timescales

By this I assume you want columnar access along the time dimension.

There are a bunch of columnar options out there (Timescale being one). You can operate with hybrid row + column access.

https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-...

https://swarm64.com/post/postgresql-columnstore-index-intro/
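To illustrate the row vs. column distinction, here is a toy Python sketch with made-up readings: averaging one field in a row layout walks every record, while in a column layout it only reads that field's array. Python lists are just a stand-in for on-disk pages here; real columnar storage (e.g. Timescale's compression or Citus columnar) also benefits from compressing similar values stored next to each other.

```python
# Made-up readings stored row by row: (time, device_id, temperature, humidity)
ROWS = [
    (1, "a", 20.5, 40.0),
    (2, "a", 21.0, 41.0),
    (3, "b", 19.0, 55.0),
]

def avg_temperature_rows(rows):
    """Row layout: each record is stored together, so this walks whole records."""
    return sum(r[2] for r in rows) / len(rows)

def to_columns(rows):
    """Column layout: one contiguous list per field."""
    time, device_id, temperature, humidity = (list(c) for c in zip(*rows))
    return {"time": time, "device_id": device_id,
            "temperature": temperature, "humidity": humidity}

def avg_temperature_columns(cols):
    """Only the temperature column is read; the other fields are never touched."""
    temps = cols["temperature"]
    return sum(temps) / len(temps)
```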


Does your Postgres host allow you to install the Timescale extension? If so, I'd go with that and specify the two tables as hypertables.

If not, it may be worthwhile to migrate to a host that does support TimescaleDB, if not Timescale's managed product itself.


What measurable test could I do to verify I would have improvements enabling timescale on those two tables?
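One way to make this measurable: run the same representative queries against the same data before and after converting the two tables, and compare latency percentiles. The Python harness below is a hypothetical sketch. `run_query` is whatever callable executes one of your queries (e.g. through your Postgres driver), and the warm-up and repeat counts are arbitrary.

```python
import math
import statistics
import time
from typing import Callable, Dict

def measure(run_query: Callable[[], object], repeats: int = 20,
            warmup: int = 3) -> Dict[str, float]:
    """Time a query callable; return median and p95 latency in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run_query()  # warm caches so cold first runs don't skew the numbers
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # nearest-rank p95: the sample at position ceil(0.95 * n)
    p95 = samples[math.ceil(0.95 * len(samples)) - 1]
    return {"median_ms": statistics.median(samples), "p95_ms": p95}

def speedup(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Ratio above 1.0 means the 'after' setup was faster at the median."""
    return before["median_ms"] / after["median_ms"]
```

Besides latency, it's worth comparing `EXPLAIN (ANALYZE, BUFFERS)` output for the same query, and on-disk size (`pg_total_relation_size()` for the plain table vs. TimescaleDB's `hypertable_size()`), especially if you plan to enable compression.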


I find ClickHouse with proper time-series encoding and tiered data storage to be a better alternative than Timescale. There were also some issues with ingestion speed in TimescaleDB.

Can we think of Timescale as OLTP for time-series data, with guarantees from Postgres, and ClickHouse as OLAP for time-series data?

Continuous Aggregates is a neat feature though.
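Roughly, a continuous aggregate keeps pre-computed per-time-bucket results and refreshes only what changed since the last refresh. The Python sketch below illustrates that idea with hourly sum/count buckets. It is a conceptual toy, not how TimescaleDB implements the feature: the real thing also tracks late or out-of-order writes and re-materializes the affected buckets, which this sketch simply ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

BUCKET_SECONDS = 3600  # hypothetical bucket width: one hour

def bucket_start(ts: int, width: int = BUCKET_SECONDS) -> int:
    """Floor a Unix timestamp to the start of its bucket (like time_bucket())."""
    return ts - ts % width

class ToyContinuousAggregate:
    """Per-bucket sum/count that is refreshed incrementally.

    Assumes rows arrive in time order with unique timestamps; late or
    out-of-order rows at or before the watermark are silently skipped.
    """

    def __init__(self) -> None:
        self.sums: Dict[int, float] = defaultdict(float)
        self.counts: Dict[int, int] = defaultdict(int)
        self.watermark: Optional[int] = None  # newest timestamp materialized

    def refresh(self, rows: Iterable[Tuple[int, float]]) -> None:
        """Fold in only rows newer than the watermark."""
        newest = self.watermark
        for ts, value in rows:
            if self.watermark is not None and ts <= self.watermark:
                continue  # already counted in an earlier refresh
            b = bucket_start(ts)
            self.sums[b] += value
            self.counts[b] += 1
            newest = ts if newest is None else max(newest, ts)
        self.watermark = newest

    def averages(self) -> Dict[int, float]:
        return {b: self.sums[b] / self.counts[b] for b in sorted(self.sums)}
```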


Clickhouse is a really great piece of technology, especially for general OLAP.

But for time-series workloads, we've found that the results are quite close, with TimescaleDB outperforming Clickhouse for many different types of query workloads. We'll be sharing our results soon.


Nice to hear! We are looking for cases where ClickHouse is not as fast as it could be. You can share your experience with ClickHouse in GitHub issues or by mail: if something was not intuitive, didn't work as expected, or lacked functionality.


This is quite interesting. Some of the features and upcoming features look really nice. One-click database forks would be really handy. VPC peering is nice, not just for security, but also so AWS doesn't fleece you with bandwidth charges ($0.01/GB on your side and timescale side in the same AZ, and worse if not. For big data systems that can be a lot.)
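A back-of-the-envelope version of that math in Python, using the commenter's $0.01/GB rate billed on both sides and a hypothetical 50 TB/month of traffic (decimal GB). As another reply in this thread notes, same-AZ VPC-peering transfer became free in May 2021, so this mostly applies to traffic not covered by that change.

```python
def transfer_cost_usd(gb: float, per_side_rate: float = 0.01) -> float:
    """Data-transfer cost when the sender and the receiver are each billed per GB.

    per_side_rate is the $0.01/GB figure quoted in the comment (an assumption,
    check current AWS pricing); both ends pay it, so the effective rate doubles.
    """
    return gb * per_side_rate * 2

# Hypothetical volume: 50 TB/month between the app VPC and the database.
monthly = transfer_cost_usd(50_000)  # 50,000 GB * $0.02/GB, about $1,000
```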

I wonder how the storage works: https://docs.timescale.com/cloud/latest/scaling-a-service/#p...

Seems like to scale it separately it would need to be EBS not local instance storage? I wonder if magnetic or SSD? That does constrain the performance, especially for queries.


Yes, we use EBS SSD on the backend so we can scale up storage separately from the instance. Our Cloud performance metrics are based on this backend, so the short answer is no, it doesn't constrain perf. The constraint I see right now is that we are currently mostly on GP2, with a planned migration to GP3, which will allow independent control of IOPS and throughput. There are certain uncommon situations where customers need to bump up performance beyond what the normal GP2 perf steps allow.
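For context on the GP2 vs. GP3 point, here is a rough Python sketch of the EBS baselines as documented by AWS around 2021 (treat the figures as assumptions and check the current EBS docs): gp2 gives 3 IOPS per GiB with a floor of 100 and a cap of 16,000, ignoring burst credits, while gp3 starts at 3,000 IOPS regardless of size. It shows why on gp2 the only way to get more IOPS is to buy more storage.

```python
import math
from typing import Optional

# Figures assumed from AWS EBS documentation circa 2021.
GP2_MAX_IOPS = 16_000

def gp2_baseline_iops(size_gib: int) -> int:
    """gp2: 3 IOPS per GiB, never below 100, capped at 16,000 (burst ignored)."""
    return min(max(3 * size_gib, 100), GP2_MAX_IOPS)

def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 volume (GiB) whose baseline reaches target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("gp2 baseline tops out at 16,000 IOPS")
    if target_iops <= 100:
        return 1  # the 100 IOPS floor already covers it
    return math.ceil(target_iops / 3)

def gp3_iops(provisioned_iops: Optional[int] = None) -> int:
    """gp3: 3,000 IOPS baseline at any size; extra IOPS are provisioned separately."""
    return max(3_000, provisioned_iops or 0)
```

For example, getting 6,000 baseline IOPS on gp2 means a 2,000 GiB volume even if the data is much smaller, whereas on gp3 it is a separate knob.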

To tie GP2/3 back into the serverless vs. DBaaS concepts: we are looking at auto-scaling IOPS/throughput performance, while also allowing more direct access so that customers could control performance via APIs and manage it on their own.

(timescaler here)


And to follow up: Today your IOPS will automatically scale with storage, and you can set storage to autoscale with your usage. Under the covers, we continually look for ways we can optimize that for our users, without them having to think of this.

So for most users, they can "set it and forget it" (easy, scalable): Launch a default cloud database, which starts out small, autoscaling on by default. As they start inserting data, the system automatically detects when it starts to approach current capacity and automatically increases (without any downtime) to more capacity & IOPS.
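A minimal Python sketch of the kind of "detect, then grow" rule described above. The 85% threshold, 1.5x growth factor, and 16,384 GiB ceiling are hypothetical values chosen for illustration, not Timescale's actual policy.

```python
import math

def next_capacity_gib(used_gib: float, capacity_gib: int,
                      threshold: float = 0.85, growth: float = 1.5,
                      max_gib: int = 16_384) -> int:
    """Hypothetical storage-autoscaling rule.

    Leave capacity alone while usage is under `threshold` of it; otherwise grow
    by `growth` until usage is back under the threshold or `max_gib` is reached.
    """
    if used_gib < threshold * capacity_gib:
        return capacity_gib
    target = math.ceil(capacity_gib * growth)
    while used_gib >= threshold * target and target < max_gib:
        target = math.ceil(target * growth)
    return min(target, max_gib)
```

A scheduler would call this periodically with the current usage and resize only when the returned value differs from the current capacity.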

But power users get greater control. That means you can resize manually, and, as mentioned above, we'll likely also give you control over IOPS independent of capacity. That's the flexibility and control we think developers also want...or at least want to know is there if they need it.

(Timescaler & post author)


That makes sense to me. Are you saying you're not IO limited usually though?

Some kinds of database workloads would run more efficiently on the local NVMe storage. But that does come with lots of operational considerations.


Data transfer over VPC peering within the same AZ has been free since May 2021.


Looks cool. Will this work with postgraphile, hasura, or prisma? They seem to suggest it does but I wonder if anyone has tried it. Postgraphile relies on row-level policies [0] and not even all hosted postgres instances work with it in that respect.

[0]: https://www.graphile.org/postgraphile/security/


Generally speaking, if it works with Postgres, it works with Timescale. Hasura even has documentation on how to use TimescaleDB with Hasura [1].

That said, I'd be curious to hear about other folks experiences.

Disclaimer: I work at Timescale.

[1] https://hasura.io/blog/using-timescaledb-with-hasura-graphql...


Whatever happened to the Coral Content Distribution Network (from a co-founder of Timescale)? Such a cool project.


:wave:

CoralCDN ran from about 2004 - 2015. Eventually, its need was pretty much negated by the rise of free CDN services (e.g., Cloudflare) or just lots of SaaS services that supported user-generated content. For example, in late 2004, many of the amateur videos of a large Indian Ocean earthquake & tsunami were shared using CoralCDN, but that soon went to YouTube. Podcasters like "This Week in Tech" were using CoralCDN, but those went to freemium podcasting services. And so on.

What eventually took CoralCDN "down" was that the academic platform on which the ~1000 servers ran, PlanetLab (https://planetlab.cs.princeton.edu/) was end-of-life'd.

But taking a step back, CoralCDN was originally conceived as a peer-to-peer CDN, but there were a lot of web security issues that made that difficult. If folks are interested, I talk about some of these issues in this 2009 retrospective [0], and also outline a browser-based P2P CDN in this workshop paper that could address them (and actually make it P2P, but secure) [1]. Still, I think the economics of CDNs (and transit costs) have simply changed, such that most P2P architectures don't make sense today the way they did in the early 2000s.

But thanks for the kind words!

[0] https://www.cs.princeton.edu/~mfreed/docs/coral-nsdi10.pdf

[1] https://www.cs.princeton.edu/~mfreed/docs/firecoral-iptps09....


You just made mfreed's day :-)


> The future is serverless, but not database-less.

I'm a fan of Timescale but definitely not a fan of "serverless" as a phrase.

That phrase is just abstracting "other people's computers" one additional degree, in a borderline meaningless way.

There's still a server. How is that serverless? It's just a server managed in a more indirect way.

Please convince me I am wrong here.


Your view is the pedant’s view. Of course there are servers. There always will be, to some degree. They’re just not your servers to manage (e.g. update, scale) or pay for. You don’t own them, or even rent them, so you can eliminate the hardware details from your thought process.


> Your view is the pedant’s view

You still face constraints, and the platform design is informed by the hardware it runs on.


I like serverless. I know that cloud providers can run my code and auto scale it however is needed. I don't want to worry about how that actually works. I don't want to auto scale servers or think about RAM, etc.

I also like cloud services like Timescale. I just want a database of a certain size. How it runs? I don't care at all as long as it works.


> I don't want to worry about how that actually works.

The problem is that while you don't want to know, you end up having to. As soon as you hit the limitations, you have to work around them. As soon as what you are doing becomes expensive for reasons you ignored before, you have to go back and rewrite. And because it's a black box, you also have to guess and poke at it until it does what you want.


A fun gotcha like this is that if you write to an S3 bucket fast enough, they auto-upgrade you to a more powerful set of servers... but this incurs a short outage. Unfortunately, this outage will happen exactly when you don't want it to: when the system is busy with a high write volume!

You just have to "know" that these things happen, and because it's all serverless, it's essentially invisible and out of your control.
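The usual client-side defense against this kind of throttling is to retry with exponential backoff and jitter (S3 typically signals it with HTTP 503 "SlowDown" responses). A minimal Python sketch, assuming a hypothetical `ThrottledError` that your client raises on such responses; the attempt count and delays are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling reply such as S3's 503 SlowDown."""

def with_backoff(op: Callable[[], T], max_attempts: int = 6,
                 base_delay: float = 0.1, max_delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `op` on throttling, using exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # random delay in [0, cap]
    raise RuntimeError("unreachable")
```

The jitter spreads retries from many writers apart, so they don't all hammer the bucket again at the same instant while it is scaling.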


Also a fan and user of Timescale, and I hate the serverless phrase.

I'm pretty sure everyone is pushing it so they can make money from SaaS. But if this allows them to give it away for free, then I'm all for it (just won't be using it).


(Timescale co-founder)

I also used to hate the phrase. But now I get it. And it's not about marketing, but about the idea that the developer _shouldn't have to worry about the server_.

For example, when you run a DB on a classic DBaaS, you have to worry about CPU, memory, storage provisioning. But when you use a classic SaaS service (e.g., Stripe, Twilio), you don't think about servers at all - but rather just how much you are consuming.

The Serverless model for databases is aiming to apply a SaaS like experience but for DBaaS. Where you don't need to think about provisioning, but about consumption.

What we (try to) do in this article is to push beyond that serverless paradigm. We believe there are real drawbacks to the "serverless black box" architecture that the industry is building, and what (we believe) developers (including ourselves) need is more of a "transparent box."

Hope this helps. But "serverless" also feels a little like "horseless carriages". I suspect in 5 years we'll have a better term that describes this concept for what it _is_, not what it _isn't_.


With Timescale you still have to deal with and think about database connections, nodes and whatnot. IMO, a truly serverless database is something like DynamoDB, which just gives you an HTTP endpoint and takes care of the rest.


The point of the blog post is that Timescale cloud isn't totally "serverless", because for some 20% of use cases, you do want to think about nodes and other database internals.

From the blog post: "So today’s serverless data platforms are not familiar or flexible. But further, black boxes are never truly easy and worry free: you never know if there are any skeletons lurking in the proverbial closet, just waiting to cause your service to fall over."

(Timescale employee here)


I think calling it serverless perfectly captures the operational semantics.

Your target market will understand.

Ignore the HackerNews naysayers who are stuck in the past.


FWIW, you guys are losing me with your sales pitch here. I am looking for technical products, and I interpret "serverless" as marketing garbage, not as "I don't have to worry about the server".

Call it a managed DB as a service and you're more honest about your product offering. You're substituting marketing-speak for honesty in a technical product: bad plan, imo.


I am an engineer and to me "serverless" means I don't pay money when I'm not using it. Which I like.


I made another comment to this point, but I think you described it much more succinctly.


I'm the author of the blog post, and I can also say that I'm not thrilled with the serverless phrase either. It's turtles all the way down =)

But for better or worse, it seems like the industry has adopted it.

So, we're trying to explain actually how our vision is different.

We don't want to hide developers' services from them completely behind these black-box abstractions. Instead, we want to provide something that's similarly easy and scalable (and automated), but that allows developers greater control, flexibility, and understanding when they want it.


Doh, guess I should have read the article ;).


The best way to be different is to be different. Don't even say serverless at all. You're falling into the trap of being like everyone else.


Serverless should probably just be renamed to "pay-per-request/CPU time/whatever" or something like that. By and large, most if not all "serverless" databases or data services (like Kafka/Pulsar) are just multi-tenant deployments where you're billed on the metrics your tenant generates. This is unlike RDS, where you provision an instance that you pay for as long as it's running.
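Under that framing, the trade-off comes down to a break-even calculation. A Python sketch with hypothetical prices (an hourly instance rate and a per-million-requests rate, neither taken from any real vendor), ignoring storage and data-transfer charges:

```python
HOURS_PER_MONTH = 730  # average hours in a month

def provisioned_cost(hourly_rate: float) -> float:
    """Instance-style billing: every hour the database is up is billed."""
    return hourly_rate * HOURS_PER_MONTH

def consumption_cost(requests: int, price_per_million: float) -> float:
    """Pay-per-request billing: idle time costs nothing."""
    return requests / 1_000_000 * price_per_million

def break_even_requests(hourly_rate: float, price_per_million: float) -> float:
    """Monthly request count at which the two models cost the same."""
    return provisioned_cost(hourly_rate) / price_per_million * 1_000_000
```

With a hypothetical $0.10/hour instance ($73/month) against $1.25 per million requests, anything under roughly 58.4 million requests a month would be cheaper on the pay-per-request side.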


"Gay people aren't even always happy! Gay means happy!"

Language changes. "Serverless" clearly doesn't mean "computers aren't involved", but "the computers are someone else's responsibility". It's a marketing term, just like "cloud", and it's unlikely to go away.

As with "hackers vs. crackers", "Linux vs. GNU/Linux", "copyright infringement vs. piracy", and other similar scenarios, the ship has sailed here.


If Gay used to mean straight you'd have a point.


There is a lot of marketing in that blog post, but I feel the TL;DR is that it is a rebranding of Forge (their second in-house cloud offering) that will eventually replace their outsourced TimescaleDB cloud (their first offering, run by Aiven).

Am I correct? If so, what is the plan for existing customers of those services? Especially since Forge didn't support clouds other than AWS last time I checked.


Not exactly right - seems like we could have clarified the difference more :-)

We have two cloud products, Timescale Cloud (which is what this post is discussing), and Managed Service for TimescaleDB (MST), which is what you are also referencing.

Also, as we say in the post:

  Some of you may remember that we launched the first “Timescale Cloud” 2.5 years ago, as the world’s first fully-managed time-series database-as-a-service on AWS, GCP, Azure. That product is alive and well, and fully supported as before, but is now called “Managed Service for TimescaleDB”.
We're investing in and maintaining both. They are just different products, depending on what you are looking for.


Sounds like a lot of people don't like the "serverless" term. And I agree, it's not great.

So here's an "RFP" - any suggestions for a better term to describe this consumption-based experience, where you don't need to worry about servers (and ideally, you don't pay for what you don't use)?


I like “fully managed”, but that doesn’t go far enough to imply the consumption-based model you’re getting at.

To me, serverless is fine. Yes, it’s a buzzword, but it makes sense.



