
> When you have thousands of metrics, Uptime Kuma and a bunch of friends won't help you.

This is fair! Actually, Uptime Kuma still doesn't support a multi-user mode (e.g. one admin user, multiple users that can edit values/setup, maybe some that can only view data).

That said, I'm also at the scale where it makes perfect sense to use something this simplistic, and there are few things that give me more joy than running/building a container image and getting working software in less than an hour, which at my scale also covers most if not all "day 2" concerns.

> Detailed App/Infra metrics can also run on your own infrastructure, unlike status pages that should use something independent. In your case, if your local Mattermost fails, you will get 0 notifications.

Another fair point! That said, there's very little preventing you from choosing the most boring and stable multi-cloud setup that you can find: a Docker container for the software, behind a reverse proxy and connected to the aforementioned infrastructure monitoring.

Has the Docker service failed? I'll get a notification. Docker bridge network down? I'll get a notification. Containers failing health checks? I might still need to work on this, but it's totally doable with minimal effort.
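
For the health-check part, a tiny script on the host is honestly enough. Here's a minimal sketch in Python, assuming the Docker CLI is on the box; the webhook URL and the JSON payload shape are placeholders for whatever notification channel you'd actually use (Mattermost webhook, mail gateway, ...):

  #!/usr/bin/env python3
  """Sketch: poll Docker container health and push a webhook alert."""
  import json
  import subprocess
  import urllib.request

  ALERT_URL = "https://chat.example.internal/hooks/placeholder"  # hypothetical webhook

  def container_health() -> dict:
      """Return {container_name: health_status} for every running container."""
      ids = subprocess.run(
          ["docker", "ps", "-q"], capture_output=True, text=True, check=True
      ).stdout.split()
      if not ids:
          return {}
      fmt = "{{.Name}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}"
      out = subprocess.run(
          ["docker", "inspect", "-f", fmt, *ids],
          capture_output=True, text=True, check=True,
      ).stdout
      return {name.lstrip("/"): status
              for name, status in (line.split() for line in out.splitlines() if line)}

  def notify(message: str) -> None:
      """POST a plain-text alert to the (placeholder) webhook."""
      req = urllib.request.Request(
          ALERT_URL,
          data=json.dumps({"text": message}).encode(),
          headers={"Content-Type": "application/json"},
      )
      urllib.request.urlopen(req, timeout=10)

  if __name__ == "__main__":
      for name, status in container_health().items():
          if status == "unhealthy":
              notify(f"Container {name} failed its health check")

Run it from cron or a systemd timer every minute and you're most of the way there.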

Of course, there's also a lot of variability in how you can lay everything out - for example, I run some of my personal infrastructure on nodes that are in another room at my place, and most other parts off rented VMs from a semi-local company. My homepage, for example, has both Uptime Kuma and an external monitoring service connected to it, just to compare how believable those uptime figures are.

At work, though? For development/test environments, Uptime Kuma on a separate server is enough (say, if you have one that controls the container cluster or aggregates other metrics, you might as well spin up a simple container there), or whatever other software package is necessary, like Apache SkyWalking.

For production? Frankly, depending on what you're running, you might as well get a team of people together and come up with something that has proper redundancies in place, as well as a multi-cloud strategy.




> Has the Docker service failed? I'll get a notification. Docker bridge network down? I'll get a notification.

If you rely on cloud services, yes. If you run your own infra, then no, you will have to set up metrics/alerts for that in a custom manner, as with everything else. So the thing you mention is NOT boring technology (which should be promoted) but outsourcing (which should NOT be promoted in general).

> For development/test environments, Uptime Kuma on a separate server is enough

It doesn't matter, as your network will fail. There is nothing worse than a status page showing false positives.


> If you rely on cloud services, yes. If you run your own infra, then no, you will have to set up metrics/alerts for that in a custom manner, as with everything else.

Consider this example:

  I have Zabbix on server A.
  I have an e-mail server on server B.
  I have Uptime Kuma on server C.
  I have an instance of Mattermost on server D.
  I have the application that I want to monitor on server E.
In a zero trust model (or even just running WireGuard) there is very little preventing you from having each of these on a different cloud provider. There's also very little preventing you from having A-C on a few boxes that sit under your desk/colocated somewhere, but keeping D in the cloud.

Thus, one can reason about the potential failure states:

  If servers C-E run into issues (say, Docker issues), I'll get a notification thanks to A and B (Zabbix sending an e-mail).
  If servers C-E are utterly unreachable (say, network interface problems), I'll get a notification thanks to A and B (Zabbix sending an e-mail).
  If servers A-B or E run into issues, I'll get a notification thanks to C and D (Uptime Kuma sending a message).
  In the current configuration, I wouldn't be protected against a compound failure of A-D (both Zabbix and Uptime Kuma down), but those might as well run on different clouds, with different orchestrators.
Of course, you can set up failover and redundancy options, but by that point you're probably also looking into distributed file systems like GlusterFS or Ceph for any backing storage, and right now I don't need that complexity.
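
To make the cross-checking concrete, here's a rough sketch of what the A side could run against C (and, mirrored, what C could run against A): probe the other monitor over HTTP and, if it doesn't answer, alert through your own channel. Hostnames and addresses below are placeholders for the servers in the example, not a real deployment:

  #!/usr/bin/env python3
  """Sketch: A probes C over HTTP and alerts by e-mail through B if C is down."""
  import smtplib
  import urllib.error
  import urllib.request
  from email.message import EmailMessage

  OTHER_MONITOR = "https://uptime-kuma.server-c.example.internal"  # placeholder (C)
  SMTP_HOST = "mail.server-b.example.internal"                     # placeholder (B)

  def is_reachable(url: str, timeout: int = 10) -> bool:
      """Treat any HTTP response (even an error page) as 'the host is alive'."""
      try:
          urllib.request.urlopen(url, timeout=timeout)
          return True
      except urllib.error.HTTPError:
          return True   # the server answered, just not with a 2xx
      except OSError:
          return False  # connection refused, DNS failure, timeout, ...

  def alert_by_mail(subject: str, body: str) -> None:
      """Send a plain e-mail through server B, the channel independent of C-D."""
      msg = EmailMessage()
      msg["Subject"] = subject
      msg["From"] = "zabbix@server-a.example.internal"
      msg["To"] = "ops@example.internal"
      msg.set_content(body)
      with smtplib.SMTP(SMTP_HOST) as smtp:
          smtp.send_message(msg)

  if __name__ == "__main__":
      if not is_reachable(OTHER_MONITOR):
          alert_by_mail(
              "Uptime Kuma unreachable",
              f"{OTHER_MONITOR} did not answer; the C-D pair may be down.",
          )

The mirrored direction is just the same kind of check pointed at Zabbix, with the alert going out through Mattermost instead.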

Furthermore, as you said, you can also rely on cloud services in addition to what you already have, so should A-D go down, E will still be monitored by another solution, though that's also hardly necessary for most things.

Hell, for all I care, I might as well have a Raspberry Pi on my desk that pings the servers, checks SSH connections, checks running Docker images, does a curl call, and blinks and beeps aggressively when something isn't okay on servers that sit in a data center somewhere. It's not like there's any shortage of options. Of course, you can also go in the opposite direction and pick whatever is good enough, such as having A-B as a single server (or VM) and C-D as a single server (or VM), so as not to overcomplicate things.
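
If anyone actually wants to build that desk-side watchdog, a rough sketch along those lines (ping, SSH port check, HTTP probe, and a terminal bell standing in for the blinking/beeping hardware; every host name below is made up) could look like this:

  #!/usr/bin/env python3
  """Sketch of the 'Raspberry Pi on the desk' watchdog."""
  import socket
  import subprocess
  import urllib.error
  import urllib.request

  # placeholder hosts and health-check URLs
  HOSTS = {
      "server-a.example.internal": "https://server-a.example.internal/",
      "server-e.example.internal": "https://app.example.internal/health",
  }

  def ping(host: str) -> bool:
      """One ICMP echo request via the system ping binary."""
      return subprocess.run(
          ["ping", "-c", "1", "-W", "2", host],
          stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
      ).returncode == 0

  def ssh_port_open(host: str, port: int = 22) -> bool:
      """Plain TCP connect to the SSH port; enough to see whether sshd answers."""
      try:
          with socket.create_connection((host, port), timeout=5):
              return True
      except OSError:
          return False

  def http_ok(url: str) -> bool:
      """The 'curl call': any HTTP response counts as reachable."""
      try:
          urllib.request.urlopen(url, timeout=10)
          return True
      except urllib.error.HTTPError:
          return True
      except OSError:
          return False

  if __name__ == "__main__":
      for host, url in HOSTS.items():
          if not (ping(host) and ssh_port_open(host) and http_ok(url)):
              print(f"\a*** {host} is not okay ***")  # the beeping part

Swap the print for GPIO calls and you have the blinking LED too.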


I know you can have all that :) All I'm saying is that you must rely on externals if A-E are all on the same network, as it may go down. Then your e-mails or other notification channels won't work.

Be that as it may, I think people generally tend to overdo redundancy. One can usually tolerate most regular services going down for an hour or two once every couple of years...


> All I'm saying is that you must rely on externals if A-E are all on the same network, as it may go down.

Thankfully, it's not too hard to take advantage of multiple networks in a hybrid/multi-cloud setup nowadays! Though, depending on the necessary access controls and auditing, such a setup might require slightly more work.

You do bring up an excellent point, though: that shared network is a serious single point of failure in many systems out there. I've personally seen plenty of setups like that (the majority of them, actually), and I suspect that in many cases it's done for ease of use/convenience, even if it may lead to downtime.

Of course, in some cases downtime is acceptable, so I can't deny that it can also make sense to choose such a simpler setup - for example, having your own company's applications and monitoring for development environments all on the same network.

Though if this topology is retained at scale, things can get a bit interesting. On a similar note, I recall Bryan Cantrill giving an interesting presentation, "Debugging Under Fire: Keep your Head when Systems have Lost their Mind", which talked about restarting their whole data center and the implications of that: https://youtu.be/30jNsCVLpAE



