This does seem incredibly useful for service-oriented architectures. As I understand it, it's basically a per-application, per-machine monitoring library for quickly detecting problems up and down the network stack.
However, it also does load balancing. Doesn't that defeat the purpose a little bit? If your monitoring tool is the same as your load balancing tool, then who's monitoring the load balancer? :) I might be misunderstanding the architecture here.
So correct me if I'm wrong, but it seems to maintain a 'mesh' of persistent connections that it proxies all inter-service communication through. So whenever you need to speak to another service, you don't need to worry about the extra cost of setting up and tearing down a new connection. It also says that it handles automatic retries and global rate limiting (https://lyft.github.io/envoy/docs/intro/what_is_envoy.html).
Because all inter-service requests are going through Envoy, it is really easy to keep incredibly detailed stats about network health, request success rate & more.
Envoy also doing the load balancing does not defeat the purpose, because it provides extremely detailed stats for ALL THE THINGS. They reported that it helped them find problems much more quickly, instead of having to check service code, EC2 networking, or the ELB. Essentially, by putting one layer with better stats reporting in front of everything, troubleshooting seems like it would be easier.
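To make the "stats come for free once everything goes through one layer" point concrete, here is a toy sketch in Python. To be clear, this is not Envoy's API or how Envoy is implemented; `ToyProxy`, `flaky_send`, and the stat names are all made up for illustration. It only shows that a single layer doing round-robin load balancing and retries is also the natural place to count attempts and failures per host.

```python
import itertools
from collections import Counter


class ToyProxy:
    """Hypothetical stand-in for a sidecar proxy. NOT Envoy's API."""

    def __init__(self, hosts, max_retries=2):
        self.hosts = list(hosts)
        # Round-robin load balancing: hand out hosts in a repeating cycle.
        self._next_host = itertools.cycle(self.hosts)
        self.max_retries = max_retries
        # Every request passes through here, so stats are a side effect.
        self.stats = Counter()

    def request(self, send):
        """Call send(host); on ConnectionError, retry on the next host."""
        for attempt in range(self.max_retries + 1):
            host = next(self._next_host)
            self.stats[f"{host}.attempts"] += 1
            try:
                result = send(host)
            except ConnectionError:
                self.stats[f"{host}.failures"] += 1
                continue
            self.stats["requests.success"] += 1
            if attempt > 0:
                self.stats["requests.retried"] += 1
            return result
        self.stats["requests.failed"] += 1
        raise ConnectionError("all attempts failed")


def flaky_send(host):
    # Hypothetical backend call: host "b" is always down.
    if host == "b":
        raise ConnectionError(host)
    return f"ok from {host}"


proxy = ToyProxy(["a", "b"])
replies = [proxy.request(flaky_send) for _ in range(3)]
```

After these three requests, `proxy.stats` shows failures piling up on host "b" while every caller still got a reply, which is roughly the "find the bad hop quickly" benefit described above.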
This is hardly my field, but my understanding was the opposite: it's a load balancer that reports statistics so that you can figure out what's happening. That doesn't make your concern less valid, but it does change how everything is framed.