"Messsage queue - a queue that receives the messages that represent the HTTP request. the acceptor accepts HTTP requests and converts the requests into messages that goes into the message queue. The messages then gets picked up by the next component, the responder. The implementation of the message queue is a RabbitMQ server."
Alrighty then. Someone has never scaled RabbitMQ vs. a basic HTTP service. If raw scalability is what you're looking for w/ a polyglot backend, an edge service that accepts HTTP and turns those requests into Thrift structs (or similar) to RPC to various polyglot services might be better for you. This is the model most use.
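For what it's worth, the edge-service model looks roughly like this; UserService and its getUser method stand in for hypothetical Thrift-generated stubs:

    # Edge service handing an already-parsed HTTP request off to a backend over Thrift RPC.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from user_service import UserService   # hypothetical code generated from a .thrift IDL

    def fetch_user(user_id):
        transport = TTransport.TBufferedTransport(TSocket.TSocket("users.internal", 9090))
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = UserService.Client(protocol)
        transport.open()
        try:
            return client.getUser(user_id)   # hypothetical RPC method on the backend service
        finally:
            transport.close()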
However, I'm unsure how this'll be more 'performant' than picking the right technology from the start and architecting wisely. Generally, the more performant you want something to be the simpler you build it and only compromise where necessary. Thrift/RabbitMQ are definitely complexity compromises.
Complexity is the bane of scalability.
Additionally, if you need pure scalability, you generally have purpose-built services for each "responder", load balanced over multiple instances. Pretty similar to this, minus the message queue.
I imagine having a message queue in the middle of your HTTP response path could lead to some nasty latency spikes too. Much better to drop a request with a 503 than have the next N spin for minutes while workers chug through it. Especially if you're taking in 10K req/s.
Last thought: the benchmarks are lacking in detail and could use a more thorough job.
Switching from a monolithic framework like Rails to a number of independent communicating services that handle different responsibilities is a classic step in scaling. However, to the best of my knowledge that transition usually involves moving to a mostly-custom setup dependent on the app's specific needs.
It's not clear exactly what functionality Polyglot provides beyond, say, raw RabbitMQ, but if it can find a way to encode best practices in a service-oriented architecture it could be a handy tool for developers going through this process for the first time.
I had to stop and check where this guy works. Two companies I know of have just moved to something fairly similar.
A request is turned into a JSON message and pumped into a queue with a signature declaring the type of message it contains. Service discovery reads from the queue and allocates a service to handle it, shuffling it onto another queue (and, if necessary, spinning up the service). The service picks up the queued item, processes it, and hands it back; the new message may be the response (in which case it gets handed back to the caller) or another service call (in which case the handler is discovered and the message assigned to its queue).
It's SOA based on messaging and a basic pipeline. Except they don't call it that.
Thankfully the applications in question do not have low response time as a core criterion.
If the goal is the ability to have certain routes handled by different languages/systems, you could achieve this with reverse proxying (from e.g. nginx) [1].
That way you can leverage any existing language frameworks and run them as standard HTTP responders. No need to work with a queue (and add it to the stack).
You can still limit the HTTP methods each proxy responds to as well [2].
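Roughly, the nginx side of that looks like this (upstream addresses and paths are made up):

    # Inside a server { ... } block: route paths to backends in different languages
    # and restrict the HTTP methods a given location will accept.
    location /reports/ {
        proxy_pass http://127.0.0.1:8081;   # e.g. a Python app
    }

    location /api/ {
        limit_except GET POST {             # GET/POST only (HEAD is implied by GET)
            deny all;
        }
        proxy_pass http://127.0.0.1:8082;   # e.g. a Ruby app
    }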
Thanks for the suggestion, it's a good one. There are a few cases where a message queue can be advantageous: (1) persistence, (2) several responders can work on the same request in parallel, (3) adding/removing responders dynamically according to the load.
These are not common/generic use cases but would be useful under particular circumstances.
* I could be wrong about (3) -- I'm not very experienced with reverse proxies.
The implementation today is as a task queue, which removes the request from the queue once a responder acknowledges it, but it could be a pub-sub model, where a number of independent responders can work on the same message in parallel and only one responder needs to return a response. In this case, persisting the message in the queue is useful.
An alternative is to chain the responders where one responder can leave a message in the queue for another responder, and the final responder returns the response.
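As a rough sketch of the pub-sub variant with pika (exchange and queue names are illustrative, and this isn't how the current prototype works):

    # Each responder binds its own throwaway queue to a fanout exchange,
    # so every responder receives a copy of the same request message.
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.exchange_declare(exchange="requests", exchange_type="fanout")

    result = ch.queue_declare(queue="", exclusive=True)   # server-named, per-responder queue
    ch.queue_bind(exchange="requests", queue=result.method.queue)

    def handle(channel, method, properties, body):
        print("responder saw:", body)   # in practice only one responder would publish the reply

    ch.basic_consume(queue=result.method.queue, on_message_callback=handle, auto_ack=True)
    ch.start_consuming()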
Polyglot is still experimental though, and the current implementation is a prototype.
What makes this a problem? The initial request to Polyglot is also "push".
The rest of the web works on "push" too; pull in this case would only help if you don't care that a request could take a long time (seconds) to resolve.
I didn't see mention of it, but what happens if a message is not responded to? How does Polyglot handle time outs?
That was my thought as well, though I never really understood the point of Mongrel2.
It's basically a reverse proxy that speaks to upstream application servers using a custom protocol over ZeroMQ instead of HTTP over TCP? Why is this better than just using HTTP?
I haven't used Mongrel2, but I have used ZeroMQ, so I can try to answer this question:
> Why is this better than just using HTTP?
In SOA, most of your services aren't going to be exposed publicly. HTTP is a great protocol for public-facing servers, but it's a very clunky one. For private services, it's a pretty big benefit (performance, scalability and ease of parsing) to skip HTTP and use something else. ZeroMQ gives you several messaging patterns that you would never get from HTTP.
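For illustration, the simplest of those patterns (REQ/REP) between two internal services with pyzmq; the endpoint and payload are invented:

    # Internal request/reply over ZeroMQ instead of HTTP: no headers to build or parse,
    # just framed messages over a socket. Both ends shown in one script for brevity.
    import zmq

    ctx = zmq.Context()

    rep = ctx.socket(zmq.REP)          # the "service" side
    rep.bind("tcp://*:5555")

    req = ctx.socket(zmq.REQ)          # the "caller" side (normally another process)
    req.connect("tcp://localhost:5555")

    req.send(b"lookup user 42")
    print(rep.recv())                  # b'lookup user 42'
    rep.send(b"user 42: alice")
    print(req.recv())                  # b'user 42: alice'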
That makes sense if your services do not follow the request-response pattern (e.g. background workers). But what if they do? In Polyglot, the services very clearly follow a request/response pattern because they're handling web traffic. What sense, then, does it make to use a message queue?
Persistence? Makes no sense for web traffic. Even if the message is persisted to disk, it's useful for a few minutes at most before the user gives up and closes the tab.
Language-independence? You don't need a message queue for that. You can do that with regular HTTP.
I'm not sure what your comment is getting at. I think HTTP is fine for the Web, and any service exposed and designed for use by the Web is going to have to use HTTP. But if you're going to use a framework like Polyglot, you're going to have several services and any of those services not directly communicating with the Web doesn't need to speak HTTP.
What I'm getting at is why those services shouldn't speak HTTP. I understand that they don't need to speak HTTP per se, but I get the feeling that your comment is implying that such services should speak a non-HTTP protocol, while I think that it's fine even if those services speak HTTP.
We've had a similar argument internally. In load testing, we're not bounded by the speed of our JSON parsing, and if we were we'd probably just swap that endpoint to use a different message body provider before moving to a totally binary RPC model.
Especially in the Java world, where Jackson + Afterburner is fast enough for most cases. Protobufs/Thrift will smoke it in most performance tests, true enough, but when you're waiting on a database or algorithm to run, what's JSON serialization?
There are certainly benefits at extreme scale, but not enough to justify the loss of tooling that comes with it until necessary.
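A rough back-of-the-envelope check (Python's json rather than Jackson, but the point generalizes): serializing a small payload costs microseconds, while a database round trip usually costs milliseconds.

    # Illustrative only: compare JSON encoding time to a typical DB round trip.
    import json
    import timeit

    payload = {"id": 42, "name": "alice", "tags": ["a", "b", "c"], "scores": list(range(50))}

    per_call = timeit.timeit(lambda: json.dumps(payload), number=10000) / 10000
    print("json.dumps: ~%.1f microseconds per call" % (per_call * 1e6))
    # versus a typical 1-10 ms database query: several orders of magnitude smaller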
Yup, I agree with that and in a team environment that may be very valuable. HTTP is definitely the "JavaScript of protocols." However, I don't think learning other protocols is a ton of overhead, and learning is always a good thing ;)
We use Mongrel2 at Fanout.io. I love it. It's part of our general ZeroMQ architecture of having lots of components that each do one thing well.
ZeroMQ is an improvement over plain HTTP primarily because you can use fewer internal pipes by interleaving requests and responses over the same sockets.
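Roughly what that interleaving looks like with pyzmq DEALER/ROUTER sockets: several logical requests share one connection, and replies are routed back by peer identity (endpoints and payloads invented):

    # One DEALER socket carries many outstanding requests to a ROUTER socket,
    # so a single fd multiplexes what would otherwise be many HTTP connections.
    import zmq

    ctx = zmq.Context()

    router = ctx.socket(zmq.ROUTER)
    router.bind("tcp://*:5556")

    dealer = ctx.socket(zmq.DEALER)
    dealer.connect("tcp://localhost:5556")

    for i in range(3):                               # fire off requests without waiting
        dealer.send_multipart([b"", b"request %d" % i])

    for _ in range(3):
        identity, empty, body = router.recv_multipart()
        router.send_multipart([identity, b"", b"reply to " + body])

    for _ in range(3):
        empty, reply = dealer.recv_multipart()
        print(reply)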
What do you mean by internal pipes? Do you mean file descriptors? Why is reducing the number of file descriptors a good thing? After all, the total amount of traffic stays the same.
If you have a lot of fds, then you have fd polling issues to deal with for high performance.
To be honest I've not actually benchmarked anything comparing the throughput of an efficient fd poller to a multiplexed pipe. It just feels nice to avoid the problem, considering the kind of effort that can go into those pollers.
This might be a weird edge case, but in Rust, 0MQ bindings came really quickly, as its protocol was easy. This let Rust do web stuff _far_ earlier than the still-ongoing work to actually write a server in Rust itself.
At least in Ruby world, we steal Mongrel's parser over and over and over and over. Seems like it might be easier to not have to do that, and just use Mongrel2 with a 0MQ library. I haven't actually done this, though...
AMQP/Rabbit can use TLS (even if not leveraged in this project, something not possible for ZeroMQ IIRC). Also, define "centralized". An intelligent AMQP client properly falls over to another node in a cluster, and Rabbit offers good durability guarantees while still allowing quorum and tolerating node failure.
But we're talking about HTTP requests here. The RabbitMQ durability is useless for this use case. Suppose the RabbitMQ node fails, and the admin notices after 2 minutes and solves the problem after 2 more minutes. The user who initiated the HTTP request has long pressed Stop in his browser.
True, durability wouldn't help here, however, individual Rabbit queues can fail over to alternate nodes if the master node for that queue stops responding.
"forcing the deliberate use of different programming languages"
...wait, what? I don't see how this solves anything. It's like asking American schoolchildren to learn English, Russian, and Chinese before doing math. Makes no sense.
I'd like to share my experience. I started to learn programming by taking Coursera and other MOOCs in Python, Ruby, JavaScript and C. I feel that my understanding was greatly enhanced by simultaneously getting to know various languages.
This is cool, but the naming is unfortunate as Polyglot is already a somewhat popular library for doing internationalization in Javascript: https://github.com/airbnb/polyglot.js
Seems like this is pretty much what every system that scales a web application with a message queue already does.
With Polyglot are there standard SDKs for responders or acceptors?
I think we should relate this to that OCaml MirageOS thing and the idea of a common knowledge representation for program generation. The queue pattern here has a fairly close correspondence with some common OOP patterns.
We are repeating the same patterns over and over in different contexts for different applications. I think our semantic representations and programming languages are good enough that if we created a common dictionary and referenced it, rather than restating everything in different forms, we could get much better code reuse.
It's kind of amazing how often SOA is rediscovered by people. It's not at all exotic either. It's fairly common in enterprise applications but I guess maybe most people don't write enterprise apps?
My first thought when I read the article was "Did he just reinvent the wheel?"
Mine used a node proxy instead of a message queue, but same basic idea. It makes scaling and changing languages so much easier.
Really, the trick is having a standard message protocol that everything abides by. Once you have that, building a proxy and frameworks around it is pretty trivial. I chose something similar to JSON-RPC and for what I wanted/needed it worked well.
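For concreteness, a JSON-RPC 2.0 style envelope looks like this (illustrative of the idea, not necessarily the exact protocol used here):

    # A uniform envelope: method name for routing, params for the payload,
    # and an id so the proxy can match responses back to requests.
    import json

    request = json.dumps({
        "jsonrpc": "2.0",
        "method": "users.get",       # which handler/service should process this
        "params": {"id": 42},
        "id": "req-1001",
    })

    response = json.dumps({
        "jsonrpc": "2.0",
        "result": {"id": 42, "name": "alice"},
        "id": "req-1001",            # correlates with the request above
    })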
It never saw any kind of scale, but it was a fun project.
Acceptors / Responders feel a bit like Python's WSGI model, with the addition of a queue in the middle for connecting gateways to applications. I suspect that the similarity extends to JSGI (JavaScript), SCGI, and PSGI (Perl).
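For comparison, the entire WSGI contract is one callable; a responder plays a similar role on the far side of the queue:

    # A complete WSGI application: the gateway calls this for every request,
    # much as a responder is handed every message pulled off the queue.
    def application(environ, start_response):
        body = b"Hello from a WSGI app"
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    if __name__ == "__main__":
        # run standalone with the reference server from the standard library
        from wsgiref.simple_server import make_server
        make_server("", 8000, application).serve_forever()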
So you can write new functionality in the popular new framework and keep the old one running. Whether that's a better choice... well, it depends, case by case.
However, this is a bit like a reverse proxy that load balances many different web apps. You could have different applications written in different languages serving requests to the same URL.
Furthermore I don't see anything here that load balancers haven't done since the '90s. Maybe I'm missing something but maybe that's why everybody is puzzled.
It's supposed to be more fine-grained than load balancers, and you should be able to scale up and down different parts of the same web app dynamically. That probably didn't come through well in the write-up.
They solve slightly different (architectural) problems and sit at completely different ends of the message queue library spectrum of "usability vs. customizability".
RabbitMQ is a "batteries included" solution. ZeroMQ is a roll your own sort of library. If you just want a message queue use RabbitMQ. If you want to build your own message queue system (with complex or specific requirements) use ZeroMQ.
Not really. I started with Ruby, Python and PHP, but it's very much possible with compiled languages like C/C++. There's a whole bunch of client software that allows you to communicate with RabbitMQ - https://www.rabbitmq.com/devtools.html