Toxiproxy – simulate network and system conditions for chaos testing

Arubis · on Oct 11, 2023

On a similar note, my favorite-named network testing tool: https://github.com/tylertreat/comcast

smusamashah · on Oct 11, 2023

And clumsy https://github.com/jagt/clumsy, which is Windows-only.

intelVISA · on Oct 11, 2023

Based on the name alone I just knew this was going to be written in Go.

lib-dev · on Oct 11, 2023

eatonphil · on Oct 11, 2023

Tools like this and Jepsen and chaosmonkey do fault injection on network and/or processes.

Are there any tools you like for storage fault injection?

Many database companies have an internal tool specific to their database for this. I think Scylla has a public tool for storage fault injection.

I've been experimenting with ptrace for storage fault injection myself. And want to try FUSE and some other tech out.

https://notes.eatonphil.com/2023-10-01-intercepting-and-modi...

dur-randir · on Oct 12, 2023

>Are there any tools you like for storage fault injection?

You can try dmsetup with dust/flakey targets.

baz00 · on Oct 11, 2023

We used this. Then we realised none of our guys had any idea how to build systems that are resilient to the failures we simulated. So they swept it under the rug and just cross their fingers that nothing happens.

mysterydip · on Oct 11, 2023

Sounds like a project I tested once. All UDP, which is fine, but they assumed that all packets would arrive, in order.

pvorb · on Oct 11, 2023

So they switched to TCP before they went live and everything worked fine?

mysterydip · on Oct 11, 2023

No, they kept UDP and crossed their fingers like parent :/

leetrout · on Oct 11, 2023

I can't say enough good things about Toxiproxy.

I will repeat my previous comment[0] about it:

> Toxiproxy is fantastic. I wish they supported a full configuration file in JSON or TOML or something but other than that it has been a lifesaver testing websockets.

0: https://news.ycombinator.com/item?id=32116969

Uptrenda · on Oct 12, 2023

Toxiproxy is based. It's basically a REST API that lets you spawn sub-servers that each act as a relay to some remote or local machine. From there, you can add different behaviors to a relay (through the REST API, addressed to that sub-server). These behaviors are called 'toxics', and they're designed to introduce more uncertainty into the relay they run on. For example, there's a toxic that delays packets by a fixed latency plus a random amount of jitter.
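To make the relay-plus-toxic idea concrete, here is a rough Python sketch that builds the two HTTP requests involved. It assumes the upstream Toxiproxy server on its default API address (localhost:8474) and the JSON fields described in its README; the proxy name "redis_relay", the ports, and the helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: Toxiproxy's HTTP API is reachable on its default address.
API = "http://localhost:8474"


def _post(path, body):
    """Build (but do not send) a JSON POST request to the Toxiproxy API."""
    return urllib.request.Request(
        f"{API}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_proxy_request(name, listen, upstream):
    """Request that creates a relay listening on `listen`, forwarding to `upstream`."""
    return _post("/proxies", {"name": name, "listen": listen, "upstream": upstream})


def latency_toxic_request(proxy_name, latency_ms, jitter_ms):
    """Request that adds a latency toxic (delay plus random jitter) to a relay."""
    return _post(
        f"/proxies/{proxy_name}/toxics",
        {
            "type": "latency",
            "stream": "downstream",  # affect data flowing back to the client
            "toxicity": 1.0,         # apply to every connection
            "attributes": {"latency": latency_ms, "jitter": jitter_ms},
        },
    )


def send(request):
    """Send a prepared request; needs a running Toxiproxy server."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

With a server running, something like send(create_proxy_request("redis_relay", "127.0.0.1:26379", "127.0.0.1:6379")) followed by send(latency_toxic_request("redis_relay", 1000, 250)) should give you a relay on port 26379 that adds roughly 1s ± 250ms to responses. Point your client at 26379 instead of 6379 and nothing else changes.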

You can add behavior to drop certain packets (like every N packets) or even to split messages up into multiple packets. TCP is a stream protocol, so technically a single send may arrive as multiple separate reads on the other side. This is why the recv() call in Beej's Guide to Network Programming gets built up into a wrapper that uses a while loop until it returns no data! I believe Beej does the same for send(): it assumes a send buffer of unknown length and that send() might not send all your data.
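For anyone who hasn't seen those loops, here is a minimal Python sketch of the same idea. This is not Beej's code (his is in C); it is a length-based variant where the caller knows how many bytes to expect, and the function names recv_exact and send_all are my own. Python's socket.sendall() already does the send side for you; it is spelled out here only to show the loop.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer than asked."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty read means the peer closed the connection early.
            raise ConnectionError(f"peer closed with {remaining} bytes outstanding")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_all(sock, data):
    """Keep calling send() until every byte has been handed to the kernel."""
    view = memoryview(data)
    while len(view) > 0:
        sent = sock.send(view)  # may accept only part of the buffer
        view = view[sent:]
```

Code written this way shouldn't care whether a message arrives in one piece or gets sliced into several by a toxic, which is exactly the kind of assumption the relay is good at flushing out.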

Toxiproxy is a very well-engineered tool because you can point any of your TCP client software at the right Toxiproxy relay end-point and you won't have to modify your code. What's significant about Toxiproxy is that it creates the basis to start thinking about the requirements for network code in a more scientific way. For example: let's say you write network code. The performance of your code depends on how your network behaves when you run the code. Consequently, it's extremely difficult to know if your software will actually handle adversity in the future. But with Toxiproxy you can start writing algorithms that handle this better.

Shameless plug: I am slowly working on my own networking stack in Python and I implemented my own version of toxiproxy (the server and client.) Otherwise you would have to download toxiproxy's server in Go for a project which isn't that easy to package and use as part of testing. Here's info about it on my docs page - https://p2pd.readthedocs.io/en/latest/built/toxiproxy.html The interface is still unstable and may change or have bugs. But there's integration tests at least, lel.

I read what another commenter here said about UDP. I wanted to say that UDP is a total pain in the ass to work with. It took me a long time to design parts of my code that do address lookups with STUN because of UDP. It gets easier though once you know what to expect.

tlarkworthy · on Oct 11, 2023

Oh I just started using this. I was a bit disappointed initially that it can't randomly drop connections through its probabilistic filters, but you can still achieve this with another process commanding it so it's stayed.

jsiepkes · on Oct 11, 2023

I think when doing integration testing with Toxiproxy, people often want it to be "predictable". Most of the time you just want to prove the application behaves correctly when it needs to reconnect. Otherwise you can get tests which sometimes fail and sometimes don't. That's also a valid test, but then it's a smoke test.

What I often do is create a small framework which uses a pseudo random number generator for all the "randomness". I then feed a list of seeds to this random number generator which makes the tests "random" but repeatable since you know the seed.
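A minimal sketch of that seeded approach in Python. The names (fault_plan, SEEDS, run_with_all_seeds) and the 20% drop probability are made up for illustration; in a real test each True in the plan would translate into some fault, such as an external process cutting the proxied connection at that step.

```python
import random

# Hypothetical fixed seed list: each seed is one "random" but repeatable scenario.
SEEDS = [1, 2, 3, 42]


def fault_plan(seed, steps, drop_probability=0.2):
    """Return a repeatable list of booleans; True means "inject a fault at this step"."""
    rng = random.Random(seed)  # private generator, so no global state is touched
    return [rng.random() < drop_probability for _ in range(steps)]


def run_with_all_seeds(check, steps=50):
    """Run `check(plan)` once per seed and report which seeds failed."""
    failed = []
    for seed in SEEDS:
        if not check(fault_plan(seed, steps)):
            failed.append(seed)
    return failed
```

When a run fails you get the seed back, and feeding that seed to fault_plan again reproduces the exact same sequence of faults.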

tlarkworthy · on Oct 11, 2023

Even the headline says it's for chaos testing, but I had to bring my own chaos. Still good though; it's very clear what it is doing in a docker compose setup.

Alifatisk · on Oct 14, 2023

Refreshing to see Ruby being used!

https://github.com/Shopify/toxiproxy-ruby

SeriousM · on Oct 11, 2023

I wonder why there is no remote control web interface. It would be just awesome to flip a switch to jitter/disable proxied connections and a slider to rate limit the connection. Should be easy, right?