I like the txtar file format mentioned in this video. (Described below.) I’ve worked with teams who would have rejected this idea from a junior dev because it might sound too simple. It’s easy to forget the inherent value of simplicity, which is at odds with impressive-sounding features or with demonstrating hard-won coding sophistication.
This is the complete txtar file format: One or more comment lines followed by zero or more virtual files. Each virtual file begins with a line like “-- filename --” and ends at the next such line. Specifically, these filename lines are those beginning with dash dash space and ending with space dash dash. That’s the whole format.
It avoids the need to have multiple real files all over the place when you want just one real file but you want your code to think in terms of multiple files. (E.g., have one txtar file that combines different blocks of test data.)
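For example, here is the whole thing in action — a single txtar literal holding two made-up fixture files, read back with golang.org/x/tools/txtar (a minimal sketch; the fixture names are invented):

```go
package main

import (
	"fmt"

	"golang.org/x/tools/txtar"
)

func main() {
	// One real file, two virtual files. Everything before the first
	// "-- name --" marker is the archive's comment.
	data := []byte(`Fixtures for the parser tests (this line is a comment).
-- input.json --
{"name": "alice"}
-- want.txt --
alice
`)

	archive := txtar.Parse(data)
	fmt.Printf("comment: %q\n", archive.Comment)
	for _, f := range archive.Files {
		fmt.Printf("%s: %q\n", f.Name, f.Data)
	}
}
```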
chezmoi (<https://chezmoi.io> or <https://github.com/twpayne/chezmoi>) has a couple dozen txtar tests. They are both amazing and completely frustrating to use, but I don't think that there would be a better way to test most of what chezmoi does without them.
Tom Payne (the creator and primary developer of chezmoi) has added some extra commands to the txtar context, which make things easier for certain classes of testing.
Agreed with most of this, but I’m skeptical of the rsc.io/script DSL approach. I’ll try it, though, because Russ is often right.
shameless advert: do you wish testify was implemented with generics and go-cmp, and had a more understandable surface area? Check out my small library, testy https://github.com/peterldowns/testy
shameless advert: do you want to write tests against your postgres database, but each new test adds seconds to your test suite? Check out pgtestdb, the marginal cost of each test is measured in tens of milliseconds, and each test gets a unique and isolated postgres instance — with all your migrations applied. https://github.com/peterldowns/pgtestdb
Just helped a junior set up dockertest on a new service; interested in checking this out. Are you familiar with test containers or dockertest? How would you compare pgtestdb to them? We use postgres/pgx.
Great question! pgtestdb requires that you somehow run a postgres server. It then connects to that server and handles all of the database creation. The README goes into more detail here, but basically you should easily be able to point pgtestdb at a dockertest-managed postgres server.
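Roughly, usage looks like this — a from-memory sketch, not copy-pasteable; the README has the real thing plus the list of bundled migrator adapters, and the connection details below are just examples:

```go
package example

import (
	"testing"

	"github.com/peterldowns/pgtestdb"
	"github.com/peterldowns/pgtestdb/migrators/golangmigrator"

	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
)

func TestQueries(t *testing.T) {
	t.Parallel()
	// Any running postgres works here: docker-compose, dockertest,
	// a devcontainer, or a CI service.
	migrator := golangmigrator.New("migrations") // or any pgtestdb.Migrator
	db := pgtestdb.New(t, pgtestdb.Config{
		DriverName: "pgx",
		Host:       "localhost",
		Port:       "5433",
		User:       "postgres",
		Password:   "password",
		Options:    "sslmode=disable",
	}, migrator)
	// db is a *sql.DB for a fresh, fully-migrated database that only this
	// test sees; it's cleaned up automatically when the test passes.
	_ = db
}
```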
One massive advantage of pgtestdb compared to using dockertest on its own is that it handles running migrations for you in a very efficient way that amortizes the cost to zero as the number of tests trends to infinity. This should be much faster than naively creating a new database and re-running the migrations for each test.
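To be concrete about how that amortization works (roughly; the database names below are invented, and pgtestdb's hashing, caching, locking, and cleanup are all omitted): migrations run once into a template database, and every per-test database is a cheap clone of that template.

```go
package example

import (
	"context"
	"database/sql"
)

// createTestDatabase sketches the underlying trick with the bookkeeping
// stripped out. `admin` is a connection to the server's maintenance
// database.
func createTestDatabase(ctx context.Context, admin *sql.DB, testDB string) error {
	// One-time setup, per unique set of migrations:
	//   CREATE DATABASE migrated_template;
	//   ...run all migrations against migrated_template...
	//   ALTER DATABASE migrated_template WITH is_template = true;

	// Per test: clone the template. Postgres copies it at the file level,
	// so the migrations are never re-run no matter how many tests you have.
	_, err := admin.ExecContext(ctx,
		`CREATE DATABASE `+testDB+` TEMPLATE migrated_template`)
	return err
}
```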
I happen to recommend using docker-compose over dockertest out of some slight personal preference that is one part separation of concerns (it's weird to me to have the test code handle the state of the postgres container), one part developer-experience related (it's nice for devs to be able to easily manage their postgres server instance with existing docker tools outside of golang code), one part infra related (it's nice to be able to use any method to provide and manage the postgres server, particularly in CI), and one part totally arbitrary.
Yeah, I think we will give it a try today. We have a mix of devs using devcontainers, running pg themselves, or working in a cloud environment. But they all have a pg instance running already.
For migrations we have a tool built in-house, but it looks like we just need to satisfy the interface and we should be good to use our tool. Any non-obvious tips on writing a migrator?
Yeah, in some cases (devcontainer) we are using docker-compose for the dev/pg/other systems and dockertest with docker-in-docker to spin up the ephemeral pg instance for tests. It works fine, but it is a little bit more complicated than I would like.
I’m happy to hear you’re gonna give it a try! No special tips beyond looking at the existing docs and migrators to see what they do. If you’re stuck please file an issue, I can help debug. I’d really like to know what works and what doesn’t. (It would also be nice to just hear that it all works!)
That project looks interesting. I'm using test containers and I'm thinking they could be combined. Currently we're setting up the db container for the test suite, running migrations once, then having each test run in a separate transaction and rolling back after the test completes.
Want to see if there are any speed improvements and better separation if we switch to pgtestdb. The transaction approach has worked well, but on some larger integration tests I have seen some conflicts due to our test fixture ids.
pgtestdb should work for your use case, and because each test gets its own database, you shouldn't run into any conflicts caused by different tests each trying to write data with the same ID. If it doesn't, please let me know and/or file a bug on GitHub.
One other nice thing about pgtestdb is that you can run all your tests in parallel, and if a test fails the related db will be left up for you to connect to via psql (helps a lot with debugging). Compared to your transaction setup, the only data in the db will be from the test that failed.
You’re not alone — I am sure other people have done this before me but I haven’t ever seen it published, and I rediscovered this technique independently. The logic behind this library has been ported to both scala and typescript without issue, I hope you and others can benefit from the idea even if you don’t use my implementation.
It's really great. It reminds me of using overlay filesystems for tests so that you can maintain a clean read-only template and then run all your tests non-destructively on the overlay.
pgtestdb looks great. Looks like it leverages Postgres template DBs. We use MySQL more often than Postgres; I may take a stab at creating something similar for MySQL. Nicely done!
I haven't really been able to find a way that works well on MySQL (well, MariaDB). Transactions are too unreliable and magical in MariaDB, Memory databases/tables have all sorts of caveats and limitations, and there isn't really anything like PostgreSQL's schemas or templates.
The best I could come up with was to run MariaDB server with libeatmydata, which I believe was pretty much intended for this. It's not very "works on any MariaDB server"-generic, but it's good enough.
dang. Only thing I've found so far is a `mysqldump --no-data` and then a restore from dump. Not fast at all. Maybe you could pre-provision a few thousand DBs this way in parallel and write a service to hand out DB handles...
Using tmpfs for MySQL/MariaDB's data directory helps tremendously. If you're using Docker natively on Linux, use `docker run --tmpfs /var/lib/mysql ...` and that'll do the trick. Only downside is each container restart is slightly slower due to having to re-init the database instance from scratch.
Tuning the database server settings can help a lot too. You can add overrides to the very end of your `docker run` command-line, so that they get sent as command-line args to the database server. For example, use --skip-performance-schema to avoid the overhead of performance_schema if you don't need it in your test/CI environment.
For MySQL 8 in particular, I've found a few additional options tend to help: --skip-log-bin --skip-innodb-adaptive-hash-index --skip-innodb-log-writer-threads
(Those don't help on MariaDB, since it already defaults to disabling the binary log and adaptive hash index; and it doesn't have separate log writer threads.)
A lot of other options may be workload-specific. My product Skeema [1] can optionally use ephemeral containerized databases [2] for testing DDL and linting database objects, so its workload is very DDL-heavy, which means the settings can be tuned pretty differently than a typical DML-based workload.
Wow! Someone brought up and demonstrated Go's "example tests" just yesterday to me and a group. He praised them highly for tech-adjacent fields where people aren't familiar with the software engineering practices behind modular, tested, or ready-to-share projects. We were discussing infrastructure, and it was enlightening to be reminded of the tech-adjacent fields that could benefit greatly from a few more of the practices common in the tech industry. I suppose the Joel Test and the 12 Factor App would be indicative guides.
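For anyone who hasn't seen them, an example test is just an ordinary function whose printed output is checked against a comment, and it doubles as documentation. A minimal sketch (Greet is made up; in real code it would live in the package's non-test files and ExampleGreet in a _test.go file):

```go
package greet

import "fmt"

// Greet is a stand-in for whatever you're documenting.
func Greet(name string) string { return "hello, " + name }

// ExampleGreet runs under `go test`: the printed output is compared
// against the "Output:" comment, and the example is rendered in the
// package documentation.
func ExampleGreet() {
	fmt.Println(Greet("world"))
	// Output: hello, world
}
```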
I worked on a team with a metric and associated goal of x% code coverage. So of course, even negotiating what x should be resulted in a week's worth of meetings over a quarter. And the "branch coverage versus line coverage" war was long and ugly. When the metric was finally in place, it was of course immediately gamed: test the getters and setters, avoid the hard-to-test, sensitive, integrated parts of the code.
This, in the same organization where another team was told by its VP, "don't schedule time for unit tests, we are too far behind schedule for a luxury like that".
All this to say, #3 "coverage is not a substitute for thinking" really hit me in the (grumpy) feels.
Separating test data and test logic is exactly how we have built our test infrastructure to satisfy the FDA; making clear test data that you can then query and report on gives you tons of nice features for free!
It also really helps you to properly model your problem by breaking things out into their simplest possible components, and makes porting to a new implementation a breeze; reproduce a quick test data parser and some simple case logic, and you're well on your way to having two totally matching, but also totally separate, implementations!
...all of this without reinventing Lisp, of course ;)
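To make the shape concrete, a toy sketch (case format and names invented): the cases are plain data that any tool can read, query, or report on, and the Go test is just the loop that applies them.

```go
package adder

import (
	"encoding/json"
	"testing"
)

// Hypothetical case format. In practice the cases would live in their own
// file under testdata/ (or a txtar archive) so non-Go tools — and a second
// implementation — can consume exactly the same data.
const casesJSON = `[
  {"name": "zero",     "a": 0, "b": 0,  "want": 0},
  {"name": "positive", "a": 2, "b": 3,  "want": 5},
  {"name": "negative", "a": 2, "b": -3, "want": -1}
]`

type addCase struct {
	Name string `json:"name"`
	A    int    `json:"a"`
	B    int    `json:"b"`
	Want int    `json:"want"`
}

// The Go side is only the case logic; "what to test" lives in the data.
func TestAdd(t *testing.T) {
	var cases []addCase
	if err := json.Unmarshal([]byte(casesJSON), &cases); err != nil {
		t.Fatal(err)
	}
	for _, tc := range cases {
		t.Run(tc.Name, func(t *testing.T) {
			if got := tc.A + tc.B; got != tc.Want { // stand-in for the real logic
				t.Errorf("got %d, want %d", got, tc.Want)
			}
		})
	}
}
```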
Test code doesn't have tests of its own, so we must rely on inspection to establish that it is correct. This means that test code must be extremely readable.
It would make me uncomfortable to rely on tests like some of the examples in this talk, which have such a degree of complexity without explicit tests of their own. Code that does something like parsing a text file should live in its own package. The package should expose an extremely readable API, and it should have its own suite of tests.
Similarly, writing formatting code for error messages and diffs inline makes the tests less readable and therefore less reliable. The golang standard library should include testing helpers such as those in `testify`, which would allow tests to be concise, less buggy, and extremely readable.
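Even a helper as small as this sketch (using go-cmp; not code from the talk) gets most of the way there — the diff formatting lives in one well-understood place instead of being rewritten inline in every test:

```go
package mypkg

import (
	"testing"

	"github.com/google/go-cmp/cmp"
)

// assertEqual reports a readable diff when want and got differ.
func assertEqual[T any](t *testing.T, want, got T) {
	t.Helper()
	if diff := cmp.Diff(want, got); diff != "" {
		t.Errorf("mismatch (-want +got):\n%s", diff)
	}
}

func TestSomething(t *testing.T) {
	got := []int{1, 2, 3}
	assertEqual(t, []int{1, 2, 3}, got)
}
```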
Long-time Go user, recently using Rust as well. The things I miss most in Rust are subtests and t.Errorf (as opposed to t.Fatal).
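Concretely, the pattern I mean is something like this trivial sketch (Parse is a made-up function under test): each case is its own subtest, and t.Errorf lets a case report a failure without aborting the rest.

```go
package parse

import (
	"strconv"
	"strings"
	"testing"
)

// Parse is a stand-in for whatever is under test.
func Parse(s string) (int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return 0, nil
	}
	return strconv.Atoi(s)
}

func TestParse(t *testing.T) {
	cases := []struct {
		name  string
		input string
		want  int
	}{
		{"empty", "", 0},
		{"plain", "42", 42},
		{"spaces", " 7 ", 7},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			got, err := Parse(tc.input)
			if err != nil {
				t.Fatalf("Parse(%q): unexpected error: %v", tc.input, err)
			}
			if got != tc.want {
				// t.Errorf records the failure but keeps going, so the
				// remaining cases still run and report.
				t.Errorf("Parse(%q) = %d, want %d", tc.input, got, tc.want)
			}
		})
	}
}
```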
But perhaps I don't know how to do it. In most of the Rust codebases I have worked with, tests contain sequences of asserts (the failure of which causes the whole test to fail). Procedural macros are often employed to create many smaller tests, which seems to address the use case of table-driven tests plus t.Error or subtests.
Macros in Rust are ... adequate, but there is something about being able to just express your logic in your main language instead of having to fiddle with a macro language and, effectively, code generation.
Does anybody have good tips for bringing some of the enjoyable testing patterns from Go into Rust?
Also, writing benchmarks that report time, memory, and allocations is just as easy as writing a test. A huge win for Go compared to trying to accurately benchmark memory or cycles in TypeScript or Java.
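For instance, this is the whole ceremony needed to get ns/op, B/op, and allocs/op (the function being measured is made up):

```go
package concat

import (
	"strings"
	"testing"
)

// join is a stand-in for whatever you want to measure.
func join(parts []string) string {
	return strings.Join(parts, ",")
}

// Run with: go test -bench=. -benchmem
func BenchmarkJoin(b *testing.B) {
	parts := []string{"a", "b", "c", "d"}
	b.ReportAllocs() // report B/op and allocs/op even without -benchmem
	for i := 0; i < b.N; i++ {
		_ = join(parts)
	}
}
```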
I do like Go tests. Would love to have the ability to find the untested parts of the code base, though, to increase coverage, but I do not think there is a way to measure this with the available tools just yet.
Although I have yet to make integration tests that require a live database and other dependencies. That will be fun...
Anyway, my main gripe with tests or TDD is that even if the code base is 95% finished (not talking about libraries here), even small code changes can cause a cascade of changes that need to be addressed, and tests essentially multiply the workload by a factor of 10, easily. And I am not talking about big changes. It might be a simple addition of a new struct field which suddenly breaks half of your test suite. Hence tests should, in my experience, be written as the absolute last step before going live. Otherwise they might impose massive costs in time, and potentially money (if we're talking an actual business and not a one-man-shop type of project).
You must be using a dynamic language with a heavy framework? Rails maybe?
I code in Go mainly (also Java and Rust) and never experienced what you describe: a simple addition of a field to a struct does nothing if it is not used in code. And the use is simply checked by the compiler.
However, I did work alongside a Rails team which had major gripes with this. They called it brittle tests: whenever they made a simple change (like adding a field), half of their tests would fail. This really lowered devs' confidence in their codebase and slowed the changes to a halt.
Small code changes breaking a gamut of tests is indicative of testing the wrong thing. It is almost certain that you have tested the wrong thing if they start breaking on changes when you are 95% complete.
I don’t have particular experience here with golang, but in other languages the biggest reason for this is mocking and stubbing everything.
Junior or intermediate developers start writing code. Most functions manipulate internal state instead of acting cleanly on inputs/outputs. Writing tests for this style of code is hard, so developers reach for the nearest mocking library. Now, instead of testing that given inputs produce given outputs, the tests effectively only verify that functions are implemented the way they are currently implemented.
This style of test literally has negative value (NB: not all mocking is bad, but these kinds of tests are). Delete them when you find them.
Testing should help you accomplish two things: find bugs and allow confident refactoring. These do neither.
They don’t help you find bugs because they don’t look for bugs. They look for “the code is currently implemented in a certain way”. And this of course means if you implement the same logic a different way, they will fail.
Negative value. Delete them, and whenever possible rewrite modules that are designed such that they need to be “tested” in this manner.
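To make that concrete, a hypothetical before/after (all names invented):

```go
package pricing

import "testing"

// Hypothetical code under test: price in cents after a customer discount.
type Discounts interface {
	PercentOff(customer string) int
}

func Price(d Discounts, customer string, base int) int {
	return base - base*d.PercentOff(customer)/100
}

// A hand-rolled fake that records calls, the way mock-heavy tests use mocks.
type fakeDiscounts struct {
	calls   []string
	percent int
}

func (f *fakeDiscounts) PercentOff(customer string) int {
	f.calls = append(f.calls, customer)
	return f.percent
}

// Implementation-shaped test: asserts *how* Price is written. Refactor
// Price to cache or batch lookups and this fails with no behavior change.
func TestPrice_CallsPercentOffOnce(t *testing.T) {
	f := &fakeDiscounts{percent: 10}
	Price(f, "acme", 200)
	if len(f.calls) != 1 || f.calls[0] != "acme" {
		t.Errorf("expected exactly one PercentOff(%q) call, got %v", "acme", f.calls)
	}
}

// Behavior-shaped test: asserts *what* Price returns for given inputs.
// Any correct implementation passes.
func TestPrice(t *testing.T) {
	f := &fakeDiscounts{percent: 10}
	if got := Price(f, "acme", 200); got != 180 {
		t.Errorf("Price = %d, want 180", got)
	}
}
```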
> Testing should help you accomplish two things: find bugs and allow confident refactoring. These do neither.
Technically the primary goal of testing is to document for other developers (and future you) how something is intended to be used. The documentation being self-verifying most definitely helps with refactoring and may discover bugs, but those are largely incidental.
Having not really written many tests before, since I've worked mostly as a data scientist (a smidgen of pytest here and there), it's so nice writing tests in Go. Definitely a considered and well-implemented part of the language.
Russ Cox has fully absorbed his $DAYJOB into his life to the extent that it feels unhealthy. Golang shirt; multiple golang wall paintings; countless golang soft toys in the background and at his side.
I am sure he has a few golang tattoos as well!