Content and patch distribution for video games: Data integrity, progressive downloads, file-level patching, compression, encryption, and platform/version branching.
It's quite mind-boggling; nobody is really doing it on an industry-scale level. Every video game developer has their own way, all of which have their own problems.
It is a very hard problem. Blizzard actually came up with a very good system, but it's not in a state where it can be commercialized or open sourced.
I actually think whoever comes up with a system which solves these problems in a clean and consistent way will be sitting on a little revolution for content distribution.
I have devoted 10 years to game content distribution, packing, compression, etc. (I'm not in gamedev anymore.)
This is a very easy problem, usually solved by attaching a fairly simple script (one that is aware of your file formats) to any commercial installer system.
Some companies even sell more or less standard solutions for this, but in reality, out of any given 1,000 games, 900 will have very different data formats, and all of them have fairly good reasons for it - using universal "patch systems" really creates more problems.
I think the "900 different data formats" problem is something that will go away as we move towards better tools which cover all the standard use cases.
Gamedev is riddled with really smart people who reinvent the wheel all the time because they found a way to micro-optimize this or that. They get to do this because, until recently, there was no "good enough" solution for a wide range of games (or the "solution" was priced with enough zeroes to make Bill Gates cringe).
But you saw how popular Unity got, and how fast. That's the games industry in a nutshell: ripe for solutions that work for more than just one studio.
BTW, Unity, for all its excellence, has a really horrible data format for content and patch distribution, and it has had, and still has, huge problems with this. Perhaps it's the legacy of early overengineering and of the struggle to protect games from easy reverse engineering.
Compare that, say, to Quake's simple incremental zips with alphabetical file loading order: a total no-brainer to implement and use (sketched below). (I have even seen zips with custom LZMA compression!)
So, if anything, someone will have to solve a problem of artificially created obstacles, not a hard problem per se.
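For the curious, here's a minimal Go sketch of that scheme. It assumes a data directory full of zip archives where alphabetically later archives override earlier entries, so shipping a patch means shipping one more zip. This is an illustration of the idea, not Quake's actual code:

```go
package main

import (
	"archive/zip"
	"fmt"
	"io"
	"path/filepath"
	"sort"
)

// LoadArchives opens every .zip in dir in alphabetical order and builds a
// name -> entry index. Later archives simply overwrite earlier entries, so
// a patch is just one more zip dropped into the directory.
func LoadArchives(dir string) (map[string]*zip.File, []*zip.ReadCloser, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.zip"))
	if err != nil {
		return nil, nil, err
	}
	sort.Strings(paths) // alphabetical order defines override priority

	index := make(map[string]*zip.File)
	var readers []*zip.ReadCloser
	for _, p := range paths {
		r, err := zip.OpenReader(p)
		if err != nil {
			return nil, nil, err
		}
		readers = append(readers, r) // keep open so entries stay readable
		for _, f := range r.File {
			index[f.Name] = f // last archive wins
		}
	}
	return index, readers, nil
}

func main() {
	index, readers, err := LoadArchives("data")
	if err != nil {
		panic(err)
	}
	defer func() {
		for _, r := range readers {
			r.Close()
		}
	}()

	// Asset lookup is a plain map access; whichever archive shipped last
	// provides the bytes.
	if f, ok := index["textures/wall.png"]; ok {
		rc, _ := f.Open()
		defer rc.Close()
		n, _ := io.Copy(io.Discard, rc)
		fmt.Printf("loaded %s (%d bytes)\n", f.Name, n)
	}
}
```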
The path forward for games is roughly similar to where digital audio is now: comprehensive workstation environments with an easing facade of plugins, presets, etc. The coarse elements of a rendering algorithm or a piece of game logic can be reduced to a processing graph, behavior tree, or another convenient abstraction. These can plug into each other by exposing both assets and processing as globally addressable data. Original coding for game logic will still be required for the foreseeable future, but most of the development problem is weighted towards getting assets into the game, and that can be abstracted.
This is done in bits and pieces across existing engines and third-party tools, but there's a lot of room to make it cheaper and easier.
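To make the "assets and processing as globally addressable data" idea concrete, here is a toy Go sketch. The node structure, the memoized pull-based evaluation, and all asset names are illustrative assumptions, not any shipping engine's design:

```go
package main

import "fmt"

// Node is one step in an asset-processing graph: it names its inputs by
// global address and produces one output.
type Node struct {
	ID      string
	Inputs  []string
	Process func(inputs map[string][]byte) []byte
}

// Evaluate resolves a node's inputs recursively and memoizes results, so
// raw assets and processing stages are addressed exactly the same way.
func Evaluate(id string, graph map[string]*Node, cache map[string][]byte) []byte {
	if out, ok := cache[id]; ok {
		return out
	}
	n := graph[id]
	inputs := make(map[string][]byte)
	for _, dep := range n.Inputs {
		inputs[dep] = Evaluate(dep, graph, cache)
	}
	out := n.Process(inputs)
	cache[id] = out
	return out
}

func main() {
	graph := map[string]*Node{
		"tex/raw": {ID: "tex/raw", Process: func(map[string][]byte) []byte {
			return []byte("RGBA pixels") // stand-in for a source asset
		}},
		"tex/compressed": {ID: "tex/compressed", Inputs: []string{"tex/raw"},
			Process: func(in map[string][]byte) []byte {
				return append([]byte("BC7:"), in["tex/raw"]...) // stand-in for a compressor
			}},
	}
	fmt.Printf("%s\n", Evaluate("tex/compressed", graph, map[string][]byte{}))
}
```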
I used to use Irrlicht and Ogre. Both have the problem of only really doing graphics and, to some extent, input. In comparison, Unity and Unreal offer the whole package: graphics, asset pipeline, audio, networking, and physics.
Speaking from experience, as I'm currently making the jump to Unity for my projects: the time savings from choosing one of the all-in-one engines instead of gluing together single-purpose engines are really substantial.
Wharf is a protocol that enables incremental uploads and downloads to keep software up-to-date. It includes:
- A diffing and patching algorithm, based on rsync (a minimal sketch follows this list)
- An open file format specification for patches and signature files, based on protobuf
- A reference implementation in Go
- A command-line tool with several commands
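The rsync-based diffing from the first item works roughly like this on the signature side: each fixed-size block of the old build gets a cheap rolling checksum plus a strong hash. Here's a minimal Go sketch; the block size, hash choices, and struct layout are illustrative assumptions, not the wharf specification:

```go
package main

import (
	"crypto/md5" // illustrative strong hash; wharf's actual choice may differ
	"fmt"
	"io"
	"os"
)

const blockSize = 64 * 1024 // illustrative; not necessarily wharf's block size

type BlockSig struct {
	Weak   uint32   // rolling checksum, cheap to slide byte-by-byte
	Strong [16]byte // collision check once the weak hash matches
}

// weakSum is an adler32-style checksum, the classic rsync choice.
func weakSum(block []byte) uint32 {
	var a, b uint32
	for _, c := range block {
		a += uint32(c)
		b += a
	}
	return (b << 16) | (a & 0xffff)
}

// Signature hashes a file block by block; the final block may be short,
// which the diff side must account for (as wharf does).
func Signature(r io.Reader) ([]BlockSig, error) {
	var sigs []BlockSig
	buf := make([]byte, blockSize)
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			sigs = append(sigs, BlockSig{
				Weak:   weakSum(buf[:n]),
				Strong: md5.Sum(buf[:n]),
			})
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return sigs, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

func main() {
	f, err := os.Open("game.dat")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	sigs, err := Signature(f)
	if err != nil {
		panic(err)
	}
	fmt.Println("blocks:", len(sigs))
}
```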
Butler is the command-line tool for generating patches (it can negotiate small diffs from the server without requiring a full local copy of the thing you're diffing against), uploading them, and applying them back on the client.
It is used to power itch.io's Steam-like application, itch: http://itch.io/app, delivering multi-gigabyte game installs & updates.
Hey, amos here, main developer of wharf/butler. Here's a quick technical summary so you don't have to do the digging yourself:
- File formats are streams of protobuf messages - efficient serialization, easy to parse from a bunch of programming languages. Most files (patches, signatures) are composed of an uncompressed header and a brotli-compressed stream of further messages (in the reference implementation, compression formats are pluggable) - see the first sketch after this list.
- The main diff method is based on rsync. It's slightly tuned, in that it operates over the hashes of all files (which means rename tracking is seamless - the reference implementation detects renames and handles them efficiently), and it takes into account partial blocks (at the end of files, smaller than the block size) - see the second sketch after this list.
- The reference implementation is quite modular Go, which is nice for portability, and, like elisee mentioned, it's used in production at itch.io. We assume most things are streaming (so that, for example, you can apply a patch while downloading it, with no temporary writes to disk needed); we actually use a virtual file system for all downloads and updates - see the third sketch after this list.
- The reference implementation contains support for block-based (4MB default) file delivery, which is useful for a verify/heal process (figure out which parts are missing or corrupted and correct them) - see the fourth sketch after this list.
- The wharf repo also contains the basis of a second diff method, based on bsdiff, for a secondary patch-optimization step. The bsdiff implementation is well-commented, with references to the original paper, and there's an opt-in parallel bsdiff codepath (as in multi-core suffix sorting, not just bsdiff operating on chunks) - see the last sketch after this list.
- A few other companies (including some well-known names in gaming) have started reaching out / using parts of wharf for their own purposes - I'll happily name names as soon as it's all become more public :)
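First, a rough Go sketch of reading the layout from the first bullet: an uncompressed header message followed by a brotli-compressed stream of further messages. The varint framing, the message handling, and the use of github.com/andybalholm/brotli are illustrative assumptions; the real wire format is whatever the wharf spec defines:

```go
package main

import (
	"bufio"
	"encoding/binary"
	"io"
	"os"

	"github.com/andybalholm/brotli" // assumed decoder; the spec allows pluggable compression
)

// readFrame reads one varint-length-prefixed message payload. The framing
// here is an assumption for illustration.
func readFrame(r *bufio.Reader) ([]byte, error) {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	buf := make([]byte, size)
	_, err = io.ReadFull(r, buf)
	return buf, err
}

func main() {
	f, err := os.Open("patch.pwr")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	raw := bufio.NewReader(f)

	// 1. Uncompressed header message.
	header, err := readFrame(raw)
	if err != nil {
		panic(err)
	}
	_ = header // would be proto.Unmarshal'ed into a header message type

	// 2. Everything after the header is a brotli-compressed stream of
	//    further protobuf messages.
	body := bufio.NewReader(brotli.NewReader(raw))
	for {
		msg, err := readFrame(body)
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		_ = msg // each frame decodes into the next patch-operation message
	}
}
```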
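Second, the block-matching side of the rsync-style diff: a cheap checksum is slid one byte at a time over the new data and looked up in a table built from every old file's blocks, which is why rename tracking falls out for free. A minimal sketch; strong-hash confirmation and short trailing blocks are elided:

```go
package main

import "fmt"

const blockSize = 64 * 1024 // must match whatever the signature side used

// BlockRef identifies a block in the old build: which file, which block
// index. Blocks from *all* old files go into one lookup table, so a renamed
// file still matches its old blocks with no special handling.
type BlockRef struct {
	File  string
	Index int
}

// weakSum is the same adler32-style checksum as on the signature side.
func weakSum(block []byte) uint32 {
	var a, b uint32
	for _, c := range block {
		a += uint32(c)
		b += a
	}
	return (b << 16) | (a & 0xffff)
}

// roll slides the checksum one byte to the right in O(1), which is what
// makes scanning every offset of the new file affordable.
func roll(sum uint32, out, in byte) uint32 {
	a := sum&0xffff - uint32(out) + uint32(in)
	b := sum>>16 - uint32(blockSize)*uint32(out) + a
	return (b << 16) | (a & 0xffff)
}

// FindReusableBlocks reports old-build blocks found in data. A real
// implementation confirms each weak match with a strong hash and also
// matches the short trailing block; both are elided here.
func FindReusableBlocks(data []byte, index map[uint32][]BlockRef) []BlockRef {
	var found []BlockRef
	if len(data) < blockSize {
		return found
	}
	sum := weakSum(data[:blockSize])
	for i := 0; ; i++ {
		if refs, ok := index[sum]; ok {
			found = append(found, refs[0])
		}
		if i+blockSize >= len(data) {
			break
		}
		sum = roll(sum, data[i], data[i+blockSize])
	}
	return found
}

func main() {
	fmt.Println(FindReusableBlocks(make([]byte, 2*blockSize), nil))
}
```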
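Third, the streaming point in miniature: because the patch is consumed as a plain io.Reader, an HTTP response body can feed the applier directly, with nothing staged on disk first. The URL and the applyPatch stub are hypothetical:

```go
package main

import (
	"io"
	"net/http"
)

// applyPatch consumes patch operations straight from r and materializes the
// new build; the operation decoding is elided, the point is the io.Reader
// plumbing.
func applyPatch(r io.Reader, oldDir, newDir string) error {
	// ... decode protobuf frames from r and write files under newDir,
	// copying reused blocks out of oldDir ...
	_, err := io.Copy(io.Discard, r) // stand-in for the real decode loop
	return err
}

func main() {
	// The patch is applied while it downloads: resp.Body is just an
	// io.Reader, so no temporary patch file is ever written.
	resp, err := http.Get("https://example.com/builds/patch.pwr") // hypothetical URL
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	if err := applyPatch(resp.Body, "build-old", "build-new"); err != nil {
		panic(err)
	}
}
```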
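Fourth, the verify half of the verify/heal process, assuming per-block hashes obtained from a signature file; the hash choice and layout here are illustrative, not the wharf format:

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
	"io"
	"os"
)

const healBlockSize = 4 * 1024 * 1024 // wharf's default delivery block size

// VerifyBlocks compares a local file's block hashes against the expected
// hashes and returns the indices of blocks that need re-downloading.
func VerifyBlocks(path string, expected [][]byte) ([]int, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var bad []int
	buf := make([]byte, healBlockSize)
	for i := range expected {
		n, err := io.ReadFull(f, buf)
		if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
			return nil, err
		}
		sum := sha1.Sum(buf[:n])
		if !bytes.Equal(sum[:], expected[i]) {
			bad = append(bad, i) // missing or corrupted: fetch just this block
		}
	}
	return bad, nil
}

func main() {
	// expected would come from the signature file; empty here for brevity.
	bad, err := VerifyBlocks("game.dat", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("blocks to heal:", bad)
	// Healing then fetches only those block ranges, e.g. with HTTP Range
	// requests against block-addressed storage.
}
```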
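Finally, the core query behind bsdiff: find the longest substring of the old file matching the current position in the new file, answered via a suffix array. A toy sketch, with a naive O(n² log n) suffix sort standing in for the linear-time (and, in wharf's opt-in codepath, multi-core) sorting a real implementation uses:

```go
package main

import (
	"fmt"
	"sort"
)

// buildSuffixArray returns the start offsets of old's suffixes, sorted
// lexicographically. Naive comparison-based sorting; real bsdiff uses a
// much faster suffix-sorting algorithm.
func buildSuffixArray(old []byte) []int {
	sa := make([]int, len(old))
	for i := range sa {
		sa[i] = i
	}
	sort.Slice(sa, func(a, b int) bool {
		return string(old[sa[a]:]) < string(old[sa[b]:])
	})
	return sa
}

// longestMatch binary-searches the suffix array for the longest prefix of
// target occurring anywhere in old: the lookup behind bsdiff's copy
// instructions.
func longestMatch(old []byte, sa []int, target []byte) (pos, length int) {
	lo, hi := 0, len(sa)
	for lo < hi {
		mid := (lo + hi) / 2
		if string(old[sa[mid]:]) < string(target) {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	// The best match is adjacent to the insertion point.
	for _, i := range []int{lo - 1, lo} {
		if i < 0 || i >= len(sa) {
			continue
		}
		if n := commonPrefix(old[sa[i]:], target); n > length {
			pos, length = sa[i], n
		}
	}
	return
}

func commonPrefix(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	old := []byte("the quick brown fox jumps over the lazy dog")
	sa := buildSuffixArray(old)
	pos, n := longestMatch(old, sa, []byte("quick brown cat"))
	fmt.Printf("longest match: old[%d:%d] = %q\n", pos, pos+n, old[pos:pos+n])
}
```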
It's actually exactly what you described - the documentation is very sparse on it because it's an internal thing (I'm guessing you found the CASC documentation, not the NGDP one). If you're interested, shoot me an email and I can send you some more details; but it'd purely be for intellectual curiosity - as I said, it's an internal protocol.
It's exactly what you want, but this service isn't popular. It doesn't work because everyone is on Steam. Network effects. It's like when app.net tried to replace Twitter. You can get mad at users, but users are on Steam and Origin and Battle.net, and they don't care.
For themselves! It's also not a great system - lots of legacy. More to the point, though, it's not a commercial system. Steam behaves as a distribution platform and licenses the publishing rather than the distribution.
Which is not at all interesting for self-published games (be it indies who want to avoid Steam, big publishers with their own systems, etc.).
Steam supports any game from any publisher, including self-published games. Steam is exactly what you get when you try to solve this problem, because none of the big companies want to be involved, since they see it as competition. Blizzard and EA are the only two big companies I know of that are not on Steam, and they both have direct competitors to it that are vendor-locked.