
Since Hop doesn't do compression, the most appropriate comparison would be to asar

https://github.com/electron/asar

It's not hard to be faster than zip if you're not compressing/decompressing.



tar doesn't do compression either, and zip doesn't NEED to (several file formats are just bundles of files in a zip with/without compression)


Tar does do compression, via the standard -z flag. Every tar I have ever downloaded used some form of compression, so it's hardly an optional part of the format.


It is important to distinguish between tar the format and tar the utility.

tar the utility is a program that can produce tar files, and it is also able to compress the resulting file.

When you produce a compressed tar file, the contents are written into a tar file, and this tar file as a whole is compressed.

Sometimes the compressed files are named in full like .tar.gz, .tar.bz2 or .tar.xz but often they are named as .tgz, .tbz or .txz respectively. In a way you could consider those files a format in their own right, but at the same time they really are simply plain tar files with compression applied on top.

You can confirm this by decompressing the file without extracting the inner tar file, using gunzip, bunzip2 or unxz respectively. This will give you a plain tar file as a result, which is a format of its own.

You can also see the description of compressed tar files with the “file” command; it will say, for example, “xz compressed data” for a .txz or .tar.xz file (assuming of course that the data in the file really is of that kind). A plain uncompressed tar file will be described by “file” as something like “POSIX tar archive”.
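
For example, with a hypothetical archive.tar.xz (output abbreviated):

    $ file archive.tar.xz
    archive.tar.xz: XZ compressed data
    $ unxz archive.tar.xz          # decompress only, no extraction
    $ file archive.tar
    archive.tar: POSIX tar archive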


> When you produce a compressed tar file, the contents are written into a tar file, and this tar file as a whole is compressed.

I don't think that's quite how it works. I'm pretty sure it's more similar to doing something like `tar ... | gzip > output.tgz`. That is, it streams the tar output through gzip before writing to the file. In fact, with GNU tar at least, you can use the `-I` option to compress with an arbitrary program.
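
In shell terms, a sketch (assuming GNU tar; somedir is just a placeholder):

    $ tar -cf - somedir | gzip > output.tgz     # roughly what -z does internally
    $ tar -I zstd -cf output.tar.zst somedir    # -I swaps in an arbitrary compressor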


Yeah I formulated it simplistically to not make the comment too long. But I didn't mean to imply that a tar file is actually created on disk or anything like that before compression happens.


Tar (the tool) does compression. Tar (the format) does not. Compression is applied separately from archiving.

https://www.gnu.org/software/tar/manual/html_node/Standard.h...


The numbers shown are with zip -0, which disables compression.
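
That is, the benchmark archive was presumably created with something like:

    $ zip -0 -r archive.zip somedir    # -0 = store entries without compression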


What's the point then? What's the benefit of an archive that's not compressed? Why not use a FS?


Plenty of uncompressed tarballs exist. In fact, if the things I’m archiving are already compressed (e.g. JPEGs), I reach for an uncompressed tarball as my first choice (with my second choice being a macOS .sparsebundle — very nice for network-mounting in macOS, and storable on pretty much anything, but not exactly great if you want to open it on any other OS.)

If we had a random-access file system loopback-image standard (open standard for both the file system and the loopback image container format), maybe we wouldn’t see so many tarballs. But there is no such format.

As for “why archive things at all, instead of just rsyncing a million little files and directories over to your NAS” — because one takes five minutes, and the other eight hours, due to inode creation and per-file-stream ramp-up time.
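
(For the already-compressed-files case, the uncompressed tarball is just tar with no compression flag, e.g.:

    $ tar -cf photos.tar photos/    # plain tar, no -z/-j/-J

photos/ being whatever directory of JPEGs you're archiving.)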


> random-access file system loopback-image standard

> But there is no such format.

What do you think of ISO (ISO9660)? I just downloaded a random image to double-check; it opens on OSX, Windows and Linux just fine.

It's read-only ofc, but that should not be a problem given we're talking about archives
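
Creating and mounting one is a one-liner on Linux (genisoimage and the names here are just one way to do it):

    $ genisoimage -R -o archive.iso somedir/    # build a read-only ISO9660 image
    $ sudo mount -o loop archive.iso /mnt       # random-access mount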


Archives often have checksums, lay out files in a way conducive to sequential reading (which can be great performance-wise in some cases; zip also allows random read/write), and can provide grouping and logical/semantic validation that is hard to do on a ‘bunch of files’ without messing something up.


FWIW, IPFS does all of that by default (maybe outside of the continuous reading part).


Hop reduces the number of syscalls necessary to both read and check for the existence of multiple files nested within a shared parent directory.

You read from one file to get all the information you need instead of reading from N files and N directories.

You can’t easily mount virtual filesystems outside of Linux. However, Linux supports the copy_file_range syscall, which also makes it faster to copy data around than doing it through application memory.
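
A crude way to see the syscall difference is to count opens with strace (bundle.hop and the paths are made up):

    $ strace -c -e trace=openat cat dir/a dir/b dir/c   # an openat per file (plus library loads)
    $ strace -c -e trace=openat cat bundle.hop          # a single data-file openat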


An archive is essentially a user-mode file system. User-mode things are often more efficient in many ways, as they don't need to call into the kernel as often.



I don't know if it allows streaming, but if it does, transferring files on portable devices or streaming them over the wire is a lot faster this way compared to transferring the files directly, especially for small files.


The speed of accessing a zip -0 archive would seem to be an implementation issue, not a format issue. Why didn't you fix the performance of your zip implementation instead of inventing yet another file format?


The format makes different tradeoffs vs. the zip format, making creation more expensive but reading cheaper. With that said, if you imposed additional restrictions on the zip archive (e.g., sorted entries, always including the end-of-central-directory locator), and the reader knows the archive conforms to them, the performance difference would probably be imperceptible.


For a random-access archive format that supports compression see DAR: http://dar.linux.free.fr/home.html

It doesn't seem very well known, which is unfortunate because it's much better suited for archiving files compared to gzipped tar (which is great for distributing files, but not great for archiving/backup).
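
If you want to try it, usage looks roughly like this (the basename and paths are placeholders):

    $ dar -c backup -R /path/to/files -z    # create backup.1.dar, gzip-compressed
    $ dar -x backup -R /restore/target      # extract; files can also be read selectively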


> It's not hard to be faster than zip if you're not compressing/decompressing

Actually, since CPUs are so fast and disk IO is so slow, it is essentially impossible for a program that doesn't use compression at all to beat a properly tuned program that reads data from disk and decompresses it on the fly.


This looks really cool. I was looking for a simple compressed archive format that doesn't have junk like permissions, filename encoding issues, etc.

It claims it's easy to write a parser, but then requires "Pickle" (derived from Python's Pickle?), which is just a link to a Chrome header file... I'm not aware of a standard, and it doesn't seem like it even wants to be standardized. Do they mean it's easy to write a parser as long as you're using Chrome's JS engine?


I was poking around, and I didn’t see a Weissman score. Wonder if anyone ran a test.



