
The numbers shown are with zip -0, which disables compression.
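For reference, a minimal Python sketch of producing the same kind of store-only archive that zip -0 creates (the file names here are hypothetical):

    import zipfile

    # ZIP_STORED writes entries without compression, like `zip -0`.
    with zipfile.ZipFile("archive.zip", "w", compression=zipfile.ZIP_STORED) as zf:
        zf.write("photo1.jpg")
        zf.write("photo2.jpg")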



What's the point, then? What's the benefit of an archive that isn't compressed? Why not just use a filesystem?


Plenty of uncompressed tarballs exist. In fact, if the things I’m archiving are already compressed (e.g. JPEGs), I reach for an uncompressed tarball as my first choice (with my second choice being a macOS .sparsebundle — very nice for network-mounting in macOS, and storable on pretty much anything, but not exactly great if you want to open it on any other OS.)
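If it helps, an uncompressed tarball is just tar's default output; a quick Python sketch (the paths are made up):

    import tarfile

    # Plain "w" mode writes an uncompressed tar; already-compressed
    # JPEGs gain nothing from gzip/xz anyway.
    with tarfile.open("photos.tar", "w") as tf:
        tf.add("Pictures/2023/", recursive=True)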

If we had a random-access file system loopback-image standard (open standard for both the file system and the loopback image container format), maybe we wouldn’t see so many tarballs. But there is no such format.

As for “why archive things at all, instead of just rsyncing a million little files and directories over to your NAS” — because one takes five minutes, and the other eight hours, due to inode creation and per-file-stream ramp-up time.
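To illustrate the single-stream point, here's a sketch that streams one tar sequentially instead of creating a million remote files (destination and paths are hypothetical — you'd pipe the output over ssh to the NAS):

    import sys
    import tarfile

    # "w|" streams the tar to any writable file object (stdout here),
    # so the receiver sees one long stream instead of per-file
    # inode creation and round trips.
    with tarfile.open(fileobj=sys.stdout.buffer, mode="w|") as tf:
        tf.add("million-little-files/")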


> random-access file system loopback-image standard

> But there is no such format.

What do you think of ISO (ISO 9660)? I just downloaded a random image to double-check; it opens on macOS, Windows, and Linux just fine.

It's read-only, of course, but that shouldn't be a problem given we're talking about archives.
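And you don't need to mount one to build or inspect it; a sketch using pycdlib, a third-party pure-Python ISO 9660 library (the file contents and names are made up):

    from io import BytesIO
    import pycdlib  # third-party: pip install pycdlib

    iso = pycdlib.PyCdlib()
    iso.new(interchange_level=3)
    data = b"hello\n"
    # ISO 9660 file identifiers carry a version suffix, e.g. 'NAME.EXT;1'.
    iso.add_fp(BytesIO(data), len(data), "/HELLO.TXT;1")
    iso.write("archive.iso")
    iso.close()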


Archives often have checksums; they lay files out in a way conducive to sequential reading, which can be a big performance win in some cases (and, like zip, they can also support random reads and writes); and they can provide grouping and logical/semantic validation that is hard to do on a ‘bunch of files’ without messing something up.
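On the checksum point: zip stores a CRC-32 per entry, and Python's stdlib can verify a whole archive in one call (the archive name is hypothetical):

    import zipfile

    with zipfile.ZipFile("archive.zip") as zf:
        # testzip() re-reads every entry and checks its CRC-32; it
        # returns the first corrupted name, or None if all pass.
        bad = zf.testzip()
        if bad:
            print("corrupted entry:", bad)
        else:
            print("all checksums OK")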


FWIW, IPFS does all of that by default (except perhaps the sequential-reading part).


Hop reduces the number of syscalls necessary to both read and check for the existence of multiple files nested within a shared parent directory.

You read from one file to get all the information you need instead of reading from N files and N directories.
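I haven't read the Hop format spec, but the syscall arithmetic is easy to sketch: checking N files on a real filesystem costs at least N stat() calls, whereas an archive with an up-front index answers the same queries from one read. A purely hypothetical illustration (the index layout below is made up, not Hop's actual format):

    import os

    def exists_on_fs(paths):
        # One stat() syscall per file, plus directory lookups on the way.
        return {p: os.path.exists(p) for p in paths}

    def exists_in_index(archive_fd, paths):
        # Hypothetical: a newline-separated name index at the front of
        # the archive, fetched with a single read() syscall; every
        # existence query is then answered from memory.
        index = set(os.read(archive_fd, 1 << 20).split(b"\n"))
        return {p: p.encode() in index for p in paths}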

You can’t easily mount virtual filesystems outside of Linux. Linux, however, supports the copy_file_range syscall, which also makes it faster to copy data around than doing it through application memory.
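For the curious, Python exposes that syscall directly on Linux (3.8+); a sketch with made-up file names:

    import os

    # copy_file_range() asks the kernel to move bytes between two file
    # descriptors without bouncing them through user-space buffers.
    src = os.open("archive.hop", os.O_RDONLY)
    dst = os.open("copy.hop", os.O_WRONLY | os.O_CREAT)
    size = os.fstat(src).st_size
    copied = 0
    while copied < size:
        copied += os.copy_file_range(src, dst, size - copied)
    os.close(src)
    os.close(dst)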


An archive is essentially a user-mode file system. User-mode things are often more efficient in many ways, as they don't need to call into the kernel as often.



I don't know if it allows streaming, but if it does, transferring files to portable devices or streaming them over the wire is a lot faster this way than sending the files directly, especially for small files.


The speed of accessing a zip -0 archive would seem to be an implementation issue, not a format issue. Why didn't you fix the performance of your zip implementation instead of inventing yet another file format?


The format makes different tradeoffs vs. the zip format, making creation more expensive but reading cheaper. That said, if you imposed additional restrictions on the zip archive (e.g., sorted entries, always including the end-of-central-directory locator), and the reader knew the archive conformed, the performance difference would probably be imperceptible.
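A sketch of the reader side of that idea: Python's zipfile already locates the end-of-central-directory record and parses the central directory in one pass, and if the writer additionally guaranteed sorted entries, lookups become a binary search. The sortedness guarantee is the assumption here; plain zip does not promise it:

    import bisect
    import zipfile

    with zipfile.ZipFile("archive.zip") as zf:
        # One pass over the central directory, no per-entry seeks.
        names = zf.namelist()
        # Assumes the writer sorted entries (not standard zip behavior).
        i = bisect.bisect_left(names, "data/config.json")
        found = i < len(names) and names[i] == "data/config.json"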



