
Among others, this problem:

http://www.dwheeler.com/essays/fixing-unix-linux-filenames.h...

find -print0 is a lame hack, and even filenames with spaces (not newlines) are somewhat messy to work with on the Unix shell.

Or a little recurring problem I have: How do I grep the output of grep -C (matches showing multiple lines delimited with a "--" line)? I wrote a custom tool to do it, which does the job, but really it would be nice if I could use all the normal line-based Linux tools (sort, uniq, awk, wc, sed) with a match as a "line".
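The "match as a line" idea can be approximated by regrouping the grep -C output into records before filtering. A minimal sketch in Python, assuming the default "--" separator line and that no line of real content is exactly "--" (the function names are made up for illustration):

```python
def split_context_groups(lines, separator="--"):
    """Group grep -C output lines into one list of lines per match block."""
    groups, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line == separator:
            # A separator closes the current block, if there is one.
            if current:
                groups.append(current)
            current = []
        else:
            current.append(line)
    if current:
        groups.append(current)
    return groups


def grep_groups(groups, needle):
    """Keep only the blocks in which some line contains needle."""
    return [g for g in groups if any(needle in line for line in g)]


sample = ["a1", "match foo", "a2", "--", "b1", "match bar", "b2"]
for block in grep_groups(split_context_groups(sample), "bar"):
    print("\n".join(block))
```

Once each block is one record, sorting, deduplicating, and counting operate on whole matches rather than on individual lines.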



> find -print0 is a lame hack, and even filenames with spaces (not newlines) are somewhat messy to work with on the Unix shell.

This problem is simply a flaw in sh (and its descendants); other shells handle it much better. See, for example, Tom Duff's rc shell: http://rc.cat-v.org

Also note that Plan 9, the successor to Unix (which uses rc as its main shell), doesn't even have a find command; find's design is not really very Unix-y.

As for your second question, the answer might be structural regular expressions: http://doc.cat-v.org/bell_labs/structural_regexps/


> This problem is simply a flaw in sh (and its descendants)

Indeed, although it's not just sh; if you want to, say, make a table of filenames and some attributes of each file, you're in trouble if the filenames contain spaces (awk, cut, sort don't work as easily) and screwed if they contain newlines.
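One workaround for the table case is to read NUL-delimited names (as produced by find -print0) and escape the awkward bytes before writing each row. A rough sketch in Python, not a standard tool: the helper names and the backslash-escaping convention are assumptions chosen for illustration.

```python
def parse_nul_delimited(data: bytes):
    """Split find -print0 style output into a list of raw filenames."""
    return [name for name in data.split(b"\0") if name]


def escape_field(name: bytes) -> bytes:
    """Escape backslash, tab and newline so a name fits on one table line."""
    return (
        name.replace(b"\\", b"\\\\")
        .replace(b"\t", b"\\t")
        .replace(b"\n", b"\\n")
    )


def table_row(size: int, name: bytes) -> bytes:
    """Build one tab-separated row: size, then the escaped filename."""
    return str(size).encode("ascii") + b"\t" + escape_field(name)


names = parse_nul_delimited(b"./plain.txt\0./with space.txt\0./with\nnewline.txt\0")
for n in names:
    print(table_row(len(n), n))  # len(n) stands in for a real file attribute
```

With tab as the only field separator and one row per line, spaces in names stop mattering to tools such as sort or cut -f, at the cost of having to unescape the names afterwards.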

What does Plan 9 use instead of find?

> As for your second question, the answer might be structural regular expressions:

I've actually been meaning to write a clone of the command line portion of sam, tack on some slightly more powerful features, and try living with it... it would be able to solve much of that use case, but I think it would be cleaner if all the normal tools just knew that the output of grep -C is, in fact, a list of multiline strings.


The Plan 9 approach is to avoid creating problems for yourself by not using spaces in file names to begin with. The file server initially disallowed spaces in file names just as nulls and slashes are disallowed. That restriction has since been relaxed,† but everyone still avoids spaces. If you cannot avoid files with spaces in their names, there exists trfs,†† a file system that transparently replaces spaces with something more convenient.

Instead of find, I run /bin/test on a list of files. For anything more complicated than what test can handle, I use Inferno's fs program.†††

† http://swtch.com/cgi-bin/plan9history.cgi?f=1999/0323/port/c...

†† http://a-30.net/inferno/man/4/trfs.html

††† http://www.vitanuova.com/inferno/man/1/fs.html The name is misleading: it is not a file server.


Filenames with newlines are an edge case and a data problem. Just because you can have fucked-up characters in filenames doesn't mean you should.

You can spend 10 hours solving for edge cases or 1 hour redefining the problem. In this case, that means mandating that we only work on files whose names contain only printable ASCII characters. It's often much, much easier to massage input than to have follow-on tools handle every freaking possible edge case ever.
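A minimal sketch of that kind of input massaging, in Python. The function names and the "_" replacement are illustrative assumptions; note that two different names can collapse to the same massaged name, which a real script would have to check for.

```python
def is_printable_ascii(name: str) -> bool:
    """True if every character lies in the printable ASCII range (space to ~)."""
    return all(" " <= ch <= "~" for ch in name)


def massage(name: str, replacement: str = "_") -> str:
    """Replace every character outside printable ASCII with replacement."""
    return "".join(ch if " " <= ch <= "~" else replacement for ch in name)


for candidate in ["ok file.txt", "bad\nname.txt"]:
    if not is_printable_ascii(candidate):
        print(repr(candidate), "->", repr(massage(candidate)))
```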


The GVariant/dbus typesystem has both "string", which is a UTF-8 text string, and "bytestring", which is an array of bytes. The latter is what you would use for filenames, and it would avoid problems with weird characters in filenames, etc.
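The same distinction can be illustrated outside GVariant. The sketch below is plain Python, not the GVariant or dbus API, and the sample filename is invented: it shows a name that is legal as raw bytes but cannot be a strict UTF-8 text string.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """True if raw decodes as strict UTF-8, i.e. could be a text string."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw_name = b"report-\xff.txt"  # hypothetical filename containing a non-UTF-8 byte

# As a byte array the name is carried unchanged; as strict text it is rejected.
print(is_valid_utf8(raw_name))

# Python's "surrogateescape" error handler is one way to smuggle such bytes
# through a str and get the identical bytes back.
as_text = raw_name.decode("utf-8", "surrogateescape")
assert as_text.encode("utf-8", "surrogateescape") == raw_name
```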



