
I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".

I don't want to put leading zeroes before all the single-digit numbers in my file names. (And then potentially come back later and add even more leading zeroes once the maximum number reaches three digits.)

---

I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.

This works, but it looks kind of ugly and creates extra work—yes I have scripts to automate it, it's still an extra step—and it would be great if I could just trust that every device will understand numbers.
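
For anyone curious what such a padding script might look like, here is a minimal sketch. It is not the poster's actual script; `pad_numbers` is a hypothetical helper that only computes the new names (it renames nothing), and it assumes the extension should be left alone so that the "3" in ".mp3" is not padded.

```python
import os
import re

def pad_numbers(names):
    """Return names with every digit run in the stem zero-padded to one width."""
    # Split off the extension first so ".mp3" is never touched.
    split = [os.path.splitext(name) for name in names]
    # Width of the longest digit run across all stems (0 if there are none).
    width = max(
        (len(run) for stem, _ in split for run in re.findall(r"\d+", stem)),
        default=0,
    )
    return [
        re.sub(r"\d+", lambda m: m.group().zfill(width), stem) + ext
        for stem, ext in split
    ]

chapters = ["Chapter 1.mp3", "Chapter 2.mp3", "Chapter 10.mp3"]
print(pad_numbers(chapters))
# ['Chapter 01.mp3', 'Chapter 02.mp3', 'Chapter 10.mp3']
```

Once padded, plain string sorting puts the chapters in reading order, which is the whole point of the workaround.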





> I don't want to put leading zeroes before all the single-digit numbers in my file names.

> ... it would be great if I could just trust that every device will understand numbers.

Strings are not numbers, even if some part of their content "looks like a number."

> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".

Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.

So what are programs to do?

Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.


> Display strings in a consistent, documented, manner.

IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent. I'm not sure if it's documented—I've never needed to look up the documentation—but if it's not, the developers could certainly fix that.

> this is your preference for a specific situation.

Sure, but we generally make decisions based on which situations we think will be most common. I think having ten or more things (screenshots, audio samples, whatever) named "Thing 1" – "Thing 10" in a folder is extremely common. And if Thing 10 comes before 9, it's really annoying!

Let's say I have a directory of 32 numbered files. Under the author's preferred sorting method, they'll get displayed:

    1
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    2
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    3
    30
    31
    32
    4
    5
    6
    7
    8
    9

If I download a folder with files like this, I basically have to pause whatever I'm doing and edit the files to have leading zeroes before I can make sense of what I'm looking at.
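
A rough sketch of the two orderings, for comparison. `natural_key` is a hypothetical helper name, not any file manager's real implementation; it just applies the rule "treat any sequence of digits as a number for the purpose of sorting".

```python
import re

def natural_key(name):
    """Split a name into text and digit runs; digit runs compare as integers."""
    # re.split with a capturing group alternates: text, digits, text, ...
    # so odd positions are always digit runs and even positions always text.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = [str(n) for n in range(1, 33)]

print(sorted(names)[:4])                   # ['1', '10', '11', '12']
print(sorted(names, key=natural_key)[:4])  # ['1', '2', '3', '4']
```

Because text and numbers always sit at the same positions in every key, the comparison never mixes strings with integers.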

Do I understand that you want these to be sorted like this?

  1
  2
  9
  10
  11

So I guess you also want things sorted like

  1.1
  1.2
  2
  9
  9.9

And also

  1
  1.1
  1.10
  1.2
  1.10.1

So when you're done defining whatever crazy rules you think up, how do I pause whatever and edit the filenames to get them back into lexicographical order?

You can massage lexicographical to meet your needs. I can't massage your arbitrary rules to meet my needs.


Your examples don’t need any extra rules to be sorted correctly. The basic idea is that any sequence of digits is treated for sorting as if it were a single character. On my iPhone, your examples are sorted as expected.

Would you sort

  1.10
  1.2

or

  1.2
  1.10

?

I would not know how an OS would treat those unless we assume mind reading rather than proper lexicographic order. Why substitute vagueness for precision when simply taking care with naming would suffice?


Ah yes sorry, 1.10 comes after 1.2 because 10 is bigger than 2 (so in fact different from your example). But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.

If you have non-integer numbers in your filenames then it won’t give the order you want, but there isn’t going to be a rule that works for all cases.
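
To make the difference concrete, here is a small sketch of the digit-run rule next to a decimal reading of the same names. `natural_key` is a hypothetical helper, and the name lists are only illustrative.

```python
import re

def natural_key(name):
    """Digit runs compare as integers; everything else compares as text."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["1", "1.1", "1.10", "1.2", "1.10.1"]

# Digit-run rule: 10 > 2, so 1.10 lands after 1.2 (version-style order).
print(sorted(names, key=natural_key))
# ['1', '1.1', '1.2', '1.10', '1.10.1']

# Decimal reading: 1.10 equals 1.1 and is less than 1.2.
# This only works for names that parse as a single number.
decimals = ["1.10", "1.2", "1.1"]
print(sorted(decimals, key=float))
# ['1.10', '1.1', '1.2']
```

The same rule cannot produce both orders, which is the trade-off being argued about here.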


I was with you until this point, but 1.2 is bigger than 1.10, because 1.2 is a shortened version of writing 1.20 _unless_ you explicitly want these to be version numbers or something like that. The normal expectation would be to treat numbers as, well, mathematical numbers, and not SemVer, especially if we only have one decimal point, don't you think?

As I said, the sorting rule won’t always give pleasing results, but it seems to me like a simple and reasonable modification of lexicographic ordering.

It is neither simple, nor reasonable.

1.10, the number, is equivalent to 1.1. It is less than 1.2. You say you want numbers to sort as numbers, but you want 1.10 to be greater than 1.2.

Do you consider '1/4' to be a number? Should it come before or after '1/3'?

I'm guessing that you don't want to sort one character at a time if you encounter one of [0-9]. Instead, you want to group all consecutive [0-9] as a single sortable number. But aren't characters '.', ',', '/', '-' also part of numbers?

What about numbers like ↋, 五, π, B, ⅔, or -1?


It doesn’t work for decimals. It also doesn’t work for pi, or most dates. That’s okay. Supporting those cases would require “reading your mind” / trying to guess what the user wants by applying opaque rules. I certainly don’t want that.

Treating consecutive digits as numbers is a simple modification (I still think it’s quite simple) that is easy to understand and supports 99% of real-world use cases.


> But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.

What level of assumption is expected from the sorting system here? Would it have to process ALL entries in the list to find multiple decimal points and then assume that they are ALL versions and not numbers?

How should this be treated in different locales, where the decimal separator is a comma and the thousands separator is a dot? Should the system also consider the locale? Even when listing a folder on a remote system with a different locale?

What about dates, should that system attempt to sort entries with multiple date-formats (yyyy-mm-dd, dd-mm-yyyy, dd-MMM-yyyy,...)?

The topic is far more complex than this narrow example. If we expect such a system to alter its sorting based on some data format interpretation, there is a risk of misinterpretation which might make the whole list unusable...


It has nothing to do with decimal points. It just looks at any contiguous sequence of digits and treats it as a single character for the purposes of sorting. The decimal point could be any other character and the behavior would be the same.

So only whole numbers are sorted as numbers then.

Decimal numbers are treated as strings and will have a completely different order, with digits after the decimal point sorted differently to whole numbers without fractions?

Or do you mean every run of contiguous digits within the same string is treated as an individual whole number?

Depending on the decision, either lists of decimal numbers or lists of version numbers will be sorted wrong.

--> This could be covered by adjusting the logic based on the number of decimal points.

And the logic complexity keeps increasing, up to an arbitrary point of "no, this will not be considered", resulting in an unpredictable user-experience of sorting...


> Depending on the decision, either lists of decimal numbers or lists of version numbers will be sorted wrong.

Yes. I don’t see why this is a big deal.

I didn’t suggest adjusting the logic based on the number of decimal points.


Ah ok.

I understand that you found your perfect trade-off for sorting based on longer considerations. But it will be difficult to communicate such a concept to a user.

Applying partial rules to improve sorting in one direction is not a lossless activity, it makes the UX actually worse in other scenarios as the user is first guided to assume a certain behavior, but then learns that his expectation is broken in adjacent scenarios (Which is more or less the bottom-line of that article to begin with).

In the end it'll be just "another standard" for sorting [0]

[0] https://xkcd.com/927/


> But it will be difficult to communicate such a concept to a user.

This isn't a prerequisite, since the existing naive character sort approach is not communicated either. In fact, it's almost universally unexpected by any user who hasn't written a naive string sort. Apple doesn't do this, and I very much did not need it communicated to me why 10 was coming after 2, because that's what everyone, who's not a programmer, expects.

As a litmus test, go ask some people, who are not programmers, without loading the question beyond "here are some files, how would you expect for them to be displayed in a list?". Show the lists side by side. It should not surprise you.


I consider 八 to be a whole number.

There is a rule that works for all cases. It's lexicographical sorting.

Simple. Consistent. Easy to manipulate to get what you want.


We just discussed a situation where lexicographical sorting doesn’t work. Adding in a rule to treat consecutive digits as one number doesn’t significantly complicate the logic and makes sorting work for a major additional use case. It doesn’t magically fix every case but it fixes a common one with minimal downsides.

> IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent.

Are you sure about that?

  So how do you suggest handling hexadecimal numbers?
  Or octal numbers?
  What about binary numbers?
  What about file names with portions of a date and/or time?
  How is a program supposed to know any of the above?

> Let's say I have a directory of 32 numbered files.

Assuming any of the filesystems I am aware of is in use, those names are strings having one or two characters. They are not "numbered files."


Sorting dates: This is why there is an international standard that puts YYYY-MM-DD hh:mm:ss in the order it does. We were taught this in school in the 1980s because it made sorting paper documents more logical and made things easier to find. So this was well before most people were computerized.

It just happens to be the most logical way to sort for computers too, as long as humans are involved in the usage of the data.
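
A quick sanity check of that property, as a sketch with made-up timestamps: sorting the strings as plain text gives the same result as parsing them and sorting by time.

```python
from datetime import datetime

stamps = [
    "2024-01-25 09:30:00",
    "2023-12-31 23:59:59",
    "2024-01-05 18:00:00",
]

# Plain string sort, no knowledge of dates at all.
by_text = sorted(stamps)

# Sort by the parsed point in time.
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"))

print(by_text == by_time)  # True
```

This holds because every field is fixed-width, zero-padded, and ordered from most to least significant.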


> Sorting dates: This is why there is an international standard of having YYYY-MM-DD hh:mm:ss in the order we have it.

That would be great, but this ISO is just one of the standards, and there are still regional standards as well.

And that's still ignoring the end-user. In Europe for example, humans might create filenames with date in format dd.mm, e.g. "Report 25.01.xls"

A system attempting to sort this intelligently would likely assume this is a decimal number, as it has zero context for it.
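
As an illustration with invented report names: neither plain string order nor the digit-run rule recovers chronological order for dd.mm names. Getting it requires a key that is told the format. Both `natural_key` and `day_month_key` are hypothetical helpers, and `day_month_key` assumes every name contains a two-digit day and month.

```python
import re

def natural_key(name):
    """Digit runs compare as integers; everything else compares as text."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def day_month_key(name):
    """Read the first 'dd.mm' in the name and order by (month, day)."""
    match = re.search(r"(\d{2})\.(\d{2})", name)
    day, month = int(match.group(1)), int(match.group(2))
    return (month, day)

reports = ["Report 25.01.xls", "Report 03.02.xls", "Report 10.01.xls"]

print(sorted(reports))                     # ordered by day: 03.02, 10.01, 25.01
print(sorted(reports, key=natural_key))    # same order, still by day
print(sorted(reports, key=day_month_key))  # chronological: 10.01, 25.01, 03.02
```

The third ordering is only correct because the format was supplied from outside; nothing in the names themselves says "day, then month".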

It's just slightly worse than the lack of consistent UTC-usage of systems, with the mixed attempts to correct data to local timezone (or not) depending on application...


Okay, I'll refine the rule to "Treat any sequence of digits as a base 10 whole number for the purpose of sorting". I still think this is quite clear. (Frankly, I also think the original definition is quite clear unless you're purposefully trying to misinterpret it.)

> those names are strings having one or two characters. They are not "numbered files."

Yes they are! In this context, a number is an idea, not a data type. Strings are capable of containing numbers.


I generally agree that treating substrings that are numbers as numbers is a good default for most users in most situations.

However, for hex numbers this simply won't give good results because some of them will just happen to not contain any of the digits A to F and be treated as base-10 numbers by the heuristic while others will include these digits and be sorted differently.

(So having a strict lexicographic mode as an alternative in file managers would be nice.)
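
A small sketch of that hex problem, using invented names that are meant to be read as hexadecimal. `natural_key` is a hypothetical helper implementing the base-10 digit-run rule.

```python
import re

def natural_key(name):
    """Digit runs compare as base-10 integers; everything else as text."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["9", "10", "a", "1f", "20"]  # hex values 9, 16, 10, 31, 32

print(sorted(names, key=lambda s: int(s, 16)))  # ['9', 'a', '10', '1f', '20']
print(sorted(names, key=natural_key))           # ['1f', '9', '10', '20', 'a']
print(sorted(names))                            # ['10', '1f', '20', '9', 'a']
```

Neither generic ordering matches the hex values; only the key that knows the names are base 16 gets it right.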


Octal or binary numbers are going to be fine, but it'll totally and confusingly mess up hexadecimal numbers.

I am not sure any of the points you raised change anything about the OP's point, do they?

OP was talking about changing the rule to something more intuitive; in that case it would seem natural that decimal numbers are used.


Your concept appears to have coherence until you consider that numbers are not necessarily expressed in decimal notation. What about hexadecimal numbers in filenames? Should they be sorted your way?

And what about very long strings of digits in the filenames - so long that they are too long for even the longest available numerical representation? In some apps, they are converted to floating point...
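
On the long-digit-string point, one possible approach (a sketch, not what any particular app does) is to never convert the digits to a numeric type at all: strip leading zeroes, then compare by length and then by text. `digit_run_key` and `natural_key` are hypothetical helpers, the file names are invented, and only ASCII digits are handled.

```python
import re

def digit_run_key(run):
    """Order a digit string numerically without converting it to a number."""
    stripped = run.lstrip("0")
    # Fewer significant digits means a smaller value;
    # runs of equal length compare correctly as plain text.
    return (len(stripped), stripped)

def natural_key(name):
    parts = re.split(r"([0-9]+)", name)
    return [digit_run_key(p) if i % 2 else p for i, p in enumerate(parts)]

a = "scan 12345678901234567890123.png"
b = "scan 12345678901234567890124.png"

# Converted to floating point, the two numbers become indistinguishable.
print(float("12345678901234567890123") == float("12345678901234567890124"))  # True

# The length-then-text key still tells them apart.
print(sorted([b, a], key=natural_key) == [a, b])  # True
```

One side effect of this choice is that "007" and "7" get the same key, so a tiebreaker would be needed if their relative order matters.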


> "Treat any sequence of digits as a number for the purpose of sorting" is consistent.

How about decimal numbers, are they strings or still numbers?

How about version numbers with multiple dots?

How about decimal numbers of a different locale, e.g. you list the folder from a remote machine with filenames of a different locale?

The problem with such semi-consistent schemes is that they are still guess-work, they may make some cases better for some people, but other cases practically unusable because the system doesn't have sufficient information to handle all scenarios consistently.


> Strings are not numbers, even if some part of their content "looks like a number."

Irrelevant and intentionally obtuse. Filenames can't be anything but strings - there's literally no way to mark part of a filename as "this is an integer", so the idea that "strings are not numbers" is ridiculous because the only way to encode numbers (which people constantly want to encode) is as part of a string - which means that parts of filenames are numbers, because that's exactly how people use them.

> Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.

> So what are programs to do?

> Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.

These do not follow from each other.

First, the assertion that "people's preferences are different, so we shouldn't pick an overwhelmingly common preference" is laughably false. The vast majority of computer users (which happen to not be people on HN) prefer "sort numbers by number rather than by UTF-8 value", so that's simply the correct way to sort.

Second, even regardless of the above, there's nothing preventing a "by name" sorting from being consistent and documented.

Either way, this line of reasoning is just wrong.


> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense

Strictly speaking 9, 1 and 0 are not in the alphabet so can't be sorted alphabetically.

And I think most "normal users" wouldn't expect that programmers generalize the alphabet like we do.


Well, that's not alphabetical order.

It's great if DEs build this and give it a name. It's even better if they have a different one that deals with SI prefixes too. But it's not good if "alphabetical order" means that.


What desktop environment called this alphabetical?

This is a really important point - my file manager just says "Name" with sorting. So while it's not perfectly defined, it doesn't make the promise of saying it's alphabetical.

I mean, nine does come before ten in alphabetical order.

> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".

Amen.

> I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.

Well, some car and kitchen radio manufacturers will probably never get this right. In my car (which tends not to be brand new) they even messed up UTF-8 chars, which gets me laughing every time a track has them. It's become a running gag with my wife, "Oh, listen up, it's &%=?! again".

> (all?)

Well, I kind of hate to say this, but Apple got this right with the iPods. They even regarded the metadata fields `sort-*` (e.g. sort-album), movement-name (for series) and movement-index (for part). With these fields they really group and sort my audio books as I expect it to be.

I even wrote my own software to fill these tags appropriately, so that I don't need to split my audio books. I'm pretty happy using `m4b` files - an mp4 / m4a container with chapter support, which is supported perfectly fine on my iPod Nano 7g and my Android Phone (using Audiobookshelf[1] and Voice[2]). After all these years, the iPod Nano 7g to me is the PERFECT portable audio book player with 2 exceptions: Repairability and the proprietary Apple headphone remote protocol [3].

1: https://audiobookshelf.org

2: https://github.com/PaulWoitaschek/Voice

3: https://tinymicros.com/wiki/Apple_iPod_Remote_Protocol


There’s a couple of reasons I don’t use m4b files:

- A lot of my audiobooks come as mp3, and converting to m4b (which is AAC based) would mean losing quality.

- Some MP3 players (even those that support AAC) don’t support M4B.

- I want playback to stop automatically at the end of a chapter, unless I actively decide to start the next chapter. (Admittedly, some MP3 players don’t have an option for this anyway and will always start the next track. This annoys me.)

- Even with chapter metadata, I find it difficult to seek through a 10+ hour m4b file. Seeking through a 10 – 60 minute chapter is more manageable. (Of course, this doesn’t always work out; A Memory of Light has a single chapter that’s more than ten hours long. Whatever, I want to split in a way that follows the author’s structure, and Sanderson purposefully chose to write one extremely long chapter.)

I probably sound like I regularly switch between 20+ different models of MP3 player. In fact, I mostly use my computer or iPhone these days; however, I expect my audiobook collection to outlast any one piece of hardware.


And maybe someone else uses “American” style dates in their file names mm-dd-YYYY, can those also be put in correct order for those users?

That is just silly notation used by a minority in this world ;-)

Perhaps, but if you set your browser language to US English you get dates displayed as MM.DD.YYYY, and there's no way to change that to either the European or the ISO (YYYY-MM-DD) format.


