> An algorithm must be unambiguously specified for all possible inputs.
And it is. It's just that some outputs may not match what the user expects. TFA's preferred algorithm (simple lexicographic sorting) matches user expectations 90% of the time. The algorithm actually in use on most OSs (simple lexicographic sorting + treat consecutive digits as combined numbers) matches expectations 99% of the time. An algorithm that matches expectations 100% of the time doesn't exist. Shouldn't we pick the 99% algorithm?
(I am admittedly making up the actual percentages, but you get the point.)
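To make this concrete, here is a minimal Python sketch of the "lexicographic sorting + treat consecutive digits as one number" rule. It is not any particular OS's implementation. It assumes case-sensitive comparison: each name is split into text runs and digit runs, and the digit runs are compared as integers.

```python
import re

def natural_key(name):
    """Sort key: digit runs compare as whole numbers, everything else as text."""
    # The capturing group makes re.split keep the digit runs; they always
    # land at odd indices, so text is compared with text and ints with ints.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

names = ["Thing 10", "Thing 9", "Thing 1", "Thing 2"]
print(sorted(names))                   # ['Thing 1', 'Thing 10', 'Thing 2', 'Thing 9']
print(sorted(names, key=natural_key))  # ['Thing 1', 'Thing 2', 'Thing 9', 'Thing 10']
```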
> But I'm not sure how simple it would be to explain to a non-technical user why size_5, size_10 and size_15 are in order but size_0.25, size_0.5 and size_0.75 are out-of-order.
You don't have to explain it if the situation never comes up.
I'd bet 99.9% of computer users don't have any files which would trigger this edge case in a situation they would actually notice. Decimals just aren't that commonly used in this context, and even if you do have decimals the sorting will still work a lot of the time. For the remaining 0.1%, chalk it up to a bug.
I literally had to test this on my Mac just now because I never realized it was broken.
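The size_0.25 case falls straight out of the digit-run rule: the part after the dot is its own digit run, so it is compared as the whole number 25. A quick check with the same kind of key (a sketch, not Finder's actual code):

```python
import re

def natural_key(name):
    # Digit runs become ints, so "0.25" is seen as 0, ".", 25.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

sizes = ["size_0.75", "size_0.25", "size_0.5"]
print(sorted(sizes, key=natural_key))
# ['size_0.5', 'size_0.25', 'size_0.75']  (5 < 25 < 75)

whole = ["size_15", "size_5", "size_10"]
print(sorted(whole, key=natural_key))
# ['size_5', 'size_10', 'size_15']
```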
I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".
I don't want to put leading zeroes before all the single-digit numbers in my file names. (And then potentially come back later and add even more leading zeroes once the maximum number reaches three digits.)
---
I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.
This works, but it looks kind of ugly and creates extra work (yes, I have scripts to automate it, but it's still an extra step), and it would be great if I could just trust that every device will understand numbers.
> I don't want to put leading zeroes before all the single-digit numbers in my file names.
> ... it would be great if I could just trust that every device will understand numbers.
Strings are not numbers, even if some part of their content "looks like a number."
> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".
Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.
So what are programs to do?
Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.
> Display strings in a consistent, documented, manner.
IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent. I'm not sure if it's documented—I've never needed to look up the documentation—but if it's not, the developers could certainly fix that.
> this is your preference for a specific situation.
Sure, but we generally make decisions based on which situations we think will be most common. I think having ten or more things (screenshots, audio samples, whatever) named "Thing 1" – "Thing 10" in a folder is extremely common. And if Thing 10 comes before 9, it's really annoying!
Let's say I have a directory of 32 numbered files. Under the author's preferred sorting method, they'll get displayed:
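For reference, plain string sorting of the names "1" through "32" gives this order (Python's default string comparison works code point by code point):

```python
# Plain string comparison: "10" < "2" because '1' < '2'.
names = [str(n) for n in range(1, 33)]
listing = sorted(names)
print("\n".join(listing))
# 1, 10, 11, ..., 19, 2, 20, ..., 29, 3, 30, 31, 32, 4, 5, 6, 7, 8, 9
```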
If I download a folder with files like this, I basically have to pause whatever I'm doing and edit the files to have leading zeroes before I can make sense of what I'm looking at.
Do I understand that you want these to be sorted like this?
1
2
9
10
11
So I guess you also want things sorted like
1.1
1.2
2
9
9.9
And also
1
1.1
1.10
1.2
1.10.1
So when you're done defining whatever crazy rules you think up, how do I pause whatever and edit the filenames to get them back into lexicographical order?
You can massage lexicographical to meet your needs. I can't massage your arbitrary rules to meet my needs.
Your examples don’t need any extra rules to be sorted correctly. The basic idea is that any sequence of digits is treated for sorting as if it were a single character. On my iPhone, your examples are sorted as expected.
I wouldn't know how an OS treats those without assuming mind-reading instead of proper lexicographic order. Why substitute vagueness for precision when simply taking care to name files properly would suffice?
Ah yes sorry, 1.10 comes after 1.2 because 10 is bigger than 2 (so in fact different from your example). But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.
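Running the earlier list through a digit-run key (a sketch; the key function is illustrative, not any OS's code) gives exactly that order, with 1.10 after 1.2. Plain string sorting is shown alongside for comparison:

```python
import re

def natural_key(name):
    # Each run of digits compares as one integer; "." compares as text.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

items = ["1", "1.1", "1.10", "1.2", "1.10.1"]
print(sorted(items))                   # ['1', '1.1', '1.10', '1.10.1', '1.2']
print(sorted(items, key=natural_key))  # ['1', '1.1', '1.2', '1.10', '1.10.1']
```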
If you have non-integer numbers in your filenames then it won’t give the order you want, but there isn’t going to be a rule that works for all cases.
I was with you until this point, but 1.2 is bigger than 1.10, because 1.2 is a shortened way of writing 1.20, _unless_ you explicitly want these to be version numbers or something like that. The normal expectation would be to treat numbers as, well, mathematical numbers, and not SemVer, especially if there is only one decimal point, don't you think?
As I said, the sorting rule won’t always give pleasing results, but it seems to me like a simple and reasonable modification of lexicographic ordering.
1.10, the number, is equivalent to 1.1. It is less than 1.2. You say you want numbers to sort as numbers, but you want 1.10 to be greater than 1.2.
Do you consider '1/4' to be a number? Should it come before or after '1/3'?
I'm guessing that you don't want to sort one character at a time if you encounter one of [0-9]. Instead, you want to group all consecutive [0-9] as a single sortable number. But aren't characters '.', ',', '/', '-' also part of numbers?
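Under the digit-run rule, "1/4" is compared as 1, "/", 4, so it lands after "1/3" even though a quarter is smaller than a third. A small comparison, using Python's `fractions` module as the "numeric" reading:

```python
import re
from fractions import Fraction

def natural_key(name):
    # Digit runs compare as integers; "/" is just text.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

labels = ["1/4", "1/3", "1/2"]
print(sorted(labels, key=natural_key))  # ['1/2', '1/3', '1/4']
print(sorted(labels, key=Fraction))     # ['1/4', '1/3', '1/2']
```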
It doesn’t work for decimals. It also doesn’t work for pi, or most dates. That’s okay. Supporting those cases would require “reading your mind” / trying to guess what the user wants by applying opaque rules. I certainly don’t want that.
Treating consecutive digits as numbers is a simple modification (I still think it’s quite simple) that is easy to understand and supports 99% of real-world use cases.
> But assuming your original list is a list of versions (which seems reasonable given the presence of multiple decimal points for some cases), then that’s the order you’d want.
What level of assumption is expected from the sorting system here? Would it have to process ALL entries in the list to find multiple decimal points and then assume that they are ALL versions and not numbers?
How should this be handled in locales where the decimal point is a comma and the thousands separator is a dot? Should the system then also take the locale into account? What about listing a folder on a remote system with a different locale?
What about dates? Should the system attempt to sort entries with multiple date formats (yyyy-mm-dd, dd-mm-yyyy, dd-MMM-yyyy,...)?
The topic is far more complex than this narrow example. If we expect such a system to alter its sorting based on some data format interpretation, there is a risk of misinterpretation which might make the whole list unusable...
It has nothing to do with decimal points. It just looks at any contiguous sequence of digits and treats it as a single character for the purposes of sorting. The decimal point could be any other character and the behavior would be the same.
So decimal numbers are treated as strings and end up in a completely different order, with the digits after the decimal point sorted differently from whole numbers without fractions?
Or do you mean that every run of consecutive digits within a string is treated as an individual whole number?
Depending on the decision, either lists of decimal numbers or lists of version numbers will be sorted wrong.
--> This could be covered by adjusting the logic based on the number of decimal points.
And the logic complexity keeps increasing, up to an arbitrary point of "no, this will not be considered", resulting in an unpredictable user-experience of sorting...
I understand that you found your perfect trade-off for sorting based on longer considerations. But it will be difficult to communicate such a concept to a user.
Applying partial rules to improve sorting in one direction is not a lossless activity; it actually makes the UX worse in other scenarios, as the user is first guided to assume a certain behavior but then learns that this expectation is broken in adjacent scenarios (which is more or less the bottom line of the article to begin with).
In the end it'll be just "another standard" for sorting [0]
> But it will be difficult to communicate such a concept to a user.
This isn't a prerequisite, since the existing naive character sort approach is not communicated either. In fact, it's almost universally unexpected by any user who hasn't written a naive string sort. Apple doesn't do this, and I very much did not need it explained to me why 10 came after 2, because that's what everyone who's not a programmer expects.
As a litmus test, go ask some people who are not programmers, without loading the question beyond "here are some files, how would you expect them to be displayed in a list?". Show the lists side by side. The answer should not surprise you.
We just discussed a situation where lexicographical sorting doesn’t work. Adding in a rule to treat consecutive digits as one number doesn’t significantly complicate the logic and makes sorting work for a major additional use case. It doesn’t magically fix every case but it fixes a common one with minimal downsides.
> IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent.
Are you sure about that?
So how do you suggest handling hexadecimal numbers?
Or octal numbers?
What about binary numbers?
What about file names with portions of a date and/or time?
How is a program supposed to know any of the above?
> Let's say I have a directory of 32 numbered files.
Assuming any of the filesystems I am aware of is in use, those names are strings having one or two characters. They are not "numbered files."
Sorting dates: This is why there is an international standard of having YYYY-MM-DD hh:mm:ss in the order we have it. We learned this in school in the '80s, because it made sorting paper documents more logical and stuff easier to find. So, way before most people got computerized.
It just happens to be the most logical way to sort for computers too, as long as humans are involved in the usage of the data.
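The reason this works is that every field is zero-padded and fixed-width, ordered from most to least significant, so plain string order and chronological order coincide. A quick check, assuming zero-padded timestamps in exactly that format:

```python
from datetime import datetime

stamps = ["2024-11-03 09:15:00", "2023-12-31 23:59:59", "2024-02-29 08:00:00"]

# Plain string sort vs. sorting by the parsed point in time.
by_string = sorted(stamps)
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"))
print(by_string == by_time)  # True
```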
> Sorting dates: This is why there is an international standard of having YYYY-MM-DD hh:mm:ss in the order we have it.
That would be great, but ISO 8601 is just one of the standards, and there are still regional standards as well.
And that's still ignoring the end user. In Europe, for example, people might create filenames with dates in dd.mm format, e.g. "Report 25.01.xls".
A system attempting to sort this intelligently would likely assume this is a decimal number, as it has zero context for it.
It's only slightly worse than the inconsistent use of UTC across systems, with mixed attempts to convert data to the local time zone (or not) depending on the application...
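Strictly speaking, the digit-run rule discussed upthread wouldn't read "25.01" as a decimal; it reads it as 25, ".", 1. The result is still wrong, because the names end up sorted by day instead of by date. A sketch, where `day_month_key` is a hypothetical helper that already knows the names use dd.mm:

```python
import re

def natural_key(name):
    # Digit runs compare as integers: "25.01" -> 25, ".", 1
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

def day_month_key(name):
    # Hypothetical helper: assumes the first "dd.mm" in the name is a date.
    day, month = re.search(r"(\d{2})\.(\d{2})", name).groups()
    return (int(month), int(day))

reports = ["Report 03.02.xls", "Report 25.01.xls", "Report 10.01.xls"]
print(sorted(reports, key=natural_key))
# ['Report 03.02.xls', 'Report 10.01.xls', 'Report 25.01.xls']  (by day)
print(sorted(reports, key=day_month_key))
# ['Report 10.01.xls', 'Report 25.01.xls', 'Report 03.02.xls']  (by date)
```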
Okay, I'll refine the rule to "Treat any sequence of digits as a base 10 whole number for the purpose of sorting". I still think this is quite clear. (Frankly, I also think the original definition is quite clear unless you're purposefully trying to misinterpret it.)
> those names are strings having one or two characters. They are not "numbered files."
Yes they are! In this context, a number is an idea, not a data type. Strings are capable of containing numbers.
I generally agree that treating substrings that are numbers as numbers is a good default for most users in most situations.
However, for hex numbers this simply won't give good results because some of them will just happen to not contain any of the digits A to F and be treated as base-10 numbers by the heuristic while others will include these digits and be sorted differently.
(So having a strict lexicographic mode as an alternative in file managers would be nice.)
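A small illustration with hypothetical frame names: fixed-width hex sorts correctly as plain strings, but the digit-run rule reads "9f" as 9 followed by "f" and "20" as twenty:

```python
import re

def natural_key(name):
    # Only [0-9] runs become numbers; hex letters stay text.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

frames = ["frame_9f", "frame_20", "frame_1a"]  # hypothetical names
prefix = len("frame_")

print(sorted(frames))                                  # ['frame_1a', 'frame_20', 'frame_9f']
print(sorted(frames, key=natural_key))                 # ['frame_1a', 'frame_9f', 'frame_20']
print(sorted(frames, key=lambda f: int(f[prefix:], 16)))  # ['frame_1a', 'frame_20', 'frame_9f']
```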
Your concept appears to have coherence until you consider that numbers are not necessarily expressed in decimal notation. What about hexadecimal numbers in filenames? Should they be sorted your way?
And what about very long strings of digits in the filenames - so long that they are too long for even the longest available numerical representation? In some apps, they are converted to floating point...
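The overflow problem can be sidestepped by never converting a digit run to a machine number. One common approach, sketched here assuming ASCII digits: strip leading zeros, then a shorter run is smaller, and equal-length runs compare character by character. (Python's `int` is arbitrary-precision anyway, so this matters most in languages with fixed-width integers.)

```python
import re

def digit_run_key(run):
    # Compare a decimal digit string without converting it to a number:
    # drop leading zeros, then order by length, then character by character.
    stripped = run.lstrip("0") or "0"
    return (len(stripped), stripped)

def natural_key(name):
    # ASCII digits only; each run becomes a (length, digits) tuple.
    parts = re.split(r"([0-9]+)", name)
    return [digit_run_key(p) if i % 2 else p for i, p in enumerate(parts)]

ids = ["id_" + "9" * 30, "id_1" + "0" * 40, "id_42"]
print(sorted(ids, key=natural_key) == ["id_42", "id_" + "9" * 30, "id_1" + "0" * 40])  # True
```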
> "Treat any sequence of digits as a number for the purpose of sorting" is consistent.
How about decimal numbers, are they strings or still numbers?
How about version numbers with multiple dots?
How about decimal numbers of a different locale, e.g. you list the folder from a remote machine with filenames of a different locale?
The problem with such semi-consistent schemes is that they are still guesswork: they may make some cases better for some people, but make other cases practically unusable, because the system doesn't have enough information to handle all scenarios consistently.
> Strings are not numbers, even if some part of their content "looks like a number."
Irrelevant and intentionally obtuse. Filenames can't be anything but strings - there's literally no way to mark part of a filename as "this is an integer", so the idea that "strings are not numbers" is ridiculous because the only way to encode numbers (which people constantly want to encode) is as part of a string - which means that parts of filenames are numbers, because that's exactly how people use them.
> Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.
> So what are programs to do?
> Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.
These do not follow from each other.
First, the assertion that "peoples' preferences are different, so we shouldn't pick an overwhelmingly common preference" is laughably false. The vast majority of computer users (which happen to not be people on HN) prefer "sort numbers by number rather than by UTF-8 value", so that's simply the correct way to sort.
Second, even regardless of the above, there's nothing preventing a "by name" sorting from being consistent and documented.
It's great if DEs build this and give it a name. It's even better if they have a different one that deals with SI prefixes too. But it's not good if "alphabetical order" means that.
This is a really important point: my file manager just says "Name" for sorting. So while it's not perfectly defined, it doesn't promise that it's alphabetical.
> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".
Amen.
> I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.
Well, some car and kitchen radio manufacturers will probably never get this right. In my car (which tends not to be brand new) they even messed up UTF-8 chars, which gets me laughing every time a track has them. It's become a running gag with my wife, "Oh, listen up, it's &%=?! again".
> (all?)
Well, I kind of hate to say this, but Apple got this right with the iPods. They even honored the metadata fields `sort-*` (e.g. `sort-album`), `movement-name` (for series) and `movement-index` (for part). With these fields they group and sort my audiobooks exactly as I expect.
I even wrote my own software to fill these tags appropriately, so that I don't need to split my audio books. I'm pretty happy using `m4b` files - an mp4 / m4a container with chapter support, which is supported perfectly fine on my iPod Nano 7g and my Android Phone (using Audiobookshelf[1] and Voice[2]). After all these years, the iPod Nano 7g to me is the PERFECT portable audio book player with 2 exceptions: Repairability and the proprietary Apple headphone remote protocol [3].
There are a few reasons I don’t use m4b files:
- A lot of my audiobooks come as mp3, and converting to m4b (which is AAC based) would mean losing quality.
- Some MP3 players (even those that support AAC) don’t support M4B.
- I want playback to stop automatically at the end of a chapter, unless I actively decide to start the next chapter. (Admittedly, some MP3 players don’t have an option for this anyway and will always start the next track. This annoys me.)
- Even with chapter metadata, I find it difficult to seek through a 10+ hour m4b file. Seeking through a 10 – 60 minute chapter is more manageable. (Of course, this doesn’t always work out; A Memory of Light has a single chapter that’s more than ten hours long. Whatever, I want to split in a way that follows the author’s structure, and Sanderson purposefully chose to write one extremely long chapter.)
I probably sound like I regularly switch between 20+ different models of MP3 player. In fact, I mostly use my computer or iPhone these days; however, I expect my audiobook collection to outlast any one piece of hardware.
Perhaps, but if you set your browser language to US English, dates are displayed as MM.DD.YYYY and there's no way to change them to either European or ISO (YYYY-MM-DD) format.
One nice thing about buying vinyl these days is that they almost all come with a DRM free digital download of the album as well. Buying physical records is what has caused my digital music collection to grow the most since my Hotline 1.2.3 days.
Depends on your perspective. If you’re into the ritual and interested in close, intensive listening, there’s a certain magic and immediacy to knowing you’re using a physically destructive playback mechanism—that this right now is the best this record will ever sound again.
As for myself, I have young kids and this sort of thing doesn’t make the cut these days, so I stream everything. It all feels background-y and I haven’t fallen in love with an album in years and years.
> there’s a certain magic and immediacy to knowing you’re using a physically destructive playback mechanism—that this right now is the best this record will ever sound again
Maybe I just don't get it - I'm much younger than the average HN user, growing up with physical media but not physical media that rapidly degraded on use like how vinyl does. But to me this sentiment is so alien that it seems like some kind of a milder nostalgia Stockholm syndrome.
When we think of other physical media, no one ever romanticizes that type of thing because degradation never really existed there. Would you want a photograph that faded away a significant amount each time you looked at it? A book that had the ink on its pages visibly rub off?
To me it just seems that the hard technical limitations of a long bygone era (that some people would've undoubtedly hated at the time) were given a mystique to them when people come back to them. Is the harsh fact of media degradation really inherently "magical"? Or is it that people ascribe good qualities to it because it's just the way it was?
CDs just seem so much better. Yes it's technically digital, but can you tell?
I didn't think so, until a couple of weeks ago.
I was in a record store and it had a CD player on sale for $30. One of those cheap blister-pack jobs. Just for a laugh, I bought it, and a couple of CD versions of records I own. (Genesis, New Order, R.E.M.)
I thought "digital is digital" so it shouldn't matter that it was cheap.
It wasn't great.
It sounded very flat. Even with my expensive headphones, it just didn't sound right. I'm not sure if "mechanical" is the right word, but it was noticeably different, and I'm not someone who has perfect hearing. It just sounded... boring.
So I compared the CD sound with the record versions that I rip with a $20 USB dongle and Audacity. The record rips sound much better than the CDs.
Maybe someone with perfect hearing will think otherwise. But I'm not an audiophile. I'm just a guy who likes gadgets.
Digital is digital, but you’re ignoring multiple places where things might not be the same:
That $30 CD player… if it’s connected to headphones, how were the headphones driven? Especially if you have nice headphones, it’s very easy for a cheap device to not be able to competently drive them.
Vinyl vs CD mastering is a thing. There could be differences there. Additionally, depending on how you ripped the vinyl (especially with a “cheap dongle”) that may introduce its own color to the record.
There’s a reason why music collectors differentiate between every single source, because often there are differences (sometimes small, sometimes big) between the various sources.
Yeah it depends on where the producer expects the CD to be played.
99% of music is made to be played on radio / in car etc., a noisy environment, where you don't want to be adjusting the volume knob all the time. So the dynamics are stripped in mastering phase.
Music that gets pressed on vinyl isn't mastered for car play but for home stereo equipment, so it makes more sense to give it a larger dynamic range.
CDs objectively have a lower noise floor (less hiss) and more dynamic range (the difference between the loudest and quietest notes), but it's the mastering that usually destroys the sound. And nothing can be done about it on the consumer end, except finding a less-remastered version of the album in a thrift store that isn't scratched to oblivion.
There's really no reliable way to tell if a CD is going to have high dynamic range, except perhaps with niche audiophile studios like https://www.stockfisch-records.de/sf12_start_e.html, but https://dr.loudness-war.info/ has a fantastic list of records with their dynamic ranges, so you can check before you buy, and you can also explore and find new stuff to listen to on your speakers ;)
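For a sense of scale on the dynamic-range point: 16-bit CD audio has 2^16 quantization levels, so its theoretical dynamic range (ignoring dither and noise shaping) works out to roughly:

```latex
\mathrm{DR} \approx 20 \log_{10}\!\left(2^{16}\right) = 320 \log_{10} 2 \approx 96.3\ \mathrm{dB}
```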
If your CD player has cheap anti-skip, it probably applies lossy digital audio compression before output. A lot of the CD-player-as-a-package chips used older, crappy lossy audio compression and saved the result to a small bit of RAM on the CD player. There wasn't much of a power envelope for compression logic, and with memory being really expensive back in the day and prices being cutthroat, there wasn't much memory for the anti-skip buffer either. So you needed fast, cheap, and really compact audio compression. Nobody really bothered improving it once MP3 players came out and memory got cheaper, so even "new" CD players use the same hardware portable CD players were using in the 90s.
And even then, it's not digital square waves coming out of your headphones. At some point that digital signal needs to be converted to analog waves. The quality of the DAC matters as well and can give a different quality of output.
If you used an analog audio output of the cheap CD player then the "digital is digital and it shouldn't matter that it was cheap" argument may not hold. The low quality of sound could be due to low quality of Digital to Analog Converter in the cheap player, not due to low quality of CD records that you have tried.
CDs degrade pretty fast. I know people with CD collections that are basically unplayable now. And the typical plastic cases don't even make for nice shelf deco like books or paper-based vinyl cases.
These are easily fixable problems! M-Disc exists for disc longevity. High quality cases exist.
I realize this isn't the world we live in so I guess I'm just yelling at clouds. But come on, Vinyl is just so obviously a bad way to preserve music...
I find it completely strange that dental care isn't just considered part of standard healthcare. Like, so my employer's health care plan covers every part of my body except my mouth? Why does my mouth specifically need its own plan?
It's a really unfortunate historical accident, especially in the US.
Dentistry evolved relatively late compared to regular medicine, and early oral procedures were mostly tooth extractions, which ended up being predominantly done by barbers, who would also do surgery (!). These procedures were often considered crude and beneath that of a trained doctor, and they were generally performed by self-trained practitioners. There were several attempts in the 1800s to integrate dentistry into mainstream medicine, but they failed, both because the doctors of the time didn't think of dentistry as being a real science, but also because, as dentistry started to legitimize itself, the dentists themselves preferred being separate.
For some reason the same separation also evolved in the U.K., but it's more integrated in other countries. For example, basic dental coverage is part of national healthcare in Germany and Japan. In the U.S., dentists have their own schools and licensing boards and so on, which isn't the case in the rest of the world, where dentistry is usually accepted as a regular branch of medicine and taught at the same universities.
I literally remember one of my elderly relatives pulling out a tooth at home with a thread tied around a door handle as if this was nothing special. DIY all the way! Early 1990s, former Soviet Union rural-or-so area.
As of today, we have moved to a different situation where dental care in that same country is ample, but the price lists are rarely transparent, making many not-so-well-off people avoid going to dentists altogether.
The health insurance system gives something like 60€ yearly to an adult for fixing teeth. Dental care for children up to 19 years old is free, though, which is great.
>I literally remember one of my elderly relatives pulling out a tooth at home with a thread tied around a door handle as if this was nothing special. DIY all the way! Early 1990s, former Soviet Union rural-or-so area.
Some form of this type of tooth pulling was common for baby teeth in the 90s. I’m sure it still is today. I don’t know about recommending it for adult teeth.
Yeah, my kid asked me to do some form of this for a baby tooth because she didn’t want to wait for an appointment and it was really bothering her. It worked! Definitely not recommended for an adult tooth. Baby teeth are barely hanging on and don’t really have deep roots.
In Germany, basic dental is very basic. It basically covers only acute scenarios and one checkup a year. Since so much of dentistry is maintenance and prevention, most people get supplementary insurance or pay out of pocket or do without.
Here is the list of covered services in Croatia:
- tooth extraction
- periodontal treatment
- tartar removal up to twice a year
- prosthetic work (e.g. dental crowns, partial and total dentures, depending on age)
- composite (white) fillings for front teeth
- amalgam fillings for other teeth
- composite (white) fillings for children up to 18 years of age for all teeth
- braces for children up to 18 years of age
It's not just in the US. In all the EU countries I've lived in, it was also excluded from standard health insurance and government programs except in cases of acute damage.
Australia, despite having basically universal Medicare otherwise, still considers teeth "luxury bones", in the sense that they're not really necessary to health care (else they would be covered by Medicare).
True but the profits and regulatory capture lie more with pharma companies than with dentists.
There's an incentive for pharma to want people not to take care of their teeth, I don't know if they ever act on this incentive but I wouldn't be surprised.
They don't generally cover normal vision needs. Your regular glasses and/or contacts aren't covered.
Eye injury or cancer, though? Usually covered. I honestly can't recall anyone getting denied for a possible scratch on the eye, either here in Norway or the US (Am from the US). Expenses for eye issues with MS or Diabetes? Usually covered. In these cases, you often go to a specialist MD instead of the normal eye doctor (if you see an MD at all). More is generally covered for eyes than teeth - if teeth were covered like eyes, it would be an improvement. A lot of acute care and infections and stuff would be covered.
As a glasses wearer, not covering glasses feels reasonable to me, although I can't quite put my finger on why.
I guess I see glasses at least partially as a fashion accessory--a necessary one for sure, but then so are shoes, and I don't expect those to be covered by insurance.
It's supposed to be insurance and most dental services don't cost enough to be worthwhile insuring as a risk. Everyone needs them, more or less, so any insurance would just be a (costly to run) payment plan.
Proper civilized countries do provide free to the consumer dental services, at least for children and poor people.
It's not your employer's "health care" plan. It's an insurance plan. And dental care doesn't really fit the insurance model since the vast majority of spending is expected and regular (checkups, cleaning, etc).