Alternatively, what happens when someone graduates, changes ISPs, a domain name ...

sergiosgc · on March 21, 2013

What is the alternative, then? That's the million dollar question. Even phone numbers, which have a large personal switching cost are ephemeral for many people. SSNs or other government mandated IDs are not acceptable. I see no better alternative.

saurik · on March 21, 2013

First, you seem to have immediately discounted standard usernames, which have flaws, but are not "fundamentally flawed". Second, you seem to assume that the primary key has to be something the user even knows; I do not. My Facebook account has a number: that number will never change, no matter what I do to my email addresses or even if my Facebook username changes; that number, and that number alone, defines my account. I will be honest and say I actually did have that number memorized a long time ago, but for years now I have not known it, nor would 99.999% of Facebook users. The same is true of Google's account infrastructure (which even has much much longer numbers). When you use OpenID, you get back an opaque string, which can be arbitrarily complex and encode whatever is required by the federated identity provider. There simply are not fixed identity nodes for people: you have to make them.
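
To make the "primary key the user never sees" idea concrete, here is a minimal sketch. It is not how Facebook or Google actually store accounts; the class `AccountStore`, its method names, and the example addresses are all hypothetical. The point is only that the login name is an alias that can move while the account number stays fixed.

```python
import itertools


class AccountStore:
    """Toy provider-side store keyed by a surrogate id (hypothetical design)."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._login_to_id = {}  # mutable login names (e-mail, username) -> account id
        self._profiles = {}     # account id -> profile data

    def create(self, login, display_name):
        account_id = next(self._next_id)  # assigned once, never changed
        self._login_to_id[login] = account_id
        self._profiles[account_id] = {"display_name": display_name}
        return account_id

    def change_login(self, old_login, new_login):
        if new_login in self._login_to_id:
            raise ValueError("login already in use")
        # Only the alias moves; the account id stays put.
        self._login_to_id[new_login] = self._login_to_id.pop(old_login)

    def resolve(self, login):
        return self._login_to_id[login]


store = AccountStore()
uid = store.create("student@university.example", "Example User")
store.change_login("student@university.example", "me@personal.example")
```

After the change, `store.resolve("me@personal.example")` still returns the same `uid`, and the old address no longer resolves to anything.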

sergiosgc · on March 21, 2013

Sorry, usernames are not identity providers, they do not cross system boundaries. Perhaps we are discussing different problems.

Facebook is one identity provider. I'm not comfortable having all of the Internet, to use a reduction to the absurd, dependent on Facebook. Persona is distributed, which is, for me, a primary requirement for an identity system.

saurik · on March 22, 2013

"Sorry", but you just moved the centralized identity to a different location, you aren't magically removing it: in fact, you can't magically remove it, because there is no such thing as a universal canonical form of identity. If there was, we would just use that instead of all of these more complex forms of identity we build up.

In the case of the e-mail address, you are moving it to the person providing the domain name the e-mail address is on; this is thereby explicitly why I brought up OpenID--which you seem to have totally ignored--as it is also a distributed system but yet it doesn't force the incorrect assumption that Persona does (specifically, that e-mail addresses are canonical identification mechanisms for users, and so nigh-unto encourages opaque tokens to be returned).

Let's put this another way: you seem to believe that there is a way to assign a token to a user in a way that works for them no matter where they are. I believe that that entire concept is a fantasy. In particular, I believe the one that Persona chose, which you seem to want to defend, only works for a very weird subset of users that are technical enough to understand that others make this incorrect assumption, and then organize their lives around it.

In comparison, once you assume that there is no such identifier, you start asking how you federate the choice, and you end up with the idea of "identity providers". However, you now have no stable way of holding on to identity as you change providers. I maintain that is an unsolvable problem, and it should be treated as an unsolvable problem: the concept of an identifier that can cross multiple identity providers is, again, a fantasy.

So the result is that you now have to ask the question of what identifier is used within an identity provider. I maintain that e-mail addresses are a horrible horrible choice, as the people who provide them are known to reuse them, and they are things many users want to be able to change often. Instead, it should be encouraged to make the token entirely opaque, with a requirement in the spec that "this token will never get reused by any other user using your identification provider ever in the future". AFAIK, OpenID states exactly this.
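
As a rough illustration of that "never reused" requirement (a sketch under my own assumptions, not text from any spec; `OpaqueTokenIssuer` and its methods are made-up names), a provider can keep a record of every token it has ever handed out, including those of deleted accounts:

```python
import secrets


class OpaqueTokenIssuer:
    """Issues opaque identifiers and never hands the same one out twice."""

    def __init__(self):
        self._ever_issued = set()  # includes tokens of deleted accounts
        self._active = {}          # token -> account record

    def issue(self, record):
        token = secrets.token_urlsafe(16)
        while token in self._ever_issued:  # collision is very unlikely, but checked
            token = secrets.token_urlsafe(16)
        self._ever_issued.add(token)
        self._active[token] = record
        return token

    def delete(self, token):
        # The account goes away; the token stays retired forever.
        del self._active[token]

    def is_retired(self, token):
        return token in self._ever_issued and token not in self._active


issuer = OpaqueTokenIssuer()
first = issuer.issue({"email": "a@example.com"})
issuer.delete(first)
# A later user who happens to get the same e-mail address gets a different token.
second = issuer.issue({"email": "a@example.com"})
```

The e-mail address can be recycled by whoever runs the mail domain, but the token that relying sites stored for the first user never points at the second one.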

And yet, when I go to log in to a website using OpenID, I can select Google as my authentication provider, and then log in with my e-mail address. Later, when my e-mail address changes, I'll be able to log in with a different e-mail address, and Google will assign me the same OpenID identifier for my federated login with other websites. You end up in a similar boat if you use the various OAuth flows that other companies use, such as with Facebook, then using an API to get the stable identifier (which for Facebook is a number).

Look: I maintain a website with tens of millions of worldwide users, and I only have federated login. I deal with the issues of federated login day in and day out. I deal with users from very diverse user communities, and I see how they treat their e-mail addresses (as sadly, Google will not let you change your Gmail address without getting a new account, so a lot of users are forced into situations where their identification changes when there was no real reason why it should have been required). When I look at Persona, all I see is a support nightmare due to a fundamentally invalid assumption.

sergiosgc · on March 22, 2013

A touch and flee commentary roughly touching many important aspects of identity management, with lots of assumptions with which I disagree. It'd take me a while to deconstruct all of them correctly, so I'll try to aim just for the most important points.

Your central point is that, for a given identity, there is a set of emails that map to that identity (N-1 relationship). Moreover, this set is itself volatile, as people add and remove emails to the set that maps to their identity.

There is no relation of one person to one email (it is one to many). This was approached during the discussion, and I believe you agree that this is not an issue.

There are cases where the relation of email to person is one email to many people. This was approached during the discussion, and it just invalidates the usage of group emails as identity identifiers (stuff like info@example.com). It does not preclude the concept of email as identity identifier. I believe you agree.

Where you have a specific point against the usage of email as an identifier is on the fact that the email is volatile. Volatility is a flaw, which must be worked around, but it's not a showstopper. Further, there's absolutely no evidence that a pure identity provider, such as OpenID, wouldn't be as volatile as email. It does not have the scale needed to prove non-volatility yet.

I think we agreed on the point about natural/surrogate keys.

We then briefly touched the question of identity provider (up to here the discussion was about the identity identifier). I think we agree that the provider must be distributed, hence your reference to OpenID (albeit the reference to FB, which I think was related to the point about natural/surrogate keys).

Briefly, in summary:

- The identity identifier must map one identifier to one identity. (ok for emails, ok for OpenID, ok for most auth providers such as Google or FB)

- The identity provider must be distributed. (ok for email and OpenID, ko for Google, FB and other social networks). Social networks are not acceptable because they are not distributed enough. Were there as many social network providers as there are email providers, in an open ecosystem, where users could run their own social network on their own domain, social networks would be acceptable.

- Identity identifiers must be memorizable by users.

And here, on this last point, is the crux of our disagreement. You are of the opinion that, since emails are volatile, users will forget their previous email, or discontinue its service, and will be locked out of access to their identity on services tied to the email. I am of the opinion that users will never remember (or include in their daily routine) yet another identity identifier (the OpenID URL, or any other identity specific service).

The fact of the matter is that space in consumer minds is limited (heck, marketing is a whole field of study on grabbing a share of consumer mind). Being limited, you have two options:

a. Aim for the ideal scenario, go for OpenID or other pure identity provider, aim for theoretical perfection, and risk no adoption because users can't be bothered to use it; or

b. Use the one identity that already exists, that users already remember, that almost fits the bill, and work around volatility, the only big hurdle, by having the ability for users to change/add/remove emails (identity identifiers) associated with each account. This is the approach of Amazon or Linkedin, for example.

Aiming for a), while theoretically optimal, introduces the failure scenario of users going for the other identity that already exists in their daily routine: Facebook/Google accounts. This is a catastrophic scenario.

Optimum is often the enemy of good.

Look: You brought a smile to my face with the reference to a tens of millions of users site. It's a vain appeal to authority. I also manage a tens of millions of users site, with both federated and native authentication.

saurik · on March 22, 2013

First, nothing in this description/recap touches on my original complaint--the one that started my involvement in this thread and which was your first reply to me--that e-mail addresses get reused, and thereby are dangerous to use as your canonical identification identifier; you mention that they are volatile, but you fail to take into consideration that they are sufficiently volatile as to be lost and reassigned to others. I cannot tell if this was simply ignored or actively dismissed.

This is for two reasons: first, because they are tied to the DNS system, a system that is subject to political and economic issues. Tying the namespace of users eventually to the DNS system means that if a domain name is lost and then owned by someone else, the new owner is now in control of the identities of everyone associated with the domain name.

There is no reason this needs to be the case: a sane federated identity system, in my book, would solve this by tying identities instead to a private key that is stored by the authentication provider. Thereby, if the authentication provider goes out of business, their key would be lost, and would not be reassigned to anyone willing to pay $12.
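
A very rough sketch of what that could look like, with loud caveats: this is my illustration, not an existing protocol; the random bytes below are only stand-ins for real public keys; and `key_fingerprint` and `identity` are hypothetical helpers. The identifier is namespaced by a fingerprint of the provider's key rather than by its domain name, so whoever re-registers the domain cannot reproduce it without the original key.

```python
import hashlib
import secrets


def key_fingerprint(public_key_bytes):
    # Hash of the provider's public key; this is the namespace for its users.
    return hashlib.sha256(public_key_bytes).hexdigest()


def identity(local_id, provider_public_key):
    # The namespace is the provider's key, not its domain name.
    return f"{local_id}@{key_fingerprint(provider_public_key)}"


original_provider_key = secrets.token_bytes(32)  # stand-in for a real public key
new_domain_owner_key = secrets.token_bytes(32)   # whoever buys the lapsed domain later

mine = identity("12345", original_provider_key)
theirs = identity("12345", new_domain_owner_key)
```

Here `mine` and `theirs` differ even though the local id and (hypothetically) the domain are the same. A real system would also need the provider to sign its assertions with the matching private key, which is left out here.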

Secondly, your fundamental disagreement is about the ramifications of a bullet point I never agreed with (as opposed to, seemingly, about whether we see eye-to-eye on that bullet point): I do not believe that identity identifiers must be memorizable by users, I believe that only the presentation identity must be memorizable by users.

To demonstrate the difference, I am 116098411511850876544@google and 3614794@facebook. However, despite the fact that I log in to numerous external websites using these authentication providers, I have only at occasional times remembered the shorter Facebook number, and I have never remembered (nor do I believe I would ever remember) that Google number.

Yet, all I need to log in is my e-mail address. Yes: I end up using my e-mail address to log in. However, the concept of my identity is not stored via that e-mail address: that's just one of the many things associated with my account at the authentication provider, and it will change over time. I could use any other identifier, including crazy ones like "what were the last three things you did on this website, and give us a list of at least five of your friends" (which is similar to something Skype once used to verify that I was the owner of my account when they thought it had been hacked by someone trying to steal my credits).

In fact, despite being one of those weird tech people who have actively avoided changing my e-mail address nearly ever, I actually have changed my Facebook e-mail, as originally Facebook was for students only. Therefore, my account was originally connected with saurik@umail.ucsb.edu.

However, I later changed my account to be attached to saurik@saurik.com, which I consider to be "mine". Then, at some point later, long after I graduated, UCSB removed my access to my old student accounts, and I'm pretty certain will happily give them to others. Of course, this means that for any website that considers e-mail addresses to be related to security or where it can be considered a canonical store of identity (although I maintain that this second category is actually very rare: only Persona seems to make this mistake) that I may have forgotten to go find and update, I'm now screwed: someone else can access my account.

Now, when I log in with Facebook to third parties that I previously logged in to, they still work. They still work, because Facebook, unlike Persona, did not ever use my e-mail address as the canonical definition of my identity. None of the tons of websites out there that I've ever used Facebook with ever even saw saurik@umail.ucsb.edu, and if they did it was metadata associated with my account: it wasn't "my account".

This is a critical detail. The difference is enormous, and is especially noticeable due to the way Persona is designed, where you don't even really need an active third-party to manage your identity, and are instead supposed to be able to just magically use any e-mail address: some e-mail addresses would suck even worse than others (example being my UCSB address), but any e-mail address is something that is volatile.

You need that middle-man, even when you are federated such as with OpenID (or via OAuth with random APIs for identifiers), in order to isolate the volatility of e-mail addresses from the people handling federated login. Otherwise, you end up in federated hell, where every time a user changes their e-mail address instead of contacting one party, they have to contact every party, defeating the entire point of single sign-on.
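
From the consuming website's side, that isolation can be sketched like this. It is a simplified illustration: `RelyingParty`, the shape of the `assertion` dict, and the provider name `idp.example` are assumptions of mine, not any real provider's API. The local account is keyed by the (provider, opaque subject) pair, and the e-mail is just metadata that gets refreshed on each login.

```python
class RelyingParty:
    """Toy consuming site: accounts keyed by (provider, subject), not e-mail."""

    def __init__(self):
        self._accounts = {}  # (provider, subject) -> local account data

    def login(self, assertion):
        key = (assertion["provider"], assertion["subject"])
        account = self._accounts.setdefault(key, {"orders": []})
        account["email"] = assertion.get("email")  # refreshed metadata, not the key
        return account


site = RelyingParty()
first = site.login(
    {"provider": "idp.example", "subject": "1001", "email": "old@school.example"}
)
first["orders"].append("order-1")

# Same person, new e-mail address: same subject, so the same local account.
later = site.login(
    {"provider": "idp.example", "subject": "1001", "email": "new@personal.example"}
)

# Someone else who was later given the old address: different subject, new account.
stranger = site.login(
    {"provider": "idp.example", "subject": "2002", "email": "old@school.example"}
)
```

The user told one party (the provider) about the new address, and the site neither noticed nor cared; the person who inherited the old address gets an empty account rather than someone else's order history.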

To take a concrete example of this, I received a support e-mail from someone saying that they are changing their e-mail address redacted12323@gmail.com (yes, with a random 5 digit number at the end) to redactedqredacted93@gmail.com. As Google internally does not understand this very argument (that e-mails are something people want to be volatile) and refuses to let you attach or move around Gmail accounts to Google account identities, this means that their identity on my site changed, and I had to reassign them. I did this, and seriously two days later (today) I received an e-mail saying that they have lost access to redactedqredacted93@gmail.com and now want it to be redactedqredacted94@gmail.com.

In contrast, when a user changes their Facebook username (which is equivalent to changing their Google username, and changes their @facebook.com e-mail address similarly to their @gmail.com e-mail address), they don't need to even think about how this affects other websites, as Facebook doesn't map e-mail addresses back to the identifiers that they return: they return an opaque number. This is glorious: they get it; they make the value of federated login actually be a benefit for users and for websites. Some entities in the system (and not just one: hence how Facebook, Yahoo, and Google can all provide this service) can thereby specialize in maintaining and controlling identity, and other entities in the system can spend all of their time consuming identity.

This means that the user has to do less work, and the website has to do less work: win/win. In your scenario, where the e-mail address is canonical, the user has to go to every single website and update their identity, and the websites have to provide systems to let users do this (which isn't even something that cleanly automates given that almost every time people bring this kind of issue up to me, it is because they no longer have access to the old account, and thereby can't use it to authenticate at my site anymore).

At this point, you might now argue that "well, that's OpenID, and OpenID failed". First, it isn't clear why it failed: Yahoo claims from user studies that it failed because the UI for it was so horrible, and that made a lot of sense to me. They even recommended a potential solution to that problem, and it's a solution that I don't think is inherently worse for anyone than Persona. It certainly isn't clear, however, that Persona makes any sense. I in fact just tried to use Persona, and I found it horribly confusing; and in the end actually haven't succeeded in logging in to the website in question (at some point in the flow I seem to have caused a persistent error that now won't clear). It certainly was more complex than using "classic" federated authentication, and it seemed like that complexity was directly related to the website in question having to go further out of their way than they did with previous solutions to abstract the identification tokens (my e-mail address) from my account.

Finally, my point about the tens of millions of users was not in regards to an appeal to authority: it was a defense to the actively condescending tone of your "sorry", which I found to be downright belittling and inappropriately unrelated to my arguments (hence the "look" at the beginning of that clause). That "sorry" you added is a rhetorical construct that does nothing other than to annoy your opponent (hence my repeating it back to you in quotation marks for emphasis to demonstrate that "I noticed"). Think of it more like you patting my head and going "kid, I don't think you understand" in an attempt to turn my youth into an argument point, and me feeling the need to say "look, I'm over 30, I'm not a kid anymore". I found that "sorry" so bothersome that I will be honest: I was downright shocked when you were willing to even go down a point-by-point argument rather than just turning more rhetoric back at me; I was absolutely certain you were just going to flip me off again in your reply to my further-argued comments, although I guess I sort of feel you did anyway by calling me "vain"... :( I am not certain why you feel like belittling and insulting me directly (as opposed to my arguments) is a logical argument path :(.

sergiosgc · on March 23, 2013

The "Sorry" had no ill intention. My apologies if I conveyed the wrong meaning.

Anyhow, I now fully understand your position. I still maintain my point, and further discussion would only be useful if accompanied by beer. All in all, it's the ever present tug between theoretic perfection and real world constraints. It's doubtful this back and forth would ever reach an acceptable compromise.