Originally posted on Analyticbridge.
Here are two multiple-choice questions that could be used to uniquely characterize every human who will ever exist on Earth. Even twins would give different answers; the expectation is that no two human beings would have the same answers.
First question: Order the following types of food, from your favorite (#1) to the one you like least (#9). Possible choices: fruit, vegetable, dairy, carbohydrate, red meat, poultry, fish, seafood, dessert.
Second question: Order the following types of environment, from your favorite (#1) to the one you like least (#9). Possible choices: beach, mountain, desert, plain / rural, urban, small town, lake / river bank, hills, forest.
The number of potential answers (that is, the number of potential orderings) for each question is 9 factorial (9! = 362,880). The total number of potential answers for both questions is the square of 9 factorial, (9!)^2 = 131,681,894,400, or about 132 billion.
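The count above, and the idea that every pair of answers maps to its own ID, can be checked with a short Python sketch. The post does not say how answers would be turned into a number; the sketch assumes a Lehmer-code (permutation-rank) encoding, and `permutation_rank` and `combined_id` are hypothetical helpers introduced only for illustration.

```python
from math import factorial

# Choice lists copied from the two questions, in the order they are listed there.
FOODS = ["fruit", "vegetable", "dairy", "carbohydrate", "red meat",
         "poultry", "fish", "seafood", "dessert"]
ENVIRONMENTS = ["beach", "mountain", "desert", "plain / rural", "urban",
                "small town", "lake / river bank", "hills", "forest"]


def permutation_rank(ranking, choices):
    """Lehmer-code rank of `ranking` (favorite first) among all orderings of `choices`.

    Returns an integer in [0, n!); distinct orderings always get distinct ranks.
    """
    if sorted(ranking) != sorted(choices):
        raise ValueError("ranking must be a permutation of the choices")
    remaining = list(choices)
    rank = 0
    for position, item in enumerate(ranking):
        index = remaining.index(item)  # unused choices listed before this item
        rank += index * factorial(len(choices) - 1 - position)
        remaining.pop(index)
    return rank


def combined_id(food_ranking, env_ranking):
    """Hypothetical packing of both answers into one integer in [0, (9!)**2)."""
    return (permutation_rank(food_ranking, FOODS) * factorial(len(ENVIRONMENTS))
            + permutation_rank(env_ranking, ENVIRONMENTS))


print(factorial(9))       # 362880 orderings per question
print(factorial(9) ** 2)  # 131681894400 orderings for both questions
```

With this encoding, ranking both lists exactly in the order printed in the questions gives ID 0, and ranking both in reverse gives (9!)^2 - 1, the largest possible ID.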
Of course, some combinations are more likely to appear than others; some people will have a hard time ranking and would rather allow ties; and if you have lived your whole life in the same place eating the same food, you cannot answer these questions meaningfully. The same is true of young children. But for most of us, this works, and it could even be used by companies such as match.com or by advertisers. This type of ID also has the following advantages:
- It is universal (it could even apply to dogs)
- It is personal, unlike arbitrary Social Security numbers
- You know what is in your ID (government IDs such as SSNs might hide encoded data about you for profiling purposes)
- It is easy to recover if lost (at least partially, which might be good enough) by answering the two questions again
- Unlike the genome, this ID is (to a large extent) independent of gender and race (or age)
It may change over time as tastes change, but I think this is acceptable: your ID follows your personality. You might want to add a third question (perhaps about favorite colors or climates) to increase the discriminating power, but I do not think it is necessary.
Another option is to ask more questions with fewer choices each. For instance, 8 questions with 4 choices each (rather than 2 questions with 9 choices each) would allow for roughly the same number of unique IDs ((4!)^8 = 24^8, a bit above 100 billion) but would be less error-prone, as people are more likely to remember correctly how they rank 4 items (e.g. colors) than 9 items. If you allow only 2 choices per question, you would need 37 questions to cover 100+ billion unique IDs, since 2^37 is about 137 billion while 2^36 is only about 69 billion.
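The arithmetic behind these trade-offs is easy to reproduce. This is a minimal Python sketch, assuming every question is answered as a full ranking with no ties; `id_space` and `questions_needed` are hypothetical helper names.

```python
from math import factorial


def id_space(num_questions, choices_per_question):
    """Number of distinct answer sets when every question is a full ranking."""
    return factorial(choices_per_question) ** num_questions


def questions_needed(choices_per_question, target):
    """Smallest number of questions whose combined orderings reach `target`."""
    per_question = factorial(choices_per_question)
    if per_question < 2:
        raise ValueError("need at least 2 choices per question")
    count, total = 0, 1
    while total < target:
        total *= per_question
        count += 1
    return count


print(id_space(2, 9))                        # 131681894400 (2 questions x 9 choices)
print(id_space(8, 4))                        # 110075314176 (8 questions x 4 choices)
print(questions_needed(2, 100_000_000_000))  # 37, since 2**37 = 137438953472
```

Changing the arguments makes it easy to compare other designs, such as adding a third 9-choice question, before running any survey.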
Experimental design to choose good questions and good choices
The possible choices (answers) should be determined through experimental design and testing, not picked first and left untested. Say your first question is about food, with two choices: fish versus dirt. You run a test and find that everybody ranks fish as #1. The test tells you that this is not a good question: lots of people will end up with the same ID. You change the choices from fish/dirt to fish/meat, and now the distribution is more uniform. You keep testing until you have something good enough.
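One way to quantify "the distribution is more uniform" is to compute the normalized Shannon entropy of the answers, and to measure how many test respondents share an ID. The post does not prescribe a metric, so entropy is an assumption here; the function names and the sample answers are hypothetical and simply mirror the fish/dirt and fish/meat thought experiment, not real test data.

```python
from collections import Counter
from math import log


def normalized_entropy(answers, num_possible):
    """Shannon entropy of the observed answers divided by log(num_possible).

    Returns 1.0 when answers are spread evenly over all possible values and
    0.0 when every respondent gave the same answer.
    """
    if num_possible < 2:
        raise ValueError("need at least 2 possible answers")
    counts = Counter(answers)
    n = len(answers)
    entropy = sum((c / n) * log(n / c) for c in counts.values())
    return entropy / log(num_possible)


def duplicate_share(ids):
    """Fraction of respondents whose ID is shared with at least one other respondent."""
    counts = Counter(ids)
    return sum(c for c in counts.values() if c > 1) / len(ids)


# Hypothetical #1 picks from 10 respondents for each version of the question.
fish_vs_dirt = ["fish"] * 10
fish_vs_meat = ["fish"] * 5 + ["meat"] * 5

print(normalized_entropy(fish_vs_dirt, 2))  # 0.0: everyone answers alike
print(normalized_entropy(fish_vs_meat, 2))  # 1.0: perfectly balanced
```

A question whose score stays near 0 is a candidate for new choices; a rising `duplicate_share` on a larger test sample signals the same problem at the level of full IDs.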
You can even test choice stability: ask people to rank the 9 choices today and again 7 days later, then retain the choices that
- are most stable over time and
- provide an even distribution (or as close as possible to a uniform distribution)
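The stability test could be scored in several ways. The Python sketch below assumes two simple measures, neither of which comes from the post: the Kendall tau distance between a person's two rankings (how many pairs of choices swapped order), and, for each choice, its average rank shift between the two sessions. All function names and sample answers are hypothetical.

```python
def kendall_tau_distance(ranking_a, ranking_b):
    """Number of choice pairs ordered differently in the two rankings (0 = identical)."""
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    items = list(ranking_a)
    disagreements = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            # items[i] is ahead of items[j] in ranking_a; is it behind in ranking_b?
            if pos_b[items[i]] > pos_b[items[j]]:
                disagreements += 1
    return disagreements


def mean_rank_shift(first_answers, second_answers, choice):
    """Average absolute change in the position of `choice` between test and retest."""
    shifts = [abs(a.index(choice) - b.index(choice))
              for a, b in zip(first_answers, second_answers)]
    return sum(shifts) / len(shifts)


def most_stable_choices(first_answers, second_answers, keep):
    """The `keep` choices whose rank moved least on average between sessions."""
    choices = first_answers[0]
    return sorted(choices,
                  key=lambda c: mean_rank_shift(first_answers, second_answers, c))[:keep]


# Hypothetical test/retest answers from one person, favorite first.
day_0 = ["beach", "forest", "mountain"]
day_7 = ["forest", "beach", "mountain"]
print(kendall_tau_distance(day_0, day_7))             # 1 (beach and forest swapped)
print(most_stable_choices([day_0], [day_7], keep=1))  # ['mountain']
```

This covers only the stability criterion; the uniformity criterion still needs its own measure, applied to the same test sample.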