“Ke$ha” is an Invalid Name

The video and blog post below highlight how difficult it is to ask users for input in a way that captures as much relevant data as possible (data utility) without making them feel marginalized (user experience).


Data utility involves asking the user to input personal information, such as name and gender. We then collect this data, analyze it, and use it to improve our product. However, systems are strictly logical: they interpret only what we tell them to interpret. Short of writing an unnecessarily long algorithm to handle every variation of user input, we have to accept that some inputs simply won’t be usable in our data analysis.

The video discusses one way to minimize this data loss: constraining the input, for example by limiting gender selection to male or female, or to a longer list of genders people may identify as. However, the more options the form includes, the harder it becomes for the user to sift through them, leading them to become frustrated or bored and ultimately noncompliant. A noncompliant user enters false data, which is either unusable or misleading. The latter is dangerous because it can skew the results of the larger data analysis.

Yet limiting the user’s selection to make data input quick, and thereby maximize compliance and valid inputs, marginalizes the many users who don’t identify with any of the listed options. They end up entering data that doesn’t represent them, or entering false data out of frustration or antagonism at being excluded.

Thus, to maximize accurate data collection, we need to find a balance between offering too many and too few input choices. By optimizing for compliance, we do miss out on some potential data. However, this is a net win, since more users enter data and, more importantly, the data they enter is accurate.

Therefore, when collecting data, we need to consider the user experience. Look at the form from the user’s point of view and ask whether filling it in is enjoyable, boring, frustrating, ostracizing, and so on. Offer choices or suggestions, but try not to marginalize anyone. Data from users who fit in no category may be useless to the larger study; however, that does not mean the input field or selection should exclude them. Excluding them would (1) dissuade them from using the service or, as mentioned above, (2) cause them to enter incorrect data. So instead of excluding them, let them enter their input, and accept that the data may be unusable. Better that they enter useless data than false and misleading data.
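
As a rough illustration of "let them make their input, and then let the data be useless," here is a minimal Python sketch. It is not taken from the video; the option list, the `GenderResponse` type, and both function names are hypothetical. The idea is that the form accepts whatever the user types, stores it as entered, and only the analysis step decides which responses fit a preset category.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical preset options offered as suggestions in the form.
PRESET_OPTIONS = {"female", "male", "non-binary"}


@dataclass
class GenderResponse:
    raw: str                 # what the user entered (whitespace trimmed), always stored
    category: Optional[str]  # matching preset option, or None if there is no match


def record_response(user_input: str) -> GenderResponse:
    """Accept any input; assign a category only when it matches a preset option."""
    cleaned = user_input.strip()
    normalized = cleaned.lower()
    category = normalized if normalized in PRESET_OPTIONS else None
    return GenderResponse(raw=cleaned, category=category)


def usable_for_analysis(responses: List[GenderResponse]) -> List[GenderResponse]:
    """Keep only responses that fit a preset category; the rest stay stored but unused."""
    return [r for r in responses if r.category is not None]
```

With this shape, nobody is rejected at the form: an answer that matches no preset is kept under `raw` and simply left out of the category-based analysis, instead of being forced into a bucket that misrepresents the user.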
