This was really interesting. I think it’s good to offer the public access to large data sets in some contexts. Context is important. Data isn’t impartial, and neither is the way it is collected, classified, distributed, or shared. Algorithms are biased because they’re written by humans.
I don’t know you, and I’m only now learning about the breadth of your work (which I’ll be reading through slowly), but I’m not making any assumptions about you, your motives, or your team.
So … I wonder at your decision to run with only one data set (Clinton’s emails) when Twitter offered data on BOTH presidential candidates. That would have let you test your tool in a different way, with less emotionally and historically freighted data that could be used to explore the use of language, the spread of misinformation, and the control of information and messaging. There is a lot to explore. If not Twitter, there are other data sets that encompass each candidate that could have been used.
I’d be tempted to call it naive, but it isn’t naivety: you’re a sophisticated, highly educated man who handles data of varying complexity every day. I don’t think it’s overt bias, but I wonder at your lack of thought and consideration for the political climate: the GOP’s 30-year character assassination of HRC, the incessant investigations of her that have proven nothing, and now the fight to keep this country out of authoritarian dictatorship. (No hyperbole, given what we’ve learned about the activist pro-Trump FBI agents in NYC and WDC and likely elsewhere.)
Lastly (and I don’t mean to impugn you, but I am curious), I wonder what kind of ethical rigor you and your students apply to ideas when it comes to the manipulation of data and its effect on, and use by, humans.