I wanted to include in my initial analysis:
- Shannon entropy calculation (just for fun)
- POS tagging and comparing sentence structures. I also developed my own POS tagger and sentence-structure parser for French, first based on Markovian improbability calculations (the same method used in probabilistic rainbow tables for breaking password hashes: https://www.sstic.org/media/SSTIC2011/SSTIC-actes/rainbow_tables_probabilistes/SSTIC2011-Slides-rainbow_tables_probabilistes-schneider.pdf), and now based on recurrent neural networks. I know of no public POS tagger for French. It is also now part of an AI capable of breaking sentences down into actions, and of manipulating and learning abstract objects.
- co-occurrence matrices or some kind of word2vec (maybe GloVe, as I liked the concept and always wanted to use it)
- an analysis of each speaker's community on social media (via a hashtag), with a TF-IDF analysis of tweets and the k-means algorithm for clustering them. A screenshot of the visualisations I developed with d3.js: https://gyazo.com/39333aed5484a55af152ac2b96b8fa5d
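For the curious, the Shannon entropy calculation mentioned in the first bullet fits in a few lines of Python. This is a minimal character-level sketch (word-level works the same way, just tokenize first); it is an illustration, not the exact code behind the article:

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A uniform alphabet of 4 symbols carries exactly 2 bits per symbol.
print(shannon_entropy("abcd"))  # 2.0
```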
There are two things that, in my opinion, you failed to understand:
- Your mistake is a mind projection fallacy: https://en.wikipedia.org/wiki/Mind_projection_fallacy
The only thing you saw in this post was its success, which upset you because of what you call a misnomer of a title. Maybe I left open doors and shortcuts in the analysis because I did not expect it to be this successful. Don't you think that is plausible? This is also how you thought you could evaluate my skills, which are just fine, by the way. While you judge a person by a single piece of content (which is also the first article I ever wrote), I still hope (and think) that people don't judge you on one thing alone.
- When I was developing my own sentiment analysis API, I realized that the simplest features accounted for the vast, VAST majority of the success rate. Details and fine-tuning came later, for example when I experimented with POS tagging, which "only" added 1% to the success rate (and even that is huge). This article is not meant to be a complete, rigorous analysis of the debate; it has to be read the way you read any 7-minute Medium article, which is what it is. Also, complexity does not mean truth. Gratuitous complexity is the reflex of anyone eager to show how smart he is, and it always fails to impress. This analysis does reveal interesting things; you just "wish" that it did not. Or at least, that is what I get from reading your comment. Comparing this debate with exotic languages is not just beside the point, it is plain stupid. Vocabulary size, personal word use, sentence length: these features account for the majority of what you perceive as a listener, at least in the context I wrote about in this article. This is proven time and time again when other features are introduced into machine learning algorithms.
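Those "simplest features" (vocabulary size, sentence length, and the like) really are a few lines of code each. A minimal sketch with naive tokenization, as an illustration only and not the API's actual feature set:

```python
import re

def simple_features(text):
    """Vocabulary size, average sentence length, and type-token ratio."""
    # Naive splitting: sentences on terminal punctuation, words on letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "vocab_size": len(set(words)),
        "avg_sentence_len": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
    }

print(simple_features("I know words. I have the best words."))
```

Features like these are cheap to compute, robust to noise, and usually dominate the accuracy of a first classifier; that is the point being made above.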
If you failed to see the tongue-in-cheek and provocative aspect of this article, you failed to understand that everything you read on the subject (politics, Trump, Clinton) will be biased, not perfectly scientific. I thought people would understand that before making sweeping and mean judgements about people's skills or about the article. My guess was wrong. Luckily, the majority of readers grasped it.
With that in mind, re-read your own words:
“I’m sure you have some skill, some knowledge, some thing that you can do that is of value to someone. You don’t need to lie and claim that you are anything other than what you are. It is a form of identity theft, a form of fraud when you claim credentials you do not possess.”
No analysis is needed to see that the first sentence is pure, nauseating condescension. The second is utterly false: I did not claim anything; you just imagined I had. The third sentence only shows how badly you misunderstood the context of this article. Your words, sir, are exaggerated. This is a Medium article, not a university lecture. I never claimed to be giving a lecture. If you think that, read again. If you still think that, read again.
By the way, I can link to wiki articles too:
LDA and hidden Markov models, as listed in the wiki article, are part of my daily routine. In conclusion, I feel no shame in putting a simple "Semantics" in the title of my article.