This project (Link to GitHub Repo) is a tweet auto-completer for members of Congress. I used the Twitter API and the DocNow Hydrator to build a custom dataset, trained a custom word2vec embedding on it with the Gensim library, and finally used a Keras LSTM model to auto-complete tweets.
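To give a feel for the auto-completion step, here is a minimal sketch of how tweets could be turned into (context, next word) training pairs for a next-word model such as an LSTM. The tokenizer, the `window` size, the sample tweet, and the function names are all assumptions made for illustration; this is not the code from the repo.

```python
import re

def tokenize(tweet):
    """Lowercase a tweet and split it into word, hashtag, and mention tokens."""
    # Deliberately simplistic: URLs and contractions are split into pieces.
    return re.findall(r"[#@]?\w+", tweet.lower())

def next_word_pairs(tokens, window=3):
    """Yield (context, target) pairs: `window` tokens followed by the next token."""
    for i in range(len(tokens) - window):
        yield tokens[i:i + window], tokens[i + window]

# Hypothetical example tweet, not taken from the dataset.
tweets = ["Proud to support this bipartisan bill today"]
pairs = [p for t in tweets for p in next_word_pairs(tokenize(t))]

for context, target in pairs:
    print(context, "->", target)
```

In a real pipeline each token would then be mapped to its word2vec vector (or an integer index) before being fed to the model.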
I used subsets of the following datasets from George Washington University stored on Harvard Dataverse:
- 115th US Congress Tweet Ids
- U.S. Government Tweet Ids
- 2016 United States Presidential Election Tweet Ids
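These datasets are distributed as lists of tweet IDs, so the full tweets have to be "hydrated" through the Twitter API. The Hydrator takes care of that, but the underlying idea is roughly the batching shown below. This is only a sketch: it assumes one ID per line and a batch size of 100 (the per-request limit of Twitter's v1.1 `statuses/lookup` endpoint), the helper names are hypothetical, the IDs are made up, and no network request is made.

```python
def read_ids(lines):
    """Return the non-empty, stripped tweet IDs from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]

def batches(ids, size=100):
    """Split a list of IDs into consecutive chunks of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

# Made-up IDs standing in for the contents of a tweet ID file.
sample_lines = [f"{n}\n" for n in range(1000, 1250)]
id_batches = batches(read_ids(sample_lines))

print([len(b) for b in id_batches])
```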
Sample output: the ten closest words to the word ‘simple’ in the generated embedding space.
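"Closest" here means highest cosine similarity, which is the measure Gensim's `most_similar` ranks by. The sketch below reimplements that ranking in plain Python on a tiny set of made-up two-dimensional vectors, just to show the idea; the vectors and the helper names are assumptions, not values from the trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(word, vectors, topn=10):
    """Return the `topn` other words ranked by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Toy embedding with invented values, for illustration only.
toy_vectors = {
    "simple": [1.0, 0.0],
    "easy": [0.9, 0.1],
    "complex": [-1.0, 0.0],
    "tax": [0.0, 1.0],
}
neighbours = most_similar("simple", toy_vectors, topn=2)
print(neighbours)
```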
Two-dimensional representation of the embedding space (dataset of ~50,000 tweets).
Two-dimensional representation of the embedding space (dataset of ~1.5 million tweets).
Written on April 15, 2018