Clara Currier
Aug 8, 2017

This is a fascinating idea, and one that makes a bit of intuitive sense when you think about it: word2vec is about reconstructing semantics, not words. As long as a dataset has an internal semantic structure, it shouldn’t matter whether the data is conversational English, conceptual categories, or even numeric gibberish. I now want to see if I can reproduce this using long hexadecimal values found in packets. I’ve wondered for a while whether there could be semantic structure in a packet at the byte level that could be leveraged for anomaly detection or classification. This post seems to be the answer. Thanks!
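Here is a minimal sketch of the preprocessing I have in mind. It assumes each byte (or fixed-width group of bytes) of a packet payload is treated as one hex "word" and the neighbouring bytes as its context. `packet_to_tokens` and `skipgram_pairs` are hypothetical helpers, not something from the post. The (center, context) pairs are the kind of training input a skip-gram word2vec model learns from.

```python
from typing import List, Tuple


def packet_to_tokens(payload: bytes, width: int = 1) -> List[str]:
    """Split a packet payload into hex 'words' of `width` bytes each.

    A trailing chunk shorter than `width` is kept as-is.
    """
    return [payload[i:i + width].hex() for i in range(0, len(payload), width)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Emit (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample bytes, not real captured traffic.
    payload = bytes([0x45, 0x00, 0x00, 0x3C])
    tokens = packet_to_tokens(payload)
    print(tokens)  # ['45', '00', '00', '3c']
    print(skipgram_pairs(tokens, window=1))
```

Whether single bytes or wider groups make better "words" is an open question. Field boundaries in most protocol headers are not one byte wide, so `width` is worth experimenting with.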