On Data Science / Machine Learning Workflow
Analysis Workflow Template v1.9 (Data Science in Python)
In the interest of moving towards a discussion of best practice workflow for analyzing data:
An interesting point about the relationship between stop words, word frequency, document frequency, and the high and low document-frequency cut-offs comes from the sklearn parameter descriptions for the two classes TfidfVectorizer and CountVectorizer (more details below).
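To make the cut-off idea concrete, here is a minimal pure-Python sketch of document-frequency filtering. It is loosely modeled on the `max_df` and `min_df` parameters of those sklearn classes, whose documentation describes terms above the upper threshold as corpus-specific stop words. The function names `document_frequency` and `filter_vocabulary`, the whitespace tokenization, and the toy corpus are all assumptions made for illustration; this is not sklearn's implementation.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        # set() ensures a term is counted once per document, not once per occurrence.
        df.update(set(doc.lower().split()))
    return df


def filter_vocabulary(docs, min_df=1, max_df=1.0):
    """Split terms into (kept, dropped) by document frequency.

    An int threshold is an absolute document count; a float is a
    proportion of the corpus. Terms are kept when low <= df <= high.
    """
    n_docs = len(docs)
    low = min_df * n_docs if isinstance(min_df, float) else min_df
    high = max_df * n_docs if isinstance(max_df, float) else max_df
    df = document_frequency(docs)
    kept = sorted(t for t, c in df.items() if low <= c <= high)
    dropped = sorted(t for t, c in df.items() if not (low <= c <= high))
    return kept, dropped


# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
    "the dog barked",
]

# Drop terms found in fewer than 2 documents or in more than 75% of them.
kept, dropped = filter_vocabulary(docs, min_df=2, max_df=0.75)
print("kept:", kept)
print("dropped:", dropped)
```

In this toy corpus "the" appears in every document, so the upper cut-off removes it without any hand-written stop word list, while the lower cut-off removes terms that occur in only one document.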
In the past year, I had the opportunity to work with a cross-functional team of coders and fellow data scientists on a project for C4ADS: The Center for Advanced Defense Studies. Because of privacy and NDA requirements, I will be somewhat vague about the project, but here…