Should ML Topic Modeling replace your rules-based analysis?
We deal with large datasets at work, and perhaps you do too. To speed up that work, we naturally look for ways to leverage our computers' processing power.
In a recent project, I was working with a large dataset composed of user-generated text. My goal was to extract the main topics discussed by survey participants. Because the volume of data was too large to process manually, I considered different options.
As I started the project, I faced the following methodological decisions.
These key questions allowed me to select the right approach for our project:
- A first round of analysis using rule-based coding to produce the primary findings
- A second round of analysis using machine learning to augment those findings
By combining these two methodologies, we were able to meet the following criteria:
- The computer-aided interpretation accurately depicted participants’ voices, with reduced researcher bias
- The subjectivity and variation across participants were aptly captured
- The codes could be reused and scaled for future analysis, as this is a recurring project
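A first-round, rule-based pass can be sketched in a few lines of Python. This is a minimal sketch, assuming a simple keyword codebook applied with regular expressions; the codes, keywords, and the `code_response` / `code_frequencies` helpers are hypothetical illustrations, not the codebook used in this project. Keeping the codebook as plain data is one way to make codes reusable across recurring survey waves.

```python
import re
from collections import Counter

# Hypothetical codebook: each code maps to keyword patterns.
# These codes and keywords are illustrative assumptions only.
CODEBOOK = {
    "pricing": [r"\bprice[sd]?\b", r"\bexpensive\b", r"\bcost(s|ly)?\b"],
    "support": [r"\bsupport\b", r"\bhelp(ful)?\b", r"\bagents?\b"],
    "usability": [r"\beasy\b", r"\bconfusing\b", r"\binterface\b"],
}

# Compile the patterns once so the same codebook can be reused
# across many responses (and across future survey waves).
COMPILED = {
    code: [re.compile(p, re.IGNORECASE) for p in patterns]
    for code, patterns in CODEBOOK.items()
}


def code_response(text: str) -> set[str]:
    """Return every code whose keyword patterns match a single response."""
    return {
        code
        for code, patterns in COMPILED.items()
        if any(p.search(text) for p in patterns)
    }


def code_frequencies(responses: list[str]) -> Counter:
    """Count how many responses were tagged with each code."""
    counts = Counter()
    for text in responses:
        counts.update(code_response(text))
    return counts


if __name__ == "__main__":
    responses = [
        "The price is too expensive for what you get.",
        "Support agents were helpful and quick.",
        "The interface is confusing, and it costs too much.",
        "No comment.",
    ]
    print(code_frequencies(responses))
```

A response can receive several codes or none. Keyword rules like these are transparent and easy to audit, but they only surface the topics someone anticipated when writing the rules.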