Essential Tricks for Using BERTopic You Should Know
Over the past year, we’ve used BERTopic extensively to process a large amount of text data. This experience has given us deeper insight into user feedback, reviews, and interview content, allowing us to organize it systematically and spend far less time on text mining. Moreover, the resulting topics have become an important variable in our quantitative analysis.
Although the documentation for BERTopic is quite comprehensive, there are a few key points we believe need to be highlighted and shared:
- Start from the Best Practices: The worked example in the Best Practices section of the documentation is a good template to adapt before building your own pipeline.
- Language Segregation or Translation: Separating different languages or performing translations beforehand can facilitate more accurate topic discovery.
- Exploring Beyond Sentence Transformer Embeddings: While sentence transformer embeddings are standard, experimenting with other pre-trained models can yield surprisingly effective results.
- Managing the Unclassifiable ‘-1’ Topic: When many documents end up in the ‘-1’ (outlier) topic and cannot be classified, remember to use outlier reduction (`reduce_outliers`) and subsequently update the topic model (`update_topics`).
- Tweaking UMAP/HDBSCAN Parameters: When reducing dimensions and clustering, adjusting the UMAP and HDBSCAN parameters is critical, because they determine how many topics you get and how many documents are left as outliers.
- Word Segmentation in Chinese Topics: Chinese text requires word segmentation to build the topic representations (the bag-of-words step), but it’s not always necessary for the embedding step.
- Leveraging LLM & Generative AI Models: Representation models based on LLMs and generative AI often produce much better topic descriptions than the default one, so don’t forget to experiment with them.
- Hierarchical Clustering Visuals: The built-in hierarchical clustering graphs are not very appealing, so consider using tools like ‘ggraph’ in R for better visualization.
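To illustrate the language segregation point, here is a minimal sketch that splits a mixed corpus before topic modeling. The functions `guess_language` and `split_by_language`, the 0.3 threshold, and the character-range heuristic are all assumptions made for illustration; a dedicated language detection library would be more reliable in practice.

```python
import re
from collections import defaultdict

# Crude heuristic (an assumption for this sketch): a document is treated as
# Chinese when at least `threshold` of its letters are CJK ideographs.
_CJK = re.compile(r"[\u4e00-\u9fff]")
_LETTER = re.compile(r"[^\W\d_]")  # any Unicode letter


def guess_language(text, threshold=0.3):
    """Return 'zh', 'other', or 'unknown' for a single document."""
    letters = _LETTER.findall(text)
    if not letters:
        return "unknown"
    cjk_count = sum(1 for ch in letters if _CJK.match(ch))
    return "zh" if cjk_count / len(letters) >= threshold else "other"


def split_by_language(docs):
    """Group documents by guessed language so each group gets its own model."""
    groups = defaultdict(list)
    for doc in docs:
        groups[guess_language(doc)].append(doc)
    return dict(groups)
```

Each group can then be passed to its own topic model, or translated into a common language first.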
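The points about UMAP/HDBSCAN parameters, representation models, and the ‘-1’ topic can be combined in one sketch. The parameter values, the embedding model name `all-MiniLM-L6-v2`, and the helper names `build_topic_model` and `fit_and_reassign_outliers` are illustrative assumptions rather than settings from our projects; treat them as starting points to tune on your own data.

```python
# Settings are kept as plain dicts so they are easy to log and to sweep.
# All values are illustrative starting points, not recommendations.
UMAP_PARAMS = {
    "n_neighbors": 15,
    "n_components": 5,
    "min_dist": 0.0,
    "metric": "cosine",
    "random_state": 42,  # fixes UMAP's randomness so runs are repeatable
}
HDBSCAN_PARAMS = {
    "min_cluster_size": 50,  # larger values tend to give fewer, broader topics
    "metric": "euclidean",
    "cluster_selection_method": "eom",
    "prediction_data": True,
}


def build_topic_model(embedding_name="all-MiniLM-L6-v2"):
    """Assemble a BERTopic model with explicit sub-models."""
    # Imports are deferred so the settings above can be inspected
    # without the heavy dependencies installed.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer(embedding_name),
        umap_model=UMAP(**UMAP_PARAMS),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS),
        representation_model=KeyBERTInspired(),
    )


def fit_and_reassign_outliers(docs):
    """Fit the model, then move '-1' documents into their closest topics."""
    topic_model = build_topic_model()
    topics, _ = topic_model.fit_transform(docs)
    # Reassign outlier documents, then recompute the topic representations
    # from the new assignments.
    new_topics = topic_model.reduce_outliers(docs, topics)
    topic_model.update_topics(docs, topics=new_topics)
    return topic_model, new_topics
```

Swapping `KeyBERTInspired()` for another representation model, or the embedding name for a different pre-trained model, only changes one line, which makes side-by-side comparisons cheap.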
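For the Chinese word segmentation point, one possible setup is to segment only in the vectorizer and leave the embedding step untouched. This sketch assumes the `jieba` package for segmentation; the helper names `keep_word_tokens` and `build_chinese_vectorizer` and the punctuation filter are hypothetical additions for illustration.

```python
def keep_word_tokens(tokens):
    """Drop whitespace-only and punctuation-only tokens from a segmented text."""
    return [tok for tok in tokens if any(ch.isalnum() for ch in tok)]


def build_chinese_vectorizer():
    """Return a CountVectorizer that segments Chinese text with jieba."""
    # Imports are deferred so the filter above works without these packages.
    import jieba
    from sklearn.feature_extraction.text import CountVectorizer

    def tokenize_zh(text):
        # jieba.lcut returns the segmented words as a list of strings
        return keep_word_tokens(jieba.lcut(text))

    return CountVectorizer(tokenizer=tokenize_zh)
```

The vectorizer is then passed to the model as `BERTopic(vectorizer_model=build_chinese_vectorizer())`, while the raw, unsegmented documents still go to the embedding model.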
In conclusion, these are not mere observations but practical tips that can change how you use BERTopic for text analysis. Whether you are a seasoned data analyst or just starting out, these tricks will help you use BERTopic more effectively.