Essential Tricks for Using BERTopic You Should Know

ChunYu Ko
The whispers of a data analyst
2 min read · Jan 8, 2024

Over the past year, we’ve used BERTopic extensively to process large volumes of text. It has given us deeper insight into user feedback, reviews, and interview content, let us organize that material systematically, and significantly reduced the time we spend on text mining. The resulting topics have also become a crucial variable in our quantitative analysis.

Although the documentation for BERTopic is quite comprehensive, there are a few key points I believe need to be highlighted and shared:

  1. Prioritize and Implement Best Practices: Start from the Best Practices page in the BERTopic documentation and adapt its worked example to your own data before customizing further.
  2. Language Segregation or Translation: Separating different languages or performing translations beforehand can facilitate more accurate topic discovery.
  3. Exploring Beyond Sentence Transformer Embeddings: While sentence-transformers embeddings are the default, experimenting with other pre-trained embedding models can yield surprisingly effective results.
  4. Managing the Unclassifiable ‘-1’ Topic: When many documents land in the ‘-1’ outlier topic and cannot be classified, remember to apply outlier reduction (reduce_outliers) and then update the topic model (update_topics) with the new assignments.
  5. Tweaking UMAP/HDBSCAN Parameters: When reducing dimensions and clustering, adjusting the UMAP and HDBSCAN parameters (for example, UMAP’s n_neighbors and HDBSCAN’s min_cluster_size) is critical to model performance.
  6. Word Segmentation in Chinese Topics: Chinese text needs word segmentation to produce readable topic keywords, but segmentation is not always necessary for the embedding step.
  7. Leveraging LLM & Generative AI Models: Representation models based on LLMs and generative AI often produce much better topic descriptions than the default ones, so don’t forget to experiment with them.
  8. Hierarchical Clustering Visuals: The built-in hierarchical clustering graphs are not very appealing; consider using tools like ‘ggraph’ in R for better visualization.
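
Tips 2, 4, 5, and 6 can be sketched together in Python. This is a minimal sketch under stated assumptions, not a tuned recipe. The helper names (dominant_script, split_by_script, fit_topics) are hypothetical, the script check is only a crude stand-in for proper language detection, and the UMAP/HDBSCAN values are illustrative starting points rather than recommendations from this post.

```python
import unicodedata
from collections import defaultdict


def dominant_script(text):
    """Crude language bucket: 'cjk' if most letters are CJK ideographs, else 'other'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "other"
    cjk = sum(1 for ch in letters if "CJK" in unicodedata.name(ch, ""))
    return "cjk" if cjk * 2 > len(letters) else "other"


def split_by_script(docs):
    """Tip 2: group documents so each language is modeled separately."""
    groups = defaultdict(list)
    for doc in docs:
        groups[dominant_script(doc)].append(doc)
    return dict(groups)


def fit_topics(docs, tokenizer=None, min_cluster_size=15):
    """Fit BERTopic with explicit UMAP/HDBSCAN models, then reassign '-1' documents."""
    # Third-party imports stay local so the helpers above run without them.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    # Tip 5: pass your own models instead of relying on the defaults.
    # These values are assumptions to tune against your data.
    umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0,
                      metric="cosine", random_state=42)
    hdbscan_model = HDBSCAN(min_cluster_size=min_cluster_size, metric="euclidean",
                            cluster_selection_method="eom", prediction_data=True)

    # Tip 6: the tokenizer only affects topic keywords, not the embeddings.
    vectorizer_model = CountVectorizer(tokenizer=tokenizer)

    topic_model = BERTopic(language="multilingual",
                           umap_model=umap_model,
                           hdbscan_model=hdbscan_model,
                           vectorizer_model=vectorizer_model)
    topics, _ = topic_model.fit_transform(docs)

    # Tip 4: reassign outlier documents, then refresh the topic representations.
    new_topics = topic_model.reduce_outliers(docs, topics)
    topic_model.update_topics(docs, topics=new_topics,
                              vectorizer_model=vectorizer_model)
    return topic_model, new_topics
```

For a Chinese corpus, one option is to call fit_topics(groups["cjk"], tokenizer=jieba.lcut), where groups comes from split_by_script and jieba is a separately installed segmentation library. The embedding model still receives the raw, unsegmented text.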

In conclusion, these are not mere observations but practical tips that can change how you use BERTopic for text analysis. Whether you are a seasoned data analyst or just starting out, they will help you use BERTopic more effectively.

Work is data, and hobby is also data, but I yearn for my roommate's two cats, lazily lounging at the doorway.