Tags: daqingyi770923/BERTopic
v0.9.1 (MaartenGr#211) Fix MaartenGr#210, MaartenGr#208, MaartenGr#205, and MaartenGr#213
v0.9 (MaartenGr#176)
* Get most representative documents per topic: `topic_model.get_representative_docs(topic=1)`
* Added `normalize_frequency` parameter to `visualize_topics_per_class` and `visualize_topics_over_time`
* Return flat probabilities as default, only calculate the probabilities of all topics per document if `calculate_probabilities` is True
* Implemented a guided BERTopic by defining seed topics: `BERTopic(seed_topic_list=seed_topic_list)`
* Fix loading embedding model
* Fix probability mapping
* Improve accuracy of probabilities
* Additional FAQs

v0.8.1 (MaartenGr#138)
* Fix tests
* Add Kaggle example
* Add interactive visualizations to API documentation
* Set transformers in Flair

v0.8 (MaartenGr#120)

Visualization Update:
* Topic Hierarchy: `topic_model.visualize_hierarchy()`
* Topic Similarity Heatmap: `topic_model.visualize_heatmap()`
* Topic Representation Barchart: `topic_model.visualize_barchart()`
* Term Score Decline: `topic_model.visualize_term_rank()`

Improvements:
* Created `bertopic.plotting` library to easily extend visualizations
* Improved automatic topic reduction by using HDBSCAN to detect similar topics
* Sort topic ids by their frequency. -1 is the outlier class and typically contains the most documents. After that, 0 is the largest topic, 1 the second largest, etc.
* Update MkDocs with new visualizations

Fixes:
* Fix typo MaartenGr#113, MaartenGr#117
* Fix MaartenGr#121
* Fix mapping of topics after reduction (it now excludes 0) (MaartenGr#103)

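
The topic id ordering introduced in v0.8 can be illustrated with a small standalone sketch. This is not BERTopic's internal code: `relabel_by_frequency` is a hypothetical helper written only for this example, and it assumes raw cluster labels arrive as a plain list of integers with -1 marking outliers.

```python
from collections import Counter


def relabel_by_frequency(labels, outlier=-1):
    """Renumber cluster labels so 0 is the largest topic, 1 the next, etc.

    Hypothetical helper for illustration only; the outlier label is kept as-is.
    """
    # Count every label except the outlier class.
    counts = Counter(label for label in labels if label != outlier)
    # most_common() orders labels from most to least frequent.
    mapping = {old: new for new, (old, _) in enumerate(counts.most_common())}
    # The outlier class keeps its id regardless of how many documents it holds.
    mapping[outlier] = outlier
    return [mapping[label] for label in labels]


raw_labels = [5, 5, 2, -1, 2, 5, 7, -1, -1, -1]
print(relabel_by_frequency(raw_labels))
# → [0, 0, 1, -1, 1, 0, 2, -1, -1, -1]
```

Here the outlier class holds the most documents (four) yet stays at -1, while the remaining labels are renumbered by descending size.
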
v0.7 (MaartenGr#87)

Highlights:
* (semi-)supervised topic modeling
* Added spaCy, Gensim, USE (TFHub)
* Use a different backend for document embeddings and word embeddings
* Create your own backends with `bertopic.backend.BaseEmbedder`
* Calculate and visualize topics per class

Fixes:
* Fixed issues with Torch req
* Prevent saving term frequency matrix in CTFIDF class
* Fixed DTM not working when reducing topics (MaartenGr#96)
* Moved visualization dependencies to base BERTopic
  * `pip install bertopic[visualization]` becomes `pip install bertopic`
* Allow precomputed embeddings in `bertopic.find_topics()` (MaartenGr#79)

v0.5 (MaartenGr#46)
* Add Flair to allow for more (custom) token/document embeddings
* Option to use custom UMAP, HDBSCAN, and CountVectorizer
* Added `low_memory` parameter to reduce memory during computation
* Improved verbosity (shows progress bar)
* Improved testing
* Use the newest version of sentence-transformers as it speeds up encoding significantly
* Return the figure of `visualize_topics()`
* Expose all parameters with a single function: `get_params()`
* Option to disable the saving of `embedding_model`, should reduce BERTopic size significantly
* Add FAQ page