v0.5 #46
Hi, I noticed you're facing the same numpy- and pip-related issues with hdbscan and umap. I was able to resolve them (posted here), but I've been using BERTopic inside a Dockerfile. Posting it here so it helps you, and you can update. Do note that the Python version, the order of installation, and the extra parameters passed to pip for hdbscan all seem to matter; without them, it fails. Also, future patches for hdbscan/umap will most likely fix these pip and version cross-compatibility issues.
Build it via:
Run it via:
@bhavul Thanks! I had been following the issues with pypi for the last few days and was waiting for a fix to this issue. It's a shame there aren't perfect workarounds. However, it seems to be working for now with the updated requirements. Thanks again for the help.
Running the installs mentioned above with python 3.7.7 worked for me:
Several features and fixes will be added to this version (#44, #43, #49):
Features
- `low_memory` parameter to reduce memory during computation
- `sentence-transformers`, as it speeds up encoding significantly
- `Flair` to allow for more (custom) token/document embeddings
- `visualize_topics()`
- `get_params()`
- `embedding_model`, should reduce BERTopic size significantly
Fixes
- `stop_words` and `n_neighbors` were removed. These can still be used when a custom UMAP or CountVectorizer is used.
- `calculate_probabilities` set to False as a default. Calculating probabilities with HDBSCAN significantly increases computation time and memory usage. Better to remove calculating probabilities or only allow it by manually turning this on.
Roadmap
Issues
While developing this new version, more features might be added than you see now. They will be added to this message as work on them begins.
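As a rough illustration of the two fixes listed above, the sketch below passes `stop_words` and `n_neighbors` through a custom CountVectorizer and a custom UMAP instead of directly to BERTopic, and turns probability calculation back on explicitly. The keyword names `umap_model` and `vectorizer_model` are the ones used by recent BERTopic releases and are an assumption for this version. The setting values and the helper `build_topic_model` are hypothetical and chosen only for illustration.

```python
# Settings that used to be direct BERTopic arguments and now belong to sub-models.
# All values here are illustrative assumptions, not recommended defaults.
UMAP_SETTINGS = {"n_neighbors": 15, "n_components": 5, "metric": "cosine"}
VECTORIZER_SETTINGS = {"stop_words": "english"}

# Probabilities are off by default in this version, so opt in explicitly.
BERTOPIC_SETTINGS = {"calculate_probabilities": True}


def build_topic_model():
    """Hypothetical helper: build a BERTopic model from the settings above."""
    # Imports are local so the settings can be inspected without the heavy dependencies.
    from bertopic import BERTopic
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        umap_model=UMAP(**UMAP_SETTINGS),  # carries n_neighbors
        vectorizer_model=CountVectorizer(**VECTORIZER_SETTINGS),  # carries stop_words
        **BERTOPIC_SETTINGS,
    )
```

With the dependencies installed, `build_topic_model().fit_transform(docs)` on a list of documents should then return both topics and probabilities, at the cost of the extra computation time and memory mentioned above.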