
spark word2vec wordembedding embedding


charlie80/word2vec-spark

 
 


spark word2vec

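Train word2vec on Spark and save the vectors as a text file in Google word2vec format. Models saved by Spark can only be loaded back into Spark, so this project converts the word vectors trained on Spark into the Google word2vec text format (each line holds a word followed by its vector).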
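For reference, the Google word2vec text format starts with a header line `<vocab_size> <vector_size>`, followed by one line per word: the word, then its vector components separated by spaces. The sketch below writes that format from an in-memory word-to-vector mapping. It is an illustration under assumptions, not this project's converter. It assumes the vectors have already been collected on the driver as a `dict` (for example, in Scala, Spark MLlib's `Word2VecModel.getVectors` returns a `Map[String, Array[Float]]` of this shape). The function name `write_word2vec_text` and the six-decimal precision are hypothetical choices.

```python
import io
from typing import Dict, Sequence, TextIO


def write_word2vec_text(vectors: Dict[str, Sequence[float]], out: TextIO) -> None:
    """Write {word: vector} in Google word2vec text format (header + one line per word).

    Hypothetical helper, not part of this repository.
    """
    if not vectors:
        raise ValueError("no vectors to write")
    dims = {len(v) for v in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all vectors must have the same length")
    (dim,) = dims
    # Header: vocabulary size and vector dimensionality.
    out.write(f"{len(vectors)} {dim}\n")
    for word, vec in vectors.items():
        # Fields are space-separated, so a word must be non-empty and contain no whitespace.
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"invalid word: {word!r}")
        out.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


buf = io.StringIO()
write_word2vec_text({"spark": [0.1, -0.2], "hadoop": [0.3, 0.4]}, buf)
print(buf.getvalue(), end="")
# 2 2
# spark 0.100000 -0.200000
# hadoop 0.300000 0.400000
```

Writing through a `TextIO` handle keeps the function easy to test with `io.StringIO`. For a real model, pass a file opened with `open(path, "w", encoding="utf-8")`.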

train

   spark-submit.sh -input hdfs_corpus -output hdfs_word2vec_model

display

   python w2v_visualizer.py word2vec.model ./log_result/
   tensorboard --logdir ./log_result/
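`w2v_visualizer.py` takes the converted text model as its input. Its internals are not shown here, so the following is only a sketch. It assumes `word2vec.model` is a Google word2vec text file (a `<vocab_size> <vector_size>` header, then one word and its vector per line). It parses that file and writes the two tab-separated files that the standalone TensorFlow Embedding Projector can load: one row of vector components per word, and a single-column metadata file listing the words in the same order. The helper names `read_word2vec_text` and `write_projector_tsv` are hypothetical.

```python
import io
from typing import Dict, List, TextIO


def read_word2vec_text(src: TextIO) -> Dict[str, List[float]]:
    """Parse a Google word2vec text file into a {word: vector} dict (hypothetical helper)."""
    header = src.readline().split()
    if len(header) != 2:
        raise ValueError("expected header line '<vocab_size> <vector_size>'")
    vocab_size, dim = int(header[0]), int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line_no, line in enumerate(src, start=2):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            raise ValueError(f"line {line_no}: expected {dim} values, got {len(values)}")
        vectors[word] = [float(x) for x in values]
    # Duplicate words collapse in the dict, which also trips this check.
    if len(vectors) != vocab_size:
        raise ValueError(f"header says {vocab_size} words, found {len(vectors)}")
    return vectors


def write_projector_tsv(vectors: Dict[str, List[float]], vec_out: TextIO, meta_out: TextIO) -> None:
    """Write vectors.tsv / metadata.tsv rows for the Embedding Projector (hypothetical helper)."""
    for word, vec in vectors.items():
        # One tab-separated row of components per word...
        vec_out.write("\t".join(str(x) for x in vec) + "\n")
        # ...and the word on the matching row of the single-column metadata file (no header row).
        meta_out.write(word + "\n")


src = io.StringIO("2 3\nspark 0.1 0.2 0.3\nhadoop -1 0 1\n")
vectors = read_word2vec_text(src)
vec_out, meta_out = io.StringIO(), io.StringIO()
write_projector_tsv(vectors, vec_out, meta_out)
print(meta_out.getvalue(), end="")
# spark
# hadoop
```

Validating the header against the rows catches truncated or mis-converted files before they reach the visualizer.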
   

result




Languages

  • Scala 59.9%
  • Python 31.2%
  • Shell 8.9%