This project applies Natural Language Processing (NLP) techniques to analyze and categorize the poems of Laxmi Prasad Devkota, a renowned Nepali poet. We aim to cluster 35 of Devkota's translated poems into four thematic categories using Word2Vec text embeddings and K-means clustering.
Laxmi Prasad Devkota (1909-1959), revered as Nepal's "Maha Kavi" (Great Poet), was a pioneering figure in Nepali literature. His neo-romantic poetry, exemplified by works like "Muna Madan," seamlessly blends lyrical beauty with profound philosophical insights, capturing the essence of human experience and the spirit of everyday people.
Our goal is to categorize Devkota's poems into four thematic clusters:
- Nature & Beauty
- Society
- Culture
- Spirituality
We've sourced 36 poems from WikiSource and translated them into English for this analysis.
Our analysis follows this structured approach:
- Data Collection: Aggregating poems from WikiSource
- Preprocessing: Cleaning, translating, and preparing text data
- Feature Extraction: Employing Word2Vec for text embedding
- Clustering: Implementing K-means algorithm for thematic grouping
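The clustering step can be sketched as follows. This is a minimal illustration, not the project's notebook code: it assumes each poem has already been turned into a fixed-length vector and uses scikit-learn's `KMeans` with `n_clusters=4`, one cluster per intended theme. The helper name `cluster_poems` and the seed value are hypothetical choices.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_poems(poem_vectors: np.ndarray, n_themes: int = 4, seed: int = 42) -> np.ndarray:
    """Assign each poem vector to one of `n_themes` K-means clusters.

    Hypothetical helper: `poem_vectors` is assumed to be an (n_poems, dim) array.
    """
    # n_init=10 reruns K-means from 10 different initial centroid sets and keeps
    # the fit with the lowest inertia; on a small corpus a single bad start can dominate.
    kmeans = KMeans(n_clusters=n_themes, n_init=10, random_state=seed)
    return kmeans.fit_predict(poem_vectors)


if __name__ == "__main__":
    # Toy data standing in for real poem embeddings: four well-separated blobs.
    rng = np.random.default_rng(0)
    centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
    toy = np.vstack([c + rng.normal(scale=0.5, size=(5, 2)) for c in centers])
    print(cluster_poems(toy))
```

K-means returns arbitrary integer labels (0 to 3), so mapping each cluster to Nature & Beauty, Society, Culture, or Spirituality still requires reading the poems assigned to it.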
Prerequisites:
- Python 3.7+
- pip
- Clone the repository:
  git clone https://github.com/yourusername/devkota-poem-clustering.git
  cd devkota-poem-clustering
- Install the dependencies:
  pip install -r requirements.txt
- Get a Gemini API key (only needed if you want to run the pipeline from scratch).
- Scrape the poems:
  python web-scrap.py
- Translate the poems:
  python translate_poems.py
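The core job of the scraping step is pulling the poem text out of a WikiSource page. The sketch below is a hypothetical stand-in for `web-scrap.py`, not its actual contents: it assumes the poem body is rendered inside `<div class="poem">` (the markup MediaWiki's Poem extension typically produces) and uses only the standard-library `html.parser`, leaving page fetching (for example with `urllib.request`) out. The names `PoemExtractor` and `extract_poem` are illustrative.

```python
from html.parser import HTMLParser
from typing import List


class PoemExtractor(HTMLParser):
    """Collect verse lines inside <div class="poem"> (assumed WikiSource markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside a poem div; counts nested divs
        self.lines: List[str] = []
        self._current: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth > 0:
                self.depth += 1  # a div nested inside the poem block
            elif "poem" in classes:
                self.depth = 1  # entering a poem block
        elif tag == "br" and self.depth > 0:
            self._flush()  # <br> marks the end of a verse line

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self._flush()
        elif tag == "p" and self.depth > 0:
            self._flush()  # a closing </p> also ends the current line

    def handle_data(self, data):
        if self.depth > 0:
            self._current.append(data)

    def _flush(self) -> None:
        text = "".join(self._current).strip()
        if text:
            self.lines.append(text)
        self._current = []


def extract_poem(html: str) -> str:
    """Return the poem text, one verse line per output line."""
    parser = PoemExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Text outside the poem block (navigation, footers) is ignored because `handle_data` only records data while `depth` is positive.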
For preprocessing and feature extraction, use the provided Jupyter Notebook.
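For orientation, the preprocessing and feature-extraction stages can be sketched like this. It is a minimal illustration under stated assumptions rather than the notebook's code: the tokenizer (lowercasing, an alphabetic regex, and a tiny hypothetical stopword list) is a placeholder, and each poem is represented by the mean of its word vectors, a common way to turn Word2Vec output into a single vector per poem. `wv` can be any object supporting `word in wv` and `wv[word]`, such as a plain dict or a gensim `KeyedVectors`.

```python
import re
from typing import List

import numpy as np

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is"}


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic words (allowing one apostrophe), drop stopwords."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def poem_vector(tokens: List[str], wv, dim: int) -> np.ndarray:
    """Average the vectors of in-vocabulary tokens; return zeros if none are known."""
    known = [wv[t] for t in tokens if t in wv]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)
```

With gensim 4.x, the vectors could come from training on the tokenized poems, e.g. `Word2Vec(sentences=tokenized_poems, vector_size=100, min_count=1).wv`; these hyperparameters are illustrative only. On a corpus of about three dozen poems, self-trained vectors may be noisy, so vectors pretrained on a larger English corpus are a reasonable alternative.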
We use Principal Component Analysis (PCA) to project the high-dimensional poem embeddings into a low-dimensional space for visualization.
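A minimal sketch of that projection, assuming the poem vectors are stacked into an `(n_poems, dim)` NumPy array and that a 2-D plot is the target. It computes PCA directly from the singular value decomposition of the mean-centered data, which gives the same projection as scikit-learn's `PCA(n_components=2)` up to the sign of each component; the function name `pca_2d` is hypothetical.

```python
import numpy as np


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of X onto their first two principal components."""
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T
```

The two returned columns can be drawn as a scatter plot colored by K-means cluster label to see how well the four clusters separate.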
Our clustering results:
We welcome contributions to enhance this project! Whether you have ideas for improving clustering algorithms, refining preprocessing steps, or introducing new visualization techniques, please feel free to submit a pull request.
If you encounter any problems or have suggestions, please open an issue in the GitHub repository.
License
This project is licensed under the Apache License.
Acknowledgments:
- Laxmi Prasad Devkota Foundation for preserving the poet's works
- WikiSource contributors for digitizing Devkota's poems
- The open-source NLP community for their invaluable tools and resources