Commit d7be916 (parent 13fa9b1) — "add en readme", committed by xfcygaocan on Jun 30, 2020.
Showing 2 changed files with 165 additions and 7 deletions.

README.en.md (145 additions)

English|[简体中文](https://github.com/baidu/Senta/blob/master/README.md)

# <p align=center>`Senta`</p>

`Senta` is a Python library for a range of sentiment analysis tasks. It supports sentence-level sentiment classification, aspect-level sentiment classification, and opinion role labeling. The bulk of the code in this repository implements [SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis](https://www.aclweb.org/anthology/2020.acl-main.374.pdf). In the paper, we demonstrate how to integrate sentiment knowledge into pre-trained models to learn a unified sentiment representation for multiple sentiment analysis tasks.

## How to use

### Pip

You can use the Python package directly to run sentiment analysis tasks by loading a pre-trained SKEP model.

#### Installation

1. `Senta` supports Python 3.6 or later. This repository requires PaddlePaddle 1.6.3; see [here](https://www.paddlepaddle.org.cn/documentation/docs/en/1.6/beginners_guide/install/index_en.html) for installation instructions.

2. Install `Senta`

```shell
python -m pip install Senta
```
or

```shell
git clone https://github.com/baidu/Senta.git
cd Senta
python -m pip install .
```

#### Quick Tour

```python
from senta import Senta
my_senta = Senta()
# Get the pre-trained models; three are provided, all based on SKEP.
print(my_senta.get_support_model()) # ["ernie_1.0_skep_large_ch", "ernie_2.0_skep_large_en", "roberta_skep_large_en"]
# ernie_1.0_skep_large_ch: Chinese SKEP model based on ERNIE 1.0 large.
# ernie_2.0_skep_large_en: English SKEP model based on ERNIE 2.0 large.
# roberta_skep_large_en: English SKEP model based on RoBERTa large, used in our paper.
# Get the supported tasks.
print(my_senta.get_support_task()) # ["sentiment_classify", "aspect_sentiment_classify", "extraction"]
use_cuda = True # set True or False
# predict different tasks
my_senta.init_model(model_class="roberta_skep_large_en", task="sentiment_classify", use_cuda=use_cuda)
texts = ["a sometimes tedious film ."]
result = my_senta.predict(texts)
print(result)
my_senta.init_model(model_class="roberta_skep_large_en", task="aspect_sentiment_classify", use_cuda=use_cuda)
texts = ["I love the operating system and the preloaded software."]
aspects = ["operating system"]
result = my_senta.predict(texts, aspects)
print(result)
my_senta.init_model(model_class="roberta_skep_large_en", task="extraction", use_cuda=use_cuda)
texts = ["The JCC would be very pleased to welcome your organization as a corporate sponsor ."]
result = my_senta.predict(texts)
print(result)
```
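Each call to `init_model` reloads weights for the chosen task, so when scoring a mixed batch it is cheaper to group inputs by task and initialize once per group. A minimal sketch of such grouping — the `group_by_task` helper below is our own illustration, not part of the Senta API:

```python
from collections import defaultdict

def group_by_task(items):
    """Group (task, text) pairs so that each task needs only one
    init_model call before predicting all of its texts."""
    groups = defaultdict(list)
    for task, text in items:
        groups[task].append(text)
    return dict(groups)

items = [
    ("sentiment_classify", "a sometimes tedious film ."),
    ("extraction", "The JCC would be very pleased to welcome your organization ."),
    ("sentiment_classify", "I love the operating system ."),
]
for task, texts in group_by_task(items).items():
    # For each group: my_senta.init_model(model_class="roberta_skep_large_en",
    #                                     task=task, use_cuda=use_cuda)
    #                 result = my_senta.predict(texts)
    print(task, len(texts))
```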

### From source

#### Preparation

```shell
# download code
git clone https://github.com/baidu/Senta.git
# download a pre-trained skep model
cd ./Senta/model_files
sh download_roberta_skep_large_en.sh # download the roberta_skep_large_en model; download scripts for the other pre-trained SKEP models are in this directory.
cd -
# download task dataset
cd ./Senta/data/
sh download_en_data.sh # download the English datasets used in our paper; the download script for the Chinese dataset is in this directory.
cd -
```

#### Installation

1. `Senta` supports Python 3.6 or later. This repository requires PaddlePaddle 1.6.3; see [here](https://www.paddlepaddle.org.cn/documentation/docs/en/1.6/beginners_guide/install/index_en.html) for installation instructions.

2. Install python dependencies

```shell
python -m pip install -r requirements.txt
```

3. Set environment variables for Python, CUDA, cuDNN, and PaddlePaddle in the ./env.sh file. Details about PaddlePaddle-related environment variables can be found in the [PaddlePaddle documentation](https://www.paddlepaddle.org.cn/documentation/docs/en/1.6/flags_en.html).
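A sketch of what ./env.sh typically contains — the variable names and paths below are illustrative assumptions; check the env.sh shipped with the repository for the exact names it expects:

```shell
# Illustrative ./env.sh sketch; adjust paths to your machine.
export PYTHON_HOME=/usr/local/python3.6          # Python used to run Senta (assumed name)
export CUDA_HOME=/usr/local/cuda-10.0            # CUDA toolkit root (assumed name)
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:/usr/local/cudnn/lib64:$LD_LIBRARY_PATH
export FLAGS_fraction_of_gpu_memory_to_use=0.95  # PaddlePaddle GPU memory flag
```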

#### Quick Tour

1. Pre-training

```shell
sh ./script/run_pretrain_roberta_skep_large_en.sh # pre-train the roberta_skep_large_en model used in our paper
```

2. Fine-tuning and prediction

```shell
sh ./script/run_train.sh ./config/roberta_skep_large_en.SST-2.cls.json # fine-tuning on SST-2
sh ./script/run_infer.sh ./config/roberta_skep_large_en.SST-2.infer.json # predict
sh ./script/run_train.sh ./config/roberta_skep_large_en.absa_laptops.cls.json # fine-tuning on ABSA(laptops)
sh ./script/run_infer.sh ./config/roberta_skep_large_en.absa_laptops.infer.json # predict
sh ./script/run_train.sh ./config/roberta_skep_large_en.MPQA.orl.json # fine-tuning on MPQA 2.0
sh ./script/run_infer.sh ./config/roberta_skep_large_en.MPQA.infer.json # predict
```
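The three fine-tune/predict pairs above differ only in their config names, so a small loop can drive them all. A sketch using only the config names shown above (dry-run by default: `RUN=echo` prints the commands; set `RUN=` to execute them):

```shell
# Run fine-tuning followed by prediction for each task config.
RUN=${RUN:-echo}
for name in SST-2.cls absa_laptops.cls MPQA.orl; do
    stem=${name%.*}   # SST-2, absa_laptops, MPQA
    $RUN sh ./script/run_train.sh ./config/roberta_skep_large_en.$name.json
    $RUN sh ./script/run_infer.sh ./config/roberta_skep_large_en.$stem.infer.json
done
```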

3. An older version of Senta, which includes BoW, CNN, and BiLSTM models for Chinese sentence-level sentiment classification, is available [here](https://github.com/baidu/Senta/tree/v1).


## Citation

If you extend or use this work, please cite the [paper](https://www.aclweb.org/anthology/2020.acl-main.374.pdf) where it was introduced:

```text
@inproceedings{tian-etal-2020-skep,
title = "{SKEP}: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis",
author = "Tian, Hao and
Gao, Can and
Xiao, Xinyan and
Liu, Hao and
He, Bolei and
Wu, Hua and
Wang, Haifeng and
wu, feng",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.374",
pages = "4067--4076",
abstract = "Recently, sentiment analysis has seen remarkable advance with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite the fact that they are widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically-mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baseline, and achieves new state-of-the-art results on most of the test datasets. We release our code at https://github.com/baidu/Senta.",
}
```

README.md (20 additions, 7 deletions)

[English](https://github.com/baidu/Senta/blob/master/README.en.md)|简体中文

# Sentiment Analysis

## Contents
sh ./script/run_infer.sh ./config/roberta_skep_large_en.MPQA.infer.json # predict
If you use the code, models, or methods in this project, please cite our work in the relevant documentation or papers.
```text
@inproceedings{tian-etal-2020-skep,
title = "{SKEP}: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis",
author = "Tian, Hao and
Gao, Can and
Xiao, Xinyan and
Liu, Hao and
He, Bolei and
Wu, Hua and
Wang, Haifeng and
wu, feng",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.374",
pages = "4067--4076",
abstract = "Recently, sentiment analysis has seen remarkable advance with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite the fact that they are widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically-mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baseline, and achieves new state-of-the-art results on most of the test datasets. We release our code at https://github.com/baidu/Senta.",
}
```
