Recommendation Algorithm

Currently, PyTorch on Angel supports a series of recommendation algorithms.

In detail, the following methods are currently implemented:

We use DeepFM as an example to illustrate the detailed process of running an algorithm. The procedure is similar for the other algorithms.

Example of DeepFM

  1. **Generate the PyTorch script model.** First, go to the python/recommendation directory and execute the following command:

    python deepfm.py --input_dim 148 --n_fields 13 --embedding_dim 10 --fc_dims 10 5 1
    

    The parameters are:

    • input_dim: the feature dimension of the data
    • n_fields: the number of fields in the data
    • embedding_dim: the dimension of the embedding layer
    • fc_dims: the dimensions of the fully connected (fc) layers in DeepFM. "10 5 1" indicates a two-layer MLP composed of one 10x5 layer and one 5x1 layer.

    This Python script generates a TorchScript model that contains the dataflow graph for DeepFM. The output file is named deepfm.pt.
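
To make the fc_dims convention concrete, the following minimal sketch pairs consecutive entries of the list into layer shapes. The helper name `fc_layer_shapes` is hypothetical and is not part of deepfm.py; it only illustrates how "10 5 1" maps to a 10x5 layer followed by a 5x1 layer.

```python
def fc_layer_shapes(fc_dims):
    """Pair consecutive fc_dims entries into (in_features, out_features)."""
    if len(fc_dims) < 2:
        raise ValueError("fc_dims needs at least two entries to define a layer")
    # zip the list against itself shifted by one position
    return list(zip(fc_dims[:-1], fc_dims[1:]))


# Hypothetical helper, shown with the values from the command above.
shapes = fc_layer_shapes([10, 5, 1])
print(shapes)  # [(10, 5), (5, 1)]
```

A list of n entries therefore describes n - 1 layers, which is why three values yield a two-layer MLP.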

  2. **Prepare the input data.** The input data for DeepFM should be in libsvm or libffm format. Each line of the input represents one data sample. In libsvm format, a sample looks like this:

    label feature1:value1 feature2:value2
    

    PyTorch on Angel also allows multi-hot fields, which means that a field can appear multiple times in one data sample. In libffm format, each entry is additionally prefixed with its field:

    label field1:feature1:value1 field2:feature2:value2
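
The two line formats can be told apart by the number of colon-separated parts in each entry. The sketch below is only an illustration of the formats, not the parser that PyTorch on Angel uses; the function name `parse_sample` is hypothetical, and it assumes integer field and feature indices with numeric labels and values.

```python
def parse_sample(line):
    """Parse one libsvm or libffm line into (label, entries).

    Each entry is a (field, feature, value) tuple; field is None for
    libsvm input, which carries no field information.
    """
    tokens = line.split()
    label = float(tokens[0])
    entries = []
    for token in tokens[1:]:
        parts = token.split(":")
        if len(parts) == 2:      # libsvm entry: feature:value
            field = None
            feature, value = parts
        elif len(parts) == 3:    # libffm entry: field:feature:value
            field = int(parts[0])
            feature, value = parts[1], parts[2]
        else:
            raise ValueError(f"unrecognized entry: {token!r}")
        entries.append((field, int(feature), float(value)))
    return label, entries


# libsvm sample
print(parse_sample("1 3:1.0 7:0.5"))
# libffm sample in which field 1 is multi-hot (it appears twice)
print(parse_sample("0 1:3:1 1:9:1 2:20:1"))
```

In the second sample, field 1 contributes two features (3 and 9), which is what a multi-hot field looks like in libffm input.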
    
  3. **Train the model.** After obtaining the model file (deepfm.pt) and the input data, we can submit a job through Spark on Angel to train the model. The command is:

    source ./spark-on-angel-env.sh
    # --archives: path to the C++ library files (angel_libtorch.zip)
    # --files: path to the PyTorch script model (deepfm.pt)
    # ./pytorch-on-angel-*.jar: the jar produced by compiling the java submodule
    $SPARK_HOME/bin/spark-submit \
          --master yarn-cluster \
          --conf spark.ps.instances=5 \
          --conf spark.ps.cores=1 \
          --conf spark.ps.jars=$SONA_ANGEL_JARS \
          --conf spark.ps.memory=5g \
          --conf spark.ps.log.level=INFO \
          --conf spark.driver.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:.:./torch/angel_libtorch \
          --conf spark.executor.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:.:./torch/angel_libtorch \
          --conf spark.executor.extraLibraryPath=./torch/angel_libtorch \
          --conf spark.driver.extraLibraryPath=./torch/angel_libtorch \
          --conf spark.executorEnv.OMP_NUM_THREADS=2 \
          --conf spark.executorEnv.MKL_NUM_THREADS=2 \
          --queue $queue \
          --name "deepfm on angel" \
          --jars $SONA_SPARK_JARS \
          --archives angel_libtorch.zip#torch \
          --files deepfm.pt \
          --driver-memory 5g \
          --num-executors 5 \
          --executor-cores 1 \
          --executor-memory 5g \
          --class com.tencent.angel.pytorch.examples.supervised.RecommendationExample \
          ./pytorch-on-angel-*.jar \
          trainInput:$input batchSize:128 torchModelPath:deepfm.pt \
          stepSize:0.001 numEpoch:10 testRatio:0.1 \
          angelModelOutputPath:$output
    

    Description of the parameters:

    • trainInput: the input path (HDFS) of the training data
    • batchSize: the batch size for each optimization step
    • torchModelPath: the name of the generated torch model
    • stepSize: the learning rate
    • numEpoch: the number of epochs in the training process
    • testRatio: the fraction of the input examples held out for testing
    • angelModelOutputPath: the output path (HDFS) for the trained model
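
The application arguments above are passed as `key:value` pairs. As a rough sketch of that convention (not the actual argument handling in PyTorch on Angel), the following splits each pair on its first colon only, so that values such as HDFS URIs, which contain colons themselves, stay intact. The function name `parse_params` and the example path are hypothetical.

```python
def parse_params(args):
    """Turn ['key:value', ...] into a dict, splitting on the first colon."""
    params = {}
    for arg in args:
        key, sep, value = arg.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {arg!r}")
        params[key] = value
    return params


# Hypothetical values; the HDFS path is a placeholder, not a real location.
params = parse_params([
    "trainInput:hdfs://ns/data/train",
    "batchSize:128",
    "stepSize:0.001",
    "testRatio:0.1",
])
print(params["trainInput"])      # hdfs://ns/data/train
print(int(params["batchSize"]))  # 128
```

All values arrive as strings, so numeric settings such as batchSize and stepSize need an explicit conversion before use.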