Sparklyr2PMML

R library for converting Apache Spark ML pipelines to PMML.

Features

This package is a thin R wrapper for the JPMML-SparkML library.

Prerequisites

  • Apache Spark 3.0.X, 3.1.X, 3.2.X, 3.3.X, 3.4.X or 3.5.X.
  • R 3.3 or newer.

Installation

Install from GitHub using the devtools package:

library("devtools")

install_github("jpmml/sparklyr2pmml")

Configuration and usage

Sparklyr2PMML must be paired with JPMML-SparkML based on the following compatibility matrix:

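| Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version |
|----------------------|----------------------|------------------------------|
| 3.0.X                | 2.0.X                | 2.0.3                        |
| 3.1.X                | 2.1.X                | 2.1.3                        |
| 3.2.X                | 2.2.X                | 2.2.3                        |
| 3.3.X                | 2.3.X                | 2.3.2                        |
| 3.4.X                | 2.4.X                | 2.4.1                        |
| 3.5.X                | master               | 2.5.0                        |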

Launch Sparklyr; use the sparklyr.connect.packages configuration option to specify the coordinates of relevant JPMML-SparkML modules:

  • org.jpmml:pmml-sparkml:${version} - Core module.
  • org.jpmml:pmml-sparkml-lightgbm:${version} - LightGBM via SynapseML extension module.
  • org.jpmml:pmml-sparkml-xgboost:${version} - XGBoost via XGBoost4J-Spark extension module.
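The compatibility matrix and the module list can be combined into a small lookup that builds the coordinates for a given Apache Spark version. The following is a minimal sketch, not part of Sparklyr2PMML: the names `jpmml_sparkml_versions` and `jpmml_sparkml_packages` are hypothetical, the version numbers are copied from the matrix above and will go stale, and the sketch assumes that the extension modules share the version number of the core module.

```r
# Compatibility matrix from this README:
# Apache Spark minor version -> latest JPMML-SparkML version
jpmml_sparkml_versions = c(
	"3.0" = "2.0.3",
	"3.1" = "2.1.3",
	"3.2" = "2.2.3",
	"3.3" = "2.3.2",
	"3.4" = "2.4.1",
	"3.5" = "2.5.0"
)

# Hypothetical helper (not part of Sparklyr2PMML):
# builds "org.jpmml:<module>:<version>" coordinates for a Spark version
jpmml_sparkml_packages = function(spark_version, modules = "pmml-sparkml"){
	# Reduce a full version such as "3.4.1" to its "major.minor" prefix
	parts = strsplit(as.character(spark_version), ".", fixed = TRUE)[[1]]
	if(length(parts) < 2){
		stop("Expected a version such as '3.4.1', got: ", spark_version)
	}
	minor = paste(parts[1:2], collapse = ".")
	if(!(minor %in% names(jpmml_sparkml_versions))){
		stop("Unsupported Apache Spark version: ", spark_version)
	}
	version = jpmml_sparkml_versions[[minor]]
	# Assumption: extension modules use the same version as the core module
	paste0("org.jpmml:", modules, ":", version)
}

coordinates = jpmml_sparkml_packages("3.4.1", modules = c("pmml-sparkml", "pmml-sparkml-xgboost"))
print(coordinates)
```

The resulting character vector could then be assigned to `config[["sparklyr.connect.packages"]]`. Passing more than one coordinate this way is an assumption about how Sparklyr handles the option, so verify it against the Sparklyr version in use.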

Launching Sparklyr with the core module:

library("sparklyr")

config = spark_config()
# Replace ${version} with the JPMML-SparkML version that matches your Apache Spark version
config[["sparklyr.connect.packages"]] = "org.jpmml:pmml-sparkml:${version}"

sc = spark_connect(master = "local", config = config)

Fitting a Spark ML pipeline:

library("dplyr")
library("sparklyr")

data(iris)

iris_df = copy_to(sc, iris)

iris_pipeline = ml_pipeline(sc) %>%
	ft_r_formula(Species ~ .) %>%
	ml_decision_tree_classifier()

iris_pipeline_model = ml_fit(iris_pipeline, iris_df)

Exporting the fitted Spark ML pipeline to a PMML file:

library("sparklyr2pmml")

pmmlBuilder = PMMLBuilder(sc, iris_df, iris_pipeline_model)

buildFile(pmmlBuilder, "DecisionTreeIris.pmml")
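After the export it can be useful to confirm that the file exists and looks like a PMML document. The following is a rough sketch in base R, not part of Sparklyr2PMML: the helper name `looks_like_pmml` is hypothetical, and the check only looks for a `<PMML` root element near the top of the file, so it is not a schema validation.

```r
# Hypothetical helper (not part of Sparklyr2PMML):
# returns TRUE if the file exists and a "<PMML" element appears in its first lines
looks_like_pmml = function(path, n = 20){
	if(!file.exists(path)){
		return(FALSE)
	}
	# Read only the first few lines; the PMML root element is expected near the top
	head_lines = readLines(path, n = n, warn = FALSE)
	any(grepl("<PMML", head_lines, fixed = TRUE))
}

# FALSE if the export has not been run in the current working directory
print(looks_like_pmml("DecisionTreeIris.pmml"))
```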

License

Sparklyr2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use Sparklyr2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes Sparklyr2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

Sparklyr2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact [email protected]
