This is an example repository for the Cubonacci platform. This README describes the basic machine learning lifecycle and how the different components interact.
A machine learning model needs data to train on. If Cubonacci has a matching, existing data snapshot available, you can select it for training. If not, Cubonacci calls the .load_data() method on the DataLoader and caches the result inside the platform. By inspecting the returned data objects in depth, a schema is dynamically created. This schema is used for storing the data efficiently, setting up the API schemas, and validating data in later stages.
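The data-loading step above can be sketched as follows. This is a minimal illustration, not Cubonacci's actual interface: the class and the infer_schema helper are hypothetical, and only the .load_data() method name comes from the text.

```python
class DataLoader:
    """Hypothetical DataLoader sketch; Cubonacci calls .load_data() once
    and caches the returned objects inside the platform."""

    def load_data(self):
        # Illustrative in-memory dataset; a real loader would read from
        # object storage, a database, or an external API.
        features = [
            {"age": 34, "income": 52000.0},
            {"age": 27, "income": 48000.0},
        ]
        targets = [0, 1]
        return features, targets


def infer_schema(rows):
    # Mimics the dynamic schema inference: inspect the returned data
    # and record a type per field.
    return {key: type(value).__name__ for key, value in rows[0].items()}


loader = DataLoader()
X, y = loader.load_data()
schema = infer_schema(X)  # e.g. {'age': 'int', 'income': 'float'}
```

The inferred schema is what later stages reuse for storage, API definitions, and validation.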
When Cubonacci needs to train a model, it instantiates the corresponding object with the hyperparameters for that training session. These hyperparameters come either from an experiment run, where Cubonacci determines which settings to try, or from the user interface. The data is loaded and passed to the model's .fit() method.
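A model class following this contract might look like the sketch below. The class name, the specific hyperparameters, and the internals are illustrative assumptions; only the constructor-takes-hyperparameters pattern and the .fit()/.predict() method names come from the text.

```python
class ChurnModel:
    """Hypothetical model class; Cubonacci instantiates it with the
    hyperparameters chosen for this training session."""

    def __init__(self, learning_rate=0.1, n_estimators=100):
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.fitted = False

    def fit(self, features, targets):
        # Real training logic would go here; this sketch only records
        # that training has happened.
        self.fitted = True
        return self

    def predict(self, features):
        # Placeholder prediction: always the majority class.
        return [0 for _ in features]


# Hyperparameters come from an experiment run or the user interface:
model = ChurnModel(learning_rate=0.05, n_estimators=200)
model.fit([{"age": 34, "income": 52000.0}], [0])
```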
During an experiment run, every model is trained on part of the training data. After training, the remaining data is used to evaluate the model's performance: the features of this validation set are passed to the trained model, and the resulting predictions are passed together with the targets to the metrics defined in the metrics
folder. With a validation scheme that trains models multiple times on different parts of the data, this process is repeated and the metrics are averaged. After an experiment with a given hyperparameter setting completes, Cubonacci collects the metrics, uses them to steer the rest of the experiment run, and shows them in the user interface.
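The evaluate-then-average step can be illustrated with a single metric. The accuracy function stands in for one of the user-defined metrics in the metrics folder; the fold data is made up for the example.

```python
def accuracy(predictions, targets):
    # One example metric: the fraction of correct predictions.
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# With a validation scheme that trains on multiple data splits, the
# metric is computed once per fold and the results are averaged:
fold_scores = [
    accuracy([0, 1, 1], [0, 1, 0]),  # fold 1
    accuracy([1, 1, 0], [1, 1, 0]),  # fold 2
]
mean_score = sum(fold_scores) / len(fold_scores)
```

The averaged score is what Cubonacci reports per hyperparameter setting.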
Once the optimal settings are found, or when the hyperparameter search was concluded in a previous iteration, a model can be trained on the full training set. Instead of collecting metrics, Cubonacci saves the model for later use and inspects a number of sample predictions to determine what the prediction schema looks like, so that the platform can prepare to run the model in production.
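Inferring the prediction schema from a few sample outputs might work roughly like this. The helper and the sample values are hypothetical; the text only states that Cubonacci inspects some predictions to derive the schema.

```python
def infer_prediction_schema(predictions):
    # Inspect a handful of sample predictions to determine the output
    # type before preparing the production deployment.
    sample = predictions[0]
    return {"type": type(sample).__name__}


sample_predictions = [0.87, 0.12, 0.45]  # e.g. predicted probabilities
prediction_schema = infer_prediction_schema(sample_predictions)
```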
After a model is trained on the full training set, it can be deployed as an API. When deployed, an endpoint is available that can be called with JSON or with gRPC. The process serving these models bases the API schemas on the previously generated data schemas. Incoming calls are transformed into the same object format as the data the model was trained on, after which .predict()
is called on the incoming data. The predictions are transformed back into the appropriate gRPC or JSON format.
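The JSON serving path can be sketched end to end under some assumptions: the handle_request function, the "instances"/"predictions" payload layout, and the validation logic are all hypothetical, standing in for the schema-based request handling the platform performs.

```python
import json


def handle_request(model_predict, payload, feature_schema):
    """Hypothetical serving sketch: validate a JSON payload against the
    stored data schema, call the model, and serialize the result."""
    rows = json.loads(payload)["instances"]
    # Validate each row against the schema derived at data-loading time.
    for row in rows:
        for field, type_name in feature_schema.items():
            if type(row[field]).__name__ != type_name:
                raise ValueError(f"field {field!r} expects {type_name}")
    # Rows are now in the same object format as the training data.
    predictions = model_predict(rows)
    return json.dumps({"predictions": predictions})


feature_schema = {"age": "int", "income": "float"}
request = json.dumps({"instances": [{"age": 34, "income": 52000.0}]})
response = handle_request(lambda rows: [0 for _ in rows], request, feature_schema)
```

A gRPC endpoint would follow the same transform-predict-transform pattern, with protobuf messages in place of JSON.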