This repository contains the official implementation for the paper "T-Rep: Representation Learning for Time Series using Time-Embeddings".
It was built on top of the TS2Vec repository, which provided a very good starting point for both model development and benchmarking. A big thanks to the authors!
The dependencies can be installed with:
pip install -r requirements.txt
The repository is not yet compatible with PyTorch 2.0. The requirements pin specific (not always the latest) package versions, so we recommend using a dedicated virtual environment for this repo.
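As a quick sanity check on the PyTorch constraint above, the sketch below reports whether the installed torch release predates 2.0. The helper names (is_supported_torch, check_installed_torch) are hypothetical and not part of this repository, and treating every 2.x release as unsupported is an assumption, since only 2.0 is mentioned here.

```python
from importlib import metadata


def is_supported_torch(version: str) -> bool:
    """Return True if the major version number of `version` is below 2."""
    # Version strings look like "1.13.1" or "1.13.1+cu117"; only the major part matters here.
    major = int(version.split(".")[0])
    return major < 2


def check_installed_torch() -> str:
    """Describe whether the installed torch package is a pre-2.0 release."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "torch is not installed in this environment"
    if is_supported_torch(installed):
        return f"torch {installed} is a pre-2.0 release"
    # Assumption: anything from 2.0 onwards is treated as unsupported.
    return f"torch {installed} is 2.0 or later, which this repository does not yet support"


if __name__ == "__main__":
    print(check_installed_torch())
```

Running this inside the dedicated virtual environment, after installing the requirements, confirms which torch release pip actually resolved.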
To train and evaluate T-Rep on one of the supported datasets (see below), run the following command:
python train.py <dataset_name> <run_name> --loader <loader> --repr-dims <repr_dims> --eval
The arguments are described below:
Parameter name | Description of parameter |
---|---|
dataset_name | The dataset name |
run_name | The folder name used to save the model, output and evaluation metrics. This can be set to any word |
loader | The data loader used to load the dataset. This can be set to UCR, UEA, forecast_csv, forecast_csv_univar, anomaly, or anomaly_coldstart |
repr_dims | The representation dimensions (defaults to 320) |
eval | Whether to perform evaluation after training |
(For descriptions of more arguments, run python train.py -h.)
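When launching several runs from a script, it can help to assemble the command programmatically. The sketch below builds the argument list using only the options documented in the table above. The function name build_train_command and the run name demo_run are hypothetical, and ECG200 is used only because it is the UCR dataset that appears in the usage example of this README.

```python
import sys

# Loader names as listed in the argument table.
LOADERS = {"UCR", "UEA", "forecast_csv", "forecast_csv_univar", "anomaly", "anomaly_coldstart"}


def build_train_command(dataset_name, run_name, loader, repr_dims=320, evaluate=True):
    """Assemble the argument list for train.py from the documented options."""
    if loader not in LOADERS:
        raise ValueError(f"unknown loader: {loader!r}")
    command = [
        sys.executable, "train.py", dataset_name, run_name,
        "--loader", loader,
        "--repr-dims", str(repr_dims),
    ]
    if evaluate:
        command.append("--eval")
    return command


if __name__ == "__main__":
    cmd = build_train_command("ECG200", "demo_run", "UCR")
    print(" ".join(cmd))
    # To actually launch training from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues if a run name contains unusual characters.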
After training and evaluation, the trained encoder, output and evaluation metrics can be found in training/DatasetName__RunName_DateTime/.
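Because the output folder name ends with a timestamp, repeated runs under the same run name produce several folders. A minimal sketch for locating them is shown below. It assumes only the DatasetName__RunName_ prefix described above and sorts by modification time rather than parsing the timestamp, whose exact format is not specified here. The function name find_run_dirs is hypothetical.

```python
from pathlib import Path


def find_run_dirs(dataset_name, run_name, root="training"):
    """List run directories matching <dataset>__<run>_*, oldest first by modification time."""
    pattern = f"{dataset_name}__{run_name}_*"
    dirs = [p for p in Path(root).glob(pattern) if p.is_dir()]
    return sorted(dirs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    runs = find_run_dirs("ECG200", "demo_run")
    if runs:
        print("Most recent run:", runs[-1])
        # Show whatever files the run produced, without assuming their names.
        for item in sorted(runs[-1].iterdir()):
            print("  ", item.name)
    else:
        print("No matching run directory found")
```

Note that a run name which is a prefix of another (for example demo and demo_run) would match both sets of folders, so distinct run names are easier to work with.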
A detailed tutorial on how to use T-Rep is provided in full_tutorial.ipynb, but we showcase below the simple sklearn-like interface used by T-Rep.
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from trep import TRep
import datautils
# Load the ECG200 dataset from UCR archive
train_data, train_labels, test_data, test_labels = datautils.load_UCR('ECG200')
# (Both train_data and test_data have a shape of n_instances x n_timestamps x n_features)
# Instantiate and train T-Rep
trep = TRep(
input_dims=1,
device=0,
time_embedding='t2v_sin',
output_dims=128
)
loss_log = trep.fit(train_data, n_epochs=80, verbose=1)
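# Compute timestamp-level representations for the train and test sets
train_repr = trep.encode(train_data) # n_instances x n_timestamps x output_dims
test_repr = trep.encode(test_data) # n_instances x n_timestamps x output_dims
# Flatten each instance to one feature vector, since sklearn's SVC expects 2-D input (n_samples x n_features)
train_features = train_repr.reshape(train_repr.shape[0], -1)
test_features = test_repr.reshape(test_repr.shape[0], -1)
# Classify the learned representations using an SVM
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(train_features, train_labels)
y_pred = svm_classifier.predict(test_features)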
accuracy = accuracy_score(test_labels, y_pred)
This is all you need to know to use T-Rep. The produced np.ndarray of representations can then be used as input for any downstream task, such as classification, clustering, forecasting or anomaly detection.
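Many downstream estimators expect one feature vector per instance rather than one per timestamp. One common way to get there, shown here as a sketch rather than as the procedure used in the paper, is to pool the representations over the time axis. The function name pool_over_time is hypothetical, and a random array stands in for the output of trep.encode.

```python
import numpy as np


def pool_over_time(representations: np.ndarray, how: str = "max") -> np.ndarray:
    """Collapse (n_instances, n_timestamps, output_dims) to (n_instances, output_dims)."""
    if representations.ndim != 3:
        raise ValueError("expected a 3-D array of timestamp-level representations")
    if how == "max":
        return representations.max(axis=1)
    if how == "mean":
        return representations.mean(axis=1)
    raise ValueError(f"unknown pooling mode: {how!r}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for trep.encode(data): 4 instances, 96 timestamps, 128 output dims.
    fake_repr = rng.normal(size=(4, 96, 128))
    print(pool_over_time(fake_repr).shape)          # (4, 128)
    print(pool_over_time(fake_repr, "mean").shape)  # (4, 128)
```

Pooling keeps the feature dimension independent of the series length, which matters when instances have different numbers of timestamps.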
The datasets used in the paper to evaluate the model should be downloaded and placed as follows:
- 128 UCR datasets should be put into datasets/UCR/ so that each data file can be located by datasets/UCR/<dataset_name>/<dataset_name>_*.csv.
- 30 UEA datasets should be put into datasets/UEA/ so that each data file can be located by datasets/UEA/<dataset_name>/<dataset_name>_*.arff.
- 3 ETT datasets should be placed at datasets/ETTh1.csv, datasets/ETTh2.csv and datasets/ETTm1.csv.
- Yahoo dataset should be preprocessed using datasets/preprocess_yahoo.py and placed at datasets/yahoo.pkl.
- Sepsis dataset files should be placed under datasets/Sepsis and preprocessed using datasets/preprocess_sepsis.py.
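Before launching experiments, the layout described above can be checked with a few lines of Python. This is only a sketch: the function name missing_dataset_paths is hypothetical, and it tests for the top-level paths named in the list without inspecting individual dataset files.

```python
from pathlib import Path

# Top-level entries named in the dataset list, relative to the datasets/ folder.
EXPECTED_PATHS = [
    "UCR",
    "UEA",
    "ETTh1.csv",
    "ETTh2.csv",
    "ETTm1.csv",
    "yahoo.pkl",
    "Sepsis",
]


def missing_dataset_paths(root="datasets"):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_PATHS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_dataset_paths()
    if missing:
        print("Missing under datasets/:", ", ".join(missing))
    else:
        print("All expected dataset paths are present")
```

Only the datasets needed for a given experiment have to be present, so a non-empty result is not necessarily a problem.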
All functions necessary to reproduce experiments and results shown in the T-Rep paper are provided in this repository. For reproduction and experiment details, please refer directly to the paper (Appendix A.2).
- Classification, Forecasting, Anomaly Detection: To reproduce experiments for these tasks, you can use functions in evaluation.py.
- Clustering: Clustering experiments can be reproduced using functions in the clustering.py file. Example parameterisation and function calls are provided at the bottom of the file, in the __main__ function.
- Sepsis: The code to reproduce Sepsis anomaly detection results can be found in the sepsis_ad.py file. An example function call is given in the __main__ function.
If this work has proven useful or you are using this repository for your project, please cite it using:
@inproceedings{
fraikin2024trep,
title={T-Rep: Representation Learning for Time Series using Time-Embeddings},
author={Archibald Felix Fraikin and Adrien Bennetot and Stephanie Allassonniere},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=3y2TfP966N}
}