
Julia-Wav-KAN

A Julia implementation of Wavelet Kolmogorov-Arnold Networks (wavKAN). Multi-layer Perceptron (MLP) and wavKAN implementations of the Transformer and Recurrent Neural Operator (RNO) are applied to the 1D unit cell problem with a viscoelastic constitutive relation.

This dataset is particularly difficult for the Transformer to learn, but easy for the RNO. The wavKAN is investigated here to see if it can improve the Transformer's performance, and perhaps even surpass the RNO.
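For readers unfamiliar with the idea, a wavelet-KAN replaces each fixed activation with a learnable univariate wavelet. The sketch below is a minimal illustration assuming a Mexican-hat (Ricker) mother wavelet; the struct and function names are hypothetical and are not this repository's API:

```julia
# Minimal sketch of a wavelet-KAN "edge" function: instead of a fixed
# activation, each connection applies a mother wavelet with a learnable
# weight, scale, and translation. The Mexican-hat wavelet and all names
# here are illustrative assumptions, not the forms used in this repo.

# Mexican-hat (Ricker) mother wavelet.
mexican_hat(x) = (1 - x^2) * exp(-x^2 / 2)

# A single wavelet edge: weight w, scale a, translation b.
struct WaveletEdge
    w::Float64
    a::Float64
    b::Float64
end

# Make the struct callable: evaluate the scaled, shifted wavelet.
(e::WaveletEdge)(x) = e.w * mexican_hat((x - e.b) / e.a)

# A KAN layer output is a sum of such univariate functions over the inputs.
layer_output(edges, xs) = sum(e(x) for (e, x) in zip(edges, xs))

edges = [WaveletEdge(0.5, 1.0, 0.0), WaveletEdge(-0.3, 2.0, 1.0)]
println(layer_output(edges, [0.2, -0.4]))
```

In a full model the weights, scales, and translations would be trained by gradient descent alongside the rest of the network.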

The MLP models were developed in a previous side project. The commit history attributed to their development can be found there.

To Run

  1. Get dependencies:

     julia requirements.jl

  2. Tune hyperparameters:

     julia src/models/Vanilla_RNO/hyperparameter_tuning.jl
     julia src/models/wavKAN_RNO/hyperparameter_tuning.jl
     julia src/models/Vanilla_Transformer/hyperparameter_tuning.jl
     julia src/models/wavKAN_Transformer/hyperparameter_tuning.jl

  3. (Alternatively to 2) Manually configure hyperparameters in the respective config.ini files.

  4. Train the models (the model_name variable is set on line 26) and log the results:

     julia train.jl

  5. Compare the training loops:

     python results.py

  6. Visualize the results:

     julia predict_stress.jl

Results

Wavelet KAN models seem to perform poorly compared to their MLP counterparts. Additionally, the wavKAN Transformer had to be limited in complexity to fit on the GPU, which may have contributed to its poor performance. The wavKAN RNO performed decently, but optimised towards a greater complexity than its MLP counterpart. The MLP RNO was the best performing model, with the lowest test loss and BIC, and the highest predictive power and consistency.

Visualised Predicted Stress Fields

Below are the resulting best predictions of the models. The MLPs consistently outperformed the wavKANs, with the RNOs performing better than the Transformers.

(Predicted stress field plots for each of the four models.)

Predictive Power and Consistency

| Model | Train Loss | Test Loss | BIC | Time (mins) | Param Count |
|-------|-----------|-----------|-----|-------------|-------------|
| MLP RNO | 1.35 ± 0.20 | 0.41 ± 0.07 | 0.82 ± 0.13 | 62.61 ± 19.06 | 52 |
| wavKAN RNO | 2.62 ± 0.72 | 0.97 ± 0.39 | 10163.26 ± 0.77 | 43.53 ± 0.53 | 4,413 |
| MLP Transformer | 9.43 ± 2.28 | 34.52 ± 61.56 | 9692121.72 ± 123.13 | 5.01 ± 0.72 | 4,209,205 |
| wavKAN Transformer | 584.57 ± 153.44 | 187.15 ± 44.61 | 788293.94 ± 89.23 | 23.31 ± 0.22 | 489,562 |
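The BIC column penalises model complexity as well as fit. As a point of reference, a minimal sketch of one common Bayesian Information Criterion convention for Gaussian residuals is below; the repository may use a different convention, so these numbers need not match the table:

```julia
# Hedged sketch: a common BIC convention for models with Gaussian residuals.
# n: number of observations, k: parameter count, rss: residual sum of squares.
# This is an illustrative formula, not necessarily the repo's exact computation.
bic(n, k, rss) = n * log(rss / n) + k * log(n)

# Illustrative values only (not taken from the actual experiments).
println(bic(1000, 52, 410.0))
```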

TODO - Plot FLOPs comparison

Training time was recorded for each of the models, but this is not considered a reliable estimate of the computational cost of the models, given that they were not run on the same hardware, and multiple tasks were running on the same machine. The number of FLOPs for each model will be calculated and compared in the future, once GFlops is updated to work with the latest Julia version.

Wavelets

Message from author:

There were a few intentions behind the development of this repo:

  • For me to learn about and verify wavelet transforms for function approximation in the context of KANs.
  • To show off some scientific machine learning and demonstrate that the same techniques used for NLP could instead be applied to something else.
  • Showcase empirically why you can't just chuck Transformers at sequence modelling problems outside of NLP and expect them to be the most efficient or optimal architecture.

I expected the discrete wavelet transform to work well here, since it's good at representing both spatial and temporal dependencies (which is what you need for viscoelastic material deformation). However, while the wavelet-KAN was able to learn the solution operator when realised as a Recurrent Neural Operator, it struggled wildly during tuning, was outperformed by its MLP counterpart, and completely failed when realised as a Transformer (although its complexity was much reduced from the MLP Transformer).

That being said, in a different project, a wavelet-KAN realisation of a Convolutional Neural Network completely outshone its MLP variant in terms of generalisation when predicting a 2D Darcy Flow. This suggests that the choice of univariate function matters a lot in your KAN architectures - the wavelets were more suitable for learning the Darcy Flow problem than this viscoelastic material modelling problem.

One of the strengths of the KAN seems to be the ability to embed priors and shape its architecture through the choice of univariate function. Wavelets may be too restrictive compared to some of the other KAN models arising from the community. Architectural flexibility is really important for real-world problems, especially when data is limited, noisy, and expensive to obtain. Even AlphaFold v2 is not just a Transformer - it's an 'Evoformer' with embedded physical and biological priors to help it generalise.

So, I think KANs are awesome and an incredible opportunity for scientific machine learning. If you want to solve the big problems, (e.g. the climate crisis, growth of cancer cells, neurocognitive disorders), you can't just disregard centuries of accumulated knowledge and throw black-box algorithms at them. Different architectures are better at different things, and the KAN's flexibility, along with its capacity for symbolic regression, has the potential to be instrumental in expanding human knowledge.

Problem and Data - Unit Cell Problem with Viscoelastic Constitutive Relation

The dataset has been sourced from the University of Cambridge Engineering Department's Part IIB course on Data-Driven and Learning-Based Methods in Mechanics and Materials.

It consists of the unit cell of a three-phase viscoelastic composite material. The objective is to understand the macroscopic behavior of the material by learning the constitutive relation that maps the strain field $\epsilon(x,t)$ to the stress field $\sigma(x,t)$.

Both a Transformer and a Recurrent Neural Operator (RNO) are implemented in their MLP and wavKAN formats. From a previous project, I found this dataset to be especially difficult to learn for the Transformer, but easy enough for the RNO. It is also one-dimensional, making it a prime candidate to compare wavKAN against its MLP equivalents.

Governing Equations

The behavior of the unit cell is described by the following equations:

1. Kinematic Relation:

The strain field $\epsilon(x,t)$ is related to the displacement field $u(x,t)$ by:

$$ \epsilon(x,t) = \frac{\partial u(x,t)}{\partial x} $$

2. Equilibrium:

The equilibrium condition is given by:

$$ \frac{\partial \sigma(x,t)}{\partial x} = 0 $$

3. Constitutive Relation:

The stress field $\sigma(x,t)$ is related to the strain field $\epsilon(x,t)$ and its rate by:

$$ \sigma(x,t) = E(x)\epsilon(x,t) + v(x) \frac{\partial \epsilon(x,t)}{\partial t} $$

where $E(x)$ is the Young's modulus and $v(x)$ is the viscosity, both of which are piecewise constant functions with three different values corresponding to the three phases of the composite material.
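At a single material point, this Kelvin-Voigt-style update can be sketched by discretising the viscous (strain-rate) term with a backward difference. The function and variable names below are illustrative assumptions, not code from this repository:

```julia
# Sketch of a Kelvin-Voigt constitutive update at one material point,
# assuming the viscous term acts on the strain rate, discretised with a
# backward difference in time. All names here are illustrative.
# E: Young's modulus, v: viscosity,
# eps_now/eps_prev: strain at times t and t - dt.
stress(E, v, eps_now, eps_prev, dt) = E * eps_now + v * (eps_now - eps_prev) / dt

println(stress(2.0, 0.5, 0.01, 0.008, 0.1))  # elastic + viscous contribution
```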

Initial and Boundary Conditions

Initial Conditions:

$$ u(x,0) = 0, \quad \dot{u}(x,0) = 0 $$

Boundary Conditions:

$$ u(0,t) = 0, \quad u(1,t) = \bar{u}(t) $$

where $\bar{u}(t)$ is a prescribed displacement at $x = 1$.

Composite Material Properties

The composite material is made up of three different phases, each with distinct values of Young's modulus $E(x)$ and viscosity $v(x)$. These properties are piecewise constant functions over the spatial domain $0 \leq x \leq 1$.
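A minimal sketch of what piecewise-constant three-phase properties look like in code; the phase boundaries and values here are illustrative assumptions, not the dataset's actual parameters:

```julia
# Hedged sketch: piecewise-constant material properties over x in [0, 1] for
# a three-phase composite. Phase boundaries (1/3, 2/3) and property values
# are illustrative assumptions, not taken from the actual dataset.

# Return (Young's modulus E, viscosity v) at position x.
function material(x)
    if x < 1/3
        return (1.0, 0.1)   # phase 1
    elseif x < 2/3
        return (5.0, 0.5)   # phase 2
    else
        return (2.0, 0.2)   # phase 3
    end
end

E, v = material(0.5)
println((E, v))
```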

Objective

The objective is to learn the macroscopic constitutive relation that maps the strain field $\epsilon(x,t)$ to the stress field $\sigma(x,t)$:

$$ \epsilon \mapsto \sigma $$

using a macroscopic constitutive model. This model should capture the complex viscoelastic behavior of the composite material.

Input and Output Framework

Input:

The input to the macroscopic constitutive model at each time step $t$ is the macroscopic strain field $\epsilon(x,t)$.

Output:

The output of the macroscopic constitutive model at each time step $t$ is the macroscopic stress field $\sigma(x,t)$.

In essence, the macroscopic constitutive model learns the mapping between the applied macroscopic strain field $\epsilon(x,t)$ and the resulting macroscopic stress field $\sigma(x,t)$ at each time step $t$ and spatial position $x$ in the three-phase viscoelastic composite, based on the underlying unit cell problem and its governing equations.
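The per-time-step mapping described above can be sketched as follows. The array shapes and the placeholder surrogate are assumptions for illustration only; a trained RNO would additionally carry hidden state between time steps:

```julia
# Hedged sketch of the input/output framework: a surrogate maps the strain
# field eps(x, t) to the stress field sigma(x, t) one time step at a time.
# Shapes, names, and the stand-in "model" are illustrative assumptions.

nx, nt = 64, 100                  # spatial points x time steps
eps_field = randn(nx, nt)         # macroscopic strain field eps(x, t)

# Placeholder constitutive surrogate: consumes the strain at one time step
# and emits the stress (a trained model replaces this linear stand-in).
surrogate(eps_slice) = 2.0 .* eps_slice

sigma_field = similar(eps_field)
for t in 1:nt
    sigma_field[:, t] = surrogate(eps_field[:, t])
end
println(size(sigma_field))
```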

