Neural networks typically require activation functions to effectively approximate complex distributions; ReLU-based networks, e.g., ResNets, have been popular. In the proposed polynomial networks, however, there is no strict requirement for activation functions, since Π-nets already include nonlinear interactions between the input elements. In fact, high-order correlations between the input elements can be captured without any activation functions, which is the focus of this experiment. In particular, we illustrate how Π-nets can learn classification even on the demanding ImageNet dataset without activation functions. We hope that our code inspires further experimentation with networks that do not require activation functions and find alternative ways to express nonlinear relationships between the input elements.
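To make the idea concrete, below is a minimal NumPy sketch of a single second-order polynomial interaction: a Hadamard product of two linear transformations adds a degree-2 term in the input, so the layer is nonlinear even though no activation function is applied. The function name `pi_block` and the weight shapes are illustrative assumptions, not the implementation used in this repo.

```python
import numpy as np

def pi_block(x, W1, W2, W3, b):
    # First-order term W1 @ x, plus a second-order (Hadamard product)
    # term (W2 @ x) * (W3 @ x): nonlinear without any activation.
    return W1 @ x + (W2 @ x) * (W3 @ x) + b

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
x = rng.standard_normal(d_in)
W1, W2, W3 = (rng.standard_normal((d_out, d_in)) for _ in range(3))
b = np.zeros(d_out)

y = pi_block(x, W1, W2, W3, b)
print(y.shape)  # (4,)
```

Unlike a purely linear layer, this block is not homogeneous of degree 1: scaling the input by 2 scales the second-order term by 4, which is exactly the nonlinearity the Π-net exploits.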
Please follow mmclassification to set up the training environment. Our models are trained on a single server with eight V100 GPUs.
We slightly modify ResNet for the different experiments.
All other training details follow the standard configuration.
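As a concrete starting point, a typical mmclassification distributed-training invocation looks like the following; the config path here is hypothetical and should be replaced with the config file for the model variant you want to train.

```shell
# Hypothetical config path -- substitute the config for your chosen variant.
bash tools/dist_train.sh configs/pinet/pinet18_imagenet.py 8
```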
Model | ReLU | Conv-1x1 | Top-1 (%) | Top-5 (%) | Backbone | Logs |
---|---|---|---|---|---|---|
ResNet-18 | Yes | No | 69.90 | 89.43 | config | log |
ResNet-18 | No | No | 18.348 | 36.718 | backbone | log |
PiNet-18 | No | No | 63.666 | 84.340 | backbone | log |
PiNet-18 | No | Yes | 65.306 | 85.830 | backbone | log |
PiNet-18 | Yes | No | 70.350 | 89.434 | backbone | log |
PiNet-18 | Yes | Yes | 71.644 | 90.232 | backbone | log |