Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems

14:45 - 15:00
Time-series ML-regression on Graphcore IPU-M2000 and Nvidia A100

Jan Balewski, Kristofer Bouchard
Lawrence Berkeley National Laboratory, CA

Zhenying Liu, Alexander Tsyplikhin, Manuel Lopez Roland
Graphcore, CA

We compare the ML-training performance of a Graphcore IPU-M2000-based system with that of the Nvidia A100 GPU-based system on the Perlmutter HPC machine at NERSC/LBL. The scientific benchmark problem was the multivariate regression of time-series data from a simulated biological neuron. The ML model consisted of several convolutional, batch normalization, and fully connected layers. The training data were distributed in CPU memory to eliminate the system-dependent IO cost. The data-parallel training runs resulted in the same sample throughput on both GC200 IPUs and A100 GPUs for any choice of the number of accelerators between 1 and 256. The best MSE validation loss achieved on IPUs was only 10% to 20% larger than on GPUs. The aggregated energy use per training epoch was 2.5 to 3 times smaller for the Graphcore system than for the Nvidia system. This paper also discusses aspects of software-hardware co-design to achieve the highest efficiency on the IPU using PopTorch.

13th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, held in conjunction with SC22: The International Conference for High Performance Computing, Networking, Storage and Analysis