Using the STREAM benchmark to measure memory bandwidth is an industry-standard practice, and I followed it here. For the x86 systems, to stay unbiased, I took the ‘Stream Triad’ results from a reputable benchmarking organization (Anandtech).
Power9 CPU Config used for STREAM testing:
root@ubuntu:/home/ubuntu# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 176
Thread(s) per core: 4
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 2
Model: 2.2 (pvr 004e 1202)
Model name: POWER9, altivec supported
Memory Config used for STREAM testing:
16x 16GiB RDIMM DDR4 2666 MHz (0.4ns)
Theoretical Memory Bandwidth Calculation on Barreleye G2:
= 8 (channels per socket) * 8 (bytes per transfer) * 2.666 (GT/s, DDR4-2666) * 2 (sockets)
= 8*8*2.666*2 = 341.248 GB/s
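The calculation above can be reproduced with a short Python sketch. The function and parameter names are my own and purely illustrative; the inputs are the values from the formula (8 channels per socket, 8 bytes per transfer, 2.666 GT/s, 2 sockets).

```python
def theoretical_bandwidth_gbs(channels_per_socket, bytes_per_transfer,
                              transfers_gt_per_s, sockets):
    """Peak memory bandwidth in GB/s (1 GB = 10^9 bytes).

    Hypothetical helper: it only multiplies the four factors used in
    the formula above and does not model any controller overhead.
    """
    return channels_per_socket * bytes_per_transfer * transfers_gt_per_s * sockets


# Values taken from the Barreleye G2 calculation above.
peak_2666 = theoretical_bandwidth_gbs(
    channels_per_socket=8,
    bytes_per_transfer=8,
    transfers_gt_per_s=2.666,
    sockets=2,
)
print(f"Theoretical peak: {peak_2666:.3f} GB/s")
```

This is an upper bound only; measured STREAM numbers will always land below it.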
Compiler and run instructions for measurement:
wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c
gcc -m64 -O3 -fopenmp -DSTREAM_ARRAY_SIZE=536895856 -DNTIMES=20 -mcmodel=large stream.c -o stream
OMP_NUM_THREADS=44 GOMP_CPU_AFFINITY=0-175:4 ./stream
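To make the numbers in the compile and run commands easier to read, here is a small Python sketch of what they imply. It assumes STREAM's default element type (an 8-byte double) and its three arrays, and it assumes the four SMT threads of each core are numbered consecutively, as the lscpu output above suggests. The helper name `expand_affinity` is hypothetical.

```python
STREAM_ARRAY_SIZE = 536_895_856   # from -DSTREAM_ARRAY_SIZE
BYTES_PER_ELEMENT = 8             # assumption: default STREAM_TYPE is double
NUM_ARRAYS = 3                    # STREAM works on three arrays: a, b, c

bytes_per_array = STREAM_ARRAY_SIZE * BYTES_PER_ELEMENT
total_bytes = bytes_per_array * NUM_ARRAYS
print(f"Per array: {bytes_per_array / 2**30:.2f} GiB")
print(f"All three: {total_bytes / 2**30:.2f} GiB")


def expand_affinity(first, last, stride):
    """Expand a 'first-last:stride' affinity range into a list of CPU ids."""
    return list(range(first, last + 1, stride))


# GOMP_CPU_AFFINITY=0-175:4 -> every 4th logical CPU, i.e. one per core
# if SMT siblings are numbered consecutively (assumption).
pinned_cpus = expand_affinity(0, 175, 4)
print(f"{len(pinned_cpus)} CPUs pinned: {pinned_cpus[:3]} ... {pinned_cpus[-1]}")
```

With about 4 GiB per array and roughly 12 GiB in total, the working set is far larger than the caches, and the static arrays are large enough that this is presumably why `-mcmodel=large` is needed. The 44 pinned CPUs match OMP_NUM_THREADS=44, so each physical core runs one thread.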
Results:
Stream Application | Barreleye G2, 2x 22 core (2400 MHz), gcc | Barreleye G2, 2x 22 core (2666 MHz), gcc | AMD EPYC 32c 7601 (Anandtech) | 2x Intel Skylake 8176 (Anandtech) |
Stream Copy (MB/s) | 217909.8 | 241641.7 | | |
Stream Add (MB/s) | 240561.6 | 253784 | | |
Stream Scale (MB/s) | 245069.7 | 268929.6 | | |
Stream Triad (MB/s) | 247078.8 | 270000.4 | 207000 | 165000 |
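As a rough way to read the Triad row, the Python sketch below compares the measured 2666 MHz result with the theoretical peak and with the two x86 figures. It assumes STREAM's MB/s means 10^6 bytes per second and that the peak is 8*8*2.666*2 GB/s as calculated earlier. The variable names are illustrative.

```python
# Stream Triad results in MB/s, copied from the table above.
triad_mb_s = {
    "Barreleye G2 (2666 MHz)": 270000.4,
    "AMD EPYC 7601 (Anandtech)": 207000,
    "2x Intel Skylake 8176 (Anandtech)": 165000,
}

# Theoretical peak for the 2666 MHz Barreleye G2 config, in GB/s.
peak_gb_s = 8 * 8 * 2.666 * 2

# Assumption: 1 GB/s = 1000 MB/s (decimal units on both sides).
barreleye_gb_s = triad_mb_s["Barreleye G2 (2666 MHz)"] / 1000
efficiency = barreleye_gb_s / peak_gb_s
print(f"Triad reaches {efficiency:.1%} of theoretical peak")

# Ratio of the Barreleye G2 Triad result to each x86 result.
ratio_vs_epyc = triad_mb_s["Barreleye G2 (2666 MHz)"] / triad_mb_s["AMD EPYC 7601 (Anandtech)"]
ratio_vs_skylake = triad_mb_s["Barreleye G2 (2666 MHz)"] / triad_mb_s["2x Intel Skylake 8176 (Anandtech)"]
print(f"vs AMD EPYC 7601: {ratio_vs_epyc:.2f}x")
print(f"vs 2x Intel Skylake 8176: {ratio_vs_skylake:.2f}x")
```

The x86 numbers come from a different test setup (Anandtech's compiler and configuration), so these ratios are indicative rather than a strict like-for-like comparison.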
Pictorial Representation of results: