Sunday, March 27, 2022

Comparing TensorFlow performance with and without AVX2 and FMA. Comment: A proposal for a next-generation high-quality audio format, a hybrid of MQA3 (MQA Audio Ver. 2), MPEG-H 3D Audio, and 360 Reality Audio, under the condition of a CPU that supports both AVX2 & FMA.

https://www.acceluniverse.com/blog/developers/2019/05/tensorflowavx2-fma.html

2019.05.16  Subaru Nakamura  
TensorFlow  Machine learning
Comparing TensorFlow performance with and without AVX2 and FMA

Overview

When the pre-built TensorFlow is installed with pip, the AVX2 and FMA CPU instruction set extensions are not enabled. Enabling AVX2 and FMA can be expected to improve calculation speed and accuracy.

Reference: A meetup to clarify what the MMX, SSE, AVX, and FMA instructions are
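Whether the CPU itself supports these extensions can be checked from the OS. (With the pre-built wheel, TensorFlow 1.x also prints a startup warning such as "Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA".) A minimal sketch, assuming Linux's /proc/cpuinfo; on macOS, as in this article's environment, `sysctl machdep.cpu.leaf7_features` serves the same purpose. The helper name is my own:

```python
def cpu_flags():
    """Return the CPU feature flags reported in /proc/cpuinfo (Linux)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or /proc unavailable
    return set()

flags = cpu_flags()
print("AVX2:", "avx2" in flags)
print("FMA :", "fma" in flags)
```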

So, this time I built TensorFlow from source with AVX2 and FMA enabled, and compared its speed and accuracy against the pre-built package installed with pip.
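A rough sketch of such a build, assuming the standard Bazel workflow for TensorFlow 1.13; the exact flags and prompts may differ by version, so this is an outline rather than a definitive recipe:

```shell
# Sketch: build TensorFlow 1.13.1 from source with AVX2/FMA enabled
# (assumed flags; check the TensorFlow build guide for your version)
git clone -b v1.13.1 --depth 1 https://github.com/tensorflow/tensorflow.git
cd tensorflow
./configure                       # interactive; accept defaults for a CPU-only build
bazel build --config=opt \
    --copt=-mavx2 --copt=-mfma \
    //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```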

Environment

  • Processor Intel Core i7-5557U 3.1 GHz
  • Memory 16GB 1867 MHz DDR3
  • macOS Sierra 10.12.6
  • Python 3.6.8
  • TensorFlow 1.13.1

Method

Training was performed with the following CNN. MNIST, Fashion MNIST, and CIFAR-10 were used as the datasets.

import tensorflow as tf

# Select dataset
dataset = tf.keras.datasets.mnist
# dataset = tf.keras.datasets.fashion_mnist
# dataset = tf.keras.datasets.cifar10

(x_train, y_train), (x_test, y_test) = dataset.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Match shape to model input (add a channel axis for grayscale data)
if len(x_train.shape) == 3:
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.reshape(x_test.shape + (1,))

# Create the CNN model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                           activation=tf.nn.relu,
                           input_shape=x_train.shape[1:]),
    tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Training
model.fit(x_train, y_train, epochs=12)
# Evaluation
model.evaluate(x_test, y_test)

The measurement was performed with the following command.

$ time python cnn.py

Results

MNIST

                AVX2, FMA enabled   AVX2, FMA disabled   Difference
Learning time   26m37s              28m32s               6.7% faster
Accuracy        0.9939              0.9928               +0.0011

Fashion MNIST

                AVX2, FMA enabled   AVX2, FMA disabled   Difference
Learning time   25m30s              27m59s               8.9% faster
Accuracy        0.9218              0.9241               -0.0023

CIFAR-10

                AVX2, FMA enabled   AVX2, FMA disabled   Difference
Learning time   32m00s              37m04s               13.7% faster
Accuracy        0.7049              0.7034               +0.0015
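The difference percentages can be reproduced from the times above: they are the time saved relative to the run without AVX2/FMA. A small check (the helper name is my own):

```python
def to_seconds(t):
    """Convert a 'MMmSSs' string such as '26m37s' to seconds."""
    m, s = t.rstrip("s").split("m")
    return int(m) * 60 + int(s)

# (time with AVX2/FMA, time without) for each dataset
runs = {
    "MNIST":         ("26m37s", "28m32s"),
    "Fashion MNIST": ("25m30s", "27m59s"),
    "CIFAR-10":      ("32m00s", "37m04s"),
}

for name, (with_avx2, without_avx2) in runs.items():
    t1, t0 = to_seconds(with_avx2), to_seconds(without_avx2)
    print(f"{name}: {100 * (t0 - t1) / t0:.1f}% faster")
# → MNIST: 6.7% faster
# → Fashion MNIST: 8.9% faster
# → CIFAR-10: 13.7% faster
```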

Summary

Execution time became shorter when AVX2 and FMA were enabled; accuracy, on the other hand, was essentially unchanged.
