https://www.acceluniverse.com/blog/developers/2019/05/tensorflowavx2-fma.html
Overview
The pre-built TensorFlow package installed with pip does not have the AVX2 and FMA CPU instructions enabled. Enabling AVX2 and FMA can be expected to improve computation speed and accuracy.
Reference: A session to clarify what the MMX, SSE, AVX, and FMA instructions are
So this time I built TensorFlow from source with AVX2 and FMA enabled, and compared its speed and accuracy against the pre-built package.
Environment
- Processor Intel Core i7-5557U 3.1 GHz
- Memory 16GB 1867 MHz DDR3
- macOS Sierra 10.12.6
- Python 3.6.8
- TensorFlow 1.13.1
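Before comparing builds, it can help to confirm that the CPU itself reports AVX2 and FMA. The following is a minimal sketch, not part of the original measurement. The helper names `has_features` and `read_cpu_flags` are hypothetical, and the sources of the flag string (the `machdep.cpu.features` and `machdep.cpu.leaf7_features` sysctl keys on Intel Macs, the `flags` line of `/proc/cpuinfo` on Linux) are assumptions about the platform.

```python
import platform
import subprocess


def has_features(flags_text, wanted=("avx2", "fma")):
    """Return {feature: bool} for each wanted feature found in flags_text."""
    tokens = {tok.lower() for tok in flags_text.split()}
    return {name: name in tokens for name in wanted}


def read_cpu_flags():
    """Best-effort read of the CPU feature flags as one string.

    Assumption: the sysctl keys below exist on Intel Macs, and
    /proc/cpuinfo has a "flags" line on x86 Linux. Returns "" otherwise.
    """
    system = platform.system()
    try:
        if system == "Darwin":
            out = subprocess.run(
                ["sysctl", "-n",
                 "machdep.cpu.features", "machdep.cpu.leaf7_features"],
                stdout=subprocess.PIPE,
                universal_newlines=True,
            )
            return out.stdout
        if system == "Linux":
            with open("/proc/cpuinfo") as f:
                for line in f:
                    if line.startswith("flags"):
                        return line.split(":", 1)[1]
    except OSError:
        pass
    return ""


if __name__ == "__main__":
    print(has_features(read_cpu_flags()))
```

If either value comes back `False`, a build with AVX2 and FMA enabled would not be expected to run on that machine.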
Method
Training was done with the following CNN, using MNIST, Fashion MNIST, and CIFAR-10 as the datasets.
import tensorflow as tf

# Select the dataset
dataset = tf.keras.datasets.mnist
# dataset = tf.keras.datasets.fashion_mnist
# dataset = tf.keras.datasets.cifar10

(x_train, y_train), (x_test, y_test) = dataset.load_data()
# Scale pixel values to the range [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Match the shape to the model input (add a channel axis to grayscale images)
if len(x_train.shape) == 3:
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.reshape(x_test.shape + (1,))

# Create the CNN model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                           activation=tf.nn.relu,
                           input_shape=x_train.shape[1:]),
    tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Training
model.fit(x_train, y_train, epochs=12)

# Evaluation
model.evaluate(x_test, y_test)
The measurement was performed with the following command.
$ time python cnn.py
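Note that `time` measures the whole process, including importing TensorFlow and loading the dataset. As an alternative that was not used for the results here, the training call alone could be timed inside the script. A minimal sketch, where `timed` and `format_mmss` are hypothetical helper names:

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def format_mmss(seconds):
    """Format seconds as e.g. '26m37s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m{secs:02d}s"


# Example use inside cnn.py (assumes `model`, `x_train`, `y_train` exist):
# _, fit_seconds = timed(model.fit, x_train, y_train, epochs=12)
# print("Training time:", format_mmss(fit_seconds))
```

Timing only `model.fit` would isolate the part of the run where the instruction set is most likely to matter, at the cost of no longer being directly comparable with the `time` figures.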
Results
MNIST
| | AVX2, FMA enabled | AVX2, FMA disabled | Difference |
|---|---|---|---|
| Training time | 26m37s | 28m32s | 6.7% |
| Accuracy | 0.9939 | 0.9928 | +0.0011 |
Fashion MNIST
| | AVX2, FMA enabled | AVX2, FMA disabled | Difference |
|---|---|---|---|
| Training time | 25m30s | 27m59s | 8.9% |
| Accuracy | 0.9218 | 0.9241 | -0.0023 |
CIFAR-10
| | AVX2, FMA enabled | AVX2, FMA disabled | Difference |
|---|---|---|---|
| Training time | 32m00s | 37m04s | 13.7% |
| Accuracy | 0.7049 | 0.7034 | +0.0015 |
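The time differences in the three tables are consistent with a reduction measured relative to the build without AVX2 and FMA. The post does not state the formula, so the sketch below is an assumed reading of it; `to_seconds` and `reduction_percent` are hypothetical helper names.

```python
def to_seconds(mmss):
    """Convert a string like '26m37s' to seconds."""
    minutes, rest = mmss.split("m")
    return int(minutes) * 60 + int(rest.rstrip("s"))


def reduction_percent(with_simd, without_simd):
    """Percent reduction in time relative to the build without AVX2/FMA."""
    fast, slow = to_seconds(with_simd), to_seconds(without_simd)
    return round((slow - fast) / slow * 100, 1)


# (time with AVX2/FMA, time without), copied from the tables
timings = {
    "MNIST": ("26m37s", "28m32s"),
    "Fashion MNIST": ("25m30s", "27m59s"),
    "CIFAR-10": ("32m00s", "37m04s"),
}
for name, (fast, slow) in timings.items():
    print(name, reduction_percent(fast, slow))
# MNIST 6.7
# Fashion MNIST 8.9
# CIFAR-10 13.7
```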
Summary
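Training time was shorter with AVX2 and FMA enabled, by 6.7% to 13.7% depending on the dataset. Accuracy, on the other hand, changed very little: the differences were within ±0.0023 and not consistently in either direction.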