Sunday, March 27, 2022

Comparing TensorFlow performance with and without AVX2 and FMA. Comment: A proposal for a next-generation high-quality audio format, a hybrid of MQA3 (MQA Audio Ver. 2), MPEG-H 3D Audio, and 360 Reality Audio, under the condition of a CPU that supports both AVX2 & FMA.

https://www.acceluniverse.com/blog/developers/2019/05/tensorflowavx2-fma.html

2019.05.16  Subaru Nakamura  
TensorFlow  Machine learning
Comparing TensorFlow performance with and without AVX2 and FMA

Overview

When the pre-built TensorFlow is installed with pip, the AVX2 and FMA CPU instruction set extensions are not enabled. Enabling AVX2 and FMA can be expected to improve calculation speed and accuracy.

Reference: A meetup to clarify what the MMX, SSE, AVX, and FMA instructions are
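Whether the CPU itself supports these extensions can be checked from the OS. (With the pre-built wheel, TensorFlow 1.x also prints a startup warning such as "Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA".) A minimal sketch, assuming Linux's /proc/cpuinfo; on macOS, as in this article's environment, `sysctl machdep.cpu.leaf7_features` serves the same purpose. The helper name is my own:

```python
def cpu_flags():
    """Return the CPU feature flags reported in /proc/cpuinfo (Linux)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or /proc unavailable
    return set()

flags = cpu_flags()
print("AVX2:", "avx2" in flags)
print("FMA :", "fma" in flags)
```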

So, this time I built TensorFlow from source with AVX2 and FMA enabled, and compared its speed and accuracy against the pre-built package installed with pip.
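A rough sketch of such a build, assuming the standard Bazel workflow for TensorFlow 1.13; the exact flags and prompts may differ by version, so this is an outline rather than a definitive recipe:

```shell
# Sketch: build TensorFlow 1.13.1 from source with AVX2/FMA enabled
# (assumed flags; check the TensorFlow build guide for your version)
git clone -b v1.13.1 --depth 1 https://github.com/tensorflow/tensorflow.git
cd tensorflow
./configure                       # interactive; accept defaults for a CPU-only build
bazel build --config=opt \
    --copt=-mavx2 --copt=-mfma \
    //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```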

Environment

  • Processor Intel Core i7-5557U 3.1 GHz
  • Memory 16GB 1867 MHz DDR3
  • macOS Sierra 10.12.6
  • Python 3.6.8
  • TensorFlow 1.13.1

Method

Training was performed with the following CNN. MNIST, Fashion MNIST, and CIFAR-10 were used as the datasets.

import tensorflow as tf

# Select dataset
dataset = tf.keras.datasets.mnist
# dataset = tf.keras.datasets.fashion_mnist
# dataset = tf.keras.datasets.cifar10

(x_train, y_train), (x_test, y_test) = dataset.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Match shape to model input (add a channel axis for grayscale data)
if len(x_train.shape) == 3:
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.reshape(x_test.shape + (1,))

# Create the CNN model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                           activation=tf.nn.relu,
                           input_shape=x_train.shape[1:]),
    tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Training
model.fit(x_train, y_train, epochs=12)
# Evaluation
model.evaluate(x_test, y_test)

The measurement was performed with the following command.

$ time python cnn.py

Results

MNIST

                AVX2, FMA enabled   AVX2, FMA disabled   Difference
Learning time   26m37s              28m32s               6.7% faster
Accuracy        0.9939              0.9928               +0.0011

Fashion MNIST

                AVX2, FMA enabled   AVX2, FMA disabled   Difference
Learning time   25m30s              27m59s               8.9% faster
Accuracy        0.9218              0.9241               -0.0023

CIFAR-10

                AVX2, FMA enabled   AVX2, FMA disabled   Difference
Learning time   32m00s              37m04s               13.7% faster
Accuracy        0.7049              0.7034               +0.0015
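The difference percentages can be reproduced from the times above: they are the time saved relative to the run without AVX2/FMA. A small check (the helper name is my own):

```python
def to_seconds(t):
    """Convert a 'MMmSSs' string such as '26m37s' to seconds."""
    m, s = t.rstrip("s").split("m")
    return int(m) * 60 + int(s)

# (time with AVX2/FMA, time without) for each dataset
runs = {
    "MNIST":         ("26m37s", "28m32s"),
    "Fashion MNIST": ("25m30s", "27m59s"),
    "CIFAR-10":      ("32m00s", "37m04s"),
}

for name, (with_avx2, without_avx2) in runs.items():
    t1, t0 = to_seconds(with_avx2), to_seconds(without_avx2)
    print(f"{name}: {100 * (t0 - t1) / t0:.1f}% faster")
# → MNIST: 6.7% faster
# → Fashion MNIST: 8.9% faster
# → CIFAR-10: 13.7% faster
```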

Summary

Execution time became shorter when AVX2 and FMA were enabled; accuracy, on the other hand, was essentially unchanged.
