I compile my Gaussian blur program with one of the following two commands (one per makefile):
g++ -Ofast -ffast-math -march=native -flto -fwhole-program -std=c++11 -fopenmp -o interpolateFloatImg interpolateFloatImg.cpp
g++ -O3 -std=c++11 -fopenmp -o interpolateFloatImg interpolateFloatImg.cpp
My two testing environments are:
- i7 4710HQ 4 cores 8 threads
- E5 10 cores 20 threads
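However, compared with the second build, the binary from the first command runs about 2x faster on the E5 but at only about 0.5x the speed on the i7. In other words, the second build is the faster one on the i7 and the slower one on the E5.
Can anyone explain this?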
The source code is here: https://github.com/makeapp007/interpolateFloatImg
I will add more details as soon as possible.
On the i7 the program runs with 8 threads. I don't know how many threads it spawns on the E5.
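To pin down the thread count on the E5, a small check like the one below could be compiled with the same flags as the main program. This is only a sketch: the function names `countParallelThreads` and `countHardwareThreads` are made up for illustration and are not part of the linked repository, and it assumes the program relies on OpenMP's default thread count rather than setting it explicitly.

```cpp
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: number of threads an OpenMP parallel region
// actually gets. Returns 1 when compiled without -fopenmp.
int countParallelThreads() {
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // Only one thread writes the result, so there is no data race.
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}

// Hypothetical helper: logical CPUs reported by the standard library.
// The standard allows this to return 0 if the value is not computable.
unsigned countHardwareThreads() {
    return std::thread::hardware_concurrency();
}
```

Printing both values from `main` on each machine would show whether the E5 run really uses all 20 hardware threads. Setting the `OMP_NUM_THREADS` environment variable before running changes the first number, which makes it possible to compare both builds at the same thread count.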