OpenMP on Ubuntu
OpenMP is an API for running C, C++ and Fortran code on multiple processor cores at the same time. Code that spends most of its time in loops with independent iterations can run much faster, because the work is spread across all the cores of your CPU. In case you just want to see the code, the link is here.
Setting up OpenMP on Ubuntu / Linux
I am not entirely sure how this works on other platforms. On Mac it takes extra work because Apple's bundled compiler does not enable OpenMP out of the box, and there are some weird setup instructions for Windows, but on Linux and Ubuntu it is really easy.
- Run sudo apt-get install libomp-dev in your Terminal.
- Create a C++ Project, and title it HelloOpenMP.
- Select your project, and go to the Properties dialog.
- Go to C/C++ Build -> Settings.
- Select GCC C++ Compiler / Miscellaneous.
- In the Other flags input, add on -fopenmp.
- Select GCC C++ Linker / Libraries.
- In the Libraries (-l) field, click the add button and type in gomp.
Afterwards, your properties should look something like this:
That’s it!
Using OpenMP
Say you have an awesome program that prints out a list of 10 numbers:
#include <stdio.h>

int main(){
    for(int i=0;i<10;i++){
        printf("%i\n",i);
    }
    return 0;
}
This outputs something like:
0
1
2
3
4
5
6
7
8
9
Now, let's OpenMPinize it!
#include <stdio.h>

int main(){
    #pragma omp parallel for
    for(int i=0;i<10;i++){
        printf("%i\n",i);
    }
    return 0;
}
This outputs something like:
4
7
6
9
0
1
8
2
3
5
The numbers are out of order because the iterations are split among several threads that run in parallel, and nothing fixes the order in which those threads reach printf. The order can change from one run to the next.
Wait, what? “How could it be that easy?” I hear you say. It actually is this easy, if your compiler supports OpenMP. In general, a recent version of GCC should be fine. And if your compiler doesn’t support it, the pragmas are simply ignored and your code falls back to single-core sluggishness. So the same source still compiles and runs on a compiler without OpenMP, just without the speedup.
The source code can be found here.
The End
I made a little Mandelbrot program using my custom PPM image library:
#include <math.h>
#include <stdio.h>
#include <chrono>
#include <omp.h>   // only needed for omp_* runtime calls; none are used here
#include "ppm.h"
#include "complex.h"

using namespace std::chrono;

/// https://stackoverflow.com/a/19555298/9609025
long curTime(){
    milliseconds ms = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
    return ms.count();
}

int main(){
    int w=1000;
    int h=1000;
    ppm img;
    img.setSize(w,h);
    img.allocMem();

    long start,end;

    // Pass 1: the columns are split across threads.
    start=curTime();
    #pragma omp parallel for
    for(int x=0;x<w;x++){
        // No pragma on the inner loop: a nested parallel region usually
        // runs on a single thread unless nested parallelism is enabled.
        for(int y=0;y<h;y++){
            // Map the pixel to the square from -2 to 2 on both axes.
            float fx=x;
            float fy=y;
            fx/=w;
            fy/=h;
            fx*=4;
            fy*=4;
            fx-=2;
            fy-=2;

            complex c=fromXY(fx,fy);
            complex c0=c;

            int max=50;
            int i=0;
            for(i=0;i<max&&c.r<10000;i++){
                c=c^2;
                c=c+c0;
            }

            float f=((float)i)/((float)max);
            img.setPixel(x,y,f);
        }
    }
    end=curTime();
    long diff1=end-start;

    // Pass 2: the same work on a single thread, for comparison.
    start=curTime();
    for(int x=0;x<w;x++){
        for(int y=0;y<h;y++){
            float fx=x;
            float fy=y;
            fx/=w;
            fy/=h;
            fx*=4;
            fy*=4;
            fx-=2;
            fy-=2;

            complex c=fromXY(fx,fy);
            complex c0=c;

            int max=50;
            int i=0;
            for(i=0;i<max&&c.r<10000;i++){
                c=c^2;
                c=c+c0;
            }

            float f=((float)i)/((float)max);
            img.setPixel(x,y,f);
        }
    }
    end=curTime();
    long diff2=end-start;

    printf("With OMP : %ldms\n",diff1);
    printf("Without OMP : %ldms\n",diff2);
    // This is the time saved in ms; diff2/diff1 would give the ratio.
    printf("Speedup : %ldms\n",diff2-diff1);

    img.clamp();
    img.save("mandelbrot.ppm");
    img.dealloc();
    return 0;
}
On my computer, this gives the following:
With OMP : 451ms
Without OMP : 1475ms
Speedup : 1024ms
Pretty awesome! The parallel pass saves 1024 ms, which works out to roughly 3.3 times faster (1475 / 451).
This can be applied to any loop in C++ whose iterations do not depend on each other, and the speedup can be large, although it is limited by the number of cores you have. I get an even bigger speedup when using OpenMP with some of my pathtracing algorithms.
OpenMP makes it much easier to run multiple things on the CPU at once, and it is much easier to use than OpenCL.