It's been a while since my last post, but a GPU demo program has been published on the OpenCV page on SourceForge.
http://sourceforge.net/projects/opencvlibrary/files/opencv-win/2.3.1/
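The version is even listed as OpenCV 2.3.2, which hasn't been released yet...
So I downloaded the file right away and tried it out.
My environment:
OS: Windows 7 64-bit (the samples ran as 32-bit, though)
CPU: Core i7 870 (2.93GHz)
GPU: NVIDIA GeForce GTX470
Here are the results from running the performance evaluation sample (demo_performance.exe):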
CPU msec | GPU msec | SPEEDUP | DESCRIPTION |
matchTemplate | |||
480 | 5 | x86.5 | src 3000, templ 5, 32F, CCORR |
545 | 21 | x24.9 | src 3000, templ 25, 32F, CCORR |
792 | 31 | x25.4 | src 3000, templ 5, 32F, CCORR |
minMaxLoc | |||
7 | 1 | x4.81 | src 2000, 32F, no mask |
28 | 2 | x14.3 | src 4000, 32F, no mask |
106 | 3 | x28.2 | src 8000, 32F, no mask |
remap | |||
8 | 0 | x35.9 | src 1000,8UC1 |
30 | 0 | x50.6 | src 2000,8UC1 |
124 | 2 | x59.1 | src 4000,8UC1 |
12 | 0 | x32.2 | src 1000,8UC3 |
42 | 1 | x34.6 | src 2000,8UC3 |
164 | 4 | x34.7 | src 4000,8UC3 |
11 | 0 | x37.8 | src 1000,8UC4 |
40 | 0 | x41.4 | src 2000,8UC4 |
156 | 3 | x43.7 | src 4000,8UC4 |
29 | 0 | x79.7 | src 1000,16SC3 |
117 | 1 | x88.8 | src 2000,16SC3 |
481 | 4 | x98.1 | src 4000,16SC3 |
dft | |||
107 | 3 | x29.2 | size 1000, 32FC2, complex-to-complex |
402 | 8 | x45.3 | size 2000, 32FC2, complex-to-complex |
1781 | 32 | x55.4 | size 4000, 32FC2, complex-to-complex |
cornerHarris | |||
129 | 19 | x6.55 | size 2000, 32F |
499 | 70 | x7.06 | size 4000, 32F |
integral | |||
28 | 15 | x1.82 | size 4000, 8U |
20 | 9 | x2.04 | size 4000, 8U |
20 | 9 | x2.11 | size 4000, 8U |
21 | 9 | x2.2 | size 4000, 8U |
21 | 9 | x2.15 | size 4000, 8U |
norm | |||
139 | 5 | x27.3 | size 2000, 32FC4, NORM_INF |
310 | 8 | x35.2 | size 3000, 32FC4, NORM_INF |
555 | 14 | x38.2 | size 4000, 32FC4, NORM_INF |
meanShift | |||
304 | 8 | x36.3 | size 400, 8UC3 vs 8UC4 |
1187 | 29 | x40.2 | size 800, 8UC3 vs 8UC4 |
SURF | |||
5179 | 117 | x43.9 | |
BruteForceMatcher | |||
1230 | 8 | x144 | match |
1257 | 9 | x136 | knnMatch, 2 |
1301 | 9 | x139 | knnMatch, 3 |
1220 | 9 | x135 | radiusMatch |
magnitude | |||
41 | 0 | x86 | size 2000 |
88 | 1 | x85 | size 3000 |
162 | 1 | x95 | size 4000 |
add | |||
10 | 0 | x20.1 | size 2000, 32F |
22 | 1 | x21.4 | size 3000, 32F |
41 | 1 | x23.9 | size 4000, 32F |
log | |||
25 | 0 | x55.8 | size 2000, 32F |
54 | 0 | x66.1 | size 3000, 32F |
97 | 1 | x71.3 | size 4000, 32F |
exp | |||
25 | 0 | x53 | size 2000, 32F |
58 | 0 | x73.7 | size 3000, 32F |
10 | 1 | x74.6 | size 4000, 32F |
mulSpectrums | |||
24 | 0 | x53 | size 2000, 32F |
58 | 0 | x73.7 | size 3000, 32F |
101 | 1 | x74.6 | size 4000, 32F |
resize | |||
7 | 0 | x33.5 | size 1000, 8UC1, up |
28 | 0 | x50.4 | size 2000, 8UC1, up |
64 | 1 | x53.3 | size 3000, 8UC1, up |
2 | 0 | x26.6 | size 1000, 8UC1, down |
8 | 0 | x51.6 | size 2000, 8UC1, down |
18 | 0 | x94.6 | size 3000, 8UC1, down |
21 | 2 | x8.5 | size 1000, 8UC3, up |
82 | 8 | x10.2 | size 2000, 8UC3, up |
187 | 17 | x10.9 | size 3000, 8UC3, up |
7 | 0 | x13.8 | size 1000, 8UC3, down |
26 | 1 | x23.8 | size 2000, 8UC3, down |
58 | 1 | x32.7 | size 3000, 8UC3, down |
27 | 1 | x17.1 | size 1000, 8UC4, up |
108 | 4 | x26 | size 2000, 8UC4, up |
250 | 9 | x27.5 | size 3000, 8UC4, up |
9 | 0 | x14.6 | size 1000, 8UC4, down |
32 | 0 | x37.4 | size 2000, 8UC4, down |
73 | 1 | x52 | size 3000, 8UC4, down |
9 | 2 | x4.72 | size 1000, 32FC1, up |
41 | 6 | x6.25 | size 2000, 32FC1, up |
88 | 13 | x6.45 | size 3000, 32FC1, up |
2 | 0 | x4.92 | size 1000, 32FC1, down |
10 | 0 | x12.1 | size 2000, 32FC1, down |
23 | 1 | x17.2 | size 3000, 32FC1, down |
cvtColor | |||
29 | 0 | x37.1 | size 4000, CV_GRAY2BGRA |
82 | 1 | x61.8 | size 4000, CV_BGR2YCrCb |
105 | 1 | x71.3 | size 4000, CV_YCrCb2BGR |
123 | 1 | x84.4 | size 4000, CV_BGR2XYZ |
116 | 1 | x83.7 | size 4000, CV_XYZ2BGR |
195 | 2 | x72.9 | size 4000, CV_BGR2HSV |
550 | 2 | x189 | size 4000, CV_HSV2BGR |
erode | |||
9 | 3 | x2.65 | size 2000 |
22 | 7 | x3.07 | size 3000 |
39 | 11 | x3.35 | size 4000 |
threshold | |||
0 | 0 | x1.62 | size 1000, 8U, THRESH_BINARY |
1 | 0 | x6.15 | size 2000, 8U, THRESH_BINARY |
3 | 0 | x11.1 | size 3000, 8U, THRESH_BINARY |
6 | 0 | x11.4 | size 4000, 8U, THRESH_BINARY |
1 | 0 | x7.49 | size 1000, 32F, THRESH_BINARY |
6 | 0 | x18.8 | size 2000, 32F, THRESH_BINARY |
14 | 0 | x19.6 | size 3000, 32F, THRESH_BINARY |
25 | 1 | x21.9 | size 4000, 32F, THRESH_BINARY |
pow | |||
5 | 0 | x32.9 | size 1000, 32F |
20 | 0 | x57.1 | size 2000, 32F |
44 | 0 | x58.2 | size 3000, 32F |
83 | 1 | x55.8 | size 4000, 32F |
projectPoints | |||
49 | 2 | x23.3 | size 1000000 |
37 | 2 | x16.1 | size 714285 |
26 | 1 | x24.5 | size 510203 |
19 | 0 | x20.9 | size 364430 |
12 | 0 | x21.6 | size 230307 |
solvePnPRansac | |||
217 | 118 | x1.84 | num_points 5000 |
392 | 120 | x3.25 | num_points 18800 |
1315 | 127 | x10.3 | num_points 70688 |
4984 | 160 | x31 | num_points 265786 |
GaussianBlur | |||
1 | 0 | x4.26 | 8UC1, size 1000 |
6 | 1 | x5.61 | 8UC1, size 2000 |
14 | 2 | x6.08 | 8UC1, size 3000 |
24 | 3 | x6.36 | 8UC1, size 4000 |
5 | 0 | x7.57 | 8UC4, size 1000 |
24 | 2 | x10.2 | 8UC4, size 2000 |
56 | 4 | x11.8 | 8UC4, size 3000 |
100 | 8 | x11.7 | 8UC4, size 4000 |
2 | 0 | x5.16 | 32FC1, size 1000 |
8 | 1 | x7.39 | 32FC1, size 2000 |
17 | 2 | x7.6 | 32FC1, size 3000 |
33 | 3 | x8.63 | 32FC1, size 4000 |
pyrDown | |||
18 | 5 | x3.55 | 8UC1, size 4000 |
10 | 2 | x3.62 | 8UC1, size 3000 |
4 | 1 | x3.41 | 8UC1, size 2000 |
1 | 0 | x2.74 | 8UC1, size 1000 |
64 | 6 | x9.52 | 8UC3, size 4000 |
38 | 3 | x10 | 8UC3, size 3000 |
16 | 1 | x9.26 | 8UC3, size 2000 |
4 | 0 | x7.71 | 8UC3, size 1000 |
66 | 7 | x8.98 | 8UC4, size 4000 |
38 | 4 | x9.14 | 8UC4, size 3000 |
17 | 2 | x6.36 | 8UC4, size 2000 |
4 | 0 | x7.98 | 8UC4, size 1000 |
107 | 6 | x15.6 | 16SC4, size 4000 |
60 | 3 | x15.6 | 16SC4, size 3000 |
26 | 1 | x14.9 | 16SC4, size 2000 |
7 | 0 | x10.2 | 16SC4, size 1000 |
29 | 5 | x5.7 | 32FC1, size 4000 |
17 | 3 | x5.92 | 32FC1, size 3000 |
7 | 1 | x5.65 | 32FC1, size 2000 |
1 | 0 | x4.62 | 32FC1, size 1000 |
88 | 6 | x13 | 32FC3, size 4000 |
52 | 3 | x13.8 | 32FC3, size 3000 |
22 | 1 | x12.7 | 32FC3, size 2000 |
5 | 0 | x4.92 | 32FC3, size 1000 |
122 | 7 | x17 | 32FC4, size 4000 |
67 | 4 | x16.9 | 32FC4, size 3000 |
30 | 1 | x16.6 | 32FC4, size 2000 |
7 | 0 | x14.4 | 32FC4, size 1000 |
pyrUp | |||
51 | 5 | x8.97 | 8UC1, size 2000 |
12 | 1 | x8.52 | 8UC1, size 1000 |
145 | 8 | x17.3 | 8UC3, size 2000 |
36 | 2 | x16.5 | 8UC3, size 1000 |
198 | 11 | x17.6 | 8UC4, size 2000 |
48 | 2 | x16.7 | 8UC4, size 1000 |
170 | 8 | x19.6 | 16SC3, size 2000 |
44 | 2 | x20.1 | 16SC3, size 1000 |
69 | 5 | x12.2 | 32FC1, size 2000 |
17 | 1 | x11.4 | 32FC1, size 1000 |
205 | 7 | x27.3 | 32FC3, size 2000 |
21 | 1 | x26.2 | 32FC3, size 1000 |
equalizeHist | |||
2 | 1 | x1.93 | size 1000 |
10 | 2 | x4.78 | size 2000 |
23 | 3 | x6.37 | size 3000 |
Canny | |||
29 | 3 | x9.63 | |
reduce | |||
1 | 0 | x7.79 | size 1000, dim = 0 |
1 | 0 | x10.1 | size 1000, dim = 1 |
6 | 0 | x14.5 | size 2000, dim = 0 |
6 | 0 | x25.2 | size 2000, dim = 1 |
13 | 0 | x17.5 | size 3000, dim = 0 |
13 | 0 | x29.3 | size 3000, dim = 1 |
average GPU speedup: x29.202
About the size notation: size 1000 means the image is 1000×1000 pixels.
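To get a feel for how much data each row involves, here is a minimal sketch that turns a size and type code from the table into a raw byte count. The helper name `image_bytes` is hypothetical, a type code without a channel suffix (such as `32F`) is assumed to mean one channel, and row padding or alignment is ignored, so treat the numbers as lower bounds.

```python
import re

# Bytes per channel element for the OpenCV depth codes that appear in the table.
DEPTH_BYTES = {"8U": 1, "16S": 2, "32F": 4}


def image_bytes(size, type_code):
    """Return the raw byte size of a size x size image of the given type.

    type_code is e.g. "8UC3" or "32F"; a missing channel suffix is
    assumed to mean a single channel. Row padding is ignored.
    """
    m = re.fullmatch(r"(8U|16S|32F)(?:C(\d+))?", type_code)
    if m is None:
        raise ValueError(f"unsupported type code: {type_code}")
    channels = int(m.group(2)) if m.group(2) else 1
    return size * size * DEPTH_BYTES[m.group(1)] * channels


if __name__ == "__main__":
    for size, code in [(1000, "8UC1"), (4000, "8UC3"), (4000, "32FC4")]:
        print(f"size {size}, {code}: {image_bytes(size, code) / 1e6:.0f} MB")
```

Under these assumptions a size 4000, 32FC4 image is 256 MB of raw pixel data, while a size 1000, 8UC1 image is only 1 MB.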
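It's often said that GPU image processing spends so much time on memory transfers that the total processing time doesn't improve much,
so until I look at the source code I can't quite take these results at face value...
Still, for now, the results look reasonably fast.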
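To put a rough number on the transfer concern, here is a minimal sketch of how copy time would dilute a measured speedup. It is not taken from the demo's source: the function names and the 5 GB/s host-to-device bandwidth are assumptions for illustration only, and whether the GPU column above already includes upload/download time is something only the source code can answer.

```python
def copy_time_ms(num_bytes, bandwidth_gb_per_s):
    """Time in ms to move num_bytes at the given bandwidth (1 GB = 1e9 bytes)."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


def effective_speedup(cpu_ms, gpu_ms, transfer_ms=0.0):
    """CPU time divided by GPU time plus any host<->device transfer time."""
    total_gpu_ms = gpu_ms + transfer_ms
    if total_gpu_ms <= 0:
        raise ValueError("total GPU time must be positive")
    return cpu_ms / total_gpu_ms


if __name__ == "__main__":
    # "norm" row, size 4000, 32FC4: 555 ms CPU, 14 ms GPU (as printed, whole ms).
    upload_ms = copy_time_ms(4000 * 4000 * 4 * 4, 5.0)  # assumed 5 GB/s
    print(f"upload: {upload_ms:.1f} ms")
    print(f"kernel only: x{effective_speedup(555, 14):.1f}")
    print(f"with upload: x{effective_speedup(555, 14, upload_ms):.1f}")
```

With these assumed numbers, uploading the 256 MB input would take about 51 ms, and counting it would pull that row's speedup down to roughly x8.5. That is why it matters what the demo actually times.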
The source code is here:
https://code.ros.org/trac/opencv/changeset/6950?utm_source=twitterfeed&utm_medium=twitter