It has been a while since my last post, but a GPU demo program has been published on the OpenCV page at SourceForge.
http://sourceforge.net/projects/opencvlibrary/files/opencv-win/2.3.1/
The version is even labeled OpenCV 2.3.2, which hasn't been released yet...
So I downloaded the file right away and tried it out.
My environment:
OS: Windows 7 64-bit (the sample itself runs as a 32-bit program)
CPU: Core i7 870 (2.93 GHz)
GPU: NVIDIA GeForce GTX 470
Here are the results of running the performance evaluation sample (demo_performance.exe):
| CPU msec | GPU msec | SPEEDUP | DESCRIPTION |
| --- | --- | --- | --- |
| matchTemplate | |||
| 480 | 5 | x86.5 | src 3000, templ 5, 32F, CCORR |
| 545 | 21 | x24.9 | src 3000, templ 25, 32F, CCORR |
| 792 | 31 | x25.4 | src 3000, templ 5, 32F, CCORR |
| minMaxLoc | |||
| 7 | 1 | x4.81 | src 2000, 32F, no mask |
| 28 | 2 | x14.3 | src 4000, 32F, no mask |
| 106 | 3 | x28.2 | src 8000, 32F, no mask |
| remap | |||
| 8 | 0 | x35.9 | src 1000,8UC1 |
| 30 | 0 | x50.6 | src 2000,8UC1 |
| 124 | 2 | x59.1 | src 4000,8UC1 |
| 12 | 0 | x32.2 | src 1000,8UC3 |
| 42 | 1 | x34.6 | src 2000,8UC3 |
| 164 | 4 | x34.7 | src 4000,8UC3 |
| 11 | 0 | x37.8 | src 1000,8UC4 |
| 40 | 0 | x41.4 | src 2000,8UC4 |
| 156 | 3 | x43.7 | src 4000,8UC4 |
| 29 | 0 | x79.7 | src 1000,16SC3 |
| 117 | 1 | x88.8 | src 2000,16SC3 |
| 481 | 4 | x98.1 | src 4000,16SC3 |
| dft | |||
| 107 | 3 | x29.2 | size 1000, 32FC2, complex-to complex |
| 402 | 8 | x45.3 | size 2000, 32FC2, complex-to complex |
| 1781 | 32 | x55.4 | size 4000, 32FC2, complex-to complex |
| cornerHarris | |||
| 129 | 19 | x6.55 | size 2000, 32F |
| 499 | 70 | x7.06 | size 4000, 32F |
| integral | |||
| 28 | 15 | x1.82 | size 4000, 8U |
| 20 | 9 | x2.04 | size 4000, 8U |
| 20 | 9 | x2.11 | size 4000, 8U |
| 21 | 9 | x2.2 | size 4000, 8U |
| 21 | 9 | x2.15 | size 4000, 8U |
| norm | |||
| 139 | 5 | x27.3 | size 2000, 32FC4, NORM_INF |
| 310 | 8 | x35.2 | size 3000, 32FC4, NORM_INF |
| 555 | 14 | x38.2 | size 4000, 32FC4, NORM_INF |
| meanShift | |||
| 304 | 8 | x36.3 | size 400, 8UC3 vs 8UC4 |
| 1187 | 29 | x40.2 | size 800, 8UC3 vs 8UC4 |
| SURF | |||
| 5179 | 117 | x43.9 | |
| BruteForceMatcher | |||
| 1230 | 8 | x144 | match |
| 1257 | 9 | x136 | knnMatch, 2 |
| 1301 | 9 | x139 | knnMatch, 3 |
| 1220 | 9 | x135 | radiusMatch |
| magnitude | |||
| 41 | 0 | x86 | size 2000 |
| 88 | 1 | x85 | size 3000 |
| 162 | 1 | x95 | size 4000 |
| add | |||
| 10 | 0 | x20.1 | size 2000, 32F |
| 22 | 1 | x21.4 | size 3000, 32F |
| 41 | 1 | x23.9 | size 4000, 32F |
| log | |||
| 25 | 0 | x55.8 | size 2000, 32F |
| 54 | 0 | x66.1 | size 3000, 32F |
| 97 | 1 | x71.3 | size 4000, 32F |
| exp | |||
| 25 | 0 | x53 | size 2000, 32F |
| 58 | 0 | x73.7 | size 3000, 32F |
| 10 | 1 | x74.6 | size 4000, 32F |
| mulSpectrums | |||
| 24 | 0 | x53 | size 2000, 32F |
| 58 | 0 | x73.7 | size 3000, 32F |
| 101 | 1 | x74.6 | size 4000, 32F |
| resize | |||
| 7 | 0 | x33.5 | size 1000, 8UC1, up |
| 28 | 0 | x50.4 | size 2000, 8UC1, up |
| 64 | 1 | x53.3 | size 3000, 8UC1, up |
| 2 | 0 | x26.6 | size 1000, 8UC1, down |
| 8 | 0 | x51.6 | size 2000, 8UC1, down |
| 18 | 0 | x94.6 | size 3000, 8UC1, down |
| 21 | 2 | x8.5 | size 1000, 8UC3, up |
| 82 | 8 | x10.2 | size 2000, 8UC3, up |
| 187 | 17 | x10.9 | size 3000, 8UC3, up |
| 7 | 0 | x13.8 | size 1000, 8UC3, down |
| 26 | 1 | x23.8 | size 2000, 8UC3, down |
| 58 | 1 | x32.7 | size 3000, 8UC3, down |
| 27 | 1 | x17.1 | size 1000, 8UC4, up |
| 108 | 4 | x26 | size 2000, 8UC4, up |
| 250 | 9 | x27.5 | size 3000, 8UC4, up |
| 9 | 0 | x14.6 | size 1000, 8UC4, down |
| 32 | 0 | x37.4 | size 2000, 8UC4, down |
| 73 | 1 | x52 | size 3000, 8UC4, down |
| 9 | 2 | x4.72 | size 1000, 32FC1, up |
| 41 | 6 | x6.25 | size 2000, 32FC1, up |
| 88 | 13 | x6.45 | size 3000, 32FC1, up |
| 2 | 0 | x4.92 | size 1000, 32FC1, down |
| 10 | 0 | x12.1 | size 2000, 32FC1, down |
| 23 | 1 | x17.2 | size 3000, 32FC1, down |
| cvtColor | |||
| 29 | 0 | x37.1 | size 4000, CV_GRAY2BGRA |
| 82 | 1 | x61.8 | size 4000, CV_BGR2YCrCb |
| 105 | 1 | x71.3 | size 4000, CV_YCrCb2BGR |
| 123 | 1 | x84.4 | size 4000, CV_BGR2XYZ |
| 116 | 1 | x83.7 | size 4000, CV_XYZ2BGR |
| 195 | 2 | x72.9 | size 4000, CV_BGR2HSV |
| 550 | 2 | x189 | size 4000, CV_HSV2BGR |
| erode | |||
| 9 | 3 | x2.65 | size 2000 |
| 22 | 7 | x3.07 | size 3000 |
| 39 | 11 | x3.35 | size 4000 |
| threshold | |||
| 0 | 0 | x1.62 | size 1000, 8U, THRESH_BINARY |
| 1 | 0 | x6.15 | size 2000, 8U, THRESH_BINARY |
| 3 | 0 | x11.1 | size 3000, 8U, THRESH_BINARY |
| 6 | 0 | x11.4 | size 4000, 8U, THRESH_BINARY |
| 1 | 0 | x7.49 | size 1000, 32F, THRESH_BINARY |
| 6 | 0 | x18.8 | size 2000, 32F, THRESH_BINARY |
| 14 | 0 | x19.6 | size 3000, 32F, THRESH_BINARY |
| 25 | 1 | x21.9 | size 4000, 32F, THRESH_BINARY |
| pow | |||
| 5 | 0 | x32.9 | size 1000, 32F |
| 20 | 0 | x57.1 | size 2000, 32F |
| 44 | 0 | x58.2 | size 3000, 32F |
| 83 | 1 | x55.8 | size 4000, 32F |
| projectPoints | |||
| 49 | 2 | x23.3 | size 1000000 |
| 37 | 2 | x16.1 | size 714285 |
| 26 | 1 | x24.5 | size 510203 |
| 19 | 0 | x20.9 | size 364430 |
| 12 | 0 | x21.6 | size 230307 |
| solvePnPRansac | |||
| 217 | 118 | x1.84 | num_points 5000 |
| 392 | 120 | x3.25 | num_points 18800 |
| 1315 | 127 | x10.3 | num_points 70688 |
| 4984 | 160 | x31 | num_points 265786 |
| GaussianBlur | |||
| 1 | 0 | x4.26 | 8UC1, size 1000 |
| 6 | 1 | x5.61 | 8UC1, size 2000 |
| 14 | 2 | x6.08 | 8UC1, size 3000 |
| 24 | 3 | x6.36 | 8UC1, size 4000 |
| 5 | 0 | x7.57 | 8UC4, size 1000 |
| 24 | 2 | x10.2 | 8UC4, size 2000 |
| 56 | 4 | x11.8 | 8UC4, size 3000 |
| 100 | 8 | x11.7 | 8UC4, size 4000 |
| 2 | 0 | x5.16 | 32FC1, size 1000 |
| 8 | 1 | x7.39 | 32FC1, size 2000 |
| 17 | 2 | x7.6 | 32FC1, size 3000 |
| 33 | 3 | x8.63 | 32FC1, size 4000 |
| pyrDown | |||
| 18 | 5 | x3.55 | 8UC1, size 4000 |
| 10 | 2 | x3.62 | 8UC1, size 3000 |
| 4 | 1 | x3.41 | 8UC1, size 2000 |
| 1 | 0 | x2.74 | 8UC1, size 1000 |
| 64 | 6 | x9.52 | 8UC3, size 4000 |
| 38 | 3 | x10 | 8UC3, size 3000 |
| 16 | 1 | x9.26 | 8UC3, size 2000 |
| 4 | 0 | x7.71 | 8UC3, size 1000 |
| 66 | 7 | x8.98 | 8UC4, size 4000 |
| 38 | 4 | x9.14 | 8UC4, size 3000 |
| 17 | 2 | x6.36 | 8UC4, size 2000 |
| 4 | 0 | x7.98 | 8UC4, size 1000 |
| 107 | 6 | x15.6 | 16SC4, size 4000 |
| 60 | 3 | x15.6 | 16SC4, size 3000 |
| 26 | 1 | x14.9 | 16SC4, size 2000 |
| 7 | 0 | x10.2 | 16SC4, size 1000 |
| 29 | 5 | x5.7 | 32FC1, size 4000 |
| 17 | 3 | x5.92 | 32FC1, size 3000 |
| 7 | 1 | x5.65 | 32FC1, size 2000 |
| 1 | 0 | x4.62 | 32FC1, size 1000 |
| 88 | 6 | x13 | 32FC3, size 4000 |
| 52 | 3 | x13.8 | 32FC3, size 3000 |
| 22 | 1 | x12.7 | 32FC3, size 2000 |
| 5 | 0 | x4.92 | 32FC3, size 1000 |
| 122 | 7 | x17 | 32FC4, size 4000 |
| 67 | 4 | x16.9 | 32FC4, size 3000 |
| 30 | 1 | x16.6 | 32FC4, size 2000 |
| 7 | 0 | x14.4 | 32FC4, size 1000 |
| pyrUp | |||
| 51 | 5 | x8.97 | 8UC1, size 2000 |
| 12 | 1 | x8.52 | 8UC1, size 1000 |
| 145 | 8 | x17.3 | 8UC3, size 2000 |
| 36 | 2 | x16.5 | 8UC3, size 1000 |
| 198 | 11 | x17.6 | 8UC4, size 2000 |
| 48 | 2 | x16.7 | 8UC4, size 1000 |
| 170 | 8 | x19.6 | 16SC3, size 2000 |
| 44 | 2 | x20.1 | 16SC3, size 1000 |
| 69 | 5 | x12.2 | 32FC1, size 2000 |
| 17 | 1 | x11.4 | 32FC1, size 1000 |
| 205 | 7 | x27.3 | 32FC3, size 2000 |
| 21 | 1 | x26.2 | 32FC3, size 1000 |
| equalizeHist | |||
| 2 | 1 | x1.93 | size 1000 |
| 10 | 2 | x4.78 | size 2000 |
| 23 | 3 | x6.37 | size 3000 |
| Canny | |||
| 29 | 3 | x9.63 | |
| reduce | |||
| 1 | 0 | x7.79 | size 1000, dim = 0 |
| 1 | 0 | x10.1 | size 1000, dim = 1 |
| 6 | 0 | x14.5 | size 2000, dim = 0 |
| 6 | 0 | x25.2 | size 2000, dim = 1 |
| 13 | 0 | x17.5 | size 3000, dim = 0 |
| 13 | 0 | x29.3 | size 3000, dim = 1 |
average GPU speedup: x29.202
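
A note on reading the table: the msec columns are rounded to whole milliseconds, so the SPEEDUP value cannot be reproduced by dividing the two printed numbers, and rows showing 0 GPU msec still have a finite speedup. The sketch below assumes that SPEEDUP is simply CPU time divided by GPU time, measured at a finer resolution than 1 ms (the post does not confirm this). The helper names are my own, not part of the demo.

```python
def naive_speedup(cpu_ms, gpu_ms):
    """Ratio of the rounded table values; None when the GPU column shows 0."""
    if gpu_ms == 0:
        return None
    return cpu_ms / gpu_ms


def implied_gpu_ms(cpu_ms, speedup):
    """GPU time implied by the printed speedup.

    Assumption: speedup = CPU time / GPU time. The result is only
    approximate because cpu_ms is itself a rounded value.
    """
    return cpu_ms / speedup


# (cpu_ms, gpu_ms, printed speedup) copied from the table above
rows = [
    (480, 5, 86.5),  # matchTemplate, src 3000, templ 5
    (8, 0, 35.9),    # remap, src 1000, 8UC1
]

for cpu_ms, gpu_ms, printed in rows:
    print(cpu_ms, gpu_ms, printed,
          naive_speedup(cpu_ms, gpu_ms),
          round(implied_gpu_ms(cpu_ms, printed), 3))
```

For the first matchTemplate row, for example, 480 / 5 gives 96 rather than the printed 86.5, which suggests the GPU time was closer to 5.5 ms than to 5 ms.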
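About the size notation: size 1000 means the image is 1000×1000 pixels.
It is often said that GPU image processing spends so much time on memory transfers that the total processing time doesn't improve much,
so I can't quite take these results at face value until I have looked at the source code...
Still, for now at least, the results look reasonably fast.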
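
To get a feel for how much data each row handles, here is a small sketch that turns a size and an OpenCV-style type name from the DESCRIPTION column (such as 8UC3 or 32FC4) into a byte count. The function name `image_bytes` is hypothetical. It assumes square images as described above and treats a type written without a channel suffix (e.g. 32F) as single-channel.

```python
import re

# Bytes per channel for the OpenCV depth names
DEPTH_BYTES = {"8U": 1, "8S": 1, "16U": 2, "16S": 2, "32S": 4, "32F": 4, "64F": 8}


def image_bytes(size, type_name):
    """Return the raw byte count of a size x size image of the given type."""
    m = re.fullmatch(r"(\d+[USF])(?:C(\d+))?", type_name)
    if m is None or m.group(1) not in DEPTH_BYTES:
        raise ValueError("unrecognized type name: " + type_name)
    channels = int(m.group(2) or 1)  # no suffix: assume one channel
    return size * size * DEPTH_BYTES[m.group(1)] * channels


print(image_bytes(1000, "8UC1"))   # 1000 * 1000 * 1 byte
print(image_bytes(4000, "32FC4"))  # 4000 * 4000 * 4 bytes * 4 channels
```

So the largest norm test (size 4000, 32FC4) works on a single image of 256,000,000 bytes.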
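
To put a rough number on the transfer concern, here is a back-of-the-envelope sketch. It assumes the GPU msec column excludes host-to-device and device-to-host copies (I haven't verified this) and uses a made-up effective bus bandwidth of 5 GB/s. Both are assumptions, not measurements, and the function names are mine.

```python
def transfer_ms(num_bytes, bandwidth_gb_s):
    """Time in ms to move num_bytes over a bus with the given GB/s bandwidth."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3


def effective_speedup(cpu_ms, gpu_ms, num_bytes, bandwidth_gb_s):
    """Speedup once the transfer time is added to the GPU side."""
    return cpu_ms / (gpu_ms + transfer_ms(num_bytes, bandwidth_gb_s))


ASSUMED_BANDWIDTH_GB_S = 5.0  # assumption, not a measured value

# "add, size 4000, 32F" row: CPU 41 ms, GPU 1 ms
matrix_bytes = 4000 * 4000 * 4   # one 4000x4000 single-channel float image
moved_bytes = 3 * matrix_bytes   # upload two inputs, download one result

print(transfer_ms(moved_bytes, ASSUMED_BANDWIDTH_GB_S))
print(effective_speedup(41, 1, moved_bytes, ASSUMED_BANDWIDTH_GB_S))
```

Under these assumptions the copies alone take roughly 38 ms, so for a cheap operation like add the end-to-end speedup drops to about x1. Heavier operations such as SURF or dft would be affected far less, and keeping intermediate results on the GPU across several operations avoids paying the transfer cost each time.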
The source code is here:
https://code.ros.org/trac/opencv/changeset/6950?utm_source=twitterfeed&utm_medium=twitter