I was extremely bored, and I wanted to make something cool. Nowadays, cool equals artificial intelligence, computer vision, or anything related.
This project takes a webcam feed as an input and monitors it for light blobs. It can be a flashlight, a laser, a match, or any light source that can be distinguished from the background.
The software grabs every frame from the webcam and passes the frame's BitmapData.Scan0 pointer to an external x64 library written in C++ for speed. This library scans the data pixel by pixel and builds a list of the brightest pixels; this list contains the points of the blob. To get the center of the blob, the function searches the list for the four extreme points (leftmost, rightmost, topmost, and bottommost) and calculates the center from them. The center is added to a list, from which another function, running in another thread, draws the "detected blob history" image.
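The scan and the center-from-extremes step might look something like this. This is only a sketch of the idea, not the project's actual library code: the function names, the 24-bit BGR layout, and the brightness threshold are my assumptions.

```cpp
#include <cstdint>
#include <vector>

struct Point { int x, y; };

// Scan a 24-bit BGR buffer row by row and collect every pixel whose
// average channel value exceeds `threshold` -- these form the blob.
std::vector<Point> FindBrightPixels(const uint8_t* data, int width, int height,
                                    int stride, int threshold) {
    std::vector<Point> blob;
    for (int y = 0; y < height; ++y) {
        const uint8_t* row = data + y * stride;
        for (int x = 0; x < width; ++x) {
            const uint8_t* px = row + x * 3;
            int brightness = (px[0] + px[1] + px[2]) / 3;
            if (brightness > threshold) blob.push_back({x, y});
        }
    }
    return blob;
}

// The blob center is taken as the midpoint of the extreme coordinates
// (leftmost, rightmost, topmost, bottommost). Assumes a non-empty blob.
Point BlobCenter(const std::vector<Point>& blob) {
    int minX = blob[0].x, maxX = blob[0].x;
    int minY = blob[0].y, maxY = blob[0].y;
    for (const Point& p : blob) {
        if (p.x < minX) minX = p.x;
        if (p.x > maxX) maxX = p.x;
        if (p.y < minY) minY = p.y;
        if (p.y > maxY) maxY = p.y;
    }
    return {(minX + maxX) / 2, (minY + maxY) / 2};
}
```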
Every blob is an object of type Pixel and contains the X and Y coordinates, an associated shape ID, and the exact time it was spotted on a frame.
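A record along those lines might look like this. The field names are my guesses, not the project's actual code:

```cpp
#include <chrono>

// Hypothetical sketch of the Pixel type described above: the point's
// coordinates, the shape it belongs to, and when it was seen on a frame.
struct Pixel {
    int x;
    int y;
    int shapeId;                                      // which detected shape this point belongs to
    std::chrono::steady_clock::time_point spottedAt;  // when the point appeared on a frame
};
```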
The frame is a Bitmap image, which has a GetPixel() function, so that should be the way to go. However, this function is really, really slow: a scan takes 121 ms to complete. That is not acceptable for 30 FPS video (about 33 ms per frame), so further optimization is required.
You can get a pointer to the image's pixels from the BitmapData.Scan0 property. Cutting out the middleman and scanning this memory directly is 83% faster.
The managed version of this scanner takes 20 ms to complete under x86. Compiling the application into an x64 assembly runs the algorithm in under 8 ms.
The algorithm can be further optimized by using the pointer in an unsafe code block instead of marshaling the data into a byte array. This way the scan runs in 5 ms.
The exact same code, rewritten in C++ and compiled into an x64 library, scans the picture in just 2 ms.
After another take on the C# code, I separated the picture into several parts and parallelized their scan. Instead of a linear scan on a single core, the scan now runs on all four cores of my CPU with two threads each, which makes it way faster: from 5 ms to 0.3 ms!
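The split can be sketched like this (an illustration of the idea, not the original code): each thread scans its own horizontal strip of rows into a private list, so no locking is needed until the lists are merged at the end.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

struct Point { int x, y; };

// Scan rows [rowBegin, rowEnd) of a 24-bit BGR buffer for bright pixels.
void ScanStrip(const uint8_t* data, int width, int stride, int threshold,
               int rowBegin, int rowEnd, std::vector<Point>& out) {
    for (int y = rowBegin; y < rowEnd; ++y) {
        const uint8_t* row = data + y * stride;
        for (int x = 0; x < width; ++x) {
            const uint8_t* px = row + x * 3;
            if ((px[0] + px[1] + px[2]) / 3 > threshold) out.push_back({x, y});
        }
    }
}

// Split the image into one strip per thread, scan them concurrently,
// then concatenate the per-thread results.
std::vector<Point> ParallelScan(const uint8_t* data, int width, int height,
                                int stride, int threshold, int numThreads) {
    std::vector<std::vector<Point>> partial(numThreads);
    std::vector<std::thread> workers;
    int rowsPerStrip = (height + numThreads - 1) / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        int begin = t * rowsPerStrip;
        int end = std::min(height, begin + rowsPerStrip);
        workers.emplace_back(ScanStrip, data, width, stride, threshold,
                             begin, end, std::ref(partial[t]));
    }
    std::vector<Point> all;
    for (int t = 0; t < numThreads; ++t) {
        workers[t].join();
        all.insert(all.end(), partial[t].begin(), partial[t].end());
    }
    return all;
}
```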
So there you have it: after lots of thinking and coding, I was able to make the algorithm 99.7% faster, from 121 ms down to just 0.3 ms.
I also made attempts at using the GPU; however, since the algorithm is quite simple, converting the image into a parallel array, processing it on the GPU, and converting it back took much more time than the CPU scan itself.
What this does is move the mouse as the blob moves across the frame. It is provided only as a proof of concept: the input image is 640x480 divided by 2, and the blob is about 10 pixels big (in the case of a flashlight), which means there are roughly 32*24 (768) positions where the blob can move the mouse. Also, the image has a 4:3 aspect ratio, while your screen is probably 16:9.
Given these facts, the function works surprisingly well.
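The mapping itself can be as simple as proportional scaling from frame coordinates to screen coordinates. A minimal sketch, assuming the 640x480 frame halved to 320x240 and a hypothetical 1920x1080 screen; the 4:3 to 16:9 stretch falls out of the two scale factors:

```cpp
struct Cursor { int x, y; };

// Map a blob position on the camera frame to a screen coordinate by
// scaling each axis independently. Because the horizontal and vertical
// factors differ, the 4:3 frame is stretched onto the 16:9 screen.
Cursor MapBlobToScreen(int blobX, int blobY, int frameW, int frameH,
                       int screenW, int screenH) {
    return {blobX * screenW / frameW, blobY * screenH / frameH};
}
```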
This is not your regular optical character recognition, because all the letters are badly drawn. Serious science is going on behind the scenes: your shape is processed by an artificial neural network with millions of connections, trained to recognize different letters. The method used to train the network is called backwards propagation of errors, or backpropagation for short. This type of neural network only works on numbers, so sadly, it won't become Skynet overnight.
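To give a feel for the training rule, here is backpropagation boiled down to a single sigmoid neuron. This is nothing like the real letter-recognition network, just an illustration of the principle: each weight is nudged against the gradient of the squared error.

```cpp
#include <cmath>

// One sigmoid neuron trained by gradient descent on squared error --
// the same rule backpropagation applies layer by layer in a real network.
struct Neuron {
    double w = 0.5, b = 0.0;

    double Forward(double x) const {
        return 1.0 / (1.0 + std::exp(-(w * x + b)));
    }

    void Train(double x, double target, double lr) {
        double y = Forward(x);
        // Error gradient at the pre-activation: (y - t) * sigmoid'(z).
        double grad = (y - target) * y * (1.0 - y);
        w -= lr * grad * x;  // propagate the error back to the weight
        b -= lr * grad;      // ...and to the bias
    }
};
```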
This is probably the hardest part of the project, but also the most awesome.
This option can translate the flashes into Morse code and then the Morse code into letters. The space between two codes is 3 to 15 frames with the light off; a short code (dot) is 3 to 6 frames with the light on; a long code (dash) is 7 or more frames; and the accumulated codes are translated into a letter after 16 or more frames with the light off.
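The timing rules above can be expressed as a small classifier over run lengths. A sketch using the frame counts quoted above (the type and function names are mine):

```cpp
#include <string>
#include <vector>

// A run of consecutive frames with the light on (true) or off (false).
struct Run { bool lightOn; int frames; };

// Turn on/off runs into Morse symbols with the thresholds above:
// 3-6 on-frames -> dot, 7+ -> dash; an off-run of 16+ frames ends the
// current letter (marked here with a space between symbol groups).
std::string RunsToMorse(const std::vector<Run>& runs) {
    std::string morse;
    for (const Run& r : runs) {
        if (r.lightOn) {
            if (r.frames >= 7) morse += '-';
            else if (r.frames >= 3) morse += '.';
            // flashes shorter than 3 frames are treated as noise
        } else if (r.frames >= 16 && !morse.empty() && morse.back() != ' ') {
            morse += ' ';  // long dark gap: the letter is complete
        }
    }
    return morse;
}
```

The resulting dot/dash groups can then be looked up in an ordinary Morse table to produce letters.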