Wednesday, 3 October 2012

Implementing Gesture/Air Mouse


The operating system used for the project is Ubuntu 11.04. The platform used for the project is scientific Python (SciPy). SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering; it is also the name of a very popular conference on scientific programming with Python. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays and provides many user-friendly and efficient numerical routines, such as routines for numerical integration and optimization. Together they run on all popular operating systems, are quick to install, and are free of charge. NumPy and SciPy are easy to use, but powerful enough to be depended upon by some of the world's leading scientists and engineers.
          An additional library which we use is OpenCV. OpenCV (Open Source Computer Vision Library) is a library of programming functions aimed mainly at real-time computer vision, developed by Intel and now supported by Willow Garage. It is free for use under the open-source BSD license. It has C++, C, and Python interfaces (with Java coming soon) running on Windows, Linux, Android, and Mac. The library has more than 2500 optimized algorithms, is used around the world, has more than 2.5 million downloads, and has 40,000 people in its user group. Uses range from interactive art to mine inspection, from stitching maps on the web through to advanced robotics. The library is cross-platform and focuses mainly on real-time image processing.





1.2.1.      Operation Sequences:
1)      Webcam interfacing
Here we interface our program with the webcam. We usually prefer the inbuilt webcam of a laptop because of its apt position. A web camera of at least 3 megapixels is opted for, as its clarity and resolution keep the working efficient and smooth. A short sketch of this step follows.
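This sketch uses OpenCV's Python bindings (the cv2 module); the device index 0 is an assumption and may differ when several cameras are attached.

import cv2

# Open the default (inbuilt) webcam; index 0 is an assumption.
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise IOError("could not open the webcam")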
2)      Grab frame:
Here we grab frames from the video captured via the webcam, and these frames are imported into our program one at a time; a sketch of the grabbing loop follows.
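A minimal frame-grabbing loop, assuming the cap object opened in the previous step; cap.read() returns a success flag along with the frame as a NumPy array.

import cv2

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()           # grab one frame from the webcam
    if not ok:
        break                        # stream ended or camera unplugged
    # ... process the frame here ...
    if cv2.waitKey(1) & 0xFF == 27:  # press Esc to stop
        break
cap.release()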
3)      Segment out finger:
A blob is a region of adjacent pixels having similar intensity values. Using blob-detection routines, we detect the blob in the grabbed frame, and thus the finger (wearing a red or blue cap) is segmented out; a sketch follows.
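This segmentation sketch assumes a red cap on the finger; the numeric HSV thresholds are illustrative values, not the project's exact numbers.

import cv2
import numpy as np

def segment_red(frame):
    # HSV is less sensitive to lighting than BGR, so threshold there.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Red wraps around hue 0, so two bands are OR-ed together.
    band1 = cv2.inRange(hsv, np.array([0, 120, 70], np.uint8),
                        np.array([10, 255, 255], np.uint8))
    band2 = cv2.inRange(hsv, np.array([170, 120, 70], np.uint8),
                        np.array([180, 255, 255], np.uint8))
    mask = cv2.bitwise_or(band1, band2)
    # Erode then dilate to erase speckle noise in the mask.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=2)
    mask = cv2.dilate(mask, kernel, iterations=2)
    return mask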
4)      Calculate finger position:
The exact position of the finger is calculated using OpenCV routines; a sketch based on contours and image moments follows.
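A sketch of locating the blob centre from the mask produced above; taking the largest contour is an assumption that stands in for the size-based filtering described later.

import cv2

def blob_centre(mask):
    # findContours returns (contours, hierarchy) in OpenCV 2.x/4.x and
    # three values in 3.x; indexing with [-2] works across versions.
    contours = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    # Centroid of the blob in frame coordinates.
    return (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))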
5)      Identify gesture:
We design different gestures for different operations and implement gesture recognition to carry out the following gestures (a sketch follows the list):
Left click
Right click
Double click
Left click and hold and release
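A sketch of one possible mapping from blob observations to gestures; the post does not spell out the exact rules, so the red-moves/blue-clicks convention and the timing threshold below are illustrative assumptions.

import time

DOUBLE_CLICK_WINDOW = 0.4  # seconds; illustrative threshold

def classify(red_pos, blue_pos, last_click_time):
    # Hypothetical rules: a red blob alone moves the pointer; a blue
    # blob appearing alongside it is a left click, and two clicks in
    # quick succession count as a double click.
    now = time.time()
    if red_pos and blue_pos:
        if now - last_click_time < DOUBLE_CLICK_WINDOW:
            return "double_click", now
        return "left_click", now
    if red_pos:
        return "move", last_click_time
    return "none", last_click_time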

6)      Apply decision to mouse pointer:
The identified gestures are interfaced to the computer in place of a hardware mouse; a sketch using the pymouse library follows.
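This sketch uses the pymouse library's PyMouse class; the linear scaling from camera-frame coordinates to screen coordinates is an assumption, and the right-click button number can vary by platform.

from pymouse import PyMouse

mouse = PyMouse()
screen_w, screen_h = mouse.screen_size()

def apply_gesture(gesture, pos, frame_w, frame_h):
    # Map the blob position from frame coordinates to screen
    # coordinates (a simple linear scaling, assumed here).
    x = pos[0] * screen_w // frame_w
    y = pos[1] * screen_h // frame_h
    if gesture == "move":
        mouse.move(x, y)
    elif gesture == "left_click":
        mouse.click(x, y, 1)      # button 1 is the left button
    elif gesture == "double_click":
        mouse.click(x, y, 1)
        mouse.click(x, y, 1)
    elif gesture == "right_click":
        mouse.click(x, y, 2)      # right button; may be 3 on raw X11
    elif gesture == "hold":
        mouse.press(x, y)         # left press and hold
    elif gesture == "release":
        mouse.release(x, y)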


IMPLEMENTATION


The project "Gesture Mouse" was a first step towards a new world of interaction with the digital world. In order to optimize our project we needed a reference, and the reference had to be the best information processing system available. After much searching, we realized that the best information processing system in our current scenario that can recognize human gestures and act accordingly is the human information processing system itself. So we started our project by studying the way the brain takes in and processes information. It was a tedious task to replace a biological system with a digital hardware system while ensuring that both deliver the same response. We used a camera instead of a human eye; it was surprising how closely a camera's properties, such as shutter speed, focal length, and ISO, matched those of a human eye, and some features of the camera were even better suited to capturing a well-defined image. A Python program was then implemented in place of the brain's processing, in order to process and recognize the gesture from the sequence of captured images. After that, according to the gestures, different operations such as left click, double click, right click, and mouse pointer movement are performed.
The main part of the processing lies in the program written in Python with the help of the computer vision (OpenCV) and pymouse libraries. We divided the task of processing into six stages. The first step was to interface the webcam with the program. The second stage was to grab the frame. Then, from this frame, the finger wearing a cap of red or blue colour is segmented out. After the segmentation stage, the position of the extracted blob is calculated, and then gestures are recognized based on that position. In the last stage, different operations are performed based on the gesture recognized in the fifth stage.
Each stage needed a different level of processing. The first step was initializing the interaction with the web camera and buffering the captured images; each buffered image is stored in an image variable. On each capture, the desired blobs, i.e., regions of red or blue coloured objects, are extracted and segmented from the image. Extraction is done by first converting the image into HSV (hue, saturation, value) format and then setting hue, saturation, and value ranges for the red and blue colours. After extraction, a mask layer is applied so that only the red and blue blobs are segmented. To erase noise, the erode and dilate operations of the computer vision library are used. After segmentation was done correctly, we filtered the blobs. Filtering was essential because the captured image can contain several red or blue coloured objects that get extracted and segmented along with the caps. Filtering was done for red and blue separately, based on the size of the caps we used (which appear as blobs in the image). The locations of the blobs were then calculated; if both a red and a blue blob were present, their relative positions and times of occurrence determined whether the gesture was a right click, a left click, or a double click. If there was only a red blob, it indicated mouse movement, and we moved the mouse pointer to the coordinates located from the position of the red blob. This was made easy by the inbuilt functions of the pymouse library. A sketch of the size-based filtering follows.
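In this sketch the area bounds are illustrative assumptions standing in for the measured size of the caps we used.

import cv2

MIN_AREA = 300    # illustrative lower bound for a cap-sized blob
MAX_AREA = 5000   # illustrative upper bound

def filter_blobs(mask):
    # Keep only contours whose area is plausible for a finger cap,
    # discarding other red or blue objects in the scene.
    contours = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    return [c for c in contours
            if MIN_AREA <= cv2.contourArea(c) <= MAX_AREA]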
We took the image and did all the processing in HSV format. The same project could actually be done with the RGB, YCrCb, or HSL formats of the image. The purpose behind choosing HSV was that, of all these formats, only HSV is relatively independent of lighting variation, and sensitivity to lighting variation is one of the main problems in digital image processing; so to get a better result, HSV became our option. The image captured by the web camera (which OpenCV stores in BGR channel order) was converted to HSV using the inbuilt function cvCvtColor, with arguments specifying the captured image as the source, another image variable as the destination, and the flag CV_BGR2HSV, which performs the BGR-to-HSV conversion. The equivalent call in the newer cv2 API is sketched below.
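A minimal sketch of the same conversion with the cv2 module:

import cv2

cap = cv2.VideoCapture(0)
ok, frame = cap.read()  # frame arrives in BGR channel order
if ok:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)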

