Marauder's Map Surveillance System
A Senior Design Project Proposal

By
Eric Wasserman
Hiran Patel
Nishit Raval
Jose Suriel
Sapan Sharma

Introduction

Security is a major concern for people and organizations alike, and a large part of the technology industry is devoted to the development and installation of security systems. People use surveillance systems to guard their homes, companies, government properties, and even vehicles such as Rutgers buses. When surveillance is used for the protection and safety of a building, activities inside and outside the building must be constantly monitored and recorded. There are two existing options on the market: fixed cameras that monitor one particular location, and rotating cameras that sweep through a 180 degree arc. Although rotating cameras provide better coverage than stationary ones, they can be deceived by an individual who follows the camera's motion closely. When these video cameras are installed in a building, they provide individual video streams that are displayed on a screen. Typically, each stream is displayed separately, requiring the user to watch N different videos, where N is the number of cameras in use. Existing surveillance systems also do not indicate whether a person entering the building is authorized to be there, and it is often very difficult to identify an individual if the person monitoring the cameras cannot see the face clearly.

The Marauder's Map Surveillance System will provide bird's eye view monitoring as well as more accurate security surveillance. The bird's eye view representation of the building brings every video stream together on a single 2D plane and tracks the movements of the people within the building. Each individual is represented by a moving or stationary set of feet with a name attached. To provide this identification, the system incorporates face recognition software and an employee database. Each camera sends its data to the central server, where the images pass through motion detection and face recognition software. Next to each set of feet will be the person's name and other information about the individual contained in the employee database. If someone who is not authorized to be in the building is detected by the surveillance system, that individual is represented by a red set of feet labeled "Unknown". Similarly, if a person is authorized to be in the building but not in a particular area, that person's feet turn red on the monitor whenever he or she enters the restricted area. A final feature lets the system's user open the real-time video stream of a particular person or location to take a closer look. The system can be thought of as a real-time blueprint of a building that shows everyone inside it and their individual movements.

Video Surveillance

To provide video surveillance, some form of video camera must be used. Two types of cameras are of interest to this system: a digital IP video camera, and an analog video camera with a video-to-IP converter.
Analog video cameras still exist in some older surveillance systems, and with a video-to-IP converter a preexisting installation can be outfitted with the Marauder's Map Surveillance System. An IP camera is a type of digital video camera commonly used for surveillance that can send and receive data via Ethernet. There are two types of surveillance systems for IP cameras. The first is a centralized system, which requires a central server to hold and display the video streams from each camera and handle alarm management. The second is a decentralized system, in which each camera stores, processes, and displays its own video streams. Digital IP cameras have many advantages. They offer image resolutions of 640x480 and HDTV image quality at 10 to 30 frames per second at roughly 3 Mbps. They provide secure data transmission through encryption and authentication methods such as WEP, WPA, WPA2, TKIP, and AES. Live video feeds from selected cameras can be viewed from any computer connected to the internet. IP cameras can also operate without an additional power supply because they are powered over the Ethernet connection. The cheapest camera considered has a 20 meter viewing range.

Video Transmission

One of the issues with the Marauder's Map Surveillance System is how the video streams from the cameras will be sent to the central server. The only feasible option is wireless transmission, because laying wires between each camera and the server would be expensive and difficult. The next question is how the cameras will transmit their data wirelessly. After careful deliberation, a wireless mesh network was chosen as the most energy and cost effective solution. A wireless mesh network can use WiFi standards, making each node easy to configure. The network is self-configuring: if additional nodes are added, the existing nodes automatically identify the new node and update their routes to the server. It is also self-healing: if a node goes down for some reason, the other nodes find an alternate path for routing the data.

The wireless mesh network will work as follows. Each IP video camera is connected to its own node in the network. The nodes communicate with each other in ad-hoc mode on the 2.4 GHz band and build a shortest-path route to the central server using a routing protocol (either OLSR or B.A.T.M.A.N.). The video streams then hop from node to node until they reach the central server. Each node has its own unique IP address, so the central server knows which camera the data came from and where that camera is located. Since the cameras are stationary, each IP address corresponds to a fixed physical location describing where in the building the camera is mounted.
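As a concrete illustration of this addressing scheme, the minimal sketch below (Python with OpenCV) shows how the central server could map each mesh node's IP address to the fixed location of its camera and open the corresponding video stream. The IP addresses, floor-plan coordinates, and stream URL format are hypothetical placeholders rather than part of the actual design.

import cv2  # OpenCV, used here only to read network video streams

# Hypothetical mapping from each mesh node's static IP address to the
# fixed location of the camera attached to it (floor-plan coordinates).
CAMERA_LOCATIONS = {
    "192.168.1.11": {"room": "Lobby",        "x_m": 2.0,  "y_m": 3.5},
    "192.168.1.12": {"room": "East hallway", "x_m": 14.0, "y_m": 3.5},
    "192.168.1.13": {"room": "Lab 117",      "x_m": 22.5, "y_m": 9.0},
}

def open_stream(ip):
    """Open the video stream of the camera behind the given mesh node."""
    # The URL format depends on the camera model; this path is a placeholder.
    capture = cv2.VideoCapture(f"rtsp://{ip}/video")
    if not capture.isOpened():
        raise RuntimeError(f"Could not reach camera at {ip}")
    return capture

if __name__ == "__main__":
    for ip, location in CAMERA_LOCATIONS.items():
        print(f"Camera {ip} covers {location['room']} at "
              f"({location['x_m']} m, {location['y_m']} m)")
        # capture = open_stream(ip)   # uncomment on a live network
        # ok, frame = capture.read()  # frames are then tagged with `location`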
There are multiple ways to implement a wireless mesh network. The following sections describe three options, after which a final decision is made regarding which solution is best for the surveillance system. Table 1 lists statistics comparing the options.

Zigbee

The first option is to build mesh nodes out of Zigbee modules. Zigbee is a wireless protocol based on the IEEE 802.15.4-2003 standard. It was first designed for low-rate personal area networks, but recent modules have increased its range. If Zigbee is used, nodes would have to be built individually to hold the Zigbee modules. There are currently two Zigbee modules with different specifications: a low power module and a high power module. The low power module consumes very little power (about 1 mW) and is relatively cheap; the high power module consumes more power (100 mW) and costs a little more. Both modules can only achieve a maximum data rate of 250 Kbps at 2.4 GHz, which would cause a considerable delay in the playout of the video at the server. The main problem with Zigbee is that the surveillance system cannot take advantage of one of its primary features, the low power mode. When a Zigbee module is idle it drops into a low power state, which saves considerable energy, especially when the device is battery powered. With video, however, the mesh node must transmit a constant stream of data, so the Zigbee device would never enter power save mode.

Firetide

The second option is to use a private company's predesigned wireless mesh nodes. The company Firetide produces high-end wireless mesh equipment, and all of its products come with software that lets the user configure the nodes from a laptop or desktop computer. One advantage of Firetide's mesh nodes is that they can communicate with each other on the 5 GHz band while serving client devices on the 2.4 GHz band, which reduces interference between the clients and the mesh backbone. For the Marauder's Map Surveillance System, the IP cameras would connect via Ethernet to the mesh nodes, and the nodes would transmit to each other in the 5 GHz band. The mesh nodes consume quite a bit of power (400 mW), but they have a much higher maximum data rate of 54 Mbps at 5 GHz. Since Firetide's nodes are built for commercial use, they have many features the Marauder's Map Surveillance System does not need: they are far more durable, can withstand harsh temperatures, and their data rates may be unnecessarily high for this application.

Freifunk

The third option is to use an ordinary Linksys WRT54G router and install the open source Freifunk firmware. Freifunk is an initiative started in Germany to provide free, community-run wireless networks. With the firmware installed, the Linksys router can act as a node within a mesh network; the firmware uses the OLSR routing protocol to find the shortest path to the central server. The router consumes very little power (only 42 mW) but has a significantly shorter operating range than Firetide's mesh nodes. It offers a maximum data rate of 11 Mbps, which is far better than the Zigbee modules but not as high as Firetide's devices.

Wireless Mesh Network Decision

Table 1 compares the specifications of the four mesh network choices (counting the Zigbee low and high power modules separately). The best choice for the Marauder's Map Surveillance System is the Linksys router with Freifunk firmware. The router combines low cost with reasonably high data rates and receiver sensitivity, and since the Freifunk firmware is open source, the devices are not difficult to install and configure. Because the IP cameras will be located indoors, there is little need for a router as durable as the ones Firetide supplies, and since cost is a major constraint, there is little justification for spending nearly $2000 per node. Zigbee, although cost effective, has data rates that are too low to maintain real-time video at the server.

Table 1: Comparison of wireless mesh network technologies.

Technology                              Frequency band                Output power (mW)   Cost     Maximum data rate   Receiver sensitivity
Zigbee ZMN2405 module                   915 MHz or 2.4 GHz            1                   $22.00   250 Kbps            -92 dBm at 250 Kbps
Zigbee ZMN2405HP module                 915 MHz or 2.4 GHz            100                 $37.50   250 Kbps            -95 dBm at 250 Kbps
Firetide                                900 MHz, 2.4 GHz, or 5 GHz    400                 $1795    54 Mbps             -95 dBm at 1 Mbps
Linksys router with Freifunk firmware   2.4 GHz                       42                  $50      11 Mbps             -89 dBm at 1 Mbps
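To put these data rates in perspective, the short calculation below uses the roughly 3 Mbps per-camera figure cited in the IP-camera discussion to estimate how many real-time streams each option could carry. It ignores protocol and multi-hop forwarding overhead, so the numbers are optimistic upper bounds rather than measured values.

# Rough upper bound on how many ~3 Mbps camera streams each link can carry.
PER_CAMERA_MBPS = 3.0  # HDTV-quality IP camera stream cited earlier

link_rates_mbps = {
    "Zigbee module (250 Kbps)": 0.25,
    "Linksys WRT54G with Freifunk (11 Mbps)": 11.0,
    "Firetide node (54 Mbps)": 54.0,
}

for name, rate in link_rates_mbps.items():
    streams = rate / PER_CAMERA_MBPS
    print(f"{name}: at most {streams:.1f} simultaneous streams")

Even under this optimistic assumption, Zigbee cannot carry a single stream, the Freifunk router can serve a few cameras, and Firetide offers far more headroom than a single floor requires.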
Face Recognition

With the Marauder's Map Surveillance System it is very important to determine who is in the monitored area: whether the individual is in the database, and if so, what details are on record for that person. This is where face recognition comes in. When the cameras detect an individual within their coverage, the face recognition software tries to identify the person and, if he or she is in the database, displays the name on the GUI; otherwise the person is shown as "Unknown". Face recognition software can be purchased from large companies that specialize in this technology, typically for government sites or private businesses, but it is very expensive.

There are three tasks that have to be handled when using face recognition: document control, access control, and database retrieval [1]. Document control is the verification of a person by comparing the live camera image with a document photo stored in the database. Access control determines whether the detected individual has permission to be in that area. Database retrieval returns the documented information about the individual if he or she is in the database. Many algorithms and applications already perform some kind of face recognition, for example Apple's iPhoto and Google's Picasa. Much of this software, however, does not provide source code and is protected by copyright. The goal was not to find open source software that does everything with one click, but to find algorithms and programming libraries that can be adapted to the requirements of the project and keep it affordable. Two algorithms were identified that not only illustrate different approaches to face recognition but also make this part of the project feasible: appearance-based recognition and the Scale Invariant Feature Transform (SIFT).

Appearance-Based Recognition

The basic concept of appearance-based recognition is to create a set of possible appearances of a given object. The set of appearances consists of 2D images of the 3D object under different illuminations and viewing angles. When the camera captures an image of the object to be recognized, the system checks which set of appearances the image most probably belongs to; the set with the highest probability is chosen and the object is recognized. The first step of the algorithm is to create the set of possible appearances for a given face. Images of the face are taken at different illuminations and angles against a blank background, since a cluttered background could affect the algorithm.

Each image can be represented as an n x n matrix, assuming every image has the same height and width. The matrix is then flattened into a vector of length n^2 whose entries are the pixel values of the image; in other words, the image becomes a point in an n^2-dimensional space. This process is repeated for the other images in the set, and the resulting points begin to form a "cloud". A weighted representation could be used so that outlier points do not have a great effect on the cloud. When the camera captures an image of the scene, the face to be recognized must first be extracted, which can be done with motion detection and face detection in OpenCV. The extracted face is then also represented as a point in the n^2-dimensional space. The cloud closest to this point is taken as the match, which is determined by measuring the distance between the point and each cloud; the center of a cloud can serve as its reference point. A threshold is also required: if no distance falls below the threshold, the face is tagged as unknown. (The threshold will be chosen later, once more is known about the data.) When a cloud is found whose distance is both the smallest and below the threshold, the face is tagged with the corresponding person's name.

There are several obvious problems if the algorithm is used exactly as described. First, n^2 can be very large, so checking the distance between a point and a cloud becomes time consuming; the program would run very slowly and be impractical on a laptop. Another concern is that storing all of the images requires a great deal of memory. To address the first problem, the faces are represented in eigenspace, giving an eigenface representation. The images are first normalized, and the mean face of the entire database is computed by summing the image vectors and dividing by the number of faces. A face can then be written as

x_j = \bar{x} + \sum_{i=1}^{n} g_{ji} e_i

where x_j is the given face, \bar{x} is the mean face, g_{ji} are the weights (projection coefficients) of the given face, e_i are the eigenfaces, and n is the number of faces in the database. Replacing n with a smaller integer k keeps only the components associated with the largest eigenvalues. This reduces the problem to a lower dimension while keeping a reasonable representation of the faces, which makes it practical to run the algorithm on a laptop.

Another obstacle is that faces can appear at different distances from the camera, so each face must be scaled to a specified size. A further obstacle is that a person's appearance can change over time: people may change their hair or simply look older. For that reason this method would not hold up in a general real-world deployment; nevertheless, it can be useful in the environment targeted by this proposal, since people's appearances will be relatively static.
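As a sketch of the eigenface idea described above, the Python/NumPy code below builds the mean face and the eigenfaces from a set of equally sized training images and classifies a new face by its distance in the reduced space. For simplicity it compares the new face against every stored projection rather than against per-person cloud centers, and the labels, number of components, and distance threshold are placeholders to be tuned later.

import numpy as np

def train_eigenfaces(images, num_components):
    """images: list of equally sized grayscale face images (2-D arrays)."""
    # Flatten each n x n image into an n^2-dimensional vector.
    data = np.array([img.reshape(-1).astype(float) for img in images])
    mean_face = data.mean(axis=0)
    centered = data - mean_face
    # Principal components of the centered face set are the eigenfaces.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:num_components]        # keep only k components, k << n^2
    weights = centered @ eigenfaces.T       # g_ji coefficients for each face
    return mean_face, eigenfaces, weights

def recognize(face, mean_face, eigenfaces, weights, labels, threshold):
    """Project a new face and return the closest known identity, if any."""
    w = (face.reshape(-1).astype(float) - mean_face) @ eigenfaces.T
    distances = np.linalg.norm(weights - w, axis=1)
    best = int(np.argmin(distances))
    # If no stored face is close enough, tag the person as unknown.
    return labels[best] if distances[best] < threshold else "Unknown"

In practice the face images would first be extracted, scaled, and normalized as discussed above before being passed to these functions.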
Scale Invariant Feature Transform (SIFT)

Another possible approach to face recognition is SIFT. The basic concept is to find features within the face that are invariant to scale and rotation and partially invariant to illumination. When a new image of the face arrives, these features are extracted and compared against a database; if a match occurs, the face has been successfully recognized. The algorithm follows four steps to find the features.

The first step, scale-space extrema detection, applies a difference-of-Gaussians filter to the image. The image is convolved with Gaussian filters at several scales, i.e., blurred by different amounts, and adjacent blurred images are subtracted to obtain the scale-space representation. The second step finds the key points within the image. Key points are maxima or minima across scales: each pixel is compared with its adjacent neighbors in the same scale and in the neighboring scales. Not every extremum is kept; key points with low contrast are removed, as are key points lying along edges. Considering the image at multiple scales is what makes SIFT scale invariant. The third step assigns an orientation to each key point. Image gradients are computed around the key point and collected into a gradient orientation histogram; the highest peak, along with any peak within 80% of it, is assigned as the key point's orientation. Rotation invariance is achieved because the key point's properties are expressed relative to this assigned orientation. The last step generates a key point descriptor, which uses a set of 16 orientation histograms to describe the region around the key point.
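A minimal sketch of SIFT-based matching with OpenCV is shown below (Python; it assumes an OpenCV build that provides cv2.SIFT_create). The gallery of stored descriptors, the ratio-test constant, and the minimum match count are illustrative assumptions rather than tuned values.

import cv2

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher()   # brute-force matcher over the 128-D descriptors

def sift_descriptors(gray_face):
    """Detect SIFT key points in a grayscale face and return their descriptors."""
    _, descriptors = sift.detectAndCompute(gray_face, None)
    return descriptors

def match_face(query_gray, gallery, min_matches=10, ratio=0.75):
    """gallery: dict mapping a person's name to precomputed SIFT descriptors."""
    query = sift_descriptors(query_gray)
    if query is None:
        return "Unknown"
    best_name, best_count = "Unknown", 0
    for name, stored in gallery.items():
        # Lowe's ratio test: keep a match only if it is clearly better
        # than the second-best candidate.
        pairs = matcher.knnMatch(query, stored, k=2)
        good = [p[0] for p in pairs
                if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        if len(good) > best_count:
            best_name, best_count = name, len(good)
    return best_name if best_count >= min_matches else "Unknown"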
Face Recognition Decision

After weighing what each algorithm does and what each is capable of, SIFT makes the most sense. It does not require much more data storage, and it avoids the problem of needing a near-perfect frontal shot of the person in order to succeed.

How the software will work

A simplified overview of the software is shown in Figure 1.

Fig. 1: Overview of the software structure.

The cameras capture images of the scene, which are then fed into the software. The first stage is motion detection and object tracking. OpenCV will be used to isolate motion within the scene, and there are many tutorials showing how to use the library for this. Face detection (used to separate the face from the scene) can also be done with OpenCV, which uses an algorithm based on Haar-like features. Haar-like features are adjacent rectangular regions that differ in intensity: for instance, the eye region is darker than the cheeks, so a pair of adjacent rectangles, one over the eyes and one over the cheeks, forms a Haar-like feature. After a moving face has been detected, the system checks whether the face has already been tagged, so that the software does not keep trying to recognize the same person; recognition is not needed for object tracking. It will be decided later whether tags should expire after some period of time, which would limit the error introduced if the system loses track of a person. If a person is not tagged, or the tag has expired, the face images are sent to the object recognition part of the software, which uses the database and one of the algorithms described above to match them against stored faces. Multiple images of the same face are used to reduce error, since they should all match the same identity. If the face has already been tagged, the recognition step is skipped and the system proceeds to place the person on the bird's-eye view of the building.
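The front end of this pipeline can be sketched as follows (Python with OpenCV). The video source, the motion-area threshold, and the use of the frontal-face Haar cascade bundled with OpenCV are assumptions made for illustration, not final design choices.

import cv2

# Haar cascade for frontal faces shipped with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# Background subtractor used as a simple motion detector.
motion = cv2.createBackgroundSubtractorMOG2()

def detect_moving_faces(frame, min_motion_pixels=500):
    """Return grayscale face sub-images found in frames that contain motion."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = motion.apply(frame)
    # Only run face detection if something in the scene actually moved.
    if cv2.countNonZero(mask) < min_motion_pixels:
        return []
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [gray[y:y + h, x:x + w] for (x, y, w, h) in faces]

# Typical use: read frames from a camera stream and pass untagged faces
# on to the recognition stage.
# capture = cv2.VideoCapture("rtsp://192.168.1.11/video")  # hypothetical camera
# ok, frame = capture.read()
# for face in detect_moving_faces(frame):
#     pass  # send to the face recognition module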
Data Storage/Central Server

A central processor is essential to the Marauder's Map Surveillance System. The central server will be a PC running the Linux operating system. Enterprise Linux distributions include the Apache web server software, which is well suited to building a server, and Linux supports web programming languages such as PHP, Perl, and Python, allowing the server to communicate with web applications. Once this is set up, users can access data on the central server remotely if the need arises. The central server will also store the incoming data from the video cameras and must maintain a database of the people known to work for the company. For this, the PC will run database management software such as MySQL, which supports multiple databases and allows multiple users to access them. An external hard drive can easily be added to store the gigabytes of video data. This approach to the central server is cost efficient and capable of handling the tasks the system requires.

Video Stream Construction

Once an object has been tracked and tagged, one coherent output must be displayed on the screen. Each camera, however, views the same object from a different perspective. To map the multiple images onto a common coordinate system, the cameras first need to be calibrated. Camera calibration involves estimating external and internal parameters; knowing these parameters determines the camera geometry and provides a common coordinate frame onto which all the images can be mapped. External parameters describe the location and orientation of the camera, while internal parameters are defined by the focal length, image format, distortion, and principal point. In this system the cameras remain in fixed locations and do not rotate, so they only need to be calibrated once, and the parameters can be determined with freely available software such as the camera calibration toolbox for Matlab.

The next step in reconstructing the scene is stereo reconstruction, which involves finding corresponding points in the multiple images. Since each camera tracks the same objects, there is a common set of points, in the common coordinate frame, that appears in all of the camera images. Two classes of algorithms exist for this correspondence problem: feature-based algorithms, which match features such as line segments and edge points between images, and correlation-based algorithms, which match image regions based on their intensities. Either is adequate for reconstructing the 3D view in this project. Once the corresponding points are found, a 3D representation of the scene can be recovered by camera triangulation, which locates a point in 3D space from its corresponding image points: the rays from the camera centers through those image points intersect at the 3D point. Using known algorithms such as the Direct Linear Transformation and the mid-point method, the geometry can be solved and a 3D view of the room or building constructed. To produce the 2D bird's-eye view of the room, one of the axes (the vertical one) can then be dropped without losing any valuable information about the objects being tracked.
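A minimal sketch of the triangulation step is given below (Python with OpenCV and NumPy), using cv2.triangulatePoints, which performs a linear, DLT-style triangulation of the kind described above. The projection matrices would come from the one-time camera calibration; the matrices and pixel correspondences used here are placeholder values for illustration.

import numpy as np
import cv2

def triangulate(P1, P2, pts1, pts2):
    """Recover 3-D points from corresponding pixels in two calibrated cameras.

    P1, P2 : 3x4 projection matrices obtained from camera calibration.
    pts1, pts2 : 2xN arrays of matching pixel coordinates in each view.
    """
    homogeneous = cv2.triangulatePoints(P1, P2,
                                        pts1.astype(float), pts2.astype(float))
    return (homogeneous[:3] / homogeneous[3]).T   # N x 3 points

def birds_eye(points_3d):
    """Drop the vertical axis to obtain 2-D floor-plan coordinates."""
    return points_3d[:, [0, 2]]   # assumes the Y axis is vertical

# Placeholder example: two cameras observing one tracked point.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                    # reference camera
P2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])    # offset camera
pts1 = np.array([[100.0], [120.0]])
pts2 = np.array([[ 90.0], [120.0]])
print(birds_eye(triangulate(P1, P2, pts1, pts2)))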
GUI/Dashboard

The graphical user interface is what makes the Marauder's Map Surveillance System unique among available surveillance systems, giving its users, namely security personnel, multiple viewing options. The main view of the interface provides a bird's-eye view of an entire floor of a building. Each person in the room is displayed as a pair of footprints with his or her name visible alongside. If the user clicks on the footprints, an icon pops up with a detailed description of the person of interest: along with the person's name and photo, it can show information such as the department the person works in, the name of the person's manager, and contact information. If a person has not been identified, either because they are not in the system database or because the cameras were unable to recognize them, the name next to the footprints is labeled "Unknown". An additional feature lets the user view the live video footage by double-clicking the footprints, which is helpful for identifying people who are not tagged with a name. Figure 2 gives an example of how the dashboard would appear if the system were tracking college students in SERC.

Fig. 2: Dashboard for the Marauder's Map Surveillance System.

References

Wikipedia, "Camera Resectioning," August 2010.
J.-Y. Bouguet, "Camera Calibration Toolbox for Matlab," updated July 2010.
Wikipedia, "Correspondence Problem," last modified October 2010.
Wikipedia, "Triangulation (Computer Vision)," last modified March 2010.
E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice Hall, Chapter 10.
D. G. Lowe, "Object Recognition from Local Scale-Invariant Features," University of British Columbia.
Estrada, A. Jepson, and D. Fleet, "Local Features Tutorial," Nov. 8, 2004.
Wikipedia, "Haar-like Features."