Gesture Recognition in a Classroom Environment

By Michael Wallick

Submitted as partial requirement for

CS 766 (Computer Vision)

Fall 2002

1. Introduction

Gestures are a natural and intuitive way that people use for non-verbal communication. Gestures can also be used to interact with a computer system without having to learn how to operate different devices. Gesture recognition in a surveillance system can provide a computer with information about what people are doing in a given scene.

In an automated editing system, the gestures of the actors can help drive the focus of the camera. This becomes especially important in a classroom environment. The gestures of the lecturer (in essence the only actor) help to indicate what is important in the scene during the lecture, such as specific parts of the board. While these gestures give clues about what is important, the gesture itself can also obscure the important information (as when pointing). A good automated editing system needs to be aware of this and correct for it.

In this paper, I will present a gesture recognition algorithm that can be used in a classroom environment for the purposes of lecture understanding, with an emphasis on automated video editing of the classroom lecture. In particular, the method recognizes "pointing" and "reaching" gestures. Because both of these gestures can be confused with writing, and in order to avoid situations where the lecturer is blocking important information with his or her gestures, I will introduce the concept of "board regions." These regions are a partitioning of the board into semantically linked groups of writing. With knowledge of the attributes of these regions (such as the times a region is first drawn, stops being drawn, or is erased), the pointing and reaching gestures can be disambiguated from writing, and the system can get a better idea of what part of the board is being pointed at.

This paper is structured in the following way. The next section is a brief survey of existing gesture recognition techniques, including template based approaches (the approach used in this paper). Following that is a discussion of Virtual Videography, the automated video editing project that this work is being done for. After the Virtual Videography section is a description of the “board regions.” This will include a formal definition of the region as well as a computer vision technique for finding these regions. A description of the gesture recognition algorithm that was implemented follows the region section. Finally, I will conclude with a discussion of how the gestures and the regions can be used together to get a better understanding of the classroom lecture and improve the automatic editing results.

2. Gesture Recognition

Gesture recognition has been a hot topic in computer science in recent years. There are many different techniques for achieving this goal, including template matching, neural networks, and statistical approaches. Any of these approaches can use either special tracking equipment or tracking based on computer vision [Watson93]. Special equipment generally gives better results and is easier to work with than computer vision techniques; however, it also tends to be more expensive, intrusive, and restrictive. In this section, I will briefly outline the different techniques.

The first means of gesture recognition is a template based approach. The raw input data is fed to the computer, compared against a set of known gestures, and classified as the gesture that it best fits. This approach has the advantage that such a system is simple to implement and maintain. The disadvantages are that it can be thrown off by noise, either in the templates or in the data being classified, and that it can be very dependent on the background as well as the clothing worn by the actor whose gestures are being recognized [Watson93]. For this project I have constructed a template based gesture recognition system and have addressed several of these problems in my implementation.

More advanced than template based matching are statistical approaches. With a statistical approach, models of the gestures are employed to help classify unknown gestures. These models include Hidden Markov Models (HMMs) and Dynamic Time Warping (DTW). The known gestures are stored in the models and pattern recognition algorithms are used to identify the gestures. The difficulty with this scheme is that such models can be hard to implement and are slow when looking up gestures [Martin99].

Similar to the statistical methods of gesture recognition is the use of a neural network. The network is trained on different gestures and can then apply the pattern recognition for which neural networks are known. As with the other statistical approaches, a neural network is difficult to implement. Additionally, it requires many training examples, sometimes in the thousands, for each gesture [Fels93].

3. Virtual Videography

The main motivating factor for this project is Virtual Videography [Gleicher00, Gleicher02]. In Virtual Videography, a camera is placed in an unobtrusive location in a classroom and aimed at the board; the resulting footage is boring and difficult to watch. It is our goal to have a system that automatically edits this footage producing a useful video that is easy to watch and learn from.

For both the region finding (described in the next section) and the gesture recognition, the data set consisted of video taken from CS 559 (Introduction to Computer Graphics) in the Fall of 2001, taught by Michael Gleicher. Two cameras were placed close to each other in the back of the classroom and pointed towards the board. The professor is the only obstruction between the camera and the chalkboard.

In the following sections, I will formally describe a region, and then give an algorithm for extracting the regions from a videotape of a classroom lecture. After that, I present a method for using the same video to analyze the gestures that are being made by the professor.

4. Regions and Region Finding

In their work, Onishi et al. [Onishi00] introduce a method for segmenting blackboards based on an idea called written rectangles. The concept of regions can be viewed as an extension of the written rectangles idea, as it differs from and expands on their work in several ways. First, it gives a more formalized definition of a region. Second, the region concept applies over a more general domain of images; theirs is only applied and tested on a blackboard. Finally, a different algorithm is used for extracting regions from marker boards and chalkboards, and it can easily be extended to other domains.

Intuitively, we can think of a region as a partitioning of the surface over time and space, where each partition represents a single thought or idea that was written down. More formally, we define a region to be a collection of strokes such that all strokes in the region are related to each other.

Notice that the region definition requires a semantic understanding of what is written. Current technology does not provide a means of automatically extracting this information. Instead, I make the observation that regardless of the overall structure of the writing, ideas will still be grouped together. Using this observation, the regions can be approximated by grouping together strokes that occur close together in either time or space.

A specific “lifespan” is applied to the region model. Each region is “born,” “grows,” “matures,” and eventually “dies.” A region will be in each of the states, in the above order, for some time during its life. When a stroke appears close to an existing region, the state of the region combined with the distance the stroke is from the region determines how the stroke is processed. The following is an in-depth explanation of each of these states:

Birth: A region is born when the first stroke belonging to a new region is drawn. In our model we assume that the writing surface either starts clean or originally contains irrelevant writing left over from the previous use. Therefore, all regions have a birth and cannot exist prior to the start of the event that is making use of the surface.

Growth: Growth immediately follows birth and is characterized by writing being added to the region for up to some fixed amount of time. A region is still considered to be growing if a part of it is erased. However, the spatial extent of a region will generally not shrink: either a small amount is erased to make a correction, or the region is erased entirely.

Maturity: A region is mature once it stops growing, or another region is born. This means that only one region can be growing at once.

Death: A region dies as a result of being erased, merging with another region, or the event that is using the surface ending.

If a region that has entered maturity appears to be growing again, we say that a new region is born and merge the old region with the new one.
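
To make this lifespan model concrete, the following sketch shows one possible data structure for a region, assuming a stroke is stored as a bounding box plus a timestamp. The class and field names are illustrative only; they are not the ones used in the actual implementation (Appendix B).

# Illustrative sketch of the region lifespan model; not the Appendix B code.
class Region:
    def __init__(self, stroke):
        # a stroke is assumed to be (x, y, w, h, frame): a bounding box plus a timestamp
        x, y, w, h, frame = stroke
        self.bbox = [x, y, w, h]   # spatial extent; grows but generally never shrinks
        self.birth = frame         # time the first stroke of the region was drawn
        self.mature = frame        # time of the most recent growth
        self.death = None          # set once the region is (mostly) erased or the lecture ends

    def grow(self, stroke):
        # extend the bounding box to cover the new stroke and update the maturity time
        x, y, w, h, frame = stroke
        x2 = max(self.bbox[0] + self.bbox[2], x + w)
        y2 = max(self.bbox[1] + self.bbox[3], y + h)
        self.bbox[0] = min(self.bbox[0], x)
        self.bbox[1] = min(self.bbox[1], y)
        self.bbox[2] = x2 - self.bbox[0]
        self.bbox[3] = y2 - self.bbox[1]
        self.mature = frame

    def kill(self, frame):
        # the region dies: erased, merged into a newer region, or the lecture is over
        self.death = frame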

5. Algorithm for Region Finding

The region finding algorithm consists of three steps. The first step is segmentation and refilling. The person or people (who will be referred to as the presenter) are removed from the video and replaced with the area of the board that is being obstructed. The second step is to find where and when the board was changed, either by writing or erasing. Finally, this information is combined to form regions.

5.1 Segmentation and Refilling

Throughout the video, the presenter will be blocking some part of the board, and he or she will often block important information, i.e. what is being written. Therefore, the first step is to remove the presenter from the video and use in-painting to replace the part of the board that is obstructed. To identify the presenter and the board, color classification is used to segment the video. I chose color classification for several reasons: the board is generally a unique color, which makes finding non-board pixels easy; segmentation based on color will work when there is little motion or on single images, such as those acquired with a high-resolution digital camera; and color classification will work with multiple occlusions (i.e. other participants or more than one presenter). However, any segmentation technique could be used instead.

In order to perform color segmentation, I extended the algorithm described in [Jones98]. The algorithm uses a three-dimensional array indexed by RGB values. The array is "trained" with pixels that are known to be the color of the object in question. During classification, each pixel is checked against the trained array: if the value of the array at the pixel's RGB location is greater than some threshold, then that pixel is considered to be the same color as the object in question. In this particular method, each dimension of the array has fewer "bins" than the number of possible colors, so similar colors get grouped together. Our implementation uses 32 bins per dimension. Training is done once per video, using the video itself. This compensates for lighting and camera configuration changes that occur across the data set.

The first two extensions have to do with the color arrays. First I allow for multiple arrays, in order to represent several objects. Additionally, each array has an associated confidence function:

C_i(R,G,B) ∈ [0,1]    (1)

Given an R, G, B vector, a color array i will return a confidence between 0 and 1. More specifically, the function is implemented as:

C_i(R,G,B) = i[R,G,B] / i_total    (2)

In other words, the confidence function is the percentage of the number of training pixels that correspond to the R,G,B value in question. A pixel in question belongs to the object whose array returns the highest confidence value. Our current implementation uses two arrays, one for the board and one for the presenter.
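
As an illustration of the binned color arrays and the confidence function of equation (2), the following NumPy sketch shows one way they might be implemented. This is not the pyImage-based code used for this project; the helper names and the bit-shift binning are assumptions, with 32 bins per dimension as described above.

import numpy as np

BINS = 32      # 32 bins per RGB dimension, as described above
SHIFT = 3      # shifting 0-255 values right by 3 bits maps them onto 0-31 bin indices

def make_color_array():
    return np.zeros((BINS, BINS, BINS), dtype=np.int64)

def train_pixel(array, r, g, b):
    # count one training pixel in the bin for this (approximate) color
    array[r >> SHIFT, g >> SHIFT, b >> SHIFT] += 1

def confidence(array, r, g, b):
    # equation (2): fraction of the array's training pixels that fell into this color bin
    total = array.sum()
    if total == 0:
        return 0.0
    return array[r >> SHIFT, g >> SHIFT, b >> SHIFT] / float(total)

def classify(arrays, r, g, b):
    # a pixel belongs to the object whose array returns the highest confidence
    scores = [confidence(a, r, g, b) for a in arrays]
    return int(np.argmax(scores)), max(scores)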

The final extension to this method is in training the arrays. In [Jones98], the array is trained manually. Since I assume that the location of the board in the video is known, the extension allows automatic training of the color arrays. As the presenter is likely to move around during the lecture, pixels that change drastically from one training frame to the next are considered to be presenter, and those that remain constant are board. While this method does not work with 100% accuracy, the error tends to remain low enough that it does not cause misclassification in the end.

The following is the algorithm for training the color segmentation program:

1. Select a set of frames to be used for the training, i.e. every xth frame of the video

2. Enumerate these frames from 1 to n, maintaining their order

3. For each pixel in the board region of frame 1, increment the board array at the appropriate location

4. For i = 2 to n do :

a. for each pixel in the board region of frame i, compare it to the pixel at the same location in frame i-1

b. if the two pixels are close in value, then increment the board array, otherwise increment the presenter array
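
A minimal sketch of this training loop, reusing the train_pixel helper from the earlier sketch, is shown below. It assumes the training frames are available as NumPy arrays and that the board's bounding box is known; the DIFF_THRESH value standing in for the "close in value" test is an arbitrary placeholder, not the value used in the actual system.

import numpy as np

DIFF_THRESH = 20   # assumed per-channel threshold for "close in value"

def train_arrays(frames, board_rect, board_arr, presenter_arr):
    # frames: list of HxWx3 uint8 arrays (the selected training frames, in order)
    # board_rect: (x, y, w, h) of the board in the image
    x, y, w, h = board_rect
    prev = frames[0][y:y+h, x:x+w].astype(np.int16)
    # step 3: every pixel of the first frame's board area trains the board array
    for r, g, b in prev.reshape(-1, 3):
        train_pixel(board_arr, r, g, b)
    # step 4: changed pixels train the presenter array, stable pixels the board array
    for frame in frames[1:]:
        cur = frame[y:y+h, x:x+w].astype(np.int16)
        changed = np.abs(cur - prev).max(axis=2) > DIFF_THRESH
        for (r, g, b), moved in zip(cur.reshape(-1, 3), changed.reshape(-1)):
            train_pixel(presenter_arr if moved else board_arr, r, g, b)
        prev = cur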

After the training is complete, we segment the video and refill the pixels that have been removed. The following is our segmentation and refilling algorithm:

Let n be the number of frames in the video

For i = n-1 down to 1 do:

1. Perform color segmentation on frame i, placing all presenter pixels into a mask temp

2. Perform a series of morphological erode and dilate operations on temp to remove noise in the classification and include chalk or marker that was not classified as board

3. Replace each pixel that is marked in temp (i.e. not board) with the pixel in frame i+1

By performing the segmentation and refilling backwards, future information from the video is used to replace parts of the board that are obscured by the presenter. This means that if the presenter is blocking what he or she is writing, the writing will appear in the video as soon as that part of the board is obscured, rather than after the presenter moves.
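
The backward pass might be sketched as follows with OpenCV and NumPy, assuming the classify helper and color arrays from the earlier sketches and frames held in memory. The per-pixel loop is written for clarity rather than speed, and the kernel size is a placeholder, not the value used in the actual implementation.

import numpy as np
import cv2

def segment_and_refill(frames, arrays, presenter_idx=1, kernel_size=5):
    # frames: list of HxWx3 uint8 frames, modified in place from back to front
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    for i in range(len(frames) - 2, -1, -1):          # frame n-1 down to the first frame
        frame = frames[i]
        # 1. color-classify every pixel; mark presenter pixels in a mask
        mask = np.zeros(frame.shape[:2], np.uint8)
        for yx in np.ndindex(frame.shape[:2]):
            b, g, r = frame[yx]                       # OpenCV stores pixels as BGR
            label, _ = classify(arrays, r, g, b)
            if label == presenter_idx:
                mask[yx] = 255
        # 2. morphological clean-up: remove speckle, then grow over unclassified chalk
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.dilate(mask, kernel)
        # 3. replace masked (non-board) pixels with the same pixels from frame i+1
        frame[mask > 0] = frames[i + 1][mask > 0]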

Once this process is complete, there is a new stream of video that contains a “clean” shot of the board at every frame during the video. This means that anything written on the board will be visible, regardless of where the presenter is standing. Because the future information is used, marks which have not yet been written may also show up. This stream will be used for further processing of the video.

Figure 1 shows a frame before and after segmentation. In the segmented image, the area that is obstructed by the lecturer can be seen. In addition, what the presenter is about to write is visible.


Figure 1: (Left) Original Image (Right) Segmented and In-painted image

5.2 Finding Strokes

For my purposes, I define a stroke to be the result of writing on or erasing a part of the board over a small period of time. A stroke is roughly equivalent to a single marking on the board. Several strokes collected together form a region. The method for finding the strokes is based on frame differences; however, additional processing is necessary to correctly recognize marks in the image.

Since the presenter tends to write slowly and the marks are very small, if I compare two frames that are too close together in time, there will either be no change or such a small amount of change that it is too difficult to distinguish the marks from noise. If I compare frames that are too far apart, I run the risk of missing something or collecting information that does not accurately reflect what actually happened. I have found that looking at one frame every 4 to 6 seconds (120 to 180 frames) allows a reasonable amount of time for a change to occur but is not so long that important information is missed.

I first subtract two frames that are spaced far enough apart. The second step is to apply a threshold to the difference image. The result is a binary image where an "on" pixel in the board area indicates that there was some change between the current image and the last image. Third, I perform morphological operations on the result to attach pixels that are close together. I then apply connected components to the image and take the largest component to be the stroke. If there are no components, then there are no strokes.

If there is a largest component, then a stroke has been found. In order to determine whether a stroke represents writing or erasing, I perform high pass filtering on the original segmented frame of video in the area where the stroke occurs. If there are very few high frequencies, I mark the stroke as "erased"; otherwise, I mark it as "written." Finally, I store all the information associated with the stroke, such as type, time, and dimensions, for processing in the next step.
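
A compressed sketch of this stroke test, written with OpenCV in place of the pyImage/ImageMath calls in Appendix A, is shown below; the step size, the thresholds, and the Laplacian used as the high-pass filter are stand-in choices.

import numpy as np
import cv2

STEP = 150          # compare frames about 5 seconds apart (at 30 frames per second)
DIFF_THRESH = 40    # assumed binarization threshold for the difference image
HF_THRESH = 10.0    # assumed cutoff for "very few high frequencies"

def find_stroke(prev_frame, cur_frame):
    # returns ((x, y, w, h), kind) for the largest changed component, or None
    diff = cv2.absdiff(cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY))
    _, binary = cv2.threshold(diff, DIFF_THRESH, 255, cv2.THRESH_BINARY)
    # morphological closing attaches nearby changed pixels into one blob
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, np.ones((9, 9), np.uint8))
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    if n < 2:                                          # label 0 is the background
        return None
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    x, y, w, h = stats[largest, :4]
    # high-pass test on the segmented frame: writing leaves high frequencies, erasing does not
    patch = cv2.cvtColor(cur_frame[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    high_freq = np.abs(cv2.Laplacian(patch, cv2.CV_64F)).mean()
    kind = "written" if high_freq > HF_THRESH else "erased"
    return (x, y, w, h), kind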

Figure 2 shows three strokes that were extracted from the segmented video. A bounding rectangle has been placed around each one.


Figure 2: Three consecutive strokes found from the video

5.3 Building Regions

The output of the first two steps (segmentation/refilling and stroke finding) is used in conjunction with heuristics from prior observations in order to build the regions based on the model described in the previous sections. An important heuristic is that all of the writing for one region occurs close together both in time and space. Therefore, I group strokes together as a region if the strokes are spatially close and do not occur over a large time span. The second observation is that a region takes some amount of time to form and mature. This means that a region must be comprised of at least two strokes or must grow for some amount of time; it cannot be born mature. The final heuristic has to do with the erasing of a region. I have noticed that small pieces of regions are generally erased to make corrections and that most (if not all) of a region is erased in order to "kill" it. With this in mind, when an erase stroke occurs, I check whether most of the region has been erased; otherwise, I consider the erasure to be a form of growing.

I use the following algorithm for processing the blocks and building regions:

1. For each stroke, check whether it is a write stroke or an erase stroke and perform step 2 or 3 accordingly

2. If it is a write stroke:

a. check if the current stroke overlaps the existing region that is growing (if any); if the stroke overlaps the region, extend the existing region and update the "maturity time" of the region to the time that the current stroke occurred

b. if the stroke does not overlap the region or there are no growing regions, create a new region with the time of birth and maturity set to the time that the stroke occurred

3. If the stroke is an erase stroke:

a. check if this stroke overlaps with any existing regions.

b. if the stroke does overlap with a region, check if the entire region was erased.

c. if the entire region has been erased, mark the time the stroke occurred as the time the region dies.

d. otherwise, just part of the region was erased, likely to make a correction. If the region is still maturing, update the maturity time for that region to be the time the stroke occurred.

4. Discard any region that has the same birth and maturity time.

The test used to determine if an entire region is erased is similar to the test used to determine if a stroke is writing or erasing. A high-pass filter is performed on the region from the frame of the segmented video that corresponds to the time when the stroke occurred. If there are almost no high frequencies, then the test returns true; otherwise it returns false. Figure 3 places a box around the two regions that were found by that point in the video.
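
The following sketch strings these steps together, using the Region structure sketched in section 4 and assuming each stored stroke is a ((x, y, w, h), frame, kind) record; the overlap test and the mostly_erased callback (which would wrap the high-pass test just described) are simplified stand-ins rather than the actual implementation, which appears in Appendix B.

def rects_overlap(a, b):
    # simple axis-aligned bounding box overlap test
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def build_regions(strokes, mostly_erased):
    # strokes: list of ((x, y, w, h), frame, kind); mostly_erased: callable high-pass test
    regions = []
    growing = None                                    # at most one region grows at a time
    for rect, frame, kind in strokes:
        if kind == "written":
            if growing is not None and rects_overlap(rect, growing.bbox):
                growing.grow(tuple(rect) + (frame,))  # step 2a: extend and update maturity
            else:
                growing = Region(tuple(rect) + (frame,))  # step 2b: a new region is born
                regions.append(growing)
        else:                                         # an erase stroke
            for region in regions:
                if region.death is None and rects_overlap(rect, region.bbox):
                    if mostly_erased(region, frame):  # step 3c: the whole region is gone
                        region.kill(frame)
                    elif region is growing:           # step 3d: a correction while growing
                        region.mature = frame
    # step 4: discard regions that never grew past their first stroke
    return [r for r in regions if r.mature != r.birth]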

The code for the implementation of the stroke and region finding is included at the end of this paper.


Figure 3: Image showing 2 regions that were found on the chalkboard

6. Algorithm for Gesture Recognition

The gesture recognition algorithm implemented for this project is based on template matching. The user selects several key frames, or templates, from the video; each template represents a pose that the presenter can be in while making a gesture of interest. Unknown frames can then be compared against the set of templates in order to determine which gesture (if any) each frame corresponds to.

The classroom environment introduces several issues that make it difficult to directly compare unknown frames against the template images. First, the background of the classroom is not constant: there is a chalk or marker board which contains writing, making the background dynamic (i.e. a background model will not suffice). Next, the exact location of the presenter in the frame is not known. Finally, the presenter wears a different outfit each day that data is collected, making direct template matching difficult.

In order to solve these problems, I make use of the segmented video that was produced during the region finding step of the process. Recall that the segmented video contains a "clean shot" of the board at every frame. This means that if the original video is subtracted from the segmented video, the difference pixels will represent the presenter. This idea is used both when generating the templates and when recognizing gestures.

The following algorithm is used to generate a template image from a user selected key frame:

1. Subtract the original frame from the segmented frame

2. Threshold the resultant image to create a binary output where an "on" pixel means presenter

3. Place a bounding rectangle around the "on" pixels and crop the output image to that rectangle

4. Resize the output image to be 256x256

5. Place the output image into the set specified by the user
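
Below is a sketch of steps 1 through 4 with OpenCV and NumPy, rather than the library used for the actual implementation; the threshold on the difference image is an assumed placeholder.

import numpy as np
import cv2

TEMPLATE_SIZE = (256, 256)
PRESENTER_THRESH = 30    # assumed threshold on the frame difference

def make_template(original, segmented):
    # returns a 256x256 binary image of the presenter, or None if no presenter pixels are found
    # 1. subtracting the segmented ("clean board") frame isolates the presenter
    diff = cv2.absdiff(cv2.cvtColor(original, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(segmented, cv2.COLOR_BGR2GRAY))
    # 2. threshold so that "on" pixels mean presenter
    _, binary = cv2.threshold(diff, PRESENTER_THRESH, 255, cv2.THRESH_BINARY)
    ys, xs = np.nonzero(binary)
    if len(xs) == 0:
        return None
    # 3. crop to the bounding rectangle of the "on" pixels
    crop = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # 4. resize to a fixed size so that all templates and unknown frames are comparable
    return cv2.resize(crop, TEMPLATE_SIZE, interpolation=cv2.INTER_NEAREST)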

The image is resized in order to make the template matching easier in later steps. Since the camera parameters are constant in both the template set and the unknown images, the resizing will not make a difference in terms of the final output; if a template image is distorted, a matching gesture will be distorted in the same way. Figure 4 shows two example key frame images (a pointing and a reaching gesture) and Figure 5 shows the corresponding binary template images.


Figure 4: Two example key frame images. (left) Pointing gesture (right) Reaching gesture


Figure 5: Binary Template images

After enough templates of each gesture of interest have been identified, a mask image is created for each gesture based on its template images. An unknown frame can then be compared against the mask of each gesture to classify the frame. In order to create the mask for a gesture, the following algorithm is used:

1. create an empty 256x256 image (i.e. 0 at every pixel location)

2. for each template in the gesture set:

a. for every “on” pixel in the template image, increment the mask at that location

An unknown frame can be compared with the mask image by:

1. perform steps 1-4 of the template generating algorithm on the unknown frame

2. for every “on” pixel in the unknown frame, increment a score by the amount of the corresponding pixel value in the mask image

3. calculate a confidence for the gesture by dividing the score by the total of all pixel values in the mask image

The unknown frame belongs to a gesture set if (a) the confidence is greater than 50% and at least 1% larger than the confidence of the other gestures; or (b) the confidence is at least 15% greater than the confidence of the other gestures. If neither condition holds, then the frame does not match any gesture.
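
Continuing the sketch, the mask accumulation and this classification rule might look like the following, assuming the 256x256 binary images produced by make_template above; the function names are illustrative.

import numpy as np

def build_mask(templates):
    # sum the binary templates of one gesture into a single vote mask
    mask = np.zeros((256, 256), np.float64)
    for t in templates:
        mask += (t > 0)
    return mask

def gesture_confidence(frame_template, mask):
    # score: sum of mask values under the frame's "on" pixels, normalized by the whole mask
    score = mask[frame_template > 0].sum()
    return score / mask.sum()

def classify_gesture(frame_template, masks):
    # masks: dict mapping gesture name -> mask; returns a gesture name or None
    conf = {name: gesture_confidence(frame_template, m) for name, m in masks.items()}
    best = max(conf, key=conf.get)
    others = [c for name, c in conf.items() if name != best]
    margin = conf[best] - max(others) if others else conf[best]
    if (conf[best] > 0.50 and margin >= 0.01) or margin >= 0.15:
        return best
    return None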

While overall this algorithm works quite well, individual frames can sometimes be misclassified. Since gestures do not happen in isolation, temporal filtering can be applied over the entire video sequence: first, every frame is individually classified; next, each frame is reclassified based on the majority of its neighboring frames. This greatly improves the results.
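
A small sketch of such a majority filter, with an assumed window of 15 frames on each side:

from collections import Counter

def temporal_filter(labels, radius=15):
    # relabel each frame by the majority vote of its neighborhood (None means no gesture)
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius):i + radius + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed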

For this project, the above algorithms were implemented with two gesture sets, pointing and reaching. The following figures show the output of the system. In the bottom left hand corner of each image is the gesture label followed by the pointing and reaching confidences. The implementations of these algorithms are at the end of this paper. Figure 6 shows three results of the gesture recognition system; the first number indicates the "pointing" confidence and the second number indicates the "reaching" confidence.


Figure 6: Results of the gesture recognition (top left) Ground or unknown gesture (top right) Pointing gesture (bottom) Reaching gesture

7. Combining Regions and Gesture Information

As stated in section 3 of this paper, the main motivation for this project is its use in Virtual Videography, an automated video editing system. The main goals of such a system are to determine the most important thing happening at a given moment and to determine the best way to convey that information. Using regions alone, the focus models that a system can use are quite limited. In general, important information will be determined by the age of the region (i.e. the most recent region is the most important one). Likewise, gestures alone do not tell the full story either: a pointing gesture gives the general direction of some important information, but the exact location is unknown.

By combining the region and gesture information, the gesture information can be improved and verified. Frames corresponding to the creation of strokes (see section 5.2) can be relabeled as "writing." Since writing should generally appear as either reaching or pointing, the accuracy of the system can be checked by verifying that the writing frames were actually labeled as one of those gestures. Likewise, if there is a pointing gesture but no regions in the direction of the point, this can imply that the pointing gesture was detected in error.

Combining the gestures with the regions can also improve the knowledge of what in the lecture is important at a given time. Consider, for example, that the presenter points. Using the region information can dramatically reduce the area of the board that is important, down to only the regions that exist in the direction of the pointing gesture. With gesture information alone, it would not be possible to know which part(s) of the board could potentially be important.
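
One possible way to combine the two streams is sketched below, assuming the pointing direction has been reduced to "left of" or "right of" the presenter's bounding box; this is purely illustrative and is not part of the implemented system.

def candidate_regions(regions, frame, presenter_box, points_right=True):
    # regions alive at `frame` that lie in the pointing direction from the presenter
    px, py, pw, ph = presenter_box
    live = [r for r in regions
            if r.birth <= frame and (r.death is None or r.death > frame)]
    if points_right:
        return [r for r in live if r.bbox[0] >= px + pw]
    return [r for r in live if r.bbox[0] + r.bbox[2] <= px]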

8. Conclusion

Gestures are a natural way that people use for non-verbal communication. They can be a valuable tool in automatically extracting information from a video. In this project I used gesture recognition to extract information from a classroom lecture in order to automatically create an edited version of the lecture. To do this, I implemented a template based approach that works regardless of the changing background and the clothing worn by the presenter. This information was then combined with "board regions" in order to verify and improve the results of both the gesture recognition and the overall knowledge of the lecture and what important events are happening.

References

[Fels93] S. S. Fels and G. E. Hinton. Glove-Talk: A Neural Network Interface Between a Data-Glove and a Speech Synthesizer. IEEE Transactions on Neural Networks, 4:2-8, 1993.

[Gleicher00] Michael Gleicher and James Masanz. Towards virtual videography. In Proceedings ACM Multimedia 2000, November 2000.

[Gleicher02] Michael L. Gleicher, Rachel M. Heck, and Michael N. Wallick. A framework for virtual videography. In Proceedings SmartGraphics 2002, June 2002.

[Jones98] Michael J. Jones and James M. Rehg. Statistical color models with application to skin detection. Technical Report CRL 98/11, Cambridge Research Laboratory, December 1998.

[Martin99] Jerome Martin, Daniela Hall, and James L. Crowley. Statistical Gesture Recognition Through Modelling of Parameter Trajectories. In Gesture Workshop, Lecture Notes in Computer Science, Vol. 1793, pages 129-140, 1999.

[Onishi00] M. Onishi, M. Izumi, and K. Fukunaga. Blackboard segmentation using video image of lecture and its applications. In ICPR 2000, Vol. IV, pages 615-618, 2000.

[Watson93] Richard Watson. A survey of gesture recognition techniques. Technical Report TCD-CS-93-11, Department of Computer Science, Trinity College Dublin, 1993.

Appendix A: Python Stroke Finding Code

#############################################
# This program finds the basic rectangles   #
# or strokes from the chalk/marker board    #
# RUN THIS SCRIPT FIRST                     #
#############################################

import pyImage
import ImageMathpy
import VideoMonster

ImageMath = ImageMathpy

YES = 1
NO = 0
RADIUS = 10              # border (in pixels) trimmed off the region of interest
MAXSIZE = 2000           # largest plausible stroke area, in pixels
MINSIZE = 100            # smallest plausible stroke area, in pixels
FPS = 30
STEPSEC = 3
STEPSIZE = FPS*STEPSEC   # compare frames STEPSIZE frames apart

# Grab frame i from the segmented video, high-pass it, difference it against the
# previous comparison image, clean up the result, and return the bounding rectangle
# of the change plus a flag saying whether its size is plausible for a stroke.
def compareImage( avis, img, img2, out, out2, roi, i ) :
    avis.getFrame(i)
    avis.cache.copy( img )
    ImageMathpy.gaussBlur( img, out, 5 )
    out.copy( img )
    ImageMath.highPass( img, out )
    ImageMath.subtractB( out, img2, out2, 100 )
    ImageMath.erode( out2, out2, 2 )
    ImageMath.open( out2, out2, 5 )
    ImageMath.dilate( out2, out2, 2 )
    #pyImage.imageio_write( out2, "Bahh%05d.jpg"%(i) )
    out2.setROI( roi[0]+RADIUS, roi[1]+RADIUS, roi[2]-RADIUS, roi[3]-RADIUS )
    r = ompR( out2 )
    #r = ImageMath.findRect( out2 )
    #out.copy( img2 )
    isgood = 1
    if r.getVal(2)*r.getVal(3) > MAXSIZE : isgood = 0
    if r.getVal(2)*r.getVal(3) < MINSIZE : isgood = 0
    return [r, isgood]

def function(foo) :
    print "Starting: Test"
    avi = foo.avi
    avis = foo.aviSeg

    # get the region of interest (the board area)
    roi = []
    for i in range(0, 4) :
        roi.append( foo.glwin.get1ROI(i) )
    if( roi[0] == -1 or roi[2] == 0 ) :
        print "No Region Of Interest Set"
        print "Ending: Test (failed!)"
        return
    roi[2] = roi[2]-roi[0]
    roi[3] = roi[3]-roi[1]

    f = open( "regions.txt", "w" )
    f.write( "%d\n"%(avi.nframes()) )
    f.write("frame#, diffpixels, x, y, w, h, area, C/D \n")

    img = pyImage.ITImage( 720, 480, 3, 8 )
    img2 = pyImage.ITImage( 720, 480, 3, 8 )
    img3 = pyImage.ITImage( 720, 480, 3, 8 )
    out = pyImage.ITImage( 720, 480, 3, 8 )
    out2 = pyImage.ITImage( 720, 480, 1, 8 )

    avis.getFrame(0)
    avis.cache.copy(img2)
    avis.cache.copy(out)

    for i in range(10, avi.nframes(), STEPSIZE) :
        print i,
        cmp = compareImage( avis, img, img2, out, out2, roi, i )
        #cmp = [r, isgood]
        if ( not cmp[0].isZero() and i!=10 and cmp[1] ) :
            # now lets see if we can move backwards to when the change started
            j = i-10
            cmp = compareImage( avis, img, img2, out, out2, roi, j )
            while( not cmp[0].isZero() and j>i-(STEPSIZE) ) :
                j = j-10
                avis.getFrame(j)
                cmp = compareImage( avis, img, img2, out, out2, roi, j )
                print j,
            avis.getFrame(i)
            cmp = compareImage( avis, img, img2, out, out2, roi, i )
            tup = [ cmp[0].getVal(0), cmp[0].getVal(1), cmp[0].getVal(2), cmp[0].getVal(3) ]
            # high-pass the stroke area to decide whether the change is writing or erasing (the C/D column)
            temp = pyImage.ITImage( img.getWidth(), img.getHeight(), 3, pyImage.IT_UNSIGNED_CHAR )
            ImageMath.highPass( avis.cache, temp )
            temp.convert( 1, pyImage.IT_UNSIGNED_CHAR )
            ImageMath.threshold( temp, temp, 100 )
            create = 0
            if( ImageMath.countOnI( temp ) != 0 ): create = 1
            tot = ImageMath.countOnI( out2 )
            if tot > 25 :
                f.write ("%d %d %d %d %d %d %d %d\n" %(j, tot, tup[0], tup[1], tup[2], tup[3], tup[2]*tup[3], create ) )
            ImageMath.drawRect( avis.cache, temp, cmp[0] )
            del temp
        print
        out.copy( img2 )

    f.close()
    del img
    del img2
    del out
    del out2
    print "Ending: Test"

Appendix B: Python Region Finding Code

#############################################
# This is a script to read the rectangles   #
# file and determine interest regions       #
# RUN THIS PROGRAM SECOND!                  #
#############################################

import ImageMathpy          # needed by isClose (mirrors the stroke-finding script)
from math import sqrt

ImageMath = ImageMathpy

################################ Definitions ################################

YES = 1                 # Simple "yes"
NO = 0                  # and "no"

OUT = 0                 # outside a region
OVER = 1                # overlaps a region, but not contained
INSIDE = 2              # completely inside a region

ADDED = 0               # added something to the list
ALTERED = 1             # altered an entry in the list

NOTFOUND = 0            # a block was not found
FOUND = 1               # a block was found

TIMESECS = 45           # the number of seconds in temporal spacing; set this, not TIMEMAX
TIMEMAX = TIMESECS*30   # the max temporal space (in frames)

ALLOWDIST = 50          # the number of pixels allowed in the "closeness" test
RADIUS = 50             # the number of pixels to extend regions by (creating a safety border)

################################# Functions #################################

# takes a string and returns a list of the numbers in the string
# note: assumes that the string contains only numbers!
def lineList( str ) :
    li = str.split()
    for i in range(len(li)) :
        li[i] = int(li[i])
    return li

# takes two rectangles and returns the Euclidean distance between their centers
def dist( re1, re2 ) :
    # figure out the two corner points of each rectangle
    r1_x1 = re1[0]              # upper left, rectangle 1
    r1_y1 = re1[1]
    r1_x2 = re1[2] + re1[0]     # lower right, rectangle 1
    r1_y2 = re1[3] + re1[1]
    r2_x1 = re2[0]              # upper left, rectangle 2
    r2_y1 = re2[1]
    r2_x2 = re2[2] + re2[0]     # lower right, rectangle 2
    r2_y2 = re2[3] + re2[1]
    pt1_x = (r1_x2+r1_x1)/2.0
    pt1_y = (r1_y2+r1_y1)/2.0
    pt2_x = (r2_x2+r2_x1)/2.0
    pt2_y = (r2_y2+r2_y1)/2.0
    return sqrt( (pt1_x-pt2_x)*(pt1_x-pt2_x) + (pt1_y-pt2_y)*(pt1_y-pt2_y) )

# Tells if two rectangles are "close" to each other, as defined by ALLOWDIST
def isClose( re1, re2 ):
    r1 = ImageMath.rect( re1[0], re1[1], re1[2], re1[3] )
    r2 = ImageMath.rect( re2[0], re2[1], re2[2], re2[3] )
    if ImageMath.isNearby( r1, r2, ALLOWDIST ) : return YES
    else : return NO

# Goes through the activeList from i+1 and checks if "i" should be "merged" with another region
# will return (YES, #) if i should be merged with # and (NO, 0) otherwise
# This function preserves temporal spacing as well
def isCloseL( i, activeList, birthList, matureList ):
    if i > len(activeList)-1: return        # safety check
    reg = activeList[i]
    for j in range( i+1, len(activeList) ) :
        if birthList[j]-matureList[i] ...