1. The Problem - Microsoft Azure

嚜澤 Purely Relational Way of Computing Neighbors on a Sphere

Jim Gray

July 2002

Abstract: Relational operators and a zoned index accelerates spatial lookups by about

8-fold and spatial joins by about 32-fold when compared to a technique based on an

external stored procedure.

1. The Problem

o.xyz?xyz

SkyServer applications and queries often need to find all objects nearby a given object in

the celestial sphere. This is such a common operation that we implemented a series of

table-valued functions fGetNearbyObjEq and fGetNearbyObjXyz that return all

objects within a certain radius of a given equatorial point (right ascension and

declination) or a given x,y,z unit vector in the J2000 coordinate system. These functions

use an HTM index to limit the search to about 15 bounding HTM triangles. Then they

filter the objects using the following equation to compute the actual distance (between

object o with celestial coordinates o.cx, o.cy, o.cz and the point x,y,z:

DistanceInDegrees = degrees(2℅asin(sqrt((o.cx-x)2+(o.cy-y)2+ (cz-z)2))/2)). (1)

In fact the SkyServer has a dozen

neighbors functions that return nearby or

sin(牟/2) = |o.xyz?xyz|/2

xyz

nearest objects and some of their attributes

牟/2



每 but all these functions are based on the

o.xyz

core HTM-then-filter algorithm 每 indeed

they all call one core procedure that has

this logic.

Figure 1: construction for equation (1).

xyz and o.xyz are unit vectors.

Some queries want to compare each of

several hundred million objects with all their neighbors. Searchers for gravitational

lenses and for clusters are examples of such queries. To speed these queries the

SkyServer precomputes the Neighbors Table that for each object lists all its neighbors

within 30 arcseconds along with summary attributes (type and isPrimary). This table

averages about 9 neighbors per object; but, some objects have hundreds of neighbors and

some have none.

Computing the neighbors table using the fGetNearbyObjectsXyz() function can take a

long time: on the fifteen million object SDSS early data release, the computation took 56

hours 每 or about 74 objects per second. Fortunately, the computation was done only a

few times in the load process and then used many times in queries. But, it was obvious

that some speedup will be needed as the SDSS database grows twenty-fold over the next

3 years. Indeed, with the SDSS Data Release 1, the database is about to grow six-fold so

the naive computation would take about 2 weeks1. The full dataset would take about 2

months to compute.

1

This performance is measured on a dual 800 MHz Xeon processor with 2GB ram and 12 disks with

150MB/s bandwidth. All later measurements are performed on a 722MHz Pentium III with 0.5 GB of ram

and 1 disk with 10 MB/s bandwidth.

1

The computation is embarrassingly parallel and cpu-bound. Each object*s neighbors can

be computed independently and the computation is cpu-bound. So, the computation

could be accelerated by using multiple processors. A 7-node processor farm could do

the DR1 job in 2 days. But, it makes sense to analyze the problem and look for better

algorithms.

The basic problem is that SQL can evaluate equation (1) at the rate of about 170,000

records per second per cpu (5.6 ?s per row) while the HTM functions run at about 170

records per second per cpu (5.9 ms per row to return the nearest object.) This is a 1,000:1

performance difference. The high costs of the HTM functions is a combination of the

HTM procedures, the expensive linkage to SQL via external stored procedures (a string

interface), and the use of table-valued functions. It appears, based preliminary timing

tests (a thousand calls to itest5() of the sqlExample HTM procedures) that the HTM

code uses about 3 ms and that the other costs (linkage, string conversion, and tablevalued function are in the range of 2ms). It will be fruitful to investigate why the HTM

code consumes so much cpu time 每 but that code is opaque to me. The linkage costs will

be much reduced with the SQLserver integration with C# in 2003.

2. A Zoned-join Accelerates the Near Functions

An alternative is to use SQL operations to limit the search rather than using the HTM

procedures. By doing this we achieve about a three-fold speedup for the table-valued

functions. This same idea when applied to computing the neighbors functions gives

about a 32-fold speedup, tuning a 14-day neighbors-table computation into a 9-hour job.

Figure 2: The division of the sphere into 12 zones (in practice there

are thousands of zones). Two circular neighborhoods are shown, one

inside one zone (minZone=maxZone) and another crossing 3 zones

(minZone+2= maxZone.) The dotted boxes show how the ra filter

and the dec filter further reduce the search. The ra filter needs to be

※expanded§ by 1/cos(dec).

The basic idea is to map the celestial sphere into zones, each zone is a declination stripe

of the sphere with some zoneHeight (see Figure 2). Number the south-pole zone as zone

zero. Then an object with a declination of dec degrees will be in zone:

zoneNumber = floor((dec+90) /zoneHeight)

(2)

There will be ceiling(180/zoneHeight) zones. The following code builds the zone table.

create table zone (zone int, objID bigint, ra float, dec float,

(3)

x float, y float, z float,

primary key (zone, ra, objID))

insert into zone

select floor((dec+90) /zoneHeight), ra, dec, cx, cy, cz

from PhotoObj

If we search for all objects within a certain radius of point (ra, dec) then we need only

look in certain zones, and only in certain parts of each zone. Indeed, to find all objects

2

within radius R of point RA, DEC2, one only consider zones between

maxZone = ceiling ( (DEC+90+R)/zoneHeight)

(4)

minZone =floor( (DEC+90-R)/zoneHeight)

and within these zones one only need consider objects o with

o.ra between RA-R and RA+R

(5a)

(modulo some cos() and mod(360∼) corrections added below (see (5)).

This way of limiting the search is similar to the HTM approach but avoids calling an

external procedure. The primary key on zones makes this lookup very fast, so that the

resulting procedure has the performance given in Figure 3. This incidentally shows that

table-valued functions can be quite fast.

1000

2.00

Rows vs elapsed time

fit is 1.46+2.2e-4*r^2 ms/asec

Relative time vs zone height (asec)

4 minute zone is near optimal

2 & 8 minute are slower

7 asec

15 asec

30 asec

1 amin

2 amin

4 amin

16 amin

1 degree

1.90

1.80

7.5 asec

15 asec

30 asec

1 amin

2 amin

4 amin

64 amin

r^2 fit

10

1.70

time vs best

time (ms)

100

1.60

1.50

1.40

1.30

1.20

1.10

1.00

1

10

100

r (asec)

1000

10

100

r (asec)

1000

Figure 3: At left is time vs. radius for neighbors function for various zone heights. It

suggests that any small zone height is adequate. The graph at right plots time vs. best

time for each radius. It indicates that a zoneHeight of 4 arcminutes is near-optimal and

that it outperforms smaller and larger radii by large factors.

There are some nasty details that need a bit of extra mechanism. The biggest problem is

that the sphere is round, so equations (4) and (5) must be computed modulo 360∼, and

equation 5a must be corrected for the fact that the right-ascension is ※compressed§ by

cos(dec) as it moves away from the equator. So ra should be divided by cos(dec)+汍

where 汍 is a tiny number added to prevent division by zero when dec is ㊣90∼.

Fortunately, equation 4 needs no correction, but equation (5a) should become

o.ra between (ra-R)/ (cos(dec)+汍) and (ra+R)/(cos(dec)+汍)

(5)

One other detail is that when one is near the prime meridian (ra = 0∼ or ra = 360∼) then

the ※other end§ of the range is nearby. This problem is solved adding these neighbors

into the zone as the margins with ra less than zero for the left margin and ra greater than

360∼ for the right margin. The main area is the ? open interval [0∼,360∼) and the margins

must respect this ? open property. Assuming a MaxRadius of 1∼ and epsilon is 1e-6:

2

We assume RA and DEC have been normalized to be in the ranges [0∼, 360∼] and [-90∼,90∼] respectively.

3

Insert into zone

-- right margin, notice +360 on ra

select floor((dec+90) /@zoneHeight), ra+360, dec, x, y, z

from PhotoObj

where ra >= 0

and ra < @MaxRadius/(cos(radians(dec))+@epsilon)

Insert into zone

-- left margin, notice -360 on ra

select floor((dec+90) /@zoneHeight), ra-360, dec, x, y, z

from PhotoObj

where ra >= 360每@MaxRadius/(cos(radians(dec))+@epsilon)

and ra < 360

Now, equation (5) is actually correct and finds all neighbors within the zone. The full

query to select the neighbors within @r of @ra and @dec from a zone is:

select o1.objID

from zone o1

where o1.zoneID = @zoneID

-- force the zone

-- using zone number and ra interval

-- scaled by cos(abs(dec)) (positive cos)

and o1.ra between @ra - @r/(cos(radians(abs(@dec)))+ @notZero)

and @ra + @r/(cos(radians(abs(@dec)))+ @notZero)

and o1.dec between @dec-@r -- quick filter on dec

and @dec+@r

and (2*degrees(asin(sqrt(

power(o1.cx-@cx,2)+power(o1.cy-@cy,2)+power(o1.cz-@cz,2) )/2)))

< @r -- careful filter on distance

This statement combined with the minZone and maxZone logic of equation (4) gives the

performance described in Figure 3 for a table-valued function finding neighbors nearby a

point. This is the 7x speedup over the HTM external procedures. The full statement

handling this zone logic is:

select o1.objID

from zone o1

where o1.zoneID between

-- force the zone

-- using zone number and ra interval

floor((@dec+90-@r)/@zoneHeight) and

floor((@dec+90+@r)/@zoneHeight)

and o1.ra between @ra - @r/(cos(radians(abs(@dec)))+ @notZero)

and @ra + @r/(cos(radians(abs(@dec)))+ @notZero)

and o1.dec between @dec-@r -- quick filter on dec

and @dec+@r

and (2*degrees(asin(sqrt(

power(o1.cx-@cx,2)+power(o1.cy-@cy,2)+power(o1.cz-@cz,2) )/2)))

< @r -- careful filter on distance

Expressed in this format rather than a nested loop

exactly specifying each zone and ra range, the

optimizer is unlikely to pick the right plan.

One can further accelerate the test by observing that

@ra-@r and @ra+@r is too ※fat§ a band for any zone

except the one holding the center of the circle (see

Figure 4 .

These changes turn the 2-week build-the-neighbors

job into a 2-day job. But, there is one more speedup.

r

ra-zoneMax

x

      

  

 

         

zoneMax

Ra ㊣ x

Figure 4: when a circle crosses multiple

zones, the ra range of the adjacent zones can

be reduced from r to x as shown in the

diagram. For higher zones, zoneMin replaces

zoneMax.

4

3. A Zoned-Nested-Loops-Join Accelerates Building Neighbors

When one wants to batch-compute many neighbors, the zone approach can bypass the

stored procedure and get another 10-fold speedup (or more if parallelism is used) as

follows. We can join each zone with itself and then with its north and south neighbor

zones. These three joins all use the relational operators with automatic parallelism and

with some very sophisticated optimizations. This bypasses much of the transact-SQL

logic in the original algorithm.

The basic join to compute the neighbors is:

insert neighbors

-- insert one zone's neighbors

select o1.objID as objID,

o2.objID as NeighborObjID,

.. other fields elided

from zone o1 inner loop join zone o2

-- force a nested loop

on o1.zoneID-@deltaZone = o2.zoneID -- using zone number and ra

and o2.ra between o1.ra - @r/(cos(radians(abs(o1.dec)))+ @notZero)

and o1.ra + @r/(cos(radians(abs(o1.dec)))+ @notZero)

where (

o1.ra >= 0 and o1.ra < 360

-- objects not both marginal

or o2.ra >= 0 and o2.ra < 360 )

and o1.objID < o2.objID

-- do 1/2 the work now

and o2.dec between o1.dec-@r and o1.dec+@r -- quick filter on dec

and (2*DEGREES(ASIN(sqrt(

power(o1.cx-o2.cx,2)+power(o1.cy-o2.cy,2)+power(o1.cz-o2.cz,2)

)/2))) < @r

-- careful filter on distance

this is done for @deltaZone in {-1, 0, 1}. The insert-join above does only ? the work,

finding only objects where o1.objID ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download