If you have some location based data, such as point of interests (POI) data in your database, one common query you want to have is to search POIs within an area, typically a circle defined by a center point and a radius. This is so called spatial search. To increase the speed of spatial search in a relational database, the first thing you do is to index the latitude and longitude columns. The next thing is to write an efficient stored procedure for the job - a topic of this post.

To determine if a point is within a circle, you need to calculate the distance between the point and the search center. An accurate distance formular can be quite complex considering the Earth's curvature and irregular shape, and thus takes a lot of time to compute, especially if you have millions of POIs in your database. However, if the search area is not very big, as in most real world cases, a simplified distance formular is good enough:

Based on the above formular, the spatial search stored procedure can be written as:

CREATE PROCEDURE [dbo].[GetPOIs_Circle]

@latitude float,

@longitude float,

@radius float,

@catCode varchar(10),

@maxReturns int

AS

BEGIN

SET NOCOUNT ON;

Declare @R float -- Earth's radius in miles

Set @R = 3958.755866

Set ROWCOUNT @maxReturns

Select * From

(

Select POIName, Address, City, State, Country, Zip, Phone, Website, Latitude, Longitude,

@R * radians(sqrt(square(Latitude - @latitude) +

square((Longitude - @longitude) * cos(radians((Latitude + @latitude)/2))))) As Distance

From POIs

Where CatCode = @catCode

) As a

Where a.Distance < @radius

Order By a.Distance

END

GO

The stored procedure will return no more than *maxReturns* POIs of specified category within the specified circular area, sorted by the distance from the search center. This stored procedure works but it is not optimal. It has to calculate many distances before it can narrow down to the POIs whose distances from the search center are smaller than the radius. A quick way to narrow down potential POIs is to search within a square that the search circle inscribes. Once these POIs are found, they can be further narrowed down to within the search circle using the distance formular. So the modified stored procedure is:

CREATE PROCEDURE [dbo].[GetPOIs_SquareCircle]

@latitude float,

@longitude float,

@radius float,

@catCode varchar(10),

@maxReturns int

AS

BEGIN

SET NOCOUNT ON;

Declare @R float -- Earth's radius in miles

Set @R = 3958.755866

-- Find bounding box

declare @minLat float

declare @minLon float

declare @maxLat float

declare @maxLon float

declare @sita float

declare @corr float

set @sita = degrees(@radius/@R)

set @corr = 1/cos(radians(@latitude))

set @minLat = @latitude - @sita

set @maxLat = @latitude + @sita

set @minLon = @longitude - @sita * @corr

set @maxLon = @longitude + @sita * @corr

Set ROWCOUNT @maxReturns

Select * From

(

Select POIName, Address, City, State, Country, Zip, Phone, Website, Latitude, Longitude,

@R * radians(sqrt(square(Latitude - @latitude) +

square((Longitude - @longitude) * cos(radians((Latitude + @latitude)/2))))) As Distance

From POIs

Where CatCode = @catCode and

(Latitude > @minLat and Latitude < @maxLat) and

(Longitude > @minLon and Longitude < @maxLon)

) As a

Where a.Distance < @radius

Order By a.Distance

END

GO

This version of the stored procedure first calculates the bounding box of the search circle and then applies it to the *Where* clause of the *Select* statement. Now let's compare the efficiency of the two stored procedures using the Actual Execution Plan tool in SQL Server 2005.

The above table showes the estimated CPU costs when both stored procedures are used for the same search - search the Golf Course category within 100 miles radius from downtown Los Angeles. The main POIs table contains some 14 million records. As you can see, the overall improvement of the second stored procedure over the first one is about 32% in terms of CPU usage. The estimated I/O costs are identical for both stored procedures, and thus are not included in the table. Percentage wise the biggest improvement is the Compute Scalar operator, which in our case is the evaluation of the distance formular. As expected, the second stored procedure avoids significant amount of distance calculations because of the square (bounding box) filtering. This also suggests that with the methodology of the second stored procedure we can afford more complex distance formular or shape, such as a polygon search area, without compromising the performance.