Stored Procedure for Spatial Search

by Gong Liu January 23, 2009 19:59

If you have some location based data, such as point of interests (POI) data in your database, one common query you want to have is to search POIs within an area, typically a circle defined by a center point and a radius. This is so called spatial search. To increase the speed of spatial search in a relational database, the first thing you do is to index the latitude and longitude columns. The next thing is to write an efficient stored procedure for the job - a topic of this post.

To determine if a point is within a circle, you need to calculate the distance between the point and the search center. An accurate distance formular can be quite complex considering the Earth's curvature and irregular shape, and thus takes a lot of time to compute, especially if you have millions of POIs in your database. However, if the search area is not very big, as in most real world cases, a simplified distance formular is good enough:

Based on the above formular, the spatial search stored procedure can be written as:

 CREATE PROCEDURE [dbo].[GetPOIs_Circle]
  @latitude float,
  @longitude float,
  @radius float,
  @catCode varchar(10), 
  @maxReturns int
 AS
 BEGIN
  SET NOCOUNT ON;
  Declare @R float -- Earth's radius in miles 
  Set @R = 3958.755866
  Set ROWCOUNT @maxReturns
  Select * From
  (
   Select POIName, Address, City, State, Country, Zip, Phone, Website, Latitude, Longitude,
     @R * radians(sqrt(square(Latitude - @latitude) +
     square((Longitude - @longitude) * cos(radians((Latitude + @latitude)/2))))) As Distance
   From POIs
   Where CatCode = @catCode
  ) As a
  Where a.Distance < @radius
  Order By a.Distance
 END
 GO 

The stored procedure will return no more than maxReturns POIs of specified category within the specified circular area, sorted by the distance from the search center. This stored procedure works but it is not optimal. It has to calculate many distances before it can narrow down to the POIs whose distances from the search center are smaller than the radius. A quick way to narrow down potential POIs is to search within a square that the search circle inscribes. Once these POIs are found, they can be further narrowed down to within the search circle using the distance formular. So the modified stored procedure is:

CREATE PROCEDURE [dbo].[GetPOIs_SquareCircle]
 @latitude float,
 @longitude float,
 @radius float,
 @catCode varchar(10), 
 @maxReturns int
AS
BEGIN
 SET NOCOUNT ON;
 Declare @R  float -- Earth's radius in miles 
 Set @R = 3958.755866
 -- Find bounding box
 declare @minLat float
 declare @minLon float
 declare @maxLat float
 declare @maxLon float
 declare @sita float
 declare @corr float
 set @sita = degrees(@radius/@R)
 set @corr = 1/cos(radians(@latitude))
 set @minLat = @latitude - @sita
 set @maxLat = @latitude + @sita
 set @minLon = @longitude - @sita * @corr
 set @maxLon = @longitude + @sita * @corr 
 Set ROWCOUNT @maxReturns
 Select * From
 (
  Select POIName, Address, City, State, Country, Zip, Phone, Website, Latitude, Longitude,
   @R * radians(sqrt(square(Latitude - @latitude) +
   square((Longitude - @longitude) * cos(radians((Latitude + @latitude)/2))))) As Distance
  From POIs
  Where CatCode = @catCode and 
   (Latitude > @minLat and Latitude < @maxLat) and
   (Longitude > @minLon and Longitude < @maxLon)
 ) As a
 Where a.Distance < @radius
 Order By a.Distance
END
GO

This version of the stored procedure first calculates the bounding box of the search circle and then applies it to the Where clause of the Select statement. Now let's compare the efficiency of the two stored procedures using the Actual Execution Plan tool in SQL Server 2005.

               

The above table showes the estimated CPU costs when both stored procedures are used for the same search - search the Golf Course category within 100 miles radius from downtown Los Angeles. The main POIs table contains some 14 million records. As you can see, the overall improvement of the second stored procedure over the first one is about 32% in terms of CPU usage. The estimated I/O costs are identical for both stored procedures, and thus are not included in the table. Percentage wise the biggest improvement is the Compute Scalar operator, which in our case is the evaluation of the distance formular. As expected, the second stored procedure avoids significant amount of distance calculations because of the square (bounding box) filtering. This also suggests that with the methodology of the second stored procedure we can afford more complex distance formular or shape, such as a polygon search area, without compromising the performance.  

About

A seasoned computer professional. A tofu culture evangelist...
more >>

Tag Cloud

Calendar

<<  April 2017  >>
MoTuWeThFrSaSu
272829303112
3456789
10111213141516
17181920212223
24252627282930
1234567

View posts in large calendar
Copyright © 2008-2011 Gong Liu. All rights reserved. | credits | contact me
The content on this site represents my own personal opinions, and does not reflect those of my employer in any way.