Is there a better way to compute a Median in SQL?

I'm currently playing with random ideas I have for Nimitz (aka the next version of my NWMLS search tool), and one of the things I'm thinking about is displaying the Median value of properties that match an arbitrary search criteria. I already do something similar to this now on Zearch (the current version of my NWMLS search tool). with MIN, MAX, and AVG. So I wrote a Stored Proc (using MS SQL 2005 features), and came up with the following. (FYI - I was inspired by this blog post).

   1:  CREATE  PROC [dbo].[GetMedian]
   2:  (
   3:      @Col varchar(100),
   4:      @Table varchar(100),
   5:      @SqlWhere  varchar(1000)
   6:  )
   7:  AS
   8:  BEGIN
   9:   
  10:      DECLARE @Count AS INT, @MiddleRow AS INT, @Median AS INT 
  11:      DECLARE @SqlAll nvarchar(MAX)
  12:   
  13:      SET @SqlAll = 'WITH MLSSTATS AS '
  14:      SET @SqlAll = @SqlAll + '('
  15:      SET @SqlAll = @SqlAll + 'SELECT ROW_NUMBER() OVER (ORDER BY ' + @Col
  16:      SET @SqlAll = @SqlAll + ' DESC ) AS RowNum , '
  17:      SET @SqlAll = @SqlAll + '(SELECT COUNT(*) FROM ' +@Table + ' ' 
  18:      SET @SqlAll = @SqlAll + @SqlWhere + ') As Cnt,' 
  19:      SET @SqlAll = @SqlAll + @Col
  20:      SET @SqlAll = @SqlAll + ' FROM ' + @Table + ' ' + @SqlWhere 
  21:      SET @SqlAll = @SqlAll + ') '
  22:      SET @SqlAll = @SqlAll + 'select AVG(' + @Col +') from MLSSTATS '
  23:      SET @SqlAll = @SqlAll + 'where rownum  IN((Cnt + 1) / 2, (Cnt + 2) / 2)'
  24:   
  25:      
  26:      EXEC sp_executesql @SqlAll 
  27:  END

So if, let's say, I wanted to find the median value of all the houses currently listed in the city of Redmond, I'd do something like this...

   1:  GetMedian 'ListPrice', 'MLSTABLE', 'WHERE City in (''Redmond'')'

And then the proc would do it's dynamic SQL black magic and end up executing this...

   1:  WITH MLSSTATS AS 
   2:  (
   3:  SELECT ROW_NUMBER() OVER (ORDER BY ListPrice DESC ) AS RowNum , 
   4:  (SELECT COUNT(*) FROM MLSTABLE WHERE City in ('Redmond')) As Cnt,
   5:  ListPrice 
   6:  FROM MLSTABLE
   7:  WHERE City in ('Redmond')
   8:  )
   9:  select AVG(ListPrice)
  10:  from MLSSTATS 
  11:  where rownum  IN((Cnt + 1) / 2, (Cnt + 2) / 2)


The SQL is ugly, but that's that nature of the beast. Anyway, the problem is that I really want do to this...

   1:  SELECT AVG(listprice), MEDIAN('ListPrice', 'MLSTABLE', 'WHERE City = ''Redmond''')
   2:  FROM MLSTABLE
   3:  WHERE City = 'Redmond'


Or better yet, try to implement Median as a user defined aggregate function in C# / CLR language so it's even easier to use like so...

   1:  SELECT AVG(listprice), MEDIAN(ListPrice)
   2:  FROM MLSTABLE
   3:  WHERE City = 'Redmond'


According the above blog post it can't be done in a CLR language (which I don't understand). I guess user defined aggregates written in a CLR language only get the values of a set one at a time, instead of the entire set (which you would obviously need to compute a median)? Either way, I'm wondering if there's a way to convert my ugly proc into an ugly function instead? I think what's tripping me up, is that I need to define a variable that's accessible from dynamic SQL and returnable from the function that calls it.

 

Print | posted on Tuesday, May 01, 2007 8:38 AM

Feedback

# re: Is there a better way to compute a Median in SQL?

Gravatar left by Allen Benson at 5/4/2007 3:21 PM
There is a way easier way. Took me 5 years to figure it out. I use to do all sorts of things to figure out median. It a function in other DB's but not with our friends at MS.

# re: Is there a better way to compute a Median in SQL?

Gravatar left by Robbie at 5/4/2007 3:59 PM
I think Oracle has a MEDIAN, but given the cost of Oracle that's a non-starter. Anybody else have it?

I've toyed with MySQL, but most of the MySQL median solutions I've seen look uglier than the MS SQL versions (temp tables? yuck). Oh well, it's just a minor thing that annoyed me...
Title  
Name
Email
(never displayed)
Url
Comments   
Please add 8 and 7 and type the answer here: