As many of you know, I write real estate web applications for fun and profit. I have to admit it really is an interesting area for innovative web applications. After all, writing a modern IDX solution requires taking advantage of AJAX technologies, mastering a web based mapping platform (like Microsoft Virtual Earth or Google Maps), integration w/ 3rd party APIs and web sites (such as Zillow or Trulia), pretty dynamic SQL, and handling large amounts of graphical images. Needless to say, it can pose some interesting technical challenges at times.
I'm in the midst of an application architecture upgrade; I'm currently debating how I want store my MLS images. I currently store broker specific images in a database (currently just agent photos and property management photos). My biggest customer (Real Property Associates) has about 70 agent photos and 2000 property images stored in a database, (I currently use ASP.NET HTTP handlers to extract images out of the DB and re-size said image on the fly before I send the response stream to the browser). The NWMLS images are on the file system scattered in one of a thousand subdirectories.
Although this arrangement is more efficient than storing all 474,589 or so NWMLS images in a single directory, it's painful to deal with. Deleting photos for sold or expired listings is like going spelunking instead of doing a DELETE * FROM … WHERE … SQL query. SELECT queries are a lot more flexible that DIR commands. Modifying data only requires one set of DB permissions instead of setting file system permissions on a thousand directories. I don't have to worry about the physical location of my data, since it's just a SELECT query away (regarding of where SQL ends up putting it). Adding meta data is just an ALTER TABLE … ADD COLUMN … SQL command away. In short, even if SQL Server is slower than the file system, the benefits of dealing with it instead are worth the price, so far.
Conventional wisdom is that you don't store images in SQL server. Unfortunately, my good experience with smaller data sets is leading me to believe that conventional wisdom is wrong. Then again, there's a world of difference between 2000 images and a number hovering 25K shy of half a million images. In my case, the images tend to be smaller JPEGs (about 400 x 300 in size) usually hovering around 50K, and I suspect (though I don't know), as long as the size of the images are small-ish, the performance penalty you'll pay appears to be comparatively is minor. My DBA guru friends have told me as long as the image sizes are small, it should be fine. SQL finding a BLOB via a primary key should be pretty quick.
Of course, the only way to find out for sure is to go for it (and see if you regret it later). I suppose, I already know the answer I want to hear. I just want to feel good about it before I go ahead and database-tize my NWMLS image set. Rumor has it that SQL 2008 has a new FileStream datatype, that stores BLOBs in the file system, but yet remain an integral part of the database with transactional consistency, so perhaps I should wait? Ironically, doing nothing (and waiting for Microsoft to fix the problem) is sometimes the smartest design decision you can make.
So does anybody have any good or bad experiences with storing images in their DB / Web applications? How big of a dataset are you storing in the DB? How big of a dataset are you displaying in the web browser window? How do the MLS Vendors deal with this problem?
Print | posted on Tuesday, June 26, 2007 9:17 AM