Tim points out just how bad…

…the Case-Shiller index is. Interesting because I’ve heard multiple times in the past few months how trivial it would be to create a better index and I’m somewhat surprised Zillow hasn’t taken this on yet. But then again, a good argument could be made that they may already be too big for their own good, so I shouldn’t be too surprised.

20 responses

  1. Dustin,

    We’ve actually published an index (the Zindex) on Zillow since the site launched. The Zindex is the median Zestimate and is available at the Zip, City, County, State and National level, and can be plotted vs. a home’s historic Zestimates. The strengths of the Zindex lie in the fact that it considers the value of all homes, not just those that have sold, and the median does a good job of effectively ignoring potentially inaccurate Zestimates. Zindexes get updated with every Zestimate update but we also publish quarterly reports that include Zindex trends.

  2. David,

    I thought about the Zindex when writing the post, but I’m thinking much more along the lines of an index that would be valuable to Wall Street. I have a few guesses as to why the Zindex hasn’t caught on with this crowd yet, but I’ll leave those to myself for the time being.

    In this area, Radar Logic is getting the most buzz around their index for housing derivatives. My guess is that the Zillow machine is chewing on substantially more relevant data than Radar Logic and yet missing out on providing appropriate data to Wall Street.

  3. Not to beat a dead horse, but an aggregate of Zestimates is an aggregate of errors. We expect a large sample to flatten out errors, but the initial presumption is that the bulk of the data points are not in error. We know going in that all Zestimates are inherently erroneous. No one single data point can be accurate except by accident. It is a canard to insist that the errors will cancel each other out. Without examining and correcting for each individual error, all we can know is that garbage + garbage = garbage. Zindices are good for press releases. To use them as investment advice would seem to me to be a poor idea.

  4. “I’ve heard multiple times in the past few months how trivial it would be to create a better index”

    It would not be trivial. But who was saying that? I could reconsider my opinion.

    By the way, Arizona State University just announced – I haven’t seen the numbers yet – a Case-Shiller type index broken down by submarkets, I suppose the cities within metro Phoenix.

    Now that index will become the local gold standard.

    You will no doubt find that some cities are doing fine and others are basket cases.

    The whole psychology of the public will change from being pessimistic to being… confused.

    I think they’ll need a local real estate expert to help them interpret those confusing numbers.

    The bubble blogger will have a conniption.

  5. Hi Dustin,

    IMO, the Zindex is the unsung hero on Zillow but Greg’s comment demonstrates very well why it doesn’t get more airplay. Greg is wrong but he is also very smart. If he doesn’t understand how a median excludes outliers (or even the simple fact that it is not a sum), it’s really hard to expect the average Joe to get it. That said, the average Joe is probably also unaware of Case-Shiller.

    So, we’ve started to take a story-telling approach to publishing quarterly trend information. This past quarter, instead of just publishing the numbers we researched and packaged up the story that showed that 16% of homes purchased in the last year have negative equity. If you google that headline you’ll see that it’s gotten far more mainstream media pickup than even Robert Shiller’s recent sound-bites. This has been a very successful strategy for publishing trend data and one I’m sure we’ll repeat.

    Wall Street isn’t interested in the past or even the present, which is why you currently see Robert Shiller quoted making more and more forward-looking statements. Anyone who watched Lereah undermine his credibility with similar predictions should see where this is headed. Time will tell if the future’s as gloomy as Shiller’s predicting but IMO, it’s when the analysts become fortune-tellers that you should see red flags.

    Among investors, the Zindex is in fact well used and I’ve read more than one conversation where it was compared to the local CSI. It’s important that the investors who study these numbers understand the basis for their calculations. CSI is a 3-month average that estimates change in the sum of values of all homes in an area. The Zindex is a current estimate of the value of the “average” home in an area. They measure totally different things – and I am truly biased – but I think the median value more accurately represents what people are talking about when they discuss whether home values are “up” or “down”.

  6. > If he doesn’t understand how a median excludes outliers (or even the simple fact that it is not a sum)

    I said neither of these things. What I said is that an aggregate of errors cannot be said to be anything other than an error. Ignorance accumulated never becomes knowledge.

  7. Sorry, I don’t want there to be a mistake about this. This is Zillow’s definition of a Zindex:

    The point at which “half the Zestimates for a region are below this number and half the Zestimates are above it.”

    We know for a fact that the half below the median are all in error, but we can never know by how much.

    We know for a fact that the half above the median are all in error, but we can never know by how much.

    Ergo, we can know with unassailable certainty that the Zindex is itself in error, but we can never know by how much.

    You cannot arrive at a fact from an array of errors, no matter how vast that array might be. This is simple epistemology. The less Zillow talks about accuracy, the better.

    A median struck among recent past sales is itself erroneous — which John Wake talks about above — albeit probably less so. But even if other means of estimating real estate activity might introduce errors, this is not a reason to select a method that cannot possibly under any circumstances be valid.

  8. Greg – you said Garbage + Garbage = Garbage. That is a sum. It is not true of the Zindex. “ignorance accumulated” is again an inaccurate characterization of the mechanism used to calculate a median.

    Zestimate accuracy fits a bell curve. Many Zestimates are incredibly close to the values homes sell at — and then there’s a long tail of increasingly less accurate Zestimates on either side of that hump — but the hump is most certainly in the “middle”. Many studies have been done on Zestimate accuracy and one thing those researchers consistently find is that where Zestimates are “off”, those inaccuracies are evenly distributed on either side of “accurate”. In other words, inaccurate Zestimates have equal chance of being “high” as they do of being “low.” This is why a median is such an effective tool — it ignores outliers — but only when the condition is met that requires outliers to be evenly distributed on the low and high side — which is the case with the Zindex.

    I agree that a median of past sales is a horribly poor measure of value trends.

    I am not discussing Zestimate accuracy here; I’m explaining the usefulness of the Zindex. Given a large enough sample size, it’s an approach which yields a very useful measure of the change in value of the “average” home in the area in question. To my knowledge there’s no other index or trend metric that does that.
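The symmetric-error argument in comment 8 can be checked with a small simulation. This is only a sketch under stated assumptions: the "true" home values and the error model below are invented for illustration and are not Zillow data, and the errors are made symmetric by construction, which is exactly the condition the other side of this thread disputes.

```python
import math
import random
from statistics import median

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical "true" values for 10,000 homes (made-up distribution, not real data).
true_values = [random.lognormvariate(12.5, 0.4) for _ in range(10_000)]

def noisy_estimate(value):
    """Apply an error that is as likely high as low: usually small, sometimes wild."""
    sigma = 0.05 if random.random() < 0.7 else 0.40  # 30% of estimates are far off
    return value * math.exp(random.gauss(0, sigma))

estimates = [noisy_estimate(v) for v in true_values]

# How far the median estimate sits from the median true value.
true_median = median(true_values)
relative_gap = abs(median(estimates) - true_median) / true_median

# How far a typical individual estimate sits from its own true value.
typical_error = median(abs(e - v) / v for e, v in zip(estimates, true_values))

print(f"median-of-estimates gap: {relative_gap:.2%}")
print(f"typical per-home error:  {typical_error:.2%}")
```

Under these assumptions the median of the estimates lands much closer to the true median than a typical individual estimate lands to its own home's value. The sketch says nothing about whether real Zestimate errors are in fact symmetric.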

  9. Here’s a good example showing pickup of the quarterly reporting Zillow publishes: http://www.slocountyhomes.com/zillow_q307.htm

    I’m actually seeing quite a lot of this.

  10. David,

    Your claims are absurd, but there is no profit in my trying to disabuse you of delusion. Instead, I’ve mailed you a box of broken watches. Look them over, strike a median and then tell us all what time it is. With a five minute margin of error, you have one chance in 288 of being correspondent to reality. (Note: Not accurate.) One in 144 if you ignore the meridian. Close enough for a press release, I expect.

  11. 🙂 While there’s truly no insult like a Greg Swann insult, I’m hanging in here. Let’s try to simplify this and run with your watch analogy …

    I received your 144 watches … and noticed that their accuracy is distributed similarly to Zestimate accuracy (how odd?) …

    Only 78 of 144 watches tell a time that’s within a few minutes from “now” and …

    20 watches are an hour ahead and
    10 watches are 6 hours ahead and
    6 watches are 12 hours ahead and
    2 watches are 2 days ahead (OMG!) and …

    20 watches are half an hour behind and
    10 watches are 3 hours behind and
    6 watches are 6 hours behind and
    2 watches are a day behind (OMG!) and …

    Now, to calculate the median time … we take the watch “in the middle.” Now, because the watch in the middle of our set of watches has an even number of watches either side of it that are either way ahead of time or way behind time, the median will always come from the 78 watches that are roughly on time. So, the median tells us the time.

    A median is not an average or a mean; both would have yielded the incorrect time in the watch analogy above. Your criticisms would be valid had we used either the mean or the average. Multiple independent studies into Zestimate accuracy have always found inaccurate Zestimates to be normally distributed. If it were true of (inaccurate) Zestimates that they were “always high” or “always low”, then the median would be biased (and yes, I would be deluded.) It is however not true. Zestimate accuracy is normally distributed and so, the Zindex is not biased (and I’m thankfully not deluded though I make no excuses for the watch that’s a day behind.) 🙂
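The watch example in comment 11 can be worked through directly. The sketch below uses the counts exactly as listed in that comment (they add up to 154 watches rather than 144) and, as a simplifying assumption, treats every watch that is "within a few minutes" as having an offset of exactly zero.

```python
from statistics import mean, median

# Offset of each watch from the true time, in minutes (positive = ahead).
# Counts are taken as written in the comment; "roughly on time" is modelled as 0.
watches = (
    [0] * 78
    + [60] * 20 + [360] * 10 + [720] * 6 + [2880] * 2       # ahead
    + [-30] * 20 + [-180] * 10 + [-360] * 6 + [-1440] * 2   # behind
)

median_offset = median(watches)
mean_offset = mean(watches)

print(f"{len(watches)} watches")
print(f"median offset: {median_offset} minutes")
print(f"mean offset:   {mean_offset:.1f} minutes")
```

With 38 watches ahead and 38 behind, the middle of the sorted list falls inside the 78 on-time watches, so the median offset is zero. The mean is pulled roughly 48 minutes ahead, because the "ahead" watches in this example are further off than the "behind" ones.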

  12. Amazingly enough, 1142 West Culver Street is still not there, yet it still has a completely nonsensical Zestimate.

    I’m not trying to insult you, and I am perfectly willing to believe that you believe what you are saying. I do not believe what you are saying. You cannot know how errors are distributed without examining — and, what the heck, correcting — each one. If there is one error in the Zillow database I would have expected to have been corrected by now, that would be 1142 West Culver Street.

    Do you want to tell me the whole company has been hoodwinked by statisticians? This I’ll believe. Do you want to claim that a few random perfect-laboratory-conditions spot checks seem to bear out the distributions you are reporting? This is plausible to me. Beyond that, you are making claims you cannot possibly defend.

    To the contrary, I can show you whole neighborhoods where every Zestimate is insanely low. With sales so slow right now — and with so many anomalous transactions among those few sales — your error rate has to be soaring — by how much and in which direction you have no way of knowing.

    Look at your own chart for 1142 West Culver Street:

    Does anyone actually believe that this empty lot lost $150,000 in value from February to mid-May, then regained $90,000 by mid-July? Would you trust a human amanuensis this erratic? Given these errors-upon-errors identified on only one burned-down house, why would anyone credit any claim made about the aggregated database?

    I love you better than anyone I know among vendors, David, and I’m not trying to be cruel to you. But the position you are taking is not credible. I don’t care how many appeals you make to authority. We know beyond doubting that Zestimates lack too much on-the-ground information to provide even a pale reflection of real property values. To insist you know in the aggregate what you obviously cannot know by particulars is absurd.

    The Zestimate was a cute toy for getting press attention. It is not your strong suit now, and the more Zillow talks about “accuracy,” the more it invites derision.

  13. Here’s a simple summary of my feedback to Greg …

    Because of the way the Zindex is calculated, you don’t have to trust every Zestimate in the area to be able to trust the Zindex value. Rather, you can trust in the Zindex value wherever the following is true …
    1) Zestimates are often wrong but they’re also often uncannily “right” and …
    2) When Zestimates are wrong, they’re high as often as they are low.
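Condition 2 in comment 13 is the one doing the work, and a short simulation shows what happens when it fails. This is a sketch with invented numbers: the home values, the error spread, and the 10% low bias are all assumptions chosen for illustration, not measurements of real Zestimates.

```python
import math
import random
from statistics import median

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical "true" values for 10,000 homes (made-up distribution).
true_values = [random.lognormvariate(12.5, 0.4) for _ in range(10_000)]
true_median = median(true_values)

def median_of_estimates(bias):
    """Median estimate when each home's log-error is drawn from N(bias, 0.15)."""
    return median(v * math.exp(random.gauss(bias, 0.15)) for v in true_values)

# Errors high as often as low (bias = 0) versus errors that lean low (bias = -0.10).
ratio_unbiased = median_of_estimates(0.0) / true_median
ratio_biased = median_of_estimates(-0.10) / true_median

print(f"balanced errors:  median estimate / true median = {ratio_unbiased:.3f}")
print(f"low-biased errors: median estimate / true median = {ratio_biased:.3f}")
```

When the errors are balanced the ratio stays close to 1. When they lean low, the median of the estimates shifts low by about the size of the bias, and no sample size fixes that. Which of the two cases describes real Zestimates is the question the thread is arguing about.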

  14. And here is a summary of my responses to David:

    1. We know for a fact that Zestimates are never right and are only correspondent to real property values by random accident, which randomness is inherently unpredictable.

    2. Any statement made about an aggregate of Zestimates is necessarily absurd.

    Zillow.com does many interesting and valuable things. Evaluating real property is not among them and never has been. It has every right to run an elaborate Ouija board, which is why I have repeatedly defended it from what I consider to be unjust attacks. But the claims it makes about the accuracy of Zestimates and Zindices are undefended and indefensible.

  15. […] 12, 2007 by Dustin …goes sterile.    If you haven’t been following the dialog on Zillow’s accuracy in the comments of this post, then you’ve been missing […]

  16. Greg –

    I love you too my friend 🙂

    And I believe that we largely see eye to eye on Zestimate accuracy; many Zestimates are off the mark.

    At issue here is not Zestimate accuracy; it’s the usefulness of the Zindex as an indicator of the trend in home values.

    Where I think you’re incorrect is this: the Zindex can be useful even when many Zestimates are inaccurate. The statistical explanation above explains why. If you still don’t get it let’s get on the phone and talk this through.

  17. Thanks, David. I understand the argument. I just don’t buy it.

  18. I agree with Swann.

    Here is an honest analysis I did of Zillow’s “accuracy” in my neighborhood.


    The results were surprising even to me. I like Zillow as a fun, entertaining site to play with, but there is no real accuracy at all. The technology is at best, simply guessing.

  19. […] 13, 2007 by Dustin looks like he is much farther along than I suspected in realizing the dream of a better housing […]

  20. jeeponrock

    Let me jump in here for a moment and get to the heart of Greg Swann’s argument…
    If you knew that the errors in Zestimates were evenly distributed, then you would have to know the real value of the properties. If you knew the real values of the properties, the Zestimates wouldn’t be in error.
    The problem is here. You simply don’t know how far off the Zestimates are. You can assume that they are randomly distributed but balanced between too high and too low, but until you know exactly the level of accuracy for all of the Zestimates, any derivative of them could be flawed… to an unknown degree.
    Simply put, if the data to make the calculation is flawed, regardless of the way you calculate, the result of the calculation is still wrong.
