As many readers / listeners already know, Linh Da and I are in the process of looking to buy a house here in Los Angeles. I set out to collect historical data on property sales so that I could leverage the techniques of data science to help make a more informed decision about the price of homes.
Now, I had no illusions about the results of my model. I realized that there are many well-qualified data scientists who spend their entire careers working on home price prediction. I'm no more likely to outperform their results than I am to beat a Wall Street quant who has better and faster data access than I do, let alone the experience of studying their dataset 7 days a week for their entire career.
This project really has four intentions:
To be honest, we're off to a slow start in two respects.
First, data is nowhere near as available as I thought it might be. I started the project out of frustration at not being able to find the datasets I was seeking. I find only filtered, pruned, active listings. Modeling only these will surely introduce a significant bias.
I knew we would have a hard time acquiring this data, and it's proving more difficult than I expected. Further, the "feel free to volunteer to do any idea you like" approach has been a failure. Thankfully, Linh Da (a project manager, as keen listeners will know) is giving me some tips on how to better organize. I sent out a spreadsheet asking people to list their skills and interest in contributing. Very soon, I'm going to ask people to take responsibility for specific tasks in the overall project.
That means I also need to formalize this as a more concrete project with tangible objectives and goals. I hope to provide such an outline in this blog post.
One issue I've had so far in the project is that the majority of feedback takes the form of a would-be contributor doing a single Google search and sending me the top link, which undoubtedly doesn't provide what I'm looking for in the project. While I deeply appreciate the effort, I can do my own Googling. If people want to contribute, I'd like them to do so in measurably meaningful ways, so the first order of business for me is to set up a system that allows them to do that.
Thus, my first milestone goal is to set up a public-facing API that accepts property information submissions and returns them to requestors as well. Consider it a sort of wiki for property data: anyone can consume the API and initiate CRUD operations.
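To make the CRUD idea concrete, here is a minimal sketch of the kind of storage layer such an API might sit on top of. Nothing here is the project's actual design: the class name `PropertyStore`, the method names, and the record fields (`address`, `sale_price`) are all hypothetical, the sample record is made up, and a real service would add persistence, validation, and an HTTP layer.

```python
import itertools
from typing import Dict, Optional


class PropertyStore:
    """Hypothetical in-memory stand-in for the storage behind the API."""

    def __init__(self) -> None:
        self._records: Dict[int, dict] = {}
        self._ids = itertools.count(1)  # simple auto-incrementing IDs

    def create(self, record: dict) -> int:
        """Store a copy of a submitted record and return its new ID."""
        record_id = next(self._ids)
        self._records[record_id] = dict(record)
        return record_id

    def read(self, record_id: int) -> Optional[dict]:
        """Return a copy of the record, or None if the ID is unknown."""
        record = self._records.get(record_id)
        return dict(record) if record is not None else None

    def update(self, record_id: int, changes: dict) -> bool:
        """Merge changes into an existing record; False if the ID is unknown."""
        if record_id not in self._records:
            return False
        self._records[record_id].update(changes)
        return True

    def delete(self, record_id: int) -> bool:
        """Remove a record; False if the ID is unknown."""
        return self._records.pop(record_id, None) is not None


# Illustrative usage with an invented record.
store = PropertyStore()
pid = store.create({"address": "123 Example St", "sale_price": 500000})
store.update(pid, {"sale_price": 510000})
fetched = store.read(pid)
removed = store.delete(pid)
```

Because anyone could call `update` or `delete`, a sketch like this makes it easy to see why an open, wiki-style API raises follow-up questions.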
This, naturally, presents a few problems once created:
And from these questions, some other projects can arise as well, including:
These are the key questions I'll be tackling in the near future. If you're interested in contributing, join the Slack group, and in particular, check for our next live sessions and come participate.