Machine Learning and Real Estate Forecasting

Background

Two summers ago, my fiancée had an afternoon birthday party at an outdoor patio.

It was one of those rare instances during the pandemic where it was safe enough to gather outside in small groups.

The party ended around 6 pm. When it did, one of her friends pulled us to the side to ask if it was safe to travel home along Queen West.

(For those not familiar with Toronto, Queen St. West is one of the most popular streets in the city.)

Puzzled, my fiancée and I looked at each other, then looked back at them, and, in unison, said, “yeah, why wouldn’t it be?”

It turns out that they were concerned something might happen to them on their way home because going home that way meant they had to pass a homeless encampment in Trinity Bellwoods Park.

(As the pandemic hit Toronto, encampments had sprouted up in many of the city’s parks as hundreds fled the city’s shelters for fear of contracting COVID-19 and the violence within the shelter system. Eventually, Toronto would spend $2 million clearing homeless encampments across the city.)

To me, it seemed pretty odd that they were concerned with that.

It wasn’t as if the encampment residents were living on the road.

You actually had to go some ways into the park before coming across them.

Plus, they kept to themselves for the most part.

All they were trying to do was survive, like the rest of us.

Still, you never know what a person has been through, so I did my best not to judge and simply said, “Yeah, it’s fine. You have nothing to worry about.”

Fast forward six months. My fiancée tells me that the aforementioned friend has bought a house. I ask where. She tells me in an area that has a reputation for being one hundred times more dangerous than Trinity Bellwoods has ever been.

It's a place I'd play basketball in as a kid. Whenever I went there, I was told to watch my back. Mind you, nothing ever happened there, but still, it had the reputation.

Even more baffling, their new house was one block from an encampment.

I asked my fiancée to repeat what she said.

Surely, I hadn’t heard correctly.

She repeated herself.

I did, in fact, hear her correctly.

I asked her if her friend’s feelings towards encampments had changed.

She shook her head no.

How is it that a person who’s so worried about their safety that they’re terrified of walking past an encampment not only bought a house next to one but ended up living in one of the most dangerous neighbourhoods in the city?


Datasets:

Libraries:

matplotlib, numpy, pandas, seaborn, and sklearn.

The Problem

89% of millennials want to own a home.

67% of them will have to wait at least 20 years to be able to afford one.

46% of millennials need financial assistance from their parents to buy their first home.

Still, even with the odds against them, many millennials are pursuing homeownership.

And what are they finding when they do?

Absurd prices.

(Source: Craig J. Lazzara, S&P DJI)

Realtors more concerned with their commission than with their buyers’ wellbeing.

My fiancée and I may or may not watch Million Dollar Listing Los Angeles when trying to veg out.

And real estate companies stacking the deck in their favour.

(Source: Bloomberg)

Oh, and so Zillow doesn’t sue me:

Sadly, even if millennials somehow overcome the odds and purchase a home, many still aren’t finding the security they hoped for through homeownership.

(Source: Hometap)

Two-thirds of millennials have home-buying regrets.

Millennials are also more likely than the generations before them to think a home isn’t a good investment or that they overpaid for their house.

Nonetheless, millennials still want to buy homes.

And they’ll do the seemingly inexplicable to get one.

Hence my fiancée’s friend.

That’s where we at MyFirstHome come in.

(Sidenote: I use “we” to represent my fictitious company.)

Our goal is simple: provide millennial homebuyers with an easy-to-use tool that answers the question, how much should a home cost?

Not how much a realtor says a house should cost.

Not how much a house with a porch you really like should cost.

And not how much a house should cost because a house two blocks away sold $60K above asking.

Our hope is that by doing this, we will help them make better purchasing decisions.

Challenge

Can we create an easy-to-use tool that helps millennials answer the question: how much should a house cost?

Since this is a labor of love for our fellow millennials rather than a profit-making venture, we wanted to build it with easily accessible information. We aim to have a turnkey process that can be deployed anywhere in North America.

And so, for our test case, we've picked Ames, Iowa.

Why Ames?

  1. The data is publicly available.
  2. With a population of about 66,000, there is enough data to work with for a test case, but not so much that we’d be overwhelmed.
  3. Its “ridiculously friendly people” make it one of the top 15 places to live in America, so it’s a place we can see millennials wanting to live.

(Source: Ames Tribune)

(Didn’t believe me about that top 15 thing, did you?)

The dataset we worked with had 80 features and 2,051 observations.

You can read an explanation of all the original features here.

While it was nice to have 80 features to work with, our goal, above all else, is simplicity.

Again, this is a labor of love for our fellow millennials: we intend to help them, not profit from them. We aren't looking to charge exorbitant subscription fees or depend heavily on outside investment.

Since that is the case, we wanted to create a model that focused on essential features with information that is easy to obtain and does not depend on a “special sauce” that might not be easily reproducible from one location to another.

Here are some of the features that are highly correlated with house price:

Note: For ordinal or nominal features, I assigned numeric values based on median price.
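For readers who want to see what that note could look like in practice, here is a minimal sketch. It runs on made-up toy data, not the Ames dataset. The column names and the helper `encode_by_median_price` are hypothetical. Ranking categories by median price (rather than, say, using the median itself) is an assumption about how the numeric values were assigned.

```python
import pandas as pd

# Toy data standing in for the Ames dataset; values are made up for illustration.
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "C", "C"],
    "kitchen_qual": ["TA", "Gd", "Gd", "Ex", "TA", "Ex"],
    "saleprice": [120_000, 150_000, 200_000, 260_000, 110_000, 240_000],
})


def encode_by_median_price(frame, column, target="saleprice"):
    """Map each category to its rank (0 = cheapest) by median target value."""
    medians = frame.groupby(column)[target].median().sort_values()
    ranks = {category: rank for rank, category in enumerate(medians.index)}
    return frame[column].map(ranks)


encoded = df.copy()
for col in ["neighborhood", "kitchen_qual"]:
    encoded[col] = encode_by_median_price(df, col)

# Pearson correlation of each encoded feature with sale price.
correlations = (
    encoded.corr()["saleprice"].drop("saleprice").sort_values(ascending=False)
)
print(correlations)
```

Because the encoding is derived from sale price itself, the mapping would normally be learned on the training split only and then applied to unseen data.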

Models

Final features list:

While we tried engineering a variety of features, total_sf proved to be the best for boosting our model's performance.
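The write-up does not spell out how total_sf was built. One plausible version, shown here purely as an assumption, sums the basement, first-floor, and second-floor areas. The column names follow the public Ames data dictionary and the rows are made up.

```python
import pandas as pd

# Toy rows using column names from the public Ames data dictionary.
# Summing these three columns is an assumption made for illustration;
# the original post does not define total_sf.
homes = pd.DataFrame({
    "Total Bsmt SF": [850.0, None, 1200.0],
    "1st Flr SF": [900, 1100, 1250],
    "2nd Flr SF": [0, 700, 0],
})

area_columns = ["Total Bsmt SF", "1st Flr SF", "2nd Flr SF"]

# Treat a missing basement area as zero so the sum is not NaN.
homes["total_sf"] = homes[area_columns].fillna(0).sum(axis=1)
print(homes["total_sf"].tolist())
```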

We tried four general model types:

  • Linear Regression
  • Lasso
  • Ridge
  • KNN

Outside of K-Nearest Neighbors, all our models were actually underfit.

While the results varied from model to model, we generally saw a final R² score of 91% to 92%.

The exception was KNN, whose final R² score was, on average, 6-7% lower than those of the other three model types we tested.

Since Ridge performed slightly better than Lasso, that is the model we chose in the end.
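A comparison like this can be sketched in a few lines with sklearn. The example below uses synthetic data from `make_regression` rather than the Ames data, and its hyperparameters are placeholders. The use of 5-fold cross-validation is an assumption; it is not necessarily how the models above were evaluated.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the housing features (not the Ames data).
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)

# Hyperparameters are placeholders, not the values used in the project.
models = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(alpha=1.0),
    "Ridge": Ridge(alpha=1.0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    # Scale features first: Lasso, Ridge, and KNN are all scale-sensitive.
    pipeline = make_pipeline(StandardScaler(), model)
    cv_r2 = cross_val_score(pipeline, X, y, cv=5, scoring="r2")
    scores[name] = cv_r2.mean()

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: mean CV R^2 = {score:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into the scaling step.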

Our final model's performance:

Price is in USD.

Conclusions

  1. Yes, it is possible to create a model to accurately predict housing prices.
  2. Our model struggles with higher price points. But since millennials, on average, have $87,448 in debt, it's not something we are particularly concerned with. The typical millennial simply does not have the budget for those price points.
  3. We're confident about the scalability of our model. Because our model doesn't require a "secret sauce" and uses only a handful of features, we believe it can be rolled out to other locations with minor tweaking.
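
One way to check a claim like the second conclusion is to measure error separately for each price band. The sketch below does that with invented actual and predicted prices, not our model's output. Splitting into quartiles and using RMSE are both assumptions made for illustration.

```python
import pandas as pd

# Made-up actual and predicted prices, purely for illustration.
results = pd.DataFrame({
    "actual": [100_000, 120_000, 150_000, 180_000,
               220_000, 260_000, 400_000, 550_000],
    "predicted": [102_000, 117_000, 154_000, 176_000,
                  228_000, 251_000, 360_000, 470_000],
})

# Split homes into four equal-sized bands by actual sale price.
results["price_band"] = pd.qcut(
    results["actual"], q=4, labels=["Q1", "Q2", "Q3", "Q4"]
)
results["squared_error"] = (results["actual"] - results["predicted"]) ** 2

# Root mean squared error within each band.
rmse_by_band = (
    results.groupby("price_band", observed=True)["squared_error"].mean().pow(0.5)
)
print(rmse_by_band.round(0))
```

If the top band's RMSE is much larger than the others, as it is in this toy data, the model is struggling at higher price points.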