Daglas Aitsen, Johan Kirikal, Mihkel Jaas Karu 2023

Housing market price analysis and prediction

The Idea

We were fascinated by the dynamics of the housing market and by how economic factors drive its changes. We also wanted to compare districts in terms of their affordability and attractiveness to potential buyers and renters. Our goals for this project were to visualize market trends and, if possible, to build a model that predicts housing prices from various features.

Python
Matplotlib
NumPy
TensorFlow

Delta X competition

The project achieved third place in the 2023 Delta X Data Science Projects Competition, which featured a field of over 80 participants.

Collecting Data

We chose to focus our analysis on Estonia's two largest cities, Tallinn and Tartu, to ensure that enough data was available. First, we obtained geospatial maps of both cities for visualizing the data. We then collected three time-series datasets. The first, scraped with a Python script from Estonia's leading real estate site, www.kv.ee, provided housing prices, their locations, and sale dates, grouped by month and district from 2007 to 2023. The second, openly available on the Statistics Estonia website, covered interest rates in Estonia. The third, from Statistikaamet (Statistics Estonia), gave average salaries in Tallinn and Tartu from 2007 to 2023. Integrating these datasets let us relate changes in housing prices to changes in interest rates and wages.
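One simple way to integrate monthly series like these is an inner join on the month. The sketch below assumes each dataset has already been reduced to a mapping from a "YYYY-MM" month string to a single value for one city; the function name `align_monthly` and the sample values are hypothetical and do not come from the project.

```python
def align_monthly(*series):
    """Inner-join several {month: value} mappings on their shared months.

    Months are "YYYY-MM" strings, so plain string sorting is chronological.
    Returns a list of (month, (value_1, value_2, ...)) tuples containing only
    months that appear in every input series.
    """
    if not series:
        return []
    shared = set(series[0])
    for s in series[1:]:
        shared &= set(s)
    return [(month, tuple(s[month] for s in series)) for month in sorted(shared)]


# Made-up placeholder values, only to show the shape of the result.
prices = {"2023-01": 1.0, "2023-02": 2.0, "2023-03": 3.0}
rates = {"2023-02": 0.5, "2023-03": 0.6}
salaries = {"2023-01": 10.0, "2023-03": 12.0}

print(align_monthly(prices, rates, salaries))
# [('2023-03', (3.0, 0.6, 12.0))]
```

An inner join silently drops any month missing from one of the sources; forward-filling the slower-moving series (such as salaries) would be an alternative if gaps are common.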

Time Series Forecasting Using an LSTM Model

Since my main contributions to the project were generating ideas, collecting data, and building the time series forecasting model, I will not go into the market analysis part of our work here. For further details, please visit the project's GitHub page.

As the final phase of our project, we wanted to assess how well we could predict future average housing prices from the data we had collected. After experimenting with a simple ARIMA model, I opted for a multivariate Long Short-Term Memory (LSTM) network to capture more of the structure in our historical data. The model takes housing prices, the number of housing advertisements, housing loan interest rates, and average salary differences over the past 60 months as input features and predicts the price for the following month. To optimize the model's performance, I ran a manual grid search to find the most effective hyperparameter values. The final network comprises 60 input nodes, an LSTM layer with 50 nodes, and two dense layers with 75 and 25 nodes, respectively. The accompanying plot shows one-shot forecasts of housing prices over the last 3 years. The predictions have an average root mean square error (RMSE) of 100 €.
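The windowing, architecture, and error metric described above can be sketched roughly as follows. This is not the project's notebook code: the helper names (`make_windows`, `rmse`, `build_model`), the reading of "60 input nodes" as a 60-month window of four features, the assumption that housing price is column 0, the ReLU activations, the single-unit output layer, and the Adam optimizer with mean squared error loss are all illustrative assumptions. TensorFlow is imported inside `build_model`, so the NumPy helpers work without it.

```python
import numpy as np

WINDOW = 60      # months of history per sample (from the text)
N_FEATURES = 4   # price, ad count, loan interest, salary difference (assumed order)


def make_windows(features, target_col=0, window=WINDOW):
    """Split a (months, n_features) array into supervised samples.

    X[i] contains months i .. i + window - 1 for all features;
    y[i] is the target column in month i + window, i.e. the next month.
    Feature scaling (e.g. min-max), common for LSTMs, is omitted here.
    """
    features = np.asarray(features, dtype=float)
    n_samples = len(features) - window
    if n_samples <= 0:
        raise ValueError("need more months than the window length")
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    y = features[window:, target_col]
    return X, y


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the prices (e.g. €)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def build_model(window=WINDOW, n_features=N_FEATURES):
    """Hypothetical Keras version of the described network."""
    from tensorflow import keras  # imported lazily; only needed for the model

    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),    # 60 months x 4 features
        keras.layers.LSTM(50),                      # LSTM layer, 50 units
        keras.layers.Dense(75, activation="relu"),  # activation is assumed
        keras.layers.Dense(25, activation="relu"),  # activation is assumed
        keras.layers.Dense(1),                      # assumed next-month price output
    ])
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model
```

With, say, 120 rows of monthly data, `make_windows` yields 60 samples of shape (60, 4); these can be passed to `model.fit`, and `rmse` on a held-out period gives an error in the same units as the figure reported above.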

The Jupyter notebook containing the full, commented model-building process is available in the project's GitHub repository. See the model prediction results below.


Check Out The Project

If you are interested in exploring the project firsthand, you can visit its GitHub repository.