Exploratory Data Analysis: Property rental market, Luanda
For the past few months I have been learning how to code. R has been my programming language of choice, and I have focused on data analysis topics. For me the best approach to learning is by doing, so I decided to do a small Exploratory Data Analysis (EDA) project to practice my coding skills in R.
This project analyses the property rental market (apartments and houses) in Luanda, Angola, my hometown. Aside from practicing coding skills, this project also aims to help those looking to rent property in Luanda by examining the following questions:
- How many properties are on the rental market: apartments vs houses, and by number of rooms?
- What is the relationship between rental price and distance to the city center, and between rental price and number of rooms?
- What are the average rental prices per location and per number of rooms?
Background info
- Luanda is the Angolan capital and its most populous city, with around 7 million inhabitants.
- Exchange rate used: AOA653 = $1
- Please note that property prices might only represent asking prices by the owners.
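The conversion implied by that exchange rate is a single division. A minimal sketch, in which the helper name aoa_to_usd and the sample amounts are made up for illustration:

```r
# Exchange rate stated above: AOA 653 = USD 1
aoa_per_usd <- 653

# Hypothetical helper: convert a monthly rent in kwanzas to US dollars
aoa_to_usd <- function(aoa) aoa / aoa_per_usd

# Made-up rents in AOA, for illustration only
rents_aoa <- c(653000, 326500, 1306000)
rents_usd <- aoa_to_usd(rents_aoa)
print(rents_usd)
```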
1- Getting data
The first step of every EDA project is to get the data. Since I could not find a database with recent property rental information, I decided to scrape rental listings from the following websites: www.angocasa.com and www.casa.sapo.ao. I also used Google search to find the distance from each location to the city center.
2- Data scraping
Data scraping is a technique used to read and store information from HTML, the markup language that web pages are written in. I used the rvest package, whose functions let me scrape every page of a website and store the results in a data table format (see full code below).
library(rvest)  # scraping data
library(purrr)  # provides map_df()

# save 1st website; %d is replaced by the page number
url <- "https://casa.sapo.ao/alugar-casas/apartamento/distrito.luanda/?pn=%d"

# scrape pages 1 to 11 and return the results as a single table
sapo_casas <- map_df(1:11, function(i) {
  page <- read_html(sprintf(url, i))
  data.frame(
    Listagem = html_text(html_nodes(page, ".searchPropertyTitle")),
    Price = html_text(html_nodes(page, ".searchPropertyPrice")),
    stringsAsFactors = FALSE
  )
})
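To see what the two selectors pick out without hitting the live site, the same rvest calls can be run on an inline HTML snippet. This is only a sketch: the markup and listing text are invented, and only the two CSS class names come from the scraping code.

```r
library(rvest)

# Invented HTML fragment; only the two class names come from the scraper
snippet <- '<html><body>
  <div class="searchPropertyTitle">Apartamento T2, Talatona</div>
  <div class="searchPropertyPrice">653.000 Kz</div>
  <div class="searchPropertyTitle">Moradia T4, Talatona</div>
  <div class="searchPropertyPrice">1.306.000 Kz</div>
</body></html>'

page <- read_html(snippet)

# html_nodes() selects elements by CSS class; html_text() extracts their text
titles <- html_text(html_nodes(page, ".searchPropertyTitle"))
prices <- html_text(html_nodes(page, ".searchPropertyPrice"))

listings <- data.frame(Listagem = titles, Price = prices, stringsAsFactors = FALSE)
print(listings)
```

Note that data.frame() needs both vectors to have the same length, so a page where a listing has a title but no price element would need extra handling.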
3- Data cleaning
Data cleaning was the biggest portion of this project: about 90% of the time was spent cleaning data. Most of the cleaning was done in R, but I also used Excel to fill in some location names that were incomplete on the websites.
The data cleaning process consisted of stripping unnecessary information from the HTML output, filtering data, removing missing values, creating tables, removing outliers, and merging and binding data. To do that I used packages such as dplyr, gsubfn, and tidyverse (see full code below).
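The cleaning steps listed above can be sketched roughly as follows. Everything here is an assumption made for illustration: the raw text formats, the toy rows, the distance values, and the 1.5 x IQR outlier rule are not taken from the real data, and base R gsub()/sub() stand in for gsubfn.

```r
library(dplyr)

# Made-up raw listings mimicking scraped text (formats are assumptions)
apartments <- data.frame(
  Listagem = c("Apartamento T2, Talatona", "Apartamento T3, Kilamba",
               "Apartamento T1, Talatona"),
  Price = c("653.000 Kz", "326.500 Kz", "Sob consulta"),
  stringsAsFactors = FALSE
)
houses <- data.frame(
  Listagem = "Moradia T4, Talatona",
  Price = "1.306.000 Kz",
  stringsAsFactors = FALSE
)

# Hypothetical lookup table of distances to the city center (made-up km values)
distances <- data.frame(
  location = c("Talatona", "Kilamba"),
  distance_km = c(15, 30),
  stringsAsFactors = FALSE
)

clean <- bind_rows(
  mutate(apartments, type = "Apartment"),
  mutate(houses, type = "House")
) %>%
  mutate(
    # keep digits only; text without digits becomes NA
    price_aoa = as.numeric(gsub("[^0-9]", "", Price)),
    # room count taken from the "T<n>" code in the title
    rooms = as.integer(sub(".*T([0-9]+).*", "\\1", Listagem)),
    # location is whatever follows the last comma
    location = trimws(sub(".*,", "", Listagem)),
    price_usd = price_aoa / 653
  ) %>%
  filter(!is.na(price_aoa)) %>%                    # drop missing prices
  filter(price_usd <= quantile(price_usd, 0.75, names = FALSE) +
           1.5 * IQR(price_usd)) %>%               # drop high outliers
  left_join(distances, by = "location")            # merge in distances

print(clean)
```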
4- Exploratory Analysis
To start analyzing the data, I first had to visualize the information, mainly using the R package ggplot2 (full code below).
With data visualizations I am trying to make sense of the rental information by understanding relationships, distributions, and averages.
Number of rental listings
First I will look at the number of rental listings per property type to have a sense of the final sample size.
There are around 800 apartment rental listings and 270 house listings in this sample.
Talatona has the most rental listings, with 441 properties, while Kilamba has the fewest, with only 11 properties. (The data may not represent the full number of rental listings in Luanda.)
Most rental listings are apartments with 2 or 3 rooms.
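Counts like these can be produced with dplyr's count(). A minimal sketch on made-up rows, so the numbers it prints are not the ones reported above:

```r
library(dplyr)

# Made-up listings, for illustration only
listings <- data.frame(
  type = c("Apartment", "Apartment", "Apartment", "House", "House"),
  location = c("Talatona", "Talatona", "Kilamba", "Talatona", "Miramar"),
  rooms = c(2, 3, 3, 4, 3),
  stringsAsFactors = FALSE
)

by_type <- count(listings, type, sort = TRUE)          # listings per property type
by_location <- count(listings, location, sort = TRUE)  # listings per location
by_rooms <- count(listings, rooms)                     # listings per room count

print(by_type)
print(by_location)
print(by_rooms)
```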
Histograms — Rental price
Now let’s look at histograms to understand how the listings are distributed across price ranges.
Apartment listings are concentrated below $2000 per month, while house listings are concentrated below $3000.
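A histogram of this kind could be built along these lines with ggplot2. The prices are made up and the $500 bin width is an assumption, so this shows the technique rather than the original chart:

```r
library(ggplot2)

# Made-up monthly rents in USD, for illustration only
listings <- data.frame(
  type = c(rep("Apartment", 5), rep("House", 4)),
  price_usd = c(600, 900, 1200, 1800, 2500, 1500, 2200, 2800, 4000)
)

# One panel per property type, $500-wide bins starting at 0
p <- ggplot(listings, aes(x = price_usd)) +
  geom_histogram(binwidth = 500, boundary = 0) +
  facet_wrap(~ type) +
  labs(x = "Monthly rent (USD)", y = "Number of listings")

# Share of listings under $2000, per property type
share_below_2000 <- tapply(listings$price_usd < 2000, listings$type, mean)
print(share_below_2000)
```

Calling print(p) draws the chart; the share computed at the end is a quick numeric check of what the histogram shows.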
Distance to the city center vs Monthly rent price
The relationship between price and distance allows us to understand how prices vary with distance to the city center.
At first glance there does not seem to be a linear relationship between price and distance. With a few exceptions, every location has multiple prices within similar ranges, with the majority of prices below $4000 and a few outliers ranging from $4000 to $6000.
This indicates that factors other than distance have a major influence on rental price, such as number of rooms, amenities, and security, among others.
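One way to put a number on "no linear relationship" is the correlation coefficient and the slope of a simple linear fit. The sketch below uses toy data deliberately built so that prices spread widely at every distance; it mirrors the pattern described above but is not the real data.

```r
library(ggplot2)

# Toy data: two listings per distance, one cheap and one expensive
listings <- data.frame(
  distance_km = c(5, 5, 15, 15, 30, 30),
  price_usd = c(1000, 3000, 1200, 2800, 900, 3100)
)

# Pearson correlation between distance and price
r <- cor(listings$distance_km, listings$price_usd)

# Simple linear model: price as a function of distance
fit <- lm(price_usd ~ distance_km, data = listings)
slope <- coef(fit)[["distance_km"]]

# Scatter plot with the fitted line overlaid
p <- ggplot(listings, aes(x = distance_km, y = price_usd)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Distance to city center (km)", y = "Monthly rent (USD)")

print(r)
print(slope)
```

A correlation and slope near zero mean distance alone explains almost none of the price variation in the sample.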
Let’s look at the relationship between price and number of rooms.
We can see that the higher the number of rooms, the higher the rent. However, rentals with 1 to 5 rooms seem to have similar price distributions and ranges; location may be a factor in this.
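A box plot per room count is one common way to compare price distributions like this. The chart type is my assumption here and the data are made up, so treat it as a sketch:

```r
library(ggplot2)

# Made-up listings, for illustration only
listings <- data.frame(
  rooms = c(1, 1, 2, 2, 2, 3, 3, 3, 6, 6),
  price_usd = c(800, 1500, 900, 1400, 2000, 1000, 1600, 2200, 3500, 5000)
)

# factor(rooms) makes ggplot2 draw one box per room count
p <- ggplot(listings, aes(x = factor(rooms), y = price_usd)) +
  geom_boxplot() +
  labs(x = "Number of rooms", y = "Monthly rent (USD)")

# Median rent per room count, the middle line of each box
medians <- tapply(listings$price_usd, listings$rooms, median)
print(medians)
```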
Average price per location
Now let’s look at the average price per location to see how each location ranks in terms of average rental prices.
Miramar has the highest average rental price in Luanda (~$2,500) and Kilamba has the lowest (~$500).
Now let’s look at how each location ranks if we separate apartments and houses.
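Both rankings come down to a grouped mean. A minimal dplyr sketch on made-up rows (the averages it prints are illustrative, not the figures quoted above):

```r
library(dplyr)

# Made-up listings, for illustration only
listings <- data.frame(
  type = c("Apartment", "Apartment", "House", "Apartment", "House", "Apartment"),
  location = c("Miramar", "Miramar", "Talatona", "Talatona", "Talatona", "Kilamba"),
  price_usd = c(2200, 2600, 2100, 1200, 1800, 600),
  stringsAsFactors = FALSE
)

# Average rent per location, highest first
avg_location <- listings %>%
  group_by(location) %>%
  summarise(avg_price = mean(price_usd), n = n()) %>%
  arrange(desc(avg_price))

# Same ranking, split by property type
avg_type_location <- listings %>%
  group_by(type, location) %>%
  summarise(avg_price = mean(price_usd), .groups = "drop") %>%
  arrange(type, desc(avg_price))

print(avg_location)
print(avg_type_location)
```

Keeping the count n next to each average helps flag locations where the mean rests on only a handful of listings.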
Average rental monthly price per location and per number of rooms.
Below we can see the average rental price per location and per number of rooms. This gives more specific detail for a preferred location and number of rooms.
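A location-by-rooms table of averages can be sketched in base R with tapply(). The rows are made up, and combinations with no listings come out as NA:

```r
# Made-up listings, for illustration only
listings <- data.frame(
  location = c("Talatona", "Talatona", "Talatona", "Kilamba", "Kilamba", "Miramar"),
  rooms = c(2, 2, 3, 2, 3, 3),
  price_usd = c(1000, 1400, 2000, 400, 600, 2600),
  stringsAsFactors = FALSE
)

# Rows are locations, columns are room counts, cells are mean monthly rent
avg_table <- with(
  listings,
  tapply(price_usd, list(location = location, rooms = rooms), mean)
)
print(avg_table)
```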
Conclusion
- Overall, the Luanda property market offers more apartments than houses for rent.
- Distance does not seem to contribute much to differences in rental prices; other factors might be more relevant.
- Properties with 2–3 rooms are the most common on the rental market.
- Miramar has the highest average price for rental in Luanda and Kilamba has the lowest.
- Talatona is the location with the most rental listings.
Coding
This was a good project to practice coding in R, especially techniques to gather, clean, and visualize data. Many hours were spent googling answers to specific questions on how to clean and visualize data in a certain way, but in the end I learned many skills and techniques.
See the full code below.