Our goal in this project was to make use of historical Airbnb listings data, as well as user review sentiment data, to provide point-in-time rental price predictions for large metropolitan areas and major cities. We aimed to use sentiment analysis and gradient boosted regression trees with the XGBoost algorithm to generate these predictions and share them with users via a web-based visualization tool.
The AIBNB application is composed of four different components - data engineering, REST API, analytics and frontend.
See below for a description of each component:
-
The data engineering component contains modules, functions and scripts to scrape reviews, calendar, and listings datasets from the www.insideairbnb.com website. It downloads, cleans, formats, and writes the data to an AWS-hosted MySQL database, where the records are served via a REST API.
-
The REST API component contains modules and functions to enable interaction between the database tables and the frontend. It was built using Flask and needs to run as a separate service in order for the frontend to retrieve data from the MySQL database.
-
The analytics component contains libraries and modules that perform sentiment analysis on review data and run the modelling experiments used for the final report. It also contains the code that generates the final predicted prices table that is displayed on the front-end. The code in this module performs the following steps:
- Generate sentiment scores for each listing in the aibnb.listings table that has a corresponding set of reviews in the aibnb.reviews table, and write the outputs to the aibnb.listings_sentiment table in the DB
- Train an XGBoost regression model to predict the price of a listing on a given date
- Run experiments and produce graphs to compare the peformance of the XGboost model using the sentiment scores as additional features vs not using the sentiment scores
- Use the XGBoost pricing model to predict rental prices for a set of listing IDs across various calendar dates
-
The frontend component is a data-driven application and map-based UI used to allow users to interact with the results of our model's predicted prices. It was built using React and interacts with the MapBox API to display listing data for all locations available in our datasets.
We used Docker to run the data engineering and analytics modules in order to simplify package and dependency management. In order to install the application locally, ensure your working directory is set to the "CODE" directory and follow the steps below:
-
Create an AWS account, if you do not already have an existing account.
-
Install the following software packages
- Docker Community Edition 18.06.1 or higher
- node.js 10.13.0 LTS
-
You will need to set up a MySQL database server instance on AWS and save the host details (e.g. URL, username, password). Please follow the tutorial here to create a MySQL DB instance on AWS.
-
Once you have created your database server, you will need to create a database named aibnb and ensure that your DB user has full privileges (CREATE/MODIFY/DELETE) on this database.
-
Build the data engineering module Docker image using the following steps:
- Update the MySQL database details in "integrate.sh" Shell scripts in the backend/data_eng/calendar and backend/data_eng/reviews directories to use your own DB details
- Update the MySQL database details in the backend/data_eng/listings/2mysql.py Python file to use your own DB details
- Run
cd backend/data_eng - Run
docker build . -t data_eng
-
Build the REST API module Docker image using the following steps:
- Update the MySQL database details in the backend/rest_api/rest/connect_db.py Python file with your own DB details
- Run
cd backend/rest_api - Run
docker build . -t rest_api
-
Build the analytics module Docker image using the following steps:
- Update the MySQL database details in the backend/analytics/.env environment file with your own DB details
- Run
cd backend/analytics - Run
docker build . -t analytics
-
Install the frontend application component using the following steps:
- Run
cd frontend/aibnb - Run
yarn install
- Run
To run the application locally, you will need to ensure that your AWS MySQL database is running and has sufficient CPU, RAM and disk space to handle the creation of all the database tables and returning query results to the frontend. Furthermore, your local client should have sufficient RAM and disk space (at least 16 GB RAM, 40 GB disk space) to train the machine learning models and run all the modelling experiments. Please follow the steps below to run the application locally.
WARNING: Running the application E2E will take many hours (at least 12 to 24 hours) because of the large amount of data that needs to be downloaded, processed and uploaded, as well as the training and scoring of the machine learning models.
-
Change your working directory to the CODE/backend directory.
-
Run the data engineering module and generate all the database tables by following the steps below:
- Run
mkdir /data - Run
docker run -it --rm -v /data:/data data_engto start an interactive Shell session in the data engineering Docker container - In the container's Shell session, run
./data_eng_pipeline.sh
- Run
-
Run the analytics module to train all the models, generate all experiment outputs (graphs, tables, etc.) and create the final calendar_predicted table which contains the predicted prices for all of the records in the "calendar" table - follow the steps below:
- Run
docker run -it analyticsto start an interactive Shell session inside the analytics Docker container - Once inside the container's Shell session, change your working directory to the sentiment directory and run
./run_sa.shto generate sentiment scores and write them to the MySQL DB - Change your working directory to the pricing folder and run
./run_pricing_trainto train the ML models and generate experiment outputs - Run
./run_pricing_score.shto load the best model and score the records in the aibnb.calendar table, and write the outputs to the aibnb.calendar_predicted table
- Run
-
Start the REST API by changing your working directory to the CODE/backend/rest_api directory and running
docker run -d rest_api. This will run the REST API container in detached mode until you force stop the container. -
Change your working directory to the CODE/frontend/aibnb directory and run
npm run startto start the frontend UI. The application will now be accessible at http://localhost:3000.

