Www.ijsdr.org



HOUSE PRICE PREDICTION USING GRADIENT BOOST REGRESSION MODEL

1B.Vijay Kumar, 2 B.Ashritha, 3 CH.Teja, 4 M.Vineeth

1Assistant Professor, 2Student, 3Student, 4Student

1Department of Information Technology,

1JBIET, Hyderabad, India.

1Vijaykumar.it@jbiet.edu.in, 2bompallyashritha1999@, 3teja.ch2015@, 4vineethmaroodi@.

______________________________________________________________________________________

Abstract: House prices increase every year, so there is a need for a system that predicts future house prices. House price prediction can help a developer determine the selling price of a house and can help a customer purchase a house at the right time. Three factors influence the price of a house: physical condition, concept, and location. This research aims to predict house prices with a trained house classifier model built using the gradient boosting algorithm with regression analysis. The system takes the user's information about their house as input and predicts its price using a Gradient Boosting model with Huber loss and 1000 regression trees of depth 6.

Index Terms – Machine learning, non-householders, prediction regression, landholdings, location.

________________________________________________________________________________________________________

Introduction

Comparing the housing crisis of the last decade with the Great Depression of the 1930s, Global Research estimated that, with the current population of 316 million Americans, the foreclosure rate was higher than in the Great Depression era. During the bubble phase of the housing market, the real estate market saw a large increase in transactions. Buyers were willing to make offers on available properties, and home sellers made the most of the situation. Mortgage lenders developed new lending models such as stated-income loans and loans with no credit verification. However, some months after the housing bubble began, things fell apart: the market took a downward trajectory, the real estate market collapsed, and the bubble burst. There has been a range of research work on the free fall of the housing market.

The government's policy of encouraging homeownership and the predatory lending practices of the subprime mortgage market were mostly blamed for the crisis. However, breaking away from the public perception, the former mayor of New York, Michael Bloomberg, blamed the crisis on Congress. On November 1, 2011, at a breakfast in midtown Manhattan, he argued that it was Congress and not the banks that created the illusion of home ownership for everyone. He concurred with the notion that some people would not have been homeowners had Congress not pushed Fannie Mae and Freddie Mac to make loans available to qualified and unqualified would-be mortgagors. Overvaluation of real estate has been blamed as one of the primary reasons why the housing market went out of control. The present value estimation system uses the sales comparison approach to determine the price of houses. The major drawback of this approach is that the value of a house is expressed as a function of its three closest recently sold neighbours. Price adjustments are made for variations where necessary. This approach has been criticized for its vulnerability to price manipulation by stakeholders.

Investment is a business activity that interests most people in this era of globalization. Several assets are often used for investment, for example gold, stocks, and property. In particular, property investment has increased significantly since 2011, in both demand and sales. “Location, location, location” is a phrase frequently used by real estate agents when marketing residential properties. Numerous studies have demonstrated the importance of location and the surrounding neighbourhood in price determination. However, some neighbourhood attributes may be unobservable, and it remains an open question how best to capture location information in house price modelling. One reason for the increasing property demand is the large population of Indonesia. The Indonesian Central Bureau of Statistics states that 50% of the population of East Java is classified as young, with an age of approximately 30 years. This census result indicates that the younger generation will need to buy a house in the future.

Based on preliminary research, two standards of house price are valid in the buying and selling of a house: the price set by the developer (market selling price) and the price based on the Value of Selling Tax Object (NJOP). According to Lim et al., the fundamental problem for a developer is to determine the selling price of a house. In determining the price of a home, the developer must calculate carefully and choose an appropriate method, because property prices increase continuously and almost never fall in the long or short term. There are several approaches that can be used to determine the price of a house; one of them is prediction analysis. The first approach is quantitative prediction, which utilizes time-series data.

LITERATURE SURVEY

Theoretical background: suppose we have two differentiated commodities A and B with the same characteristic features. Then, all things being equal, hedonic pricing theory suggests that the value of the two commodities will be approximately the same. The intuition here is the economic rule of commodity substitution. The substitutability rule suggests that if A and B are two differentiated commodities with the same characteristic features, then under an arm’s length transaction an able and willing economic actor will not purchase B at a price higher than the value of A. For example, suppose A and B are real estate properties in the same neighbourhood with the same structural and social characteristic features, and the recent sales price of A is $200,000.00. In an arm’s length transaction, all things being equal, the market value of B will not be more than $200,000.00, since the utilities of both properties are the same. This is in line with Lancaster’s theory of consumer behaviour.

Lancaster (1966) argued that the utility of a differentiated commodity is a function of its characteristic features. His argument rests on the premise that utility derives not directly from the commodity but from its characteristic features. In a seminal paper, Rosen (1974) postulated the hedonic pricing of a differentiated commodity using the implicit prices of its characteristic features. Thus, given its characteristic features, we can estimate the price of a differentiated commodity. While Lancaster’s argument implies a linear function, Rosen argued for a non-linear price function and non-constant marginal prices, because characteristic features are bundles that cannot be untied.
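The contrast between the linear and non-linear views can be written compactly. The notation here is an assumption introduced for illustration and is not taken from the cited papers: $P$ is the price, $z_i$ is the $i$-th characteristic feature, and $\beta_i$ or $p_i$ is its implicit (marginal) price.

```latex
% Hedonic price as a function of n characteristic features
P = f(z_1, z_2, \dots, z_n)

% Linear form: each feature has a constant marginal price
P = \beta_0 + \sum_{i=1}^{n} \beta_i z_i ,
\qquad \frac{\partial P}{\partial z_i} = \beta_i

% Non-linear form: the marginal price of a feature depends on the whole bundle
\frac{\partial P}{\partial z_i} = p_i(z_1, z_2, \dots, z_n)
```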

________________________________________________________________________________________________________

PROPOSED SYSTEM

In the proposed system, we focus on predicting house prices using a machine learning algorithm, the Gradient Boost regression model. In this system, “House price prediction using Machine Learning”, we trained the machine on various attributes of past data points to make future predictions. We took data from previous years to train the model. The data set we used came from official organisations. Part of the data was used to train the machine and the rest was used to test it.

The basic approach of a supervised learning model is to learn the patterns and relationships in the data from the training set and then reproduce them on the test data. We used the Python pandas library for data processing, combining different datasets into a data frame. The raw data was then prepared for feature identification. The attributes were the number of stories, number of bedrooms, number of bathrooms, availability of a garage, swimming pool, and fireplace, year built, area in sq ft, and sale price of a particular house. We used all these features to train the Gradient Boost regression model and predicted the house price, which is the price on a given day. We also quantified the accuracy by comparing the predictions for the test set with the actual values. The proposed system outputs the predicted price.
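The text does not name the metric used to compare predictions with actual values. As one possible choice, the sketch below computes the mean absolute error; the metric, the function name, and the sample prices are all assumptions made for illustration.

```python
import numpy as np


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted prices."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


# Hypothetical sale prices and model predictions, for illustration only.
actual_prices = [200_000, 350_000, 275_000]
predicted_prices = [210_000, 340_000, 280_000]

error = mean_absolute_error(actual_prices, predicted_prices)
print(f"Mean absolute error: {error:,.2f}")
```

A lower value means the predictions are, on average, closer to the actual sale prices in the test set.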

_______________________________________________________________________________________________________

SYSTEM ARCHITECTURE

The first step is the collection of raw data from various sources; the dataset can be any historical data of the organization. From the raw data we extract the attributes used for prediction. After extraction, we train the model using these historical datasets. Testing data is then given as input to the data analytics tool. [pic]

ALGORITHM:

GRADIENT BOOST REGRESSOR:

Gradient Boosting for regression builds an additive model in a forward stage-wise fashion. It allows for the optimization of arbitrary differentiable loss functions. In each stage, a regression tree is fit on the negative gradient of the given loss function.
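The stage-wise construction can be summarised in a few equations. This is a simplified sketch that omits the per-leaf line search of the full algorithm; the symbols ($F_m$ for the model after stage $m$, $h_m$ for the $m$-th regression tree, $\nu$ for the learning rate, $\delta$ for the Huber threshold) are notation assumed here for illustration.

```latex
% Initial model: the best constant prediction
F_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)

% Pseudo-residuals at stage m: negative gradient of the loss
r_{im} = -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}}

% A regression tree h_m is fit to (x_i, r_{im}) and added to the model
F_m(x) = F_{m-1}(x) + \nu \, h_m(x)

% Huber loss: quadratic for small errors, linear for large ones
L_\delta(y, F) =
\begin{cases}
  \tfrac{1}{2}(y - F)^2 & \text{if } |y - F| \le \delta \\
  \delta \left( |y - F| - \tfrac{1}{2}\delta \right) & \text{otherwise}
\end{cases}
```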

The idea of boosting grew out of the question of whether a weak learner can be modified to become better. A weak hypothesis is defined as one whose performance is at least slightly better than random chance. The objective is to minimize the loss of the model by adding weak hypotheses using a gradient-descent-like procedure. This class of algorithms is described as a stage-wise additive model, because one new weak learner is added at a time while the existing weak learners in the model are frozen and left unchanged.
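To make the "add one weak learner at a time" idea concrete, here is a minimal from-scratch sketch. It assumes squared-error loss (so the negative gradient is simply the residual) rather than the Huber loss used in the paper, and the function names and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_stages=50, learning_rate=0.1, max_depth=2):
    """Stage-wise boosting with squared-error loss (illustrative sketch)."""
    baseline = float(np.mean(y))              # best constant under squared error
    prediction = np.full(len(y), baseline)
    trees = []
    for _ in range(n_stages):
        residuals = y - prediction            # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)                # weak learner fit to the residuals
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)                    # earlier trees are never refit
    return {"baseline": baseline, "trees": trees, "learning_rate": learning_rate}


def predict_gradient_boosting(model, X):
    """Sum the baseline and the scaled contribution of every stored tree."""
    prediction = np.full(X.shape[0], model["baseline"])
    for tree in model["trees"]:
        prediction += model["learning_rate"] * tree.predict(X)
    return prediction


# Synthetic data, for illustration only: a noisy linear relationship.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 0.5, size=200)

model = fit_gradient_boosting(X, y)
boosted_mse = float(np.mean((y - predict_gradient_boosting(model, X)) ** 2))
baseline_mse = float(np.mean((y - y.mean()) ** 2))
print(f"constant-only MSE: {baseline_mse:.2f}, boosted MSE: {boosted_mse:.2f}")
```

Each shallow tree alone is a weak predictor, but the accumulated sum reduces the training error well below that of the constant baseline.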

Input: ml_house_data_set.csv, values of new house for prediction.

Output: House cost prediction

Step 1: Load the data set

import pandas as pd
df = pd.read_csv("ml_house_data_set.csv")

Step 2: Replace categorical data with one-hot encoded data

Step 3: Remove the sale price from the feature data

Step 4: Create the feature array X and the label array Y.

Step 5: Split the data set into a training set (70%) and a test set (30%).

Step 6: Fit regression model.

Step 7: Save the trained model to a file trained_house_classifier_model.pkl

Step 8: Predict house worth using predict function
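The steps above can be sketched with pandas and scikit-learn as follows. This is an illustrative sketch, not the authors' code: the column names (`livable_sqft`, `num_bedrooms`, `garage_type`, `sale_price`), the function name, and the synthetic stand-in data are assumptions, and the demonstration uses 200 trees instead of 1000 only to keep the run short.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def train_house_model(df, target="sale_price", categorical=("garage_type",),
                      n_estimators=1000, max_depth=6):
    """Steps 2 to 6: encode, split 70/30, fit a Huber-loss gradient boosting model."""
    # Step 2: replace categorical columns with one-hot encoded columns.
    encoded = pd.get_dummies(df, columns=list(categorical))
    # Steps 3 and 4: separate the sale price (labels y) from the features (X).
    y = encoded[target].to_numpy(dtype=float)
    features = encoded.drop(columns=[target])
    X = features.to_numpy(dtype=float)
    # Step 5: 70% training set, 30% test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    # Step 6: fit the regression model.
    model = GradientBoostingRegressor(
        n_estimators=n_estimators, max_depth=max_depth,
        loss="huber", random_state=0)
    model.fit(X_train, y_train)
    test_mae = mean_absolute_error(y_test, model.predict(X_test))
    return model, list(features.columns), test_mae


# Step 1 in real use would be: df = pd.read_csv("ml_house_data_set.csv")
# Synthetic stand-in data (hypothetical columns) so the sketch runs without the file.
rng = np.random.default_rng(0)
n_rows = 300
df = pd.DataFrame({
    "livable_sqft": rng.integers(600, 4000, size=n_rows),
    "num_bedrooms": rng.integers(1, 6, size=n_rows),
    "garage_type": rng.choice(["attached", "detached", "none"], size=n_rows),
})
df["sale_price"] = (150.0 * df["livable_sqft"] + 10_000.0 * df["num_bedrooms"]
                    + rng.normal(0, 5_000, size=n_rows))

model, feature_names, test_mae = train_house_model(df, n_estimators=200)

# Step 7: save the trained model (written to a temporary directory here).
model_path = os.path.join(tempfile.mkdtemp(), "trained_house_classifier_model.pkl")
joblib.dump(model, model_path)

# Step 8: reload the model and predict the price of a new house.
new_house = pd.DataFrame([{"livable_sqft": 2000, "num_bedrooms": 3,
                           "garage_type": "attached"}])
new_X = (pd.get_dummies(new_house, columns=["garage_type"])
         .reindex(columns=feature_names, fill_value=0)   # align with training columns
         .to_numpy(dtype=float))
predicted_price = float(joblib.load(model_path).predict(new_X)[0])
print(f"Test MAE: {test_mae:,.0f}  Predicted price: {predicted_price:,.0f}")
```

Reindexing the new house against the training columns matters because one-hot encoding a single row produces only the categories present in that row.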

______________________________________________________________________________________________________

MODULES

Data Collection

Firstly, the dataset can be collected from various sources within any organization. The right dataset helps prediction and can be manipulated as required. Our data mainly consists of the attributes of houses available in a particular area. The data can be collected from the organization based on house area, number of bedrooms, number of bathrooms, and availability of a swimming pool and fireplace. Collecting these attributes makes the prediction more accurate.

Data Processing

At the beginning, when the data was collected, all the values of the selected attributes were continuous numeric values. Data transformation was applied by generalizing the data to a higher-level concept so that all the values became discrete. The criterion used to transform the numeric values of each attribute into discrete values depended on the closing price of the house. The attribute values of the houses in an area are used to predict the price of a house in that area.
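Generalizing a continuous attribute to a discrete, higher-level concept can be done with binning. The sketch below uses `pandas.cut`; the bin edges, labels, and sample areas are hypothetical, since the text does not give the actual criterion.

```python
import pandas as pd

# Hypothetical house areas in square feet.
areas = pd.Series([850, 1200, 1750, 2400, 3100], name="area_sqft")

# Assumed bin edges and labels; each bin includes its right edge.
bins = [0, 1000, 2000, float("inf")]
labels = ["small", "medium", "large"]

area_band = pd.cut(areas, bins=bins, labels=labels)
print(area_band.tolist())
# → ['small', 'medium', 'medium', 'large', 'large']
```

Tree-based regressors can also work directly on the raw numeric values, so this kind of binning is optional for them.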

Training the Data

After the data had been prepared and transformed, the next step was to build the classification model using the decision tree technique. The decision tree technique was selected because the construction of decision tree classifiers does not require any domain knowledge. This can be done using the Decision Tree Classifier (), with 70% of the data used for training and the remaining 30% used for testing.

Deploying the Model

The classification rules are generated by the decision tree algorithm. The trained model can then be applied to the test data, and it outputs the predicted price of the house.

CONCLUSION

This project, entitled “House Price Prediction Using Gradient Boost Regression Model”, is useful when buying houses: by predicting house prices, it can guide buyers accordingly. The proposed system also helps buyers predict the cost of a house according to the area in which it is located. The gradient boosting algorithm has high accuracy compared with other algorithms for house price prediction. The metric could be further improved by additional pre-processing before fitting the data.

_______________________________________________________________________________________________________

REFERENCES

1. K. J. Lancaster, "A New Approach to Consumer Theory," Journal of Political Economy, vol. 74, no. 2, pp. 132-157, 1966.

2. Sherwin Rosen, "Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition," Journal of Political Economy, vol. 82, no. 1, pp. 34-55, 1974.

3. Chau Kwong Wing and T. L. Chin, "A Critical Review of Literature on the Hedonic Price Model," International Journal for Housing Science and Its Applications, pp. 145-165, 2003.

4. Zvi Griliches, "Hedonic Price Indexes for Automobiles: an Econometric Analysis of Quality Change," National Bureau of Economic Research, vol. 0-87014-072-8, 1961.

5. Richard C. Ready and Charles W. Abdalla, "The Amenity and Disamenity Impacts of Agriculture: Estimates from a Hedonic Pricing Model," American Journal of Agricultural Economics, vol. 87 (2), pp. 314-326, May 2005.

6. Ben Monty and Mark Skidmore, "Hedonic Pricing and Willingness to Pay for Bed and Breakfast Amenities in Southeast Wisconsin," Journal of Travel Research, vol. 42, no. 2, November 2003.
