


Soft Computing Homework 7: Linear Regression and Case-Based Reasoning

Due: Thursday November 2nd

Problem 1:

Use the linear regression algorithm described in class to find a line of the form y = ax + b that fits the data below. What is the value of this line at x = 7? (1 point)

x    y
1    2
2    4
3    5
4    5
5    7
6    8
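One common way to fit such a line is ordinary least squares; whether that matches the algorithm described in class is an assumption. The sketch below is in Python rather than MATLAB, the function name `fit_line` is hypothetical, and the points are made up for illustration (they are not the homework data).

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Closed-form ordinary least squares for a single predictor.
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


# Made-up points that lie exactly on y = 2x + 1 (not the homework data).
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
prediction = a * 4 + b  # value of the fitted line at x = 4
```

The same sums translate directly to MATLAB vector operations, so the approach can be checked by hand on a small table before applying it to the data above.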

Predict the value at x = 7 for the above data using the smoothing algorithm f(x + 1) = a f(x) + (1 - a) y(x), where a = 0.5. Assume f(1) = y(1) = 2. (1 point)
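The recurrence can be unrolled one step at a time: each new smoothed value mixes the previous smoothed value with the previous observation. The sketch below is in Python rather than MATLAB, the function name `smooth` is hypothetical, and the observations are made up (they are not the homework data).

```python
def smooth(ys, a):
    """Return [f(1), ..., f(n+1)] for f(x+1) = a*f(x) + (1-a)*y(x), with f(1) = y(1)."""
    f = [ys[0]]  # f(1) = y(1)
    for y in ys:
        # f[-1] is f(x) and y is y(x); the appended value is f(x+1).
        f.append(a * f[-1] + (1 - a) * y)
    return f


# Made-up observations y(1..3); the last entry of the result is the forecast for x = 4.
forecast = smooth([4, 8, 6], a=0.5)
```

With n observations the list has n + 1 entries, so the final entry is the one-step-ahead prediction.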

Plot the following in one graph: (2 points)

- raw data (x = 1 … 6)

- regression line (x = 1 … 7)

- smoothed line (x = 1 … 7)

Problem 2:

Create a case-based reasoning system to determine the value of residential property by modifying the MATLAB code in appraiser.m. Add your code below the comments marked “TO DO” in appraiser.m.

In the retrieval phase:

- Determine weights for the various attributes.

- Use leave-one-out testing to find good weights.

- Determine a good number of properties to retrieve, N.
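As an illustration of retrieval, the sketch below ranks stored cases by a weighted Euclidean distance and keeps the N closest. It is in Python rather than MATLAB, and the distance measure, the function names, the toy cases, and the weights are all assumptions; appraiser.m may structure this differently.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two attribute vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def retrieve(query, cases, weights, n):
    """Return the n cases closest to the query; each case is (attributes, price)."""
    ranked = sorted(cases, key=lambda case: weighted_distance(query, case[0], weights))
    return ranked[:n]


# Hypothetical toy cases: (bedrooms, bathrooms) -> price.
cases = [((3, 2), 200000), ((4, 3), 260000), ((2, 1), 150000)]
nearest = retrieve((3, 1), cases, weights=(1.0, 0.5), n=2)
```

Because attributes such as lot size and bedroom count sit on very different numeric scales, the weights in a scheme like this also have to absorb the difference in units.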

In the reuse phase:

- Create rules that adjust the values of the selected houses to more accurately reflect the price of the house being appraised.

- Make one rule for each attribute (e.g., each extra bedroom is worth $1000).

- Use leave-one-out testing to evaluate and improve your rules.
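One simple form such rules can take is a fixed dollar value per unit of difference in each attribute. The sketch below is in Python rather than MATLAB; the function name `adapt_price` is hypothetical, the $1000 per bedroom comes from the example above, and the $500 per bathroom is invented purely for illustration.

```python
def adapt_price(case_attrs, case_price, query_attrs, rule_values):
    """Adjust a retrieved case's price toward the house being appraised.

    rule_values[i] is the assumed dollar value of one extra unit of attribute i.
    """
    adjustment = sum(
        value * (q - c)
        for value, q, c in zip(rule_values, query_attrs, case_attrs)
    )
    return case_price + adjustment


# Retrieved case: 3 bedrooms, 2 bathrooms, sold for $200000.
# House being appraised: 4 bedrooms, 2 bathrooms.
adapted = adapt_price((3, 2), 200000, (4, 2), rule_values=(1000, 500))
```

Linear rules like these are only one option; an attribute such as quality might call for a percentage adjustment instead.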

In the revise phase:

- Combine the top N cases to create a single value for the house being appraised.

- Should the N properties have different weights?
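One possible answer to the weighting question is to let closer cases count more, for example with inverse-distance weights. The sketch below is in Python rather than MATLAB; the function name `combine`, the weighting scheme, and the numbers are assumptions, and a plain average is an equally valid baseline to compare against.

```python
def combine(adapted_prices, distances, eps=1e-9):
    """Inverse-distance weighted average of the adapted prices.

    eps guards against division by zero when a case matches the query exactly.
    """
    weights = [1.0 / (d + eps) for d in distances]
    total = sum(w * p for w, p in zip(weights, adapted_prices))
    return total / sum(weights)


# Two adapted prices; the first case is three times closer than the second.
estimate = combine([200000, 100000], distances=[1.0, 3.0])
```

Here the closer case receives three times the weight of the farther one, so the estimate lands at roughly 175000 rather than the unweighted 150000.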

Your Goal:

The goal of the phases above is to create the most accurate estimator possible. The most accurate estimator will have the lowest total error in leave-one-out test mode.
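Leave-one-out testing holds out each house in turn, estimates its price from the remaining houses, and accumulates the error. The sketch below is in Python rather than MATLAB; it assumes total error means the sum of absolute differences (appraiser.m may define it differently), and the function names, the stand-in estimator, and the data are hypothetical.

```python
def leave_one_out_error(cases, estimate):
    """Sum of absolute errors when each case is appraised from all the others.

    cases is a list of (attributes, price); estimate(attributes, other_cases)
    returns a predicted price.
    """
    total = 0.0
    for i, (attrs, price) in enumerate(cases):
        others = cases[:i] + cases[i + 1:]  # every case except the held-out one
        total += abs(estimate(attrs, others) - price)
    return total


def mean_price(attrs, others):
    """Stand-in estimator: ignore the attributes and average the other prices."""
    return sum(price for _, price in others) / len(others)


# Made-up cases with a single attribute each.
toy_cases = [((1,), 100), ((2,), 200), ((3,), 300)]
total_error = leave_one_out_error(toy_cases, mean_price)
```

Swapping the stand-in estimator for a full retrieve, adapt, and combine pipeline gives a single number to compare across candidate weights, rules, and values of N.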

What to hand in:

Run your modified program on the test houses and hand in the printout. (4 points)

Run your modified program in leave-one-out test mode and hand in the printout. What is the total error? (4 points)

Discuss your approach to each of the modifications above. (7 points)

- List the attributes in order of importance for retrieval.

- Show how each attribute was adapted.

- Show how you aggregated the adapted values of the properties.

Your modified appraiser.m code, submitted by email. (3 points)

Extra Credit: (? Points)

Determine a confidence for the appraised value of each test property. The confidence indicates the expected error of the estimated value.

Discuss your confidence calculation.

The data in house_database.dat has the following format:

Each row is a different house.

The columns are as follows.

latitude - used to find distance between houses (1 unit = 200 yards)

longitude - used to find distance between houses (1 unit = 200 yards)

lot size - size of land the house is on (square feet)

living area - total size of all rooms in the house (square feet)

bedrooms

bathrooms

quality - 0 = fair, 1 = good, 2 = excellent

price - value of the house in dollars

The data in test_houses.dat has the same format except it does not include price.

Extra Credit Hint - Use information from the CBR process such as:

Number of cases above a given similarity

Adaptation needed

Spread of actual/adapted cost of similar properties

Example confidence values are:

Example 1:

High – 95% of properties have error less than 4% of actual sales price.

Medium – 95% of properties have error between 2% and 8% of actual sales price.

Low – 95% of properties have error over 7% of actual sales price.

Example 2:

High – expected error under $5000

Low – expected error over $5000
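As one illustration of the spread hint, the sketch below labels an appraisal by how tightly the adapted prices of the retrieved cases cluster. It is in Python rather than MATLAB; the function names and the 5% threshold are hypothetical, and turning a spread into an expected error would still need calibration against leave-one-out results.

```python
import statistics


def relative_spread(adapted_prices):
    """Population standard deviation of the adapted prices, as a fraction of their mean."""
    return statistics.pstdev(adapted_prices) / statistics.mean(adapted_prices)


def confidence_label(adapted_prices, threshold=0.05):
    """Return 'High' when the retrieved cases agree closely, otherwise 'Low'."""
    return "High" if relative_spread(adapted_prices) < threshold else "Low"


# Made-up adapted prices for two appraisals.
tight = confidence_label([100000, 101000, 99000])
loose = confidence_label([100000, 130000, 70000])
```

A spread-based label can be combined with the other hints, such as how many cases exceed a similarity cutoff or how much adaptation was needed.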
