EXAMPLES OF DATA MINING QUESTIONS



1. Describe an example of a data set for which the Apriori check would actually increase the cost. By "describe" I mean either show an instance of the data set or describe what it would look like.

2. Same question for MaxMiner: when does MaxMiner perform worse than Apriori? How does MaxMiner generate the frequency counts for every itemset that meets the support constraint?

3. Describe a data set for which sampling would actually increase the amount of work; in other words, one for which it would be faster to work on the full data set.

4. Is support, as defined in the correlation rule paper, downward closed? Why?

5. How large is a contingency table for an itemset of N items?

6. Under what conditions would AVG(Salary) > 100K be downward closed? Upward closed?

7. Assume that each item in a supermarket appears in 1% of transactions, that there are 10 million transactions, and that items are statistically independent. Assume minsup = 10. What is the expected size of a frequent set? What is the expected number of frequent sets?

8. Suppose you have data describing the closing prices of a stock you own for the last 1000 days, and you want to generate all rules that tell you the chances of the stock going up on a given day, given the pattern (up or down) on the K preceding days, with some minsup and minconf defined. How would you model this as an association rule mining problem? Is there a way to represent it as transactions with binary attributes, as in the supermarket case?
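
As a starting point for question 7, the arithmetic behind expected support can be sketched as follows. This is only a sketch under stated assumptions: the support threshold of 10 is read as an absolute transaction count, and the helper names are hypothetical. The number of distinct items is not given in the question, so the sketch does not attempt to count frequent sets.

```python
from fractions import Fraction

def expected_support(n_transactions, item_prob, k):
    """Expected number of transactions containing k specific items,
    assuming the items occur independently with the same probability."""
    return n_transactions * item_prob ** k

def largest_k_meeting_threshold(n_transactions, item_prob, threshold):
    """Largest k whose expected support still reaches the threshold.
    Assumes item_prob < 1, so the expected support shrinks as k grows."""
    k = 0
    while expected_support(n_transactions, item_prob, k + 1) >= threshold:
        k += 1
    return k

N = 10_000_000        # transactions, from the question
P = Fraction(1, 100)  # each item appears in 1% of transactions
THRESHOLD = 10        # assumption: support threshold as an absolute count

# Exact fractions avoid floating-point noise at the borderline case.
supports = {k: expected_support(N, P, k) for k in range(1, 5)}
max_k = largest_k_meeting_threshold(N, P, THRESHOLD)
```

Note that an expected support equal to the threshold is a borderline case: an itemset at that size is frequent in some data sets drawn from this model and not in others.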
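
One possible encoding for question 8 is sketched below; it is an illustration, not the only valid model. Each day with K preceding moves becomes one transaction whose items are binary facts such as "lag1=up" or "today=down". The function names and item labels are hypothetical, and a day whose price is unchanged is treated as "down" purely for simplicity.

```python
def moves(prices):
    """Turn closing prices into a list of 'up'/'down' moves.
    Assumption: an unchanged price counts as 'down'."""
    return ["up" if b > a else "down" for a, b in zip(prices, prices[1:])]

def to_transactions(prices, k):
    """Build one transaction per day that has k preceding moves.
    Each transaction is a set of binary items: one per lag, plus today's move."""
    m = moves(prices)
    transactions = []
    for t in range(k, len(m)):
        items = {f"lag{j}={m[t - j]}" for j in range(1, k + 1)}
        items.add(f"today={m[t]}")
        transactions.append(items)
    return transactions

# Tiny illustrative price series (made up for the example).
example_prices = [10, 11, 10, 12, 13]
example_txns = to_transactions(example_prices, k=2)
```

With this representation, a rule such as {lag1=up, lag2=down} => {today=up} is an ordinary association rule, and its confidence estimates the chance of an up day given that pattern.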
