MIDTERM EXAMINATION - azharunisel



1MIDTERM EXAMINATIONAUGUST 2013 (COHORT 4) SEMESTER EXAMINATIONACADEMIC SESSION 2013COURSE: DATABASE SYSTEMCOURSE CODE: IAS 2143DURATION: 2 HOURS FACULTY: COMPUTER SCIENCE AND INFORMATION TECHNOLOGYPROGRAMME: BACHELOR OF SCIENCE IN INFORMATION TECHNOLOGY (HONS) BACHELOR OF COMPUTER SCIENCE (SOFTWARE ENGINEERING) (HONS) BACHELOR OF COMPUTER SCIENCE (HONS)BACHELOR OF INFORMATION TECHNOLOGY (KNOWLEDGE MANAGEMENT) (HONS)BACHELOR OF SCIENCE IN IT WITH SUPPLY CHAIN MANAGEMENT(HONS)BACHELOR OF BIOINFORMATICS (HONS)LECTURER: MR AZHAR HAMIDGROUP: MR AZHAR HAMID (1) INSTRUCTIONS TO CANDIDATESLECTURER :_________________________________________________________NAME :___________________________________________________________MATRIC NO : _____________________________________________________PROGRAM : ______________________________________________________This paper consists of (3) sections:Section A – 20 questions (multichoice)Section B – 4 questions (structured)Section C – 2 questions (essay)All answers of:Answer all questions in Section A, B and C. All questions must be answered in the spaces provided. Candidates are NOT ALLOWED to bring any unauthorized items into the exam hall except with the permission of your invigilator.Do Not Open the Question Paper Until Instructed This Question Paper Consists of Ten (10) Printed PagesSECTION A (20 Marks) Answer ALL the questions. For each question circle the correct answer.1A goal of data mining includes which of the following?A.To explain some observed event or conditionB.To confirm that data existsC.To analyze data for expected relationshipsD.To create a new data warehouse2Fact tables are which of the following?pletely denoralizedB.Partially pletely normalizedD.Partially normalized3Data transformation includes which of the following?A.A process to change data from a detailed level to a summary levelB.A process to change data from a summary level to a detailed levelC.Joining data from one source into various sources of dataD.Separating data from one source into various sources of data4The load and index is which of the following?A.A process to reject data from the data warehouse and to create the necessary indexesB.A process to load the data in the data warehouse and to create the necessary indexesC.A process to upgrade the quality of data after it is moved into a data warehouseD.A process to upgrade the quality of data before it is moved into a data warehouse5The extract process is which of the following?A.Capturing all of the data contained in various operational systemsB.Capturing a subset of the data contained in various operational systemsC.Capturing all of the data contained in various decision support systemsD.Capturing a subset of the data contained in various decision support systems6A star schema has what type of relationship between a dimension and fact table?A.Many-to-manyB.One-to-oneC.One-to-manyD.All of the above.7. The content of a data warehouse is said to be ‘non-volatile’, becauseA. It remains the same even after the system crashesB.Its life-span is very longC.It is a read-only dataD.It disappears when the system is switched off8. A data warehouse is built as a separate repository of data, different from the operational data of an enterprise becauseA.It is necessary to keep the operational data free of any warehouse operationsB. A data warehouse cannot afford to allow corrupted data within it.C.A data warehouse contains summarized data whereas the operational database contains transactional dataD. It contains data derived from multiple sources9. A ‘Data mart’ isA.the analysis unit of the overall architecture of a warehouseB.a stand-alone of a data warehouse, by itself, of a department of the enterpriseC.an essential component in the sense that every data warehouse necessarily has several data martsD.a place where data are dumped without any restrictions .The followings are possible value in fact table EXCEPT;A.Product_idB.Time_idC.Quantity soldD.SizeThe followings are possible value in dimension table EXCEPT;A.Product_idB.SizeC.Quantity soldD.Color12. What is ‘ROLAP’?A. ROLAP is an OLAP engine for (i) multidimensional models and (ii) SQL based OLAP operationsB.ROLAP is an OLAP engine for (i) relational model and (ii) SQL based OLAP operationsC.ROLAP is an OLAP engine for (i) Multidimensional models and (ii) SQL queries, but does not support ‘slice’ and ‘dice’ operationsD.ROLAP is a set of relational operations equivalent to OLAP operations13. What is ‘MOLAP’?A.MOLAP is an OLAP engine for (i) relational models and (ii) multidimensional OLAP operationsB.MOLAP is an OLAP engine for (i) multidimensional models and (ii) SQL based OLAP operationsC.MOLAP is an OLAP engine for (i) Multidimensional models and (ii) supports multidimensional OLAP operationsD.MOLAP is a ROLAP with a supporting multidimensional model.14. Choose the correct answer:A.Data mining and data warehousing are totally unrelatedB.Data mining is the application of data warehousing.C.Data warehousing is the application of data mining.D.Data mining and data warehousing are the same15. The following statement are TRUE about data mining and data warehouse EXCEPT;A.Many IT professionals use the two terms interchangeably.B.Goals of both are related and often overlap.C.Dedicated to furnishing different types of analytics.D.Dedicated to furnishing same types of users.16. Data mining is used to aid in A.operational management B.analyzing past decision made by managers C.detecting patterns in operational data D.retrieving archival data 17. Data mining requires A.large quantities of operational data stored over a period of time B.lots of tactical data C.several tape drives to store archival data D.large mainframe computers 18. Data mining cannot be done if A.operational data has not been archived B.earlier management decisions are not available C.the organization is large D.all processing had been only batch processing 19. By continually breaking data sets into two separate smaller groups, a predictive model can be built. This statement refers to _______________.A.Statistical AnalysisB.Neural NetworksC.Genetic AlgorithmsD.Decision Trees20. Which of the following answer is NOT the tools for data visualization to present data mining analysis?A. GraphicalB. GeometricC. Text-basedD. Pixel-basedSECTION B (15 Marks)Answer ALL questions.Question 1Discuss the difference between data warehouse and data mining.(10 marks)Answer:Data warehousing is completely different from data mining. However, data warehousing and data mining are interrelated. Data warehousing is the process of compiling information or data into a data warehouse. A data warehouse is a database used to store data. It is a central repository of data in which data from various sources is stored. This data warehouse is then used for reporting and data analysis. It can be used for creating trending reports for senior management reporting such as annual and quarterly comparisons.?The purpose of a data warehouse is to provide flexible access to the data to the user. Data warehousing generally refers to the combination of many different databases across an entire enterprise.?The main difference between data warehousing and data mining is that data warehousing is the process of compiling and organizing data into one common database, whereas data mining is the process of extracting meaningful data from that database. Data mining can only be done once data warehousing is complete.?Data Mining is actually the analysis of data. It is the computer-assisted process of digging through and analyzing enormous sets of data that have either been compiled by the computer or have been inputted into the computer. In data mining, the computer will analyze the data and extract the meaning from it. It will also look for hidden patterns within the data and try to predict future behavior. Data Mining is mainly used to find and show relationships among the data.?The purpose of data mining, also known as knowledge discovery, is to allow businesses to view these behaviors, trends and/or relationships and to be able to factor them within their decisions. This allows the businesses to make proactive, knowledge-driven decisions.?The term ‘data mining’ comes from the fact that the process of data mining, i.e. searching for relationships between data, is similar to mining and searching for precious materials. Data mining tools use artificial intelligence, machine learning, statistics, and database systems to find correlations between the data. These tools can help answer business questions that traditionally were too time consuming to resolve.?Data Mining includes various steps, including the raw analysis step, database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.Question 2a)Below is a list of the “seven deadly sins” in data warehouse implementation suggested by Kozar.Sin1 – If you build, they comeSin2 – Omission of an architectural frameworkSin3 – underestimating the importance of documenting assumptionsSin4 – Failure to use the right toolSin5 – Life cycle abuseSin6 - ignorance concerning the resolution of data conflictsSin7 - failure to learn from mistakeExplain FIVE (5) of them.[7 marks]“If you build it, they will come” – the DW needs to be designed to meet people’s needsOmission of an architectural framework – you need to consider the number of users, volume of data, update cycle, etc.Underestimating the importance of documenting assumptions – the assumptions and potential conflicts must be included in the frameworkFailure to use the right tool – a DW project needs different tools than those used to develop an applicationLife cycle abuse – in a DW, the life cycle really never endsIgnorance about data conflicts – resolving these takes a lot more effort than most people realizeFailure to learn from mistakes – since one DW project tends to beget another, learning from the early mistakes will yield higher quality laterQuestion 3Illustrate data mining model.(10 marks) Data mining modelPredictiveDescriptiveClassification Regression Time series Prediction Clustering Summarization Association Sequence analysis rules discoveryQuestion 4Explain market basket analysis.(2 marks)Market Basket Analysis - is the most widely used and, in many ways, most successful data mining algorithm.It essentially determines what products people purchase together.Explain ONE (1) usage of market basket analysis(2 marks)Stores can use this information to place these products in the same area.Direct marketers can use this information to determine which new products to offer to their current customers.Inventory policies can be improved if reorder points reflect the demand for the complementary products. List and explain the three main components in measuring( in market basket analysis)(6 marks)Support refers to the percentage of baskets where the rule was true (both left and right side products were present).Confidence measures what percentage of baskets that contained the left-hand product also contained the right.Lift measures how much more frequently the left-hand item is found with the right than without the right.SECTION C (50 Marks)Answer ALL questions.Question 1An international winter clothing boutique franchise company is interested in data warehouse implementation by executing a data mart in their holding company. The CEO of ‘Snow Boutique’ said that he want to have analysis of the sales for his entire store to support their strategic marketing plan.He wants to know how many products they have sell in their stores for specific period of time such as by day, month, quarter and year. He also wants to have the figure of sales qty, amount and the cost for particular store, product and period.The product of their stores such as sweater, shoes and gloves are varies by color and size and the store is located per city which is under responsibility of a manager contactable by telephone number.You are required to draw a star schema for ‘Snow Boutique Holdings’ considering all information given by the above paragraph.(30 Marks)Ans: 1 mark sales fact table and each one of dimension tables. ( 4 Marks)1 mark for each attributes in each table. ( Product = 4, Period = 5, Store = 5, Sales = 6)1 mark for each connection from fact table to dimension table. ( 3 Marks)? mark for each foreign key underlined and ? mark for primary key underlined. (3 Marks)Question 2Consider the following transactions at a groceries stores:Transaction 1Coke, newspapers, eggs, curry powder, candy barTransaction 2Milk, cereal, yogurt drinks, coke, candy barTransaction 3Coconut milk, rice, onion, kitchen towel, eggs, vegetablesTransaction 4Vegetables, eggs, meat, soy sauceTransaction 5Candy bar, coke, newspaperTransaction 6Vegetables, soy sauce, fish, rice, candy barTransaction 7Candy bar, cokeTransaction 8Vegetables, milk, eggs, breadTransaction 9Vegetables, fishBased on the Market Basket Analysis approach use the appropriate rules to calculate the following:a)Calculate the support of the item “coke” (1 marks)4/9 X 100 = 44.4%b)Calculate the support measure for the rule “candy bar” (1 marks)5/9 X 100 = 55.6%c)Calculate the confidence of the rule “candy bar IMPLIES coke”. (3 marks)confidence = support of combination/ support of conditionsupport of combination = 4/9 X 100% = 44.4%support of condition = 5/9 X 100% = 55.6%confidence = 44.4 / 55.6 X 100% = 79.8%d)Calculate the confidence of the rule “vegetables IMPLIES coke”.(3 marks)confidence = support of combination/ support of conditionsupport of combination = 0/9 X 100% = 0%support of condition = 5/9 X 100% = 55.6%confidence = 0/55.6 X 100% = 0%e)“Fathers who go to the groceries store, are most likely will be accompanied by their children.”(2 marks)Support the above statement. Show the related calculation to support your answerNote : Student need to see relationship between candy bar and coke and candy bar with othersCoke IMPLIES candy barconfidence = support of combination/ support of conditionsupport of combination = 4/9 X 100% = 44.4%support of condition = 4/9 X 100% = 44.4%confidence = 44.4 / 44.4 X 100% = 100% END OF QUESTION PAPER ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download