Lab For LU12 – Physical Database Design and Data Conversion

In this lab you will learn the basics of physical database design and how to use SQL to perform data conversion techniques.

Learning Objectives
- Describe the steps in the physical design phase
- Explain the various data storage formats for attributes
- Explain the appropriate conditions to de-normalize
- Describe sequential, indexed and direct (hashed) file organizations
- Explain when to use indexing as a method for improving performance
- Explain the issues associated with data migration
- Describe various techniques for migrating data in SQL implementations

Lab Goals
Our lab goals are to:
- Learn how to import data into SQL Server
- Use SQL to cleanse, transform and load our source data into the target tables
- Use SQL to perform data verification

This lab consists of two parts. In part one, we will walk you through a data migration scenario. In part two, you will attempt a data migration on your own.

What you will need to begin
- Microsoft SQL Server Management Studio (for executing SQL queries)
- Microsoft Access 2003 or higher (for importing data)
- Your account credentials on the ISTSTUDENTS server
- The lab zip file containing the data to import and the target SQL schema

Part 1: Data Migration Walk-Thru

Scenario
You have been hired by fictitious book-selling giant Fudgenbooks to improve the design of their existing database. Fudgenbooks currently keeps inventory in a spreadsheet, which is far from ideal. You normalized their existing database and created the SQL for the new data model. The last step is to migrate the data into the new data model.

Step one: Import source data into SQL Server
Fudgenbooks keeps their existing database in a spreadsheet. Your first task is to import that spreadsheet as-is into your SQL Server database. One of the things Microsoft Access does very well is connect to other data sources, which makes Access a very versatile tool for assisting in data migration.

You will create a new project in Microsoft Access. Access data projects are different from Access databases in that the back-end data is stored in SQL Server. This creates a two-tier client/server model and a distributed database with location transparency.

1. Open Microsoft Access from the Start menu. From the Office menu, select New, then click [Blank Database].
2. Click the folder icon near the filename to browse for the file destination folder.
3. Select your folder, then click the [Save as type] drop-down and choose the [Microsoft Access Projects *.adp] option. Type a filename (no need to include an extension); notice the name and the .adp extension. Click Create.
4. When Access asks whether you want to connect to an existing SQL Server database, select Yes.
5. Complete the connection details: enter the server name (ISTSTUDENTS) and your account credentials, check the option indicated in the sample, make sure the selected database is your database, and test your connection before clicking OK.
6. From the menu, choose External Data > Excel. In the import dialog, change the file type to Excel and select the fudgenbooks.xls file (the file is available in the LMS). When you're ready, select Import.
7. Follow along with the Import Spreadsheet Wizard. Check "First row contains column headings" and click Next. Store the data in a new table and click Next. Name the table fudgenbooks_import and click Finish.

If you are successful, you should have a table in your database called fudgenbooks_import (yes, it is not a lot of data, but it is enough). At this point, you no longer need Microsoft Access; it has done its job by helping us get the spreadsheet data into SQL Server.
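Before closing Access, you may want to confirm the import from SQL Server Management Studio. The queries below are a minimal sketch: they assume the import above succeeded and that the wizard created the table under the name fudgenbooks_import.

-- Quick sanity checks on the imported table
SELECT COUNT(*) AS imported_rows FROM fudgenbooks_import;  -- should match the number of data rows in the spreadsheet
SELECT TOP 10 * FROM fudgenbooks_import;                   -- eyeball a few rows and the column names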
Step two: Create target database schema
In this next step you will use SQL Server Management Studio to create the target database schema (tables, keys and constraints). It should be noted that this is not a typical step in the data migration process, since usually the target schema already exists.

1. Open SQL Server Management Studio and connect to your SQL Server instance.
2. From the menu choose File > Open > File, and open the file from the zip archive called create_fudgenbooks_schema.sql. After the file opens, execute the SQL script to create the 4 fb_* tables.
3. Create a database diagram by adding all the tables that begin with fb_, and save the diagram as Fudgenbooks.

Step three: Data cleansing
The next step in the migration process is data cleansing. In this step you "clean up" the source data so that things will be nice and neat when we do the data transformations. What do we need to clean up? Take a look at the Subj1 and Subj2 columns: values such as 'mysql' and 'MySQL' represent the same thing but are written differently, and this needs to be corrected before we transform the data.

To clean up the data, we run SQL UPDATE statements against the source data. Open a new query window in SQL Server Management Studio and type the following:

UPDATE fudgenbooks_import SET Subj1 = 'MySQL' WHERE lower(Subj1) = 'mysql';
UPDATE fudgenbooks_import SET Subj2 = 'MySQL' WHERE lower(Subj2) = 'mysql';
UPDATE fudgenbooks_import SET Subj1 = 'MSSQL' WHERE lower(Subj1) = 'mssql';
UPDATE fudgenbooks_import SET Subj2 = 'MSSQL' WHERE lower(Subj2) = 'mssql';

Now that the data is clean, we're ready to transform!

Step four: Transform the source data and load into the target tables
This next step is the critical one. Here's where we move data from the source to the destination tables. Looking at our target schema, we must start the import with the tables that have no foreign keys. Therefore we first migrate data into the fb_publishers table, then fb_bookstore, fb_authors, and fb_subjects, in that order.

Regardless of the table we're migrating, the approach is the same: first create the SQL SELECT statement(s) which output the data as you need it to appear in the target table, then use that SELECT in an INSERT. Keep these SQL techniques in mind as you attempt the task:

- The DISTINCT keyword outputs unique rows of data. This is important for building lookup tables, and tables on the "one" side of a 1-M relationship.
- The UNION keyword allows you to combine like SQL statements into one output. This is useful for addressing multi-valued attributes and repeating groups of data.
- To combine columns of data, use the + operator, e.g. [Last Name] + ', ' + [First Name] as [Full Name]
- To split one column into two columns, use the CHARINDEX function, e.g. left([Full Name], charindex(',', [Full Name]) - 1) as [Last Name]
- Use CAST and CONVERT to change the data type of columns.

First we load up the fb_publishers table:

-- 1st: this statement populates the publishers table
insert into fb_publishers
select distinct Pub_id, [Publisher Name], [Publisher URL]
from fudgenbooks_import

Next we load the fb_bookstore table:

-- 2nd: this statement populates the fb_bookstore table
insert into fb_bookstore
select distinct convert(int, [ISBN]), [Title], [Price], [Pages], [Pub_id]
from fudgenbooks_import
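If the fb_bookstore insert had failed with a foreign key error, a query along these lines would show which publisher references are missing. This is only a troubleshooting sketch; it assumes the source column Pub_id and the target column fb_publishers.pub_id used above.

-- Any Pub_id present in the source but absent from fb_publishers would
-- violate the foreign key when loading fb_bookstore. An empty result is good.
SELECT DISTINCT Pub_id
FROM fudgenbooks_import
WHERE Pub_id NOT IN (SELECT pub_id FROM fb_publishers);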
Finally we move on to the fb_authors and fb_subjects tables. These require union queries, since columnar data needs to be moved into rows:

-- 3rd: this statement populates the fb_authors table
insert into fb_authors
select Author1, convert(int, ISBN) from fudgenbooks_import where Author1 is not null
union
select Author2, convert(int, ISBN) from fudgenbooks_import where Author2 is not null
union
select Author3, convert(int, ISBN) from fudgenbooks_import where Author3 is not null

-- Last: this statement populates the fb_subjects table
insert into fb_subjects
select Subj1, convert(int, ISBN) from fudgenbooks_import where Subj1 is not null
union
select Subj2, convert(int, ISBN) from fudgenbooks_import where Subj2 is not null

Whew! That seems like a lot of work. We did, however, automate it to the point where it will work each and every time, regardless of the actual data. One thing I might change in hindsight is to alter the ISBN column in the fudgenbooks_import table during the cleansing phase, to avoid having to use convert(int, ISBN). (The convert was necessary because Access imported the ISBN column as a float; a float converted straight to character data keeps its decimal portion, so an ISBN could have come over looking like 121398742.0!)
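If you wanted to act on that hindsight suggestion, the change would look something like the statement below. This is a sketch only, not part of the walkthrough as written; it assumes every ISBN in the import table is a whole number that fits in an int.

-- Hypothetical cleansing-phase change: make ISBN an int in the import table
-- so the later insert statements no longer need convert(int, ISBN).
ALTER TABLE fudgenbooks_import ALTER COLUMN [ISBN] int;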
Step five: Data verification
The final step is to perform data verification on the target data, to make sure it does indeed contain the same data as the source. The easiest (but least complete) way to do this is to check row counts. The following query compares the total row count of the original data source with the corresponding table in the target:

SELECT
    (select count(*) from fb_bookstore) as fb_bookstore_count,
    (select count(*) from fudgenbooks_import) as fudgenbooks_count

Row counts work as long as you are also checking that the SQL commands are not generating errors; data that errors out during transformation/loading will definitely affect row counts. While not true data verification, it is also a good idea to spot-check some sample data for correctness. For example, this query joins all the tables back together to give you a view that closely resembles the original data:

SELECT b.isbn, title, price, pages, p.pub_id, pub_name, pub_url, author, subj
FROM fb_bookstore b
JOIN fb_publishers p ON b.pub_id = p.pub_id
JOIN fb_authors a ON b.isbn = a.isbn
JOIN fb_subjects s ON b.isbn = s.isbn
ORDER BY title

Questions
1. Data cleansing and data transformation are similar tasks, but they differ in some significant ways. How do they differ?
2. What are the challenges associated with automating the transformation and load phases of data migration?
3. In this example, why is it not possible to re-create the original source data from the target data? Hint: what normal form was the source in when we started?
4. Now that you've experienced a migration, do you think explicit transactions might bring anything useful to the process? If so, what?
5. Many DBMSs, like SQL Server, have their own built-in import and export facilities which can be used for the extraction phase of migration. What might be the advantages of using a 3rd-party application such as Microsoft Access?

Part 2: Perform your own data migration
In this part, you will perform your own data migration by applying the principles and techniques used in the walk-thru portion of the lab in part one.

Overview
Migrate the data from the spreadsheet named fudgeconsulting.xls in the lab zip file into the SQL schema for Fudge Consulting (create_fudgeconsulting_schema.sql). Once you've performed the data migration, verify the data by "eyeballing" the original table against a SELECT statement that joins the target data model back together.

Here are some hints to make the process a little easier:
1. Use Microsoft Access to import the data as-is into SQL Server, just like we did in part 1 of the lab.
2. Execute the SQL file to create the target schema, then create a database diagram so that you have a picture of what the internal model looks like and can determine which tables to migrate first.
3. Microsoft Access imports the [Proj Num] and [Emp No] columns as float. During the data cleansing step, alter the fudgeconsulting_import table and change those columns' data type to int. This will make the transformation/load step much easier. For example:
   ALTER TABLE fudgeconsulting_import ALTER COLUMN [Proj Num] int
   You'll have to do this a second time for the [Emp No] column.
4. Remember to use DISTINCT to return only the unique rows. Also, when you need to split the last name and first name from [Emp Name], use expressions. Hint: this gives you the last name: right([Emp Name], len([Emp Name]) - charindex(' ', [Emp Name])). A sketch of both expressions appears after the questions below.
5. You don't need to worry about [Total Billed]; it has been implemented as a calculated column with a function.
6. When you verify the migration, run two SQL queries and compare them.

Questions
1. Write down the SQL statement you used to transform data into the fc_rates table.
2. Write down the SQL statement you used to transform data into the fc_projects table.
3. Write down the SQL statement you used to transform data into the fc_employees table.
4. Write down the SQL statement you used to transform data into the fc_consulting table.
5. Write down the SQL statement you used to verify the data is the same as the source. (The output should look like the bottom grid in hint #6 above.)
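For the name-splitting hint (#4 above), a minimal sketch of both expressions is shown below. It assumes [Emp Name] holds a first name and a last name separated by a single space, which is what the hint's expression implies; the column aliases are illustrative only.

-- Hypothetical split of [Emp Name] into first and last name (single-space assumption)
SELECT
    left([Emp Name], charindex(' ', [Emp Name]) - 1)                 AS [First Name],
    right([Emp Name], len([Emp Name]) - charindex(' ', [Emp Name]))  AS [Last Name]
FROM fudgeconsulting_import;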