Carson College of Business



Featherman’s SQL Server Intermediate Programming ? - Table variablesTable variables are arrays that are used to produce analytics, to pull together data that is stubborn. A recurring problem with data retrieval is that data is in different tables, different databases, and at different levels of granularity. Further the data is in access files, excel files, notepad .csv or .txt files, other formats you never heard of, and relational databases. You may experience huge pain to merge the data. There are at least two common sure-fire SQL-based methodologies to merge data from different data sources; SSIS packages which can get very detailed, and Table variables (pulling data into arrays and co-mingling the compiled data). This document shows the table variables (inserting data into arrays) technique that usually works to bring different tabular data together into one resultset. OF course bringing data into one dataset allows dashboards to be easily generated and updated. A pre-requisite is that you need to have a column in common between the different tables, (probably a primary key) such as employeeID, productID, storeID, etc. and then you can use the content revealed here to pull columns of metrics to add to the base dimension information that you have (such as something about the employee or product or store). When you can’t figure a way to pull data together from different sources (i.e., the joins don’t work), then pull the data into an arrayTable Variables are most commonly referred to as arraytables, or arrays in this module. Some of these examples can be performed with sub-queries, making the use of table variables overkill. Sub-queries are deceptively powerful and there are more capabilities. This author has found situations however where sub-queries do not work. In every one of these cases such as the first query, arraytables did work. In essence you are just pulling data from different sources into one table to compact data for storage and further analysis. Often the compiled data is the datasource for a dashboard, othertimes the arraytable is the in-memory storage table to hold compiled data in the middle of an Extract, Transform and Load process.A table variable is literally a table created in memory, like an array. You specify the columns, their names and data types. Next you insert rows of data into the columns. This is loading the array with data. You can have local table variables which can be used only in the one query or stored procedure or global table variables which can be used in anywhere in other queries of the same database. Table variables or arrays have been used by programmers for over 50 years to transfer data from storage to working memory to perform data management and data crunching very quickly. RAM has always worked a lot faster than secondary storage. Arrays are used in coding, to store values, and as a place where you can manipulate and comingle values and build a structure of values. To improve data management and simplify organization of data assets, a DBA may want to build fewer more complex resultsets, each one serving as the datasource for many reporting needs (rather than build lots of queries, build a few!! Fewer stored procedures (saved queries) are easier to manage and update. With the explosion of reporting needs for modern corporate life, the resultsets that are needed are often very complex. So being a cautious scientist, the DBA will want to carefully build their resultset brick by brick (i.e., column by column) using either sub-queries which might work, or arrays which so far always work. This custom building of datasets will often require the merging of different aggregated data such that the required query results are not possible. In these cases the DBA can use temporary tables to allow the joining of two tables and thereby adding extra columns of aggregated data for either a field name in common (such as month or year) or for product numbers that are in one table but not the other. As a side note for smallish resultsets you can also perform this functionality using a Visual studio project using constructs and procedures (i.e., datatables (which are arrays), datasets (which are collections of arrays which can be joined together, derived columns, and expressions) or the SSIS packages as mentioned.USE Featherman_Analytics,DECLARE @SubCategory nvarchar(30)SELECT ProductKey, Sum(OrderQuantity) as [Total] FROM FactResellersalesWHERE ProductSubCategory = @SubCategoryGROUP BY ProductKeyBefore we jump into arraytables which is a grid of variables, let’s take our first look at declaring and using a single variable in SQL. What is a variable? A variable is a named piece of system memory (in RAM) that is created by, accessible and manipulated by SQL programs and the users of the SQL programs. The program shown to the left is common, this query allows one textual sub-category value (presumably from a dropdown list in an webpage, Excel, SSRS, or Tableau) to be passed into the query and used in the WHERE statement to filter the results retrieved from the database and returned to the program user.Later In this document we focus on a special type of variable called a table variable (which is special in that it can have many columns and rows). But let’s first look at creating a simple nvarchar variable that can accept a word or phrase. Most variables that programmers use can store one value, a number, a date or a word or set of words. In T-SQL use the DECLARE phrase to create a variable in the memory (RAM) of the development machine. The DECLARE line shown to the left creates a variable of datatype nvarchar that can accept up to 30 characters (letters or numbers). Here the intent is that another program such as a webpage or Excel dropdownlist passes in a word for the subcategory such as hats, or wheels.The idea is that this query would be ‘called’ or ‘invoked’ from Excel or a reporting software which passes in the value for the variable. This is a decades old common way to have a parameterized (variant) query that can provide different results depending on value passed in to the query. This is of course the same functionality that a slicer would provide.We can see that the variable is used in the WHERE clause to filter the results. So yes, variables are very useful in T-SQL programming. Since 2005 DBA’s and analysts have built reporting infrastructures connecting SSRS and SSMS. Before then webpages were routinely connected to SSMS.USE [Featherman_Analytics];SELECT * INTO Featherman_Analytics.dbo.ManagersListFROM [AdventureWorksDW2012].[dbo].[DimEmployee]WHERE [Title] LIKE '%manager%'In this module we will also use the concept of copying data from one data table into another, or from a data table on disk to a data table in memory (the famed arraytable or DBA’s call them table variables. In this document data is copied from a table or set of tables INTO an array. A similar concept is copying data from one table to another. You must be the owner of the database to be able create tables in the database to be able to insert/update/delete data. None of us can add tables to AdventureWorksDW2012.This code runs on the Featherman_Analytics database. The code creates the ManagersList table and inserts 20 rows of data into it from the DimEmployee table of AdventireWorksDW2012.This code creates a new table on a disk drive. In this module we will create virtual tables called an array and load data into it. Step 1 - declare the array tables and their columnsStep 2 – Select data using normal queries and load the data INTO the arraysStep 4 Select columns from the two array tables, add metricsStep 3 – join the arrays together on a common field DECLARE @ArrayTable1 Table (column1 int, column2 NVARCHAR(50))DECLARE @ArrayTable2 Table (column1 int, column2 VARCHAR(50))INSERT INTO @ArrayTable1SELECT Column1, Sum(Column2)FROM SomeTable GROUP BY Column1INSERT INTO @ArrayTable2SELECT Column1, Count(Column2) FROM SomeTableGROUP BY Column1SELECT columns and create metrics using columns from either array @ArrayTable1, or @ArrayTable2FROM @ArrayTable1AS T1, @ArrayTable2 AS T2WHERE T1.primarykeyfield = T2primarykeyfield--You can more safely replace the prior two lines that connect the table on a common filed with the following FULL JOIN FROM @temptable1 AS T1FULL JOIN @temptable2 AS T2Here is the basic procedure for creating arraytables (ie, table variable), loading them with data and producing analytics.This query shows one way to combine two tables of retrieved data when you surely have a common field of interest, the primary key. First you create two table variables (arrays) and then SELECT values and INSERT them INTO the table variables. Finally you use a final SELECT statement to pull data from the two table variables together and to create new columns of derived data so that the resultset has all the calculated or aggregate fields.The major benefit is that you can put data from different data sources into the same datastore and then make the metrics that you need (the columns of aggregated data) easily pulling data from either array.Here we just get the results from SSMS compiled into one table, and use it for charts, maps, reports and subsequent analysis. We could also take the merged data columns and save them into a database table or staging table (common when creating ETL procedures). We will see later that this last section where the table variables are joined, and the fields for display are selected from the table variables can vary and provide good solutions.The code above can be used WHEN YOU KNOW FO SURE that there is field in common and there are values in the field in both tables (such as month or city). This approach above is like an inner join in that values are displayed ONLY if there is a value in both tables (both side of the join). This first example is different in that it proves that you can just create an arraytable and load data into it. Sometimes you will want to pull compiled data together from different tables and databases into a compact location even though there is no primary key to connect the tables.USE [Featherman_Analytics];DECLARE @ChannelWidth TABLE([#SubCategories in Stores] int, [#SubCategories on Web] int)--insert the data retrieved below into the specified columns of the arrayINSERT INTO @ChannelWidth ([#SubCategories in Stores], [#SubCategories on Web])SELECT --this is the sub-query for the first column(SELECT(COUNT(DISTINCT(p.[Sub Category]))) FROM [dbo].[FactResellerSales] as rsINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.[ProductKey]= rs.[ProductKey])--this is the sub-query for the second column, (SELECT(COUNT(DISTINCT(p.[Sub Category]))) FROM [dbo].[FactInternetSales] as fisINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.[ProductKey]= fis.[ProductKey])--this is actually retrieving and showing the data SELECT * FROM @ChannelWidthTo simplify the query, the AW_Products_Flattened table is used form Featherman_Analytics Database.It is interesting to know how many product sub-categories are sold on each channel. I guess you can either do this analysis with brute force counting or you can write a SQL Query. Unless you write SQL often, this query might be considered brutal. This query uses sub-queries Again you define a table variable (a virtual table with two columns) to hold values from different queries, then you use two different SELECT statements that are sub-queries to select (INSERT) values INTO the virtual table.Notice there is no primary key field that is in common between these two tables. We just want to pull values from the tables and display them in a table! If we could write a loop we could pull the data by years into each rows (or maybe use a pivot query?) Recurring questions:It is worth recalling a recent assignment.What are the product categories that are sold via reseller that should be sold online? What products and sub categories are sold via resellers that should be sold via the Internet? What do you think goes into the decision? Shipping cost? Profit margin? As a manufacturer do we want to compete with ou resellers? What sub-category product lines that are already selling on the internet should be expanded?The next is a little more powerful query – Often when you do a join of tables you do so to add more columns of metrics or to make more complete lists. This query tackles a different problem, adding aggregate columns from a second table. One way to do this is to create two virtual tables with the results you want then to join the tables. So here we create the virtual tables, then select data into them, then finally pull the desired compiled data from the two virtual tables. This following query can be written with a fancy subquery, but the calculated fields are MUCH EASIER if you use array tables. And yes 2 array tables are needed.Step 1 – create the arrays and columnsStep 2 – load the data into the arrays766445351155Step 4 – now that the two arrays are loaded with data and joined in step 3, we can select the columns that we want from the two arrays, and create some new columns of metricsStep 3 – join the two arraysUSE [AdventureWorksDW2012];DECLARE @RSSales TABLE ([ProductKey] int, [Reseller Units] int) DECLARE @INETSales TABLE ([ProductKey] int,[Internet Units] int)INSERT INTO @RSSales ([ProductKey],[Reseller Units])SELECT [ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactResellerSales]WHERE DATEPART(year, [OrderDate]) = 2007GROUP BY [ProductKey]INSERT INTO @INETSales ([ProductKey], [Internet Units])SELECT [ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactInternetSales]WHERE DATEPART(year, [OrderDate]) = 2007 GROUP BY [ProductKey]-------------------------------------------------------------------------------SELECT rs.[ProductKey],rs.[Reseller Units] AS [Reseller Units], inet.[Internet Units] , (rs.[Reseller Units] + inet.[Internet Units]) AS [Total Units],format(100* rs.[Reseller Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Reseller Channel],format(100* inet.[Internet Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Internet Channel]FROM @RSSales rs, @INETSales inetWHERE rs.ProductKey = inet.ProductKeyORDER BY rs.ProductKeyHere you can see that the same product key which is an internal referent number used as the primary key (rather than the actual model # which is stored in the products table). First declare two virtual tables. Note that you need to define the field names and data types for each virtual table. It’s best to keep the column names short! We will load these columns with data, so that the data type corresponds to the data copied in.In step 2 - Here we select two columns of data from the factResellerSales table and insert the data into the virtual table. There are 266 rows of data copied into the virtual table, each row has two columnsIn step 2 - Here we select two columns of data from the factInternetSales table and insert the data into the virtual table. There are 133 rows of data copied into the virtual table, each row has two columnsHere we select 3 columns of data from the two virtual tables productkey, reseller units, Internet Units, Next we add 3 columns of derived data. Note that an alias for the virtual tables is needed (code in yellow) so that you can refer to the columns directly.Clearly the strength of using arraytables is the ease of creating calculated fields from the available columns. If each new metric column was an inner join, then the query would be much harder to write.Programmers challenge!The data could also be sliced by product sub-category in Excel – if you add the columns. Can you do it?USE [AdventureWorksDW2012];DECLARE @RSSales TABLE ([ProductKey] int, [Reseller Units] int) DECLARE @INETSales TABLE ([ProductKey] int,[Internet Units] int)INSERT INTO @RSSales ([ProductKey],[Reseller Units])SELECT [ProductKey], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactResellerSales]WHERE DATEPART(year, [OrderDate]) = 2007GROUP BY [ProductKey]INSERT INTO @INETSales ([ProductKey], [Internet Units])SELECT [ProductKey], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactInternetSales]WHERE DATEPART(year, [OrderDate]) = 2007 GROUP BY [ProductKey]SELECT rs.[ProductKey],rs.[Reseller Units] AS [Reseller Units], inet.[Internet Units] , (rs.[Reseller Units] + inet.[Internet Units]) AS [Total Units],format(100* rs.[Reseller Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Reseller Channel],format(100* inet.[Internet Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Internet Channel]FROM @RSSales AS rsFULL JOIN @INETSales AS inet ON rs.ProductKey = inet.ProductKeyWHERE rs.ProductKey IS NOT NULLORDER BY rs.ProductKey The prior example used a WHERE clause as a sort of INNER JOIN and ensures that you must have the same key field in both tables. As with any INNER JOIN, you will receive a subset of the total records, if there are records in one table without a corresponding set of values in the other table. This example uses a full outer join to get all the product sale data from either sales channel (whereas the prior query gets only the products that are sold on both channels). Try running the query with and without the final WHERE filter.As you see in the resultset below, we probably need to use the ISNULL() function to replace the NULLS with 0’s. Can you do it?USE [Featherman_Analytics];DECLARE @RSSales TABLE ([Sub Category] nvarchar(50),[Model] nvarchar(50) , [Product] nvarchar(50), [ProductKey] int, [Reseller Units] int) DECLARE @INETSales TABLE ([Sub Category] nvarchar(50), [Model] nvarchar(50) ,[Product] nvarchar(50), [ProductKey] int,[Internet Units] int)INSERT INTO @RSSales ([Sub Category], [Model], [Product],[ProductKey],[Reseller Units])SELECT [Sub Category], [Model], [Product], p.[ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as frsINNER JOIN [dbo].[AW_Products_Flattened] as p ON frs.ProductKey = p.ProductKeyWHERE DATEPART(year, [OrderDate]) = 2007 AND [Sub Category] NOT LIKE '%bike%'AND [Sub Category] NOT LIKE '%frame%'GROUP BY [Sub Category], [Model], [Product], p.[ProductKey]INSERT INTO @INETSales ([Sub Category], [Model],[Product], [ProductKey], [Internet Units])SELECT [Sub Category], [Model], [Product], p.[ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactInternetSales] as fisINNER JOIN [dbo].[AW_Products_Flattened] as p ON fis.ProductKey = p.ProductKeyWHERE DATEPART(year, [OrderDate]) = 2007 AND [Sub Category] NOT LIKE '%bike%'AND [Sub Category] NOT LIKE '%frame%'GROUP BY [Sub Category], [Model], [Product], p.[ProductKey]SELECT rs.[Sub Category], rs.[Model], rs.[Product], rs.[ProductKey],rs.[Reseller Units] AS [Reseller Units], inet.[Internet Units] , (rs.[Reseller Units] + inet.[Internet Units]) AS [Total Units],format(100* rs.[Reseller Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2')AS [% Reseller Channel],format(100* inet.[Internet Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Internet Channel]FROM @RSSales AS rs FULL JOIN @INETSales AS inetON rs.ProductKey = inet.ProductKeyWHERE rs.ProductKey IS NOT NULLORDER BY [Total Units] DESC-3746569850004210022-1032600This is about the same query as above but expanded to add more descriptive information, Product Sub-category, Model, and Product. This is quite an improvement over just having the product ID #. You can see that the query can get quite long very fast! Can you also add the ISNULL() functionality to this query?To bring in dimension information (see the picture on the next page), the database was switched to take advantage of the AW_Products_Flattened table.The query is same as before except that the table variables have 5 columns now not just 2. And the SELECT statement used to fetch the data inserting into the table variable is a bit more complex in that it has its own INNER JOIN and more complex WHERE statement.The rest of the query is the same. Notice that you only receive 25 rows if you don’t use a FULL JOIN of the array tables as shown.It is still WONDERFUL to be able to create new columns of analytics building on columns from any of the arrays.USE [Featherman_Analytics];DECLARE @p1 integer = 2007DECLARE @p2 nvarchar(50) = '%bike%'DECLARE @RSSales TABLE ([Sub Category] nvarchar(50),[Model] nvarchar(50) , [Product] nvarchar(50), [ProductKey] int, [Reseller Units] int) DECLARE @INETSales TABLE ([Sub Category] nvarchar(50), [Model] nvarchar(50) ,[Product] nvarchar(50), [ProductKey] int,[Internet Units] int)INSERT INTO @RSSales ([Sub Category], [Model], [Product],[ProductKey],[Reseller Units])EXEC SPGetResellerUnits (@p1, @p2) INSERT INTO @INETSales ([Sub Category], [Model],[Product], [ProductKey], [Internet Units])EXEC SPGetInetUnits (@p1, @p2)SELECT rs.[Sub Category], rs.[Model], rs.[Product], rs.[ProductKey],rs.[Reseller Units] AS [Reseller Units], inet.[Internet Units] , (rs.[Reseller Units] + inet.[Internet Units]) AS [Total Units],format(100* rs.[Reseller Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2')AS [% Reseller Channel],format(100* inet.[Internet Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Internet Channel]FROM @RSSales rs, @INETSales inetWHERE rs.ProductKey = inet.ProductKeyORDER BY [Total Units] DESCPretty soon we will write queries similar to this. Do you see the difference?If the data is commonly needed, the analyst or DBA can make stored procedures and use them to break up big queries into more manageable modules. This compartmentalization of logic or breaking of big queries into smaller pieces is standard practice in programming communities.Here the values for the filters (2007 and not the bike subcategories) are specified in the query. More commonly they are passed in by the analytics program (SSRS, Excel).You call stored queries (which are store queries) by using the EXEC spName (parameter values.We will get here, and using stored procedures will simplify your development.Exercise #1: Let’s try it out. We learn by doing right? Can you pull the unit sales for bicycle sub-categories (or products) for each city in America? Using a table variable and sub-queries, show the sales for both the reseller channel and the internet channel for 2007. Expand or change the query as you see fit.Exercise #2: Chiara the CEO wants to penetrate the German market. They want to figure out whether to get their product into more stores or use the Internet channel more. Chiara requested you pull the sales for all the months 2007 and 2008 (so 24 rows) for each of the cities in Germany. Start by analyzing the total sales performance for the 24 months, then go on to the monthly sales analysis. Copy the data into Excel to prepare charts for your analysis and recommendation. Add percentage change formula columns to both levels of analysis (entire country and cities). Choose one or two cities to focus sales on and prepare your reasoning for marketing strategy (greater store presence such as sponsoring competitions) or greater internet presence (which might be supported by billboards, posters, etc.)USE [AdventureWorksDW2012];DECLARE @RSSales TABLE ( [ProductKey] varchar(25), [Product] varchar(50), [Total Reseller Unit Sales] int) DECLARE @INETSales TABLE ( [ProductKey] varchar(25), [Product] varchar(50), [Total Internet Unit Sales] int)INSERT INTO @RSSales ([ProductKey],[Product], [Total Reseller Unit Sales])SELECT [ProductAlternateKey], [EnglishProductName], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactResellerSales] as rsINNER JOIN [dbo].[DimProduct] as p ON p.[ProductKey] = rs.[ProductKey]GROUP BY [ProductAlternateKey], [EnglishProductName]INSERT INTO @INETSales ([ProductKey],[Product],[Total Internet Unit Sales])SELECT [ProductAlternateKey], [EnglishProductName], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactInternetSales] as InetINNER JOIN [dbo].[DimProduct] as p ON p.[ProductKey] = Inet.[ProductKey]GROUP BY [ProductAlternateKey], [EnglishProductName]SELECT rs.[ProductKey], rs.[Product],rs.[Total Reseller Unit Sales] AS [Reseller Units], inet.[Total Internet Unit Sales] AS [Internet Units], (rs.[Total Reseller Unit Sales] + inet.[Total Internet Unit Sales]) AS [Total Units],format(100* rs.[Total Reseller Unit Sales]/ (rs.[Total Reseller Unit Sales] + inet.[Total Internet Unit Sales] * 1.0), 'N2') AS [% Reseller Channel],format(100* inet.[Total Internet Unit Sales]/ (rs.[Total Reseller Unit Sales] + inet.[Total Internet Unit Sales] * 1.0), 'N2') AS [% Inet Channel]FROM @RSSales rs FULL JOIN @INETSales inet ON rs.ProductKey = inet.ProductKeyWHERE rs.ProductKey IS NOT NULLORDER BY rs.ProductKey This query is a little different than the prior:The product ID is replaced with a better indication of the part # and the name of the product is added. So the virtual tables now have three columns.Here we still use the FULL JOIN to make sure we get all the products from either sales channel (reseller and Internet).IF you look below you see there is a problem, that some of the products are not sold on the Internet. So none of the calculated fields will calculate properly.Scroll down to see the solution.--the first part is all the sameSELECT rs.[ProductKey], rs.[Product],rs.[Total Reseller Unit Sales] AS [Reseller Units], ISNULL(inet.[Total Internet Unit Sales], 0) AS [Internet Units], (rs.[Total Reseller Unit Sales] + ISNULL(inet.[Total Internet Unit Sales], 0)) AS [Total Units], format(100* rs.[Total Reseller Unit Sales]/ (ISNULL(rs.[Total Reseller Unit Sales],0) + ISNULL(inet.[Total Internet Unit Sales],0) * 1.0), 'N2') AS [% Reseller Channel],format(100* ISNULL(inet.[Total Internet Unit Sales], 0)/ (rs.[Total Reseller Unit Sales] + ISNULL(inet.[Total Internet Unit Sales], 0) * 1.0), 'N2') AS [% Inet Channel]FROM @RSSales rs FULL JOIN @INETSales AS inet ON rs.ProductKey = inet.ProductKeyWHERE rs.ProductKey IS NOT NULLORDER BY rs.ProductKeyFixing Problems with Null VariablesNotice the consistent use of the ISNULL(, 0) function. In case any of the columns could be null then display zeros.If some of the rows have not been assigned to a product key then they can show NULLS. Look at the final WHERE statement Another problem solved with table variables - In this next query demonstration you see another example of using a FULL JOIN in the final query to join the @table variables. Making sure you retrieve all the rows. Use a FULL JOIN (similar to UNION ALL) to ensure you retrieve all the rows from both tables, not only the rows that have a matching primary key value in both tables. This section adds the product category field from the dimProductCategory table, and shows that addition of several inner joins is possible. Yet another problem is solved seemingly simple but devastatingly difficult without table variables. Email your professor if you know an easier approach to solve this problem of combining aggregate values from different tables (here factInternetSales and factResellerSales). Try to recreate this functionality using pivot tables.USE [Featherman_Analytics];DECLARE @RSSales TABLE ([Product Category] varchar(50), [Reseller Units] int) DECLARE @INETSales TABLE ([Product Category] varchar(50), [Internet Units] int)INSERT INTO @RSSales ([Product Category], [Reseller Units])SELECT p.[Category], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactResellerSales] as rsINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.ProductKey = rs.ProductKeyGROUP BY p.[Category]INSERT INTO @INETSales ([Product Category], [Internet Units])SELECT p.[Category], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactInternetSales] as rsINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.ProductKey = rs.ProductKeyGROUP BY p.[Category]SELECT reseller.[Product Category], reseller.[Reseller Units], inet.[Internet Units], inet.[Internet Units] + reseller.[Reseller Units] as [Totals]FROM @RSSales AS reseller FULL JOIN @INETSales AS inet ON reseller.[Product Category] = inet.[Product Category] ORDER BY reseller.[Product Category]19059874250038104381500Here’s a similar example that condenses >120,000 rows of fact table data down to three columns.Think of the process as:Declaring two table arraysRunning two or more select statements as normal (but prefacing the select statement with an INSERT INTO statement to move the results into one of the table variables. You can add different data columns that are needed into either of the table variables. You just need a common field amongst the tables (that entity of interest such as employee, store, customer, product, month, etc.).A final SELECT statement pulls all the columns together and can add new columns of derived data now that you have access to columns of data from different data sourcesAnother problem solved with table variables - In this next query demonstration the group field is month. While you can replicate the first two columns with a PIVOT query, here the third column is an aggregated column from a different table. This is the solution.USE [AdventureWorksDW2012];DECLARE @RSSales TABLE ([Month] varchar(20), [Total Reseller Unit Sales] int) DECLARE @INETSales TABLE ([Month] varchar(20), [MonthNum] int, [Total Internet Unit Sales] int)INSERT INTO @RSSales ([Month], [Total Reseller Unit Sales])SELECT DATENAME(month, rs.OrderDate), SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactResellerSales] as rsGROUP BY DATENAME(month, rs.OrderDate)INSERT INTO @INETSales ([Month], [MonthNum], [Total Internet Unit Sales])SELECT DATENAME(month, inet.OrderDate), DATEPART(month, inet.OrderDate), SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactInternetSales] as inetGROUP BY DATENAME(month, inet.OrderDate), DATEPART(month, inet.OrderDate)SELECT inet.[Month], FORMAT(reseller.[Total Reseller Unit Sales], 'N0') [Total Reseller Unit Sales], FORMAT (inet.[Total Internet Unit Sales], 'N0') [Total Internet Unit Sales], FORMAT (reseller.[Total Reseller Unit Sales] + inet.[Total Internet Unit Sales], 'N0') As [Total Sales]FROM @RSSales AS reseller FULL JOIN @INETSales AS inetON reseller.[Month]= inet.[Month]ORDER BY [MonthNum]These results could easily be improved by passing in a parameter for product sub-category or product category.This is a different example that pulls data by month from two different tables into one resultset. The secon table variable includes and extra field so that the results can be sorted by Month #.-271357700Note the base code is altered to remove product category information and replace it with a month column. Also notice the safer FULL JOIN is used.The factInterenetsales table has sales in all 12 months so that table was chosen to pull the month number.Can you merge the % of total syntax with this table variable syntax! I bet you can!Another problem solved with table variables – the total sales booked for each sales rep is compared to the sales quota set for that sales rep. This query then is the basis for a dashboard that measures business performance to goal. Combining the data for employee over time is one way to look at the data, another is to look at each time period and rank/rate the sales rep performance within that time period. USE [AdventureWorksDW2012];DECLARE @Quota TABLE ([EmployeeKey] int, [Name] varchar(75), [CalendarYear] int, [CalendarQuarter] int, [YearQtr] varchar, [SalesAmountQuota] decimal)DECLARE @Actual TABLE ([EmployeeKey] int, [CalendarYear] int, [CalendarQuarter] int,[YearQtr] varchar, [SalesAmountActual] decimal)INSERT INTO @Quota ([EmployeeKey], [Name], [CalendarYear], [CalendarQuarter], [SalesAmountQuota] )SELECT q.[EmployeeKey], CONCAT([FirstName], ' ', [LastName]) as Name, [CalendarYear], [CalendarQuarter] , [SalesAmountQuota]FROM [dbo].[FactSalesQuota] as qINNER JOIN [dbo].[DimEmployee] as e ON e.[EmployeeKey] = q.[EmployeeKey]INSERT INTO @Actual ([EmployeeKey], [CalendarYear], [CalendarQuarter], [SalesAmountActual])SELECT [EmployeeKey], DATEPART(year, [OrderDate]), DATEPART(quarter, [OrderDate]) , SUM([SalesAmount]) FROM [dbo].[FactResellerSales]GROUP BY [EmployeeKey], DATEPART(year, [OrderDate]), DATEPART(quarter, [OrderDate]) SELECT q.[EmployeeKey] AS [EmpID], q.Name, q.[CalendarYear] as [Year], q.[CalendarQuarter] as [Qtr], CONCAT( a.[CalendarYear], ' Q', a.[CalendarQuarter]) AS [Time Period], a.SalesAmountActual as [Recorded Sales] ,q.[SalesAmountQuota] as [Sales Quota],FORMAT((a.SalesAmountActual / q.[SalesAmountQuota]), 'N2') as SalesKPI, (a.SalesAmountActual - q.[SalesAmountQuota]) as DeltaFROM @Quota q JOIN @Actual a ON CONCAT(q.[CalendarYear], ' Q', q.[CalendarQuarter]) = CONCAT( a.[CalendarYear], ' Q', a.[CalendarQuarter])AND q.EmployeeKey = a.EmployeeKeyORDER BY a.[EmployeeKey], a.[CalendarYear], a.[CalendarQuarter]31758128000The sales quotas are in one table, the actual sales are in another table (and have to be aggregated) and the employee names are in a third table.This is a tricky problem and you can see that the code gets long. The first sections are standard and handle the field retrieval, the row calculations, the data transformation within each dataset, and the GROUP By aggregations and other aggregations (i.e. PIVOT())The final SELECT statement is where if you have all the columns you need, then you can perform data comparisons, etc. That the columns of data came from very different tables is irrelevant, the data can be combined and analyzed.Here an inner join is used. Be careful and look at your data to see if you need to use a left join or full join to ensure you retrieve all the records needed.Here is an extended example that researches the products and product sub categories that are sold on the different sales channels in the hopes to identify what additional product sub-categories and products can be sold at profit on the Internet channel. While many even difficult requests for information can be solved with GROUP BY, some can only be solved with sub queries and table variables. Don’t worry about detailed learning of the table variables until later.USE [AdventureWorksDW2012];SELECT (COUNT(DISTINCT([ProductKey]))) AS [# SKU's Reseller Channel]FROM [dbo].[FactResellerSales]SELECT (COUNT(DISTINCT([ProductKey]))) AS [# SKU's Intnernet Channel]FROM [dbo].[FactInternetSales]Here we can run two different queries and see two different results in two different query windows. The bike company we have been commissioned to examine sells a lot more product SKU’s on the reseller channel than via the Internet. Management wants us to recommend 25 to 100 products that should be added to the Internet channel. The expense for creating additional web pages in the web catalog is considerable as the web design team is not fluent with dynamic webpages (i.e, they will have to create one new web page for each product added to the website, rather than use templates that fill in data and pictures from a database)USE [AdventureWorksDW2012];DECLARE @ChannelWidth TABLE([Reseller Channel #SKU's] int, [Internet Channel #SKU's] int)INSERT INTO @ChannelWidth ( [Reseller Channel #SKU's], [Internet Channel #SKU's] )SELECT (SELECT(COUNT(DISTINCT([ProductKey]))) FROM [dbo].[FactResellerSales]), (SELECT(COUNT(DISTINCT([ProductKey]))) FROM [dbo].[FactInternetSales])SELECT * FROM @ChannelWidthHere the results are appended to one table using a table variable which is a virtual table (i.e., array). Much more on this later but first a table variable (a piece of system memory) is created then data is selected and inserted into it.Notice the two sub-queries are wrapped in their own parentheses, each is generating the value for inserting into the column, both sub-queries together make up the one row.Each sub-query counts up the # product ID’s that have been sold on each sales channel. USE [AdventureWorksDW2012];DECLARE @Year1 int = 2006DECLARE @Year2 int = 2007DECLARE @ChannelWidth TABLE([Reseller Channel #SKU's] int , [Reseller Channel Revenue] nvarchar(30), [Internet Channel #SKU's] int, [Internet Channel Revenue] nvarchar(30))INSERT INTO @ChannelWidth ([Reseller Channel #SKU's], [Reseller Channel Revenue], [Internet Channel #SKU's], [Internet Channel Revenue])SELECT (SELECT (COUNT(DISTINCT(rs.[ProductKey]))) FROM [dbo].[FactResellerSales] as rs WHERE YEAR(rs.[OrderDate]) = @Year1), (SELECT FORMAT((SUM(rs.[SalesAmount])), 'N0') FROM [dbo].[FactResellerSales] as rs WHERE YEAR(rs.[OrderDate]) = @Year1 GROUP BY YEAR(rs.[OrderDate]) ), (SELECT (COUNT(DISTINCT(fis.[ProductKey]))) FROM [dbo].[FactInternetSales] as fis WHERE YEAR(fis.[OrderDate]) = @Year1), (SELECT FORMAT((SUM(fis.[SalesAmount])), 'N0') FROM [dbo].[FactInternetSales] as fis WHERE YEAR(fis.[OrderDate]) = @Year1 GROUP BY YEAR(fis.[OrderDate]) )--here we insert the second rowINSERT INTO @ChannelWidth ([Reseller Channel #SKU's], [Reseller Channel Revenue], [Internet Channel #SKU's], [Internet Channel Revenue])SELECT (SELECT (COUNT(DISTINCT(rs.[ProductKey]))) FROM [dbo].[FactResellerSales] as rs WHERE YEAR(rs.[OrderDate]) = @Year2), (SELECT FORMAT((SUM(rs.[SalesAmount])), 'N0') FROM [dbo].[FactResellerSales] as rs WHERE YEAR(rs.[OrderDate]) = @Year2 GROUP BY YEAR(rs.[OrderDate]) ), (SELECT (COUNT(DISTINCT(fis.[ProductKey]))) FROM [dbo].[FactInternetSales] as fis WHERE YEAR(fis.[OrderDate]) = @Year2), (SELECT FORMAT((SUM(fis.[SalesAmount])), 'N0') FROM [dbo].[FactInternetSales] as fis WHERE YEAR(fis.[OrderDate]) = @Year2 GROUP BY YEAR(fis.[OrderDate]) )--now that the rows of data are inserted, display the dataSELECT * FROM @ChannelWidthThis query can be turned into a stored procedure that can accept 2 years as criteria, to facilitate a comparison of channel width and channel revenue.It is a recurring problem to decide how amny products should be in retail stores (physical or virtual).This query is the same as prior except that it has more columns and two rows.If you write this type of query you have to remember that each successive sub-query is wrapped in its own parentheses and separated by commas.SELECT SUM() always need a GROUP BY ColumnSummarySubqueries are used to retrieve and format data into columns. If each column requires a different level of aggregation or a different filtering, or different joins you can use sub queries to collocate data into columns. Table variable are another amazing tool used to pull together data. ReferencesOver clause - functions - 3 - Blog - of total - of total with group - ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download