Washington State University



Featherman’s T-SQL Adventures ? - Array Tables (aka SQL Table Variables)Array tables are used in SQL to build datasets. In SQL arrays are used similarly to how they are used in other programming environments. Table variables or arrays have been used by programmers for over 50 years to transfer data from storage to working memory to perform data management and data crunching very quickly. RAM has always worked a lot faster than secondary storage. Arrays are used in coding, to store values, and as a place where you can manipulate and comingle values and build a structure of values. Arrays tables are used to hold data. Arrays can be used to hold rows and columns of analytics, and because they are malleable, they can be used to pull data together when other methodologies do not work.A recurring reporting problem is that data is needed at different levels of granularity, for example a final column with totals, or a row of grand totals. You can easily add this aggregated data, in reporting software, or you can provide the columns in the dataset. Often several layers of aggregation are needed to compare data sections.Array tables excel at allowing you to pull data together, co-mingling data from different sources, into a compiled dataset that is ready for analysis. Bringing disparate data into one organized dataset allows dashboards and reports to be easily generated and updated. A pre-requisite is that you need to have a dimension attribute (column) in common between the different source data tables. The field in common will usually be a primary key such as employeeID, productID, storeID, but can also be city, state, or country (for example when compiling econometrics data). When you can’t figure a way to pull data together from different sources (i.e., the joins don’t work, and neither does a sub-query), then pull the data into an array table.Some of these examples demonstrated here to explain array tables can also be performed with sub-queries. Sub-queries are deceptively powerful and provide more capability than shown. This author has found situations however, where sub-queries do build the dataset. When using array tables, in essence you are just pulling data from different sources into one table to compact data for storage and further analysis, just be sure that the data columns all relate to the same dimension attribute of interest (such as employeeID). Often the compiled data is the datasource for a dashboard, other times the arraytable is the in-memory storage table to hold compiled data in the middle of an Extract, Transform and Load process. For example, array tables can hold data that is culled from different sources, and this array table (aka resultset) can be written to a new permanent SQL database table. The newly compiled database table can then be merged with other data in an iterative ETL process of data culling, formatting and combining. Array tables then are your canvas and you the emerging DBA can paint your masterpiece dataset onto them.A table variable is literally an array created in memory in the shape of a table (either wide, or tall or both). You specify the columns, their names and data types. Next you run a SELECT statement that specifies the data for the columns, and to to insert rows of data into the array. Table variables which can be used only in the one query or stored procedure or global table variables which can be used in anywhere in other queries of the same database.To improve data management and simplify organization of data assets, a DBA may want to build fewer more complex resultsets, each one serving as the datasource for many reporting needs (rather than build lots of queries, build a few that are multi-purpose!!) Fewer stored procedures (saved queries) are easier to manage and update. Using arrays, the DBA can carefully build their resultset brick by brick (i.e., column by column). This custom building of datasets will often require the merging of different aggregated data such that the required query results are not possible. In these cases the DBA can use temporary tables to allow the joining of two tables and thereby adding extra columns of aggregated data for either a field name in common (such as month or year) or for product numbers that are in one table but not the other. USE [Featherman_Analytics];DECLARE @SubCategory nvarchar(30) = 'Mountain Bikes'SELECT [Model], Sum(OrderQuantity) as [Total] FROM FactResellersales as sInner JOIN [dbo].[AW_Products_Flattened] as pON p.ProductKey = s.ProductKeyWHERE [Sub Category] = @SubCategoryGROUP BY [Model]When this query is turned into a stored procedure, the value for the @subCategory variable will not be provided. Rather that value will be passed in when the query is run.For example if the query is saved into a stored procedure called spUnitsForSubCategory then you would run the query and retrieve te data by typingEXEC spUnitsForSubCategory 'Mountain Bikes'-1905317500Before we jump into arraytables which is a grid of variables, let’s take another look at declaring and using a single variable in SQL. What is a variable? A variable is a named piece of system memory (in RAM) that is created by, accessible and manipulated by SQL programs and the users of the SQL programs. The DECLARE @variablename datatype is the way to create a variable. You can set the value of the variable immediately or use a SET command later on. The program shown to the left allows one textual sub-category value. Here we show th equery in test mode (by giving a value to the variable). Future usage of the query will receive values (from a dropdown list in an webpage, Excel, SSRS, or Tableau project) passed in and used in the WHERE statement to filter the results retrieved from the database.Next in this document we focus on a special type of variable called a table variable (which is special in that it can have many columns and rows), not just one single parcel of info.But let’s first look at creating a simple nvarchar variable that can accept a word or phrase. Most variables that programmers use can store one value, a number, a date or a word or set of words. In T-SQL use the DECLARE phrase to create a variable in the memory (RAM) of the development machine. The DECLARE line shown to the left creates a variable of datatype nvarchar that can accept up to 30 characters (letters or numbers). The idea is that this query would be ‘called’ or ‘invoked’ from Excel or a reporting software which passes in the value for the variable. This is called a parameterized query and can provide different results depending on value passed into the query, similar to a slicer.We can see that the variable is used in the WHERE clause to filter the results. USE [Featherman_Analytics];SELECT * INTO Featherman_Analytics.dbo.ManagersListFROM [AdventureWorksDW2012].[dbo].[DimEmployee]WHERE [Title] LIKE '%manager%'In this module we will also use the concept of copying data from one SQL Server database table into another, or from a database table from a cloudDB to a data table in memory. In this document data is copied from a table or set of tables INTO an array, and used for reporting. You can also perform data management and shaping, comingling data into one array that is saved into a table. You must be the owner of the database to be able create tables in the database to be able to insert/update/delete data. This code runs on the Featherman_Analytics database. The code creates the ManagersList table and inserts 20 rows of data into it from the DimEmployee table of AdventireWorksDW2012.This code creates a new table on a disk drive. In this module we will create virtual tables called arrays and load data into them. Step 1 - declare the array tables and their columnsStep 2 – Select data using normal queries and load the data INTO the arraysStep 3 Select columns from the two array tables, add metricsStep 4 – join the arrays together on a common field DECLARE @ArrayTable1 Table (column1 int, column2 Decimal)DECLARE @ArrayTable2 Table (column1 int, column2 Decimal)INSERT INTO @ArrayTable1SELECT Column1, Sum(Column2)FROM SomeTable GROUP BY Column1INSERT INTO @ArrayTable2SELECT Column1, Count(Column2) FROM SomeTableGROUP BY Column1SELECT columns and create metrics using columns from either array @ArrayTable1, or @ArrayTable2FROM @ArrayTable1AS T1, @ArrayTable2 AS T2WHERE T1.primarykeyfield = T2primarykeyfield--You can more safely replace the prior two lines that connect the table on a common field with the following FULL JOIN FROM @temptable1 AS T1FULL JOIN @temptable2 AS T2Here is the basic process for creating arraytables, loading them with data and producing analytics.This query shows one way to combine two tables of retrieved data when you surely have a common field of interest, the primary key. First you create arrays and then SELECT values and INSERT the retrieved data INTO the arrays. Finally you use a final SELECT statement to pull data from the two table variables together and to create new columns of derived data so that the resultset has all the calculated or aggregate fields.The major benefit is that you can put data from different data sources into the same array and then make the columns metrics that you need using data from the different sources, now that it is in your array.Here we just get the results from SSMS compiled into one table, and use it for charts, maps, reports and subsequent analysis. We could also take the merged data columns and save them into a database table or staging table (common when creating ETL procedures). We will see later that this last section where the table variables are joined, and the fields for display are selected from the table variables can vary and provide good solutions.The code above can be used WHEN YOU KNOW FOR SURE that there is an attribute field in common and there are values in the field in both tables (such as month, or city). The WHERE statement is an inner join in that values are displayed ONLY if there is a value in both tables (both side of the join). The blue code shows the way to show the data whether there is a match or not in the other table.This first example is different in that it proves that you can just create an arraytable and load data into it. Sometimes you will want to pull compiled data together from different tables and databases into a compact location even though there is no primary key to connect the tables.USE [Featherman_Analytics];DECLARE @ChannelWidth TABLE([#SubCategories in Stores] int, [#SubCategories on Web] int)--insert the data retrieved below into the specified columns of the arrayINSERT INTO @ChannelWidth ([#SubCategories in Stores], [#SubCategories on Web])SELECT --this is the sub-query for the first column(SELECT(COUNT(DISTINCT(p.[Sub Category]))) FROM [dbo].[FactResellerSales] as rsINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.[ProductKey]= rs.[ProductKey])--this is the sub-query for the second column, (SELECT(COUNT(DISTINCT(p.[Sub Category]))) FROM [dbo].[FactInternetSales] as fisINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.[ProductKey]= fis.[ProductKey])--this is actually retrieving and showing the data SELECT * FROM @ChannelWidthTo simplify the query, the AW_Products_Flattened table is used form Featherman_Analytics Database.It is interesting to know how many product sub-categories are sold on each channel. I guess you can either do this analysis with brute force counting or you can write a SQL Query. Unless you write SQL often, this query might be considered brutal. This query uses sub-queries .Again you define a table variable (a virtual table with two columns) to hold values from different queries, then you use two different SELECT statements that are sub-queries to select (INSERT) values INTO the virtual table.Notice there is no primary key field that is in common between these two tables. We just want to pull values from the tables and display them in a table. Its just a baby step but an important one.You can put any values you want columns of an array, they can be totally different. AS we shall see usually you want to pull data columns about some attribute of interest, so your rows are used to oprganize the data. For example if your first column is US state, you could pull different data into columns for each row (each state).The next is a little more powerful query – Often when you do a join of tables you do so to add more columns of metrics or to make more complete lists. This query tackles a different problem, adding aggregate columns from a second table. One way to do this is to create two virtual tables with the results you want then to join the tables. So here we create the virtual tables, then select data into them, then finally pull the desired compiled data from the two virtual tables. This following query can be written with a fancy subquery, but the calculated fields are MUCH EASIER if you use array tables. And yes 2 array tables are needed.Step 1 – create the arrays and columnsStep 2 – load the data into the arrays766445351155Step 3 – now that the two arrays are loaded with data and joined, we can select the columns that we want from the two arrays, and create some new columns of metricsStep 4 – join the two arraysUSE [AdventureWorksDW2012];DECLARE @RSSales TABLE ([ProductKey] int, [Reseller Units] int) DECLARE @INETSales TABLE ([ProductKey] int,[Internet Units] int)INSERT INTO @RSSales ([ProductKey],[Reseller Units])SELECT [ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactResellerSales]WHERE DATEPART(year, [OrderDate]) = 2007GROUP BY [ProductKey]INSERT INTO @INETSales ([ProductKey], [Internet Units])SELECT [ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactInternetSales]WHERE DATEPART(year, [OrderDate]) = 2007 GROUP BY [ProductKey]-------------------------------------------------------------------------------SELECT rs.[ProductKey],rs.[Reseller Units], inet.[Internet Units] , (rs.[Reseller Units] + inet.[Internet Units]) AS [Total Units],format(100* rs.[Reseller Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Reseller Channel],format(100* inet.[Internet Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Internet Channel]FROM @RSSales rs, @INETSales inetWHERE rs.ProductKey = inet.ProductKeyORDER BY rs.ProductKeyHere you can see that the same product key is used for both sales channels. The product key is an internal referent number used as the primary key (rather than the actual model # which is stored in the products table). First declare two virtual tables defining the field names and data types for each virtual table. It’s best to keep the column names short! We will load these columns with data, so that the data type corresponds to the data copied in.In step 2 - Here we select two columns of data from the factResellerSales table and insert the data into the virtual table. There are 266 rows of data copied into the virtual table, each row has two columnsIn step 2 - Here we select two columns of data from the factInternetSales table and insert the data into the virtual table. There are 133 rows of data copied into the virtual table, each row has two columnsHere we select 3 columns of data from the two virtual tables productkey, reseller units, Internet Units, Next we add 3 columns of derived data. Note that an alias for the virtual tables is needed (code in yellow) so that you can refer to the columns directly. NOTE: A CAST() operation can also be used rather than the * 100 trick to make the division work.Clearly the strength of using arraytables is the ease of creating calculated fields from the available columns. If each new metric column was an inner join, then the query would be much harder to write.Programmers challenge!The data could also be sliced by product sub-category in Excel – if you add the columns. Can you do it?USE [AdventureWorksDW2012];DECLARE @RSSales TABLE ([ProductKey] int, [Reseller Units] int) DECLARE @INETSales TABLE ([ProductKey] int,[Internet Units] int)INSERT INTO @RSSales ([ProductKey],[Reseller Units])SELECT [ProductKey], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactResellerSales]WHERE DATEPART(year, [OrderDate]) = 2007GROUP BY [ProductKey]INSERT INTO @INETSales ([ProductKey], [Internet Units])SELECT [ProductKey], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactInternetSales]WHERE DATEPART(year, [OrderDate]) = 2007 GROUP BY [ProductKey]SELECT rs.[ProductKey],rs.[Reseller Units] AS [Reseller Units], inet.[Internet Units] , (rs.[Reseller Units] + inet.[Internet Units]) AS [Total Units],format(100* rs.[Reseller Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Reseller Channel],format(100* inet.[Internet Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Internet Channel]FROM @RSSales AS rsFULL JOIN @INETSales AS inet ON rs.ProductKey = inet.ProductKeyWHERE rs.ProductKey IS NOT NULLORDER BY rs.ProductKey We should be passing in the year as a variable.The prior example used a WHERE clause INNER JOIN which requires that you must have the same key field in both tables. As with any INNER JOIN, you will receive a subset of the total records, if there are records in one table without a corresponding set of values in the other table. This example uses a full join to get all the product sale data from either sales channel (whereas the prior query gets only the products that are sold on both channels). Try running the query with and without the final WHERE filter.As you see in the resultset below, we probably need to use the ISNULL() function to replace the NULLS with 0’s. Can you do it?USE [Featherman_Analytics];DECLARE @RSSales TABLE ([Sub Category] nvarchar(50),[Model] nvarchar(50) , [Product] nvarchar(50), [ProductKey] int, [Reseller Units] int) DECLARE @INETSales TABLE ([Sub Category] nvarchar(50), [Model] nvarchar(50) ,[Product] nvarchar(50), [ProductKey] int,[Internet Units] int)INSERT INTO @RSSales ([Sub Category], [Model], [Product],[ProductKey],[Reseller Units])SELECT [Sub Category], [Model], [Product], p.[ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as frsINNER JOIN [dbo].[AW_Products_Flattened] as p ON frs.ProductKey = p.ProductKeyWHERE DATEPART(year, [OrderDate]) = 2007 AND [Sub Category] NOT LIKE '%bike%'AND [Sub Category] NOT LIKE '%frame%'GROUP BY [Sub Category], [Model], [Product], p.[ProductKey]INSERT INTO @INETSales ([Sub Category], [Model],[Product], [ProductKey], [Internet Units])SELECT [Sub Category], [Model], [Product], p.[ProductKey], SUM([OrderQuantity]) FROM [dbo].[FactInternetSales] as fisINNER JOIN [dbo].[AW_Products_Flattened] as p ON fis.ProductKey = p.ProductKeyWHERE DATEPART(year, [OrderDate]) = 2007 AND [Sub Category] NOT LIKE '%bike%'AND [Sub Category] NOT LIKE '%frame%'GROUP BY [Sub Category], [Model], [Product], p.[ProductKey]SELECT rs.[Sub Category], rs.[Model], rs.[Product], rs.[ProductKey],rs.[Reseller Units] AS [Reseller Units], inet.[Internet Units] , (rs.[Reseller Units] + inet.[Internet Units]) AS [Total Units],format(100* rs.[Reseller Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2')AS [% Reseller Channel],format(100* inet.[Internet Units]/ (rs.[Reseller Units] + inet.[Internet Units] * 1.0), 'N2') AS [% Internet Channel]FROM @RSSales AS rs FULL JOIN @INETSales AS inetON rs.ProductKey = inet.ProductKeyWHERE rs.ProductKey IS NOT NULLORDER BY [Total Units] DESC-3746569850004210022-1032600This is about the same query as above but expanded to add more descriptive information, Product Sub-category, Model, and Product. This is quite an improvement over just having the product ID #. You can see that the query can get quite long very fast! Can you also add the ISNULL() functionality to this query?To bring in dimension information (see the picture on the next page), the database was switched to take advantage of the AW_Products_Flattened table.The query is same as before except that the table variables have 5 columns now not just 2. And the SELECT statement used to fetch the data inserting into the table variable is a bit more complex in that it has its own INNER JOIN and more complex WHERE statement.The rest of the query is the same. Notice that you only receive 25 rows if you don’t use a FULL JOIN of the array tables as shown.It is still WONDERFUL to be able to create new columns of analytics building on columns from any of the arrays.USE [Featherman_Analytics];DECLARE @p1 integerDECLARE @p2 nvarchar(50)DECLARE @RSSales TABLE ([Sub Category] nvarchar(50),[Model] nvarchar(50) , [Product] nvarchar(50), [ProductKey] int, [Reseller Units] int) DECLARE @INETSales TABLE ([Sub Category] nvarchar(50), [Model] nvarchar(50) ,[Product] nvarchar(50), [ProductKey] int,[Internet Units] int)INSERT INTO @RSSales ([Sub Category], [Model], [Product],[ProductKey], [Reseller Units])EXEC SPGetResellerUnits (@p1, @p2) INSERT INTO @INETSales ([Sub Category], [Model], [Product], [ProductKey], [Internet Units])EXEC SPGetInetUnits (@p1, @p2)SELECT rs.[Sub Category], rs.[Model], rs.[Product], rs.[ProductKey],rs.[Reseller Units] AS [Reseller Units], inet.[Internet Units] , (rs.[Reseller Units] + inet.[Internet Units]) AS [Total Units]FROM @RSSales rs, @INETSales inetWHERE rs.ProductKey = inet.ProductKeyORDER BY [Total Units] DESCPretty soon we will write queries similar to this. Do you see the difference in blue?If the data is commonly needed, the analyst or DBA can make stored procedures and use them to break up big queries into more manageable modules. This compartmentalization of logic or breaking of big queries into smaller pieces is standard practice for programmers.Here the values for the variables @p1, @p2 are passed in by the analytics or reporting program (SSRS, Excel, rtc.).You call stored queries (which are store queries) by using the EXEC spName (parameter values).We will get here, and using stored procedures will simplify query development.Exercise #1: Chiara the CEO wants to penetrate the German market. They want to figure out whether to get their product into more stores or use the Internet channel more. Chiara requested you pull the sales for all the months 2007 and 2008 (so 24 rows) for each of the cities in Germany. Start by analyzing the total sales performance for the 24 months, then go on to the monthly sales analysis. Copy the data into Excel to prepare charts for your analysis and recommendation. Add percentage change formula columns to both levels of analysis (entire country and cities). Choose one or two cities to focus sales on and prepare your reasoning for marketing strategy (greater store presence such as sponsoring competitions) or greater internet presence (which might be supported by billboards, posters, etc.)USE [AdventureWorksDW2012];DECLARE @RSSales TABLE ( [ProductKey] varchar(25), [Product] varchar(50), [Total Reseller Unit Sales] int) DECLARE @INETSales TABLE ( [ProductKey] varchar(25), [Product] varchar(50), [Total Internet Unit Sales] int)INSERT INTO @RSSales ([ProductKey],[Product], [Total Reseller Unit Sales])SELECT [ProductAlternateKey], [EnglishProductName], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactResellerSales] as rsINNER JOIN [dbo].[DimProduct] as p ON p.[ProductKey] = rs.[ProductKey]GROUP BY [ProductAlternateKey], [EnglishProductName]INSERT INTO @INETSales ([ProductKey],[Product],[Total Internet Unit Sales])SELECT [ProductAlternateKey], [EnglishProductName], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactInternetSales] as InetINNER JOIN [dbo].[DimProduct] as p ON p.[ProductKey] = Inet.[ProductKey]GROUP BY [ProductAlternateKey], [EnglishProductName]SELECT rs.[ProductKey], rs.[Product],rs.[Total Reseller Unit Sales] AS [Reseller Units], inet.[Total Internet Unit Sales] AS [Internet Units], (rs.[Total Reseller Unit Sales] + inet.[Total Internet Unit Sales]) AS [Total Units],format(100* rs.[Total Reseller Unit Sales]/ (rs.[Total Reseller Unit Sales] + inet.[Total Internet Unit Sales] * 1.0), 'N2') AS [% Reseller Channel],format(100* inet.[Total Internet Unit Sales]/ (rs.[Total Reseller Unit Sales] + inet.[Total Internet Unit Sales] * 1.0), 'N2') AS [% Inet Channel]FROM @RSSales rs FULL JOIN @INETSales inet ON rs.ProductKey = inet.ProductKeyWHERE rs.ProductKey IS NOT NULLORDER BY rs.ProductKey This query is a little different than the prior:The product ID is replaced with a better indication of the part # and the name of the product is added. So the virtual tables now have three columns.Here we still use the FULL JOIN to make sure we get all the products from either sales channel (reseller and Internet).IF you look below you see there is a problem, that some of the products are not sold on the Internet. So none of the calculated fields will calculate properly.Scroll down to see the solution.--the first part is all the sameSELECT rs.[ProductKey], rs.[Product],rs.[Total Reseller Unit Sales] AS [Reseller Units], ISNULL(inet.[Total Internet Unit Sales], 0) AS [Internet Units], (rs.[Total Reseller Unit Sales] + ISNULL(inet.[Total Internet Unit Sales], 0)) AS [Total Units], format(100* rs.[Total Reseller Unit Sales]/ (ISNULL(rs.[Total Reseller Unit Sales],0) + ISNULL(inet.[Total Internet Unit Sales],0) * 1.0), 'N2') AS [% Reseller Channel],format(100* ISNULL(inet.[Total Internet Unit Sales], 0)/ (rs.[Total Reseller Unit Sales] + ISNULL(inet.[Total Internet Unit Sales], 0) * 1.0), 'N2') AS [% Inet Channel]FROM @RSSales rs FULL JOIN @INETSales AS inet ON rs.ProductKey = inet.ProductKeyWHERE rs.ProductKey IS NOT NULLORDER BY rs.ProductKeyFixing Problems with Null VariablesNotice the consistent use of the ISNULL(, 0) function. In case any of the columns could be null then display zeros.If some of the rows have not been assigned to a product key then they can show NULLS. Look at the final WHERE statement Another problem solved with table variables - In this next query demonstration you see another example of using a FULL JOIN in the final query to join the @table variables. Making sure you retrieve all the rows. Use a FULL JOIN to ensure you retrieve all the rows from both tables, not only the rows that have a matching primary key value in both tables. Try to recreate this functionality using pivot tables.USE [Featherman_Analytics];DECLARE @RSSales TABLE ([Product Category] varchar(50), [Reseller Units] int) DECLARE @INETSales TABLE ([Product Category] varchar(50), [Internet Units] int)INSERT INTO @RSSales ([Product Category], [Reseller Units])SELECT p.[Category], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactResellerSales] as rsINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.ProductKey = rs.ProductKeyGROUP BY p.[Category]INSERT INTO @INETSales ([Product Category], [Internet Units])SELECT p.[Category], SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactInternetSales] as rsINNER JOIN [dbo].[AW_Products_Flattened] as p ON p.ProductKey = rs.ProductKeyGROUP BY p.[Category]SELECT reseller.[Product Category], reseller.[Reseller Units], inet.[Internet Units], inet.[Internet Units] + reseller.[Reseller Units] as [Totals]FROM @RSSales AS reseller FULL JOIN @INETSales AS inet ON reseller.[Product Category] = inet.[Product Category] ORDER BY reseller.[Product Category]19059874250038104381500Here’s a similar example that condenses >120,000 rows of fact table data down to three columns.Think of the process as:Declaring two table arraysRunning two or more select statements as normal (but prefacing the select statement with an INSERT INTO statement to move the results into one of the table variables. You can add different data columns that are needed into either of the table variables. You just need a common field amongst the tables (that entity of interest such as employee, store, customer, product, month, etc.).A final SELECT statement pulls all the columns together and can add new columns of derived data now that you have access to columns of data from different data sourcesAnother problem solved with table variables - In this next query demonstration the group field is month. While you can replicate the first two columns with a PIVOT query, here the third column is an aggregated column from a different table. This is the solution.USE [AdventureWorksDW2012];DECLARE @RSSales TABLE ([Month] varchar(20), [Total Reseller Unit Sales] int) DECLARE @INETSales TABLE ([Month] varchar(20), [MonthNum] int, [Total Internet Unit Sales] int)INSERT INTO @RSSales ([Month], [Total Reseller Unit Sales])SELECT DATENAME(month, rs.OrderDate), SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactResellerSales] as rsGROUP BY DATENAME(month, rs.OrderDate)INSERT INTO @INETSales ([Month], [MonthNum], [Total Internet Unit Sales])SELECT DATENAME(month, inet.OrderDate), DATEPART(month, inet.OrderDate), SUM([OrderQuantity]) AS [Total Sales]FROM [dbo].[FactInternetSales] as inetGROUP BY DATENAME(month, inet.OrderDate), DATEPART(month, inet.OrderDate)SELECT inet.[Month], FORMAT(reseller.[Total Reseller Unit Sales], 'N0') [Total Reseller Unit Sales], FORMAT (inet.[Total Internet Unit Sales], 'N0') [Total Internet Unit Sales], FORMAT (reseller.[Total Reseller Unit Sales] + inet.[Total Internet Unit Sales], 'N0') As [Total Sales]FROM @RSSales AS reseller FULL JOIN @INETSales AS inetON reseller.[Month]= inet.[Month]ORDER BY [MonthNum]These results could easily be improved by passing in a parameter for product sub-category or product category.This is a different example that pulls data by month from two different tables into one resultset. The secon table variable includes and extra field so that the results can be sorted by Month #.-271357700Note the base code is altered to remove product category information and replace it with a month column. Also notice the safer FULL JOIN is used.The factInterenetsales table has sales in all 12 months so that table was chosen to pull the month number.Can you merge the % of total syntax with this table variable syntax! I bet you can!Another problem solved with table variables – the total sales booked for each sales rep is compared to the sales quota set for that sales rep. This query then is the basis for a dashboard that measures business performance to goal. Combining the data for employee over time is one way to look at the data, another is to look at each time period and rank/rate the sales rep performance within that time period. USE [AdventureWorksDW2012];DECLARE @Quota TABLE ([EmployeeKey] int, [Name] varchar(75), [CalendarYear] int, [CalendarQuarter] int, [YearQtr] varchar, [SalesAmountQuota] decimal)DECLARE @Actual TABLE ([EmployeeKey] int, [CalendarYear] int, [CalendarQuarter] int,[YearQtr] varchar, [SalesAmountActual] decimal)INSERT INTO @Quota ([EmployeeKey], [Name], [CalendarYear], [CalendarQuarter], [SalesAmountQuota] )SELECT q.[EmployeeKey], CONCAT([FirstName], ' ', [LastName]) as Name, [CalendarYear], [CalendarQuarter] , [SalesAmountQuota]FROM [dbo].[FactSalesQuota] as qINNER JOIN [dbo].[DimEmployee] as e ON e.[EmployeeKey] = q.[EmployeeKey]INSERT INTO @Actual ([EmployeeKey], [CalendarYear], [CalendarQuarter], [SalesAmountActual])SELECT [EmployeeKey], DATEPART(year, [OrderDate]), DATEPART(quarter, [OrderDate]) , SUM([SalesAmount]) FROM [dbo].[FactResellerSales]GROUP BY [EmployeeKey], DATEPART(year, [OrderDate]), DATEPART(quarter, [OrderDate]) SELECT q.[EmployeeKey] AS [EmpID], q.Name, q.[CalendarYear] as [Year], q.[CalendarQuarter] as [Qtr], CONCAT( a.[CalendarYear], ' Q', a.[CalendarQuarter]) AS [Time Period], a.SalesAmountActual as [Recorded Sales] ,q.[SalesAmountQuota] as [Sales Quota],FORMAT((a.SalesAmountActual / q.[SalesAmountQuota]), 'N2') as SalesKPI, (a.SalesAmountActual - q.[SalesAmountQuota]) as DeltaFROM @Quota q JOIN @Actual a ON CONCAT(q.[CalendarYear], ' Q', q.[CalendarQuarter]) = CONCAT( a.[CalendarYear], ' Q', a.[CalendarQuarter])AND q.EmployeeKey = a.EmployeeKeyORDER BY a.[EmployeeKey], a.[CalendarYear], a.[CalendarQuarter]31758128000The sales quotas are in one table and do not have to be aggregated., the actual sales are in another table (and have to be aggregated) and the employee names are in a third table.This is a tricky problem and you can see that the code gets long. The first sections are standard and handle the field retrieval, the row calculations, the data transformation within each dataset, and the GROUP By aggregations and other aggregations (i.e. PIVOT())The final SELECT statement is where if you have all the columns you need, then you can perform data comparisons, etc. That the columns of data came from very different tables is irrelevant, the data can be combined and analyzed.Here an inner join is used. Be careful and look at your data to see if you need to use a left join or full join to ensure you retrieve all the records needed.Here is an extended example that researches the products and product sub categories that are sold on the different sales channels in the hopes to identify what additional product sub-categories and products can be sold at profit on the Internet channel. While many even difficult requests for information can be solved with GROUP BY, some can only be solved with sub queries and table variables. Don’t worry about detailed learning of the table variables until later.USE [AdventureWorksDW2012];SELECT (COUNT(DISTINCT([ProductKey]))) AS [# SKU's Reseller Channel]FROM [dbo].[FactResellerSales]SELECT (COUNT(DISTINCT([ProductKey]))) AS [# SKU's Intnernet Channel]FROM [dbo].[FactInternetSales]Here we can run two different queries and see two different results in two different query windows. The bike company we have been commissioned to examine sells a lot more product SKU’s on the reseller channel than via the Internet. Management wants us to recommend 25 to 100 products that should be added to the Internet channel. The expense for creating additional web pages in the web catalog is considerable as the web design team is not fluent with dynamic webpages (i.e, they will have to create one new web page for each product added to the website, rather than use templates that fill in data and pictures from a database)USE [AdventureWorksDW2012];DECLARE @ChannelWidth TABLE([Reseller Channel #SKU's] int, [Internet Channel #SKU's] int)INSERT INTO @ChannelWidth ( [Reseller Channel #SKU's], [Internet Channel #SKU's] )SELECT (SELECT(COUNT(DISTINCT([ProductKey]))) FROM [dbo].[FactResellerSales]), (SELECT(COUNT(DISTINCT([ProductKey]))) FROM [dbo].[FactInternetSales])SELECT * FROM @ChannelWidthHere the results are appended to one table using a table variable which is a virtual table (i.e., array). Much more on this later but first a table variable (a piece of system memory) is created then data is selected and inserted into it.Notice the two sub-queries are wrapped in their own parentheses, each is generating the value for inserting into the column, both sub-queries together make up the one row.Each sub-query counts up the # product ID’s that have been sold on each sales channel. USE [AdventureWorksDW2012];DECLARE @Year1 int = 2006DECLARE @Year2 int = 2007DECLARE @ChannelWidth TABLE([Reseller Channel #SKU's] int , [Reseller Channel Revenue] nvarchar(30), [Internet Channel #SKU's] int, [Internet Channel Revenue] nvarchar(30))INSERT INTO @ChannelWidth ([Reseller Channel #SKU's], [Reseller Channel Revenue], [Internet Channel #SKU's], [Internet Channel Revenue])SELECT (SELECT (COUNT(DISTINCT(rs.[ProductKey]))) FROM [dbo].[FactResellerSales] as rs WHERE YEAR(rs.[OrderDate]) = @Year1), (SELECT FORMAT((SUM(rs.[SalesAmount])), 'N0') FROM [dbo].[FactResellerSales] as rs WHERE YEAR(rs.[OrderDate]) = @Year1 GROUP BY YEAR(rs.[OrderDate]) ), (SELECT (COUNT(DISTINCT(fis.[ProductKey]))) FROM [dbo].[FactInternetSales] as fis WHERE YEAR(fis.[OrderDate]) = @Year1), (SELECT FORMAT((SUM(fis.[SalesAmount])), 'N0') FROM [dbo].[FactInternetSales] as fis WHERE YEAR(fis.[OrderDate]) = @Year1 GROUP BY YEAR(fis.[OrderDate]) )--here we insert the second rowINSERT INTO @ChannelWidth ([Reseller Channel #SKU's], [Reseller Channel Revenue], [Internet Channel #SKU's], [Internet Channel Revenue])SELECT (SELECT (COUNT(DISTINCT(rs.[ProductKey]))) FROM [dbo].[FactResellerSales] as rs WHERE YEAR(rs.[OrderDate]) = @Year2), (SELECT FORMAT((SUM(rs.[SalesAmount])), 'N0') FROM [dbo].[FactResellerSales] as rs WHERE YEAR(rs.[OrderDate]) = @Year2 GROUP BY YEAR(rs.[OrderDate]) ), (SELECT (COUNT(DISTINCT(fis.[ProductKey]))) FROM [dbo].[FactInternetSales] as fis WHERE YEAR(fis.[OrderDate]) = @Year2), (SELECT FORMAT((SUM(fis.[SalesAmount])), 'N0') FROM [dbo].[FactInternetSales] as fis WHERE YEAR(fis.[OrderDate]) = @Year2 GROUP BY YEAR(fis.[OrderDate]) )--now that the rows of data are inserted, display the dataSELECT * FROM @ChannelWidthThis query can be turned into a stored procedure that can accept 2 years as criteria, to facilitate a comparison of channel width and channel revenue.It is a recurring problem to decide how amny products should be in retail stores (physical or virtual).This query is the same as prior except that it has more columns and two rows.If you write this type of query you have to remember that each successive sub-query is wrapped in its own parentheses and separated by commas.SELECT SUM() always need a GROUP BY ColumnReferencesOver clause - functions - 3 - Blog - of total - of total with group - ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download