Washington State University



Featherman's T-SQL Adventures - Arrays (aka SQL Table Variables)

Arrays are used in data management, data manipulation, data compiling, and report generation processes. Arrays are created in the RAM of the database server and deployed temporarily to perform these data management tasks. Transaction data from different sources stored on disk can be copied into one or more RAM-based arrays so that the data can be comingled and then analyzed. It is typical to have to bring together many different data sources to analyze business performance. Marketers, for example, often purchase datasets such as DMV data that can be used as an indicant of the affluence of a zip code. Buying industry data is also helpful for comparing a firm's performance against the industry. The analyst would use arrays to merge in the purchased data (this is where understanding granularity is key). Once the data is merged, analytics can be run, dashboards updated, and the newly configured data can be written back to disk for long-term storage. The process of data retrieval and manipulation can be automated with a SQL stored procedure, for example so that a dashboard can be updated every hour. The major benefit of using arrays is that they can take in data from many different sources, merge it into new formations, and then save it back to disk. If you connect reports to this process, then the reporting infrastructure can also be automated.

To perform analytics and generate insight into business problems, the analyst often needs to pull together many different datasets. The insight is generated when new columns of metrics can be built. An example is to merge in weather or credit-level data to investigate seasonal changes to automobile sales in different regions. While analysts and DBAs can pull together data into arrays for a special research project, DBAs also spend considerable time automating these processes to perform regular ETL (Extract, Transform, Load) processes to support business analysis.

Consider the scenario of a graduating student who needs to choose among three job offers in three different cities. They need to base their career decision on multiple criteria (e.g., weather data, crime data, housing price data, entertainment opportunity ratings, etc.). The typical thoughtful student would pull all the data for each decision criterion into a spreadsheet to facilitate comparison. An array is similarly a spreadsheet in the memory of the computer server: you can manipulate the data in each cell (intersection of row and column), and you can then aggregate the data in new ways. The concept in this scenario is the same; once you pull the data together, you can compare it and make new calculations to facilitate decision-making. You could pull data from different data sources into columns of your array. Often by pulling data together you can see the patterns and have an epiphany.

When using an array in a database (SQL calls them table variables) you run SQL SELECT queries to retrieve data from different databases, comingle this data, then create new columns of derived analytics. This is the alchemy of business analysts; this is the magical insight the best analysts are famous for. Successful business analysts create new insights and understanding when they create new columns of analytics. In the spreadsheet or array you would create one row per city (being considered for employment). The columns are built from the imported data.
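To make the scenario concrete, here is a minimal sketch of that decision spreadsheet built as a T-SQL table variable. Every city name, rating, and weight below is hypothetical, invented purely for illustration.

DECLARE @CityComparison TABLE (
    [City] nvarchar(50),
    [Weather Rating] decimal(5,2),          -- hypothetical 1-10 ratings
    [Crime Rating] decimal(5,2),
    [Housing Affordability] decimal(5,2),
    [Entertainment Rating] decimal(5,2),
    [Weighted Score] decimal(5,2))

-- one row per city being considered for employment
INSERT INTO @CityComparison ([City], [Weather Rating], [Crime Rating], [Housing Affordability], [Entertainment Rating])
VALUES ('City A', 6, 7, 4, 9), ('City B', 8, 6, 6, 8), ('City C', 7, 5, 7, 8)

-- the analyst's added value: a new derived column (a weighted average; the weights are arbitrary)
UPDATE @CityComparison
SET [Weighted Score] = 0.20 * [Weather Rating] + 0.20 * [Crime Rating]
                     + 0.40 * [Housing Affordability] + 0.20 * [Entertainment Rating]

SELECT * FROM @CityComparison ORDER BY [Weighted Score] DESC

The UPDATE SET pattern used here to fill a derived column reappears in the later examples of this module.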
Once you have all your important decision criteria implemented as columns of the array, the value the business analyst adds is the new columns of metrics you can build. In the student scenario, for example, you would create a new column such as the weighted average sketched above so you can compare your options numerically. The point is worth repeating: you don't get promoted because you can do SQL, manage data, and automate processes; these are expected. The analyst or DBA gets promoted when they provide insight and analysis to top management that makes problem-solving easier. It's a merging of business understanding, intuition, technical skills, and persistence.

But we are getting ahead of ourselves. Back to the nuts and bolts of SQL arrays. SQL statements build datasets, and array tables free up the developer to merge data from different data sources. Because you can add columns of data together, they can be at different levels of granularity. This is another innovation that has not yet been presented in our class.

Table variables (arrays) have been used by SQL programmers for decades to transfer data from storage to working memory to perform data management, data integration, and data aggregation. Shuttling data from disk storage to RAM for processing is standard operating procedure for data centers, because RAM has always worked a lot faster than secondary storage (RAM access is measured in nanoseconds compared to milliseconds for disk read/write). Arrays are used in many forms of coding to store values, and as a place where you can manipulate and comingle data and build a structure of values. For our needs, array tables are used to hold data and are useful to pull data together when other SQL query methodologies do not work.

A recurring reporting problem is that data is needed at different levels of granularity; for example, it is common to add a final column with totals or a % of total metric. In SSRS, Power BI, or Tableau reports you can add column and row totals easily. If your dataset from a SQL query needs a mixture of detailed and summary data, however (as a function of ETL processing), then you can use one or more arrays to mingle data that was calculated at different levels of summarization. Often several layers of aggregation and data shaping are needed in ETL processes.

Array tables excel at allowing you to pull data together, co-mingling data from different sources into a compiled dataset that is ready for analysis. Bringing disparate data into one organized dataset allows dashboards and reports to be easily generated and updated. A prerequisite is that you need a dimension attribute (column) in common between the different source data tables. The field in common will usually be a primary key such as EmployeeID, ProductID, or StoreID, but it can also be a textual city, state, or country (for example when compiling econometrics data); a small sketch of such a join follows below. When you can't figure out a way to pull data together from different sources (i.e., the joins don't work, and neither does a sub-query), then pull the data into one or more array tables. If you need more than one array, you can join the array tables inside your query. When using array tables, in essence you are just pulling data from different sources into one table to compact the data for storage and further analysis.
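As a small illustration of the "field in common" idea, here is a hedged sketch that joins two table variables on a textual [State] column, the sort of thing you might do when compiling econometrics data. The table names, column names, and sample values are hypothetical; in practice each array would be loaded with INSERT INTO ... SELECT from real source tables.

DECLARE @StateSales TABLE ([State] nvarchar(50), [Sales] decimal(12,2))
DECLARE @StateWeather TABLE ([State] nvarchar(50), [Avg Temp] decimal(5,2))

-- placeholder rows; real code would pull these from the source tables
INSERT INTO @StateSales VALUES ('Washington', 125000), ('Oregon', 98000)
INSERT INTO @StateWeather VALUES ('Washington', 52.3), ('Oregon', 54.1)

-- the textual [State] column is the dimension attribute in common, so the two arrays can be joined
SELECT s.[State], s.[Sales], w.[Avg Temp]
FROM @StateSales AS s
INNER JOIN @StateWeather AS w ON s.[State] = w.[State]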
For example, array tables can hold data that is culled from different sources, and this array table (aka resultset) can be written to a new permanent SQL database table and used as the datasource for reports. Often the compiled data is the datasource for a dashboard; other times the array table is the in-memory storage table that holds compiled data in the middle of an Extract, Transform, and Load process. The newly compiled array table can then be merged with other array data in an iterative ETL process of data culling, formatting, and combining. The data organization can be automated by running stored procedures, and the custom-built array data can be saved to SQL database tables for permanent storage. Array tables, then, are your canvas, and you, the emerging DBA, can paint your masterpiece dataset onto them, column by column.

Arrays are either wide (many columns), tall (many rows), or both. You specify the columns, their names, and data types. Next you run a SELECT statement that specifies the data for the columns and inserts rows of data into the array. To improve data management and simplify organization of data assets, a DBA may want to build fewer, more complex resultsets, each one serving as the datasource for many reporting needs (rather than build lots of queries, build a few that are multi-purpose!!). The array table then is usually the intermediate step needed to pull together data into a complex dataset (then save the dataset into a new database table).

Let's get started!

Examples:

USE [Featherman_Analytics];

DECLARE @SubCategory nvarchar(30) = 'Mountain Bikes'

SELECT [Model], SUM([OrderQuantity]) as [Total]
FROM [dbo].[FactResellerSales] as s
INNER JOIN [dbo].[AW_Products_Flattened] as p ON p.ProductKey = s.ProductKey
WHERE [Sub Category] = @SubCategory
GROUP BY [Model]

When this query is turned into a stored procedure, the value for the @SubCategory variable will not be hard-coded. Rather, that value will be passed in when the query is run. For example, if the query is saved into a stored procedure called spUnitsForSubCategory, then you would run the query and retrieve the data by typing

EXEC spUnitsForSubCategory 'Mountain Bikes'

Before we jump into array tables, which are a grid of variables where each column can be a different datatype, let's take another look at declaring and using a single variable in SQL. What is a variable? A variable is a named piece of system memory (in RAM) that is created by, accessible to, and manipulated by SQL programs and the users of those programs. DECLARE @variablename datatype is the way to create a variable. On that line you can set the value of the variable immediately (even with a sub-query) or use a SET command later on. The program shown above allows one textual sub-category value. Here we show the query in test mode (by giving a value to the variable). Future usage of the query will receive values (from a dropdown list in a webpage, Excel, SSRS, or Tableau project) passed in and used in the WHERE statement to filter the results retrieved from the database.

Next in this document we focus on a special type of variable called a table variable, which is special in that it can have many columns and rows, not just one single parcel of info. But let's first look at creating a simple nvarchar variable that can accept a word or phrase. Most variables that programmers use store one value: a number, a date, or a word or set of words. In T-SQL, use the DECLARE phrase to create a variable in the memory (RAM) of the development machine.
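Returning to the parameterized query at the top of this example: a minimal sketch of saving it as the stored procedure spUnitsForSubCategory mentioned above might look like the following. The body simply mirrors the query shown earlier; only the wrapper is new.

-- run in its own batch
CREATE PROCEDURE spUnitsForSubCategory
    @SubCategory nvarchar(30)       -- value supplied by the caller at run time
AS
BEGIN
    SELECT [Model], SUM([OrderQuantity]) AS [Total]
    FROM [dbo].[FactResellerSales] AS s
    INNER JOIN [dbo].[AW_Products_Flattened] AS p ON p.[ProductKey] = s.[ProductKey]
    WHERE [Sub Category] = @SubCategory
    GROUP BY [Model]
END
GO

-- the caller (Excel, SSRS, Tableau, or a webpage) passes the value in:
EXEC spUnitsForSubCategory 'Mountain Bikes'

This is the same parameterized-query idea described above: the test-mode DECLARE goes away, and the value arrives through the procedure's parameter instead.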
The DECLARE line shown in the query above creates a variable of datatype nvarchar that can accept up to 30 characters (letters or numbers). The idea is that this query would be 'called' or 'invoked' from Excel or a reporting package, which passes in the value for the variable. This is called a parameterized query, and it can provide different results depending on the value passed into the query, similar to a slicer. We can see that the variable is used in the WHERE clause to filter the results.

USE [Featherman_Analytics];

SELECT * INTO Featherman_Analytics.dbo.ManagersList
FROM [AdventureWorksDW2012].[dbo].[DimEmployee]
WHERE [Title] LIKE '%manager%'

We will use this technique to move some data into your database account. Log into YOUR database, then USE Featherman's database. Your INTO statement should name YOUR database. You are making a new table, so give it a nice name.

In this module we will also use the concept of copying data from one SQL Server database table into another, or from a database table in a cloud DB to a data table in memory. Data is copied from a table or set of tables INTO an array and manipulated, creating new columns of analytics. We will use the INSERT INTO arrayname SELECT * FROM tablename command.

First we demonstrate code that copies data from one database table INTO your database. This code creates a new table in the USE database, so you must be the owner of that database. The code creates the ManagersList table in the Featherman_Analytics database and inserts 20 rows of data INTO it. The data copied in comes FROM the DimEmployee table of AdventureWorksDW2012. This code will be used in a future module to copy massive amounts of data into YOUR database so it can be manipulated using SQL stored procedures. The code is shown here in case you want to move data into your database.

Array Examples: It is common to give regional managers their own data tables on the corporate server to enable simple uploads of business performance data. This approach can speed up data management and reporting at the regional level. Notice in the first image that there are six tables of data, one for each country where sales are made (the Rollup table signifies you are combining the data - the opposite of drill-down is roll-up). A common requirement is to pull together the tables of data for some further analysis. How would you do that? Copy rows into Excel? No need (and that process is not automated, so every time you want to refresh the report, ugh, back to Excel copy/paste purgatory). Behold the UNION query, which is used to add more rows of data to an array when the schema is the same across all the tables.

What we want is the table on the right, where you can see the data merged from the different tables (Australia, Canada, France shown). We can use the INSERT INTO, SELECT * FROM, and UNION SELECT commands to pull rows of data from different database tables into an array, then write that array to a new table in the database. In the following example all the tables are in the same database (at corporate HQ), but the tables can also be in different databases. This SQL code should give you some ideas: you can pull different data together, transform it, add metrics, then save the data into a database table, finishing by connecting a report to that table.
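One side note before the example: the queries below use UNION, which also removes duplicate rows across the combined inputs. When the regional tables cannot overlap (each one carries its own country), UNION ALL does the same appending without the duplicate-removal step. A minimal sketch, reusing the same regional table names that appear in the next example:

DECLARE @AllSales TABLE ([Country] varchar(50), [Product] varchar(50), [Model] varchar(50), [Totals] decimal(10,2))

INSERT INTO @AllSales ([Country], [Product], [Model], [Totals])
SELECT * FROM [featherman].[BikeSales_Australia]
UNION ALL SELECT * FROM [featherman].[BikeSales_Canada]
UNION ALL SELECT * FROM [featherman].[BikeSales_UK]
-- ...and so on for the remaining regional tables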
You can of course automate the entire process.

The first example uses a decimal variable and a table variable.

USE [Featherman_Analytics];

DECLARE @Sales TABLE ([Country] varchar(50), [Product] varchar(50), [Model] varchar(50), [Totals] decimal(10,2))

INSERT INTO @Sales ([Country], [Product], [Model], [Totals])
SELECT * FROM [featherman].[BikeSales_Australia]
UNION SELECT * FROM [featherman].[BikeSales_Canada]
UNION SELECT * FROM [featherman].[BikeSales_France]
UNION SELECT * FROM [featherman].[BikeSales_Germany]
UNION SELECT * FROM [featherman].[BikeSales_UK]

DECLARE @TotalSales decimal = (SELECT SUM([Totals]) FROM @Sales)

SELECT [Country], [Model], SUM([Totals]) as [Totals], FORMAT(SUM([Totals])/@TotalSales, 'P2') as [% Total]
FROM @Sales
GROUP BY [Country], [Model]
ORDER BY [Country], [Model]

/* If this was your database then you could save the compiled data from the array into a table on your DB as follows. It is common to compile data in arrays then save the compiled dataset into a DB table to facilitate reporting.

SELECT * INTO [Featherman_Analytics].[dbo].[BikeSales_rollUP]
FROM @Sales */

The result of this epic query is shown on the previous page.

Example #1
1) First you create a table array; here we have 4 columns of data (Country, Product, Model, Totals).
2) Next we insert into that array all the data from BikeSales_Australia. The syntax is INSERT INTO @ArrayTable (columns) SELECT * FROM tablename.
3) Next you can add more records (rows) with the UNION SELECT command. Each additional UNION SELECT command works with the INSERT INTO statement.
4) Next we use a sub-query to calculate a value for a % of total calculation. This is a sub-query of the array table that computes a grand total.
5) Now that we have all the data in the array and have totaled one column, we use a SELECT statement to add a new % of total metric column. We also do not need to select all the columns to include them in the dataset (the Product column was omitted to change the granularity from product to model). The result is a reduction in the number of rows…if that meets the report requirements. When the data is in an array we can add calculated columns, perform GROUP BY analysis, and shape the data into the format we need depending on the granularity of reporting needs.
6) Finally, when the data is shaped and formatted, it can be saved back into the database into an existing or new table (which can serve as the datasource of a report or dashboard).
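Step 6 mentions saving the shaped data into an existing table. Here is a minimal sketch of that path, assuming the permanent table [Featherman_Analytics].[dbo].[BikeSales_rollUP] from the commented-out code already exists with the same four columns as @Sales. It must run in the same batch that declared @Sales, since table variables only live for the duration of the batch.

-- clear out the previously saved rows (TRUNCATE TABLE is an alternative when permissions allow)
DELETE FROM [Featherman_Analytics].[dbo].[BikeSales_rollUP]

-- append the freshly compiled rows from the array
INSERT INTO [Featherman_Analytics].[dbo].[BikeSales_rollUP]
SELECT * FROM @Sales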
When saving data into an existing table, we would need to clear out the old data first (as sketched above).

USE [Featherman_Analytics];

DECLARE @Sales TABLE ([Country] varchar(50), [Product] varchar(50), [Model] varchar(50), [Totals] decimal(10,2), [%Total] decimal(10,2))

DECLARE @TotalSalesA decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_Australia])
DECLARE @TotalSalesC decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_Canada])
DECLARE @TotalSalesF decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_France])
DECLARE @TotalSalesG decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_Germany])
DECLARE @TotalSalesUK decimal = (SELECT SUM([Product Totals]) FROM [featherman].[BikeSales_UK])

INSERT INTO @Sales ([Country], [Product], [Model], [Totals], [%Total])
SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals], [Product Totals]/@TotalSalesA
FROM [featherman].[BikeSales_Australia]
UNION SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals], [Product Totals]/@TotalSalesC FROM [featherman].[BikeSales_Canada]
UNION SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals], [Product Totals]/@TotalSalesF FROM [featherman].[BikeSales_France]
UNION SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals], [Product Totals]/@TotalSalesG FROM [featherman].[BikeSales_Germany]
UNION SELECT [Country], [ProductAlternateKey], [ModelName], [Product Totals], [Product Totals]/@TotalSalesUK FROM [featherman].[BikeSales_UK]

SELECT [Country], [Model], SUM([Totals]) as [Totals], SUM([%Total]) as [% Total]
FROM @Sales
GROUP BY [Country], [Model]
ORDER BY [Country], [Model]

Example #2
What if the manager wants to drill down to see % of totals by model within each country? How do we provide that data grouping? First we add a new column to the array that will hold the new metric. Next we pull the totals from the 5 tables into 5 variables. Next we run our INSERT INTO SELECT FROM UNION SELECT statements. Each SELECT statement calculates the % of total for that country, for each product in the dataset, and populates the values for the new column [%Total]. We finish with our GROUP BY to change the granularity from product to model (by removing Product from the GROUP BY), then SUM the Totals and % Totals by model. There is an easier way: if the tables were numbered (Country1, etc.) there might be a way to use a loop….

This next example again shows that you can use sub-queries to pull data from different datasources into the same array table. This is a simple example that could be solved with two variables, but extrapolating to a larger scenario, the concept can be very helpful to compile data for many values of a dimension.
Here there is no primary key to connect the tables, but the data can still be pulled into adjacent columns of an array, which allows further analysis.

USE [Featherman_Analytics];

DECLARE @ChannelWidth TABLE ([#SubCategories in Stores] decimal, [#SubCategories on Web] decimal)

--insert the data retrieved below into the specified columns of the array
INSERT INTO @ChannelWidth ([#SubCategories in Stores], [#SubCategories on Web])
SELECT
--this is the sub-query for the first column
(SELECT COUNT(DISTINCT p.[Sub Category]) FROM [dbo].[FactResellerSales] as rs
INNER JOIN [dbo].[AW_Products_Flattened] as p ON p.[ProductKey] = rs.[ProductKey])
--this is the sub-query for the second column
, (SELECT COUNT(DISTINCT p.[Sub Category]) FROM [dbo].[FactInternetSales] as fis
INNER JOIN [dbo].[AW_Products_Flattened] as p ON p.[ProductKey] = fis.[ProductKey])

--now we retrieve the data from memory and display it
SELECT * FROM @ChannelWidth

Example #3: The brand manager wants to analyze and adjust how many product sub-categories are sold on each channel. Unless you write SQL often, this investigation might be time consuming. Sub-queries and arrays again come to the rescue. Again you define an array and INSERT values INTO the virtual table using sub-queries; two different SELECT sub-queries supply the two values. Notice there is no primary key field in common between these two tables. We just want to pull values from the tables and display them in a table. It's just a baby step, but an important one.

You can put any values you want into the columns of an array; they can be totally different. The array can hold different types of unrelated data. As we shall see, usually you want to pull data columns about some attribute of interest, so your rows are used to organize the data. For example, if your first column is US state, you could pull different data into columns for each row (each state).

Step 1 - declare the array tables and their columns
Step 2 - select data using normal queries and load the data INTO the arrays
Step 3 - select columns from the two array tables and add metrics
Step 4 - join the arrays together on a common field

DECLARE @ArrayTable1 TABLE (column1 int, column2 decimal)
DECLARE @ArrayTable2 TABLE (column1 int, column2 decimal)

INSERT INTO @ArrayTable1
SELECT Column1, SUM(Column2)
FROM SomeTable
GROUP BY Column1

INSERT INTO @ArrayTable2
SELECT Column1, COUNT(Column2)
FROM SomeTable
GROUP BY Column1

SELECT ...columns and new metrics built from either array (T1 or T2)...
FROM @ArrayTable1 AS T1, @ArrayTable2 AS T2
WHERE T1.column1 = T2.column1

--You can also replace the prior two lines that connect the tables on a common field with the following FULL JOIN
FROM @ArrayTable1 AS T1
FULL JOIN @ArrayTable2 AS T2 ON T1.column1 = T2.column1

Example #4: Here is another way to use array tables: use sub-queries to load two array tables with data from different data sources, then produce analytics. This query shows conceptually one way to create and combine two tables of retrieved data when you have a common field of interest, the primary key (preferably an integer). First you create the arrays, then SELECT values and INSERT the retrieved data INTO the arrays.
Finally you use a final SELECT statement to pull data from the two table variables together and to create new columns of derived data, so that the resultset has all the calculated or aggregate fields. The WHERE statement is an inner join, in that values are displayed ONLY if there is a value in both tables (both sides of the join). The blue code shows the FULL JOIN, which can show the data whether or not there is a match in the other table.

The major benefit is that when you pull data from different data sources into arrays you can join them and make new columns of required metrics; the data can be at different levels of granularity. These arrays are staging tables in RAM (common when creating ETL procedures). The code above can be used WHEN YOU KNOW FOR SURE that there is an attribute field in common and there are values in that field in both tables (such as month, or city).

Reports often need to give perspective by displaying data at different levels of granularity to examine the scope of a problem (city, state, and country columns can all coexist in adjacent columns). You can, for example, show that a department is losing money for one store in one city, but the same department is not problematic at the state level. Managers need to know how big problems are.

The next is a little more powerful query. Often when you do a join of tables you do so to add more columns of metrics or to add rows to make more complete lists. This query tackles a different problem: adding aggregate columns from a second table. One way to do this is to create two virtual tables with the results you want, then join the tables to display the different aggregations. So here we create two virtual tables, select data into them, then finally pull the desired compiled data from the two virtual tables. The following query is MUCH EASIER by utilizing array tables.

Step 1 - create the arrays and columns.
FYI, if you want to use nice 'P2' % formatting you need to specify an nvarchar field.
Step 2 - load the data into the arrays
Step 3 - now that the two arrays are loaded with data and joined, we can select the columns that we want from the two arrays and create some new columns of metrics
Step 4 - join the two arrays

USE [AdventureWorksDW2014];

DECLARE @Year int = 2013

DECLARE @RSSales TABLE ([ProductKey] int, [Reseller Units] decimal)
DECLARE @INETSales TABLE ([ProductKey] int, [Internet Units] decimal)
DECLARE @Combined TABLE ([ProductKey] int, [Reseller Units] decimal, [Internet Units] decimal, [Total Units] decimal, [% Reseller Channel] nvarchar(8), [% Internet Channel] nvarchar(8))

INSERT INTO @RSSales ([ProductKey], [Reseller Units])
SELECT [ProductKey], SUM([OrderQuantity])
FROM [dbo].[FactResellerSales]
WHERE DATEPART(year, [OrderDate]) = @Year
GROUP BY [ProductKey]

INSERT INTO @INETSales ([ProductKey], [Internet Units])
SELECT [ProductKey], SUM([OrderQuantity])
FROM [dbo].[FactInternetSales]
WHERE DATEPART(year, [OrderDate]) = @Year
GROUP BY [ProductKey]

INSERT INTO @Combined ([ProductKey], [Reseller Units], [Internet Units], [Total Units], [% Reseller Channel], [% Internet Channel])
SELECT rs.[ProductKey], rs.[Reseller Units], inet.[Internet Units],
(rs.[Reseller Units] + inet.[Internet Units]) AS [Total Units],
FORMAT(rs.[Reseller Units] / (rs.[Reseller Units] + inet.[Internet Units]), 'P2') AS [% Reseller Channel],
FORMAT(inet.[Internet Units] / (rs.[Reseller Units] + inet.[Internet Units]), 'P2') AS [% Internet Channel]
FROM @RSSales rs, @INETSales inet
WHERE rs.ProductKey = inet.ProductKey
ORDER BY rs.ProductKey

SELECT * FROM @Combined

Example #5: Here you can see that the same product key is used in the database tables for both sales channels. The product key is an internal reference number used as the primary key (rather than the [ProductAlternateKey] field). First declare a variable to filter the query by year, then two virtual tables defining the field names and data types for each virtual table. It's best to keep the column names short! We will load these columns with data, so the data type corresponds to the data copied in.

In step 2 we select two columns of data from the FactResellerSales table and insert the data into the first virtual table. There are 266 rows of data copied into the first virtual table; each row has two columns. Next, two columns of data are selected from the FactInternetSales table and inserted into the second virtual table. There are 133 rows of data copied into that virtual table.

Step 3 - Next we select 3 columns of data from the two virtual tables (ProductKey, Reseller Units, Internet Units) and add 3 columns of derived data. Note that an alias for the virtual tables is needed (code in yellow) so that you can refer to the columns directly.

We need to turn this into a stored procedure and dynamically pass in the year.

Clearly the strength of using array tables is the ease of creating calculated fields from the available columns in the arrays. If each new metric column required an inner join, the query would be much harder to write.

Programmers challenge! The data could also be sliced by product sub-category in Excel - if you add the columns.
Can you do it?

USE [AdventureWorksDW2012];

DECLARE @RSSales TABLE ([ProductKey] int, [Reseller Units] decimal(10,2))
DECLARE @INETSales TABLE ([ProductKey] int, [Internet Units] decimal(10,2))

INSERT INTO @RSSales ([ProductKey], [Reseller Units])
SELECT [ProductKey], ISNULL(SUM([OrderQuantity]), 0)
FROM [dbo].[FactResellerSales]
WHERE DATEPART(year, [OrderDate]) = 2007
GROUP BY [ProductKey]

INSERT INTO @INETSales ([ProductKey], [Internet Units])
SELECT [ProductKey], ISNULL(SUM([OrderQuantity]), 0)
FROM [dbo].[FactInternetSales]
WHERE DATEPART(year, [OrderDate]) = 2007
GROUP BY [ProductKey]

SELECT rs.[ProductKey], rs.[Reseller Units] AS [Reseller Units],
ISNULL(inet.[Internet Units], 0) as [Internet Units],
rs.[Reseller Units] + ISNULL(inet.[Internet Units], 0) AS [Total Units],
FORMAT(rs.[Reseller Units]/(rs.[Reseller Units] + ISNULL(inet.[Internet Units], 0)), 'p1') AS [% Reseller Channel],
FORMAT(ISNULL(inet.[Internet Units], 0)/(rs.[Reseller Units] + ISNULL(inet.[Internet Units], 0)), 'p1') AS [% Internet Channel]
FROM @RSSales AS rs
FULL JOIN @INETSales AS inet ON rs.ProductKey = inet.ProductKey
WHERE rs.ProductKey IS NOT NULL
ORDER BY rs.ProductKey

If you can bring values from different tables into one array, great; if you can't, then use the approach shown here and join them.

Example #6: We should be passing in the year as a variable. The prior example used a WHERE clause INNER JOIN, which requires that you have the same key field in both tables. As with any INNER JOIN, you will receive a subset of the total records if there are records in one table without a corresponding set of values in the other table. It is usually more insightful, however, to see attributes with zero values. This example uses a FULL JOIN to get all the product sale data from either sales channel (whereas the prior query gets only the products that are sold on both channels). Try running the query with and without the final WHERE filter. As you see in the resultset below, we probably need to use the ISNULL() function to replace the NULLs with 0's.

USE [AdventureWorksDW2012];

-- Set up your array
DECLARE @Sales TABLE ([Product] int, [Reseller Units] decimal, [Inet Units] decimal)

-- Declare which columns will be added to
INSERT INTO @Sales ([Product], [Reseller Units])
-- load the first two columns
SELECT [ProductKey], SUM([OrderQuantity])
FROM [dbo].[FactResellerSales]
GROUP BY [ProductKey]

-- Choose a column to change the values - here values are changed from NULL to a total.
UPDATE @Sales SET [Inet Units] = (SELECT SUM([OrderQuantity]) FROM [dbo].[FactInternetSales]
WHERE [ProductKey] = [Product])
-- Notice the WHERE statement is a JOIN from the database table to the array

-- Now choose what columns to display
SELECT [Product], [Reseller Units], ISNULL([Inet Units], 0) as [INET Units]
FROM @Sales
ORDER BY [Product]

Example #7: This is an innovative use of a sub-query that leverages an UPDATE TableName SET [Column name] = value command. We fill a column of data. While the values here go from NULL to a number, this technique could also be used to update existing columns in arrays.
So there are many ways to pull in or update data: a) a sub-query, b) creating values in the INSERT INTO statement, c) an UPDATE SET command. What we need is to update every row in the table, adding the Internet sales for every product.

In other examples we used two array tables and pulled values from both of them using
a) WHERE rs.ProductKey = inet.ProductKey
or
b) FULL JOIN @INETSales AS inet ON rs.ProductKey = inet.ProductKey

But look again: here we use only one array table. We INSERT 2 columns of data into the array table, then use an UPDATE SET to change the NULL value in each row of the 3rd column to a number. The amazing part is that the WHERE statement connects the FactInternetSales column to a column in the array table. This opens up a lot of possibilities. We still need to SELECT the values from the array table to show them, which gives us a chance to use our ISNULL command (thanks again Andrew Pirie and Leo for the ISNULL code). This code is much simpler than using two tables and a FULL JOIN.

USE [Featherman_Analytics];

DECLARE @Sales TABLE ([Sub Category] nvarchar(50), [Model] nvarchar(50), [ProductID] int, [Product] nvarchar(50), [Reseller Profit] decimal(10,2), [Internet Profit] decimal(10,2), [Total Profit] decimal(10,2), [Reseller %] decimal(10,2), [Internet %] decimal(10,2), [Channel Preference] nvarchar(50))

INSERT INTO @Sales ([Sub Category], [Model], [ProductID], [Product], [Reseller Profit])
SELECT [Sub Category], [Model], p.[ProductKey], [Product], SUM([SalesAmount] - [TotalProductCost])
FROM [dbo].[FactResellerSales] as frs
INNER JOIN [Featherman_Analytics].[dbo].[AW_Products_Flattened] as p ON frs.ProductKey = p.ProductKey
GROUP BY [Sub Category], [Model], p.[ProductKey], [Product]

UPDATE @Sales SET [Internet Profit] = (SELECT SUM([SalesAmount] - [TotalProductCost])
FROM [dbo].[FactInternetSales] as i
INNER JOIN [Featherman_Analytics].[dbo].[AW_Products_Flattened] as p ON i.ProductKey = p.ProductKey
WHERE p.[ProductKey] = [ProductID])

UPDATE @Sales SET [Total Profit] = ([Reseller Profit] + [Internet Profit])

UPDATE @Sales SET [Reseller %] =
(CASE
WHEN [Reseller Profit] > 0 THEN ([Reseller Profit] / [Total Profit])
WHEN [Reseller Profit] < 0 THEN 0
END)

UPDATE @Sales SET [Internet %] =
(CASE
WHEN ([Internet Profit] < [Total Profit]) THEN ([Internet Profit] / [Total Profit])
WHEN ([Internet Profit] > [Total Profit]) THEN 1
WHEN [Internet Profit] < 0 THEN 0
END)

UPDATE @Sales SET [Channel Preference] =
(CASE
WHEN [Reseller %] > .8 THEN 'Reseller dominant'
WHEN [Reseller %] BETWEEN .55 AND .799 THEN 'Reseller favored'
WHEN [Reseller %] BETWEEN .40 AND .50 THEN 'Internet favored'
WHEN [Reseller %] < .4 THEN 'Internet dominant'
END)

SELECT * FROM @Sales
WHERE [Total Profit] IS NOT NULL
ORDER BY [Total Profit] DESC

Example #7 (continued): This example leverages our use of UPDATE SET commands inside arrays. As we shall see, UPDATE SET commands are easy to create, as the expressions can use calculated columns. The INSERT INTO SELECT FROM statement compiles the data for the first five columns. Then a series of UPDATE SET commands are used; the first fills a summary, calculated column using a sub-query. Next, in green, the [Total Profit] column is calculated by simply adding two other array columns. Next the [Reseller %] and [Internet %] columns are calculated.
This is an analysis of profit, so some column values can possibly be negative (look at the Reseller Profit column to see negative values - losing money). A CASE statement is used to fix the results so that it is not possible to see a value greater than 1. These are great usages of CASE statements. The last CASE statement within an UPDATE SET fills the [Channel Preference] column. It provides a categorizing column that can be used as a slicer. The categories indicate whether the product generates more profit on the Internet channel, the Reseller channel, or a mix. Finally the data is displayed on-screen, filtering the list to only those catalog products that had sales. The following chart is possible in Excel.

Example #8 - The total sales booked for each sales rep is compared to the sales quota set for that sales rep. This query, then, is the basis for a dashboard that measures business performance to goal. Combining the data for each employee over time is one way to look at the data; another is to look at each time period and rank/rate the sales rep performance within that time period.

USE [AdventureWorksDW2012];

DECLARE @SalesPerformance TABLE ([EmployeeID] int, [Name] nvarchar(75), [CalendarYear] int, [CalendarQuarter] int, [SalesAmountQuota] decimal, [SalesAmountActual] decimal, [Performance to goal] decimal, [Sales KPI] nvarchar(10))

INSERT INTO @SalesPerformance ([EmployeeID], [Name], [CalendarYear], [CalendarQuarter], [SalesAmountQuota])
SELECT q.[EmployeeKey], CONCAT([FirstName], ' ', [LastName]) as Name, [CalendarYear], [CalendarQuarter], [SalesAmountQuota]
FROM [dbo].[FactSalesQuota] as q
INNER JOIN [dbo].[DimEmployee] as e ON e.[EmployeeKey] = q.[EmployeeKey]

UPDATE @SalesPerformance SET [SalesAmountActual] = (SELECT SUM([SalesAmount])
FROM [dbo].[FactResellerSales]
WHERE [EmployeeKey] = [EmployeeID]
AND DATEPART(year, [OrderDate]) = [CalendarYear]
AND DATEPART(quarter, [OrderDate]) = [CalendarQuarter]
GROUP BY [EmployeeKey], DATEPART(year, [OrderDate]), DATEPART(quarter, [OrderDate]))

UPDATE @SalesPerformance SET [Performance to goal] = ([SalesAmountActual] - [SalesAmountQuota])

UPDATE @SalesPerformance SET [Sales KPI] = FORMAT([SalesAmountActual] / [SalesAmountQuota], 'p1')

SELECT * FROM @SalesPerformance

/* SELECT * INTO [Featherman_Analytics].[dbo].[SalesPerformance]
FROM @SalesPerformance */

The sales quotas are in the [FactSalesQuota] table and do not have to be aggregated; quotas are set for each quarter for each employee. The actual sales are in the [FactResellerSales] table, saved by invoice. The invoices have to be aggregated for each employee and totaled for each year and quarter. The GROUP BY statement fills the [SalesAmountActual] column. Here the UPDATE SET command allows you to calculate two columns of metrics. Notice the [Sales KPI] field is an nvarchar(10) data type so that FORMAT([Field name], 'p1') can be used to nicely format the number into a percent. Results are below, as are some ideas for charts.

Well, that wraps up this module; eight examples are just enough. A lot of ground was covered in this module. Arrays seem to make a lot of data manipulation easier. Sub-queries work great to pull columns of values into the array, even changing the summary level of the data. UNION SELECT commands can be used to add rows of data, and UPDATE SET commands can be used to calculate values to fill columns of data with totals, % of total calculations, and other new metrics.

HAPPY SQL Coding!
Mauricio Featherman, Ph.D.
3-8-2020