Washington State University



Featherman’s T-SQL Adventures? - Sub-queries (sub-titled: How to build a dataset column by column from different fact tables)After many successes with GROUP BY() and PIVOT() queries you will soon meet the nagging problem of not being able to produce the datasets needed. This document helps by introducing sub-queries that are used here for two major purposes a) perform the processing of data in a step 1, step 2 approach, and b) to pull data from different tables to build columns of summarized data, some aggregated others calculated. Another recurring problem with data retrieval is that data is in different tables or worse in different databases, and different servers. While you can often merge data with INNER JOINS, LEFT or RIGHT JOINS, or FULL JOINS, often the analyst does not have a good idea of how the tables are connected and the joins produce incorrect data. This can cause the most heinous of all data errors, data that looks correct, but actually is not. It is easy for the analyst to detect repeating numbers or inflated numbers, but calculations must always be verified. A sub-query can help by allowing you to directly pull the data from another table, and simplify the JOIN onto only 1 table, the primary key of the dimension table.Joins won’t always work. A 5-table join has complex inner-workings that can sometimes produce incorrect results. To simplify the pulling together of data, the analyst and DBA need more precise tools. If joins are not accurately merging the tables of data, then reach for sub–queries (aka correlated sub-queries). Here we use sub-queries to add columns of metrics from different tables together into one resultset. You need a column in common between the different tables, such as employeeID. Run the queries in the left hand column in SSMS and read the commentary in the right hand column. Perform data visualizations for insight and fun. More info at (v=sql.105).aspxUSE [AdventureWorksDW2014]; SELECT DISTINCT([ModelName]), AVG([DealerPrice]) as [Avg Price]FROM [dbo].[DimProduct]WHERE [ProductSubcategoryKey] IN (1,2,3)GROUP BY [ModelName]========================================This is the query I want to work. I want to know the products are more expensive than the average for the product sub-catorgory. This query alas does not work.SELECT [ModelName], [EnglishProductName], [DealerPrice]FROM [dbo].[DimProduct]WHERE [ProductSubcategoryKey] =1AND [DealerPrice] > AVG([DealerPrice])-10397400This is a useful query to find the average price that each bicycle model is sold to dealers in the reseller network.How can you go deeper though, how can you identify the most expensive models in one sub-category? We need to calculate the average proce for the sub-category and then find the products that are more expensive than the average price for the product category (ie the most expensive mountain bikes).There are many similar questions you might have, and processesing scenarios, that just wont work. This document seeks to provide one stragtegy to get the data you need, without producing different datasets that get merged together in Excel.USE [AdventureWorksDW2014]; SELECT [ModelName], [ProductAlternateKey], [DealerPrice]FROM [dbo].[DimProduct]WHERE [ProductSubcategoryKey] IN (1,2,3)AND [DealerPrice] > 1.5 *(SELECT avg([DealerPrice]) FROM [dbo].[DimProduct]WHERE [ProductSubcategoryKey] IN (1,2,3))SELECT DISTINCT([ModelName]), [DealerPrice]FROM [dbo].[DimProduct]WHERE [ProductSubcategoryKey] IN (1,2,3)AND [DealerPrice] > 1.5 *(SELECT avg([DealerPrice]) FROM [dbo].[DimProduct]WHERE [ProductSubcategoryKey] IN (1,2,3))USE [AdventureWorksDW2014]; DECLARE @Avg decimal(10,2) = (SELECT avg([DealerPrice]) FROM [dbo].[DimProduct]WHERE [ProductSubcategoryKey] IN (1,2,3))SELECT DISTINCT([ModelName]), [DealerPrice]FROM [dbo].[DimProduct]WHERE [ProductSubcategoryKey] IN (1,2,3)AND [DealerPrice] > (1.5 * @Avg)PRINT @avg * 1.516847793488400-103922300The results of the 1st query are on the left, anwering the question, “what bicycles are sold at a price 50% higher than the average bicycle price?The second question reduces the granularity to look at just the model’s that are sold at a price 50% higher than the average bicycle. The sub-queries are highlighted and allow the functioning that we wanted.When you see a second SELECT statement, you know you have a sub-query. You could also solve this problem using a variable that has its own SELECT statement, just be sure to put these in their own ().Click on the messages tab to see any values from a PRINT statement.USE [AdventureWorksDW2012];SELECT [ResellerKey], [ResellerName], [BusinessType], (SELECT COUNT([OrderQuantity]) FROM [dbo].[FactResellerSales] as s WHERE s.[ResellerKey] = r.[ResellerKey]) AS [Unit Sales], (SELECT SUM([SalesAmount]) FROM [dbo].[FactResellerSales] as sWHERE s.[ResellerKey] = r.[ResellerKey]) AS [Total Revenue]FROM [dbo].[DimReseller] as rWHERE [GeographyKey] = 41ORDER BY [Total Revenue] DESC1261745232410The SELECT fields are from the dimension table. The measures are in the subqueries.00The SELECT fields are from the dimension table. The measures are in the subqueries.-654051333500You can say that the yellow sub-queries are nothing special as they pull columns from the same fact table and the whole query could be written more easily using the standard GROUP BY, but sub-queries do allow more freedom in bringing in columns from other tables. (shown next). They can perform aggregating functions; here a count and a sum. The query needs to link the new tables to the base table (here DimReseller). They are adding a new column so be sure to start them with , ()If you are fortunate the business requirement can be satisfied by pulling data using an inner join (or other type of join). This isn’t always the case. For example, how can we pull some primary key columns from two tables then add columns of aggregated data from the two tables? How would you do this if you didn’t know SQL? Well you can run two queries and then coy the data to Excel and be a copy/paste wizard, but that’s no fun, and that process is not sustainable as there is no automation.Sub-queries can save you a lot of Excel copy and paste time, and potential embarrassment caused by publishing erroneous reports. The problem is that if the data is in two fact tables each sharing the same primary key (integer field or even field such as USstate), you would have to comingle the data (often giving weird results). You might need to make a new table pulling the primary key values together. Often you see data explosion, missing columns or duplicated values meaning the connection between tables is wrong. But you can research the database and find the relationships and hopefully join the tables. When the joins don’t seem to work, simplify the query by joining use sub-queries, each of which just joins one table. USE [AdventureWorksDW2012];SELECT [EnglishProductSubcategoryName] AS [Sub-Category], [ModelName],p.[ProductKey],[ProductAlternateKey], [EnglishProductName], ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as rs WHERE rs.[ProductKey] =p.ProductKey), 0) AS [Reseller Units], ISNULL((SELECT AVG([UnitPrice]) FROM [dbo].[FactResellerSales] as rsWHERE rs.[ProductKey] =p.ProductKey), 0) as [Reseller Avg. Sales Price], [DealerPrice]FROM [dbo].[DimProduct] as pINNER JOIN [dbo].[DimProductSubcategory] as ps ON ps.ProductSubcategoryKey = p.ProductSubcategoryKeyWHERE ISNULL((SELECT AVG([UnitPrice]) FROM [dbo].[FactResellerSales] as rsWHERE rs.[ProductKey] =p.ProductKey), 0) < ([DealerPrice] * .9)AND (SELECT SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as rs WHERE rs.[ProductKey] =p.ProductKey) >0ORDER BY [Sub-Category], [ModelName], p.ProductKeyHere a subquery is used to retrieve only products selling at a certain loss (10% below dealer price)Check out the WHERE clause, go ahead and run the query with just the WHERE [FinishedGoodsFlag] =1 clause and comment out the rest. You should see almost 400 rows of data. About 50 of the rows however do not have any sales on either sales channel. With our expanded WHERE clause you can filter the list to only products that were either sold on one channel or the other. The WHERE clause has two sub-queries of its own since you can not reference the aggregated field (ie, [Reseller Units]). The first filters to find products that are on average sold 10% below dealer price (catalog price). The second ensures that only products that have sales are included (to filter out products that did not have sales and therefore a average sales price of 0).-30314109496Sub-query challenge: Perform an analysis of what products are priced radically different in the reseller channel (wholesale) and Internet Channel (B2C retail). Compare the pricing to the standard pricing in the products table.00Sub-query challenge: Perform an analysis of what products are priced radically different in the reseller channel (wholesale) and Internet Channel (B2C retail). Compare the pricing to the standard pricing in the products table.Scenario: What if you want to pull total sales for every product sold on either sales channel? (Reseller and Internet?). This simple request is apparently not that simple. The next query demonstrates that 334 different SKU’s are sold on the Reseller channel and 158 via the Internet channel, but even learning that factoid was only made possible by our understanding of JOINS. Lets review how JOINS work.USE [AdventureWorksDW2012];SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products], COUNT(DISTINCT(i.[ProductKey])) as [Internet Products]FROM [dbo].[FactResellerSales] as rFULL JOIN [dbo].[FactInternetSales] as iON r.productkey = i.productkeyHere we break out the FULL JOIN. We can see the full count for both channels, whether they were sold on the other channel or not. This is great for counting, but as you will see below when trying to build a list of products the resultset will show lots of NULLS as most of the products were sold on one channel or the other, not both. What’s an analyst to do?USE [AdventureWorksDW2012];SELECT COUNT(DISTINCT([ProductKey])) FROM [dbo].[FactResellerSales] as rs----------------------------------------------------------USE [AdventureWorksDW2012];SELECT COUNT(DISTINCT([ProductKey])) FROM [dbo].[FactInternetSales]Let’s dive into the data, run all these queries to learn more quickly.If you run this first query you discover 334 Different products (SKU’s) have been sold in the different Bike Shops. The second query reports that 158 different SKU’s have been sold online.Once we begin to pull the records of SKU sales (based on productID) the resultset will have many NULL values (see table below). This means that some of the items were sold in the bike shops and not on the web, and vice versa. Only a sub-set of the products were sold on both channels (142 as shown next).USE [AdventureWorksDW2012];SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products], COUNT(DISTINCT(i.[ProductKey])) as [Internet Products] FROM [dbo].[FactResellerSales] as rINNER JOIN [dbo].[FactInternetSales] as iON r.productkey = i.productkey-82559080500The Inner Join pulls the values that are in common in both tables. 142 products have been sold on both retail channels. BTW is there much strategy as to what products are sold on what channel? The Internet channel is good for rural customers and small, and lightweight parts/products that are easy to ship.USE [AdventureWorksDW2012];SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products], COUNT(DISTINCT(i.[ProductKey])) as [Internet Products] FROM [dbo].[FactResellerSales] as rLEFT JOIN [dbo].[FactInternetSales] as iON r.productkey = i.productkeyThe left join pulls all the values from the first table in the FROM statement and the matching values from the joined, second table.334 products have been sold on the Reseller channel, of those 334, 142 have also been sold on the Internet channel (142 in common)USE [AdventureWorksDW2012];SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products], COUNT(DISTINCT(i.[ProductKey])) as [Internet Products]FROM [dbo].[FactResellerSales] as rRIGHT JOIN [dbo].[FactInternetSales] as iON r.productkey = i.productkeyThe right join pulls all the values from the joined table (the 2nd table) and then finds the matches from the first table in the FROM statement.158 products have been sold on the Internet channel, of those 158, 142 have also been sold on the Internet channel (142 in common). So the 142 are the matching values.Next we try to build a resultset that matches the business requirement, to build a list of product sales across both distribution channels. This next query might be your first attempt to build this dataset. Notice anything wrong with the data?USE [AdventureWorksDW2012];SELECT DISTINCT r.[ProductKey] as [Reseller Product ID], i.productkey as [Web Product ID], ISNULL(SUM(r.[OrderQuantity]), 0) as [Reseller totals], ISNULL(SUM(i.[OrderQuantity]), 0) as [Web totals]FROM [dbo].[FactResellerSales] as rFULL JOIN [dbo].[FactInternetSales] as iON r.productkey = i.productkey-- Run this query with and without this next line to better understand the JOIN.-- WHERE r.[ProductKey] IS NOT NULLGROUP BY r.[ProductKey] , i.productkey ORDER BY r.[ProductKey]25406350000This is a nice first attempt but JOINING two fact tables directly usually makes trouble, and a three table JOIN with 2 fact tables can be just as problematic, please test this.Before alternate strategies are shown, lets examine the highlighted ISNULL() function. Its is a keeper!!(ISNULL(fieldname),0) AS [Reseller Units]. This feature reads, for any of the retrieved rows, if the fieldname value is NULL then replace the NULL with a 0. This is important to clean up the results and later on if you want to add one column to another column, you can’t add a NULL to a number, any field that has NULLS cannot be added to any other field. So NULL + 300 = error, whereas 0 + 300 = 300. Later we will calculate the total for both sales channels.USE [AdventureWorksDW2012];SELECT [ModelName], [EnglishProductName], r.[ProductKey] as [Reseller Product ID], i.productkey as [Web Product ID], ISNULL(SUM(r.[OrderQuantity]), 0) as [Rseller totals], ISNULL(SUM(i.[OrderQuantity]), 0) as [Web totals]FROM [dbo].[DimProduct]as pFULL JOIN [dbo].[FactResellerSales] as r ON r.productkey = p.productkeyFULL JOIN [dbo].[FactInternetSales] as i ON i.productkey = p.productkeyWHERE [FinishedGoodsFlag] =1GROUP BY [ProductSubcategoryKey], [ModelName], [EnglishProductName], r.[ProductKey] , i.productkey 3406941245993488812651510Anyone think the totals look a little too high? This is data explosion00Anyone think the totals look a little too high? This is data explosionORDER BY [ProductSubcategoryKey], [ModelName], r.[ProductKey], [EnglishProductName]4889511430000This query returns 381 rows, and is a great improvement, even a heroic step forward in logic and performance. Two fact tables are joined to a dimension table. There are extra rows returned however, that would need to be cleaned up. Also the query runs very slowly. The picture starts on row 314 of 381 to show how the JOINS are working. The WHERE statement is used to filter out products that are not sold to the public or bike shops.Notice that you can put a field in the GROUP BY and ORDER BY (ProductSubcategoryKey) to organize the results, but you do not have to show that field in the resultset (ie not in the SELECT statement).USE [AdventureWorksDW2012];SELECT [ModelName], p.[ProductKey], [EnglishProductName], (SELECT ISNULL(SUM([OrderQuantity]), 0) FROM [dbo].[FactResellerSales] as s WHERE p.[ProductKey] = s.[ProductKey] ) AS [Reseller Sales], (SELECT ISNULL(SUM([OrderQuantity]), 0) FROM [dbo].[FactInternetSales] as iWHERE p.[ProductKey] = i.[ProductKey] ) AS [Internet Sales]FROM [dbo].[DimProduct] as pWHERE [FinishedGoodsFlag] =1ORDER BY [ProductSubcategoryKey], [ModelName], [ProductKey] DESC4889511430000Here is an improved query using two sub-querys. Notice there is no GROUP BY statement. Interesting!!More importantly the totals look much more normal. OMG does this mean an analyst ALWAYS needs to check their calculations fpr data explosion? Only of you don’t want to get laughed at.So what have we learned so far in this module? A) sub-queries allow two queries to be used together, and B) connecting 2 fact tables in the same query leds to data explosion where the numbers are inflated.Let’s see some other uses of sub-queries.USE [AdventureWorksDW2012];SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products] FROM [dbo].[FactResellerSales] as rWHERE r.[ProductKey] NOT IN (SELECT DISTINCT([ProductKey]) from [dbo].[FactInternetSales] )This is a good example of a sub-query. The WHERE clause is its own SQL statement192 products have been sold on the Reseller channel but not on the Internet. If you want a product list, then take out the COUNT() term.If you add some more infomation from the products tables then you can make a recommendation as to what prodcts can also be offered online.USE [AdventureWorksDW2012];SELECT DISTINCT(r.[ProductKey]) as [Reseller Products], [EnglishProductName], SUM([OrderQuantity]) as [# Reseller Units]FROM [dbo].[FactResellerSales] as rINNER JOIN [dbo].[DimProduct] as p ON p.ProductKey = r.ProductKeyWHERE r.[ProductKey] NOT IN(SELECT DISTINCT([ProductKey]) from [dbo].[FactInternetSales])GROUP BY r.[ProductKey], [EnglishProductName]ORDER BY SUM([OrderQuantity]) DESC8699515557500Here is the list of products sold in stores but not on the internet channel. What do you do with this data? There are at least 2 ways to interpret this data.Restricting some products to only in-store sales generates foot traffic into the stores, where consumers often are upsold or cross-sold to buy different and more product.On the other hand give consumers the ease of consumption by selling direct. Especially for rural customers that are far from stores.USE [AdventureWorksDW2012];SELECT COUNT(DISTINCT(i.[ProductKey])) as [Internet Products] FROM [dbo].[FactInternetSales] as iWHERE i.[ProductKey] NOT IN (SELECT DISTINCT([ProductKey]) from [dbo].[FactResellerSales] )-2973300How about products sold online and not in stores? Should these be the bulky items, or niche items? What products should be sold online and not in stores??16 products have been sold on the Internt channel but not on the Reseller Channel. If you want a product list, then take out the COUNT() term. In the first query you can add more columns from the products table to see more information. Can you do it?This is a very common usage of a sub-query to create a very different WHERE clause.Sometimes you want to delete (or just identify) any items are in one table but not in another. Fopr example what customers had sales in 2018 but none in 2019? Or what customers reduced their purchases by > 50% in the current year over the last year.USE [AdventureWorksDW2012];SELECT DISTINCT i.[ProductKey] as [Product ID], [EnglishProductName], [ListPrice] FROM [dbo].[FactInternetSales] as i INNER JOIN [dbo].[DimProduct] as p ON i.ProductKey = p.ProductKeyWHERE i.[ProductKey] NOT IN (SELECT DISTINCT([ProductKey]) from [dbo].[FactResellerSales] )8699515367000Here is the list of 16 products that are sold on the internet but not in bike stores.The not in query is very useful to find dimensions that are in one fact table and not in another.USE [AdventureWorksDW2014];SELECT i.[ProductKey] as [Product ID], [EnglishProductName], [ListPrice], SUM([OrderQuantity]) as [Web Units], FORMAT(SUM([UnitPrice] - [TotalProductCost]), 'N0') as [Profit] FROM [dbo].[FactInternetSales] as i INNER JOIN [dbo].[DimProduct] as p ON i.ProductKey = p.ProductKey WHERE i.[ProductKey] NOT IN (SELECT DISTINCT([ProductKey]) FROM [dbo].[FactResellerSales]) GROUP BY i.[ProductKey], [EnglishProductName], [ListPrice]225867577850Run this query and check the results you retrieve. To learn faster, just copy any of the queries on the left into your version of SSMS and connect to WSU resources if you are a WSU student.00Run this query and check the results you retrieve. To learn faster, just copy any of the queries on the left into your version of SSMS and connect to WSU resources if you are a WSU student. ORDER BY SUM([UnitPrice] - [TotalProductCost]) DESC8724415256600In our typical business scenario, the marketing manager is going to want to make their decision to drive reseller store profit, based on some analytics, here units sold, and profit generated. Additional insight needed would be seasonality of demand.For example the AdventureWorks sales reps can recommend to the bike shop owners which new inventory can help drive revenue. At a minimum, a salesrep can provide this list or similar and ask the bikeshop owner if there is a lot of foot traffic for these items.Server: cb-ot-devst05.ad.wsu.edu login: mfstudent pwd: BIanalystHere the start of the show is the WHERE NOT IN Sub-query.Recapping why can’t I just use an INNER JOIN like in module 2? Because you return only a sub-set of the data.USE [AdventureWorksDW2012];SELECT r.[ProductKey] as [Reseller Products], [EnglishProductName], [ModelName], SUM(r.OrderQuantity) as [Sales on Reseller Channel], SUM(i.OrderQuantity) as [Sales on Internet Channel], SUM(r.OrderQuantity) + SUM(i.OrderQuantity) as [Total Sales]FROM [dbo].[FactResellerSales] as r3041181104830INNER JOIN [dbo].[FactInternetSales] as iON r.productkey = i.productkeyINNER JOIN [dbo].[DimProduct] as p ON r.ProductKey = p.ProductKey2365320233405Data Explosion00Data ExplosionGROUP BY r.[ProductKey], [EnglishProductName], [ModelName]This query provides the data we want but, guess what? On only 142 rows. So what do most analysts do?You can tell the correlated sub-queries in that they have their own SELECT statement, which seems to pull the data more correctly when there are two fact tables. They don’t use an INNER JOIN, but they do include a WHERE clause that performs similar functionality. So here the reseller units comes from the FactResellerSales table and the Internet Units comes from the FactInternetSales table. This next query is provides the solution.USE [AdventureWorksDW2012];SELECT [EnglishProductSubcategoryName] as [Sub-Category], [ProductKey], [ProductAlternateKey], [EnglishProductName]3963531139175389511728707400, ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as rsWHERE rs.[ProductKey] =p.ProductKey ),0) AS [Reseller Units], ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactInternetSales] as iWHERE i.[ProductKey] =p.ProductKey),0) AS [Internet Units]--This tot units column From Leo and Andrew Class of 2015 3184304242598Subqueries fix the data explosion problemSubqueries fix the data explosion problem,ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as rsWHERE rs.[ProductKey] =p.ProductKey),0) + ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactInternetSales] as iWHERE i.[ProductKey] =p.ProductKey),0) AS [Total Units Sold]FROM [dbo].[DimProduct] as pINNER JOIN [dbo].[DimProductSubcategory] as sc on sc.[ProductSubcategoryKey] = p.[ProductSubcategoryKey]WHERE p.[FinishedGoodsFlag] = 1 ORDER BY [Sub-Category], [ProductKey]Now we use two subqueries, and look at the trouble to calculate the total units. It has its own two subqueries! That problem is fixed in the next module. This query outputs 397 rows. 47 have 0 sales on both channels. So the correct number we need is 350. How do we pull the 347?It would be nice to filter out the rows that have 0 in the Total Units Sold column. Go ahead and copy the dataset in Excel and make some charts and think about what additional products can be sold online.Here is a suggestion from a former students, that looks like it might work, but actually still suffer from data explosionUSE [AdventureWorksDW2012];SELECT p.[ProductKey], [EnglishProductName], ISNULL(SUM(r.[OrderQuantity]), 0) as [Reseller totals], ISNULL(SUM(i.[OrderQuantity]), 0) as [Internet totals], ISNULL(SUM(r.[OrderQuantity]), 0) + ISNULL(SUM(i.[OrderQuantity]), 0) as [Grand Total]FROM [dbo].[DimProduct] AS pFULL join [dbo].[FactResellerSales] as r ON p.ProductKey = r.ProductKeyFULL JOIN [dbo].[FactInternetSales] as i ON p.ProductKey = i.ProductKeyWHERE [FinishedGoodsFlag] = 1 GROUP BY p.[ProductKey], [EnglishProductName]HAVING NOT (ISNULL(SUM(r.[OrderQuantity]), 0) + ISNULL(SUM(i.[OrderQuantity]), 0) ) =0ORDER BY p.[ProductKey]-4953082994500This second solution does the grouping of all the sales for all the products, the filters out the rows that have 0 in the grand total. Notice that the ORDER BY has to be after the HAVING.You can only use HAVING () after a GROUP BY ()Data Explosion!!!! Run for the hills!!593153524511000Sub-query example for the accountants: The Aging Report133985071183500227330051943000If a company allows credit sales, then they have to monitor and control accounts receivable from customers. The table of sales transactions such as shown on the right needs to be turned into a summary Aging report such as here.This example demonstrates an even more usefil detailed aging report, with one row for each customer. If you run this sub-query in your SSMS, you learn how it works. This sub-query takes massive numer of transaction rows of data and compiles it quickly into the time buckets that are used in United States accounting systems. The table to the right is as of this writing 2427 lines of data. It all needs to be examined and categorized without error.216027014224000USE Featherman_Analytics;SELECT [CustomerID], [CustomerName], ISNULL((SELECT SUM([Total_Sale]) FROM [featherman].[Sales] as sWHERE [Paid] = 0 AND (GETDATE() - [Invoice_Date]) BETWEEN 0 AND 30AND c.[CustomerID] = s.CustomerID), 0) as [0-30], ISNULL((SELECT SUM([Total_Sale]) FROM [featherman].[Sales] as sWHERE [Paid] = 0 AND (GETDATE() - [Invoice_Date]) BETWEEN 31 AND 6033420339450500AND c.[CustomerID] = s.CustomerID), 0) as [31-60], ISNULL((SELECT SUM([Total_Sale]) FROM [featherman].[Sales] as s33738388594600WHERE [Paid] = 0 AND (GETDATE() - [Invoice_Date]) BETWEEN 61 AND 90AND c.[CustomerID] = s.CustomerID), 0) as [61-90], ISNULL((SELECT SUM([Total_Sale]) FROM [featherman].[Sales] as sWHERE [Paid] = 0 AND (GETDATE() - [Invoice_Date]) BETWEEN 91 AND 120AND c.[CustomerID] = s.CustomerID), 0) as [91-120], ISNULL((SELECT SUM([Total_Sale]) FROM [featherman].[Sales] as sWHERE [Paid] = 0 AND (GETDATE() - [Invoice_Date]) BETWEEN 121 AND 180AND c.[CustomerID] = s.CustomerID), 0) as [121-180], ISNULL((SELECT SUM([Total_Sale]) FROM [featherman].[Sales] as sWHERE [Paid] = 0 AND (GETDATE() - [Invoice_Date]) BETWEEN 181 AND 360AND c.[CustomerID] = s.CustomerID), 0) as [181-360], ISNULL((SELECT FORMAT(SUM([Total_Sale]), 'N0')FROM [featherman].[Sales] as sWHERE [Paid] = 0 AND (GETDATE() - [Invoice_Date]) BETWEEN 0 AND 360AND c.[CustomerID] = s.CustomerID), 0) as [Total Unpaid Last Year]FROM [featherman].[Customers] as c71120709930This query produces the dataset that you see in the picture above. Once you have this data compiled, it can be saved into its own database table. Every business analyst knows that once you have data into a database then you can use free chart making software to visualize the data as shown below.00This query produces the dataset that you see in the picture above. Once you have this data compiled, it can be saved into its own database table. Every business analyst knows that once you have data into a database then you can use free chart making software to visualize the data as shown below. -171958094233900 This query is simple and powerful at the same time. It is simple in that only two dimensions are included, CustomerID and CustomerName, then six columns of accounting data is compiled and the formula is identical for all six columns. The power is that the [featherman].[Sales] fact table is queried over and over using the same three criteria:a) is the invoice unpaid? b) is the invoice date within a date range in the past? c) is the CustomerID in the sales table match a value of CustomerID in the [featherman].[Customers]dimension table which we are using to generate the report?These subqueries all use a JOIN functionality in the WHERE statement c.[CustomerID] = s.CustomerIDAnother interesting aspect of this query is the calculation of the number days ago the invoice was. (GETDATE() - [Invoice_Date])Better than that we can use the results of that date calculation in the BETWEEN statement such as GETDATE() - [Invoice_Date]) BETWEEN 181 AND 3605697110184856700 the link above to add some new records to the Featherman.Sales table. If you make a PowerBI report using the query below, you should be able to see the orders you added. However the free version only updates once per day.0Click the link above to add some new records to the Featherman.Sales table. If you make a PowerBI report using the query below, you should be able to see the orders you added. However the free version only updates once per day.USE Featherman_Analytics;SELECT c.[CustomerID], [CustomerName], [Invoice_Date], CAST((GETDATE() - [Invoice_Date]) as decimal(4)) as [Days], CAST(CAST((GETDATE() - [Invoice_Date]) as decimal(4)) as nvarchar(4)) as [DaysT], CaseWHEN CAST((GETDATE() - [Invoice_Date]) as decimal(4)) BETWEEN 0 and 90 THEN 'Current'WHEN CAST((GETDATE() - [Invoice_Date]) as decimal(4)) BETWEEN 60 and 90 THEN '60 days late'WHEN CAST((GETDATE() - [Invoice_Date]) as decimal(4)) BETWEEN 91 and 120 THEN '90 days late'WHEN CAST((GETDATE() - [Invoice_Date]) as decimal(4)) BETWEEN 121 and 180 THEN '1/3 year late'WHEN CAST((GETDATE() - [Invoice_Date]) as decimal(4)) BETWEEN 181 and 360 THEN '1/2 year late'12680403426200END as [AR Category], SUM([Total_Sale]) as [Total Unpaid Amount]FROM [featherman].[Sales] as sINNER JOIN [featherman].[Customers] as cON c.CustomerID = s.CustomerIDWHERE [Paid] = 0 AND CAST((GETDATE() - [Invoice_Date]) as decimal(4)) < 500GROUP BY c.[CustomerID], [CustomerName], [Invoice_Date]ORDER BY c.[CustomerID]Another great The days column GROUP BY query! Here we calculate the number of days between today and the invoice date. The result is in some clock system and needs to be CAST() format the data into a decimal number, number of days, to be used in charts.To facilitate Power BI analysis this Days column is next copied and CAST() into a textual variable (nvarchar) so that it can be easily used in the X axis of a chart (which prefer text columns to display the data correctly).Notice the two similar columns are named Days and DaysT. In PowerBI be sure to sort DaysT by the Days column. This pattern of sorting a textual column by a numeric column is common such as sorting the months or days of the week. Unfortunately sorting is not automatic in many data visualization softwares.While we can make charts quickly in PowerBI using the [Total Unpaid Amount] field5704868721194Made from our CASE statementMade from our CASE statement67862441444763Here we add up invoice amounts for each Accounts receivable category. Below we count up the invoices in each category (created by the CASE statementSub-queries allow you to add new columns of textual analytics such as when using a CASE statement, and also new columns of calculated numeric analytics such as counts, averages and totals. Each metric added needs to be added at the same level of granularity as the overall query. If you can define your level of granularity, then you can pull columns from some dimension tables and then use sub-queries to pull data from transaction or summary measure tables to produce datasets of your accurate analysis. Sub-queries are the best way to integrate data tables from different data sources. To make the sub-query;s all work together is sharing the same level of granularity. Maybe one day when you can’t get a GROUP BY() query to work, or you intuit and verify data explosion, you will remember to use sub-queries and build the dataset columns by column. In summary use sub-queries to a) add detailed WHERE statements that can pull and calculate values from some other tables, and b) make queries calculate correctly when the calculations are not accurate, or c) creating new datasets that summarize data in new ways such as the AccountAging Report.Enjoy Mauricio Featherman ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download