Washington State University



Featherman’s T-SQL Adventures ?Table JOINS with GROUP BY – more practiceThis document shows some star-schema data models that can be made by pulling together tables from the AdventureWorksDW2014 database. If you read this document, you will become more familiar with the database tables, and their foreign keyprimary key linkages. Undersatanding how to connect tables gives you the ability to perform analysis based on categories. If you can pull together a fact table (of transaction details – measures and numberic and identifiers) and the dimensions (the who, what were, when) then you can produce datasets that can be analyzed. If you can figure out how the dimension tables connect to the fact table (and sometimes to each other to form a hierarchy) to form the data model, then you can group and slice the data into different combinations to answer different business questions. You will basically become an analytics hero.If you type in the queries shown here (copy and paste if pressed for time, but you wil learn less) you will have ample opportunity to perform table joins using the GROUP BY command. You also gain experience making columns of metrics. These standard operations need to be mastered for conceptual understanding, and implementation performance. A good analyst can look at a table structure and write SQL queries similar as shown with limited trouble, and quickly.This is your goal and a signal for readiness to apply and compete for analyst positions. This content therefore needs to be mastered for speed, and the prudent student will ingest this content and then write their own queries. The implementation of table JOINS and specifying dimensions for grouping data is a skill that can be mastered using repetition. Copy the data into Excel and make some pivot charts for edutainment and to build skills.Remember the numbers calculated are not just some numbers on a page, they are signals that can be perceived and used to make sense of current and past business performance.To the left is the data model to categorize sales data by business customer, within different geographic locations. The three tables shown to the left are connected using the SQL code below.- first specify the fact table of measures. Recall the AS statement in SQL is used in this instance to assign an alias (nickname) for the tableFROM [dbo].[FactResellerSales] as s-- now connect two dimension tables to categorize the dataINNER JOIN [dbo].[DimReseller] as rON s.ResellerKey = r.ResellerKeyINNER JOIN [dbo].[DimGeography] as gON g.GeographyKey = r.GeographyKeyIn this next example, type in the program and then type in different dimensions to group on. For example try different combinations of city/state/country/retailer/business type. Also think of another metric to calculateUSE [AdventureWorksDW2014];SELECT [City]/*this is the dimension to group the data by, so the granularity is city which is more condensed than store level data, but less summary in nature than state/province or country. Data is always aggregated on the most granular (lowest level of detail) dimension. This means that even if we add state/province and/or country, while we would have those nice new columns that are useful for sorting data and mapping, the measures would still be summarized by city. Don't believe it? add them and see the measures do not recalculate */ /* These next three columns are metrics measuring frequency of transacting, how often are sales being made? Frequent transactions with target markets is a sign of health, and can provide suggestions to improve supply chain efficiencies such as moving top 100 inventory closer to the customer */, COUNT(DISTINCT([SalesOrderNumber])) as [# Invoices], COUNT([SalesOrderNumber]) as [# Line Items], COUNT([SalesOrderNumber])/COUNT(DISTINCT([SalesOrderNumber])) AS [Avg. Items per Invoice]/* These metrics show revenue - first total revenue (this is not profit!) and then average invoice amount */, SUM([SalesAmount]) as [Total Revenue], AVG([SalesAmount]) as [Avg. Invoice Revenue]/* These metrics show revenue - 1) total revenue (this is not profit!) and 2) average invoice amount */, SUM([OrderQuantity]) as [Total Units], SUM([OrderQuantity])/COUNT(DISTINCT([SalesOrderNumber])) as [# Units per Invoice]--specify where the data comes from - first specify the fact table of measuresFROM [dbo].[FactResellerSales] as s-- now connect two dimension tables to categorize the dataINNER JOIN [dbo].[DimReseller] as r ON s.ResellerKey = r.ResellerKeyINNER JOIN [dbo].[DimGeography] as g ON g.GeographyKey = r.GeographyKeyWHERE [EnglishCountryRegionName] = 'Germany'GROUP BY [City]ORDER BY [Avg. Invoice Revenue] DESC-1070610152590500left53200This next example uses the FactFinance table which is an accounting of the money spent by each department group (which are shown on the left for each organization (below). The parent organization is AdventureWorks Cycle, and they have 3 operational groups (North American, european, and Pacific). The Operations groups are split into six divisions (for North America and three Countries).Here are the JOINs to see how the tables connect to each other.FROM [dbo].[FactFinance] as fINNER JOIN DimDepartmentGroup as dg ON dg.DepartmentGroupKey = f.DepartmentGroupKey INNER JOIN [dbo].[DimOrganization] as o ON anizationKey = anizationKey272986513271500INNER JOIN [dbo].[DimScenario] as s ON f.ScenarioKey = s.ScenarioKey INNER JOIN [dbo].[DimDate] as d ON d.DateKey = f.DateKeyThis next query has a lot of dimensions. In practice you can focus the query on different levels of granularity (or pull all the data at the transaction level and thenaggregate it inside the report writing program. Run the query using different combinations of the dimensions to see different level of aggregation. The DimDate table is a special table that has a value for every day of the year for many years. The fact table can connect to the dimDate table giving access to many pre-made columns such as week#, month #, month name, etc. USE [AdventureWorksDW2014];SELECT [OrganizationName], [DepartmentGroupName],[ScenarioName], [CalendarYear] AS [Year], [MonthNumberOfYear] AS [Month#], [EnglishMonthName] AS [Month]/*just one measure in this example so just one column summarized. In the future we will compare months to the prior month and calcualte running totals */, SUM([Amount]) as [Total Amount] FROM [dbo].[FactFinance] as fINNER JOIN DimDepartmentGroup as dg ON dg.DepartmentGroupKey = f.DepartmentGroupKey INNER JOIN [dbo].[DimOrganization] as o ON anizationKey = anizationKeyINNER JOIN [dbo].[DimScenario] as s ON f.ScenarioKey = s.ScenarioKey INNER JOIN [dbo].[DimDate] as d ON d.DateKey = f.DateKey GROUP BY [OrganizationName], [DepartmentGroupName], [ScenarioName] , [CalendarYear], [MonthNumberOfYear], [EnglishMonthName] ORDER BY [OrganizationName], [DepartmentGroupName],[CalendarYear], MonthNumberOfYear] 018415000This next chart in Power BI is useful to look money spent or earned over time for different combinations of department groups and Organizations.This next program is only 1 table, but is an interesting diversion from looking at sales data. The data is for adventureWorks call center where operators answer customer questions, respond to product issues, log issues that are serious (for feedback to management) and take orders. Level 1 operators are less skilled at problem resolution than Level 2 operators, and typically are tempoorary workers that can solve problems using recommendations from the expert system . Level 1 operator salary is half tht of Level 2, which makes shifts with more Level 1 operators more cost-effective, but customer service may suffer.As a percentage of total operators, how many level 2 operators is the right number? Does call volume, and operator performance differ by Shift? Is there a difference in performance for weekdays vs. holidays? Run the query with both dimensions in the GROUP BY then try one or the other dimension.USE [AdventureWorksDW2014]; SELECT [Shift], [WageType] ,SUM([LevelOneOperators]) AS [# Level 1 Operators] ,SUM([LevelTwoOperators]) AS [# Level 2 Operators] ,SUM([TotalOperators]) AS [# TotalOperators] ,SUM([Calls]) as [# Calls] ,SUM([AutomaticResponses]) as [# Responses] ,SUM([Orders]) as [# Orders] ,SUM([IssuesRaised]) as [# Issues]--these two metrics imagined, designed and implemented. Can you think of another metric for a call center? ,SUM([Calls])/SUM([TotalOperators]) as [Operator Efficiency Rating] ,SUM([Orders])/SUM([TotalOperators]) as [Orders Booked Conversion KPI] FROM [AdventureWorksDW2014].[dbo].[FactCallCenter] GROUP BY [Shift], [WageType]4889535369500 ORDER BY [Shift], [WageType]The code is actually simplistic. You just need to know what the data in the columns mean. Notice there are no joined tables, so there is no data model.Again business analysts build value if they can design and implement columns of analytics tht bring insight to managerial decision-making.This next data model is used to show the sales quota for each salesrepresentative for each sales territory. The sales quotas are quarterly, sales data from the factreseller sales table is also included, but the numbers must be checked as any data model with 2 fact tables, can cause inconsistencies in how the data is summarized.The chart below looks correct, but use the SQL statement below to look at actual sales amount for any quarter. Notice the tables used include the date table, to gain access to quarter and year dimension fields. These dimensions also could have been calcualted using fact table data.Later in the course we will use array tables to fix the problem of pulling data from 2 different fact tables. Use [AdventureWorksDW2014]; SELECT e.[EmployeeKey], [LastName],[FirstName], [Title], [CalendarYear], [CalendarQuarter], SUM([SalesAmount])FROM [dbo].[FactResellerSales] as s INNER JOIN [dbo].[DimEmployee] as e ON e.EmployeeKey = s.EmployeeKeyINNER JOIN [dbo].[DimDate] as d ON d.DateKey = s.OrderDateKeyGROUP BY e.[EmployeeKey], [LastName],[FirstName], [Title], [CalendarYear], [CalendarQuarter]Now to leave you with a problem. The factProductInventory table shows the balance of the inventory each time the inventory is checked for the product ID. There are many rows for each product ID, so the grain is different than the dimProduct table which causes a data explosion. This problem will be fixed later in the semester. Have any ideas on how to pull this query and retrieve approx. 600 rows?Use [AdventureWorksDW2014]; SELECT DISTINCT(i.[ProductKey]), [EnglishProductSubcategoryName] as [Sub-Category], [EnglishProductName], [UnitCost], [ListPrice], [UnitsBalance]FROM [dbo].[FactProductInventory] as iINNER JOIN [dbo].[DimProduct] as p on p.ProductKey = i.ProductKeyINNER JOIN [dbo].[DimProductSubcategory] as s on s.ProductSubcategoryKey = p.ProductSubcategoryKeyWHERE [ListPrice] IS NOT NULL ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download