Washington State University



Comparing GROUP BY () and PIVOT() QueryingFeatherman?This document introduces the concept of putting data into tables. Similar to how excel spreadsheets use rows an columns to identify each cell, tables use row and column dimensions to slice up data into cells. Think of each cell as its own little GROUP BY() query. The concept is called cross-tabulating data or cross-tabs. You specify the row and column dimensions and tell the program to aggregate a measure on the two dimensions.228955565305000If you need to quickly summarize data into a table PIVOT () is a great SQL command. After reviewing GROUP BY () functionality, we compare GROUP BY() functions and PIVOT() functions within T-SQL. GROUP BY () resultsets and the demonstrated charting is compared with the simpler concept of pivoting data into tables. It is also common to pre-compile data with GROUP BY() query and then either move the data to Excel for a pivot table, or save the grouped data to a temporary table or SQL table and then perform a SQL PIVOT().Sometimes it takes several data transformations to get the data into the right format When pivoting data you perform a great deal of GROUP BY () queries (one for each cell) by using every combination of two different dimensions. In the example shown above we are grouping sales by two related dimensions, year and month. So in effect we are running 24 GROUP BY () queries one for each month shown. You can also pivot data on two different dimensions such as geography and product sub-category. The picture above is from Excel so it has built-in row, column and grand totals. While Excel may not handle all your data, SQL can handle terabytes.We start by reviewing and comparing GROUP BY() and PIVOT() then show examples of PIVOT().GROUP BY () QueriesUsed to provide grouped, aggregated, summary, compiled calculations based on whatever group (category) is specified (e.g. who, what, where, when, why). Can be used to perform high level of summarization (ie country), lower levels of summarization (ie city within state, within country), or provide lightly summarized data (ie reseller store within state)Group BY queries allow you to add as many columns of metrics as you wish – all have to be at the same level of granularity though, meaning all relaed to the terms in the group by statement – such as city. You can create many columns of aggregated results such as counts, totals, averages. Moving averages, ranks, etc.) The chosen grouping outputs one row of aggregated data per grouping of dimension attributes (such as geographic region, product line, employee, project, etc.).If you have one level of grouping (say country) then the totals will be at a high level (VP level). This minimal grouping may be visualized on an executive’s dashboard. If you have 2 or 3 levels of grouping then you have more rows of more detailed data (i.e., within each country you have customers grouped within cities, where one record per customer is calculated and provided in the dataset, If you include detailed dimensions and also more summary dimensions, the calculations are performed based on the most detailed dimension provided. For example if you include customerID, city, state in the GROUP BY term the data is calculated at the CustomerID level. IT is nice to include different dimensions so that you can create a hierarchy within a dimension (e.g., city, state, regio, country) which is useful for drill-down maps or reports.Advantages: Can have many different columns of aggregated analytics for any dimension that is being evaluated. Provides an easy solution for commonly needed analytics. Good to build resultsets for reports & spreadsheets. The columns of metrics become the datasource for a KPI dashboard. Disadvantages: GROUP BY will not solve all aggregation needs. Different aggregations may be needed, and the requirement that ALL the non-aggregate columns in the SELECT statement must be in the GROUP BY statement can restrict analysis – requiring need for many queries (until you learn table variables).PIVOT ()A SQL function used to aggregate large amounts of data into a cross-tabulated table format. Pivot queries perform A LOT of functionality and are well worth the effort to learn how to use them. While a programmer may be tempted to use nested loops to perform calculations for any pair of row and column dimensions, the PIVOT function is very compact and powerful. The measures can be Sums, Counts, Min Max, Standard Deviations, and other custom calculations. Putting data into tables can help comprehension.Pivot queries provide a static compacted table of results displaying ONE aggregated value in a tabular format. The aggregation for the ONE calculated value (the measure) is based on all the row & column combinations of two other dimensions. For example production units can be shown for month and machine, units sold can be totaled for city and product line.Advantages: Compacts a ton of data down into a small footprint table, which is the perfect datasource for a column or line chart. Pivoted data is very commonly the data format needed. You can also compact data from different sources with PIVOT() then you can combine them. Can aggregate massive amounts of data, which often local data analyzed in a pivot table in excel, tableau visualization or an SSRS crosstab report cannot handle. Some managers still really like tables of data. Disadvantages:Unlike GROUP BY queries which are pretty intuitive, the PIVOT () code takes a bit of time to get used to. Currently require advanced dynamic SQL to parameterize. Typically can have only have one aggregated measure in the crosstab (ie pivot table). In an Excel pivot table, you specify the row, columns and values. Same here. You typically only use three fields in the SELECT statement 1) the row heading from a dimension table, the rows go down the page – here year 2) the column heading from a different dimension that are sliced up in the PIVOT statement to create columns that go across the page.3) the measure that is being aggregated. Take a look at this select statement and also notice there is no GROUP BY statement, the data rows are brought in with no summarization functions. The yellow code is run and stored into an in-memory array (here called base table).The data from the base table is selected and pivoted using the FOR statement. The IN statement creates the columns. The “pivot” means that the SUM GROUP BY is being performed for each cell using the intersection of the row and column attributes to calculate each cell.USE [AdventureWorksDW2012];SELECT * from 85153523050500(SELECT YEAR(OrderDate) AS OrderYear, DATENAME(MONTH,OrderDate) AS [MonthName], [SalesAmount]FROM [dbo].[FactResellerSales]) AS BaseDataTablePIVOT(SUM([SalesAmount]) FOR [MonthName] IN(January,February,March,April,May, June,July, August,September,October, November,December)) AS PivotTableResults are again shown in this pivot table - but only sales are totaled by month and year, IF you want a different measure cross-tabulated then make another PIVOT query. Move to the next document to learn more about PIVOT queries.1238252095500 ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download