Washington State University



Additional glossaryCardinalityA database term that specifies how many different values are possible for a column in a database table or array. In a pizza ordering app, the size of pizza may have a cardinality of 4 (S, M, L, XL). In an e-commerce app, the cardinality for US state would be 50. When generating reports or pivot tables, place high cardinality values down the rows of the report, rather than across the columns, with the result of a narrow and tall report. If you have a dimension that has a high cardinality, you can improve the reporting by putting the values into groups using a SQL CASE statement. An example would be to put the 50 US states into 6 regions (NW, SW, etc.). A column chart then would have 6 columns not 50.Cardinality also must be considered when building reports, if a data column has useful data but too many values (such as date of transaction plus time of day) you may want to create another column that would be used in analysis. For example you can reduce the number of possible values (such as to day # of the month with values 1-31) using DATEPART()You can also reduce cardinality with CASE statements that can be used to put dimensional attributes into categories (e.g. putting US 50 states into 6 regions).Conditional Formatting1270000Report writing software typically include a way to change the color of the text in cells or columns, or shading of the cells. Ability to highlight outlier cells, or put little status indicator icons into a cell based on a condition (classic red, yellow, green stoplights). The report designer and management team together decide the business rules and numeric targets that decide the conditional formatting (i.e. the usage of color and icons). Looking at a table of numbers can obfuscate (hide) the variance in the data, for example it ca be hard to detect values that have one extra digit (eg. Compare 100,000 to 1,000,000). To reduce the mind-numbing process of examining table values it is useful to introduce color to speed up the report comprehension process. Many managers still want tabular reports of values, so use conditional formatting to bring attention to key data points. Keep in mind however that 15% of north americans are color blind, and colors to not differentiate when printing with black ink. Be sure to use icons that are different shapes as well as different colors. -5080-2216785003380740000Placing the famous stoplights in a cell (red, yellow, green) to categorize the value in the cell into one of three groups is similar to using a SQL CASE() or NTILE() statement to place a term into a new column. Both are used to categorize data.ConnectionA connection is a wireless link to a database. To create a connection you need to specify the server, an authorized username and password, and the name of the database that you have been granted access to.When making a new report, pivot chart, or dashboard, you first need to specify the connection to the data source(s) before writing a query to retrieve data.Crosstab(aka cross tabulation)-311153365500A crosstab is a type of report that can compile data based on two dimensions. This is similar to a pivot table in Excel.Crosstabs are powerful in that they can reduce the granularity of data into well received tables using the dimensions listed. In SQL the PIVOT () function builds a crosstab. In SSRS you create the report using the crosstab (aka matrix report) tool, and when the report is generated, the data is compiled for you (SSRS runs the PIVOT() command for you. A limitation is that only one measure can be summarized (cross tabulated) at a time, meaning that reports would often have several crosstab tables within them. Analysts often use conditional formatting in a crosstab report to draw attention to highlights and problems In some report writing software’s such as Excel you can swap out the dimensions easily enabling the analyst to quickly organize the data along important dimensions. In other report writing software such as SSMS, the row and column dimensions must be specified in advance and are not easily changed, for example a different stored procedure that uses a GROUP BY procedure must be run.Crosstabs are a standard reporting tool, but can be also be used as just one step in the data compiling process. For example you can condense transaction data into a crosstab, which then is merged with data from another data source. When trying to compact data and reduce its granularity, crosstabs can be used in very powerful ways.DashboardSimilar to the dashboard of a vehicle, an analytics dashboard provides indicants of system performance in simple to consume graphics such as a gauge, thermometer, column chart, stoplight, or other indicant. The data is high-level such as the red/yellow/ green stoplight indicant (or thumbs up/down), that alerts the manager to out of tolerance conditions.A dashboard is a collection of indicators on one screen. They should be connected to more detailed reports on each phenomenon. The dashboard can be as simple as the first tab of the PowerBI file that shows small, but direct summary data. Other tabs can provide graphics that monitor the phenomenon at a more detailed level.DatasetWhen you specify a query to run over a connection then the results that are returned is a table of data, or a set of tables organized into a data model called a dataset or resultset (ie set of results).An example in usage is the analyst asking the question of the operations manager before they start a query session, ‘Do we need any more dimensions in our dataset, before we begin the Tableau dashboard’?Data sourceThe datasource refers to the names of the server and may more generically refer to the names of the databases, tables where needed data resides. The analyst typically asks the DBA what data sources are available (internally or externally) when they are gathering data for analysis.Alternatively when building a report or visualization the software often asks you a) where is the data (what server) b) specify the query that builds the dataset, c) before making the set of visualizations. Data CleaningAll reporting and analytics is dependent on accurate data. Ideally operational systems ensure data entered as a result of business transactions is complete and accurate, but this goal is a moving target. The DBA often has to clean data when it is gathered, and merged with other data. Data must be cleaned and checked for valid entries such as data types, data ranges, and acceptable values (eg. US not USA).Operational data should be verified and cleaned when transferred to the data warehouse during the ETL process. The DBA that cleans data and sees repetitive errors should discuss this with the programmer/DBA’s in charge of the transaction processing systems to root out the source of the errors, thus saving time and effort. This activity is an example of continuous improvement of IT systems.Data Transformation115830311875760It is easy to simplify the discussion of data transformations and focus on the ETL acronym and discuss only its role as step 2 in the ETL process that prepares and moves data from operational systems to a data warehouse. Data transformation however is at the core of the data management and analytics process.From analytics requires the source data to be processed, verified, cleaned and transformed from mountains of disorganized data compiled down to column and line charts, maps, yes/no binary decisions etc. Transforming data from one state to another is at the center of data management activities. Large amounts of organizational is transformed regularly by DBA’s. Two benefits of having DBA’s manage organizational data include:a) they will typically find a way to automate the process, reducing labor effort, increasing reliability of data access, use SQL, rolled into stored procedures, that together are used to automate the data transformation process (such as performing the ETL nightly at 2 am) b) they will instill a system of quality control. Data transformations can also be performed in Excel with the copy/paste/edit process (hard to automate), and as shown below PowerBI has many useful data transformations possible. From one perspective tables of data are built, populated with accurate data, from another perspective raw transactional data are compiled into more easily consumable (i.e. Mochi) 3788743-63479900Many data transformations occur both prior and during the analytics process. The final dataset generated for analysis or the final set of database tables that store operational data into a data warehouse have both been processed and transformed. One main aspect of data transformation is that new columns of metrics are calculated and added to datasets. Thus new columns of insightful information can be derived from other columns of numeric data. The added columns of metrics can be used to automate decision making and dashboard updates. For example:a) records are categorized, and segmented into groups, b) rankings and rating are performed, c) data is cleaned and made more accurate, d) data is translated from one language or currency to another, e) data from different sources (internal & external) is merged, e) data is stored in agreed upon standards (i.e. 2 letters for states, 3 letters for country), f) column names are changed into agreed upon standards, g) errors in spelling are removed, h) data is moved from one server to another (or integrating data from different servers), i) data measures are summarized, j) change the data type of the data (e.g. from numeric to text, to facilitate chart making or turnk) reduce the number of decimal values in the data (e.g. from 12.5839 to 12.59)l) to create more compact data values that are used in charts (e.g. Mar-09)m) even using a column or line chart to display granular data can be thought of as a data transformation.n) geocode data for reports (specify column for latitude/longitude)o) replacing values with other values, using built-in rulesp) etc., etc., etc.Data must be transformed from its messy, dis-integrated state and stored into a purpose driven data models, organizing the data for speed of retrieval during querying and reporting. If the data is stored in linked tables, then it can later be analyzed to investigate research questions.The ETL process moves data from different operational databases to a centralized data warehouse. This data movement occurs on a regular schedule, and data is cleaned and then transformed before it is copied into the data warehouse. Some data transformations that we utilized include:a) making new columns of data and time data such as with DATEPART()b) making new tables of condensed data such as the resultset from using a GROUP BY() or PIVOT () query c) make new columns by merging (ie FN & LN) or splitting apart words. d) change values of the dimensions such as collapsing data that is too granular such as turning age into generational ranges (20-35, 36-40 etc.) as with CASE() statements.e) pull columns together to build a hierarchy-36214-17077100Power BI has a rich set of data transformation tools such as removing duplicates from columns, renaming columns, splitting data, replacing values, merging data, appending new rows to existing tables, pivoting rows and columns, adding columns of parsed time/date data etc.DrilldownDrill-down is both an aspect of business communication and a feature in dashboards/reports. Conceptually drilldown means a change in level of analysis from a more summarized perspective to more granular perspective, such as switching level of analysis from WSU team rebounding statistics to individual player (Josh Hawkinson).Conceptually in business communications it is common for an analyst to present their overall findings, conclusion, or recommendation first. If the manager (recipient of communication) has time, they may request more detail or supporting analysis. This request for more detail is called drilling down into the data.In a report you can enable drill down on a chart, pivot table or map, for example enabling the report consumer to change level of analysis from country, to state to city. You must first set up a hierarchy in the dimension attributes (e.g. geography table) to enable drilldown. You typically drilldown to look at root causes of business performance.The opposite of drill-down (to more detail) is roll-up (to generate a summarization on a dimension). For example if you are looking at performance of individual machines in a work cell, you may want to roll-up the analysis to present a summary for the work cell or factory.Exception ReportingThe process and output of reports that bring attention to out of tolerance conditions. Rather than the manager needing to track and look for operational problems, the report is set up to push this data to the manager.For example a report can be set-up to highlight machines that produce more than 1% errors per day. This report would drive managerial corrective action such as maintenance and recalibration of the machine. Exploratory AnalysisAn analysis session where the manager or analyst is poking around the data, trying to make sense of it, looking for problems, trends, insights and highlights.The wise analyst is lucky if they have the reaction, “hmm that’s interesting” and more importantly that they perform follow-up analysis. These insights make the job interesting. Often the reports made in an exploratory session to learn about a phenomenon, discover business aspects worth monitoring. Therefore it is common that exploratory reports turn into productionized reports.HierarchyA hierarchy in analytics refers to levels of a dimension that could each be used to compile data at a different level of detail (ie granularity). For example hour, day, week, month, quarter, year is a time/date hierarchy. Many dimensions are hierarchical, and it is wise to create them to allow the report user to drill-down or roll-up the data. Can you think of others?If you want to build a hierarchy in a dataset you may need to use a SQL query with several INNER JOINS to pull columns from different tables together. In the time/date example you would use different DATEPART() formulas.It is natural for a manager or analyst to want to zoom down (or up) to see different levels of analysis when they are exploring a problem using a dashboard/report. Maybe they see a problem and want to find more detailed information to see root causes (such as drilling down to specific productID’s or stores). Alternatively the manager may see a specific problem and then wonder how widespread it is and zoom out to more summarized data (eg. Product line or region).The prudent analyst uses their empathy and detective skills and stays a step ahead of the report requester and builds the hierarchies in the data and makes the drill-down/roll-up features available in their report, dashboard or visualization. The report requester is going to ask for reports at different levels of detail, so why not just give it to them before they ask. Sure it will take extra time, but your care will be noticed.Hierarchies also make it possible for the analyst to publish one report that can be used by different levels of management, rather than publish more reports. Fewer reports are easier to track, manage and update.ParameterIn analytics a parameter is a criteria used to dynamically filter a report (e.g. month, or product line). The prudent analyst provides a list of values that the program user can dynamically choose from (as compared to hard-coded filter in a report such as state = ‘WA’) The analyst can enable the report to be filtered on one or several criteria. Parameters allow one report/dashboard to be used by many different individuals (e.g. different states).Parameterized queryA query that allows a filtering criteria (e.g. state) to be dynamically provided by the report/dashboard user in live mode during analytical sessions. This means that the program user selects one or more criteria (e.g. state or time period) and these values are passed into the SQL query when it is run. In SQL the @ sign designates a variable, so the parameter (e.g. @p1 or @state) is a variable that gets its value from the report user, and the value is passed into the query, which then retrieves the dataset, and refreshes the chart, map or report.In SQL terms to implement the dynamic filtering you could use something like WHERE State = @p1 (p is short for parameter) or WHERE State =@state (here you name the parameter the same as the column name which reduces a lot of confusion.)ProductionizedReportLiterally a report in production, as in it is produced on a regular basis. Many managerial reports are end of day or weekly/monthly summaries. These reports are automated, therefore their production and distribution (via emailed pdf, printed report, phone app, website, etc.) occurs with no human labor. Productionized reports auto-refresh on a regular basis and become a relied upon source of information for managers.Query editoraka QBEA graphical facility that assists query writing. For example in Access and SSRS youa) choose tables for a list, b) connect the tables by connecting their matching key fields, c) pick columns that you need to include in your resultset, d) set filters,e) set parameters (variables that accept filtering criteria from the report user)f) create calculated columnsg) rename columnsMost of the functionality you need for basic queries is provided. Writing the query this way however ties it to the report, and therefore the query is not portable and available to be easily used by other reports (in contrast to a stored procedure). Graphical querying seems more user friendly that typing SQL as it uses mouse clicks and does not require the typing of SQL. In fact the query editor is writing the SQL for you. IT is pretty easy to create queries but also easy to have a false sense of confidence and get the query incorrect, so check your answers, and resultset. You may think that learning the SQL is therefore not useful, but that isn’t true. Graphical query editors have limitations, and are useful only for simple data retrieval and presentation tasks (perhaps half of the reports are relatively simple). The pathway to higher paygrades, consulting gigs, and automation wizardry is SQL.Caution: You still need to understand data modeling and querying concepts whether you type SQL or use a graphical click-click environment.VariableA variable is a piece of RAM memory that has been prepared to accept one or more values from a program user. The variable is named, has a datatype and is ‘scoped’ set to be used locally in one report or query or increasing levels of system-wide access. In SQL variables start with an @ as in @employeeID. ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download