Washington State University



Building Data Structures in the CloudPurpose: This project extends our webpage’s ability to store data from the business transactions. Here we introduce and leverage arrays. Think of arrays in your webpages as one or more tabs of an Excel spreadsheet. Data is organized into tables, like an Excel table. Rather than save the data into a local .xslx file though on a local machine where only 1 person can access the data, here webpage technology and arrays together store the tabular data in the RAM of the webserver. In later modules we transfer the in-memory data to a cloud-based SQL Server database for permanent storage. Arrays are a useful part of the data storage process as they are used to collect, store and modify the data before it is saved to the cloud database.History:Arrays are tabular shaped data structures stored in the RAM (primary memory) of the computer. Arrays have been used by programmers and every programming language since 1945. Arrays allow you to store and manipulate data, lots of data. Early computer scientists used arrays in Fortran, COBOL and C back in the day of big box IBM mainframe computing. Array structures are stored in the RAM of the computer and used for data processing. Data was fed from secondary storage into RAM and then the compiled data results were saved back to database and printed in reports. It used to be extremely expensive to manufacture and purchase RAM. However data processing speeds in RAM are by far the fastest option so many data management and reporting procedures (that run entire corporations) were based on a combination of storing data on tape and disk in databases, then when data processing, management, analytics, reports or visualization were needed the needed data was transferred from the slow but cheap tape, and disk storage to RAM. So data was fed into arrays in RAM and then manipulated, and compiled according to business needs. Using arrays makes it very easy to make categories and hierarchies in the data, useful calculations, and report writing. You compile the data into useful summarizations (see figure 2) that are then stored back into databases saved onto disk storage. It would be common to summarize 10,000 detail data records into 20 rows of data. Often business decisions are facilitated by looking at compiled and aggregated data. Often just by putting data into categories, allows you to see problems and opportunities. So arrays have revolutionized business analysis and reporting, and continue to be an important topic for new students to investigate while on their path towards data mastery.left184975500Motivation: It is relatively easy to manipulate data once it is in an array. This is the data processing and data management that you have heard of. The data management skills you can learn and develop here (and in self-guided study) will make you understand many business processes. A good report of business patterns can make a company attain competitive advantage in the marketplace. Modern business managers need exceptional data management, data analysis and data visualization skills.Detail: Data structures (aka. arrays, matrices, and datatables) can hold data records in the RAM of the webserver or in the RAM of the local PC. Arrays are tabular grids of memory similar to an Excel spread sheet (shown in the figure 1 below). Just like Excel, every cell in the structure, every row, or every column can be manipulated, and updated. The array organizes attributes about a business transaction into columns of a data structure (same as you would with excel when you make column headings). Each row then records the attributes about a single business transaction, such as date, quantity, employee, or price. Each attribute is stored in a column for that new row in the columns. The data stored can be neatly organized into rows of business transaction data (as in Figure 1) or compiled, converted, and formatted into derived summary tables of data (as in Figure 2). 165100142240Figure SEQ Figure \* ARABIC 1 Using datatables to store detail transaction data00Figure SEQ Figure \* ARABIC 1 Using datatables to store detail transaction dataThe business-minded programmer/analyst can parse out data from a column in an array and add new columns of derived data, either categorical, date/time, or measures. You can also split columns or merge columns together. You can put records into groups such as quartiles. These are all value-added procedures that get you paid. Implementation Plan Similar to database design, First create the table as a Public Shared object. Next add a page initialization procedure that sets up the datatable with columns making it ready for data entry. Specify the column names, datatypes, and any default value or property (such as auto-number). Next write a procedure that pulls the data values from the controls on a webpage and store the transaction data (such as labor payroll records) into a new row of the datatable.During the process of adding the new row of data, you can create your own new columns of data that are derived from the values provided by the program user. The new derived columns facilitate analytics and insight. For example you can a) parse out useful date dimension information such as day of week, b) you can perform timespan calculations, and c) you can perform calculations on the numerical measures, such as subtracting or adding to inventory values. By adding many columns that are later used in dashboards, chart, maps and reports, a lot of preparation for business analytics occurs at this step. If you can create columns that measure key performance indicators for improved marketing and operations analysis (for example), you will gain respect and be helping your company.20320050419000Often when adding the new row of transaction data (i.e. figure 1) you refresh different summary tables by updating values in one row (ie updating the totals when recording labor for one more electrician). So it is common to update the running totals in the summary table after each transaction. It is common that summary tables are the data source for a dashboard chart, you will learn that in a later module.3365501665605Figure SEQ Figure \* ARABIC 2 Using datatables to store and update summary data00Figure SEQ Figure \* ARABIC 2 Using datatables to store and update summary dataAfter storing the rows of data into the transaction table (aka fact table in data warehousing terms) next you turn your attention to updating the data. You add different procedures that update one or more rows in the datatable, such as updating the inventory level of a specific product sold such as the prior bookstore module. In the app below, one record is updated every time a student studies a few more hours for their upcoming exam. Conceptual – not in current module: Often the next step is to compile data into useful summarized and categorized reports. You can add more arrays to your website at different levels of summarization (such as labor grade in figure 2 above). You build new tables that categorize, group and summarize the data at a higher level of organization, such as going from individual employee to job category, or grand totals (as in figure 2 above). Also new datasets are often merged with existing data to derive new insights, and reports. This capability is usually reserved only for talented DBA’s. Higher level managers need reports at different levels (such as store, city, state, region, country) and time periods (such last week, month to date, month-over-month analysis, year-to-date, moving average). Business performance data must be compiled at different levels. Data is organized and compiled according to management’s key performance indicators (i.e. TOP 10 reports). This work can be performed in arrays, and transferred to SQL database tables. If you have the data in a SQL database already you can easily use SQL GROUP BY() and PIVOT() queries to summarize data as in ETL processes that move transaction data from different POS terminals and operational data stores to a data warehouse (see above footnote for more on this ETL topic).Conceptual – not in current module: From arrays to database tables: After data is compiled and compacted into different data tables, then it’s a one line of code to save array data into a SQL Server database. It is common to summarize transaction data, compile and build tables of summary data, by calculating column totals, row totals, counting the frequency of events, averaging, or summing different numerical columns. At some point you need to save data into the cloud based database. Cloud-hosted databases are the foundation of up to the minute analytics. Detail data or summary tables can both be analyzed with Excel pivot charts and pivot tables. When performing analysis, be sure to think about the level of analysis you need, such as individual, store level, job category, company-wide, regional, etc.). Here is a key concept you need to understand in this module:To add a number to column 5 for the first row in your datatable (the first row is row 0) you could use this code: arraytableName.rows(0).item(“Column 5”) += 4To take the value from a textbox control on a webpage and add that number to column 5 for the first row in your datatable you could use this code:arraytableName.rows(0).item(“Column 5”) += convert.todecimal(txtNumberHoursStudied.text)To take the value from a textbox control on a webpage and add that number to a nicely named column for the first row in your datatable you could use this code:arraytableName.rows(0).item(“ProgressColumn”) += convert.todecimal(txtNumberHoursStudied.text)Ok here is the magic228600-63500If you have the exams in a radiobutton list then you can identify the row to update by its row number. The index number of the data rows matches the index numbers of the radiobutton list items. The right row gets updated.arraytableName.rows(radiobuttonlist1.selectedindex).item(“ProgressColumn”) += convert.todecimal(txtNumberHoursStudied.text)In the code below notice that when you add a new exam (row of data) to the datatable you also add a new item to the radiobuttonlist which is displayed on the second view. Because the index numbers match, you can select the row to update in the radiobutton list as shown.-5787000Working with Dates and DataTablesImports System.DataPartial Class StudyAidCalendars Inherits System.Web.UI.Page 'this program created to give some ideas of the usefulness of the timespan variable and date calculations. The webpage is a first prototype of an app that can be used by students to track the number of hours they need to (and have) studied for their upcoming exams. May students do not keep track of their exam commitments, and specifically know what is the best use of their study time on any given day. Exam prep priorities will change as classes differ in their difficulty, and each student has their own strategy to prepare for exams. So this program must evolve to become more useful. This program is just a partial solution used for teaching. 'some suggested upgrades '1. Make the program data based '2. Add another column to give a textual feedback depending on progress to go - for example if > 70% of hours still need to attain goal the message should read something like "hurry up". Alternatively of there are just < 20% of hours left to hit the study goal the message could read "one more study session needed almost done '3. A gauge or red/yellow/green indicant or picture can be shown to alert the student about progress to goal. Red would mean danger, green would mean - reached the study goal. '4. The system could recommend to the student what is the best usage of their time (after identifying which exam is coming quickest, and seeing where the study performance to goal is in the danger zone. '5. Can you think of other improvements? Public Shared tblExams As New DataTable#Region "Save Record" Protected Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click 'when the program user has a new exam to study for they use this procedure to add a row to a datatable, so that it can be tracked. This procedure creates a new blank row then fills the colunms of that row with values from the form. The datarow added actually knows and has the same data table schema as it is a new row of the table specified (so the datatable knows what columns it has, the datarow knows the datatypes it should be receiving, and it knows any default values). The schema of the table is passed into the new row. If txtExamDate.Text = Nothing OrElse txtHoursRequired.Text = Nothing OrElse txtName.Text = Nothing OrElse IsNumeric(txtHoursRequired.Text) = False Then Response.Write("check all data entry") Exit Sub End If Dim dr As DataRow = tblExams.NewRow 'we want the date to be somewhat formatted nicely so we store it as string, but we need to calculate and display how many days it is in the future so we also convert it to a date format to include in a timespan calculation. Next the date is pulled from the textbox and converted to a variable with a date datatype. Dim dateExam As Date = DateTime.Parse(txtExamDate.Text) 'This is the timespan variable that can hold the results of the calculation between two dates. Here the number of days in the future the exam is from today is calculated. In a line below you can see the ts.totaldays function is used to format the calculated result into the number of days including decimal, portions of a day (e.g. 5.2 days based on time of day) Dim ts As TimeSpan = dateExam - Today() 'now take the values from the form and put them into different columns of the new blank row dr.Item("ExamName") = txtName.Text 'verify that this next field is a string not a date. The datatype of the column is a string, so that it can be formatted nicely. To explicity use a north american month/day/year formatting we can take the value out of the textbox, convert it to a date variable (see line 16 above). Here we format the date the way we prefer using the .tostring formatting dr.Item("ExamDate") = dateExam.ToString("MM/dd/yy") 'this is another very helpful formatting option of .the tostring built-in function. We can pull out the day name from the date dr.Item("DayofWeek") = dateExam.ToString("ddd") 'this next field is a decimal column that we store a value such as 6 days until the exam dr.Item("DaysUntilExam") = ts.TotalDays 'since the exam is a new row in the table the number hours required and remaining are both the same. The next procedure is used to update the number of hours remaining after recording some hours studying dr.Item("TotalHoursRequired") = Convert.ToDecimal(txtHoursRequired.Text) dr.Item("NumberHoursLeft") = Convert.ToDecimal(txtHoursRequired.Text) 'now that the necessary fields are filled in, we add the new row of data to the datatable and then display the data in a gridview control tblExams.Rows.Add(dr) GridView1.DataSource = tblExams GridView1.DataBind() 'view #2 will be used to update the # hours studied so far for the exam. We add the test name to a radiobuttonlist on the second view. rblExams.Items.Add(txtName.Text) End Sub#End Region#Region "Update exam progress" 'Here we update the one row of data that needs to be updated.0 Protected Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click 'the ExamDate column is stored in a string variable (so it could be formatted nicely) so we need to convert it to a date format so it can be included in a timespan operation and the column updated. 'Meaning for these next two lines: take the value from the examDate column, convert it to a datetime format and assign it to a date variable. (Convert.todatetime works for date or datetime variables). Next create a timespan variable and set its value to the result of the calculation exam date - today. The result of this calculation can be formatted using the .totaldays formatting into a number such as 5.5 days. If rblExams.SelectedIndex = -1 OrElse txtHours.Text = Nothing OrElse IsNumeric(txtHours.Text) = False Then Response.Write("Check data entry") Exit Sub End If Dim dtExam As Date = Convert.ToDateTime(tblExams.Rows(rblExams.SelectedIndex).Item("ExamDate")) Dim ts As TimeSpan = dtExam - Today() Dim decProgress As Decimal With tblExams.Rows(rblExams.SelectedIndex) .Item("DaysUntilExam") = ts.TotalDays 'this next column needed to have a default value of 0. Here is incremented to record the number of hours studied. .Item("HoursCompleted") += Convert.ToDecimal(txtHours.Text) 'this next formula just keeps track of how many hours are left to study. We can use the value in the Hours completed column because we just updated it. .Item("NumberHoursLeft") = .Item("TotalHoursRequired") - .Item("HoursCompleted") 'we use a decimal variable to peform a division, then format that result into a string to format it nicely. The Progress column is a string becuase divisions always give you 20 decimal places. If we convert to string we can specify how many decimal places are displayed. decProgress = .Item("HoursCompleted") / .Item("TotalHoursRequired") .Item("Progress") = decProgress.ToString("P1") End With 'now show the updated data for prpgress to exam. GridView2.DataSource = tblExams GridView2.DataBind() End Sub#End Region#Region "Create table" Private Sub NewFor2018_StudyAidCalendars_Init(sender As Object, e As EventArgs) Handles Me.Init 'Create the columns for the datatable. Notice that the Exam date and Progress columns are set to the datatype string. This is done only to improve the formatting of the column values in the gridview. The date datatype wants to add the time (here not needed or captured) and the progress field is a calculation, specifically a division. There is no easy way to restrict the formatting to 2 decimal places when you have a number (please suggest solutions you may discover) so the calculation is performed and then stored as a string using the string formatting .tostring("P1") If tblExams.Columns.Count > 0 Then Exit Sub 'ensure the datatable is creaed only once With tblExams.Columns .Add("ExamID", GetType(Integer)) .Add("ExamName", GetType(String)) .Add("ExamDate", GetType(String)) .Add("DayofWeek", GetType(String)) .Add("DaysUntilExam", GetType(Decimal)) .Add("TotalHoursRequired", GetType(Decimal)) .Add("HoursCompleted", GetType(Decimal)) .Add("NumberHoursLeft", GetType(Decimal)) .Add("Progress", GetType(String)) .Add("ActualGrade", GetType(String)) End With 'the exam ID is set to an auto-number field using this code With tblExams.Columns("ExamID") .AutoIncrement = True .AutoIncrementSeed = 1 .AutoIncrementStep = 1 End With 'the HoursCompleted column is a running total so it needs to be set to 0 so that it can be incremented using a += formula. With tblExams .Columns("HoursCompleted").DefaultValue = 0 End With End Sub#End Region#Region "Delete an exam" Protected Sub Button5_Click(sender As Object, e As EventArgs) Handles Button5.Click 'This procedure deletes a row from the table. First make sure an exam was selected in the radiobutton list. If rblExams.SelectedIndex = -1 Then Response.Write("Select an exam to delete") Exit Sub End If 'first remove the row from the table. The code reads table.rows(row to delete).delete that row. The .delete function deletes the row. The table(rblExams.SelectedIndex) specifies which row to delete. Its great that both the radiobuttonlist and the rows collection of the table both have an index to identify their number. The index of the row or item in the radiobuttonlist (or any list control) starts at 0. tblExams.Rows(rblExams.SelectedIndex).Delete() 'now that we have deleted the row from the table, we delete the item from the list collection of the list control (e.g. delete the item from the radiobuttonlist) rblExams.Items.RemoveAt(rblExams.SelectedIndex) 'while we delete the row from the datatable, the gridview is not updated until you bind it again (after the row is deleted). GridView1.DataSource = tblExams GridView2.DataSource = tblExams GridView1.DataBind() GridView2.DataBind() End Sub#End Region#Region "Utilities" 'These next two procedures are used to clear the data entry controls Protected Sub Button3_Click(sender As Object, e As EventArgs) Handles Button3.Click txtHours.Text = Nothing rblExams.SelectedIndex = -1 End Sub Protected Sub Button4_Click(sender As Object, e As EventArgs) Handles Button4.Click txtName.Text = Nothing txtExamDate.Text = Nothing txtHoursRequired.Text = Nothing End Sub 'these next two procedures allow the human to switch between sections (iews) of the program Protected Sub LinkButton1_Click(sender As Object, e As EventArgs) Handles LinkButton1.Click MultiView1.ActiveViewIndex = 0 End Sub Protected Sub LinkButton2_Click(sender As Object, e As EventArgs) Handles LinkButton2.Click MultiView1.ActiveViewIndex = 1 End Sub #End Region End Class ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download