Process Analysis and Modeling



Mathematical background

In this chapter we will briefly review some elementary concepts in mathematics that will be used frequently in the text later.

1 Calculus

The branch of mathematics that investigates the change in quantities is called calculus. For the study of change, calculus brings the mathematical structures such as functions and coordinate systems and operations like derivation and integration into our skill set. Some fundamental theorems tie these structures and operations together to form the mathematical foundation to study the change in variables. Let’s start our review with the definition of function.

1 Function

A function is the mathematical description that shows how an independent variable is translated into a dependent variable. In a strict mathematical sense there has to be a one-to-one relation between the independent and dependent variables for a relation to be called a function. However, this condition is relaxed in some engineering applications. Usually this a functional mapping is shown as in Figure 1.

Figure 1 A general function and its inverse

The pool of available x values, independent variable values, form the domain of the function. The mapped f(xn) values, dependent variable values, constitute the range of the function. If the function f(x) follows the strict mathematical definition there will be an inverse function ƒ-1[f(x)] that maps the range back to the domain.

In most cases it is a difficult task to obtain the inverse relation for a functional mapping. However, there are widely used mathematical transformations such as Laplace where we have tabulated inverse relations for our convenience. As we will see in later chapters, these transformations are great tools that enables the analysis of complex chemical systems.

2 Coordinate systems

Many chemical engineering problems, if not all, are analyzed with their geometrical features in consideration. For representing a point in space we need a reference so we can relate the specific point to the reference system and fix its position in space. In engineering science a coordinate system is used for this purpose.

A coordinate system is formed by the selection of linearly independent basis vectors that span the coordinate system. Then any point in space can be represented by a linear combination of these basis vectors and simply expressed as pairs for planar and as triplets for three dimensional geometries. The most common coordinate system is the Cartesian coordinate system, where the basis vectors are three unit vectors orthogonal to each other for three dimensional space.

Depending on the system’s symmetry, the analysis may be greatly simplified if another coordinate system is selected. The other two common coordinate systems in chemical engineering are the cylindrical and spherical coordinate systems. It should be noted that different coordinate systems mean only different referencing methods for the same point in space. It is possible to convert a geometry from one coordinate system to the other. The relation and basic features of the three coordinate systems mentioned are shown in Figure 2.

Figure 2 The Cartesian, cylindrical and spherical coordinate systems

3 Derivative

Earlier we defined what a function was. We also defined the different coordinate systems these functions might have based on. Now, it is time to define the operation that would give the sense of change in a function. This operator is called differentiation.

Very often engineering problems are expressed in terms of the changes in various variables using differential equations. In later chapters we will discuss them in greater detail. However, for now let’s just give a very brief idea about the geometrical meaning of derivative and the relation between difference notation and differential notation as described by (Mickley, H. S., Sherwood, T. K., and Reed, C. E. 1957).

Assume that there is a continuous function y=f(x). Geometrical meaning of derivative of this function at x is the slope of curve at the same x value. Hence, the derivative, dy/dx, can be approximated by the following formula

[pic] (2.1)

The exact definition is obtained when the difference Δx is let to approach zero. In the limit we have the definition of derivative as

[pic] (2.2)

The relationship between the approximate and exact quantities are shown in a figure adapted from (Mickley, H. S., Sherwood, T. K., and Reed, C. E. 1957).

Figure 3 The relationship between differences and differentials

4 Partial derivatives and total differential

In the previous section the definition for derivative is given for a case where we have only one independent variable. The generalization of derivatives to cases where there are more than one independent variable involves partial derivatives and have a similar physical meaning. The difference is that the other independent variable is held constant when taking the partial derivative. Consider a function f(x,y). The partial derivatives of f(x,y) with respect to x and y are

[pic] (2.3)

[pic] (2.4)

With these definitions, the total differential df can be written in terms of the two partial derivatives as

[pic] (2.5)

This equation can be generalized for a function with n independent variables x1-xn as follows.

[pic] (2.6)

5 Chain rule

In many modeling situations the dependent variable of an equation is an independent variable in a different equation. Let’s derive the general differentiation equations for this common case. Consider now two functions f and g. Function g has the independent variable x. Function f has g as its independent variable. Symbolically, this is expressed as: g(x) and f(g). Our goal is to determine the rate of change of f with respect to x. Applying the definition of differentiation to our case we get

[pic] (2.7)

The function g(x+Δx) can also expressed as g+Δg. With this substitution, the equation becomes

[pic] (2.8)

Now, let’s multiply the argument of the limit expression with unity expressed as Δg/ Δg

[pic] (2.9)

Rewriting the equation by exchanging the places of Δx and Δg makes

[pic] (2.10)

The difference Δg can also be expressed as g(x+Δx)-g(x)

[pic] (2.11)

The “chain rule” of differentiation is obtained by comparing equation 2.11 to the definition of derivative

[pic] (2.12)

6 Integral

In some engineering problems the rate of change in a relation, its derivative, is known and the total accumulation due to this rate is looked after. Let F(t) denote the functional relationship of some total amount at time t. The amount accumulated between times t0 and tn would be

[pic] (2.13)

If the interval [t0,tn] is divided to with (n-1) equal length Δt, the accumulated amount can be expressed as

[pic] (2.14)

By multiplying the above expression by unity expressed as Δt/Δt the equation becomes

[pic] (2.15)

As the difference Δt approaches 0, the terms (ΔF/Δt)i becomes the derivative dF/dt at that t value and the equation itself becomes the definition of integral of dF/dt from t0 and tn.

[pic] (2.16)

7 Fundamental theorem of calculus

Equation 2.16 connects a function to its derivative through an integral expression. It is a very important result and forms the basis of the fundamental theorem of calculus shown below.

T Fundamental theorem of calculus

If f(x) is a continuous function in the interval [a,b], then the following two assertions are true

1. If f(x) is the derivative of a function F(x), then the following definite integral should hold

[pic]

2. If the following indefinite integral holds

[pic]

then

[pic]

8 Rolle's theorem

The fundamental theorem of calculus will enable us to develop a very practical result for graphical representation of integrals and numerical integration in general. However, first we need to appreciate one important property of continuous and differentiable functions stated by Rolle's theorem below.

T Rolle's theorem

Let f(x) be a continuous function on a closed interval [a,b] and differentiable on its interior (a,b). If f(a) = f(b) then there is some c in the interval (a,b) such that f'(c) = 0.

9 Mean value theorem

Equation 2.15 enables us to make a very useful graphical interpretation of integration operation.

[pic] (2.17)

The difference quotient in equation 2.17 is a constant approximation to f(t) in the first part of mean value theorem as depicted in Figure 4. The staircase shape has a constant value for each interval of Δt and the product of (ΔF/Δt)i* Δt becomes the area of each rectangle bounded by the base Δt and height (ΔF/Δt)i as illustrated by the blue rectangle. Hence, the summation in equation 2.17 becomes the sum of the rectangular areas. As Δt approaches zero, the equation becomes the definition of integral and the sum of the rectangles equals the area under the curve f(t).

The graphical interpretation of integral leads to an important theorem of calculus, the mean value theorem: if a function f(t) is drawn in an interval [t0,tn] there has to be a rectangle shown by the yellow rectangle with base t0-tn as and height bounded by the minimum and maximum values of the f(t) in that interval that would be equal to the area under the curve in the same interval.

Figure 4 f(t)=dF/dt vs. t plot to illustrate integral graphical interpretation

Below you can find the mathematical formulation of the mean value theorem.

T Mean value theorem

Let f(x) be a continuous function on a closed interval [a,b] and differentiable on its interior (a,b), then there is a number c such that

[pic]

10 Taylor’s theorem

The mean value theorem leads to another valuable result. It enables us to estimate any point of a continuous function from a another point on the function and its first n derivatives. The theorem’s mathematical definition is as follows:

T Taylor's theorem

Let f(x) be a continuous function and has its first (n+1) derivatives defined as f', f'',..., f(n + 1) on [a,x], then for every x in the interval

[pic]

where

[pic]

for some c in the interval [a,x].

In the above theorem, Tn(x) is the Taylor polynomial of degree n. Rn(x) signifies the remaining error term. The Taylor polynomial provides us with an equation that can approximate any function with a predictable error term. In practice, usually the first few terms of the polynomial expressed in its open form in equation 2.18 is sufficient for reasonable approximation.

[pic] (2.18)

The Taylor function where x0=0, the function has another name, Maclaurin function. In essence, both functions provide us with a relationship that can approximate any continuous function with various level of accuracies. Maclaurin function does the approximation about the zero point, whereas Taylor function can approximate about any point on the function. Let’s analyze the various level of accuracies provided by Taylor function expansion of f(x)=e-x about x0=1.

The first n derivatives of e-x are expressed as

[pic] (2.19)

The Taylor polynomial Tn(x) is written as

[pic] (2.20)

For the case where x0=1 the Taylor function becomes

[pic] (2.21)

Figure 5 shows various level of approximations for the e-x function about the x0=1point. With n=5 we can get a very good approximation for any value in the [0,3] interval. To accomplish this we used only the first 5 derivatives of e-x and the function value at x0=1. This will become handy in the solution of many engineering problems.

[pic]

Figure 5 Taylor approximation of e-x about x0=1

2 Elementary data manipulation

Chemical engineers frequently manipulate data in order to gain insight to a certain process. The data either comes from experiments or from tabulated data in handbooks or scientific journals. Now we will briefly discuss the methods interpolation and extrapolation that are commonly employed in chemical engineering analysis. For a detailed discussion on various interpolation techniques the following reference is a valuable resource (Hamming, R. W. 1986).

1 Interpolation

Let's assume that there is a series of dependent variable, y, values for a corresponding tabulated values of independent variable, x. Interpolation is finding the y value for a corresponding x value between xn and xn+1. There are three common methods for interpolation: linear interpolation and Newton's and Lagrange's methods. The first method assumes yn, yn+1, xn and xn+1 form a linear relationship. The latter two methods both assume that the data can be represented by an n-th order polynomial. Newton's method requires that the independent variable x has equidistant values. Lagrange's method does not have a restriction like that. Let’s start our discussion with Newton’s method.

1 Newton's method

If a tabulated x-y data can be represented by an n-th order polynomial then the following equation should hold

[pic] (2.22)

where x0-xn are the first n x values of the tabulated data. One can obtain expressions for the coefficients an by repeatedly substituting pairs from the tabulated data for x y values in the equation. The procedure results in the following first three coefficients a0-a3

[pic] (2.23)

Realizing that the x values are equidistant and equal to h, the general equation is expressed as

[pic] (2.24)

With the coefficients from equation 2.24 substituted into equation 2.22 the Newton’s interpolation formula becomes

[pic] (2.25)

However, the working equation of Newton’s method employs another variable substitution

[pic] (2.26)

This is the ratio of current x value’s distance from first point to the constant distance between adjacent points. With this substitution Newton’s Method’s working equation becomes

[pic] (2.27)

Newton’s method works out nicely with the help of a difference table. Let’s generate data using a third order polynomial y=x3+x2+x+1 for sample calculation. As the Newton’s interpolation method is closely related to differentiation the difference table for a third order polynomial generated constant terms for the third differences. In general, for an n-th order polynomial we should get constant terms at the n-th difference level. If the data is experimental, then we need to define a criterion to assume that the difference is constant at a certain difference level.

Table 1 Difference table for data generated from y=x3+x2+x+1

|x |y |Δ1 |Δ2 |Δ3 |

|0.1 |1.111 | | | |

| | |0.137 | | |

|0.2 |1.248 | |0.032 | |

| | |0.169 | |0.006 |

|0.3 |1.417 | |0.038 | |

| | |0.207 | |0.006 |

|0.4 |1.624 | |0.044 | |

| | |0.251 | |0.006 |

|0.5 |1.875 | |0.050 | |

| | |0.301 | |0.006 |

|0.6 |2.176 | |0.056 | |

| | |0.357 | |0.006 |

|0.7 |2.533 | |0.062 | |

| | |0.419 | | |

|0.8 |2.952 | | | |

Let’s finish the example by interpolating the y value for x=0.24. For this example, the critical parameter values are as follows y0=1.111, h=0.1, x-x0=0.24-0.10=0.14, p=1.4 and n=3. The interpolated y value is

[pic] (2.28)

As we generated the data from a third order polynomial, the interpolated y value is equal to the exact value, y=0.243+0.242+0.24+1=1.311424.

For experimental data, or in general, for data generated from non-polynomial functions, it is important which part of the difference table is used during interpolation. In the example above, we used the differences in the upper diagonal with the corresponding x0 and y0 values. You can find more information on different strategies regarding the use of the difference table with respect to interpolation in Hildebrand’s book (Hildebrand, F. B. 1987).

Q.2.1 Find the corresponding y value for x=32 with Newton's interpolation formula as discussed in the example.

2 Lagrange's method

Like in the Newton’s interpolation method, Lagrange’s method assume that the tabulated x-y data can be represented with an n-th order polynomial. The form of the polynomial is as follows

[pic] (2.29)

where x0-xn are the first n x values of the tabulated data. Substituting the n tabulated x-y pairs into equation 2.29 the coefficients a1-an are found as

[pic] (2.30)

Substituting the coefficients the polynomial for Lagrange’s interpolation becomes

[pic] (2.31)

To illustrate Lagrange’s method let’s calculate the density of a 20% aqueous sodium chloride (NaCl) solution at 25 °C. The following data is given in Perry’s Handbook.

|x, % NaCl |12 |16 |24 |26 |

|y, density in g/cm3 |1.08365 |1.11401 |1.17776 |1.19443 |

Table 2 The tabulated density data for aqueous NaCl at 25 °C

Equation 2.31 for the above case is written as

[pic] (2.32)

The value listed in Perry’s Handbook but intentionally omitted from the Table 2 for 20% aqueous NaCl is 1.14533. The exact match suggests that either we got very lucky or the values in Perry’s Handbook for this property was fitted by a third order polynomial and displayed as tabulated data.

Q.2.2 Find the density of 18% aqueous NaCl with Lagrange's interpolation formula as discussed in the example.

2 Extrapolation

Extrapolation refers to finding a corresponding y value for an x value beyond the available data range. The same interpolation equations can be used for extrapolation as well. However, it is not recommended to use extrapolation considerably farther than the endpoints unless there is supporting theoretical explanation for the validity of a polynomial model.

3 Elementary linear algebra

In the study of engineering problems, frequently certain quantities appear as significant ordered rectangular arrays. A common example is the representation of imaginary numbers a+bi as (a,b). Another example is the expression of a point in the Cartesian coordinate system as (x,y,z) (Eves, H. 1980). Operations we can perform with these and many other ordered rectangular arrays of numbers have some very similar characteristics. The similarities enable us to look at the analysis of these mathematical structures, matrices, with a great degree of generality. However, first we need to make some important definitions.

1 Definitions

There are two elementary mathematical structures we will introduce here, ring and field. These structures will help us to make a definition for a vector space. It will turn out that an ordered rectangular number array, a matrix, is nothing more than a way of representing the linear transformation for the vectors in its corresponding vector space.

T Ring

A set R with elements a, b and c and two operations addition, +, and multiplication, [pic], is said to form a ring if the following conditions are true:

1. R is closed under addition and multiplication

The operations [pic]and [pic] result in uniquely defined elements in R

2. Associativity laws for both operations

[pic]and [pic]

3. Commutativity law for addition

[pic]

4. Additive identity element

[pic]

5. Addiditive inverse element

[pic]

6. Distributivity law of multiplication over addition

[pic] and [pic]

With the help of the definition for a ring, let's explain what a field is:

T Field

A ring R with the following properties is called a field F:

1. Commutativity law for multiplication

[pic]

2. Multiplicative identity element

[pic]

3. Multiplicative inverse element

[pic]

T Vector space

A set of objects called vectors is said to be a vector space V with elements u, v and w over a field F with elements a and b, if operations vector addition, [pic], for all u, v in V and scalar multiplication, [pic] for all u in V and a in F, the following ten properties hold:

1. V is closed under vector addition

The operation [pic] results in a uniquely defined element in V

2. Associativity law of addition

[pic]

3. Commutativity law for vector addition

[pic]

4. Additive identity element

[pic]

5. Additive inverse element

[pic]

6. V is closed under scalar multiplication

The operation [pic] results in a uniquely defined element in V

7. Associativity law of scalar multiplication

[pic]

8. Identity element for scalar multiplication

[pic]

9. Distributivity law of scalar multiplication over vector addition

[pic]

10. Distributivity law of vector addition over scalar multiplication

[pic]

Now, we know what a vector space is. There is a special vector space called zero vector space 0v. In practice v is dropped and 0 is used to refer to the zero vector space. In our discussion our primary interest will be the vector space of n-tuples as [pic]where ai are elements of a field F. The zero vector in the vector space of n-tuples is represented by [pic]. Two common fields we will encounter as chemical engineers are the field of real numbers R and the field of complex numbers C. Please convince yourself that all the possible n-tuples formed with the elements from R or C are indeed in accord with the definition of a vector space of n-tuples.

There are two very common representation styles for the vectors in the vector space of n-tuples. We used one of them in our discussion, the n-tuple expressed with its element separated by comma and enclosed in parentheses as [pic]. The other common style is to express the elements in a column and enclose the column with brackets.

[pic] (2.33)

In this text, if the subject discusses only vectors, the parenthesis style will be used. However, in matrix operations we will opt for the style of brackets.

The linear transformations in a vector space V can be conveniently expressed as a rectangular array of symbols. The array of symbols is called a matrix as shown in equation 2.34. It is customary to use capital letters to denote matrices. The individual symbols a11, a12, …, amn are called the elements of matrix A.

[pic] (2.34)

The above matrix contains m rows and n columns. Therefore it is of order [pic]. Commonly, the matrix is referred as a m by n matrix.

2 Elementary special matrices

Here we will explore some of the elementary special matrices we will refer in text frequently.

1 Row matrix

A matrix with a single row of n elements is called a row matrix. Its order is [pic]. Sometimes it is referred as a column vector as well.

[pic] (2.35)

2 Column matrix

A matrix with a single column of n elements is called a column matrix. Its order is [pic]. Sometimes it is referred as a row vector as well.

[pic] (2.36)

3 Null matrix

A matrix where all elements has a value of zero is called a null matrix. Its order is [pic].

[pic] (2.37)

4 Square matrix

A matrix where the number of rows and columns are equal is called a square matrix. Its order is [pic].

[pic] (2.38)

5 Diagonal matrix

A square matrix where only the elements on the main diagonal has a value other than zero is called a diagonal matrix. Its order is [pic].

[pic] (2.39)

6 Identity matrix

A diagonal matrix where all elements has a value of one is called an identity matrix. Its order is [pic]. It is usually denoted by the capital letter [pic].

[pic] (2.40)

7 Upper triangular matrix

A square matrix where all elements has a value other than zero only on the diagonal axis and above is called an upper triangular matrix. Its order is [pic].

[pic] (2.41)

8 Upper matrix

An upper diagonal matrix where diagonal elements are zero as well is called an upper matrix. Its order is [pic].

[pic] (2.42)

9 Lower triangular matrix

A square matrix where all elements has a value other than zero only on the diagonal axis and below is called a lower triangular matrix. Its order is [pic].

[pic] (2.43)

10 Upper matrix

A lower diagonal matrix where diagonal elements are zero as well is called an upper matrix. Its order is [pic].

[pic] (2.44)

3 Matrix operations

We defined various mathematical structures in the previous section. However, in order to use them engineering problem solving we also need to define several mathematical operations to go with these structures. This section will introduce common operations that involve vectors and matrices. Let's start our discussion with matrix addition.

1 Matrix addition

Matrix addition between two matrices A and B is performed by adding the corresponding elements aij, bij of matrices. Therefore, this operation is only defined between matrices of the same order.

[pic] (2.45)

2 Scalar multiplication

The multiplication of a matrix A by a scalar k is performed by multiplying all the elements aij with k. This operation is defined for matrices of any order.

[pic] (2.46)

3 Inner product (scalar product, dot product) between a row and column matrix

The inner product of a row matrix U with order [pic]and column matrix V with order [pic] is defined as the scalar obtained as follows

[pic] (2.47)

Scalar multiplication and scalar product should not be confused with each other. Scalar multiplication has its name because a matrix is multiplied with a scalar whereas in the case of scalar product two matrices are multiplied with each other resulting to a scalar.

4 Matrix multiplication

Matrix multiplication between two matrices A and B is an operation defined only if the number of columns in the first matrix and number of rows in the second matrix are the same. In other words, the order of the matrices should be [pic] and [pic], respectively. In that case, the matrices are said to be conformable. Please note that matrix multiplication operation is not commutative in general. However, there are cases where AB=BA. In these special cases, the matrices are said to be commutable.

[pic] (2.48)

5 Determinant of a matrix

The determinant is only defined for a square matrix A of order [pic]. It is a rectangular array of elements identical to the elements of the original matrix enclosed by vertical lines as opposed to brackets. It corresponds to a single value. For a [pic] matrix, the determinant calculation is simple. For example, the following A matrix

[pic] (2.49)

has the determinant

[pic] (2.50)

However, to express a general formula for determinants of square matrices is a complex task. First, we need to review what a permutation is. A permutation is an ordered arrangement of n integers 1, 2, … , n. For n integers there are n! permutations. Say we have a permutation (i1, i2, … , ir , … , in) . The element ir is said to have m inversions for the cases where integers to its right are less than ir. If all inversions for the elements of a permutation are calculated and summed the result would be an odd or even integer. If it is odd the permutation is an odd permutation, otherwise it is even permutation. For all n>1, there are exactly n!/2 odd permutations and n!/2 even permutations for n integers.

Example

Say we have the following integers 1, 2, 3.

Number of all permutations = n! = 3! = 1*2*3 = 6

The permutations and their odd/even properties can be analyzed with the help of the following table.

Table 3 Odd and even permutations of 1, 2, 3

| |Inversions | |

|Permutation |for 1 |for 2 |for 3 |Total |even/odd |

|(1, 2, 3) |0 |0 |0 |0 |even |

|(1, 3, 2) |0 |0 |1 |1 |odd |

|(2, 1, 3) |0 |1 |0 |1 |odd |

|(2, 3, 1) |0 |1 |1 |2 |even |

|(3, 2, 1) |0 |1 |2 |3 |odd |

|(3, 1, 2) |0 |0 |2 |2 |even |

After this short discussion on permutations we can express a general formula for the value of a determinant. The determinant of a square matrix A is the sum of products of its elements over all permutations p=(i1, i2, … , in) of integers 1, 2, … , n. The value of ep is (+1) and (–1) for even and odd permutations, respectively.

[pic] (2.51)

For more discussion on determinants you can refer to (Eves, H. 1980). The general equation for a determinant of square matrices is not a very practical equation to visualize the calculation for small order matrices. However, we can use two commonly employed methods below.

1 Calculation of determinants with Sarrus Rule

For a [pic] matrix like

[pic] (2.52)

repeat the first two columns at the end of the matrix as illustrated below.

Figure 6 Sarrus rule for calculating the determinant of a [pic] matrix

Now multiply the elements along the colored lines and put a plus or minus sign in front of the product of elements along blue and red lines, respectively. The sum of the products is the determinant |A| of matrix A.

[pic] (2.53)

We could have reached to the same result with the general formula presented with permutations of one of the indices. To verify please examine equation 2.53. It has 6 permutations for the second index with integers 1, 2, 3 as (1,2,3), (2,3,1), (3,1,2), (3,2,1), (1,3,2), (2,1,3). Three of these permutations are odd and three of them are even. Therefore, the Sarrus Rule is a simple way of calculating the determinants of matrices of order [pic]. However, this method does not generally apply to higher order square matrices. Let's look to another method that is more general.

2 Calculation of determinants with cofactors

As a rule, each element aij of matrix A can occur only once in each term and (n-1)! times in the collection of terms of the equation defining its determinant |A|. Let's use equation 2.53 for discussing the details of this method. For a the[pic] matrix A, the following statements hold: n=3, hence n!=6 and (n-1)!=2. So each element aij occurs 2 times in the 6 terms defining the determinant. Let's select the elements which form a row or column in the original matrix and factor them out. After some algebraic manipulation we arrive at the following expression if the elements of the first row is selected.

[pic] (2.54)

A closer look at equation 2.54 reveals the fact that the elements in the parentheses form sub-determinants from all the elements not contained in the row and column where the factored element is located. For a better visualization, refer to the following illustration for the first term of equation 2.54.

.

Figure 7 Cofactor of the first element in a determinant of a [pic] matrix

With the notion of cofactor Aij of an element aij in a matrix A, equation 2.54 can written succinctly as

[pic] (2.55)

Employing cofactors to calculate the determinant of a matrix can be applied to all square matrices of order [pic]. The generalized equation for the method can be expressed for rows or column versions. However, the resulting determinant would be the same

[pic] (2.56)

Earlier we mentioned that the matrix multiplication is not commutative in general. However, it turns out that although [pic] the determinants of AB and BA are the same as long as the result is a square matrix.

Q.2.3 Consider the following two [pic]matrices, A and B.

[pic] and [pic]

a) Find AB and BA.

b) Find |AB| and |BA|.

Q.2.4 Find the determinant of the following [pic]matrix.

[pic]

For a further discussion of determinants and proofs of various statements we made please refer to (Eves, H. 1980) and (Marcus, M. and Minc, H. 1988).

6 Inverse of a matrix

If for a square matrix A, there is another matrix A-1 so that the matrix multiplication of A and A-1 results in an identity matrix I, A-1 is said to be the inverse of A.

[pic] (2.57)

If A-1 exists, then the matrix A is called a non-singular matrix. For a non-singular matrix the inverse of a matrix of order [pic] can be found by applying this general formula

[pic] (2.58)

where det(A) is the determinant of matrix A as

[pic] (2.59)

and where the matrix C' is the transpose of the cofactor matrix C.

[pic] (2.60)

7 Transpose of a matrix

The transposition operation for a matrix is performed by exchanging the places of rows with columns. This operation is defined for any matrix of order [pic]. It is expressed with an ' sign on the top right side of a matrix A. The order of A' becomes [pic].

After the transposition operation, matrix A

[pic] (2.61)

becomes

[pic] (2.62)

There are some nice general remarks for transpose of matrices of order [pic].

1. [pic] always results in a square matrix.

2. For conformable matrices A and B, [pic]. This rule can be extended to a product of any number of conformable matrices.

8 Inverses and determinants of some special matrices

The equation 2.56 and equations 2.58-2.60 provide us with generalized formulas to calculate the determinants and inverses of square matrices, respectively. However, these formulas are not very practical, especially to be used in computer applications. In the next section, when the linear algebraic equations are discussed, practical methods that suit computer applications better will be discussed. In these applications, most often, with row operations and exchanges, a matrix is brought to a special form where the determinant or inverse is very easy to calculate. Therefore, it is a good idea to explore the determinants and inverses for some of these special forms.

1 The determinant of a diagonal matrix

For [pic], the determinant is [pic] (2.63)

2 The inverse of a diagonal matrix with non-zero elements for diagonal entries

For [pic], the inverse is [pic] (2.64)

3 The determinant of an upper triangular matrix

For [pic], the determinant is [pic] (2.65)

4 The determinant of a lower triangular matrix

For [pic], the determinant is [pic] (2.66)

5 The inverse of an orthogonal matrix

Let's first define what we mean with orthogonal matrix. Matrix A is said to be orthogonal if the multiplication with its transpose generates an identity matrix I.

[pic] for orthogonal A (2.67)

Here we need to realize that the generation of an identity matrix in a matrix multiplication is also present in the definition of inverse matrix. Hence, for orthogonal matrices the inverses are simply their transposes.

[pic] for orthogonal A (2.68)

4 Algebraic equations

Algebraic equations are mathematical equations of some unknown variables with no derivative and integral expressions of the unknown variables in them. Depending on the problem's nature algebraic equations can appear in sets. For a set of algebraic equations to be solvable, the number of equations has to be at least equal to the number of unknowns. That does not guarantee you a solution though. This is the minimum requirement.

If the number of unknowns exceed the number of equations the set is said to be underspecified. If the number of unknowns is less than the number of equations then the set is called overspecified. If the numbers are the same then the system is well posed.

There are two major classes of algebraic equations: linear and non-linear. Linear algebraic equations are the ones with unknown variables only to the first power. The unknown variables are not allowed to appear in products or any transcendental functions (sin, acos, exp, log, etc.) for linear algebraic equations. Non-linear equations do not have these restrictions. Any functional form except derivatives and integrals are allowed in them.

Below you will find some common examples for algebraic equations. For simple equations and equation sets it is easy to visualize what the solution means with a graph. Even for complex problems a sketch might be very helpful. In the following examples we will classify the equations and help you to visualize the solution with a graph.

1 Example 1

[pic] (2.69)

Equation 2.69 is a non-linear algebraic equation with one unknown variable. The graphical solution for it would be easier to illustrate if the equation is rewritten as [pic].

Figure 8 Graphical solution for [pic]

2 Example 2

[pic] (2.70)

The eqautions in 2.70 form a set of eqautions. The first equation in the set is a non-linear algebraic equation, whereas the second one is a linear one. Graphically, the first equation is the equation for unit circle. The second equation defines a line. The two intersections between the circle and the line are the solutions to the set of non-linear algebraic equation.

Figure 9 Illustration for an underspecified system

3 Example 3

[pic] (2.71)

Equation 2.71 is a non-linear algebraic equation. However, there are two unknown variables but one equation. Therefore the problem is underspecified and there are infinite number of solutions that satisfy the equation. These solutions form a parabola as shown in Figure 10.

4 Example 4

[pic] (2.72)

Equation 2.72 is also a set of non-linear algebraic equations. As there are two unknowns and three equations the problem is overspecified. The first equation is the parabola from the previous example. The second and third equations define two lines. The intersection of these three structures, if there is any is the solution to our system. As Figure 10 reveals, the three curves do not intersect. Hence there is no solution to the problem.

Figure 10 Illustration for an overspecified system

5 Example 5

[pic] (2.73)

The equations in 2.73 form a well posed set of linear algebraic equations. The graphical solution is shown in .

Figure 11 The graphical solution to a linear set of algebraic equations

Q.2.5 Classify the following equation sets and find the solutions by graphical means.

a) [pic]

b) [pic]

2 Linear algebraic equations

As we have seen in the fifth example of the previous section, sets of linear algebraic equations are rather simple problems. However, as the number of unknowns increases even these simple looking problems get complex and the need for a structured solution method arises. Let's revisit the problem from example 5.

[pic] (2.74)

By putting the coefficients of the unknown variables into a matrix A and the unknown variables themselves into a column matrix x, the left hand side of the equations can be represented by a matrix multiplication Ax. Making the results on the right hand side into a column vector b we arrive at the following symbolic expression.

[pic] (2.75)

In numerical values it would be

[pic] (2.76)

Let's find the inverse of coefficient matrix and multiply both sides of the matrix equation with it.

[pic] (2.77)

[pic] (2.78)

Carrying out the matrix multiplications

[pic] (2.79)

As the right most term is an identity matrix, equation 2.79 represents the solution.

This method can be generalized to linear algebraic equation sets of any size. As long as the problem is well posed, that is number of unknowns are equal to the number of equations, there will be a square coefficient matrix of order [pic] and the problem can be expressed as

[pic] (2.80)

Multiplying both sides of equation 2.80 with A-1 from the left side gives the solution as

[pic] (2.81)

However, for large n values finding the inverse of coefficient matrix with the cofactor method as discussed in the previous section is computationally not feasible. Most often, a computationally better method involves the Gaussian elimination for solving linear algebraic equations or general matrix inversion problems. Next section will introduce the original Gaussian elimination method and a variation of it.

1 Gaussian elimination

The matrix equation 2.80 corresponds to a set of linear algebraic equations of the form as follows

[pic] (2.82)

Adding or subtracting multiples of one row to another of this structure does not change the solution to it. We can use this fact to do some intelligent row manipulations to get an upper triangular matrix

[pic] (2.83)

for the coefficient matrix. At this point, the solution is reduced to a simple back-substitution procedure of the x values from xn to x1.

[pic], [pic], … (2.84)

Example

Let's explore the mechanics of the method with a numerical example. Consider the following linear algebraic equation set with three unknowns and three equations.

|[pic] |[pic] |

First, we want to eliminate x1 from equations [1b] and [1c]. For this purpose, we do the following row manipulations indicated on the right side of each row and obtain a new set like

|[pic] |[pic] |

Second, we will eliminate x2 from equation [2c] by doing the row manipulations on the right side:

|[pic] |[pic] |

Finally, we make the manipulations so the coefficient of x3 becomes unity.

|[pic] |[pic] |

At this point the upper triangular form is achieved. We found a value for the last unknown variable as x3=-2. Back-substitution of x3 into [4b] reveals the second unknown, x2=-1. Finally, the first unknown is found by back-substituting x2 and x3 into [4a], x1=0.

Let's analyze the Gaussian method we used in the above example to come up with some general rules. In each group of row manipulations, we systematically selected a row and a coefficient in that row to be used in eliminating unknowns in other rows. The selected row and coefficient are called the pivot row and the pivot, respectively. The pivot row is divided by the pivot to make the coefficient of the corresponding unknown unity. Then, it is used to successively eliminate the unknown of the pivot in other rows by subtracting multiples of the pivot row from them. Repeating the procedure several times generates a new equation set with a coefficient matrix with ones on the main diagonal and zeros below the diagonal. Finally, substituting the values for unknowns backwards lead to the solution.

Gaussian elimination seems to be very simple. It can be computerized with little effort. However, the example we discussed earlier was very well arranged. We did not have any difficulty at all applying the procedure. In some problems it is not that obvious how to select a pivot row and the pivot in it. For example, consider the case where the second equation does not contain the second unknown x2. This case would lead to a division by zero and the algorithm would fail. To deal with this case, the rows need to be arranged so the second equation contains the second unknown. However, how do you perform these exchanges if there are more than one candidate equation with second unknown. One approach is to select the equation that would generate the largest pivot. This strategy is called partial pivoting. Another strategy is to find the largest pivot by looking to the entire coefficient matrix and making the necessary column exchanges. This strategy is called complete pivoting. Generally, partial pivoting is enough for most practical purposes (Hamming, R. W. 1986).

2 Gauss-Jordan elimination

Once the upper triangular form with ones on the main diagonal is obtained from Gaussian elimination, the equation set will be

[pic] (2.85)

or in matrix notation.

[pic] (2.86)

In this form the last equation explicitly gives the value for the last unknown, xn. It can be used to eliminate xn in the first (n-1) equations. After this step, the equation next to last will be explicit in xn-1. Now, the value for xn-1 can be used to eliminate xn-1 in the first (n-2) equations. Repeating the procedure (n-1) times will reveal all the unknowns. This algorithm is called Gauss-Jordan elimination and it is a slight variation of the original Gaussian elimination.

Q.2.6 Consider the following equation set and find its solution by Gaussian and Gauss-Jordan eliminations.

[pic]

3 Gaussian methods and matrix inversion

Gauss-Jordan elimination is usually the method of choice for finding the inverse of matrices numerically. This is accomplished by constructing the following [pic] matrix for A.

[pic] (2.87)

If the first n columns are transformed by Gauss-Jordan into an identity matrix the second n columns would hold the inverse matrix for A.

Q.2.7 Consider the following equation set .

[pic]

a) Find the inverse of the coefficient matrix.

b) Use the inverse to find the solution to the equation set.

3 Non-linear algebraic equations

In engineering, a problem description often leads to a single non-linear equation of the form

[pic] (2.88)

or to a set of non-linear equations like

[pic] (2.89)

The x values that satisfy the equation or equations are called the roots. The single non-linear equation is said to be an n-th order polynomial if it has only integer powers of the unknown variable x. Its general form is as follows.

[pic] (2.90)

An n-th order polynomial has exactly n real or complex roots. For second, third and some fourth order polynomials there are exact equations to determine the roots. For higher order polynomials usually an iterative numerical procedure is needed for solution.

All non-linear equations that are not polynomials are commonly called transcendental functions. Transcendental functions may include all the special functions like sines, cosines, tangents, exponentials and logarithms. The only functions that are left out are the integral and derivative functions. Equations of transcendental functions usually require iterative methods for their solutions, as well. Hence, we will focus on numerical methods for the solution of non-linear algebraic equations in this section. Let's first discuss equations with a single variable.

1 Non-linear algebraic equations with one unknown variable

As shown earlier, these equations have the following general form.

[pic] (2.91)

In general, any iterative solution technique involves the construction of a recurrence relation that would approximate the root after successive iterations. Two common numerical techniques are the direct substitution and the Newton-Raphson methods. Both methods can be extended to problems with sets of non-linear algebraic equations.

1 Direct substitution method

In this method, the original equation expressed in 2.91 is brought to an equivalent functional relationship of the form

[pic] (2.92)

There are different strategies that can be adopted to rewrite equation 2.91 as equation 2.92. These strategies will eventually have an effect on the convergence of the overall procedure.

The modified equation can be expressed as a recurrence relation like

[pic] (2.93)

According to the above equation, the (k+1)-th approximation for the root is calculated by substituting the k-th approximation into the function F(x). The convergence criterion for the recurrence equation is

[pic] (2.94)

The proof for the convergence criterion can be found in (Hildebrand, F. B. 1987).

Example

Let's solve the following non-linear equation by direct substitution method

[pic]

One possible [pic] equation and the corresponding recurrence formula is

[pic] and [pic]

At this point it is a good idea to check the first derivative of F(x).

[pic]

It may give us some insight relative to the convergence characteristics of the numerical method. For direct substitution to converge to the root, the absolute value of F'(x), hence x2, should be less than unity near the root. The convergence condition[pic]gives us an interval of [-1,1]. Although it is a limited range we will take our chances and try the procedure with three initial points. The table below shows the results with the following initial points –1.5, -1.0, 1.0 and 2.0.

Table 4 Direct substitution method with various initial values

|k |x1 |

|0 |-1.50000 |1.00000 |1.00000 |2.00000 |

|1 |-0.38462 |-0.38462 |0.83333 |1.26667 |

|2 |0.83809 |0.83809 |0.81785 |0.90417 |

|3 |0.81793 |0.81793 |0.81773 |0.82133 |

|4 |0.81773 |0.81773 |0.81773 |0.81774 |

|5 |0.81773 |0.81773 |0.81773 |0.81773 |

As seen in Table 5 Newton-Raphson method converges to the root for all initial estimates in 5 iterations with an accuracy of 5 decimal places.

The example shows the remarkable performance of Newton-Raphson method. Its guaranteed and rapid convergence for good initial estimates make this method very attractive. This method is also extensible to problems with many unknown variables.

2 Non-linear algebraic equations with more than one unknown variable

In this section our concentration will be on problems of the following type

[pic] (2.100)

Both methods discussed in the context of equations with one unknown variable can be extended to handle the non-linear algebraic equations sets. The extension of the direct substitution method is trivial. Therefore, we will discuss only the extension of Newton-Raphson method. A Taylor series expansion of the i-th function, fi(x1, x2, … , xn), over the roots, [pic], ignoring the terms beyond linear yields

[pic] (2.101)

For the case where [pic] the i-th equation can be expressed as a recurrence relation

[pic] (2.102)

With the introduction of a matrix of partial derivatives, called Jacobian matrix, J,

[pic] (2.103)

equation 2.102 can be written in matrix form as

[pic] (2.104)

where F is the column matrix representing the functions 1, …, n. Multiplying both sides of the equation with the inverse of Jacobian and rewriting the equation for [pic] gives the working recurrence relation of the Newton-Raphson method for problems with more than one unknown variables.

[pic] (2.105)

Apparently, the matrix manipulation skills we acquired in the previous section will be useful in solving the sets of non-linear algebraic equations. Let's solve an example problem.

Example

We will solve the following set of non-linear algebraic equations with the Newton-Raphson method.

[pic]

Let's first bring the equations to f(x)=0 form and find the Jacobian matrix.

[pic]

[pic]

With the above matrices, the recurrence relation stated in equation 2.105 becomes

[pic]

As the Jacobian matrix is a square matrix of order 2, the inverse is simple enough to write it as an algebraic relation

[pic]

Substituting the inverse Jacobian matrix into the recurrence relation we obtain

[pic]

After selecting an initial estimate of for the roots we are ready to perform the numerical procedure.

[pic]

As Table 6 shows, Newton-Raphson method converges to a solution of

[pic]

in eight iterations. Figure 13 shows the two original functions and the progress of the numerical method.

Table 6 Numerical example for Newton-Raphson method

|k |x1,k |x2,k |f1(x1,k,x1,k) |f1(x1,k,x1,k) |

|0 |2 |1 |2 |4 |

|1 |1.7333 |0.0667 |0.0711 |1.12 |

|2 |1.8874 |-0.5388 |0.0238 |0.2733 |

|3 |1.9581 |-0.8293 |0.0050 |0.0639 |

|4 |1.9893 |-0.9563 |0.0010 |0.0122 |

|5 |1.99886 |-0.9954 |9.18E-05 |0.0012 |

|6 |1.99998 |-0.9999 |1.26E-06 |1.58E-05 |

|7 |2 |-1 |0 |0 |

Figure 13 Progress of Newton-Raphson method

This example completes this chapter. With the mathematical background we refreshed here we can now cover more advanced topics.

References

1. Eves, H. Elementary Matrix Theory. 1980. New York, New York, Dover Publications, Inc.

2. Hamming, R. W. Numerical Methods for Scientists and Engineers. 1986. Mineola, New York, Dover Publications, Inc.

3. Hildebrand, F. B. Introduction to Numerical Analysis. 1987. Mineola, New York, Dover Publications, Inc.

4. Marcus, M. and Minc, H. Introduction to Linear Algebra. 1988. Mineola, New York, Dover Publications, Inc.

5. Mickley, H. S., Sherwood, T. K., and Reed, C. E. Applied Mathematics in Chemical Engineering. 1957. New York, McGraw-Hill.

-----------------------

f(x)

x0

Assumptions:

Assumptions:

Assumptions:

Assumptions:

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

x

Maximum value for interval [t0,tn]

Minimum value for interval [t0,tn]

t0

tn

t

f(t)=dF(t)/dt

[pic]

[pic]

dx

”y

”x=dy

y

x

p

dx

Δy

Δx=dy

y

x

p

[pic]

[pic]

[pic]

[pic]

[pic]

Cartesian ≡ Cylindrical ≡ Spherical

(x, y, z) ≡ (r, θ, z) ≡ (r, θ, φ)

r

[pic]

[pic]

[pic]

θ

φ

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

Both definitions are taken from , 06/07/2001

z

y

x

z-axis

y-axis

x-axis

ƒ-1[f(x)]

ƒ(x)

f(x1)

f(x2)

.

.

f(xn)

x1

x2

.

.

xn

Range

Domain

[pic]

+

+

+

-

-

-

Cofactor A11 of element a11

Solution

X=1.94

(d)

(c)

(b)

(a)

Solution

-0.67, 4.44

X2

Solutions

0,1 AND 1,0

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download