Chemical Calculations



Treatment of Experimental Data and Statistics

As a member of the scientific community, you will at some time need to collect and treat laboratory data. As a beginning student, you probably gave little attention to the accuracy of your data or to its statistical analysis, although you may have calculated some statistics such as an average and a standard deviation. When presenting data to a community that is, by design, skeptical and critical, it becomes important to present information that has been scrutinized for accuracy. Data come in a variety of forms. If an experiment is difficult to perform, it may be that only one or two measurements can be taken; under such conditions, how is one to estimate the error in the results? For a large number of data points, it is straightforward to find the average and the standard deviation, a measure of how far the individual points deviate from the calculated average. Then there are relationships between data that may be linear or nonlinear; how can we tell which? Other problems arise as well. When a number of measurements are combined to produce a calculated result, what is the error in the final answer? When a data point lies very far from the average, when may we consider discarding it? These and other considerations are addressed in the following pages.

Types of Errors:

Central to our discussion is the type of errors encountered and what can be done to remedy them. Errors fall into two broad categories, systematic and random. A systematic error is one that occurs because of a repeated flaw in the way a measurement is taken, or because of a consistent instrumental bias. For example, a pH meter with a poorly calibrated temperature probe may consistently read pH values too high, or a thermometer may be in error by 5 °C over the range of measurement. Other examples include a researcher who consistently uses an incorrect blank to calibrate a spectrophotometer, or one who applies incorrect fundamental theory in the calculation of results. This last kind of error is said to have occurred to R. A. Millikan in his oil drop experiment, which he used to calculate the value of Avogadro's number. He arrived at a value of 6.064 ± 0.006 x 10^23, in contrast to the well-established value of 6.022137 ± 0.000007 x 10^23 found by crystallographic measurements. The error in his value was traced back to erroneous data provided by other researchers.

The other broad category is random error. These errors generally arise from limits in the precision of the experimental equipment or in the skill with which the data are collected, and they are very often investigator specific. For example, one individual may have better eyesight than another and so read a thermometer with a higher degree of consistency and accuracy. Much of the random error lies in the quality of the instrumentation: a research-grade pressure gauge may read pressures to 0.001 psi, whereas a laboratory-grade gauge may give readings only to 1 psi. In general, random error can only be reduced by better (and usually more expensive) equipment and by more diligence in the laboratory. This is most obvious in the work of general chemistry laboratory students: inevitably, the students who take the most time and care in their work produce results with the higher degree of precision. All measurements, however careful, carry some degree of random error, which must be recognized.
Systematic errors are far more difficult to sniff out and eliminate. These are the errors that produce consistent results that are consistently wrong. For example, one might weigh an object repeatedly and obtain an average weight of 34.524 ± 0.001 grams when the true weight of the object is 34.634 grams. The consistency shows a small degree of random error, yet something is obviously wrong. Most likely the balance was not properly calibrated to read zero with no weight on the pan, and that would be the place to begin looking; other measurements may not provide such an obvious place to look. Some authors also categorize a third type of error, called erratic error, which is a polite name for a random mistake.

Multiple Measurements of a Single Value

Now we will consider the measurement of a single value, such as a temperature or a mass, although such quantities are frequently measured only once. Let us define some basic terms, which we will use later.

Average: this is the most basic of determinations and the one from which we will obtain others. For N data points x1, x2, ..., xN,

    x̄ = (x1 + x2 + ... + xN) / N

Range:

    R = x(largest) - x(smallest)

Average deviation:

    Avg. dev. = [ (x1 - x̄) + (x2 - x̄) + ... + (xN - x̄) ] / N

This expression has the drawback that deviations of opposite sign cancel out (indeed, the signed deviations sum identically to zero), giving a distorted estimate of experimental precision. A more effective method is to make certain that all deviations are additive. This is accomplished by the variance:

    S^2 = Σ (xi - x̄)^2 / (N - 1)

Note that the denominator is N - 1, which is found statistically to produce a more accurate estimate of the experimental error. The variance, also useful, has the disadvantage that it is not directly comparable to the data, because it is a square of the data values. This leads us to the most used expression for experimental variation, the standard deviation:

    S = [ Σ (xi - x̄)^2 / (N - 1) ]^(1/2)

Example 1.) Calculate the average and standard deviation for the following data: 5.34, 5.63, 5.22, 5.76, 5.50

    average = (5.34 + 5.63 + 5.22 + 5.76 + 5.50) / 5 = 5.49

    standard deviation = [ ((5.34 - 5.49)^2 + (5.63 - 5.49)^2 + (5.22 - 5.49)^2 + (5.76 - 5.49)^2 + (5.50 - 5.49)^2) / 4 ]^(1/2) = 0.217

These two calculations are easy to reproduce with a short program, as in the sketch below.
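The following is a minimal sketch (an addition to the text, not part of the original) showing how Example 1 can be verified with Python's standard library; note that statistics.stdev uses the N - 1 denominator defined above.

    import statistics

    data = [5.34, 5.63, 5.22, 5.76, 5.50]

    avg = statistics.mean(data)   # (5.34 + 5.63 + 5.22 + 5.76 + 5.50) / 5
    s = statistics.stdev(data)    # sample standard deviation, N - 1 denominator

    print(f"average = {avg:.2f}")             # average = 5.49
    print(f"standard deviation = {s:.3f}")    # standard deviation = 0.217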
Normal Curve:

Now that we have calculated some values, it is necessary to evaluate the quality of the data. We would like to ask questions such as: Are the data any good? Should I reject any data points as discordant? What number should I report, and what range should I give, so that the reader can be confident the number is reproducible and legitimate? Often data are reported as an average followed by a +/- number that gives the reader an estimate of the variability observed in the data collection. The question is, what +/- should I report? There are a number of approaches to this problem, and some background information is necessary in order to provide a satisfactory answer. A large number of measurements of a single value is most commonly observed to be distributed in a reproducible way, known as the "normal" distribution:

    [Figure: the bell-shaped normal distribution curve, centered on the mean.]

This curve is probably familiar to you. It is used for a variety of applications, including I.Q. distributions, statistical process control in manufacturing, and many others. Some of the equations presented earlier are based on this distribution. If this figure is to represent a distribution of data values, its meaning must be defined. Specifically, the curve represents the probability, or likelihood, that a given value will be obtained.

The maximum of the curve is interpreted as the average, or mean, value. Thus there is an equal probability of measuring a value greater than the average and less than the average. Logically, the probability that a measured data point is some real number in the range of -∞ to +∞ is a certainty. This seems like a ridiculous observation, but it has important consequences for our normal curve. Let us set the value 1 to mean "100% likely" or "absolutely certain". If we do this, we can say that the probability of a measured data point being some real number in the range of -∞ to +∞ is 1, and from that point on the probabilities of all other measured values are expressed relative to the number 1. For example, suppose you had 200 rods of differing lengths, distributed as follows:

    Number of rods    Length
    20                1 meter
    50                2 meters
    90                3 meters
    30                4 meters
    10                5 meters

If I ask, "What is the probability of picking a rod of length 3 meters out of the box at random?", the question is answered by dividing the number of rods of length 3 meters by the total number of rods, which is 200:

    Prob(3 meters) = 90/200 = 0.45

Although it is beyond the scope of our discussion, a basic application of calculus lets us find the area underneath a smoothly varying curve. Applying that result, together with the counting idea introduced with the rods, we can find the area underneath the normal curve. In doing so we are, in essence, adding up all possible values that can occur, producing a total just like the 200 of the rods example. If we re-interpret the curve to represent not counts but probabilities, then the total sum of the probabilities of all possible values must add up to the number 1. This has been arranged for the normal curve by adjusting its overall height. The result now applies to all possible collections of numbers and is known as a "normalized" distribution. From now on, we can discuss the probability of a number having a given value relative to the number 1.

To illustrate how this is used, consider the standard deviation, S, calculated for the data set in Example 1 above. Applying these concepts to the normal curve, it is found that the mean +/- one standard deviation spans a region with a 0.683 probability of containing a measurement. In Example 1, the standard deviation of 0.217 means that 68.3% of the time a measurement will fall between 5.49 - 0.217 and 5.49 + 0.217, that is, between 5.273 and 5.707. Other multiples and their probabilities have been determined and are commonly used. Some examples are as follows:

    mean +/- 1.00 x S   = 0.6826  = 68.26%
    mean +/- 1.645 x S  = 0.90    = 90%
    mean +/- 1.96 x S   = 0.95    = 95%
    mean +/- 2.00 x S   = 0.9545  = 95.45%
    mean +/- 2.50 x S   = 0.9875  = 98.75%
    mean +/- 2.58 x S   = 0.99    = 99%
    mean +/- 3.29 x S   = 0.999   = 99.9%
    mean +/- 4.00 x S   = 0.99994 = 99.994%

Say we want to report a number and a +/- range that would ensure that 99% of measured values fall within it. We might report 5.49 +/- (2.58 x 0.217), or 5.49 +/- 0.56. These coverage values can also be checked numerically, as in the sketch below.
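As an aside (my addition, not part of the original text), the coverage probabilities listed above can be checked with the error function: the fraction of a normal distribution lying within k standard deviations of the mean is erf(k / √2).

    import math

    for k in [1.00, 1.645, 1.96, 2.00, 2.50, 2.58, 3.29, 4.00]:
        p = math.erf(k / math.sqrt(2.0))
        print(f"mean +/- {k:5.3f} x S -> {p:.5f} ({100 * p:.2f}%)")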
Problem: the discussion just presented is, strictly speaking, accurate only for infinitely large samples! As the sample sizes get smaller and smaller, the normal distribution probabilities lose their meaning. This is a lousy thing to find out, since most of the time we don't make hundreds of measurements of a given value.

What do we do for small samples? Fortunately, the statisticians have dealt with this problem. The concepts discussed above form the basis of a set of statistics for small samples. What is done is to assume that a large number of "sets" of measurements are made; the standard deviations of the sets are calculated, and the statistics are built on that basis. This analysis produces a slightly different distribution. There are a number of useful results of this analysis, which will now be presented.

Rejection of Data

Before doing any analysis of data, it is important to determine whether the data collected are significant. How often have you collected 3, 4, or 5 measurements and found one to be significantly different from the others? At first thought it is tempting to discard such a point out of hand, but this practice is highly frowned upon. The whole subject of data rejection is very controversial, and it is the goal of statistics to establish a reasonable criterion for the rejection of data. Early criteria for rejection were based on the standard deviation. As cited above, if a data point falls outside a certain number of standard deviations, estimates can be made of the confidence with which it may be discarded; typical cutoffs were 2.5 S and 4 S. From the list above, a point falling outside 2.5 times the standard deviation can be discarded with 98.75% certainty, and one falling outside 4 S can be rejected with 99.994% certainty. All this seems reasonable until we realize that these estimates hold mainly for large data sets. For small sets they perform poorly, producing a much higher rate of discordance than is reasonable to expect. A far better method has been developed and tested, known as the Q test. In practice, a rejection quotient, Q, is calculated as:

    Q = |x(suspect) - x(nearest neighbor)| / (x(largest) - x(smallest))

Comparison is then made with statistical Q tables. Some values are provided as follows:

    Table 1
    n     Q (90% confidence)   Q (95% confidence)   Q (99% confidence)
    3     0.941                0.970                0.994
    4     0.765                0.830                0.926
    5     0.642                0.710                0.821
    6     0.560                0.625                0.740
    7     0.507                0.568                0.680
    8     0.468                0.527                0.634
    9     0.437                0.495                0.598
    10    0.412                0.468                0.568

Example 2.) Consider the data set: 12.12, 13.05, 11.81, 9.02, 12.88

The average of these data is 11.78 and the standard deviation is 1.62. On inspection, 9.02 looks suspect. We apply the Q test by first calculating the rejection quotient:

    Q = |9.02 - 11.81| / |13.05 - 9.02| = 0.692

Comparing this Q value to the table, we note that we have 5 data points. To see whether the point can be rejected at 95% confidence, we trace down the 95% confidence column to n = 5 and find 0.710. Since 0.692 is less than 0.710, we cannot discard the data point at that level. At the 90% confidence level, however, the table value for n = 5 is 0.642; since the calculated Q of 0.692 is greater than 0.642, the point may be discarded at that level. The higher confidence level is thus observed to be the more stringent criterion. Note: this test may be applied only once per data set! By repeated application one could conceivably discard all of the data. A short implementation of the test appears below.
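The following is a minimal sketch (my addition) of the Q test applied to Example 2; the 0.642 and 0.710 cutoffs are the n = 5 entries from Table 1.

    def q_statistic(data):
        """Return (suspect value, Q) for the point farthest from its nearest neighbor."""
        s = sorted(data)
        spread = s[-1] - s[0]
        q_low = (s[1] - s[0]) / spread      # gap below the smallest point
        q_high = (s[-1] - s[-2]) / spread   # gap above the largest point
        return (s[0], q_low) if q_low > q_high else (s[-1], q_high)

    data = [12.12, 13.05, 11.81, 9.02, 12.88]
    suspect, q = q_statistic(data)
    print(f"suspect = {suspect}, Q = {q:.3f}")   # suspect = 9.02, Q = 0.692
    print("reject at 90%?", q > 0.642)           # True
    print("reject at 95%?", q > 0.710)           # False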
Once the data have been selected with a degree of confidence, it is necessary to report them with a degree of significance. There are a number of approaches to this, all based on the normal distribution above. The method of choice for small data sets is the "student t" test. (The name was the pen name of the developer of the technique, a gentleman named Gossett, and has no relation to students.) The student t value is a better estimate of error for small sample sizes and is used as follows. First, for the small set of data, calculate the standard deviation. Second, consult a table of "student t factors", which can be found in any math handbook or most books on statistics. A small section of such a table is given below.

    # of data points   0.90   0.95   0.99   0.995   0.9995
    1                  3.08   6.31   31.8   63.7    637.0
    2                  1.89   2.92   6.96   9.92    31.6
    3                  1.64   2.35   4.54   5.84    12.9
    4                  1.53   2.13   3.75   4.60    8.61
    5                  1.48   2.02   3.36   4.03    6.87
    6                  1.44   1.94   3.14   3.71    5.96
    7                  1.41   1.89   3.00   3.50    5.41
    8                  1.40   1.86   2.90   3.36    5.04
    9                  1.38   1.83   2.82   3.25    4.78
    10                 1.37   1.81   2.76   3.17    4.59
    15                 1.34   1.75   2.60   2.95    4.07
    20                 1.33   1.72   2.53   2.85    3.85
    30                 1.31   1.70   2.46   2.75    3.65

Use of this table is as follows:

1. Calculate the standard deviation for the data set.
2. Decide on the confidence level desired and locate the t factor in the student t table.
3. Calculate the quantity

    t x S / √N

where N is the number of data points.

Example 3.) Using the data presented in Example 2, we calculated a standard deviation of 1.62 for 5 data points. Consulting the table at 95% confidence, we find the value 2.02 for 5 data points. Thus

    t x S / √N = (2.02 x 1.62) / √5 = 1.46

We now report our result as the average +/- this quantity, or for our data,

    Result = 11.78 +/- 1.46 (N = 5, 95% conf.)

The same calculation is carried out in the sketch below.
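A minimal sketch (my addition) reproducing Example 3; the t factor of 2.02 is the table value quoted in the text for 5 data points at 95% confidence.

    import math
    import statistics

    data = [12.12, 13.05, 11.81, 9.02, 12.88]
    t95 = 2.02                     # student t factor for 5 points, 95% confidence

    avg = statistics.mean(data)    # 11.78
    s = statistics.stdev(data)     # 1.62
    err = t95 * s / math.sqrt(len(data))

    print(f"Result = {avg:.2f} +/- {err:.2f} (N={len(data)}, 95% conf.)")
    # prints 11.78 +/- 1.47; the text's 1.46 comes from rounding S to 1.62
    # before multiplying.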
Exercises:

1. Classify each of the following as systematic, random, or erratic error.

    a. An LED is burned out on a digital frequency counter, so that all 8's look like 9's. _____________________
    b. A student is too short to read a buret parallel to the liquid level and thus reads the level from below parallel. _____________________
    c. A blank used in a spectrophotometer has deionized water in the cuvette when the measured solute is dissolved in methanol. _____________________
    d. A digital pH meter won't settle down to a single number. _____________________
    e. A balance is placed in an area where there is a draft. _____________________
    f. A dyslexic student weighs a solution used in an experiment; the digital balance reads 1.92 grams and the student records 1.29 grams. _____________________

2. Calculate the average and standard deviation for the following data. Carry out a Q test on the data at the 95% confidence level. Can any of the data points be discarded? If so, use the student t table to recalculate the average and standard deviation of the sample.

    26.08, 24.36, 25.93, 28.62, 24.85, 26.43, 31.75, 24.42, 18.94, 27.85

    Average = ____________________
    Standard Deviation = ____________________
    Discarded point, if any, using the Q test: ____________________

3. In the general chemistry laboratory, the solubility product constant was calculated by a number of groups. The following resulted:

    Ks (x 10^-5): 3.50, 3.81, 3.50, 2.27, 3.12, 2.97, 3.40, 3.50, 3.50, 3.60

Calculate the average and standard deviation. Carry out a Q test on the data at the 90% confidence level. Can any of the data points be discarded? If so, use the student t table to recalculate the average and standard deviation.

    Average = ____________________
    Standard Deviation = ____________________
    Discarded point, if any, using the Q test (identify the confidence level): ____________________

Propagation of Errors

In many cases, a final result is calculated from a set of collected data using an equation. For example, say we wanted to calculate the molecular weight of a gaseous substance by using the ideal gas relationship:

    MW = g R T / (P V)

Let us say that an investigator went into the laboratory and collected the following set of data:

    g = 0.585 ± 0.005 g
    T = 373.15 ± 0.25 K
    P = 760.0 ± 0.2 torr
    V = 206.34 ± 0.15 ml

What we really would like to know is how each of these measurements affects the final calculated molecular weight. If the random errors are not too large, we can use the derivative for our error analysis. Consider the calculation of a value y from measured values x1, x2, x3, etc. By differential calculus, the total differential of y is given as:

    dy = (∂y/∂x1) dx1 + (∂y/∂x2) dx2 + (∂y/∂x3) dx3 + ...

Before you get ill at the sight of this monstrosity, let's proceed through it carefully. From our discussion of derivatives, we recognize that the derivative dy/dx means "the infinitesimal change in the value y with respect to an infinitesimal change in x". From this idea, we can write the infinitesimal change in y when x changes as:

    dy = (dy/dx) dx

that is, the change in y is equal to the rate of change of y with x multiplied by the change in x. As an analogy, consider distance and velocity. Since velocity is the rate of change of distance with time, the change in distance is the velocity multiplied by the change in time:

    d(distance) = v dt

The only extension we must make to this concept is to recognize that some equations have more than one variable that can change, so we must consider the change in all of the variables. The way we do that is to say that the infinitesimal change in the value of interest is the sum of the effects of all the variables. Reading the behemoth equation written above, we say: "The infinitesimal change in y, dy, is equal to the rate of change of y with x1 multiplied by the change in x1, plus the rate of change of y with x2 multiplied by the change in x2, plus etc." Or, writing in terms of small errors, e, relative to the measured values:

    e(y) = (∂y/∂x1) e(x1) + (∂y/∂x2) e(x2) + ...

Notice that the d's in the derivatives have been replaced with ∂'s. This identifies the new expressions as "partial derivatives", meaning the change in y with respect to one of the variables while the other variables are held fixed.

The problem that occurs is that we do not know whether the random errors are positive or negative. An effective way to resolve this is to always take the absolute value of each term; it is more common, however, to square each term and then take the square root of the sum. Thus the error in y becomes:

    e(y) = [ (∂y/∂x1)^2 e(x1)^2 + (∂y/∂x2)^2 e(x2)^2 + ... ]^(1/2)

Example 4.) Consider the calculation of a density. Let us say that the following laboratory data were measured for toluene:

    m = 21.42 ± 0.01 g
    V = 24.6 ± 0.2 ml

    Density = 21.42 g / 24.6 ml = 0.871 g/ml

Since density = mass/volume, it is a function of two variables. We then write the change in density with respect to mass and volume as:

    dD = (∂D/∂m) dm + (∂D/∂V) dV

We now need to produce the derivatives ∂D/∂m and ∂D/∂V. These are:

    ∂D/∂m = 1/V    and    ∂D/∂V = -m/V^2

Substituting these two derivatives into the error expression along with our data yields:

    e(D) = [ (1/V)^2 e(m)^2 + (m/V^2)^2 e(V)^2 ]^(1/2)
         = [ (0.01/24.6)^2 + ((21.42)(0.2)/24.6^2)^2 ]^(1/2)

which has the value (1.6525 x 10^-7 + 5.0114 x 10^-5)^(1/2) = 0.007091 g/ml. (This arithmetic is repeated in the sketch below.)
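A minimal sketch (my addition) of Example 4's error propagation, with the partial derivatives written out explicitly:

    import math

    m, e_m = 21.42, 0.01   # mass in grams, with its error
    V, e_V = 24.6, 0.2     # volume in ml, with its error

    D = m / V              # density, g/ml
    dD_dm = 1.0 / V        # ∂D/∂m
    dD_dV = -m / V**2      # ∂D/∂V

    e_D = math.sqrt((dD_dm * e_m)**2 + (dD_dV * e_V)**2)
    print(f"D = {D:.3f} +/- {e_D:.4f} g/ml")   # D = 0.871 +/- 0.0071 g/ml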
Notice two things from our analysis. First, the second term is larger than the first. This indicates that the volume measurement is the one providing the most error, and that in subsequent density measurements one should concentrate on improving the accuracy of that measurement. Second, the final result, reported as 0.87 ± 0.01 g/ml, provides a range that encompasses the literature value for toluene of 0.867 g/ml.

As a further example, let us return to our calculation of the molecular weight. Since there are four independent variables, it is necessary to carry out four derivatives. Expressed in the form of the differential change in the molecular weight, with MW = gRT/(PV), the error is:

    e(MW) = [ (RT/PV)^2 e(g)^2 + (gR/PV)^2 e(T)^2 + (gRT/P^2V)^2 e(P)^2 + (gRT/PV^2)^2 e(V)^2 ]^(1/2)

Notice that a term involving the error in the gas constant is not included; such constants are assumed to have no error. The experimental data, with the pressure and volume converted to units consistent with R, are:

    g = 0.585 ± 0.005 g
    T = 373.15 ± 0.25 K
    P = 760.0 ± 0.2 torr = 1.00 ± 0.000263 atm
    V = 206.34 ± 0.15 ml = 0.20634 ± 0.00015 L

The error then becomes:

    e(MW) = (0.5510 + 0.0034 + 0.0005 + 0.0040)^(1/2) = 0.75 g/mol

The molecular weight is calculated as:

    MW = (0.585 g)(0.08206 L·atm/mol·K)(373.15 K) / [(1.00 atm)(0.20634 L)] = 86.8 g/mol

giving a final result of 86.8 ± 0.8 g/mol to the correct number of significant figures. Note again that the term involving the error in the measurement of mass provides the greatest error in the final result; the investigator should therefore concentrate on improving this measurement in subsequent determinations. The sketch below repeats this term-by-term analysis.
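A minimal sketch (my addition) of the same term-by-term analysis for MW = gRT/(PV); the value of R is the standard 0.08206 L·atm/(mol·K), and the four squared terms match the 0.5510, 0.0034, 0.0005, and 0.0040 quoted in the text.

    import math

    R = 0.08206                 # L·atm/(mol·K), treated as exact
    g, e_g = 0.585, 0.005       # grams
    T, e_T = 373.15, 0.25       # kelvins
    P, e_P = 1.00, 0.000263     # atmospheres
    V, e_V = 0.20634, 0.00015   # liters

    MW = g * R * T / (P * V)

    terms = [
        (R * T / (P * V) * e_g) ** 2,         # mass term        ~ 0.5510
        (g * R / (P * V) * e_T) ** 2,         # temperature term ~ 0.0034
        (g * R * T / (P**2 * V) * e_P) ** 2,  # pressure term    ~ 0.0005
        (g * R * T / (P * V**2) * e_V) ** 2,  # volume term      ~ 0.0040
    ]
    e_MW = math.sqrt(sum(terms))

    print(f"MW = {MW:.1f} +/- {e_MW:.2f} g/mol")   # MW = 86.8 +/- 0.75 g/mol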
Applications: Given the following laboratory data and calculations, determine the error in each calculated result.

1. A crime lab wishes to calculate the kinetic energy imparted to a bullet by a given cartridge load fired from a given gun. The mass and velocity of the bullet are given as:

    m = 5.012 ± 0.002 grams
    v = 310 ± 10 meters/sec

Given that the kinetic energy is KE = (1/2) m v^2, calculate the energy in Joules and the expected error in the result.

    KE = ________________ ± _________ Joules

2. Given the equilibrium reaction 2 CH3COOH ⇌ (CH3COOH)2, calculate the equilibrium constant and the error in the constant, given the equilibrium pressures as:

    P(CH3COOH) = 0.340 ± 0.02 atm
    P((CH3COOH)2) = 0.146 ± 0.02 atm

    and Kp = P((CH3COOH)2) / P(CH3COOH)^2

    Kp = ________________ ± _________

3. It is recommended, when a needed solution has a very low concentration, that the solution be prepared at a higher concentration and then diluted. Examine this from a statistical perspective. Calculate the concentration and expected error in the preparation of 250.0 ml of a 1.50 ppm (w/v) solution of NaF prepared each of the following ways. Assume the balance to have an accuracy of 0.0001 grams, the pipettes an accuracy of 0.002 ml, and the volumetric flasks an accuracy of 0.02 ml.

    a. By direct weighing of NaF, followed by dissolving in water and diluting to 250.0 ml.
    b. By preparation of the solution in the following steps:
       i. Prepare 500.00 ml of a 50 ppt solution from solid NaF.
       ii. Pipette 10.00 ml of the 50 ppt solution and dilute to 500.0 ml.
       iii. Pipette 10.00 ml of the resulting 1000 ppm solution and dilute to 500.0 ml.
       iv. Pipette 18.75 ml of the resulting 20 ppm solution and dilute to 250 ml.

4. A solution of a strong base can be standardized to a known concentration by titration with potassium acid phthalate, KHP. In such an analysis, the molar concentration of the base can be found through the relation:

    M(KOH) = Mass(KHP) / [ MW(KHP) x V(KOH) ]

A typical set of measurements made by an investigator is as follows:

    Mass(KHP) = 0.521 ± 0.001 grams
    V(KOH) = 0.0210 ± 0.001 L
    MW(KHP) = 204.22 ± 0.01 g/mol

    a. From this information, calculate the concentration of the KOH solution.
    b. Calculate the differential error in the result.
    c. Which measured value should the investigator improve first to get a more precise result?
    d. The investigator made several runs in order to determine a statistical average. The following concentrations resulted:

       Conc (M): 0.125, 0.126, 0.121, 0.119, 0.133, 0.117, 0.120, 0.118, 0.119, 0.125

       Calculate the mean and the student t error at 95% confidence. Test any suspicious values for rejection at 95% certainty.
    e. Is the error within that predicted by the differential error analysis?
    f. An external laboratory assay showed that the concentration of the base was 0.119 ± 0.002 M. Does the mean value that this investigator found agree with the laboratory assay? If not, what type of error is involved, random or systematic? Suggest some items that the investigator might examine.

5. An experimenter used a bomb calorimeter to determine the enthalpy of combustion of octane. The expression used to calculate the enthalpy is given by:

    ΔH(comb) = C(cal) x (T2 - T1) / n

where C(cal) is the heat capacity of the calorimeter, n is the number of moles of octane (C8H18) used, and T1 and T2 are the initial and final temperatures of the combustion. The following data were collected by the researcher:

    C(cal) = 2390 ± 10 kJ/°C
    Mass of octane = 54.32 ± 0.01 g
    T1 = 24.02 ± 0.02 °C
    T2 = 25.16 ± 0.02 °C
    MW(octane) = 114.2302 g/mol

    a. From this information, calculate the heat of combustion of octane.
       ΔH(comb) = __________________ kJ/mol
    b. Calculate the propagation-of-error value for the data given above.
       Error = _____________ kJ/mol
    c. Several runs were made by the researcher and calculations made. The following results were determined:

       Run    ΔH(comb) (kJ/mol)
       1      5720
       2      5823
       3      5615
       4      5918
       5      5661

       Perform a Q test to determine any values that might be discarded (Q95% = 0.710).
       Value discarded (give the value or write none): ______________
    d. Discarding any values determined above, calculate the standard deviation of the data.
       Standard Deviation = ______________
    e. Use a student t test to determine the error in the reported result (t95% = 2.02 for 5 data points, t95% = 2.13 for 4 data points).
       Student t error = ______________
    f. What is the overall final reported value?
       Value ± error = ______________

6. This calorimeter system was used to measure the combustion enthalpies of several hydrocarbons of various chain lengths, as above. The results are given below.

    Hydrocarbon     Molecular Wt   Mass (g)   Number of CH2 Groups   Heat (J)   Molar Heat of Combustion (kJ/mol)
    CH3CH2CH3       44.097         0.5515     1                      27778      __________
    CH3(CH2)2CH3    58.124         0.5014     2                      24801      __________
    CH3(CH2)3CH3    72.151         0.4545     3                      22305      __________
    CH3(CH2)4CH3    86.178         0.4054     4                      19489      __________
    CH3(CH2)6CH3    114.23         0.3001     6                      14345      __________

It is found that the energy of combustion of hydrocarbons increases linearly with chain length. Thus, a fit of the data can yield predictive equations.

    a. For the data in the table, calculate the molar heat of combustion of each hydrocarbon in kJ/mol. Place your results in the table above.
    b. Use the data in the table to create a linear regression of the molar heat of combustion versus the number of CH2 groups. (A computational sketch follows this exercise.)
       slope = _________________ ± _______________
       intercept = _________________ ± ____________
       corr = ______________
    c. Use your linear regression fit to estimate the heat of combustion of ethane, CH3CH3, with error.
       Heat of combustion of ethane = __________________ ± _______________ kJ/mol
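A minimal sketch (my addition, not the manual's solution) for the last exercise: it converts each heat to kJ/mol, then fits the molar heat of combustion against the number of CH2 groups by unweighted least squares, with standard errors for the fitted parameters. Since ethane has zero CH2 groups, its predicted heat of combustion is simply the intercept.

    import math

    # (molecular weight, mass in g, CH2 groups, heat in J) from the table
    rows = [
        (44.097, 0.5515, 1, 27778),
        (58.124, 0.5014, 2, 24801),
        (72.151, 0.4545, 3, 22305),
        (86.178, 0.4054, 4, 19489),
        (114.23, 0.3001, 6, 14345),
    ]

    xs = [n_ch2 for (_, _, n_ch2, _) in rows]
    ys = [(q / 1000.0) / (m / mw) for (mw, m, _, q) in rows]   # kJ/mol

    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx * sx

    slope = (n * sxy - sx * sy) / d
    intercept = (sy * sxx - sx * sxy) / d
    r = (n * sxy - sx * sy) / math.sqrt(d * (n * syy - sy * sy))

    # residual variance, then standard errors of slope and intercept
    s2 = sum((y - slope * x - intercept) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_slope = math.sqrt(n * s2 / d)
    se_intercept = math.sqrt(s2 * sxx / d)

    print(f"slope     = {slope:.0f} +/- {se_slope:.0f} kJ/mol per CH2 group")
    print(f"intercept = {intercept:.0f} +/- {se_intercept:.0f} kJ/mol")
    print(f"corr      = {r:.4f}")
    print(f"ethane estimate (0 CH2 groups) = {intercept:.0f} kJ/mol")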