Does Having Had a Course in Logic Help Those …



Exploring the Relationship between Logic Background and Performance on the Analytical Section of the GRE

Elizabeth L. Bringsjord, Ph.D.

Assistant Provost, Office of Academic Affairs

State University of New York, Albany NY

Selmer Bringsjord, Ph.D.

Professor of Logic, Cognitive Science, and Computer Science

Associate Chair, Department of Cognitive Science

Director, Minds and Machines Laboratory

Rensselaer Polytechnic Institute (RPI), Troy NY

The test description for the Analytical section of the Graduate Record Examination (GRE)—a test of general reasoning ability that includes both logical reasoning (LR) and analytical reasoning (AR) items—states that the section does not assume any formal training in logic. What is not discussed in that description is the relationship between logic background and performance—that is, does the section test, at least in part, for the kind of reasoning taught in symbolic logic courses, or only for a kind of reasoning acquired more generally? Is there any advantage accruing to examinees formally trained in logic? If so, what sort of advantage: speed, accuracy, or both? Would such an advantage suggest that the Analytical section of the GRE is coachable by way of logic training, in any meaningful sense? The purpose of this study was to explore the relationship between logic background and performance on the analytical section of the GRE. Specifically, the following questions were investigated: Is formal training in logic associated with shorter response time on both GRE Analytical item types? One of them? Neither? Likewise, is formal logic training associated with greater accuracy? Does test mode—i.e., computerized-adaptive (CAT) versus paper-and-pencil (P&P)—moderate these effects in any way? If so, how? How do problem-solving strategies differ, if at all, between those examinees with extensive logic background and those with less training? Do examinees point to training in logic as important or useful for this section of the GRE?

Using a unified theory of cognition and an information-processing framework, this study aimed to dissect the cognitive processes of individuals with varying levels of logic preparation exposed to two different test environments. Accordingly, the study adopted Anderson's (1983, 1989, 1993; Anderson & Lebiere, 1998) theory of cognition, Adaptive Control of Thought-Rational (ACT-R), and focused on the mental activity of examinees taking the analytical section of the GRE. Predictions based on ACT-R were tested. For example, ACT-R predicts that extensive background in logic should be associated with shorter response times and increased accuracy, provided that background is relevant to the tasks, because of higher activation of declarative and procedural memory. According to ACT-R, humans almost always isolate one goal at a time and then invoke a production (an if-then rule) for realizing that goal. Often more than one production can be used to reach a goal. In ACT-R, mediation among competing goals and productions is controlled through a process called "conflict resolution": each time a goal becomes the current goal, a search for a production is triggered. Most of the time more than one production will be found, and conflict resolution will begin. One testable consequence of this cognitive processing is reduced response time for individuals with extensive background, because they have access to well-rehearsed, relevant productions; can more easily identify the most efficient procedures; and are less likely to experience prolonged cognitive conflict that would slow processing. The theory further predicts that additional cognitive demands, for example a cognitively demanding test mode (e.g., CAT), may moderate the relationship between relevant background knowledge and test performance.

This study is part of a larger investigation (E. Bringsjord, 2001a, 2001b, 2000) exploring individual differences (including differences related to logic background) in examinee experience, particularly the cognitive experience, of taking the GRE Analytical test in CAT versus P&P environments. The methodology included both qualitative and quantitative dimensions. Data were collected using paper-and-pencil instruments, videotaped recordings of behavior, observation, and interviews. In addition, verbal protocol analysis (Ericsson & Simon, 1993) was used to elucidate differences in cognitive processes across individuals with varying logic backgrounds and across test conditions. Also included in the discussion are findings from a relevant study by Rinella, Bringsjord, and Yang (2001), which demonstrated pretest-posttest differences, following instruction in symbolic logic, among undergraduate college students (n = 100) on reasoning tasks purported by Cheng and Holyoak (1985) to be insensitive to such training.

Theoretical Framework

Anderson’s ACT-R

This study took John Anderson's (1993; Anderson & Lebiere, 1998) Adaptive Control of Thought-Rational (ACT-R) architecture to be an accurate model of information-processing-based cognition, particularly the cognition involved in problem solving. But what is ACT-R? Why is it appropriate for this study? And what does it imply in connection with the performance of examinees tackling the Analytical section of the GRE?

ACT-R is intended by Anderson (Anderson & Lebiere, 1998) to fulfill Alan Newell's (1973) dream that psychology would eventually yield an information-processing model of such complexity and comprehensiveness that it would capture all of human cognition. "ACT-R consists of a theory of the nature of human knowledge, a theory of how this knowledge is deployed, and a theory of how this knowledge is acquired" (Anderson & Lebiere, 1998). Another way to put this is that ACT-R (or perhaps a descendant), if meticulously implemented in a computer program, would give rise to an artificial agent as smart and flexible as a human person. Obviously, if ACT-R is good enough to model human cognition completely, it should capture the information processing engaged in by examinees taking the GRE and other such tests.

ACT-R is the result of more than 20 years of refinement of previous architectures. The sequence of such architectures starts with ACTE (Anderson, 1976), moves to ACT* (i.e., 'ACT Star'; Anderson, 1983), then to ACT-R version 2.0 (Anderson, 1993), and finally to ACT-R version 4.0 (Anderson & Lebiere, 1998), the theory used here. This sequence grew out of the Human Associative Memory (HAM) theory of human declarative memory (Anderson & Bower, 1973). HAM was much more limited than even the first member of Anderson's 'ACT' sequence, ACTE. The fundamental reason is that ACTE and its descendants postulate much more than declarative memory; for example, they assume that human cognition also involves processing over IF-THEN rules called productions (or production rules), which are described below. Thus, ACT-R assumes both declarative and procedural memory.

Figure 1 provides an overview of ACT-R. As that diagram indicates, ACT-R is composed of four main components: Current Goal, Goal Stack, Declarative Memory, and Procedural Memory. The Current Goal and Goal Stack together amount to what is often called "working memory" in cognitive psychology: a "working scratchpad" that holds sensory information coming in from the environment and the results of processing that information (sometimes in conjunction with information from long-term memory). The Goal Stack holds the hierarchy of those things the agent intends to reach or make happen. Anderson and Lebiere (1998) liken the Goal Stack to a stack of trays at a cafeteria: the first one in is the last one out, and the last one in is the first one out.

Figure 1

Architecture of Anderson’s (Anderson & Lebiere, 1998) ACT-R cognitive model

In the case of goals, when one is “popped,” or removed, the next most recent is retrieved. A goal stack records the hierarchy of intentions, in which the bottom goal is the most general and the goals above it are subgoals set in its service. (Anderson & Lebiere, 1998, p. 10)

The ‘Current Goal’, as its label suggests, reflects that which the agent has encoded and is currently focused on obtaining. To take an example given by Anderson and Lebiere (1998), suppose a human agent is confronted with an arithmetic problem, say 3 + 4 = ? If the agent has "zeroed in" on this problem, and has for the time being left goals like "Pass my upcoming math test" out of current processing (and hence in the goal stack), then the Current Goal will be set to "Solve the arithmetic problem: 3 + 4 = ?" Once the current goal is fixed, productions (i.e., production rules) that relate to the Current Goal are retrieved and activated. This may strike the reader as a trivial example, but problems on the analytical section of the GRE, particularly the analytical reasoning items, would seem to give rise to very similar embodiments of ACT-R in those who try to solve them.
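To make this goal-and-production machinery concrete, the following is a minimal sketch, in Python, of how a current goal such as "Solve 3 + 4 = ?" might trigger two competing productions and how conflict resolution might select between them. The goal encoding, the two toy productions, and the utility values are illustrative assumptions made here; they are not ACT-R's actual implementation or notation.

# Minimal sketch of ACT-R-style goal processing (illustrative assumptions only).

goal_stack = ["pass upcoming math test"]          # deferred goals (last in, first out)
current_goal = {"type": "solve-arithmetic", "left": 3, "right": 4}

def retrieve_addition_fact(goal):
    facts = {(3, 4): 7, (5, 6): 11}               # toy declarative memory
    return facts.get((goal["left"], goal["right"]))

def count_up_from_left(goal):
    total = goal["left"]
    for _ in range(goal["right"]):                # slower counting strategy
        total += 1
    return total

# Two competing productions (IF-THEN rules) that both match the current goal.
productions = [
    {"name": "retrieve-fact", "action": retrieve_addition_fact, "utility": 0.9},
    {"name": "count-up",      "action": count_up_from_left,     "utility": 0.4},
]

# Conflict resolution: when more than one production matches, pick the one with
# the highest utility (a stand-in for ACT-R's expected-gain computation).
chosen = max(productions, key=lambda p: p["utility"])
answer = chosen["action"](current_goal)
print(chosen["name"], answer)                     # -> retrieve-fact 7

# Once the current goal is satisfied it is "popped" and the next goal resumes.
next_goal = goal_stack.pop()

On this picture, an examinee with well-rehearsed, relevant productions both has the efficient rule available and settles the conflict quickly, which is the mechanism behind the response-time prediction stated earlier.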

Now what about ‘Declarative Memory’? This area of the architecture holds factual, propositional information that remains fairly stable over time, as well as episodic information. For example, you know that George W. Bush is President of the United States, that New York City is north of Miami, that 56 multiplied by 2 is 112, that such-and-such happened to you last holiday season, and that you ate dinner last night after doing such-and-such, which happened after you did this-and-that. Figure 2 shows a network representation of the fact that 3 + 4 equals 7. If you study this diagram for a minute, you might wonder what the ‘Bi’ and ‘Sji’ labels are for. ‘Bi’ is a variable that holds the base-level activation of the fact. (As you read this, a fact such as that the GRE includes an analytical section probably has a high activation level in your mind, whereas a fact like "Buffalo is west of Boston" probably has a low level.) ‘Sji’ is also a variable, one whose values indicate the strength of the association between the concepts at the start of the arrows (3, 4, 7) and the fact itself (3 + 4 = 7). (The connection between the number 3 and the fact that 3 + 4 = 7 is probably fairly strong for you, whereas the associative connection between ‘green’ and this fact is perhaps as low as zero.) Presumably, level of background knowledge can affect these quantities. For example, if an examinee recognizes the structure of a logical reasoning item as a "classic" instance of a logical fallacy, then the relevant rules of inference may acquire higher activation levels, and incidental information relating to (say) the contextual features of the problem should have lower activation levels.

Figure 2

Example of Declarative Information in ACT-R architecture
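The Bi and Sji quantities pictured in Figure 2 combine in ACT-R's activation equation, in which a chunk's activation is its base-level activation plus a weighted sum of associative strengths from the elements currently attended to (Anderson & Lebiere, 1998). The short sketch below, with invented numbers, shows that computation for the 3 + 4 = 7 fact.

# Sketch of ACT-R chunk activation: A_i = B_i + sum over j of W_j * S_ji.
# The numerical values below are invented for illustration.

B_i = 1.2                                 # base-level activation of "3 + 4 = 7"
S_ji = {"three": 0.8, "four": 0.8, "seven": 0.6, "green": 0.0}   # association strengths
W_j = {"three": 0.5, "four": 0.5}         # attentional weights of the current sources

A_i = B_i + sum(W_j[j] * S_ji[j] for j in W_j)
print(A_i)                                # approximately 2.0; higher activation means faster, more reliable retrieval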

One last point about declarative memory: it was said above that this memory is “propositional” in nature. This is indeed true, despite the rather exotic-looking network in Figure 2. To realize this, you have only to note that sometimes information in declarative memory is represented by Anderson and his colleagues in textual rather than network form, as in Figure 3 below.

Figure 3

Textual representation of an addition fact (from Anderson & Lebiere, 1998, p. 23)

Fact3+4
    isa ADDITION-FACT
    addend1 Three
    addend2 Four
    sum Seven

Now for ‘Procedural Memory’, which is one of the distinctive aspects of ACT-R. This component is composed of procedural knowledge, which essentially means “knowing how” (rather than “knowing that,” which is the kind of knowledge in declarative memory). Procedural memory stands at the very heart of ACT-R because it points toward the basic building block for the architecture as a whole, viz., a production. A production is a conditional, or an IF-THEN statement; it says that if some condition C is true then some action A should be performed. It is very important to realize that the actions involved needn’t be physical actions; they can be mental in nature. And productions can be chained together.
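As a hedged illustration of such chaining, the sketch below shows one toy production whose action sets a subgoal that a second production then satisfies. The rule names and goal encodings are invented for this example and do not reproduce ACT-R's production syntax.

# Two illustrative IF-THEN productions chained through a subgoal (toy example).

memory = {(3, 4): 7}                       # a declarative addition fact

def add_column(goal):
    # IF the goal is to add a column and the fact is retrievable,
    # THEN note the sum and set a subgoal to record it.
    total = memory[(goal["a"], goal["b"])]
    return {"type": "record-answer", "value": total}

def record_answer(goal):
    # IF the goal is to record an answer, THEN perform the (mental or motor) action.
    print("answer:", goal["value"])

subgoal = add_column({"type": "add-column", "a": 3, "b": 4})
record_answer(subgoal)                     # the second production fires on the subgoal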

Cognitive Load and Interference in CAT-Based Analytical Reasoning Items

When certain types of items are rendered in CAT mode, the examinee is forced to work in multiple modes that are not present when the same items are presented in traditional P&P form. For example, the analytical reasoning (AR) items on the GRE Analytical subtest "scream out" for scrap paper on which to carry out diagrammatic reasoning. (Kaplan, the Princeton Review, and other test-preparation companies all prepare students for these items by trying to teach them how to use diagrams on scrap paper. See also Stenning, Cox, and Oberlander's 1995 paper regarding diagrams created by subjects attempting to solve similar problems.) This means that the moment such items are rendered in CAT form, test-takers are confronted with the prospect of having to manage pencil, paper, keyboard, and computer screen at once. The increase in cognitive load is intuitively evident. Presumably the behavior of subjects on the test in question will provide data sufficient to ascertain whether there has in fact been interference and additional load. In particular, response time should be affected by such conditions.

In summary, ACT-R suggests that extensive logic preparation is likely to be associated with enhanced performance on the GRE Analytical section, a test of reasoning, while CAT, a cognitively demanding test mode, is likely to be associated with reduced performance: the former because of higher activation of relevant declarative and procedural knowledge, the latter because of cognitive interference.

Methodology

Subjects

College undergraduates who had never taken the Graduate Record Exam were recruited from Rensselaer Polytechnic Institute, Troy, NY. A total of 42 volunteer subjects participated, but two subjects had to be dropped, one due to equipment malfunction and the other due to failure to complete the study, leaving a total sample of 40 subjects (n = 20 per group). Subjects ranged in age from 18 to 21 years with a mean of 19.67 years. All subjects were full-time students; more than half indicated they were pursuing a dual major and roughly one third identified a minor. Academic majors were predominantly computer science and/or engineering (55%), with 35% of subjects identifying information technology majors and the remaining 10% identifying management or mathematics-related majors. The sample was largely male (80% or n = 32) and predominantly White, non-Hispanic (77.5% or n = 31), with 17.5% Asian (n = 7) and 5% Black (n = 2). Subjects were randomly assigned to one of two testing conditions (n = 20 per group). All took the analytical section of the Graduate Record Examination, one group using the POWERPREP® software (Educational Testing Service, 1997) and the other using a paper-and-pencil version. At the end of the testing session, all subjects received a printed score report along with information about score interpretation.

Design

The cognitive experience of examinees, with varying levels of logic background, exposed to two modes of testing was studied using a mixed-method design (Greene, Caracelli, & Graham, 1989). The quantitative portion of the study included pretest and posttest measures as well as data collected during the testing session (e.g., response time per item). The qualitative aspects of the design primarily involved observational data collected during the test, analysis of artifacts (e.g., scrap paper), and posttest interviews, including verbal protocol analysis for two analytical reasoning items on each subject's test. All subjects were video recorded throughout the testing session and interviewed briefly afterward. Thus, qualitative and quantitative data were collected on all subjects. It was anticipated that the qualitative data would complement the quantitative data by providing a 'window' into the cognitions of these test-takers. Furthermore, the procedure provided an opportunity to validate qualitative self-report data against simultaneously collected measures such as mean response time and accuracy.

This study followed an independent two-group true experimental design in which subjects were randomly assigned to one of two treatment conditions. The design is depicted schematically in Table 1.

Table 1

Research Design

|Assignment |Group |Before |Treatment |After |

|R |Experimental 1 |O1 |X1 |O2 |

|R |Experimental 2 |O1 |X2 |O2 |

Before-treatment observations (O1) included background in logic and SAT score. Treatment was one of two testing modes: X1 represents the CAT mode; X2 represents the paper and pencil mode. Post-treatment observations (O2) included test scores, response time, scrap paper use, and self-reported cognitions.

Analytical Section of GRE

Paper-and-pencil and computerized-adaptive versions of the analytical section of the Graduate Record Examination (GRE) were administered to subjects assigned randomly to one of two experimental groups. The tests used were those published by the Educational Testing Service itself: the POWERPREP (Educational Testing Service, 1997) software bundle for the CAT group, and General Test Form GRE 90-16 (Educational Testing Service, 1990) for the P&P group.

As the name implies, the analytical section(s) of the GRE are intended to measure the ability to think analytically. Two types of items are found in the analytical subtest: analytical reasoning (AR) items and logical reasoning (LR) items. AR items test

…the ability to understand a given structure of arbitrary relationships among fictitious persons, places, things, or events, and to deduce new information from the relationships given. Each analytical reasoning group consists of (1) a set of about three to seven related statements or conditions (and sometimes other explanatory material) describing a structure of relationships, and (2) three or more questions that test understanding of that structure and its implications. Although each question in a group is based on the same set of conditions, the questions are independent of one another; answering one question in a group does not depend on answering any other question. (Educational Testing Service, 1994, p. 34)

And LR items test

…the ability to understand, analyze, and evaluate arguments. Some of the abilities tested by specific questions include recognizing the point of an argument, recognizing assumptions on which an argument is based, drawing conclusions and forming hypotheses, identifying methods of [an] argument, evaluating arguments and counter-arguments, and analyzing evidence. (Educational Testing Service, 1994, p.40)

Procedure

After obtaining informed consent, subjects were asked to complete a questionnaire that included demographic and other background information, such as background in logic and computer experience—including familiarity with computer-based testing. Following the collection of background and baseline information, each subject received a set of scripted directions designed to simulate a true test-taking situation. The script also made use of imagery. Subjects were told to imagine they were about to take the Graduate Record Exam for real, that the test was very important to them, and that their performance would be a major factor in determining whether they were accepted into their first choice graduate program. Subjects were instructed to provide their best answer to each question. In order to encourage effortful participation (a concern in any study when the test doesn’t really “count”), subjects were told that participation would likely benefit them in two ways. First, they would be better able to gauge their level of preparedness; and second, they would get practice taking part of the GRE, which could lead to improved performance on the actual test. All subjects received a score report at the end of the testing session with score interpretation information. The researcher discussed the score report with each subject and answered questions about the report and the test. It was hoped that this ‘self-diagnostic’ aspect of the study would provide sufficient intrinsic motivation for optimum performance.

After appropriate instruction, the test session began. In both test conditions, all relevant activity was video recorded for subsequent analysis. Immediately after subjects finished the testing session (typically 60 minutes), they completed an investigator-created questionnaire concerning their perceptions of and attitudes toward the test and testing mode. Finally, all subjects were interviewed using a short verbal protocol of approximately 15 minutes, during which they solved two problems from the first analytical reasoning (AR) problem set encountered on the test.

Results

Logic Background and Response Time

Did subjects with extensive relevant background knowledge (e.g., procedural knowledge as might be acquired through a GRE prep course or content knowledge from coursework in logic) have shorter mean response times compared to those with less background, as predicted by Anderson’s ACT-R?

None of the subjects in this study reported having taken a GRE preparation course, nor did any report preparing for the GRE on their own. In fact, most of the subjects indicated that they did not know what the Graduate Record Exam was. On the other hand, ten (n = 10) subjects had taken two or more college-level courses in logic, which was used as a proxy for "extensive relevant background knowledge." Table 2 shows that subjects with more background in logic (i.e., two or more courses) did indeed have shorter response times on the logical reasoning (LR) items than both subjects with no background (n = 16) and subjects with some preparation (n = 14). In addition to the total sample, this difference held for each experimental group; it also held after adjusting the means using the SAT combined score as a covariate. Consistent with the hypothesis, pairwise comparisons revealed a significant difference in mean response time between subjects with extensive preparation in logic (i.e., two or more courses) and those with some preparation at p < .05. Although the difference in mean response time between subjects with extensive preparation in logic and those with "no preparation" was in the direction predicted by ACT-R (i.e., those with extensive background had shorter mean response times on LR items), the difference did not reach statistical significance.

Table 2

Mean and Adjusted Mean Response Time on Logical Reasoning (LR) Problems across Level of Logic Background for Experimental Groups and All Subjects

|Group |Background in Logic |Mean LR Response Time (sec) |N |SD |Adjusted Mean LR Response Time (sec)a |Standard Error |
|Computerized Adaptive Testing |No preparation |87.45 |8 |18.72 |87.58 |5.17 |
| |Some coursework |89.47 |6 |18.53 |90.18 |5.97 |
| |2 or more college courses |77.21 |6 |21.23 |81.36 |6.07 |
|Paper and Pencil Testing |No preparation |75.53 |8 |5.57 |71.06 |5.30 |
| |Some coursework |83.12 |8 |22.02 |84.40 |5.18 |
| |2 or more college courses |67.22 |4 |6.42 |66.01 |7.32 |
|All subjects* |No preparation |81.49 |16 |14.69 |79.32 |3.70 |
| |Some coursework |85.84 |14 |20.09 |87.29 |3.96 |
| |2 or more college courses |73.21 |10 |17.05 |73.68 |4.73 |

a Means adjusted using SAT composite score as the covariate

* Adjusted means for subjects with 2 or more college courses vs. those with some coursework differ significantly at p < .05.
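For readers who wish to reproduce this style of analysis, the following is a minimal sketch of a one-way analysis of covariance of LR response time on logic background, with SAT composite as the covariate, using Python's statsmodels library. The data frame, its column names, and the values in it are assumptions invented for illustration; they are not the study's data file.

# Hedged sketch of the kind of ANCOVA summarized in Table 2 (toy data).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "lr_rt":    [87.4, 89.5, 77.2, 75.5, 83.1, 67.2],   # LR response time (sec)
    "logic_bg": ["none", "some", "extensive"] * 2,       # level of logic background
    "sat":      [1210, 1180, 1340, 1250, 1190, 1400],    # SAT composite (covariate)
})

# Response time modeled as a function of logic background, adjusting for SAT.
model = smf.ols("lr_rt ~ C(logic_bg) + sat", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # F test for the background factor with SAT held constant
print(model.params)                      # background effects after adjustment for the covariate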

An unexpected finding was that those subjects who reported “no preparation” in logic had shorter response times on LR items, within and across experimental groups, than those subjects reporting they had taken some coursework. Given that logic instruction is frequently embedded in kindergarten through 12th grade math curricula throughout the United States, and in particular in New York (where 35% of participants reported they attended high school), some subjects may not have recognized that they did in fact have some formal preparation in logic.

As discussed below, a scatter plot showing GRE Analytical scores across level of logic background (see Figure 6) suggested a trend consistent with this explanation; that is, the deviations of scores from the line of “best fit” are greatest at the lowest level of logic preparation.

Figure 4 below depicts graphically the adjusted (using SAT composite as the covariate) mean response times on LR items across levels of logic background for CAT and P&P subjects. The trends across level of logic preparation for the two experimental groups are quite similar, although mean response time for CAT subjects tended to be longer than that for P&P subjects, regardless of level of logic preparation.

Figure 4

Adjusted Mean Response Time on Logical Reasoning (LR) Problems across Level of Logic Background for CAT versus P&P Experimental Group

In sharp contrast to the trends noted on LR items, AR response time across level of logic background—specifically, at the higher end—showed divergent patterns for the two testing modes (see Table 3 and Figure 5). That is, CAT subjects with extensive background in logic had longer mean and adjusted mean response times than their less-prepared counterparts, while P&P subjects with extensive background had shorter response times than those with some course work. Although the graphed means would seem to suggest an interaction, no significant interaction was noted.

Table 3

Mean and Adjusted Mean Response Time on Analytical Reasoning (AR) Problems across Level of Logic Background for Experimental Groups and All Subjects

|Group |Background in Logic |Mean AR Response Time (sec) |N |SD |Adjusted Mean AR Response Time (sec)a |Standard Error |
|Computerized Adaptive Testing |No preparation |100.14 |8 |25.99 |100.23 |5.97 |
| |Some coursework |112.99 |6 |20.30 |113.45 |6.89 |
| |2 or more college courses |128.53 |6 |21.38 |131.22 |7.00 |
|Paper and Pencil Testing |No preparation |78.61 |8 |3.72 |75.72 |6.12 |
| |Some coursework |87.99 |8 |12.66 |88.82 |5.98 |
| |2 or more college courses |78.85 |4 |11.63 |78.07 |8.45 |
|All subjects* |No preparation |89.37 |16 |21.10 |87.97 |4.27 |
| |Some coursework |98.70 |14 |20.24 |101.14 |4.57 |
| |2 or more college courses |108.66 |10 |30.94 |104.65 |5.46 |

a Means adjusted using SAT composite score as the covariate

* Means for subjects with 2 or more college courses vs. those with no preparation and those with some coursework differ significantly at p < .05.

Figure 5

Adjusted Mean Response Time on AR Problems across Level of Logic Background for CAT versus P&P Experimental Group

Although the differences in mean AR response times between levels of logic preparation were, for the most part, in the direction opposite to that predicted by ACT-R, the differences were significant, with F(2, 33) = 3.56, p = .04. Specific pairwise comparisons between subjects with "no preparation" and those with some coursework, and between subjects with "no preparation" and those with two or more college courses, were also statistically significant at p < .05. Thus, whereas extensive background in logic appeared to be associated with decreased response time on LR items, as predicted, it was associated with longer mean response time on AR items for CAT subjects and shorter mean response times for P&P subjects.

Logic Background and Performance (Accuracy)

To explore whether logic background was associated with improved performance as measured by GRE Analytical score, first, level of logic preparation was treated as a continuously distributed variable. That is, levels of logic preparation (i.e., 1 = no preparation, 2 = high school course, 3 = college course, 4 = two college courses, and 5 = more than two college courses) were assumed to represent equal intervals along a degree-of-logic-preparation scale. Next, a Pearson product-moment correlation was calculated between GRE Analytical score and logic preparation. The calculation revealed that more extensive logic preparation was associated with higher GRE Analytical scores (r = .361, p = .022).
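As a hedged illustration of this step, the short sketch below computes a Pearson correlation between the 1-to-5 preparation codes and GRE Analytical scores using scipy. The score values shown are invented, not the study's data (which yielded r = .361, p = .022).

# Sketch of the correlational analysis (invented scores; preparation codes as defined above).
from scipy.stats import pearsonr

logic_prep = [1, 1, 2, 3, 3, 4, 5, 5]      # 1 = no preparation ... 5 = more than two college courses
gre_scores = [520, 610, 580, 640, 600, 690, 720, 740]

r, p = pearsonr(logic_prep, gre_scores)
print(f"r = {r:.3f}, p = {p:.3f}")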

Figure 6

Scatter plot of GRE Analytical Scores across Levels of Logic Background with a Line of “Best Fit” Superimposed

A scatter plot of GRE Analytical scores across levels of logic background is shown in Figure 6. Superimposed on the plot is a line of "best fit" which suggests (as supported by the correlation coefficient) a positive relationship between the two variables. Of interest, as noted previously, is that there is a much greater spread of scores on the left-hand side of the x-axis, among subjects reporting "no preparation" in logic. It may be the case, as already alluded to, that since logic instruction is often integrated into primary and secondary math curricula, some subjects may have underreported their preparation. On the other hand, it may also be the case that, given only pre-college logic instruction embedded in a K-12 math curriculum, a wide range of actual "logic background" is likely to be associated with that level of preparation.

To further examine the relationship between logic background and performance, a one-way ANOVA was conducted which revealed significant group differences across the three levels of logic background (i.e., no coursework, some coursework, and 2 or more courses), with F(2,37) = 3.484, p = .041. Consistent with the correlational analysis and scatterplot (Figure 6), higher GRE Analytical scores were associated with stronger background in logic for subjects across and within groups (see Table 4). When ability was controlled for, using the SAT composite score as the covariate, those subjects with the strongest background in logic (i.e., two or more courses) outperformed their less-prepared colleagues. Subjects who reported “no preparation” in logic had higher adjusted mean scores than those with “some coursework” in the paper-and-pencil group; this difference carried over into the total sample as well. Once again, embedded K-12 logic instruction may offer some explanation.

Table 4

Mean and Adjusted Mean GRE Analytical Score across Level of Logic Background for Experimental Groups and All Subjects

|Group |Background in Logic |Mean GRE Analytical Scaled Score |N |SD |Adjusted Mean GRE Analytical Scaled Score a |Standard Error |
|Computerized Adaptive Testing |No preparation |638.75 |8 |107.63 |637.72 |28.17 |
| |Some coursework |653.33 |6 |72.33 |647.90 |32.54 |
| |2 or more college courses |726.67 |6 |47.61 |694.81 |33.06 |
|Paper and Pencil Testing |No preparation |598.75 |8 |146.33 |632.95 |28.88 |
| |Some coursework |606.25 |8 |129.83 |596.42 |28.23 |
| |2 or more college courses |717.50 |4 |41.93 |726.75 |39.87 |
|All subjects* |No preparation |618.75 |16 |125.80 |635.34 |20.16 |
| |Some coursework |626.43 |14 |108.03 |622.16 |21.56 |
| |2 or more college courses |723.00 |10 |43.22 |710.78 |25.80 |

a Means adjusted using SAT composite score as the covariate

* Means and adjusted means for subjects with 2 or more college courses differ significantly from those with some coursework and those with no preparation at p < .05.

Figure 7 depicts graphically the adjusted mean GRE Analytical scores (using SAT composite as the covariate) across level of logic background for CAT and P&P subjects. Although the graphed means intersect, the interaction failed to reach statistical significance at p < .05. A univariate test of the effect of logic background, based on linearly independent pairwise comparisons of marginal adjusted means, was significant, with F(2, 33) = 3.877, p = .031. In particular, those with two or more courses in logic had significantly higher mean scores than their less-prepared peers even when scores were adjusted for general ability (using the SAT covariate). Test mode effects were not significant. Thus, regardless of test mode, subjects with two or more courses of logic preparation outperformed their less-prepared counterparts.

Figure 7

Adjusted Mean GRE Analytical Score across Levels of Logic Background for CAT versus P&P group

How do problem-solving strategies (on GRE analytical items) differ, if at all, between those examinees with extensive logic background and those with less training?

To answer this question, scrap paper use and verbal protocols were analyzed. Scrap paper analysis was carried out for all subjects (n = 40). Although all subjects participated in a posttest think-aloud, due to time constraints a sub-sample (n = 14) of verbal protocols was randomly selected for in-depth analysis.

Scrap paper analysis revealed that virtually no subjects used scrap paper for LR items. In sharp contrast, there was extensive reliance on scrap paper to solve AR problems across logic background, with nearly the same amount of reliance among those subjects with some course work in logic (M = 97.79% of AR problems) and those with 2 or more courses (M = 97.08%), followed by those with no preparation (M = 92.24%). Overall, CAT subjects tended to use scrap paper more extensively than their P&P counterparts. However, ANOVA revealed no significant interaction between level of logic background and test mode.

Differences were also noted with respect to the detail of problem representation on scrap paper. Subjects with "no preparation" in logic had the lowest mean modeling score for AR problems at 3.13 (on a 5-point scale, 5 being the most sophisticated), vs. 3.57 for those with some coursework and 3.33 for those with 2 or more logic courses.

Ten subjects employed symbolic logic to answer questions. Interestingly, those who did so had higher mean scores than their less logic-driven peers (M = 683.00, SD = 116.05 vs. M = 635.67, SD = 108.97). As seen in the graphed means below (Figure 8), those subjects who used symbolic logic had slightly higher GRE scores across all levels of logic background.

Figure 8

Adjusted Mean GRE Scores for Subjects who Used Symbolic Logic to Solve AR Problems across Levels of Logic Preparation

Perhaps contributing to the success of subjects using logic was that they tended to model AR problems on scrap paper in more detail than others (M = 3.7 vs. M = 3.2, on a 5-point scale with 5 being the most detailed); this difference was significant, with F(1, 38) = 5.008, p = .031.

Do examinees point to training in logic as important or useful for this section of the GRE?

On the posttest questionnaire, when asked to indicate their level of agreement with the statement “My reasoning skills are strong,” those subjects with 2 or more college courses indicated stronger agreement (M = 4.1 on a 5-point Likert scale with 5 being the strongest level of agreement) than those with some course work (M = 3.7) and those with no preparation in logic (M = 3.6).

Discussion

Consistent with predictions derived from Anderson’s ACT-R, subjects with extensive background knowledge in logic (i.e., 2 or more college courses) had shorter mean response times on logical reasoning (LR) items than did subjects with less background (i.e., some coursework). The difference in mean response time reached statistical significance for the total sample; this finding suggests that relevant background knowledge—both declarative and procedural—may have freed up working memory so that problem solving was more efficient in the case of LR items.

On the other hand, logic background was generally not associated with shorter response times on analytical reasoning (AR) items. In fact, a significant difference in the inverse direction was noted for CAT subjects: that is, subjects with two or more college courses in logic showed significantly longer response times than their less-prepared counterparts. Three possible explanations are offered: 1) Logic background may be less salient with AR problems than with LR problems; 2) Those with logic background may have taken more care in representing and working out AR problems—especially if speed was less critical; and/or 3) Test mode effects may be more salient with AR items than with LR items.

Let’s examine the first explanation. Whereas LR items call primarily for analysis of a line of reasoning or for determining the logical coherence of arguments, AR items call for the sorting and organization of conditions, often requiring mechanical spatial manipulation in the form of diagrammatic representation. In the verbal protocols, several subjects indicated that the AR items seemed relatively easy, once you plugged all the information into a diagram and worked it through, compared to LR items which, according to one subject, "made you think." Recall also that subjects were much less likely to use scrap paper on LR items, especially CAT subjects. Because of the number of variables that must be juggled, AR problems cannot easily be done in the head. In fact, the representation of conditions, either pictorially or syntactically, serves to free up working memory for determining the solution. A certain amount of time is required for transcription of problems, no matter how elementary the representation may be. But when the diagrams are done well, which may involve considerably more time, the deductions arise automatically.

With respect to the second explanation, in contrast to less well-prepared subjects, those with logic background may have taken more care in solving problems; indeed, they may have wanted to take problem solutions to another level once those solutions had been arrived at mechanically, so to speak. In other words, those trained in logic may not have been satisfied with a merely mechanical, plug-and-chug problem-solving approach, especially given the luxury of less speeded conditions, as was the case in the CAT mode. Verbal protocol analysis also revealed that CAT subjects often solved problems completely before referring to the possible answers. In any event, those trained in logic had higher scores than those with less preparation, so there appeared to be some advantage from logic background with respect to accuracy, if not consistently with respect to speed.

Finally, the possibility that test mode may be more salient on AR problems than on LR problems receives support from observational data and from subjects' posttest questionnaires. In general, subjects were observed to use scrap paper consistently for AR problems, whereas scrap paper was used very little on LR items; CAT subjects, however, were observed to use scrap paper on more items than P&P subjects (M = 97% vs. M = 93%, respectively). Use of scrap paper in the CAT mode entailed more transcription and many back-and-forth eye movements between screen and paper. When asked to indicate their level of agreement with the statement "It was frustrating to have to move back and forth between the test and scrap paper," CAT subjects indicated significantly stronger agreement than their P&P colleagues, with F(1, 38) = 7.997, p = .007.

The results of the present study are consistent with those obtained in a study conducted by Rinella, Bringsjord, and Yang (2001). There is insufficient space here to present the data from this study. The main point is that these three researchers found that students who took a first course in symbolic logic were generally able to solve reasoning problems long held—on the strength of a famous study carried out by Cheng and Holyoak (1985)—to be impervious to training in formal logic. Rinella et al. conducted a pretest-posttest experiment. Both these tests were formally similar; that is, from the standpoint of symbolic logic, the questions on the pretest and posttest had similar underlying mathematical structure—but the English that “clothed” this structure was different in the pre and post cases. Items on these two tests included not only items which Cheng and Holyoak (1985) had used in their own pre and posttests (e.g., the Wason selection task), but also items similar to those found in the analytical section of the GRE.

Implications

Background in logic seemed to have differential effects on AR problem response time across the two test modes. More extensive logic background was associated with decreased response time for the P&P group and increased response time for the CAT group. On LR problems, more extensive background in logic was associated with shorter response time overall, regardless of test mode. What are the implications of these differential effects? First, more research is needed. This study should be replicated with a larger sample size and with a pretest of logic, which would allow greater confidence in the relationship between logic background and response time. Second, there may be implications for test preparation. That is, if those trained in logic are more likely to spend more time working AR items out, because they choose to represent the problem both pictorially and syntactically (let us assume), this could jeopardize performance. On the other hand, logic background was associated with higher scores. Both of these findings have potential implications for practice in the form of appropriate guidance in test preparation materials.

At least potentially, there are other, "deeper" implications. It is one thing to conclude that studying symbolic logic can enable cognition that secures higher scores on standardized tests, but it is quite another to conclude that such training can help students become more successful than they would otherwise be in today's economy. The present study, at best, supports the former, more humble, conclusion. But perhaps it is safe to say that the present study, especially combined with the aforementioned concordant one carried out by Rinella, Bringsjord, and Yang (2001), does at least suggest that further research should be carried out to determine whether the second, far-reaching conclusion can in fact be justifiably drawn. In connection with this issue, a recent paper by Stanovich and West (2000) should at least be mentioned. Stanovich and West (2000) hold that "life is becoming more like the tests" (p. 714), and the tests they refer to include the GRE. The basic idea is that our increasingly "technologized" world demands cognition that is abstract and symbolic, rather than concrete and anchored to physical circumstances. As an example, they give the challenge of making a rational decision about how to apportion investments in a retirement fund, and they provide many other examples as well. If Stanovich and West (2000) are right (and there are others who perceive a link between mastery of symbolic logic and "real world" competency; e.g., see Adler, 1984), it may be that teaching symbolic logic from the standpoint of mental metalogic (Yang & Bringsjord, 2001) can give students an increased ability to thrive in the high-tech economy of today. Whether or not this is so will hinge on subsequent research, to which we hope to contribute.

References

Adler, J. E. (1984). Abstraction is uncooperative. Journal for the Theory of Social Behavior, 14, 165-181.

Anderson, J. R. (1976). Language, memory, and thought. Hillsdale, NJ: Lawrence Erlbaum Associates.

Anderson, J. R. (1983). The Architecture of Cognition. Cambridge, MA: Harvard University Press.

Anderson, J. R. (1989). A theory of human knowledge. Artificial Intelligence, 40, 313-351.

Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum.

Anderson, J. R. & Bower, G. H. (1973). Human associative memory. Mahwah, NJ: Lawrence Erlbaum Associates.

Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Lawrence Erlbaum.

Bringsjord, E. (2001a). Computerized-adaptive versus paper-and-pencil testing environments: An experimental analysis of examinee experience (Doctoral dissertation, University at Albany, State University of New York, 2001). Dissertation Abstracts International (in press).

Bringsjord, E. (2001b). Computerized-adaptive versus paper-and-pencil testing environments: An experimental analysis of examinee experience. Paper presented in the Distinguished Paper Series at the annual meeting of the American Educational Research Association, April 15, 2001, Seattle, WA.

Bringsjord, E. (2000, October). Computerized-Adaptive versus Paper-and-Pencil Testing Environments: An experimental analysis of examinee experience. Paper presented at the annual meeting of the Northeastern Educational Research Association, Ellenville, NY.

Cheng, P. W., & Holyoak, K. J. (1985). Pragmatic versus syntactic approaches to training deductive reasoning. Cognitive Psychology, 17, 391-416.

Educational Testing Service (1990). The Graduate Record Examinations General Test. Princeton, NJ: Educational Testing Service.

Educational Testing Service (1994). [“The official guide”] GRE: Practicing to take the General Tests (9th ed.). Princeton, NJ: Educational Testing Service.

Educational Testing Service (1997). POWERPREP®—Preparing for the GRE General Test. Princeton, NJ: Educational Testing Service.

Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.

Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a framework for mixed method evaluation designs. Educational Evaluation and Policy Analysis, 11, 255-274.

Newell, A. (1973). Production systems: Models of control structures. In W.G. Chase (Ed.), Visual information processing (pp. 463-526). New York, NY: Academic Press.

Rinella, K., Bringsjord, S., & Yang, Y. (2001). Efficacious logic instruction: People are not irremediably poor deductive reasoners. In K. Stenning & J. D. Moore (Eds.), Proceedings of the 23rd Annual Conference of the Cognitive Science Society (pp. 851-856). Mahwah, NJ: Lawrence Erlbaum.

Stanovich, K., & West, R. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23(5), 645-665, 701-726.

Stenning, K., Cox, R., & Oberlander, J. (1995). Contrasting the cognitive effects of graphical and sentential logic teaching: Reasoning, representation, and individual differences. Language and Cognitive Processes, 10(3/4), 333-354.

Yang, Y., & Bringsjord, S. (2001). Mental metalogic: A new paradigm for psychology of reasoning. In Proceedings of the 3rd International Conference on Cognitive Science (pp. 199-204). Hefei, China: Press of the University of Science and Technology of China.
