


Beauty is in the eye of the examiner: reaching agreement about physical signs and their value.

Anthony M. Joshua1,3, David S. Celermajer2,3, Martin R. Stockler1,3,4

1. Department of Medical Oncology, Sydney Cancer Centre, Royal Prince Alfred Hospital, Sydney, Australia.

2. Department of Cardiology, Royal Prince Alfred Hospital, Sydney, Australia.

3. Central Clinical School, Royal Prince Alfred Hospital, University of Sydney.

4. NHMRC Clinical Trials Centre, School of Public Health, University of Sydney.

Summary

Despite advances in other areas, evidence-based medicine has yet to make substantial inroads into the standard medical physical examination. We have reviewed the evidence on the accuracy and reliability of the physical examination and of common clinical signs. The available studies were of variable quality and covered standard techniques only incompletely. The physical examination includes many signs of marginal accuracy and reproducibility. These limitations may not be appreciated by clinicians, and may adversely affect decisions about treatment and investigations, as well as the teaching and examination of students and doctors-in-training. Finally, we provide a summary of points that can guide an accurate and reproducible physical examination.

Introduction

The physical examination is probably the most common diagnostic test used by doctors, and yet its accuracy and reliability have not been scrutinised with the same rigorous standards applied to other diagnostic modalities (which usually rely on calibrated machines and technology rather than on “artful” physicians). Reliability and accuracy are different measures: the findings of two doctors may agree (be reliable) yet be wrong (inaccurate) when assessed objectively. The overall physical examination has not been reviewed using these criteria for over a quarter of a century.[i] However, parts of the examination have been reviewed individually in The Rational Clinical Examination series in JAMA, and for a number of conditions the combination of history taking and physical signs, and their integration into a syndromal diagnosis, has recently been reviewed.[ii],[iii] As previous reviewers have found, the evidence in this area is of variable quality and there are relatively few studies of high quality.[iv]

Forty years ago, up to 88% of all primary care diagnoses were made on history and clinical examination[v], and even 20 years ago up to 75% of all diagnoses in a general medicine clinic were made using these tools.[vi] Although these percentages may be lower still today, the physical examination will always retain its importance as an essential tool of modern practice. By formally reviewing various aspects of the physical examination, we hope to give clinicians a greater appreciation of its real value, allowing them to refine their examination technique and, in turn, their ordering and prioritising of appropriate investigations.

Methods

We searched MEDLINE using an iterative strategy that combined keywords such as “kappa”, “sensitivity”, “specificity” or “likelihood ratio” with the MeSH subject heading “physical examination”, with keywords for various physical examination techniques such as “auscultation”, “palpation” or “percussion”, and with keywords for various physical examination findings such as “heart sounds”, “ascites”, “hepatomegaly”, “palsy” and “paraesthesia”. Reference lists were scanned from previous meta-analyses and systematic reviews, and from major textbooks of the physical examination and of internal medicine subspecialties. Articles were limited to those in the English language.

Statistics used in this Article

Kappa is an index that describes the level of agreement beyond that expected by chance alone and can be thought of as the chance-corrected proportional agreement. Possible values range from +1 (perfect agreement) through 0 (no agreement beyond that expected by chance) to −1 (complete disagreement). As a guide, the level of agreement reflected by a Kappa value of 0 to 0.2 is “slight”, 0.21 to 0.4 “fair”, 0.41 to 0.6 “moderate”, 0.61 to 0.8 “substantial” and 0.81 to 1 “almost perfect”.[vii] The need for this statistic arises because of the substantial rate of agreement expected by chance alone. For example, if two physicians each independently consider half the cases they see abnormal, they will agree by chance alone 50 percent of the time (both calling a case abnormal 25 percent of the time, and both calling it normal another 25 percent). The drawback of Kappa is that it varies with prevalence; the level of agreement expected by chance varies according to the proportion of cases considered abnormal across observers.[viii] For example, as prevalence approaches the extremes of 0% or 100%, Kappa decreases. Thus a low Kappa in a sample with a low prevalence (e.g. 10%) does not reflect the same lack of agreement as the same Kappa in a sample with a moderate prevalence (e.g. 50%).[ix] The number of possible response categories of a test also influences Kappa: a dichotomy (present or not) will give a higher Kappa value than a scale with more than two levels.[x] Comparisons of Kappa values across studies must therefore be interpreted carefully. Nevertheless, most of the medical literature on inter-observer reliability over the last two decades has been reported in terms of Kappa, and it remains a useful summary measure of agreement.
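As a worked illustration of the chance correction (the observed agreement of 70 percent is an arbitrary figure chosen for arithmetic convenience, not taken from any study): for the two physicians above, the agreement expected by chance is pe = 0.50, so if their observed agreement is po = 0.70,

Kappa = (po − pe) / (1 − pe) = (0.70 − 0.50) / (1 − 0.50) = 0.40

That is, an apparently respectable 70 percent raw agreement corresponds only to “fair” chance-corrected agreement.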

Likelihood ratio (LR): This is the probability that a test is positive in those with a disorder divided by the probability the test is positive in those without the disorder. A LR greater than 1 gives a post-test probability that is higher than the pre-test probability. A LR less than 1 produces a post-test probability that is lower than the pre-test probability. When the pre-test probability lies between 30% and 70%, test results with a high LR (say, greater than 10) make the presence of disease very likely and test results with a low LR (say, less than 0.1) make the presence of the disease very unlikely.
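For a dichotomous finding, the LRs follow directly from sensitivity and specificity, and Bayes’ theorem in odds form shows how a LR converts a pre-test probability into a post-test probability (the 50% pre-test probability in the example below is an arbitrary illustrative figure):

LR positive = sensitivity / (1 − specificity)

LR negative = (1 − sensitivity) / specificity

post-test odds = pre-test odds × LR

For example, a pre-test probability of 50% corresponds to pre-test odds of 1; a finding with a LR of 10 then gives post-test odds of 10, that is, a post-test probability of 10/11, or about 91%.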

Specificity: the probability that a finding is absent in people who do not have the disease. Highly specific tests, when positive, are useful for ruling a disease in (the mnemonic is SpIn: specific rules in).

Sensitivity: the probability that a finding is present in people who have the disease. Highly sensitive tests, when negative, are useful for ruling a disease out (the mnemonic is SnOut: sensitive rules out).
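Both measures can be read directly off the standard two-by-two table of examination finding against disease status, where TP, FP, FN and TN denote true positives, false positives, false negatives and true negatives:

sensitivity = TP / (TP + FN)

specificity = TN / (TN + FP)

A sign can therefore be highly specific yet very insensitive: Kernig’s sign (Table 2; sensitivity 5%, specificity 95%) is rarely falsely positive, but its absence does almost nothing to exclude meningitis.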

Table 1 – Comparisons of Kappa Values for Common Clinical Signs (agreement between observers beyond that expected by chance alone)

|Sign |Kappa value |Reference(s) |
|Abnormality of extra-ocular movements |0.77 |111 |
|Size of goitre by examination |0.74 |115 |
|Forced expiratory time |0.70 |[xi] |
|Presence of wheezes |0.69 |2 |
|Signs of liver disease, e.g. jaundice, Dupuytren’s contracture, spider naevi |0.65 |78 |
|Palpation of the posterior tibial pulse |0.60 |65 |
|Dullness to percussion |0.52 |19 |
|Tender liver edge |0.49 |86 |
|Clubbing |0.39-0.90 |18, 19, 20 |
|Bronchial breath sounds |0.32 |19 |
|Hearing a systolic murmur |0.30-0.48 |41, 42 |
|Tachypnoea |0.25 |67 |
|Clinical breast examination for cancer |0.22-0.59 |72 |
|Neck stiffness |-0.01 |111 |

Table 2 – Examples of Sensitivities and Specificities for Common Clinical Signs

|Sign |Underlying condition |Sensitivity (%) |Specificity (%) |Reference |
|Shifting dullness |Ascites |85 |50 |87, 88 |
|Palpable spleen, specifically examined |Splenomegaly |58 |92 |91 |
|Goitre |Thyroid disease |70 |82 |115 |
|Abnormal foot pulses |Peripheral vascular disease |63-95 |73-99 |[xii] |
|S3 |Ejection fraction < 50% |51 |90 |40 |
|S3 |Ejection fraction < 30% |78 |88 |40 |
|Trophic skin changes |Peripheral vascular disease |43-50 |70 |[xiii] |
|Hepatojugular reflux |Congestive cardiac failure |24-33 |95 |30 |
|Initial impression |COPD |25 |95 |68 |
|Femoral arterial bruit |Peripheral vascular disease |20-29 |95 |[xiv], [xv] |
|Prolonged capillary refill |Peripheral vascular disease |25-28 |85 |13 |
|Tinel’s sign |Carpal tunnel syndrome |25-75 |75-90 |103 |
|Kernig’s sign |Meningitis |5 |95 |[xvi] |

Cardiovascular Exam

Clubbing

The old adage comparing clubbing to pregnancy (“decide if it’s present or not: there’s no such thing as early clubbing”) is incorrect. A recent review identified three variables, the profile angle, the hyponychial angle and the phalangeal depth ratio, that can be used as quantitative indices to identify clubbing.[xvii] There are no angles that define clubbing, only its absence. Kappa values of 0.39 to 0.90[xviii],[xix],[xx] suggest that, at its best, this sign can be assessed reproducibly, and it will likely retain its place in clinical practice. The Schamroth sign (opposition of two “clubbed” fingernails, obliterating the diamond-shaped window normally formed between them) is an interesting manoeuvre for detecting clubbing that has not been formally tested.[xxi]

Atrial Fibrillation

The sensitivity and specificity of an “irregularly irregular” pulse for atrial fibrillation have never been formally assessed. However, Rowles et al. examined R-R intervals and pulse volumes with Doppler in 74 patients with atrial fibrillation, and found periods of pulse regularity in 30% and of pulse volume regularity in over 50%.[xxii]

Blood Pressure

There are minute-to-minute physiological variations of 4 mmHg systolic and 6-8 mmHg diastolic within patients.[xxiii],[xxiv] With respect to examiners as the source of variability, differences of 8-10 mmHg have been reported frequently for both physicians and nurses[xxv],[xxvi]; this is of the same order of magnitude as the effect achieved by several commonly used anti-hypertensive agents.

The Jugular Venous Pressure (JVP)/ Central Venous Pressure (CVP)

The first challenge in assessing the jugular venous waveform is finding it, and there is only sparse information on how well this is done. In one study, examiners “found” a JVP in only 20% of critically ill patients.[xxvii] Prediction of the CVP by assessing the JVP has been more extensively studied. One study found that physicians correctly predicted the CVP only 55% of the time in an intensive care setting.[xxviii] In another study, Cook recruited medical students, residents and attending physicians to examine the same 50 ICU patients and estimate their CVP. Agreement was surprisingly high between students and residents (Kappa 0.65), moderate between students and attending physicians (Kappa 0.56), and lowest between residents and attending physicians (Kappa 0.3).[xxix] She also found substantial inter-observer and intra-observer variation of up to 7 cm in estimates of the CVP. In another study, of 62 patients undergoing right heart catheterisation, various medical staff predicted whether four variables, including the CVP, were low, normal, high or very high. The sensitivity of the clinical examination for identifying a low (< 0 mmHg), normal (0 to 7 mmHg) or high (> 7 mmHg) CVP was 33%, 33% and 49% respectively; the corresponding specificities were 73%, 62% and 76%. Interestingly, accuracy was no better in cases where agreement among examiners was high.[xxx]

The presence of abdominojugular reflux is useful for ruling in congestive cardiac failure (specificity 0.96, positive likelihood ratio 6.4), but its absence is poor for ruling it out (sensitivity 0.24, negative likelihood ratio 0.8).[xxxi]
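As a check, these likelihood ratios can be recovered from the quoted sensitivity and specificity (the small discrepancy in the positive ratio presumably reflects rounding in the published figures):

LR positive = 0.24 / (1 − 0.96) = 6.0 (quoted as 6.4)

LR negative = (1 − 0.24) / 0.96 ≈ 0.79 (quoted as 0.8)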

The Carotid Pulse

It is easy to agree on the presence of a carotid bruit (Kappa = 0.67) but not on its character (Kappa < 0.4).[xxxii] The NASCET trial showed that over a third of high-grade carotid stenoses (70-99%) had no detectable bruit, and that a focal ipsilateral carotid bruit had a sensitivity of only 63% and a specificity of 61% for high-grade stenosis. These unhelpfully moderate values give equally unhelpful likelihood ratios: the odds of high-grade stenosis are only roughly doubled by the presence of a carotid bruit, and only roughly halved by its absence, which is not nearly enough to confidently rule this important pathology in or out.[xxxiii],[xxxiv]
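The same arithmetic makes the “doubled or halved” statement explicit, using the NASCET sensitivity of 63% and specificity of 61%:

LR positive = 0.63 / (1 − 0.61) ≈ 1.6

LR negative = (1 − 0.63) / 0.61 ≈ 0.6

A bruit therefore multiplies the odds of high-grade stenosis by a factor of only about 1.6, and its absence by about 0.6: of the order of the doubling and halving described above, and far from the LRs of 10 or 0.1 that would rule disease in or out confidently.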

The Praecordium

Appreciating the apex beat in cardiology patients may be more difficult than many of us realise: it is palpable in just under half of all cardiology patients in the supine position. An apical impulse lateral to the mid-clavicular line is a sensitive (100%) but not specific (18%) indicator of left ventricular enlargement.[xxxv] Clinicians often agree about the presence of a displaced apical impulse (Kappa 0.53-0.73)[xxxvi]; however, it takes a complete clinical examination to classify patients into low, intermediate and high probabilities of systolic dysfunction.[xxxvii]

The Third Heart Sound

Many clinicians agree that a third heart sound can be a “physiological” finding[xxxviii], but how often do they agree that they can hear one at all? Not very often, it seems. Kappa values for hearing a third heart sound range from a trivial 0.1 to a reasonable 0.5,[xxxix] while its moderate sensitivity (78%) and high specificity (88%) make its presence useful for helping to rule in severe left ventricular dysfunction (ejection fraction < 30%) … but that colour abnormalities, prolonged capillary refill time and trophic changes are not (LR positive …