Comments - BMJ



Reviewer 1

|Name: |

|Position: |

| |

|This is a very interesting paper which addresses the difficult issue of performing RCTs of appliances in patients with osteoarthritis. Osteoarthritis is |

|a common condition that leads to significant pain and disability for patients and impacts heavily on health care resources. |

| |

|Magnets would be an excellent addition to the physician's armoury if shown to be effective. This is predominantly due to the safety and cost |

|effectiveness of such a treatment. The trial has been well designed and presented. Although I appreciate the constraint imposed by the word limit, a few |

|extra details would make the paper much easier to interpret. |

| |

|What was the split of clinical vs radiographic cases? It says that nurses arranged x-rays if needed. Does that mean that they were all x-rayed? Was |

|there any difference in response between clinical and radiographic cases? Was there a differential response between grades of x-ray changes where |

|available? |

| |

|What was the split between hip and knee OA? Was there a differential response? |

| |

|What percentage of people had monoarthritis and what percentage polyarticular OA? |

| |

|Was a record of analgesic usage through the trial kept? If so, did it differ between groups and, in a multivariate analysis, did it affect the overall |

|effect size of the bracelet? |

| |

|There appears to be a considerable spread in the strength of the magnets. Within groups A and B, was there a significant correlation between strength of |

|magnet and improvement in pain? This may help to prove an effect over and above placebo response. |

| |

|If the group B was restricted to patients who received magnets of the correct strength, did this alter the results significantly? |

| |

|To try and address the issue of placebo response, it would be worth looking at the treatment effect in dichotomised subgroups of belief if that were |

|possible. |

| |

|The main difficulty of this paper is teasing out a true treatment effect from the placebo effect which is almost certainly present. Although I am sure |

|that it will be impossible to fully apportion effects to each of these, further discussion would be helpful. I would also be interested to see a |

|discussion about the relative importance of this issue in clinical practice with a simple, safe treatment such as this. |

Reviewer 2

| |

|Name: |

|Position: |

| |

|Should this paper have been anonymised? I know some of the authors! Also I note that Michael Dixon is a substantial contributor but is not a named |

|author: a typo perhaps? |

| |

|The methods seem sound, it appears to be well-powered and the outcome measures are appropriate. Discussion and the conclusions are reasonable. The |

|stats are a little complex for me to give a definitive opinion, sorry, so I suggest you run them past a more knowledgeable statistician. If the stats|

|are sound then this is a valuable and I believe ground-breaking report which I recommend you publish. |

Reviewer 3 (Commissioned for this training package)

|Review of magnetic bracelets paper. |

| |

|This is a randomized, blinded, three-arm trial of magnetic therapy for osteoarthritis. It is always nice to see an adequately powered randomized |

|trial, particularly in novel or controversial areas of medicine. The paper will therefore be of general medical interest, suitable for publication in|

|the BMJ. |

| |

|The paper would be improved by attention to the following points. |

| |

|Major comments |

|1. There is repeated reference to “specific or non-specific effects”. The meaning of this is slightly opaque to me given the context of the paper. |

|Normally we talk about specific and non-specific effects in terms of placebo: a specific effect would be the effect of aspirin on prostaglandins; the|

|non-specific effect would be the benefit of talking to a caring professional during counseling before cancer surgery (it is “non-specific” because |

|you get this benefit from many different therapies). So I would normally read “it is unclear whether this is due to specific or non-specific effects”|

|as meaning “we don’t know whether it is a placebo or not”. Yet there was a statistically significant difference between active and placebo, the |

|authors conclude “magnetic bracelets are effective” and there was no untreated group (traditionally required to determine the extent of non-specific |

|effects). |

| |

|2. The introduction is extremely brief. In particular, more details need to be given about the prior data, e.g. How many trials? Were they randomized? |

|What conditions were treated? |

| |

|3. The sample size calculation is incomplete, inaccurate and problematic. It is incomplete because it is not explained how one gets from a “20% |

|differential reduction in WOMAC A score” to an effect size of 0.39. It is inaccurate because an effect size of 0.39 requires 104 patients per group, |

|not 104 patients per two way comparison (i.e. 52 patients per group). It is problematic because it is based on two way comparisons for a three group |

|trial. Hence it assumes a 20% difference between no magnets and weak magnets and then another 20% difference between strong and weak magnets. Hence |

|we are expecting strong magnets to reduce WOMAC scores by 40% compared to placebo, a tall order. |
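The figure of 104 per group in point 3 can be checked with the usual normal-approximation formula for comparing two means. This is a minimal sketch, not the authors' calculation: the two-sided alpha of 0.05 and the power of 80% are assumptions (the values actually used are not quoted in this review), and the function name is purely illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients needed in EACH arm of a two-group comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardised effect size (difference / SD).
    alpha and power are assumed defaults, not values taken from the paper.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Standardised effect size of 0.39, as quoted from the manuscript.
print(n_per_group(0.39))  # 104 per group under these assumptions
```

Under these assumptions the result is a per-group number, so a single two-way comparison would need about 208 patients in total rather than 104.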

| |

|4. I have several problems with the statistics (but I would of course, being a statistician). First, Stata 6? Didn’t that go out with the rolodex? |

|Can’t the study team get something a little more up to date? Stata 8 was released in 2003; god knows the date for Stata 6 (though I don’t think this|

|makes much difference to the analyses conducted). Second, ANCOVA should not be used as a “check on the results”: it has been known for a long time |

|that this is the most statistically efficient method of analysis. I strongly recommend as the primary analysis ANCOVA with baseline score as |

|covariate and two dummy variables for “any magnet” and “strong magnet” as predictors. Third, I don’t know how you can use Dunnett’s test to “compare the|

|standard … magnet group mean[] … with the standard magnet group mean”. Fourth, what is meant by “results were confirmed by examining residuals” and |

|“results were confirmed by … bootstrapping analysis on analysis of covariance with 3,000 replications)”? Fifth, far more details need to be given |

|about missing data imputation. This could be published as a “web extra” appendix. Suffice it to say that it is simply insufficient to report that |

|missing values were imputed using a range of plausible values. How was the range of plausible values chosen? How were the results of different |

|iterations combined? Sixth (or is it seventh?), the following is opaque to me “general linear models on all subject explored the associated between |

|outcomes and magnetic strength … and subject’s belief”. Why was generalized linear modelling required? What was the random and what the fixed effect |

|here? What was the correlation structure such that simple linear regression was inappropriate? A general point (seventhly?) is that I felt that the |

|statistical section was attempting to “blind us with science”. So I might advise a simpler approach to the analysis, especially given that |

|no actual results are presented for all that exciting advanced statistical stuff (bootstrapping etc.). Alternatively, do at |

|least: a) present your methods in sufficient detail to be reproducible; b) give the actual results. |
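The primary analysis recommended in point 4 (ANCOVA with baseline score as covariate and dummy variables for “any magnet” and “strong magnet”) could be set up roughly as below. This is only an illustration in plain Python, not the authors' analysis: the group coding (A = strong magnet, B = weak magnet, C = dummy bracelet), the function names and the toy data are all assumptions made for the example, and a real analysis would use a statistics package that also reports standard errors.

```python
def solve_linear_system(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; rows are copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ancova(baseline: list[float], group: list[str],
               follow_up: list[float]) -> dict[str, float]:
    """Least-squares fit of follow_up ~ baseline + any_magnet + strong_magnet.

    Assumed coding: A = strong magnet, B = weak magnet, C = dummy bracelet.
    'any_magnet' estimates weak vs dummy; 'strong_magnet' estimates the
    additional difference of strong over weak.
    """
    rows = []
    for base, g in zip(baseline, group):
        any_magnet = 1.0 if g in ("A", "B") else 0.0
        strong_magnet = 1.0 if g == "A" else 0.0
        rows.append([1.0, base, any_magnet, strong_magnet])
    k = 4
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, follow_up)) for i in range(k)]
    names = ["intercept", "baseline", "any_magnet", "strong_magnet"]
    return dict(zip(names, solve_linear_system(xtx, xty)))


# Invented, noise-free toy data (NOT trial data), built from known coefficients
# so that the fit can be checked by eye.
baseline = [8.0, 10.0, 12.0, 9.0, 11.0, 13.0, 7.0, 10.0, 14.0]
group = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
follow_up = [1.0 + 0.6 * b - 0.5 * (g in ("A", "B")) - 1.0 * (g == "A")
             for b, g in zip(baseline, group)]
print(fit_ancova(baseline, group, follow_up))
```

With this coding the strong-magnet versus dummy comparison is the sum of the two dummy coefficients, so the comparison of interest can be read directly from a single model.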

| |

|5. It is a little odd that the authors discuss a pairwise comparison (A vs C) when the overall ANOVA was non-significant. |

| |

|6. I think I need more on the clinical relevance of the difference between groups. What does a difference in WOMAC score of 1.3 actually mean? For |

|example, how much of an improvement is it in percent terms? |

| |

|7. Mechanism: I was somewhat surprised to see absolutely nothing about mechanism. Now magnets applied locally for pain doesn’t seem too much of a |

|stretch because I presume that they improve / change local blood flow and there are several therapies that work on these lines (e.g. diathermy, |

|rubefacients). But this should be discussed. |

| |

|This is a minor comment but I have included it in the major comments just because it annoyed me so much. “Our results may thus not translate to other|

|ethnic populations”. I am loath to see the burden of proof for RCTs be “prove that you can generalize” rather than “prove you can’t”. I can’t think of |

|any reason why magnets would help a white Englishman, but not, for example, an Asian American. |

| |

|Minor comments |

|1. I think the authors are using the term “blocks” incorrectly in the “assignment section”. A block is a unit of randomization to ensure |

|approximately equal numbers in each group are assigned at the same time and / or to different sub-groups. As I read it, the block size is 15. If so, |

|more details need to be given about how “numbers were then allocated … in such a way that each batch contained five bracelets from each group”. There|

|are obviously a very large number of ways of arranging 15 bracelets in three groups of five. |
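To put a number on “a very large number of ways”, and to show one conventional way of filling a block of 15, a short sketch follows. It assumes the block size of 15 with five bracelets per group as I read it from the manuscript; the function names and the use of a seeded shuffle are illustrative assumptions, not the authors' stated procedure.

```python
import math
import random


def count_block_arrangements(block_size: int = 15, groups: int = 3) -> int:
    """Number of distinct orderings of a block with equal numbers per group.

    This is the multinomial coefficient block_size! / (per_group!)^groups.
    """
    if block_size % groups != 0:
        raise ValueError("block_size must be divisible by the number of groups")
    per_group = block_size // groups
    return math.factorial(block_size) // math.factorial(per_group) ** groups


def permuted_block(rng: random.Random,
                   labels: tuple[str, ...] = ("A", "B", "C"),
                   per_group: int = 5) -> list[str]:
    """Return one randomly ordered block containing per_group of each label."""
    block = [label for label in labels for _ in range(per_group)]
    rng.shuffle(block)
    return block


print(count_block_arrangements())  # 756756 possible orderings of one block
print(permuted_block(random.Random(2004)))  # seed chosen arbitrarily
```

Stating which of these orderings was used for each batch, and how it was generated, is the kind of detail the assignment section currently lacks.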

| |

|2. The result reported in the abstract (comparing group A to group C) is somewhat selective. |

| |

|3. The fourth paragraph in the discussion is a little confusing. The claim is that the mean reduction with magnets and with NSAIDs is the same. The |

|authors use the within group reduction rather than the difference between groups, which is odd, and give no figures for NSAIDs. Incidentally, the |

|statement that “larger investigators should now test the safety of magnets relative to the … risks of analgesics” is silly. Isn’t it obvious? Do we |

|really need to test this? |

| |

|The authors need to be more specific about the further research they propose. |

Statistical Reviewer

|Name: |

|Position: |

| |

|I welcome this report of a 3 arm trial of a simple intervention. However, I feel that the methods used are perhaps unnecessarily complex and I am not |

|fully convinced by the investigation of (un)blinding. I have particular concerns about the incorrect specification of the magnets used in group B. |

|Although I have made many comments, most should be relatively straightforward to address. |

| |

|Methods |

|1. In trials with 3 (or more) treatment arms it is important to be clear about the objectives. There is no prespecification about how the groups would|

|be compared. However, the sample size section refers to analysis of variance, which is a method that compares all 3 groups at once. As noted there, |

|the results of that analysis depend on the outcome in the low magnetic group, even though the primary comparison is between the full strength magnets|

|and the dummy bracelets. I prefer to omit this step and go straight to paired comparisons (with or without adjustment for multiplicity according to |

|taste). In fact the 3 groups test was not statistically significant in this trial. |

| |

|2. As a related point, I am unclear what is meant by a treatment effect of 0.39SD – is this the postulated difference between groups A and C? What |

|value was used for Group B? The value of SD used in this calculation should be stated. As the target was 0.2SD I presume it was 5*0.39, or about 2, |

|which is virtually exactly the same as the observed values shown in Table 2. However, maybe I am misreading this – the effect size should have been |

|expressed as a % of the SD of the change from baseline, which would be a rather bigger difference. |

| |

|3. There is a good argument that the analysis of covariance should be the definitive analysis in such trials, but this appears not to have been |

|prespecified as the main analysis. |

| |

|4. Despite several sentences of explanation, I am left slightly unclear about the method of assignment of magnets (p3). I think that in effect the |

|trial was block randomised with block size 15. |

| |

|5. p4: Examination of residuals can establish whether the model is reasonable but does not ‘confirm results’. |

| |

|6. Did each trial participant have a unique bracelet or were the bracelets reused? If the former, why were only 62 magnets tested in Group B (p4)? If |

|the latter, please specify how many different magnets were used. |

| |

|7. I am puzzled by the discrepancy between the tests of the magnets before and after the study. It seems that the testing was done differently on the |

|two occasions – by NPL at the start (but only 5 per group) and a Hall Effect probe afterwards (p2). Group A magnets are stated on p2 to have field |

|strength of 170-200 mTesla at the start (p3) but 134-197 afterwards (p4). Do magnets vary their strength over time or is the discrepancy due to |

|different test procedures or is it possible that the 10 magnets tested at the start all just happened to be in the intended range? |

| |

|8. For group B the magnets were 21-30 mT at the start and in the range 21-196 mT at the end, of which 34/62 were outside the prespecified range ‘due |

|to a manufacturing error’. This manufacturing error makes Group B of limited value – or, at least, it does not fulfil its intended purpose. On p5 the |

|authors conclude that field strength is important yet they have arguably not examined this factor adequately in their analysis. There is the |

|possibility of an (exploratory) analysis examining a dose-effect relationship. |

| |

|9. I am unsure why the bootstrap was invoked. The use of this approach is certainly unusual in analysis of an RCT. What aspects of the study were |

|deemed to need such confirmation, and why? |

| |

|10. There is no mention of adherence to the recommendations of OMERACT and/or OARSI regarding standardised outcome measures for studies in |

|osteoarthritis. |

| |

|Results |

|11. Should table 1 include some information about the severity of the disease in the three groups? |

| |

|12. The examination of gender seems relatively unimportant and certainly does not have adequate power. |

| |

|13. It is not possible to say with certainty that the imputation did not have an effect (last line p4) – I think this should be reworded. However, |

|given the few imputed values this is certainly a reasonable assumption even without a formal investigation. (Indeed I feel that there are too few |

|missing values to be especially worried about.) |

| |

|14. Perhaps a table could be included showing the numbers in each group who gave each of the answers listed on p3. At present the information is given|

|partly and only for Groups A and B. It is to be expected that the treatment effect will be larger in those who knew or guessed correctly but this does|

|not mean that belief is a predictor of outcome. See for example the rapid responses to the recent BMJ paper by Fergusson et al. If the authors retain |

|the analysis restricted to those who did not guess right, then the results should be given more fully. |

| |

|Discussion |

|15. Following #14, I think that the findings on blinding are over-interpreted. |

| |

|16. Treatment B was flawed through a manufacturing error – this is a very strange situation. I did wonder if the Group B results should be played |

|down, combined with A, or even excluded. However, the text is quite clear and readers can make up their own minds. |

| |

|17. The conclusion of efficacy (p6) seems too strong without acknowledging the previous studies and without qualification of the strength of the |

|magnet. |

| |

|Abstract |

|18. I don’t think that it is correct to refer to beliefs as a predictor of changes in the WOMAC scale, for reasons noted above. It is not possible to |

|say whether this is cause or effect. |

| |

|Minor points |

|19. Use median or mean consistently for strength of magnets. |

| |

|Dean Fergusson, Kathleen Cranley Glass, Duff Waring, and Stan Shapiro. Turning a blind eye: the success of blinding reported in a random sample of |

|randomised, placebo controlled trials. BMJ Feb 2004; 328: 432 - |
