
Inter-rater reliability of the Foot Posture Index (FPI-6) in the assessment of the paediatric foot

Discussion in 'Pediatrics' started by JFAR, Oct 21, 2009.

  1. JFAR

    JFAR Active Member


    Inter-rater reliability of the Foot Posture Index (FPI-6) in the assessment of the paediatric foot

    Stewart C Morrison and Jill Ferrari

    Journal of Foot and Ankle Research 2009, 2:26. doi:10.1186/1757-1146-2-26

    Abstract (provisional)

    Background: Reliability is an integral component of clinical assessment and is necessary for establishing baseline data, monitoring treatment outcomes and providing robust research findings. In the podiatric literature, traditional measures of foot assessment have been shown to be largely unreliable. The Foot Posture Index (FPI-6) is a clinical tool used in the assessment of the foot and, to date, limited research has been published evaluating the reliability of this tool in children and adolescents.

    Method: Thirty participants aged 5-16 years were recruited for the research and two raters independently recorded the FPI-6 score for each participant.

    Results: Weighted kappa analysis identified excellent agreement between the two raters (kW = 0.86).

    Conclusion: The FPI-6 is a quick, simple and reliable clinical tool which has demonstrated good inter-rater reliability when used in the assessment of the paediatric foot.
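
    The weighted kappa statistic quoted above (kW = 0.86) is straightforward to compute. Below is a minimal pure-Python sketch of linearly weighted kappa for two raters on an ordinal scale such as the FPI-6; the ratings shown are hypothetical, not the study's data, and the function name is my own.

```python
def linear_weighted_kappa(r1, r2):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale."""
    cats = sorted(set(r1) | set(r2))
    idx = {c: i for i, c in enumerate(cats)}
    k, n = len(cats), len(r1)
    # Observed joint proportions.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    # Marginal proportions for each rater.
    p1 = [sum(row) for row in obs]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Linear disagreement weights: distance between categories, scaled to [0, 1].
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w[i][j] * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp

# Two hypothetical raters in near agreement on five FPI-6 totals:
print(linear_weighted_kappa([2, 4, 6, -1, 8], [2, 5, 6, -1, 7]))
```

    Linear weights penalise a disagreement of two scale points twice as heavily as a disagreement of one, which suits an ordinal index running from -12 to +12.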
     
  2. Hmmm, if I wanted to test the inter-rater reliability of a test, I'd have thought a sample of more than 2 raters would be needed. Better to have had 30 raters and 2 measurement subjects than 30 measurement subjects and 2 raters.
     
  3. Admin2

    Admin2 Administrator Staff Member

  4. Peter1234

    Peter1234 Active Member

    Don't make me laugh - how would you get the full 'spectrum' of postures, from highly pronated to highly supinated, from two people? Maybe try and morph them between sessions: 'We morphed him, Bruce, you can measure him again now!!':eek:
     
  5. Griff

    Griff Moderator

    Peter,

    Why would you need a 'full spectrum of postures' in order to test the reliability/repeatability of measurements?

    Ian
     
  6. Peter, since you found my response comical, it is clear to me that you must have a much better understanding of research design and statistics than I do. This being the case, perhaps you could explain how a study of inter-rater reliability employing 2 raters over 30 measurement subjects is a better design, from a statistical point of view, than a study employing 30 raters over 2 measurement subjects to test the study hypothesis?
     
  7. stewartm

    stewartm Member

    Hi Simon,

    Your interpretation of the sample size and proposals for study design are arbitrary. What is your basis for this?

    By the nature of the FPI-6, two measurement participants would yield rather feeble data. What is the purpose of looking at the range of foot posture for two participants when the index ranges from -12 through to +12? The clinical application of this would be limited and, as indicated in the literature, not truly a reflection of the reliability of the index. Are you familiar with the Cornwall et al. (2008) study? In this work the authors commented that all raters had difficulty distinguishing in the mid-range of the index - between normal/pronated feet and normal/supinated feet - which was where the majority of their participant group were placed. It would not be possible to determine such limitations with the index by considering it across 2 measurement participants.

    Stewart
     
  8. The point is that your study sets out to test inter-rater reliability, but only tests whether two raters can agree 30 times; what we really need to know is whether many raters can agree once (or more). The way the study was designed means that, effectively, we have the same hypothesis being tested 30 times with a sample size of 2. How generalisable are the results of 2 observers to a large population of clinicians performing this test?

     I take your point regarding identifying the "difficult discriminations", but that is not what the study set out to do: it sought to identify the inter-rater reliability, not which points within the classification system were more or less reliable. Cornwall's study shows that inter-rater reliability is dependent upon the foot-type of the measurement subject, so pick a couple of measurement subjects from this mid-range area; at least this would give you your worst-case scenario.

     I'll openly admit that I was clearly being facetious by simply turning the sample sizes employed on their heads when I suggested 30 raters and 2 measurement subjects, so I'd agree this was arbitrary. I assume you are Stewart Morrison, one of the authors of the paper, so I'm sure you can explain the rationale for using n = 2 raters. And in reality, even if we have 30 measurement subjects, what certainty can be given to there being at least one measurement subject in each category? What is the probability that a sample of 30 measurement subjects will yield at least one subject per category?

     To my mind, to really do the study justice you would need many raters and a range of measurement subjects from all potential categories. However, I should be grateful if you could explain and provide details of the sample size calculation that resulted in your selection of 2 raters and 30 measurement subjects, or were these arbitrary too? I don't claim to be an expert in the area and stand to be corrected, Stewart.
     
  9. Peter1234

    Peter1234 Active Member

    Hi Simon,

    Look, the point is, you are way more experienced than myself. However, if you read the article you will notice that reliability changes from the highly supinated/pronated feet, where reliability is high, to the mid-range, where the tool is less reliable. I believe this to be a valid point.
     
  10. Peter,
    The years I've spent in podiatry, viz. my experience, don't automatically equate to me having a better understanding of statistical processes and theory than you. The outliers are always going to be easier to differentiate than the cross-overs; at the risk of repeating myself, this is not the question being asked. What is the title of the paper? Moreover, it still does not justify the use of two observers to test inter-observer reliability.

    So if either of you, yourself or your lecturer Stewart, could explain in statistical terms why it is better to have two observers and 30 measurement subjects than a proportionally larger observer group in the evaluation of inter-observer reliability, or why 2 observers were sufficient to perform the study, it would be most helpful to me. Put another way: why are the data obtained from two observers sufficient to extrapolate to, let's say, 1000 clinicians? Look, I've got a reasonable grasp of statistics, I'm no expert, but I'll try to follow your explanations. Thanks in advance.
     
  11. I agree with Simon. Were the two clinicians doing the measurements in the study also clinicians who work together? Clinicians who work together will tend to get the same results when doing clinical measurements whereas clinicians who do not work together will not be as similar in their measurement results. That is the nature and problem of any clinical measurement system.
     
  12. Hi Stewart,

    Having just glanced back over your paper, I quote: "A convenience sample of 30 participants" and "Inter-rater reliability was determined for two podiatrists with postgraduate experience of working in paediatrics (in excess of five years)". To me, or to someone who might dabble in statistics, these statements sound like your "interpretation of sample size" was "arbitrary"; a pot and a kettle spring to mind. What is your basis for this? I assume you (and Peter) do know that there is at least one, if not several, methods for calculating the required sample size of both "examiners" and "examinees" for reliability studies? Don't talk to me about arbitrary, Stewart.

    Kevin, there are a number of potential bias problems tied up in this kind of study, the point you outline being one of them.

    This is interesting: "All children referred to the paediatric clinic were considered for inclusion. Children were excluded if they presented with a foot position that would be associated with abnormal structural features or would obscure visualisation of normal foot architecture."

    OK, so we have here a system of foot typing, but we can't use it to classify children whose feet have "abnormal structural features" - OK - useful?
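
    One way to see the precision concern behind the sample-size question above is to bootstrap an agreement statistic over the 30 subjects. A rough simulation sketch, using fabricated paired scores (explicitly not the study's data) and a simple "within one point" agreement measure of my own choosing:

```python
import random

random.seed(1)

# Hypothetical paired FPI-6 totals for 30 children from two raters
# (illustrative only; not data from the Morrison & Ferrari study).
truth = [random.randint(-4, 9) for _ in range(30)]
pairs = [(s, s + random.choice((-2, -1, 0, 0, 1, 2))) for s in truth]

def pct_within_one(sample):
    """Proportion of children whose two ratings differ by at most 1 point."""
    return sum(abs(a - b) <= 1 for a, b in sample) / len(sample)

# Resample children (not raters), so the interval reflects how much the
# agreement estimate could swing with a different sample of 30 subjects.
boots = sorted(
    pct_within_one([random.choice(pairs) for _ in pairs])
    for _ in range(2000)
)
low, high = boots[50], boots[1949]  # ~95% percentile interval
print(f"agreement = {pct_within_one(pairs):.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

    With only 30 subjects the percentile interval is typically quite wide, which illustrates the subject-sampling side of the precision argument; note it says nothing about how raters other than these two would perform.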
     
  13. Peter1234

    Peter1234 Active Member

    Hi Simon,

    It would be better to have 10,000 examiners taking samples of the entire population of the UK. I do believe there are flaws to most studies - yes, you are right, Simon. When penicillin was discovered, I don't believe this was planned for beforehand.

    Peter
     
  14. Peter,

    I'll take it from your response above that you can't:
    Thanks for being on the show. But before you go...
    Actually, you raise a reasonable point here: what is the between-day variability in the FPI-6?
     
  15. Peter1234

    Peter1234 Active Member

    Yes Stuart and Simon and Jill,

    You want agreement and correlation; yes, you can remove the outliers to possibly end up with a less certain result or a more concentrated one; no, you can't use a percentage for correlation; and yes, there are flaws in the design and spelling mistakes. And finally, yes, you do ultimately want more raters for more accuracy/statistical power. You could also run a statistical power calculation (I think you did) to figure out the exact number of raters (although this is not normally done). The calculations weren't shown, nor any proper tables with the numbers. The article is in its provisional stage, as far as I can understand, so maybe we should wait until it is entirely finished before we comment. There!!! :craig::craig: I am now the official know-it-all, the piñata of Podiatry Arena.

    :boxing::eek::confused:

    PS I thought it was a good study!!
     
  16. stewartm

    stewartm Member

    Hi,

    I don't disagree that there could be improvements to the study - does any research project ever complete the picture? Or do they raise more questions? Regardless, it is perfectly feasible to look at the inter-rater reliability across two raters and we have presented this in the study. The work will never be applicable to all clinicians, there were two raters with quite specific expertise and the applicability of this comes down to the critical interpretation. We know that expertise and experience will affect one's reliability and consequently, I wouldn't see the results of the study applying to all. Furthermore, there were limitations to the study and these have been discussed in the manuscript.

    I acknowledge that a number of concepts underpinning statistics are arbitrary, I never said otherwise. All this highlights is that there are different opinions out there but you can't argue that your proposed design is more appropriate than the one published in the manuscript. On a different note, I'm not sure that I entirely agree with Kevin's comment because, regardless of whether we work together or not, the raters scored independently and were blinded to the results. However, I do appreciate that there are a number of sources of error with any measurement and that these cannot always be controlled.
     
  17. Stewart:

    I think that you will find, when doing any clinical measurement, that if two examiners were trained by the same individuals and have compared their measurements directly with each other over a period of time, they will invariably be closer in their measurement results than if the examiners were trained by different schools/individuals and have never compared their measurements directly with each other.

    When I was teaching Root biomechanics measurements a quarter century ago, I noted that these clinical measurements also had good interexaminer reliability when two examiners who worked closely with each other made the measurements. This does not have to do with blinding of the examiners or rating their scores independently; it has to do with the actual measurement methods used and the frame of reference of each examiner when they perform each measurement.

    I am not criticizing Tony Redmond's FPI, since I think the FPI is a great idea and I commend Tony for his pioneering work (and also your work) in this regard. However, my clinical experience in doing and teaching clinical measurement to hundreds of podiatry students and podiatrists over the last 25 years has clearly shown me that any clinical measurement is subject to error, and these errors will be magnified if the examiners have been trained differently and have never worked alongside each other. As I said earlier, this is a characteristic of any clinical measurement system of the human foot and lower extremity that I know of.

    Excellent discussion!:drinks
     
  18. Thanks for your responses, Stewart.
    "Conclusion: The FPI-6 is a quick, simple and reliable clinical tool which has demonstrated good inter-rater reliability when used in the assessment of the paediatric foot."

    Perhaps then a caveat should have been inserted: "The FPI-6 is a quick, simple and reliable clinical tool in the hands of these two clinicians which has demonstrated good inter-rater reliability for said clinicians when used in the assessment of the paediatric foot"?

    I do know that sampling error is proportional to 1/√n.

    What this relationship suggests is that the greater the sample size n, the smaller the sampling error:

    1/√2 ≈ 0.707
    1/√30 ≈ 0.183
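
    The arithmetic above is easy to check directly; a minimal sketch of the 1/√n relationship:

```python
import math

# Sampling error scales as 1/sqrt(n): quadrupling the sample size
# only halves the error, so precision is expensive to buy.
for n in (2, 30, 120):
    print(n, round(1 / math.sqrt(n), 3))
```

    Whether n here should count raters, subjects, or both is exactly the design question being argued in this thread.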
     