Inter-rater reliability of the Foot Posture Index (FPI-6) in the assessment of the paediatric foot

JFAR · Oct 21, 2009

Inter-rater reliability of the Foot Posture Index (FPI-6) in the assessment of the paediatric foot

Stewart C Morrison and Jill Ferrari

Journal of Foot and Ankle Research 2009, 2:26 doi:10.1186/1757-1146-2-26

Abstract (provisional)

Background: Reliability is an integral component of clinical assessment and necessary for establishing baseline data, monitoring treatment outcomes and providing robust research findings. In the podiatric literature, traditional measures of foot assessment have been shown to be largely unreliable. The Foot Posture Index (FPI-6) is a clinical tool used in the assessment of foot posture and, to date, there is limited published research evaluating the reliability of this tool in children and adolescents. Method: Thirty participants aged 5 - 16 years were recruited for the research and two raters independently recorded the FPI-6 score for each participant. Results: Excellent agreement between the two raters was identified following weighted kappa analysis (kW = 0.86). Conclusion: The FPI-6 is a quick, simple and reliable clinical tool which has demonstrated good inter-rater reliability when used in the assessment of the paediatric foot.
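
For readers unfamiliar with the statistic quoted in the abstract, here is a minimal sketch of how a weighted kappa for two raters can be computed. It is not the authors' analysis: the abstract does not say which weighting scheme was used, so linear weights are assumed here, and the function name `weighted_kappa` and the example scores are made up purely for illustration.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa for two raters scoring the same subjects.

    `categories` is the ordered list of every possible score.
    Linear weighting is an assumption; the abstract does not name the scheme.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of subjects")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] is the share of subjects
    # scored category i by rater A and category j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight: 0 on the diagonal, 1 at the extremes.
    def weight(i, j):
        return abs(i - j) / (k - 1)

    disagree_obs = sum(weight(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    disagree_exp = sum(weight(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical FPI-6 totals (possible range -12 to +12) for eight feet.
fpi_range = list(range(-12, 13))
scores_a = [6, 8, 2, -1, 10, 5, 7, 3]
scores_b = [6, 7, 3, -1, 9, 5, 8, 3]
print(round(weighted_kappa(scores_a, scores_b, fpi_range), 2))
```

Weighted kappa gives partial credit when the two raters are close but not identical, which is why it tends to be preferred over unweighted kappa for an ordinal scale such as this one.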

Simon Spooner · Oct 21, 2009

JFAR said: ↑

Results: Excellent agreement between the two raters was identified following weighted kappa analysis (kW= 0.86). Conclusion: The FPI-6 is a quick, simple and reliable clinical tool which has demonstrated good inter-rater reliability when used in the assessment of the paediatric foot.

Hmmm, if I wanted to test the inter-rater reliability of a test, I'd have thought a sample size of more than 2 should be needed. Better to have had 30 raters and 2 measurement subjects than 30 measurement subjects and 2 raters.

Peter1234 · Oct 22, 2009

Simon Spooner said: ↑

Hmmm, if I wanted to test the inter-rater reliability of a test, I'd have thought a sample size of more than 2 should be needed. Better to have had 30 raters and 2 measurement subjects than 30 measurement subjects and 2 raters.

don't make me laugh - how would you get the full 'spectrum' of postures from highly pronated to highly supinated in two people? Maybe try and morph them - between sessions: 'We morphed him Bruce, you can measure him again now!!'

Griff · Oct 22, 2009

Peter,

Why would you need a 'full spectrum of postures' in order to test reliability/repeatability of measurements?

Ian

Simon Spooner · Oct 23, 2009

Peter1234 said: ↑

don't make me laugh - how would you get the full 'spectrum' of postures from highly pronated to highly supinated in two people? Maybe try and morph them - between sessions: 'We morphed him Bruce, you can measure him again now!!'

Peter, since you found my response comical, it is clear to me that you must have a much better understanding of research design and statistics than I do. This being the case, perhaps you could explain how a study of inter-rater reliability employing 2 raters over 30 measurement subjects is a better design, from a statistical point of view, than a study employing 30 raters over 2 measurement subjects to test the study hypothesis?

stewartm · Oct 23, 2009

Hi Simon,

Your interpretation of the sample size and proposals for study design are arbitrary. What is your basis for this?

By nature of the FPI-6, two measurement participants would yield rather feeble data. What is the purpose of looking at the range of foot posture for two participants when the index ranges from -12 through to +12? The clinical application of this would be limited and, as indicated in the literature, not truly a reflection of the reliability of the index. Are you familiar with the Cornwall et al (2008) study? In this work the authors commented that all raters had difficulty distinguishing in the mid-range of the index - between normal / pronated feet and normal / supinated feet - which was where the majority of their participant group were placed. It would not be possible to determine such limitations with the index by considering the index across 2 measurement participants.

Stewart

Simon Spooner · Oct 23, 2009

stewartm said: ↑

Hi Simon,

Your interpretation of the sample size and proposals for study design are arbitrary. What is your basis for this?

By nature of the FPI-6, two measurement participants would yield rather feeble data. What is the purpose of looking at the range of foot posture for two participants when the index ranges from -12 through to +12? The clinical application of this would be limited and, as indicated in the literature, not truly a reflection of the reliability of the index. Are you familiar with the Cornwall et al (2008) study? In this work the authors commented that all raters had difficulty distinguishing in the mid-range of the index - between normal / pronated feet and normal / supinated feet - which was where the majority of their participant group were placed. It would not be possible to determine such limitations with the index by considering the index across 2 measurement participants.

Stewart

The point is that your study sets out to test inter-rater reliability, but it only tests whether two raters can agree 30 times. What we really need to know is whether many raters can agree once (or more). The way the study was designed means that, effectively, we have the same hypothesis being tested 30 times with a sample size of 2. How generalisable are the results of 2 observers to a large population of clinicians performing this test?

I take your point regarding identifying the "difficult discriminations", but that is not what the study set out to do. The study sought to identify the inter-rater reliability, not which points within the classification system were more or less reliable. Cornwall's study shows that inter-rater reliability is dependent upon the foot type of the measurement subject, so pick a couple of measurement subjects from this mid-range area; at least this would give you your worst-case scenario. I'll openly admit that I was clearly being facetious by simply turning the sample sizes employed on their heads when I suggested 30 raters and 2 measurement subjects, so I'd agree this was arbitrary. I assume you are Stewart Morrison, one of the authors of the paper, so I'm sure you can explain the rationale for using n=2 rater subjects.

And in reality, even if we have 30 measurement subjects, what certainty can be given to there being at least one measurement subject in each category? What is the probability that a sample of 30 measurement subjects will yield at least one subject per category? To my mind, to really do the study justice you would need many raters and a range of measurement subjects from all potential categories. However, I should be grateful if you could explain and provide details of your sample size calculation that resulted in your selection of 2 raters and 30 measurement subjects, or were these arbitrary too? I don't claim to be an expert in the area and stand to be corrected, Stewart.
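
The question of whether 30 measurement subjects would cover every category can be put into numbers with a short inclusion-exclusion calculation. The sketch below is only illustrative: it assumes five posture bands and treats the subjects as independent random draws, and both sets of band proportions are invented, not taken from the paper or from any reference data. The function name `prob_all_categories_seen` is hypothetical.

```python
from itertools import combinations


def prob_all_categories_seen(n_subjects, category_probs):
    """Probability that n independent subjects include at least one from every category.

    Inclusion-exclusion over the sets of categories that could be missing:
    P(all seen) = sum over subsets S of (-1)^|S| * (1 - sum of p_i in S)^n.
    """
    if n_subjects < 1:
        raise ValueError("need at least one subject")
    if abs(sum(category_probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    total = 0.0
    for size in range(len(category_probs) + 1):
        for subset in combinations(category_probs, size):
            # Chance that a single subject avoids every category in this subset.
            avoid = max(0.0, 1.0 - sum(subset))
            total += (-1) ** size * avoid ** n_subjects
    return total


# Assumption: five posture bands, equally common in the clinic population.
equal_bands = [0.2] * 5
# Assumption: a made-up skewed mix in which the extreme bands are rare.
skewed_bands = [0.02, 0.13, 0.60, 0.20, 0.05]

print(prob_all_categories_seen(30, equal_bands))
print(prob_all_categories_seen(30, skewed_bands))
```

Under these assumptions the answer depends almost entirely on how rare the rarest band is: the rarer the extreme postures are among the children referred, the less likely a convenience sample of 30 is to contain all of them.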

Peter1234 · Oct 23, 2009

Hi Simon,

look, the point is, you are way more experienced than myself. However if you read the article you will notice that reliability changes from the highly supinated/pronated with high levels of reliability to the mid range where the tool is less reliable. I believe this to be a valid point-

Simon Spooner · Oct 23, 2009

Peter1234 said: ↑

Hi Simon,

look, the point is, you are way more experienced than myself. However if you read the article you will notice that reliability changes from the highly supinated/pronated with high levels of reliability to the mid range where the tool is less reliable. I believe this to be a valid point-

Peter,
The years I've spent in podiatry, viz. my experience, don't automatically equate to me having a better understanding of statistical processes and theory than you. The outliers are always going to be easier to differentiate than the cross-overs. At the risk of repeating myself, this is not the question being asked. What is the title of the paper? Moreover, it still does not justify the use of two observers to test inter-observer reliability. So if either of you, yourself or your lecturer Stewart, could explain in statistical terms why it is better to have two observers and 30 measurement subjects than a proportionally larger observer group in the evaluation of inter-observer reliability, or why 2 observers were sufficient to perform the study, it would be most helpful to me. Put another way, why are the data obtained from two observers sufficient to extrapolate to, let's say, 1000 clinicians? Look, I've got a reasonable grasp of statistics, I'm no expert, but I'll try to follow your explanations. Thanks in advance.

Kevin Kirby · Oct 23, 2009

I agree with Simon. Were the two clinicians doing the measurements in the study also clinicians who work together? Clinicians who work together will tend to get the same results when doing clinical measurements whereas clinicians who do not work together will not be as similar in their measurement results. That is the nature and problem of any clinical measurement system.

Simon Spooner · Oct 23, 2009

stewartm said: ↑

Hi Simon,

Your interpretation of the sample size and proposals for study design are arbitrary. What is your basis for this?

Hi Stewart,

Having just glanced back over your paper, I quote: "A convenience sample of 30 participants" and "Inter-rater reliability was determined for two podiatrists with postgraduate experience of working in paediatrics (in excess of five years)". To me, or to anyone who might dabble in statistics, these statements by you sound like your "interpretation of sample size" was "arbitrary"; a pot and a kettle spring to mind. What is your basis for this? I assume you (and Peter) do know that there is at least one method, if not several, for calculating the required sample size of both "examiners" and "examinees" for reliability studies? Don't talk to me about arbitrary, Stewart.

Kevin Kirby said: ↑

I agree with Simon. Were the two clinicians doing the measurements in the study also clinicians who work together? Clinicians who work together will tend to get the same results when doing clinical measurements whereas clinicians who do not work together will not be as similar in their measurement results. That is the nature and problem of any clinical measurement system.

Kevin, there are a number of potential bias problems tied up in this kind of study, the point you outline being one of them.

This is interesting: "All children referred to the paediatric clinic were considered for inclusion. Children were excluded if they presented with a foot position that would be associated with abnormal structural features or would obscure visualisation of normal foot architecture."

Ok, so we have here a system of foot typing, but we can't use it to classify children who have feet with "abnormal structural features" - OK- useful?

Peter1234 · Oct 23, 2009

Hi Simon,

it would be better to have 10,000 examiners taking samples of the entire population of the UK. I do believe there are flaws to most studies - yes you are right Simon. When penicillin was discovered I don't believe this was planned for beforehand.

Peter

Simon Spooner · Oct 23, 2009

Peter1234 said: ↑

Hi Simon,

it would be better to have 10,000 examiners taking samples of the entire population of the UK. I do believe there are flaws to most studies - yes you are right Simon. When penicillin was discovered I don't believe this was planned for beforehand.

Peter

Peter,

I'll take it from your response above that you can't:

Simon Spooner said: ↑

explain in statistical terms why it is better to have two observers and 30 measurement subjects than a proportionally larger observer group in the evaluation of inter-observer reliability, or why 2 observers were sufficient to perform the study, it would be most helpful to me. Put another way, why are the data obtained from two observers sufficient to extrapolate to, let's say, 1000 clinicians?

Thanks for being on the show. But before you go...

Peter1234 said: ↑

'We morphed him Bruce, you can measure him again now!!'

Actually, you raise a reasonable point here: what is the between-day variability in the FPI-6?

Peter1234 · Oct 24, 2009

Yes Stuart and Simon and Jill,

you want agreement and correlation, yes you can remove the outliers to possibly end up with a less certain result or more concentrated result, no you can't use percentage for correlation, and yes there are flaws in the design and spelling mistakes. And finally yes you do ultimately want more raters for more accuracy/statistical power. You could also run a statistical power calculation (I think you did) to figure out the exact number of raters (although this is not normally done). Neither the calculations nor any proper tables with the numbers were shown. The article is in its provisional stage as far as I can understand. So maybe we should wait until it's entirely finished before we comment... there!!! :craig::craig: I am now the official know-it-all, the piñata of Podiatry Arena.

:boxing:

PS I thought it was a good study!!

stewartm · Oct 26, 2009

Hi,

I don't disagree that there could be improvements to the study - does any research project ever complete the picture? Or do they raise more questions? Regardless, it is perfectly feasible to look at the inter-rater reliability across two raters and we have presented this in the study. The work will never be applicable to all clinicians, there were two raters with quite specific expertise and the applicability of this comes down to the critical interpretation. We know that expertise and experience will affect one's reliability and consequently, I wouldn't see the results of the study applying to all. Furthermore, there were limitations to the study and these have been discussed in the manuscript.

I acknowledge that a number of concepts underpinning statistics are arbitrary, I never said otherwise. All this highlights is that there are different opinions out there but you can't argue that your proposed design is more appropriate than the one published in the manuscript. On a different note, I'm not sure that I entirely agree with Kevin's comment because, regardless of whether we work together or not, the raters scored independently and were blinded to the results. However, I do appreciate that there are a number of sources of error with any measurement and that these cannot always be controlled.

Kevin Kirby · Oct 26, 2009

stewartm said: ↑

Hi,

I don't disagree that there could be improvements to the study - does any research project ever complete the picture? Or do they raise more questions? Regardless, it is perfectly feasible to look at the inter-rater reliability across two raters and we have presented this in the study. The work will never be applicable to all clinicians, there were two raters with quite specific expertise and the applicability of this comes down to the critical interpretation. We know that expertise and experience will affect one's reliability and consequently, I wouldn't see the results of the study applying to all. Furthermore, there were limitations to the study and these have been discussed in the manuscript.

I acknowledge that a number of concepts underpinning statistics are arbitrary, I never said otherwise. All this highlights is that there are different opinions out there but you can't argue that your proposed design is more appropriate than the one published in the manuscript. On a different note, I'm not sure that I entirely agree with Kevin's comment because, regardless of whether we work together or not, the raters scored independently and were blinded to the results. However, I do appreciate that there are a number of sources of error with any measurement and that these cannot always be controlled.

Stewart:

I think that you will find, when doing any clinical measurement, that if two examiners were trained by the same individuals and have compared their measurements directly with each other over a period of time, they will invariably be closer in their measurement results than if the examiners were trained by different schools/individuals and have never compared their measurements directly with each other.

When I was teaching Root biomechanics measurements a quarter century ago, I noted that these clinical measurements also had good interexaminer reliability when two examiners who worked closely with each other made the measurements. This does not have to do with blinding of the examiners or rating their scores independently. This has to do with the actual exact measurement methods used and the frame of reference of each examiner when they perform each measurement.

I am not criticizing Tony Redmond's FPI only, since I think the FPI is a great idea and I commend Tony for his pioneering work (and also your work) in this regard. However, my clinical experience in doing and teaching clinical measurement to hundreds of podiatry students and podiatrists over the last 25 years has clearly shown me that any clinical measurement is subject to error, and these errors will be magnified if the examiner has been trained differently and has never worked alongside the other examiner that he/she is comparing his/her measurements with. As I said earlier, this is a characteristic of any clinical measurement system of the human foot and lower extremity that I know of.

Excellent discussion!:drinks

Simon Spooner · Oct 26, 2009

Thanks for your responses Stewart

stewartm said: ↑

it is perfectly feasible to look at the inter-rater reliability across two raters and we have presented this in the study. The work will never be applicable to all clinicians, there were two raters with quite specific expertise and the applicability of this comes down to the critical interpretation. We know that expertise and experience will affect one's reliability and consequently, I wouldn't see the results of the study applying to all.

"Conclusion: The FPI-6 is a quick, simple and reliable clinical tool which has demonstrated good inter-rater reliability when used in the assessment of the paediatric foot."

Perhaps then a caveat should have been inserted: "The FPI-6 is a quick, simple and reliable clinical tool in the hands of these two clinicians which has demonstrated good inter-rater reliability for said clinicians when used in the assessment of the paediatric foot."?

stewartm said: ↑

All this highlights is that there are different opinions out there but you can't argue that your proposed design is more appropriate than the one published in the manuscript.

I do know that sampling error is proportional to 1/ square root of n

What the above relationship suggests is that the greater the sample size n, the smaller the expected sampling error.

1/ square root 2 = 0.707;
1/ square root 30 = 0.183
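
The two figures above can be reproduced in a couple of lines. This is only a sketch of the 1/sqrt(n) scaling quoted in the post, with a hypothetical helper name; it assumes independent observations and says nothing about the variance of the measurements themselves.

```python
import math


def relative_sampling_error(n):
    """Scaling factor 1/sqrt(n) for sampling error with n independent observations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / math.sqrt(n)


# n = 2 corresponds to the two raters, n = 30 to the measurement subjects.
for n in (2, 30):
    print(n, round(relative_sampling_error(n), 3))
```

On this scaling, the factor for a sample of 2 is roughly four times the factor for a sample of 30.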
