Introduction

Spasticity, a defining feature of upper motor neuron disorders, is a complex motor disturbance predominantly identified by a velocity-dependent enhancement of myotatic reflexes, leading to exaggerated tendon jerks and increased muscle tone [1]. Lance’s widely disseminated definition describes spasticity as ‘a velocity-dependent increase in the tonic stretch reflex’. However, researchers such as Pandyan et al. [2] argue that this understanding is limited and propose a broader definition, describing spasticity as ‘a disorder of sensorimotor control resulting from an upper motor neuron lesion, characterised by intermittent or sustained involuntary activation of muscles’. This revised definition emphasises the multifaceted nature of spasticity, which extends beyond increased muscle tone and reflects a broader spectrum of motor dysfunction.

Spasticity is commonly observed in several major neurological disorders such as stroke, multiple sclerosis, traumatic brain injury, and spinal cord injury [3]. Unlike hypertonus or generalised increased muscle tone, spasticity specifically relates to the rate of muscle stretching: quicker stretches result in greater resistance. This phenomenon may originate from intrinsic muscle rigidity or an amplification of the myotatic reflex [4]. The repercussions of spasticity extend beyond mere alterations in muscle tone, affecting joint mobility, coordination of movement, and overall functional capability [5]. It varies in severity among individuals, potentially causing discomfort, pain, and restricted motion, thus significantly affecting activities of daily living. Effective management of spasticity is crucial for enhancing mobility and quality of life, especially considering that, twelve months post-stroke, the prevalence of spasticity ranges between 17% and 38% in those affected [6–8].

Quantifying the severity of spasticity requires reliable and valid methods of assessment [3]. Although electromyography and electrophysiological tests are objective methods of assessing spasticity, scales are usually used in clinical settings because they are less expensive, less time-consuming, and more practical [3, 9, 10]. Among the subjective methods for assessing spasticity, the Modified Ashworth Scale (MAS) is the most frequently applied scale in clinical practice [11]. Several studies have demonstrated that the MAS has good reliability between and within raters, particularly when the assessment focuses on the elbow and wrist [11]. However, its reliability in the lower limbs has been questioned, as it may be difficult to perceive rigidity mediated by reflex responses in the lower extremities, which are heavy and difficult to mobilise [12, 13]. Moreover, limited research has evaluated the inter-observer reliability of the MAS in the lower extremities of stroke patients [11, 14–18], and only one study has included physiotherapy students, although its sample was not restricted to stroke survivors [19]. Recent research has demonstrated that the MAS exhibits varying degrees of reliability across different muscle groups in stroke patients [18]. Specifically, the inter-observer reliability ranged from poor to good for upper extremities (Kappa = 0.25 to 0.66) and was moderate for lower extremities (Kappa = 0.41 to 0.54) [18]. Moderate intra-observer reliability was detected in the assessment of the hip flexors, underscoring the importance of cautious interpretation of MAS outcomes, especially when considering its lower reliability in assessing certain muscle groups in stroke patients. Additionally, a Spanish version of the MAS is available, as referenced in the Rehabilitation Measures Database. A validation study conducted by physiotherapists in Cali, Colombia, further supports its cultural adaptation and suitability for Spanish-speaking populations [20].

This research innovatively assesses the reliability of the MAS when utilised by novice physiotherapists, a scarcely explored area in the existing literature. It emphasises the importance of reliable tools for neurorehabilitation, aiding clinical decision-making in stroke survivor care. These insights are crucial for enhancing spasticity assessment accuracy in clinical settings, especially for early-career professionals. Addressing both a knowledge gap and the impact of practitioner experience on assessment reliability, the objective of this investigation is to assess the inter-observer reliability of physiotherapy students in evaluating lower limb spasticity post-stroke using the MAS.

Subjects and methods

Study design

We conducted an observational cross-sectional study.

Participants

A total of 32 participants selected by convenience sampling were assessed between March and May 2016. Nine participants were recruited from two geriatric homes, eight through an advertisement placed in local media, and 15 through other methods such as physiotherapist referral, contact on the streets, and referral by a health institution. Volunteers were included if they were over 40 years of age, had hemiplegia or hemiparesis, were at least 6 months post-stroke, were able to follow simple commands, and lived in Bucaramanga.

Participants with additional lower limb disorders, including orthopaedic, musculoskeletal, vascular, or integumentary conditions that could interfere with the mobilisation of the affected limb (e.g., fractures, severe arthritis, peripheral arterial disease, or skin ulcers) were excluded. Additionally, individuals presenting dyspnoea, abdominal pain, chest pain, uncontrolled hypertension, congestive heart failure, acute chest trauma, or pulmonary thromboembolism were also excluded to ensure participant safety and the reliability of the assessments.

All participants were undergoing physiotherapy as part of a clinical trial titled Effects of Lower Limb Muscle Strengthening on Spasticity, Gait, and Functionality in Post-Stroke Patients. The intervention included conventional therapy consisting of motor sequence exercises, balance activities, and functional training tailored to initial assessment findings. Additionally, participants in the experimental group received lower-limb resistance training over a 4-month period.

Evaluators

Two final-year physiotherapy students from the University of Santander (UDES) with prior academic training in neurological evaluation and intervention conducted all assessments. To ensure consistency and reliability, they received 12 hours of specialised training from a physiotherapist with over 10 years of experience in neurorehabilitation (ICGD). This training included theoretical instruction, practical exercises, and the use of a standardised checklist to ensure uniformity in the application of the MAS [19]. As part of their preparation, the students performed practice assessments on participants from UDES and a local geriatric care facility. Additionally, a pilot test was conducted in which each student assessed six participants under conditions identical to the main study. This process made it possible to refine the study protocol and ensured consistency in applying the MAS, a widely used tool for measuring spasticity on a six-point scale ranging from 0 to 4 [21]. All procedures, including the use of the checklist, were standardised during the pilot phase to maintain methodological rigour throughout the study.

Procedure

A general information questionnaire was completed to assess patients’ socio-demographic characteristics, medical history, and affected side.

Each evaluator performed a single attempt to assess spasticity per participant, with the second evaluator conducting the assessment approximately 5 min after the first. This approach was chosen to minimise variability caused by changes in participants’ muscle tone due to fatigue, emotional status, or environmental factors. Moreover, previous studies [22] have demonstrated that the effects of passive mobilisation on spasticity are short-lived, as they primarily influence the biomechanical properties of the muscle without significantly affecting other aspects of spasticity. Therefore, the first evaluation was unlikely to impact the subsequent results. To ensure consistency and minimise external influences, all assessments were conducted in a controlled environment. Patients were evaluated either on a stretcher or bed in a space designed to provide a fresh and pleasant atmosphere, allowing them to remain relaxed during the examination. Participants recruited from geriatric homes were assessed in their own beds within the institution, while other participants were evaluated at the ‘Neurotrauma Center SAS’, a facility specialising in neurorehabilitation and research in Bucaramanga. The order of evaluation between the two assessors was randomised to prevent systematic bias. Additionally, the evaluators were blinded to each other’s results to maintain objectivity. Testing sessions were conducted between 8 AM and 12 PM to limit the influence of circadian variations on muscle tone.

Each participant was positioned supine with the head in the midline, and the upper limbs were aligned with the trunk. The mobilisation was performed with manual contacts on the bony prominences of the segment that was mobilised to prevent pressure or stimulation on the muscle or tendon that could facilitate or inhibit muscle tone. The passive mobilisation was done in flexion and extension, with rhythmic and successive displacements of the segment, varying the speed so as not to facilitate adaptation. The evaluator moved the joints three times, with each movement lasting less than one second, as determined by counting ‘one thousand and one’ (‘mil uno’ in Spanish). One score was assigned based on the three movements. The muscle testing sequence advanced systematically through assessments of the hip, knee, and ankle joints, wherein extensors and flexors were evaluated sequentially for each joint. The passive mobilisation was done first on the unaffected muscle group to develop a sense of a normal movement, and afterward, the affected muscle group was tested. The spasticity assessment on the affected side was carried out as described in the supplementary file. Prior to assessing spasticity, ankle mobility was evaluated to identify any potential retraction or fixed contractures. This was done by performing mobilisations at varying speeds. Changes in the speed of movement help differentiate the cause of resistance: spasticity-related resistance varies with the movement speed due to hyperexcitability of the muscle spindle, whereas resistance from contractures remains constant regardless of the speed [22]. This step ensured accurate differentiation between spasticity and fixed contractures and enhanced the reliability of the assessment process.

Statistical analysis

Age was reported as the mean and standard deviation, and time since stroke as the median with the 25th and 75th percentiles (interquartile range, IQR). Nominal variables were expressed as counts and percentages. The inter-observer reliability was assessed using the quadratic weighted Cohen’s Kappa Index (Kω), and 95% confidence intervals (95% CI) were calculated. The interpretation of the results followed the Landis and Koch criteria, which categorise agreement as follows: poor (< 0), slight (0–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and almost perfect (0.81–1) [23]. Approval for the study was obtained from the Research Ethics Board at the UDES. Prior to the assessment, informed consent was obtained from all participants. Statistical significance was set at a p-value of less than 0.05, and data analysis was conducted using the Stata 16.1 software.
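To make the agreement statistic concrete, the following is a minimal Python sketch of a quadratic weighted kappa together with the Landis and Koch labels. It is an illustration only and not the Stata routine used in the study: the function names, the ordering of the MAS grades (with 1+ placed between 1 and 2), and the example ratings are assumptions made for this sketch, and the ratings are hypothetical rather than study data. The statistic is computed as Kω = 1 − Σ w_ij·o_ij / Σ w_ij·e_ij, with w_ij = ((i − j)/(k − 1))², where o_ij are the observed joint proportions and e_ij the proportions expected from the raters’ marginal distributions.

```python
# Assumed ordinal coding of the six MAS grades; "1+" sits between 1 and 2.
MAS_GRADES = ["0", "1", "1+", "2", "3", "4"]


def quadratic_weighted_kappa(rater_a, rater_b, categories=MAS_GRADES):
    """Cohen's kappa with quadratic disagreement weights for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {grade: i for i, grade in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions o_ij.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions of each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = ((i - j) / (k - 1)) ** 2  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one and the same grade")
    return 1 - weighted_observed / weighted_expected


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings for eight participants (not study data).
evaluator_1 = ["0", "1", "1", "1+", "2", "0", "1", "2"]
evaluator_2 = ["0", "1", "1+", "1+", "2", "1", "1", "3"]
kappa = quadratic_weighted_kappa(evaluator_1, evaluator_2)
print(f"Kw = {kappa:.2f} ({landis_koch(kappa)})")
```

Quadratic weights penalise a disagreement of two grades four times as heavily as a disagreement of one grade, which is why this form of kappa is often preferred for ordinal scales. Confidence intervals are not covered by this sketch.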

Results

Of the 32 recruited participants, 75% (n = 24) were male. The mean ± SD age was 65.2 ± 13.9 years, and 53.1% (n = 17) had a stable partner. The right side was the dominant side in most participants (75%, n = 24) and also the most frequently affected side (53.1%, n = 17). A medical history of heart disease was reported by 53.1% (n = 17) of participants. Only one participant was taking antispastic drugs (baclofen). The median time since the stroke was 60 months (IQR 25.3–110.3). The general MAS scores for both evaluators are summarised in Table 1. All muscle groups showed substantial agreement, with the exception of the knee extensors, which presented a moderate level of agreement (Table 2).

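Table 1

Participants’ characteristics (n = 32)

Variable | Category | n | %
Sex | female | 8 | 25.0
Sex | male | 24 | 75.0
Age (years) | mean (SD) | 65.2 (13.9) |
Marital status | without a stable partner | 15 | 46.9
Marital status | with a stable partner | 17 | 53.1
Laterality | right | 24 | 75.0
Laterality | left | 8 | 25.0
Medical history of heart disease | yes | 17 | 53.1
Medical history of heart disease | no | 14 | 43.8
Medical history of heart disease | do not know | 1 | 3.1
Affected side | right | 17 | 53.1
Affected side | left | 15 | 46.9

Variable | Median | IQR
Time since the stroke (months) | 60 | 25.3–110.3
General MAS score, evaluator 1 | 8 | 5–12.5
General MAS score, evaluator 2 | 9 | 5–17
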
Table 2

Inter-observer reliability of the Modified Ashworth Scale for lower limb muscle groups

Muscle group | Kappa (Kω) | 95% CI
Hip flexors | 0.70 | 0.50, 0.83
Hip extensors | 0.78 | 0.59, 0.92
Knee flexors | 0.78 | 0.60, 0.88
Knee extensors | 0.54 | 0.30, 0.76
Dorsiflexors | 0.65 | 0.42, 0.83
Plantar flexors | 0.64 | 0.39, 0.84

Discussion

The study explored the inter-observer reliability of the MAS in assessing lower limb spasticity in patients at least six months post-stroke when administered by physiotherapy students. Our findings revealed substantial agreement across most of the muscle groups assessed. The moderate reliability observed in the knee extensors highlights a need for cautious interpretation in certain muscle groups. These results are significant as they demonstrate that novice physiotherapists with limited clinical experience can, when adequately trained, reliably use the MAS in a clinical setting. This implies that the MAS could be considered a viable instrument for evaluating spasticity when used by emerging healthcare professionals, reinforcing its applicability in diverse clinical environments and among varied practitioner skill levels. To the best of our knowledge, no other studies have assessed the reliability of the lower limb MAS between novice physiotherapists or examined inter-observer reliability in the hip extensors and dorsiflexors (Table 3).

In the study by Sloan et al. [16], which adopted a methodology similar to ours in terms of sample size, participants’ position, time between evaluators, ordering between evaluators, blinding of results between evaluators, and training, the Spearman’s rank correlation coefficient indicated slight to moderate inter-observer agreement, which was lower than in our findings. The differences could be partly attributed to the higher number of evaluators (n = 4) and the increased number of movements performed during the test in the Sloan et al. [16] study. Repeated passive stretching could affect spasticity, which makes grading more difficult for evaluators. Conversely, Gregson et al. [17] found a similar agreement in the knee flexors. However, in their study, the participants were seated, the time between evaluators’ assessments was 5 min longer, and the time since stroke was shorter.

Although the agreement for the knee extensors was higher in the current study than in Blackburn et al. [14], this muscle group obtained the lowest agreement in our study, which may be related to the participants’ position and posture during the test. Fleuren et al. [24] found that elongated muscles such as knee extensors in the supine position increased stretch reflex activity when contrasted with contracted muscles such as knee flexors in a seated position. In addition, in Blackburn et al. [14], the participants were positioned in a side-lying posture, with the affected leg placed on top, presenting a challenge for the evaluator to simultaneously control the knee, hip, and pelvis.

Regarding the dorsiflexors and plantar flexors, although there was substantial agreement in our study, it was lower than for the hip extensors, hip flexors, and knee flexors. This might be associated with the constrained ankle range of motion and the contraction of the plantar flexors, complicating classification [25]. However, in the current study, we did not assess the range of motion to confirm this explanation. Also, the dorsiflexors and plantar flexors were the last muscle groups that the evaluators tested, and there was no break between muscle groups, which could have affected spasticity differently in the two evaluations of the same participants.

Table 3

Inter-observer reliability of the Modified Ashworth Scale for lower limb muscle groups

Variable | Gomez et al. (our study) | Sloan et al. (1992) [16] | Gregson et al. (1999) [17] | Blackburn et al. (2002) [14] | Li et al. (2014) [15] | Vidmar (2023) [18]

General characteristics
Males/Total (n) | 24/32 | 26/34 | 20/35 | 17/36 | 36/51 | 20/30
Age (years) | 65.2 ± 13.9 | 57.8 ± 17.8 | 73 ± (NR) | 76.1 ± 7.9 | 59 ± 14 | 55.1 ± 13.5
Population | stroke | stroke n = 31; head injury n = 3 | stroke | stroke: infarct n = 33; unclassified n = 3 | stroke: ischemic n = 37; haemorrhagic n = 14 | stroke
Evolution time | 60 (R: 6–288) months | NR | 40 (R: 19–78) days | R: 2–12 weeks | 3.7 ± 4.3 months | R: 1–19 months

Aspects of the MAS
Time resting before the test | 5 min | NR | NR | 5 min | 10 min | 10 min
Verbal instruction to start | yes | NR | NR | NR | NR | NR
Extremities – position | supine | supine | seated | side lying | supine | supine
Movements | 3 | 4 | 3* | 3 | 2 | up to 3

Methodological aspects
Number of evaluations per rater | 1 | 1 | 2 (two consecutive days) | 1 | 1 | 2 (1 day in between)
Time between raters’ assessments | 5 min | 5 min | 10 min | 1 h | 30 min | 15 min
Time between muscle groups | 0 min | not applicable | NR | NR | NA | NR
Evaluator independent registration (blind) | yes | yes | yes | yes | yes | NR

Aspects of the evaluators
Number of raters | (2): PT students | (4): 2 PT, 2 doctors | (2): 1 medical specialist, 1 PT | (2): both PT | (2): 1 physiatrist, 1 PT | (3): 2 PT, 1 physiatrist
Order between raters | random | random | counter balanced | counter balanced | physiatrist then PT | random
Experience | final year student | NR | NR | >10 years | NR | one rater = 2 years; two raters > 10 years
Rater training | yes | yes, not specified | NR | not extensive | yes | no additional

Results | Kω (95% CI) | rho | K | Kb | K | K (95% CI)
HF | 0.70 (0.50–0.82) | NA | NA | NA | NA | A1–B 0.15 (–0.39, 0.68); A1–C 0.48 (0.10, 0.85); B–C 0.59 (0.21, 0.97)
KF | 0.77 (0.59–0.88) | MD1 vs MD2: 0.37; MD1 vs PT1: 0.57; MD1 vs PT2: 0.44; PT1 vs MD2: 0.40; PT1 vs PT2: 0.62; PT2 vs MD2: 0.26 | day 1: 0.79; day 2: 0.73 | NA | NA | A1–B 0.43 (0.10, 0.76); A1–C 0.67 (0.44, 0.89); B–C 0.24 (–0.10, 0.59)
KE | 0.54 (0.29–0.76) | NA | NA | 0.28, p = 0.06 | NA | A1–B 0.45 (0.20, 0.71); A1–C 0.71 (0.53, 0.90); B–C 0.36 (0.06, 0.65)
Plantar flexors | 0.63 (0.38–0.83) | NA | day 1: Kω = 0.51; day 2: Kω = 0.45 | G: 0.15, p = 0.21; S: 0.19, p = 0.10 | K: 0.48, p < 0.001 | S: A1–B 0.53 (0.34, 0.72); S: A1–C 0.71 (0.52, 0.90); S: B–C 0.39 (0.20, 0.59); G: A1–B 0.42 (0.25, 0.59); G: A1–C 0.64 (0.44, 0.84); G: B–C 0.34 (0.14, 0.54)

PT – physiotherapist, MD – doctor of medicine, HF – hip flexors, HE – hip extensors, KF – knee flexors, KE – knee extensors, G – gastrocnemius, S – soleus, R – range, IQR – interquartile range, Kω – weighted Kappa, K – Kappa, Kb – Kendall tau-b, rho – Spearman’s rank correlation coefficient, 95% CI – 95% confidence interval, NA – not assessed, NR – not reported

* Three measurements each time, 30 s apart, the lowest value was selected.

Overall, the high level of reliability found in the current study could be primarily explained by the rater training and the standardisation of the test procedure. This means that the MAS is reliable for an inexperienced rater (final-year physiotherapy student) who has completed a training period in the MAS evaluation and follows a pre-established evaluation protocol. This protocol included a rest period before the first evaluation to establish a baseline state, and three movements timed by counting ‘one thousand and one’ (‘mil uno’ in Spanish) following the Bohannon and Smith [21] procedure, which promotes a consistent stretch speed during the test. The counting was intended to reduce inconsistency in the movement velocities used by the evaluators during the assessment. A study of 14 patients (10 post-stroke participants) [19] incorporated a metronome in the MAS evaluation to obtain consistent movement velocities. However, the findings indicated low reliability, with a Kappa coefficient classified as poor for the ankle plantar flexors [19]. Also, the manual contacts were standardised, mainly using bony prominences as the contact point of the tester’s hand on the patient. This avoided pressure over the muscle, which could facilitate the myotatic reflex response, and pressure over the tendon, which could stimulate the Golgi tendon organ and cause consequent relaxation. In addition, in this reliability study, the protocol involved a resting time of 5 min between evaluators to prevent a carry-over effect. Another aspect that may have increased the reliability was the time since stroke (6 months and over), which allows reflex stability. The variability of reflex responses in stroke patients is known to be higher during acute stages than in chronic stages [26], and the muscle’s mechanical properties can also evolve over time [27].

Questions have been raised regarding whether the MAS serves as a valid assessment tool for spasticity. Concurrent criterion validity studies of the MAS have been carried out using electrophysiological and biomechanical measurements. While some studies have shown low-to-moderate validity [28–32], other research has demonstrated a low correlation between MAS scores and electrophysiological or biomechanical measures of spasticity [33–37]. The inconclusive results for concurrent criterion validity could be explained by the fact that the MAS does not exclusively evaluate spasticity. Spasticity is typified by an elevation in resistance to passive movement that is dependent on velocity, stemming from an increase in excitability of tonic stretch reflexes, referred to as neural stiffness [38]. However, augmented resistance to passive movement may result from non-neural factors, such as modifications in the muscle or joint properties, notably changes in extracellular matrix viscosity (non-neural stiffness) [38, 39].

Contractures are another non-neural factor that can increase the resistance to passive movement. Changes in soft tissue, including a reduced number of sarcomeres in series, restructuring of connective tissue, and diminished extensibility, are associated with contractures [36]. According to Kwah et al. [40], 52% of patients developed at least one contracture at 6 months post-stroke. In our study, the median time since the stroke was 60 months; thus, the participants had a high probability of presenting soft tissue and joint property changes at the time of evaluation due to their underlying condition. Therefore, the MAS likely evaluated both spasticity (neural stiffness) and non-neural stiffness.

In addition, electrophysiological measures (e.g., H-reflex and F-waves) are frequently applied to monitor muscle neural activity [28, 29], while biomechanical measures (e.g., myotonometry and sonoelastography) evaluate non-neural stiffness [41]. Other combined electrophysiological and biomechanical measurements (e.g., NeuroFlexor) evaluate both non-neural and neural stiffness [42]. Therefore, the MAS’s validity results, as a measure of stiffness rather than spasticity, depend on the criterion used in the validity study.

Another point of controversy is whether the MAS is an ordinal measure of stiffness. Two studies have demonstrated that the resistance to passive movement using a biomechanical device was not significantly different among grades 1, 1+, and 2 [37] or 0, 1, 1+, and 2 of the MAS [35]. Both studies concluded that the MAS does not fulfil the criteria for a valid ordinal-level measure of resistance to passive movement. However, despite these limitations, the MAS remains the most frequently applied clinical measure of spasticity in poststroke patients [4, 43].

Limitations

The current study is subject to limitations, including a small sample size, which may impact the calculation of Kappa statistics. Specifically, a larger sample size is necessary to compute the Kappa index if the prevalence of the outcome is low [44]. Previous studies of reliability have shown a low prevalence of high levels of spasticity in people with stroke [13, 14]. In addition, the results are restricted to a sample of stroke survivors in the chronic stage, chosen because of reflex stability, whereas clinicians also need to assess spasticity in patients in the acute stages. The agreement was assessed by two physiotherapy students with little experience, who mainly scored spasticity between 0 and 2 according to the MAS. Also, their main disagreements were in the knee extensors, dorsiflexors, and plantar flexors; the second rater tended to assign 1 while the first rater tended to assign 0 (see supplementary file Appendix 2). Because some uncontrolled factors could alter the spasticity in the second assessment, the evaluators carried out the assessments in random order. Other factors that were not evaluated in the current study and could influence the scoring of the two evaluators were pain and the angle of the rater’s joints during measurements, which could influence the speed of the movement of the limb and the pressure exerted by the rater at the point of contact with the participant.
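The sensitivity of Kappa to a skewed distribution of grades can be illustrated with a short Python sketch. It is a simplification and not an analysis from this study: it uses the unweighted Cohen’s Kappa on a two-category table (for example, spasticity absent versus present), the function name is introduced here for illustration, and the counts are hypothetical. Both tables have the same raw agreement (28 of 32 participants), but the table in which one category is rare yields a much lower Kappa because the agreement expected by chance is higher.

```python
def cohen_kappa_2x2(both_low, a_low_b_high, a_high_b_low, both_high):
    """Unweighted Cohen's kappa for two raters and two categories."""
    n = both_low + a_low_b_high + a_high_b_low + both_high
    observed = (both_low + both_high) / n
    # Chance agreement from each rater's marginal totals.
    a_low, a_high = both_low + a_low_b_high, a_high_b_low + both_high
    b_low, b_high = both_low + a_high_b_low, a_low_b_high + both_high
    expected = (a_low * b_low + a_high * b_high) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 32 participants; raw agreement is 28/32 in both tables.
balanced = cohen_kappa_2x2(14, 2, 2, 14)  # both categories equally common
skewed = cohen_kappa_2x2(27, 2, 2, 1)     # the higher category is rare

print(f"balanced: kappa = {balanced:.2f}")  # 0.75
print(f"skewed:   kappa = {skewed:.2f}")    # 0.26
```

With few participants in the rare category, a single discordant rating shifts the estimate considerably, which is consistent with the need for larger samples when the prevalence of high spasticity grades is low.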

Further studies need to be carried out in people with acute stroke, using more than two evaluators and evaluators from diverse disciplines. Regarding the methodology of the reliability study, for future studies, we suggest a random order between muscle groups instead of the cephalocaudal order and giving breaks between muscle groups.

Conclusions

In conclusion, the MAS shows substantial agreement, measured with Kω, in non-experienced but trained evaluators for five out of the six lower limb muscle groups assessed. This supports its use in assessing spasticity in individuals who have experienced a stroke. Studies of this type contribute to the body of scientific evidence for health professionals working in rehabilitation. Rehabilitation workers require scales with adequate psychometric properties that bring objectivity to the evaluation, so that intervention plans can be directed at the problems detected.