Test instruments need to include test tasks that allow test-takers to engage in interactive and extended communication.

Key Issues

Two key indications of the quality of the speaking part of a language test are its degree of authenticity and the extent to which it elicits language from test-takers that is representative of the type and context of language use in real-world situations, matching the purpose of the assessment. The more effectively the test achieves these, the more confidence there can be in using the test results to make decisions about test-takers' language abilities in real-world communication situations, and the higher the overall validity of the test instrument.

In LPR tests authenticity includes how well the test tasks engage the test-takers in interactions with an interlocutor in the way that speakers communicate in real-world situations. In radiotelephony communication contexts, pilots and air traffic controllers interact by asking questions, responding, giving instructions, acknowledging and so on. In such cases the context for the communication is shaped by both the situation and how both participants engage during the exchange. Radiotelephony communication between pilots and controllers is dialogic in nature. At least some parts of the test instrument should contain task types that represent this type of language use.

Validity in LPR tests also relates to how well the test tasks mirror the context and type of language used in radiotelephony communications. The more closely the language and contexts in the test align with how pilots communicate in real-world situations, the more likely it is that the results of the test can be used to make valid inferences about how well test-takers can use language for accurate and effective radiotelephony communication.


In the speaking part of a test, whether the test-taker has the appropriate skills to communicate in real-life situations can only be judged from their performance during the limited time available. It is therefore important that the test instrument includes test tasks designed to give the test-taker opportunities to communicate in interactive situations, where the test-taker can actively contribute to and shape the discourse.

Naturally occurring discourse involves two or more people interacting to share information. This often involves negotiating meaning. Speaking skills such as turn-taking, rephrasing, paraphrasing, clarifying ideas, and recognising that the listener may not have understood and then acting to minimise or manage the miscommunication can only be effectively assessed where test tasks encourage test-takers to participate in an exchange.

What does this mean for test design?

At least some speaking tasks need to provide opportunities for test-takers to participate in interactive communication with a trained interlocutor, i.e., tasks which require the test-taker to contribute to a co-constructed dialogue in the same way that communication occurs in real-world aeronautical contexts. Test tasks which are limited to test-takers responding to isolated questions or disconnected prompts do not allow interactive skills to be evaluated and do not reflect real-world communications. They are therefore not authentic. Authenticity is a key requirement of proficiency testing. Note that in interactive speaking tasks, comprehension should either not be assessed or be used only to support the results of a part of the test dedicated to assessing comprehension separately. For example, comprehension could be rated as a subjective impression to confirm or support the results of a dedicated listening part of the test (see Criterion 3).

ICAO Statements & Remarks

The following statements from ICAO Document 9835 (2nd Edition, 2010) are related to this issue.

1.3.2. Human language is characterised, in part, by its ability to create new meanings and to use words in novel contexts. This creative function of language is especially useful in accommodating the complex and unpredictable nature of human interaction, including in the context of aviation communications. Linguistic proficiency in listening and speaking can be broken down into component skills which are described below with their associated learning processes. These are the skills that appear in the ICAO rating scale.

Interaction: This skill addresses the ability to engage in spontaneous spoken dialogue and to successfully achieve communicative goals. Increasing proficiency in this skill results in reduced allowance or effort on the part of an interlocutor to maintain a conversation. It is characterized by the rapidity and appropriateness of responses, the ability to volunteer new information, to take conversational initiatives, to be responsive to feedback from an interlocutor, and to detect and to resolve misunderstandings as they occur.

3.2.1. All uses of a language and all language-learning environments have unique characteristics that are the consequence of the context of communication and the tasks and purposes of the users.
3.2.3. The tasks and purposes of the users determine:
a) communication themes or topics;
b) dominant speech acts or language functions to be understood or produced;
c) dominant interactive schemata or speech-act sequences and exchange structures;
d) dominant strategies (e.g. interaction: turn-taking, cooperating, communication repair, etc.).
3.2.4. Proper implementation of ICAO language proficiency requirements depends on an accurate understanding of the characteristics of the language of aeronautical radiotelephony communications. However, the inflexibility arising from the use of standardized, pre-recorded prompts may result in an important limitation in the scope of evaluation available to semi-direct tests. This limitation may be particularly critical in the ability of the test to assess the full range of abilities covered by the “interactions” descriptors of the ICAO Rating Scale. Proficiency tests that are administered directly may use face-to-face communication in some phases of the delivery but should include a component devoting time to voice-only interaction.

Why this issue is important

Aeronautical radio communication involves pilots and controllers communicating in interactive situations where the main purpose is to share information leading to outcomes (actions). This includes responding to issues, enquiring, solving problems, providing advice, offering choices, and so on. In all such communication, each participant needs to engage in topics, negotiate meaning and participate in a collective and shared communicative context which develops as a result of the interaction. Further, there is a range of language functions (see Appendix B in ICAO Document 9835) that pilots and controllers typically need to be able to use for effective radiotelephony communication. How well test-takers can use these functions is best assessed by eliciting them in interactive situations which simulate real-world radiotelephony communication contexts. This is best achieved in task types that allow the test-takers to participate in a co-constructed dialogue.

It is essential that at least some components of an LPR test allow test-takers to demonstrate their ability to:

  • engage in communication by responding to an interlocutor and contributing to a jointly constructed exchange with the interlocutor, and,
  • build and develop on ideas and topics as part of a co-constructed dialogue with an interlocutor.

These are important interaction skills that pilots and controllers need to demonstrate. The test needs to assess these skills, including how well test-takers are able to communicate (as opposed to just speak) using a wide range of language functions to achieve the purpose of the communication in real-world radiotelephony contexts.

Giving test-takers opportunities to demonstrate their interactive competencies and skills increases the likelihood that the test provides a valid means of assessing a wider range of the speaking skills and competencies associated with how pilots and air traffic controllers interact and communicate in real-world situations.

Test developers should therefore ensure that test tasks are designed to elicit as much interactive language as possible.

Designing test tasks that enable test-takers to fully engage with language production processes and to interact as much as possible improves the overall effectiveness of an LPR test.

Limiting the scope for test-takers to interact limits the possibilities for raters to make a full evaluation of the skills required and may limit claims the test developer makes towards the validity of the test instrument.

Interaction where both parties participate in a dialogue involves turn-taking and the use of strategic skills such as rephrasing ideas for repair or clarification or paraphrasing ideas for precision and clarification for the listener.

Interactive communication may be affected and driven by external factors which the speakers are obliged to manage and which determine why the communication is necessary and how it develops. In radiotelephony communication, such factors can be operational, climatic, technical, procedural and/or administrative. Often more than one external factor determines the need for, and the type of, communication.

Such complexities and unpredictability mean that assessment of these skills needs to be incorporated in at least some parts of the test design.

Test instruments which focus on monologic task types at the expense of dialogic task types are more likely to elicit predominantly informative language functions (describing, giving reasons, elaborating, giving examples, etc.) and not those associated with managing an interaction (e.g. initiating, clarifying, changing a topic, resolving a misunderstanding, offering, refusing, requesting, etc.). Such test instruments cannot assess these interactional functions, yet they are essential language skills needed by pilots and controllers for effective communication over the radio. Tests that do not include task types which elicit these language functions, through performance situations requiring test-takers to take part in a dialogue, would therefore be less valid for the assessment of language proficiency for radiotelephony communication.

Best Practice Options

Test Developers need to consider the following points when designing test instruments.

Test tasks that maximise interaction between a test-taker and an interlocutor give the best opportunity for the test-taker to participate in negotiated communication that allows a wider assessment of interactive skills associated with real-world communication situations between pilots and controllers.

Tasks that require a test-taker to participate in ATC-PILOT role-play situations provide opportunities to enhance the authenticity of the test and allow for interactive skills to be assessed. Including such task types also improves the meaningfulness of the test to the users and the validity of the test results (see Criterion 2 and Criterion 4).

Prompts in a role-play may include situations which encourage clarification, requests, paraphrasing, turn-taking, use of strategies to correct miscommunication or rephrasing of ideas and explanations. Ideally these can be built into voice-only interactions as would be the case in real-life radiotelephony communications.

Interview task types may be included in the test design to provide more evidence of how well test-takers are able to use creative and complex language. Such language may be difficult to fully assess in a pilot-controller role-play situation alone, given the limited time available in a testing situation. An interview task could take the form of a workplace role-play (for example an incident debrief) or a discrete interview task.

Interview-based tasks which attempt to mirror a conversational dialogue may appear less natural than genuine face-to-face conversations, which produce more natural discourse. Including this type of structured test task provides greater standardisation and reliability, but it reduces authenticity. Tasks and prompts need to follow logical steps and be relevant to the operational language and contexts of the test-taker.

In an interview format, test-takers typically respond to scripted questions asked by an interlocutor. In such cases the test task needs to be designed in such a way that the prompts (interview questions) allow the interlocutor and test-taker to engage in conversation. This can be achieved in part by encouraging interlocutors to engage with test-taker responses (e.g. by asking follow-up questions such as why, can you explain, what do you mean by…, etc.), but also by ensuring the prescribed questions are sequenced in a natural, linked and cohesive way, so that the discourse at least appears to develop and to be co-constructed by both the interlocutor and the test-taker.

Test instruments may contain task types that tend to be more monologic (that is, the test-taker produces a response which does not require an interlocutor to engage with the test-taker). However, these tasks should not be included in tests to the exclusion of speaking task types that allow for interactive exchanges. In other words, it is important that the majority of LPR test tasks in the speaking component of the test are interactive, in the form of interviews and role-plays.

A set of interview questions which requires the test-taker to respond to random or unconnected topics lacks authenticity both in the language produced and in the context in which the communication occurs. In such situations the examiner may simply ask a set of prescribed questions requiring the test-taker to answer, with minimal or no interaction with the examiner. Typically such test tasks set up a highly artificial communicative context and produce unnatural discourse. The test-taker responds by producing a set of usually short and disconnected monologues with minimal or no interaction with the examiner (interlocutor). Such test tasks bear little resemblance to how pilots and controllers communicate over the radio in real-world contexts. This affects both the authenticity of the test task and the validity of the results associated with these test tasks. Interview question task types are also limited in their ability to elicit the wide range of language functions that pilots or controllers need to be able to use and understand in radiotelephony communication contexts. Indeed, in cases where test instruments rely on interview questions as the dominant component of a speaking test, there is a risk that language training programmes designed to support pilots and controllers in the target population focus on developing language associated with responding to interviewer questions rather than the language needed for successful communication in radiotelephony (e.g., controllers asking questions, giving instructions, making enquiries, offering choices and pilots requesting assistance, justifying reasons, speculating about future outcomes, etc.).

Not only does this task type lack authenticity, but it can also encourage test-takers to prepare rehearsed responses in cases where the interview questions become known to the test-taker population (refer to Criterion 8).

External References

The following references are provided in support of the guidance and best practice options above.

1. Weir (2005)

The author points out the need for test instruments to include test tasks that allow the test-taker to engage in meaningful and co-constructed dialogue, and that relying solely on interview-based test task formats is limiting for assessment purposes. This of course is central to PILOT-ATC communication in radiotelephony where all discourse is co-constructed and interactive.

“If we want to test spoken interaction, a valid test must include reciprocity conditions. This contrasts with the traditional interview format in which the interviewer asks the questions and the interviewee answers. So if we are interested in the candidate’s capacity to take part in spoken interaction, there should be reciprocal exchanges where both interlocutor and candidate have to adjust vocabulary and message and take each other’s contribution into account. This means that the candidate must be involved in the interaction to a greater extent than that of merely answering questions” (p. 72).

“To test speaking ability we should require candidates to demonstrate their ability to use language in ways which are characteristic of interactive speech, i.e. to process the language in the way described … over an adequate sampling of the routines involved in the real-world speaking process” (p. 104).

Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Basingstoke: Palgrave Macmillan.

2. Galaczi and ffrench (2011)

“Several language testing studies have provided valuable insights and empirical evidence into the role of different response formats in terms of the quality and quantity of candidate output. … The findings indicated that different response formats produced different functional profiles. The one-to-one interview format and individual long turn tended to elicit predominantly informational functions, while the discussion tasks enabled a broader range of functions from all three categories” (p. 114).

Galaczi, E. D., & ffrench, A. (2011). Context validity of Cambridge ESOL speaking tests. In L. Taylor (Ed.), Examining speaking (Vol. 30). Cambridge: Cambridge University Press.

3. Brown (2005)

“In semi-structured or conversational interviews in particular, the aim is to engage the candidate in relatively naturalistic ‘conversational’ interaction, a feature which is typically claimed to contribute to the validity of the procedure as a measure of communicative ability” (p. 25).

Brown, A. (2005). Interviewer variability in oral proficiency interviews. Frankfurt: Lang.

4. Green (2014)

“The burden of organizing and developing spoken interaction is shared between the participants and depends on their ability to take turns, recognizing when it is appropriate for them to speak and their awareness of the kinds of contribution they are expected to make as the interaction progresses” (p. 130).

“The interview format itself has come under criticism because the interaction it gives rise to tends to be very one sided. The interviewer asks all the questions and controls the interaction while the assessee is just required to answer. This does not closely resemble everyday conversations, discussions or transactions, where the participants have a more balanced relationship and share in directing the interaction” (p. 139).

Green, A. (2014). Exploring language assessment and testing: Language in action. New York, NY: Routledge.

5. Kim (2013)

The author provides an example of the necessity to include test tasks that involve test-takers in participating in radiotelephony discourse:

“…. qualities such as strategic competence for accommodation, and shared responsibility for lack of success of communication by participants should be incorporated into the radiotelephony communication construct and any tests which are designed to reflect this” (p. 107).

Kim, H. (2013). Exploring the construct of radiotelephony communication: A critique of the ICAO English testing policy from the perspective of Korean aviation experts. Papers in Language Testing and Assessment, 2 (2), 103-110.

6. Weir (2005)

“To determine whether learners can communicate orally, it is necessary to get them to take part in direct spoken language activities” (p. 103).

“We want candidates to perform relevant language tasks and adapt their speech to the circumstances making decisions under time pressure … and making any necessary adjustments as unexpected problems arise” (p. 103).

“To test speaking ability we should require candidates to demonstrate their ability to use language in ways which are characteristic of interactive speech, i.e. to process the language … over an adequate sampling of the routines” (p. 103).

“The more indirect the test task the more difficult it will be to translate test results into statements about what candidates can or cannot do in … real-life activity” (p. 144).

“The extended picture description lacks situational authenticity and one might seriously question when (test-takers) ever need to do this kind of thing in real life. However, claim might be made for interactional authenticity in the technique may well be tapping into the informational routine of reporting” (p. 148-149).

“Describing something that has happened may well be important operation in some contexts, but generally speaking this task tells us very little about … ability to interact orally or to use skills such as negotiation of meaning or agenda management. The technique does not allow … to incorporate the important condition of reciprocity, which can only be tested in a more interactive technique” (p. 148).

“In interviews it is sometimes difficult to replicate features of real-life communication such as motivation, purpose and role appropriateness” (p. 154).

“It is often difficult to elicit fairly common language patterns typical of real life conversation, such as questions from the candidate. Interviewers often manipulate conversations to get candidates to say things employing a variety of structures. This might reduce the authenticity of the discourse” (p. 154).

Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Basingstoke: Palgrave Macmillan.

