Test instruments need to contain appropriate tasks that assess test-takers’ abilities to understand and communicate in real-world contexts.

Key Issues

Test tasks that mirror real-world communication contexts represent the kinds of communicative activities pilots or air traffic controllers may be expected to perform in real-world situations. The more the task types are able to set up communication contexts that reflect real-world communication situations (within the parameters controlled through the design of the test instrument), the higher the authenticity of the test.


Authenticity is one of the fundamental elements in specific purpose language testing. Authenticity influences the validity of LSP tests. The more the content of an LSP test resembles real-world communication contexts, the more confidence we can have in the meaningfulness of the test results. The more the content is aligned to real-world specific purpose communication needs, the more likely it is that valid conclusions can be made with respect to how well test-takers can communicate in comparable real-world situations. Authenticity is an essential requirement in LPR tests as it is a basis for allowing useful and valid results to be reported and informed and fair decisions to be made on the basis of these results. In the case of the LPRs, this means pilot or air traffic controller communication in real-world aeronautical situations needs to be included to ensure the tests have authenticity.

In simple terms, that means that all test tasks need to replicate real communicative activities in a specific domain as much as possible. This enables the assessment of language proficiency related to that domain to be meaningful.

Authenticity in test tasks designed for radiotelephony communication ensures that the context, format, and type of communication (and types of language included in the test tasks) reflect real-world language and communication needs of pilots and air traffic controllers.

Tests that claim to assess proficiency in aeronautical communication but fail to include task types that replicate the way pilots or air traffic controllers communicate in real-world situations are less valid for reporting on the ICAO LPRs. This is because the type of language the test-takers are assessed on in such tests is less likely to relate to their real-world job needs.

What does this mean for test design?

Components of the test need to contain tasks and content that mirror the kind of communication settings and contexts associated with real-world situations that pilots and air traffic controllers may face, both in radiotelephony communication and other job-related communication contexts. The more directly the test tasks mirror real-world communication contexts, including the type of language and how this is used, the more effective a test instrument is in allowing valid interpretations to be made about test performances and test scores.

ICAO Statements & Remarks

The following statements from ICAO Document 9835 (2nd Edition, 2010) are related to this issue.

3.2.1. Language proficiency is necessarily linked to particular uses of the language […] All uses of a language and […] have unique characteristics that are the consequence of the context of communication and the tasks and purposes of the users.
3.2.4. Proper implementation of ICAO Language Proficiency Requirements depends on an accurate understanding of the characteristics of the language of aeronautical radiotelephony communications.
3.2.7. The sole object of ICAO Language Proficiency Requirements is aeronautical radiotelephony communications, a specialized subcategory of aviation language corresponding to a limited portion of the language uses of only two aviation professions — controllers and flight crews. It includes ICAO standardized phraseology and the use of plain language.
4.5.3. a) Proficient speakers shall communicate effectively in voice-only (telephone/radiotelephone) and in face-to-face situations.

This holistic descriptor draws attention to the need for […] testing to provide voice-only settings to exercise or demonstrate language proficiency, as well as face-to-face settings that allow broader uses of language.

4.5.5. b) the ICAO Rating Scale has a distinct aeronautical radiotelephony focus; it addresses the use of language in a work-related aviation context, voice-only communications […] The test-taker may be asked […] to engage in a conversation-like interview with the interlocutor or to perform in a role-play.
Special Guidance on ICAO Doc 9835 (2010)

ICAO Doc 9835 (2010) refers to the option for the test-takers to engage in a conversation-like interview with the interlocutor or to perform in a role-play.

Best practice in English for specific purpose testing promotes high authenticity. This is achieved through the test design and choice of test tasks. In the LPR context, this means that it is essential for test-takers to perform in communication contexts that mirror the way pilots and controllers communicate in real-world situations. In fact, role-play tasks are the most effective test tasks for maximising authenticity and ensuring tests assess how well test-takers can understand and communicate in real-world communication contexts.

For LPR tests, it is important to include role-play tasks in the speaking part of the test which require pilots or controllers to communicate in voice-only radiotelephony contexts. Role-plays that require test-takers to communicate with colleagues in briefing sessions can also provide opportunities to assess plain language in authentic situations. Conversation-like interviews can be included in the test instrument, but not at the expense of, or as a replacement for, role-play task types. Contrary to what ICAO Doc 9835 states, offering a choice between a conversation-like interview and a role-play is therefore not recommended in effective ICAO LPR test instrument design.

One benefit of direct testing is that the test tasks can be made more natural or more communicative because the test-takers interact with an interlocutor. […] it is important that test-takers are evaluated in their use of language related to routine as well as unexpected or complicated situations as evidence of their level of proficiency. The purpose of a language proficiency test is to assess test-takers’ use of language based on their performance in an artificial situation in order to make generalizations about their ability to use language in future real-life situations. Because of the high stakes involved, pilots and air traffic controllers deserve to be tested in a context similar to that in which they work.
Test content should, therefore, be relevant to their work roles. The provisions of the ICAO language proficiency requirements that directly address test content are:
a) Annex 1, Appendix 1, where holistic descriptors refer to “work-related topics”, “work-related context”, and “routine work situation”. In general, tasks that resemble real-life activities are most suitable. It is important to keep in mind that the idea of a work-related context […] would aim to closely replicate radiotelephony communications, including the extent of plain language needed in unusual, unexpected or emergency situations. Proficiency tests that are administered directly […] should include a component devoting time to voice-only interaction.

Voice-only interaction is an important characteristic of aeronautical radiotelephony communications. When a pilot and a controller interact, they cannot see each other. Directly administered proficiency tests should simulate this condition of “voice only” in at least a portion of the test. The test should be specific to aviation operations (and) should provide test-takers with opportunities to use plain language in contexts that are work-related for pilots and air traffic controllers.

A further step toward providing test-takers with a familiar aviation-related context would be to customize the tests for controllers or pilots.

Why this issue is important

Specific purpose language is generally seen as that used for communication in defined professional domains such as aviation, medicine and engineering. The context for the communication shapes the type of language used and how it is used. In specific purpose language testing, the type of task plays an important role in reproducing the kind of communication and language used in real-world situations. In controlled testing contexts, the types of responses and language used by test-takers are pre-determined by the design of the test tasks.

Test tasks that are able to replicate real-world communication contexts improve the authenticity of the test. A high level of authenticity means that the kind of language the test assesses more closely reflects the language usage and communication situations which the test aims to evaluate.

The more the test is able to replicate real-world communication contexts by including appropriate task types, the more confidence we can have in the results of the test. This improves the validity of the interpretations made of test results and how these are used. In LPR testing, the more authentic the task types, the more they reflect the real-world contexts in which pilots and air traffic controllers communicate, and the more likely test-takers are to respect the test and the results they receive.

Best Practice Options

Test developers need to consider the following points when designing test instruments.

Two key factors need to be considered when evaluating the authenticity of a test task. These are 1) the extent to which the task replicates the context of a real-life situation (situational authenticity), and 2) the extent to which test-takers are able to engage in mental processes that enable them to interact in a way similar to how they would in equivalent real-life communication situations (interactional authenticity).

Test developers need to ensure that an authentic task reproduces, as accurately as possible, all elements of a real-world communicative event in a domain-specific situation, including elements that trigger the mental processes test-takers can be expected to use in corresponding real-world communication. In LPR testing, this means that at least part of the test instrument needs to include test tasks that replicate situations which require pilots and controllers to communicate over the radio. These require test-takers to use (in a speaking part of the test) and understand (in a listening part of the test) specific language associated with the domain, at a level of complexity that allows assessment to correspond to the levels of the ICAO language proficiency rating scale.

A proven task type that enhances the authenticity of test instruments for language in technically specific operational domains is the simulated role-play. In role-play tasks, the test-takers take on the role of either a pilot or an air traffic controller and perform the communicative activities associated with managing an event, in the ways they would in real-world situations.


1. Role-plays – where the test-taker has an opportunity to speak and listen in an authentic operational communication as near as possible to a real-life situation – are arguably the most valid. Such tasks should share the critical features of the real-life communication and rely on the communicative skills and background knowledge of the test-taker in order to provide as authentic a situation as possible.

Role-play tasks should include routine operational elements and a situational complication (non-routine/unexpected) where the use of plain language is required. Such tasks would typically be between a pilot and an air traffic controller and require the test-taker and interlocutor to communicate in voice-only contexts.

2. Other collaborative speaking tasks, including face-to-face discussions, may be considered on their own merit according to the level of authenticity that the test task provides. Prompt stimuli such as texts, videos and pictures may be used in order to assess a wider range of operational language than role-plays can offer alone, but should always refer to the language required for replicating authentic communication in a real-world operational situation. Such tasks can be in the form of role-plays simulating communication between, for example, a training captain and a pilot or between an ATC supervisor and a controller. Prompts and responses could relate, for example, to unexpected and/or non-routine events and could elicit not just narrative and descriptive language, but also require test-takers to use a range of higher-level language functions and more complex language (e.g. justifying actions, giving opinions, hypothesizing about alternative outcomes, speculating about possible past events, etc.).

Collaborative tasks that prompt a test-taker to engage in communicative tasks replicating authentic communication in routine, non-routine and unexpected situations in air-ground communication contexts provide an ideal platform to assess interactions and the test-taker’s ability to comprehend an interlocutor and respond appropriately, simulating real-world communication. However, care should be taken to ensure that test tasks that are designed to assess speaking performance are not used to evaluate listening comprehension at the expense of including a dedicated listening comprehension part of the test. If a collaborative task is used to assess listening comprehension then this should be clearly identified and described in the test specifications and be used to provide further evidence of comprehension to support the results of the tasks in other parts of the test designed to assess just comprehension (refer to Criterion 3).

A more typical way of assessing listening is to use a dedicated listening comprehension test, comprising a number of radiotelephony-based recordings (which may be scripted and recorded to simulate real-life recordings) with prompts and response formats that allow the test to assess a test-taker’s comprehension ability and level.

In terms of authenticity, a separate listening test can replicate the passive listening skills that a pilot or an air traffic controller uses to maintain situational awareness in operational environments. A set of recordings can therefore cover a wider range of events, simulating the variety of language and communication contexts that a test-taker is likely to experience in real-world operational contexts.

Note that although authenticity is an essential requirement of an LPR test, with the test tasks providing opportunities for test-takers to communicate in real-world contexts, it is important to remember that the test tasks serve to control the conditions in which this language is produced for assessment purposes.

Conducting assessment in an on-the-job situation, such as in a simulator or in the control room, would result in high task authenticity. However, such conditions are not controlled in the way that can be achieved by a carefully designed test instrument. The type and complexity of language the pilot or controller is exposed to or needs to use in such situations is not likely to provide a sufficiently broad or representative sample of language for valid assessment purposes. Further, evaluating language in these types of uncontrolled situations does not provide the consistency needed across multiple evaluations, meaning such a testing process would lack reliability and therefore fairness. Finally, simulator sessions and on-the-job observations of language performance are not designed exclusively to assess language proficiency: pilots may need to multi-task, manage the aircraft, monitor displays, etc., and these distractions can affect language performance and negatively influence evaluations. It is therefore not recommended to assess language proficiency in simulator sessions or in on-the-job situations.

Test tasks need to be designed so that the types of responses test-takers produce can be meaningfully evaluated, in a way that demonstrates the level of proficiency to which the responses align. Test tasks therefore serve a purpose in eliciting certain types of expected responses and language so that test-takers’ abilities to cope and perform under specific communicative conditions can be evaluated.

External References

The following references are provided in support of the guidance and best practice options above.

1. Alderson, Clapham and Wall (1995)

“Many advocates of communicative language testing argue that it is important that a […] test should look like something one might do ‘in the real world’ with language” (p. 172).

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.

2. Bachman and Palmer (1996)

“To justify the use of language tests we need to be able to demonstrate that performance on language tests corresponds to language use in specific domains other than the language test itself” (p. 23).

“One aspect […] pertains to the correspondence between the characteristics of Target Language Use (TLU) tasks and those of the test task. It is this correspondence that is at the heart of authenticity. We would describe a test task whose characteristics correspond to those of the TLU tasks as relatively authentic” (p. 23).

“We consider authenticity to be an important test quality because it relates the test task to the domain of generalizations to which we want our score interpretations to generalize. Authenticity thus provides a means for investigating the extent to which score interpretations generalize beyond performance on the test to […] the TLU domain” (p. 23).

“…another reason for considering authenticity important is because of its potential effect on test-takers’ perceptions of the test and hence their performance […] in terms of their perceived relevance to a TLU domain, of the test’s topical content and the types of tasks required. It is this relevance that we believe helps promote a positive affective response […] and can thus help test-takers to perform at their best” (p. 24).

“The essence of authenticity is the degree of correspondence between the characteristics of TLU tasks and those of the test task” (p. 142).

Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.

3. Douglas (2000)

“I have defined a specific purpose language test as one in which test content and methods are derived from an analysis of a specific purpose target language use situation, so that test tasks and content are authentically representative of tasks in the target situation, allowing for an interaction between the test-taker’s language ability and specific purpose content knowledge, on the one hand, and the test tasks on the other. Such a test allows us to make inferences about a test-taker’s level of language ability with reference to a specific purpose domain” (p. 87).

Douglas, D. (2000). Assessing language for specific purposes. Cambridge, UK: Cambridge University Press.

4. Douglas (2013)

According to Jacoby and McNamara (as cited in Douglas, 2013, p. 371), “using the traditional four linguistic skills to delineate special purpose performance is inadequate to capture real-world communicative cultures and activities and that special-purpose performance is by definition task-related, context-related, specific, and local …”

“Specific purpose language needs include not only linguistic knowledge but also background knowledge relevant to the communicative context in which learners need to operate. Thus, the theoretical underpinnings of specific purpose assessment must be expanded to include not only strictly linguistic features but also features of the context of interest to test-takers and score users” (p. 371).

“Authenticity is a matter of degree related to the features of specific purpose context that are simulated in the test tasks” (p. 371).

Douglas, D. (2013). ESP and assessment. In B. Paltridge & S. Starfield (Eds.), The handbook of English for Specific Purposes (pp. 367-383). Chichester, UK: John Wiley & Sons.

5. Elder (2016)

“In LSP testing (authenticity) is paramount. Representation of the target context in a way that will engage the specific abilities required for communication in that context is seen as a critical condition for valid decision-making. It is also valuable in promoting a positive washback effect” (p. 148).

“Concerns for authenticity inform not only domain description and task design in LSP testing, but also the nature of interaction during the test encounter” (p. 149).

Elder, C. (2016). Exploring the limits of authenticity in LSP testing: The case of a specific-purpose language test for health professionals. Language Testing, 33(2), 148-152.

6. Elliott (2013)

“Authenticity of a test is an issue which pervades all considerations of validity, (lack of which) weakens generalizability of test results. Authenticity is […] a critical parameter to be considered by test developers” (p. 154).

“Tasks must be selected which form a representative sample of the types of listening the candidate would be exposed to within the TLU domain” (pp. 154-155).

Elliott, M. (2013). Cognitive Validity. In A. Geranpayeh & L. Taylor (Eds.), Examining Listening: Research and Practice in Second Language Listening – Studies in Language Testing v. 35 (pp. 152-241). Cambridge: Cambridge University Press.

7. Fulcher and Davidson (2007)

“Authenticity […] does not make test tasks automatically valid through directness: it means only that we may be able to model test-taker behavior in ways that allow us to observe the use of processes that would be used in real-world language use” (p. 154).

Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book. London and New York: Routledge.

