Test instruments need to include appropriate tasks that directly assess how test-takers use language in radiotelephony communication contexts.

Key Issues

The ICAO Language Proficiency Requirements and Rating Scale were developed to assess speaking and listening proficiency specifically for aeronautical radiotelephony communication (ICAO Doc 9835, 2010).

ICAO Annex 10, Volume II requires pilots and ATCOs to use ICAO standard phraseology specifically.

Annex 10 also states that proficiency in both phraseology and plain language is required.

In order to evaluate language proficiency in such a specific purpose domain (referred to as Language for Specific Purposes (LSP)), test tasks need to include the context and key language features that test-takers experience in real-life operations.


Authenticity is also a key requirement in assessment. This means that test instruments need to contain tasks in which test-takers are assessed on their ability to communicate in aeronautical radiotelephony situations against all six criteria in the ICAO Rating Scale. The more directly the test content and contexts relate to the real-world communicative contexts in which pilots and controllers operate, the more authentic and valid the test is. Authenticity is maximised by ensuring LPR test instruments include test tasks that mirror radiotelephony communication situations, so that test-takers are required to demonstrate their ability to communicate in radiotelephony contexts in both the speaking and listening components of the test.

What does this mean for test design?

Speaking and listening skills in air-ground communication contexts need to be directly assessed through dedicated test tasks that reflect this content, context and type of communication. A substantial component of the test needs to contain content and task types which assess how test-takers communicate in radiotelephony communication contexts – in listening (understanding what pilots or controllers are saying over the radio) and speaking (being able to communicate as pilots or air traffic controllers over the radio) components of the test. The tasks used to assess language proficiency for radiotelephony communication need to be appropriate and reflect the communicative contexts in which pilots or controllers communicate in real-world situations.

ICAO Statements & Remarks

The following statements from ICAO Document 9835 (2nd Edition, 2010) are related to this issue.

3.2.7. The sole object of ICAO language proficiency requirements is aeronautical radiotelephony communications, a specialized subcategory of aviation language corresponding to a limited portion of the language uses of only two aviation professions — controllers and flight crews. It includes ICAO standardized phraseology and the use of plain language.
4.2.5. The language proficiency requirements and Rating Scale were developed to assess speaking and listening proficiency specifically for aeronautical radiotelephony communications.
4.3.3. The need for plain language proficiency is a fundamental component of radiotelephony communications […] but it should not be interpreted as suggesting that plain language can suffice instead of ICAO standardized phraseology.
4.3.4. When plain language is required, it should be delivered in the same clear, concise and unambiguous manner as standardized phraseology in emergencies or unusual situations, to clarify or elaborate on instructions […] and should in no way be interpreted as permission to chat or otherwise ignore the formal and informal protocols that govern the use of standardized phraseology.
4.5.3. a) Proficient speakers shall communicate effectively in voice-only (telephone/radiotelephone) and in face-to-face situations. Radiotelephony communications lack the facial cues, body language and listening cues found in usual face-to-face situations. Communications without such cues are considered to be more difficult and challenging, requiring a higher degree of language proficiency than face-to-face interactions. In addition, other features of radiotelephony communications make it a unique kind of communicative event. For example, the sound quality may be poor, with distracting sounds and the communicative workload of the air traffic controller or a pilot may be heavy, with a corresponding need for efficiency and brevity. This holistic descriptor draws attention to the need for training and testing to provide voice-only settings to exercise or demonstrate language proficiency, as well as face-to-face settings that allow broader uses of language.

b) Language proficiency should not be limited to standardized phraseology and should range across a relatively broad area of work-related communicative domains.

4.5.5. The ICAO Rating Scale has a distinct aeronautical radiotelephony focus. It addresses the use of language in a work-related context [in] voice-only communications. It is vital that language testing for licensing purposes comply with best practices and address the specific requirements of aviation operations. In an aviation context, proficiency testing should establish the ability of test-takers to effectively use appropriate language in operational conditions. It is important that test-takers are evaluated in their use of language related to routine as well as unexpected or complicated situations as evidence of their level of proficiency. Because of the high stakes involved, pilots and air traffic controllers deserve to be tested in a context similar to that in which they work. Radiotelephony communications require not only the use of ICAO standardized phraseology, but also the use of plain language. A test designed to evaluate knowledge or use of standardized phraseology [only] cannot be used to assess plain language proficiency. It is acceptable that a test of plain language in a work-related context could contain a scripted test task or a prompt in which standardized phraseology is included. The test task may be used […] as a means of setting a radiotelephony context in which to elicit plain language responses from the test-taker. If phraseology is included in a test prompt, care should be taken that it is used appropriately and that it is consistent with ICAO standardized phraseology. In general, tasks that resemble real-life activities are most suitable. A narrow interpretation would aim to closely replicate radiotelephony communications, including the extent of plain language. A broad interpretation of the holistic descriptors and Rating Scale would aim to elicit plain language on various topics that are related to radiotelephony communications or aviation operations, without replicating radiotelephony communications specifically. Examples may include question and answer routines, problem-solving exchanges, briefings, simulations and role-plays. Both interpretations are valid.
The guidance in ICAO Doc 9835 (2010) thus refers to the need for test tasks (and content) to reflect the real-life activities in which communication occurs, but then allows for a broad interpretation under which test tasks (and content) need not replicate radiotelephony communications specifically.

Best practice in English for specific purposes (ESP) testing proposes that the test instrument should in fact replicate real-world communication contexts. Authenticity is a key quality of LSP test tasks and content, and without authenticity LSP tests are not appropriate for assessing test-takers’ language proficiency in the target language use situations (in the case of the ICAO LPRs, radiotelephony communications). LPR tests therefore need to include a variety of test tasks, and a substantial part of the test instrument should focus on contexts directly associated with radiotelephony communications.

When tests do not include task types containing language directly associated with radiotelephony communications, problems emerge in how effectively those tests can measure language proficiency in this very specific language and communication domain.

The mainstream and accepted approach to valid and meaningful language testing in ESP is that the test content (the language and how it is used) needs to reflect real-world communication settings. This improves the validity of the test and makes it more likely that test-takers will respect it. Tests containing contexts unrelated to radiotelephony communication are more likely to be viewed negatively by test users (test-takers, those administering the test, and those who respond to its outcomes, such as licensing authorities, airlines and ANSPs).

Further, with over a decade of international experience in implementing the ICAO LPRs, clear patterns are emerging which strongly indicate that LPR licensing tests which directly include radiotelephony communication contexts in both the listening and speaking components have more credibility with stakeholders and are more likely to have positive washback effects on training programmes. In other words, such tests promote language training which addresses the language needs of controllers and pilots for communication in the radiotelephony contexts associated with their jobs and everyday operational environment.

For these reasons, LPR tests need to contain tasks and content which directly reflect the way pilots and controllers use and hear language in radiotelephony communications, in order to improve test validity, acceptance among all stakeholders and the effect on related language training programmes.

Notwithstanding the broad interpretation permitted in ICAO Doc 9835 (allowing tasks to elicit plain language on topics related to radiotelephony communications or aviation operations, without replicating radiotelephony communications specifically), such an approach is therefore not recommended in the design of an effective ICAO LPR test instrument. The language proficiency requirements in Annex 1 specify that speaking and listening should be evaluated in the context of operational aviation communications. The holistic descriptors and Rating Scale were developed to address the specific requirements of radiotelephony communications. Tests developed for other purposes may not address the specific and unique requirements of aviation language testing. Proficiency tests that are administered directly may use face-to-face communication in some phases of the delivery but should include a component devoting time to voice-only interaction. Voice-only interaction is an important characteristic of aeronautical radiotelephony communications. The test should be specific to aviation operations and provide test-takers with opportunities to use plain language in contexts that are work-related for pilots and air traffic controllers.

The ICAO Language Proficiency Requirements (LPRs) refer to the ability to speak and understand the language used for radiotelephony communications. It is important that flight crews and air traffic controllers be proficient in the use of plain language used within the context of radiotelephony communications in order to communicate safely on any operational issue that may arise.

ICAO language provisions require proficiency in the use of standardized phraseology and in the use of plain language. The assessment of standardized phraseology is an operational activity, not a language proficiency assessment activity. While an aviation language test may include phraseology to introduce a discussion topic or make interaction meaningful to the test taker, it is important that tests elicit a broad range of plain language and not be limited to tasks that require standardized phraseology.

Why this issue is important

In cases where test developers claim to assess language proficiency for communication in radiotelephony but the test fails to include content and task types directly related to communication in radiotelephony, the validity of inferences based on test results is questionable. How confident could we be, for example, that pilots or controllers who have taken this test are able to use the necessary language functions, vocabulary and structures to explain or describe problems and their intentions and interact with and understand ATC over the radio in non-routine situations? And how confident would we be that controllers who have taken a test could understand and interact with pilots in similar situations? This in fact is the exact intention of the ICAO LPRs: ensuring tests assess language proficiency for successful radiotelephony communication between pilots and ATCOs.

A simple analogy: suppose there is a requirement that doctors’ language proficiency be assessed, aimed at how well doctors communicate with patients to diagnose illnesses, make evaluations and explain possible treatments, yet the test instruments provided do not directly assess how doctors interact and communicate with patients. How authentic would such tests be, and how confident could we be that they indeed measure the skills needed to meet the requirement? Test instruments which include task types that directly assess test-takers’ abilities to talk with patients (e.g. in role-play situations) and to comprehend doctor-patient consultations (in listening test sections) would have authenticity, assess language use in the target use situations (doctor-patient conversations in a medical clinic or hospital) and allow valid score-based inferences.

In the context of the LPRs this means that test instruments need to contain task types which require pilots and ATCOs to communicate in situations that mirror situations requiring the use of radiotelephony communications.

In the case of the ICAO LPRs this means that at least some parts of the test instrument need to include test tasks that provide the context and content associated with radiotelephony communications for the assessment of language proficiency in aeronautical radiotelephony communications. Including authentic task content and contexts is a key element of all LSP language testing, not only for aviation. This differentiates the test from more general purpose language testing.

A key issue in LSP test design is identifying the communicative needs of test-takers, as well as the contexts in which they communicate and use language. For the ICAO LPRs this means recognising the communication needs of pilots and controllers in radiotelephony communication contexts when more complex plain language is needed. The context for communication is determined by the task type. Test tasks need to elicit plain language alongside phraseology in order to assess a test-taker’s ability to communicate and express ideas in a range of routine and non-routine situations.

Standard phraseology and plain language are critical content features of radiotelephony communication. It therefore follows that, in order to provide a valid instrument for assessing the language proficiency of pilots or controllers in this LSP domain, both types of language need to be included in the content associated with test tasks.

However, as the ICAO LPRs focus on assessing proficiency in plain language, test tasks need to provide contexts in which test-takers switch from using phraseology to plain language (usually in non-routine situations). It is this plain language which should be the focus of the assessment. Phraseology can be seen as the scaffolding within the test tasks that allows plain language to be used in authentic contexts.

Only by this method of replicating authentic real-life communication can it be determined that a test-taker has the required language proficiency to communicate successfully in radiotelephony communication contexts.

The purpose for testing language proficiency in radiotelephony communications is to be as confident as possible that pilots or controllers can communicate with a minimum level of proficiency for safe, efficient and effective communications.

It follows that the test instruments used to assess the language proficiency of pilots or controllers need to mirror as closely as possible the real-life communicative contexts (that is, over the radio) in controlled testing situations (controlled so that the type and complexity of language test-takers produce is determined by the task design and can therefore be evaluated against a predetermined expected response). This applies to both speaking and listening contexts. It would be difficult to exclude phraseology entirely from the test instrument, as it is needed to support the eliciting or presentation of plain language in radiotelephony communications. However, assessment of proficiency in phraseology itself should be avoided.

In order to make the most accurate inferences possible about a pilot or controller’s language proficiency, test-takers need to engage in tasks similar to those we want them to perform in real life. It is not sufficient simply to let them talk about the subject matter, as this does not allow an appropriate evaluation of the skills required to communicate in real-world radiotelephony communication situations.

We know that the language used in real-life air-ground communication can be classified as:

1. standard phraseology, normally used for routine and expected situations;
2. plain language, normally used for non-routine and unexpected situations, where phraseology alone does not suffice.

Standard phraseology is clearly defined in ICAO Doc 9432 and so it is relatively easy to produce phraseology scripts for test tasks.

It should also be noted that the majority of interactions in this context are voice-only. Certain plain language functions are therefore vital for effective communication in this context and should be incorporated in test tasks.

Radiotelephony communication between pilots and ATCOs can be seen as a mix of standard phraseology and plain language. Effective test instruments for assessing language proficiency in this domain therefore need to incorporate both elements in the design of their test tasks.


Best Practice Options

Test developers need to consider the following points when designing test instruments.

LPR tests need to include a range of test tasks in both the listening and speaking components which focus on assessing language proficiency associated with radiotelephony communications. Examples of appropriate test tasks include having test-takers participate in role-play situations, so that their ability to communicate with either pilots or controllers can be evaluated, and having test-takers listen to audio recordings of radiotelephony transmissions (authentic or simulated) and answer prescribed questions in the listening comprehension parts of the test.

Test tasks need to provide test-takers with opportunities to participate in co-constructed, extended and interactive communication with an interlocutor. Such tasks may require the test-taker to participate in a role-play as a controller or pilot and to share information, explain, resolve issues and negotiate meaning in order to reach an outcome, similar to the way in which communication in real-world aeronautical situations occurs (refer to Criterion 5). Test tasks which are limited to requiring test-takers merely to paraphrase information in transmissions or produce short, isolated readbacks are not sufficient for assessment purposes. Such tasks are limited in their ability to assess test-takers’ skills in producing spontaneous and creative language and in managing the types of communicative contexts they are exposed to in operational or real-world situations.

The inclusion of test tasks which do not directly assess language proficiency for radiotelephony communications is acceptable, provided this is not at the expense of inclusion of radiotelephony-related tasks. In other words, at least some test tasks need to assess language directly related to radiotelephony communication contexts.



External References

The following references are provided in support of the guidance and best practice options above.

1. Douglas (2000)

“Authenticity of task means that the LSP test tasks should share critical features of tasks in the target language use situation of interest to test-takers … to increase the likelihood that the test-taker will carry out the test task in the same way as the task would be carried out in the actual target situation” (p. 2).

“It is not enough merely to give test-takers topics relevant to the field they are studying or working in: the material the test is based on must engage test-takers in a task in which both language ability and knowledge of the field interact with the test content in a way which is similar to the target language use situation” (p. 6). The test task must therefore be authentic.

“If we wish to interpret a person’s test performance as evidence of language ability in a specific language use situation, we must engage the test-taker in tasks which are authentically representative of that situation” (p. 7).

“A second reason for preferring specific purpose language tests over more general ones is that technical language – that used in … air traffic control … – has specific characteristics that people who work in the field must control” (p. 7).

“…it is only by taking note of the features of the target situations and comparing them with those of the test task, that we can make that inference (on performance) with any certainty” (p. 12).

“It is this analysis of target language use characteristics which will allow us to make inferences about language ability in [a] specific purpose domain” (p. 14).

“The interaction between ability and task characteristics leads to authenticity … the extent to which the test does in fact engage the test-takers in task characteristics of the target language use situation” (p. 14).

Douglas (2000) gives an in-depth and systematic framework for analysis of target language and task characteristics to help in identifying test-task content and characteristics (pp. 103-108).

Douglas, D. (2000). Assessing languages for specific purposes. Cambridge: Cambridge University Press.

2. Moder and Halleck (2009)

The authors investigated the variation in oral proficiency demonstrated by 14 ATCOs across two types of testing tasks: work-related radiotelephony-based tasks and non-specific English tasks on aviation topics (common occurrence and less expected occurrence). The results demonstrate significant differences in the performance of test-takers across task types with respect to the established minimum required proficiency, Operational Level 4. “Of greater concern from a public safety perspective is the finding that some controllers performed at Operational level on one of the general description in the aviation context tasks and failed to demonstrate minimum proficiency on the radiotelephony tasks. In such a case, the general aviation task would have inaccurately predicted the controller’s performance level on a critical workplace task” (p. 13).

Moder, C. L. and Halleck, G. B. (2009). Planes, politics and oral proficiency: Testing international air traffic controllers. Australian Review of Applied Linguistics, 32(3), 25.1-25.16. DOI 10.2104/aral0925

3. Canale and Swain (1980)

“… assessment instruments must be designed so as to address not only communicative competence but also communicative performance, i.e. the actual demonstration of this knowledge in real second language situations and for authentic communication purposes” (p. 6).

Canale, M., and Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1-47.

4. Bachman and Palmer (1996)

“Test performance must correspond in demonstrable ways to language use in non-test situations.” (p.9)

“The ability to describe the characteristics of language use tasks and test tasks is critical in order to demonstrate how performance on a given language test is related to language use in specific situations, other than the test itself.” (p.43)

“In language testing our primary purpose is to make inferences about test takers’ ability … in those domains in which the test takers are most likely to need to use the language.” (p. 44)

“The key to designing tests is … to include in the test, tasks whose distinguishing characteristics correspond to those of (real-world) tasks.” (p.45)

Bachman, L. F., and Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.

5. Moder (2013)

“In English for specific purposes teaching and testing, the task of the curriculum or test designer is to mirror as accurately as possible the language, tasks, and contexts of the target language situation”. (p. 238)

“It is essential to include both routine and unexpected radiotelephony tasks in Aviation English tests, making use of representative authentic combinations of phraseology and plain language.” (p. 239)

Moder, C. L. (2013). Aviation English. In B. Paltridge & S. Starfield (Eds.), The Handbook of English for Specific Purposes (pp. 227-242). Chichester: John Wiley & Sons.

6. Field (2013)

“Cognitive validity is to be understood as the extent to which the test tasks succeed in eliciting from candidates a set of processes which resemble those employed … in a real-world listening event” (p.77) … “and is particularly important in professional contexts.” (p.78)

Field, J. (2013). Cognitive validity. In A. Geranpayeh & L. Taylor (Eds.), Examining Listening (Studies in Language Testing) (pp. 77-151). Cambridge: Cambridge University Press.

7. Yan (2009)

“Valid language testing in aviation should reflect the real work domain as much as possible. That is, the characteristics of authentic language use between pilots and ATCs should be incorporated in to aviation language testing.” (p.80)

“Valid English language tests should reflect and stress the performance of a test taker in naturalistic aviation communication contexts instead of assessing structural and linguistic elements (in isolation).” (p.80)

“A valid test should be able to elicit the language use of pilots and ATCs based on the kinds of situations that are common to the industry of aviation.”  (p.80)

“The environment for administering language testing should also be as authentic as what pilots and ATCOs actually meet in their real aviation situations.” (p.81)

“Valid language testing should incorporate the element of real time. In aviation situations there is a considerable variety of technical phrases as well as general English terms that are required in response to commands.” (p.82)

Yan, R. (2009) Assessing English Language Proficiency in International Aviation. Saarbrücken: VDM.

8. Moder and Halleck (2009)

“The communication requirements of pilot-air traffic controller communication overlap little with those found in general English or other English for Specific Purpose contexts. ATC communication is highly distinctive in structure and discourse organization. For that reason, the ability to narrate or describe events outside the ATC domain does not show a direct relationship to workplace communication. The ICAO guidelines require aviation professionals to be able to deal with unexpected circumstances.” (p.25.13)

“What is critically required … is the ability to assess the accuracy of the information conveyed and understood, to ask questions or make statements that clarify the details of the [situation], to interact and negotiate meaning effectively. In [Pilot/] ATC communication this is usually done with a combination of phraseology and plain language … Tasks set outside of the ATC communication setting are unlikely to accurately assess the ability of aviation professionals to effectively combine these two modes of communication. Furthermore, one-way tasks that require the test-taker to listen to a general English prompt and provide a one-way response will not provide adequate opportunities to assess critical interactional competencies.” (p.25.13)

Moder, C. L. & Halleck, G. B. (2009). Planes, politics and oral proficiency: Testing international air traffic controllers. Australian Review of Applied Linguistics, 32(3), 25.1-25.16. DOI 10.2104/aral0925.

