Test instruments need to contain tasks dedicated to assessing listening comprehension, separate from tasks designed to assess speaking performance.

Key Issues

Tests need to include a separate component specifically designed to assess listening comprehension. Comprehension represents half the linguistic workload in aviation communications, and a pilot's or controller's failure to understand air-ground communications poses a serious threat to communicative success.


The participation of the listener in a communicative situation may vary considerably, with different degrees of collaboration or interaction. It is particularly important to assess listening comprehension in the form of non-collaborative listening: the process of understanding what speakers mean in non-interactive situations. Interactive listening skills are also important, but the ability to perform the listener's role in a collaborative situation has generally been tested as a speaking skill, usually in interactive assessments such as oral interviews or role-plays.

What does this mean for test design?

Tests need to contain sections or parts designed to assess only listening comprehension. Test-takers listen to prescribed recordings and then complete follow-up comprehension tasks. Such tasks could be completed on paper, could require test-takers to summarise information, or could require them to answer prescribed questions, either asked orally or presented on a test paper or computer screen.

Tests may also evaluate comprehension subjectively in an interactive context, but only in addition to a dedicated listening comprehension section, never instead of one. In such cases, the subjective ratings should be used to support the results of the dedicated listening section.

ICAO Statements & Remarks

The following statements from ICAO Document 9835 (2nd Edition, 2010) are related to this issue.
(e) Comprehension: This skill addresses the ability to recognize and understand speech. Development of this skill will result in decreasing difficulty when dealing with complex discourse, with unexpected or unfamiliar topics, unfamiliar accents or delivery styles and with unfavourable conditions of reception (due to background noise, etc.). Proficiency in comprehension can be characterized by the degree of detail and speed of understanding. The learning processes involved in the development of comprehension are:
1) mastery of other subskills;
2) progression from simplified to natural speech;
3) graduated listening tasks (word recognition, overall meaning, complex meanings, inferences).
4.6.6. While comprehension is only one out of six skills in the Rating Scale, it represents half of the linguistic workload in spoken communications. If comprehension is assessed through a specific listening section with individual items, it should not be done to the detriment of assessing interaction.

What it means. Some general language tests evaluate comprehension during an oral interaction such as a conversation, interview or role-play. Other general language tests evaluate comprehension separately, in some cases via a series of individual listening items. An example of an individual listening item, in the aviation language context, might require a test-taker to listen to a pre-recorded conversation between ATC and a flight crew to identify relevant pieces of information.

Why it is important. A separate listening test can provide information about comprehension independent of a person’s ability to interact. In such tests, the communication is one-way, and the test-taker does not have to participate in the way that is required by a conversation, role-play or other interaction.

Additional information. It is important for test developers to justify their approach to evaluating comprehension.

Why this issue is important

Comprehension draws on different language processing and cognitive skills from speaking and therefore requires test tasks designed to evaluate it separately. Assessing comprehension in test tasks developed primarily to elicit speaking performance prevents comprehension skills from being isolated, making it difficult to separate the assessment of proficiency in comprehension from other speaking-related skills. When comprehension is evaluated in dedicated tasks, scores on those tasks are less likely to be affected by speaking ability and so do not overlap with the skills associated with speaking.

Assessing comprehension at the same time as speaking compromises the validity of the result for comprehension.

Evaluating comprehension only in contexts where test-takers participate in an oral interview can be problematic when the assessment is based on what the test-taker is able to produce (i.e. say). In such cases, caution is needed, as judgements about comprehension ability may be influenced by productive skills, such as the use of grammar and vocabulary, or by pronunciation. A test-taker may comprehend spoken language adequately but be unable to interact or respond effectively because of weaker productive skills, such as knowing and retrieving vocabulary and grammatical structures.

It is also possible that a test-taker comprehends the input in a collaborative task but that his/her pronunciation interferes with the assessors' ability to judge that comprehension. If the assessors cannot adequately understand how the test-taker responds, this may result in a lower rating for comprehension, even though the test-taker understood the input and responded correctly. Test developers need to ensure that ability in other skills does not unfairly influence the comprehension results in such cases.


Best Practice Options

Test developers should consider the following points when designing test instruments.

Acceptable Options

Option 1:

Assess comprehension through a separate part of the test in which test-takers listen to audio recordings and their comprehension of specified communicative language is assessed. Test-takers answer item-based questions, and each response is scored as correct or incorrect. Results are based on the number of correct items, with score bands that equate to ICAO levels for comprehension.

Option 2:

Assess comprehension through a dedicated part of the test in which test-takers listen to audio recordings and their comprehension of specified communicative language is assessed. The interlocutor asks prescribed, scripted questions designed to assess how well the test-taker comprehends each recording, or parts of it. The test-taker's responses are evaluated against predicted responses that equate to ICAO levels for comprehension.

Acceptable Task Types

Test-takers listen to one or more audio recordings and are presented with one or more of the following task types:

  • Select short responses which best reflect the answers to questions based on the main ideas or specific details in the audio (typically multiple choice with four options, one of which is the correct answer) while the audio is played.
  • Select the option which best completes a summary sentence based on either the main ideas or specific details in the audio (typically multiple choice with four options, one of which correctly completes the statement) while the audio is played.
  • Add words and/or short phrases to complete a summary table of information contained in the audio while the audio is played.
  • Answer scripted questions which test-takers read before or while the audio recordings are played, and which an interlocutor then asks once the recordings have finished.
  • Answer summative or holistic questions an interlocutor asks after recordings are played which assess global comprehension skills (as opposed to details which may be difficult to recall later).

Not Acceptable Task Types

1. Dictation test tasks in any form
The ability to write down what is heard (including simple word recognition) does not necessarily indicate that meaning has been comprehended. Other skills also influence the ability to transcribe audible language.
2. Cloze tasks, including requiring test-takers to add words or phrases to complete scripted sentences based on audio recordings
There are many aspects to comprehension. These task types assess only a very narrow component of comprehension (e.g. simple word recognition) and, if overused in a test instrument, mean that the test fails to assess other components of comprehension fully. Further, a test-taker may complete a cloze task accurately by recognising the sounds they hear and transcribing them into a plausible answer, without understanding the meaning of the transcribed word(s) or the broader meaning.
3. Selecting ‘True’, ‘False’ or ‘Not Given’, based on whether a statement accurately reflects the meaning or ideas contained in audio recordings
While this task type can be useful in language training, each item offers only a small number of possible answers, so test-takers have a substantial chance of answering correctly by guessing (one in three with three options, one in two for True/False). When the level is determined by the number of correct items, this guessing factor reduces the validity of the task as a measure of comprehension.
4. Placing statements or pictures in a correct sequence which best reflects the order of information contained in audio recordings
While this can be useful in language training, this kind of task is problematic because placing one item in the wrong position can displace others and lower the overall score, even though most of the sequence may still be broadly correct. In such cases, the test-taker's low score does not reflect their actual ability to complete the task.
5. Labelling or matching information (e.g. words or pictures) with a large number of items
While this can be useful in language training, these kinds of tasks can be problematic in testing because the items within the task are interdependent. In simple terms, if one item is labelled or matched incorrectly, this has a knock-on effect, causing one or more other items to be labelled or matched incorrectly as well. Task types with interconnected items should be avoided where a large proportion of the items belong to a single task and therefore affect the overall score.
6. Reading back ATC transmissions
The ability to repeat what is heard in the form of a readback does not necessarily indicate that meaning has been comprehended. Unfamiliar vocabulary, for example, could be repeated without test-takers understanding the overall meaning.
7. Listening to extended audio recordings then summarising the ideas
Such tasks are influenced by the ability to memorise and then recall information, which is separate from comprehension skills. Further, test-takers may be able to repeat chunks of a recording without necessarily having comprehended the message or meaning.
8. Listening to short radio transmissions and summarising the intent or meaning
Test-takers may be able to repeat what they have heard without necessarily having comprehended the message or meaning.
9. Reading long and/or dense text or questions associated with the audio (e.g. long multiple choice options)
Requiring test-takers to read complex or long texts in tasks designed to assess comprehension may inadvertently assess reading proficiency, reducing the effectiveness of the task in assessing listening comprehension alone.
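The guessing problem in item 3 and the knock-on effect in items 4 and 5 can be made concrete with simple arithmetic. The Python sketch below is illustrative only: the 10-item section, the pass mark of 7 and the five-picture sequence are hypothetical values, not ICAO cut-scores, and the guessing model assumes blind guessing with every option equally likely and independent items.

```python
from math import comb


def p_pass_by_guessing(n_items: int, n_options: int, pass_mark: int) -> float:
    """Probability of getting at least `pass_mark` of `n_items` correct by blind guessing.

    Assumes each item has `n_options` equally likely answers and that guesses
    on different items are independent (a binomial model).
    """
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(pass_mark, n_items + 1)
    )


def positional_score(key: list[str], response: list[str]) -> int:
    """Score a sequencing task by counting items placed in exactly the right position."""
    return sum(k == r for k, r in zip(key, response))


def adjacent_pairs_kept(key: list[str], response: list[str]) -> int:
    """Count neighbouring pairs in the response that also appear, in the same order, in the key."""
    key_pairs = set(zip(key, key[1:]))
    return sum(pair in key_pairs for pair in zip(response, response[1:]))


if __name__ == "__main__":
    # Hypothetical section: 10 items with a hypothetical pass mark of 7 correct.
    formats = [
        ("True/False", 2),
        ("True/False/Not Given", 3),
        ("Four-option multiple choice", 4),
    ]
    for label, options in formats:
        print(f"{label}: {p_pass_by_guessing(10, options, 7):.3f}")

    # Hypothetical five-picture sequence in which only picture E is misplaced.
    key = ["A", "B", "C", "D", "E"]
    response = ["E", "A", "B", "C", "D"]
    print(positional_score(key, response))     # 0: no picture is in its exact position
    print(adjacent_pairs_kept(key, response))  # 3: A-B, B-C and C-D are still in order
```

Under these assumptions, blind guessing reaches the hypothetical pass mark about 17% of the time with True/False items, about 2% of the time with three options and well under 1% with four-option multiple choice. In the sequencing example, moving a single picture leaves no picture in its exact position under position-by-position scoring, even though three of the four neighbouring pairs are still in the correct order.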


External References

The following references are provided in support of the guidance and best practice options above.

1. Buck (2001)

“I concluded that listening comprehension is an active process of constructing meaning, and that this is done by applying knowledge to the incoming sound. I further concluded that comprehension is affected by a wide range of variables, and that potentially any characteristic of the speaker, the situation or the listener can affect the comprehension of the message.” (p.31)

The listening construct
“The first point is that listening is a multi-faceted process, with a large number of sub-components that can be viewed from a number of different perspectives. […] we need to test both the ability to extract basic linguistic information, and the ability to interpret that in terms of some broader context. We need linguistic processing, which should include phonology, stress and intonation, word meanings, syntax and discourse features. We also need interpretation in terms of the co-text, the context of the situation and world knowledge, which would include summarizing, making inferences, understanding sociolinguistic implications, following the rhetorical structure and understanding the speaker’s communicative intent.” (p.59)

Buck (2001) explores three approaches to assessing listening – discrete-point tests, integrative tests, communicative tests – and the ideas associated with them. He explains that “[…] in many ways it is possible to see these ideas as representing the development of an expanding view of the listening construct: from the narrow view of listening as recognizing elements, through listening as language processing, to the more current idea of listening as interpreting meaning in terms of a communicative context.” (p.93).

“Given the lack of context and no obvious communicative situation, dictation does not seem to require the ability to understand inferred meanings, or to relate the literal meaning to a wider communicative context. Dictation operationalises listening in the narrower of the two-level view of listening. Dictation also clearly tests more besides listening: it requires good short-term memory as well as writing ability, and it seems fair to say that it is far more than a test of listening skills.” (p.78)

Cloze task
“…in a listening cloze, filling some gaps may test listening in the narrower sense of understanding clearly stated information, whereas other gaps may only require word recognition skills.” (p.73)

Sentence-repetition task (readback)
“Although sentence repetition tasks work through listening, they require more than just listening skills. If the sentence is short and can be repeated immediately, it might test no more than the ability to recognize and repeat sounds, and this may not require processing of the meaning at all. […] If the sentence gets a little longer, or the delay between listening and repeating increases, the task will begin to test working memory. As the sentences get even longer, it seems likely that chunking ability and the ability to deal with reduced redundancy will begin to become more important, and, as with dictation, these are closely related to general linguistic competence. As listening tasks they require no more than understanding short sections of decontextualised language on a literal level.” (p.79).

Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.

2. Green (2017)

Green outlines the difficulties associated with several listening test task types.

“While it is possible to use sequencing as a test method in listening, for example, test-takers could be asked to put a series of pictures into the correct order according to the content of the sound file, care must be taken to minimise the role of memory and recall in task completion as this may involve construct irrelevant variance (Field, 2013). The number of items in the sequence would therefore need to be limited and this might in turn lead to guessing” (p. 95).

“True/false tasks can also be used for listening, but the obvious problem is that test-takers have a 50:50 chance of answering the item correctly by guessing. While this might be acceptable as part of a low-stakes test, in a high-stakes one it is not something to be recommended” (p. 95).

Green, R. (2017). Designing listening tests: A practical approach. Basingstoke: Palgrave Macmillan.


CRITERION 1 Use of language in radiotelephony contexts
CRITERION 2 Different tests for pilots and air traffic controllers
THIS PAGE Dedicated tasks to assess listening comprehension
CRITERION 4 Distinct sections with appropriate task types
CRITERION 5 Engage in interactive and extended communication
CRITERION 6 Tasks and items can differentiate between levels
CRITERION 7 Assess abilities for real-world communications
CRITERION 8 A sufficient number of equivalent test versions