TDG Criteria & Evaluation Tool

Guidelines and explanations to evaluate the design of ICAO LPR Tests

ICAO LPR TEST DESIGN CRITERIA

CRITERION 1

Test instruments need to include appropriate tasks that directly assess how test-takers use language in radiotelephony communication contexts.

Why is this important?

The ICAO LPRs refer to communication between pilots and air traffic controllers. Assessing proficiency for radiotelephony communication is therefore central to the ICAO LPRs. Radiotelephony communication is the primary focus of the safety-critical communication contexts which the ICAO LPRs aim to address.

What does this mean for test design?

Speaking and listening skills in air-ground communication contexts need to be directly assessed through dedicated test tasks that reflect the content, context and type of this communication. A substantial component of the test needs to contain content and task types which assess how test-takers communicate in radiotelephony contexts, in both the listening component (understanding what pilots or controllers are saying over the radio) and the speaking component (being able to communicate as pilots or air traffic controllers over the radio). The tasks used to assess language proficiency for radiotelephony communication need to be appropriate and to reflect the communicative contexts in which pilots or controllers communicate in real-world situations.

CRITERION 2

Separate test instruments need to be designed for pilots and air traffic controllers.

Why is this important?

The communication contexts and language needs of pilots and air traffic controllers differ. These differences need to be reflected in different forms of the test, each catering to the needs of one profession.

What does this mean for test design?

The structure of test instruments for pilots and for air traffic controllers may look similar; however, the content and task requirements need to differ and reflect the language needs and communicative contexts associated with each profession.

CRITERION 3

Test instruments need to contain tasks dedicated to assessing listening comprehension, separate from tasks designed to assess speaking performance.

Why is this important?

Listening comprehension represents at least half the communicative load in aeronautical communication. Proficiency in comprehension is determined by a range of different cognitive skills, language skills and knowledge. All of these attributes are internal and cannot be directly observed for assessment purposes. In contrast, speaking skills can be assessed directly by observing speaking performance. Proficiency in listening comprehension is therefore best assessed in contexts which are not affected by speaking ability: decisions based on what test-takers say may reflect their speaking skills more than their comprehension proficiency.

What does this mean for test design?

Tests need to contain sections and parts which are designed to assess only listening comprehension. This means test-takers are required to listen to prescribed recordings and then complete follow-up comprehension tasks. Such tasks could be completed on paper, or could require test-takers to summarise information or to answer prescribed questions that are asked orally or presented on a test paper or computer screen.

Tests may also evaluate comprehension subjectively in an interactive context, but only in addition to dedicated listening comprehension sections, never in place of them. In such situations the subjective ratings should be used to support the results of the dedicated listening sections.

CRITERION 4

Test instruments need to comprise distinct sections with a range of appropriate test task types.

Why is this important?

Tests need to comprise different sections with different assessment purposes – assessing different skills/levels in a range of communication contexts. Tests which contain a range of different test tasks provide more opportunities to effectively sample the range, complexity and type of communication pilots or air traffic controllers may face. This improves both the fairness and the effectiveness of the test, including the validity of the interpretations made of test results and how these are used. Tests which do not include enough variety are not able to effectively sample test-takers’ abilities to engage with the language or communicate in different situations. This undermines the test’s overall validity. Such tests can also unfairly disadvantage test-takers who are less familiar or comfortable with certain test tasks which may dominate a test.

What does this mean for test design?

A variety of different task types, items, situations and content needs to be included throughout the test instrument to ensure the domain and range of language proficiency levels are effectively sampled.

CRITERION 5

Test instruments need to include test tasks that allow test-takers to engage in interactive and extended communication.

Why is this important?

Aeronautical radio communication involves pilots and controllers communicating in interactive situations: responding to issues, enquiring, solving problems, providing advice, etc. In all such communication, each participant is required to engage in topics, negotiate meaning and participate in a shared communicative context which develops as a result of the interaction.

What does this mean for test design?

At least some speaking tasks need to provide opportunities for test-takers to participate in interactive communication with a trained interlocutor, i.e., tasks which require the test-taker to contribute to a co-constructed dialogue in the same way that communication occurs in real-world aeronautical contexts. Test tasks which are limited to test-takers responding to isolated questions or disconnected prompts do not allow interactive skills to be evaluated and do not reflect real-world communications. They are therefore not authentic, and authenticity is a key requirement of proficiency testing. Note that in interactive speaking tasks, comprehension should either not be assessed at all or be rated only as a subjective impression that confirms or supports the results of the part of the test dedicated to assessing comprehension separately (see Criterion 3).

CRITERION 6

Test instruments need to include tasks and items which allow the assessment to differentiate between ICAO language proficiency levels.

Why is this important?

Content and task types which target the proficiency levels and skills associated with each of the ICAO levels the test aims to assess need to be incorporated into the test.

What does this mean for test design?

Parts and components of tests need to be accessible and achievable for test-takers at different levels, with some parts/tasks/items catering to test-takers with lower proficiency levels (below ICAO Level 4), and other parts for ICAO Level 4 (and above). If the test claims to be able to assess ICAO Level 5 or Level 6, different tasks/items or test parts need to be dedicated to assessing these higher levels and their associated competencies (as reflected in the ICAO LPR Rating Scale).

CRITERION 7

Test instruments need to contain appropriate tasks that assess test-takers’ abilities to understand and communicate in real-world contexts.

Why is this important?

For the test to allow valid assessment decisions about how well test-takers are able to communicate in their jobs as pilots or air traffic controllers, its content and task requirements need to support that evaluation. The more closely the test reflects the communicative requirements of the real-world contexts which pilots or air traffic controllers face, the more meaningful the test results are. Test tasks which require test-takers to communicate or use language in ways not directly associated with how they communicate in real-world situations do not allow meaningful assessment decisions to be made about how well they can communicate in their jobs.

In high-stakes testing, test-takers respect tests and their results when the tests mirror the test-takers’ real-world communication needs.

What does this mean for test design?

Components of the test need to contain tasks and content that mirror the kind of communication settings and contexts associated with real-world situations that pilots and air traffic controllers may face, both in radiotelephony communication and other job-related communication contexts. The more directly the test tasks mirror real-world communication contexts, including the type of language and how this is used, the more effective a test instrument is in allowing valid interpretations to be made about test performances and test scores.

CRITERION 8

Test instruments need to have a sufficient number of equivalent versions, with each version of the test representing the test instrument in the same way.

Why is this important?

Tests which do not draw on a sufficient bank of test versions lack security. Test-takers in a target population can become familiar with test content and therefore prepare and rehearse answers and responses for the test. In such cases, the test is not able to accurately assess test-takers’ real overall proficiency, because test-takers may appear to perform at levels above their ‘real’ proficiency level.

Test banks also need to comprise equivalent test versions. This means that test-takers should receive similar results whichever version of the test they take. If the test bank includes versions which are easier than other versions, the test and its results are not reliable and therefore the overall testing system is not effective.

What does this mean for test development?

Tests need to draw on a test bank in which each version of the test has aspects that are unique to that version. Test developers need to ensure that each version of the test is written to a set of specifications so that all test versions are parallel and more or less equivalent in their level of difficulty and in the range of language and communicative contexts that they assess.

The larger the test-taker population and the more often they need to be tested, the larger the test bank must be.

TEST INSTRUMENT EVALUATION TOOL

If a test instrument meets all 8 TDG Criteria it is likely to be effective and suitable for making ICAO LPR assessments.
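
The rule above can be sketched as a simple checklist. This is an illustrative sketch only, not the evaluation tool itself: the `CRITERIA` labels are paraphrased from the criterion statements above, the `evaluate` function is hypothetical, and treating each criterion as simply met or not met is a simplifying assumption.

```python
from __future__ import annotations

# Short labels for the eight TDG criteria, paraphrased from the statements above.
CRITERIA = {
    1: "Tasks assess language use in radiotelephony contexts",
    2: "Separate test instruments for pilots and air traffic controllers",
    3: "Dedicated tasks to assess listening comprehension",
    4: "Distinct sections with a range of task types",
    5: "Interactive and extended communication",
    6: "Tasks and items differentiate between ICAO levels",
    7: "Tasks reflect real-world communication contexts",
    8: "Sufficient number of equivalent test versions",
}


def evaluate(results: dict[int, bool]) -> tuple[bool, list[int]]:
    """Return (meets_all, unmet) for a hypothetical met/not-met checklist.

    A criterion that is missing from `results` is counted as not met.
    """
    unmet = [n for n in sorted(CRITERIA) if not results.get(n, False)]
    return (not unmet, unmet)


if __name__ == "__main__":
    # Example: an instrument with no dedicated listening section (Criterion 3).
    answers = {n: True for n in CRITERIA}
    answers[3] = False
    meets_all, unmet = evaluate(answers)
    for n in unmet:
        print(f"Criterion {n} not met: {CRITERIA[n]}")
    print("Meets all 8 TDG Criteria" if meets_all else "Does not meet all 8 TDG Criteria")
```

Counting an unanswered criterion as not met is a deliberately conservative choice in this sketch, so that an incomplete evaluation cannot be read as a pass.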
