A number of key criteria have been identified as critical elements in the test instrument design which influence the overall effectiveness of an ICAO LPR test.

Each criterion plays a key role in how well an ICAO LPR test meets the objectives of the LPRs and how well the test performs overall in terms of validity and fairness. All the criteria are important in ICAO LPR best practice assessment.

These guidelines have been developed to help civil aviation authorities and organisations involved in the design of LPR tests recognise and understand key issues related to the design of LPR tests and their impact on overall LPR testing practices.

The criteria identified in these guidelines are core issues that shape the overall effectiveness and suitability of a testing system for the assessment of air traffic controllers and pilots for ICAO LPR licensing purposes.

Since 2003, when ICAO Document 9835 was first published, Civil Aviation Authorities and test developers have relied solely on this manual for information on how to design and develop or select tests which fulfil the ICAO LPR requirements. The intention was that test developers would base their test development on ICAO Document 9835 so that they could design, develop and administer tests for licensing purposes, and apply a common standard. An important objective of the ICAO LPRs is to achieve equivalence between different tests and between the ICAO Levels these tests award.

In an ideal situation, each State would develop/select and implement its own testing system(s) in accordance with the ICAO LPRs. The standard implemented would ensure confidence in equivalence among States. In other words, ICAO Document 9835 should provide the basis of a common framework for the implementation of tests and a means of promoting equivalence in the ICAO Levels awarded and the associated aviation English language skills and knowledge assessed by these tests. The objective was that a common and universal standard could be implemented globally, irrespective of which tests were used and in which States.

In the field of language testing, this ICAO initiative is unprecedented. There are no other examples of an attempt to develop and implement a common language standard internationally in which different tests, based on a common rating scale and set of associated descriptors, are developed and implemented independently. While ICAO Document 9835 provides a common framework for the design, development, selection and administration of tests, the manual itself serves only as guidance material and was not intended to be highly prescriptive. ICAO provided such material to allow the international aviation industry to commence developing and implementing tests which would improve proficiency standards. ICAO Document 9835 was developed to facilitate the development of tests that best met local test development capabilities and needs, yet were still based on a common standard. This has never been attempted on a global scale in any industry.

In the first ten years of the ICAO LPRs, the project has achieved its goal. States have implemented the ICAO LPRs successfully. Civil Aviation Authorities, airlines and ANSPs across the globe are now aware of the importance of language proficiency in safety and have put in place systems to assess personnel. Language proficiency is now recognised in aviation operations. Language training is being implemented in response to the ICAO initiative on a global scale and standards are improving. To this extent, ICAO Document 9835 has achieved its objective.

However, now that the LPRs have been in place since 2008, the industry needs to look ahead and work together to improve standards. This can be done by reflecting on areas where ICAO LPR implementation can be strengthened and improved. One of the most important issues is to develop a means of improving standardisation and harmonisation of testing standards.

Since the LPRs were introduced, many language proficiency tests have been developed and implemented in isolation. While they may have been developed in response to the ICAO LPRs and followed the guidance material outlined in ICAO Document 9835, differences have emerged in the interpretation of the guidance material, in testing methods and, indeed, in what constitutes an effective LPR test instrument. As a result, differences in testing standards and practices have arisen. This has introduced a number of challenges which could undermine the long-term effectiveness of the ICAO LPRs and compromise aviation safety.

These challenges include:

  • inadequate implementation of a uniform ICAO LPR standard internationally;
  • variations in what language skills and language knowledge ICAO LPR tests assess and the extent to which these relate to the language, communicative contexts and proficiency levels needed for safe aeronautical communications;
  • a lack of equivalence between the ICAO Levels issued by different tests;
  • market forces favouring the emergence and spread of inferior tests and testing practices at the expense of quality tests and testing practices;
  • a lack of confidence or even mistrust in the ICAO Levels awarded by different tests and States;
  • a large number of tests based on flawed, inconsistent or unsuitable test designs which negatively impact on language training programmes and long-term attitudes towards language proficiency and the ICAO LPRs, and which pose threats to civil aviation safety.

This lack of consistency among LPR tests is a result of wide differences in test instrument design. Further, regulatory authorities do not yet have access to a common framework to evaluate LPR tests – a key initial step required to attempt to standardise LPR testing systems.

The design of the test instrument is central to the quality and overall effectiveness of a testing system. To establish equivalence between ICAO LPR tests, and so facilitate the international harmonisation of LPR standards, ICAO LPR tests need to incorporate a number of key test instrument design elements. Identifying and highlighting these key elements, based on ICAO Doc 9835 and best practice in language assessment, provides clear parameters so that tests can have more in common in terms of the language skills and language knowledge they assess and their alignment with language assessment needs for radiotelephony communications. In simple terms, these guidelines aim to define baseline test instrument design elements that need to be included to allow effective comparisons to be made between tests. This can then facilitate international and inter-test standard setting.

Only once an ICAO LPR test instrument is well designed is it possible to develop and implement a sound and valid testing system. It is often incorrectly assumed that the quality of the raters determines the quality and effectiveness of a testing system. In fact, rating can only be effective if the test instrument is well designed, valid and effective. Rather than the rater or interlocutor, it is the test instrument itself that determines the type, range and complexity of language as well as the adequate coverage of skills and contexts for communication.

These guidelines aim to provide clarification and explanation of key issues related to test instrument design in order to limit potential confusion when interpreting ICAO Doc 9835. The guidelines provide in-depth explanations of why key criteria are critical to the effectiveness of a test. The guidelines may expand on, go further than, or narrow the scope of guidelines outlined in ICAO Doc 9835. They aim to reduce opportunities for markedly different interpretations of ICAO Doc 9835 that impact on ICAO LPR test design and result in variations between testing systems. These guidelines have been developed in response to the issues that have emerged since 2003 and that have caused such a divergence in LPR testing practices.

The guidelines are framed around, but not exclusively based on, what is contained in ICAO Doc 9835 (2010). They aim to provide a common framework to analyse, evaluate and select LPR tests. They will also assist test developers to design and develop tests in line with the ICAO LPRs together with best practice in language testing. They aim to provide a clear explanation of best practice issues in test instrument design to promote greater consistency and more uniform application of the ICAO LPR standards, all of which is reflected at the core of ICAO LPR testing systems – in the design of the test instrument.

The ICAEA Board has been involved with the ICAO LPRs since their introduction. Since 2005, ICAEA’s efforts have focused extensively on providing guidance, initially to educate the industry and, more recently, to improve LPR testing quality.

The guidelines have been carefully researched and developed by members of the ICAEA Board, supported by the ICAEA Research Group, who have many years of extensive academic and practical experience in both mainstream and ICAO LPR language testing. The ICAEA Board is committed to upholding the aims of the ICAO LPRs and best practice in language testing, while also taking into account the practical issues involved in the development and implementation of ICAO LPR tests in this relatively new field of language testing.

A language test instrument is the tool or device which is administered during a language assessment to collect information about a person’s language skills and abilities to allow these to be measured. A test instrument is made up of sets of test tasks constructed and assembled in a meaningful way linked to test content and stimuli (recordings, videos and picture prompts) so that the language level(s), language knowledge and language skills the test is designed to measure occur in a predetermined and controlled way. To people administering or taking ICAO LPR tests designed to assess speaking and listening proficiency, test instruments typically appear as the test tasks or items along with their associated test content and other stimuli. This may typically be a list of scripted questions asked by an interlocutor or a means of collecting test-taker responses and answers (e.g. on paper or a computer), or any combination of these.

The design of the test instrument is outlined in a test specifications document – the blueprint for the design of the test instrument and the development of all the test versions in the test bank. A test developer needs to develop test specifications as an important part of the early stages of a test development project. The design of the test instrument affects all aspects of the quality of a language testing system and is a fundamental requirement for best practice in language testing. The test instrument is the cornerstone of the quality of a testing system and determines the validity and fairness of a test.

In fact, it is not possible for a test to achieve a good degree of validity if the test instrument is poorly designed.

The design of an LPR test instrument is the most fundamental factor that determines the quality of a testing system. Because the design of the test influences all aspects of LPR testing, from delivery, administration, rating and scoring to the standard of results issued, these guidelines focus on issues related to the design of the test instrument.

For each criterion the following is provided:


Key Issues & Considerations

An overview of the criterion and an explanation of why it is important.


ICAO Statements & Remarks

A reference to what is mentioned in ICAO Doc 9835 (2010) on issues directly and indirectly related to the criterion, if applicable.


Why this issue is important

An explanation of the importance of the issue and how it influences the quality of a test.


Best Practice Options

An explanation of options for best practice when addressing how the criterion needs to be considered in the design of a test.


External References

A list of key references from mainstream language testing literature that provide additional explanations for why the criterion is important.

Test Instrument

The test instrument in language testing is a tool that is developed to measure a specific sample of test-takers’ language skills, knowledge and behaviour. The tool allows assessors to make inferences about test-takers’ abilities to use certain language skills. The test instrument is constructed and assembled so that all forms of the test are presented in a similar way and have the same structure. It includes all the material (content, input, tasks or items, and rubric) which test-takers engage with and complete for the purposes of the assessment. The test instrument may appear as a test paper, a set of questions and content on a computer screen, examiner questions, or tasks included in a booklet with test-taker and examiner instructions.

Validity

Validity is a judgment about how effective a test is for assessing a specific group of people for the purpose of the assessment. Validity includes considerations such as:

  • whether the test really measures what it intends to measure (construct validity);
  • the extent to which the test content is appropriate for and aligns with the test construct (content validity);
  • whether the way test results are used and the effect the results have in real-world applications align with the test’s purpose and intentions (consequential validity); and
  • whether test users perceive the test content and task-types as appropriate and a fair and effective means of assessing their language skills according to the purpose of the test (face validity).

Authenticity

Authenticity relates to how well the test content and task types reflect real-life situations. In specific purpose language testing (such as that required for the ICAO LPRs), authenticity is central to the quality of the test. The more closely the test content and task types that test-takers engage in during the test mirror real-world communication situations, the more meaningful the results, because the outcomes of the test align with real-world language needs. Therefore, authenticity affects the validity of the test in specific purpose testing.

Reliability

Reliability is a measure of a test’s consistency within a test version, across the test bank and between test administration periods or cycles. Reliability is determined by how consistent the measurement characteristics of a test are. This includes issues related to item or task difficulty (e.g. if test items are worth one mark in a listening test, then for higher reliability all other items worth one mark need to be equally difficult). Reliability also extends to how similar the results and scores generated by different versions of a test are. If a test-taker of a certain language level takes different versions of the test and the results are the same, irrespective of which version they take, the test has higher reliability. Reliability basically means there is confidence that the test leads to the same outcomes. The more reliable a test is, the more confidence we can have in the ability of the test to measure test-takers’ abilities.

Target Language Use (TLU) Domain

The TLU domain comprises the situations or contexts in which test-takers use language in real-world communication. The TLU domain needs to be factored into the design of the test instrument so that the context and type of language that is assessed is related to real-world communication situations. Ensuring the TLU domain is reflected in the design of the test tasks allows more confidence in the decisions made on the basis of the results of the test. This is because the more closely the test tasks are aligned with the real-world communication needs of test-takers, the greater the certainty that the test is measuring the type and complexity of language test-takers actually use in real-world situations.

Input

Input is the material contained in the task, which test-takers need to process in some way, and to which they are expected to respond. Input is the stimulus material which influences the type of language the test-taker is expected to produce (in speaking tasks) or what they need to understand (in listening tasks). Input may be recordings, pictures, interlocutor statements or questions, prompts in a role-play and so on.

Washback

Washback is the effect the test has on test users, including test-takers, teachers, administrators, authorities and employers. It also affects language training programmes and attitudes within organisations that are affected by the outcomes of the test. Tests with higher content validity are more likely to be valued and respected by test-takers, leading to more positive test user perceptions of the test. Similarly, tests which contain content and task types that effectively reflect a range of real communication needs lead training programmes, course curricula and teachers to provide training which develops not only the language skills required for the test but also the language skills needed for effective communication in real-world situations. We refer to this as positive washback. Negative washback occurs when language tests influence perceptions and training negatively, so that programmes and curricula focus on developing specific abilities that may help test-takers achieve a better test result but which do not lead to any useful gains in language proficiency in the real-world communication contexts which the test was designed to assess.


Enhancing validity & effectiveness

The ICAO LPR Test Design Guidelines have been developed by ICAEA to help Civil Aviation Authorities and Test Developers understand key issues related to the validity, effectiveness and fairness of test instrument design, and the effects they have on test-takers and aviation safety.



Using the Test Design Guidelines

ICAEA provided four practical Workshops in 2019 to introduce and demonstrate the use of the Guidelines to Regulators and other stakeholders. Each workshop featured a hands-on programme especially designed for personnel involved in LPR test approval, auditing and development.