Using generative AI in item development for the driving theory test: What does the future hold?


Generative artificial intelligence (AI) tools have produced both excitement and uncertainty across many industries, and the global assessments landscape is no exception.

Back in July 2021, the Association of Test Publishers (ATP) released a white paper titled ‘Artificial Intelligence and the Testing Industry: A Primer’, exploring a range of potential applications for AI in the credentialing field as well as ‘the appropriate responsible use of AI’1.

Fast forward to today, and AI is dominating media headlines around the world. An increasing number of exam content creators have started actively exploring how cutting-edge AI technologies might be used to develop test items for a range of assessments.

As the international driving test community ‘prepares drivers for smart mobility’ at the 56th CIECA Congress 2024, AI is understandably giving everyone a lot to think about, including:

  • What differences in quality between AI-generated and human-written items might we see in future item development?
  • With the integration of AI in the item writing process, what ethical or legal implications need to be considered?
  • What is the potential cost of incorporating AI into existing processes? With the level of accuracy depending largely on the quality and quantity of data, what is the cost of human effort required to refine and improve upon the generated content?

Generative AI allows users to submit written text (prompts) specifying a task.

This could include writing a multiple-choice item, crafting a scenario about driving in difficult conditions, suggesting plausible but incorrect response options, or editing existing text according to style guidelines. While AI can do these things, the obvious question is “How well?”, closely followed by “How will this fit into our existing test development processes?”

Opinions across the global assessments landscape vary widely on the potential for automatic item generation, whether using AI or other methods such as template-based approaches. While simple requests for items may produce flawed and relatively ‘low-level’ content, it is possible to get well-constructed items across a range of cognitive levels using the right prompt. Incorporating the same instructions provided to human item writers regarding format, structure, distractors, and other item elements is just as necessary for generative AI item development as for conventional item writing. Involving experts in item development and evaluation throughout the whole process is key, as is an organized and scientific approach to understanding the results generated.

In 2023, we conducted several studies looking into the quality and characteristics of items generated by popular, free-to-use AI platforms.

We created a series of prompts based on our item writing guidelines, which also included comprehensive instructions and examples of cognitive level, sample item formats, and style guidelines.
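As a rough illustration of how guideline elements might be combined into a prompt, here is a minimal Python sketch. It is not the prompt format used in the studies: the `ItemRequest` class, the `build_prompt` function, and all field names and wording are hypothetical, chosen only to show the idea of assembling topic, cognitive level, format, style guidelines, and sample items into one instruction.

```python
from dataclasses import dataclass, field


@dataclass
class ItemRequest:
    """Hypothetical container for the elements a prompt might carry."""
    topic: str
    cognitive_level: str  # e.g. "remember", "apply", "analyze"
    item_format: str = "multiple-choice with one key and three distractors"
    style_rules: list[str] = field(default_factory=list)
    sample_items: list[str] = field(default_factory=list)


def build_prompt(request: ItemRequest) -> str:
    """Assemble a plain-text prompt from the request's elements."""
    lines = [
        "You are drafting one item for a driving theory test.",
        f"Topic: {request.topic}",
        f"Cognitive level: {request.cognitive_level}",
        f"Format: {request.item_format}",
    ]
    # Optional sections are added only when content was supplied.
    if request.style_rules:
        lines.append("Style guidelines:")
        lines.extend(f"- {rule}" for rule in request.style_rules)
    if request.sample_items:
        lines.append("Sample items:")
        lines.extend(f"- {sample}" for sample in request.sample_items)
    return "\n".join(lines)
```

The resulting string would then be submitted to whichever AI platform is in use; that step is left out because it depends on the platform.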

Our research helped us to understand the current capabilities (and potential limitations) of AI for item development:

  1. We found that the creation of quality draft test items using large language models (LLMs) can reduce item writing time considerably.
  2. AI-generated items appear comparable to unedited human-written items across a range of categories, including:
    • Did the item address the appropriate topic?
    • Is the item appropriate for the exam?
    • How much editing does the item require?
    • Does the item contain factual errors?
    • Is the item key correct?
    • Are the incorrect options plausible and incorrect?
  3. The cognitive level, or type of thinking (remember, apply, analyze) required to answer the item, did not always match the requested level, for either AI or human items. Asking the AI to make an item more difficult resulted in longer, wordier items that were not perceived as being more difficult. When AI was used to generate numerous items at once from a test content area, it produced duplicate content, and the items covered topics within that content area unevenly.
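The duplicate-content and uneven-coverage issues noted in the last finding lend themselves to simple automated checks before human review. The sketch below is one possible approach, not the method used in the studies: it assumes each generated item has a text stem and a topic tag, and the function names and the 0.85 similarity threshold are illustrative assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(stems, threshold=0.85):
    """Return index pairs of stems whose similarity ratio meets the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(stems), 2):
        # Compare case-insensitively; the threshold is an assumed value.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((i, j))
    return pairs


def topic_coverage(topics):
    """Count how many items were tagged with each topic."""
    return Counter(topics)
```

Character-level similarity only catches items that are worded alike; two items testing the same point in different words would still need a reviewer to spot them.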

Further research exploring the instructions (prompts) used to generate items is underway. Studies to be presented in 2024 will explore which elements of the prompt are necessary to produce quality items, and whether quality is maintained across different professional fields. We will also investigate whether using AI saves time and delivers productivity gains. We are also working on methods for inserting certain types of content into the AI item-generation process, such as internal training documentation or relevant statutes. This could mean that in the future, countries would be able to integrate their own driving laws and ‘rules of the road’, applying country-specific regulations in the theory test item writing process.
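One simple way to picture inserting reference content into item generation is to select the passages most relevant to a topic and append them to the prompt. The sketch below assumes this approach purely for illustration; the article does not describe the actual method, and `select_passages`, `add_reference_material`, and the word-overlap scoring are hypothetical.

```python
def select_passages(topic, passages, limit=2):
    """Rank reference passages by how many topic words they share."""
    topic_words = set(topic.lower().split())
    scored = []
    for passage in passages:
        overlap = len(topic_words & set(passage.lower().split()))
        if overlap:
            scored.append((overlap, passage))
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:limit]]


def add_reference_material(prompt, passages):
    """Append the selected passages to an existing prompt."""
    if not passages:
        return prompt
    block = "\n".join(f"- {p}" for p in passages)
    return f"{prompt}\nBase the item only on these reference passages:\n{block}"
```

Word overlap is a deliberately crude relevance measure; a production system would likely need something more robust, and subject matter experts would still need to confirm that each item reflects the cited regulation.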

As high-stakes testing methods continue to evolve, questions will inevitably arise around content ownership when using generative AI and how we can prevent bias or discrimination. These concerns are not unique to the testing industry, and any industry using generative AI will have to navigate them. As the ATP white paper highlighted: ‘When inherent bias and a diverse user population are not accounted for in developing and using AI there are great risks related to bias and discrimination in outcomes’.2 As an industry, we need to collectively ensure that we use AI tools with ethical and technical oversight and that we are transparent about their application.

So what does the future look like?

  • We expect that AI will provide a range of assistive capabilities for subject matter experts and content developers.
  • AI will likely reduce the time needed to produce draft test items, sample content, and quality items.
  • AI may start to play more of a part in the editorial process, providing suggestions to focus item writing on specific areas, or be part of the feedback process.
  • Subject matter experts and experienced test developers will continue to review and verify test items with technical and ethical oversight through a human-controlled approach.
  • The integration of AI into future driving theory test formats holds immense promise, and this evolution, coupled with ethical considerations and ongoing refinement, has the potential to reshape how learner drivers are assessed.

1 ‘Artificial Intelligence and the Testing Industry: A Primer’, a Special Publication from ATP, authored by the International Privacy Subcommittee of the ATP Security Committee, July 6, 2021, p. 3

2 ‘Artificial Intelligence and the Testing Industry: A Primer’, a Special Publication from ATP, authored by the International Privacy Subcommittee of the ATP Security Committee, July 6, 2021, p. 9


About Pearson VUE

Pearson VUE has been a pioneer in the computer-based testing industry for decades, delivering more than 16 million certification and licensure exams annually in every industry from academia and admissions to IT and healthcare. We are the global leader in developing and delivering high-stakes exams via the world's most comprehensive network of nearly 20,000 highly secure test centers as well as online testing in over 180 countries. Our leadership in the assessment industry is a result of our collaborative partnerships with a broad range of clients, from leading technology firms to government and regulatory agencies. For more information, please visit PearsonVUE.com.

Media contact

Greg Forbes, Global PR & Communications Manager
+44 (0) 7824 313448
greg.forbes@pearson.com
