AI agents in VET: A shortcut to non-compliance?

Introduction

After recently reviewing a suite of VET training and assessment materials purchased from a well-known commercial supplier, I published an article titled ‘Human versus AI: The future of assessment design’.

The resources I had reviewed were disappointingly unfit for purpose. I identified several critical issues, including:

  • Overly complex numbering and an excessive number of fragmented documents made navigation difficult.
  • The content was cluttered with unnecessary instructions and jargon that are neither learner-friendly nor used in actual workplaces.
  • The training and assessment materials lacked detail and felt like generic templates rather than materials tailored to the unit of competency.

The overall quality was bland and disconnected. This is highly characteristic of AI-generated content. I later confirmed that this supplier is a ‘leading’ user of AI agents to produce their materials.

This follow-on article is a warning to anyone who uses, or is considering using, an AI agent to develop training and assessment materials. It is also a warning to RTOs intending to purchase training and assessment materials produced by an AI agent.

I am not against using AI. I design and develop training and assessment materials, and I use an AI chatbot to assist me.

Let’s first look at the difference between an AI chatbot, AI assistant, and AI agent.

What is the difference between an AI chatbot, AI assistant, and AI agent?

An AI chatbot describes the ‘chat’ format or interface with AI. An AI assistant describes the overall role of helping the user. And an AI agent describes an AI that can act autonomously.

In the Australian VET system, the distinction between these three tools is defined by their autonomy and their integration into an RTO’s compliance workflow.

Here is one specific example of how an instructional designer might use each of the three AI applications.

AI chatbot: The conversational researcher

When unpacking a new unit of competency, a chatbot acts as a reactive sounding board. You manually copy technical jargon or Performance Criteria into a separate window to request plain-English explanations or workplace scenarios. It requires a constant back-and-forth exchange, where the AI only knows what you explicitly provide in the chat. This manual ‘copy-paste’ workflow makes it a useful external tool for brainstorming and simplifying complex training requirements.

AI assistant: The integrated co-writer

As you draft learner guides or assessment tools within your word processor, an AI assistant works alongside you in real-time. Because it is context-aware, it ‘sees’ your active document, allowing it to suggest knowledge checks or generate marking rubrics based on your specific text. You can refine your tone or create content without switching windows. This integrated approach streamlines the design process by providing immediate, relevant support inside your workspace.

AI agent: The autonomous worker

For complex tasks like gap analysis, an AI agent operates with high autonomy. Once you set a goal, such as auditing assessment documents against a unit’s requirements from training.gov.au, it proactively executes a multi-step workflow. The agent navigates sites, downloads requirements, and identifies evidence gaps across files without further prompting. Unlike reactive tools, it completes the entire project independently and delivers a finished mapping matrix directly to your inbox.
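
To make the ‘autonomous’ part concrete, the following is a minimal Python sketch of that kind of chained workflow. Everything in it is hypothetical: the function names, the stubbed training.gov.au lookup, and the sample documents are my own illustrations, not any vendor’s actual system.

# A minimal sketch of an agent-style workflow: one goal in, a chain of
# steps executed without further human prompting. All names and data here
# are hypothetical stand-ins for illustration only.

def fetch_unit_requirements(unit_code: str) -> list[str]:
    # Stub: a real agent would retrieve these from training.gov.au.
    return ["plan presentation", "deliver presentation", "review feedback"]

def find_evidence(requirement: str, documents: dict[str, str]) -> list[str]:
    # Naive search: which documents mention the requirement verbatim?
    return [name for name, text in documents.items() if requirement in text.lower()]

def build_mapping_matrix(unit_code: str, documents: dict[str, str]) -> dict[str, list[str]]:
    # The 'agent' behaviour: once the goal is set, every step below runs
    # back-to-back with no human in the loop.
    return {
        requirement: find_evidence(requirement, documents)
        for requirement in fetch_unit_requirements(unit_code)
    }

if __name__ == "__main__":
    docs = {
        "Task 2": "The candidate must plan presentation one and deliver presentation one.",
        "Task 3": "The candidate must deliver presentation two.",
    }
    for requirement, evidence in build_mapping_matrix("BSBCMM411", docs).items():
        print(requirement, "->", evidence or "GAP: no evidence found")

The point to notice is that no step pauses to ask a human anything. That is exactly where the risks discussed below come from.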

The following is a summary comparing the above three AI applications.

  • AI chatbot: reactive and conversational; you manually paste content into a separate chat window, and it only knows what you explicitly provide.
  • AI assistant: context-aware; it works alongside you inside your document, suggesting and refining content as you draft.
  • AI agent: autonomous; once you set a goal, it executes a multi-step workflow and delivers a finished product with minimal human oversight.

Using AI agents to develop training and assessment materials

While AI agents offer significant efficiency by automating high-volume tasks, their use within the Australian VET sector, specifically under the 2025 Standards for RTOs, poses serious risks when developing training and assessment materials.

Here are five ways that relying on an AI agent can degrade the quality of training and assessment materials.

The compliance illusion

AI agents excel at keyword matching but lack the expert judgment to determine whether a task measures competency. An agent might incorrectly flag an assessment tool as ‘fully mapped’ simply because it identifies specific terms from the Performance Criteria. However, it cannot determine whether the task actually represents a valid or authentic measure of competency in a real-world workplace. This creates a ‘compliance illusion’ that can lead to findings of non-compliance at audit.
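
To illustrate the failure mode, here is a toy Python example. The performance criterion, the assessment items, and the matching logic are all invented for this sketch; they are not any supplier’s actual mapping method.

# A toy demonstration of the 'compliance illusion': both items below get
# flagged as mapped because they reuse the criterion's words, yet only the
# observation actually measures performance. The criterion and items are
# invented for this example.

PERFORMANCE_CRITERION = "Deliver presentation using presentation aids"

ASSESSMENT_ITEMS = {
    "Q7 (knowledge question)": "Describe how you would deliver a presentation using presentation aids.",
    "Task 3 (observation)": "The assessor observes the candidate deliver a presentation using two presentation aids.",
}

def keyword_mapped(criterion: str, item_text: str) -> bool:
    # The naive logic: every significant word of the criterion appears in the item.
    significant = {word.lower() for word in criterion.split() if len(word) > 4}
    item_words = {word.lower().strip(".,") for word in item_text.split()}
    return significant <= item_words

for name, text in ASSESSMENT_ITEMS.items():
    verdict = "fully mapped" if keyword_mapped(PERFORMANCE_CRITERION, text) else "gap"
    print(f"{name}: {verdict}")

# Both print 'fully mapped'. A human mapper would note that Q7 only asks the
# candidate to talk about the skill, not demonstrate it.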

Compromised intellectual property

Developing high-quality training and assessment materials requires significant investment. Unless you are using a private AI system, uploading an RTO’s documents can mean your IP is used to train external AI models. For many RTOs, this is not just a quality issue but a major breach of data sovereignty and a loss of competitive advantage.

Pedagogically flawed

Training Packages on training.gov.au are complex and frequently updated. An AI agent may inadvertently pull superseded definitions or draw from outdated datasets. Furthermore, it often lacks the ability to interpret the Companion Volume Implementation Guide, which provides the essential context for how a unit should actually be delivered and assessed. The result is mapping that may be technically correct but pedagogically flawed.

Lack of accountability for ‘hallucinated’ mapping

If an AI agent produces a mapping matrix claiming that a specific content or assessment item covers a Performance Criterion or Foundation Skill when it actually does not, the responsibility still rests entirely with the RTO. Unlike a human instructional designer, who can provide an evidence-based rationale, an agent cannot justify its professional judgment. This lack of accountability results in unreliable mapping.

Erosion of contextualisation

A core requirement of the VET sector is contextualisation: tailoring training and assessment to a specific industry or learner cohort. AI agents tend to produce generic, one-size-fits-all training and assessment materials. Relying on an autonomous agent risks producing ‘cookie-cutter’ materials that fail to meet compliance or contextualisation requirements.

Conclusion: Efficiency must not replace expertise

The allure of ‘set and forget’ AI agents for resource generation and compliance mapping is tempting for the time-poor VET sector. However, there is a vast chasm between functional automation and quality materials. Speed is irrelevant if the output fails a compliance audit.

Outsourcing instructional design to autonomous AI agents risks sacrificing human professional judgment. While AI can complete complex tasks at lightning speed, it lacks the capacity to understand workplace nuances, specific learner cohorts, or the pedagogical depth of a Training Package.

For RTOs, the warning is clear. Investigate how developers of training and assessment materials have used AI. Is it a chatbot for research, an assistant for drafting, or an agent for autonomous creation? As human oversight decreases, the risks to compliance and learner outcomes increase.

Technology should be embraced as a tool, not a replacement. Use chatbots to brainstorm or assistants to refine prose, but keep the human instructional designer at the centre of the development process. In an era of AI agents, human expertise is the only safeguard against a ‘cookie-cutter’ future.

Please tell me what you think!

Human versus AI: The future of assessment design

Introduction

Recently, I reviewed a suite of VET training and assessment materials purchased from a well-known commercial supplier. Despite the provider’s reputation, the resources were disappointingly unfit for purpose. Focusing specifically on the assessment components, I identified several critical issues:

  • Poor usability: Overly complex numbering and an excessive number of fragmented documents made navigation difficult.
  • Language and literacy barriers: The content was cluttered with unnecessary instructions and jargon that are neither learner-friendly nor used in actual workplaces.
  • Lack of context: Assessments lacked specific scenario details and felt like generic templates rather than materials tailored to the unit of competency being assessed.

The overall quality was bland and disconnected. This is highly characteristic of AI-generated content. I later confirmed that this supplier is indeed a ‘leading’ user of AI to produce their materials. This serves as a stark reminder: while AI is a powerful tool, it cannot replace the human expertise required to create meaningful, compliant VET resources.

Structuring assessment tasks

While there are typically multiple ways to structure assessment tasks, the quality of that design varies significantly. At the highest level, a structure is effective, efficient, and compliant, balancing regulatory requirements with a smooth user experience. Other designs may be adequate and compliant but ultimately burdensome, creating unnecessary hurdles for both the learner and the assessor. More concerning are structures that are inadequate but appear compliant on the surface, masking deeper flaws. Finally, some structures are simply inadequate and obviously non-compliant, failing to meet the basic standards required for a valid assessment.

To illustrate these differences in practice, I have provided the following three distinct comparisons between AI-generated and human-designed assessment tasks across various industry sectors. These three examples highlight how a human-led strategy ensures that the structure remains both pedagogically sound and practical. While the AI versions may tick boxes in a literal sense, the human-designed versions demonstrate a deeper understanding of how to weave complex requirements into a logical, streamlined workflow that supports an effective, efficient, and compliant assessment process.

Example 1. BSBCMM411 Make presentations

The following is the Performance Evidence for the BSBCMM411 Make presentations unit of competency.

The following are assessment tasks generated by AI.¹

The following are the assessment tasks generated by a human.²

The following is a list¹ of five reasons why the human-generated assessment structure for the BSBCMM411 unit is superior to the AI-generated version.

  • Logical chunking of workflow: The human version groups the planning, delivery, and review into a single cohesive task for each presentation (Task 2 and Task 3), whereas the AI splits the planning and delivery into entirely separate tasks.
  • Reinforcement of the full cycle: By requiring the candidate to complete the entire cycle (Plan-Deliver-Review) for the first presentation before moving to the second, the human structure allows for immediate application of “lessons learned”.
  • Explicit material development: The human-generated structure explicitly includes the “development of presentation aids” within the planning phase, ensuring this critical requirement is not overlooked, while the AI description is more generic.
  • Clarity on “different” scenarios: The human structure clearly mandates that Task 3 must be a second presentation that is “different to the presentation delivered in Task 2”, providing a clear instruction for meeting the unit’s diversity requirements.
  • Reduced administrative confusion: In the AI structure, an assessor must jump back and forth between Task 2 (Planning) and Task 3 (Delivery) to grade one presentation. The human structure allows an assessor to finalise all evidence for “Presentation 1” within a single task block.

Example 2. CHCECE037 Support children to connect with the natural environment

The following is the Performance Evidence for the CHCECE037 Support children to connect with the natural environment unit of competency.

The following are assessment tasks generated by AI.¹

The following are the assessment tasks generated by a human.²

The following is a list¹ of three reasons why the human-generated assessment structure for the CHCECE037 unit is superior to the AI-generated version.

1. Direct alignment with assessment requirements

The Performance Evidence explicitly requires evidence of supporting children’s knowledge on three occasions.

  • Human Design: Tasks 2, 3, and 4 in the human version clearly provide these three distinct opportunities (Indoor, Outdoor, and Aboriginal/Torres Strait Islander focused).
  • AI Design: The AI version only lists two clear implementation experiences (Experience A and B) in Task 3, potentially failing to meet the “three occasions” mandate.

2. Specific inclusion of cultural perspectives

The unit requires that at least one occasion must involve Aboriginal and/or Torres Strait Islander peoples’ use of the natural environment.

  • Human Design: Dedicates a specific, standalone task (Task 4) to ensure this mandatory requirement is met and observed.
  • AI Design: Completely omits this specific cultural requirement in its brief descriptions, focusing instead on generic activities like “seed growing” or “scavenger hunts”.

3. Clear Indoor/Outdoor distinction

The unit requires one indoor and one outdoor opportunity.

  • Human Design: Explicitly structures Task 2 as an indoor activity and Task 3 as an outdoor activity, ensuring the candidate covers both environments.
  • AI Design: Focuses heavily on the outdoor environment (Task 2 audit and Task 3 “nature play”), without clearly designating or requiring a specific indoor engagement.

Example 3. CPCCCA3010 Install windows and doors

The following is the Performance Evidence for the CPCCCA3010 Install windows and doors unit of competency.

The following are assessment tasks generated by AI.¹

The following are the assessment tasks generated by a human.²

The human-generated assessment tasks ensure full compliance with the specific Performance Evidence for the CPCCCA3010 unit. The following is a list¹ of three reasons why the human-generated assessment structure is superior to the AI-generated version.

1. Inclusion of specific door types

The Performance Evidence requires the installation of a sliding cavity door unit and door, and a pair of doors.

  • Human Design: Includes “Task 4” specifically for the sliding cavity door and “Task 5” for the pair of doors.
  • AI Design: Uses generic categories like “External Door” and “Internal Door”, which fails to explicitly require these two specialised installation types.

2. Accurate quantity of installations

  • Human Design: The human-generated tasks align perfectly with the requirement to install “a” (single) standard window.
  • AI Design: The AI-generated Task 2 requires the candidate to install two windows, which adds an unnecessary burden not specified in the performance evidence.

3. Integration of planning and installation

  • Human Design: Integrates the “plan” and “prepare” requirements directly into every individual practical task (Tasks 2, 3, 4, 5, and 6). This ensures that the planning is context-specific to the unique requirements of a window, a sliding cavity door, or a pair of doors.
  • AI Design: Separates “Planning & Compliance” into a standalone Portfolio (Task 3). By treating planning as a generic administrative exercise rather than an embedded part of the installation process, the AI version risks a disconnect between the candidate’s theoretical plan and the actual technical preparation required for different types of frames and doors.

Conclusion: Why the human designer is irreplaceable

The examples above highlight a consistent pattern: while AI can generate a list of tasks that look like an assessment, it lacks the professional judgment to design a strategy that is actually fit for purpose.

The disparity between these two approaches boils down to three critical factors:

  • Nuance and compliance: As seen in the CPCCCA3010 and CHCECE037 examples, AI frequently misses specific requirements that are essential for a finding of competency. A human designer reads between the lines of a Training Package to ensure no mandatory evidence is overlooked.
  • Pedagogical workflow: AI tends to “atomise” tasks into clinical, disconnected steps. In contrast, human designers understand how a job actually functions. By grouping planning, execution, and review into a single cohesive task, as seen in the BSBCMM411 example, humans create a natural assessment flow that mirrors real-world workplace practice rather than a fragmented digital checklist.
  • The “Goldilocks” principle of evidence: AI often oscillates between two extremes: providing too little detail or creating “assessment bloat” by requiring more work than is necessary. A human expert knows how to design a strategy that is “just right”, meeting every requirement specified by the unit of competency without placing an unnecessary administrative burden on the learner or the assessor.

AI is a powerful assistant for brainstorming or drafting, but it is a poor architect. In the high-stakes environment of VET compliance, an assessment strategy is more than just a document. It is a roadmap that needs to be accurate and compliant. The “human-in-the-loop” must remain the “human-at-the-helm.”

Investing in human-led design isn’t just about avoiding “bland” materials; it’s about ensuring that our VET students are truly competent and that our RTOs remain compliant.

Footnotes:

¹ On the 2nd of March 2026, Gemini was the AI platform used to generate the assessment tasks for the three examples. It was also used to compare the assessment structures generated by AI and by the human.

² Alan Maguire was the human who generated the assessment tasks for the three examples. He has more than 40 years’ experience designing training and assessment. Alan may be getting older, but he is not yet redundant.

Using AI is not learning

Introduction

As a trainer and assessor, I have been delivering the Certificate IV in Training and Assessment qualification since it was released in 2004. Over the past two decades I have seen many changes. A new phenomenon has recently appeared.

Over the past two years, the answers to knowledge questions that are submitted for assessment have significantly improved. Two years ago, I would have seen many poorly written answers with spelling and grammatical errors. Last year, there was a noticeable improvement, with far fewer spelling and grammatical errors. This year, most answers to knowledge questions are very well written.

Usually, at least half of the participants attending my Certificate IV in Training and Assessment courses have English as their second language, and I have come to expect spelling and grammatical errors. But things have changed. Miraculously, I am now assessing written answers that seem too good to be true.

Also, I am seeing many more people spelling words using American English rather than Australian English. I am seeing the letter ‘z’ far too often.

What has happened?

Over the past two years there has been a substantial uptake in people using AI. Like many people, I too use AI often. And like many people, I find it to be useful.

As a trainer, I tell my participants that AI may be useful. However, I ask them not to use AI to answer their knowledge questions. I tell them that there are five ways I can tell if a response has been generated by AI:

  • Consistency: AI responses are often highly consistent in tone, style, and factual accuracy, making them seem almost too perfect.
  • Pattern Recognition: Look for repetitive phrases, unnatural sentence structures, or overreliance on certain keywords.
  • American English Bias: AI may favor American English, using “z” instead of “s” in words like “analyze” or “realize.”
  • Numbered Lists: AI often generates numbered lists, even when they are not explicitly requested.
  • Key Phrase Followed by Colon: Pay attention to responses that frequently use a key phrase followed by a colon, followed by additional information. This is a common pattern in AI-generated text.

By the way, I used AI to generate the above list.
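
For the technically minded, these surface patterns can even be flagged mechanically. The following is a rough Python sketch of such a screening pass, with invented patterns and a made-up sample answer. It is nowhere near reliable enough to prove AI use; at best, it suggests where to start asking questions.

# A rough, unreliable screening pass for the surface patterns listed above.
# The regular expressions and the sample answer are invented for illustration;
# a flag here is a reason to ask questions, never proof of AI use.
import re

def flag_ai_patterns(answer: str) -> list[str]:
    flags = []
    # American English bias: -ize/-yze spellings where Australian English uses -ise/-yse.
    if re.search(r"\b\w+(?:ize|izing|ization|yze)\b", answer):
        flags.append("American English spelling (the letter 'z')")
    # A key phrase followed by a colon, then additional information.
    if re.search(r"\b[A-Z][A-Za-z ]{2,40}:\s", answer):
        flags.append("key phrase followed by a colon")
    # A numbered list, even though none was requested.
    if re.search(r"(?m)^\s*\d+\.\s", answer):
        flags.append("numbered list")
    return flags

sample_answer = """1. Consistency: The response maintains a uniform tone throughout.
2. Pattern Recognition: Analyze the text for repeated keywords."""

print(flag_ai_patterns(sample_answer))
# Flags all three of the patterns above in this sample.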

People are using AI

I am assuming that many participants studying for a vocational education and training (VET) qualification are using AI. And I will assume that the number of participants using AI will grow. It is likely that some participants will be tempted to use AI to help them answer their knowledge questions.

Some participants make it easy to identify when an answer has been generated by AI. I see answers with the following characteristics:

  • Key Phrase Followed by Colon: Responses that have used a key phrase followed by a colon, followed by additional information.
  • Over-capitalisation (using too many capital letters).
  • The letter ‘z’.

Grammarly is AI

Recently, I asked one of my participants if they were using AI to answer the knowledge questions. They told me that they were not. When I showed the participant why I had asked, they told me that they use Grammarly. Luckily, I knew that Grammarly is AI, so I was able to explain that the application was likely doing more than just correcting spelling and grammatical errors. The participant agreed and said that they would remove the application immediately.

The following is a snippet from the Grammarly homepage.

Grammarly will write text, not just correct spelling and grammar. The same thing is likely happening when people with English as their second language use translation apps. I’m not sure, but if you know, I would be happy to hear from you.

Using AI to investigate the use of AI

Many answers to knowledge questions look too perfect to have been written by a human. But how do I know if an answer has been generated by AI? I provided an answer from one of my participants and asked AI whether it had been written by AI. Here is AI’s response.

AI tells me that it is highly probable that the text was generated by AI.

This backs up my hunch that the participant’s answer to the knowledge question was likely to have been generated by AI. And I have a hunch that many participants are using AI to write answers to their knowledge questions.

AI is getting better

Two years ago, even a year ago, I would have been getting many more incorrect answers from AI. It is continuously getting better, and because it is connected to the internet, AI-generated responses can be astonishingly accurate. Here are two examples where I have asked AI to answer different knowledge questions.

Example 1

I did not provide the table. AI generated it.

Example 2

There was no need to go to the website. AI provided the link.

AI can give wrong answers

Although AI is getting better, it can still give incorrect answers.

Here is an answer to a knowledge question submitted by a participant.

The correct answer that I’m looking for is: ‘JSA stands for Jobs and Skills Australia’.

I asked AI, ‘What does JSA stand for?’, and the following is what I got.

This tells me that the participant probably got their answer from AI. As an assessor, it is good that AI is still providing some incorrect answers.

In conclusion

Participants studying for VET qualifications are using AI. On one hand, we encourage our participants to use AI to help them perform their work. On the other hand, we tell our participants not to use AI to answer the knowledge questions.

Regardless of what we say, some participants are using AI to answer their knowledge questions. Their answers may have the following characteristics:

  • Answers that are very well written without spelling and grammatical errors
  • Answers that are in a format that looks AI-generated
  • Answers with the letter ‘z’
  • Answers that are obviously incorrect.

I believe that many participants will use AI. And I believe that many participants will not use AI as a tool to help them learn something. Instead, it will be used only to blindly answer questions, with no thinking involved.

Using AI is not learning.

It would be good to hear what you think about this topic.

Please contact me, Alan Maguire, on 0493 065 396 if you need to learn how to legitimately use AI as a trainer and assessor, or how to legitimately use AI while studying for your TAE40122 Certificate IV in Training and Assessment qualification.

Do you need help with your TAE studies?

Are you doing the TAE40122 Certificate IV in Training and Assessment, and are you struggling with your studies? Do you want help with your TAE studies?

Ring Alan Maguire on 0493 065 396 to discuss.

Contact now!


Training trainers since 1986