Standards of evidence help us make consistent and transparent judgements when assessing evidence about the effectiveness of a particular education policy, practice or program.

Why we need standards of evidence

Teachers, educators, leaders and policymakers make dozens of decisions every day aimed at improving outcomes for children and students.

How can they be confident that what they choose to do will work? How do they decide between one approach and another?

To answer these questions consistently and transparently, we need a way to evaluate the strength of research evidence on the effectiveness of a particular approach. We need standards of evidence.

AERO’s Standards of evidence

AERO’s standards of evidence establish our view on what constitutes rigorous and relevant evidence. When evidence is rigorous and relevant, it provides confidence that a particular approach is effective in a particular context.

The standards can apply to all forms of education evidence – whether generated through academic research or by teachers and educators through their daily practice.

AERO will primarily use the Standards of evidence in projects based on research syntheses and causal (or evaluative) research.

In developing the Standards of evidence, AERO has sought to build on existing policies and research on evidence use across Australia and the world. AERO hopes that the Standards of evidence and associated evidence tools can support high-quality conversations about generating and using evidence across the Australian education community.

What makes evidence rigorous and relevant?

There are many criteria that can be used to evaluate education evidence. AERO’s Standards of evidence prioritise two criteria: rigour and relevance. These criteria have been prioritised because they are the most important considerations when deciding whether a piece of evidence can give someone confidence that a particular educational approach will be effective in their context.

Rigorous evidence is defined as evidence produced using research methods (whether qualitative, quantitative or mixed methods) that isolate the specific impact of a particular educational approach.

Relevant evidence is defined as evidence produced in contexts that are similar to one’s own. Evidence is also relevant when it is derived from a large number of studies conducted over a wide range of contexts, as this suggests that the educational approach is not dependent on any particular contextual factor.

Although the Standards of evidence clearly differentiate between four levels of confidence that evidence can provide, the standards should be viewed as a continuum along which rigour and relevance gradually increase. Evidence at each level builds on the evidence from preceding levels.

How AERO uses the Standards of evidence

When conducting research syntheses, we use the standards to make consistent and transparent judgements in selecting the best available research evidence. When conducting causal (or evaluative) research, we use the standards to guide the design and implementation of the research, so that it generates high-quality evidence for the Australian education community.

There may be occasions when research evidence does not clearly fit within a particular level of confidence. When this is the case, we draw on expert research guidance to assess how confident we can be in the effectiveness of an approach.

How can educators, teachers, leaders, policymakers and researchers use the standards?

The standards of evidence can be used by teachers, educators, leaders, policymakers and researchers interested in determining the strength of existing evidence for a particular approach in their particular context.

To help you use the standards in your context, we've developed Evidence decision-making tools. These tools help practitioners and policymakers evaluate their confidence in the effectiveness of a new or existing approach, and provide implementation guidance appropriate to that level of confidence.

The standards can also be used by policymakers and researchers when designing evaluations. By designing evaluations aligned to the Standards of evidence, policymakers and researchers can try to generate evidence that meets a desired level of confidence.

AERO Standards of evidence diagram

The Standards of evidence

Level 1 Low confidence

Research hypothesises why the approach should have positive effects.

What types of research fit within this level?

Research that presents a hypothesis for why the approach should have positive effects on outcomes.

This research does not provide data (whether qualitative or quantitative) to substantiate its claims that the approach is effective.

What features of research studies increase my confidence within this level?

The study provides an explanation that is based on well-established theories of learning and development.

The study clearly explains step-by-step how the approach is hypothesised to have positive effects.

Level 2 Medium confidence

Research associates the approach with positive effects.

What types of research fit within this level?

Research that demonstrates a correlation between the approach and positive effects on outcomes; for example:

  • small-scale studies, such as case studies, and/or large-scale studies, such as cross-national surveys
  • studies using qualitative (for example, observations and/or interviews), quantitative (for example, statistical techniques) or mixed methods.

This research does not necessarily show that the approach causes positive effects as there could be other potential explanations.

What features of research studies increase my confidence within this level?

The study has been conducted in my own context or in contexts similar to my own.

The study corroborates findings from other studies conducted in many different contexts.

The study measures change in outcomes over time.

The study has a large sample size that is spread across more than one site.

The study uses strategies that discount the possibility that effects are due to chance.

The study compares one group that has been subject to the approach to another group that has not been subject to the approach.

The study has been conducted by people or organisations independent of the developer of the approach.

The study has been conducted recently.

Level 3 High confidence

Research shows the approach causes positive effects.

What types of research fit within this level?

Research that meets the following criteria:

  • uses rigorous qualitative, quantitative or mixed methods that address issues like selection bias, history effects and maturation effects
  • uses outcome measures validated for the purposes of the study.

This research does not necessarily prove the approach causes positive effects in my context. This is because there may be other factors in my context that mean the approach will not work as intended.

What features of research studies increase my confidence within this level?

The study corroborates findings from other studies conducted in many different contexts.

The study measures change in outcomes over time.

The study has a large sample size that is spread across more than one site.

The study uses strategies that discount the possibility that effects are due to chance.

The study compares one group that has been subject to the approach to another group that has not been subject to the approach.

The study has been conducted by people or organisations independent of the developer of the approach.

The study has been conducted recently.

The study mitigates the likelihood that effects are simply due to the particular characteristics of those who participate in the study.

The study discusses and/or tests the key contextual factors that may influence the effectiveness of the approach.

Level 4 Very high confidence

Research conducted in my context or contexts similar to mine shows the approach causes positive effects.

What types of research fit within this level?

Research that meets the following criteria:

  • uses rigorous qualitative, quantitative or mixed methods that address concerns like selection bias, history effects and maturation effects
  • uses outcome measures validated for the purposes of the study
  • is conducted in my context or in contexts similar to mine
  • synthesises the findings of rigorous research through a systematic review or meta-analysis of studies conducted in a range of contexts or in contexts similar to mine.

What features of research studies increase my confidence within this level?

The study corroborates findings from other studies conducted in many different contexts.

The study identifies the factors that lead to the approach working, and the conditions that are necessary for the approach to be implemented on a larger scale.

The study assesses the effectiveness of the approach on different subgroups and explains reasons for any differences in effectiveness between subgroups.

The study monitors outcomes for different groups over time to ensure continued effectiveness.
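
For readers who find a decision rule easier to follow than a description, the four levels can be sketched as a short Python function. This is an illustrative sketch only, not an AERO tool: the `Study` fields and the `confidence_level` function are hypothetical names introduced here, and each field stands for a judgement a reader makes about a study. The sketch also assumes that, at Level 4, a study conducted in a similar context and a synthesis across contexts are alternative ways of meeting the relevance requirement. As the standards note, real evidence sits on a continuum and may not fit one level cleanly.

```python
from dataclasses import dataclass

# Labels taken from the four levels described above.
LABELS = {
    1: "Low confidence",
    2: "Medium confidence",
    3: "High confidence",
    4: "Very high confidence",
}


@dataclass
class Study:
    """Hypothetical summary of a reader's judgements about one piece of research."""
    shows_positive_association: bool = False   # data link the approach to positive effects
    addresses_threats_to_causality: bool = False  # e.g. selection bias, history, maturation
    validated_outcome_measures: bool = False   # measures validated for the study's purposes
    similar_context: bool = False              # conducted in my context or a similar one
    synthesis_across_contexts: bool = False    # systematic review or meta-analysis


def confidence_level(study: Study) -> int:
    """Return the level (1 to 4) that the study's features point to."""
    # Level 1: a hypothesis with no supporting data.
    if not study.shows_positive_association:
        return 1
    # Level 2: an association, but causation has not been established.
    rigorous = (study.addresses_threats_to_causality
                and study.validated_outcome_measures)
    if not rigorous:
        return 2
    # Level 4 (assumption): rigorous evidence that is also relevant,
    # either through a similar context or through a synthesis.
    if study.similar_context or study.synthesis_across_contexts:
        return 4
    # Level 3: rigorous causal evidence without established relevance.
    return 3


if __name__ == "__main__":
    survey = Study(shows_positive_association=True, similar_context=True)
    level = confidence_level(survey)
    print(f"Level {level}: {LABELS[level]}")
```

In this sketch, a correlational survey from a similar context still sits at Level 2, because relevance alone does not establish that the approach causes the effect. The "features that increase my confidence" listed under each level are deliberately left out, since they adjust confidence within a level and do not move a study between levels.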

AERO has defined some common education research terms. Read our list of key concepts explained.