Research and analysis

National Reference Test 2021: contextual information

Published 9 December 2021

Applies to England

Authors

Ming Wei Lee and Jamie Cockcroft

Executive summary

The variable disruption faced by institutions and students during the coronavirus (COVID-19) pandemic meant that exams could not be held in a way that was fair in summer 2021. Following the cancellation of the summer 2021 exams, GCSE grades were determined by teachers based on a range of evidence and therefore the National Reference Test (NRT) was not used as a source of evidence in GCSE awarding in summer 2021. The NRT 2021 results, however, provide an objective source of data on year 11 performance in English language and maths in 2021.

As in previous years, to gain a good understanding of the test sample and how it compared to samples in past iterations of the NRT, Ofqual carried out the routine series of contextual analyses of NRT attendance and test motivation.

The NRT’s sampling design aims to draw a student sample representative of the full range of subsequent GCSE performance via a school sample that is stratified on school size and historical GCSE performance. The sample is not designed to be representative in terms of all student background variables, but we do routinely analyse NRT attendance by student background variables to help us understand potential for sample bias. In 2021, similar to previous years, sampled students in the identified subgroup(s) under the following student background variables were more likely to attend the NRT:

  • school type: those in participating selective (and independent) schools
  • ethnicity: those with Asian, Chinese, Black and Any Other Ethnic Group backgrounds
  • major language: those with a language other than English as their major language
  • special educational needs and disabilities (SEND): those without SEND
  • free school meal (FSM) eligibility: those not eligible
  • income deprivation: those with lesser income deprivation
  • key stage 2 prior attainment: those with higher prior attainment
  • GCSE grade: those who went on to achieve higher GCSE grades

This pattern of differential attendance by student background variables leads us to expect NRT results to be upwardly biased compared to the target population, given the known relationships between those variables and GCSE exam performance. However, because of its cross-year stability, the bias is unproblematic for the NRT’s usual purpose of tracking successive cohorts’ performance. For the purpose of understanding the impact of the pandemic on students’ learning, the cross-year stability of the bias helps rule out sample difference as an explanation for any finding of a change in test performance in 2021.

In the NRT student survey in 2021, participants in both subjects reported lower perceived importance of the NRT, greater indifference to their own NRT performance, and less test preparation, but little change in test-taking effort, compared to their 2020 counterparts.

Statistical modelling exploiting the historical relationship between test motivation and test performance on the NRT suggested that reduced test motivation in 2021 was not a major contributor to the decline in performance in maths and that it did not render the English results under-estimates of the population’s attainment. Modelling exploiting the historical birth-date effect on test performance found the 2021 participants’ higher age in months (resulting from the postponement of the 2021 test to the summer term) did not render comparisons of their performance with their 2020 counterparts unjustified.

Introduction

The National Reference Test (NRT) is designed to be taken annually by a nationally representative sample of Year 11 students shortly before their GCSE exams to provide an additional source of information that can be used in GCSE English language and maths awarding later in the summer. The NRT is supplied by the National Foundation for Educational Research (NFER), which reports annually on the operation of the test and the test results. To contextualise the annual test results, Ofqual analyses the background characteristics of NRT participants and, through the NRT student survey, collects data about their test-taking motivation and preparation for the GCSE subject they have taken the NRT in. After the summer, Ofqual examines the relationship between participants’ NRT performance and their subsequent attainment at GCSE, to understand how well NRT results function as an indicator of subsequent GCSE performance. Summaries of Ofqual’s contextual analyses for NRT 2017 to 2019 were published in 2019.

In 2021, following the cancellation of GCSE exams due to the disruption of education caused by the COVID-19 pandemic, the results of NRT 2021 could not be used in GCSE awarding in summer 2021. NRT 2021 went ahead because the NRT provides an objective measure of year 11 performance in English language and maths and longitudinal data collection is important. It is therefore just as important this year as in previous years to understand the characteristics of the NRT 2021 participants. This paper summarises results of Ofqual’s routine analysis of the background characteristics, prior attainment profile and test motivation of the NRT 2021 sample and how it compares to previous years.

Data sources

In 2017 to 2019, the NRT in each subject (that is, English and maths) had 12 test booklets, 8 ‘live’ and 4 ‘refresh’ booklets. In 2020, the maths test continued to have 8 live and 4 refresh booklets while the English test had only 8 live booklets. In 2021, both subjects had only 8 live booklets. The live booklets contain live test items, while the refresh booklets contain a mixture of live and new items. In any year when there are both live and refresh booklets, only attendees taking the 8 live booklets contribute to the estimation of the national performance standard. Participants taking the refresh booklets contribute to the development of new items which may be used in future iterations of the NRT. The analyses contained in this paper pertain to all participants taking live booklets in all years.

NRT participants are invited to fill in the NRT student survey after completing the one-hour NRT. The NRT student survey has 3 parts:

  1. self-ratings on 10 items about NRT-specific test motivation
  2. short factual questions about tuition and learning in the relevant GCSE subject
  3. self-ratings on 10 items about their motivation, feelings, and attitudes about learning the relevant GCSE subject.

Part 1 is the same in the English and maths versions of the survey, while parts 2 and 3 differ slightly between the 2 versions. In 2021, some questions were modified to reflect the fact that students would not be sitting GCSE exams, and new items were added to part 2 to examine students’ perceived learning progress during the pandemic.

The GCSE English language and maths performance data used in our analyses are the candidate-level data that the 4 exam boards offering the relevant GCSEs in England submitted to Ofqual before GCSE results were issued. The grades of 2020 were the higher of centre assessment grades or calculated grades. The grades of 2021 were teacher assessed grades.

The prior attainment data used in our analyses is from the Key Stage 2 (KS2) tests taken 5 years previously, and was extracted from the National Pupil Database (NPD). The KS2 measure used will be referred to as the ‘KS2 normalised score’, which can range from 0 to 100 [footnote 1]. It was calculated by transforming and combining English reading marks and maths total marks in the KS2 dataset, following exam boards’ current practice in generating KS2-based prediction matrices for GCSE awarding [footnote 2]. The same KS2 normalised score in different years represents the same relative standing in the respective KS2 cohort, but may not represent the same attainment because of cohort-based normalisation [footnote 3]. For students in the NRT dataset who could be matched to the NPD, data on ethnicity, language background, special educational needs and disabilities (SEND) status, free school meal (FSM) eligibility, and income deprivation affecting children index (IDACI) score was also obtained from the latest school census extract of the NPD.

The NRT performance data and survey weightings used in our analyses, as well as the sampled students’ names and dates of birth (which were used for matching to the NPD and the GCSE dataset), gender and school information, were supplied by NFER and Cito, the NRT supplier and sub-contractor. NRT performance data are in the form of 10 plausible values for each participant. The plausible values are on the NRT ability scale created through an Item Response Theory (IRT) analysis of 5 years’ NRT test data after the 2021 test. In each subject, ability thresholds for grades 4, 5 and 7 have been established on the scale, which allow each plausible value of each participant to be classed as at or above NRT grade 4, 5, and 7 or not.
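The classification of plausible values against a grade threshold can be sketched as follows. This is a minimal illustration, not the analysis code: the function name, the threshold value and the toy data are all hypothetical, survey weightings are ignored, and averaging the resulting percentages across plausible values is assumed here as the usual way of summarising them rather than taken from this paper.

```python
def percent_at_or_above(plausible_values, threshold):
    """Percentage of participants at or above a threshold.

    plausible_values: one list of plausible values per participant
    (10 each in the NRT data; any equal length works here).
    The percentage is computed separately for each plausible value
    and then averaged (an assumed summary, unweighted).
    """
    n_students = len(plausible_values)
    n_pv = len(plausible_values[0])
    percents = []
    for k in range(n_pv):
        hits = sum(1 for student in plausible_values if student[k] >= threshold)
        percents.append(100 * hits / n_students)
    return sum(percents) / n_pv


# Hypothetical toy data: 3 participants with 2 plausible values each.
toy_values = [[0.5, -0.1], [1.0, 1.2], [-0.3, 0.2]]
HYPOTHETICAL_GRADE4_THRESHOLD = 0.0  # not a real NRT threshold

grade4_percent = percent_at_or_above(toy_values, HYPOTHETICAL_GRADE4_THRESHOLD)
```

The same function could be called once per key grade (4, 5 and 7), each with its own threshold on the ability scale.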

The classification of participating schools by region was based on publicly available Local Education Authority codes. The classification of participating schools as either ‘independent or selective centres’ or not was based on schools’ self-declared centre type information, which the exam boards have shared with Ofqual.

Results and discussion

NRT attendance

The NRT is a statutory assessment for most schools, but every year not all sampled schools take part (that is, there are school non-responses). For NRT 2021, initially 348 schools were recruited for the test that was to take place in February and March 2021. Then, a new wave of COVID-19 cases prompted a national lockdown, and further restrictions on in-person teaching caused NRT 2021 to be postponed to the summer term. Subsequently, many schools requested to withdraw from the test. Over a third of the originally recruited sample withdrew, and eventually 216 schools took part, compared to 334 in 2020. As reported in NFER’s NRT 2021 Results Digest, despite the lower school participation rate, the achieved school sample for NRT 2021 is similar to the school samples in previous years in terms of the stratifying variables, namely school size and the school’s historical GCSE performance.

Within the schools that agree to take part, students are randomly selected for the test, but not all selected students take the test (that is, there are student non-responses). Table 1 shows the number of students in the achieved sample (that is, NRT attendees), the drawn sample (that is, attendees and absentees in NRT-participating schools combined) and the corresponding attendance rate in the NRT from 2017 to 2021.

Table 1. Drawn and achieved sample size and attendance rate in NRT 2017-2021

| Subject | Sample | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|
| NRT English | Drawn | 8,039 | 7,351 | 7,924 | 7,845 | 5,124 |
| NRT English | Achieved | 7,082 | 6,193 | 6,739 | 6,639 | 4,030 |
| NRT English | Attendance | 88% | 84% | 85% | 85% | 79% |
| NRT Maths | Drawn | 8,075 | 7,319 | 7,911 | 7,883 | 5,152 |
| NRT Maths | Achieved | 7,144 | 6,169 | 6,825 | 6,756 | 4,143 |
| NRT Maths | Attendance | 88% | 84% | 86% | 86% | 80% |
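The attendance rates in Table 1 are the achieved sample size as a percentage of the drawn sample size. The short sketch below reproduces that arithmetic from the counts in the table; the variable and function names are illustrative only.

```python
# Drawn and achieved sample sizes copied from Table 1 (2017 to 2021).
YEARS = [2017, 2018, 2019, 2020, 2021]
DRAWN = {
    "English": [8039, 7351, 7924, 7845, 5124],
    "Maths": [8075, 7319, 7911, 7883, 5152],
}
ACHIEVED = {
    "English": [7082, 6193, 6739, 6639, 4030],
    "Maths": [7144, 6169, 6825, 6756, 4143],
}


def attendance_rates(drawn, achieved):
    """Attendance as a whole-number percentage for each year."""
    return [round(100 * a / d) for d, a in zip(drawn, achieved)]


if __name__ == "__main__":
    for subject in DRAWN:
        rates = attendance_rates(DRAWN[subject], ACHIEVED[subject])
        print(subject, dict(zip(YEARS, rates)))
```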

To some extent, NRT attendees and absentees are self-selected groups. Any self-selection bias which can be linked to academic performance could lead to the results obtained from the attendees not being as nationally representative as has been envisaged, which in turn could jeopardise or complicate the intended use of the test results in GCSE awarding. Given the NRT’s purpose of measuring cohort-level changes over time, a bias caused by student absence in any one year is not by itself problematic. However, for NRT results from different series to be comparable, the sample bias has to be stable from year to year. Therefore, every year, we look beyond the headline attendance rate and compare the achieved and drawn samples on a range of student background variables to understand the potential for bias in the overall test results and check for cross-year stability of any bias. The variables examined are: geographical region, school type (either ‘independent or selective school’ or not), gender, ethnicity, major language, SEND status, FSM eligibility, IDACI score, KS2 normalised score and GCSE grade.

Sample composition by student characteristic

Figures 1 and 2 present breakdowns of the drawn and achieved samples in each iteration of the NRT. There is a plot for each background variable. Each plot contains 2 or more panels, each corresponding to a subgroup within the relevant variable.

For each year, the ‘drawn’ percentages of the different panels under a variable add to 100 and they indicate the relative sizes of the subgroups in that year’s drawn sample. In each panel, the drawn percentages are similar but not exactly the same across the years, reflecting: (i) genuine change between years in the student population with respect to the relevant variable, (ii) change between years in the variable itself, (iii) change resulting from change between years in school-level non-response pattern, and/or (iv) random fluctuations between years (which can be expected given that none of the variables is a stratifier in the NRT sampling design).

For each year, the ‘achieved’ percentages of the different panels under a variable also add to 100 and they indicate the relative sizes of the subgroups in that year’s achieved sample. For any variable, the drawn and achieved percentages being the same means that the drawn and achieved samples have the same composition with respect to that variable, indicating that student absences from the NRT are not sensitive to that variable. In contrast, the drawn and achieved percentages being different means some difference in composition between the drawn and achieved samples, which suggests that student absences are sensitive to the variable under consideration. If the variable in question has a known relationship with exam performance, there is reason to surmise that the achieved sample’s test results are biased. For example, if there were a lower percentage of males in the achieved than in the drawn sample it would suggest that males are more likely to be absent from the NRT than females. As such, the achieved sample contains a higher-than-expected proportion of females, and its test results are likely to be upwardly biased if there is independent evidence that females generally perform better at exams in the subject.
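The comparison of drawn and achieved percentages described above can be sketched with a small example. The counts are hypothetical and chosen only to mirror the gender illustration in the text; they are not NRT data, and the function names are assumptions.

```python
def composition(counts):
    """Convert subgroup counts into percentages that add to 100."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}


def achieved_minus_drawn(drawn_counts, achieved_counts):
    """Percentage-point gap (achieved minus drawn) for each subgroup."""
    drawn = composition(drawn_counts)
    achieved = composition(achieved_counts)
    return {group: achieved[group] - drawn[group] for group in drawn}


# Hypothetical counts: males are more likely to be absent than females.
drawn_counts = {"female": 500, "male": 500}
achieved_counts = {"female": 450, "male": 400}

gaps = achieved_minus_drawn(drawn_counts, achieved_counts)
```

A positive gap means the subgroup is over-represented among attendees relative to the drawn sample, and a negative gap means it is under-represented. Gaps of similar size in every year would correspond to the cross-year stability discussed in this paper.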

As mentioned above, for NRT results from different series to be comparable, the sample bias has to be stable from year to year. The plots, below, allow for year-on-year comparison of the difference between the corresponding drawn and achieved percentages in each plot panel, and suggest all changes between years are negligible, testifying to the cross-year stability of the sample bias.

Figure 1a. NRT English drawn and achieved sample composition by Geographical Region

Figure 1b. NRT English drawn and achieved sample composition by School Type

Figure 1c. NRT English drawn and achieved sample composition by Gender

Figure 1d. NRT English drawn and achieved sample composition by Ethnicity

Ethnicity: AOEG = Any Other Ethnic Group, ASCH = Asian and Chinese combined, BLAC = Black, Mixed = Mixed background, WHIT = White, Unknown = unknown and unclassified combined.

Figure 1e. NRT English drawn and achieved sample composition by Major Language

Figure 1f. NRT English drawn and achieved sample composition by SEND

Figure 1g. NRT English drawn and achieved sample composition by FSM Eligibility

Figure 1h. NRT English drawn and achieved sample composition by IDACI Score Quintile

Year-specific quintile boundary scores were set which divided all target-age pupils in the school census extract of the NPD with known IDACI scores into 5 groups of roughly equal size.

Figure 1i. NRT English drawn and achieved sample composition by Normalised KS2 Score Quintile

Year-specific quintile boundary scores were set which divided all target-age pupils in the KS2 cohort with known KS2 normalised scores into 5 groups of roughly equal size.

Figure 1j. NRT English drawn and achieved sample composition by GCSE English Grade

Figure 2a. NRT Maths drawn and achieved sample composition by Geographical Region

Figure 2b. NRT Maths drawn and achieved sample composition by School Type

Figure 2c. NRT Maths drawn and achieved sample composition by Gender

Figure 2d. NRT Maths drawn and achieved sample composition by Ethnicity

Ethnicity: AOEG = Any Other Ethnic Group, ASCH = Asian and Chinese combined, BLAC = Black, Mixed = Mixed background, WHIT = White, Unknown = unknown and unclassified combined.

Figure 2e. NRT Maths drawn and achieved sample composition by Major Language

Figure 2f. NRT Maths drawn and achieved sample composition by SEND

Figure 2g. NRT Maths drawn and achieved sample composition by FSM Eligibility

Figure 2h. NRT Maths drawn and achieved sample composition by IDACI Score Quintile

Year-specific quintile boundary scores were set which divided all target-age pupils in the school census extract of the NPD with known IDACI scores into 5 groups of roughly equal size.

Figure 2i. NRT Maths drawn and achieved sample composition by Normalised KS2 Score Quintile

Year-specific quintile boundary scores were set which divided all target-age pupils in the KS2 cohort with known KS2 normalised scores into 5 groups of roughly equal size.

Figure 2j. NRT Maths drawn and achieved sample composition by GCSE Maths Grade
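The notes under the IDACI and KS2 quintile figures describe year-specific boundary scores that divide a population into 5 groups of roughly equal size. A minimal sketch of that idea is given below. The population is made up, and both the interpolation method and the treatment of scores that fall exactly on a boundary are assumptions, since this paper does not specify them.

```python
import statistics


def quintile_boundaries(scores):
    """Return the 4 cut points that split scores into 5 roughly equal groups.

    method="inclusive" treats the scores as the whole population
    rather than a sample from it (an assumption for this sketch).
    """
    return statistics.quantiles(scores, n=5, method="inclusive")


def quintile_of(score, boundaries):
    """Return 1 (lowest) to 5 (highest); boundary scores go to the lower group."""
    return 1 + sum(score > b for b in boundaries)


# Hypothetical population of scores for a single year.
population = list(range(1, 101))
cuts = quintile_boundaries(population)
groups = [quintile_of(score, cuts) for score in population]
```

Because the boundaries are recomputed for each year, a given quintile represents the same relative standing in each year's population rather than the same absolute score.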

Effects of background variables on NRT attendance: logistic regression modelling

A formal assessment of changes across the years in the effects of the background variables on NRT attendance was undertaken using mixed-effects logistic regression modelling. Two fixed effects were entered in each model: Year and Background Variable, along with their interaction. NRT attendance was treated as a binary outcome variable of either attended or did not attend. To account for nesting of attendance in schools, school was added as a random intercept in each model.

IDACI score, KS2 normalised score and GCSE grade were treated as categorical variables in the plots above, but were entered as continuous variables and mean-centred in their respective models (sampled students with no data on these variables were excluded from the respective analysis). All other variables were treatment-coded, with the respective most numerous (in 2021) category as the reference group:

  • region: South East
  • school type: not independent or selective
  • gender: male
  • ethnicity: White
  • major language: English
  • SEND: no SEND
  • FSM eligibility: not eligible

Students with no data on these categorical variables were put in the ‘unknown’ category in the respective analysis. Year was also a treatment-coded categorical variable, with 2021 as the reference group.

Because of the treatment coding of Year, each background variable’s main effect in the model tells us about its effect on attendance in 2021, while its interaction with each of the years 2017 to 2020 tells us about how its effect in 2021 compared to the same effect on attendance in 2017, 2018, 2019 or 2020. Omnibus F-tests were performed to test for the main effect of Year, the main effect of the background variable across the years, and the interaction between Year and the background variable, which, if significant, tells us that the effect of the variable on attendance was not uniform across the years.
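The model structure described above can be written out as follows. This is a sketch inferred from the description, not a formula reproduced from the analysis. The notation is introduced here for illustration, the background variable is shown as a single term B for brevity (a categorical variable with several levels would contribute one such term per non-reference level), and the normal distribution for the school intercept is the conventional assumption rather than something stated in this paper.

```latex
\operatorname{logit}(p_{ij}) = \beta_0
  + \sum_{y=2017}^{2020} \beta_y \,\mathrm{Year}_{y,ij}
  + \beta_B \, B_{ij}
  + \sum_{y=2017}^{2020} \beta_{yB} \,\mathrm{Year}_{y,ij} \, B_{ij}
  + u_j ,
\qquad u_j \sim N(0, \sigma_u^2)
```

Here p_ij is the probability that sampled student i in school j attends, Year_y is an indicator for year y (with 2021 as the reference year), B is the background variable and u_j is the random intercept for school j. Under this coding, β_B is the effect of the background variable in 2021, and β_B + β_yB is its effect in year y, so each β_yB measures how the effect in year y differed from the effect in 2021.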

As the analyses of the 2 subjects yielded similar results, we discuss the 2 subjects together focusing on effects and interactions where a statistically significant effect was identified (p ≤ .050). There are 2 results that are common to all models. First, all 20 models found a statistically significant main effect of Year indicating higher overall attendance in 2017 than in 2018 to 2020 and lowest overall attendance in 2021, consistent with what is shown in Table 1.

Second, the omnibus tests for all but 3 models showed no significant interaction between Year and the respective background variable, indicating that any effect of a background variable on attendance was highly consistent across the 5 years of the NRT. The first exception was a significant Year by GCSE grade interaction for NRT English attendance. Examination of this interaction found a stronger correlation between GCSE English grade and probability of attendance at NRT English in 2020 (than in 2021) and no difference between 2017 to 2019 and 2021 in that correlation.

The unusual correlation may have more to do with the different way of awarding GCSE grades (based on teacher prediction) in 2020 than with a genuine change in NRT attendance pattern in 2020. The second exception was the significant Year by IDACI interaction for NRT maths attendance. Examination of this interaction suggested a slightly weaker relationship between IDACI score and NRT maths attendance rate in 2018 (compared to 2021) and no difference in that relationship in any of the other years (compared to 2021), so it was not relevant to our 2020 and 2021 comparison. The third exception was the significant Year by SEND interaction for NRT maths attendance. Examination of this interaction found, between 2020 and 2021, a slightly smaller drop in attendance rate among participants with SEND and no drop among participants with unknown SEND status, compared to participants without SEND.

Given the large number (20) of models examined, it was possible for some effects to be statistically significant by chance. The few exceptions (at least one of which was not relevant to comparing the 2020 and 2021 samples) should not detract from 17 non-significant interactions between Year and the respective variable, testifying to the cross-year consistency of any effect of a background variable on NRT attendance.
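The point about chance findings can be made concrete with a small calculation. The sketch assumes, purely for illustration, that the 20 interaction tests are independent and that no true interaction exists; neither assumption is claimed in this paper, and the tests share data, so the figures are indicative only.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of significant results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more significant results, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


N_MODELS = 20  # number of models reported in this paper
ALPHA = 0.05   # significance threshold used in this paper

expected = expected_false_positives(N_MODELS, ALPHA)
at_least_one = prob_at_least_one(N_MODELS, ALPHA)
```

Under these assumptions about 1 significant interaction would be expected by chance alone, and the chance of seeing at least 1 is a little under two-thirds.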

A summary of the outcomes from the omnibus tests on the main effects of the 20 models is provided below. In brackets, the relevant subgroups’ attendance probabilities across the 5 years of the NRT are provided. These probabilities were converted from the parameters of their respective model. A range is given when the attendance probabilities for the 2 subjects do not round to the same figure.

  • attendance rate did not significantly differ between geographical regions
  • female students (87%) were more likely to attend than male students (85-86%)
  • students in participating independent and selective schools (91-92%) were more likely to attend than students in participating non-independent and non-selective schools (86-87%)
  • Asian, Chinese, Black and Any Other Ethnic Group students (89-92%) were more likely to attend than White students and students with mixed or unknown ethnic background (78-87%)
  • students with a language other than English as their major language (90-91%) were more likely to attend than students with English as their major language (86-87%), who were in turn more likely to attend than students with unknown language background (77-79%)
  • students without SEND (88-89%) were more likely to attend than students with unknown SEND status (76-78%), who were in turn more likely to attend than students with SEND (74-75%)
  • students not eligible for FSM (88-89%) were more likely to attend than students eligible for FSM and students with unknown FSM status (76-78%)
  • students with lower IDACI scores (indicating lesser income deprivation) were more likely to attend than students with higher IDACI scores
  • students with higher normalised KS2 scores (indicating higher prior attainment) were more likely to attend than students with lower KS2 scores
  • students with higher GCSE grades (indicating higher attainment subsequent to the NRT) were more likely to have attended than students with lower GCSE grades
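The conversion from model parameters to attendance probabilities mentioned above uses the inverse of the logit link. The sketch below shows the arithmetic with hypothetical parameter values, not estimates from the models. It uses fixed effects only, which corresponds to a school whose random intercept is zero; whether the reported probabilities were obtained in exactly this way is not stated in this paper.

```python
import math


def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))


def subgroup_probability(intercept, effect=0.0):
    """Attendance probability for a subgroup, given its effect in log-odds."""
    return inv_logit(intercept + effect)


# Hypothetical parameters on the log-odds scale.
REFERENCE_INTERCEPT = 1.9  # reference group
SUBGROUP_EFFECT = 0.4      # a subgroup more likely to attend

p_reference = subgroup_probability(REFERENCE_INTERCEPT)
p_subgroup = subgroup_probability(REFERENCE_INTERCEPT, SUBGROUP_EFFECT)
```

Because the link is non-linear, the same effect in log-odds translates into a smaller percentage-point difference when the reference probability is already close to 100%.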

In conclusion, the probability of a sampled student attending the NRT is, to varying degrees, sensitive to gender, school type, ethnicity, language background, SEND status, FSM eligibility, income deprivation, KS2 prior attainment and subsequent GCSE attainment, in both subjects in all 5 years of the NRT. Given the known relationships of many of these variables with GCSE exam performance, their effects on NRT attendance lead one to suspect an upward bias in the NRT results of the achieved samples.

Our analyses suggest that the bias has been a constant in all years of the NRT. The cross-year stability makes the bias unproblematic for the usual purpose of the NRT, namely, tracking successive cohorts’ performance. For the purpose of using the NRT 2021 results to understand the impact of the pandemic on student learning, the cross-year stability helps rule out sample differences between years as an explanation of any differences in test performance between years.

KS2 prior attainment profile of drawn and achieved samples in NRT 2020 and 2021

To further illustrate the cross-year stability of any upward bias, we examine the KS2 prior attainment profiles of the drawn and achieved samples. KS2 prior attainment is a strong predictor, among the background variables we have data on, of NRT outcomes. Arguably, GCSE grade is a stronger predictor but the different ways of awarding GCSEs in 2020 and 2021 make it a less appropriate measure with which to investigate bias.

Figure 3 shows, for each subject, the cumulative distribution of KS2 normalised scores in the drawn and achieved samples of NRT 2020 and 2021 as well as in the respective whole KS2 cohort (of 2015 and 2016) whose English reading and maths test marks were used to derive the KS2 normalised scores. For each subject, the 2 KS2 cohort distributions are on top of each other, which is expected given the way the KS2 normalised scores were produced.

The distributions for the 2020 and 2021 drawn samples are, in some places, very slightly to the right of the respective KS2 cohort distribution, indicating a slight upward bias resulting from: (i) students in the KS2 cohort not going on to take GCSE English language and maths and hence not coming into the NRT target student population, (ii) exclusion of schools with students in the target population from the NRT sampling frame, and/or (iii) school-level non-response for the NRT. Further, the 2020 and 2021 drawn samples’ distributions do not overlap in some places, indicating very slight differences between years in the drawn sample. Possible reasons for those differences are discussed in an earlier section of this paper (see the second paragraph under the sub-heading ‘Sample composition by student characteristic’).

The distributions for the 2020 and 2021 achieved samples are, in many places, slightly to the right of their respective drawn sample distributions. This supports the findings of the regression models above. Specifically, there is slight upward bias resulting from student-level non-attendance.
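What it means for one cumulative distribution to sit to the right of another can be illustrated with a small example. The scores below are hypothetical, not KS2 data, and the helper name is an assumption. The achieved sample is the drawn sample with its lowest-scoring student absent.

```python
from bisect import bisect_right


def ecdf(scores):
    """Return a function giving the proportion of scores at or below x."""
    ordered = sorted(scores)
    n = len(ordered)

    def cumulative(x):
        return bisect_right(ordered, x) / n

    return cumulative


# Hypothetical KS2 normalised scores.
drawn_scores = [20, 35, 50, 50, 65, 80, 95]
achieved_scores = [35, 50, 50, 65, 80, 95]  # lowest-scoring student absent

drawn_cdf = ecdf(drawn_scores)
achieved_cdf = ecdf(achieved_scores)
```

At any score, the achieved curve is at or below the drawn curve, so on a plot it lies to the right. That is the visual signature of an achieved sample containing proportionally fewer lower-attaining students.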

The cross-year stability of the upward bias can be considered by inspecting how well the 2020 and 2021 achieved sample distributions overlap. It may be noted that in English, at the high end of the score scale, the 2021 distribution is very slightly to the right of the 2020 distribution, indicating proportionally very slightly more higher-attaining students in the 2021 than the 2020 test. This suggests caution in interpreting any comparison of 2020 and 2021 cohort-level performance in English at the grade 7 and 6 boundary.

It may also be noted that in both subjects, at the very low end of the score scale, the 2021 distribution is very slightly to the right of the 2020 distribution, indicating proportionally very slightly fewer lower-attaining students in the 2021 than the 2020 test.

This is not ideal but given that the attainment range in question is at the very low end, it likely has little import in the comparison of 2020 and 2021 cohort-level performance at the grade 4 and 3 boundary (and even less for the 2 higher grade boundaries). Overall, the 2020 and 2021 achieved sample distributions do not completely overlap, but consistent with the absence of a Year by KS2 normalised score interaction in the logistic regression model in either subject, neither distribution is consistently, even just slightly, to the left or right of the other, indicating little change in the upward bias between 2020 and 2021.

Figure 3a. Cumulative distribution of KS2 normalised scores in drawn and achieved samples for NRT English 2020 and 2021 and in the respective KS2 cohort (number of KS2-matched students and KS2 match rate in the respective sample given in brackets)

Figure 3b. Cumulative distribution of KS2 normalised scores in drawn and achieved samples for NRT Maths 2020 and 2021 and in the respective KS2 cohort (number of KS2-matched students and KS2 match rate in the respective sample given in brackets)

NRT student survey

NRT motivation

In the first part of the NRT student survey, participants provide self-ratings on 10 items about NRT-specific test motivation. Factor analysis suggested grouping the items into 4 factors: effort made, perceived importance, indifference to own performance and preparation made specifically for the NRT. Tables 2a and 2b present participants’ mean ratings on the 4 factors in 2017 to 2021.

Table 2a. Mean ratings on 4 factors about NRT motivation in NRT 2017 to 2021: English

| English | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|
| Effort | 3.47 | 3.43 | 3.37 | 3.41 | 3.38 |
| Importance | 2.84 | 2.75 | 2.65 | 2.70 | 2.58* |
| Indifference | 2.86 | 2.93 | 3.04 | 2.99 | 3.26* |
| Preparation | 1.71 | 1.64 | 1.61 | 1.62 | 1.56* |

Table 2b. Mean ratings on 4 factors about NRT motivation in NRT 2017 to 2021: maths

| Maths | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|
| Effort | 3.64 | 3.62 | 3.59 | 3.59 | 3.57 |
| Importance | 2.87 | 2.73 | 2.66 | 2.68 | 2.59* |
| Indifference | 2.81 | 2.87 | 2.96 | 3.00 | 3.21* |
| Preparation | 1.79 | 1.70 | 1.66 | 1.66 | 1.61* |

Note: Mean ratings are on the 1 to 5 rating scale. Standard errors of all mean ratings are between 0.010 and 0.024. The standard errors quantify the statistical uncertainty due to sampling of students from the GCSE-taking population and were computed by balanced repeated replication which took account of the NRT’s stratified sampling design. Survey weightings were applied in calculating mean ratings and their standard errors. * indicates a statistically significant difference between 2020 and 2021 in the mean rating on the relevant factor in the relevant subject at p < .005.
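For readers unfamiliar with balanced repeated replication, the basic calculation can be sketched as below. This shows the textbook form of the estimator only. The replicate estimates are hypothetical, and the details of the procedure actually used (such as the number of replicates or any adjustment factor) are not given in this paper.

```python
import math


def brr_standard_error(full_estimate, replicate_estimates):
    """Basic BRR standard error.

    The variance is the mean squared deviation of the replicate
    estimates from the full-sample estimate.
    """
    g = len(replicate_estimates)
    variance = sum((r - full_estimate) ** 2 for r in replicate_estimates) / g
    return math.sqrt(variance)


# Hypothetical mean rating and replicate estimates of it.
full_sample_mean = 3.0
replicate_means = [3.1, 2.9, 3.1, 2.9]

standard_error = brr_standard_error(full_sample_mean, replicate_means)
```

Each replicate estimate would be a weighted mean recomputed on a half-sample defined by the sampling strata, which is how the method reflects the stratified design.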

In both subjects, the 2021 participants perceived the NRT as less important, and reported less NRT-specific preparation and greater indifference to their own NRT performance, than their 2020 counterparts. The drop in self-reported test-taking effort was statistically significant compared to 2017, but not compared to 2020.

To assess the extent to which any year-on-year change in NRT performance may be attributed to any change in the participants’ self-reported NRT test motivation, the relationship between NRT performance and test motivation was examined using regression analysis. First, a set of regression analyses were performed on the 2017 data to find the relationship between NRT performance and the 4 factor-based scores, with due treatment of model parameters and standard errors to take account of multiple plausible values and sampling and imputation errors:

NRT = β0 + β1*Effort + β2*Importance + β3*Indifference + β4*Preparation + ε

NRT denotes the probability of attaining a key grade at the NRT. Effort, Importance, Indifference and Preparation denote scores on the 4 factors. For ease of interpretation, all scores were standardised prior to the regression analyses. Then, all subsequent years’ factor-based scores were standardised by centring to their respective 2017 mean and scaled by their respective 2017 standard deviation, and by applying the regression formula established from the 2017 data, a ‘prediction’ of NRT performance for each subsequent year was calculated (see the predictions presented in Tables 2c and 2d).
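The prediction procedure described above can be sketched as follows. Every number in the sketch (the 2017 means and standard deviations, the coefficients and the intercept) is hypothetical and chosen only to make the example run. Survey weightings and the handling of plausible values are omitted, and the names are assumptions.

```python
# Hypothetical 2017 baseline (mean, standard deviation) for each factor score.
BASELINE_2017 = {
    "Effort": (3.5, 0.8),
    "Importance": (2.8, 0.9),
    "Indifference": (2.9, 1.0),
    "Preparation": (1.7, 0.7),
}
# Hypothetical coefficients for the standardised factor scores.
COEFFICIENTS = {
    "Effort": 0.05,
    "Importance": 0.01,
    "Indifference": -0.02,
    "Preparation": -0.01,
}
INTERCEPT = 0.70  # hypothetical probability of the key grade at the 2017 means


def standardise(value, mean_2017, sd_2017):
    """Centre on the 2017 mean and scale by the 2017 standard deviation."""
    return (value - mean_2017) / sd_2017


def predict_student(scores):
    """Apply the 2017 regression formula to one participant's factor scores."""
    total = INTERCEPT
    for factor, beta in COEFFICIENTS.items():
        mean_2017, sd_2017 = BASELINE_2017[factor]
        total += beta * standardise(scores[factor], mean_2017, sd_2017)
    return total


def predict_cohort(students):
    """Predicted percentage attaining the key grade (unweighted mean)."""
    return 100 * sum(predict_student(s) for s in students) / len(students)


# A participant whose ratings equal the 2017 means, and one with lower effort.
at_2017_means = {factor: mean for factor, (mean, _) in BASELINE_2017.items()}
lower_effort = dict(at_2017_means, Effort=at_2017_means["Effort"] - 0.5)
```

Because later years are standardised against the 2017 mean and standard deviation rather than their own, a cohort whose ratings shift away from the 2017 means receives a different prediction even though the regression formula itself is unchanged.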

The prediction tells us how the achieved sample of participants would perform given their NRT motivation ratings if their attainment in the subject did not differ from that of the 2017 sample and if the relationship between the ratings and NRT performance remained the same as in 2017. To the extent that the predictions remain unchanged between years, the changes in the headline measures (see the Headline rows in Tables 2c and 2d) cannot be explained by changes in NRT motivation between the relevant cohorts. A slight complication is that predictions could only be made for the sub-sample of participants for whom full NRT motivation data was available (referred to as the survey-responding sub-sample below) while the headline measures included all participants. Therefore, the NRT results of the survey-responding sub-sample are also presented in Tables 2c and 2d to allow more nuanced comparisons of between-year changes in NRT performance against between-year changes in NRT motivation.

Table 2c. NRT English language performance prediction based on relationship between self-reported test motivation and NRT performance in 2017

Grade Group 2017 2018 2019 2020 2021
Percent Grade 4 or higher Headline 69.9 68.9 65.8 67.5 66.8
Percent Grade 4 or higher Survey-responding sub-sample 70.6 69.7 68.0 68.7 68.5
Percent Grade 4 or higher Prediction for sub-sample NA 70.3 69.4 69.9 69.1
Percent Grade 5 or higher Headline 53.3 52.6 49.5 51.5 51.6
Percent Grade 5 or higher Survey-responding sub-sample 54.0 53.3 51.4 52.6 52.9
Percent Grade 5 or higher Prediction for sub-sample NA 53.8 52.9 53.4 52.6
Percent Grade 7 or higher Headline 16.8 16.7 16.3 17.5 18.7
Percent Grade 7 or higher Survey-responding sub-sample 17.2 17.0 17.0 18.1 19.2
Percent Grade 7 or higher Prediction for sub-sample NA 17.2 16.7 17.0 16.6

Note: NA for 2017 because there was nothing to predict for 2017 when the predictions for the later years were based on the relationship in 2017.

Table 2d. NRT maths performance prediction based on relationship between self-reported test motivation and NRT performance in 2017

Grade Group 2017 2018 2019 2020 2021
Percent Grade 4 or higher Headline 70.7 73.3 73.5 74.0 69.6
Percent Grade 4 or higher Survey-responding sub-sample 71.4 73.6 73.4 74.2 70.5
Percent Grade 4 or higher Prediction for sub-sample NA 72.3 71.9 71.7 71.3
Percent Grade 5 or higher Headline 49.7 52.4 51.9 54.4 49.2
Percent Grade 5 or higher Survey-responding sub-sample 50.3 52.3 51.7 54.7 50.3
Percent Grade 5 or higher Prediction for sub-sample NA 51.4 50.9 50.7 50.1
Percent Grade 7 or higher Headline 19.9 21.5 22.7 24.0 21.0
Percent Grade 7 or higher Survey-responding sub-sample 20.1 21.1 22.2 24.0 22.0
Percent Grade 7 or higher Prediction for sub-sample NA 21.0 20.8 20.6 20.2

Note: NA for 2017 because there was nothing to predict for 2017 when the predictions for the later years were based on the relationship in 2017.

In 2021, in line with the drop in test motivation shown in Tables 2a and 2b, the predictions in both subjects are down from 2020.

For every grade boundary in both subjects, it can be verified that the change in test motivation-based prediction between 2020 and 2021 does not exceed the half width of the 95% confidence interval for the headline estimate in 2021. Therefore, it can be argued that the change in test motivation need not impinge on the interpretation of any between-year comparison of the headline estimates as long as the comparison has taken account of the precision of the headline estimates. That is, we can be confident that the drop in NRT test motivation was not a major contributor to the significant decline in performance in maths reported by NFER and that it did not render the English results under-estimates of the population’s attainment.
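The check described above can be sketched as follows. The predictions are copied from Tables 2c and 2d. The 95% confidence interval half widths are not given in this section, so the hypothetical helper `within_precision` takes the half width as an input rather than assuming a value.

```python
# Predictions for the survey-responding sub-sample, copied from Tables 2c and 2d.
predictions = {
    ("English", "Grade 4 or higher"): {2020: 69.9, 2021: 69.1},
    ("English", "Grade 5 or higher"): {2020: 53.4, 2021: 52.6},
    ("English", "Grade 7 or higher"): {2020: 17.0, 2021: 16.6},
    ("Maths", "Grade 4 or higher"): {2020: 71.7, 2021: 71.3},
    ("Maths", "Grade 5 or higher"): {2020: 50.7, 2021: 50.1},
    ("Maths", "Grade 7 or higher"): {2020: 20.6, 2021: 20.2},
}


def prediction_change(by_year, start=2020, end=2021):
    """Change in the prediction, in percentage points, rounded to 1 decimal place."""
    return round(by_year[end] - by_year[start], 1)


def within_precision(change, half_width):
    """True if the change does not exceed the half width of the confidence interval."""
    return abs(change) <= half_width


changes = {key: prediction_change(by_year) for key, by_year in predictions.items()}
for (subject, grade), change in changes.items():
    print(f"{subject}, {grade}: {change:+.1f} percentage points")
```

To reproduce the comparison, call `within_precision` with each change and the half width of the corresponding 2021 headline estimate reported by NFER.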

GCSE preparation

The second part of the NRT student survey normally focuses on students’ preparation for their forthcoming GCSE English or maths exams through questions related to lessons in school and other learning activities. In 2021, in the context of online teaching provided by schools and no forthcoming public exams, the wording of some questions in the survey was modified, questions about teaching time in both subjects and expected tier entry in maths were removed, and questions about perceived learning progress were added. As in previous years, we asked schools participating in the NRT questions about teaching and learning in both subjects. The schools gave their answers to the NRT test administrators, who then recorded the answers in the test administrator questionnaires.

Teaching and learning

For English, students provided data on lesson type (language only lesson, separate language and literature lessons, combined language and literature lessons) and frequency of tuition in English received outside school. Total teaching time in a typical week was examined using data provided by schools. For maths, students provided data on total homework time in a typical week, total time spent in maths-relevant after-school activities in a typical week, and frequency of tuition in maths received outside school. Total teaching time and homework time in a typical week were examined using data provided by schools.

Tables 3a and 3b present summaries of participants’ and schools’ answers to the questions about teaching and learning in the relevant GCSE subject. In both subjects, fewer students reported receiving extra tuition outside school in 2021. It is likely that these changes resulted from extra tuition being curtailed during the pandemic, but we cannot rule out the possibility that the changes were due to the amended wording of this question with an emphasis (which previously was intended but not explicitly spelled out) on extra tuition given by anyone other than the student’s school teacher(s).

In English, the trend towards more separate teaching (and correspondingly less combined teaching) of language and literature continued, although the changes on the relevant variables were statistically significant compared to 2017 only, not 2020.

In maths, the 2021 participants reported spending more time on homework and much less time on maths-related school activities outside class than their 2020 counterparts.

Table 3a. Summary of NRT English participants’ and participating schools’ answers to questions about teaching and learning for GCSE English Language

Variable 2017 2018 2019 2020 2021
English lessons          
% combined language and literature 75.62 73.57 71.05 66.45 61.35
% separate language and literature 21.99 22.38 25.31 28.23 34.12
% language only 2.40 4.05 3.64 5.31 4.53
School-reported weekly teaching time (in hrs) 4.27 4.16 4.26 4.42 4.35
% receiving tuition outside school 30.12 28.56 27.03 27.48 13.61*

Note: * indicates a statistically significant difference between 2020 and 2021 in the relevant measure at p < .005.

Table 3b. Summary of NRT Maths participants’ and participating schools’ answers to questions about teaching and learning for GCSE Maths

Variable 2017 2018 2019 2020 2021
School-reported weekly teaching time (in hrs) 4.32 4.08 4.32 4.32 4.24
Weekly homework time (in hrs) 1.41 1.40 1.38 1.43 1.59*
School-reported weekly homework time (in hrs) 1.46 1.48 1.34 1.37 1.38
Weekly other maths-related school activity time (in hrs) 0.73 0.70 0.68 0.63 0.27*
% receiving tuition outside school 37.10 37.90 34.56 36.63 26.32*

Note: * indicates a statistically significant difference between 2020 and 2021 in the relevant measure at p < .005.

Within the confines of the NRT student survey, we cannot assume simple cause-and-effect relationships between the teaching and learning variables and test performance.

Perceived learning progress

In an attempt to understand students’ perception of their progress in learning amidst the disruption of education caused by the pandemic, we asked participants where they felt they were in their learning of the subject they had taken the NRT in, compared to where they would have expected to be at the time of year when they took the NRT. If they reported they were ahead or behind, they were asked to indicate how many months ahead or behind they thought they were. We also asked participants to rate their preparedness for assessments in GCSE subjects other than English language and maths.

Tables 4a and 4b show, for each subject, the percentage of participants answering in each progress category, and the average number of months ahead or behind estimated by the participants (months ahead counted as positive, months behind negative, ‘where I expected to be’ zero).

Table 4a. NRT 2021 participants’ perceived progress in learning

Progress in learning English (N = 4,009) Maths (N = 4,125)
‘Behind’ 30.71% 37.07%
‘Where I expected to be’ 59.27% 53.26%
‘Ahead’ 8.58% 8.00%
No or multiple responses 1.45% 1.67%

Table 4b. NRT 2021 participants’ perceived progress in learning quantified as months ahead or behind

Progress quantified as number of months ahead (+) or behind (-) English (N = 3,727) Maths (N = 3,873)
Median 0 0
Mean -1.09 -1.39
Standard deviation 3.26 3.48
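The sign convention behind Table 4b can be sketched as follows. The responses are invented for illustration and are not NRT data. The sketch is unweighted, and the function name `signed_months` is hypothetical.

```python
from statistics import mean, median


def signed_months(category, months=0):
    """Months ahead count as positive, months behind as negative, 'expected' as zero."""
    if category == "ahead":
        return months
    if category == "behind":
        return -months
    if category == "expected":
        return 0
    raise ValueError(f"unknown category: {category}")


# Invented responses: (progress category, self-estimated number of months).
responses = [("behind", 3), ("expected", 0), ("expected", 0), ("ahead", 1), ("behind", 2)]
values = [signed_months(category, months) for category, months in responses]

print(values)          # [-3, 0, 0, 1, -2]
print(median(values))  # 0
print(mean(values))    # -0.8
```

When most participants answer 'where I expected to be', the median is 0. The mean can still be negative if participants reporting being behind outnumber those reporting being ahead, which is the pattern in Table 4b.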

In both subjects, just over half of the participants did not feel they were behind (or ahead) in their learning. Of the remaining participants, the large majority felt they were behind, thereby producing, for the respective whole sample, an average progress indicator of just over one month behind. The self-reported perceived learning progress measure has reasonable validity in predicting test performance: regression analyses confirmed, in both subjects, a statistically significant relationship between participants’ perceived progress in learning and their NRT performance (as indexed by NRT plausible values), even after controlling for the very strong relationship between KS2 prior attainment and NRT performance.

Table 5 shows, for each subject, the percentage of participants answering in each category when asked to rate their preparedness for assessments in GCSE subjects other than English language and maths.

Table 5. NRT 2021 participants’ perceived preparedness for assessments in GCSE subjects other than English language and maths

Preparedness, relative to NRT subject English (N = 4,009) Maths (N = 4,125)
A lot less prepared 9.80% 9.02%
A little less well prepared 19.83% 22.45%
Just as prepared 40.56% 41.70%
More prepared 20.85% 19.54%
No or multiple responses 8.95% 7.30%

About 30% of participants reported being less prepared for assessments in other subjects than in English and maths. Without historical data on the same question, we cannot tell whether students’ relative perceived focus on English and maths and other subjects has shifted during the pandemic.

GCSE motivation

In the third part of the NRT student survey, participants provide self-ratings on 10 items about motivation, feelings, and attitudes about learning the relevant GCSE subject (as opposed to the subject in the NRT). Factor analysis of the data in each subject suggested grouping the items into 4 factors, which can be interpreted as about the utility value of the subject, importance of the subject, level of enjoyment of the subject, and how big a role the subject plays in their future plan. Tables 6a and 6b present participants’ mean ratings on the 4 factors in 2017 to 2021.

Table 6a. Mean ratings on 4 factors about GCSE motivation in NRT 2017 to 2021: English

English 2017 2018 2019 2020 2021
Utility 3.66 3.60 3.54 3.52 3.41*
Importance 4.17 4.14 4.09 4.12 3.89*
Enjoyment 3.68 3.67 3.66 3.67 3.62
Future 2.25 2.21 2.15 2.13 2.06*

Table 6b. Mean ratings on 4 factors about GCSE motivation in NRT 2017 to 2021: maths

Maths 2017 2018 2019 2020 2021
Utility 3.59 3.60 3.57 3.49 3.41*
Importance 4.06 4.08 4.06 4.05 3.88*
Enjoyment 3.46 3.53 3.53 3.51 3.42*
Future 2.53 2.54 2.51 2.45 2.44

Note: Mean ratings are on the 1 to 5 rating scale. Standard errors of all mean ratings are between 0.010 and 0.030. The standard errors quantify the statistical uncertainty due to sampling of students from the GCSE-taking population and were computed by balanced repeated replication which took account of the NRT’s stratified sampling design. Survey weightings were applied in calculating mean ratings and their standard errors. * indicates a statistically significant difference between 2020 and 2021 in the mean rating on the relevant factor in the relevant subject at p < .005.

For English, the 2021 participants found the subject less useful and less important and saw less of a role of the subject in their future plan than their 2020 counterparts. The drop in self-reported enjoyment of the subject was statistically significant compared to 2017, but not to 2020.

For maths, the 2021 participants found the subject less useful and less important and reported less enjoyment of the subject than their 2020 counterparts.

Within the confines of the NRT student survey, we cannot determine what has driven the observed changes in students’ thinking about the relevant subject or assume simple cause-and-effect relationships between the motivation factors and test performance.

NRT performance

Relationship with age in months (‘maturity’ at time of test)

There is much research evidence of a birthdate effect on exam performance even at GCSE level: students who are more mature (in terms of age in months) at the time of an exam perform, on average, slightly better at the exam than their less mature peers [footnote 4] [footnote 5]. The postponement of the NRT in 2021 meant that the student sample as a whole was more mature at the time of the test than the student samples of previous years. It is of interest to investigate the extent to which any birthdate or maturity effect may impinge on our interpretation of the headline results [footnote 6].

To investigate this question, we created a maturity variable with 3 levels, denoted by -1, 0, +1. Table 7, below, shows how we defined participants’ maturity across the NRT. Participants born in the 2 months that the NRT test period straddled, or in the month preceding them, were assigned a maturity level of 0. Participants born earlier in the school year (the older participants) were assigned a maturity level of +1, and those born later (the younger participants) a maturity level of -1.

Table 7. Assignment of maturity level based on birth month of participants in NRT 2017 to 2021

NRT cycle Level +1 Level 0 Level -1
NRT 2017-2020 Sept - Dec Jan, Feb, Mar Apr - Aug
NRT 2021 Sept - Feb Mar, Apr, May Jun - Aug
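The assignment in Table 7 can be written as a small lookup. The function name `maturity_level` is hypothetical, and birth months are numbered 1 (January) to 12 (December). The final lines show how the later test window in 2021 shifts the average maturity level, under the simplifying assumption (ours, not the report's) that births are spread evenly across the 12 months.

```python
def maturity_level(birth_month, cycle):
    """Return +1, 0 or -1 for a birth month (1-12) in a given NRT cycle (2017-2021)."""
    if not 1 <= birth_month <= 12:
        raise ValueError("birth_month must be between 1 and 12")
    if cycle == 2021:
        older, middle = {9, 10, 11, 12, 1, 2}, {3, 4, 5}   # Sept - Feb; Mar, Apr, May
    elif 2017 <= cycle <= 2020:
        older, middle = {9, 10, 11, 12}, {1, 2, 3}         # Sept - Dec; Jan, Feb, Mar
    else:
        raise ValueError("cycle must be between 2017 and 2021")
    if birth_month in older:
        return 1
    if birth_month in middle:
        return 0
    return -1


# Average level if births were spread evenly over the 12 months (an assumption).
mean_2020 = sum(maturity_level(m, 2020) for m in range(1, 13)) / 12
mean_2021 = sum(maturity_level(m, 2021) for m in range(1, 13)) / 12
print(round(mean_2020, 3), round(mean_2021, 3))  # -0.083 0.25
```

For example, a participant born in February has a maturity level of 0 in the 2017 to 2020 cycles but +1 in 2021.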

The relationship between maturity level and NRT performance was then examined using regression analysis. First, a set of regression analyses was performed on the 2017 data to find the relationship between NRT performance and maturity level, with due treatment of model parameters and standard errors to take account of multiple plausible values and sampling and imputation errors:

NRT = β0 + β1*Maturity + ε

NRT denotes the probability of attaining a key grade at the NRT. Maturity denotes maturity level which can be -1, 0 or +1 (as a continuous variable).

Maturity was not statistically significant on any grade measure in English (all ps > .10), but was significant on all 3 grade measures in maths (ps < .001 at grade 4 and grade 5; p < .02 at grade 7). This indicates a birthdate effect on NRT performance in maths, but little evidence for it in English.

Then, by applying the regression formula established from the 2017 data, a ‘prediction’ of NRT performance based on participants’ average maturity level for each subsequent year was calculated (see the predictions presented in Tables 8a and 8b).
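A minimal sketch of this prediction step is given below. It assumes a 0/1 indicator of attaining the key grade as the outcome and ordinary least squares on synthetic data. It ignores plausible values, survey weights and replication. The coefficients and the later-year average maturity are invented for illustration, and the names `fit_line` and `predict_from_mean` are hypothetical.

```python
import numpy as np


def fit_line(maturity, outcome):
    """Fit outcome = b0 + b1*maturity, treating maturity (-1, 0, +1) as continuous."""
    design = np.column_stack([np.ones(len(maturity)), maturity])
    (b0, b1), *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return float(b0), float(b1)


def predict_from_mean(b0, b1, mean_maturity):
    """Predicted proportion attaining the grade for a sample with this average maturity."""
    return b0 + b1 * mean_maturity


# Synthetic base-year sample, invented for illustration only.
rng = np.random.default_rng(1)
maturity_2017 = rng.choice([-1, 0, 1], size=3000, p=[5 / 12, 3 / 12, 4 / 12])
attained_2017 = (rng.random(3000) < 0.70 + 0.02 * maturity_2017).astype(float)

b0, b1 = fit_line(maturity_2017, attained_2017)

# Hypothetical later-year sample whose average maturity level is higher.
mean_later = 0.25
prediction_base = predict_from_mean(b0, b1, maturity_2017.mean())
prediction_later = predict_from_mean(b0, b1, mean_later)
print(f"base year {100 * prediction_base:.1f}%, later year {100 * prediction_later:.1f}%")
```

With an intercept in the model, the prediction at the base year's own average maturity reproduces the base-year result exactly. A later-year prediction therefore differs from it only through the change in average maturity multiplied by the slope.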

The prediction tells us how the achieved sample of participants would perform given their maturity level if the relationship between maturity level and NRT performance remained the same as in 2017. To the extent that the predictions remain unchanged between years, the changes in the headline measures (see the Headline rows in Tables 8a and 8b) can be considered to not be explainable by changes in participants’ maturity between the years.

Table 8a. NRT performance prediction based on student sample’s average maturity level and the relationship between maturity level and NRT performance in 2017: English

English Group 2017 2018 2019 2020 2021
% of G4+ Headline 69.9 68.9 65.8 67.5 66.8
% of G4+ Prediction NA 69.9 69.9 69.9 70.2
% of G5+ Headline 53.3 52.6 49.5 51.5 51.6
% of G5+ Prediction NA 53.3 53.3 53.3 53.7
% of G7+ Headline 16.8 16.7 16.3 17.5 18.7
% of G7+ Prediction NA 16.8 16.8 16.8 17.1

Table 8b. NRT performance prediction based on student sample’s average maturity level and the relationship between maturity level and NRT performance in 2017: maths

Maths Group 2017 2018 2019 2020 2021
% of G4+ Headline 70.7 73.3 73.5 74.0 69.6
% of G4+ Prediction NA 70.7 70.7 70.7 71.6
% of G5+ Headline 49.7 52.4 51.9 54.4 49.2
% of G5+ Prediction NA 49.7 49.7 49.7 50.6
% of G7+ Headline 19.9 21.5 22.7 24.0 21.0
% of G7+ Prediction NA 19.8 19.9 19.9 20.3

The predictions for 2018 to 2020 are virtually the same as the actual results of 2017. For 2021, the predictions are slightly higher, by 0.3-0.4 percentage points in English and 0.4-0.9 percentage points in maths, depending on the grade measures.

For every grade boundary in both subjects, it can be verified that the change in maturity-based prediction between 2020 and 2021 does not exceed the half width of the 95% confidence interval for the headline estimate in 2021. Therefore, it can be argued that the slightly greater maturity of the 2021 sample need not impinge on the interpretation of any between-year comparison on the headline estimates if the comparison has taken account of the precision of the headline estimates. That is, we can be confident that it did not render comparisons between the 2021 results and previous years meaningless.

The age issue arose because of the unusual timing of NRT 2021. It is worth considering the effect of the unusual timing on 2021 participants’ preparedness for assessment, relative to their 2020 counterparts. In 2020, the NRT took place before the pandemic struck and, at the time of the NRT, participants were fully expecting GCSEs to take place in 2 to 3 months’ time. In 2021, the vast majority of students returned to school in March 2021 after a national lockdown.

By the time of the NRT, most 2021 participants had spent at least 4 weeks participating in face-to-face lessons and the cancellation of GCSE exams may have meant that they were working in a different way in the weeks preceding the NRT administration than their 2020 counterparts. On the one hand, the 2021 participants may have been less prepared for the NRT because they were not facing external exams that the NRT resembled. On the other hand, the 2021 participants may have been more prepared because they were due to sit internal exams or tests that would count towards their teacher assessed grades. It is not possible to say which scenario was closer to the truth.

Relationship with GCSE grade: classification concordance between NRT and GCSE

For the NRT to provide useful information for GCSE awarding, NRT participants’ performance should ideally be predictive of their own and their cohort’s GCSE performance. However, several obvious differences between the NRT and GCSE would lead one not to expect a perfect relationship between NRT and GCSE performance. For example, although the NRT and GCSEs in the relevant subject examine the same content, the NRT is dissimilar in length, and perhaps also in question style and format, to the GCSE exam with a particular exam board that NRT participants have been preparing for [footnote 7].

Effects of those dissimilarities on NRT performance may not be uniform across all NRT participants. Another obvious difference is that GCSE exams take place some months after the NRT, and NRT participants may improve their knowledge and skills to different degrees in the intervening months. A further difference is that the NRT is low-stakes to students and schools while GCSEs are high-stakes, and stakes can interact with student ability, test subject, test motivation and test anxiety in complex ways in affecting test performance. These differences have been taken account of in setting the NRT grade standards. Any change in the relationship between NRT and GCSE performance over the years would signal a possible change in the effect of some or all of these differences on test performance.

One way to quantify the relationship between NRT and GCSE performance is to examine the classification concordance between the 2 assessments, that is, how often concordance (NRT and GCSE delivering the same above or below classification for a student at a key grade boundary) and the 2 kinds of disconcordance (over- and under-performance at the NRT relative to GCSE) occur. The ideal observation is a high level of concordance coupled with similar levels of the 2 kinds of disconcordance.
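The three-way classification can be sketched as follows. The grade pairs are invented for illustration. The sketch is unweighted and uses a single NRT grade per student, whereas the published figures use survey weights and plausible values. The function names `classify` and `concordance_percentages` are hypothetical.

```python
from collections import Counter

LABELS = ("Concordant", "NRT>GCSE", "NRT<GCSE")


def classify(nrt_grade, gcse_grade, boundary):
    """Compare the above/below classification of one student at a grade boundary."""
    nrt_above = nrt_grade >= boundary
    gcse_above = gcse_grade >= boundary
    if nrt_above == gcse_above:
        return "Concordant"
    # NRT above the boundary but GCSE below it: over-performance at the NRT.
    return "NRT>GCSE" if nrt_above else "NRT<GCSE"


def concordance_percentages(pairs, boundary):
    """Percentage of students in each category; boundary 7 means the grade 7/6 boundary."""
    counts = Counter(classify(nrt, gcse, boundary) for nrt, gcse in pairs)
    return {label: 100 * counts[label] / len(pairs) for label in LABELS}


# Invented (NRT grade, GCSE grade) pairs.
pairs = [(7, 7), (6, 7), (4, 5), (5, 4), (3, 3), (8, 6), (4, 4), (2, 4)]
print(concordance_percentages(pairs, boundary=7))
# {'Concordant': 75.0, 'NRT>GCSE': 12.5, 'NRT<GCSE': 12.5}
print(concordance_percentages(pairs, boundary=4))
# {'Concordant': 87.5, 'NRT>GCSE': 0.0, 'NRT<GCSE': 12.5}
```

A pair such as (4, 5) counts as concordant at the grade 7/6 boundary because both grades fall below it, even though the grades themselves differ.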

Tables 9a and 9b present the level of NRT and GCSE classification concordance and pattern of disconcordance at the 3 key grade boundaries in the 2 subjects in 2017 to 2021. A higher level of classification concordance can be seen in maths than in English. In both subjects, more classification concordance can be found at the grade 7 and 6 boundary than at the grade 4 and 3 boundary, and likewise more at the grade 4 and 3 boundary than at the grade 5 and 4 boundary.

In maths, the level of classification concordance is high and has stayed highly stable from 2017 to 2020, even though GCSE grades of 2020 were based on teacher prediction rather than exams. The level of classification concordance dropped in 2021.

In English, the level of classification concordance at grade 7/6 is reasonably high and has stayed stable from 2017 to 2019, dropped in 2020, and dropped further in 2021. At the other 2 grade boundaries, the level of classification concordance dropped in 2019 and has stayed stable since.

In both subjects, classification concordance was at its lowest ever level in 2021. This likely reflects the fact that in previous years, GCSE grades, like NRT grades, were based entirely on tests or exams, while in 2021 (and 2020), GCSE grades diverged from NRT grades in being based on teacher assessment (or teacher prediction). Despite the slight decline, the level of classification concordance remained high in 2021, which may be taken as evidence for the credibility of the teacher assessed GCSE grades of summer 2021.

As for the disconcordance patterns, under-performance on the NRT (relative to GCSE) has almost always been more common than over-performance (the exception being the case of NRT Maths in 2019 at grade 7/6). The dominance of NRT under-performance over NRT over-performance as a pattern of classification disconcordance has increased in 2020, and increased further in 2021, likely reflecting teachers giving students the benefit of the doubt in their centre assessment grades and teacher assessed grades in 2020 and 2021.

Table 9a. NRT/GCSE classification concordance at key grade boundaries in 2017 to 2021: English (standard error in brackets)

English Concordance 2017 % 2017 (SE) 2018 % 2018 (SE) 2019 % 2019 (SE) 2020 % 2020 (SE) 2021 % 2021 (SE)
Grade 7/6 Concordant 84.03 (0.47) 83.48 (0.52) 84.40 (0.52) 81.72 (0.53) 79.91 (0.73)
Grade 7/6 NRT>GCSE 7.68 (0.33) 7.11 (0.37) 6.73 (0.34) 5.47 (0.30) 4.45 (0.36)
Grade 7/6 NRT<GCSE 8.29 (0.37) 9.41 (0.40) 8.87 (0.42) 12.81 (0.49) 15.64 (0.74)
Grade 5/4 Concordant 74.59 (0.56) 75.33 (0.57) 73.94 (0.57) 73.39 (0.57) 72.31 (0.76)
Grade 5/4 NRT>GCSE 10.89 (0.41) 9.52 (0.40) 9.08 (0.40) 6.24 (0.35) 5.26 (0.41)
Grade 5/4 NRT<GCSE 14.52 (0.50) 15.16 (0.56) 16.97 (0.53) 20.37 (0.59) 22.43 (0.80)
Grade 4/3 Concordant 78.32 (0.53) 78.04 (0.56) 75.97 (0.57) 75.83 (0.67) 75.43 (0.86)
Grade 4/3 NRT>GCSE 8.53 (0.36) 8.00 (0.39) 7.50 (0.35) 3.89 (0.28) 2.90 (0.30)
Grade 4/3 NRT<GCSE 13.15 (0.49) 13.96 (0.54) 16.52 (0.56) 20.28 (0.68) 21.67 (0.84)

Table 9b. NRT/GCSE classification concordance at key grade boundaries in 2017 to 2021: maths (standard error in brackets)

Maths Concordance 2017 % 2017 (SE) 2018 % 2018 (SE) 2019 % 2019 (SE) 2020 % 2020 (SE) 2021 % 2021 (SE)
Grade 7/6 Concordant 88.65 (0.39) 89.35 (0.44) 89.28 (0.39) 88.43 (0.42) 86.22 (0.58)
Grade 7/6 NRT>GCSE 3.80 (0.25) 4.87 (0.32) 5.45 (0.29) 4.27 (0.27) 2.53 (0.28)
Grade 7/6 NRT<GCSE 7.55 (0.36) 5.78 (0.32) 5.26 (0.32) 7.30 (0.35) 11.26 (0.57)
Grade 5/4 Concordant 84.28 (0.49) 84.31 (0.49) 84.38 (0.49) 84.23 (0.50) 79.19 (0.73)
Grade 5/4 NRT>GCSE 4.93 (0.26) 6.64 (0.37) 6.52 (0.33) 3.92 (0.27) 2.28 (0.26)
Grade 5/4 NRT<GCSE 10.79 (0.42) 9.05 (0.39) 9.09 (0.39) 11.85 (0.47) 18.53 (0.75)
Grade 4/3 Concordant 86.72 (0.43) 86.79 (0.50) 87.13 (0.44) 87.53 (0.45) 83.64 (0.63)
Grade 4/3 NRT>GCSE 3.89 (0.23) 5.21 (0.36) 4.88 (0.27) 2.76 (0.21) 1.76 (0.22)
Grade 4/3 NRT<GCSE 9.38 (0.39) 8.00 (0.38) 7.99 (0.36) 9.72 (0.43) 14.61 (0.63)

Note: The standard error quantifies the combined uncertainty due to sampling of students from the GCSE-taking population and sampling of items from all items in use in the NRT. The uncertainty due to student sampling was computed by balanced repeated replication which took account of the NRT’s stratified sampling design. The uncertainty due to item sampling was computed by combining the repeated analyses using the 10 plausible values. Survey weightings were applied in calculating the concordant and disconcordant percentages and their standard errors.
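One common way of combining the two sources of uncertainty described in the note is Rubin's combination rule for multiply imputed data. The report does not state the exact formula used, so the sketch below is an assumption. The inputs are invented for illustration, and the function name `combine_plausible_values` is hypothetical. Each plausible value yields one estimate and one sampling variance (for example, from balanced repeated replication).

```python
import math
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results into one estimate and one standard error.

    estimates: one point estimate per plausible value.
    sampling_variances: the sampling variance of each of those estimates.
    """
    m = len(estimates)
    point = mean(estimates)
    within = mean(sampling_variances)         # uncertainty from student sampling
    between = variance(estimates)             # uncertainty across plausible values
    total = within + (1 + 1 / m) * between    # Rubin's rule
    return point, math.sqrt(total)


# Invented inputs: 5 plausible values, each with a sampling variance of 0.16.
estimates = [84.0, 84.2, 83.8, 84.1, 83.9]
sampling_variances = [0.16] * 5
point, standard_error = combine_plausible_values(estimates, sampling_variances)
print(f"{point:.2f} ({standard_error:.2f})")  # 84.00 (0.44)
```

The combined standard error is never smaller than the sampling-only standard error (0.40 in this invented example), because the variation between plausible values adds to it.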

Conclusion

Following the cancellation of GCSE exams in summer 2021 due to the disruption of education caused by the COVID-19 pandemic, the results of NRT 2021 could not be used in GCSE awarding in summer 2021, although the tests went ahead to enable the longitudinal aim of the NRT to continue. It was therefore just as important this year as in previous years to carry out the routine series of contextual analyses to try to better understand the NRT 2021 sample and how it compared to previous years.

In 2021, as in previous years, NRT attendance was sensitive to many student background variables. The pattern of differential attendance by student background variables leads us to expect the NRT results to be upwardly biased compared to the target population, given the known relationships between those variables and GCSE exam performance. However, because of its cross-year stability, the bias is unproblematic for the NRT’s usual purpose of tracking successive cohorts’ performance. For the purpose in 2021 of trying to detect any impact of the pandemic on students’ learning, the cross-year stability of the bias helps rule out sample difference as an explanation for any finding of a change in test performance in 2021.

In the NRT 2021 student survey, participants in both subjects reported lower perceived importance of the NRT, greater indifference to their own NRT performance, less test preparation, but little change in self-reported test-taking effort, compared to their 2020 counterparts. Statistical modelling exploiting the historical relationship between test motivation and test performance at the NRT suggested that reduced test motivation in 2021 was not a major contributor to the significant decline in performance in maths and that it did not render the results in English under-estimates of the population’s attainment. Modelling exploiting the historical birth-date effect on test performance found the 2021 participants’ higher age in months (resulting from the postponement of the 2021 test to the summer term) did not render results in either subject over-estimates relative to results in previous years.

  1. KS2 normalised score should not be confused with KS2 scaled score. Since 2016, the Standards and Testing Agency has published scaled score conversion tables for each series of KS2 tests. KS2 scaled scores are on a scale of 80 to 120. 

  2. Eason, S. (2017). Key Stage 2 tests: Alternative measures of prior attainment for predicting GCSE outcomes. AQA Centre for Education Research and Practice. 

  3. There is an added complication with the 2021 cohort’s KS2 normalised scores. The 2021 NRT cohort was the first to sit the revised KS2 assessments in 2016. NRT 2020 and 2021 participants sharing the same KS2 normalised score have the same relative standing in their respective cohort in terms of performance at their respective KS2 tests, but we cannot be certain that they have the same level of prior attainment in the absolute sense because of the change in KS2 assessments as well as cohort-based normalisation. 

  4. Benton, T. (2014). Should we age-standardise GCSEs? Cambridge Assessment research report. 

  5. Sykes, E.D.A., Bell, J.F., & Vidal Rodeiro, C. (2009/2016). Birthdate effects: a review of the literature from 1990-on. Cambridge Assessment. 

  6. If the test were postponed in a normal year, the sample would be more mature not only in age in months at the time of testing, but also because of 2 extra months of in-school teaching and learning. It is debatable whether the 2021 sample had had 2 extra months of teaching compared to previous years’ samples. Our modelling did not attempt to estimate the effect of maturity due to extra months of teaching. 

  7. The NRT is designed to be exam board-neutral.