National Travel Survey: 2023 Sampling Stratification Review

Chapter 1: Introduction and Background

The National Travel Survey (NTS) provides up-to-date and regular information about personal travel within England and monitors trends in travel behaviour.

The survey collects detailed information on the key characteristics of each participating household and any vehicle to which they have access. In addition, each individual within the household is interviewed and then asked to complete a 7-day travel record. The survey produces a rich dataset for analysis with information recorded at a number of different levels (household, individual, vehicle, long distance journey, day, trip and stage).

1.1 Sampling and stratification

The NTS 2023 was designed to provide a representative sample of households in England. It is based on a stratified 2-stage random probability sample of private households. The sample was drawn by first selecting Primary Sampling Units (PSUs) and then selecting addresses within PSUs. The sample design uses postcode sectors as PSUs. In 2023, 989 PSUs were issued, with 22 addresses selected in each, giving a total of 21,758 issued addresses.

The survey has used a quasi-panel design from 2002 onwards in which half the PSUs in the sample for a given year are retained for the next year’s sample and the other half are replaced. This has the effect of reducing the variance of estimates of year-on-year change. As the NTS sample size was increased in 2023, the proportion of PSUs retained from 2022 was less than half of the total. The 2023 issued sample included 324 retained PSUs, 33% of the total issued.

In order to draw the PSUs, a list of all postcode sectors in England was generated excluding those in the Isles of Scilly due to the cost of interviewing. Sectors carried over from the previous year were also excluded. Sectors with fewer than 500 delivery points were grouped with an adjacent sector. Grouped sectors were then treated as one PSU. On average, each PSU contained about 2,900 delivery points.

This list of grouped postcode sectors in England was sorted by a regional variable (International Territorial Level 2), urban-rural status within regional categories, tertiles of area-level car ownership from the 2011 Census, and area-level percentage of people aged 16 to 74 working mainly at home from the 2011 Census. This sorting was done in order to increase the precision of the sample and to ensure that the different strata in the population are correctly represented. Random samples of PSUs were then selected with probability proportional to delivery point count. Separate sample targets were set for Inner London, Outer London, and the rest of England, in order to oversample London due to historically lower response rates there.
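
The selection step can be pictured with a minimal sketch of systematic sampling with probability proportional to size (PPS) from a sorted list, a common way of turning a sort order into implicit stratification. The report says only that PSUs were sorted and then selected with probability proportional to delivery point count, so the systematic method, the `Sector` record and its field names are assumptions for illustration, and the separate London sample targets are not modelled.

```python
import random
from dataclasses import dataclass


@dataclass
class Sector:
    """Hypothetical sampling-frame record for one (grouped) postcode sector."""
    sector_id: str
    itl2: str             # regional code
    urban: bool           # 2-category urban-rural status
    car_tertile: int      # tertile of area-level car ownership (1-3)
    home_w: float         # % working mainly at home
    delivery_points: int  # measure of size


def systematic_pps(frame, n, rng):
    """Select n sectors by systematic PPS from a frame sorted by the stratifiers.

    Sorting spreads the selections across the stratifiers (implicit
    stratification). A sector larger than the sampling interval could be
    selected more than once; this sketch does not handle that case.
    """
    ordered = sorted(
        frame, key=lambda s: (s.itl2, s.urban, s.car_tertile, s.home_w)
    )
    total = sum(s.delivery_points for s in ordered)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + k * interval for k in range(n)]

    selected, cumulative, i = [], 0, 0
    for sector in ordered:
        cumulative += sector.delivery_points
        # A sector is hit by every selection point inside its cumulative range.
        while i < n and points[i] < cumulative:
            selected.append(sector)
            i += 1
    return selected


# Hypothetical frame of 100 equal-sized sectors; select 10.
frame = [
    Sector(f"S{i:03d}", f"R{i % 5}", i % 2 == 0, i % 3 + 1, float(i), 2900)
    for i in range(100)
]
sample = systematic_pps(frame, 10, random.Random(2023))
print([s.sector_id for s in sample])
```

Because selection points fall at fixed intervals along the sorted list, each contiguous block of the list (for example one region, or one car-ownership tertile within a region) receives close to its proportional share of the sample.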

In 2014, the National Centre for Social Research (NatCen) carried out a comprehensive review of the NTS sample stratification, using data from NTS 2013 and Census 2011, to examine whether the stratifiers used were optimal. This work concluded that the stratification described above was the best option, and it has been used throughout the subsequent decade of the NTS. In 2024, an equivalent review was undertaken using data from NTS 2023 and Census 2021 to determine whether the current stratification remains optimal and, if not, which alternative offers gains in the precision of NTS estimates.

Chapter 2: Methodology

2.1 Introduction

The variables used to stratify the sampling of new PSUs were last reviewed by NatCen in 2014 by identifying a range of key estimates from the survey and then measuring the strength of the association between those estimates and a range of combinations of potential stratification variables.

Stratification improves precision most when the stratification variables are strongly correlated with the outcome measures. The aim was thus to identify the combination of potential stratification variables that were most strongly associated with key survey measures. This was done by fitting a series of regression models which measured the proportion of the variance explained when the potential stratification variables were added to the model. The conclusion of the review was that International Territorial Level 2 (ITL2), urban-rural status, tertiles of car ownership, and percentage of people working mainly at home were the best stratification variables.

Note: ITL2 was known as Nomenclature of Units for Territorial Statistics (NUTS2) at the time of the last review.

The aim of this report is to repeat the methodology that was used previously. This section explains the main elements of the methodology. The first step was to identify a range of key NTS survey measures; competing stratification strategies were then compared by fitting regression models at the area level and measuring the amount of variance explained.

2.2 NTS measures (dependent variables)

Optimal stratifiers for one variable are likely to differ from those for another, because the correlation between survey variables and stratification factors differs for each survey variable. Choosing optimal stratifiers therefore requires a compromise between the optimal solutions for a range of variables across different levels of the NTS database.

This report largely uses the same set of NTS variables as the previous review in 2014, which used data from NTS 2013. In two instances where a variable was no longer available in NTS 2023, a similar variable was chosen. Table 1 below lists the 16 NTS variables analysed in this review and the level at which they were collected.

Table 1: Key variables from NTS 2023 used in stratification review regression tests

Level	Variable Name	Description	Statistic used
Household	NumCarVan_B01ID	Households with at least one car or van	Percentage
Household	NumBike_B01ID	Households with at least one bike	Percentage
Individual	DrivLic_B02ID	Adults with full car driving licence	Percentage
Trip	TripPurpTo_B02ID	Purpose of trip ‘to’ was shopping	Percentage
Trip	TripTravTime	Overall travelling time	Mean
Trip	jotxsc	Overall trip time	Mean
Trip	jd	Trip distance	Mean
Trip	jjxsc	Number of trips	Mean
Trip	TripPurpTo_B02ID	School trip distance	Mean
Stage of trip	StageMode_B01ID	Stages where mode of transport was walking	Percentage
Stage of trip	StageMode_B01ID	Stages where mode of transport was car	Percentage
Stage of trip	sttxsc	Stage travelling time	Mean
Stage of trip	sd	Stage distance	Mean
Stage of trip	stagedistance	Length of stage	Mean
Stage of trip	stagetime	Stage travel time	Mean
Vehicle	vehanmileage	Annual mileage	Mean

The variables listed in the table were used as dependent variables in the regression models, aggregated to the postcode sector (PSU) level. For example, the variable indicating whether an adult was in possession of a full car driving licence was aggregated to estimate the percentage of adults holding a full driving licence within each PSU.
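
A minimal sketch of this aggregation, assuming person-level records held as (PSU identifier, flag) pairs. The record layout and the postcode sectors are hypothetical, and any survey weighting used in the review is not shown: each responding adult counts equally.

```python
from collections import defaultdict


def psu_percentages(records):
    """Return {psu_id: % of records in that PSU with the attribute}.

    `records` is an iterable of (psu_id, has_attribute) pairs, one per adult.
    Unweighted: each responding adult counts equally.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for psu_id, has_attribute in records:
        totals[psu_id] += 1
        hits[psu_id] += int(has_attribute)
    return {psu: 100.0 * hits[psu] / totals[psu] for psu in totals}


# Hypothetical adults in two PSUs: True = holds a full car driving licence.
adults = [
    ("AB1 2", True), ("AB1 2", True), ("AB1 2", False), ("AB1 2", True),
    ("CD3 4", False), ("CD3 4", True),
]
print(psu_percentages(adults))  # {'AB1 2': 75.0, 'CD3 4': 50.0}
```

Mean-type measures such as trip distance would be aggregated in the same way, summing the values rather than the flags.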

2.3 Regional classifications (independent variables)

Four regional-level classifications were considered in the analysis, namely:

the current NTS regional stratification variable based on ITL2 areas and 2-category urban-rural status, 52 categories in total (variable name: ITL2 urbrur)
International Territorial Level 2 areas, 30 categories in total (variable name: ITL2)
Government Office Region, with London split into inner and outer regions, 10 categories in total (variable name: GOR 10)
Government Office Region, with North East, North West and Merseyside, Yorkshire and Humberside, and West Midlands split into metropolitan and non-metropolitan regions, 13 categories in total (variable name: GOR 13)

2.4 Census measures (independent variables)

The 2021 Census provides a wide range of variables at local area (LSOA) level that could be used to stratify a sample of postcode sectors. A comprehensive selection was chosen, covering a range of demographic characteristics and methods of commuting to work. Both individual-level and household-level Census variables were included. Table 2 below shows the 39 variables that were tested as potential stratifiers, based on the previous stratification review in 2014.

Table 2: Variables from the 2021 Census used in stratification review regression tests

Variable name	Description
popdens	population density (persons per hectare)
ag014	% Persons aged 0 to 14
ag1524	% Persons aged 15 to 24
ag2534	% Persons aged 25 to 34
ag3544	% Persons aged 35 to 44
ag4554	% Persons aged 45 to 54
ag5564	% Persons aged 55 to 64
ag6574	% Persons aged 65 to 74
ag75p	% Persons aged 75 or over
nonwhite	% Persons of an ethnicity other than white
marital_m	% Adults married
marital_nm	% Adults not married
ownocc	% Owner occupier households
larent	% Households rented from council
privrent	% Privately rented households
semi_d	% Households semi-detached
nssec12	% Persons aged 16 or over in NS-SEC categories 1 and 2
retire	% Persons aged 16 or over who are retired
active	% Persons aged 16 or over economically active
unemploy	% Persons aged 16 or over unemployed
inactive	% Persons aged 16 or over economically inactive
full	% Economically active working full time
part	% Economically active working part time
limitil	% Persons with limiting long-term illness
noqual	% Adults with no qualifications
level4p	% Adults with level 4 qualifications and above
hhsize1	% Households with 1 person
hhsize2	% Households with 2 persons
hhsize3	% Households with 3 persons
hhsize4	% Households with 4 or more persons
hhsize	Average household size
nokids	% Households with no dependent children
cars0	% Households with no car or van
cars2p	% Households with 2 or more cars or vans
cars	Average number of cars per household
home_w	% Persons aged 16 or over working mainly at home in March 2021
train_w	% Persons aged 16 or over travelling to work by train, underground, metro, light rail, tram
car_w	% Persons aged 16 or over travelling to work by car or van
bus_w	% Persons aged 16 or over travelling to work by bus, minibus, coach

Prior to conducting this review, during the NTS 2024 sampling, a comparison was made between home_w (% Persons aged 16 or over working mainly at home) data collected in the 2011 and 2021 Censuses. The 2021 Census took place in March and coincided with a period of coronavirus (COVID-19) lockdown restrictions. Consequently, the overall proportion of people working from home in March 2021 was much higher than in 2011. Aggregated to postcode sector level, the average percentage of persons working from home in England was 3.9% in the 2011 Census and 31.7% in the 2021 Census. In addition, the 2021 variable has a much higher standard deviation. While home_w has nonetheless been included in the stratification review, it is important to note that it captured behaviour in March 2021 and is unlikely to measure levels of working from home accurately when no lockdown restrictions are in place.

2.5 Regression analysis

An analysis dataset at the PSU level was compiled by matching together the survey estimates listed in section 2.2, the regional classification variables listed in section 2.3, and the 2021 Census variables listed in section 2.4 using postcode sectors. This dataset was used to run separate stepwise linear regression models with each of the 16 key NTS measures as the dependent variable and all possible stratification options (geographical and Census-based) as independent variables.
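
Matching on postcode sector amounts to an inner join of keyed records. The sketch below assumes each source has already been reduced to one record per postcode sector; the report does not describe how LSOA-level Census data were mapped to sectors, so that step is omitted. The function and field names are hypothetical.

```python
def match_by_sector(*sources):
    """Inner-join several {sector: {field: value}} dicts on postcode sector.

    Sectors missing from any source are dropped; a field from a later
    source overwrites a same-named field from an earlier one.
    """
    common = set(sources[0]).intersection(*sources[1:])
    return {
        sector: {k: v for src in sources for k, v in src[sector].items()}
        for sector in sorted(common)
    }


# Hypothetical inputs keyed by postcode sector.
nts = {"AB1 2": {"pct_licence": 75.0}, "CD3 4": {"pct_licence": 50.0}}
regions = {"AB1 2": {"itl2_urbrur": "R1-urban"}, "CD3 4": {"itl2_urbrur": "R2-rural"}}
census = {"AB1 2": {"cars0": 22.4}, "EF5 6": {"cars0": 35.5}}

print(match_by_sector(nts, regions, census))
# {'AB1 2': {'pct_licence': 75.0, 'itl2_urbrur': 'R1-urban', 'cars0': 22.4}}
```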

Models were compared on the basis of their adjusted multiple coefficient of determination, known as adjusted R squared. This measures the percentage of variance in the dependent variable accounted for by the variables in the regression model. It applies a degrees-of-freedom adjustment when estimating the error variance, which makes it a useful measure for comparing models with different numbers of independent variables.
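
As an illustration of the comparison measure, the function below fits ordinary least squares with NumPy and applies the standard adjustment, adjusted R squared = 1 - (1 - R squared)(n - 1)/(n - p - 1), where n is the number of PSUs and p the number of predictors excluding the intercept. It is a sketch rather than the software used in the review, which the report does not name; the function name and the synthetic data are illustrative.

```python
import numpy as np


def adjusted_r2(y, X):
    """Fit OLS of y on X plus an intercept; return (R squared, adjusted R squared).

    y: shape (n,), one value per PSU. X: shape (n, p) predictors, such as
    dummy columns for a categorical stratifier and continuous Census measures.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centred = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    # Degrees-of-freedom adjustment: each extra predictor is penalised.
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return float(r2), float(adj)


# Synthetic example: one informative predictor, then an added noise column.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)
noise = rng.normal(size=200)
print(adjusted_r2(y, x))
print(adjusted_r2(y, np.column_stack([x, noise])))
```

Adding any column can only raise plain R squared, whereas adjusted R squared rises only if the added column's partial F statistic exceeds 1, which is why the adjusted figure suits comparisons between stratification options with different numbers of categories.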

All other things being equal, the 2 Census variables appearing most often in the final models (that is, those which explain most of the variability in NTS measures in combination with the chosen regional first stratifier) are in theory the best choices and should be considered as the subsequent stratifiers.

To examine the extent of any gain in precision, the adjusted R squared from 2 models (one containing the proposed stratifiers and one the existing ones) were compared. The percentage change in precision achieved by the proposed stratifiers was computed using the following formula:

Figure 1: formula used to calculate percentage change in precision for a change in stratification variables

Percentage gain in precision = 100 × (adjusted R squared for the new stratifiers - adjusted R squared for the old stratifiers) / (1 - adjusted R squared for the old stratifiers)
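
The Figure 1 formula can be written as a one-line function. The sketch below takes adjusted R squared values in percent, as they appear in Table 7; applied to a few rows of that table it reproduces the corresponding entries in Table 8, although other entries may differ by 0.1 because the published adjusted R squared values are rounded. The function name is illustrative.

```python
def precision_gain(adj_r2_new, adj_r2_old):
    """Percentage gain in precision, following the Figure 1 formula.

    Inputs are adjusted R squared values in percent (0-100), as in Table 7.
    The result is the proportional reduction, times 100, in the variance
    left unexplained by the old stratifiers.
    """
    return (adj_r2_new - adj_r2_old) / (100.0 - adj_r2_old) * 100.0


# Table 7, "Stages where mode of transport was car": Current 49.4, Option 3 53.7
print(round(precision_gain(53.7, 49.4), 1))  # 8.5, as in Table 8
```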

The results of the analysis undertaken are presented in the next section.

Chapter 3: Analysis and results

3.1 Choosing a regional stratification variable

The first 2 NTS stratification variables comprise a regional variable based on ITL2 areas and urban-rural status within each regional category (ITL2 urbrur). The previous review in 2014 considered 3 alternative regional stratification options, and these were also tested in the present review to check that ITL2 with urban-rural status is still the optimal choice. The first alternative regional variable was ITL2, previously known as NUTS2, without urban-rural status. The other 2 were based on Government Office Region with differing numbers of categories, one with inner and outer London split (GOR 10) and one with a metropolitan and non-metropolitan split (GOR 13). The first stage of the review compared these regional measures to determine whether ITL2 still performs better than the 2 forms of GOR. It also tested whether the addition of urban-rural status within regions still improves precision compared with region alone. The Census 2011 urban-rural classification was used in this review because the updated Census 2021 version has not yet been released.

Separate linear regression models were fitted for each of the NTS dependent variables and each of the 4 regional categorical independent variables. For each NTS measure, the model with the highest adjusted R squared offers the best precision. As in the previous 2014 NTS stratification review, R squared is a biased estimate of the true R squared, because sampling households within PSUs inflates the variance between area means. In other words, a portion of the area-level variance represents variance between households within the same area. Although this affects the final estimates, the comparison between different models is not affected, as all tests are carried out at PSU level.
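
The inflation can be illustrated with a small simulation under an assumed two-level model: each PSU has a true mean drawn with between-PSU variance sigma_b², and m households are sampled per PSU with within-PSU variance sigma_w². Under that model the variance of observed PSU means is about sigma_b² + sigma_w²/m, so part of the apparent area-level variance is household-level noise. The model, parameter values and function name are assumptions chosen only to show the effect.

```python
import random
import statistics


def observed_between_variance(n_psu, m, sigma_b, sigma_w, seed=0):
    """Variance across PSUs of the sample mean of m households per PSU."""
    rng = random.Random(seed)
    psu_means = []
    for _ in range(n_psu):
        true_mean = rng.gauss(0.0, sigma_b)  # genuine area-level difference
        households = [true_mean + rng.gauss(0.0, sigma_w) for _ in range(m)]
        psu_means.append(statistics.fmean(households))
    return statistics.variance(psu_means)


sigma_b, sigma_w, m = 1.0, 3.0, 10
observed = observed_between_variance(20_000, m, sigma_b, sigma_w)
expected = sigma_b**2 + sigma_w**2 / m  # 1.0 + 0.9 = 1.9
print(f"observed {observed:.2f}, true between-PSU variance {sigma_b**2:.2f}")
```

Under this model the household noise is unrelated to any area-level stratifier, so it lowers the R squared of every candidate model fitted to the same PSU means, which is consistent with the report's view that comparisons between models are unaffected.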

The adjusted R squared for each regression model is given in Table 3: 4 models for each of the 16 NTS measures, giving 64 results in total. The regional stratification variable that produces the highest adjusted R squared offers the best precision for the given NTS measure and is indicated in the final column of the table.

Table 3: Comparison of adjusted R squared values across the 4 regional variables for each NTS measure

NTS measure	ITL2 urbrur	ITL2	GOR 13	GOR 10	Best precision
Households with at least one car or van	23.0	22.1	15.2	21.3	ITL2 urbrur
Households with at least one bike	8.0	8.1	6.0	5.7	ITL2
Adults with full car driving licence	14.4	11.8	9.4	9.5	ITL2 urbrur
Purpose of trip ‘to’ was shopping	3.9	3.1	3.8	3.0	ITL2 urbrur
Overall travelling time	18.8	17.4	16.1	17.4	ITL2 urbrur
Overall trip time	22.1	21.5	19.2	21.9	ITL2 urbrur
Trip distance	10.2	4.3	4.3	2.9	ITL2 urbrur
Number of trips	4.6	5.4	3.0	5.3	ITL2
School trip distance	13.3	0.9	0.2	0.0	ITL2 urbrur
Stages where mode of transport was walking	6.2	5.9	3.0	6.2	GOR 10
Stages where mode of transport was car	40.4	39.7	33.2	39.3	ITL2 urbrur
Stage travelling time	13.6	12.4	10.0	12.5	ITL2 urbrur
Stage distance	14.2	7.1	7.1	5.5	ITL2 urbrur
Length of stage	14.6	7.6	7.5	5.8	ITL2 urbrur
Stage travel time	10.1	7.8	6.5	7.9	ITL2 urbrur
Annual mileage	3.3	4.6	4.1	3.8	ITL2
Average for all 16 NTS measures	13.8	11.2	9.3	10.5	ITL2 urbrur

The current regional stratifier including urban-rural status (ITL2 urbrur) produced the highest adjusted R squared for 12 of the 16 NTS variables. For example, ITL2 urbrur explained 14.4% of the variation in the percentage of adults in the PSU with a full car driving licence, compared with 11.8% for ITL2 alone, 9.4% for GOR 13, and 9.5% for GOR 10.

For 3 of the 4 NTS measures where another regional variable performed better than ITL2 urbrur, the difference was less than a percentage point. The only exception is annual mileage, where ITL2 urbrur performed worst of the regional options. These results suggest that the current regional stratifier, ITL2, combined with urban-rural status, remains the optimal choice of first and second stratifiers. Hence, ITL2 urbrur was used in all subsequent analyses to select the remaining stratifiers.

3.2 Choosing Census-based stratifiers

Stepwise linear regressions were carried out to examine which of the 39 Census independent variables listed in Table 2 were most highly correlated with each NTS measure, once ITL2 urbrur had been controlled for. The Census variable selected at each step, up to a maximum of 4 variables, is shown in Table 4 below. If additional variables did not add significantly to the explanatory power of a model, then fewer than 4 were recorded. Census variables are listed in the order in which they entered the stepwise regression, so the first variable listed was added in the first step and the fourth in the fourth step.
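
A minimal sketch of forward stepwise selection with fixed controls follows. The report states only that variables were added while they contributed significantly, up to a maximum of 4; the entry rule used here, a partial F statistic of at least 3.84 (roughly the 5% critical value of F(1, d) for the large residual degrees of freedom available with several hundred PSUs), is an assumption, as are the function names. The control columns stand in for ITL2 urbrur dummies, and the data are synthetic.

```python
import numpy as np

F_TO_ENTER = 3.84  # assumed entry threshold, roughly the 5% point of F(1, d), large d


def _rss(y, design):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_stepwise(y, candidates, controls=None, max_vars=4,
                     f_to_enter=F_TO_ENTER):
    """Forward selection of named candidate columns, always keeping controls.

    y: shape (n,). candidates: dict of name -> shape (n,) array.
    controls: optional (n, k) array kept in every model, e.g. dummy
    columns for ITL2 urbrur. Returns the selected names in entry order.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    columns = [np.ones(n)]
    if controls is not None:
        columns.extend(np.asarray(controls, dtype=float).reshape(n, -1).T)
    current = np.column_stack(columns)
    remaining = {k: np.asarray(v, dtype=float) for k, v in candidates.items()}
    selected = []
    while remaining and len(selected) < max_vars:
        rss_old = _rss(y, current)
        trials = {k: np.column_stack([current, col]) for k, col in remaining.items()}
        # Lowest RSS after adding one column = largest partial F statistic.
        name = min(trials, key=lambda k: _rss(y, trials[k]))
        rss_new = _rss(y, trials[name])
        df_resid = n - trials[name].shape[1]
        f_stat = (rss_old - rss_new) / (rss_new / df_resid)
        if f_stat < f_to_enter:
            break  # best remaining candidate does not add significantly
        selected.append(name)
        current = trials[name]
        del remaining[name]
    return selected


# Synthetic PSU data: y depends on x1 and x2 plus a 3-category region effect.
rng = np.random.default_rng(42)
n = 500
x1, x2, x3, x4 = (rng.normal(size=n) for _ in range(4))
region = rng.integers(0, 3, size=n)
dummies = np.column_stack([region == 1, region == 2]).astype(float)
y = 3.0 * x1 + 1.5 * x2 + 0.5 * region + rng.normal(size=n)
chosen = forward_stepwise(y, {"x1": x1, "x2": x2, "x3": x3, "x4": x4}, dummies)
print(chosen)
```

Keeping the regional dummies in every model mirrors the report's approach of selecting Census variables only after ITL2 urbrur has been controlled for.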

Table 4: Significant Census 2021 variables included in stepwise regressions for each NTS measure (after controlling for ITL2 urbrur)

NTS measure	Census 2021 variables in stepwise regression (maximum of 4)
Households with at least one car or van	cars0, unemploy, nokids, part
Households with at least one bike	noqual, hhsize3
Adults with full car driving licence	unemploy, noqual, cars, hhsize
Purpose of trip ‘to’ was shopping	home_w, ag6574, bus_w
Overall travelling time	car_w, ownocc, ag3544
Overall trip time	privrent, car_w, ownocc, ag3544
Trip distance	semi_d, limitil, hhsize2, larent
Number of trips	hhsize1, cars2p, unemploy
School trip distance	hhsize4, privrent
Stages where mode of transport was walking	privrent, car_w, nonwhite, cars2p
Stages where mode of transport was car	cars0, noqual
Stage travelling time	privrent, ag3544, ownocc, unemploy
Stage distance	limitil, semi_d, hhsize2, larent
Length of stage	limitil, semi_d, hhsize2, larent
Stage travel time	ownocc, car_w, ag3544
Annual mileage	nonwhite, bus_w

The analysis showed a relatively strong correlation between NTS variables and Census measures of car ownership and household size. Car ownership or commuting by car measures featured in 8 of the 16 models, spread across different levels of the NTS hierarchical database, and were the best performing variable in 3 cases. Household size measures featured in 7 of the 16 models, again spread across the hierarchy, and were the best performing variable in 2 cases.

It is worth noting that while the variables were those selected for contributing most to an increase in R squared, the actual difference in the magnitude of the increase between the highest and the next several was often relatively small. That is, similar R squared values could often be attained by various combinations of variables.

This analysis demonstrates that Census measures of car ownership continue to have strong associations with NTS outcomes. However, Census measures of household size also appear frequently in the results. It was therefore decided to conduct regression tests for the final Census stratifier using two alternative stratifiers in tertile form: percentage of households with no car or van (cars0) and percentage of households containing one person (hhsize1). Cars0 was the third stratifier chosen in the 2014 stratification review, after ITL2 and urban-rural status.
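
Tertiles split the PSUs into three groups of roughly equal size using two cut points. A minimal standard-library sketch follows; the report does not state the quantile convention or how values equal to a cut point were assigned, so the `statistics.quantiles` default (the "exclusive" method) and the tie rule below are assumptions, and the input values are invented.

```python
import bisect
import statistics


def tertile_groups(values):
    """Return group labels 1, 2, 3 (low, middle, high) for each value."""
    cuts = statistics.quantiles(values, n=3)  # two cut points
    # bisect_right places a value equal to a cut point in the higher group.
    return [bisect.bisect_right(cuts, v) + 1 for v in values]


# Hypothetical PSU-level % of households with no car or van (cars0).
cars0_pct = [12.0, 35.5, 8.1, 22.4, 41.0, 17.3, 29.9, 5.6, 50.2]
print(tertile_groups(cars0_pct))  # [1, 3, 1, 2, 3, 2, 2, 1, 3]
```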

The stepwise analysis was repeated while controlling for ITL2 urbrur and tertiles of the percentage of households with no car or van. Census variables measuring car ownership (cars0, cars2p, and cars) were not included in these stepwise regressions to avoid duplication. The Census variable selected at each step, up to a maximum of 3 variables, is shown in Table 5 below. If additional variables did not add significantly to the explanatory power of a model, then fewer than 3 were recorded. Census variables are listed in the order in which they entered the stepwise regression, so the first variable listed was added in the first step and the third in the third step.

Table 5: Significant Census 2021 variables included in stepwise regressions for each NTS measure (after controlling for ITL2 urbrur and tertiles of car ownership)

NTS measure	Census 2021 variables in stepwise regression (maximum of 3)
Households with at least one car or van	ownocc, part, hhsize1
Households with at least one bike	noqual, ag014, limitil
Adults with full car driving licence	nssec12, car_w, ag014
Purpose of trip ‘to’ was shopping	noqual, ag6574, hhsize1
Overall travelling time	car_w, ownocc, ag3544
Overall trip time	car_w, ownocc, ag3544
Trip distance	semi_d, ownocc, hhsize2
Number of trips	nokids
School trip distance	privrent, hhsize2, home_w
Stages where mode of transport was walking	car_w, privrent, nonwhite
Stages where mode of transport was car	car_w, marital_m, ownocc
Stage travelling time	privrent, ownocc, ag75p
Stage distance	semi_d, ownocc
Length of stage	semi_d, ownocc
Stage travel time	ownocc, car_w, ag3544
Annual mileage	nonwhite, bus_w

The analysis showed that once ITL2 urbrur and car ownership tertiles were controlled for, the Census variables occurring most frequently in stepwise models were the percentage of owner occupier households (ownocc) and the percentage of persons aged 16 or over travelling to work by car or van (car_w). Ownocc appeared in 9 of the 16 models and car_w in 6.
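
The frequency counts quoted above can be reproduced by tallying the variables listed in Table 5, for example with `collections.Counter`. The lists below are transcribed from the table, with variable names written in lower case.

```python
from collections import Counter

# Selected variables per NTS measure, transcribed from Table 5 (in table order).
table5 = [
    ["ownocc", "part", "hhsize1"],
    ["noqual", "ag014", "limitil"],
    ["nssec12", "car_w", "ag014"],
    ["noqual", "ag6574", "hhsize1"],
    ["car_w", "ownocc", "ag3544"],
    ["car_w", "ownocc", "ag3544"],
    ["semi_d", "ownocc", "hhsize2"],
    ["nokids"],
    ["privrent", "hhsize2", "home_w"],
    ["car_w", "privrent", "nonwhite"],
    ["car_w", "marital_m", "ownocc"],
    ["privrent", "ownocc", "ag75p"],
    ["semi_d", "ownocc"],
    ["semi_d", "ownocc"],
    ["ownocc", "car_w", "ag3544"],
    ["nonwhite", "bus_w"],
]

# Count each variable at most once per model.
counts = Counter(var for model in table5 for var in set(model))
print(counts["ownocc"], counts["car_w"])  # 9 6
```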

The stepwise analysis was repeated again while controlling for ITL2 urbrur and tertiles of the percentage of households with one person. Census variables measuring household size (hhsize1, hhsize2, hhsize3, hhsize4, and hhsize) were not included in these stepwise regressions to avoid duplication. The Census variable selected at each step, up to a maximum of 3 variables, is shown in Table 6 below. If additional variables did not add significantly to the explanatory power of a model, then fewer than 3 were recorded. Census variables are listed in the order in which they entered the stepwise regression, so the first variable listed was added in the first step and the third in the third step.

Table 6: Significant Census 2021 variables included in stepwise regressions for each NTS measure (after controlling for ITL2 urbrur and tertiles of household size)

NTS measure	Census 2021 variables in stepwise regression (maximum of 3)
Households with at least one car or van	cars0, level4p, unemploy
Households with at least one bike	noqual
Adults with full car driving licence	unemploy, nssec12, cars
Purpose of trip ‘to’ was shopping	home_w, ag6574, bus_w
Overall travelling time	car_w, ownocc, ag3544
Overall trip time	privrent, car_w, ownocc
Trip distance	limitil, semi_d, ag3544
Number of trips	cars2p, nokids, ag4554
School trip distance	NO SIGNIFICANT VARIABLES
Stages where mode of transport was walking	car_w, privrent, nonwhite
Stages where mode of transport was car	cars0, noqual
Stage travelling time	privrent, ag3544, ownocc
Stage distance	limitil, semi_d, ag3544
Length of stage	limitil, semi_d, ag3544
Stage travel time	ownocc, car_w, ag3544
Annual mileage	NO SIGNIFICANT VARIABLES

The analysis showed that once ITL2 urbrur and household size tertiles were controlled for, the Census variables occurring most frequently in stepwise models were commuting measures (car_w, bus_w, home_w) and car ownership measures (cars0, cars, and cars2p). Commuting measures appeared in 5 of the 16 models and car ownership measures in 4. The variation between models for different NTS measures was greater when household size was controlled for than when car ownership was controlled for.

The results so far have indicated that retaining a fourth stratifier would be useful to improve precision, in combination with tertiles of either car ownership or household size. In order to select the optimal combination of Census stratifiers for NTS, 4 options were chosen for comparison based on the variables occurring most often in the preceding results.

3.3 Comparison of alternative stratification options

The options for NTS stratification that were compared were:

the current stratification option (Current): ITL2 urbrur, tertiles of car ownership, percentage working mainly from home
Option 1: ITL2 urbrur, tertiles of car ownership, percentage owner occupier households
Option 2: ITL2 urbrur, tertiles of car ownership, percentage travelling to work by car or van
Option 3: ITL2 urbrur, tertiles of household size, percentage of households with no car or van
Option 4: ITL2 urbrur, tertiles of household size, percentage travelling to work by car or van

Note: the stratifiers listed against each option are listed in the order that they would be used for sampling new PSUs.

Separate linear regression models were fitted for each of the NTS dependent variables and each of the 5 stratification options. For each NTS measure, the model with the highest value for adjusted R squared offers the best precision. Comparing the 5 options demonstrates whether an alternative would be an improvement on the current stratification variables and, if so, which is best.

The adjusted R squared for each regression model is given in Table 7, with the stratification option producing the highest value and thus the best precision for each NTS measure listed in the final column. When the stratification that is currently in use offers better precision than the 4 alternatives under consideration, ‘Current’ is listed in the final column.

Table 7: Comparison of adjusted R squared values across the current and 4 proposed alternative stratification options and for each NTS measure

NTS measure	Current	Option 1	Option 2	Option 3	Option 4	Best precision
Households with at least one car or van	36.2	40.0	36.1	44.6	28.6	Option 3
Households with at least one bike	11.2	9.4	10.9	9.1	9.3	Current
Adults with full car driving licence	30.6	29.8	27.1	30.0	15.7	Current
Purpose of trip ‘to’ was shopping	5.0	4.0	4.9	3.6	4.6	Current
Overall travelling time	19.9	21.6	21.6	20.6	21.7	Option 4
Overall trip time	25.2	25.9	27.1	25.9	27.0	Option 2
Trip distance	11.2	11.5	11.9	10.0	11.1	Option 2
Number of trips	8.1	7.8	8.3	7.4	7.1	Option 2
School trip distance	13.0	13.5	13.0	13.0	12.9	Option 1
Stages where mode of transport was walking	12.9	9.9	15.2	11.4	15.0	Option 2
Stages where mode of transport was car	49.4	51.1	51.8	53.7	48.8	Option 3
Stage travelling time	15.4	16.7	16.8	16.4	16.8	Option 2
Stage distance	15.0	15.5	15.5	14.1	14.6	Option 1
Length of stage	15.5	16.0	16.0	14.6	15.0	Option 1
Stage travel time	10.5	12.6	11.6	11.4	11.7	Option 1
Annual mileage	3.1	3.0	3.1	3.0	3.1	Option 2
Average for all 16 NTS measures	17.6	18.0	18.2	18.1	16.4	Option 2

The comparison above demonstrates that alternative options perform better than the stratification currently used, when looking across all 16 NTS measures. The current stratification has the highest adjusted R squared for only 3 NTS measures and the second-lowest average precision of the 5 options. Option 3 has the highest adjusted R squared for 2 NTS measures and option 4 the highest for only one. Options 1 and 2 have the highest adjusted R squared values for key diary variables at the trip and stage levels. Option 1 has the highest value for 4 NTS measures and option 2 has the highest value for 6 NTS measures. Option 2 also has the highest average precision across all 16 variables, although this is only marginally higher than options 1 and 3.

The percentage gain or loss in precision for each NTS measure when comparing each of the 4 new options with the current stratification factors is shown in Table 8 below. The stratification option offering the best improvement in precision is listed in the final column. When none of the 4 options offers an improvement on the current stratification, this is shown as ‘None’.

Table 8: Percentage change in precision (calculated from adjusted R squared values) for each of the 4 stratification options and each NTS measure, compared with current NTS stratification

NTS measure	Option 1	Option 2	Option 3	Option 4	Best improvement
Households with at least one car or van	5.9	-0.2	13.1	-11.9	Option 3
Households with at least one bike	-2.0	-0.3	-2.3	-2.1	None
Adults with full car driving licence	-1.1	-5.1	-0.8	-21.5	None
Purpose of trip ‘to’ was shopping	-1.1	-0.2	-1.5	-0.5	None
Overall travelling time	2.1	2.1	0.9	2.2	Option 4
Overall trip time	1.0	2.6	0.9	2.5	Option 2
Trip distance	0.4	0.8	-1.3	-0.1	Option 2
Number of trips	-0.4	0.2	-0.8	-1.0	Option 2
School trip distance	0.6	0.0	0.0	-0.1	Option 1
Stages where mode of transport was walking	-3.5	2.6	-1.8	2.3	Option 2
Stages where mode of transport was car	3.5	4.8	8.5	-1.2	Option 3
Stage travelling time	1.6	1.7	1.2	1.6	Option 2
Stage distance	0.6	0.6	-1.1	-0.5	Option 1
Length of stage	0.6	0.5	-1.1	-0.6	Option 1
Stage travel time	2.4	1.3	1.1	1.4	Option 1
Annual mileage	-0.1	0.0	0.0	0.0	Option 2

The above analysis demonstrates that option 2 most frequently offers the best improvement in precision compared with the current stratification, doing so for 6 of the 16 NTS measures. Option 1 offers the best improvement for 4 NTS measures, option 3 for 2, and option 4 for only 1. For 3 measures, the current stratification offers better precision than the potential alternatives under consideration.

The comparison undertaken between the current NTS stratification variables and the 4 potential alternatives has demonstrated that option 2 offers the best precision across the hierarchy of NTS measures. Option 2 includes tertiles of car ownership as the third stratifier and the percentage travelling to work by car or van (car_w) as the fourth stratifier. The other Census variable tested as a fourth stratifier in the options above is the percentage of owner occupier households (ownocc). As a final test of which is better suited to be the fourth stratifier, a correlation analysis was carried out between each of the 2 potential fourth stratifiers and the percentage of households with no car or van. The latter variable is the current third stratifier and also the third stratifier in options 1 and 2, the best performing alternatives in the comparison tests reported above.

The correlation analysis suggested that both ownocc and car_w are negatively correlated with the percentage of households with no car or van, although ownocc is more strongly correlated (Pearson’s R = -0.837) than car_w (Pearson’s R = -0.635). Because car_w overlaps less with the third stratifier, it is likely to add more information that tertiles of car ownership do not already capture. Consequently option 2 appears preferable, confirming the results presented in Tables 7 and 8 above. The current fourth stratifier, home_w, and the proposed alternative, car_w, are both measures of travel to work derived from the same Census question. Option 2 therefore represents a relatively minor change to the current NTS stratification, but the analysis nonetheless shows it to be the preferable alternative.

As explained in section 2.4, responses to the commuting question in the 2021 Census were affected by the lockdown restrictions in place at the time. The recorded level of working from home was relatively high, so the proportion of people commuting by any mode was lower in March 2021 than in March 2011. Despite this, the Census 2021 car commuting measure offers the best precision as the fourth stratification variable when compared with the 4 other options tested. To check the impact of lockdown restrictions on car_w, the variables car_w and home_w from Census 2011 and 2021 were compared after matching onto NTS 2013 and NTS 2023 PSUs in the respective stratification reviews. For the percentage commuting by car, the variable distributions, means, and standard deviations are reasonably similar for NTS 2013 PSUs with Census 2011 data and NTS 2023 PSUs with Census 2021 data: the mean was 40% commuting by car for NTS 2013 PSUs and 48% for NTS 2023 PSUs. These similarities suggest that even during lockdown restrictions, the 2021 Census captured geographical variation in levels of car commuting among the smaller base of people not working from home at that time. By contrast, the distributions, means, and standard deviations of the percentage working from home differ much more between the two sets of PSUs: the mean for NTS 2013 PSUs was 3.5% working from home with a standard deviation of 2.1, whereas the mean for NTS 2023 PSUs was 31% with a standard deviation of 12. The regression tests conducted in this review demonstrate that the percentage commuting by car from Census 2021 is significantly associated with NTS 2023 measures, even though the lockdown restrictions in place during Census 2021 had been lifted before NTS 2023 data collection began.

Question 4

Chapter 4: Conclusions and recommendations

Accepted Answer

This report considered whether the current set of stratifiers used to select the NTS sample is optimal, and if not, whether a new set would achieve sufficient gains in precision of estimates to warrant changes. The analysis used 16 key NTS 2023 survey measures and compared competing stratification strategies, based on a combination of geographical and Census 2021 measures, by fitting linear regression models at the area level and measuring the amount of variance explained.
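
The variance-explained comparison described above can be sketched as follows, assuming ordinary least squares at PSU level, with categorical stratifiers dummy-coded and adjusted R squared computed as 1 - (1 - R²)(n - 1)/(n - k), where k counts the intercept plus the predictor columns. The helper names and the synthetic data are hypothetical; this illustrates the general technique and is not the review’s own code.

```python
import numpy as np


def dummy_code(labels):
    """One-hot encode a categorical stratifier, dropping the first (sorted) level as the reference."""
    levels = sorted(set(labels))
    return np.array([[1.0 if lab == lev else 0.0 for lev in levels[1:]] for lab in labels])


def adjusted_r2(y, X):
    """Fit y on X by ordinary least squares (an intercept is added here) and return adjusted R squared."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    n, k = X.shape  # k counts the intercept plus every predictor column
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# Synthetic, hypothetical PSU-level data: not NTS or Census values.
rng = np.random.default_rng(0)
n = 200
region = rng.choice(["R1", "R2", "R3"], size=n).tolist()
urbrur = rng.choice(["rural", "urban"], size=n).tolist()
region_effect = {"R1": 0.0, "R2": 1.0, "R3": 2.0}
y = np.array([region_effect[r] + (3.0 if u == "urban" else 0.0) for r, u in zip(region, urbrur)])
y = y + rng.normal(0.0, 1.0, size=n)

X_region = dummy_code(region)
X_region_urbrur = np.column_stack([X_region, dummy_code(urbrur)])
print(f"region only:     {adjusted_r2(y, X_region):.3f}")
print(f"region + urbrur: {adjusted_r2(y, X_region_urbrur):.3f}")
```

Adding a stratifier that genuinely explains variation raises adjusted R squared, while the (n - k) term penalises extra predictors that add little.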

The stratification variables recommended to be used for selection of the NTS sample based on the results of this review are shown in Table 9 below.

Table 9: NTS stratification recommendations based on the review

Order	Variable	Change to current stratification?
1st	International Territorial Level 2	No
2nd	Urban-rural indicator (2 groups), derived from the ten-category urban-rural Census 2011 classification	No
3rd	Percentage of households with no cars (tertiles), derived from the 2021 Census	No
4th	Percentage of people commuting by car (continuous), derived from the 2021 Census	Yes
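
Table 9 combines categorical stratifiers with a continuous fourth stratifier. One common way to use a continuous variable of this kind is as the final sort key when ordering the sampling frame before systematic selection (implicit stratification). The selection mechanics are not set out in this section, so the sketch below is an assumption throughout: the PSU field names, the national tertile cut points and the ascending car_w order are hypothetical choices made for illustration.

```python
import statistics
from dataclasses import dataclass


@dataclass
class PSU:
    # Hypothetical fields for illustration; not the NTS sampling frame layout.
    psu_id: str
    itl2: str          # 1st stratifier: ITL2 region
    urban: bool        # 2nd stratifier: urban-rural indicator (2 groups)
    pct_no_car: float  # source of the 3rd stratifier: % households with no cars
    car_w: float       # 4th stratifier: % commuting by car (continuous)


def tertile_cutpoints(values):
    """Two cut points splitting values into thirds (statistics.quantiles, default 'exclusive' method)."""
    low, high = statistics.quantiles(values, n=3)
    return low, high


def tertile(value, cuts):
    """Assign 1, 2 or 3 according to the tertile cut points."""
    low, high = cuts
    return 1 if value <= low else (2 if value <= high else 3)


def order_frame(psus):
    """Sort PSUs by ITL2, urban-rural, no-car tertile, then car_w within each group.

    Assumes tertiles are computed nationally across all PSUs; rural (False) sorts
    before urban (True) and car_w is ascending. Both orders are illustrative.
    """
    cuts = tertile_cutpoints([p.pct_no_car for p in psus])
    return sorted(psus, key=lambda p: (p.itl2, p.urban, tertile(p.pct_no_car, cuts), p.car_w))


# Hypothetical PSUs (invented values, not Census data)
psus = [
    PSU("a", "R1", True, 40.0, 30.0),
    PSU("b", "R1", True, 10.0, 60.0),
    PSU("c", "R1", False, 25.0, 55.0),
    PSU("d", "R2", True, 35.0, 40.0),
    PSU("e", "R1", True, 12.0, 50.0),
    PSU("f", "R2", False, 20.0, 65.0),
]
print([p.psu_id for p in order_frame(psus)])
```

A systematic sample taken down the ordered list then spreads selected PSUs across the range of car_w within each explicit stratum.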

The 2024 stratification review outcomes for each stratifier are summarised below.

1st stratifier: The current first stratifier, International Territorial Level 2 (ITL2, formerly NUTS2), remains the optimal choice of regional variable. When compared with the 10- and 13-category GOR variables, ITL2 offered the highest precision for 8 of the 16 NTS measures tested, as well as higher average precision across all 16 NTS measures.

2nd stratifier: Combining ITL2 with urban-rural status produced the highest adjusted R squared for 12 of the 16 NTS variables when compared with ITL2 alone and with the 10- or 13-category GOR variables. For 3 of the 4 variables where an alternative regional variable performed better, the difference was less than 1 percentage point. Including urban-rural status as the second stratifier therefore improves precision, and it remains the optimal second stratifier. The 2021 Census urban-rural classification will be used when it becomes available.

3rd stratifier: NTS variables are relatively strongly correlated with Census measures of both car ownership and household size. Car ownership or commuting by car measures featured in 8 of the 16 models, spread across different levels of the NTS hierarchical database, and were the best-performing variable in 3 cases. Household size measures featured in 7 of the 16 models, again spread across the hierarchy, and were the best-performing variable in 2 cases. Two stratification options using car ownership tertiles, and 2 using household size tertiles, as the first non-geographical Census-based stratifier were compared. The car ownership options offered better precision than the household size options. It therefore seems sensible to retain the percentage of households with no car or van, in tertiles, as the third stratifier, both for precision and for continuity.

4th stratifier: The combination of car ownership tertiles and the percentage of people travelling to work by car (car_w) achieved the best gain in precision over the current stratification for 6 of the 16 NTS measures. This option also offered the highest average precision of the 4 alternatives. Therefore car_w appears to be the best choice for the fourth and final stratifier, in combination with car ownership.
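
The comparison behind this recommendation, counting the measures for which each option gives the largest gain over the current stratification and averaging adjusted R squared across measures, can be sketched as below. It assumes percentage change is computed as 100 × (alternative - current) / current; the function names and all numbers are hypothetical and are not values from Tables 7 or 8.

```python
from statistics import mean


def pct_change(alt, current):
    """Percentage change in adjusted R squared for each measure, relative to the current option.

    Assumes every current value is positive; a zero or negative baseline makes this undefined or misleading.
    """
    return [100.0 * (a - c) / c for a, c in zip(alt, current)]


def summarise(options, current):
    """Return {option: (measures where it gives the largest change, mean adjusted R squared)}."""
    gains = {name: pct_change(vals, current) for name, vals in options.items()}
    wins = {name: 0 for name in options}
    for i in range(len(current)):
        # The option with the largest change is counted even if that change is negative.
        # Ties go to the first option listed, because max() keeps the first maximum.
        best = max(gains, key=lambda name: gains[name][i])
        wins[best] += 1
    return {name: (wins[name], mean(vals)) for name, vals in options.items()}


# Hypothetical adjusted R squared values for 3 measures (not figures from this review).
current = [0.20, 0.40, 0.10]
options = {
    "option_1": [0.22, 0.41, 0.10],
    "option_2": [0.21, 0.45, 0.12],
}
for name, (wins, avg) in summarise(options, current).items():
    print(f"{name}: largest gain for {wins} of {len(current)} measures, mean adjusted R squared {avg:.3f}")
```

Counting wins and averaging capture different things: an option can lead on a few measures yet trail on average, which is why both were considered.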

This recommendation represents a minor change to the current NTS stratification. Only for the fourth and final stratifier is a different variable being recommended.

National Travel Survey: 2023 Sampling Stratification Review

Chapter 1: Introduction and Background

1.1 Sampling and stratification

Chapter 2: Methodology

2.1 Introduction

2.2 NTS measures (dependent variables)

Table 1: Key variables from NTS 2023 used in stratification review regression tests

2.3 Regional classifications (independent variables)

2.4 Census measures (independent variables)

Table 2: Variables from the 2021 Census used in stratification review regression tests

2.5 Regression analysis

Figure 1: formula used to calculate percentage change in precision for a change in stratification variables

Chapter 3: Analysis and results

3.1 Choosing a regional stratification variable

Table 3: Comparison of adjusted R squared values across the 4 regional variables for each NTS measure

3.2 Choosing Census-based stratifiers

Table 4: Significant Census 2021 variables included in stepwise regressions for each NTS measure (after controlling for ITL2 urbrur)

Table 5: Significant Census 2021 variables included in stepwise regressions for each NTS measure (after controlling for ITL2 urbrur and tertiles of car ownership)

Table 6: Significant Census 2021 variables included in stepwise regressions for each NTS measure (after controlling for ITL2 urbrur and tertiles of household size)

3.3 Comparison of alternative stratification options

Table 7: Comparison of adjusted R squared values across the current and 4 proposed alternative stratification options and for each NTS measure

Table 8: Percentage change in adjusted R squared values across the 4 stratification options and for each NTS measure, compared with current NTS stratification

Chapter 4: Conclusions and recommendations

Table 9: NTS stratification recommendations based on the review
