National Travel Survey: 2023 Sampling Stratification Review
Updated 28 August 2024
Chapter 1: Introduction and Background
The National Travel Survey (NTS) provides up-to-date and regular information about personal travel within England and monitors trends in travel behaviour.
The survey collects detailed information on the key characteristics of each participating household and any vehicles to which its members have access. In addition, each individual within the household is interviewed and then asked to complete a 7-day travel record. The survey produces a rich dataset for analysis with information recorded at a number of different levels (household, individual, vehicle, long distance journey, day, trip and stage).
1.1 Sampling and stratification
The NTS 2023 was designed to provide a representative sample of households in England. It is based on a stratified 2-stage random probability sample of private households. The sample was drawn firstly by selecting the Primary Sampling Units (PSUs), and then by selecting addresses within PSUs. The sample design employs postcode sectors as PSUs. In 2023, 989 PSUs were issued, with 22 addresses selected from each PSU, giving a total of 21,758 issued addresses.
The survey has used a quasi-panel design from 2002 onwards in which half the PSUs in the sample for a given year are retained for the next year’s sample and the other half are replaced. This has the effect of reducing the variance of estimates of year-on-year change. As the NTS sample size was increased in 2023, the proportion of PSUs retained from 2022 was less than half of the total. The 2023 issued sample included 324 retained PSUs, 33% of the total issued.
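The issued-sample figures quoted in this section can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Figures stated in section 1.1 for the NTS 2023 issued sample.
issued_psus = 989
addresses_per_psu = 22
retained_psus = 324  # PSUs carried over from the 2022 sample

issued_addresses = issued_psus * addresses_per_psu
retained_share = 100 * retained_psus / issued_psus

print(f"Issued addresses: {issued_addresses:,}")          # Issued addresses: 21,758
print(f"Retained PSUs: {retained_share:.1f}% of issued")  # Retained PSUs: 32.8% of issued
```

The retained share of 32.8% (rounded to 33% in the text) is consistent with the statement that the enlarged 2023 sample retained less than half of its PSUs from 2022.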
In order to draw the PSUs, a list of all postcode sectors in England was generated excluding those in the Isles of Scilly due to the cost of interviewing. Sectors carried over from the previous year were also excluded. Sectors with fewer than 500 delivery points were grouped with an adjacent sector. Grouped sectors were then treated as one PSU. On average, each PSU contained about 2,900 delivery points.
This list of grouped postcode sectors in England was sorted by a regional variable (International Territorial Level 2), urban-rural status within regional categories, tertiles of area-level car ownership from the 2011 Census, and area-level percentage of people aged 16 to 74 working mainly at home from the 2011 Census. This sorting was done in order to increase the precision of the sample and to ensure that the different strata in the population are correctly represented. Random samples of PSUs were then selected with probability proportional to delivery point count. Separate sample targets were set for Inner London, Outer London, and the rest of England, in order to oversample London due to historically lower response rates there.
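The text describes sorting the frame of grouped postcode sectors by the stratifiers and then selecting PSUs with probability proportional to delivery point count. One standard way to combine the two steps is systematic PPS selection down the sorted list, which spreads the sample across strata in proportion to their size (implicit stratification). The report does not spell out its selection routine, so the sketch below is an assumption, as are the hypothetical helper `systematic_pps`, the field names (`region`, `urbrur`, `car_tertile`, `home_w`, `delivery_points`) and the toy frame; it also ignores the separate London sample targets.

```python
import random


def systematic_pps(frame, n_select, size_key, sort_keys, seed=None):
    """Hypothetical helper: systematic PPS selection from a frame sorted by its stratifiers.

    Sorting first means one systematic pass through the cumulative sizes spreads
    the selections across the strata (implicit stratification). A unit larger than
    the sampling interval can be hit more than once; a real design would treat
    such units separately.
    """
    rng = random.Random(seed)
    ordered = sorted(frame, key=lambda unit: tuple(unit[k] for k in sort_keys))
    total = sum(unit[size_key] for unit in ordered)
    interval = total / n_select
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + i * interval for i in range(n_select)]

    selected, cumulative, t = [], 0, 0
    for unit in ordered:
        cumulative += unit[size_key]
        # Select the unit once for every target point inside its size band.
        while t < n_select and targets[t] < cumulative:
            selected.append(unit)
            t += 1
    return selected


# Toy frame of grouped postcode sectors (hypothetical values).
frame = [
    {"psu": "P1", "region": "R1", "urbrur": 1, "car_tertile": 1, "home_w": 3.0, "delivery_points": 2600},
    {"psu": "P2", "region": "R1", "urbrur": 2, "car_tertile": 3, "home_w": 5.5, "delivery_points": 3100},
    {"psu": "P3", "region": "R2", "urbrur": 1, "car_tertile": 2, "home_w": 4.2, "delivery_points": 2900},
    {"psu": "P4", "region": "R2", "urbrur": 1, "car_tertile": 1, "home_w": 2.1, "delivery_points": 3400},
    {"psu": "P5", "region": "R2", "urbrur": 2, "car_tertile": 3, "home_w": 6.8, "delivery_points": 2500},
    {"psu": "P6", "region": "R3", "urbrur": 1, "car_tertile": 2, "home_w": 3.9, "delivery_points": 2800},
]
sample = systematic_pps(frame, n_select=3, size_key="delivery_points",
                        sort_keys=("region", "urbrur", "car_tertile", "home_w"), seed=2023)
print([unit["psu"] for unit in sample])
```

Because selection points are a fixed interval apart, the number of PSUs drawn from any contiguous block of the sorted list differs from its expected share by less than one, which is how the sort order delivers the precision gains of stratification.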
In 2014, the National Centre for Social Research (NatCen) carried out a comprehensive review of the NTS sample stratification, using data from NTS 2013 and Census 2011, to examine whether the stratifiers used were optimal. This work concluded that the stratification described above was the best option, and it has been used for the NTS throughout the subsequent decade. In 2024, an equivalent review was undertaken using data from NTS 2023 and Census 2021 to determine whether the current stratification remains optimal and, if not, which option offers gains in the precision of NTS estimates.
Chapter 2: Methodology
2.1 Introduction
The variables used to stratify the sampling of new PSUs were last reviewed by NatCen in 2014 by identifying a range of key estimates from the survey and then measuring the strength of the association between those estimates and a range of combinations of potential stratification variables.
Stratification improves precision most when the stratification variables are strongly correlated with the outcome measures. The aim was thus to identify the combination of potential stratification variables that were most strongly associated with key survey measures. This was done by fitting a series of regression models which measured the proportion of the variance explained when the potential stratification variables were added to the model. The conclusion of the review was that International Territorial Level 2 (ITL2), urban-rural status, tertiles of car ownership, and percentage of people working mainly at home were the best stratification variables.
Note: ITL2 was known as Nomenclature of Units for Territorial Statistics (NUTS2) at the time of the last review.
The aim of this report is to repeat the methodology that was previously used. This section explains the main elements of the methodology. The first step was to identify a range of key NTS survey measures; competing stratification strategies were then compared by fitting regression models at the area level and measuring the amount of variance explained.
2.2 NTS measures (dependent variables)
Optimal stratifiers for one variable are likely to be different from those for another, because the correlation between survey variables and stratification factors will be different for each survey variable. Choosing optimal stratifiers therefore requires a compromise between the optimal solutions for a range of variables across different levels of the NTS database.
This report largely uses the same set of NTS variables which were included in the previous review in 2014, which used data from NTS 2013. In two instances where a variable was no longer available in NTS 2023, a similar variable was chosen. Table 1 below lists the 16 NTS variables analysed in this review and the level at which they were collected.
Table 1: Key variables from NTS 2023 used in stratification review regression tests
Level | Variable Name | Description | Statistic used |
---|---|---|---|
Household | NumCarVan_B01ID | Households with at least one car or van | Percentage |
Household | NumBike_B01ID | Households with at least one bike | Percentage |
Individual | DrivLic_B02ID | Adults with full car driving licence | Percentage |
Trip | TripPurpTo_B02ID | Purpose of trip ‘to’ was shopping | Percentage |
Trip | TripTravTime | Overall travelling time | Mean |
Trip | jotxsc | Overall trip time | Mean |
Trip | jd | Trip distance | Mean |
Trip | jjxsc | Number of trips | Mean |
Trip | TripPurpTo_B02ID | School trip distance | Mean |
Stage of trip | StageMode_B01ID | Stages where mode of transport was walking | Percentage |
Stage of trip | StageMode_B01ID | Stages where mode of transport was car | Percentage |
Stage of trip | sttxsc | Stage travelling time | Mean |
Stage of trip | sd | Stage distance | Mean |
Stage of trip | stagedistance | Length of stage | Mean |
Stage of trip | stagetime | Stage travel time | Mean |
Vehicle | vehanmileage | Annual mileage | Mean |
The variables listed in the table were used as dependent variables in the regression models, aggregated to the postcode sector (PSU) level. For example, the variable indicating whether an adult was in possession of a full car driving licence was aggregated to estimate the percentage of adults holding a full driving licence within each PSU.
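The aggregation step can be sketched as follows: each PSU's percentage is the share of its records with the attribute, and each mean is the average over its records. The record layout and field names (`psu`, `full_licence`, `trip_distance`) are hypothetical, and the sketch is unweighted; the report does not say whether survey weights were applied when aggregating to PSU level.

```python
from collections import defaultdict


def psu_percentage(records, flag_key, psu_key="psu"):
    """Percentage of records in each PSU for which flag_key is true."""
    totals, hits = defaultdict(int), defaultdict(int)
    for rec in records:
        totals[rec[psu_key]] += 1
        hits[rec[psu_key]] += 1 if rec[flag_key] else 0
    return {psu: 100.0 * hits[psu] / totals[psu] for psu in totals}


def psu_mean(records, value_key, psu_key="psu"):
    """Mean of value_key over the records in each PSU."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        sums[rec[psu_key]] += rec[value_key]
        counts[rec[psu_key]] += 1
    return {psu: sums[psu] / counts[psu] for psu in counts}


adults = [
    {"psu": "A", "full_licence": True},
    {"psu": "A", "full_licence": False},
    {"psu": "B", "full_licence": True},
]
trips = [
    {"psu": "A", "trip_distance": 2.0},
    {"psu": "A", "trip_distance": 4.0},
    {"psu": "B", "trip_distance": 7.5},
]
print(psu_percentage(adults, "full_licence"))  # {'A': 50.0, 'B': 100.0}
print(psu_mean(trips, "trip_distance"))        # {'A': 3.0, 'B': 7.5}
```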
2.3 Regional classifications (independent variables)
Four regional-level classifications were considered in the analysis, namely:
- the current NTS regional stratification variable based on ITL2 areas and 2-category urban-rural status, 52 categories in total (variable name: ITL2 urbrur)
- International Territorial Level 2 areas, 30 categories in total (variable name: ITL2)
- Government Office Region, with London split into inner and outer regions, 10 categories in total (variable name: GOR 10)
- Government Office Region, with North East, North West and Merseyside, Yorkshire and Humberside, and West Midlands split into metropolitan and non-metropolitan regions, 13 categories in total (variable name: GOR 13)
2.4 Census measures (independent variables)
The 2021 Census provides a wide range of variables at local area (LSOA) level that could be used to stratify a sample of postcode sectors. A comprehensive selection of options was chosen, covering a range of demographic characteristics and methods of commuting to work. Both individual-level and household-level Census variables were included. Table 2 below shows the 39 variables which were chosen to be tested as potential stratifiers, based on the previous stratification review in 2014.
Table 2: Variables from the 2021 Census used in stratification review regression tests
Variable name | Description |
---|---|
popdens | population density (persons per hectare) |
ag014 | % Persons aged 0 to 14 |
ag1524 | % Persons aged 15 to 24 |
ag2534 | % Persons aged 25 to 34 |
ag3544 | % Persons aged 35 to 44 |
ag4554 | % Persons aged 45 to 54 |
ag5564 | % Persons aged 55 to 64 |
ag6574 | % Persons aged 65 to 74 |
ag75p | % Persons aged 75 or over |
nonwhite | % Persons of an ethnicity other than white |
marital_m | % Adults married |
marital_nm | % Adults not married |
ownocc | % Owner occupier households |
larent | % Households rented from council |
privrent | % Privately rented households |
semi_d | % Households semi-detached |
nssec12 | % Persons aged 16 or over in NS-SEC categories 1 and 2 |
retire | % Persons aged 16 or over who are retired |
active | % Persons aged 16 or over economically active |
unemploy | % Persons aged 16 or over unemployed |
inactive | % Persons aged 16 or over economically inactive |
full | % Economically active working full time |
part | % Economically active working part time |
limitil | % Persons with limiting long-term illness |
noqual | % Adults with no qualifications |
level4p | % Adults with level 4 qualifications and above |
hhsize1 | % Households with 1 person |
hhsize2 | % Households with 2 persons |
hhsize3 | % Households with 3 persons |
hhsize4 | % Households with 4 or more persons |
hhsize | Average household size |
nokids | % Households with no dependent children |
cars0 | % Households with no car or van |
cars2p | % Households with 2 or more cars or vans |
cars | Average number of cars per household |
home_w | % Persons aged 16 or over working mainly at home in March 2021 |
train_w | % Persons aged 16 or over travelling to work by train, underground, metro, light rail, tram |
car_w | % Persons aged 16 or over travelling to work by car or van |
bus_w | % Persons aged 16 or over travelling to work by bus, minibus, coach |
Prior to conducting this review, during the NTS 2024 sampling, a comparison was made between home_w (% Persons aged 16 or over working mainly at home) data collected in the 2011 and 2021 censuses. The 2021 Census took place in March and coincided with a period of coronavirus (COVID-19) lockdown restrictions. Consequently, the overall proportion of people working from home in March 2021 was much higher than in 2011. Aggregated to postcode sector level, the average percentage of persons working from home in England was 3.9% in the 2011 Census and 31.7% in the 2021 Census. In addition, the 2021 variable has a much higher standard deviation. While home_w has nonetheless been included in the stratification review, it is important to note that it captured behaviour in March 2021 and is unlikely to measure levels of working from home accurately when no lockdown restrictions are in place.
2.5 Regression analysis
An analysis dataset at the PSU level was compiled by matching together the survey estimates listed in section 2.2, the regional classification variables listed in section 2.3, and the 2021 Census variables listed in section 2.4 using postcode sectors. This dataset was used to run separate stepwise linear regression models with each of the 16 key NTS measures as the dependent variable and all possible stratification options (geographical and Census-based) as independent variables.
Models were compared on the basis of their adjusted multiple coefficient of determination, known as adjusted R squared. This measures the percentage of variance in the dependent variable accounted for by the variables in the regression model. It applies a degrees-of-freedom adjustment when estimating the error variance, which makes it a useful measure for comparing models with different numbers of independent variables.
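The degrees-of-freedom adjustment is usually written as 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of PSUs and p the number of predictors; a categorical variable with k categories enters the model as k - 1 dummy predictors. The sketch below uses a hypothetical n of 900 PSUs to show that the same raw R squared of 0.20 is penalised more for the 52-category ITL2 urbrur variable than for a 3-predictor model.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R squared for a model with p predictors fitted to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


n_psus = 900  # hypothetical number of PSUs in the analysis dataset
print(round(adjusted_r_squared(0.20, n_psus, 3), 4))   # 3 predictors: 0.1973
print(round(adjusted_r_squared(0.20, n_psus, 51), 4))  # 52 categories -> 51 dummies: 0.1519
```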
All else being equal, the 2 Census variables appearing most often in the final models (that is, those which explain most of the variability in NTS measures in combination with the chosen regional first stratifier) are in theory the best choices and should be considered as the subsequent stratifiers.
To examine the extent of any gain in precision, the adjusted R squared values from 2 models (one containing the proposed stratifiers and one containing the existing ones) were compared. The percentage change in precision achieved by the proposed stratifiers was computed using the following formula:
Figure 1: formula used to calculate percentage change in precision for a change in stratification variables
(Alt text for Figure 1): The formula states that percentage gain in precision is equal to adjusted R squared for the new stratifiers minus adjusted R squared for the old stratifiers, divided by 1 minus adjusted R squared for the old stratifiers, multiplied by 100.
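The Figure 1 formula can be written as a small function taking adjusted R squared values in percent, as they appear in Table 7. One way to read it, assuming proportionate allocation so that sampling variance is roughly proportional to the unexplained share 1 - R²: the result equals 100 × (1 - (1 - R²new)/(1 - R²old)), the percentage reduction in that unexplained variance. The example uses the "Households with at least one car or van" row of Table 7.

```python
def precision_gain(adj_r2_new, adj_r2_old):
    """Percentage gain in precision (Figure 1); inputs are adjusted R squared in percent."""
    new, old = adj_r2_new / 100.0, adj_r2_old / 100.0
    return (new - old) / (1.0 - old) * 100.0


# "Households with at least one car or van" row of Table 7 (current = 36.2).
current = 36.2
for label, value in [("Option 1", 40.0), ("Option 3", 44.6), ("Option 4", 28.6)]:
    print(label, round(precision_gain(value, current), 1))
```

Recomputing from the rounded Table 7 values gives 6.0, 13.2 and -11.9, against 5.9, 13.1 and -11.9 in Table 8; the small differences presumably arise because Table 8 was calculated from unrounded R squared values.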
The results of the analysis undertaken are presented in the next section.
Chapter 3: Analysis and results
3.1 Choosing a regional stratification variable
The first 2 NTS stratification variables comprise a regional variable based on ITL2 areas and urban-rural status within each regional category (ITL2 urbrur). The previous review in 2014 considered 3 alternative regional stratification options, and these were also tested in the present review to check that ITL2 with urban-rural status is still the optimal choice. The first alternative regional variable was ITL2 (previously known as NUTS2) without urban-rural status. The other 2 were based on Government Office Region with differing numbers of categories, one with inner and outer London split (GOR 10) and one with a metropolitan and non-metropolitan split (GOR 13). The first stage of the review compared these regional measures to determine whether ITL2 still performs better than the 2 forms of GOR. It also tested whether the addition of urban-rural status within regions still improves precision compared with region alone. The Census 2011 urban-rural classification was used in this review because the Census 2021 version had not yet been released.
Separate linear regression models were fitted for each of the NTS dependent variables and each of the 4 regional categorical independent variables. For each NTS measure, the model with the highest adjusted R squared offers the best precision. As was the case in the previous 2014 NTS stratification review, R squared is a biased estimate of the true R squared, because sampling households within PSUs inflates the variance between area means; in other words, a portion of the area-level variance represents variance between households within the same area. Although this affects the absolute R squared values, the comparison between different models is not affected, as all tests are at PSU level.
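Regressing a PSU-level measure on a categorical regional variable (one dummy per category, less a reference category) predicts each PSU by its category mean, so R squared equals the between-category sum of squares divided by the total sum of squares. The sketch below computes this directly; the values and category labels are hypothetical, not NTS data.

```python
from collections import defaultdict


def categorical_r_squared(values, categories):
    """R squared and adjusted R squared from regressing values on one categorical variable."""
    n = len(values)
    grand_mean = sum(values) / n
    groups = defaultdict(list)
    for value, category in zip(values, categories):
        groups[category].append(value)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # The fitted value for each PSU is its category mean, so the explained
    # sum of squares is the between-category sum of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    r2 = ss_between / ss_total
    p = len(groups) - 1  # dummy variables after dropping a reference category
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2


# Hypothetical PSU-level percentages and regional labels.
values = [61.0, 63.0, 65.0, 67.0]
regions = ["North urban", "North urban", "South rural", "South rural"]
print(categorical_r_squared(values, regions))
```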
The adjusted R squared for each regression model is given in Table 3, consisting of 4 models run for each NTS measure with a total of 64 results. The regional stratification variable that produces the highest R squared offers the best precision for the given NTS measure. This best precision variable is indicated in the final column of the table.
Table 3: Comparison of adjusted R squared values across the 4 regional variables for each NTS measure
NTS measure | ITL2 urbrur | ITL2 | GOR 13 | GOR 10 | Best precision |
---|---|---|---|---|---|
Households with at least one car or van | 23.0 | 22.1 | 15.2 | 21.3 | ITL2 urbrur |
Households with at least one bike | 8.0 | 8.1 | 6.0 | 5.7 | ITL2 |
Adults with full car driving licence | 14.4 | 11.8 | 9.4 | 9.5 | ITL2 urbrur |
Purpose of trip ‘to’ was shopping | 3.9 | 3.1 | 3.8 | 3.0 | ITL2 urbrur |
Overall travelling time | 18.8 | 17.4 | 16.1 | 17.4 | ITL2 urbrur |
Overall trip time | 22.1 | 21.5 | 19.2 | 21.9 | ITL2 urbrur |
Trip distance | 10.2 | 4.3 | 4.3 | 2.9 | ITL2 urbrur |
Number of trips | 4.6 | 5.4 | 3.0 | 5.3 | ITL2 |
School trip distance | 13.3 | 0.9 | 0.2 | 0.0 | ITL2 urbrur |
Stages where mode of transport was walking | 6.2 | 5.9 | 3.0 | 6.2 | GOR 10 |
Stages where mode of transport was car | 40.4 | 39.7 | 33.2 | 39.3 | ITL2 urbrur |
Stage travelling time | 13.6 | 12.4 | 10.0 | 12.5 | ITL2 urbrur |
Stage distance | 14.2 | 7.1 | 7.1 | 5.5 | ITL2 urbrur |
Length of stage | 14.6 | 7.6 | 7.5 | 5.8 | ITL2 urbrur |
Stage travel time | 10.1 | 7.8 | 6.5 | 7.9 | ITL2 urbrur |
Annual mileage | 3.3 | 4.6 | 4.1 | 3.8 | ITL2 |
Average for all 16 NTS measures | 13.8 | 11.2 | 9.3 | 10.5 | ITL2 urbrur |
The current regional stratifier including urban-rural status (ITL2 urbrur) produced the highest R squared for 12 of the 16 NTS variables. For example, ITL2 urbrur explained 14.4% of the variation in the percentage of adults in the PSU with a full car driving licence, compared with 11.8% for ITL2 alone, 9.4% for GOR 13, and 9.5% for GOR 10.
For 3 of the 4 NTS measures where another regional variable performed better than ITL2 urbrur, the difference was less than one percentage point. The only exception is annual mileage, where ITL2 urbrur performed worst of the regional options. These results suggest that the current regional stratifier, ITL2, along with urban-rural status, remains the optimal choice of first and second stratifiers. Hence, ITL2 urbrur was used in all subsequent analyses to select the remaining stratifiers.
3.2 Choosing Census-based stratifiers
Stepwise linear regressions were carried out to examine which of the 39 Census independent variables listed in Table 2 were most highly correlated with each NTS measure once ITL2 urbrur had been controlled for. The Census variable selected at each step, up to a maximum of 4 variables, is shown in Table 4 below. If additional variables did not add significantly to the explanatory power of a model, fewer than 4 were recorded. Census variables are listed in the order in which they entered the stepwise regression, so the first variable listed was added in the first step and the fourth in the fourth step.
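The stepwise procedure can be sketched as greedy forward selection: a set of base columns (here standing in for the ITL2 urbrur dummies) stays in every model, and at each step the candidate giving the largest increase in adjusted R squared is added, up to 4 variables. The report's stopping rule is statistical significance; this sketch substitutes a minimum gain in adjusted R squared (the hypothetical `min_gain` parameter), and it solves ordinary least squares in plain Python so that it runs without external libraries. In practice a statistics package would be used, and the helper names and data here are illustrative only.

```python
def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R squared from OLS of y on an intercept plus the given predictor columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_stepwise(base, candidates, y, max_steps=4, min_gain=0.005):
    """Greedy forward selection; `base` columns (e.g. region dummies) are always kept."""
    n = len(y)
    cols, chosen = list(base), []
    current = adjusted(r_squared(cols, y), n, len(cols))
    for _ in range(max_steps):
        best_name, best_adj = None, current
        for name, col in candidates.items():
            if name in chosen:
                continue
            try:
                adj = adjusted(r_squared(cols + [col], y), n, len(cols) + 1)
            except ValueError:
                continue  # skip a candidate collinear with the current model
            if adj > best_adj:
                best_name, best_adj = name, adj
        if best_name is None or best_adj - current < min_gain:
            break
        chosen.append(best_name)
        cols.append(candidates[best_name])
        current = best_adj
    return chosen


# Hypothetical data: y depends exactly on x3 once the base column x1 is held fixed.
x1 = [float(i) for i in range(12)]
x2 = [float((5 * i) % 7) for i in range(12)]
x3 = [float((3 * i) % 4) for i in range(12)]
y = [1.0 + 0.5 * a + 2.0 * c for a, c in zip(x1, x3)]
print(forward_stepwise([x1], {"x2": x2, "x3": x3}, y))  # ['x3']
```

With real data, `base` would hold the dummy columns for the regional stratifier and `candidates` the Census variables from Table 2, one column per PSU.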
Table 4: Significant Census 2021 variables included in stepwise regressions for each NTS measure (after controlling for ITL2 urbrur)
NTS measure | Census 2021 variables in stepwise regression (maximum of 4) |
---|---|
Households with at least one car or van | cars0, unemploy, nokids, part |
Households with at least one bike | noqual, hhsize3 |
Adults with full car driving licence | unemploy, noqual, cars, hhsize |
Purpose of trip ‘to’ was shopping | home_w, ag6574, bus_w |
Overall travelling time | car_w, ownocc, ag3544 |
Overall trip time | privrent, car_w, ownocc, ag3544 |
Trip distance | semi_d, limitil, hhsize2, larent |
Number of trips | hhsize1, cars2p, unemploy |
School trip distance | hhsize4, privrent |
Stages where mode of transport was walking | privrent, car_w, nonwhite, cars2p |
Stages where mode of transport was car | cars0, noqual |
Stage travelling time | privrent, ag3544, ownocc, unemploy |
Stage distance | limitil, semi_d, hhsize2, larent |
Length of stage | limitil, semi_d, hhsize2, larent |
Stage travel time | ownocc, car_w, ag3544 |
Annual mileage | nonwhite, bus_w |
The analysis showed a relatively strong correlation between NTS variables and Census measures of car ownership and household size. Car ownership or commuting by car measures featured in 8 of the 16 models, spread across different levels of the NTS hierarchical database, and were the best performing variable in 3 cases. Household size measures featured in 7 of the 16 models, again spread across the hierarchy, and were the best performing variable in 2 cases.
It is worth noting that while the variables were those selected for contributing most to an increase in R squared, the actual difference in the magnitude of the increase between the highest and the next several was often relatively small. That is, similar R squared values could often be attained by various combinations of variables.
This analysis demonstrates that Census measures of car ownership continue to have strong associations with NTS outcomes. However, Census measures of household size also appear frequently in the results. It was therefore decided to conduct regression tests for the final Census stratifier using two alternative stratifiers in tertile form: percentage of households with no car or van (cars0) and percentage of households containing one person (hhsize1). Cars0 was the third stratifier chosen in the 2014 stratification review, after ITL2 and urban-rural status.
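Turning a continuous Census measure such as cars0 into tertile bands can be sketched with the standard library. The report does not state how the tertile cut points were defined (for example, whether they are weighted by PSU count or by delivery points, or which quantile method is used), so this sketch assumes unweighted cut points across PSUs computed with `statistics.quantiles` and its default method; the hypothetical helper `tertile_bands` and the values are illustrative.

```python
import bisect
import statistics


def tertile_bands(values):
    """Return (bands, cut_points): band 1, 2 or 3 for each value."""
    cut_points = statistics.quantiles(values, n=3)  # two cut points
    bands = [bisect.bisect_right(cut_points, v) + 1 for v in values]
    return bands, cut_points


# Hypothetical cars0 values (% households with no car or van) for nine PSUs.
cars0 = [9.0, 31.5, 14.2, 22.8, 40.1, 12.6, 18.3, 27.4, 35.0]
bands, cuts = tertile_bands(cars0)
print(bands)  # [1, 3, 1, 2, 3, 1, 2, 2, 3]
```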
The stepwise analysis was repeated while controlling for ITL2 urbrur and the percentage of households with no car in tertiles. Census variables measuring car ownership (cars0, cars2p, and cars) were not included in these stepwise regressions to avoid duplication. The Census variable selected at each step, up to a maximum of 3 variables, is shown in Table 5 below. If additional variables did not add significantly to the explanatory power of a model, fewer than 3 were recorded. Census variables are listed in the order in which they entered the stepwise regression, so the first variable listed was added in the first step and the third in the third step.
Table 5: Significant Census 2021 variables included in stepwise regressions for each NTS measure (after controlling for ITL2 urbrur and tertiles of car ownership)
NTS measure | Census 2021 variables in stepwise regression (maximum of 3) |
---|---|
Households with at least one car or van | ownocc, part, hhsize1 |
Households with at least one bike | noqual, ag014, limitil |
Adults with full car driving licence | nssec12, car_w, ag014 |
Purpose of trip ‘to’ was shopping | noqual, ag6574, hhsize1 |
Overall travelling time | car_w, ownocc, ag3544 |
Overall trip time | car_w, ownocc, ag3544 |
Trip distance | semi_d, ownocc, hhsize2 |
Number of trips | nokids |
School trip distance | privrent, hhsize2, home_w |
Stages where mode of transport was walking | car_w, privrent, nonwhite |
Stages where mode of transport was car | car_w, marital_m, ownocc |
Stage travelling time | privrent, ownocc, ag75p |
Stage distance | semi_d, ownocc |
Length of stage | semi_d, ownocc |
Stage travel time | ownocc, car_w, ag3544 |
Annual mileage | nonwhite, bus_w |
The analysis showed that once ITL2 urbrur and car ownership tertiles were controlled for, the Census variables occurring most frequently in stepwise models were the percentage of owner occupier households (ownocc) and the percentage of persons aged 16 or over travelling to work by car or van (car_w). Of the 16 models, ownocc appeared in 9 and car_w in 6.
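The frequency counts quoted above can be reproduced by tallying the variables listed in Table 5. In the sketch below the table rows are transcribed (lower-cased) into a Python dictionary and counted with `collections.Counter`.

```python
from collections import Counter

# Table 5: stepwise selections after controlling for ITL2 urbrur and car ownership tertiles.
table5 = {
    "Households with at least one car or van": ["ownocc", "part", "hhsize1"],
    "Households with at least one bike": ["noqual", "ag014", "limitil"],
    "Adults with full car driving licence": ["nssec12", "car_w", "ag014"],
    "Purpose of trip 'to' was shopping": ["noqual", "ag6574", "hhsize1"],
    "Overall travelling time": ["car_w", "ownocc", "ag3544"],
    "Overall trip time": ["car_w", "ownocc", "ag3544"],
    "Trip distance": ["semi_d", "ownocc", "hhsize2"],
    "Number of trips": ["nokids"],
    "School trip distance": ["privrent", "hhsize2", "home_w"],
    "Stages where mode of transport was walking": ["car_w", "privrent", "nonwhite"],
    "Stages where mode of transport was car": ["car_w", "marital_m", "ownocc"],
    "Stage travelling time": ["privrent", "ownocc", "ag75p"],
    "Stage distance": ["semi_d", "ownocc"],
    "Length of stage": ["semi_d", "ownocc"],
    "Stage travel time": ["ownocc", "car_w", "ag3544"],
    "Annual mileage": ["nonwhite", "bus_w"],
}

counts = Counter(var for selected in table5.values() for var in selected)
print(counts.most_common(2))  # [('ownocc', 9), ('car_w', 6)]
```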
The stepwise analysis was repeated again while controlling for ITL2 urbrur and the percentage of households with one person in tertiles. Census variables measuring household size (hhsize1, hhsize2, hhsize3, hhsize4, and hhsize) were not included in these stepwise regressions to avoid duplication. The Census variable selected at each step, up to a maximum of 3 variables, is shown in Table 6 below. If additional variables did not add significantly to the explanatory power of a model, fewer than 3 were recorded. Census variables are listed in the order in which they entered the stepwise regression, so the first variable listed was added in the first step and the third in the third step.
Table 6: Significant Census 2021 variables included in stepwise regressions for each NTS measure (after controlling for ITL2 urbrur and tertiles of household size)
NTS measure | Census 2021 variables in stepwise regression (maximum of 3) |
---|---|
Households with at least one car or van | cars0, level4p, unemploy |
Households with at least one bike | noqual |
Adults with full car driving licence | unemploy, nssec12, cars |
Purpose of trip ‘to’ was shopping | home_w, ag6574, bus_w |
Overall travelling time | car_w, ownocc, ag3544 |
Overall trip time | privrent, car_w, ownocc |
Trip distance | limitil, semi_d, ag3544 |
Number of trips | cars2p, nokids, ag4554 |
School trip distance | NO SIGNIFICANT VARIABLES |
Stages where mode of transport was walking | car_w, privrent, nonwhite |
Stages where mode of transport was car | cars0, noqual |
Stage travelling time | privrent, ag3544, ownocc |
Stage distance | limitil, semi_d, ag3544 |
Length of stage | limitil, semi_d, ag3544 |
Stage travel time | ownocc, car_w, ag3544 |
Annual mileage | NO SIGNIFICANT VARIABLES |
The analysis showed that once ITL2 urbrur and household size tertiles were controlled for, the Census variables occurring most frequently in stepwise models were commuting measures (car_w, bus_w, home_w) and car ownership measures (cars0, cars, and cars2p). Commuting measures appeared in 5 of the 16 models and car ownership measures in 4. The variation between models for different NTS measures was greater when household size was controlled for than when car ownership was controlled for.
The results so far have indicated that retaining a fourth stratifier would be useful to improve precision, in combination with tertiles of either car ownership or household size. In order to select the optimal combination of Census stratifiers for NTS, 4 options were chosen for comparison based on the variables occurring most often in the preceding results.
3.3 Comparison of alternative stratification options
The options for NTS stratification that were compared were:
- the current stratification option (Current): ITL2 urbrur, tertiles of car ownership, percentage working mainly at home
- Option 1: ITL2 urbrur, tertiles of car ownership, percentage owner occupier households
- Option 2: ITL2 urbrur, tertiles of car ownership, percentage travelling to work by car or van
- Option 3: ITL2 urbrur, tertiles of household size, percentage of households with no car or van
- Option 4: ITL2 urbrur, tertiles of household size, percentage travelling to work by car or van
Note: the stratifiers listed against each option are listed in the order that they would be used for sampling new PSUs.
Separate linear regression models were fitted for each of the NTS dependent variables and each of the 5 stratification options. For each NTS measure, the model with the highest value for adjusted R squared offers the best precision. Comparing the 5 options demonstrates whether an alternative would be an improvement on the current stratification variables and, if so, which is best.
The adjusted R squared for each regression model is given in Table 7, with the stratification option producing the highest value and thus the best precision for each NTS measure listed in the final column. When the stratification that is currently in use offers better precision than the 4 alternatives under consideration, ‘Current’ is listed in the final column.
Table 7: Comparison of adjusted R squared values across the current and 4 proposed alternative stratification options and for each NTS measure
NTS measure | Current | Option 1 | Option 2 | Option 3 | Option 4 | Best precision |
---|---|---|---|---|---|---|
Households with at least one car or van | 36.2 | 40.0 | 36.1 | 44.6 | 28.6 | Option 3 |
Households with at least one bike | 11.2 | 9.4 | 10.9 | 9.1 | 9.3 | Current |
Adults with full car driving licence | 30.6 | 29.8 | 27.1 | 30.0 | 15.7 | Current |
Purpose of trip ‘to’ was shopping | 5.0 | 4.0 | 4.9 | 3.6 | 4.6 | Current |
Overall travelling time | 19.9 | 21.6 | 21.6 | 20.6 | 21.7 | Option 4 |
Overall trip time | 25.2 | 25.9 | 27.1 | 25.9 | 27.0 | Option 2 |
Trip distance | 11.2 | 11.5 | 11.9 | 10.0 | 11.1 | Option 2 |
Number of trips | 8.1 | 7.8 | 8.3 | 7.4 | 7.1 | Option 2 |
School trip distance | 13.0 | 13.5 | 13.0 | 13.0 | 12.9 | Option 1 |
Stages where mode of transport was walking | 12.9 | 9.9 | 15.2 | 11.4 | 15.0 | Option 2 |
Stages where mode of transport was car | 49.4 | 51.1 | 51.8 | 53.7 | 48.8 | Option 3 |
Stage travelling time | 15.4 | 16.7 | 16.8 | 16.4 | 16.8 | Option 2 |
Stage distance | 15.0 | 15.5 | 15.5 | 14.1 | 14.6 | Option 1 |
Length of stage | 15.5 | 16.0 | 16.0 | 14.6 | 15.0 | Option 1 |
Stage travel time | 10.5 | 12.6 | 11.6 | 11.4 | 11.7 | Option 1 |
Annual mileage | 3.1 | 3.0 | 3.1 | 3.0 | 3.1 | Option 2 |
Average for all 16 NTS measures | 17.6 | 18.0 | 18.2 | 18.1 | 16.4 | Option 2 |
The comparison above demonstrates that alternative options perform better than the stratification currently used, when looking across all 16 NTS measures. The current stratification has the highest adjusted R squared for only 3 NTS measures and the second-lowest average precision of the 5 options. Option 3 has the highest adjusted R squared for 2 NTS measures and option 4 the highest for only one. Options 1 and 2 have the highest adjusted R squared values for key diary variables at the trip and stage levels. Option 1 has the highest value for 4 NTS measures and option 2 has the highest value for 6 NTS measures. Option 2 also has the highest average precision across all 16 variables, although this is only marginally higher than options 1 and 3.
The percentage gain or loss in precision for each NTS measure when comparing each of the 4 new options with the current stratification factors is shown in Table 8 below. The stratification option offering the best improvement in precision is listed in the final column. When none of the 4 options offers an improvement on the current stratification, this is shown as ‘None’.
Table 8: Percentage change in precision, based on adjusted R squared values, across the 4 stratification options for each NTS measure, compared with current NTS stratification
NTS measure | Option 1 | Option 2 | Option 3 | Option 4 | Best improvement |
---|---|---|---|---|---|
Households with at least one car or van | 5.9 | -0.2 | 13.1 | -11.9 | Option 3 |
Households with at least one bike | -2.0 | -0.3 | -2.3 | -2.1 | None |
Adults with full car driving licence | -1.1 | -5.1 | -0.8 | -21.5 | None |
Purpose of trip ‘to’ was shopping | -1.1 | -0.2 | -1.5 | -0.5 | None |
Overall travelling time | 2.1 | 2.1 | 0.9 | 2.2 | Option 4 |
Overall trip time | 1.0 | 2.6 | 0.9 | 2.5 | Option 2 |
Trip distance | 0.4 | 0.8 | -1.3 | -0.1 | Option 2 |
Number of trips | -0.4 | 0.2 | -0.8 | -1.0 | Option 2 |
School trip distance | 0.6 | 0.0 | 0.0 | -0.1 | Option 1 |
Stages where mode of transport was walking | -3.5 | 2.6 | -1.8 | 2.3 | Option 2 |
Stages where mode of transport was car | 3.5 | 4.8 | 8.5 | -1.2 | Option 3 |
Stage travelling time | 1.6 | 1.7 | 1.2 | 1.6 | Option 2 |
Stage distance | 0.6 | 0.6 | -1.1 | -0.5 | Option 1 |
Length of stage | 0.6 | 0.5 | -1.1 | -0.6 | Option 1 |
Stage travel time | 2.4 | 1.3 | 1.1 | 1.4 | Option 1 |
Annual mileage | -0.1 | 0.0 | 0.0 | 0.0 | Option 2 |
The above analysis demonstrates that option 2 most frequently offers the best improvement in precision compared with the current stratification, doing so for 6 of the 16 NTS measures. Option 1 offers the best improvement for 4 NTS measures, option 3 for 2, and option 4 for only one. For 3 measures, the current stratification offers better precision than the potential alternatives under consideration.
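The tallies in the paragraph above follow from Table 8. The sketch below transcribes the table, picks the option with the largest change for each measure (ties go to the lower-numbered option), and reports "None" when every change is negative. Treating a rounded change of 0.0 as acceptable is an assumption that reproduces the table's entry for annual mileage; the report presumably decided close cases using unrounded values.

```python
from collections import Counter

# Table 8: percentage change in precision for Options 1-4 versus the current stratification.
table8 = {
    "Households with at least one car or van": [5.9, -0.2, 13.1, -11.9],
    "Households with at least one bike": [-2.0, -0.3, -2.3, -2.1],
    "Adults with full car driving licence": [-1.1, -5.1, -0.8, -21.5],
    "Purpose of trip 'to' was shopping": [-1.1, -0.2, -1.5, -0.5],
    "Overall travelling time": [2.1, 2.1, 0.9, 2.2],
    "Overall trip time": [1.0, 2.6, 0.9, 2.5],
    "Trip distance": [0.4, 0.8, -1.3, -0.1],
    "Number of trips": [-0.4, 0.2, -0.8, -1.0],
    "School trip distance": [0.6, 0.0, 0.0, -0.1],
    "Stages where mode of transport was walking": [-3.5, 2.6, -1.8, 2.3],
    "Stages where mode of transport was car": [3.5, 4.8, 8.5, -1.2],
    "Stage travelling time": [1.6, 1.7, 1.2, 1.6],
    "Stage distance": [0.6, 0.6, -1.1, -0.5],
    "Length of stage": [0.6, 0.5, -1.1, -0.6],
    "Stage travel time": [2.4, 1.3, 1.1, 1.4],
    "Annual mileage": [-0.1, 0.0, 0.0, 0.0],
}


def best_option(changes):
    """Label of the option with the largest change, or 'None' if all are negative."""
    best = max(range(len(changes)), key=lambda i: changes[i])  # first maximum wins ties
    return f"Option {best + 1}" if changes[best] >= 0 else "None"


tally = Counter(best_option(changes) for changes in table8.values())
print(sorted(tally.items()))
# [('None', 3), ('Option 1', 4), ('Option 2', 6), ('Option 3', 2), ('Option 4', 1)]
```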
The comparison undertaken between the current NTS stratification variables and the 4 potential alternatives has demonstrated that option 2 offers the best precision across the hierarchy of NTS measures. Option 2 includes tertiles of car ownership as the third stratifier and the percentage travelling to work by car or van (car_w) as the fourth stratifier. The other Census variable tested as a fourth stratifier alongside tertiles of car ownership is the percentage of owner occupier households (ownocc, option 1). As a final test of which is better suited to be the fourth stratifier, a correlation analysis was carried out between each of the 2 candidates and the percentage of households with no car or van. The latter variable is the basis of the current third stratifier and also of the third stratifier in options 1 and 2, the best performing alternatives in the comparison tests reported above.
The correlation analysis suggested that both ownocc and car_w are negatively correlated with the percentage of households with no car or van, although ownocc is more strongly correlated (Pearson’s R = -0.837) than car_w (Pearson’s R = -0.635). A fourth stratifier that is highly correlated with the third adds relatively little to the sort order beyond what the car ownership tertiles already provide, so the less correlated car_w is the better choice. Consequently, option 2 appears preferable, confirming the results presented in Tables 7 and 8 above. The current fourth stratifier, home_w, and this proposed alternative, car_w, are both measures of travel to work derived from the same Census question. Option 2 therefore represents a relatively minor change to the current stratification for NTS, but the analysis conducted nonetheless shows it to be the preferable alternative.
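Squaring the reported coefficients makes the comparison concrete: it gives roughly 0.70 for ownocc and 0.40 for car_w, the share of variance each measure has in common with the percentage of households with no car or van. The function below is a plain standard-library implementation of Pearson's correlation; the PSU-level data behind the reported coefficients are not reproduced in this report, so the example inputs are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Shared variance implied by the coefficients reported in the text.
for name, r in [("ownocc", -0.837), ("car_w", -0.635)]:
    print(name, round(r * r, 2))  # ownocc 0.7, then car_w 0.4

# Hypothetical PSU values: % no car (cars0) against % owner occupiers (ownocc).
cars0 = [12.0, 25.0, 40.0, 18.0, 33.0]
ownocc = [78.0, 60.0, 41.0, 70.0, 52.0]
print(round(pearson_r(cars0, ownocc), 3))
```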
As explained in section 2.4, responses to the commuting question in the 2021 Census were affected by the lockdown restrictions in place at the time. The recorded level of working from home was relatively high, and therefore the proportion of people commuting by any mode was lower in March 2021 than in March 2011. Despite this, the Census 2021 car commuting measure offers the best precision as the fourth stratification variable when compared with the 4 other options tested. To check the impact of lockdown restrictions on car_w, the variables car_w and home_w from Census 2011 and 2021 were compared after being matched onto NTS 2013 and NTS 2023 PSUs in the respective stratification reviews. For the percentage commuting by car, the variable distributions, means, and standard deviations are reasonably similar across NTS 2013 PSUs (matched to Census 2011) and NTS 2023 PSUs (matched to Census 2021): the mean was 40% commuting by car for NTS 2013 PSUs and 48% for NTS 2023 PSUs. These similarities suggest that, even during lockdown restrictions, the 2021 Census captured geographical variation in levels of car commuting among the smaller base of people not working from home at that time. By contrast, the distributions, means, and standard deviations of the percentage working from home differ much more: the mean for NTS 2013 PSUs was 3.5% working from home with a standard deviation of 2.1 percentage points, whereas the mean for NTS 2023 PSUs was 31% with a standard deviation of 12 percentage points. The regression tests conducted in this review show that the percentage commuting by car from Census 2021 is significantly associated with NTS 2023 measures, even though the lockdown restrictions in place during Census 2021 had been lifted before NTS 2023 data collection began.
Chapter 4: Conclusions and recommendations
This report considered whether the current set of stratifiers used to select the NTS sample is optimal, and if not, whether a new set would achieve sufficient gains in precision of estimates to warrant changes. The analysis used 16 key NTS 2023 survey measures and compared competing stratification strategies, based on a combination of geographical and Census 2021 measures, by fitting linear regression models at the area level and measuring the amount of variance explained.
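A minimal sketch of the area-level comparison, assuming each NTS measure is summarised as one value per PSU and that competing stratification strategies are compared by the adjusted R squared of an ordinary least squares fit, with categorical stratifiers dummy-coded. The data are simulated, the function names are hypothetical, and the sketch uses NumPy.

```python
import numpy as np


def adjusted_r_squared(y: np.ndarray, X: np.ndarray) -> float:
    """Fit y on X by ordinary least squares (intercept added) and return
    adjusted R squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def dummies(labels: list[str]) -> np.ndarray:
    """Dummy-code a categorical stratifier, dropping the first level as reference."""
    levels = sorted(set(labels))
    return np.array([[float(lab == lev) for lev in levels[1:]] for lab in labels])


rng = np.random.default_rng(0)
n = 200
region = rng.choice(["A", "B", "C"], size=n).tolist()  # hypothetical regions
car_w = rng.uniform(20, 70, size=n)                     # hypothetical % commuting by car
region_effect = {"A": 0.0, "B": 1.0, "C": 2.0}
y = 5 + np.array([region_effect[r] for r in region]) + 0.1 * car_w + rng.normal(0, 1, size=n)

X_region = dummies(region)
X_both = np.column_stack([X_region, car_w])
r_region = adjusted_r_squared(y, X_region)
r_both = adjusted_r_squared(y, X_both)
print(f"region only: {r_region:.3f}, region + car_w: {r_both:.3f}")
```

Adjusted R squared penalises each extra predictor, so an additional stratifier only raises it when it explains enough extra variance to outweigh the degree of freedom it uses.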
The stratification variables recommended for selecting the NTS sample, based on the results of this review, are shown in Table 9 below.
Table 9: NTS stratification recommendations based on the review
Order | Variable | Change to current stratification? |
---|---|---|
1st | International Territorial Level 2 | No |
2nd | Urban-rural indicator (2 groups), derived from the ten-category urban-rural Census 2011 classification | No |
3rd | Percentage of households with no cars (tertiles), derived from the 2021 Census | No |
4th | Percentage of people commuting by car (continuous), derived from the 2021 Census | Yes |
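Table 9 lists the stratifiers but not how they are applied at selection. One common approach, assumed here purely for illustration, is implicit stratification: sort the PSU list by the categorical stratifiers and then by the continuous one, and draw a systematic sample with a random start. The sketch also assumes that tertile cut points are computed across all PSUs and that selection is equal-probability; the NTS design may differ on both (for example, by selecting with probability proportional to size). The class, field and function names and the region codes are hypothetical.

```python
import bisect
import random
import statistics
from dataclasses import dataclass


@dataclass
class PSU:
    psu_id: str
    itl2: str          # 1st stratifier: ITL2 region (hypothetical codes below)
    urban: bool        # 2nd stratifier: urban (True) or rural (False)
    pct_no_car: float  # basis of the 3rd stratifier (% households with no car)
    car_w: float       # 4th stratifier: % commuting by car (continuous)


def order_frame(psus: list[PSU]) -> list[PSU]:
    """Sort PSUs by ITL2, urban-rural, no-car tertile, then car_w.

    Assumption: tertile cut points are computed over all PSUs in the list.
    """
    cuts = statistics.quantiles([p.pct_no_car for p in psus], n=3)

    def sort_key(p: PSU) -> tuple:
        tertile = bisect.bisect_right(cuts, p.pct_no_car) + 1  # 1, 2 or 3
        # `not p.urban` places urban PSUs before rural ones within a region
        return (p.itl2, not p.urban, tertile, p.car_w)

    return sorted(psus, key=sort_key)


def systematic_sample(frame: list, k: int, rng: random.Random) -> list:
    """Equal-probability systematic sample of k items with a random start."""
    interval = len(frame) / k
    start = rng.random() * interval
    # min() guards against floating-point rounding at the end of the list
    return [frame[min(int(start + i * interval), len(frame) - 1)] for i in range(k)]


rng = random.Random(1)
psus = [
    PSU(f"P{i:02d}", rng.choice(["R1", "R2", "R3"]), rng.random() < 0.8,
        rng.uniform(5, 45), rng.uniform(30, 75))
    for i in range(60)
]
frame = order_frame(psus)
sample = systematic_sample(frame, k=10, rng=rng)
print([p.psu_id for p in sample])
```

Sorting on car_w within each tertile means a systematic sample spreads its selections across the range of car commuting, which is how a continuous variable can act as a stratifier without forming explicit strata.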
The 2024 stratification review outcomes for each stratifier are summarised below.
1st stratifier: The current first stratifier, International Territorial Level 2 (ITL2, formerly NUTS2), remains the optimal choice of regional variable. When compared with 10- and 13-category GOR variables, ITL2 offered the highest precision for 8 of the 16 NTS measures tested, as well as higher average precision across all 16 NTS measures.
2nd stratifier: Combining ITL2 with urban-rural status produced the highest adjusted R squared for 12 of the 16 NTS variables when compared with ITL2 alone and with 10- or 13-category GOR. For 3 of the 4 variables where an alternative regional variable performed better, the difference was less than 1 percentage point. Including urban-rural status as the second stratifier therefore improves precision and remains optimal. The 2021 Census urban-rural classification will be used when it becomes available.
3rd stratifier: There is a relatively strong correlation between NTS variables and the Census measures of both car ownership and household size. Car ownership or commuting by car measures featured in 8 of the 16 models, spread across different levels of the NTS hierarchical database, and were the best-performing variable in 3 cases. Household size measures featured in 7 of the 16 models, again spread across the hierarchy, and were the best-performing variable in 2 cases. Four stratification options were compared: 2 with car ownership tertiles and 2 with household size tertiles as the first non-geographical Census-based stratifier. The options including car ownership offered better precision than the household size options. It therefore seems sensible to retain the percentage of households with no car or van, in tertiles, as the third stratifier, both for precision and for continuity.
4th stratifier: The combination of car ownership tertiles and the percentage of people travelling to work by car (car_w) achieved the best gain in precision over the current stratification for 6 of the 16 NTS measures. This option also offered the highest average precision of the 4 alternatives. Therefore car_w appears to be the best choice for the fourth and final stratifier, in combination with car ownership.
This recommendation represents a minor change to the current NTS stratification. Only for the fourth and final stratifier is a different variable being recommended.