Official Statistics

Participation Survey July to September 2023 Technical Report

Updated 24 July 2024

Applies to England

December 2023

© Verian 2023

1. Introduction

1.1 Background to the survey

In 2021, the Department for Culture, Media and Sport (DCMS) commissioned Verian (formerly Kantar Public) to design and deliver a new, nationally representative ‘push-to-web’ survey to assess adult participation in DCMS sectors across England. The survey served as a successor to the Taking Part Survey, which ran for 16 years as a continuous face-to-face survey.

This technical note relates to the 2023/24 Participation Survey Quarter 2 fieldwork, conducted between 7th July and 2nd October 2023.

The 2023/24 Participation Survey was commissioned by DCMS in partnership with Arts Council England (ACE). The scope of the survey is to deliver a nationally representative sample of adults (aged 16 years and over) and to assess adult participation in DCMS sectors across England, targeting enough households to allow for Local Authority representation of the data. The data collection model for the Participation Survey is based on ABOS (Address-Based Online Surveying), a type of ‘push-to-web’ survey method. Respondents take part either online or by completing a paper questionnaire. In 2023/24 the target respondent sample size increased to 175,000, up from 33,000 per survey year in the interim survey from 2021 to 2023. Fieldwork will run across four quarters (May to June 2023, July to September 2023, October to December 2023 and January to March 2024).

1.2 Survey objectives

  • To inform and monitor government policy and programmes in DCMS, ACE and other government departments (OGDs) on adult engagement with the DCMS and digital sectors [footnote 1]. The survey will also gather information on demographics (for example, age, gender, education).

  • To assess the variation in engagement with cultural activities across DCMS sectors in England, and the differences in social-demographics such as location, age, education, and income.

  • To monitor and report on progress in achieving the Outcomes set out in Let’s Create [footnote 2] – Creative People, Cultural Communities, and A Creative and Cultural Country (as set out in the Arts Council England Impact Framework).

In preparation for the 2023/24 survey, Verian (formerly Kantar Public) undertook questionnaire development work to test any new or amended questions. The 2023/24 survey launched in May 2023.

1.3 Survey design

The basic ABOS design is simple: a stratified random sample of addresses is drawn from the Royal Mail’s Postcode Address File and an invitation letter is sent to each one, containing username(s) and password(s) plus the URL of the survey website. Sampled individuals can log on using this information and complete the survey as they might any other web survey. Once the questionnaire is complete, the specific username and password cannot be used again, which keeps the responses confidential from others with access to this information.

It is usual for at least one reminder to be sent to each sampled address and it is also usual for an alternative mode (usually a paper questionnaire) to be offered to those who need it or would prefer it. It is typical for this alternative mode to be available only on request at first. However, after nonresponse to one or more web survey reminders, this alternative mode may be given more prominence.

Paper questionnaires ensure coverage of the offline population and are especially effective with sub-populations that respond to online surveys at lower-than-average levels. However, paper questionnaires have measurement limitations that constrain the design of the online questionnaire and also add considerably to overall cost. For the Participation Survey, paper questionnaires are used in a limited and targeted way, to optimise rather than maximise response.

2. Sampling

2.1 Sample design: addresses

The address sample design is intrinsically linked to the data collection design (see ‘Details of the data collection model’ below) and was designed to yield a respondent sample that is representative with respect to neighbourhood deprivation level and age group within each of the 33 ITL2 regions and 309 lower-tier local authorities [footnote 3] in England. This approach limits the role of weights in the production of unbiased survey estimates, narrowing confidence intervals compared with other designs.

The design also sought a minimum four-quarter respondent sample size of 500 for each local authority and a minimum four-quarter effective sample size of 2,700 for each ITL2 region [footnote 4]. Although there were no specific targets per quarter, the sample selection process was designed to ensure that the respondent sample size per local authority was approximately the same per quarter.
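As an illustration of what an effective sample size measures, the sketch below uses Kish's approximation, (sum of weights)² / (sum of squared weights). The report does not state which formula was used, so the choice of Kish's approximation, the function name and the example weights are all assumptions.

```python
def effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares

# Hypothetical weights: equal weights lose nothing, unequal weights
# reduce the statistical value of the same number of respondents.
equal = effective_sample_size([1.0] * 100)
unequal = effective_sample_size([1.0] * 50 + [3.0] * 50)
print(equal, unequal)
```

The more the address sampling probabilities vary within a region, the further the effective sample size falls below the number of respondents.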

As a first step, a stratified master sample of 726,790 addresses in England was drawn from the Postcode Address File (PAF) ‘small user’ subframe. Before sampling, the PAF was disproportionately stratified by lower-tier local authority (309 strata). Furthermore, within each of the 309 strata, the PAF was sorted by (i) neighbourhood deprivation level (5 groups, each of a similar scale at the national level), (ii) super output area, and finally (iii) by postcode. This ensured that the master sample of addresses was geo-demographically representative within each stratum.

This master sample of addresses was then augmented by data supplier CACI. For each address in the master sample, CACI added the expected number of resident adults in each ten-year age band. Although this auxiliary data will have been imperfect, investigations by Verian (formerly Kantar Public) have shown that it is highly effective at identifying households that are mostly young or mostly old. Once this data was attached, the master sample was additionally stratified by expected household age structure based on the CACI data:

(i) all aged 35 or younger (17% of the total)

(ii) all aged 65 or older (21% of the total)

(iii) all other addresses (62% of the total).

The conditional sampling probability in each stratum was varied to compensate for (expected) residual variation in response rate that could not be ‘designed out’, given the constraints of budget and timescale. The underlying assumptions for this procedure were derived from empirical evidence obtained from the 2021/22 and 2022/23 Participation Surveys.
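Sorting a frame before selection, as described above for the PAF, is commonly paired with fixed-interval (systematic) selection so that the sample follows the sort order proportionately. The sketch below shows that idea for a single stratum. It is an assumption that selection was systematic; the frame, field names and sample size are hypothetical.

```python
import random
from collections import Counter

def systematic_sample(frame, n, sort_key, rng):
    """Sort the frame, then take every (len/n)-th unit from a random start."""
    ordered = sorted(frame, key=sort_key)
    interval = len(ordered) / n
    start = rng.random() * interval
    positions = (min(int(start + i * interval), len(ordered) - 1) for i in range(n))
    return [ordered[p] for p in positions]

rng = random.Random(2023)

# Hypothetical stratum: 1,000 addresses, 200 per deprivation quintile.
frame = [
    {"quintile": q, "soa": f"S{q}{s:02d}", "postcode": f"P{q}{s:02d}{k:02d}"}
    for q in range(1, 6) for s in range(20) for k in range(10)
]
rng.shuffle(frame)  # the frame order should not matter once it is sorted

sample = systematic_sample(
    frame, 100, lambda a: (a["quintile"], a["soa"], a["postcode"]), rng
)
quintile_counts = Counter(a["quintile"] for a in sample)
print(sorted(quintile_counts.items()))
```

Because the selection interval walks through the sorted list, each deprivation group is represented in proportion to its share of the stratum without needing a separate explicit stratum for it.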

Verian (formerly Kantar Public) drew a stratified random sample of 455,546 addresses from the master sample of 726,790 and systematically allocated them with equal probability to quarters 1, 2, 3 and 4 (that is, circa 113,886 addresses per quarter). Verian (formerly Kantar Public) then systematically distributed the quarter-specific samples to three equal-sized ‘replicates’, each with the same profile. The second replicate was expected to be issued two weeks after the first replicate, and the third replicate was expected to be issued two weeks after the second replicate to ensure that data collection was maximally spread throughout the three-month period allocated to each quarter [footnote 5].

These replicates were further subdivided into twenty-five equal-sized ‘batches’ to help manage fieldwork. The expectation was that only the first twenty batches within each replicate would be issued (that is, circa 30,370 addresses), with the twenty-first to the twenty-fifth batches kept back in reserve.
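The allocation arithmetic above can be checked with a short sketch. It assigns positions in the drawn sample to quarters, replicates and batches in rotation; the rotation rule is an assumption standing in for the systematic allocation, whose exact mechanics are not described here, and all names are hypothetical.

```python
from collections import Counter

TOTAL_ADDRESSES = 455_546
QUARTERS, REPLICATES, BATCHES, MAIN_BATCHES = 4, 3, 25, 20

def allocate(position):
    """Map a position in the sample to a 1-based (quarter, replicate, batch)."""
    quarter = position % QUARTERS
    replicate = (position // QUARTERS) % REPLICATES
    batch = (position // (QUARTERS * REPLICATES)) % BATCHES
    return quarter + 1, replicate + 1, batch + 1

cells = Counter(allocate(i) for i in range(TOTAL_ADDRESSES))

per_quarter = Counter()
main_issue = Counter()  # addresses in batches 1 to 20 of each replicate
for (quarter, replicate, batch), n in cells.items():
    per_quarter[quarter] += n
    if batch <= MAIN_BATCHES:
        main_issue[(quarter, replicate)] += n

print(per_quarter[2])       # close to the circa 113,886 per quarter
print(main_issue[(2, 1)])   # close to the circa 30,370 per replicate
```

Under this rotation every quarter receives the same number of addresses to within one, and each replicate's first twenty batches hold roughly 30,370 addresses.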

In quarter 2, a proportion of the reserve batches was used in the second and third replicates in order to make up for the lower than targeted number of completes in quarter 1 as well as to boost the number of completes in quarter 2. In total, 102,451 addresses were issued for quarter 2.

Table 1 shows the quarter 2 (issued) sample structure with respect to the major ‘design’ strata: neighbourhood deprivation level and expected household age structure.

Table 1: Initial address issue by area deprivation quintile group

Expected household age structure   Most deprived      2nd      3rd      4th   Least deprived
All <=35                                   4,229    4,621    3,913    3,333            2,466
Other                                     11,255   14,384   14,224   13,257           11,715
All >=65                                   3,069    3,266    4,374    4,418            3,927

2.2 Sample design: individuals within sampled addresses

All resident adults aged 16+ were invited to complete the survey. In this way, the Participation Survey avoided the complexity and risk of selection error associated with remote random sampling within households.

However, for practical reasons, the number of logins provided in the invitation letter was limited. The number of logins was varied between two and four, with this total adjusted in reminder letters to reflect household data provided by prior respondent(s). Addresses that CACI data predicted contained only one adult were allocated two logins; addresses predicted to contain two adults were allocated three logins; and other addresses were allocated four logins. The mean number of logins per address was 2.7. Paper questionnaires were available to those who were offline, not confident online, or unwilling to complete the survey online.
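The login allocation rule for invitation letters can be written as a small function. This is a minimal sketch: the function name is hypothetical, and it covers only the initial letter, not the later adjustment made in reminder letters.

```python
def logins_for_address(predicted_adults: int) -> int:
    """Logins printed in the invitation letter, given the CACI-predicted adult count."""
    if predicted_adults == 1:
        return 2
    if predicted_adults == 2:
        return 3
    return 4  # all other addresses

for adults in (1, 2, 3, 5):
    print(adults, logins_for_address(adults))
```

Each address receives one more login than a small predicted household would need, up to the cap of four, which leaves some room for the prediction being wrong.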

2.3 Details of the data collection model

Table 2 summarises the data collection design within each stratum, showing the number of mailings and type of each mailing: push-to-web (W) or mailing with paper questionnaires (P). For example, ‘WWP’ means two push-to-web mailings and a third mailing with paper questionnaires included alongside the web survey login information. In general, there was a two-week gap between mailings.

Table 2: Data collection design by stratum

Expected household age structure   Most deprived   2nd    3rd    4th   Least deprived
All <=35                           WWPW            WWWW   WWWW   WWW   WWW
Other                              WWPW            WWW    WWW    WWW   WWW
All >=65                           WWPW            WWPW   WWP    WWP   WWP
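For analysis it can help to hold Table 2 as a lookup. The sketch below encodes the table as printed and derives, for each stratum, the number of mailings and which mailing (if any) carried paper questionnaires. The variable and function names are hypothetical.

```python
QUINTILES = ["Most deprived", "2nd", "3rd", "4th", "Least deprived"]

# Table 2 as printed: W = push-to-web mailing, P = mailing with paper questionnaires.
TABLE_2 = {
    "All <=35": ["WWPW", "WWWW", "WWWW", "WWW", "WWW"],
    "Other": ["WWPW", "WWW", "WWW", "WWW", "WWW"],
    "All >=65": ["WWPW", "WWPW", "WWP", "WWP", "WWP"],
}

def mailing_plan(age_structure, quintile):
    """Return the mailing count and the 1-based position of the paper mailing."""
    code = TABLE_2[age_structure][QUINTILES.index(quintile)]
    paper_mailing = code.index("P") + 1 if "P" in code else None
    return {"mailings": len(code), "paper_mailing": paper_mailing}

paper_strata = [
    (age, quintile)
    for age in TABLE_2
    for quintile in QUINTILES
    if mailing_plan(age, quintile)["paper_mailing"] is not None
]
print(len(paper_strata), mailing_plan("Other", "Most deprived"))
```

Wherever paper questionnaires appear in the table they are in the third mailing, and they are confined to the most deprived quintile group and to addresses expected to contain only residents aged 65 or older.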

3. Questionnaire

3.1 Questionnaire development

The online questionnaire was designed to take an average of 30 minutes to complete. A modular design was used with around half of the questionnaire made up of a core set of questions asked of the full sample. The remaining questions were split into three separate modules, randomly allocated to a subset of the sample.

The postal version of the questionnaire included the same set of core questions asked online, but the modular questions were omitted to avoid overly burdening respondents who complete the survey on paper, and to encourage response. Copies of the online and paper questionnaires are available online.

Given the extent of questionnaire changes in the 2023/24 Participation Survey, it was important to implement a comprehensive development and testing phase. This was made up of three key stages:

  • Questionnaire review

  • Cognitive testing

  • Usability testing

3.2 Questionnaire changes

Questions on the following topics of interest were added to the Participation Survey 2023/24, as requested by ACE and/or DCMS:

  • Environment, which included questions on mode of transport taken while travelling to an arts and cultural event, distance travelled, and reason(s) for transportation choice.

  • Social prescribing, which included questions on the respondent’s experience with social prescribing, and the types of activities they were referred to.

  • Further questions on arts and culture engagement, which included questions on the types of classes and clubs respondents have taken part in, the frequency and reason(s) for their involvement, the impact/benefits of participating, and for non-participants, the reason for not participating.

  • Pride in Place, which included questions on respondents’ sense of belonging to and pride in their local area, the role culture plays in choosing where to live, and the current arts and culture scene in their local area.

3.3 Q2 change to question wording

From July, the response options for the 5G awareness question (CDIG5GAW) were expanded. The new option was added in second place in the list, such that the question now reads as follows.

“5G (which stands for fifth generation) is the next step in mobile technology. It offers faster mobile internet speeds.

Which statement below best describes how much you know about 5G mobile technology?

  1. I hadn’t heard of it before now

  2. I have heard of it and already use it ← The new addition

  3. I have heard of it but am not sure what it is

  4. I understand what it is but am not interested in getting it in the near future

  5. I understand what it is and am interested in getting it in the near future.”

Further details about the questionnaire development work can be found in the Participation Survey methodology reports.

4. Fieldwork

4.1 Contact procedures

All selected addresses were sent an initial invitation letter containing the following information:

  • A brief description of the survey

  • The URL of the survey website (used to access the online script)

  • A QR code that can be scanned to access the online survey

  • Log-in details for the required number of household members

  • An explanation that participants will receive a £10 shopping voucher

  • Information about how to contact Verian (formerly Kantar Public) in case of any queries

The reverse of the letter featured responses to a series of Frequently Asked Questions.

All non-responding addresses were sent two reminder letters, at the end of the second and fourth weeks of fieldwork respectively. A pre-selected subset of non-responding addresses (see Table 2) was sent a third reminder letter at the end of the sixth week of fieldwork. The information contained in the reminder letters was similar to the invitation letters, with slightly modified messaging to reflect each reminder stage.

As well as the online survey, respondents were given the option to complete a paper questionnaire, which consisted of an abridged version of the online survey. Each letter informed respondents that they could request a paper questionnaire by contacting Verian (formerly Kantar Public) using the email address or freephone telephone number provided, and a cut-off date for paper questionnaire requests was also included on the letters.

In addition, some addresses received up to two paper questionnaires with the second reminder letter. This targeted approach was developed based on historical data Verian (formerly Kantar Public) has collected through other studies, which suggests that proactive provision of paper questionnaires to all addresses can actually displace online responses in some strata. Paper questionnaires were proactively provided to (i) sampled addresses in the most deprived quintile group, and (ii) sampled addresses where it was expected that every resident would be aged 65 or older (based on CACI data).

4.2 Fieldwork performance

In total, 46,581 respondents completed the survey during quarter 2 – 41,033 via the online survey and 5,548 by returning a paper questionnaire. Following data quality checks (see Section 5.2 for details), 2,560 respondents were removed (2,550 web and 10 paper), leaving 44,021 respondents in the final dataset.

This constitutes a 43% conversion rate, a 30% household-level response rate, and an individual-level response rate of 25% [footnote 6].
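The conversion rate and individual-level response rate can be reproduced from figures given in this report, using the formulas in footnote 6. This is a minimal sketch with hypothetical variable names. The household-level rate is left as a function because the number of responding households is not stated in this report.

```python
ISSUED_ADDRESSES = 102_451     # quarter 2 addresses issued
RESPONSES = 44_021             # respondents in the final dataset
RESIDENTIAL_SHARE = 0.92       # 8% of 'small user' PAF addresses assumed non-residential
ADULTS_PER_HOUSEHOLD = 1.89    # Labour Force Survey average

conversion_rate = RESPONSES / ISSUED_ADDRESSES
individual_rr = RESPONSES / (ISSUED_ADDRESSES * RESIDENTIAL_SHARE * ADULTS_PER_HOUSEHOLD)

def household_rr(responding_households, issued=ISSUED_ADDRESSES):
    """Household-level response rate; the responding household count must be supplied."""
    return responding_households / (issued * RESIDENTIAL_SHARE)

print(f"conversion rate: {conversion_rate:.1%}")
print(f"individual response rate: {individual_rr:.1%}")
```

Rounded to whole percentages, these give the 43% conversion rate and 25% individual-level response rate quoted above.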

For the online survey, the median completion time was 25 minutes and 11 seconds, and the average completion time was 27 minutes and 22 seconds [footnote 7].

5. Data processing

5.1 Data management

Due to the different structures of the online and paper questionnaires, data management was handled separately for each mode. Online questionnaire data was collected via the web script and, as such, was much more easily accessible. By contrast, paper questionnaires were scanned and converted into an accessible format.

For the final outputs, both sets of interview data were converted into IBM SPSS Statistics, with the online questionnaire structure as a base. The paper questionnaire data was converted to the same structure as the online data so that data from both sources could be combined into a single SPSS file.

5.2 Quality checking

Initial checks were carried out to ensure that paper questionnaire data had been correctly scanned and converted to the online questionnaire data structure. For questions common to both questionnaires, the SPSS output was compared to check for any notable differences in distribution and data setup.

Once any structural issues had been corrected, further quality checks were carried out to identify and remove any invalid interviews. The specific checks were as follows:

  1. Selecting complete interviews: Any test serials in the dataset (used by researchers prior to survey launch) were removed. Cases were also removed if the respondent reached, but did not answer, the fraud declaration statement (online: QFraud; paper: Q73).

  2. Duplicate serials check: If any individual serial had been returned in the data multiple times, responses were examined to determine whether this was due to the same person completing multiple times or due to a processing error. If they were found to be valid interviews, a new unique serial number was created, and the data was included in the data file. If the interview was deemed to be a ‘true’ duplicate, the more complete or earlier interview was retained.

  3. Duplicate emails check: If multiple interviews used the same contact email address, responses were examined to determine if they were the same person or multiple people using the same email. If the interviews were found to be from the same person, only the most recent interview was retained. In these cases, online completes were prioritised over paper completes due to the higher data quality.

  4. Interview quality checks: A set of checks on the data were undertaken to check that the questionnaire was completed in good faith and to a reasonable quality. Several parameters were used:

    a. Interview length (online check only)

    b. Number of people in household reported in interview(s) vs number of total interviews from household.

    c. Whether key questions have valid answers.

    d. Whether respondents have habitually selected the same response to all items in a grid question (commonly known as ‘flatlining’) where selecting the same responses would not make sense.

    e. How many multi-response questions were answered with only one option ticked.

Following the removal of invalid cases, 44,021 valid cases were left in the final dataset.
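Some of the interview quality checks listed above lend themselves to simple flags. The sketch below illustrates three of them (b, d and e). The function names, input shapes and the rule that more interviews than reported household members is suspicious are assumptions; the report does not give the exact rules or thresholds used.

```python
def is_flatlined(grid_answers):
    """Check d: every answered item in a grid has the same response."""
    answered = [a for a in grid_answers if a is not None]
    return len(answered) > 1 and len(set(answered)) == 1

def too_many_interviews(reported_household_size, interviews_from_address):
    """Check b: more interviews returned than people reported in the household."""
    return interviews_from_address > reported_household_size

def single_tick_share(multi_response_answers):
    """Check e: share of answered multi-response questions with exactly one tick."""
    answered = [ticks for ticks in multi_response_answers if ticks]
    if not answered:
        return 0.0
    return sum(len(ticks) == 1 for ticks in answered) / len(answered)

print(is_flatlined([3, 3, 3, 3]))
print(too_many_interviews(2, 3))
print(single_tick_share([{"a"}, {"a", "b"}, {"c"}, set()]))
```

Flags like these would normally be reviewed together rather than used alone, since a short grid or a small household can trigger any one of them legitimately.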

5.3 Data checks and edits

Upon completion of the general quality checks described above, more detailed data checks were carried out to ensure that the right questions had been answered according to questionnaire routing. Routing is generally correct for online completes, as it is programmed into the scripting software, but for paper completes, data edits were required.

There were two main types of data edit, both affecting the paper questionnaire data:

  1. Single-response question edits: If a paper questionnaire respondent had mistakenly answered a question that they weren’t supposed to, their response in the data was changed to “-3: Not Applicable”. If a paper questionnaire respondent had neglected to answer a question that they should have, they were assigned a response in the data of “-4: Not answered but should have (paper)”. If a paper questionnaire respondent had ticked more than one box for a single-response question, they were assigned a response in the data of “-5: Multi-selected for single response (paper)”.

  2. Multiple response question edits: If a paper questionnaire respondent had mistakenly answered a question that they weren’t supposed to, their response was set to “-3: Not Applicable”. If a paper questionnaire respondent had neglected to answer a question that they should have, they were assigned a response in the data of “-4: Not answered but should have (paper)”. Where the respondent had selected both valid answers and an exclusive code such as “None of these”, any valid codes were retained and the exclusive code response was set to “0”.
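The two general edit rules can be sketched as functions. The codes -3, -4 and -5 come from the text above. The function names, the input shapes and the treatment of a question that was correctly skipped (also coded -3 here) are assumptions.

```python
NOT_APPLICABLE = -3   # routed past the question
NOT_ANSWERED = -4     # should have answered but did not (paper)
MULTI_SELECTED = -5   # several boxes ticked on a single-response question (paper)

def edit_single_response(ticked, routed_to_question):
    """ticked: list of response codes ticked on the paper form for one question."""
    if not routed_to_question:
        return NOT_APPLICABLE
    if not ticked:
        return NOT_ANSWERED
    if len(ticked) > 1:
        return MULTI_SELECTED
    return ticked[0]

def edit_multi_response(ticked, options, exclusive, routed_to_question):
    """Return a 0/1 flag per option, dropping the exclusive code if mixed with others."""
    if not routed_to_question:
        return {opt: NOT_APPLICABLE for opt in options}
    if not ticked:
        return {opt: NOT_ANSWERED for opt in options}
    flags = {opt: int(opt in ticked) for opt in options}
    if exclusive in ticked and len(ticked) > 1:
        flags[exclusive] = 0  # keep the valid codes, clear "None of these"
    return flags

print(edit_single_response([2, 4], routed_to_question=True))
print(edit_multi_response({"A", "None"}, ["A", "B", "None"], "None", True))
```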

Other, more specific data edits were also made, as described below:

  1. Additional edits to library question: The question CLIBRARY1 was formatted differently in the online script and paper questionnaire. In the online script it was set up as one multiple-response question, while in the paper questionnaire it consisted of two separate questions (Q21 and Q25). During data checking, it was found that many paper questionnaire respondents followed the instructions to move on from Q21 and Q25 without ticking the “No” response. To account for this, the following data edits were made:

    a. If CFRELIB12 and CPARLI12B were not answered and CNLIWHYA was answered, CLIBRARY1_001 was set to 0 if it was left blank.

    b. If CFRELIDIG and CDIGLI12 were not answered and CNLIWHYAD was answered, CLIBRARY1_002 was set to 0 if it was left blank.

    c. CLIBRARY1_003 and CLIBRARY1_004 were set to 0 for all paper questionnaire respondents.

  2. Additional edits to grid questions: Due to the way the paper questionnaire was set up, additional edits were needed for the following linked grid questions: CARTS1/CARTS1A, CARTS2/CARTS2A, CARTS3/CARTS3A, CARTS4/CARTS4A, ARTPART12/ARTPART12A.

Figure 1 shows an example of a section in the paper questionnaire asking about attendance at arts events.

Figure 1: Example of the CARTS1 and CARTS1A section in the paper questionnaire

Marking the option “Not in the last 12 months” on the paper questionnaire was equivalent to the code “0: Have not done this” at CARTS1 in the online script. As such, leaving this option blank in the questionnaire would result in CARTS1 being given a default value of “1” in the final dataset. In cases where a paper questionnaire respondent had neglected to select any of the options in a given row, CARTS1 was recoded from “1” to “0”.

If the paper questionnaire respondent did not tick any of the boxes on the page, they were recoded to “-4: Not answered but should have (paper)”.
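The grid recode described above can be sketched as follows. The row labels and input shape are hypothetical, and the handling of a row where both “Not in the last 12 months” and another box are ticked (coded 0 here) is an assumption that the text does not cover.

```python
NOT_ANSWERED = -4
NOT_IN_LAST_12_MONTHS = "Not in the last 12 months"

def recode_carts1_page(page):
    """page maps each grid row to the set of boxes ticked in that row."""
    if not any(page.values()):
        # Nothing ticked anywhere on the page: not answered but should have.
        return {row: NOT_ANSWERED for row in page}
    recoded = {}
    for row, ticks in page.items():
        if not ticks or NOT_IN_LAST_12_MONTHS in ticks:
            recoded[row] = 0  # blank row or explicit "not in the last 12 months"
        else:
            recoded[row] = 1
    return recoded

page = {
    "row_1": {"3-4 times"},
    "row_2": set(),
    "row_3": {NOT_IN_LAST_12_MONTHS},
}
print(recode_carts1_page(page))
print(recode_carts1_page({"row_1": set(), "row_2": set()}))
```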

5.4 Coding

Post-interview coding was undertaken by members of the Verian (formerly Kantar Public) coding department. The coding department coded verbatim responses recorded for ‘other specify’ questions.

For example, if a respondent selected “Other” at CARTS1 and wrote text that said they went to some type of live music event, in the data they would be back-coded as having attended “a live music event” at CARTS1_006.

For the sets CARTS1/CARTS1A/CARTS1B, CARTS2/CARTS2A/CARTS2B and CHERVIS12/CFREHER12/CVOLHER, data edits were made to move responses coded to “Other” to the correct response code, if the answer could be back-coded to an existing response code.

5.5 Data outputs

Once the checks were complete a final SPSS data file was created that only contained valid interviews and edited data. For 2023-24 the data file has a prefix of “Y3_” added to variable names to indicate this is the 2023-24 survey and there have been substantial changes to the questionnaire compared to last year.

From this dataset, a set of data tables was produced. Due to the changes to the questionnaire structure, the tables have also been updated accordingly. Notably, the measures for “Engaged with heritage physically or digitally” and “Engaged with heritage physically and digitally” from table 3a can no longer be derived in this quarter [footnote 8].

5.6 Weighting

A three-step weighting process was used to compensate for differences in both sampling probability and response probability:

  1. An address design weight was created equal to one divided by the sampling probability; this also served as the individual-level design weight because all resident adults could respond.

  2. The expected number of responses per address was modelled as a function of data available at the neighbourhood and address levels. The step two weight was equal to one divided by the predicted number of responses.

  3. The product of the first two steps was used as the input for the final step to calibrate the sample. The responding sample was calibrated to the January-March 2023 Labour Force Survey (LFS) with respect to: (i) sex by age, (ii) educational level by age, (iii) ethnic group, (iv) housing tenure, (v) ITL2 region, (vi) employment status by age, (vii) household size, (viii) presence of children in the household, and (ix) internet use by age.

An equivalent weight was also produced for the (majority) subset of respondents who completed the survey by web. This weight was needed because a few items were included in the web questionnaire but not the paper questionnaire.

It should be noted that the weighting only corrects for observed bias (for the set of variables included in the weighting matrix) and there is a risk of unobserved bias. Furthermore, the raking algorithm used for the weighting only ensures that the sample margins match the population margins. There is no guarantee that the weights will correct for bias in the relationships between the variables.
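To illustrate the three steps and the point about margins, here is a minimal raking (iterative proportional fitting) sketch. Everything in it is hypothetical: the five respondents, their sampling probabilities and predicted response counts, the two calibration variables and the population totals. It is not the production weighting, which calibrated to nine sets of LFS margins.

```python
def rake(records, start_weights, targets, iterations=100):
    """Scale weights so weighted totals match each target margin in turn."""
    weights = list(start_weights)
    for _ in range(iterations):
        for variable, margin in targets.items():
            totals = {}
            for record, w in zip(records, weights):
                totals[record[variable]] = totals.get(record[variable], 0.0) + w
            weights = [
                w * margin[record[variable]] / totals[record[variable]]
                for record, w in zip(records, weights)
            ]
    return weights

def weighted_margin(records, weights, variable):
    totals = {}
    for record, w in zip(records, weights):
        totals[record[variable]] = totals.get(record[variable], 0.0) + w
    return totals

records = [
    {"sex": "F", "age": "16-44"},
    {"sex": "F", "age": "45+"},
    {"sex": "M", "age": "16-44"},
    {"sex": "M", "age": "45+"},
    {"sex": "F", "age": "45+"},
]
sampling_prob = [0.02, 0.02, 0.01, 0.01, 0.02]       # step 1 input
predicted_responses = [0.5, 0.4, 0.5, 0.25, 0.4]     # step 2 input

# Steps 1 and 2: design weight times the inverse of the predicted responses.
start = [(1 / p) * (1 / r) for p, r in zip(sampling_prob, predicted_responses)]

# Step 3: calibrate to made-up population margins.
targets = {"sex": {"F": 510.0, "M": 490.0}, "age": {"16-44": 450.0, "45+": 550.0}}
final = rake(records, start, targets)
print(weighted_margin(records, final, "sex"))
print(weighted_margin(records, final, "age"))
```

The weighted sex and age totals match the targets after raking, but nothing constrains the sex-by-age cells themselves, which is the limitation noted above.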

The final weight variables in the dataset are:

  • ‘Finalweight’ – to be used when analysing data available from both the web and paper questionnaires.

  • ‘Finalweightweb’ – to be used when analysing data available only from the web questionnaire.

Footnotes

  1. In February 2023, there was a Machinery of Government (MoG) change and responsibility for digital policy now sits within the Department for Science, Innovation and Technology (DSIT). This MoG change did not affect the contents of the Participation Survey for 2023/24; digital questions are still part of the survey.

  2. Let’s Create, a strategic vision by ACE, sets out that by 2030 they want England to be a country in which the creativity of each of us is valued and given the chance to flourish and where everyone has access to a remarkable range of high quality cultural experiences. They invest public money from the government and The National Lottery to help support the sector and to deliver this vision. 

  3. International Territorial Level (ITL) is a geocode standard for referencing the subdivisions of the United Kingdom for statistical purposes, used by the Office for National Statistics (ONS). Since 1 January 2021, the ONS has encouraged the use of ITL as a replacement to Nomenclature of Territorial Units for Statistics (NUTS), with lookups between NUTS and ITL maintained and published until 2023. 

  4. The effective sample size represents the statistical value of the sample after applying weights to compensate for the variation in address sampling probabilities within each ITL2 region. 

  5. In the event, the interval between first and second replicates was three weeks and between second and third replicates, the interval was one and a half weeks. 

  6. Response rates were calculated via the standard ABOS method. An estimated 8% of ‘small user’ PAF addresses in England are assumed to be non-residential (derived from interviewer-administered surveys). The average number of adults aged 16+ per residential household, based on the Labour Force Survey, is 1.89. Thus, the response rate formulas are: Household RR = number of responding households / (number of issued addresses × 0.92); Individual RR = number of responses / (number of issued addresses × 0.92 × 1.89). The conversion rate is the simple ratio of the number of responses to the number of issued addresses.

  7. Interview lengths under 2 minutes are removed, and the remaining lengths are capped at the 97th percentile. Interviews under 10 minutes are flagged in the system for the research team to evaluate; those that are also flagged by other fraud checks are removed.

  8. Due to an oversight when allocating questions to different split sample modules, the physical heritage questions were asked of one subset of respondents, whilst the digital heritage questions were asked of a different subset of respondents. This means we cannot produce a figure for total heritage engagement (physical or digital) or a figure for engaging both physically and digitally in Q1 and Q2. We are in the process of rectifying this for Q3 and Q4.