Monitoring ambient air: data analysis techniques

A summary of the main methods you should use to analyse ambient air monitoring data.

Directional analysis

Directional analysis techniques provide an insight into the direction and nature of pollution sources relative to the position of the monitoring location. These techniques assume that the wind direction recorded at the monitoring site is representative of the local wind trajectory. The following directional analysis techniques are available.

Pollution rose analysis

Plotting a ‘pollution rose’ can provide an insight into the direction of pollution sources relative to the position of the monitoring location. To achieve an effective pollution rose, you must collect concentration measurements concurrently with wind direction measurements.

You should optimise the averaging period of these measurements so that the wind direction value is representative of the concentration measurement averaging period, for example 15 minutes or hourly. If the duration of each averaged wind direction value is too long, the variation of actual instantaneous wind directions within that period will cause a greater degree of uncertainty in the evaluation. The averaging time must also not be significantly shorter than the expected time that it would take for the pollutant to travel from source to monitor. If it is, the wind direction measured at the time of the pollutant measurement may not be representative of the average wind direction during the travel time of the pollutant to the monitor.

You can create the pollution rose by first dividing the data into wind direction sectors. Then plot the wind direction sector against the mean of all the concentration measurements taken when the wind was coming from that sector. The presentation of this can be a conventional bar chart or can be a radial plot. A radial plot tends to be easier to interpret as directional biases in concentration appear to point in the direction of the sources. It is this radial representation of the data from which the name ‘rose’ comes.
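The sector-averaging step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name `pollution_rose`, the 30° sector width, the choice to start sectors at 0° rather than centre them on north, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict

def pollution_rose(records, sector_width=30.0):
    """Return {sector_start_degrees: mean concentration}.

    records: iterable of (wind_direction_degrees, concentration) pairs
    measured over the same averaging period.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for direction, concentration in records:
        # Wrap the direction into [0, 360) and find the lower edge of its sector.
        sector = int((direction % 360.0) // sector_width) * sector_width
        sums[sector] += concentration
        counts[sector] += 1
    return {s: sums[s] / counts[s] for s in sorted(sums)}

# Made-up illustrative values, not real monitoring data.
sample = [(10, 12.0), (20, 18.0), (95, 40.0), (100, 44.0), (359, 9.0), (360, 15.0)]
rose = pollution_rose(sample)
```

The resulting dictionary holds one mean per sector, which you can then draw as a bar chart or a radial plot in the plotting tool of your choice.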

Plotting 2 pollution roses on the same figure makes it easier to compare the data visually. For example, a plot of mean pollution roses for PM₁₀ and PM₂.₅ can show if there is a prominent source of either PM₁₀ or PM₂.₅. A prominent source of PM₁₀ but not PM₂.₅ would indicate that the source is not a combustion source.

Triangulation analysis

Multiple monitoring locations can help pinpoint sources of emission within a given area. For example, PM₁₀ pollution roses for 5 different monitoring locations in and around a large industrial site may show that each of the pollution roses has a bias that points toward the site.

This technique can aid the identification of the processes within the site that are producing the greatest PM₁₀ emissions. This information helps decide where to allocate mitigation effort so that it has the greatest impact in reducing emissions.

Percentile rose analysis

Percentile analysis provides a method of looking at the distribution of concentrations within a data set. Microsoft Excel calculates percentiles by first sorting the concentrations into ascending order and then ranking each concentration. It then interpolates the value of a particular percentile from the calculated ranking, by calculating the concentration below which a certain percentage of concentrations fall. For example, at the 95th percentile, 95% of the data will be below this value and 5% of the data will be above it.
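The sort, rank and interpolate steps can be written out directly. The sketch below assumes the inclusive ranking convention, where the fractional rank runs from the lowest to the highest value in the data set; other percentile conventions exist and give slightly different results on small data sets. The function name `percentile_inclusive` and the sample values are hypothetical.

```python
def percentile_inclusive(values, p):
    """Return the p-th percentile (0 to 100) by linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)                 # ascending order
    # Fractional rank on a 0 .. n-1 scale (assumed inclusive convention).
    rank = (p / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    # Interpolate between the two ranked values either side of the rank.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# Made-up illustrative concentrations.
concentrations = [10.0, 20.0, 30.0, 40.0, 50.0]
median = percentile_inclusive(concentrations, 50)
p95 = percentile_inclusive(concentrations, 95)
```

With these five values the 95th percentile falls four fifths of the way between the fourth and fifth ranked concentrations.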

To produce radial percentile roses, first divide the data into the required wind sectors. Then the data in each sector undergoes separate percentile analysis. This shows the concentration of a pollutant at different percentiles for different wind sectors. You can then visually examine the distribution of pollutant concentrations at a particular monitoring site. This in turn will provide information on the source that may be influencing levels at the monitoring site.

By breaking each 10° wind sector down into several different percentiles you can see whether biases are present in all the percentiles or just certain ones. This can tell you whether a source is affecting the monitoring site relatively continuously or just intermittently. For example, if you observe a bias in all the percentiles in a 10° wind sector, this suggests that the source is emitting relatively continuously, as it is influencing a large percentage of the sector’s data.

A bias that is only observed in the higher percentiles suggests that the source is intermittent. It only affects a small percentage of the sector’s data. So, it does not affect concentrations at the monitoring site every time the wind is coming from this direction. But occasionally you may observe a bias in the lower percentiles that is not evident in the higher percentiles. This suggests that the source is relatively continuous, as it is affecting a large percentage of the data. It also tells you that the source is not causing appreciably high concentrations at the monitoring site.
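As a rough sketch of the percentile rose calculation, the example below groups concentrations into 10° sectors and computes several percentiles for each one using Python's `statistics.quantiles`. The function name `percentile_rose`, the chosen percentiles and the sample data are assumptions for illustration. Sectors with fewer than 2 samples are skipped here because a percentile cannot be interpolated from a single value.

```python
from collections import defaultdict
from statistics import quantiles

def percentile_rose(records, percentiles=(50, 75, 95), sector_width=10.0):
    """Return {sector_start: {percentile: concentration}}.

    records: iterable of (wind_direction_degrees, concentration) pairs.
    percentiles: whole numbers between 1 and 99.
    """
    by_sector = defaultdict(list)
    for direction, concentration in records:
        sector = int((direction % 360.0) // sector_width) * sector_width
        by_sector[sector].append(concentration)

    rose = {}
    for sector, values in sorted(by_sector.items()):
        if len(values) < 2:
            continue  # too few samples to interpolate a percentile
        # 99 cut points; cuts[p - 1] is the p-th percentile.
        cuts = quantiles(values, n=100, method="inclusive")
        rose[sector] = {p: cuts[p - 1] for p in percentiles}
    return rose

# Made-up illustrative values, not real monitoring data.
sample = [(2, 1.0), (4, 2.0), (6, 3.0), (8, 4.0), (9, 5.0), (15, 7.0)]
rose = percentile_rose(sample)
```

Comparing the values across percentiles within one sector is what lets you judge whether a bias is present throughout the distribution or only at the top end.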

Frequency roses

The frequency rose shows, for each wind direction, the number of times, or the fraction of the total time, that concentrations exceed a given level or, conversely, do not reach it.

Conditional probability function (CPF) plots in OpenAir

CPF plots, using the OpenAir software package in R, can help identify the wind direction and wind speeds from which the most prominent pollutant sources are likely to occur.

The CPF calculates the probability that in a particular wind sector, the concentration of a given pollutant is greater than some specified concentration value. This is usually expressed as a high percentile of the species of interest.

The CPF is defined as:

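$$\mathrm{CPF}_{\Delta\theta} = \frac{m_{\Delta\theta}\big|_{C \ge x}}{n_{\Delta\theta}}$$

Where:

$m_{\Delta\theta}$ is the number of samples in the wind sector $\Delta\theta$ having a concentration $C$ greater than or equal to a threshold concentration value $x$
$n_{\Delta\theta}$ is the total number of samples from wind sector $\Delta\theta$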

Conventionally, x is set as a high percentile of concentration (for example, the 75th or 90th percentile). So, CPF indicates the potential for a source region to contribute to high air pollution concentrations.

Therefore, where you have a high number of data points with values at or above your chosen threshold value for a particular wind direction, you will have a higher probability value for that wind direction. CPF analysis is useful for showing which wind directions are dominated by high concentrations, and it gives the probability of this happening.
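The CPF definition translates into a simple count per wind sector. The following is a minimal sketch in plain Python rather than the OpenAir implementation: the function name `cpf`, the 30° sector width and the sample data are assumptions, and the threshold is passed in directly (in practice you would first calculate it as a high percentile of the whole data set).

```python
from collections import defaultdict

def cpf(records, threshold, sector_width=30.0):
    """Return {sector_start: m / n} for each wind sector.

    m is the number of samples in the sector with concentration >= threshold,
    n is the total number of samples in the sector.
    """
    m = defaultdict(int)
    n = defaultdict(int)
    for direction, concentration in records:
        sector = int((direction % 360.0) // sector_width) * sector_width
        n[sector] += 1
        if concentration >= threshold:
            m[sector] += 1
    return {sector: m[sector] / n[sector] for sector in sorted(n)}

# Made-up illustrative values, not real monitoring data.
sample = [(10, 5.0), (20, 50.0), (100, 60.0), (110, 70.0), (200, 5.0), (205, 8.0)]
probabilities = cpf(sample, threshold=50.0)
```

In this sample every measurement from the 90° sector meets the threshold, so that sector has a probability of 1, while the 180° sector has a probability of 0.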

The conditional bivariate probability function (CBPF) couples ordinary CPF with wind speed as a third variable. It allocates the observed pollutant concentration to cells defined by ranges of wind direction and wind speed rather than to only wind direction sectors. It can be defined as:

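$$\mathrm{CBPF}_{\Delta\theta,\Delta u} = \frac{m_{\Delta\theta,\Delta u}\big|_{C \ge x}}{n_{\Delta\theta,\Delta u}}$$

Where:

$m_{\Delta\theta,\Delta u}$ is the number of samples in the wind sector $\Delta\theta$ with wind speed interval $\Delta u$ having a concentration $C$ greater than or equal to the threshold concentration value $x$
$n_{\Delta\theta,\Delta u}$ is the total number of samples in that wind direction-speed interval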

So, where you have a high number of data points with values greater than your chosen threshold concentration value for wind direction and speed, you will have a higher probability value for that wind direction and speed.
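The bivariate case only changes how samples are grouped: each sample is allocated to a cell defined by both a wind direction sector and a wind speed interval. The sketch below is an illustration in plain Python, not the OpenAir implementation. The function name `cbpf`, the 30° and 2 m/s cell sizes, and the sample data are assumptions.

```python
from collections import defaultdict

def cbpf(records, threshold, sector_width=30.0, speed_width=2.0):
    """Return {(sector_start, speed_start): m / n} for each occupied cell.

    records: iterable of (wind_direction_degrees, wind_speed, concentration).
    m counts samples in the cell with concentration >= threshold,
    n counts all samples in the cell.
    """
    m = defaultdict(int)
    n = defaultdict(int)
    for direction, speed, concentration in records:
        cell = (
            int((direction % 360.0) // sector_width) * sector_width,
            int(speed // speed_width) * speed_width,
        )
        n[cell] += 1
        if concentration >= threshold:
            m[cell] += 1
    return {cell: m[cell] / n[cell] for cell in sorted(n)}

# Made-up illustrative values, not real monitoring data.
sample = [
    (100, 1.0, 10.0), (100, 1.5, 12.0),
    (100, 5.0, 60.0), (105, 5.5, 70.0), (100, 4.5, 20.0),
]
result = cbpf(sample, threshold=50.0)
```

Here the high concentrations from the 90° sector appear only in the 4 to 6 m/s cell, which is the kind of wind speed dependence that the univariate CPF cannot show.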

The extension to the bivariate case provides more information on the nature of the sources because different source types can have different wind speed dependencies. The use of a third variable can therefore provide more information on the type of source in question. Note that the third variable plotted on the radial axis does not need to be wind speed. It could, for example, be temperature. The important issue is that the third variable allows some sort of discrimination between source types because of the way they disperse.

Temporal analysis

All temporal analysis of air quality data takes advantage of cycles in the production and dispersion of pollutants. You can enhance each of the techniques listed below by first dividing the data into the relevant wind direction sectors. This allows the isolation of the signal from a particular source from the rest of the data. It also allows closer examination of the characteristics of the sources of interest.

Diurnal variation

Diurnal variation is the averaged temporal distribution of pollutant concentrations through the 24 hours of the day. This involves taking the data for the monitoring period (or part of the monitoring period) and dividing it into 24 sets, each representing data collected within one of the hours of the day. The mean concentration is then plotted against the hour of the day.
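A minimal sketch of the hourly grouping is shown below. It assumes each record pairs a `datetime` timestamp with a concentration; the function name `diurnal_profile` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def diurnal_profile(records):
    """Return {hour_of_day: mean concentration} for (datetime, value) records."""
    by_hour = defaultdict(list)
    for timestamp, concentration in records:
        by_hour[timestamp.hour].append(concentration)
    return {hour: sum(v) / len(v) for hour, v in sorted(by_hour.items())}

# Made-up illustrative values, not real monitoring data.
sample = [
    (datetime(2024, 1, 1, 8), 10.0),
    (datetime(2024, 1, 2, 8), 20.0),
    (datetime(2024, 1, 1, 17), 30.0),
]
profile = diurnal_profile(sample)
```

Hours with no data are simply absent from the result, so check data capture before interpreting the shape of the profile.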

Consideration of the diurnal distribution of concentration levels can provide useful information about the sources contributing to the ambient levels in each sector. Traffic generated pollutants often show a diurnal profile that matches the traffic flow levels. This generally takes the form of a double peak pattern, which corresponds to the morning and evening rush hours.

Industrial emissions are usually characterised by more elevated concentrations during the hours with the greatest amount of solar heating of the ground. This is because of increased convective mixing bringing stack emissions to ground level.

Changes between British Summer Time (BST) and Greenwich Mean Time (GMT) occur twice per year. This provides an opportunity to identify sources whose emissions and inputs are specifically linked to daytime working patterns, and to distinguish them from other more continuously emitting sources, such as some industrial processes.

Weekly variation

Weekly variation is the averaged temporal distribution of pollutant concentrations through each day of the week. The technique divides the data set into the 7 days of the week. This can provide useful information about the working practices of individual process activities within the sector of interest.

Seasonal

This is the temporal distribution of averaged pollutant concentrations for each month of the year. This is relevant because many pollutants exhibit seasonal cycles.
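Weekly and seasonal variation use the same grouped mean, with only the grouping key changing. The sketch below makes that explicit with a hypothetical helper `mean_by` that accepts any function of the timestamp. In Python, `weekday()` returns 0 for Monday through to 6 for Sunday. The sample values are made up.

```python
from collections import defaultdict
from datetime import datetime

def mean_by(records, key):
    """Return {key(timestamp): mean concentration} for (datetime, value) records."""
    groups = defaultdict(list)
    for timestamp, concentration in records:
        groups[key(timestamp)].append(concentration)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

# Made-up illustrative values, not real monitoring data.
sample = [
    (datetime(2024, 1, 1), 40.0),   # Monday
    (datetime(2024, 1, 8), 60.0),   # Monday
    (datetime(2024, 1, 6), 20.0),   # Saturday
    (datetime(2024, 7, 1), 10.0),   # Monday
]
weekly = mean_by(sample, key=lambda t: t.weekday())  # 0 = Monday
monthly = mean_by(sample, key=lambda t: t.month)     # 1 = January
```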

Trend

By performing a regression analysis on statistics such as annual mean and 98th percentile concentrations, it is possible to assess how air quality compares to previous years. You may also identify whether pollution concentrations are changing over time. This may only become meaningful when complete data records extend over 5 years or more.
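One simple form of regression for this purpose is an ordinary least squares line fitted to an annual statistic against the year. The sketch below assumes that approach. The function name `linear_trend` and the annual means are made up, and a real assessment would also consider the uncertainty in the fitted slope.

```python
def linear_trend(years, values):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need at least 2 paired points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx                     # change in concentration per year
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up annual mean concentrations for 5 complete years.
years = [2015, 2016, 2017, 2018, 2019]
annual_means = [30.0, 28.0, 26.0, 24.0, 22.0]
slope, intercept = linear_trend(years, annual_means)
```

A negative slope indicates concentrations falling over the period, in the units of the statistic per year.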

Cumulative sum (Cusum)

Cusum was originally developed to determine deviations from set values. Later work applied it to ambient air pollution data (B. Barratt and others, Investigation into the use of the CUSUM technique in identifying changes in mean air pollution levels following introduction of a traffic management scheme, Atmospheric Environment, vol. 41, p.1784-1791, 2007). This has improved the sensitivity of this statistical technique in identifying change points in ambient pollution levels.
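In its basic form, a cusum is the running total of deviations from a reference value. The sketch below shows only that basic form, not the adapted method in the paper cited above. Using the overall mean as the reference and taking the point of largest absolute cusum as a candidate change point are assumptions for illustration, as are the function names and data.

```python
from itertools import accumulate

def cusum(values, reference):
    """Return the running sum of (value - reference)."""
    return list(accumulate(v - reference for v in values))

def candidate_change_index(cusum_values):
    """Index where the cusum is furthest from zero (last point before a shift)."""
    return max(range(len(cusum_values)), key=lambda i: abs(cusum_values[i]))

# Made-up series with a step change in mean after the fourth value.
series = [10.0, 10.0, 10.0, 10.0, 14.0, 14.0, 14.0, 14.0]
reference = sum(series) / len(series)     # overall mean used as the set value
running = cusum(series, reference)
change_at = candidate_change_index(running)
```

The cusum falls while values sit below the reference and rises once they sit above it, so the turning point marks where the mean level appears to change.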

Concurrence analysis

Concurrence analysis techniques look at the relationship between the pollutant of interest and other data collected at the same time. They look for patterns and trends that can help identify sources. Concurrence methods can focus on two variables (for example, concentration and wind speed) or more than two (for example, concentration, wind speed and wind direction). If a particular source impacts distinctly for a particular range of concurrent variables, use this range to define a conditional window. Then track concentrations to determine trends in the source of the emission. The following are concurrence analysis techniques:

Wind speed versus pollutant concentration analysis

Wind speed plays an important role in the dispersion of air pollutants. Higher wind speeds generate more mechanical turbulence, which has the effect of distributing emissions more rapidly through the mixed boundary layer of the atmosphere. The relative concentrations measured at different wind speeds can provide insight into the nature of contributing sources. Examine and contrast average concentrations measured at different wind speeds for each 45° sector. You may get information about the number, distance, and height of release of impacting emissions.

For example, if the levels of PM₁₀ increase with wind speed but PM₂.₅ does not, this suggests that the sources in these sectors are not combustion sources. This could be the result of wind-blown dust because a rapid rise with wind speed is consistent with wind erosion. If a sector shows levels of PM₁₀ and PM₂.₅ that increase with increasing wind speed, this suggests an elevated combustion source, such as a stack release. If a sector shows levels of PM₁₀ and PM₂.₅ that decrease with increasing wind speed, this suggests that the sources in this wind sector are low level combustion sources, such as traffic emissions.
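To compare how two particulate fractions respond to wind speed, you can average each one within wind speed bins for a single sector. The sketch below assumes the records have already been filtered to one wind sector. The function name `mean_by_wind_speed`, the 2 m/s bin width and the sample values are hypothetical.

```python
from collections import defaultdict

def mean_by_wind_speed(records, speed_width=2.0):
    """Return {speed_bin_start: (mean_pm10, mean_pm25)} for one wind sector.

    records: iterable of (wind_speed, pm10, pm25) from a single sector.
    """
    bins = defaultdict(list)
    for speed, pm10, pm25 in records:
        bins[int(speed // speed_width) * speed_width].append((pm10, pm25))
    return {
        start: (
            sum(row[0] for row in rows) / len(rows),
            sum(row[1] for row in rows) / len(rows),
        )
        for start, rows in sorted(bins.items())
    }

# Made-up illustrative values, not real monitoring data.
sample = [(1.0, 20.0, 10.0), (1.5, 22.0, 10.0), (5.0, 40.0, 11.0), (5.5, 44.0, 9.0)]
means = mean_by_wind_speed(sample)
```

In this sample the mean PM₁₀ doubles between the lowest and highest bins while the mean PM₂.₅ is unchanged, which is the pattern the text associates with wind-blown dust.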

Pollution ratio

This model looks at both NOx and particulate concentrations to establish the amount of particulate coming from traffic sources. (Fuller and others, An empirical approach for the prediction of daily mean PM₁₀ concentrations, Atmospheric Environment vol.36, p1431-1441, 2002).

Episode analysis

Episode or event analysis looks more closely at the timing of high pollutant concentrations to identify the conditions that may have given rise to such events.

For example, peaks of methane and hydrogen sulfide concentrations may occur at night when there is poor dispersion because of nocturnal inversion layers. Nocturnal inversion layers occur during stable conditions when temperature and wind speed decrease. Also, if trends in methane and hydrogen sulfide events follow each other closely, it would suggest that they are both from the same source.
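One way to put a number on how closely two pollutant series follow each other is the Pearson correlation coefficient. Using it here is an assumption for illustration rather than a method named in this guidance, and a high value is only consistent with a common source, not proof of one. The function name `pearson` and the sample values are made up.

```python
from math import sqrt

def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either series is constant.
    return sxy / sqrt(sxx * syy)

# Made-up concurrent measurements with two shared peaks.
methane = [2.0, 2.1, 5.0, 2.0, 6.0]
hydrogen_sulfide = [1.0, 1.05, 2.5, 1.0, 3.0]
r = pearson(methane, hydrogen_sulfide)
```

A value of r close to 1 means the two series rise and fall together over the period examined.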

Ratio analysis

By measuring ratios of a gas of interest with another gas, it may be possible to:

attribute measured emissions to a source
quantify its emissions in more detail
calculate efficiencies of combustion for industrial and biological processes

In the case of methane as a gas of interest:

you may use the ratio of ethane to methane to distinguish biogenic sources from fossil sources
you may use the ratio of acetylene to methane with a tracer release to quantify emissions
you may use the ratio of carbon dioxide to methane to calculate destruction or removal efficiencies, or to attribute quantified emissions to a particular source such as a leaking compressor

Multiple time series analysis

Multiple time series plots are useful for visually showing correlation between individual pollutant events. For example, you can use a time series plot over several months to show trends in several pollutants simultaneously, such as:

hydrogen sulfide
carbon monoxide
oxides of nitrogen
PM₂.₅ and PM₁₀

Contact us

You can contact the Environment Agency if you need any help.

General enquiries

National Customer Contact Centre
PO Box 544
Rotherham
S60 1BY

Email enquiries@environment-agency.gov.uk

Telephone 03708 506 506

Telephone from outside the UK (Monday to Friday, 8am to 6pm GMT) +44 (0) 114 282 5312

Monday to Friday, 8am to 6pm.

Applies to England
