Encoding characters
Updated 9 August 2022
Use UTF-8, an encoding form for Unicode character sets, for government digital services and technology.
1. Summary of the standard’s use for government
Unicode is based on the ASCII character set, but expands ASCII to include characters for most written languages.
UTF-8:
- is one of the encoding forms for Unicode
- encodes all Unicode characters while leaving the encoding of ASCII characters unchanged
This makes UTF-8 flexible for a wide range of uses. For example, the default character encoding in HTML5 is UTF-8.
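As an illustration of this ASCII compatibility, the short Python sketch below encodes the same text as ASCII and as UTF-8 and compares the bytes. The sample strings and variable names are assumptions chosen for the example, not part of the standard.

```python
# Text that only uses ASCII characters produces identical bytes
# whether it is encoded as ASCII or as UTF-8.
ascii_text = "GOV.UK"
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Characters outside ASCII take more than one byte each in UTF-8.
pound = "\u00a3"  # pound sign
euro = "\u20ac"   # euro sign
byte_lengths = {
    "A": len("A".encode("utf-8")),
    "pound": len(pound.encode("utf-8")),
    "euro": len(euro.encode("utf-8")),
}

print(same_bytes)    # True
print(byte_lengths)  # {'A': 1, 'pound': 2, 'euro': 3}
```

Because the ASCII bytes are unchanged, existing ASCII-only files are already valid UTF-8.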
The government chooses standards using the open standards approval process and the Open Standards Board has final approval. Read more about the approval process for cross-platform character encoding.
2. How this standard meets user needs
Users of this standard include:
- publishers of government data
- data scientists
- data analysts
- developers
UTF-8 is an international standard. By using it you can read, write, store and exchange text that remains stable over time and across different systems.
You will also:
- prevent accidental or unanticipated corruption of text as it transfers between systems
- save operational costs by making it easier to find and fix errors in the text
- keep text in any language accurate as it moves between systems
- keep file sizes smaller
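The kind of corruption this avoids can be sketched in Python. The example below is a minimal illustration that assumes a sending system writes UTF-8 and a receiving system wrongly reads the bytes as Latin-1 (ISO-8859-1). The sample text and variable names are hypothetical.

```python
# Sample text: "Café £5", written with escapes so the source stays ASCII.
original = "Caf\u00e9 \u00a35"

# The sending system stores the text as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# A receiving system that assumes Latin-1 turns each byte into its own
# character, so every non-ASCII character becomes two wrong characters.
wrong = utf8_bytes.decode("latin-1")

# A receiving system that also uses UTF-8 recovers the text exactly.
right = utf8_bytes.decode("utf-8")

print(wrong)              # CafÃ© Â£5
print(right == original)  # True
```

When both systems use UTF-8, the text survives the transfer unchanged.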
3. How to use the standard
To use UTF-8 you need to:
- save text in UTF-8 encoding to apply it to your content
- declare the character encoding, for example by following the W3C example of declaring encodings in HTML
- check your server has the correct HTTP declarations so that they do not override your encoding
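The three steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper names (save_as_utf8, charset_from_content_type, header_overrides_utf8), the sample page and the sample header values are all assumptions made for the example. It uses the standard library's email.message.Message only as a convenient parser for a Content-Type header value.

```python
import tempfile
from email.message import Message
from pathlib import Path

# Hypothetical page that declares its encoding with a meta element.
HTML_PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Example</title>
</head>
<body><p>Price: \u00a35</p></body>
</html>
"""


def save_as_utf8(path, text):
    """Save text explicitly as UTF-8 instead of the platform default."""
    path.write_text(text, encoding="utf-8")


def charset_from_content_type(header_value):
    """Return the lowercased charset from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def header_overrides_utf8(header_value):
    """Return True if the HTTP header declares a charset other than UTF-8."""
    charset = charset_from_content_type(header_value)
    return charset is not None and charset != "utf-8"


# Save the page as UTF-8 and read it back with the same encoding.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    save_as_utf8(page, HTML_PAGE)
    round_trip = page.read_text(encoding="utf-8")

print(round_trip == HTML_PAGE)
print(header_overrides_utf8("text/html; charset=ISO-8859-1"))
print(header_overrides_utf8("text/html; charset=UTF-8"))
```

In practice you would pass the Content-Type value that your own server returns to header_overrides_utf8. A result of True suggests the server configuration needs changing so that it matches the encoding declared in the page.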
Read the W3.org article on migrating to Unicode for more information.