Digital and Social Media Research
10 Use Cases
Apple has used federated learning to train the voice recognition software behind its AI assistant, Siri. A local model is trained on an individual’s iPhone, and the resulting model weights are periodically sent back to a central server, which builds a global model by aggregating the weights from the local models. This global model is pushed out to users’ iPhones, and the process repeats. Noise is injected during the training of the local model so that it is differentially private, which mitigates the risk of re-identification. Using this system, Siri can learn to recognise the voice of the iPhone’s owner and respond only to them, without Apple collecting any raw data relating to the user’s voice.
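The train, aggregate and redistribute cycle described above can be sketched in a few lines of Python. This is a minimal illustration and not Apple's implementation: the linear model, the learning rate, the number of local steps and the noise scale are all assumptions chosen for readability, and the Gaussian noise here is not calibrated to any formal privacy budget.

```python
import random

def local_update(weights, data, lr=0.05, steps=5, noise_std=0.01, rng=random):
    """Train a linear model on one device's data, then add noise to the weights.

    `data` is a list of (features, target) pairs that never leaves the device.
    All hyperparameters are illustrative assumptions.
    """
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    # Only these noisy weights are sent to the server, never the raw data.
    return [wi + rng.gauss(0.0, noise_std) for wi in w]

def aggregate(updates):
    """Server side: average the weights received from all devices."""
    return [sum(u[i] for u in updates) / len(updates) for i in range(len(updates[0]))]

def federated_round(global_weights, devices, **kwargs):
    """One cycle: push the global model out, train locally, aggregate."""
    updates = [local_update(global_weights, data, **kwargs) for data in devices]
    return aggregate(updates)
```

Calling `federated_round` repeatedly, each time starting from the previous result, mirrors the loop in which the global model is pushed out and retrained.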
In 2018, Facebook established an initiative to provide researchers access to data in order to study the role of social media in elections and democratic discourse. Data was shared with 60 researchers and consisted of links that had been shared publicly on Facebook by at least 100 unique Facebook users. In 2020, the size of the shared dataset was substantially increased to include approximately 38 million such links with new aggregated information to help researchers analyse how many people saw these links on Facebook and how they interacted with that content – including views, clicks, shares, likes, and other reactions. The data shared was also aggregated by age, gender, country, and month. Facebook leveraged differential privacy to provide privacy guarantees to individuals in the dataset.
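As an illustration of how aggregated counts can be released with a differential privacy guarantee, the sketch below applies the textbook Laplace mechanism. The source does not say which mechanism Facebook used, so the choice of Laplace noise, the `epsilon` parameter and the sensitivity of 1 (each person affects at most one count by at most one) are assumptions made for this example.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_counts(counts, epsilon, sensitivity=1.0, rng=random):
    """Add Laplace noise with scale sensitivity / epsilon to every count.

    `counts` maps an aggregation cell, e.g. (age band, country), to a true count.
    Assumption: one person changes at most one cell by at most `sensitivity`.
    """
    scale = sensitivity / epsilon
    return {cell: value + laplace_noise(scale, rng) for cell, value in counts.items()}

# Hypothetical example data: views of one link, aggregated by age band and country.
views = {("18-24", "GB"): 1200, ("25-34", "GB"): 3400}
noisy_views = private_counts(views, epsilon=0.5)
```

A smaller `epsilon` gives a stronger privacy guarantee at the cost of noisier counts.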
GBoard is a keyboard app for Android and iOS devices. It features next-word prediction, driven by a machine learning model. GBoard uses federated learning: each mobile device downloads an initial model from a central server and trains it further on the device using data local to that device. The weights of the resulting model are periodically communicated back to the central server using a secure aggregation protocol (a form of multi-party computation), and the server combines the weights received from all mobile devices into a new common model. Devices download this new model, and the cycle repeats, such that the model is continuously trained without collecting user data centrally.
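The idea behind secure aggregation can be sketched with pairwise masks that cancel when the server adds up all submissions. This is a simplified illustration and not Google's protocol: it assumes integer (quantised) weights, no device drops out, and it simulates the pairwise shared secrets with a single seeded generator, where a real protocol would derive them through key agreement between devices.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value (an assumption)

def pairwise_masks(num_clients, length, seed=0):
    """Return one mask per client; the masks sum to zero modulo MODULUS.

    For each pair (i, j), client i adds a shared random vector and client j
    subtracts it. The single seeded RNG stands in for pairwise key agreement.
    """
    rng = random.Random(seed)
    masks = [[0] * length for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(length)]
            for k in range(length):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks

def mask_update(update, mask):
    """Client side: hide the real weights before sending them to the server."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]

def server_sum(masked_updates):
    """Server side: the masks cancel, leaving only the sum of the updates."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]
```

The server sees only masked vectors, each of which looks random on its own, yet their sum equals the sum of the true updates.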
In order to compute accurate conversion rates from advertisements to actual purchases, Google computes the size of the intersection between the list of people shown an advertisement and the list of people who actually purchased the advertised goods. When the goods are not purchased online, the purchase cannot be linked to the advertisement by tracking, so Google and the company paying for the advertisement must each contribute their respective lists in order to compute the intersection size. To compute this without revealing anything but the size of the intersection, Google uses a protocol for privacy-preserving set intersection. Although this protocol is far from the most efficient known today, it is simple and meets their computational requirements.
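One classic and simple way to compute only the size of an intersection uses commutative exponentiation: each party raises hashed identifiers to its own secret power, and items match only after both powers have been applied. The source does not specify which protocol Google uses, so the construction below is an assumed example. The 127-bit modulus is a toy parameter and is not secure, and both parties are simulated in a single function for brevity.

```python
import hashlib
import math
import random

P = 2**127 - 1  # a Mersenne prime; toy-sized and NOT secure
_rng = random.SystemRandom()

def keygen():
    """Pick a secret exponent coprime to P - 1 so that exponentiation is a bijection."""
    while True:
        k = _rng.randrange(2, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k

def hash_to_group(item):
    """Map an identifier to a nonzero element modulo P."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def encrypt(values, key):
    """Raise each value to the secret key and shuffle to hide the ordering."""
    out = [pow(v, key, P) for v in values]
    _rng.shuffle(out)
    return out

def intersection_size(shown_ids, purchaser_ids):
    """Simulate both parties; only doubly encrypted values are ever compared."""
    a, b = keygen(), keygen()  # a: ad platform's key, b: advertiser's key
    shown_a = encrypt([hash_to_group(x) for x in shown_ids], a)      # platform -> advertiser
    purch_b = encrypt([hash_to_group(y) for y in purchaser_ids], b)  # advertiser -> platform
    shown_ab = encrypt(shown_a, b)  # advertiser applies its key, returns the result
    purch_ab = encrypt(purch_b, a)  # platform applies its key locally
    return len(set(shown_ab) & set(purch_ab))
```

Because (h^a)^b equals (h^b)^a modulo P, an identifier present in both lists yields the same doubly encrypted value, while the shuffling prevents either party from learning which items matched.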
Keyless is a cybersecurity platform that provides privacy-first passwordless authentication and personal identity management solutions for enterprises. It combines biometrics with PETs and a distributed cloud network, which means that enterprises no longer need to centrally store and manage passwords, biometric data and other personally identifiable information. The underlying technology is secure multi-party computation, which enables multiple cloud servers to jointly process identity authentication requests without disclosing any data to one another. As such, companies are able to comply with authentication requirements while incurring minimal data protection risk.
The Confidential Consortium Blockchain Framework (CCBF) is a system that uses trusted execution environments to provide confidentiality within a blockchain network. Blockchains were designed to prevent malicious behaviour by recording all transactions, making them open for all to see and replicating them across hundreds of decentralised nodes for integrity. Within CCBF, confidentiality is provided by trusted execution environments (TEEs) that can process transactions that have been encrypted using keys accessible only to a CCBF node of a specific CCBF service. Besides confidentiality, TEEs also provide publicly verifiable artefacts, called quotes, that certify that the TEE is running specific code. Hence, the integrity of transaction evaluation in CCBF can be verified via quotes, and the evaluation need not be replicated across mutually untrusted nodes as is done in public blockchains. It is worth pointing out that transaction data is still replicated in CCBF across a small network of nodes, each executing in a TEE, but for the purpose of fault tolerance rather than integrity. In addition, Microsoft’s tests showed that the CCBF could process 50,000+ transactions per second, demonstrating the scalability of the technology. As a comparison, the public Ethereum blockchain network has an average processing rate of 20 transactions per second, whilst the Visa credit card processing system averages 2,000 transactions per second. The framework is not a standalone blockchain protocol; rather, it provides trusted foundations that can support any existing blockchain protocol.
Microsoft Viva is an Employee Experience Platform (EXP) that brings together communications, learning, resources and analytics. There are four sub-services on the platform, called Insights, Topics, Learning and Connections. Viva Insights gives employees and managers personalised and actionable insights on various organisational metrics that can help drive productivity and wellbeing. The Insights tool uses safeguards such as de-identification and differential privacy by default, so that personal insights are available only to the individual concerned and not to managers or leaders of an organisation.
Microsoft has rolled out Password Generator and Password Monitor features in its Edge browser, using a homomorphic encryption service. The password manager collects saved passwords in one place, and the monitor alerts the user if any of them have been compromised. Because the data is homomorphically encrypted, Microsoft never has to decrypt it, and so never has access to the actual credentials, but is still able to run queries against it.
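To show how a server can answer a query over data it cannot read, the sketch below uses a toy version of the Paillier cryptosystem, which is additively homomorphic. This is not the scheme or service Microsoft uses, which the source does not describe. The primes are far too small for real use, and the function names, the hashing step and the blinded equality test are assumptions made for illustration.

```python
import hashlib
import math
import secrets

P, Q = 2**31 - 1, 2**61 - 1  # known Mersenne primes; far too small for real use

def keygen():
    """Return the public modulus n and the private key (lam, mu)."""
    n = P * Q
    lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def rand_unit(n):
    """Random value in [1, n) that is coprime to n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r

def encrypt(n, m):
    # (1 + n)^m equals 1 + m*n modulo n^2, so no large exponentiation of g is needed.
    return (1 + m * n) * pow(rand_unit(n), n, n * n) % (n * n)

def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def blind_difference(n, enc_x, y):
    """Server side: from Enc(x) and its own plaintext y, compute Enc(r * (x - y))."""
    enc_diff = enc_x * encrypt(n, (-y) % n) % (n * n)  # Enc(x) * Enc(-y) = Enc(x - y)
    return pow(enc_diff, rand_unit(n), n * n)          # Enc(d)^r = Enc(r * d)

def fingerprint(password, n):
    """Hash a password to a number below n (hypothetical helper)."""
    return int.from_bytes(hashlib.sha256(password.encode()).digest(), "big") % n

def is_compromised(password, breached):
    """Client checks a password against the server's breach list; both roles simulated."""
    n, priv = keygen()
    enc_x = encrypt(n, fingerprint(password, n))  # client -> server
    replies = [blind_difference(n, enc_x, fingerprint(b, n)) for b in breached]
    return any(decrypt(n, priv, c) == 0 for c in replies)  # zero only on a match
```

The server only ever handles the ciphertext `enc_x`, and the client learns nothing about non-matching entries because each difference is multiplied by a fresh random value before it is returned.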
Signal is an open-source, privacy-focused instant messaging app. Signal provides end-to-end encryption of messages, and beyond this aims to collect as little information about its users as possible. The only information stored is a user’s phone number, as this is required to register with the service. Signal leverages novel security technologies in order to provide features expected by users without collecting data about them. One such example is its use of trusted execution environments (TEEs), namely Intel SGX, to allow contact information from a user’s phone to be used to find their contacts who are also on Signal. A server-side contact discovery service runs inside the TEE. A user uploads their contact information to this service, the service looks for matches in Signal’s database of registered users, and information about these matches is returned to the user. Contact information is only decrypted inside the isolated TEE, meaning Signal has no visibility of it. Additionally, SGX supports remote attestation, meaning the client is able to verify that the code running inside the TEE is the expected contact discovery service before using it.
Social media company Twitter and open-source community OpenMined are working together to pilot methods of improving transparency in social media by giving researchers privacy-enabled access to algorithmic code. OpenMined is exploring and implementing the infrastructure for this access, which includes a variety of PETs. One significant example is the use of “remote execution” environments, that is, federated learning and analytics. Using these PETs, researchers can run code that trains machine learning algorithms and generates data analytics in a federated way across a network, which curtails the risks that come with a central actor controlling the data throughout the machine learning and analytics process. Initial outcomes of this work have demonstrated that it is possible for researchers to run queries across vast social media algorithms without directly accessing the data themselves. There are now plans to scale these methods (using federated analytics and learning in addition to other PETs such as differential privacy) across multiple online platforms.