Privacy Enhancing Technologies Evolution Series (Part 3) Event Summary

This is a summary of the IAB Tech Lab's event on December 8th, 2022, the third session in a series on the topic of Privacy Enhancing Technologies (PETs). You can watch the event video here.

For an introduction to PETs, including what they are and an overview of the different types of technologies they include, please refer to the part 1 video and write-up of the IAB Tech Lab's Privacy Enhancing Technologies Evolution series, and the part 2 video and write-up.

Key Takeaways

  • This event featured a range of viable, currently available PETs-based solutions that demonstrate the long-term feasibility of the technology. As IAB Tech Lab's Shailley Singh said, "we are seeing a lot of green shoots in the industry… companies are making use of technology and making it available today for people to benefit."
  • Even amongst the most challenging use cases, such as measurement and attribution, there are in-market examples of PETs and clean rooms that are already enabling new privacy-safe approaches. The future looks optimistic, as the cost of running PETs continues to fall thanks to ongoing investment.
  • Collaboration across the industry is crucial to continue the advancement of PETs and the solutions that deploy them. A broad group of players, both internal and external, needs to be involved to ensure success. Industry groups such as IAB Tech Lab's offer a space to participate, and throughout the event technology companies called for direct feedback on their proprietary initiatives.
  • Most solutions do not single out one PET over another; more often, multiple PETs are used together in an overall approach that balances privacy and utility. As Meta's Sanjay Saravanan said, "It's not about one technology vs another technology. It could be a combination of technologies that makes it work."
  • PETs are still early in the adoption curve, but some marketers are starting to understand the need to change their approach. Because marketers ultimately care most about the impact of PETs on cost and performance, use-case-focused education programs are still needed to move the industry away from user-level data.

IAB Tech Lab Introduction

Anthony Katsur, Chief Executive Officer, IAB Tech Lab

The CEO of IAB Tech Lab, Anthony Katsur opened the event by explaining that as an organization, Tech Lab expects PETs to play a key role in the advertising ecosystem going forward, as they already do in other industries. He emphasized that PETs look set to play a role alongside other privacy frameworks to help "create a more private and secured digital supply chain while maintaining some form of addressability" which, in its current form, is shrinking due to changes in access to traditional identifiers, consumer expectations and the regulatory landscape.

The work of IAB Tech Lab in 2022 has focused on education and awareness, shifting to action in 2023, with the expectation that in 2024 PETs will hit “the flywheel of adoption”.  

Lastly, Anthony outlined the progress of the clean room standards being developed by the REARC Addressability working group, emphasizing that these are living, breathing specs and a work in progress likely to be published in February 2023. Other projects in process through the PETs working group include creating awareness and evaluating the major browser PET proposals. Anthony called for a wider base of participants to engage in this group, encouraging more feedback and discourse. To get involved in Tech Lab's PETs working group, please visit the PETs working group sign up request.

Moving Towards Optimal Privacy & Utility

Sanjay Saravanan, Research Scientist Manager, Statistics & Privacy R&D, Meta

Sanjay’s session focused on the advances that have been made in the advertising industry in the PET space. Using three examples he showed how real-life progress has been made towards achieving the optimal state between privacy and utility, whilst satisfying business requirements.

Following an overview of the benefits and challenges of the major types of PETs, including on-device analytics & learning, secure hardware, and secure multi-party computation (MPC), Sanjay used examples of in-market PETs built on these technologies to illustrate the rate of advancement.

Sanjay first focused on Meta's Private Lift solution as an example. Launched in 2018, Private Lift aims to measure the incremental lift of ads on Meta. The most technically challenging element of MPC has been scaling private matching, where records are linked between multiple record sets without sharing PII back and forth. To do this private matching, Meta's Private Lift uses Private ID, an open-source initiative. In 2018, the first sample query of 10k rows took 7 to 8 hours to run; by 2022, Private Lift could compute 1m rows in only 2 minutes at a cost of $1, equal to 0.01% of the campaign cost to generate a report. Sanjay emphasized that this progress busts "the myth that MPC or PETs are too expensive, and with continued investment we are able to show that we can reduce cost".
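
To make private matching concrete, here is a minimal sketch of DH-style commutative blinding, the general idea behind open-source protocols like Private ID. It is a toy, not Meta's implementation: real systems use elliptic-curve groups, proper hashing-to-curve, and audited MPC libraries, and each party applies its key on its own side of the exchange rather than in one process as shown here.

```python
# Toy DH-style private matching: both parties blind hashed identifiers with
# secret exponents; because exponentiation commutes, doubly blinded values
# from either side can be compared without exposing raw hashes or PII.
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime standing in for a proper prime-order group

def h(record: str) -> int:
    """Hash a normalized identifier (e.g., lowercased email) into the group."""
    return int.from_bytes(hashlib.sha256(record.encode()).digest(), "big") % P

def blind(values: set, key: int) -> set:
    """(x^a)^b == (x^b)^a (mod P), so blinding order doesn't matter."""
    return {pow(v, key, P) for v in values}

adv_key = secrets.randbelow(P - 2) + 1   # advertiser's secret exponent
pub_key = secrets.randbelow(P - 2) + 1   # publisher's secret exponent

advertiser = {h(e) for e in ["alice@example.com", "bob@example.com"]}
publisher = {h(e) for e in ["bob@example.com", "carol@example.com"]}

# Each side blinds its own list, exchanges it, and the other side blinds again;
# only doubly blinded values are intersected.
overlap = blind(blind(advertiser, adv_key), pub_key) & blind(blind(publisher, pub_key), adv_key)
print("matched records:", len(overlap))  # -> 1, without revealing which record
```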

The second example Sanjay described came from the World Federation of Advertisers (WFA) Cross Media Measurement Framework, which also uses MPC, but aims to provide deduplicated reach and frequency measurement across media types and platforms. This initiative is explained in more detail in the final presentation of the event. Here, private matching is done through a double-blind panel exchange protocol, which is also open source. In this example, viability is proven again: this approach costs $15 to compute 1m rows in 4 minutes, equal to 0.08% of the campaign cost to generate a monthly report.

The last example, the joint initiative from Mozilla and Meta, Interoperable Private Attribution (IPA), is a newer project which aims to enable cross-device attribution and interoperability across platforms. Here, the cloud cost to compute is currently $50 for 1m rows, taking 100 minutes, equal to 0.7% of the campaign cost to generate an attribution report. Though on the face of it these costs look higher than the other two examples, this is a newer initiative, and the expectation is for these costs to come down as investment in engineering increases.

Sanjay closed the session by listing three learnings:

  1. At the start, all technologies look hard, expensive, and lengthy, but investing and optimizing over time brings down costs. PETs need continued investment to make them viable for ads.
  2. This is not just an engineering problem. Having the right teams in place, such as lawyers and policymakers, and engaging with them early on is just as important for success.
  3. Continued testing is the only way to make progress, and collaborating across the industry is key to future advancement.

Matching & Activation in Clean Rooms: Need for a Standard Specification

Andrei Lapets, Vice President, Engineering & Applied Cryptography, Magnite
Bosko Milekic, Chief Product Officer, Optable

Andrei and Bosko focused on sharing information on another initiative, the Open Private Join protocol, that is being designed by IAB Tech Lab members across the PETs working group and the REARC Addressability working group.

Andrei explained that the objective of the project was to "create a punching bag to mould into something we can use to practice addressing some of the challenges [of how to evaluate standards and work towards interoperability]." The use case covers two operations in one: privacy-protecting targeting, and activation that allows SSPs/DSPs to use the protocol to "avoid leaking audience PII from advertiser to publisher".

After many conversations, the project's design goals recently crystallized into the following three:

  1. Security of PII, i.e., not revealing raw data.
  2. Privacy of User Identity, i.e., not revealing a user's identity (even from non-PII data sources, such as just one column of data).
  3. Privacy of Audience Membership, i.e., not revealing whether a user is in or out of an audience.

Bosko explained that most clean rooms can offer an activation solution, but this initiative focuses on the third design goal, maintaining the privacy of audience membership, which is not maintained in a typical activation. In the architecture of OPJA (Open Private Join & Activation), as pictured below, at the center is the concept of a matching system that could be enabled by a clean room, but also potentially by other operators such as the DSP.

In terms of activation, one mechanism under consideration requires activation data in the form of 'encrypted labels' generated by the matching system, where each publisher gets back encrypted labels after the match. It is important to remember the goal here is not to produce an imposed or prescriptive standard; Bosko instead emphasized the aim is "to produce a reference design for open implementation and demonstrate how this can work in practice in the open to the benefit of everyone."
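
Because this mechanism is still under consideration, the following is only a hedged sketch of the encrypted-label shape, with hypothetical key ownership and label format: every publisher record gets back an opaque ciphertext, matched or not, so the publisher cannot infer audience membership from which rows received labels (design goal 3).

```python
# Hypothetical encrypted-label flow: the matching system encrypts a per-record
# label under a key the publisher does not hold; only the activation side
# (e.g., the DSP) can decrypt it at bid time. Requires: pip install cryptography
from cryptography.fernet import Fernet

dsp_key = Fernet.generate_key()   # held by the activation side, not the publisher
box = Fernet(dsp_key)

def label_for(record_id: str, matched: bool) -> bytes:
    # Matched and unmatched records both get ciphertexts of identical shape,
    # so the publisher sees only opaque, indistinguishable blobs.
    text = f"{record_id}:{'audience-123' if matched else 'none'}"
    return box.encrypt(text.encode())

matched_ids = {"u2"}
encrypted_labels = {rid: label_for(rid, rid in matched_ids) for rid in ["u1", "u2", "u3"]}
print(box.decrypt(encrypted_labels["u2"]).decode())  # DSP side -> "u2:audience-123"
```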

It is important to consider potential 'collusion scenarios', to either mitigate against them or, at a minimum, document the implications of any potential problems. For example: what happens if the DSP and the matching system share information, and what are the implications of such collusion for the privacy and security design goals? In some cases adequate mitigations exist; in others, the collusion remains a problem.

Potential attacks must also be considered, such as the possibility of a publisher observing which ads are served on its site and then working out which of its users are part of a given advertiser's customer list, violating the third design goal. It might be possible to mitigate this through noise injection and other approaches, but any mitigation brings a direct utility vs privacy trade-off, such as increased costs to the advertiser. Encouragingly, Google's Privacy Sandbox proposal for 'fenced frames', a mechanism currently in Chrome origin trials, may mitigate the impact of ad observability in practice. Andrei called for browsers to focus on this more, saying "if you don't address this then the cost of everything at the matching stage isn't worth it."

Feedback from the REARC working group means this project will likely evolve into a set of smaller proposals to make it easier to consume and evaluate, and an initial proposal will be published in February 2023. There are a few drafts in early stages for the next steps, and Andrei made clear, "The goal of this exercise is to try to propose some reference implementations to end up with a framework that could be used and reused". Bosko closed the session by sharing how challenging it has been to articulate the design goals and reach agreement, even on a relatively common and simple workflow, but that participation and feedback have been key to making progress in the space.

Publisher Advertiser Identity Reconciliation (PAIR)

Shreya Mathur, Senior Product Manager, Google

Shreya gave an overview of Google’s Publisher Advertiser Identity Reconciliation (PAIR) solution that was released in October 2022. Shreya defined PAIR as “a protocol. It’s a series of encryption steps that can be taken to enable a secure and privacy-safe way for advertisers and publishers to match their first-party data against one another, mostly for remarketing use cases. It offers a way to do this without the use of third-party cookies.”

A visual of PAIR can be seen on this link.      

Shreya walked through the nine steps in the PAIR workflow:

  1. Encrypted Instances – Specific 1-on-1 advertiser/publisher scoped relationships are created in a clean room, where first-party data is uploaded and the advertiser can see the list of available publishers that have uploaded their data set and are available to pair, and vice versa. Initially the advertiser and the publisher will work with the same clean room provider, but the vision is that any opted-in clean rooms will be interoperable with each other.
  2. (A)(P) generation coordination – Here, three keys are coordinated for each publisher-advertiser relationship, applied to each underlying record: an advertiser key (A), a publisher key (P), and a shared secret key (S). Each party only has access to its own key and the secret key.
  3. Generation of advertiser and publisher encrypted identifiers – This step is what makes PAIR unique. The advertiser and publisher keys are commutative, which means that if the two keys are applied consecutively to a specific input, the output will be the same regardless of the order in which they are applied (see the sketch after this list).
  4. Share encrypted lists – Here, the advertiser and publisher clean rooms share encrypted lists with each other.
  5. Clean room reapplies its key to get the PAIR ID – The PAIR protocol runs for every advertiser-publisher pair to create multiple copies of the original dataset with unique IDs on a per advertiser-publisher basis for the same end user. The advertiser applies the (A) key to the dataset received from the publisher, and the publisher applies the (P) key to the dataset received from the advertiser. For both advertiser and publisher, the PAIR ID is a thrice-encrypted identifier, and this all occurs in the clean room so neither party sees it.
  6. PAIR lists shared – The advertiser and publisher share with each other the PAIR IDs they generated in step 5, so each can compare the data set it generated with the data it received and generate a match rate.
  7. Offline match rate – The clean room shares offline match rates with advertisers and publishers. Advertisers and publishers only get access to advertiser or publisher encrypted identifiers respectively. The PAIR IDs do not get shared with anyone except the clean room and the DSP.
  8. PAIR instance within Display & Video 360 – In this case, the DSP is Display & Video 360. Here, the DSP integrates with the advertiser's clean room and gets access to the list of PAIR IDs (which are never shared with the advertiser or publisher directly) and the match rates across the different relationships.
  9. During the bid-request/bid-response process – Lastly, the online auction process begins. Here, the advertiser sets up a campaign based on the advertiser encrypted identifiers. For the publisher, when the user visits the property, the publisher performs a high-speed lookup and passes publisher encrypted IDs in the bid request to the SSP, and the SSP sends this on as-is to the DSP. The DSP looks at the publisher encrypted identifier and re-encrypts it with the advertiser key to create the PAIR ID, and if there is a match, a bid response can be sent. The publisher does not know why the DSP has responded, and neither party can reconcile identifiers across different advertisers or publishers.
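
The commutativity in step 3 is easiest to see with a toy cipher. Google has not published PAIR's exact cryptography, so this sketch substitutes modular exponentiation, a textbook commutative cipher with illustrative key values, purely to show why the thrice-encrypted PAIR ID comes out the same whichever side encrypts last.

```python
# Toy demonstration of PAIR's key property: applying the advertiser and
# publisher keys in either order to the secret-key-encrypted identifier
# yields the same thrice-encrypted PAIR ID.
import hashlib

MOD = 2**255 - 19      # a large prime; real deployments would use vetted groups
S, A, PK = 11, 23, 37  # shared secret, advertiser, and publisher keys (toy values)

def enc(x: int, key: int) -> int:
    return pow(x, key, MOD)

email_hash = int.from_bytes(hashlib.sha256(b"alice@example.com").digest(), "big") % MOD

base = enc(email_hash, S)   # steps 2-3: secret key S applied first
adv_id = enc(base, A)       # advertiser encrypted identifier
pub_id = enc(base, PK)      # publisher encrypted identifier

# Step 5: each clean room applies its own key to the other side's identifiers.
pair_via_advertiser = enc(pub_id, A)
pair_via_publisher = enc(adv_id, PK)
assert pair_via_advertiser == pair_via_publisher  # identical PAIR ID either way
print(hex(pair_via_advertiser)[:18] + "...")
```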

Deploying PAIR means that profiles cannot be built, as the identifier is different for each publisher, reducing both data leakage and the leakage of insights from the data. Measurement is also made possible; specifically, conversion data can be shared back to both advertiser and publisher at an aggregate level.

Currently, this protocol is only for first-party data where the advertiser and publisher have collected consent from the end user. Initial phases are focused on email-based identifiers, but this may expand to phone-based identifiers.

Using PETs in Clean Rooms for Measurement & Attribution

Edik Mitelman, GM, Privacy Cloud, AppsFlyer

This session gave another in-market example, from AppsFlyer: how they approach mobile and CTV measurement and attribution, and the solutions that clean rooms are providing in these areas.

To open the session, Edik stressed “We welcome privacy regulations. Nobody ever cared about ‘Bob’ or ‘Alice’ or ‘Jane’. Every analysis, insight and optimization uses cohorts, segments, groups of people. No one cares about user level data, it is just a row in a data set, we want insights and tools to optimize campaigns and improve user experience.  User-level data was easy and comfortable, and we got hooked on this drug, and we need to stop. PETs and clean rooms enable the same utility whilst fully preserving user privacy.”

Explaining how AppsFlyer's approach works, Edik mentioned that, like Google, commutative encryption is deployed. The publisher and advertiser data sets are each encrypted with their owner's private key, then encrypted again with AppsFlyer's private key, making them doubly encrypted. A match can then be performed on the doubly encrypted data, producing a simple yes/no answer. If there is a match, non-sensitive data is sent into the attribution machine, which runs conversion modeling to attribute the action.

Once the attribution result is generated, how can it be consumed? The entire process of matching, joining, and attribution happens in the data clean room. The PETs that AppsFlyer deploys to ensure reports are fully aggregated and protected break down into three layers:

  • K-anonymity – a principle that no row may represent a single user; rows are grouped together until a certain threshold (defined between the network and the app owner) is met.
  • Private Set Intersection (PSI) – The advertiser and publisher share their encrypted data with each other, apply another layer of their own encryption, and shuffle the order.
  • Differential Privacy (DP) – On some occasions there is a need to additionally deploy DP, for example when there is a small group of users, such as a group from an obscure country. Here, DP is applied to the results, adding noise that keeps insights available but does not allow obscurities and details to re-identify individuals (see the sketch after this list).
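
As a rough illustration of two of these layers, the sketch below suppresses cohorts under a k-anonymity threshold and adds Laplace noise to the surviving counts. The threshold, the epsilon value, and dropping (rather than merging) small rows are simplifying assumptions; AppsFlyer has not published its parameters.

```python
# Toy k-anonymity + differential privacy on aggregate cohort counts.
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) via the inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def report(cohort_counts: dict, k: int = 50, epsilon: float = 1.0) -> dict:
    out = {}
    for cohort, n in cohort_counts.items():
        if n < k:
            continue  # k-anonymity: suppress rows below the agreed threshold
        out[cohort] = max(0, round(n + laplace_noise(1 / epsilon)))  # DP noise
    return out

# The tiny 'obscure country' cohort is suppressed; the large one is reported
# with noise that barely moves the insight but masks any single user.
print(report({"US/install": 5400, "Tuvalu/install": 7}))
```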

Reports generated with this approach give a more holistic view of the data, with fewer restrictions. A clean room using these methods creates a report with no blind spots and without PII data. Edik said of the utility of this approach, "the entire point of using clean rooms and PETs for attribution is that you regain full visibility and control over the data and the insights that you need to run your marketing, without exposing privacy breaching guidance."

Panel Discussion: State of Clean Rooms Today: What to Expect

Devon DeBlasio, Vice President, Product Marketing, InfoSum

Edik Mitelman, General Manager, Privacy Cloud, AppsFlyer

Shailley Singh, Executive Vice President, Product, Chief Operating Officer, IAB Tech Lab

Matt Zambelli, Director of Product, Neustar, a TransUnion Company

Devon DeBlasio, Edik Mitelman and Matt Zambelli came together under Shailley Singh’s guidance to discuss the outlook for clean rooms. 

The panelists explained that clean rooms are an established technology, and Devon summed up the breadth of use cases clean rooms are being deployed for today: "[Clean rooms are being used] across all marketing use cases that everyone uses today. That includes insights, activation, planning, measurement, and identity. That is the point, we need to do everything we do today, but in the context of a privacy secure environment where everyone has control over their own data and the ability to extract insights at scale without violating privacy or the reputation of the business."

Marketing teams are leading the conversation with clean room providers, but increasingly pulling in IT and privacy teams to help vet the technology. Clients' questions focus on "how are you going to clean up this mess for me?" according to Matt, and "how is it going to impact the cost and performance of my marketing?" in Devon's experience. The good news, Edik maintains, is that it is "easy to show the ROI of clean rooms… the clean room gives full access to data and marketers can see and improve LTV, and if it doesn't do this, don't buy it."

The panel agreed that the measurement use case is the biggest pain point. Customers are used to having user-level insights, and clean rooms, according to Devon, create "a purposeful lack of precision that is baked into the results, which eliminates some of the hands-on access which the Data Science team is used to". Measurement needs to be rethought by marketers to adapt to the current context, and there is a job to be done in convincing and educating clients to change their approach. Edik said of this, "this is still an education problem not an implementation or value problem. We are fighting the education war; we need to try to explain why people need it. Being early in the adoption curve as we are here, we need to explain and simplify it, and only then will we get to the weeds of implementations."

Clean room providers are giving clients practical support to get the most out of the tools through customer support, instructional videos, and relevant examples. One area where clean room suppliers are providing guidance is how to use the data that comes out of clean room operations. The panel emphasized that clean rooms provide full control end-to-end, and that the role of the clean room is to ensure the flow of data and utility is preserved.

On Device Audience Targeting

Eddie Dingels, Chief Technology Officer, GroundTruth

Eddie Dingels presented GroundTruth's approach to deploying on-device technology. GroundTruth is a publisher (the app WeatherBug), an ad server, and a geofence-based audience company, and is adapting to the new environment, including the use of PETs.

Eddie defined on-device as "the same dominant audiences that we built server side for years but made in a system we can pass and bid on without any IDs being passed across the wire. All that audience building we used to do on the back end, we shrunk it down, stuck it in an SDK and put it on a mobile device."

GroundTruth built a product, GroundTruth On-device Audience Targeting (GOAT), deploying on-device technology, which is illustrated below. Eddie explained how this works: "Inside of an audience SDK we are getting our location data. We didn't want to just pass precise location to the backend and then pull back blueprints one at a time. Instead, we are truncating lat/longs to make sure that we are not passing any precise location data to the back end. We're then pulling back a set of blueprints, so when we pull back those geofences to the device, we are integrating those into the device level OS to provide better battery life. We then do geo resolution on the device itself, storing visits. We've pulled and simplified our science models to run on device, still building the same audiences but done on device and stored into a local database. This is great as the ad request comes across the wire, with the audience, so no ID comes across into our DSP. This helps as no longer do we need a big database on the back end, we just worry about campaign allowability and ultimately return an ad that matches in the same way."
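
As a rough sketch of the truncation idea Eddie describes, the snippet below rounds coordinates before anything leaves the device; the two-decimal precision (roughly a 1 km tile) is an assumption for illustration, not GroundTruth's actual value.

```python
# Coarsen lat/longs on the device so precise location is never sent upstream.
def truncate(lat: float, lon: float, places: int = 2) -> tuple:
    factor = 10 ** places
    return (int(lat * factor) / factor, int(lon * factor) / factor)

precise = (40.748817, -73.985428)   # exact device fix
coarse = truncate(*precise)
print(coarse)                       # -> (40.74, -73.98): only the coarse tile
# The backend returns geofence "blueprints" for this tile; exact geo-resolution
# against those fences, and audience building, then happen on the device.
```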

GroundTruth did encounter some limitations to on-device:

1. A 'cold start' problem: IDs have long been the bedrock that databases are built on, and using on-device means starting from the ground up.

2. The data is app-level, meaning that two apps on the same device might have slightly different audiences depending on when each app was installed. This is less of an issue for a targeting framework than it would be for a measurement framework.

3. There were questions about how to communicate audiences across the wire in RTB, until IAB Tech Lab launched Seller Defined Audiences (SDA), which means an audience ID can be passed across the wire instead of a user identifier (illustrated below).
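
For illustration, here is roughly how such an audience can travel in an OpenRTB bid request through the user.data object with the segtax extension, shown as a Python dict. The seller name and segment ID are invented, and the exact fields a given exchange expects may differ.

```python
# Illustrative OpenRTB fragment: a Seller Defined Audience rides along in
# user.data, so the bid request carries an audience ID rather than a user ID.
bid_request_fragment = {
    "user": {
        # note: no device ID or cookie is needed for the audience to be usable
        "data": [{
            "name": "weatherbug.com",    # the seller defining the audience
            "ext": {"segtax": 4},        # segtax 4: IAB Audience Taxonomy 1.1
            "segment": [{"id": "786"}],  # the seller-defined audience segment
        }]
    }
}
```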

Combining GOAT with Seller Defined Audiences created the overall solution which GroundTruth estimated could increase the CPMs on WeatherBug by 388% compared to non-addressable supply. This benefits everyone as Eddie explained, “On-device is an elegant solution especially when combined with SDA because it increases publisher monetization effectively, adds to advertiser reach and leans into privacy by design for consumers.”

Privacy Sandbox Anti-Fraud Ad Spam Solutions

Neha Megchiani, Strategic Partnerships, Privacy Sandbox, Google

Eric Trouton, Product Manager, Privacy Sandbox, Google

This session focused on an update on the efforts Google is making, in partnership with the industry, through the Privacy Sandbox to combat advertising spam and online fraud.

There are two key Privacy Sandbox proposals for anti-fraud:

–   Device Integrity Attestation Through The Browser

Eric laid out that "It is getting harder to tell the difference between a real device and an emulator, and with fingerprint surface reduction this problem will get harder." There are open questions from Google, such as: would a signal attesting to the device's legitimacy be useful? Which integrity signals would be most useful in preventing ad fraud, e.g., a low-entropy signal indicating a valid device attested by the OS, runtime integrity checks, or the recency of a factory reset?

The aim of this initiative is to level up the conversation and focus on capabilities, the high-level functional requirements for a given set of anti-fraud use cases that are not specific to any sources of truth or technologies. Eric invited the industry to collaborate with Google on which capabilities are needed for anti-fraud detection, as "together we can build new solutions that are privacy preserving and may even provide a better source of signal."

–   Private State Token

(a.k.a. Trust Tokens, the previous name until October 2022)

The Private State Token proposal is further along in development; its origin trial ended in 2022. Though next steps have not been announced, there have been positive signals from the testers on the origin trial.

Eric described it as a trust signal across sites, sending a very small amount of information without conveying any identifiers. For example, the owner of site A, the 'issuer', might see a user visit their site several times or make purchases, and this establishes trust. The issuer then encodes that this is a trusted user: a blinded token is generated and sent to the issuer's server for signature, and the browser stores the signed token. Later, when the same user goes to a different site, site B will make a redemption request to see if they have a token, and then request to redeem the token from issuer site A. The issuer will see it is a valid user, verify the signature, and issue a redemption record. Furthermore, the presence of a token does not in itself indicate the user is trusted, as there are several levels of trust encoded, making it harder for a malicious actor to reverse engineer whether they are trusted or not.
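
The sketch below models only the issue/store/redeem shape of that flow with a few encoded trust states. It is deliberately simplified: real Private State Tokens use blinded signatures so the issuer cannot link issuance to redemption, whereas the plain HMAC stand-in here would allow such linking.

```python
# Simplified Private State Token shape: a tiny signed payload carrying one of
# a few trust states, with no user identifier inside.
import hashlib
import hmac
import secrets

ISSUER_KEY = secrets.token_bytes(32)  # held by site A, the "issuer"

def issue(trust_level: int):
    """Site A decides the user is trusted and issues a signed token."""
    payload = bytes([trust_level]) + secrets.token_bytes(16)  # state + nonce
    tag = hmac.new(ISSUER_KEY, payload, hashlib.sha256).digest()
    return payload, tag  # the browser stores this pair

def redeem(payload: bytes, tag: bytes):
    """On a request from site B, the issuer verifies and returns the state."""
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None
    return payload[0]  # one of several trust levels, so a bot holding a token
                       # still cannot tell how trusted it actually is

token = issue(trust_level=2)
print("redemption record, trust state:", redeem(*token))
```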

Eric finished by again calling for collaboration, either by participating in surveys, industry groups or 1-to-1 meetings to contribute to the industry anti-fraud effort.

Advertisers Driving Change: Global Cross-Media Measurement

Matt Green, Director, Global Media Services, WFA

The last speaker of the day, Matt Green, ran through another PET-based proposal, spearheaded by the WFA, aiming to allow advertisers to better count unique users and measure unduplicated reach and frequency across media in a privacy-compliant fashion.

From this goal the HALO program was born, culminating in the development of the Cross-Media Measurement Framework. This approach sees a single-source panel measuring the media consumption habits of panelists, which is then used to train a virtual ID model to build a virtual representation of a population. There would be no more IDs than people in a given country, and publishers would assign virtual IDs to their impressions independently, using a specific assignment model which maps existing user identifiers, profiles, and other impression data to the virtual ID (VID), allowing impressions to be deduplicated and privately counted to give a representation of campaign reach and frequency.
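
As a toy stand-in for VID assignment only: the real framework assigns VIDs through per-publisher models trained on panel data, since no shared key exists across publishers, but a plain hash into a bounded population makes the mapping visible.

```python
# Map each (modeled) person behind an impression into a fixed virtual
# population, so distinct-VID counts can approximate deduplicated people.
import hashlib

POPULATION = 1_000_000  # never more virtual IDs than people in the market

def assign_vid(person_key: str) -> int:
    digest = hashlib.sha256(person_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % POPULATION

impressions = ["alice", "bob", "alice", "dana"]        # one publisher's events
print({p: assign_vid(p) for p in set(impressions)})    # stable, bounded VIDs
```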

There are two phases to the approach:

  • Set-up and training phase – A VID model would be trained by a vendor chosen by a local market. Panelist media consumption is shared with the panel operator via a double-blind match, then VIDs can be assigned to events using specific publisher models that are trained on the panel data to account for personification and demographic correction of publisher first-party data identifiers and demographics.
  • Live measurement phase – The VID model would be applied to publisher census event data to assign the virtual personas to event data on a continuous basis. Those VID assignments are then transformed into encrypted probabilistic data structures called sketches, and finally MPC would merge the sketches provided by the publishers to calculate private reach and frequency (a toy stand-in is sketched below).
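
To make the live phase tangible, the toy below compresses each publisher's VIDs into a bitmap 'sketch', merges the bitmaps, and estimates the deduplicated reach of the union. Halo's actual construction (LiquidLegions sketches, merged under encryption by MPC) is considerably more sophisticated; this shows only the shape of the computation.

```python
# Bitmap-sketch stand-in: OR-merge per-publisher sketches, then estimate the
# union cardinality from the fraction of zero bits (linear counting).
import hashlib
import math

M = 4096  # bitmap size

def to_sketch(vids) -> list:
    bits = [0] * M
    for v in vids:
        bits[int.from_bytes(hashlib.sha256(str(v).encode()).digest()[:4], "big") % M] = 1
    return bits

def merge(a: list, b: list) -> list:
    return [x | y for x, y in zip(a, b)]  # in production this happens under MPC

def estimate_reach(bits: list) -> int:
    return round(-M * math.log(bits.count(0) / M))

merged = merge(to_sketch(range(1000)), to_sketch(range(500, 1800)))
print("estimated deduplicated reach:", estimate_reach(merged))  # ~1800 people
```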

The program continues to roll out, with local stakeholders having just completed a minimum viable product, and there is interest from other countries such as Canada and Germany.

Matt shared his optimism for the project by saying “This [project] can deliver accurate and actionable measurement, which is privacy safe and future focused, making a lot of the previous technologies look positively anachronistic. It has the blessing of many global players who endorse it and are prepared to put their data into it. It’s a global framework which is locally owned and governed, and a flexible system to support the metrics chosen by the local market, and entirely transparent with no black boxes.”