Privacy is about people.
My name is Joey Tyson. I'm a privacy engineer at Apple, and I know you've heard a lot of great information this week on exciting new features and you're ready to get out there and build some new apps, but I know you also care deeply about your users' privacy.
And this is the first of three big ideas I want to explore with you about privacy today before we get into this year's updates. Because we're going to talk a lot about data privacy, but we can never forget that data is about people. It belongs to them.
So when I say that privacy is about people, I mean it's about building a relationship of trust with your users. This lays a foundation for better engagement, which leads to better apps. Think about other relationships. It's the people that you trust that you're more likely to work with and spend time with. And as your users understand why you're collecting data, how it's being used, as you handle that data respectfully and thoughtfully, you're going to get better data, because they're going to be more comfortable using your apps and sharing information, and this builds loyalty over time.
And I start here because I want you to apply this context to your development process.
None of us do engineering in isolation, so whether you're working with health records or just building a simple puzzle game, the information you gather and the ways that you use it could have a very real impact on people's lives. So it's critical for each of us to think carefully about the technologies that we're building.
In a recent commencement address at Duke University, Tim Cook talked about Apple's approach to privacy, and he said that "In every way, at every turn, the question we ask ourselves is not what can we do, but what should we do." And that leads me to the second big idea, ask the "should" questions. No matter what your role, whether you're a solo app developer or part of a large organization, you can be the one to stand up for your users. Remember your responsibility towards them, and ask questions about the data flows in your app.
For example, why do we actually need this data? This isn't meant as an accusation. It's to think about whether this is necessary for our use case. Should we collect it? Would this surprise our users? If people understood this and it scared them, why should we be doing it at all? Could we use less granular data, less precise? Are there other approaches we should consider? And should we delete or aggregate this data after a period of time?
Part of the reason for asking these kinds of questions is that we can all fall prey to assumptions about our data. For instance, just thinking, we should log this for everyone. Maybe that's the way we've always done it in the past or the way others do it, but again, is it really necessary for what we're trying to accomplish? Or you might think that data couldn't possibly be sensitive in the context you're working in, but maybe that data is very sensitive in a different context or for users in a vulnerable population. If you're taking data that was gathered for one purpose and applying it in a new way, would users understand or expect that? You also hear people talk about personally identifiable information, or PII, but even data that falls outside that definition can still have a privacy impact. Likewise, if you're protecting data with encryption and good security, that's wonderful, but privacy is so much more than that. Because this again gets back to the should questions of should we even have this data at all.
Now as you're asking questions about the data flows in your app, if you want to go even one step further, you can create privacy guarantees. These are high-level statements about the privacy expectations in your app that you want to be able to make. And by establishing these early on in the development process, you get a framework to guide you as you're building your features and something to test against once you're done. There are some examples on the screen here of these kinds of statements that are similar to statements Apple has made about some of our features, and there are many options for implementing each of these, which brings me to the third big idea. Align your data practices with your use cases. To illustrate this, let's think of data collection, the amount and types of data that you gather. We mentioned earlier those assumptions. You know, you may think sometimes, well, shouldn't I just gather as much data as possible? Well, you know, in the past you may have heard people call data the fuel of the new economy. Like actual fuel, data should be handled with caution.
Because data is very powerful, and that unlocks a lot of great use cases, but because it's so powerful, it can also be dangerous if not handled carefully.
Gathering data creates overhead for you as an engineer. You're going to need to spend time and resources managing that data, keeping up with it, filtering it.
It's time you could be spending working on new features for your users.
It also creates liabilities. We've all heard about companies suffering data breaches, which is a bad situation. But if the data that gets leaked includes information that's not relevant to the use case, that's an even worse situation. Unexpected data collection creates all sorts of risks. And it destroys that foundation of trust with your users.
The next time you think about gathering as much data as possible, I want you to picture these tanks of chemicals and remember your responsibility to your users to handle their data carefully and thoughtfully. Instead, you want to practice what we call proportional data collection. This is the idea of collecting only what's necessary to achieve your goal, and again sometimes you might start off thinking that you need a lot of information when a different dataset may suffice. You can even start with the assumption of no data and figure out what's actually necessary for what you're trying to solve.
This gets back to user expectations. People should understand why you're collecting data and how you're using it. It should be in line with what they expect. You should always be able to provide a clear rationale for the use cases that you're building. But of course, this is about data collection, but when we talk about aligning data practices with use cases, that extends to the entire data life cycle and being good stewards of the information that's been entrusted to you. So even beyond just proportional data collection, you want to develop and use privacy techniques throughout your app's workflow.
You could develop a whole toolbox or repertoire of techniques that will help you build privacy into your app. Things like aggregation, providing transparency to users, using a scoped identifier instead of a real identity, automatically rotating those with time. Even more advanced techniques like differential privacy.
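As a rough illustration of two of those tools, a scoped identifier and automatic rotation, here is a minimal Swift sketch. The `ScopedIdentifier` type, its method names, and the lifetime value are assumptions made up for this example, not an Apple API; the point is only that the identifier is random, is not derived from the user's real identity, and is replaced after a fixed lifetime or on demand.

```swift
import Foundation

/// A random identifier scoped to one feature (hypothetical type, not an Apple API).
/// It is never derived from the user's real identity and is replaced after `lifetime`.
struct ScopedIdentifier {
    private(set) var value: UUID
    private(set) var issuedAt: Date
    let lifetime: TimeInterval

    init(lifetime: TimeInterval, now: Date = Date()) {
        self.value = UUID()
        self.issuedAt = now
        self.lifetime = lifetime
    }

    /// Returns the current identifier, rotating it first if it has expired.
    mutating func current(now: Date = Date()) -> UUID {
        if now.timeIntervalSince(issuedAt) >= lifetime {
            reset(now: now)
        }
        return value
    }

    /// Replaces the identifier immediately, for example when the user asks for a reset.
    mutating func reset(now: Date = Date()) {
        value = UUID()
        issuedAt = now
    }
}
```

A caller would keep one instance per feature and ask for `current()` whenever it needs to tag an analytics event, so old events cannot be joined with new ones once the identifier has rotated.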
I don't have time today to go into the entire list of techniques available, but what I want to focus on now is the idea of adjusting these to match your use case.
You can think of a mixing board for music. If you have one track that's particularly loud or soft, you may need to adjust others to balance things out and achieve a good mix. And again, there will be times when you do need to collect a lot of data for a particular feature, but in those cases, you want to make sure you adjust those privacy techniques to create a great experience for your users.
Ideally, these apply across all systems where the data lives so those privacy guarantees stay consistent and are built as technical enforcement rather than just policy statements about what you plan to do. But I know this can all be a little abstract, so to illustrate further, I want to walk through a few features that Apple has built where we've applied this kind of thinking. First is activity sharing where you can share fitness data with your friends. Now for me as a privacy engineer, I like to turn all of these sliders up to 100 percent as much as possible, but that's not always feasible for a given feature. In this case, you're sharing data with friends, so they know your name. They know whose data it is. So you can't just make this data de-identified. It's already very identifiable as part of the use case.
Consequently, we turn up other privacy techniques like only showing a summary of the data, not minute-by-minute statistics or the exact location of your run.
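To make the idea of sharing a summary rather than the detail concrete, here is a small Swift sketch. The `WorkoutSample` and `SharedDaySummary` types are hypothetical and are not how activity sharing is actually implemented; the sketch also assumes one sample is recorded per active minute.

```swift
import Foundation

/// Detailed per-minute sample that stays on the owner's device (hypothetical type).
struct WorkoutSample {
    let timestamp: Date
    let activeCalories: Double
    let latitude: Double
    let longitude: Double
}

/// The only shape that leaves the device: totals, with no per-minute data and no route.
struct SharedDaySummary: Equatable {
    let totalActiveCalories: Int
    let activeMinutes: Int
}

/// Collapses detailed samples into the coarse summary that friends are allowed to see.
/// Assumes one sample per active minute.
func summarize(_ samples: [WorkoutSample]) -> SharedDaySummary {
    let total = samples.reduce(0.0) { $0 + $1.activeCalories }
    return SharedDaySummary(totalActiveCalories: Int(total.rounded()),
                            activeMinutes: samples.count)
}
```

Because the shared type has no field for timestamps or coordinates, the detailed data cannot be sent by accident.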
We also provide a lot of control over who you share with and when.
Now, in the Apple News app, we collect analytics data using a scoped identifier that's not connected to your Apple ID. That gives us more flexibility around the precision of data we collect, but since it's still sensitive information, we still provide control through things like being able to reset that identifier at any time. Finally, there's photo memories.
You may have seen these on your device. These use facial recognition data to identify people in pictures. It also uses precise location information to connect similar photos together. Now that's very sensitive data.
So consequently, there's another privacy technique we turn way up.
All the processing to build these memories happens on your device.
And by the way, that's a great tool for your toolbox to do processing locally.
So to recap, three big ideas. Privacy is about people, ask the "should" questions, and align your data practices with your use cases. In the time we have remaining, I want to talk through some features and tools that are available to you as developers to help you build privacy in your app. And these fall into two general categories of accessing user data and more broadly, data stewardship.
So for data access, let's start by talking about iOS, and much of this guidance will apply to tvOS and watchOS as well. Let's imagine you're building a game for iOS where players can compete against each other, and you want them to be able to upload a photo to identify themselves. Now we've all seen those permission prompts for access to photos, but wouldn't it be great if we could just have the user click a button, select a picture, and have it appear immediately in the app? Well, you can already do this today. Because we have a feature available called out-of-process pickers for contacts, camera, and photos data, where the picker that appears runs outside your app's process so the only information that's shared back with the app is what the user selects. We talked about those privacy techniques.
This is a case where because this doesn't involve ongoing access to the entire library, the user has control by picking what they share, so we don't need to show a permission prompt and ask them to make a decision about future access. This is the default method for accessing contacts, camera, and photos data. There are going to be times where you may need access to the broader library, but in most situations, you'll find that this works great for a whole lot of apps. This is a case where it does just work and only requires a few lines of code. You can see some snippets here for how to call these pickers in your app. Now as I said, there are going to be some times where you do need access to the broader data, and as you know, there's a variety of protected resources available on the device, but if you're using one of these APIs, you need to keep in mind three things before you start requesting access.
You should only request access to the data that's necessary for your use case in your app.
If you don't actually need the entire library of information, rather than requesting it, you should look for an alternative solution, like those out-of-process pickers.
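For reference, calling the out-of-process pickers for photos and contacts could look roughly like the following Swift sketch. `UIImagePickerController` and `CNContactPickerViewController` are the system pickers; the `ProfileViewController` class and the `updateAvatar` and `invite` hooks are hypothetical names used only for this example.

```swift
import UIKit
import ContactsUI

final class ProfileViewController: UIViewController,
    UIImagePickerControllerDelegate, UINavigationControllerDelegate,
    CNContactPickerDelegate {

    /// Presents the system photo picker. The app only receives the item the user picks.
    func choosePhoto() {
        let picker = UIImagePickerController()
        picker.sourceType = .photoLibrary
        picker.delegate = self
        present(picker, animated: true)
    }

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        let avatar = info[.originalImage] as? UIImage
        picker.dismiss(animated: true)
        updateAvatar(avatar)
    }

    /// Presents the system contact picker. Only the selected contact is returned.
    func chooseContact() {
        let picker = CNContactPickerViewController()
        picker.delegate = self
        present(picker, animated: true)
    }

    func contactPicker(_ picker: CNContactPickerViewController, didSelect contact: CNContact) {
        invite(name: contact.givenName)
    }

    // Hypothetical app-specific hooks, left empty in this sketch.
    private func updateAvatar(_ image: UIImage?) { }
    private func invite(name: String) { }
}
```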
You should only make these requests when it's needed. You want it to be in the moment when a user makes a decision not when they first open the app and they're bombarded with questions that they don't even understand yet. You want the prompt to be in context. But also you want to rely only on the API for status.
Remember, a user can revoke their decision at any time. You just want to make sure your app still functions regardless of what the user has decided. Now when you request access, as you know, you need to include a purpose string or a usage description.
And when I say you need to, these are required. You'll find your app rejected in app review without these, and in fact, your app will crash if you try to access this data without a purpose string. Now this is one way of providing transparency to users. It's certainly not the only way. By that, I don't mean showing a fake prompt before the real one to get them to click. I mean this should be part of the overall program of informing your users about data flows in your app.
The goal here is to explain the reason for a request so that a user can make an informed, effective decision based on their priorities. When I say explain the reason, this is not what I'm talking about. And we've seen purpose strings like this in the past, but again, this is going to lead to rejections in app review.
We're increasingly enforcing quality purpose strings both through automated validation and manual review. Placeholders or blank strings are not going to be sufficient.
Just saying advertising doesn't tell a user much. Requires location doesn't explain the why. Even this last one about more relevant content, that's nice, but it's pretty vague. And when you look at our own Maps app, when it requests location, this is what you see. It will be displayed on the map and used for directions, nearby search results, and travel times. So this is explaining the reason, the use case, for this request. It's specific and includes examples of how the data will be used.
The TV app is another one that Apple wrote: using your location to determine what's available to you and show you live games, events, and news from your area.
Remember, if users understand why they're being asked for this data, they're going to be more likely to allow it. If you were building a transit app for a subway system, you might write something like this. This app uses your location to show nearby stops and stations and allows you to plan trips from your current location. Again, explain the use case, be specific, provide an example. Some additional guidelines to remember when you're working with protected resources on a device. Access should not be required for your app to function. Again, this can lead to rejections in the app review process. Your app should have graceful fallback mechanisms so that even if a user declines access, it still functions. For example, with that transit app, if the user denies location access, you can have a field where they can enter a location manually and use that instead. Again, you want to verify the authorization status of your app whenever it needs this data to make sure that the user is still granting access, and stay aware of third-party SDKs. Again, requesting access should only be for the use cases of your app, and if you're including libraries that change that or tell you to set purpose strings, you should probably look for a different solution or update your code.
Finally, you want to provide ongoing transparency. Again, this should not be the only time that a user gets to understand how their data is being used.
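To tie the fallback and status-checking guidelines together, here is a minimal Swift sketch for the transit app example. The `LocationInput` enum and the function names are assumptions for illustration; `CLAuthorizationStatus` and `CLLocationManager.authorizationStatus()` are the Core Location API (the class method shown here was later deprecated in favor of an instance property).

```swift
import CoreLocation

/// How the transit app obtains a starting point (hypothetical type for this sketch).
enum LocationInput: Equatable {
    case deviceLocation     // access granted: use the device's location
    case askUserInContext   // not asked yet: prompt when the user taps "nearby stops"
    case manualEntry        // declined or restricted: show a text field instead
}

/// Maps the current authorization status to a behavior, so the app works in every case.
func locationInput(for status: CLAuthorizationStatus) -> LocationInput {
    switch status {
    case .authorizedAlways, .authorizedWhenInUse:
        return .deviceLocation
    case .notDetermined:
        return .askUserInContext
    case .denied, .restricted:
        return .manualEntry
    @unknown default:
        return .manualEntry
    }
}

/// Reads the status from the API every time instead of caching an earlier answer,
/// because the user can revoke access at any time.
func currentLocationInput() -> LocationInput {
    return locationInput(for: CLLocationManager.authorizationStatus())
}
```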
The first is for WiFi network information. If your app uses CNCopyCurrentNetworkInfo, you're going to need to add an entitlement, Access WiFi Information, in Xcode. This is a capability that you can enable for your app.
For example, if your app is communicating with a hardware accessory and needs to verify if they're on the same network, you would need this. If you're not doing that use case, you don't need to worry about this. This is only if it's necessary for the functionality of your app. You may have also heard about our new Health Records API. Because we know that developers have a lot of great ideas around building apps using health data, but we also recognize that's very sensitive information.
So again, adjusting those privacy techniques, rather than just a simple permission prompt, we provide greater transparency and control for the user.
The first new use case this year is for commonly misspelled words.
If you're typing and correct yourself, devices that are opted in to device analytics will now donate data on those words to help us improve our keyboard algorithms even further while protecting user privacy. Also, for Safari, since last year, we've added the ability for those devices to donate data around websites that typically cause crashes to improve the stability of the browser. So that's iOS. Let's talk for a moment about macOS as well. Because as you may have heard, we've made some changes to how protected resources are handled on macOS. It's an expanded list of categories of data where there are protections in place, and these can now trigger a permission prompt or for some of these an opt-in through system preferences.
I want to highlight this so that if you're developing for the Mac, you're aware of these changes and know how they're going to affect your app if you're accessing these resources. Again, since these can trigger a permission prompt, you want to know when that's going to happen so your users aren't surprised, and please note, this applies to all third-party app processes including those outside of the App Store. Just like with iOS, you'll need to set a purpose string for these permission prompts as well, and again there's a session from Tuesday that goes into a lot more detail on how this works for your app. Now to talk about accessing data on the web, I'm going to turn it over to my fellow privacy engineer, Brandon Van Ryswyk.
Thanks Joey. The web is one of the largest venues for data access today. If your business depends on providing content on third-party websites, this section is for you. This year we introduced the Storage Access API. The Storage Access API allows users to engage with logged-in content from embedded third parties across the web including from domains that have been classified as a tracker by intelligent tracking prevention. Now the Storage Access API does this only with the user's explicit consent. Let's go through an example.
Here the user is browsing a news site, news.example, and the news site has an embedded video player from video.example. Now the user has a paid account on the video site and would like to grant the embedded video access to its cookies so that they can enjoy the benefits of their subscription while reading the news. To accomplish this, video.example needs to implement the Storage Access API. Video.example should add a call to the Storage Access API when the user clicks the play button in their app.
Now this is an asynchronous API that will return a promise, so you should be prepared to handle successes as well as failures. So when a user clicks the play button, this will kick off a request, which will result in a prompt asking the user if they would like to grant video.example access to its cookies while embedded in news.example.
If the user clicks allow, this choice will be sticky, and the user won't be prompted again on this combination of domains. But if the user clicks deny, the site can always reprompt.
Let's assume the user clicked allow. The request will go through, and cookies will be returned to the embedded site. Now, this could create a tracking risk, as video.example now has its user's logged-in identity associated with their presence on this news site. This is especially important given changes to intelligent tracking prevention this year. Outside of the user consent provided via the Storage Access API, cookies from domains that are classified as trackers will be partitioned immediately and can never be used in a third-party context.
Additionally, after 30 days without user involvement, these cookies will be purged entirely.
Now importantly here, access via the Storage Access API will count towards this 30-day interaction timer. That means that users who interact frequently with your site in a third-party context will stay logged in. So in this sequence, the user will visit video.example both in a first-party context, logged into the home page, and in a third-party context, where it's embedded across the web. So first, the user visits the site in a first-party context. Now notice that the days since interaction timer will read zero as the user is currently interacting with the site. But as the user interacts with the embedded content throughout their web browsing, the timer will update.
This means that when the user returns in a first-party context, the days since interaction timer will read 5 days despite there being 45 days since the user was last on video.example in a first-party context. Adopting the Storage Access API will allow your users to stay logged in and prevent unwanted tracking.
Now privacy does not end with gaining access to a user's data. Privacy is a continued obligation to your users to maintain their trust throughout the data's lifetime.
This is where data stewardship begins. Now I want to give you examples from four areas of data stewardship for you to think about when developing your apps.
First is deletion. Part of being a good data steward is respecting your user's intent to delete something from your app. So you should recognize that there are data flows that go outside your app, and you should ensure consistency between these systems when the user deletes something in your app. Now the operating system doesn't know what's happening inside your app, so if you've donated information to Siri Shortcuts or posted a notification, you should make sure that you delete that content when the user removes it from your app. For example, if a user deletes someone in your app's contact list, Siri Suggestions should not suggest them to message using your app.
Or if a user deletes a thread in your messaging application, you should delete the notifications for that content as well as it would be unexpected for a user to see notifications still on their device from a thread they thought they'd completely deleted.
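As a sketch of that cleanup for the messaging example, the following Swift function removes delivered notifications for a deleted thread and deletes the matching Siri interaction donations. It assumes, for illustration only, that the app set each notification's `threadIdentifier` and each donated interaction's `groupIdentifier` to the same thread ID; `purgeSystemCopies` is a hypothetical name.

```swift
import UserNotifications
import Intents

/// Removes copies of a deleted thread that live outside the app's own storage.
/// Assumes notifications and donated interactions were tagged with `threadID`.
func purgeSystemCopies(ofThread threadID: String) {
    let center = UNUserNotificationCenter.current()

    // Find delivered notifications that belong to the thread, then remove them.
    center.getDeliveredNotifications { notifications in
        let identifiers = notifications
            .filter { $0.request.content.threadIdentifier == threadID }
            .map { $0.request.identifier }
        center.removeDeliveredNotifications(withIdentifiers: identifiers)
    }

    // Delete interactions donated to the system with this group identifier.
    INInteraction.delete(with: threadID) { error in
        if let error = error {
            print("Could not delete donated interactions: \(error)")
        }
    }
}
```

Calling this from the same code path that deletes the thread in the app's own database keeps the two in step.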
And finally, if you're a password manager, and you've donated passwords to the new system passwords API, you should make sure to delete this information if the user removes the site from your password manager. Now data stewardship continues through to device tracking, which means something very specific in the context I'll talk about today. You might have questions about the devices that use your apps.
For example, did this device already consume a free trial, or was this device previously used by an abusive user or for fraudulent activities. We offer an API called DeviceCheck that allows you to answer these questions. DeviceCheck lets you set two bits of data per device, which are stored by Apple and can be returned to you with a signature. These bits persist across a device reset or a device erase install. These bits provide high-integrity answers to your questions about a device's history without exposing unique device identifiers.
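On the device side, adopting DeviceCheck mostly means generating a token and handing it to your own server, which then talks to Apple to read or set the two bits. A minimal Swift sketch of the device side follows; `DCDevice` is the DeviceCheck API, while `DeviceCheckError` and `requestDeviceToken` are hypothetical names, and how the token is uploaded to your server is left out.

```swift
import DeviceCheck
import Foundation

/// Errors specific to this sketch (hypothetical type).
enum DeviceCheckError: Error {
    case unsupported
    case missingToken
}

/// Generates a DeviceCheck token and returns it base64-encoded, ready to send to your server.
/// The server, not the app, uses the token to query or update the two per-device bits.
func requestDeviceToken(completion: @escaping (Result<String, Error>) -> Void) {
    let device = DCDevice.current
    guard device.isSupported else {
        completion(.failure(DeviceCheckError.unsupported))
        return
    }
    device.generateToken { token, error in
        if let token = token {
            completion(.success(token.base64EncodedString()))
        } else {
            completion(.failure(error ?? DeviceCheckError.missingToken))
        }
    }
}
```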
Now, you should adopt DeviceCheck and not rely on unsupported device tracking mechanisms like fingerprinting. As Craig said in the keynote, we continue to remove entropy from our platform and to remove functionality that is being abused to uniquely identify users. So you should adopt DeviceCheck to answer your questions while being a good data steward. Now in addition to being a good data steward in your own apps, you should consider your third-party partners as well. Now, you as developers are responsible for all of the code that ships in your app. This includes code that you've written but also code that you've imported in the form of a library.
So you should understand how these libraries that you import access or transfer your user's data off the device. This way you can be complete when giving transparency.
Don't just talk about the code that you wrote. You should describe the full impact on a user's privacy. And as Joey mentioned earlier, you should avoid unnecessary requests for resources. So, for example, if a library that you want to use requires an entitlement or access to some other sensitive resource that isn't required for the functionality you're trying to get out of this library, you should either find a different library or reach out to the developer of that library and ask them to remove that sensitive resource request. Now thinking about third parties extends to your server side as well. You should understand how data flows to all third parties your servers touch. Now this includes the full breadth of systems that support your app, not just analytics or advertising networks but the network security systems, the customer password reset emails, or third-party customer support integrations. Being a good data steward means taking responsibility for the full picture and considering privacy when considering partners to work with.
Now for a topic you may have heard a little bit about, machine learning.
So across the industry, much of the talk about machine learning centers around the performance characteristics of a new algorithm or the power of a cloud-based solution.
And while these are important technical developments, as Tim said in his commencement address, the question that we ask is not what can we do but what should we do.
Now this applies particularly to machine learning, and we've been working on it for years.
Face ID was built with privacy-friendly machine learning at its core.
And we've made it easy for you to add Face ID authentication to your apps using the LocalAuthentication API. You can take advantage of the work Apple has done to build strong biometric authentication using privacy friendly machine learning techniques.
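A minimal Swift sketch of biometric authentication with LocalAuthentication might look like this. `LAContext` and its methods are the framework API; the `authenticate` wrapper is a hypothetical helper, and the sketch assumes the app's Info.plist contains an `NSFaceIDUsageDescription` purpose string.

```swift
import Foundation
import LocalAuthentication

/// Asks the system to authenticate the device owner with Face ID or Touch ID.
/// The app only learns success or failure, never the biometric data itself.
func authenticate(reason: String, completion: @escaping (Bool) -> Void) {
    let context = LAContext()
    var policyError: NSError?

    // Check first whether biometrics are available and enrolled on this device.
    guard context.canEvaluatePolicy(.deviceOwnerAuthenticationWithBiometrics,
                                    error: &policyError) else {
        completion(false)
        return
    }

    context.evaluatePolicy(.deviceOwnerAuthenticationWithBiometrics,
                           localizedReason: reason) { success, _ in
        // The reply arrives on a private queue, so hop to the main queue for UI work.
        DispatchQueue.main.async {
            completion(success)
        }
    }
}
```

When this returns false, the app should offer another way in, such as its own passcode or password flow.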
Similarly, ARKit uses machine learning to model the environment around a user's device. And new in ARKit 2, you can create, persist, and store this map of the environment in your own apps, but you should collect this map only if it's needed for your feature as this data might be quite sensitive. It comprises a representation of what's around a user. So if you send it off the device, it should be expected.
Like if you're playing a game collaboratively with a shared object.
And if you do send it, you can take advantage of the MultipeerConnectivity API, which supports end-to-end encryption to transfer these models between devices.
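As a sketch of that transfer, the following Swift code creates a MultipeerConnectivity session that requires encryption and sends the current ARKit world map to connected peers. The function names are hypothetical, and peer discovery and invitation (advertiser and browser setup) are omitted.

```swift
import ARKit
import MultipeerConnectivity

/// Creates a session that refuses unencrypted connections.
func makeEncryptedSession(displayName: String) -> MCSession {
    let peerID = MCPeerID(displayName: displayName)
    return MCSession(peer: peerID, securityIdentity: nil, encryptionPreference: .required)
}

/// Captures the current world map and sends it to the peers already connected.
func shareWorldMap(from arSession: ARSession, over mcSession: MCSession) {
    arSession.getCurrentWorldMap { worldMap, _ in
        guard let map = worldMap, !mcSession.connectedPeers.isEmpty else { return }
        do {
            let data = try NSKeyedArchiver.archivedData(withRootObject: map,
                                                        requiringSecureCoding: true)
            try mcSession.send(data, toPeers: mcSession.connectedPeers, with: .reliable)
        } catch {
            print("Could not share world map: \(error)")
        }
    }
}
```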
Now Face ID and ARKit are good examples of features that Apple has developed that depend on privacy-friendly machine learning that you can use in your apps.
But many of you want more flexibility. Create ML and Core ML allow you to build your own features on top of machine learning. With Create ML and Core ML, it is easier than ever to add on-device machine learning to your apps.
Create ML allows you to train a machine learning model directly on your Mac, and Core ML lets you then take this model and evaluate it directly on a user's device.
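For illustration, evaluating a model on the device with Core ML could look like the following Swift sketch. It assumes a compiled text classifier whose input feature is named `text` and whose output feature is named `label`; those feature names, the `classify` function, and the model location are assumptions for this example, so check the names in your own model.

```swift
import CoreML
import Foundation

/// Runs a compiled Core ML model locally. The input text never leaves the device.
/// The feature names "text" and "label" are assumptions about the model.
func classify(text: String, modelURL: URL) throws -> String? {
    let model = try MLModel(contentsOf: modelURL)
    let input = try MLDictionaryFeatureProvider(dictionary: ["text": text])
    let output = try model.prediction(from: input)
    return output.featureValue(for: "label")?.stringValue
}
```

In an app, `modelURL` would typically point at the compiled model bundled with the app, and Xcode's generated model class offers a typed alternative to the dictionary-based interface used here.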
This avoids collecting sensitive user data to evaluate the model, and protecting sensitive data on your servers requires a lot of engineering work. Evaluating a model on a user's device can lower your server's security requirements and will lower your breach risk as you don't hold this sensitive information in the first place. Now these two APIs make adoption of on-device machine learning easy. And you should already be asking privacy questions when developing features like the ones Joey went over earlier, and these questions are great, and Joey and I use them every day doing feature reviews at Apple.
However, machine learning requires you to ask a new set of questions to address the same underlying privacy goals. For example, you should ask does my model reveal anything about the data it was trained on? It's actually possible to invert a machine learned model and recover much of the data it was trained on. This could result in unexpected disclosure if you ship a model with your app and it's inverted to expose information about people it was trained on. Now this is an area of active academic research that you can learn more about in the paper that I've put on this slide. Similarly, you should ask, could I infer more about my users than they expected? So users might expect that you'd classify activity type via sensor data. But you should ask, did I accidentally encode the fact that this specific user uses a wheelchair? It could be great to offer a feature for wheelchair users, but this should be clear and sold as a feature. As with general data collection, you should obtain new consent if you have a new use case enabled by machine learning. Now it turns out that two small modifications can help mitigate both of these issues. The first is to ensure that you train on the right data. This means training on a sufficient quantity of diverse inputs that were collected with the proper consent. The second is to keep your model complexity proportional to the goal that you are trying to solve.
Both of these techniques can prevent model overfitting, which makes model inversion or unexpected inference more likely. Now at Apple, we believe that considering questions like these is an important part of building products that users can trust. Because fundamentally privacy is about people.
It's about building trust with your users and respecting your users in handling their data.
By applying the techniques that we've gone over in this presentation, you too can build products with great features and great privacy. Now in summary, I hope you take away three big ideas about privacy. That privacy is about people, that you have to ask the should questions, and that you should align your data practices with the use cases you're trying to solve. For more information, please check out these sessions, and I look forward to seeing you at our lab after this session so that we can help you build better apps through better privacy.