Practical Data Privacy

Welcome to the wonderful world of privacy! This is a landing page for Practical Data Privacy (O'Reilly 2023) by Katharine Jarmul, a book for technologists to learn the latest privacy technologies and how to apply them practically in real data work.

About the Book

Between major privacy regulations like the GDPR and CCPA and expensive and notorious data breaches, there has never been so much pressure to ensure data privacy. Unfortunately, integrating privacy into data systems is still complicated. This essential guide will give you a fundamental understanding of modern privacy building blocks, like differential privacy, federated learning, and encrypted computation. Based on hard-won lessons, this book provides solid advice and best practices for integrating breakthrough privacy-enhancing technologies into production systems.

Practical Data Privacy answers important questions such as:

  • What do privacy regulations like GDPR and CCPA mean for my data workflows and data science use cases?
  • What does "anonymized data" really mean? How do I actually anonymize data?
  • How do federated learning and analysis work?
  • Homomorphic encryption sounds great, but is it ready for use?
  • How do I compare and choose the best privacy-preserving technologies and methods? Are there open-source libraries that can help?
  • How do I ensure that my data science projects are secure by default and private by design?
  • How do I work with governance and infosec teams to implement internal policies appropriately?

and more!

Note: I will keep this page updated with Errata, updates on newer editions and other significant changes. The code repository (in the Resources section) will also be updated as libraries change. Should you find an error or want to update or add any examples to the code repository, please open an issue or send a pull request on GitHub! :)

Resources

Reviews

Practical Data Privacy is exactly what it claims to be—a practical exploration of the approaches to data privacy. The book carefully balances, and makes the case for, the business benefits of protecting our users' data.

— Rebecca Parsons, Chief Technology Officer, Thoughtworks


Finally, a book on practical privacy for some of the most important actors of data protection in practice: data scientists and engineers! From pseudonymization to differential privacy all the way to data provenance, Practical Data Privacy introduces fundamental concepts in clear terms, with examples and code snippets, giving data practitioners the information they need to start thinking about how to implement privacy in practice, using the tools at their disposal.

— Damien Desfontaines, Staff Scientist, Tumult Labs


Gone are the days of saying "data is the new oil"; if data and oil have kinship today, it is that both are at risk to leak and make a huge, expensive mess for you and your stakeholders. The data landscape is increasing in complexity year over year. Regulatory pressures for data privacy and data sovereignty, not to mention algorithmic transparency, explainability, and fairness, are emerging worldwide. It's harder than ever to smartly manage data. Yet the tools for addressing these challenges are also better than ever, and this book is one of those tools. Katharine's practical, pragmatic, and wide-reaching treatment of data privacy is exactly the treatise needed for the challenges of the 2020s and beyond. She balances a deep technical perspective with plain-language overviews of the latest technology approaches and architectures. This book has something for everyone, from the CDO to the data analyst and everyone in between.

— Emily F. Gorcenski, Principal Data Scientist, Data & AI Service Line Lead, Thoughtworks


Consumer privacy protection will define the next decade of Internet technology platforms. Jarmul has written the definitive book on this topic, capturing a decade of learnings on building privacy-first systems.

— Clarence Chio, CTO, Unit21 and co-author of Machine Learning and Security (O'Reilly)


Some data scientists see privacy as something that gets in their way. If you’re not one of them, if you believe privacy is morally and commercially desirable, if you appreciate the rigor and wonder in engineering privacy, if you want to understand the state of the art of the field, then Katharine Jarmul’s book is for you.

— Chris Ford, Head of Technology, Thoughtworks Spain


I finally have a book to point people to when they avoid the topic of data privacy.

— Vincent Warmerdam, creator of Calm Code; Machine Learning Engineer, Explosion


Practical Data Privacy lives totally up to its promises—it is very practical! You will learn a lot about privacy in the context of Machine Learning with examples from big companies and many packages that will help you solve typical problems. I learned a lot while reading this book and recommend it to people who are working with data.

— Natalie Beyer, Co-founder, LAVRIO.solutions