Practical Data Privacy
Welcome to the wonderful world of privacy! This is a landing page for Practical Data Privacy (O'Reilly 2023) by Katharine Jarmul, a book for technologists to learn the latest privacy technologies and how to apply them practically in real data work.
The book has been translated into German as Data Privacy in der Praxis (dpunkt / O'Reilly DE Verlag 2024).
The book was also translated into Polish as Prywatność danych w praktyce (Helion 2024).
About the Book
Between major privacy regulations like the GDPR and CCPA and expensive and notorious data breaches, there has never been so much pressure to ensure data privacy. Unfortunately, integrating privacy into data systems is still complicated. This essential guide will give you a fundamental understanding of modern privacy building blocks, like differential privacy, federated learning, and encrypted computation. Based on hard-won lessons, this book provides solid advice and best practices for integrating breakthrough privacy-enhancing technologies into production systems.
Practical Data Privacy answers important questions such as:
- What do privacy regulations like GDPR and CCPA mean for my data workflows and data science use cases?
- What does "anonymized data" really mean? How do I actually anonymize data?
- How do federated learning and analysis work?
- Homomorphic encryption sounds great, but is it ready for use?
- How do I compare and choose the best privacy-preserving technologies and methods? Are there open-source libraries that can help?
- How do I ensure that my data science projects are secure by default and private by design?
- How do I work with governance and infosec teams to implement internal policies appropriately?
and more!
Note: I will keep this page updated with errata, news about newer editions, and other significant changes. The code repository (in the Resources section) will also be updated as libraries change. Should you find an error or want to update or add any examples to the code repository, please open an issue or send a pull request on GitHub! :)
Resources
- Order the book - to get started with the e-book or Safari, or to copy the ISBN to use with your favorite retailer.
- Code and Notebook Repository - to follow along with the chapters.
- Trainings based on the book - to organize and attend trainings based on the book and related topics.
- Complete URL List - in case you don't want to use the O'Reilly shortened links.
- Probably Private Newsletter - to follow my continued work and writing in privacy & data.
- Probably Private YouTube - to learn topics from the book via video.
- Katharine's Blog - kjam is blogging things - additional writing on topics like privacy in AI/ML systems.
Reviews
Practical Data Privacy is exactly what it claims to be—a practical exploration of the approaches to data privacy. The book carefully balances, and makes the case for, the business benefits of protecting our users' data.
— Rebecca Parsons, Chief Technology Officer, Thoughtworks
Finally, a book on practical privacy for some of the most important actors of data protection in practice: data scientists and engineers! From pseudonymization to differential privacy all the way to data provenance, Practical Data Privacy introduces fundamental concepts in clear terms, with examples and code snippets, giving data practitioners the information they need to start thinking about how to implement privacy in practice, using the tools at their disposal.
— Damien Desfontaines, Staff Scientist, Tumult Labs
Gone are the days of saying "data is the new oil"; if data and oil have kinship today, it is that both are at risk to leak and make a huge, expensive mess for you and your stakeholders. The data landscape is increasing in complexity year over year. Regulatory pressures for data privacy and data sovereignty, not to mention algorithmic transparency, explainability, and fairness, are emerging worldwide. It's harder than ever to smartly manage data. Yet the tools for addressing these challenges are also better than ever, and this book is one of those tools. Katharine's practical, pragmatic, and wide-reaching treatment of data privacy is exactly the treatise needed for the challenges of the 2020s and beyond. She balances a deep technical perspective with plain-language overviews of the latest technology approaches and architectures. This book has something for everyone, from the CDO to the data analyst and everyone in between.
— Emily F. Gorcenski, Principal Data Scientist, Data & AI Service Line Lead, Thoughtworks
Consumer privacy protection will define the next decade of Internet technology platforms. Jarmul has written the definitive book on this topic, capturing a decade of learnings on building privacy-first systems.
— Clarence Chio, CTO, Unit21 and co-author of Machine Learning and Security (O'Reilly)
Some data scientists see privacy as something that gets in their way. If you’re not one of them, if you believe privacy is morally and commercially desirable, if you appreciate the rigor and wonder in engineering privacy, if you want to understand the state of the art of the field, then Katharine Jarmul’s book is for you.
— Chris Ford, Head of Technology, Thoughtworks Spain
I finally have a book to point people to when they avoid the topic of data privacy.
— Vincent Warmerdam, creator of Calm Code; Machine Learning Engineer, Explosion
Practical Data Privacy lives totally up to its promises—it is very practical! You will learn a lot about privacy in the context of Machine Learning with examples from big companies and many packages that will help you solve typical problems. I learned a lot while reading this book and recommend it to people who are working with data.
— Natalie Beyer, Co-founder, LAVRIO.solutions