Apple recently held its developer conference, and during the WWDC 2016 keynote a specific section was reserved for an issue Apple is particularly proud of: the protection of privacy.
Speaking of its new efforts in this area, Apple mentioned a discipline that few had heard of: “differential privacy”, a statistical approach to collecting and analyzing data without compromising the identity and privacy of those who, consciously or unconsciously, provide it. Do you know who helped create the concept? Microsoft, among others.
Collecting data without violating privacy is (mathematically) possible
Differential privacy is a remarkable concept, especially now that all the big companies base much of their business on studying our activity. Google, Facebook, Microsoft and Amazon never stop collecting our data, which they then feed into artificial intelligence engines, advertising systems, or the way they recommend products and content to us.
All these companies swear up and down that our data is safe with them, that they fly the privacy flag, and that this kind of collection is “dressed” in a layer that “anonymizes” the data to protect users’ identities. Differential privacy is a discipline that seeks to achieve precisely that objective in a rigorous way. Data collection is not the ultimate goal: the goal is to collect the data without revealing the identity of those who provide it.
To achieve this, techniques are used such as hashing the data (turning it into numbers), subsampling (taking only part of the data), and noise injection, which adds random values to the real data in order to mask sensitive or personal information.
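As a rough illustration of those three ideas, here is a minimal sketch in Python. The function names, bucket count and noise scale are illustrative assumptions, not the API or parameters of any real telemetry system.

```python
import hashlib
import random

def hash_value(value: str, buckets: int = 1024) -> int:
    # Hashing: reduce the raw value to a number so the original
    # string never has to be stored or transmitted.
    return int(hashlib.sha256(value.encode()).hexdigest(), 16) % buckets

def subsample(records: list, rate: float = 0.1) -> list:
    # Subsampling: keep only a random fraction of the records, so no
    # individual is guaranteed to appear in the collected data at all.
    return [r for r in records if random.random() < rate]

def noisy_count(true_count: int, sigma: float = 2.0) -> float:
    # Noise injection: add a random value to the real count so the
    # exact figure (and any single contributor) stays hidden.
    return true_count + random.gauss(0.0, sigma)

# Example: report roughly how many users visited a site,
# without keeping the raw visit log.
visits = ["example.com", "other.org", "example.com", "example.com"]
sample = subsample(visits, rate=0.5)
target = hash_value("example.com")
print(noisy_count(sum(1 for v in sample if hash_value(v) == target)))
```

Each step on its own offers weak protection, but combined they blur any single person's contribution while leaving the aggregate statistics usable.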
Microsoft helped create the concept, Google uses it in Chrome
Although Apple is the company that has made the term fashionable, thanks to the media impact any of its new products has on the market, the concept of differential privacy has its origin in a study by Microsoft Research, the research and development division of the Redmond company.
In 2006, Cynthia Dwork, Frank McSherry, Kobbi Nissim and Adam Smith published the study “Calibrating Noise to Sensitivity in Private Data Analysis” as an effort to continue “the line of research initiated on statistical databases in which privacy is preserved”.
The document is full of mathematical formulas that analyze the sensitivity and privacy of the data in order to apply that “obfuscation” during the statistical analysis that makes the approach useful to a company. Some experts, such as Moritz Hardt, have pointed out that although the idea is especially interesting, its practical application can be complex.
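The paper's central recipe can be summarized briefly: for a numeric query, add Laplace-distributed noise whose scale is the query's sensitivity (how much one person can change the answer) divided by the privacy parameter epsilon. A minimal sketch in Python, with purely illustrative data and parameter values:

```python
import random

def laplace_sample(scale: float) -> float:
    # The difference of two i.i.d. exponential samples follows a
    # Laplace distribution with the given scale.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon: float = 0.5) -> float:
    # A counting query has sensitivity 1: adding or removing one person
    # changes the true answer by at most 1, so the noise scale is 1/epsilon.
    sensitivity = 1.0
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_sample(sensitivity / epsilon)

ages = [23, 41, 35, 62, 29, 58]
# "How many people are over 40?" answered with calibrated noise.
print(private_count(ages, lambda a: a > 40, epsilon=0.5))
```

The smaller epsilon is, the more noise is added and the stronger the privacy guarantee, and it is exactly that trade-off between accuracy and protection that makes practical deployment delicate.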
Hardt, a Google researcher who specializes in machine learning, made clear in a second analysis on the subject a year later that “differential privacy is a rigorous way of doing machine learning, not of preventing this technique from being used”. In fact, it is precisely in this field where the technique makes the most sense, although there are others where it can also be very relevant.
Google demonstrated this some time ago when it launched an open source project (the code is available on GitHub) called RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response). As Google engineers explained on their own security blog, this system “makes it possible to learn statistics about the behavior of software users while maintaining client privacy”.
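At the heart of RAPPOR is the old survey trick of randomized response: each client flips its true answer with some probability, so no single report can be trusted, yet the aggregate can still be corrected for the known lying rate. The real system combines this with Bloom filters and two rounds of randomization; the sketch below shows only the basic idea, with made-up parameters and a hypothetical setting.

```python
import random

def randomized_response(true_answer: bool, p: float = 0.75) -> bool:
    # Each client tells the truth with probability p and lies otherwise,
    # so no individual report proves anything about that client.
    return true_answer if random.random() < p else not true_answer

def estimate_true_fraction(reports, p: float = 0.75) -> float:
    # The server only sees noisy reports, but because the lying rate is
    # known it can invert the bias and recover the population-level rate.
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Example: 10,000 users report whether a (hypothetical) setting is enabled.
truth = [random.random() < 0.3 for _ in range(10_000)]   # real rate: ~30%
reports = [randomized_response(t) for t in truth]         # what the server sees
print(round(estimate_true_fraction(reports), 3))          # close to 0.30
```

Any individual report remains deniable, while the corrected aggregate stays close to the real population statistic.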
The project has been part of the development versions of Chrome, and indeed it used to be possible to enable it from the browser's preferences. That option has disappeared in recent versions, although Google makes clear in its white paper on privacy protection that Chrome's Safe Browsing feature makes use of differential privacy.
Is differential privacy the solution to all our problems? Well, it certainly seems to be a good alternative for trying to protect the data we are constantly handing over to the companies that feed on it.
Some criticize the real scope of this solution (we will see how Apple implements it), but the response of Adam Smith, one of the originators of the concept, to those arguments was clear: “I do not present differential privacy as the cure for all the data diseases of this world […] but I am sure that there is currently no rigorous way of thinking about the privacy of statistical data that provides a good alternative to differential privacy”.