GULYÁS, Gábor György, Ph.D.

Blog

On the impact of machine learning (on privacy)

2016-02-09 | Gabor

Back to the archives

I've recently read an article where the author pictured a future in which Google-glass-like products support our decisions by using face recognition and similar techniques. While the author clearly aimed to picture a 'new bright future', she remained silent about potential abuses and privacy issues. Plus, while the technology is definitely heading in the direction she described, the writing remains a piece of science fiction for now. But where exactly are we now, and for how long can we feel relieved?

Today, using machine learning (ML) is a hard task. First, you need to get vast amounts of quality data; then picking the proper algorithm, training it, and using it are also highly non-trivial. Not to mention the hardware requirements: training takes a lot of computation power, and it takes a while until your application learns to perform the task it is designed for. This might sound comforting from a privacy perspective, but that comfort would be misplaced.

I see three major issues that could result in a change in the state of the art, and I think – for some of these – we are already in the shifting phase:

  1. Machine learning based applications should fit an average smartphone. Last year, we saw a nice pioneering example of real-time (pretrained) machine learning with Google Translate: it could detect text from the camera in real time (text recognition), translate the given text (with deep neural networks), then replace the original text with its translation. These kinds of applications should fit on low-end phones, too. This is likely to happen in one or two years.
  2. Currently, programs are trained remotely due to resource constraints. The training phase needs to shift to the consumer side, to be done on smartphones. In a couple of years we might have specialized chips in smartphones that enable this, opening the way for new types of applications.
  3. Developing applications that use machine learning should be easier. There is a lot of research and educational activity around machine learning nowadays, but we cannot yet say that machine learning will become a simple import-and-use tool in general. For some specific tasks and data types it might be, but that is all we can see now.
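To illustrate the third point: for simple, well-understood tasks, the core of a learning algorithm can already fit in a few lines. Here is a toy sketch (plain Python, no libraries) of a nearest-neighbor classifier – one of the cases where ML really is close to import-and-use, in contrast to the heavyweight training pipelines discussed above:

```python
import math

def nearest_neighbor_classify(train, point):
    """Return the label of the training sample closest to `point`.

    `train` is a list of (features, label) pairs; the distance
    measure is plain Euclidean distance.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(train, key=lambda sample: dist(sample[0], point))[1]

# Toy data: two well-separated clusters in 2D.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
         ((5.0, 5.0), "B"), ((4.8, 5.1), "B")]

print(nearest_neighbor_classify(train, (0.2, 0.1)))  # -> A
print(nearest_neighbor_classify(train, (5.2, 4.9)))  # -> B
```

Of course, this simplicity evaporates for the harder tasks (vision, translation) mentioned above, which is exactly why point 3 is still an open issue.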

It is easy to imagine that such ML could enable an enormous number of privacy-infringing uses (*). However, we should not forget that today data-driven businesses fuel machine learning research and application development. Thus, there are already thousands of services built around data and machine learning. As many of these companies use data that was not gathered with user consent (to mention just one possible privacy violation), ML is already here, further eroding our privacy.

Let's look at some examples. BlueCava, a company that uses fingerprinting to track people on the web, applies machine learning to connect devices that belong to the same person. This is just one example; with little effort we could find a myriad of other companies that analyse user behavior, buying intent, fields of interest, etc. with similar techniques. The data we generate is also at stake: think of smartphones and wearable devices, but also the posts we write.
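To make the device-linking idea concrete, here is a deliberately simplified, rule-based sketch: it scores how similar two browser fingerprints are (fraction of matching attributes) and links device pairs above a threshold. The attribute names, the scoring rule, and the threshold are all illustrative assumptions – real trackers train models over far more signals, and nothing here reflects BlueCava's actual method:

```python
def fingerprint_similarity(fp_a, fp_b):
    """Fraction of attributes on which two fingerprints agree.

    Each fingerprint is a dict mapping an attribute name
    (timezone, language, fonts, ...) to its observed value.
    """
    keys = set(fp_a) | set(fp_b)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if fp_a.get(k) == fp_b.get(k))
    return matches / len(keys)

def link_devices(fingerprints, threshold=0.75):
    """Return pairs of device ids that plausibly belong to one person."""
    ids = list(fingerprints)
    return [(a, b)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
            if fingerprint_similarity(fingerprints[a],
                                      fingerprints[b]) >= threshold]

# Hypothetical devices: the laptop and phone share most settings.
devices = {
    "laptop":  {"timezone": "CET", "language": "hu-HU",
                "fonts": "set-1", "screen": "1080p"},
    "phone":   {"timezone": "CET", "language": "hu-HU",
                "fonts": "set-1", "screen": "720p"},
    "unknown": {"timezone": "PST", "language": "en-US",
                "fonts": "set-2", "screen": "4k"},
}
print(link_devices(devices))  # -> [('laptop', 'phone')]
```

A trained classifier would replace the hand-picked threshold with a decision learned from labeled device pairs, but the privacy implication is the same: attributes we leak passively are enough to stitch our devices together.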

To conclude briefly: machine learning already has a huge impact, and it will only grow in the next few years. All the big companies have their own research groups in the field, and if we are honest with ourselves, we know the reason is simple: to use machine learning in their products in order to increase their revenues.

 

(*) I intentionally did not want to comment on whether machines could become alive. I think here you can read a realistic opinion on the topic.

Tags: privacy, machine learning, web privacy, data privacy, google glass, google



Gábor György Gulyás, PhD – © 2021 all rights reserved