GULYÁS, Gábor György, Ph.D.
This page demonstrates the power of social network de-anonymization; it explains what it is, provides insights how these attacks work, and contains a visual simulator, where you can play around with the attacks.
This could be a useful resource for:
You can also find a Python starter kit here. It contains the algorithm source codes and the datasets you can find below.
If you would like to go for a deeper test, you should take a look at the SALab framework, with which you can create your own datasets, and run experiments on large datasets having tens of thousands of nodes.
For the intuitive explanation check out the following demo, flavored a bit for the presidential elections.
Click me!
to unveil the visual explanation!
These attacks can have a severe impact on privacy, as they could be used in multiple adversarial settings – beside re-identification attacks like in this example, e.g., linking unrelated datasets, like email and telephone communication networks. It wouldn't be surprising to find out someday that secret services had been relying on these attacks in some form.
(Note: this part concerns our paper. If you would like to understand better what are the differences on the algorithms below, read this part, otherwise, you can directly jump to the simulator. )
Remember the scoring part in the visual explanation? Actually, that plays a crucial part in the first attack of this kind, which we will call Nar later on. In that attack, they derived the scoring function from the cosine similarity. They showed on two real social networks that Nar can do its job pretty well: it mapped roughly 8,000 users correctly between a Twitter and a Flickr crawl, while it only made a bit more than 3,000 errors (out of ~27k users that were available in both social networks).
As said, the Nar scoring scheme was derived from the cosine similarity and looks like this:
where each set is the set of neighbors of each node. This scheme is biased, as it penalizes high degree nodes in favor of low degree nodes. We proposed the following metric, called BlbSim, that eliminates the bias:
and incorporated in an attack called Bumblebee ( more details here). We'll use Blb for short.
It turned out that this changes the whole behavior of the attack. As the similarity functions provides a wider diversity of scores, but also changes several characteristics of the attack.
Below, you can run the Nar and Blb attacks and compare them directly in your browser! Here, we provide some tips on what differences might be worth checking for, along our main findings.
(Note: these are very small toy networks, handle results with precaution. Do real experiments in the SALab framework.)
So, some of our key findings:
Demo
: for example run Nar over Dataset #2 with default settings, then Blb with setting Θ=0.1
.
Demo
: switch to Dataset #3, where you'll see that attacks have a more difficult task. Run Nar with Θ=0.5, #seeds=25
, then Blb with setting Θ=0.5
or Θ=1.0, #seeds=25, δ=0.5
.
Demo
: with Dataset #1 and default settings run Nar with #seeds ∈ [1, 2, 3]
, then Blb with setting #seeds=1
.1
vs. 60
.)
Below, you can run de-anonymization attacks and compare results on a scoreboard. Attacks run on two graphs: the grey is the background information of the attacker where user identities are known, and the blue is the anonymized network with sensitive information.
#seeds
determines the number of initial mappings before the algorithms are run.0.1-0.5
.Controls
Algorithm
Start
Params
Θ =
#seeds =
δ =
Click me!
to see the attacks in action – right in your browser!