The fallacy of data mining

Bruce Schneier wrote a very interesting essay on the soundness of massive citizen profiling.

Intuitively, a libertarian will abhor attempts by government to collect that level of private information (unchecked by anything in practice). But Schneier shows that these efforts are bound to fail even at their stated goal: preventing terrorist attacks.

Let’s look at some numbers. Assume an unrealistically optimistic system with a 1-in-100 false positive rate (99% accurate), and a 1-in-1,000 false negative rate (99.9% accurate). That is, while it will mistakenly classify something innocent as a terrorist plot one in a hundred times, it will only miss a real terrorist plot one in a thousand times. Assume one billion possible “plots” to sift through per year, about four per American citizen, and that there is one actual terrorist plot per year.

Even this unrealistically accurate system will generate about 10 million false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate roughly 27,000 potential plots in order to find the one real terrorist plot per year.
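
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It uses only the figures assumed above (one billion events a year, one real plot, a 1-in-100 false positive rate and a 1-in-1,000 false negative rate); the variable names are my own and purely illustrative.

```python
# Base-rate arithmetic for the hypothetical screening system described above.
# All inputs are the post's assumptions, not measured data.
events_per_year = 1_000_000_000      # possible "plots" to sift through
real_plots_per_year = 1              # actual terrorist plots
false_positive_rate = 1 / 100        # innocent event flagged as a plot
false_negative_rate = 1 / 1_000      # real plot missed

innocent_events = events_per_year - real_plots_per_year
false_alarms = innocent_events * false_positive_rate
plots_caught = real_plots_per_year * (1 - false_negative_rate)

# Everything the system flags has to be investigated.
total_alarms = false_alarms + plots_caught
alarms_per_day = total_alarms / 365
false_alarms_per_plot_caught = false_alarms / plots_caught

# Chance that any single alarm points at a real plot.
precision = plots_caught / total_alarms

print(f"False alarms per year:       {false_alarms:,.0f}")
print(f"Alarms to investigate a day: {alarms_per_day:,.0f}")
print(f"False alarms per real plot:  {false_alarms_per_plot_caught:,.0f}")
print(f"Chance an alarm is real:     {precision:.2e}")
```

The script prints the yearly false alarms, the daily investigation workload, the false alarms per real plot caught, and the chance that any given alarm is genuine. Changing the error rates shows how little they matter when real plots are this rare: the false alarms scale with the enormous number of innocent events, not with the handful of real ones.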

So do we give them the benefit of stupidity, or just call their intentions malevolent?