PLI Podcast: Eric Olson, Cyveillance

Posted on 15 March 2011


Eric Olson thinks about how to reduce extremely large datasets to lists whose elements number in the thousands. His job at commercial intelligence firm Cyveillance is to focus on phishing and specific online crimes, including the sale of tobacco and counterfeit drugs, identity theft, and child exploitation.

Download the podcast here | Download via iTunes here

Dave and I have been talking a lot about some of the problems in law enforcement intelligence, and this week on the podcast here’s the challenge we discussed with Eric:

In the classic cloak and dagger scenario, a case officer from some super-secret agency is running agents, and whether he recruited them or they walked in off the street, each agent has been cultivated and managed and operated such that there is a limited flow of human intelligence, or HUMINT, emanating from carefully recruited channels. The information coming from these sources is aggregated and correlated by the officer before it’s memorialized and digitized in a manner that supports a specific mission.

In law enforcement it’s almost the opposite. The HUMINT assets are the range of people cops come in contact with every day, and the officers don’t have a mission more specific than “protect and serve,” which is pretty darn broad. They don’t know what can be important, but they know that the cost of being wrong is very high. So they must constantly be processing the data they receive, which is bandwidth intensive. They’re casting a wide net. And at the end of this chain sits the analyst seeking data, and the metric looks like:

  • data = (officers) × (contacts) × (minutes per contact)

To make matters worse, each contact provides information of UNKNOWN and UNKNOWABLE value.
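To make that metric concrete, here is a minimal sketch in Python. All of the figures are hypothetical, invented purely for illustration; the point is how quickly the numbers multiply:

```python
# Rough sketch of the data-volume metric described above.
# Every figure here is hypothetical, for illustration only.
officers = 500            # sworn officers on the street
contacts_per_day = 12     # citizen contacts per officer, per day
minutes_per_contact = 5   # average length of each contact

# Minutes of raw, unfiltered HUMINT generated every single day
data_minutes = officers * contacts_per_day * minutes_per_contact
print(data_minutes)        # 30000 minutes -- 500 hours of contact time daily
```

Even with these modest made-up inputs, a single day produces 500 hours of contact time, each minute of it carrying information of unknown value.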

The biggest problem I have seen in my limited time in law enforcement is that this is but one of many challenges that go beyond the budgetary. The datasets analysts must deal with, for example, are more problematic than those in commercial cyber defense, or even the HUMINT I described above, which is already pre-cleaned and processed (by the officer who passes it on) by the time it hits the analyst’s desk.

In law enforcement, we generally don’t have the benefit of this kind of “normalized” data, and the systems producing data and information don’t speak with one another well, if at all. And, generally speaking, law enforcement intelligence, information, and data require far greater human correlation and analysis than in information technology.

Eric speaks to us about how he conceptualizes these problems, seeking ways to make programmatic that which seems at first glance non-obvious, or like a hunch.

For example, when describing the most common elements of a phishing website (which I would have thought would obviously be the “Name” or “Account Number” field), Eric’s analysis led to a surprising conclusion that allowed him to eliminate 99.99999% of the sites on the Internet and isolate a group of sites that genuinely might be fraudulent banking sites.

Olson maintains that analysts don’t need computer programs to find the gold nuggets. Analysts can, in his words, build astounding relationship maps and make connections of mind-blowing kinds given the data. What analysts would rather have, he says, is a program that eliminates the garbage from the data.
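The inversion Olson describes, filtering out what an analyst will never need rather than scoring what might be interesting, can be sketched in a few lines. The record fields, sources, and exclusion rules below are entirely invented for illustration; the idea is just that cheap, high-confidence exclusion rules shrink the pile, and the analyst takes it from there:

```python
# Hypothetical sketch of "eliminate the garbage" filtering.
# Field names, sources, and rules are invented for illustration only.
records = [
    {"source": "tip_line", "text": "saw suspicious van on Elm St", "fields_complete": True},
    {"source": "tip_line", "text": "", "fields_complete": False},   # empty report
    {"source": "spam_bot", "text": "buy cheap meds", "fields_complete": True},
    {"source": "patrol",   "text": "repeat contact at pawn shop", "fields_complete": True},
]

KNOWN_JUNK_SOURCES = {"spam_bot"}

def survives_exclusion(rec):
    """Keep a record only if it passes every cheap exclusion rule."""
    if rec["source"] in KNOWN_JUNK_SOURCES:
        return False
    if not rec["fields_complete"] or not rec["text"].strip():
        return False
    return True

# What remains is the (much smaller) set the human analyst actually reads.
candidates = [r for r in records if survives_exclusion(r)]
print(len(candidates))  # 2 of 4 records survive for the analyst
```

Note that nothing here tries to decide which record is the gold nugget; the program only throws away what is demonstrably garbage, which is the division of labor Olson argues for.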

How he goes about conceptualizing those programs is what we discuss, along with how you might put these concepts to work at your agency to improve data collection, processing, and cleaning.

It’s an interesting conversation. Download the podcast here | Download via iTunes here