Intel On The Cheap: Screen-scraping

Posted on 26 February 2011 by


Map-making using screen-scraped data

In this multi-part series on intelligence gathering and analysis tools that can be set up for free or at low cost, we’ll explore the mysteries of screen-scraping. This is a pretty basic and high-level discussion.

Whether you know it or not, screen-scraping is something that you already do: it’s grabbing text from websites, documents and other information repositories, and using it for something.
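To make "grabbing text from websites" concrete, here is a minimal sketch in Python using only the standard library. The sample page, the `TextExtractor` class and the `scrape_text` function are all made up for illustration; a real scraper would first fetch the page over the network and would need to cope with much messier HTML.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []       # pieces of visible text, in page order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def scrape_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page content standing in for something fetched from the web.
SAMPLE_PAGE = """<html><head><title>Example</title><style>p {color: red}</style></head>
<body><h1>Community Board</h1><p>Meeting at the <b>park</b> on Friday.</p>
<script>var x = 1;</script></body></html>"""

if __name__ == "__main__":
    print(scrape_text(SAMPLE_PAGE))
```

This is the same thing you do by hand when you highlight a page and copy it, minus the formatting and the code the browser hides from you.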

Most of the screen-scraping done by crime analysts and law enforcement intelligence analysts on a regular basis is cutting and pasting information from a range of sources – Accurint, Facebook, Twitter, MySpace, *CIC, department of motor vehicles, records management systems and the like – and combining it (manually aggregating and then correlating it) in something like an intelligence report.

Consider something like a burglary of a habitation circular on a specific individual. It would likely contain a photo (derived from a mugshot or the DMV record), reports of the burglaries of which he is suspected, information about what was taken, information about things he may have pawned from sites like LeadsOnline, and information which the analyst may have put together from a variety of sources about the kinds of places he likes to burglarize and the places he may go.

Screen-scraping is simply a way to automate the processes involved in creating an intelligence product, and to deal with far larger sets of data and information. For example, automating multiple parallel searches for keywords within social media and media sites, filtering the output into a usable form, counting the results and graphing them might be one way to determine the activity of a specific group. By overlaying that information with other information present in an RMS or other database, you can start to find patterns.
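The filter, count and graph steps can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the posts are invented sample data standing in for scraped output, and `count_mentions` and `text_chart` are hypothetical helper names, not part of any tool mentioned in this post.

```python
import re
from collections import Counter

# Made-up posts standing in for scraped results: (date, text).
POSTS = [
    ("2011-02-21", "Big party at the warehouse this weekend"),
    ("2011-02-22", "who is going to the warehouse party?"),
    ("2011-02-22", "Warehouse thing is going to be huge"),
    ("2011-02-23", "Nothing to do tonight"),
    ("2011-02-23", "party party PARTY at the warehouse"),
]


def count_mentions(posts, keywords):
    """Count, per date, the posts that mention at least one keyword."""
    counts = Counter()
    if not keywords:
        return counts
    # Whole-word, case-insensitive match on any of the keywords.
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    for date, text in posts:
        if pattern.search(text):
            counts[date] += 1
    return counts


def text_chart(counts):
    """Render the per-date counts as a crude bar chart, one line per date."""
    return "\n".join(
        f"{date} {'#' * counts[date]} ({counts[date]})" for date in sorted(counts)
    )


if __name__ == "__main__":
    print(text_chart(count_mentions(POSTS, ["warehouse", "party"])))
```

Each post counts once per day no matter how many times it repeats a keyword, which keeps one excitable poster from skewing the picture. Counting raw keyword hits instead would be an equally defensible choice, depending on what you are trying to measure.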

These analyses can be reactive or predictive. You can look at social media buzz about a get-together and map those who discuss it; explore known gang activity, parolees and probationers in the region, and intelligence or police reports for the time leading up to an incident; or make inferences about whether a particular incident may take place in a given region or house. Many crime analysts and intelligence analysts are doing things like this right now – if you’re not, you should be thinking about it.

Some tools include Programmable Web’s mashup tool, which provides a graphical interface to some seriously powerful stuff. It has built-in functionality to help with automating searches of social media. It also has a great list of APIs available for a range of websites that will make you say, “No! Really?!”

Similarly there are great resources available at ScraperWiki.

And finally, a collection of screen-scraping links that will make you skip back to your desk from lunch is available at ProPublica, which has compiled a slew of links to free screen-scraping software, originally intended for journalists. Go to town.