No, you probably don’t need a data warehouse – unless your initials are “NYPD”

Posted on 11 July 2011 by


A journalist working on a piece about some high-end technology [from a vendor whose name I can’t mention, but whose initials are IBM] being touted as saving the day at a large agency [I can’t reveal the agency, but its initials are NYPD] asked some questions about crime mapping, Compstat and crime analysis, and I thought I would share some of the answers to get a general conversation going.

Use the comment form for feedback, hatemail and the inevitable IBM press people telling me I have it all wrong.

Could you please briefly tell me about the current situation with crime tracking tools? Are enough systems in place or is it fragmented? I would love to get an overview of how you see the situation.

Large-scale statistical crime analysis has changed dramatically since the 1990s, when the New York City Police Department introduced the earliest versions of COMPSTAT (or Compstat). Compstat is at its heart a management philosophy and program, which relies on data compiled from a range of technologies, but the term is also used to describe the crime tracking software: a relational database (or several) and a series of scripts that add, aggregate, correlate and then map crimes.

The resultant datasets, maps and statistical analysis are used by agencies to detect changes in the levels of certain types of crime (by group) and identify geographical areas in which those changes have occurred – as more crimes are committed, “hot-spots” are visualized on the map. By regularly reviewing these changes in crime levels, police administrators are able to devise strategies, deploy resources and otherwise address the spikes in crime.
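To make the idea concrete, here is a minimal sketch of the kind of script described above. It is not the NYPD's code or any vendor's: the record layout, the grid cell size, the `min_increase` threshold and every function name are assumptions made up for illustration. It buckets incidents into map grid cells, counts them by crime type for two review periods, and flags cells where the count rose.

```python
from collections import Counter

# Hypothetical incident records: (period, crime_type, x, y).
# Coordinates are arbitrary map units; a real system would read these
# from a records database rather than a hard-coded list.
INCIDENTS = [
    ("prev", "theft", 1.2, 3.4),
    ("prev", "theft", 5.1, 5.9),
    ("prev", "assault", 1.5, 3.5),
    ("curr", "theft", 1.1, 3.2),
    ("curr", "theft", 1.7, 3.9),
    ("curr", "theft", 1.4, 3.6),
    ("curr", "theft", 5.5, 5.5),
]


def cell_of(x, y, cell_size=1.0):
    """Snap a point to the grid cell that contains it."""
    return (int(x // cell_size), int(y // cell_size))


def counts_by_cell(incidents, period, crime_type, cell_size=1.0):
    """Count incidents of one crime type per grid cell for one period."""
    counts = Counter()
    for p, t, x, y in incidents:
        if p == period and t == crime_type:
            counts[cell_of(x, y, cell_size)] += 1
    return counts


def hot_spots(incidents, crime_type, min_increase=2, cell_size=1.0):
    """Return cells whose count rose by at least min_increase (assumed threshold)."""
    prev = counts_by_cell(incidents, "prev", crime_type, cell_size)
    curr = counts_by_cell(incidents, "curr", crime_type, cell_size)
    # A Counter returns 0 for cells that had no incidents in the previous period.
    return sorted(cell for cell in curr if curr[cell] - prev[cell] >= min_increase)


if __name__ == "__main__":
    print(hot_spots(INCIDENTS, "theft"))
```

Nothing here needs more than a stock Python install, which is rather the point: the aggregation behind a basic hot-spot map is simple, and the hard part is the review and response that follows.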

For example, a series of thefts of a specific type of machinery might indicate several things, such as a newly discovered use for component parts or a rising black market for the devices themselves. Thefts of catalytic converters, for instance, followed a sharp rise in the prices of the platinum, palladium and rhodium they contain.

A hot-spot might indicate a new street gang or a turf war.

Or it could mean that the weather was too hot and people became bored and started hitting each other. Or drinking a lot of alcohol, stripping down, abusing passers-by, running generally amok and leaping aboard inflatable rafts to escape (hat-tip to my friend Deputy Bob for that last one, which, okay, is probably not mappable).

Crime tracking combined with Compstat supervisor meetings makes up a major part of the NYPD’s management philosophy. But when considering stories about the success of Compstat, it’s important to understand that, in the 1990s, crime rates in the US started to fall, and a clear leader in crime reduction was New York City. This occurred for many reasons, and technology vendors and NYPD brass like very much to point to Compstat as a primary driver.

According to Prof. Frank Zimring, in our conversation and podcast about his forthcoming book, The City That Became Safe: What New York Teaches About Crime and Its Control, what drove annual crime rates down in New York City was “a combination of hotspots policing and the destruction of public drug markets, and maybe Compstat mapping, and maybe gun programs” [emphasis original].

It is inaccurate at best, certainly a conflation of correlation and causality, and at worst intellectually dishonest to claim that technology was the major factor in driving down New York City’s crime rate.

That said, the technical, cultural and managerial influence of Compstat on American law enforcement is undeniable. Cities around the US now use it: in 2004, 58% of agencies with more than 100 officers were considering deploying a Compstat-like capability. Crime statistics and crime analysis are growing fields.

The International Association of Crime Analysts and International Association of Law Enforcement Intelligence Analysts have grown dramatically in size and influence in the past decade, and college programs dedicated to crime analysis are popping up around the country.

Currently almost all cities using Compstat and similar management frameworks run on off-the-shelf hardware and commercially available or free and open source software. And for the most part, these agencies get real value for their constituents without looking to custom-built applications or high-end systems.

What do you think about the Crime Information Warehouse being developed? Does it address critical needs? Does it represent a significant advancement and if so or not, why? I’d love to get your opinion on any aspect of the system, potential advantages, disadvantages and anything else you would like to share.

A data warehouse is a highly useful system to have, yet it requires a degree of organizational technical maturity and a cogent IT strategy which by definition limits its applicability to only a handful of agencies in the world.

Attention: the NYPD is in every way an outlier.

The New York City Police Department has more than twice as many sworn officers as the second-largest agency (Chicago, with about 13,000 officers) and more than three times as many as the third-largest (Los Angeles, with about 10,000).

According to, New York City is the nation’s most populous city, with more than 8.1 million residents in 2010, more than double the population of Los Angeles, the next largest. At 27,000 people per square mile, New York is by far the most densely populated US city. The city’s residents speak about 200 languages, because 3 million of its residents are foreign-born – more than 25% of them have been in the city for a decade or less.

I mention all these statistics because the Compstat program and the silos of information that the NYPD views as relevant to its policing mission have long eclipsed mere “crime” data. The city maintains a state-of-the-art “Syndromic Surveillance System” program which provides medical and health information, and lots of other “non-crime” statistics also get “tracked”, such as those called for in several national intelligence strategies: terrorism, public health and safety, and natural-hazard emergencies.

All this means that New York City has a markedly different need, a patently different use case, from any other agency. From an information technology and information management standpoint, the NYPD’s mission since 9/11 has moved from local law enforcement to something that more closely resembles an intelligence and law enforcement agency.

The technical power of the [data warehousing] system described here vastly out-classes anything used anywhere else in the country, and is in fact antithetical to the premise of Compstat being easy and cheap to deploy, leveraging commodity technology:

“The databases the NYPD and other Compstat-driven agencies use are not proprietary software developed entirely by in-house programmers or special consultants. Instead, they are off-the-shelf software packages any agency can purchase and use. Similarly, Compstat technology does not require highly sophisticated hardware. A couple of basic stand-alone PCs or a small networked LAN system can generally run even the largest agency’s Compstat initiative.”

Does it address critical needs?

For the NYPD’s unique mission, this [data warehouse] is unquestionably a critical need, and it would appear this system should be capable of delivering value. I’d be hard-pressed to find other local agencies in such need of these capabilities that the cost:benefit analysis would result in a purchase on its merits. There is a very interesting use case here for fusion centers, though.

Does it represent a significant advancement and if so or not, why?

Technically speaking, this system appears capable of hitting the bullseye in what we look to for a technology product in law enforcement. It costs a bomb, but the utility of the capabilities is substantial, and it’s doing something you can’t buy anywhere else. And because it integrates, aggregates and correlates information derived from across stovepipes, de-conflicts investigation information, and then provides one-stop shopping in a real-time intelligence product, it is simple to use. Those three qualities make it an advancement over the kinds of technology platforms in use at law enforcement agencies today.

But a crime data warehouse is not by any stretch of the imagination a “plug-and-play” item. To even contemplate something of this complexity requires a degree of organizational and technical maturity, and a cogent IT strategy. This, by definition, limits its applicability to only a handful of agencies in the world.

I’d love to get your opinion on any aspect of the system, potential advantages, disadvantages and anything else you would like to share.

Many highly sophisticated agencies – case in point, the Nassau County (NY) Police Department – find that they can run world-class crime tracking and intelligence operations using far less customized software. Their real-time intelligence center operation provides the same kind of cross-jurisdictional information sharing, including crime mapping, hotspots, and correlated views on flat-screen TV monitors and mobile data terminals, for their officers in eight precincts and in the 17 village and two city police departments within the county.

In Los Angeles, a similar capability is produced by the LA Sheriff’s Department. And they are not alone; excellent crime and intelligence operations are in place in cities as big as Miami, Dallas, Chicago and Phoenix, and in cities as surprising as Naperville, IL; Plano, TX and Rochester, NY.

All these agencies built their capabilities by combining in-house and intern effort with commercial off-the-shelf and free and open source software. Is the IBM product more enterprise-ready? Quite probably. But again, I’m not sure that in many markets outside New York City, it would pass a straight up-or-down cost:benefit analysis.