This is how we analyzed Colorado wildfire data

Hart Van Denburg/CPR News
Colton McDonald at the Blue Lakes trailhead along Highway 14 near Cameron Pass on Thursday, October 14, 2021. He set off from this point on a 10-day backpacking trip north into the Rawah Wilderness and was rescued by helicopter after being trapped by the Cameron Peak fire.

As the one-year anniversaries of the large Colorado fires of 2020 approached this summer with no identified specific causes, CPR News reporters set out to learn just how often the reasons behind a fire remain a mystery.

Data showed that fewer fires are solved in Colorado than in neighboring states, so reporters then set out to learn why. Dozens of interviews were conducted and the data was refined so it could be compared from state to state.

Here’s how it was done.


CPR News analyzed wildfire data from the United States Forest Service’s Fire Program Analysis – Fire-Occurrence Database, or FPA–FOD. Federal, state and local agencies have separate systems for reporting wildfires, and the FPA–FOD is the only database that contains records about wildfires and their ignition sources reported by each level of government. The most recent version, published in 2021, includes wildfire records from 1992 through 2018. Records submitted to the database are cleaned to remove duplicates and wildfires with inaccurate starting locations. Only wildfires whose ignition point is accurate to within a one-mile radius are included. A full description of how the data is cleaned can be found here.

The records include the coordinates of the point of ignition and the discovery date of each wildfire. Each record indicates whether the fire was started naturally or by humans, or whether the cause is unknown, and has a separate field for the specific cause of ignition, such as lightning, vehicles, arson, or railroads. The agency that reported the fire, the agency that responded to the fire, and the final burn acreage are also included.

Data Limitations

In some states, reporting wildfires is voluntary. An estimate by Thomas and Butry (2012) showed that only two-thirds of fires of any type, including wildfires, are reported to the National Fire Incident Reporting System (NFIRS), the system of record for U.S. fire departments. Consequently, this dataset, like all other United States wildfire datasets, gives an incomplete picture of wildfires in the United States. Researchers estimate that small, human-started fires make up the bulk of wildfires that are not reported.

Many wildfire databases were created to organize emergency response teams and recover costs associated with putting the fires out. Consequently, these wildfire records are not frequently updated to reflect the results of origin-and-cause investigations.

Analysis Decisions

To control for differences in reporting requirements among states, CPR News restricted its analysis to wildfires that started on or after January 1, 2000, and burned 1,000 acres or more. A fire of this size is unlikely to be missing from the FPA–FOD. Fires of this size are also the most likely to have accurate data on the cause and ignition source.
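
For readers who want to reproduce the restriction, here is a minimal sketch of the filter in Python. It is an illustration, not the code CPR News used. The `FireRecord` class and its field names are hypothetical and do not match the database's actual column names, and the sketch assumes the discovery date stands in for the date the fire started.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FireRecord:
    # Hypothetical field names for illustration; not the database's real columns.
    state: str
    discovery_date: date  # assumed here to stand in for the fire's start date
    acres_burned: float


CUTOFF_DATE = date(2000, 1, 1)  # keep fires on or after January 1, 2000
MIN_ACRES = 1000.0              # keep fires that burned 1,000 acres or more


def is_in_scope(record: FireRecord) -> bool:
    """Return True if the fire meets both the date and the size threshold."""
    return record.discovery_date >= CUTOFF_DATE and record.acres_burned >= MIN_ACRES


def filter_large_recent_fires(records: list[FireRecord]) -> list[FireRecord]:
    """Keep only the records that fall inside the analysis window."""
    return [r for r in records if is_in_scope(r)]
```

Both thresholds are inclusive, matching the wording "on or after" and "1,000 acres or more."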

CPR News separately analyzed all wildfires in the FPA–FOD without a size constraint and found Colorado still had the highest rate of human-started fires with no listed ignition source. A separate analysis of data from the National Interagency Fire Center was limited to fires that burned 1,000 or more acres, were categorized as human-started, and had a general cause that was either missing or listed as “investigated but undetermined.” It showed that Colorado tied with New Mexico for the highest percentage of human-started wildfires with an unknown or undetermined cause of ignition.
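
The state-by-state comparison comes down to one rate: of the fires classified as human-started, what share has no specific cause listed? The sketch below shows one way to compute it. It is an illustration under stated assumptions, not CPR News' code. The tuple layout and the labels in `UNRESOLVED_CAUSES` are hypothetical, and the sample values in the usage lines are made up.

```python
from collections import defaultdict

# Hypothetical labels that count as "no specific cause listed".
UNRESOLVED_CAUSES = {"", "missing data", "investigated but undetermined"}


def unresolved_human_rate_by_state(records):
    """Return {state: share of human-started fires with no listed cause}.

    `records` is an iterable of (state, cause_class, general_cause) tuples.
    States with no human-started fires are left out of the result.
    """
    human = defaultdict(int)       # human-started fires per state
    unresolved = defaultdict(int)  # of those, fires with no specific cause
    for state, cause_class, general_cause in records:
        if cause_class.strip().lower() != "human":
            continue
        human[state] += 1
        if (general_cause or "").strip().lower() in UNRESOLVED_CAUSES:
            unresolved[state] += 1
    return {state: unresolved[state] / human[state] for state in human}


# Made-up sample rows with placeholder state codes, for illustration only.
sample = [
    ("AA", "Human", ""),
    ("AA", "Human", "Arson"),
    ("BB", "Human", "Investigated but undetermined"),
    ("CC", "Natural", "Lightning"),
]
rates = unresolved_human_rate_by_state(sample)
```

Because the denominator counts only human-started fires, a state with many lightning fires is not penalized for them.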