This is an analysis of Missouri wildfire reporting data from 2002 to 2021 including data cleaning, an initial exploratory analysis with integrity checks and a full analysis with explanations of the process.
This data set was downloaded from the Missouri Department of Conservation’s wildfire reporting at this link [https://mdc12.mdc.mo.gov/Applications/MDCFireReporting/Home/FireReportSearch].
The data was downloaded by selecting a time range from Jan. 1, 2002 to Nov. 16, 2021. There are 69,745 rows of data and 7 variables including view, discovered date, county, station, cause, acres burned and the member who reported the wildfire. All possible causes of wildfire were selected including arson, campfire, children, debris, equipment, fireworks, lightning, miscellaneous, not reported, powerline, railroad, smoking, structure and unknown. Data for prescribed and controlled fires is not included in this data set. All counties were chosen for for the data downloaded.
It is unclear what the definitions are for some of the causes of fire. For example, the cause labeled “children” is vague. I plan to reach out to someone to clarify the definitions of these causes.
There are 1,393 NAs in the acres burned category, approximately 2% of the data. While these rows exist as a report, I’m not sure if this means no acres were burned or it wasn’t reported. I believe the dataset still has sufficient information for analysis.
Wildfire reports are categorized by county. However, there are 124 counties listed, including a Decatur, which is not a county in Missouri, and a category called “out of state”. I know there must be other duplicates or errors since there are only 115 counties (including city of St. Louis) in Missouri.
This analysis focuses on Missouri as a whole and counties, acres burned, number of reports, and various time periods. The year 2020 is analyzed the most since it is the most recent full-year of data, but data from 2021 is included in analysis of trends over time. Because columns including the member who reported the wildfire and the name of the station are dirty, no conclusions will be drawn using these columns. Some exploratory analysis will be done using these columns.
Loading libraries.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Loading data and cleaning variable names.
wildfires <- read_csv("data/wildfire-reports-2002-2021.csv")
## Rows: 69745 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): View, Discovered Date, County, Station, Cause, Member
## dbl (1): Acres Burned
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
wildfires <- wildfires %>% clean_names()
Converting the date column to new column in date format. What is the exact time range of the data?
wildfires <- wildfires %>% mutate(new_date = as.Date(discovered_date, "%m/%d/%Y"), .after="discovered_date")
wildfires %>% summarise(range(new_date))
The range of the data is Jan. 9, 2002 to Nov. 15, 2021.
How many different causes of wildfire are there?
wildfires %>% count(cause) %>%
arrange(desc(n))
There are 14 causes, including 8,083 rows for miscellaneous, 20,543 for unknown and 1,663 not reported. Debris has the most at 28,067 reports.
How many different counties? Which county has the most reports?
wildfires %>% count(county) %>%
arrange(desc(n))
The fewest reports?
wildfires %>% count(county) %>%
arrange(n)
124 “counties” are listed. Camden county has the most reports over the time frame at 3,529. Decatur and Fulton had the fewest reports with 1 report each. Boone is in the bottom ten, with only 14 reports. Decatur and Fulton are not counties in Missouri, but there is a township and a city with the same names. There is also an “Out of State” category on here. It has 26 reports.
Are there any counties listed as NA?
wildfires %>% filter(is.na(county)) %>%
summarise(count = n())
There are 56 rows with the county listed as NA.
Cleaning the county names: Creating a new table to filter out all of the counties that aren’t actual counties.
## Bringing in a csv with Missouri county names and state attached. I'm using the state to introduce NAs in the state column for counties that don't match.
mo_counties <- read_csv("data/mo-counties.csv")
## Rows: 115 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): County, State
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mo_counties <- mo_counties %>% clean_names()
## Matching county names that have multiple spellings in wildfires.
## Joining mo_counties with wildfires to filter out non-counties. Matching
wildfires <- wildfires %>% mutate(county2 = case_when(
county=="Dekalb" ~ "DeKalb",
county=="Sainte Genevieve" ~ "Ste. Genevieve",
county=="St. Charles" ~ "Saint Charles",
county=="St. Clair" ~ "Saint Clair",
county=="St. Francois" ~ "Saint Francois",
TRUE ~ as.character(county),
))
wildfires <- wildfires %>% left_join(mo_counties, by = c("county2"="county"))
## Checking that this worked, it did!
wildfires %>% filter(is.na(state)) %>%
group_by(county2) %>%
summarise(count = n())
wildfires %>% filter(!is.na(state)) %>%
group_by(county2) %>%
summarise(count = n())
Four fields are included in wildfires that don’t match a county: Decatur, Fulton, out of state, and NA. It is 84 total rows.
When you count counties now using the county2 column, it returns 115 counties, which includes the city of St. Louis.
Now, taking out Decatur, Fulton, Out of State and NAs and putting them into a new table:
not_counties <- wildfires %>% filter(is.na(state))
wildfires <- wildfires %>% filter(!is.na(state))
## Testing this again to make sure it worked
wildfires %>% filter(is.na(state)) %>%
group_by(county2) %>%
summarise(count = n())
wildfires %>% filter(!is.na(state)) %>%
group_by(county2) %>%
summarise(count = n())
## The unmatched counties are removed from wildfires.
How many different stations? Which station has the most reports for the time period?
wildfires %>% count(station) %>%
arrange(desc(n))
There are 745 stations listed throughout this dataset. Looking at the names, they look dirty. Some are all uppercase and some are not. Butler County Fire Protection district has the most reports over the time period at 1,771.
How many reports list the station as NA?
wildfires %>% filter(is.na(station)) %>%
summarise(count = n())
No stations are listed as NA.
Making a new column for just the year. What year had the most reports?
wildfires <- wildfires %>% mutate(year = substr(discovered_date, 7, 10), .after = "discovered_date")
wildfires %>% count(year) %>%
arrange(desc(n))
2012 had the most reports at 6,578 rows/reports.
Making a new column for just year and month. What month out of all years had the most reports?
wildfires <- wildfires %>% mutate(month = substr(new_date, 1, 7), .after = "discovered_date")
wildfires %>% group_by(month) %>%
summarise(count = n()) %>%
arrange(desc(count))
The highest number of reports was in March 2014 at 2,214 reports.
Are there NAs in the acres burned column? How many?
wildfires %>% filter(is.na(acres_burned)) %>%
summarise(count=n())
There are 1,410 NAs in the acres burned column. This is about 2% of the data.
What is the range for acres burned?
wildfires %>% filter(!is.na(acres_burned)) %>%
summarise(range = range(acres_burned))
The range of the acres burned goes from -1 to 10,142.16 acres.
How many rows include -1 as the number of acres burned?
wildfires %>% filter(acres_burned == -1)
Six rows list -1 acres burned. All from Fort Osage Fire Protection District in Jackson County in 2004. These occurred during January, February and March.
How many NAs are in the member column?
wildfires %>% filter(is.na(member)) %>%
summarise(count=n())
4,875 NAs in the member column.
Is the member column clean? Creating a new table for member counts.
members <- wildfires %>% filter(!is.na(member)) %>%
group_by(member) %>%
summarise(count = n())
This returns 15,074 rows. It’s full of repetitive entries and different spellings of names, so it isn’t clean.
What person has filed the most reports?
wildfires %>% filter(!is.na(member)) %>%
group_by(member) %>%
summarise(count = n()) %>%
arrange(desc(count))
Bob Fredwell has filed the most at 260 reports. The top ten includes “none”, “4200”, “3000” and more, so again, the names are dirty.
How many acres were burned overall for Missouri for the time period?
wildfires %>% filter(!is.na(acres_burned)) %>%
summarise(total_acres = sum(acres_burned))
1,089,632 acres burned from 2002 to 2021.
How many acres were burned overall for Boone County in the time period?
wildfires %>% filter(!is.na(acres_burned) & county2 == "Boone") %>%
summarise(acres = sum(acres_burned))
From 2002 to Nov. 2021, only 77.01 acres have been burned in Boone County from wildfires.
How many acres were burned for Missouri in 2020? 2021?
wildfires %>% filter(!is.na(acres_burned) & year == "2020") %>%
summarise(total_acres = sum(acres_burned))
wildfires %>% filter(!is.na(acres_burned) & year == "2021") %>%
summarise(total_acres = sum(acres_burned))
30,635.49 acres in 2020 and 30,563.77 acres in 2021.
How many acres were burned in Boone County in 2020? 2021?
wildfires %>% filter(!is.na(acres_burned) & year == "2020" & county2 == "Boone") %>%
summarise(total_acres = sum(acres_burned))
wildfires %>% filter(!is.na(acres_burned) & year == "2021" & county2 == "Boone") %>%
summarise(total_acres = sum(acres_burned))
In 2020, there were zero acres burned in Boone County. In 2021, there have been 7.93 acres burned as of Nov. 15.
What county had the most acres burned overall?
wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(county2) %>%
summarise(total_acres = sum(acres_burned)) %>%
arrange(desc(total_acres))
Camden County has the highest overall total acres burned at 95,599.12 acres.
What county had the least acres burned overall?
wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(county2) %>%
summarise(total_acres = sum(acres_burned)) %>%
arrange(total_acres)
Saint Louis City only had 1.95 acres burned for the time period.
What county had the most acres burned in 2020? 2021?
wildfires %>% filter(!is.na(acres_burned) & year == "2020") %>%
group_by(county2) %>%
summarise(total_acres = sum(acres_burned)) %>%
arrange(desc(total_acres))
In 2020, Washington County had the most acres burned at 11,077.20
wildfires %>% filter(!is.na(acres_burned) & year == "2021") %>%
group_by(county2) %>%
summarise(total_acres = sum(acres_burned)) %>%
arrange(desc(total_acres))
In 2021, Camden County had the most acres burned at 3,655.68 acres.
What year did the most acres burn in Missouri?
wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(year) %>%
summarise(acres = sum(acres_burned)) %>%
arrange(desc(acres))
The highest number of acres burned was in 2012 with 150,841.04 acres. The second highest was 2014 with 96,077.6 acres.
What month overall has had the highest number of acres burned?
wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(month) %>%
summarise(acres_burned = sum(acres_burned)) %>%
arrange(desc(acres_burned))
March 2014 was the month with the highest number of acres burned in the time period at 52,345.44 acres burned.
In March 2014, what county had the highest number of acres burned?
wildfires %>% filter(!is.na(acres_burned) & month == "2014-03") %>%
group_by(county2) %>%
summarise(acres = sum(acres_burned)) %>%
arrange(desc(acres))
Camden County has the most acres burned in March 2014 at 5,456.25 acres.
What day did the most acres burn in Missouri in the time period?
wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(new_date) %>%
summarise(acres = sum(acres_burned)) %>%
arrange(desc(acres))
March 6, 2012 had the highest number of acres burned at 24,683.34 acres.
What county had the highest number of acres burned on March 6, 2012?
wildfires %>% filter(!is.na(acres_burned) & new_date == "2012-03-06") %>%
group_by(county2) %>%
summarise(sum = sum(acres_burned)) %>%
arrange(desc(sum))
Dallas County had the highest on March 6, 2012 at 7,666 acres. The second highest is Washington county at 5,466.6 acres burned.
How has the number of acres burned change over time?
Creating a new table showing the acres burned from 2002 to 2021 for Missouri by year. Plotting on a line chart.
acres_mo_year <- wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(year) %>%
summarise(total_acres = sum(acres_burned))
ggplot(data=acres_mo_year, aes(x=year, y=total_acres, group=1, fill="red")) +
geom_line()
While the number of acres burned peaked in 2012 for the time period, acres burned for the past five years appears to be trending downward.
Looking at acres burned in Missouri by month. Creating a new table showing the acres burned from 2002 to 2021 for Missouri by month. Plotting on a line chart. Any notable trends?
acres_mo_month <- wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(month) %>%
summarise(monthly_acres = sum(acres_burned))
ggplot(data=acres_mo_month, aes(x=month, y=monthly_acres, group=1)) +
geom_line()
It doesn’t show a consistent trend for the entire time period, but it looks like “seasons” exist. The data fluctuates frequently. The peaks also appear to be trending downward in the past 5-7 years, but going back up in 2020-2021.
A naturalist at the Runge Conservation Center explained that there has been a decrease in wildfires in Missouri over time, so the data from the last years supports that. Having more historical data about wildfires would help confirm the claim and see a broader trend.
Writing a csv of acres burned in Missouri by month to create a chart in Datawrapper.
write_csv(acres_mo_month, "data/acres-mo-month.csv", na="")
Creating a new table showing the acres burned in Boone County from 2002 to 2021 by year. Ploting on a line chart. Any notable trends?
acres_boone_year <- wildfires %>% filter(!is.na(acres_burned) & county2 == "Boone") %>%
group_by(year) %>%
summarise(total_acres = sum(acres_burned))
ggplot(data=acres_boone_year, aes(x=year, y=total_acres, group=1)) +
geom_line()
Create a new table showing the acres burned in Boone County from 2002 to 2021 by month. Plot on a line chart. Any notable trends?
acres_boone_month <- wildfires %>% filter(!is.na(acres_burned) & county2 == "Boone") %>%
group_by(month) %>%
summarise(monthly_acres = sum(acres_burned))
ggplot(data=acres_boone_month, aes(x=month, y=monthly_acres, group=1)) +
geom_line()
Boone County has not had reports of very many acres burned. Only 9 months out of the reported time period have listed any acres burned. This is definitely worth looking into and checking with an expert on the topic. Boone isn’t as rural as most other counties in Missouri, so it may just not have many large wildfires. I’ll look into the number of reports in the next section of the analysis.
How has the number of acres burned for each county changed over time? (by year)
acres_by_county <- wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(year, county2) %>%
summarise(sum = sum(acres_burned))
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
Plotting acres burned by county by year on a line chart.
ggplot(data=acres_by_county, aes(x=year, y=sum, group=county2, color=county2)) +
geom_line()
I tried to make a line chart with the over 100 counties in the data set to see if there were an anomalies in trends for each county. I couldn’t get the ggplot to work in R but wanted to keep the code here for feedback.
How do counties differ on acres burned in 2020? Writing a new csv file to make a heat map of acres burned in Missouri in 2020 by county using Datawrapper.
acres_county_2020 <- wildfires %>% filter(!is.na(acres_burned) & year == "2020") %>%
group_by(county2) %>%
summarise(sum = sum(acres_burned))
write_csv(acres_county_2020, "data/mo_acres_burned_county_2020.csv", na="")
How many fires were reported in Missouri in 2020?
wildfires %>% filter(year == "2020") %>%
summarise(count = n())
1,812 fires reported in 2020 in Missouri.
How many fires were reported in Boone County in 2020?
wildfires %>% filter(county2 == "Boone" & year == "2020") %>%
summarise(count = n())
No fires were reported in Boone County in 2020.
How many fires were reported in Boone County over the time period?
wildfires %>% filter(county2 == "Boone") %>%
summarise(count = n())
14 fires were reported in Boone County in the time period, this isn’t a lot. Previously, I looked at total acres burned for the time period for Boone, and it was 77.01. Given only 14 wildfires were reported for 20 years, this makes sense.
How did the number of wildfires reported per year in Missouri change over time? Creating a line chart showing number of fires reported in Missouri by year.
reports_mo_year <- wildfires %>% group_by(year) %>%
summarise(count = n())
ggplot(data=reports_mo_year, aes(x=year, y=count, group=1)) +
geom_line()
Looking at the trend by year is interesting because it looks like the number of reports has been trending downward for the past 5 or so years.
How did the number of fires reported in Boone County change over time? Creating a line chart showing fires reported by year over time.
reports_boone_year <- wildfires %>% filter(county2 == "Boone") %>%
group_by(year) %>%
summarise(count = n())
ggplot(data=reports_boone_year, aes(x=year, y=count, group=1)) +
geom_line()
I plotted this just to check it out, but as found previously with acres burned, Boone County simply doesn’t have many wildfires.
How did the number of fires per month in Missouri change over time? Creating a bar chart showing number of fires reported in Missouri by month.
reports_mo_month <- wildfires %>% group_by(month) %>%
summarise(count = n())
ggplot(data=reports_mo_month, aes(x=month, y=count, group=1)) +
geom_line()
Plotting the reports in Missouri by month really shows the fluctuating months. This chart appears to be similar to the number of acres reported.
Writing a csv of reports in Missouri by month to create a chart in Datawrapper
write_csv(reports_mo_month, "data/mo-reports-month.csv", na="")
How did the number of fires per month reported in Boone County change over time? Creating a line chart showing fires reported by month over time.
reports_boone_month <- wildfires %>% filter(county2 == "Boone") %>%
group_by(month) %>%
summarise(count = n())
ggplot(data=reports_boone_month, aes(x=month, y=count, group=1)) +
geom_line()
The same as before, Boone has has a little amount of reports. Although, Boone had 5 reports in January of 2014. There was a spike in wildfire reports in early 2014 for the state as well.
Writing a new csv file to make a heat map of wildfire reports in Missouri in 2020 by county using Datawrapper.
reports_county_2020 <- wildfires %>% filter(year == "2020") %>%
group_by(county2) %>%
summarise(reports = n())
write_csv(reports_county_2020, "data/report-count-by-county-2020.csv", na="")
What was the highest cause of wildfires in Missouri in 2020? 2021?
wildfires %>% filter(year == "2020") %>%
group_by(cause) %>%
summarise(count = n()) %>%
arrange(desc(count))
wildfires %>% filter(year == "2021") %>%
group_by(cause) %>%
summarise(count = n()) %>%
arrange(desc(count))
In both 2020 and 2021 the cause of wildfire with the most reports was the category unknown. In 2020, there were 692 unknown reports and in 2021 there were 543 unknown reports. The second highest cause in 2020 and 2021 was debris.
What was the highest cause of wildfires in Boone County in 2021?
wildfires %>% filter(county2 == "Boone" & year == "2021") %>%
group_by(cause) %>%
summarise(count = n()) %>%
arrange(desc(count))
There were 2 reports of debris in 2021, making it the highest of 4 total reports. No fires were reported in 2020, so there is no highest cause.
What cause and which county had the most reports in the time period?
wildfires %>% group_by(county2, cause) %>%
summarise(count = n()) %>%
arrange(desc(count))
## `summarise()` has grouped output by 'county2'. You can override using the `.groups` argument.
Camden County had the most reports filtered by cause. The cause was unknown at 1,340 reports. The second highest was Newton County with 1,330 reports of wildfires caused by debris. The third highest was Camden again with 1,192 reports caused by debris.
What cause of wildfire is responsible for the most acres burned over the time period in Missouri?
wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(cause) %>%
summarise(sum = sum(acres_burned)) %>%
arrange(desc(sum))
Debris caused the most acres burned over the time period at 336,144.44 acres burned over the 20-year time period.
What cause of wildfire is responsible for the most acres burned in 2020 in Missouri?
cause_acres <- wildfires %>% filter(!is.na(acres_burned) & year == "2020") %>%
group_by(cause) %>%
summarise(sum = sum(acres_burned)) %>%
arrange(desc(sum))
However, the cause that burned the most acres in 2020 was unknown at 18,606 acres.
Writing a csv of acres burned in 2020 for each wildfire cause.
write_csv(cause_acres, "data/cause-acres-mo-2020.csv", na="")
What station has reported the most acres burned?
wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(station) %>%
summarise(acres = sum(acres_burned)) %>%
arrange(desc(acres))
Lebanon forestry has reported the most acres burned in the time period at 54,658.37 acres.
What member had reported the most acres burned?
wildfires %>% filter(!is.na(acres_burned)) %>%
group_by(member) %>%
summarise(acres = sum(acres_burned)) %>%
arrange(desc(acres))
The most acres burned comes from members listed as NA, so the highest listed is LaVal, reporting 13,423.5 acres.
Where is LaVal from?
wildfires %>% filter(member == "LaVal") %>%
group_by(county2, station) %>%
summarise(count = n())
## `summarise()` has grouped output by 'county2'. You can override using the `.groups` argument.
LaVal is from Lebanon Forestry and 1 report from Camdenton Forestry. The counted covered by LaVal include Laclede and Dallas, and 1 report each from Camden and Hickory.
To conclude, the data span of 20 years offers valuable information about wildfire trends in Missouri. While peak months for wildfire reports and acres burned in Missouri have been lower in recent years, it is clear that the number of wildfires and acres burned has not gone up significantly over the past two decades. With worries about climate change and natual disasters, wildfires are always a concern, but the data does not show this to be a problem at the moment.
I believe trying to access data, if available, on a longer time period would help support claims about trends in the data over time better. In addition, this dataset did not include prescribed and controlled burns in Missouri, which have been a crucial part in maintaining Missouri’s land.
To further reporting on the data, it would be useful to look into Camden and Washington counties and why these two counties saw signficiantly higher acres burned and wildfire reports in 2020.
Using finding, I created five graphics summarizing findings from the data:
https://www.datawrapper.de/_/Snstz/
https://www.datawrapper.de/_/5ar3Q/
https://www.datawrapper.de/_/4HYc2/
https://www.datawrapper.de/_/HawO8/
https://www.datawrapper.de/_/XlEqD/
I also published these together on one page of a website I built for a multimedia class. The website includes more information about wildfires in Missouri.
https://j4502-fs21.github.io/annie-jennemann-final-project/graphics.html