11 August 25
An AI Lesson From Urban Forest Mapping
Today I fielded an email from a staffer at the California Air Resources Board about the following topic, and I think there’s a general lesson to be had here. Between 2021 and 2023 I worked on a project looking at the extent of, and ecosystem services provided by, the urban forests of California. This was a follow-on to an earlier project our lab had done in 2015 on the same topic, and one of the goals of the project was to do a change analysis between the two time periods. For the question of urban forest canopy extent, we were working with high-resolution tree canopy cover datasets from a company called EarthDefine. In particular, we were comparing a canopy cover data layer from 2012 (used in our 2015 analysis) to a canopy cover data layer from 2018. In theory, all one has to do to measure change in canopy cover extent is to subtract the 2018 layer from the 2012 layer: pixels with canopy cover in 2012 but not in 2018, or vice versa, represent change.
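As a concrete sketch of that naive differencing, here is roughly what the pixelwise comparison looks like in Python with rasterio and numpy, assuming both layers are binary canopy masks (1 = canopy, 0 = not) already on the same grid and resolution; the file names are hypothetical.

```python
# Naive pixelwise change analysis between two canopy layers. Assumes both
# rasters are binary masks (1 = canopy, 0 = no canopy) on the same grid
# and resolution; the file names are hypothetical.
import numpy as np
import rasterio

with rasterio.open("canopy_2012.tif") as src_2012, \
     rasterio.open("canopy_2018.tif") as src_2018:
    canopy_2012 = src_2012.read(1)
    canopy_2018 = src_2018.read(1)

# +1 where canopy appeared, -1 where it disappeared, 0 where unchanged.
change = canopy_2018.astype(np.int8) - canopy_2012.astype(np.int8)

print("pixels gained:", np.count_nonzero(change == 1))
print("pixels lost:", np.count_nonzero(change == -1))
```

Simple enough in theory; the catch, as explained below, is that this only makes sense if the two layers were produced the same way.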
In practice, we soon discovered this wasn’t going to work at all. These canopy cover datasets were developed by applying machine learning models to NAIP imagery, high-resolution aerial photography produced periodically under a program run by the US Department of Agriculture. When we compared the 2012 and 2018 canopy cover maps with their source imagery, it was evident that the machine learning models behind the two datasets used very different criteria for recognizing and delineating trees. This produced unrealistic change statistics; for example, urban canopy cover in Riverside County purportedly increased by 20% from 2012 to 2018. Basically, we were comparing outputs from different machine learning models applied to different source data (the 2012 imagery had a resolution of 1 meter, the 2018 imagery 0.6 meters): apples and oranges.
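For what it’s worth, the resolution mismatch by itself is mechanically fixable: you can resample one layer onto the other’s grid before differencing, as in the sketch below (again with hypothetical file names). But that only aligns the geometry; it does nothing about the deeper mismatch between what the two models counted as a tree.

```python
# Resampling the 0.6 m 2018 layer onto the 1 m 2012 grid so the rasters
# line up pixel for pixel. File names are hypothetical. This fixes only
# the geometry, not the disagreement between the underlying models.
import numpy as np
import rasterio
from rasterio.warp import reproject, Resampling

with rasterio.open("canopy_2018_0.6m.tif") as src, \
     rasterio.open("canopy_2012_1m.tif") as ref:
    canopy_2018_on_1m = np.zeros((ref.height, ref.width), dtype="uint8")
    reproject(
        source=rasterio.band(src, 1),
        destination=canopy_2018_on_1m,
        dst_transform=ref.transform,
        dst_crs=ref.crs,
        resampling=Resampling.mode,  # majority vote, sensible for a binary mask
    )
```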
The general lesson for AI is to be very careful about extending an AI model beyond the domain over which it has been trained. Sometimes this works, but often it does not, with deleterious consequences. In particular, this is one of the antipatterns that can result in AI bias.