Automated Data Cleaning for Environmental Monitoring


Automated sensor networks are revolutionizing data collection in ecology and environmental science, but the enormous data streams they produce make it difficult for scientists to reliably distinguish false readings from true ones. A team of computer scientists, ecologists, and ecological information managers used climate data collected at the HJ Andrews Long-Term Ecological Research (LTER) site to develop a computer program that accurately and consistently identifies data streams from malfunctioning sensors and distinguishes them from those produced by correctly functioning sensors.

Address Goals

Discovery. The algorithms developed for cleaning these climate datasets are a novel contribution to computer science: a Dynamic Bayesian Network model that analyzes sensor observations and distinguishes sensor failures from valid data, demonstrated for air temperature measured at 15-minute resolution.

Research Infrastructure. Automated cleaning of data from large, distributed environmental sensor networks will be essential to making operational the cyber-infrastructure of large sensor network projects, such as the National Ecological Observatory Network (NEON).
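
To give a feel for how a Dynamic Bayesian Network can separate sensor failures from valid air temperature readings, the sketch below uses the simplest possible case: a two-state hidden Markov model in which the hidden variable is the sensor status (working or broken). This is an illustration only, not the model developed by the team. The function name, the transition probabilities, the noise levels, and the idea of comparing each reading against a supplied baseline temperature are all assumptions made for the example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def filter_sensor_state(readings, baseline,
                        p_fail=0.01, p_recover=0.10,
                        working_std=1.0, broken_std=15.0):
    """Forward-filter P(sensor broken) for each reading.

    readings: observed temperatures, e.g. one value every 15 minutes.
    baseline: expected temperature at the same time steps (hypothetical
              input, e.g. from a seasonal or diurnal model).
    All numeric defaults are illustrative assumptions, not fitted values.
    """
    p_broken = p_fail  # prior belief before any data is seen
    posteriors = []
    for observed, expected in zip(readings, baseline):
        # Prediction step: propagate the belief through the transition model.
        prior_broken = p_broken * (1.0 - p_recover) + (1.0 - p_broken) * p_fail

        # Update step: a working sensor stays close to the baseline,
        # a broken sensor can report almost anything.
        residual = observed - expected
        like_working = gaussian_pdf(residual, 0.0, working_std)
        like_broken = gaussian_pdf(residual, 0.0, broken_std)

        numerator = prior_broken * like_broken
        evidence = numerator + (1.0 - prior_broken) * like_working
        if evidence == 0.0:
            # Reading is numerically impossible under both states
            # (e.g. a -9999 sentinel value); treat it as a failure.
            p_broken = 1.0
        else:
            p_broken = numerator / evidence
        posteriors.append(p_broken)
    return posteriors


if __name__ == "__main__":
    readings = [10.0, 10.2, 10.1, 35.0, 34.8, 10.3]
    baseline = [10.0] * len(readings)
    for value, prob in zip(readings, filter_sensor_state(readings, baseline)):
        flag = "SUSPECT" if prob > 0.5 else "ok"
        print(f"{value:6.1f}  P(broken)={prob:.3f}  {flag}")
```

Readings with a posterior failure probability above a chosen threshold (0.5 here, also an assumption) would be flagged for review. A full Dynamic Bayesian Network would typically also model the true temperature as a hidden variable and learn its parameters from data, rather than relying on a fixed baseline.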