Fault detection and diagnosis (FDD) algorithms for building systems and equipment represent one of the most active areas of research and commercial product development in the buildings industry. However, far more effort has gone into developing these algorithms than into assessing their performance. As a result, considerable uncertainties remain regarding the accuracy and effectiveness of both research-grade FDD algorithms and commercial products—a state of affairs that has hindered the broad adoption of FDD tools. This article presents a general, systematic framework for evaluating the performance of FDD algorithms. The article focuses on understanding the possible answers to two key questions: in the context of FDD algorithm evaluation, what defines a fault and what defines an evaluation input sample? The answers to these questions, together with appropriate performance metrics, may be used to fully specify evaluation procedures for FDD algorithms.
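To make the evaluation idea concrete, the sketch below shows one plausible way to score an FDD algorithm against labeled evaluation input samples using standard confusion-matrix metrics. The function name, data layout, and choice of metrics are illustrative assumptions, not the specific procedures defined in this article.

```python
# Hypothetical sketch: scoring an FDD algorithm on labeled evaluation
# input samples. Each sample is a (prediction, ground_truth) pair of
# booleans: True = fault flagged / fault actually present.

def evaluate_fdd(predictions, ground_truth):
    """Compute basic detection metrics from paired boolean labels."""
    tp = sum(p and g for p, g in zip(predictions, ground_truth))
    fp = sum(p and not g for p, g in zip(predictions, ground_truth))
    fn = sum((not p) and g for p, g in zip(predictions, ground_truth))
    tn = sum((not p) and (not g) for p, g in zip(predictions, ground_truth))
    faulty = tp + fn        # samples where a fault is truly present
    unfaulty = fp + tn      # samples with no fault present
    return {
        # fraction of truly faulty samples the algorithm detected
        "true_positive_rate": tp / faulty if faulty else float("nan"),
        # fraction of fault-free samples incorrectly flagged (false alarms)
        "false_positive_rate": fp / unfaulty if unfaulty else float("nan"),
    }

# Example: four input samples, two of which are truly faulty
metrics = evaluate_fdd(
    predictions=[True, False, True, False],
    ground_truth=[True, True, False, False],
)
```

In this toy run the algorithm catches one of the two true faults and raises one false alarm on the two fault-free samples, so both rates come out to 0.5. In practice, how "fault" and "input sample" are defined determines what these counts even mean, which is the central point of the article.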