At Groundlight, we put careful thought into measuring the correctness of our machine learning detectors. In the simplest case, this means measuring detector accuracy. But our customers have vastly different performance needs since our platform allows them to train an ML model for nearly any Yes/No visual question-answering task. A single metric like accuracy is unlikely to provide adequate resolution for all such problems. Some customers might care more about false positive mistakes (precision) whereas others might care more about false negatives (recall).
Tales from the Binomial Tail: Confidence intervals for balanced accuracy
· 24 min read