Why Pixel Counting is not Adequate for Evaluating Segmentation
Let me illustrate what I was trying to say in a comment to João Carriço:
Consider the following three shapes:
If the top (red) image is your reference and the green and blue images are two candidate solutions, then pixel counting (which forms the basis of the Rand and Jaccard indices) will say that green is worse than blue: green differs from the reference by 558 pixels, while blue differs by only 511.
However, the green image is simply a fatter version of red (with an extra boundary of about 2 pixels). Since boundaries cannot really be drawn at the pixel level anyway (the border between background and foreground is fuzzy), this is not an important difference. The blue image, however, has an extra blob and so is qualitatively different.
The Hausdorff distance or my own normalized sum of distances, on the other hand, would say that green is very much like red, while blue is more different. Thus they capture the important differences better than pixel counting. I think this is why we found that these are better measures than Rand or Jaccard (or Dice) for evaluation of segmentation.
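To make the contrast concrete, here is a minimal sketch in Python. The shapes are made-up toy rectangles, not the images from the post, and the helper names (`rectangle`, `pixel_difference`, `jaccard`, `hausdorff`) are hypothetical. The Hausdorff distance is computed by brute force over foreground pixel sets, which is an assumption of this sketch and not necessarily how it was computed for the paper.

```python
import math

def rectangle(r0, r1, c0, c1):
    """Foreground pixels (row, col) for rows r0..r1-1 and cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

def pixel_difference(a, b):
    """Number of pixels on which the two binary masks disagree."""
    return len(a ^ b)

def jaccard(a, b):
    """Jaccard index: intersection over union (1.0 means identical)."""
    return len(a & b) / len(a | b)

def directed_hausdorff(a, b):
    """Largest distance from a pixel of a to its nearest pixel of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two foreground pixel sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy stand-ins for the three shapes (assumed, not the original images).
red = rectangle(10, 30, 10, 30)                  # reference: 20x20 square
green = rectangle(9, 31, 9, 31)                  # reference fattened by 1 pixel
blue = red | rectangle(50, 55, 50, 55)           # reference plus a distant blob

for name, candidate in [("green", green), ("blue", blue)]:
    print(
        name,
        "pixels differing:", pixel_difference(red, candidate),
        "Jaccard: %.3f" % jaccard(red, candidate),
        "Hausdorff: %.2f" % hausdorff(red, candidate),
    )
```

In this toy setup, green differs from the reference by 84 pixels and blue by only 25, so pixel counting and the Jaccard index both prefer blue. The Hausdorff distance ranks them the other way: about 1.41 for green versus about 35.36 for blue, because the extra blob is far from anything in the reference.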
(Thanks João for prompting this example. I used it in a talk or two about this paper, but it was cut from the paper itself because of page limits.)
Reference
Luis Pedro Coelho, Aabid Shariff, and Robert F. Murphy. Nuclear Segmentation in Microscope Cell Images: A Hand-Segmented Dataset and Comparison of Algorithms. In IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI '09), 2009. DOI: 10.1109/ISBI.2009.5193098 [PubMed Central open access version]