
yeolj0o t1_j2iizj4 wrote

This Cityscapes segmentation paper provides intuition on the height prior, which basically categorizes an image into three parts according to the height of the pixel coordinates and then measures the pixel-wise class distribution within each part. As you mentioned in your original post, you can use KL divergence to measure the similarity of the class distributions between two images.
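If it helps, here is a rough NumPy sketch of that height-prior measurement (the equal-thirds band split, the void-label handling, and all function names are my own assumptions, not taken from the paper):

```python
import numpy as np

def band_class_dists(seg, num_classes, bands=3):
    """Class distribution per horizontal band of an (H, W) label map."""
    h = seg.shape[0]
    dists = []
    for i in range(bands):
        band = seg[i * h // bands : (i + 1) * h // bands]
        valid = band[band < num_classes]       # drop void/ignore labels (e.g. 255)
        hist = np.bincount(valid, minlength=num_classes).astype(float)
        dists.append(hist / max(hist.sum(), 1.0))
    return np.stack(dists)                     # shape: (bands, num_classes)

def kl_div(p, q, eps=1e-8):
    """KL(p || q) with smoothing so empty classes don't produce log(0)."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Compare two label maps band by band and average:
# dists_a = band_class_dists(seg_a, num_classes=19)   # 19 classes for Cityscapes
# dists_b = band_class_dists(seg_b, num_classes=19)
# score = np.mean([kl_div(p, q) for p, q in zip(dists_a, dists_b)])
```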

For your case (measuring ambiguity), I think measuring the class distribution of a whole image is a bad idea, since the local differences may be exactly what you want to observe. Instead, measuring mIoU between two or more annotations of the same image could be a good measure: ambiguous annotations will overlap only partially and thus yield a low mIoU.
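A rough sketch of what I mean, treating the two annotations symmetrically (the convention of skipping classes absent from both maps is my choice; some implementations score them as 1 instead):

```python
import numpy as np

def miou(seg_a, seg_b, num_classes):
    """Mean IoU between two (H, W) label maps of the same scene."""
    ious = []
    for c in range(num_classes):
        a, b = (seg_a == c), (seg_b == c)
        union = np.logical_or(a, b).sum()
        if union == 0:
            continue                           # class absent from both maps
        inter = np.logical_and(a, b).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

Ambiguous regions drag the per-class IoU down, so a low score flags exactly the disagreement you are after.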


yeolj0o t1_j2hkerk wrote

I was having the exact same thought about comparing segmentation labels. The best "deep learning style" approach I've come up with (and it is still not satisfying) is running a semantic image synthesis model (e.g., SPADE, OASIS, PITI) and comparing FIDs. A better approach for my case (I am working with Cityscapes, where the scene layout is mostly fixed) is to utilize height priors and compare the KL divergence or FSD per height band (bottom, mid, top).
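For the FID part, I just use the off-the-shelf pytorch-fid package rather than anything custom (the folder paths here are placeholders):

```python
# pip install pytorch-fid
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

device = "cuda" if torch.cuda.is_available() else "cpu"
fid = calculate_fid_given_paths(
    ["real_images/", "synthesized_from_labels/"],  # two folders of images
    batch_size=50,
    device=device,
    dims=2048,   # Inception-v3 pool3 features, the standard choice
)
print(f"FID: {fid:.2f}")
```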
