Tag Completion based on Belief Theory and Neighbor Voting

Amel Znaidia, Hervé Le Borgne, Celine Hudelot

ACM International Conference on Multimedia Retrieval,
Dallas, Texas, USA, April 16-19, 2013


We address the problem of tag completion for automatic image annotation. Our method consists of two main steps: creating a list of candidate tags from the visual neighbors of the untagged image, then using them as pieces of evidence to be combined to provide the final list of predicted tags. Both steps introduce a scheme to deal with imprecision and uncertainty. First, a bag-of-words (BOW) signature is generated for each neighbor using local soft coding. Second, a sum-pooling operation across the BOW of the k nearest neighbors provides the list of candidate tags. Finally, we use neighbors as pieces of evidence to be combined according to Dempster's rule to predict the most relevant tags. The method is evaluated in the context of image classification and that of tag suggestion. The database used for visual neighbor search contains 1.2 million images extracted from Flickr. Classification is evaluated on the well-known Pascal VOC 2007 and MIR Flickr datasets, on which we obtain results similar to or better than the state of the art. For tag suggestion, we manually annotated 241 queries. We also obtain competitive results on this task.


This benchmark is made available for reproducible research purposes only. We downloaded all the images still available on Flickr among the 331 proposed by [1], resulting in a collection of 241 images. Example images, given in figure 3 (see below), show that the proposed ground truth does not perfectly reflect the visual content of the images.
We thus manually re-annotated the dataset to better reflect the visual content of the images. For this, we followed a protocol inspired by the collaborative annotation tool of TrecVid [2], which showed that annotating a small fraction of carefully chosen samples of a collection is enough to achieve performance similar to (or even better than) that obtained with the entire collection.

Example images from the dataset of [1]. The first row shows the ground truth proposed by [1] and the second row shows our manual annotations, used as ground truth for tag suggestion evaluation.
Dataset examples (with original and proposed ground truth)

The proposed annotations can be downloaded here.

The 241 images can be downloaded directly from Flickr since we kept their original numeric IDs. For this, you can use the Flickr API or a short program that converts the numeric ID into the corresponding short URL. We provide sample C++ code to convert a Flickr ID into its short URL.
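As an independent illustration of such a conversion (not the sample code mentioned above), the following minimal sketch assumes the Flickr short-URL scheme in which the numeric photo ID is base58-encoded and appended to `https://flic.kr/p/`. The function names `flickrBase58` and `flickrShortUrl` are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Base58 alphabet assumed for Flickr short URLs: digits and letters
// without the easily confused characters 0, l, I and O.
static const char kAlphabet[] =
    "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";

// Encode a numeric photo ID in base58, most significant digit first.
std::string flickrBase58(std::uint64_t id) {
    if (id == 0) {
        return std::string(1, kAlphabet[0]);
    }
    std::string out;
    while (id > 0) {
        // Prepend the next digit (the remainder gives the least
        // significant digit of what is left).
        out.insert(out.begin(), kAlphabet[id % 58]);
        id /= 58;
    }
    return out;
}

// Build the short URL for a numeric photo ID.
std::string flickrShortUrl(std::uint64_t id) {
    return "https://flic.kr/p/" + flickrBase58(id);
}
```

Calling `flickrShortUrl` on each numeric ID from the annotation file would give one URL per image. Whether a given image is still online has to be checked separately.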

[1] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. WWW'08, pages 327-336, New York, NY, USA, 2008. ACM.

[2] S. Ayache and G. Quénot. Evaluation of active learning strategies for video indexing. Journal of Image Communication, 22(7-8):692-704, Aug. 2007.

