Learning Surrogates via Deep Embedding

Yash Patel, Tomáš Hodaň, Jiří Matas

Abstract


This paper proposes a technique for training neural networks by minimizing surrogate losses that approximate the target evaluation metric, which may be non-differentiable. The surrogates are learned via a deep embedding in which the Euclidean distance between the prediction and the ground truth corresponds to the value of the evaluation metric. The effectiveness of the proposed technique is demonstrated in a post-tuning setup, where a trained model is tuned on the learned surrogate. The scores on the evaluation metric improve without any significant computational overhead. Without bells and whistles, improvements are demonstrated on the challenging and practical tasks of scene-text recognition (training with the edit distance metric) and scene-text detection (training with the intersection over union metric for rotated bounding boxes). Relative improvements of up to $38\%$ (in the total edit distance) and $4.25\%$ (in the $F_{1}$ score) were achieved in the recognition and detection tasks, respectively.
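To make the two-phase procedure described in the abstract concrete, the sketch below illustrates one plausible realization in PyTorch: an embedding network is first fitted so that the Euclidean distance between the embeddings of a prediction and its ground truth matches the value of the (possibly non-differentiable) metric, and the frozen surrogate is then used to post-tune a task model. This is a minimal illustration under stated assumptions, not the authors' implementation; the names EmbeddingNet, metric_fn, sample_pairs, and sample_batches are hypothetical placeholders.

```python
# Minimal sketch of learning a metric surrogate via deep embedding.
# Assumes PyTorch; EmbeddingNet, metric_fn, sample_pairs and
# sample_batches are illustrative names, not from the paper's code.
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Maps a prediction or a ground truth (here a flat vector) to an embedding."""
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def surrogate(embed: nn.Module, y_pred: torch.Tensor, y_gt: torch.Tensor) -> torch.Tensor:
    # Surrogate loss: Euclidean distance between the two embeddings,
    # trained to match the value of the evaluation metric.
    return torch.norm(embed(y_pred) - embed(y_gt), dim=-1)

# --- Phase 1: fit the embedding so the distance matches the metric ---
def train_surrogate(embed, sample_pairs, metric_fn, steps=1000, lr=1e-3):
    opt = torch.optim.Adam(embed.parameters(), lr=lr)
    for _ in range(steps):
        y_pred, y_gt = sample_pairs()        # batch of prediction/ground-truth pairs
        target = metric_fn(y_pred, y_gt)     # true metric values; no gradients needed
        loss = ((surrogate(embed, y_pred, y_gt) - target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

# --- Phase 2: post-tune the task model on the frozen surrogate ---
def post_tune(model, embed, sample_batches, steps=100, lr=1e-4):
    for p in embed.parameters():             # freeze the learned surrogate
        p.requires_grad_(False)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        x, y_gt = sample_batches()
        # The surrogate is differentiable in model(x), so gradients flow
        # to the task model even though the metric itself is not differentiable.
        loss = surrogate(embed, model(x), y_gt).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```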
