Adaptive Text Recognition through Visual Matching
This work addresses the problems of generalization and flexibility for text recognition in documents. We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual decoding and linguistic modelling stages through intermediate representations in the form of similarity maps. By doing this, we turn text recognition into a visual matching problem, thereby achieving generalization in appearance and flexibility in classes. We evaluate the model on both synthetic and real datasets across different languages and alphabets, and show that it can handle challenges that traditional architectures are unable to solve without expensive re-training, including: (i) it can change the number of classes simply by changing the exemplars; and (ii) it can generalize to novel languages and characters (not in the training data) simply by providing a new glyph exemplar set. In essence, it is able to carry out one-shot sequence recognition. We also demonstrate that the model can generalize to unseen fonts without requiring new exemplars from them."