SAM-Net: Self-Attention based Feature Matching with Spatial transformers and Knowledge Distillation

Technical University of Cluj-Napoca, Memorandumului 28, 400114, Romania

Comparison between SAM-Net and LoFTR. This example demonstrates that SAM-Net finds more accurate correspondences even under challenging conditions.

Abstract

This paper presents a novel approach to improving feature matching in computer vision applications by combining four powerful techniques: LoFTR, knowledge distillation, self-attention, and spatial transformer networks. In our approach, we use knowledge distillation to learn the feature extraction and matching capabilities of PixLoc, while incorporating self-attention and spatial transformer networks to further improve the model's ability to extract and match 2D features. SAM-Net outperforms state-of-the-art methods, as demonstrated by experiments conducted on both indoor and outdoor datasets. Furthermore, SAM-Net achieves the top ranking among published methods on two public benchmarks for visual localization.
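To make the knowledge-distillation component concrete, the sketch below shows the general idea: a student network is trained to reproduce the feature representations of a frozen teacher (here, PixLoc's feature extractor) by minimizing an L2 loss between them. The function name and toy feature values are illustrative assumptions, not the authors' implementation.

```python
def mse_distillation_loss(teacher_feats, student_feats):
    """Mean squared error between flattened teacher and student features.

    In feature-level knowledge distillation, this loss pulls the student's
    feature maps toward those of a frozen teacher network.
    """
    assert len(teacher_feats) == len(student_feats)
    n = len(teacher_feats)
    return sum((t - s) ** 2 for t, s in zip(teacher_feats, student_feats)) / n

# Toy example: the student's features are slightly off from the teacher's,
# so the loss is small but nonzero.
teacher = [0.2, -1.0, 0.5, 0.8]
student = [0.1, -0.9, 0.6, 0.7]
loss = mse_distillation_loss(teacher, student)
print(round(loss, 4))  # → 0.01
```

In practice this distillation term would be combined with the matching loss, and the features would be dense multi-channel maps rather than short vectors.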

Method


Visualization


Benchmarking


BibTeX

@article{KELENYI2024122804,
  author    = {Benjamin Kelenyi and Victor Domsa and Levente Tamas},
  title     = {SAM-Net: Self-Attention based Feature Matching with Spatial transformers and Knowledge Distillation},
  journal   = {Expert Systems with Applications},
  volume    = {242},
  pages     = {122804},
  year      = {2024},
  issn      = {0957-4174},
  doi       = {10.1016/j.eswa.2023.122804}
}