ORFormer: Occlusion-Robust Transformer for Accurate Facial Landmark Detection

Authors: Jui-Che Chiang, Hou-Ning Hu, Bo-Syuan Hou, Chia-Yu Tseng, Yu-Lun Liu, Min-Hung Chen, Yen-Yu Lin

Abstract: Although facial landmark detection (FLD) has made significant progress,
existing FLD methods still suffer from performance drops on partially
non-visible faces, such as faces with occlusions or under extreme lighting
conditions or poses. To address this issue, we introduce ORFormer, a novel
transformer-based method that can detect non-visible regions and recover their
missing features from visible parts. Specifically, ORFormer associates each
image patch token with one additional learnable token called the messenger
token. The messenger token aggregates features from all patches except its own. This
way, the consensus between a patch and other patches can be assessed by
referring to the similarity between its regular and messenger embeddings,
enabling non-visible region identification. Our method then recovers occluded
patches with features aggregated by the messenger tokens. Leveraging the
recovered features, ORFormer compiles high-quality heatmaps for the downstream
FLD task. Extensive experiments show that our method generates heatmaps
resilient to partial occlusions. By integrating the resultant heatmaps into
existing FLD methods, our method performs favorably against the state of the
art on challenging datasets such as WFLW and COFW.
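
The messenger-token idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes single-head attention without learned projections, cosine similarity as the consensus measure, a fixed threshold, and hard replacement of low-consensus patches. The function name messenger_sketch and all parameter names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def messenger_sketch(patches, messengers, threshold=0.5):
    """Illustrative only; requires at least two patches.

    patches:    (N, D) patch token embeddings
    messengers: (N, D) messenger tokens, one per patch (assumed to act as queries)
    threshold:  assumed cosine-similarity cutoff for flagging a patch
    """
    n, d = patches.shape

    # Regular embedding: each patch attends to every patch, itself included.
    regular = softmax(patches @ patches.T / np.sqrt(d)) @ patches

    # Messenger embedding: messenger i attends to all patches except patch i,
    # which is enforced by masking the diagonal before the softmax.
    scores = messengers @ patches.T / np.sqrt(d)
    np.fill_diagonal(scores, -np.inf)
    weights = softmax(scores)
    messenger = weights @ patches

    # Consensus between a patch and the other patches, measured here as the
    # cosine similarity of its regular and messenger embeddings (assumption).
    num = (regular * messenger).sum(axis=1)
    den = np.linalg.norm(regular, axis=1) * np.linalg.norm(messenger, axis=1)
    similarity = num / (den + 1e-8)

    # Low consensus marks a patch as likely non-visible; its features are
    # then replaced by those aggregated by its messenger token.
    occluded = similarity < threshold
    recovered = np.where(occluded[:, None], messenger, regular)
    return weights, similarity, occluded, recovered
```

Calling messenger_sketch on random (N, D) arrays returns attention weights whose diagonal is zero, which is the property that keeps a patch from influencing its own messenger embedding.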

Source: http://arxiv.org/abs/2412.13174v1