Action-Agnostic Point-Level Supervision for Temporal Action Detection

Authors: Shuhei M. Yoshida, Takashi Shibata, Makoto Terao, Takayuki Okatani, Masashi Sugiyama

Abstract: We propose action-agnostic point-level (AAPL) supervision for temporal action
detection to achieve accurate action instance detection with a lightly
annotated dataset. In the proposed scheme, a small portion of video frames is
sampled in an unsupervised manner and presented to human annotators, who then
label the frames with action categories. Unlike point-level supervision, which
requires annotators to search for every action instance in an untrimmed video,
frames to annotate are selected without human intervention in AAPL supervision.
We also propose a detection model and learning method to effectively utilize
the AAPL labels. Extensive experiments on a variety of datasets (THUMOS ’14,
FineAction, GTEA, BEOID, and ActivityNet 1.3) demonstrate that the proposed
approach is competitive with or outperforms prior methods for video-level and
point-level supervision in terms of the trade-off between the annotation cost
and detection performance.
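
To make the annotation scheme concrete, here is a minimal sketch of the idea that frames are chosen without looking at their content and only then labeled by a person. The abstract does not say which unsupervised sampling strategy is used, so regular-interval sampling is an assumption made purely for illustration. The names `sample_frames_to_annotate`, `collect_labels`, and the `annotate` callback are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, Iterable, List, Set


def sample_frames_to_annotate(num_frames: int, interval: int, offset: int = 0) -> List[int]:
    """Pick frame indices at a fixed interval, independent of video content.

    Assumption: regular-interval sampling stands in for whatever unsupervised
    sampling strategy is actually used; the abstract does not specify one.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= offset < interval:
        raise ValueError("offset must satisfy 0 <= offset < interval")
    return list(range(offset, num_frames, interval))


def collect_labels(
    frame_indices: Iterable[int],
    annotate: Callable[[int], Iterable[str]],
) -> Dict[int, Set[str]]:
    """Ask the (hypothetical) annotator callback for the action categories of each frame.

    An empty set means the annotator saw no action in that frame (background).
    """
    return {idx: set(annotate(idx)) for idx in frame_indices}


if __name__ == "__main__":
    # Toy video of 10 frames with one made-up action covering frames 2..6.
    def toy_annotator(idx: int) -> List[str]:
        return ["toy_action"] if 2 <= idx <= 6 else []

    frames = sample_frames_to_annotate(num_frames=10, interval=3)
    print(collect_labels(frames, toy_annotator))
```

In this sketch the annotator never searches the video: they only answer "which actions, if any, are visible in this frame?" for the pre-selected frames, which is the contrast with point-level supervision that the abstract draws.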

Source: http://arxiv.org/abs/2412.21205v1
