Abstract: This paper introduces a novel framework for few-shot open-set speaker identification, aimed at real-world household wake-up and recognition scenarios. To address the limitations of current ...
Visual-language foundation models, like CLIP, learn generalized representations that enable zero-shot open-set classification. Few-shot adaptation methods, based on prompt tuning, have been shown to ...