clip1 [ Paper Review ] CLIP (Learning Transferable Visual Models From Natural Language Supervision, 2021) Learning Transferable Visual Models From Natural Language SupervisionState-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual coarxiv.org 0. AbstractSOTA computer vision 시스템은 사전에 정의된 object category를 예.. 2024. 7. 15. 이전 1 다음