In this video, I explain how YOLO-World works. YOLO-World is a real-time open-vocabulary object detection method that can detect new object categories without retraining the model.
YOLO-World is pre-trained on large-scale vision-language datasets such as Objects365, GQA, Flickr30K, and CC3M, providing it with strong zero-shot open-vocabulary capability and image grounding ability.
YOLO-World achieves fast inference speeds, and its authors demonstrate re-parameterization techniques that make inference and deployment faster still for a user-defined vocabulary.
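To give a feel for why no retraining is needed, here is a toy sketch of the open-vocabulary idea: class names are turned into text embeddings once, and each detected region is labeled by its cosine similarity to those embeddings. This is not the YOLO-World code. The embedding table, the function names, and the 0.5 threshold are all made up for illustration; the real model gets its text embeddings from a pre-trained text encoder and its region embeddings from the detector.

```python
import math

# Hypothetical stand-in for a text encoder: a fixed lookup table of
# made-up 3-d embeddings. A real model would compute these from the text.
TOY_TEXT_EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "bicycle": [0.0, 0.8, 0.6],
    "traffic cone": [0.1, 0.0, 1.0],
}


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_offline_vocabulary(class_names):
    """Encode the user's vocabulary once so it can be reused for every image."""
    return {name: l2_normalize(TOY_TEXT_EMBEDDINGS[name]) for name in class_names}


def classify_region(region_embedding, vocabulary, threshold=0.5):
    """Return (label, score) for the best match, or (None, score) if too weak."""
    region = l2_normalize(region_embedding)
    best_label, best_score = None, -1.0
    for name, text_vec in vocabulary.items():
        score = sum(r * t for r, t in zip(region, text_vec))
        if score > best_score:
            best_label, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Changing what the detector looks for only means rebuilding the vocabulary.
vocabulary = build_offline_vocabulary(["dog", "traffic cone"])
label, score = classify_region([0.2, 0.1, 0.9], vocabulary)
print(label, round(score, 3))
```

Because the vocabulary is encoded ahead of time, the text encoder does not have to run for every image, which is the intuition behind the faster deployment with a fixed user vocabulary.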
GitHub: github.com/AILab-CVC/YOLO-World
00:48 - YOLO-World Paper
04:47 - YOLO-World Architecture
08:47 - Some Examples
10:27 - Hugging Face Demo
12:03 - Google Colab Demo