Coarse-to-Fine Grained Text-based Video-moment Retrieval pipeline utilizing T-MASS and MESM models for efficient multi-stage text-video alignment.
machine-learning
deep-learning
video-processing
tacos
video-moment-retrieval
text-video-retrieval
t-mass
mesm
coarse-to-fine-alignment
stochastic-embed
stochastic-embedding
tacos-cg
-
Updated
Aug 28, 2024 - Python