Create a video operator using CLIP #357

dennyabrain · 2024-06-14T09:05:10Z

No description provided.

aatmanvaidya · 2024-08-05T10:44:07Z

@Snehil-Shah let's use this issue to track the work on a simple video CLIP operator that takes an image as input and gives an embedding as output. (Do add the I-frame approach here.)

Can you reply to this issue so that I can assign it to you?

Snehil-Shah · 2024-08-05T10:46:58Z

Comment

Snehil-Shah · 2024-08-08T23:05:20Z

CLIP-ViT-base-patch32 newer pipeline:

Video Length (File Size)	CPU Time (s)	RAM Usage
30s (1.92 MB)	1.15	40.6 MiB
1m (8.86 MB)	2.16	147.4 MiB
5m (42.8 MB)	10.89	724.9 MiB
10m (85.83 MB)	28.67	1.4 GiB
15m (128.65 MB)	38.26	2.1 GiB
20m (171.65 MB)	58.89	2.9 GiB
25m (214.41 MB)	57.3	3.5 GiB
30m (257.29 MB)	77	4.2 GiB
45m (385.94 MB)	killed	killed
1h (421.13 MB)	killed	killed

The results are surprising.
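To put a number on how RAM scales in the table above, here is a minimal sketch that divides each RAM figure by the video length. The values are copied from the table; the conversion 1 GiB = 1024 MiB and the choice to average only the runs of one minute or longer are my assumptions, and the linear extrapolation is a rough estimate, not a measurement.

```python
# Figures copied from the table above: (length in minutes, RAM usage in MiB).
# GiB values are converted assuming 1 GiB = 1024 MiB.
GIB = 1024.0
RUNS = [
    (0.5, 40.6),
    (1, 147.4),
    (5, 724.9),
    (10, 1.4 * GIB),
    (15, 2.1 * GIB),
    (20, 2.9 * GIB),
    (25, 3.5 * GIB),
    (30, 4.2 * GIB),
]


def mib_per_minute(runs):
    """Return (minutes, RAM divided by video length) for each run."""
    return [(minutes, ram / minutes) for minutes, ram in runs]


def extrapolate_gib(minutes, rate_mib_per_min):
    """Naive linear estimate of RAM usage (GiB) for a given video length."""
    return minutes * rate_mib_per_min / GIB


rates = mib_per_minute(RUNS)
for minutes, rate in rates:
    print(f"{minutes:>4} min: {rate:6.1f} MiB/min")

# Average over the runs of one minute or longer (the 30s run sits well below the rest).
steady = [rate for minutes, rate in rates if minutes >= 1]
avg_rate = sum(steady) / len(steady)
print(f"45 min estimate: {extrapolate_gib(45, avg_rate):.1f} GiB")
```

From 1m onwards the ratio stays between roughly 143 and 148 MiB per minute, so RAM usage looks close to linear in video length for these ffmpeg-looped files. Extrapolated naively, the 45m video would need a bit over 6 GiB. The memory available on the benchmark machine is not stated in this thread, so this only suggests, and does not confirm, why the longer runs were killed.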

Some pointers regarding the behavior of the operator:

Profiles are inconsistent across runs. For instance, I measured the 20m video twice because I noticed some discrepancies in the data: one run took 58.89s (as listed above) and the other took 67s.
Maybe the video encoding really affects the behavior? I used ffmpeg to loop a base video to the desired lengths to create the benchmarking data. I also once used an online video looper service to generate a 30m video. With the ffmpeg file, RAM usage is 4.2 GiB (as stated above); with the online service export, it was only 1.1 GiB. This behavior is definitely strange.
The difference in RAM usage between the older CLIP pipeline and the newer one is also strange. To be fair, we previously used a different API (sentence-transformers) for quick testing, and we now use raw transformers with batch processing, so the internal implementations might differ.
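For anyone reproducing the benchmark data, here is a minimal sketch of how a base video could be looped to a target length with ffmpeg. The exact command used for the files above is not given in this thread, so the file names, the 30s base duration, and the choice of `-c copy` are all assumptions. The sketch only builds the command and does not run it.

```python
import math
import shlex


def loop_command(src, dst, base_seconds, target_seconds):
    """Build an ffmpeg command that repeats `src` until it lasts `target_seconds`.

    -stream_loop N replays the input N extra times, so N + 1 copies are read;
    -t trims the output to the target length; -c copy avoids re-encoding.
    """
    if base_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    extra_loops = max(math.ceil(target_seconds / base_seconds) - 1, 0)
    return [
        "ffmpeg", "-y",
        "-stream_loop", str(extra_loops),  # input option, must come before -i
        "-i", src,
        "-t", str(target_seconds),
        "-c", "copy",
        dst,
    ]


# Hypothetical file names and base duration, for illustration only.
cmd = loop_command("base.mp4", "bench_20m.mp4", base_seconds=30, target_seconds=20 * 60)
print(shlex.join(cmd))
```

With `-c copy` the looped file keeps the base video's encoding. If the online service re-encodes its export, that could be one source of the RAM difference described above, but nothing in this thread confirms it.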

Here are the previous profiles with the older pipeline:

ResNet18:

Video Length (File Size)	CPU Time (s)	RAM Usage
30s (1.92 MB)	3.34	106.7 MiB
1m (8.86 MB)	8.87	107.8 MiB
5m (42.8 MB)	58.37	110.3 MiB
10m (85.83 MB)	79	116.6 MiB

CLIP-ViT-base-patch32:

Video Length (File Size)	CPU Time (s)	RAM Usage
30s (1.92 MB)	9.87	1.1 GiB
1m (8.86 MB)	17.53	1.1 GiB
5m (42.8 MB)	78	1.1 GiB
10m (85.83 MB)	175	1.1 GiB
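To make the two CLIP-ViT-base-patch32 tables easier to compare, here is a small sketch that computes the ratio of old to new CPU time for the four video lengths both tables share. The numbers are copied from the tables; nothing else is assumed.

```python
# CPU times in seconds, copied from the two CLIP-ViT-base-patch32 tables.
OLD = {"30s": 9.87, "1m": 17.53, "5m": 78.0, "10m": 175.0}
NEW = {"30s": 1.15, "1m": 2.16, "5m": 10.89, "10m": 28.67}


def speedups(old, new):
    """Return old/new CPU-time ratios for the lengths present in both tables."""
    return {length: old[length] / new[length] for length in old if length in new}


for length, ratio in speedups(OLD, NEW).items():
    print(f"{length:>3}: {ratio:.1f}x faster")
```

The ratio falls from about 8.6x at 30s to about 6.1x at 10m. RAM goes the other way: the older pipeline stays flat at 1.1 GiB, while the newer one grows with video length.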

Comparisons:

Compute time is much lower, possibly because of more selective frame sampling (around 0.2 times the number of frames processed in the older pipeline). There are definitely inconsistencies, as mentioned above, and RAM usage is demanding.
The operator also failed to process videos longer than 30 minutes (the process was killed). On the other hand, continuing the discussion above, I took the 30m video from the online service (the one with much lower RAM usage), looped it using ffmpeg, and it processed 1h and even 2h videos successfully.
I think it's safe to conclude that video length is not the right threshold for deciding when to stop processing a video. Instead, I think it's safe to rely on the SIGKILL signal (sent on memory exhaustion) as the show-stopper.
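One caveat on relying on SIGKILL: a process cannot catch or handle SIGKILL itself, so the kill has to be observed from a parent process. Below is a minimal POSIX-only sketch of that idea, not the operator's actual code. The `run_guarded` helper is hypothetical, and the child here is a stand-in that kills itself rather than a real out-of-memory run.

```python
import signal
import subprocess
import sys


def run_guarded(argv):
    """Run `argv` in a child process and report whether it was SIGKILLed.

    On POSIX, subprocess reports death-by-signal as a negative return code,
    so a kernel OOM kill shows up as -signal.SIGKILL (that is, -9).
    """
    proc = subprocess.run(argv)
    if proc.returncode == -signal.SIGKILL:
        return "killed"
    return "ok" if proc.returncode == 0 else "failed"


# Stand-in for the operator: a child that sends SIGKILL to itself.
# A real worker would run the embedding code here instead.
fake_oom = [
    sys.executable, "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]
print(run_guarded(fake_oom))
print(run_guarded([sys.executable, "-c", "pass"]))
```

The return code alone does not say who sent the signal, so a manual `kill -9` would look the same as a memory-exhaustion kill from this vantage point.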

dennyabrain mentioned this issue Jun 14, 2024

Clustering large amount of videos #354

Open

6 tasks

dennyabrain added the level:feature label Jun 14, 2024

aatmanvaidya changed the title ~~Create an operator for video clustering and benchmark it~~ Create a video operator using CLIP Aug 5, 2024

aatmanvaidya assigned Snehil-Shah Aug 5, 2024

Snehil-Shah mentioned this issue Aug 7, 2024

[81] - add operator to encode videos into vector embeddings using CLIP-ViT-B-32 #369

Merged

Snehil-Shah mentioned this issue Aug 11, 2024

[81] - add operator to classify videos using a zero-shot approach #370

Merged
