Voyager Face embedding storing #32

Closed
Raghucharan16 opened this issue Apr 18, 2024 · 15 comments

@Raghucharan16

What exactly is this piece of code doing?

```python
for i in range(len(embeddings), target_size):
    embedding = np.random.uniform(-5, +5, num_dimensions)
    embeddings.append(embedding)
    img_names.append(f'synthetic_{i}.jpg')
print(f'There are {len(embeddings)} embeddings available')
```

And can I just add my own face embeddings without creating synthetic data? If so, how can I do that?
Thank you.

@serengil
Owner

Where did you get this?

@serengil
Owner

It is for adding synthetic data. I wanted to test some ANN algorithms on very large data. Of course, you do not need that block; working with just real data is better.
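The block from the first comment pads the list of real embeddings with uniformly random vectors (values in [-5, 5)) until it holds `target_size` items, purely to benchmark ANN search at scale. Below is a minimal sketch of that idea as a standalone function, assuming NumPy; `pad_with_synthetic` and its `seed` argument are hypothetical names. Passing `target_size=None` keeps only the real embeddings, which is what you want for actual face matching.

```python
import numpy as np


def pad_with_synthetic(embeddings, img_names, target_size=None,
                       num_dimensions=128, seed=0):
    """Hypothetical helper: return (matrix, names), optionally padded with random vectors.

    With target_size=None (or target_size <= len(embeddings)) only real embeddings are kept.
    """
    rng = np.random.default_rng(seed)
    vectors = [np.asarray(e, dtype=np.float32) for e in embeddings]
    names = list(img_names)
    if target_size is not None:
        for i in range(len(vectors), target_size):
            # Synthetic vector drawn from the same range as the blog snippet
            vectors.append(rng.uniform(-5, 5, num_dimensions).astype(np.float32))
            names.append(f'synthetic_{i}.jpg')
    # Stack into shape (n, num_dimensions); reshape keeps 2-D even when n == 0
    matrix = np.array(vectors, dtype=np.float32).reshape(-1, num_dimensions)
    return matrix, names
```

Either way the result is a 2-D `(n, num_dimensions)` array, which is the shape Voyager's `index.add_items` expects.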

@serengil closed this as not planned on Apr 18, 2024
@Raghucharan16
Author

Raghucharan16 commented Apr 18, 2024

On your blog.
How can I do that, i.e. how can I add my embeddings with the `num_dimensions` parameter?
Also, when I run it in VS Code, it does not show the result picture.
This is my code:

```python
# built-in dependencies
import os
import time

# third-party dependencies
import numpy as np
import cv2
import matplotlib.pyplot as plt
from deepface import DeepFace
from voyager import Index,Space
model_name = 'Facenet'
detector_backend = 'mtcnn'
num_dimensions = 128 # Facenet produces 128-dimensional vectors

img_names = []
embeddings = []

for dirpath, dirnames, filenames in os.walk('dbmod'):
    for filename in filenames:
        if '.jpg' in filename:
            try:

                img_name = f'{dirpath}{filename}'

                embedding_objs = DeepFace.represent(
                    img_name, model_name=model_name, detector_backend=detector_backend
                )
                embedding = embedding_objs[0]['embedding']
                embedding=embedding,num_dimensions
                embeddings.append(embedding)
                img_names.append(img_name)
            except Exception as e:
                pass
# target_size = 10000
# for i in range(len(embeddings), target_size):
#     embedding = np.random.uniform(-5, +5, num_dimensions)
#     embeddings.append(embedding)
#     img_names.append(f'synthetic_{i}.jpg')
print(f'There are {len(embeddings)} embeddings available')
index = Index(Space.Euclidean, num_dimensions=num_dimensions)
embeddings_np = np.array(embeddings)
tic = time.time()

index.add_items(embeddings_np)

toc = time.time()

print(
    f'{embeddings_np.shape[0]} embeddings are stored in voyager in '
    f'{round(toc-tic, 2)} seconds'
)
target_img = 'sample.jpg'
embedding_obj = DeepFace.represent(
    target_img, model_name=model_name, detector_backend=detector_backend
)
target_embedding = embedding_obj[0]['embedding']
tic = time.time()

neighbors, distances = index.query(target_embedding, k=3)

toc = time.time()

print(
    f'Index search completed in {toc-tic} seconds among '
    f'{embeddings_np.shape[0]} vectors'
)
target_img = cv2.imread('Madhursample.jpg')

for i, neighbor in enumerate(neighbors):
    img_name = img_names[neighbor]
    label = img_name.split('/')[-1]
    distance = distances[i]
    print(
        f'{i+1}. nearest neighbor is {label} with distance {round(distance)}'
    )
```

I'm getting this output and error:

```
There are 0 embeddings available
Traceback (most recent call last):
  File "/home/narravenkataraghucharan/Desktop/ufacedetection/face_voyager.py", line 87, in <module>
    index.add_items(embeddings_np)
ValueError: Input array was expected to have rank 2, but had rank 1.
```
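The traceback follows from the first line of output: with 0 embeddings, `np.array([])` is a 1-D array of shape `(0,)`, so `add_items` rejects it. Two lines in the loop look responsible. `f'{dirpath}{filename}'` joins the folder and file name without a separator (`dbmod` + `a.jpg` gives `dbmoda.jpg`), so every `DeepFace.represent` call presumably fails and the `except Exception as e: pass` block hides the error. And `embedding=embedding,num_dimensions` rebinds `embedding` to a tuple instead of passing a dimension anywhere; the dimension only needs to be given once, to `Index(..., num_dimensions=128)`. A minimal NumPy sketch of the two checks, using the hypothetical helpers `image_paths` and `to_matrix`:

```python
import os

import numpy as np


def image_paths(root):
    """Hypothetical helper: yield full paths of .jpg files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith('.jpg'):
                # os.path.join inserts the separator that f'{dirpath}{filename}' omits
                yield os.path.join(dirpath, filename)


def to_matrix(embeddings, num_dimensions=128):
    """Hypothetical helper: stack embeddings into the (n, num_dimensions) array add_items expects."""
    if len(embeddings) == 0:
        raise ValueError('no embeddings were extracted; check the image paths and logged errors')
    matrix = np.asarray(embeddings, dtype=np.float32)
    if matrix.ndim != 2 or matrix.shape[1] != num_dimensions:
        raise ValueError(f'expected shape (n, {num_dimensions}), got {matrix.shape}')
    return matrix
```

Note that `os.walk` on a folder that does not exist yields nothing without raising, which is another way to end up with 0 embeddings. Separately, the script imports `matplotlib.pyplot` and reads an image with `cv2.imread`, but never calls `plt.imshow` or `plt.show`, so no picture is displayed in VS Code; only the `print` output appears.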

@Raghucharan16
Author

For me, storing 110 face embeddings in Voyager takes more than a minute, although the search is fast. Could you check what went wrong?
This is the code:

```python
import os
import time
import logging
import numpy as np
import cv2
from deepface import DeepFace
from voyager import Index, Space

model_name = 'Facenet'
detector_backend = 'mtcnn'
num_dimensions = 128  # Facenet produces 128-dimensional vectors

img_names = []
embeddings = []

for dirpath, dirnames, filenames in os.walk('dbmod'):
    for filename in filenames:
        if '.jpg' in filename:
            try:
                img_name = os.path.join(dirpath, filename)

                # Generate embedding
                embedding_objs = DeepFace.represent(img_name, model_name=model_name, detector_backend=detector_backend)
                embedding = embedding_objs[0]['embedding']
                logging.debug(f"Successfully generated embedding for {img_name}")

                # Append to lists
                embeddings.append(embedding)
                img_names.append(img_name)
            except Exception as e:
                logging.error(f"Error generating embedding for {img_name}: {e}")

# Print number of embeddings
print(f'There are {len(embeddings)} embeddings available')

# Initialize Voyager index
index = Index(Space.Euclidean, num_dimensions=num_dimensions)

# Add embeddings to index
embeddings_np = np.array(embeddings)
index.add_items(embeddings_np)

# Process target image
target_img = 'sample.jpg'
embedding_obj = DeepFace.represent(target_img, model_name=model_name, detector_backend=detector_backend)
target_embedding = embedding_obj[0]['embedding']

# Perform index search
neighbors, distances = index.query(target_embedding, k=1)

# Print results
print(f'Index search completed among {embeddings_np.shape[0]} vectors')

# Display nearest neighbors (file name without its folder)
for i, neighbor in enumerate(neighbors):
    img_name = img_names[neighbor]
    label = os.path.basename(img_name)
    distance = distances[i]
    print(f'{i+1}. Nearest neighbor is {label} with distance {round(distance)}')
```
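This code does not time its stages, so it may be worth measuring the embedding loop and the `index.add_items` call separately before concluding that Voyager is slow: with `detector_backend='mtcnn'`, every `DeepFace.represent` call runs a face detector and then the Facenet model, whereas `add_items` here only inserts 110 vectors of 128 floats. A minimal stdlib sketch; `stage_timer` and the stage names are hypothetical, and the `time.sleep` calls are placeholders standing in for the real work:

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def stage_timer(name):
    """Hypothetical helper: accumulate wall-clock seconds spent in a named stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


# In the real script, wrap the os.walk/DeepFace.represent loop and index.add_items separately.
with stage_timer('embedding extraction'):
    time.sleep(0.05)  # placeholder for the DeepFace.represent loop
with stage_timer('index.add_items'):
    time.sleep(0.01)  # placeholder for index.add_items(embeddings_np)

# Report the slowest stage first
for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f'{name}: {seconds:.2f} s')
```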

@serengil
Owner

Nothing went wrong! Creating the index takes time, but it offers fast search.

@Raghucharan16
Author

So this can't be any faster? Even for a mere 100 images, it takes a minute to store them.

@serengil
Owner

If you have 100 images, you should not use an index method; deepface's `find` function performs better.

Index methods should be adopted when you have 1M+ samples.
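For a few hundred faces, an exact linear scan is cheap and needs no index build: computing the distance from the target to every stored embedding and sorting gives the true nearest neighbours, comparable in spirit to comparing the target against every database image. A minimal NumPy sketch; `exact_knn` is a hypothetical name, and plain Euclidean distance is assumed to match the `Space.Euclidean` index used above:

```python
import numpy as np


def exact_knn(embeddings, target, k=3):
    """Hypothetical helper: exact k nearest neighbours by Euclidean distance (linear scan)."""
    matrix = np.asarray(embeddings, dtype=np.float32)   # shape (n, d)
    query = np.asarray(target, dtype=np.float32)        # shape (d,)
    distances = np.linalg.norm(matrix - query, axis=1)  # one distance per stored face
    order = np.argsort(distances)[:k]                   # indices of the k smallest distances
    return order, distances[order]
```

The cost is one pass over all `n` vectors per query, which is negligible at 100 faces and only becomes a problem at the scale where an ANN index pays off.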

@Raghucharan16
Author

Yes, deepface's `find` function is indeed much faster, but for my data it is not giving accurate results.

@Raghucharan16
Author

Hey @serengil, I have one small task; would you give me a hand if possible?
The task: I have multiple folders containing faces. Say folder1 has faces A, B, C, D and folder2 has faces A, D, E, F. I need to iterate over the two folders (in practice there will be more) and save the unique faces in another folder, say unique_faces_folder. Before adding a face, I check it with deepface's `verify` method. I also tried the `find` method on unique_faces_folder, but I'm getting false positives, and the `verify` method takes too much time. What would be the suggested way to improve and solve this use case? I'm using YOLOv9 for face detection. I also tried Voyager and Annoy for the first nearest neighbour, but those give mixed results.

@serengil
Owner

The best way to do that is to use the `verify` function; it will take some time.
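Assuming most of the time goes into detecting and embedding both images again on every `verify` call, one possible speed-up (a swapped-in technique, not something deepface provides for this use case) is to embed each face once and run the same pairwise check on the stored vectors. A minimal NumPy sketch of greedy de-duplication; `unique_faces` and `threshold` are hypothetical, and the threshold must be tuned for the chosen model and distance metric, playing the role of the cut-off `verify` applies:

```python
import numpy as np


def unique_faces(embeddings, threshold):
    """Hypothetical helper: greedily keep faces farther than threshold from every kept face.

    Returns the indices (into embeddings) of the faces treated as unique, in input order.
    """
    kept_idx = []
    kept_vecs = []
    for i, vec in enumerate(np.asarray(embeddings, dtype=np.float32)):
        if kept_vecs:
            # Euclidean distance from this face to every face kept so far
            dists = np.linalg.norm(np.stack(kept_vecs) - vec, axis=1)
            if dists.min() <= threshold:
                continue  # close enough to a kept face: treat as a duplicate
        kept_idx.append(i)
        kept_vecs.append(vec)
    return kept_idx
```

Because the check is greedy, the kept representative of each person depends on the order in which folders are visited.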

@Raghucharan16
Author

Yeah, `verify` gave me better results, but it takes some time. Why can't we get the same results with the `find` function, since it is very fast compared to iterative checking?

@serengil
Owner

We discussed this yesterday: `verify` and `find` do the same thing; `find` stores its outcomes in a pickle file to restore later.
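The pickle file is what lets `find` restore earlier work instead of redoing it. The same caching idea can be applied to a custom loop so each image is embedded only once across runs. A minimal stdlib sketch; `load_or_compute`, the cache path, and the dict layout are hypothetical and unrelated to deepface's own file format, and `embed` stands in for a call such as `DeepFace.represent(...)[0]['embedding']`:

```python
import os
import pickle


def load_or_compute(paths, embed, cache_path):
    """Hypothetical helper: return {path: embedding}, computing only images not already cached."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            cache = pickle.load(f)
    missing = [p for p in paths if p not in cache]
    for path in missing:
        cache[path] = embed(path)  # the expensive step, e.g. detection + embedding model
    if missing:
        # Persist the updated cache so the next run skips these images
        with open(cache_path, 'wb') as f:
            pickle.dump(cache, f)
    return {p: cache[p] for p in paths}
```

Only load pickle files you created yourself, since unpickling can execute arbitrary code.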

@Raghucharan16
Author

Yeah, we discussed it, but for me the results are not the same. I'm hoping InsightFace's buffalo_l model will give better results. Thanks for your patience; we appreciate your work.

@darkar18

Hey, just my opinion: a vector store like Milvus gives you dynamic indexing and storage, so you need not rebuild the index every time. It also has search and indexing parameters you can configure. Check out Milvus DB.

@Raghucharan16
Author

Raghucharan16 commented Apr 21, 2024

@darkar18 Thanks for the suggestion, I'll look into it.
