This repository contains the framework for training deep embeddings for face recognition. The trainer is intended for the face recognition exercise of the EE488B Deep Learning for Visual Understanding course. This is an adaptation of the speaker recognition model trainer.
```
pip install -r requirements.txt
```
- Softmax:

```
python ./trainEmbedNet.py --model ResNet18 --trainfunc softmax --save_path exps/exp1 --nClasses 2000 --batch_size 200 --gpu 8
```
The GPU ID must be specified using the `--gpu` flag.

Use the `--mixedprec` flag to enable mixed precision training. This is recommended for Tesla V100, GeForce RTX 20-series or later GPUs.
- Softmax (`softmax`)
- Triplet (`triplet`)
For softmax-based losses, `nPerClass` should be 1, and `nClasses` must be specified. For metric-based losses, `nPerClass` should be 2 or more.
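To illustrate why `nPerClass` must be at least 2 for metric-based losses, here is a minimal sketch of a batch sampler that draws `nPerClass` images per identity, so that every identity in the batch has positive pairs. The function name and the toy data are hypothetical, not part of this repository:

```python
import random
from collections import defaultdict

def sample_metric_batch(files_by_class, n_classes_per_batch, n_per_class, rng=random):
    """Pick identities with at least n_per_class images, then sample
    n_per_class images from each, so the metric loss sees positive pairs."""
    eligible = [c for c, files in files_by_class.items() if len(files) >= n_per_class]
    classes = rng.sample(eligible, n_classes_per_batch)
    batch = []
    for c in classes:
        batch.extend((c, f) for f in rng.sample(files_by_class[c], n_per_class))
    return batch

# Hypothetical toy dataset: 10 identities with 4 images each.
data = {"id%04d" % i: ["%05d.jpg" % j for j in range(1, 5)] for i in range(10)}
batch = sample_metric_batch(data, n_classes_per_batch=4, n_per_class=2)
# batch holds 4 identities x 2 images = 8 (identity, filename) tuples
```

With `n_per_class=1`, no identity in the batch would have a positive pair, which is why softmax-based losses use 1 and metric-based losses need 2 or more.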
- `ResNet18`
You can add new models and loss functions to the `models` and `loss` directories respectively. See the existing definitions for examples.
```
wget https://mm.kaist.ac.kr/teaching/ee488b/resources/val_pairs.csv
wget https://mm.kaist.ac.kr/teaching/ee488b/resources/ee488b_data_v1.zip
unzip ee488b_data_v1.zip
```
This step is optional.

```
wget https://mm.kaist.ac.kr/teaching/ee488b/resources/vggface2_train.zip
unzip vggface2_train.zip
```
The test list should contain labels and image pairs, one line per pair, as follows, where `1` denotes a target pair and `0` an imposter pair:

```
1,id10001/00001.jpg,id10001/00002.jpg
0,id10001/00003.jpg,id10002/00001.jpg
```
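The pair list above is plain comma-separated text, so it can be read with the standard `csv` module. This parser is a hedged sketch (the function name is not part of the repository):

```python
import csv
import io

def read_pairs(fileobj):
    """Parse lines of the form 'label,image_a,image_b' where
    label 1 means target and 0 means imposter."""
    pairs = []
    for label, img1, img2 in csv.reader(fileobj):
        pairs.append((int(label), img1, img2))
    return pairs

example = (
    "1,id10001/00001.jpg,id10001/00002.jpg\n"
    "0,id10001/00003.jpg,id10002/00001.jpg\n"
)
pairs = read_pairs(io.StringIO(example))
# pairs[0] → (1, 'id10001/00001.jpg', 'id10001/00002.jpg')
```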
Each folder in the training set should contain the images for one identity (i.e. `identity/image.jpg`).
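A small sketch of how such a directory tree maps to an identity-to-images index, assuming the `identity/image.jpg` layout described above (the helper name is hypothetical, and the demo builds a throwaway tree in a temp directory):

```python
import os
import tempfile

def index_train_dir(root):
    """Map each identity folder under root to its sorted list of images."""
    index = {}
    for identity in sorted(os.listdir(root)):
        d = os.path.join(root, identity)
        if os.path.isdir(d):
            index[identity] = sorted(
                f for f in os.listdir(d) if f.lower().endswith((".jpg", ".png"))
            )
    return index

# Build a tiny throwaway tree to demonstrate the layout.
root = tempfile.mkdtemp()
for ident, imgs in [("id10001", ["00001.jpg", "00002.jpg"]), ("id10002", ["00001.jpg"])]:
    os.makedirs(os.path.join(root, ident))
    for img in imgs:
        open(os.path.join(root, ident, img), "w").close()

index = index_train_dir(root)
# index → {'id10001': ['00001.jpg', '00002.jpg'], 'id10002': ['00001.jpg']}
```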
The input transformations can be changed in the code.
To save pairwise similarity scores to a file, use the `--output` flag.
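For intuition on what these pairwise scores are, here is a minimal sketch that scores each test pair by the cosine similarity of its two embeddings. The 2-D embeddings and the `score,image_a,image_b` output format are illustrative assumptions, not the exact format written by `--output`:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical unit-norm embeddings for three images.
embeddings = {
    "id10001/00001.jpg": [0.6, 0.8],
    "id10001/00002.jpg": [0.8, 0.6],
    "id10002/00001.jpg": [-0.8, 0.6],
}

# Score a target pair and an imposter pair, one output line per pair.
lines = []
for img1, img2 in [("id10001/00001.jpg", "id10001/00002.jpg"),
                   ("id10001/00001.jpg", "id10002/00001.jpg")]:
    score = cosine_similarity(embeddings[img1], embeddings[img2])
    lines.append("%.4f,%s,%s" % (score, img1, img2))
# lines[0] starts with '0.9600' (target pair), lines[1] with '0.0000' (imposter)
```

Higher scores indicate the two faces are more likely the same identity, which is what the verification metric thresholds against.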