Face Recognition Technology Evaluation (FRTE) 1:1 Verification

FRTE 1:1 Verification Art — Credit: Natasha Hanacek/NIST

Status

[2024-11-08] A new FRTE 1:1 report has been published.

[2024-07-03] NIST included all algorithms evaluated since the ongoing FRTE evaluation started in 2017. Previously we listed only results for each developer’s two most recent implementations. This change allows end-users to reference results for algorithms that have been superseded by newer variants, but which may still be used operationally. We added a “by developer” tab which lists only the most accurate algorithm from each developer, where accuracy is defined by the lowest FNMR value for the visa-border dataset.

[2024-03-27] NIST discontinued running the FRTE Visa-Visa benchmark on 1:1 algorithms submitted to FRTE.

[2023-07-03] All algorithms, participation agreements, and GPG keys should be submitted using the FRTE Submission Form.

Status

Except PAD, all FRTE tracks are open. Algorithms submitted to the FRTE 1:1 track will be run on all the datasets documented in the FRTE 1:1 report.

FRTE Participation Statistics

[last updated: 2024-11-08]

	2017	2018	2019	2020	2021	2022	2023	Total
Number of algorithms	57	100	165	137	224	197	192	1209
Number of unique developers	30	50	108	103	155	136	134	383

FRTE 1:1 Performance Summary

[last updated: 2024-11-08]

Accuracy One
Per Developer

Verification Performance

The table shows the top-performing 1:1 algorithms, measured by false non-match rate (FNMR), across several datasets. FNMR is the proportion of mated comparisons scoring below a threshold set to achieve the specified false match rate (FMR). FMR is the proportion of impostor comparisons scoring at or above that threshold. The algorithms are initially ordered by rank on the VISABORDER dataset, calculated using FNMR @ FMR = 10^-6. To sort the rows by a given column, click its header. Hovering over an algorithm name shows the corresponding company name. Searches may use regular expressions, and company names are searchable.
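The FNMR @ FMR definition above can be illustrated with a minimal Python sketch. It is not NIST's evaluation code: the function names, the tie handling, and the toy scores are assumptions made for illustration only. The threshold is taken as the lowest candidate value at which the empirical FMR does not exceed the target.

```python
import numpy as np

def fmr(impostor_scores, threshold):
    """Proportion of impostor comparisons scoring at or above the threshold."""
    return float(np.mean(np.asarray(impostor_scores, dtype=float) >= threshold))

def fnmr(mated_scores, threshold):
    """Proportion of mated comparisons scoring below the threshold."""
    return float(np.mean(np.asarray(mated_scores, dtype=float) < threshold))

def threshold_at_fmr(impostor_scores, target_fmr):
    """Lowest candidate threshold whose empirical FMR is <= target_fmr.

    Candidates are the distinct impostor scores plus one value just above
    the maximum, so a threshold with FMR = 0 always exists.
    """
    s = np.sort(np.asarray(impostor_scores, dtype=float))
    candidates = np.append(np.unique(s), np.nextafter(s[-1], np.inf))
    # Number of impostor scores >= each candidate, divided by the total.
    fmrs = (s.size - np.searchsorted(s, candidates, side="left")) / s.size
    return float(candidates[np.argmax(fmrs <= target_fmr)])

# Toy scores (made up for illustration; real tests use far more comparisons).
impostor = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
mated = [55, 85, 95, 120, 130]

t = threshold_at_fmr(impostor, target_fmr=0.2)
print(t, fmr(impostor, t), fnmr(mated, t))
```

Measuring FNMR at a target such as FMR = 10^-6 requires well over a million impostor comparisons; with fewer, the empirical FMR cannot resolve the target and the threshold falls just above the highest impostor score.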

The values in blue correspond to a change in the FRTE API on 2022-02-14 that allows the algorithm to detect and produce templates from multiple faces in one image, which occurs in approximately 3% of border images and 7% of kiosk images. The handling and accuracy consequences of this are detailed on this slide.

Resources All
Algorithms

Resources Performance

Algorithms submitted to FRTE implement NIST’s application programming interface (API). We measure the duration of all function calls using the C++ std::chrono::high_resolution_clock on an unloaded server-class machine. The table below includes durations of the template generation, finalization, and search calls. In addition, the size of the algorithm is reported in two parts: the recognition models and the libraries.

Additional algorithms not listed in the table can be found in Table 2 of our latest FRTE report. It is a draft made available for comments. It will be updated on a monthly basis as algorithms and computations complete, as datasets are added, and as new analyses are developed.

Demographic
Variations FMR

FRTE 1:1 Demographic Differentials Summary

This page summarizes demographic variations in false match rates for 1:1 algorithms. For full details see two reports:


2022-07-12	NIST Interagency Report 8429: FRTE Part 8: Summarizing Demographic Differentials PDF
2019-12-20	NIST Interagency Report 8280: FRTE Part 3: Demographics Effects PDF

The table, last updated on 2024-11-08, includes summary indicators for how the two fundamental error rates vary by age, sex, and race.


False negatives	False negatives are failures to associate two photos of an individual. They are strongly dependent on image quality, and therefore poor photography of a face can induce a demographic effect. Two examples that will elevate false negative rates: 1. inadequate lighting or under-exposure of dark-skinned individuals, or over-exposure of fair-skinned subjects; 2. failure to adjust a camera for very tall or short individuals, which can lead to pitch-angle variation.
False positives	False positives are the incorrect association of two photos of different individuals. Demographic variations occur even with good image quality; they arise when an algorithm produces similarity score distributions that are displaced for one demographic versus another. This can occur due to under-representation of a demographic in the image dataset used for algorithm training.

In the table below, the rows list algorithms submitted to the 1:1 track of FRTE. The demographic summary measures are detailed in NIST IR 8429. The columns are:
1: Algorithm name.
2: Date algorithm was submitted to NIST.
3: A summary FNMR value (so that readers can look at the more accurate algorithms first).
4: The best FMR value with region, sex, age group.
5: The worst FMR value with region, sex, age group.
6: The maximum FMR over the geometric mean FMR. The ideal result is 1, indicating parity.
7: The Gini coefficient quantifying how false match errors are concentrated in certain demographics - small values are better.

FMR values are measured over comparisons of two high quality frontal portrait images of two people of the same sex, same age group, and same region of birth. The threshold is fixed for each algorithm to give FMR of 0.0003 overall.
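Columns 6 and 7 can be sketched as follows. This is an illustrative sketch only: the exact definitions used in the table are those in NIST IR 8429, and the Gini coefficient here is the standard mean-absolute-difference form, which is an assumption and may differ from the report's variant (for example in normalization). The function names and the sample rates are hypothetical.

```python
import numpy as np

def max_over_geomean(rates):
    """Largest per-group error rate divided by the geometric mean of all rates.

    A value of 1 means every group has the same rate. All rates must be
    positive, since the geometric mean is computed through logarithms.
    """
    r = np.asarray(rates, dtype=float)
    if np.any(r <= 0):
        raise ValueError("rates must be strictly positive")
    geometric_mean = np.exp(np.mean(np.log(r)))
    return float(r.max() / geometric_mean)

def gini(rates):
    """Textbook Gini coefficient: mean absolute difference over twice the mean.

    0 means errors are spread evenly across groups; values near 1 mean they
    are concentrated in a few groups.
    """
    r = np.asarray(rates, dtype=float)
    pairwise_abs_diff = np.abs(r[:, None] - r[None, :]).sum()
    return float(pairwise_abs_diff / (2 * r.size ** 2 * r.mean()))

# Hypothetical per-demographic FMR values, one per (region, sex, age group) cell.
fmr_by_group = [1e-4, 3e-4, 3e-4, 9e-4]
print(max_over_geomean(fmr_by_group), gini(fmr_by_group))
```

Both measures are scale-free: multiplying every group's FMR by the same factor leaves them unchanged, so they describe how errors are distributed rather than how many there are.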

A note on 1:N: In addition, if the 1:1 algorithm were used to implement 1:N search (via N-comparisons and a sort operation), the demographic effects noted here would be germane to the 1:N application. Note that some operational 1:N algorithms do employ 1:1 algorithms, and results for those algorithms must be measured in separate 1:N tests. See Annex B in NIST IR 8429.
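The "N comparisons and a sort" construction mentioned in the note can be sketched in a few lines of Python. Everything here is hypothetical: FRTE templates are opaque to the test harness, so the vector templates, the cosine-similarity `compare` function, and the `search` helper are stand-ins for whatever a real 1:1 algorithm provides.

```python
import numpy as np

def compare(template_a, template_b):
    """Stand-in 1:1 comparator: cosine similarity of two template vectors."""
    a = np.asarray(template_a, dtype=float)
    b = np.asarray(template_b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(probe, gallery, top_k=3):
    """1:N search built from N 1:1 comparisons followed by a sort.

    Returns up to top_k (score, gallery_index) pairs, highest score first.
    """
    scored = [(compare(probe, enrolled), index)
              for index, enrolled in enumerate(gallery)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy gallery of three enrolled templates and one probe.
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probe = [1.0, 0.1]
for score, index in search(probe, gallery):
    print(index, round(score, 3))
```

Because each candidate score comes directly from the 1:1 comparator, any demographic shift in its impostor score distribution carries over to the ranked list.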

Demographic
Variations FNMR

FRTE 1:1 Demographic Differentials Summary

This page summarizes demographic variations in false non-match rates for 1:1 algorithms. For full details see two reports:


2022-07-12	NIST Interagency Report 8429: FRTE Part 8: Summarizing Demographic Differentials PDF
2019-12-20	NIST Interagency Report 8280: FRTE Part 3: Demographics Effects PDF

The table, last updated on 2024-11-08, includes summary indicators for how the two fundamental error rates vary by age, sex, and race.


In the table below, the rows list algorithms submitted to the 1:1 track of FRTE. The demographic summary measures are detailed in NIST IR 8429. The columns are:
1: Algorithm name
2: Date algorithm was submitted to NIST
3: A summary FNMR value (so that readers can look at the more accurate algorithms first)
4: The best FNMR value across all regions of birth
5: The worst FNMR value across all regions of birth
6: The maximum FNMR over the geometric mean FNMR. The ideal result is 1, indicating parity.
7: The Gini coefficient quantifying how false non-match errors are concentrated in certain demographics - small values are better.

FNMR is computed over comparisons of medium quality airport immigration entry photos with high quality reference portraits. False negatives are determined by comparing similarity scores with a threshold set for each algorithm to give FMR of 0.00001 overall.
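The best and worst per-region FNMR values in columns 4 and 5 can be sketched as below. The sketch assumes the threshold has already been calibrated on impostor comparisons; the region labels, scores, and function name are hypothetical.

```python
from collections import defaultdict

def fnmr_by_group(mated_comparisons, threshold):
    """Per-group FNMR at one fixed threshold.

    mated_comparisons: iterable of (group_label, similarity_score) pairs,
    one per mated comparison. A false non-match is a score below threshold.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for group, score in mated_comparisons:
        totals[group] += 1
        if score < threshold:
            misses[group] += 1
    return {group: misses[group] / totals[group] for group in totals}

# Hypothetical mated comparisons labeled by region of birth.
comparisons = [
    ("region_a", 0.9), ("region_a", 0.8), ("region_a", 0.3), ("region_a", 0.7),
    ("region_b", 0.4), ("region_b", 0.6),
]
rates = fnmr_by_group(comparisons, threshold=0.5)
best = min(rates.items(), key=lambda item: item[1])
worst = max(rates.items(), key=lambda item: item[1])
print(best, worst)
```

The same threshold is applied to every group, so differences in the resulting rates reflect differences in the mated score distributions rather than in the operating point.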

Twins
FMR

FRTE 1:1 Twins FMR

The accuracy table shows each algorithm's false match rate (FMR) on twins probe images. Here, FMR is the proportion of twin comparisons scoring at or above a threshold set to achieve FMR = 0.0001 on Mugshot images. The Mugshot false non-match rate (FNMR), the proportion of mated Mugshot comparisons scoring below that same threshold, is included for reference. Mugshot is used as a control for the Twins Days data.
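The control-set calibration described above can be sketched as follows: the threshold is fixed on Mugshot impostor scores, then applied unchanged to the twins comparisons and to the Mugshot mated comparisons. This is a minimal sketch with made-up scores and hypothetical function names, not the procedure's actual implementation.

```python
import numpy as np

def calibrate_threshold(control_impostor_scores, target_fmr):
    """Lowest candidate threshold giving empirical FMR <= target_fmr on the control set."""
    s = np.sort(np.asarray(control_impostor_scores, dtype=float))
    candidates = np.append(np.unique(s), np.nextafter(s[-1], np.inf))
    fmrs = (s.size - np.searchsorted(s, candidates, side="left")) / s.size
    return float(candidates[np.argmax(fmrs <= target_fmr)])

def rate_at_or_above(scores, threshold):
    """Proportion of scores at or above the threshold (a false match rate for impostor pairs)."""
    return float(np.mean(np.asarray(scores, dtype=float) >= threshold))

def rate_below(scores, threshold):
    """Proportion of scores below the threshold (a false non-match rate for mated pairs)."""
    return float(np.mean(np.asarray(scores, dtype=float) < threshold))

# Toy data; a real target of 0.0001 needs at least tens of thousands of impostor scores.
mugshot_impostor = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mugshot_mated = [9, 15, 20, 30]
twin_pairs = [8, 11, 12, 9]

t = calibrate_threshold(mugshot_impostor, target_fmr=0.1)
print("threshold:", t)
print("twins FMR:", rate_at_or_above(twin_pairs, t))
print("mugshot FNMR:", rate_below(mugshot_mated, t))
```

In this toy data the twins FMR comes out well above the control target, which is the kind of gap the table is designed to expose.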

Prior Editions of Report

All prior Ongoing FRTE 1:1 reports can be accessed from here.

Overview

NIST started a new evaluation of face recognition technologies in February 2017. Unlike previous evaluations, this activity is ongoing: the evaluation remains open indefinitely, so developers may submit their algorithms to NIST whenever they are ready, but no more often than once every four calendar months. Algorithms are evaluated rapidly on a first-come, first-served basis, following the approach of our MINEX III evaluation of fingerprint recognition implementations. Performance results are posted to the NIST website as soon as they are ready. This approach aligns evaluation more closely with development schedules and improves on the two-to-four-year interval between past FRTE tests.

Goals

FRTE aims to measure the performance of automated face recognition technologies applied to a wide range of civil, law enforcement, and homeland security applications, including verification of visa images, de-duplication of passports, recognition across photojournalism images, and identification of child exploitation victims. In all cases the input image will contain one face only. Our performance reports will include measurements of accuracy, speed, storage and memory consumption, and resilience. NIST will report the dependence of performance on the properties of the images and the subjects. In its initial form, FRTE has one assessment track, for face verification.

How to Participate

To participate in this evaluation, developers need to submit a participation agreement to NIST, wrap their software behind the published C++ API, run their libraries through the provided validation package (which creates a submission package), encrypt the package, and provide a download link for the encrypted submission package. More details are provided below.

Participation Agreement

FRTE is conducted by NIST, an agency of the United States Government. Participation is free of charge. FRTE is open to a global audience of face recognition developers. All organizations who seek to participate in FRTE must sign all pages of this Participation Agreement and submit it with their algorithm submission using the FRTE Submission Form. [last update: 2023-07-03]

API Document

A new API document has been published. All FRTE APIs reference the supporting FRTE General Evaluation Specifications, which includes hardware and operating system environment, software requirements, reporting, and common data structures that support the APIs. [last update: 2023-04-06]

Validation

An updated validation package has been published. All participants must run their software through the updated validation package prior to submission. The purpose of validation is to ensure consistent algorithm output between your execution and NIST’s execution. Note: The provider identifier in the core implementation library must be an appropriate, representative, non-infringing name of the main provider of the software. If the provider identifier is not representative of the submitting organization, NIST reserves the right to reject the submission for testing. [last update: 2023-04-28]

Encryption

All submissions must be properly encrypted and signed before transmission to NIST. This must be done according to these instructions using the FRTE Ongoing public key linked from this page. Participants must email their public key to NIST. The participant’s public key must correspond to the participant’s public-key fingerprint provided on the signed Participation Agreement. [last update: 2017-09-11]

Submission

All algorithm submissions must be submitted through the FRTE Submission Form, which requires encrypted files be provided as a download link from a generic http server (e.g., Google Drive). We cannot accept Dropbox links. NIST will not register, or establish any kind of membership, on the provided website. Participants can submit their algorithm(s), participation agreement, and GPG key at the same time via the submission form. [last update: 2023-07-03]

Participants must subscribe to the FRTE mailing list to receive emails when new reports are published or announcements are made.

Contact Information

Inquiries and comments may be submitted to [email protected].

Subscribe to the FRTE mailing list to receive emails when announcements or updates are made.
