US8891878B2 - Method for representing images using quantized embeddings of scale-invariant image features - Google Patents

Method for representing images using quantized embeddings of scale-invariant image features Download PDF

Info

Publication number
US8891878B2
US8891878B2
Authority
US
United States
Prior art keywords
image
matrix
features
quantization
random
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/525,222
Other versions
US20130336588A1 (en)
Inventor
Shantanu Rane
Petros T Boufounos
Mu Li
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Research Laboratories Inc
Original Assignee
Mitsubishi Electric Research Laboratories Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Research Laboratories Inc filed Critical Mitsubishi Electric Research Laboratories Inc
Priority to US13/525,222 priority Critical patent/US8891878B2/en
Assigned to MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC. reassignment MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LI, MU, BOUFOUNOS, PETROS, RANE, SHANTANU
Priority to US13/733,517 priority patent/US8768075B2/en
Priority to JP2013100965A priority patent/JP5950864B2/en
Publication of US20130336588A1 publication Critical patent/US20130336588A1/en
Application granted granted Critical
Publication of US8891878B2 publication Critical patent/US8891878B2/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2133Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on naturality criteria, e.g. with non-negative factorisation or negative correlation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

Scale-invariant features are extracted from an image. The features are projected to a lower-dimensional matrix of random projections by multiplying the features by a matrix of random entries. The matrix of random projections is quantized to produce a matrix of quantization indices, which form a query vector for searching a database of images to retrieve metadata related to the image.

Description

FIELD OF THE INVENTION
This invention relates generally to extracting features from images, and more particularly to using the features to query databases.
BACKGROUND OF THE INVENTION
Augmented reality is a significant application that leverages recent advances in computing devices, and more particularly mobile devices. The mobile devices can include clients such as mobile telephones (cell phones), personal digital assistants (PDAs), tablet computers, and the like. Such devices have limited memory, processing, communication, and power resources. Hence, augmented reality applications present a special challenge in mobile environments.
The devices can acquire images or videos using either a camera or a network. The images can be of real-world scenes, or of synthetic data, such as computer graphics images or animation videos. Then, the devices can augment the experience for a user by overlaying useful information on the images or videos. The useful information can be in the form of metadata.
For example, the metadata can be information about a historical landmark, nutrition information about a food item, or a product identified with a (linear or matrix) bar code in an image.
To enable such applications, it is necessary to exploit recent advances in image recognition, while recognizing the limitations on the device resources. Thus, in a typical augmented reality application, the mobile device must efficiently transmit the salient features of a query image to a database at a server that stores a large number of images or videos. The database server should quickly determine whether the query image matches an entry in the database, and return suitable metadata to the mobile device.
Many image-based augmented reality applications use scale-invariant feature transform (SIFT), speeded up robust feature (SURF), and GIST, see e.g., U.S. 20110194737.
SIFT and SURF acquire local details in an image, and therefore have been used to match local features or patches. They can also be used for image matching and retrieval by combining hypotheses from several patches using, for example, the popular “Bag-of-Features” approach. GIST acquires global properties of the image and has been used for image matching. A GIST vector is an abstract representation of a scene that can activate memory representations of scene categories, e.g., buildings, landscapes, landmarks, etc.
SIFT has the best performance in the presence of common image deformations, such as translation, rotation, and a limited amount of scaling. Nominally, the SIFT feature vector for a single salient point in an image is a real-valued, unit-norm 128-dimensional vector. This demands a prohibitively large bit rate when the client transmits the SIFT features to a database server for the purpose of image matching, especially if features from several salient points are needed for reliable matching.
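As a rough illustrative calculation (not a figure from the specification): a single 128-dimensional SIFT vector stored at 32-bit floating-point precision occupies 128 × 32 = 4,096 bits, i.e., about 0.5 kB, so a query containing a few hundred salient points already amounts to roughly 100-250 kB before any protocol overhead.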
A number of training-based methods are known for compressing image descriptors. Boosting Similarity Sensitive Coding (BoostSSC) and Restricted Boltzmann Machines (RBM) are known for learning compact GIST codes for content-based image retrieval. Semantic hashing has been transformed into a spectral hashing problem, in which it is only necessary to calculate eigenfunctions of the GIST features, providing better retrieval performance than BoostSSC and RBM.
Besides these relatively recently developed machine learning methods, some conventional training-based techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) have also been used to generate compact image descriptors. In particular, PCA has been used to produce small image descriptors by applying techniques such as product quantization and distributed source coding. Alternatively, small image descriptors can be obtained by applying LDA to SIFT-like descriptors followed by binary quantization.
While training-based methods perform accurately in conventional image retrieval, they can become cumbersome in augmented reality applications, where the database continuously evolves as new landmarks, products, etc. are added, resulting in new image statistics and necessitating repeated training.
As a source coding-based alternative to training-based dimensionality reduction, a low-bit rate descriptor uses Compressed Histogram of Gradients (CHoG) specifically for augmented reality applications. In that method, gradient distributions are explicitly compressed, resulting in low-rate scale invariant descriptors.
Other techniques are known for efficient remote image matching based on Locality Sensitive Hashing (LSH), which is computationally simpler, but less bandwidth-efficient than CHoG, and does not need training. Random projections are determined from scale invariant features followed by one-bit quantization. The resulting descriptors are used to establish visual correspondences between images acquired in a wireless camera network. The same technique can be applied to content-based image retrieval, and a bound is obtained for the minimum number of bits needed for a specified accuracy of nearest neighbor search. However, those methods do not consider a tradeoff between dimensionality reduction and quantization levels.
SUMMARY OF THE INVENTION
The embodiments of the invention provide a method for representing an image by extracting features from the image. In one embodiment, the image is a query image. The representation can be performed in a client, and the extracted features can be transmitted to a server to search a database of similarly represented images for matching. Metadata of matching images can be returned to the client.
The method extracts scale invariant features from the query image. Then, a small number of random projections of the features are determined and quantized. The quantized projections are used to search a database at the server storing images in a similar form.
The server performs a nearest neighbor search in a low-dimensional subspace of the quantized random projections, and returns metadata corresponding to the query image.
Prior art work has shown that binary embeddings of image features enable efficient image retrieval.
The embodiments of the invention allow a trade-off that balances the number of random projections and the number of bits, i.e., the quantization levels, used to store each projection.
Theoretical results suggest a bit allocation scheme under a total bit rate constraint. It is often advisable to use bits on a small number of finely quantized random projections, rather than on a large number of coarsely quantized random projections.
The method achieves a retrieval accuracy of up to 94%, while requiring a mobile client device to transmit only 2.5 kB to the server for each image. This is a significant improvement over 1-bit quantization schemes known in the art.
Hence, the method is particularly suited for mobile client applications that are resource constrained.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a flow diagram of a method for representing an image by a client according to embodiments of the invention;
FIG. 1B is pseudocode of the method for representing the image according to embodiments of the invention;
FIG. 2 is pseudocode of the method for representing images at a server according to embodiments of the invention;
FIG. 3 is pseudocode of the method for searching a database according to embodiments of the invention;
FIG. 4 is a schematic of operations of the embodiments.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The embodiments of the invention provide a method for extracting features from a query image in a client. The features can be transmitted to a server, and used to search a database to retrieve similar images, and image specific metadata that are appropriate in an augmented reality application.
FIG. 1A shows a method for representing an image 101. Features 102 are extracted 110 from the image. It is understood that the method can be applied to a sequence of images, as in a video. The images can be of real world scenes, or synthetic data. The images can be acquired directly by an embedded camera, or downloaded via a network.
The features are scale-invariant. The features are multiplied 120 by a matrix 103 of random entries to produce a matrix of random projections 121. The matrix of random projections is quantized 130 to produce a matrix of quantization indices 104 that represent the image. The matrix of indices can be rearranged into a query vector.
The query vector is transmitted to the server, which searches 140 the database 151 for similar images, and retrieves the metadata 152 for the client.
Quantized Randomized Embeddings
The invention is based, in part, on a low-dimensional embedding of scale-invariant features extracted from images. The use of embeddings is justified by the following result, which forms a starting point of our theoretical development.
Theorem 1 (Johnson-Lindenstrauss Lemma): For a real number ε ∈ (0,1) and a number of points n, let k be a positive integer such that
k ≥ (4 / (ε²/2 − ε³/3)) ln n.
Then, for any set X ⊂ R^d that contains n points, there is a mapping f: R^d → R^k, computable in randomized polynomial time, such that for all u,v ∈ X, e.g., pixels at locations (u,v) in image X 101,
(1 − ε)∥u − v∥² ≤ ∥f(u) − f(v)∥² ≤ (1 + ε)∥u − v∥².
This lemma means that a small set of points, e.g., features, in a high-dimensional space can be embedded into a space which has a substantially lower dimension, while still preserving the distances between the points.
A realization from the above result is that, for a given ε, the dimensionality k of the points in the range of f is independent of the dimensionality of the points in X and proportional to the logarithm of the number of points in X. Since k increases as ln n, the Johnson-Lindenstrauss Lemma establishes a dimensionality reduction result, in which any set of n points (features) in d-dimensional Euclidean space can be embedded into k-dimensional Euclidean space. This is extremely beneficial for querying very large databases, i.e., a large n, with several attributes, i.e., a large d.
One way to construct the embedding function f is to project the points (features) from X onto a spherically random hyperplane passing through the origin. In practice, this is accomplished by multiplying the data vector by the matrix 103 of independent and identically distributed (i.i.d.) random variables. In particular, a random matrix with i.i.d. N(0,1) entries provides the distance-preserving properties in Theorem 1 with high probability. The following result makes this notion precise.
Theorem 2: For real numbers ε, β > 0, let there be a positive integer k such that
k ≥ ((4 + 2β) / (ε²/2 − ε³/3)) ln n.   (1)
Consider a matrix A ∈ R^(k×d), whose entries a(i,j) are drawn i.i.d. from a N(0,1) distribution. Let there be a set X ⊂ R^d that contains n points. Then, for all u ∈ X, the mapping
f(u) = (1/√k) Au
satisfies the distance preserving property in Theorem 1 with probability at least as large as 1 − n^(−β).
By construction, f(u) is a k-dimensional embedding of a d-dimensional vector. Theorem 2 holds for other distributions on a(i,j) besides the normal distribution. In what follows, however, we consider only the standard normal (Gaussian) distribution.
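The following minimal sketch (not part of the patent) illustrates the unquantized embedding f(u) = (1/√k) Au of Theorem 2 in Python with NumPy; the dimensions d, k and the number of points n are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 128, 32, 1000          # illustrative: feature dim, embedding dim, number of points
X = rng.standard_normal((n, d))  # stand-in for n scale-invariant feature vectors
A = rng.standard_normal((k, d))  # i.i.d. N(0,1) projection matrix

def f(u, A):
    """Unquantized embedding f(u) = (1/sqrt(k)) * A u as in Theorem 2."""
    return (A @ u) / np.sqrt(A.shape[0])

# Empirically compare one pairwise distance before and after embedding.
i, j = 3, 7
orig = np.linalg.norm(X[i] - X[j])
emb = np.linalg.norm(f(X[i], A) - f(X[j], A))
print(f"original distance {orig:.3f}, embedded distance {emb:.3f}")
```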
We are interested in a distance-preserving property for quantized embeddings, i.e., the case when a uniform scalar quantizer is applied independently to each element of f(u) and f(v). Theorem 2 indicates that, in the unquantized case, the embedding f is ε-accurate with probability 1 − n^(−β). We describe the embedding accuracy when quantization is used to reduce the bit rate required to store or transmit the embeddings. Furthermore, we are interested in the trade-off that balances the number of quantization levels and the number of projections k that can be transmitted with a specified bandwidth.
The following proposition is the first step in understanding this trade-off.
Proposition 1: For real numbers β > 0 and ε ∈ (0,1), let there be a positive integer k that satisfies (1). Consider a matrix A ∈ R^(k×d), whose entries a(i,j) are drawn i.i.d. from a N(0,1) distribution. Let there be a set X ⊂ R^d that contains n points. For any vector w, let q(w) be a uniform scalar quantizer with step size Δ applied independently to each element of w. Then, for all u,v ∈ X, the mapping
g(u) = (1/√k) q(Au)
satisfies
(1 − ε)∥u − v∥ − Δ ≤ ∥g(u) − g(v)∥ ≤ (1 + ε)∥u − v∥ + Δ
with probability at least as large as 1 − n^(−β).
Proof. First note that quantization, q(Au), introduces an error of at most Δ/2 per dimension, i.e., ∥Au − q(Au)∥ ≤ (√k) Δ/2 for any u. Using this, along with
f(u) = (1/√k) Au
as in Theorem 2, we get ∥f(u) − g(u)∥ ≤ Δ/2. Now, take square roots in the statement of Theorem 1, noting that, for ε ∈ (0,1), 1 + ε ≥ √(1 + ε) and 1 − ε ≤ √(1 − ε), to get
(1 − ε)∥u − v∥ ≤ ∥f(u) − f(v)∥ ≤ (1 + ε)∥u − v∥.
Then, the right half of the proposition statement follows from the triangle inequality as
∥g(u) − g(v)∥ ≤ ∥g(u) − f(u)∥ + ∥f(u) − f(v)∥ + ∥f(v) − g(v)∥ ≤ Δ/2 + (1 + ε)∥u − v∥ + Δ/2.
The proof for the left half is similar.
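A companion sketch, under the same illustrative assumptions as above (and not the patent's pseudocode), of the quantized embedding g(u) = (1/√k) q(Au) of Proposition 1, using a finite uniform scalar quantizer with step size Δ and saturation level ±S, both chosen arbitrarily here:

```python
import numpy as np

def uniform_quantize(w, step, S):
    """Finite uniform scalar quantizer: clip to [-S, S], then round to the nearest multiple of step."""
    clipped = np.clip(w, -S, S)
    return step * np.round(clipped / step)

def g(u, A, step, S):
    """Quantized embedding g(u) = (1/sqrt(k)) * q(Au) as in Proposition 1."""
    k = A.shape[0]
    return uniform_quantize(A @ u, step, S) / np.sqrt(k)

# As step -> 0 the additive ambiguity term vanishes and g(u) approaches the unquantized f(u).
```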
The accuracy of the quantized embedding depends on the scalar quantization interval Δ. This, in turn, depends on the design of the scalar quantizer and the bit rate B used to encode each coefficient.
We consider a finite uniform scalar quantizer with saturation levels ±S, which we assume are set such that saturation is sufficiently rare and can be ignored. Thus, B bits are used to uniformly partition the range 2S of the quantizer, making the quantization interval Δ = 2^(−B+1) S.
Using R to denote the bit rate available to transmit the k projections, i.e., setting B = R/k bits per measurement, the quantization interval is Δ = 2^(−R/k+1) S. Thus, the trade-off implicit in Proposition 1, which balances the number of projections against the number of bits per measurement, becomes more explicit:
(1 − ε)∥u − v∥ − 2^(−R/k+1) S ≤ ∥g(u) − g(v)∥ ≤ (1 + ε)∥u − v∥ + 2^(−R/k+1) S.   (2)
Specifically, increasing the number of projections for a fixed rate R decreases the available rate per measurement and, therefore, increases the quantization interval Δ. This, in turn, increases the quantization error ambiguity, given by the additive factor ±2^(−R/k+1) S.
Furthermore, increasing the number of projections reduces ε and, therefore, reduces the ambiguity due to Theorem 2, given by the multiplicative factor (1 ± ε). For fixed β and n, ε scales approximately proportionally to 1/√k when small.
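The trade-off in (2) can be made concrete with a small numerical illustration (not taken from the patent; the constants below are arbitrary): for a fixed total rate R, a smaller k leaves more bits per projection and shrinks the additive term 2^(−R/k+1) S, while the multiplicative ambiguity ε grows roughly like 1/√k.

```python
import numpy as np

R, S, c = 512, 3.0, 4.0   # total bits per feature, saturation level, constant in eps ~ c/sqrt(k)

for k in (16, 32, 64, 128, 256):
    B = R / k                          # bits per projection
    quant_term = 2.0 ** (-B + 1) * S   # additive ambiguity from (2)
    eps = c / np.sqrt(k)               # rough multiplicative ambiguity scaling from Theorem 2
    print(f"k={k:4d}  bits/proj={B:5.1f}  additive={quant_term:.4f}  multiplicative eps={eps:.3f}")
```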
There are two additional issues: non-uniform quantization, and saturation. A non-uniform scalar quantizer, tuned to the distribution of the projections, could improve the performance of embedding. However, the quantization still suffers from the same trade-off between number of bits per measurement and the number of projections. Similarly, adjusting the saturation rate of the uniform quantizer is a way to tune the quantizer to the distribution of the projections. Reducing the range of the quantizer S, reduces the quantization interval Δ and the ambiguity due to quantization.
However, the probability of saturation increases, and with it the unbounded error due to saturation, making the above model invalid and the theoretical bounds inapplicable. In the context of compressive sensing reconstruction from quantized random projections, careful tuning of the saturation rate can improve performance.
The theoretical development above fails for quantization at 1-bit per projection, which is performed just by keeping the sign of the projection. If two signals in the set of interest are a multiple of each other by a positive scalar, they are indistinguishable in the embedding. While for bounded norm signals the guarantees still hold, the bounds are often too loose to be useful. Tighter bounds can instead be developed when we are interested in the angles between two signals, i.e., their correlation, instead of their distance.
Embeddings of Scale-Invariant Image Features
We now describe our method for retrieving image-specific metadata from a server database using quantized embeddings of the scale-invariant features of the query image 101.
In an example application, a user of a mobile client device, e.g., a cell phone, wants to find out more information about a query image, such as the history of a monument, or nutrition information for a food item. The user can acquire the query image with a camera in the device, or download the image via a network.
The device transmits a representation of the query image to the database server. The server locates a similar image in the database, i.e., an image that closely matches the query image according to some predetermined distance criterion, and transmits the metadata associated with that image to the client.
For a practical application, the following requirements should be satisfied. The mobile device, having limited resources, uses a low-complexity method to generate a representation of the image to be transmitted. The bandwidth required to transmit the representation is low. The server has sufficient resources to quickly process the query, and transmit the metadata to the client.
We are primarily concerned with the first two requirements, which are the most challenging. Nevertheless, server-based matching and enhanced metadata compression supplement the advantages of our method.
As shown in the pseudocode of FIG. 2, we describe the steps taken by the server to construct the database 201. It is understood that the representations of the images stored in the database are similar to the representation of the query.
The method initializes the random projection matrix A ∈ R^(k×d) with elements a(i,j) ~ N(0,1). Images J1, J2, . . . , Jt of s real or synthetic scenes are acquired, where s ≤ t, and the metadata Di, i ∈ {1, 2, . . . , s}, are generated for each object.
The scale-invariant feature extraction method is applied to each image Ji, i ∈ {1, 2, . . . , t}, which extracts several d-dimensional features from each image. The number of features extracted from each image need not be equal.
Then, using all the feature vectors thus obtained, construct the matrix V = [v1, v2, . . . , vN], which contains the feature vectors from all images in the database. Typically, N >> s.
Determine the random matrix W = [w1, w2, . . . , wN] = AV ∈ R^(k×N), where each wi is a k-dimensional random projection of the corresponding vi.
Store a lookup vector Λ ⊂ {1, 2, . . . , s}^N, where the element λ(i), i ∈ {1, 2, . . . , N}, indexes the image from which the vector wi was extracted.
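The server-side construction described above might be sketched as follows (the patent's actual pseudocode appears in FIG. 2); NumPy is assumed, and extract_features is a placeholder name for any scale-invariant feature extractor that returns d-dimensional descriptors as columns.

```python
import numpy as np

def build_database(images, A, extract_features):
    """Project all d-dimensional features of all database images with A (k x d)
    and remember which image each projected feature came from."""
    projected, lookup = [], []
    for image_index, image in enumerate(images):
        V_i = extract_features(image)        # shape (d, M_i); M_i may differ per image
        projected.append(A @ V_i)            # k-dimensional random projections
        lookup.extend([image_index] * V_i.shape[1])
    W = np.hstack(projected)                 # k x N matrix of projections
    return W, np.asarray(lookup)             # lookup[i] indexes the source image of column i
```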
Next, we describe the query procedure. The procedure is performed by the mobile device using the same random projection matrix A as the server. The distribution of the a(i,j) can be approximated by a pseudorandom number generator. The seed of the pseudorandom number generator is sent to the mobile device as a one-time update, or included as part of the client software installation. Identical seeds ensure that the mobile device and the server generate the same realization of the matrix A.
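A minimal sketch of the shared-seed idea (the seed value and dimensions are illustrative, not values from the patent):

```python
import numpy as np

SEED, k, d = 12345, 32, 128                  # illustrative values only
A_client = np.random.default_rng(SEED).standard_normal((k, d))
A_server = np.random.default_rng(SEED).standard_normal((k, d))
assert np.array_equal(A_client, A_server)    # identical seeds give identical realizations of A
```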
We initialize the random projection matrix A ∈ R^(k×d), where the elements a(i,j) ~ N(0,1). Acquire the query image I 101. Apply 110 the scale-invariant feature extraction method to I to derive the matrix X = [x1, x2, . . . , xM] 102, where xi is a d-dimensional feature vector corresponding to the ith key point descriptor from the image I.
Determine the matrix Y = [y1, y2, . . . , yM] = AX ∈ R^(k×M) 103, where each yi is a k-dimensional random projection of the corresponding xi.
Determine the matrix 104 of quantized random projections Q = q(Y), where the function q(•) is a scalar quantizer that takes each y(i,j), i ∈ {1, 2, . . . , k}, j ∈ {1, 2, . . . , M}, and produces an integer quantization index q(i,j), i.e., for an L-level quantizer, q(i,j) ∈ {0, 1, . . . , L−1}.
Transmit the matrix Q to the server 150, using element-wise fixed-length coding of the quantization indices. Thus, each q(i,j) is represented by ⌈log₂ L⌉ bits.
Based on the above procedure, the computational complexity at the mobile device is primarily determined by the scale-invariant feature extraction method, and one matrix multiplication. The number of bits transmitted by the client to the server is kM⌈log₂ L⌉ bits.
To minimize transmit power, the client can reduce the number of random projections k, the quantization levels L, or the number of features M extracted from the query image.
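A sketch of the client-side query procedure just described, assuming an L-level uniform quantizer over the range [−S, S]; the function names and quantizer details are illustrative rather than the patent's exact construction.

```python
import numpy as np

def encode_query(image, A, L, S, extract_features):
    """Produce the matrix Q of quantization indices for a query image (illustrative sketch)."""
    X = extract_features(image)                    # d x M matrix of scale-invariant features
    Y = A @ X                                      # k x M matrix of random projections
    step = 2.0 * S / L                             # uniform quantizer step over [-S, S]
    Q = np.floor((np.clip(Y, -S, S) + S) / step).astype(int)
    Q = np.minimum(Q, L - 1)                       # map the boundary value +S into the top bin
    bits = Q.size * int(np.ceil(np.log2(L)))       # kM * ceil(log2 L) bits with fixed-length coding
    return Q, bits
```

For example, with k = 32 projections, L = 16 levels (4 bits each), and M = 150 features — illustrative values, not necessarily the patent's operating point — the query occupies 32 × 150 × 4 = 19,200 bits, i.e., about 2.4 kB, the same order as the 2.5 kB figure cited in the summary.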
FIG. 3 shows the pseudocode for the approximate nearest neighbor search 140 performed by the server. Briefly, nearest neighbors are found in the space of the quantized embeddings of image descriptors. The nearest neighbors are aggregated to obtain the matching image, and thence the associated metadata 152.
FIG. 4 shows the operation schematically, where r=10. In FIG. 4, the query image 201 is acquired. The server uses random projection 120 matching with representations of images in the database 151. Image indices 401 are obtained as a function of the number of matching occurrences. The index with the highest number of occurrences is used to locate the associated metadata 152.
The nearest neighbor procedure initializes an s-dimensional histogram vector h to all zeros.
Receive Q=[q1, q2, . . . , qM] 104 representing the query image 101. Receive the number of quantization levels L.
Invert the quantization function q(•) and obtain the reconstructed random projection matrix Ŷ = q⁻¹(Q) = [ŷ1, ŷ2, . . . , ŷM] 103.
Determine Ŵ = q⁻¹(q(W)), which contains the quantized reconstructions of all k-dimensional random projections from all t images of the s objects.
For each i ∈ {1, 2, . . . , M}, locate the nearest neighbor of ŷi among ŵ1, ŵ2, . . . , ŵN. Out of these M nearest-neighbor pairs, select the r pairs (ŷ(j), ŵ(j)), j = 1, 2, . . . , r, that are closest in Euclidean distance.
For each ŵ(j), j = 1, 2, . . . , r, read the index αj ∈ {1, 2, . . . , s} of the object from which the element originates; this is readily available from the lookup table Λ. Increment h(αj) by 1.
Set the most similar object/image to the query image as argmax_α h(α), and transmit the metadata of this object back to the mobile device.
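A sketch of the voting-based approximate nearest neighbor search of FIG. 3, under the same illustrative conventions as the earlier sketches (brute-force Euclidean distances; r as in the text; match_query is a hypothetical helper name):

```python
import numpy as np

def match_query(Q_hat, W_hat, lookup, num_images, r=10):
    """Q_hat: k x M reconstructed query projections; W_hat: k x N reconstructed database projections.
    Vote over the r closest of the M nearest-neighbor pairs and return the winning image index."""
    h = np.zeros(num_images, dtype=int)                 # s-dimensional histogram of votes
    dists, owners = [], []
    for i in range(Q_hat.shape[1]):
        d2 = np.sum((W_hat - Q_hat[:, [i]]) ** 2, axis=0)
        j = int(np.argmin(d2))                          # nearest database projection
        dists.append(d2[j])
        owners.append(lookup[j])                        # image/object that projection came from
    for idx in np.argsort(dists)[:r]:                   # keep the r closest pairs
        h[owners[idx]] += 1
    return int(np.argmax(h))                            # most-voted image/object index
```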
When more images, or superior quality images or richer metadata become available, they can be appended to the database without affecting the querying method.
Effect Of The Invention
The embodiments of the invention enable randomized embeddings of scale invariant image features for image searching while reducing resource consumption at the client compared with directly using the scale invariant features as in the prior art.
When the bandwidth is constrained, it makes sense to allocate bits towards increasing the number of incoherent projections. However, once a certain minimum number of random projections is reached, it is more beneficial to utilize any additional bits toward representing the projections with high fidelity, rather than continuing to increase the number of coarsely quantized random projections.
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

Claims (18)

We claim:
1. A method for representing an image, comprising the steps of:
extracting features from the image, wherein the features are scale-invariant;
projecting the features to produce a matrix of random projections by multiplying the features by a matrix of random entries; and
quantizing the matrix of random projections to produce a matrix of quantization indices representing the image, wherein the steps are performed by a processor in a client.
2. The method of claim 1, wherein the client is a mobile device.
3. The method of claim 2, wherein the transmitting uses an element-wise fixed length coding of the quantization indices.
4. The method of claim 1, wherein the matrix of quantization indices is a query vector; and further comprising:
transmitting the query vector to a server;
receiving metadata related to the image by searching a database of images represented similarly as the image with respect to the extracting, multiplying, and quantizing.
5. The method of claim 4, wherein a rate of the transmitting depends on a quantization interval.
6. The method of claim 4, wherein a seed for generating the matrix of random entries is identical at the client and the server.
7. The method of claim 4, wherein the query vector is kM⌈log₂ L⌉ bits, where a number of the random projections is k, a number of quantization levels is L, and a number of features is M.
8. The method of claim 7, further comprising:
minimizing transmit power by reducing a number of random projections k, a number of quantization levels L, or a number of features M.
9. The method of claim 4, wherein the searching is a nearest neighbor search.
10. The method of claim 9, wherein the nearest neighbor search uses Euclidean distances between the image and the database of images.
11. The method of claim 4, wherein the server determines a number of matching occurrences for the query image, and the metadata is related to the image in the database set of images with a highest number of occurrences.
12. The method of claim 1, wherein the quantization uses a non-uniform scalar quantizer applied individually to each element of the matrix of random projections.
13. The method of claim 1, wherein the quantization is a vector quantizer applied individually to groups of elements in the matrix of random projections.
14. The method of claim 1, further comprising:
transmitting the matrix of quantization indices to a server
using variable-length entropy coding of the quantization indices.
15. The method of claim 1, further comprising:
reducing a dimensionality of the features by the projecting.
16. The method of claim 15, further comprising:
balancing the dimensionality reduction by the projecting and a number of levels of the quantizing.
17. The method of claim 15, wherein the dimensionality reduction preserves distances between pairs of the features.
18. The method of claim 1, wherein the quantization uses a uniform scalar quantizer applied separately to each random projection.
US13/525,222 2011-11-08 2012-06-15 Method for representing images using quantized embeddings of scale-invariant image features Active 2032-12-12 US8891878B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US13/525,222 US8891878B2 (en) 2012-06-15 2012-06-15 Method for representing images using quantized embeddings of scale-invariant image features
US13/733,517 US8768075B2 (en) 2011-11-08 2013-01-03 Method for coding signals with universal quantized embeddings
JP2013100965A JP5950864B2 (en) 2012-06-15 2013-05-13 A method for representing images using quantized embedding of scale-invariant image features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/525,222 US8891878B2 (en) 2012-06-15 2012-06-15 Method for representing images using quantized embeddings of scale-invariant image features

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/291,384 Continuation-In-Part US8837727B2 (en) 2011-11-08 2011-11-08 Method for privacy preserving hashing of signals with binary embeddings

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/733,517 Continuation-In-Part US8768075B2 (en) 2011-11-08 2013-01-03 Method for coding signals with universal quantized embeddings

Publications (2)

Publication Number Publication Date
US20130336588A1 US20130336588A1 (en) 2013-12-19
US8891878B2 true US8891878B2 (en) 2014-11-18

Family

ID=49755988

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/525,222 Active 2032-12-12 US8891878B2 (en) 2011-11-08 2012-06-15 Method for representing images using quantized embeddings of scale-invariant image features

Country Status (2)

Country Link
US (1) US8891878B2 (en)
JP (1) JP5950864B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10878291B2 (en) 2019-03-28 2020-12-29 International Business Machines Corporation Visually guided query processing
US10885098B2 (en) 2015-09-15 2021-01-05 Canon Kabushiki Kaisha Method, system and apparatus for generating hash codes
US11341277B2 (en) * 2018-04-20 2022-05-24 Nec Corporation Method and system for securing machine learning models

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6041789B2 (en) * 2013-01-03 2016-12-14 三菱電機株式会社 Method for encoding an input signal
US9158807B2 (en) * 2013-03-08 2015-10-13 International Business Machines Corporation Fast distributed database frequency summarization
US10386435B2 (en) 2013-11-06 2019-08-20 The Research Foundation For The State University Of New York Methods and systems for fast auto-calibrated reconstruction with random projection in parallel MRI
JP6456031B2 (en) * 2014-03-25 2019-01-23 キヤノン株式会社 Image recognition apparatus, image recognition method, and program
US20170039198A1 (en) * 2014-05-15 2017-02-09 Sentient Technologies (Barbados) Limited Visual interactive search, scalable bandit-based visual interactive search and ranking for visual interactive search
US20150331908A1 (en) 2014-05-15 2015-11-19 Genetic Finance (Barbados) Limited Visual interactive search
US9432702B2 (en) * 2014-07-07 2016-08-30 TCL Research America Inc. System and method for video program recognition
US9337815B1 (en) * 2015-03-10 2016-05-10 Mitsubishi Electric Research Laboratories, Inc. Method for comparing signals using operator invariant embeddings
US10394777B2 (en) 2015-09-24 2019-08-27 Google Llc Fast orthogonal projection
CN107636639B (en) * 2015-09-24 2021-01-08 谷歌有限责任公司 Fast orthogonal projection
KR102221118B1 (en) * 2016-02-16 2021-02-26 삼성전자주식회사 Method for extracting feature of image to recognize object
US10909459B2 (en) 2016-06-09 2021-02-02 Cognizant Technology Solutions U.S. Corporation Content embedding using deep metric learning algorithms
US10579688B2 (en) * 2016-10-05 2020-03-03 Facebook, Inc. Search ranking and recommendations for online social networks based on reconstructed embeddings
US10984045B2 (en) 2017-05-24 2021-04-20 International Business Machines Corporation Neural bit embeddings for graphs
WO2019040136A1 (en) * 2017-08-23 2019-02-28 Google Llc Multiscale quantization for fast similarity search
US10755144B2 (en) 2017-09-05 2020-08-25 Cognizant Technology Solutions U.S. Corporation Automated and unsupervised generation of real-world training data
US10755142B2 (en) 2017-09-05 2020-08-25 Cognizant Technology Solutions U.S. Corporation Automated and unsupervised generation of real-world training data
US11574201B2 (en) 2018-02-06 2023-02-07 Cognizant Technology Solutions U.S. Corporation Enhancing evolutionary optimization in uncertain environments by allocating evaluations via multi-armed bandit algorithms
CN116384497B (en) * 2023-05-11 2023-08-25 深圳量旋科技有限公司 Reading and writing system, related method, device and equipment for quantum computing experimental result

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130028516A1 (en) * 2009-09-01 2013-01-31 Children"s Medical Center Corporation Image registration methods and apparatus using random projections
US8542869B2 (en) * 2010-06-02 2013-09-24 Dolby Laboratories Licensing Corporation Projection based hashing that balances robustness and sensitivity of media fingerprints

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4766197B2 (en) * 2009-01-29 2011-09-07 日本電気株式会社 Feature selection device
JP5347897B2 (en) * 2009-10-15 2013-11-20 株式会社リコー Annotation apparatus, method and program
JP2011150541A (en) * 2010-01-21 2011-08-04 Sony Corp Learning apparatus, learning method and program

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130028516A1 (en) * 2009-09-01 2013-01-31 Children"s Medical Center Corporation Image registration methods and apparatus using random projections
US8542869B2 (en) * 2010-06-02 2013-09-24 Dolby Laboratories Licensing Corporation Projection based hashing that balances robustness and sensitivity of media fingerprints

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10885098B2 (en) 2015-09-15 2021-01-05 Canon Kabushiki Kaisha Method, system and apparatus for generating hash codes
US11341277B2 (en) * 2018-04-20 2022-05-24 Nec Corporation Method and system for securing machine learning models
US10878291B2 (en) 2019-03-28 2020-12-29 International Business Machines Corporation Visually guided query processing

Also Published As

Publication number Publication date
JP2014002723A (en) 2014-01-09
US20130336588A1 (en) 2013-12-19
JP5950864B2 (en) 2016-07-13

Similar Documents

Publication Publication Date Title
US8891878B2 (en) Method for representing images using quantized embeddings of scale-invariant image features
Guo et al. Content-based image retrieval using features extracted from halftoning-based block truncation coding
Chandrasekhar et al. Compressed histogram of gradients: A low-bitrate descriptor
Duan et al. Overview of the MPEG-CDVS standard
Chen et al. Tree histogram coding for mobile image matching
Chen et al. Residual enhanced visual vectors for on-device image matching
Sánchez et al. High-dimensional signature compression for large-scale image classification
Duan et al. Compact descriptors for visual search
US9256617B2 (en) Apparatus and method for performing visual search
US20130039566A1 (en) Coding of feature location information
US8774509B1 (en) Method and system for creating a two-dimensional representation of an image based upon local representations throughout the image structure
US20170026665A1 (en) Method and device for compressing local feature descriptor, and storage medium
US20180341805A1 (en) Method and Apparatus for Generating Codebooks for Efficient Search
Chandrasekhar et al. Survey of SIFT compression schemes
Li et al. Quantized embeddings of scale-invariant image features for mobile augmented reality
Chen et al. A hybrid mobile visual search system with compact global signatures
Chandrasekhar et al. Feature matching performance of compact descriptors for visual search
Rane et al. Quantized embeddings: An efficient and universal nearest neighbor method for cloud-based image retrieval
Li et al. Online variable coding length product quantization for fast nearest neighbor search in mobile retrieval
Chandrasekhar et al. Quantization schemes for low bitrate compressed histogram of gradients descriptors
Wu et al. Codebook-free compact descriptor for scalable visual search
Guo et al. Parametric and nonparametric residual vector quantization optimizations for ANN search
Boufounos et al. Dimensionality reduction of visual features for efficient retrieval and classification
Du et al. A Low Overhead Progressive Transmission for Visual Descriptor Based on Image Saliency.
Khapli et al. Compressed domain image retrieval using thumbnails of images

Legal Events

Date Code Title Description
AS Assignment

Owner name: MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC., M

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RANE, SHANTANU;BOUFOUNOS, PETROS;LI, MU;SIGNING DATES FROM 20120615 TO 20120627;REEL/FRAME:028449/0482

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8