CN113704531A - Image processing method, image processing device, electronic equipment and computer readable storage medium
- Publication number
- CN113704531A (application CN202110261801.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- similarity
- images
- training
- initial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually (information retrieval of still image data)
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/22—Matching criteria, e.g. proximity measures
- G06F18/24—Classification techniques
- G06N3/04—Neural networks; Architecture, e.g. interconnection topology
- G06N3/08—Neural networks; Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Library & Information Science (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Abstract
The application discloses an image processing method, an image processing apparatus, an electronic device and a computer-readable storage medium, relating to the technical fields of artificial intelligence, cloud technology and image processing. The method includes: acquiring a first training data set and a second training data set, and pre-training an initial neural network model based on the first training data set to obtain a pre-trained neural network model; and training the pre-trained neural network model based on the second training data set to obtain an image similarity model. Because the first training data set is determined automatically by applying data augmentation to each initial image, a large number of training samples with similarity labeling results can be generated from it, providing data support for model training. Furthermore, because the manually labeled second sample image sets and their similarity labeling results are more accurate, the image similarity model trained on the second training data set performs better.
Description
Technical Field
The present application relates to the technical fields of artificial intelligence, big data processing and image processing, and in particular to an image processing method, an image processing apparatus, an electronic device and a computer-readable storage medium.
Background
Image similarity is widely used in common image processing scenarios such as image retrieval and image recognition; in practical applications, the similarity of two images can be determined by a trained image similarity model. In the prior art, such a model is usually trained in a fully supervised manner, that is, on a large number of manually labeled image pairs. To obtain a model with higher precision, a large number of sample image pairs usually need to be labeled manually, yet the number of sample image pairs obtainable by manual labeling is relatively limited; this limits the performance of the image similarity model and wastes a great deal of manpower.
Disclosure of Invention
The present application aims to solve at least one of the above technical drawbacks, and in particular proposes the following technical solutions for improving the performance of the image similarity model.
According to an aspect of the present application, there is provided an image processing method including:
acquiring a first training data set and a second training data set, wherein the first training data set comprises a plurality of first sample image sets, the second training data set comprises a plurality of second sample image sets, the first sample image sets and the similarity marking results thereof are determined by performing data augmentation on each initial image, and the similarity marking results of the second sample image sets are manual marking results;
pre-training the initial neural network model based on a first training data set to obtain a pre-trained neural network model;
and training the pre-trained neural network model based on the second training data set to obtain an image similarity model, and determining the similarity of the image pair through the image similarity model.
According to another aspect of the present application, there is provided an image processing method including:
acquiring at least two images to be processed;
processing at least two images to be processed by calling an image similarity model to obtain the similarity of each image pair in the at least two images to be processed, and processing the at least two images to be processed based on the similarity;
the image similarity model is obtained by the method shown in the first aspect of the application.
According to another aspect of the present application, there is provided an image processing apparatus including:
the training data acquisition module is used for acquiring a first training data set and a second training data set, wherein the first training data set comprises a plurality of first sample image sets, the second training data set comprises a plurality of second sample image sets, the first sample image sets and the similarity marking results of the first sample image sets are determined by data augmentation of all initial images, and the similarity marking results of the second sample image sets are manual marking results;
and the model training module is used for pre-training the initial neural network model based on the first training data set to obtain a pre-trained neural network model, training the pre-trained neural network model based on the second training data set to obtain an image similarity model, and determining the similarity of the image pair through the image similarity model.
In a possible implementation manner, when the training data obtaining module obtains the first training data set, the training data obtaining module is specifically configured to:
acquiring a plurality of initial images;
for each initial image, performing data augmentation processing on the initial image to obtain at least two sub-images corresponding to the initial image;
obtaining a plurality of first positive sample image sets and similarity marking results of the first positive sample image sets based on two sub-images belonging to the same initial image in the sub-images of the initial images;
obtaining a plurality of first negative sample image sets and similarity marking results of the first negative sample image sets based on two sub-images belonging to different initial images in the sub-images of the initial images;
wherein the plurality of first sample image sets includes a plurality of first positive sample image sets and a plurality of first negative sample image sets.
In one possible implementation, the second sample image set is a positive sample image set, and the second training data set further includes a plurality of first negative sample image sets.
In one possible implementation, the plurality of second sample image sets includes at least one of a plurality of second positive sample image sets or a plurality of second negative sample image sets, and the similarity of the second positive sample image sets is less than or equal to a first threshold; the similarity of the second negative sample image set is greater than or equal to a second threshold;
wherein the similarity of the second sample image set is determined by the pre-trained neural network model.
In one possible implementation, the data augmentation process includes at least one of:
image cutting; smearing treatment; fuzzy processing; color transformation; gray level transformation; rotating the image; and (5) image turning.
In a possible implementation manner, when the model training module pre-trains the initial neural network model based on the first training data set to obtain a pre-trained neural network model, the model training module is specifically configured to:
repeatedly executing the following training steps until the pre-training loss value meets the pre-training end condition to obtain a pre-trained neural network model:
inputting each first sample image set into an initial neural network model, respectively extracting the image characteristics of two images in each first sample image set through the initial neural network model, and predicting to obtain the prediction similarity of the first sample image set based on the image characteristics of the two images;
determining a pre-training loss value according to the prediction similarity and the similarity marking result of each first sample image set;
if the pre-training loss value meets the pre-training ending condition, ending the pre-training; if not, adjusting the model parameters of the initial neural network model, and repeating the training steps.
According to another aspect of the present application, there is provided an image processing apparatus including:
the image acquisition module is used for acquiring at least two images to be processed;
the image processing module is used for processing the at least two images to be processed by calling the image similarity model to obtain the similarity of each image pair in the at least two images to be processed so as to process the at least two images to be processed based on the similarity;
the image similarity model is obtained by the method shown in the first aspect of the application.
In a possible implementation manner, when the image obtaining module obtains at least two images to be processed, the image obtaining module is specifically configured to:
acquiring an image retrieval request, wherein the image retrieval request comprises a retrieval image;
acquiring an image database corresponding to the image retrieval request, wherein at least two images to be processed comprise a retrieval image and a retrieved image in the image database, and an image pair comprises the retrieval image and the retrieved image;
the device also includes:
and the image retrieval module is used for determining a target image corresponding to the image retrieval request from the image database according to the similarity of each image pair and providing the target image for a retriever.
In a possible implementation manner, when the image obtaining module obtains at least two images to be processed, the image obtaining module is specifically configured to:
acquiring an image set to be processed, wherein at least two images to be processed are images in the image set to be processed, and the image pair is any two images in the image set to be processed;
the device also includes:
and the image classification module is used for classifying the images in the image set to be processed according to the similarity of each image pair.
According to yet another aspect of the present application, there is provided an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the image processing method of the present application when executing the computer program.
According to yet another aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method of the present application.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in the various alternative implementations of the image processing method described above.
The beneficial effects brought by the technical solutions provided in the present application are as follows:
According to the image processing method, image processing apparatus, electronic device and computer-readable storage medium provided herein, when the image similarity model for determining the similarity of image pairs is obtained, the first sample image sets in the model's first training data set and their similarity labeling results are determined automatically by performing data augmentation on each initial image, so a large number of training samples carrying similarity labeling results can be generated through data augmentation, providing data support for training the model. Furthermore, the solution also provides a second training data set for the model: because the second sample image sets in it and their similarity labeling results are manually labeled, and manual labeling is more accurate, the image similarity model obtained by training on the second training data set performs better.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of an image after a smearing process according to an embodiment of the present application;
fig. 3 is a schematic diagram of a network structure according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a training flow of an image similarity model in an image processing method according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a further image processing method according to an embodiment of the present application;
fig. 6 is a flowchart illustrating an image processing method according to an embodiment of the present application;
fig. 7 is a schematic diagram of an implementation environment of an image processing method according to an embodiment of the present application;
FIG. 8 is a diagram illustrating an environment for implementing another image processing method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any combination of one or more of the associated listed items.
The embodiments of the present application provide an image processing method that improves both the efficiency of obtaining training samples and the performance of the model. The method is applicable to any scenario in which the similarity of image pairs needs to be determined, and relates to artificial intelligence, big data processing and cloud technology, in particular to fields such as machine learning and computer vision within artificial intelligence technology.
In an embodiment of the present application, the provided solution may be implemented based on cloud technology, and the data processing involved in each optional embodiment (including but not limited to data computation) may be implemented using cloud computing. Cloud technology refers to a hosting technology that unifies hardware, software, network and other resources in a wide area network or local area network to realize the computation, storage, processing and sharing of data. It is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently, and cloud computing technology will become an important support for it. Background services of technical network systems, such as video websites, image websites and other web portals, require a large amount of computing and storage resources. With the development of the internet industry, each article may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data need strong system background support, which can only be realized through cloud computing.
Cloud computing (cloud computing) is a computing model that distributes computing tasks over a pool of resources formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user as being infinitely expandable and available at any time, available on demand, expandable at any time, and paid for on-demand.
As a basic capability provider of cloud computing, a cloud computing resource pool (generally called a cloud platform, i.e., an IaaS, Infrastructure as a Service, platform) is established, and multiple types of virtual resources are deployed in the resource pool for external customers to select and use. The cloud computing resource pool mainly includes computing devices (virtualized machines, including operating systems), storage devices and network devices. Divided by logical function, a PaaS (Platform as a Service) layer may be deployed on the IaaS layer, and a SaaS (Software as a Service) layer may be deployed on the PaaS layer; SaaS may also be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container; SaaS is business software of various kinds, such as websites and web portals. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technology. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics and the like. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and adversarial learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
Big data refers to data sets that cannot be captured, managed and processed by conventional software tools within a certain time range; it is a massive, fast-growing and diversified information asset that requires new processing modes to yield stronger decision-making power, insight and process optimization capability. With the advent of the cloud era, big data has attracted more and more attention, and it requires special techniques to effectively process large amounts of data within an acceptable time. Technologies suitable for big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the internet and scalable storage systems.
Computer Vision (CV) technology is a science that studies how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to identify, track and measure targets and to perform further image processing, so that the processed images are better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
The solution provided by the embodiments of the present application may be executed by any electronic device, for example by a user terminal device or by a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The terminal device may include at least one of: a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart television, or smart in-vehicle device.
The source of the training data required for model training is not limited in the embodiments of the present application; it may include existing training data sets, big data collected from the internet, or training data obtained by the data augmentation approach of the image processing method of the present application.
The following describes the technical solutions of the present application and how to solve the above technical problems in detail with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The embodiment of the present application provides a possible implementation manner, and as shown in fig. 1, provides a flowchart of an image processing method, where the scheme may be executed by any electronic device, for example, the scheme of the embodiment of the present application may be executed on a terminal device or a server, or executed by both the terminal device and the server. For convenience of description, the method provided by the embodiment of the present application will be described below by taking a server as an execution subject. As shown in the flow chart of fig. 1, the method may comprise the steps of:
step S110, a first training data set and a second training data set are obtained, where the first training data set includes a plurality of first sample image sets, and the second training data set includes a plurality of second sample image sets.
The first sample image set and the similarity marking result thereof are determined by data augmentation of each initial image, and the similarity marking result of the second sample image set is a manual marking result.
The similarity labeling result represents the similarity between two images; the specific manner of similarity labeling is not limited in this application. The first sample image set includes at least two images, and so does the second sample image set. The similarity labeling result of a first sample image set characterizes whether any two images in that set are similar or dissimilar, and likewise for a second sample image set.
As an example, the similarity labeling result is represented by the numbers 1 and 0, where a number of 1 indicates that the two images are similar, and a number of 0 indicates that the two images are not similar.
For example, in an example of the present application, the similarity threshold is 0.6, when the similarity of the two images is greater than 0.6, the two images are similar, otherwise, the two images are not similar. It should be noted that the similarity threshold may be configured based on actual requirements, for example, in a scenario with a high similarity requirement, the similarity threshold may be appropriately increased, for example, the similarity threshold is 0.8.
Data augmentation processing means increasing the amount of data without changing the image category; that is, through data augmentation of the initial images, a large and diverse image set can be obtained as training data.
The first sample image sets and their similarity labeling results are determined by data augmentation of each initial image; however, the data augmentation approach cannot cover image sets of every scenario, and the similarity labeling results it produces may not be completely correct. Therefore, in the solution of the present application, the second training data set is determined by manual labeling, that is, a plurality of second sample image sets and their similarity labeling results are determined manually. Labeling part of the sample data manually lets the training data cover as many scenarios as possible, so the trained image similarity model generalizes better and performs better.
Alternatively, the second sample image sets may be image sets whose similarity is harder to determine, such as image sets obtained by re-shooting, image sets with different layouts, and image sets obtained by screen capture. Training the image similarity model on such image sets can better improve the generalization ability of the model.
It is understood that the initial image may be a manually captured image or an image selected from an image library, and in this embodiment, the source of the initial image is not limited.
And step S120, pre-training the initial neural network model based on the first training data set to obtain a pre-trained neural network model.
Step S130, training the pre-trained neural network model based on the second training data set to obtain an image similarity model, and determining the similarity of the image pair through the image similarity model.
The image similarity model obtained through the training is used for determining the similarity of the image pair, namely the similarity of two images in the image set can be determined through the image similarity model.
As an alternative, the model architecture of the initial neural network model is not limited in this embodiment; it may be any initial neural network model usable for determining image similarity, such as a SimCLR (A Simple Framework for Contrastive Learning of Visual Representations) or SimSiam (Simple Siamese) network.
According to the above solution, when the image similarity model for determining the similarity of image pairs is obtained, the first sample image sets in the model's first training data set and their similarity labeling results are determined automatically by data augmentation of each initial image, so a large number of training samples carrying similarity labeling results can be generated, providing data support for training the model. Furthermore, the solution also provides a second training data set for the model; because the second sample image sets in it and their similarity labeling results are manually labeled and thus more accurate, the image similarity model obtained by training on the second training data set performs better.
In one embodiment of the present application, obtaining the first training data set may include:
acquiring a plurality of initial images;
for each initial image, performing data augmentation processing on the initial image to obtain at least two sub-images corresponding to the initial image;
obtaining a plurality of first positive sample image sets and similarity marking results of the first positive sample image sets based on two sub-images belonging to the same initial image in the sub-images of the initial images;
obtaining a plurality of first negative sample image sets and similarity marking results of the first negative sample image sets based on two sub-images belonging to different initial images in the sub-images of the initial images;
wherein the plurality of first sample image sets includes a plurality of first positive sample image sets and a plurality of first negative sample image sets.
Optionally, the multiple initial images may be multiple images covering as many application scenes as possible, and different images in the multiple initial images may be dissimilar, that is, the similarity between any two initial images is relatively low.
A subgraph is an image derived from an initial image, for example an image obtained by applying an image transformation to it. Taking one initial image as an example, at least two subgraphs are obtained after data augmentation processing, and each of them is similar to the initial image. A first positive sample image set corresponding to the initial image and its similarity labeling result can then be determined from those subgraphs, and at least one first positive sample image set can be determined for each initial image.
Among the subgraphs of the initial images, two subgraphs belonging to different initial images are dissimilar. As an example, for two initial images x1 and x2, any subgraph corresponding to x1 is dissimilar to any subgraph corresponding to x2. A plurality of first negative sample image sets and their similarity labeling results can therefore be obtained from pairs of subgraphs belonging to different initial images.
In the data augmentation process, the amount of data is increased without changing the image category: on the basis of an initial image, only its display form is changed, for example its size or color, while the image content is not changed. Two subgraphs of the same initial image are therefore similar images, so the similarity labeling result of a first positive sample image set is "similar", i.e., any two images in a first positive sample image set are similar. The image content of different initial images differs, so two subgraphs of different initial images are dissimilar; accordingly, the similarity labeling result of a first negative sample image set is "dissimilar", i.e., any two images in a first negative sample image set are dissimilar.
Thus, where the plurality of first sample image sets includes a plurality of first positive sample image sets, any two images in each first positive sample image set are similar and its similarity labeling result is "similar"; where it also includes a plurality of first negative sample image sets, any two images in each first negative sample image set are dissimilar and its similarity labeling result is "dissimilar". A sketch of assembling such a training set is given below.
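The construction just described can be sketched as follows; the dictionary layout, the labels (1 = similar, 0 = dissimilar) and the negative-sampling ratio are illustrative assumptions rather than details taken from the patent.

```python
import itertools
import random

def build_first_training_set(subgraphs_by_image, negatives_per_image=2):
    """subgraphs_by_image: {initial_image_id: [subgraph, ...]} from data augmentation."""
    samples = []
    # first positive sample image sets: two subgraphs of the same initial image, label 1
    for subs in subgraphs_by_image.values():
        for a, b in itertools.combinations(subs, 2):
            samples.append((a, b, 1))
    # first negative sample image sets: subgraphs of two different initial images, label 0
    ids = list(subgraphs_by_image)
    for _ in range(negatives_per_image * len(ids)):
        i, j = random.sample(ids, 2)
        samples.append((random.choice(subgraphs_by_image[i]),
                        random.choice(subgraphs_by_image[j]), 0))
    return samples
```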
In an alternative aspect of the application, the data augmentation process includes at least one of:
image cutting; smearing treatment; fuzzy processing; color transformation; gray level transformation; rotating the image; and (5) image turning.
How to acquire the first training data set is further described below based on the above data augmentation process:
acquiring N initial images, optionally, N is greater than or equal to 20000, where the N initial images cover various different scenes as much as possible, and the N initial images do not need manual annotation.
And for each initial image in the N initial images, performing data augmentation processing on each initial image to obtain at least two sub-images corresponding to each initial image.
Taking one initial image as an example, the following describes how the different data augmentation processes are applied to obtain the corresponding subgraphs:
(1) image cropping
Crop the initial image to obtain a cropped image, and directly use the cropped image as a subgraph of the initial image.
When cropping, the proportion of the cropped subgraph relative to the original image may be controlled by a parameter, optionally between 0.2 and 1.
Because the cropped images may have different sizes, to facilitate subsequent processing they may be resized to a fixed size, and the fixed-size cropped images are then used as subgraphs of the initial image.
As an example, the fixed size may be set to 256, so the subgraphs obtained by image cropping have size 256 × 256.
(2) Smearing
Smear the initial image, and use the resulting smeared image as a subgraph corresponding to the initial image. Smearing the initial image can enhance robustness to smearing interference.
The smearing process adds elements of different types to the initial image, such as at least one of images, characters, symbols, special effects or lines; a subgraph may contain one element or at least two elements simultaneously.
Referring to the smeared images shown in fig. 2, image a is a subgraph obtained by adding line a to the initial image, and image b is a subgraph obtained by adding line b, where line a and line b are lines of different colors.
The text content in image a and image b is the same; the two images are merely schematic illustrations of subgraphs obtained by adding line a and line b respectively, and the content of the images does not limit the solution.
(3) Blurring
Blur the initial image, and use the resulting blurred image as a subgraph corresponding to the initial image. Blurring the initial image can enhance the robustness of similarity learning for blurred images.
As an alternative, the blurring process may be Gaussian blurring.
(4) Color transformation
Apply a color transformation to the initial image, and use the processed image as a subgraph corresponding to the initial image. Color transformation can enhance the robustness of similarity learning against changes of illumination, color and the like in the image.
The color transformation process includes, but is not limited to, brightness transformation, contrast transformation, saturation transformation, and RGB (Red Green Blue) color transformation.
(5) Grayscale transformation
Apply a grayscale transformation to the initial image, and use the processed image as a subgraph corresponding to the initial image. When the initial image is a color RGB image, grayscale transformation can enhance the robustness of similarity learning for grayscale images.
(6) Image rotation
Rotate the initial image, for example clockwise or counterclockwise, and use the processed image as a subgraph corresponding to the initial image. Rotating the initial image can enhance the robustness of similarity learning for rotated images.
(7) Image flipping
Flip the initial image, for example horizontally or vertically, and use the processed image as a subgraph corresponding to the initial image. Flipping the initial image can enhance the robustness of similarity learning for flipped images.
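For illustration, the transformations above can be sketched with torchvision (the library choice is an assumption; the patent names no framework). The crop ratio of 0.2-1 and the fixed size 256 follow the text, while every probability and parameter below is a guess; smearing has no built-in transform and would be a custom PIL drawing step.

```python
from torchvision import transforms

# generator of one random view ("subgraph") of an initial image
augment = transforms.Compose([
    transforms.RandomResizedCrop(256, scale=(0.2, 1.0)),          # (1) crop 0.2-1x of the original, resize to 256
    transforms.RandomApply([transforms.ColorJitter(
        brightness=0.4, contrast=0.4, saturation=0.4)], p=0.5),   # (4) color transformation
    transforms.RandomGrayscale(p=0.2),                            # (5) grayscale transformation
    transforms.RandomApply([transforms.GaussianBlur(
        kernel_size=23, sigma=(0.1, 2.0))], p=0.5),               # (3) Gaussian blur
    transforms.RandomRotation(degrees=30),                        # (6) image rotation
    transforms.RandomHorizontalFlip(),                            # (7) image flipping
])

def make_subgraphs(initial_image, n=2):
    # each call draws fresh random parameters, so any two subgraphs of the
    # same initial image form a similar (positive) pair
    return [augment(initial_image) for _ in range(n)]
```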
In one embodiment of the present application, the second set of sample images is a set of positive sample images, and the second set of training data further includes a plurality of sets of first negative sample images.
The second positive sample image sets in the second training data set are determined by manual labeling. The second training data set may further include negative sample image sets, which may be manually labeled or may be the first negative sample image sets determined by data augmentation in the first training data set.
Using the first negative sample image sets of the first training data set as the negative sample image sets of the second training data set increases the amount of training data and thereby helps improve the performance of the model.
In one embodiment of the present application, the plurality of second sample image sets includes at least one of a plurality of second positive sample image sets or a plurality of second negative sample image sets, and the similarity of the second positive sample image sets is less than or equal to a first threshold; the similarity of the second negative sample image set is greater than or equal to a second threshold;
wherein the similarity of the second sample image set is determined by the pre-trained neural network model.
Two images whose similarity is less than or equal to the first threshold are actually similar but are judged dissimilar by the pre-trained neural network model; two images whose similarity is greater than or equal to the second threshold are actually dissimilar but are judged similar by the model. For the pre-trained neural network model, these are precisely the image pairs whose similarity is difficult to judge. Therefore, image pairs with similarity less than or equal to the first threshold can be used as second positive sample image sets, and image pairs with similarity greater than or equal to the second threshold as second negative sample image sets. Training the pre-trained neural network model on these second sample image sets further improves the performance of the model, so that the model trained on them can accurately judge the similarity of such image pairs; a sketch of mining these hard candidates follows.
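A hedged sketch of mining such hard candidates with the pre-trained model; the thresholds 0.3 and 0.7 and the model's (f1, f2, similarity) output signature are assumptions made for illustration, and the selected pairs would still be passed to annotators for the manual similarity labeling.

```python
import torch

@torch.no_grad()
def mine_hard_candidates(model, image_pairs, first_threshold=0.3, second_threshold=0.7):
    """Select pairs the pre-trained model scores at the extremes.

    Pairs scored <= first_threshold that annotators judge similar become second
    positive sample image sets; pairs scored >= second_threshold that annotators
    judge dissimilar become second negative sample image sets.
    """
    low_scored, high_scored = [], []
    for x1, x2 in image_pairs:                      # x1, x2: CHW image tensors
        _, _, sim = model(x1.unsqueeze(0), x2.unsqueeze(0))
        if sim.item() <= first_threshold:
            low_scored.append((x1, x2))
        elif sim.item() >= second_threshold:
            high_scored.append((x1, x2))
    return low_scored, high_scored
```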
In an embodiment of the present application, pre-training the initial neural network model based on the first training data set to obtain a pre-trained neural network model, may include:
repeatedly executing the following training steps until the pre-training loss value meets the pre-training end condition to obtain a pre-trained neural network model:
inputting each first sample image set into an initial neural network model, respectively extracting the image characteristics of two images in each first sample image set through the initial neural network model, and predicting to obtain the prediction similarity of the first sample image set based on the image characteristics of the two images;
determining a pre-training loss value according to the prediction similarity and the similarity marking result of each first sample image set;
if the pre-training loss value meets the pre-training ending condition, ending the pre-training; if not, adjusting the model parameters of the initial neural network model, and repeating the training steps.
In the model training process, the initial neural network model may be a model based on a twin (Siamese) neural network. To further aid understanding, refer to the schematic network structure shown in fig. 3. In this example, one image pair in a first sample image set is considered, the two images being image x1 and image x2. The initial neural network model includes a first feature extraction layer (as an example, a convolutional neural network, ConvNets), a first fully connected layer (fc), a second feature extraction layer, a second fully connected layer, and a classification layer (which, as an example, may also be a fully connected layer).
The image pair is input into the initial neural network model: image features f1 of image x1 are extracted by the first feature extraction layer and image features f2 of image x2 by the second feature extraction layer; f1 is input into the first fully connected layer and f2 into the second fully connected layer for further processing; the outputs of the two fully connected layers are then input into the classification layer, which outputs the predicted similarity of image x1 and image x2. A minimal sketch of this structure follows.
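A minimal PyTorch sketch of the fig. 3 structure; the framework, the ResNet-18 backbone, the feature dimension and the weight sharing between the two branches are all assumptions, since the patent only specifies feature extraction layers, fully connected layers and a classification layer.

```python
import torch
import torch.nn as nn
from torchvision import models

class SiameseSimilarityNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)      # stand-in for the "ConvNets" branches
        backbone.fc = nn.Identity()                   # keep the 512-d pooled features
        self.encoder = backbone                       # shared by both branches (assumption)
        self.fc = nn.Linear(512, feat_dim)            # the per-branch fully connected layer
        self.classifier = nn.Linear(2 * feat_dim, 1)  # classification layer over joined features

    def forward(self, x1, x2):
        f1 = self.fc(self.encoder(x1))                # image features f1 of image x1
        f2 = self.fc(self.encoder(x2))                # image features f2 of image x2
        logit = self.classifier(torch.cat([f1, f2], dim=1))
        sim = torch.sigmoid(logit).squeeze(1)         # predicted similarity in [0, 1]
        return f1, f2, sim
```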
Through the same processing procedure, the predicted similarity of each first sample image set can be obtained, that is, whether any two images in the set are similar or dissimilar. A pre-training loss value is then determined based on the predicted similarity and the similarity labeling result of each first sample image set; it characterizes the difference between the two. For a first positive sample image set, a smaller difference means the predicted similarity is closer to the similarity labeling result; conversely, a larger difference means the predicted similarity deviates more from the similarity labeling result.
The similarity labeling result may be identified by a category label, for example, the category label is y, y equals 1 to indicate similarity, and y equals 0 to indicate dissimilarity.
In an alternative of the present application, the predicted similarity of two images may be determined based on a feature distance between them, such as the Euclidean distance. For image x1 and image x2 in the above example, the distance d between the image features f1 and f2 is computed as shown in formula (1), and this distance represents the degree of similarity between image x1 and image x2.
d = (f1 - f2)²  (1)
In an alternative of the present application, the pre-training end condition may be configured based on the actual requirement, for example, the pre-training loss value is smaller than the first set threshold. And when the pre-training loss value is smaller than the first set threshold value, the pre-training loss value meets the pre-training ending condition, and the pre-training is ended. And when the pre-training loss value is not less than the first set threshold, indicating that the pre-training loss value does not meet the pre-training end condition, adjusting the model parameters of the initial neural network model, continuing training the adjusted model based on the training data until the obtained pre-training loss value meets the pre-training end condition, and ending the pre-training.
In an alternative of the present application, the pre-training end condition may also be convergence of a loss function, for example the contrastive loss (Contrastive Loss) shown in formula (2); pre-training ends when the loss function converges.
L = y·d + (1 − y)·max(m − √d, 0)²  (2)
where L is the loss value (i.e., the training loss value) corresponding to one image pair in a first sample image set; y is the category label indicating whether the two images in the first sample image set are similar (i.e., the similarity labeling result), with y = 1 representing similar and y = 0 representing dissimilar; d is the feature distance of formula (1); and m is a set threshold (a margin value that constrains negative samples to a range of feature distances). In this example, m may be set to 1.
It should be noted that the above-mentioned loss function is for one image pair in the first sample image set, and for a plurality of image pairs, the loss function of the initial neural network model may be N × S × L, where S is the number of image pairs in one first sample image set, and N is the number of first sample image sets.
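The per-pair loss can be sketched as below, assuming the standard contrastive loss form; d matches formula (1) as a squared Euclidean distance, and the margin default follows the m = 1 of this example.

```python
import torch

def contrastive_loss(f1, f2, y, margin=1.0):
    """Contrastive loss over a batch of feature pairs.

    y = 1 marks a similar pair (features pulled together); y = 0 marks a
    dissimilar pair (pushed apart until the distance exceeds the margin m).
    """
    d = (f1 - f2).pow(2).sum(dim=1)          # squared Euclidean distance, formula (1)
    dist = d.clamp(min=1e-12).sqrt()
    loss = y * d + (1 - y) * (margin - dist).clamp(min=0).pow(2)
    return loss.mean()
```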
It can be understood that, in the solution of the present application, training the pre-trained neural network model on the second training data set to obtain the image similarity model may follow the same training process as the pre-training described above.
Specifically, the following training steps are repeatedly executed until the training loss value meets the training end condition, so as to obtain an image similarity model:
inputting each second sample image set into a pre-trained neural network model, respectively extracting the image characteristics of two images in each second sample image set through the pre-trained neural network model, and predicting to obtain the prediction similarity of the second sample image set based on the image characteristics of the two images;
determining a training loss value according to the prediction similarity and the similarity marking result of each second sample image set;
if the training loss value meets the training ending condition, ending the training; if not, adjusting the model parameters of the neural network model after pre-training, and repeating the training steps.
The training end condition may be the same as or different from the pre-training end condition. For example, the training end condition is that the training loss value is smaller than the second set threshold. The first set threshold and the second set threshold may be the same or different.
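Since the two phases share the same procedure, one loop can serve both: pre-train on the first training data set, then continue on the second with a possibly different loss threshold. The optimizer, learning rate and thresholds below are illustrative assumptions, and contrastive_loss refers to the sketch above.

```python
import torch

def train_until_converged(model, loader, loss_threshold=1e-3, lr=1e-3, max_epochs=100):
    """Repeat the training step until the mean loss meets the end condition."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        total, count = 0.0, 0
        for x1, x2, y in loader:              # batches of sample image set pairs with labels
            f1, f2, _ = model(x1, x2)
            loss = contrastive_loss(f1, f2, y.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item() * x1.size(0)
            count += x1.size(0)
        if total / count < loss_threshold:    # (pre-)training end condition
            break
    return model

# pretrained = train_until_converged(SiameseSimilarityNet(), first_loader)
# similarity_model = train_until_converged(pretrained, second_loader)
```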
In the following, the scheme of the present application is further described in detail with reference to the image processing method shown in fig. 4, and the method includes the following steps:
step S210, acquiring a plurality of initial images.
Wherein the initial image is selected to cover as many images of various scenes as possible.
Step S220, for each initial image, performing data augmentation processing on the initial image to obtain at least two sub-images corresponding to the initial image.
Wherein the data augmentation process includes at least one of: image cropping, smearing, blurring, color transformation, grayscale transformation, image rotation, and image flipping. How the initial image is processed by each kind of data augmentation to obtain at least two corresponding subgraphs is described above and is not repeated here.
The data augmentation processing is the same for each of the plurality of initial images; after data augmentation, each initial image yields at least two corresponding subgraphs.
Step S230, obtaining a plurality of first positive sample image sets and similarity labeling results of the first positive sample image sets based on two sub-images belonging to the same initial image in the sub-images of each initial image.
After at least two subgraphs of each initial image are obtained, a plurality of first positive sample image sets are determined from pairs of subgraphs belonging to the same initial image, and their similarity is labeled. Because two subgraphs belonging to the same initial image are similar, the similarity labeling result of a first positive sample image set can be obtained directly from such pairs after the augmentation processing, without manual labeling.
Step S240, based on two sub-images belonging to different initial images in the sub-images of each initial image, obtaining a plurality of first negative sample image sets and similarity labeling results of the first negative sample image sets.
Two subgraphs belonging to different initial images are dissimilar, so after the augmentation processing the similarity labeling result of a first negative sample image set can be obtained directly from pairs of subgraphs belonging to different initial images, without manual labeling.
And step S250, inputting each first sample image set into an initial neural network model, respectively extracting the image characteristics of two images in each first sample image set through the initial neural network model, and predicting to obtain the prediction similarity of the first sample image set based on the image characteristics of the two images.
The similarity of the first sample image set obtained by prediction based on the image features of the two images is described in the foregoing, and is not described herein again.
And step S260, determining a pre-training loss value according to the prediction similarity and the similarity marking result of each first sample image set.
Step S270, judging whether the pre-training loss value meets the pre-training ending condition.
If yes, steps SA71 to SA72 are executed:
Step SA71, ending the pre-training to obtain the pre-trained neural network model.
Step SA72, training the pre-trained neural network model based on the plurality of second positive sample image sets and the plurality of second negative sample image sets to obtain the image similarity model, wherein the similarity labeling result of each second sample image set is a manual labeling result.
In practical applications, some image pairs are difficult to judge as similar or dissimilar, for example an image pair obtained by copying the same image, an image pair obtained by screen-capturing the same image, or different images with equivalent layouts. Such image pairs can be used as second positive sample image sets to train the pre-trained neural network model, so that the model is more robust on images whose similarity is hard to judge.
The similarity labeling result of each second positive sample image set is manually labeled. The second negative sample image sets used for training the pre-trained neural network model may reuse the first negative sample image sets described above, or the second negative sample image sets and their similarity labeling results may be determined by manual labeling.
If not, step B71 is executed: adjust the model parameters of the initial neural network model, and repeat steps S250 to S270 until the pre-training loss value meets the pre-training end condition.
Based on the same principle as the method shown in fig. 1, the embodiment of the present application further provides an image processing method, which is described below with a server as an execution subject, and as shown in fig. 5, the method may include the following steps:
step S310, at least two images to be processed are obtained.
Step S320, processing the at least two images to be processed by calling an image similarity model to obtain the similarity of each image pair in the at least two images to be processed, and processing the at least two images to be processed based on the similarity.
Wherein the image similarity model is obtained by the method described in the foregoing.
An image pair in the at least two images to be processed refers to a pair formed by any two of the at least two images to be processed; if the at least two images to be processed are exactly two images, the image pair is simply those two images.
In practical applications, the trained image similarity model may be stored in advance. When at least two images to be processed are obtained, the similarity of each image pair among them can be determined through the image similarity model, and subsequent processing can then be performed based on the similarity; for example, classification processing is performed on similar image pairs, and deletion processing is performed on dissimilar image pairs.
According to the scheme of the present application, the image similarity model trained by the method described above has good robustness and can accurately judge the similarity of an image pair, so the accuracy of the similarity is improved when it is determined by the image similarity model.
In an embodiment of the present application, processing at least two images to be processed by calling an image similarity model to obtain a similarity of each image pair in the at least two images to be processed includes:
extracting the image characteristics of each image in at least two images to be processed through an image similarity model;
and determining the similarity of each image pair in the at least two images to be processed based on the image characteristics of each image.
The similarity of an image pair can be characterized by a feature distance (e.g., the Euclidean distance): the smaller the distance, the more similar the two images; conversely, the larger the distance, the less similar they are. The feature distance of each image pair among the at least two images to be processed can be determined from the image features of the images, and the similarity of each image pair can then be determined from the feature distance, as sketched below.
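A minimal sketch of this distance-based similarity, assuming PyTorch; the conversion from distance to similarity, 1/(1 + d), is an assumption, since the description above only requires that smaller distances map to higher similarities:

```python
import torch

def pair_similarities(model, images):
    """images: tensor (N, C, H, W); returns an (N, N) similarity matrix over all image pairs."""
    with torch.no_grad():
        feats = model(images)                       # (N, D) image features
    dists = torch.cdist(feats, feats)               # pairwise Euclidean distances
    return 1.0 / (1.0 + dists)                      # smaller distance -> higher similarity
```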
For better understanding of the present application, the following describes the present application with reference to a flow chart of an image processing method shown in fig. 6:
the first training data set is input to an initial neural network model, and the predicted similarity of each first sample image set in the first training data set is determined through the model.
And determining a pre-training loss value based on the similarity marking result of each first sample image set in the first training data set and the predicted similarity of each first sample image set.
If the pre-training loss value meets the pre-training end condition, obtaining a pre-trained neural network model; if not, adjusting the model parameters, and training the initial neural network model again until the pre-training loss value meets the pre-training end condition.
And inputting the second training data set into the pre-trained neural network model, and determining the prediction similarity of each second sample image set in the second training data set through the model.
And determining a training loss value based on the similarity marking result of each second sample image set in the second training data set and the predicted similarity of each second sample image set.
If the training loss value meets the training end condition, the image similarity model is obtained; if not, the model parameters are adjusted and the pre-trained neural network model is retrained until the training loss value meets the training end condition. A compact sketch of this two-stage flow is given below.
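The following sketch condenses the fig. 6 flow: pre-train on the augmented first training data set, then fine-tune on the manually labeled second set. The optimizer, learning rate, loss thresholds, and loader structure are assumptions for illustration, not choices mandated by the patent:

```python
import torch

def train_stage(model, loader, loss_fn, end_threshold, lr=1e-4):
    """Train until the loss value meets the end condition (loss below end_threshold)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    while True:  # for a real run, an epoch cap would guard against non-convergence
        for pairs, labels in loader:
            loss = loss_fn(model, pairs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()            # "adjusting the model parameters"
            if loss.item() < end_threshold:
                return model            # end condition met

# Stage 1: pre-training on the first training data set (automatic labels), e.g. with
# the pretrain_step loss sketched earlier; stage 2: fine-tuning on the second set:
# model = train_stage(model, first_loader, pretrain_step, end_threshold=0.05)
# model = train_stage(model, second_loader, pretrain_step, end_threshold=0.02)
```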
In practical application, at least two images to be processed can be input into the image similarity model, and the similarity of each image pair in the at least two images to be processed is determined through the image similarity model.
In one embodiment of the present application, acquiring at least two images to be processed includes:
acquiring an image retrieval request, wherein the image retrieval request comprises a retrieval image;
acquiring an image database corresponding to the image retrieval request, wherein at least two images to be processed comprise a retrieval image and a retrieved image in the image database, and an image pair comprises the retrieval image and the retrieved image;
the method further comprises the following steps:
and according to the similarity of each image pair, determining a target image corresponding to the image retrieval request from the image database, and providing the target image for the retriever.
The retrieval image is the image to be retrieved, and may be an image containing a target object, for example an image containing a piece of clothing (the target object) or an image containing a pair of shoes (the target object).
The image retrieval request may be user initiated based on a terminal device of the user, which may include at least one of: smart phones, tablet computers, notebook computers, desktop computers, smart speakers, smart watches, smart televisions, and smart car-mounted devices.
The image database stores the retrieved images; based on the image retrieval request, a target image matching the retrieval image can be retrieved from the image database. Specifically, the similarity between the retrieval image and each image in the image database can be determined through the image similarity model, and the target image corresponding to the image retrieval request is determined from the image database according to the similarity of each image pair.
The target image may be displayed through the terminal device of the retriever, where the terminal device may run a client that provides a picture display function. The specific form of the client is not limited, for example a media player or a browser, and the client may take the form of an application program or a web page, which is not limited herein.
In an embodiment of the present application, the target image corresponding to the image retrieval request is determined from the image database according to the similarity of each image pair; specifically, the similarities of the image pairs may be ranked from high to low, and the image in the pair with the highest similarity selected as the target image, as sketched below.
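A hedged sketch of this retrieval branch, reusing the distance-to-similarity conversion assumed earlier; the function name and top-k generalization are illustrative additions:

```python
import torch

def retrieve(model, query, database, top_k=1):
    """query: (C, H, W) retrieval image; database: (N, C, H, W) retrieved images.
    Returns the indices of the highest-similarity (target) images."""
    with torch.no_grad():
        q = model(query.unsqueeze(0))               # (1, D) feature of the retrieval image
        db = model(database)                        # (N, D) features of the retrieved images
    dists = torch.cdist(q, db).squeeze(0)           # Euclidean distance to each database image
    sims = 1.0 / (1.0 + dists)                      # distance -> similarity, as above
    return torch.topk(sims, k=top_k).indices        # ranked from high to low similarity
```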
Fig. 7 is a schematic diagram of an implementation environment of an image processing method according to an embodiment of the present application. The implementation environment in this example may include, but is not limited to, a retrieval server 101, a network 102, and a terminal device 103. The terminal device 103 communicates with the retrieval server 101 via the network 102: the terminal device 103 transmits an image retrieval request to the retrieval server 101, and the retrieval server 101 returns the retrieved target image to the terminal device 103 via the network.
The terminal device 103 includes a human-computer interaction screen 1031, a processor 1032 and a memory 1033. The human-computer interaction screen 1031 is used to display the target image, and the memory 1033 is used to store related data such as the retrieval image and the target image. The retrieval server 101 includes a database 1011 and a processing engine 1012; the processing engine 1012 can be used to train the image similarity model, and the database 1011 stores the trained image similarity model and the image database. The terminal device 103 may upload an image retrieval request to the retrieval server 101 through the network; the processing engine 1012 then obtains the image database corresponding to the image retrieval request, determines the target image corresponding to the request from the image database according to the similarity of each image pair, and provides the target image to the terminal device 103 of the retriever for display.
The processing engine in the retrieval server 101 has two main functions: the first is to train the image similarity model, and the second is to process image retrieval requests based on the image similarity model and the image database to obtain the target image corresponding to each request (the retrieval function). It is understood that these two functions can be implemented by two servers. Referring to fig. 8, the two servers are a training server 201 and a retrieval server 202: the training server 201 is used to train the image similarity model, the retrieval server 202 implements the retrieval function, and the image database is stored in the retrieval server 202.
In practical applications, the two servers may communicate with each other. After the training server 201 has trained the image similarity model, the model may be stored in the training server 201 or sent to the retrieval server 202. Alternatively, when the retrieval server 202 needs to call the image similarity model, it sends a model call request to the training server 201, and the training server 201 sends the image similarity model to the retrieval server 202 based on the request.
As an example, the terminal device 204 sends an image retrieval request to the retrieval server 202 through the network 203, the retrieval server 202 calls an image similarity model in the training server 201, and based on the image similarity model, the retrieval server 202 sends a retrieved target image to the terminal device 204 through the network 203 after completing a retrieval function, so that the terminal device 204 displays the target image.
In one embodiment of the present application, acquiring at least two images to be processed includes:
acquiring an image set to be processed, wherein at least two images to be processed are images in the image set to be processed, and the image pair is any two images in the image set to be processed;
the method further comprises the following steps:
and classifying the images in the image set to be processed according to the similarity of each image pair.
For the images in the to-be-processed image set, the images can be classified based on the similarity of each image pair. One implementation is to group images whose similarity meets a preset condition into one class: for example, image pairs with similarity greater than a first set value and smaller than a second set value form one class, image pairs with similarity not smaller than the second set value and smaller than a third set value form another class, and the remaining image pairs with similarity not smaller than the third set value form a third class. The first set value is smaller than the second set value, which in turn is smaller than the third set value, and all three can be configured based on actual demands; a sketch of this grouping follows.
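A hedged sketch of the three-band grouping just described; the set values are hypothetical placeholders to be configured for the actual application, and the sketch groups image pairs by similarity band (assembling pairs back into image classes, e.g. via union-find, is omitted):

```python
def classify_by_similarity(pair_sims, first=0.3, second=0.6, third=0.9):
    """pair_sims: dict mapping (i, j) image-pair keys to similarity scores."""
    assert first < second < third  # first set value < second set value < third set value
    groups = {"band_1": [], "band_2": [], "band_3": []}
    for pair, sim in pair_sims.items():
        if first < sim < second:          # greater than first, smaller than second
            groups["band_1"].append(pair)
        elif second <= sim < third:       # not smaller than second, smaller than third
            groups["band_2"].append(pair)
        elif sim >= third:                # not smaller than third
            groups["band_3"].append(pair)
    return groups
```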
In an alternative scheme of the present application, content recommendation can also be performed according to the similarity of each image pair; for example, images whose similarity is greater than a set value are recommended to the user as images to be recommended. Applications of inter-image similarity are wide-ranging and are not enumerated here; in any scheme that involves determining the similarity between two images, the similarity can be determined by the image similarity model of the present application.
Based on the same principle as the method shown in fig. 1, the embodiment of the present application further provides an image processing apparatus 40, as shown in fig. 9, the image processing apparatus 40 may include a training data obtaining module 410 and a model training module 420, wherein:
a training data obtaining module 410, configured to obtain a first training data set and a second training data set, where the first training data set includes a plurality of first sample image sets, the second training data set includes a plurality of second sample image sets, where the first sample image set and a similarity labeling result thereof are determined by performing data augmentation on each initial image, and the similarity labeling result of the second sample image set is a manual labeling result;
the model training module 420 is configured to pre-train the initial neural network model based on a first training data set to obtain a pre-trained neural network model; and training the pre-trained neural network model based on a second training data set to obtain an image similarity model, so as to determine the similarity of the image pair through the image similarity model.
In an embodiment of the present application, when the training data obtaining module 410 obtains the first training data set, it is specifically configured to:
acquiring a plurality of initial images;
for each initial image, performing data augmentation processing on the initial image to obtain at least two sub-images corresponding to the initial image;
obtaining a plurality of first positive sample image sets and similarity marking results of the first positive sample image sets based on two sub-images belonging to the same initial image in the sub-images of the initial images;
obtaining a plurality of first negative sample image sets and similarity marking results of the first negative sample image sets based on two sub-images belonging to different initial images in the sub-images of the initial images;
wherein the plurality of first sample image sets includes a plurality of first positive sample image sets and a plurality of first negative sample image sets.
In one embodiment of the present application, the second set of sample images is a set of positive sample images, and the second set of training data further includes a plurality of sets of first negative sample images.
In one embodiment of the present application, the plurality of second sample image sets includes at least one of a plurality of second positive sample image sets or a plurality of second negative sample image sets, and the similarity of the second positive sample image sets is less than or equal to a first threshold; the similarity of the second negative sample image set is greater than or equal to a second threshold;
wherein the first threshold is not less than the second threshold, and the similarity of the second sample image set is determined by the pre-trained neural network model.
In one embodiment of the present application, the data augmentation process includes at least one of:
image cropping; smearing; blurring; color transformation; gray-level transformation; image rotation; image flipping.
In an embodiment of the present application, the model training module 420 is specifically configured to, when pre-training the initial neural network model based on the first training data set to obtain a pre-trained neural network model:
repeatedly executing the following training steps until the pre-training loss value meets the pre-training end condition to obtain a pre-trained neural network model:
inputting each first sample image set into an initial neural network model, respectively extracting the image characteristics of two images in each first sample image set through the initial neural network model, and predicting to obtain the prediction similarity of the first sample image set based on the image characteristics of the two images;
determining a pre-training loss value according to the prediction similarity and the similarity marking result of each first sample image set;
if the pre-training loss value meets the pre-training ending condition, ending the pre-training; if not, adjusting the model parameters of the initial neural network model, and repeating the training steps.
Based on the same principle as the method shown in fig. 5, the embodiment of the present application further provides an image processing apparatus 50, as shown in fig. 10, the image processing apparatus 50 may include an image acquisition module 510 and an image processing module 520, where:
an image obtaining module 510, configured to obtain at least two images to be processed;
the image processing module 520 is configured to process the at least two images to be processed by calling the image similarity model, obtain a similarity of each image pair of the at least two images to be processed, and process the at least two images to be processed based on the similarity;
the image similarity model is obtained by the method in the foregoing.
In an embodiment of the present application, when the image obtaining module obtains at least two images to be processed, the image obtaining module is specifically configured to:
acquiring an image retrieval request, wherein the image retrieval request comprises a retrieval image;
acquiring an image database corresponding to the image retrieval request, wherein at least two images to be processed comprise a retrieval image and a retrieved image in the image database, and an image pair comprises the retrieval image and the retrieved image;
the device also includes:
and the image retrieval module is used for determining a target image corresponding to the image retrieval request from the image database according to the similarity of each image pair and providing the target image for a retriever.
In an embodiment of the present application, when the image obtaining module obtains at least two images to be processed, the image obtaining module is specifically configured to:
acquiring an image set to be processed, wherein at least two images to be processed are images in the image set to be processed, and the image pair is any two images in the image set to be processed;
the device also includes:
and the image classification module is used for classifying the images in the image set to be processed according to the similarity of each image pair.
The image processing apparatus of the embodiments of the present application can execute the image processing method provided by the embodiments of the present application, and the implementation principle is similar. The actions executed by each module and unit in the image processing apparatus correspond to the steps in the image processing method of the embodiments of the present application; for a detailed functional description of each module, reference may be made to the description of the corresponding image processing method shown in the foregoing, and details are not repeated here.
Wherein the image processing apparatus may be a computer program (including program code) running in a computer device, for example, the image processing apparatus is an application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application.
In some embodiments, the image processing apparatus provided by the embodiments of the present application may be implemented by combining hardware and software. By way of example, the image processing apparatus may be a processor in the form of a hardware decoding processor programmed to execute the image processing method provided by the embodiments of the present application; for example, such a processor may be one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
In other embodiments, the image processing apparatus provided by the embodiments of the present application may be implemented in software. Fig. 9 illustrates the image processing apparatus stored in the memory, which may be software in the form of programs and plug-ins, and includes a series of modules, namely the training data obtaining module 410 and the model training module 420, for implementing the image processing method provided by the embodiments of the present application.
Based on the same principle as the method shown in the embodiments of the present application, there is also provided in the embodiments of the present application an electronic device, which may include but is not limited to: a processor and a memory; a memory for storing a computer program; and the processor is used for executing the image processing method shown in any embodiment of the application by calling the computer program.
According to the image processing method of the embodiments of the present application, when the image similarity model used for determining the similarity of an image pair is trained, the first sample image sets in the first training data set and their similarity labeling results are determined automatically by performing data augmentation on each initial image, so a large number of training samples with similarity labeling results can be generated, providing data support for training the model. Furthermore, the scheme also provides a second training data set: since the second sample image sets and their similarity labeling results are manually labeled and thus more accurate, the image similarity model obtained by training on the second training data set has better performance.
In an alternative embodiment, an electronic device is provided, as shown in fig. 11, the electronic device 4000 shown in fig. 11 comprising: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. In addition, the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The processor 4001 may be a CPU (central processing unit), a general-purpose processor, a DSP (digital signal processor), an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that performs a computational function, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The memory 4003 may be a ROM (read-only memory) or another type of static storage device capable of storing static information and instructions, a RAM (random access memory) or another type of dynamic storage device capable of storing information and instructions, an EEPROM (electrically erasable programmable read-only memory), a CD-ROM (compact disc read-only memory) or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 4003 is used for storing application program codes (computer programs) for executing the present scheme, and is controlled by the processor 4001 to execute. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.
The electronic device may also be a terminal device, and the electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the application range of the embodiment of the present application.
The image processing method provided by the present application can also be implemented through cloud computing. Cloud computing, in the narrow sense, refers to a delivery and usage mode of IT infrastructure in which required resources are obtained through a network on demand and in an easily scalable way; in the broad sense, it refers to a delivery and usage mode of services in which required services are obtained through a network on demand and in an easily scalable way. Such services may be IT and software services, internet-related services, or other services. Cloud computing is a product of the development and fusion of traditional computing and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization, and load balancing.
With the diversification of the internet, real-time data streams, and connected devices, and driven by demands such as search services, social networks, mobile commerce, and open collaboration, cloud computing has developed rapidly. Unlike earlier parallel distributed computing, the emergence of cloud computing conceptually drives revolutionary change in the whole internet model and in enterprise management models.
The image processing method provided by the present application can also be implemented through an artificial intelligence cloud service, generally called AIaaS (AI as a Service). This is a service mode of an artificial intelligence platform: the AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model is similar to an AI-themed app store: all developers can access one or more artificial intelligence services provided by the platform through an API, and some qualified developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own dedicated cloud artificial intelligence services. In the present application, the image processing method may be implemented using such an AI framework and AI infrastructure.
The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding content in the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, there is no strict ordering restriction, and the steps may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The computer readable storage medium provided by the embodiments of the present application may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
According to another aspect of the application, there is also provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing method provided in the various embodiment implementation manners described above.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware. Wherein the name of a module in some cases does not constitute a limitation on the module itself.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
Claims (13)
1. An image processing method, comprising:
acquiring a first training data set and a second training data set, wherein the first training data set comprises a plurality of first sample image sets, the second training data set comprises a plurality of second sample image sets, the first sample image sets and similarity marking results thereof are determined by performing data augmentation on each initial image, and the similarity marking results of the second sample image sets are manual marking results;
pre-training an initial neural network model based on the first training data set to obtain a pre-trained neural network model;
and training the pre-trained neural network model based on the second training data set to obtain an image similarity model, and determining the similarity of the image pair through the image similarity model.
2. The method of claim 1, wherein the obtaining a first training data set comprises:
acquiring a plurality of initial images;
for each initial image, performing data augmentation processing on the initial image to obtain at least two sub-images corresponding to the initial image;
obtaining a plurality of first positive sample image sets and similarity marking results of the first positive sample image sets based on two sub-images belonging to the same initial image in the sub-images of the initial images;
obtaining a plurality of first negative sample image sets and similarity marking results of the first negative sample image sets based on two sub-images belonging to different initial images in the sub-images of the initial images;
wherein the plurality of first sample image sets includes the plurality of first positive sample image sets and the plurality of first negative sample image sets.
3. The method of claim 2, wherein the second set of sample images is a positive set of sample images, and wherein the second set of training data further comprises a plurality of the first negative set of sample images.
4. The method of any one of claims 1 to 3, wherein the plurality of second sample image sets comprises at least one of a plurality of second positive sample image sets or a plurality of second negative sample image sets, the second positive sample image sets having a similarity less than or equal to a first threshold; the similarity of the second negative sample image set is greater than or equal to a second threshold;
wherein the similarity of the second sample image set is determined by the pre-trained neural network model.
5. The method of any of claims 1 to 3, wherein the data augmentation process comprises at least one of:
image cropping; smearing; blurring; color transformation; gray-level transformation; image rotation; image flipping.
6. The method of any one of claims 1 to 3, wherein the pre-training an initial neural network model based on the first training data set, resulting in a pre-trained neural network model, comprises:
repeatedly executing the following training steps until the pre-training loss value meets the pre-training end condition to obtain the pre-trained neural network model:
inputting each first sample image set into an initial neural network model, respectively extracting the image characteristics of two images in each first sample image set through the initial neural network model, and predicting to obtain the prediction similarity of the first sample image set based on the image characteristics of the two images;
determining a pre-training loss value according to the prediction similarity and the similarity marking result of each first sample image set;
if the pre-training loss value meets a pre-training ending condition, ending the pre-training; if not, adjusting the model parameters of the initial neural network model, and repeating the training step.
7. An image processing method, comprising:
acquiring at least two images to be processed;
processing the at least two images to be processed by calling an image similarity model to obtain the similarity of each image pair in the at least two images to be processed, and processing the at least two images to be processed based on the similarity;
wherein the image similarity model is obtained by the method of any one of claims 1-6.
8. The method of claim 7, wherein the acquiring at least two images to be processed comprises:
acquiring an image retrieval request, wherein the image retrieval request comprises a retrieval image;
acquiring an image database corresponding to the image retrieval request, wherein the at least two images to be processed comprise the retrieval image and a retrieved image in the image database, and the image pair comprises the retrieval image and a retrieved image;
the method further comprises the following steps:
and determining a target image corresponding to the image retrieval request from the image database according to the similarity of each image pair, and providing the target image for a retriever.
9. The method of claim 7, wherein the acquiring at least two images to be processed comprises:
acquiring a to-be-processed image set, wherein the at least two to-be-processed images are images in the to-be-processed image set, and the image pair is any two images in the to-be-processed image set;
the method further comprises the following steps:
and classifying the images in the image set to be processed according to the similarity of each image pair.
10. An image processing apparatus characterized by comprising:
a training data obtaining module, configured to obtain a first training data set and a second training data set, where the first training data set includes a plurality of first sample image sets, and the second training data set includes a plurality of second sample image sets, where the first sample image set and a similarity labeling result thereof are determined by performing data augmentation on each initial image, and the similarity labeling result of the second sample image set is a manual labeling result;
the model training module is used for pre-training the initial neural network model based on the first training data set to obtain a pre-trained neural network model; and training the pre-trained neural network model based on the second training data set to obtain an image similarity model, so as to determine the similarity of the image pair through the image similarity model.
11. An image processing apparatus characterized by comprising:
the image acquisition module is used for acquiring at least two images to be processed;
the image processing module is used for processing the at least two images to be processed by calling an image similarity model to obtain the similarity of each image pair in the at least two images to be processed so as to process the at least two images to be processed based on the similarity;
wherein the image similarity model is obtained by the method of any one of claims 1-6.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-9 when executing the program.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110261801.3A | 2021-03-10 | 2021-03-10 | Image processing method, image processing device, electronic equipment and computer readable storage medium
Publications (1)
Publication Number | Publication Date |
---|---|
CN113704531A true CN113704531A (en) | 2021-11-26 |
Family
ID=78647761
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110261801.3A Pending CN113704531A (en) | 2021-03-10 | 2021-03-10 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113704531A (en) |
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination