CN114565780A - Target identification method and device, electronic equipment and storage medium - Google Patents
Target identification method and device, electronic equipment and storage medium
- Publication number
- CN114565780A (application CN202210175862.2A)
- Authority
- CN
- China
- Prior art keywords
- training
- target
- image
- detected
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/214—Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24—Pattern recognition; analysing; classification techniques
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a target identification method and device, an electronic device, and a storage medium. The method comprises the following steps: acquiring an image to be detected, and determining a target training sample set corresponding to the image to be detected according to source information of the image to be detected; building a training network based on a decoupled detection head, and training the training network on an initial data set to obtain an initial recognition model; training the initial recognition model on the target training sample set to obtain a target recognition model; and inputting the image to be detected into the target recognition model for target recognition to obtain a detection result of the image to be detected. That is, in the embodiment of the invention, building the training network and determining the target training samples improve the convergence speed of the model from the network-structure and data-source sides; performing initial training on the training network with the initial data set to obtain the initial recognition model and then training for different targets with the target training sample set improve the convergence speed from the training-procedure side.
Description
Technical Field
Embodiments of the present invention relate to computer technologies, and in particular, to a target identification method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of computer vision research, target detection plays an indispensable role in fields such as intelligent industrial manufacturing and artificial intelligence information technology. Image recognition is mainly used to judge whether object instances of predefined classes exist in a target image and to locate the spatial position and extent of each target to be recognized with a region box. When a target exists in the target image, its spatial position and extent are returned as the recognition result. Safety inspection in industrial manufacturing has traditionally relied on manual observation; because the work is intensive and human fatigue easily leads to danger, vision-based inspection is necessary. Existing target detection methods typically generate a large number of prior boxes that may contain objects to be detected, use a classifier to judge whether each prior box contains an object and to estimate the class confidence, correct the bounding boxes in post-processing, and finally filter out boxes with low confidence or high overlap according to certain criteria to obtain the detection result. Although such methods achieve relatively high detection accuracy, their running speed is slow.
Disclosure of Invention
The invention provides a target identification method and device, an electronic device, and a storage medium, which are used for rapidly obtaining a recognition model based on a region extraction algorithm so that different targets can be identified conveniently.
In a first aspect, an embodiment of the present invention provides a target identification method, where the method includes:
acquiring an image to be detected, and determining a target training sample set corresponding to the image to be detected according to source information of the image to be detected;
building a training network according to the decoupling detection head, and training the training network according to an initial data set to obtain an initial recognition model;
training the initial recognition model according to the target training sample set to obtain a target recognition model;
and inputting the image to be detected into the target recognition model for target recognition to obtain a detection result of the image to be detected.
Further, determining a target training sample set corresponding to the image to be detected according to the source information of the image to be detected, including:
determining a detection target according to the source information of the image to be detected;
determining a target training sample set corresponding to the image to be detected from a training database according to the detection target;
and carrying out sample sorting on the target training sample set to obtain a target label corresponding to the target training sample set.
Further, a training network is built according to the decoupling detection head, and the method comprises the following steps:
adding a data enhancement module for feature enhancement at the input end of the network, and adding the decoupling detection head for target detection in the residual network;
and adding a spatial pyramid pooling module for image normalization processing behind the backbone network, thereby obtaining the training network.
Further, the decoupling detection head comprises a convolution network, a classification detection head, a regression detection head and a confidence level detection head;
the convolutional network is used for carrying out feature dimensionality reduction, the classification detection head is used for carrying out target classification, the regression detection head is used for carrying out position identification, and the confidence detection head is used for determining the accuracy of the classification detection head and the accuracy of the regression detection head.
Further, training the training network according to the initial data set to obtain an initial recognition model, including:
training the training network according to the target detection data in the initial data set to obtain initial parameters corresponding to the training network;
and updating the initial parameters into the training network to obtain the initial recognition model.
Further, training the initial recognition model according to the target training sample set to obtain a target recognition model, including:
dividing the target training sample set into data of a first training period and data of a second training period according to a preset training period;
and training the initial recognition model according to the data of the first training period and the data of the second training period to obtain the target recognition model.
Further, training the initial recognition model according to the data of the first training period and the data of the second training period to obtain the target recognition model, including:
freezing initial parameters of the backbone network in the initial recognition model, and training the initial recognition model by using the data of the first training period to obtain a first training model;
and unfreezing the initial parameters of the backbone network in the first training model, and training the first training model by using the data of the second training period to obtain the target recognition model.
In a second aspect, an embodiment of the present invention further provides an object recognition apparatus, where the apparatus includes:
the sample determining module is used for acquiring an image to be detected and determining a target training sample set corresponding to the image to be detected according to the source information of the image to be detected;
the network building module is used for building a training network according to the decoupling detection head and training the training network according to an initial data set to obtain an initial recognition model;
the model training module is used for training the initial recognition model according to the target training sample set to obtain a target recognition model;
and the image detection module is used for inputting the image to be detected into the target recognition model for target recognition to obtain the detection result of the image to be detected.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target identification method described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the object recognition method.
According to the embodiment of the invention, a target training sample set corresponding to an image to be detected is determined by acquiring the image to be detected and according to the source information of the image to be detected; a training network is built based on the decoupling detection head and trained on the initial data set to obtain an initial recognition model; the initial recognition model is trained on the target training sample set to obtain a target recognition model; and the image to be detected is input into the target recognition model for target recognition to obtain a detection result. That is, building the training network and determining the target training samples improve the convergence speed of the model from the network-structure and data-source sides; performing initial training on the training network with the initial data set to obtain the initial recognition model and then training for different targets with the target training sample set improve the convergence speed from the training-procedure side.
Drawings
FIG. 1 is a schematic flow chart of a target identification method according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of a target identification method according to an embodiment of the present invention;
fig. 2A is a schematic structural diagram of a training network according to an embodiment of the present invention;
fig. 2B is a schematic structural diagram of a decoupling detection head according to an embodiment of the present invention;
FIG. 2C is a schematic diagram of a training process of a target recognition model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an object recognition apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Fig. 1 is a schematic flowchart of a target identification method according to an embodiment of the present invention, where the method may be executed by a target identification apparatus according to an embodiment of the present invention, and the apparatus may be implemented in software and/or hardware. In a particular embodiment, the apparatus may be integrated in an electronic device, which may be, for example, a server. The following embodiments will be described by taking as an example that the apparatus is integrated in an electronic device, and referring to fig. 1, the method may specifically include the following steps:
s110, acquiring an image to be detected, and determining a target training sample set corresponding to the image to be detected according to source information of the image to be detected;
for example, the image to be detected may come from an image acquisition device such as a camera, a video recorder, or another device with an image-capture function installed in the region to be detected, where the region to be detected is the capture region of the image to be detected. When the detection target requires emergency handling, images of the region are acquired in real time, the image to be detected is the latest image of the region, and it is used to detect whether the target is present in the region; if the target appears in the image, emergency handling is performed. When the detection target does not require emergency handling, images of the region may be acquired within a preset time period, and if the target appears in the region, operations can be adjusted according to the time and frequency of its appearance. The source information of the image to be detected may be information about the enterprise or unit from which the image was obtained; because different enterprises or units have different work attributes, the targets to be detected in the same image may differ. The target training sample set corresponding to the image to be detected may therefore be a different set for each detection target, with different target mark positions and different positive/negative sample labels.
In the specific implementation, after the image to be detected is acquired from the area to be detected according to the image acquisition equipment, the detection target corresponding to the image to be detected can be determined according to the source information of the image to be detected, and then the target training sample set corresponding to the image to be detected is determined according to the detection target corresponding to the image to be detected; each image in the target training sample set corresponding to the image to be detected has a mark position of the detected target, positive and negative samples can be distinguished according to a positive and negative sample classification method, and positive and negative sample labels are marked, so that the model can be trained according to the target training sample set to obtain a target identification model.
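As a minimal illustration of the lookup described above (all source names, targets, and file names here are hypothetical, not taken from the patent), mapping source information to a detection target and then to its training sample set can be sketched as:

```python
# Hypothetical sketch: map image-source information to a detection target,
# then to the matching training sample set in a training database.
# Every name below is made up for illustration.

SOURCE_TO_TARGET = {
    "welding_workshop": "air_gun_flame",
    "warehouse_zone_b": "hazardous_material",
}

TRAINING_DATABASE = {
    "air_gun_flame": ["flame_0001.jpg", "flame_0002.jpg"],
    "hazardous_material": ["hazmat_0001.jpg"],
}

def target_sample_set(source_info):
    """Return (detection target, target training sample set) for an image source."""
    target = SOURCE_TO_TARGET[source_info]
    return target, TRAINING_DATABASE[target]

target, samples = target_sample_set("welding_workshop")
```

In a real system the two tables would be a database keyed by enterprise/unit information rather than in-memory dictionaries.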
S120, building a training network according to the decoupling detection head, and training the training network according to the initial data set to obtain an initial recognition model;
for example, the decoupling detection head may, after a dimension-reduction operation, be connected with different detection branches and heads for detecting the class, position, and confidence of a target in the image, which improves the detection effect and the target detection speed while avoiding an increase in the amount of computation. The training network may be a neural network built as required for training the target recognition model; it comprises a backbone network and a residual network, and the decoupling detection head is arranged in the residual network. The initial data set may be understood as a standard target training data set, such as the ImageNet data set or the COCO data set. The initial recognition model may be a model with certain recognition capability trained on the class labels of the images in the standard target training data set.
In a specific implementation, the training network comprises a backbone network and a residual network; the backbone network extracts target features from the image, and the residual network performs feature recognition and position-information prediction on those features. The residual network comprises the decoupling detection head for detecting the class, position, and confidence of a target in the image, which improves the target detection speed together with the detection effect. After the training network is built, it is trained on the initial data set until the model converges, yielding an initial recognition model that can recognize the 80 or more standard target categories and serves as the base model of the target recognition model. On the basis of this base model, targeted training with the target training samples yields the required target recognition model.
S130, training the initial recognition model according to the target training sample set to obtain a target recognition model;
in a specific implementation, the images in the target training sample set are input into the initial recognition model for target recognition; the output of the initial recognition model may be the predicted position information of the detected target in each image and the confidence corresponding to that prediction. The confidence corresponding to the predicted position information may be determined from the degree of coincidence between the predicted position information and the actual position information annotated in the target training sample set. A learning correction function may also be set in the initial recognition model, and the confidence corresponding to the predicted position information is used to determine the degree of model training.
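The degree of coincidence between predicted and actual position information mentioned above is commonly measured by intersection-over-union (IoU). A minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping in a 5x10 strip: intersection 50, union 150.
score = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

An IoU of 1.0 means the predicted box coincides exactly with the annotation; 0.0 means no overlap.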
And S140, inputting the image to be detected into the target recognition model for target recognition to obtain a detection result of the image to be detected.
In a specific implementation, the detection result of the image to be detected may be an output result of the target recognition model obtained by inputting each image to be detected into the target recognition model for target recognition, where the result includes detection position information and confidence of the detected target in the image to be detected. In addition, the target recognition model can be a deep neural network model, and is obtained based on a pre-trained initial recognition model.
According to the embodiment of the invention, a target training sample set corresponding to an image to be detected is determined by acquiring the image to be detected and according to the source information of the image to be detected; a training network is built based on the decoupling detection head and trained on the initial data set to obtain an initial recognition model; the initial recognition model is trained on the target training sample set to obtain a target recognition model; and the image to be detected is input into the target recognition model for target recognition to obtain a detection result. That is, building the training network and determining the target training samples improve the convergence speed of the model from the network-structure and data-source sides, while initial training on the initial data set followed by target-specific training on the target training sample set improves the convergence speed from the training-procedure side.
The target identification method provided by the embodiment of the present invention is further described below, and as shown in fig. 2, the method may specifically include the following steps:
s210, acquiring an image to be detected, and determining a target training sample set corresponding to the image to be detected according to source information of the image to be detected;
further, determining a target training sample set corresponding to the image to be detected according to the source information of the image to be detected, including:
determining a detection target according to the source information of the image to be detected;
determining a target training sample set corresponding to an image to be detected from a training database according to the detected target;
and carrying out sample sorting on the target training sample set to obtain a target label corresponding to the target training sample set.
For example, the detection target may be the object to be detected in the image to be detected and may be determined according to the source information of the image, for example: detecting air-gun flame in an industrial environment in order to identify the usage of the air gun. The source information of the image to be detected may indicate the detection target corresponding to an enterprise's manufacturing objects, such as the use of an object during the manufacturing process or whether hazardous material is present in a particular area. The training database stores training samples corresponding to different detection targets. The target label corresponding to the target training sample set may be the positive-sample and negative-sample labels corresponding to the target in each image of the set.
In a specific implementation, when the image to be detected is obtained, its source information is obtained at the same time. The detection target corresponding to the image is determined according to the source information, and the target training sample set corresponding to the image is matched from the training database according to the detection target. The samples in the target training sample set are then divided with a positive/negative sample-division algorithm to obtain the target labels of the set. The sample-division algorithm may be the SimOTA algorithm, which screens the anchor boxes of the target training sample set and divides the samples into positive and negative samples; the initial recognition model is trained on the positive samples to obtain the target recognition model. In addition, the target training sample set may include a training subset and a test subset, whose proportion can be set from actual requirements and experimental data, for example a 7:3 split into training set and test set, so as to prevent over-fitting during training.
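The 7:3 train/test split mentioned above can be sketched as a simple shuffled split (the SimOTA positive/negative division the patent uses is a separate, more involved assignment step not shown here):

```python
import random

def split_7_3(samples, seed=0):
    """Shuffle a sample list and split it 7:3 into training and test subsets."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]

train, test = split_7_3([f"img_{i:03d}.jpg" for i in range(10)])
```

Holding out the 30% test subset is what lets over-fitting be detected: a model that fits the 70% training subset but degrades on the held-out images has over-fit.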
S220, building a training network according to the decoupling detection head, and training the training network according to the initial data set to obtain an initial recognition model;
further, a training network is built according to the decoupling detection head, and the method comprises the following steps:
adding a data enhancement module for feature enhancement at the input end of the network, and adding the decoupling detection head for target detection in the residual network;
and adding a spatial pyramid pooling module for image normalization processing behind the backbone network, thereby obtaining the training network.
In a specific implementation, the training network comprises a backbone network and a residual network; the backbone network extracts image features, and the residual network identifies the features and determines the predicted position information of the detection target in the image to be detected and the confidence of that prediction. A spatial pyramid pooling module for image normalization is added behind the backbone network, so that the network output has the same size without preprocessing the image size; different sizes of the same image can be used as input and still yield pooled features of the same length. The spatial pyramid pooling module comprises a maximum pooling layer, a concatenation function for joining several matrices, convolution layers, normalization layers, and residual components.
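The fixed-length property of spatial pyramid pooling comes from max-pooling over a fixed grid of bins rather than a fixed window size. A minimal single-level sketch over a 2-D feature map (framework-free; the patent's module additionally contains convolution, normalization, and residual components not shown):

```python
def spp_level(feature_map, bins):
    """Max-pool a 2-D feature map into a bins x bins grid, flattened.

    The output length is always bins * bins, independent of input size.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for by in range(bins):
        for bx in range(bins):
            # Bin boundaries scale with the input; each bin covers >= 1 cell.
            y0, y1 = by * h // bins, max((by + 1) * h // bins, by * h // bins + 1)
            x0, x1 = bx * w // bins, max((bx + 1) * w // bins, bx * w // bins + 1)
            pooled.append(max(feature_map[y][x]
                              for y in range(y0, y1) for x in range(x0, x1)))
    return pooled

small = [[float(x + y) for x in range(4)] for y in range(4)]   # 4x4 map
large = [[float(x + y) for x in range(9)] for y in range(6)]   # 6x9 map
```

Both inputs produce a length-4 feature for `bins=2`, which is exactly why the backbone output needs no size preprocessing.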
In the embodiment of the invention, a data enhancement module for feature enhancement is added at the input end of the network; the module may use the Mosaic algorithm or the Mixup algorithm. Mosaic stitches images by random scaling, random cropping, and random arrangement, which improves the detection of small targets; Mixup models the region between samples as linear, which reduces the memorization of erroneous labels and increases robustness. The data enhancement module augments the image data fed into the training network and thus prevents over-fitting during training. In the backbone network, anchor-free detection and the SimOTA algorithm may be added to screen the candidate prediction boxes in the target identification process, so that prediction is accurate and the amount of computation in prediction is reduced. In addition, the backbone network may use the YOLOX-Darknet53 architecture.
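Mixup, one of the two augmentation options named above, blends two samples and their labels linearly with a coefficient λ. A minimal sketch on flat pixel lists (in practice the operation runs on image tensors and λ is typically drawn from a Beta distribution, which is an assumption here, not stated in the patent):

```python
def mixup(image_a, image_b, label_a, label_b, lam):
    """Linearly blend two images and their one-hot labels with coefficient lam."""
    image = [lam * a + (1 - lam) * b for a, b in zip(image_a, image_b)]
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return image, label

# Blend two tiny two-pixel "images" with lam = 0.7.
img, lab = mixup([1.0, 0.0], [0.0, 1.0], [1, 0], [0, 1], lam=0.7)
```

Because the blended label is soft (0.7/0.3 rather than 1/0), the network is discouraged from memorizing any single, possibly mislabeled, sample.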
Fig. 2A is a schematic structural diagram of a training network according to an embodiment of the present invention, and as shown in fig. 2A, a data enhancement module is added at an input end of the network to enhance features of an image input into the network, and then the image is input into a backbone network to perform feature extraction, so as to obtain a feature map of the image. Inputting the feature map into a spatial pyramid pooling module for normalization processing to obtain pooling features with the same length, inputting the pooling features into a residual network for image feature detection, and predicting position information and confidence of a detection target.
Furthermore, the decoupling detection head comprises a convolution network, a classification detection head, a regression detection head and a confidence level detection head;
wherein the convolution network is used for feature dimension reduction, the classification detection head is used for target classification, the regression detection head is used for position identification, and the confidence detection head is used for determining the accuracy of the classification detection head and the regression detection head.
In a specific implementation, the decoupling detection head in the residual network of the training network may use a convolution network to reduce the feature dimension and expand the receptive field of the feature map corresponding to the image to be detected. For example, a 1 × 1 convolution layer may reduce the dimension of feature maps with different channel numbers, and several 3 × 3 convolution layers may then expand the receptive field along two branches: one branch forms the classification detection head, the other forms the regression detection head, and an IoU branch is added to the output of the regression detection head to form the confidence detection head, which determines the accuracy of the classification and regression detection heads.
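The branch structure just described can be illustrated by its per-location output shapes alone. The channel counts below follow the common YOLOX convention (num_classes for classification, 4 for box regression, 1 for the IoU/confidence branch); this is an assumption for illustration, not a quotation from the patent:

```python
def decoupled_head_shapes(h, w, num_classes):
    """Output shapes of the three decoupled branches on an h x w feature map."""
    return {
        "cls": (h, w, num_classes),  # one score per class per location
        "reg": (h, w, 4),            # box parameters (x, y, w, h) per location
        "obj": (h, w, 1),            # IoU-aware confidence per location
    }

shapes = decoupled_head_shapes(20, 20, num_classes=80)
```

Keeping the three outputs on separate branches (rather than one coupled tensor) is what lets classification and regression each get their own 3 × 3 convolution stack after the shared 1 × 1 dimension reduction.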
Fig. 2B is a schematic structural diagram of the decoupling detection head according to the embodiment of the present invention. As shown in Fig. 2B, by arranging several convolution layers of different dimensions in the convolution network, the image features corresponding to the image to be detected undergo dimension reduction and receptive-field expansion, providing clear feature maps for the subsequent classification detection head, regression detection head, and confidence detection head.
Further, training the training network according to the initial data set to obtain an initial recognition model, including:
training the training network according to the target detection data in the initial data set to obtain initial parameters corresponding to the training network;
and updating the initial parameters into the training network to obtain the initial recognition model.
In a specific implementation, the initial data set consists of images publicly available on the network, divided into training, validation, and test sets. The initial data set may be a common annotated target detection set, i.e., the COCO data set. After the training network is built, it is trained on the initial data set until the model converges, yielding an initial recognition model that can recognize the 80 or more standard target categories and serves as the base model of the target recognition model. On the basis of this base model, targeted training with the target training samples yields the required target recognition model.
S230, dividing a target training sample set into data of a first training period and data of a second training period according to a preset training period;
in a specific implementation, the preset training period may be a number of training sub-periods (epochs) preset according to actual requirements and experimental data; within the preset training period each image serves as a training sample, and one pass over the training samples counts as one training sub-period. The data of the first training period are the data of the target training sample set used to train the parameters of the remaining networks while the backbone network is frozen; the data of the second training period are used to train the parameters of the whole training network after the backbone network is unfrozen.
S240, training the initial recognition model according to the data of the first training period and the data of the second training period to obtain a target recognition model.
In a specific implementation, the data of the first training period are input into the initial recognition model to perform the model training of the first training period, updating the parameters of the initial recognition model. The data of the second training period are then input into the model with the updated parameters to perform the model training of the second training period, and a second parameter update yields the target recognition model.
Further, training the initial recognition model according to the data of the first training period and the data of the second training period to obtain a target recognition model, including:
freezing initial parameters of the backbone network in the initial recognition model, and training the initial recognition model by using data of a first training period to obtain a first training model;
and unfreezing the initial parameters of the backbone network in the first training model, and training the first training model by using the data of the second training period to obtain the target recognition model.
In a specific implementation, the initial parameters of the backbone network in the initial recognition model are frozen, the data of the first training period are input into the initial recognition model for the model training of the first training period, and the parameters of the model are updated to obtain a first training model; the initial parameters of the backbone network in the first training model are then unfrozen, the data of the second training period are input into the first training model for the model training of the second training period, and a second parameter update yields the target recognition model. For example: the preset training period is set to 300 training sub-periods, and the data of the first 50 training sub-periods serve as the data of the first training period; the initial parameters of the backbone network are frozen, and training on the data of the first training period produces a first training model; the initial parameters of the backbone network are then unfrozen, and the data of the remaining 250 sub-periods, serving as the data of the second training period, are input into the first training model for the model training of the second training period, obtaining the target recognition model. When the target to be detected is the flame of an air gun at the target position, an air-gun flame recognition model is thereby obtained.
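The freeze/unfreeze mechanics can be sketched without committing to any particular deep-learning framework: each parameter group carries a trainable flag, and an update step only touches groups whose flag is set (in PyTorch, for instance, this role is played by each parameter's `requires_grad` attribute). All names below are illustrative:

```python
class ParamGroup:
    """A stand-in for one network component's parameters."""
    def __init__(self, name):
        self.name = name
        self.trainable = True
        self.updates = 0   # counts how many optimisation steps touched this group

def train_epochs(groups, n_epochs):
    """Apply one (simulated) update per epoch to every trainable group."""
    for _ in range(n_epochs):
        for g in groups:
            if g.trainable:
                g.updates += 1

backbone, head = ParamGroup("backbone"), ParamGroup("decoupling_head")
model = [backbone, head]

backbone.trainable = False       # freeze the backbone's initial parameters
train_epochs(model, 50)          # first training period: only the head trains
backbone.trainable = True        # unfreeze the backbone
train_epochs(model, 250)         # second training period: everything trains
```

After the run, the head has seen all 300 sub-periods while the backbone has only been updated during the final 250, which is exactly the schedule described above.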
Fig. 2C is a schematic diagram of a training process of a target recognition model according to an embodiment of the present invention, as shown in fig. 2C, acquiring an image to be detected, determining a target training sample set corresponding to the image to be detected according to source information of the image to be detected, and acquiring an initial data set on a network; and building a training network, and inputting the initial data set into the built training network for initial training to obtain an initial recognition model. And inputting the target training sample set into the initial recognition model for training based on the initial recognition model to obtain a target recognition model corresponding to the image to be detected.
And S250, inputting the image to be detected into a target recognition model for target recognition to obtain a detection result of the image to be detected.
According to the embodiment of the invention, an image to be detected is acquired, and a target training sample set corresponding to the image to be detected is determined according to the source information of the image to be detected; a training network is built according to the decoupling detection head and trained on the initial data set to obtain an initial recognition model; the initial recognition model is trained on the target training sample set to obtain a target recognition model; and the image to be detected is input into the target recognition model for target recognition to obtain the detection result of the image to be detected. That is, through the building of the training network and the determination of the target training samples, the embodiment of the invention improves the convergence rate of the model from the network structure and the data source; and by performing initial training of the training network on the initial data set to obtain the initial recognition model and then training for different targets on the target training sample set, the setting of training periods further improves the convergence speed of the model from the training procedure.
Fig. 3 is a schematic structural diagram of an object recognition apparatus according to an embodiment of the present invention, and as shown in fig. 3, the object recognition apparatus includes:
the sample determining module 310 is configured to obtain an image to be detected, and determine a target training sample set corresponding to the image to be detected according to source information of the image to be detected;
the network building module 320 is used for building a training network according to the decoupling detection head and training the training network according to an initial data set to obtain an initial recognition model;
the model training module 330 is configured to train the initial recognition model according to the target training sample set to obtain a target recognition model;
the image detection module 340 is configured to input the image to be detected into the target recognition model for target recognition, so as to obtain a detection result of the image to be detected.
In an embodiment, the sample determining module 310 determines, according to the source information of the image to be detected, a target training sample set corresponding to the image to be detected, including:
determining a detection target according to the source information of the image to be detected;
determining a target training sample set corresponding to the image to be detected from a training database according to the detection target;
and carrying out sample sorting on the target training sample set to obtain a target label corresponding to the target training sample set.
In one embodiment, the network building module 320 builds a training network according to the decoupling detection head, including:
adding a data enhancement module for feature enhancement at the input end of the network, and adding the decoupling detection head for target detection in the remaining network;
and adding a spatial pyramid pooling module for image normalization processing behind the backbone network, thereby obtaining the training network.
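As described, a spatial pyramid pooling (SPP) module is appended after the backbone. In SPP blocks of the kind used by YOLO-style detectors, the input feature map is concatenated with stride-1, same-padded max-poolings at several kernel sizes (commonly 5, 9, and 13), fusing multi-scale context while preserving spatial size. A minimal pure-Python sketch (feature maps as lists of lists; the kernel sizes in the usage example are shrunk to fit the toy input):

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with 'same' padding over one 2D feature map,
    so the spatial size is preserved."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [fmap[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out

def spp(channels, kernels=(5, 9, 13)):
    """Concatenate the input channels with their max-pooled copies at several
    kernel sizes, as in SPP blocks of YOLO-style detectors."""
    out = list(channels)
    for k in kernels:
        out.extend(max_pool_same(c, k) for c in channels)
    return out

fmap = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
fused = spp([fmap], kernels=(3, 5))   # small kernels for the small example
```

One input channel with two kernel sizes yields three output channels of identical spatial size, which is the channel-expansion behaviour a following 1×1 convolution would then compress.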
In one embodiment, the decoupling detection head comprises a convolutional network, a classification detection head, a regression detection head and a confidence level detection head;
the convolutional network is used for carrying out feature dimensionality reduction, the classification detection head is used for carrying out target classification, the regression detection head is used for carrying out position identification, and the confidence detection head is used for determining the accuracy of the classification detection head and the accuracy of the regression detection head.
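The branch widths implied by this description can be made concrete. In decoupled heads of the YOLOX style, after a 1×1 convolution reduces feature dimensionality, the classification branch predicts one score per category, the regression branch the 4 box coordinates, and the confidence (objectness) branch a single score per location. A sketch of the per-location output layout (YOLOX-style widths assumed; the patent itself does not fix exact numbers):

```python
def decoupled_head_channels(num_classes):
    """Per-location output channels of each branch of a decoupled head."""
    return {
        "classification": num_classes,  # one score per target category
        "regression": 4,                # box coordinates (x, y, w, h)
        "confidence": 1,                # objectness / accuracy score
    }

def coupled_head_channels(num_classes, num_anchors=1):
    """For comparison: a coupled head packs all predictions into one tensor."""
    per_anchor = sum(decoupled_head_channels(num_classes).values())
    return num_anchors * per_anchor

layout = decoupled_head_channels(80)   # e.g. the 80 COCO categories
```

Decoupling changes where the predictions are computed (separate branches after separate convolutions) rather than how many there are, which is why the totals of the two layouts match.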
In an embodiment, the network building module 320 trains the training network according to an initial data set to obtain an initial recognition model, including:
training the training network according to the target detection data in the initial data set to obtain initial parameters corresponding to the training network;
and updating the initial parameters into the training network to obtain the initial recognition model.
In an embodiment, the training of the initial recognition model by the model training module 330 according to the target training sample set to obtain a target recognition model includes:
dividing the target training sample set into data of a first training period and data of a second training period according to a preset training period;
and training the initial recognition model according to the data of the first training period and the data of the second training period to obtain the target recognition model.
In an embodiment, the training of the initial recognition model by the model training module 330 according to the data of the first training period and the data of the second training period to obtain the target recognition model includes:
freezing initial parameters of the backbone network in the initial recognition model, and training the initial recognition model by using the data of the first training period to obtain a first training model;
and unfreezing the initial parameters of the backbone network in the first training model, and training the first training model by using the data of the second training period to obtain the target recognition model.
According to the device provided by the embodiment of the invention, an image to be detected is acquired, and a target training sample set corresponding to the image to be detected is determined according to the source information of the image to be detected; a training network is built according to the decoupling detection head and trained on the initial data set to obtain an initial recognition model; the initial recognition model is trained on the target training sample set to obtain a target recognition model; and the image to be detected is input into the target recognition model for target recognition to obtain the detection result of the image to be detected. That is, through the building of the training network and the determination of the target training samples, the embodiment of the invention improves the convergence rate of the model from the network structure and the data source; and by performing initial training of the training network on the initial data set to obtain the initial recognition model and then training for different targets on the target training sample set, the setting of training periods further improves the convergence speed of the model from the training procedure.
Fig. 4 is a schematic structural diagram of an electronic device according to embodiment 4 of the present invention. FIG. 4 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 4 is only an example and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in FIG. 4, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The processing unit 16 executes various functional applications and data processing by running a program stored in the system memory 28, for example, to implement an object recognition method provided by an embodiment of the present invention, the method including:
acquiring an image to be detected, and determining a target training sample set corresponding to the image to be detected according to source information of the image to be detected;
building a training network according to the decoupling detection head, and training the training network according to an initial data set to obtain an initial recognition model;
training the initial recognition model according to the target training sample set to obtain a target recognition model;
and inputting the image to be detected into the target recognition model for target recognition to obtain a detection result of the image to be detected.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the target identification method, and the method includes:
acquiring an image to be detected, and determining a target training sample set corresponding to the image to be detected according to source information of the image to be detected;
building a training network according to the decoupling detection head, and training the training network according to an initial data set to obtain an initial recognition model;
training the initial recognition model according to the target training sample set to obtain a target recognition model;
and inputting the image to be detected into the target recognition model for target recognition to obtain a detection result of the image to be detected.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A method of object recognition, comprising:
acquiring an image to be detected, and determining a target training sample set corresponding to the image to be detected according to source information of the image to be detected;
building a training network according to the decoupling detection head, and training the training network according to an initial data set to obtain an initial recognition model;
training the initial recognition model according to the target training sample set to obtain a target recognition model;
and inputting the image to be detected into the target recognition model for target recognition to obtain a detection result of the image to be detected.
2. The method according to claim 1, wherein determining a target training sample set corresponding to the image to be detected according to the source information of the image to be detected comprises:
determining a detection target according to the source information of the image to be detected;
determining a target training sample set corresponding to the image to be detected from a training database according to the detection target;
and carrying out sample sorting on the target training sample set to obtain a target label corresponding to the target training sample set.
3. The method of claim 1, wherein building a training network from the decoupled detector head comprises:
adding a data enhancement module for feature enhancement at the input end of the network, and adding the decoupling detection head for target detection in the remaining network;
and adding a spatial pyramid pooling module for image normalization processing behind the backbone network, thereby obtaining the training network.
4. The method of claim 3, wherein the decoupled detection heads comprise a convolutional network, a classification detection head, a regression detection head, and a confidence detection head;
the convolutional network is used for carrying out feature dimensionality reduction, the classification detection head is used for carrying out target classification, the regression detection head is used for carrying out position identification, and the confidence detection head is used for determining the accuracy of the classification detection head and the accuracy of the regression detection head.
5. The method of claim 1, wherein training the training network according to an initial data set to obtain an initial recognition model comprises:
training the training network according to the target detection data in the initial data set to obtain initial parameters corresponding to the training network;
and updating the initial parameters into the training network to obtain the initial recognition model.
6. The method of claim 1, wherein training the initial recognition model according to the target training sample set to obtain a target recognition model comprises:
dividing the target training sample set into data of a first training period and data of a second training period according to a preset training period;
and training the initial recognition model according to the data of the first training period and the data of the second training period to obtain the target recognition model.
7. The method of claim 6, wherein training the initial recognition model according to the data of the first training period and the data of the second training period to obtain the target recognition model comprises:
freezing initial parameters of the backbone network in the initial recognition model, and training the initial recognition model by using the data of the first training period to obtain a first training model;
and unfreezing the initial parameters of the backbone network in the first training model, and training the first training model by using the data of the second training period to obtain the target recognition model.
8. An object recognition apparatus, comprising:
the sample determining module is used for acquiring an image to be detected and determining a target training sample set corresponding to the image to be detected according to the source information of the image to be detected;
the network building module is used for building a training network according to the decoupling detection head and training the training network according to an initial data set to obtain an initial recognition model;
the model training module is used for training the initial recognition model according to the target training sample set to obtain a target recognition model;
and the image detection module is used for inputting the image to be detected into the target recognition model for target recognition to obtain the detection result of the image to be detected.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the object recognition method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the object recognition method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210175862.2A CN114565780A (en) | 2022-02-25 | 2022-02-25 | Target identification method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114565780A true CN114565780A (en) | 2022-05-31 |
Family
ID=81716640
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210175862.2A Pending CN114565780A (en) | 2022-02-25 | 2022-02-25 | Target identification method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114565780A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115393338A (en) * | 2022-09-02 | 2022-11-25 | 复旦大学附属中山医院 | Biological tissue identification model construction method and device and electronic equipment |
CN115546187A (en) * | 2022-10-28 | 2022-12-30 | 北京市农林科学院 | Agricultural pest and disease detection method and device based on YOLO v5 |
CN117496274A (en) * | 2023-12-29 | 2024-02-02 | 墨卓生物科技(浙江)有限公司 | Classification counting method, system and storage medium based on liquid drop images |
CN117496274B (en) * | 2023-12-29 | 2024-06-11 | 墨卓生物科技(浙江)有限公司 | Classification counting method, system and storage medium based on liquid drop images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||