CN113569600A - Method and apparatus for object re-identification, electronic device and storage medium - Google Patents

Method and apparatus for object re-identification, electronic device and storage medium

Info

Publication number
CN113569600A
CN113569600A (application CN202010359731.0A)
Authority
CN
China
Prior art keywords: target, target detection, frame, current image, image frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010359731.0A
Other languages
Chinese (zh)
Inventor
曾卓熙
胡文泽
王孝宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202010359731.0A
Publication of CN113569600A
Legal status: Pending (Current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method, an apparatus, an electronic device and a storage medium for object re-identification, wherein the method comprises the following steps: acquiring a current image frame of a video stream; inputting the current image frame into a pre-trained target detection model, wherein the target detection model comprises a detector and a re-identification network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object; and matching the re-identification feature value with the existing object feature values in the database to obtain a re-identification result of the target object. By adopting the method and apparatus, the efficiency of re-identifying objects in a video stream is improved.

Description

Method and apparatus for object re-identification, electronic device and storage medium
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a method and an apparatus for object re-identification, an electronic device, and a storage medium.
Background
With the development of computer vision technology, the field of video and image processing has gained more research directions, and object re-identification is one of the popular ones. In a video stream, object re-identification means identifying objects that may be the same across frames, and it plays an indispensable role in intelligent video surveillance, automatic driving, target tracking and the like. In existing object re-identification, a detector is generally used to recognize objects in the current video image frame, a separate feature extraction network extracts features of the recognized objects, and the features are then matched against the existing features in a database; the whole process therefore requires a large amount of resources, and object re-identification efficiency is low.
Disclosure of Invention
In view of the foregoing problems, the present application provides a method, an apparatus, an electronic device, and a storage medium for object re-identification, which are beneficial to improving the efficiency of re-identifying an object in a video stream.
In order to achieve the above object, a first aspect of the embodiments of the present application provides a method for re-identifying an object, where the method includes:
acquiring a current image frame of a video stream;
inputting the current image frame into a pre-trained target detection model, wherein the target detection model comprises a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object;
and matching the re-recognition characteristic value with the existing object characteristic value in the database to obtain a re-recognition result of the target object.
With reference to the first aspect, in one possible implementation, the detector is an anchor-based detector; the performing target detection on the current image frame to obtain a target area of a target object in the current image frame includes:
extracting the features of the current image frame through the detector to obtain a target feature map;
predicting the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object, and determining an area framed by the target detection frame as the target area.
With reference to the first aspect, in a possible implementation manner, the predicting the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object includes:
dividing the current image frame into a plurality of grids according to the size of the target feature map;
and if the center of the target object falls into the grid, predicting the target object by using the anchor frame preset by the grid to obtain the target detection frame.
With reference to the first aspect, in a possible implementation manner, the predicted values of the anchor frame include a coordinate offset value of the center point of the anchor frame and scaling values of the width and the height of the anchor frame, and the obtaining the target detection frame includes:
calculating to obtain a central point coordinate of the target detection frame according to the central point coordinate offset value;
calculating the width and height of the target detection frame according to the scaling values of the width and the height;
and determining the position of the target detection frame according to the coordinates of the central point of the target detection frame and the width and height of the target detection frame.
With reference to the first aspect, in one possible implementation manner, the training process of the target detection model includes:
adding the training loss of the re-recognition network to the original training loss of the detector to obtain the added training loss;
inputting a training data set consisting of a plurality of objects into the target detection model for iteration, adjusting network parameters of the target detection model according to the added training loss value, and fitting the output of the re-recognition network; the target object is included in the plurality of objects.
A second aspect of the embodiments of the present application provides an apparatus for re-identifying an object, where the apparatus includes:
the image frame acquisition module is used for acquiring a current image frame of the video stream;
the target detection and re-recognition module is used for inputting the current image frame into a pre-trained target detection model, and the target detection model comprises a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object;
and the re-recognition characteristic value matching module is used for matching the re-recognition characteristic value with the existing object characteristic value in the database to obtain a re-recognition result of the target object.
A third aspect of the embodiments of the present application provides an electronic device, which includes an input device, an output device, and a processor adapted to implement one or more instructions, as well as a computer storage medium storing one or more instructions adapted to be loaded by the processor to perform the following steps:
acquiring a current image frame of a video stream;
inputting the current image frame into a pre-trained target detection model, wherein the target detection model comprises a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object;
and matching the re-recognition characteristic value with the existing object characteristic value in the database to obtain a re-recognition result of the target object.
A fourth aspect of embodiments of the present application provides a computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by a processor and to perform the following steps:
acquiring a current image frame of a video stream;
inputting the current image frame into a pre-trained target detection model, wherein the target detection model comprises a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object;
and matching the re-recognition characteristic value with the existing object characteristic value in the database to obtain a re-recognition result of the target object.
The above scheme of the present application has at least the following beneficial effects: compared with the prior art, the embodiment of the application acquires the current image frame of a video stream; then inputs the current image frame into a pre-trained target detection model, wherein the target detection model comprises a detector and a re-recognition network; performs target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performs feature extraction on the target area through the re-recognition network to obtain a re-recognition feature value of the target object; and finally matches the re-recognition feature value with the existing object feature values in the database to obtain the re-recognition result of the target object. The re-recognition network for re-recognition feature value extraction and the detector thus form one target detection model: the target detection task and the re-recognition feature value extraction task are carried out in the same network model, and the re-recognition feature values are output alongside the original detector's output result. There is no need to use a separate feature extraction network to extract the re-recognition feature value of each target object in the current image frame, so the efficiency of target object re-recognition is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for re-identifying an object according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of another object re-identification method according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a target detection model according to an embodiment of the present disclosure;
FIG. 5 is an exemplary diagram of inputs to outputs of a target detection model provided by an embodiment of the present application;
fig. 6 is a schematic structural diagram of an apparatus for re-identifying an object according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of another object re-identification apparatus provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only some embodiments of the present application, not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "comprising" and "having," and any variations thereof, as appearing in the specification, claims and drawings of this application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used to distinguish between different objects and are not used to describe a particular order.
The embodiment of the application provides a scheme for re-identifying an object in a video stream so as to improve re-identification efficiency. Compared with the existing scheme, in which a training data set is used separately to train a feature extraction network and the target detection task and the re-identification feature value extraction task are carried out separately, here the re-identification network is added into the target detector. When the target detector is trained, the training loss of the re-identification network is added on top of the training loss of the target detector, and the re-identification network is trained in a distillation manner. One model thus outputs the target detection result and the re-identification feature value of the object simultaneously, end to end; hyper-parameters are reduced, the re-identification network and the target detector share all of their computation, and the overall computational complexity is not increased.
Specifically, a possible application environment of the solution of the embodiments of the present application is first described with reference to the accompanying drawings. Referring to fig. 1, fig. 1 is a schematic view of an application environment provided in an embodiment of the present application. As shown in fig. 1, the application environment includes a video acquisition area and a video processing center; in some cases, the video processing center may be understood as a video monitoring center. The video processing center adopts a VCN (Video Cloud Node) video management service and has functions such as real-time video viewing, video forwarding, and video playback. The video processing center takes a server as the execution subject, and a user can interact with the server through a terminal, for example by deploying the target detection model with the added re-identification network on the server through the terminal, while the server's processing results on a video file can be displayed on the terminal. In the video processing center, the database may be a local database (for example, a database in the server) or a database independent of the server (for example, a cloud database or a third-party database), and may be used to store video files, sample images of objects, re-identification feature values and the like in a structured manner; the stored data can also be called by the server or a terminal. The video acquisition area is the area covered by image acquisition equipment and may be, for example, an industrial park, a street, or a district entrance. The image acquisition equipment arranged in it can transmit the captured video to the server, which performs a series of re-identification processing; in some cases, the video captured by the image acquisition equipment can also be transmitted directly to the database for storage. The communication network between the video acquisition area and the parts of the video processing center includes, but is not limited to, a virtual private network, a local area network, and a metropolitan area network. The object re-identification scheme proposed in the present application may be implemented in the application environment shown in fig. 1; of course, fig. 1 is only an example and does not limit the embodiments of the present application.
Based on the application environment shown in fig. 1, the following describes in detail the object re-identification method provided in the embodiment of the present application with reference to other drawings.
Referring to fig. 2, fig. 2 is a flowchart illustrating an object re-identification method according to an embodiment of the present application, which is executed by a server. As shown in fig. 2, the method includes steps S21-S23:
S21, acquiring a current image frame of the video stream;
In this embodiment of the application, the video stream may be a video uploaded by an image capturing device in the video capturing area, or a video that the server acquires from a database or another platform through an interface; the current image frame is the image frame transmitted and displayed at the current time point. A purely illustrative sketch of this acquisition step follows.
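The patent does not prescribe an API for reading the stream; the following is a minimal sketch only, assuming OpenCV and a hypothetical stream URL (both are assumptions, not part of the patent):

```python
import cv2

# Minimal sketch: pull the current frame of a video stream for processing.
# The RTSP URL is a hypothetical placeholder.
cap = cv2.VideoCapture("rtsp://camera.example/stream")
ok, current_frame = cap.read()  # the image frame at the current time point
if not ok:
    raise RuntimeError("failed to read a frame from the video stream")
cap.release()
```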
S22, inputting the current image frame into a pre-trained target detection model, wherein the target detection model comprises a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object;
in the embodiment of the present application, the target object in the current image frame may be any object such as a pedestrian, a vehicle, etc., the detector of the target detection model is not limited, and the detector based on the anchor (anchor point or anchor frame) may perform the task of performing target detection on the current image frame, for example: fast R-CNN (fast Region-conditional Neural Networks, Faster candidate area Convolutional Neural network detectors), YOLO (youonly Look one, object-at-a-glance Detector) -V2, YOLO-V3, SSD (Single Shot multi box Detector), and so on. For the selected target detector, the whole network structure is basically kept unchanged, a re-recognition network is added into an output layer, and then iterative training is carried out by adopting a training data set, so that the obtained target detection model can output more re-recognition characteristic values of the target object in the current image frame on the basis of the original output result.
For the acquired current image frame, it is input into the detector of the target detection model, which performs feature extraction with a full convolution network; different target detection models then process the extracted feature map differently. For example, in Faster R-CNN one branch of the feature map enters an RPN (Region Proposal Network) to generate offset values for anchor frame and bounding box regression, candidate target detection frames of the target objects in the current image frame are calculated from them, and an ROI Pooling (Region of Interest Pooling) layer extracts the target feature map from the feature map and feeds it into fully connected and softmax layers for target object classification and fine bounding box regression. YOLO-V3, in contrast, down-samples the extracted feature map by factors of 32, 16 and 8 to obtain three feature maps of different sizes; anchor frames of different sizes are applied to each of the three feature maps to generate offset values from which the target detection frames of the target objects are calculated, and the class of each target object is predicted with logistic regression. Only two target detection models are described here, and different target detection models differ in their processing, but all of them finally produce the target detection frames of all target objects in the current image frame. The area framed by a target detection frame is the target area on which the re-recognition network performs feature extraction: the re-recognition network applies a convolution operation to the target area and outputs the re-recognition feature value of the target object, instead of the target areas of different target objects being fed into a separate feature extraction network. For example, assuming there are 8 target objects in the current image frame, the conventional method uses a detector for target detection and then extracts re-recognition features for each target object independently, requiring 8 runs of a feature extraction network; in the scheme of the present application, the detection results and all 8 re-recognition feature values are produced in a single forward pass of the same model.
And S23, matching the re-recognition characteristic value with the existing object characteristic value in the database to obtain the re-recognition result of the target object.
In the embodiment of the present application, the existing object feature values in the database, that is, the feature values of objects appearing in image frames before the current image frame, are matched against the re-recognition feature value extracted in step S22. This can be implemented with matrix operations, so that the existing object feature value with the highest matching degree is found quickly and the target object is highlighted on the display interface of the terminal, for example by calibrating a pedestrian who appears in both a previous image frame and the current image frame. If no existing object feature value with a sufficiently high matching degree is found, the target object appears in the current image frame for the first time; its re-recognition feature value can be stored directly in the database, or, in some embodiments, a prompt that no identical target object was found can be returned to the terminal. Optionally, matching the re-recognition feature value against the existing object feature values in the database may be done by computing cosine similarity, Euclidean distance, earth mover's distance and the like, as in the sketch below.
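As a purely illustrative sketch of this matching step, using cosine similarity (one of the measures named above); the function name, threshold and gallery layout are assumptions, not part of the patent:

```python
import numpy as np

def match_reid_feature(query: np.ndarray, gallery: np.ndarray, threshold: float = 0.7):
    """Match one 512-d re-identification feature against the database.

    `gallery` is assumed to be an (N, 512) matrix of existing object
    feature values; one matrix operation scores all entries at once.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                     # cosine similarity to every stored feature
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return best, float(sims[best])   # re-identified as existing object `best`
    return None, float(sims[best])       # first appearance: store the new feature
```

If the best similarity falls below the assumed threshold, the object is treated as a first appearance and its feature value is written to the database, as described above.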
It can be seen that, in the embodiment of the present application, the current image frame of a video stream is acquired; the current image frame is then input into a pre-trained target detection model comprising a detector and a re-recognition network; target detection is performed on the current image frame through the detector to obtain a target area of a target object in the current image frame, and feature extraction is performed on the target area through the re-recognition network to obtain a re-recognition feature value of the target object; finally, the re-recognition feature value is matched with the existing object feature values in the database to obtain the re-recognition result of the target object. The re-recognition network for re-recognition feature value extraction and the detector thus form one target detection model: the target detection task and the re-recognition feature value extraction task are carried out in the same network model, the re-recognition feature values are output alongside the original detector's output result, and there is no need to use a separate feature extraction network for each target object in the current image frame, so the efficiency of target object re-recognition is improved.
Referring to fig. 3, fig. 3 is a flowchart illustrating another object re-identification method according to an embodiment of the present application, which can also be implemented based on the application environment shown in fig. 1. As shown in fig. 3, the method includes steps S31-S35:
S31, acquiring a current image frame of the video stream;
S32, inputting the current image frame into a detector of a pre-trained target detection model for feature extraction to obtain a target feature map; the detector is an anchor point-based detector;
S33, predicting the target object in the current image frame based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object, and determining the region framed by the target detection frame as a target region;
S34, performing feature extraction on the target region through a re-identification network of the target detection model to obtain a re-identification feature value of the target object;
in the specific embodiment of the present application, a YOLO-V3 detector based on anchor is used as a detector of a target detection model, and as shown in fig. 4, a current image frame of a video stream is used as an input image, 53 convolutional layers are used to extract a basic feature map, 32-fold down-sampling is performed on several convolutional layers after 79 layers to obtain a 13 × 13 target feature map, then the up-sampling is performed on the target feature map, the feature map obtained by the up-sampling is fused with a 62 th layer feature map to obtain a 91 st layer feature map, 16-fold down-sampling is performed on the 91 st layer feature map to obtain a 26 × 26 target feature map, then the 91 st layer feature map is up-sampled, the feature map obtained by the up-sampling is fused with a 36 th layer feature map, and finally 8-fold down-sampling is performed to obtain a 52 target feature map.
In a possible implementation manner, the predicting a target object in the current image frame based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object includes:
and dividing the input current image frame into a plurality of grids according to the size of the target feature map. For example: when the target feature map of 13 × 13 is used for positioning the target object, the current image frame is divided into 13 × 13 grids, and when the target feature map of 26 × 26 is used for positioning the target object, the current image frame is divided into 26 × 26 grids, and the same is true for the target diagnostic map of 52 × 52.
And if the center of the target object falls into a certain grid, the target object is predicted using the anchor frames preset for that grid to obtain the target detection frame of the target object. In YOLO-V3, three anchor frames (also called prior frames) of different sizes are applied to the target feature map of each scale, that is, three anchor frames are preset for each divided grid. If, for example, the center of pedestrian A falls exactly in a middle grid of the 13 × 13 feature map, that grid predicts for each anchor frame the center point coordinate offset values t_x, t_y and the width and height scaling values t_w, t_h. From the center point offset values t_x and t_y, the center point coordinates (b_x, b_y) of the target detection frame are calculated with the formulas:

b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y

where c_x and c_y are the coordinates of the upper left corner of the grid in the target feature map. From the width and height scaling values t_w and t_h, the width b_w and height b_h of the target detection frame are calculated with the formulas:

b_w = p_w · e^(t_w), b_h = p_h · e^(t_h)

where p_w and p_h are the width and height of the anchor frame mapped onto the target feature map. The values b_x, b_y, b_w and b_h determine the position of the target detection frame, and the area framed by the target detection frame is the target area where the target object is located. The re-recognition network convolves the target area and outputs a 512-dimensional re-recognition feature value on top of the original YOLO-V3 output result. As shown in fig. 5, for an input 416 × 416 × 3 current image frame, target feature maps at three scales are obtained through feature extraction by the full convolution network. The original YOLO-V3 predictions on the three target feature maps are: 13 × 13 × 3 × (positioning coordinates (4) + confidence (1) + object recognition probability (C)), 26 × 26 × 3 × (positioning coordinates (4) + confidence (1) + object recognition probability (C)) and 52 × 52 × 3 × (positioning coordinates (4) + confidence (1) + object recognition probability (C)). Since the output layer now incorporates the re-recognition network (a convolution layer) that extracts re-recognition feature values for the target area, the outputs become: 13 × 13 × 3 × (positioning coordinates (4) + confidence (1) + object recognition probability (C) + re-recognition feature value (512)), 26 × 26 × 3 × (positioning coordinates (4) + confidence (1) + object recognition probability (C) + re-recognition feature value (512)) and 52 × 52 × 3 × (positioning coordinates (4) + confidence (1) + object recognition probability (C) + re-recognition feature value (512)).
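As an illustrative sketch of the decoding just described: the helper function and its arguments are assumptions for illustration, but the formulas are the ones above:

```python
import numpy as np

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Decode one anchor prediction into a target detection frame.

    (cx, cy): upper-left corner of the responsible grid cell on the
    feature map; (pw, ph): anchor width/height mapped onto the feature
    map; `stride` scales feature-map units back to image pixels.
    """
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    bx = sigmoid(tx) + cx          # b_x = sigma(t_x) + c_x
    by = sigmoid(ty) + cy          # b_y = sigma(t_y) + c_y
    bw = pw * np.exp(tw)           # b_w = p_w * e^(t_w)
    bh = ph * np.exp(th)           # b_h = p_h * e^(t_h)
    return bx * stride, by * stride, bw * stride, bh * stride

# Per-anchor output layout after adding the re-identification head:
# 4 box values + 1 confidence + C class probabilities + 512 re-id values.
C = 80                             # illustrative class count, not from the patent
print(4 + 1 + C + 512)             # channels per anchor at each grid cell
```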
And S35, matching the re-recognition characteristic value with the existing object characteristic value in the database to obtain the re-recognition result of the target object.
Some steps shown in fig. 3 have been described in the embodiment shown in fig. 2, and can achieve the same or similar beneficial effects, and are not repeated here to avoid repetition.
In a possible implementation manner, the training process of the target detection model includes:
adding the training loss of the re-recognition network to the original training loss of the detector to obtain the added training loss;
inputting a training data set consisting of a plurality of objects into the target detection model for iteration, adjusting network parameters of the target detection model according to the added training loss value, and fitting the output of the re-recognition network; the target object is included in the plurality of objects.
In the embodiment of the present application, the training data set is an image set of a plurality of objects. The training data set may be iterated with a gradient descent method or a projection method (iterative projection algorithm); at each iteration, the value of the added training loss on the output is observed, and the network parameters of the target detection model (for example, the number of neurons in a hidden layer) are adjusted according to that value. Preferably, the training data set is input into the target detection model for iteration in a distillation manner, and the output of the re-recognition network is fitted with the regression loss function Huber Loss. A sketch of the combined objective follows.
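The sketch below assumes PyTorch; the teacher embeddings and the balancing weight are assumptions not fixed by the text, and `detector_loss` stands in for whichever original loss the chosen detector uses:

```python
import torch
import torch.nn as nn

huber = nn.SmoothL1Loss()  # Huber-style regression loss for the distillation fit

def total_loss(detector_loss: torch.Tensor,
               reid_pred: torch.Tensor,
               reid_teacher: torch.Tensor,
               weight: float = 1.0) -> torch.Tensor:
    """Original detector training loss plus the re-identification loss.

    `reid_teacher` stands for the target 512-d embeddings the re-id
    branch is fitted to in the distillation setup; `weight` is an
    assumed balancing factor, not specified in the patent.
    """
    reid_loss = huber(reid_pred, reid_teacher)
    return detector_loss + weight * reid_loss
```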
In this embodiment, the re-recognition network is merged into the detector to form the target detection model and is trained in a distillation manner, with the training loss of the re-recognition network added to the original training loss of the detector. The network for extracting the re-recognition feature value therefore does not need to be trained separately with a complex loss function, which saves a large amount of GPU memory.
It can be seen that, in the embodiment of the present application, the current image frame of a video stream is acquired; the current image frame is input into the detector of a pre-trained target detection model for feature extraction to obtain a target feature map; the target object in the current image frame is predicted based on the target feature map and preset anchor frames to obtain a target detection frame of the target object, and the area framed by the target detection frame is determined as the target area; feature extraction is performed on the target area through the re-recognition network of the target detection model to obtain a re-recognition feature value of the target object; finally, the re-recognition feature value is matched with the existing object feature values in the database to obtain the re-recognition result of the target object. The re-recognition network for re-recognition feature value extraction and the detector thus form one target detection model: the target detection task and the re-recognition feature value extraction task are carried out in the same network model, the re-recognition feature values are output alongside the original detector's output result, and there is no need to use a separate feature extraction network for each target object, so the efficiency of target object re-recognition is improved.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an object re-identification apparatus according to an embodiment of the present application, and as shown in fig. 6, the apparatus includes:
an image frame acquiring module 61, configured to acquire a current image frame of the video stream;
a target detection and re-recognition module 62 for inputting the current image frame into a pre-trained target detection model, the target detection model including a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object;
and the re-recognition characteristic value matching module 63 is configured to match the re-recognition characteristic value with an existing object characteristic value in a database to obtain a re-recognition result of the target object.
In a possible implementation manner, in terms of performing target detection on the current image frame to obtain a target area of a target object in the current image frame, the target detection and re-identification module 62 is specifically configured to:
extracting the features of the current image frame through the detector to obtain a target feature map;
predicting the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object, and determining an area framed by the target detection frame as the target area.
In a possible implementation manner, in predicting the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object, the target detection and re-identification module 62 is specifically configured to:
dividing the current image frame into a plurality of grids according to the size of the target feature map;
and if the center of the target object falls into the grid, predicting the target object by using the anchor frame preset by the grid to obtain the target detection frame.
In a possible implementation manner, the predicted values of the anchor frame include a center point coordinate offset value of the anchor frame and scaling values of the width and height of the anchor frame; in terms of obtaining the target detection frame, the target detection and re-identification module 62 is specifically configured to:
calculating to obtain a central point coordinate of the target detection frame according to the central point coordinate offset value;
calculating the width and height of the target detection frame according to the scaling values of the width and the height;
and determining the position of the target detection frame according to the coordinates of the central point of the target detection frame and the width and height of the target detection frame.
In a possible implementation manner, as shown in fig. 7, the apparatus further includes a network training module 64, and in terms of training of the target detection model, the network training module 64 is specifically configured to:
adding the training loss of the re-recognition network to the original training loss of the detector to obtain the added training loss;
inputting a training data set consisting of a plurality of objects into the target detection model for iteration, adjusting network parameters of the target detection model according to the added training loss value, and fitting the output of the re-recognition network; the target object is included in the plurality of objects.
According to an embodiment of the present application, the units in the apparatus for object re-identification shown in fig. 6 and 7 may be respectively or entirely combined into one or several additional units to form the apparatus, or some unit(s) may be further split into multiple units with smaller functions to form the apparatus, which may achieve the same operation without affecting the achievement of the technical effect of the embodiment of the present application. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present application, the device based on object re-identification may also include other units, and in practical applications, these functions may also be implemented by the assistance of other units, and may be implemented by cooperation of a plurality of units. The object re-identification device provided by the embodiment of the present application can be applied to scenes such as target detection, object re-identification, target tracking, and the like.
According to another embodiment of the present application, the apparatus for object re-identification shown in fig. 6 or fig. 7 may be constructed by running a computer program (including program code) capable of executing the steps of the corresponding method shown in fig. 2 or fig. 3 on a general-purpose computing device, such as a computer including processing elements such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM), thereby implementing the object re-identification method of the embodiment of the present application. The computer program may, for example, be recorded on a computer-readable recording medium, and loaded into and executed in the above computing apparatus via the computer-readable recording medium.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides an electronic device. Referring to fig. 8, the electronic device includes at least a processor 81, an input device 82, an output device 83, and a computer storage medium 84. The processor 81, input device 82, output device 83, and computer storage medium 84 within the electronic device may be connected by a bus or other means.
A computer storage medium 84 may be stored in the memory of the electronic device, the computer storage medium 84 being for storing a computer program comprising program instructions, the processor 81 being for executing the program instructions stored by the computer storage medium 84. The processor 81 (or CPU) is a computing core and a control core of the electronic device, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 81 of the electronic device provided in the embodiment of the present application may be configured to perform a series of object re-identification processing: acquiring a current image frame of a video stream; inputting the current image frame into a pre-trained target detection model, wherein the target detection model comprises a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object; and matching the re-recognition feature value with the existing object feature values in the database to obtain a re-recognition result of the target object.
In one embodiment, the detector is an anchor-based detector; the processor 81 performs the target detection on the current image frame to obtain a target area of a target object in the current image frame, including:
extracting the features of the current image frame through the detector to obtain a target feature map;
predicting the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object, and determining an area framed by the target detection frame as the target area.
In another embodiment, the processor 81 is configured to perform the predicting of the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object, and includes:
dividing the current image frame into a plurality of grids according to the size of the target feature map;
and if the center of the target object falls into the grid, predicting the target object by using the anchor frame preset by the grid to obtain the target detection frame.
In another embodiment, the predicted values of the anchor frame include a center point coordinate offset value of the anchor frame and scaling values of the width and height of the anchor frame, and the processor 81 executes the obtaining of the target detection frame, including:
calculating to obtain a central point coordinate of the target detection frame according to the central point coordinate offset value;
calculating the width and height of the target detection frame according to the scaling values of the width and the height;
and determining the position of the target detection frame according to the coordinates of the central point of the target detection frame and the width and height of the target detection frame.
In yet another embodiment, the processor 81 performs a training process of the target detection model, including:
adding the training loss of the re-recognition network to the original training loss of the detector to obtain the added training loss;
inputting a training data set consisting of a plurality of objects into the target detection model for iteration, adjusting network parameters of the target detection model according to the added training loss value, and fitting the output of the re-recognition network; the target object is included in the plurality of objects.
For example, the electronic device may be a computer, a server, a computer host, etc., and may include, but is not limited to, a processor 81, an input device 82, an output device 83, and a computer storage medium 84. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of an electronic device and does not limit it; the electronic device may include more or fewer components than those shown, combine some components, or have different components.
It should be noted that, since the processor 81 of the electronic device executes the computer program to implement the steps of the above object re-identification method, the embodiments of the object re-identification method are all applicable to the electronic device, and all can achieve the same or similar beneficial effects.
An embodiment of the present application further provides a computer storage medium (memory), which is a memory device in an electronic device and is used to store programs and data. It is understood that the computer storage medium here may include a built-in storage medium in the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores the operating system of the terminal. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by the processor 81. The computer storage medium may be a high-speed RAM memory, or a non-volatile memory such as at least one disk memory; alternatively, it may be at least one computer storage medium located remotely from the processor 81. In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor 81 to implement the steps of the above object re-identification method; in a specific implementation, one or more instructions in the computer storage medium are loaded by the processor 81 to perform the following steps:
acquiring a current image frame of a video stream;
inputting the current image frame into a pre-trained target detection model, wherein the target detection model comprises a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object;
and matching the re-recognition characteristic value with the existing object characteristic value in the database to obtain a re-recognition result of the target object.
In one example, one or more instructions in the computer storage medium when loaded by processor 81 further perform the steps of:
extracting the features of the current image frame through the detector to obtain a target feature map;
predicting the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object, and determining an area framed by the target detection frame as the target area.
In one example, one or more instructions in the computer storage medium when loaded by processor 81 further perform the steps of:
dividing the current image frame into a plurality of grids according to the size of the target feature map;
and if the center of the target object falls into the grid, predicting the target object by using the anchor frame preset by the grid to obtain the target detection frame.
In one example, one or more instructions in the computer storage medium when loaded by processor 81 further perform the steps of:
calculating to obtain the center point coordinate of the target detection frame according to the anchor frame center point coordinate offset value;
calculating the width and height of the target detection frame according to the scaling values of the width and height of the anchor frame;
and determining the position of the target detection frame according to the coordinates of the central point of the target detection frame and the width and height of the target detection frame.
In one example, one or more instructions in the computer storage medium when loaded by processor 81 further perform the steps of:
adding the training loss of the re-recognition network to the original training loss of the detector to obtain the added training loss;
inputting a training data set consisting of a plurality of objects into the target detection model for iteration, adjusting network parameters of the target detection model according to the added training loss value, and fitting the output of the re-recognition network; the target object is included in the plurality of objects.
Illustratively, the computer program of the computer storage medium includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like.
It should be noted that, since the computer program of the computer storage medium is executed by the processor to implement the steps of the above object re-identification method, all the embodiments of the object re-identification method are applicable to the computer storage medium, and can achieve the same or similar beneficial effects.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method of object re-identification, the method comprising:
acquiring a current image frame of a video stream;
inputting the current image frame into a pre-trained target detection model, wherein the target detection model comprises a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object;
and matching the re-recognition characteristic value with the existing object characteristic value in the database to obtain a re-recognition result of the target object.
2. The method of claim 1, wherein the detector is an anchor-based detector; the performing target detection on the current image frame to obtain a target area of a target object in the current image frame includes:
extracting the features of the current image frame through the detector to obtain a target feature map;
predicting the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object, and determining an area framed by the target detection frame as the target area.
3. The method according to claim 2, wherein the predicting the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object comprises:
dividing the current image frame into a plurality of grids according to the size of the target feature map;
and if the center of the target object falls into the grid, predicting the target object by using the anchor frame preset by the grid to obtain the target detection frame.
4. The method of claim 2, wherein the predicted value of the anchor frame comprises a center point coordinate offset value of the anchor frame, a width and a height scaling value of the anchor frame, and the obtaining the target detection frame comprises:
calculating to obtain a central point coordinate of the target detection frame according to the central point coordinate offset value;
calculating the width and height of the target detection frame according to the scaling values of the width and the height;
and determining the position of the target detection frame according to the coordinates of the central point of the target detection frame and the width and height of the target detection frame.
5. The method according to any one of claims 1-4, wherein the training process of the object detection model comprises:
adding the training loss of the re-recognition network to the original training loss of the detector to obtain the added training loss;
inputting a training data set consisting of a plurality of objects into the target detection model for iteration, adjusting network parameters of the target detection model according to the added training loss value, and fitting the output of the re-recognition network; the target object is included in the plurality of objects.
6. An apparatus for re-identification of an object, the apparatus comprising:
the image frame acquisition module is used for acquiring a current image frame of the video stream;
the target detection and re-recognition module is used for inputting the current image frame into a pre-trained target detection model, and the target detection model comprises a detector and a re-recognition network; performing target detection on the current image frame through the detector to obtain a target area of a target object in the current image frame, and performing feature extraction on the target area through the re-identification network to obtain a re-identification feature value of the target object;
and the re-recognition characteristic value matching module is used for matching the re-recognition characteristic value with the existing object characteristic value in the database to obtain a re-recognition result of the target object.
7. The apparatus according to claim 6, wherein, in performing target detection on the current image frame to obtain the target area of the target object in the current image frame, the target detection and re-identification module is specifically configured to:
extract the features of the current image frame through the detector to obtain a target feature map;
and predict the target object based on the target feature map and a preset anchor frame to obtain a target detection frame of the target object, and determine the area framed by the target detection frame as the target area.
8. The apparatus according to claim 7, wherein, in predicting the target object based on the target feature map and a preset anchor frame to obtain the target detection frame of the target object, the target detection and re-identification module is specifically configured to:
divide the current image frame into a plurality of grids according to the size of the target feature map;
and, if the center of the target object falls into one of the grids, predict the target object by using the anchor frame preset for that grid to obtain the target detection frame.
9. An electronic device comprising an input device and an output device, further comprising:
a processor adapted to implement one or more instructions; and
a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to perform the method of any of claims 1-5.
10. A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to perform the method of any of claims 1-5.
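
Illustrative Sketches of the Claimed Steps

The four sketches below are not part of the claims or of the original disclosure. They give one plausible Python reading of claims 1-5; every function name, variable name, and numeric value in them is an assumption made for illustration. First, the grid assignment of claim 3: the current image frame is divided into cells matching the size of the target feature map, and the cell into which the target object's center falls is responsible for predicting it, as in YOLO-family detectors.

def responsible_grid_cell(center_x, center_y, img_w, img_h, fmap_w, fmap_h):
    # Divide the current image frame into fmap_w x fmap_h grid cells,
    # matching the size of the target feature map.
    cell_w = img_w / fmap_w
    cell_h = img_h / fmap_h
    # The cell containing the target object's center predicts the object
    # using that cell's preset anchor frames.
    grid_x = min(int(center_x / cell_w), fmap_w - 1)
    grid_y = min(int(center_y / cell_h), fmap_h - 1)
    return grid_x, grid_y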
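
Claim 4 then recovers the target detection frame from the predicted center-point coordinate offset value and the width and height scaling values. The sigmoid and exponential mapping used here is the common YOLO-style parameterization and is an assumption; the claim only states that the center is computed from the offset and the size from the scaling values.

import math

def decode_detection_frame(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, stride):
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    # Center-point coordinates of the target detection frame: grid-cell origin
    # plus the squashed offset, scaled back to image pixels.
    cx = (grid_x + sigmoid(tx)) * stride
    cy = (grid_y + sigmoid(ty)) * stride
    # Width and height of the target detection frame: preset anchor-frame size
    # scaled by the exponential of the predicted values.
    w = anchor_w * math.exp(tw)
    h = anchor_h * math.exp(th)
    # Position of the target detection frame from its center, width and height.
    return cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0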
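
For training, claim 5 sums the re-identification network's loss with the detector's original loss, so a single back-propagation pass adjusts the whole target detection model and the two branches are fitted together rather than in separate stages. A minimal sketch, assuming a PyTorch-style autograd setting and a balancing weight that the claims do not specify:

def combined_training_loss(detector_loss, reid_loss, reid_weight=1.0):
    # Add the re-identification training loss to the detector's original
    # training loss; the sum drives the parameter update for the whole model.
    # reid_weight is an assumed balancing factor, not part of the claims.
    return detector_loss + reid_weight * reid_loss

A hypothetical iteration would then read: loss = combined_training_loss(det_loss, reid_loss); loss.backward(); optimizer.step().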
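
Finally, the matching step that closes claim 1 compares the extracted re-identification feature value against the object feature values already stored in the database. Nearest-neighbor search under cosine similarity with a fixed acceptance threshold, assumed here, is one common realization.

import numpy as np

def match_reid_feature(query, gallery, threshold=0.6):
    # query:   (D,) re-identification feature value of the detected target object
    # gallery: (N, D) existing object feature values in the database
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity against every stored object
    best = int(np.argmax(sims))
    # Above the (assumed) threshold the target object is re-identified as a
    # known identity; below it, the object would typically be enrolled as new.
    return best if sims[best] >= threshold else None
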
CN202010359731.0A 2020-04-29 2020-04-29 Method and device for object re-identification, electronic equipment and storage medium Pending CN113569600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010359731.0A CN113569600A (en) 2020-04-29 2020-04-29 Method and device for object re-identification, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010359731.0A CN113569600A (en) 2020-04-29 2020-04-29 Method and device for object re-identification, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113569600A (en) 2021-10-29

Family

ID=78158651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010359731.0A Pending CN113569600A (en) 2020-04-29 2020-04-29 Method and device for identifying weight of object, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113569600A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273836A * 2017-06-07 2017-10-20 深圳市深网视界科技有限公司 Pedestrian detection and recognition method, device, model and medium
CN110555420A * 2019-09-09 2019-12-10 电子科技大学 Fusion model network and method based on pedestrian regional feature extraction and re-identification
CN110807385A (en) * 2019-10-24 2020-02-18 腾讯科技(深圳)有限公司 Target detection method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299316A (en) * 2021-12-27 2022-04-08 浙江蓝卓工业互联网信息技术有限公司 Method and device for removing duplication of image target area
CN116862980A (en) * 2023-06-12 2023-10-10 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge
CN116862980B (en) * 2023-06-12 2024-01-23 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge

Similar Documents

Publication Publication Date Title
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
JP7058669B2 (en) Vehicle appearance feature identification and vehicle search methods, devices, storage media, electronic devices
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
Arietta et al. City forensics: Using visual elements to predict non-visual city attributes
CN111814794B (en) Text detection method and device, electronic equipment and storage medium
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN111795704A (en) Method and device for constructing visual point cloud map
CN111126258A (en) Image recognition method and related device
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN109711416B (en) Target identification method and device, computer equipment and storage medium
CN110689021A (en) Real-time target detection method in low-visibility environment based on deep learning
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
CN113761999A (en) Target detection method and device, electronic equipment and storage medium
CN114037640A (en) Image generation method and device
CN113537180B (en) Tree obstacle identification method and device, computer equipment and storage medium
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115620090A (en) Model training method, low-illumination target re-recognition method and device and terminal equipment
CN117197462A (en) Lightweight foundation cloud segmentation method and system based on multi-scale feature fusion and alignment
Ataş Performance Evaluation of Jaccard-Dice Coefficient on Building Segmentation from High Resolution Satellite Images
Zhang et al. An improved target detection method based on YOLOv5 in natural orchard environments
CN113569600A (en) Method and device for object re-identification, electronic equipment and storage medium
US20230095533A1 (en) Enriched and discriminative convolutional neural network features for pedestrian re-identification and trajectory modeling
CN114332633B (en) Radar image target detection and identification method and equipment and storage medium
CN111783716A (en) Pedestrian detection method, system and device based on attitude information
CN114743139A (en) Video scene retrieval method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination