CN114708593A - Heterogeneous multi-model-based waste electronic product brand identification method - Google Patents

Heterogeneous multi-model-based waste electronic product brand identification method

Info

Publication number
CN114708593A
CN114708593A
Authority
CN
China
Prior art keywords
model
character
output
image
electronic product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111673248.0A
Other languages
Chinese (zh)
Other versions
CN114708593B (en)
Inventor
汤健
王子轩
张晓晓
荆中岭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202111673248.0A
Publication of CN114708593A
Application granted
Publication of CN114708593B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A heterogeneous multi-model method for identifying waste electronic products is provided, aiming at the problems that the related data sets are limited and that the accuracy of existing identification methods hardly meets actual industrial requirements. The character region on the back of the electronic product is extracted with the CRAFT algorithm, and a VGG19 model pre-trained on ImageNet is used as the image feature embedding model to extract the character-part features and the overall features of the electronic product to be recovered. An OCR character recognition model is built on the character features to obtain the OCR submodel recognition result, and a deep forest classification model is built on the character and overall features to obtain the deep forest submodel recognition result. The OCR recognition result and the deep forest classification vector are linearly combined, a class weight vector is obtained with the softmax nonlinear function, and the result with the highest weight is taken as the electronic product brand recognition result. The effectiveness of the method is verified on real mobile phone and tablet images shot by waste electronic product recovery equipment.

Description

Heterogeneous multi-model-based waste electronic product brand identification method
Technical Field
The invention belongs to the field of recovery of waste electronic products.
Background
With the development of science and technology and the rapid spread of 5G, intelligent electronic products are being replaced ever faster. According to a Strategy Analytics forecast, global smart electronic product shipments in 2021 grow 6.5% year-on-year, reaching a total of 1.38 billion units. The faster replacement of electronic products is the main driver of the growing shipments, and it also causes the number of personally idle electronic products to increase year by year. Domestic and foreign markets therefore place higher demands on the recycling efficiency of the electronic product recycling industry. Waste electronic products are a typical urban renewable resource; recycling them with unmanned, intelligent recycling equipment can save a large amount of labor cost. An intelligent waste electronic product identification method is the key to completing these tasks.
Image recognition has been widely applied in fields such as target detection and face recognition, and how to use the related data sets to construct a classification model that intelligently recognizes waste electronic products has become a key research topic for current intelligent recycling equipment. However, building an image-based deep neural network model depends on massive labeled samples. The data set for the waste electronic product identification problem comes only from real pictures shot by recycling equipment prototypes: the data volume is small, which makes it difficult to construct an effective neural network classifier; the sharpness of images shot in the industrial process is low; and non-standard user operation can cause problems such as incomplete electronic product images and mirrored partial regions. How to classify electronic product brands under small sample volume and low sample quality has become the main problem to be solved.
Based on the current state of research, the authors of the present invention previously proposed a waste electronic product identification system based on a parallel differential evolution-gradient feature deep forest, which builds a mobile phone brand classification model from back images of waste mobile phones and reaches a classification accuracy of 80.12%, and a waste electronic product identification system based on optical character recognition, which builds a character classification model from the back characters of waste electronic products and maps the character classification result to a brand via mapping rules, reaching a classification accuracy of 86.37%. However, these methods build the classification model from only a single angle, such as texture features or character features, and the model accuracy still hardly meets actual industrial requirements. Therefore, the invention proposes a heterogeneous multi-model method for identifying waste electronic products.
First, the back character region of the electronic product is extracted with the CRAFT algorithm. Second, a VGG19 model pre-trained on ImageNet extracts features of the back image of the electronic product and of its character region, replacing single-dimensional features with high-dimensional convolutional features. Then, an Optical Character Recognition (OCR) model is built on the character features, and a deep forest electronic product classification model is built on the image features and character features. Finally, the classification results of the different models are linearly spliced, and the final classification result is obtained through a softmax activation function. The effectiveness of the algorithm in identifying waste electronic products is verified on a typical electronic product image data set of the Telecommunication Equipment Certification Center of the Ministry of Industry and Information Technology.
Disclosure of Invention
The heterogeneous multi-model method for identifying waste electronic products comprises 3 parts: an image preprocessing module, a multivariate feature extraction module and a heterogeneous multi-model recognition module. The overall system structure is shown in FIG. 1.
The meanings of the variables appearing in the invention are shown in Table 1.
TABLE 1 Meaning of variables
(Table 1 appears as an image in the original publication.)
The input of the image preprocessing module is the raw back-image set X. The output of the data enhancement preprocessing is X_img, and the output of the character preprocessing using the CRAFT character-level target detection algorithm is X_digit.
The multivariate feature extraction module uses a VGG19 network pre-trained on ImageNet to obtain representations of the character features and of the whole back-image pixel features in a high-dimensional space. Its inputs are X_img and X_digit, and its outputs are Z_img and Z_digit respectively.
The heterogeneous multi-model recognition module comprises 3 parts: an OCR character recognition submodule, a deep forest electronic product recognition submodule and a softmax nonlinear output layer submodule. The OCR submodule takes Z_digit as input and outputs Y_OCR; the deep forest submodule takes Z_img and Z_digit as input and outputs Y_DF; the softmax nonlinear output layer submodule maps the outputs of the two classification submodules to a class weight vector, and the label with the highest score is the final output Ŷ.
2.1 Image preprocessing module
2.1.1 Data enhancement preprocessing
Data enhancement, which randomly alters the training samples, reduces the model's dependence on particular attributes and thereby improves its generalization ability. Data enhancement methods mainly include geometric transformation, color-space transformation, kernel filtering, image mixing, random erasing, adversarial training, enhancement based on generative adversarial networks, and neural style transfer. In the waste electronic product recovery process, the product images exhibit positional deviation because users place the devices differently, so the data used by the method are enhanced mainly by geometric transformation, specifically: rotation, flipping, mirroring, translation, addition of Gaussian noise, and the like; a minimal sketch of such augmentation follows.
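Below is a minimal sketch of the geometric augmentations listed above, assuming Pillow and NumPy; the rotation angle, translation range and noise level are illustrative choices, not values from the patent.

    import numpy as np
    from PIL import Image

    def geometric_augment(img: Image.Image, seed: int = 0) -> list:
        """Return rotated, flipped, mirrored, translated and noised variants."""
        rng = np.random.default_rng(seed)
        variants = []
        # rotation by a small random angle
        variants.append(img.rotate(float(rng.uniform(-15, 15))))
        # mirroring (horizontal flip) and vertical flip
        variants.append(img.transpose(Image.Transpose.FLIP_LEFT_RIGHT))
        variants.append(img.transpose(Image.Transpose.FLIP_TOP_BOTTOM))
        # translation via an affine transform (dx, dy in pixels)
        dx, dy = (int(v) for v in rng.integers(-20, 21, size=2))
        variants.append(img.transform(img.size, Image.Transform.AFFINE,
                                      (1, 0, dx, 0, 1, dy)))
        # additive Gaussian noise
        arr = np.asarray(img, dtype=np.float32)
        noisy = np.clip(arr + rng.normal(0.0, 10.0, arr.shape), 0, 255)
        variants.append(Image.fromarray(noisy.astype(np.uint8)))
        return variants

Applying a handful of such transforms per image is what expands one sample to twelve in the experiments reported later.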
2.1.2 Character enhancement preprocessing
In the electronic product recycling process, non-standard user operation can cause problems such as incomplete camera image acquisition and mirrored product images, and directly using the pictures collected by the recycling equipment leads to poor model construction and brand prediction. The characters on the back of an electronic product are an important basis for identifying its brand, but they suffer wear and occlusion during use, so a model that relies on the back characters alone is limited. Therefore, the invention selects the character features in the back image as one of several classification bases: the CRAFT character-level image positioning algorithm determines and segments the character positions, and the character features are linearly spliced with the whole picture as the input of the subsequent models. This addresses both the difficulty of building a classification model from the phone image alone and the limitation of a model built from single character features alone.
Extensive experiments show that target detection algorithms such as YOLOv3 and Fast R-CNN are widely applied in fields such as face detection and license plate detection, but the aspect ratios of their detection targets are relatively fixed, and those targets are mostly free of deformation and wear. The CRAFT algorithm is trained on a synthetic data set with character-level labels in a weakly supervised manner: when a back picture of an electronic product without character segmentation is used as input, the model detects characters, synthesizes the corresponding character labels, and then recognizes them, predicting the region where the text lies from the closeness between characters. The CRAFT model training process is shown in FIG. 2.
For the synthetic data set, which includes a Gaussian heat map of each single character in the image, the CRAFT algorithm performs supervised training on this part. For the electronic product back-image data set, the text box region in the electronic product image is first annotated and rectified into an axis-aligned text box by perspective transformation; then the position box of each single character is obtained with the watershed algorithm, the corresponding Gaussian heat map is generated, and it is transformed and pasted back to the corresponding position of the label map of the original image. The score of the watershed segmentation result is calculated as follows:
s_conf(w) = ( l(w) − min( l(w), |l(w) − l_c(w)| ) ) / l(w)    (1)
where l(w) denotes the character length of the text box in the electronic product image, and l_c(w) is the character-string length produced by the watershed segmentation.
After the watershed algorithm yields the character-string length, equation (1) evaluates the segmentation: if the segmented length is consistent with the real character length, the confidence S_c(p) is 1, and a lower score indicates a less reliable segmentation result.
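Equation (1) transcribes directly into code; a one-line sketch with illustrative names and a check of the boundary behaviour:

    def segmentation_confidence(l_w: int, l_c_w: int) -> float:
        """Confidence of a watershed split: l_w is the true character count
        of the word box, l_c_w the count produced by the segmentation."""
        return (l_w - min(l_w, abs(l_w - l_c_w))) / l_w

    assert segmentation_confidence(5, 5) == 1.0   # consistent length: confidence 1
    assert segmentation_confidence(5, 3) == 0.6   # worse split: lower confidence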
2.2 Multivariate feature extraction module
The electronic product image collected by the industrial equipment is represented as X_img with size 400 x 300; after the image preprocessing of Section 2.1, the character image is represented as X_digit = [x_1, x_2, ..., x_m], where x_i denotes the i-th character in the character image. A single character has size 50 x 50, so the whole character image has size 50 x (50·m). To visualize the pixel flow of the multivariate feature extraction module, FIG. 3 takes a five-character image as an example, in which X_digit has size 50 x 250; since the output dimension of the module is fixed by design, the output dimension of an m-character image is consistent with that of the five-character image. The structure of the multivariate feature extraction module is shown in FIG. 3.
The module adopts a VGG19 model pre-trained on ImageNet as the base model. First, the parameters of the convolutional and pooling layers of VGG19 are frozen; then, fully connected layers of different sizes are constructed according to the different image characteristics; finally, the model outputs for the different images are linearly combined as the input features of the subsequent classification models. The feature dimension after multivariate feature extraction is therefore determined by the dimension of the fully connected layer. For the mobile phone image X_img and the character image X_digit of different sizes, the feature extraction process is shown in equation (2):
[Z_img, Z_digit] = [f_VGG(X_img), f_VGG(X_digit)]    (2)
where f_VGG(·) denotes the output process of the VGG19 model.
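A sketch of this module in PyTorch: torchvision's pre-trained VGG19 is assumed, the frozen backbone and the 1024-/512-dimensional heads follow the detailed description, and the adaptive pooling is an implementation assumption.

    import torch
    import torch.nn as nn
    from torchvision import models

    class VGGFeatureExtractor(nn.Module):
        def __init__(self, out_dim: int):
            super().__init__()
            vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
            self.features = vgg.features          # conv + pooling stack
            for p in self.features.parameters():  # freeze ("solidify") backbone
                p.requires_grad = False
            self.pool = nn.AdaptiveAvgPool2d((7, 7))
            self.fc = nn.Linear(512 * 7 * 7, out_dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            z = self.pool(self.features(x))
            return self.fc(torch.flatten(z, 1))

    f_img = VGGFeatureExtractor(out_dim=1024)   # for the 400x300 back image
    f_digit = VGGFeatureExtractor(out_dim=512)  # for the 50x(50*m) character image

The outputs of f_img and f_digit play the roles of Z_img and Z_digit in equation (2).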
2.3 Heterogeneous multi-model recognition module
It has been verified that an electronic product classification model built from only a single angle, such as texture features or character features, still hardly reaches the accuracy required in actual industry. Therefore, the invention adopts the Stacking ensemble idea: features from different angles are linearly combined to construct a heterogeneous multi-classification model, and the overall accuracy is improved by integrating several different models. For the waste electronic product brand classification problem, an OCR character recognition model and a deep forest recognition model are constructed; the submodel structures are described below.
2.3.1 OCR character recognition model submodule
In the OCR back-character recognition process, only the character features Z_digit from Section 2.2 are used as input. First, character sequence features containing complete context information are extracted by a bidirectional LSTM; then, a CTC network solves the problem that the input feature sequence and the output character sequence cannot be aligned; finally, the distance between the OCR output string and each known label is measured by the Levenshtein distance, yielding the electronic product brand classification result Y_OCR. The structure of the OCR character recognition model is shown in FIG. 4.
As shown in FIG. 4, the OCR character recognition model builds k (where k > m) LSTM basic units from the character features Z_digit obtained in the image preprocessing part. The bidirectional LSTM network contains two sub-network structures; equations (3) and (4) represent the forward and backward passes respectively:
h_i^f = LSTM(x_i, h_{i−1}^f)    (3)
h_i^b = LSTM(x_i, h_{i+1}^b)    (4)
where k is the LSTM basic-unit hyperparameter, h_i^f denotes the output of the forward LSTM at step i, h_i^b denotes the output of the backward LSTM at step i, and x_i denotes the i-th input. The bidirectional LSTM output at step i concatenates the two:
h_i = [h_i^f ; h_i^b]    (5)
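In PyTorch, equations (3)-(5) correspond to a single nn.LSTM with bidirectional=True, which runs the forward and backward passes and concatenates their outputs per step; the per-step feature size below is an assumption.

    import torch
    import torch.nn as nn

    k = 128                          # LSTM basic-unit hyperparameter of the text
    feat_dim = 512                   # per-step character feature size (assumed)

    bilstm = nn.LSTM(input_size=feat_dim, hidden_size=k,
                     bidirectional=True, batch_first=True)

    x = torch.randn(1, 7, feat_dim)  # a 7-step character feature sequence
    h, _ = bilstm(x)                 # h[:, i, :] = [h_i^f ; h_i^b], width 2k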
then, CTC network outputs [ h ] to bidirectional LSTM network1,h2,...,hx]In the repeated recognition, the character is de-duplicated to be [ y1,y2,...,yn]. Since the bidirectional LSTM base unit is more than the number n of handset characters, resulting in repeated character division, for example, "honor" may be segmented into "hononorr". Multiple substrings of 'hoonorr' can be mapped into correct result 'honor', as shown in formula (6)
Figure RE-GDA0003626337080000069
The CTC network obtains the final result Y by maximizing the posterior probability P < Y | X > given input X, where P < Y | X > is as shown in equation (7):
Figure RE-GDA00036263370800000610
where π ∈ B (Y) denotes the set of all substrings that can be integrated into Y.
2.3.2 Deep forest recognition model submodule
In the deep forest image recognition process for waste electronic products, the character features Z_digit and the image features Z_img are linearly combined to obtain the deep forest input features X_DF, as shown in equation (8):
X_DF = [Z_img, Z_digit]    (8)
First, different random forests are constructed with X_DF and their class-probability outputs are obtained. These outputs are then linearly combined with X_DF and passed as input to the next layer to construct further random forests; whether to continue building the next layer is decided by the classification accuracy of the current model. Finally, model growth ends when the accuracy no longer improves, and the classification results of the final set of random forests are weighted to obtain the classification result Y_DF. The structure of the deep forest recognition model is shown in FIG. 5.
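A simplified sketch of this cascade in scikit-learn: the base learners and the stopping rule follow the description, while the layer width, tree counts and integer label encoding are assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    def grow_cascade(X_df, y, X_val, y_val, max_layers=10):
        """Grow layers until validation accuracy stops improving.
        y and y_val are assumed integer-encoded class indices 0..C-1."""
        layer_in, val_in, best_acc, layers = X_df, X_val, 0.0, []
        for _ in range(max_layers):
            forests = [RandomForestClassifier(n_estimators=100).fit(layer_in, y),
                       GradientBoostingClassifier(n_estimators=100).fit(layer_in, y)]
            probs = np.hstack([f.predict_proba(layer_in) for f in forests])
            val_probs = [f.predict_proba(val_in) for f in forests]
            # layer prediction: average the forests' class-probability vectors
            acc = (np.mean(val_probs, axis=0).argmax(axis=1) == y_val).mean()
            if acc <= best_acc:                 # accuracy no longer improves: stop
                break
            best_acc, layers = acc, layers + [forests]
            layer_in = np.hstack([X_df, probs])          # augmented next-layer input
            val_in = np.hstack([X_val, np.hstack(val_probs)])
        return layers, best_acc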
2.3.3 Multi-model output weighting submodule
In the heterogeneous multi-model recognition module, the OCR character recognition model output Y_OCR is a continuous character string that is mapped, via the distance measurement, to a particular electronic product brand, while the deep forest recognition model output Y_DF gives the probabilities of all electronic product brands. To resolve the different output forms and possibly inconsistent output results of the heterogeneous models, a multi-model output weighting module is added at the end of the classification model. The multi-model weighted output module is shown in FIG. 6.
The softmax function, also called the normalized exponential function, is a classifier widely used in the supervised learning part of deep networks in current deep learning research. The softmax function is shown in equation (9):
softmax(z)_i = e^{z_i} / Σ_{j=1}^{n+1} e^{z_j}    (9)
where n + 1 denotes the dimension of the heterogeneous multi-model output vector and e denotes the base of the natural logarithm. In the classification model of the invention there are n waste electronic product labels; the OCR character recognition result Y_OCR and the deep forest recognition result Y_DF are linearly spliced into an (n+1)-dimensional result vector, which is used as the input of the softmax function. The label corresponding to the highest weight in the resulting class weight vector is taken as the final waste electronic product classification result Ŷ.
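The splice-softmax-argmax mechanics of this module, sketched in NumPy; how the single OCR score is produced and how the n + 1 weights map back to the n brand labels follows FIG. 6 and is abstracted away here.

    import numpy as np

    def softmax(z: np.ndarray) -> np.ndarray:
        e = np.exp(z - z.max())       # subtract the max for numerical stability
        return e / e.sum()

    def fuse(y_ocr: float, y_df: np.ndarray) -> int:
        """Splice the OCR result with the n deep-forest probabilities into an
        (n+1)-dim vector, apply equation (9), return the highest-weight index."""
        z = np.concatenate(([y_ocr], y_df))
        return int(softmax(z).argmax())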
1. Feature extraction is performed on the waste electronic product images with a VGG19 network pre-trained on ImageNet. As the convolutional layers deepen, the receptive field of a single feature keeps growing and its representational power keeps strengthening, outperforming single-angle feature extraction. Compared with a deep forest classification model built only on texture-sensitive HOG features, the VGG19-based model is clearly more accurate.
2. A heterogeneous multi-model method is adopted to build the waste electronic product classification model: classification models for different tasks are built from the same data set, and the outputs of the several models are finally weighted through a nonlinear function to obtain the final classification result. Tests show that, compared with a single OCR recognition model or a single deep forest recognition model, the heterogeneous multi-model proposed by the invention is clearly more accurate.
Drawings
FIG. 1 Structure of the heterogeneous multi-model waste electronic product recognition method
FIG. 2 CRAFT positioning and cropping module structure
FIG. 3 Multivariate feature extraction module
FIG. 4 OCR character recognition model
FIG. 5 Deep forest recognition model
FIG. 6 Multi-model weighted output structure
FIG. 7 Application scenario of the waste electronic product recycling equipment
FIG. 8 Data enhancement effect
FIG. 9 Image preprocessing results
FIG. 10 OCR recognition model confusion matrix
FIG. 11 Deep forest recognition model confusion matrix
FIG. 12 Confusion matrix of the heterogeneous multi-model waste electronic product classification model
Detailed Description
An application scenario of the waste electronic product recovery equipment is shown in FIG. 7; the experimental data of the invention are real pictures shot by this equipment. The data set comprises 123 images covering 10 brand categories of waste electronic products: Huawei phones (HUAWEI), Huawei tablets (MatePad), Honor (Honor), Xiaomi (Mi), ZTE (ZTE), OPPO, VIVO, Apple phones (iPhone), Apple tablets (iPad) and other brands (other).
Because real shots of waste electronic equipment are few, the training and test set samples are expanded by data enhancement before the classification model is built. Taking the Honor phone back image as an example, rotation, flipping, noise addition and similar operations expand 1 back-image sample to 12, and the total of 400 images is expanded to 4800. The sample expansion is illustrated in FIG. 8.
Then, a CRAFT character segmentation algorithm is adopted to segment the images of the waste electronic products to be recovered, so as to obtain a corresponding electronic product character data set, and the image preprocessing result is shown in FIG. 9.
The multivariate feature extraction part pre-trains the VGG19 model on the ImageNet data set of 14 million pictures in more than 20,000 classes; the pre-trained VGG model is denoted f_VGG(·). According to the input image, this part adds fully connected layers of different sizes to the VGG model: a 1024-dimensional fully connected layer for the 400 x 300 waste electronic product image, and a 512-dimensional fully connected layer for the 50 x 50 waste electronic character image.
The OCR character recognition module uses the EasyOCR Chinese-English pre-trained model and constructs 128 LSTM basic units, i.e. k = 128. The deep forest recognition model uses a random forest (RF) and GBDT as the base classifiers of each deep forest layer; RF and GBDT are each built from 100 decision trees, and the GBDT loss function is optimized with L1 + L2 regularization. A configuration sketch follows.
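A usage sketch of the components named above: the EasyOCR call is its documented API, while the scikit-learn estimators stand in for the RF/GBDT base classifiers (plain GradientBoostingClassifier does not expose the L1 + L2-regularized objective mentioned in the text, so it is only a stand-in; the image path is a placeholder).

    import easyocr
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    # EasyOCR pre-trained Chinese (simplified) + English reader
    reader = easyocr.Reader(['ch_sim', 'en'])
    detections = reader.readtext('back.jpg')   # [(bounding box, text, confidence), ...]
    for box, text, conf in detections:
        print(text, round(conf, 3))

    # base classifiers of one deep forest layer, 100 trees each
    rf = RandomForestClassifier(n_estimators=100)
    gbdt = GradientBoostingClassifier(n_estimators=100)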
The classification confusion matrix of the OCR character recognition model, built from the character pictures after image preprocessing, is shown in FIG. 10; the classification confusion matrix of the deep forest recognition model, built from the waste electronic product images and the character pictures, is shown in FIG. 11.
The results of the 2 models are integrated through the multi-model output weighting module, yielding the heterogeneous multi-model waste electronic product classification confusion matrix shown in FIG. 12; the classification accuracy reaches 90.17%.
To verify the effectiveness of the method, the same waste electronic product data set is used to build 10-class classification models from single features + deep forest, VGG features + deep forest, and VGG features + OCR, respectively. The accuracies of the different brand classification models on the waste electronic product image data set are shown in Table 2.
TABLE 2 Accuracy comparison of waste electronic product recognition models
(Table 2 appears as an image in the original publication.)

Claims (1)

1. The heterogeneous multi-model method for identifying waste electronic products is characterized by comprising 3 parts: an image preprocessing module, a multivariate feature extraction module and a heterogeneous multi-model recognition module;
the meanings of appearance variables are shown in table 1;
TABLE 1 meaning of variables
Figure FDA0003453620870000011
Figure FDA0003453620870000021
the input of the image preprocessing module is the raw back-image set X; the output of the data enhancement preprocessing is X_img, and the output of the character preprocessing using the CRAFT character-level target detection algorithm is X_digit;
the multivariate feature extraction module obtains representations of the character features and the whole back-image pixel features in a high-dimensional space using a VGG19 network pre-trained on ImageNet; its inputs are X_img and X_digit, and its outputs are Z_img and Z_digit respectively;
the heterogeneous multi-model recognition module comprises 3 parts: an OCR character recognition submodule, a deep forest electronic product recognition submodule and a softmax nonlinear output layer submodule, wherein the OCR submodule takes Z_digit as input and outputs Y_OCR, the deep forest submodule takes Z_img and Z_digit as input and outputs Y_DF, and the softmax nonlinear output layer submodule maps the outputs of the classification submodules to a class weight vector, the label with the highest score being the final output Ŷ;
The image preprocessing module comprises data enhancement preprocessing and character enhancement preprocessing;
character features in an image on the back of an electronic product are selected as one of classification bases in character enhancement preprocessing, the character positions of the electronic product are determined and segmented by adopting a CRAFT character-level image positioning algorithm, and the character features and an overall picture are linearly spliced to be input as a subsequent model;
for an artificial data set, the data set comprises a Gaussian heat map of a single character in the map, and the CRAFT algorithm carries out supervised training on the part; for an image data set at the back of an electronic product, firstly marking a text box area in an image of the electronic product, and stretching the text box area to a more positive text box through perspective transformation; then, obtaining a position frame of a single character by using a watershed algorithm, generating a corresponding Gaussian heat map, and pasting the position frame back to the corresponding position of the label map corresponding to the original image after conversion; the formula for calculating the score of the watershed algorithm segmentation result is as follows:
s_conf(w) = ( l(w) − min( l(w), |l(w) − l_c(w)| ) ) / l(w)    (1)
wherein l(w) denotes the character length of the text box in the electronic product image, and l_c(w) is the character-string length produced by the watershed segmentation;
after the watershed algorithm yields the character-string length, equation (1) evaluates the segmentation: if the segmented length is consistent with the real character length, the confidence S_c(p) is 1, and a lower score indicates a less reliable segmentation result;
the electronic product image collected by the industrial equipment is represented as XimgSize 400 x 300, and the character image after image preprocessing is represented as
Figure FDA0003453620870000032
Wherein
Figure FDA0003453620870000033
The method comprises the steps of representing the ith character in a character image, wherein the size of a single character is 50 x 50, and the size of the whole character image is 50 x (50 x m); the output dimension of the m character image is consistent with that of the five character image;
the module adopts a VGG19 model pre-trained based on ImageNet as a base model, and firstly, parameters of a convolutional layer and a pooling layer in a VGG19 model are solidified; then, constructing full connection layers with different sizes according to different image characteristics; finally, the models of different images are output to be linearly combined to be used as the input characteristics of a subsequent classification model; the feature dimension after the multi-element feature extraction is determined by the dimension of the full connection layer; aiming at mobile phone images X with different sizesimgAnd character image XdigitThe feature extraction process is shown in formula (2):
Figure FDA0003453620870000034
wherein f isVGG() represents the VGG19 model output process;
constructing an OCR character recognition model and a deep forest recognition model, wherein the sub-model structure is shown as follows;
a) OCR character recognition model submodule
In the OCR electronic product back character recognition flow, only the character features are used
Figure FDA0003453620870000035
As an input; firstly, extracting character sequence characteristics containing complete context information through a bidirectional LSTM; then, the problem that input characteristics and output sequences cannot be aligned to the CTC network is solved; finally, determining the distance between the OCR output character string and the known label through the Levensstein distance, and finally obtaining the brand classification result of the electronic product
Figure FDA0003453620870000041
the OCR character recognition model builds k (wherein k > m) LSTM basic units from the character features Z_digit obtained from the image preprocessing part; the bidirectional LSTM network comprises two sub-network structures, and equations (3) and (4) represent the forward and backward passes respectively:
h_i^f = LSTM(x_i, h_{i−1}^f)    (3)
h_i^b = LSTM(x_i, h_{i+1}^b)    (4)
wherein k is the LSTM basic-unit hyperparameter, h_i^f denotes the output of the forward LSTM at step i, h_i^b denotes the output of the backward LSTM at step i, and x_i denotes the i-th input; the bidirectional LSTM output at step i is:
h_i = [h_i^f ; h_i^b]    (5)
next, the CTC network outputs [ h ] to the bidirectional LSTM network1,h2,...,hx]In the repeated recognition, the character is de-duplicated to be [ y1,y2,...,yn](ii) a Since the bidirectional LSTM basic unit is more than the number n of mobile phone characters, the characters are repeatedly divided, for example, "honor" is divided into "hononorr"; multiple substrings of 'hoonorr' can be mapped into correct result 'honor', as shown in formula (6)
Figure FDA0003453620870000048
The CTC network obtains the final result Y by maximizing the posterior probability P < Y | X > given input X, where P < Y | X > is as shown in equation (7):
Figure FDA0003453620870000049
wherein, pi epsilon B (Y) represents all substring sets which can be integrated into Y;
b) deep forest recognition model submodule
in the deep forest image recognition process for waste electronic products, the character features Z_digit and the image features Z_img are linearly combined to obtain the deep forest input features X_DF, as shown in equation (8):
X_DF = [Z_img, Z_digit]    (8)
first, different random forests are constructed with X_DF and their class-probability outputs are obtained; these outputs are then linearly combined with X_DF and passed as input to the next layer to construct further random forests, and whether to continue building the next layer is decided by the classification accuracy of the current model; finally, model growth ends when the accuracy no longer improves, and the classification results of the final set of random forests are weighted to obtain the final classification result Y_DF;
c) multi-model output weighting submodule
in the heterogeneous multi-model recognition module, the OCR character recognition model output Y_OCR is a continuous character string that is mapped, via the distance measurement, to a particular electronic product brand, while the deep forest recognition model output Y_DF gives the probabilities of all electronic product brands; to resolve the different output forms and possibly inconsistent output results of the heterogeneous models, a multi-model output weighting module is added to the classification model;
the softmax function, also called the normalized exponential function, is a classifier widely used in the supervised learning part of deep networks in current deep learning research; the softmax function is shown in equation (9):
softmax(z)_i = e^{z_i} / Σ_{j=1}^{n+1} e^{z_j}    (9)
wherein n + 1 denotes the dimension of the heterogeneous multi-model output vector and e denotes the base of the natural logarithm; in the classification model, n waste electronic product labels are set, and the OCR character recognition result Y_OCR and the deep forest recognition result Y_DF are linearly spliced into an (n+1)-dimensional result vector, which is used as the input of the softmax function; the label corresponding to the highest weight is taken as the final waste electronic product classification result Ŷ.
CN202111673248.0A 2021-12-31 2021-12-31 Heterogeneous multi-model-based brand recognition method for waste electronic products Active CN114708593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111673248.0A CN114708593B (en) 2021-12-31 2021-12-31 Heterogeneous multi-model-based brand recognition method for waste electronic products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111673248.0A CN114708593B (en) 2021-12-31 2021-12-31 Heterogeneous multi-model-based brand recognition method for waste electronic products

Publications (2)

Publication Number Publication Date
CN114708593A (en) 2022-07-05
CN114708593B CN114708593B (en) 2024-06-14

Family

ID=82167256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111673248.0A Active CN114708593B (en) 2021-12-31 2021-12-31 Heterogeneous multi-model-based brand recognition method for waste electronic products

Country Status (1)

Country Link
CN (1) CN114708593B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021022970A1 (en) * 2019-08-05 2021-02-11 青岛理工大学 Multi-layer random forest-based part recognition method and system
CN111931953A (en) * 2020-07-07 2020-11-13 北京工业大学 Multi-scale characteristic depth forest identification method for waste mobile phones

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王德青; 吾守尔・斯拉木; 许苗苗: "A survey of scene text recognition techniques" (场景文字识别技术研究综述), Computer Engineering and Applications, no. 18, 31 December 2020 (2020-12-31), pages 7-21 *
班晓娟; 宿彦京; 谢建新: "Applications and challenges of deep learning in material microscopic image analysis" (深度学习在材料显微图像分析中的应用与挑战), Materials Science and Technology, no. 03, 31 December 2020 (2020-12-31), pages 74-81 *

Also Published As

Publication number Publication date
CN114708593B (en) 2024-06-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant