CN109978074A - Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning - Google Patents
- Publication number
- CN109978074A (application CN201910272826.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- classification
- aesthetic feeling
- depth
- neural networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present disclosure provides an image aesthetics and emotion joint classification method and system based on deep multi-task learning. The method includes: annotating images with their corresponding aesthetic categories and emotion categories to form a training dataset; constructing a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches; training the deep convolutional neural network on the training dataset until a predefined loss function reaches its minimum; and using the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, selecting the highest-probability aesthetic category and emotion category as the image's predicted aesthetic category and emotion category, respectively.
Description
Technical field
The disclosure belongs to the technical field of computer vision, and in particular relates to an image aesthetics and emotion joint classification method and system based on deep multi-task learning.
Background art
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
With the rapid development of computer vision technology, people expect computers not only to analyze image content at the semantic level, but also to simulate the human visual and cognitive system and develop higher-level perceptual abilities. As two representative tasks in perceptual understanding research, image aesthetic classification and emotion classification aim to enable computers to recognize the aesthetic and emotional responses that humans produce when stimulated by visual images. Image aesthetic classification and emotion classification techniques have already been applied to image storage, editing, retrieval, and other areas. For example, among multiple candidate photos of the same object or scene taken by a user, the most aesthetically pleasing work can be selected for storage and display, reasonably reducing the storage overhead; in the creation and editing of image works, the aesthetic quality of candidate schemes can be analyzed and compared to improve the visual appeal of the work; and in image retrieval systems, the emotional tendency of returned images can be taken into account to provide users with semantically accurate and more evocative search results.
Owing to the diversity of image content and the complexity of human perception, automatic aesthetic and emotion classification of images is a challenging task. In recent years, benefiting from the emergence of large-scale image datasets with aesthetic labels and emotion labels, machine-learning-based methods have been widely adopted. The core step of such methods is to extract image visual features with good discriminative ability for the classification task. Early methods relied mainly on hand-crafted features, which required researchers to have a deep understanding of the problem itself. With the rise of deep learning in the field of computer vision, recent methods mainly use convolutional neural networks to automatically extract features for image aesthetic and emotion classification, and have achieved good results.
The inventors have found that the prior art usually treats image aesthetic classification and emotion classification as two mutually independent tasks. Intuitively, however, human aesthetic and emotional impressions do not arise in isolation; on the contrary, at the level of psychological cognition they should be interrelated and mutually influential. For example, if an image gives a person aesthetic pleasure, it is also likely to arouse positive emotions in the observer. Research in the field of neuroscience likewise shows that human aesthetic experience is a cognitive process that continually evolves together with affective state, and vice versa.
Summary of the invention
To solve the above problems, a first aspect of the present disclosure provides an image aesthetics and emotion joint classification method based on deep multi-task learning. Through a unified deep convolutional neural network framework, information can be shared effectively between the two tasks, achieving joint recognition of an image's aesthetic category and emotion category and improving recognition accuracy and efficiency.
To achieve the above object, the present disclosure adopts the following technical solution:
An image aesthetics and emotion joint classification method based on deep multi-task learning, comprising:

annotating images with their corresponding aesthetic categories and emotion categories to form a training dataset;

constructing a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches, wherein the two network branches are respectively responsible for aesthetic classification and emotion classification of the input image, the cross-branch connection layers connect corresponding convolutional layer groups in the two branches so as to associate the aesthetic classification and emotion classification tasks, and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category;

training the deep convolutional neural network on the training dataset until a predefined loss function reaches its minimum; and

using the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and selecting the highest-probability aesthetic category and emotion category as the image's predicted aesthetic category and emotion category, respectively.
To solve the above problems, a second aspect of the present disclosure provides an image aesthetics and emotion joint classification system based on deep multi-task learning. Through a unified deep convolutional neural network framework, information can be shared effectively between the two tasks, achieving joint recognition of an image's aesthetic category and emotion category and improving recognition accuracy and efficiency.
To achieve the above object, the present disclosure adopts the following technical solution:
An image aesthetics and emotion joint classification system based on deep multi-task learning, comprising:

a training dataset forming module, used to annotate images with their corresponding aesthetic categories and emotion categories to form a training dataset;

a deep convolutional neural network construction module, used to construct a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches, wherein the two network branches are respectively responsible for aesthetic classification and emotion classification of the input image, the cross-branch connection layers connect corresponding convolutional layer groups in the two branches so as to associate the aesthetic classification and emotion classification tasks, and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category;

a deep convolutional neural network training module, used to train the deep convolutional neural network on the training dataset until a predefined loss function reaches its minimum; and

a prediction classification module, used to output, with the trained deep convolutional neural network, the probabilities that a given image belongs to each aesthetic category and each emotion category, and to select the highest-probability aesthetic category and emotion category as the image's predicted aesthetic category and emotion category, respectively.
To solve the above problems, a third aspect of the present disclosure provides a computer-readable storage medium. Through a unified deep convolutional neural network framework, information can be shared effectively between the two tasks, achieving joint recognition of an image's aesthetic category and emotion category and improving recognition accuracy and efficiency.
To achieve the above object, the present disclosure adopts the following technical solution:
A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the steps of the image aesthetics and emotion joint classification method based on deep multi-task learning described above.
To solve the above problems, a fourth aspect of the present disclosure provides a computer device. Through a unified deep convolutional neural network framework, information can be shared effectively between the two tasks, achieving joint recognition of an image's aesthetic category and emotion category and improving recognition accuracy and efficiency.
To achieve the above object, the present disclosure adopts the following technical solution:
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing, when executing the program, the steps of the image aesthetics and emotion joint classification method based on deep multi-task learning described above.
The beneficial effects of the present disclosure are as follows:

The present disclosure applies the idea of multi-task learning to the aesthetic classification and emotion classification of images, making full use of the correlations between the two tasks, and designs a unified deep convolutional neural network framework in which cross-branch connection layers let the network branches share information effectively by exchanging image feature maps and automatically learn, during training, which information each task needs, thereby achieving joint recognition of an image's aesthetic category and emotion category and improving the accuracy of image aesthetic and emotion classification.
Detailed description of the invention
The Figure of description for constituting a part of this disclosure is used to provide further understanding of the disclosure, and the disclosure is shown
Meaning property embodiment and its explanation do not constitute the improper restriction to the disclosure for explaining the disclosure.
Fig. 1 is a flowchart of an image aesthetics and emotion joint classification method based on deep multi-task learning provided by an embodiment of the present disclosure.

Fig. 2 is a schematic diagram of the deep convolutional neural network provided by an embodiment of the present disclosure.

Fig. 3 is a schematic diagram of the cross-branch connection layer provided by an embodiment of the present disclosure.

Fig. 4 is a structural schematic diagram of an image aesthetics and emotion joint classification system based on deep multi-task learning provided by an embodiment of the present disclosure.
Specific embodiment
The disclosure is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the disclosure. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the technical field to which this disclosure belongs.

It should be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the exemplary implementations according to this disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; in addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The image aesthetics and emotion joint classification method based on deep multi-task learning proposed by the present disclosure is elaborated below with reference to Fig. 1.

As shown in Fig. 1, the image aesthetics and emotion joint classification method based on deep multi-task learning of this embodiment comprises:
S101: annotate images with their corresponding aesthetic categories and emotion categories to form a training dataset.

In a specific implementation, for the image aesthetic classification problem, images are divided into two classes, high aesthetics and low aesthetics; for the image emotion classification problem, images are divided into eight basic emotion categories: pleasure, awe, contentment, excitement, anger, disgust, fear, and sadness.
Since a person's aesthetics and emotions are both highly subjective cognitive attributes, there are obvious individual differences. Therefore, for the annotation of an image's aesthetic category and emotion category, a strategy is adopted in which several people jointly annotate the same image, and the category with the highest degree of consensus is then taken as the image's final category.
It should be understood that in other examples the image aesthetic categories and image emotion categories can also be divided differently; those skilled in the art can set them according to the specific situation, and they are not described in detail here.
S102: construct a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches.

The two network branches are respectively responsible for aesthetic classification and emotion classification of the input image; the cross-branch connection layers connect corresponding convolutional layer groups in the two branches so as to associate the aesthetic classification and emotion classification tasks; and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category.
Specifically, in the deep convolutional neural network, the two branches contain the same number n of convolutional layer groups, and there are n-1 cross-branch connection layers. The i-th cross-branch connection layer takes as input the image feature maps output by the i-th convolutional layer groups of the two branches, stacks these feature maps along the channel dimension, and feeds the stacked features to the (i+1)-th convolutional layer groups of the two branches, where 1 ≤ i ≤ n-1 and n is a positive integer greater than or equal to 2.
The deep convolutional neural network of this embodiment is shown in Fig. 2. The network contains two parallel branches that receive the same input image and are respectively responsible for its aesthetic classification and emotion classification. The two branches have identical structures, both based on the VGG16 architecture (see Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014). Each branch consists of 5 convolutional layer groups, 3 fully connected layers, and 1 Softmax layer. A single convolutional layer group contains several consecutive convolutional layers and 1 max-pooling layer, whose purpose is to extract effective image feature maps. The fully connected layers apply multiple nonlinear transformations to the feature maps output by the last convolutional layer group, mapping them to a column vector whose dimension equals the number of aesthetic or emotion categories; each dimension corresponds to one specific aesthetic or emotion category. The final Softmax layer converts each dimension of this vector into a probability value representing the probability that the input image belongs to the corresponding category. The specific structure and parameter settings of each layer in a branch follow the VGG16 network model.
Cross-branch connection layers are introduced to connect corresponding convolutional layer groups in the two branches; their structure is shown in Fig. 3. A cross-branch connection layer takes the feature maps output by two corresponding convolutional layer groups as input and stacks them along the channel dimension. Assuming each feature map has K channels (K a positive integer) before stacking, the stacked feature map has 2K channels. The stacked feature map is then fed into two separate convolutional layers with 1*1 kernels. Each of these layers contains K kernels, with stride 1 and edge padding 0. The two convolutional layers thus output new feature maps whose spatial size is unchanged and whose channel number is restored to K; each new feature map is finally sent to the subsequent convolutional layer group (or fully connected layer) of one of the branches. Intuitively, the cross-branch connection layer lets the two network branches share information by exchanging image feature maps, and helps the model automatically learn, during training, which information each of the two tasks needs.
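For illustration only, the channel stacking and 1*1 convolutions of the cross-branch connection layer can be sketched in NumPy; the patent prescribes the structure but no implementation, so the function name, weight shapes, and values here are assumptions (in practice the weights are learned):

```python
import numpy as np

def cross_branch_connection(feat_a, feat_e, w_a, w_e):
    """Cross-branch connection layer sketch.

    feat_a, feat_e: feature maps of shape (K, H, W) from the i-th
        convolutional layer groups of the aesthetics and emotion branches.
    w_a, w_e: 1x1 convolution weights of shape (K, 2K), one per branch
        (K kernels each, stride 1, no padding).

    Returns the two new (K, H, W) feature maps fed to the (i+1)-th
    convolutional layer groups of the two branches.
    """
    stacked = np.concatenate([feat_a, feat_e], axis=0)   # (2K, H, W)
    # A 1x1 convolution is a linear map over the channel dimension.
    out_a = np.einsum('ok,khw->ohw', w_a, stacked)       # (K, H, W)
    out_e = np.einsum('ok,khw->ohw', w_e, stacked)       # (K, H, W)
    return out_a, out_e

# Shapes: K = 4 channels, 8x8 spatial resolution.
K, H, W = 4, 8, 8
rng = np.random.default_rng(0)
a = rng.normal(size=(K, H, W))
e = rng.normal(size=(K, H, W))
wa = rng.normal(size=(K, 2 * K))
we = rng.normal(size=(K, 2 * K))
na, ne = cross_branch_connection(a, e, wa, we)
print(na.shape, ne.shape)  # (4, 8, 8) (4, 8, 8)
```

Note that the spatial size is unchanged and the channel number is restored to K, exactly as described above.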
In traditional deep multi-task learning methods, different tasks usually share the lower network layers and keep separate branches only in the higher layers. Before multi-task training, the shared layers must be specified manually in advance, based on experience. This practice lacks theoretical guidance, and an unreasonable choice of shared layers may cause a serious drop in performance. Unlike those methods, this embodiment designs a separate branch for each task at all network layers; the cross-branch connection layers let the branches share information by exchanging image feature maps and automatically learn, during training, which information each task needs, thereby improving classification accuracy.
It should be noted that the order of steps S101 and S102 can be adjusted by those skilled in the art according to the specific situation.
S103: train the deep convolutional neural network on the training dataset until the predefined loss function reaches its minimum.
In a specific implementation, the process of training the deep convolutional neural network on the training dataset includes:

unifying the size of all images in the training dataset;

initializing the weights of each layer of the deep convolutional neural network and predefining the loss function; and

training the deep convolutional neural network with the stochastic gradient descent algorithm to determine the network weights that minimize the loss function, where in each training iteration a fixed-size image block is cropped from a random position of the image and flipped horizontally with a certain probability.
During training, first, all training images are scaled to a uniform size; this embodiment scales images to 256*256 pixels. Then, the per-pixel mean of the training images is computed and subtracted from each image; removing this common component highlights the individual differences among the training images. Finally, in each training iteration, a fixed-size image block is cropped from a random position of the mean-subtracted image and flipped horizontally with a certain probability. In this way, the number of training samples is effectively expanded and their diversity is improved. This embodiment uses image blocks of 224*224 pixels, and the probability of performing a horizontal flip each time is 0.5.
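The preprocessing of this embodiment (mean subtraction, random 224*224 crop from a 256*256 image, horizontal flip with probability 0.5) could be sketched as follows; the function name and argument layout are illustrative, not part of the disclosure:

```python
import numpy as np

def augment(image, mean, crop=224, flip_prob=0.5, rng=None):
    """Training-time preprocessing sketch for this embodiment.

    image: (256, 256, 3) array already scaled to the uniform size.
    mean:  per-pixel mean of the training set, same shape as `image`.
    A fixed-size block is cropped at a random position and flipped
    horizontally with the given probability.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = image - mean                       # remove what all images share
    h, w = x.shape[:2]
    top = rng.integers(0, h - crop + 1)    # random crop position
    left = rng.integers(0, w - crop + 1)
    x = x[top:top + crop, left:left + crop]
    if rng.random() < flip_prob:
        x = x[:, ::-1]                     # horizontal flip
    return x

img = np.ones((256, 256, 3))
mean = np.zeros((256, 256, 3))
patch = augment(img, mean, rng=np.random.default_rng(1))
print(patch.shape)  # (224, 224, 3)
```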
Except for the last fully connected layer and the cross-branch connection layers, the weights of each layer in each branch are initialized with the weights of a VGG16 model pre-trained on the ImageNet dataset; the weights of the last fully connected layer and the cross-branch connection layers are initialized randomly. A cross-entropy loss function is used: the loss on aesthetic classification is denoted La and the loss on emotion classification is denoted Le, i.e.,

La = -ya log(pa) - (1 - ya) log(1 - pa)

Le = -Σe ye log(pe)

where ya denotes the true aesthetic category of the input image, taking the value 1 if the image is actually of high aesthetic quality and 0 otherwise; ye denotes the true emotion category of the input image, taking the value 1 if the image actually belongs to the e-th emotion category and 0 otherwise; pa is the network's output probability that the image belongs to the high-aesthetics category; and pe is the network's output probability that the image belongs to the e-th emotion category.
Further, the total loss function is L = La + λLe, where λ is a hyperparameter balancing the two types of loss. In this embodiment, considering that aesthetic classification is a binary classification problem while emotion classification is a multi-class problem, λ is set to 1/4. The network is trained with the stochastic gradient descent algorithm to determine the network weights that minimize the loss function.
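Assuming the emotion loss Le is the standard categorical cross-entropy over the eight emotion categories (as implied by the definitions of ye and pe), the total loss L = La + λLe with λ = 1/4 can be sketched as:

```python
import numpy as np

def joint_loss(y_a, p_a, y_e, p_e, lam=0.25):
    """Total multi-task loss L = La + lambda * Le of this embodiment.

    y_a: 1 if the image is of high aesthetic quality, else 0.
    p_a: predicted probability of the high-aesthetics category.
    y_e: one-hot vector over the eight emotion categories.
    p_e: predicted emotion probability distribution (sums to 1).
    """
    l_a = -(y_a * np.log(p_a) + (1 - y_a) * np.log(1 - p_a))  # binary CE
    l_e = -np.sum(y_e * np.log(p_e))                          # categorical CE
    return l_a + lam * l_e

# A high-aesthetics image whose true emotion is the third category.
y_e = np.zeros(8)
y_e[2] = 1.0
p_e = np.full(8, 0.05)
p_e[2] = 0.65
print(round(joint_loss(1, 0.9, y_e, p_e), 4))  # 0.2131
```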
S104: use the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and select the highest-probability aesthetic category and emotion category as the image's predicted aesthetic category and emotion category, respectively.
In this embodiment, given an image, it is first scaled to 224*224 pixels and then fed into the trained network to obtain the probabilities that it belongs to each aesthetic category and each emotion category; the highest-probability categories are finally chosen as the image's predicted aesthetic category and emotion category.
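The prediction rule of S104 reduces to an argmax over each task's probability vector. A minimal sketch follows; the English category names are illustrative renderings, not fixed by the disclosure:

```python
import numpy as np

AESTHETIC = ["low aesthetics", "high aesthetics"]
EMOTIONS = ["pleasure", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]

def predict(prob_a, prob_e):
    """Pick the highest-probability category for each task."""
    return (AESTHETIC[int(np.argmax(prob_a))],
            EMOTIONS[int(np.argmax(prob_e))])

# Probabilities as output by the two Softmax layers of the network.
pa = np.array([0.3, 0.7])
pe = np.array([0.4, 0.1, 0.2, 0.1, 0.05, 0.05, 0.05, 0.05])
print(predict(pa, pe))  # ('high aesthetics', 'pleasure')
```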
This embodiment applies the idea of multi-task learning to the aesthetic classification and emotion classification of images, making full use of the correlations between the two tasks, and designs a unified deep convolutional neural network framework in which cross-branch connection layers let the network branches share information effectively by exchanging image feature maps and automatically learn, during training, which information each task needs, thereby achieving joint recognition of an image's aesthetic category and emotion category and improving the accuracy of image aesthetic and emotion classification.
The image aesthetics and emotion joint classification system based on deep multi-task learning proposed by the present disclosure is elaborated below with reference to Fig. 4.

As shown in Fig. 4, the image aesthetics and emotion joint classification system based on deep multi-task learning of this embodiment comprises: a training dataset forming module 11, a deep convolutional neural network construction module 12, a deep convolutional neural network training module 13, and a prediction classification module 14.
Wherein:
The training dataset forming module 11 is used to annotate images with their corresponding aesthetic categories and emotion categories to form a training dataset.
In a specific implementation, for the image aesthetic classification problem, images are divided into two classes, high aesthetics and low aesthetics; for the image emotion classification problem, images are divided into eight basic emotion categories: pleasure, awe, contentment, excitement, anger, disgust, fear, and sadness.
Since a person's aesthetics and emotions are both highly subjective cognitive attributes, there are obvious individual differences. Therefore, for the annotation of an image's aesthetic category and emotion category, a strategy is adopted in which several people jointly annotate the same image, and the category with the highest degree of consensus is then taken as the image's final category.
It should be understood that in other examples the image aesthetic categories and image emotion categories can also be divided differently; those skilled in the art can set them according to the specific situation, and they are not described in detail here.
The deep convolutional neural network construction module 12 is used to construct a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches.
The two network branches are respectively responsible for aesthetic classification and emotion classification of the input image; the cross-branch connection layers connect corresponding convolutional layer groups in the two branches so as to associate the aesthetic classification and emotion classification tasks; and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category.
Specifically, in the deep convolutional neural network, the two branches contain the same number n of convolutional layer groups, and there are n-1 cross-branch connection layers. The i-th cross-branch connection layer takes as input the image feature maps output by the i-th convolutional layer groups of the two branches, stacks these feature maps along the channel dimension, and feeds the stacked features to the (i+1)-th convolutional layer groups of the two branches, where 1 ≤ i ≤ n-1 and n is a positive integer greater than or equal to 2.
The deep convolutional neural network of this embodiment is shown in Fig. 2. The network comprises two parallel branches that receive the same input image and are responsible for aesthetic classification and emotion classification, respectively. The two branches share the same structure, based on the VGG16 architecture (see Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014). Each branch consists of 5 convolutional layer groups, 3 fully connected layers, and 1 Softmax layer. Each convolutional layer group contains several consecutive convolutional layers and one max-pooling layer, whose purpose is to extract effective image feature maps. The fully connected layers apply multiple nonlinear transformations to the feature maps output by the last convolutional layer group, mapping them to a column vector. The dimension of this vector equals the number of aesthetic categories or emotion categories, each component corresponding to one specific aesthetic or emotion category. The final Softmax layer converts each component of the vector into a probability value, representing the probability that the input image belongs to the corresponding category. The detailed structure and parameter settings of each layer in a branch follow the VGG16 network model.
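The Softmax mapping just described, which turns the fully connected output vector into per-category probabilities, can be written in a few lines. This is an illustrative numpy sketch, not part of the patent; the score values are made up:

```python
import numpy as np

def softmax(logits):
    """Convert a score vector into a probability distribution.

    Subtracting the maximum first is a standard numerical-stability
    trick and does not change the result.
    """
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Example: 2 aesthetic categories (low/high) scored by one branch's
# last fully connected layer.
scores = np.array([1.0, 3.0])
probs = softmax(scores)
# probs sums to 1; the larger score receives the larger probability.
```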
Cross-branch connection layers are introduced to connect corresponding convolutional layer groups in the two branches; the structure of a cross-branch connection layer is shown in Fig. 3. The layer takes the image feature maps output by two convolutional layer groups as input and stacks them along the channel dimension. Assuming each input feature map has K channels (K a positive integer), the stacked feature map has 2K channels. The stacked feature map is then fed into two separate convolutional layers with 1*1 kernels. Each of these layers contains K kernels, with stride 1 and edge padding 0. Each layer therefore outputs a new feature map whose spatial size is unchanged and whose channel count is restored to K; the two new feature maps are finally sent to the subsequent convolutional layer group (or fully connected layer) of the respective branches. Intuitively, the cross-branch connection layer lets the two branches share information by exchanging image feature maps, helping the model automatically learn during training which information each of the two tasks needs.
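The operation of a cross-branch connection layer (channel-wise stacking followed by two independent 1*1 convolutions with K kernels each) can be sketched as follows. This is an illustrative numpy sketch with random weights; biases, nonlinearities, and the framework actually used are not specified in the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(feat, weight):
    """1*1 convolution with stride 1 and no padding: a per-pixel linear
    map over channels. feat: (C, H, W); weight: (K_out, C)."""
    return np.tensordot(weight, feat, axes=([1], [0]))  # -> (K_out, H, W)

def cross_branch_connect(feat_a, feat_e, w_a, w_e):
    """Cross-branch connection layer: stack the two K-channel feature
    maps along the channel axis (-> 2K channels), then apply two
    independent 1*1 conv layers with K kernels each, restoring K
    channels for each branch."""
    stacked = np.concatenate([feat_a, feat_e], axis=0)   # (2K, H, W)
    return conv1x1(stacked, w_a), conv1x1(stacked, w_e)  # each (K, H, W)

K, H, W = 4, 8, 8
feat_a = rng.standard_normal((K, H, W))   # aesthetic-branch feature map
feat_e = rng.standard_normal((K, H, W))   # emotion-branch feature map
w_a = rng.standard_normal((K, 2 * K))     # K kernels over 2K input channels
w_e = rng.standard_normal((K, 2 * K))
out_a, out_e = cross_branch_connect(feat_a, feat_e, w_a, w_e)
# Spatial size is unchanged and the channel count is restored to K.
```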
In traditional deep multi-task learning methods, different tasks typically share the lower network layers and keep separate branches only in the higher layers. Before multi-task training can begin, the shared network layers must be specified manually in advance, based on experience. This practice lacks theoretical guidance, and an unreasonable choice of shared layers may severely degrade performance. Unlike such methods, this embodiment designs a separate branch for each task across all network layers; the cross-branch connection layers allow the branches to share information by exchanging image feature maps and to automatically learn during training which information each task needs, thereby improving classification accuracy.
The deep convolutional neural network training module 13 is used to train the deep convolutional neural network on the training dataset until the predefined loss function reaches its minimum.
The deep convolutional neural network training module 13 comprises:
a size unification module 131, used to unify the size of all images in the training dataset;
an initialization module 132, used to initialize the weights of each layer of the deep convolutional neural network and the predefined loss function;
an iterative training module 133, used to train the deep convolutional neural network with the stochastic gradient descent algorithm and determine the network weights that minimize the loss function; in each training iteration, an image block of fixed size is cropped from a random position of the image and flipped horizontally with a certain probability.
When training the deep convolutional neural network on the training dataset, all training images are first scaled to a uniform size; in this embodiment, images are scaled to 256*256 pixels. Then, the pixel mean of the training images is computed and subtracted from every image; removing this common component highlights the individual differences among training images. Finally, in each training iteration, a fixed-size image block is cropped from a random position of the mean-subtracted image and flipped horizontally with a certain probability. In this way, the number of training samples is effectively enlarged and their diversity is promoted. This embodiment crops image blocks of 224*224 pixels and performs each horizontal flip with probability 0.5.
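The augmentation pipeline of this embodiment (mean subtraction, a random 224*224 crop of a 256*256 image, and a horizontal flip with probability 0.5) can be sketched as follows. This is an illustrative numpy sketch; the pixel data and mean below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, mean, crop=224, flip_prob=0.5):
    """One training-time augmentation pass as described in this
    embodiment: subtract the dataset pixel mean, crop a fixed-size
    block at a random position, and flip it horizontally with
    probability 0.5."""
    img = image - mean                       # remove the common component
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)      # random crop position
    left = rng.integers(0, w - crop + 1)
    block = img[top:top + crop, left:left + crop]
    if rng.random() < flip_prob:
        block = block[:, ::-1]               # horizontal flip
    return block

# A 256*256 RGB training image (random values stand in for real pixels).
image = rng.random((256, 256, 3))
mean = image.mean(axis=(0, 1))               # per-channel mean, illustrative
block = augment(image, mean)
```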
Except for the last fully connected layer and the cross-branch connection layers, the weights of every layer in each network branch are initialized with the weights of a VGG16 model pre-trained on the ImageNet dataset; the weights of the last fully connected layer and of the cross-branch connection layers are initialized randomly. Cross-entropy loss functions are used: the loss on aesthetic classification is defined as La, and the loss on emotion classification as Le, i.e.

La = -ya log pa - (1 - ya) log(1 - pa)

Le = -Σe ye log pe

where ya denotes the true aesthetic category of the input image, taking the value 1 if the image is actually a high-aesthetic image and 0 otherwise; ye denotes the true emotion category of the input image, taking the value 1 if the image actually belongs to the e-th emotion category and 0 otherwise; pa is the probability output by the network that the image belongs to the high-aesthetic category, and pe is the probability output by the network that the image belongs to the e-th emotion category.

Further, the total loss function is L = La + λLe, where λ is a hyperparameter that balances the two losses of the model. In this embodiment, considering that aesthetic classification is a binary classification problem while emotion classification is a multi-class problem, λ is set to 1/4. The network is trained with the stochastic gradient descent algorithm to determine the network weights that minimize the loss function.
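Under these definitions, the combined loss L = La + λLe can be computed as in the following sketch. This is illustrative numpy code: the multi-class form of Le and the example probabilities are assumptions, since the source text spells out only La explicitly:

```python
import numpy as np

def aesthetic_loss(y_a, p_a):
    """Binary cross-entropy La for the two-class aesthetic task."""
    return -y_a * np.log(p_a) - (1 - y_a) * np.log(1 - p_a)

def emotion_loss(y_e, p_e):
    """Multi-class cross-entropy -sum_e y_e log p_e with y_e one-hot
    (a standard form implied by the definitions of y_e and p_e)."""
    return -np.sum(y_e * np.log(p_e))

def total_loss(y_a, p_a, y_e, p_e, lam=0.25):
    """L = La + lambda * Le, with lambda = 1/4 in this embodiment."""
    return aesthetic_loss(y_a, p_a) + lam * emotion_loss(y_e, p_e)

# A high-aesthetic image (y_a = 1) belonging to emotion category 2 of 8.
y_a, p_a = 1, 0.9
y_e = np.zeros(8); y_e[2] = 1.0
p_e = np.full(8, 0.05); p_e[2] = 0.65     # probabilities sum to 1
loss = total_loss(y_a, p_a, y_e, p_e)
```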
The prediction classification module 14 uses the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and selects the category with the highest probability among the aesthetic categories and among the emotion categories as the predicted aesthetic category and emotion category of the given image, respectively.
In this embodiment, a given image is first scaled to 224*224 pixels and then fed into the trained network to obtain the probabilities that it belongs to each aesthetic category and each emotion category; finally, the categories with the highest probabilities are selected as the predicted aesthetic category and emotion category of the image.
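The final prediction step, choosing the highest-probability category within each task, reduces to an argmax over each branch's output. The probabilities below are made up for illustration:

```python
import numpy as np

def predict(p_aesthetic, p_emotion):
    """Pick the highest-probability aesthetic and emotion category for
    a given image from the two branches' Softmax outputs."""
    return int(np.argmax(p_aesthetic)), int(np.argmax(p_emotion))

# Hypothetical network outputs for one image: 2 aesthetic categories
# (0 = low, 1 = high) and 8 emotion categories.
p_aesthetic = np.array([0.3, 0.7])
p_emotion = np.array([0.05, 0.1, 0.4, 0.1, 0.1, 0.1, 0.1, 0.05])
aesthetic_class, emotion_class = predict(p_aesthetic, p_emotion)
```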
This embodiment applies the idea of multi-task learning to the aesthetic classification and emotion classification of images, making full use of the correlated features between the two tasks. It designs a unified deep convolutional neural network framework in which cross-branch connection layers allow the network branches to effectively share information by exchanging image feature maps and to automatically learn during training which information each task needs, realizing joint recognition of the aesthetic category and emotion category of an image and improving the accuracy of both classifications.
In another embodiment, a computer-readable storage medium is provided on which a computer program is stored; when executed by a processor, the program implements the steps of the image aesthetic and emotion joint classification method based on deep multi-task learning shown in Fig. 1.
In another embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the image aesthetic and emotion joint classification method based on deep multi-task learning shown in Fig. 1.
Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including a command device, which realizes the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above method embodiments can be completed by instructing relevant hardware through a computer program; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The foregoing are merely preferred embodiments of the present disclosure and are not intended to limit the disclosure; for those skilled in the art, the disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the disclosure shall be included within the protection scope of the disclosure.
Claims (10)
1. An image aesthetic and emotion joint classification method based on deep multi-task learning, characterized by comprising:
labeling each image with its corresponding aesthetic category and emotion category to form a training dataset;
constructing a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches;
wherein the two network branches are responsible for aesthetic classification and emotion classification of the input image, respectively; the cross-branch connection layers are used to connect corresponding convolutional layer groups in the two branches so as to associate the aesthetic classification and emotion classification tasks; and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category;
training the deep convolutional neural network with the training dataset until a predefined loss function reaches its minimum; and
using the trained deep convolutional neural network to output the probabilities that a given image belongs to each aesthetic category and each emotion category, and selecting the category with the highest probability among the aesthetic categories and among the emotion categories as the predicted aesthetic category and emotion category of the given image, respectively.
2. The image aesthetic and emotion joint classification method based on deep multi-task learning of claim 1, characterized in that, in the deep convolutional neural network, the two network branches contain the same number n of convolutional layer groups and there are n-1 cross-branch connection layers; the i-th cross-branch connection layer takes as input the image feature maps output by the i-th convolutional layer group of each of the two branches, stacks these input feature maps along the channel dimension, and feeds the stacked image features into the (i+1)-th convolutional layer group of each of the two branches, respectively; 1≤i≤n-1; n is a positive integer greater than or equal to 2.
3. The image aesthetic and emotion joint classification method based on deep multi-task learning of claim 2, characterized in that each convolutional layer group comprises one max-pooling layer and at least two consecutive convolutional layers.
4. The image aesthetic and emotion joint classification method based on deep multi-task learning of claim 1, characterized in that the process of training the deep convolutional neural network with the training dataset comprises:
unifying the size of all images in the training dataset;
initializing the weights of each layer of the deep convolutional neural network and the predefined loss function; and
training the deep convolutional neural network with the stochastic gradient descent algorithm to determine the network weights that minimize the loss function, wherein in each training iteration an image block of fixed size is cropped from a random position of the image and flipped horizontally with a certain probability.
5. An image aesthetic and emotion joint classification system based on deep multi-task learning, characterized by comprising:
a training dataset formation module, used to label each image with its corresponding aesthetic category and emotion category to form a training dataset;
a deep convolutional neural network construction module, used to construct a deep convolutional neural network comprising cross-branch connection layers and two parallel network branches;
wherein the two network branches are responsible for aesthetic classification and emotion classification of the input image, respectively; the cross-branch connection layers are used to connect corresponding convolutional layer groups in the two branches so as to associate the aesthetic classification and emotion classification tasks; and the output of the deep convolutional neural network represents the probabilities that the input image belongs to each aesthetic category and each emotion category;
a deep convolutional neural network training module, used to train the deep convolutional neural network with the training dataset until a predefined loss function reaches its minimum; and
a prediction classification module, used to output, with the trained deep convolutional neural network, the probabilities that a given image belongs to each aesthetic category and each emotion category, and to select the category with the highest probability among the aesthetic categories and among the emotion categories as the predicted aesthetic category and emotion category of the given image, respectively.
6. The image aesthetic and emotion joint classification system based on deep multi-task learning of claim 5, characterized in that, in the deep convolutional neural network, the two network branches contain the same number n of convolutional layer groups and there are n-1 cross-branch connection layers; the i-th cross-branch connection layer takes as input the image feature maps output by the i-th convolutional layer group of each of the two branches, stacks these input feature maps along the channel dimension, and feeds the stacked image features into the (i+1)-th convolutional layer group of each of the two branches, respectively; 1≤i≤n-1; n is a positive integer greater than or equal to 2.
7. The image aesthetic and emotion joint classification system based on deep multi-task learning of claim 6, characterized in that each convolutional layer group comprises one max-pooling layer and at least two consecutive convolutional layers.
8. The image aesthetic and emotion joint classification system based on deep multi-task learning of claim 5, characterized in that the deep convolutional neural network training module comprises:
a size unification module, used to unify the size of all images in the training dataset;
an initialization module, used to initialize the weights of each layer of the deep convolutional neural network and the predefined loss function; and
an iterative training module, used to train the deep convolutional neural network with the stochastic gradient descent algorithm and determine the network weights that minimize the loss function, wherein in each training iteration an image block of fixed size is cropped from a random position of the image and flipped horizontally with a certain probability.
9. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the steps of the image aesthetic and emotion joint classification method based on deep multi-task learning of any one of claims 1-4.
10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when executing the program, the processor implements the steps of the image aesthetic and emotion joint classification method based on deep multi-task learning of any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910272826.6A CN109978074A (en) | 2019-04-04 | 2019-04-04 | Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910272826.6A CN109978074A (en) | 2019-04-04 | 2019-04-04 | Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109978074A true CN109978074A (en) | 2019-07-05 |
Family
ID=67083180
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910272826.6A Pending CN109978074A (en) | 2019-04-04 | 2019-04-04 | Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109978074A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401294A (en) * | 2020-03-27 | 2020-07-10 | 山东财经大学 | Multitask face attribute classification method and system based on self-adaptive feature fusion |
CN111523574A (en) * | 2020-04-13 | 2020-08-11 | 云南大学 | Image emotion recognition method and system based on multi-mode data |
CN112668638A (en) * | 2020-12-25 | 2021-04-16 | 山东大学 | Image aesthetic quality evaluation and semantic recognition combined classification method and system |
CN113065571A (en) * | 2019-12-16 | 2021-07-02 | 北京沃东天骏信息技术有限公司 | Method and device for constructing training data set |
CN117315313A (en) * | 2022-03-30 | 2023-12-29 | 北京百度网讯科技有限公司 | Multitasking recognition method, training device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106127780A (en) * | 2016-06-28 | 2016-11-16 | 华南理工大学 | A kind of curved surface defect automatic testing method and device thereof |
CN106354768A (en) * | 2016-08-18 | 2017-01-25 | 向莉妮 | Matching method for users and commodities and commodity matching recommendation method based on color |
CN107103590A (en) * | 2017-03-22 | 2017-08-29 | 华南理工大学 | A kind of image for resisting generation network based on depth convolution reflects minimizing technology |
CN107578436A (en) * | 2017-08-02 | 2018-01-12 | 南京邮电大学 | A kind of monocular image depth estimation method based on full convolutional neural networks FCN |
CN108427920A (en) * | 2018-02-26 | 2018-08-21 | 杭州电子科技大学 | A kind of land and sea border defense object detection method based on deep learning |
CN108898105A (en) * | 2018-06-29 | 2018-11-27 | 成都大学 | It is a kind of based on depth characteristic and it is sparse compression classification face identification method |
CN109120992A (en) * | 2018-09-13 | 2019-01-01 | 北京金山安全软件有限公司 | Video generation method and device, electronic equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
YUAN GAO et al.: "NDDR-CNN: Layer-wise Feature Fusing in Multi-Task CNN by Neural Discriminative Dimensionality Reduction", arXiv:1801.08297v1 * |
YANG WENYA et al.: "Image aesthetic quality assessment method based on semantic perception", Journal of Computer Applications * |
WANG SHANNA: "Research on fabric aesthetic classification and emotion annotation based on convolutional neural networks", China Master's Theses Full-text Database, Engineering Science and Technology I * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109978074A (en) | Image aesthetic and emotion joint classification method and system based on deep multi-task learning | |
CN107610123A (en) | Image aesthetic quality evaluation method based on deep convolutional neural networks | |
CN109359538A (en) | Training method for convolutional neural networks, gesture recognition method, device and equipment | |
CN104933428B (en) | Face recognition method and device based on tensor description | |
CN107016415B (en) | Color-semantic classification method for color images based on fully convolutional networks | |
CN108961245A (en) | Image quality classification method based on a two-channel deep parallel convolutional network | |
CN107742107A (en) | Facial image classification method, device and server | |
CN108875934A (en) | Neural network training method, device, system and storage medium | |
CN107341506A (en) | Image emotion classification method based on multi-aspect deep learning representations | |
CN109325443A (en) | Face attribute recognition method based on multi-instance multi-label deep transfer learning | |
CN105956150B (en) | Method and device for generating user hairstyle and outfit collocation suggestions | |
CN102156885B (en) | Image classification method based on cascaded codebook generation | |
CN105512676A (en) | Food recognition method for intelligent terminals | |
CN109145871A (en) | Psychology and behavior recognition method, device and storage medium | |
CN105469376A (en) | Method and device for determining picture similarity | |
CN109766465A (en) | Image-text fusion book recommendation method based on machine learning | |
CN110689523A (en) | Personalized image information evaluation method based on meta-learning and information data processing terminal | |
CN108596243A (en) | Eye-movement gaze map prediction method based on classified gaze maps and conditional random fields | |
CN109377441A (en) | Tongue image acquisition method and system with privacy protection function | |
CN110059656A (en) | Leukocyte classification method and system based on convolutional generative adversarial networks | |
CN108875693A (en) | Image processing method, device, electronic equipment and storage medium | |
CN110263822A (en) | Image emotion analysis method based on multi-task learning | |
CN109376683A (en) | Video classification method and system based on dense graphs | |
CN109359610A (en) | Method and system for constructing a CNN-GB model, and data feature classification method | |
CN110163145A (en) | Video teaching emotion feedback system based on convolutional neural networks | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||