CN116975779A - Neural network-based oral panoramic film feature recognition method, system and terminal - Google Patents

Neural network-based oral panoramic film feature recognition method, system and terminal

Info

Publication number
CN116975779A
Authority
CN
China
Prior art keywords
oral cavity
text
features
neural network
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310954386.9A
Other languages
Chinese (zh)
Inventor
张聪
王旭
刘曼
温梦娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Polytechnic
Original Assignee
Shenzhen Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Polytechnic filed Critical Shenzhen Polytechnic
Priority to CN202310954386.9A priority Critical patent/CN116975779A/en
Publication of CN116975779A publication Critical patent/CN116975779A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • G06V2201/033Recognition of patterns in medical or anatomical images of skeletal patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a neural network-based method, system and terminal for recognizing features of oral panoramic films, and relates to the technical fields of artificial-intelligence deep learning and oral panoramic radiography. An image encoder extracts image features from the oral panoramic film and a text encoder extracts text features from the corresponding oral state description text; the image features and the text features are then fused to obtain fusion features. Based on the fusion features, an interpretation text and an image segmentation result of the fusion features are obtained, and the feature recognition result of the oral panoramic film is obtained from the interpretation text and the image segmentation result. By exploiting the synergy between the interpretation text and the image segmentation result, the scheme improves the accuracy, effectiveness and comprehensiveness of the extracted feature recognition result of the oral panoramic film.

Description

Neural network-based oral panoramic film feature recognition method, system and terminal
Technical Field
The invention relates to the technical fields of artificial-intelligence deep learning and oral panoramic radiography, and in particular to a neural network-based oral panoramic film feature recognition method, system and terminal.
Background
An oral panoramic film captures the entire oral cavity in a single exposure by applying the narrow-slit, circular-arc trajectory tomography principle. Oral panoramic films are now widely used to assist oral diagnosis and treatment, and deep neural networks have become an auxiliary means for diagnosing oral diseases, enabling tasks such as caries recognition and the detection of apical lesions.
Although deep neural networks have been used to extract features from oral panoramic films, the existing technology performs poorly at identifying feature regions and extracting features under complex conditions such as missing teeth and artificial fillings, so the accuracy of the resulting oral health detection is low.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a neural network-based method, system and terminal for recognizing features of oral panoramic films, so as to solve the problem in the prior art that the accuracy of oral health detection results is low.
In order to achieve the above object, a first aspect of the present invention provides a neural network-based oral panoramic film feature recognition method, including:
collecting an oral panoramic film and an oral state description text corresponding to the oral panoramic film;
extracting image features from the oral panoramic film using a trained image encoder, and extracting text features from the oral state description text using a trained text encoder;
fusing the image features and the text features to obtain fusion features;
obtaining an interpretation text and an image segmentation result of the fusion features based on the fusion features; and obtaining a feature recognition result of the oral panoramic film based on the interpretation text and the image segmentation result.
Optionally, the training process of the trained image encoder includes:
collecting a plurality of oral panoramic films to construct an oral panoramic film sample set;
constructing an image encoder model;
inputting the oral panoramic film sample set into the image encoder model, extracting image features of the oral panoramic films in the sample set, calculating an image encoder loss from the image features based on a preset image encoder loss function, and optimizing the image encoder model according to the image encoder loss to obtain the trained image encoder.
Optionally, fusing the image features and the text features to obtain fusion features includes:
fusing the image features and the text features using a trained fusion neural network to obtain the fusion features, wherein the training process of the trained fusion neural network includes the following steps:
collecting a plurality of oral panoramic films and a plurality of oral state description texts to construct an oral cavity feature sample set;
constructing a fused neural network model;
inputting the oral cavity feature sample set into the fusion neural network model, extracting oral cavity features in the oral cavity feature sample set, calculating fusion neural network loss according to the oral cavity features based on a preset fusion neural network loss function, and optimizing the fusion neural network model according to the fusion neural network loss to obtain a trained fusion neural network.
Optionally, obtaining an interpretation text and an image segmentation result of the fusion features based on the fusion features includes:
decoding the fusion features by using a trained text decoder to obtain interpretation text of the fusion features;
and dividing the fusion features by adopting a trained image dividing network to obtain a picture dividing result.
Optionally, the training process of the text decoder includes:
constructing a text decoder model;
inputting the fusion characteristics into the text decoder model, extracting an interpretation text in the fusion characteristics, calculating text decoder loss according to the interpretation text based on a preset text decoder loss function, and optimizing the text decoder model according to the text decoder loss to obtain a trained text decoder.
Optionally, the method further comprises:
and calculating the characteristic index parameters of the oral cavity by using the characteristic identification result of the oral cavity full-scene.
A second aspect of the present invention provides a neural network-based oral panoramic film feature recognition system, the system comprising:
a data acquisition module, used for acquiring the oral panoramic film and the oral state description text corresponding to the oral panoramic film;
a feature extraction module, used for extracting image features from the oral panoramic film using a trained image encoder and extracting text features from the oral state description text using a trained text encoder;
a feature fusion module, used for fusing the image features and the text features to obtain fusion features;
a feature recognition module, used for obtaining an interpretation text and an image segmentation result of the fusion features based on the fusion features, and obtaining the feature recognition result of the oral panoramic film based on the interpretation text and the image segmentation result.
Optionally, the system further comprises:
an oral feature index extraction module, used for calculating oral feature index parameters using the feature recognition result of the oral panoramic film.
A third aspect of the present invention provides an intelligent terminal, which includes a memory, a processor, and a neural network-based oral panoramic film feature recognition program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of any one of the neural network-based oral panoramic film feature recognition methods described above.
A fourth aspect of the present invention provides a computer-readable storage medium on which a neural network-based oral panoramic film feature recognition program is stored; when executed by a processor, the program implements the steps of any one of the above neural network-based oral panoramic film feature recognition methods.
Compared with the prior art, the beneficial effects of this scheme are as follows:
First, a trained image encoder extracts image features from the collected oral panoramic film, and a trained text encoder extracts text features from the collected oral state description text, so that both image features and text features serve as data features for oral panoramic film feature extraction. The image features and text features are then fused to obtain fusion features, allowing the text features to supplement and refine the image features. Based on the fusion features, an interpretation text and an image segmentation result are obtained, which verify and supplement each other, so that the interpretation text serves as a strong explanation of and supplementary description for the feature recognition result of the oral panoramic film, thereby improving the accuracy, effectiveness and comprehensiveness of the extracted feature recognition result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of the neural network-based oral panoramic film feature recognition method provided by the present invention;
FIG. 2 is a schematic diagram of an example of the neural network-based oral panoramic film feature recognition method;
FIG. 3 is a schematic structural diagram of the neural network-based oral panoramic film feature recognition system provided by the present invention;
FIG. 4 is a schematic structural diagram of the intelligent terminal provided by the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The following description of the embodiments of the present invention will be made more fully hereinafter with reference to the accompanying drawings, in which embodiments of the invention are shown, it being evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
The multimodal deep neural network-based oral panoramic film feature recognition method of the present invention mainly uses a multimodal deep neural network to fuse two data types, text and images, into a multimodal data set composed of fusion features, and then uses the multimodal deep neural network to recognize the various image feature data and text feature data related to the oral cavity. The text feature data mainly assists in recognizing the image feature data contained in the oral panoramic film, thereby improving the accuracy and comprehensiveness of extracting effective features from the oral panoramic film and making the detection of oral health conditions more accurate and comprehensive.
Exemplary method
The embodiment of the invention provides a neural network-based oral panoramic film feature recognition method, which is deployed on electronic equipment such as computers and servers. Its application scenario is oral disease diagnosis, covering the diagnosis and treatment of oral diseases such as structural abnormalities of the oral cavity (e.g. missing teeth, artificial fillings), periodontal tissue abnormalities (e.g. osteoporosis, infection, fracture), orthodontics, dental implants and other oral surgery planning. Specifically, as shown in fig. 1 and fig. 2, the method of this embodiment includes the following steps:
step S100: and collecting the oral cavity full view sheet and an oral cavity state description text corresponding to the oral cavity full view sheet.
Specifically, a plurality of oral cavity full scenery patches and oral cavity state description texts corresponding to the oral cavity full scenery patches one by one are collected, and each oral cavity full scenery patch and the corresponding oral cavity state description text are ensured to have the same dimension respectively.
Further, as another preferred embodiment, the oral panoramic film may also be preprocessed, for example by filtering and denoising, resizing the image to a fixed size, normalizing the pixel value range, and adjusting image contrast and brightness, so that every image has the same size and resolution; the preprocessed oral panoramic film improves the accuracy of subsequent analysis.
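A minimal preprocessing sketch is given below, assuming the OpenCV and NumPy libraries are available; the file path handling, target size and filter choices are illustrative assumptions rather than values prescribed by this embodiment.

```python
# Hypothetical preprocessing of an oral panoramic film (assumed: OpenCV, NumPy).
import cv2
import numpy as np

def preprocess_panoramic(path, size=(512, 256)):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # load the panoramic film
    img = cv2.medianBlur(img, 3)                  # simple filtering / denoising
    img = cv2.equalizeHist(img)                   # adjust contrast
    img = cv2.resize(img, size)                   # fixed size and resolution
    return img.astype(np.float32) / 255.0         # normalize the pixel value range
```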
Step S200: extract image features from the oral panoramic film using the trained image encoder, and extract text features from the oral state description text using the trained text encoder.
Specifically, a plurality of oral panoramic films are collected to construct an oral panoramic film sample set, and an image encoder model is built, for example a convolutional neural network (CNN) or a deep learning model such as VGG, ResNet or Inception. The oral panoramic film sample set is input into the image encoder model to extract the image features of the oral panoramic films in the sample set; the image encoder loss is calculated from the image features based on a preset image encoder loss function, and the image encoder model is optimized according to this loss to obtain the trained image encoder. The oral panoramic film is then input into the trained image encoder to extract image features, where the image features of the oral panoramic film refer to known image features contained in the panoramic films of the sample set. In this embodiment the image features are represented by the image feature vector F1.
The effect of image feature extraction may be affected by the choice of image encoder and its parameter settings, which can be selected and tuned according to the actual task requirements to obtain the best image feature representation. Usable image encoder models include, but are not limited to, convolutional neural networks (CNNs) and deep learning models such as VGG, ResNet or Inception.
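The following sketch illustrates one possible way to obtain the image feature vector F1 with a ResNet backbone, assuming PyTorch and torchvision are available; the backbone choice and the 512-dimensional feature size are illustrative assumptions.

```python
# Illustrative image feature extraction with a ResNet-18 backbone (assumed: PyTorch, torchvision).
import torch
import torchvision.models as models

backbone = models.resnet18()           # untrained backbone; trained as described above
backbone.fc = torch.nn.Identity()      # drop the classifier head, keep pooled features

def extract_image_features(batch):     # batch: (N, 3, H, W) float tensor
    backbone.eval()
    with torch.no_grad():
        return backbone(batch)         # F1: (N, 512) image feature vectors
```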
Likewise, a plurality of oral state description texts are collected to construct an oral state description text sample set, and a text encoder model is built. The oral state description text sample set is input into the text encoder model to extract the oral state description texts in the sample set; the text encoder loss is calculated from the description texts based on a preset text encoder loss function, and the text encoder model is optimized according to this loss to obtain the trained text encoder. The oral state description text, which refers to known text features contained in the oral state description text sample set, is then input into the trained text encoder to extract text features. In this embodiment the text features are represented by the text feature vector F2, and F2 is converted so that it has the same dimension as the image feature vector F1.
The effect of text feature extraction may be affected by the choice of text encoder and its parameter settings, which can be selected and tuned according to the actual task requirements to obtain the best text feature representation. Usable text encoder models include, but are not limited to, BERT, RoBERTa, ALBERT, NEZHA, XLNet, ERNIE and the like.
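The following sketch illustrates one possible way to obtain the text feature vector F2 with a BERT-style encoder and project it to the dimension of F1, assuming the Hugging Face transformers library; the checkpoint name and the dimensions are illustrative assumptions.

```python
# Illustrative text feature extraction with a BERT-style encoder (assumed: transformers, PyTorch).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # checkpoint name is an assumption
text_encoder = AutoModel.from_pretrained("bert-base-chinese")
proj = torch.nn.Linear(768, 512)       # project F2 to the same dimension as F1

def extract_text_features(texts):
    tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        cls = text_encoder(**tokens).last_hidden_state[:, 0]  # [CLS] vector per text
    return proj(cls)                   # F2: (N, 512) text feature vectors
```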
Further, as other preferred embodiments, the extracted image features and text features may be subjected to dimension reduction processing by using dimension reduction technology (such as principal component analysis, independent component analysis, high correlation filtering, forward feature selection, etc.), or the features may be subjected to classification processing by using a clustering algorithm.
Step S300: fuse the image features and the text features to obtain fusion features.
Specifically, the fusion of the image features and the text features is achieved by a trained fusion neural network, whose training process is as follows:
a plurality of oral panoramic films and a plurality of oral state description texts are collected to construct an oral cavity feature sample set; a fusion neural network model is built; the oral cavity feature sample set is input into the fusion neural network model to extract the oral cavity features in the sample set, the fusion neural network loss is calculated from the oral cavity features based on a preset fusion neural network loss function, and the fusion neural network model is optimized according to this loss to obtain the trained fusion neural network. The extracted image features and text features are then input into the trained fusion neural network to generate the fusion features. In this embodiment the fusion features are represented as a fusion feature vector or fusion feature matrix.
Further, as other preferred embodiments, according to the complexity of the task and the actual application conditions such as the scale of the data set, an appropriate evaluation index such as accuracy, mean square error, etc. may be selected, and the trained fusion neural network is evaluated by using the verification set or the test set, and optimized according to the evaluation result.
In this embodiment, features of different modalities are fused by constructing a fusion neural network. As other preferred embodiments, the fusion of features from different modalities may also be achieved by linear superposition, concatenation, attention mechanisms and the like, selected according to the actual task requirements. When an attention mechanism is used to fuse the text features and the image features, the self-attention module of a Transformer exchanges information between the two modalities, i.e. the image features and the text features, at an intermediate layer. For example, a 4-dimensional hidden vector performs self-attention with the feature vectors of the two modalities respectively to realize the information exchange.
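A minimal fusion sketch is shown below, assuming simple concatenation followed by a small multilayer perceptron; the attention-based exchange described above could replace the linear layers, and all dimensions are illustrative assumptions.

```python
# Illustrative fusion of image and text features by concatenation plus an MLP (assumed: PyTorch).
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, dim=512, fused_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, fused_dim),   # concatenated F1 and F2
            nn.ReLU(),
            nn.Linear(fused_dim, fused_dim),
        )

    def forward(self, f1, f2):               # f1, f2: (N, dim)
        return self.mlp(torch.cat([f1, f2], dim=-1))  # fusion feature vector
```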
It should be noted that the way features are fused and the design of the network architecture can affect the expressive power of the fusion features; some experimentation and tuning may be required to obtain the best fusion feature representation, depending on the needs of the specific task.
The oral panoramic films and oral state description texts used to train the models in this embodiment are complete oral examination or diagnosis data obtained from existing clinical practice.
For example, a panoramic X-ray machine or similar device generates a set of oral panoramic films as the oral panoramic film data set, and every image is ensured to have the same size and resolution. An oral panoramic film contains rich detail of the oral cavity and jaws, such as the full dentition, jaw structure and positional relationships (e.g. tumor cells, missing teeth, artificial fillings, jaw height, density and shape) and periodontal tissue conditions (e.g. osteoporosis, infection, fracture), providing reference information for oral disease diagnosis, tooth analysis or other oral surgery planning.
Based on the text of the existing diagnosis medical record and the like, a group of oral state description text is generated and used as an oral state description sample set.
Step S400: based on the fusion features, obtain the interpretation text and the image segmentation result of the fusion features; and obtain the feature recognition result of the oral panoramic film based on the interpretation text and the image segmentation result.
Specifically, this includes step S410 for obtaining the interpretation text of the fusion features, step S420 for obtaining the image segmentation result, and step S430 for obtaining the feature recognition result of the oral panoramic film, as follows:
step S410: decoding the fusion characteristics by adopting a trained text decoder to obtain interpretation text of the fusion characteristics;
specifically, a text decoder model is built, fusion features are input into the text decoder model, interpretation texts in the fusion features are extracted, text decoder loss is calculated according to the interpretation texts based on a preset text decoder loss function, the text decoder model is optimized according to the text decoder loss, and a trained text decoder is obtained. The generated fusion features are then input into a trained text decoder, which generates interpreted text associated with the fusion features.
The accuracy of the interpretation text may be affected by the choice of text decoder and its parameter settings, which can be selected and tuned according to the actual task requirements to obtain the best interpretation text related to the fusion features. Common text decoder models are available for selection, such as BERT, GRU, and even ChatGPT.
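The following sketch illustrates a GRU-based text decoder that conditions generation on the fusion feature vector; the vocabulary size and dimensions are illustrative assumptions, not values given in this embodiment.

```python
# Illustrative GRU text decoder conditioned on the fusion feature vector (assumed: PyTorch).
import torch
import torch.nn as nn

class TextDecoder(nn.Module):
    def __init__(self, vocab_size=8000, dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, fused, tokens):   # fused: (N, dim), tokens: (N, T) token ids
        h0 = fused.unsqueeze(0)         # fusion feature initializes the hidden state
        hidden, _ = self.gru(self.embed(tokens), h0)
        return self.out(hidden)         # per-step vocabulary logits for the interpretation text
```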
Step S420: segment the fusion features using the trained image segmentation network to obtain the image segmentation result.
Specifically, an image segmentation network model such as U-Net, FCN (fully convolutional network) or DeepLab is constructed and trained with the fusion features: the fusion features are input into the image segmentation network model to extract the image features contained in the fusion features, the image segmentation network loss is calculated from these extracted features based on a preset image segmentation network loss function, and the image segmentation network model is optimized according to this loss to obtain the trained image segmentation network.
During training, a segmentation loss function (such as the cross-entropy loss) measures the difference between the network output and the labels, and an optimization algorithm such as gradient descent updates the network parameters, so that the image segmentation network learns with accurate segmentation of the existing oral panoramic film samples as the training target. The trained image segmentation network then maps the input image features back to the original image size and generates the segmentation result.
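A minimal training-step sketch for the segmentation branch is given below, assuming a U-Net-style model named seg_net already exists and that the masks are integer class labels; the optimizer and learning rate are illustrative assumptions.

```python
# Illustrative training step for the segmentation branch (assumed: PyTorch and a model `seg_net`).
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                            # segmentation (cross-entropy) loss
optimizer = torch.optim.Adam(seg_net.parameters(), lr=1e-4)  # `seg_net` is assumed to exist

def train_step(inputs, masks):                  # masks: (N, H, W) integer class ids
    logits = seg_net(inputs)                    # (N, C, H, W) per-pixel class scores
    loss = criterion(logits, masks)             # difference between network output and labels
    optimizer.zero_grad()
    loss.backward()                             # gradient-descent style parameter update
    optimizer.step()
    return loss.item()
```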
It should be stated that in practical applications, a large amount of training data and computing resources are often required to support training, so as to ensure accuracy of the segmentation result.
Step S430: obtain the feature recognition result of the oral panoramic film based on the interpretation text and the image segmentation result.
Specifically, an image feature recognition result is obtained from the image segmentation result;
the feature recognition result of the oral panoramic film is then obtained from the interpretation text and the image feature recognition result, so that the interpretation text further refines the image feature recognition result and the accuracy, effectiveness and comprehensiveness of the feature recognition result of the oral panoramic film are improved.
In this embodiment, in the process of training each neural network model, the loss functions that may be selected include a cross entropy loss function, a 0-1 loss function, an absolute value loss function, a logarithmic loss function, a square loss function, and the like, which may be specifically selected according to the actual application needs.
Further, as another preferred embodiment, the feature recognition result of the oral panoramic film may be used to calculate oral feature index parameters. For example, indexes such as the distance, angle and symmetry between teeth are calculated from the image segmentation result to evaluate whether the number and arrangement of teeth are normal; dental lesions such as caries, periodontitis and facial tumors are detected by analysing the morphological characteristics of the teeth and periodontal tissue; and corresponding medical indexes such as bone density and fracture are calculated by analysing the bone characteristics in the segmented image.
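An illustrative sketch of computing simple index parameters from a labelled segmentation mask is given below; the tooth labels, the centroid-based distance measure and the millimetre scale are assumptions made for demonstration only.

```python
# Illustrative index parameters from a labelled segmentation mask (assumed: NumPy; labels are hypothetical).
import numpy as np

def tooth_centroid(mask, label):
    ys, xs = np.nonzero(mask == label)         # pixels belonging to one tooth label
    return np.array([xs.mean(), ys.mean()])

def inter_tooth_distance(mask, label_a, label_b, mm_per_pixel=0.1):
    d = np.linalg.norm(tooth_centroid(mask, label_a) - tooth_centroid(mask, label_b))
    return d * mm_per_pixel                    # convert the pixel distance to millimetres
```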
For example, for the oral panoramic film shown in fig. 2, whose corresponding oral state description text states that the maxilla lacks one tooth, the interpretation text related to the fusion features obtained after processing through steps S100 to S400 is that the maxilla lacks one tooth on the left side, which verifies the effectiveness of the method of the present invention.
As another example, for a known oral panoramic film (not shown) whose corresponding oral state description text is "gingival atrophy and oral cyst", the interpretation text related to the fusion features obtained after steps S100 to S400 is that gingival atrophy occurs on the right side of the maxilla and a 0.5 mm oral cyst appears on the left side of the mandible, which also verifies the effectiveness of the method of the present invention.
In this embodiment, a trained image encoder extracts image features from the collected oral panoramic film, and a trained text encoder extracts text features from the collected oral state description text, so that the image features and text features together form the data set for oral panoramic film feature extraction. A trained fusion neural network then fuses the image features and text features into fusion features, allowing the text features to supplement and refine the image features. The interpretation text and the image segmentation result are then obtained from the fusion features and verify each other to yield the feature recognition result of the oral panoramic film, so that the interpretation text serves as a strong explanation of the feature recognition result and the accuracy, effectiveness and comprehensiveness of the extracted feature recognition result are improved.
Exemplary System
As shown in fig. 3, corresponding to the above neural network-based oral panoramic film feature recognition method, an embodiment of the present invention further provides a neural network-based oral panoramic film feature recognition system, which includes:
the data acquisition module 310, used for acquiring the oral panoramic film and the oral state description text corresponding to the oral panoramic film;
the feature extraction module 320, used for extracting image features from the oral panoramic film using a trained image encoder and extracting text features from the oral state description text using a trained text encoder;
the feature fusion module 330, used for fusing the image features and the text features to obtain fusion features;
the feature recognition module 340, used for obtaining the interpretation text and image segmentation result of the fusion features based on the fusion features, and obtaining the feature recognition result of the oral panoramic film based on the interpretation text and the image segmentation result.
Further, the system also includes an oral feature index extraction module, used for calculating oral feature index parameters using the feature recognition result of the oral panoramic film.
Specifically, for the specific functions of the above neural network-based oral panoramic film feature recognition system, reference may be made to the corresponding description in the above neural network-based oral panoramic film feature recognition method, which is not repeated here.
Based on the above embodiments, the present invention further provides an intelligent terminal, whose functional block diagram may be as shown in fig. 4. The intelligent terminal includes a processor, a memory, a network interface and a display screen connected through a system bus. The processor of the intelligent terminal provides computing and control capabilities. The memory of the intelligent terminal includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a neural network-based oral panoramic film feature recognition program, and the internal memory provides an environment for running the operating system and the program in the non-volatile storage medium. The network interface of the intelligent terminal is used to communicate with external terminals through a network connection. When executed by the processor, the neural network-based oral panoramic film feature recognition program implements the steps of any one of the neural network-based oral panoramic film feature recognition methods described above. The display screen of the intelligent terminal may be a liquid-crystal display or an electronic-ink display.
It will be appreciated by those skilled in the art that the schematic block diagram shown in fig. 4 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the smart terminal to which the present inventive arrangements are applied, and that a particular smart terminal may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, an intelligent terminal is provided, which includes a memory, a processor, and a neural network-based oral panoramic film feature recognition program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of any one of the neural network-based oral panoramic film feature recognition methods provided by the embodiments of the present invention.
The embodiment of the present invention also provides a computer-readable storage medium on which a neural network-based oral panoramic film feature recognition program is stored; when executed by a processor, the program implements the steps of any one of the neural network-based oral panoramic film feature recognition methods provided by the embodiments of the present invention.
It should be understood that the sequence number of each step in the above embodiment does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not be construed as limiting the implementation process of the embodiment of the present invention.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Each of the foregoing embodiments is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units described above is merely a logical function division, and may be implemented in other manners, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed.
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions, which do not depart from the spirit and scope of the embodiments of the invention, are intended to be included within the scope of the present invention.

Claims (10)

1. A neural network-based oral panoramic film feature recognition method, characterized by comprising the following steps:
collecting an oral panoramic film and an oral state description text corresponding to the oral panoramic film;
extracting image features from the oral panoramic film using a trained image encoder, and extracting text features from the oral state description text using a trained text encoder;
fusing the image features and the text features to obtain fusion features;
obtaining an interpretation text and an image segmentation result of the fusion features based on the fusion features; and obtaining a feature recognition result of the oral panoramic film based on the interpretation text and the image segmentation result.
2. The neural network-based oral panoramic film feature recognition method of claim 1, wherein the training process of the trained image encoder comprises:
collecting a plurality of oral panoramic films to construct an oral panoramic film sample set;
constructing an image encoder model;
inputting the oral panoramic film sample set into the image encoder model, extracting image features of the oral panoramic films in the sample set, calculating an image encoder loss from the image features based on a preset image encoder loss function, and optimizing the image encoder model according to the image encoder loss to obtain the trained image encoder.
3. The neural network-based oral panoramic film feature recognition method of claim 1, wherein fusing the image features and the text features to obtain fusion features comprises:
fusing the image features and the text features using a trained fusion neural network to obtain the fusion features, wherein the training process of the trained fusion neural network comprises:
collecting a plurality of oral panoramic films and a plurality of oral state description texts to construct an oral cavity feature sample set;
constructing a fusion neural network model;
inputting the oral cavity feature sample set into the fusion neural network model, extracting the oral cavity features in the sample set, calculating a fusion neural network loss from the oral cavity features based on a preset fusion neural network loss function, and optimizing the fusion neural network model according to the fusion neural network loss to obtain the trained fusion neural network.
4. The neural network-based oral panoramic film feature recognition method of claim 1, wherein obtaining an interpretation text and an image segmentation result of the fusion features based on the fusion features comprises:
decoding the fusion features using a trained text decoder to obtain the interpretation text of the fusion features;
segmenting the fusion features using a trained image segmentation network to obtain the image segmentation result.
5. The neural network-based oral panoramic film feature recognition method of claim 4, wherein the training process of the text decoder comprises:
constructing a text decoder model;
inputting the fusion characteristics into the text decoder model, extracting an interpretation text in the fusion characteristics, calculating text decoder loss according to the interpretation text based on a preset text decoder loss function, and optimizing the text decoder model according to the text decoder loss to obtain a trained text decoder.
6. The neural network-based oral panoramic film feature recognition method of any one of claims 1-5, further comprising:
calculating oral feature index parameters using the feature recognition result of the oral panoramic film.
7. A neural network-based oral panoramic film feature recognition system, characterized in that the system comprises:
a data acquisition module, used for acquiring an oral panoramic film and an oral state description text corresponding to the oral panoramic film;
a feature extraction module, used for extracting image features from the oral panoramic film using a trained image encoder and extracting text features from the oral state description text using a trained text encoder;
a feature fusion module, used for fusing the image features and the text features to obtain fusion features;
a feature recognition module, used for obtaining an interpretation text and an image segmentation result of the fusion features based on the fusion features, and obtaining a feature recognition result of the oral panoramic film based on the interpretation text and the image segmentation result.
8. The neural network-based oral panoramic film feature recognition system according to claim 7, further comprising:
an oral feature index extraction module, used for calculating oral feature index parameters using the feature recognition result of the oral panoramic film.
9. An intelligent terminal, characterized by comprising a memory, a processor, and a neural network-based oral panoramic film feature recognition program stored in the memory and executable on the processor, wherein the neural network-based oral panoramic film feature recognition program, when executed by the processor, implements the steps of the neural network-based oral panoramic film feature recognition method according to any one of claims 1-6.
10. A computer-readable storage medium, characterized in that a neural network-based oral panoramic film feature recognition program is stored on the computer-readable storage medium, and the neural network-based oral panoramic film feature recognition program, when executed by a processor, implements the steps of the neural network-based oral panoramic film feature recognition method according to any one of claims 1-6.
CN202310954386.9A 2023-07-28 2023-07-28 Neural network-based oral panoramic film feature recognition method, system and terminal Pending CN116975779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310954386.9A CN116975779A (en) Neural network-based oral panoramic film feature recognition method, system and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310954386.9A CN116975779A (en) Neural network-based oral panoramic film feature recognition method, system and terminal

Publications (1)

Publication Number Publication Date
CN116975779A true CN116975779A (en) 2023-10-31

Family

ID=88476379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310954386.9A Pending CN116975779A (en) Neural network-based oral panoramic film feature recognition method, system and terminal

Country Status (1)

Country Link
CN (1) CN116975779A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118335357A (en) * 2024-06-12 2024-07-12 中国人民解放军联勤保障部队第九六四医院 Cloud computing-based children oral health management system and method

Similar Documents

Publication Publication Date Title
US11464467B2 (en) Automated tooth localization, enumeration, and diagnostic system and method
US20210174543A1 (en) Automated determination of a canonical pose of a 3d objects and superimposition of 3d objects using deep learning
US10991091B2 (en) System and method for an automated parsing pipeline for anatomical localization and condition classification
CN115205469A (en) Tooth and alveolar bone reconstruction method, equipment and medium based on CBCT
KR102461343B1 (en) Automatic tooth landmark detection method and system in medical images containing metal artifacts
Chen et al. Missing teeth and restoration detection using dental panoramic radiography based on transfer learning with CNNs
CN116188479B (en) Hip joint image segmentation method and system based on deep learning
CN117152507B (en) Tooth health state detection method, device, equipment and storage medium
CN116188879B (en) Image classification and image classification model training method, device, equipment and medium
CN116975779A (en) Neural network-based oral cavity full-scene feature recognition method, system and terminal
Chen et al. Detection of various dental conditions on dental panoramic radiography using Faster R-CNN
CN118512278B (en) AI modeling method and device used before tooth 3D printing
KR102692396B1 (en) Apparatus and Method for Automatically Detecting 3D Cephalometric Landmarks using Dental Computerized Tomography
Liu et al. Fully automatic AI segmentation of oral surgery-related tissues based on cone beam computed tomography images
CN113160151B (en) Panoramic sheet decayed tooth depth identification method based on deep learning and attention mechanism
AU2021100684A4 (en) DEPCADDX - A MATLAB App for Caries Detection and Diagnosis from Dental X-rays
KR20200116278A (en) Method for determining sex and age of subject from dental image and apparatus using the same
US20230252748A1 (en) System and Method for a Patch-Loaded Multi-Planar Reconstruction (MPR)
CN116797731A (en) Artificial intelligence-based oral cavity CBCT image section generation method
CN116797828A (en) Method and device for processing oral full-view film and readable storage medium
US20220122261A1 (en) Probabilistic Segmentation of Volumetric Images
CN116152271A (en) CBCT (Cone-based computed tomography) tooth example segmentation method based on boundary supervision and multiple attentions
CN114387259A (en) Method and device for predicting missing tooth coordinates and training method of recognition model
Dhar et al. A Deep Learning Approach to Teeth Segmentation and Orientation from Panoramic X-rays
CN112150422A (en) Modeling method of oral health self-detection model based on multitask learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination