CN114492434B - Intelligent waybill number identification method based on waybill number automatic identification model - Google Patents
- Publication number
- CN114492434B (application CN202210102603.7A)
- Authority
- CN
- China
- Prior art keywords
- waybill number
- word
- list
- waybill
- similarity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/083—Shipping
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Business, Economics & Management (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Biology (AREA)
- Entrepreneurship & Innovation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Development Economics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an intelligent waybill number identification method based on a waybill number automatic identification model. The method uses a waybill number automatic identification model with a supervised machine learning mechanism to automatically identify the name of the express company to which a waybill number belongs, improving the efficiency and speed of waybill number identification. The technical scheme is as follows: performing word segmentation on the waybill number to be identified to obtain list data; deserializing the list data with reference to the data structure of the waybill number automatic identification model; acquiring the similarity threshold of each corresponding express company based on the deserialized list data; calculating the similarity value of the waybill number according to the deserialized list data and the similarity threshold; and obtaining the identification result of the waybill number by comparing the similarity value of the waybill number with the similarity threshold.
Description
Technical Field
The invention relates to the field of logistics waybill number query, in particular to a waybill number intelligent identification method based on an automatic waybill number identification model.
Background
With the rapid development of the economy and the rapid rise in national living standards, express logistics has become an important industry supporting modern society, and the sharing of logistics resources is an important way to transform circulation modes and upgrade consumption. Taking innovation, openness, sharing, cooperation and integration as its development concept, the industry realizes resource sharing, information exchange and mutual benefit among its participants, which has become a core guarantee and an effective route to 'quality improvement and efficiency improvement' for the enterprises involved. In the process of 'warehouse sharing, transportation sharing, transfer sharing and matching', the batch identification of waybill numbers is an important link.
As the number of express waybills grows daily, the pressure of identifying waybill numbers in batches increases. The traditional approach of identifying waybill numbers by calling a third-party interface suffers from slow access and poor timeliness, so identification efficiency and stability cannot be guaranteed. First, calling a third-party interface may occupy a large amount of bandwidth and requires a stable network, so identification cannot be performed independently of the network. Moreover, once the third-party interface is attacked or fails internally, serious stagnation and congestion result. It is therefore important to provide a method that identifies waybill numbers quickly and efficiently without depending on the network.
Disclosure of Invention
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The invention aims to solve the above problems and provides an intelligent waybill number identification method based on a waybill number automatic identification model.
The technical scheme of the invention is as follows:
The invention provides a method for intelligently identifying a waybill number based on a waybill number automatic identification model, which comprises the following steps:
performing word segmentation on the waybill number to be recognized to obtain list data;
deserializing the list data with reference to the data structure of the waybill number automatic identification model;
acquiring similarity threshold values of corresponding express companies based on the deserialized list data;
calculating the similarity value of the waybill number according to the deserialized list data and the similarity threshold;
and obtaining the identification result of the waybill number by comparing the similarity value of the waybill number with the similarity threshold value.
According to one embodiment of the method for intelligently identifying the waybill number based on the automatic waybill number identification model, the list data comprises a plurality of word segmentation words, wherein each word segmentation word comprises the length of the waybill number and the number of corresponding words.
According to an embodiment of the method for intelligently identifying the waybill number based on the waybill number automatic identification model, the data structure of the waybill number automatic identification model comprises a dictionary, a corpus, an index and an object, and the list data is deserialized according to the dictionary, the corpus, the index and the object.
According to one embodiment of the method for intelligently identifying the waybill number based on the waybill number automatic identification model, the deserialized list data and the similarity threshold value are used for calculating the similarity value of the waybill number through a TF-IDF algorithm, and the method comprises the following steps of:
calculating the word frequency of each word-segmentation word;
calculating the inverse document frequency of each word-segmentation word;
and calculating the similarity value of each word-segmentation word based on its word frequency and the corresponding inverse document frequency.
According to one embodiment of the method for intelligently identifying the waybill number based on the waybill number automatic identification model, the similarity threshold is stored in a similarity threshold list; the similarity threshold list comprises express company names and the corresponding express company similarity thresholds, and the waybill number similarity value is combined with the similarity threshold list to obtain the waybill number identification result, comprising the following steps:
comparing the similarity value of the waybill number with the similarity threshold of each express company in the similarity threshold list;
judging whether the similarity value of the waybill number is greater than or equal to the similarity threshold of the express company currently being compared; if so, adding the current company name and the similarity value of the waybill number to the identification list; if not, continuing to compare;
judging whether the number of comparisons is smaller than the number of express companies in the similarity threshold list; if so, continuing to compare; if not, outputting the identification list as the identification result of the waybill number.
According to one embodiment of the method for intelligently identifying the waybill number based on the waybill number automatic identification model, the waybill number automatic identification model automatically acquires the waybill number identification result through model training, and the method comprises the following steps of:
performing word segmentation on the collected waybill number data sets of the express companies, and storing word segmentation words into a word segmentation list;
calculating the word frequency of each word segmentation word and storing the word frequency;
establishing a dictionary and a corpus based on each participle word and the corresponding word frequency;
creating indexes and objects according to the dictionary and the corpus;
and serializing the dictionary, the corpus, the index and the object data for automatically identifying the waybill number.
According to the embodiment of the method for intelligently identifying the waybill number based on the waybill number automatic identification model, the word segmentation list comprises a plurality of word segmentation words and corresponding express company names; wherein each word-segmentation word comprises the length of the waybill number and the number of the corresponding word.
Compared with the prior art, the invention has the following beneficial effects: the invention automatically identifies waybill numbers through a waybill number automatic identification model with a supervised machine learning mechanism. During analysis, the TF-IDF algorithm is used to compute the similarity values of the word-segmentation words of the waybill number, which increases the speed of automatic waybill number identification, relieves network pressure, and improves the efficiency of waybill number identification.
Drawings
The above features and advantages of the present disclosure will be better understood upon reading the detailed description of embodiments of the disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components having similar associated characteristics or features may have the same or similar reference numerals.
Fig. 1 is a flowchart illustrating the method for intelligently identifying a waybill number based on a waybill number automatic identification model according to the present invention.
Fig. 2 is a flowchart illustrating the training of the waybill number automatic identification model of the present invention.
Fig. 3 is an internal flow diagram illustrating an embodiment in which the waybill number automatic identification model of the present invention automatically identifies a waybill number.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is noted that the aspects described below in connection with the figures and the specific embodiments are only illustrative and should not be construed as imposing any limitation on the scope of the present invention.
Fig. 1 is a flowchart illustrating the method for intelligently identifying a waybill number based on a waybill number automatic identification model according to the present invention. Referring to Fig. 1, the steps of the intelligent waybill number identification method are described in detail below.
Step S1: Perform word segmentation on the waybill number to be identified to obtain list data.
In this embodiment, the waybill number is automatically analyzed and identified using the waybill number automatic identification model. Fig. 3 is an internal flowchart illustrating an embodiment in which the waybill number automatic identification model of the present invention automatically identifies the waybill number, and this embodiment is further described below with reference to Fig. 3.
Specifically, after the waybill number automatic identification model reads the input waybill number, it preprocesses and segments the waybill number to be identified through identification_waybill_express(waybill_no) to obtain list data. The list data comprises m word-segmentation words, and each word-segmentation word comprises the length of the waybill number and the number of digits of the corresponding word. Specifically, the n-th (1 <= n <= m) word-segmentation word has the following form:
XX|num(1:n+1)
where XX represents the length of the waybill number; if the length is a single digit, a 0 is prepended. num(1:n+1) represents the first n+1 characters of the waybill number, and the symbol | denotes concatenation and is not part of the word. Taking the waybill number JT5045023079024 as an example, the first word-segmentation word obtained after word segmentation is 15JT, and the second word-segmentation word is 15JT5.
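The XX|num(1:n+1) format can be illustrated with a minimal Python sketch. The internals of identification_waybill_express are not given in the text, so the function name `segment_waybill`, the `max_words` parameter (standing in for m) and the preprocessing by stripping whitespace and upper-casing are all assumptions made for illustration.

```python
def segment_waybill(waybill_no: str, max_words: int = 5) -> list[str]:
    """Split a waybill number into length-prefixed leading substrings.

    Hypothetical helper: the n-th word is the two-digit length of the
    waybill number followed by its first n+1 characters.
    """
    # Assumed preprocessing: trim whitespace and normalise letter case.
    number = waybill_no.strip().upper()
    # Two-digit length prefix; a single-digit length gets a leading 0.
    prefix = f"{len(number):02d}"
    # A word needs at least two characters, so cap the count accordingly.
    count = min(max_words, len(number) - 1)
    return [prefix + number[: n + 1] for n in range(1, count + 1)]


if __name__ == "__main__":
    print(segment_waybill("JT5045023079024"))
```

For JT5045023079024 the first two words are 15JT and 15JT5, matching the example in the text; the value 5 for `max_words` is only a placeholder, since the text does not state how m is chosen.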
Step S2: Deserialize the list data with reference to the data structure of the waybill number automatic identification model.
Specifically, the data structure of the waybill number automatic identification model comprises a dictionary, a corpus, an index and an object. With reference to this data structure, the waybill number automatic identification model deserializes and loads, through the load_persistence_obj() function, each word-segmentation word in the list data according to the serialized dictionary, corpus, index and object data.
In addition, as shown in Fig. 2, in this embodiment the waybill number automatic identification model automatically obtains the waybill number identification result through model training, which comprises the following steps:
Step D1: Perform word segmentation on the collected waybill number data sets of the express companies, and store the word-segmentation words in a word segmentation list.
The word segmentation list comprises a plurality of word-segmentation words and the corresponding express company names; the format of each word-segmentation word is the same as in step S1 and is not described again.
Step D2: Calculate and store the word frequency of each word-segmentation word.
In this embodiment, the word frequency (TF) of each word-segmentation word is calculated as follows:
$$TF_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where i denotes the i-th waybill number, j denotes the j-th word-segmentation word, k indexes the waybill number data sets, $n_{i,j}$ denotes the number of times the word-segmentation word appears in the data set, and $\sum_{k} n_{k,j}$ denotes the total number of all word-segmentation words in the data set.
Step D3: Establish a dictionary and a corpus based on each word-segmentation word and the corresponding word frequency.
In one embodiment, the word frequency of each word-segmentation word may be weighted by the inverse document frequency (IDF), calculated as follows:
$$IDF_{i} = \log \frac{|N|}{1 + |\{ j : t_{i} \in d_{j} \}|}$$
where i denotes the i-th waybill number, j denotes the j-th word-segmentation word, $d_j$ denotes a document containing the word-segmentation word $t_i$, $|N|$ denotes the total number of waybill numbers in the corpus, and $|\{ j : t_i \in d_j \}|$ denotes the number of waybill numbers containing the word-segmentation word $t_i$. The fewer waybill numbers contain the word-segmentation word $t_i$, the greater the IDF. In addition, to avoid a denominator of 0, 1 is added to the denominator of the formula.
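The word frequency and inverse document frequency described in steps D2 and D3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, each waybill number's word list is treated as one document, and the IDF is assumed to take the usual logarithmic form (natural logarithm) with 1 added to the denominator as the text describes.

```python
import math
from collections import Counter


def term_frequencies(words):
    """TF: occurrences of each word divided by the total number of words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def inverse_document_frequencies(documents):
    """IDF: log(|N| / (1 + number of documents containing the word)).

    `documents` is a list of word lists, one per waybill number.
    The natural logarithm is an assumption; the text does not name a base.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per document
    return {word: math.log(n_docs / (1 + df)) for word, df in doc_freq.items()}


if __name__ == "__main__":
    docs = [["15JT", "15JT5"], ["15JT", "15JT6"], ["18YT", "18YT2"]]
    print(term_frequencies([w for doc in docs for w in doc]))
    print(inverse_document_frequencies(docs))
```

Note that with 1 added to the denominator, a word that occurs in all but one document receives an IDF of 0, and a word that occurs in every document receives a slightly negative IDF.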
Step D4: Create an index and an object according to the dictionary and the corpus.
In this embodiment, the data structure of the waybill number automatic identification model includes a dictionary, a corpus, an index, and an object. Specifically, the dictionary has the following data structure:
$$\text{dictionary} = \{ w_{11}: 0,\ w_{12}: 1,\ \dots,\ w_{1m}: m,\ \dots,\ w_{ij}: n_{ij} \}$$
where $w_{ij}$ denotes the j-th word-segmentation word of the i-th waybill number, and $n_{ij}$ denotes the serial number corresponding to the j-th word-segmentation word of the i-th waybill number.
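A dictionary of this shape, mapping each word-segmentation word to a serial number, can be sketched as below. The function name `build_dictionary` is hypothetical, and assigning serial numbers in order of first appearance is an assumption; a library such as gensim may number the words differently.

```python
def build_dictionary(word_lists):
    """Map every distinct word to a serial number, starting at 0.

    `word_lists` is a list of word lists, one per waybill number.
    Serial numbers follow first-appearance order (an assumption).
    """
    token2id = {}
    for words in word_lists:
        for word in words:
            if word not in token2id:
                token2id[word] = len(token2id)
    return token2id


if __name__ == "__main__":
    print(build_dictionary([["15JT", "15JT5"], ["15JT", "18YT"]]))
```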
Further, in this embodiment, the corpus of the waybill number automatic identification model is a two-dimensional list with L rows, where L is the number of express companies. Each element of the two-dimensional list is a pair consisting of the serial number and the word frequency of a word-segmentation word. In this structure, a denotes the serial number of an express company and $m_a$ denotes the number of distinct word-segmentation words of the a-th express company; the row for the a-th express company therefore holds $m_a$ pairs, each giving the serial number in the dictionary and the word frequency of one word-segmentation word obtained by segmenting that company's express waybill numbers.
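The two-dimensional corpus, with one row of (serial number, word frequency) pairs per express company, might be assembled as in the following sketch. The function name `build_corpus`, the input layout and the company labels in the example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_corpus(company_words, token2id):
    """Build one row of (serial number, word frequency) pairs per company.

    `company_words` is a list of (company name, word list) pairs, where the
    word list gathers all words from that company's waybill numbers.
    Words missing from `token2id` are skipped.
    """
    corpus = []
    for _company, words in company_words:
        counts = Counter(token2id[w] for w in words if w in token2id)
        corpus.append(sorted(counts.items()))
    return corpus


if __name__ == "__main__":
    token2id = {"15JT": 0, "15JT5": 1, "18YT": 2}
    data = [("company_a", ["15JT", "15JT5", "15JT"]), ("company_b", ["18YT"])]
    print(build_corpus(data, token2id))
```

Row a of the result has as many pairs as the a-th company has distinct words, which corresponds to the quantity the text calls m_a.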
Further, in this embodiment, the index and the object tfidf are created from the dictionary and corpus built above. In the index, the entries 0 to n correspond to the serial numbers of the word-segmentation words, and $p_{a,n}$ denotes the similarity of the a-th express company with respect to the n-th word-segmentation word.
Step D5: Serialize the dictionary, corpus, index and object data for automatic identification of the waybill number.
In this embodiment, Python's pickle module is used to serialize the dictionary, corpus, index, object and related data as the data structure of the waybill number automatic identification model and to store it in external storage, which ensures the integrity of the data. When a waybill number needs to be identified, the word-segmentation words of the waybill number to be identified are deserialized according to the data structure of the waybill number automatic identification model.
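The serialization step can be sketched with the standard pickle module as follows. The function names `save_model` and `load_model`, the single-file layout and the dictionary keys are assumptions; the text names only a load_persistence_obj() function and does not describe how the four parts are stored.

```python
import pickle


def save_model(path, dictionary, corpus, index, tfidf):
    """Serialize the four model parts into one file (layout is assumed)."""
    payload = {
        "dictionary": dictionary,
        "corpus": corpus,
        "index": index,
        "tfidf": tfidf,
    }
    with open(path, "wb") as fh:
        pickle.dump(payload, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Deserialize the model parts saved by save_model."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Because unpickling can execute arbitrary code, a model file should only be loaded from a trusted location.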
Step S3: Acquire the similarity threshold of each corresponding express company based on the deserialized list data.
In this embodiment, the similarity threshold of each express company is stored in a similarity threshold list held in the waybill number automatic identification model, and the similarity threshold list in the data structure can be retrieved according to the deserialized list data.
Step S4: Calculate the similarity value of the waybill number according to the deserialized list data and the similarity threshold.
The similarity threshold list contains the names of the express companies and their corresponding similarity thresholds; in combination with this list, the similarity value of each word-segmentation word of the waybill number is calculated.
Specifically, the waybill number automatic identification model uses the dictionary.doc2bow(new_text) function to convert the list data new_text into a sparse vector new_vec; for example, [(0, 1), (1, 1)] means that the first word-segmentation word appears once and the second word-segmentation word appears once. A new transformed vector new_vec_tfidf is then obtained through the tfidf object as tfidf[new_vec].
Further, in this embodiment, calculating the waybill number similarity value by the TF-IDF algorithm based on the deserialized list data and the similarity threshold comprises the following steps:
S41: Calculate the word frequency of each word-segmentation word.
Specifically, the word frequency $TF_w$ of each word-segmentation word in the waybill number to be identified is calculated by the following formula:
S42: Calculate the inverse document frequency of each word-segmentation word.
The inverse document frequency IDF corresponding to each word-segmentation word i in the list data is calculated by the following formula:
S43: Calculate the similarity value of each word-segmentation word based on its word frequency and the corresponding inverse document frequency.
In this embodiment, the waybill number automatic identification model calculates the similarity vector of the i-th word-segmentation word, i.e. the similarity value $sims_{[i]}$, by indexing new_vec_tfidf with index. The calculation formula is as follows:
$$sims_{[i]} = TF_{[i]} \times IDF$$
Specifically, the waybill number automatic identification model normalizes the word segmentation result vector of the waybill number through the tfidf object to obtain new_vec_tfidf. The normalized result is then looked up through index, and the result is used as the similarity list $sims_{[i]}$ of the word-segmentation words.
The following describes the steps of calculating the similarity sims in this embodiment, taking the waybill number YT2163166411211525 as an example.
First, the waybill number to be identified is converted into a vector using the word frequency data stored in the dictionary; each vector entry pairs the id of a word-segmentation word (its token2id value in the dictionary) with its number of occurrences.
Specifically, the result of segmenting YT2163166411211525 is ['18YT', '18YT2', '18YT21', '18YT216', '18YT2163'], and the vector obtained by conversion against the word frequencies stored in the dictionary is [(381, 1), (382, 1), (383, 1), (384, 1)]; that is, the word 18YT with id 381 appears once, the word 18YT2 with id 382 appears once, and so on.
Second, TF-IDF similarity analysis is performed: after normalization, index is used to look up the sims values corresponding to the normalized vector data, from which the best-matching text is returned.
Specifically, in this embodiment, the data structure of tfidf in gensim is a two-dimensional array, where the y-axis is $d_j$ and the x-axis is the $tfidf_{ij}$ value. The normalized vector obtained from the word frequency conversion is [(381, 0.5), (382, 0.5), (383, 0.5), (384, 0.5)], which is then mapped to the corresponding sims using index. The data structure of sims is a list indexed by $d_j$, and $sims_j$ is the final similarity sim of the waybill number.
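The normalized values in this example are consistent with scaling the vector to unit length: four equal weights each become 1/sqrt(4) = 0.5. The sketch below reproduces that arithmetic in plain Python and then scores the query against a small index of unit-length rows by dot product, which equals cosine similarity for unit vectors. The function names and the two index rows are hypothetical, and the text does not state which similarity index is actually used, so this is an illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a sparse vector of (id, weight) pairs to unit length."""
    norm = math.sqrt(sum(weight * weight for _, weight in vec))
    if norm == 0:
        return []
    return [(term_id, weight / norm) for term_id, weight in vec]


def cosine_similarities(query, index_rows):
    """Dot product of a unit-length query with each unit-length index row."""
    q = dict(query)
    return [
        sum(q.get(term_id, 0.0) * weight for term_id, weight in row)
        for row in index_rows
    ]


if __name__ == "__main__":
    query = l2_normalize([(381, 1), (382, 1), (383, 1), (384, 1)])
    print(query)  # each of the four weights becomes 0.5
    # Hypothetical index: one unit-length row per express company.
    rows = [
        l2_normalize([(381, 1), (382, 1), (383, 1), (384, 1)]),
        l2_normalize([(381, 1), (500, 1)]),
    ]
    print(cosine_similarities(query, rows))
```

In this made-up index the first row matches the query exactly and scores 1.0, while the second row shares only one word with the query and scores lower.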
Step S5: Obtain the identification result of the waybill number by comparing the similarity value of the waybill number with the similarity threshold.
In this embodiment, the similarity value of each word-segmentation word in the waybill number is combined with the similarity threshold list to analyze the waybill number and obtain the waybill number identification result. The specific steps are as follows:
S51: Compare the similarity value of the waybill number with the similarity threshold of each express company in the similarity threshold list.
Specifically, the waybill number automatic identification model compares the similarity value $sims_{[i]}$ of the i-th word-segmentation word with the similarity threshold $TD_{[i]}$ of each express company in the similarity threshold list. The initial value of i is 0.
S52: Judge whether the similarity value of the waybill number is greater than or equal to the similarity threshold of the express company being compared; if so, add the current company name of the waybill number and the similarity value to the identification list; if not, continue to compare.
If $sims_{[i]} \geq TD_{[i]}$, then $sims_{[i]}$ and the corresponding express company name are added to the identification list; if not, the comparison continues, with i = i + 1.
S53: Judge whether the number of comparisons is smaller than the number of express companies in the similarity threshold list; if so, continue to compare; if not, output the identification list.
Let N denote the number of express companies. While i is less than N, the comparison continues, and each i-th entry that reaches the threshold is stored in the identification list together with its sim value; the comparison result is then returned.
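The comparison loop of steps S51 to S53 can be sketched as follows, under the assumption that sims holds one similarity value per express company in the same order as the similarity threshold list. The function name `match_companies`, the company labels and the numeric values in the example are hypothetical.

```python
def match_companies(sims, thresholds):
    """Return (company name, similarity) pairs that reach their threshold.

    `sims[i]` is the similarity value for the i-th express company and
    `thresholds[i]` is that company's (name, threshold) pair.
    """
    identified = []
    # One comparison per express company, i.e. i runs from 0 to N - 1.
    for i, (company, threshold) in enumerate(thresholds):
        if sims[i] >= threshold:
            identified.append((company, sims[i]))
    return identified


if __name__ == "__main__":
    thresholds = [("company_a", 0.8), ("company_b", 0.5), ("company_c", 0.6)]
    print(match_companies([0.9, 0.3, 0.6], thresholds))
```

The returned list can hold more than one company when several thresholds are reached, and it is empty when none is.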
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Claims (6)
1. A method for intelligently identifying a waybill number based on a waybill number automatic identification model, characterized by comprising the following steps:
performing word segmentation on the waybill number to be identified to obtain list data;
deserializing the list data with reference to the data structure of the waybill number automatic identification model;
acquiring similarity threshold values of corresponding express companies based on the deserialized list data;
calculating the similarity value of the waybill number according to the deserialized list data and the similarity threshold;
acquiring an identification result of the waybill number by comparing the similarity value of the waybill number with a similarity threshold value;
wherein the waybill number automatic identification model, which automatically obtains the identification result of the waybill number, is built through model training comprising the following steps:
performing word segmentation on the collected waybill number data sets of the express companies, and storing the segmented words in a word segmentation list;
calculating and storing the word frequency of each segmented word;
establishing a dictionary and a corpus based on each segmented word and its corresponding word frequency;
creating an index and an object according to the dictionary and the corpus;
and serializing the dictionary, the corpus, the index and the object data for use in automatically identifying the waybill number.
2. The method of claim 1, wherein the list data comprises a plurality of word-segmented words, wherein each word-segmented word comprises a length of the waybill number and a number of corresponding words.
3. The method for intelligently identifying the waybill number based on the waybill number automatic identification model as claimed in claim 1, wherein the waybill number automatic identification model data structure comprises a dictionary, a corpus, an index and an object, and the list data is deserialized according to the dictionary, the corpus, the index and the object.
4. The method for intelligent identification of waybill numbers based on the waybill number automatic identification model of claim 1, wherein the deserialized list data and the similarity threshold are used for calculating the similarity value of the waybill number through a TF-IDF algorithm, comprising the steps of:
calculating the word frequency of each segmented word;
calculating the inverse document frequency corresponding to each word frequency;
and calculating the similarity value of each segmented word based on the word frequency of each segmented word and the corresponding inverse document frequency.
5. The method for intelligently identifying the waybill number based on the waybill number automatic identification model according to claim 4, wherein the similarity threshold is stored in a similarity threshold list; the similarity threshold list comprises express company names and corresponding express company similarity thresholds, the similarity value of each word segmentation word is combined with the similarity threshold list to obtain the waybill number identification result, and the method comprises the following steps:
comparing the similarity value of the waybill number with the similarity threshold value of each express company in the similarity threshold value list;
judging whether the similarity value of the waybill number is larger than or equal to the similarity threshold value of each express company in the similarity threshold value list or not; if so, adding the current company name and the similarity value of the waybill number to the identification list; if not, continuing to compare;
judging whether the number of comparisons is smaller than the number of express companies in the similarity threshold list; if so, outputting an identification list as the identification result of the waybill number; if not, continuing to compare.
6. The method for intelligently identifying a waybill number based on the waybill number automatic identification model according to claim 1, wherein the word segmentation list comprises a plurality of segmented words and corresponding express company names; wherein each segmented word comprises the length of the waybill number and the number of the corresponding word.
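For illustration only, the training and identification steps recited in claims 1, 4 and 5 could be sketched in Python roughly as follows. This is not the patented implementation. The token format produced by `segment`, the smoothed IDF formula, the use of cosine similarity, serialization with `pickle`, and every function name, company name, sample number and threshold value are assumptions introduced here. The claims do not define the "index" and "object" in detail, so the sketch stands in a per-company TF-IDF vector table and an IDF table for them.

```python
import math
import pickle
from collections import Counter


def segment(waybill: str) -> list[str]:
    # Hypothetical segmentation: one token per character, tagged with the
    # total length of the number and the character's position.
    n = len(waybill)
    return [f"{n}:{i}:{ch}" for i, ch in enumerate(waybill)]


def tfidf(counts: Counter, idf: dict[str, float]) -> dict[str, float]:
    # Term frequency times IDF, L2-normalised. Tokens that were never seen
    # during training have no IDF entry and are ignored.
    total = sum(counts.values())
    vec = {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train(samples: dict[str, list[str]]) -> bytes:
    # Corpus: one bag of segmented words (with frequencies) per company.
    corpus = {company: Counter(tok for number in numbers for tok in segment(number))
              for company, numbers in samples.items()}
    # Dictionary: every distinct segmented word.
    dictionary = sorted({t for counts in corpus.values() for t in counts})
    n_docs = len(corpus)
    # Smoothed IDF (an assumed variant; the claims give no formula).
    idf = {t: math.log((1 + n_docs) / (1 + sum(t in c for c in corpus.values()))) + 1
           for t in dictionary}
    # Index: one normalised TF-IDF vector per company.
    index = {company: tfidf(counts, idf) for company, counts in corpus.items()}
    model = {"dictionary": dictionary, "corpus": corpus, "idf": idf, "index": index}
    return pickle.dumps(model)  # serialized model


def identify(waybill: str, model_bytes: bytes,
             thresholds: dict[str, float]) -> list[tuple[str, float]]:
    model = pickle.loads(model_bytes)  # deserialized model
    query = tfidf(Counter(segment(waybill)), model["idf"])
    results = []
    for company, threshold in thresholds.items():
        doc = model["index"].get(company, {})
        # Cosine similarity of two already-normalised sparse vectors.
        similarity = sum(w * doc.get(t, 0.0) for t, w in query.items())
        if similarity >= threshold:
            results.append((company, similarity))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Made-up training data and thresholds, for illustration only.
samples = {
    "AlphaExpress": ["SF1234567890123", "SF9876543210987"],
    "BetaPost": ["751234567890", "759876543210"],
}
thresholds = {"AlphaExpress": 0.3, "BetaPost": 0.3}
model_bytes = train(samples)
print(identify("SF1111111111111", model_bytes, thresholds))
```

In this sketch the loop visits every company in the threshold list exactly once and returns the collected identification list, sorted by similarity. `pickle.loads` should only be applied to model files from a trusted source.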
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210102603.7A CN114492434B (en) | 2022-01-27 | 2022-01-27 | Intelligent waybill number identification method based on waybill number automatic identification model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114492434A (en) | 2022-05-13 |
CN114492434B (en) | 2022-10-11 |
Family
ID=81476254
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210102603.7A Active CN114492434B (en) | 2022-01-27 | 2022-01-27 | Intelligent waybill number identification method based on waybill number automatic identification model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114492434B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866985A (en) * | 2015-05-04 | 2015-08-26 | 小米科技有限责任公司 | Express bill number identification method, device and system |
CN106021572A (en) * | 2016-05-31 | 2016-10-12 | 北京百度网讯科技有限公司 | Binary feature dictionary construction method and device |
CN107943860A (en) * | 2017-11-08 | 2018-04-20 | 北京奇艺世纪科技有限公司 | The recognition methods and device that the training method of model, text are intended to |
CN108958634A (en) * | 2018-07-23 | 2018-12-07 | Oppo广东移动通信有限公司 | Express delivery information acquisition method, device, mobile terminal and storage medium |
CN111080087A (en) * | 2019-11-28 | 2020-04-28 | 江苏艾佳家居用品有限公司 | Calling center scheduling method based on customer emotion analysis |
CN111582786A (en) * | 2020-04-29 | 2020-08-25 | 上海中通吉网络技术有限公司 | Express bill number identification method, device and equipment based on machine learning |
CN111881677A (en) * | 2020-07-28 | 2020-11-03 | 武汉大学 | Address matching algorithm based on deep learning model |
CN112860906A (en) * | 2021-04-23 | 2021-05-28 | 南京汇宁桀信息科技有限公司 | Market leader hot line and public opinion decision support method and system based on natural language processing |
CN113139376A (en) * | 2020-01-17 | 2021-07-20 | 广州敏行区块链科技有限公司 | Method for realizing express mail address verification |
CN113900995A (en) * | 2020-06-22 | 2022-01-07 | 江苏税软软件科技有限公司 | Method for intelligently searching files for tax affairs |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105162996A (en) * | 2014-07-18 | 2015-12-16 | 上海触乐信息科技有限公司 | Intelligent service interaction platform apparatus, system, and implementing method |
US12020354B2 (en) * | 2017-06-05 | 2024-06-25 | Umajin Inc. | Hub and spoke classification system |
CN110765244B (en) * | 2019-09-18 | 2023-06-06 | 平安科技(深圳)有限公司 | Method, device, computer equipment and storage medium for obtaining answering operation |
CN113139765B (en) * | 2020-01-20 | 2023-12-12 | 中国移动通信集团辽宁有限公司 | Logistics recommendation method and device based on temporal network and computing equipment |
CN113762846B (en) * | 2020-10-22 | 2024-04-16 | 北京京东振世信息技术有限公司 | Method and device for distinguishing face sheet text |
Non-Patent Citations (3)
Title |
---|
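NLP Natural Language Processing (2): Corpora and Part-of-Speech Tagging & Word Segmentation & TFIDF; hxxjxw; https://blog.csdn.net/hxxjxw/article/details/106932711?; 20200624; pp. 1-7 * |
Research on the Evaluation of B2C E-commerce Logistics Service Quality Based on Deep Learning; 彭桢真; China Master's Theses Full-text Database, Information Science and Technology; 20220115; pp. I138-3198 * |
Express Form Information Processing and Application Based on Deep Learning; 张震; China Master's Theses Full-text Database, Information Science and Technology; 20210315; pp. I138-470 * |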
Also Published As
Publication number | Publication date |
---|---|
CN114492434A (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111414393B (en) | Semantic similar case retrieval method and equipment based on medical knowledge graph | |
CN111782763A (en) | Information retrieval method based on voice semantics and related equipment thereof | |
CN109753517A (en) | A kind of method, apparatus, computer storage medium and the terminal of information inquiry | |
CN111382248A (en) | Question reply method and device, storage medium and terminal equipment | |
CN111339277A (en) | Question-answer interaction method and device based on machine learning | |
CN110990390A (en) | Data cooperative processing method and device, computer equipment and storage medium | |
CN112287069A (en) | Information retrieval method and device based on voice semantics and computer equipment | |
CN115827819A (en) | Intelligent question and answer processing method and device, electronic equipment and storage medium | |
CN113282729B (en) | Knowledge graph-based question and answer method and device | |
CN106933824A (en) | The method and apparatus that the collection of document similar to destination document is determined in multiple documents | |
CN113741864B (en) | Automatic semantic service interface design method and system based on natural language processing | |
CN114492434B (en) | Intelligent waybill number identification method based on waybill number automatic identification model | |
CN113837307A (en) | Data similarity calculation method and device, readable medium and electronic equipment | |
CN114003731A (en) | Heterogeneous data processing method, device, server and storage medium | |
CN112015895B (en) | Patent text classification method and device | |
CN114491079A (en) | Knowledge graph construction and query method, device, equipment and medium | |
CN118364053A (en) | LANGCHAIN-based document vectorization and document segmentation method | |
CN113065343A (en) | Enterprise research and development resource information modeling method based on semantics | |
CN111553442A (en) | Method and system for optimizing classifier chain label sequence | |
CN118013364A (en) | Multidimensional data intelligent identification method | |
CN114091463B (en) | Regional work order random point analysis method and device, electronic equipment and readable storage medium | |
CN113535938B (en) | Standard data construction method, system, equipment and medium based on content identification | |
CN115204142A (en) | Open relationship extraction method, device and storage medium | |
WO2022127124A1 (en) | Meta learning-based entity category recognition method and apparatus, device and storage medium | |
CN116090413A (en) | Serialization-based general RDF data compression method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |