# Project Description

[中文文档 (Chinese documentation)](README_ZH.md)

This project provides word spell checking. It supports both English and Chinese spelling detection.

```java
final String hello = "hello";
final String speling = "speling";

// A correctly spelled word passes, a misspelled one does not.
Assert.assertTrue(EnWordCheckers.isCorrect(hello));
Assert.assertFalse(EnWordCheckers.isCorrect(speling));
```

## Return the best match result

```java
final String hello = "hello";
final String speling = "speling";

// A correct word is returned unchanged; a misspelled word is replaced by its best match.
Assert.assertEquals("hello", EnWordCheckers.correct(hello));
Assert.assertEquals("selling", EnWordCheckers.correct(speling));
```

## Return the list of corrections (default size)

```java
final String word = "goo";

List<String> stringList = EnWordCheckers.correctList(word);
Assert.assertEquals("[good, goo, goon, goof, gobo, gook, goop]", stringList.toString());
```

## Limit the size of the correction list

```java
final String word = "goo";
final int limit = 2;

List<String> stringList = EnWordCheckers.correctList(word, limit);
Assert.assertEquals("[go, good]", stringList.toString());
```

# Chinese spelling correction

## Core API

To keep the learning cost low, the core API of `ZhWordCheckers` mirrors that of the English checker `EnWordCheckers`.

## Is the spelling correct?
```java
final String right = "正确";
final String error = "万变不离其中";

Assert.assertTrue(ZhWordCheckers.isCorrect(right));
Assert.assertFalse(ZhWordCheckers.isCorrect(error));
```

## Return the best match result

```java
final String right = "正确";
final String error = "万变不离其中";

Assert.assertEquals("正确", ZhWordCheckers.correct(right));
Assert.assertEquals("万变不离其宗", ZhWordCheckers.correct(error));
```

## Return the list of corrections (default size)

```java
final String word = "万变不离其中";

List<String> stringList = ZhWordCheckers.correctList(word);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
```

## Limit the size of the correction list

```java
final String word = "万变不离其中";
final int limit = 1;

List<String> stringList = ZhWordCheckers.correctList(word, limit);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
```

# Formatting

User input comes in many forms, so the tool normalizes it before checking.

## Case

Uppercase letters are converted to lowercase.

```java
final String word = "stRing";

Assert.assertTrue(EnWordCheckers.isCorrect(word));
```

## Full-width and half-width

Full-width characters are converted to half-width.

```java
// The literal below uses full-width Latin letters.
final String word = "ｓｔｒｉｎｇ";

Assert.assertTrue(EnWordCheckers.isCorrect(word));
```

# Custom English Thesaurus

## File configuration

Create the file `resources/data/define_word_checker_en.txt` in the project resource directory, with content such as:

```
my-long-long-define-word,2
my-long-long-define-word-two
```

Each word goes on its own line. The first column is the word and the second is its occurrence count, separated by a comma `,`. The higher the count, the higher the word ranks among the returned corrections. The count defaults to 1 when omitted.

The user-defined thesaurus takes priority over the built-in thesaurus.

## Test code

Once the words are configured, the spell check recognizes them.
```java
final String word = "my-long-long-define-word";
final String word2 = "my-long-long-define-word-two";

Assert.assertTrue(EnWordCheckers.isCorrect(word));
Assert.assertTrue(EnWordCheckers.isCorrect(word2));
```

# Custom Chinese Thesaurus

## File configuration

Create the file `resources/data/define_word_checker_zh.txt` in the project resource directory, with content such as:

```
默守成规 墨守成规
```

Separate the two entries with an ASCII space: the misspelled form comes first, the correct form second.

# Long text mixed in Chinese and English

## Scenario

In real-world spell checking, the best user experience is to let the user enter a long text, which may mix Chinese and English, and then apply the functions described above to it.

## Core method

The `WordCheckers` tool class handles long texts that mix Chinese and English.

| Function | Method | Parameters | Return Value | Remarks |
|:----|:----|:----|:---|:----|
| Determine whether the spelling of the text is correct | isCorrect(string) | The text to be checked | boolean | |
| Return the best corrected result | correct(string) | The text to be checked | String | If no correction is found, the input itself is returned |
| Return the corrections for each word in the text | correctMap(string) | The text to be checked | `Map<String, List<String>>` | Returns the list of all matching corrections for each word |
| Return the corrections for each word in the text, with a size limit | correctMap(string, int limit) | The text to be checked, the maximum size of each returned list | `Map<String, List<String>>` | Returns correction lists of the specified size; list size <= limit |

### Is the spelling correct?
```java
final String hello = "hello 你好";
final String speling = "speling 你好 以毒功毒";

Assert.assertTrue(WordCheckers.isCorrect(hello));
Assert.assertFalse(WordCheckers.isCorrect(speling));
```

### Return the best corrected result

```java
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";

Assert.assertEquals("hello 你好", WordCheckers.correct(hello));
Assert.assertEquals("selling 你好以毒攻毒", WordCheckers.correct(speling));
```

### Return the corrections for each word in the text

Each word in the text maps to its list of corrections.

```java
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";

// The space between words is itself a key, so it appears as " =[ ]" in the output.
Assert.assertEquals("{hello=[hello],  =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing, sperling, seeling, spieling, spiling, speeling, speiling, spelding], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling).toString());
```

### Return the corrections for each word in the text, with a size limit

Same as above, but with a cap on the number of corrections returned for each word.
```java
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";

// Each correction list holds at most 2 entries.
Assert.assertEquals("{hello=[hello],  =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello, 2).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling, 2).toString());
```

# NLP Open-Source Matrix

- [pinyin: Chinese characters to pinyin](https://github.com/houbb/pinyin)
- [pinyin2hanzi: pinyin to Chinese characters](https://github.com/houbb/pinyin2hanzi)
- [segment: high-performance Chinese word segmentation](https://github.com/houbb/segment)
- [opencc4j: Traditional/Simplified Chinese conversion](https://github.com/houbb/opencc4j)
- [nlp-hanzi-similar: Chinese character similarity](https://github.com/houbb/nlp-hanzi-similar)
- [word-checker: spelling detection](https://github.com/houbb/word-checker)
- [sensitive-word: sensitive word detection](https://github.com/houbb/sensitive-word)

# Future Road-Map

- Support English word segmentation and process entire English sentences
- Support spelling detection based on Chinese word segmentation
- Introduce a Chinese error correction algorithm that handles homophones and visually similar characters
- Support mixed Chinese and English spelling detection

# Technical Acknowledgements

[Words](https://github.com/atebits/Words) provides the raw English word data.
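
# Appendix: reading the custom English word file

The `word,count` line format described for `define_word_checker_en.txt` (one entry per line, count defaulting to 1 when the second column is missing) can be illustrated with a small parser. This is only a sketch under stated assumptions, not the library's own loader: the class `CustomThesaurusSketch` and its `parse` method are hypothetical names introduced here, and the real implementation may treat whitespace, duplicates, or malformed counts differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of word-checker: parses lines written in the
// "word,count" format described for define_word_checker_en.txt.
final class CustomThesaurusSketch {

    private CustomThesaurusSketch() {
    }

    // Returns a word -> occurrence count map that preserves the line order.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int comma = line.lastIndexOf(',');
            if (comma < 0) {
                // No count column: fall back to the documented default of 1.
                counts.put(line, 1);
            } else {
                String word = line.substring(0, comma).trim();
                // Assumption: the count is a plain decimal integer.
                int count = Integer.parseInt(line.substring(comma + 1).trim());
                counts.put(word, count);
            }
        }
        return counts;
    }
}
```

Passing the two example lines from the file configuration section to `CustomThesaurusSketch.parse` gives a count of 2 for `my-long-long-define-word` and the default count of 1 for `my-long-long-define-word-two`. A word with a higher count would then rank earlier among candidate corrections.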