# Project Description
[Chinese documentation](README_ZH.md)
This project provides word spell checking.
It supports spelling detection for English words as well as for Chinese.
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.houbb/word-checker/badge.svg)](http://mvnrepository.com/artifact/com.github.houbb/word-checker)
[![Build Status](https://www.travis-ci.org/houbb/word-checker.svg?branch=master)](https://www.travis-ci.org/houbb/word-checker?branch=master)
[![Coverage Status](https://coveralls.io/repos/github/houbb/word-checker/badge.svg?branch=master)](https://coveralls.io/github/houbb/word-checker?branch=master)
[![](https://img.shields.io/badge/license-Apache2-FF0080.svg)](https://github.com/houbb/word-checker/blob/master/LICENSE.txt)
[![Open Source Love](https://badges.frapsoft.com/os/v2/open-source.svg?v=103)](https://github.com/houbb/word-checker)
# Feature description
### Support English word correction
- 1000x faster spelling correction and fuzzy search via the Symmetric Delete spelling correction algorithm
- Quickly determines whether a word is misspelled
- Returns the best correction
- Returns a list of candidate corrections, with a configurable maximum list size
- Error messages support i18n
- Normalizes uppercase/lowercase and full-width/half-width characters
- Supports a custom dictionary
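The first bullet names the Symmetric Delete algorithm. Its core idea: rather than generating every insertion, substitution, and transposition of the input, it generates only deletions, once up front for each dictionary word and at lookup time for the input, and treats words whose deletion variants overlap as candidates. The sketch below is a hypothetical, simplified illustration (Java 8+) limited to one deletion per side; `SymmetricDeleteSketch` is not part of word-checker, and a complete implementation would also verify each candidate's true edit distance and rank candidates by frequency.

```java
import java.util.*;

/** Hypothetical, simplified Symmetric Delete index (one deletion per side). Not the library's code. */
final class SymmetricDeleteSketch {
    // Maps each deletion variant (including the word itself) to the dictionary words that produce it.
    private final Map<String, Set<String>> deletes = new HashMap<>();

    SymmetricDeleteSketch(Collection<String> dictionary) {
        // Precompute deletion variants for every dictionary word once.
        for (String word : dictionary) {
            for (String variant : variants(word)) {
                deletes.computeIfAbsent(variant, k -> new TreeSet<>()).add(word);
            }
        }
    }

    /** The word itself plus every string obtained by deleting exactly one character. */
    static Set<String> variants(String word) {
        Set<String> result = new HashSet<>();
        result.add(word);
        for (int i = 0; i < word.length(); i++) {
            result.add(word.substring(0, i) + word.substring(i + 1));
        }
        return result;
    }

    /** Dictionary words sharing at least one variant with the input (unverified candidates, sorted). */
    Set<String> candidates(String input) {
        Set<String> result = new TreeSet<>();
        for (String variant : variants(input)) {
            Set<String> words = deletes.get(variant);
            if (words != null) {
                result.addAll(words);
            }
        }
        return result;
    }
}
```

For example, with the dictionary `hello`, `help`, `world`, the input `helo` yields the candidates `hello` and `help`, and the transposition `wrold` still finds `world` because both share the variant `wold`.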
### Support basic Chinese spelling check
# Change log
> [Change Log](https://github.com/houbb/word-checker/blob/master/CHANGELOG.md)
# Quick start
## JDK version
JDK 1.7+
## Maven dependency
```xml
<dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>word-checker</artifactId>
    <version>0.1.0</version>
</dependency>
```
## Test Case
Given an input word, the best correction is returned automatically.
```java
final String speling = "speling";
Assert.assertEquals("selling", EnWordCheckers.correct(speling));
```
# Core API
The core API is provided by the `EnWordCheckers` utility class.
| Function | Method | Parameters | Return Value | Remarks |
|:----|:----|:----|:---|:----|
| Determine whether a word is spelled correctly | isCorrect(string) | The word to check | boolean | |
| Return the best correction | correct(string) | The word to check | String | If no correction is found, the word itself is returned |
| Return all candidate corrections | correctList(string) | The word to check | `List<String>` | Returns a list of all matching corrections |
| Return a size-limited list of candidate corrections | correctList(string, int limit) | The word to check, the maximum list size | `List<String>` | List size <= limit |
## Test example
> See [EnWordCheckersTest.java](https://github.com/houbb/word-checker/tree/master/src/test/java/com/github/houbb/word/checker/util/EnWordCheckersTest.java)
## Is the spelling correct?
```java
final String hello = "hello";
final String speling = "speling";
Assert.assertTrue(EnWordCheckers.isCorrect(hello));
Assert.assertFalse(EnWordCheckers.isCorrect(speling));
```
## Return the best match result
```java
final String hello = "hello";
final String speling = "speling";
Assert.assertEquals("hello", EnWordCheckers.correct(hello));
Assert.assertEquals("selling", EnWordCheckers.correct(speling));
```
## Return the list of candidate corrections (default size)
```java
final String word = "goo";
List<String> stringList = EnWordCheckers.correctList(word);
Assert.assertEquals("[good, goo, goon, goof, gobo, gook, goop]", stringList.toString());
```
## Specify the size of the corrected match list
```java
final String word = "goo";
final int limit = 2;
List<String> stringList = EnWordCheckers.correctList(word, limit);
Assert.assertEquals("[go, good]", stringList.toString());
```
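The `limit` parameter caps the list size, and the custom dictionary section notes that higher occurrence counts give a word higher priority. Assuming candidates arrive together with such counts, a size-limited list could be produced as follows (Java 8+). `CandidateRanking.topCandidates` is hypothetical, and the alphabetical tie-breaking is an assumption, not documented library behavior.

```java
import java.util.*;

/** Hypothetical helper: rank candidate corrections by frequency and keep at most {@code limit}. */
final class CandidateRanking {
    static List<String> topCandidates(Map<String, Integer> frequencies, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(frequencies.entrySet());
        // Higher frequency first; ties broken alphabetically so the order is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (result.size() >= limit) {
                break; // the returned list never exceeds the limit
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

With made-up counts `go=50, good=40, goon=5, goof=5` and `limit = 2`, it returns `[go, good]`.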
# Chinese spelling correction
## Core api
To reduce the learning cost, `ZhWordCheckers` offers the same core API as the English spelling checker.
## Is the spelling correct?
```java
final String right = "正确";
final String error = "万变不离其中"; // misspelling of the idiom "万变不离其宗"
Assert.assertTrue(ZhWordCheckers.isCorrect(right));
Assert.assertFalse(ZhWordCheckers.isCorrect(error));
```
## Return the best match result
```java
final String right = "正确";
final String error = "万变不离其中";
Assert.assertEquals("正确", ZhWordCheckers.correct(right));
Assert.assertEquals("万变不离其宗", ZhWordCheckers.correct(error));
```
## Return the list of candidate corrections (default size)
```java
final String word = "万变不离其中";
List<String> stringList = ZhWordCheckers.correctList(word);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
```
## Specify the size of the corrected match list
```java
final String word = "万变不离其中";
final int limit = 1;
List<String> stringList = ZhWordCheckers.correctList(word, limit);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
```
# Formatting
User input comes in many forms, so this tool normalizes it before checking.
## Case
Uppercase letters are converted to lowercase.
```java
final String word = "stRing";
Assert.assertTrue(EnWordCheckers.isCorrect(word));
```
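Case normalization can be pictured as lowercasing both the dictionary entries and the input before the lookup. The hypothetical `CaseInsensitiveDictionary` below sketches that idea; it is not the library's implementation.

```java
import java.util.*;

/** Hypothetical sketch of a case-insensitive dictionary lookup. */
final class CaseInsensitiveDictionary {
    private final Set<String> words = new HashSet<>();

    CaseInsensitiveDictionary(Collection<String> dictionary) {
        // Store every dictionary entry in its normalized (lowercase) form.
        for (String word : dictionary) {
            words.add(normalize(word));
        }
    }

    // Locale.ROOT avoids locale-specific rules such as the Turkish dotless i.
    static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    /** Normalizes the input the same way, so "stRing" matches "string". */
    boolean isCorrect(String word) {
        return words.contains(normalize(word));
    }
}
```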
## Full-width and half-width
Full-width characters are converted to their half-width equivalents.
```java
final String word = "ｓｔｒｉｎｇ"; // full-width characters
Assert.assertTrue(EnWordCheckers.isCorrect(word));
```
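Full-width Latin letters, digits, and punctuation occupy U+FF01 to U+FF5E, each offset from its ASCII counterpart (U+0021 to U+007E) by 0xFEE0, and the ideographic space U+3000 corresponds to the ASCII space. A minimal, hypothetical conversion based on that mapping (`WidthNormalizer` is illustrative, not the library's class):

```java
/** Hypothetical sketch: convert full-width ASCII variants to their half-width forms. */
final class WidthNormalizer {
    static String toHalfWidth(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\u3000') {
                // Ideographic (full-width) space becomes an ASCII space.
                sb.append(' ');
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                // Full-width forms U+FF01..U+FF5E map to ASCII 0x21..0x7E by subtracting 0xFEE0.
                sb.append((char) (c - 0xFEE0));
            } else {
                // Everything else, including Chinese characters, is kept unchanged.
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Applied to `ｓｔｒｉｎｇ`, it yields `string`, which can then be looked up as usual.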
# Custom English Dictionary
## File configuration
Create the file `resources/data/define_word_checker_en.txt` in the project's resource directory, with content such as:
```
my-long-long-define-word,2
my-long-long-define-word-two
```
Each word goes on its own line.
The first column of each line is the word and the second column is its number of occurrences, separated by a comma (`,`).
The higher the count, the higher the word ranks among correction candidates. If the count is omitted, it defaults to 1.
Entries in the user-defined dictionary take priority over the system's built-in dictionary.
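A hypothetical parser for this `word[,count]` format, assuming the file content has already been read into a string. `EnDictionaryParser` is illustrative only and does not describe how word-checker actually loads the file.

```java
import java.util.*;

/** Hypothetical parser for lines of the form "word" or "word,count". */
final class EnDictionaryParser {
    static Map<String, Integer> parse(String content) {
        // LinkedHashMap keeps the entries in file order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String raw : content.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int comma = line.indexOf(',');
            if (comma < 0) {
                counts.put(line, 1); // count omitted: default to 1
            } else {
                String word = line.substring(0, comma).trim();
                int count = Integer.parseInt(line.substring(comma + 1).trim());
                counts.put(word, count);
            }
        }
        return counts;
    }
}
```

For the two example lines, this yields `my-long-long-define-word` with count 2 and `my-long-long-define-word-two` with the default count 1.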
## Test code
Once the words are defined, the spelling check recognizes them:
```java
final String word = "my-long-long-define-word";
final String word2 = "my-long-long-define-word-two";
Assert.assertTrue(EnWordCheckers.isCorrect(word));
Assert.assertTrue(EnWordCheckers.isCorrect(word2));
```
# Custom Chinese Dictionary
## File configuration
Create the file `resources/data/define_word_checker_zh.txt` in the project's resource directory, with content such as:
```
默守成规 墨守成规
```
The two entries are separated by an ASCII (half-width) space: the misspelled form comes first, followed by the correct form.
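A hypothetical sketch that reads these `wrong correct` pairs and applies them by plain substring replacement. `ZhDictionarySketch` is not part of the library; a real checker would segment the text first rather than replace blindly.

```java
import java.util.*;

/** Hypothetical sketch: parse "wrong correct" lines and apply them to a text. */
final class ZhDictionarySketch {
    /** Parses lines of the form "wrong correct", separated by an ASCII space. */
    static Map<String, String> parse(String content) {
        Map<String, String> corrections = new LinkedHashMap<>();
        for (String raw : content.split("\\r?\\n")) {
            String line = raw.trim();
            int space = line.indexOf(' ');
            if (space <= 0) {
                continue; // skip blank or malformed lines without a separator
            }
            corrections.put(line.substring(0, space), line.substring(space + 1).trim());
        }
        return corrections;
    }

    /** Naively replaces every occurrence of each misspelled form with its correction. */
    static String apply(String text, Map<String, String> corrections) {
        String result = text;
        for (Map.Entry<String, String> e : corrections.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

With the line `默守成规 墨守成规`, applying the map to `他一向默守成规` gives `他一向墨守成规`.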
# Long text mixing Chinese and English
## Scenario
In practice, the best spell-checking experience lets the user enter a long text that may mix Chinese and English.
The functions described above are then applied to that text.
## Core methods
The `WordCheckers` utility class automatically handles long texts that mix Chinese and English.
| Function | Method | Parameters | Return Value | Remarks |
|:----|:----|:----|:---|:----|
| Determine whether the spelling of the text is correct | isCorrect(string) | The text to check | boolean | |
| Return the best correction | correct(string) | The text to check | String | If no correction is found, the input itself is returned |
| Return the candidate corrections for each word of the text | correctMap(string) | The text to check | `Map<String, List<String>>` | Maps each word to the list of all its candidate corrections |
| Return a size-limited list of candidate corrections for each word | correctMap(string, int limit) | The text to check, the maximum size of each list | `Map<String, List<String>>` | Each list size <= limit |
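`correctMap` maps each word of the text to its candidates, so the text must first be split into tokens. The library's own segmentation is not described here (the examples keep `以毒功毒` as one token, which suggests a Chinese word segmenter). The sketch below is a much simpler, hypothetical stand-in: `MixedTextTokenizer` only splits wherever the character class changes between Latin letters, Han characters, whitespace, and everything else.

```java
import java.util.*;

/** Hypothetical sketch: split mixed text into runs of the same character class. */
final class MixedTextTokenizer {
    private enum Kind { LATIN, HAN, SPACE, OTHER }

    private static Kind kindOf(int codePoint) {
        if (Character.isWhitespace(codePoint)) {
            return Kind.SPACE;
        }
        if (Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN) {
            return Kind.HAN;
        }
        if ((codePoint >= 'a' && codePoint <= 'z') || (codePoint >= 'A' && codePoint <= 'Z')) {
            return Kind.LATIN;
        }
        return Kind.OTHER;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        Kind current = null;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            Kind kind = kindOf(cp);
            // Close the current run when the character class changes.
            if (current != null && kind != current) {
                tokens.add(text.substring(start, i));
                start = i;
            }
            current = kind;
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        if (start < text.length()) {
            tokens.add(text.substring(start)); // flush the final run
        }
        return tokens;
    }
}
```

For `speling 你好以毒功毒`, this yields `[speling,  , 你好以毒功毒]`; a real segmenter would further split the Chinese run into words before looking each one up.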
### Is the spelling correct?
```java
final String hello = "hello 你好";
final String speling = "speling 你好 以毒功毒";
Assert.assertTrue(WordCheckers.isCorrect(hello));
Assert.assertFalse(WordCheckers.isCorrect(speling));
```
### Return the best corrected result
```java
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("hello 你好", WordCheckers.correct(hello));
Assert.assertEquals("selling 你好以毒攻毒", WordCheckers.correct(speling));
```
### Return the candidate corrections for each word
The result maps each word of the text to its candidate corrections.
```java
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("{hello=[hello], =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing, sperling, seeling, spieling, spiling, speeling, speiling, spelding], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling).toString());
```
### Limit the number of candidate corrections per word
Same as above, but with a maximum number of candidates per word.
```java
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("{hello=[hello], =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello, 2).toString());
Assert.assertEquals("{ =[ ], speling=[selling, spewing], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling, 2).toString());
```
# NLP open-source projects
- [pinyin: Chinese characters to pinyin](https://github.com/houbb/pinyin)
- [pinyin2hanzi: pinyin to Chinese characters](https://github.com/houbb/pinyin2hanzi)
- [segment: high-performance Chinese word segmentation](https://github.com/houbb/segment)
- [opencc4j: Traditional/Simplified Chinese conversion](https://github.com/houbb/opencc4j)
- [nlp-hanzi-similar: Chinese character similarity](https://github.com/houbb/nlp-hanzi-similar)
- [word-checker: spell checking](https://github.com/houbb/word-checker)
- [sensitive-word: sensitive-word detection](https://github.com/houbb/sensitive-word)
# Roadmap
- Support English word segmentation so that entire English sentences can be processed
- Support Chinese word segmentation for spelling detection
- Introduce a Chinese error correction algorithm that handles homophones and similar-looking characters
- Support mixed Chinese and English spelling detection
# Technical Acknowledgements
[Words](https://github.com/atebits/Words) provides raw English word data.