Tags: hrs/docsim
Ensure all Unicode characters are parsed correctly

The previous implementation only handled ASCII characters! There's no reason to do that---Go supports Unicode just fine, and it's actually *easier* to do the right thing---so this adds Unicode letter and number support in parsing.

It also extracts token parsing out into a separate function, which makes the tests more readable. Since it's now easier to test, this also backfills test coverage for tokens that include numbers.
Reorganize interface to take default string query

When I started writing `docsim` I'd thought of it as a tool for comparing documents (hence `doc`ument `sim`ilarity). But I'd been thinking of "documents" as "files," not as abstract blobs of text including queries, so I added a `--query` flag that took a file, not a string.

Previously:

- `--query` took a path argument
- passing in a string query only worked through `STDIN`.

With this commit:

- the first positional argument is always a string query, unless either:
  - there's a Boolean `--stdin` flag which reads the query from `STDIN`, or
  - there's a string flag `--file` that takes a file and acts in the current way.

If there's both a `--file` and `--stdin` flag we error and print a usage message.

Usage lines:

    docsim [OPTION...] QUERY [PATH...]
    command | docsim [OPTION...] --stdin [PATH...]
    docsim [OPTION...] --file PATH [PATH...]

This commit also:

- Updates the README and `man` page accordingly
- Bumps the version to `0.1.4`
- Improves usage error message