Releases: jqnatividad/qsv

0.22.1

22 Nov 14:22
  • added lua and foreach feature flags. These commands are very powerful and can easily be abused or lead to "foot-shooting" scenarios.
    They are now only available when these features are enabled during install/build.
  • censor and censor_check now support the addition of custom profanities to screen for with the --comparand option.
  • smaller stripped binaries for x86_64-unknown-linux-gnu, i686-unknown-linux-gnu, x86_64-apple-darwin targets
  • expanded apply help text
  • added more tests (currencytonum, censor, censor_check)

See CHANGELOG for details.

0.22.0

15 Nov 20:42

MAJOR NEW FEATURES

  • generate command. Generate test data by profiling a CSV using Markov decision process machine learning.
  • add --no-headers option to rename command (see discussion #81)
  • New environment variables:
    • QSV_DEFAULT_DELIMITER - single ASCII character to use as delimiter. Overrides the --delimiter option.
      Defaults to "," (comma) for CSV files and "\t" (tab) for TSV files, when not set. Note that this will also set the delimiter for qsv's output. Adapted from xsv PR by @camerondavison.
    • QSV_NO_HEADERS - when set, the first row will NOT be interpreted as headers. Supersedes QSV_TOGGLE_HEADERS.
    • QSV_MAX_JOBS - number of jobs to use for parallelized commands (currently frequency, split and stats). If not set, max_jobs is set
      to number of logical processors divided by four. See Parallelization for more info.
    • QSV_REGEX_UNICODE - if set, makes search, searchset and replace commands Unicode-aware.
      By default, for increased performance, these commands are not Unicode-aware: they ignore Unicode values when matching and panic when Unicode characters are used in the regex.
  • Added parallelization heuristic (num_cpus/4), in connection with QSV_MAX_JOBS.
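The QSV_MAX_JOBS fallback described above (number of logical processors divided by four) can be sketched as follows. This is an illustrative Python sketch, not qsv's Rust code: the function name resolve_max_jobs, the floor of one job, and the fallback on non-numeric values are assumptions made for the example.

```python
import os


def resolve_max_jobs(env=None, logical_cpus=None):
    """Pick a job count: QSV_MAX_JOBS if usable, else logical CPUs // 4.

    Hypothetical helper for illustration; qsv's real handling of invalid
    values and of very small CPU counts may differ.
    """
    env = os.environ if env is None else env
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1

    raw = env.get("QSV_MAX_JOBS")
    if raw is not None:
        try:
            jobs = int(raw)
            if jobs > 0:
                return jobs
        except ValueError:
            pass  # assumption: fall back to the heuristic on bad input

    # Heuristic from the release notes: num_cpus / 4.
    # Assumption: never return fewer than one job.
    return max(1, logical_cpus // 4)
```

With 16 logical processors and no variable set, this sketch yields 4 jobs; setting QSV_MAX_JOBS=8 yields 8.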

See CHANGELOG for details.

0.21.0

08 Nov 00:49

MAJOR NEW FEATURES

  • added apply geocode caching, more than doubling performance in the geocode benchmark.
  • added --random and --seed options to sort command from @pjsier, enabling reproducible, randomized "scrambling" of CSVs.
  • Bash shell qsv tab completion
  • additional apply operations subcommands:
    • Match Trim operations - enables trimming of more than just whitespace, but also of multiple trim characters in one pass
    • replace: Replace all matches of a pattern (using --comparand)
      with a string (using --replacement) (Std::String replace wrapper).
    • regex_replace: Replace the leftmost-first regex match with --replacement (regex replace wrapper).
    • titlecase - capitalizes English text using Daring Fireball titlecase style
      https://daringfireball.net/2008/05/title_case
    • censor_check: check if profanity is detected (boolean)
    • censor: profanity filter
  • added parameter validation to apply operations subcommands
  • added more robust parameter validation to apply command by leveraging docopt
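To illustrate what reproducible, randomized scrambling with a seed means (the --random and --seed options above), here is a minimal Python sketch. It is not qsv's implementation: the function name scramble_csv is hypothetical, and because it uses Python's random number generator, the same seed will not produce the same order that qsv produces.

```python
import csv
import io
import random


def scramble_csv(text, seed=None):
    """Shuffle the data rows of a CSV string, keeping the header first.

    Passing the same seed reproduces the same row order on every call.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, records = rows[0], rows[1:]

    # A dedicated generator keeps the shuffle independent of global state.
    rng = random.Random(seed)
    rng.shuffle(records)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return out.getvalue()
```

Calling scramble_csv twice with the same text and seed returns identical output, which is the property that makes a randomized ordering usable in repeatable tests.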

More benchmark script improvements:

  • allow binary to be changed, so users can benchmark xsv and other xsv forks by simply replacing the $bin shell variable
  • now uses a much larger data file - a 1M row, 512 MB, 41 column sampling of NYC's 311 data

See CHANGELOG for details.

0.20.0

31 Oct 15:36

MAJOR NEW FEATURES

  • major refactoring of apply command:
    • to take advantage of docopt parsing/validation.
    • instead of one big command, broke down apply to several subcommands:
      • operations
      • emptyreplace
      • datefmt
      • geocode
  • added string similarity operations to apply command:
    • simdl: Damerau-Levenshtein similarity
    • simdln: Normalized Damerau-Levenshtein similarity (between 0.0 & 1.0)
    • simjw: Jaro-Winkler similarity (between 0.0 & 1.0)
    • simsd: Sørensen-Dice similarity (between 0.0 & 1.0)
    • simhm: Hamming distance. Number of positions where characters differ.
    • simod: OSA Distance.
    • soundex: sounds like (boolean)
  • added progress bars to commands that may spawn long-running jobs - for this release,
    apply, foreach, and lua. Progress bars can be suppressed with --quiet option.
  • added progress bar helper functions to utils.rs.
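For readers unfamiliar with the string similarity measures listed above, the following Python sketch shows textbook versions of three of them: Hamming distance (simhm), OSA distance (simod) and Sørensen-Dice similarity (simsd). These are independent illustrations written for this note; the function names are hypothetical, and qsv's own results may differ in edge cases such as unequal lengths, whitespace or very short strings.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent characters, where no
    substring is edited more than once."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def sorensen_dice(a, b):
    """Similarity in [0.0, 1.0] based on shared character bigrams."""
    ga = Counter(a[i:i + 2] for i in range(len(a) - 1))
    gb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        # Assumption: strings too short for bigrams match only if equal.
        return 1.0 if a == b else 0.0
    return 2 * sum((ga & gb).values()) / total
```

For example, hamming("karolin", "kathrin") is 3, osa_distance("ab", "ba") is 1, and sorensen_dice("night", "nacht") is 0.25.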

Benchmark improvements:

  • added apply to benchmarks.
  • added sample NYC 311 data to benchmarks.
  • added records per second (RECS_PER_SEC) to benchmarks

See CHANGELOG for details.

0.19.0

25 Oct 03:16

MAJOR NEW FEATURES

  • new scramble command. Randomly scrambles a CSV's records.
  • read/write buffer capacity can now be set using environment variables
    QSV_RDR_BUFFER_CAPACITY and QSV_WTR_BUFFER_CAPACITY (in bytes).
  • benchmark script revamped. Now produces aligned output onscreen,
    while also creating a benchmark TSV file; downloads the sample file from GitHub;
    benchmarks more commands. Designed to help users tailor and maximize qsv's performance
    in their environment.
  • added a Performance Tuning section in the README.

See CHANGELOG for details.

0.18.2

21 Oct 15:21

See CHANGELOG for details.

0.18.1

21 Oct 01:01

See CHANGELOG for details.

0.18.0

19 Oct 00:24

MAJOR NEW FEATURES

  • stats mode is now also multi-modal, i.e. it returns multiple modes when detected.
    e.g. mode[1,1,2,2,3,4,6,6] will return [1,2,6].
    It will continue to return one mode if only one is detected.
  • stats quartile now also computes IQR, lower/upper fences and skew (using Pearson's median skewness). For code simplicity, skew is calculated together with the quartiles.
  • join now also supports left-semi and left-anti joins, the same way Spark does.
  • search --flag option now returns row number, not just '1'.
  • searchset --flag option now returns row number, followed by a semicolon, and a list of matching regexes.
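The new stats outputs above can be illustrated with a short Python sketch. It uses textbook definitions, which are assumptions here rather than a description of qsv's code: modes are all values tied for the highest count, fences are the usual Tukey fences at 1.5 times the IQR, and Pearson's median skewness is 3 * (mean - median) / standard deviation. The function names and the quartile interpolation method are also choices made for this example, so numbers may differ slightly from qsv's.

```python
from collections import Counter
import statistics


def modes(values):
    """Return every value tied for the highest count, in sorted order."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def quartile_summary(values):
    """Quartiles, IQR, fences and skew for at least two numeric values."""
    data = sorted(values)
    # Assumption: "inclusive" interpolation; other methods shift Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    stdev = statistics.stdev(data)
    median = statistics.median(data)
    # Pearson's median skewness; defined as 0.0 here when all values match.
    skew = 3 * (statistics.mean(data) - median) / stdev if stdev else 0.0
    return {
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": q1 - 1.5 * iqr,  # assumption: 1.5 * IQR multiplier
        "upper_fence": q3 + 1.5 * iqr,
        "skew": skew,
    }
```

With the example from the notes, modes([1, 1, 2, 2, 3, 4, 6, 6]) returns [1, 2, 6], while a list with a single most frequent value still returns a one-element list.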

See CHANGELOG for details.

0.17.3

12 Oct 14:42

See CHANGELOG for details.

0.17.2

11 Oct 01:57

See CHANGELOG for details.