Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

1.6.2 - 2019-12-18

Fixed

  • Using a random generation name could crash the program in some cases because the name was set twice (issue #31)
  • Several escapable characters (~, @, %, |, { and }) were not considered escapable (reopened issue #24)
  • rule command crashed when called with a number of examples to generate (issue #29)
  • Parser opened the file upon being created rather than when starting to parse, leading to the same file being parsed several times after resetting the system (issue #28)
  • One escapable character was missing, so the escape character was not removed before ] (issue #27)

Added

  • When using the JSONL adapter, entities have a new end-index field representing the first index after the entity that is not part of it
  • Comment at the beginning of each Rasa Markdown output file, stating that the file was generated using Chatette
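
Because end-index points to the first position after the entity, it can be used directly as the upper bound of a slice. The sketch below illustrates that convention only; the start-index field name and the dictionary layout are assumptions made for the example, not the adapter's documented output format.

```python
def entity_text(text, entity):
    """Return the substring covered by an entity whose end index is exclusive."""
    # "start-index" is a hypothetical field name; only "end-index" is named in this changelog.
    return text[entity["start-index"]:entity["end-index"]]


example = "book a table in Paris"
start = example.index("Paris")

# The end index is the first index after the entity, i.e. start + length.
entity = {"start-index": start, "end-index": start + len("Paris")}
extracted = entity_text(example, entity)
```

With an exclusive end index, the length of the entity is simply the end index minus the start index.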

1.6.1 - 2019-11-11

Added

  • New opposite random generation modifier (using syntax [unit?!randgen name])

Changed

  • Temporarily, caching is disabled when there are more than 50 units declared, in order to prevent performance issues with large template files
  • Rules no longer cache their examples at all, as doing so mostly duplicated caches for very little performance gain

Fixed

  • Escape characters not being removed in all cases (issue #24)
  • Entity positions were incorrectly updated in some cases (issue #22)

1.6.0 - 2019-09-18

Added

  • New adapter to output a Markdown file that can be used as input for Rasa NLU
  • New choice syntax: [choice1|choice2]

Changed

  • Shadowing a unit definition (i.e. redefining a unit a second time) is not allowed anymore
  • File inclusion is done with respect to the file currently being parsed rather than the master file
  • Command set-modifier now accepts randgen, randgen-name and randgen-percent
  • Running the interactive command line interpreter without asking to parse a file is now allowed, using the command python -m chatette -i
  • Show the seed used during execution so that the program can be re-executed in exactly the same way
  • Allow the percent symbol % to be appended to random generation percentages
  • Accept non-integer percentages for random generation percentages
  • Choices can contain other choices
  • Choices can now take random generation names and random generation percentages
  • Merge word groups and choices together to make the new choice syntax
  • Large refactor of the parser and generator to improve the quality, maintainability and readability of the code
  • Manage parsing statistics by creating a class intended for that
  • Only require rasa_nlu_data as a top-level field in base file (not common_examples and entity_synonyms anymore)

Removed

  • Completely removed the limits on the number of examples that can be generated

Fixed

  • Don't crash when file paths and names contain unicode characters in Python 2.7
  • Take random generation names into account when generating all possible examples (issue #19)
  • Prevent some compatibility issues when using different versions of Python
  • Double space generated in choices in some very precise cases

Deprecated

  • Deprecate old choice syntax {choice1/choice2} in favor of the new syntax [choice1|choice2]

1.5.0 - 2019-06-13

Added

  • Program option -f or --force to overwrite the output folder without asking the user for confirmation
  • Base file containing predefined JSON data, that can be extended with generated data, when using the Rasa adapter

Removed

  • Drop tests for Python 3.3 because pytest dropped support for it => Python 3.3 is not supported anymore

Changed

  • Max number of examples per intent to generate (20'000 => 1'000'000)
  • Ask for user confirmation before overwriting the output folder. Use program option -f or --force to have the same behavior as before (no confirmation).

Fixed

  • Command interpreter not working with Python 2.7

1.4.2 - 2019-04-24

Fixed

  • Entity marker not being removed from the generated text when the slot starts and ends with whitespaces
  • Computation of the maximum number of examples that a choice could generate was off by one, which could lead to a "sample larger than population" error
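
The error quoted above is the kind Python's random.sample raises when asked for more items than the population holds. The following sketch reproduces that failure mode with an overestimated bound; it is an illustration with made-up data, not Chatette's actual generation code.

```python
import random

choices = ["hi", "hello", "hey"]
rng = random.Random(0)

# Correct bound: at most len(choices) distinct examples can be drawn.
max_examples = len(choices)
drawn = rng.sample(choices, max_examples)

# An upper bound that is one too large makes random.sample raise ValueError.
try:
    rng.sample(choices, max_examples + 1)
    error = None
except ValueError as exc:
    error = str(exc)
```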

1.4.1 - 2019-03-07

Changed

  • Make the adapter_str, local and seed arguments of the facade's constructor optional

Fixed

  • Template files were included with respect to the initial master file rather than the file that was currently being read

1.4.0 - 2019-02-17

Added

  • Check for circular includes: an exception will be raised with relevant information about which file was starting to get parsed twice rather than the old "too many recursion" error
  • Interactive mode, executable using -i or --interactive program option, with commands that give information or change the state of the parser after it read template files
  • Add program option -I or --interactive-commands-file to feed the script a file of commands that will be directly executed

Changed

  • jsonl adapter doesn't create a synonyms.json file anymore if it has no data to write inside it
  • Output file is written using the default encoding of the platform Chatette is being used on, to avoid encoding issues
  • Output folder and all its contents are now deleted before being created again to write the output file(s), in order to prevent old outputs from being mixed up with new outputs
  • Accept comment lines and empty lines inside unit definitions (they are no longer treated as a new rule or as a syntax error, as was the case before)
  • Completely refactor the parser and tokenizer: when instantiating the parser, give it the path of the file rather than the file itself. Parsing might behave differently than it used to.
  • Move all code that is related to parsing into directory parsing and rename file parsing.py back to parser.py

Fixed

  • Duplicate examples not removed when they were generated by different rules
  • Entity values being synonyms of themselves if the same value was used in several different slots
  • Number of training and testing examples for intent not correctly parsed in some cases
  • Too strict checks on the syntax of choices
  • Possible infinite loop during generation (with a lot of bad luck)
  • (Invisible) warnings because of invalid control sequences in the code and the tests

1.3.2 - 2019-01-21

Added

  • Wiki explaining the whole syntax of template files and the usage of the program
  • Examples for the wiki
  • Use a built-in DeprecationWarning for deprecation warnings (in addition to printing the warning on stdout)

Fixed

  • Possible exception caused by missing import
  • Output directory (provided by the user with -o or --output flag) was ignored

1.3.1 - 2019-01-10

Added

  • Program option -v or --version to display the version number of the module (a __version__ attribute of the module itself is also now available)

Fixed

  • Missing requirements when installing the package from PyPI
  • Casegen (i.e. change of case for examples) didn't apply for some definitions (notably when not asking for a specific number of examples to be generated)
  • Version number displayed in help messages in terminal

1.3.0 - 2018-12-30

Added

  • Code of conduct and instructions for contributing
  • Unit tests for some parts of the project (automatically run by Travis CI)
  • New adapter that outputs .jsonl files (choosing which adapter to use is done with the program option -a or --adapter)

Changed

  • The number of examples to generate for training and testing does not need to be surrounded with single quotes anymore (but still can): 'training':'5' is accepted as well as test: 3
  • The output files cannot contain more than 10000 examples anymore
  • The output files are now by default put in folders output/train/ and output/test/
  • Refactoring of some parts of the code
  • Script is now referred to as chatette rather than chatette.run when executing from the command line

Fixed

  • Using an empty definition now raises an exception rather than removing all generated examples
  • Having a line with only spaces doesn't crash the script anymore
  • When writing output files in Rasa format, the entity highlighted could be located incorrectly in the example text (if an entity value was used twice for example)
  • Possible duplicated examples when generating units with different letter case

1.2.3 - 2018-11-22

Added

  • Command line option (-l or --local) to make the working directory be the directory containing the template file

Changed

  • Working directory to be the directory from which the command is executed

Fixed

  • Missing import in a particular case
  • Several error messages that used legacy variables
  • parser.py changed to parsing.py to avoid some computers importing the default Python module named parser

1.2.2 - 2018-11-04

Added

  • Program option (-s or --seed) that is used as the seed of the random number generator
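
Seeding the random number generator makes a generation run repeatable. The sketch below shows the general idea with Python's standard random module; the generate function and the word list are made up for illustration and say nothing about how Chatette uses the seed internally.

```python
import random


def generate(seed, words, count):
    """Pick `count` words using a generator initialised with `seed`."""
    rng = random.Random(seed)
    return [rng.choice(words) for _ in range(count)]


words = ["hello", "hi", "hey", "good morning"]

# Two runs with the same seed produce the same sequence of picks.
first = generate(42, words, 5)
second = generate(42, words, 5)
```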

Fixed

  • Restaurant example which still had tests within it

1.2.1 - 2018-10-22

Changed

  • Accept train and test as well as training and testing for the identifiers of the numbers of examples to generate

Fixed

  • Potential ImportErrors when running script directly from the command line
  • Logo display on PyPI

1.2.0 - 2018-09-19

Added

  • Contributors to README

Changed

1.1.5 - 2018-09-19

Added

  • Files to make the script a package and register it on PyPI

Changed

  • More pythonic project structure

Fixed

  • Generator's setter for the max number of examples was missing a parameter

1.1.4 - 2018-09-16

Added

  • Possibility to change some special characters from the code

Changed

  • Accept modifiers in any order

1.1.3 - 2018-09-13

Changed

  • In synonyms lists, replace argument identifiers with their previously encountered values (i.e. each value across the whole templates)

1.1.2 - 2018-09-11

Added

  • Hard limit on the generation of intent examples to avoid producing files that are too large (by default, no more than 20'000 examples per intent)

Changed

  • Manage arguments within generated entities
  • Release number to follow SemVer 2.0.0

Fixed

  • When asking to generate lots of examples, entities were not listed

1.1.1 - 2018-09-11

Added

  • Warning about circular references in the documentation

Changed

  • Deprecate semi-colon syntax in documentation

1.1.0 - 2018-09-11

Added

  • Support for generation of non-overlapping training and testing datasets
  • Parser support for Chatito v2.1.x's syntax for asking for intent generation (('training': '5', 'testing': '3')). Old way is not deprecated!
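
One simple way to obtain non-overlapping training and testing sets is to draw a single sample without replacement and cut it in two. The sketch below assumes that approach; split_examples and its parameters are hypothetical names, and the actual generator may proceed differently.

```python
import random


def split_examples(examples, nb_train, nb_test, seed=None):
    """Return disjoint (train, test) lists drawn from the distinct examples."""
    # Discard duplicates first so that the two sets cannot share an example.
    unique = sorted(set(examples))
    if nb_train + nb_test > len(unique):
        raise ValueError("not enough distinct examples to split")
    rng = random.Random(seed)
    # Sampling without replacement guarantees each example is picked at most once.
    picked = rng.sample(unique, nb_train + nb_test)
    return picked[:nb_train], picked[nb_train:]


examples = ["example %d" % i for i in range(10)] + ["example 0"]
train, test = split_examples(examples, nb_train=5, nb_test=3, seed=1)
```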

Changed

  • Discard duplicates in generated examples (for both training and test datasets)
  • Discard inapplicable case generation modifiers (in most cases)

Deprecated

  • Semi-colon ; syntax for comments (rather use double slash // syntax) to stick closer to Chatito v2.1.x

1.0.0 - 2018-09-08

Added

  • Changelog
  • Logo
  • Syntax documentation
  • Add shebang in all files

Changed

  • Update README to be nice for users
  • Update real-life data
  • Generate all possible examples when no number of generation is given: Chatette is now a superset of Chatito v2.0.0
  • Use more list and dict comprehensions

0.4.2 - 2018-08-25

Changed

  • Take variations and synonyms into account when generating all possibilities
  • Slash / syntax in slot definitions now takes the identifier of the first token of the rule to avoid having the same behavior as the empty equal syntax
  • Update real-life data

Fixed

  • Empty synonyms list
  • Incorrect letter case within entities

0.4.1 - 2018-08-24

Removed

  • Generator methods now unused with new parser

Fixed

  • Crashing Rasa adapter

0.4.0 - 2018-08-24

Added

  • Generate all possible strings for each and every token
  • Synonym support (in Rasa NLU format) in generator

Changed

  • Rewrite the whole parser in an Object-Oriented way (with support for everything that was supported before)
  • Update real-life data

Fixed

  • Escapement for arguments being removed too soon

0.3.2 - 2018-08-20

Fixed

  • Comment lines and empty lines within definitions being considered as rules
  • Several bugs when referencing, without a variation, a token that was defined with variations

0.3.1 - 2018-08-19

Removed

  • Lots of debugging prints

0.3.0 - 2018-08-19

Added

  • Argument support
  • Real-life data
  • Random generation to choices

Changed

  • Possibility to use a token without variation even though it was defined with it
  • Simplify the parser
  • Check that tokens are named

Fixed

  • Keep track of leading spaces with choices
  • Escapement within choices
  • Line feed \n inside parsed strings
  • Assumption that words provided to the generator begin with a lowercase letter

0.2.0 - 2018-08-17

Added

  • Support for choices in the parser and the generator
  • Easier way to have a slot value named as the string generated (i.e. using slash / syntax)

Changed

  • Named random generations now generate (or don't generate) together

0.1.0 - 2018-08-17

Added

  • MIT license file
  • README file
  • .gitignore file
  • Draft of syntax description
  • Utility functions
  • Complete parser with support for words, word groups, aliases, slots and intents
  • Support for slot value names
  • Generator able to generate an output file in Rasa NLU format (without support for synonyms or regex features)