
xypcre REFERENCE & USAGE

- REFERENCE | REMARKS | USAGE -



REFERENCE

All functions reside within the xypcre:: namespace.

- core functions | utility functions | helper functions -


Core Functions

Core functions comprise the principal features of xypcre.

- pcrematch() | pcrereplace() | pcrecapture() | pcresplit() -

  1. pcrematch()

    Finds match(es) of a regexp pattern in the given string.

    Syntax: pcrematch(string, pattern, sep='||', index=0, format=2)

    string   String to work on (haystack).
    pattern  The regexp pattern to match (needle).
    sep      Separator between returned matches, at least two characters long.
    index    1-based index of one match to return when there are multiple matches.
             Ignored if < 1 (all matches are returned); returns the last match if > total count.
    format   Format of returned data. Can be one of 0, 1, 2. See REMARKS.
    

    Return: Matching substring(s) in defined format.
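The index rule can be illustrated with a small Python analog built on the standard `re` module. This is not xypcre code: `re` is not PCRE (the syntax overlaps but is not identical), `match_analog()` is a hypothetical name, and it returns a plain Python list instead of an xypcre-formatted string.

```python
import re


def match_analog(string: str, pattern: str, index: int = 0) -> list[str]:
    """Mimic pcrematch()'s documented index rule using Python's re module.

    index < 1  -> all matches are returned
    index >= 1 -> only the index-th match (1-based), clamped to the last one
    """
    matches = [m.group(0) for m in re.finditer(pattern, string)]
    if index < 1 or not matches:
        return matches
    return [matches[min(index, len(matches)) - 1]]


print(match_analog('a1 b22 c333', r'\d+'))     # ['1', '22', '333']
print(match_analog('a1 b22 c333', r'\d+', 2))  # ['22']
print(match_analog('a1 b22 c333', r'\d+', 9))  # ['333'] (index > count)
```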

  2. pcrereplace()

    Replaces match(es) of a regexp pattern in the given string.

    Syntax: pcrereplace(string, pattern, replace)

    string   String to work on (haystack).
    pattern  The regexp pattern to match (needle).
    replace  The string or pattern to replace each match with.
    

    Return: Resulting string after replacement.
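As a rough illustration of replacing with a pattern, Python's `re.sub()` replaces every match and lets the replacement refer to capture groups. The `\1`/`\2` group-reference syntax shown is Python's; that xypcre's `replace` parameter accepts the same syntax is an assumption, so check the au3 and PCRE notes referenced under Misc Usage Notes.

```python
import re

# re.sub() replaces every non-overlapping match; \1 and \2 refer to the
# first and second capture groups (Python syntax, assumed similar in xypcre).
result = re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'user@host admin@site')
print(result)  # host at user site at admin
```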

  3. pcrecapture()

    Finds match(es) of the specified capture group of a regexp pattern in the given string.

    Syntax: pcrecapture(string, pattern, index=1, sep='||', format=2)

    string   String to work on (haystack).
    pattern  The regexp pattern to match (needle), with at least one capture group.
    index    1-based index of the capturing group to return.
             Returns the 1st group if < 1, or the last one if > total count.
             Named groups are selected by their ordinal index.
    sep      Separator between returned matches, at least two characters long.
    format   Format of returned data. Can be one of 0, 1, 2. See REMARKS.
    

    Return: Matching substring(s) of the group, separated by sep, in defined format.
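The group selection rule, including the clamping of out-of-range indices and the ordinal numbering of named groups, is sketched below in Python. `capture_analog()` is a hypothetical name, Python's `re` stands in for PCRE, and mapping a group that did not participate in a match to an empty string is an assumption about how xypcre reports it.

```python
import re


def capture_analog(string: str, pattern: str, index: int = 1) -> list[str]:
    """Mimic pcrecapture()'s documented group selection with Python's re."""
    regex = re.compile(pattern)
    if regex.groups == 0:
        raise ValueError('pattern needs at least one capture group')
    # Clamp: < 1 selects the first group, > group count selects the last.
    group = min(max(index, 1), regex.groups)
    # A group that did not take part in a match is None in Python; use ''.
    return [m.group(group) or '' for m in regex.finditer(string)]


# Named groups still count by position: (?P<key>...) is group 1 here.
pairs = r'(?P<key>\w+)=(\d+)'
print(capture_analog('a=1, b=22', pairs, 2))  # ['1', '22']
print(capture_analog('a=1, b=22', pairs, 0))  # ['a', 'b']
print(capture_analog('a=1, b=22', pairs, 5))  # ['1', '22']
```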

  4. pcresplit()

    Splits the given string at each point where the regexp pattern matches.

    Syntax: pcresplit(string, pattern, sep='||', format=2)

    string   String to work on (haystack).
    pattern  The regexp pattern to split at (needle).
    sep      Separator between returned substrings, at least two characters long.
    format   Format of returned data. Can be one of 0, 1, 2. See REMARKS.
    

    Return: Split substrings, separated by sep, in defined format.

    Notes: The matched text is consumed by the split and does not appear in the result. Use lookaheads/lookbehinds to match positions instead, so that the surrounding text is retained.
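The difference between a consuming split and a zero-width (lookaround) split can be seen with Python's `re.split()` (Python 3.7+ is needed for splitting on empty matches). This only illustrates the idea under Python's semantics; how xypcre itself handles zero-width split points is not documented here.

```python
import re

text = 'alpha1beta22gamma'

# Splitting at the digits consumes them: the matched text is gone.
consumed = re.split(r'\d+', text)
print(consumed)  # ['alpha', 'beta', 'gamma']

# Lookarounds match a position, not text, so nothing is lost. This splits
# wherever a non-digit meets a digit or a digit meets a non-digit.
kept = re.split(r'(?<=\D)(?=\d)|(?<=\d)(?=\D)', text)
print(kept)  # ['alpha', '1', 'beta', '22', 'gamma']
```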


Utility Functions

Utility functions are intended to assist in working with core functions.

- pcretoken() -

  1. pcretoken()

    Extracts a substring/token from the return data of core xypcre functions and restores it to its original form. It is the equivalent of gettoken() for the special xypcre return formats.

    Syntax: pcretoken(data, index=1, format=2, sep='||')

    data     The source tokenlist data.
    index    1-based index of the token to return, or 'count' to return the total token count.
    format   Format of data. Can be 0, 1, or 2. See REMARKS.
    sep      Separator used between tokens in data. Ignored if format is 2.
    

    Return: Specified token, or total count. The token is returned in original unescaped form.
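Decoding a token from each of the formats described in REMARKS might look like the following Python sketch. It is reconstructed from the format descriptions only, not from xypcre's source; `decode_token()` is a hypothetical name, and the results for an out-of-range index and for empty input are assumptions.

```python
import re
from typing import Union


def decode_token(data: str, index: Union[int, str] = 1, fmt: int = 2,
                 sep: str = '||') -> Union[str, int]:
    """Return the index-th token (1-based) in its original form, or the
    token count when index == 'count'. Derived from REMARKS only."""
    if fmt == 2:
        # "<len1>+<len2>+...|<tok1><tok2>...": cut the body by the lengths.
        header, _, body = data.partition('|')
        tokens, pos = [], 0
        for n in (int(x) for x in header.split('+') if x):
            tokens.append(body[pos:pos + n])
            pos += n
    else:
        # Assumption: empty data means "no tokens" rather than one empty token.
        tokens = data.split(sep) if data else []
        if fmt == 1:
            # Undo the escaping in one pass: "[c]" -> c for each sep character c.
            chars = re.escape(''.join(dict.fromkeys(sep)))
            tokens = [re.sub(r'\[([' + chars + r'])\]', r'\1', t) for t in tokens]
    if index == 'count':
        return len(tokens)
    # Assumption: an out-of-range index yields an empty string.
    return tokens[index - 1] if 1 <= index <= len(tokens) else ''


packed = '4+0+10|datainfo|intel'                     # format 2 example
print(decode_token(packed, 'count'))                 # 3
print(decode_token(packed, 3))                       # info|intel
print(decode_token('abc[>]def<>ghi', 1, 1, '<>'))    # abc>def
```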


Helper Functions

Helper functions are required dependencies of the core functions.

- xypcrefind() | xypcrewaiter() -

  1. xypcrefind()
    Locates a valid xypcre.exe (downloading it if not found) and returns its path.
  2. xypcrewaiter()
    Synchronizes communication between xyscript and xypcre.


REMARKS

Some functions can return a list of substrings or tokens. The following points apply in that case:

  1. sep should be at least 2 characters long; the same character repeated twice is recommended.
    The sep parameter is ignored if format is set to 2.

  2. format determines the format of the returned data. Possible values are 0, 1, or 2 (default).

    • 0: returned tokens are joined with sep and not processed any further, even if they already contain sep characters. Because of this, gettoken() may fail to retrieve complete tokens.
      However, this is the fastest format when the returned tokens are known not to contain any sep characters, e.g. when sep is <crlf 2> and the source string is a single line.
    • 1: returned tokens are joined with sep, and every sep character inside a token is surrounded with square brackets.
      For example, with sep <>, the tokens abc>def and ghi are returned as abc[>]def<>ghi.
      gettoken() can retrieve a complete token, but the token must be unescaped afterwards.
    • 2: the return is a string constructed as <token1 length>+<token2 length>+...|<token1><token2>...
      E.g., for the substrings 'data', '', 'info|intel', the return is: 4+0+10|datainfo|intel
      As stated before, the sep parameter is ignored if this format is used.
  3. Regardless of the format and sep used, pcretoken() can retrieve any single substring/token in its original form.

All this escaping and formatting of the return data exists so that complete matches can be retrieved even when the matched text contains the separator characters.
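To make the three formats concrete, here is a Python sketch that builds each one from a list of tokens and reproduces the examples in the list above. It is an illustration derived from these descriptions, not xypcre's implementation; `encode_tokens()` is a hypothetical name.

```python
def encode_tokens(tokens: list[str], fmt: int = 2, sep: str = '||') -> str:
    """Build the return formats described in REMARKS (illustrative only)."""
    if fmt == 0:
        # Plain join: a sep inside a token looks exactly like a real separator.
        return sep.join(tokens)
    if fmt == 1:
        # Wrap every sep character found inside a token in [ ], then join.
        chars = set(sep)
        return sep.join(
            ''.join(f'[{c}]' if c in chars else c for c in t) for t in tokens
        )
    if fmt == 2:
        # Length header, '|', then the raw tokens back to back; sep is unused.
        header = '+'.join(str(len(t)) for t in tokens)
        return header + '|' + ''.join(tokens)
    raise ValueError('format must be 0, 1 or 2')


print(encode_tokens(['data', '', 'info|intel']))    # 4+0+10|datainfo|intel
print(encode_tokens(['abc>def', 'ghi'], 1, '<>'))  # abc[>]def<>ghi
```

Note how format 0 loses information: `encode_tokens(['a||b'], 0)` yields `a||b`, which reads back as two tokens, whereas format 1 yields `a[|][|]b` and format 2 yields `4|a||b`.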



USAGE

- Misc Usage Notes | Debugging Notes -

The system consists of an xyscript include file, xypcre.xyi, and an executable utility, xypcre.exe.

  • INCLUDE the xyi file in your script like this, assuming it's saved as <xyscripts>\_inc\xypcre.xyi

     INCLUDE '_inc\xypcre.xyi';
       text pcretoken(pcrematch('a,b,c,a,bb,d', 'b+(?=,d)'), 1);  // demo: shows the 1st match of the pattern, "bb"
    
  • Multiple instances of xypcre can run independently, even from inside another xypcre function.

  • The functions look for xypcre.exe in these locations, in this order:

    • $P_UDF_pcre_xypcre, a permanent variable pointing to the executable, then
    • <xyscripts>\xypcre.exe, then
    • <xydata>\xypcre.exe, then
    • <xypath>\xypcre.exe.
    • If the utility is not found in any of these locations, it is downloaded from the releases page of the GitHub repo. If the download fails, the parent function aborts and returns an empty string.
  • The helper functions must be included along with the core functions; this matters especially when individual core functions are included separately.
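The lookup order amounts to "first existing candidate wins, otherwise download". A minimal Python sketch, assuming the candidate paths have already been resolved from $P_UDF_pcre_xypcre and the XYplorer folder variables; `find_exe()` and the `download` callback are hypothetical stand-ins, not xypcrefind()'s actual code.

```python
from pathlib import Path
from typing import Callable, Optional


def find_exe(candidates: list[Optional[str]],
             download: Callable[[], Optional[str]]) -> Optional[str]:
    """Return the first candidate that is an existing file, else download().

    candidates: paths in search order (None/empty entries are skipped, e.g.
    an unset permanent variable). download: hypothetical fallback returning
    the downloaded path, or None on failure (the caller then returns '').
    """
    for path in candidates:
        if path and Path(path).is_file():
            return path
    return download()
```

For instance, `find_exe([None, 'no/such/dir/xypcre.exe'], lambda: None)` returns `None`: nothing exists, and the stand-in download fails.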

Misc Usage Notes

  • Unlike the built-in regexp functions, xypcre has no dedicated parameters for settings such as multiline and case sensitivity; these are controlled by inline flags in the regexp pattern, e.g. (?m) or (?i).

  • The functions pcrematch(), pcrecapture() and pcresplit() can return a list of substrings. pcretoken() is recommended for retrieving a single token (or total count) from such returns.

  • Most of the PCRE(1) syntax is available. See here (au3) and here (pcre) for details on supported functionality. (These pages also describe some defaults and quirks.)
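(?i) (case-insensitive) and (?m) (multiline) are inline option settings that both PCRE and Python's `re` accept at the start of a pattern. The Python snippet below only demonstrates their effect; with xypcre the same flags would go into the pattern argument.

```python
import re

text = 'Alpha\nbeta\nGAMMA'

plain = re.findall(r'^[a-z]+', text)        # case-sensitive, ^ = string start
icase = re.findall(r'(?i)^[a-z]+', text)    # (?i): ignore case
multi = re.findall(r'(?im)^[a-z]+', text)   # (?im): ignore case + multiline
print(plain, icase, multi)  # [] ['Alpha'] ['Alpha', 'beta', 'GAMMA']
```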

Debugging Notes

  • If a malformed or overly complex regexp pattern is provided, xypcre.exe may appear to hang. An abort prompt is displayed every 8 seconds (usually plenty of time for a well-formed regexp to finish).
    If confirmed, the offending xypcre.exe is killed and the function exits with an empty return.

  • If the functions still cannot finish properly for some reason:

    • Quit xypcre.exe from the Explorer taskbar, or
    • Clear any permanent variable named in the form $P_UDF_pcre_IFS#, where # is any number, or
    • Press Esc repeatedly to stop the entire script stack unconditionally.
  • xypcrefind() has a commented-out section that lets advanced users run the au3 source script instead of the compiled exe.


Read all that? Really, that entire tower of text? Great! I hope you find it useful. :tup: