gendict(1) — Linux manual page

GENDICT(1)                  ICU 73.0.1 Manual                 GENDICT(1)

NAME

       gendict - compile a word list into an ICU string trie dictionary

SYNOPSIS

       gendict [ --uchars | --bytes --transform transform ] [ -h, -?,
       --help ] [ -V, --version ] [ -c, --copyright ] [ -v, --verbose ]
       [ -i, --icudatadir directory ]  input-file  output-file

DESCRIPTION

       gendict reads a word list from input-file and writes it to
       output-file as a string trie dictionary.  By convention, the
       output file has the .dict extension.

       Each word starts at the beginning of a line and ends at the first
       whitespace character.  Lines that begin with whitespace are
       ignored.
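
       The following Python sketch illustrates the two line rules
       above.  It is not taken from gendict: the function name
       extract_words is hypothetical, it uses Python's notion of
       whitespace (which may not match gendict's exactly), and treating
       empty lines as skipped is an assumption this page does not state.

```python
def extract_words(text: str) -> list[str]:
    """Hypothetical sketch of the word-list line rules (not gendict's code)."""
    words = []
    for line in text.splitlines():
        # Assumption: empty lines are skipped as well.
        if not line:
            continue
        # Lines that begin with whitespace are ignored.
        if line[0].isspace():
            continue
        # The word runs from the start of the line to the first whitespace
        # character; anything after it (for example a value) is dropped here.
        words.append(line.split(maxsplit=1)[0])
    return words


sample = "hello 12\n  indented line\nworld\tmore text\n\nabc\n"
print(extract_words(sample))  # ['hello', 'world', 'abc']
```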

OPTIONS

       -h, -?, --help
              Print help about usage and exit.

       -V, --version
              Print the version of gendict and exit.

       -c, --copyright
              Embed the standard ICU copyright notice in output-file.

       -v, --verbose
              Display extra informative messages during execution.

       -i, --icudatadir directory
              Look for any necessary ICU data files in directory.  For
              example, the file pnames.icu must be located when ICU's
              data is not built as a shared library.  The default ICU
              data directory is specified by the environment variable
              ICU_DATA.  Most configurations of ICU do not require this
              argument.

       --uchars
              Build a UChar trie (an ICU UCharsTrie of UTF-16 code
              units).  Mutually exclusive with --bytes.

       --bytes
              Build a byte trie (an ICU BytesTrie).  Requires
              --transform.  Mutually exclusive with --uchars.

       --transform transform
              Set the transform applied to input characters.  Valid
              only with --bytes, which requires it.  The only currently
              supported transform is offset-<hex-number>, which
              subtracts the given offset from every input character.
              The offset transform also maps U+200D (ZERO WIDTH JOINER)
              to 0xFF and U+200C (ZERO WIDTH NON-JOINER) to 0xFE, for
              compatibility with languages that require these
              characters.  Applied to the non-value characters in
              input-file, the transform must produce values between
              0x00 and 0xFF.

       input-file
              The source file to read.

       output-file
              The file to write the output dictionary to.
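
       The offset transform described under --transform can be
       sketched in Python as follows.  This illustrates the documented
       behavior only and is not ICU's implementation: the function names
       parse_offset_spec and offset_transform are hypothetical, and
       checking the two special mappings before subtracting the offset
       is an assumption.  The example offset 0x0E00 is the start of the
       Unicode Thai block.

```python
ZWJ = 0x200D   # U+200D ZERO WIDTH JOINER, mapped to 0xFF
ZWNJ = 0x200C  # U+200C ZERO WIDTH NON-JOINER, mapped to 0xFE


def parse_offset_spec(spec: str) -> int:
    """Parse a hypothetical 'offset-<hex-number>' argument into an int."""
    prefix = "offset-"
    if not spec.startswith(prefix):
        raise ValueError(f"unsupported transform: {spec!r}")
    # int(..., 16) accepts the hex digits with or without a 0x prefix.
    return int(spec[len(prefix):], 16)


def offset_transform(word: str, offset: int) -> bytes:
    """Map each character of word to one byte using the offset rule."""
    out = bytearray()
    for ch in word:
        cp = ord(ch)
        # Assumption: the special mappings take precedence over the offset.
        if cp == ZWJ:
            b = 0xFF
        elif cp == ZWNJ:
            b = 0xFE
        else:
            b = cp - offset
        # The result must fall in 0x00..0xFF to fit in a bytes trie.
        if not 0x00 <= b <= 0xFF:
            raise ValueError(f"U+{cp:04X} maps to {b}, outside 0x00..0xFF")
        out.append(b)
    return bytes(out)


offset = parse_offset_spec("offset-0x0e00")
print(offset_transform("\u0e01\u0e32\u200d", offset).hex())  # 0132ff
```

       A character outside the window from offset to offset + 0xFF
       raises ValueError in this sketch, mirroring the requirement that
       the transformed input stay between 0x00 and 0xFF.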

CAVEATS

       The input-file is assumed to be encoded in UTF-8.  Integers used
       as values in the input-file must be written in ASCII, either in
       hexadecimal with a 0x prefix or in decimal.  Either --bytes or
       --uchars must be specified.
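
       A value token following the rules above can be parsed as in this
       Python sketch.  The name parse_value is hypothetical; this page
       does not say where on a line the value appears, so the sketch
       handles only the token itself, and accepting upper-case hex
       letters is an assumption.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
DEC_DIGITS = "0123456789"


def parse_value(token: str) -> int:
    """Parse a dictionary value: hex with a 0x prefix, otherwise decimal."""
    if token.startswith("0x"):
        digits, allowed, base = token[2:], HEX_DIGITS, 16
    else:
        digits, allowed, base = token, DEC_DIGITS, 10
    # Check for ASCII characters explicitly: Python's int() would also
    # accept non-ASCII decimal digits and "_" separators.
    if not digits or any(c not in allowed for c in digits):
        raise ValueError(f"invalid value: {token!r}")
    return int(digits, base)


print(parse_value("0x1F"), parse_value("31"))  # 31 31
```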

ENVIRONMENT

       ICU_DATA
              Specifies the directory containing ICU data.  Defaults to
              ${prefix}/share/icu/73.0.1/.  Some ICU tools depend on the
              trailing slash, so make sure it is present whenever
              ICU_DATA is set.

AUTHORS

       Maxime Serrano

VERSION

       1.0

COPYRIGHT

       Copyright (C) 2012 International Business Machines Corporation
       and others

SEE ALSO

       https://www.icu-project.org/userguide/boundaryAnalysis.html

COLOPHON

       This page is part of the ICU (International Components for
       Unicode) project.  Information about the project can be found at
       ⟨https://site.icu-project.org/home⟩.  If you have a bug report for
       this manual page, see ⟨https://site.icu-project.org/bugs⟩.  This
       page was obtained from the project's upstream Git repository
       ⟨https://github.com/unicode-org/icu⟩ on 2023-12-22.  (At that
       time, the date of the most recent commit that was found in the
       repository was 2023-12-22.)  If you discover any rendering
       problems in this HTML version of the page, or you believe there
       is a better or more up-to-date source for the page, or you have
       corrections or improvements to the information in this COLOPHON
       (which is not part of the original manual page), send a mail to
       man-pages@man7.org.

ICU MANPAGE                    1 June 2012                    GENDICT(1)