
File Encoding

guwirth edited this page Mar 10, 2021 · 2 revisions

When files are read, their encoding is determined as follows:

  • source files:

    • First, the file is checked for a byte order mark (BOM). If a BOM is present, the encoding it indicates is used.
    • For files without a BOM, the encoding is read from the property sonar.sourceEncoding.
    • If the property is not set, the system default encoding is used.
  • XML reports:

    • The encoding is read from the encoding declaration in the prolog of the XML document.
    • If the prolog declares no encoding, UTF-8 is used.
  • text reports:

    • First, the file is checked for a byte order mark (BOM). If a BOM is present, the encoding it indicates is used.
    • For files without a BOM, the encoding is read from the corresponding charset property (e.g. sonar.cxx.clangtidy.charset=UTF8).
    • If no property is defined, UTF-8 is used as the default.
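
The lookup order described above for source files and text reports (BOM first, then the configured property, then a default) can be sketched as follows. This is only an illustration under stated assumptions, not the plugin's actual implementation: the class EncodingSketch and its methods charsetFromBom and resolve are hypothetical names, and the sketch only recognizes UTF-8 and UTF-16 BOMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of the plugin: illustrates the lookup order only.
public class EncodingSketch {

  // Returns the charset indicated by a leading BOM, if one is present.
  // Assumption: UTF-32 BOMs are not handled in this sketch.
  static Optional<Charset> charsetFromBom(byte[] head) {
    if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
      return Optional.of(StandardCharsets.UTF_8);
    }
    if (startsWith(head, 0xFE, 0xFF)) {
      return Optional.of(StandardCharsets.UTF_16BE);
    }
    if (startsWith(head, 0xFF, 0xFE)) {
      return Optional.of(StandardCharsets.UTF_16LE);
    }
    return Optional.empty();
  }

  // Compares the first bytes of data (as unsigned values) with the given prefix.
  static boolean startsWith(byte[] data, int... prefix) {
    if (data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if ((data[i] & 0xFF) != prefix[i]) {
        return false;
      }
    }
    return true;
  }

  // BOM wins; otherwise the configured property value; otherwise the fallback.
  static Charset resolve(byte[] head, String configured, Charset fallback) {
    return charsetFromBom(head).orElseGet(() ->
        (configured == null || configured.trim().isEmpty())
            ? fallback
            : Charset.forName(configured));
  }

  public static void main(String[] args) {
    byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
    byte[] noBom = {'a', 'b'};
    // BOM present: the configured value is ignored.
    System.out.println(resolve(withBom, "ISO-8859-1", StandardCharsets.UTF_8));
    // No BOM: the configured value is used.
    System.out.println(resolve(noBom, "ISO-8859-1", StandardCharsets.UTF_8));
    // No BOM and no configured value: the fallback is used.
    System.out.println(resolve(noBom, null, StandardCharsets.UTF_8));
  }
}
```

For source files, the fallback passed to resolve would be Charset.defaultCharset(); for text reports it would be UTF-8. Note that Charset.forName throws an exception for names the JVM does not know, so a misspelled property value surfaces as an error rather than silently falling back.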

The list of available encodings depends on your JVM. Every implementation of the Java platform is required to support the following standard charsets:

Charset       Description
US-ASCII      Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1    ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8         Eight-bit UCS Transformation Format
UTF-16BE      Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE      Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16        Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
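
To find out whether a given charset name is accepted by your JVM before putting it into a property, a small check like the following may help. It is a minimal sketch using the standard java.nio.charset API; the class CharsetCheck and the method isUsable are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;

// Hypothetical helper for checking charset names against the running JVM.
public class CharsetCheck {

  // Returns true if the JVM supports the named charset.
  // Syntactically illegal names make Charset.isSupported throw, which is mapped to false here.
  static boolean isUsable(String name) {
    try {
      return Charset.isSupported(name);
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String[] standard = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"};
    for (String name : standard) {
      System.out.println(name + " supported: " + isUsable(name));
    }
    // Lists the canonical names of every charset this JVM provides.
    System.out.println(Charset.availableCharsets().keySet());
  }
}
```

The six standard charsets should always be reported as supported; the full list printed at the end varies between JVM vendors and versions.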