decoder_utf8: do not read escaped newline (#615) #692

Merged
merged 1 commit on Aug 16, 2018

Conversation

@meggarr (Contributor) commented on Jul 25, 2018

In the UTF-8 decoder, it tries to read escaped characters for general purposes. Escaping a newline is valid in JSON encoding; if a UTF-8 decoder is chained before a JSON decoder, the JSON decoder will fail, because the escaped newline has already been turned into a real newline by the first UTF-8 decoder.

This fixes the issue in #615

Signed-off-by: Richard Meng [email protected]

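For illustration, here is a minimal sketch of the failure mode, written in plain Python rather than the decoder's C code and using a made-up payload: once an escaped newline inside a JSON string is expanded into a real newline, the JSON parse that runs afterwards rejects the record.

    import json

    # A JSON payload whose string value holds an escaped newline ("\n" as two
    # characters: backslash + n). This parses fine.
    inner = '{"message":"line one\\nline two"}'
    json.loads(inner)  # ok

    # If an earlier decoding stage turns the escaped newline into a real
    # newline, the string now contains a raw control character, which JSON
    # forbids, so the JSON decoder that runs next fails.
    broken = inner.replace('\\n', '\n')
    try:
        json.loads(broken)
    except json.JSONDecodeError as err:
        print('JSON decode fails:', err)  # Invalid control character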
@meggarr (Contributor, Author) commented on Jul 26, 2018

After the change, my config for K8s is as below,

  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE filter-fields.conf
    @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [FILTER]
        Name         modify
        Match        kube.*
        Rename       log log0

    [FILTER]
        Name         parser
        Match        kube.*
        Key_Name     log0
        Parser       escape_utf8_log
        Reserve_Data True

    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Buffer_Chunk_Size 64KB
        Buffer_Max_Size   512KB
        Mem_Buf_Limit     16MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.kubernetes.rancher.internal:6443
        Annotations         Off
        Merge_Log           On
        K8S-Logging.Parser  On

  filter-fields.conf: |
    [FILTER]
        Name         record_modifier
        Match        kube.*
        Remove_key   time

    [FILTER]
        Name         modify
        Match        kube.*
        Rename       message message0

    [FILTER]
        Name         parser
        Match        kube.*
        Key_Name     message0
        Parser       escape_message
        Reserve_Data True

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${ELASTICSEARCH_HOST}
        Port            ${ELASTICSEARCH_PORT}
        Logstash_Format On
        Logstash_Prefix cloud
        Retry_Limit     False


  parsers.conf: |
    [PARSER]
        Name   escape_message
        Format regex
        Regex  ^(?<message>.*)$
        # Command      | Decoder | Field | Optional Action
        # =============|=================|=================
        Decode_Field_As  escaped   message

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   escape_utf8_log
        Format regex
        Regex  ^(?<log>.*)$
        # Command      | Decoder     | Field | Optional Action
        # =============|=====================|=================
        Decode_Field_As  escaped_utf8  log

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   Off

It will handle the case below,

{"log":"{\"timeMillis\":1532502611649,\"message\":\"Response \u003eGET /abc/xyz, 69ms, \u003e200 OK, \u003e{\\n  \\\"_id\\\" : \\\"50-CCBE59-261712046121\\\",\\n  \\\"manufacturer\\\" : \\\"BigBang\\\",\\n}\\n\",\"threadId\":64,\"threadPriority\":5}\r\n","stream":"stdout","time":"2018-07-25T07:10:11.650537766Z"}

@nikolay commented on Jul 31, 2018

@edsiper Can you please merge this?

@edsiper (Member) commented on Aug 16, 2018

Thanks. Merged into the 0.13 branch (I will port the change to 0.14).

edsiper pushed a commit that referenced this pull request Aug 16, 2018
rawahars pushed a commit to rawahars/fluent-bit that referenced this pull request Oct 24, 2022