
Collector cannot forward to another collector using the otlphttp exporter #4221

Closed
timcosta opened this issue Oct 20, 2021 · 8 comments · Fixed by #4269
Labels
bug Something isn't working

Comments

@timcosta

Describe the bug
The collector is unable to forward traces to another collector using the otlphttp exporter and otlp receiver. The data exits the original collector with trace/span IDs base64 encoded instead of hex encoded, which causes the secondary collector's receiver to respond with HTTP 400 and {"code":3,"message":"invalid length for ID"}.

Related:
#3195
open-telemetry/opentelemetry-go#1990

Steps to reproduce
Just point a collector using the otlphttp exporter at another collector, and all trace requests will fail

What did you expect to see?
I expected trace requests coming out of the otlphttp exporter to have properly formatted trace and span IDs

What did you see instead?
Trace and span IDs were base64 encoded. For example, the otlphttp exporter outputs QwOFPwhvT4yGzxmLZVHfhA== as the trace id when it received 4303853f086f4f8c86cf198b6551df84.
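
For reference, the base64 string and the hex ID are two encodings of the same 16 bytes: the standard proto3 JSON mapping renders bytes fields as base64, while OTLP/JSON represents trace and span IDs as hex strings. A quick check in Python (standard library only):

    import base64

    hex_id = "4303853f086f4f8c86cf198b6551df84"
    b64_id = "QwOFPwhvT4yGzxmLZVHfhA=="

    # Both strings decode to the same 16 raw bytes.
    assert base64.b64decode(b64_id) == bytes.fromhex(hex_id)
    print(base64.b64decode(b64_id).hex())  # -> 4303853f086f4f8c86cf198b6551df84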

What version did you use?
Version: 0.36.0

What config did you use?
Config: Here's the secondary receiver config

    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:${port}

    processors:
      batch:

    exporters:
      logging:
        loglevel: debug
      jaeger:
        endpoint: jaeger-collector.${namespace}.svc.cluster.local:14250
        tls:
          insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging,jaeger]

Environment
OS: otel/opentelemetry-collector-contrib:0.36.0
Compiler (if manually compiled): provided in Docker image

Additional context
This seems to be due to an issue in the Go SDK, but the SIG meeting elected to drop JSON exporting support instead of fixing the issue. Is this fixable in the collector? If not, should the otlphttp exporter be removed from the collector, given that it exports data that isn't spec-compliant?

@timcosta timcosta added the bug Something isn't working label Oct 20, 2021
@bogdandrutu
Copy link
Member

bogdandrutu commented Oct 20, 2021

@timcosta I am confused about which exporter you are talking about. The otlphttp exporter in the collector is not related to the one in the Go SDK, and the otlphttp exporter in the collector does not support JSON.

@timcosta
Author

@bogdandrutu apologies for the confusion - let me clarify.

We are using the otlphttp exporter in the collector. When we receive data at an HTTP endpoint, the traceId and spanId in the data we receive are incorrectly formatted.

Example trace protobuf we received:

CscBCkkKIAoMc2VydmljZS5uYW1lEhAKDnVvcC5zdGFnZS1ldS0xCiUKGW91dHN5c3RlbXMubW9kdWxlLnZlcnNpb24SCAoGOTAzMzg2EnoKEQoMdW9wX2NhbmFyaWVzEgExEmUKEEMDhT8Ib0+Mhs8Zi2VR34QSCOVRPDJ5XEG5IgA5QE41aASRrxZBQE41aASRrxZKEAoKc3Bhbl9pbmRleBICGANKHwoNY29kZS5mdW5jdGlvbhIOCgxteUZ1bmN0aW9uMzZ6AA==

Here's how "decoding it" goes:

>>> from opentelemetry.proto.collector.trace.v1.trace_service_pb2 import ExportTraceServiceRequest
>>> import base64
>>> test_bytes = base64.b64decode('CscBCkkKIAoMc2VydmljZS5uYW1lEhAKDnVvcC5zdGFnZS1ldS0xCiUKGW91dHN5c3RlbXMubW9kdWxlLnZlcnNpb24SCAoGOTAzMzg2EnoKEQoMdW9wX2NhbmFyaWVzEgExEmUKEEMDhT8Ib0+Mhs8Zi2VR34QSCOVRPDJ5XEG5IgA5QE41aASRrxZBQE41aASRrxZKEAoKc3Bhbl9pbmRleBICGANKHwoNY29kZS5mdW5jdGlvbhIOCgxteUZ1bmN0aW9uMzZ6AA==')
>>> test = ExportTraceServiceRequest()
>>> test.ParseFromString(test_bytes)
202
>>> test
resource_spans {
  resource {
    attributes {
      key: "service.name"
      value {
        string_value: "uop.stage-eu-1"
      }
    }
    attributes {
      key: "outsystems.module.version"
      value {
        string_value: "903386"
      }
    }
  }
  instrumentation_library_spans {
    instrumentation_library {
      name: "uop_canaries"
      version: "1"
    }
    spans {
      trace_id: "C\003\205?\010oO\214\206\317\031\213eQ\337\204"
      span_id: "\345Q<2y\\A\271"
      start_time_unix_nano: 1634684637873000000
      end_time_unix_nano: 1634684637873000000
      attributes {
        key: "span_index"
        value {
          int_value: 3
        }
      }
      attributes {
        key: "code.function"
        value {
          string_value: "myFunction36"
        }
      }
      status {
      }
    }
  }
}

As you can see, the trace_id and span_id aren't being parsed properly from the incoming base64-encoded protobuf our server receives.

Here's the original payload we sent to the otel collector:

{
    "resourceSpans": [
        {
            "instrumentationLibrarySpans": [
                {
                    "spans": [
                        {
                            "traceId": "4303853f086f4f8c86cf198b6551df84",
                            "spanId": "e5513c32795c41b9",
                            "endTimeUnixNano": "1634684637873000000",
                            "attributes": [
                                {
                                    "value": {
                                        "intValue": 3
                                    },
                                    "key": "span_index"
                                },
                                {
                                    "value": {
                                        "stringValue": "myFunction36"
                                    },
                                    "key": "code.function"
                                }
                            ],
                            "startTimeUnixNano": "1634684637873000000",
                            "status": {}
                        }
                    ],
                    "instrumentationLibrary": {
                        "name": "uop_canaries",
                        "version": "1"
                    }
                }
            ],
            "resource": {
                "attributes": [
                    {
                        "value": {
                            "stringValue": "uop.stage-eu-1"
                        },
                        "key": "service.name"
                    },
                    {
                        "value": {
                            "stringValue": "903386"
                        },
                        "key": "outsystems.module.version"
                    }
                ]
            }
        }
    ]
}

There's a base64 encoding being applied to the trace_id and span_id somewhere in the collector, either in the otlphttp exporter or elsewhere, that's causing the data to be corrupted.
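
For reproduction, a payload like the one above can be posted directly to a collector's OTLP/HTTP receiver. A minimal sketch, assuming the receiver listens on localhost:4318 and the JSON above is saved as payload.json (both the endpoint and the filename are illustrative, not taken from this thread):

    import json
    import requests  # third-party HTTP client, used here only for illustration

    # Hypothetical endpoint; substitute the host/port from the actual receiver config.
    url = "http://localhost:4318/v1/traces"

    with open("payload.json") as f:
        payload = json.load(f)

    resp = requests.post(url, json=payload, timeout=5)
    print(resp.status_code, resp.text)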

@tigrannajaryan
Member

There's a base64 encoding being applied to the trace_id and span_id somewhere in the collector, either in the otlphttp exporter or elsewhere, that's causing the data to be corrupted.

You can add a logging exporter to that same pipeline to see whether the data is already corrupted before it hits the exporters, or whether the otlphttp exporter corrupts it.

@timcosta
Author

2021-10-25T23:39:26.616Z	INFO	loggingexporter/logging_exporter.go:41	TracesExporter	{"#spans": 1}
2021-10-25T23:39:26.616Z	DEBUG	loggingexporter/logging_exporter.go:51	ResourceSpans #0
Resource labels:
     -> service.name: STRING(uop.stage-eu-1)
     -> outsystems.module.version: STRING(903386)
InstrumentationLibrarySpans #0
InstrumentationLibrary uop_canaries 1
Span #0
    Trace ID       : 4303853f086f4f8c86cf198b6551df84
    Parent ID      :
    ID             : e5513c32795c41b9
    Name           :
    Kind           : SPAN_KIND_UNSPECIFIED
    Start time     : 2021-10-19 23:03:57.873 +0000 UTC
    End time       : 2021-10-19 23:03:57.873 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message :
Attributes:
     -> span_index: INT(3)
     -> code.function: STRING(myFunction36)

So it's corrupted in the otlphttp exporter.

@tigrannajaryan
Member

Please post the otlphttp exporter config. I don't see it in the config included in the issue description.

@tigrannajaryan
Member

Looking at the otlphttp exporter code, I don't see how it could corrupt the trace ID, and I am not sure how to reproduce this.
I would advise double-checking your receiver to make sure the problem is not in its decoding logic.
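
For example, a receiver that accepts OTLP/HTTP protobuf would parse the raw request body with the generated classes and hex-encode the ID bytes itself. A minimal sketch using the same Python bindings as the REPL session above (the function name is illustrative):

    from opentelemetry.proto.collector.trace.v1.trace_service_pb2 import ExportTraceServiceRequest

    def decode_ids(body: bytes):
        """Parse an OTLP/HTTP protobuf body and return (trace_id, span_id) hex pairs."""
        req = ExportTraceServiceRequest()
        req.ParseFromString(body)
        ids = []
        for rs in req.resource_spans:
            for ils in rs.instrumentation_library_spans:
                for span in ils.spans:
                    # The ID fields are raw bytes; hex-encode them for display or JSON output.
                    ids.append((span.trace_id.hex(), span.span_id.hex()))
        return ids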

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this issue Oct 26, 2021
Related to open-telemetry#4221

There was a report that TraceID and SpanID are corrupted by the otlphttp exporter.
I added TraceID and SpanID to the test to try to reproduce the bug. The bug was not
reproduced, but keeping these in the test is useful.
@tigrannajaryan
Member

I verified with a test that uses the otlphttp exporter and otlp receiver: #4268
The trace ID and span ID are correctly exported/received (I verified in the debugger).

bogdandrutu pushed a commit that referenced this issue Oct 26, 2021
Related to #4221

There was a report that TraceID and SpanID are corrupted by the otlphttp exporter.
I added TraceID and SpanID to the test to try to reproduce the bug. The bug was not
reproduced, but keeping these in the test is useful.
@bogdandrutu
Member

I think you got confused by your Python script:

Python 3.9.7 (default, Sep  3 2021, 12:45:31) 
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from builtins import bytes
>>> b = bytes.fromhex("4303853f086f4f8c86cf198b6551df84")
>>> b
b'C\x03\x85?\x08oO\x8c\x86\xcf\x19\x8beQ\xdf\x84'
>>> b.hex()
'4303853f086f4f8c86cf198b6551df84'

As you can see (you can also check in the generated code that traceId is builtins.bytes), the ID is stored as raw bytes, and the default printed forms are just escaped-byte or base64 renderings of those bytes to keep the output printable. The value is actually the same if you print the hex of the traceId instead of the default representation.
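
Continuing the ExportTraceServiceRequest REPL session from earlier, the parsed IDs round-trip to the original hex strings:

>>> test.resource_spans[0].instrumentation_library_spans[0].spans[0].trace_id.hex()
'4303853f086f4f8c86cf198b6551df84'
>>> test.resource_spans[0].instrumentation_library_spans[0].spans[0].span_id.hex()
'e5513c32795c41b9'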
