Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(file sink): supports input based on encoding type #21726

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

nionata
Copy link
Contributor

@nionata nionata commented Nov 7, 2024

Summary

The file sink only supports log event input. This PR changes it to support any input that the configured encoding supports.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Cargo test

I added single and multi-partition tests to the file sink tests module. I added a log_ prefix to all the existing tests to disambiguate between the new metric_ tests. I did not add a compression or reopening test for metrics input as it should be handled by the log tests.

The full test suite passes

nix develop nixpkgs-unstable#vector
cargo test 
....
test result: ok. 1623 passed; 0 failed; 3 ignored; 0 measured; 0 filtered out; finished in 23.77s

     Running tests/e2e/mod.rs (target/debug/deps/e2e-eab25220265d1d1d)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/integration/lib.rs (target/debug/deps/integration-1d73c0d5e2cc11c9)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests vector

running 2 tests
test src/expiring_hash_map.rs - expiring_hash_map::ExpiringHashMap<K,V>::next_expired (line 141) ... ignored
test src/expiring_hash_map.rs - expiring_hash_map::ExpiringHashMap<K,V>::next_expired (line 158) ... ok

test result: ok. 1 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 27.60s

Manual

Before

podman run -p 1337:1337/udp -i -v $(pwd)/vector-influx-socket-file.yaml:/etc/vector/vector.yaml --rm docker.io/timberio/vector:0.42.0-debian
2024-11-04T21:05:14.216066Z  INFO vector::app: Log level is enabled. level="info"
2024-11-04T21:05:14.218669Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.yaml"]
2024-11-04T21:05:14.220239Z ERROR vector::cli: Configuration error. error=Data type mismatch between in (Metric) and out (Log)

After

make environment CONTAINER_TOOL="podman"

mkdir /etc/vector && cat << EOF > /etc/vector/vector.yaml
sources:
  in:
    type: "socket"
    address: "0.0.0.0:1337"
    mode: "udp"
    decoding:
      codec: "influxdb"

sinks:
  out:
    inputs:
      - "in"
    type: "file"
    encoding:
      codec: "native_json"
    path: "/tmp/vector-%Y-%m-%d.log"
EOF

cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.35s
     Running `target/debug/vector`
2024-11-07T19:00:10.170285Z  INFO vector::app: Log level is enabled. level="info"
2024-11-07T19:00:10.172461Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.yaml"]
2024-11-07T19:00:10.182373Z  INFO vector::topology::running: Running healthchecks.
2024-11-07T19:00:10.182732Z  INFO vector::topology::builder: Healthcheck passed.
2024-11-07T19:00:10.182739Z  INFO vector: Vector has started. debug="true" version="0.43.0" arch="x86_64" revision=""
2024-11-07T19:00:10.182800Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
2024-11-07T19:00:10.183073Z  INFO source{component_kind="source" component_id=in component_type=socket}: vector::sources::socket::udp: Listening. address=0.0.0.0:1337

echo -n "measurement,tag1=value1,tag2=value2 field1=123,field2=1.23 1630424257000000000" | nc -4u -w0 0.0.0.0 1337

cat /tmp/vector-2021-08-31.log 
{"metric":{"name":"measurement_field1","tags":{"tag1":"value1","tag2":"value2"},"timestamp":"2021-08-31T15:37:37Z","kind":"absolute","gauge":{"value":123.0}}}
{"metric":{"name":"measurement_field2","tags":{"tag1":"value1","tag2":"value2"},"timestamp":"2021-08-31T15:37:37Z","kind":"absolute","gauge":{"value":1.23}}}

Validation still catches when the encoding type does not support the input type

cat /etc/vector/vector.yaml 
sources:
  in:
    type: "socket"
    address: "0.0.0.0:1337"
    mode: "udp"
    decoding:
      codec: "influxdb"

sinks:
  out:
    inputs:
      - "in"
    type: "file"
    encoding:
      codec: "avro"
      avro:
        schema: ""
    path: "/tmp/vector.log"
cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.35s
     Running `target/debug/vector`
2024-11-21T20:33:28.560862Z  INFO vector::app: Log level is enabled. level="info"
2024-11-21T20:33:28.563455Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.yaml"]
2024-11-21T20:33:28.571274Z ERROR vector::cli: Configuration error. error=Data type mismatch between in (["Metric"]) and out (["Log"])

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

closes #21704

@bits-bot
Copy link

bits-bot commented Nov 7, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Nov 7, 2024
@nionata nionata changed the title enchancement(file sink): supports input based on encoding type feat(file sink): supports input based on encoding type Nov 7, 2024
@pront
Copy link
Contributor

pront commented Nov 14, 2024

Hi @nionata, just checking if you wanted a review on this or it's purely a draft for your own purposes.

@nionata
Copy link
Contributor Author

nionata commented Nov 15, 2024

Hi @nionata, just checking if you wanted a review on this or it's purely a draft for your own purposes.

Hello @pront! I would like to get this reviewed and merged. I have it as draft because I wanted to add a test. Feel free to let me know early if this is something we could get merged or not.

@pront
Copy link
Contributor

pront commented Nov 15, 2024

Hi @nionata, just checking if you wanted a review on this or it's purely a draft for your own purposes.

Hello @pront! I would like to get this reviewed and merged. I have it as draft because I wanted to add a test. Feel free to let me know early if this is something we could get merged or not.

Makes sense, no rush 👍
We will definitely benefit from a test. Ping me when it's ready for review.

@nionata nionata force-pushed the nji/file-metric branch 2 times, most recently from 4dcbb56 to 8248f8c Compare November 19, 2024 20:23
@nionata nionata marked this pull request as ready for review November 19, 2024 20:24
@nionata nionata requested a review from a team as a code owner November 19, 2024 20:24
@@ -286,10 +286,36 @@ pub fn random_metrics_with_stream(
batch: Option<BatchNotifier>,
tags: Option<MetricTags>,
) -> (Vec<Event>, impl Stream<Item = EventArray>) {
Copy link
Contributor Author

@nionata nionata Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should perform the exact same for existing callers as the previously hardcoded timestamp params are being passed into the new function

@nionata
Copy link
Contributor Author

nionata commented Nov 19, 2024

@pront ready for review!

Copy link
Contributor

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thankss @nionata, this looks good! I have only one concern, with this PR we also accept traces. Is it possible to add a basic test for traces as well? Just to make sure we don't falsely advertise trace support. If we find a problem, we can just exclude traces.

}

// The TestSerializer only supports Log and Metric data types. This test asserts the current state of running the vector file sink with an event stream.
// A file partition is created and empty lines are written in the file. Running a real vector instance would fail on the configuration because
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"A file partition is created and empty lines are written in the file" - deeming changing the logic to not write a line out of scope here because the top-level config validation should prevent this from happening.

@nionata
Copy link
Contributor Author

nionata commented Nov 21, 2024

Thankss @nionata, this looks good! I have only one concern, with this PR we also accept traces. Is it possible to add a basic test for traces as well? Just to make sure we don't falsely advertise trace support. If we find a problem, we can just exclude traces.

Agreed. Pushed not supported and supported trace tests.

src/sinks/file/mod.rs Outdated Show resolved Hide resolved
@nionata nionata requested a review from pront November 21, 2024 21:26
@pront pront enabled auto-merge November 22, 2024 15:26
Copy link
Contributor

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nionata, this is ready to go. great improvement and I appreciate that you added all these tests 👍

@pront
Copy link
Contributor

pront commented Nov 22, 2024

Ah, please add a changelog fragment. See readme and examples here: https://github.com/vectordotdev/vector/tree/master/changelog.d

auto-merge was automatically disabled November 22, 2024 17:58

Head branch was pushed to by a user without write access

@nionata
Copy link
Contributor Author

nionata commented Nov 22, 2024

@pront done

$ ./scripts/check_changelog_fragments.sh
validating '21704_file_sink_supports_encoding_input.feature.md'
changelog additions are valid.

@pront pront enabled auto-merge November 22, 2024 18:13
auto-merge was automatically disabled November 22, 2024 18:52

Head branch was pushed to by a user without write access

@nionata
Copy link
Contributor Author

nionata commented Nov 22, 2024

/git/vectordotdev/vector# cargo clippy --all-targets
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.93s

@pront pront enabled auto-merge November 22, 2024 20:11
@nionata nionata requested a review from pront November 22, 2024 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: sinks Anything related to the Vector's sinks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow file sink to accept metric events
3 participants