Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add Column field to Line message #687

Closed
mar-kolya opened this issue Mar 3, 2022 · 10 comments · Fixed by #818
Closed

Proposal: Add Column field to Line message #687

mar-kolya opened this issue Mar 3, 2022 · 10 comments · Fixed by #818
Labels
Priority: p3 Buganizer priority - P3 type: feat Buganizer type - Feature Request

Comments

@mar-kolya
Copy link

mar-kolya commented Mar 3, 2022

It would be really useful to have a column field along with the line number field in the Line message. This is especially useful for runtimes that use source code compacting/obfuscation like Javascript.

In profile.proto this would look like:

index ee0391f..92f9fef 100644
--- a/proto/profile.proto
+++ b/proto/profile.proto
@@ -195,6 +195,8 @@ message Line {
   uint64 function_id = 1;
   // Line number in source code.
   int64 line = 2;
+  // Column number in the source code.
+  int64 column = 3;
 }

 message Function {

Additional changes to the pprof tool:

  • Output column information if it is present when -lines is used.
  • Where pprof currently outputs <path>:<line> make it output <path>:<line>:<column> when column is available.
    • Columns are 1-indexed, so 0 column doesn’t exit.
  • merge.go needs updating to take into account column numbers, also (*Profile).Aggregate(), aggregate() in driver.go. We would need to check all the places where Line.line is used.

Please let me know what you think.

Thanks!

@mar-kolya mar-kolya changed the title Pprof format should support 'column' (in addition to the line number) Proposal: Add Column field to Line message Sep 27, 2022
@felixge
Copy link

felixge commented Sep 27, 2022

@aalexand we're currently thinking about potential improvements to profile.proto at Datadog. Before submitting any bigger proposals, what do you think about this small one that my colleague @mar-kolya made a while ago? He just updated to title/text to make the proposal more clear. If accepted, we'd be happy to send a PR for the profile pkg to support encoding/decoding this new field.

@aalexand
Copy link
Collaborator

aalexand commented Oct 2, 2022

@felixge @mar-kolya No conceptual objections but it would be good to include in the description how this will integrate with pprof as a tool. In particular (not an exhaustive list):

  • How this will affect the granularity option. Today there is -lines choice, will there also be -linecolumns (similar to -filefunctions) or will we simply include the column when the lines granularity is requested?
  • How will each output format support the new field?
  • What places beyond encoding / decoding will be updated. For example, merge.go will need to be updated. What else?

I basically want to make sure that if we add this field then we do our best to identify what full support for the new field looks like in pprof and have this support added as part of the PR that adds the field, including tests.

@mar-kolya
Copy link
Author

@aalexand I've updated the proposal. Please let me know what do you think.
Thanks!

@aalexand
Copy link
Collaborator

I wonder whether it's possible to avoid adding -linescolumns granularity and simply output the column information if it's present. Basically, treat it as "fractional" part of the line number.

Similarly I would prefer to avoid adding has_column_numbers. These has_* fields are typically populated based on the detail level of debug info and "has column number" doesn't quite fit that.

@felixge
Copy link

felixge commented Oct 11, 2022

@aalexand I'll let @mar-kolya confirm, but AFAIK we're totally happy with following your suggestions. Our current workaround for minified JS has been to put both column and line numbers into the line field (using 32 bit for each). This works for our use case, but since we're invested in pprof we'd love to contribute back and make the format better for everybody as well.

@mar-kolya
Copy link
Author

Sounds good to me, updated original proposal.
Thanks!

@aalexand
Copy link
Collaborator

One question I have on this: rendering performance data on a long source line is not very usable so I would expect that at some point of the profile data processing the data needs to be re-mapped to the original source lines via source maps?

@mar-kolya
Copy link
Author

mar-kolya commented Oct 17, 2022

Yes, that indeed happens. But this happens on the pprof data outside of the process that created the pprof file.

That being said: this was just original usecase that has inspired this ticket. I'm sure there are languages where knowing column number is useful because there are multiple statements on the same line - but those languages do not put everything onto one line.

@aalexand
Copy link
Collaborator

I think the Function message would also need to be updated - there is start_line field and if we introduce ability to specify column in the Line message then we should consistently update the "precision" in other places.

@aalexand
Copy link
Collaborator

FTR, #818 is in review for this feature.

snehasish added a commit to snehasish/pprof that referenced this issue Nov 16, 2023
This change adds column numbers to the line message in profile.proto.
This allows users to distinguish between souce code locations on the
same line. Only the llvm addr2line is capable of reading column number
information from dwarf debug information. Other changes include:
* Add "columns" option for output granularity.
* Account for column numbers during profile merge.
* Update the encoder and decoder.
* Update golden test files for legacy profiles.

Fixes google#687
snehasish added a commit to snehasish/pprof that referenced this issue Nov 28, 2023
This change adds column numbers to the line message in profile.proto.
This allows users to distinguish between souce code locations on the
same line. Only the llvm addr2line is capable of reading column number
information from dwarf debug information. Other changes include:
* Add "columns" option for output granularity.
* Account for column numbers during profile merge.
* Update the encoder and decoder.
* Update golden test files for legacy profiles.

Fixes google#687
snehasish added a commit to snehasish/pprof that referenced this issue Nov 29, 2023
This change adds column numbers to the line message in profile.proto.
This allows users to distinguish between souce code locations on the
same line. Only the llvm based addr2line is capable of reading column number
information from dwarf and PE debug information. Other changes include:
* Add "columns" option for output granularity.
* Account for column numbers during profile merge.
* Update the encoder and decoder.
* Update golden test files for legacy profiles.

Fixes google#687
aalexand added a commit that referenced this issue Jan 17, 2024
* Add column numbers to profile.proto.

This change adds column numbers to the line message in profile.proto.
This allows users to distinguish between souce code locations on the
same line. Only the llvm based addr2line is capable of reading column number
information from dwarf and PE debug information. Other changes include:
* Add "columns" option for output granularity.
* Account for column numbers during profile merge.
* Update the encoder and decoder.
* Update golden test files for legacy profiles.

Fixes #687

* Update the expecatations in the test to match testdata.

* Update report generation and driver for column numbers.

* Address comments

---------

Co-authored-by: Alexey Alexandrov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: p3 Buganizer priority - P3 type: feat Buganizer type - Feature Request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants