Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: stamp the pseudo-version in builds generated by go build #50603

Open
4ad opened this issue Jan 14, 2022 · 91 comments
Open

cmd/go: stamp the pseudo-version in builds generated by go build #50603

4ad opened this issue Jan 14, 2022 · 91 comments

Comments

@4ad
Copy link
Member

4ad commented Jan 14, 2022

NOTE: The accepted proposal is #50603 (comment).


cmd/go embeds dependency version information in binaries, which is very useful. From Go 1.18 onwards, cmd/go also embeds VCS information in binaries, which makes it even more useful than it was before.

As #37475 mentions, people place version information in binaries using -ldflags='-X foo=bar', which requires an additional build wrapper. The new VCS stamping feature of cmd/go should alleviate the need for external wrapper, but I am afraid it comes short.

The version information, in the sense of Go's pseudo version is not recorded for the main module when doing go build:

: emerald:ver; go build
: emerald:ver; go version -m hello | grep 'mod.*hello'
	mod	mgk.ro/hello	(devel)	
: emerald:ver; 

The version is recorded as expected when doing go install:

: emerald:ver; go install robpike.io/ivy@latest
go: downloading robpike.io/ivy v0.1.124
: emerald:ver; go version -m `which ivy` | grep 'mod.*ivy'
	mod	robpike.io/ivy	v0.1.12	h1:qI7dnEiXhorB+za07W6qX3sG+IvBK4EUl38vUHAf53Q=
: emerald:ver; 
: emerald:ver; 

I am afraid this limitation of cmd/go will continue to force people to use external build wrappers that set -ldflags, which is rather unfortunate.

I am not the first to want main module version information in binaries, this has been already asked for in various issues, for example in #29814, which was closed as a duplicate of #37475, but it really wasn't a duplicate, as #37475 is about VCS information, and #29814 is about semantic versioning. Other examples of people asking for this feature are mvdan/sh#519 and #29228 (comment) where various workarounds were proposed.

Speaking of workarounds, the only workaround that I know that currently works would be to create a local module proxy and pass GOPROXY to go install, but that is an extremely high-overhead workaround, and go install is not a replacement for go build anyway, since go install comes with some rather severe limitations regarding how vendoring works and what you can put in go.mod, and go install doesn't support controlling GOBIN when cross-compiling.

I realize that Git tags are a local concept, and by doing the "wrong" git operations one could come up with a different pseudo-version for the same source code. I am afraid I don't have any solution or suggestion regarding this git misfeature, except to note that even in this case the hash information is recorded correctly, and in every case by the virtue of having access to the local source code the programmer can always do some local operation that has the potential to cause a version mislabeling. Git is just more prome to do this by accident, but the ability is there, always.

I don't have any stats to back this up, but from my experience most corporate source code is built by go build, not go install, and it would be great if somehow Go's notion of versioning would be stamped by go build.

CC @bcmills @mvdan @rsc

@4ad 4ad added Proposal NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. modules labels Jan 14, 2022
@mvdan
Copy link
Member

mvdan commented Jan 14, 2022

At least speaking personally, for cases like mvdan/sh#519, my intent is to show something like devel ${GIT_SHA} when someone does a local Go build out of a git checkout. If someone is manually cloning and building, as opposed to the advertised and easier go install url@latest, I imagine they know what a git hash is. So what 1.18 is currently shipping with is enough for my needs.

It's true that something like a proper module version might be more useful; a git commit hash doesn't give any hint as to how old a version is, whereas a semver version prefix or a timestamp can give a starting point. So, in principle, I agree with you: 1.18 is a big step forward, but it's still unfortunate that the main module version remains as (devel) for local builds.

However, in practice, I still agree with Jay's comment in #29228 (comment); we shouldn't make such a "locally inferred version" look like a normal version, because it's reasonably likely to be wrong or cause confusion with users.

in every case by the virtue of having access to the local source code the programmer can always do some local operation that has the potential to cause a version mislabeling.

Could you give some examples? I can only think of very unlikely scenarios, such as manually corrupting the module download cache after downloading some dependencies. That cache is read-only by default, and go mod verify exists to double-check the contents too.

With the main module in a git checkout, I can think of multiple scenarios which seem more likely:

  • What if I've made a commit or tag but not pushed it?
  • What if I've edited some files and not committed them?
  • What if two people on two computers make the same tag with different code - would they end up with the same exact module version for different software? If one of them pushes their tag to the internet, would the other's computer be affected by the mismatch?

I think that, if we are to implement something like this, the versions must be somehow different from the canonical and unique versions that get computed from fully published commits and tags. This would make it very clear that the versions are inferred from local state, and not guaranteed to be correct. As a simplistic example, imagine that tagging v1.2.3 locally results in a build whose main module version is devel v1.2.3, but when pushed and go installed, gets the version v1.2.3.

@mvdan
Copy link
Member

mvdan commented Jan 14, 2022

we shouldn't make such a "locally inferred version" look like a normal version, because it's reasonably likely to be wrong or cause confusion with users.

To add a more concrete example: if we made the change proposed here, and locally inferred versions looked like fully published versions, I would have a harder time trusting the output of shfmt -version when my users report bugs. I would have to update the issue template to also ask: did you build from a modified git checkout?

@4ad
Copy link
Member Author

4ad commented Jan 14, 2022

Could you give some examples? I can only think of very unlikely scenarios, such as manually corrupting the module download cache after downloading some dependencies. That cache is read-only by default, and go mod verify exists to double-check the contents too.

I was thinking of the case where since Go itself doesn't expose its own concept of a version to the program, the users themselves are forced to create their own concepts of a version, either through things like VERSION files, or through some build wrappers. By definition, any such concept is under user's control, and the user can and will make mistakes. In fact, from experience, users try to naively use git tags for this which then fail for precisely the reasons you just explained.

Let me rephrase my point. Go can't enforce any useful properties for the user's notion of a version because it doesn't know about it, and as such if we make userVersion==moduleVersion, the fact that Go can't enforce any properties is neither better nor worse for the user. The user is on the hook for doing the right thing in both cases. In one case the user must properly maintain their VERSION, and in the other case the user must properly maintain their git checkouts.

The user does gain something in the latter case though. They don't have to create build wrappers.

With the main module in a git checkout, I can think of multiple scenarios [which might fail ... ] I think that, if we are to implement something like this, the versions must be somehow different from the canonical and unique versions that get computed from fully published commits and tags. [...] As a simplistic example, imagine that tagging v1.2.3 locally results in a build whose main module version is devel v1.2.3, but when pushed and go installed, gets the version v1.2.3.

I very much agree with this, with one caveat. If the locally checked-out version is identical to a published release, I would expect the version to match the release. If the locally checked-out version can not be guaranteed to match any release, then yes, it should be published with something like devel v1.2.3 (which matches what Go does, but why not v1.2.3-devel or v1.2.3-unknown, which is semver-compatible?).

Unfortunately, I can't imagine how this would work without internet access, and quite often a prerequisite of automated systems running go build is to not go to the Internet.

@4ad
Copy link
Member Author

4ad commented Jan 14, 2022

Hold on, another thought. If we always add the commit hash, and some other metadata to the main module version for local builds, essentially always making them a fully qualified Go pseudo-version, then they will always be different from the published version, so there's no potential for confusion there.

Even better, in semver terms these builds will sort before the published version, which is probably what people want.

For this, what I said earlier about

If the locally checked-out version is identical to a published release, I would expect the version to match the release.

can no longer be true, but perhaps that is ok as long as we come up with a documented and stable convention that describes versioning for local builds (as opposed to just dumping a "devel" in the metadata field).

@bcmills
Copy link
Contributor

bcmills commented Jan 14, 2022

The main caveat here, I think, is unpublished tags. If I create a local, unpublished tag for, say, v1.1000.0, then my pseudo-versions will be v1.1000.0-0.2022…, but everyone else's pseudo-versions may be on an arbitrarily lower version (say, v0.8.3-0.2022….

That may or may not be a significant issue, though: if we always use a pseudo-version, we'll at least have the commit hash as a common point of reference even if the base versions differ.

@mvdan
Copy link
Member

mvdan commented Jan 14, 2022

@4ad right, a local build can't always know what is or isn't published, as requiring a network roundtrip takes us back to square one.

Your idea of trying to stick to semver, and always using some form of pseudo-version which includes a hash, sounds good. With one caveat, though: the commit hash isn't enough to make the version unambiguous, because I can have infinite kinds of uncommitted changes that do not change the HEAD commit hash.

@bcmills good point about tags still messing with pseudo-versions, but at least if we always include a timestamp and some form of unique hash, then I think we're good. With the caveat above about uncommitted changes :)

@mvdan
Copy link
Member

mvdan commented Jan 14, 2022

We do have another hash available to us, though, which changes whenever any input Go code changes: the build IDs used for the build cache. I seem to recall that one such ID is embedded into binaries, too.

Not ideal, as such a hash also includes build parameters like GOOS or -tags, which don't normally affect versions. But at least it fixed the problem with uncommitted files in VCS.

@4ad
Copy link
Member Author

4ad commented Jan 14, 2022

Yes, uncommitted changes should be explicit in the pseudo-version, but I think we can suffix +, just as we do with Go itself, no?

@seankhliao
Copy link
Member

the new buildinfo already records whether the workspace is clean with vcs.modified=true|false

@seankhliao
Copy link
Member

we could use one of +local (for clean builds) or +dirty (uncommitted changes, implies local) as the semver build id, attached to a pseudoversion which should make the situation clear enough?

So main will always have a version like

vX.Y.Z-timestamp-commit+local
vX.Y.Z-timestamp-commit+dirty

@hyangah
Copy link
Contributor

hyangah commented Jan 14, 2022

What is the main motivation of encoding the local version in pseudo-version style rather than keeping those extra info (timestamp?) as extra metadata fields - if it's not guaranteed that they are always available in the origin or proxies?

It seems like the vcs.time is already there in the build info metadata.


BTW, I feel like the main module's version isn't sufficient to describe a tool's behavior in certain cases - go version used to go build, third-party tools dependencies, and go build's behavior change (go.work left over somewhere accidentally?) can affect a tool's behavior. So when triaging issues, I hope we develop best practice using go version -m or richer build info dump rather than relying on the main module version string.

@4ad
Copy link
Member Author

4ad commented Jan 15, 2022

What is the main motivation of encoding the local version in pseudo-version style

The main motivation is that go install does it, and many people expect to have a notion of a program's version available and want it, and because they don't have it with go build, they rely on build wrapers or other workarounds, which are undesirable in the broader Go ecosystem.

encoding the local version in pseudo-version style rather than keeping those extra info (timestamp?) as extra metadata fields

Emphasis mine.

It's not rather than, It doesn't replace the existing metadata fields. If you want to read the metadata, you should read it from those fields instead of parsing the pseudo-version. However, that metadata is useful in disambiguating builds produced by go build from published releases. Presumably we could come with some other kind of metadata for the same purpose, but since pseudo-versions are a de-facto standard in the Go ecosystem, why not reuse it?

I feel like the main module's version isn't sufficient to describe a tool's behavior in certain cases - go version used to go build, third-party tools dependencies, and go build's behavior change (go.work left over somewhere accidentally?) can affect a tool's behavior.

This sounds like an argument to always use the build ID as the version suffix instead of the VCS hash.

So when triaging issues, I hope we develop best practice using go version -m or richer build info dump rather than relying on the main module version string.

I hope so too, but again, I think that discussion is out of scope for this thread, which is more about bringing go build in line with go install and providing a solution for users that avoids build wrappers.

@4ad 4ad changed the title cmd/go: go build stamps version (devel) in binaries, which is not very useful proposal: stamp the pseudo-version in builds generated by go build Jan 17, 2022
@gopherbot gopherbot added this to the Proposal milestone Jan 17, 2022
@gopherbot gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 17, 2022
@rsc
Copy link
Contributor

rsc commented Jan 19, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@maja42
Copy link

maja42 commented Jan 26, 2022

As a kind of experience report, I'm using custom build-scripts for years, solely to embed version information of the main module into the executable. When building, I attach the following information:

type BuildInfo struct {
	// Major, Minor, Patch contain the semantic version components.
	// In case of an unofficial build, this version is from a previous commit on the same branch.
	Major, Minor, Patch int
	// If true, a version-tagged commit was built.
	OfficialBuild bool

	// CommitHash is the full hash of the git commit.
	// Note that the actually built source-code is different if LocalChanges is true.
	CommitHash string
	// CommitTimestamp is the timestamp of the git commit.
	// Note that the actually built source-code is different if LocalChanges is true.
	CommitTimestamp time.Time
	// BuildTimestamp contains the timestamp when the project was built.
	BuildTimestamp time.Time

	// LocalChanges specifies if the project directory differed from the commit due to local (uncommitted) changes.
	LocalChanges bool
}

With this kind of information, I'm able to build version-strings however I like. Usually I try to match go's pseudo-version strings, but when I need to stay compatible with the version-scheme from other, non-go projects, I can do so as well.

// VersionString returns the project's semantic version number without a leading 'v'.
// Patterns:
//    <major>.<minor>.<patch>
//        Built from an officially tagged commit (without local changes)
//    0.0.0-<buildTime>
//        Unofficial build. There are no tagged commits yet.
//    <major>.<minor>.<patch*>-<buildTime>
//        Unofficial build. The commit was not tagged, or there were local changes.
//        The used patch-version is +1 compared to the last tagged commit.
//    The build timestamp (yyyymmddhhmmss) is the UTC time when the application was built / ctgover generated the version number.
func (b *BuildInfo) VersionString() string {
	preRelease := ""
        patch := p.Patch
	if !b.OfficialBuild {
		buildTime := b.BuildTimestamp.Format("20060102150405")
		preRelease = "-" + buildTime
                patch++
	}
	return fmt.Sprintf("%d.%d.%d%s", b.Major, b.Minor, patch, preRelease)
}	

The version is printed when the application is started with a -version command-line flag. But I guess go-tools could/should show the same version string.

I don't have anything against adding a build ID, but it wouldn't really solve my problem. My use cases are:

  • Identify the source-code / git-commit that was built
  • Identify the application's version (for example to detect available updates)
  • Identify when the application was built (quickly find out if it super old)
  • Compare if two applications are the same, and which one is newer

The previously raised issue that git-tags might only be local was never really an issue in my experience. Version-tags aren't made lightheartedly and are always immediately pushed to the server in the projects I work on.

I really hope we can this kind of information into a go-executable one day, because I would finally be able to get rid of all my build- and tool-scripts.

@rsc
Copy link
Contributor

rsc commented Jan 26, 2022

It sounds like @mvdan and @bcmills have some hesitation around the fact that these pseudo-versions would not correspond to any publicly available version, even though they look like those. That does seem like a reason not to do this.

We now have Git VCS info separately in the builds (as of Go 1.18; try go version -m). Do we need to add a second way to record that information?

@rsc rsc changed the title proposal: stamp the pseudo-version in builds generated by go build proposal: cmd/go: stamp the pseudo-version in builds generated by go build Jan 26, 2022
@4ad
Copy link
Member Author

4ad commented Jan 31, 2022

It sounds like @mvdan and @bcmills have some hesitation around the fact that these pseudo-versions would not correspond to any publicly available version, even though they look like those.

We can make it unambiguously distinct, for example instead of v1.2.4-0.20191109021931-daa7c04131f5 we could use v1.2.4-0.unpublished.20191109021931-daa7c04131f5 or something like that.

We now have Git VCS info separately in the builds (as of Go 1.18; try go version -m). Do we need to add a second way to record that information?

No, we certainly only need one way to encode VCS info. The suggestion to put VCS info in the metadata field of the pseudo-version was to match the go install behavior, but putting something else there, for example the build id is probably better. The build id also works, and is meaningful when you might not have VCS info, like from a tarball, which is a pretty common case where you'd use go build for. Scratch that idea, without VCS we can't detect the version either.

@ChrisHines
Copy link
Contributor

We can make it unambiguously distinct, for example instead of v1.2.4-0.20191109021931-daa7c04131f5 we could use v1.2.4-0.unpublished.20191109021931-daa7c04131f5 or something like that.

With replace statements in go.mod and the new workspace mode in Go 1.18 it is possible to build Go programs that include local versions of modules besides the main module. For those modules the go tool also lists (devel) for their version.

The new build vcs metadata only helps identify the main module. Adding vcs metadata for all local modules seems valuable and not currently supported. The format suggested above by @4ad would be more informative than (devel). Maybe also including a dirty flag if there are uncommitted local edits.

@rsc
Copy link
Contributor

rsc commented Feb 2, 2022

The new build vcs metadata only helps identify the main module. Adding vcs metadata for all local modules seems valuable and not currently supported.

This may be true, but the concern above seems to be adding vcs metadata that looks like a pseudo-version. It need not, and it probably should not. We can always add that separately; maybe that should be a different proposal. (I think this is the first comment to bring up VCS info for replaced modules that point to other local repos.)

@mvdan
Copy link
Member

mvdan commented Feb 4, 2022

A thought: if we're only concerned about having a reliable way to always get some useful version for the main module, I think it could be an API of its own, like debug.MainVersion() string. It could first try to get the main module version from https://pkg.go.dev/runtime/debug#ReadBuildInfo, otherwise fall back to VCS information, and otherwise fall back to something that should always work, such as the binary's build ID.

I personally will be implementing logic like that to replace -ldflags=-X=main.version=... in my projects, where I use a default of var version = "(devel)". And I think it should be useful to other projects as well, at least as a good starting point.

Another option, if we want this to also work for library modules, would be debug.OwnVersion, which would do the equivalent but for the module containing the package that's making the function call. Perhaps that would cover @ChrisHines's needs. I maintain some libraries and I admit that reliably knowing my own version could help in terms of logging or capturing debug information.

If the above sounds interesting, I'll happily develop the idea further and create a new proposal. I realise it's not the same as this proposal, but I also think it could be a different solution to the same end-user problem :)

@kortschak
Copy link
Contributor

@mvdan I would very much like to see something like that to replace the boiler place build lines that exist in code at work.

@4ad
Copy link
Member Author

4ad commented Feb 4, 2022

We use the semantic version of the binaries in order to compute API compatibility between different binaries. I am afraid that if your debug.MainVersion() doesn't always return some string that is compatible with semver, we will still have to resort to -ldflags=-X=....

Now, one might object to using the binary version in this way and perhaps recommend using a separately maintained API version instead that is separate from the binary version. I would tend to agree except this is outside my control. I do not have the operational liberty to change this.

@DavidGamba
Copy link

Are uncommitted changes only considered for the directory where the go.mod lives? I work in monorepo environments and rarely ever achieve a full clean repo.

@samthanawalla
Copy link
Contributor

@DavidGamba
Are uncommitted changes only considered for the directory where the go.mod lives?

We were planning to rely on what the VCS state is, I.e. git status.
I don't think it makes sense to only consider the directory where go.mod lives because it introduces additional states to consider which I think may make the version info more confusing. But I could be wrong.

However as a compromise, would v[tag]+dirty work for your use case?

If your current commit matches a tagged version but you have uncommitted changes, you would get v[tag]+dirty instead of [pseudoversion]+dirty.

@DavidGamba
Copy link

I would prefer to be able to push a binary as is from my machine if there are no changes in the the given go module. Other than a go.work file and possible env vars there shouldn't be any dirty files affecting the go binary itself so marking the binary as dirty when there are no changes in the module itself seems overly cautious.

it introduces additional states to consider

I am not sure I know enough about Go to know anything else other than the go.work and env vars that introduce additional states. I would love to learn a bit more about what else affects a go module.

At the end of the day, having the version stamped will be a win even if I can only get the non-dirty version out of CI or a clean clone.

@samthanawalla
Copy link
Contributor

samthanawalla commented Jun 27, 2024

By additional states I meant to say version control states.

I do get your point but a tag is more a property of the repo and not just the main module. Changes to the repo as a whole will reflect accordingly in the stamped version.

While it may be overly cautious, I don't think we will support this use case as of now. But that could change in the future if necessary.

Updated #50603 comment to include v[tag]+dirty use case.

@matloob
Copy link
Contributor

matloob commented Jun 27, 2024

@DavidGamba I would be interested in understanding your use case better. Our expectation is that those planning to do a build of a go program at a given version would check out the appropriate version and do a clean build. Our plan is to reuse the vcs.modified field in the buildinfo in determining whether a build is clean. That would help avoid doing extra work for what we believe is an uncommon case. But if it isn't an uncommon case we'd like to know.

@zpavlinovic
Copy link
Contributor

vcs.modified seems to be set here. My understanding of the code is that this value is false by default unless one of known versioning systems is used. For those systems, status command is used to get the value for modified using the current working directory.

I would prefer to be able to push a binary as is from my machine if there are no changes in the the given go module. Other than a go.work file and possible env vars there shouldn't be any dirty files affecting the go binary itself so marking the binary as dirty when there are no changes in the module itself seems overly cautious.

The more and more I think about this, the more I get the feeling that having a completely precise solution is either extremely hard or impossible.

If a module has a replace directive pointing to a local directory outside of the module, then changing that replacing directory content could result in a different binary. To make things more complicated, this replacing directory might be outside of a repo where the module is.

It seems to me that the current approach is making a decent compromise. It will cover the more common case where the replacing module is in the same repo. Of course, it might add +dirty if part of a repo completely unrelated to module is being changed.

Regarding monorepo, it seems that the current approach will always have vcs.modified set to true. An example is svn (it does not have a Status method so computing modified will be skipped). It would be good to verify that.

@DavidGamba
Copy link

My use case is not a major use case since official builds will come from CI. It is just in those cases where you want to push something out that doesn't have a pipeline yet. Cloning a clean repo for those is not a major inconvenience and worst case the binary will just have the +dirty label.

@gopherbot
Copy link
Contributor

gopherbot commented Jul 15, 2024

Change https://go.dev/cl/596035 mentions this issue: cmd/go: stamp the version for binaries built with go build.

ulope added a commit to shutter-network/rolling-shutter that referenced this issue Jul 16, 2024
Due to golang being annoying the VCS version isn't recorded in
debug.ReadBuildInfo when using `go build`.

If you have too much time, see:
- golang/go#29228
- golang/go#50603
@thepudds
Copy link
Contributor

thepudds commented Aug 5, 2024

Would someone mind summarizing exactly where this new version info will show up in runtime/debug.BuildInfo?

Is it runtime/debug.BuildInfo.Main.Version?

(Sorry if this is spelled out above – I did some re-skimming and some Ctrl-F, and I saw a couple of suggestions, I’m not sure I saw a definitive statement).

@samthanawalla
Copy link
Contributor

samthanawalla commented Aug 5, 2024

@thepudds
Is it runtime/debug.BuildInfo.Main.Version?

Yes! No worries, there's a lot of comments.

Updated #50603 comment to reflect this.

gopherbot pushed a commit that referenced this issue Aug 12, 2024
This CL will set the binary version using local tag information if
present.

For #50603

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-windows-amd64-longtest
Change-Id: I58bed345c7eea20e51a7b24ff6e943d9d1ed240d
Reviewed-on: https://go-review.googlesource.com/c/go/+/596035
Reviewed-by: Michael Matloob <[email protected]>
Reviewed-by: Karam Moore <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
@samthanawalla
Copy link
Contributor

samthanawalla commented Aug 13, 2024

https://go.dev/cl/596035 is merged which adds version stamping for git only. This will come out in 1.24.

I have not looked into the other VCS's hg(Mercurial), svn(Subversion), bzr(GNU Bazaar), or fossil and unsure whether they will be added by 1.24.

I will leave this issue open for now.

@mvdan
Copy link
Member

mvdan commented Aug 14, 2024

Thanks @samthanawalla, I just gave it a try and it seems to be working as advertised!

I'm still slightly surprised at the appearance of +dirty when we already track this information via vcs.modified. Why duplicate the information when it is already there? It doesn't give me 100% assurance that the lack of +dirty means the version is to be trusted anyway, as a local git tag can be modified, as discussed before. Personally I would omit this feature - it wasn't in the original design, and I still don't understand why it has been added.

I also notice that the version stamping does not happen with -buildvcs=false. That's probably reasonable, but we should document this properly. I had asked this question some time back in #50603 (comment) but the proposed design wasn't clear before we had an implementation.

@samthanawalla
Copy link
Contributor

@mvdan
Awesome, glad to hear!

While vcs.modified technically captures this information, the decision to surface it within the version is driven by making it immediately visible to those who primarily focus on the version. It's not necessarily motivated by trust but rather transparency and better 'book keeping'. It more directly conveys the relationship between tag information and how VCS changes can influence the build.

Does +dirty create any problems for tooling? My understanding is that it can easily be chopped off for those who don't need it but is useful for those who may intentionally or unintentionally ignore vcs.modified and only look at the version.

If there are specific use cases where +dirty causes issues for tooling then we can still change that as there is plenty of time till the 1.24 release. However we believe the enhanced visibility it provides into the build's status outweighs any potential drawbacks.

Of course, we're open to hearing further feedback and specific examples :)

I also notice that the version stamping does not happen with -buildvcs=false

Yes correct. Sounds good I will be sure to document that aspect as well.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/605615 mentions this issue: doc/next: document version stamping for go builds within a Git Repo

@mvdan
Copy link
Member

mvdan commented Aug 19, 2024

I don't feel strongly about the presence of a +dirty suffix, nor do I have a specific use case that gets broken by it. Like you say, it can be chopped off fairly easily. Although I could say almost the same about adding it myself from vcs.modified - this is more a matter of what the default behavior should be. I don't think choosing to be explicit and verbose by default is wrong per se.

However, I do find it an odd default because I often end up with +dirty builds when the binary is identical. For example, I regularly capture cpu or memory pprof files, and I sometimes leave them around for a while as they are harmless. Because I don't explicitly gitignore them, they make my local binaries +dirty, even though they don't participate in the build process at all.

Another similar scenario that I'm running into is building binaries for many platforms at once. I tend to build the binaries in the local directory, inside the module, to then upload them as release archives. And once again they are not gitignored - which is fine, as I delete them soon after, and they don't participate in the build. But once again, +dirty shows up for any build after the first one:

$ GOOS=linux GOARCH=amd64 go build -o binary-linux-amd64 ./cmd/cue
$ GOOS=linux GOARCH=386 go build -o binary-linux-386 ./cmd/cue
$ ./binary-linux-amd64 version
cue version v0.11.0-0.dev.0.20240819151828-81d6f8bfcd61

go version devel go1.24-527610763b 2024-08-15 23:43:00 +0000
      -buildmode exe
       -compiler gc
  DefaultGODEBUG asynctimerchan=1,gotypesalias=0,httpservecontentkeepheaders=1,tls3des=1,tlskyber=0,x509keypairleaf=0,x509negativeserial=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v3
             vcs git
    vcs.revision 81d6f8bfcd61eeff2ec60ac71c8cc9867e6853e9
        vcs.time 2024-08-19T15:18:28Z
    vcs.modified false
cue.lang.version v0.11.0
$ ./binary-linux-386 version
cue version v0.11.0-0.dev.0.20240819151828-81d6f8bfcd61+dirty

go version devel go1.24-527610763b 2024-08-15 23:43:00 +0000
      -buildmode exe
       -compiler gc
  DefaultGODEBUG asynctimerchan=1,gotypesalias=0,httpservecontentkeepheaders=1,tls3des=1,tlskyber=0,x509keypairleaf=0,x509negativeserial=1
     CGO_ENABLED 0
          GOARCH 386
            GOOS linux
           GO386 sse2
             vcs git
    vcs.revision 81d6f8bfcd61eeff2ec60ac71c8cc9867e6853e9
        vcs.time 2024-08-19T15:18:28Z
    vcs.modified true
cue.lang.version v0.11.0

Arguably this is all an issue affecting vcs.modified as well, which can be seen by cue version above printing the BuildInfo setting lines as well. I would agree with that - if any files make VCS "dirty", but go list can tell that they do not participate in the build (not known source files, not used for go:embed, etc), then they shouldn't make built binaries "dirty".

Even though this was already an issue for vcs.modified, I honestly didn't really care too much before this point. Now that it's causing me +dirty suffixes in locally built module versions more often than not, it is causing me some issues. So I'd be fine with keeping +dirty suffixes as long as them and vcs.modified did not care about the dirtiness of files which do not participate in the Go build.

Speaking of... what if I had a new uncommitted Go file which did participate in the Go build, but was gitignored, so it wouldn't show up in git status? While this is technically not "VCS dirty", I would argue it should still mark the binary version and vcs.modified as dirty, because the Go tool can tell that it built a source file which is not tracked by VCS at all. Does this happen today?

@hyangah
Copy link
Contributor

hyangah commented Aug 21, 2024

@samthanawalla Thanks for this feature! This will help simplify the future release workflow of vscode go release greatly.

One more tuning request: I noticed that some projects use the tools.go hack to pick govulncheck or gopls versions. (note: imo, listing tools like gopls to tools.go is always a mistake, but listing tools like govulncheck in tools.go is understandable)

However, tools built this way can be very different from the tools built with go install <tool>@<version> or from a clean checkout of the tool's source code.

For example,

$ cat go.mod
module xxx

go 1.21.0

require (
	golang.org/x/tools/gopls v0.16.2-pre.1
	golang.org/x/vuln v1.1.0
)
...
$  gotip build golang.org/x/tools/gopls

This gopls binary is very different from [email protected].

$ gotip install golang.org/x/tools/[email protected]
$ gotip version -m ~/go/bin/gopls > install.txt
$ gotip version -m ./gopls > unclean.txt
$ diff install.txt unclean.txt
1c1
< /Users/hakim/go/bin/gopls: devel go1.24-f38d42f2c4 Wed Aug 21 01:11:27 2024 +0000
---
> ./gopls: devel go1.24-f38d42f2c4 Wed Aug 21 01:11:27 2024 +0000
7,8c7,8
< 	dep	golang.org/x/mod	v0.19.0	h1:fEdghXQSo20giMthA7cd28ZC+jts4amQ3YMXiP5oMQ8=
< 	dep	golang.org/x/sync	v0.7.0	h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M=
---
> 	dep	golang.org/x/mod	v0.20.0	h1:utOm6MM3R3dnawAiJgn0y+xvuYRsm1RKM/4giyfDgV0=
> 	dep	golang.org/x/sync	v0.8.0	h1:3NFvSEYkUoMifnESzZl15y791HH1qU2xm6eCJU5ZPXQ=
12c12
< 	dep	golang.org/x/vuln	v1.0.4	h1:SP0mPeg2PmGCu03V+61EcQiOjmpri2XijexKdzv8Z1I=
---
> 	dep	golang.org/x/vuln	v1.1.0	h1:ECEdI+aEtjpF90eqEcDL5Q11DWSZAw5PJQWlp0+gWqc=
18c18
< 	build	DefaultGODEBUG=asynctimerchan=1,gotypesalias=0,httplaxcontentlength=1,httpmuxgo121=1,httpservecontentkeepheaders=1,panicnil=1,randseednop=0,tls10server=1,tls3des=1,tlskyber=0,tlsrsakex=1,tlsunsafeekm=1,winreadlinkvolume=0,winsymlink=0,x509keypairleaf=0,x509negativeserial=1
---
> 	build	DefaultGODEBUG=asynctimerchan=1,gotypesalias=0,httplaxcontentlength=1,httpmuxgo121=1,httpservecontentkeepheaders=1,randseednop=0,tls10server=1,tls3des=1,tlskyber=0,tlsrsakex=1,tlsunsafeekm=1,winreadlinkvolume=0,winsymlink=0,x509keypairleaf=0,x509negativeserial=1

Once the tool dependency proposal #48429 is implemented, some users may attempt to pick a specific version of govulncheck (or gopls) and such unexpected and untested versions of tools may appear more often. Currently golang.org/x/telemetry uses the main module's version string as the tool's version. The crash reports and counters from such untested versions are not distinguishable, and can add much noise.

Is it possible to use a different version string (+dirty) or add an extra build setting info if a tool is built this way? That will help us filter out telemetry from unexpected builds of tools.

@matloob
Copy link
Contributor

matloob commented Aug 21, 2024

@hyangah I had a similar (though slightly different concern) that the build with go build would get a different build list from the one using go install. I think once a build is being done outside of the go install pkg@version context, the version of the installed binary's module tells us a lot less about the dependencies used to build the binary.

What do you think about adding a buildinfo field for the module path of the main module (or whether the build was done from a workspace, or from a pkg@version context) and telemetry could treat the binary different depending on which it is?

@gopherbot

This comment was marked as off-topic.

@samthanawalla
Copy link
Contributor

samthanawalla commented Aug 26, 2024

@mvdan I see your point. Multiple sequential builds will create a +dirty tag because of the binary that gets created which does not seem ideal. I will see about making an exception for that.

As far as the dirtiness of files which do not participate in the Go build affecting +dirty, we discussed that above: #50603 (comment)

We think the most common case is a clean build from CI. Otherwise can clone & build from a clean repo.

Yes an uncommitted file that is gitignored will produce a build without a +dirty tag. This is a drawback of the current implementation. However if this is an uncommon situation, then perhaps we may need to compromise somewhere.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/609155 mentions this issue: cmd/go: add Mercurial based version stamping for binaries

@mvdan
Copy link
Member

mvdan commented Aug 28, 2024

@samthanawalla @matloob I feel like whether the builds happen locally or in CI is a red herring. Whether I do the builds locally or in CI, placing the binaries inside the same cloned repository seems perfectly fine to me. They don't affect the build in any way, nor do they touch any committed files.

git describe --always --dirty agrees with me here, for what it's worth; it only adds a -dirty suffix if I modify or delete any tracked files, or if I git add any new files. Any new untracked files, even if they are not gitignored, do not cause a -dirty suffix. So our notion of "dirty" seems to misalign with git's, and git is easily the most common VCS system people are using.

And as explained before, I think it's a mistake for go build to not mark a binary as dirty if a gitignored Go file is present in the build. I just tested this, and the resulting build was not marked as "dirty", even though it definitely built different source code than what is committed.

I realise my arguments are against vcs.modified; your addition of +dirty simply follows vcs.modified. I think both follow a flawed notion of "dirtyness" that is not useful to Go developers and can be confusing to Go users. I think we should adjust it so that:

  1. A build is dirty when there are new and VCS-ignored files which contribute to the build (e.g: Go files part of a built Go package, or any embedded files, etc).
  2. A build is not dirty when there are new and non-VCS-ignored files which do not contribute to the build (e.g: built Go binaries, or pprof files, etc).

That is, Go builds should only be marked as "dirty" if we can tell that they used a different set of source files compared to what is committed. That is honestly what I thought vcs.modified already was, and I think it would be a much better definition than what we currently have. As currently implemented, I'm afraid that I'll have to strip or ignore vcs.modified build settings and +dirty version suffixes, because they'll often confuse users for no benefit.

gopherbot pushed a commit to golang/vscode-go that referenced this issue Aug 28, 2024
This subcommand cross-compiles the tagged version of vscgo
for all targets we want to release prebuilt vscgo binaries for.
It uses `go install github.com/golang/vscode-go/vscgo@<TAG_NAME>`
to compile the binary, in order to embed useful build info
(so, we can have meaningful telemetry). After go1.24
(golang/go#50603) we will be able to use
`go build` instead.

The output of this subcommand is "vscgo.zip" in the -out directory.

The test involves running the `go install` and checking
the compiled binaries included in the vscgo.zip file
have expected GOOS/GOARCH. For hermetic unit testing,
the test creates a file-based module proxy using the code
copied from golang.org/x/telemetry/internal/proxy.

For #3186

Change-Id: I45848fdb676d094c634786ac939481f1778b4995
Reviewed-on: https://go-review.googlesource.com/c/vscode-go/+/595376
Reviewed-by: Hongxiang Jiang <[email protected]>
kokoro-CI: kokoro <[email protected]>
Reviewed-by: Robert Findley <[email protected]>
Commit-Queue: Hyang-Ah Hana Kim <[email protected]>
@samthanawalla
Copy link
Contributor

@mvdan
Conclusion from the contributor bi-monthly tools meeting:
While we'd ideally like a completely precise solution for determining dirtiness, it looks like that would require some extensive modifications to the build system (we don't track all files that contribute to the build right now.) This might also introduce significant overhead and slow down builds.
Given the potential complexity and impact, let's move this discussion to a dedicated proposal for refining vcs.modified.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/611916 mentions this issue: cmd/go: prevent git from fetching during local only mode

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests