Git is my buddy: Effective Git as a solo developer (mikkel.ca)
418 points by vortex_ape 12 days ago | 205 comments

I'm also a solo developer and use git in a much less sophisticated fashion. I tend to use it as, "freeze my code here, so in case I f something up, I can get back to a moderately clean state." It's kind of like a snapshot-based local history. And, quite frankly, I rarely revert one, but it makes me feel safer. I don't care if I have a lot of commit messages that say, "interim". The good ones are clear.

Is this a terrible coding practice? I don't have enough non-me experience to know how much of an anti-pattern this probably is. I probably won't change my process, but I'm curious.

> it makes me feel safer

This for me is one of the biggest things I like about working under version control, even solo. It gives me the freedom to explore some crazy idea or refactor without having to think about the way back if it doesn't pan out. If it turns out to be more complex than I am willing to do now, I can stash or branch.

If I think back to my pre-source control days, I used to leave commented code everywhere, or just make a full copy of the folder. It doesn't take long before this becomes an absolute mess. Copying in particular was a barrier: you had to realize it was necessary, then interrupt your flow to do something that would take several seconds. (By contrast, if you commit as you go - especially every time you get to a working "checkpoint" - there's zero extra effort needed.)

Exploratory refactoring turns out to be very close to an exercise in creative writing, as I learned one day accidentally from my lit major friend.

Take the section you are stuck on, print it out, cut it up into sentences or phrases, and just rearrange them until either it makes sense, or you figure out where you went wrong.

Rearranging code statements until something makes sense is exactly what refactoring is.

I beg to differ.

Refactoring is not merely rearranging code statements. Refactoring is restructuring of the code starting from the architectural and abstract goal and then looking at how pieces of existing code would fit. Sometimes, that requires writing new code and tests. Refactoring by definition also means not breaking the user space.

I've never heard of any serious writer printing out their prose and cutting it and rearranging it. That just sounds absurdly unnecessary to me.

> code starting from the architectural and abstract goal

You are either using a different definition of architecture or this is wrong. Refactoring is bottom-up construction. Most of the time when I see people frustrated or struggling (including myself) it's because they have forgotten this and need to take a break.

I am using the standard architecture definition.

More information here: https://en.wikipedia.org/wiki/Code_refactoring

There are many goals in refactoring, specifically this section:

> Potential advantages of refactoring may include improved code readability and reduced complexity; these can improve the source code's maintainability and create a simpler, cleaner, or more expressive internal architecture or object model to improve extensibility. Another potential goal for refactoring is improved performance; software engineers face an ongoing challenge to write programs that perform faster or use less memory.

I was addressing OP's analogy to cutting pieces of written prose and rearranging.

A bad paragraph does not get the point across because it is doing things in the wrong order, or taking too long to get there. Hoisting code can be removing repetition, performance... So many things. Rearranging or deleting code so that a piece is not trying to do three things at cross purposes, for one.

Code is meant to be read by humans and only incidentally by computers.

A lot of architecture is just being clear about what is intrinsic complexity and what is accidental, be it cognitive or computational.

> I've never heard of any serious writer printing out their prose and cutting it and rearranging it.

Writers definitely do this. Maybe not at the prose level, but for sure at the plot level and chapter level.

I've found myself doing this in e-mail recently. I naturally try to set the stage/explain the situation, then ask for something (opinion/resources/prioritization/...). But I've been advised to lead with the request, and then explain. It often takes a minor rewrite to make it work, but I've become convinced it helps motivate the reader to read the explanation.

I’ve definitely heard of doing this when plotting something out, ie at compile time instead of runtime to stretch an analogy.

I think you can buy a box full of words: https://magneticpoetry.com/

Yeah, writing code without source control sounds horrible. Can't imagine what it must've been like for those who had to suffer through such time.

I know a contractor who has worked for a major company that I won't name. He has told me that their source control was, for a time, Google Drive. He knew it was a recipe for disaster but real work was nonetheless getting done, and the client was satisfied. They didn't know how the sausage was being made, but they liked the output.

I think a lot of people who haven't been around the scene wouldn't believe these stories, but this stuff happens a lot. Like major commercial projects with no tests whatsoever (unit, integration, or otherwise), that are still successful and making a lot of money.

Not surprised. A lot of this stuff doesn't get set up because people are lazy. Or developers don't want to, or are unable to, do sysadmin work.

I worked at a small "startup" inside a larger, several billion dollar company, back in the late 90's. Nobody set up source control for that division, despite the parent company being over two decades old and having people very experienced with that sort of thing. We were also integrating code from third party contractors, and it was a big mess. Files getting overwritten, people copying stuff off their local desktops, consultants FTPing in updates, etc. After a couple months of copying junk everywhere, I finally got fed up. As a 22 year old, basically straight out of college, I was training the entire team how to use CVS...

We’ve had effective source control since the 70s, latest 80s; for the most part working without it was self inflicted.

Exactly. When people say 'before we had version control' I want to ask, how old are you? And by the way, I am older than just about all of you.

Started with SCCS with versioned control lists to determine what got pulled from SCCS. The outer wrapper was all written in shell. 1980s.

Talking about a large system, eight or ten sub projects, each sub project in its own versioned source tree.

A release spec pulled the SCCS deltas of all the sub project control lists, and then SCCS was directed by those versioned control lists to pull all the source code for each sub project.

So yes, version control that I am aware of was firmly entrenched in 1980. And I am certain it goes back further than that.

Yes totally. It gives you freedom to try things you don’t fully understand in your IDE or framework too then decide whether you want to revert everything afterwards.

Modern IDEs often have basic source control baked in. You don't even need to commit anything. I wonder whether there is any point in using Git for basic version control if those features are already available.

At least in IntelliJ, I find using the local file history stuff painful. With explicit source control, I'm making specific decisions to check in known states. When I have to resort to the local file history stuff, there's a lot more of "oh, here I undo'd a typo" and so on type of things.

That said, it can be a lifesaver when I didn't make an explicit commit, then started doing stuff, then realized "ok this got out of hand AND I wish I could go back five minutes but it's gonna be annoying."

The IntelliJ local history works best when your code is a raging monolith. It punishes you when you find coupling in otherwise uncoupled code, and that punishment is there whether you leave the coupling or try to fix it.

One of the failure modes for people leaving a mess is if it's too hard to fix it they give up. So that's no good.

Committing aside, git also has stash. Many people prefer using one consistent tool (git) over the variety of equivalent features offered by different IDEs.

I posted elsewhere. Stash is your friend. If you aren't using stash today, learn it. And learn the difference between 'stash pop' and 'stash apply'. Each has its place and time.
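
For anyone who hasn't tried it, here is a minimal sketch of that difference, run in a throwaway repo. The file name, stash message and identity are made up for illustration.

```shell
set -eu
# Throwaway repo; everything here is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Make an uncommitted change and stash it.
echo "experiment" >> notes.txt
git stash push -q -m "half-finished experiment"

# apply: restores the change but keeps the stash entry around.
git stash apply -q
after_apply=$(git stash list | wc -l | tr -d ' ')

# Discard the working-tree change, then pop the same stash.
git checkout -q -- notes.txt
# pop: restores the change and drops the stash entry on success.
git stash pop -q
after_pop=$(git stash list | wc -l | tr -d ' ')

echo "stash entries after apply: $after_apply, after pop: $after_pop"
```

So `apply` is the one to reach for when you want to try the same stashed change in more than one place, and `pop` when you are done with it.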

Yes: you like the interface for git already and are productive with it versus learning how to achieve the same productivity while learning the idiosyncrasies of your chosen IDE platform.

I'm butchering something someone said here about a month ago:

Some people cut a trail through the jungle, others just push the branches out of the way and expect everyone behind them to do the same.

When an IDE has basic source control baked in, that typically means it has a git client and UI. I know that years ago I used NetBeans local history, but it's a per-file history and does nothing to keep a set of files at a specific "save point" together.

Quite a few people are suggesting that, when it's time to share your code with others, maybe you should squash/rebase it to clean things up. That's totally up to you... but just know that not everyone thinks rebasing is a good idea. See [1], for example.

[1] https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md

I think we often feel the urge to rebase and squash not because it actually makes our code changes easier to understand, but because it makes us feel better about ourselves. That's a red flag. Understanding how you got to the goal -- encoding all the fumbles and disoriented thoughts right in the commit history -- that can be a genuine benefit to the reader. Who do we really help by pretending that we're more organized, coherent, and linear than we actually were?

> Who do we really help by pretending that we're more organized, coherent, and linear than we actually were?

We're helping the future reader who's reading the history because they want to understand why a change was made - and "because the author of the branch initially had the wrong idea" is almost never the answer they're looking for.

I sometimes enjoy reading stream-of-consciousness writing, but most of the time (especially when reading code) I'm more interested in the point itself. The same applies to version history. It can be used to tell the raw story, but there's usually a more useful and interesting story to be told.

Exactly. I want to "tell a story" with my commits, and that story is really more of an idealized retelling of what I actually did.

Five years from now, no one needs to know that I forgot to add that one line to a prior commit and had to add it separately, or that my first attempt didn't quite pan out as expected.

What that future person _will_ care about is:

- What final changes actually got made?

- What task was I working on?

- What was the reason for any of these changes in the first place?

- Why did I make some of these changes specifically to implement that task?

- What additional side info is important context for understanding the diffs?

Exactly. It's also great to compartmentalize different aspects of your change.

Often my changes are

1. Refactor the existing code to support the new feature

2. Add the new feature

It's great to keep these separate, because someone can look at number 1 and see that the two versions of the code ought to be functionally the same (same tests pass, app looks the same, refactor is easy-to-understand), and look at number 2 and see the new feature.

There are countless other times where you want to tell the "story" in a logical fashion.

(Honestly, I expect that there is a significant correlation between being a good git committer and being a clear story-teller.)

I understand that you want to tell a story. But as someone examining your code, I also want to know how you got there. While you're throwing out your junk, you're also throwing away valuable information. If I'm taking the actual time to review the code history, then let me play it out in real-time, mistakes and all. I know how to step back and summarize, I don't need you to do that for me.

This is especially true if your code is clever. I'm much more likely to understand your polished gem if I can see all the things that you bumped into while you were discovering it.

That is what comments and commit messages are for. I trawl history all the time. Running into an unbisectable mess of a branch (because a bug that was introduced in commit X~15 is fixed in X on the same branch) is a complete nightmare. I have to dissect the branch history and understand what is because of the branch and what is debugging/review/CI cycle cleanups. Commit messages for fixups also tend to be 100% terrible and utter trash. "Fix review comments". Thanks. If we're doing that, let's copy what the comment was in too and why it fixes it.

The problem with your request is that 90+% of the time (with the way I develop), the dead ends are on MRs that got closed or code that never got pushed in the first place. So again, comments as to why this approach is used is way better than hiding it in the history because someone coming to "clean up" code sees the thought process instead of having to remember to search for it.

I don't do much work like that -- I suspect you're part of a much larger developer team -- but I think I understand the problem you're describing.

Couldn't you simply review/bisect at the fork/join points? i.e., take the commits at which forks began or ended, ignore any intermediary commits, and run the bisect (or, read diffs) across that subset? That way you're only comparing at the chapter-markers of the story, so to speak, and not getting mired in the gory details.

Yes, `git bisect --first-parent` was a feature I wanted for a long time. It finally exists now, so yes that helps, but is not a complete solution.
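
A rough sketch of what that looks like, assuming git 2.29 or newer (where `--first-parent` was added to bisect). The repo, branch names and the "ok"/"bug" check are invented for illustration; the point is that bisect only ever tests the merge commits on the main line.

```shell
set -eu
# Throwaway repo; branch and file names are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"
main=$(git symbolic-ref --short HEAD)

echo "ok" > app.txt
git add app.txt
git commit -q -m "Initial commit"
good=$(git rev-parse HEAD)

# A harmless topic branch, merged with an explicit merge commit.
git checkout -q -b topic-a
echo "a" > a.txt
git add a.txt
git commit -q -m "Add a"
git checkout -q "$main"
git merge -q --no-ff -m "Merge topic-a" topic-a

# A second topic branch that breaks app.txt somewhere in the middle.
git checkout -q -b topic-b
echo "bug" > app.txt
git commit -q -am "wip: rework app"
echo "b" > b.txt
git add b.txt
git commit -q -m "Add b"
git checkout -q "$main"
git merge -q --no-ff -m "Merge topic-b" topic-b
bad_merge=$(git rev-parse HEAD)

# Bisect along the first-parent chain only: topic commits are never tested.
git bisect start --first-parent HEAD "$good" >/dev/null
git bisect run sh -c 'grep -q "^ok$" app.txt' >/dev/null
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit on the main line: $(git log -1 --format=%s "$found")"
```

This points you at the merge that brought the problem in; finding the exact commit inside the topic is a second step, which is where readable topic history still matters.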

Even with `bisect --first-parent`, I still want useful commit messages, which "fixup" commits are, on the whole, uniquely terrible at providing.

I do software process and other things, so some of my branches tend to be gigantic (e.g., revamping the build system) and can be 200+ commits because one cannot meaningfully land a build system rewrite incrementally. That one in particular was meant to be bisectable because when rebasing on top of new development, I wanted fixes to be in the "port this library over" commit instead of after some random merge commit based on when I decided to sync up that week (it took a year to do it). So once I get it down to a particular MR, being able to inspect that topic is still a useful property.

Note that this only works with a `merge --no-ff` workflow too. The `rebase && merge --ff-only` pattern and `merge --squash` are both terrible, IME, at making useful history. The force-rebase workflow is just as confounding to me as the no-rebase workflow (the former de-parallelizes your MR merge process and the latter tends to make a terrible commit history).

Note that even for single-developer projects I run, I tend to make PRs even for my own changes (once it's gotten off the ground).

While I understand and somewhat empathize with this desire (I'd use it all the time for personal repos, for example)... current VCS systems are terrible at supporting it.

What you probably want in this case is something like "automatically commit on every change (possibly recording every keystroke)" + "automatically tag based on tests/builds passing or failing" + "allow manual comments at any time, whether based on files changing or not". All of that is technically possible with git/hg/fossil/etc, but it's so much work for both the recorder and the viewer that it's infeasible.

This is great, except that we’re often bad at recounting this idealized history without lying in ways that make later maintenance more difficult

> or that my first attempt didn't quite pan out as expected.

Actually that's still important, it's just important from an architecture perspective.

As a much newer developer, the biggest problem I have with git is that I rarely end up actually making one change at a time. I'll be working on some larger thing, and in the process I'll notice and quickly fix a smaller thing before returning to the original task. This might be a typo in a code comment, a poorly named variable, or a block of code I realize is dead.

I suspect this is the type of tendency which goes away with experience, but it makes git a lot less useful. My commits won't really tell you what changed; the most they can tell you is the primary change I was working on.

Many of us do that, and it's not just a new developer thing. Git actually enables this, because you get to pick and choose what to add to the index (`git add`) before committing. So that little tweak you made in the unrelated function? -- no problem, just `git add` that later, and commit it under a different message. Not all SCM tools give you that kind of flexibility.

On the other hand, there's a diminishing return to placing every tiny change into a separate commit. Commit messages like "Fixed multiple small things" might make some people clutch their pearls, but sometimes you just need to get shit done and move on to solving bigger problems.

My suggestion is to consider breaking your commit into two: one for "fixed this big issue that everyone cares about", and one for "a bunch of tiny cleanup stuff that I happened to notice." (Maybe call that second one "refactoring" -- it will go over better with your audience.)

> Git actually enables this, because you get to pick and choose what to add to the index (`git add`) before committing.

That assumes the changes are in separate files though, right? I know you can use the "-i" flag, but it's fairly labor intensive.

That kind of depends on your tooling. e.g., I use Magit (an Emacs front-end for git) which makes interactive mode really, really easy.

(But easy or not, other version control systems such as Subversion don't offer the feature at all. We kind of take Git for granted these days, but it wasn't always like that.)

A lighter weight option is the --patch flag to 'git add' and 'git commit'.

Personally I have gotten used to using `git commit --patch` for everything (even if I only have one change) just as a convenient way of reviewing the changes I am about to commit. With that, only committing part of the changes is no additional effort.

Look at `git add -i`. You can commit just part of a change to a file. So if you notice a small problem and already have a bunch of changes made, you can still make those changes, and commit them separately.

Up to you if you wanted to rebase those changes back onto main.

I don't use it often and find it's kind of painful to use, but if you're in the position where you've already saved two different things in your IDE and need to pull them apart for commit, it's a useful tool.

Have you tried using 'git commit --patch'? It makes it easy to separate out unrelated changes when committing. You can precede it with an invocation of 'git reset $HASH' to restructure your last few commits.
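
A minimal sketch of the `git reset $HASH` half of that, in a throwaway repo. The file names and messages are made up, and whole files are re-added instead of `--patch` hunks to keep it non-interactive.

```shell
set -eu
# Throwaway repo; file names and messages are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"

echo "base" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# Two messy commits that mix a cleanup with feature work.
echo "cleanup" > cleanup.txt
echo "feature part 1" > feature.txt
git add cleanup.txt feature.txt
git commit -q -m "wip"
echo "feature part 2" >> feature.txt
git commit -q -am "more wip"

# Mixed reset: the branch moves back to $base, the files keep their content.
git reset -q "$base"

# Re-commit the same work as two self-contained commits.
git add cleanup.txt
git commit -q -m "Tidy up: small cleanup"
git add feature.txt
git commit -q -m "Add feature"

git log --oneline
```

Only do this to commits you haven't pushed anywhere, since it rewrites them.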

In general, more experienced git users aren't actually working on one commit at a time. They're just comfortable enough with editing history to make it look that way.

That is what OP was talking about: after you've done the change, make a commit with just that tiny refactoring. Once you're done and ready to review your work, you can cherry pick just that fix and move it to main / master / its own PR. Since it is self-contained, it can be processed by itself only.

`git add -p` will take you through all the changes in your files, and let you add them selectively. I find this makes for much cleaner commits.

With the add command's interactive mode, it is often possible to selectively stage and commit individual patches in a file.


> We're helping the future reader who's reading the history because they want to understand why a change was made...

"Change" is subject to interpretation. Most of the time, it's the scope the change belongs to that carries the meaningful value.

Say, changes made in connection to fixing an issue are logically tied for inclusion as well as for potential unwinding.

Some tangent changes technically should not be casually folded in, just in case this changeset will need to be propagated or rolled back.

Thus this elaborate multi-staged commit management in Git.

Many projects don't have such a need to manage the change flow, so version control is used as a kind of undo buffer. Which is fine; in such cases the meaning is tied to release states.

If anything, it makes more practical sense to preserve only commits with a buildable state, not just some transitional changes.

The advantage is that you get a more usable and understandable list of historical changes. "You wouldn't publish the first draft of a book" [1]

A squashed merge or rebased and cleaned set of commits gives a very clean overview of which changes were made, at what point, why they were made, and what together. That picture tends to get utterly lost in the "set up X", "make test Y", "fix typo", "wip" and "change error handling" commits a feature branch typically has.

Additionally, I'm not really interested in the fact that my colleague started change X yesterday before lunch; I'm interested in when it went live and became visible to all the developers when it was merged into the main branch.

[1] https://git-scm.com/book/en/v2/Git-Branching-Rebasing#_rebas...

You wouldn't publish a first draft, but neither would you burn it once the final draft was off to the printer. Personally, I'd prefer it if "squashing" commits was purely a UI thing; the underlying commits were all still there, but grouped together and displayed as a single big "virtual" commit. That way you could still drill down to the real history if you needed to.

Why would you want to see every typo that was corrected? Every little test that was changed erroneously and then backed out again?

That may be an accurate representation of the order savepoints were made, but it's not an accurate representation of how the software evolved. It is noise that needs to be discarded if a reader would like to know what change was really made. It also makes it difficult or impossible to use tools like git bisect.

Is the argument really that a more detailed history is always better? In the trivial case every keypress could be a savepoint, and every savepoint a commit.

One does not always know in advance that a commit needs to be split in two. The only way to produce readable commits without rebasing them in that case is to work with local _backup files. A version control system does this much better.

In fairness, you're only seeing 5% of the typos. We caught the other 95% before committing. :)

I love your question, "why not a commit per keypress?", because it raises an interesting follow-up: why not squash and rebase entire months or years of project work into single commits? If squashing is so useful, why do we only apply it at low-grain scales? Could we read and understand massive projects quickly and easily, if they only had a few commits to them?

I'm sure that we don't experiment with larger-scale rebases because of the limitations in the technology -- we all know that we're not supposed to 'git rebase' in public, and why that is. But suppose those obstacles were lifted. Now that we can rebase and rewrite at any time scale, which scale(s) is the right one(s) to choose?

> why not squash and rebase entire months or years of project work into single commits?

The argument here is that one should rebase and carefully craft commits that isolate each functional change into a separate commit, where each change is motivated and builds on the previous one, before pushing anything. Every commit should build cleanly, preferably even pass tests. That makes changes easier to reason about, and enables the use of tools such as bisect. Look at git itself for an example of this type of history.

The counter argument to that was that it presents a false view of history. Maybe there were false starts and mistakes made along the way. Without preserving these to history the reader is left without understanding these. This is not an uncommon argument. Some people argue rebase should never be used.

This view suggests that a more detailed history is preferable. Taken to its logical extreme, that would mean every keypress and editor command.

But "why not delete all of history" is not an example of "carefully crafted commits" taken to an extreme. Quite the opposite.

Basically, you want to keep the history of individual logical patches to the codebase, but not the meta-history of how those patches were made.

It helps to think about how git grew out of an email based workflow.

A commit is essentially an email. It has a sender, date, a subject line and a message body. The commit message format is subject, empty line, body. Think of git repository as an archived mailing list worth of patches.
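
A quick sketch of that format in a throwaway repo, using two `-m` flags (git joins them as separate paragraphs, which gives you subject, empty line, body). The message text is made up.

```shell
set -eu
# Throwaway repo; the commit message is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"

# First -m is the subject line, second -m becomes the body.
git commit -q --allow-empty \
  -m "Add retry to fetch helper" \
  -m "Transient network errors were failing the whole sync. Retry a few times before giving up."

subject=$(git log -1 --format=%s)   # subject line only
body=$(git log -1 --format=%b)      # body only

# Full raw message: subject, empty line, body.
git log -1 --format=%B
```

The author and date "headers" show up with a plain `git log -1`, which is about as close to reading an email as it gets.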

Much the same as you wouldn't send an email describing several days of work without proofreading it, you should treat your commits the same way. The git design grew out of this usage, which was much harder in something like Subversion.

No one would send an email to the kernel mailing list suggesting a patch set that included errors, false starts, and reverts. That would waste reviewers' time. Code history is a craft to aid understanding the code, it is not an undo log.

> it raises an interesting follow-up: why not squash and rebase entire months or years of project work into single commits?

That's effectively what happened before version control/before the small-scale rebases we enjoy now were possible. And the reason is that it's hugely valuable in certain circumstances to be able to see some granularity of the history. (Though clearly people disagree about what the grain size should be.)

> Could we read and understand massive projects quickly and easily, if they only had a few commits to them?

I don't think so. The current state is visible at the top of the git tree regardless. History comes in when you are trying to understand why the state is what it is. Usually this is for troubleshooting in my experience, but sometimes also when doing a refactor. Meaningful commit messages attached to meaningfully-clumped patches are, in my opinion, absolute gold in those cases.

There's little benefit to squashing down a year's worth of work into 5 commits because you can just as easily tag each of those 5 commits with a version number, give it a little write up, and call it a release.

I think the reason to squash commits is to cut out the noisy bits that were only useful to the original developer that day and create a timeline that's helpful for future readers. It doesn't really make sense to get more granular than the level of a single commit with a good comment and a small set of cohesive changes. So you store your history at that granular level and you can take care of the rest with tags, minor and major versions, etc.
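
As a small sketch of the "take care of the rest with tags" part, in a throwaway repo: annotated tags carry their own little write-up on top of ordinary commits. The version numbers and messages are invented.

```shell
set -eu
# Throwaway repo; versions and messages are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"

echo "1" > app.txt
git add app.txt
git commit -q -m "Add first cohesive change"
# Annotated tag: a named release with its own message.
git tag -a v0.1.0 -m "First release: basic app"

echo "2" >> app.txt
git commit -q -am "Add second cohesive change"
git tag -a v0.2.0 -m "Second release: adds the second change"

# Nearest annotated tag reachable from HEAD.
latest=$(git describe --abbrev=0)
echo "latest release: $latest"

# List tags with the first line of each tag message.
git tag -n1
```

The commits stay at their fine-grained level; the coarser "release" view is just a layer of tags over them.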

The Fossil designer agrees with you:

"So, another way of thinking about rebase is that it is a kind of merge that intentionally forgets some details in order to not overwhelm the weak history display mechanisms available in Git. Wouldn't it be better, less error-prone, and easier on users to enhance the history display mechanisms in Git so that rebasing for a clean, linear history became unnecessary?"

I'm not a user of it myself, but I believe this is the philosophy behind how Fossil approaches it:


Pull requests can serve the same purpose; messy feature branches and a clean main trunk.

The only way you get that in Git is if you squash-and-rebase before merge, though. Which is fine if that's the process and end result that you want, but does (if you keep feature branches "messy") disconnect feature branches from their related merges into trunk from Git's point of view.

Yeah, you're reliant on Github metadata to make those links for you; there's nothing natively in git itself doing it. It's also an all-or-nothing affair, where the whole PR becomes a single squashed commit. To get anything in between ("here's my single large PR which I've rebased into N incremental commits, but you can also dig in and see the work that actually led here"), you really do need first class support in the tool.

I suppose the Github answer to all this would be "just make separate PRs", but going that way asks a lot more of the developer in terms of how polished those incremental states need to be.

Mercurial does this with the Evolve extension.


It still has the individual commits, but the interface will make it appear as if it's just one commit.

The real history is useless. Especially if we have tests. In that case it doesn’t matter how often we make changes.

I do think this is because I prefer to think of code as a black box. No one should need to figure out how my functions work. Someone should just need the name of the function, what inputs it receives, and what output it returns. If someone actually has to read my code, that’s a failure.

> If someone actually has to read my code, that’s a failure.

I can't tell if you're being serious, or are a brilliant troll. :)

Assuming you're serious, Hyrum's Law is one reason I might need to see your code (https://www.hyrumslaw.com/). The signature of your function is not the whole signature, it's just a sketch of the high points.

You really should just need to read the code in case something goes wrong, but otherwise, no. You need to be more careful with your time.

> Who do we really help by pretending that we're more organized, coherent, and linear than we actually were?

You help the reviewer.

To understand why git is the way it is, you have to understand the workflow of the original git-using project (other than git itself), the Linux kernel. Whenever someone proposes a change to the Linux kernel, it's sent as a sequence of patches. Each patch should contain a single logical change, and will be reviewed individually. For instance, suppose you want to change the way a field in a particular structure is stored. The first patch of your series might introduce a couple of helper functions to access the structure fields. Patches 2-5 might each change a separate subsystem to use the new helper functions, instead of accessing the field directly. The next patch changes both the field and the helper functions to use the new representation. When reviewing this sequence, it's easier to see that each patch is correct. And that was a simple example; it's not rare to have patch series with over 15 patches, and even longer patch series are not unheard of. I've seen patch series which refactor whole subsystems, where each patch in the series was an obviously correct transformation, while the final result was completely different.

From the Fossil page: > Rebasing is lying about the project history

This tired hyperbole just won’t seem to ever go away. Please try to ignore this junk, the Fossil devs could and should make their point without the FUD and misleading judgement, if they want to be taken seriously. Rebase has perfectly legitimate uses, and if Fossil makes it so you don’t need to rebase, that’s fantastic.

Rebase is most useful before pushing local changes to other people, and most people fluent in git know this fact, and also know that you don’t rebase public branches, you don’t rebase other people’s commits or your own after they’re pushed, except in emergencies and with team communication.

Rebasing before you push is the same amount of “lying” as typing something into your editor and then deleting it before you hit save. You don’t actually want your history at the raw keystroke level, right? You aren’t “lying” if you fix a bug you wrote before you push the bug into public branches, right?

> Understanding how you got to the goal -- encoding all the fumbles and disoriented thoughts right in the commit history -- that can be a genuine benefit to the reader.


Sorry, but I'd rather read a commit history like the first list below than the second (whether I'm reviewing others' code or my own at a later time):

- Add functionality X to function y()

- Fix a bug in y(): ...

- Fix a bug in z(): ...


- X

- oops

- fuck, typo fix

- do it another way

- ok, y is fixed now

- another typo fix

- it has a bug, fix it

- z has the same bug

- typo fix

The latter is quite common during the dev cycle, and it's something to keep to yourself. It's not about 'pretending' at all.

I think that's a pretty valid argument about just wanting to rewrite history.

I'll offer an alternative. I love having every commit buildable. When I'm drafting, this isn't going to happen. I'd like to save my work and move between machines more frequently than that. But after a rebase, it's great to only have compiling commits. It makes doing a bisect a lot easier when you're hunting for something.
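To illustrate why buildable commits help bisecting, here is a rough sketch in a throwaway repo. Everything in it is made up for the demo: the file `app.txt`, the word BROKEN standing in for a failing build, and the `grep` standing in for "build and run the tests".

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"

# Eight commits; from commit 5 onward the file contains the "bug".
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -ge 5 ]; then
    echo "BROKEN $i" > app.txt
  else
    echo "ok $i" > app.txt
  fi
  git add app.txt
  git commit -q -m "step $i"
done

# Mark HEAD as bad and the first commit as good, then let git drive the search.
# The check exits 0 for a good commit and 1 for a bad one.
git bisect start HEAD HEAD~7 >/dev/null
git bisect run sh -c '! grep -q BROKEN app.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad_subject"
```

This only works unattended because every commit can be checked on its own. If some commits in the range don't build, the check script would have to exit 125 for them so bisect skips them, which makes the search less precise.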

I have found this works a charm, if I want to present a clean repo (for things like tutorials and classes): https://24ways.org/2013/keeping-parts-of-your-codebase-priva...

But basically, I let things "all hang out."

Tools shouldn't really be running the show.

My commit history is often a descent into profane madness.

For my solo projects I break the "don't code in master" rule because there is nobody else to coordinate with, and I usually only work on a major idea at a time. However I still use branches, usually if I want to quickly test out a breaking change, or if I start something I don't anticipate being finished with in a long time, so that my master branch remains usable for other side tangents.

The branching strategy means that it's pretty important that my commits are small, the brief commit message is accurate (even if I occasionally commit too many changes at once) and the description explains my train of thought. Nearly every time, I am communicating those changes to myself in 6 months when I switch into that branch randomly and wonder what I was in the middle of doing.

Rebasing private code to clean up WIP commits and break it into logical steps is healthy and a very good practice. But as Linus himself says in the linked mailing list post[1], just don't rebase public code.

     "In other words, you really shouldn't rebase stuff
     that has been exposed anywhere outside of your own
     private tree. But *within* your own private tree, and
     within the commits that have never seen the light of
     day, rebasing is fine."

     -- Linus Torvalds

[1] https://yarchive.net/comp/linux/git_rebase.html
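For anyone who hasn't tried cleaning up private WIP commits, one way to do it is with fixup commits and an autosquash rebase. This is just a sketch in a scratch repo: the file names and commit messages are invented, and `GIT_SEQUENCE_EDITOR=true` is only there so the rebase accepts the generated todo list without opening an editor.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"

echo "base" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# Private, unpushed work: two real commits...
echo "y: first attempt" > y.txt
git add y.txt
git commit -q -m "Add functionality X to function y()"
target=$(git rev-parse HEAD)

echo "z: fixed" > z.txt
git add z.txt
git commit -q -m "Fix a bug in z()"

# ...and an "oops, typo fix" that really belongs to the first one.
echo "y: corrected" > y.txt
git add y.txt
git commit -q --fixup "$target"

# Fold the fixup into its target. Only safe because nothing here was pushed.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

git log --format=%s
```

After the rebase the log shows three commits instead of four, and the y() commit already contains the corrected content.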

This applies mainly to projects with kernel style of development. There are not many of those. In centralized repo style (GitHub), it's fine to rebase, even force push as long as you know exactly what you are doing and coordinate with your colleagues.

In a lot of projects topic branches generally have a single owner and are not used as the basis of other people's branches, even if they are technically public they are not public in the same sense that he is referring to. If you aren't going to be getting PRs on your branch you can consider it private and rebase all you like IMO.

edit to add: I generally prefer people not rebase after they've asked for a PR review just because the reference for comments will be lost. If they want to, maybe do it after all the reviews are approved.

I'm going to disagree, unless you are a solo developer (and even then it's bad practice to rebase commits that have already been pushed). Allowing rebase on shared branches just opens the door to too many possible catastrophic mistakes. When I make a new repo the first thing I do is disable history rewriting.

Absolutely. This guy has too many rules.

When I work alone I'm climbing a mountain, and Git is the rope. I can fall, but I won't fall far. I commit as often as I want to. The log is not a story for someone else to read later, it's the way I get to the top.

I find commit logs useful even if I'm the only person ever reading them. I like to be able to git blame on a line and be reminded of the context in which I did something and what I was trying to solve. I don't bother to pretty up my feature branches though, I just squash them so that master has a clear story.

How we do it at work is every single commit message must have a ticket number in it. This is super easy to do and super useful. Even if the commit message is "fix exception #1823" you can go and look up #1823 and see what that issue was to make sure you don't reintroduce it with your change. You will always find more info and context in the ticket than in git commit messages.

I hate this. Often I am rewriting a bad comment, or improving the working code I checked in yesterday whose ticket has already been closed. Deleting an unneeded #include. All kinds of stuff for which there is no open ticket.

This kind of rule prevents people from maintaining the code base as they go. I have literally quit a company because of bullshit such as this. I was a senior engineer and could not fix a typo in a comment without a bug number and two code reviews.

This has been a non issue for me. I just slap minor fixes in with other tickets even if they are not related. Usually I just drop a comment in the review page with "saw this other issue and fixed it"

Short circuiting the review and QA steps is not ideal. A reviewer should see the change is just a comment typo fix and accept it even if it has nothing to do with the current ticket.

The log is a useful byproduct, but it's not the product.

This is an excellent metaphor, thank you. I’m going to add it to the bag of metaphors I use when explaining git

> It's kind like a snapshot-based local history.

You can extend it to remote-history too, because git makes it almost trivial to create a repo that you can work with over the network (without a running server of any kind).

I use git as a fancy rsync sometimes.

I do most of my work on a remote box, but I still like to edit locally in an IDE, but occasionally I make a change on the remote side.

On the remote side, I do

git init --bare project.git

and on my laptop

git remote add clusterx remotebox:dev/project.git

Then I do a git clone on the remote box from that bare repo, so I can push changes from there back to it, and when I'm done for the day, I can just pull it all back to my laptop with a git pull.
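Roughly, the whole round trip looks like the sketch below. To keep it runnable on one machine, plain directories stand in for the laptop and the remote box (the names `laptop`, `remote-checkout` and `remotebox-project.git` are made up); over SSH the remote URL would be something like the `remotebox:dev/project.git` above.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# "Remote box": a bare repository, no server process needed.
git init -q --bare remotebox-project.git

# "Laptop": an existing repo with one commit, pushed to the bare repo.
git init -q laptop
cd laptop
git config user.email "dev@example.com"
git config user.name "Solo Dev"
echo "body { color: black; }" > style.css
git add style.css
git commit -q -m "Initial styles"
branch=$(git symbolic-ref --short HEAD)
git remote add clusterx "$work/remotebox-project.git"
git push -q clusterx "$branch"

# "Remote box" working copy: clone the bare repo, edit, push back to it.
cd "$work"
git clone -q -b "$branch" remotebox-project.git remote-checkout
cd remote-checkout
git config user.email "dev@example.com"
git config user.name "Solo Dev"
echo "body { color: navy; }" > style.css
git commit -q -am "Tweak colour on the remote side"
git push -q origin "$branch"

# Back on the "laptop": pull the remote-side change.
cd "$work/laptop"
git pull -q --ff-only clusterx "$branch"
cat style.css
```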

This used to involve a lot of patch + diff + rsync, but when you build stuff remotely, do diffs, and also add new files to the codebase, it is so much easier to just use git.

For my personal projects, I think CSS files are the most common things I've edited in this sort of loop - my web-app folders are generally just git clone --depth 1, which also takes care of the other loop where I edit locally and deploy to remote.

Even when I use git by myself, I like to use branches. This helps me keep my work separate, and avoids issues if I'm working on one thing but need to do a quick fix elsewhere. I also tend to have anything in master set to go to live, so branching helps keep things that aren't ready from going live. Even if you don't have a "production" environment you push to, making sure your master branch is only code that works well is a good idea IMO.

And as another reply mentioned, squashing commits is good for keeping your history cleaner. My branches tend to have a ton of "fix" commits that get squashed out when merging into master.

Another reason I personally like branches is because it gives me confidence to make breaking changes without immediately having to worry about regressions.

Is it "terrible"?

If it's working for you to produce software how you need, then it's working.

But I would say it's building up habits of using git that would not transfer well to a multi-person team. That may or may not matter to you.

OP's usage is interesting, I think by and large they are transferable to a multi-person team, they are still good habits, or on the _way_ to good habits or _similar_ to good habits with a multi-person team. The one difference is how much easier it is for a solo developer to "rewrite git history" without disrupting others, in OP we see it done with abandon.

But in general the way OP is thinking about things -- what they are trying to prioritize and how -- is something that applies to a multi-person team too. Keeping commit history readable, keeping branches cohesive, etc.

Your practices are... not. Which doesn't make them terrible, but it means you are developing habits you'd probably have to revise when/if working on a multi-person team.

Yep, small disciplined commits take valuable time. If you rarely revert or get other benefits from them they might be a net loss for you. Especially in solo projects when you can keep a lot of what's going on in your head.

It's a bit like testing - there's a lot of posts about where you need them and not many discussing where you don't.

It is funny because I use git (and commit messages) to help me keep track of what I was working on since I'm a solo developer but also an entire IT department so coding is only a portion of my time. Sometimes I'll just be starting to implement a major feature when something else will come up and I'll have to put it on hold for a few days/weeks. Having the quick little commits helps me figure out where I was and helps me get back into the flow.

If you use “git add -p” it makes small commits pretty painless. I still like small commits in repos I work alone on because it makes reading the history during future debugging easier.

Side conversation because I recognize your username. I've been playing wordoid every day since you posted it three weeks ago. You made a comment about having heard that someone scored 3000, and I think that's now in my mind as an end goal. I've gotten to about 1800 and can't quite let go yet. :)


Great, glad it's a fun distraction and that's a better score than I can get. :) Feel free to contact me outside HN as well if you can think of any improvements (global high score tables are sounding good!).

More on topic, when coding the game, I was Git committing maybe every hour or so without useful commit messages and didn't have a problem. With games (in the early stages anyway), I find you're typically changing lots and lots of small things all over the place to tweak the gameplay and presentation in an experimental way, so granular commits aren't helpful.

I would switch to more granular commits now though since the game has stabilised more.

If you work with branches, can you merge with the --squash option? This makes one neat commit on your default branch. You could even then commit without the -m option, and type a more descriptive multi-line commit message detailing the changes you've made.
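A small sketch of that flow, in a scratch repo with made-up file names and messages. The two `-m` flags are only there so it runs without opening an editor; leaving them off gives you the editor for a longer message, as described above.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
trunk=$(git symbolic-ref --short HEAD)

# Messy work on a branch.
git checkout -q -b feature
for msg in "interim" "fix" "typo fix"; do
  echo "$msg" >> app.txt
  git commit -q -am "$msg"
done

# Back on the default branch: stage the combined change without committing it.
git checkout -q "$trunk"
git merge --squash feature >/dev/null

# One neat commit with a subject and a longer body.
git commit -q -m "Add feature X" -m "Explains why the change was made and what it touches."

git log --format=%s
```

The default branch ends up with two commits, and the three branch commits never appear in its history. Note that git does not record `feature` as merged after a squash, so deleting the branch later needs `git branch -D`.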

I only work on little solo projects and this is what I'm doing. It makes a very readable history, and helps me answer "why on earth did I do that?", but it's harder to revert small changes later.

If I'm working with others, I try to match my committing style to the project.

Squashing makes tools like git blame or emacs’s vc-annotate a lot less useful: with small commits, I can reconstruct the code as it was when a particular line changed; with a squash, the coordinated changes are a lot less useful.

Without squashing git blame has too much noise in it for my taste. I don't want to see 90 different commits in a single file's blame, when they were actually related to 9 different features. If each topic branch has a reasonable scope then the squashed changes I think are more useful than each little tweak or fixup.

If you really want to do it nicely, you get to the end, move all of the committed changes back into the uncommitted state, and then recommit them piece by piece in logical steps with well-written messages.

But at some point you are spending more time bookkeeping than the actual value you will get from it. If it's a personal repo, don't bother. If you are sending a patch to Linus, tidy your commit messages.
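If you do want to try the "uncommit everything and recommit in logical steps" approach, a mixed `git reset` is one way to do it. The sketch below uses a scratch repo with invented file names, and it assumes none of the messy commits have been pushed.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"

echo "base" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# The messy history.
echo "parser" > parser.txt
git add parser.txt
git commit -q -m "wip"
echo "tests" > parser_test.txt
git add parser_test.txt
git commit -q -m "interim"
echo "docs" > docs.txt
git add docs.txt
git commit -q -m "more stuff"

# Move the branch back to $base. The files on disk are left untouched,
# so all the work is now uncommitted again.
git reset -q "$base"

# Recommit in logical steps with real messages.
git add parser.txt parser_test.txt
git commit -q -m "Add parser with tests"
git add docs.txt
git commit -q -m "Document the parser"

git log --format=%s
```

When several logical changes live in the same file, `git add -p` lets you stage them hunk by hunk instead of file by file.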

Pretty much what I do too, even working in collaborative projects. One nice thing about git is that it makes it easy to go back and clean up your history with rebase before publishing it to others, so you can make as big of a mess as you want in your local branches without anyone else having to see it.

So yes, having "interim" or "wip" commits would be an anti-pattern in a shared repo, as it makes it harder for others to see what changes you made. For a local branch though, not a big deal.

So maybe that's the idea for my projects that I make available to others. Be a bit more deliberate with branches, allow them to be junky, and clean up when I merge to master. That seems like the best of both worlds with minimal effort. I think I'll even try it.

That has been how I have started to do it. I make a new branch, make a mess of it (until I am finished), then merge it back into the original "golden" branch.

> Is this a terrible coding practice? I don't have enough non-me experience to know what an anti-pattern this probably is. I probably won't change my process, but I'm curious.

For solo development? I don't think it's good, but ultimately you should do whatever works for you. When you work on a team, though, it might be hard to break the habit later if this is what you're used to, and you really really will need to. Nobody wants to see "snapshot" commits in a shared repo; commits will need to actually accomplish a clear goal. Also, I find it very helpful to be able to make independent changes in separate commits (sometimes I see something wrong that's unrelated to whatever I'm doing), then reorder (rebase) them to polish them sometime before pushing. If you don't get in the habit of making your commits somewhat orthogonal, you won't be able to do these kinds of things (whether on teams or solo).

I think what you're doing is fine especially if you're just coding for yourself. I'm perfectly happy with slightly more expressive commit messages like "LoadData appears to be working; still hacking away at TransmorgifyData", followed eventually by "stabilized TransmorgifyData".

I think it's when you start syncing with other people over multiple days that people start insisting that a commit should (compile), be atomic, tested, etc. What they're really looking for at that point is that incoming changes be easy to understand and modular and possibly easy to omit if some code change is causing them trouble for a moment.

As a solo developer who works across several machines, I still use subversion to coordinate code. I have yet to see the advantage of using git in this situation.

As long as you're not still using CVS -- that would just be masochistic. :)

If it's working, stick with it. Most people use Git as a centralized RCS anyway. I like the decentralized features of Git (darcs, fossil, hg, whatever), but mainly for short-term problem solving -- on any project, eventually an official hub emerges.

Yeah, I'm pretty much the same. Still, there are elements of the article's approach that I follow.

  Principle 2a: Every commit must include its own tests
  Principle 2b: Every commit must pass all tests
But otherwise, I don't create branches. My commits are medium sized, one big thing, and to the trunk. My commit messages are at best ok.

After a commit, I `git commit --amend` liberally. It's never really clear in my mind when a commit ended and the next one starts. This wouldn't fly in a group.

The one thing I'd recommend is Never Type git. That's overstating, but basically git's command line syntax is just terrible AND thus dangerous. I think my one moderate sized git screwup was due to the command line syntax. So now I hide (most of) it behind shell aliases. This guy goes a bit far but you get the idea:


I pull rarely enough that I prefer to type it out.

Also, configure a good diff tool (although Apple seems to reject kdiff3 for now). And .gitignore goes without saying.

Regarding diff tools: suggestions welcome. I've used Vim's three-way diffs, and would prefer to stay in the CLI. But the JetBrains IDEs come with a great GUI diff tool for Git merge conflicts. I'd love to find something comparable on the CLI.

I use kdiff3 but I only want a visual diff. I'm not using it as a merge tool. VS Code has great git support and great diff support. If it only had better (or rather, more accurate) vim support I'd be there but every time I try, I head back to standard vim with something like VimR. It's been six months and I should try again.

Thank you.

> Is this a terrible coding practice?

Nope. I have been versioning everything from C# to SQL for more than a decade and it saved me many, many times. With Subversion too, which is far less evolved and modern than Git.

The advantage of mastering a complex tool like git and maintaining a central repository is the increased granularity of commits/branches and clarity of versioning, but if the "snapshot here and there" approach works for you, then use it.

>Is this a terrible coding practice?

Are you the only consumer of the practice? And do you like it? Then no, it's not terrible at all, it's useful. Git will function just fine for this. I do similar things with my "experiment" repos, they're practically "streams of thought saved to disk" and they contain a ton of digressions and occasional breakages and that's totally fine. I have zero complaints after several years of doing this.

The major benefits to much-more-structured approaches come in the form of automated tooling that's really only useful when you have large repos or many contributors (git bisect is a perfect example), or external automation (ci/cd pipelines, etc). For those kinds of repos, yeah, I'd say it's a terrible practice, and it'll cause some easily-avoided pain. But even then: work however you like on a branch, and merge (or squash) when you have "good" stuff, and it generally works well.

I do that on shared projects too. I absolutely will not end my day with work that exists only on my machine, and git is a fine place to put it, as far as I'm concerned. I routinely make a branch called "phil/stash" that I will commit totally broken code to at the end of the day. Then I rebase/amend it into shape when I'm ready to PR.

Just squash the junk commits with rebase when you are done. It keeps the history clean and you have many points to revert to.

Same thing for me.

Generally I am against rewriting history unless there is a big mess to fix. For me, git is my work process, and bugs, typos, bad merges and code that doesn't compile are part of it and I don't try to hide them. Personally, I value historical accuracy more than cleanliness.

But some people have compelling arguments for the opposite, like the author. These people tend to view git as a release schedule where every commit is workable code. It is good for bisecting, and git log is your actual changelog. But you lose information about how you solved the problem, when you did what, etc... it is also more time consuming to maintain.

You can use a hybrid solution with two parallel branches and merge commits, or you can just use tags.

Git is not very opinionated on how you should work. Merge or rebase, clean or historically accurate, push or pull, etc... There is more than one way to do it.

Some of the sophistication you are 'missing' is there as a solution to scaling problems, to business problems (do you need to track/fix customer/client issues?), or to other problems that may not be as critical for solo devs.

Some of the rest is best practices that you are missing out on.

You might want to take a look at what some 'best-practices' are and see which might improve your coding.

Simple things like tagging a commit as "feature/fix/refactor/chore" might make you think differently about your programming workflow. Or you might find it more of a distraction and limitation than a help.

And yes, sometimes you certainly need that 'interim' tag to freeze work, for those rare cases where you run out of time or inspiration before you get to a natural end point of a task.

For me, those cases of running out of time or inspiration before I get to the end of the task are incredibly common (basically a daily occurrence). I understand it might be rare for you, but for me (and I would think others) it's the default state of programming.

When I'm really getting going I flow through a ton of work and only stop when I hit a time limit, so I expect to finish in the middle of a task whenever I start coding.

Yeah I'm 13,000 lines into a solo side project and haven't bothered with a single branch+merge. I've got a bunch of tests, but I don't test everything, and I don't make sure every commit has tests. Commits are mostly checkpoints of when a major feature achieves some kind of initial stability, where I want to be able to diff back to last-known-working. I try to do better commits than "WIP" but they're something like "such and such feature now seems to actually work (lots of buggy edge conditions)". I'll throw a lot of unrelated code cleanup into single commits as well. I focus on moving the needle on the end results though, not on having a perfect process. Many bits of code that I have which work well enough don't have any tests at all. As I hit bugs I drill into code and fill out the tests that I didn't do. Simple code that is used all over the place and doesn't cause issues may not have any formal tests at all. I rarely actually bother to go back into my own git history; mostly I just use it like a quicksave in case I wind up dying at the next bossfight.

I think what'll get you in trouble more than an imperfect process is writing spaghetti code that violates separation of concerns. If things are separated well, you should be able to come back and test it easily if it causes issues. Test the stuff that is complicated and obviously will cause issues if it's not perfect. Test the stuff that is found to be buggy or needs to be proven to be not buggy in order to track down bugs. Don't bother with perfectly testing everything.

I've been adding a threaded AVL tree implementation lately. I definitely tested that extensively and did a savegame when the AVL tree was written and passing tests properly, and then added threads and did another couple of savegames. I'm going to build on top of that, and I need to be able to trust it without falling back into debugging it. I've got a Clamp01 function though which takes a double and ensures it is within 0 <= x <= 1 and I don't have that one tested. I'm pretty confident it works though.

Sounds exactly like what I'm doing.

What I wondered was if and when other developers deviate from that workflow. After the first release? After the first collaborator has joined? Never?

Are there textbook developers who use a strict strategy like the one Daniel Stenberg [1] is following from day 1?

[1] https://daniel.haxx.se/blog/2020/11/09/this-is-how-i-git/

I do think that after you release and more or less go "1.0" (even if you don't call it 1.0) you should start treating master as always-releasable. At that point if you have a lot of work to do and need checkpoints, do it on a branch. Same with major breaking change features. Keep master always ready so that you can release for bumping your upstream deps, releasing security fixes, or other interrupt driven housekeeping to stay current.

I mostly do solo work too and for me the main goal is less to make code readable for other people, but for code to be readable for myself in a year from now.

I’ve definitely had situations when I reverted code to a year back to check if there was a bug. Git was very helpful from that point of view.

That has been my pattern for some hobby projects, however it quickly becomes a mess to keep track of what you have done over a period of time (I have the same bad habit, I commit things so I can freeze my code in case I mess something up).

My hobby project has been to extend a program with a new plugin, but then along the way I found bugs in the core code, and I wanted to upstream the fixes (and the plugin). I am very glad I knew what the fixes were, because I more or less had to rebase to the original code in order to untangle the mess of commits I made.

I have also found other folks wanting to use my code, so it is also much more helpful if outside folks can see how I altered the original program.

Absolutely nothing wrong with this if you are developing solo. As suggested by a sibling comment, you can always squash if you are looking for a cleaner history, though this probably isn't necessary if you are never going to share your code. If you are going to share and your important commits are clean, then it's easy to squash.

Another good trick is to simply stage things to "freeze" them. You can then `git checkout` any changed files if you want to revert to the staged state. This is useful if you are in a working state without a lot of changes but want to run a quick experiment before committing.
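A quick sketch of the staging trick in a scratch repo (the file name and contents are made up). The key point is that `git checkout -- <file>` restores the file from the index, not from the last commit.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Solo Dev"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A working change you are happy with: stage it to "freeze" it.
echo "working change" >> app.txt
git add app.txt

# A quick experiment on top that doesn't pan out.
echo "wild experiment" >> app.txt

# Throw the experiment away by restoring the file from the index.
git checkout -- app.txt

cat app.txt
```

The file ends up with the staged "working change" line but not the experiment, and the change is still staged and ready to commit. Newer versions of git also offer `git restore app.txt` for the same step.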

I don't think it's an antipattern if you're working by yourself. As you said, it's a safety net that helps your confidence in case something does go wrong. The only thing I'd recommend changing might be to make your commit messages a tiny bit more detailed, even for the interim commits - that way you know what's going on at each commit if you do have to eventually do a `git bisect` to hunt down a regression.

Going a step further and rebasing interactively to tidy up your logs would also accomplish the same goal, but if it's just for your own eyes, it's probably not worth the time.

As long as the commits compile, it's a great coding practice. Unfortunately most development teams haven't caught up to what git actually helps with (and doesn't help with) yet and will block you from doing this, but if you can find a team that's open to actually testing out different practices and seeing what works better then it will serve you very well.

> Is this a terrible coding practice

It is not. I do the same.

I tried to use git the way OP describes and I was taking more time to manage the logistics than to code.

I then hit the "I can obviously call a function in a new commit from one in a previous one". Handling this gracefully means full time work.

My code works 70% of the time anyway so I ended up making efforts to move it up a few percentage points rather than to have a byzantine git tree.

I do the same. I use branches almost exclusively for features that won’t be done for a while and won’t make it into production until then.

If it works for you it's fine. I used Git in different ways depending on the projects, on some solo projects I do like you say (similar to saving a game to be able to go back). For some other solo projects (with a longer lifespan or more critical) I follow Gitflow and I am more strict with the process.

> Is this a terrible coding practice?

Not at all. Copying folders with names like code1, code2, code3 is terrible practice. Using SCM and committing as checkpoints while you work is good coding practice. You have a process, and it works for you.

I don’t think it’s an anti-pattern but you might just need some sort of backup tool with incremental backups and rollback. Sounds like it would likely suit your needs with less overhead.

I do the same if I am doing an assignment or writing a single script.

I'd prefer something close to his if I am writing a library or a small app.

If using this pattern it helps to split out your commits, if possible. I do this with changelists in IntelliJ/PyCharm.

I thought I was the only one using git like this.

I'm happy for this person if this works well for them. For me, no thanks. One of the reasons I feel much more productive on my solo project than at work is that I don't have this kind of overhead. I don't need to write tests for everything (I write them just where they add value). I don't need to follow some strict branching standard. I can commit in chunks that make sense to me, and adjust as needed for the situation.

From my perspective, a lot of this kind of rigid process is important and valuable when you are working with a team, but is counterproductive when it's just me. I know the tradeoffs of the corners I cut. I have the experience to know, for example, that it's not a safe assumption that I will understand my own code a year (or even a month) from now. But the solution to that is to throw in some freeform comments that jog my memory; not to implement a heavyweight documentation system.

Everyone should work in the way that makes them the most productive and happy, but I don't think it's a good idea for solo developers to bring in practices that are designed for team collaboration without really understanding what value they will get from the extra effort.

I used to feel the same way when working on solo projects, eschewing tests, build automation, and even source control.

It wasn't too long before source control was missed - at its simplest, it acts like a backup, but having the full history is really useful. Once familiar with SVN (this was 15 or so years ago), it really didn't add any overhead at all.

I thought tests would be too much work to maintain - but looking back, it was clear that it was my lack of experience and understanding that was the problem. The way I was writing tests, heavily laden with mocks and brittle as a result, was producing crap tests. Much later, when I'd finished making terrible mistakes, but had learned a lot from the process, I realised how helpful tests were. I believe they are usually a net benefit - the number of times where I write a test for something that seems really simple, and it fails, is higher than I'd like to admit. Failing fast during dev is far preferable to having to diagnose production issues. And it also means I can refactor with much more confidence. Also, for OSS projects, I think the presence of a good test suite inspires confidence in users.

When working solo, the time taken for manual tasks also irked me - especially when preparing final releases, where I was anxious not to get things wrong - building for different platforms, creating installers, putting together configs, putting together docs, code signing, running things through Virus Total, publishing binaries... it took soooo long, and was error prone. In the beginning, automating this kind of stuff seemed like a mountain, but again with experience came competence and confidence. And the time savings were great, and the extra confidence around releases was huge.

Each to their own of course, but for anything non-trivial I wouldn't be without source control, tests or build automation.

I was going to post that on my personal projects I'm intentionally fast and loose with git and then saw your post which sums up my thoughts.

With personal projects, the most important thing to maintain is interest and momentum. Best practices aren't so useful if you end up hating working on your project because of self-inflicted process.

Certainly, if you are using your project to improve best practices or learn "correct" ways of doing things for some other long-term career benefit, go for it. What I've learned, however, is that personal projects where I've spent more time doing "meta-work" were the ones where I never shipped anything or just spun my wheels feeling proud of the form of the project. The projects where I just threw caution to the wind and cut corners strategically (not everywhere, mind you) were the ones where I ended up shipping something.

I agree wholeheartedly.

I was a solo developer. We "transitioned" rather abruptly to the kind of workflow you would expect from an organization with hundreds of developers once we hired a couple more programmers, despite it being a poor idea, because we were still spread out among so many different projects. In retrospect, all of this turned out to be resume-polishing and practice runs for one of the developers and my manager; they blasted off to large organizations rather promptly.

All of it left a bad taste in my mouth and some rather negative feelings associated with git, which certainly are not helped by git's porcelain. There's an element of cargo culting against the practices of big SV organizations, but there's a very long tail of solo developers out there, and figuring out where you sit and what tradeoffs are required can be tough against the constant din of The Way Things Are Done.

Definitely--when working solo I tend to only write tests for things I actually want to run in isolation (and usually just to save time running the full codebase) or to establish some concrete expectations that I need to be aware of changes to (e.g. specific user scenarios that are critical to the thing working). These tend to overlap a lot.

When I was a new developer and first learned Git, I felt compelled to use Git the "right" way. That is... small commits with clean messages, things described in this and other posts.

But, as I spent more time programming, I developed different Git habits for different situations:

- Solo explorative work: working on a feature, many small commit messages create nothing but noise. Trying to come up with wording for these commit messages is mostly a drain on my mental energy: there is an infinitesimal chance I'll need these commits later. In such cases, I prefer to maintain large "checkpoint" work-in-progress commits that are the result of near-continuous `git commit --amend`s so that I can use `git status` to see what's changed since my last checkpoint, and easily revert to the last checkpoint. If this fails me, I can almost as easily refer to the reflog to find some changes. The reflog, in my opinion, is an extremely under-utilized tool. `git diff HEAD@{2} HEAD@{0}` allows me to see what's happened over the last two times that I updated my checkpoint. Since this is active work I'm doing right now, I have a good sense of what happened and when, all without tiresomely writing commit messages that will be a bunch of gibberish in two days. When I'm satisfied with the work on a checkpoint commit, I reset the branch to the previous real commit, and then incrementally create real, meaningful commits from the accumulated work, ones that have some chance of being useful to me in the future.

- Working on patches and bug fixes for live systems or production libraries: small, atomic commits that include tests for the patch are the only way to go. I believe the "correct Git" approach that is widely espoused is targeted at this use case, but not at explorative feature work. Or maybe that's just me.

Wondering if anyone else codes like this.
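For anyone who wants to try the checkpoint approach described above, here is a minimal sketch in a throwaway repository. The file names, commit messages, and identity are made up for illustration, and it assumes the reflog is enabled (the default for non-bare repositories).

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"

# A "real" commit to build on.
echo "base" > app.txt
git add app.txt
git commit -q -m "Add app skeleton"

# Start a checkpoint commit, then keep amending it as work progresses.
echo "feature step 1" >> app.txt
git commit -q -am "WIP checkpoint"
echo "feature step 2" >> app.txt
git commit -q -a --amend --no-edit
echo "notes" > notes.txt
git add notes.txt
git commit -q --amend --no-edit

# The reflog still remembers the earlier states of the checkpoint.
git diff --stat 'HEAD@{2}' 'HEAD@{0}'

# Done exploring: drop the checkpoint commit but keep its changes on disk...
git reset -q HEAD~1

# ...then carve the accumulated work into meaningful commits.
git add app.txt
git commit -q -m "Implement feature"
git add notes.txt
git commit -q -m "Add notes"

git log --format=%s
```

The mixed `git reset` is what makes this cheap: it only moves the branch pointer and unstages, so nothing in the working tree is lost.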

I do lots of small commits as save checkpoints and new branches for each point of exploration. Rebase/reset to create clean commits. Each final commit is a complete thought.

Yeah, sometimes small commits with messages are useful for me too, instead of amending a big accumulated checkpoint. Doing this now, in fact. It's all so situation dependent.

What’s the point of using the reflog? Just make multiple commits instead of amending a single one

Writing commit messages for checkpoint commits is a drag and doesn't provide value for me. Much prefer just to amend and use the reflog whenever I (rarely) need to look at work-in-progress history.

Yea... no.

An overly clean git history for me is a sign of too much perfectionism and greatly reduced productivity.

When I code I usually have a general idea of the stuff I want to include in my branch, but then I stumble upon bugs or code couplings which I need to fix for my feature to work. And then I include the fix in my feature branch, because it's just tedious to switch all the time and create 5 interdependent branches that need to be merged together anyway. Also, as long as the feature branch itself is fairly clean, I don't give a rat's ass about atomic commits.

And commits having to pass tests is just ludicrous. That's what the tests are for, so you can fix it before merging the branch. Don't go crazy on the commit level.

It depends a bit on the project, and how public it is. But in the end your git history never provides any benefit to customers and doesn't make your code better by itself. I hardly ever rewrite or rebase commits unless there's a good reason for it.

Kind of disagree.

As a developer who works with a code base that's got a git history going back over a decade and a lot of legacy parts that haven't been touched in a long while, written by devs who've long since left, I fairly frequently wish they'd been more careful with the commit history.

You do mention project context as being relevant, but if bug fixes and refactoring _can_ be decoupled, it's a kindness to pull-request reviewers (if there are any) to do so and keep PRs small. It's also helpful if something needs to be rolled back if the commit history is fairly coherent.

Well I'll admit there is still a balance to it.

I'll just say that the emphasis should be on clean branches and PRs much more than clean individual commits. And on good code much more than clean git branches.

If there's too much ceremony around branches and pull requests I tend to avoid small fixes because it's just too much work, and that definitely doesn't improve the quality of the code.

Why are you comparing advice meant for _solo projects_ to a 10 year old legacy codebase written by many different developers?

As long as the bugfix is at least in a commit of its own so it can be cherry-picked into another branch if needed, that should work too...
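A rough sketch of that idea, with made-up branch and file names: the bug fix lives in its own commit on the feature branch, so it can be cherry-picked onto a release branch without dragging the feature along.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch release                            # hypothetical release branch

# Feature work, with the bug fix kept in a commit of its own.
git checkout -q -b feature main
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
echo "fixed" > bugfix.txt
git add bugfix.txt
git commit -q -m "Fix crash on empty input"
fix=$(git rev-parse HEAD)
echo "more" >> feature.txt
git commit -q -am "More feature work"

# Bring only the fix over to the release branch.
git checkout -q release
git cherry-pick "$fix" > /dev/null
git log --format=%s release
```

Had the fix been folded into "Add feature", the cherry-pick would have pulled feature.txt onto the release branch as well.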

The easy mode is just doing the work in another branch/merge request and then ticking the "squash on merge" checkbox. You may or may not want this depending on how important the individual commits are but sometimes you end up with 50 "fix it" commits that add no value.
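If you are not going through a hosting UI, the local counterpart is `git merge --squash`. A minimal sketch with invented branch names and messages (the hosted "squash on merge" button may differ in details such as the generated commit message):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A branch full of low-value "fix it" commits.
git checkout -q -b feature
for n in 1 2 3; do
  echo "attempt $n" >> app.txt
  git commit -q -am "fix it"
done

# Squash the whole branch into one commit on main.
git checkout -q main
git merge -q --squash feature
git commit -q -m "Add feature (squashed from 3 commits)"

main_commits=$(git rev-list --count main)
echo "commits on main: $main_commits"
```

The individual "fix it" commits still exist on `feature` until that branch is deleted, so nothing is lost right away.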

> What if I told you Git can be a valuable tool without ever setting up a remote repository?

Even if you don't set up a public repository, having a remote one can be a good idea. The .git folder is surprisingly easy to trash. Any box with SSH access works, and GitHub/GitLab offer private projects that you don't have to configure further than a name.
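A sketch of the "any box works" setup, using a local directory as a stand-in for the remote machine so it can be run anywhere. Over SSH you would run `git init --bare` on the box and use a URL of the form `user@host:path/to/project.git` instead; the paths and remote name here are assumptions.

```shell
set -eu
work=$(mktemp -d)

# Stand-in for the remote box: a bare repository with no working tree.
backup="$work/backup.git"
git init -q --bare "$backup"

mkdir "$work/project"
cd "$work/project"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Register the backup location and push to it.
git remote add backup "$backup"
git push -q backup main

# If the local .git folder is ever trashed, a fresh clone restores the history.
git clone -q --branch main "$backup" "$work/restored"
restored_head=$(git -C "$work/restored" rev-parse HEAD)
echo "restored at $restored_head"
```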

My latest mistake was running 'rclone sync', which, it turns out, deletes files missing from the source without confirmation, unlike rsync.

The "every commit must be independent" ideology sounds nice when you write it down, but often you'll have such large changes that in order to make them independent you'd have to basically finish an entire feature. In those cases I tend to just indicate WIP (work in progress) in my commit messages with a brief snippet about what progress was made.

This way, if you skip all the WIP commits you'll have mostly runnable code, and if you want to review a PR commit-by-commit it's easier to see what was being attempted in each commit than to only look at the feature as a whole.

When I'm working solo, the only reason I use git is to sync my code between my desktop and laptop, and to "back it up" to my remote server for peace of mind.

I have never in all of my years working solo on projects needed to revert my history to debug a problem (excluding CTRL+Z of course!). If I am experimenting with something that I don't think will work, then I use a branch.

The amount of work to maintain a clean history and disciplined git practices is not worth it. For non-solo projects with even just 1 extra developer, then totally. Otherwise, you're just wasting your time...IMO.

But honestly, the nice thing about solo work is that you can do whatever you want, and confidently ignore people who think they know how you should work better than you do. If this helps you be more productive or organized, then go for it. And worst case scenario, you're less productive with your project, but you become a git wizard.

I try to keep my commits as compact as possible instead of just saying "yolo here is 9000 lines of code". But, there are times when I am just crushing through the initial bits of a project where it just keeps getting in the way.

+1 to the sentiment. I will say I've occasionally used git-revert, but it's usually when I'm half-expecting to revert it but want to see how it behaves in prod.

As others have mentioned, trying to keep your commits atomic while simultaneously working on several features at once is basically impossible since they're immutable. And given that modern source-control platforms (e.g., GitHub) support squashing on merge, it's pretty much unnecessary. You get "atomic commits", "every commit has tests", even "clean git history" just by squashing your PRs on merge, so PRs become atomic units of work. Which makes sense! Every PR is an incremental addition to a project that is reviewed as a unit and committed all at once.

Small incremental commits on a feature branch, which allow for fine-grained development and review.

Large squashed commits on the main branch, each representing an approved and merged PR, which allow for a reasonable history of features and fixes.

Nothing says "readable history" like 1000 lines in a single commit.

Better than pretending as if ten 100-line commits are atomic!

I have a quite strict solo-SCM workflow as well (including PRs!).

The reason is that the SCM is not just a tool - the history of the codebase follows the thought process of the developer.

More structured history == more structured thinking. Cleaner history == cleaner thinking.

Documentation (traceability etc.) is certainly a benefit, but in this sense, it's part of the smaller picture.

This comment reminds me of writing structured articles: writing structures your thinking. I guess having things in a clear workflow does the same thing.

As somebody who will occasionally have a spurt of 2 weeks hard work on a side project in the evenings, before putting it back down for 3 months, I think branches and tests are well worth it. As the saying goes, there's nothing worse than reading code you wrote a year ago.

Having every commit in a branch pass all tests is a little overkill, but gating merges to master/main on tests passing and then having branches squash before merging seems like a happy middle ground. At least that way every commit in master/main has passed tests.

GitHub too is useful for solo work: you can code review your own code.

I’ve found it useful because it’s in a separate context/UI (GitHub web view) as opposed to your code editor or even git diff on command line.

I tell people all the time not to underestimate the value of PRing your own code even if you are the only one reviewing the code. You can still use GitHub's merge time checks (including setting up your CI/CD builds and PR integration), even if it is just a solo project. You can use PR merge commits as your "clean level" instead of worrying as much about rebase/squash (and tools like --first-parent in git log and git bisect from your main branch).
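To illustrate the `--first-parent` point, here is a small sketch with a made-up branch and messages: the merge commit acts as the clean level, and the messy branch commits only show up when you ask for the full history.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "wip 1" >> app.txt
git commit -q -am "wip"
echo "wip 2" >> app.txt
git commit -q -am "more wip"

# --no-ff forces a merge commit, like a PR merge would.
git checkout -q main
git merge -q --no-ff feature -m "Merge PR: add feature"

# Full history includes the messy branch commits...
git log --oneline
# ...while --first-parent shows only what landed on main.
git log --first-parent --format=%s main
```

`git bisect start --first-parent` applies the same filter to bisecting, though that flag needs a reasonably recent git (2.29 or later, as far as I know).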

I always do a file-by-file sweep of my merge requests on GitLab before sending them to peer review. Usually it only finds commented code or console logs, but it's worth it and pretty fast to do a final check.

Some developers favor tools other than Git. For example, D. Richard Hipp, the developer of SQLite, uses the Fossil Distributed SCM to some success https://fossil-scm.org/

This seems like a classic case wherein messy execution should be followed immediately by meticulous cleanup.

Messy execution exploits top-of-mind opportunities without permitting administrative overhead chores to distract.

Immediate meticulous cleanup constructs an idealized legible history, with the advantage of familiar recency.

The human brain itself works this way, consolidating long-term memory overnight during sleep.

The next question is how best to implement this workflow in git.

One option would be to use complex arcane git commands to transform a messy actual work history into an idealized legible official record. Even if the user performs this transformation perfectly, at minimum it causes a loss of information about the actual work history, by altering messy commits.

Therefore it's better to write completely new commits for the official record. One's idiosyncratic work history doesn't belong in the public collaboration git repo.

I find it easier to use separate git repos, one personal and the other collaborative. I transfer info between them only via manually syncing the working trees.

It may take syncing from multiple personal commits to update the official record sensibly, which sounds like a burden, until compared to the alternative of trying to understand a mysterious ancient official commit embracing multiple unrelated changes.

Code spelunkers unsatisfied with the terseness of the official record should be free to investigate the contributing dev's personal repo to sort through his chaos for clues.

I do a lot of solo development and I always use git. I'm definitely a lot less disciplined about small atomic commits than I am when collaborating, but Git is still very valuable to me.

I use it for a few things:

1. As a collection of notes describing what was in my mind when I wrote that code

2. To allow me to work on different machines. For example I have an app where some of the work I do is directly on my staging server (long story). It's nice to be able to commit and pull there and to my local workstation

3. So I can rollback to known good states. Doesn't happen often but has saved me a few times over the years!

Whether a branch should do one thing or a collection of things, for me, depends on how big/complex is the thing.

If I’m making a very small change, e.g., a fix, I’ll work on master directly (personal work only! Shared work always uses the way we’ve agreed to work!).

A slightly more complex change or an exploration or experimental change will get its own branch.

A very complex change will have a base branch and feature branches off that base, possibly with issues, one per feature, merged into the base branch, which will eventually be rebased on then merged into master.

Just curious, I'm also solo developer. I push my local development to git (bitbucket), then do a git pull in production server to sync the two. Is this how most people do it?

The only downside I found, I need to reboot the server for the django app to update to the changes. So I take my site offline for 3 minutes or so.

I use docker extensively, personally. Modern web app frameworks, especially those in python and ruby, are super annoying (imo/ime) to operate on a bare metal host, because the way pip/gem install dependencies by default is just a mess that's impossible to isolate. Pre-docker, I had no end of headaches where touching any deployed thing broke all the other sites due to dependency garbage. Rbenv/pipenv/etcetc are language-specific, non-trivial band-aids and I don't like wasting brain cycles on them.

Docker makes it so easy to deploy and operate programs that I even use it for ecosystems that don't "need" it, like Java. Also makes backups super easy because it's just backing up the docker volumes.

Agreed, I once worked at a place that used saltstack to manage rails servers and it would take me an entire day to set up a new server with the tools it needed and get the app running. Now with docker it's so trivial to get something running on the server, and even multiple versions of ruby for different apps.

It's not necessarily bad, but there's plenty of steps to automate away, with two ideas:

* You can use your production server as a "git server" without basically any overhead

* You can set up scripts that run on git events.

Basically, you can push directly to your production server and then deploy using those event hooks.
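A minimal sketch of that setup, with local directories standing in for the production box so it can be tried safely. The paths, the remote name, and the served directory are all assumptions; on a real server the hook would typically also restart the app afterwards.

```shell
set -eu
root=$(mktemp -d)
server_repo="$root/site.git"   # stand-in for a bare repo on the production server
deploy_dir="$root/www"         # hypothetical directory the app is served from
mkdir "$deploy_dir"
git init -q --bare "$server_repo"
mkdir -p "$server_repo/hooks"

# post-receive runs on the server after each push; here it checks out main
# into the deploy directory. A restart command for the app would go after it.
cat > "$server_repo/hooks/post-receive" <<EOF
#!/bin/sh
git --git-dir="$server_repo" --work-tree="$deploy_dir" checkout -q -f main
EOF
chmod +x "$server_repo/hooks/post-receive"

# Developer side: commit locally, then push straight to "production".
mkdir "$root/dev"
cd "$root/dev"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"
echo "<h1>hello</h1>" > index.html
git add index.html
git commit -q -m "Initial commit"
git remote add production "$server_repo"
git push -q production main

ls "$deploy_dir"
```

The bare repo keeps the history out of the served directory, and the deploy step is just a forced checkout of whatever was pushed.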

Here's a better explainer


As for having to reboot your server every time, I have no idea, although that seems like a long time. I'd expect less than a minute, but I don't know much about django

Other comments mention Docker and Kubernetes which feels like overkill for what you describe. Especially Kubernetes.

Check out bitbucket pipelines that trigger on a commit which should be able to build/run any tests, copy the code to your server, and then restart django.

From there you can later build a docker image and copy it to run wherever, etc...

Most decent CI tools will just clone the one branch needed for deploying and build/deploy from there. Cloning it fresh each time can rule out artifacts from past builds having an impact on the current deploy.

I used to do that before the introduction of docker+kubernetes. Once you've incorporated this into your solo workflow, you will never look back.

This is a nice blog post and it may make you an effective Git user as a solo developer and that's about it.

I'm sorry when I have to squeeze as much as possible into the limited time I get for my side projects after a long day of work as a professional developer - I have one rule: get stuff (that matters) done.

Git is designed to make it easy to restructure commits (rebasing, resetting, selective adding). Don’t be so obsessive with it. Track your work and, at the end of the day, restructure your work so that you can manage it well (for posterity or collaboration).

I have been using version control of one flavor or another since I started down this career path in the mid 1980s.

My home projects went from SCCS to CVS to SVN to Git over the years. But there was always some form of version control, even for my home projects. I generally followed what I was using in day-to-day work, but without all the process modeling, just the base version control.

Briefly in the mid 1990s I also dabbled in 3DFS and Plan9 for date based file systems. Those sort of negated the need for explicit version tracking systems, but neither idea endured the test of time.

Anyone know a clean way to have nested git projects? Every time I make a commit in B (the nested git project), there are changes in A. The solutions I found in previous searches were hard to understand.

Perhaps you would like one of the monorepo management tools like Lerna?

In general, the best advice is to avoid nested git projects as much as possible (even though tools like git submodules exist, they are more footguns than you want them to be). You either want to reorganize your folder tree so that your git projects are only ever side-by-side rather than nested, or that there is only one repo for all of them ("monorepo"), depending on your preferences and when/how you expect to share them.

Perhaps you're looking for something like submodules?

Just create a Git submodule, and then push from within the submodule, like so: https://stackoverflow.com/a/5814351.

Ten years ago I was told to never use Git submodules.

Five years ago I inherited a project that had four repos sharing a Git submodule.

Now I tell people to never use Git submodules.

I think it very much depends how you use them. A great option to manage submodules is TwoSigma's "git-meta" https://github.com/twosigma/git-meta.

There's also just "meta" which is a cleaner way to approach the functionality of submodules https://github.com/mateodelnorte/meta.

I've not used them myself but a lot of people strongly dislike git submodules. A recent thread on the topic: https://news.ycombinator.com/item?id=26165445

I’ve some ideas based on recent work, but I want to make sure I understand your use case: should changes to A and B be independent, but A and B require each other to do useful work?

While mid-reply and defining what the problem is, I realized a solution would be to just ignore the nested git project B, and deal with them separately. I must admit my original question wasn't well thought out, as I can't recall why it was important that there be a relationship between A and B at all, even if they are of the same project.

Mid-reply context preserved below.


I have a git-inited folder A, called 'Build my web app'. Inside it, it has non-source-code stuff like pictures, pdfs, notes. Also inside it is a folder B, called 'my-web-app-js', which is source code.

[Problem was that making a commit in project B would trigger unexpected changes in A]

The non-source code stuff can be ignored via .gitignore (more on that in a moment), assuming you do not want it in the repo at all.

Your 'B' use case sounds like one we have right now: A framework, if you will ('A') and apps delivered via that framework ('B', 'C', etc.). Eventually, we will deliver A as a package and the apps as plug-ins to that package (more or less) but right now they all live together (npm run start in A brings in the apps in B, C, etc.).

Our top-level folder has a .gitignore that looks like this:

where the items preceded by ! are all in the A repo, things we want git to consider when in A but not in B, C, etc.

In A's top-level folder, do a git clone of the repos for B, C, etc., and their folders will be created in that folder, but

  1. All git operations in A will ignore B, C, etc., because of the .gitignore, and

  2. Operations in B, C, etc., will ignore A, because it is "outside"
Works well enough for now.
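To make the shape of that setup concrete, here is a hypothetical allow-list .gitignore for A and a quick check that A ignores a nested repo. The entries below are invented for illustration and are not the actual file from the project described above.

```shell
set -eu
a=$(mktemp -d)
cd "$a"
git init -q

# Hypothetical allow-list: ignore every top-level entry,
# then re-include the items that belong to A.
cat > .gitignore <<'EOF'
/*
!/.gitignore
!/package.json
!/src/
EOF
mkdir src
echo '{}' > package.json
echo 'console.log("A")' > src/index.js

# Stand-in for running `git clone` of B's repo inside A's top-level folder.
git init -q app-b
echo 'console.log("B")' > app-b/main.js

# From A's point of view, B does not exist.
git add .
git ls-files
git check-ignore -v app-b
```

Because new top-level folders are ignored by default, cloning another app into A needs no change to A's .gitignore.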

Damn, I must say that it's rare that I do find something so compelling yet so radically different to how I usually work[1]. Which is not to say that I think I'm an expert; if anything, it's because I don't know how much I don't know.

I must try this.

[1]: Pretty much the usual gitflow with sequential commits of partial work on a branch—often with commits fixing previous errors in the branch—until I deem it finished.

Heh. On my teams, I _am_ "that one guy who knows more about Git" :)

I recently wrote an extended post that aims to help devs understand how Git works, available Git commands, and techniques for using Git effectively:


Relevant useful tool: diff2html - a CLI that lets you quickly see an HTML output of all uncommitted changes you've made (or compare against a branch).


I have an alias `alias diff='diff2html -s side --ig package-lock.json'` which shows a side-by-side comparison of my changes. Highly recommend!

FYI Git has a difftool setting/feature specifically for doing diffs through an external command.

Git is way too complicated for a solo developer. It's great for Linus' Linux code management - that doesn't mean it's great for every single developer situation.

As a solo developer/potentially newbie, I think it's better to spend brain cycles on actually learning programming, than to learn the idiosyncrasies of some crazily complex tool like git.

  git init
  git add .
  git commit . -m uh
  git push

Doesn't look too hard to me. Particularly when I'm going to rewrite my system, I'll go ahead and check out a new branch so I don't freak out when I break everything. In fact, I tell developers to learn git first.

The only time git becomes an issue is when you have large binary objects, like with a video game. Git LFS is a pain.

I figure it takes 3-4 weeks for a typical newbie developer to learn git in depth. That's too much compared to the benefits. Git is way too complex and the CLI is badly designed, especially for a solo developer.

The crux with git is that you really do have to learn it in depth, if you want to be self-sufficient in the end.

>The crux with git is that you really do have to learn it in depth

You don't though. I was using git for years at a 2 developer shop and I don't think I used anything but commit, pull, push, add. It was very trivial, never even made a branch.

Then when I moved to a bigger place I had to learn about cherry-pick, revert, reset, bisect, checkout, etc but every single one of those things I learned was immensely useful.

You can use GitHub when you're getting started.

I found git to be an extremely useful tool when I needed to go back and look over some of the changes I made.

No, that doesn't help much in understanding git to the level that you're self-sufficient.

It's enough to back up your work and send the code to potential employers

Learning to commit and push doesn't take that long, and that's enough to maintain history and backup, which is what most solo developers need from git in the beginning. Then you learn branch or diff for cases when you need it. Then when you need to revert back to some older version you learn that. Then when you need to automate deployments you learn a bit about tagging or some branch structure for development, bug fixing, deployment, whatever. You don't have to learn it all from the beginning; you learn as you go, that's at least how it works for me. Then you might learn bits and pieces for one-time use and forget them later because you don't need to use them again for a year. It is just the normal pace of learning git, you don't have to grok it all at once.

Nice article. I have a similar one where I have 2 line explanations of more nuanced Git commands for some situations.

Git Wizardry: Obscure but useful Git incantations https://legends2k.github.io/note/git_nuances/

My very first programming project, porting https://github.com/ericson2314/voxlap, was rebase-heavy git to always be able to bisect my many mistakes. Other people tried to contribute but it didn't go too well!

You really need to be in a git-first mindset to follow these. Not saying that's a bad thing, but for me at least as a solo dev it often comes as an afterthought and breaks pretty much all the rules.

When developing only with yourself, git quickly ends up becoming a cloud backup. It shouldn't but...

As a beginner, I am comfortable with git push, pull, commit. However, it is hard to realize whether a practice is good or not. For instance, I do not know if a rebase or squash is an accepted practice (or under which scenario is a good idea).

also, git stash is your friend. and after you stash, it isn't always 'git stash pop', but possibly 'git stash apply'

There are times when I have three (or more) stashes stacked up on any given branch. I know I have reached a cohesive set of changes when I am ready to 'git stash drop' every one.
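A quick sketch of the difference, in a throwaway repo with invented names: `git stash apply` restores the changes and keeps the stash entry around, while `git stash pop` restores them and drops the entry.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Park an uncommitted change; the working tree is clean again afterwards.
echo "experiment" >> app.txt
git stash push -q -m "experiment"

# apply: changes come back, the stash entry stays.
git stash apply -q
after_apply=$(git stash list | wc -l | tr -d ' ')

# Throw the restored changes away, then bring them back with pop.
git checkout -q -- app.txt
git stash pop -q
after_pop=$(git stash list | wc -l | tr -d ' ')

echo "stash entries after apply: $after_apply, after pop: $after_pop"
```

Using apply is handy when the same stash might be needed on more than one branch; `git stash drop` removes the entry once it is no longer needed.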

This gives a good idea of how to use git in a team. For solo use, you can actually give yourself a little bit more slack.

I love this title.

One of the neatest aspects of git that I never see used is delta debugging (`git bisect`), where git automatically finds the commit that introduced a bug by binary searching the commit history. This requires many small commits, which is tedious.
