
The question is though, why would you even bother changing public history, even if you can work around the practical problems?

In my view, the concept of a commit mapping exactly to a functional change, and therefore being able to be correct or incorrect, improved, etc, is going against the grain of what revision control is. A commit just is what it is. If it contains a typo, a bug, etc, you notice and fix it 2 days later and that's another commit. Git just describes what happens. What is the utility in pretending that didn't happen and rewriting the history of changes as if you never made that mistake? Who benefits?

If you are concerned about keeping master 'stable' so that checking out any commit will result in a clean, working codebase, you can use abstractions on top such as tags to point out to people which commits are good and/or bad.

I get that the idea of a stable, neat git history, as though you were all-knowing and perfect, is comforting, but it's also nonsense and trying to attain that is just wasted effort. Just let git describe what actually happened, yes it's chaotic, yes there is constant rapid iteration, mistakes made and corrected etc, but that's just the process of building stuff. That's the reason you shouldn't rewrite history. There are pragmatic exceptions, though, like rewriting to remove egregious errors, such as committed security keys that can't be quickly changed.



I'm convinced that the obsession with rewriting history is solely due to inadequate tools. Git doesn't keep the name of a branch after it's merged, so people want to make merges look like a single commit on top so that they don't face this ambiguity. Github doesn't even display the branching structure in its commit log, which also shows a woefully small number of commits per page, further incentivising squashing/editing. Many tools (some are better than others) display commit history in a similarly non-dense way or in a way that implicitly discourages branching in some way, e.g. gitk doesn't even display commits from other branches by default. Large numbers of commits are also unwieldy when commits are hashes that cannot be ordered mentally just by looking at them.

Over in mercurial land people are more likely to keep history, even though history rewriting is not only equally powerful, but safer than in git thanks to the 'evolve' extension. We can limit our bisecting to a single branch, such as a stable branch or the default (mercurial parlance for 'master') branch, skipping over commits in feature branches that have been merged in. We can do this because the branches retain their identities post-merge. The most widely-used tool, tortoisehg, displays large numbers of commits densely, with the full tree structure and branch names on display by default. Commits can be referred to via their hash or by a simple incrementing integer (which is only valid on your local clone, but still, this makes things easier for local work).

So we keep all those typo commits - they're usually in feature branches anyway since we don't merge until features are done and we try to keep the default branch functioning. If a merge breaks something, we bisect on the default branch only, which will tell us which merge commit broke it.
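A toy sketch of that bisect, in Python rather than any real VCS: it assumes each commit records its own branch name (as described above for mercurial), that the oldest commit on the branch is good and the newest is bad, and that the breakage persists once introduced. `Commit`, `first_bad` and the sample history are made up for illustration and are not Mercurial's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Commit:
    rev: int       # local incrementing revision number
    branch: str    # branch name stored in the commit itself
    summary: str


def first_bad(commits: List[Commit], branch: str,
              is_bad: Callable[[Commit], bool]) -> Commit:
    """Binary-search only the commits labelled `branch` for the first bad one."""
    line = [c for c in commits if c.branch == branch]
    lo, hi = 0, len(line) - 1      # line[lo] assumed good, line[hi] assumed bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(line[mid]):
            hi = mid
        else:
            lo = mid
    return line[hi]


# Hypothetical history: feature branches contain messy intermediate commits.
history = [
    Commit(0, "default", "initial"),
    Commit(1, "feature-a", "wip"),
    Commit(2, "feature-a", "fix typo"),
    Commit(3, "default", "merge feature-a"),
    Commit(4, "feature-b", "wip, does not build"),
    Commit(5, "feature-b", "finish"),
    Commit(6, "default", "merge feature-b"),
    Commit(7, "feature-c", "wip"),
    Commit(8, "default", "merge feature-c"),
]

tested = []


def build_is_broken(c: Commit) -> bool:
    tested.append(c.rev)
    return c.rev >= 6              # pretend the feature-b merge broke things


culprit = first_bad(history, "default", build_is_broken)
print(culprit.summary)             # merge feature-b
print(tested)                      # [3, 6]
```

Only commits on the default branch are ever tested, so the half-finished commit 4 never gets a chance to confuse the search.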

I'm still sad that git won the VCS wars over mercurial.


> Github doesn't even display the branching structure in its commit log, which also shows a woefully small number of commits per page, further incentivising squashing/editing. Many tools (some are better than others) display commit history in a similarly non-dense way or in a way that implicitly discourages branching in some way, e.g. gitk doesn't even display commits from other branches by default.

GitLab and Fisheye usually display the graph of branches and merges very well.

Also, I have this wonderful git alias:

lg = log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date-order


> I'm convinced that the obsession with rewriting history is solely due to inadequate tools.

The article at the top literally said explicitly never rewrite public history, so what obsession are you talking about exactly? Git has what you want as long as you don’t mistake local operations before push as “history”, and instead only consider history to be commits that have been shared with other people. That makes more sense anyway, there’s nothing sacred to preserve in the arbitrary, noisy sequence of things I did while I was bumbling around on my machine before I push.

Git was designed with a toolset that shows every commit and lets you clean up your own work before you contribute it to public history. Its tools work well when you understand git’s design and use it the way it was intended. Git is not Mercurial, though, that’s true. Perforce isn’t Mercurial either.

Git can limit bisect to a single branch, and normally does skip branches until you want to descend into them. Don’t confuse losing the branch name with losing the branch, git doesn’t lose the branches, only the names, and only if you delete the names.


I'm talking about attitudes in general and not disagreeing with the post.

I agree with the advice never to rewrite public history, and I totally agree with Linus's approach. He is in the minority with this attitude though, since never rewriting public history means never doing a squash merge and never rebasing a merge/pull request at merge time (both of which are common practice). I suspect even people who endorse the idea of never rewriting public history kind of don't think of the fork from which a pull request is coming as 'public' even if it literally is.

I love the kernel's "keep-all" approach and want more people to use it, I bet if they did the tools would improve to actually work better with that style - whereas right now I think the tools are driving the workflow instead.


> right now I think the tools are driving the workflow instead.

Okay that's fair, I think that's true. To some degree it has to be true no matter which tools you use, right? Even if it's Mercurial.

I haven't personally seen squash merges and rebasing pull requests being used on pull requests of large multi-person branches very commonly, are you saying that's common? I agree that there's common practice of using squash merges and rebasing on private branches, or branches that contain commits by only a single person and contain only code commits.

I'm looking for clarity, not disagreeing with you. The 'principled' argument for never using rebase is almost always attacking the branching practices of individuals and not teams. There definitely is a fuzzy line between pushing to your own branch that is visible to others, but nobody else touches. I'd normally consider that case private, not public, even if it's "literally" public.

I don't feel like I'm hearing what the tangible advantages of never modifying history are. Why is history considered more sacred than clarity of semantic intent? People make mistakes and noise, a lot, why shouldn't the tooling allow fixing mistakes and cleaning up irrelevant noise after the fact, as long as it doesn't affect others?

Edit: I'm realizing another conceptual line to draw beyond what makes history "public": the question is one of whether you're going to rewrite history out from underneath other people. If not, and you're the only person affected, then you made the local history in the first place, there's no principled reason to prevent you from updating your own work, because it's equivalent to making the same change before committing. If your rewrite is modifying commits that other people already have, then you're inflicting damage on other people. You may cause them to have merge conflicts, you may be modifying code dependencies they're working on but haven't pushed, it's bad for very practical reasons. Using this lens of what other people depend on, does that help clarify your examples of squash merges and rebased pull requests?
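To make that lens concrete, here is a minimal Python sketch over a made-up commit graph: a commit counts as safe to rewrite only if it is not reachable from any ref that other people can already see. The names (`reachable`, `safe_to_rewrite`, the `origin/main` tip) are hypothetical; with real git, something like `git branch -r --contains <commit>` answers a similar question for remote-tracking branches.

```python
from typing import Dict, Iterable, List, Set


def reachable(tips: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """All commits reachable from the given tips, tips included."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def safe_to_rewrite(commit: str, parents: Dict[str, List[str]],
                    shared_tips: Iterable[str]) -> bool:
    """True if nobody else can have `commit` yet (under this toy model)."""
    return commit not in reachable(shared_tips, parents)


# Hypothetical linear history: a <- b <- c <- d
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
shared = ["b"]   # assumption: origin/main points at b; c and d are local only

print(safe_to_rewrite("d", parents, shared))   # True
print(safe_to_rewrite("c", parents, shared))   # True
print(safe_to_rewrite("b", parents, shared))   # False
```

Squashing or amending `c` and `d` changes nothing anyone else has; touching `b` or `a` rewrites history out from underneath them.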


> I haven't personally seen squash merges and rebasing pull requests being used on pull requests of large multi-person branches very commonly, are you saying that's common?

I have. Github makes it quite easy to fall into this.

> Why is history considered more sacred than clarity of semantic intent? People make mistakes and noise, a lot, why shouldn't the tooling allow fixing mistakes and cleaning up irrelevant noise after the fact, as long as it doesn't affect others?

I've got a concrete example of where it causes problems: code reviews. If you've reviewed a branch at a specific commit, and standard practice is to squash merge into master, or to otherwise allow rebases after the review point, you lose the confidence that what's on master is actually what was reviewed. I've seen cases where people got into the habit of getting reviews done, then doing a squash rebase locally, and including tidy-up commits which had never been seen by anyone else before merging straight into master.

If you're in an environment where the rule is that Everything Must Be Reviewed, that's a problem: it's far too easy for an accidental bug to end up on master despite the code reviews and the automated tests on the preceding branch being green.

With the example above, I never would have seen the problem unless I'd been trying to use the git history to measure some statistics about how long it was taking us to get code reviews done. It was only because I was looking at the history commit by commit that it jumped out.
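One way to picture the check that was missing, as a rough Python sketch: fingerprint the reviewed snapshot and the snapshot that actually landed, and compare. It assumes the branch was up to date with master at review time, so a faithful squash should produce an identical snapshot; `fingerprint`, `matches_review` and the file contents are invented for illustration. Git exposes a comparable content hash as the commit's tree id (`git rev-parse <commit>^{tree}`).

```python
import hashlib
from typing import Dict

Snapshot = Dict[str, str]   # path -> file contents


def fingerprint(snapshot: Snapshot) -> str:
    """Content hash of a whole snapshot, independent of commit history."""
    h = hashlib.sha256()
    for path in sorted(snapshot):
        for part in (path, snapshot[path]):
            data = part.encode("utf-8")
            # Length-prefix each field so adjacent fields cannot run together.
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()


def matches_review(landed: Snapshot, reviewed: Snapshot) -> bool:
    return fingerprint(landed) == fingerprint(reviewed)


reviewed = {"app.py": "print('hello')\n", "README": "docs\n"}

# A squash that only collapses commits leaves the content untouched.
squashed = dict(reviewed)

# A squash plus an unreviewed "tidy-up" changes the content.
squashed_and_tidied = {**reviewed, "app.py": "print('hello!')\n"}

print(matches_review(squashed, reviewed))             # True
print(matches_review(squashed_and_tidied, reviewed))  # False
```

A check like this says nothing about whether the tidy-up was harmless, only that what landed is not what was reviewed.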


That's a good example, IMO, and yeah it should be very much frowned on (or outright disallowed) to modify an approved code review before pushing without further review. That is kind-of a code review workflow problem, more than a discussion of whether rebase should "never" be used though, right?

The company I work for now has both notifications for commits in code reviews, so everyone sees if you modify something after it being approved, and some repos also have lockdown features where the approved review is tagged and cannot be checked in if modified. So this can be solved with some tooling around code reviews, and git itself doesn't exactly add up to a modern code review toolset. This may be as much or more of a Github problem than a git problem... acknowledging that there's a large swath of developers that doesn't really know the difference between them.


It's a bit of both. If you don't have a strong "thou shalt not rebase" culture, it can be difficult to get people to accept the inconvenience of getting re-reviews on the branch they've just committed a typo-fix to, so you end up leaning on more complex tooling to force the issue.


> Git doesn't keep the name of a branch after it's merged...

Eh? This is trivial to change by specifying the "--no-ff" option to `git merge`, or by setting the config option "merge.ff" to false.


That's not what I'm talking about - after a no-ff merge, sure you can tell that there were two branches, but how can you tell which was which? Which was master and which was the feature branch?

You can use convention to store this information, e.g. the first parent is always master, or you can put the info in the merge commit message. But it's hacky.
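For anyone unsure what the first-parent convention buys you, here is a small Python model of a commit graph (not git itself). It assumes every merge lists master as its first parent; the commit ids and the helper names are invented. `git log --first-parent` follows only the first parent of each merge in the same spirit.

```python
from typing import Dict, List, Optional, Set

# commit id -> parent ids; by the assumed convention, parents[0] of a merge
# is the branch that was merged INTO (master).
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],         # feature work
    "C": ["B"],         # feature work
    "M1": ["A", "C"],   # no-ff merge of the feature into master
    "D": ["M1"],        # work on a second feature
    "M2": ["M1", "D"],  # no-ff merge of the second feature
}


def first_parent_chain(tip: str, parents: Dict[str, List[str]]) -> List[str]:
    """Walk from tip following only the first parent of each commit."""
    chain: List[str] = []
    current: Optional[str] = tip
    while current is not None:
        chain.append(current)
        ps = parents[current]
        current = ps[0] if ps else None
    return chain


def ancestors(tip: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Every commit reachable from tip, tip included."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


mainline = first_parent_chain("M2", PARENTS)
feature_only = ancestors("M2", PARENTS) - set(mainline)

print(mainline)               # ['M2', 'M1', 'A']
print(sorted(feature_only))   # ['B', 'C', 'D']
```

The split is only as reliable as the convention: a single merge made in the other direction swaps the parents and the "mainline" walks off into a feature branch.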

In mercurial the branch name lives on forever, attached to those commits whether they are merged or not. "Closing" a branch in mercurial is just a hint that it isn't going to be used for the time being and so shouldn't be listed in tools that list branches, but doesn't actually remove the label from previous commits. So the commit history has the branch names still after a merge. This way you can say "show me all commits in the master branch" (as distinct from "show me all ancestors of the tip of the master branch") and this will exclude feature branches, and is ideal for bisecting.


I mean...ok, I guess if you allow committing directly to master, and you allow merging in both directions...sure, I see your point.

In our repos, we allow neither, so all non-merge commits are, by definition, on a feature branch on the right-hand-side.

I guess one person's hacky convention is another's primary workflow. ¯\_(ツ)_/¯


Does this mean you can't have merge commits in feature branches?

And what about maintaining a release branch? There are good reasons to be merging in both directions sometimes, even if you never commit directly to master.


> Who benefits?

We do, simply because it's less cognitive overhead to read log/blame output when it's less chaotic. This doesn't mean that there's zero chaos in the commit history of course. But less than when we'd just never fix simple mistakes right away.


I figured this benefit would be much greater in a public-facing repo. After all, there are often a few commits fixing minor linting errors etc. just before the final merge. Rebasing these away reduces noise a lot.

Rebasing on master/stable/release branches is another story altogether.


Because git bisect.


Accidental commits/pushes do happen, too. Keys, connection strings, even just mildly embarrassing comments that might show your penchant for Taylor Swift lyrics. It's impossible to remove those completely from existing repositories, as far as I understand it, but for new clones it is possible, and it's a damn sight preferable to a commit with message "oops, didn't mean to share this" which _duplicates_ the accident.



