This is a great idea. Interestingly, Google has a tool that will analyse your git history and identify "hotspots", i.e. code that is regularly associated with commit messages containing words like "fix".
I'm wondering if the same general idea is applicable to other types of commits given your list. For example, if you are regularly adding features and a certain part of the code base is touched, perhaps with a lower ratio of "refactor" commits, that code could be a solid candidate for refactoring.
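A rough sketch of the hotspot idea, in Python. It assumes the history has already been loaded as (message, changed files) pairs, for example from `git log --name-only`. The `Commit` record, the keyword list and the `hotspots` function are made up for illustration; this is not how Google's tool is implemented.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Commit(NamedTuple):
    """Hypothetical record: one commit's message and the files it touched."""
    message: str
    files: Tuple[str, ...]


# Assumed keyword list; tune it to your team's vocabulary.
FIX_WORDS = re.compile(r"\b(fix|fixes|fixed|bug)\b", re.IGNORECASE)


def hotspots(commits: Iterable[Commit]) -> Counter:
    """Count, per file, how many fix-like commits touched it."""
    counts: Counter = Counter()
    for commit in commits:
        if FIX_WORDS.search(commit.message):
            # set() so a file listed twice in one commit counts once
            counts.update(set(commit.files))
    return counts


history = [
    Commit("Fix login redirect", ("auth/login.py", "auth/session.py")),
    Commit("Add cookie support", ("auth/session.py",)),
    Commit("Fixed session expiry bug", ("auth/session.py",)),
]
ranking = hotspots(history).most_common()
```

Swapping `FIX_WORDS` for a "feature" or "refactor" pattern would give the per-file counts needed for the ratio idea above.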
Not a bad idea actually. It's like the PowerShell 'approved verbs'. At first I was like 'meh' but after a while it makes sense as it greatly improves discoverability. Looking at a couple of repositories I contribute to this also looks close to what 'good' committers tend to use automatically.
The current buzzword for feature flags is 'branch by abstraction'.
The idea is that instead of making a version control branch to change feature A to feature B and then merging back into the mainline of development, you build an abstraction over the thing you want to change, build a new implementation of that abstraction, switch out the two and then (if you like) remove the abstraction, all within the main line of development.
So instead of history that looks like this:
* Merge branch 'feature-cookie-login' into master
|\
| * Polish up cookie feature
| |
. * Switch from tokens to cookies
. |
. * Clean up and refactor login code
|/
.
.
.
Your history looks like this:
* Stop abstracting the authentication type.
|
* Switch from auth tokens to session cookies
|
* Add a SessionCookie authentication type.
|
* Start abstracting the authentication tokens as a generic authentication type
|
.
.
.
But with any completely arbitrary commits interspersed between those commits, as none of them break other code. The first one creates an interface, the second one reimplements the interface, the third switches the used implementation and the optional fourth removes the abstraction and deletes the old implementation.
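To make the four steps concrete, here is a minimal sketch in Python. All the names (`Authenticator`, `TokenAuth`, `SessionCookieAuth`, `login`) are invented for the example and the "credentials" are just placeholder strings.

```python
from typing import Protocol


class Authenticator(Protocol):
    """Commit 1: the abstraction that callers depend on."""

    def credentials_for(self, user: str) -> str: ...


class TokenAuth:
    """The existing behaviour, now sitting behind the abstraction."""

    def credentials_for(self, user: str) -> str:
        return f"token:{user}"


class SessionCookieAuth:
    """Commit 2: the new implementation, added alongside the old one."""

    def credentials_for(self, user: str) -> str:
        return f"cookie:{user}"


# Commit 3: switching implementations is a one-line change on the main line.
authenticator: Authenticator = SessionCookieAuth()


def login(user: str) -> str:
    # Callers never name a concrete implementation.
    return authenticator.credentials_for(user)
```

The optional fourth commit would delete `TokenAuth` and, if nothing else needs it, the `Authenticator` protocol.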
The idea is usually to allow committing partially completed or deployed features, which are hidden by config flags until you're ready to activate them. When the feature is fully baked and effectively always on in production, you just remove the flag to make it permanent.
Using feature flags can increase a team's productivity by encouraging multiple commits a day, every day. It can also make rollbacks faster.
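A config flag can be as small as this. The sketch below is Python and assumes flags live in environment variables named `FEATURE_<NAME>`; that naming convention and both function names are hypothetical.

```python
import os


def flag_enabled(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment (assumed FEATURE_<NAME> convention)."""
    raw = os.environ.get(f"FEATURE_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def login_method() -> str:
    # Half-finished code can be merged and deployed; it only runs once the flag is on.
    if flag_enabled("cookie_login"):
        return "session-cookie"
    return "auth-token"
```

Turning the flag off again is the fast rollback; deleting the `if` is the "remove the flag to make it permanent" step.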
For my personal projects I often commit unfinished stuff. Using the kinds of flags you describe sounds like a good idea for projects where more people are involved indeed. Thanks.
Well, it works for anything that you deploy often. Some features are just big, and you don't want to have something sitting there undeployed for weeks. It helps even more with bigger teams: merging and deploying often is obviously good in general, but it also helps make sure your individual parts don't slow anything else down, especially if the feature isn't self-contained.
I particularly like this because it doesn't interfere with the flow of the commit message's first line in explaining what it does. There are too many commits out there that waste half the first line with the ticket number, area of code, etc.
Instead of 'JAT-1241: app/index.js(opt): Optimised the index', 'Optimised the index' should be fine. Tools can understand that, and can already work out which files changed.
Some of the active verbs are also commands that automatically close/reference issues right out of the box on GitHub & BitBucket (& I'm sure on GitLab too)
This is a really nice way of doing commits. It's the kind of simplicity that can be made into a simple document and shared easily, or printed out and put somewhere everyone can see.
Might try and get my team into using this.
At the moment our team's commit messages are a mangled mess of everyone's own 'commit language'. It can be really tricky to quickly scan over commit logs and get a feel for where development has been heading over the last x weeks.
I use a similar strategy with my team. In addition, I ask them to summarize in one line the job they are going to do before starting it, which is related to the task description on the task board. That is normally the commit message. Works (most of the time...).
How do you enforce such commit messages? People make mistakes, or forget stuff. But when you have a pull request, all intermediate commits are already pushed to the central repository. They are already public. You can't change them anymore. Pre-commit hook?
The commit-msg hook. You can use it to validate your project state or commit message before allowing a commit to go through. The git docs demonstrate using this hook to check that your commit message is conformant to a required pattern.
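For illustration, a commit-msg hook could look roughly like this in Python. Git runs the hook with the path of the file containing the proposed message as its first argument, and a non-zero exit status aborts the commit. The `[ref: 123]` policy is only an assumed example of a "required pattern"; substitute whatever your team agrees on.

```python
#!/usr/bin/env python3
import re
import sys

# Hypothetical policy: the message must mention an issue as "[ref: 123]".
REQUIRED = re.compile(r"\[ref: \d+\]")


def is_valid(message: str) -> bool:
    """Return True if the message satisfies the required pattern."""
    return REQUIRED.search(message) is not None


def main(argv: list) -> int:
    # argv[1] is the path of the file holding the proposed commit message.
    with open(argv[1], encoding="utf-8") as fh:
        message = fh.read()
    if not is_valid(message):
        print("commit rejected: message must contain [ref: <issue number>]",
              file=sys.stderr)
        return 1  # a non-zero exit status aborts the commit
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved as `.git/hooks/commit-msg` and made executable, this runs on every local commit. Hooks are not copied by `git clone`, so each developer has to install it (or the check has to be repeated on the server).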
There are so many articles like this, and all of them focus so much on prescriptive rules for commit messages. There's a similar set of articles on how to deal with branching and merging. Somehow, everybody comes to slightly different conclusions.
The discussion that needs to happen before this is to understand what tools you want to make available to your developers in the future. Using git's history as a first class debugging tool is powerful, but it's by no means mandatory to provide. There's also a real cost to providing each of the tools.
- Do you want bisect to be available? Well, then you should have most commits represent a fully-functional version of the software. Consider squashing branches when you merge them.
- Do you want narrative documentation around strange choices? Fine-grained commits are a great place to put those thoughts, but they may discourage devs from writing those thoughts in inline comments.
- Do you want ownership via git blame? Line-by-line changes may help you identify who wrote the code, but that might prevent your developers from ever fully transferring ownership, which could create bottlenecks in startups that have a few long-tenure devs and a lot of recently hired devs.
I really like to think of git history as a context tool, like monitoring or unit testing or documentation. It's worthwhile to sit down with your team, define what you want them to be able to do with commit history, and build your commit style from there.
My personal, subjective impression: Commits are getting smaller and smaller nowadays. As in: In the Subversion days, many people committed only a few times a day, sometimes not for several days. SVN commits of course involved a sync with the server (a "push" in git lingo), and thus usually represented a much larger increment with a substantial change to the code base [X]
With git, it became very common to structure changes to a code base in many, very small commits. Rename a variable? Commit. Write some docs? Commit. Of course, the overall changes when developing a feature did not become smaller, they are now just distributed over many more commits. So I'd argue that an SVN commit was often conceptually closer to what we now have with a git pull-request.
Why does this matter? Because it is kind of hard, and doesn't help anyone, to describe your renaming of a local variable with an extensive docstring.
What I do miss, however, is a good description of the overall change. I.e. now often the description in the merge commit is just the autogenerated message, but this is where I would like people to really take the time and describe the change extensively. This is why I like `--squash` merges, because they let people focus on the relevant parts in their description. I know, rewriting history is bad, but overall, I favour reading a history book over 18th century newspapers.
[X] not saying that there weren't small one-line-change commits, but overall they were rarer.
Never thought of that usage of merge commits. This is a great place to write the couple paragraphs that you might have in a Pull Request, better than squashing IMO.
I've found that for smaller commits, if you have something long you want to explain in the commit message body... you should probably put it in a code comment!
If you don't think it merits a code comment, it's probably not important enough for people to look up the commit message body either (if only because the commit message body is less likely to be seen).
Changing public history is bad, because it makes collaboration and two devs working on one branch harder.
But I do not see a problem with rewriting history on a branch, if (and only if) you kind of know that no one else is pulling the changes. Or, when merging a PR, a rewrite is okay too, if the next feature will be branched off of the trunk, too.
Also, Mercurial's tooling (https://www.mercurial-scm.org/wiki/ChangesetEvolution) seems to help with rewritten history by making it easier to track history rewrites. Basically I think this is a path in version control systems worth exploring.
Not only not a problem, but a must in my book and I'm fairly sure I'm not alone. For me it's like a new workflow which I always wanted but never could have without git. A lot of days for me now consist of creating a lot of small commits and then every couple of hours, when a single 'thing' is finished, starting an interactive rebase and creating a storyline which is easy to read, understand and follow. This can even be one commit sometimes if it makes sense. And in repos I manage myself, if the change spans several days it's usually big and I might create a separate branch and have a merge commit so it's extra clear all commits belong to feature/xxx.
I find tons of small commits a clutter and a waste of time. I don't see any reason for doing so. On the contrary, I can see a disadvantage - reading and understanding a history later may become a difficult task. After all what counts is your full chunk of work, reviewed via pull request, and merged to master. It should be treated as a whole.
Has it really become so common with git? I don't see such a trend around me.
>On the contrary, I can see a disadvantage - reading and understanding a history later may become a difficult task.
I'm replying to you but this is directed at everybody who advocates squash merge and discourages small commits.
IMO this is a tooling problem, plain and simple. When I am committing to Git, I am using the "write" components of Git which are incredibly powerful. I can commit in as small a chunk as I want and preserve the richest history of all the small changes I've made, knowing full well that the state of the code at HEAD will not be degraded for doing so. If I make two small independent changes, I can feel free to branch them separately and then merge them together to show that they could have been performed in any order.
When you read my history, you are using the "read" components of Git. Unfortunately these are not as powerful. You can do some nice things, like if you want to treat history as a straight line you can use `git log --first-parent` and you'll see only the merge commits (as if all merges had been squash-rebases).
It would be much better if you were able to collapse or expand any sequence of linear commits to gloss over the lower level details. But as far as I'm concerned, this is a problem with the "read" components of Git, not the "write" components, and so I will continue to use the "write" components to their full power. And the best part is that if I do it this way, we can improve the "read" components and allow the reader to collapse my verbose history, but we will never be able to expand pre-collapsed history.
The main reason I request commits to be split up is for ease of code review. It's much easier to review three commits that each do one easily comprehensible small thing than one commit that does three things at once. It's also better if you find there's a bug -- you can bisect down to a commit that's fairly small where the bug should be easy to see, rather than one that's enormous and where the bug is hard to find among all the other changes.
I think it is a matter of how you define "small" and "enormous". If you have a small thing, easily comprehensible but big enough to be a complete piece of work, then you probably also have a separate task for it, and the change you introduce doesn't break the build. So in the end it's just a perfect candidate for a pull request.
But note the comment above mentioned a commit for variable change. Or a commit for adding some comment sentence. Nano commits they are.
Sure, tasks should be small, easy to get, easy to review. But there must be a balance. Going to extreme, both ways, doesn't do any good.
Indeed, if the commits are individually reviewable it is nicer. On the other hand, these small commits can often be a bit messy. Sometimes you'll find commits that are reverted later on, or fixed up later on. I.e. for commit-level review to work well, it helps a lot if the history has been polished.
Small, incremental commits are an asset with git blame, git bisect and git revert. I find it much easier to deal with too many small ones, rather than too few large ones. Especially if you keep the convention that master is always "merged into", i.e. "left of the merge", i.e. "parent 1".
I find very small commits in particular to be tedious and error-prone (sometimes the software doesn't even build, because the developer distributed two not-so-independent changes over two commits and the connection wasn't obvious). Then you have a failed build and you don't really know whether `git bisect` just beamed you into the middle of a refactoring or whether there is an actual issue.
> After all what counts is your full chunk of work, reviewed via pull request, and merged to master. It should be treated as a whole.
I find the PR mechanism works great for the view of the whole, whereas the individual commits are great for the pieces. So in my commit history, you can read the timeline, and then if you want to see the commits squashed down, you click on the individual PR. On the PR screen (assuming you're using GitHub), it has a nice list of the subject lines of each of the individual commits.
Commits can serve as a supplement to documentation. When you properly commit the different logical steps that led to the current state of the code, it becomes much easier for another team member to understand why and how you implemented things a certain way.
Would be interesting if there was a way to annotate a set of commits, like "commit ???? - ????: refactored A,B, and C" so you'd get the advantage of small commits and clearer messages.
This is what PRs are good for. Also, with my particular approach to commits, I always have at least one issue associated to a commit, and I'm always working on a particular branch associated to the issue. I pick an emoji that captures the issue/branch in a single concept, and I have that in my subject line. This is combined with my git commit template mechanism, and I like it. At a glance, I can see which commits belong together, and if I want to look at the whole, I go to the PR.
I think you can do that in a merge commit, sort of.
The more I think about it, the stranger a strong aversion to rewriting commit history for clarity is. In university if I did some math / physics calculation, I would often start, and once I got somewhere, make a clean copy of the successful work to have a concise and revised version.
Mostly fine to do it on a feature/PR branch also, in my opinion.
If those become long-lived with multiple people touching them (where history rewrites become problematic) you are not integrating continuously enough.
Unfortunately, I'm guilty of the opposite: I rarely, rarely commit. Maybe one commit per point. I have to consciously remind myself to commit more often.
But it was also kind of the tooling. Most svn projects I worked on were trunk-based and thus integrated much more tightly than git feature-branch based code. However, the times I merged Subversion branches, I was kind of sure that Subversion lost some changes.
If you like this format, then you may want to try a similar format that serves the same purpose, but uses words that are easier to read and that make more sense to people in more cultures.
We use Add, Fix, Refactor, Reformat, Optimize, etc.
Agreed. I started using these on private & professional projects a year ago (and mostly got the team to use them, too) and it's a pleasure to browse the git log!
In the beginning the definition of "scope" is a bit wonky per project. However, once it solidifies you can easily start going through your log looking for "feat(endpoint)" to find new routes that have been added to an API for example.
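A small sketch of what "going through your log" for a type and scope might look like in Python. It assumes subjects shaped like `type(scope): summary` with the scope optional; the regex and the `Subject`, `parse_subject` and `with_scope` names are illustrative, not part of any official tool.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Subject(NamedTuple):
    kind: str
    scope: Optional[str]
    summary: str


# Assumed shape: type, optional "(scope)", colon, space, summary.
SUBJECT_RE = re.compile(r"^(?P<kind>\w+)(?:\((?P<scope>[^)]+)\))?: (?P<summary>.+)$")


def parse_subject(line: str) -> Optional[Subject]:
    """Split a subject line into its parts, or return None if it doesn't fit."""
    match = SUBJECT_RE.match(line)
    if match is None:
        return None
    return Subject(match["kind"], match["scope"], match["summary"])


def with_scope(lines: Iterable[str], kind: str, scope: str) -> List[Subject]:
    """Keep only subjects matching the given type and scope."""
    parsed = (parse_subject(line) for line in lines)
    return [s for s in parsed if s is not None and s.kind == kind and s.scope == scope]
```

For a quick look, `git log --oneline --grep` with the literal prefix does the same job without any parsing.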
I've been writing commit messages this way for a little under a year (I think this is the same guide I used when I was looking for a more consistent form to write them and to avoid the dreaded -m "Fixed some things").
One thing I noticed is that it's increased my confidence in my commits. When I go to write the commit message, describing why I made the decisions I did exposes inconsistencies between what I've actually done and what I think I've done. If I'm able to explain all the changes, I'm much more confident that they're correct.
Another is that bad commit messages have trained people not to read them. Often people will ask why I've made a change and then discover that the commit message contains the answer to their question!
Why? Because the git CLI doesn't wrap properly? To borrow a quote, that seems like a 'you' problem, not a me problem.
Maybe I'm just biased because these days I almost entirely interact with git through a GUI (either desktop client or web interface), and though I use the CLI occasionally (mostly for branch management, sometimes for quick commits) I can't think of the last time I used it for any type of history viewing -- pretty much any GUI is going to do a better job of that.
My team often uses markdown (mainly bulleted lists) and the output looks terrible when you insert manual line breaks (because markdown interprets that as meaning that you explicitly want a line break there) and you're viewing it on a screen/viewport that is either larger or smaller than 72 characters wide.
Unless you're explicitly using a publishing format (eg, LaTeX, PDF, postscript), the function of wrapping text should be a concern of the rendering of the output, not the origin.
Am I missing something here? Is there any other reason to manually wrap text besides the git CLI's handling of it as a viewer?
Linus Torvalds answered exactly this question [0]. Not that that means you should unblinkingly take it on authority, but the original reasoning is that the renderer doesn't always know when a line should be wrapped. Examples: a stack trace, a long log line, or essentially any other quoted artifact that has a specific pre-determined format.
The relevant quote from the link:
Some things should not be word-wrapped. They may be some kind of
quoted text - long compiler error messages, oops reports, whatever.
Things that have a certain specific format.
The tool displaying the thing can't know. The person writing the
commit message can. End result: you'd better do word-wrapping at
commit time, because that's the only time you know the difference.
I understand this rationale, but I think most developers will encounter commit messages written by bozos who don't press enter after every 72 characters, far more frequently than commit messages that contain stack traces or other fixed-format artifacts. (Disclosure: I am one of these bozos.) The tool flubs every non-wrapped paragraph just so it can preserve the occasional blob of ASCII art.
If the tool applied reasonable wrapping heuristics and got it wrong once in a while, it could easily offer a `--no-wrap` option to let users see the message exactly as it was composed.
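As a thought experiment, such a heuristic might look like the Python sketch below. The rule it uses (any paragraph containing an indented line is treated as preformatted) is purely an assumption for the example, and nothing like this exists in Git itself; `wrap_message` and its `no_wrap` parameter are hypothetical.

```python
import textwrap


def wrap_message(message: str, width: int = 72, no_wrap: bool = False) -> str:
    """Re-wrap prose paragraphs; leave indented blocks untouched.

    Heuristic (an assumption, not Git behaviour): a paragraph in which any
    line starts with whitespace is treated as preformatted.
    """
    if no_wrap:
        return message  # show the message exactly as it was composed
    out = []
    for paragraph in message.split("\n\n"):
        lines = paragraph.split("\n")
        if any(line[:1] in (" ", "\t") for line in lines):
            out.append(paragraph)  # stack trace, ASCII art, quoted output
        else:
            out.append(textwrap.fill(" ".join(lines), width=width))
    return "\n\n".join(out)
```

It also shows the other side of the argument: an unindented bulleted list would be merged into one paragraph, which is exactly the kind of wrong guess the quoted reasoning warns about.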
Sounds like your markdown interpreter has an issue, or you're leaving lots of white space at the end of your lines.
Generally, in markdown, if you insert a line break, it won't translate to an explicit line break unless you put two in a row, or if there are 2+ spaces at the end of the line.
First off, the commit message is plain text (by design) and can't be "wrapped" automatically, and any tool that tried would be insane.
The reason for 72 characters is that the CLI, like lots of other presentation mechanisms (including quoting in other commits or in code), wants to indent your message for readability. And the uniform standard width for terminals has been 80 characters for like four decades now.
Must it be? I dunno. I can imagine a uniform agreement among a broad team that everyone will assume a 100 character line and all tools should enforce that. Maybe a little more, but not that much because even on a modern screen you want to have two full terminals of text readable at a time.
But that's just a number. You'd still be told by your commit message style guide (or checkpatch.pl, or whatever) to wrap your lines manually at 92 characters. Is 25% more bytes on a line really worth yelling about?
>This text box I'm replying to you with is a plain text textbox. It word-wraps just fine.
No it doesn't. In fact I had to edit a comment I made above this one about four times until I got the formatting right. It annoyingly ignored me doing this:
You're missing my point because you don't understand the meaning I'm going for here. The comment box is not markdown, it's plaintext. This is not a rich text WYSIWYG editor.
My original point is that plenty of software does plaintext wrapping just fine and it's pretty ridiculous that consoles are stuck in this 80 character mindset.
I really like that Gerrit code reviews allow reviewers to comment on the commit message in the same way as on the code. The way to ensure useful commit message practices is the review process, if you ask me.
2. I also use emoji (with a key there in the git commit template for reference) to communicate concepts like implemented, bug fix, etc., in a single emoji character. Note though that it is inappropriate when you start to do a pull request, as not all git message viewers will display the emoji. But I find it very useful FTMP.
3. Bullets are a simple way to enable both terse and structured comments within a commit.
Does nobody else require there to be at least one issue tracker reference in the commit message these days? It makes automating stuff like release notes much easier.
It doesn't matter that much. Some things in programming matter a lot, like consistent indentation and casing. Some things don't, like perfectly formatted commit messages. Arguing about whether the verbs in commit messages should be in imperative or present tense is just too much.
I can tell you that I have been programming for a very long time. So if someone asks me "Why does it MATTER if lines are longer than 80 characters?" then I can answer "Because it means that the distribution of line-lengths will be more uniform, meaning that you will be able to fit more code on the screen, meaning that the amount of scrolling you'll have to do when trying to understand the code is greatly reduced." And scrolling interrupts your focus, which is bad. But if I ask someone, "Why does it MATTER if verbs are in present or imperative tense?" no one can answer.
One reason I can think of for using the imperative tense is to use commit messages to generate changelogs. AngularJS does it this way, wouldn't be surprised if there were others.
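A toy version of the changelog idea in Python, assuming every subject starts with an imperative verb. The verb-to-section mapping and the `changelog` function are made up for the example and are not AngularJS's actual tooling.

```python
from collections import defaultdict

# Hypothetical mapping from leading verb to changelog section.
SECTIONS = {"Add": "Added", "Fix": "Fixed", "Remove": "Removed"}


def changelog(subjects):
    """Group imperative subject lines under section headings."""
    grouped = defaultdict(list)
    for subject in subjects:
        verb, _, rest = subject.partition(" ")
        section = SECTIONS.get(verb)
        if section and rest:
            grouped[section].append(rest)
        # Subjects with an unknown verb (e.g. "Bump ...") are left out.
    lines = []
    for section in SECTIONS.values():
        if grouped[section]:
            lines.append(f"{section}:")
            lines.extend(f"  - {item}" for item in grouped[section])
    return "\n".join(lines)
```

Because the verb is always the first word and always in the same form, the grouping needs no natural-language guessing, which is the practical argument for fixing the tense or mood at all.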
Git has zero opinion on when you commit or what the messages are. Commits are made in private. Commits are also immutable. This is a losing battle.
If you use a code management tool like BitBucket or GitHub, it seems like the unit of work is less a commit and more a PR. A PR's description can always be edited and refined for future engineers, and a PR (almost) always represents a block of work that can be reverted. Instead of creating a hundred different approaches to writing commits that engineers essentially have to get right the first time, why not just focus on documentation in the form of PRs? It seems like if you could trivially tie a commit back to the PR and ticket it came from, most of these problems would be solved.
Private commits aren't immutable at all. You can change anything about them up until you push them to a remote repository that other people are syncing from. I routinely squash a bunch of private commits together before issuing a public PR, for example.
Commits are immutable; when you squash commits you're replacing them with new ones and rewriting the history. And since there is no difference between your local repository and the remote one, the reason why you wouldn't squash commits on the remote repository is to be nice to your coworkers.
So unlike PRs, if your coworkers criticize your commit messages it's already too late.
> And since there is no difference between your local repository and the remote one
Uhh, there's plenty of difference between your local repository and the remote one. One is local and used just by you and one is remote and used by many, for starters. Changes aren't automatically synced between them. You can rewrite commits locally to your heart's content (which I do all the time). You aren't "locked in" to anything until you've pushed it to a shared repository and someone syncs from it.
All those differences you mentioned are not related to implementation. The remote repo is functionally identical to your local repo. You can "rewrite commits to your heart's content" on the remote repo too, there is nothing stopping you. Like I said, the only reason you don't is to be nice to your coworkers.
By the time somebody else criticizes your commit messages, you've already pushed. It's too late. All that time you had while the commit was local-only means nothing, unless you brought your coworkers over to your computer to review your commits before creating a PR. Expecting developers to get commits right the first time (before pushing) is not a sustainable solution.
Calling commits "immutable" is thus verging into technically correct, but misleading and thus not useful territory. It's giving people the impression that commits are set in stone the moment they are made, when in fact this is far from the truth and you can go back and rewrite them, insert new commits between existing ones, squash them together, remove them entirely, etc. The only "gotcha" to watch out for is if other people have synced your changes; then you start having problems.
It's not misleading at all. If you forget that commits are immutable, then you get people who think they can edit a commit message and push it up no problem.
The main issue with commits is that they end up representing more than a single logical change.
Are there any tools that allow you to initiate a commit as an 'intention stack' for work I am about to do, rather than work I have already done?
I'd love to be able to write an intent message "refactor XYZ.." before starting in on that activity, then when I have to go down another rabbit hole in the middle I push another intent message to the stack, then pop back out afterward and continue with XYZ. The final overall commit message could be auto generated from the initial intention and all tangents.
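I don't know of an existing tool for this, but the data structure itself is small. Here is a hypothetical sketch in Python; `IntentStack` and its methods are invented for illustration and do not talk to git at all.

```python
class IntentStack:
    """Track nested intentions and render them as a commit message."""

    def __init__(self) -> None:
        self._open = []  # indices into _log of intentions still in progress
        self._log = []   # (depth, text) pairs in the order they were started

    def push(self, text: str) -> None:
        """Start a new intention nested inside whatever is currently open."""
        self._log.append((len(self._open), text))
        self._open.append(len(self._log) - 1)

    def pop(self) -> str:
        """Finish the innermost intention and return its text."""
        index = self._open.pop()
        return self._log[index][1]

    def message(self) -> str:
        """First intention becomes the subject; tangents become the body."""
        if not self._log:
            return ""
        subject = self._log[0][1]
        body = [f"{'  ' * (depth - 1)}- {text}" for depth, text in self._log[1:]]
        return subject if not body else subject + "\n\n" + "\n".join(body)
```

Pushing "Refactor XYZ", detouring through a tangent and popping back would yield a message with "Refactor XYZ" as the subject and the tangent as a bullet in the body.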
My team uses the task number and title as the commit message. We just can't fit the whole task description into a one-line message. So when we look at git blame, we get a reference to a task in the project management tool, where we can find all the reasons behind a given change.
If there are more commits for the same task, all of them bear the same commit message. After each review, we add a comment to the project management tool. That way we focus only on newer commits when performing another round of reviews.
Every other attempt to describe changes and intentions in one line seems doomed to me.
I've really welcomed the recent GitHub features such as "squash & merge" and "rebase & merge". Instead of parsing through 200 commits of "typo", "lint", "spacing", a feature is placed in a compact diff with the feature title & related task id as the commit message.
A commit should be a unit of work. Tests should pass before and after. The commit message should describe the change. Ensuring that you have a good commit message then influences what content you contain in a commit.
I've found that writing commit messages by stating the problem in the subject, and an explanation of the solution in the body makes them very clear and descriptive.
For example:
Problem: Windows build script requires edit of VS version
Solution: Use CMD.EXE environment variable to extract
DevStudio version number and build using it.
I got the idea from ZeroMQ's/hintjens' various repos.
We've used this style in our commit messages for a few years, and it's been great. We also only allow fast-forward merges of squashed commits for our branching strategy.
[ISSUE-ID]: Title
[Problem]
State problem and customer impact
[Solution]
Describe and justify solution
[Test]
Describe automated/manual testing
He mentions using the imperative mood for the subject line, but doesn't anyone else think it makes more sense and sounds better to use the present tense, e.g.
It's been submitted to a message board for comments; did you expect everything to be positive?
Git commit messages are hardly significant, unless you are doing some form of automated release notes - and then you just follow whatever convention is required.
The most important tip is "Use the body to explain what and why vs. how".
I'd also say: remove the -m option from git and force people to open the editor. Do not accept messages shorter than 3 lines, and start the editor with a template.
For my own commits in my personal repos I often find that less than 80 chars are sufficient to describe the change entirely. I generally don't care about surpassing 80 chars for commit messages though, so even if my commit message is somewhat long I won't split it up.
I commit very often, and usually in small, "atomic" changes.
What I will do is start the commit message with the most important sentence and add less important sentences after, so even if it exceeds 80 chars and you can't see the whole message you will still get the most relevant info without having to scroll sideways.
Totally agree. Using three lines of text to describe fixing a typo in the comments or rewording/reformatting a block of text will just lead to an unclear message.
It's likely that this would push the tools toward being too opinionated. I spend a lot of time pushing little WIPs so I know where I am and my macros rely heavily on -m, e.g. I end up with a commit message like "WIP: don't merge these debugging lines" or "WIP: Tweak the IO timeout and profile". -m is also useful for making trivial commits, I don't use it for this often because it's easy to violate 50 chars but it's a perfectly valid use of the tool IMO.
We agree on a short list of leading active verbs:
Add = Create a capability e.g. feature, test, dependency.
Cut = Remove a capability e.g. feature, test, dependency.
Fix = Fix an issue e.g. bug, typo, accident, misstatement.
Bump = Increase the version of something e.g. dependency.
Make = Change the build process, or tooling, or infra.
Start = Begin doing something; e.g. create a feature flag.
Stop = End doing something; e.g. remove a feature flag.
Refactor = A code change that MUST be just a refactoring.
Reformat = Refactor of formatting, e.g. omit whitespace.
Optimize = Refactor of performance, e.g. speed up code.
Document = Refactor of documentation, e.g. help files.
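If you want a machine to check the list, something like this Python sketch would do. The verb set is copied from the list above; the `leading_verb` helper is hypothetical.

```python
# The agreed leading verbs, copied from the list above.
APPROVED_VERBS = frozenset({
    "Add", "Cut", "Fix", "Bump", "Make", "Start", "Stop",
    "Refactor", "Reformat", "Optimize", "Document",
})


def leading_verb(subject: str):
    """Return the approved verb that starts the subject, or None."""
    words = subject.split(maxsplit=1)
    if words and words[0] in APPROVED_VERBS:
        return words[0]
    return None
```

The match is deliberately exact and case-sensitive, so "Fixed some things" or "fix typo" would be flagged rather than silently accepted.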