Things I wish everyone knew about Git (Part II) (plover.com)
268 points by emmelaich on July 20, 2022 | 136 comments



I said this in another comment recently about git, but I find it odd that even after a decade of using git as a mandatory part of my professional life, I'm still learning new things.

That's maybe not a good sign about the usability of git.

That `@{'3 days ago'}` thing? That's incredible. Why didn't I know about that 8 years ago? Why wasn't it obvious, intuitive to me that this was possible?

git is brilliant and I love it. But there are very few affordances that make it obvious what to do next, what is possible. There are few patterns in it where I can apply what I already know.

The thing that replaces git will have most of the power of git, but an intuitive interface that makes learning everything about it easy.


> I said this in another comment recently about git, but I find it odd that even after a decade of using git as a mandatory part of my professional life, I'm still learning new things.

If I can humbly offer a suggestion: read the man pages. A man page read once thoroughly is much more useful than the same man page skimmed 1000 times. The syntax in question is in the rev-parse man page (`git help rev-parse`).
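For anyone who wants to try the `@{...}` syntax without digging through the man page first, here is a minimal sketch. It assumes Git 2.28 or newer (for `git init -b`) and uses a throwaway repository; the file name and identity are made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.invalid"

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two > file.txt
git commit -q -am "second"

# Ordinal form: where main pointed one reflog entry ago.
previous=$(git rev-parse 'main@{1}')
echo "$previous"

# Date forms look the same, but only make sense in a repository
# whose reflog actually reaches back that far:
#   git rev-parse 'main@{3 days ago}'
#   git diff '@{yesterday}'
```

Both forms read the reflog, so they describe where the ref pointed in your local clone, not what the project history looked like at that time.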

These are tools we use every day. Look, I'm the first person to throw out those furniture assembly instructions and dive right in. But the truth is, we'd all be better off reading the documentation for our tools.

The git man pages are far from the best, but they contain tons of useful information.

And it's not just with git that developers seem allergic to reading documentation. I noticed a colleague the other day doing this from a shell script:

    BRANCH=$(python -c 'import os; print(os.environ["BRANCH"].split("/")[1])')
I asked what they were trying to do. They wanted to strip "origin/" from the front of BRANCH. I showed them you can do that directly in the shell:

    BRANCH="${BRANCH#origin/}"  # strip origin/ from front
Or:

    BRANCH="${BRANCH#*/}"       # strip */ from front
They'd never read the bash man page. Had no idea the shell could do this. So I pointed them at https://www.gnu.org/software/bash/manual/html_node/Shell-Par...
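A quick sketch of how the two expansions above behave, plus the `##` variant, on a branch name that itself contains a slash. The branch name is just an illustrative value.

```shell
BRANCH="origin/feature/login"

# Remove the literal prefix "origin/".
stripped_literal="${BRANCH#origin/}"

# Remove the shortest prefix matching "*/" (everything up to the first slash).
stripped_first="${BRANCH#*/}"

# Remove the longest prefix matching "*/" (everything up to the last slash).
stripped_all="${BRANCH##*/}"

printf '%s\n' "$stripped_literal" "$stripped_first" "$stripped_all"
```

Note that the Python one-liner's `split("/")[1]` would give just `feature` for this input, so the shell versions are not only shorter but also keep the rest of the branch name.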

I think we'd all be better developers if we RTFM, at least for the tools/languages we use every day. You simply don't know what you don't know. This way you at least know a little more what you don't know. :-)


I think rather than asking everyone to read the manual, the designers of software systems ought to all read Don Norman's "The Design of Everyday Things".

Apple products became what they are because no manual was needed. They were designed with patterns that were intuitive, affordances, and a deep understanding of human psychology.

What I am arguing is that git lacks those very things that make reading a manual not needed.


I agree with your point, but now I wonder how many people would understand the Python version and how many the bash version.


I don't think the fact that you keep discovering features is a bad sign for a tool like git. It would be a bad sign if you kept discovering new footguns in the features you use (there is a fair bit of software like that).

I'm 25 and a couple weeks ago I discovered a new way to use a knife to prepare a fish, even though I've been doing that all my life. I don't think knives or fish have bad UX, even though I keep discovering new ways to use knives on fish. What's important is that I don't keep discovering new ways to hurt myself with knives.


All those bones in fish are pretty bad UX, compared to some of the competition.


Git is just the new regex. Everybody uses it, but most have no clue how it works and just copy/paste stuff from other places or repeat the same little trick.


No, that’s terrifyingly untrue.

Git is more like… the new car. It’s complicated and most people have no idea how it works, but they know how to steer very simply and can drive it with a bit of training. Experienced people however, who know how cars work, can do really cool things with them and have no problem repairing them on their own.

Still, cars are useful. A bike would be better of course.


I like it! I'll do you one further. Git is more like... someone ripped out all the controls from your car and installed a Boeing 787 dashboard. You'd probably figure out how to start it and make it go forward, left and right, but you'd miss out on a lot of functionality and likely destroy something while playing with unknown buttons.


I find regex easier to write than to read.

I never copy them from stackoverflow because I'm worried it does some weird magic that I don't want. Instead I build them against one or more examples.


> I find regex easier to write than to read.

There is nothing new under the sun.

Joel, 22 years ago:

There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. [...] The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming: It’s harder to read code than to write it.

https://www.joelonsoftware.com/2000/04/06/things-you-should-...


Joel is 100% spot on. This is exactly why I don’t think things like Copilot will catch on. Making programming involve less writing code and more reading code actually makes the job more difficult.


And the corollary: since code is harder to read than write, steer clear of writing clever code where simple code would suffice, as the clever code will be much harder to read.


A second argument is that debugging is harder than writing.



s/writing/reading


I say that because there's a meme that nobody writes regex and that everyone copy-pastes it from stackoverflow.


I can highly recommend regexr.com for testing and developing regex patterns; paste a representative sample in the "Text" area and it'll show you how things match (and usually, crucially, don't match) as you change the regex pattern.

I'm not associated with them, and I understand you don't seem to have any real issues with regex, but thought this would be a good place to mention a useful tool.


Using such testing you can only prove a regex isn’t what you want for specific inputs, and make a number of plausibility checks, but you can’t prove it is what you want for all inputs. For that you need to do the reasoning on the expression as if you had built it yourself.


I've used that and regexbuddy and others over the years. Almost anything that gives you some visual rendering of what's going on is helpful. Personally, I've taken to using the regex stuff in the Jetbrains IDEs, mostly because I'm there already most of the time, and it's 'good enough'. But I'm not always at my own setup, and regexr and similar are always a good tool to have (and to share with others to show them how a regex is working).

If I did this all the time, I might not need tools like this. But complex regex are something I only dive in to a handful of times per year, and it's never the same problem twice.


Nice! I’ve been writing (and reading) regexes for over 20 years now. I can usually ‘read’ them on sight, but I’ve been wanting this tool for when it all goes wrong.

Which of course always happens at least once or twice in any new dataset/usage.


Thanks for pointing out regexr.com.

I really appreciate the availability of tools like that.

See also the Emacs built-in M-x re-builder (for elisp-style regexp).


Totally agree! But oddly enough, I have this problem with all code/languages… I always write code much more easily than I read it. I can write fairly complex applications just by following the logic of what the application must do in my brain (but with ZERO cleverness). But give me the source code of even the simplest Unix command and I’ll struggle to understand the purpose of every #define, the variable naming scheme, etc., etc. I’m sure this is telling me something interesting about how my brain works but I’m not sure what!


> I’m sure this is telling me something interesting about how my brain works

See my sibling comment. Joel says you are just normal.


Regex is much simpler than Git, but also the different regex implementations in use have a lot of subtle and less subtle differences, whereas there is roughly only one Git.


I use 3% of git commands and that's it. I probably use 3% of regex rules too.


> even after a decade of using git as a mandatory part of my professional life, I'm still learning new things.

I still learn new things in Python every month. I still learn new things in Magento every _day_. I'll learn something new in VIM or PyCharm or PhpStorm at least once every few weeks. I'll learn something new in my native language every few weeks, and I'll learn something new in my other languages almost every day. I'll learn something new about Bash or Linux or CORS or Selenium or some useful Chrome or Firefox setting or extension at least every week.

Learning new things is not unusual in the software development industry.


It's vindicating to see the general response in this thread vs threads about git a decade ago, where everybody defended git and decreed all of us simpletons just don't understand it enough.

Oh we do, and we understand how most of it is a flaming trash pile.


Oh no, I think it's a beautiful piece of art. With sharp edges that keep cutting my hands as I try to handle it.

I love git. I just wish it was more usable.


git could also have just followed some standards. For instance, how am I supposed to know the syntax of "3 days ago" in general?

They could have used time spans from ISO 8601 standard, so for instance simply "P3D". Or "P3Y6M4DT12H30M17S".

git is powerful but it involves a higher amount of time investment than it could if it were not reinventing the wheel and taking shortcuts as much. It's easy to see that it is and was built for powerusers mainly.


Are you really suggesting that more people know what "P3D" means than "3 days ago"?


I think more people know how to define "3 days ago" when you tell them "provide it as an ISO 8601 time span" vs. "provide it in git's own format".

When it comes to presenting it nicely, then sure, use 3 days ago. No problem with that.


I love this quote from Part I:

"Git has an elegant and powerful underlying model based on a few simple concepts:

  1. Commits are immutable snapshots of the repository
  2. Branches are named sequences of commits
  3. Every object has a unique ID, derived from its content

Built atop this elegant system is a flaming trash pile. "


The quote is kinda wrong in that points 1 and 2 are mixed up. Branches are just pointers to commits. Commits contain a reference to their history. It's only kind of incorrect because in practice branches are used to refer to a history (a sequence of commits). But it's also misleading once you have to do anything more complicated than just committing/merging.

When I started out using git I was working with the same assumptions, but I was perpetually confused. Git became a lot easier to use once that misunderstanding cleared up.

Maybe it's just like that for me but I think we might be doing newcomers to git a disservice by explaining the basics of git in this simplified manner.


Do you have an actual example of this? I've never encountered a situation where those three points are incorrect.


If you're in a detached head state and create a new commit, you have a commit that isn't on a branch. The commit still has a parent, so it's not the branch that's tracking the sequence, it's the commit. Branches are just pointers that get updated as you add commits.
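A minimal sketch of that situation, assuming Git 2.28 or newer and a throwaway repository (file name and identity are made up): the commit made while detached has a parent, yet no branch contains it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.invalid"

echo one > file.txt
git add file.txt
git commit -q -m "On main"
main_tip=$(git rev-parse main)

# Detach HEAD from the branch and commit again.
git checkout -q --detach
echo two > file.txt
git commit -q -am "Made while detached"

# The new commit still records its parent...
parent=$(git rev-parse HEAD^)

# ...but no branch contains it, and main has not moved.
branches_containing=$(git for-each-ref --contains=HEAD refs/heads)
main_after=$(git rev-parse main)
```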


This feels like declaring squares aren't rectangles because you can make rectangles that aren't a square.


Git is only commits. Branches and tags are references to the commits; one moves while the other one doesn’t. That’s it.

Commits link to their parent(s) until the initial commit. You can visualize a “branch” as a series of commits, but the same thing can exist without calling it a branch:

    A -> B -> C -> Initial
Is that a branch? No. How about:

    /refs/heads/main = A
That’s a branch, and it looks exactly the same as a tag:

    /refs/tags/v1 = A
Neither a branch nor a tag is “a series of commits”, even if you think of it that way.
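To see the "one moves, the other doesn't" part in practice, here is a small sketch in a throwaway repository (assumes Git 2.28 or newer; the tag name `v1` mirrors the example above, the rest is made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.invalid"

echo hello > file.txt
git add file.txt
git commit -q -m "Initial"
git tag v1

# Right now the branch and the tag resolve to the same commit.
branch_target=$(git rev-parse refs/heads/main)
tag_target=$(git rev-parse refs/tags/v1)

# Commit again: the branch ref follows, the tag ref stays put.
echo more >> file.txt
git commit -q -am "A"
new_branch_target=$(git rev-parse refs/heads/main)
new_tag_target=$(git rev-parse refs/tags/v1)
```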


> You can visualize a “branch” as a series of commits, but the same thing can exist without calling it a branch

You can visualize a "square" as a rectangle with equal sides, but the same thing can exist without calling it a square.


2. is true or untrue depending on how deeply you interpret it.

A branch is named, points to a commit and under normal circumstances will track the sequence of commits made via itself (i.e. when you commit and HEAD points to branch b, b starts pointing to the child commit). So you could say it's a named sequence of commits.

On the other hand you can reset the branch to any commit in the tree, even ones which aren't even on the current sequence of parents. It is still technically a named sequence of commits, just a totally different one.


A Git branch is actually very similar to a Git tag[1]. For example, there are commands to point a branch to a completely different commit in a different section of the commit graph, just like you can do with tags.

[1] The main difference perhaps is that, if you create a new commit, the branch pointer will actually be moved to point to that latest commit (while tag stays fixed).


Git supports history rewriting, so 1 isn't true. Git uses hashes for "unique IDs"; hashes aren't unique, they just have a low probability of collision, so 3 is also not true.

Having run into issues that appear to be caused by 3 not being true, I don't see that as a theoretical issue.


1 is true. When git is “rewriting” history it’s actually creating new commits and moving the branch pointer over.

Until the gc reaps them, you can absolutely git checkout the hash of any of the rewritten history and it’s still there, same as you left it. You can even move the branch pointer back, undoing the history rewrite.
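A minimal sketch of that, using `git commit --amend` as the "rewrite" (assumes Git 2.28 or newer and a throwaway repository; names are illustrative):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.invalid"

echo v1 > file.txt
git add file.txt
git commit -q -m "original message"
old=$(git rev-parse HEAD)

# "Rewrite" history: this creates a new commit and moves main to it.
git commit -q --amend -m "rewritten message"
new=$(git rev-parse HEAD)

# The old commit object is still in the object store, unchanged.
old_type=$(git cat-file -t "$old")
old_subject=$(git log -1 --format=%s "$old")

# Undo the rewrite by moving the branch pointer back.
git reset -q --hard "$old"
restored=$(git rev-parse HEAD)
```

If you didn't save the old hash, `git reflog` lists the previous positions of HEAD so you can find it again.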


So when I do a pull I get their un-rewritten history? Just because they have a short undo buffer doesn't make modifiable history immutable.


> once you have to do anything more complicated than just committing/merging

If you really have to do more complicated things with git... why? I mean that seriously, if your workflow necessitates anything more complicated than committing or merging with any level of regularity, it sounds like you have a bad workflow. Committing and merging should be 99.9% of your activity within git, shouldn't it?


It depends a little on how you use git. Apart from stashing (git-stash), I use interactive rebases a lot. Combined with autosquashing it's a very powerful tool to ensure a somewhat nice history.
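For readers who haven't seen autosquashing, a rough sketch of the idea in a throwaway repository (assumes Git 2.28 or newer; file names and messages are made up). `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so the example runs unattended; normally you would review it in your editor.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > base.txt
git add base.txt
git commit -q -m "Base"

echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"
target=$(git rev-parse HEAD)

echo unrelated > other.txt
git add other.txt
git commit -q -m "Add other file"

# Later, fix something that belongs in "Add feature".
echo "feature, fixed" > feature.txt
git commit -q -a --fixup "$target"

# Autosquash moves the fixup next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

count=$(git rev-list --count HEAD)
squashed_content=$(git show HEAD~1:feature.txt)
```

After the rebase there are three commits again, and the fix lives inside "Add feature" rather than in a separate "fixup!" commit.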

Of course, this only matters if you care about your history. It feels like there are two camps of git users: One camp squashes every MR/PR together even if it's huge, hasn't heard of git-bisect, creates plenty of merge commits in both directions, and piles unrelated changes into a single commit. The other camp cleanly groups changes into self-contained commits, rebases often resulting in a clean patch-series-style MR/PR, and likes to use git-bisect.


This observation about the two camps rings very true. The sad part is that while the second camp is the original user base of Git, at least GitHub basically acts as if they don't exist and only really caters to the first camp.


Items 1 and 2 are somewhat misleading.

Commits are only immutable in the sense that the same commit hash will (with huge likelihood) always refer to the same commit history. But they are not immutable in the sense that you couldn’t change the history of e.g. a branch. Other VCSs actually offer more immutability than Git here.

Branches are sequences of commits only up to the last merge, because they are really just labels on branch tips. In other VCSs, branches are truly linked with each commit of a sequence.

Meaning, when you’re familiar with such a VCS, reading items 1 and 2 may lead you to think “well, that’s what I know and expect from a VCS”, whereas really it is a bit different in Git.


> But they are not immutable in the sense that you couldn’t change the history of e.g. a branch.

That sentence doesn't make sense, though. Assuming from context that `they` = commits,

"[commits] are not immutable in the sense you couldn't change the history of a branch"

Branches and tags are mutable - they're just pointers to commits - but that doesn't change whether or not commits are immutable.

When you "change" the history of a branch, what git is really doing is creating another series of new commits, and then making the branch pointer point to the new commits. However, because commits are immutable, the old commits are still there, and at any time you can go back.


You are defining commits to be different by the mere fact that their contents (and history) are different. With that definition, how could a commit ever be mutable? Only by an indirection where the VCS allows the referent to change. And that is exactly what Git allows with branches.

When rewriting history, the fact that the “new” commits are new is a tautology if you define commits to be different when their contents is different. This only matters when you use the commit hash as your reference point, which is what I was pointing out in the parent comment.

The expectation of immutable commits would be that history never gets lost and cannot be changed (without rewriting history into a different repository), but it certainly can in Git.

The real point about Git’s content-derived addressing is that it enables the distributed aspect, because it makes the path a commit has taken between repositories not matter. But that is already better expressed in point 3.


> With that definition, how could a commit ever be mutable?

It can’t. Hence commits in git are immutable. It’s a function of git’s design that commits are immutable.

I could make my own VCS that doesn’t use content derived addressing, but instead uses UUIDs for commit IDs, and let you issue commands that changes the contents of that commit. In that system, commits are mutable. But that isn’t git.

> The expectation of immutable commits would be that history never gets lost and cannot be changed

No it isn’t. The expectation of immutable commits is that a commit object can’t be changed. And in git that is true, at least it can’t be changed without also changing its identity, which is functionally the same.

As has been explained, you can make an alternative parallel history, and you can switch to using that new parallel history as your new truth, but you haven’t changed the original commits. They remain un-mutated, because they are immutable.


So, like Nix, basically... ;)


A simple one is being able to write multiple messages (subject and body), e.g.

  git commit -m "subject" -m "a longer body"
or just running 'git commit' and using your text editor (subject first line, body after).

I've worked on too many repos with messages like "fixed the thing" where a few more sentences of context would've prevented headaches when trying to debug or change something.
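A small sketch showing where each `-m` ends up, in a throwaway repository (assumes Git 2.28 or newer; the message text is invented):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.invalid"

echo hello > file.txt
git add file.txt

# Each -m becomes its own paragraph: the first is the subject,
# the rest form the body.
git commit -q \
    -m "Fix off-by-one in pager" \
    -m "The last page was dropped when the item count was an exact multiple of the page size."

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
printf 'subject: %s\nbody: %s\n' "$subject" "$body"
```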


I always tell people the `-m` flag of `git commit` is only ever to be used in scripts. Let git open an editor and write your multi-line message there. Look at the comments telling you which files have been changed.

Even better than using `git add` and `git commit` is to use `git gui` to add and commit files. The (standard) GUI is not to help beginners, it is to make you a more effective user of git. Note there are other apps you can use with more features, for example I like gitx on MacOS, but there are also terrible apps out there that do try to be easy for beginners, and they should be avoided. The most obvious is github desktop (I haven't tried it for a while - it might have slightly improved, but I still won't go near it)


Good points. I occasionally use gitk to browse recent commits. A gui definitely has its place.

I like using the terminal, so my workflow is to use `git add -p`. That makes me look at each change, in small chunks, as I stage them. As a bonus, it helps me keep commits small, or decide if there’s some logical separation in my changes that would be better expressed in two or more commits.


What's `git gui`?

    $ git gui
    git: 'gui' is not a git command. See 'git --help'.

    The most similar commands are
      ci
      gc
      grep
      init
      lg80
      pull
      push


Your package manager seems not to include it. `git gui` is available in the standard Windows installer for git. The package on Ubuntu 20.04 seems to not include the gui.

As noted on the Git website, there are two GUIs that are in-tree -- native components of the project. https://git-scm.com/downloads/guis


Yeah it's packaged separately. I'm using git from Homebrew on macOS.


You may be able to get to it from `gitk` (it's under File > Start Git Gui). I've seen distributions that don't include the `git-gui` shortcut in PATH but do add `gitk` to somewhere in PATH.


Thanks for the tip. I was mostly curious about it as I had never used or heard of it. I have looked into gitk in the past, and also other GUIs but I find git easier to use via the CLI.


Alternative:

  git log --graph --all --stat


That's more like gitk, not git gui.

On ubuntu it's a separate install from the main git, https://launchpad.net/ubuntu/bionic/+package/git-gui

Try `apt install git-gui`, or similar.


> Look at the comments telling you which files have been changed.

And you can make this even better by adding the diff that’s being committed with verbose mode (see `git commit --help`), so that you can scroll down and easily see exactly what’s going into the commit:

  git commit --verbose
You can make it permanent like so:

  git config --global commit.verbose 1
I also recommend setting commit.cleanup = scissors as a related thing.


Thanks for this tip. One of the things I always did in larger commits was compose the commit message in a separate text editor while reviewing the diff in my terminal. This puts the diff right where I can see it, although time will tell if I enjoy scrolling up and down to switch between reading the diff and editing the commit message.


See if your editor supports a split view; in Vim, for example, you have :split or :vsplit.


In addition to 'git gui' I would also recommend 'tig'. It allows you to stage individual lines and hunks like 'git gui' but also has a history view. With a little bit of creative key binding, this makes creating --squash commits a breeze. (It also has mouse support which needs to be enabled manually in the config.)


I'd love to know how to do this in VSCode. It's infuriating that it yells at you for exceeding a 50 character limit.


The 50 character limit warning is just for the "top line" of a commit, the "commit subject line". It's general good advice to keep things like `git log --oneline` readable for people using fixed width command line terminals to view commit logs.

VS Code expands the box as you add newlines to allow you to type a longer body below the "subject line". The lines in the body also give you suggested warnings to stick to 72 characters or below, which also comes from general good advice to keep things readable in fixed width command line terminals for things like `git log` and `git show <commithash>`.

In some ways those warnings/general advice follow things like writing a plaintext email (or usenet posts or…) in pre-GUI days.

You don't have to follow those warnings' advice. You may not care how your commits look in fixed width command line terminals. Some GUI tools format commit messages poorly if you actually format it like an ancient email, and you can just write paragraphs and let the people with fixed width command line terminals use a pager tool that can better reflow the text for them.

The point is the advice comes from a good place, and there are git repos that are sticklers for plaintext commit formatting requirements which is why VS Code shows that advice. But you don't have to follow it.


do `export EDITOR="code -w"`

`git commit` will then open up the commit message as a temp file in vscode, you can write your message then save and close (cmd-s, cmd-w on mac, probably ctrl-s ctrl-w on windows and linux?) and git commit will continue on. `code -w <file>` is telling vscode "open this file for editing and don't return until the user closes it"


Hit enter. Line #2 starts the body.


The terminal in VSCode. Over the past 6 months or so I've been forcing myself to move as much of my workflow into the terminal as possible and I've found things have been continually getting easier and easier. A lot of arbitrary restrictions based around half-baked UIs get removed when there's no more half-baked UI.


"It is really hard to lose stuff"

It is really hard to lose stuff that you have ever committed. It is really hard to lose stuff as long as you commit early and often. Uncommitted work is pretty easy to lose with 'git reset'.


Depends. I use an IDE from Jetbrains, their "Local History" feature makes it fairly difficult to lose anything, even between changes on uncommitted files.


I don’t think that counts as a “depends” if it involves the use of another tool.


This has saved me many times. Memory is cheap these days, this should probably be a more common editor feature, on by default.


... or with `git checkout` or with `rm` or `git clean` (for files not yet under version control).


... or git merge / git pull (but not git rebase) ...


How do you lose files with git merge / git pull?


merge and pull update the reflog just like rebase does, so both can be undone!


My thoughts on this:

- even after years of working with git, I'm still not familiar with all the dark corners I can get myself into by running the wrong command at the wrong time. Yes, most of the time my changes are still there, but the way back to the state I actually want for my repository may be long, complicated and require lots of googling/asking around.

- as the article mentions, git cares a lot about stuff that has been committed, but not so much about the files in your working directory. Because of that, IDEs that keep a local history of your file system changes (like the JetBrains products) are really helpful. "Commit early, commit often" may save you from this, but produces more noise in the repo (which you can reduce by squashing commits, but that's extra effort) and may make it harder to find things - and I guess that goes double if you follow the suggestion of committing changes automatically "every few minutes". The local history is there when you need it and gets out of the way when you don't.


I don't think it's so strange that we need to rework the patch series a bit before sending it for review. At least not if we see readability as a main goal.

Not many great writers can sit down and write a perfect script on the first try. They write a first draft and then rework it, move things around etc. In the same way the git user goes over the patch series with rebase and amend to massage it into something that flows nicely and is pleasant to read.

I find that it's actually easier to make a good patch series if you start from many small commits, rather than a few big ones. The tools in git that merge patches together are faster to use than those that pick patches apart.


FYI The committing every few minutes idea commits to a separate branch to prevent this clutter.


I can't tell you how many times IDEA's "local history" saved my ass. I use it extensively. Instead of stashing local changes and going around messing with my working tree, what I did 30 minutes ago is just a right-click away.

(Also, if you right-click on the top project folder itself, you can restore the entire project to a previous snapshot, if something was working before and you heroically went down the wrong path).


Yeah I agree. This is a huge productivity boost.


If I could just convince colleagues to `pull.rebase`, so CI isn't constantly building 'Merge branch master' on the master branch, I'd be happy enough.
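For reference, a sketch of what that setting looks like. This sets it for a single throwaway repository; adding `--global` would apply it to every repository for your user, and `git pull --rebase` does the same thing for a single pull.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Make plain "git pull" rebase local commits onto the fetched branch
# instead of creating a merge commit.
git config pull.rebase true

pull_rebase=$(git config --get pull.rebase)
echo "pull.rebase=$pull_rebase"
```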


Uh, couldn't disagree more. I just discourage the use of pull itself, but having it and defaulting to rebase is quite invasive.


Maybe it would have been good if the git defaults were changed from the defaults used for the kernel development workflow to the git workflow ~everyone else is using, even though git was created for the kernel people by the kernel people - most users are not kernel people.

"git pull" in the kernel workflow is for integrating downstream changes into your own upstream repository. Meanwhile, "git pull" in the everyman workflow is for synchronizing your local changes with upstream changes (you know, re-basing your patches on upstream or something).

Even the merge commits left behind by default git pull reveal this. "Merge branch master of ssh://yourorg-git/upstream" - completely backwards for what you are doing - you're merging the upstream into the downstream? What, your local repo is the boss now? Doesn't make any sense.


The idea of "merging into" has no meaning with git. You're merging two things into one. But saying that there is a master thing and a slave thing has no meaning.


git does privilege the first parent[1] in a merge in a bunch of contexts

[1] i.e. the branch you did the merge from


Not true because commit parents are ordered. One of them is first - the main parent.
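A minimal sketch of the parent ordering, in a throwaway repository (assumes Git 2.28 or newer; branch and file names are made up). The first parent is the commit you were on when you ran `git merge`; the second is the one you merged in.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b topic
echo topic > topic.txt
git add topic.txt
git commit -q -m "Topic work"
topic_tip=$(git rev-parse HEAD)

git checkout -q main
echo main > main.txt
git add main.txt
git commit -q -m "Main work"
main_tip=$(git rev-parse HEAD)

git merge -q --no-ff -m "Merge topic" topic

first_parent=$(git rev-parse HEAD^1)   # where main was before the merge
second_parent=$(git rev-parse HEAD^2)  # the tip of topic

# Follow only first parents, i.e. the history of main itself.
git log --first-parent --format=%s
```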


It's a detail. But for some people it looks like it changes everything.


> It is really hard to lose stuff

Furthermore I recommend turning off automatic garbage collection. Turning it off makes it even harder to lose things, and unless you have an insanely massive amount of churn, you really don’t need it.

https://donatstudios.com/yagni-git-gc


Or committing large binaries frequently, but yes. Totally agreed.

GC manually unless you're dealing with scale where you know precisely how much it helps you. Otherwise it's safer and easier to not do it, and the disk space is utterly inconsequential (and trivial to find and `git gc` if it proves otherwise). You can afford a few more megabytes every year or so.
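A sketch of the setting being discussed, applied to one throwaway repository (use `--global` to apply it everywhere). `gc.auto` set to 0 disables automatic garbage collection; running `git gc` by hand still works.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Disable automatic garbage collection for this repository.
git config gc.auto 0

auto_gc=$(git config --get gc.auto)
echo "gc.auto=$auto_gc"

# Collect manually when you decide it is worth it:
#   git gc
```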


> What if you can't find it?

Funny enough, I just wrote "git-lost-and-found" last week because I'd added a new file (`git add`) but before I created a commit I did a `git reset --hard` which deleted the file. It wouldn't have been in the reflog since there was never a commit created. But I knew it would be under .git/objects until the next time a `git gc` runs, so the trick was to just take a look at a few recent blob objects till I found what I was looking for:

https://gist.github.com/jaysoffian/f42f4b1806f65158b40408814...

Put that in your PATH as `git-lost-and-found` and then you can call it as `git lost-and-found` from inside any repo.

Edit: oh, MJD beat me to it by 6 years but I hadn't seen this when I wrote the gist above:

https://blog.plover.com/prog/git-reset-disaster.html


I guess I could use `git fsck --lost-found` too.

https://git-scm.com/docs/git-fsck
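A rough sketch of the recovery described above using only `git fsck`, in a throwaway repository (assumes Git 2.28 or newer; file names and contents are made up). It relies on the staged file's blob still being in the object store, which holds until garbage collection prunes it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.invalid"

echo tracked > tracked.txt
git add tracked.txt
git commit -q -m "Initial"

# Stage a new file, then throw it away before ever committing it.
echo "important notes" > notes.txt
git add notes.txt
git reset -q --hard

# The blob written by "git add" is now dangling but still present.
blob=$(git fsck --lost-found 2>/dev/null |
    awk '$1 == "dangling" && $2 == "blob" { print $3 }')

recovered=$(git cat-file -p "$blob")
echo "$recovered"
```

In a real repository there may be many dangling blobs, so you would inspect each candidate (or the files written under `.git/lost-found/other/`) rather than assume there is exactly one.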


> But the old commit is still in there, pristine, forever.

> (Git will eventually throw away lost and unused snapshots, but typically not anything you have used in the last 90 days.)

Without having explained what a 'snapshot' is and what 'typically' means, it's pretty unclear whether the first statement is affected by the second, and if it were the first is incorrect, so I feel this clarification should be added to the list of things you want people to know about git.


> Without having explained what a 'snapshot'

A commit is effectively a snapshot of the repository at a given point in time. This is unlike some other revision control systems where a commit is a diff. In git, a commit contains whole files, and the diff is generated if needed.

> and what 'typically' means

Pretty much everything is configurable, and you can invoke the cleanup process by hand if you really want to.

> it's pretty unclear whether the first statement is affected by the second, and if it were the first is incorrect

Yes, I think it's incorrect and shouldn't be relied on. Things floating around unconnected to a branch can be removed by git eventually. There's plenty time to fix any mistake, but it's not "forever".
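
To make "configurable" concrete, here is a sketch of the retention knobs as I understand them. The values are illustrative only; the documented defaults are 90 days, 30 days and 2 weeks respectively.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q

# Illustrative values, set for this scratch repo only.
git config gc.reflogExpire "180 days"             # reflog entries still reachable from the branch tip
git config gc.reflogExpireUnreachable "60 days"   # reflog entries for commits no longer reachable
git config gc.pruneExpire "1 month ago"           # loose objects nothing refers to

reflog_expire=$(git config gc.reflogExpire)
git config --get-regexp '^gc\.'
```

Setting the first two to `never` is the closest thing to "forever" that I know of.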


Well, Part I has an entire chapter urging you to read Git From The Bottom Up [0] before continuing. So the author can expect that you know what a snapshot is.

[0] : https://jwiegley.github.io/git-from-the-bottom-up/


> It is really hard to lose stuff

If that stuff is committed.

Maybe I'm stating the obvious but conversations seem to revolve so much around finding stuff and merging stuff that my favorite use of git isn't talked about as much -- throwing stuff away. A commit is your save point - once made, you can basically code risk-free, knowing you can always go back to that save point. Try new things, thrash your code, don't stress even if you break everything - because you can always just revert back to that last commit.

So I'd say that it is really hard to lose the stuff you explicitly declared you wanted to keep. And easy to lose the rest.
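
A small sketch of the save-point idea, using a throwaway repo and made-up file names. Note what survives and what doesn't.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com

echo "working version" > app.txt
git add app.txt && git commit -q -m "Save point: app works"

echo "reckless experiment that broke everything" > app.txt   # thrash a tracked file
echo "scratch" > untracked.txt                               # never added to Git

git reset -q --hard HEAD     # back to the save point; the experiment is gone for good
after_reset=$(cat app.txt)
ls
```

The untracked file is left alone by `reset --hard`, but the uncommitted edit to `app.txt` cannot be brought back by Git.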


As a corollary: cheap branches are wonderful.

The workflow I teach the newbies at my job is to just do a quick `git checkout -b tempbranch` when trying out: complicated rebasing, cherry-picking commits, whatever.

Do your thing on that temporary branch. Did the thing work? Hell yeah. `git checkout - && git reset --hard tempbranch`. Presto magic, your old branch is now a perfect replica of tempbranch. It's as if you did the whole thing on your original branch to start.

Did it fail miserably? Oh well. `git checkout - && git branch --delete tempbranch`. It vanishes and no one has to know that the rebase fell apart. Your working branch is in pristine condition still.
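
Here is the whole tempbranch dance as a runnable sketch in a scratch repo. One assumption on my part: I use `-D` for the cleanup, because as far as I know plain `--delete` refuses to remove a branch whose commits were never merged, which is exactly the failure case.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
git commit -q --allow-empty -m "base"
orig=$(git symbolic-ref --short HEAD)

git checkout -q -b tempbranch
git commit -q --allow-empty -m "risky experiment"   # stand-in for the rebase or cherry-pick

# Success path: make the original branch point at the experiment's result.
git checkout -q - && git reset -q --hard tempbranch

# Clean up either way; -D also works when the experiment was never merged.
git branch -q -D tempbranch
git log --oneline
```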


Those are pretty much the same thing. Branches work for that purpose because commits are immutable - the branch just saves you from having to remember the hash. A tag would also work in this case.


> just saves you from having to remember the hash

You're underselling it :) That is the entire point. Don't tell me you enjoy using reflog. This "just saves you" in the same way that a rebase "just saves you" from having to write out enormous cherry-pick statements.

While "the plumbing is the same" is a nice bit of trivia, the porcelain is all that we should care about when we're recommending behaviors.


> A few things can be lost forever!

> The dangerous commands are git-reset and git-checkout

Don't forget about `git clean`, which will happily gobble up all your important ignored files (API keys, IDE configurations, etc)


That!

Also, 'git reset' is only dangerous in some modes, e.g., --hard. It's a user experience nightmare to have the same command be nice in one case and dangerous in another, depending on an option parameter. It is a very useful command, so sooner or later you're likely to lose your local changes by accident, on the one day when you're a bit tired.

Also, 'git checkout' is used for many good things like, well, checkout. To abuse it also for reverting local changes is a user experience nightmare, too. It should have a different command for that functionality.
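
To illustrate the nice-versus-dangerous split, a minimal sketch comparing the default (mixed) reset with `--hard` in a scratch repo; the file contents are invented.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
echo "first version" > file.txt && git add file.txt && git commit -q -m "v1"

echo "second version, staged" > file.txt && git add file.txt
git reset -q                    # default --mixed: unstages, keeps the edit on disk
after_mixed=$(cat file.txt)

git reset -q --hard             # overwrites the working tree: the edit is gone
after_hard=$(cat file.txt)
echo "mixed: $after_mixed / hard: $after_hard"
```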


> It should have a different command for that functionality.

It does in today's git. Rather than using `git checkout` you can migrate to using the two split commands `git switch` (switches branches, generally safe) and `git restore` (restores files, generally "dangerous"). `git restore` also has the benefit that some of the most dangerous commands at that point have similar names: restore, revert, and reset.

It will take a while for people's `git checkout` muscle memory to be replaced with `git switch` and `git restore`, but the command split is a very good thing for git user experience. (`git restore` is still marked as "experimental" despite being stable for quite a few git releases now, though that label seems aimed more at people thinking of using `git restore` in automated scripts than at human users.)

(I've mostly gotten used to `git switch` at least and have stopped using `git checkout` in my own work.)
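
A quick side-by-side sketch of the split, assuming a Git recent enough to have both commands (2.23 or later, if I remember correctly). Branch and file names are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
echo "committed" > file.txt && git add file.txt && git commit -q -m "add file"

git switch -q -c feature          # was: git checkout -b feature
echo "local edit" > file.txt
git restore file.txt              # was: git checkout -- file.txt (discards the edit)

current_branch=$(git symbolic-ref --short HEAD)
restored=$(cat file.txt)
echo "$current_branch: $restored"
```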


But git clean requires confirmation, unless you configured Git to not ask your permission to trash the .gitignored files.

It even shows you which files are going to be deleted.
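
For the preview, a sketch using the dry-run flag. My understanding of the stock behavior is that `git clean` refuses to run without `-f`, `-i` or `-n` (governed by `clean.requireForce`), and that ignored files are only touched when `-x` or `-X` is given. The file names here are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
echo ".env" > .gitignore
echo "API_KEY=hypothetical" > .env     # ignored file
echo "tmp" > scratch.txt               # untracked, not ignored

preview=$(git clean -n -x)             # dry run, ignored files included; deletes nothing
echo "$preview"
```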

Now, on that topic of "important but (.git)ignored files", I don't know what the best practices are but I like to symlink my IDE config files / test API keys etc. that aren't to be committed to Git. So in case I'm not cautious I'm only deleting the symlink but not the file itself.

To me it's the best of every world: I don't need to take any special care about not backing up this information where it shouldn't be backed up, because I don't back up my Git repos with anything but Git (and the files are .gitignore'd, so I'm good); I don't risk losing them even if I engage in reckless behavior (like a git clean, or an "rm -rf ..." of the whole repo followed by a fresh git clone, you guys know what I mean); and I can centralize my secret / personal config files in a specific dir which I know is important and which I can securely back up (say, with encrypted backups).

If I delete the symlink (and git clean only deletes the symlink, as expected), I can just re-create it.
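
The symlink setup can be sketched like this. The directory layout and file names are hypothetical stand-ins for wherever you keep your secrets.

```shell
set -e
secrets=$(mktemp -d)                       # stand-in for a central, separately backed-up directory
echo "API_KEY=hypothetical" > "$secrets/dev.env"

repo=$(mktemp -d) && cd "$repo" && git init -q
echo ".env" > .gitignore
ln -s "$secrets/dev.env" .env              # the repo only holds a link

git clean -q -f -x                         # reckless cleanup, ignored files included
[ -e "$secrets/dev.env" ] && echo "secret survived"
ln -s "$secrets/dev.env" .env              # re-create the link
```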


["Git from the Bottom Up"](https://jwiegley.github.io/git-from-the-bottom-up/) changed my life. Someone recommended this to me, and I recommend this to anyone seeking more information about git. People often just want to know "which commands to run", which is a fair expectation, until they reach a threshold of what they can do without knowing the internals.

[When should someone read "Git from the Bottom Up"? What are your git recommendations?](https://forms.gle/aqFy52uvTp8CUvkF9)


I made a simple script (below) that commits the current changes, then automatically rebases and squashes them into the previous commit, preserving the previous commit message. I find it useful as a way to quickly save current changes before switching branches, or for easily backing up work before it's worth making a new separate commit:

  git add --all
  git commit --fixup=HEAD~1
  GIT_SEQUENCE_EDITOR="sed -i -re 's/^pick (.*)fixup! (.*)/fixup \1\2/'" git rebase -i --autosquash --autostash --keep-empty HEAD~2


Isn't this the same as something like:

  git commit --all --amend
With maybe a --no-edit thrown in?


You know what, you're right: the command

  git commit --all --amend --no-edit
seems to do essentially the exact same thing, thanks!
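
A sketch to check that claim in a scratch repo. One difference worth knowing: `--all` only picks up files Git already tracks, whereas `git add --all` in the original script also stages brand-new files.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
echo "draft" > work.txt && git add work.txt && git commit -q -m "Add work file"

echo "more progress" >> work.txt
git commit -q --all --amend --no-edit     # fold the edit into the previous commit

commit_count=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
echo "$commit_count commit(s), subject: $subject"
```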


HN italics markup is mangling your code – indent the code block to fix :)

    like so!
> save current changes before switching branches

Also: why this over a stash? Stashing fundamentally just creates a hidden little commit with all your unstaged work on it. It does have one advantage over your approach, which is easily applying the work to another branch.
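
For comparison, a minimal sketch of the stash route, moving unfinished work onto a different branch. The branch name is made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
echo "base" > file.txt && git add file.txt && git commit -q -m "base"

echo "half-finished idea" >> file.txt
git stash push -q                  # working tree is clean again
git checkout -q -b other-branch    # hypothetical branch name
git stash pop -q                   # the same edit now sits on other-branch
git status --short
```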


> indent the code block to fix :)

Thanks!

> Also: why this over a stash?

I just like this better because with stash you have to do one more command (stash pop) to get the changes back after switching back to the first branch, but this way the changes are already there when you switch back.


I wish that, instead of the API view of git that everyone seems to have run with, something akin to a Plan 9/FUSE file server were the main interface everyone learned. I would imagine it would work like a souped-up version of RepoFS:

https://github.com/AUEB-BALab/RepoFS

There is a paper to go along with the package which showed some convincing examples that much of the git tooling could just go away in favor of already existing shell commands: e.g. no need for `git grep` when `grep` will do the job.


A colleague accidentally force-pushed a stale branch to master, and a week's worth of work was potentially lost. That day git reflog saved our jobs, as we were contractors for a big Germany-based client.


Always disable force pushes to trunk (if that feature is available, as it is on GitHub).


I hope you (or they) are backing up the source code as well, which would have helped you if reflog had not.


>Some people automate this: they have a process that runs every few minutes and commits the current working tree to a special branch that they look at only in case of disaster.

This sounds nice. Does anyone have a concrete example?

I'd like to try with a cronjob but it would be useful to avoid having to create the job manually for each repo & I don't want a job repeatedly scanning all my disks. Is there anything like a global post-clone hook? Additionally, I'd be worried about race-conditions from saving work during the job.
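
Not a full answer, but here is one way the snapshot step itself could look. Everything here is an assumption of mine, not something from the article: the function name, the `wip-snapshots` branch name, and the trick of staging into a scratch index so the real index and current branch are never touched. It only snapshots what `git add --all` would stage, so ignored files are skipped, and I have not tested how it behaves if files change mid-run.

```shell
set -e
# Hypothetical helper: record the working tree on a side branch without
# touching the real index, the current branch, or any files.
snapshot_worktree() {
  ref=refs/heads/wip-snapshots                        # assumed name for the disaster-only branch
  git_dir=$(git rev-parse --absolute-git-dir)
  tmp_index="$git_dir/index.snapshot.$$"
  cp "$git_dir/index" "$tmp_index" 2>/dev/null || true   # reuse the stat cache if there is one
  GIT_INDEX_FILE="$tmp_index" git add --all           # stage everything into the scratch index
  tree=$(GIT_INDEX_FILE="$tmp_index" git write-tree)
  rm -f "$tmp_index"
  # Chain onto the previous snapshot if there is one, else onto HEAD.
  parent=$(git rev-parse -q --verify "$ref" || git rev-parse -q --verify HEAD || true)
  commit=$(git commit-tree "$tree" ${parent:+-p "$parent"} -m "snapshot $(date)")
  git update-ref "$ref" "$commit"
}

# Demo in a scratch repo.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
echo "committed" > a.txt && git add a.txt && git commit -q -m "base"
echo "uncommitted edit" > a.txt
snapshot_worktree
saved=$(git show wip-snapshots:a.txt)
echo "$saved"
```

A cron entry or an editor save hook could call the function per repo; how to register repos globally is still an open question as far as I know.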



"It is really hard to lose stuff"

Indeed. This also means that garbage keeps piling up in git repos.

This is how I make sure I do lose stuff eventually:

https://github.com/no-gravity/git-gc-all-repos.sh

A script that goes through all my repos and performs garbage collection.


I know your link says you don't want to do `gc --aggressive --prune all`, but for anyone who does: a) That still leaves objects referenced by the reflog, so you may want to `reflog expire --expire=now --all` first if your reflog is full of rebases etc that you don't need, and b) you may want to run gc twice because the second time compacts a tiny bit more than the first.
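
Point (a) can be seen in a scratch repo with a sketch like the one below. It uses `--prune=now` rather than `--aggressive`, and the branch name is made up; the commit is only kept alive by HEAD's reflog.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
git commit -q --allow-empty -m "keep"
orig=$(git symbolic-ref --short HEAD)

git checkout -q -b doomed
git commit -q --allow-empty -m "throwaway"
lost=$(git rev-parse HEAD)
git checkout -q "$orig" && git branch -q -D doomed   # now only the reflog remembers it

git gc -q --prune=now
if git cat-file -e "$lost" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all                 # forget the reflog entries first
git gc -q --prune=now
if git cat-file -e "$lost" 2>/dev/null; then after=present; else after=gone; fi
echo "plain gc: $before, after reflog expire + gc: $after"
```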


Disk space is cheaper than lost data.


True, but wouldn't the reduction in size make it a little quicker to transfer over slow networks?


You might be interested in the `git maintenance` subcommand that was introduced in Git 2.31. It can do per-repository automatic background garbage collection on a configurable schedule, among other things.


Learning git is not easy.

There is one thing that will almost guarantee no work is ever lost: commit. If something is committed, even if it’s not pushed, it can be retrieved.

One other thing that has nothing to do with how it works, but makes all the difference in usability: write commit messages describing _why_.

To learn git is to use git; reading about it has not helped me.


> write commit messages describing _why_.

Could you provide a concrete example? I don't understand your point.


As in: "changed this property in the config _because_ it will extinguish all evil from planet earth" vs. "changed this property in the config".

I can see _what_ somebody did well enough from the code diff itself -- reading an uninterrupted stream of messages explaining _why_ in a git log is a wonderful experience.


Thank you, I think I get your idea.


> The dangerous commands are git-reset and git-checkout

Don't forget about `git-restore` which replaces `git reset -- filename`


It's also designed to replace `git checkout branch -- filename` and related complicated `git checkout` uses. (It was added alongside `git switch` and between `git switch` and `git restore` you can entirely avoid `git checkout`.)
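
A sketch of both replacements in a scratch repo, with made-up branch and file names. One caveat as I understand it: `git checkout other -- file` updates the index as well as the working tree, while `git restore --source=other file` touches only the working tree unless you add `--staged`.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
echo "main version" > file.txt && git add file.txt && git commit -q -m "base"
orig=$(git symbolic-ref --short HEAD)
git switch -q -c other
echo "other version" > file.txt && git commit -q -a -m "change on other"
git switch -q "$orig"

echo "staged by mistake" > file.txt && git add file.txt
git restore --staged file.txt            # was: git reset -- file.txt (unstage, keep the edit)
unstaged=$(cat file.txt)

git restore --source=other file.txt      # was: git checkout other -- file.txt
from_other=$(cat file.txt)
echo "$unstaged -> $from_other"
```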


Hrm… can you really recover branches and tags once you delete them?

I was under the impression commits were retained but removing labels (tags, branches, remotes) was unrecoverable and immediate.


You can't recover where your references were but all the objects (anything with a SHA1) are still there.


reflog tells you when you created/changed a label and what ID it had, so you can recover them.
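
A minimal sketch of recovering a deleted branch from HEAD's reflog, in a scratch repo. It relies on HEAD having been on the branch at some point; the branch name is made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo && git config user.email demo@example.com
git commit -q --allow-empty -m "base"
orig=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
tip=$(git rev-parse HEAD)
git checkout -q "$orig"
git branch -q -D feature                  # the label is gone, the commit is not

git --no-pager reflog                     # HEAD's history still lists "feature work"
git branch feature 'HEAD@{1}'             # HEAD@{1}: where HEAD was before the last checkout
recovered=$(git rev-parse feature)
```

In real life the entry you want is rarely `HEAD@{1}`, so read the reflog output and pick the hash next to the last commit made on the lost branch.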


Has anyone written a tool that actually does this? I want to add that to my post-rewrite hook immediately so that I can gain peace of mind that I will always be able to switch to a branch where I can study the previous reality after rebasing a new one.


My project https://github.com/arxanas/git-branchless does this. Use `git undo -i` to get a graphical view of where branches were at an arbitrary previous point in time.

It's worth noting that, in principle, there are cases where the reflog will have never observed a branch move. The reflog we usually look at is that of HEAD; if HEAD did not have the branch checked out when it moved, then that information will be lost. (For example, if you run `git branch -f my-branch abc123`, the branch will be forcibly reassigned without having been checked out.) See https://github.com/arxanas/git-branchless/wiki/Architecture#... for more details.


From Part I:

> "I don't need to know how it works. I just want to know which commands to run."

> with Git, this does not work.*

Well, so what is your post about? Make it work, then call back.


Unrelated to git, but I LOVE how fast and user-friendly this page is.



