
I'd bet that if you built git as a single statically linked multi-call binary a la busybox, it would be far less than 230MB. Statically linking dozens of separate binaries with large amounts of shared code and then measuring the resulting disk usage doesn't tell you anything meaningful except how much disk space dynamic linking would save you.


git is built as a multi-call binary. I wonder if he's perhaps not realizing that all those other "git-*" binaries are hard linked to "git". Depending on which box I check, my git binary has somewhere in the region of 80-110 hard links (EDIT: admittedly not a statically linked version, but none of its dependencies are big enough that it should add up to anywhere remotely near 230MB).


So ls -i should show they all share the same inode number?

Thanks for this. I was not aware of that. Perhaps I will give it another try.
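
For anyone who would rather check this from a script than eyeball ls -i, here is a minimal Python sketch of the same test. The helper names same_inode and link_count are made up for illustration. It uses lstat so that a symlink is not mistaken for a hard link, since some builds may install the git-* names as symlinks instead.

```python
import os


def same_inode(path_a, path_b):
    """Return True if both names are hard links to the same file.

    Two names refer to the same file when device and inode number match.
    lstat() is used so a symlink is compared as itself, not as its target.
    """
    a = os.lstat(path_a)
    b = os.lstat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def link_count(path):
    """Return the number of hard links to the file (the count ls -l prints)."""
    return os.lstat(path).st_nlink
```

Something like same_inode("/usr/bin/git", "/usr/lib/git-core/git-commit") would be the obvious thing to try, but those paths are only a guess at a Debian-style layout and will differ between systems.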


> So ls -i should show they all share the same inode number?

It would, yes. Another useful tool here is du, which by default counts files with duplicate inode numbers only once. So for an example where I have two 100M files each with multiple hard links:

  me@swann:/tmp/tmp$ ls -lhi
  total 701M
  180277 -rw-r--r-- 3 me us 100M Aug 17 13:44 zero.file
  180278 -rw-r--r-- 4 me us 100M Aug 17 13:45 zero.file.2
  180278 -rw-r--r-- 4 me us 100M Aug 17 13:45 zero.file.2.link1
  180278 -rw-r--r-- 4 me us 100M Aug 17 13:45 zero.file.2.link2
  180278 -rw-r--r-- 4 me us 100M Aug 17 13:45 zero.file.2.link3
  180277 -rw-r--r-- 3 me us 100M Aug 17 13:44 zero.file.link
  180277 -rw-r--r-- 3 me us 100M Aug 17 13:44 zero.file.link2
  me@swann:/tmp/tmp$ du -shc *
  101M    zero.file
  101M    zero.file.2
  201M    total
du does this duplicate-ignoring trick across whole trees, so the links do not have to be in the same directory: you can have it scan a whole tree and it will show how much space is really taken, not how much is nominally taken. Like so:

  me@swann:/tmp/tmp$ cd ..
  me@swann:/tmp$ du -shc tmp
  201M    tmp
  201M    total
The reason I'm getting 101MB instead of 100MB (and 701MB in total in ls) is that it is counting each link as taking a small amount of space, then "100MB-plus-a-bit" is being rounded up to 101MB (and 700-and-a-fraction rounds up to 701).

Also the number in the 3rd column of the ls output above is the number of links to the object, which can be helpful in understanding this sort of situation too.
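
The deduplication described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how du is implemented: the function name disk_usage is hypothetical, and it sums apparent file sizes (st_size) where du counts allocated blocks, so the numbers will not match du exactly.

```python
import os
import stat


def disk_usage(root):
    """Return (nominal, deduplicated) byte totals for regular files under root.

    nominal counts every directory entry, the way adding up ls sizes would.
    deduplicated counts each (device, inode) pair once, so extra hard links
    to the same file add nothing.
    """
    seen = set()
    nominal = 0
    unique = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other special files
            nominal += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return nominal, unique
```

Because the seen set is shared across the whole walk, links in different subdirectories are still counted once, which is the same property as running du on the parent directory.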


I am no expert with du and all its options and behaviours, but it's funny you mention the h, c and s ones because I did bother to learn and commit those three to memory long ago and routinely use that combination.

I also routinely use dd to get "exact" file sizes (yes, it's crude, but dd is on almost every UNIX-like system and it works), unless I have access to a good stat utility.
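
Where Python happens to be installed, the same information a stat utility gives is available through os.stat. A minimal sketch, with hypothetical helper names, and assuming a POSIX-like system where st_blocks is reported in 512-byte units:

```python
import os


def exact_size(path):
    """Return the file's apparent size in bytes (what ls -l shows)."""
    return os.stat(path).st_size


def allocated_bytes(path):
    """Return the space actually allocated on disk, in bytes.

    st_blocks counts 512-byte units on POSIX systems; the attribute is
    not available on Windows.
    """
    return os.stat(path).st_blocks * 512
```

The two numbers can differ in either direction: sparse files allocate less than their apparent size, and small files are usually rounded up to a whole filesystem block.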


For a large chunk of the main binaries. There are certainly some things that are split out in separate binaries and scripts.

On a Debian system, take a look at /usr/lib/git-core/ - it contains a number of additional binaries, but it's still reasonably small. And a lot of what's in there is optional functionality and stuff you can delete if you don't want it. E.g. "git-imap-send", "git-instaweb" and a bunch of other things that you may or may not care about at all.

The main stuff like "git-commit" etc. is all linked to the main binary (or not necessarily present at all, depending on your build/distro).

EDIT: I just compiled a statically linked "git" binary. Stripped, it is 2.5MB. That obviously excludes the few things that are in separate binaries. git-daemon, for example, weighs in at 1.7MB statically linked.

Some things, like git-imap-send, seem to be a bit tricky to build statically (git-imap-send barfs errors about libdl all over my screen, and I'm not motivated to figure out why).



