TIL - removing large git objects from .git
2025-10-29 blogpage sketch til
clone a fresh copy from github
$ git clone --mirror <url>
detecting the large file
❯ dust --ignore-directory .venv/ --ignore-directory data/
8.0K ┌── README.md │█ │ 0%
8.0K ├── REFACTORING.md │█ │ 0%
8.0K │ ┌── augment.py │█ │ 0%
8.0K │ ├── core.py │█ │ 0%
8.0K │ ├── model.py │█ │ 0%
8.0K │ │ ┌── core.cpython-313.pyc │█ │ 0%
8.0K │ │ ├── model.cpython-313.pyc │█ │ 0%
12K │ │ ├── augment.cpython-313.pyc │█ │ 0%
36K │ ├─┴ __pycache__ │█ │ 0%
72K │ ┌─┴ deepaugment │█ │ 0%
72K ├─┴ src │█ │ 0%
248K ├── uv.lock │█ │ 0%
12K │ ┌── logs │█ │ 0%
16K │ ├── index │█ │ 0%
60K │ ├── hooks │█ │ 0%
76K │ │ ┌── pack-0542d6e993463b99ac69399306e79f7efc43457b.idx │█ │ 0%
80M │ │ ├── pack-0542d6e993463b99ac69399306e79f7efc43457b.pack│█ │ 99%
80M │ │ ┌─┴ pack │█ │ 99%
80M │ ├─┴ objects │█ │ 99%
80M ├─┴ .git │█ │ 100%
81M ┌─┴ . │█ │ 100%
.git/objects/pack/pack-0542d6e993463b99ac69399306e79f7efc43457b.pack file is 80M!
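if dust isn't installed, the same check can be sketched with plain git and du. The function name pack_sizes is made up for this note; it asks git where the git directory is, so it should also work in a bare clone (which is what --mirror produces, with objects/pack sitting directly under the repo instead of under .git).

```shell
# Hypothetical helper: list pack files of the current repository, smallest first.
# Sizes are in KiB (du -k). Assumes at least one pack file exists, which is the
# case right after a clone or after `git gc`.
pack_sizes() {
  git_dir=$(git rev-parse --git-dir) || return 1
  du -k "$git_dir"/objects/pack/*.pack | sort -n
}
```

run it from inside the repository as pack_sizes; the last line is the largest pack.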
find historic large file paths
find object hashes of largest 10 historic files
git verify-pack -v .git/objects/pack/pack-0542d6e993463b99ac69399306e79f7efc43457b.pack | sort -k3 -n | tail -10
for all hashes (i.e. object sha), find their historic paths
git rev-list --objects --all | grep -E "<object-sha-from-above>|<object-sha-from-above>|..."
save those to a new file. let's call it historic_large_file_paths.txt
Single command doing all together
$ git rev-list --objects --all | grep -E "$(git verify-pack -v .git/objects/pack/pack-0542d6e993463b99ac69399306e79f7efc43457b.pack | sort -k3 -n | tail -10 | awk '{print $1}' | tr '\n' '|' | sed 's/|$//')" | awk '{print $2}' > historic_large_file_paths.txt
explanation (progressively)
- git rev-list --objects --all: list all objects in the repository
- grep -E <options>: grep extended regular expression
- grep -E "<option-1>|<option-2>|...": grep for multiple options
- "$(<any-command>)": run any command and use its output as a string
- grep -E "$(<any-command>)": run any command and grep based on its output (above see the long command)
- git verify-pack -v .git/objects/pack/<pack-file>.pack: verify pack file in verbose mode, showing objects by sha and their sizes
- sort -k3 -n: sort by third column (-k3) in numeric order (-n). in our example, third column is the size of the object
- tail -10: show last 10 lines
- awk '{print $1}': pick the first column. in our above example, first column is the sha of objects
- tr '\n' '|': replace newlines with pipe symbol
- sed 's/|$//': remove trailing pipe symbol
- awk '{print $2}': pick the second column. in our above example, second column is the path of objects
- > historic_large_file_paths.txt: save to file
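a variant that doesn't need the pack file name, as a rough sketch: feed the rev-list output into git cat-file --batch-check and let git report each object's type and size. This is a swapped-in technique, not the pipeline above, and the function name largest_blobs is made up for this note. It only looks at blobs, so commits and trees never end up in the list.

```shell
# Hypothetical helper: print the N largest blobs in history (default 10),
# one per line as "<sha> <size-in-bytes> <path>", largest last.
# %(rest) carries over whatever followed the sha on the rev-list line,
# which is the path, so paths containing spaces stay intact.
largest_blobs() {
  n="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k2,2n |
    tail -n "$n"
}
```

to get just the paths for the next step, something like largest_blobs 10 | cut -d' ' -f3- > historic_large_file_paths.txt should do.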
remove selected files from git history
$ brew install git-filter-repo
$ git filter-repo --paths-from-file historic_large_file_paths.txt --invert-paths
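to double-check the rewrite, a small sketch that looks for any listed path still reachable in history. The function name remaining_paths is made up for this note, and it assumes the list file holds one literal path per line, as produced above.

```shell
# Hypothetical helper: print every path from the list file that still
# appears in history. Empty output (and a non-zero exit status from grep)
# means none of the listed paths are left.
remaining_paths() {
  list="${1:-historic_large_file_paths.txt}"
  git rev-list --objects --all |
    cut -s -d' ' -f2- |      # keep only the path part of "<sha> <path>"
    grep -v '^$' |           # drop root trees, which have an empty path
    grep -Fx -f "$list"      # exact, literal match against the list
}
```

running git count-objects -vH before and after and comparing the size-pack line gives a quick view of how much space was reclaimed.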
References
- https://www.warp.dev/terminus/remove-secret-git-history
Related
Unix & Linux Commands Cookbook