TIL - removing large git objects from .git

2025-10-29   blogpage sketch til


clone a fresh copy from github (note: --mirror makes a bare clone, so .git/objects/... below is objects/... there)

$ git clone --mirror <url>

detecting the large file

❯ dust --ignore-directory .venv/ --ignore-directory data/
8.0K   ┌── README.md                                               │█ │   0%
8.0K   ├── REFACTORING.md                                          │█ │   0%
8.0K   │   ┌── augment.py                                          │█ │   0%
8.0K   │   ├── core.py                                             │█ │   0%
8.0K   │   ├── model.py                                            │█ │   0%
8.0K   │   │ ┌── core.cpython-313.pyc                              │█ │   0%
8.0K   │   │ ├── model.cpython-313.pyc                             │█ │   0%
 12K   │   │ ├── augment.cpython-313.pyc                           │█ │   0%
 36K   │   ├─┴ __pycache__                                         │█ │   0%
 72K   │ ┌─┴ deepaugment                                           │█ │   0%
 72K   ├─┴ src                                                     │█ │   0%
248K   ├── uv.lock                                                 │█ │   0%
 12K   │ ┌── logs                                                  │█ │   0%
 16K   │ ├── index                                                 │█ │   0%
 60K   │ ├── hooks                                                 │█ │   0%
 76K   │ │   ┌── pack-0542d6e993463b99ac69399306e79f7efc43457b.idx │█ │   0%
 80M   │ │   ├── pack-0542d6e993463b99ac69399306e79f7efc43457b.pack│█ │  99%
 80M   │ │ ┌─┴ pack                                                │█ │  99%
 80M   │ ├─┴ objects                                               │█ │  99%
 80M   ├─┴ .git                                                    │█ │ 100%
 81M ┌─┴ .                                                         │█ │ 100%

the .git/objects/pack/pack-0542d6e993463b99ac69399306e79f7efc43457b.pack file is 80M!
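If dust isn't installed, stock tools give roughly the same answer. A minimal sketch, assuming git 2.13 or newer for --absolute-git-dir; pack_report is a made-up helper name, not a git command:

```shell
# Report how much space the object store of a repository takes.
# Usage: pack_report [path-to-repo]   (defaults to the current directory)
pack_report() {
  repo=${1:-.}
  git_dir=$(git -C "$repo" rev-parse --absolute-git-dir) || return 1
  du -sh "$git_dir/objects"             # on-disk size of loose + packed objects
  git -C "$repo" count-objects -vH      # "size-pack" is the total size of pack files
}
```

Running pack_report before and after the cleanup gives a quick before/after comparison.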


find historic large file paths


find the object hashes of the 10 largest historic files

git verify-pack -v .git/objects/pack/pack-0542d6e993463b99ac69399306e79f7efc43457b.pack | sort -k3 -n | tail -10

for each of those hashes (i.e. object sha), find its historic path

git rev-list --objects --all | grep -E "<object-sha-from-above>|<object-sha-from-above>|..."

save those to a new file. let's call it historic_large_file_paths.txt
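An alternative sketch for the same lookup: git cat-file --batch-check can print type, size and path in one pass, so the verify-pack step and the hash grep aren't needed. largest_blobs is a hypothetical helper name, and the size shown here is the uncompressed object size, not the size inside the pack:

```shell
# List the N largest blobs reachable from any ref, together with their paths.
# Usage: largest_blobs [N]   (run inside the repository; N defaults to 10)
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob"' |      # keep file contents, drop commits and trees
    sort -k2 -n |             # second column is the size in bytes
    tail -n "${1:-10}"
}
```

Each output line is "blob <size> <sha> <path>", so the path column can be cut out and saved to historic_large_file_paths.txt the same way.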


Single command doing all together
$ git rev-list --objects --all | grep -E "$(git verify-pack -v .git/objects/pack/pack-0542d6e993463b99ac69399306e79f7efc43457b.pack | sort -k3 -n | tail -10 | awk '{print $1}' | tr '\n' '|' | sed 's/|$//')" | awk '{print $2}' > historic_large_file_paths.txt

explanation (progressively)

  • git rev-list --objects --all: list all objects reachable from any ref, one "sha path" pair per line

  • grep -E <options>: grep with extended regular expressions

  • grep -E "<option-1>|<option-2>|...": grep for multiple options

  • "$(<any-command>)": run any command and use its output as a string

  • grep -E "$(<any-command>)": run any command and grep based on its output (see the long command above)

  • git verify-pack -v .git/objects/pack/<pack-file>.pack: verify pack file in verbose mode, showing objects by sha and their sizes

  • sort -k3 -n: sort by third column (-k3) in numeric order (-n). in our example, third column is the size of the object

  • tail -10: show last 10 lines

  • awk '{print $1}': pick the first column. in our above example, first column is the sha of objects

  • tr '\n' '|': replace newlines with pipe symbol

  • sed 's/|$//': remove trailing pipe symbol

  • awk '{print $2}': pick the second column. in our above example, second column is the path of objects

  • > historic_large_file_paths.txt: save to file
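The tr/sed/grep part of the pipeline can be tried on its own without a repository. A small sketch with placeholder hashes and paths (none of them are real objects; join_with_pipes is a made-up helper name):

```shell
# Turn one-hash-per-line input into a single "a|b|c" alternation for grep -E.
join_with_pipes() {
  tr '\n' '|' | sed 's/|$//'
}

# Placeholder hashes, not real objects.
pattern=$(printf '%s\n' aaa111 bbb222 ccc333 | join_with_pipes)
echo "$pattern"    # aaa111|bbb222|ccc333

# Only lines whose hash is in the pattern survive; awk keeps the path column.
printf '%s\n' 'aaa111 data/big.bin' 'ddd444 README.md' 'ccc333 model.ckpt' |
  grep -E "$pattern" | awk '{print $2}'
```

The last command prints data/big.bin and model.ckpt. One caveat: awk '{print $2}' cuts a path at its first space, so for paths containing spaces cut -d' ' -f2- is the safer choice.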



remove selected files from git history

$ brew install git-filter-repo

$ git filter-repo --paths-from-file historic_large_file_paths.txt --invert-paths
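To check that the rewrite worked, one option is to look for the listed paths in the rewritten history. A minimal sketch, assuming the file holds one plain path per line; still_in_history is a hypothetical helper name:

```shell
# Print every path from the list that is still present somewhere in history.
# Usage: still_in_history <paths-file>   (run inside the rewritten repository)
still_in_history() {
  git rev-list --objects --all |
    cut -s -d' ' -f2- |       # drop the hash, keep the full path; skip lines without a path
    grep -Fx -f "$1"          # exact, literal match against each listed path
}
```

For example, still_in_history historic_large_file_paths.txt || echo "history is clean" prints the message only when nothing from the list is left.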


References

  • https://www.warp.dev/terminus/remove-secret-git-history



Barış Özmen © 2025