Preliminary note: this post applies to a plain git repository.
For a GitLab repository, please refer to https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html. GitLab (and possibly other git-based forges) tends to be protective: it keeps track of the events in the repository and guards against inconsistencies in its data.
For example, a merge request creates references pointing to the branches used when the MR was opened; these references are internal to GitLab and are not updated by the process described below, so git gc will not try to reclaim the git objects used by these branches.
This post covers the situation where a file was committed to a git repository by mistake.
Common examples are sensitive data (passwords, confidential data) or binary data.
git rm is not enough, since the purpose of an SCM is to keep the history of the repository, so removing the file requires rewriting the history. The currently recommended command is git filter-repo; the use of git filter-branch or bfg is discouraged.
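To illustrate why git rm alone does not help, here is a minimal sketch using a throwaway repository. The directory, the file name secret.txt and its content are hypothetical and exist only for this demonstration; it needs nothing beyond plain git.

```shell
# Hypothetical throwaway repository, created only for this demonstration.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a file by mistake, then remove it with git rm.
echo "password=hunter2" > secret.txt
git add secret.txt
git commit --quiet -m "add secret by mistake"
git rm --quiet secret.txt
git commit --quiet -m "remove secret"

# The file is gone from the working tree, but the history still references it
# and its content can still be read back from the earlier commit.
commits_touching_secret=$(git rev-list --all --count -- secret.txt)
recovered=$(git show HEAD~1:secret.txt)
echo "commits touching secret.txt: $commits_touching_secret"
echo "recovered content: $recovered"
```

Anyone with a clone can run the same git show command, which is why the history itself has to be rewritten.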
git clone git@gitlab.inria.fr:namespace/repository_name.git
cd repository_name
# removes path_to_remove (i.e. keeps everything except path_to_remove)
git filter-repo --path path_to_remove --invert-paths
git gc --aggressive --prune=now
# unprotect the branches of the repository: on GitLab, go to "repository/Settings/Repository/Protected branches"
# git filter-repo removes the "origin" remote, hence the explicit URL
git push --all --force git@gitlab.inria.fr:namespace/repository_name.git
git push --tags --force git@gitlab.inria.fr:namespace/repository_name.git
# restore the protection on the branches of the repository
Note that the size of the repository on the GitLab server will probably not decrease immediately, since the server reclaims unreferenced objects during its own housekeeping.
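Before pushing, it can be worth checking locally that the rewrite had the intended effect. The helper below is a small sketch, not part of git or git filter-repo: the function name path_still_in_history is hypothetical, and it only relies on git rev-list.

```shell
# Hypothetical helper: succeeds (exit status 0) if at least one commit
# reachable from any ref of the current repository touches the given path.
path_still_in_history() {
    [ -n "$(git rev-list --all -- "$1")" ]
}

# Usage, from inside the rewritten clone:
#   path_still_in_history path_to_remove && echo "still present" || echo "gone"
#   git count-objects -vH   # on-disk size of the local object store
```

This only inspects the local clone; it says nothing about references kept internally by the server.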