Remove data from a git history

Preliminary note: this post applies to a plain git repository.
For a GitLab repository, please refer to https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html. GitLab (and possibly other git-based forges) tends to be protective: it keeps track of the events in the repository and guards against inconsistencies in its own data.
For example, a merge request creates references pointing to the branches used when the MR was opened. These references are internal to GitLab and are not updated by the process described below, so git gc will not reclaim the git objects they still reference.

This post covers the situation where a file was committed to a git repository by mistake.
Common examples are sensitive data (passwords, confidential data) or binary data.

git rm is not enough, since the purpose of an SCM is to keep the history of the repository: the file stays reachable through earlier commits. Removing it therefore requires rewriting the history. The currently recommended tool is git filter-repo; the use of git filter-branch or bfg is discouraged.
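To see why git rm alone does not help, here is a minimal sketch run in a throwaway repository. The file name secret.txt, its content and the demo identity are made up for the illustration; only core git commands are used.

```shell
# Throwaway repository; secret.txt and its content are hypothetical.
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

# Commit a file by mistake, then remove it with git rm.
echo "password=hunter2" > secret.txt
git add secret.txt
git commit -q -m "add secret by mistake"
git rm -q secret.txt
git commit -q -m "remove secret"

# The file is gone from the working tree, but history still knows about it:
# count the commits that touch the path...
commits_touching=$(git log --all --oneline -- secret.txt | wc -l | tr -d ' ')
# ...and read the content back from the parent commit.
recovered=$(git show HEAD~1:secret.txt)

echo "$commits_touching commits still reference secret.txt"
echo "recovered content: $recovered"
```

Anyone with a clone can run the same git show, which is why the history itself has to be rewritten.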

git clone git@gitlab.inria.fr:namespace/repository_name.git

cd repository_name



# removes path_to_remove (i.e. keeps everything except path_to_remove)

git filter-repo --path path_to_remove --invert-paths

git gc --aggressive --prune=now



# unprotect the branches of the repository: on GitLab, go to "repository/Settings/Repository/Protected branches"

git push --all --force git@gitlab.inria.fr:namespace/repository_name.git

git push --tags --force git@gitlab.inria.fr:namespace/repository_name.git

# restore the protection on the branches of the repository
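Before the forced push, it can be worth checking that the path really disappeared from every branch and tag. The helper below is only a sketch: path_in_history is a name invented for this post, and the demo runs in a throwaway repository with a made-up file kept.txt. In practice, call it from the rewritten clone with the path given to git filter-repo.

```shell
# Hypothetical helper: succeeds if any commit reachable from any ref
# touches the given path. Must be run from inside the repository.
path_in_history() {
    [ -n "$(git log --all -1 --format=%H -- "$1")" ]
}

# Demo in a throwaway repository (kept.txt is an illustrative file).
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false
echo "kept" > kept.txt
git add kept.txt
git commit -q -m "add kept.txt"

# kept.txt was committed; path_to_remove never was.
for p in kept.txt path_to_remove; do
    if path_in_history "$p"; then
        echo "$p: still in history"
    else
        echo "$p: not found in history"
    fi
done
```

If the helper still finds the removed path after the rewrite, pushing would publish it again, so it is safer to stop and investigate first.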

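Note that the size reported for the repository on the GitLab server will most likely not decrease immediately, since the server keeps its own internal references and decides on its own when to clean up.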
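The local clone, on the other hand, can be measured right away. The sketch below wraps git count-objects -v in a small function, repo_size_kib, a name chosen for this post. It adds the loose and packed object sizes, both reported by git in KiB. The demo runs on an empty throwaway repository; in practice, run it in the clone before and after the filter-repo and gc steps and compare the two numbers.

```shell
# Hypothetical helper: size of the object store of a repository, in KiB.
# Sums the "size" (loose objects) and "size-pack" (packed objects) fields.
repo_size_kib() {
    git -C "${1:-.}" count-objects -v |
        awk '/^size(-pack)?:/ { total += $2 } END { print total + 0 }'
}

# Demo on a throwaway repository.
demo_repo=$(mktemp -d)
git init -q "$demo_repo"
size_kib=$(repo_size_kib "$demo_repo")
echo "local object store: ${size_kib} KiB"
```

This only says something about the local clone: the number shown by the forge depends on the server's own housekeeping.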
