# Context
Jupyter notebooks contain the code to execute, its outputs, and metadata related to the notebook. Among those metadata is, for example, the number of times ("execution_count") a given cell has been run.
Below is an extract of a notebook file. Fields such as execution_count and outputs are among the data that produce noisy commits.
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "NULL"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "R_INSTALL_DIR=paste(\"../build\", \"wrp/sdrCore/R\", \"\", sep=\"/\" )\n",
        "dyn.load(paste(R_INSTALL_DIR, \"sidres\", .Platform$dynlib.ext, sep=\"\"))\n",
        "source(paste(R_INSTALL_DIR, \"sidres.R\", sep=\"\" ) )\n",
        "names(\"sidres\")\n"
      ]
    },
Keeping the outputs and metadata such as execution_count in a git repository can be annoying: each time someone runs a notebook, those data change, even if the code that was run has not changed. That produces git commits containing irrelevant changes, which makes the real changes harder to read.
The nbstripout script strips the outputs and metadata from a notebook, so that git only sees the code parts when checking whether the notebook was modified.
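To make the idea concrete, here is a minimal Python sketch of the kind of cleaning described above. It is not nbstripout's actual implementation, which handles more cases. The function name `strip_notebook` and the one-cell sample notebook are hypothetical and only serve the illustration.

```python
import copy


def strip_notebook(notebook: dict) -> dict:
    """Return a copy of the notebook without outputs and execution counts."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Hypothetical one-cell notebook, modelled on the extract above.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {
                    "data": {"text/plain": ["NULL"]},
                    "metadata": {},
                    "output_type": "display_data",
                }
            ],
            "source": ["names(\"sidres\")\n"],
        }
    ]
}

# Same code, run a second time: only execution_count changes.
rerun = copy.deepcopy(notebook)
rerun["cells"][0]["execution_count"] = 2

print(notebook == rerun)                                  # False
print(strip_notebook(notebook) == strip_notebook(rerun))  # True
```

The two raw notebooks differ although the code is identical, while their stripped versions are equal. That equality is what lets git ignore a simple re-run.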
# Installation of nbstripout with conda
nbstripout can be found at: https://github.com/kynan/nbstripout.
- Install nbstripout in the base environment of conda, to make it available in all the projects:
    conda install -c conda-forge nbstripout
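- ${HOME}/.gitattributes contains the lines below. Git only reads this file as the per-user attributes file if core.attributesfile points to it; by default it looks at $XDG_CONFIG_HOME/git/attributes, or ${HOME}/.config/git/attributes when that variable is unset.
    *.ipynb filter=nbstripout
    *.ipynb diff=ipynb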
- ${HOME}/.gitconfig contains the following sections:
    [filter "nbstripout"]
        clean = nbstripout
        smudge = cat
        required = true
    [diff "ipynb"]
        textconv = nbstripout -t
The ${HOME}/.gitconfig file can be filled in either using your favourite editor (should be vim, anyway) or using the following git commands:
git config --global filter.nbstripout.clean nbstripout
git config --global filter.nbstripout.smudge cat
git config --global filter.nbstripout.required true
git config --global diff.ipynb.textconv "nbstripout -t"
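Once the commands above have been run, it can be handy to check that the four settings are really in place. The following is a small sketch, assuming git is on the PATH. The names `EXPECTED`, `read_global_setting` and `find_mismatches` are hypothetical helpers introduced for this example; the only git call used is `git config --global --get`.

```python
import subprocess

# Expected values, copied from the git config commands above.
EXPECTED = {
    "filter.nbstripout.clean": "nbstripout",
    "filter.nbstripout.smudge": "cat",
    "filter.nbstripout.required": "true",
    "diff.ipynb.textconv": "nbstripout -t",
}


def read_global_setting(key):
    """Return the global git setting for key, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "--get", key],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # git is not installed or not on the PATH.
        return None
    # git exits with a non-zero status when the key is unset.
    return result.stdout.strip() if result.returncode == 0 else None


def find_mismatches(read_setting=read_global_setting):
    """Map each key whose value differs from EXPECTED to the value found."""
    mismatches = {}
    for key, expected in EXPECTED.items():
        found = read_setting(key)
        if found != expected:
            mismatches[key] = found
    return mismatches


if __name__ == "__main__":
    for key, found in find_mismatches().items():
        print(f"{key}: expected {EXPECTED[key]!r}, found {found!r}")
```

Run as a script, it prints nothing when the configuration matches and one line per missing or different setting otherwise. The `read_setting` parameter exists only so that the check can be exercised without touching the real git configuration.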