Here are this month's articles:
Hydra: A fresh look at configuration for machine learning projects
Hydra is a recently released open-source Python framework developed at Facebook AI that simplifies the development of research and other complex applications.
Great when you want to launch the same code with different options:
python my_app.py --multirun dataset=imagenet,cifar10 optimiser=adam,nesterov
This will launch 4 runs!
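Why 4 runs? A multirun sweep expands to the Cartesian product of the swept options. A quick sketch (plain Python, not Hydra's internals):

```python
# Expanding the sweep dataset=imagenet,cifar10 optimiser=adam,nesterov
# gives 2 x 2 = 4 combinations, one run per combination.
from itertools import product

datasets = ["imagenet", "cifar10"]
optimisers = ["adam", "nesterov"]

runs = [f"dataset={d} optimiser={o}" for d, o in product(datasets, optimisers)]
print(len(runs))  # 4
```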
On this subject, there are also some articles here and on the Facebook blog.
In short, Facebook AI Research (FAIR) built a library that combines deep learning and 3D objects. It also contains Mesh R-CNN, a model introduced last year that can reconstruct 3D object meshes from 2D images of interior spaces. There is also a differentiable mesh renderer (see the Facebook blog).
88 Lines of Code to Simulate a Real Physical Environment using MLS-MPM (Moving Least Squares Material Point Method).
Later, Hu pushed the work one step further and proposed DiffTaichi, a differentiable programming framework, which was accepted at ICLR 2020.
In the article's code, Hu created 10 different physical simulators and benchmarked their performance against existing baselines. Not only 2D scenes: more complex 3D elastomers can also be simulated. See the DiffTaichi GitHub and website.
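The core idea behind DiffTaichi is that you can backpropagate through every step of a simulation and use the resulting gradient to optimise physical parameters. A toy illustration in plain Python (not Taichi code): we differentiate through a forward-Euler loop by hand and tune an initial velocity so a particle reaches a target.

```python
# Toy differentiable simulation: a particle moves at constant velocity v0,
# integrated with forward Euler. We hand-derive the gradient of the final
# position w.r.t. v0 through all steps, then run gradient descent on v0.

N_STEPS, DT, TARGET = 100, 0.01, 3.0

def simulate(v0):
    x = 0.0
    for _ in range(N_STEPS):
        x += v0 * DT  # forward Euler: x_{t+1} = x_t + v0 * dt
    return x

def grad_loss(v0):
    # loss = (x_final - TARGET)^2; chain rule through every Euler step
    # gives d x_final / d v0 = N_STEPS * DT
    return 2.0 * (simulate(v0) - TARGET) * N_STEPS * DT

v0 = 0.0
for _ in range(200):
    v0 -= 0.5 * grad_loss(v0)  # plain gradient descent on the velocity
```

Real DiffTaichi simulators are far richer (contacts, elasticity, fluids), but the optimisation loop has this exact shape.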
ZeRO & DeepSpeed
The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of cost, time, and ease of code integration. Microsoft is releasing an open-source library called DeepSpeed, which vastly advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train 100-billion-parameter models. DeepSpeed is compatible with PyTorch.
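DeepSpeed is driven by a small JSON config passed at launch. A minimal sketch (field names taken from the DeepSpeed docs; values are illustrative, not tuned):

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 1 }
}
```

The `zero_optimization` block is where ZeRO's memory-partitioning comes in: higher stages shard more optimizer state across GPUs.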
Self-training with Noisy Student improves ImageNet classification: the code in the EfficientNet repo has been updated to include it!
I didn’t have time to read the paper yet, but I will soon. It sets a new SOTA on ImageNet with fewer parameters! Here is an extract from the abstract:
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2.
To achieve this result, we first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the generation of the pseudo labels, the teacher is not noised so that the pseudo labels are as accurate as possible. However, during the learning of the student, we inject noise such as dropout, stochastic depth and data augmentation via RandAugment to the student so that the student generalizes better than the teacher.
- Reformer: The Efficient Transformer
Here are the blog post, the paper, and a Siraj video. We will be using reformer-pytorch for a benchmark soon.
It’s quite a nice evolution of the original transformer, making it possible to train on just one GPU! It brings two main changes to the original architecture:
- LSH (Locality-Sensitive Hashing) Attention
- Reversible layers
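The reversible-layer trick is easy to show in isolation: instead of storing activations for the backward pass, each layer's inputs can be recomputed exactly from its outputs. A minimal sketch, where `F` and `G` are arbitrary stand-in functions rather than Reformer's real attention and feed-forward blocks:

```python
# Reversible residual layer: the input (x1, x2) can be recovered exactly
# from the output (y1, y2), so activations need not be kept in memory.

def F(x):  # stand-in for the attention sub-layer
    return 2.0 * x + 1.0

def G(x):  # stand-in for the feed-forward sub-layer
    return x * x

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # run the residual updates in reverse order
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

This memory saving, combined with LSH attention's reduced cost on long sequences, is what lets Reformer fit on a single GPU.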