Here is what I found interesting this month:
- EfficientNet
This paper is about scaling up networks. Once you have a small model that works well, you usually want to scale it up to get better performance. You can scale the width (wider layers), the depth (more layers) or the resolution (bigger input images). Previous work mostly scaled only one of these dimensions; this paper presents a method that scales all three at the same time in a balanced, efficient way. The base network has an architecture similar to MnasNet. Here is the Google blog post.
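To make the scaling idea concrete, here is a minimal sketch of how I understand the paper's compound scaling rule: depth, width and resolution are multiplied by α^φ, β^φ and γ^φ, where φ is a single coefficient you choose. The constants 1.2, 1.1 and 1.15 are the ones I recall from the paper for the base network; the function name `compound_scale` is my own and does not come from any library.

```python
# Compound scaling sketch (constants as I recall them from the paper; treat as assumptions).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases


def compound_scale(phi):
    """Return the multipliers applied to the base network for a given phi."""
    depth = ALPHA ** phi        # more layers
    width = BETA ** phi         # more channels per layer
    resolution = GAMMA ** phi   # larger input images
    # Cost grows roughly with depth * width^2 * resolution^2.
    flops = depth * width ** 2 * resolution ** 2
    return {"depth": depth, "width": width, "resolution": resolution, "flops": flops}


for phi in range(4):
    print(phi, compound_scale(phi))
```

The constants are chosen so that α·β²·γ² is close to 2, which means each increment of φ roughly doubles the compute budget. The released model variants round these multipliers, so this only shows the principle.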
- Annotated GPT-2
This is a nice blog article explaining the GPT-2 model. It also has links to other blog articles if you don’t know much about transformers.
- JAX
Introduced in 2018 by Google, JAX is a numerical computing library that combines a NumPy-like API, automatic differentiation, and GPU/TPU support.
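A small sketch of what that combination looks like in practice: a NumPy-style loss function, differentiated with `jax.grad` and compiled with `jax.jit`. The linear model and the toy data are made up for illustration.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Mean squared error of a linear model, written with the NumPy-like API."""
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)


# jax.grad differentiates with respect to the first argument (w);
# jax.jit compiles the resulting function for CPU, GPU or TPU.
grad_loss = jax.jit(jax.grad(loss))

x = jnp.array([[1.0, 2.0], [3.0, 4.0]])  # toy inputs (illustrative)
y = jnp.array([1.0, 2.0])                # toy targets (illustrative)
w = jnp.zeros(2)

g = grad_loss(w, x, y)
print(g)
```

The same code runs unchanged on an accelerator if one is available, which is most of the appeal.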
- Meena (see also Siraj's video)
It’s the latest chatbot from Google: a 2.6-billion-parameter Transformer-based model, evaluated with a new human-evaluation metric called Sensibleness and Specificity Average (SSA). According to Google, Meena can:
conduct conversations that are more sensible and specific than existing state-of-the-art chatbots.
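As I understand the metric, each response gets two yes/no human labels (is it sensible, is it specific), and SSA is the average of the two resulting rates. Here is a minimal sketch under that assumption; the labels are invented for illustration and the function name `ssa` is mine.

```python
# Each entry is (sensible, specific) for one chatbot response.
# These labels are invented for illustration only.
labels = [(True, True), (True, False), (False, False), (True, True)]


def ssa(labels):
    """Return (sensibleness, specificity, SSA) as fractions of all responses."""
    n = len(labels)
    sensibleness = sum(1 for sensible, _ in labels if sensible) / n
    # Assumption: a response only counts as specific if it is also sensible.
    specificity = sum(1 for sensible, specific in labels if sensible and specific) / n
    return sensibleness, specificity, (sensibleness + specificity) / 2


print(ssa(labels))
```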
- Using neural networks to solve advanced mathematics equations
It’s an AI system that can solve advanced mathematics equations using symbolic reasoning (the authors focus on integration and differential equations). To do so, they developed a way of representing mathematical expressions that is compatible with seq2seq methods: an expression is first represented as a tree, and the tree is then flattened into a sequence. It currently works for problems with one variable, and they plan to extend it to multiple-variable equations. They also built a dataset of millions of examples and, as they say:
When presented with thousands of unseen expressions — equations that weren’t part of its training data — our model performed with significantly more speed and accuracy than traditional, algebra-based equation-solving software, such as Maple, Mathematica, and Matlab.
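The tree-then-sequence representation mentioned above can be sketched in a few lines. I am assuming prefix (Polish) notation for the flattening step; the nested-tuple tree format, the `ARITY` table and the function names are my own choices for illustration.

```python
# Number of operands per operator (illustrative subset).
ARITY = {"+": 2, "*": 2, "pow": 2, "cos": 1}


def to_prefix(expr):
    """Flatten a nested-tuple expression tree into a list of prefix tokens."""
    if isinstance(expr, str):  # leaf: a variable or a constant
        return [expr]
    op, *args = expr
    tokens = [op]
    for arg in args:
        tokens += to_prefix(arg)
    return tokens


def from_prefix(tokens):
    """Rebuild the tree from prefix tokens; returns (tree, unused tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head not in ARITY:  # leaf
        return head, rest
    args = []
    for _ in range(ARITY[head]):
        arg, rest = from_prefix(rest)
        args.append(arg)
    return (head, *args), rest


# 3 * x^2 + cos(x) as a tree
tree = ("+", ("*", "3", ("pow", "x", "2")), ("cos", "x"))
tokens = to_prefix(tree)
print(tokens)
rebuilt, leftover = from_prefix(tokens)
```

Because every operator has a fixed number of operands, the sequence needs no parentheses and maps back to exactly one tree, which is what makes it a convenient input and output format for a seq2seq model.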
- SLIDE (Sub-LInear Deep learning Engine), from the paper "SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems". It’s an algorithm that could make CPUs a cheap way to train AI. The main idea is to add Locality-Sensitive Hashing (LSH) tables to each layer in order to find which neurons are likely to be active, and to compute the forward and backward passes only for those neurons. This leads to sparse computation (so less work) with uncoalesced memory access, which favors CPUs over GPUs. See also the Rice University news release and the Engadget article.
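To illustrate the LSH idea, here is a toy sketch in plain Python. It is not the SLIDE implementation: I use a single SimHash table (signed random projections) as the hash function, a single layer, and only the forward pass. All sizes and names are assumptions for illustration.

```python
import random

random.seed(0)
DIM, NEURONS, BITS = 16, 64, 4  # illustrative sizes


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Random hyperplanes defining the SimHash function (assumed LSH family).
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]


def simhash(vec):
    """Hash a vector to a tuple of sign bits; similar vectors tend to collide."""
    return tuple(dot(p, vec) >= 0 for p in planes)


# One weight vector per neuron, indexed into hash buckets once up front.
weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NEURONS)]
buckets = {}
for idx, w in enumerate(weights):
    buckets.setdefault(simhash(w), []).append(idx)


def sparse_forward(x):
    """Compute ReLU activations only for neurons sharing the input's bucket."""
    active = buckets.get(simhash(x), [])
    return {i: max(0.0, dot(weights[i], x)) for i in active}


x = [random.gauss(0, 1) for _ in range(DIM)]
out = sparse_forward(x)
print(len(out), "of", NEURONS, "neurons evaluated")
```

Only the neurons in the matching bucket are touched, so the work per input is a fraction of a dense layer, at the price of scattered memory access. In a training setting the backward pass would update only those same neurons, and the tables would need to be refreshed as the weights change.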