Paper presented by Gail Weiss to the Neural Sequence Model Theory Discord on the 24th of February 2022.
Gail's references:
On Transformers and their components:
- Thinking Like Transformers (Weiss et al, 2021) arxiv.org/abs/2106.06981 (REPL here: github.com/tech-srl/RASP; a short select/aggregate sketch appears after this list)
- Attention is All You Need (Vaswani et al, 2017) arxiv.org/abs/1706.03762
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al, 2018) arxiv.org/abs/1810.04805
- Improving Language Understanding by Generative Pre-Training (Radford et al, 2018) s3-us-west-2.amazonaws.com/op...
- Are Transformers universal approximators of sequence-to-sequence functions? (Yun et al, 2019) arxiv.org/abs/1912.10077
- Theoretical Limitations of Self-Attention in Neural Sequence Models (Hahn, 2019) arxiv.org/abs/1906.06755
- On the Ability and Limitations of Transformers to Recognize Formal Languages (Bhattamishra et al, 2020) arxiv.org/abs/2009.11264
- Attention is Turing-Complete (Perez et al, 2021) jmlr.org/papers/v22/20-302.html
- Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers (Wei et al, 2021) arxiv.org/abs/2107.13163
- Multilayer feedforward networks are universal approximators (Hornik et al, 1989) www.cs.cmu.edu/~epxing/Class/...
- Deep Residual Learning for Image Recognition (He et al, 2016) www.cv-foundation.org/openacc...
- Universal Transformers (Dehghani et al, 2018) arxiv.org/abs/1807.03819
- Improving Transformer Models by Reordering their Sublayers (Press et al, 2019) arxiv.org/abs/1911.03864
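For a feel of what the RASP language from the first reference looks like, here is a minimal Python sketch of its two core primitives, select and aggregate, used to reverse a sequence (one of the paper's worked examples). The list-based emulation and helper signatures here are illustrative assumptions; the tech-srl/RASP REPL linked above is the authoritative implementation.

```python
# Minimal sketch of RASP's two core primitives, select and aggregate,
# from "Thinking Like Transformers" (Weiss et al, 2021). Plain Python
# lists stand in for attention; this is not the tech-srl/RASP REPL.

def select(keys, queries, predicate):
    # Boolean selection matrix: row q, column k is True iff
    # predicate(keys[k], queries[q]) holds (RASP's abstraction of
    # an attention pattern).
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selector, values, default=None):
    # For each query row, average the selected numeric values
    # (RASP's abstraction of attention-weighted value averaging).
    # Non-numeric values only make sense when exactly one position
    # is selected per row, as in the reverse example below.
    out = []
    for row in selector:
        chosen = [v for v, s in zip(values, row) if s]
        if not chosen:
            out.append(default)
        elif all(isinstance(v, (int, float)) for v in chosen):
            out.append(sum(chosen) / len(chosen))
        else:
            out.append(chosen[0])
    return out

# Reverse the input by having each query position q attend to its
# mirror position k == length - 1 - q.
tokens = list("hello")
indices = list(range(len(tokens)))
flip = select(indices, indices, lambda k, q: k == len(tokens) - 1 - q)
print("".join(aggregate(flip, tokens)))  # prints "olleh"
```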
On RNNs:
- Explaining Black Boxes on Sequential Data using Weighted Automata (Ayache et al, 2018) arxiv.org/abs/1810.05741
- Extraction of rules from discrete-time recurrent neural networks (Omlin and Giles, 1996) www.semanticscholar.org/paper...
- Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples (Weiss et al, 2017) arxiv.org/abs/1711.09576
- Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning (Rabusseau et al, 2018) arxiv.org/abs/1807.01406
- On the Practical Computational Power of Finite Precision RNNs for Language Recognition (Weiss et al, 2018) aclanthology.org/P18-2117/
- Sequential Neural Networks as Automata (Merrill, 2019) aclanthology.org/W19-3901.pdf
- A Formal Hierarchy of RNN Architectures (Merrill et al, 2020) aclanthology.org/2020.acl-mai...
- Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets (Joulin and Mikolov, 2015) proceedings.neurips.cc/paper/...
- Learning to Transduce with Unbounded Memory (Grefenstette et al, 2015) proceedings.neurips.cc/paper/...
Paper mentioned in discussion at the end:
- Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth (Dong et al, 2021) icml.cc/virtual/2021/oral/9822