Pingpong Team ML Seminar, the Fifth
Season 5 seminar materials from Pingpong's ML research scientists and engineers
Hello! More than half of 2020, a year that has been turbulent in many ways, has already gone by. Here we have compiled and posted the Season 5 seminar slides from our scientists and engineers.
The seminar ran weekly through April and May 2020 and, as in past seasons, kicked off with no restrictions on topics. Like last season, we covered Transformers a lot, which grew out of our search for an answer to the question, "How can we make the best use of Transformers?" This time, two engineers also joined us and presented many empirical papers on the question, "How can we shrink a huge Transformer model without hurting its performance?", which made the fifth season all the more rewarding.
Dialogue Natural Language Inference (장성보)
- Dialogue Natural Language Inference
- Written by Sean Welleck et al. @ New York University & Facebook AI Research
- Published @ ACL 2019
Unified Language Model Pre-training for Natural Language Understanding and Generation (서상우)
- Unified Language Model Pre-training for Natural Language Understanding and Generation
- Written by Li Dong et al. @ Microsoft Research
- Published @ NeurIPS 2019
Numerical Reasoning Abilities of NLP Models (백영민)
- Do NLP Models Know Numbers? Probing Numeracy in Embeddings
- Written by Eric Wallace et al. @ Allen Institute for AI, Peking University & University of California, Irvine
- Published @ EMNLP 2019
- Injecting Numerical Reasoning Skills into Language Models
- Written by Mor Geva et al. @ Tel Aviv University & Allen Institute for AI
- Published @ ACL 2020
- Deep Learning for Symbolic Mathematics
- Written by Guillaume Lample and Francois Charton @ Facebook AI Research
- Published @ ICLR 2020
Knowledge Distillation for BERT (정욱재)
- DynaBERT: Dynamic BERT with Adaptive Width and Depth
- Written by Lu Hou et al. @ Huawei Noah’s Ark Lab
- Preprinted in arXiv 2020
- FastBERT: a Self-distilling BERT with Adaptive Inference Time
- Written by Weijie Liu et al. @ Peking University, Tencent Research & Beijing Normal University
- Published @ ACL 2020
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
- Written by Victor Sanh et al. @ Hugging Face
- Published @ 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019
- Patient Knowledge Distillation for BERT Model Compression
- Written by Siqi Sun et al. @ Microsoft Dynamics 365 AI Research
- Published @ EMNLP 2019
8-Bit Quantization of Transformer Models (정욱재)
- Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
- Written by Aishwarya Bhandare et al. @ Artificial Intelligence Products Group, Intel Corp.
- Published @ Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations - ICML 2019
Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data (김준성)
- Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data
- Written by Yen-Chang Hsu et al. @ Georgia Institute of Technology & Samsung Research America
- Published @ CVPR 2020
SYNTHESIZER: Rethinking Self-Attention in Transformer Models (정다운)
- SYNTHESIZER: Rethinking Self-Attention in Transformer Models
- Written by Yi Tay et al. @ Google Research, Mountain View
- Preprinted in arXiv 2020
Byte-level BPE: Neural Machine Translation with Byte-Level Subwords (이주홍)
- Neural Machine Translation with Byte-Level Subwords
- Written by Changhan Wang et al. @ Facebook AI Research, New York University & CIFAR Global Scholar
- Preprinted in arXiv 2019
Pruning Basics for Multi-Head Attention-based Models (홍승환)
- Are Sixteen Heads Really Better than One?
- Written by Paul Michel et al. @ Carnegie Mellon University & Facebook AI Research
- Published @ NeurIPS 2019
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
- Written by Elena Voita et al. @ Yandex, University of Amsterdam, University of Edinburgh, University of Zurich & Moscow Institute of Physics and Technology
- Published @ ACL 2019
- Reducing Transformer Depth on Demand with Structured Dropout
- Written by Angela Fan et al. @ Facebook AI Research
- Published @ ICLR 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (박채훈)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Written by Colin Raffel et al. @ Google
- Preprinted in arXiv 2019
What Can Neural Networks Reason About? (구상준)
- What Can Neural Networks Reason About?
- Written by Keyulu Xu et al. @ Massachusetts Institute of Technology, University of Maryland, Institute for Advanced Study & National Institute of Informatics
- Published @ ICLR 2020
In Closing
We have shared the materials from our machine learning seminar held in the spring of 2020. Although each presentation covers a different topic, taken together they show both the power of the Transformer and the effort it takes to bring that power out in full.
We continue to put everything we have into building AI that is more human than human, and we hope to keep sharing the fruits of that effort. Thank you.