AI: Megatron the Transformer, and its related language models – Dr Alan D. Thompson – Life Architect



What is Megatron?

Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA, based on work by Google.

Viz: Megatron MT-NLG (530B, September 2021)

Megatron-Turing Natural Language Generation model (MT-NLG). MT-NLG is the successor to Microsoft Turing NLG 17B and NVIDIA Megatron-LM 8.3B. The MT-NLG model is three times larger than GPT-3 (530B vs 175B).
Download source (PDF)
Contents: View the data (Google sheets)

Viz: Evolution of Megatron (2019-2021)

* RealNews is practically the same as Common Crawl News (CC-News). RealNews is 120GB from 5,000 domains from Common Crawl Dec/2016-Mar/2019. CC-News is 76GB from Common Crawl Sep/2016-Feb/2019. They are shown with different colours here (amber/blue) for interest only.
Download source (PDF)
Contents: View the data (Google sheets)

Timeline

November 2018: Google open sources BERT. Trained in four days.
Name: BERT (Bidirectional Encoder Representations from Transformers)
Lab: Google
Parameters: 345M
Dataset sources: English Wikipedia (12GB) + BookCorpus (4GB)
Dataset total size: 16GB
July 2019: Facebook AI and the University of Washington introduce RoBERTa.
Name: RoBERTa (Robustly optimized BERT approach)
Lab: FAIR (Facebook AI Research) + UW
Parameters: 125M (RoBERTa-base)
Dataset sources: BERT's original dataset:
English Wikipedia (12GB)
+ BookCorpus (4GB)
+ CC-News, 63 million English news articles from Sep/2016-Feb/2019 (76GB)
+ OpenWebText/Reddit upvoted (38GB)
+ Stories, 1M story documents from Common Crawl (31GB)
Dataset total size: 161GB
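The 161GB total is simply the sum of the component corpora; a quick sanity check in Python (sizes as listed above):

```python
# Corpus sizes in GB, as listed for RoBERTa's training data
corpora = {
    "English Wikipedia": 12,
    "BookCorpus": 4,
    "CC-News": 76,
    "OpenWebText": 38,
    "Stories": 31,
}
total_gb = sum(corpora.values())
print(total_gb)  # 161
```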
August 2019: NVIDIA introduces Megatron-LM (alongside a record 53-minute BERT-Large training run).
An 8.3 billion parameter transformer language model, trained with model and data parallelism on 512 GPUs.
Name: Megatron-LM
Lab: NVIDIA
Parameters: 8.3B (8,300M)
Dataset sources: Wikipedia + CC-Stories + RealNews + OpenWebText
Dataset total size: 174GB
April 2020: Facebook AI Research labs introduce Megatron-11b (RoBERTa).
Megatron-11b is a unidirectional language model with 11B parameters, based on Megatron-LM. Following the original Megatron work, FAIR trained the model using intra-layer model parallelism, with each layer's parameters split across 8 GPUs.
Name: Megatron-11B
Lab: FAIR (Facebook AI Research)
Parameters: 11B (11,000M)
Dataset sources: Same as RoBERTa. BERT's original dataset:
English Wikipedia (12GB)
+ BookCorpus (4GB)
+ CC-News, 63 million English news articles from Sep/2016-Feb/2019 (76GB)
+ OpenWebText/Reddit upvoted (38GB)
+ Stories, 1M story documents from Common Crawl (31GB)
Dataset total size: 161GB
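The intra-layer ("tensor") model parallelism used by the Megatron family can be sketched in a few lines: split a layer's weight matrix column-wise across devices, let each device compute a partial output, then concatenate the partials. This is a toy NumPy sketch with made-up shapes, not Megatron's actual implementation:

```python
import numpy as np

# Toy sketch of intra-layer (tensor) model parallelism: a linear
# layer's weight matrix is split column-wise across 8 "devices",
# each computes its slice of the output, and the slices are gathered.
rng = np.random.default_rng(0)
n_devices = 8
x = rng.standard_normal((4, 512))      # a batch of input activations
W = rng.standard_normal((512, 2048))   # the full weight matrix

# One column shard of W per device
shards = np.split(W, n_devices, axis=1)

# Each device computes its partial output independently
partials = [x @ w for w in shards]

# Concatenating the partials recovers the full layer output
y_parallel = np.concatenate(partials, axis=1)
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

In the real model each shard lives on a different GPU and the concatenation is a cross-GPU gather, but the arithmetic is the same.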
October 2021: NVIDIA and Microsoft introduce Megatron-Turing NLG 530B (The Pile).
Megatron-Turing Natural Language Generation model (MT-NLG). MT-NLG is the successor to Microsoft Turing NLG 17B and NVIDIA Megatron-LM 8.3B. The MT-NLG model is three times larger than GPT-3 (530B vs 175B parameters). Following the original Megatron work, NVIDIA and Microsoft trained the model on over 4,000 GPUs.
Name: Megatron MT-NLG
Lab: NVIDIA + Microsoft
Parameters: 530B (530,000M)
Dataset sources: The Pile v1 + more, totalling 15 datasets:
Books3
OpenWebText2 (Reddit links)
Stack Exchange
PubMed Abstracts
Wikipedia
Gutenberg (PG-19)
BookCorpus2
NIH ExPorter
Pile-CC
ArXiv
GitHub
+ Common Crawl 2020
+ Common Crawl 2021
+ RealNews, from 5,000 news domains (120GB)
+ CC-Stories, 1M story documents from Common Crawl (31GB)
Dataset total size: >825GB (my estimate is 1.86TB, or 1,863GB)
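For a sense of scale, a rough back-of-envelope estimate (my own, assuming 2-byte fp16 weights and ignoring optimizer state and activations): the 530B parameters alone occupy around a terabyte of memory, which is part of why thousands of GPUs are needed just to hold and train the model.

```python
# Assumption-laden estimate: 530B parameters stored in fp16
# (2 bytes each), before any optimizer state or activations.
params = 530e9
bytes_per_param = 2   # fp16
weight_gb = params * bytes_per_param / 1e9
print(round(weight_gb))  # 1060 GB for the weights alone
```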
"We live in a time where AI advancements are far outpacing Moore’s law. We continue to see more computation power being made available with newer generations of GPUs, interconnected at lightning speeds. At the same time, we continue to see hyperscaling of AI models leading to better performance, with seemingly no end in sight."
— NVIDIA and Microsoft (October 2021)

How to use it

Play with the Megatron-11B model at InferKit.com.
Dr Alan D. Thompson is an AI expert and consultant. With Leta (an AI powered by GPT-3), Alan co-presented a seminar called ‘The new irrelevance of intelligence’ at the World Gifted Conference in August 2021. He has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET. He is open to major AI projects with intergovernmental organisations and impactful companies, and is currently based in the US through 2022. Contact.

This page last updated: 11/Dec/2021. https://lifearchitect.ai/megatron/

© 2011-2021 Life Architect.