Relevant notes:
High-level overview
Large language models are algorithms that have been trained on vast amounts of text data, such as books, articles, and websites.
These models can understand the meaning of words and sentences, and can generate new text that sounds like it was written by a human.
They are used in a variety of applications, such as language translation, chatbots, and speech recognition (e.g. Whisper).
How do they work?
Training
The process of fitting the model's parameters to large amounts of text, typically by teaching it to predict the next token in a sequence.
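The training idea can be sketched with a toy bigram model in pure Python (hypothetical corpus, chosen for illustration; real LLMs use neural networks and gradient descent rather than counting, but the objective is the same: learn next-token statistics from text).

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model trains on billions of tokens.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": accumulate evidence of which token follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

# Normalize counts into next-token probability distributions.
model = {
    prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
    for prev, ctr in counts.items()
}

# After "the", each of cat/mat/dog/rug appeared once, so each gets 0.25.
print(model["the"])
```

The point of the sketch: training turns raw text into a probability distribution over what comes next, which is what the generation step later draws on.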
Inference
The process of using the trained model to generate new text, typically by predicting one token at a time given the text so far.
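Inference can be sketched the same way: repeatedly ask the model for the most likely next token and append it. This is greedy decoding over an assumed bigram table like the one above (a real LLM instead samples from a transformer's output distribution).

```python
from collections import Counter, defaultdict

# Assumed toy corpus, chosen so every token has a unique successor.
corpus = "the cat sat on a mat .".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, steps=5):
    """Greedy decoding: always pick the most frequent next token."""
    out = [start]
    for _ in range(steps):
        nxt = counts[out[-1]]
        if not nxt:          # no known successor: stop generating
            break
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))  # reproduces "the cat sat on a mat"
```

Real systems add sampling strategies (temperature, top-k, nucleus sampling) on top of this loop so the output is not always the single most likely continuation.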
Models
Some of the most well-known ones include:
GPT (Generative Pre-trained Transformer)
developed by OpenAI, including GPT-2, GPT-3, and others.
BERT (Bidirectional Encoder Representations from Transformers)
developed by Google.
RoBERTa (Robustly Optimized BERT Pretraining Approach)
developed by Facebook AI.
T5 (Text-to-Text Transfer Transformer)
developed by Google.
XLNet (Generalized Autoregressive Pretraining for Language Understanding)
developed by Carnegie Mellon University and Google.
Each of these models has its own strengths and weaknesses and is designed for specific applications.