Learning Large Language Models

LLMs are the new hot thing.

Relevant notes:




High-level overview

Large language models are neural networks trained on vast amounts of text data, such as books, articles, and websites.

These models can understand the meaning of words and sentences, and can generate new text that sounds like it was written by a human.
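At their core, these models repeatedly predict the most likely next token given the text so far. A toy sketch of that idea, assuming a word-level bigram (count-based) predictor rather than a real transformer:

```python
from collections import Counter, defaultdict

# Toy next-token predictor: count which word follows which (a bigram model).
# Real LLMs learn far richer statistics with a transformer over billions of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word after `word`, or None if unseen."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

Scaling this up — longer contexts, learned representations instead of raw counts — is essentially what the transformer architecture does.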

They are used in a variety of applications, such as language translation, chatbots, and speech recognition (e.g. Whisper).

How do they work?

Training: the process of teaching a model to predict text, and through that to capture the meaning of words and sentences.

Generation (inference): the process of deriving new sentences from old ones, one token at a time.
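The two processes above can be sketched in miniature. This is a hypothetical word-level bigram model standing in for a transformer: "training" estimates next-word probabilities from data, and "generation" samples new text from those probabilities:

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Training: estimate next-word probabilities from the data.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

probs = {w: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
         for w, ctr in counts.items()}

# Generation (inference): sample new text one token at a time.
def generate(start, length, seed=0):
    rng = random.Random(seed)
    words = [start]
    for _ in range(length):
        dist = probs.get(words[-1])
        if not dist:  # no known continuation; stop early
            break
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        words.append(nxt)
    return " ".join(words)

print(generate("the", 5))
```

An actual LLM replaces the count table with billions of learned parameters and conditions on the whole preceding context, but the train-then-sample loop is the same shape.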


Notable models

Some of the most well-known ones include:

GPT (Generative Pre-trained Transformer)

developed by OpenAI, including GPT-2, GPT-3, and others.

BERT (Bidirectional Encoder Representations from Transformers)

developed by Google.

RoBERTa (Robustly Optimized BERT Approach)

developed by Facebook AI.

T5 (Text-to-Text Transfer Transformer)

developed by Google.

XLNet (Generalized Autoregressive Pretraining for Language Understanding)

developed by Carnegie Mellon University and Google.

Each of these models has its own strengths and weaknesses and is designed for specific applications.