Terminology

Here are a few terms you should understand. You can learn more elsewhere, such as on Wikipedia.

AI = Artificial Intelligence
Attempts to make computers able to do tasks that humans can do. There have been many different approaches to AI over the years. For example, specialized search algorithms were behind the success of the Deep Blue chess computer that beat world champion Garry Kasparov. Nowadays, AI is largely synonymous with ML.
NL = natural language
A language used by humans to communicate with one another, such as English or Mandarin (as opposed to a constructed language such as Klingon, or a programming language).
GenAI = Generative AI
AI tools that produce artifacts, such as texts or pictures, based on natural-language input. Previous generations of AI required more specialized input and instructions, or acted primarily to make decisions, such as classifying inputs into categories.
Model
A compact mathematical representation of information. For example, the formula "$10 * number_of_pizzas + $5" models the cost of having `number_of_pizzas` pizzas delivered, where $10 represents the price per pizza and $5 is the delivery fee. It represents a number of possibilities, such as $15 for one pizza, $45 for four pizzas, etc. This model has two parameters: $10 and $5. `number_of_pizzas` is not a parameter, but an input to the model. A model may be exact (like this one) or approximate.
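The pizza model above translates directly into code. The $10 and $5 figures are the ones from the example; the distinction between parameters and inputs shows up as constants versus function arguments.

```python
# The two parameters of the model: price per pizza and delivery fee.
PRICE_PER_PIZZA = 10
DELIVERY_FEE = 5

def delivery_cost(number_of_pizzas):
    """number_of_pizzas is an input to the model, not a parameter."""
    return PRICE_PER_PIZZA * number_of_pizzas + DELIVERY_FEE

print(delivery_cost(1))  # 15
print(delivery_cost(4))  # 45
```

Changing a parameter (say, raising the delivery fee) changes the model; changing the input merely asks the same model a different question.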
ML = Machine Learning
Processing large amounts of data in order to create a model of that data. The data used to create the model is called the "training set". One use for the model is to classify a new datum: is it like the data in the training set, or not? Another use for the model is to generate a new datum that is like the ones in the training set.
Overfitting
Creating a model that works for the training set but does not generalize. For example, given the data { (1 pizza, $15), (4 pizzas, $45) }, an ML model might just memorize those values and be unable to predict the cost for 3 pizzas.
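The contrast can be sketched with the two data points from the example: a model that memorizes the training set versus one that captures the underlying linear relationship.

```python
# Training set from the example above: pizzas -> cost.
training_set = {1: 15, 4: 45}

# Overfit "model": memorizes the training set, generalizes not at all.
def memorized(n):
    return training_set[n]  # raises KeyError for 3 pizzas

# Generalizing model: the linear formula fitted to the same data.
def linear(n):
    return 10 * n + 5

print(linear(3))      # 35 -- a sensible prediction for unseen input
print(memorized(1))   # 15 -- works only on the training data
```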
LLM = Large Language Model
A model that is tuned to natural language input and output. The "large" part of the name indicates that the model contains billions of parameters. ChatGPT is a well-known example of an LLM.
Neural network
A model that is represented as a directed graph. Each edge in the graph carries a floating-point number. Each node in the graph takes some number of floating-point inputs and computes a floating-point output. The input nodes have no incoming edges, and the output nodes have no outgoing edges. The input and output are each a list (vector) of floating-point numbers.

A neural network's behavior depends on its "weights". Each edge has a weight. The weight indicates how much the source node (the "from" node of the edge) affects the target node (the "to" node of the edge). In other words, an edge weight indicates how much one neuron affects another.

The weights are set (and therefore the network's behavior is determined) by training. Traditionally, a neural network is intended for some purpose, such as answering questions about Pokémon. (These days, LLMs are trained to be useful for every purpose.) When the input is a question about Pokémon, the output should be the answer to that question. The neural network is trained by being asked many, many questions about Pokémon (often, training goes through all of the training data multiple times). When it gets the answer right, that is an indication that the weights are good. When it gets the answer wrong, the "loss" indicates how to change the current weights so that it does better in the future.
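A single node ("neuron") of such a network can be sketched in a few lines: a weighted sum of the inputs, plus a bias, passed through an activation function. The particular weights below are invented for illustration; in a real network, training sets them.

```python
import math

# Invented weights and bias for one neuron with two inputs.
# In practice, training (adjusting weights to reduce the loss) sets these.
weights = [0.6, -0.4]
bias = 0.1

def sigmoid(x):
    """A common activation function, squashing any number into (0, 1)."""
    return 1 / (1 + math.exp(-x))

def neuron(inputs):
    # Each weight says how much the corresponding input affects this node.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

output = neuron([1.0, 2.0])  # a single floating-point output
```

A full network is many such nodes wired together, with each node's output feeding the inputs of nodes downstream.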
Embedding = representation
The representation of some concept as a vector of floating-point numbers. Although a model may seem to operate on text, images, or other data, it really just processes vectors of floating-point numbers. An embedding converts some datum (such as a user's natural-language query to an LLM) into a vector of numbers, and can convert a vector of numbers back into a datum (such as the LLM's textual reply).
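A toy embedding table makes the idea concrete: each concept maps to a vector, and related concepts get nearby vectors. The particular words and numbers below are invented for illustration; real embeddings have hundreds or thousands of dimensions and are learned during training.

```python
import math

# Hypothetical embedding table: each word maps to a vector of floats.
embedding = {
    "pizza":  [0.9, 0.1, 0.0],
    "pasta":  [0.8, 0.2, 0.1],
    "python": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Related concepts end up with nearby vectors.
print(cosine_similarity(embedding["pizza"], embedding["pasta"]))   # near 1
print(cosine_similarity(embedding["pizza"], embedding["python"]))  # near 0
```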
Token
The atomic unit of the embedding. A word typically corresponds to one or two tokens. Note that in English and in programming-language parsing, "token" means a distinct lexical unit. `int a = b + c;` contains 7 programming-language tokens (whitespace is ignored), but it might be represented by fewer or more ML tokens.
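The sketch below shows why a word need not be one token: a tokenizer greedily splits text into pieces from a fixed vocabulary. The vocabulary here is hypothetical; real tokenizers (such as byte-pair encoding) learn their vocabulary from data.

```python
# Hypothetical subword vocabulary, for illustration only.
vocab = ["un", "break", "able", "token", "ize", "rs"]

def tokenize(word):
    """Greedily split a word into the longest known subword pieces."""
    tokens = []
    while word:
        for piece in sorted(vocab, key=len, reverse=True):
            if word.startswith(piece):
                tokens.append(piece)
                word = word[len(piece):]
                break
        else:
            tokens.append(word[0])  # unknown character: fall back to 1 char
            word = word[1:]
    return tokens

print(tokenize("unbreakable"))  # ['un', 'break', 'able']
print(tokenize("tokenizers"))   # ['token', 'ize', 'rs']
```

One English word became three ML tokens in each case.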
Transformer
A particular type of neural network that, for each input token, produces one output token. Transformers for translation of NL were an early success of ML. You can think of a translation transformer as taking, as input, each word of a text. For each input word, it outputs a word.

The output depends not only on the current word, but on all the words seen so far, so the translation is not simply one-to-one without any consideration of the context in which the word is used.

Furthermore, the output of most ML models depends on random choices. The model does not merely choose the most likely next word/token based on its training data, but chooses among all possible next words, based on their likelihood.
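The difference between always taking the most likely word and sampling among all candidates can be sketched as follows. The probabilities are invented for illustration; a real model computes them from its weights and the context so far.

```python
import random

# Hypothetical next-word probabilities after some prompt, e.g. "The pizza was".
next_word_probs = {"delicious": 0.6, "cold": 0.3, "purple": 0.1}

words = list(next_word_probs)
probs = list(next_word_probs.values())

# Greedy choice: always the single most likely word -- deterministic output.
greedy = max(next_word_probs, key=next_word_probs.get)

# Sampling: any word can appear, in proportion to its likelihood,
# which is why the same prompt can produce different outputs.
sampled = random.choices(words, weights=probs, k=1)[0]
```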
History
The transcript of all of the user's interactions with the model. The history affects the model's output: that is, the output depends not only on the user's current query but also all of the user's previous queries. Starting a new session generally resets the history.
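A sketch of how a chat front end might maintain history: the entire transcript is re-sent to the model on every turn, which is why earlier queries affect later output. The `model` argument here is a hypothetical function from a transcript to a reply.

```python
history = []

def ask(model, query):
    """Send the whole transcript, including the new query, to the model."""
    history.append({"role": "user", "content": query})
    reply = model(history)  # output depends on ALL previous turns
    history.append({"role": "assistant", "content": reply})
    return reply

def new_session():
    history.clear()  # starting a new session resets the history
```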
Hallucination
A false statement made by a model. The model does not understand the world or reason about it; rather, the model is just a set of weights in a network, that have been trained to respond in certain ways. When the user asks a question, the model outputs words that are related to the question, according to the weights in the network. Those words might or might not be the correct answer (whether or not the training data contained the answer to the user's question). Models can say, "I don't know", but they tend to be trained to give some answer rather than giving up, because most users seem to prefer that behavior.
Memorization
Sometimes a model can exactly output some fact or text from its training data. A pure model does not have a database of facts that it can look up. (Agentic models and RAG (retrieval-augmented generation) models do have such a capability, but let's disregard them for the time being.) Nonetheless, the network, by predicting the next word repeatedly, can regurgitate, say, the entire text of Romeo and Juliet or of some copyrighted work. (This might or might not require the training data to contain multiple copies of the text that gets memorized.)
Agent
A model that can act independently. In the past, ML models only answered questions, such as questions posed to ChatGPT or the implicit question in an editor, "What text could follow the cursor?" An agent is able to perform actions as well, such as editing a text, doing lookups on the web, or running a program.
Static analysis
A program analysis that examines the program's source code but does not run the program. An example is a compiler. By contrast, a dynamic analysis runs the program. Examples are testing and profiling.
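A minimal static analysis can be written with Python's standard `ast` module: it inspects the source code's syntax tree and reports every call to `eval`, without ever running the program being analyzed.

```python
import ast

# The program under analysis. It is never executed here.
source = """
x = eval(input())
print(x)
"""

tree = ast.parse(source)
eval_calls = [
    node.lineno
    for node in ast.walk(tree)
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Name)
    and node.func.id == "eval"
]
print(eval_calls)  # [2] -- eval is called on line 2
```

A dynamic analysis, by contrast, would have to run the program (and here would block waiting for input).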
Linting
A linter is a static program analysis tool that warns about poor style and error-prone constructs. For example, a linter might warn if you write `if (a = b) { ... }` because maybe you intended to write `if (a == b) { ... }`; and even if you meant `if (a = b)`, that is tricky and poor style.
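A classic construct that Python linters (such as pylint) warn about is a mutable default argument. The code is legal, but the default list is created only once and is shared across calls, which surprises most programmers.

```python
def append_item(item, items=[]):   # linters flag this default
    items.append(item)
    return items

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] -- surprising: the default list persisted!

# The idiomatic fix, which satisfies the linter:
def append_item_fixed(item, items=None):
    if items is None:
        items = []  # a fresh list on every call
    items.append(item)
    return items
```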
Verifier
A verifier is a static program analysis tool that mathematically proves some property about a program. For example, it might prove that the program never dereferences null, or it might prove that a procedure implementation is consistent with the procedure's specification.
git branch
A git branch is an independent line of development within a repository (internally, a movable pointer to a commit, not a copy of the repository). You might use one branch to implement feature1, a different branch to implement feature2, and a third branch to fix a bug in feature3.