
poo2thegeek t1_j67nbsu wrote

ChatGPT is a deep learning model, and deep learning is a subset of machine learning. A machine learning (ML) model is one in which the decisions the model makes come from a ‘training’ step rather than being hard-coded by a programmer.

A simple example is a model that tries to distinguish between different types of flower. You give this model some information about each flower (petal length, colour, etc.) as well as a ‘truth label’ (what a flower expert has said that flower is).

The model takes these numbers as inputs. The inputs are multiplied by a set of numbers (weights), have some other numbers added to them (biases), and the result is passed to the output. Some value is chosen as a cut-off (e.g., if output > 5 it’s flower A, otherwise it’s flower B). During training, whenever the model is wrong, all those numbers get changed a little bit in the direction that reduces the error, in a process known as stochastic gradient descent.
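To make that concrete, here is a minimal sketch of such a model in plain Python. Everything specific in it is my own assumption for illustration: the flower measurements are made up, the labels are 1 for flower A and 0 for flower B, the cut-off is 0.5 rather than 5 (because of that 0/1 labelling), and the error measure is squared error. The names `predict`, `train` and `classify` are hypothetical too.

```python
import random

def predict(weights, bias, features):
    """Multiply each input by its weight, add the bias, return the raw output."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def train(samples, labels, lr=0.01, epochs=200, seed=0):
    """Fit the weights and bias with stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # "stochastic": visit the examples in random order
        for i in order:
            error = predict(weights, bias, samples[i]) - labels[i]
            # Nudge every number a little in the direction that shrinks the error.
            weights = [w - lr * error * x for w, x in zip(weights, samples[i])]
            bias -= lr * error
    return weights, bias

# Made-up data (an assumption): (petal length, petal width) in cm.
# Truth labels: 1 = flower A, 0 = flower B.
samples = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (4.7, 1.4), (4.5, 1.5), (5.0, 1.7)]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train(samples, labels)

def classify(features, cutoff=0.5):
    """Apply the cut-off to the raw output to pick a flower."""
    return "A" if predict(weights, bias, features) > cutoff else "B"

print(classify((1.4, 0.2)), classify((5.0, 1.7)))
```

Each update only looks at one flower at a time, which is what separates stochastic gradient descent from computing the error over the whole data set before every change.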

In a deep learning model, the inputs are multiplied by a set of numbers and then passed to a ‘hidden layer’ of nodes (often called neurons). The outputs of that layer are then multiplied by another set of numbers and passed to the next layer. This keeps going for multiple layers until you get to the output layer.
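A rough sketch of that layer-by-layer pass, again in plain Python. The weights below are hand-picked numbers, not trained ones, and the function names are hypothetical. One thing I have added that the description above leaves out: between layers the values go through a simple non-linear step (here ReLU, which replaces negatives with zero). Without some step like that, a stack of multiplications collapses mathematically into a single multiplication, so the extra layers would add nothing.

```python
def layer(inputs, weights, biases):
    """One layer: each node multiplies the inputs by its own weights and adds a bias."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + b
        for node_weights, b in zip(weights, biases)
    ]

def relu(values):
    """Non-linear step (my addition, see the note above): negatives become zero."""
    return [max(0.0, v) for v in values]

def forward(inputs, layers):
    """Pass the inputs through every layer in turn; the last layer is the output."""
    values = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        values = layer(values, weights, biases)
        if index < len(layers) - 1:
            values = relu(values)  # applied to hidden layers only
    return values

# Hypothetical network: 2 inputs -> hidden layer of 3 nodes -> 1 output.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[1.0, -1.0, 0.5]], [0.2]),                                  # output layer
]

output = forward([1.0, 2.0], network)
print(output)
```

Training a network like this uses the same idea as before: compare the output with the truth label and nudge every weight and bias a little.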

This is an oversimplification, but it is the basis of how things like ChatGPT work. They simply look for patterns in text, and output the next word based on what they think best matches the pattern.
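As a toy illustration of "output the next word based on the pattern", here is a sketch that just counts which word most often follows each word in a tiny made-up sentence. To be clear, this is not how ChatGPT works internally: it uses a large neural network, not a lookup table of counts. The corpus and the function names are my own assumptions; the point is only to show what predicting the next word from patterns in text looks like.

```python
from collections import Counter, defaultdict

def build_bigram_counts(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Tiny made-up corpus (an assumption for illustration only).
corpus = "the cat sat on the mat and the cat slept on the sofa"
counts = build_bigram_counts(corpus)

print(next_word(counts, "the"))
print(next_word(counts, "on"))
```

A real language model looks at far more than the single previous word, and it learns its patterns through the training process described earlier rather than by counting.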

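What makes ChatGPT pretty powerful is (mostly) its size. It is reported to contain around 175 billion of those numbers (the parameter count published for GPT-3, the model family it is built on), all of which have to get updated during training, so it takes a long time and is very expensive to train.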
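For a sense of scale, here is a back-of-the-envelope calculation of how much memory it takes just to store 175 billion numbers. The bytes-per-number values are my assumptions (4 bytes for a 32-bit float, 2 bytes for a 16-bit float); I don't know what precision is actually used, and training needs extra memory on top of this.

```python
PARAMETERS = 175_000_000_000  # the 175 billion figure quoted above

def storage_gigabytes(parameter_count, bytes_per_parameter):
    """Memory needed just to hold the numbers, in gigabytes (1 GB = 1e9 bytes)."""
    return parameter_count * bytes_per_parameter / 1e9

# Assumed precisions, for illustration only.
for label, size in [("32-bit floats", 4), ("16-bit floats", 2)]:
    print(label, storage_gigabytes(PARAMETERS, size), "GB")
```

Either way the numbers alone come to hundreds of gigabytes, which is far more than a single consumer graphics card can hold.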
