
Wonko-D-Sane t1_j47iger wrote

TL;DR: Damn, this was ranty. Check out things like "perceptron", "Taylor series", and "gradient descent".

Rant:

"AI" is a misnomer for a programming model where you don't describe the logic, but rather provide the data to "program" software. The method via which this data is generated, organized, and processed into logic is called "training" The steps to train will depend on the type of model and type of problem.

The "model" is the rules for how the data interacts to form logic, but is devout of problem specific logic. However models do model the possible logic by describing the data flow on how to search for a solution.D depending on the type of "training", supervised vs unsupervised, and the type of model you are using: A discriminator/classifier vs a transformer, training will entail some specific details and could even go as far as use another AI model to generate the training data itself. Ultimately it just a bunch of number crunching in order to calculate the coefficients (scalar weights) used at the various inputs of the model's internals. The task of training is typically massively parallel over a large data set, so SIMD (single instruction, multiple data) type processors like GPUs or TPUs have been tremendously successful since the relationship of weights to data can be represented as a multi dimensional vector (array, list, or whatever else)

There are two steps to training:

1 - Black magic: selecting (or generating) the right training data, handcrafting/selecting a model (data flow) suitable for the problem, determining the source of the error signal, and deciding on the cost function. The success criteria of one model vs. another are biased by these factors.

2 - Crunching some numbers via math: linear algebra and calculus (processing)

The weird stuff is in 1:

A raw "model" is a combination of mathematical functions that can execute on data and a description of all possible data flow between functions. When I say function, i literally mean things like "add". For example consider a basic linear function of f(x)=ax+by+cz+k in a 3d space... so 3 inputs... the job of training is to calculate a, b, c and k such that line used to connect the 3 variables in your specific problem class.... obviously i can make a subtract out of the add by training a negative coefficient, or I can recuse the function to form an exponent... the basic building block for a deep learning neural network is the "perceptron" (or the McCulloch-Pitts neuron) which was inspired by the function of neuron dendrites (weighted inputs), soma (add them up) and axon (output to other neurons)... an overly fancy way to in fact just say 1 if there is enough input to fire, and 0 if there isn't. Enough input is defined as the dot product between the input vector x (so [x y z] ) and the coefficient vector ( for our line [a b c] ) plus k (the constant) being greater than 1

A line is not a particularly sophisticated function; however, by defining x as a line of some subset of other variables, you can get a generic polynomial. Engineers will recall the usefulness of the Taylor series: if you give me enough data, I can guess the right function pretty damn well. AI training is basically that, beating a polynomial into the shape of your data set without "over-fitting" the function (meaning hard-wiring your training data as the solution).
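A toy illustration of that "beating a polynomial into shape" idea (the function, noise level, and degrees are arbitrary, picked just for demonstration): fit a modest polynomial and a comically high-degree one to the same noisy data and watch the high-degree one memorize the noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 20)
y = np.sin(2 * x) + rng.normal(scale=0.1, size=x.shape)   # noisy samples of a smooth function

low = np.polyfit(x, y, deg=3)     # enough terms to capture the shape
high = np.polyfit(x, y, deg=15)   # enough terms to memorize the noise ("over-fitting")

x_new = np.array([1.2])           # a point just outside the training range
print(np.polyval(low, x_new), np.polyval(high, x_new))   # the high-degree fit goes off the rails
```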

These perceptrons are wired up as necessary to create more complex functions. Additionally, the model may also have some creative wiring, like "attention", which is message signalling back and forth between the neuron layers and creates interesting behaviours in Transformer models that are oddly successful at processing human language. (Until a model succeeds, this is the black magic; researchers are just chimping combinations of these structures till they find one that works for their problem.)
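For the curious, a bare-bones numpy sketch of the scaled dot-product attention used in Transformers (dimensions picked arbitrarily; real models add learned projections, multiple heads, masking, and so on):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each position mixes in information from
    every other position, weighted by how well its query matches their keys."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # how much each token "attends" to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V

rng = np.random.default_rng(2)
seq_len, d_model = 5, 4
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))
print(attention(Q, K, V).shape)   # (5, 4): one mixed vector per position
```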

Ultimately, for training to be efficient the model must be differentiable, meaning you can find its derivative. This allows you to use gradient descent (plain calculus) to find minima of the cost (or loss) function calculated on the error signal.
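A tiny toy example of why that matters: if you can compute the derivative of the cost, you can walk downhill toward a minimum. This is a hand-written 1-D loss, not a real model.

```python
# Toy 1-D loss with an obvious minimum at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def d_loss(w):
    return 2.0 * (w - 3.0)         # the derivative tells you which way is downhill

w = 0.0                            # start from a bad guess
lr = 0.1                           # learning rate (step size)
for _ in range(100):
    w -= lr * d_loss(w)            # step against the gradient
print(w)                           # ~3.0
```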

Once you have defined a model, "training" it is easy.

The process is simple (there's a rough code sketch after the list)...

  • pick random values for coefficients
  • from the training data set, pick the first sample set of inputs
  • perform dot products (lots of them; this is why SIMD/GPUs shine) across all features (input layer and intermediate layers)
  • generate output
  • compare the output to the reference in the training set (or against an objective function if rule-based in self-play, or against a discriminator if using a GAN or other such unsupervised learning), and calculate the error
  • use the error as input to the cost function
  • perform gradient descent on the weights to reduce the cost of the error
  • repeat until you have found the global minimum (the derivative of the error with respect to steps in your coefficients is as low as it is going to get, with specific heuristics to make sure you didn't accidentally land in a local minimum of the polynomial's error or "over-fit" the model), i.e. until you have minimized the cost of the error
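Here is a rough numpy sketch of that loop on the toy 3-input linear function from earlier (the learning rate, iteration count, and data are all made up for illustration):

```python
import numpy as np

# Toy version of the loop above: fit weights for f(x, y, z) = a*x + b*y + c*z + k
# with plain batch gradient descent on a squared-error cost.
rng = np.random.default_rng(3)

true_w, true_k = np.array([2.0, -1.0, 0.5]), 4.0
X = rng.normal(size=(256, 3))          # training inputs
y = X @ true_w + true_k                # reference outputs in the training set

w = rng.normal(size=3)                 # 1. pick random values for coefficients
k = 0.0
lr = 0.1

for _ in range(500):
    pred = X @ w + k                   # 2-4. dot products over the samples -> outputs
    err = pred - y                     # 5. compare to the reference, get the error
    cost = np.mean(err ** 2)           # 6. cost function
    grad_w = 2 * X.T @ err / len(X)    # 7. gradient of the cost w.r.t. the weights
    grad_k = 2 * err.mean()
    w -= lr * grad_w                   # step downhill
    k -= lr * grad_k

print(w, k)                            # ~[2.0, -1.0, 0.5] and ~4.0
```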

So training requires a large amount of linear algebra and calculus processing of "training data" and "internal weights".

Once you have the model trained and you are happy with it, re-running it on new input data, without adjusting the weights (coefficients), is called "inference". Since inference is basically evaluating an already-known polynomial (training discovered it; inferring just means plugging new values into the variables), the "algorithm" can be implemented as optimally as a physical circuit that requires very little computational resource. However, that wouldn't be very "upgrade-able"... so you are seeing "inference" engines that are basically programmable circuits, like FPGAs, for low-power, low-latency, realtime inference (running the algorithm derived by the model).
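In code terms (continuing the toy example above), inference is just the forward pass with the weights frozen:

```python
import numpy as np

# The coefficients are fixed by training; inference is just evaluating the
# function training discovered, with no error signal and no weight updates.
w = np.array([2.0, -1.0, 0.5])
k = 4.0

def infer(x_new):
    return x_new @ w + k

print(infer(np.array([1.0, 0.0, 2.0])))   # 7.0
```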

For what it is worth, AI is not a panacea. It is very likely that P is not equal to NP, meaning that not all problems are solvable by a polynomial, no matter how big your calculator or how badly you beat the math into submission. Get into too many "if this then that" conditions and your model is not differentiable, and you will struggle to generalize a loss function. Even in cases where you "found" a polynomial solution, you can't assume real data will always fit. Humans can learn as they infer; AI typically cannot, for the simple reason that the computational tasks are different between learning and inference, and for a large model you'd have to pack quite the overkill of melted sand with jammed-up lightning into it just so you could dynamically re-train.
