
abaxeron t1_j4brylo wrote

The simplest algorithm, used for basically every student-level task in my youth, was "backward propagation of errors" (backpropagation).

To run this model, you need three things: a big multi-layered filter (which will be our "AI": a set of matrices that the input data gets multiplied by, plus an activation function to mix in some non-linearity), a sufficient set of input data, and a sufficient set of corresponding output data. "Sufficient" meaning enough for the system to pick up on the general task.
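To make the "filter" picture concrete, here's a minimal sketch in Python/NumPy: two weight matrices with a sigmoid activation between them. The sizes and the sigmoid choice are my assumptions, just for illustration.

```python
import numpy as np

# A tiny two-layer "filter": each layer is a weight matrix
# followed by a non-linear activation (sigmoid here).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))  # input dim 3 -> hidden dim 4
W2 = rng.normal(size=(4, 2))  # hidden dim 4 -> output dim 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(x @ W1)  # layer 1: multiply by matrix, mix in non-linearity
    y = sigmoid(h @ W2)  # layer 2: same again
    return y

x = np.array([0.5, -1.0, 2.0])
y = forward(x)  # a 2-element output, each value squashed into (0, 1)
```

Without the activation function the two matrices would collapse into one (a product of matrices is just a matrix), which is why the non-linearity matters.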

Basically, you take the initial (empty or random) filter, feed it a piece of input data, and subtract the output from the corresponding desired result, finding what we call the "error": the difference between actual and desired output. Then you go backwards through the filter, layer by layer, and with simple, essentially arithmetic operations adjust the coefficients so that IF you fed the same data in again, the "error" would be smaller.
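One such backward step, for the two-layer sigmoid filter described above, can be sketched like this (squared error and a learning rate of 0.5 are my assumptions; the sigmoid's handy derivative s*(1-s) is what makes the arithmetic so simple):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, W1, W2, lr=0.5):
    # Forward pass, keeping the intermediate activations.
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)

    # The "error": difference between actual and desired output.
    err = y - target

    # Go backwards, layer by layer; sigmoid'(z) = s * (1 - s).
    delta2 = err * y * (1 - y)
    delta1 = (delta2 @ W2.T) * h * (1 - h)

    # Adjust the coefficients a small step against the error.
    W2 -= lr * np.outer(h, delta2)
    W1 -= lr * np.outer(x, delta1)
    return 0.5 * np.sum(err ** 2)  # loss BEFORE this step's update

rng = np.random.default_rng(1)
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(3, 1))
x = np.array([0.3, 0.7])
t = np.array([1.0])

loss_before = train_step(x, t, W1, W2)
loss_after = train_step(x, t, W1, W2)
# Feeding the same input again after the adjustment yields a smaller error.
```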

If you "overfeed" one and the same input to this model 10 million times, you'll end up with the system that can only generate correct result for this, specific input.

But when you randomly shift between several thousand different inputs, the filter ends up in an "imperfect but generally optimal" state.
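Putting the two ideas together, a training loop just shuffles through the dataset and applies the same backward step to each example in turn. A toy sketch, learning logical OR (the task, the bias trick of appending a constant 1 input, and all the hyperparameters are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: logical OR, with a constant 1 appended as a bias input.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [1.0]])

W1 = rng.normal(scale=0.5, size=(3, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))

lr = 1.0
for epoch in range(2000):
    for i in rng.permutation(len(X)):  # randomly shift between the inputs
        x, t = X[i], T[i]
        h = sigmoid(x @ W1)
        y = sigmoid(h @ W2)
        delta2 = (y - t) * y * (1 - y)
        delta1 = (delta2 @ W2.T) * h * (1 - h)
        W2 -= lr * np.outer(h, delta2)
        W1 -= lr * np.outer(x, delta1)

preds = sigmoid(sigmoid(X @ W1) @ W2)  # imperfect, but right for all inputs
```

The shuffling is the important part: cycling through varied inputs, rather than hammering one, is what keeps the filter general.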

The miracle of this algorithm is that it keeps working no matter how small the adjustments are, as long as they are made in the right direction.

One thing to keep in mind is that this particular model works best when the neuron activation function is monotonic, and the complexity of the task it can handle is limited by the number of layers.

As a student, I wrote a simple demonstration program on this principle that designed isotropic beams of equal resistance in response to given forces. In the process, I found that such a program requires two layers (since the task at hand is essentially a double integration).

I'm putting this response in because no one seems to have mentioned backward propagation of errors. Modern, complex AI systems, especially those working on speech/text, actually use more sophisticated algorithms; it's just that this one is the most intuitive and easiest for humans to understand.
