Jurph

Jurph t1_j73ozbe wrote

Hey, I dove into "Progressive Growing of GANs" without knowing what weights were. And now here I am, four or five years later. I've trained my own classifiers based on ViTs and DNNs, written Python interfaces for them, and I'm working on tooling to make Automatic1111's GUI behave better with Stable Diffusion. We've all got to start somewhere.

3

Jurph t1_j71nymu wrote

I recommend diving in, but getting out a notepad and writing down any term you don't understand. So if you get two paragraphs in and someone says *this simply replaces back-propagation, making the updated weights sufficient for the skip-layer convolution* and you realize that you don't understand back-prop or weights or skip-layer convolution ... then you probably need to stop, go learn those ideas, and then go back and try again.

For deep neural nets, back-propagation, etc., there will be a point where a full understanding requires calculus or other solid mathematical foundations. For example, you can't accurately explain why back-prop works without a basic intuition for the Chain Rule. Similarly, the graphs of activation functions like ReLU and sigmoid are only a useful shorthand if you have a decent algebra background. But you can "take it on faith" that it works, treat that part of the system like a black box, and revisit it once you understand what it's doing.
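
To make the Chain Rule point concrete, here's a tiny sketch of my own (not from any particular paper) showing the gradient of a single sigmoid neuron computed as a product of three simple derivatives -- which is all back-prop really is, repeated layer by layer:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One neuron: prediction = sigmoid(w * x), loss = (prediction - target)^2
x, target = 1.5, 0.0
w = 0.8

pred = sigmoid(w * x)
loss = (pred - target) ** 2

# Chain Rule: dloss/dw = dloss/dpred * dpred/dz * dz/dw
dloss_dpred = 2 * (pred - target)
dpred_dz = pred * (1 - pred)   # derivative of the sigmoid
dz_dw = x
grad_w = dloss_dpred * dpred_dz * dz_dw

print(f"loss={loss:.4f}, gradient w.r.t. w = {grad_w:.4f}")
```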

I would say the biggest piece of foundational knowledge is the idea of "functions", their role in mappings and transforms, and how iterative methods like Newton's Method converge on approximate solutions over several steps. A lot of machine learning is based on the idea of expressing the problem as a composed set of mathematical expressions that can be solved iteratively. Grasping the idea of a "loss function" that can be minimized is core to the entire discipline; the toy example below shows the shape of that loop.
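
A minimal, made-up illustration of that "guess, measure the loss, adjust, repeat" loop -- here just minimizing f(w) = (w - 3)^2 with plain gradient steps:

```python
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)   # derivative of the loss

w = 0.0      # initial guess
lr = 0.1     # step size (learning rate)
for step in range(25):
    w -= lr * grad(w)        # move against the gradient

print(f"approximate minimizer: w = {w:.4f}, loss = {loss(w):.6f}")
```

After a couple dozen steps w lands very close to 3, the true minimum -- no algebraic solving required, just iteration.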

18

Jurph t1_isxlxf9 wrote

> prioritise memorising docs and single line solutions

This is actually toxic to long-term best practices for a business whose intellectual property is stored as source code. Source code is for humans to read, and so one-lining it into a very clever but obscure invocation is costly in two ways: it costs the writer time & effort to "compress" it, and then it costs every maintainer time & effort to "decompress" it. Five well-commented lines of code with clear variable names are superior -- from both a business case and a security case -- to one line. In most scripting languages those one-liners compile (hand-wave, whatever) down to essentially the same bytecode as the five good lines, so there's typically no performance difference.
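
A contrived example of my own, just to show what I mean -- both versions count the words longer than three characters, and the runtime cost is the same:

```python
sentence = "Readable code is easier to maintain than clever code"

# The "clever" one-liner: cheap to run, expensive for the next reader.
count = sum(1 for w in sentence.lower().split() if len(w) > 3)

# The readable version: same result, same work, far easier to audit.
long_word_count = 0
for word in sentence.lower().split():
    is_long = len(word) > 3          # clear intermediate with a clear name
    if is_long:
        long_word_count += 1

assert count == long_word_count
```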

> claims they hire the top 3%

They hire 3% of candidates, so obviously it's the top 3%, and not an arbitrary slice of the candidate pool filtered by their bogus biases, right? I'm a hiring manager and this interview process sounds like total garbage. I suspect they have no data correlating their interview process with on-the-job productivity.

2