Ricenaros t1_jegqd3d wrote on March 31, 2023 at 10:04 PM

Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad

> The point is it's finite

Seems to indicate that you're talking about finite/infinite, no?

Ricenaros t1_jefllyl wrote on March 31, 2023 at 5:31 PM

Reply to comment by FermiAnyon in [D] Turns out, Othello-GPT does have a world model. by Desi___Gigachad

What does (in)finite have to do with anything? Infinity is an abstract mathematical concept used for modeling purposes and has nothing to do with physical reality.

Ricenaros t1_jeax41q wrote on March 30, 2023 at 6:09 PM

Reply to comment by RecoilS14 in [D] Simple Questions Thread by AutoModerator

I would suggest picking up either PyTorch or TensorFlow and sticking with one of these while you learn (personally I'd choose PyTorch). It'll be easy to go back and learn the other one if needed once you get more comfortable with the material.

Ricenaros t1_jeawpf3 wrote on March 30, 2023 at 6:06 PM

Reply to comment by sparkpuppy in [D] Simple Questions Thread by AutoModerator

It refers to the number of scalars needed to specify the model. At the heart of machine learning is matrix multiplication. Consider an input vector x of size (n x 1). Here is a linear transformation: y = Wx + b. In this case, the (m x n) matrix W (weights) and the (m x 1) vector b (bias) are the model parameters. Learning consists of tweaking W and b in a way that lowers the loss function. For this simple linear layer there are m*n + m scalar parameters (the elements of W and the elements of b).

Hyperparameters on the other hand are things like learning rate, batch size, number of epochs, etc.
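To make the count concrete, here is a minimal NumPy sketch of the linear layer y = Wx + b. The sizes n = 4 and m = 3 and the hyperparameter values are arbitrary assumptions picked for illustration, and x and b are stored as 1-D arrays rather than explicit (n x 1) and (m x 1) columns.

```python
import numpy as np

n, m = 4, 3  # input and output sizes (arbitrary, for illustration only)

rng = np.random.default_rng(0)
W = rng.normal(size=(m, n))  # weights: (m x n), learned during training
b = np.zeros(m)              # bias: m entries, learned during training
x = rng.normal(size=n)       # a single input vector with n entries

y = W @ x + b                # the linear transformation y = Wx + b

# Parameters: every scalar in W plus every scalar in b, i.e. m*n + m.
num_params = W.size + b.size
print(num_params)

# Hyperparameters are chosen before training rather than learned.
# These specific values are made up for the example.
hyperparams = {"learning_rate": 1e-3, "batch_size": 32, "epochs": 10}
```

With these sizes the count is 3*4 + 3 = 15, and none of the entries in `hyperparams` contribute to it.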

Hope this helps.

Ricenaros t1_j0zqyxw wrote on December 20, 2022 at 4:58 PM

Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev

Using vectorized operations isn't just a design choice of the language you're programming in; it's a fundamental concept for optimizing code. A for loop doesn't magically become fast just because you're using C++. For example, google "vectorize for loop c++" and you'll find tons of results. In general you don't want to be using explicit loops, especially for large-scale data problems.
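A rough sketch of the idea in Python, assuming NumPy is available: the same dot product written as an explicit element-by-element loop and as a single array operation. The function names and the array size are made up for the example. It only illustrates the Python side of the comparison, not the compiler-level vectorization the C++ search results cover.

```python
import time
import numpy as np

def dot_loop(a, b):
    """Dot product with an explicit element-by-element loop."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

def dot_vectorized(a, b):
    """The same computation expressed as one array operation."""
    return float(np.dot(a, b))

rng = np.random.default_rng(0)
a = rng.random(200_000)
b = rng.random(200_000)

t0 = time.perf_counter()
slow = dot_loop(a, b)
t1 = time.perf_counter()
fast = dot_vectorized(a, b)
t2 = time.perf_counter()

# Timings depend on the machine, so they are printed rather than assumed.
print(f"loop: {t1 - t0:.4f} s, vectorized: {t2 - t1:.4f} s")
```

Both versions compute the same value up to floating-point rounding; only the way the work is expressed differs.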

Ricenaros t1_izkmdxd wrote on December 9, 2022 at 8:11 PM

Reply to [D] Simple Questions Thread by AutoModerator

I'm trying to understand concepts involving feature engineering and correlation, because I feel like I'm encountering conflicting ideas about these two points. On the one hand, we can generate new features by combining our existing features, for example multiplying feature 1 by feature 2. This is said to improve ML models in some cases.

On the other hand, I have read that a desirable property of our input/output data is predictors being highly correlated with the target variable, but not correlated with other predictors. This idea seems to conflict with feature engineering, as our newly derived features can be correlated with the features they were constructed from. Am I missing something here?
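A small numerical illustration of the concern, on synthetic data that is assumed purely for the example: it measures how correlated a product feature is with one of the features it was built from. It is a sketch of the effect, not an answer to whether the correlation hurts a given model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Case 1: independent, strictly non-negative features (uniform on [0, 1)).
x1 = rng.random(n_samples)
x2 = rng.random(n_samples)
corr_positive = np.corrcoef(x1, x1 * x2)[0, 1]

# Case 2: the same kind of product, but with independent zero-mean features.
z1 = rng.standard_normal(n_samples)
z2 = rng.standard_normal(n_samples)
corr_centered = np.corrcoef(z1, z1 * z2)[0, 1]

print(f"corr(x1, x1*x2), non-negative features: {corr_positive:.2f}")
print(f"corr(z1, z1*z2), zero-mean features:    {corr_centered:.2f}")
```

In the first case the product is clearly correlated with x1. In the second, the population covariance is E[z1^2] * E[z2] = 0 for independent zero-mean features, so the sample correlation should land near zero. How much a derived feature overlaps with its inputs therefore depends on how the inputs are distributed and scaled.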

Ricenaros t1_iwdldik wrote on November 14, 2022 at 9:18 PM

Reply to [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe

I relate to the person in the room so hard. I often feel like I have no idea what I'm doing.