csreid

csreid t1_j91llzp wrote

Reply to comment by Optimal-Asshole in [D] Please stop by [deleted]

>Be the change you want to see in the subreddit.

The change I want to see is just enforcing the rules about beginner questions. I can't do that bc I'm not a mod.

40

csreid t1_j8dqcrs wrote

I like that /r/science (I think?) has verification and flair to show levels of expertise in certain areas, and strict moderation. I wouldn't hate some verification and a crackdown on low-effort bloom-/doom-posting around AI ("How close are we to Star Trek/Skynet?").

1

csreid t1_j2yva43 wrote

Yes, but I'm less sure about language models at the really high level (eg arriving at novel solutions to hard problems through LLMs).

Most ML in practice isn't about doing better than a person, it's about doing it faster and cheaper. Could a human who studied my viewing habits curate better Netflix recommendations for me? Obviously, but Netflix can't afford to do that for everyone and it would take forever.

There's also ML that's not based on data generated by humans. I know we're in the era of LLMs, but that's not all there is

1

csreid t1_j109ic5 wrote

>I also think that federation is a terrible way to run a social network

How come? I was a little put off by the federated nature, but it doesn't actually get in your way once you're in. I expected it to be more siloed, but it's not. Discoverability actually seems better than on Twitter bc people sort themselves into nice buckets. It's a little like if "ML twitter" was an actual thing rather than just a collection of accounts.

I am also into the idea of opting in to mod/admin policies that suit me, and I've become pretty skeptical of centralization after this whole fiasco.

2

csreid t1_j0wsuhz wrote

> the same group(s)

What do you mean? I'm not turned off by the groups on discord/slack, I'm turned off by the whole experience. It's like ppl are trying to jam a social network into a chat app (bc they are).

I'm using Tusky for Mastodon on my phone and it's kinda rad. I will probably never use the native Mastodon web interface. I also never used the Twitter web interface, but I'm learning that's maybe weird.

3

csreid t1_izbhhqs wrote

What you're describing is just called "question answering" in NLP afaik. A language model will take in a source document and a question and spit out either a generated answer to the question or a section of the source text containing the answer.

Check some of the QA models on huggingface to get an idea if you're not already familiar
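
If it helps, here's roughly what that looks like with a huggingface pipeline (a quick sketch on my end; the model name and the strings are just placeholder examples I picked):

```python
# Rough sketch (my example): extractive QA with the transformers pipeline.
# The model name and the question/context strings are just placeholders.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="How long is the warranty?",
    context="The warranty covers manufacturing defects for two years from the date of purchase.",
)
print(result["answer"], result["score"])  # e.g. "two years" plus a confidence score
```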

1

csreid t1_izbgnyy wrote

> The spatial components make me want to use a CNN, but each input being just a 1x3 vector rather than something bigger makes me think that's not possible?

The point of the convolution is to efficiently capture information from surrounding pixels when considering a single pixel. Back in the pre-DL olden days, computer vision stuff still involved convolutions; they were just handcrafted, since we had a lot of signal processing machinery we could use to, e.g., detect edges and such. In your case, you don't really have anything to convolve over.

You could try just feeding the coordinates into an MLP with the other covariates and it should be able to capture that spatial component.
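
Something like this is all I mean (a quick sketch with made-up sizes; I'm assuming 3 coordinates, some number of extra covariates, and a single regression target):

```python
# Quick sketch I made up: concatenate the (x, y, z) coordinates with the other
# covariates and push the whole flat vector through a small MLP.
import torch
import torch.nn as nn

n_covariates = 5  # placeholder: however many non-spatial features you have

model = nn.Sequential(
    nn.Linear(3 + n_covariates, 64),  # 3 coords + covariates in one flat vector
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),  # assuming a single regression target
)

batch = torch.randn(32, 3 + n_covariates)  # fake batch of 32 samples
pred = model(batch)                        # shape: (32, 1)
```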

1

csreid t1_iykq7xn wrote

And it's sometimes kinda hard to realize you're doing a bad job, especially if your bunk experiments give good results

I didn't have a ton of guidance when I was writing my thesis (so, my first actual research work) and was so disheartened when I realized my excellent groundbreaking results were actually just from bad experimental setup.

Still published tho! ^^jk

13

csreid t1_iwrh9rt wrote

>- Is reward a input?

Kind of, in that it comes from the environment

>- Is reward the process of constant retraining?

I'm not sure what this means

>- Is reward the process of labeling?

No, probably not, but again I'm not sure what you mean.

>- Can it only be used with mdp?

MDP is part of the mathematical backbone of reinforcement learning, but there's also work on decision processes that don't satisfy the Markov property (a good google term for your card-playing use case would probably be "partially observable Markov decision processes", for example)

>- Can it only be used in ql / dql?

Every bit of reinforcement learning uses a reward, afaik

>- I dont use cnn and images, can it be done without?

Absolutely! The training process is the same regardless of the underlying design of your Q/critic/actor/etc. function.

>- Lots of examples out there using «gym», can you do it without?

You can, you just need something which provides an initial state and then takes actions and returns a new state, a reward, and (sometimes) an "end of episode" flag. There's a tiny sketch of what I mean at the bottom of this comment.

>- Many examples use -100 to 100 as reward, should it not be -1 to 1?

Magnitude of reward isn't super important as long as it's consistent. If you have sparse rewards (eg 0 except on win or loss), it might help to have larger values to help the gradient propagate back through the trajectory, but that's just me guessing. You can always try scaling to -1/1 and see how it goes.

I read "Reinforcement Learning" by Sutton and Barto (2018 edition) over a summer and it was excellent. Well-written, clear, and extremely helpful. I think what you're missing is maybe the Bellman background context.

1

csreid t1_iwm0di2 wrote

I've always kicked around the idea of using a Hawkes process to model the concept of "momentum" in sports (which statistically doesn't seem to exist but has tons and tons of people who will chase you with weapons when you tell them that), but I'm lazy.
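
For anyone curious, this is roughly the shape of the idea (a toy sketch with made-up numbers, not anything from a real project):

```python
# Toy sketch: a Hawkes process intensity, where every past event (say, a
# scoring play at time t_i) adds a bump to the current event rate that decays
# exponentially. All parameter values here are arbitrary.
import math

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    # lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)) over past events t_i < t
    return mu + sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )

# Two scores in quick succession -> elevated rate right after, decaying later:
print(hawkes_intensity(5.0, [1.0, 4.5]))   # elevated
print(hawkes_intensity(30.0, [1.0, 4.5]))  # back near the baseline mu
```

The "momentum" question would basically come down to whether the fitted excitation term (alpha here) actually comes out above zero on real game data.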

You wouldn't be willing to open source the code here, would you? 😅

1

csreid t1_isat8n1 wrote

Sometimes I wonder what, when I'm old, is going to be the thing my generation was obviously backwards and awful and ignorant about. More and more, I think it's gonna be that lots of animals are smarter/more aware than we realized, and we're going to be severely but fairly judged for the way we treated them.

17