Comments

mysakbm t1_it661l7 wrote

I think that 2. is a data leak. That would explain the bump.

However, if you created the averaged vector from the train set and then a new one for the test set, then I'm wrong.

2

ultronthedestroyer t1_it6ah10 wrote

Wait - you used the labels, which you're trying to predict, to construct a feature for each user?

Sounds like you've just leaked your data unless you explain the methodology more.

1

Integral_humanist OP t1_it6m96u wrote

Nope, I used the same one. I created the vectors from the users in the training data, and then reused them for the test data.

Since I'm predicting the future behavior of the same users, this isn't a problem, right?
I'm essentially using past user behavior via value to predict future behavior of the same users with different content categories.

1
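
To make the setup discussed above concrete, here is a minimal sketch of building a per-user averaged feature from the training period only and then reusing it on later test rows. Everything in it is an assumption for illustration: the toy `events` data, the time-based split, and the helper names `fit_user_means` and `add_user_feature` are hypothetical, and the OP's actual vectors are not shown in the thread.

```python
from collections import defaultdict


def fit_user_means(train_rows):
    """Average the label per user, using training rows only."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, label in train_rows:
        totals[user] += label
        counts[user] += 1
    return {user: totals[user] / counts[user] for user in totals}


def add_user_feature(rows, user_means, default):
    """Attach the train-derived user average to each (user, label) row."""
    return [(user, label, user_means.get(user, default)) for user, label in rows]


# Hypothetical toy data: (user_id, label), already sorted by time.
events = [("a", 1), ("b", 0), ("a", 1), ("b", 1), ("a", 0), ("b", 0)]
split = 4  # assumed cutoff: first 4 events are the past, the rest the future
train, test = events[:split], events[split:]

user_means = fit_user_means(train)  # no test labels are touched here
global_mean = sum(label for _, label in train) / len(train)  # fallback for unseen users
test_features = add_user_feature(test, user_means, global_mean)
```

In this sketch the test labels never enter the feature, which matches what the OP describes. One caveat: if the same averages are also attached to the training rows, each training row's feature contains its own label, so the score on the training data will look better than it should. Computing the feature out-of-fold, or only from events strictly earlier than each row, avoids that.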