RogerKrowiak t1_j6diy8w wrote

I have a very basic question. If I have two columns of data:

"Students": ["John", "John", "Roger", "Eve", "John"]
"Sex": ["M", "M", "M", "F", "M"]

can I use a different encoding for each column? E.g. frequency encoding for students and binary encoding for sex? Thank you for your answer. If you have any tips for basic reading on this, it would be appreciated.

1

Maleficent-Rate6479 t1_j6fx4hp wrote

If your response variable is sex, then you need to make it binary; otherwise I do not see a problem.

2

qalis t1_j6ir4fh wrote

Yes, you can. In tabular learning, each variable is (in general) preprocessed independently of the others. In fact, in most cases you will apply different preprocessing to different columns, e.g. one-hot encoding followed by SVD for high-cardinality categorical variables, binary encoding for simple binary choices, and integer encoding for ordinal variables.
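
For the two columns in the question, a minimal sketch of per-column encoding could look like the following. It uses plain Python so it runs without extra libraries; the helper names `frequency_encode` and `binary_encode` and the output keys `Students_freq` and `Sex_bin` are made up for illustration, and encoding "F" as 1 is an arbitrary choice.

```python
from collections import Counter

# The two columns from the question.
data = {
    "Students": ["John", "John", "Roger", "Eve", "John"],
    "Sex": ["M", "M", "M", "F", "M"],
}


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def binary_encode(values, positive):
    """Map a two-valued column to 0/1; `positive` is the value encoded as 1."""
    return [1 if v == positive else 0 for v in values]


# Each column gets its own encoding, independently of the other.
encoded = {
    "Students_freq": frequency_encode(data["Students"]),
    "Sex_bin": binary_encode(data["Sex"], positive="F"),
}

print(encoded)
```

In a real pipeline the frequencies should be computed on the training split only and then reused for validation and test data, so that no information leaks from the held-out rows.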

2