Submitted by Thijs-vW t3_y9xjnh in MachineLearning

One-hot encoding is a popular method for encoding categorical variables due to its simplicity and interpretability. That interpretability also makes it a good fit for simple machine learning algorithms (such as polynomial regression, which, yes, I classify as AI). Nonetheless, there are alternatives, such as base-n encoding, which I find an appealing idea. However, I am not sure which encoding method neural networks prefer.

What is your advice? Should you prefer one-hot encoding, base-n encoding (and, if so, which n should you choose), or some other method?
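For concreteness, here is a minimal sketch of the two encodings side by side, using a hypothetical five-level variable (the names and sizes are placeholders; pandas and NumPy assumed):

```python
import numpy as np
import pandas as pd

# Hypothetical categorical variable with 5 levels
colors = pd.Series(["red", "green", "blue", "yellow", "purple"])
codes = colors.astype("category").cat.codes.to_numpy()  # integer codes 0..4

# One-hot encoding: one column per category (5 columns here)
one_hot = pd.get_dummies(colors)

# Base-2 encoding: each code written in binary, ceil(log2(5)) = 3 columns
n_bits = int(np.ceil(np.log2(len(colors.unique()))))
base2 = np.array([[(c >> b) & 1 for b in range(n_bits)] for c in codes])

print(one_hot.shape, base2.shape)  # (5, 5) (5, 3)
```

Base-n trades column count for density: fewer columns, but each category no longer gets its own axis.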

0

Comments


Travolta1984 t1_it80hch wrote

If you are using a neural network model, I would use Embedding layers instead, as they can learn a latent representation of your features (something that one-hot encoding cannot, for example).
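A minimal Keras sketch of this approach, assuming the feature is already integer-coded (the vocabulary size and embedding dimension below are placeholders):

```python
import tensorflow as tf

num_categories = 51   # placeholder vocabulary size
embedding_dim = 8     # placeholder latent dimension

model = tf.keras.Sequential([
    # Maps each integer category code to a learned dense vector
    tf.keras.layers.Embedding(input_dim=num_categories, output_dim=embedding_dim),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Each sample is a single integer code in [0, num_categories)
x = tf.constant([[3], [17], [42]])
print(model(x).shape)  # (3, 1)
```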

−3

Remarkable_Owl_2058 t1_itbh6oj wrote

I used embedding layers too, via the FastAI API. The results were better than with one-hot encoding!

0

Thijs-vW OP t1_it82bgp wrote

I looked into the embedding layer in Keras, but I was not impressed. Embedding layers are merely fancy lookup tables. That is nice when you want to encode sentences or the like, but I have a variable with only 51 categories. In this case, a dense layer applied to the one-hot encoded variable would achieve the same result, if I am not mistaken.

−6

TheCloudTamer t1_itau2sm wrote

Embeddings are dense layers, just for one-hot vector inputs.
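A quick numerical check of that equivalence, under the assumption that the dense layer has no bias and no activation and shares the embedding's weight matrix (a sketch, not a general proof):

```python
import numpy as np
import tensorflow as tf

num_categories, dim = 51, 8
emb = tf.keras.layers.Embedding(num_categories, dim)
dense = tf.keras.layers.Dense(dim, use_bias=False)

codes = tf.constant([0, 7, 50])
one_hot = tf.one_hot(codes, depth=num_categories)

emb_out = emb(codes)                  # lookup: rows of the embedding matrix
dense.build(one_hot.shape)            # create the kernel so we can copy weights
dense.set_weights(emb.get_weights())  # share the same (51, 8) weight matrix
dense_out = dense(one_hot)            # matmul: one_hot @ W selects the same rows

print(np.allclose(emb_out.numpy(), dense_out.numpy()))  # True
```

The embedding lookup just skips the sparse matrix multiplication, which is why it is usually preferred when the number of categories is large.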

3

Thijs-vW OP t1_itbb6xe wrote

Thanks for the clarification. So if I one-hot encode my categorical variable and feed it to a dense layer, I would achieve the same as with an embedding layer?

3