Viewing a single comment thread. View all comments

TheCoconutTree t1_j6jjb43 wrote

Discrete features as training data:

Say I am using SQL table rows as training data input for a deep neural net classifier. One of the columns contains a number from 1-5 representing a discrete value, say type of computer connection. It could be wifi, mobile-data, LAN, etc. What would be the best way to represent as input features? Right now I'm thinking split into a five dimensional vector, one for each possible value. Then pass 0 or 1 depending on whether a given feature is selected. I'm worried that including the range of values as a single vector would lead to messed up learning since one discrete value doesn't have any meaningful closeness to it's nearest discrete neighbor.

1

pronunciaai t1_j6l3lzv wrote

Your suggested approach is the correct one and is called "one-hot encoding". Your thinking about why an embedding (single learned value) is inappropriate is also accurate.

1