Comments

Marvsdd01 t1_ismmzln wrote

If I understood you correctly, you can handle dates and differences between dates as differences between the Unix timestamp representations of those dates. Any programming language should have a time manipulation library that offers APIs for converting dates to their Unix timestamp values. It is one approach, but it has its limitations. Using months, days and years as separate features is also possible. Cyclical encoding of dates is also possible, but I usually see that only when dealing with the hours, minutes and seconds of a date. Embedding these dates, if we're talking about using an ML algorithm to generate the representations, seems like a really, really bad idea, since, from my point of view, it adds work without adding any benefit to your solution. If you're not talking about that, then sorry, but I couldn't understand what you meant by these "embeddings of dates" :)
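
A minimal sketch of the timestamp-difference idea, using only the Python standard library. The helper names `to_unix` and `diff_in_days` are made up for illustration, and the dates are assumed to be ISO strings interpreted as UTC midnight:

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def to_unix(date_str: str) -> int:
    """Parse a YYYY-MM-DD string as UTC midnight and return Unix seconds."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def diff_in_days(start: str, end: str) -> float:
    """Difference between two dates, computed from their Unix timestamps."""
    return (to_unix(end) - to_unix(start)) / SECONDS_PER_DAY


# Hypothetical pair of dates, five years apart.
print(diff_in_days("2017-03-01", "2022-03-01"))
```

Pinning the time zone to UTC keeps the difference free of daylight-saving jumps; whether that matters depends on how the dates were recorded.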

6

tal_franji t1_isn26dv wrote

In some applications, one way to represent dates with periodic cycles is to encode the yearly, monthly and weekly period with a sin/cos pair. For example, if you think the yearly period (seasons) is meaningful, create two features: cos(year_day/365*2*pi) and sin(year_day/365*2*pi). (In financial applications day of month makes sense; in consumer traffic, day of week.)
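
As a rough sketch of that sin/cos encoding in plain Python: the function names and the choice of a 365.25-day year are assumptions for illustration, not something fixed by the approach.

```python
import math
from datetime import date


def cyclical(value: float, period: float) -> tuple[float, float]:
    """Map a position within a cycle to a (cos, sin) pair on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.cos(angle), math.sin(angle)


def date_features(d: date) -> dict[str, float]:
    """Encode day of year and day of week as cyclical features."""
    # tm_yday is 1-based, so shift it to start the cycle at 0.
    year_cos, year_sin = cyclical(d.timetuple().tm_yday - 1, 365.25)
    # weekday() returns 0 for Monday through 6 for Sunday.
    week_cos, week_sin = cyclical(d.weekday(), 7)
    return {
        "year_cos": year_cos,
        "year_sin": year_sin,
        "week_cos": week_cos,
        "week_sin": week_sin,
    }


print(date_features(date(2022, 7, 1)))
```

With this encoding, December 31 and January 1 end up close together in feature space, which a raw day-of-year integer would not give you.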

5

seiqooq t1_isn62an wrote

What is your ground truth? How does the data available for prediction differ from your GT? Depending on your answers, dates may add noise to your predictions.

2

Meddhouib10 t1_isnjh7k wrote

Yes, by embedding I meant transforming each number of months into a vector, like nn.Embedding in PyTorch (knowing that the difference between dates can't be more than 5 years, so 60 months). Thanks for the answer!
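
To make the idea concrete, here is a sketch of how the month difference could be turned into an integer index for such a lookup table. The names `months_between` and `embedding_index` are hypothetical, and counting only completed months is an assumption; the resulting index (0 to 60, so 61 rows) could then be fed to something like `nn.Embedding(61, dim)`.

```python
from datetime import date

MAX_MONTHS = 60  # 5 years, the stated upper bound on the difference


def months_between(earlier: date, later: date) -> int:
    """Number of whole months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # Do not count the final month if it has not been completed yet.
    if later.day < earlier.day:
        months -= 1
    return months


def embedding_index(earlier: date, later: date) -> int:
    """Clamp the month difference into the valid index range [0, MAX_MONTHS]."""
    return max(0, min(MAX_MONTHS, months_between(earlier, later)))


# Hypothetical pair of procedure dates.
print(embedding_index(date(2019, 6, 10), date(2021, 9, 2)))
```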

1

Meddhouib10 t1_isnjlvi wrote

I have pairs of dates and procedures/tests and their results. So having the date is important (for example, a patient had cancer 5 years ago and was treated using conization).

1

Marvsdd01 t1_iso6fyu wrote

So maybe you could convert every date to a Unix timestamp, which is an integer, take the difference between those integers, and then use a standard or min-max scaler to bring it into a fixed interval.

I do not think anyone has ever encoded dates as embeddings the way you're proposing, simply because you can already get this kind of representation by using Unix timestamps.
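
A small standard-library sketch of the two scaling options applied to timestamp differences. The function names and sample values are made up for illustration; in practice scikit-learn's `MinMaxScaler` and `StandardScaler` do the same job, and the statistics would normally be computed on the training set only.

```python
from statistics import mean, pstdev


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to spread over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values: list[float]) -> list[float]:
    """Shift to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical differences between pairs of dates, in seconds.
diffs = [86_400, 2_592_000, 31_536_000, 157_766_400]
print(min_max_scale(diffs))
print(standard_scale(diffs))
```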

3

ThrowThisShitAway10 t1_iso8i7p wrote

What seems to matter here is not the dates but rather the amount of time between scans, right?

1

seiqooq t1_isoymwj wrote

Gotcha. In that case I'd use a sinusoidal embedding like others have suggested. Another alternative is normalizing all of the dates onto some small range, e.g. [0,1].

1