james_mclellan t1_je5ru4r wrote on March 29, 2023 at 4:44 PM

Two questions:

(1) Does anyone create missing data when constructing models? Examples: searching for stronger relationships between the data set and the first and second derivatives of time series data; comparisons to the same day of week over the last N periods, or the same holiday over the last N periods; examining distance to an urban center for geodata.

(2) Does anyone use a model that falls back on functions when a match is not 100%? For example, "apple" may mean fruit, music, machines, music companies or machine companies -- instead of a number from 0 to 1 for each probable meaning, does anyone use models where the code "performs a test" to better disambiguate?

gmork_13 t1_je7fmm8 wrote on March 29, 2023 at 11:20 PM

I'm assuming you don't mean missing values in your dataset.

You can create 'missing' data, but if you derive it from data you already give to the model, you're sort of doing the model's work for it. For compute-efficiency reasons you might want to avoid giving it 'unnecessary' data, though what counts as unnecessary can be hard to define. Think about what you want the model to grasp in the first place.
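
To make the 'derived data' idea concrete, here is a minimal sketch of the features from question (1): first and second differences (a discrete stand-in for derivatives) and a same-weekday lag. It assumes evenly spaced daily data with no gaps; the function name `add_derived_features` and the column names are made up for illustration.

```python
from datetime import date, timedelta

def add_derived_features(dates, values, period=7):
    """Attach first/second differences and a same-weekday lag to each point.

    Assumes one observation per day with no gaps, so an offset of
    `period` rows is the same weekday one week earlier.
    """
    rows = []
    for i, (d, v) in enumerate(zip(dates, values)):
        # First difference: discrete approximation of the first derivative.
        diff1 = v - values[i - 1] if i >= 1 else None
        # Second difference: discrete approximation of the second derivative.
        diff2 = v - 2 * values[i - 1] + values[i - 2] if i >= 2 else None
        # Value on the same weekday one period earlier.
        lag = values[i - period] if i >= period else None
        rows.append({
            "date": d,
            "value": v,
            "diff1": diff1,
            "diff2": diff2,
            "same_weekday_last_week": lag,
        })
    return rows

# Toy series: value = i**2, so the second difference is constant.
start = date(2023, 3, 1)
dates = [start + timedelta(days=i) for i in range(10)]
values = [float(i * i) for i in range(10)]
rows = add_derived_features(dates, values)
```

Whether features like these help depends on the model: something with a long enough context window could in principle learn them from the raw series, which is the 'doing the work for it' trade-off.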
I'm not sure what you mean by performing a test. If you were to train a language model, the context of the word would define its meaning. You can always take the output probabilities of a model and do something with them if you'd like (for instance, if the output is spread over lots of low-probability alternatives, run some extra step).
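
A rough sketch of what 'doing something with the output probabilities' could look like, assuming the model returns a dict of sense -> probability. Everything here is hypothetical: the `keyword_test` rule, the keyword sets, and the 0.6 threshold are placeholders, not a recommended method.

```python
def keyword_test(context):
    """Hypothetical hand-written test: look for telltale words in the context."""
    words = set(context.lower().split())
    if words & {"pie", "juice", "eat", "orchard"}:
        return "fruit"
    if words & {"iphone", "mac", "stock", "shares"}:
        return "company"
    return None  # the test is inconclusive

def disambiguate(probs, context, min_top=0.6):
    """Trust the model when it is confident, otherwise fall back to a test.

    probs: dict mapping each candidate sense to the model's probability.
    min_top: assumed confidence threshold below which the fallback runs.
    """
    label, p = max(probs.items(), key=lambda kv: kv[1])
    if p >= min_top:
        return label
    tested = keyword_test(context)
    # If the test is inconclusive too, keep the model's best guess.
    return tested if tested is not None else label

confident = disambiguate({"fruit": 0.9, "company": 0.1}, "apple stock rose")
unsure = disambiguate({"fruit": 0.5, "company": 0.5}, "apple stock rose")
no_signal = disambiguate({"fruit": 0.55, "company": 0.45}, "I like apple")
```

Here `confident` stays with the model's top sense, `unsure` is decided by the keyword test, and `no_signal` falls back to the model's best guess because the test finds nothing.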