Submitted by rapp17 t3_zspe6r in MachineLearning

I have a data set with three columns and want to predict a numerical value. The data set is divided into groups of 50 rows each. There is a hard constraint: the sum of the predicted values in each group of 50 rows must equal the value in one column for that group. What model can I use for this, if any?

5

Comments


www3cam t1_j1bxx29 wrote

One hack that I often use for this problem is to take any gradient-based machine learning approach and just add a Lagrangian-style penalty to the cost function. As training converges, gradually increase the penalty weight until the constraint holds to within your tolerance for error. If your system is a convex program or something similarly nice, looking for a constrained solver that matches your problem formulation may be a good bet.
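
For a concrete picture of that hack, here's a minimal sketch in PyTorch using a simple quadratic penalty on the per-group sum (the synthetic data, model shape, and penalty schedule are all made up for illustration):

```python
import torch
import torch.nn as nn

# Toy data: 200 groups, each with 50 rows of 3 features, a per-row target,
# and a per-group total that the 50 predictions must sum to (all synthetic).
n_groups, rows = 200, 50
X = torch.randn(n_groups, rows, 3)
y = torch.randn(n_groups, rows)
group_totals = y.sum(dim=1)  # in real data this comes from the constraint column

model = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

lam = 0.1  # penalty weight; increased over training until the sums hold within tolerance
for epoch in range(50):
    for g in range(n_groups):
        preds = model(X[g]).squeeze(-1)                     # 50 predictions for this group
        mse = torch.mean((preds - y[g]) ** 2)               # ordinary regression loss
        violation = (preds.sum() - group_totals[g]) ** 2    # squared constraint violation
        loss = mse + lam * violation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    lam *= 1.1  # gradually tighten the constraint as training converges
```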

5

Dylan_TMB t1_j1cb0nx wrote

This doesn't sound like you're trying to predict a number; it sounds like you want to predict 50 numbers.

3

Sir-Rhino t1_j1a3o4a wrote

I don't know much about models for tabular data.

That being said, it sounds like you want to pick/select a certain member/item from a group of 50? Instead of predicting the corresponding value, maybe you could just predict the index of that item.

Don't have much more to add. If you provide a bit more context about what you're trying to achieve, someone may be able to provide more feedback. BTW, this type of post is probably more suited for /r/learnmachinelearning, so try there too.

1

blablanonymous t1_j1a5ovy wrote

Maybe give an example? Maybe I’m just slow, but I think it’s ambiguous the way it’s phrased.

2

rapp17 OP t1_j1a9akp wrote

I have a quantity of 100 units that needs to be allocated across 50 days. The data set consists of hundreds of these scenarios: a quantity X allocated across 50 days. The quantity varies, but the number of days stays the same. What's the best way to get an ML model to do this?

1

blablanonymous t1_j1aacs3 wrote

It sounds more like constrained optimization than ML, but the explanation is still too vague for me to be helpful, so I’m giving up. Good luck.

3

flapflip9 t1_j1dk0ke wrote

Sounds like you want to predict 50 values given 150 inputs. ML might work, but I doubt you'd have enough data to avoid overfitting.

It also sounds like there isn't a single correct numerical answer for any given day; rather, you're trying to find a decent distribution. So look into constrained optimization first, similar to budget allocation or task distribution problems.
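
For the constrained-optimization framing, a minimal sketch with SciPy might look like this (the quadratic objective and per-day profile here are placeholders; the real objective would come from the data):

```python
import numpy as np
from scipy.optimize import minimize

days, total = 50, 100.0
target_profile = np.random.rand(days)              # made-up per-day preference/forecast
target_profile *= total / target_profile.sum()

def objective(alloc):
    # Stay close to the per-day profile (placeholder for whatever the data suggests).
    return np.sum((alloc - target_profile) ** 2)

constraints = [{"type": "eq", "fun": lambda a: a.sum() - total}]  # allocations sum to 100
bounds = [(0.0, None)] * days                                     # no negative allocations

result = minimize(objective, x0=np.full(days, total / days),
                  bounds=bounds, constraints=constraints)
print(result.x.sum())  # == 100 up to solver tolerance
```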

1

LimitedConsequence t1_j1d91l2 wrote

My first thought is to predict the 50 numbers simultaneously and apply softmax to the output (enforcing a sum of 1), then scale that so it sums to the desired total for each group.
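
A rough sketch of that output head in PyTorch, with an illustrative input size and group size (the names here are made up):

```python
import torch
import torch.nn as nn

class AllocationHead(nn.Module):
    """Predicts 50 non-negative values that sum exactly to a given group total."""
    def __init__(self, in_features, group_size=50):
        super().__init__()
        self.linear = nn.Linear(in_features, group_size)

    def forward(self, features, group_total):
        logits = self.linear(features)
        shares = torch.softmax(logits, dim=-1)      # positive, sums to 1
        return shares * group_total.unsqueeze(-1)   # rescale so the sum equals the total

# Example: batch of 8 groups, 10 input features each, totals of 100 units
head = AllocationHead(in_features=10)
preds = head(torch.randn(8, 10), torch.full((8,), 100.0))
print(preds.sum(dim=-1))  # ~100 for every group (up to float error)
```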

1

blablanonymous t1_j1dp14o wrote

If you just want to normalize everything, why not create a custom activation function that just does that?

1

LimitedConsequence t1_j1ep2ui wrote

I'm not sure I understand what you mean?

1

blablanonymous t1_j1f4wy6 wrote

You’re suggesting softmax, then normalizing to whatever is required. Softmax takes the exponential of each value and then performs the normalization. You might not want that exponential. You can create a layer that does the normalization directly, via a custom activation function in the last layer.
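
A sketch of what that custom last-layer activation could look like, assuming softplus is used to keep the raw outputs positive before dividing by the sum (that choice is mine, not something stated above):

```python
import torch
import torch.nn.functional as F

def normalize_to_total(raw, group_total, eps=1e-8):
    """Map raw outputs to non-negative values that sum to group_total,
    without softmax's exponential emphasis on the largest logits."""
    positive = F.softplus(raw)                                    # keeps values positive
    shares = positive / (positive.sum(dim=-1, keepdim=True) + eps)
    return shares * group_total.unsqueeze(-1)                     # enforce the per-group sum

out = normalize_to_total(torch.randn(4, 50), torch.tensor([100.0, 80.0, 120.0, 100.0]))
print(out.sum(dim=-1))  # matches the requested totals
```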

1

LimitedConsequence t1_j1fdayy wrote

Yes, I was implicitly talking about the final activation function. Regarding softmax, he said in another comment, "I have a quantity of 100 units that need to be allocated across 50 days," so I took that to imply the outputs should be positive (hence the exponential is reasonable).

2