Submitted by rapp17 t3_zspe6r in MachineLearning

I have a data set with three columns and want to predict a numerical value. The data set is divided into groups of 50 rows each. There is a necessary constraint: the sum of the predicted values in each group of 50 rows must equal the value in one column for that group. What model can I use for this, if any?

5

Comments


Sir-Rhino t1_j1a3o4a wrote

I don't know much about models for tabular data.

That being said, it sounds like you want to pick/select a certain member/item from a group of 50? Instead of predicting the corresponding value, maybe you could just predict the index of that item.

Don't have much more to add. If you provide a bit more context about what you're trying to achieve someone may be able to provide more feedback. BTW this type of post is probably more suited for /r/learnmachinelearning , so try there too.

1

rapp17 OP t1_j1a9akp wrote

I have a quantity of 100 units that needs to be allocated across 50 days. The data set consists of hundreds of these scenarios: a quantity X allocated across 50 days. The quantity varies but the number of days remains the same. What's the best way to get an ML model to do this?

1

www3cam t1_j1bxx29 wrote

One hack that I often use for this problem is to take any gradient-based machine learning approach and add a Lagrangian penalty term to the cost function. As training converges, gradually increase the Lagrangian multiplier until the constraint holds to within your tolerance for error. If your system is a convex program or something similarly nice, looking for a constrained solver that matches your problem formulation may be a good bet.
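A minimal sketch of the penalty idea above, assuming a numpy setting where `pred` is one group's predictions and `group_sum` is the required total; the function name and `lam` (the penalty weight you ramp up over training) are illustrative, not from any particular library:

```python
import numpy as np

def penalized_loss(pred, target, group_sum, lam):
    """MSE plus a quadratic penalty on the group-sum constraint.

    lam is the penalty weight; increase it over training until the
    constraint holds within your error tolerance.
    """
    mse = np.mean((pred - target) ** 2)
    violation = (pred.sum() - group_sum) ** 2
    return mse + lam * violation
```

Plugging this in as the training loss and scheduling `lam` upward is the "gradually increase" step; the predictions are only softly constrained at any finite `lam`.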

5

Dylan_TMB t1_j1cb0nx wrote

This doesn't sound like you're trying to predict a number; it sounds like you want to predict 50 numbers.

3

LimitedConsequence t1_j1d91l2 wrote

My first thought is to predict the 50 numbers simultaneously, apply softmax to the output (enforcing that it sums to 1), then scale the result so it sums to the desired total for each group.
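The softmax-then-scale idea can be sketched like this (a minimal numpy version; in practice `logits` would be the 50 raw outputs of the network's last layer, and the function name is illustrative):

```python
import numpy as np

def softmax_allocate(logits, total):
    """Turn raw scores into positive allocations that sum to `total`."""
    z = logits - logits.max()   # subtract max for numerical stability
    w = np.exp(z)
    w = w / w.sum()             # softmax: positive weights summing to 1
    return total * w            # rescale so the group sums to `total`
```

Because softmax outputs are strictly positive and sum to 1, the scaled result satisfies the group-sum constraint exactly, by construction.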

1

flapflip9 t1_j1dk0ke wrote

Sounds like you want to predict 50 values, given 150 inputs. ML might work, but I doubt you'd have enough data to avoid overfitting.

It also sounds like there isn't a single correct numerical answer for any given day; rather, you're trying to find a decent distribution. So look into constrained optimization first, similar to budget-allocation or task-distribution problems.

1

blablanonymous t1_j1f4wy6 wrote

You’re suggesting softmax, then normalizing to whatever total is required. Softmax takes the exponential of each value and then normalizes. You might not want that exponential. You can instead do the normalization through a custom activation function in the last layer.
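A sketch of that alternative, assuming the network's raw outputs are first made non-negative (e.g. via ReLU or softplus) and then normalized directly, with no exponential; names and the small `eps` guard are illustrative:

```python
import numpy as np

def normalize_allocate(raw, total, eps=1e-8):
    """Scale raw outputs so the group sums to `total`, without softmax.

    Clips negatives to zero (standing in for a ReLU/softplus layer),
    then divides by the sum; eps guards against an all-zero group.
    """
    w = np.maximum(raw, 0.0)            # keep allocations non-negative
    return total * w / (w.sum() + eps)  # linear normalization, no exp
```

Compared with softmax, this keeps the allocation proportional to the raw outputs rather than to their exponentials, so it doesn't sharpen the distribution toward the largest score.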

1

LimitedConsequence t1_j1fdayy wrote

Yes, I was implicitly talking about the final activation function. Regarding softmax, he said in another comment "I have a quantity of 100 units that need to be allocated across 50 days.", so I took that to imply the outputs should be positive (hence the exponential is reasonable).

2