Hi,

I have to work with sequences of vectors.

Each vector is the spatial position of a solid, expressed as a translation plus a quaternion for the rotation. The solid follows a trajectory through space that obeys some logic but is still too complex to be modeled mathematically. All solids evolve in a similar environment with strong winds and fixed obstacles.

At each step of the trajectory we need to predict the next position, so that we know where to point the camera. The camera has a relatively narrow field of view, so the prediction must be accurate enough to keep the object within it.
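To make "accurate enough" concrete, the requirement can be phrased as an angular test: seen from the camera, the predicted position must fall inside the viewing cone. A minimal sketch, assuming a circular field of view and a known camera pointing direction (the function and parameter names are hypothetical, and a real camera has a rectangular frustum that this cone only approximates):

```python
import math

def in_field_of_view(camera_pos, camera_dir, target, half_angle_deg):
    """Return True if `target` lies inside a cone of half-angle `half_angle_deg`
    around `camera_dir`, with its apex at `camera_pos` (all 3D tuples).
    Hypothetical helper: approximates the camera frustum by a circular cone."""
    # Vector from the camera to the target.
    v = [t - c for t, c in zip(target, camera_pos)]
    v_norm = math.sqrt(sum(x * x for x in v))
    d_norm = math.sqrt(sum(x * x for x in camera_dir))
    if v_norm == 0.0 or d_norm == 0.0:
        return False  # degenerate geometry: no meaningful direction
    cos_angle = sum(a * b for a, b in zip(v, camera_dir)) / (v_norm * d_norm)
    # Clamp against floating-point rounding before acos.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle <= half_angle_deg
```

With this framing, the useful error metric for a predictor is the angular error seen from the camera, not the raw distance in space: a 0.5 m miss at 10 m is about 2.9 degrees, while the same miss at 2 m is about 14 degrees.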

The camera is not just rotating about a fixed point. It has to capture one particular face of the solid, so its movement is slow enough that we need some prediction ahead of time.

Each vector always has length 9:

The solid position: Tx, Ty, Tz, Qx, Qy, Qz, Qw

2 imposed environmental factors, Pu and Pv, that can have some influence on the trajectory. They are measured just before each prediction and do not need to be predicted.

The position of a vector in the sequence is meaningful, so we could add an imposed coordinate: the index.
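The layout above (7 pose components, 2 measured factors, plus the optional index) can be made concrete as follows. A minimal sketch, assuming quaternions are stored in the (x, y, z, w) order listed above; the function name is hypothetical. It also renormalizes the quaternion and fixes its sign, because q and -q encode the same rotation and a learned model should not see two different targets for one orientation:

```python
import math

def make_input_vector(t, q, pu, pv, index):
    """Assemble one step as [Tx, Ty, Tz, Qx, Qy, Qz, Qw, Pu, Pv, index].
    Hypothetical helper; `index` is the optional imposed sequence position."""
    tx, ty, tz = t
    qx, qy, qz, qw = q
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if n == 0.0:
        raise ValueError("zero quaternion cannot represent a rotation")
    # Only unit quaternions represent rotations.
    qx, qy, qz, qw = qx / n, qy / n, qz / n, qw / n
    # q and -q are the same rotation; keep the qw >= 0 hemisphere.
    if qw < 0.0:
        qx, qy, qz, qw = -qx, -qy, -qz, -qw
    return [tx, ty, tz, qx, qy, qz, qw, pu, pv, float(index)]
```

Forcing qw >= 0 is one common convention; near 180-degree rotations it can flip sign abruptly between steps, in which case choosing the sign closest to the previous step's quaternion is a smoother alternative.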

The number of previous vectors available varies from 1 to 100, but about 5 of the ~15 previous vectors are the most meaningful for predicting the next one, and not necessarily the most recent ones. These numbers can vary slightly. We can either find in advance, using an analytical algorithm, a list of 6 possibly meaningful vectors to take into account, or feed the last 15 vectors and let the system decide, or even feed the full history.
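The three options (pre-selected steps, a fixed window of ~15, or the full history) can all be expressed as one input-building step. A sketch with hypothetical names, assuming the analytical pre-selection returns offsets back in time (1 = the last step). Because histories range from 1 to 100 steps, a fixed-size model also needs padding plus a mask marking which rows are real:

```python
def select_context(history, window=15, picked=None):
    """Return the past vectors fed to the predictor, oldest first.

    history: list of step vectors, oldest first.
    window:  number of most recent steps kept when nothing is picked.
    picked:  optional offsets back in time (1 = last step), e.g. produced
             by an analytical pre-selection (hypothetical input format).
    """
    if picked is not None:
        # Keep only offsets that exist in this (possibly short) history.
        valid = sorted((k for k in set(picked) if 1 <= k <= len(history)), reverse=True)
        return [history[-k] for k in valid]
    return history[-window:]

def pad_with_mask(context, length, dim):
    """Left-pad to `length` rows with zero vectors of size `dim` and return a
    0/1 mask of real rows, so short and long histories share one model."""
    kept = [list(r) for r in context[-length:]]
    missing = length - len(kept)
    rows = [[0.0] * dim for _ in range(missing)] + kept
    mask = [0] * missing + [1] * len(kept)
    return rows, mask
```

Passing `window=len(history)` gives the "full history" option; attention-based models and masked LSTMs can then learn which of the kept steps matter.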

To start experimenting, we have ~20 different trajectories of 100 items each. We will probably have a few hundred within a few months.

The goal is to give the first few vectors to the system and let it predict the next vector. At each step the environmental factors are available, and we need to predict a realistic next move. It does not have to be exactly the move that will happen in reality, since we can measure and correct it afterwards, but it must be close enough to be used to point the camera at the right place and capture the object within a relatively narrow field of view.
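The loop described above (start from a few known vectors, then predict one step at a time while plugging in the measured Pu, Pv) can be sketched as follows. The predictor here is a deliberately naive constant-velocity baseline chosen for illustration, not a proposed solution: it repeats the last translation step and the last rotation increment and ignores Pu, Pv. Any learned model with the same hypothetical `predict_step(traj, pu, pv)` signature can be swapped in. Quaternions are assumed to be unit length, in (x, y, z, w) order:

```python
import math

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )

def quat_conj(q):
    """Conjugate; equal to the inverse for unit quaternions."""
    x, y, z, w = q
    return (-x, -y, -z, w)

def constant_velocity_step(traj, pu, pv):
    """Baseline predictor (ignores pu, pv). Needs at least two past vectors;
    returns the 7 predicted pose values [Tx, Ty, Tz, Qx, Qy, Qz, Qw]."""
    prev, last = traj[-2], traj[-1]
    t_next = [2 * b - a for a, b in zip(prev[:3], last[:3])]
    q_prev, q_last = tuple(prev[3:7]), tuple(last[3:7])
    delta = quat_mul(q_last, quat_conj(q_prev))  # rotation taking prev to last
    q_next = quat_mul(delta, q_last)             # apply the same increment again
    n = math.sqrt(sum(c * c for c in q_next))
    return t_next + [c / n for c in q_next]

def rollout(history, measured_env, predict_step=constant_velocity_step):
    """Predict one step per (Pu, Pv) pair in measured_env.

    Pu, Pv are measured just before each prediction, so they are given to the
    predictor and attached to the predicted pose instead of being predicted."""
    traj = [list(v) for v in history]
    for pu, pv in measured_env:
        pose = predict_step(traj, pu, pv)
        traj.append(pose + [pu, pv])
    return traj[len(history):]
```

In operation, each predicted pose would be replaced by the measured one as soon as it is available, so errors do not accumulate; a multi-step rollout like this is only needed when the camera must be moved more than one step ahead. A learned model is only worth its cost if it beats this baseline on the angular error that matters for the camera.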

I was thinking about next-word prediction models, which behave similarly, namely LSTMs and Transformers. I am also considering simple position-aware decision trees and plain neural networks with the index as an input feature.
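On the "index as an input feature" idea: the index can be fed raw (possibly scaled to [0, 1]), or, as in the original Transformer architecture, through a fixed sinusoidal positional encoding that gives the model smooth, multi-frequency features of position. A minimal stdlib sketch of that standard encoding; the function name is hypothetical and `d_model` is whatever embedding size the model uses:

```python
import math

def sinusoidal_encoding(index, d_model):
    """Sinusoidal position encoding:
    PE[2i] = sin(index / 10000^(2i / d_model)), PE[2i + 1] = cos(same angle)."""
    enc = []
    for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
        angle = index / (10000 ** (i / d_model))
        enc.append(math.sin(angle))
        if i + 1 < d_model:
            enc.append(math.cos(angle))
    return enc
```

For tree-based models the raw index is usually enough, since trees split on thresholds; the sinusoidal form mainly helps neural networks.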

Does this ring any bells? Are there papers or concepts I should explore before testing some implementations?

Thanks for any advice!

Comments


mmeeh t1_je9hfuo wrote on March 30, 2023 at 12:00 PM

#2,480,714

Why don't you dump this on ChatGPT and get a way more accurate answer?

mmeeh t1_je9hohs wrote on March 30, 2023 at 12:02 PM

#2,480,762

Replying to mmeeh (#2,480,714)

Also, sounds more like a Reinforcement Learning problem rather than LSTM.

x11ry0 OP t1_jeampyd wrote on March 30, 2023 at 5:03 PM

#2,491,582

Replying to mmeeh (#2,480,714)

Well, the original ChatGPT is overloaded and Bing is not really helpful. It loosely suggests using LSTMs or Transformers.

Using reinforcement learning could also be a nice idea.

blimpyway t1_jebtlmt wrote on March 30, 2023 at 9:36 PM

#2,501,957

Assemble a dataset and raise a challenge on Kaggle?

suflaj t1_jebxmmx wrote on March 30, 2023 at 10:03 PM

#2,502,929

If you have a small dataset, then Transformers are out of the question, especially if we're talking pretraining and all.

Seems to me like you might be interested in classical ML methods, such as XGBoost. Since you have tabular data, it will probably outperform all other methods at first. From there on out you would be trying to find a model better tailored to the task, depending on how you want to use your data. Given your data situation, you would be looking at deep LSTMs for the end game. But currently, it doesn't matter whether it's 20 or 2000 samples (idk how you count them), that's not enough to solve something you claim is too difficult to model mathematically outright.
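Using a tabular model such as XGBoost on this data means flattening each prediction step into one fixed-width row. A minimal sketch of that lag-feature construction, assuming the 9-component layout from the question and a fixed number of lags; the function name is hypothetical, and the resulting `X`, `y` lists would then go to a tabular regressor (for example one regressor per target component):

```python
def make_lag_rows(trajectory, n_lags):
    """Turn one trajectory (list of 9-component vectors, oldest first) into
    supervised rows. Each row is the previous n_lags vectors concatenated,
    plus the Pu, Pv measured for the step being predicted; the target is the
    7 pose components of that step."""
    X, y = [], []
    for t in range(n_lags, len(trajectory)):
        row = []
        for past in trajectory[t - n_lags:t]:
            row.extend(past)
        row.extend(trajectory[t][7:9])  # Pu, Pv are known at prediction time
        X.append(row)
        y.append(list(trajectory[t][:7]))
    return X, y
```

This also answers "how you count them": 20 trajectories of 100 steps with 15 lags give 20 × 85 = 1700 rows, but those rows overlap heavily. Train/validation splits should therefore be made by trajectory, not by row, or the validation score will be optimistic.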

Reinforcement learning might not be adequate given that you say the problem is too difficult to model mathematically. RL will only be useful to you if it is difficult to model because the problem is wide, i.e. it is hard for you to narrow it down to a general formula. If the problem is hard in the sense that it is difficult or narrow, then your agent might not be able to figure out how to solve the task at all, and you would have to design the training regimen really well to teach it anything. RL is not really well suited for very hard problems.

Finally, it doesn't seem to me that you have an environment set up for the agent, because if you did, your problem would already be solved, given that building one would require you to model it mathematically. And if it were easy to obtain data in the first place, you would have way more than 20 or 2000 samples. That's why I presume that RL is completely out of the question for you as well.

I would personally not tackle this problem with trajectories. If you want to solve this using DL, then you should create a bigger dataset using actual camera recordings, and then either label the bounding boxes or segment the images. Then you can use any of the pretrained backbones and simply train an object detector. Given an offset in the next frame, you can calculate the movement for the camera.
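The last step, turning a detected offset into camera movement, can be approximated with the pinhole camera model. A sketch assuming a calibrated, undistorted camera whose focal lengths in pixels are known and whose optical center is the image point the offset is measured from; the function name is hypothetical:

```python
import math

def pixel_offset_to_pan_tilt(dx, dy, fx, fy):
    """Angles in degrees that re-center a target detected dx, dy pixels away
    from the principal point, for an ideal pinhole camera with focal lengths
    fx, fy in pixels. Lens distortion is ignored; with the usual image
    convention (y pointing down), a positive tilt means the target is below
    the center."""
    pan = math.degrees(math.atan2(dx, fx))
    tilt = math.degrees(math.atan2(dy, fy))
    return pan, tilt
```

For example, with fx = 800 px, a target 800 px right of center needs a 45 degree pan, while 80 px needs only about 5.7 degrees.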

This is a task so generic that just with a few hundred to thousand samples you can probably get a semi-supervised labelling scheme going on - with some other model labelling the images automatically and then you just need a few humans judging these labels or correcting them. And this task is so trivial and widespread you can find a workforce to do this anywhere.

The question is what performance you would expect. But in all cases I would say that if you need a very robust solution, you should probably look into mathematically modelling it - we are presumably talking about a differential system in the background, which is not going to be easily solved by any mainstream DL model. All methods mentioned here can essentially be dumbed down to a very large non-linear equation. They can only mimic a differential system up to a certain precision, determined by their width and depth, as well as the statistical significance of your samples.

mmeeh t1_jed5sr9 wrote on March 31, 2023 at 3:39 AM

#2,514,602

Replying to x11ry0 (#2,491,582)

Yeah, anything with memory can help in your case. All these algorithms are computationally expensive, though.

4_love_of_Sophia t1_jedrxbe wrote on March 31, 2023 at 7:59 AM

#2,520,047

I would really suggest using Extended Kalman Filters or Particle Filters for this. You can model the environmental factors and the acceleration/velocity terms, and also take the orientation history into account.
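The simplest member of this family is a linear Kalman filter with a constant-velocity model, shown here for a single translation axis in plain Python. Assumptions: a unit time step, placeholder noise values `q` and `r` that would need tuning on the recorded trajectories, and no model of wind or orientation. The Extended Kalman Filter suggested above becomes necessary once the quaternion or nonlinear wind effects enter the state. The class name is hypothetical:

```python
class ConstantVelocityKF1D:
    """Linear Kalman filter for one coordinate (e.g. Tx).

    State x = [position, velocity]; measurements are positions only.
    q scales the process noise (unmodelled acceleration such as gusts),
    r is the measurement noise variance. Both are placeholders to tune."""

    def __init__(self, dt=1.0, q=1e-2, r=1e-1):
        self.dt, self.q, self.r = dt, q, r
        self.x = [0.0, 0.0]
        self.P = [[1e3, 0.0], [0.0, 1e3]]  # large initial uncertainty

    def predict(self):
        """Propagate one step ahead and return the predicted position."""
        dt, q = self.dt, self.q
        p, v = self.x
        self.x = [p + dt * v, v]
        (p00, p01), (p10, p11) = self.P
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and a discrete
        # white-noise-acceleration Q.
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt**4 / 4,
             p01 + dt * p11 + q * dt**3 / 2],
            [p10 + dt * p11 + q * dt**3 / 2,
             p11 + q * dt**2],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the state with a measured position z (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r             # innovation variance
        k0, k1 = p00 / s, p10 / s    # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
```

Usage follows the predict/correct cycle from the question: call `predict()` to get where to point the camera, then `update()` with the measured position once it arrives. One filter per translation axis is a reasonable first baseline; the variance `P[0][0]` after `predict()` also indicates how much margin the field of view needs.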

x11ry0 OP t1_jefvk0p wrote on March 31, 2023 at 6:36 PM

#2,543,427

Replying to suflaj (#2,502,929)

Hi, thanks a lot for this very detailed answer. I will have a look at your suggestions. Yes, we are looking into the mathematical equations too. We fear they will be quite imprecise, but on the other hand I understand very well the issue with the small dataset currently available. We may end up exploring several solutions.