Submitted by 1azytux t3_11nl766 in MachineLearning

Hi everyone,

I'm interested in learning more about recent advances in multimodal models, particularly chain-of-thought (CoT) models. I'm curious to know what people working in this field are most excited about and which ideas and papers have inspired them.

Specifically, I'm interested in learning about:

  • The latest research on multimodal models, especially chain-of-thought models
  • The challenges that researchers are currently facing when developing these models
  • How researchers are addressing these challenges
  • What researchers are most excited about when it comes to the potential applications of these models

If you work on multimodal models, I'd love to hear your thoughts and insights. What papers have been particularly inspiring or influential? What challenges are you currently facing, and how are you addressing them? What are you most excited about when it comes to the future of multimodal models?

Thank you in advance for your responses :)




aozorahime t1_jd26b23 wrote

Is it similar to multimodal deep learning? That's what I'm currently studying. You can check this paper for a brief explanation.
As for chain-of-thought models, could you elaborate on what you mean?


1azytux OP t1_jd2ho88 wrote

I'm looking for ideas based on these papers:
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
- Multimodal Chain-of-Thought Reasoning in Language Models

and similar work. The general chain-of-thought idea for language models is described in this paper.

I'm not sure whether the link you provided will be useful here. It's huge and I've only glanced at it, so I may have missed things. Could you point out the parts you think deserve attention?


aozorahime t1_jd66laf wrote

Ah, CoT. I think I've heard about this topic before but had forgotten it. Yes, it's similar to my planned Ph.D. research proposal on visual question answering (VQA) for mathematical reasoning. Thanks for the paper recommendations!


As for the link I mentioned, it's just a brief overview of what people have done so far in multimodal deep learning: models, benchmarks, datasets, etc. Since I've been overwhelmed by information about current models, I felt I needed something like this (I just finished reading the NLP part).


Have you worked with multimodal before?


1azytux OP t1_jdcvjki wrote

Yes, I have worked with multimodal models before, but I'm still at a nascent stage of exploring NLP. What about you? Are you interested in multimodal models? What's your PhD on?

I became interested in CoT, and especially in multimodal CoT, because of recent advances in ChatGPT, such as its ability to remember earlier parts of a conversation. I hope that's correct.

Yes, I saw the link but wasn't able to find much about CoT in particular, so I asked you.

I can tell you about what I've worked on, what I've been trying, and what I want to do in the future, maybe over DMs?


aozorahime t1_jdv86p7 wrote

Yes, I'm interested in multimodal models; I think I want to use this topic for my Ph.D. plan. I'm actually still a master's student :)) With so many new models coming out, I'm also unsure what I should work on or improve in CoT, probably because I've only read a few papers, I guess.

Sure, let's talk via DM, or maybe Discord? I'd like to hear about your experience in this area.


1azytux OP t1_jdv913f wrote

I'm actually not using Discord for the time being, but maybe Reddit messaging will work :) I can DM you.