Yes, I have worked with multimodal models before, but I'm still at a nascent stage of exploring the field of NLP. What about you? Are you interested in multimodal models? What's your PhD on?

I was interested in CoT, and more so in multimodal CoT, because of the recent advances in ChatGPT, as it's able to remember previous conversations. I hope this is correct.

Yes, I saw the link and wasn't able to find much about CoT in particular, so I asked you.

I can talk about what I've worked on, what I was trying, and what I want to do in the future, maybe in DMs?


I'm looking for ideas based on these papers:
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

- Multimodal Chain-of-Thought Reasoning in Language Models

and similar work. The general chain-of-thought idea for language is covered in this paper.

I'm not sure if the link you provided will work, but since it's huge I might have missed something (I've only glanced at it). Can you point out the parts you think deserve attention?


Do you know which foundation models we can use, though, or which ones are open source? It seems like every other model is either not available or hasn't had its weights released yet. That's the case with CoCa, Florence, Flamingo, BEiT3, FILIP, and ALIGN. I was able to find weights for ALBEF.