Submitted by WobblySilicon t3_zz0tua in MachineLearning

Hello Everyone!

Are there any research problems in language comprehension and summarization that don't require much compute? I'd like to work on NLP/NLU, but the compute requirements are enormous. After reading around, I found that the text-to-video problem is being actively researched and may not require as much compute as bare language models do. Are there any novel ideas in the text-to-video domain that don't require much compute?

7

Comments

Mefaso t1_j29980m wrote

>i found that text to video problem is being actively researched and may not require as much compute as bare language models

There are always opportunities for research with little compute; usually this means your research has to avoid training new models, or at least avoid training from scratch.

However, text-to-video models are typically very compute-intensive.

7

Complete-Maximum-633 t1_j2ab7zy wrote

Anything with “video” is going to be costly.

8

WobblySilicon OP t1_j2d1n2x wrote

The question is how much it costs. Can it be done with one GPU, or do I need a swarm of them?

1

Complete-Maximum-633 t1_j2drquz wrote

Impossible to answer without more context.

1

WobblySilicon OP t1_j2ffcey wrote

Sure, sir!

In the months to come I'll be working on the text-to-video problem. From my literature review I got the impression that it is compute-intensive, requiring a cluster of GPUs to train the models. So I asked whether it could be done with a mid-range GPU such as a 3080. I haven't really thought about the models I'd use or the general architecture yet. I just wanted an answer, because I don't want to take up this topic and then get stuck due to compute limitations.

1

WobblySilicon OP t1_j2a0oop wrote

I do have access to an A6000 for a few days, and other resources (with less memory) are available through the university as well. By compute-expensive I mean whole clusters of GPUs.

I'm having difficulty wrapping my head around the text-to-video problem (particularly the newer models with many smaller components). Are there any suggestions or resources for getting acquainted with this task? I have read recent research papers, but it seems hard to find an area where improvement could be made by technical customization of the base models. Do you have any tips on this?

Finally, if I can't work on text-to-video, my other option would be deepfake detection. Can you comment on the merits or demerits of choosing that topic for my study? Both topics are very new to me. I have exposure to intermediate vision-based problems and feel confident enough to try them out. Right now it just feels like I'm out of ideas for tinkering with the base models.

1

CriticalTemperature1 t1_j29hrai wrote

Take a look at this paper. The authors pursued a similar approach to the one you mentioned:

https://arxiv.org/abs/2212.14034 (Cramming: Training a Language Model on a Single GPU in One Day)

>Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day?
>
>We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting.

Although it's not text-to-video, you can probably apply similar approaches to vision transformers, diffusion models, etc.
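To make the "masked language modeling" objective from the abstract concrete, here's an illustrative sketch of the standard BERT-style input corruption (the function name and token lists are made up for illustration; this is not code from the paper):

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, rng=None):
    """BERT-style masking: select ~15% of positions; of the selected ones,
    80% become [MASK], 10% become a random vocab token, 10% stay unchanged.
    Returns (corrupted, labels); labels is None at positions the loss ignores."""
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must reconstruct the original token
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_token)
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))
            else:
                corrupted.append(tok)
        else:
            labels.append(None)  # position not scored
            corrupted.append(tok)
    return corrupted, labels
```

A transformer is then trained to predict the original token at every labeled position; the paper's point is that this whole pipeline, carefully tuned, fits on one consumer GPU for a day.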

3

ardula99 t1_j29wdd0 wrote

Research in fairness is something you can do without too much compute. Small BERT variants and similar models are easy to experiment with, even on a single GPU.
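For a sense of how lightweight this can be: a common starting point is a group fairness metric such as demographic parity difference, which needs only model predictions, no training. A minimal sketch (function name and toy data are made up; libraries like fairlearn provide a production version):

```python
def demographic_parity_difference(preds, groups):
    """Absolute gap in positive-prediction rates between two groups.
    preds: parallel list of 0/1 predictions; groups: parallel list of group ids.
    Assumes exactly two distinct group ids."""
    rates = {}
    for g in sorted(set(groups)):
        members = [p for p, gr in zip(preds, groups) if gr == g]
        rates[g] = sum(members) / len(members)
    a, b = rates.values()
    return abs(a - b)
```

Evaluating metrics like this on the outputs of a small pretrained classifier, and then testing mitigation ideas, is exactly the kind of study that fits on one GPU.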

2