eigenham

eigenham t1_j8sbr6j wrote

It's an arms race. Do Russia and the US really know that each other's nukes work? Or would they rather just not find out? It's like that... will these patents hold up in court? Well, if I have enough of them and it's my 20k patents vs your 15k patents, we're probably going to settle based on the likelihood that enough will hold up that we have proportional mutually assured financial destruction.

The crappy part here is that a patent is essentially useless and inaccessible for the individual inventor. A bigger entity will dwarf them. In the end it comes down to money, and it's just a matter of understanding the nature of the investment, which is not as concretely defined as the general public might think.

1

eigenham t1_j6dukjy wrote

Looking at your background, you're a recent (or soon-to-be) BS graduate? I'm asking because when you get into graduate-level coverage of these topics, there's considerable overlap. In terms of papers, etc., I'm not sure you could find a clear line where one field starts and the other ends. Maybe if you tried hard you could say that one has more focus on data-driven methods or something, but I think you'd be able to find so many counterexamples that I'd question the point of it.

So to answer your question, I'd say it's all around you. If you're looking, you've probably already seen it.

8

eigenham t1_j1uhtn7 wrote

A similar phenomenon happens with batching in general, though. More generally, the distribution of the samples in each batch determines what the cost function "looks like" (as a function approximation) to the gradient calculation. That sampling (and thus the function approximation) can be biased towards a single sample or a subset of samples. I think OP's question is still an interesting one for the general case.
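To make that concrete, here's a minimal numpy sketch (a hypothetical linear-regression loss; all names and numbers are illustrative) showing how the gradient computed from a batch drawn from one region of the data can differ from the full-data gradient:

```python
import numpy as np

# Toy regression problem: the full-data gradient defines the "true" cost surface.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(scale=0.1, size=1000)

def batch_gradient(w, idx):
    """Gradient of the mean squared error restricted to the samples in idx."""
    Xb, yb = X[idx], y[idx]
    return 2 * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(5)
full_grad = batch_gradient(w, np.arange(len(X)))          # gradient of the full cost
uniform_grad = batch_gradient(w, rng.choice(len(X), 32))  # representative mini-batch
skewed_idx = np.where(X[:, 0] > 1.0)[0][:32]              # batch biased toward one region
skewed_grad = batch_gradient(w, skewed_idx)

print(np.linalg.norm(uniform_grad - full_grad))  # typically small
print(np.linalg.norm(skewed_grad - full_grad))   # can be much larger
```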

1

eigenham t1_ivy0qhf wrote

Reply to comment by No_Captain_856 in [D]Transformers! by No_Captain_856

I mean, it definitely captures relationships between parts of the input data in ways that many other models cannot. It also can't do everything.

As with most real-world problems, there's a question of how you'll represent the relevant information in data structures that best suit the ML methods you intend to use. Similarly, there's a question of whether the ML methods will do what you want once the data is in that form.

Even though transformers are killing it for ordered data, I'd say their flexibility in dealing with unordered data is definitely of interest for real-world problems where representations are tricky.

2

eigenham t1_ivxwhgg wrote

Reply to comment by No_Captain_856 in [D]Transformers! by No_Captain_856

The attention head is a set-to-set mapping. It takes the input set, compares each input element to a context set (which can be the input set itself, or another set), and based on those comparisons outputs a new set with the same number of elements as the input set.
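Here's a minimal numpy sketch of that idea (a single head with random, untrained projections; d_model, d_k, and the set sizes are made up for illustration). Self-attention is just the special case where the context set is the input set:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 16, 8
n_inputs, n_context = 5, 7              # sizes of the input set and the context set

W_q = rng.normal(size=(d_model, d_k))   # query projection (learned in a real model)
W_k = rng.normal(size=(d_model, d_k))   # key projection
W_v = rng.normal(size=(d_model, d_k))   # value projection

def attention_head(inputs, context):
    """Map an input set to an output set with the same number of elements."""
    Q = inputs @ W_q                     # one query per input element
    K = context @ W_k                    # one key per context element
    V = context @ W_v                    # one value per context element
    scores = Q @ K.T / np.sqrt(d_k)      # compare each input to each context element
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the context set
    return weights @ V                   # weighted combination of context values

inputs = rng.normal(size=(n_inputs, d_model))
context = rng.normal(size=(n_context, d_model))
print(attention_head(inputs, context).shape)   # (5, 8): cross-attention
print(attention_head(inputs, inputs).shape)    # (5, 8): self-attention
```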

Out of curiosity, how were you thinking of using that for gene expression?

2

eigenham t1_iudbxis wrote

Ok, so you really have one input vector, but you're concerned that some important elements of it are going to get ignored or underutilized. Normally that's the whole point of the optimization process in the fitting problem: if those features yield the most gain during training, the information in them should be prioritized (up to getting stuck in local minima). Why do you think this wouldn't be the case for your problem? Is this small set of inputs only relevant for a minority class or something like that (unless addressed, that would make them underrepresented in your optimization problem)?
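If that is the situation, one common way to address it is to reweight the loss so the minority class (where the small feature set matters) isn't drowned out in the gradient. A minimal PyTorch sketch, with made-up class counts and input size:

```python
import torch
import torch.nn as nn

# Hypothetical counts: class 0 is common, class 1 is the minority class.
class_counts = torch.tensor([900.0, 100.0])
class_weights = class_counts.sum() / (2 * class_counts)   # inverse-frequency weights

model = nn.Linear(206, 2)               # e.g. 200 + 1 + 5 features concatenated
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

x = torch.randn(32, 206)
y = torch.randint(0, 2, (32,))
loss = loss_fn(model(x), y)             # minority-class errors now carry more weight
loss.backward()
```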

1

eigenham t1_iud7sg6 wrote

Thanks, and just to make sure I understand you: are these inputs of different sizes all available simultaneously (e.g., could they theoretically be concatenated into a single vector)?

Or are only some of them available at a time (and you've found that the smaller vectors are more predictive of the more important class)?

1

eigenham t1_iud3a5l wrote

>To give you an idea of the scale, one input is a 200-dimensional vector, another input is a 1-dimensional number, and another is a 5-dimensional vector.

When you're talking about vector length, are you 1) talking about a sequence model and 2) referring to the length of the sequence? Or are you talking about the number of elements in an actual vector input?

2

eigenham t1_isfkldi wrote

Yeah, that's definitely one of the bigger groups/institutions in this field. You can expect groups like theirs to push the bounds early, and only some of those efforts gain enough traction for actual acceptance in clinical research/practice. So while this is a good effort, the real sign of movement will be when these methods start showing up in clinical journals.

2

eigenham t1_iseq5zz wrote

The medical imaging community doesn't like it when you make up new data (which makes sense when you think about the use case). That said, sure, there's work on interpolation, but a lot of what you're looking for is probably hiding in the literature as "super-resolution imaging". There's a bunch of hand-wavy work and a few groups doing really good validation studies (just look for authors from the biggest and most famous institutions, because the sad truth is you need money and resources to properly validate).

35