marr75

marr75 t1_j7ksi6o wrote

They should be. I think LLMs will totally upset how content is indexed and accessed. It's one of the easiest and lowest-stakes use cases for them, really.

Unfortunately, Google has such a huge incumbent advantage that they could produce the 5th- or 6th-best search-specialized LLM and still be the #1 search provider.

1

marr75 t1_j4da0ub wrote

Sure, there will probably be plenty of litigation in the next few years. I find it probable that these suits fail. Sorry for my imprecision on the origin and application of the four-part test. I think we'll both walk away from this holding the same opinions we came in with, so I don't care enough about this debate to keep formulating my sentences that carefully, or to continue.

0

marr75 t1_j4cuv4t wrote

I read the 30-word OP here and the Jukebox blog post, and I've read multiple analyses of Authors Guild v. Google. My best guess is that you're referring to the Jukebox post, which only references IP in the sentence:

> As generative modeling across various domains continues to advance, we are also conducting research into issues like bias and intellectual property rights

So I question whether you know what discussion you're replying to, whether you actually read the post yourself, or whether I'm just so confused that I can't trust my own reading comprehension anymore (which could happen any day now).

The multi-part fair use test at the center of Authors Guild v. Google is widely held to be applicable to AI and ML models. There are no guarantees when it comes to credible legal theories, and the winds can shift after a Supreme Court decision or two, but that's the state of the art today.

1

marr75 t1_j4ctyyh wrote

Things aren't "true"/"false" in this context, unfortunately. It is commonly held by IP and copyright lawyers to be the most credible legal theory available today. The multi-part fair use test it turned on has generally been upheld as usable in AI and machine learning scenarios.

1

marr75 t1_j3c7zik wrote

I'm not following what you're saying, but you can detect all local minima with a single function call, order them and compute their summary statistics with a second call, and come up with a threshold-based comparison for the end of the video if that's what you want.

None of this requires a machine learning model. You lost me when you mixed in "only when an ad occurs". Do you have any data that would help you train such a model? Are you just trying to detect ads? You could:

  • identify all local-minima attention drops
  • engineer features such as distance into the video, length of drop (time spent below average before and after the local minimum), and magnitude of drop
  • perform unsupervised learning, e.g. PCA/t-SNE/k-means
  • hope the "structural" features identified by unsupervised learning help you separate ads from non-ads (they might!)

Again, not a complicated system because you don't have complex features as you've described them.
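
Here's a minimal sketch of that pipeline, assuming the retention curve is a 1-D NumPy array sampled at regular intervals. The synthetic data, feature choices, and two-cluster setup below are my own assumptions, not something from your description:

    import numpy as np
    from scipy.signal import find_peaks
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Toy attention curve for illustration only; swap in your real retention data.
    rng = np.random.default_rng(0)
    times = np.linspace(0, 600, 1200)             # 10-minute video, 0.5 s samples
    attention = 0.8 + 0.03 * rng.standard_normal(1200)
    attention[300:340] -= 0.30                    # synthetic "ad-like" drops
    attention[800:830] -= 0.25

    mean = attention.mean()
    minima, _ = find_peaks(-attention)            # local minima = peaks of the negated signal

    features = []
    for m in minima:
        # Length of drop: how long the signal stays below average around the minimum.
        left, right = m, m
        while left > 0 and attention[left] < mean:
            left -= 1
        while right < len(attention) - 1 and attention[right] < mean:
            right += 1
        features.append([
            times[m],                     # distance into the video
            times[right] - times[left],   # time spent below average around the drop
            mean - attention[m],          # magnitude of the drop
        ])
    features = np.asarray(features)

    # Unsupervised grouping of the drops; inspect the clusters by hand to see
    # whether one of them lines up with ad breaks.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
        StandardScaler().fit_transform(features)
    )

If one cluster's drops consistently land at plausible ad positions, you've basically already got your threshold comparison without any labels.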

Is this just a novelty project? The way you're asking about it makes me think there's a low chance of follow-through, and your questions are kind of "arguing" towards a more complicated model. Run whatever code you're capable of then, I guess. I will probably decline to give further advice if that trend of leading questions continues.

3

marr75 t1_izywb2h wrote

If they were using a custom Python pipeline for the statistical models, yeah, I could see this argument. But, like many of the Nixtla tools, the usage is roughly:

    !conda install -c conda-forge statsforecast
    from statsforecast import StatsForecast
    from statsforecast.models import AutoARIMA

    sf = StatsForecast(models=[AutoARIMA()], freq="D")
    sf.fit(train_df)               # long-format frame: unique_id, ds, y
    forecast = sf.predict(h=28)

This is a pretty common "marketing" post format from Nixtla. I think they make good tools and good points, so I'm not at all mad about it. They're providing a ready-to-use tool (StatsForecast) and making a great point about its performance and cost vs the AWS alternative. Asking for the total cost of developing and maintaining StatsForecast means you'd also have to account for the total cost and complexity of developing and maintaining Amazon Forecast...

12

marr75 t1_iymo8k3 wrote

Yeah

> Just guessing here, but

is a common US English idiom that typically means, "Obviously".

You're absolutely right, though. Just by comparing the training data to the training process and the serialized weights, you can see how clearly this should overfit. Once your model is noticeably bigger than a dictionary holding every (X, y) pair in your training data, it's very hard to avoid overfitting.
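
As a back-of-the-envelope version of that comparison (the sample count, feature count, and parameter count below are made up purely for illustration):

    # Hypothetical dataset: 10k samples, 32 float32 features plus a label each.
    n_samples, n_features = 10_000, 32
    data_bytes = n_samples * (n_features + 1) * 4

    # Hypothetical model: 5M float32 parameters (a smallish CNN).
    model_bytes = 5_000_000 * 4

    # Well above 1: the serialized weights could simply memorize the whole set.
    print(model_bytes / data_bytes)   # ~15x the size of the training data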

I volunteer with a group that develops interest and skills in science and tech for kids from historically excluded groups. I was teaching a lab on CV last month, and my best student was like, "What if I train for 20 epochs, tho? What about 30?" and the performance improved (but didn't generalize as well). He didn't understand generalization yet, so instead he looked at the improvement trend, had a lightbulb moment, and was like, "What if I train for 10,000 epochs???" I should check to see if his name is on the list of collaborators for the paper 😂

3

marr75 t1_itbnukd wrote

I edited down the flatly negative part of what I wrote above because you're engaging so sincerely to improve it. I can't imagine getting a feel for it without running a lot of queries (100 a month or 10 per hour or 1 per minute, something like this). On top of that, the job to be done here is a little suspect for me. Are there people who have a commercially viable need to get a phrase back for a description?

The two tests I wanted to try were two very specific words I can't remember. The first is one of those German multi-word compounds that means, "the problem is solved by the mere structure of the solution." I don't think that word is even in the dictionary, based on the results I was getting, and I also started to realize it was giving me back short phrases instead of words, which was disappointing. The second word means "distribution preserving"; I didn't get a chance to test it, but it's got Latin roots and I'm skeptical phraisely has it in the dictionary, too.

Overall, I was hoping the technology on display would be more powerful. I guess I'd pay $1 for either of those words.

1

marr75 t1_itblwi0 wrote

I was trying to get a feel for it and can't even remember how many queries I issued. 3-5 maybe? There was no indicator that I was using a quota (especially a quota that small) when suddenly I was told I needed to wait 224 hours for more queries.

1

marr75 t1_isxougd wrote

I read a good blog post from a guy talking about how modern IDEs encourage you to learn really weird "motions" (using PyCharm's refactor, codegen, and mid-stream code completion, for example). He wasn't saying it was bad per se, just that we should all remember the point isn't to be "good" at the IDE, it's to solve problems with the code.

I feel the same about pandas. If anything, the skill to focus on is vectorizing your operations. That's the biggest readability and performance improvement, and it's portable to dplyr, polars, etc.
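
For example (the DataFrame and column names here are hypothetical, just to show the shape of the habit):

    import pandas as pd

    df = pd.DataFrame({"price": [9.99, 4.50, 12.00], "qty": [3, 10, 1]})

    # Row-by-row: more code, slower, and harder to read at a glance.
    df["total_loop"] = [row.price * row.qty for row in df.itertuples()]

    # Vectorized: one expression that translates almost directly to polars or dplyr.
    df["total"] = df["price"] * df["qty"]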

3

marr75 t1_isxnz3w wrote

I think they're designed by very traditional engineering managers. The coding test trend gained popularity thanks to Jeff Atwood, who promoted it as an early screen for people applying for lucrative jobs they didn't actually know how to do (which is useful!). Managers kept using it as a higher and higher floor for skills, and we got the LeetCode style (a fresh bootcamper might pass FizzBuzz, but they're unlikely to have months to grind LeetCode). We've also seen an explosion of roles that code but are more responsible for the wisdom and value of their creations (the spec and visual design aren't enough, or even relevant, for a model, a Lagrangian relaxation, a recommendation engine, etc.).

A real conversation I had with another executive: "Hey, we've got that coding screener for engineers, can we whip up something similar for [name a role]?" You start combining these different forces - a desire for selectivity, a desire to lower hiring costs, more complex technology roles that have to chart some of their own spec, and plain human laziness - and you get what OP described.

1

marr75 t1_iram7mu wrote

Depends on what you mean by effective. This article summarizes and links a few quality studies.

For symptomatic Omicron infections: 2-dose ~50-60% effective, 3-dose ~70-80% effective, 4-dose ~90% effective. The death rate was not reliably calculable in these studies because it was so low for all groups. So, is 50% efficacy against symptomatic infection, and [some very high efficacy]% against death/severe illness, "effective"? It certainly slows the spread and keeps a lot of people alive. Plus, there's strong evidence you could "choose your own efficacy": if you were higher-risk or just didn't want the hassle of symptomatic Covid, you could opt for the first and second boosters.

For reference, the flu vaccine, which is becoming a better comparison now that we have vaccines against this family of coronaviruses and they have become endemic (there's a smaller and smaller population with no prior immunity), is typically 40-60% effective against the most common strains of flu each year.

3