uhules
uhules t1_j9jpkun wrote
Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
Before why, ask if. GBDTs are very widely used.
uhules t1_j8dtc07 wrote
Reply to comment by tysam_and_co in [D] Quality of posts in this sub going down by MurlocXYZ
The problem is that what makes something a "buzzword" is its attention-grabbing, catchy misuse. The shelter has unfortunately been breached for a while now.
uhules t1_j8dsggs wrote
Reply to comment by dustintran in [D] Quality of posts in this sub going down by MurlocXYZ
Aside from "We've just published X" threads (which usually consist of healthy praise, questions and critiques), I loathe most ML twitter discussions. They tend to have all the usual "hot take" issues from the platform, even from prominent names in the field. Not really a great place to discuss ML as a whole.
uhules t1_j6wrx63 wrote
Reply to comment by Mefaso in [D] Why is stable diffusion much smaller than predecessors? by dahdarknite
Except DALL-E 2 also applies diffusion in latent space and Imagen performs diffusion in low-res pixel space. My initial hunch was the upscaling diffusion models, but they account for a relatively small portion of the total number of parameters and are more relevant speed-wise. The lackluster explanation is simply "SD does latent better", since you'd need to do an extensive ablation study to compare rather different architectures.
uhules t1_j6spu3f wrote
Reply to [D] Audio segmentation - Machine Learning algorithm to segment a audio file into multiple class by PlayfulMenu1395
What kind of model would work in this case is heavily dependent on data availability and the quality of your annotation. Check these datasets from Papers With Code and see whether any one of those is similar enough to your setting, and pick models or code from their leaderboards.
uhules t1_j6sp5wk wrote
Reply to comment by jiamengial in [D] Audio segmentation - Machine Learning algorithm to segment a audio file into multiple class by PlayfulMenu1395
CTC is better suited for unaligned sequences. If OP has precise timings for the sound events, plain frame-wise classification should work better.
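To make the frame-wise option concrete, here is a minimal sketch of turning timed event annotations into one label per frame, which is the kind of target a frame-wise classifier trains on. The function name, the millisecond units, the `background` default and the single-label-per-frame choice are all assumptions for illustration, not something from the thread.

```python
def events_to_frame_labels(events, duration_ms, hop_ms, background="background"):
    """Convert timed events into one label per fixed-length frame.

    events: list of (start_ms, end_ms, label) tuples (hypothetical format).
    A frame takes an event's label if the event overlaps it at all.
    If events overlap each other, the later one in the list wins.
    """
    # Ceiling division: number of frames needed to cover the whole clip.
    n_frames = -(-duration_ms // hop_ms)
    labels = [background] * n_frames
    for start_ms, end_ms, label in events:
        first = max(0, start_ms // hop_ms)          # first frame touched
        last = min(n_frames, -(-end_ms // hop_ms))  # one past the last frame touched
        for i in range(first, last):
            labels[i] = label
    return labels


if __name__ == "__main__":
    # A 1-second clip, 250 ms frames, one event from 250 ms to 750 ms.
    print(events_to_frame_labels([(250, 750, "dog")], 1000, 250))
```

With labels in this form, each frame is an ordinary classification example, so no alignment-free loss such as CTC is needed.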
uhules t1_j6juq7x wrote
Reply to comment by fasttosmile in [D] What's stopping you from working on speech and voice? by jiamengial
Lhotse is basically part of the "Kaldi 2.0 ecosystem" (K2/Lhotse/Icefall/Sherpa), so you'll probably see people referring to the whole lot as Kaldi as well.
uhules t1_j47fkag wrote
Reply to comment by BossOfTheGame in [R] Git is for Data (CIDR 2023) - Extending Git to Support Large-Scale Data by rajatarya
I'm guessing this is unintentional, but you talk like XetHub has been a thing for a while. I even went to see what I had missed, and from what I gathered it's a startup that just emerged from its stealth status like, five days ago (from its twitter status it's more like three weeks, but still). They'll probably open-source the core tech as a freemium like almost everything else in the current convoluted MLOps landscape.
uhules t1_j3z3bqe wrote
Reply to comment by TheLexoPlexx in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
And even then, it's still a money sink. Which reinforces the improbability of Cortana being a main focus.
uhules t1_jadfp2z wrote
Reply to comment by hackinthebochs in [R] Large language models generate functional protein sequences across diverse families by MysteryInc152
At the point where it stops being a P(w|h) estimator.
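For anyone unfamiliar with the notation: assuming w is the next word (or token) and h is its history, i.e. the preceding context, which is the usual reading in language modeling but is not spelled out in the comment, a P(w|h) estimator models a whole sequence through the chain rule:

```latex
P(w_1, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid h_t),
\qquad h_t = (w_1, \dots, w_{t-1})
```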