I_draw_boxes t1_jcia41b wrote on March 17, 2023 at 12:46 AM

Reply to comment by ggf31416 in [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr

An Nvidia driver fix is forthcoming for the P2P-related issue with PyTorch DDP training. The 3090 didn't support P2P either, and the bug fix won't enable P2P on the 4090, but it will correct the issue, and training should be much faster once it lands.

I_draw_boxes t1_j6zjxbt wrote on February 3, 2023 at 12:13 AM

Reply to [D] Normalizing Flows in 2023? by wellfriedbeans

Human Pose Regression with Residual Log-likelihood Estimation learns an error distribution using normalizing flows. The technique filled a large performance gap between regression and heat map methods.

I_draw_boxes t1_iznviek wrote on December 10, 2022 at 2:48 PM

Reply to [D] Making a regression NN estimate its own regression error by Alex-S-S

Human Pose Regression with Residual Log-likelihood Estimation learns an error distribution using normalizing flows. The network predicts the expected variance, which is used to train the flow model to learn the error distribution; that learned distribution in turn reparameterizes the loss function. The predicted variance can also be used at inference.
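
To make the idea concrete, here is a rough sketch of a likelihood-style regression loss in that spirit. It is not the paper's code: `rle_style_nll` and `residual_log_density` are hypothetical names, the Laplace base density is an assumption, and the learned flow is replaced by a placeholder callable that defaults to contributing nothing.

```python
import math


def laplace_log_density(z):
    # Log-density of a standard Laplace distribution (scale 1).
    return math.log(0.5) - abs(z)


def rle_style_nll(target, mu, sigma, residual_log_density=lambda z: 0.0):
    """Negative log-likelihood of one coordinate.

    mu, sigma: the network's predicted value and predicted spread.
    residual_log_density: hypothetical stand-in for the learned flow's
    log-density correction on top of the simple base density.
    """
    # Normalize the error by the predicted spread.
    z = (target - mu) / sigma
    # Change of variables: dividing by sigma adds a -log(sigma) term.
    log_p = laplace_log_density(z) + residual_log_density(z) - math.log(sigma)
    return -log_p
```

With the placeholder left at its default this reduces to a plain Laplace NLL with a predicted scale. A trained flow would be plugged in through `residual_log_density`, and `sigma` is the quantity that can be read out as a confidence estimate at inference.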

I_draw_boxes t1_iuk27ck wrote on October 31, 2022 at 10:41 PM

Reply to comment by nomadiclizard in [News] The Stack: 3 TB of permissively licensed source code - Hugging Face and ServiceNow Research Denis Kocetkov et al 2022 by Singularian2501

Permissive licenses basically allow the user to do anything they want with the code save sue the author.

>What if a commercial for-profit company trains on a lot of copyleft code, then commercialises the result and refuses to release the model?

That probably isn't legal, but copyleft licenses are not permissive licenses and are not included in this dataset for that reason.

I_draw_boxes t1_iqvuh8g wrote on October 3, 2022 at 2:03 PM

Reply to comment by chatterbox272 in [D] Focal loss - why it scales down the loss of minority class? by Lugi

>The alpha term is therefore being set to re-adjust the background class back up, so it doesn't become too easy to ignore.

This is it. In RetinaNet, background anchors far outnumber foreground ones, so the network's default prediction will be background, which generates very little loss per anchor in their formulation. Focal loss without alpha is symmetrical, but the targets and behavior of RetinaNet are not.

Alpha might be intended to bring up the loss for common negative examples to keep it in balance with foreground loss. It might also be intended to bring up the loss for false positives which are even more rare than foreground.
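
A small sketch of the per-anchor binary focal loss may help show the symmetry point. The function name `focal_loss` is made up for illustration; the weighting follows the usual convention where foreground gets `alpha` and background gets `1 - alpha`, so the RetinaNet default of `alpha=0.25` gives background anchors three times the weight of foreground ones.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=None):
    """Binary focal loss for a single anchor.

    p: predicted foreground probability, strictly between 0 and 1.
    y: 1 for a foreground target, 0 for background.
    alpha: optional class weight; None means no alpha term.
    """
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified anchors.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Without alpha, `focal_loss(0.1, 1)` and `focal_loss(0.9, 0)` are the same up to rounding, since both assign probability 0.1 to the true class. With `alpha=0.25` the background case is weighted three times as heavily, which is the re-adjustment discussed above.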