FastestLearner
FastestLearner t1_j8esc0c wrote
Reply to [D] Is a non-SOTA paper still good to publish if it has an interesting method that does have strong improvements over baselines (read text for more context)? Are there good examples of this kind of work being published? by orangelord234
Neural networks were not SOTA for a very very long time. The world would be very different if everyone had only published SOTA results improving upon existing SOTAs of the 90s.
FastestLearner t1_j6v8nsz wrote
Reply to Using Jupyter via GPU by AbCi16
Being able to use the GPU doesn’t have anything to do with Jupyter. It’s the packages (TensorFlow, PyTorch, etc.) that must be installed with CUDA support, and you must also have the correct drivers installed. My recommendation would be to simply use a conda environment, which automatically installs the correct CUDA packages during a PyTorch or TensorFlow install.
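Once the environment is set up, a quick sanity check from inside a notebook cell tells you whether the GPU is actually visible (PyTorch shown here; TensorFlow has an equivalent check):

```python
import torch

print(torch.cuda.is_available())  # True only if CUDA build, drivers, and GPU all line up
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the name of your GPU
```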
FastestLearner t1_j6mhjd2 wrote
Reply to Best practice for capping a softmax by neuralbeans
Use a composite loss, i.e. add extra terms to the loss function to make the optimizer force the logits to stay within a fixed range.

For example, if the current min logit is `m` and the allowed minimum is `u`, and the current max logit is `n` and the allowed maximum is `v`, then the following loss function should help:

Overall loss = CrossEntropy loss + lambda1 * max(u - m, 0) + lambda2 * max(n - v, 0)

The `max` terms ensure that no loss is added when the logits are all within the allowed range. Use `lambda1` and `lambda2` to scale each term so that they roughly match the CE loss in strength.
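A minimal PyTorch sketch of this idea (the bounds `u`, `v` and the lambda values are illustrative, not tuned):

```python
import torch
import torch.nn.functional as F

def bounded_ce_loss(logits, targets, u=-10.0, v=10.0, lambda1=0.1, lambda2=0.1):
    # Standard cross-entropy term.
    ce = F.cross_entropy(logits, targets)
    # Hinge-style penalties: zero whenever all logits lie inside [u, v].
    min_penalty = torch.clamp(u - logits.min(), min=0.0)  # fires when min logit < u
    max_penalty = torch.clamp(logits.max() - v, min=0.0)  # fires when max logit > v
    return ce + lambda1 * min_penalty + lambda2 * max_penalty

# Example: a batch of 32 samples over 10 classes.
logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
loss = bounded_ce_loss(logits, targets)
loss.backward()
```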
FastestLearner t1_j6exhli wrote
Corrections:
- The weights are set to random only at the beginning (i.e. before iter=0). Every iteration onwards, the optimization algorithm (some form of gradient descent) kicks in and nudges the weights slightly in a way that makes the whole network perform incrementally better at the task it’s being trained for. After hundreds of thousands of iterations, it is hoped that the weights reach an optimal state, where more nudging does not optimize the weights any further (and by extension does not make the neural network learn any better). This is called convergence (see the toy sketch just below this list).
- Coming to your example of path finding, first of all this is a reinforcement learning (RL) problem. RL is different from DL. DL, or deep learning, is a subset of machine learning algorithms mostly concerned with the training of deep neural networks (hence the name). RL is a particular method of training ‘any’ learning algorithm (it doesn’t always have to be neural networks) using what are called reward functions. Think of it like training a dog (an agent) to perform tricks (a task) using biscuits (as rewards). Every time your dog does what you ask him to do and you follow up by giving him a biscuit, you basically ‘reinforce’ his behavior, so he will do more of it when you ask him again.
- Now, the example of the path finding agent that you gave is silly. No RL agent is trained on one single scenario. If you do train an RL agent on just a single scenario, you get a condition called overfitting, meaning that your agent learns perfectly well how to navigate that one scenario but doesn’t generalize to any unseen scenarios. In practice, we train an RL agent on hundreds of thousands of different scenarios, each slightly different from the rest. Many of these scenarios can have different conditions: different lighting, differently structured environments, different geometries, different obstacles, etc. What we hope to achieve is that after training, the RL agent learns a generalized navigation function that is adaptive to any scenario.
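As a toy illustration of that nudging process (a made-up one-parameter example, not a real network):

```python
# Toy gradient descent: find the w that minimizes loss(w) = (w - 3)^2.
w = 0.0    # the "random" initial weight, before iter=0
lr = 0.1   # learning rate: how big each nudge is

for step in range(100):
    grad = 2 * (w - 3)   # derivative of the loss with respect to w
    w -= lr * grad       # nudge w slightly downhill

print(w)  # ~3.0; further nudges barely change w, i.e. convergence
```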
I suggest you watch some TwoMinutePapers videos on YT covering some of OpenAI’s RL papers. There are some videos in which RL agents learn to fight in a boxing match, and another where several agents collaborate to play hide and seek. You’d get a feel for how RL works.
FastestLearner t1_j677lru wrote
Reply to comment by RelevantDiscussion44 in Which is your go to framework for deep learning, in python by V1bicycle
For an absolute beginner, definitely PyTorch is what I would recommend. It’s like an extension of numpy.
Both frameworks are extremely mature and will get the job done no matter what you throw at them (I don’t get what you mean by practicality).
For industry purposes, if you have a particular company in mind, then check which framework they use (ask some employee on LinkedIn) and learn that framework (some companies still have their codebases in TF1, they never updated). If you are in the market for a job hunt, then having both on your CV will give you the best chance.
FastestLearner t1_j630p88 wrote
The thing with me is that I started with TensorFlow v1 back when PyTorch wasn’t even in the race, and because of the constant breaking changes to the TensorFlow API and cryptic error messages, my experience was hellish TBH. Even getting support from stackoverflow was messed up because people would be posting solutions for different API versions. Then PyTorch got released and boy was it the savior I needed. It literally saved me hundreds of hours of debugging (and possibly from brain hemorrhage too). Compared to the burning hell TF1 was, PT was like coding on a serene beach. And then TensorFlow v2 came out with eager execution, which promised a PyTorch-like way of doing things. But then the question is, why switch if it is the same as PyTorch? And so I didn’t.
I’m coming from a research point of view. If I was coming from a production POV, things could’ve been different.
FastestLearner t1_j5ul7uu wrote
Reply to comment by K_fortytwo in [R] Best service for scientific paper correction by Meddhouib10
Oh no. There’s nothing wrong. I think it’s just an inferior tool for the amount of ads they show everyone on the internet. I’ve met people who are overly enthusiastic about Grammarly (coz they’ve been biased with all the ads they’ve seen) and I think it’s overrated for what it is. People fall for overrated over-advertised products a lot and make bad decisions in the process. Reminds me of the paperlike screen protector ad on every other iPad review video. The product is not at all bad but considerably overhyped. What this kind of unhealthy hype does is that it creates a bad smoky atmosphere, which doesn’t let other products shine through even though they are equally good (in this case Quillbot is arguably better).
That said, if Grammarly works for you, then you should definitely choose it.
FastestLearner t1_j5tzcvr wrote
For minor polishing, I use Quillbot.
Also, stay away from Grammarly.
FastestLearner t1_j5ogton wrote
It actually depends on what you want to achieve. For example, if you want to do research in DL, the best way is not to start with DL at all but instead to do some fundamental math courses like LinAlg, Prob/Stats, Intermediate and Advanced Calc, etc., then turn to traditional ML, and only after that do DL. This is the bottom-up approach, and it is a long journey that takes years. But from your post, it seems that you are looking for a quick top-down approach. For that, I would suggest you simply look into some medium.com articles, YouTube videos, Udemy courses and, most importantly, dive head first into coding (try running as many examples from GitHub as you can). Try reproducing some basic results, like getting >90% accuracy on CIFAR-10 classification with a ResNet model. You could also try getting into a bootcamp if there's one going on nearby.
FastestLearner t1_j5mwz47 wrote
Reply to comment by ArnoF7 in [D] Multiple Different GPUs? by Maxerature
It is possible, but it would require you to write custom code for every memcopy operation that you want to perform, i.e. `tensor.to(device)`, which you can get away with on a smaller project but could become prohibitively cumbersome on a large project. Also, you'd still need to do two forward passes (one with the data on the 3080 itself, and then another with the data from the 1080 after having it transferred to the 3080). Whether or not this is beneficial boils down to the difference in transfer rates between the RAM-3080 route and the RAM-1080-3080 route. I won't be able to tell which one is faster without benchmarking.
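A rough sketch of what that manual bookkeeping looks like (device indices, layer, and shapes are illustrative; this assumes the 3080 shows up as `cuda:0` and the 1080 as `cuda:1`):

```python
import torch

fast = torch.device("cuda:0")  # 3080: does the compute
slow = torch.device("cuda:1")  # 1080: used here only as extra storage

model = torch.nn.Linear(1024, 10).to(fast)

# Two micro-batches: one already on the 3080, one parked in the 1080's VRAM.
batch_a = torch.randn(64, 1024, device=fast)
batch_b = torch.randn(64, 1024, device=slow)

out_a = model(batch_a)           # forward pass 1
out_b = model(batch_b.to(fast))  # forward pass 2, after a 1080 -> 3080 memcopy
```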
DeepSpeed handles the RAM-3080 to-and-fro transfers for large batch sizes automatically.
FastestLearner t1_j5iklgu wrote
Reply to comment by Maxerature in [D] Multiple Different GPUs? by Maxerature
If you don't engage the second GPU, it will remain dormant and should not interfere with anything. For example, if you are training a network in PyTorch without using DP or DDP, it will use the first GPU by default. You can always change which GPU it uses via the environment variable `CUDA_VISIBLE_DEVICES`. Also, make sure the primary GPU occupies the first PCIe slot. You can verify this with `nvidia-smi`. When you have the display hooked up to it, the primary GPU will show slightly higher memory usage (~100 MB) than the other GPUs because of display server processes like Xorg.
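For example, to expose only the second physical GPU to PyTorch (set the variable before CUDA is initialized; the index here is illustrative):

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # hide every GPU except physical GPU 1

import torch
print(torch.cuda.device_count())  # 1: that GPU now appears as cuda:0
```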
FastestLearner t1_j5if4nz wrote
Reply to [D] Multiple Different GPUs? by Maxerature
Tim Dettmers wrote about this in one of his articles. AFAIK, SLI is not required for DL (it’s a gaming thing where sync between GPUs becomes important for smooth gameplay). In DL tasks, any GPU can just wait for others to finish. So you can use any combination of any number of Nvidia GPUs as long as you can interface with them (PCIe or Ethernet). The catch is that the speed of training/inference will be limited by the weakest link in the chain, i.e. the weakest GPU will bottleneck all other GPUs. But on the flip side, you should be able to fit more data owing to the increased VRAM.
The other thing that you can do is run two different experiments on each GPU simultaneously. In that way, you can maximize the usage of your GPUs.
If you do want to fit more data on the 3080, look into PyTorch add-ons such as DeepSpeed, use FP16/mixed precision, or simply do two forward passes per backward pass (gradient accumulation), which will double your effective batch size.
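A minimal sketch of that last option, gradient accumulation (the model, `loader`, and the accumulation factor are placeholders):

```python
import torch

model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()
accum_steps = 2  # two passes per optimizer step -> 2x effective batch size

for step, (x, y) in enumerate(loader):  # `loader` is your usual DataLoader
    out = model(x.cuda())
    loss = loss_fn(out, y.cuda()) / accum_steps  # scale so gradients average correctly
    loss.backward()                              # gradients accumulate across passes
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```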
FastestLearner OP t1_j4z74l7 wrote
Reply to comment by Philpax in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
Yes. I too agree that a large model is not required for detecting simple phrases like "Please subscribe to our channel" or "Here is the sponsor of our video". I also have another idea which I think should help in getting better accuracies: use the channel's unique identifier (UID) or the channel's name as input (and generate conditional probabilities conditioned on the channel's UID). This should help because any particular YouTube channel almost always uses the same phrase to introduce its sponsors in almost all of its videos. Think of LinusTechTips: you always hear the same thing, "here's the segue to our sponsor yada yada." So this should definitely allow the model to do more accurate inference. Alternatively, you could just reduce the model complexity to save the client's resources.
The other thing you mentioned about the average user not hitting the right arrow two times: I think (and this is my hypothesis) the number of users running adblocking software is just increasing monotonically, because once a user gets to savour the internet without ads, they don't go back. Only the old-aged folks and the absolutely-not-computer-savvy people don't use adblockers, and IMO that population is decreasing and will simply vanish in the (near) future. This is similar to what Steve Jobs said when he was asked whether people would ever use the mouse. Look at it now: everyone uses the mouse. Coming to sponsor blocking, not having to hit the right arrow at all is just more convenient than hitting the right arrow two times. Sometimes hitting it x number of times does not get the job done and you need to hit it a few more times. Also, you might miss the beginning of the non-sponsored segment, so you need to hit the left arrow once too. All of this is made convenient by the current SOTA SponsorBlock extension. It has just begun its journey, and I have no doubt that, just like the adblocking extensions, sponsor blocking is going to take off and see exponential growth.
FastestLearner t1_j4yw7kj wrote
What do you mean by “correct method”?
FastestLearner OP t1_j4y3zw2 wrote
Moderators, why did you delete the post? We were having such a good discussion.
FastestLearner OP t1_j4x7rvp wrote
Reply to comment by float16 in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
Yes. Your first point is something that I would happily engage in as well. I have no problems contributing to the community. Moreover, the extension can have several additional options like:
(i) Do not perform any kind of inference on the client, i.e. always query existing timestamps from the online database. This will be helpful for users with low-power devices like laptops.
or
(ii) Perform inference (only) for the video that the client wants. This is, of course, necessary if the video does not have any timestamps on the server yet. The client does the inference and uploads the results to the central server.
or
(iii) Keep performing inference for new videos (even ones not watched by the particular user) - Some folks who run powerful enough hardware and are eager to donate their computation time can choose this option. I am pretty sure some folks will emerge who are willing to do this; the LeelaChessZero project banked entirely on this particular idea. For this option, there could be a slider to let the user control how much of their resources to keep actively engaged (maybe by limiting thread count).
The second point that you mentioned could be implemented with a peer-to-peer communication protocol, but if the neural network's weights don't change, then there would be no difference between the most recent and stale timestamps. Also, in P2P you'd still need trackers to keep track of peers, which could be a central server or be decentralized and serverless depending on the implementation. One potential problem could be latency, though.
FastestLearner OP t1_j4uj7q2 wrote
Reply to comment by much_bad_gramer in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
Godspeed to you. I think the first person to get it to the chrome/firefox extension store would get the most downloads and pave the future for all other adblocking/sponsorblocking extensions (coz no other extension currently does that, AFAIK).
FastestLearner OP t1_j4uj5oy wrote
Reply to comment by Philpax in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
I don't have much experience with the cost of training NLP models (I work mostly in Vision). But I think if you can get a product out with just enough accuracy to get heads turning in your favour, you could always scale up the model later down the road. Alternatively, you could have a donate button on the extension's settings page (as many extensions do); if you do get some donations, you could use them to update the model later on. It could be crowd-sourced and crowd-funded simultaneously.
FastestLearner OP t1_j4uilg8 wrote
Reply to comment by Philpax in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
Yes. I initially thought of having a neural net trained on the audio track of a particular YT video, but I think the transcripts would provide just enough information, and fine-tuning existing language models would work quite well, especially with the recent tremendous growth of NLP. Collecting the audio would also require far more storage space than text, and would probably require more RAM, VRAM and compute.
If you are leaning towards crowd-sourcing the inference, I think it would be possible to do that using JS libs (such as TensorFlow.js), although I have no experience with these. The good thing is, once you do inference on a video, you just upload the results to the central server and everyone can get them for free (with no further inference costs).
FastestLearner OP t1_j4uhkbm wrote
Reply to comment by C0hentheBarbarian in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
Yes. I did think about that and potential solutions could be:
(1) A startup offering services in exchange for a small fee - The good thing about it is that once you do inference on a video, you can serve the result to thousands of customers at no additional cost (except for server maintenance and bandwidth; there is no extra GPU cost beyond the first time you run it on a particular video).
(2) Crowd-sourced inference - The current state of the sponsor-blocking extension is that it requires manual user input, which it sources from the crowd and collects at a central server. So it's basically crowd-sourced (or peer-sourced) manual labour. I'm sure if someone came up with an automated version, like an executable that runs in the background with very small resource usage, then inference could be done via crowd-sourcing too; the timestamps could then be collected at a central server and distributed across the planet. The good thing about this is that as more and more people join in to participate in the peer-sourced inference, the lower the cost of keeping any one peer's GPU busy.
FastestLearner OP t1_j4ufxgc wrote
Reply to comment by CallFromMargin in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
I am not well acquainted with NLP tasks, so I have no idea how many resources it would take to train a transformer on it (or finetune an existing model like BERT on the dataset). If resources are a concern, one could do crowd-sourced training, like LeelaChessZero. I think it's only a matter of time before someone comes along and does this, because blocking ads is the inevitable future of the internet. Also, some company/startup could do it on a subscription model like the already existing paid adblocking software. It's a potential startup idea IMO.
Submitted by FastestLearner t3_10f2joc in MachineLearning
FastestLearner t1_j3i0bdo wrote
Reply to [D] Why is Vulkan as a backend not used in ML over some offshoot GPU specification? by I_will_delete_myself
But what can Vulkan do that CUDA can’t already do?
FastestLearner t1_j3c0yju wrote
You are not using any non-linearity; yours is just a linear model. Deep CNNs thrive on non-linearity. Try adding a ReLU layer after every MaxPool. Also, for better convergence, add BN layers after each Conv. Don't use two Linear layers (mostly redundant). Use AvgPool instead of Flatten. Replace Softmax with LogSoftmax. Set Adam lr=1e-4, weight decay=1e-4.
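A rough sketch of those changes (channel widths and the class count are guesses, since I don't know your exact architecture):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.BatchNorm2d(32),        # BN after each Conv for better convergence
    nn.MaxPool2d(2),
    nn.ReLU(),                 # non-linearity after every MaxPool
    nn.Conv2d(32, 64, 3, padding=1),
    nn.BatchNorm2d(64),
    nn.MaxPool2d(2),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # AvgPool instead of Flatten
    nn.Flatten(),              # just squeezes the 1x1 spatial dims
    nn.Linear(64, 10),         # a single Linear layer
    nn.LogSoftmax(dim=1),      # pair this with nn.NLLLoss
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
```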
PM me if you face any more issues.
FastestLearner t1_j8gmxcw wrote
Reply to MacBook Air vs Pro by Fun-Cartographer8611
Read this:
https://www.reddit.com/r/mac/comments/10gpu46/hardware_for_scientific_computing/j54jpj3/?utm_source=share&utm_medium=ios_app&utm_name=iossmf&context=3
My recommendation: (1) Abandon macOS and get a laptop with an Nvidia GPU. Or (2) If you don’t like working on Linux/Windows and prefer Macs, then get a cheap MBA plus a laptop with an Nvidia GPU. Use the Mac for coding but run the code over ssh on the Nvidia laptop. The combined price would not exceed that of a specced-out MacBook Pro, while the perf benefit would be more than 10x. Or (3) If you want to both code and run your code on a Mac and also don’t want to carry two laptops, then get the highest-specced MacBook Pro possible. Neural network training is computationally very expensive. Normally we run our neural networks on our lab servers, which contain anywhere between 4 and 64 GPUs. Even the highest-end M2 Maxes are nothing compared to an RTX 4090.