chatterbox272 t1_j7kjmwc wrote

>I've seen a big push from for Swift (they claim it's the future, etc)

You've seen some dated stuff, from before S4TF became dead in the water.

The indisputably most useful language for ML is Python. The ecosystem is by far the strongest, and the language more-or-less stays out of your way while you interact with specific libraries that do what you want. Those libraries are written in highly optimised compiled languages like C/C++, so they are extremely efficient. As long as you keep them fed, you'll see very little of the "python-slow-interpreted-bad".


chatterbox272 t1_j6myph4 wrote

If the goal is to keep all predictions above a floor, the easiest way is to make the activation into floor + (1 - floor * num_logits) * softmax(logits). This doesn't have any material impact on the model, but it imposes a floor.

If the goal is to actually change something about how the predictions are made, then adding a floor isn't going to be the solution though. You could modify the activation function some other way (e.g. by scaling the logits, normalising them, etc.), or you could impose a loss penalty for the difference between the logits or the final predictions.
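The first option can be sketched in a few lines of plain Python (the function name and the stability trick are mine, not from the comment above). Note the outputs still sum to 1, since floor * num_logits + (1 - floor * num_logits) = 1:

```python
import math

def floored_softmax(logits, floor=0.01):
    """Softmax rescaled so every class probability is at least `floor`.

    Computes floor + (1 - floor * num_logits) * softmax(logits).
    Outputs still sum to 1 provided floor * len(logits) < 1.
    """
    n = len(logits)
    assert floor * n < 1, "floor too large for this many classes"
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [floor + (1 - floor * n) * e / total for e in exps]
```

Because the transformation is a fixed affine rescaling of the softmax output, it changes nothing about what the model learns; it only clamps the reported probabilities.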


chatterbox272 t1_j67x55u wrote

Don't train on a laptop. Being portable and using lots of computational power are essentially opposite goals. You're not going to be able to train anything more than toys on battery, at which point, if you're going to be tethered to a power cord, you might as well be tethered to a desktop. You're also going to be limited in performance, due to a combination of efficiency-focussed laptop hardware and the thermal constraints imposed by a laptop form factor. You're far better off getting a highly portable, long-battery-life but low-power machine, and using cloud resources (even free ones like Colab or Paperspace) to do the heavier lifting.

If you absolutely must use a laptop because you're living out of your car or something and have nowhere to set up a desktop, then the rest depends on what you're doing:

If you're doing "deep learning" (anything involving neural networks more than a layer or two deep) you'll need a discrete GPU from NVIDIA specifically. AMD and AS support exist but are far from mature or feature complete in most frameworks. CPU need only be powerful enough to keep the GPU fed, a modern i5 or equivalent AMD will do the job, although you may find that specs with a suitable GPU aren't offered with less than an i7 / R7.

If you're not doing deep learning, you probably don't need a GPU. In that case, stick with integrated graphics and look for a higher end i7 or i9 (or equivalent AMD).

As a rule, you'll get better support on Linux than Windows or MacOS. You can skirt this in Windows via the WSL.

Finally, this post reads like you haven't even started doing whatever it is you're trying to do. I'm guessing you're a beginner just starting out, and I'd strongly advise anyone at that stage to delay purchasing hardware as long as humanly possible. Everything I noted is a generalisation, none of it specific to what you're doing because you haven't been (and likely can't be) specific to what you're doing. If you get started first using free or cheap cloud resources, you'll get a much better idea of which things you need, and which things you don't.


chatterbox272 t1_j5wfrwp wrote

You can try. Augment hard, use a pretrained network and freeze everything except the last layer, and don't let anyone actually try and deploy this thing. 100 images is enough to do a small proof-of-concept, but nothing more than that.
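The "freeze everything except the last layer" part might look like the following PyTorch sketch (the tiny stand-in backbone and all names are mine; in practice you'd load actual pretrained weights, e.g. from torchvision):

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained backbone (in practice, load real pretrained weights)
backbone = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
)

# Freeze every backbone parameter so gradients are not computed for them
for p in backbone.parameters():
    p.requires_grad = False

# A fresh head is the only part that trains on the 100 images
head = nn.Linear(64, 5)
model = nn.Sequential(backbone, head)

# Only hand the still-trainable parameters to the optimiser
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-2
)
```

With so few images, training only the head keeps the number of free parameters small enough that heavy augmentation has a fighting chance.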


chatterbox272 t1_j40jame wrote

Not publishing the dataset is becoming less common as we start inching our way slowly to reproducible science. Public code with public data is the simplest form of reproducible research, where we can re-run your experiments with the same code and should get the same result (modulo some extremely low-level randomness or hardware differences that we may not be able to control).

That alone isn't enough to kill a paper, but it doesn't help. As another commenter said, showing your approach on public datasets and other approaches on your dataset will help, as it gives the rest of the community something that is reproducible.

It's more common in medical venues because of a few reasons:

  1. Difficulties around safely releasing medical data, such as proper anonymisation and informed consent.
  2. It is more common in medical science to go for a higher level of reproducibility, where the same or a similar study will be done on a different population (i.e. same method, different data). This is pretty uncommon in ML; it's hard to get papers accepted in this format.

chatterbox272 t1_j2c9gnj wrote

>I do feel that Apple's gpu availability and the popularity of AMD demand a more thorough coverage.

Apple's GPUs are 2 years old, and although you didn't mention them Intel's dGPUs are <1 year old. Both account for a relatively small portion of users and an effectively zero percent of deployment/production.

Most non-deep ML techniques aren't built on a crapload of matmul-add operations, which is what GPUs are good at and why we use them for DL. So relatively few components of sklearn would benefit from it, and I'd be deeply surprised if those parts weren't already implemented for accelerators in other libraries (or transformable via hummingbird). Contributing to those projects would be more valuable than another reimplementation, lest you fall into the 15-standards problem.


chatterbox272 t1_ix7mx5j wrote

>the cost/performance ratio for the 1080's seems great..

Only if your time is worthless, your ongoing running costs can be ignored, and expected lifespan is unimportant.

Multi-GPU instantly adds a significant amount of complexity that needs to be managed. It's not easy to just "hack it out" and have it work under multi-GPU, you either need to use frameworks that provide support (and make sure nothing you want to do will break that support), or you need to write it yourself. This is time and effort you have to spend that you otherwise wouldn't with a single GPU. You'll have limitations with respect to larger models, as breaking up a model over multiple GPUs (model parallelism) is way more complicated than breaking up batches (data parallelism). So models >11GB for a single element are going to be impractical.
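A toy numpy sketch (everything here is illustrative, not framework code) shows why data parallelism is the easy case: each device just gets a slice of the batch, computes its own gradient, and the results are averaged, without splitting the model itself:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
w = np.zeros(3)  # a linear model's weights

def grad(Xs, ys, w):
    # Gradient of mean-squared error for a linear model on one shard
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

# "Data parallelism": split the batch across 4 pretend devices,
# compute each shard's gradient, then all-reduce (average) them.
shards = zip(np.split(X, 4), np.split(y, 4))
shard_grads = [grad(Xs, ys, w) for Xs, ys in shards]
avg_grad = np.mean(shard_grads, axis=0)
```

With equal-sized shards the averaged gradient matches the full-batch gradient exactly, which is why frameworks can offer data parallelism almost for free, while model parallelism requires deciding where to cut the network.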

You'll have reduced throughput unless you have a server, since even HEDT platforms are unlikely to give you 4 PCIe Gen3 x16 slots. You'll be on x8 slots at best, and most likely on x4 slots. You're going to be pinned to much higher end parts here, spending more on the motherboard/cpu than you would need to for a single 3090.

It's also inefficient as all buggery. The 3090 has a TDP of 350W, the 1080Ti 250W. That means for the same compute you're drawing roughly 3x the power (TDP is a reasonable but imperfect stand-in for true power draw). That will drastically increase the running cost of the system. You'll also need a more expensive power supply, and possibly even a wall socket upgrade to draw that much power (4x 1080Ti to me means a 1500W PSU minimum, which would require a special 15A socket in Australia where I live).

You're also buying cards that are minimum 3 years old. They have seen some amount of use, and use in a time where GPU mining was a big deal (so many of the cards out there were pushed hard doing that). The longer a GPU has been out of your possession, the less you can rely on how well it was kept. The older arch will also be sooner dropped for support. Kepler was discontinued last year, so we have Maxwell and then Pascal (where the 10 series lies). Probably a while away, but a good bit sooner than Ampere (which has to wait through Maxwell, Pascal, Volta, and Turing before it hits the chopping block).

Pros: Possibly slightly cheaper upfront
Cons: Requires more expensive hardware to run, higher running cost, shorter expected lifespan, added multi-GPU complexity, may not actually be compatible with your wall power.

TL;DR of the TL;DR: Bad idea, don't do it.


chatterbox272 t1_iw5jfwn wrote

I'll regularly write custom components, but pretty rarely write whole custom networks. Writing custom prediction heads that capture the task more specifically can improve training efficiency and performance (e.g. doing hierarchical classification where it makes sense, customized suppression postprocessing based on domain knowledge, etc.).

When I do write networks from scratch, they're usually variations on existing architectures anyway. E.g. implementing a 1D or 3D CNN using the same approach as existing 2D CNNs like ResNet or ConvNext. I usually find I'm doing this when I'm in a domain task and don't already have access to pretrained networks that are likely to be reasonable initialisation.


chatterbox272 t1_iv4kbwb wrote

>It will output stuff from open source projects verbatim

I've seen this too, however only in pretty artificial circumstances. Usually in empty projects, and with some combination of exact function names/signatures, detailed comments, or trivially easy blocks that will almost never be unique. I've never seen an example posted in-context (in an existing project with its own conventions) where this occurred.

>One solution without messing with co-pilot training or output is to have a second program look at code being generated to see if it's coming from any of the open source projects on gitbub and let the user know so they can abide by the license.

This kinda exists: there is a setting to block matching open-source code, although reportedly it isn't especially effective (then again, I've only seen this talked about by people who also report frequent copy-paste behaviour, something I've not been able to replicate in normal use).


chatterbox272 t1_iv4ic65 wrote

I was just proposing some candidate reasons. Your political journalism article is only relevant to Americans, and there's plenty of other people who live in other places who don't give a rat's about it.

Like I said, I never liked twitter in the first place so I'm going to follow anyone who moves to mastodon and let that take over as much of my twitter usage as possible. I might not get off it entirely, but I'd like to try. I personally dislike Elon's consistent over-promise-under-deliver strategy that he applies to the genuinely cool tech his companies develop. So if rolling back my twitter usage means that I see and hear less of/about him then great.


chatterbox272 t1_iv3w3yh wrote

  1. Reportedly hate speech on the platform has gone through the roof, people may not want to even risk having to put up with that. It was already bad at times, so it getting worse is concerning.
  2. Musk has been very clear with his intended direction of the platform. Continued use of the platform is acceptance of that direction. Essentially the "vote with your wallet" kind of thing, except it's "vote with the advertising revenue twitter would make from having you on the platform"
  3. Musk is a pretty controversial person in general, and now that he owns Twitter he profits from you being on that platform. Leaving Twitter because you don't want to support Elon Musk might be a reason.
  4. Relevant to this community, Twitter purged their entire "responsible AI" team in the layoffs. As with point 2, leaving because you don't support that decision.
  5. Leaving because you never liked Twitter and you now might have an alternative available with the community considering moving. Or even if you did like Twitter but like Mastodon more.

For me I'm weaning off mostly on point 5; never really liked Twitter but tolerated it because there was/is value in the community. If the community is moving I'll jump on the opportunity to move. 2 is also a factor: I don't believe in the absolutist free speech point of view and won't participate in that version of Twitter, for the same reason I don't browse 4chan.

The things that have changed over the past week may be minor from certain points of view, but they can also be the straw that broke the camel's back. If people already weren't too fond of Twitter, any one of the minor changes might be enough to finally push them over the edge.


chatterbox272 t1_iv3tw2v wrote

Street prices vary over time and location. For example, I have zero issues getting a 4090 at RRP where I am in Australia. Using RRP for comparisons makes the comparisons more universal and evergreen.

If you're considering a new GPU you'll need to know the state of your local market, so you can take the findings from Lambda, apply some knowledge about your local market (e.g. 4090s are actually 1.5x RRP right now where you are, or whatever), and then you can redraw your own conclusions. Alternatively, if they were using "street prices" I'd also have to know the state of their local market (wherever that happens to be) at the time of writing (whenever that happens to be), then work out the conversion from their market to mine.


chatterbox272 t1_iudez8h wrote

Without a doubt. You get more than double the VRAM (11GB -> 24GB), and you get tensor cores which are significantly faster, and also half-precision tricks give you effectively double VRAM again compared to FP32. A 3090 (possibly Ti) can train the kinds of models you used to train on 4x1080Ti.


chatterbox272 t1_irifzyq wrote

Your model is a teeny-tiny MLP, your dataset is relatively small, it's entirely possible that you're unable to extract rich enough information to do better than 70% on the val set.

You also haven't mentioned how much L2 or Dropout you're using, nor how they do on their own. Both of those methods come with their own hyperparameters which need to be tuned.


chatterbox272 t1_iqp67eq wrote

It is most likely because the focal term ends up over-emphasizing the rare class term for their task. The focal loss up-weights hard samples (most of which will usually be the rare/object class) and down-weights easy samples (background/common class). The alpha term is therefore being set to re-adjust the background class back up, so it doesn't become too easy to ignore. They inherit the nomenclature from cross entropy, but they use the term in a different way and are clear as mud about it in the paper.
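A minimal binary version in plain Python makes both terms visible (the function name and defaults of alpha=0.25, gamma=2.0 are my choices; the latter matches the RetinaNet paper's commonly quoted settings):

```python
import math

def binary_focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal loss for a single binary prediction.

    p is the predicted foreground probability. The (1 - p_t)**gamma factor
    down-weights easy samples (p_t near 1); alpha_t re-weights the classes,
    so alpha = 0.25 on the rare/foreground class gives the background class
    a weight of 0.75, pulling it back up exactly as described above.
    """
    p_t = p if target == 1 else 1 - p
    alpha_t = alpha if target == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
```

So with alpha = 0.25, an equally uncertain background sample actually contributes three times the loss of a foreground sample, which is the opposite of how alpha behaves in ordinary weighted cross entropy, and is why the paper's usage of the term confuses people.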