BrotherAmazing

BrotherAmazing t1_je7vj9v wrote

People saying get more than 100 images are right (all else being equal, yes, get more images!) but you likely can make good progress without as many images for your problem with clever augmentation and a smaller network.

Here’s why:

  1. You only have to detect cavities. It’s not some 1,000-class semantic segmentation problem.

  2. You should be working with single channel grayscale images, and not pixels that naturally come in 3-channel RGB color.

  3. This is X-ray data just of teeth, so you don’t have nearly the amount of complex fine-detailed textures and patterns (with colors) that are exhibited in more general RGB optical datasets of all sorts of objects and environments.

Of course for a real operational system that you will use in commercial products you will need to get far more than 100 images. However, for a simple research problem or prototype demo, you can likely show good results and feasibility (without overfitting, yes) on your dataset with a smaller net and clever augmentation.
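As a rough illustration of what "clever augmentation" on single-channel images could look like, here is a minimal sketch in plain Python. The helper names (`hflip`, `shift`, `augment_pair`) are hypothetical, and whether a left-right flip or a small shift is a label-preserving transform for dental X-rays is an assumption you would need to check for your data. The one point it is meant to show is that the image and its cavity mask must receive the same transform.

```python
import random


def hflip(img):
    """Mirror a 2D grayscale image (a list of rows) left to right."""
    return [row[::-1] for row in img]


def shift(img, dx, fill=0):
    """Shift each row dx pixels right (negative dx = left), padding with fill.

    Assumes abs(dx) is smaller than the image width.
    """
    width = len(img[0])
    if dx >= 0:
        return [[fill] * dx + row[:width - dx] for row in img]
    return [row[-dx:] + [fill] * (-dx) for row in img]


def augment_pair(img, mask, rng, max_shift=2):
    """Apply the SAME random flip and shift to an image and its cavity mask."""
    if rng.random() < 0.5:
        img, mask = hflip(img), hflip(mask)
    dx = rng.randint(-max_shift, max_shift)
    return shift(img, dx), shift(mask, dx)


# Toy 2x4 "X-ray" with two bright pixels and a matching label mask.
image = [[0, 5, 0, 0], [0, 0, 7, 0]]
labels = [[0, 1, 0, 0], [0, 0, 1, 0]]
rng = random.Random(0)
augmented = [augment_pair(image, labels, rng) for _ in range(8)]
```

In a real pipeline you would likely use an image library and richer transforms (small rotations, contrast jitter), but the pairing of image and mask stays the same.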

7

BrotherAmazing t1_jdn10wi wrote

I can’t speak directly to the question posed, but I have often observed people/groups that either:

  1. Overparametrize the model and then use regularization as needed to avoid overfitting

  2. Underparametrize a “baseline” prototype, then work their way up to a larger model until it meets some performance requirement on accuracy, etc.

Time and time again I have seen approach 2 lead to far smaller models that train and run much faster and sometimes yield better test set results than approach 1, depending on the data available during training. I have, of course, seen approach 1 perform better than approach 2 at times, but if you have an accuracy requirement and ramp up the model complexity in approach 2 until you meet/exceed it, you have still met your requirement and end up with a smaller model that is faster to train and run.
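Approach 2 can be sketched as a simple loop, assuming you have some way to train a model of a given size and score it on a validation set. Here `train_and_eval` is a hypothetical callable standing in for that step, and the scores in `fake_scores` are made-up placeholders, not results from any experiment.

```python
def smallest_model_meeting(target, sizes, train_and_eval):
    """Try model sizes from smallest to largest; stop at the first that meets target.

    train_and_eval is a hypothetical callable: size -> validation score.
    Returns (size, score), or None if no candidate meets the target.
    """
    for size in sorted(sizes):
        score = train_and_eval(size)
        if score >= target:
            return size, score
    return None


# Stand-in for real training: made-up validation accuracies per hidden width.
fake_scores = {8: 0.80, 16: 0.91, 32: 0.95, 64: 0.96}
trained = []


def fake_train_and_eval(size):
    trained.append(size)  # record which sizes were actually "trained"
    return fake_scores[size]


result = smallest_model_meeting(0.90, fake_scores, fake_train_and_eval)
```

With these placeholder numbers the loop stops at width 16 and never trains the two larger candidates, which is where the training-time savings come from. The stopping decision should use a validation set, with the test set held back for the final check.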

4

BrotherAmazing t1_jda5jna wrote

Either it’s an easy problem where 98% - 100% accuracy on samples this size is just typical and not really worth publishing, or (not exclusive) the study is flawed.

One could get a totally independent data set of FNA images with these features extracted from different patients in different years, etc. and run their random forest on those. If it gets 98% - 100% accuracy then this is not a hard problem (the feature engineering might have been hard, and I’m not taking away from that if so!). If it fails miserably or just gets waaaay lower than 100%, you know the study was flawed.

There are so many ML neophytes making “rookie mistakes” with this stuff who don’t fully grasp basic concepts that I think you always need a totally new independent test set that the authors didn’t have access to in order to really test it. That’s even a good idea for experts to be honest.

The paper’s conclusion is likely wrong either way; i.e., that Random Forests are “superior” for this application. Did they get an expert in XGBoost, neural networks, etc and put as much time and effort into those techniques using the same training and test sets to see if they also got 99% - 100%? It didn’t appear so from my cursory glance.
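The independent-test-set check described above amounts to scoring the same frozen model on the authors' test split and on an external set, then looking at the gap. A minimal sketch, where `toy_predict` and both data sets are toy stand-ins invented for illustration (not the paper's model or data):

```python
def accuracy(predict, samples, labels):
    """Fraction of samples for which predict(sample) equals the label."""
    correct = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return correct / len(labels)


def generalization_gap(predict, internal, external):
    """Return (internal accuracy, external accuracy, difference) for one frozen model."""
    acc_in = accuracy(predict, *internal)
    acc_ex = accuracy(predict, *external)
    return acc_in, acc_ex, acc_in - acc_ex


# Toy stand-in for a trained classifier: thresholds a single feature.
def toy_predict(x):
    return x > 0.5


# Each set is (samples, labels); values are invented for illustration only.
internal_test = ([0.1, 0.9, 0.8, 0.2], [False, True, True, False])
external_test = ([0.4, 0.6, 0.7, 0.3], [True, True, False, False])
acc_in, acc_ex, gap = generalization_gap(toy_predict, internal_test, external_test)
```

A large gap between the two numbers points toward leakage or a flawed split. Running the same function over each competing model (random forest, XGBoost, a neural network) on identical sets is also what a fair "superiority" comparison would need.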

1

BrotherAmazing t1_jch3dkl wrote

I agree with your sentiment and have no problem with that.

There just seem to be more than a few people here who think Corporate entities have generally been publishing a higher % of their R&D than they ever actually did. Some people (not saying you personally) seem to go further and believe it is their duty to publish important IP and research.

I like them publishing and think it’s great, but I just believe they never have a “duty” to do so if they don’t want to, and I have seen companies that “publish” hold a lot back behind the scenes too.

1

BrotherAmazing t1_jch23xt wrote

In the real-world cases I have been involved in, granted it was only four cases, things did not at all play out that way. Once it went to court but the defendant settled on terms favorable to the plaintiff, once the defendant complied with the cease and desist prior to the lawsuit being initiated, and the other two times it actually went to trial and wasn’t settled (which they told me was rare), with the plaintiffs winning once and the defendants winning once.

What you say really is not true because once you win or lose in court, it cannot be tried again and it’s a settled matter, and that process indeed does legally settle whether there is infringement or not. No one sits around after the verdict is read and scratches their head, wondering whether they are infringing or not.

1

BrotherAmazing t1_jch1gll wrote

I never said they don’t publish, re-read.

I can tell you firsthand what they publish has to get approval, and a lot of things do not get approval to publish and are held as trade secrets. It boggles my mind this sub clearly has so many people who have never worked on the Corporate side of this industry and have these strong ideas that the Corporate side is or has ever been fully transparent and allows employees to publish anything and everything. This is so far from the truth it’s not funny.

For every model and paper published, there exists another model and many other papers that are not approved to be published, and many exist in a different format as internal publications only. Other internal publications get watered down and a lot of extra work is omitted in order to get approval to publish. Or they publish “generation 3” to the world while they’re working on “generation 5” internally.

1

BrotherAmazing t1_jce3zky wrote

I would be happy to sign an NDA if Google allowed me to have access to verify, validate, and run some of their most prized models they keep secret and have not released, and it is incredibly rare for an NDA to last forever.

Also, a lot of research goes on behind closed doors among people who have signed NDAs. They still replicate each other’s work and verify and validate it, they just don’t publish it for you to read.

This thread isn’t specifically about “replication research” across the broader international community either, is it? OP did not indicate that, and primary research that a company performs and then successfully transitions into a system that empirically outperforms the competition is validation enough; it need not be replicated by their competitors. In fact, the whole point is you don’t want anyone to replicate it, but it is still valid, useful research if you bring a product to market that everyone demands and finds useful.

When you work for Google or nearly any company and move away from academia, you don’t automatically have the ability to publish everything you do at the company or everything the company has ever done that you learn about. Are you really under that impression? Have you ever worked in the Corporate world??

−3

BrotherAmazing t1_jcdloe7 wrote

You can still replicate results in private under a non-disclosure agreement or verify/validate results without it getting published to the world though.

I like open research, but research that happens in private can still be useful and is a reality.

−4

BrotherAmazing t1_jcdhklt wrote

All of these companies publish some things, they keep other things trade secrets, patent other things, and so on. Each decision is a business decision.

This thread is baffling to me because so many people seem to have this idea that, at one time, AI/ML or any tech companies were completely “open” and published everything of any value. This is nowhere close to reality.

3

BrotherAmazing t1_jcdgtka wrote

I don’t understand what OP is worried or complaining about. Every business can choose whether they wish to publish or release IP or withhold it and keep it as a trade secret. That is a business decision.

You are allowed to “benefit from” information other companies publish so long as you don’t break any laws.

OP implies OpenAI is infringing on patents and Google or Meta should enforce their patents and make OpenAI pay royalties, cease and desist, or face legal consequences. What patents is OpenAI infringing on? I have an INCREDIBLY hard time believing Google or Meta wouldn’t go after someone who was infringing on their patents if they became aware of it.

−5

BrotherAmazing t1_jb37vx3 wrote

It’s sort of a “clickbait” title I didn’t like myself even if it’s a potentially interesting paper.

Usually we assume dropout helps prevent overfitting, not help with underfitting, but the thing I don’t like about the title is it makes it sound like dropout helps with underfitting in general. It does not and they don’t even claim it does—even by the time you finish reading their Abstract you can tell that they’re only saying dropout has been observed to help with underfitting in certain circumstances when used in certain ways only.

For example, I can come up with low-dimensional counter-examples where dropout won’t help you when you’re underfitting and will, in fact, necessarily be the cause of the underfitting.
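One toy case of this kind (a construction for illustration, not necessarily the one meant above): fit y = x with a single weight w, and apply inverted dropout with keep probability p to the only input. The expected training loss at x = 1 is p(1 - w/p)^2 + (1 - p), which is minimized at w = p and can never fall below 1 - p. The sketch below checks that numerically; `expected_train_loss` and `best_weight` are hypothetical helper names.

```python
def expected_train_loss(w, p, x=1.0):
    """Expected squared error for target y = x with inverted dropout on the input.

    With probability p the input is kept and rescaled by 1/p;
    with probability 1 - p it is zeroed, so the model outputs 0.
    """
    kept = (x - w * x / p) ** 2
    dropped = (x - 0.0) ** 2
    return p * kept + (1 - p) * dropped


def best_weight(p, steps=1000):
    """Grid-search w in [0, 2] for the weight minimizing the expected loss."""
    grid = [i / steps * 2 for i in range(steps + 1)]
    return min(grid, key=lambda w: expected_train_loss(w, p))


p = 0.5
w_star = best_weight(p)                      # weight the dropout objective prefers
floor = expected_train_loss(w_star, p)       # irreducible training loss under dropout
eval_error = (1.0 - w_star * 1.0) ** 2       # error at test time, dropout switched off
w_no_dropout = best_weight(1.0)              # same model without dropout
```

With p = 0.5 the preferred weight is 0.5, so even at test time the model predicts half the target, while the same one-parameter model without dropout (p = 1) fits exactly. Here dropout is the sole cause of the underfitting.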

6

BrotherAmazing t1_j9x1vlw wrote

Ethereum does, and should also be differentiated from “crypto” in general, but unlike Bitcoin it is a security and also has started seriously considering implementing mechanisms to censor transactions.

You also seem to ignore or perhaps are unaware of the Lightning Network.

I agree Bitcoin, Ethereum, and Monero are a few non-scam, non-shit blockchains, but almost everything else is complete rubbish, hence the main point that $COIN is at risk when they’re shilling shitcoin scams up the ying-yang and Papa Gensler comes guns a-blazing one of these days.

1

BrotherAmazing t1_j9wxw7f wrote

I wasn’t saying everyone should use Bitcoin or that it should or will be adopted by the masses, replace fiat, etc etc. No, I don’t agree with that. I only was remarking that it is one of the few that should be distinguished from the rest of crypto as clearly not being a security (how many cryptos has Gensler publicly admitted, again and again, are not securities?) and that it does solve use-case problems and provide value to some people in certain situations, often outside the developed world. I didn’t say anything about the size of the market for those who value using the Bitcoin payment network.

If someone wanted me to give them my jewelry in exchange for a check on a Sunday and I didn’t know them, I wouldn’t do it. If they offered an ACH or Visa and it was “Pending” I wouldn’t give them the jewelry until it settled if I didn’t know them. That would take some time. If they paid me in BTC and it settled, I’d give them the jewelry.

1

BrotherAmazing t1_j9wmzyw wrote

Me personally? Only a few grand each year.

But I’ve never used Moneygram or Western Union in my life, never play video games, never use a lot of things I personally don’t have a desire to use.

That doesn’t mean these things I personally don’t use don’t serve some use case or aren’t valued by people in this world.

2

BrotherAmazing t1_j9wmopp wrote

I already named useful things Bitcoin does and you completely ignored them:

  1. Bitcoin provides final irreversible settlement faster than ACH or Visa. This is important for me if I want to be 100% sure I have a final payment that can’t be clawed back before I send you something of value or perform a service for you of value. This can occur on a holiday or weekend or after hours when banks are closed.

  2. I can send you a payment and it cannot be censored, sanctioned, or declined.

  3. For a speculative investment, the native token of the Bitcoin payment network, BTC, is not a security by definition (even the SEC agrees) and that risk associated with being a security is eliminated. It also has a cap of 21M total BTC. Speculative? Yes! Don’t invest what you can’t afford to lose. Stupid? No! Not for 0.5% - 1% of your portfolio. The risk-reward given Bitcoin’s internal monetary policy and history of a bull market after each halving justifies speculation far more than on, say, 90% of the nonsense that gets thrown around here. It has the largest market cap for a reason.
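The 21M cap in point 3 follows from the issuance schedule: the block subsidy started at 50 BTC and halves every 210,000 blocks, so total issuance is a geometric series that converges just under 21M. A minimal sketch of that arithmetic, done in integer satoshis (the function name `total_supply_sats` is hypothetical, and this ignores lost or unspendable coins):

```python
BLOCKS_PER_HALVING = 210_000
INITIAL_SUBSIDY_SATS = 50 * 100_000_000  # 50 BTC, in satoshis


def total_supply_sats():
    """Sum the block subsidy over every halving era until it rounds down to zero."""
    total = 0
    eras = 0
    subsidy = INITIAL_SUBSIDY_SATS
    while subsidy > 0:
        total += BLOCKS_PER_HALVING * subsidy
        subsidy //= 2  # integer halving, so fractions of a satoshi are dropped
        eras += 1
    return total, eras


supply_sats, eras = total_supply_sats()
supply_btc = supply_sats / 100_000_000
```

Because the halving uses integer division, the sum lands slightly below 21,000,000 BTC rather than exactly on it.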

There are more advantages than those I list, but those are very obvious advantages of Bitcoin.

Furthermore, Bitcoin was invented to solve a problem that was technical in nature, not the problem of “I want to get rich off a vaporware shitcoin” like almost all other chains besides Monero and Ethereum.

But the whole point here isn’t why Bitcoin will survive; the whole point is that > 99% of crypto is garbage shit and Coinbase may not be a great investment long-term.

2

BrotherAmazing t1_j9w54r6 wrote

People have been saying that for 10 years. These comments have never aged well. Just wait for one or two more halvings and you’ll be seeing people saying “Bitcoin is dead!” when it crashes down to $75k a coin.

Literally you people have always been wrong long-term about Bitcoin, and there is nothing new or compelling you are bringing up here as an argument. I would be happy to listen if you had anything new that hasn’t been said constantly for the last 10 years while BTC has done nothing but gain adoption and crush the S&P 500.

5

BrotherAmazing t1_j9vpk26 wrote

Not true.

Bitcoin is a peer-to-peer electronic payment network that requires no trusted 3rd party. No bank or Visa can sanction or deny payments on Bitcoin (Ethereum is already floating adding this capability now that they are PoS), and final irreversible payment occurs faster than with Visa or ACH. Bitcoin is not a security, and even the SEC has admitted this. Ethereum, meanwhile, is now considered a security by many at the SEC, and even those who want to be Ethereum friendly have not said it isn’t a security like they have with Bitcoin, which is another advantage of Bitcoin.

I’m not going to shit on Ethereum like I will shit on 99% of the other crap out there (smart contracts are cool, I like Vitalik overall), but to say Bitcoin has no use case is ignorant. Just because it may not have a use case for you doesn’t mean there aren’t tens of thousands of people every day finding value in the services that Bitcoin’s network provides.

2

BrotherAmazing t1_j9uuqpv wrote

Bitcoin is one of the few with a use case, adoption, and value that is not inherently speculative, but almost all crypto is trash (way more than 99%). Even for BTC, it is hard to know what the price would be if we took away the speculative part of the demand; i.e., what portion of the demand comes from those who actually use the Bitcoin payment network because it fills a niche service of value/utility for them?

0

BrotherAmazing t1_j9utljt wrote

Bitcoin will survive. Blockchain will survive. The question is whether all these shitcoins survive, and I think the answer is no. Not in the U.S. at least.

Where this is going: if you have an ICO and reserve a bunch of the supply for yourself, the founder(s), and the dev team, you’re either registering as a security or not operating lawfully in the U.S. Otherwise you launch like Bitcoin did and are then clearly not a security.

The days of “cryptoChef2020” making up a token out of thin air or blatantly copying an Open Source project, giving himself 20% of the supply and his “dev team” 20% of the supply, holding an ICO on his own with pumped vaporware/copied code behind it, then either skipping town or legitimately trying to make the tech gain adoption for a little while to ease his conscience (and maybe getting listed on Coinbase) are going to be gone soon, if not already.

11