suflaj t1_jebxmmx wrote

If you have a small dataset, then Transformers are out of the question, especially if we're talking pretraining and all.

Seems to me like you might be interested in ML methods, such as XGBoost. Since you have tabular data it will probably outperform all other methods at first. From there on out you would be trying to find a model better tailored for the task, depending on how you want to use your data. Given your data situation, you would be looking at deep LSTMs for the end game. But currently, it doesn't matter if it's 20 or 2000 samples (idk how you count them), that's not enough to solve something you claim is too difficult to outright mathematically model.

Reinforcement learning might not be adequate given that you say that the problem is too difficult to model mathematically. RL will only be useful to you if it is difficult to model it because the problem is wide, ie it is hard for you to narrow it down to a general formula. If the problem is hard in the sense that it would be difficult or narrow, then your agent might not be able to figure out how to solve the task at all, and you would have to think out the training regimen really well to teach it anything. RL is not really well suited for very hard problems.

Finally, it doesn't seem to me you have an environment set up for the agent, because if you did, your problem would be solved given that it would require you to mathematically model it. And if it was easy to obtain data in the first place, you would be having way more than 20 or 2000 samples. That's why I presume that RL is completely out of the question for you as well.

I would personally not tackle this problem with trajectories. If you want to solve this using DL, then you should create a bigger dataset using actual camera recording, and then either label the bounding boxes or segment the image. Then you can use any of the pretrained backbones and simply train an object detector. Given an offset in the next frame, you can calculate the movement for the camera.
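To make the last step concrete, here is a minimal sketch of turning a detected offset into a camera correction. It assumes boxes are given as (x_min, y_min, x_max, y_max) in pixels and that a linear pixel-to-angle mapping is good enough for small offsets; the function names, the image size and the field of view are all made up for illustration.

```python
def box_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def camera_correction(prev_box, next_box, image_size, fov_deg):
    """Approximate (pan, tilt) in degrees that follows the object.

    Assumption: pixels map linearly to angles, which is only
    reasonable for small offsets and low lens distortion.
    """
    (prev_x, prev_y) = box_center(prev_box)
    (next_x, next_y) = box_center(next_box)
    width, height = image_size
    h_fov, v_fov = fov_deg
    pan = (next_x - prev_x) * h_fov / width
    tilt = (next_y - prev_y) * v_fov / height
    return pan, tilt


# Hypothetical detections from two consecutive frames.
pan, tilt = camera_correction(
    prev_box=(100, 100, 200, 200),
    next_box=(164, 100, 264, 200),
    image_size=(640, 480),
    fov_deg=(60.0, 45.0),
)
print(pan, tilt)
```

A real setup would likely need the actual camera intrinsics instead of the linear approximation, plus some smoothing over several frames.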

This is a task so generic that just with a few hundred to thousand samples you can probably get a semi-supervised labelling scheme going on - with some other model labelling the images automatically and then you just need a few humans judging these labels or correcting them. And this task is so trivial and widespread you can find a workforce to do this anywhere.

The question is what performance you would expect. But in all cases I would say that if you need a very robust solution, you should probably look into mathematically modelling it - we are presumably talking about a differential system in the background, which is not going to be easily solved by any mainstream DL model. All methods mentioned here can essentially be dumbed down to a very large non-linear equation. They can only mimic a differential system up to a certain amount of precision, determined by their width and depth, as well as the statistical significance of your samples.


suflaj t1_je1uvo8 wrote

They probably redid the experiments themselves. Also, ResNets had some changes shortly after release I believe, and they could have used different pretraining weights. AFAIK He et al. never released their weights.

Furthermore, Wolfram and PyTorch pretrained weights are also around 22% top-1 error rate, so that is probably the correct error rate. Since PyTorch provides weights that reach 18% top-1 error rate with some small adjustments to the training procedure, it is possible the authors got lucky with the hyperparameters, or employed some techniques they didn't describe in the paper.


suflaj t1_jdxh3kc wrote

That would be breaching copyright. Depending on the company and the product, you'd get anywhere from a pretty nasty debt to straight up ruining your whole life (and potentially the lives of your family and people associated with you).

The same way you wouldn't steal from the mob, you would not steal from a company that makes money on a product FOSS can't compete with. Aside from that, decompilers have existed for a very long time, yet we have not witnessed such vigilantism.


suflaj t1_jdxgd3z wrote

In most cases yes, but inherently no. Understand that compilers, as part of their optimization step, might compile high level code into something that you can't really connect with the actual code. Part of the information is lost in the optimization step and so in a general case you will not be able to revert the compilation step. At least not fully, of course you will be able to get something resembling the solution, but it is not guaranteed to be the exact code that compiled into your starting input.
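A toy illustration of that information loss, not a real compiler: the "source" here is a nested tuple and the only optimization is constant folding. The `fold` function and the expression format are invented for this sketch.

```python
def fold(expr):
    """Constant-fold a toy expression tree.

    An expression is an int, a variable name (str), or a tuple
    (op, left, right) with op being "+" or "*".
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    return (op, left, right)


# Three different "sources" ...
source_a = ("*", "x", ("+", 2, 4))
source_b = ("*", "x", ("*", 2, 3))
source_c = ("*", "x", 6)

# ... collapse into a single "compiled" form.
compiled = {fold(source) for source in (source_a, source_b, source_c)}
print(compiled)
```

Given only the folded result, there is no way to tell which of the three sources produced it, so the best a decompiler can do is return something equivalent.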

This is, of course, after taking into consideration you will not be able to recover dead source code if it's never compiled into something. Because if you take this into account, even if a language does not optimize the source code otherwise, if it only discards dead code: you are also losing information.

And also, this is disregarding name mangling. Obviously name mangling can be done in a way you have information loss, but this is probably irrelevant since concrete entity names are not that relevant.


suflaj t1_jdvbyog wrote

With the constraints you have I'm afraid the best you could do is:

  • find a person who can quickly copy and paste prompts
  • give them internet access
  • pay for ChatGPT Plus
  • have them copy user prompts into ChatGPT and copy its answer to the user

suflaj t1_jdqh5se wrote

> Gpt hallucinates a lot and is unreliable for any factual work.

No, I understand that's what you're saying, however, this is not a claim that you can even check. You have demonstrated already that your definitions are not aligned with generally accepted ones (particularly for intuition), so without concrete examples this statement is hard to take seriously.

> Your wall of text can be summarized as, “I’m gonna debate you by suggesting no one knows the definition of AGI.”

I'm sad that's what you got from my response. The point was to challenge your claims about whether GPT4 is or isn't AGI based on the mere fact you're judging that over properties which might be irrelevant for the definition. It is sad that you are personally attacking me instead of addressing my concerns.

> No one knows what the definition of intuition is

That is not correct. Here are some definitions of intuition:

  • an ability to understand or know something immediately based on your feelings rather than fact (Cambridge)
  • the power or faculty of attaining to direct knowledge or cognition without evident rational thought and inference (Merriam-Webster)
  • a natural ability or power that makes it possible to know something without any proof or evidence : a feeling that guides a person to act a certain way without fully understanding why (Britannica)

You might notice that all these 3 definitions are satisfied by DL models in general.

> but what we know is that memory does not play a part in it.

This is also not true:

The question is - why are you making stuff up despite the counterevidence being 1 Google search away?

> It’s actually hilarious that you bring up source citation as some form of trump card after I mention how everything you know about GPT4 is something someone has told you to believe in without any real discernible and reproducible evidence.

I bring it up as you have not provided any other basis for your claims. You refuse to provide the logs for your claims to be checked. Your claims are contrary to my experience, and it seems others' experience as well. You claim things contrary to contemporary science. I do not want to discard your claims outright, I do not want to personally attack you despite being given ample opportunity to do so, I'm asking you to give me something we can discuss and not turn it into "you're wrong because I have a different experience".

> Instead of maybe asking me to spoon feed you spend a whole of 20 secs googling.

I'm not asking you to spoon feed me, I'm asking you to carry your own burden of proof. It's really shameful for a self-proclaimed person in academia to be offended by someone asking them for elaboration.

Now, could you explain what those links mean? The first one, for example, does not help your cause. Not only does it not concern GPT4, but rather Bard, a model significantly less performant than even ChatGPT, it also claims that the model is not actually hallucinating, but not understanding sarcasm.

The second link also doesn't help your cause - rather than examining the generalization potential of a model, it suggests the issue is with the data. It also does not evaluate the newer problems as a whole, but a subset.

The 3rd and 4th links also do not help your cause. First, they do not claim what you are claiming. Second, they list concerns (and I applaud them for at least elaborating a lot more than you), but they do not really test them. Rather than claims, they present hypotheses.

> “I don’t quite get it how works” + “it surprises me” ≠ it could maybe be sentient if I squint.

Yeah. Also note: "I don't quite get how it works" + "It doesn't satisfy my arbitrary criteria on generalization" ≠ It doesn't generalize

> after I acknowledged and corrected the mistake myself

I corrected your correction. It would be great if you could recognize that evaluating the performance on a small subset of problems is not equal to evaluating whether the model aces anything.

> maybe you have some word quota you were trying to fulfill with that

Not at all. I just want to be very clear, given that I am criticising your (in)ability to clearly present arguments; doing otherwise would be hypocritical.

> My point is, it’s good at solving leetcode when it’s present in the training set.

Of course it is. However, your actual claim was this:

> Also the ones it does solve it solves at a really fast rate.

Your claim suggested that the speed at which it solves it is somehow relevant to the problems it solves correctly. This is demonstrably false, and that is what I corrected you on.

> Ps- also kindly refrain from passing remarks on my understanding of the subject when the only arguments you can make are refuting others without intellectual dissent.

I am not passing these remarks. You yourself claim you are not all that familiar with the topic. Some of your claims have cast doubt not only on your competence on the matter, but now even on the truthfulness of your experiences. For example, I am beginning to doubt whether you have even used GPT4 given your reluctance to provide your logs.

The argument I am making is that I don't have the same experience. And that's not only me... Note, however, that I am not confidently saying that I am right or you are wrong - I am, first and foremost, asking you to provide us with the logs so we can check your claims, which for now are contrary to the general public's opinion. Then we can discuss what actually happened.

> It’s quite easy to say, “no I don’t believe u prove it” while also not being able to distinguish between Q K and V if it hit u on the face.

It's also quite easy to copy paste the logs that could save us from what has now turned into a debate (and might soon lead to a block if personal attacks continue), yet here we are.

So I ask you again - can you provide us with the logs that you experienced hallucination with?

EDIT since he (u/BellyDancerUrgot) downvoted and blocked me

> Empty vessels make much noise seems to be a quote u live by. I’ll let the readers of this thread determine who between us has contributed to the discussion and who writes extensively verbose commentary , ironically , with 0 content.

I think whoever reads this is going to be sad. Ultimately, I think you should make sure as few people see this as possible; this kind of approach brings shame not only to your academic career, but also to you as a person. You are young, so you will learn not to be overly enthusiastic in time, though.


suflaj t1_jdnvq8q wrote

> Not giving u the exact prompts

Then we will not be able to verify your claims. I hope you don't expect others (especially those with a different experience, challenging your claims) to carry your burden of proof.

> When I said ‘ace’ I implied that It does really good on leetcode questions from before 2021 and it’s abysmal after.

I have not experienced this. Could you provide the set of problems you claim this is the case for?

> Also the ones it does solve it solves at a really fast rate.

Given its architecture, I do not believe this is actually the case. Its inference is only reliant on the output length, not the problem difficulty.
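To illustrate what I mean, here is a minimal sketch of an autoregressive decoding loop. The "model" is a stand-in that just replays a scripted answer; `generate`, `scripted_model` and the token lists are made up for this example and have nothing to do with how GPT4 is actually served.

```python
def generate(next_token, prompt, max_new_tokens, eos="<eos>"):
    """Greedy decoding loop: one forward pass per generated token."""
    tokens = list(prompt)
    forward_passes = 0
    for _ in range(max_new_tokens):
        token = next_token(tokens)  # stands in for one forward pass
        forward_passes += 1
        if token == eos:
            break
        tokens.append(token)
    return tokens[len(prompt):], forward_passes


def scripted_model(answer, eos="<eos>"):
    """Dummy model that emits a fixed answer, then the end token."""
    remaining = iter(list(answer) + [eos])
    return lambda tokens: next(remaining)


# An "easy" and a "hard" problem whose answers have the same length.
easy_answer = ["return", "a", "+", "b"]
hard_answer = ["return", "dp", "[", "n"]

_, easy_passes = generate(scripted_model(easy_answer), ["easy"], 32)
_, hard_passes = generate(scripted_model(hard_answer), ["hard"], 32)
print(easy_passes, hard_passes)
```

Both runs need the same number of forward passes because only the answer length enters the loop. The sketch ignores that each pass gets somewhat more expensive as the context grows, which again depends on length and not on difficulty.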

> From a test that happened a few weeks ago it solved 3 questions pretty much instantly and that itself would have placed it in the top 10% of competitors.

That does not seem to fit my definition of acing it. Acing is being able to solve all or most questions. Given a specific year, that is not equal to being able to solve 3 problems. Also, refer to the above paragraph about why inference speed is meaningless.

Given that it is generally unknown what it was trained on, I don't think it's even adequate to judge its performance on long-known programming problems.

> Insufficient because as I said , no world model, no intuition, only memory. Which is why it hallucinates.

You should first cite some authority on why it would be important. We generally do not even know what it would take to prevent hallucination, since we humans, who have that knowledge, often hallucinate as well.

> Intuition is understanding the structure of the world without having to have the entire internet to memorize it.

So why would that be important? Also, the word you're looking for is generalizing, not intuition. Intuition has nothing to do with knowledge, it is at most loosely tied to wisdom.

I also fail to understand why such a thing would be relevant here. First, no entity we know of (other than God) would possess this property. Secondly, if you're alluding that GPT-like models have to memorize something to know, you are deluding yourself - GPT-like models memorize relations, they are not memory networks.

> A good analogy would be of how a child isnt taught how gravity works when they first start walking.

This is orthogonal to your definition. A child does not understand gravity. No entity we know of understands gravity, we at most understand its effects to some extent. So it's not a good analogy.

> Or how you can not have knowledge about a subject and still infer based on your understanding of underlying concepts.

This is also orthogonal to your definition. Firstly it is fallacious in the sense that we cannot even know what is objective truth (and so it requires a very liberal definition of "knowledge"), and secondly you do not account for correct inference by chance (which does not require understanding). Intuition, by a general definition, has little to do with (conscious) understanding.

> These are things u can inherently not test or quantify when evaluating models like gpt that have been trained on everything and you still don’t know what it has been trained on lol.

First you should prove that these are relevant or wanted properties for whatever it is you are describing. In terms of AGI, it's still unknown what would be required to achieve it. Certainly it is not obvious how intuition, however you define it, is relevant for it.

> I’m not even an NLP researcher and even then I know the existential dread creeping in on NLP researchers because of how esoteric these results are and how AI influencers have blown things out of proportion citing cherry picked results that aren’t even reproducible because you don’t know how to reproduce them.

Brother, you just did an ad hominem on yourself. These statements only suggest you are not qualified to talk about this. I have no need to personally attack you to talk with you (not debate), so I would prefer if you did not trivialize your standpoint. For the time being, I am not interested in the validity of it - first I'm trying to understand what exactly you are claiming, as you have not provided a way for me to reproduce and check your claims (which are contradictory to my experience).

> There is no real way an unbiased scientist reads openAIs new paper on sparks of AGI and goes , “oh look gpt4 is solving AGI”.

Nobody is even claiming that. It is you who mentioned AGI first. I can tell you that NLP researchers generally do not use the term as much as you think. It currently isn't well defined, so it is largely meaningless.

> Going back on what I said earlier, yes there is always the possibilit

The things worth considering you said are easy to check - you can just provide the logs (you have the history saved) and since GPT4 is as reproducible as ChatGPT, we can confirm or discard your claims. There is no need for uncertainty (unless you will it).


suflaj t1_jdlruqe wrote

Could you share those questions it supposedly hallucinated on? I have not seen it hallucinate EVER on new chats, only when the hallucination was based on that chat's history.

> Of course it aces tests and leetcode problems it was trained on.

It does not ace leetcode. This statement casts doubt about your capabilities to objectively evaluate it.

> How do you even get an unbiased estimate of test error?

First you need to define unbiased. If unbiased means no direct dataset leak, then the existing evaluation is already done like that.

> Doesn’t even begin to approach the footholds of AGI imo.

Seems like you're getting caught on the AI effect. We do not know if associative memory is insufficient to reach AGI.

> No world model. No intuition.

Similarly, we do not know if those are necessary for AGI. Furthermore, I would dare you to define intuition, because depending on your answer, DL models inherently have that.


suflaj t1_jdf3j2k wrote

Unless you plan on quantizing your model or loading it layer by layer, I'm afraid 2B parameters is the most you'll get. 10GB VRAM is not really enough for CV nowadays, let alone NLP. With quantization, you can barely run the 7B model.
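A rough back-of-the-envelope check, counting the weights only. The assumption is that unquantized means fp32 (4 bytes per parameter) and that 1 GB = 1e9 bytes; activations, the KV cache and framework overhead all come on top of this.

```python
# Bytes needed to store a single parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params, dtype):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for n_params, dtype in [(2e9, "fp32"), (7e9, "fp16"), (7e9, "int8")]:
    print(f"{n_params / 1e9:.0f}B @ {dtype}: {weight_memory_gb(n_params, dtype):.1f} GB")
```

Under these assumptions 2B parameters in fp32 already take 8 GB, 7B in fp16 does not fit in 10 GB at all, and 7B in int8 leaves about 3 GB for everything else.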

4 bit doesn't matter at the end of the day since it's not supported out of the box, unless you intend to implement it yourself.


suflaj t1_jcfbkxq wrote

It took more than 6 years from zero, because to reach GPT4 you had to develop transformers and all the GPTs before 4... The actual difference between ChatGPT and GPT4 is apparently in the data and some policies that regulate when it is allowed to answer (which are still incomplete). This is not remarkable.

I AGAIN fail to see how this relates to previous comments.


suflaj t1_jcevhag wrote

I understand what the user is saying, I do not understand how it relates to anything said before that.

Sadly, while GPT4 may be able to predict the next token based on studying the language's syntax thoroughly, it still fails to actually understand anything. Unless the original commenter is a bot, I would expect them to explain how what they said has anything to do with my comment, or the claims made about NLP researchers' obsolescence due to its release.


suflaj t1_jcd4131 wrote

> I’ve been using ChatGPT to write all of my sales emails for difficult clients lately, and it has been fantastic. It took what should have been another staffmember at my company and made it into a proofreading duty I can handle while working on other things.

I fail to see the point you're making.

> Also… hate to say it, but the fact that you’re using the words “humiliated” and “jailbroken” in this context doesn’t exactly cast a very good light on your understanding of the situation.

I also fail to see what you're saying. How else would you describe events in which you show how stupid ChatGPT actually is and those where you get to trick it to bypass all security filters?


suflaj t1_jccsipl wrote

You mean the same type of foresight with GPT3, when people (or rather "people", given that it was mostly journalists) got baited into spreading hysteria over the authors' claims that the technology is world ending? Or ChatGPT, which was humiliated and jailbroken within 36 hours of its public release?

It has been a day now, and I've heard the same concerns that it's ultimately biased. Definitely not career-ending.


suflaj t1_jccohd6 wrote

> The recent release of GPT4 has apparently sent most of that sector into a mass existential crisis

I don't know where you got this from

I can tell you for sure that no one worth their salt would make claims like those for something that has been out for a day, and from what I've seen, still has the same problems you can get sued over. Might be torchkiddies larping NLP peeps and starting mass hysteria. The Andrew Ngang.


suflaj t1_jc7jibo wrote

> The same could have been said of Deep Learning until the Image Net breakthrough. The improvement process is evolutionary, and this may be a step in that process.

This is not comparable at all. ImageNet is a database for a competition - it is not a model, architecture or technique. When it was "beaten", it was beaten not by a certain philosophy or ideas, it was beaten by a proven implementation of a mathematically sound idea.

This is neither evaluated on a concrete dataset, nor is it delved into deeply in the mathematical sense. This is a preprint of an idea that someone fiddled with using a LLM.

> As for reinforcement learning, it has been successfully applied in many real-world scenarios, including robotics, game playing, and autonomous driving.

My point is that so has the 6 year old DNC. The thing is, however, that neither of those is your generic reinforcement learning - they're very specifically tuned for the exact problem they are dealing with. If you actually look at what is available for DRL, you will see that aside from very poor framework support, probably the best we have is Gym, the biggest issue is how to even get the environment set up to enable learning. The issue is in making the actual task you're learning easy enough for the agent to even start learning. The task of knowing how to memorize or recall is incredibly hard, and we humans don't even understand memory well enough to construct problem formulations for those two.

Whatever technique you come up with, if you can't reproduce it for other problems or models, you will just be ending up with a specific model. I mean - look at what you are saying. You're mentioning AlphaGo. Why are you mentioning a specific model/architecture for a specific task? Why not a family of models/architectures? Maybe AlphaZero, AlphaGo, MuZero sound similar, but they're all very, very different. And there is no real generalization of them, even though they all represent reinforcement learning.

> This is one path and other methods could be incorporated such as capsule networks, which aim to address the limitations of traditional convolutional neural networks by explicitly modeling the spatial relationships between features.

And those have long been shown to be a scam, basically. Well, maybe not fundamentally a scam, but definitely dead. Do you know what essentially killed them? Transformers. And do you know why Transformers are responsible for almost killing the rest of DL architectures? Because they showed actual results. The paper that is the topic of this thread fails to differentiate the contribution of this method disregarding the massive transformer they're using alongside it. If you are trying to show the benefits of a memory augmented system, why not simply use a CNN or LSTM as controller? Are the authors implying that this memory system they're proposing needs a massive transformer to even use it? Everything about it is just so unfinished and rough.

> Another approach is to use memory augmented networks to store and update embeddings of entities and their relationships over time, and use capsule networks to decode and interpret these embeddings to make predictions. This approach can be especially useful for tasks that involve sequential data, such as language modeling and time-series forecasting.

Are you aware that exactly this has been done by Graves et al., where the external memory is essentially a list of embeddings that is 1D convolved over? The problem, like I mentioned, is that this kind of process is barely differentiable. Even if you do fuzzy search (Graves et al. use sort of an attention based on access frequency alongside the similarity one), your gradients are so sparse your network basically doesn't learn anything. Furthermore, the output of your model is tied to this external memory. If you do not optimize the memory, then you are limiting the performance of your model severely. If you are, then what you're doing is nothing novel, you have just arbitrarily decided that part of your monolithic network is memory, even though it's just one thing.
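For reference, a minimal sketch of what a similarity-based fuzzy read looks like: a softmax over cosine similarities between a query key and every memory slot. This is in the spirit of content-based addressing and not the exact formulation of Graves et al.; `soft_read`, the `sharpness` parameter and the tiny memory are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def soft_read(memory, key, sharpness=1.0):
    """Return (weights, read vector) for a soft lookup of key in memory."""
    scores = [sharpness * cosine(slot, key) for slot in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    read = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(len(key))
    ]
    return weights, read


memory = [[1.0, 0.0], [0.0, 1.0]]
soft_weights, _ = soft_read(memory, key=[1.0, 0.0], sharpness=1.0)
sharp_weights, _ = soft_read(memory, key=[1.0, 0.0], sharpness=50.0)
print(soft_weights, sharp_weights)
```

The sharper the lookup, the closer the weights get to one-hot, and the less signal reaches the slots that were not selected. That is the trade-off between a precise recall and gradients that are too sparse to learn from.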


suflaj t1_jc73bnx wrote

I have skimmed over it before writing this. They have what working? Synthetic toy examples? Great, Graves et al. had even more practically relevant problems solved 6 years ago. The thing is, it never translated into solving real world problems, and the paper and follow up work didn't really manage to demonstrate how it could actually be used.

So, until this paper results in some metrics on known datasets, model frameworks and weights, I'm afraid there's nothing really to talk about. Memory augmented networks are nasty in the sense that they require transfer learning or reinforcement learning to even work. It's hard to devise a scheme where you can punish bad memorization or recall, because it's hard to link the outcome of some recall + processing to the process that caused such recall.

Part of the reason for bad associative memorization and recall is the data itself. So naturally, it follows that you should just be able to optimize the memorized data, no? Well, it sounds trivial, but it ends up either non-differentiable (because of an exact choice, rather than a fuzzy one), or hard to train (vanishing or sparse gradients). And you have just created a set of neural networks, rather than just a monolithic one. That might be an advantage, but it is nowhere near as exciting as this paper would lead you to believe. And that would not be novel at all: hooking up a pretrained ResNet with a classifier would be of the same semantics as that, if you consider the ResNet a memory bank: a 7 year old technique at this point.

Memorizing things with external memory is not exactly a compression task, which DNNs and gradient descent solve, so it makes sense that it's hard in a traditional DL setting.


suflaj t1_jc7119l wrote

This is not something new. It was already present 6 years ago, pioneered by Graves et al. The takeaway was that it's hard, if not impossible, to train.

The paper did not present any benchmarks on known sets. Until that happens, sadly, there is nothing really to discuss. Neat idea, but DL is all about results nowadays.

I was personally working on a full neural memory system myself, I built the whole framework for it, just to find out it wouldn't train on even a toy task. Graves' original work required curriculum learning to work for even toy tasks, and I am not aware of any significant achievement using his Differentiable Neural Computers.


suflaj t1_jc6n8v1 wrote

Just apply an aggregation function on the 0th axis. This can be sum, mean, min, max, whatever. The best is sum, since your loss function will naturally regularise the weights to be smaller and it's the easiest to differentiate. That holds when you know you always have 18 images; if the number of images varies, use mean. Min and max only pass gradients through the selected element and might give you problems.

If you use sum, make sure you do gradient clipping so the gradients don't explode in the beginning.
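A minimal sketch of both steps with plain Python lists, so it doesn't depend on any framework. In practice you would use your framework's own reduction and clipping utilities; `aggregate` and `clip_by_norm` are names made up for this example.

```python
import math


def aggregate(features, how="sum"):
    """Reduce a list of per-image feature vectors along the 0th axis."""
    columns = list(zip(*features))
    if how == "sum":
        return [sum(column) for column in columns]
    if how == "mean":
        return [sum(column) / len(column) for column in columns]
    raise ValueError(f"unsupported aggregation: {how}")


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0 or norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


# Three images, two features each (toy numbers).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(aggregate(features, "sum"))
print(aggregate(features, "mean"))
print(clip_by_norm([3.0, 4.0], max_norm=1.0))
```

Note how the sum grows with the number of images while the mean does not, which is why mean is the safer choice when the count varies.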


suflaj t1_jbx9h57 wrote

But there is evidence of a defense by taking as many adversarial attacks as possible and training against them. In the end, the ultimate defense is generalization. We know it exists, we know it is achievable, we only don't know HOW it's achievable (for non-trivial problems).


suflaj t1_jb92dey wrote

Well to be honest, unless there's some particular reason why you need the GPUs locally, the most cost effective solution is to just run it in the cloud.

Having GPUs locally is mostly a luxury for when some contract prevents you from using the cloud, or you need to train something every day for several hours over a year or more. For everything else, cloud pay-as-you-go will be cheaper and faster.
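If you want to sanity check this for your own case, a break-even estimate is simple arithmetic. Every number below is a placeholder and not a real price; `break_even_hours` is a made-up helper that ignores resale value, maintenance and your own time.

```python
def break_even_hours(hardware_cost, local_cost_per_hour, cloud_cost_per_hour):
    """GPU hours after which buying becomes cheaper than renting."""
    saving_per_hour = cloud_cost_per_hour - local_cost_per_hour
    if saving_per_hour <= 0:
        return float("inf")  # the cloud never costs more per hour
    return hardware_cost / saving_per_hour


# Placeholder values: plug in your own quotes and electricity price.
hours = break_even_hours(
    hardware_cost=2000.0,
    local_cost_per_hour=0.50,
    cloud_cost_per_hour=1.50,
)
print(hours)
```

Divide the result by the hours you would actually train per day to see how long it takes before local hardware pays off.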