Submitted by AutoModerator t3_zcdcoo in MachineLearning

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting even after the date in the title.

Thanks to everyone for answering questions in the previous thread!

21

Comments

OrderOfM t1_iyya61l wrote

Does using more than one transformation on your data increase the efficiency of a machine learning model? For example, a min-max scaler combined with a centering technique.

2

pier4r t1_iyzl5ta wrote

I'm not too deep into ML, but I read articles every now and then (especially about hyped models, GPT and co.). I see that there is progress on some amazing things (like GPT-3.5), partly because the NNs get bigger and bigger.

My question is: are there studies that check whether NNs could do more (be more precise, or whatever) given the same number of parameters? In other words, is it a race to make NNs as large as possible (given that they are structured appropriately), or is the "utility" per parameter also growing? I would like to know if there is literature about this.

It is a bit like an optimization question. "Do more with the same HW" so to speak.

2

Weth1000 t1_izgeuuz wrote

I am an industrial engineer who completed Andrew Ng's course. I have very large industrial continuous ovens that I want to optimize when they are in upset conditions, meaning they have spots of blank product. I am thinking a neural network may work, but I am not sure. I would rather use linear regression, but I think that is better suited to steady state. How do I get help on how best to tackle this problem?

2

Ricenaros t1_izkmdxd wrote

I'm trying to understand concepts involving feature engineering and correlation, because I feel like I'm encountering conflicting ideas about these two points. On the one hand, we can generate new features by combining our existing features, for example multiplying feature 1 by feature 2. This is said to improve ML models in some cases.

On the other hand, I have read that a desirable property of our input/output data is predictors being highly correlated with the target variable, but not correlated with other predictors. This idea seems to conflict with feature engineering, as our newly derived features can be correlated with the features they were constructed from. Am I missing something here?

2

I-am_Sleepy t1_izr846m wrote

I am not sure why your derived features need to be uncorrelated with the other predictors. If the tasks are correlated, then their features should be correlated too, e.g. panoptic segmentation and depth estimation.

For feature de-correlation there are some techniques you can apply. For example, in DL there is orthogonal regularization (enforcing feature dot products to be 0); see also this blog post.
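A rough sketch of a decorrelation penalty in PyTorch (a toy example, pushing the off-diagonal entries of the feature Gram matrix toward zero):

```python
import torch

def decorrelation_penalty(features: torch.Tensor, weight: float = 1e-2) -> torch.Tensor:
    """Penalize off-diagonal entries of the feature Gram matrix so that
    different feature dimensions stay (approximately) uncorrelated."""
    gram = features.T @ features / features.shape[0]   # (dim, dim)
    off_diag = gram - torch.diag(torch.diag(gram))
    return weight * (off_diag ** 2).sum()

# usage: loss = task_loss + decorrelation_penalty(hidden_activations)
```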

1

mymar101 t1_izkz1vl wrote

I'm looking for simple ideas for practical projects to incorporate machine learning into. I'm also looking for something that a solo beginner could do using libraries like scikit-learn. Any ideas? I'm not interested in simply predicting things; I'd like a practical application to fit it into.

2

pythoslabs t1_j00ge8i wrote

Here are some ideas -

- collection of news and finding the impact of news on stock prices (NLP / time series)

- put a camera in front of your street and predict daily traffic volume (computer vision + prediction)

- predict the winners of the next UFC fight / NFL championship

Basically build a system on events that are currently happening / yet to happen in the near future and evaluate your results against the real outcomes.

If you want to do the whole end-to-end project here are the things you have to do -

Try the whole pipeline (a rough sketch follows this list) - starting from

  • data collection
  • cleaning the data ( build rules)
  • building the feature list
  • creating your analytical dataset
  • the complete model creation step
  • prediction
  • evaluation & interpretation of model result
  • deploy to production
  • evaluate model drift
  • model refresh
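A rough sketch of a few of these steps with scikit-learn (toy data and placeholder names, not a full project):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# data collection + cleaning would normally happen upstream; here a toy frame
df = pd.DataFrame({
    "temperature": [20.1, 22.5, 19.8, 25.0, 21.3, 23.9] * 10,
    "weekday": ["mon", "tue", "wed", "thu", "fri", "sat"] * 10,
    "label": [0, 1, 0, 1, 0, 1] * 10,
})

# feature list / analytical dataset: scale numeric columns, one-hot encode categoricals
features = ColumnTransformer([
    ("num", StandardScaler(), ["temperature"]),
    ("cat", OneHotEncoder(), ["weekday"]),
])
pipeline = Pipeline([("features", features), ("model", GradientBoostingClassifier())])

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="label"), df["label"], random_state=0
)
pipeline.fit(X_train, y_train)                                   # model creation
print(classification_report(y_test, pipeline.predict(X_test)))   # evaluation
```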
1

SufficientStautistic t1_izya9wk wrote

What does Gluon offer? How does it compare to TensorFlow and PyTorch?

2

[deleted] t1_j01ifv6 wrote

[deleted]

2

jakderrida t1_j025iqs wrote

Problem with that is that using engagement or clicks will just give you an inferior version of Facebook's formula for turning retirees into conspiracy theorists.

On the other hand, I think you could make one, perhaps by scraping the abstracts of published research and differentiating between those that later received extraordinary numbers of citations and those that didn't. I actually ran NLP models against Seeking Alpha's author-tagged articles (tagged Bullish or Bearish on the stocks each article pertained to), and while I started out just expecting to beat a coin toss, the results surged to over 90% accuracy.

1

[deleted] t1_j028rzq wrote

[deleted]

2

jakderrida t1_j02apso wrote

Well, for one, flipping the script already occurs. When I was an electrician, a manager overheard me claim that a device measures resistance in the circuit. He proclaimed it measures continuity of the charge going through it. I repeatedly told him that it's the same thing with no success.

If it measures whether a paper has many citations, the complement of the probability it gives is the probability that the paper has few citations.

Now if what you're looking for is something like short stories, the hurdle to cross would be to find pretagged data that you would consider a reliable measure of "interesting/engaging" to be converted into mutually exclusive dummy variables for the NLP tool to train for. The reason I mentioned published research and citations is only because it's massive, well-defined, and feasible to collect metrics with associated texts.

Just to ensure you don't waste your time with any dreams of building the database without outside sources, I want you to realize that the thing about deep learning/neural network technologies is that they tend to produce terrible results unless the training data is pretty massive. Even the 50,000 tagged articles I used from Seeking Alpha would be considered somewhat frivolous of me by most in the ML community. Not because they're jerks or anything, but because that's just how NNs work.

2

[deleted] t1_j02b3bj wrote

[deleted]

2

jakderrida t1_j02bzq2 wrote

>It must be a pretty hard problem.

Not particularly. The only hurdle is the database. I collected all the Seeking Alpha articles and tags very easily before organizing the data and building the model to astonishing success on Colab.

An alternative would be to find literature from great writers (James Joyce, Emily Brontë, etc.), divide it into paragraphs as texts, remove paragraphs that are too small, and tag those paragraphs as 1; then take awful writing (Twilight, Ann Coulter, Mein Kampf, etc.), do the same with them tagged as 0, and train the model to separate the two.

2

[deleted] t1_j04ahtg wrote

[deleted]

2

Phoneaccount25732 t1_j06kihn wrote

With a background in OR and fluid dynamics, once you get going you should check out Kidger's work on Neural Differential Equations.

1

Comfortable_End5976 t1_j08rtd1 wrote

Thank you, I have been looking into them, PINNs, and sciML in general. A fair bit of it is beyond me at the moment which is why I need to catch up on the fundamentals a bit first :)

1

kasperonline t1_j0h1kdw wrote

I'm doing a regression and scaling my data using the MinMaxScaler in sklearn. I want to find a way to scale the regression coefficients back so I can interpret them in the context of the original data values.

The inverse_transform function only works on the data itself. Does anybody have any idea how to do such a thing?
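Edit: for a plain linear model, this seems to work (a sketch with toy data, assuming sklearn's default (0, 1) feature range). Since the scaler applies X_scaled = X * scale_ + min_, a coefficient on a scaled feature maps back to coef * scale_ on the original feature:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.random((100, 3)) * [10, 100, 1000]        # toy features on different scales
y = X @ [1.0, 0.5, 0.1] + rng.normal(size=100)

scaler = MinMaxScaler()
model = LinearRegression().fit(scaler.fit_transform(X), y)

# MinMaxScaler stores scale_ and min_ such that X_scaled = X * scale_ + min_,
# so the coefficients in original units are simply coef * scale_
coef_original = model.coef_ * scaler.scale_
intercept_original = model.intercept_ + model.coef_ @ scaler.min_
print(coef_original, intercept_original)
```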

2

Unique_Enthusiasm_ t1_iywfdkr wrote

If I have the monthly electricity consumption data for the last 18 months and I want to predict the electricity consumption for the next month, which ML model should I use?

1

ForceBru t1_iyxs8wm wrote

You should probably start with basic time-series models like ARIMA, its seasonal version (seasonality should be particularly important for electricity forecasting), and maybe exponential smoothing.

When looking at research about time-series forecasting, I somewhat often stumble upon claims that these basic methods perform well for electricity forecasting. I can't cite any particular papers since electricity forecasting is not my area of research, but I do feel like these methods are often discussed in the context of electricity forecasting specifically. I'm not sure whether this is a general trend though.

Anyway, in time-series analysis, it's often beneficial to try the traditional models first and only then reach for machine learning. Looks like ARIMA-like models perform fairly well in many cases, so there may be no need for any complicated ML.
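For example, a seasonal ARIMA fit with statsmodels might look roughly like this (a sketch with made-up monthly numbers; the orders are placeholders you would normally pick via AIC or a tool like pmdarima):

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# consumption: 18 monthly values (hypothetical data)
consumption = pd.Series(
    [310, 295, 280, 260, 255, 270, 300, 320, 315, 290, 285, 300,
     325, 310, 295, 270, 265, 280],
    index=pd.date_range("2021-07-01", periods=18, freq="MS"),
)

# Seasonal ARIMA with a 12-month cycle; (p, d, q) and seasonal orders
# here are placeholders, not tuned values.
model = SARIMAX(consumption, order=(1, 1, 1), seasonal_order=(1, 0, 0, 12))
fit = model.fit(disp=False)
print(fit.forecast(steps=1))  # next month's predicted consumption
```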

3

darthjeio t1_iywoo5p wrote

I'm working on images (let's say object detection), but the information is somewhat sparse (let's say detecting a white line on a noisy dark background). What would be a good model for this task in order to save computational time/resources? CNN-based SOTA models seem a bit overkill even though I'm sure they would work. I was thinking about masking or transformers... any ideas?

1

zenmandala t1_iyy3jr5 wrote

I've had success with SqueezeNet for finding the origin of white circles in extremely noisy images, so maybe you could use that. Just change the last convolution in the classifier to match the desired dimensions of your output.

I was able to CPU-train a solution that way. It's actually my go-to for tasks like that because it seems to just do better than some larger, newer networks at that sort of thing.
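A minimal sketch of that change with a recent torchvision (the 2-dimensional output is just an assumed regression target, e.g. a circle's (x, y) origin):

```python
import torch.nn as nn
from torchvision.models import squeezenet1_1

model = squeezenet1_1(weights=None)
# classifier is Dropout -> Conv2d(512, 1000, 1) -> ReLU -> AdaptiveAvgPool2d;
# swapping the 1x1 conv changes the dimensionality of the pooled output
model.classifier[1] = nn.Conv2d(512, 2, kernel_size=1)
model.num_classes = 2  # keeps the final reshape in forward() consistent
```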

2

_PYRO42_ t1_iyxivdk wrote

I want to create a new type of neural network, but it might be nothing new. I struggle to find anything about it on Google Scholar. I am missing the nomenclature associated with such a technique.

I want to create a neural network with conditional execution. Instead of executing every neuron, layer by layer, I wish to build a system where the network can NOT execute a neuron and any subsequent paths after it. By not executing, I mean no CPU cycles, no computation, no electricity, and no power consumed.

This non-execution of code is conditional. Example: IF A>0.5 THEN execute LEFT neuron ELSE execute RIGHT neuron

Do such systems already exist? What do we call them? I need a name to search for it! :)
Thank you for your help!

1

HandSchuhbacca t1_iyxoq9r wrote

Maybe have a look at mixture of experts? That is a popular method where different blocks are executed conditionally.
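A toy sketch of the idea (hard routing shown for clarity; real MoE layers such as Switch Transformers use differentiable softmax/top-k gating so the router can be trained):

```python
import torch
import torch.nn as nn

class HardMoE(nn.Module):
    """Toy mixture-of-experts layer: a gate picks ONE expert per sample,
    so the other experts' computation can in principle be skipped."""
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):
        choice = self.gate(x).argmax(dim=-1)        # hard routing decision per sample
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():                          # only run experts that were selected
                out[mask] = expert(x[mask])
        return out

print(HardMoE(dim=8)(torch.randn(4, 8)).shape)      # torch.Size([4, 8])
```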

2

_PYRO42_ t1_izml827 wrote

Oh lord, that's not a bad one. I love it!
I will use the GPU while retaining recursion and conditionality. Blocks of GPU-processable neurons, linked with particular conditional/recursive neurons.

1

Superschlenz t1_iyy2jx0 wrote

Normally, compute is saved by pruning away slow-changing weights which are close to zero.

And you seem to want to prune away fast-changing activations.

Don't the machine learning libraries have a dropout mechanism where you can zero out activations with a binary mask? I don't know. You would have to compute the forward activations for the first layer, then compare the activations with a threshold to set the mask bits, then activate the dropout mask for that layer before computing the next layer's activations. Sounds like a lot of overhead instead of a saving.
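Roughly like this (a sketch) -- the mask zeroes activations, but the next layer's matrix multiply still runs over them at full size:

```python
import torch

x = torch.randn(8, 32)
w1, w2 = torch.randn(32, 64), torch.randn(64, 16)

h = torch.relu(x @ w1)       # first layer's forward activations
mask = (h > 0.5).float()     # threshold decides which units "fire"
h = h * mask                 # zeroed out, but the tensor keeps its full shape...
out = h @ w2                 # ...so the next layer does the same amount of work
```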

Edit: You may also manually force the activations to zero if they are low. The hardware has built-in energy saving circuitry that skips multiplications by zero, maybe by 1 and additions of zero as well. But it still needs to move the data around.

1

_PYRO42_ t1_izmknte wrote

I have an intuition: larger models are successful not because of the amount of computation they can take advantage of, but because of the amount of knowledge they can encode. I want to try an ultra-large, ultra-deep neural network with gigabytes of neurons that would consume no more than 50 watts of power. The human brain uses 20 watts; I feel we are making a mistake when we start poking into the 100-200 W of power on a single network. I want to control machines, not generate pieces of art. I want Factorio not to be a game but a reality of ours.

I will bring edge computing to this world. I will make it a thing you can wear not on your skin but as your skin.

My brother, come join me. In battle, we are stronger.

1

_PYRO42_ t1_iyyyqw6 wrote

That's about what I was looking for: Liu, Lanlan and Deng, Jia. "Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution." Proceedings of the AAAI Conference on Artificial Intelligence, 2018.

Problem: control nodes prevent the direct application of back-propagation to learn. I have an idea of how we could solve that... >:) A way to remove control nodes while still retaining the concept of control.

I only need to add recursion. A truly Turing-complete NN, with billions of neurons but a small execution path. Encoding knowledge, but using it only when needed!

1

Brudaks t1_iz04r4f wrote

Thing is, it's generally more compute-efficient to do the exact opposite and replace conditional execution with something that always does the exact same operations in parallel but just multiplies them by zero (or something like that) if they're not needed. Parallelism and vectorization are how we get effective execution nowadays.

1

zenmandala t1_iyy3o5i wrote

What's the smallest number of parameters you've seen for MNIST? I've been golfing with myself at it and managed to get 99% validation accuracy at 2922 parameters. I'm wondering if anyone has done lower?

1

mo6phr t1_iyy9xcg wrote

Some guy got 99.1% test acc with ~1900 params link

1

zenmandala t1_iyyjcr4 wrote

Thank you, that's awesome. I'm super surprised to see it's a tuned CNN; I've been going with FCNNs. Very interesting, you've made my day.

1

HandsomeMLE t1_iyz0511 wrote

I've finished training a model, but I'm not confident about how to test it or harden it against unexpected risks in terms of trustworthiness and reliability when deployed. Are there any rules of thumb or recommended methods to thoroughly test a model against those unseen risks?

1

trnka t1_iz0722k wrote

If possible, find some beta testers. If you're in industry try to find some non-technical folks internally. Don't tell them how to use it, just observe. That will often uncover types of inputs you might not have tested, and can become test cases.

Also, look into monitoring in production. Much like regular software engineering, it's hard to prevent all defects. But some defects are easy to observe by monitoring, like changes in the types of inputs you're seeing over time.

If you're relationship-oriented, definitely make friends with users if possible or people that study user feedback and data, so that they pass feedback along more readily.

1

HandsomeMLE t1_iz45m9t wrote

Many thanks for your answer! I'll definitely do that. I'm also wondering if there are some kinds of tools, services, or even methodologies that help pre-screen potential model defects or catch unexpected reliability issues the model might have, so I can improve the model's quality and accuracy with various methods.

1

trnka t1_iz4nfux wrote

Depends on the kind of model. Some examples:

  • For classification, a confusion matrix is a great way to find issues (a minimal sketch follows this list)
  • For images of people, there's a good amount of work to detect and correct racial bias (probably there are tools to help too)
  • It can be helpful to use explainability tools like lime or shap -- sometimes that will help you figure out that the model is sensitive to some unimportant inputs and not sensitive enough to important features
  • Just reviewing errors or poor predictions on held-out data will help you spot some issues.
  • For time-series, even just looking at graphs of predictions vs actuals on held-out data can help you discover issues
  • For text input, plot metrics vs text length to see if it does much worse with short texts or long texts
  • For text input, you can try typos or different capitalization. If it's a language with accents, try inputs that don't have proper accents
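For the confusion matrix point above, a minimal sketch with scikit-learn on toy data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Rows = true class, columns = predicted class: the off-diagonal cells show
# which classes the model confuses, a quick way to spot systematic errors.
print(confusion_matrix(y_te, clf.predict(X_te)))
print(classification_report(y_te, clf.predict(X_te)))
```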

I wish I had some tool or service recommendations. I'm sure they exist, but the methods to use are generally specific to the input type of the model (text, image, tabular, time-series) and/or the output of the model (classification, regression, etc). I haven't seen a single tool or service that works for everything.

For hyperparameter tuning, even libraries like scikit-learn are great for running it. At my last job I wrote some code to run statistical tests assuming that each hyperparameter affected the metric independently, and that helped a ton; then I did various correlation plots. Generally it's good to check that you haven't made any big mistakes with hyperparameters (like if the best value is the min or max of the ones you tried, you can probably try a wider range).

Some of the other issues that come to mind in deployment:

  • We had one pytorch model that would occasionally have a latency spike (like <0.1% of the time). We never figured out why, except that the profiler said it was in happening inside of pytorch.
  • We had some issues with unicode input -- the upstream service was sending us latin-1 but we thought it was utf8. We'd tested Chinese input and it didn't crash because the upstream just dropped those chars, but then crashed with Spanish input
  • At one point the model was using like 99% of the memory of the instance, and there must've been a memory leak somewhere cause after 1-3 weeks it'd reboot. It was easy enough to increase memory though
  • One time we had an issue where someone checked in a model different than the evaluation report
1

HandsomeMLE t1_iz95iqy wrote

Thank you very much for your detailed explanation, trnka. It's been really helpful! It seems inevitable to have lots of unexplained issues in the process and I guess we can't expect to be perfect all at once :)

How would you weigh the importance of validating/testing a model? (maybe it depends on sector/industry?) As a beginner, I hope I'm not putting too much time and effort into it than I should be.

1

trnka t1_iz9j30k wrote

It definitely depends on sector/industry and also the use case for the model. For example, if you're building a machine learning model that might influence medical decisions, you should put more time into validating it before anyone uses it. And even then, you need to proceed very cautiously in rolling it out and be ready to roll it back.

If it's a model for a small-scale shopping recommendation system, the harm from launching a bad model is probably much lower, especially if you can revert a bad model quickly.

To answer the question about the importance of validating, importance is relative to all the other things you could be doing. It's also about dealing with the unknown -- you don't really know if additional effort in validation will uncover any new issues. I generally like to list out all the different risks of the model, feature, and product. And try to guesstimate the amount of risk to the user, the business, the team, and myself. And then I list out a few things I could do to reduce risk in those areas, then pick work that I think is a good cost-benefit tradeoff.

There's also a spectrum regarding how much people plan ahead in projects:

  • Planning-heavy: Spend months anticipating every possible failure that could happen. Sometimes people call this waterfall.
  • Planning-light: Just ship something, see what the feedback is, and fix it. The emphasis here is on a rapid iteration cycle from feedback rather than planning ahead. Sometimes people call this agile, sometimes people say "move fast and break things"

Planning-heavy workflows often waste time on unneeded things, and fail to fix user feedback quickly. Planning-light workflows often make mistakes on their first version that were knowable, and can sometimes permanently lose user trust. I tend to lean planning-light, but there is definite value in doing some planning upfront so long as it's aligned with the users and the business.

In your case, it's a spectrum of how much you test ahead of time vs monitor. Depending on your industry, you can save effort by doing a little of both rather than a lot of either.

I can't really tell you whether you're spending too much time in validation or too little, but hopefully this helps give you some ideas of how you can answer that question for yourself.

2

HandsomeMLE t1_izdtpzc wrote

After all, I take it all depends on what kind of model we're working on, how much we weigh the importance and likelihood of possible risks associated with it, and how to act and measure accordingly.

Thank you very much for your thoughtful input. It's been really helpful!

1

gkamer8 t1_iz0ad6v wrote

I've been trying to train a transformer from scratch on a couple of books in hopes that it can give me English-ish text, even if it's overfitting. The model is getting stuck just outputting the most likely token as "space", the second most likely as "comma", the third "and", and so on. That's for every token. Has anyone run into similar issues, or can you help me brainstorm some problems? Some things I've checked/tried so far:

  • The model can learn a toy problem where sequences are either abc or def - first token is a/b 50%, rest of tokens are 99% correct because they can tell if the first token was a or d. So the model is not completely broken
  • Warmup / long warmup. I used the learning rate formula in vaswani et al. Just tried it last night with a much longer warmup with learning rates multiplied by 0.01, no dice.
  • layer norm epsilon - added one for numerical stability
  • input/output embeddings use shared weights, input embeddings are multiplied 1/sqrt(dmodel)
  • using label smoothing = .1 on my cross entropy loss
  • increased the batch size by accumulating gradients, so every batch had about 20k tokens
  • ran overnight in hopes that it would break out of the local minimum, didn’t
  • using the Adam optimizer

Some other details-

  • using the GPT 2 tokenizer
  • sequence length of 64
  • batches of size 200
  • model is made completely from scratch, so no PyTorch or hugging face libraries
  • the model has the same parameters as “base” in vaswani et al

Any suggestions would be appreciated

1

Brudaks t1_iz4av37 wrote

My intuitive understanding is that transformers are so "powerful"/assumption-free that they are quite data-hungry and need far more than a couple of books to learn the structure.

If all you have is a couple of books, then IMHO a small RNN would give better results than a transformer (but still bad - "all the works of Shakespeare" seems to be a reasonable minimum to get decent results), and the inflection point where the transformer architecture starts to shine is at much larger quantities of training data.

If you do want to do exactly that (and with overfitting), try starting from a sequence length of, say, 4 or 8.

2

gkamer8 t1_iz55n2z wrote

Thanks- since writing this, I got past that particular minimum with better initialization and a modified arch, but it still isn’t generating terribly interesting text. I upped the dataset to about 10 books. I think I’ll download a proper large dataset to see if it can do any better. Thanks!

1

silverjoda t1_iz19r28 wrote

What is multi-objective optimization about? In the end, the weights of the individual objectives are a designer specification, and any multi-objective optimization can be formulated as a single-objective optimization that is the weighted sum of all the objectives.

1

Wahajs t1_iz1oxuh wrote

We have multiple employment laws across the region and get constant queries as part of our business. Laws differ from region to region and industry to industry. Is there a way to train a machine/bot on them that shares answers and, for complex queries, points the user to a human?

I want to build something that can pick up items from the law and respond with references to the law. Happy to invest my team's time to train it, but I need a starting point as I am not from the industry.

1

zenmandala t1_iz3n8kx wrote

That's a domain specific chatbot. There are a bit too many factors in how your current data for answers is stored to be specific. I would look at various approaches to domain specific chatbots and then see which one is most applicable for you. This paper might be a starting point: https://arxiv.org/ftp/arxiv/papers/2001/2001.00100.pdf

One piece of advice I would personally give is read a lot before starting such a project. Better to have a clear plan than try to establish as you go.

1

csreid t1_izbhqi8 wrote

You might be able to start with Rasa, which is an open source chatbot framework.

1

augustintherome t1_iz3drnz wrote

I am trying to build a product that would integrate a user's many data sources like Notion, email, notes, company chats, Jira, Linear, etc., after which they would be able to ask natural questions like "When do I need to work out today?" and get something like "Today's workout 19:00 (link to original document)".

What direction should I follow with this idea (e.g. semantic search, text embeddings)?

1

csreid t1_izbhhqs wrote

What you're describing is just called "question answering" in NLP afaik. A language model will take in a source document and a question and spit out either a generated answer to the question or a section of the source text containing the answer.

Check some of the QA models on huggingface to get an idea if you're not already familiar
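A tiny sketch of what that looks like with the transformers library (the checkpoint is just a common extractive-QA default; any similar model works):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = "Gym session booked for 19:00 today, focus on squats and deadlifts."
print(qa(question="When do I need to work out today?", context=context))
# -> {'score': ..., 'start': ..., 'end': ..., 'answer': '19:00'}
```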

1

Old_Stick_9560 t1_iz3yo2l wrote

So I have a dataset of about 1M rows with 25 attributes. I wanted to know how I can either segment the data or take 1% or 10% of the dataset. How do I tell which approach to go with? The target is a simple 3-state classification and the dataset mostly contains numerical data.

1

jrhabana t1_iz4d85e wrote

About text generation: can someone share experiences with BLOOM, PaLM, and models other than GPT-3?

I'm trying to build a niche text generator in Spanish using open-source models, but the internet conversation is dominated by GPT-3, with few articles about prompt engineering or model comparisons.

1

still_tyler t1_iz587yc wrote

What's the best way to go about a multiclass classification problem in which 3 of the features are x, y, z coordinates, with each row having only one location per outcome?

I'd like to take advantage within the model of the idea that there is spatial correlation in the outcomes (e.g. one record close to another in x, y, z will likely have a similar outcome). The spatial components make me want to use a CNN, but each input being just a 1x3 vector rather than something bigger makes me think that's not possible?

(FWIW, xgboost has the best predictive accuracy. I tried a Gaussian process too, but XGB still beat it. I was thinking there might be an NN approach, but Google has not been fruitful.)

1

trnka t1_iz9ol1s wrote

> one record close to another in x, y, z will likely have a similar outcome

That sounds a lot like k-nearest neighbors, or SVM with RBF kernel. Might be worth giving those a shot. That said, xgboost is effective on a wide range of problems so I wouldn't be surprised if it's tough to beat. Under the hood I'm sure it's learning approximated bounding boxes for your classes.

I haven't heard of CNNs being used for this kind of problem. I've more seen CNNs for spatial processing when the data is represented differently, for example if each input were a 3d shape represented by a 3d tensor rather than coordinates.

2

still_tyler t1_iz9prl7 wrote

Yeah, XGB still outperforms knn and svm here. There's a bunch of other non-coordinate covariates that contribute and XGB just kicks butt in this case. Fair enough, thanks for the response!

1

csreid t1_izbgnyy wrote

> The spatial components make me want to use a CNN, but each input being just a 1x3 vector rather than something bigger makes me think that's not possible?

The point of the convolution is to efficiently capture information from surrounding pixels when considering a single pixel. Back in the pre-DL olden days, computer vision stuff still involved convolutions, they were just handcrafted -- we had a lot of signal processing machinery we could use to eg detect edges and such. In your case, you don't really have anything to convolve over.

You could try just feeding the coordinates into an MLP with the other covariates and it should be able to capture that spatial component.

1

Accomplished-Bill-45 t1_iz7dgwm wrote

Are the current state-of-the-art models for logical/common-sense reasoning all based on NLP (LLMs)?

I'm not very familiar with NLP, but I'm playing around with OpenAI's ChatGPT and am particularly impressed by its reasoning and its thought process. Are all good reasoning models derived from NLP (LLM) models with RL training methods at the moment? What are some papers/research teams to read/follow to understand this area better and stay updated?

1

Nameless1995 t1_izj36ff wrote

I think in principle, if you have enough resources and investigate the right fine-tuning techniques, you can get SOTA out of them. However, at the moment it's quite new, and moreover not as open-access for research. Furthermore, RL training is not easy for a random researcher to do (because it's kind of a human-in-the-loop framework and you require human annotations --- you can probably do it with AWS and such, but it probably won't easily become a standard too soon because of the inconvenience).

Another thing is that ELLMs (let's say "extra-large language models", to distinguish GPT-3+-style models from BERT/BART/RoBERTa/GPT-2-style models) are generally used in a few-shot or instruction-following setup, and probably won't fit exactly with the "fine-tuning on the whole dataset" setup. And it can again be hard for random researchers to fine-tune or even run those humongous models. So it may again take time to seep in everywhere.

In my investigation ChatGPT seems to still struggle a bit on some harder logical challenges (some of which even I struggled a bit with) eg. in LogiQA: https://docs.google.com/document/d/1PATTi0hmalBvY_YQFr4gQrjDqfnEUm8ZDkG20J6U1aQ/edit?usp=sharing

(although you can probably improve upon by more specialized RL training for logical reasoning + multiple reasoning path generation + self-consistency checking + least-to-most prompting etc.)

I think SOTA of logiQA is: https://aclanthology.org/2022.findings-acl.276/ (you can find relevant papers by looking at the citation network in semantic scholar)

For reasoning on other areas, you can probably use the chain of thought papers and its related citations to keep track (because COT is almost a landmark in prompt engineering for enhanced reasoning, and most future ELLM paper working on reasoning would probably cite it).

Don't know much about common-sense reasoning (either as a human or in terms of research in that area).

1

mrpacetv t1_iz98n4y wrote

What would be a suitable network to train to predict the size of grains in an image?
Inputs will be images (64x64, or higher resolution like 256x256) and the output should be a float (the size of particles in the image). I prepared a dataset using Voronoi cells.
I looked into the digit-recognition problem, but that seemed to be a classification problem (10 digit categories), so the examples used an MLPClassifier or a CNN (categorical loss and softmax in the final layer).
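What I'm imagining (a sketch, not sure if it's right) is the same kind of CNN, but with a single linear output unit and MSE loss instead of softmax:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),            # linear output: predicted particle size
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])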

1

teenaxta t1_iz9t4k2 wrote

How much of an improvement is an RTX 3090 over an RTX 3080 10GB for deep learning? I will be working mostly with ResNet-50 or something like that.

1

_RootUser_ t1_izemqro wrote

I am trying to make a project for my college that incorporates elements of AI, but I want to understand enough about the concepts/formulae behind the ML/DL libraries and functions. I have found statistical learning to be too vast for me, yet I cannot seem to grasp the reasoning behind the choice of libraries, functions, and layers in model creation.
What am I missing? How do I approach this problem? I am trying to recreate certain functions and models from scratch with numpy, but I fail at it. Any advice or suggestions for me? I have about 7-10 days where I can give more than 12 hours every day to this. How should I approach it?

1

I-am_Sleepy t1_izr913e wrote

Your description is a bit vague, but if it is a regression, try a linear model using least squares with polynomial features.
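A minimal sketch of that in scikit-learn (toy quadratic data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.2, size=100)

# least-squares linear model fit on polynomial features of the input
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))
```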

1

Readityesterday2 t1_izg4iim wrote

What are good desktop alternatives to ChatGPT that you can train on your own for creating text?

What are good ones for image generation?

Thanks!

1

Nameless1995 t1_iziy3gn wrote

What do you mean by "desktop alternatives"? You mean something you can train on a single GPU or two? I don't think you will get any real alternative for that unless you lower your expectations by a lot. But for more open-source stuff you can check https://www.eleuther.ai/ GPT-style models, and others like BlenderBot, BLOOM, etc. For image generation, probably Stable Diffusion or something.

1

mayermensch69 t1_izk7m7b wrote

I came across this approach of dialog evaluation: https://github.com/Shikib/fed

What I don't understand is, how the (more or less) raw loss can be used as a metric, since it is not really bounded. It may work when directly comparing specific examples with this method, but how does one compare these scores to other metrics with a fixed scale?

1

Subject-Resort5893 t1_iznkd7x wrote

Can you do machine learning in SQL? Or do you have to have python/R?

1

BlueSubaruCrew t1_izr01bd wrote

SQL is mostly used to query data from databases (which is important for a lot of machine learning projects). The actual machine learning stuff is usually implemented in Python/R.

2

Duckdog2022 t1_izvt3xa wrote

Can you do it? Probably, yes. But that'll be very ugly and inefficient if you can even make it work.
So better use something that's designed for it like Python or R.

2

ollih12 t1_izre2kw wrote

What is the best approach for text generation?

For context: I'm trying to generate episode synopses of a show by training a model on existing episode titles and synopses, and using an input title as the prompt for the generated episode. I've read that LSTM models are good for this since they maintain context. I have also read that GPT-3 is the best for this, but it's not free. This is just a personal project, and I intend on using PyTorch if it's of any significance. Currently I have scraped synopses and titles of existing episodes and have them stored in a pandas dataframe, so I'm just not sure where to go from here.

1

pythoslabs t1_izxk8im wrote

>also read that GPT-3 is the best for this but it's not free.

Try ChatGPT (https://chat.openai.com/). It's a free pre-beta release, so you can try your hand at it.

Also be careful that it might not be 100% factually accurate. But for trying out simple text generation, it should do the job pretty well.

2

ollih12 t1_izzhdj9 wrote

Can ChatGPT be fine tuned for what I described?

1

BrightCounter738 t1_izzwtbu wrote

It is not open-sourced (and one probably wouldn’t be able to run it personally even if it was), so no.

1

ollih12 t1_j00bp15 wrote

Would the GPT-2 model from the transformers package be ok for it?

1

pythoslabs t1_j00ffog wrote

Yes.

You have to train on their system with your custom data. It is costly though.

e.g., fine-tuning the Davinci model will cost you $0.0300 / 1K tokens for training (fine-tuning) and $0.1200 / 1K tokens for usage, if you wish to use it as an API endpoint.
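If you go the free GPT-2 route from the transformers package instead, a minimal generation sketch (with a made-up title; fine-tuning on your scraped synopses would be a further step) looks like:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
title = "The One Where the Server Catches Fire"   # hypothetical episode title
prompt = f"Episode title: {title}\nSynopsis:"
print(generator(prompt, max_new_tokens=60)[0]["generated_text"])
```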

1

ollih12 t1_j00hiav wrote

Are there any free alternatives you would recommend?

1

GaseousOrchid t1_izrjinp wrote

How do Flax/Haiku (for ML with JAX) compare? I've seen people use both, but there seem to be a lot more people using Flax. I've always preferred Haiku, but I last tried Flax 2 years ago -- curious if things have changed?

1

Trustafew t1_izu4zh0 wrote

What approach would you use for modelling the proportional impact of one value on another over time? I'm curious about modelling the impact of CPI on interest rates. Any time-series ML resources you would recommend? I'm a bioinformatician/researcher and I do tons of clustering and decision-tree-based stuff with R using caret and now tidymodels, but adding the time dimension is new (scary!) to me.

1

Dramatic_Sector_6237 t1_j014rrb wrote

Is AI limited to tabular data?

I am completing an online training course to become a data scientist, and this question just popped into my mind.

Indeed, during my training I only use CSV files, although I can manipulate SQL databases in order to get those CSV files.

However, I was wondering whether it is possible to train AI models on kinds of data other than tabular data, because otherwise it seems quite limiting from my point of view. (I am barely a junior data scientist, so maybe my question is naïve...)

1

creativekinase t1_j04zk32 wrote

Hi everyone,

I'm using a TensorFlow Keras model to classify medications using infrared spectra. I'm wondering if there are resources on how many spectra I should have for each class, and whether there is a maximum number of classes I can have (if there is a maximum).

Thanks!

1

BackgroundFeeling707 t1_j05lywf wrote

Hi, how do local language model inference tools such as KoboldAI's web UI keep information? I understand you can only produce a certain number of tokens in one go.

Does it just use the last 30 tokens or so in the new batch?

Eventually I run out of memory and am unable to continue the text adventure. It shouldn't do that, right?

Are there techniques to store info?

1

BackgroundFeeling707 t1_j05mhuh wrote

In general, for Stable Diffusion, why are there often large VRAM spikes at the end of inference, and what kinds of code techniques are used to solve this problem?

1

faCt011 t1_j08c288 wrote

Hello,

I'm new to the topic. I was wondering why Microsoft refers to machine learning models as "files". Is it really something comparable to a .txt file that you can open and edit? I always imagined models as a big collection of numbers. So how are production-ready models saved and used?
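For example, is it just something like this (my guess), where the learned numbers are serialized to disk and loaded back for inference?

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "model.joblib")       # the "file": serialized weights + metadata
restored = joblib.load("model.joblib")   # loaded back for inference in production
print(restored.predict(X[:3]))
```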

Source: https://learn.microsoft.com/en-us/windows/ai/windows-ml/what-is-a-machine-learning-model (very first line)

Thanks in advance :)

1

EdenistTech t1_j097uyn wrote

Hello. I have a binary classification problem. However, instead of aiming for a high overall prediction rate on the entire training set, I would like to find subsets of features that with very high probability place a given sample in category X, and other subsets that place samples in category Y. In other words, a prediction should not be attempted if the conviction of the estimate is low. Does such an algorithm exist?

1

drewfurlong t1_j0a3zn7 wrote

Would you say you're looking for a classifier with high precision, and perhaps low recall?

2

EdenistTech t1_j0aa5is wrote

To some extent, yes. But rather than focusing on the true positives of the entire training set, I would be interested in the algorithm carving out subsets of features and values for which precision is very high - higher than the precision over the entire training set. I hope that makes sense?

1

onionhead888 t1_j0j7xza wrote

I think you're looking for random forests, which are built from binary splits.

2

drewfurlong t1_j0a3mbn wrote

Have you come across any excellent explanations for how the various attention layers work? Ideally with worked examples and graphics.

After reading the Wikipedia article on the topic, and getting 6 pages into "Attention Is All You Need", I'm thoroughly confused. It's tough to keep track of what's a key, query, and value, where the recurrent layer goes, etc.
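The kind of worked example I'm after would spell out the core scaled dot-product step, something like this (my own sketch in plain numpy): each query is compared against every key, the scores are softmaxed into weights, and the output is the weighted sum of the values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # (n_queries, d_v)

Q = np.random.randn(2, 4)   # 2 query vectors of dimension 4
K = np.random.randn(5, 4)   # 5 key vectors
V = np.random.randn(5, 8)   # 5 value vectors of dimension 8
print(scaled_dot_product_attention(Q, K, V).shape)      # (2, 8)
```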

1

Wakeme-Uplater t1_j0atqe1 wrote

What is an alternative way to optimize ads budget allocation?

As far as I know, an RCT (A/B test) is the gold standard for testing the effect of different factors. But that assumes the distribution doesn't drift over time, and it is cost-intensive.

Another method I found is Marketing Mix Modeling. But this seems to be riddled with biases and pitfalls, which stem from heavy inductive bias and from treating a correlational model as causal.

As my understanding goes, there is a way to reduce confounding effects through causal inference.

However, causal inference requires a causal graph. This can be done with causal graph discovery, but that imposes a lot of (unverifiable) assumptions on the data-generation process, so a domain-specific causal graph construction is still needed (which is not ideal).

So are there alternative methods for estimating ROI? Also, is there a stochastic process that models this problem? Links to related topics/research are also welcome.

1

rr718 t1_j0b9nst wrote

Hey. Is it true that ML algorithms need highly formatted, uniform data while neural nets can learn from even unformatted/real-world data?

1

CuriousJam t1_j0c9q1j wrote

Is there a reason why regression models tend to poorly capture local maxima and minima? My reasoning is that those points are typically underrepresented in training data, or is there another reason why it does so poorly at those points?

1

AttentionNo6483 t1_j0cc1bi wrote

Take a look at the loss function. The regression line minimizes the loss across the entire data set (which is why outliers can impact it). If you want to focus on local maxima, you would want to do piecewise regression.

2

FrankyMonkey t1_j0crrzr wrote

Hello ML community. I am not the most knowledgeable person on this topic, but I am interested in analyzing the export controls. And my question to you is: why do you think it would be difficult to limit API-based access to ML?

Any elaboration is more than welcome. Thanks.

1

BuiltLikeABagOfMilk t1_j0e2vtm wrote

How do you determine an appropriate minimum support level when doing market basket analysis?

1

honchokomodo t1_j0f156j wrote

how do i make my autoencoder use more of the latent dimensions?

1

I-am_Sleepy t1_j0mhitl wrote

For starters, look at InfoVAE (see this blog for context). Another way is to vector-quantize it (VQ-VAE-based models); since the model only needs to learn a small number of latents, it can optimize them better.

2

seacucumber3000 t1_j0fem4u wrote

When tuning hyperparameters, is learning rate (decay, scheduling, etc.) dependent on things like model size and activation function? Or can I search for the ideal model architecture first, then tune learning rate after?

1

-zharai t1_j0goty0 wrote

Are there any study groups for the fast.ai deep learning course? I'm open to rushing things a bit to catch up.

1

throwaway2676 t1_j0hvos1 wrote

What are the chances ChatGPT offers a subscription mode that is totally uncensored?

1

MaterialLogical1682 t1_j0ijykt wrote

Can anyone suggest a good book to learn about credit risk modeling in Python? Or in general a good source to start from.

Thanks

1

toothie25 t1_j0jvmec wrote

How can the performance of a chatGPT model be evaluated in a way that takes into account both the quality of its generated responses and its ability to maintain coherence in long-term conversations? One approach might be to use a metric like perplexity, which is a measure of how well a language model predicts the next word in a sequence given the words that have come before it. However, perplexity does not necessarily capture the coherence of the model's responses over multiple turns in a conversation. Another possibility might be to use a measure like the BLEU score, which compares the model's generated responses to a set of reference responses and assigns a score based on the overlap between the two. However, the BLEU score does not take into account the quality of the generated responses themselves, only their similarity to the reference responses. Is there a way to combine these two approaches, or to come up with a new metric that takes into account both the quality and coherence of the model's responses in a more holistic way?
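For concreteness, the two metrics I mean are computed roughly like this (a sketch with made-up numbers):

```python
import math
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Perplexity from the average per-token cross-entropy (natural log)
token_log_probs = [-2.3, -0.7, -1.1, -0.2]          # hypothetical model outputs
perplexity = math.exp(-sum(token_log_probs) / len(token_log_probs))

# BLEU: n-gram overlap between a generated response and reference responses
reference = [["let", "us", "meet", "at", "noon"]]
candidate = ["let", "us", "meet", "at", "twelve"]
bleu = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)

print(round(perplexity, 2), round(bleu, 3))
```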

1

Old_Pea3923 t1_j0kple3 wrote

I thought the context window of GPT-3 was 2048 or 4000 tokens, so how does ChatGPT work?

1

Old_Pea3923 t1_j0kq93b wrote

"While ChatGPT is able to remember what the user has said earlier in the conversation, there is a limit to how much information it can retain. The model is able to reference up to approximately 3000 words (or 4000 tokens) from the current conversation - any information beyond that is not stored.
Please note that ChatGPT is not able to access past conversations to inform its responses." - https://help.openai.com/en/articles/6787051-does-chatgpt-remember-what-happened-earlier-in-the-conversation

My question then is how does it do this?

1

Daminio6 t1_j0luzwr wrote

Hi, recently I was watching 3D porn doing some scientific research and... ok, it will be hard to explain. I was watching 3D porn and noticed that there are usually only one or a few animated motions, which are then looped for a long time, and this makes such porn somewhat dull. So I got an idea: what if someone used an NN to estimate the skeletons of the humans in a real porn video and then attached 3D models to those skeletons? I see that there exist NNs which can estimate a human skeleton, but I've never used them. What do you think: is it possible with currently existing models to take a porn video, estimate the skeletons of the two people in that video, and use the resulting skeletons for 3D porn? And what problems might occur if someone tried to do this?

1

Maria_Adel t1_j0nwkbm wrote

What models would you use for product assortment/getting the product range right for different stores

1

I-am_Sleepy t1_j0phzjn wrote

I'm not sure, but I think there are several ways to model product assortments.

First, demand forecasting - you predict demand for each product and act accordingly. This can usually be done with time-series forecasting.

Second, personalized taste - you assume that each customer has their own fixed preference and you model that. If you know the demographics of each customer, you would be able to estimate demand from the recommended products.

But the latter is probably going to output a static distribution, so I think you can apply demand forecasting on top of the second method to discount it correctly (I think).

However, every method needs data. If you have a cold-start product, you might want to perform basic A/B testing first to get the initial data.

1

Maria_Adel t1_j0s4l0o wrote

Thanks a lot. Data is available, so that should not be a problem. What models would you suggest for demand forecasting of each product (gradient boosting, hybrid deep learning models, or ARIMA), and what key variables would you include in the model? (I'd suspect previous sales and price.)

1

I-am_Sleepy t1_j0sh383 wrote

If you have a target variable and other input features, you can treat this problem as a normal regression problem. Using a model like linear regression, random forest regression, or XGBoost is very straightforward from there.

You can then look at feature importance to try to weed out the uncorrelated features (if you want to). There are a few automated ML tools for time series, but currently I mostly use PyCaret.
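A rough sketch of that with XGBoost (a toy frame with made-up lag/price features):

```python
import pandas as pd
from xgboost import XGBRegressor

df = pd.DataFrame({
    "prev_sales": [120, 135, 150, 160, 155, 170],
    "price":      [9.9, 9.9, 8.5, 8.5, 9.9, 8.5],
    "month":      [1, 2, 3, 4, 5, 6],
    "sales":      [130, 140, 165, 170, 150, 180],
})
model = XGBRegressor(n_estimators=200, max_depth=3)
model.fit(df[["prev_sales", "price", "month"]], df["sales"])

# inspect which inputs the model relied on
print(dict(zip(["prev_sales", "price", "month"], model.feature_importances_)))
```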

But if you suspect that your target variable autocorrelates, a model like SARIMAX can be used instead. An automated version of that is Statsforecast, e.g. AutoARIMA with exogenous variables (I haven't used it though).

But note that if you are in direct control of a few variables and you want to predict what will happen, this is no longer a simple regression, i.e. the data distribution may shift. That would be in causal inference territory (see this handbook).

1

finlaydotweber t1_j0op4rt wrote

I'm new to machine learning and starting to learn about it. I think I understand how ML is fundamentally different from traditional programming. I have also come across the 3 categories of ML: supervised, unsupervised, and reinforcement learning.

There are 2 other terms I often see thrown about that I still can't fit into my fundamental understanding of machine learning, and these are neural networks and deep learning.

What exactly is a neural network? What exactly is deep learning? How do they fit into ML? Are they other kinds of ML, or techniques used within the different subsets of ML?

1

sargentpilcher t1_j05ib9c wrote

Hello everyone.

I'm a big fan of using AUTOMATIC1111's web UI for Stable Diffusion, but I'm wondering if there's an equivalent for StyleGAN? Or do I need to essentially learn a bit of programming before I can actually play around with such an amazing AI tool?

0

Nirmalpb t1_izoj68v wrote

Suppose you have data on patients that record their height (in millimetres) and the length of their eyelashes (in millimetres). The aim is to perform clustering on this data. Explain why clustering using a distance measure such as Euclidean distance would be problematic on this data and outline how you could modify the data to address this.

−4