Submitted by AutoModerator t3_zcdcoo in MachineLearning

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting even after the date in the title.

Thanks to everyone for answering questions in the previous thread!

21

Comments

OrderOfM t1_iyya61l wrote

Does using more than one transformation on your data increase the efficiency of a machine learning model? For example, a min-max scaler combined with a centering technique.

2

pier4r t1_iyzl5ta wrote

I'm not too deep into ML, but I read articles every now and then (especially about hyped models, GPT and co.). I see that there is progress on some amazing things (like GPT-3.5), partly because the NNs get bigger and bigger.

My question is: are there studies that check whether NNs could do more (be more precise, or whatever) given the same number of parameters? In other words, is it a race to make NNs as large as possible (given that they are structured appropriately), or is the "utility" per parameter also growing? I would like to know if there is literature about this.

It is a bit like an optimization question. "Do more with the same HW" so to speak.

2

Weth1000 t1_izgeuuz wrote

I am an industrial engineer who completed Andrew Ng's course. I have very large industrial continuous ovens that I want to optimize when they are in upset conditions, meaning they have spots of blank product. I am thinking a neural network may work, but I am not sure. I would rather use linear regression, but I think that is better suited to steady state. How do I get help on how best to tackle this problem?

2

Ricenaros t1_izkmdxd wrote

I'm trying to understand concepts involving feature engineering and correlation, because I feel like I'm encountering conflicting ideas about these two points. On the one hand, we can generate new features by combining our existing features, for example multiplying feature 1 by feature 2. This is said to improve ML models in some cases.

On the other hand, I have read that a desirable property of our input/output data is predictors being highly correlated with the target variable, but not correlated with other predictors. This idea seems to conflict with feature engineering, as our newly derived features can be correlated with the features they were constructed from. Am I missing something here?

2

I-am_Sleepy t1_izr846m wrote

I am not sure why your derived features need to be uncorrelated with the other predictors. If the tasks are correlated, then their features should be correlated too, e.g. panoptic segmentation and depth estimation.

For feature de-correlation there are some techniques you can apply. For example, in DL there is orthogonal regularization (enforcing feature dot products to be 0); see also this blog post.
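A rough sketch of a decorrelation penalty in PyTorch (a toy example, pushing the off-diagonal entries of the feature Gram matrix toward zero):

```python
import torch

def decorrelation_penalty(features: torch.Tensor, weight: float = 1e-2) -> torch.Tensor:
    """Penalize off-diagonal entries of the feature Gram matrix so that
    different feature dimensions stay (approximately) uncorrelated."""
    gram = features.T @ features / features.shape[0]   # (dim, dim)
    off_diag = gram - torch.diag(torch.diag(gram))
    return weight * (off_diag ** 2).sum()

# usage: loss = task_loss + decorrelation_penalty(hidden_activations)
```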

1

mymar101 t1_izkz1vl wrote

I'm looking for simple ideas for practical projects to incorporate machine learning into. I'm also looking for something that a solo beginner could do using libraries like scikit-learn. Any ideas? I'm not interested in simply predicting things; I'd like a practical application to fit it into.

2

pythoslabs t1_j00ge8i wrote

Here are some ideas -

- collection of news and finding the impact of news on stock prices (NLP / time series)

- put a camera in front of your street and predict daily traffic volume (computer vision + prediction)

- predict the winners of the next UFC fight / NFL championship

Basically build a system on events that are currently happening / yet to happen in the near future and evaluate your results against the real outcomes.

If you want to do the whole end-to-end project here are the things you have to do -

Try the whole pipeline (a rough sketch follows this list) - starting from

  • data collection
  • cleaning the data ( build rules)
  • building the feature list
  • creating your analytical dataset
  • the complete model creation step
  • prediction
  • evaluation & interpretation of model result
  • deploy to production
  • evaluate model drift
  • model refresh
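A rough sketch of a few of these steps with scikit-learn (toy data and placeholder names, not a full project):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# data collection + cleaning would normally happen upstream; here a toy frame
df = pd.DataFrame({
    "temperature": [20.1, 22.5, 19.8, 25.0, 21.3, 23.9] * 10,
    "weekday": ["mon", "tue", "wed", "thu", "fri", "sat"] * 10,
    "label": [0, 1, 0, 1, 0, 1] * 10,
})

# feature list / analytical dataset: scale numeric columns, one-hot encode categoricals
features = ColumnTransformer([
    ("num", StandardScaler(), ["temperature"]),
    ("cat", OneHotEncoder(), ["weekday"]),
])
pipeline = Pipeline([("features", features), ("model", GradientBoostingClassifier())])

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="label"), df["label"], random_state=0
)
pipeline.fit(X_train, y_train)                                   # model creation
print(classification_report(y_test, pipeline.predict(X_test)))   # evaluation
```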
1

SufficientStautistic t1_izya9wk wrote

What does Gluon offer? How does it compare to TensorFlow and PyTorch?

2

[deleted] t1_j01ifv6 wrote

[deleted]

2

jakderrida t1_j025iqs wrote

Problem with that is that using engagement or clicks will just give you an inferior version of Facebook's formula for turning retirees into conspiracy theorists.

On the other hand, I think you could make one, perhaps by scraping the abstracts of published research and differentiating between those that later received extraordinary numbers of citations and those that didn't. I actually ran NLP models against Seeking Alpha's author-tagged articles (tagged Bullish or Bearish on the stocks each article pertained to), and while I started out just expecting to beat a coin toss, the results surged to over 90% accuracy.

1

[deleted] t1_j028rzq wrote

[deleted]

2

jakderrida t1_j02apso wrote

Well, for one, flipping the script already occurs. When I was an electrician, a manager overheard me claim that a device measures resistance in the circuit. He proclaimed it measures continuity of the charge going through it. I repeatedly told him that it's the same thing with no success.

If it measures whether a paper has many citations, the complement of the probability it gives is the probability that the paper has few citations.

Now if what you're looking for is something like short stories, the hurdle to cross would be to find pretagged data that you would consider a reliable measure of "interesting/engaging" to be converted into mutually exclusive dummy variables for the NLP tool to train for. The reason I mentioned published research and citations is only because it's massive, well-defined, and feasible to collect metrics with associated texts.

Just to ensure you don't waste your time with any dreams of building the database without outside sources, I want you to realize that the thing about deep learning/neural network technologies is that they tend to produce terrible results unless the training data is pretty massive. Even the 50,000 tagged articles I used from Seeking Alpha would be considered somewhat frivolous of me by most in the ML community. Not because they're jerks or anything, but because that's just how NNs work.

2

[deleted] t1_j02b3bj wrote

[deleted]

2

jakderrida t1_j02bzq2 wrote

>It must be a pretty hard problem.

Not particularly. The only hurdle is the database. I collected all the Seeking Alpha articles and tags very easily before organizing the data and building the model to astonishing success on Colab.

An alternative would be to find literature from great writers (James Joyce, Emily Brontë, etc.), divide it into paragraphs as texts, remove paragraphs that are too small, and tag those paragraphs as 1; then take awful writing (Twilight, Ann Coulter, Mein Kampf, etc.), do the same with them tagged as 0, and train the model to separate the two.

2

[deleted] t1_j04ahtg wrote

[deleted]

2

Phoneaccount25732 t1_j06kihn wrote

With a background in OR and fluid dynamics, once you get going you should check out Kidger's work on Neural Differential Equations.

1

Comfortable_End5976 t1_j08rtd1 wrote

Thank you, I have been looking into them, PINNs, and sciML in general. A fair bit of it is beyond me at the moment which is why I need to catch up on the fundamentals a bit first :)

1

kasperonline t1_j0h1kdw wrote

I'm doing a regression and scaling my data using the MinMaxScaler in sklearn. I want to find a way to scale the regression coefficients back so I can interpret them in the context of the original data values.

The inverse_transform function only works on the data itself. Does anybody have any idea how to do such a thing?
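Edit: for a plain linear model, this seems to work (a sketch with toy data, assuming sklearn's default (0, 1) feature range). Since the scaler applies X_scaled = X * scale_ + min_, a coefficient on a scaled feature maps back to coef * scale_ on the original feature:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.random((100, 3)) * [10, 100, 1000]        # toy features on different scales
y = X @ [1.0, 0.5, 0.1] + rng.normal(size=100)

scaler = MinMaxScaler()
model = LinearRegression().fit(scaler.fit_transform(X), y)

# MinMaxScaler stores scale_ and min_ such that X_scaled = X * scale_ + min_,
# so the coefficients in original units are simply coef * scale_
coef_original = model.coef_ * scaler.scale_
intercept_original = model.intercept_ + model.coef_ @ scaler.min_
print(coef_original, intercept_original)
```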

2

Unique_Enthusiasm_ t1_iywfdkr wrote

If I have the monthly electricity consumption data for the last 18 months and I want to predict the electricity consumption for the next month, which ML model should I use?

1

ForceBru t1_iyxs8wm wrote

You should probably start with basic time-series models like ARIMA, its seasonal version (seasonality should be particularly important for electricity forecasting), and maybe exponential smoothing.

When looking at research about time-series forecasting, I somewhat often stumble upon claims that these basic methods perform well for electricity forecasting. I can't cite any particular papers since electricity forecasting is not my area of research, but I do feel like these methods are often discussed in the context of electricity forecasting specifically. I'm not sure whether this is a general trend though.

Anyway, in time-series analysis, it's often beneficial to try the traditional models first and only then reach for machine learning. Looks like ARIMA-like models perform fairly well in many cases, so there may be no need for any complicated ML.
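For example, a seasonal ARIMA fit with statsmodels might look roughly like this (a sketch with made-up monthly numbers; the orders are placeholders you would normally pick via AIC or a tool like pmdarima):

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# consumption: 18 monthly values (hypothetical data)
consumption = pd.Series(
    [310, 295, 280, 260, 255, 270, 300, 320, 315, 290, 285, 300,
     325, 310, 295, 270, 265, 280],
    index=pd.date_range("2021-07-01", periods=18, freq="MS"),
)

# Seasonal ARIMA with a 12-month cycle; (p, d, q) and seasonal orders
# here are placeholders, not tuned values.
model = SARIMAX(consumption, order=(1, 1, 1), seasonal_order=(1, 0, 0, 12))
fit = model.fit(disp=False)
print(fit.forecast(steps=1))  # next month's predicted consumption
```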

3

darthjeio t1_iywoo5p wrote

I'm working on images (let's say object detection), but the information is somewhat sparse (let's say detecting a white line on a noisy dark background). What would be a good model for this task in order to save computational time/resources? CNN-based SOTA models seem a bit overkill even though I'm sure they would work. I was thinking about masking or transformers... any ideas?

1

zenmandala t1_iyy3jr5 wrote

I've had success with SqueezeNet for finding the origin of white circles in extremely noisy images, so maybe you could use that. Just change the last convolution in the classifier to match the desired dimensions of your output.

I was able to CPU-train a solution that way. It's actually my go-to for tasks like that because it seems to just do better than some larger, newer networks at that sort of thing.
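A minimal sketch of that change with a recent torchvision (the 2-dimensional output is just an assumed regression target, e.g. a circle's (x, y) origin):

```python
import torch.nn as nn
from torchvision.models import squeezenet1_1

model = squeezenet1_1(weights=None)
# classifier is Dropout -> Conv2d(512, 1000, 1) -> ReLU -> AdaptiveAvgPool2d;
# swapping the 1x1 conv changes the dimensionality of the pooled output
model.classifier[1] = nn.Conv2d(512, 2, kernel_size=1)
model.num_classes = 2  # keeps the final reshape in forward() consistent
```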

2

_PYRO42_ t1_iyxivdk wrote

I want to create a new type of neural network, but it might be nothing new. I struggle to find anything about it on Google Scholar. I am missing the nomenclature associated with such a technique.

I want to create a neural network with conditional execution. Instead of executing every neuron, layer by layer, I wish to build a system where the network can NOT execute a neuron and any subsequent paths after it. By not executing, I mean no CPU cycles, no computation, no electricity, and no power consumed.

This non-execution of code is conditional. Example: IF A>0.5 THEN execute LEFT neuron ELSE execute RIGHT neuron

Do such systems already exist? What do we call them? I need a name to search for it! :)
Thank you for your help!

1

HandSchuhbacca t1_iyxoq9r wrote

Maybe have a look at mixture of experts? That is a popular method where different blocks are executed conditionally.
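A toy sketch of the idea (hard routing shown for clarity; real MoE layers such as Switch Transformers use differentiable softmax/top-k gating so the router can be trained):

```python
import torch
import torch.nn as nn

class HardMoE(nn.Module):
    """Toy mixture-of-experts layer: a gate picks ONE expert per sample,
    so the other experts' computation can in principle be skipped."""
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):
        choice = self.gate(x).argmax(dim=-1)        # hard routing decision per sample
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():                          # only run experts that were selected
                out[mask] = expert(x[mask])
        return out

print(HardMoE(dim=8)(torch.randn(4, 8)).shape)      # torch.Size([4, 8])
```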

2

_PYRO42_ t1_izml827 wrote

Oh lord, that's not a bad one. I love it!
I will use the GPU while retaining recursion and conditionality. Blocks of GPU-processable neurons, linked with particular conditional/recursive neurons.

1

Superschlenz t1_iyy2jx0 wrote

Normally, compute is saved by pruning away slow-changing weights which are close to zero.

And you seem to want to prune away fast-changing activations.

Don't the machine learning libraries have a dropout mechanism where you can zero out activations with a binary mask? I don't know. You would have to compute the forward activations for the first layer, then compare the activations with a threshold to set the mask bits, then activate the dropout mask for that layer before computing the next layer's activations. Sounds like a lot of overhead instead of a saving.
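Roughly like this (a sketch) -- the mask zeroes activations, but the next layer's matrix multiply still runs over them at full size:

```python
import torch

x = torch.randn(8, 32)
w1, w2 = torch.randn(32, 64), torch.randn(64, 16)

h = torch.relu(x @ w1)       # first layer's forward activations
mask = (h > 0.5).float()     # threshold decides which units "fire"
h = h * mask                 # zeroed out, but the tensor keeps its full shape...
out = h @ w2                 # ...so the next layer does the same amount of work
```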

Edit: You may also manually force the activations to zero if they are low. The hardware has built-in energy saving circuitry that skips multiplications by zero, maybe by 1 and additions of zero as well. But it still needs to move the data around.

1

_PYRO42_ t1_izmknte wrote

I have an intuition: larger models are successful not because of the amount of computation they can take advantage of, but because of the amount of knowledge they can encode. I want to try an ultra-large, ultra-deep neural network with gigabytes of neurons that would consume no more than 50 watts of power. The human brain uses 20 watts; I feel we are making a mistake when we start poking into the 100-200 W of power on a single network. I want to control machines, not generate pieces of art. I want Factorio not to be a game but a reality of ours.

I will bring edge computing to this world. I will make it a thing you can wear not on your skin but as your skin.

My brother, come join me. In battle, we are stronger.

1

_PYRO42_ t1_iyyyqw6 wrote

That's about what I was looking for: Liu, Lanlan and Deng, Jia. "Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution." Proceedings of the AAAI Conference on Artificial Intelligence, 2018.

Problem: control nodes prevent the direct application of back-propagation to learn. I have an idea of how we could solve that... >:) A way to remove control nodes while still retaining the concept of control.

I only need to add recursion. A truly Turing-complete NN, with billions of neurons but a small execution path. Encoding knowledge, but using it only when needed!

1

Brudaks t1_iz04r4f wrote

Thing is, it's generally more compute-efficient to do the exact opposite and replace conditional execution with something that always does the exact same operations in parallel but just multiplies them by zero (or something like that) if they're not needed. Parallelism and vectorization are how we get effective execution nowadays.

1

zenmandala t1_iyy3o5i wrote

What's the smallest number of parameters you've seen for MNIST? I've been golfing with myself at it and managed to get 99% validation accuracy at 2922 parameters. I'm wondering if anyone has done lower?

1

mo6phr t1_iyy9xcg wrote

Some guy got 99.1% test acc with ~1900 params link

1

zenmandala t1_iyyjcr4 wrote

Thank you, that's awesome. I'm super surprised to see it's a tuned CNN; I've been going with FCNNs. Very interesting, you've made my day.

1

HandsomeMLE t1_iyz0511 wrote

I've finished training a model, but I'm not confident about how to test it or harden it against unexpected risks in terms of trustworthiness and reliability when deployed. Are there any rules of thumb or recommended methods to thoroughly test a model against those unseen risks?

1

trnka t1_iz0722k wrote

If possible, find some beta testers. If you're in industry try to find some non-technical folks internally. Don't tell them how to use it, just observe. That will often uncover types of inputs you might not have tested, and can become test cases.

Also, look into monitoring in production. Much like regular software engineering, it's hard to prevent all defects. But some defects are easy to observe by monitoring, like changes in the types of inputs you're seeing over time.

If you're relationship-oriented, definitely make friends with users if possible or people that study user feedback and data, so that they pass feedback along more readily.

1

HandsomeMLE t1_iz45m9t wrote

Many thanks for your answer! I'll definitely do that. I'm also wondering if there are some kinds of tools, services, or even methodologies that help pre-screen potential model defects or catch unexpected reliability issues the model might have, so I can improve the model's quality and accuracy with various methods.

1

trnka t1_iz4nfux wrote

Depends on the kind of model. Some examples:

  • For classification, a confusion matrix is a great way to find issues (a minimal sketch follows this list)
  • For images of people, there's a good amount of work to detect and correct racial bias (probably there are tools to help too)
  • It can be helpful to use explainability tools like lime or shap -- sometimes that will help you figure out that the model is sensitive to some unimportant inputs and not sensitive enough to important features
  • Just reviewing errors or poor predictions on held-out data will help you spot some issues.
  • For time-series, even just looking at graphs of predictions vs actuals on held-out data can help you discover issues
  • For text input, plot metrics vs text length to see if it does much worse with short texts or long texts
  • For text input, you can try typos or different capitalization. If it's a language with accents, try inputs that don't have proper accents
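For the confusion matrix point above, a minimal sketch with scikit-learn on toy data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Rows = true class, columns = predicted class: the off-diagonal cells show
# which classes the model confuses, a quick way to spot systematic errors.
print(confusion_matrix(y_te, clf.predict(X_te)))
print(classification_report(y_te, clf.predict(X_te)))
```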

I wish I had some tool or service recommendations. I'm sure they exist, but the methods to use are generally specific to the input type of the model (text, image, tabular, time-series) and/or the output of the model (classification, regression, etc). I haven't seen a single tool or service that works for everything.

For hyperparameter tuning, even libraries like scikit-learn are great for running it. At my last job I wrote some code to run statistical tests assuming that each hyperparameter affected the metric independently, and that helped a ton; then I did various correlation plots. Generally it's good to check that you haven't made any big mistakes with hyperparameters (like if the best value is the min or max of the ones you tried, you can probably try a wider range).

Some of the other issues that come to mind in deployment:

  • We had one pytorch model that would occasionally have a latency spike (like <0.1% of the time). We never figured out why, except that the profiler said it was in happening inside of pytorch.
  • We had some issues with unicode input -- the upstream service was sending us latin-1 but we thought it was utf8. We'd tested Chinese input and it didn't crash because the upstream just dropped those chars, but then crashed with Spanish input
  • At one point the model was using like 99% of the memory of the instance, and there must've been a memory leak somewhere cause after 1-3 weeks it'd reboot. It was easy enough to increase memory though
  • One time we had an issue where someone checked in a model different than the evaluation report
1

HandsomeMLE t1_iz95iqy wrote

Thank you very much for your detailed explanation, trnka. It's been really helpful! It seems inevitable to have lots of unexplained issues in the process and I guess we can't expect to be perfect all at once :)

How would you weigh the importance of validating/testing a model? (maybe it depends on sector/industry?) As a beginner, I hope I'm not putting too much time and effort into it than I should be.

1

trnka t1_iz9j30k wrote

It definitely depends on sector/industry and also the use case for the model. For example, if you're building a machine learning model that might influence medical decisions, you should put more time into validating it before anyone uses it. And even then, you need to proceed very cautiously in rolling it out and be ready to roll it back.

If it's a model for a small-scale shopping recommendation system, the harm from launching a bad model is probably much lower, especially if you can revert a bad model quickly.

To answer the question about the importance of validating, importance is relative to all the other things you could be doing. It's also about dealing with the unknown -- you don't really know if additional effort in validation will uncover any new issues. I generally like to list out all the different risks of the model, feature, and product. And try to guesstimate the amount of risk to the user, the business, the team, and myself. And then I list out a few things I could do to reduce risk in those areas, then pick work that I think is a good cost-benefit tradeoff.

There's also a spectrum regarding how much people plan ahead in projects:

  • Planning-heavy: Spend months anticipating every possible failure that could happen. Sometimes people call this waterfall.
  • Planning-light: Just ship something, see what the feedback is, and fix it. The emphasis here is on a rapid iteration cycle from feedback rather than planning ahead. Sometimes people call this agile, sometimes people say "move fast and break things"

Planning-heavy workflows often waste time on unneeded things, and fail to fix user feedback quickly. Planning-light workflows often make mistakes on their first version that were knowable, and can sometimes permanently lose user trust. I tend to lean planning-light, but there is definite value in doing some planning upfront so long as it's aligned with the users and the business.

In your case, it's a spectrum of how much you test ahead of time vs monitor. Depending on your industry, you can save effort by doing a little of both rather than a lot of either.

I can't really tell you whether you're spending too much time in validation or too little, but hopefully this helps give you some ideas of how you can answer that question for yourself.

2

HandsomeMLE t1_izdtpzc wrote

After all, I take it all depends on what kind of model we're working on, how much we weigh the importance and likelihood of possible risks associated with it, and how to act and measure accordingly.

Thank you very much for your thoughtful input. It's been really helpful!

1

gkamer8 t1_iz0ad6v wrote

I've been trying to train a transformer from scratch on a couple of books in hopes that it can give me English-ish text, even if it's overfitting. The model is getting stuck just outputting the most likely token as "space", the second most likely as "comma", the third "and", and so on. That's for every token. Has anyone run into similar issues, or can you help me brainstorm some problems? Some things I've checked/tried so far:

  • The model can learn a toy problem where sequences are either abc or def - first token is a/b 50%, rest of tokens are 99% correct because they can tell if the first token was a or d. So the model is not completely broken
  • Warmup / long warmup. I used the learning rate formula in vaswani et al. Just tried it last night with a much longer warmup with learning rates multiplied by 0.01, no dice.
  • layer norm epsilon - added one for numerical stability
  • input/output embeddings use shared weights, input embeddings are multiplied 1/sqrt(dmodel)
  • using label smoothing = .1 on my cross entropy loss
  • increased the batch size by accumulating gradients, so every batch had about 20k tokens
  • ran overnight in hopes that it would break out of the local minimum, didn’t
  • using the Adam optimizer

Some other details-

  • using the GPT 2 tokenizer
  • sequence length of 64
  • batches of size 200
  • model is made completely from scratch, so no PyTorch or hugging face libraries
  • the model has the same parameters as “base” in vaswani et al

Any suggestions would be appreciated

1

Brudaks t1_iz4av37 wrote

My intuitive understanding is that transformers are so "powerful"/assumption-free that they are quite data-hungry and need far more than a couple of books to learn the structure.

If all you have is a couple of books, then IMHO a small RNN would give better results than a transformer (but still bad - "all the works of Shakespeare" seems to be a reasonable minimum to get decent results), and the inflection point where the transformer architecture starts to shine is at much larger quantities of training data.

If you do want to do exactly that (and with overfitting), try starting from a sequence length of, say, 4 or 8.

2

gkamer8 t1_iz55n2z wrote

Thanks- since writing this, I got past that particular minimum with better initialization and a modified arch, but it still isn’t generating terribly interesting text. I upped the dataset to about 10 books. I think I’ll download a proper large dataset to see if it can do any better. Thanks!

1

silverjoda t1_iz19r28 wrote

What is multi-objective optimization about? In the end, the weights of the individual objectives are a designer specification, and any multi-objective optimization can be formulated as a single-objective optimization that is the weighted sum of all the objectives.

1

Wahajs t1_iz1oxuh wrote

We have multiple employment laws across the region and get constant queries as part of our business. Laws differ from region to region and industry to industry. Is there a way to train a machine/bot on them that shares answers and, for complex queries, points the user to a human?

I want to build something that can pick up items from the law and respond with references to the law. Happy to invest my team's time to train it, but I need a starting point as I am not from the industry.

1

zenmandala t1_iz3n8kx wrote

That's a domain specific chatbot. There are a bit too many factors in how your current data for answers is stored to be specific. I would look at various approaches to domain specific chatbots and then see which one is most applicable for you. This paper might be a starting point: https://arxiv.org/ftp/arxiv/papers/2001/2001.00100.pdf

One piece of advice I would personally give is read a lot before starting such a project. Better to have a clear plan than try to establish as you go.

1

csreid t1_izbhqi8 wrote

You might be able to start with Rasa, which is an open source chatbot framework.

1

augustintherome t1_iz3drnz wrote

I am trying to build a product that would integrate a user's many data sources like Notion, email, notes, company chats, Jira, Linear, etc., after which they would be able to ask natural questions like "When do I need to work out today?" and get something like "Today's workout 19:00 (link to original document)".

What direction should I follow with this idea (e.g. semantic search, text embeddings)?

1

csreid t1_izbhhqs wrote

What you're describing is just called "question answering" in NLP afaik. A language model will take in a source document and a question and spit out either a generated answer to the question or a section of the source text containing the answer.

Check some of the QA models on huggingface to get an idea if you're not already familiar
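A tiny sketch of what that looks like with the transformers library (the checkpoint is just a common extractive-QA default; any similar model works):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = "Gym session booked for 19:00 today, focus on squats and deadlifts."
print(qa(question="When do I need to work out today?", context=context))
# -> {'score': ..., 'start': ..., 'end': ..., 'answer': '19:00'}
```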

1

Old_Stick_9560 t1_iz3yo2l wrote

So I have a dataset of about 1M rows with 25 attributes. I wanted to know how I can either segment the data or take 1% or 10% of the dataset. How do I tell which approach to go with? The target is a simple 3-state classification and the dataset mostly contains numerical data.

1

jrhabana t1_iz4d85e wrote

About text generation: can someone share experiences with BLOOM, PaLM, and models other than GPT-3?

I'm trying to build a niche text generator in Spanish using open-source models, but the internet conversation is dominated by GPT-3, with few articles about prompt engineering or model comparisons.

1

still_tyler t1_iz587yc wrote

What's the best way to go about a multiclass classification problem in which 3 of the features are x, y, z coordinates, with each row having only one location per outcome?

I'd like to take advantage within the model of the idea that there is spatial correlation in the outcomes (e.g. one record close to another in x, y, z will likely have a similar outcome). The spatial components make me want to use a CNN, but each input being just a 1x3 vector rather than something bigger makes me think that's not possible?

(FWIW, xgboost has the best predictive accuracy. I tried a Gaussian process too, but XGB still beat it. I was thinking there might be an NN approach, but Google has not been fruitful.)

1

trnka t1_iz9ol1s wrote

> one record close to another in x, y, z will likely have a similar outcome

That sounds a lot like k-nearest neighbors, or SVM with RBF kernel. Might be worth giving those a shot. That said, xgboost is effective on a wide range of problems so I wouldn't be surprised if it's tough to beat. Under the hood I'm sure it's learning approximated bounding boxes for your classes.

I haven't heard of CNNs being used for this kind of problem. I've more seen CNNs for spatial processing when the data is represented differently, for example if each input were a 3d shape represented by a 3d tensor rather than coordinates.

2

still_tyler t1_iz9prl7 wrote

Yeah, XGB still outperforms knn and svm here. There's a bunch of other non-coordinate covariates that contribute and XGB just kicks butt in this case. Fair enough, thanks for the response!

1

csreid t1_izbgnyy wrote

> The spatial components make me want to use a CNN, but each input being just a 1x3 vector rather than something bigger makes me think that's not possible?

The point of the convolution is to efficiently capture information from surrounding pixels when considering a single pixel. Back in the pre-DL olden days, computer vision stuff still involved convolutions, they were just handcrafted -- we had a lot of signal processing machinery we could use to eg detect edges and such. In your case, you don't really have anything to convolve over.

You could try just feeding the coordinates into an MLP with the other covariates and it should be able to capture that spatial component.

1

Accomplished-Bill-45 t1_iz7dgwm wrote

Are the current state-of-the-art models for logical/common-sense reasoning all based on NLP (LLMs)?

I'm not very familiar with NLP, but I'm playing around with OpenAI's ChatGPT and am particularly impressed by its reasoning and its thought process. Are all good reasoning models derived from NLP (LLM) models with RL training methods at the moment? What are some papers/research teams to read/follow to understand this area better and stay updated?

1

Nameless1995 t1_izj36ff wrote

I think in principle, if you have enough resources and investigate the right fine-tuning techniques, you can get SOTA out of them. However, at the moment it's quite new, and moreover not as open-access for research. Furthermore, RL training is not easy for a random researcher to do (because it's kind of a human-in-the-loop framework and you require human annotations --- you can probably do it with AWS and such, but it probably won't easily become a standard too soon because of the inconvenience).

Another thing is that ELLMs (let's say "extra-large language models", to distinguish GPT-3+-style models from BERT/BART/RoBERTa/GPT-2-style models) are generally used in a few-shot or instruction-following setup, and probably won't fit exactly with the "fine-tuning on the whole dataset" setup. And it can again be hard for random researchers to fine-tune or even run those humongous models. So it may again take time to seep in everywhere.

In my investigation ChatGPT seems to still struggle a bit on some harder logical challenges (some of which even I struggled a bit with) eg. in LogiQA: https://docs.google.com/document/d/1PATTi0hmalBvY_YQFr4gQrjDqfnEUm8ZDkG20J6U1aQ/edit?usp=sharing

(although you can probably improve upon by more specialized RL training for logical reasoning + multiple reasoning path generation + self-consistency checking + least-to-most prompting etc.)

I think SOTA of logiQA is: https://aclanthology.org/2022.findings-acl.276/ (you can find relevant papers by looking at the citation network in semantic scholar)

For reasoning on other areas, you can probably use the chain of thought papers and its related citations to keep track (because COT is almost a landmark in prompt engineering for enhanced reasoning, and most future ELLM paper working on reasoning would probably cite it).

Don't know much about common-sense reasoning (either as a human or in terms of research in that area).

1

mrpacetv t1_iz98n4y wrote

What would be a suitable network to train to predict the size of grains in an image?
Inputs will be images (64x64, or higher resolution like 256x256) and the output should be a float (the size of particles in the image). I prepared a dataset using Voronoi cells.
I looked into the digit-recognition problem, but that seemed to be a classification problem (10 digit categories), so the examples used an MLPClassifier or a CNN (categorical loss and softmax in the final layer).
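What I'm imagining (a sketch, not sure if it's right) is the same kind of CNN, but with a single linear output unit and MSE loss instead of softmax:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),            # linear output: predicted particle size
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])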

1

teenaxta t1_iz9t4k2 wrote

How much of an improvement is an RTX 3090 over an RTX 3080 10GB for deep learning? I will be working mostly with ResNet-50 or something like that.

1

_RootUser_ t1_izemqro wrote

I am trying to make a project for my college that incorporates elements of AI, but I want to understand enough about the concepts/formulae behind the ML/DL libraries and functions. I have found statistical learning to be too vast for me, yet I cannot seem to grasp the reasoning behind the choice of libraries, functions, and layers in model creation.
What am I missing? How do I approach this problem? I am trying to recreate certain functions and models from scratch with numpy, but I fail at it. Any advice or suggestions for me? I have about 7-10 days where I can give more than 12 hours every day to this. How should I approach it?

1

I-am_Sleepy t1_izr913e wrote

Your description is a bit vague, but if it is a regression, try a linear model using least squares with polynomial features.
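A minimal sketch of that in scikit-learn (toy quadratic data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.2, size=100)

# least-squares linear model fit on polynomial features of the input
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))
```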

1

Readityesterday2 t1_izg4iim wrote

What are good desktop alternatives to ChatGPT that you can train on your own for creating text?

What are good ones for image generation?

Thanks!

1

Nameless1995 t1_iziy3gn wrote

What do you mean by "desktop alternatives"? You mean something you can train on a single GPU or two? I don't think you will get any real alternative for that unless you lower your expectations by a lot. But for more open-source stuff you can check https://www.eleuther.ai/ GPT-style models, and others like BlenderBot, BLOOM, etc. For image generation, probably Stable Diffusion or something.

1

mayermensch69 t1_izk7m7b wrote

I came across this approach of dialog evaluation: https://github.com/Shikib/fed

What I don't understand is, how the (more or less) raw loss can be used as a metric, since it is not really bounded. It may work when directly comparing specific examples with this method, but how does one compare these scores to other metrics with a fixed scale?

1

Subject-Resort5893 t1_iznkd7x wrote

Can you do machine learning in SQL? Or do you have to have python/R?

1

BlueSubaruCrew t1_izr01bd wrote

SQL is mostly used to query data from databases (which is important for a lot of machine learning projects). The actual machine learning stuff is usually implemented in Python/R.

2

Duckdog2022 t1_izvt3xa wrote

Can you do it? Probably, yes. But that'll be very ugly and inefficient if you can even make it work.
So better use something that's designed for it like Python or R.

2

ollih12 t1_izre2kw wrote

What is the best approach for text generation?

For context: I'm trying to generate episode synopses of a show by training a model on existing episode titles and synopses, and using an input title as the prompt for the generated episode. I've read that LSTM models are good for this since they maintain context. I have also read that GPT-3 is the best for this, but it's not free. This is just a personal project, and I intend on using PyTorch if it's of any significance. Currently I have scraped synopses and titles of existing episodes and have them stored in a pandas dataframe, so I'm just not sure where to go from here.

1

pythoslabs t1_izxk8im wrote

>also read that GPT-3 is the best for this but it's not free.

Try ChatGPT (https://chat.openai.com/). It's a free pre-beta release, so you can try your hand at it.

Also be careful that it might not be 100% factually accurate. But for trying out simple text generation, it should do the job pretty well.

2

ollih12 t1_izzhdj9 wrote

Can ChatGPT be fine tuned for what I described?

1

BrightCounter738 t1_izzwtbu wrote

It is not open-sourced (and one probably wouldn’t be able to run it personally even if it was), so no.

1

ollih12 t1_j00bp15 wrote

Would the GPT-2 model from the transformers package be ok for it?

1

pythoslabs t1_j00ffog wrote

Yes.

You have to train on their system with your custom data. It is costly though.

e.g., fine-tuning the Davinci model will cost you $0.0300 / 1K tokens for training (fine-tuning) and $0.1200 / 1K tokens for usage, if you wish to use it as an API endpoint.
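If you go the free GPT-2 route from the transformers package instead, a minimal generation sketch (with a made-up title; fine-tuning on your scraped synopses would be a further step) looks like:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
title = "The One Where the Server Catches Fire"   # hypothetical episode title
prompt = f"Episode title: {title}\nSynopsis:"
print(generator(prompt, max_new_tokens=60)[0]["generated_text"])
```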

1

ollih12 t1_j00hiav wrote

Are there any free alternatives you would recommend?

1

GaseousOrchid t1_izrjinp wrote

How do Flax/Haiku (for ML with JAX) compare? I've seen people use both, but there seem to be a lot more people using Flax. I've always preferred Haiku, but I last tried Flax 2 years ago -- curious if things have changed?

1

Trustafew t1_izu4zh0 wrote

What approach would you use for modelling the proportional impact of one value on another over time? I'm curious about modelling the impact of CPI on interest rates. Any time-series ML resources you would recommend? I'm a bioinformatician/researcher and I do tons of clustering and decision-tree-based stuff with R using caret and now tidymodels, but adding the time dimension is new (scary!) to me.

1

Dramatic_Sector_6237 t1_j014rrb wrote

Is AI limited to tabular data?

I am completing an online training course to become a data scientist, and this question just popped into my mind.

Indeed, during my training I only use CSV files, although I can manipulate SQL databases in order to get those CSV files.

However, I was wondering whether it is possible to train AI models on kinds of data other than tabular data, because otherwise it seems quite limiting from my point of view. (I am barely a junior data scientist, so maybe my question is naïve...)

1

creativekinase t1_j04zk32 wrote

Hi everyone,

I'm using a TensorFlow Keras model to classify medications using infrared spectra. I'm wondering if there are resources on how many spectra I should have for each class, and whether there is a maximum number of classes I can have (if there is a maximum).

Thanks!

1

BackgroundFeeling707 t1_j05lywf wrote

Hi, how do local language model inference tools such as KoboldAI's web UI keep information? I understand you can only produce a certain number of tokens in one go.

Does it just use the last 30 tokens or so in the new batch?

Eventually I run out of memory and am unable to continue the text adventure. It shouldn't do that, right?

Are there techniques to store info?

1

BackgroundFeeling707 t1_j05mhuh wrote

In general, for Stable Diffusion, why are there often large VRAM spikes at the end of inference, and what kinds of code techniques are used to solve this problem?

1

faCt011 t1_j08c288 wrote

Hello,

I'm new to the topic. I was wondering why Microsoft refers to machine learning models as "files". Is it really something comparable to a .txt file that you can open and edit? I always imagined models as a big collection of numbers. So how are production-ready models saved and used?
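For example, is it just something like this (my guess), where the learned numbers are serialized to disk and loaded back for inference?

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "model.joblib")       # the "file": serialized weights + metadata
restored = joblib.load("model.joblib")   # loaded back for inference in production
print(restored.predict(X[:3]))
```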

Source: https://learn.microsoft.com/en-us/windows/ai/windows-ml/what-is-a-machine-learning-model (very first line)

Thanks in advance :)

1

EdenistTech t1_j097uyn wrote

Hello. I have a binary classification problem. However, instead of aiming for a high overall prediction rate on the entire training set, I would like to find subsets of features that with very high probability place a given sample in category X, and other subsets that place samples in category Y. In other words, a prediction should not be attempted if the conviction of the estimate is low. Does such an algorithm exist?

1

drewfurlong t1_j0a3zn7 wrote

Would you say you're looking for a classifier with high precision, and perhaps low recall?

2

EdenistTech t1_j0aa5is wrote

To some extent, yes. But rather than focusing on the true positives of the entire training set, I would be interested in the algorithm carving out subsets of features and values for which precision is very high - higher than the precision over the entire training set. I hope that makes sense?

1

onionhead888 t1_j0j7xza wrote

I think you're looking for random forests, which are built from binary splits.

2

drewfurlong t1_j0a3mbn wrote

Have you come across any excellent explanations for how the various attention layers work? Ideally with worked examples and graphics.

After reading the Wikipedia article on the topic, and getting 6 pages into "Attention Is All You Need", I'm thoroughly confused. It's tough to keep track of what's a key, query, and value, where the recurrent layer goes, etc.
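The kind of worked example I'm after would spell out the core scaled dot-product step, something like this (my own sketch in plain numpy): each query is compared against every key, the scores are softmaxed into weights, and the output is the weighted sum of the values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # (n_queries, d_v)

Q = np.random.randn(2, 4)   # 2 query vectors of dimension 4
K = np.random.randn(5, 4)   # 5 key vectors
V = np.random.randn(5, 8)   # 5 value vectors of dimension 8
print(scaled_dot_product_attention(Q, K, V).shape)      # (2, 8)
```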

1

Wakeme-Uplater t1_j0atqe1 wrote

What is an alternative way to optimize ads budget allocation?

As far as I know, an RCT (A/B test) is the gold standard for testing the effect of different factors. But that assumes the distribution doesn't drift over time, and it is cost-intensive.

Another method I found is Marketing Mix Modeling. But this seems to be riddled with biases and pitfalls, which stem from heavy inductive bias and from treating a correlational model as causal.

As my understanding goes, there is a way to reduce confounding effects through causal inference.

However, causal inference requires a causal graph. This can be done with causal graph discovery, but that imposes a lot of (unverifiable) assumptions on the data-generation process, so a domain-specific causal graph construction is still needed (which is not ideal).

So are there alternative methods for estimating ROI? Also, is there a stochastic process that models this problem? Links to related topics/research are also welcome.

1

rr718 t1_j0b9nst wrote

Hey. Is it true that ML algorithms need highly formatted, uniform data while neural nets can learn from even unformatted/real-world data?

1

CuriousJam t1_j0c9q1j wrote

Is there a reason why regression models tend to poorly capture local maxima and minima? My reasoning is that those points are typically underrepresented in training data, or is there another reason why it does so poorly at those points?

1

AttentionNo6483 t1_j0cc1bi wrote

Take a look at the loss function. The regression line minimizes the loss across the entire data set (which is why outliers can impact it). If you want to focus on local maxima, you would want to do piecewise regression.

2

FrankyMonkey t1_j0crrzr wrote

Hello ML community. I am not the most knowledgeable person on this topic, but I am interested in analyzing the export controls. And my question to you is: why do you think it would be difficult to limit API-based access to ML?

Any elaboration is more than welcome. Thanks.

1

BuiltLikeABagOfMilk t1_j0e2vtm wrote

How do you determine an appropriate minimum support level when doing market basket analysis?

1

honchokomodo t1_j0f156j wrote

how do i make my autoencoder use more of the latent dimensions?

1

I-am_Sleepy t1_j0mhitl wrote

For starters, look at InfoVAE (see this blog for context). Another way is to vector-quantize it (VQ-VAE-based models); since the model only needs to learn a small number of latents, it can optimize them better.

2

seacucumber3000 t1_j0fem4u wrote

When tuning hyperparameters, is learning rate (decay, scheduling, etc.) dependent on things like model size and activation function? Or can I search for the ideal model architecture first, then tune learning rate after?

1

-zharai t1_j0goty0 wrote

Are there any study groups for the fast.ai deep learning course? I'm open to rushing things a bit to catch up.

1

throwaway2676 t1_j0hvos1 wrote

What are the chances ChatGPT offers a subscription mode that is totally uncensored?

1

MaterialLogical1682 t1_j0ijykt wrote

Can anyone suggest a good book to learn about credit risk modeling in Python? Or in general a good source to start from.

Thanks

1

toothie25 t1_j0jvmec wrote

How can the performance of a chatGPT model be evaluated in a way that takes into account both the quality of its generated responses and its ability to maintain coherence in long-term conversations? One approach might be to use a metric like perplexity, which is a measure of how well a language model predicts the next word in a sequence given the words that have come before it. However, perplexity does not necessarily capture the coherence of the model's responses over multiple turns in a conversation. Another possibility might be to use a measure like the BLEU score, which compares the model's generated responses to a set of reference responses and assigns a score based on the overlap between the two. However, the BLEU score does not take into account the quality of the generated responses themselves, only their similarity to the reference responses. Is there a way to combine these two approaches, or to come up with a new metric that takes into account both the quality and coherence of the model's responses in a more holistic way?
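For concreteness, the two metrics I mean are computed roughly like this (a sketch with made-up numbers):

```python
import math
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Perplexity from the average per-token cross-entropy (natural log)
token_log_probs = [-2.3, -0.7, -1.1, -0.2]          # hypothetical model outputs
perplexity = math.exp(-sum(token_log_probs) / len(token_log_probs))

# BLEU: n-gram overlap between a generated response and reference responses
reference = [["let", "us", "meet", "at", "noon"]]
candidate = ["let", "us", "meet", "at", "twelve"]
bleu = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)

print(round(perplexity, 2), round(bleu, 3))
```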

1

Old_Pea3923 t1_j0kple3 wrote

I thought the context window of GPT-3 was 2048 or 4000 tokens, so how does ChatGPT work?

1

Old_Pea3923 t1_j0kq93b wrote

"While ChatGPT is able to remember what the user has said earlier in the conversation, there is a limit to how much information it can retain. The model is able to reference up to approximately 3000 words (or 4000 tokens) from the current conversation - any information beyond that is not stored.
Please note that ChatGPT is not able to access past conversations to inform its responses." - https://help.openai.com/en/articles/6787051-does-chatgpt-remember-what-happened-earlier-in-the-conversation

My question then is how does it do this?

1

Daminio6 t1_j0luzwr wrote

Hi, recently I was watching 3D porn doing some scientific research and... ok, it will be hard to explain. I was watching 3D porn and noticed that there are usually only one or a few animated motions, which are then looped for a long time, and this makes such porn somewhat dull. So I got an idea: what if someone used an NN to estimate the skeletons of the humans in a real porn video and then attached 3D models to those skeletons? I see that there exist NNs which can estimate a human skeleton, but I've never used them. What do you think: is it possible with currently existing models to take a porn video, estimate the skeletons of the two people in that video, and use the resulting skeletons for 3D porn? And what problems might occur if someone tried to do this?

1

Maria_Adel t1_j0nwkbm wrote

What models would you use for product assortment/getting the product range right for different stores

1

I-am_Sleepy t1_j0phzjn wrote

I'm not sure, but I think there are several ways to model product assortments.

First, demand forecasting - you predict demand for each product and act accordingly. This can usually be done with time-series forecasting.

Second, personalized taste - you assume that each customer has their own fixed preference and you model that. If you know the demographics of each customer, you would be able to estimate demand from the recommended products.

But the latter is probably going to output a static distribution, so I think you can apply demand forecasting on top of the second method to discount it correctly (I think).

However, every method needs data. If you have a cold-start product, you might want to perform basic A/B testing first to get the initial data.

1

Maria_Adel t1_j0s4l0o wrote

Thanks a lot. Data is available, so that should not be a problem. What models would you suggest for demand forecasting of each product (gradient boosting, hybrid deep learning models, or ARIMA), and what key variables would you include in the model? (I'd suspect previous sales and price.)

1

I-am_Sleepy t1_j0sh383 wrote

If you have a target variable and other input features, you can treat this problem as a normal regression problem. Using a model like linear regression, random forest regression, or XGBoost is very straightforward from there.

You can then look at feature importance to try to weed out the uncorrelated features (if you want to). There are a few automated ML tools for time series, but currently I mostly use PyCaret.
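A rough sketch of that with XGBoost (a toy frame with made-up lag/price features):

```python
import pandas as pd
from xgboost import XGBRegressor

df = pd.DataFrame({
    "prev_sales": [120, 135, 150, 160, 155, 170],
    "price":      [9.9, 9.9, 8.5, 8.5, 9.9, 8.5],
    "month":      [1, 2, 3, 4, 5, 6],
    "sales":      [130, 140, 165, 170, 150, 180],
})
model = XGBRegressor(n_estimators=200, max_depth=3)
model.fit(df[["prev_sales", "price", "month"]], df["sales"])

# inspect which inputs the model relied on
print(dict(zip(["prev_sales", "price", "month"], model.feature_importances_)))
```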

But if you suspect that your target variable autocorrelates, a model like SARIMAX can be used instead. An automated version of that is Statsforecast, e.g. AutoARIMA with exogenous variables (I haven't used it though).

But note that if you are in direct control of a few variables and you want to predict what will happen, this is no longer a simple regression, i.e. the data distribution may shift. That would be in causal inference territory (see this handbook).

1

finlaydotweber t1_j0op4rt wrote

I'm new to machine learning and starting to learn about it. I think I understand how ML is fundamentally different from traditional programming. I have also come across the 3 categories of ML: supervised, unsupervised, and reinforcement learning.

There are 2 other terms I often see thrown about that I still can't fit into my fundamental understanding of machine learning, and these are neural networks and deep learning.

What exactly is a neural network? What exactly is deep learning? How do they fit into ML? Are they other kinds of ML, or techniques used within the different subsets of ML?

1

sargentpilcher t1_j05ib9c wrote

Hello everyone.

I'm a big fan of using AUTOMATIC1111's web UI for Stable Diffusion, but I'm wondering if there's an equivalent for StyleGAN? Or do I need to essentially learn a bit of programming before I can actually play around with such an amazing AI tool?

0

Nirmalpb t1_izoj68v wrote

Suppose you have data on patients that record their height (in millimetres) and the length of their eyelashes (in millimetres). The aim is to perform clustering on this data. Explain why clustering using a distance measure such as Euclidean distance would be problematic on this data and outline how you could modify the data to address this.

−4