Submitted by AutoModerator t3_11pgj86 in MachineLearning

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

Thread will stay alive until the next one, so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

35

Comments


WesternLettuce0 t1_jbyl2wi wrote

I used DistilBERT and LegalBERT separately to produce embeddings for my documents. What is the best way to use the embeddings for classification? Do I create document-level embeddings before training my classifiers? Do I combine the two embeddings?

1

kuraisle t1_jbyrulz wrote

Has anyone had any experience data mining bioRxiv? It's on a requester-pays Amazon S3 bucket, which isn't something I've used before, and I'm struggling to estimate how much I would have to pay to retrieve a few thousand articles. Thanks!

5

I1onza t1_jc1g0u9 wrote

I'm a materials engineering student and an outsider to the ML and AI community. During my studies I take notes on my laptop and don't have a quick and reliable way to copy down simple graphs. With the recent publicity around AI models, I was wondering if someone has already tried to train a model to draw graphs from natural language. DALL-E does it quite horribly (cf. picture). If you haven't heard of such a thing, maybe it's a project you might find interesting to build.

0

EcstaticStruggle t1_jc1jts4 wrote

How do you combine hyper parameter optimization with early stopping in cross-validation for LightGBM?

Do you:

  1. Use the same validation set for hyperparameter performance estimation as well as early stopping evaluation (e.g., 80% training, 20% early stopping + validation set)
  2. Create a separate fold within cross-validation for early stopping evaluation. (e.g. 80%, 10%, 10% training, early stopping, validation set)
  3. Set aside a different dataset altogether (like a test set) which is used consistently for early stopping evaluation across the different cross-validation folds.

In the case of 1) and 2), how would you use early stopping once you identified optimal hyperparameters? Normally, you would re-fit on the entire dataset with the best hyperparameters, but this removes the early stopping data.

1

denxiaopin t1_jc1uk2f wrote

How difficult and time-consuming is it to teach an AI to choose glasses according to face type, with the tools we have today?

1

bangbangwo t1_jc21eac wrote

Hey, I'm new to ML and I have a question. I've created LSTM and XGBoost models, trained them, evaluated them, etc. But now, how do I actually forecast future data? Do you have a notebook where the creator actually plots predictions? I can't seem to find one!

1

AnomalyNexus t1_jc2ictg wrote

Do I need a specific GPU generation for 4bit weights? Or just anything that supports tensorflow/pytorch?

1

TwoTurnWin t1_jc2thhm wrote

So I'm working with the UrbanSound8K dataset on Kaggle.

I want to try two approaches:

  • MFCCs and mel spectrograms for image classification.
  • Raw audio data classification.

Would a 1D CNN work for both approaches?

1

towsif110 t1_jc4khxj wrote

What would be a good way to detect malicious nodes with machine learning? Let's say I have datasets of RF signals from three kinds of drones, but my target is to detect any malicious drone other than the drones I possess. I have two ideas: one is to label two drones as good and the remaining one as malicious, and the other is to use unsupervised learning. Is there a better way?

1

Anthony-Z-Z t1_jc5a9kk wrote

What are some good YouTube channels to learn Machine learning?

1

Neeraj666 t1_jc5edo3 wrote

I am looking to build an ML model that can analyse answers to behavioural interview questions and provide a rating, e.g. "Talk about a challenging situation at work and how you overcame it." I'm wondering where I should start, which algorithms to focus on, etc.

1

tiddysiddy t1_jc5g7vm wrote

I have a codebase I want to train GPT on so that I can ask it questions. Is there any way to accomplish this with either GPT or any other LLM?

My current challenge is that the tunable davinci model from OpenAI is not as good as text-davinci or GPT turbo, and the fine-tuning is only based on simple labelled data. I want it to be able to interpret my codebase on its own and train up a version of an LLM which understands it and can come up with ideas.

Is this a long shot? I've noticed Bing can sometimes search up pages of documentation and give decent instructions.

1

nitdit t1_jc5gfc3 wrote

What is stroke data? (I'm sure it's not heart stroke.)

1

DreamMidnight t1_jc5om1y wrote

What is the basis of this rule of thumb in regression:

"a minimum of ten observations per predictor variable is required"?

What is the origin of this idea?

3

mmmfritz t1_jc5s85o wrote

Fact checking. Any open source models or people working on fact checking?

1

No_Complaint_1304 t1_jc7szbg wrote

Complete beginner looking for insight

I made an extremely efficient algorithm in C that skims through a database and searches for words. I want to add a feature so that if a word is not found, the program can somehow understand the context, predict the actual word intended, and also conjugate the verbs accordingly. I have no idea if what I am describing is crazy hard to implement or can easily be done by someone with experience. This field interests me a lot and I will definitely come back to this sub sooner or later, but right now I don't have time to dig into the subject; I just want to finish this project, slap a good-looking GUI on it and get it over with. Can I achieve what I stated above in a week, or am I just dreaming? If it is possible, what resources do you think I should be looking at? Ty :>

1

trnka t1_jc8csxm wrote

If you have significant data, I'd suggest starting with BERT (and including some basic baselines).

If you only have a small amount of data, you might be able to use GPT models with a fair amount of prompt engineering.

Also, you'll probably face different challenges depending on whether the candidate types the response or an interviewer is summarizing it. If it's an interviewer's notes, you might find simple proxies, like certain interviewers typing more for good candidates.

1

Abradolf--Lincler t1_jc8ynrt wrote

Learning about language transformers and I’m a bit confused.

It seems like the tutorials on transformers always make input sequences (i.e. text files batched to 100 words per window) the same length to help with batching.

Doesn’t that mean that the model will only work with that exact sequence length? How do you efficiently train a model to work with any sequence length, such as shorter sequences with no padding and longer sequences than the batched sequence length?

I see attention models advertised as having an infinite window; are there any good resources/tutorials that explain how to make a model like this?

1

Sonicxc t1_jca1rh1 wrote

How can I train a model to detect the severity of damage in an image? Which algorithm will suit my needs?

3

2lazy2buy t1_jcaary6 wrote

How does one achieve long context lengths for LLMs? ChatGPT has a context length of 32k? Is the transformer decoder "just" that big?

2

trnka t1_jcalqfm wrote

Converting the text to fixed-size windows is done to make training more efficient. If the inputs are shorter, they're padded up to the correct length with null tokens. Otherwise they're clipped. It's done so that you can combine multiple examples into a single batch, which becomes an additional dimension on your tensors. It's a common technique even for LSTMs/CNNs.

It's often possible to take the trained model and apply it to variable-length testing data so long as you're dealing with a single example at a time rather than a batch. But keep in mind with transformers that attention does N^2 comparisons, where N is the number of tokens, so it doesn't scale well to long texts.

It's possible that the positional encoding may be specific to the input length, depending on the transformer implementation. For instance in Karpathy's GPT recreation video he made the positional encoding learnable by position, so it wouldn't have defined values for longer sequences.

One common alternative in training is to create batches of examples that are mostly the same text length, then pad to the max length. You can get training speedups that way but it takes a bit of extra code.
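
A minimal sketch of the fixed-length batching described above, assuming PyTorch and a hypothetical pad_id; real tokenizers usually ship their own padding utilities:

    import torch

    def make_batch(token_id_seqs, pad_id=0, max_len=100):
        # Clip long sequences and pad short ones to a common length,
        # so the whole batch becomes a single (batch, max_len) tensor.
        padded = []
        for seq in token_id_seqs:
            seq = seq[:max_len]
            padded.append(seq + [pad_id] * (max_len - len(seq)))
        input_ids = torch.tensor(padded)
        # The mask tells attention which positions are real tokens vs padding.
        attention_mask = (input_ids != pad_id).long()
        return input_ids, attention_mask

    ids, mask = make_batch([[5, 8, 2], [7, 1, 4, 9, 3]], max_len=4)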

2

PhysZhongli t1_jccbh4w wrote

Hi everyone, I am a novice trying to learn ML and AI. I am trying to train a CNN model to classify 9000+ images with 100 labels. These images are flower patterns/leaves from what I can tell. The catch is that the actual test dataset has 101 labels, and when the model detects an image not in the original 100 labels it has to assign it to the 101st label. What would be the best way to go about doing this?

I have used ResNet50 with ImageNet weights and made some of the previous layers trainable to fine-tune the model. I have followed it with a global average pooling layer, a 1024-node dense layer with L2 regularization, batch norm, dropout, and a softmax layer as the classifier. I am using the Adam optimizer with a batch size of 16 and a learning rate of 0.0001. I then assign a threshold value of 0.6, and if the model prediction is below the threshold value it assigns the 101st label. Currently I have ~90% testing accuracy.

Are there any obvious things I should be doing better/changing, and how can I go about optimising the threshold value, or is there a better way to handle the 101st label? Should I be using ResNet or something else for flower patterns and leaves, given my training dataset of 9000+ images?

1

ViceOA t1_jccekvp wrote

Advice on an AI-Supported Audio Classification Model

Hello everyone, I'm Omer. I am new to this group and writing from Turkey. I need some valuable advice from you precious researchers.

I am a PhD student in the department of music technology. I have been working in the field of sound design and audio post-production for about 8 years. For the last 6 months, I have been doing research on AI-supported audio classification. My goal is to design an audio classifier to be used for classifying audio libraries. Let me explain with an example: I have a sound bank with 30 different classes and 1000 sounds in each class (such as bird, wind, door closing, footsteps, etc.).

I want to train an artificial neural network with this sound bank. The network will produce labels as output. I also have various complex signals (imagine a single audio track with different sound sources like bird, wind, fire, etc.). When I give a complex signal to this network at test time, it should give me the relevant labels. I have been researching this system for 6 months, and if I succeed, I want to write my PhD thesis on this subject. I need some advice from you, my dear friends, about this network. For example, which features should I look at for classification? What kind of AI algorithm should I use? Any advice along the lines of "you should definitely read this or that article on the subject" is welcome. I apologize if I've given you a headache. I really need your advice. Please guide me. Thank you very much in advance.

1

BM-is-OP t1_jccin4h wrote

When dealing with an imbalanced dataset, I have been taught to oversample only the train samples and not the entire dataset to avoid overfitting; however, this was for structured text-based data in pandas using simple models from sklearn. Is this still the case for image-based datasets that will be trained on a CNN? I have been trying to oversample only the train data by applying augmentations to the images. However, for some reason I get a train accuracy of 1.0 and a validation accuracy of 0.25 on the very first epoch, which does not make sense to me, and the numbers don't really change as the epochs progress. Should the image augmentations via oversampling be applied to the entire dataset? (FYI, I am using PyTorch.)

2

Batteredcode t1_jccqitv wrote

I'm looking to train a model that takes an image and reconstructs it with additional information, for example taking the R & G channels of an image and recreating it with the addition of the B channel. At first glance it seems like an inpainting model would be best suited to this, treating the missing information as the mask, but I don't know if this assumption is correct as I don't have much experience with those kinds of models. Additionally, I'm looking to progress from a really simple baseline to something more complex, so I was wondering whether a simple CNN or an autoencoder trained to output the target image given the image with missing information would work, but I may be way off here. Any help greatly appreciated!

1

rainnz t1_jce240r wrote

I have degree in CS but have not done anything with ML, AI, NN or CV.

I want to create a simple program, which I intend to run on an Nvidia Jetson Nano, that will process a live HDMI video stream from a street video camera. If someone appears in the video feed holding a sign with a specific sports team's symbol, like the Arizona Cardinals, I want this to be detected right away and some action performed, like sending an email.

Is it something I can do with OpenCV's object detection? If not - please let me know what would be the appropriate framework I'd need to use for this.

Thank you.

2

ilrazziatore t1_jcf9ag7 wrote

In your job as data scientists, have you ever had to compare the quality of the probabilistic forecasts of two different models? If so, how do you proceed?

1

Capital-Duty-744 t1_jcfidsx wrote

What are the most important concepts that I need to know for ML? Possible courses are below:
Algebra & Calculus II
Algebra & Calculus III
Bayesian Stats
Probability
Multivariate stats analysis
Stochastic processes
Time series
Statistical inference

To what extent should I know and be familiar with linear algebra?

2

fteem t1_jcg3zlh wrote

What happened with the WAYR (What Are You Reading) threads?

2

LeN3rd t1_jcgn73n wrote

Can anyone recommend a good, maintained, and well-organized MCMC Python package? Everything I found was either not maintained, had only a single research group behind it, or had too many bugs for me to continue with that project. I want TensorFlow/PyTorch, but for MCMC sampling, please.

2

LeN3rd t1_jcgo5ro wrote

You should take a look at uncertainty in general. What you are trying to do is calculate epistemic uncertainty. (google epistemic vs aleatoric uncertainty).

One thing that works well is to have a dropout layer that is active during prediction (in TensorFlow you have to pass training=True into the call to activate it at prediction time). Sample like 100 times and calculate the standard deviation. This gives you a general "I do not know" signal from the network. You can also do this by training 20 models and letting them output 20 different results. With this you can assign the 101st label when the uncertainty is too high.

In my experience you should stay away from Bayesian neural networks, since they are extremely hard to train and cannot model multimodal uncertainty. (Dropout can't either, but it is WAAAAYYY easier to train.)
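
A minimal sketch of the dropout-at-prediction idea, assuming a trained Keras classifier; the uncertainty threshold is a hypothetical value you would tune on held-out data:

    import numpy as np

    def mc_dropout_predict(model, x, n_samples=100, threshold=0.2):
        # Calling the model with training=True keeps dropout active,
        # so each pass gives a slightly different prediction.
        preds = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
        mean = preds.mean(axis=0)   # averaged class probabilities
        std = preds.std(axis=0)     # disagreement across stochastic passes
        # Flag inputs with high disagreement as "I do not know" (the 101st label).
        unknown = std.max(axis=-1) > threshold
        return mean, std, unknown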

1

LeN3rd t1_jcgp44s wrote

How big is your dataset? Before you start anything wild, I would look at kernel clustering methods, or even clustering without kernels. Just cluster your broken and non-broken images and calculate some distance (which can be done with kernels if it needs to be nonlinear).

Nearest neighbour could also work pretty well in your case. Just compare your new image to the closest one (according to some metric) in your two datasets and Bob's your uncle.

If you need a number, look at simple CNNs. You need more training data, though, for this to work well.

2

LeN3rd t1_jcgq97y wrote

You will need more than a week. If you just want to predict the next word in a sentence, take a look at large language models, ChatGPT being one of them; BERT is a research alternative, AFAIK. If you aim to learn the probabilities yourself, you will need at least a few months.

In general, what you want is a generative model that can sample from the conditional probability distribution. For sequences, transformers like BERT and ChatGPT are usually state of the art. You can also take a look at normalizing flows and diffusion models to learn probability distributions. But this needs some maths, and I unfortunately do not know which smaller models can be used for computational-linguistics applications like this.

1

LeN3rd t1_jcgqzvo wrote

If you have more variables than data points, you will run into problems if your model starts learning the training data by heart. Your model overfits to the training data: https://en.wikipedia.org/wiki/Overfitting

You can either reduce the number of parameters in your model, or apply a prior (a constraint on your model parameters) to improve test-set performance.

Since neural networks (the standard empirical machine learning tools nowadays) impose structure on their parameters, they can have many more parameters than simple linear regression models, but they seem to run into problems when the number of parameters in the network matches the number of data points. This is just shown empirically; I do not know any mathematical proofs of it.

1

LeN3rd t1_jcgrhlm wrote

Please be a little more coherent in your question. No one has any idea about your specific setup unless you tell us what you want to achieve; e.g. RF is usually short for reinforcement learning in the AI community, not radio frequency. If you want to classify data streams coming from drones, take a look at pattern matching and nearest neighbour methods before you start training up a large neural network.

3

LeN3rd t1_jcgrxfp wrote

It strongly depends on your constraints. There are ways to get 3D geometry from a photo/video. If you have the geometry of your glasses you should be able to see if they fit, though you might have some problems with actually adjusting the glasses to fit the face geometry. But you could also just do what your optician does and take a frontal photo of the face in a controlled environment.

1

No_Complaint_1304 t1_jcgshk7 wrote

Well, I did expect this, but still, months! I'll look into everything you mentioned. And I'll drop the project for now; if I can't finish it by studying heavily, I might as well learn slowly but surely, absorb all the information, and then come back to make a project that involves predictions and analyzing data. Thanks for your help.

1

LeN3rd t1_jcgsjxq wrote

Define probabilistic. Is it model uncertainty or data uncertainty? Either way, you should get a standard deviation from your model (either as an output parameter or implicitly via ensembles) that you can compare.

1

LeN3rd t1_jcgu1z5 wrote

This is possible in multiple ways. Older methods would view this as an inverse problem and apply some optimization method to it, like ADMM or FISTA.

If lots of data is missing (in your case an entire channel), you should use a neural network for this. You are on the right track, though it could get hairy. If you have a prior (you have a dataset and you want it to work on similar images), a (cycle)GAN or a retrained Stable Diffusion model could work.

I am unsure about VAEs for your problem, since you usually train them by having the same input and output. You shouldn't enforce the latent to be only the blue channel, since then the encoder is useless. Training only the decoder side is essentially what GANs and diffusion networks do, so I would start there.

1

ilrazziatore t1_jcgy9ya wrote

Model uncertainty. One model is a calibrated BNN (I split the dataset into a training, a calibration and a test set); the other is a mathematical model developed from some physical relations. For computational reasons the BNN assumes i.i.d. samples normally distributed around their true values and maximizes the likelihood (modeled as a product of normal distributions); the mathematical model instead relies on 4 coefficients and is fitted using Monte Carlo with a multivariate likelihood with the full covariance matrix. I wanted to compare the quality of the model uncertainty estimates, but I don't know if I should do it on the test dataset for both. After all, models calibrated with MCMC methods do not overfit, so why split the dataset?

1

LeN3rd t1_jcgzk3c wrote

If it is model uncertainty, the BNN should assume distributions only for the model parameters, no? If you make the samples a distribution, you assume data uncertainty. Also, I do not know exactly what your other model gives you, but as long as you get variances, I would just compare those at first. If the models give vastly different means, you should take that into account. There is probably some nice way to combine this ensemble uncertainty with the uncertainty of the models. It would also strongly suggest that one model is biased and does not give you a correct estimate of the model uncertainty.

1

ilrazziatore t1_jch3vpu wrote

Uhm... the BNN is built assuming distributions both on the parameters (i.e. the values assumed by the neuron weights) and on the data (the last layer has 2 outputs: the predicted mean and the predicted variance). Those 2 values are then used to model the loss function, which is the likelihood and is a product of Gaussians. I think it's both model and data uncertainty.

Let's say I compare the variances and the mean values predicted.

Do I have to set the same calibration and test datasets apart for both models, or use the entire dataset? The MCMC model can use the entire dataset without the risk of overfitting, but for the BNN it would be like cheating.

1

josejo9423 t1_jchq421 wrote

I am not quite familiar with deep learning, but don't you have a loss function where you can maximize recall, precision or AUC? I believe accuracy would not apply in this case since you have an imbalanced dataset. Also, with oversampling, as it is handled in random forests, you are making up new images, and I don't know how good that is; why not try undersampling instead, or weight adjustments?
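
A minimal sketch of the weight-adjustment idea, assuming a PyTorch classifier; the class counts here are made up and would normally come from the training split:

    import torch
    import torch.nn as nn

    # Hypothetical number of training images per class.
    class_counts = torch.tensor([4000.0, 1000.0, 500.0, 100.0])
    # Rarer classes get larger weights, so their errors cost more.
    class_weights = class_counts.sum() / (len(class_counts) * class_counts)

    criterion = nn.CrossEntropyLoss(weight=class_weights)

    logits = torch.randn(8, 4)           # fake batch of model outputs
    labels = torch.randint(0, 4, (8,))   # fake ground-truth labels
    loss = criterion(logits, labels)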

1

Batteredcode t1_jci3t9m wrote

Great, thank you so much for a detailed answer. Do you have anything you could point me to (or explain further) about how I could modify a diffusion method to do this?
Also, in terms of the VAE, I was thinking I'd be able to feed 2 channels in and train it to output 3 channels. I believe the encoder wouldn't be useless in this case, and hence my latent would be more than merely the missing channel? Feel free to correct me if I'm wrong! My assumption is that even so, a NN may well perform better, or at least make for a simpler baseline. That said, my images will be similar in certain ways, so being able to model a distribution over the latents could presumably prove useful?

1

LeN3rd t1_jcitswg wrote

The problem with your VAE idea is that you cannot apply the usual loss function (the difference between the input and the output), and thus a lot of the nice theoretical guarantees go out of the window, AFAIK.

https://jaan.io/what-is-variational-autoencoder-vae-tutorial/


I would start with a cycleGAN:

https://machinelearningmastery.com/what-is-cyclegan/

It's a little older, but I personally know it a bit better than diffusion methods.


With the freely available Stable Diffusion model you could conditionally inpaint on your image, though you would have to describe what is in that image in text. You could also train your own diffusion model, though you need a lot of training time. Not necessarily more than for a GAN, but still.

It works by adding noise to an image and then denoising it again and again. For inpainting, you just do that for the regions you want to inpaint (your missing channel), and for the regions you want to stay the same as your original image, you just take the noise that you already know.

1

Odibbla t1_jcj24kc wrote

I did this when I was in the RoboMaster AI Challenge. My solution was to use YOLOv3, which should be enough for the task you are asking about. The flow is: you label the symbol yourself, then train YOLO step by step (any version should work, actually; v3 is just my choice). Take in the video stream, and YOLO will output the exact location of that sign in the frames. I did it on a Jetson Nano and it was smooth. Since you have a degree, you should be fully capable of doing this. Good luck!
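
A minimal sketch of that flow, assuming the ultralytics package (a newer YOLO family than the v3 mentioned above) and a hypothetical weights file fine-tuned on images of the team symbol:

    from ultralytics import YOLO

    model = YOLO("team_sign.pt")  # hypothetical custom-trained weights

    # Run detection on a frame grabbed from the video stream.
    results = model.predict(source="frame.jpg", conf=0.5)
    for box in results[0].boxes:
        print(box.cls, box.conf, box.xyxy)  # class id, confidence, bounding box
    # If a detection appears, trigger the follow-up action (e.g. send an email).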

2

shiva_2176 t1_jcjovuc wrote

Could someone please recommend a machine learning algorithm to create a "Flood Risk Matrix"? Additionally, any article or video tutorial on this subject that elaborates on methodology is highly desired.

1

MirrorBredda t1_jckho6w wrote

Subject: Template to create new library with Scikit Learn Fit Predict API style

Hi every1ne,

I have seen so many packages re-using the fit/predict API style that scikit-learn came up with, which is the most popular nowadays.
I was wondering whether there is a sort of Python GitHub template project to fork and start from? It would be to create a new library based on this fit/predict style, but as the lone researcher on the project, we are trying to find the optimal development sprints to avoid losing time re-inventing the wheel.

Best wishes,

1

f-d-t777 t1_jckpovo wrote

Subject: Spacecraft image analysis using computer vision


Hi guys,

I'm looking to develop a system that uses computer vision algorithms to analyze images captured by spacecraft cameras and identify potential safety hazards or security threats. For example, the system could detect debris or other objects in orbit that could pose a risk to spacecraft.

I am looking to do this using all AWS tools. I am pretty new to this and am developing a technology architecture project around this topic to present for a program I'm doing.

How would I go about approaching/doing this? I am looking to find/create my own mock datasets as well as present the algorithm/code I used to train my model. More specifically, I am focusing on these aspects for my project:

Preprocess the images: Preprocess the images to improve their quality and prepare them for analysis. This could include cropping, resizing, and adjusting the brightness and contrast of the images.

Train the computer vision algorithms: Train the computer vision algorithms using the dataset of images. There are various computer vision techniques that could be used, such as object detection, segmentation, or classification. The specific technique will depend on the requirements of the system.


In addition, it would be cool to have some sort of hardware/interactive portion that actually utilizes a camera to detect things in space. That can be implemented into the system. Once the computer vision algorithms have been trained and evaluated, implement the system. This could involve integrating the algorithms into a larger software system that can process images captured by spacecraft cameras in real-time.

Thank you

1

gonomon t1_jclwgtg wrote

Subject: Generating Synthetic Data for Human Action Recognition
Hello,

In my master's thesis, I generated a realistic dataset that can be used for human action recognition (using the Unity engine). The dataset contains 2D/3D pose information and RGB videos. I wanted to test the effect of this dataset on real-world action detection (directly on YouTube videos) when the classifier is trained with synthetic data in addition to real data (NTU 120).

I want to use a skeleton-based action recognition methodology (since it outperforms RGB-only methodologies on NTU 120), and to achieve this I applied a pose estimator to the YouTube videos, our synthetic dataset, and NTU 120 and trained on those, since I believe that instead of directly using the sterile ground-truth pose information of our dataset, I can apply a pose estimator and use those pose estimates instead of worrying about domain adaptation strategies.

The question is: should I have directly used the ground-truth pose information of our synthetic data in training with real data, or does what I did make sense? If there is any work using pose estimators as a domain adaptation method, I would be extremely happy if you could share the papers when commenting.

Best,

1

myself991 t1_jcmhn3k wrote

Hi everybody,

I forgot to submit my file for a conference, but the CMT3 submission section was open until about 45 minutes past the deadline, so I was able to upload it there.

I was wondering if anybody has had experience with submitting supplementary material to CMT3 for a conference an hour after the deadline? Are they going to remove the paper, even though they kept the upload section open?

Also, do conferences normally set the deadline in CMT3 a little later than the announced deadline?

Thanks,

1

sinazyo t1_jco44zo wrote

Hi guys! Why isn't Facebook's BART model studied like the larger models of the GPT family?

Is it just because BERT is superior as a discriminative model and GPT is superior as a generative model?

I like the BART model, but it's a pity that I haven't seen much research related to it. Please let me know if there are any studies on BART with more parameters.

1

jakderrida t1_jcotnis wrote

The basis of this rule of thumb is that having too few observations relative to the number of predictor variables can lead to unstable estimates of the model parameters, making it difficult to generalize to new data. In particular, if the number of observations is small relative to the number of predictor variables, the model may fit the noise in the data rather than the underlying signal, leading to overfitting.

1

josejo9423 t1_jcpu2pe wrote

I would go with 1, but I would not tune early stopping, just the number of estimators. XGBoost has the option of stopping iterations (early stopping) when there is no improvement in the metric; if you plot the metric per iteration and see that training could have been stopped earlier, set the number of estimators to the value you consider safe before overfitting.
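
A minimal sketch of that setup, assuming LightGBM's scikit-learn interface (since the original question was about LightGBM) and a toy dataset; the parameters are placeholders to tune:

    import lightgbm as lgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    model = lgb.LGBMClassifier(n_estimators=2000, learning_rate=0.05)
    model.fit(
        X_train, y_train,
        eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(stopping_rounds=50)],  # stop when the metric stalls
    )
    # best_iteration_ can then be used as a fixed n_estimators when refitting on all data.
    print(model.best_iteration_)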

1

LeN3rd t1_jct6arv wrote

OK, so all of these are linear (logistic) regression models, for which it makes sense to have more data points, because the weights aren't as constrained as in, e.g., a convolutional layer. But it is still a rule of thumb, not exactly a proof.

1

EcstaticStruggle t1_jcthdzz wrote

Thanks. This was something I tried earlier. I noticed that using the maximum number of estimators almost always leads to the highest cross-validation score. I was worried there would be some overfitting as a result.

1

Jonathan358 t1_jcuh7ya wrote

Hello, I have a very simple question but cannot find any info on:

How do I create an exponential range (squared) for hyperparameter values to be tuned? E.g. from 2 to 64, incrementing in steps of 2^2?

Not looking for a complicated solution involving lists, etc.

ff_dim=hp.Int('ff_dim', min_value=2, max_value=64, step=n^2)

edit: solved with sampling="log"
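
For reference, a minimal sketch of that fix, assuming the KerasTuner HyperParameters API (worth double-checking the exact sampling behaviour in the docs):

    import keras_tuner as kt

    hp = kt.HyperParameters()
    # Values are drawn log-uniformly between 2 and 64, so small and large
    # values (2, 4, 8, ..., 64) are covered without listing them explicitly.
    ff_dim = hp.Int("ff_dim", min_value=2, max_value=64, sampling="log")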

1

rylo_ren_ t1_jcvak4c wrote

Hi everyone! This is a simple troubleshooting question. I'm in my master's program (working in Python) and I keep running into an issue when I run this code for a linear regression model:

airfares_lm = LinearRegression(normalize=True)

airfares_lm.fit(train_X, train_y)

print('intercept ', airfares_lm.intercept_)
print(pd.DataFrame({'Predictor': X.columns, 'coefficient': airfares_lm.coef_}))

print('Training set')
regressionSummary(train_y, airfares_lm.predict(train_X))
print('Validation set')
regressionSummary(valid_y, airfares_lm.predict(valid_X))

It keeps returning this error:

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)
/var/folders/j1/1b6bkxw165zbtsk8tyf9y8dc0000gn/T/ipykernel_21423/2993181547.py in <cell line: 1>()
----> 1 airfares_lm = LinearRegression(normalize=True)
      2 airfares_lm.fit(train_X, train_y)
      3
      4 # print coefficients
      5 print('intercept ', airfares_lm.intercept_)

TypeError: __init__() got an unexpected keyword argument 'normalize'

I'm really lost; any help would be greatly appreciated! I know there are other ways to do this, but I was hoping to use this technique since it's the primary way my TA codes regression models. Thank you!

1

suineg t1_jcwhvs3 wrote

I'm curious about the feasibility of a concept before I start going down this road. I am also unsure if maybe there is already a project that I should look into.

There is a fantasy book series that I enjoy; it's 10 books and 3.3M words (I don't have a character count). The world and characters are complicated, and their interactions with other characters are sometimes pretty obscure. I want to make a dynamic wiki and search tool for two things.

Phase 1 - Ingest all of the text and start building out character profiles, book profiles, etc. The front end would tag information based on the book, so if you've only read up to book 7 you don't get books 8-10 spoiled. You could give it a query like "list all the battles character A and character B are in together".

Phase 2 - This would be the difficult portion much later on, and I'm not focused on it yet. You could ask it something like "give me a view of character B after event_32" and, based on the descriptions, it would generate art. You could also ask things like "give me a scene of characters B, D, and H at the battle of event_40" and it would generate one based on that stored event.

1

disastorm t1_jcwyjyv wrote

I noticed that "text-generation" models have variable output, but a lot of other models, like chatbots, often give the exact same response for the same input prompt. Is there a reason for this? Is there perhaps a setting that would allow a chatbot, for example, to have variable responses, or is my understanding of this just wrong?

1

henkje112 t1_jcxjx44 wrote

I'm assuming you're using sklearn for LinearRegression. You're initializing an instance of the LinearRegression class with a normalize parameter, but this is not valid for this class (for a list of possible parameters, see the documentation).

I'm not sure what you're trying to do, but I think you want to normalize your input data? In that case you should look at MinMaxScaler. This transforms your features by scaling each feature to a given range.
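
A minimal sketch of that suggestion, assuming scikit-learn and the train/validation splits already defined in the question:

    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler

    # Scale each feature to [0, 1] before fitting, instead of normalize=True
    # (which newer scikit-learn versions removed from LinearRegression).
    airfares_lm = make_pipeline(MinMaxScaler(), LinearRegression())
    airfares_lm.fit(train_X, train_y)
    predictions = airfares_lm.predict(valid_X)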

1

SnooMarzipans3021 t1_jcxk1a3 wrote

Hello, does anyone have experience with vision transformers?

I get weird grid artifacts, especially on white/bright, textureless walls or the sky.

Here is how it looks: https://imgur.com/a/dwF69Z3
I'm using the MAXIM architecture: https://github.com/vztu/maxim-pytorch

My general task is image enhancement (make the image prettier).
I have also tried simple GAN methods (https://github.com/eezkni/UEGAN), which don't have such issues.

I have researched a bit but I'm unable to formulate this problem properly. I have found that guided filters might help here but haven't tested them yet. Thanks

1

henkje112 t1_jcxlc7t wrote

Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.
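
A minimal sketch of that kind of architecture, assuming Keras and log-mel spectrogram inputs of a hypothetical fixed shape (128 mel bands x 128 frames) with 30 sound classes:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Toy CNN over (mel_bands, frames, 1) spectrogram "images".
    model = tf.keras.Sequential([
        layers.Input(shape=(128, 128, 1)),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),
        layers.Dense(30, activation="softmax"),  # one probability per sound class
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()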

2

darthstargazer t1_jcxqcgw wrote

Subject : Variational inference and genarative networks

I've been trying to grasp the ideas behind variational autoencoders (Kingma et al.) vs normalizing flows (e.g. RealNVP).

If someone can explain the link between the two I'd be thankful! Aren't they trying to do the same thing?

1

trnka t1_jcyped6 wrote

Some systems output the most probable token in each context, so those will be consistent given a prompt. Traditionally that could lead to very generic responses.

So it's common to add a bit of randomness into it. The simplest approach is to generate tokens according to their probability. There are many other variations on this to allow more control over how "creative" the generator can be.
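
A minimal sketch of that sampling step, assuming NumPy and a made-up next-token distribution; temperature is the usual knob for how "creative" the output gets:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_next_token(logits, temperature=1.0):
        # temperature < 1 sharpens the distribution (more deterministic),
        # temperature > 1 flattens it (more random/creative).
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    next_id = sample_next_token([2.0, 1.0, 0.2, -1.0], temperature=0.8)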

1

Papaya_lawrence t1_jcyqbyw wrote

I will be teaching a class of about 18 students. Each student will need to train their own StyleGAN2 model towards the end of the semester and I'm trying to figure out which platform I want them to use. These students will be coming from different disciplines and so ideally we'd use something like Google Colab because then they could easily work off of my code, avoid learning how to ssh into a virtual machine, using bash commands, etc. And for context, this is not a technical course so I'm more concerned with ease of use than having a detailed introduction to using a virtual/remote machine. The other parts of this course involve more reading & discussion on the history of Generative Art. So I see training their own model as a chance to bring in a hands-on approach to thinking with and about Machine Learning in a creative context. I can propose a budget to my institution so it is possible that I use a paid platform (although logistically, it may be more difficult to figure out how to allocate funds to different accounts). I've looked at Paperspace's Gradient tool as well. I know apps like RunwayML would allow students to train a model code-free, but my concern is that Runway uses transfer learning and I kind of want them to only train the model on their own data that they've collected. I'm curious if any of you have suggestions or anecdotes from your own personal experience using different platforms. Thanks in advance!

1

Xotchkass t1_jczkfku wrote

What is the input length of the LLaMA model? I can't find it anywhere.

1

djmaxm t1_jd05tgt wrote

I have a 4090 with 32GB of system RAM, but I am unable to run the 30B model because it exhausts the system memory and crashes. Is this expected? Do I need a bunch more RAM? Or am I doing something dumb and running the wrong model? I don't understand how the torrent model, the Hugging Face model, and the .pt file relate to each other...

3

SomeLongWindedIdiot t1_jd07i7z wrote

Why is AI safety not a major topic of discussion here and in similar communities?

I apologize if the non-technical nature of my question is inappropriate for the sub, but as you’ll see from my comment I think this is very important.

I have been studying AI more and more over the past months (for perspective on my level, that consists of Andrew Ng's Deep Learning course, Kaggle competitions and simple projects, reading a few landmark papers, and digging into transformers). The more I learn, the more I am both concerned and hopeful. It seems all but certain to me that AI will completely change life as we know it in the next few decades, quite possibly the next few years if the current pace of progression continues. It could change life to something much, much better or much, much worse based on who develops it and how safely they do it.

To me, safety is far and away the most important subfield in AI now, but it is one of the least discussed. Even if you think there is a low chance of AI going haywire on its own, in my admittedly very non-expert view it's obvious that we should also be concerned about the judgment and motives of the people developing and controlling the most powerful AIs, and the risks of such powerful tools being accessible to everyone. At the very least I would want discussion on actionable things we can all do as individuals.

I feel a strong sense of duty to do what I can, even if that’s not much. I want to donate a percentage of my salary to funding AI safety, and I am looking whether I can effectively contribute with work to any AI safety organizations. I have a few of my own ideas along these lines; does anyone have any suggestions? I think we should also discuss ways to shift the incentives of major AI organizations. Maybe there isn’t a ton we can do (although there are a LOT of people looking, there is room for a major movement), but it’s certainly not zero.

3

killerstorm t1_jd0yzdj wrote

Have people tried doing "textual inversion" for language models? (i.e. not in the context of Stable Diffusion)

1

VS2ute t1_jd1hjeo wrote

Are Nvidia Tesla GPUs made for immersion cooling? I notice these things don't have fans going back quite a few models. So you would need to add screaming server fans to cool them by air. I presume new datacentres use immersion cooling to reduce electricity consumption.

1

ViceOA t1_jd20dzj wrote

>Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.

Thanks for your precious advice, I'm grateful!

1

YouAgainShmidhoobuh t1_jd2qmh1 wrote

Not entirely the same thing. VAEs offer approximate likelihood estimation, but not exact. The difference here is key: VAEs do not optimize the log-likelihood directly, but do so through the evidence lower bound, an approximation. Flow-based methods are exact methods; we go from an easy, tractable distribution to a more complex one, guaranteeing at each level that the learned distribution is actually a legitimate distribution through the change-of-variables theorem.

Of course, they both (try to) learn some probability distribution of the training data, and that is how they differ from GAN approaches, which do not directly learn a probability distribution.

For more insight you might want to look at https://openreview.net/pdf?id=HklKEUUY_E
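
For reference, a sketch of the two objectives in standard notation (the usual VAE and normalizing-flow formulations, not anything specific to the linked paper):

    % VAE: maximize the evidence lower bound (ELBO), an approximation to log p(x)
    \log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]
      - \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)

    % Normalizing flow: exact log-likelihood via the change-of-variables formula,
    % with an invertible map f sending data x to a simple base variable z = f(x)
    \log p_X(x) = \log p_Z\!\left(f(x)\right) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|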

2

Gody_ t1_jd4ak8v wrote

Hello guys, would you consider this supervised or unsupervised learning?

I am using a Keras LSTM to generate new text by tokenizing the corpus, making n-grams from it, and training the LSTM to predict the next word (token): the first n-1 tokens of each n-gram are the training sample, and the "label" is the last word (token) of the n-gram. Would you consider this supervised or unsupervised ML?

Technically, I do have a label for every n-gram, its own last word, but the dataset itself was not labeled beforehand. As I am new to ML I am a little bit confused, and even ChatGPT sometimes says it's supervised and sometimes unsupervised ML.

Thanks for any answers.

0

asterisk2a t1_jd59igg wrote

Question about ML research breakthroughs and narratives.

AlexNet was not the first, not the fastest, and not the CNN that won the most prizes using Nvidia GPU CUDA cores for acceleration. Then why is it so often named as the 'it' paper in the popular MSM & AI YouTube channel narrative around AI? Even Jensen Huang, CEO of Nvidia, mentioned it in his keynote.

Is it because AlexNet can be traced back to 'Made in America' and sold to Google? And a co-author is Chief Scientist at OpenAI? And the others aren't.

2

Lucas_Matheus t1_jd5j1co wrote

In few-shot learning, are there gradient updates from the examples? If not, what difference does it make?

1

neriticzone t1_jd5se2v wrote

Feedback on stratified k fold validation

I am doing some applied work with CNNs in the academic world.

I have a relatively small dataset.

I am doing 10-fold stratified cross-validation(?), where I do an initial train-test split, and then the data in the train split is further cross-validated into a 10-fold train-validate split.

I then run the ensemble of 10 trained models against the test split, and I select the results from the best-performing model on the test data as the predicted values for the test data.

Is this a reasonable strategy? Thank you!

1

RainbowRedditForum t1_jd64o4e wrote

A CRNN is trained with log-mel features as input, calculated as follows:
the input audio is split into 30 ms frames with a 10 ms hop size, and 40 log-mel coefficients are calculated for each frame.
The CRNN performs a binary classification.
With this setup, are these two considerations true?

  • two consecutive output labels generated by the CRNN are associated with two overlapping audio frames (each of size 30 ms (0.03 s) with a 10 ms hop size);
  • for 10 minutes of audio, the CRNN should generate about 30000 output labels, each one associated with a 30 ms frame with 10 ms of overlap
1

disastorm t1_jd66swu wrote

I see, thanks. Is that basically the equivalent of having "top_k" = 1?

Can you explain what these mean? From what I understand, top_k means it considers the top K possible words at each step.

I can't exactly understand what top_p means; can they be used together?

1

Bornaia t1_jd7f50p wrote

Everyone is speaking about AI content, creative stories, texts.. but do companies or people in the real world actually use it for their products?

1

trnka t1_jd82eo1 wrote

If you're using some API, it's probably best to look at the API docs.

If I had to guess, I'd say that top_k is about the beam width in beam search. And top_p is dynamically adjusting the beam width to cover the amount of the probability distribution you specify.

top_k=1 is probably what we'd call a greedy search. It's going left to right and picking the most probable token. The sequence of tokens selected in this way might not be the most probable sequence though.

Again, check the API docs to be sure.

All that said, these are just settings for discovering the most probable sequence in a computationally efficient way. It's still deterministic and still attempting to find the most probable sequence. What I was describing in the previous response was adding some randomness so that it's not deterministic.
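
For what it's worth, a minimal sketch of how top-k and top-p (nucleus) filtering are commonly described, in NumPy; this is a generic illustration, not the specific API's implementation:

    import numpy as np

    def filter_top_k_top_p(probs, top_k=None, top_p=None):
        probs = np.asarray(probs, dtype=float)
        mask = np.ones_like(probs, dtype=bool)
        if top_k is not None:
            # Keep only the k most probable tokens.
            keep = np.argsort(probs)[-top_k:]
            mask &= np.isin(np.arange(len(probs)), keep)
        if top_p is not None:
            # Keep the smallest set of tokens whose cumulative probability >= top_p.
            order = np.argsort(probs)[::-1]
            cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
            mask &= np.isin(np.arange(len(probs)), order[:cutoff])
        filtered = np.where(mask, probs, 0.0)
        return filtered / filtered.sum()  # renormalize before sampling

    print(filter_top_k_top_p([0.5, 0.3, 0.1, 0.1], top_k=3, top_p=0.9))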

1

throwaway2676 t1_jd8qe6f wrote

When training LLMs to write code, is it standard to just make indentation and newline their own tokens? Like '<\n>' and '<\ind>' or something?

Follow up: Are there any good models on HuggingFace that specialize in writing and explaining code?

2

GaryS2000 t1_jd9tanf wrote

For my final year uni project I need to train a TensorFlow CNN on the FER-2013 dataset. When training the model on data from the .csv file instead of locally stored images, the model trains significantly faster, at around 10 seconds per epoch as opposed to 10 minutes or so for the images. My question is: is it okay for me to use the .csv data instead of locally stored images for this image classification task? I know I won't be able to apply data augmentation as easily, but I can't think of any other downsides that would disqualify me from using the .csv data instead of the images.

1

sore__ t1_jdaskxp wrote

I want to make an AI Chatbot similar to OpenAI's DaVinci 3 but my own version & offline. I'm trying to use Python but I don't know what intents I should add to it, because I want it to know basically everything. Is it possible to just feed the code everything on Wikipedia? I'm VERY VERY new to machine learning so this might be overambitious but idk it just seems fun. Anyways, if anyone has ideas, please reply :)

1

weaponized_lazyness t1_jdbvolq wrote

Is there a subreddit for more academic discussions on ML? This space has now been swarmed by LLM enthusiasts, which is fine but it's not the content I was looking for.

2

GaryS2000 t1_jdc74v5 wrote

Like I said, the .csv data. It's the same data as the image dataset, with one of the columns containing the pixel values of the images, meaning the images can be reconstructed from the file.

1

andrew21w t1_jdcb0vo wrote

Why does nobody use polynomials as activation functions?

My perception is that polynomials are the best since they can approximate nearly any kind of function you like, so they seem perfect...

But why aren't they used?

2

GaryS2000 t1_jdcd6xq wrote

Yeah the csv file has three columns separated into emotion, pixels, and usage. Emotion corresponds to the labels whereas usage corresponds to training/test/val, and the pixels column is made up of all of the pixel values used to make the image. It seems to produce much quicker training times than using the images, which is my main reason for wanting to use it. Training on .csv takes around 10 seconds per epoch whereas images take 10 minutes or so.

They both produce the same result: a trained model which can make predictions on facial expressions. However, it has felt weird throughout the entire process that the model trains so quickly, you know? I've been led to believe that machine learning is an extremely time-intensive process, but for me it hasn't taken long at all, so I was wondering if there's some fundamental error with using the .csv data instead of the images. Hopefully it should be fine, though; I don't see the issue myself if it produces the same result.

1

rikiiyer t1_jddanig wrote

The 30B params of the model are going onto your GPU's VRAM (which should be 24GB), which is causing the issue. You can try loading the model in 8-bit, which could reduce the size.
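
A minimal sketch of the 8-bit route, assuming the Hugging Face transformers + bitsandbytes stack and a hypothetical local checkpoint path; exact flags vary by library version:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "path/to/llama-30b"  # hypothetical local checkpoint directory
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_8bit=True,   # quantize weights to 8-bit via bitsandbytes
        device_map="auto",   # spread layers across GPU/CPU as memory allows
    )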

1

jarmosie t1_jddmvp9 wrote

What are some informative blogs, RSS feeds, or newsletters you've subscribed to for regular content on machine learning? In general, the software development community has an abundance of people maintaining high-quality online content through individual blogs or newsletters.

I know there's Towards Data Science & Machine Learning Mastery, to name a few, but what other lesser-known yet VERY informative resources did you stumble across, ones which have helped you further your knowledge even more?

1

underPanther t1_jddpryu wrote

Another reason: wide single-layer MLPs with polynomials cannot be universal. But lots of other activations do give universality with a single hidden layer.

The technical reason behind this is that discriminatory activations can give universality with a single hidden layer (Cybenko 1989 is the reference).

But polynomials are not discriminatory (https://math.stackexchange.com/questions/3216437/non-trivial-examples-of-non-discriminatory-functions), so they fail to reach this criterion.

Also, if you craft a multilayer perceptron with polynomials, does this offer any benefit over fitting a Taylor series directly?
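
For reference, a sketch of Cybenko's definition in standard notation: an activation \sigma is discriminatory if, whenever the integral below vanishes for a finite signed measure \mu on [0,1]^n and for all y and \theta, then \mu = 0; for such activations, finite sums of the second form are dense in C([0,1]^n):

    \int_{[0,1]^n} \sigma\!\left(y^{\top} x + \theta\right) d\mu(x) = 0
      \quad \forall\, y \in \mathbb{R}^n,\ \theta \in \mathbb{R}
      \;\Longrightarrow\; \mu = 0

    g(x) = \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(y_j^{\top} x + \theta_j\right)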

2

mcAlt009 t1_jde6kby wrote

What's a VM I can rent with a GPU? Ideally I want a VM where I can train models, host websites, etc. Location isn't too important.

1

lightyagami03 t1_jde9kv5 wrote

Is it even worth trying to break into AI/ML now as a CS student, or has everything already been solved/will be solved in the near future? Like, the jump from GPT-3.5 to 4 was insane; soon GPT-5 will roll out and it'll be even better, and GPT-6 might as well be AGI, at which point there wouldn't be anything to work towards.

1

Nyanraltotlapun t1_jdeltnm wrote

Long story short, the main property of complex systems is the ability to pretend and mimic. So the real safety of AI lies in its physical limitations (compute power, algorithms, etc.), the same limitations that make it less useful and less capable. So the more powerful an AI is, the less safe it is and the more danger it poses. And it is dangerous, all right. More dangerous than nuclear weapons are.

1

TiredMoose69 t1_jdf31vk wrote

Why does LLaMA 7B (pure) perform so MUCH better than Alpaca 30B (4-bit)?

1

JimiSlew3 t1_jdfk435 wrote

Noob question: is there anything linking LLMs with data analysis and visualization yet? I saw a bit with MS Copilot and Excel. I want to know if there is anything more advanced in the works. Thanks!

2

nth_citizen t1_jdgl589 wrote

I'm not aware of anything like this and depending on your vision I can certainly see something like the first step being reasonable - might be willing to help as it sounds kind of interesting.

1

dotnethero t1_jdgqv13 wrote

Hey everyone, I'm trying to figure out which parts of my code are using CPU and which are using GPU. During training, I've noticed that only about 5% of my usage is on the GPU, while the CPU usage is high. Any tips on how I can better understand what's going on with my code? Thanks in advance!

1

kross00 t1_jdgutr0 wrote

Is it feasible to train Llama 65B (or smaller models) to engage in chit-chatting in a manner that would not readily reveal whether one is conversing with an AI or a human? The AI does not need to answer highly complex questions and could decline them similarly to how a human would.

1

Kaasfee t1_jdhcnlf wrote

I'm trying to train YOLOv7 to detect football (the European kind) players and the ball. In a typical frame there are lots of players and only one ball. After training, it only detects the players. My guess is that it learned to ignore guessing the ball since it's statistically irrelevant. Is this assumption correct, and if so, how would I go about changing it?

1

LeN3rd t1_jdhe9qb wrote

What language/suite are you using? You can take a look at profilers for your language. I know TensorFlow has some profiling tools, and you can look at what operations are running on which device. PyTorch probably has some as well. If it's more esoteric, just use a general language profiler and take a look at what your code is doing most of the time.
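
For the PyTorch side, a minimal sketch using torch.profiler (assuming a CUDA-capable setup; the model and inputs are stand-ins for the real training step):

    import torch
    import torchvision.models as models
    from torch.profiler import profile, ProfilerActivity

    model = models.resnet18().cuda()             # stand-in model
    inputs = torch.randn(8, 3, 224, 224).cuda()  # stand-in batch

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        model(inputs)

    # Shows which ops dominate and whether they ran on CPU or GPU.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))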

2

trnka t1_jdhvzy3 wrote

Eh, we've gone through a lot of hype cycles before and the field still exists. For example, deep learning was hyped to replace all feature engineering for all problems and then NLP would be trivialized. In practice, that was overhyped and you still need to understand NLP to get value out of deep learning for NLP. And in practice, there's still quite a bit of feature engineering (and practices like it).

I think LLMs will turn out to be similar. They'll change the way we approach many problems, but you'll still need to understand both LLMs and more problem-specific aspects of ML.

Back to your question, if you enjoy AI/ML and you're worried about jobs in a few years, I think it's still worth pursuing your interests.

If anything, the bigger challenge in jobs in the next year or two is the current job market.

1

JimiSlew3 t1_jdi0izc wrote

Thanks. I'm curious about once we get it to do things, like telling it to analyze a giant dataset and produce a visual of interesting findings. Some tools I use will offer suggestions, and I'm thinking the link between asking a question and getting information will be significantly shortened; I wanted to know if anyone had done that yet.

1

Chris_The_Pekka t1_jdi7gqr wrote

Hello everyone, I have a dataset with news articles and real radio messages written by journalists. Now I want to generate radio messages that look like real radio messages so that this no longer has to be done manually. I wanted to use a GAN structure that uses a CNN as discriminator and an LSTM as generator (as literature from 2021 suggested). However, now that GPT has become very strong, I want to use GPT. Could I use GPT as both the discriminator and the generator, or only the generator (using GPT as the generator seems good, but I will need to do prompt optimization)? Has anyone got an opinion or suggestion (or a paper/blog I could read that I might have missed)? I am doing this for my thesis and it would help me out greatly. Or maybe I am too fixated on using a GAN structure and you would suggest looking into something else.

1

Prometheushunter2 t1_jdjhln9 wrote

Here’s an oddly specific question: a few years ago I read about a neural network that could both classify an image and, if ran in reverse, could generate synthetic examples of the classes it has learned. Th e problem is I’ve forgotten the name and it’s been haunting me lately, so I ask does anyone know what kind of neural network this might be?

1

jay_hoenes t1_jdk0xl8 wrote

I was wondering if there are any new models like StyleGAN?
I mean, image generation recently became much easier with text-to-image models like Stable Diffusion, Midjourney, DALL-E and so on. But I like the general idea of training my own model with a unique input dataset.
I found that there is StyleGAN3, but apart from one Google Colab notebook which doesn't work for me, it doesn't seem to be well supported or really used by people.
Are there any recent alternatives for creating a variety of images based only on my personal input images, without being trained on huge datasets? Or is that maybe possible with Stable Diffusion?

1

loly0ss t1_jdloewa wrote

Hello everyone,

I had a very ignorant question which I'm trying to find an answer to, but I still couldn't find one.

It's about the deep learning model in supervised segmentation vs semi-supervised segmentation.

Is the model itself the same in both cases, for example using UNet++ for both, with the only difference coming during training, where we use pseudo-labels for semi-supervised segmentation?

Or is the model different between supervised and semi-supervised segmentation?

Thank you!

1

RiotSia t1_jdmhn6h wrote

Hey,

I got the 7B LLaMA model running on my machine. Now I want it to analyze a large text for me (a PDF file), like hamata.ai does. How can I do it? Does anyone have a site with resources on how I can learn to do that, or can you tell me how?

1

yaru22 t1_jdn17j5 wrote

Hello,

GPT-4 has a context length of 32K tokens, while some others have 2-4K tokens. What decides the limit on these context lengths? Is it simply: the bigger the model, the larger the context length? Or is it possible to have a large context length even on a smaller model like LLaMA 7/13/30B?

Thank you!

1

ajingnk t1_jdn5uwr wrote

What is the minimum hardware requirement to fine-tune something like Stanford Alpaca? I am thinking of building a workstation to do some DL exploration and fine-tuning work. For fine-tuning, I have around 10k samples.

1

sampdoria_supporter t1_jdnwwdd wrote

Does anybody else feel overwhelmed and frozen in the face of all this concurrent development and releases? I can't seem to even jump on much of what is going on because it seems like the next day will just flip the table.

2

LacedDecal t1_jdp6z6y wrote

If one is trying to model something where the "correct" answer for a given set of features is inherently probabilistic—for example the outcome of a baseball plate appearance—how should you tell a neural network to grade its accuracy?

For those who aren’t familiar with baseball, the most likely outcome for any plate appearance — even the leagues best batter against the leagues worst pitcher — is some kind of out. Generally somewhere on the order of 60-75% that will be the outcome. So I’m realizing that the most “accurate” set of predictions against literally any dataset of at bats were to predict “out” for every one.

What I'm realizing is that the "correct" answer I'm looking for is a set of probabilities. But how does one apply, say, a loss function involving categorical cross-entropy in any kind of meaningful way? Is there even a way to do supervised learning when the data point's "label" isn't the actual probability distribution but rather one collapsed event drawn from each "true" probability distribution?

Am I even making sense?

Edit: I know I need something like softmax but when I start training it quickly spirals into a case of exploding gradients no matter what I do. I think it’s because the “labels” I’m using aren’t the true probabilities each outcome had, but rather a single hard max real life outcome that actually occurred (home run, out, double, etc).
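
For what it's worth, a minimal sketch of the standard setup, assuming PyTorch: cross-entropy against the single observed outcome (not a true probability vector) is the usual recipe, and with enough data the softmax outputs approach the outcome frequencies for each feature combination:

    import torch
    import torch.nn as nn

    n_features, n_outcomes = 10, 5   # e.g. out, single, double, triple, home run
    model = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_outcomes))

    # Hypothetical batch: plate-appearance features and the one outcome that occurred.
    x = torch.randn(64, n_features)
    y = torch.randint(0, n_outcomes, (64,))

    # CrossEntropyLoss applies log-softmax internally; targets are class indices.
    loss = nn.CrossEntropyLoss()(model(x), y)
    loss.backward()

    # At inference time, softmax gives the estimated outcome probabilities.
    probs = torch.softmax(model(x[:1]), dim=-1)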

1

AntelopeStatus8176 t1_jdq1t5p wrote

I have a set of 20,000 raw measurement data slices, each of which contains 3,000 measurement sample points. For each data slice, there is a target value assigned to it. The target values are continuous.

My first approach was to do feature engineering on the raw measurement slices to reduce the data and speed up training. This approach works reasonably well at estimating the target value for unknown data slices from the test set.

My second approach would be to use the raw data slices as input. On second thought, this appears to be dramatically compute-intensive, or at least more than I can handle with my standard PC. To my understanding, this would mean constructing an ANN with 3,000 input nodes and several deep layers.

Can anyone give advice on whether training with raw measurement data on datasets of this size even makes sense, and if so, which algorithms to use? Preferably with examples in Python.

1

Camillo_Trevisan t1_jdqjtbp wrote

Hello everyone,

I should say up front that I am a neophyte.

I'm looking for machine learning software that can analyze large datasets composed as follows: a 3D surface defined by triplets of XYZ values (at least 150 triplets or more, defined on a regular and constant grid or, possibly, also on an irregular grid, different for each set) and the related outputs, produced by my software, which contain about seventy numerical parameters calculated on that surface. I would like to analyze a few thousand datasets, each consisting of at least 500-600 or more numerical values.

The idea is both to analyze the entered data and also to carry out simulations such as: if I define a new set of output values, which 3D surface could generate them using my software?

The utility is given by the fact that my software takes many hours of calculation to generate a set of output values and also it only works in one direction (input grid -> output values).

Thanks in advance for any suggestion

Camillo

1

yaru22 t1_jdron1b wrote

So it's not an inherent limitation on the number of parameters the model has? Or is that what you meant by more processing power? Do you or does anyone have some pointers to papers that talk about this?

1