Submitted by AutoModerator t3_122oxap in MachineLearning

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

Thread will stay alive until next one so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

16

Comments

fishybird t1_jdtf6jh wrote

Anyone else bothered by how often LLMs are being called "conscious"? In AI-focused YouTube channels and even in this very sub, comments are getting dozens of upvotes for saying we're getting close to creating consciousness.

I don't know why, but it seems dangerous to have a bunch of people running around thinking these things deserve human rights simply because they behave like a human.

6

pale2hall t1_jdva40w wrote

Great point! I actually really enjoy AIExplained's videos on this. There are a bunch of different ways to measure 'consciousness', and many of them are passed by GPT-4, which really just means we need new tests / definitions for AI models.

4

fishybird t1_jdvy0h1 wrote

Well yeah, that's the whole problem! Why are we even calling them "tests for consciousness"? Tests for consciousness don't exist, and the only reason we are using the word "consciousness" is pure media hype. If an AI reporter even uses the word "conscious" I immediately know not to trust them. It's really sad to see that anyone, much less "experts", is seriously discussing whether or not transformers can be conscious.

3

colincameron49 t1_jega9ag wrote

I have zero experience with machine learning, but I'm looking to solve a problem I have and wondering if ML might be the solution. I'm looking for some guidance on tools and how to get started on the project as quickly as possible.

I work in agriculture, and some portion of my time is spent reviewing pesticide labels for certain attributes. I have tried different document-parsing platforms, but the labels differ slightly between manufacturers, so the structure has been hard to nail down. The other issue is that I am specifically looking for certain keywords in these documents, since my company sells products that can be paired with pesticides to make them work better.

I am hoping to build a workflow where I could drop a PDF into a folder and have software spit out some sort of structure around ingredients and instructions while flagging the keywords. I am decently proficient in no-code platforms if one exists for my problem. Thanks in advance for any guidance. If this is the wrong subreddit for this, I apologize.
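As a starting point short of a full no-code platform, here is a minimal sketch of just the keyword-flagging step, assuming the pypdf library; the folder name and keyword list are placeholders, and real labels would need more robust parsing of the ingredient/instruction sections.

```python
# Hypothetical sketch: scan PDFs dropped into a folder and flag keywords.
# Assumes the pypdf package (pip install pypdf); folder and keywords are placeholders.
from pathlib import Path
from pypdf import PdfReader

WATCH_FOLDER = Path("incoming_labels")             # placeholder folder
KEYWORDS = ["adjuvant", "surfactant", "tank mix"]  # placeholder keywords

def flag_keywords(pdf_path: Path) -> dict:
    reader = PdfReader(str(pdf_path))
    # Concatenate the text of every page; extract_text can return empty strings.
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    lower = text.lower()
    return {kw: kw.lower() in lower for kw in KEYWORDS}

for pdf in WATCH_FOLDER.glob("*.pdf"):
    hits = flag_keywords(pdf)
    print(pdf.name, {k: v for k, v in hits.items() if v})
```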

3

Various_Ad7388 t1_jdsp2q8 wrote

Hey all, if I am just starting off in machine learning, what should I learn first: TensorFlow, PyTorch, or something else? Also, once I'm more experienced, where do I go from there?

2

Matthew2229 t1_jduyuw9 wrote

I think either is probably fine to learn. Both have roughly the same set of features at this point. TF used to be the predominant framework, but PyTorch has gained popularity over the past few years. Whether it'll stay that way or a new trend will emerge in the future, no one can say for sure.

1

Various_Ad7388 t1_jdvxeer wrote

Hey, thanks Matthew! Do you know why PyTorch has gained popularity? Is it just the hot new thing, or are there actual features and aspects that are dramatically better?

1

gmork_13 t1_je7h3rc wrote

Having started with TF and moved to torch myself, torch was just easier to work with when doing something a bit out of the ordinary. Since then it has gained in popularity, and with popularity comes lots of walkthroughs, documentation, video guides, and research papers with GitHub repos.

1

gmork_13 t1_je7gvde wrote

Definitely start with torch. It scales all the way up; just start building more complex things.

1

zaemis t1_jdtm2zm wrote

I'm going to train a GPT model (distilgpt2) in a language other than English. At this point I'm just teaching it the language - not worrying about further abilities such as Q&A; I expect that to come later with fine-tuning. Anyway, my dataset is currently a CSV with [id, text], and each text is a paragraph.

It is my understanding that only the first 512 tokens will be fed in (depending on my max_length, but my point is that it'll probably be less than the entire length of the paragraph), and anything beyond that will be ignored. If I were to break the paragraphs into 512-token chunks, I could make better use of the dataset. But most likely those subsequent chunks wouldn't start at a phrase or sentence boundary; they'd start in the middle of a sentence.

For example, "The quick brown fox jumped over the lazy sleeping dog." might be broken up into two samples. "The quick brown fox jumped over the lazy" and "sleeping dog."

Is it a problem if I use text samples that don't "start properly?"
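For reference, a minimal sketch of that chunking with the Hugging Face tokenizer for distilgpt2, assuming the text comes in paragraph by paragraph from the CSV column you describe:

```python
# Sketch: split each paragraph into fixed-size token chunks for causal LM training.
# Assumes Hugging Face transformers is installed; 512 is the max_length assumed above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
MAX_LEN = 512

def chunk_paragraph(text: str):
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    # Non-overlapping chunks; the last one may be shorter than MAX_LEN.
    return [ids[i:i + MAX_LEN] for i in range(0, len(ids), MAX_LEN)]

paragraph = "The quick brown fox jumped over the lazy sleeping dog."
for chunk in chunk_paragraph(paragraph):
    print(len(chunk), tokenizer.decode(chunk)[:60])
```

Starting chunks mid-sentence is usually considered tolerable for plain language-modelling pretraining; using a small overlap (stride) between consecutive chunks is a common way to soften the mid-sentence-start issue.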

2

masterofn1 t1_jdu8jug wrote

How does a Transformer architecture handle inputs of different lengths? Is the sequence length limit inherent to the model architecture or more because of resource issues like memory?

2

Matthew2229 t1_jduyi8o wrote

It's a memory issue. Since the attention matrix scales quadratically (N^2) with sequence length (N), we simply don't have enough memory for long sequences. Most of the development around transformers/attention has been targeting this specific problem.
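A rough back-of-the-envelope illustration of that quadratic growth (the head count and dtype here are arbitrary, not tied to any particular model):

```python
# Rough memory estimate for storing a single attention matrix in fp32.
# Illustrative only: real implementations add batch dimensions and extra
# buffers, but the N^2 growth is the point.
def attn_matrix_mib(seq_len: int, num_heads: int = 12, bytes_per_el: int = 4) -> float:
    return num_heads * seq_len * seq_len * bytes_per_el / 2**20

for n in (512, 2048, 8192, 32768):
    print(f"N={n:>6}: ~{attn_matrix_mib(n):,.0f} MiB per layer per example")
```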

2

topcodemangler t1_jduuhcf wrote

Is there any real progress on the JEPA architecture proposed and pushed by LeCun? I see him constantly bashing LLMs and saying how we need JEPA (or something similar) to truly solve intelligence, but it has been a long time since the initial proposal (two years?) and nothing practical has come out of it.

It may sound a bit aggressive, but that was not my intention - the original paper really sparked my interest and I agree with a lot of what he has to say. It's just that I would want to see how those ideas fare in the real world.

2

Dartagnjan t1_jdzo44q wrote

Is anyone in need of a machine learning protégé? I am looking for a doctorate position in the German- or English-speaking world.

My experience is in deep learning, specifically GNNs applied to science problems. I would like to remain broadly within deep learning, but I would not mind changing topic to some other application, or to a more theoretical research project.

I am also interested in theoretical questions, e.g. given a well-defined problem (such as approximating the solution of a PDE), what can we say about its "training difficulty"? Is optimization possible at all (cf. neural tangent kernel analysis)? How do architectures help facilitate optimization? I am also drawn to solid mathematical foundations of deep learning theory.

I have a strong mathematical background with knowledge in functional analysis and differential geometry, and also hold a BSc in Physics, adjacent to my main mathematical educational track.

Last week I also started getting into QML with PennyLane, and I find the area quite interesting as well.

Please get in touch if you think I could be a good fit for your research group or know an open position that might fit my profile.

2

thomasahle t1_je14a0c wrote

Are there any "small" LLMs, like 1MB, that I can include, say, on a website using ONNX to provide a minimal AI chat experience?

2

thedamian t1_je5eweg wrote

Before answering the question, I would submit that you should be thinking of keeping your model behind an API. There's no need to have it sitting on the client side (which, it seems, is why you're asking the question).

And behind an API it can be as big as you'd like, or as big as you can afford on your server.
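A bare-bones sketch of that "behind an API" setup, assuming FastAPI, uvicorn, and a Hugging Face pipeline; the model, route, and parameters are placeholders, not a recommendation:

```python
# Sketch: serve a small language model behind an HTTP API instead of
# shipping it to the browser. Assumes fastapi, uvicorn and transformers.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="distilgpt2")  # placeholder model

class Prompt(BaseModel):
    text: str

@app.post("/chat")
def chat(prompt: Prompt):
    out = generator(prompt.text, max_new_tokens=50)
    return {"reply": out[0]["generated_text"]}

# Run with: uvicorn app:app --port 8000
```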

2

RandomScriptingQs t1_je3lv1g wrote

Is anyone able to contrast MIT's 6.034 "Artificial Intelligence, Fall 2010" with 18.065 "Matrix Methods in Data Analysis, Signal Processing, and Machine Learning, Spring 2018"?
I want to use the one that lies slightly closer to the theoretical/foundational side as supplementary study, and I have really enjoyed listening to both instructors in the past.

2

james_mclellan t1_je5ru4r wrote

Two questions:

(1) Does anyone create missing data when constructing models? Examples: searching for stronger relationships between the data set and first and second derivatives of time-series data, comparisons to the same day of week in the last N periods or the same holiday in the last N periods, and examining distance to an urban center for geodata.

(2) Does anyone use a model that falls back on functions when a match is not 100%? For example, "apple" may mean fruit, music, machines, music companies or machine companies -- instead of a number from 0 to 1 for the probable meaning, does anyone use models where the code "performs a test" to better disambiguate?
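On question (1), derived features like those are common in practice. A minimal pandas sketch, where the daily series, lags, and column names are made up purely for illustration:

```python
# Sketch of "created" features: finite-difference derivatives and
# same-day-of-week comparisons on a daily time series.
import pandas as pd

df = pd.DataFrame(
    {"value": range(60)},
    index=pd.date_range("2023-01-01", periods=60, freq="D"),
)

df["d1"] = df["value"].diff()                    # first derivative (daily change)
df["d2"] = df["d1"].diff()                       # second derivative
df["same_dow_last_week"] = df["value"].shift(7)  # same day of week, one period back
df["vs_last_week"] = df["value"] - df["same_dow_last_week"]
print(df.tail())
```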

2

gmork_13 t1_je7fmm8 wrote

I'm assuming you don't mean missing values in your dataset.

  1. You can create 'missing' data, but if you create it out of the data you already give to the model, you're sort of doing the work for it. For compute-efficiency reasons you might want to avoid giving it 'unnecessary' data; what counts as unnecessary can be hard to define. Think about what you want the model to grasp in the first place.

  2. I'm not sure what you mean by performing a test. If you were to train a language model, the context of the word would define its meaning. You can always take the output probabilities of a model and do something with them if you'd like (for instance, if there are lots of low-probability alternatives, trigger some extra step - see the sketch below).
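A tiny sketch of that last idea (thresholding the output probabilities to decide when to run an extra disambiguation step); the threshold and scores are made up:

```python
# Sketch: flag low-confidence predictions (e.g. an ambiguous "apple")
# so the surrounding code can run an extra disambiguation step.
import torch
import torch.nn.functional as F

def needs_disambiguation(logits: torch.Tensor, threshold: float = 0.6) -> bool:
    probs = F.softmax(logits, dim=-1)
    return probs.max().item() < threshold  # no class is clearly dominant

logits = torch.tensor([1.2, 1.1, 0.9, 0.8])  # made-up scores for 4 senses
if needs_disambiguation(logits):
    print("Ambiguous - run an extra test / ask for more context")
```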

1

Nobodyet94 t1_je6pqi2 wrote

Can you advise me on a Vision Transformer project to present at university? Thanks!

2

gmork_13 t1_je7eo4s wrote

Does it have to be a transformer?
Have a look at this model, but it's difficult to answer your question without knowing the compute you have access to: https://paperswithcode.com/method/deit

Browse that site for some alternatives.
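If DeiT ends up being the choice, a minimal sketch of loading a pretrained variant through the timm library; the model name and num_classes are just examples, and fine-tuning details are up to you:

```python
# Sketch: load a small pretrained DeiT and adapt its head to a new dataset.
# Assumes timm and torch are installed; num_classes is a placeholder.
import timm
import torch

model = timm.create_model("deit_tiny_patch16_224", pretrained=True, num_classes=10)
dummy = torch.randn(1, 3, 224, 224)   # one fake 224x224 RGB image
print(model(dummy).shape)             # -> torch.Size([1, 10])
```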

1

Nobodyet94 t1_jealqfl wrote

Thanks! Well, I have a GTX 1660 and 16 GB of RAM, and yes, it has to be a transformer used for vision. The fact is that I am not creative enough to choose a project, haha.

1

gmork_13 t1_jebwta4 wrote

Just pick the one that doesn't require too much compute (don't go for too high res images) and make sure you can find tutorials or guides for it.

1

Nobodyet94 t1_jegotr3 wrote

Is the paper you mentioned before fine to replicate? How should I start?

1

itsyourboiirow t1_jefs1oh wrote

Any people/organizations to follow on Twitter for all things machine learning (traditional, deep neural networks, LLMs, etc.)?

2

russell616 t1_jdrbedj wrote

Dumb question that's probably been asked multiple times, but where should I continue in learning ML? I went through the TensorFlow cert from Coursera and am yearning for more. I just don't know where to go now without a structured curriculum.

1

Username2upTo20chars t1_jdrxowm wrote

Try a Kaggle competition for some practical experience of applying ML to already-cleaned data. Other competitors always publish code, and Kaggle also has tutorials.

2

gmork_13 t1_je7hec6 wrote

What are you interested in?
I'd recommend covering some classification and generation using images and text, with several different models and data sets.

1

Chris_The_Pekka t1_jdrfr4l wrote

Hello everyone, I have a dataset with news articles and real radio messages written by journalists. Now I want to generate radio messages that look like real radio messages, so that this no longer has to be done manually. I wanted to use a GAN structure that uses a CNN as the Discriminator and an LSTM as the Generator (as literature from 2021 suggested). However, now that GPT has become very strong, I want to use GPT. Could I use GPT as both the Discriminator and the Generator, or only the Generator? (Using GPT as the Generator seems to be good, but I will need to do prompt optimization.) Has anyone got an opinion or suggestion (or a paper/blog I could read that I might have missed)? I am doing this for my thesis and it would help me out greatly. Or maybe I am too fixated on using a GAN structure, and you'd suggest I look into something else.

1

Username2upTo20chars t1_jdrydqw wrote

I am confused about your mention of GAN structure. If you want to generate natural language text, use a pretrained Large Language Model. You probably have to finetune it for best use, as you don't have access to the giant ones, which do very well with zero-shot prompting.

Besides the usual pretrained options, there are also RWKV-4 and FAIR's LLaMA.
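To make the "fine-tune a pretrained LLM" route concrete, here is a rough sketch with the Hugging Face Trainer on a small causal LM; the base model (gpt2), file name, text format, and hyperparameters are placeholders, and a real run needs a proper train/validation split:

```python
# Rough sketch: fine-tune a small causal LM on (article -> radio message) pairs
# formatted as plain text. Assumes transformers and datasets are installed.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder dataset: one training example per line, e.g.
# "ARTICLE: ... ### RADIO: ..."
ds = load_dataset("text", data_files={"train": "radio_pairs.txt"})
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
            batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="radio-gpt2", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=ds["train"],
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```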

1

Username2upTo20chars t1_jdrzb3d wrote

Are there any websites/articles/blogs/forums with proven prompt formats for ChatGPT and co. that you can recommend?

Especially ones for programming/refactoring/tests... and general error messages (operating system, installation, crashes).

I am just starting to look into using ChatGPT or alternatives.

I have found a page with ranked jailbreak prompts for ChatGPT so far.

1

Kush_McNuggz t1_jdsdy2y wrote

I'm learning the very basics of clustering and classification algorithms. From my understanding, these use hard cutoffs to set boundaries between the groups in the outputs. My question is - do modern algorithms allow for smoothing or "adding weight" to the boundaries, so they are not just hard cutoffs? And if so, are there any applications where you've seen this done?

1

Matthew2229 t1_jduz7mi wrote

When you're clustering or classifying, you are predicting something discrete (clusters/classes), so it's unclear what you mean by removing these hard cutoffs. There must be some kind of hard cutoff when doing clustering/classification unless you are okay with something having a fuzzy classification (e.g. 70% class A / 30% class B).
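For reference, many standard models already expose those soft assignments; a small scikit-learn sketch with toy data, purely illustrative:

```python
# Sketch: soft ("fuzzy") assignments instead of hard cutoffs.
# A Gaussian mixture gives per-cluster membership probabilities.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

hard = gmm.predict(X[:3])          # hard cluster labels
soft = gmm.predict_proba(X[:3])    # e.g. [0.7, 0.2, 0.1] per point
print(hard)
print(soft.round(2))
```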

1

Kush_McNuggz t1_jdvwik4 wrote

Ah ok thanks, I see now. I didn't know the correct term for fuzzy classification but that's what I was trying to describe.

1

kross00 t1_jdui6ot wrote

Can AlphaTensor be utilized to solve math problems beyond matrix multiplication algorithms?

1

AlgoTrade t1_jdwa6it wrote

Hey everyone, I am looking for a way to take some old maps and overlay them using Google's overlay features.
Google is kind enough to overlay the maps for me if I give precise lat/long boundaries for the image, but I'm unsure of some of those lat/long values. Moving and centering the map works fine for me, but it is extremely manual. I was wondering if there are any tools or techniques that exist to auto-tag maps/lines/boundaries? Any information helps, or even just a few key search terms to look for!
Thanks!

1

ReasonablyBadass t1_jdx7f88 wrote

I still remember the vanishing/exploding gradient problem. It seems to be a complete non-issue now. Was it just ReLUs and skip connections that solved it?

1

OnlyAnalyst9642 t1_jdxlki8 wrote

I have a very specific problem where I am trying to forecast tomorrow's electricity price with an hourly resolution (from tomorrow at midnight to tomorrow at 11pm). I need to forecast prices before 10AM today. Electricity prices have very strong seasonality (24 hours) and I am using the whole day of yesterday and today up to 10AM as an input to the model (an input of 34 hours). In tensorflow terms (https://www.tensorflow.org/tutorials/structured_data/time_series) my input width is 34, the offset is 14 and the label width is 24.

Since I only care about the predictions I get at 10AM for the following day, should I only train my model with the observations available at 10am?

I am pretty sure this has been addressed before. Any documentation/resources that consider similar problems would help

Thanks in advance!
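For what it's worth, here is a sketch of building the training set so every sample matches the 10AM setup described above (pandas, with a made-up hourly price series; variable names and dates are placeholders):

```python
# Sketch: build (input, target) pairs anchored at 10AM each day.
# Input  = the 34 hours ending at 10AM today (yesterday 00:00 .. today 09:00)
# Target = tomorrow's 24 hourly prices (00:00 .. 23:00)
import numpy as np
import pandas as pd

prices = pd.Series(np.random.rand(24 * 60),
                   index=pd.date_range("2023-01-01", periods=24 * 60, freq="H"))

X, y = [], []
for day in pd.date_range(prices.index[0].normalize() + pd.Timedelta(days=1),
                         prices.index[-1].normalize() - pd.Timedelta(days=1),
                         freq="D"):
    inp = prices.loc[day - pd.Timedelta(hours=24): day + pd.Timedelta(hours=9)]
    tgt = prices.loc[day + pd.Timedelta(days=1): day + pd.Timedelta(days=1, hours=23)]
    if len(inp) == 34 and len(tgt) == 24:
        X.append(inp.values)
        y.append(tgt.values)

X, y = np.array(X), np.array(y)
print(X.shape, y.shape)   # (num_days, 34), (num_days, 24)
```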

1

MammothJust4541 t1_jdz0nxj wrote

If I wanted to make a system that takes an image and transforms it into the style of another image, what sort of ML model would I want to use?

1

shiuidu t1_jdzc9o6 wrote

I have a project I want to build a natural language interface to. Is there a simple way to do this? It's a .NET project, but I also have a Python project I want to do the same thing for.

1

GirlScoutCookieGrow t1_jdzvasi wrote

OpenAI API? It's not clear exactly what you need

1

shiuidu t1_je3iozf wrote

I'm not too sure either; I don't know enough about how APIs are connected to LLMs. Do you know what I should search for to implement the API so it can control the program?

1

SnooMarzipans3021 t1_jdzehws wrote

Hello, does anyone have suggestions on how to do guided image upscaling?
Basically, I have a 6000x6000 image which I'm unable to load into the network because of GPU memory. I had this idea of resizing the image to something like 1500x1500 and then upscaling it back to 6000x6000. But I have to do it without losing details, and I don't want to use super-resolution models (I'm afraid they will hallucinate and inpaint). If I already have the ground-truth resolution, how can I use it to guide the upscaling?

1

GirlScoutCookieGrow t1_jdzv85v wrote

I'm not sure I understand what you hope to accomplish. If you have the full-size image, why do you want to downscale and upscale? This won't help you fit the full image on the GPU.

1

SnooMarzipans3021 t1_je3x9ah wrote

I'm unable to load the full-resolution image into the model and train it, even with batch size 1 and all sorts of optimizations. My idea is to add two small modules to my network: one at the front which downscales the image, and one at the back which upscales it.

The problem here is the upscaling; it will need to be some sort of super-resolution model.
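A bare-bones PyTorch sketch of that wrapper idea; the backbone, scale factor, and plain bilinear resizing are placeholders, and as noted the learned upscaling step is where the real difficulty lives:

```python
# Sketch: downscale before the backbone, upscale after it.
# F.interpolate does the resizing; a learned upsampler could replace it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownUpWrapper(nn.Module):
    def __init__(self, backbone: nn.Module, scale: int = 4):
        super().__init__()
        self.backbone = backbone
        self.scale = scale

    def forward(self, x):
        h, w = x.shape[-2:]
        x = F.interpolate(x, scale_factor=1 / self.scale, mode="bilinear",
                          align_corners=False)
        x = self.backbone(x)
        return F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)

backbone = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # placeholder backbone
model = DownUpWrapper(backbone, scale=4)
out = model(torch.randn(1, 3, 6000, 6000))
print(out.shape)   # torch.Size([1, 3, 6000, 6000])
```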

1

alyflex t1_je4uq2y wrote

Another solution is to use a memory-efficient neural network: https://arxiv.org/pdf/1905.10484.pdf. With this type of network you can easily fit images of that size. However, the problem with them is that they are very difficult to build (you have to code up the backpropagation manually). So depending on your math proficiency and ambitions, this might just be too much.

1

SnooMarzipans3021 t1_je96o53 wrote

Thank you for the suggestion. At first glance it does seem overwhelming, but I will check it out. The problem I'm solving has to be tested rapidly, and I would run out of time implementing this.

1

DrinkHumblyDumbly t1_je0ifon wrote

What type of data drift describes changes to future data that are partly caused by the deployment of the ML model itself?

1

RecoilS14 t1_je0x3ud wrote

I'm a new hobbyist programmer and have spent the last month or so learning Python (CS50, Mosh, random Indian guys, etc.), and I'm currently also watching the Stanford ML/DL lectures on YouTube.

I have started to learn ML, PyTorch, and some TensorFlow, along with how tensors and vectors work in ML.

I am wondering if anyone can point me in the direction of other aspects of ML/DL/neural networks that I may be missing out on. Perhaps a good series that goes into these subjects at length via lectures, and not just the programming side, so I can further understand the concepts.

I'm sure there are lots of things I'm missing on my journey, and some perspective would be nice.

1

alyflex t1_je4u0rr wrote

It really depends on what you are intending to use this for. There are many sides to machine learning, but you don't have to know all of them. To name a few very different concepts:

- MLOps (Coursera has an excellent series on this)
- Reinforcement learning
- GANs
- Graph neural networks

I would say that once you have an idea about what most of these topics involve, it is time to actively dive into some of them by actually trying to code up solutions, or by downloading well-known GitHub projects and trying to run them yourself.

1

Ricenaros t1_jeax41q wrote

I would suggest picking up either PyTorch or TensorFlow and sticking with one of them while you learn (personally I'd choose PyTorch). It'll be easy to go back and learn the other one if needed once you get more comfortable with the material.

1

3Street t1_je30hpn wrote

Do we expect businesses to be able to fine-tune ChatGPT or other big models on their own data sets? Has this been discussed or rumoured at all? Or is it already happening? I may have missed something.

1

patniemeyer t1_je5v9m7 wrote

Yes, in fact OpenAI offers an API for this right now: https://platform.openai.com/docs/guides/fine-tuning

It *appears* from the terminology they are using that they are actually performing training on top of their model with your data (which you supply as JSON). They talk about learning rate, epochs, etc. as params; however, I have not seen real documentation of what they are doing.
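For anyone trying it, the training data goes in as JSON Lines of prompt/completion pairs (per their fine-tuning guide at the time, which targets the GPT-3 base models). A hedged sketch of preparing such a file; the example pairs and the CLI call are illustrative only:

```python
# Sketch: write fine-tuning data in the JSONL prompt/completion format
# described in OpenAI's fine-tuning guide. Examples are placeholders.
import json

examples = [
    {"prompt": "Summarize: solar panel output rose 12% ->",
     "completion": " Solar output rose twelve percent this quarter."},
    {"prompt": "Summarize: city council approves new bike lanes ->",
     "completion": " Council approves new bike lanes downtown."},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Then (per their docs at the time) something like:
#   openai api fine_tunes.create -t train.jsonl -m davinci
```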

2

3Street t1_je5wk7v wrote

Interesting, thank you! The link only seems to mention GPT-3, though? I wonder if / when they'll offer it for GPT-4.

1

patniemeyer t1_je5wxiv wrote

The pricing page lists GPT-4. I think it was just added in the past day or two. (I have not confirmed that you can actually access it though)

EDIT: When I query the list of models through their API I still do not see GPT-4, so maybe it's not actually available yet... or maybe I'm querying the wrong thing.

1

disastorm t1_je8lm7w wrote

I have a question about reinforcement learning, or more specifically gym-retro (I know Gym is pretty old now, I guess).

In the case of gym-retro, if you give a reward to the AI, are they actually looking at a set of variables and saying like "oh I pressed this button while all of these variables were these values and got this reward, so I should press it when all these variables are similar" or are they just saying like "oh I pressed this button and got this reward, so I should press it more often"?

1

sparkpuppy t1_je8v49k wrote

Hello! Super-n00b question, but I couldn't find an answer on Google. When an image generation model has "48M parameters", what does the term "parameter" mean in this sentence? Tags, concepts, image-word pairs? Does the meaning of "parameter" vary from model to model (in the context of image generation)?

1

Ricenaros t1_jeawpf3 wrote

It refers to the number of scalars needed to specify the model. At the heart of machine learning is matrix multiplication. Consider an input vector x of size (n x 1). Here is a linear transformation: y = Wx + b. In this case, the (m x n) matrix W (weights) and the (m x 1) vector b (bias) are the model parameters. Learning consists of tweaking W and b in a way that lowers the loss function. For this simple linear layer there are m*n + m scalar parameters (the elements of W and the elements of b).

Hyperparameters on the other hand are things like learning rate, batch size, number of epochs, etc.

Hope this helps.
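A quick PyTorch check of that count (the sizes are arbitrary):

```python
# Sketch: a single linear layer has m*n weight parameters plus m biases.
import torch.nn as nn

n, m = 128, 64                      # input size n, output size m
layer = nn.Linear(n, m)

num_params = sum(p.numel() for p in layer.parameters())
print(num_params, "==", m * n + m)  # 8256 == 8256
```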

2

sparkpuppy t1_jee9qj3 wrote

Hello, thank you so much for the detailed explanation! Yes, it definitely helps me have a clearer vision of the meaning of that expression. Have a nice day!

1

alpolvovolvere t1_jeavq3v wrote

I'm trying to use Whisper in Python to produce a transcription of an 8-minute Japanese-language mp4. It doesn't really matter which model I use; the script's execution screeches to a halt after a few seconds, going from 9 MiB/s to like 200 KiB/s. Is this a "thing"? Like, is it just something that everyone knows about? Is there a way to make this faster?

1

Adventurous_Win8348 t1_jeazn5c wrote

Hi, I want to make an ML model that can listen to the sound of the road, tell what kind of vehicles are passing (auto, lorry, or bus), count how many vehicles passed through, and give real-time feedback. I don't know how to code.

1

qiqitori t1_jedpmjn wrote

I made a tool that makes it a little easier to verify OCRs of hex dumps (not necessarily hex dumps, but that's what I used it for). I'm not exactly an OCR expert, and just wondering if anyone has seen any similar tools:

You feed in segmented images and labels (as produced by some OCR system) and it'll display all images sorted by their class (so for hex dumps, 0, 1, 2, ... , F), which makes it considerably easier to spot mistakes. (You can then drag and drop images that were OCR'd wrong into their correct position and press a button to regenerate and you'll get a corrected hex dump.) At the risk of sounding spammy, the tools are available at https://blog.qiqitori.com/ocr/monospace_segmentation_tool/ (for segmentation if you don't have segmented images yet) and https://blog.qiqitori.com/ocr/verification_tool/, and here's some documentation (and screenshots) on how the tools can be used: https://blog.qiqitori.com/2023/03/ocring-hex-dumps-or-other-monospace-text-and-verifying-the-result/

1

mejdounarodni t1_jeff83b wrote

Hey, I don't know how relevant this is, but are there any voice-cloning tools for other important languages aside from English, such as Spanish, Russian, or Mandarin Chinese? Thus far I have only found them for English and, I think, French. I have seen some sites claiming they work for other languages since, arguably, you can type in text in any language you want... but the phonemes used to recreate what you have written are those of English, so it's a bit absurd, really. Any tips would be appreciated.

1

LartoriaPendragon t1_jefiqoi wrote

What programming languages besides Python are often used in industry for machine learning applications or projects? What are some relevant technologies I should be looking to learn?

1

MO_IN_2D t1_jeggm8i wrote

Is there a current AI dedicated to generating vector graphics from raster images?

We've seen plenty of raster-image-generating AIs such as DALL-E or Stable Diffusion, but so far I haven't seen any AI developed to generate good vectors, either from a raster image input or from a text string. The fact that AI also stands for Adobe Illustrator makes researching the existence of such tools quite hard on Google.

I could see great use in this, since existing image-tracing algorithms often only deliver mediocre results, and generating vectors from text strings could also be of great use. To my limited understanding of machine learning, it should be very doable, since vectors are based on clear mathematical paths, which should be easy for the algorithms to build on.

1

fool126 t1_jeh0t01 wrote

What are the dominant methods for solving contextual bandit problems?

1

CormacMccarthy91 t1_jdrsh0g wrote

I have a problem. Bing Chat just tried to sell me on a Unified Theory of Everything, quantum gravity, and string theory... I told it those aren't based on any evidence, and it told me it didn't want to continue the conversation. It wouldn't tell me anything further until I restarted and asked about more specific things... That really scares me. It's all monotheistic / "consciousness is spiritual, not physical" stuff it's spouting like facts, and when it's questioned it just ends the conversation.

I don't know where to talk about this where people won't jump on the spiritual "big bang is just a theory" train. It's really unsettling. If I tried to divert it from bringing god into astrophysics, it would end the conversation.

It's oddly religious. https://ibb.co/W36fjfC

−2

Matthew2229 t1_jduzxv3 wrote

I don't see it professing anything about monotheism, God, or anything like what you mentioned. You asked it about string theory and it provided a fair, accurate summary. It even points out "string theory also faces many challenges, such as the lack of experimental evidence, ...", and later calls it "a speculative and ambitious scientific endeavor that may or may not turn out to be correct". I think that's totally fair and accurate, no?

Despite it mentioning these things, you claim "That's not true" and that string theory is based on zero evidence and is backed by media. Personally, you sound a hell of a lot more biased and misleading than the bot.

2

pale2hall t1_jdvaify wrote

Data In -> Data Out

I don't think they're having any religion re-enforced on them, but think of it this way:

You know how mad some super-religious extremists get when you even use words that imply gay people are normal, or that trans people exist (and aren't just mentally ill)?

Imagine if people got that mad every time someone said "oh my god" or "JFC", etc. This imaginary group would be claiming "micro-religious-aggression" all. day. long.

I think Abrahamic religions are so ubiquitous in the training set that the AI is likely to just go with the flow on it.

1