ViceOA t1_jccekvp wrote on March 15, 2023 at 8:47 PM

Precious Advices About AI-supported Audio Classification Model

Hello everyone,I'm Omer.
I am new in this group and writing from Turkey. I need very valuable advice from you precious researchers.
I am a PhD program student in the department of music technology. I have been working in the field of sound design and audio post-production for about 8 years. For the last 6 months, I have been doing research on AI-supported audio classification.My goal is to design an audio classifier to be used in the classification of audio libraries. Let me explain with an example as follows; I have a sound bank with 30 different classes and 1000 sounds in each class (such as bird, wind, door closing, footsteps etc.).
I want to train an artificial neural network with this sound bank. This network will produce labels as output. I also have various complex signals (imagine a single sound track with different sound sources like bird, wind, fire, etc.). When I give a complex signal to this network for testing, it will give me the relevant labels.I have been doing research on this system for 6 months and if I succeed, I want to write my PhD thesis on this subject. I need some advice from you, my dear friends, about this network. For example, which features should I look at for classification? Or what kind of artificial intelligence algorithm should I use?
Any advice you say you should definitely read this article or that article on this subject.I apologize if I've given you a headache. I really need your advice. Please guide me. Thank you very much in advance.

henkje112 t1_jcxlc7t wrote on March 20, 2023 at 10:13 AM

Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.

ViceOA t1_jd20dzj wrote on March 21, 2023 at 7:14 AM

>Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.

Thanks for your precios advices, im grateful!