Hi! I'm looking for a neural network that takes in speech and outputs phonemes. I basically want the first part of a speech-to-text network. I'd like to do this operation in real time. I've had no luck finding a network like this so I'd appreciate any input :)
Input: array of numbers representing the last N seconds of speech
Output: array of IPA-like values, one for each T-millisecond chunk of the input
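To make the shape of what I'm after concrete, here's a rough sketch of the input/output contract only. The sample rate (16 kHz), the chunk size (20 ms), and every name in it are assumptions for illustration; `dummy_classifier` is a hypothetical placeholder that thresholds amplitude, not an actual phoneme network.

```python
from typing import Callable, List, Sequence


def frame_signal(samples: Sequence[float], sample_rate: int, chunk_ms: int) -> List[Sequence[float]]:
    """Split a buffer of audio samples into consecutive chunks of chunk_ms milliseconds."""
    chunk_len = sample_rate * chunk_ms // 1000
    if chunk_len <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    # Drop any trailing partial chunk so every chunk has the same length.
    n_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


def phonemes_per_chunk(
    samples: Sequence[float],
    sample_rate: int,
    chunk_ms: int,
    classify: Callable[[Sequence[float]], str],
) -> List[str]:
    """Return one IPA-like label per chunk_ms chunk of the input buffer."""
    return [classify(chunk) for chunk in frame_signal(samples, sample_rate, chunk_ms)]


def dummy_classifier(chunk: Sequence[float]) -> str:
    """Hypothetical stand-in for the network: labels a chunk by mean absolute amplitude."""
    energy = sum(abs(x) for x in chunk) / len(chunk)
    return "sil" if energy < 0.01 else "ə"


if __name__ == "__main__":
    # N = 2 seconds of fake audio at 16 kHz: one second of silence, one second of signal.
    buffer = [0.0] * 16000 + [0.5] * 16000
    labels = phonemes_per_chunk(buffer, sample_rate=16000, chunk_ms=20, classify=dummy_classifier)
    print(len(labels), labels[0], labels[-1])
```

For real-time use, the same function would be called repeatedly on a sliding buffer holding the last N seconds, with the placeholder swapped for whatever model ends up doing the actual recognition.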
natfabulous t1_ivbpmj3 wrote
Reply to [D] Simple Questions Thread by AutoModerator