Submitted by UberStone t3_1015pjo in MachineLearning

Hi. The recent incredible improvements in AI and ML have resurrected an old project of mine to read the back panel of electronic components like this AV receiver and spit out logically formed text describing each I/O. I have a LOT of direct experience with this specific issue, as well as general software experience, but no AI/ML development experience.

I know it is possible, but on a scale of 1-10, how hard? Do any new tools make this easier? Ultimately I want to feed the AI pictures of electronic back panels and get formatted text back.

Thanks!

8

Comments


RobbinDeBank t1_j2lu9hi wrote

I don’t think it will be very technically challenging. The most difficult part might be to construct/find a reliable dataset about those electrical components.

15

UberStone OP t1_j2m9per wrote

I have a large amount of electronic component data I can access, but I lack any sort of framework for cataloging what the individual connections are. I envision some sort of app where a human could label individual points on a photo as a way of training.
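A minimal sketch of what one record from such a labeling app might look like. The field names and values here are assumptions, not an established schema:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ConnectionLabel:
    """One human-drawn annotation on a back-panel photo."""
    image_id: str
    x: int          # top-left corner of the bounding box, in pixels
    y: int
    width: int
    height: int
    connector: str  # e.g. "HDMI", "RCA", "XLR"
    signal: str     # e.g. "HDMI", "composite", "analog audio"
    label: str      # the silkscreen text printed next to the jack
    direction: str  # "IN", "OUT", or "IN/OUT"

record = ConnectionLabel("avr_rear_001.jpg", 120, 48, 36, 14,
                         "HDMI", "HDMI", "VIDEO1", "IN")
print(json.dumps(asdict(record)))
```

Storing labels as plain JSON like this keeps them easy to crowdsource and to convert later into whatever format a training framework expects.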

1

Worth-Advance-1232 t1_j2mgewg wrote

Keep in mind that manually labelling a large set of data is very tedious, but your approach of building an app sounds like a good solution.

2

FHIR_HL7_Integrator t1_j2mzl50 wrote

This is actually a great idea. It would be really cool if you could take pictures of really old components and boards, label the points and they'd be added to the data set. What I'm saying is you should farm this out to component and electronics subs. Figure out and streamline the tagging process first, then crowdsource it. Good luck. I really think all the technology is consumer level enough for this to be achieved!

1

JimmyTheCrossEyedDog t1_j2lzmhs wrote

As someone who knows next to nothing about electronic components, can you provide some example inputs and outputs? Without knowing what the exact problem is, it's hard to determine feasibility.

Off the top of my head, if the symbolic language is quite simple (i.e., every symbol acts more or less independently of each other, so you can just tack the text for what each does one after another), you can essentially do this with optical character recognition or some computer vision approach and just use a simple set of rules to translate each visual detection into text. If the language of how these diagrams work is more complicated than that, though, it may not be so simple.
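If each symbol really is independent, that rule layer can be as simple as a lookup table. The detections below stand in for the output of a hypothetical OCR/vision stage, and the rule keys are made up for illustration:

```python
# Map each detected symbol type to an output template; the label text
# found next to the symbol fills in the blank.
RULES = {
    "hdmi_in": "IN-HDMI-HDMI-{label}",
    "rca_out": "OUT-RCA-ANALOG-{label}",
}

def detections_to_text(detections):
    """Translate (symbol_type, label) pairs into one line of text each."""
    return [RULES[sym].format(label=lab) for sym, lab in detections]

print(detections_to_text([("hdmi_in", "VIDEO1"), ("rca_out", "ZONE2")]))
```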

3

UberStone OP t1_j2m8zyl wrote

Good question. Basically, every connection on a back panel would generate the connection type, signal type, label, and input or output (or both). A simple example would be an AV receiver with three HDMI inputs labeled VIDEO 1, VIDEO 2 and VIDEO 3. The ML/AI would recognize the actual HDMI port, look at the corresponding label matching the port, and produce the following output.

IN-HDMI-HDMI-VIDEO1
IN-HDMI-HDMI-VIDEO2
IN-HDMI-HDMI-VIDEO3

This is good for about 80% of the components; the other 20% of the connections are edge cases.
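For the regular 80%, assembling the string is trivial once the four fields have been recognized. A sketch (the function name is my own):

```python
def format_connection(direction, connector, signal, label):
    """Compose the proposed output string for one recognized connection."""
    return f"{direction}-{connector}-{signal}-{label.replace(' ', '')}"

for n in (1, 2, 3):
    print(format_connection("IN", "HDMI", "HDMI", f"VIDEO {n}"))
# IN-HDMI-HDMI-VIDEO1
# IN-HDMI-HDMI-VIDEO2
# IN-HDMI-HDMI-VIDEO3
```

The hard part is the recognition feeding into those four arguments, not the formatting.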

1

muffdivemcgruff t1_j2m3pi4 wrote

Literally 6 lines of code with the appropriate frameworks.

3

UberStone OP t1_j2ma2sg wrote

What would it take to feed it an image and get text out like the comment above?

1

Worth-Advance-1232 t1_j2mh0vy wrote

You would need to preprocess the pictures anyway, so it should be quite easy to only get text back, especially if your labels are already formatted the way you want your output to be. This would probably mean you only need to decode your output.

1

Codac123 t1_j2mno8p wrote

Your training data will be the coordinates of the corners that form a box around each input you want recognized, plus a label for which input it is. Basically, you're using an object detection model. The model will learn to produce box coordinates around an input and tell you which input it is. Here's a simple example to get you thinking. https://d2l.ai/chapter_computer-vision/bounding-box.html
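The linked d2l chapter shows these conversions with tensors; a pure-Python sketch of the two bounding-box conventions involved, corner format (x1, y1, x2, y2) from a labeling tool and center format (cx, cy, w, h) that many detectors predict (the example box is made up):

```python
def corner_to_center(x1, y1, x2, y2):
    """(x1, y1, x2, y2) corners -> (center_x, center_y, width, height)."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def center_to_corner(cx, cy, w, h):
    """(center_x, center_y, width, height) -> (x1, y1, x2, y2) corners."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

box = (120, 48, 156, 62)       # e.g. an HDMI jack outlined in a photo
print(corner_to_center(*box))  # (138.0, 55.0, 36, 14)
```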

2

clickmeimorganic t1_j3ca1wx wrote

Look at connectionist temporal classification loss, and a CNN.
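Not the CTC loss itself (in practice you'd use a framework implementation such as PyTorch's `nn.CTCLoss`), but a sketch of the decoding rule CTC is built around: collapse repeated characters, then drop the blank token. The per-frame sequence here is invented:

```python
BLANK = "-"  # CTC's special "no character" token

def ctc_greedy_decode(frame_labels):
    """Collapse repeats, then remove blanks, as in CTC greedy decoding."""
    out = []
    prev = None
    for ch in frame_labels:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_greedy_decode(list("HH-DD-MM--II")))  # HDMI
```

This alignment-free decoding is what makes CTC a good fit for reading variable-length printed labels off a CNN's feature columns.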

1

evanthebouncy t1_j2lsn7h wrote

Wait a year until we have something like ChatGPT but with vision integrated in. Currently it's typically an OCR step followed by some NLP, but in a year it could be as simple as giving a few examples of what you want done (few-shot prompting) to a single model hosted online somewhere.

I'd wait a bit more.

−7