InitialWalrus t1_iwuetbz wrote on November 18, 2022 at 12:53 PM

Reply to comment by dwightsrus in [D] Simple Questions Thread by AutoModerator

https://pypi.org/project/PyPDF2/ This Python library will let you convert the PDF to a string, assuming it has a readable text layer. If it doesn't, you'll need to look into OCR (optical character recognition).
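
A rough sketch of what that could look like, assuming a recent PyPDF2 release where `PdfReader` and `page.extract_text()` are available. The helper names (`pdf_to_string`, `join_page_texts`, `needs_ocr`) are made up for illustration, and the "empty text means scanned" check is only a heuristic, not something the library provides.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text into one string, treating missing text as empty."""
    return "\n".join((text or "").strip() for text in page_texts)


def needs_ocr(text: str) -> bool:
    """Heuristic (an assumption): no extractable text suggests a scanned PDF."""
    return not text.strip()


def pdf_to_string(path: str) -> str:
    """Extract the text layer of every page in the PDF at `path`."""
    # Imported here so the helpers above work without PyPDF2 installed.
    from PyPDF2 import PdfReader  # pip install PyPDF2

    reader = PdfReader(path)
    return join_page_texts(page.extract_text() for page in reader.pages)
```

You would call `pdf_to_string("some_file.pdf")` (hypothetical path) and, if `needs_ocr(...)` returns `True` on the result, fall back to an OCR tool instead.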

dwightsrus t1_iwuq4um wrote on November 18, 2022 at 2:29 PM

Thanks for the suggestion. My challenge is that the PDFs aren't all structured the same way. I'd love to run a bunch of them through a trained ML model that spits out the data in the format I need.