Converting a PDF to Plain Text - Question



Dennis Peacock
03-15-2021, 9:57 PM
I get about 8 two-page documents each month that I need to convert from PDF to plain text. I don't want to pay a subscription fee to do this. I've already tried an online service, but in the resulting text file all spaces are filled with a "." so I then have to pull it into another program or use my "vi" editor to search for and remove all the periods in each doc. I thought that MS Word would "import" a PDF file and convert it to Word format, but I haven't been successful with this either.

Advice?
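
For the period cleanup described above, a small script could stand in for the manual search-and-replace in vi. This is only a sketch under assumptions: the function name `dots_to_spaces` is made up for illustration, the file is assumed to use plain "\n" line endings, and a period is assumed to be real punctuation only when it ends a line directly after a letter or digit. Every other period is treated as a filler and becomes a space, so the column spacing is kept.

```python
import re

# Match a period that is NOT at the end of a line, or a period that does
# NOT directly follow a letter/digit. What remains unmatched (and therefore
# kept) is a period that ends a line right after a word, e.g. "speak."
_FILLER_DOT = re.compile(r"\.(?!$)|(?<!\w)\.", flags=re.MULTILINE)


def dots_to_spaces(text: str) -> str:
    """Replace filler periods with spaces, one space per period."""
    return _FILLER_DOT.sub(" ", text)


sample = "Dm......G...A\nYou.know.what."
print(dots_to_spaces(sample))
```

If the documents never contain real periods, a plain `text.replace(".", " ")` would do the same job with less machinery.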

Mike Henderson
03-15-2021, 11:51 PM
I Googled "convert pdf to text" and got about seven results that claim to be free. You could try them one at a time to see if they will give you what you want.

The problem, I think, is that some of them may just do an OCR of the pdf to give you the ability to edit the text. However, I did see one that claimed to convert to MS Word.

Mike

John K Jordan
03-16-2021, 12:37 AM
Is the text simply formatted? I usually just select all the text in Adobe Acrobat Reader and copy it (ctrl-a, ctrl-c), then paste into Notepad or, better, WinWord. You don't get the photos, and the line breaks in the PDF are carried over to the word processor, but for many things it works fine. I use MS Word 2000.

JKJ
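
When the carried-over line breaks are a nuisance in ordinary prose, they can be joined back into paragraphs after pasting. A minimal sketch, assuming blank lines separate paragraphs in the pasted text; `unwrap_paragraphs` is a hypothetical helper name, and it does not try to repair words hyphenated across lines. It is not suitable for layouts such as chord charts, where each line has to stay on its own.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join the hard-wrapped lines of each paragraph into a single line."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = (" ".join(line.strip() for line in p.splitlines()) for p in paragraphs)
    return "\n\n".join(joined)


pasted = "first line of a\nwrapped paragraph\n\nsecond paragraph\nalso wrapped\n"
print(unwrap_paragraphs(pasted))
```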

Nicholas Lawrence
03-16-2021, 8:03 AM
Maybe the person who creates them can make it easier for you.

If they are scanned, you will need something that does OCR. If they are created from Word or a similar program, you should be able to just copy and paste without any significant formatting issues.

Lots of people still scan stuff because they do not know how to "print to PDF" or even easier just save as a PDF.

Brian Tymchak
03-16-2021, 9:20 AM
..all spaces are filled with a "."

Some editors offer an option to set the character for field separator. Maybe there's an option to do the same for blanks/spaces in the tool you are using?

....I'm surprised vi is still around. It was my favorite editor 35 years ago.

Dennis Peacock
03-16-2021, 9:21 AM
John,

It's music chord charts. :)

Dm ------ G --- A ---- B
You know what words I speak

Like that...except without the dashes for spacing :) . So spacing is important as well.
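
Since the spacing matters, one option nobody in the thread has mentioned is the free `pdftotext` command-line tool from Poppler, whose `-layout` flag tries to keep the original physical layout of the page. The sketch below wraps it so a whole folder of charts can be converted in one go. It assumes `pdftotext` is installed and on the PATH (on Ubuntu it is in the poppler-utils package) and that the PDFs contain real text rather than scans; the function names and the folder name in the usage note are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_cmd(pdf_path, txt_path=None):
    """Build the pdftotext command line; output defaults to <name>.txt."""
    pdf = Path(pdf_path)
    txt = Path(txt_path) if txt_path else pdf.with_suffix(".txt")
    # -layout asks pdftotext to preserve the physical layout (column spacing).
    return ["pdftotext", "-layout", str(pdf), str(txt)]


def convert_folder(folder):
    """Convert every *.pdf in the folder and return the text files written."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    written = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        cmd = build_pdftotext_cmd(pdf)
        subprocess.run(cmd, check=True)  # raises if the conversion fails
        written.append(Path(cmd[-1]))
    return written
```

Calling `convert_folder("charts")` would then leave a `.txt` file next to each PDF in a folder named `charts`. How well the chords line up over the lyrics depends on the PDF, so check the first few results by eye.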

Dennis Peacock
03-16-2021, 9:24 AM
Brian,
Where I work, we are a heavy Linux/Unix shop. VI is strong here. :) Some have moved to using VIM, but some of us still stick with VI. :D

Brian Tymchak
03-16-2021, 9:33 AM
Yep, I grew up on Unix. Is nawk still being used? We prototyped a lot of phone and early data network management functionality in nawk.

Jim Becker
03-16-2021, 10:33 AM
Select the text with your rodent, and copy it to Word or whatever text processor you want to use...

David Bassett
03-16-2021, 11:26 AM
Yep, I grew up on Unix. Is nawk still being used? We prototyped a lot of phone and early data network management functionality in nawk.

(Linux box running Ubuntu 16.04 LTS: )

% which awk
/usr/bin/awk

% which nawk
/usr/bin/nawk


So, I guess so, though I haven't used either since the 80's.

Kev Williams
03-16-2021, 11:39 AM
Foxit (PDF) Reader has a nifty 'select text and image' feature.

Click on it and you can sweep any text on the page, and when you release the mouse button, a handy 'copy' icon pops up to click on.

I pasted what I had just swept into Word AND Notepad. Sometimes one works better than the other, but in this case they worked the same, although Word formatted the text correctly for size, etc...
You can sweep and copy any combination of text you like :)

As Nicholas pointed out above, if your pdf doesn't contain actual text, an OCR program can translate photos of text INTO text...

Bob Coates
03-16-2021, 4:56 PM
I opened a PDF and was able to copy the text as John suggested and paste it into LibreOffice. The pictures did not copy/paste with the text, but I was able to copy and paste them as a separate task. Slow if there are a lot of pictures, but it works. LibreOffice is free.
Bob

Dennis Peacock
03-18-2021, 7:12 PM
Thanks for all the tips and responses. Much appreciated!