Converting PDF files into HTML, DOC, LIT


#1

OK, I lost many of my e-books after my HDD crashed. I re-downloaded some of them, but I got them in PDF format, and I really like the LIT format because I can listen to the books with text-to-speech, like my mother used to tell me stories when I was little. =D

Anyhow, if there is a freeware tool that can convert PDF files to HTML, DOC or LIT, please share the information.

And please don't mention the select-all method, because it leaves huge gaps after every line, and those get annoying when you are listening to a book.


#2

Or software like Microsoft Reader that supports PDF will work too.


#3

In the Ubuntu repos there is a tool called pdftohtml. Search for it in Synaptic Package Manager and install it. To convert a file, just type in a terminal:

pdftohtml [pdf filename]

Also check Adobe Reader for Linux; it has an option to save PDF files as text, which can then be opened in OpenOffice and saved as DOC or HTML.


#4

OK, thanks... cool.


#5

OK, I just tried it but was unable to do so. Can you please tell me where to keep the file in the first place to convert it? I mean, should I keep it in root, or on my desktop, or some place else? I am receiving the following error: Couldn't open file 'Book1.pdf'


#6

Adobe has an online tool for what you want:

http://www.adobe.com/products/acrobat/access_onlinetools.html


#7

Yeah, I did send them the PDF files four days ago; so far no reply.


#8

[quote=", post:, topic:"]

OK, I just tried it but was unable to do so. Can you please tell me where to keep the file in the first place to convert it? I mean, should I keep it in root, or on my desktop, or some place else? I am receiving the following error: Couldn't open file 'Book1.pdf'
[/quote]

Use the absolute path to the file, e.g.

/home/your-username/e-book1.pdf
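If you are not sure what the absolute path is, here is a minimal sketch that works it out for you. The helper name pdf_abs_path and the Desktop location are made up for illustration, and the commented example assumes pdftohtml is installed.

```shell
#!/bin/bash
# Hypothetical helper (not part of pdftohtml): print the absolute path of a
# file, or fail with a message if the file does not exist.
pdf_abs_path() {
    local pdf="${1:-}"
    if [ ! -f "$pdf" ]; then
        echo "No such file: $pdf" >&2
        return 1
    fi
    # Resolve the directory part to an absolute path, then re-attach the file name.
    echo "$(cd "$(dirname "$pdf")" && pwd)/$(basename "$pdf")"
}

# Example, assuming Book1.pdf was saved on the Desktop:
# pdftohtml "$(pdf_abs_path "$HOME/Desktop/Book1.pdf")"
```

Alternatively, cd into the folder that holds the PDF first; then the plain file name is enough.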

#9

OK, it works, but it converts apostrophes into [ ’ ], and that's not good, is it?


#10

[quote=", post:, topic:"]

OK, it works, but it converts apostrophes into [ ’ ], and that's not good, is it?
[/quote]

There are many ways to fix this. The simplest is to use the “Find and Replace” or “Replace all” options in a text editor. Another is to use sed or Perl to fix the file. For example, if you run the following command in a terminal, Perl should fix all the HTML files in the current directory (make sure you are in the directory where the .html files are saved). There is a chance that this might not work, as the characters to search for are “special” characters.

/usr/bin/perl -p -i -e "s/’/'/g" *.html

#11

[quote=", post:, topic:"]

OK, it works, but it converts apostrophes into [ ’ ], and that's not good, is it?
[/quote]

I guess the encoding is set to ISO-8859-1 or some other variant of the Western character set. Try setting the encoding to UTF-8. Try any of the following:

1. View -> Character Encoding -> Unicode (UTF-8) in Firefox

2. If there’s a meta tag that defines the content-type and charset in the HTML file, replace the content-type meta header with:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

3. Use Notepad2 to change the encoding of the HTML file to UTF-8. On Linux, try iconv. (iconv -f ISO-8859-1 -t UTF-8 *.html; I’m not sure if it will work with *.html though.) Otherwise, try this in a .sh file:

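#!/bin/bash
# Convert every .html file in the current directory from ISO-8859-1 to UTF-8.
for htmlFile in *.html
do
    # iconv writes to standard output, so convert into a temporary file
    # and then replace the original with it.
    iconv -f ISO-8859-1 -t UTF-8 "$htmlFile" > "$htmlFile.tmp" && mv "$htmlFile.tmp" "$htmlFile"
done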
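Converting a file that is already UTF-8 a second time would garble it further, so it may be worth checking first. A rough sketch, assuming the files are either valid UTF-8 or ISO-8859-1 (the function names is_utf8 and to_utf8 are made up for this example):

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the file is already valid UTF-8.
# iconv exits with an error when it meets a byte sequence that is not UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: convert each file from ISO-8859-1 to UTF-8,
# skipping files that already pass the UTF-8 check.
to_utf8() {
    local f
    for f in "$@"; do
        if is_utf8 "$f"; then
            echo "skipping $f (already valid UTF-8)"
        else
            iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
        fi
    done
}

# Example: to_utf8 *.html
```

Plain ASCII files count as valid UTF-8, so they are skipped, which is harmless because the conversion would not change them anyway.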


#12

OK, I'll try it. But Asad's answer is very complicated. I understand the bit about the character set encoding being different, but the conversion bit is gibberish to me. I'm a Linux virgin.


#13

[quote=", post:, topic:"]

OK, I'll try it. But Asad's answer is very complicated. I understand the bit about the character set encoding being different, but the conversion bit is gibberish to me. I'm a Linux virgin.
[/quote]

OK, before trying anything else, try #1. If it works, go ahead and do the conversion. Or, if it’s just a single character that’s causing you trouble, replace it as Asad Ahmed suggested. Still, it will be useful to know about character set conversion in future.


#14

Nope, my character encoding is set to UTF-8. I looked up the torrents and downloaded the LIT version, so it's cool for now, but thanks for your help.

Hey, is there a Linux replacement for Microsoft Reader with text-to-speech? I searched the repository for 'text to speech' and it came up with disability tools.


#15

"Espeak" should be already installed in Ubuntu, It is a very nice tool that can convert whole files to wav files as well. Type "espeak --help" for its supported commands. Or try Festival : https://help.ubuntu.com/community/TextToSpeech

Then there is "speex" in Synaptic Package Manager, and a few others as well. Search for "speech" in it.

You can check out Click too: http://clickspeak.clcworld.net/
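For espeak, a small sketch of reading a text file aloud or saving it as a WAV file. The function name speak_file and the file names are made up for illustration; -f (read text from a file) and -w (write the speech to a WAV file) are espeak options.

```shell
#!/bin/bash
# Hypothetical helper: speak a text file with espeak, or write the speech
# to a WAV file when a second argument is given.
speak_file() {
    local txt="${1:-}" wav="${2:-}"
    if [ ! -f "$txt" ]; then
        echo "No such file: $txt" >&2
        return 1
    fi
    if ! command -v espeak > /dev/null 2>&1; then
        echo "espeak is not installed" >&2
        return 127
    fi
    if [ -n "$wav" ]; then
        espeak -f "$txt" -w "$wav"   # save the speech instead of playing it
    else
        espeak -f "$txt"             # play through the speakers
    fi
}

# Examples:
# speak_file book.txt
# speak_file book.txt book.wav
```

The input has to be plain text, so a PDF or HTML book would need to be saved as a .txt file first.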