PDFREADER

Reading and rendering PDF files in Common Lisp.

Notes

All I wanted to do was extract text from some PDFs. This has turned out to be surprisingly difficult, and has led me through the detours of dealing with (selected parts of) compression, embedded font parsing and, while I was doing that, rendering a visual representation (which is, after all, what PDFs are all about).

These are very early days for the project.

Because PDF is a binary format that pretends to be ASCII at some points, I use the reader macros #! and #", which are analogous to #\ and " in that they take a character or a string but return an octet (instead of a Lisp character) or a vector of octets (instead of a Lisp string). There is a naive octets= for comparing octet vectors. (Because GNU Emacs doesn't understand these reader macros, parsing with #!( and #!) confuses it mightily, which is why I use the actual char-codes in certain parts.)
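To make the idea concrete, here is a minimal sketch of how such reader macros could be defined with standard dispatch macro characters. It is not the code in readtable.lisp: the names read-octet, read-octet-vector and *octet-readtable* are hypothetical, the string reader handles no escape sequences, and every character is assumed to have a char-code below 256.

```lisp
;; Illustrative sketch only; these names are assumptions, not pdfreader's API.

(defun read-octet (stream subchar arg)
  ;; #!c reads the single character c and returns its char-code.
  (declare (ignore subchar arg))
  (char-code (read-char stream t nil t)))

(defun read-octet-vector (stream subchar arg)
  ;; #"..." collects characters up to the closing double quote as octets.
  ;; No escape handling: the first double quote ends the literal.
  (declare (ignore subchar arg))
  (let ((octets (make-array 0 :element-type '(unsigned-byte 8)
                              :adjustable t :fill-pointer 0)))
    (loop for char = (read-char stream t nil t)
          until (char= char #\")
          do (vector-push-extend (char-code char) octets))
    (coerce octets '(simple-array (unsigned-byte 8) (*)))))

(defparameter *octet-readtable*
  ;; Start from the standard readtable so the global one stays untouched.
  (let ((rt (copy-readtable nil)))
    (set-dispatch-macro-character #\# #\! #'read-octet rt)
    (set-dispatch-macro-character #\# #\" #'read-octet-vector rt)
    rt))

(defun octets= (a b)
  ;; Naive element-wise comparison of two octet vectors.
  (and (= (length a) (length b))
       (every #'= a b)))
```

With *readtable* bound to *octet-readtable*, reading the text #!( yields the char-code of the open parenthesis, and reading #"%PDF" yields a four-element octet vector that can be compared against file contents with octets=.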

Because I wrote the parsing code in a PEEK-and-READ style, and because there isn't a PEEK-BYTE in the standard, there are simple octet-file and octet-vector streams in utils.lisp that implement a version of PEEK-BYTE based on EdiWare. In particular, it uses a different (less annoying?) order of arguments from PEEK-CHAR.
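As a rough illustration of the PEEK-and-READ style, the sketch below implements a peek operation over an in-memory octet vector. It is an assumption-laden stand-in, not the code in utils.lisp: octet-source, next-byte and the exact lambda list of peek-byte are hypothetical, chosen so that the source comes first and the peek type second (the reverse of PEEK-CHAR), and a plain structure is used instead of a real stream class.

```lisp
;; Illustrative sketch only; these names are assumptions, not pdfreader's API.

(defstruct (octet-source (:constructor make-octet-source (octets)))
  (octets (vector) :read-only t)   ; the underlying octet vector
  (index 0 :type fixnum))          ; position of the next unread octet

(defun next-byte (source &optional (eof-error-p t) eof-value)
  ;; Consume and return the next octet.
  (let ((octets (octet-source-octets source))
        (index (octet-source-index source)))
    (cond ((< index (length octets))
           (setf (octet-source-index source) (1+ index))
           (aref octets index))
          (eof-error-p (error "End of octet source reached."))
          (t eof-value))))

(defun peek-byte (source &optional peek-type (eof-error-p t) eof-value)
  ;; Return the next octet without consuming it. If PEEK-TYPE is an octet,
  ;; first skip (consume) octets until one equal to PEEK-TYPE is next.
  (loop
    (let ((octets (octet-source-octets source))
          (index (octet-source-index source)))
      (cond ((>= index (length octets))
             (if eof-error-p
                 (error "End of octet source reached.")
                 (return eof-value)))
            ((or (null peek-type) (eql peek-type (aref octets index)))
             (return (aref octets index)))
            (t (incf (octet-source-index source)))))))

;; Usage: peek at the first octet, consume it, then skip ahead to octet 68.
(defun demo-peek-and-read ()
  (let ((source (make-octet-source (vector 37 80 68 70))))
    (list (peek-byte source)       ; looks at 37, consumes nothing
          (next-byte source)       ; consumes 37
          (peek-byte source 68)    ; skips 80, stops before 68
          (next-byte source))))    ; consumes 68
```

Putting the source first lets a parser write (peek-byte source) in the common case and add a peek type only when it needs to skip ahead.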

Acknowledgements

I continue to be inspired by the following work:

cl-pdf (license)
pdf.js (license)
poppler (license) and other descendants of Glyph & Cog's XPDF
mupdf (license)
