tsfn/doc2txt

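doc2txt parses the binary OLE structure of a .doc file to locate the WordDocument Stream, the Table Stream, and related parts, then uses certain fields in them to recover the text and formatting information.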
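To illustrate the first step, here is a minimal Python sketch of reading the header of an OLE compound file, the container in which the WordDocument and Table streams live. It is not this project's code: the function name `parse_cfb_header` and the returned keys are hypothetical, and the field offsets follow the published Compound File Binary layout rather than anything stated in this README.

```python
import struct

# Magic bytes at the start of every OLE compound file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(data: bytes) -> dict:
    """Read a few header fields from the first bytes of a compound file.

    Hypothetical helper for illustration; offsets follow the Compound
    File Binary format specification.
    """
    if len(data) < 76:
        raise ValueError("too short for a compound file header")
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not an OLE compound file")

    # Offsets 24..33: minor version, major version, byte order mark,
    # sector shift, mini sector shift (all little-endian uint16).
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    # Offsets 44..51: number of FAT sectors, first directory sector
    # (both little-endian uint32).
    num_fat_sectors, first_dir_sector = struct.unpack_from("<2I", data, 44)

    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
    }
```

The directory that starts at `first_dir_sector` is where entries such as `WordDocument` and the table stream are looked up by name. A caller would typically pass the first 512 bytes of the file, for example `parse_cfb_header(open("sample.doc", "rb").read(512))`, where `sample.doc` is a placeholder file name.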

Compilation

$ make

Encoding

The extracted text is encoded in UTF-16. ANSI is not supported.
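Because the output is UTF-16, it usually needs decoding before other tools can read it. The sketch below is one way to do that in Python. It is an assumption, not something the README states, that output without a byte order mark is little-endian (the byte order Word uses internally); `decode_utf16` is a hypothetical helper name.

```python
import codecs


def decode_utf16(raw: bytes) -> str:
    """Decode UTF-16 bytes into a Python string.

    If a byte order mark is present it decides the byte order; otherwise
    little-endian is ASSUMED, which this README does not confirm.
    """
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The generic "utf-16" codec consumes the BOM and honours it.
        return raw.decode("utf-16")
    return raw.decode("utf-16-le")
```

The resulting string can then be written back out in another encoding, for instance with `text.encode("utf-8")`.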

About

Extracts text from files in MS Word's binary .doc format.
