XQuery 3.0 module for exposing Apache Tika file parsing capabilities, supporting over 1,000 file types!

james-jw/xq-tika

xq-tika

XQuery 3.0 module (using Java bindings) that exposes Apache Tika's parsing capabilities to XQuery. Tika currently supports over 1,000 file types, including popular office formats.

Installation

Automated

Use xqpm to do it for you!

xqpm xq-tika

Manually
  1. Download the latest version of the Tika-app.jar file.

  2. Add the file to your class path or, if using BaseX, simply add the file to the BaseX\lib folder.

Note for Windows: when launching the BaseX GUI, use the batch files located in the BaseX\Bin folder rather than the GUI executable. The batch files ensure that all jar files in the lib folder are added to the class path.

  3. Clone this repository to your local machine and import the xq-tika.xqm module into your project.

Functionality

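The xq-tika module currently exposes two core functions: parse and parse-lines.
When called, each function detects the file type automatically and returns the file's text content, extracted using the Tika libraries. As the signatures show, parse returns a single string and parse-lines returns a sequence of strings.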

parse($path as xs:string) as xs:string
parse-lines($path as xs:string) as xs:string*
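
As an illustration of working with the sequence returned by parse-lines, here is a minimal sketch that lists the lines containing a search term, together with their positions. The file path and the search term 'invoice' are placeholders, not part of the module, and the import assumes the module is resolvable under the namespace used in the Example section.

```xquery
import module namespace tika = "https://xq-tika";

(: Hypothetical path: replace with a document that exists on your machine. :)
let $lines := tika:parse-lines('c:\my-word-document.doc')
return
  (: Keep only the lines mentioning the (placeholder) search term,
     prefixed with their position in the returned sequence. :)
  for $line at $i in $lines
  where contains(lower-case($line), 'invoice')
  return $i || ': ' || $line
```

Because the result is an ordinary xs:string* sequence, standard XQuery functions such as count, subsequence, or string-join can be applied to it in the same way.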

To support large files and reduce the memory footprint, a maximum string length can be specified, in which case the document is parsed only up to that length.

parse($filePath as xs:string, $maxStringLength as xs:string) as xs:string
parse-lines($filePath as xs:string, $maxStringLength as xs:string) as xs:string*
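
A minimal sketch of the two-argument form, for example to grab a short preview of a large document. The path and the limit are placeholders. The limit is passed as a string only because the signature above types $maxStringLength as xs:string; this follows the README and has not been checked against the module source.

```xquery
import module namespace tika = "https://xq-tika";

(: Hypothetical path and limit: adjust both for your own data. :)
let $preview := tika:parse('c:\my-large-report.pdf', '10000')
return
  (: Report how much text was actually returned. :)
  string-length($preview)
```

The same second argument can be passed to parse-lines when a sequence of strings is preferred over a single string.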

Example

import module namespace tika = "https://xq-tika";
tika:parse('c:\my-word-document.doc')

Shout Out!

If you like what you see here, please star the repo and follow me on GitHub or LinkedIn.

Happy Parsing!
