Historical - Early 2000's hack to "fix" bad codes in UTF-8 encoded XML so that XML parsers will be able to parse it

zimeon/utf8conditioner


utf8conditioner
---------------

Takes UTF-8 input from stdin, writes processed UTF-8 to stdout
and errors/warnings to stderr. Developed as a tool for checking and
pre-processing UTF-8 encoded XML responses to OAI-PMH requests
(see https://www.openarchives.org/).

This program aims to `fix' bad codes in UTF-8 encoded XML so that
XML parsers will be able to parse it (albeit with some corruption
introduced by substitution of dummy characters in place of illegal
codes). Even though the data may need to be discarded as unreliable,
it may still be possible to extract any <resumptionToken> element to
see whether a harvest was complete or not.
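
The substitution idea described above can be sketched in a few lines of
ANSI C. This is an illustration only and is not taken from the
utf8conditioner source: the function names utf8_seq_len and
condition_utf8, the choice of '?' as the dummy character, and the exact
set of rejected codes (malformed or overlong sequences, surrogates, and
code points outside the XML 1.0 Char production) are all assumptions
made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length (1-4) of a well-formed UTF-8 sequence at s
 * that is also a legal XML 1.0 character, or 0 if the bytes are bad.
 * n is the number of bytes available from s. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned long cp;
    size_t len, i;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { len = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len > n) return 0;          /* sequence truncated at end of input */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }

    /* Reject overlong encodings. */
    if (len == 2 && cp < 0x80) return 0;
    if (len == 3 && cp < 0x800) return 0;
    if (len == 4 && cp < 0x10000UL) return 0;

    /* Reject codes beyond Unicode, surrogates, and non-XML characters. */
    if (cp > 0x10FFFFUL) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D) return 0;
    if (cp == 0xFFFE || cp == 0xFFFF) return 0;
    return len;
}

/* Hypothetical conditioner: copy n bytes from in to out, writing one '?'
 * for every byte that does not start an acceptable sequence.
 * out must have room for at least n bytes (output is never longer).
 * Returns the number of bytes written; *bad counts the substitutions. */
size_t condition_utf8(const unsigned char *in, size_t n,
                      unsigned char *out, size_t *bad)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL && bad != NULL);
    *bad = 0;
    while (i < n) {
        size_t len = utf8_seq_len(in + i, n - i);
        if (len == 0) {
            out[o++] = '?';         /* dummy character in place of bad byte */
            i++;
            (*bad)++;
        } else {
            memcpy(out + o, in + i, len);
            o += len;
            i += len;
        }
    }
    return o;
}
```

In this sketch a bad byte is replaced one-for-one, so the output buffer
never needs to be larger than the input, and a caller could report the
substitution count on stderr. The real program may substitute and report
differently.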


COMPILING

This program has been developed on Linux using gcc but should compile
on other systems with an ANSI C compiler.

On Linux with gcc:

> make                          [should build utf8conditioner]

> make test                     [will run several tests, should all say PASS]

> ./utf8conditioner -h          [will display help]

To use another compiler, change the CC = gcc line in the Makefile.


Simeon Warner, [email protected]
$Id: README,v 1.2 2005/10/25 23:18:23 simeon Exp $
