Standards-compliant library for parsing and serializing HTML documents and fragments in Python

agafonovdmitry/html5lib-python

html5lib is a pure-Python library for parsing HTML. It is designed to
conform to the HTML 5 specification, which formalizes the error-handling
algorithms of popular web browsers.

 = Installation =

html5lib is packaged with distutils. To install it, use:
 $ python setup.py install

 = Tests =

You may wish to check that your installation succeeded by running the
test suite. All the tests can be run by invoking runtests.py in the
html5lib/tests/ directory.

 = Usage =

Simple usage follows this pattern:

import html5lib

# Open in binary mode so html5lib can detect the character encoding itself.
with open("mydocument.html", "rb") as f:
    parser = html5lib.HTMLParser()
    document = parser.parse(f)
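
The browser-style error handling mentioned above can be seen by parsing
deliberately broken markup. The following is a minimal sketch, not
authoritative documentation: it assumes a recent html5lib release, in which
HTMLParser accepts a namespaceHTMLElements argument, builds an
xml.etree.ElementTree tree by default, and records problems in
parser.errors. Older releases may behave differently. The sample markup is
made up for illustration.

```python
import html5lib

# Illustrative malformed input: no doctype, and <p> and <b> are never closed.
markup = "<p>Hello <b>world"

# Assumption: namespaceHTMLElements=False keeps tag names short ("p" rather
# than "{http://www.w3.org/1999/xhtml}p") in the resulting ElementTree.
parser = html5lib.HTMLParser(namespaceHTMLElements=False)
document = parser.parse(markup)

# The parser supplies the missing <html>, <head> and <body> elements,
# as a browser would.
tags = [element.tag for element in document.iter()]
print(tags)

# Parse errors are collected on the parser instead of being raised.
for error in parser.errors:
    print(error)
```

Input that would be rejected by a strict XML parser still yields a complete
document tree here, and the recorded errors can be inspected afterwards if
the caller cares about them.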


More documentation is available in the docstrings or from
http://code.google.com/p/html5lib/wiki/UserDocumentation

 = Bugs =

Please report any bugs on the issue tracker:
http://code.google.com/p/html5lib/issues/list

 = Get Involved =

Contributions to code or documentation are actively encouraged. Submit
patches to the issue tracker or discuss changes on IRC in the #whatwg
channel on freenode.net.
