Update and expand "moving parts" doc

Sid532001 · Apr 15, 2017 · 637826f
1 parent c8fca0e
commit 637826f
Showing 1 changed file with 31 additions and 34 deletions.
diff --git a/doc/movingparts.rst b/doc/movingparts.rst
@@ -4,22 +4,25 @@ The moving parts
 html5lib consists of a number of components, which are responsible for
 handling its features.
 
+Parsing uses a *tree builder* to generate a *tree*, the in-memory representation of the document.
+Several tree representations are supported, as are translations to other formats via *tree adapters*.
+The tree may be translated to a token stream with a *tree walker*, from which :class:`~html5lib.serializer.HTMLSerializer` produces a stream of bytes.
+The token stream may also be transformed by use of *filters* to accomplish tasks like sanitization.
 
 Tree builders
 -------------
 
 The parser reads HTML by tokenizing the content and building a tree that
-the user can later access. There are three main types of trees that
-html5lib can build:
+the user can later access. html5lib can build three types of trees:
 
-* ``etree`` - this is the default; builds a tree based on ``xml.etree``,
+* ``etree`` - this is the default; builds a tree based on :mod:`xml.etree`,
  which can be found in the standard library. Whenever possible, the
  accelerated ``ElementTree`` implementation (i.e.
  ``xml.etree.cElementTree`` on Python 2.x) is used.
 
-* ``dom`` - builds a tree based on ``xml.dom.minidom``.
+* ``dom`` - builds a tree based on :mod:`xml.dom.minidom`.
 
-* ``lxml.etree`` - uses lxml's implementation of the ``ElementTree``
+* ``lxml`` - uses the :mod:`lxml.etree` implementation of the ``ElementTree``
  API. The performance gains are relatively small compared to using the
  accelerated ``ElementTree`` module.
 
@@ -31,21 +34,15 @@ You can specify the builder by name when using the shorthand API:
  with open("mydocument.html", "rb") as f:
     lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
 
-When instantiating a parser object, you have to pass a tree builder
-class in the ``tree`` keyword attribute:
+To get a builder class by name, use the :func:`~html5lib.treebuilders.getTreeBuilder` function.
 
-.. code-block:: python
-
- import html5lib
- parser = html5lib.HTMLParser(tree=SomeTreeBuilder)
- document = parser.parse("<p>Hello World!")
-
-To get a builder class by name, use the ``getTreeBuilder`` function:
+When instantiating a :class:`~html5lib.html5parser.HTMLParser` object, you must pass a tree builder class via the ``tree`` keyword argument:
 
 .. code-block:: python
 
  import html5lib
- parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
+ TreeBuilder = html5lib.getTreeBuilder("dom")
+ parser = html5lib.HTMLParser(tree=TreeBuilder)
  minidom_document = parser.parse("<p>Hello World!")
 
 The implementation of builders can be found in `html5lib/treebuilders/
@@ -55,17 +52,16 @@ The implementation of builders can be found in `html5lib/treebuilders/
 Tree walkers
 ------------
 
-Once a tree is ready, you can work on it either manually, or using
-a tree walker, which provides a streaming view of the tree. html5lib
-provides walkers for all three supported types of trees (``etree``,
-``dom`` and ``lxml``).
+In addition to manipulating a tree directly, you can use a tree walker to generate a streaming view of it.
+html5lib provides walkers for ``etree``, ``dom``, and ``lxml`` trees, as well as ``genshi`` `markup streams <https://genshi.edgewall.org/wiki/Documentation/streams.html>`_.
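
To make "a streaming view of the tree" concrete, here is a minimal, standard-library-only sketch of the idea. It is not html5lib's walker implementation: the ``walk`` function is hypothetical, and the token dictionaries are only modelled on the ``StartTag`` / ``Characters`` / ``EndTag`` token types that html5lib's token streams use.

```python
import xml.etree.ElementTree as ET


def walk(element):
    """Yield a flat stream of token dicts for an ElementTree element.

    Hypothetical helper for illustration; html5lib's real walkers also
    handle doctypes, comments, namespaces and void elements.
    """
    # Opening tag, with a copy of the element's attributes.
    yield {"type": "StartTag", "name": element.tag, "data": dict(element.attrib)}
    # Text that appears before the first child.
    if element.text:
        yield {"type": "Characters", "data": element.text}
    for child in element:
        # Recurse into the child, then emit the text that follows it.
        yield from walk(child)
        if child.tail:
            yield {"type": "Characters", "data": child.tail}
    yield {"type": "EndTag", "name": element.tag}


root = ET.fromstring("<p>Hello <b>World</b>!</p>")
tokens = list(walk(root))
```

Because the walker is a generator, a consumer can process tokens one at a time without holding a second copy of the document in memory.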
 
 The implementation of walkers can be found in `html5lib/treewalkers/
 <https://github.com/html5lib/html5lib-python/tree/master/html5lib/treewalkers>`_.
 
-Walkers make consuming HTML easier. html5lib uses them to provide you
-with has a couple of handy tools.
+html5lib provides a few tools for consuming token streams:
 
+* :class:`~html5lib.serializer.HTMLSerializer`, to generate a stream of bytes; and
+* filters, to manipulate the token stream.
 
 HTMLSerializer
 ~~~~~~~~~~~~~~
@@ -90,15 +86,14 @@ The serializer lets you write HTML back as a stream of bytes.
  '>'
  'Witam wszystkich'
 
-You can customize the serializer behaviour in a variety of ways, consult
-the :class:`~html5lib.serializer.htmlserializer.HTMLSerializer`
-documentation.
+You can customize the serializer behaviour in a variety of ways. Consult
+the :class:`~html5lib.serializer.HTMLSerializer` documentation.
 
 
 Filters
 ~~~~~~~
 
-You can alter the stream content with filters provided by html5lib:
+html5lib provides several filters:
 
 * :class:`alphabeticalattributes.Filter
  <html5lib.filters.alphabeticalattributes.Filter>` sorts attributes on
@@ -110,11 +105,11 @@ You can alter the stream content with filters provided by html5lib:
  the document
 
 * :class:`lint.Filter <html5lib.filters.lint.Filter>` raises
- ``LintError`` exceptions on invalid tag and attribute names, invalid
+ :exc:`AssertionError` exceptions on invalid tag and attribute names, invalid
  PCDATA, etc.
 
 * :class:`optionaltags.Filter <html5lib.filters.optionaltags.Filter>`
- removes tags from the stream which are not necessary to produce valid
+ removes tags from the token stream which are not necessary to produce valid
  HTML
 
 * :class:`sanitizer.Filter <html5lib.filters.sanitizer.Filter>` removes
@@ -125,9 +120,9 @@ You can alter the stream content with filters provided by html5lib:
 
 * :class:`whitespace.Filter <html5lib.filters.whitespace.Filter>`
  collapses all whitespace characters to single spaces unless they're in
- ``<pre/>`` or ``textarea`` tags.
+ ``<pre>`` or ``<textarea>`` elements.
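
The filters listed above share one shape: each takes a token stream and yields a (possibly modified) token stream. The sketch below illustrates that shape with a simplified whitespace collapser written as a plain generator. It is an assumption-laden illustration, not the code of ``html5lib.filters.whitespace``: the function name ``collapse_whitespace``, its ``preserve`` parameter, and the hand-written tokens are all hypothetical.

```python
import re


def collapse_whitespace(stream, preserve=("pre", "textarea")):
    """Wrap a token stream, collapsing whitespace runs in character tokens.

    Hypothetical sketch: text inside the elements named in ``preserve``
    is passed through untouched.
    """
    depth = 0  # number of currently open whitespace-preserving elements
    for token in stream:
        if token["type"] == "StartTag" and token["name"] in preserve:
            depth += 1
        elif token["type"] == "EndTag" and token["name"] in preserve and depth:
            depth -= 1
        elif token["type"] == "Characters" and depth == 0:
            # Copy the token rather than mutating the caller's dict.
            token = dict(token, data=re.sub(r"\s+", " ", token["data"]))
        yield token


# Hand-written tokens standing in for the output of a tree walker.
source = [
    {"type": "StartTag", "name": "p", "data": {}},
    {"type": "Characters", "data": "Hello \n\t World"},
    {"type": "EndTag", "name": "p"},
    {"type": "StartTag", "name": "pre", "data": {}},
    {"type": "Characters", "data": "keep   this"},
    {"type": "EndTag", "name": "pre"},
]
filtered = list(collapse_whitespace(source))
```

Since a filter both consumes and produces a token stream, filters can be chained by wrapping one around another.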
 
-To use a filter, simply wrap it around a stream:
+To use a filter, simply wrap it around a token stream:
 
 .. code-block:: python
 
@@ -142,9 +137,11 @@ To use a filter, simply wrap it around a stream:
 Tree adapters
 -------------
 
-Used to translate one type of tree to another. More documentation
-pending, sorry.
+Tree adapters can be used to translate between tree formats.
+Two adapters are provided by html5lib:
 
+* :func:`html5lib.treeadapters.genshi.to_genshi()` generates a `Genshi markup stream <https://genshi.edgewall.org/wiki/Documentation/streams.html>`_.
+* :func:`html5lib.treeadapters.sax.to_sax()` calls a SAX handler based on the tree.
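
As a rough picture of what "calls a SAX handler based on the tree" means, the following standard-library sketch replays a token stream as SAX events. It is a simplified stand-in, not the implementation of ``to_sax``: ``tokens_to_sax`` and ``Recorder`` are hypothetical names, the tokens are hand-written, and namespaces, doctypes and comments are ignored.

```python
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


def tokens_to_sax(tokens, handler):
    """Replay a token stream as SAX events on ``handler`` (hypothetical)."""
    handler.startDocument()
    for token in tokens:
        kind = token["type"]
        if kind == "StartTag":
            # Wrap the attribute dict in the standard SAX attributes type.
            attrs = AttributesImpl(token.get("data", {}))
            handler.startElement(token["name"], attrs)
        elif kind == "EndTag":
            handler.endElement(token["name"])
        elif kind == "Characters":
            handler.characters(token["data"])
    handler.endDocument()


class Recorder(ContentHandler):
    """Hypothetical handler that records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


recorder = Recorder()
tokens_to_sax(
    [
        {"type": "StartTag", "name": "p", "data": {}},
        {"type": "Characters", "data": "Hello World!"},
        {"type": "EndTag", "name": "p"},
    ],
    recorder,
)
```

Any ``ContentHandler`` subclass can take the place of ``Recorder``, which is what lets existing SAX-based code consume a parsed HTML document.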
 
 Encoding discovery
 ------------------
@@ -156,14 +153,14 @@ the following way:
 * The encoding may be explicitly specified by passing the name of the
  encoding as the encoding parameter to the
  :meth:`~html5lib.html5parser.HTMLParser.parse` method on
- ``HTMLParser`` objects.
+ :class:`~html5lib.html5parser.HTMLParser` objects.
 
 * If no encoding is specified, the parser will attempt to detect the
  encoding from a ``<meta>`` element in the first 512 bytes of the
  document (this is only a partial implementation of the current HTML
- 5 specification).
+ specification).
 
-* If no encoding can be found and the chardet library is available, an
+* If no encoding can be found and the :mod:`chardet` library is available, an
  attempt will be made to sniff the encoding from the byte pattern.
 
 * If all else fails, the default encoding will be used. This is usually