Updated documentation
Ville Tuulos committed Apr 8, 2009
1 parent 94d5178 commit d671495
Showing 5 changed files with 115 additions and 1 deletion.
4 changes: 3 additions & 1 deletion doc/faq.rst
@@ -55,7 +55,7 @@ also for development in general. It is highly recommended that you test
your functions first locally with :mod:`homedisco`, before running them
in the normal distributed Disco environment.

.. _reduceonly:
.. _outputtypes:

How can I output arbitrary Python objects in map and reduce, not only strings?
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -71,6 +71,8 @@ If you want to output arbitrary objects in your reduce function, set also
:func:`disco.core.result_iterator` to read results, set its *reader* parameter
to :func:`disco.func.object_reader`.

.. _reduceonly:

Do I always have to provide a function for map and reduce?
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
*Updated for Disco 0.2 which supports the reduce-only case*
1 change: 1 addition & 0 deletions doc/index.rst
@@ -14,6 +14,7 @@ Background
intro
overview
FAQ <faq>
releases
glossary

Getting started
11 changes: 11 additions & 0 deletions doc/py/core.rst
@@ -109,6 +109,17 @@ anymore. You can delete the unneeded job files as follows::

Returns a dictionary containing information about the job *name*.

.. method:: Disco.oob_get(name, key)

Returns an out-of-band value assigned to *key* for the job *name*.
The key-value pair was stored with a :func:`disco_worker.put` call
in the job *name*.

.. method:: Disco.oob_list(name)

Returns all out-of-band keys for the job *name*. Keys were stored by
the job *name* using the :func:`disco_worker.put` function.
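A minimal sketch of how the two methods above might be combined to fetch every out-of-band result of a finished job. The helper name ``collect_oob`` is hypothetical, ``disco`` is assumed to be a :class:`disco.core.Disco` instance, and ``oob_list`` is assumed to return an iterable of keys:

```python
def collect_oob(disco, job_name):
    """Return a dict mapping every OOB key of job_name to its value.

    Assumption: disco.oob_list(job_name) yields the keys and
    disco.oob_get(job_name, key) returns the value stored for one key.
    """
    return {key: disco.oob_get(job_name, key)
            for key in disco.oob_list(job_name)}
```

This issues one ``oob_get`` call per key, so it is only reasonable for the small amounts of metadata that OOB results are meant for.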

.. method:: Disco.wait(name[, poll_interval, timeout, clean])

Block until the job *name* has finished. Returns a list of URLs to the
74 changes: 74 additions & 0 deletions doc/py/disco_worker.rst
@@ -15,6 +15,80 @@ As job functions are imported to the :mod:`disco_worker` namespace
for execution, they can use functions in this module directly without
importing the module explicitly.

.. _oob:

Out-of-band results
-------------------
*(new in version 0.2)*

In addition to standard input and output streams, map and reduce tasks can
output results through an auxiliary channel called *out-of-band results* (OOB).
In contrast to the standard output stream, which is sequential, OOB results
can be accessed by unique keys.

Out-of-band results should not be used as a substitute for the normal output
stream. Each OOB key-value pair is saved to an individual file, which wastes
space when values are small and makes bulk random access inefficient.
Due to these limitations, OOB results are mainly suitable for outputting
statistics and other metadata about the actual results.

To prevent rogue tasks from overwhelming nodes with a large number of OOB
results, each task is allowed to output at most 1000 results (:func:`put` calls).
Hitting this limit is often a sign that you should use the normal output stream
for your results instead.
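One way to stay well below the limit is to collect a task's statistics in a dictionary and store them with a single call. The following is only a sketch: the helper ``put_stats`` and the ``stats:<partition>`` key scheme are hypothetical, the ``put`` callable stands in for :func:`put`, and it is assumed that a string is an acceptable OOB value:

```python
import json

def put_stats(put, partition, stats):
    """Store all statistics of one task under a single OOB key.

    put       -- callable with the (key, value) signature of put()
    partition -- integer identifying the task, e.g. this_partition()
    stats     -- dict of JSON-serializable statistics
    """
    # One key per task keeps keys unique within the job.
    key = "stats:%d" % partition
    # Assumption: the value is stored as a JSON string.
    put(key, json.dumps(stats, sort_keys=True))
    return key
```

The reader of the result then decodes the JSON string, so the number of :func:`put` calls per task stays at one regardless of how many counters are tracked.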

You cannot use OOB results as a communication channel between concurrent tasks.
Concurrent tasks need to be independent to preserve desirable fault-tolerance
and scheduling characteristics of the map/reduce paradigm. However, in the
reduce phase you can access OOB results produced in the preceding map phase.
Similarly you can access OOB results produced by other finished jobs, given
a job name.

You can retrieve OOB results outside tasks using the :meth:`disco.core.Disco.oob_list` and
:meth:`disco.core.Disco.oob_get` methods.

.. function:: put(key, value)

Stores an out-of-band result *value* with the key *key*. Key must be unique in
this job. Maximum key length is 256 characters. Only characters in the set
``[a-zA-Z_\-:0-9]`` are allowed in the key.
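The key constraints can be checked before calling :func:`put`. This is a minimal sketch: the helper ``is_valid_oob_key`` is hypothetical and not part of Disco, and rejecting the empty key is an assumption, since the text above does not state a minimum length:

```python
import re

# Allowed characters and maximum length as documented for put().
# Assumption: a key must contain at least one character.
_OOB_KEY = re.compile(r"[a-zA-Z_\-:0-9]{1,256}")

def is_valid_oob_key(key):
    """Return True if key uses only allowed characters and is 1-256 long."""
    return _OOB_KEY.fullmatch(key) is not None
```

Uniqueness within the job cannot be checked this way; including a task-specific component in the key is one way to avoid collisions.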

.. function:: get(key, [job])

Gets the out-of-band result assigned to the key *key*. The job name *job*
defaults to the current job.

Given the semantics of OOB results (see above), this means that the default
value is only good for the reduce phase which can access results produced
in the preceding map phase.


Utility functions
-----------------

.. function:: this_partition()

For a map task, returns an integer between *[0..nr_maps]* that identifies
the task. This value is mainly useful if you need to generate unique IDs
in each map task. There are no guarantees about how ids are assigned
for map tasks.

For a reduce task, returns an integer between *[0..nr_reduces]* that
identifies this partition. You can use a custom partitioning function to
assign key-value pairs to a particular partition.

.. function:: this_host()

Returns the hostname of the node that is currently executing the task.

.. function:: this_master()

Returns hostname and port of the disco master.

.. function:: this_inputs()

List of input files for this task.

.. function:: msg(message)

Sends the string *message* to the master for logging. The message is
26 changes: 26 additions & 0 deletions doc/releases.rst
@@ -0,0 +1,26 @@

Release notes
=============

Disco 0.2 (April 7th 2009)
--------------------------

New features
''''''''''''

- :ref:`oob`: A mechanism to produce auxiliary results in map/reduce tasks.
- Map writers, reduce readers and writers (see :meth:`disco.core.Disco.new_job`): Support for custom result formats and internal protocols.
- Support for arbitrary output types: :ref:`outputtypes`.
- Custom task initialization functions: See *map_init* and *reduce_init* in :meth:`disco.core.Disco.new_job`.
- Jobs without inputs, i.e. generator maps: See the `raw://` protocol in :meth:`disco.core.Disco.new_job`.
- Reduces without maps for efficient join and merge operations: See :ref:`reduceonly`.

Bugfixes
''''''''

- ``chunked = false`` mode produced incorrect input files for the reduce phase (commit db718eb6)
- Shell enabled for the disco master process (bug #7, commit 7944e4c8)
- Added warning about unknown parameters in ``new_job()`` (bug #8, commit db707e7d)
- Fix for sending invalid configuration data (bug #1, commit bea70dd4)
- Fixed missing ``msg``, ``err`` and ``data_err`` functions (commit e99a406d)
