Initial version of Cassandra connector documentation
tobrien authored and electrum committed Sep 9, 2014
1 parent befdb8e commit 8f15589
Showing 5 changed files with 171 additions and 166 deletions.
110 changes: 0 additions & 110 deletions presto-cassandra/README.md

This file was deleted.

1 change: 1 addition & 0 deletions presto-docs/src/main/sphinx/connector.rst
@@ -5,6 +5,7 @@ Connectors
.. toctree::
    :maxdepth: 1

    connector/cassandra
    connector/hive
    connector/jmx
    connector/sys
167 changes: 167 additions & 0 deletions presto-docs/src/main/sphinx/connector/cassandra.rst
@@ -0,0 +1,167 @@
===================
Cassandra Connector
===================

The Cassandra connector allows querying data stored in Cassandra.

Configuration
-------------

To configure the Cassandra connector, create a catalog properties file
``etc/catalog/cassandra.properties`` with the following contents,
replacing ``host1,host2`` with a comma-separated list of the Cassandra
nodes used to discover the cluster topology:

.. code-block:: none

    connector.name=cassandra
    cassandra.contact-points=host1,host2

You will also need to set ``cassandra.native-protocol-port`` if your
Cassandra nodes are not using the default port (9042).
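
As a sketch, a catalog file for a cluster whose nodes listen on a
non-default native protocol port could look like the following. The host
names and the port ``9043`` are placeholders chosen for illustration,
not values taken from a real deployment:

.. code-block:: none

    connector.name=cassandra
    cassandra.contact-points=host1,host2
    cassandra.native-protocol-port=9043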

Multiple Cassandra Clusters
^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can have as many catalogs as you need, so if you have additional
Cassandra clusters, simply add another properties file to ``etc/catalog``
with a different name (making sure it ends in ``.properties``). For
example, if you name the property file ``sales.properties``, Presto
will create a catalog named ``sales`` using the configured connector.
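
As an illustration, ``etc/catalog/sales.properties`` for a second cluster
might contain the following. The host names are hypothetical:

.. code-block:: none

    connector.name=cassandra
    cassandra.contact-points=sales-host1,sales-host2

Tables in that cluster would then be referenced through the ``sales``
catalog, in the form ``sales.<keyspace>.<table>``.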

Configuration Properties
------------------------

The following configuration properties are available:

================================================== ======================================================================
Property Name                                      Description
================================================== ======================================================================
``cassandra.contact-points``                       Comma-separated list of hosts in a Cassandra cluster. The Cassandra
                                                   driver will use these contact points to discover cluster topology.
                                                   At least one Cassandra host is required.

``cassandra.native-protocol-port``                 The Cassandra server port running the native client protocol
                                                   (defaults to ``9042``).

``cassandra.thrift-port``                          The Cassandra server port running the Thrift client protocol
                                                   (defaults to ``9160``).

``cassandra.limit-for-partition-key-select``       Limit of rows to read for finding all partition keys. If a
                                                   Cassandra table has more rows than this value, splits based on
                                                   token ranges are used instead. Note that for larger values you
                                                   may need to adjust read timeout for Cassandra.

``cassandra.max-schema-refresh-threads``           Maximum number of schema cache refresh threads. This property
                                                   corresponds to the maximum number of parallel requests.

``cassandra.schema-cache-ttl``                     Maximum time that information about a schema will be cached
                                                   (defaults to ``1h``).

``cassandra.schema-refresh-interval``              The schema information cache will be refreshed in the background
                                                   when accessed if the cached data is at least this old
                                                   (defaults to ``2m``).

``cassandra.consistency-level``                    Consistency levels in Cassandra refer to the level of consistency
                                                   to be used for both read and write operations. More information
                                                   about consistency levels can be found in the
                                                   `Cassandra consistency`_ documentation. This property defaults to
                                                   a consistency level of ``ONE``. Possible values include ``ALL``,
                                                   ``EACH_QUORUM``, ``QUORUM``, ``LOCAL_QUORUM``, ``ONE``, ``TWO``,
                                                   ``THREE``, ``LOCAL_ONE``, ``ANY``, ``SERIAL``, ``LOCAL_SERIAL``.

``cassandra.allow-drop-table``                     Set to ``true`` to allow dropping Cassandra tables from Presto
                                                   via :doc:`/sql/drop-table` (defaults to ``false``).

``cassandra.username``                             Username used for authentication to the Cassandra cluster.
                                                   This is a global setting used for all connections, regardless
                                                   of the user who is connected to Presto.

``cassandra.password``                             Password used for authentication to the Cassandra cluster.
                                                   This is a global setting used for all connections, regardless
                                                   of the user who is connected to Presto.
================================================== ======================================================================

.. _Cassandra consistency: http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
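
The following sketch combines several of the properties above in one
catalog file. Every value is illustrative only: the credentials are
placeholders, and ``QUORUM`` and ``30m`` are arbitrary choices rather
than recommendations:

.. code-block:: none

    connector.name=cassandra
    cassandra.contact-points=host1,host2
    cassandra.consistency-level=QUORUM
    cassandra.schema-cache-ttl=30m
    cassandra.allow-drop-table=true
    cassandra.username=presto_user
    cassandra.password=example_password

Because the username and password apply to every connection, all Presto
users share the same Cassandra credentials with a file like this.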

The following advanced configuration properties are available:

================================================== ======================================================================
Property Name                                      Description
================================================== ======================================================================
``cassandra.fetch-size``                           Number of rows fetched at a time in a Cassandra query.

``cassandra.fetch-size-for-partition-key-select``  Number of rows fetched at a time in a Cassandra query that
                                                   selects partition keys.

``cassandra.partition-size-for-batch-select``      Number of partitions batched together into a single select for a
                                                   single partition key column table.

``cassandra.split-size``                           Number of keys per split when querying Cassandra.

``cassandra.partitioner``                          Partitioner to use for hashing and data distribution. This
                                                   property defaults to ``Murmur3Partitioner``. The other supported
                                                   values are ``RandomPartitioner`` and ``ByteOrderedPartitioner``.

``cassandra.thrift-connection-factory-class``      Allows for the specification of a custom implementation of
                                                   ``org.apache.cassandra.thrift.ITransportFactory`` to be used to
                                                   connect to Cassandra using the Thrift protocol.

``cassandra.transport-factory-options``            Allows for the specification of arbitrary options to be passed to
                                                   the Thrift connection factory.

``cassandra.client.read-timeout``                  Number of milliseconds the Cassandra driver will wait for an
                                                   answer to a query from one Cassandra node. Note that the underlying
                                                   Cassandra driver may retry a query against more than one node in
                                                   the event of a read timeout. Increasing this may help with queries
                                                   that use an index.

``cassandra.client.connect-timeout``               Number of milliseconds the Cassandra driver will wait to establish
                                                   a connection to a Cassandra node. Increasing this may help with
                                                   heavily loaded Cassandra clusters.

``cassandra.client.so-linger``                     Number of seconds to linger on close if unsent data is queued.
                                                   If set to zero, the socket will be closed immediately.
                                                   When this option is non-zero, a socket will linger that many
                                                   seconds for an acknowledgement that all data was written to a
                                                   peer. This option can be used to avoid consuming sockets on a
                                                   Cassandra server by immediately closing connections when they
                                                   are no longer needed.
================================================== ======================================================================
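
Advanced properties go in the same catalog file as the basic ones. The
fragment below is a minimal sketch. The numeric values are assumptions
picked for illustration, not documented defaults or tuning advice:

.. code-block:: none

    cassandra.fetch-size=5000
    cassandra.split-size=1024
    cassandra.partitioner=Murmur3Partitioner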

Querying Cassandra Tables
-------------------------

The ``users`` table is an example Cassandra table from the Cassandra
`Getting Started`_ guide. It can be created along with the ``mykeyspace``
keyspace using Cassandra's cqlsh (CQL interactive terminal):

.. _Getting Started: https://wiki.apache.org/cassandra/GettingStarted

.. code-block:: none

    cqlsh> CREATE KEYSPACE mykeyspace
       ... WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
    cqlsh> USE mykeyspace;
    cqlsh:mykeyspace> CREATE TABLE users (
                  ...   user_id int PRIMARY KEY,
                  ...   fname text,
                  ...   lname text
                  ... );

This table can be described in Presto::

    DESCRIBE cassandra.mykeyspace.users;

.. code-block:: none

     Column  |  Type   | Null | Partition Key | Comment
    ---------+---------+------+---------------+---------
     user_id | bigint  | true | true          |
     fname   | varchar | true | false         |
     lname   | varchar | true | false         |
    (3 rows)

This table can then be queried in Presto::

    SELECT * FROM cassandra.mykeyspace.users;
4 changes: 2 additions & 2 deletions presto-docs/src/main/sphinx/connector/hive.rst
@@ -44,8 +44,8 @@ You can have as many catalogs as you need, so if you have additional
Hive clusters, simply add another properties file to ``etc/catalog``
with a different name (making sure it ends in ``.properties``). For
example, if you name the property file ``sales.properties``, Presto
will create a catalog named ``sales`` using the Hive connector. If
you are connecting to more than one Hive metastore you can create
will create a catalog named ``sales`` using the configured connector.
If you are connecting to more than one Hive metastore, you can create
any number of properties files configuring multiple instances of
the Hive connector.

55 changes: 1 addition & 54 deletions presto-docs/src/main/sphinx/installation/deployment.rst
@@ -231,60 +231,7 @@ contents to mount the ``jmx`` connector as the ``jmx`` catalog:
    connector.name=jmx

Hive
""""

Presto includes Hive connectors for multiple versions of Hadoop:

* ``hive-hadoop1``: Apache Hadoop 1.x
* ``hive-hadoop2``: Apache Hadoop 2.x
* ``hive-cdh4``: Cloudera CDH 4
* ``hive-cdh5``: Cloudera CDH 5

Create ``etc/catalog/hive.properties`` with the following contents
to mount the ``hive-cdh4`` connector as the ``hive`` catalog,
replacing ``hive-cdh4`` with the proper connector for your version
of Hadoop and ``example.net:9083`` with the correct host and port
for your Hive metastore Thrift service:

.. code-block:: none

    connector.name=hive-cdh4
    hive.metastore.uri=thrift://example.net:9083

If your Hive metastore references files stored on a federated HDFS,
or if your HDFS cluster requires other non-standard client options
to access it, add this property to reference your HDFS config files:

.. code-block:: none

    hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

Note that Presto configures the HDFS client automatically for most
setups and does not require any configuration files. Only specify
additional configuration files if absolutely necessary. We also
recommend minimizing the configuration files to have the minimum set
of required properties, as additional properties may cause problems.

You can have as many catalogs as you need, so if you have additional
Hive clusters, simply add another properties file to ``etc/catalog``
with a different name (making sure it ends in ``.properties``).

Cassandra
"""""""""

Create ``etc/catalog/cassandra.properties`` with the following contents
to mount the ``cassandra`` connector as the ``cassandra`` catalog,
replacing ``host1,host2`` with a comma-separated list of the Cassandra
nodes used to discover the cluster topology:

.. code-block:: none

    connector.name=cassandra
    cassandra.contact-points=host1,host2

You will also need to set ``cassandra.native-protocol-port`` if your
Cassandra nodes are not using the default port (9142).
See :doc:`/connector` for more information about configuring connectors.

.. _running_presto:
