Skip to content

Latest commit

 

History

History
104 lines (64 loc) · 5.67 KB

sql-gremlin.md

File metadata and controls

104 lines (64 loc) · 5.67 KB

SQL Support and Limitations

The SQL-Gremlin compiler turns your SQL queries into Gremlin traversals and runs them against your favorite TinkerPop 3 enabled graph database.

Basic Query Form

The driver current support SELECT statements of the general form:

SELECT [ DISTINCT ] { * | <projectItem> [, <projectItem> ]* }
   FROM <tableExpression>
   [ WHERE <booleanExpression> ]
   [ GROUP BY { <column> [, <column> ]* } ]
   [ HAVING <booleanExpression> ]
   [ ORDER BY { <column> [ DESC ] [, <column> [ DESC ] ]* } ]
   [ LIMIT limitNumber ]

projectItem:
    [ agg ]* [ <table>. ]* <column> [ [ AS ] columnAlias ]

Queries without a FROM clause or only using VALUES in the FROM clause are not supported.

A tableExpression must specify 1 or more tables as a comma separated list or using JOIN keywords. See the Joins section for more information on supported join operations.

A projectItem in SELECT can be a reference to a column, or aggregation expression using the supported aggregation functions listed in Operators and Functions. A booleanExpression is the same but must resolve to a boolean value.

To order by a value, it must be part of the SELECT list. Group by and order by using column aliases is currently not supported.

Type Conversion is not supported, and thus CAST is not supported. Set operations UNION, INTERSECT and EXCEPT are not supported. Grouping operations using CUBE, ROLLUP or GROUPING SETS are not supported. Ordering using NULLS FIRST and NULLS LAST or by referencing column ordinals is not supported.

Identifiers

Identifiers are the names of tables, columns, and column aliases in an SQL query.

Quoting is optional but unquoted identifiers must start with a letter and can only contain letters, digits, and underscores. Quoted identifiers start and end with double quotes. They may contain virtually any character. To include a double quote in an identifier, use another double quote to escape it. The maximum identifier length, quoted or unquoted, is 128 characters.

Identifier matching is case-sensitive and identifiers that match a reserved SQL keyword must be quoted or use fully qualified names.

Joins

The driver only support INNER JOIN on two vertices that are connected by an edge. When looking at vertices you will see <edge_label>_IN_ID or <edge_label>_OUT_ID. Basic literal comparisons are supported in WHERE and HAVING clauses in conjunction with the INNER JOIN clause.

Foreign keys are not generated and not exposed in JDBC metadata at this time.

Data Types

The driver recognizes the following SQL data types:

  • BOOLEAN - Boolean literals must be TRUE or FALSE
  • TINYINT
  • SMALLINT
  • INTEGER or INT
  • BIGINT
  • DECIMAL
  • REAL or FLOAT
  • CHAR and VARCHAR - String literal must be enclosed in single-quotes.
  • DATE

Operators and Functions

Supported operators are listed below. Arithmetic, string, conditional and date operators and functions are currently not support.

Comparison Operators

  • value1 <op> value2 where op is one of : =, <>, <, >, <= or >=
  • NOT in conjunction with a comparison operator is not currently supported

Logical Operators

  • boolean1 OR boolean2
  • boolean1 AND boolean2
  • NOT boolean2 is supported only when used with a boolean valued column in WHERE/HAVING filters
  • NOT boolean2 is supported with expressions and boolean values in SELECT clause

Aggregate Functions

  • AVG(numeric)
  • COUNT(*)
  • MAX(value)
  • MIN(value)
  • SUM(numeric)

Currently, COUNT( [DISTINCT] numeric), COUNT( [DISTINCT] value), and SUM( [DISTINCT] numeric) are not supported.

Schema Collection

Property graph databases are schemaless, however SQL requires that the connected database has schema. Because of this, sql-gremlin collects what is effectively the schema of the graph. This is done by sampling the graph, collecting all vertex and edge labels.

The vertex and edge labels become the tables in the graph. The properties of the vertices and edges become the columns in the graph. In addition to the properties of the vertices and edges, connections between vertices and edges are collected and become columns.

This means that a vertex with an edge labelled as MY_EDGE that goes into it will have MY_EDGE_IN_ID as a column, and a vertex with an edge labelled as MY_OTHER_EDGE that goes out of it will have MY_OTHER_EDGE_OUT_ID as a column.

MY_EDGE will have the vertex is goes into as <vertex_label>_IN_ID as a column and MY_OTHER_VERTEX will have the vertex it goes out of as <vertex_label>_OUT_ID as a column.

Edges and columns also have their own id as a column with the name <label>_ID.

Additional Limitations

Currently, JDBC driver supports Tableau Data Extracts (TDE) and has limitations which may prevent or significantly limit functionality when using Live Connection in Tableau.

The driver does not support Convert to Custom SQL in Tableau, due to Tableau’s use of embedded SQL. If one wishes to issue SQL commands against a live server, DbVisualizer does provide such functionality.

Acknowledgements

Special thanks goes to the Apache TinkerPop and Apache Calcite teams. The depth and breadth of both of these projects is truly astounding. Also, thanks to Daniel Kuppitz. His work on SPARQL-Gremlin served as a model and inspiration for SQL-Gremlin.