                      THE VANILLADB DATABASE SYSTEM
                  General Information and Instructions

VanillaDB is an enhanced version of SimpleDB.

This document contains the following sections:
    I.    SimpleDB 2.10 Release Notes (by Edward Sciore)
    II.   Server Installation
    III.  Running the Server
    IV.   Running Client Programs
    V.    VanillaDB Limitations
    VI.   The Organization of the Server Code
    VII.  Test Suite for VanillaDB
    VIII. Enhancements based on SimpleDB (by NetDB)


I. SimpleDB 2.10 Release Notes (by Edward Sciore):
   SimpleDB's web site:
   	* www.cs.bc.edu/~sciore/simpledb/intro.html
   
   This release of the SimpleDB system is Version 2.10, which was
   uploaded on January 1, 2013. This release provides the following
   fixes to Version 2.9:

    * The files simpledb.Startup and remote.SimpleDriver have been changed 
      to use a server-specific registry, instead of forcing the user to 
      run rmiregistry as a separate process.
    * A bug was fixed in the file SortScan.java.
    * The new client file StudentMajorNoServer was added.

   SimpleDB is distributed in a WinZip-formatted file. This file contains
   four items:

    * The folder simpledb, which contains the server-side Java code.
    * The folder javadoc, which contains the JavaDoc documentation 
      of the above code.
    * The folder studentClient, which contains some client-side code 
      for an example database.
    * This document.

   The author welcomes all comments, including bug reports, suggestions
   for improvement, and anecdotal experiences.  His email address is:
   [email protected]
  
II. Installation Instructions:

  1)  Install the Java 1.6 or 1.5 SDK. Java 1.6 is obviously preferable.

  2)  If you do install Java 1.5, you need to make some minor changes 
      to the package org.vanilladb.core.remote.jdbc:
    
      * The classes named xxxAdapter provide default implementations of 
        the interfaces in java.sql. Java 1.6 added several extra methods 
        to these interfaces. If you are using Java 1.5, just comment out 
        those methods. (You can tell which ones they are because you'll 
        get an error when you try to compile them.)
      
      * The classes named VanillaDbXXX call the SQLException constructor
        with a Throwable argument.  This constructor was introduced in
        Java 1.6. To compile under Java 1.5, rewrite the code
        "throw new SQLException(e)" to be
        "throw new SQLException(e.getMessage())".

  3)  Decide where you want the server-side software to go. Let's assume 
      that the code will go in the folder C:\javalib in Windows, or the 
      folder ~/javalib in UNIX or MacOS.

  4)  Add that folder to your classpath. In other words, the javalib 
      folder must be mentioned in your CLASSPATH environment variable.
    
      * In UNIX, your home directory has an initialization file,
        typically called .bashrc.  If the file does not set CLASSPATH,
        add the following line to the file:
               export CLASSPATH=.:$HOME/javalib

        Here, the ":" character separates folder names.  The command
        therefore says that the folder "." (i.e., your current directory)
        and "$HOME/javalib" (i.e., ~/javalib) are to be searched whenever
        Java needs to find a class.  The export keyword makes the setting
        visible to programs, such as java, that are started from the
        shell.  If the file already contains a CLASSPATH setting, modify
        it to include the javalib directory.
 
      * In Windows, you must set the CLASSPATH variable via the System 
        control panel.  From that control panel, choose the advanced tab 
        and click on the environment variables button.  You want to have 
        a user variable named CLASSPATH that looks like this:
               .;C:\javalib
               
        Here, the ";" character separates the two folder names.

  5)  Copy the vanilladb folder from the distribution file to that
      folder. Within the vanilladb folder should be subfolders 
      containing all of the code for VanillaDB.
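
  The Java 1.5 rewrite described in step 2 can be sketched as follows.
  SqlExceptionCompat and wrap are hypothetical names used only for this
  illustration. Because Throwable.initCause has existed since Java 1.4,
  the original exception can still be attached as the cause even when the
  message-only constructor is used, so no diagnostic information is lost.

```java
import java.sql.SQLException;

// Hypothetical helper showing a Java 1.5-compatible way to wrap an exception.
class SqlExceptionCompat {

    // Java 1.6 only:  throw new SQLException(e);
    // Java 1.5:       throw new SQLException(e.getMessage());
    // initCause() exists since Java 1.4, so the cause can still be attached.
    static SQLException wrap(Exception e) {
        SQLException sqle = new SQLException(e.getMessage());
        sqle.initCause(e);
        return sqle;
    }

    public static void main(String[] args) {
        try {
            throw wrap(new IllegalStateException("server unreachable"));
        } catch (SQLException sqle) {
            System.out.println(sqle.getMessage() + " / cause: " + sqle.getCause());
        }
    }
}
```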


III. Running the Server:

  VanillaDB has a client-server architecture. You run the server code on 
  a host machine, where it will sit and wait for connections from clients.
  It is able to handle multiple simultaneous requests from clients, 
  each on possibly different machines. You can then run a client program
  from any machine that is able to connect to the host machine.

  To run the VanillaDB server, run Java on the class
  org.vanilladb.core.server.StartUp. You must pass in the name of a folder
  that VanillaDB will use to hold the database. For example, in Windows,
  if you execute the command:
      
         > start java org.vanilladb.core.server.StartUp studentdb
             
  then the server will run in a new window, using studentdb as the
  database folder. You can execute this command from any directory;
  the server will always use the studentdb folder that exists in your
  home directory. If a folder with that name does not exist, then
  one will be created automatically.
 
  If everything is working correctly, when you run the server with a
  new database folder the following will be printed in the server 
  window:

      creating new database
      new transaction: 1
      transaction 1 committed
      database server ready

  If you run the server with an existing database folder, the following
  will be printed instead:

      recovering existing database
      database server ready

  In either case, the server will then sit awaiting connections from
  clients.  As connections arrive, the server will print additional
  messages in its window.


IV. Running Client Programs 

  The VanillaDB server accepts connections from any JDBC client. The client
  program makes its connection via the following code:
            Driver d = new VanillaDbDriver();
            String host = "mymachine.com"; //any DNS name or IP address
            String url = "jdbc:vanilladb://" + host;
            Connection conn = d.connect(url, null);

  Note that VanillaDB does not require a username and password, although
  it is easy enough to modify the server code to do so.

  The driver class VanillaDbDriver is contained in the package 
  org.vanilladb.core.remote.jdbc, along with the other classes that 
  it needs. A client program will not run unless this package is in its
  classpath. Note that you could install the entire VanillaDB server code
  on a client machine, but that is overkill. All you need is
  org.vanilladb.core.remote.jdbc.
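
  Putting the pieces of this section together, a minimal client might look
  like the sketch below. It calls only methods that appear in the VanillaDB
  JDBC list in Section V. Assumptions: the server runs on localhost, the
  database has a student table with an sname field (as in the studentClient
  example database), and StudentNames and collect are hypothetical names.
  The driver is loaded by name so that the file compiles without VanillaDB
  on the build path; with org.vanilladb.core.remote.jdbc on the classpath,
  "new VanillaDbDriver()" works just as well.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class StudentNames {

    // JDBC URL format expected by the VanillaDB driver.
    static String urlFor(String host) {
        return "jdbc:vanilladb://" + host;
    }

    // Runs a query and collects one string field from every row.
    // Statement.close is not in VanillaDB's supported list, so only
    // the ResultSet is closed here.
    static List<String> collect(Connection conn, String qry, String field)
            throws SQLException {
        List<String> values = new ArrayList<String>();
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(qry);
        try {
            while (rs.next())
                values.add(rs.getString(field));
        } finally {
            rs.close();
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Same as "new VanillaDbDriver()" when the driver is on the classpath.
        Driver d = (Driver) Class
                .forName("org.vanilladb.core.remote.jdbc.VanillaDbDriver")
                .getDeclaredConstructor().newInstance();
        Connection conn = d.connect(urlFor("localhost"), null);
        try {
            for (String name : collect(conn, "select sname from student", "sname"))
                System.out.println(name);
        } finally {
            conn.close();
        }
    }
}
```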

V. VanillaDB Limitations

  VanillaDB is a teaching tool. It deliberately implements a tiny subset
  of SQL and JDBC, and (for simplicity) imposes restrictions not present
  in the SQL standard.  Here we briefly indicate these restrictions.


  VanillaDB SQL
  
  A query in VanillaDB consists only of select-from-where clauses in which
  the select clause contains a list of fieldnames (without the AS 
  keyword), and the from clause contains a list of tablenames (without
  range variables).
 
  The where clause is optional. The only Boolean operator is and. Unlike
  standard SQL, there are no other Boolean operators and no parentheses.
  The group by and order by clauses are supported, as is a subset of the
  aggregation functions. Arithmetic expressions are supported only in
  update commands.

  Views can be created, but a view definition can be at most 100 characters.
 
  Because there is no renaming, the field names of the tables in a query
  must be disjoint (no two tables in a query may share a field name).
  Other restrictions:

    * The "*" abbreviation in the select clause is not supported.
    * There are no null values.
    * There are no explicit joins or outer joins in the from clause.
    * The union and except keywords are not supported.
    * Insert statements take explicit values only, not queries.
    * Update statements can have only one assignment in the set clause.
	
  VanillaDB Syntax
  
    // Predicate
    <Field>              := IdTok
    <Constant>           := StrTok | NumericTok
    <Expression>         := <Field> | <Constant>
    <BinaryArithmeticExpression>
                         := ADD(<Expression>, <Expression>) |
                            SUB(<Expression>, <Expression>) |
                            MUL(<Expression>, <Expression>) |
                            DIV(<Expression>, <Expression>)
    <Term>               := <Expression> = <Expression>  |
                            <Expression> > <Expression>  |
                            <Expression> >= <Expression> |
                            <Expression> < <Expression>  |
                            <Expression> <= <Expression>
    <Predicate>          := <Term> [ AND <Predicate> ]

    // Query
    <Query>              := SELECT <ProjectSet> FROM <TableSet>
                            [ WHERE <Predicate> ] [ GROUP BY <IdSet> ]
                            [ ORDER BY <SortList> [ DESC | ASC ] ]
    <IdSet>              := <Field> [ , <IdSet> ]
    <TableSet>           := IdTok [ , <TableSet> ]
    <AggFn>              := AVG(<Field>) | COUNT(<Field>) |
                            COUNT(DISTINCT <Field>) | MAX(<Field>) |
                            MIN(<Field>) | SUM(<Field>)
    <ProjectSet>         := <Field> [ , <ProjectSet> ] |
                            <AggFn> [ , <ProjectSet> ]
    <SortList>           := <Field> [ , <SortList> ] |
                            <AggFn> [ , <SortList> ]

    // Update
    <UpdateCmd>          := <Insert> | <Delete> | <Modify> | <Create>
    <Create>             := <CreateTable> | <CreateView> | <CreateIndex>
    <Insert>             := INSERT INTO IdTok ( <FieldList> )
                            VALUES ( <ConstantList> )
    <FieldList>          := <Field> [ , <FieldList> ]
    <ConstantList>       := <Constant> [ , <ConstantList> ]
    <Delete>             := DELETE FROM IdTok [ WHERE <Predicate> ]
    <Modify>             := UPDATE IdTok SET <ModifyTermList>
                            [ WHERE <Predicate> ]
    <ModifyExpression>   := <Expression> | <BinaryArithmeticExpression>
    <ModifyTermList>     := <Field> = <ModifyExpression> [ , <ModifyTermList> ]
    <CreateTable>        := CREATE TABLE IdTok ( <FieldDefs> )
    <FieldDefs>          := <FieldDef> [ , <FieldDefs> ]
    <FieldDef>           := IdTok <TypeDef>
    <TypeDef>            := INT | LONG | DOUBLE | VARCHAR ( NumericTok )
    <CreateView>         := CREATE VIEW IdTok AS <Query>
    <CreateIndex>        := CREATE INDEX IdTok ON IdTok ( <Field> )
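
  As an illustration, the statements below are written to match the
  grammar above: a table definition, insertions, an update that uses ADD,
  an index, and a query with a conjunctive predicate, an aggregation,
  group by, and order by ... desc. This is a sketch that has not been run
  against a server; the student table, its fields, and the GrammarExamples
  class are assumptions made for illustration. runAll sends the statements
  through the two Statement methods VanillaDB supports.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class GrammarExamples {

    // <CreateTable>, <Insert>, <Modify> with a binary arithmetic
    // expression, and <CreateIndex> (the index name uses '_').
    static final String[] UPDATES = {
        "CREATE TABLE student (sid INT, sname VARCHAR(10), majorid INT, gradyear INT)",
        "INSERT INTO student (sid, sname, majorid, gradyear) VALUES (1, 'joe', 10, 2004)",
        "INSERT INTO student (sid, sname, majorid, gradyear) VALUES (2, 'amy', 20, 2004)",
        "UPDATE student SET gradyear = ADD(gradyear, 1) WHERE sid = 1",
        "CREATE INDEX idx_sid ON student (sid)"
    };

    // <Query> with AND, an aggregation function, GROUP BY, and ORDER BY ... DESC.
    static final String QUERY =
        "SELECT majorid, COUNT(sid) FROM student WHERE gradyear >= 2004 AND sid > 0 "
        + "GROUP BY majorid ORDER BY majorid DESC";

    // Executes every update command, then the query; returns the total update count.
    static int runAll(Statement stmt) throws SQLException {
        int affected = 0;
        for (String cmd : UPDATES)
            affected += stmt.executeUpdate(cmd);
        ResultSet rs = stmt.executeQuery(QUERY);
        rs.close();
        return affected;
    }
}
```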
  
  VanillaDB JDBC
  
  VanillaDB implements only the following JDBC methods:

   Driver

      public Connection connect(String url, Properties prop);
      // The method ignores the contents of variable prop.

   Connection

      public Statement createStatement();
      public void      close();
      public void      setAutoCommit(boolean autoCommit);
      public void      setReadOnly(boolean readOnly);
      public void      setTransactionIsolation(int level);
      public boolean   getAutoCommit();
      public int       getTransactionIsolation();
       
   Statement

      public ResultSet executeQuery(String qry);
      public int       executeUpdate(String cmd);

   ResultSet

      public boolean   next();
      public int       getInt(String fldname);
      public String    getString(String fldname);
      public void      close();
      public ResultSetMetaData getMetaData();

   ResultSetMetaData

      public int        getColumnCount();
      public String     getColumnName(int column);
      public int        getColumnType(int column);
      public int        getColumnDisplaySize(int column);
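
  Because ResultSet supports getMetaData, a client can print the result
  of an arbitrary query without knowing its schema in advance. The sketch
  below (ResultPrinter is a hypothetical name) uses only the metadata
  methods listed above. It assumes that getColumnType reports java.sql.Types
  codes, as in SimpleDB, and that every column is either an INTEGER, read
  with getInt, or string-valued, read with getString; LONG or DOUBLE columns
  would need accessors that are not in the list above.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

class ResultPrinter {

    // Formats every row of rs as fixed-width columns, header line first.
    static List<String> format(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int n = md.getColumnCount();
        List<String> lines = new ArrayList<String>();

        StringBuilder header = new StringBuilder();
        for (int i = 1; i <= n; i++)   // JDBC column numbers start at 1
            header.append(pad(md.getColumnName(i), md.getColumnDisplaySize(i)));
        lines.add(header.toString());

        while (rs.next()) {
            StringBuilder row = new StringBuilder();
            for (int i = 1; i <= n; i++) {
                String fld = md.getColumnName(i);
                String value = (md.getColumnType(i) == Types.INTEGER)
                        ? Integer.toString(rs.getInt(fld))
                        : rs.getString(fld);
                row.append(pad(value, md.getColumnDisplaySize(i)));
            }
            lines.add(row.toString());
        }
        rs.close();
        return lines;
    }

    // Left-justifies s in a field one character wider than width.
    private static String pad(String s, int width) {
        return String.format("%-" + (width + 1) + "s", s);
    }
}
```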



VI. The Organization of the Server Code

  VanillaDB is usable without knowing anything about what the code looks
  like. However, the entire point of the system is to make the code
  easy to read and modify.  The basic packages in VanillaDB are structured
  hierarchically, in the following order:

    * storage.file (Manages OS files as a virtual disk.)
    * storage.log (Manages the log.)
    * storage.buffer (Manages a buffer pool of pages in memory that acts
                      as a cache of disk blocks.)
    * storage.tx (Implements transactions with multi-granularity locking.
                  Does concurrency control and logging.)
    * storage.record (Implements fixed-length records inside of pages.)
    * storage.metadata (Maintains metadata in the system catalog.)
    * query.algebra (Implements relational algebra operations.  Each 
                     operation has a plan class, used by the planner, and
                     a scan class, used at runtime.)
    * query.parse (Implements the parser.)
    * query.planner (Implements a naive planner for SQL statements.)
    * sql (Implements the supported SQL and constants.)
    * remote (Implements the server using RMI.)
    * server (The place where the startup and initialization code lives.
              The class StartUp contains the main method.)

  The basic server is exceptionally inefficient.  The following packages
  enable more efficient query processing:

    * storage.index (Implements static hash and btree indexes.)
    * query.algebra.index (Implements relational algebra operations that
                           take advantage of indexes.)
    * query.algebra.materialize (Implements the relational operators
                                 materialize, sort, groupby, and
                                 mergejoin.)
    * query.algebra.multibuffer (Implements modifications to the sort and
                                 product operators, in order to make optimum
                                 use of available buffers.)
    * query.planner.opt (Implements a heuristic query optimizer.)
    * query.planner.index (Implements an index-based update planner.)
 
   The textbook "Database Design and Implementation" describes the original
   SimpleDB packages in considerably more detail. For further information, go
   to the URL www.wiley.com/college/sciore
   
VII. Test Suite for VanillaDB

  The Java files in this archive constitute a test suite for the VanillaDB
  database system.  The suite runs on the server machine, and embeds the
  VanillaDB server code.  Thus you don't have to have the RMI registry 
  running; you just need to have the vanilladb code in your classpath.
  You then just run the test code.
  
  This test suite is particularly useful if you are modifying the VanillaDB
  code.  After making changes, you can run the tests to ensure that you
  did not break anything.  These tests have not been made publicly 
  available, because instructors may want to assign the creation of some
  of the test code to their students.  By making the code available only
  to instructors, they can choose which portions of it (if any) to
  release to their students.
  
  The test suite contains a class for each VanillaDB package, each having 
  a method named "test".  That method runs various diagnostic tests.
  The test will print an error message if it detects something wrong.
  (Error messages are output lines prepended by the characters "*****".)
  
  The main method of the suite is in the class VanillaDbTestSuite. It opens 
  a database named "testvanilladb", and calls each package's test method.  
  
  These test methods are somewhat rudimentary.  There are many features  
  that are not tested.  Improvements are welcome.  Please send email to
  [email protected].
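
  Because the suite itself is not public, the following is only a
  hypothetical sketch of the structure described in this section: one test
  class per package exposing a test method, a driver that calls each in
  turn, and failures reported as output lines starting with "*****". The
  names PackageTest, Reporter, and SuiteSketch are invented for
  illustration, and the sketch omits opening the testvanilladb database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one implementation per VanillaDB package under test.
interface PackageTest {
    void test(Reporter r);
}

// Records and prints failures in the "*****" format described above.
class Reporter {
    final List<String> errors = new ArrayList<String>();

    void check(boolean ok, String what) {
        if (!ok) {
            String line = "***** " + what;
            errors.add(line);
            System.out.println(line);
        }
    }
}

class SuiteSketch {

    // Runs each package's test in order and returns every error line.
    static List<String> runAll(List<PackageTest> tests) {
        Reporter r = new Reporter();
        for (PackageTest t : tests)
            t.test(r);
        return r.errors;
    }

    public static void main(String[] args) {
        List<PackageTest> tests = new ArrayList<PackageTest>();
        tests.add(new PackageTest() {   // stand-in for a real package test
            public void test(Reporter r) {
                r.check("abc".length() == 3, "string length");
            }
        });
        System.out.println(runAll(tests).isEmpty() ? "all tests passed" : "failures found");
    }
}
```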

VIII. Enhancements based on SimpleDB (by NetDB)
  
  In VanillaDB, NetDB has implemented some enhancements to improve SimpleDB
  as summarized below:
  
    
  Enhancements:
  	
    > File level
       * FileMgr implements the "atomic write failure" handling required by
         LogMgr: each write to a block is written to disk twice,
         sequentially, with a checksum, so one copy can be used to recover
         the other
       * Rename Block to BlockId (and update the API docs accordingly,
         especially those in BasicBufferMgr/BufferMgr/Buffer)
       * Add long variable type
         
    > Buffer level
       * Decouple buffer information from the transaction package; track
         the buffers pinned by each tx in BufferMgr
       * Synchronize access to buffers
       * Track the dirty buffers modified by each tx in BufferMgr
       * Repin the buffers held by a tx when a pin request times out
       * Add long variable type
       * Refine findExistingBuffer
       
    > Transaction level
       * Transaction only maintains its lifecycle
       * Add interfaces TransactionLifecycleListener and
         TransactionStartListener for RecoveryMgr, ConcurrencyMgr and BufferMgr
       * Transaction package across all packages
       
    > Concurrency level 
       * Implement TransactionLifecycleListener
       * Multi-granularity locking and index locking
       * LockTable is responsible for checking the compatibility of
         different lock types and maintains the locks currently held by
         each tx
        
    > Recovery level
       * Implement TransactionLifecycleListener
       * Add SetLongRecord to support long type
      
    > Record level
       * Access API from buffer level directly
       * Rename RID to RecordId (and update the API docs accordingly)
       * Defer opening a file until a client actually accesses a record
      
    > Index level
       * Access API from buffer level directly 
       * Index crabbing  
       * Btree index supports range queries
      
    > Query level
       * Implement aggregation functions
         * SUM, AVG
         * MAX, MIN
         * COUNT DISTINCT
         * COUNT
       * SortPlan supports ASC and DESC sort orders
       * Add binary arithmetic on two expressions as an expression type
         * ADD(e1, e2)
         * SUB(e1, e2)
         * DIV(e1, e2)
         * MUL(e1, e2)
       * Support different types of Term
         * term > term
         * term >= term
         * term < term
         * term <= term
       * Refine the reduction factor methods of Predicate and Term to support
         SelectPlan estimation (change the factor type from int to double)
         * valueReductionFactor
         * recordReductionFactor
       * Add BigIntConstant, DoubleConstant
       * Add Constant arithmetic operations
       * SelectPlan supports range queries
       * Add a preprocessingCost method to all plans (though the current
         planner only takes recordOutput into account when planning)
      	      	
    > Parse level
       * Support binary arithmetic on two expressions in update commands
       * Extend predicates to support >, >=, <=, and < terms
       * Order by ASC and DESC
       * Aggregation functions and group by
       * Parse double constants
       * Treat '_' as a word character when parsing
      
    > Metadata level
       * Add sample-based histogram
       * Separate statistics info from table/index info and add
         TableStatInfo/IndexStatInfo
    
    > Planner level
       * Support group by and order by operations      
    
    > Remote
       * Add the JDBC method setReadOnly() (an exception is thrown if an
         update is executed) to support read-only TableScan (query only)
       * Add setTransactionIsolation() and setAutoCommit()
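
  The File-level item on atomic writes can be illustrated with an
  in-memory sketch: each block is written twice in sequence, each copy
  carrying a CRC32 checksum, and a read returns the first copy whose
  checksum verifies. If a crash tears the first copy, the second still
  holds the previous contents; if it tears the second, the first already
  holds the new contents. Either way the write is all-or-nothing. This is
  not VanillaDB's FileMgr code: DoubleWriteDisk, the two-copy layout, and
  the choice of CRC32 are assumptions made for illustration.

```java
import java.util.zip.CRC32;

// Hypothetical in-memory model of a block written twice with checksums.
class DoubleWriteDisk {
    private final byte[][] copies = new byte[2][];   // copy 0 is written first
    private final long[] sums = new long[2];

    private static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Writes one copy together with its checksum.
    void writeCopy(int i, byte[] data) {
        copies[i] = data.clone();
        sums[i] = checksum(data);
    }

    // A full block write: copy 0, then copy 1, strictly in sequence.
    void write(byte[] data) {
        writeCopy(0, data);
        writeCopy(1, data);
    }

    // Returns the first copy whose checksum verifies.
    byte[] read() {
        for (int i = 0; i < 2; i++) {
            if (copies[i] != null && checksum(copies[i]) == sums[i])
                return copies[i].clone();
        }
        throw new IllegalStateException("both copies are corrupt");
    }

    // Simulates a crash that tore copy i (e.g., a partial sector write).
    void tear(int i) {
        copies[i][0] ^= 0x7F;
    }
}
```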
        
  Testing:
    > ConcurrencyTest adds test cases for different isolation levels
    
  Benchmarking:
    > Test whether it helps ConcurrencyMgr to obtain multiple locks of
      different granularities or different types (S/X) within a single
      synchronized block over lockTbl
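
  The Concurrency-level items and the benchmarking question both center on
  the lock table. The sketch below uses the standard multi-granularity
  compatibility matrix for IS, IX, S, SIX, and X locks and grants a batch
  of lock requests inside one synchronized block, either all of them or
  none, which is the alternative the benchmark is meant to evaluate. It is
  a simplification, not VanillaDB's LockTable: the class names and the
  all-or-nothing, no-wait policy are assumptions, and a real lock table
  would make a conflicting transaction wait instead of returning false.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Lock modes used by multi-granularity locking.
enum LockType { IS, IX, S, SIX, X }

// Hypothetical, simplified lock table for illustration only.
class SketchLockTable {

    // Standard compatibility matrix: COMPAT[held][requested].
    private static final boolean[][] COMPAT = {
        //          IS     IX     S      SIX    X
        /* IS  */ { true,  true,  true,  true,  false },
        /* IX  */ { true,  true,  false, false, false },
        /* S   */ { true,  false, true,  false, false },
        /* SIX */ { true,  false, false, false, false },
        /* X   */ { false, false, false, false, false }
    };

    static boolean compatible(LockType held, LockType requested) {
        return COMPAT[held.ordinal()][requested.ordinal()];
    }

    // object name -> transaction number -> lock types held by that tx
    private final Map<String, Map<Long, List<LockType>>> table =
            new HashMap<String, Map<Long, List<LockType>>>();

    // Grants every request in one synchronized block, or none of them if
    // any request conflicts with a lock held by another transaction.
    synchronized boolean lockAll(long txNum, Map<String, LockType> requests) {
        for (Map.Entry<String, LockType> req : requests.entrySet()) {
            Map<Long, List<LockType>> holders = table.get(req.getKey());
            if (holders == null)
                continue;
            for (Map.Entry<Long, List<LockType>> h : holders.entrySet()) {
                if (h.getKey() == txNum)
                    continue;   // a tx never conflicts with its own locks
                for (LockType held : h.getValue())
                    if (!compatible(held, req.getValue()))
                        return false;
            }
        }
        for (Map.Entry<String, LockType> req : requests.entrySet()) {
            Map<Long, List<LockType>> holders = table.get(req.getKey());
            if (holders == null) {
                holders = new HashMap<Long, List<LockType>>();
                table.put(req.getKey(), holders);
            }
            List<LockType> mine = holders.get(txNum);
            if (mine == null) {
                mine = new ArrayList<LockType>();
                holders.put(txNum, mine);
            }
            mine.add(req.getValue());
        }
        return true;
    }

    // Releases every lock held by txNum (e.g., at commit or rollback).
    synchronized void releaseAll(long txNum) {
        for (Map<Long, List<LockType>> holders : table.values())
            holders.remove(txNum);
    }
}
```

  For example, if tx 1 holds IX on a table and X on one of its records,
  tx 2 can still lock the table in IS mode together with S on a different
  record, but a batch asking for S on the same record is refused as a whole.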