oceanoverflow/alphadb
----------------------------------------------------------------------------------------------------
CONSENSUS:
----------------------------------------------------------------------------------------------------

CAP theorem

In theoretical computer science, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two of the following three guarantees:

* Consistency: every read receives the most recent write or an error
* Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write
* Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

In particular, the CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability. Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed by ACID database transactions.

Explanation: no distributed system is safe from network failures, so network partitioning generally has to be tolerated. In the presence of a partition, one is then left with two options: consistency or availability. When choosing consistency over availability, the system will return an error or a timeout if it cannot guarantee that the information is up to date due to network partitioning. When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date. In the absence of network failure - that is, when the distributed system is running normally - both availability and consistency can be satisfied.

CAP is frequently misunderstood as if one had to choose to abandon one of the three guarantees at all times. In fact, the choice is really between consistency and availability only when a network partition or failure happens; at all other times, no trade-off has to be made.
Database systems designed with traditional ACID guarantees in mind, such as RDBMSs, choose consistency over availability, whereas systems designed around the BASE philosophy, common in the NoSQL movement, choose availability over consistency. The PACELC theorem builds on CAP by stating that even in the absence of partitioning, another trade-off, between latency and consistency, occurs.

History

According to University of California, Berkeley computer scientist Eric Brewer, the theorem first appeared in autumn 1998. It was published as the CAP principle in 1999 and presented as a conjecture by Brewer at the 2000 Symposium on Principles of Distributed Computing (PODC). In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's conjecture, rendering it a theorem.

In 2012, Brewer clarified some of his positions, including why the often-used "two out of three" concept can be misleading or misapplied, and the different definition of consistency used in CAP relative to the one used in ACID.

A similar theorem stating the trade-off between consistency and availability in distributed systems was published by Birman and Friedman in 1996. The result of Birman and Friedman restricted this lower bound to non-commuting operations.

                 +--------+
                 |raft    |
                 |Follower|
                 +--------+
                / storage  \
               /  engine!   \
    +--------+---------------+--------+
    |raft    |               |raft    |
    |Leader  |               |Follower|
    +--------+               +--------+
        ^ |
        | |
    +------+
    |client|
    +------+

replicate then reply || reply then replicate ?
for read  ==> should be reply then replicate
for write ==> should be replicate then reply

                                                           ______ raft follower1
                                                          /
client -----> server(s1) -----> state machine (db) -----> raft leader(s2)
                                                          \______ raft follower2

----------------------------------------------------------------------------------------------------
BUFFER:
----------------------------------------------------------------------------------------------------

+----+----+----+----+----+----+
|    |    |    |    |    |    |
+----+----+----+----+----+----+
|    |    |    |    |    |    |
+----+----+----+----+----+----+
|    |    |    |    |    |    |
+----+----+----+----+----+----+
|    |    |    |    |    |    |
+----+----+----+----+----+----+
|    |    |    |    |    |    |
+----+----+----+----+----+----+

----------------------------------------------------------------------------------------------------
TABLE:
----------------------------------------------------------------------------------------------------

A tuple is addressed by RID = <page_id, slot_id>; ptr[slot_id] is the tuple's offset from the page base.

4092B:
+----+--------------+------+------+------+------+------+-------+------+
| Nt |table_page_hdr| ptr0 | ptr1 | ptr2 | ptr3 | ptr4 | ..... | ptrX |
+----+--------------+------+------+------+------+------+-------+------+
|                             free space                              |
+---------+-------+-----------+--------+--------+-----------+---------+
|(tupleX) | ..... |> (tuple4) | tuple3 |(tuple2)|> (tuple1) |>(tuple0)|
+---------+-------+-----------+--------+--------+-----------+---------+
                                          largest <-------- smallest

tuple_len[i] = ptr[i-1] - ptr[i]
for example:  tuple_len[4] = ptr[3] - ptr[4]
special case: tuple_len[0] = INDEX_PAGE_SIZE - ptr[0]

----------------------------------------------------------------------------------------------------
INDEX:
----------------------------------------------------------------------------------------------------

Disk structure: platter -> track -> sector (block = 4096 bytes) ==> addressable as <track_no, sector_no> + offset

page layout:
  degree_: N
  page_size_ = (4KB - 4B)
  the degree can also be calculated:
  N = std::floor(page_size_ / (sizeof(RID) + sizeof(KEY)))

for [internal] nodes: KEY[n-1] is a PSEUDO key. RID is <page, slot>, but for an internal node just the page p works, the slot s is not important <===> RID<p, -1>

RID0 < KEY0 <= RID1 < KEY1 <= RID2 < KEY2 <= RID3 < KEY3 <= ...... < KEY[n-2] <= RID[n-1] < (pseudo KEY[n-1])
  |            |            |            |                               |
  v            v            v            v                               v
 p{0}         p{1}         p{2}         p{3}                           p{n-1}

for {leaf} nodes: RID[n-1] is the pointer to the next leaf and KEY[n-1] is also a PSEUDO key; the other RIDi point to records in the table, and KEYi is the index key, for i < n-1

(RID0, KEY0) <= (RID1, KEY1) <= (RID2, KEY2) <= (RID3, KEY3) <= ...... <= (RID[n-2], KEY[n-2]);   RID[n-1] ---> ***next leaf***
     |               |               |               |                          |
     v               v               v               v                          v
  (p0,s0)         (p1,s1)         (p2,s2)         (p3,s3)                (p{n-2},s{n-2})

logical view:
+--------------+--------------+--------------+--------------+-----+----------------------+
| (RID0, KEY0) | (RID1, KEY1) | (RID2, KEY2) | (RID3, KEY3) | ... | (RID[n-1], KEY[n-1]) |
+--------------+--------------+--------------+--------------+-----+----------------------+

physical view:
+---+------+------+------+----------+------+------+------+----------+------------+
| n | RID0 | RID1 |......| RID[n-1] | KEY0 | KEY1 |......| KEY[n-1] | free_space |
+---+------+------+------+----------+------+------+------+----------+------------+

                          +--------------+
                          |p0|k0|p1|k1|p2|                                            internal_node
                          +--------------+
                         /        |       \
                        v         v        v
          +--------------+   +--------------+   +--------------+
first --> |r0|k0|r1|k1|n1|-->|r2|k2|r3|k3|n2|-->|r4|k4|r5|k5|n3| --> last (nullptr)   leaf_node
          +--------------+   +--------------+   +--------------+
           |     |            |     |            |     |
           v     v            v     v            v     v
          ***   ***          ***   ***          ***   ***

degree = 7, max_node = 6, min_node = 3

1.
node->move_half_to(node* receiver)            [when splitting]
begin: |*******|              end: |****| -> |***|
***|||insert|||***

2.
node->move_all_to(node* receiver)             [when the node underflows]
begin: |****| <- |**|         end: |******|
***|||delete|||***   coalesce

3.
node->move_first_to_end_of(node* receiver)    [when the node overflows and the left neighbor is not full]
begin: |*****| <- |*******|   end: |******| <-> |******|
***|||insert|||***   redistribute

4.
node->move_last_to_front_of(node* receiver)   [same as the last one, but this time the right neighbor is not full]
begin: |*******| -> |*****|   end: |******| <-> |******|
***|||insert|||***   redistribute

----------------------------------------------------------------------------------------------------
SQL:
----------------------------------------------------------------------------------------------------

lex is a scanner generator:
  input:  a set of regular expressions and associated actions (written in C)
  output: a table-driven scanner (lex.yy.c)

flex:  an open-source implementation of the original UNIX lex utility
lex:   lexical analysis -> splits the input file into tokens
yacc:  yet another compiler compiler -> parses and does semantic processing on the stream of tokens produced by lex
bison: GNU parser generator, upward compatible with yacc

lex input example (.l), ex1.l:

FIRST PART (optional)
%{
#include "myscanner.h"
%}

SECOND PART (pattern  action; keywords first!!!)
%%
"select"        select();
"hello world"   printf("GOODBYE\n");
....            (regular expressions)
":"             return COLON;
%%

THIRD PART
int yywrap(void) { return 1; }

yacc input:

FIRST PART
%{
C declarations;
%}
yacc definitions;

SECOND PART (production  action)
%%
statements: statement            {printf("statement");}
          | statement statements {printf("statements");}
          ;
%%

THIRD PART

$ yacc -d sql.y                 // generate y.tab.h && y.tab.c
$ lex sql.l                     // process the lex file to generate a scanner (saved as lex.yy.c)
$ gcc lex.yy.c y.tab.c -o sqls  // compile the scanner and grab main() from the lex library (-ll option)
$ ./sqls                        // run the scanner taking input from standard input

lex pattern examples:

+-----------+-------------------------------------------------------------------------------+
|abc        | match the string "abc"                                                        |
+-----------+-------------------------------------------------------------------------------+
|[a-zA-Z]   | match any lower- or uppercase letter                                          |
+-----------+-------------------------------------------------------------------------------+
|dog.*cat   | match any string starting with dog and ending with cat                        |
+-----------+-------------------------------------------------------------------------------+
|(ab)+      | match one or more occurrences of "ab" concatenated                            |
+-----------+-------------------------------------------------------------------------------+
|[^a-z]+    | match any string of one or more characters that do not include lowercase a-z |
+-----------+-------------------------------------------------------------------------------+
|[+-]?[0-9]+| match any string of one or more digits with an optional prefix of + or -      |
+-----------+-------------------------------------------------------------------------------+

source code  a = b + c * d
      |
      v
+----------------+          +---+
|Lexical Analyzer|<---------|Lex|<------- patterns
+----------------+          +---+
      | tokens
      v
+---------------+           +----+
|Syntax Analyzer|<----------|Yacc|<------ grammars (context free)
+---------------+           +----+
      | syntax tree
      v
         =
        / \
     id1   +
          / \
       id2   *
            / \
         id3   id4
      |
      v
+--------------+
|code generator|
+--------------+
      | generated code
      v
   load  id3
   mul   id4
   add   id2
   store id1

mylang.y --> +------+ --> y.tab.c + y.tab.h --\
             | yacc |                          \
             +------+                           +--> +-----+
                                               /     | gcc | --> mylang
mylang.l --> +------+ --> lex.yy.c -----------/      +-----+
             | lex  |
             +------+

source code --> mylang (compiled code interpreter) --> output

Supported SQL Queries
=====================

## Select Statements
```sql
SELECT name, city, *
  FROM students AS t1 JOIN students AS t2 ON t1.city = t2.city
  WHERE t1.grade < 2.0 AND t2.grade > 2.0 AND t1.city = 'Frohnau'
  ORDER BY t1.grade DESC;

SELECT city, AVG(grade) AS average, MIN(grade) AS best, MAX(grade) AS worst
  FROM students GROUP BY city;
```

## Data Definition & Modification

**Create Tables**
```sql
CREATE TABLE students (
  name TEXT,
  student_number INTEGER,
  city TEXT,
  grade DOUBLE
);
```

**Update and Delete**
```sql
UPDATE students SET name='Max Mustermann' WHERE name = 'Ralf Mustermann';
DELETE FROM students WHERE name = 'Max Mustermann';
```

A compiler or interpreter for a programming language is often decomposed into two parts:
1. Read the source program and discover its structure.
2. Process this structure, e.g. to generate the target program.
Lex and Yacc can generate program fragments that solve the first task. The task of discovering the source structure is again decomposed into subtasks:
1. Split the source file into tokens (Lex).
2. Find the hierarchical structure of the program (Yacc).
----------------------------------------------------------------------------------------------------
NETWORK:
----------------------------------------------------------------------------------------------------

  server                               client
+--------+                           +---------+
| socket |                           | socket  |
+--------+                           +---------+
    |                                     |
    v                                     |
+--------+                                |
|  bind  |                                |
+--------+                                |
    |                                     |
    v                                     |
+--------+                                |
| listen |                                |
+--------+                                |
    |                                     |
    v                                     v
+--------+    TCP 3-way handshake    +---------+
| accept |<--------------------------| connect |   SYN, SYN+ACK, ACK   (ESTABLISHED)
+--------+                           +---------+
    |                                     |
    v                                     v
+--------+                           +---------+
|  read  |<--------------------------|  write  |
+--------+                           +---------+
+--------+                           +---------+
| write  |-------------------------->|  read   |
+--------+                           +---------+
    |                                     |
    v                                     v
+--------+                           +---------+
| close  |                           |  close  |
+--------+                           +---------+

+------------+    +---------------+    +------+    +----------------+    +--------+
| user input |--->| child process |--->| pipe |--->| parent process |--->| server |
+------------+    +---------------+    +------+    +----------------+    +--------+
      ^                                                                      |
      +----------------------------------------------------------------------+

TCP is a finite state machine; it has 11 states.

 client                                          server
(connect())         SYN seq=x
SYN_SENT     ----------------------------->      LISTEN (listen())
                    SYN seq=y, ACK=x+1
ESTABLISHED  <-----------------------------      SYN_RCVD
                    ACK=y+1
             ----------------------------->      ESTABLISHED
write()             seq=x+1, ACK=y+1
             ----------------------------->      read()
                    ACK=x+2
             <-----------------------------
                    FIN seq=x+2, ACK=y+1
FIN_WAIT_1   ----------------------------->      CLOSE_WAIT
                    ACK=x+3
FIN_WAIT_2   <-----------------------------
                    FIN seq=y+1                  (close())
TIME_WAIT    <-----------------------------      LAST_ACK
                    ACK=y+2
             ----------------------------->

the 11 states: CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, CLOSING, LAST_ACK, TIME_WAIT

1. synchronous blocking, iterative (bottleneck: disk or network I/O)

for (;;) {
    fd = accept(...);
    read(fd, buf, n);
    dosomething(buf);
    write(fd, buf, n);
    close(fd);
}

+-----------+
|application|                               kernel
+-----------+
     |        read()  (system call, kernel context switch)
     | ----------------------------------->  initiate read I/O
     |                                            |
     |  (blocked, waiting)                        | device I/O
     |                                            v
     |        read response                  data movement from
     | <-----------------------------------  kernel space to user space
     v

2. multiple processes

for (;;) {
    fd = accept(...);
    ret = fork();
    switch (ret) {
    case -1:
        do_err_handler();
        break;
    case 0:                 /* child process */
        do_handler_fd(fd);
        break;
    default:                /* parent process */
        continue;
    }
}
/* needs signal handling (SIGCHLD) to reap zombie child processes */
pre-fork: process pool

3. multiple threads

void* thread_entry(void* args) {
    int fd = *(int *)args;
    do_handler_fd(fd);
    return NULL;
}

for (;;) {
    fd = accept(...);
    pthread_create(..., thread_entry, &fd);
    /* caution: passing &fd races with the next accept(); pass the fd by
       value (cast into the void* argument) or heap-allocate it instead */
}

----------------------------------------------------------------------------------------------------
TRANSACTION:
----------------------------------------------------------------------------------------------------

In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc. In the context of databases, a sequence of database operations that satisfies the ACID properties (and can therefore be perceived as a single logical operation on the data) is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.
About
distributed relational database