JP4126843B2

JP4126843B2 - Data management method and apparatus, and recording medium storing data management program

Info

Publication number: JP4126843B2
Application number: JP2000101211A
Authority: JP
Inventors: 憲宏原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2000-03-31
Filing date: 2000-03-31
Publication date: 2008-07-30
Anticipated expiration: 2020-03-31
Also published as: JP2001282599A; US6571250B1

Description

【０００１】
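【Technical Field of the Invention】
The present invention relates to data management technology, and in particular to a technique that is effective when applied to managing multiple data items under a single index key.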
【０００２】
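【Prior Art】
With the rapid spread of business systems that use the Internet in recent years, the fields of application of the database management systems (DBMS) that support those systems have expanded, and the volume of data a DBMS handles grows year by year. Many applications in such systems use the B-tree index, the most representative high-speed search mechanism of a DBMS, on data with a narrow value range, such as the “business transition state” in a workflow, so it is important that the B-tree index deliver stable performance.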
【０００３】
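Regarding B-tree indexes, the literature (Jim Gray and Andreas Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers, 1993), for example, discloses the basic data structure, methods for concurrent access by multiple processes, and basic recovery schemes.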
【０００４】
To realize a high-speed B-tree index in a real system, the following problems must be solved.
【０００５】
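(i) Improving concurrency among multiple processes
Problems arise when multiple transactions simultaneously access a database table containing records. Specifically, a conflict occurs when one transaction tries to update a record while another transaction tries to access the same record. One solution to such conflicts is a locking scheme (exclusive access scheme) applied to a record or to some part of the B-tree index (the entire index, a node, an index entry, a key value, and so on). A locking scheme forces a transaction to acquire a lock before it accesses data. If another transaction already holds a conflicting lock, the required lock may not be obtainable. Locking reliably guarantees the update and read results of the transaction that holds the lock, but while the lock is held, accesses by other transactions must wait until it is released. To raise the multiplicity (concurrency) of the system and improve throughput, it is therefore important to minimize both the number of locks a single transaction holds at once and the scope (granularity) that each lock affects.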
【０００６】
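(ii) Recovery processing control
In a real system, the atomicity of transactions and the consistency of the data in the B-tree index must be guaranteed in the face of various failures such as transaction abort, system crash, and media failure. The method of acquiring log records, and the recovery processing control that uses those log records, are therefore important. One log record acquisition method is physical logging, which records the before and after physical images of the nodes that make up the index as log record information. With recovery based on physical images, other transactions are restricted from accessing a page until the transaction that updated it completes. In other words, recovery processing control is closely tied to exclusive control and also affects concurrency.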
【０００７】
FIG. 11 shows the structure of a typical conventional B-tree index.
【０００８】
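A B-tree index has a single root node at the top, from which nodes branch out over many levels. The nodes at the ends of the branches are often called leaf nodes, and the nodes other than leaf nodes, including the root node, are often called upper nodes. Each node holds multiple index entries composed of the following information. An index entry in a leaf node consists of a pointer to a record (table data) in the database and a key value representing one attribute of that record. An index entry in an upper node consists of a pointer to a node at the next lower level (a child node) and one key value indicating the range of key values that branch out from that child node and are ultimately managed in leaf nodes. The key value in an upper index entry serves as a signpost (decision element) when a program accessing the B-tree index descends from the root node to the leaf node that stores the index entry containing the pointer to the target record. One node of a B-tree index is normally implemented as a page, the unit of access in the database.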
【０００９】
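【Problems to Be Solved by the Invention】
Exclusive control schemes for B-tree indexes include key-value locking and key-range locking, in which the granularity of the locked index resource is the “key”. With exclusive control on index keys, search and update operations on data (rows) that share the same key are serialized by the index key lock, and concurrency drops. Transactions in the Web environment, which is currently spreading rapidly, run longer than conventional OLTP transactions, so it is difficult to provide a high-throughput service with key-based exclusive control.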
【００１０】
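A B-tree index with many identical keys, as shown in FIG. 11, also has the following problems. In the B-tree index of FIG. 11, the table data information related to a key is managed in a single leaf entry 16. The leaf entry 16 has a key 14, the table data information 18 (data record identifiers) related to that key value, and the count 17 of the items of table data information 18 related to that key value (the duplicate count). The items of table data information 18, as many as the duplicate count, are arranged as a list in ascending order. With this storage scheme, as the duplicate count for a key grows, the leaf entry itself becomes too long to fit in a leaf node 12. As a result, the table data information for one key has to be split into multiple leaf entries stored across multiple nodes. In the example of FIG. 11, the entries for key “28” are split across three leaf nodes, N4, N5, and N6. In this structure, the entries for key “28” become a bottleneck for scans that follow the horizontal pointers between leaf nodes to support range searches. Furthermore, the key 14 in each upper entry 13 stored in the root node 10 and the upper nodes 11 must combine “28” with table data information, which is a factor that increases the number of upper-level nodes (root node 10 and upper nodes 11). Consequently, duplicate keys degrade not only access to the duplicate keys themselves but also the access performance and storage efficiency of the B-tree index as a whole.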
【００１１】
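Gray discloses a method of managing table data information in a data structure outside the B-tree structure, but does not disclose concurrency control or recovery control methods for it. Gray also discloses a recovery scheme based on physical logging, but that scheme cannot achieve concurrent execution on the same page as described above. Problems of this kind arise generally whenever index accesses are made to the same key.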
【００１２】
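An object of the present invention is to alleviate the above problems and to provide a technique that enables index access to the same key to be performed suitably in an environment where multiple accesses take place.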
【００１３】
【Means for Solving the Problems】
The above problems are addressed by the following means.
【００１４】
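In a method of managing a B-tree index that has many identical keys in an environment accessed simultaneously by multiple users, a data structure is provided that stores the multiple data pointers associated with one key. This data structure is separate from the leaf node that stores the index entry containing the key, is pointed to from that index entry, and can be accessed concurrently by multiple processes. When a data pointer associated with a key is added or deleted, the method acquires a first log record containing the information needed to add the data pointer to, or delete it from, the data structure that stores the data pointers associated with that key; updates the index entry containing the key, which is stored in the leaf node; acquires a second log record containing information indicating that the data pointer has been added to or deleted from the data structure; and adds the data pointer to, or deletes it from, the data structure. The above problems are alleviated in this way.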
【００１５】
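【Embodiments of the Invention】
A database processing system according to one embodiment, which can efficiently perform index access to the same key under concurrent access from multiple users, is described below.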
【００１６】
First, the concept of the present invention is briefly described with reference to FIG. 1.
【００１７】
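As shown in FIG. 1, the B-tree index 1 in the database management system of this embodiment consists of a B-tree structure 101, used to efficiently obtain information related to table data by key, and a data structure 102 (hereinafter the “duplicate key data structure”) that manages, for each key, the large amount of table data information associated with that key.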
【００１８】
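The B-tree structure 101 consists of a root node 10, intermediate nodes 11, and leaf nodes 12, with a single root node at the top from which nodes branch out over many levels. The root node 10 and intermediate nodes 11 store upper entries 13, each having a key 14 and a pointer 13 to the lower-level node associated with that key value. Within each node, the upper entries 13 and leaf entries 16 are stored and managed in order of the key values they hold. In this embodiment, key values are stored in ascending order from left to right in the figure. A binary search over the sorted index entries within each node gives efficient and stable access to the required index entry. The child node pointed to by an upper index entry stores index entries whose key values are smaller than the key value in that upper index entry.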
【００１９】
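When a key 14 has a large amount of related table data, its leaf entry 20 has the key 14, the count 17 of items of table data information related to that key value, and a pointer 19 to the duplicate key data structure 102 that stores the table data information. In the example of FIG. 1, the number of items of related table data information for key value “28”, that is, the duplicate count, is 35,678. Those 35,678 items are stored and managed in the duplicate key data structure 102 pointed to by pointer 19, “Nx1”.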
【００２０】
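On the other hand, when the duplicate count, that is, the number of items of table data information, is not very large, as with the index entries other than key value “28” in the example of FIG. 1, the entry is stored as a leaf entry 16 that has no duplicate key data structure 102. The leaf entry 16 has a key 14, the table data information 18 (data record identifiers) related to that key value, and the count 17 of those items. When the addition or update of data records causes the duplicate count of table data information for the key to exceed a certain threshold, the entry is converted to a leaf entry 20 that points to a duplicate key data structure 102.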
【００２１】
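The duplicate key data structure 102, which stores and manages information related to a large amount of table data, has the following characteristics.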
【００２２】
(1) It can be accessed concurrently by multiple processes without acquiring locks on access.
【００２３】
(2) Cancellation (undo) processing can be performed without restricting access by normal processing.
【００２４】
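Here, arrow 103 shows the update that adds key value “28” and table data related information “P18” to the B-tree index 1 when a data record is updated. First, the add operation starts a search from the root node 10 (N1) of the B-tree structure 101 using key value “28”, passes through the intermediate node 11 (N2), and reaches the leaf node 12 (N5), which stores the leaf entry 20 holding the target key value “28”. It then identifies the leaf entry 20, acquires the first log record 41, and increments the duplicate count 17 in the entry. The first log record 41 contains the information needed to continue the update of the duplicate key data structure 102. Next, the operation obtains the pointer 19 (Nx1) to the duplicate key data structure 102 from the entry and accesses the duplicate key data structure 102. It then acquires the second log record 42 and adds table data related information “P18” to the duplicate key data structure 102. The second log record 42 contains information indicating that the addition of the table data related information 18 to the duplicate key data structure 102 has completed.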
【００２５】
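The addition of table data related information has been described using the example of FIG. 1; deletion works the same way. The operation accesses the leaf entry 20, acquires the first log record 41, and decrements the duplicate count 17 in the entry. It then accesses the duplicate key data structure 102, acquires the second log record, and deletes the table data related information 18 from the duplicate key data structure 102.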
【００２６】
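As described above, changes always proceed in the same order, from the leaf entry update to the duplicate key data structure, and no exclusive control is required on the data resources that make up index 1, so multiple processes can execute concurrently. Moreover, even if a cause for cancellation arises for the result of a series of changes, or processing is interrupted immediately after the leaf entry update, the first log record 41 and second log record 42 make it possible to perform undo processing without losing concurrency.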
【００２７】
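Specifically, if at undo time the first log record 41 and the second log record have been acquired as a pair, the series of changes is judged to be complete, and in the example of FIG. 1 the key value “28” and table data related information “P18” are deleted. If only the first log record 41 has been acquired at undo time, only the change to the leaf entry 20 has been made; the information in the first log record is therefore used to add table data related information “P18” to the duplicate key data structure 102, and then key value “28” and table data related information “P18” are deleted. Because changes to the B-tree index 1 always proceed in one direction even during undo processing, concurrency with other processing is not impaired. High concurrency can thus be provided even when undo processing is involved.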
【００２８】
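Furthermore, because the duplicate key data structure 102 is pointed to from the leaf entry 20 and stored and managed outside the leaf node 12, the leaf node 12 that stores the leaf entry 20 can hold more leaf entries for other keys than under a scheme in which the leaf entry 16 holds all of the table data related information 18. This reduces the number of leaf nodes 12 needed to store leaf entries and, in turn, the number of upper nodes 11, which reduces the storage capacity of the B-tree index as a whole. Because the size of the B-tree structure is kept down, stable performance can be provided for every search and update that traverses the B-tree structure, including accesses to keys that do not have much table data information.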
【００２９】
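Next, FIG. 2 shows the schematic configuration of the database management system of this embodiment. There is an application program 6 created by a user and a database management system 2 that manages the database system as a whole, including queries and resource management. The database management system 2 has a logical processing unit 21, a physical processing unit 22, a system control 23, a database 3 that stores, permanently or temporarily, the data subject to database access, and a system log 4. The database management system 2 is also connected to other systems via a network or the like.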
【００３０】
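The logical processing unit 21 includes query analysis 211, which performs syntactic and semantic analysis of a query; optimization processing 212, which generates an appropriate processing procedure; and code generation 213, which generates code corresponding to the processing procedure. It also includes code interpretation and execution 214, which interprets the generated code and instructs the physical processing unit 22 to execute database processing.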
【００３１】
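The physical processing unit 22 includes a table data management unit 221, which stores, deletes, updates, and searches data in the table data 5 stored in the database 3 through a database buffer 24. It also includes a B-tree index management unit 25, which manages the index 1 stored in the database 3. The index 1 consists of a B-tree structure for efficiently accessing related information by key and a duplicate key data structure that manages, for each key, the large amount of table data information associated with that key.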
【００３２】
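When the table data 5 is updated, the B-tree index management unit 25 applies changes to the related B-tree index 1 through the B-tree structure access processing 26, which changes the B-tree structure, and the duplicate key data structure access processing 27, which changes the duplicate key data structure. The B-tree structure access processing 26 and the duplicate key data structure access processing 27 also provide the function of accessing index 1 to quickly retrieve the related table data from a key given as a search condition.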
【００３３】
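The physical processing unit 22 also includes exclusive control 223 for resources shared within the system. In this embodiment, data consistency is achieved solely through exclusive control on the table data 5 (locked resources: the ID of the page storing the table data, the table data ID, and so on); no exclusive control is applied to resources of the B-tree index 1 (index ID, index page ID, key value, and so on).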
【００３４】
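The system log 4 accumulates update history information such as data insertions, updates, and deletions in the database 3. The accumulated information includes history information on changes to the B-tree index 1. It is used to restore the data consistency of the B-tree index 1 in situations such as transaction cancellation and system crash.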
【００３５】
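The system control 23 handles query input from users, the return of query results, database maintenance via commands, and so on. It also manages the system log 4 and, in cooperation with the physical processing unit 22, performs log record acquisition 26, that is, acquisition of the log records that serve as history information when the database 3 is changed. When a log record it has read concerns a change to the B-tree index, the system control 23 passes that log record to the B-tree index management unit 25 of the physical processing unit 22 and requests undo processing. The B-tree index management unit 25 then undoes the change based on the information in the log record.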
【００３６】
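Through the application program 6, a user can use the database management system 2 to search for, access, and change the data records that make up the table data 5 in the database 3. The database management system 2 uses the B-tree index 1 to provide efficient access to data records.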
【００３７】
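The B-tree index 1 and the table data 5 in the database 3 become accessible when they are read into (GET into) the database buffer 24 in units of pages, the unit of access. After a page in the buffer has been referenced (searched), RELEASE declares that use of the buffer has ended and permits other processes to use it. After a page in the buffer has been updated, PUT declares that the page in the buffer is to be reflected in the database and permits other processes to use the buffer. The page is not written to the database immediately but at some later point. A latch is placed on the buffer so that other transactions cannot improperly access a page that has been read into the buffer with GET. A latch is a kind of lock, but it is held for a far shorter time than an ordinary lock and is cheap to acquire and release. In this embodiment, a shared latch is taken when an index node is referenced and an exclusive latch when it is updated. Operations under an exclusive latch are serialized, while shared latches allow concurrent reads of the buffer. When the buffer is released, the held latch is released, which permits access by other processes waiting to acquire an exclusive latch.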
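The shared and exclusive latch behaviour described above can be sketched as follows. This is a minimal illustration only: the `Latch` class and its method names are hypothetical and do not appear in the patent, and a real buffer manager would tie the latch to GET, RELEASE, and PUT on a buffer page.

```python
import threading


class Latch:
    """Short-term page latch: many shared holders or one exclusive holder."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of shared holders
        self._writer = False    # True while an exclusive holder exists

    def acquire_shared(self):
        # Taken when an index node is only referenced (later RELEASE).
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def acquire_exclusive(self):
        # Taken when an index node is updated (later PUT).
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release(self):
        # Called when the buffer is released; wakes any waiting process.
        with self._cond:
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()
```

Two readers can hold the latch at once, whereas an updater waits until every holder has released it, which matches the serialization of exclusive latches in the text.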
【００３８】
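FIG. 3 shows an example of the hardware configuration of the computer system of this embodiment. The computer system 3000 consists of a CPU 3002, a main memory 3001, an external storage device 3003 such as a magnetic disk device, and a number of terminals 3004. The database management system 2 described above with reference to FIG. 2 resides in the main memory 3001, and the database 3, which contains the table data 5 and index 1 managed by the database management system 2, is stored on the external storage device 3003. The system log 4 is also stored on the external storage device 3003, as is the program 3100 that implements the database management system 2.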
【００３９】
FIG. 8 shows an example of the index log records of this embodiment.
【００４０】
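As shown at 41 in FIG. 8, the first log record 41, acquired when the leaf entry is changed as described with reference to FIG. 1, consists of a log record header 410 containing, among other things, information on the transaction to which the change belongs; an operation code 411 indicating the change; a key 412; the table data information 413 being changed; and a pointer 414 to the duplicate key data structure pointed to by the changed leaf entry. The operation code 411 is set to a code indicating an add or delete operation; in the example of FIG. 8 it is “INSERT”, indicating an add. The log record header 410 also contains the index ID of the B-tree index being changed and information on the area in which that index is stored. The key 412 is set to the same value as the key 14 in the leaf entry 20 being changed; in the example of FIG. 8, that value is “28”.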
【００４１】
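The pointer 414 to the duplicate key data structure is the information used to continue the change to the duplicate key data structure when a cause for cancellation arises after the leaf entry change has finished but before the duplicate key data structure change has completed.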
【００４２】
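As shown at 42 in FIG. 8, the second log record, acquired when the duplicate key data structure is changed as described with reference to FIG. 1, consists of a log record header 420 containing, among other things, information on the transaction to which the change belongs; an operation code 421 indicating the change; and the table data information 422 being changed. As with 411, the operation code 421 is set to a code indicating an add or delete operation; in the example of FIG. 8 it is “INSERT”, indicating an add. The table data information 422 is set to the same value as 413 in the first log record 41. As described with reference to FIG. 1, the second log record 42 serves as a marker indicating whether the change to the duplicate key data structure has finished.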
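The two record layouts of FIG. 8 might be modelled as below. This is a sketch under assumptions: the class and field names are illustrative, the header is reduced to a transaction ID and an index ID (the area information is omitted), and the transaction and index ID values used in the test are made up. Only the values 28, P18, Nx1, and INSERT come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogHeader:
    transaction_id: int   # transaction that the change belongs to
    index_id: int         # B-tree index being changed


@dataclass(frozen=True)
class FirstLogRecord:
    header: LogHeader      # 410
    op: str                # 411: "INSERT" or "DELETE"
    key: int               # 412: same value as key 14 in the leaf entry
    table_data_info: str   # 413
    dup_key_ptr: str       # 414: pointer to the duplicate key data structure


@dataclass(frozen=True)
class SecondLogRecord:
    header: LogHeader      # 420
    op: str                # 421
    table_data_info: str   # 422: same value as 413 in the paired record


def is_pair(first: FirstLogRecord, second: SecondLogRecord) -> bool:
    """True when the second record marks completion of the first record's change."""
    return (first.header == second.header
            and first.op == second.op
            and first.table_data_info == second.table_data_info)
```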
【００４３】
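FIG. 4 is a flowchart showing a processing procedure of the B-tree index management unit 25 of this embodiment. It shows the B-tree structure access processing 26 when table data information is added to the B-tree index.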
【００４４】
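First, in step 401, the key is used to descend the B-tree structure from the root node through the upper nodes; once the leaf node is identified, it is read into the database buffer and pinned (GET). Next, in step 402, the leaf entries stored in the leaf node are searched to identify the leaf entry to be changed. In step 403, the count of table data information (the duplicate count) associated with that leaf entry's key is incremented. In step 404, the pointer to the duplicate key data structure contained in the leaf entry is obtained. The first log record described with reference to FIG. 8 is then created (step 405) and acquired (step 406). Finally, the leaf node is scheduled to be written to the database and the buffer is unpinned (PUT) (step 407). This completes the leaf entry change (step 408), and processing continues with the change to the duplicate key data structure.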
【００４５】
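In this flowchart, the leaf entry change (404) is performed before the log record acquisition (405), but the change to the leaf node is actually reflected in the database only after the log record has been written to the system log, in accordance with the WAL (write-ahead log) protocol.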
【００４６】
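FIG. 5 is a flowchart showing a processing procedure of the B-tree index management unit 25 of this embodiment. It shows the duplicate key data structure access processing 27 when table data information is added to the B-tree index.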
【００４７】
The change to the duplicate key data structure follows the change to the leaf entry described in the flowchart of FIG. 4.
【００４８】
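First, in step 501, the duplicate key data structure is accessed using the pointer obtained in step 404 of FIG. 4, and in step 502 the position at which to add the table data information is located. The second log record described with reference to FIG. 8 is then created (step 503) and acquired (step 504). Next, it is determined whether the duplicate key data structure has room for the addition (step 505). If there is no room, processing proceeds to step 506, where a new area is allocated, and the duplicate key data structure is changed to reflect the added area (step 507). Specifically, when the duplicate key data structure consists of multiple pages, a new page is allocated. Processing then proceeds to step 508. If step 505 finds that there is room, processing proceeds directly to step 508. In step 508, the table data information is added to the duplicate key data structure, and the change to the duplicate key data structure ends (step 509).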
【００４９】
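FIG. 6 is a flowchart showing a processing procedure of the B-tree index management unit 25 of this embodiment. It shows the B-tree structure access processing 26 when table data information is deleted from the B-tree index.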
【００５０】
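First, in step 601, the key is used to descend the B-tree structure from the root node through the upper nodes; once the leaf node is identified, it is read into the database buffer and pinned (GET). Next, in step 602, the leaf entries stored in the leaf node are searched to identify the leaf entry to be changed. In step 603, the pointer to the duplicate key data structure contained in the leaf entry is obtained. In step 604, it is determined whether the count of table data information (the duplicate count) associated with that leaf entry's key is 1. If it is 1, processing proceeds to step 605, where the leaf entry is deleted, and then to step 607. If step 604 finds that the duplicate count is greater than 1, processing proceeds to step 606, where the duplicate count is decremented.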
【００５１】
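The first log record described with reference to FIG. 8 is then created (step 607) and acquired (step 608). Finally, the leaf node is scheduled to be written to the database and the buffer is unpinned (PUT) (step 609). This completes the leaf entry change (step 610), and processing continues with the change to the duplicate key data structure.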
【００５２】
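FIG. 7 is a flowchart showing a processing procedure of the B-tree index management unit 25 of this embodiment. It shows the duplicate key data structure access processing 27 when table data information is deleted from the B-tree index.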
【００５３】
The change to the duplicate key data structure follows the change to the leaf entry described in the flowchart of FIG. 6.
【００５４】
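First, in step 701, the duplicate key data structure is accessed using the pointer obtained in step 603 of FIG. 6, and in step 702 the position of the table data information to be deleted is located. The second log record described with reference to FIG. 8 is then created (step 703) and acquired (step 704).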
【００５５】
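Next, in step 705, the table data information is deleted from the duplicate key data structure. It is then determined whether the deletion has produced a reclaimable area in the duplicate key data structure (step 706). If it has, processing proceeds to step 707, where the area is reclaimed, and the duplicate key data structure is changed to reflect the reclamation (step 708). Specifically, when the duplicate key data structure consists of multiple pages, a page is reclaimed. A reclaimed page is reused when another operation allocates a new page. After step 708, the change to the duplicate key data structure ends at step 709. If step 706 finds no reclaimable area, the change ends at step 709 directly.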
【００５６】
FIG. 9 is a flowchart showing the procedure by which the index management unit of this embodiment undoes the result of an index update.
【００５７】
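First, in step 901, a log record concerning a B-tree index change is received from the system control. In step 902, it is determined whether there is a log record to process. If there is none, the undo processing ends at step 913. If there is one, that is, a log record was received, processing proceeds to step 903, where the type of the log record is determined. First, it is determined whether the log record is a second log record (step 903). If it is, processing proceeds to step 904, where the second log record is held, and then returns to step 901 to process the next log record. The next log record should be a first log record. If step 903 finds that the record is not a second log record, step 905 determines whether it is a first log record. If it is not, step 906 performs the undo processing appropriate to that other type of log record, and processing returns to step 901 for the next log record. If it is a first log record, the following is performed. First, it is determined whether a second log record is being held (step 907). If so, the series of changes to the leaf entry and the duplicate key data structure is judged to be complete, the held second log record is discarded (step 908), and processing proceeds to step 910. If no second log record is held, the leaf entry change is judged to have finished without the paired change to the duplicate key data structure having completed, and step 909 continues the update of the duplicate key data structure. Processing then proceeds to step 910. Step 910 determines whether the log record being processed was acquired during normal processing. If so, the related change must be undone, and step 911 performs that undo. If not, that is, if the record was acquired during undo processing, the undo is judged to be already complete, including step 909; the undo of that B-tree index update ends, and processing returns to step 901 for the next log record.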
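The undo loop of FIG. 9 can be sketched as follows. This is a simplified model under stated assumptions: log records are plain dictionaries read newest first (which is why a second log record is seen before its paired first log record), the leaf entry is reduced to a `count` dictionary, the duplicate key data structure to a set per key, and the undo itself writes no new log records. All names are hypothetical.

```python
def _apply(dup, key, info, op):
    """Apply an INSERT or DELETE of `info` to the duplicate key structure."""
    entries = dup.setdefault(key, set())
    if op == "INSERT":
        entries.add(info)
    else:
        entries.discard(info)


def undo(records_newest_first, count, dup):
    held_second = None
    for rec in records_newest_first:            # steps 901-902
        if rec["type"] == "second":             # steps 903-904: hold the marker
            held_second = rec
            continue
        if rec["type"] != "first":              # steps 905-906: other record types
            continue                            # (not modelled in this sketch)
        key, info, op = rec["key"], rec["info"], rec["op"]
        if held_second is not None:             # steps 907-908: change was complete
            held_second = None
        else:                                   # step 909: finish the pending change
            _apply(dup, key, info, op)
        if rec["normal"]:                       # steps 910-911: undo the change
            inverse = "DELETE" if op == "INSERT" else "INSERT"
            count[key] += -1 if op == "INSERT" else 1
            _apply(dup, key, info, inverse)
```

In both cases (pair present, or first log record only) the index ends in the state it had before the insert, and every change still flows from the leaf entry toward the duplicate key data structure.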
【００５８】
FIG. 10 shows an example of the configuration of the duplicate key data structure 102 of this embodiment.
【００５９】
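In the example of FIG. 10, the duplicate key data structure 102 consists of pages 1002, which actually store the table data information, and pages 1001, which store management entries 1003 holding pointers to the pages 1002.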
【００６０】
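A management entry 1003 stored in a page 1001 has a pointer 1005 to a page 1002 and the maximum value 1004 of the table data information stored in that page 1002. The management entries 1003 in a page 1001 are sorted by the maximum value 1004.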
【００６１】
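As FIG. 10 also shows, each page 1001 has a pointer to the page 1001 that stores management entries whose maximum values are larger than the maximum values 1004 of the management entries 1003 stored in it. The leaf entry 20 points to the page 1001 that stores the management entry pointing to the page 1002 holding the smallest table data information.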
【００６２】
This data structure makes the searches within the duplicate key data structure described with reference to FIGS. 5 and 7 efficient.
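A minimal sketch of why the sorted management entries make the search efficient is given below. It rests on several assumptions that are not in the patent: table data information is modelled as integers, all management entries fit in one list (the chain of pages 1001 is collapsed), and a full data page is split in half when a new page is allocated. The class and method names are hypothetical.

```python
import bisect


class DupKeyStructure:
    def __init__(self, page_capacity=4):
        self.page_capacity = page_capacity
        self.maxima = []   # management entries: maximum value (1004) per data page
        self.pages = []    # data pages (1002); pages[i] is pointed to by entry i

    def _locate(self, info):
        # First page whose maximum is >= info, else the last page.
        i = bisect.bisect_left(self.maxima, info)
        return min(i, len(self.pages) - 1)

    def contains(self, info):
        if not self.pages:
            return False
        page = self.pages[self._locate(info)]
        j = bisect.bisect_left(page, info)
        return j < len(page) and page[j] == info

    def add(self, info):
        if not self.pages:
            self.pages.append([info])
            self.maxima.append(info)
            return
        i = self._locate(info)
        page = self.pages[i]
        bisect.insort(page, info)
        if len(page) > self.page_capacity:
            # No room left: allocate a new page and split (assumed policy).
            half = len(page) // 2
            self.pages[i:i + 1] = [page[:half], page[half:]]
            self.maxima[i:i + 1] = [page[half - 1], page[-1]]
        else:
            self.maxima[i] = page[-1]
```

Each lookup is a binary search over the maxima followed by a binary search inside one data page, so the cost grows logarithmically rather than linearly with the duplicate count.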
【００６３】
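Allocation and reclamation of a page 1001 or page 1002 can be implemented easily through GET and PUT on the database buffer. For the second log record acquired on allocation or reclamation, adding the page identifier of the page 1001 or page 1002 to the fields shown in the example of FIG. 8 makes it easy to continue the change during undo processing.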
【００６４】
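The processing in the flowcharts above is executed as a program on the computer system shown as an example in FIG. 3. However, the program is not limited to being stored on an external storage device physically and directly connected to the computer system, as in the example of FIG. 3. It can be stored on a computer-readable and computer-writable storage medium such as a hard disk device or floppy disk device. It can also be stored on an external storage device connected to a computer system other than that of FIG. 3 and reached via a network.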
【００６５】
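In this way, table data information sharing the same key is managed in a duplicate key data structure that can be accessed concurrently by multiple processes without acquiring locks on access and that allows undo processing without restricting access by normal processing. High concurrency, including undo processing, can therefore be provided to multiple processes that access the same key at the same time.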
【００６６】
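Because the duplicate key data structure is pointed to from the leaf entry and stored and managed outside the leaf node, the leaf node that stores a leaf entry with a duplicate key can hold more leaf entries for other keys. As a result, the storage capacity of the B-tree index is kept down and stable access performance can be provided.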
【００６７】
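As described above, a data management method using a B-tree index that offers excellent concurrency and good storage efficiency even with many duplicate keys can be provided.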
【００６８】
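This data management method using a B-tree index is very effective for applications that use a B-tree index on data with a narrow value range, such as the “business transition state” in a workflow. In the B-tree indexes such applications use, the narrow value range means that the number of duplicate data items per key is enormous, and changes in values such as the “business transition state” cause frequent updates to the B-tree index.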
【００６９】
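Although the description above concerns B-trees, the present invention is equally applicable wherever index accesses to the same key occur in an environment where multiple accesses take place, and can provide a data management method and apparatus that enable such index access to be performed suitably.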
【００７０】
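【Effects of the Invention】
According to the present invention, a data management method and apparatus can be provided that enable index access to the same key to be performed suitably in an environment where multiple accesses take place.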
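【Brief Description of the Drawings】
【FIG. 1】 A conceptual diagram of the present invention.
【FIG. 2】 A diagram showing the functional blocks of the database processing system having the B-tree index of this embodiment.
【FIG. 3】 A diagram showing an example of the hardware configuration of the computer system of this embodiment.
【FIG. 4】 A flowchart showing the procedure of index update processing by the B-tree index management unit 25 of this embodiment.
【FIG. 5】 A flowchart showing the procedure of index update processing by the B-tree index management unit 25 of this embodiment.
【FIG. 6】 A flowchart showing the procedure of index update processing by the B-tree index management unit 25 of this embodiment.
【FIG. 7】 A flowchart showing the procedure of index update processing by the B-tree index management unit 25 of this embodiment.
【FIG. 8】 A diagram showing an example of the log records of the B-tree index of this embodiment.
【FIG. 9】 A flowchart showing the procedure of index update undo processing (recovery procedure) by the B-tree index management unit of this embodiment.
【FIG. 10】 A diagram showing an example of the configuration of the duplicate key data structure of this embodiment.
【FIG. 11】 A diagram showing the configuration of a conventional B-tree index having many duplicate keys.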
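【Description of Reference Signs】
1: B-tree index, 2: database management system, 3: database, 4: system log, 5: table data, 6: application program, 21: logical processing unit, 22: physical processing unit, 23: system control, 25: B-tree index management unit, 26: B-tree structure access processing, 27: duplicate key data structure access processing, 101: B-tree structure, 102: duplicate key data structure, 41: first log record, 42: second log record.
[0001]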
BACKGROUND OF THE INVENTION
The present invention relates to a data management technique, and more particularly to a technique that is effective when applied to managing a plurality of data with respect to a key of an index.
[0002]
[Prior art]
With the rapid spread of business systems using the Internet in recent years, the fields of application of the database management systems (DBMS) that support those systems are expanding, and the amount of data handled by a DBMS is increasing year by year. Many applications in such systems use B-tree indexes, the most representative high-speed search mechanism of a DBMS, on narrow-range data such as the “business transition status” in workflows, so it is important that B-tree indexes provide stable performance.
[0003]
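Regarding the B-tree index, the literature (Jim Gray and Andreas Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers, 1993), for example, discloses the basic data structure, methods for simultaneous access by multiple processes, and basic recovery schemes.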
[0004]
In order to realize a high-speed B-tree index as a real system, it is necessary to solve the following problems.
[0005]
(i) Improvement of concurrency by multiple processes
Problems arise when multiple transactions simultaneously access a database table that contains records. Specifically, a contention situation occurs when a transaction attempts to update one record and another transaction attempts to access the same record at the same time. As a solution to the contention problem, there is a locking method (exclusive access method) for a record or for a part of a B-tree index (the entire B-tree index, a node, an index entry, a key value, etc.). A locking scheme forces a transaction to acquire a lock before accessing data. If another transaction already holds a conflicting lock, the necessary lock may not be obtainable. The locking method reliably guarantees the update and reference results of the transaction holding the lock, but while the lock is held, accesses by other transactions must wait until it is released. To increase the multiplicity (concurrency) of the system and raise throughput, it is therefore important to minimize the number of locks acquired simultaneously by one transaction and the range of influence (granularity) of each lock.
[0006]
(ii) Recovery processing control
In an actual system, it is necessary to guarantee the atomicity of transactions and the consistency of data in a B-tree index with respect to various failures such as transaction interruption, system crash, and media failure. The log record acquisition method and the recovery process control using the log records are therefore important. One log record acquisition method is the physical log method, in which the physical images of a node constituting the index before and after an update are acquired as log record information. In recovery using a physical image, other transactions are restricted from accessing the page until the transaction that performed the update completes. In other words, recovery process control is closely related to exclusive control and also affects concurrency.
[0007]
FIG. 11 shows the structure of a conventional typical B-tree index.
[0008]
In the B-tree index, nodes branch out over a number of levels from a single root node at the top. The terminal nodes of the branches are often called leaf nodes. Nodes other than leaf nodes, including the root node, are often referred to as upper nodes. Each node has a plurality of index entries composed of the following information. An index entry in a leaf node is composed of a pointer indicating a record (table data) in the database and a key value representing one of the characteristics of the record. An index entry in an upper node consists of a pointer indicating the node (child node) at the next lower level and one key value indicating the range of key values that branch from the child node and are finally managed in leaf nodes. The key value in an upper index entry serves as a signpost (decision element) when the program accessing the B-tree index traces from the root node to the leaf node that stores the index entry containing the pointer to the target record. Normally, one node of the B-tree index is realized by a page, which is the access unit of the database.
[0009]
[Problems to be solved by the invention]
Exclusive control methods for the B-tree index include the key value exclusion method and the key range exclusion method, in which the granularity of the exclusive resource of the index is the “key”. When exclusive control on index keys is used, search and update processing for data (rows) having the same key is serialized by the index key exclusion, and concurrency decreases. Transactions in the Web environment, which is currently spreading rapidly, are longer than conventional OLTP transactions, so it is difficult to provide high-throughput services with exclusive control on keys.
[0010]
Also, as shown in FIG. 11, a B-tree index having a large number of identical keys has the following problems. In the B-tree index of FIG. 11, the table data information related to a certain key is managed by one leaf entry 16. The leaf entry 16 has a key 14, table data information 18 (data record identifiers) related to the key value, and the number 17 of items of table data information 18 related to the key value (the duplicate count). The items of table data information 18, as many as the duplicate count, are arranged as a list in ascending order. In such a storage method, when the duplicate count for a certain key increases, the leaf entry itself becomes too long to fit in the leaf node 12. As a result, the table data information for one key must be divided into a plurality of leaf entries and stored in a plurality of nodes. In the example of FIG. 11, the entry of the key “28” is divided and stored in three nodes, leaf nodes N4, N5, and N6. In this structure, the entry of the key “28” becomes a bottleneck in scans that use the horizontal pointers between leaf nodes to support range searches. Furthermore, the key 14 included in the upper entry 13 stored in the root node 10 and the upper nodes 11 must combine “28” with table data information, which is a factor that increases the number of upper-level nodes (root node 10 and upper nodes 11). As a result, duplicate keys degrade not only the access processing for the duplicate keys themselves but also the access performance and storage efficiency of the entire B-tree index.
[0011]
Gray discloses a method of managing table data information in a data structure outside the B-tree structure, but does not disclose concurrency control or recovery control methods for it. Gray also discloses a recovery processing method using a physical log, but that method cannot realize simultaneous execution on the same page as described above. Such problems generally occur when index access is performed on the same key.
[0012]
An object of the present invention is to improve the above-described problem and provide a technique capable of suitably performing index access to the same key in an environment where a plurality of accesses are performed.
[0013]
[Means for Solving the Problems]
The above problems are improved by the following means.
[0014]
In a method of managing a B-tree index having a large number of identical keys in an environment accessed by a plurality of users at the same time, a data structure is provided that stores the plurality of data pointers related to a single key. This data structure is different from the leaf node that stores the index entry including the key, is pointed to by that index entry, and can be accessed simultaneously by multiple processes. When a data pointer related to a key is added or deleted, a first log record including information for adding or deleting the data pointer is acquired for the data structure storing the data pointers related to that key; the index entry including the key, stored in the leaf node, is updated; a second log record including information indicating that the data pointer has been added to or deleted from the data structure is acquired; and the data pointer is added to or deleted from the data structure. The above problems are alleviated in this way.
[0015]
DETAILED DESCRIPTION OF THE INVENTION
A database processing system according to an embodiment capable of efficiently performing index access to the same key in a simultaneous access environment from a plurality of users will be described.
[0016]
First, the concept of the present invention will be briefly described with reference to FIG.
[0017]
As shown in FIG. 1, the B-tree index 1 in the database management system according to the present embodiment consists of a B-tree structure 101 for efficiently acquiring information related to table data using a key, and a data structure 102 (hereinafter referred to as the “duplicate key data structure”) that manages, for each key, the large amount of table data information associated with that key.
[0018]
The B-tree structure 101 includes a root node 10, an intermediate node 11, and a leaf node 12, and has a structure in which nodes are branched from a root node over a number of levels with one root node as a vertex. The root node 10 and the intermediate node 11 store an upper entry 13 having a key 14 and a pointer 13 indicating a node to a lower level related to the key value. Within each node, the upper entry 13 and the leaf entry 16 are stored and managed in the order of the key values of the index entries 13 and 16. In this embodiment, key values are stored in ascending order from left to right in the figure. By performing a binary search on the sorted index entries in each node, the necessary index entries can be accessed efficiently and stably. The child node pointed to by one higher index entry stores an index entry having a key value smaller than the key value in the higher index entry.
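The descent through one upper node, as described above, might look like this. The sketch assumes each upper entry's child holds only keys smaller than the entry's key, as the paragraph states; the node names, the example keys other than 28, and the clamping to the last child for an out-of-range key are assumptions made for illustration.

```python
import bisect


def find_child(upper_keys, child_ptrs, search_key):
    """Pick the child pointer to follow for `search_key` within one upper node.

    `upper_keys` is sorted ascending; child_ptrs[i] leads to entries whose
    keys are smaller than upper_keys[i].
    """
    # First upper entry whose key is strictly greater than the search key.
    i = bisect.bisect_right(upper_keys, search_key)
    # Assumption: a key beyond the last upper key is routed to the last child.
    return child_ptrs[min(i, len(child_ptrs) - 1)]
```

For an upper node with keys `[20, 40, 99]` and children `["N2", "N3", "N4"]`, a search for key 28 follows the pointer stored with key 40, because that child covers keys below 40 that are not below 20.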
[0019]
When there is a large amount of table data related to the key 14, the leaf entry 20 has the key 14, the number 17 of items of table data information related to the key value, and a pointer 19 to the duplicate key data structure 102 that stores the table data information. In the example of FIG. 1, the number of items of related table data information for the key value “28”, that is, the duplicate count, is 35,678. Those 35,678 items are stored and managed in the duplicate key data structure 102 pointed to by the pointer 19 “Nx1”.
[0020]
On the other hand, when the duplicate count, that is, the number of items of table data information, is not so large, as with the index entries other than the key value “28” in the example of FIG. 1, the entry is stored as a leaf entry 16 that has no duplicate key data structure 102. The leaf entry 16 includes a key 14, table data information 18 (data record identifiers) related to the key value, and the number 17 of those items. When the duplicate count of the table data information related to the key exceeds a threshold value because of the addition or update of data records, the entry is converted to a leaf entry 20 that points to the duplicate key data structure 102.
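The conversion from an inline leaf entry 16 to a pointer-style leaf entry 20 can be illustrated as follows. The patent does not give the threshold value or the way a new pointer is chosen, so `threshold=3` and the generated `Nx…` names here are purely hypothetical, as are the dictionary field names.

```python
def add_info(entry, info, dup_store, threshold=3):
    """Add one item of table data information under the entry's key."""
    entry["count"] += 1
    if "dup_ptr" in entry:
        # Leaf entry 20: the information lives outside the leaf node.
        dup_store[entry["dup_ptr"]].add(info)
        return
    # Leaf entry 16: the information is kept inline, in ascending order.
    entry["infos"].append(info)
    entry["infos"].sort()
    if entry["count"] > threshold:
        # Too many duplicates: move the list out and keep only a pointer.
        ptr = f"Nx{len(dup_store) + 1}"
        dup_store[ptr] = set(entry.pop("infos"))
        entry["dup_ptr"] = ptr
```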
[0021]
The duplicate key data structure 102 for storing and managing information related to a large number of table data has the following characteristics.
[0022]
(1) Simultaneous access by multiple processes is possible without acquiring exclusive access.
[0023]
(2) Cancellation (undo) processing is possible without restricting access by normal processing.
[0024]
Here, an arrow 103 indicates the update processing that adds the key value “28” and the table data related information “P18” to the B-tree index 1 when a data record is updated. First, this add processing starts a search from the root node 10 (N1) of the B-tree structure 101 using the key value “28”, passes through the intermediate node 11 (N2), and arrives at the leaf node 12 (N5), in which the leaf entry 20 having the target key value “28” is stored. Then, the leaf entry 20 is identified, the first log record 41 is acquired, and the duplicate count 17 in the entry is incremented. The first log record 41 includes the information necessary for continuing the update process for the duplicate key data structure 102. Further, the pointer 19 (Nx1) to the duplicate key data structure 102 in the entry is obtained, and the duplicate key data structure 102 is accessed. Then, the second log record 42 is acquired, and the table data related information “P18” is added to the duplicate key data structure 102. The second log record 42 includes information indicating that the addition of the table data related information 18 to the duplicate key data structure 102 has been completed.
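The fixed order of the add operation (first log record, leaf entry update, second log record, duplicate key data structure update) can be sketched as below. The dictionaries stand in for the leaf entry, the log, and the duplicate key data structures; their field names are assumptions, and latching and the B-tree descent are left out.

```python
def add_pointer(leaf_entry, dup_structures, log, info):
    """Add `info` under the leaf entry's key, logging in the fixed order."""
    # Leaf entry change: the first log record carries what is needed to
    # continue the change to the duplicate key data structure later.
    log.append({"type": "first", "op": "INSERT", "key": leaf_entry["key"],
                "info": info, "dup_ptr": leaf_entry["dup_ptr"]})
    leaf_entry["count"] += 1

    # Duplicate key data structure change: the second log record marks
    # that this part of the change was carried out.
    log.append({"type": "second", "op": "INSERT", "info": info})
    dup_structures[leaf_entry["dup_ptr"]].add(info)
```

With the values from FIG. 1, adding “P18” to a leaf entry for key 28 whose count is 35,678 leaves the count at 35,679 and the log holding a first record followed by a second record.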
[0025]
Although the addition of table data related information has been described using the example of FIG. 1, the same applies to the deletion of table data related information. The leaf entry 20 is accessed, the first log record 41 is acquired, and the duplicate count 17 in the entry is decremented. Then, after the duplicate key data structure 102 is accessed and the second log record is acquired, the table data related information 18 is deleted from the duplicate key data structure 102.
[0026]
As described above, changes always proceed in the same order, from the update of the leaf entry to the duplicate key data structure, and exclusive control of the data resources constituting the index 1 is not required, so a plurality of processes can be executed simultaneously. Further, even when a cause for cancellation arises for the result of a series of changes, or when processing is interrupted immediately after a leaf entry is updated, the first log record 41 and the second log record 42 described above make it possible to carry out undo processing without losing concurrency.
[0027]
Specifically, if the first log record 41 and the second log record have been acquired as a pair at the time of undo processing, it is determined that the series of changes has been completed, and in the example of FIG. 1 the key value “28” and the table data related information “P18” are deleted. If only the first log record 41 has been acquired at the time of undo processing, only the change to the leaf entry 20 has been made; therefore, the information included in the first log record is used to add the table data related information “P18” to the duplicate key data structure 102, and then the key value “28” and the table data related information “P18” are deleted. Even in undo processing, changes to the B-tree index 1 always proceed in one direction, so concurrency with other processes is not impaired. As described above, high concurrency can be provided even when undo processing is involved.
[0028]
Further, since the duplicate key data structure 102 is pointed to from the leaf entry 20 and stored and managed outside the leaf node 12, the leaf node 12 in which the leaf entry 20 is stored can hold more leaf entries for other keys than under a scheme in which the leaf entry 16 holds all of the table data related information 18. The necessary number of leaf nodes 12 for storing leaf entries is thereby reduced, and in turn the necessary number of upper nodes 11 is reduced, which reduces the storage capacity of the entire B-tree index. Because the size of the B-tree structure is kept down, stable performance can be provided for all search and change processes that traverse the B-tree structure, including accesses to keys that do not have much table data information.
[0029]
Next, FIG. 2 shows a schematic configuration of the database management system of the present embodiment. There is an application program 6 created by a user and a database management system 2 that manages the entire database system such as inquiries and resource management. The database management system 2 includes a logical processing unit 21, a physical processing unit 22, a system control 23, a database 3 that permanently or temporarily stores data to be accessed by the database, and a system log 4. The database management system 2 is connected to other systems via a network or the like.
[0030]
The logical processing unit 21 includes a query analysis 211 that performs syntax and semantic analysis of a query, an optimization process 212 that generates an appropriate processing procedure, and a code generation 213 that generates code corresponding to the processing procedure. It further includes a code interpretation execution 214 that interprets the generated code and instructs the physical processing unit 22 to execute the database processing.
[0031]
The physical processing unit 22 includes a table data management unit 221 that performs data storage, deletion, update, and search processing on the table data 5 stored in the database 3 via the database buffer 24. In addition, a B-tree index management unit 25 that manages the index 1 stored in the database 3 is provided. The index 1 includes a B-tree structure for efficiently accessing related information using a key, and a duplicate key data structure for managing a large number of table data information associated with the key for each key.
[0032]
The B-tree index management unit 25 has the function of the B-tree structure access processing 26, which changes the B-tree structure of the related B-tree index 1 as the table data 5 is updated, and the function of the duplicate key data structure access processing 27, which changes the duplicate key data structure. The B-tree structure access processing 26 and the duplicate key data structure access processing 27 also have a function for searching for the related table data at high speed from a search condition key by accessing the index 1.
[0033]
The physical processing unit 22 also includes an exclusive control 223 for resources shared by the system. In this embodiment, data consistency is realized solely by exclusive control of the table data 5 (exclusive resources: table data storage page ID, table data ID, etc.); the resources of the B-tree index 1 (index ID, index page ID, key value, etc.) are not subject to exclusive control.
[0034]
The system log 4 accumulates update history information such as data insertion, update, and deletion in the database 3. The accumulated information includes history information regarding changes to the B-tree index 1. It is used to restore data consistency of the B-tree index 1 in various situations such as transaction cancellation and system down.
[0035]
The system control 23 handles query input from users, the return of query results, database maintenance in response to commands, and the like. It also manages the system log 4 and, in cooperation with the physical processing unit 22, acquires log records, which are history information, when the database 3 is changed. When a log record that has been read relates to a change process for the B-tree index, the system control 23 passes the log record to the B-tree index management unit 25 of the physical processing unit 22 and requests cancellation processing. The B-tree index management unit 25 then cancels the change based on the information in the log record.
[0036]
A user can search, access, and change data records constituting the table data 5 in the database 3 using the database management system 2 via the application program 6. The database management system 2 uses the B-tree index 1 to realize efficient access to data records.
[0037]
The B-tree index 1 or the table data 5 in the database 3 is accessed by reading (GET) it into the database buffer 24 in units of pages, which are the access units. After a page on the buffer has been referenced (searched), the buffer is released so that other processes may use it. After a page on the buffer has been updated, a PUT of the buffer declares that the page is to be reflected in the database and likewise permits other processes to use the buffer. The page on the buffer is not reflected in the database immediately but at some later point. A latch is placed on the buffer so that other transactions do not access a page that has been read (GET) into the buffer. Although a latch is a kind of lock, it is held for a much shorter period than an ordinary lock, and it can be acquired and released at low cost. In this embodiment, a shared latch is applied when an index node is referenced and an exclusive latch when it is updated. Processing under an exclusive latch is serialized, whereas holders of shared latches can reference the buffer simultaneously. The latch is released when the buffer is released, which allows another process waiting to acquire an exclusive latch to proceed.
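The compatibility rules for the shared and exclusive latches described above can be sketched as follows. This is a minimal, non-blocking model written for illustration only: the class name `Latch` and the mode strings "S" and "X" are assumptions, and a real latch would make the caller wait rather than return `False`.

```python
class Latch:
    """Illustrative buffer latch: many shared holders or one exclusive holder."""

    def __init__(self):
        self.shared_holders = 0   # number of readers currently referencing the page
        self.exclusive = False    # True while an updater holds the page

    def try_acquire(self, mode):
        if mode == "S":
            # Shared latches are compatible with each other, not with exclusive.
            if self.exclusive:
                return False
            self.shared_holders += 1
            return True
        if mode == "X":
            # An exclusive latch is incompatible with every other holder.
            if self.exclusive or self.shared_holders:
                return False
            self.exclusive = True
            return True
        raise ValueError(f"unknown latch mode: {mode}")

    def release(self, mode):
        if mode == "S":
            self.shared_holders -= 1
        else:
            self.exclusive = False


latch = Latch()
print(latch.try_acquire("S"), latch.try_acquire("S"), latch.try_acquire("X"))
```

In this model two readers succeed and the updater is refused until both readers have released, which mirrors the serialization of exclusive-latch processing described in the text.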
[0038]
FIG. 3 is a diagram illustrating an example of a hardware configuration of the computer system according to the present embodiment. The computer system 3000 includes a CPU 3002, a main storage device 3001, an external storage device 3003 such as a magnetic disk device, and a number of terminals 3004. The database management system 2 described above with reference to FIG. 2 is placed on the main storage device 3001, and the database 3, including the table data 5 and the index 1 managed by the database management system 2, is stored on the external storage device 3003. The system log 4 is also stored on the external storage device 3003. Further, a program 3100 for realizing the database management system 2 is also stored on the external storage device 3003.
[0039]
FIG. 8 is a diagram showing an example of an index log record according to the present embodiment.
[0040]
As shown at 41 in FIG. 8, the first log record 41, which is acquired when the leaf entry described in FIG. 1 is changed, consists of a log record header 410 including information on the transaction to which the change process belongs, an operation code 411 indicating the change, a key 412, table data information 413 to be changed, and a pointer 414 to the duplicate key data structure pointed to by the changed leaf entry. A code indicating an addition process or a deletion process is set in the operation code 411. In the example of FIG. 8, the code is “INSERT”, indicating an addition process. The log record header 410 includes the index ID of the B-tree index to be changed and information on the area in which the index is stored. The value set in the key 412 is the same as the value of the key 14 in the leaf entry 20 to be changed. In the example of FIG. 8, the value is “28”.
[0041]
The pointer 414 to the duplicate key data structure is the information used to continue the change process for the duplicate key data structure when a cause for canceling the change results arises after the leaf entry change has completed but before the duplicate key data structure change has completed.
[0042]
Further, as indicated at 42 in FIG. 8, the second log record 42, which is acquired when the duplicate key data structure described in FIG. 1 is changed, consists of a log record header 420 including information on the transaction to which the change process belongs, an operation code 421 indicating the change, and table data information 422 to be changed. As with the operation code 411, a code indicating an addition process or a deletion process is set in the operation code 421. In the example of FIG. 8, the code is “INSERT”, indicating an addition process. The table data information 422 is set to the same value as the table data information 413 of the first log record 41. As described above with reference to FIG. 1, the second log record 42 serves as a marker indicating whether the change process for the duplicate key data structure has been completed.
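The layout of the two log records can be sketched with plain data classes. The field names, the types, and the page identifier 201 used as the pointer value are assumptions made for illustration; the header is reduced to a transaction ID and an index ID, and the area information mentioned for the header 410 is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogHeader:                 # corresponds to 410 / 420 (simplified)
    transaction_id: int          # transaction to which the change belongs
    index_id: int                # B-tree index being changed


@dataclass(frozen=True)
class FirstLogRecord:            # first log record 41
    header: LogHeader            # 410
    op: str                      # 411: "INSERT" or "DELETE"
    key: int                     # 412: same value as the leaf entry key
    table_data_info: str         # 413
    dup_pointer: int             # 414: pointer to the duplicate key data structure


@dataclass(frozen=True)
class SecondLogRecord:           # second log record 42
    header: LogHeader            # 420
    op: str                      # 421
    table_data_info: str         # 422: same value as 413


def make_pair(txn, index_id, op, key, info, dup_pointer):
    """Build the pair of records written for one index change."""
    header = LogHeader(txn, index_id)
    return (FirstLogRecord(header, op, key, info, dup_pointer),
            SecondLogRecord(header, op, info))


first, second = make_pair(7, 1, "INSERT", 28, "P18", dup_pointer=201)
print(first)
print(second)
```

Only the first record carries the key and the pointer, because only it has to support continuing an unfinished change; the second record needs just enough information to be matched with its partner.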
[0043]
FIG. 4 is a flowchart showing a processing procedure of the B-tree index management unit 25 of this embodiment. FIG. 4 shows the processing contents of the B-tree structure access processing 26 when table data information is added to the B-tree index.
[0044]
First, in step 401, using the key, the B tree structure is traced from the root node to the upper node, and after determining the leaf node, the leaf node is read into the database buffer and the buffer is fixed (GET). Next, in step 402, the leaf entry stored in the leaf node is searched, and the leaf entry to be changed is determined. In step 403, the number (overlapping number) of table data information related to the key of the leaf entry is incremented. Here, in step 404, a pointer to the duplicate key data structure included in the leaf entry is acquired. Further, the first log record described with reference to FIG. 8 is created (step 405), and the created first log record is acquired (step 406). Finally, the leaf node is reserved to be written in the database, and the buffer fixation is released (PUT) (step 407). Thus, the leaf entry change process is completed (step 408), and the process proceeds to the change process for the duplicate key data structure.
[0045]
In this flowchart, processing is performed in the order of leaf entry change (404) and log record acquisition (405). However, in accordance with the WAL (write-ahead logging) protocol, the change to the leaf node is reflected in the actual database only after the log record has been written to the system log.
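The leaf entry change of FIG. 4 and the write-ahead rule can be sketched together as follows. The classes `Log` and `LeafNode`, the tuple layout of the log record, and the use of a list position as a log sequence number are assumptions introduced for this illustration; the tree traversal of step 401 and the latch handling are omitted, and the key is assumed to exist in the leaf already.

```python
class Log:
    """Hypothetical in-memory stand-in for the system log 4."""

    def __init__(self):
        self.records = []
        self.flushed_lsn = -1          # highest record forced to stable storage

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1   # list position used as a sequence number

    def flush(self, lsn):
        self.flushed_lsn = max(self.flushed_lsn, lsn)


class LeafNode:
    def __init__(self, entries):
        self.entries = entries         # key -> {"count": n, "dup_ptr": page id}
        self.dirty = False
        self.page_lsn = -1             # last log record describing this page


def add_to_leaf(leaf, log, key, info):
    entry = leaf.entries[key]                          # step 402: find the leaf entry
    entry["count"] += 1                                # step 403: increment duplicate count
    dup_ptr = entry["dup_ptr"]                         # step 404: pointer to duplicates
    record = ("FIRST", "INSERT", key, info, dup_ptr)   # step 405: build first log record
    leaf.page_lsn = log.append(record)                 # step 406: acquire it
    leaf.dirty = True                                  # step 407: PUT, write is deferred
    return dup_ptr


def write_page(leaf, log):
    """Write-ahead rule: force the log before the page reaches the database."""
    if log.flushed_lsn < leaf.page_lsn:
        log.flush(leaf.page_lsn)
    leaf.dirty = False                 # the page write itself is not modeled


log = Log()
leaf = LeafNode({28: {"count": 2, "dup_ptr": 201}})
pointer = add_to_leaf(leaf, log, 28, "P18")
write_page(leaf, log)
print(pointer, leaf.entries[28]["count"], log.flushed_lsn)
```

The returned pointer is what the subsequent change to the duplicate key data structure uses, so the leaf node does not have to be read again.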
[0046]
FIG. 5 is a flowchart showing a processing procedure of the B-tree index management unit 25 of the present embodiment. FIG. 5 shows the processing contents of the duplicate key data structure access processing 27 when table data information is added to the B-tree index.
[0047]
The change process for the duplicate key data structure is continued from the change process for the leaf entry described in the flowchart of FIG.
[0048]
First, in step 501, the duplicate key data structure is accessed using the pointer previously obtained in step 404 of FIG. 4, and in step 502, a position for adding table data information is searched. Then, the second log record described with reference to FIG. 8 is created (step 503), and the created second log record is acquired (step 504). Here, it is determined whether there is an additional area in the duplicate key data structure (step 505). If there is no additional area, the process proceeds to step 506, where a new area is allocated, and the duplicate key data structure is changed when the new area is added (step 507). Specifically, when the duplicate key data structure is composed of a plurality of pages, new page allocation is performed. Thereafter, the process proceeds to step 508. If the determination result in step 505 is that there is a region, the process proceeds to step 508, table data information is added to the duplicate key data structure, and the change process for the duplicate key data structure is terminated (step 509).
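The steps of FIG. 5 can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: the duplicate key data structure is modeled as a list of sorted pages, table data information is represented by integers, a page holds at most `PAGE_CAPACITY` items, and a full page is split in half when a new page is allocated.

```python
import bisect

PAGE_CAPACITY = 4   # assumed page size, in items


def add_to_duplicates(pages, info, log):
    """pages: non-empty list of sorted lists, ordered by their contents."""
    # Steps 501-502: locate the page (and thus the position) for the new item.
    idx = 0
    while idx < len(pages) - 1 and pages[idx] and info > pages[idx][-1]:
        idx += 1
    # Steps 503-504: create and acquire the second log record.
    log.append(("SECOND", "INSERT", info))
    # Step 505: is there room in the target page?
    if len(pages[idx]) >= PAGE_CAPACITY:
        # Steps 506-507: allocate a new page and restructure (assumed split policy).
        half = PAGE_CAPACITY // 2
        new_page = pages[idx][half:]
        pages[idx] = pages[idx][:half]
        pages.insert(idx + 1, new_page)
        if info > pages[idx][-1]:
            idx += 1
    # Step 508: add the table data information in sorted position.
    bisect.insort(pages[idx], info)


pages = [[1, 3, 5, 7]]
log = []
add_to_duplicates(pages, 4, log)
print(pages, log)
```

Note that, as in the flowchart, the second log record is acquired before the structure itself is modified.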
[0049]
FIG. 6 is a flowchart showing a processing procedure of the B-tree index management unit 25 of this embodiment. FIG. 6 shows the processing contents of the B-tree structure access processing 26 when table data information is deleted for the B-tree index.
[0050]
First, in step 601, using the key, the B-tree structure is traced from the root node to the upper node, and after determining the leaf node, the leaf node is read into the database buffer and the buffer is fixed (GET). Next, at step 602, the leaf entry stored in the leaf node is searched, and the leaf entry to be changed is determined. Here, in step 603, a pointer that points to the duplicate key data structure included in the leaf entry is obtained. In step 604, it is determined whether or not the number (overlapping number) of table data information related to the key of the leaf entry is one. If the overlap number is 1, the process proceeds to step 605 to delete the leaf entry, and the process proceeds to step 607. If the determination result in step 604 is that the duplication number is greater than 1, the process proceeds to step 606, and the number of table data information (duplication number) related to the key of the leaf entry is decremented.
[0051]
Further, the first log record described with reference to FIG. 8 is created (step 607), and the created first log record is acquired (step 608). Finally, the leaf node is reserved to be written in the database, and the buffer fixation is released (PUT) (step 609). Thus, the leaf entry change process is completed (step 610), and the process proceeds to the change process for the duplicate key data structure.
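The deletion path of FIG. 6 can be sketched in the same style. The dictionary layout of the leaf entries and the tuple layout of the log record are assumptions for illustration; tree traversal (step 601) and buffer handling (step 609) are omitted.

```python
def delete_from_leaf(leaf_entries, key, info, log):
    """leaf_entries: key -> {"count": n, "dup_ptr": page id} (assumed layout)."""
    entry = leaf_entries[key]                    # step 602: find the leaf entry
    dup_ptr = entry["dup_ptr"]                   # step 603: pointer to duplicates
    if entry["count"] == 1:                      # step 604: last item for this key?
        del leaf_entries[key]                    # step 605: delete the leaf entry
    else:
        entry["count"] -= 1                      # step 606: decrement duplicate count
    # Steps 607-608: create and acquire the first log record.
    log.append(("FIRST", "DELETE", key, info, dup_ptr))
    return dup_ptr


entries = {28: {"count": 2, "dup_ptr": 201}}
log = []
delete_from_leaf(entries, 28, "P18", log)
delete_from_leaf(entries, 28, "P7", log)
print(entries, log)
```

The pointer is read before the entry is possibly deleted, so it remains available both for the log record and for the following change to the duplicate key data structure.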
[0052]
FIG. 7 is a flowchart showing a processing procedure of the B-tree index management unit 25 of the present embodiment. FIG. 7 shows the processing contents of the duplicate key data structure access processing 27 when table data information is deleted for the B-tree index.
[0053]
The change process for the duplicate key data structure is continued from the change process for the leaf entry described with reference to the flowchart of FIG.
[0054]
First, in step 701, the duplicate key data structure is accessed using the pointer previously obtained in step 603 of FIG. 6, and in step 702, the position of the table data information to be deleted is searched for. Then, the second log record described with reference to FIG. 8 is created (step 703), and the created second log record is acquired (step 704).
[0055]
Next, in step 705, the table data information is deleted from the duplicate key data structure. It is then determined whether deleting the table data information has left an area of the duplicate key data structure that can be recovered (step 706). If a recoverable area exists, the process proceeds to step 707, where the area is recovered, and the duplicate key data structure is changed to reflect the recovery (step 708). Specifically, when the duplicate key data structure is composed of a plurality of pages, pages are recovered. Recovered pages are reused when other processes allocate new pages. After step 708, the process for changing the duplicate key data structure ends in step 709. If the determination in step 706 indicates that there is no recoverable area, the process for changing the duplicate key data structure ends in step 709.
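The steps of FIG. 7 can be sketched as follows. The list-of-pages model, the integer table data information, and the rule that a page becomes recoverable when it is empty are assumptions made for illustration; the text does not specify the recovery condition.

```python
def delete_from_duplicates(pages, info, log, free_pages):
    """pages: list of sorted lists; free_pages: pool of recovered pages."""
    # Steps 701-702: find the page holding the item (raises StopIteration if absent).
    idx = next(i for i, page in enumerate(pages) if info in page)
    # Steps 703-704: create and acquire the second log record.
    log.append(("SECOND", "DELETE", info))
    # Step 705: delete the table data information.
    pages[idx].remove(info)
    # Step 706: did the deletion leave a recoverable area (assumed: an empty page)?
    if not pages[idx] and len(pages) > 1:
        # Steps 707-708: recover the page and unlink it from the structure.
        free_pages.append(pages.pop(idx))


pages = [[1, 3], [9]]
log, free_pages = [], []
delete_from_duplicates(pages, 9, log, free_pages)
print(pages, free_pages, log)
```

Pages placed in `free_pages` stand for the recovered pages that other processes may later reuse when they allocate a new page.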
[0056]
FIG. 9 is a flowchart showing the processing procedure of the index update result cancellation processing of the index management unit of this embodiment.
[0057]
First, in step 901, a log record related to a B-tree index change is received from the system control. In step 902, it is determined whether there is a log record to be processed. If there is none, the index update result cancellation process ends in step 913. If there is a log record to be processed, that is, if a log record has been received, the type of the log record is determined. It is first determined whether the log record is the second log record (step 903). If it is, the process proceeds to step 904, where the second log record is held, and then returns to step 901 to process the next log record. The next log record should then be the first log record. If it is determined in step 903 that the log record is not the second log record, it is further determined in step 905 whether it is the first log record. If it is not, update result cancellation processing appropriate to that other kind of log record is performed in step 906, and the process returns to step 901 to process the next log record. If it is the first log record, the following processing is performed. First, it is determined whether a second log record is being held (step 907). If one is held, it is determined that the series of changes to the leaf entry and the duplicate key data structure has been completed, the held second log record is discarded (step 908), and the process proceeds to step 910. If no second log record is held, it is determined that the change to the leaf entry has been completed but the paired change to the duplicate key data structure has not, and the change process for the duplicate key data structure is continued (step 909). The process then proceeds to step 910. In step 910, it is determined whether the log record being processed was acquired during normal processing. If it was, the related change result must be canceled, and the cancellation is performed in step 911. If it was acquired not during normal processing but during cancellation processing, it is determined that the cancellation, including step 909, has already been completed. The series of B-tree index update result cancellation steps then ends, and the process returns to step 901 to process the next log record.
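The cancellation procedure of FIG. 9 can be sketched as a single pass over the log records, newest first. The record layout (dictionaries with `kind`, `op`, `key`, `info`, `dup_ptr`, and a `normal` flag), the in-memory model of the leaf and the duplicate structures, and the omission of log records written during cancellation are all assumptions made for this illustration.

```python
def apply_to_duplicates(dups, rec, op):
    """Apply an insert or delete to the duplicate structure named in the record."""
    infos = dups.setdefault(rec["dup_ptr"], set())
    if op == "INSERT":
        infos.add(rec["info"])
    else:
        infos.discard(rec["info"])


def undo(leaf, dups, rec):
    """Cancel one change, again in the order leaf entry, then duplicates."""
    inverse = "DELETE" if rec["op"] == "INSERT" else "INSERT"
    entry = leaf.get(rec["key"])
    if inverse == "DELETE":
        entry["count"] -= 1
        if entry["count"] == 0:
            del leaf[rec["key"]]
    elif entry is None:
        leaf[rec["key"]] = {"count": 1, "dup_ptr": rec["dup_ptr"]}
    else:
        entry["count"] += 1
    apply_to_duplicates(dups, rec, inverse)


def cancel_index_changes(records_newest_first, leaf, dups):
    held_second = None
    for rec in records_newest_first:              # steps 901-902
        if rec["kind"] == "SECOND":               # step 903
            held_second = rec                     # step 904: hold it
            continue
        if rec["kind"] != "FIRST":                # step 905
            continue                              # step 906: other records (not modeled)
        if held_second is not None:               # step 907: pair is complete
            held_second = None                    # step 908: discard held record
        else:
            apply_to_duplicates(dups, rec, rec["op"])   # step 909: finish the change
        if rec["normal"]:                         # step 910
            undo(leaf, dups, rec)                 # step 911


# Interrupted insert: the leaf entry was updated, the duplicate structure was not.
leaf = {28: {"count": 3, "dup_ptr": 201}}
dups = {201: {"P3", "P7"}}
first = {"kind": "FIRST", "op": "INSERT", "key": 28, "info": "P18",
         "dup_ptr": 201, "normal": True}
cancel_index_changes([first], leaf, dups)
print(leaf, dups)
```

Completing the unfinished change before undoing it keeps every modification moving in the same direction (leaf entry first, duplicate structure second), which is the property the text relies on to avoid exclusive control of index resources.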
[0058]
FIG. 10 is a diagram showing an example of the configuration of the duplicate key data structure 102 of this embodiment.
[0059]
In the example of FIG. 10, the duplicate key data structure 102 includes a page 1002 that actually stores table data information and a page 1001 that stores a management entry 1003 having a pointer pointing to the page 1002.
[0060]
The management entry 1003 stored in the page 1001 has a pointer 1005 pointing to the page 1002 and a maximum value 1004 of table data information stored in the page 1002. Management entries 1003 in the page 1001 are sorted in the order of the maximum value 1004 of the table data information.
[0061]
As shown in FIG. 10, each page 1001 has a pointer to the next page 1001, which stores management entries whose maximum values are larger than the maximum values 1004 of the management entries 1003 stored in that page. The leaf entry 20 points to the page 1001 that stores the management entry pointing to the page 1002 holding the smallest table data information.
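A lookup through the management entries of FIG. 10 can be sketched as follows. The page identifiers, the integer table data information, and the use of a single management page are assumptions for illustration; the chain of management pages is not modeled.

```python
import bisect

# Data pages (1002): page id -> sorted table data information.
data_pages = {10: [3, 7, 12], 11: [18, 25], 12: [31, 40, 44]}

# One management page (1001): entries of (maximum value 1004, pointer 1005),
# kept sorted by maximum value.
management = [(12, 10), (25, 11), (44, 12)]


def find_data_page(management, info):
    """Return the id of the only data page that could hold info, or None."""
    maxima = [maximum for maximum, _ in management]
    i = bisect.bisect_left(maxima, info)     # first entry whose maximum >= info
    return management[i][1] if i < len(management) else None


def contains(management, data_pages, info):
    page_id = find_data_page(management, info)
    if page_id is None:
        return False
    page = data_pages[page_id]
    j = bisect.bisect_left(page, info)
    return j < len(page) and page[j] == info


print(find_data_page(management, 18), contains(management, data_pages, 18))
```

Because the entries are sorted by maximum value, a binary search over the management page selects a single data page, so only one data page has to be read per lookup.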
[0062]
With the above data structure, it is possible to efficiently perform a search in the duplicate key data structure described with reference to FIGS.
[0063]
Further, new allocation and recovery of the page 1001 or the page 1002 can easily be realized by GET and PUT operations on the database buffer. If the second log record acquired at the time of new allocation or recovery carries the page identifier of the page 1001 or the page 1002 in addition to the configuration information of the second log record described above, the change continuation process during cancellation can be performed easily.
[0064]
The processing of the flowchart shown above is executed as a program in the computer system shown as an example in FIG. However, the program is not limited to the program stored in the external storage device physically connected directly to the computer system as in the example of FIG. It can be stored in a storage medium readable and writable by a computer such as a hard disk device or a floppy disk device. Further, it can be stored in an external storage device connected to a computer system different from the computer system of FIG. 3 via a network.
[0065]
In this way, table data information with the same key can be accessed simultaneously by multiple processes without acquiring exclusive control at the time of access, and changes can be canceled without restricting the access of normal processes. It is therefore possible to provide high concurrency, including during cancellation processing, for a plurality of processes that simultaneously access the same key.
[0066]
Since the duplicate key data structure is pointed to from the leaf entry and is stored and managed outside the leaf node, the leaf node that stores a leaf entry with a duplicate key can also store more leaf entries for other keys. As a result, the storage capacity of the B-tree index is kept small, and stable access performance can be provided.
[0067]
As described above, it is possible to provide a data management method using a B-tree index that has excellent concurrency and high storage efficiency even when there are a large number of duplicate keys.
[0068]
The data management method using the B-tree index is very effective for applications that use a B-tree index on data with a narrow value range, such as the “business transition state” in a workflow. This is because, in a B-tree index with a narrow value range, the number of duplicate data items related to one key is enormous, and the B-tree index is updated frequently as values such as the “business transition state” change.
[0069]
Although the description above uses a B-tree, the present invention can be applied in the same way whenever index accesses to the same key occur in an environment where a plurality of accesses are performed, and it can provide a data management method and apparatus capable of suitably performing index access to the same key.
[0070]
[Effects of the invention]
According to the present invention, it is possible to provide a data management method and apparatus that can suitably perform index access to the same key in an environment where a plurality of accesses are performed.
[Brief description of the drawings]
FIG. 1 is a conceptual diagram of the present invention.
FIG. 2 is a diagram showing functional blocks of a database processing system having a B-tree index according to the present embodiment.
FIG. 3 is a diagram illustrating an example of a hardware configuration of a computer system according to the present embodiment.
FIG. 4 is a flowchart illustrating a processing procedure of index update processing of a B-tree index management unit 25 according to the present embodiment.
FIG. 5 is a flowchart illustrating a processing procedure of index update processing of a B-tree index management unit 25 according to the present embodiment.
FIG. 6 is a flowchart illustrating a processing procedure of index update processing of a B-tree index management unit 25 according to the present embodiment.
FIG. 7 is a flowchart illustrating a processing procedure of index update processing of the B-tree index management unit 25 according to the present embodiment.
FIG. 8 is a diagram illustrating an example of a B-tree index log record according to the present embodiment.
FIG. 9 is a flowchart illustrating a processing procedure of an index update result cancellation process (recovery processing procedure) of the B-tree index management unit according to the present embodiment.
FIG. 10 is a diagram illustrating an example of a configuration of a duplicate key data structure according to the present embodiment.
FIG. 11 is a diagram showing a configuration of a conventional B-tree index having many duplicate keys.
[Explanation of symbols]
1 ... B-tree index, 2 ... Database management system, 3 ... Database, 4 ... System log, 5 ... Table data, 6 ... Application program, 21 ... Logical processing unit, 22 ... Physical processing unit, 23 ... System control, 25 ... B-tree index management unit, 26 ... B-tree structure access processing, 27 ... Duplicate key data structure access processing, 101 ... B-tree structure, 102 ... Duplicate key data structure, 41 ... First log record, 42 ... Second log record.

Claims

In a data management apparatus that manages, for one key, identification information relating to data including the key as a data pointer, and that uses a B-tree index to find the data pointer related to the key based on a key value,
A data structure for storing a plurality of data pointers related to a plurality of data including the same key as the one key, and a data structure different from a leaf node in which an index entry including the key is stored. Storage means having a duplicate data structure that is pointed from the index entry that includes the key and does not acquire exclusion when accessing a plurality of data pointers regarding a plurality of data that includes the same key;
When adding or deleting a data pointer related to a key, table data related information as information for adding or deleting the data pointer to the duplicate data structure storing a plurality of data pointers related to the key Means for obtaining a first log record including a pointer to the duplicate data structure and the key;
Means for updating the index entry containing the key stored in the leaf node;
Means for obtaining a second log record including information indicating that the data pointer has been added to or deleted from the duplicate data structure;
Means for adding or deleting the data pointer to the duplicate data structure;
When canceling the processing result that involves adding or deleting the above data pointer,
Means for receiving the acquired log records, holding the second log record when the received log record is the second log record, and, when the received log record is the first log record and the second log record is held, determining that the addition or deletion of the data pointer to the duplicate data structure has been completed,
Means for determining, when it is confirmed in the above determination that the second log record is not held even though the received log record is the first log record, that the addition or deletion of the data pointer to the duplicate data structure has not been completed, accessing the duplicate data structure indicated by the pointer to the duplicate data structure included in the first log record, and adding or deleting the table data information included in the first log record; a data management apparatus characterized by comprising the above means.

In a computer-readable recording medium storing a data management program that manages, for one key, identification information relating to data including the key as a data pointer, and that uses a B-tree index to find the data pointer related to the key based on the key value,
  The program has a duplicate data structure, which is a data structure for storing a plurality of data pointers related to a plurality of data including the same key as the one key,
which is different from the leaf node in which the index entry including the key is stored, which is pointed to from the index entry including the key, and for which no exclusive control is acquired when the plurality of data pointers related to the plurality of data including the same key are accessed,
  When adding or deleting a data pointer associated with a key,
  For the duplicate data structure storing a plurality of data pointers associated with the key,
A function of acquiring a first log record including table data related information, a pointer to the duplicate data structure, and the key as information for adding or deleting the data pointer;
  The ability to update the index entry containing that key stored in the leaf node;
  A function of obtaining a second log record including information indicating that the data pointer has been added to or deleted from the duplicate data structure;
  A function of adding or deleting the data pointer to the duplicate data structure;
  When canceling the processing result that involves adding or deleting the above data pointer,
A function of receiving the acquired log records, holding the second log record when the received log record is the second log record, and, when the received log record is the first log record and the second log record is held, determining that the addition or deletion of the data pointer to the duplicate data structure has been completed,
  A function of determining, when it is confirmed in the above determination that the second log record is not held even though the received log record is the first log record, that the addition or deletion of the data pointer to the duplicate data structure has not been completed, accessing the duplicate data structure indicated by the pointer to the duplicate data structure included in the first log record, and adding or deleting the table data information included in the first log record; a computer-readable recording medium storing a data management program for causing a computer to execute the above functions.