CN110287172B - Method for formatting HBase data

Info

Publication number: CN110287172B
Authority: CN (China)
Prior art keywords: hbase, cluster, zookeeper, root node, hadoop
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201910588013.8A
Other languages: Chinese (zh)
Other versions: CN110287172A (en)
Inventor: 李烨
Current Assignee: Sichuan XW Bank Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Sichuan XW Bank Co Ltd
Priority date: 2019-07-01 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2019-07-01
Publication date: 2023-05-02
Application filed by Sichuan XW Bank Co Ltd
Priority to CN201910588013.8A
Publication of CN110287172A
Application granted
Publication of CN110287172B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for formatting HBase data. It belongs to the field of data formatting and solves the prior-art problem that formatting HBase data is cumbersome and time-consuming. In the method, all services of the HBase cluster are stopped while the ZooKeeper and Hadoop on which the HBase cluster depends are kept running normally; the root node under which the HBase cluster stores its metadata on ZooKeeper is deleted together with all child nodes under it, and the root directory under which the HBase cluster stores its data on Hadoop is deleted together with all subdirectories under it; after the deletion, all services of the HBase cluster are started to obtain an HBase in its initial state. The method is used to format HBase data quickly.

Description

Method for formatting HBase data
Technical Field
The invention relates to a method for formatting HBase data, which is used to format HBase data quickly and belongs to the field of data formatting.
Background
Data formatting refers to deleting all data and metadata in the system, and restoring the system to an initial state. When the data in the system is no longer useful or the system state is abnormal, the system can be quickly restored to a clean and usable state by performing data formatting.
ZooKeeper: ZooKeeper is an open-source coordination service for distributed applications and an open-source implementation of Google's Chubby. It is an important component on which Hadoop and HBase depend, and is currently a top-level open-source project of the Apache community. It provides consistency services for distributed applications, including configuration maintenance, naming service, distributed synchronization, and group services.
Hadoop: Hadoop comprises the distributed file system HDFS and the distributed computing framework MapReduce, and is currently a top-level project of the Apache community. Hadoop is highly fault-tolerant and designed to be deployed on inexpensive hardware; it provides high-throughput access to application data and suits applications with very large data sets.
HBase: HBase is a very popular distributed, column-oriented NoSQL database and a top-level open-source project of the Apache community. It is used mainly for massive data storage and for retrieval by fixed conditions under high concurrency. In development and test environments, when the data in HBase is no longer useful or the HBase state is abnormal, formatting the HBase data quickly yields an HBase in its initial state, i.e., an HBase without any data. HBase depends on ZooKeeper and Hadoop to run: its metadata is stored on ZooKeeper and its data is stored on Hadoop. HBase itself does not provide a method or tool for formatting; a search of published patents found none concerning the formatting of HBase, and no detailed description of a formatting method similar to the one described here was found on the internet. One readily conceivable solution that achieves the same purpose is to uninstall the original HBase cluster, which requires deleting all data, metadata, software packages, configuration files, and the like of HBase, and then to build a brand-new HBase cluster, which requires reinstalling the software packages and configuration files on every node of the cluster. That approach, however, is cumbersome and time-consuming.
Disclosure of Invention
In view of the above problems, the invention aims to provide a method for formatting HBase data that solves the prior-art problem that formatting HBase data by uninstalling the original HBase cluster and rebuilding a brand-new one is cumbersome and time-consuming.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A method of formatting HBase data, comprising the steps of:
S1. stopping all services of the HBase cluster, while keeping the ZooKeeper and Hadoop on which the HBase cluster depends in a normal running state;
S2. after step S1 has been executed, first deleting the root node under which the HBase cluster stores HBase metadata on ZooKeeper, together with all child nodes under that root node, and then deleting the root directory under which the HBase cluster stores HBase data on Hadoop, together with all subdirectories under that root directory; or first deleting that root directory and all its subdirectories on Hadoop, and then deleting that root node and all its child nodes on ZooKeeper; or deleting the root node and all its child nodes on ZooKeeper while simultaneously deleting the root directory and all its subdirectories on Hadoop;
S3. after the deletion, starting all services of the HBase cluster to obtain an HBase in its initial state.
Further, in step S2,
the root node storing HBase metadata on ZooKeeper and all child nodes under it are deleted as follows: the root node is looked up in the zookeeper.znode.parent tag of the HBase cluster's configuration file hbase-site.xml, and once found, the root node and all child nodes under it are deleted on ZooKeeper;
the root directory storing HBase data on Hadoop and all subdirectories under it are deleted as follows: the root directory is looked up in the hbase.rootdir tag of the HBase cluster's configuration file hbase-site.xml, and once found, the root directory and all subdirectories under it are deleted on Hadoop.
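The lookup in hbase-site.xml can be sketched in Python with the standard-library XML parser. This is only an illustration, not the patented program: the sample file contents, the host name `namenode.example`, the port, and the helper name `read_property` are assumptions made for the example. The fallback of `/hbase` for the ZooKeeper parent node reflects HBase's documented default when the property is not set.

```python
import xml.etree.ElementTree as ET

# Sample configuration; the host name, port, and paths are illustrative assumptions.
SAMPLE_HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example:8020/hbase</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>
"""


def read_property(xml_text, name, default=None):
    """Return the <value> paired with <name>name</name>, or default if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


# Root node on ZooKeeper (metadata) and root directory on Hadoop (data).
znode_parent = read_property(SAMPLE_HBASE_SITE, "zookeeper.znode.parent", default="/hbase")
root_dir = read_property(SAMPLE_HBASE_SITE, "hbase.rootdir")
print(znode_parent)
print(root_dir)
```

In practice the text would be read from the cluster's own hbase-site.xml rather than from an embedded string.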
Further, a processor receives a request to format HBase data and stops all services of the HBase cluster, while keeping the ZooKeeper and Hadoop on which the HBase cluster depends in a normal running state;
then, in response to a query-and-delete instruction, the processor calls a query-and-delete program in memory and either first deletes the root node under which the HBase cluster stores HBase metadata on ZooKeeper, together with all child nodes under that root node, and then deletes the root directory under which the HBase cluster stores HBase data on Hadoop, together with all subdirectories under that root directory; or first deletes that root directory and all its subdirectories on Hadoop and then deletes that root node and all its child nodes on ZooKeeper; or deletes the root node and all its child nodes on ZooKeeper while simultaneously deleting the root directory and all its subdirectories on Hadoop;
after the deletion, the processor starts all services of the HBase cluster to obtain an HBase in its initial state.
Compared with the prior art, the invention has the beneficial effects that:
1. By deleting all metadata that the HBase cluster stores on ZooKeeper and all data that it stores on Hadoop, the method simplifies the implementation steps, reduces operational complexity, formats HBase data quickly, and thereby improves how the computer handles its internal objects.
Drawings
FIG. 1 is a flow chart of the present invention in which all metadata stored on ZooKeeper is deleted first and the data stored on Hadoop is deleted afterwards.
Detailed Description
The invention will be further described with reference to the drawings and detailed description.
A method of formatting HBase data, comprising the steps of:
S1. stopping all services of the HBase cluster, while keeping the ZooKeeper and Hadoop on which the HBase cluster depends in a normal running state;
S2. after step S1 has been executed, first deleting the root node under which the HBase cluster stores HBase metadata on ZooKeeper, together with all child nodes under that root node, and then deleting the root directory under which the HBase cluster stores HBase data on Hadoop, together with all subdirectories under that root directory; or first deleting that root directory and all its subdirectories on Hadoop, and then deleting that root node and all its child nodes on ZooKeeper; or deleting the root node and all its child nodes on ZooKeeper while simultaneously deleting the root directory and all its subdirectories on Hadoop;
the root node storing HBase metadata on ZooKeeper and all child nodes under it are deleted as follows: the root node is looked up in the zookeeper.znode.parent tag of the HBase cluster's configuration file hbase-site.xml, and once found, the root node and all child nodes under it are deleted on ZooKeeper;
the root directory storing HBase data on Hadoop and all subdirectories under it are deleted as follows: the root directory is looked up in the hbase.rootdir tag of the HBase cluster's configuration file hbase-site.xml, and once found, the root directory and all subdirectories under it are deleted on Hadoop.
The lookup and deletion may be performed manually: the HBase cluster configuration file hbase-site.xml is inspected visually to find the corresponding content, and a deletion instruction is then issued according to what was found. Alternatively, they may be performed by a program that, on receiving a lookup instruction, automatically searches hbase-site.xml and deletes according to the result, as follows:
to find the root node storing HBase metadata on ZooKeeper, an XML parsing program is written (for example, one that calls a common XML parsing library such as DOM4J) that finds, in hbase-site.xml, the value of the <value></value> tag corresponding to the <name>zookeeper.znode.parent</name> tag, and the program is then executed to perform the lookup;
to find the root directory storing HBase data on Hadoop, an XML parsing program is written (for example, one that calls a common XML parsing library such as DOM4J) that finds, in hbase-site.xml, the value of the <value></value> tag corresponding to the <name>hbase.rootdir</name> tag, and the program is then executed to perform the lookup;
for deletion, nodes on ZooKeeper can be deleted with the zkCli.sh script, the ZooKeeper Java API for deleting nodes, or the API of another language; directories on Hadoop can be deleted with the commands hdfs dfs -rm -r <directory> or hadoop fs -rm -r <directory>, both of which ship with Hadoop, or with the Hadoop Java API for deleting directories or the API of another language.
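As a rough illustration of the command-line deletion route, the Python sketch below only assembles the two commands as argument lists and does not run them. Several details are assumptions rather than part of the patent: the helper names, the quorum address `zk1.example:2181`, the refusal to target `/` (a safety check added for the example), and the choice of the `deleteall` subcommand, which recent zkCli.sh versions provide for recursive deletion while older versions used `rmr` instead.

```python
from urllib.parse import urlparse


def zk_delete_command(zk_quorum, znode, subcommand="deleteall", zkcli="zkCli.sh"):
    """Build a zkCli.sh invocation that recursively deletes `znode`.

    `subcommand` is an assumption that depends on the ZooKeeper version:
    "deleteall" on recent releases, "rmr" on older ones.
    """
    if not znode.startswith("/") or znode == "/":
        raise ValueError("refusing to delete an unsafe znode path: %r" % znode)
    return [zkcli, "-server", zk_quorum, subcommand, znode]


def hdfs_delete_command(root_dir):
    """Build the `hdfs dfs -rm -r <directory>` command named in the description."""
    path = urlparse(root_dir).path
    if path in ("", "/"):
        raise ValueError("refusing to delete an unsafe HDFS path: %r" % root_dir)
    return ["hdfs", "dfs", "-rm", "-r", root_dir]


# Hypothetical values, as they might be read from hbase-site.xml.
zk_cmd = zk_delete_command("zk1.example:2181", "/hbase")
hdfs_cmd = hdfs_delete_command("hdfs://namenode.example:8020/hbase")
print(" ".join(zk_cmd))
print(" ".join(hdfs_cmd))
```

The resulting lists could be passed to `subprocess.run(..., check=True)` once HBase has been stopped; keeping command construction separate from execution makes the commands easy to review before anything is deleted.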
S3. after the deletion, starting all services of the HBase cluster to obtain an HBase in its initial state.
The data flow that implements the formatting is as follows:
a processor receives a request to format HBase data and stops all services of the HBase cluster, while keeping the ZooKeeper and Hadoop on which the HBase cluster depends in a normal running state;
in response to a query-and-delete instruction, the processor calls a query-and-delete program in memory and either first deletes the root node under which the HBase cluster stores HBase metadata on ZooKeeper, together with all child nodes under that root node, and then deletes the root directory under which the HBase cluster stores HBase data on Hadoop, together with all subdirectories under that root directory; or first deletes that root directory and all its subdirectories on Hadoop and then deletes that root node and all its child nodes on ZooKeeper; or deletes the root node and all its child nodes on ZooKeeper while simultaneously deleting the root directory and all its subdirectories on Hadoop;
after the deletion, the processor starts all services of the HBase cluster to obtain an HBase in its initial state.
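The overall flow, including the three permitted deletion orders, might be sketched as follows. This is a minimal sketch under stated assumptions: the function `format_hbase`, its `order` parameter, and the four injected callables are hypothetical names chosen for the example, and the callables here only record events instead of touching a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def format_hbase(stop_hbase, delete_znodes, delete_hdfs_dir, start_hbase, order="zk_first"):
    """Stop HBase, delete its ZooKeeper and Hadoop state, then start HBase again.

    order: "zk_first", "hdfs_first", or "parallel" (the three alternatives of step S2).
    """
    if order not in ("zk_first", "hdfs_first", "parallel"):
        raise ValueError("unknown order: %r" % order)

    stop_hbase()  # S1: ZooKeeper and Hadoop are assumed to keep running

    # S2: delete metadata and data in the chosen order
    if order == "zk_first":
        delete_znodes()
        delete_hdfs_dir()
    elif order == "hdfs_first":
        delete_hdfs_dir()
        delete_znodes()
    else:
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(delete_znodes), pool.submit(delete_hdfs_dir)]
            for future in futures:
                future.result()  # surface any failure before restarting HBase

    start_hbase()  # S3: HBase comes up in its initial state


# Demonstration with stand-in callables that only record what happened.
events = []
format_hbase(
    stop_hbase=lambda: events.append("stop"),
    delete_znodes=lambda: events.append("delete znodes"),
    delete_hdfs_dir=lambda: events.append("delete hdfs dir"),
    start_hbase=lambda: events.append("start"),
    order="zk_first",
)
print(events)
```

Because a failed deletion raises before `start_hbase` is reached, the sketch never restarts HBase on top of partially deleted state; whether that is the desired behavior in a given environment is a design choice left open here.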
The above are merely representative examples of the many specific applications of the present invention and should not be construed as limiting its scope in any way. All technical schemes formed by transformation or equivalent substitution fall within the protection scope of the invention.

Claims (2)

1. A method of formatting HBase data, comprising the steps of:
S1. stopping all services of an HBase cluster, while keeping the ZooKeeper and Hadoop on which the HBase cluster depends in a normal running state;
S2. after step S1 has been executed, first deleting the root node under which the HBase cluster stores HBase metadata on ZooKeeper and all child nodes contained under the root node, and then deleting the root directory under which the HBase cluster stores HBase data on Hadoop and all subdirectories contained under the root directory; or first deleting the root directory under which the HBase cluster stores HBase data on Hadoop and all subdirectories contained under the root directory, and then deleting the root node under which the HBase cluster stores HBase metadata on ZooKeeper and all child nodes contained under the root node; or deleting the root node under which the HBase cluster stores HBase metadata on ZooKeeper and all child nodes contained under the root node while simultaneously deleting the root directory under which the HBase cluster stores HBase data on Hadoop and all subdirectories contained under the root directory;
wherein deleting the root node storing HBase metadata on ZooKeeper and all child nodes contained under the root node is implemented as follows: the root node storing HBase metadata on ZooKeeper is looked up in the zookeeper.znode.parent tag of the configuration file hbase-site.xml of the HBase cluster, and once found, the root node and all child nodes contained under the root node are deleted on ZooKeeper; and deleting the root directory storing HBase data on Hadoop and all subdirectories contained under the root directory is implemented as follows: the root directory storing HBase data on Hadoop is looked up in the hbase.rootdir tag of the configuration file hbase-site.xml of the HBase cluster, and once found, the root directory and all subdirectories contained under the root directory are deleted on Hadoop;
S3. after the deletion, starting all services of the HBase cluster to obtain an HBase in its initial state.
2. The method for formatting HBase data according to claim 1, wherein a processor receives a request to format HBase data and stops all services of the HBase cluster, while keeping the ZooKeeper and Hadoop on which the HBase cluster depends in a normal running state; then, in response to a query-and-delete instruction, the processor calls a query-and-delete program in memory and first deletes the root node under which the HBase cluster stores HBase metadata on ZooKeeper and all child nodes contained under the root node, and then deletes the root directory under which the HBase cluster stores HBase data on Hadoop and all subdirectories contained under the root directory; or the processor, in response to a query-and-delete instruction, calls a query-and-delete program in memory and first deletes the root directory under which the HBase cluster stores HBase data on Hadoop and all subdirectories contained under the root directory, and then deletes the root node under which the HBase cluster stores HBase metadata on ZooKeeper and all child nodes contained under the root node; or the processor, in response to a query-and-delete instruction, calls a query-and-delete program in memory and deletes the root node under which the HBase cluster stores HBase metadata on ZooKeeper and all child nodes contained under the root node while simultaneously deleting the root directory under which the HBase cluster stores HBase data on Hadoop and all subdirectories contained under the root directory; after the deletion, the processor starts all services of the HBase cluster to obtain an HBase in its initial state.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910588013.8A CN110287172B (en) 2019-07-01 2019-07-01 Method for formatting HBase data

Publications (2)

Publication Number Publication Date
CN110287172A (en) 2019-09-27
CN110287172B (en) 2023-05-02

Family

ID=68021634

Country Status (1)

Country Link
CN (1) CN110287172B (en)

Also Published As

Publication number Publication date
CN110287172A (en) 2019-09-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant