CN104270412A - Three-level caching method based on Hadoop distributed file system - Google Patents

Three-level caching method based on Hadoop distributed file system Download PDF

Info

Publication number
CN104270412A
CN104270412A (application CN201410455411.XA)
Authority
CN
China
Prior art keywords
data
data block
memory
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410455411.XA
Other languages
Chinese (zh)
Inventor
孙知信
谢怡
宫婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201410455411.XA
Publication of CN104270412A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a three-level caching method based on the Hadoop distributed file system. The method comprises, first, task scheduling guided by data-local processing; second, locality-based access to data in local memory; and third, reuse of data already resident in local memory. The method improves the data hit rate, reduces the volume of data transmitted over the network, and raises MapReduce execution efficiency.

Description

A three-level caching method based on the Hadoop distributed file system
Technical field
The present invention relates to the field of data storage, and more particularly to a three-level caching method based on the Hadoop distributed file system.
Background technology
Apache Hadoop (usually referred to simply as Hadoop) is an open-source distributed data processing platform. It consists mainly of the Hadoop Distributed File System (HDFS) and the MapReduce computation module.
Apache Hadoop is an open-source software framework, released under the Apache 2.0 license, that supports data-intensive distributed applications. It supports applications running on large clusters built from commodity hardware. Hadoop is an independent implementation of MapReduce and the Google File System, built according to the papers published by Google.
The Hadoop framework transparently provides applications with both reliability and data motion. It implements the programming paradigm named MapReduce, in which an application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, Hadoop provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Together, MapReduce and the distributed file system are designed so that the framework handles node failures automatically, enabling applications to work with thousands of independent computers and petabytes of data. The whole Apache Hadoop "platform" is now commonly considered to consist of the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS) and a number of related projects, among them Apache Hive and Apache HBase.
HDFS meets the storage demands of large-scale data well, but it has many shortcomings when reading data for real-time processing. Executing MapReduce tasks involves a large volume of data reads, which puts heavy pressure on network transmission and I/O (Input/Output) bandwidth. A caching system should therefore be built on top of HDFS to reduce the volume of data transferred and improve the execution efficiency of MapReduce.
MapReduce divides data processing into two stages, Map and Reduce, corresponding to the two processing functions mapper and reducer. In the Map stage, raw data is fed into the mapper for filtering and transformation, and the intermediate results obtained serve as input to the reducer, which produces the final result. Over the whole MapReduce process, reading the raw data from HDFS takes the most time. To improve MapReduce execution efficiency, therefore, one must start with the reading of the raw data: an appropriate caching mechanism raises the data hit rate and shortens the time spent reading raw data in the Map stage.
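For orientation, the sketch below shows a minimal mapper and reducer written against the stock org.apache.hadoop.mapreduce API (a generic word-count style example, not code from the patent); the map() calls are exactly where the raw HDFS reads targeted by this method take place.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map stage: each call to map() processes one record read from HDFS.
class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (token.isEmpty()) continue;
            word.set(token);
            ctx.write(word, ONE); // emit intermediate (key, value) pair
        }
    }
}

// Reduce stage: consumes the intermediate pairs and writes the final result.
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) sum += c.get();
        ctx.write(key, new IntWritable(sum));
    }
}
```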
Memcached and RAMCloud (a memory cloud) are two typical memory-level caching systems. Memcached provides a relatively centralized, non-cooperative caching service, not transparent to the user, in front of a back end that stores mass data on disk. RAMCloud uses distributed shared memory instead of disk to store and manage data, and proposes scattering backups of cached data across many disks so that repairs can be carried out quickly in parallel.
Both, however, are essentially designed for architectures in which data still resides relatively centrally on disk and compute resources are separated from storage resources, so they are difficult to apply directly to a MapReduce platform. Moreover, because they support different application types, neither takes into account the data-local processing that characterizes MapReduce applications.
The temporal-locality-based cache replacement policy LAC (Locality-Aware Cooperative Caching) quantifies the temporal locality of data access by counting the accesses to other data blocks that fall in the interval between two accesses to the same block. Factors such as a block's transmission cost, its size, and its most recent access time are combined into a cache replacement cost model.
These mechanisms all target conventional data-center platform architectures. On a MapReduce platform, however, compute and storage resources are deployed tightly coupled and data is processed locally, so block-level access statistics are disturbed by the compute resource allocation strategy and by real-time load, and have difficulty fully reflecting true data access behavior.
For the large volume of data that must be read during MapReduce task execution, which puts heavy pressure on network transmission and I/O bandwidth, the prior art offers no good solution.
Summary of the invention
To solve the technical problems described above, the present invention adopts the following technical scheme:
A three-level caching method based on the Hadoop distributed file system, implemented with Apache Hadoop, proceeds as follows:
Step 1: task scheduling guided by data-local processing, comprising the sub-steps:
(1) The user submits a job request to the JobTracker; the JobTracker determines the range of data the job will read and decomposes the job into a number of Map tasks and Reduce tasks;
(2) For the data each Map task will read, the JobTracker queries the NameNode's metadata to obtain the locations of the DataNodes storing that data;
(3) Idle TaskTracker nodes periodically report their status to the JobTracker; from these idle TaskTracker nodes the JobTracker selects a DataNode holding the target data and assigns the corresponding Map task to that node;
Step 2: locality-based access to data in local memory, comprising the sub-steps:
(1) The memory space of the server is divided into a number of equal-sized storage regions, each of which is called a page frame;
(2) The page is the basic unit of memory allocation. Bytes reserved at the bottom of each page hold either a pointer to the address of the next page or a marker that the data block ends there, so each data block is represented in memory as a linked list of pages.
(3) A block load table is maintained in memory. When the data blocks in memory have reached the space limit and a new block must be loaded, a Least Recently Used (LRU) replacement algorithm is executed to replace a block. In addition, a bitmap of the memory pages is maintained in memory.
Step 3: reuse of data in local memory
(1) A global cache information management table is maintained on the Master server, recording each Slave node's cache contents and whether the node has enough slot resources. Each Slave server periodically sends a message to the Master server reporting its own cache contents and slot resources.
(2) When the JobTracker schedules tasks, it first checks the global cache information table; if the cache of some node is found to hold the data a Map task requires, that task is assigned to it preferentially. Otherwise, the data-local scheduling strategy of Step 1 is followed.
The page frame size in sub-step (1) of Step 2 is 64 KB.
The number of reserved bytes in sub-step (2) of Step 2 is 4.
The LRU replacement algorithm in sub-step (3) of Step 2 selects for replacement the data block that has gone unvisited for the longest time in the recent past.
The block load table in sub-step (3) of Step 2 records the block sequence number, the block's starting frame number, and the block's access time.
The bitmap of memory pages maintained in sub-step (3) of Step 2 occupies a fixed region set aside in memory, in which each bit represents one page: the bit is set to 1 if the page has been allocated and cleared to 0 otherwise, so that it can be determined whether free pages exist in memory.
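Purely as an illustration (the patent fixes the bitmap's semantics but gives no code), the page bitmap of sub-step (3) could be sketched in Java with java.util.BitSet; the class and method names below are invented:

```java
import java.util.BitSet;

// Hypothetical sketch of the page bitmap from Step 2, sub-step (3):
// one bit per 64 KB page frame; 1 = allocated, 0 = free.
class PageBitmap {
    private final BitSet bits;
    private final int totalFrames;

    PageBitmap(int totalFrames) {
        this.totalFrames = totalFrames;
        this.bits = new BitSet(totalFrames);
    }

    // Find a free frame, mark it allocated, and return its number (-1 if full).
    int allocateFrame() {
        int frame = bits.nextClearBit(0);
        if (frame >= totalFrames) return -1;
        bits.set(frame);   // set to 1: allocated
        return frame;
    }

    void freeFrame(int frame) {
        bits.clear(frame); // set to 0: free
    }

    int allocatedCount() {
        return bits.cardinality(); // total number of allocated pages
    }
}
```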
The three-level caching mechanism based on HDFS proposed by the invention improves the data hit rate, reduces the volume of data transmitted, and raises the execution efficiency of MapReduce.
Brief description of the drawings
Fig. 1 shows the architecture of the three-level cache based on HDFS.
Embodiment
The three-level caching mechanism proposed by the invention runs on HDFS, the distributed file system of the Hadoop platform, and comprises three levels, as shown in Fig. 1: (1) task scheduling guided by data-local processing; (2) locality-based access to data in local memory; (3) reuse of data in local memory.
In general, HDFS comprises one NameNode, deployed on the master server, and a large number of DataNodes, deployed on the slave servers; the NameNode is in charge of the metadata of user files. The metadata consists of three parts: the file system directory tree, the correspondence between files and the data blocks into which they are split, and the locations of the data blocks on the DataNodes. A file stored in HDFS is split into equal-sized data blocks (normally 64 MB), and these blocks are replicated and stored on multiple data storage servers. Each data storage server keeps its blocks on local disk under the Linux file system and performs block reads and writes.
The MapReduce computation model is a standard functional-style programming model. The client user implements operations on files by programming. Each user program can be regarded as a job, and a job is decomposed by the JobTracker (the job scheduler) into a number of Map and Reduce tasks. Map tasks read data from HDFS, and Reduce tasks write the processed data back to HDFS, so the three-level caching system proposed by the invention serves mainly the Map tasks, ensuring that a Map task can find the data it wants in the shortest time.
Step 1: task scheduling guided by data-local processing
Because HDFS has a Master/Slave structure, the NameNode and the JobTracker can be deployed on the Master server, and the DataNodes and TaskTrackers on the Slave servers. If the data a Map task is to process happens to be kept on its own server, the Map task can read the data directly from local disk, cutting the time spent transmitting data over the network. When the JobTracker schedules tasks, it preferentially assigns each Map task to a DataNode that contains the data block the task is to process. To achieve this, the input split size is made equal to the data block size, so the host list in the InputSplit metadata contains only one node and fully data-local processing can be realized.
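In stock Hadoop this one-block-per-split arrangement can be expressed roughly as follows (an illustrative sketch, not code from the patent; the property and method names are from the standard Hadoop API):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class OneBlockPerSplit {
    public static void main(String[] args) throws Exception {
        long blockSize = 64L * 1024 * 1024; // 64 MB HDFS block, as in the patent

        Configuration conf = new Configuration();
        conf.setLong("dfs.blocksize", blockSize); // block size for new files

        Job job = Job.getInstance(conf, "locality-demo");
        // Pin the split size to the block size so each InputSplit covers
        // exactly one block and its host list names a single DataNode.
        FileInputFormat.setMinInputSplitSize(job, blockSize);
        FileInputFormat.setMaxInputSplitSize(job, blockSize);
        FileInputFormat.addInputPath(job, new Path(args[0]));
    }
}
```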
The task scheduling process guided by data-local processing is as follows:
1. The user writes a MapReduce program, which creates a new JobClient instance and submits a job request to the JobTracker. The JobTracker receives the JobClient's request and replies; it then determines the range of data the job will read and decomposes the job into a number of Map and Reduce tasks, each Map task processing one portion of the data, that is, one split.
2. For the data each Map task will read, the JobTracker queries the NameNode's metadata to obtain the locations of the DataNodes storing that data, including the locations of the backup replicas.
3. Idle TaskTracker nodes periodically report their status to the JobTracker; from these idle TaskTracker nodes the JobTracker selects a DataNode holding the target data and assigns the corresponding Map task to that node, as sketched below.
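A minimal sketch of this selection step follows. Every name in it (MapTask, IdleNode, LocalityScheduler) is hypothetical shorthand for the JobTracker's bookkeeping, not a real Hadoop class:

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the JobTracker's bookkeeping.
record MapTask(String taskId, String targetBlockId) {}
record IdleNode(String host) {}

class LocalityScheduler {
    // blockId -> hosts of the DataNodes storing it (from NameNode metadata).
    private final Map<String, List<String>> blockLocations;

    LocalityScheduler(Map<String, List<String>> blockLocations) {
        this.blockLocations = blockLocations;
    }

    // Among the idle TaskTrackers, pick one whose DataNode stores the
    // task's target block; null means no data-local assignment exists.
    IdleNode assign(MapTask task, List<IdleNode> idleNodes) {
        List<String> holders = blockLocations.get(task.targetBlockId());
        if (holders == null) return null;
        for (IdleNode node : idleNodes) {
            if (holders.contains(node.host())) {
                return node; // data-local: the Map task reads from local disk
            }
        }
        return null; // caller falls back to a non-local (network) read
    }
}
```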
Step 2: locality-based access to data in local memory
Clearly, disk read speed falls far short of CPU processing speed, so a memory buffer is arranged to bridge the gap between the two: part of the data on disk is read into memory in advance, and when the CPU encounters a read instruction it fetches the corresponding data directly from memory. This shortens the time a Map task spends reading data from local disk. The in-memory data buffer scheduling proposed by the invention is based on locality of access: although any two Map tasks are mutually independent, Map tasks are all decomposed from user programs, so successive tasks obey the principle of locality of access. Data can therefore be loaded into memory in advance, or data read the previous time can be kept in memory rather than being evicted immediately, so that when the next Map task arrives it can read the data directly.
The locality-based in-memory data buffer scheduling process is as follows:
1. The memory space of the server is divided into a number of equal-sized storage regions, each called a page frame. The page frame size is 64 KB; since each data block in HDFS is 64 MB, one data block needs 1024 page frames. Page frames in memory are numbered serially from 0.
2. The page is the basic unit of memory allocation. Four bytes are reserved at the bottom of each page to hold a pointer to the next page or to mark the end of the data block, so each data block is represented in memory as a linked list of pages.
3. To make it easy to find a required data block in memory, a block load table is maintained in the server's memory, showing which data blocks have been loaded into memory, where in memory they sit, and when they were loaded. The entries recorded in this table are therefore: the block sequence number, the block's starting frame number, and the block's access time. When the data blocks in memory have reached the space limit and a new block must be loaded, the Least Recently Used (LRU) replacement algorithm is executed: the block that has gone unvisited for the longest time in the recent past is chosen for replacement, since by the principle of locality of reference a block that has not been accessed for some time is unlikely to be accessed in the near future either; this is why the block load table must record each block's load-in time. In addition, in preparation for running the LRU algorithm, a bitmap of the memory pages is maintained: a fixed region is set aside in memory in which each bit represents one page, set to 1 if the page is allocated and cleared to 0 otherwise, so that the presence of free pages in memory can be determined. The bitmap thus indicates which pages in memory have been allocated and gives the total number of allocated pages. As noted above, memory is divided into identically sized page frames for ease of management, and a data block loaded into memory must be stored page by page, so the bitmap is what manages the allocation of page frames: a set bit means allocated and a cleared bit means free. Before a data block is loaded, the bitmap is checked to see whether enough page frames are free for it; if not, the LRU algorithm is executed first, an earlier data block is evicted from memory, and the vacated page frames are used to load the new block.
4. Suppose the memory is 8 GB in size and 2 GB is reserved for other purposes, so the space genuinely usable as a memory cache is 6 GB. Each HDFS data block is 64 MB, so at most 96 data blocks can be resident in the memory cache at once, and the block load table therefore never grows very large.
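Pulling sub-steps 1 to 4 together, here is one compact illustrative sketch. It is not the patent's implementation: the names are invented, the PageBitmap class from the earlier sketch is reused, and Java's LinkedHashMap in access order stands in for the access timestamps of the block load table:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One cached HDFS block: its sequence number plus the chain of page
// frames holding it (the in-memory "linked list of pages").
record CachedBlock(long blockSeq, List<Integer> frameChain) {}

class BlockCache {
    static final int FRAMES_PER_BLOCK = 1024; // 64 MB block / 64 KB frame
    static final int MAX_BLOCKS = 96;         // 6 GB cache / 64 MB block

    private final PageBitmap bitmap = new PageBitmap(MAX_BLOCKS * FRAMES_PER_BLOCK);

    // accessOrder=true makes iteration order least- to most-recently used,
    // playing the role of the block load table's access times.
    private final LinkedHashMap<Long, CachedBlock> loadTable =
            new LinkedHashMap<>(MAX_BLOCKS, 0.75f, true);

    // Load a block, evicting the least recently used block if needed.
    CachedBlock load(long blockSeq) {
        CachedBlock hit = loadTable.get(blockSeq);
        if (hit != null) return hit;             // already resident in memory

        while (loadTable.size() >= MAX_BLOCKS) { // LRU eviction
            Map.Entry<Long, CachedBlock> lru = loadTable.entrySet().iterator().next();
            for (int frame : lru.getValue().frameChain()) bitmap.freeFrame(frame);
            loadTable.remove(lru.getKey());
        }

        List<Integer> chain = new ArrayList<>(FRAMES_PER_BLOCK);
        for (int i = 0; i < FRAMES_PER_BLOCK; i++) {
            chain.add(bitmap.allocateFrame());   // claim 1024 free frames
        }
        CachedBlock block = new CachedBlock(blockSeq, chain);
        loadTable.put(blockSeq, block);          // record in the load table
        return block;
    }
}
```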
Step 3: reuse of data in local memory
After a Map task finishes executing, the data it read can be retained in memory. So that the next Map task assigned to this DataNode avoids the needless time spent on memory replacement, the JobTracker should, in the next round of task assignment, give priority to how well the data in a node's memory matches the data a Map task needs to process. The invention therefore proposes task scheduling driven by the reuse of in-memory data.
The task scheduling process driven by in-memory data reuse is as follows:
1. A global cache information management table is maintained on the Master server, recording each Slave node's cache contents and whether the node has enough slot resources. Each Slave server periodically sends a message to the Master server reporting its own cache contents and slot resources. A TaskTracker uses slots to represent the share of resources apportioned on its node; a slot stands for computational resources (CPU, memory and so on), and a Map task only gets the chance to run after it has obtained a slot.
2. When the JobTracker schedules tasks, it first checks the global cache information table; if the cache of some node is found to hold the data a Map task requires, that task is assigned to it preferentially, as sketched below. Otherwise, the data-local scheduling strategy is followed.
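An illustrative sketch of this cache-first decision, under the same caveats as before (invented names, heartbeat plumbing omitted):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// One row of the global cache information table on the Master:
// which blocks a Slave has cached and whether it has free slots.
record SlaveStatus(Set<Long> cachedBlockSeqs, int freeSlots) {}

class GlobalCacheTable {
    private final Map<String, SlaveStatus> table = new ConcurrentHashMap<>();

    // Called when a Slave's periodic report arrives at the Master.
    void report(String host, SlaveStatus status) {
        table.put(host, status);
    }

    // Cache-first scheduling: prefer a node that already holds the block
    // in memory and has a free slot; null means fall back to the
    // data-local strategy of Step 1.
    String pickNodeFor(long blockSeq) {
        for (Map.Entry<String, SlaveStatus> e : table.entrySet()) {
            SlaveStatus s = e.getValue();
            if (s.freeSlots() > 0 && s.cachedBlockSeqs().contains(blockSeq)) {
                return e.getKey();
            }
        }
        return null;
    }
}
```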
Thus, under the three-level caching mechanism based on HDFS proposed by the invention, the JobTracker follows this process for task scheduling and data loading: when assigning Map tasks, it first checks the global cache information table and preferentially assigns a task to a node that has already loaded the target data block into memory; it then follows the data-local strategy and assigns the remaining Map tasks to nodes that store the target blocks. When a Map task executes, the data block it needs to read is loaded into memory, and memory replacement follows the locality-based in-memory data buffer scheduling.
The three-level caching mechanism of the invention, based on HDFS, thus comprises task scheduling guided by data-local processing, in-memory data buffer scheduling based on locality of access, and task scheduling driven by in-memory data reuse. In the locality-based buffer scheduling, data blocks are loaded into memory and stored page by page, each block being represented in memory as a linked list of pages, which makes memory space easy to reuse; memory replacement is carried out through the block load table.

Claims (6)

1. A three-level caching method based on the Hadoop distributed file system, implemented with Apache Hadoop, the method being as follows:
Step 1: task scheduling guided by data-local processing, comprising the sub-steps:
(1) the user submits a job request to the JobTracker, and the JobTracker determines the range of data the job will read and decomposes the job into a number of Map tasks and Reduce tasks;
(2) for the data each Map task will read, the JobTracker queries the NameNode's metadata to obtain the locations of the DataNodes storing that data;
(3) idle TaskTracker nodes periodically report their status to the JobTracker, and from these idle TaskTracker nodes the JobTracker selects a DataNode holding the target data and assigns the corresponding Map task to that node;
Step 2: locality-based access to data in local memory, comprising the sub-steps:
(1) the memory space of the server is divided into a number of equal-sized storage regions, each of which is called a page frame;
(2) the page is the basic unit of memory allocation, bytes reserved at the bottom of each page hold a pointer to the address of the next page or mark the end of the data block, and each data block is represented in memory as a linked list of pages;
(3) a block load table is maintained in memory, and when the data blocks in memory have reached the space limit and a new block must be loaded, a Least Recently Used (LRU) replacement algorithm is executed to replace a block; in addition, a bitmap of the memory pages is maintained in memory;
Step 3: reuse of data in local memory, comprising the sub-steps:
(1) a global cache information management table is maintained on the Master server, recording each Slave node's cache contents and whether the node has enough slot resources, and each Slave server periodically sends a message to the Master server reporting its own cache contents and slot resources;
(2) when the JobTracker schedules tasks, it first checks the global cache information table, and if the cache of some node is found to hold the data a Map task requires, that task is assigned to it preferentially; otherwise, the data-local scheduling strategy is followed.
2. The three-level caching method based on the Hadoop distributed file system according to claim 1, wherein the page frame size in sub-step (1) of Step 2 is 64 KB.
3. The three-level caching method based on the Hadoop distributed file system according to claim 1, wherein the number of reserved bytes in sub-step (2) of Step 2 is 4.
4. The three-level caching method based on the Hadoop distributed file system according to claim 1, wherein the LRU replacement algorithm in sub-step (3) of Step 2 selects for replacement the data block that has gone unvisited for the longest time in the recent past.
5. The three-level caching method based on the Hadoop distributed file system according to claim 1, wherein the block load table in sub-step (3) of Step 2 records the block sequence number, the block's starting frame number, and the block's access time.
6. The three-level caching method based on the Hadoop distributed file system according to claim 1, wherein the bitmap of memory pages in sub-step (3) of Step 2 is maintained by setting aside a fixed region in memory in which each bit represents one page, the bit being set to 1 if the page has been allocated and cleared to 0 otherwise, so that it can be determined whether free pages exist in memory.
CN201410455411.XA 2014-06-24 2014-09-09 Three-level caching method based on Hadoop distributed file system Pending CN104270412A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410455411.XA CN104270412A (en) 2014-06-24 2014-09-09 Three-level caching method based on Hadoop distributed file system

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201410287652.8 2014-06-24
CN201410287652 2014-06-24
CN201410455411.XA CN104270412A (en) 2014-06-24 2014-09-09 Three-level caching method based on Hadoop distributed file system

Publications (1)

Publication Number Publication Date
CN104270412A true CN104270412A (en) 2015-01-07

Family

ID=52161901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410455411.XA Pending CN104270412A (en) 2014-06-24 2014-09-09 Three-level caching method based on Hadoop distributed file system

Country Status (1)

Country Link
CN (1) CN104270412A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663117A * 2012-04-18 2012-09-12 中国人民大学 OLAP (On-Line Analytical Processing) query processing method for a hybrid database and Hadoop platform
CN103414761A * 2013-07-23 2013-11-27 北京工业大学 Mobile terminal cloud resource scheduling method based on the Hadoop framework
CN103530387A * 2013-10-22 2014-01-22 浪潮电子信息产业股份有限公司 Improved method for small files in HDFS
CN103617087A * 2013-11-25 2014-03-05 华中科技大学 MapReduce optimization method suitable for iterative computation
CN103761146A * 2014-01-06 2014-04-30 浪潮电子信息产业股份有限公司 Method for dynamically setting the number of slots for MapReduce

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808160B * 2016-02-24 2019-02-05 鄞州浙江清华长三角研究院创新中心 mpCache hybrid storage system based on SSD
CN105808160A * 2016-02-24 2016-07-27 鄞州浙江清华长三角研究院创新中心 mpCache hybrid storage system based on SSD (Solid State Disk)
CN106843073A * 2017-03-23 2017-06-13 云南工商学院 Device for electrical automation control under cloud control
CN106843073B * 2017-03-23 2019-07-30 云南工商学院 Device for electrical automation control under cloud control
CN107229673A * 2017-04-20 2017-10-03 努比亚技术有限公司 Data writing method for the Hbase database, Hbase terminal and storage medium
CN106961670B * 2017-05-02 2019-03-12 千寻位置网络有限公司 Geo-fencing system and working method based on distributed architecture
CN106961670A * 2017-05-02 2017-07-18 千寻位置网络有限公司 Geo-fencing system and working method based on distributed architecture
CN107368608A * 2017-08-07 2017-11-21 杭州电子科技大学 HDFS small-file cache management method based on the ARC replacement algorithm
CN107480071A * 2017-08-25 2017-12-15 深圳大学 Cached data migration method and device
CN107562926B * 2017-09-14 2023-09-26 丙申南京网络技术有限公司 Multi-hadoop distributed file system for big data analysis
CN108984617A * 2018-06-13 2018-12-11 西安交通大学 Metadata directory structure implementation method oriented to the memory cloud
CN112070062A * 2020-09-23 2020-12-11 南京工业职业技术大学 Hadoop-based crop waterlogging image classification detection and implementation method
CN113641648A * 2021-08-18 2021-11-12 山东省计算中心(国家超级计算济南中心) Distributed cloud security storage method, system and storage medium
CN113641648B * 2021-08-18 2023-04-21 山东省计算中心(国家超级计算济南中心) Distributed cloud secure storage method, system and storage medium

Similar Documents

Publication Publication Date Title
CN104270412A (en) Three-level caching method based on Hadoop distributed file system
CN107169083B (en) Mass vehicle data storage and retrieval method and device for public security card port and electronic equipment
CN105144121B Caching content-addressable data chunks for storage virtualization
CN103812939B (en) Big data storage system
CN108885582A (en) Multi-tenant memory services for memory pool architecture
US11561930B2 (en) Independent evictions from datastore accelerator fleet nodes
US20140358977A1 (en) Management of Intermediate Data Spills during the Shuffle Phase of a Map-Reduce Job
WO2019085769A1 (en) Tiered data storage and tiered query method and apparatus
EP3443471B1 (en) Systems and methods for managing databases
US9305112B2 (en) Select pages implementing leaf nodes and internal nodes of a data set index for reuse
CN106716409A (en) Method and system for adaptively building and updating column store database from row store database based on query demands
CN105701219B Implementation method of distributed caching
CN105138679B (en) A kind of data processing system and processing method based on distributed caching
CN104679898A (en) Big data access method
US11080207B2 (en) Caching framework for big-data engines in the cloud
CN103455577A (en) Multi-backup nearby storage and reading method and system of cloud host mirror image file
CN103366016A (en) Electronic file concentrated storing and optimizing method based on HDFS
CN111737168B (en) Cache system, cache processing method, device, equipment and medium
US10191663B1 (en) Using data store accelerator intermediary nodes and write control settings to identify write propagation nodes
CN109697016A (en) Method and apparatus for improving the storage performance of container
CN103491155A (en) Cloud computing method and system for achieving mobile computing and obtaining mobile data
CN103795801A (en) Metadata group design method based on real-time application group
CN103595799A (en) Method for achieving distributed shared data bank
CN104572505A (en) System and method for ensuring eventual consistency of mass data caches
CN106302659A Fast storage method based on a cloud storage system for improving data access

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150107