CN103646114B - Characteristic extracting method and device in hard disk SMART data - Google Patents

Characteristic extracting method and device in hard disk SMART data

Info

Publication number
CN103646114B
Authority
CN
China
Prior art keywords
data
attribute
smart
hard disk
smart data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310733574.5A
Other languages
Chinese (zh)
Other versions
CN103646114A (en)
Inventor
胡光
胡殿明
杨文君
魏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310733574.5A priority Critical patent/CN103646114B/en
Publication of CN103646114A publication Critical patent/CN103646114A/en
Application granted granted Critical
Publication of CN103646114B publication Critical patent/CN103646114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a method and device for extracting feature data from hard disk SMART (Self-Monitoring, Analysis and Reporting Technology) data. The method comprises the following steps: obtaining a SMART data set of sample hard disks, where the SMART data set includes Q SMART data items and Q hard disk type information items, one for each SMART data item; normalizing the Q SMART data items to generate Q normalized SMART data items; correcting the Q normalized SMART data items according to the corresponding Q hard disk type information items to generate a corrected SMART data set; and generating hard disk feature data from the corrected SMART data set. With the method of the embodiments, a single failure early-warning model can be used to test and analyze failure early warning for different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.

Description

Characteristic extracting method and device in hard disk SMART data
Technical field
The present invention relates to the field of storage technology, and in particular to a method and device for extracting feature data from hard disk SMART (Self-Monitoring, Analysis and Reporting Technology) data.
Background
Because hard disk failures are reflected in hard disk SMART (Self-Monitoring, Analysis and Reporting Technology) data, failure early-warning analysis can use a hard disk's SMART data to assess whether the disk is likely to fail within a coming period. At present, a machine learning algorithm can train a failure early-warning model on a particular attribute in the SMART data, and the model is then used to analyze a hard disk's SMART data to predict whether the disk can work stably over a coming period.
However, the feature values of different attributes in SMART data are represented inconsistently and are overly discrete, so the joint effect of several attributes on a hard disk is difficult to predict. In addition, some attributes have missing feature values at training time, which makes the SMART data harder to analyze and the model's predictions inaccurate. Furthermore, different vendors compute the feature values of hard disk data in different ways, which hinders a unified numerical representation. A separate failure early-warning model therefore has to be trained on the SMART data of each vendor's hard disks, and this repeated model training significantly increases the analysis cost.
Summary of the invention
The present invention aims to solve the above technical problems at least to some extent.
To this end, a first object of the present invention is to propose a method for extracting feature data from hard disk SMART data. With this method, failure early-warning testing and analysis for different hard disks can be performed with a single failure early-warning model rather than multiple models, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
A second object of the present invention is to propose a device for extracting feature data from hard disk SMART data.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a method for extracting feature data from hard disk SMART data, comprising the following steps: obtaining a SMART data set of sample hard disks, where the SMART data set includes Q SMART data items and Q hard disk type information items, one for each SMART data item; normalizing the Q SMART data items to generate Q normalized SMART data items; correcting the Q normalized SMART data items according to the corresponding Q hard disk type information items to generate a corrected SMART data set; and generating hard disk feature data from the corrected SMART data set.
The method of the embodiments normalizes the SMART data of the sample hard disks and corrects the normalized data according to the hard disk type information. The SMART data thereby share the same value range, and the correction partitions the normalized data by hard disk. As a result, a single failure early-warning model can be used to test and analyze failure early warning for different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
To achieve the above objects, an embodiment of the second aspect of the present invention provides a device for extracting feature data from hard disk SMART data, including: a first acquisition module for obtaining a SMART data set of sample hard disks, where the SMART data set includes Q SMART data items and Q hard disk type information items, one for each SMART data item; a first generation module for normalizing the Q SMART data items to generate Q normalized SMART data items; a correction module for correcting the Q normalized SMART data items according to the corresponding Q hard disk type information items to generate a corrected SMART data set; and a second generation module for generating hard disk feature data from the corrected SMART data set.
The device of the embodiments normalizes the SMART data of the sample hard disks and corrects the normalized data according to the hard disk type information. The SMART data thereby share the same value range, and the correction partitions the normalized data by hard disk. As a result, a single failure early-warning model can be used to test and analyze failure early warning for different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow chart of a method for extracting feature data from hard disk SMART data according to one embodiment of the present invention;
Fig. 2 is a flow chart of a method for extracting feature data from hard disk SMART data according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of the normalization result for gradient data in hard disk SMART data according to a specific embodiment of the present invention;
Fig. 4 is a schematic diagram of the normalization result for attribute data in hard disk SMART data according to a specific embodiment of the present invention;
Fig. 5 is a structural diagram of a device for extracting feature data from hard disk SMART data according to one embodiment of the present invention;
Fig. 6 is a structural diagram of a device for extracting feature data from hard disk SMART data according to another embodiment of the present invention; and
Fig. 7 is a structural diagram of a device for extracting feature data from hard disk SMART data according to yet another embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.
In the description of the present invention, it should be understood that orientation or position terms such as "center", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positions shown in the drawings. They are used only to simplify the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, so they are not to be construed as limiting the present invention. In addition, the terms "first" and "second" are used for description only and do not indicate or imply relative importance.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "coupled" and "connected" are to be interpreted broadly. A connection may, for example, be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.
At present, a machine learning algorithm can train a failure early-warning model on a particular attribute in SMART data in order to predict whether a hard disk can work stably over a coming period. However, existing failure early-warning models cannot capture the joint effect of several SMART attributes on a hard disk, and the SMART data of several hard disk models cannot be fed into a single failure early-warning model for training. Hard disk failure early warning is therefore inaccurate, and each hard disk model needs its own failure early-warning model during analysis, which makes the analysis costly. If the attribute values in the SMART data of different hard disk models are processed so that every attribute value has a unified representation, the data of several hard disk models can be fed into one failure early-warning model for training, which reduces the number of training runs and the analysis cost. To this end, the present invention proposes a method for extracting feature data from hard disk SMART data.
Fig. 1 is a flow chart of a method for extracting feature data from hard disk SMART data according to one embodiment of the present invention.
As shown in Fig. 1, the method for extracting feature data from hard disk SMART data comprises the following steps.
S11: Obtain the SMART data set of the sample hard disks.
In one embodiment of the present invention, the SMART data set includes Q SMART data items and Q hard disk type information items, one for each SMART data item. The SMART data set consists of the disk-related attribute data recorded in the SMART of Q hard disks of the same and/or different types, such as the seek error rate and the disk temperature, together with the corresponding hard disk type information. Hard disk type information refers to disk-related data provided by the hard disk vendor, such as the hard disk model and the hard disk ID (Identity). For example, when machine learning is performed, the SMART data set includes attribute data such as the seek error rate, power-on count and temperature of multiple different hard disks, together with information such as the corresponding hard disk model and hard disk ID.
S12: Normalize the Q SMART data items to generate Q normalized SMART data items.
In one embodiment of the present invention, each attribute data item in the SMART data of the Q hard disks of the same and/or different types can be normalized separately, so that attribute data with different value ranges are mapped into the same value range. This allows the SMART data of different types of hard disk to be analyzed and processed in a unified way.
S13: Correct the Q normalized SMART data items according to the corresponding Q hard disk type information items to generate a corrected SMART data set.
So that the test results of the hard disk failure early-warning model can be separated by hard disk, in one embodiment of the present invention a data offset can be set for each hard disk according to its type information, and the Q normalized SMART data items can be corrected with the offset of the corresponding hard disk, which partitions the SMART data set.
S14: Generate hard disk feature data from the corrected SMART data set.
The method of the embodiments normalizes the SMART data of the sample hard disks and corrects the normalized data according to the hard disk type information. The SMART data thereby share the same value range, and the correction partitions the normalized data by hard disk. As a result, a single failure early-warning model can be used to test and analyze failure early warning for different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
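Steps S11 to S14 compose into a simple pipeline. The following Python sketch shows only that composition; the function name `extract_features` and the three callables passed to it are hypothetical placeholders chosen for illustration, not names taken from the patent.

```python
def extract_features(samples, normalize, correct, to_features):
    """Run steps S12 to S14 over the samples obtained in S11.

    samples: list of (disk_type, attributes) pairs, one per SMART data item.
    normalize, correct, to_features: caller-supplied callables (assumed).
    """
    # S12: bring every SMART data item into a common value range.
    normalized = [(disk_type, normalize(attrs)) for disk_type, attrs in samples]
    # S13: correct each normalized item using its hard disk type information.
    corrected = [correct(disk_type, attrs) for disk_type, attrs in normalized]
    # S14: derive the hard disk feature data from the corrected set.
    return to_features(corrected)
```

Passing the steps in as callables keeps the sketch independent of any particular normalization or correction rule.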
To further improve the performance of the trained failure early-warning model, the gradient value corresponding to each attribute data item in each SMART data item can also be obtained by the least squares method before the Q SMART data items are normalized. Specifically, Fig. 2 is a flow chart of a method for extracting feature data from hard disk SMART data according to another embodiment of the present invention.
As shown in Fig. 2, the method for extracting feature data from hard disk SMART data comprises the following steps.
S21: Obtain the SMART data set of the sample hard disks.
In one embodiment of the present invention, the SMART data set includes Q SMART data items and Q hard disk type information items, one for each SMART data item. The SMART data set consists of the disk-related attribute data recorded in the SMART of Q hard disks of the same and/or different types, such as the seek error rate and the disk temperature, together with the corresponding hard disk type information. Hard disk type information refers to disk-related data provided by the hard disk vendor, such as the hard disk model and the hard disk ID (Identity). For example, when machine learning is performed, the SMART data set includes attribute data such as the seek error rate, power-on count and temperature of multiple different hard disks, together with information such as the corresponding hard disk model and hard disk ID.
S22: For each SMART data item, obtain the S gradient data sets corresponding to its S first attribute data subsets.
In an embodiment of the present invention, the SMART data set also includes S attribute data sets, one for each of S attributes, and each SMART data item includes S first attribute data subsets, one for each of the S attributes. A first attribute data subset is the set of data corresponding to one attribute in SMART. For example, a first attribute data subset can be the set of data corresponding to the seek error rate attribute in SMART.
In an embodiment of the present invention, M consecutive attribute data items are first selected in turn from the first attribute data subset corresponding to the s-th attribute in each SMART data item, generating P second attribute data subsets. Each second attribute data subset includes M attribute data items, P = N - M + 1, N is the total number of attribute data items in the first attribute data subset corresponding to attribute s, and s = 1 … S.
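The window selection just described can be sketched in a few lines of Python. This is a minimal illustration; the function name `sliding_windows` is an assumption introduced here, and the input is taken to be a plain list holding the N attribute data items of one attribute.

```python
def sliding_windows(values, m):
    """Return the P = N - M + 1 consecutive windows of length m."""
    n = len(values)
    if not 1 <= m <= n:
        raise ValueError("window length m must satisfy 1 <= m <= len(values)")
    # Window i holds items i .. i + m - 1, so its last item is values[i + m - 1].
    return [values[i:i + m] for i in range(n - m + 1)]
```

With N = 5 readings and M = 3, the function yields P = 3 windows, each overlapping its neighbor by two readings.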
Then, the P gradient data items corresponding to the P second attribute data subsets are calculated, and the gradient data set corresponding to attribute s is generated from the P gradient data items. Specifically, once the P second attribute data subsets have been obtained, the P fitting coefficients corresponding to them can be obtained by the weighted least squares method. The i-th of the P fitting coefficients is $k_i = (Z - b \cdot Y) / X$, where i = 1 … P, and X, Y, Z and b can be obtained by the following formulas:
Here, $w_j$ is the preset weight of the j-th attribute data item in the first attribute data subset corresponding to attribute s, $x_j$ is the detection time of that j-th attribute data item, and $y_j$ is the j-th attribute data item itself.
After the i-th fitting coefficient $k_i$ has been obtained, the i-th of the P gradient data items, $Grad_i$, can be obtained by the following formula:
$Grad_i = k_i \cdot (M - 1) \cdot y_{M+i-1}$
Here, $k_i \cdot (M - 1)$ is the drop between the two end attribute data points along the straight line fitted by the weighted least squares method, and the sign of the drop gives the trend of the whole line. Multiplying the drop by the y value of the last attribute data item gives the final gradient value, whose magnitude expresses both the overall trend and the intensity of the change.
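As a rough illustration of the gradient computation for one window, the Python sketch below fits a line y = k·x + b by the textbook weighted least squares solution and then applies $Grad_i = k_i \cdot (M - 1) \cdot y_{M+i-1}$. The closed-form expressions for k are the standard ones and are an assumption here, since the text above does not reproduce its own formulas for X, Y, Z and b; the function name `window_gradient` is likewise hypothetical.

```python
def window_gradient(xs, ys, ws):
    """Gradient value of one window of M samples.

    xs: detection times, ys: attribute values, ws: preset weights.
    Uses the standard weighted least squares slope (an assumption).
    """
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    den = sw * sxx - sx * sx
    if den == 0:
        raise ValueError("detection times must not all be identical")
    k = (sw * sxy - sx * sy) / den  # slope of the fitted line
    m = len(ys)
    # Drop across the window, scaled by the last attribute value.
    return k * (m - 1) * ys[-1]
```

For evenly spaced detection times one unit apart, `k * (m - 1)` is exactly the change of the fitted line from the first to the last sample of the window.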
After the P gradient data items have been obtained, the gradient data set corresponding to attribute s can be generated from them.
It should be understood that these steps finally yield, for each SMART data item, the S gradient data sets corresponding to its S first attribute data subsets.
S23: Add the S gradient data sets obtained for each SMART data item to that SMART data item as S new first attribute data subsets.
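Step S23 amounts to extending each SMART data item with one derived series per attribute. The Python sketch below assumes a SMART data item is represented as a dictionary from attribute name to a list of values; that representation, the `_grad` suffix and the function name `add_gradient_attributes` are all illustrative assumptions.

```python
def add_gradient_attributes(record, gradient_of, suffix="_grad"):
    """Return a copy of record with one gradient series added per attribute.

    record: dict mapping attribute name -> list of attribute values.
    gradient_of: callable turning a value series into its gradient series.
    """
    extended = dict(record)  # leave the caller's record untouched
    for name, series in record.items():
        extended[name + suffix] = gradient_of(series)
    return extended
```

After this step a record with S attributes carries 2S series, and the later normalization treats the gradient series like any other attribute.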
S24: Normalize the Q SMART data items to generate Q normalized SMART data items.
In an embodiment of the present invention, the Q SMART data items can be normalized by the following formula:
$g(x) = \operatorname{sign}(x) \times \log_y |x|$,
where x is an attribute data item in the Q SMART data items and g(x) is the normalized attribute data corresponding to x. The base y can be calculated from the following condition:
$y^z \le Value < (y + \Delta y)^z$,
where z is a preset threshold, Value is the factory default of the attribute corresponding to attribute data x, and Δy is a preset precision.
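The two formulas can be sketched in Python as follows. The base search walks upward from 1 in steps of Δy until the condition $y^z \le Value < (y + \Delta y)^z$ holds; the starting point, the treatment of x = 0 and the function names `find_base` and `normalize` are assumptions made for this illustration.

```python
import math

def find_base(value, z, dy=0.001):
    """Find y on a grid of step dy with y**z <= value < (y + dy)**z."""
    if value < 1 or z <= 0 or dy <= 0:
        raise ValueError("expects value >= 1, z > 0 and dy > 0")
    steps = 0
    # Advance while the next grid point still satisfies y**z <= value.
    while (1.0 + (steps + 1) * dy) ** z <= value:
        steps += 1
    return 1.0 + steps * dy

def normalize(x, base):
    """g(x) = sign(x) * log_base(|x|); g(0) is taken as 0 (assumption)."""
    if base <= 1:
        raise ValueError("base must be greater than 1")
    if x == 0:
        return 0.0
    sign = 1.0 if x > 0 else -1.0
    return sign * math.log(abs(x), base)
```

Because the base is derived from the attribute's factory default, an attribute sitting at its default maps to roughly z, which is what puts attributes with different raw ranges on a comparable scale.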
For example, when the interval (1900, 2000) corresponding to the largest gradient values accounts for more than 70% of the total, the calculation gives Grad = 1.078 and y = 1.071, which yields the normalized gradient plot shown in Fig. 3 and the normalized attribute data plot shown in Fig. 4.
S25: Correct the Q normalized SMART data items according to the corresponding Q hard disk type information items to generate a corrected SMART data set.
So that the test results of the hard disk failure early-warning model can be separated by hard disk, in an embodiment of the present invention Q correction values, one for each of the Q hard disk type information items, can be obtained from the hard disk type information, and each normalized SMART data item can be corrected with the correction value of its hard disk type information. For example, a data offset can be set for each hard disk according to its type information, and the Q normalized SMART data items can be corrected with the offset of the corresponding hard disk, which partitions the SMART data set.
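One simple way to picture the offset-based correction is to give every hard disk type its own shift, so that the normalized values of different types fall into disjoint bands. The Python sketch below does this with evenly spaced offsets; the spacing rule and the function names `build_offsets` and `apply_offset` are assumptions for illustration, since the text only states that an offset is set per hard disk from its type information.

```python
def build_offsets(disk_types, spacing):
    """Assign each distinct disk type an offset 0, spacing, 2 * spacing, ..."""
    return {t: i * spacing for i, t in enumerate(sorted(set(disk_types)))}

def apply_offset(normalized, disk_type, offsets):
    """Shift every normalized attribute value by the offset of its disk type."""
    shift = offsets[disk_type]
    return {name: value + shift for name, value in normalized.items()}
```

If `spacing` is chosen larger than the normalized value range, the band a value lands in identifies its disk type, which is one way a single model's results could be separated per hard disk.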
S26: Generate hard disk feature data from the corrected SMART data set.
In an embodiment of the present invention, each corrected SMART data set includes S corrected attribute data sets, one for each of the S attributes. Specifically, the S training feature data sets corresponding to the S attributes can be obtained, and the training feature values in each of them can be sorted to generate S feature sequences ($V_i$) corresponding to the S attributes. The feature value f(v) of each attribute value v in the corrected attribute data set of each attribute can then be obtained by the following preset mapping rule:
After the feature value of each attribute value has been obtained, the hard disk feature data can be generated from the feature values of the attribute values in the corrected attribute data set of each attribute. In this way, the mapping rule ensures that every training data item in the corrected SMART data set has a feature value that can be fed into the failure early-warning model for training, which avoids missing feature values during training and improves the accuracy of the failure prediction model.
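The preset mapping rule itself is not reproduced in the text above, so the Python sketch below uses a stand-in rule purely to illustrate the idea of mapping any attribute value onto a sorted feature sequence: the feature value is the index of the largest training value not exceeding v, clamped to the sequence. This rule and the names `build_sequence` and `feature_value` are hypothetical and should not be read as the patent's mapping.

```python
from bisect import bisect_right

def build_sequence(training_values):
    """Sorted, de-duplicated feature sequence for one attribute."""
    return sorted(set(training_values))

def feature_value(v, sequence):
    """Stand-in mapping: index of the largest sequence entry <= v.

    Values below the smallest entry map to index 0, so every attribute
    value receives a feature value, including values unseen in training.
    """
    if not sequence:
        raise ValueError("feature sequence must not be empty")
    return max(bisect_right(sequence, v) - 1, 0)
```

Whatever the exact rule, the property being illustrated is that the mapping is total: no attribute value is left without a feature value.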
In the method of the embodiments, before the SMART data items are normalized, the gradient data set corresponding to the attribute data in each SMART data item is obtained by the least squares method and is used to update the corresponding first attribute data set. This highlights the variation trend of the SMART data and, combined with a machine learning algorithm, improves the performance of the trained failure early-warning model.
To implement the above embodiments, the present invention also proposes a device for extracting feature data from hard disk SMART data.
A device for extracting feature data from hard disk SMART data includes: a first acquisition module for obtaining a SMART data set of sample hard disks, where the SMART data set includes Q SMART data items and Q hard disk type information items, one for each SMART data item; a first generation module for normalizing the Q SMART data items to generate Q normalized SMART data items; a correction module for correcting the Q normalized SMART data items according to the corresponding Q hard disk type information items to generate a corrected SMART data set; and a second generation module for generating hard disk feature data from the corrected SMART data set.
Fig. 5 is a structural diagram of a device for extracting feature data from hard disk SMART data according to one embodiment of the present invention.
As shown in Fig. 5, the device for extracting feature data from hard disk SMART data includes a first acquisition module 100, a first generation module 200, a correction module 300 and a second generation module 400.
Specifically, the first acquisition module 100 is used to obtain the SMART data set of the sample hard disks. The SMART data set includes Q SMART data items and Q hard disk type information items, one for each SMART data item. In other words, the SMART data set consists of the disk-related attribute data recorded in the SMART of Q hard disks of the same and/or different types, such as the seek error rate and the disk temperature, together with the corresponding hard disk type information. Hard disk type information refers to disk-related data provided by the hard disk vendor, such as the hard disk model and the hard disk ID (Identity). For example, when machine learning is performed, the first acquisition module 100 can obtain, from the SMART data set, attribute data such as the seek error rate, power-on count and temperature of multiple different hard disks, together with information such as the corresponding hard disk model and hard disk ID.
The first generation module 200 is used to normalize the Q SMART data items to generate Q normalized SMART data items. Specifically, in one embodiment of the present invention, the first generation module 200 can normalize each attribute data item in the SMART data of the Q hard disks of the same and/or different types separately, so that attribute data with different value ranges are mapped into the same value range. This allows the SMART data of different types of hard disk to be analyzed and processed in a unified way.
The correction module 300 is used to correct the Q normalized SMART data items according to the corresponding Q hard disk type information items to generate a corrected SMART data set. Specifically, in one embodiment of the present invention, the correction module 300 can set a data offset for each hard disk according to its type information and correct the Q normalized SMART data items with the offset of the corresponding hard disk, which partitions the SMART data set. The test results of the hard disk failure early-warning model can thereby be separated by hard disk.
The second generation module 400 is used to generate hard disk feature data from the corrected SMART data set.
The device of the embodiments normalizes the SMART data of the sample hard disks and corrects the normalized data according to the hard disk type information. The SMART data thereby share the same value range, and the correction partitions the normalized data by hard disk. As a result, a single failure early-warning model can be used to test and analyze failure early warning for different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
Fig. 6 is a structural diagram of a device for extracting feature data from hard disk SMART data according to another embodiment of the present invention.
As shown in Fig. 6, the device for extracting feature data from hard disk SMART data includes a first acquisition module 100, a first generation module 200, a correction module 300, a second generation module 400, a second acquisition module 500 and an adding module 600.
In an embodiment of the present invention, the SMART data set includes S attribute data sets, one for each of S attributes, and each SMART data item includes S first attribute data subsets, one for each of the S attributes. A first attribute data subset is the set of data corresponding to one attribute in SMART. For example, a first attribute data subset can be the set of data corresponding to the seek error rate attribute in SMART.
Specifically, the second acquisition module 500 is used to obtain, for each SMART data item, the S gradient data sets corresponding to its S first attribute data subsets. The adding module 600 is used to add the S gradient data sets obtained for each SMART data item to that SMART data item as S new first attribute data subsets.
Fig. 7 is a structural diagram of a device for extracting feature data from hard disk SMART data according to yet another embodiment of the present invention.
As shown in Fig. 7, the device for extracting feature data from hard disk SMART data includes a first acquisition module 100, a first generation module 200, a correction module 300, a second generation module 400, a second acquisition module 500 and an adding module 600. The second acquisition module 500 includes a first generation unit 510 and a second generation unit 520; the first generation module 200 includes a processing unit 210; the correction module 300 includes a first acquisition unit 310 and a correction unit 320; and the second generation module 400 includes a second acquisition unit 410, a sorting unit 420 and a third acquisition unit 430. The second generation unit 520 includes a first acquisition subunit 521 and a second acquisition subunit 522.
Specifically, the first generation unit 510 is configured to successively select M attribute data from the first attribute data subset corresponding to the s-th attribute in each SMART data item, so as to generate P second attribute data subsets, where each second attribute data subset includes M attribute data, P = N - M + 1, N is the total number of attribute data in the first attribute data subset corresponding to attribute s, and s = 1, ..., S.
The second generation unit 520 is configured to calculate P gradient data corresponding respectively to the P second attribute data subsets, and to generate the gradient data set corresponding to attribute s from the P gradient data.
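The window construction performed by the first generation unit can be sketched as follows. This is a minimal illustration, assuming that "successively selecting M attribute data" means a window of length M that slides forward one sample at a time, which is the reading consistent with P = N - M + 1. The function name `sliding_windows` and the sample values are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def sliding_windows(values: Sequence[float], m: int) -> List[List[float]]:
    """Return the P = N - M + 1 consecutive windows of length M.

    `values` plays the role of one first attribute data subset (N samples);
    each returned window plays the role of one second attribute data subset.
    """
    n = len(values)
    if m < 1 or m > n:
        raise ValueError("window length M must satisfy 1 <= M <= N")
    return [list(values[i:i + m]) for i in range(n - m + 1)]


# Hypothetical readings of a single SMART attribute (N = 5) with M = 3.
windows = sliding_windows([5.0, 5.0, 6.0, 8.0, 11.0], m=3)
print(len(windows), windows)
```

Each window then yields one gradient value, so an attribute with N samples contributes a gradient data set of P values.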
The processing unit 210 is configured to normalize the Q SMART data by the following formula:
$$g(x) = \operatorname{sign}(x) \times \log_y |x|,$$
where x is one attribute datum in the Q SMART data, g(x) is the normalized attribute datum corresponding to x, and y can be calculated from the following formula:
$$y^z \le \mathit{Value} < (y + \Delta y)^z,$$
where z is a preset threshold, Value is the factory default value of the attribute corresponding to attribute datum x, and $\Delta y$ is a preset precision.
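A minimal sketch of this normalization is given below, under stated assumptions. It reads the bound on y as $y^z \le \mathit{Value} < (y + \Delta y)^z$ (the exponent is an interpretation of the flattened formula), so that y is the point on a grid of step $\Delta y$ for which the factory default maps to approximately z. The function names `solve_base` and `normalize`, the handling of x = 0, and the sample numbers are hypothetical.

```python
import math


def solve_base(value: float, z: float, dy: float) -> float:
    """Return the grid point y (step dy) with y**z <= value < (y + dy)**z."""
    if value <= 0 or z <= 0 or dy <= 0:
        raise ValueError("value, z and dy must all be positive")
    # Closed-form starting guess, snapped down to the dy grid.
    y = math.floor(value ** (1.0 / z) / dy) * dy
    # Guard against floating-point rounding at the grid boundaries.
    while y ** z > value:
        y -= dy
    while (y + dy) ** z <= value:
        y += dy
    if y <= 0 or y == 1:
        raise ValueError("resulting base y is not a valid logarithm base")
    return y


def normalize(x: float, y: float) -> float:
    """g(x) = sign(x) * log_y(|x|).

    Assumption: x == 0 is mapped to 0.0, since the logarithm is undefined
    there and the source does not say how zero readings are treated.
    """
    if x == 0:
        return 0.0
    return math.copysign(1.0, x) * math.log(abs(x), y)


# Hypothetical attribute with factory default 100, threshold z = 2, step 0.01.
base = solve_base(100.0, z=2.0, dy=0.01)
print(base, normalize(100.0, base), normalize(-100.0, base))
```

With this reading, attributes whose factory defaults differ by orders of magnitude are brought onto a comparable scale, because each default is mapped to roughly the same value z.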
The first acquisition unit 310 is configured to obtain, according to the Q hard disk type information items, Q correction values corresponding respectively to the Q hard disk type information items.
The correction unit 320 is configured to correct the corresponding normalized SMART data according to the Q correction values corresponding respectively to the Q hard disk type information items.
The second acquisition unit 410 is configured to obtain S training feature data corresponding respectively to the S attributes.
The sorting unit 420 is configured to sort the training feature values in each training feature data item, so as to generate S feature sequences ($V_i$) corresponding respectively to the S attributes.
The third acquisition unit 430 is configured to obtain, by the following preset mapping rule, the feature value f(v) of each attribute value v in the corrected attribute data set corresponding to each attribute:
$$f(v) = \begin{cases} V_i, & V_i \le v \le \left[(V_{i+1} - V_i)/2\right] \\ V_{i+1}, & V_i + \left[(V_{i+1} - V_i)/2\right] + 1 < v < V_{i+1} \end{cases}$$
The third generation unit 440 is configured to generate the hard disk feature data according to the obtained feature value of each attribute value in the corrected attribute data set corresponding to each attribute.
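The mapping rule snaps each corrected attribute value onto one of the sorted training feature values. The sketch below is one possible interpretation rather than the patent's definitive rule. It assumes that the square brackets denote rounding down, that the upper bound of the first case is meant to be offset by $V_i$ (that is, the midpoint between neighbours), and that values outside the range of the sequence are clamped to its ends. The function name `map_to_feature` and the sample sequence are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence


def map_to_feature(v: float, seq: Sequence[float]) -> float:
    """Map v onto a neighbouring value of the ascending sequence `seq`.

    Assumptions (not stated in the source): the split point between V_i and
    V_{i+1} is V_i + floor((V_{i+1} - V_i) / 2), and values outside
    [seq[0], seq[-1]] are clamped to the nearest end.
    """
    if not seq:
        raise ValueError("feature sequence must not be empty")
    if v <= seq[0]:
        return seq[0]
    if v >= seq[-1]:
        return seq[-1]
    i = bisect_right(seq, v) - 1           # seq[i] <= v < seq[i + 1]
    half = math.floor((seq[i + 1] - seq[i]) / 2)
    return seq[i] if v <= seq[i] + half else seq[i + 1]


# Hypothetical sorted training feature values for one attribute.
feature_sequence = [0, 10, 20]
mapped = [map_to_feature(v, feature_sequence) for v in (-1, 3, 5, 7, 10, 25)]
print(mapped)
```

Restricting feature values to those seen in training keeps the feature space discrete and bounded, which is a common preparation step before fitting a classifier.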
The first acquisition subunit 521 is configured to obtain, by the weighted least squares method, P fitting coefficients corresponding respectively to the P second attribute data subsets, where the i-th fitting coefficient among the P fitting coefficients is
$$k_i = (Z - b \cdot Y)/X,$$
where i = 1, ..., P,
$$X = \sum_{j=i}^{M+i-1} w_j x_j^2,$$
$$Y = \sum_{j=i}^{M+i-1} w_j x_j,$$
$$Z = \sum_{j=i}^{M+i-1} w_j x_j y_j,$$
$$b = \frac{Z \cdot Y - X \sum_{j=i}^{M+i-1} w_j y_j}{Y \cdot Y - X \sum_{j=i}^{M+i-1} w_j},$$
$w_j$ is the preset weight corresponding to the j-th attribute datum in the first attribute data subset corresponding to attribute s, $x_j$ is the detection time of the j-th attribute datum in the first attribute data subset corresponding to attribute s, and $y_j$ is the j-th attribute datum in the first attribute data subset corresponding to attribute s.
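As a consistency check, the expressions for $k_i$ and $b$ are what one obtains from an ordinary weighted straight-line fit $y \approx k_i x + b$ over the i-th window. The derivation below is offered as a sketch, under the assumption that X, Y and Z denote the weighted sums $\sum w_j x_j^2$, $\sum w_j x_j$ and $\sum w_j x_j y_j$ taken over $j = i, \dots, M+i-1$.

$$
\begin{aligned}
&\min_{k_i,\,b}\ \sum_{j=i}^{M+i-1} w_j\,(y_j - k_i x_j - b)^2 \\
&\frac{\partial}{\partial k_i} = 0:\quad Z = k_i X + b\,Y \\
&\frac{\partial}{\partial b} = 0:\quad \sum_{j=i}^{M+i-1} w_j y_j = k_i Y + b \sum_{j=i}^{M+i-1} w_j
\end{aligned}
$$

The first condition gives $k_i = (Z - bY)/X$ directly. Substituting it into the second and solving for $b$ gives $b = \left(ZY - X \sum w_j y_j\right) / \left(Y^2 - X \sum w_j\right)$, so $k_i$ is the weighted slope of the attribute over the window and $b$ is the intercept.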
The second acquisition subunit 522 is configured to obtain the i-th gradient datum $\mathrm{Grad}_i$ among the P gradient data by the following formula:
$$\mathrm{Grad}_i = k_i \cdot (M-1) \cdot y_{M+i-1}.$$
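The fitting and gradient steps can be sketched together as follows. This is a minimal illustration, assuming that X, Y and Z are the weighted sums of $x_j^2$, $x_j$ and $x_j y_j$ over the window, that the weights are indexed by position in the first attribute data subset, and that windows slide one sample at a time. The function names `weighted_fit` and `gradient_series` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_fit(x: Sequence[float], y: Sequence[float],
                 w: Sequence[float]) -> Tuple[float, float]:
    """Return (k, b) of the weighted least-squares line y ~ k*x + b."""
    X = sum(wj * xj * xj for wj, xj in zip(w, x))
    Y = sum(wj * xj for wj, xj in zip(w, x))
    Z = sum(wj * xj * yj for wj, xj, yj in zip(w, x, y))
    W = sum(w)
    Wy = sum(wj * yj for wj, yj in zip(w, y))
    denom = Y * Y - X * W
    if denom == 0 or X == 0:
        raise ValueError("degenerate window: detection times do not vary")
    b = (Z * Y - Wy * X) / denom
    k = (Z - b * Y) / X
    return k, b


def gradient_series(times: Sequence[float], values: Sequence[float],
                    weights: Sequence[float], m: int) -> List[float]:
    """Compute Grad_i = k_i * (M - 1) * y_{M+i-1} for every window."""
    grads = []
    for i in range(len(values) - m + 1):
        xs, ys, ws = times[i:i + m], values[i:i + m], weights[i:i + m]
        k, _ = weighted_fit(xs, ys, ws)
        grads.append(k * (m - 1) * ys[-1])  # ys[-1] is the last value in the window
    return grads


# Hypothetical samples lying exactly on y = 2x + 1, with equal weights.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v = [1.0, 3.0, 5.0, 7.0, 9.0]
grads = gradient_series(t, v, [1.0] * 5, m=3)
print(grads)
```

In practice the detection times would probably be shifted to start near zero before fitting, since sums of squared raw timestamps can lose floating-point precision.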
The device for extracting feature data from hard disk SMART data according to the embodiment of the present invention obtains, by the least squares method, the gradient data set corresponding to the attribute data in each SMART data item, and updates the corresponding first attribute data sets with the gradient data sets. This makes the trend of change in the SMART data more prominent, so that, combined with a machine learning algorithm, the trained failure early-warning model performs better.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations, in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or, if necessary, otherwise suitably processing it, and can then be stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principle and spirit of the present invention, the scope of which is defined by the claims and their equivalents.

Claims (12)

1. A method for extracting feature data from hard disk SMART (Self-Monitoring, Analysis and Reporting Technology) data, characterized by comprising:
obtaining a SMART data set of sample hard disks, wherein the SMART data set includes Q SMART data and Q hard disk type information items corresponding respectively to the Q SMART data;
normalizing the Q SMART data to generate Q normalized SMART data;
correcting the Q normalized SMART data respectively according to the Q hard disk type information items, to generate a corrected SMART data set;
generating hard disk feature data according to the corrected SMART data set;
wherein the SMART data set includes S attribute data sets corresponding respectively to S attributes, each SMART data item includes S first attribute data subsets corresponding respectively to the S attributes, and before the normalizing of the Q SMART data the method further comprises:
obtaining S gradient data sets corresponding respectively to the S first attribute data subsets in each SMART data item;
adding the obtained S gradient data sets of each SMART data item to that SMART data item as S new first attribute data subsets.
2. The method of claim 1, characterized in that obtaining the S gradient data sets corresponding respectively to the S first attribute data subsets in each SMART data item specifically comprises:
successively selecting M attribute data from the first attribute data subset corresponding to the s-th attribute in each SMART data item, to generate P second attribute data subsets, where P = N - M + 1, N is the total number of attribute data in the first attribute data subset corresponding to attribute s, and s = 1, ..., S;
calculating P gradient data corresponding respectively to the P second attribute data subsets, and generating the gradient data set corresponding to attribute s from the P gradient data.
3. The method of claim 2, characterized in that calculating the P gradient data corresponding respectively to the P second attribute data subsets specifically comprises:
obtaining, by the weighted least squares method, P fitting coefficients corresponding respectively to the P second attribute data subsets, where the i-th fitting coefficient among the P fitting coefficients is $k_i = (Z - b \cdot Y)/X$,
where i = 1, ..., P,
$$X = \sum_{j=i}^{M+i-1} w_j x_j^2,$$
$$Y = \sum_{j=i}^{M+i-1} w_j x_j,$$
$$Z = \sum_{j=i}^{M+i-1} w_j x_j y_j,$$
$$b = \frac{Z \cdot Y - X \sum_{j=i}^{M+i-1} w_j y_j}{Y \cdot Y - X \sum_{j=i}^{M+i-1} w_j},$$
$w_j$ is the preset weight corresponding to the j-th attribute datum in the first attribute data subset corresponding to attribute s, $x_j$ is the detection time of the j-th attribute datum in the first attribute data subset corresponding to attribute s, and $y_j$ is the j-th attribute datum in the first attribute data subset corresponding to attribute s;
obtaining the i-th gradient datum $\mathrm{Grad}_i$ among the P gradient data by the following formula:
$$\mathrm{Grad}_i = k_i \cdot (M-1) \cdot y_{M+i-1}.$$
4. The method of any one of claims 1-3, characterized in that normalizing the Q SMART data specifically comprises:
normalizing the Q SMART data by the following formula:
$$g(x) = \operatorname{sign}(x) \times \log_y |x|,$$
where x is one attribute datum in the Q SMART data, g(x) is the normalized attribute datum corresponding to x, and y can be calculated from the following formula:
$$y^z \le \mathit{Value} < (y + \Delta y)^z,$$
where z is a preset threshold, Value is the factory default value of the attribute corresponding to attribute datum x, and $\Delta y$ is a preset precision.
5. The method of any one of claims 1-3, characterized in that correcting the Q normalized SMART data respectively according to the Q hard disk type information items specifically comprises:
obtaining, according to the Q hard disk type information items, Q correction values corresponding respectively to the Q hard disk type information items;
correcting the corresponding normalized SMART data according to the Q correction values corresponding respectively to the Q hard disk type information items.
6. The method of any one of claims 1-3, characterized in that the corrected SMART data set includes S corrected attribute data sets corresponding respectively to the S attributes, and generating the hard disk feature data according to the corrected SMART data set specifically comprises:
obtaining S training feature data corresponding respectively to the S attributes;
sorting the training feature values in each training feature data item, to generate S feature sequences ($V_i$) corresponding respectively to the S attributes;
obtaining, by the following preset mapping rule, the feature value f(v) of each attribute value v in the corrected attribute data set corresponding to each attribute:
$$f(v) = \begin{cases} V_i, & V_i \le v \le \left[(V_{i+1} - V_i)/2\right] \\ V_{i+1}, & V_i + \left[(V_{i+1} - V_i)/2\right] + 1 < v < V_{i+1} \end{cases};$$
generating the hard disk feature data according to the obtained feature value of each attribute value in the corrected attribute data set corresponding to each attribute.
7. A device for extracting feature data from hard disk SMART data, characterized by comprising:
a first acquisition module, configured to obtain a SMART data set of sample hard disks, wherein the SMART data set includes Q SMART data and Q hard disk type information items corresponding respectively to the Q SMART data;
a first generation module, configured to normalize the Q SMART data to generate Q normalized SMART data;
a correction module, configured to correct the Q normalized SMART data respectively according to the Q hard disk type information items, to generate a corrected SMART data set;
a second generation module, configured to generate hard disk feature data according to the corrected SMART data set;
wherein the SMART data set includes S attribute data sets corresponding respectively to S attributes, each SMART data item includes S first attribute data subsets corresponding respectively to the S attributes, and the device further comprises, operating before the first generation module:
a second acquisition module, configured to obtain S gradient data sets corresponding respectively to the S first attribute data subsets in each SMART data item;
an adding module, configured to add the obtained S gradient data sets of each SMART data item to that SMART data item as S new first attribute data subsets.
8. The device of claim 7, characterized in that the second acquisition module specifically comprises:
a first generation unit, configured to successively select M attribute data from the first attribute data subset corresponding to the s-th attribute in each SMART data item, to generate P second attribute data subsets, where P = N - M + 1, N is the total number of attribute data in the first attribute data subset corresponding to attribute s, and s = 1, ..., S;
a second generation unit, configured to calculate P gradient data corresponding respectively to the P second attribute data subsets, and to generate the gradient data set corresponding to attribute s from the P gradient data.
9. The device of claim 8, characterized in that the second generation unit specifically comprises:
a first acquisition subunit, configured to obtain, by the weighted least squares method, P fitting coefficients corresponding respectively to the P second attribute data subsets, where the i-th fitting coefficient among the P fitting coefficients is $k_i = (Z - b \cdot Y)/X$,
where i = 1, ..., P,
$$X = \sum_{j=i}^{M+i-1} w_j x_j^2,$$
$$Y = \sum_{j=i}^{M+i-1} w_j x_j,$$
$$Z = \sum_{j=i}^{M+i-1} w_j x_j y_j,$$
$$b = \frac{Z \cdot Y - X \sum_{j=i}^{M+i-1} w_j y_j}{Y \cdot Y - X \sum_{j=i}^{M+i-1} w_j},$$
$w_j$ is the preset weight corresponding to the j-th attribute datum in the first attribute data subset corresponding to attribute s, $x_j$ is the detection time of the j-th attribute datum in the first attribute data subset corresponding to attribute s, and $y_j$ is the j-th attribute datum in the first attribute data subset corresponding to attribute s;
a second acquisition subunit, configured to obtain the i-th gradient datum $\mathrm{Grad}_i$ among the P gradient data by the following formula:
$$\mathrm{Grad}_i = k_i \cdot (M-1) \cdot y_{M+i-1}.$$
10. The device of any one of claims 7-9, characterized in that the first generation module specifically comprises:
a processing unit, configured to normalize the Q SMART data by the following formula:
$$g(x) = \operatorname{sign}(x) \times \log_y |x|,$$
where x is one attribute datum in the Q SMART data, g(x) is the normalized attribute datum corresponding to x, and y can be calculated from the following formula:
$$y^z \le \mathit{Value} < (y + \Delta y)^z,$$
where z is a preset threshold, Value is the factory default value of the attribute corresponding to attribute datum x, and $\Delta y$ is a preset precision.
11. The device of any one of claims 7-9, characterized in that the correction module specifically comprises:
a first acquisition unit, configured to obtain, according to the Q hard disk type information items, Q correction values corresponding respectively to the Q hard disk type information items;
a correction unit, configured to correct the corresponding normalized SMART data according to the Q correction values corresponding respectively to the Q hard disk type information items.
12. The device of any one of claims 7-9, characterized in that the corrected SMART data set includes S corrected attribute data sets corresponding respectively to the S attributes, and the second generation module specifically comprises:
a second acquisition unit, configured to obtain S training feature data corresponding respectively to the S attributes;
a sorting unit, configured to sort the training feature values in each training feature data item, to generate S feature sequences ($V_i$) corresponding respectively to the S attributes;
a third acquisition unit, configured to obtain, by the following preset mapping rule, the feature value f(v) of each attribute value v in the corrected attribute data set corresponding to each attribute:
$$f(v) = \begin{cases} V_i, & V_i \le v \le \left[(V_{i+1} - V_i)/2\right] \\ V_{i+1}, & V_i + \left[(V_{i+1} - V_i)/2\right] + 1 < v < V_{i+1} \end{cases};$$
a third generation unit, configured to generate the hard disk feature data according to the obtained feature value of each attribute value in the corrected attribute data set corresponding to each attribute.
CN201310733574.5A 2013-12-26 2013-12-26 Characteristic extracting method and device in hard disk SMART data Active CN103646114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310733574.5A CN103646114B (en) 2013-12-26 2013-12-26 Characteristic extracting method and device in hard disk SMART data

Publications (2)

Publication Number Publication Date
CN103646114A CN103646114A (en) 2014-03-19
CN103646114B true CN103646114B (en) 2017-04-05

Family

ID=50251327

Country Status (1)

Country Link
CN (1) CN103646114B (en)

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant