WO1999056230A2 - Method and system for storing metadata in a relational database - Google Patents
Method and system for storing metadata in a relational database Download PDFInfo
- Publication number
- WO1999056230A2 WO1999056230A2 PCT/US1999/009442 US9909442W WO9956230A2 WO 1999056230 A2 WO1999056230 A2 WO 1999056230A2 US 9909442 W US9909442 W US 9909442W WO 9956230 A2 WO9956230 A2 WO 9956230A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- record
- metadata
- storing
- relation
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
Definitions
- the present invention relates generally to data storage and more specifically, to the storage of metadata in a relational database, which metadata describes data stored in a flat file.
- Enterprises use electronic commerce (e-commerce) to conduct business transactions over electronic mediums.
- the electronic mediums include, for example, computer networks (e.g., the Internet), telephones, and facsimile machines.
- Enterprises usually exchange business documents when conducting business transactions over a computer network.
- the business documents may include a request for bid ("RFB”), request for quote ("RFQ”), purchase order ("PO”), shipping information, and the like.
- the business documents are exchanged over the computer network in an electronic form.
- a "flat file” is a data repository that contains information in an electronic form. There are several different types of flat files.
- a first type of flat file is a "fixed-width" flat file, which contains records having fields that are of a fixed- width.
- the records in a fixed-width flat file are referred to as "fixed-width records.
- Records in a fixed-width flat file share a common structure that is defined by the fields of the records.
- ⁇ second type of flat file is a "delimited” flat file, which contains records having fields whose widths may vary.
- the records in a delimited flat file are referred to as “delimited records.
- Delimited records use a field delimiter (e.g., a predefined character) to separate the fields of a record. Records in delimited flat files share a common structure that is defined by the fields of the records.
- a third type of flat file is a "fixed-block" flat file, which contains two or more different types of fixed-width records.
- Each type of fixed-width record is referred to as a block.
- each type of fixed-width record has its own structure that is defined by the fields of that type of record.
- Fixed-block flat files are the most common types of files used in e-commerce.
- a fourth type of flat file is a "delimited-block" file, which contains two or more different types of delimited records.
- Each type of delimited record is also referred to as a block.
- each type of delimited record has its own structure that is defined by the fields of that type of record. Similar to delimited flat files, a field delimiter is used to separate the fields of a record in a delimited-block flat file.
- a fifth type of flat file is known as a "general-block" flat file.
- a general-block flat file contains both fixed-width and delimited records.
- the fixed- width records may be of two or more different types (i.e. , have different structures), as may the delimited records.
- a general-block file is the most complex type of flat file because it may store records normally contained in fixed-block and delimited-block flat files. Records of a general-block flat file that have a common structure are referred to as a block.
- FIG. 1 illustrates six records 300(1 )-300(6) contained in an exemplary general-block flat file 300. While each record in a flat file typically appears on a single line, it is not uncommon for two or more records to appear on the same line, as illustrated by records 300(5) and 300(6). When two or more records appear on the same line, a row delimiter (e.g. , a predefined character such as a vertical bar 302) is used to separate the records. Records 300(2)-300(4) are fixed-width records. Records 300(1), 300(5), 300(6) are delimited records having comma 305, semicolon 307, and semicolon 309 field delimiters, respectively.
- a flat file typically has a description associated with it that defines the structure and content of the flat file.
- the metadata for two or more flat files may differ depending on the structure and content of the flat files.
- the metadata of a purchase order document may differ from the metadata of a RFQ or a RFB.
- the metadata of the same type of business document e.g. , two purchase orders
- the purchase order must be translated into a form that the second enterprise can parse.
- Enterprise El operates a computer 1 14 that is able to communicate with computer network 130 over a communication channel 128.
- Computer 114 communicates with Electronic Document Interchange ("EDI") translator/mapper software running on a computer 106.
- Enterprise E2 operates a computer 134 that is able to communicate with computer network 130 over a communication channel 132.
- Computer 134 communicates with EDI translator/mapper software running on a computer 140.
- the computer network 130 may be a local area network (“LAN”), a wide area network (“WAN”), the Internet, an Intranet, or some other communications network that allows enterprise El to communicate with enterprise E2.
- Flat file FI contains the purchase order information.
- Flat file FI is typically stored in a data repository.
- the purchase order information is stored in one particular format that is described by metadata Ml .
- flat file FI will be translated to an industry-standard document, such as an EDI flat file EFl , which is described by EDI metadata EDI_M.
- industry standards specify the metadata for different types of business documents to establish a common ground for identifying and parsing these documents by different enterprises.
- metadata Ml and EDI metadata EDI_M are typically stored as flat files in a data repository 118, which causes a number of problems as will be described below.
- EDI translator/mapper software running on computer 106 accesses flat file FI , metadata Ml , and EDI metadata EDI_M.
- the software typically has a proprietary module that has been programmed to access the metadata.
- the EDI translator/mapper software uses flat file FI , metadata Ml , and metadata EDI_M to translate flat file FI into EDI flat file EFl in a well-known manner.
- Computer 114 receives EDI flat file EFl from computer 106 and transmits EDI flat file EFl , over computer network 130, to computer 134 residing in enterprise E2.
- Enterprise E2 will store the purchase order as a flat file F2.
- the format of flat file F2 differs from the format in which enterprise El maintains the purchase order.
- the format in which enterprise E2 maintains the purchase order is described by metadata M2, which differs from metadata M l .
- enterprise E2 In order for enterprise E2 to identify and parse the contents of the purchase order represented by flat file FI (which was sent to enterprise E2 as EDI flat file EFl), enterprise E2 must translate EDI flat file EFl into flat file F2.
- enterprise E2 stores metadata M2 and metadata EDI_M as flat files in data repository 146.
- EDI translator/mapper software running on computer 140 accesses flat file EFl , metadata M2, and EDI metadata EDI_M.
- the software typically has a proprietary module that has been programmed to access the metadata.
- the EDI translator/mapper software uses flat file EFl , metadata M2, and metadata EDI_M to translate flat file EFl into flat file F2 in a well-known manner.
- the output of the EDI translator/mapper software is flat file F2.
- the purchase order information represented by flat file F2 is in a format that enterprise E2 can use.
- data repositories e.g. , 118 and 1466 that store the metadata may not be persistent. Thus, the time taken to access the information in the data repositories may be long, which may hinder performance.
- the data repositories that store the metadata are not always secure. Therefore, the security of the information contained in these data repositories may be severely compromised.
- a first aspect of the present invention is directed to a method for storing metadata in a relational database, wherein the metadata describes data in a flat file.
- the metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record.
- the metadata also has a relation specification associated therewith indicating a relation between the plurality of types of records.
- the method comprises creating a relational database and creating data structures for storing data representing the header specification, the record specification, the field specification, and the relation specification in the relational database.
- the method also includes storing data representing the header specification, the record specification, the field specification, and the relation specification in the data structures.
- a second aspect of this invention relates to a method for storing metadata in a relational database, wherein the metadata describes data in a flat file.
- the metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record.
- the metadata has a relation specification associated therewith indicating a relation between the plurality of types of records.
- the method includes creating a first data structure for storing data representing the header specification and storing the data representing the header specification in the first data structure.
- the method also includes creating a second data structure for storing data representing the record specification and storing the data representing the record specification in the second data structure.
- the method further includes creating a third data structure for storing data representing the field specification and storing the data representing the field specification in the third data structure.
- the method includes creating a fourth data structure for storing data representing the relation specification and storing the data representing the relation specification in the fourth data structure.
- a third aspect of the present invention is directed to a method for storing metadata in a relational database, wherein the metadata describes data in a flat file.
- the metadata includes (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record.
- the metadata has a relation specification associated therewith indicating a relation between the plurality of types of records, and the relational database has a database engine associated therewith.
- the method includes obtaining data representing the header specification, constructing a first database query relating to the data representing the header specification, and passing the first database query to the database engine for execution so that the header specification can be stored in the relational database.
- the method also includes obtaining data representing the ' record specification, constructing a second database query relating to the data representing the record specification, and passing the second database query to the database engine for execution so that the record specification can be stored in the relational database.
- the method further includes obtaining data representing the field specification, constructing a third database query relating to the data representing the field specification, and passing the third database query to the database engine for execution so that the field specification can be stored in the relational database.
- the method includes obtaining data representing the relation specification, constructing a fourth database query relating to the data representing the relation specification, and passing the fourth database query to the database engine for execution so that the relation specification can be stored in the relational database.
- a fourth aspect of the present invention pertains to a method for translating a first file containing data in a first format into a second file containing data in a second format, wherein the first format differs from the second format.
- the method includes populating a relational database with metadata relating to the first file.
- the method also includes using translating software to access the data in the first file and the metadata in the relational database and translate the first file into the second file using the metadata and the data in the first file.
- a fifth aspect of this invention is directed to a system for storing metadata in a relational database, wherein the metadata describes data in a flat file.
- the metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record.
- the metadata has a relation specification associated therewith indicating a relation between the plurality of types of records.
- the system includes a relational database and a processor in communication with the relational database.
- the processor is programmed to create data structures for storing data representing the header specification, the record specification, the field specification, and the relation specification in the relational database.
- the processor is also programmed to store data representing the header specification, the record specification, the field specification, and. the relation specification in the data structures.
- FIG. 1 illustrates an exemplary general-block flat file.
- FIG. 2 is a block diagram illustrating a prior art e-commerce architecture.
- FIG. 3 is a block diagram illustrating an e-commerce architecture in which the present invention may be used.
- FIG. 4 illustrates metadata that describes the structure and content of the general-block flat file shown in FIG. 1.
- FIG. 5 illustrates a relational database for storing the metadata illustrated in FIG. 4 and relation specifications.
- FIG. 5A is a flowchart illustrating a process for creating data structures for storing metadata and relation specifications.
- FIG. 6 is a flowchart illustrating a process for processing and storing metadata in a relational database.
- FIG. 7 is a flowchart illustrating a process for processing and storing a header specification in a relational database.
- FIG. 8 is a flowchart illustrating a process for processing and storing a record specification in a relational database.
- FIG. 9 is a flowchart illustrating a process for processing and storing a field specification in a relational database
- FIG. 10 is a flowchart illustrating a process for processing and storing a relation specification in a relational database.
- the present invention provides a novel and unique method and system for storing metadata in, and using metadata from, a relational database, which metadata describes data stored in a flat file.
- the method and system permits standard database software to be used to securely and quickly access metadata that describes even the most complex flat files.
- FIG. 3 is a block diagram illustrating an e-commerce architecture in which the present invention may be used.
- a first enterprise El may transfer flat file business documents to an enterprise E2.
- Enterprise El operates a computer 1 14 that is in communication with a computer network 130 over a communication channel
- Computer 1 14 communicates with translator/mapper software running on a computer 106 having one or more processors as is well known.
- Enterprise E2 operates a computer 134 that is in communication with computer network 130 over a communication channel 132.
- Computer 134 communicates with translator/mapper software running on a computer 140 having one or more processors as is well known.
- the translator/mapper software running on computers
- 106 and/or 140 is well-known Electronic Document Interchange ("EDI") translator/mapper software such as IBM Corp.'s (Armonk, NY) "Data Interchange/MVS” and/or “Data Interchange/MVS-CICS” software.
- EDI Electronic Document Interchange
- translator/mapper software such as IBM Corp.'s (Armonk, NY) "Data Interchange/MVS” and/or “Data Interchange/MVS-CICS” software.
- other translator/mapper software may be chosen as desired to suit particular file formats used by enterprises El and E2.
- computer network 130 is the Internet.
- computer network 130 may be a local area network (“LAN”), a wide area network (“WAN”), an Intranet, or some other communications network that allows enterprise El to communicate with enterprise E2.
- Communication channels 128 and 132 may be wired or wireless.
- Enterprise El maintains a flat file FI , which is to be transferred to enterprise E2 for processing.
- flat file FI represents a purchase order business document indicating goods and/or services that enterprise El desires to purchase from enterprise E2.
- flat file FI may represent any other document or business document to be used by enterprise E2 for any purpose.
- Enterprise El stores flat file FI in a data repository.
- the purchase order represented by flat file FI is stored in one particular format that is described by metadata Ml .
- Enterprise E2 will maintain the purchase order as a flat file F2, but in a format that differs from the format in which enterprise El maintains the purchase order.
- the format in which enterprise E2 maintains the purchase order is described by metadata M2, which differs from metadata M l .
- metadata M l and M2 may be similar.
- flat file FI is translated into flat file F2.
- the translation process uses an intermediate file format, which, in the present embodiment, is an Electronic Data Interchange ("EDI") format.
- EDI Electronic Data Interchange
- flat file FI will be translated into an EDI flat file EFl , which will be sent to enterprise E2 over computer network 130.
- Enterprise E2 will translate EDT flat file EFl into flat file F2, which enterprise E2 can parse.
- the metadata that describes EDI flat file EFl is described by EDI metadata EDI_M, which is maintained by enterprises E l and E2.
- Enterprise El stores metadata M l and EDI metadata EDI_M in a relational database 500.
- enterprise E2 stores metadata M2 and EDI metadata EDI_M in a relational database 210.
- the storage of metadata is facilitated by computer software 204 as will be explained in more detail below.
- the translation of flat file FI into EDI flat file EFl is performed by translator/mapper software running on computer 106, which accesses flat file FI , metadata Ml , and EDI metadata EDI_M. Unlike the prior art, the metadata can be accessed from relational database 500 using standard database queries.
- the translator/mapper software uses flat file FI , metadata M l , and metadata EDI_M to translate flat file F I into EDI flat file EFl in a well-known manner.
- Computer 1 14 receives EDI flat file EFl from computer 106 and transmits EDI flat file EFl, over the computer network 130, to computer 134 residing in enterprise E2.
- the translator/mapper software running on computer 140 receives EDI flat file EFl from computer 134. This software accesses metadata M2 and EDI metadata EDI_M from relational database 210. Because the metadata is stored in relational database 210, it may be accessed using standard database queries.
- the translator/mapper software uses flat file EFl , metadata M2, and metadata EDI_M to translate flat file EFl into flat file F2 in a well-known manner.
- the output of the translator/mapper software is flat file F2.
- the purchase order represented by flat file F2 which corresponds to the purchase order represented by flat file FI , is in a format that enterprise E2 can parse and use.
- FIG. 4 illustrates metadata that describes the structure and content of a flat file.
- the metadata of FIG. 4 describes the general-block flat file 300
- General block flat file 300 may be considered as all or part of a purchase order requested by that file F I .
- the metadata comprises a header specification 400 and block specifications 405, 410, and 415.
- Header specification 400 describes the general content of a flat file, here, flat file 300. As shown in row Rl , header specification 400 includes a HeaderName and a Description. In this example, the HeaderName is "Companylnformation” and the description is "Company, Products, and Owner Information” as shown in row R2. Thus, header specification 400 indicates that the flat file contains information pertaining to companies, products that the companies manufacture, and the owner(s) of the companies.
- block specifications 405, 410, and 415 each represent one of three different types of records in a flat file -- that is, block specifications 405, 410, and 415 represent a first type of delimited record, a first type of fixed- width record, and a second type of delimited record, respectively.
- Each block specification comprises a record specification and a field specification.
- block specifications 405, 410, and 415 comprise record/field specifications 405A/405B, 410A/410B, and 415A/415B, respectively.
- a record specification describes the characteristics of a record.
- record specifications 405 A, 410A, and 415A describe the characteristics of records represented by block specifications 405, 410, and 41 .
- each record specification 405A, 410A, and 410B includes a RecordName, Type, FicldDelimiter, and RowDeiimiter as shown in rows R3, R13, and R21 , respectively.
- the RecordNames of record specifications 405 A, 410A, and 415A contain the "Company” (R4), "Product” (R14), and “Owner” (R22), respectively. This indicates that the "Company” record type contains information pertaining to a given company, the "Product” record type contains information pertaining to a given product of the Company, and the "Owner” record type contains information pertaining to a given owner of the company.
- the Type of a record specification indicates a type of record represented by the record specification. Each Type is unique and need not be present when the flat file described by the metadata is a fixed-width or delimited flat file because they contain only one type of record.
- record specifications 405 A, 410A, and 415A contain the Types "0100" (R4), "0200” (R14), and "0300” (R22), respectively.
- the FieldDelimiter of a record specification represents a character that separates two consecutive fields in the record. In fixed-width records, this character need not be present.
- records specifications 405 A and 41 A have a comma "," and a semicolon ";” as respective field delimiters as indicated in rows R4 and R22.
- Row R14 of record specification 410A does not have a field delimiter because it is a fixed- width record.
- the RowDeiimiter of a record specification represents the character that separates two records that have the same record specification.
- "NEWLINE” is a RowDeiimiter for record specifications 405A and 410A as indicated in rows R4 and R 14, respectively.
- “NEWLINE” is used to indicate that any two consecutive records, which have the same record specification, appear on two consecutive lines in the file.
- " (R22) is shown as the exemplary RowDeiimiter for record specification 415A.
- a field specification describes the characteristics of the fields of a record.
- field specifications 405B, 410B, 415B describe the characteristics of the fields of records described by record specifications 405A, 410A, and 415A, respectively.
- Each field specification 405B, 410B, and 415B includes a FieldNa e,
- DataType, Size, Position, Offset, and Null indicator as shown in rows R5, R15, and R23, respectively.
- FieldName identifies the name of a field.
- DataType indicates the type of data that can be stored in the field. For example, DataTypes "A,” “N,” “AN,” and “DT” indicate that alpha, numeric, alphanumeric, and date values can be stored in the field, respectively.
- Size indicates the maximum size of a value that may be contained in the field.
- Position indicates a logical position of the field in a record. Offset indicates that the value contained in the field begins in a certain column in the record (that is, the physical position in the record from where the value of a field begins). Null indicates whether or not the value of the field can be null.
- FIG. 5A illustrates a process for creating data structures in a relational database, which data structures will be used to store one or more header specifications, record specifications, field specifications, and relation specifications.
- a relation specification indicates a relation between two or more types of records.
- a relation specification indicates a relation between two fields across two different record types in a fiat file.
- the relation specification typically does not appear explicitly as part of the metadata, but may appear as input from a user.
- software 204 creates. a relational database 500 (FIG. 5) in a
- RDBMS Relational Database Management System
- software 204 creates a table 505 (FIG. 5) to store data representing header specification 400.
- the columns in table 505 are the HcaderlD, HeaderName, and Description.
- the HeaderlD is the primary key column that identifies each row in table 505 uniquely.
- the manner in which the value contained in the HeaderlD column is generated is specific to the RDBMS.
- the HeaderName and Description columns of table 505 will store the HeaderName and Description contained in row R2 of header specification 400, respectively.
- software 204 creates a table 510 (FIG. 5) to store data representing record specifications.
- a row is created for each record specification, here, 405 A, 410A, 415A.
- the columns of table 510 include the RecordlD, HeaderlD, RecordName, Type, FieldDelimiter, and RowDeiimiter.
- the RecordlD is the primary key column that identifies each row in the table uniquely. The manner in which the value contained in the RecordlD column is generated is specific to the RDBMS.
- the HeaderlD is the foreign key column that links table 510 to its parent table 505. This link establishes a relation between record specifications 405 A, 410A, and 415A and header specification 400. The value contained in this column must be present in the HeaderlD column of parent table 505.
- the RecordName, Type, FieldDelimiter and RowDeiimiter columns of rows in table 510 having RecordlDs 1/2/3 will store the RecordName, Type, FicldDelimiter and RowDeiimiter contained in rows R4 R14/R22 of record specifications 405A/410A/415A, respectively.
- Software 204 creates a table 515 (FIG. 5) to store field specifications at step 556.
- a row is created in table 515 for each field specification 405B, 410B, 415B.
- the columns of table 515 include FieldID, RecordlD, FieldName, DataType, Size, Position, Offset, and Null.
- the FieldID is the primary key column that identifies each row in the table uniquely. The manner in which the value contained in this column is generated is specific to the type of the RDBMS.
- the RecordlD is the foreign key column that links this table 515 to its parent table 510. This link establishes a relation between field specifications 405B, 410B, 15B and its corresponding record specification 405A, 410A, 415A.
- the value contained in this column must be present in the Recordld column of parent table 510.
- the FieldName, DataType, Size, Position, Offset, and Null columns of rows in table 515 having FieldlDs 1 - 16 will store the FieldName, DataType, Size, Position, Offset, and Null values contained in rows R6-R12/R16-R20 R24-27 of field specifications 405B/410B/415B, respectively.
- the "VARCHAR,” “DATE,” and "NUMERIC" data represents alphanumeric, numeric, and date data types in table 515. The values may vary depending on the RDBMS.
- software 204 creates table 520 to store a relation specification.
- the columns in table 520 include a RelationID, ParentCoIumnld, and ChildColumnlD.
- the RelationID is the primary key column that identifies each row in table 520 uniquely.
- the manner in which the value contained in this column is generated is specific to the type of the RDBMS.
- the ParentColumnID is the foreign key column that indicates the parent field in a relation specification.
- the row in table 520 having RelationID 1 indicates that the Recldentifier (FieldID 1 in table 515) is the parent column.
- This Recldentifier field belongs to the "Company" record specification as indicated by
- the ChildColumnlD is the foreign key column that indicates the child field in a relation specification.
- the row in table 520 having RelationID 1 indicates that the Recldentifier field (FieldID 8 in table 515) is the child column.
- This Recldentifier field belongs to the Product record specification (RecordlD 2 in table 510).
- the value contained in the ChildColumnlD column must be present in the parent table 515.
- the data representing the header specification, the record specifications, the field specifications, and the relation specification is stored in the data structures.
- the flowchart of FIG. 6 illustrates a preferred method, which begins at step 600.
- header specification 400 is input from a user to software 204.
- software 204 processes and stores header specification 400 in table 505.
- software 204 constructs a program data-structure containing the dates shown in row R2 of header specification 400 (FIG. 4).
- software 204 stores row R2 of header specification in the program data-structure created at step 710.
- software 204 constructs a Structured Query Language (SQL) query to insert the data contained in the program data-structure into a row in table 505.
- SQL Structured Query Language
- software 204 passes the SQL query to a database engine, which is part of the RDBMS.
- the database engine executes the SQL query and inserts the data contained in the program data-structure into an appropriate row in table 505.
- processing proceeds to step 615 where software 204 may check whether all record specifications 405A, 410A, 415A for the current header specification 400 have been stored in relational database 500. If all such record specifications have been stored, then processing continues at step 645.
- next row R4/R14/R22 of record specification 405A/410A/415A are input from a user to software 204 at step 620.
- the next record specification is stored in relational database 500.
- FIG. 8 illustrates step 625 in detail.
- software 204 constructs a program data-structure containing current record specification 405 A/41 OA/415 A.
- software 204 stores the current record specification 405A/410A/415A in the program data-structure created at step 810.
- software 204 constructs a Structured Query Language (SQL) query to insert the data contained in the program data-structure into a row in table 5 10.
- SQL Structured Query Language
- software 204 passes the SQL query to the database engine for execution.
- the database engine executes the SQL query and inserts the data contained in the current program data- structure into an appropriate row in table 510.
- SQL Structured Query Language
- software 204 may check whether all field specifications for the current record specification have been stored in table 515. If all field specifications for the current record specification have been stored in table 515, then processing returns to step 615 where the next record specification is processed. If all field specifications for the current record specification have not been stored in table 515, then the next field specification (i.e., R6-R12/R16-R20/R24-R27 of record specifications 405B/410B/415B) is input from the user to software 204 at step 635. At step 640, software 204 processes and stores the current field specification in table 515. The flowchart in FIG. 9 illustrates this process in more detail.
- software 204 constructs a program data-structure containing current field specification 405B/410B/415B.
- software 204 stores the current field specification the program data-structure created at step 910.
- software 204 constructs a Structured Query Language (SQL) query to insert the data contained in the program data-structure into a row in table 515.
- SQL Structured Query Language
- software 204 passes the SQL query to the database engine for execution.
- the database engine executes the SQL query and inserts the data contained in the program data structure into an appropriate row in table 515.
- software 204 may check whether all of the relation specifications for the current header specification 400 have been stored in table 520. If all of the relation specifications for the current header specification 400 have been stored in table 520, then processing is done at step 660. At that point, metadata for the flat file has been stored in the relational database 500. It is noted that the metadata for other flat files can be stored in a relational database in a similar manner.
- step 655 software 204 processes and stores the relation specification in table 520.
- the flowchart of FIG. 10 illustrates the method in more detail.
- software 204 constructs a program data-structure containing the relation specification, which includes a Relationld, ParentColumnID, and the ChildColumnlD. The values of these columns may be obtained from the FieldID column in table 515 and input by a user to the software 204 at step 650.
- software 204 stores the ParentColumnID and the
- software 204 constructs a Structured Query Language SQL query to insert the data contained in the program data-structure as a row in table 520.
- software 204 passes the SQL query to the database engine for execution.
- the database engine executes the SQL query and inserts the data contained in the program data structure into an appropriate row in table 520. Processing returns to step 645 and may end at step 660.
- the metadata and relation specification stored in relational database 500 may then be used by translator/mapper software as described above.
- the metadata is stored in a standard format in relational databases. Therefore, the metadata may be accessed using industry-standard, non-proprietary database queries such as those offered in Structured Query Language (SQL). Moreover, relational databases have a standard security protocol built into the system, which ensures that the metadata remains secure. Still further, existing software tools enable storage of metadata in a variety of data repositories, although not in relational databases. Further, they mostly cater to only fixed-width and delimited files. While a small number cater to fixed-block files, such tools do not handle metadata that describe very generic types of flat files such as delimited-block files or general-block files.
- SQL Structured Query Language
- the present invention is highly generic and therefore powerful because it is not restricted to storing metadata for any one single type of a flat file just described. It facilitates the storage of metadata that describe very general and complex files containing two or more different types of records that may be fixed-width, delimited, or both.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Diclosed herein is a method and system for storing metadata in a relational database, wherein the metadata describes data in a flat file. As described in one aspect of the disclosure, the metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record. The metadata also has a relation specification associated therewith indicating a relation between the plurality of types of records. The method includes creating a relational database and creating data structures for storing data representing the header specification, the record specification, the field specification, and the relation specification in the relational database. The method also includes storing data representing the header specification, the record specification, the field specification, and the relation specification in the data structures.
Description
METHOD AND SYSTEM FOR STORING METADATA IN A RELATIONAL DATABASE
CROSS REFERENCE TO RELATED APPLICATION
This application claims priority to United States Provisional Application
Serial Number 60/083,715, filed on April 30, 1998, the disclosure of which is incorporated herein by reference.
BACKGROUND OF THE INVENTION
The present invention relates generally to data storage and more specifically, to the storage of metadata in a relational database, which metadata describes data stored in a flat file.
Enterprises use electronic commerce (e-commerce) to conduct business transactions over electronic mediums. The electronic mediums include, for example, computer networks (e.g., the Internet), telephones, and facsimile machines. Enterprises usually exchange business documents when conducting business transactions over a computer network. The business documents may include a request for bid ("RFB"), request for quote ("RFQ"), purchase order ("PO"), shipping information, and the like. The business documents are exchanged over the computer network in an electronic form. A "flat file" is a data repository that contains information in an electronic form. There are several different types of flat files. A first type of flat file is a "fixed-width" flat file, which contains records having fields that are of a fixed- width. The records in a fixed-width flat file are referred to as "fixed-width records. " Records in a fixed-width flat file share a common structure that is defined by the fields of the records.
Λ second type of flat file is a "delimited" flat file, which contains records having fields whose widths may vary. The records in a delimited flat file are referred to as "delimited records. " Delimited records use a field delimiter (e.g., a predefined character) to separate the fields of a record. Records in delimited flat files share a common structure that is defined by the fields of the records.
A third type of flat file is a "fixed-block" flat file, which contains two or more different types of fixed-width records. Each type of fixed-width record is referred to as a block. In a fixed-block flat file, each type of fixed-width record has its own structure that is defined by the fields of that type of record. Fixed-block flat files are the most common types of files used in e-commerce.
A fourth type of flat file is a "delimited-block" file, which contains two or more different types of delimited records. Each type of delimited record is also referred to as a block. In a delimited-block flat file, each type of delimited record has its own structure that is defined by the fields of that type of record. Similar to delimited flat files, a field delimiter is used to separate the fields of a record in a delimited-block flat file.
A fifth type of flat file is known as a "general-block" flat file. A general-block flat file contains both fixed-width and delimited records. The fixed- width records may be of two or more different types (i.e. , have different structures), as may the delimited records. Thus, a general-block file is the most complex type of flat file because it may store records normally contained in fixed-block and delimited-block flat files. Records of a general-block flat file that have a common structure are referred to as a block.
FIG. 1 illustrates six records 300(1 )-300(6) contained in an exemplary general-block flat file 300. While each record in a flat file typically appears on a single line, it is not uncommon for two or more records to appear on the same line, as illustrated by records 300(5) and 300(6). When two or more records appear on the
same line, a row delimiter (e.g. , a predefined character such as a vertical bar 302) is used to separate the records. Records 300(2)-300(4) are fixed-width records. Records 300(1), 300(5), 300(6) are delimited records having comma 305, semicolon 307, and semicolon 309 field delimiters, respectively. A flat file typically has a description associated with it that defines the structure and content of the flat file. This description, which is known as metadata, is necessary in order to be able to identify and parse the contents of the flat file. The metadata for two or more flat files may differ depending on the structure and content of the flat files. For example, the metadata of a purchase order document may differ from the metadata of a RFQ or a RFB. Additionally, the metadata of the same type of business document (e.g. , two purchase orders) may differ between two enterprises that use that type of business document. In this latter case, for a second enterprise to read the purchases order of a first enterprise, the purchase order must be translated into a form that the second enterprise can parse. To illustrate, consider the e-commerce architecture illustrated in FIG. 2. in which a first enterprise El desires to submit a purchase order to a second enterprise E2. Enterprise El operates a computer 1 14 that is able to communicate with computer network 130 over a communication channel 128. Computer 114 communicates with Electronic Document Interchange ("EDI") translator/mapper software running on a computer 106. Enterprise E2 operates a computer 134 that is able to communicate with computer network 130 over a communication channel 132. Computer 134 communicates with EDI translator/mapper software running on a computer 140. The computer network 130 may be a local area network ("LAN"), a wide area network ("WAN"), the Internet, an Intranet, or some other communications network that allows enterprise El to communicate with enterprise E2.
Enterprise El maintains a flat file FI that contains the purchase order information. Flat file FI is typically stored in a data repository. The purchase order
information is stored in one particular format that is described by metadata Ml . In this prior art example, flat file FI will be translated to an industry-standard document, such as an EDI flat file EFl , which is described by EDI metadata EDI_M. As is well known, industry standards specify the metadata for different types of business documents to establish a common ground for identifying and parsing these documents by different enterprises. In the prior art, metadata Ml and EDI metadata EDI_M are typically stored as flat files in a data repository 118, which causes a number of problems as will be described below.
To perform the translation, EDI translator/mapper software running on computer 106 accesses flat file FI , metadata Ml , and EDI metadata EDI_M. With regard to the metadata, the software typically has a proprietary module that has been programmed to access the metadata. The EDI translator/mapper software uses flat file FI , metadata Ml , and metadata EDI_M to translate flat file FI into EDI flat file EFl in a well-known manner. Computer 114 receives EDI flat file EFl from computer 106 and transmits EDI flat file EFl , over computer network 130, to computer 134 residing in enterprise E2.
Enterprise E2 will store the purchase order as a flat file F2. The format of flat file F2 differs from the format in which enterprise El maintains the purchase order. In particular, the format in which enterprise E2 maintains the purchase order is described by metadata M2, which differs from metadata M l . In order for enterprise E2 to identify and parse the contents of the purchase order represented by flat file FI (which was sent to enterprise E2 as EDI flat file EFl), enterprise E2 must translate EDI flat file EFl into flat file F2.
To do so, enterprise E2 stores metadata M2 and metadata EDI_M as flat files in data repository 146. Again, as will be described below, the storage of metadata in flat files causes a number of problems in the prior art. EDI translator/mapper software running on computer 140 accesses flat file EFl , metadata M2, and EDI
metadata EDI_M. With regard to the metadata, the software typically has a proprietary module that has been programmed to access the metadata. The EDI translator/mapper software uses flat file EFl , metadata M2, and metadata EDI_M to translate flat file EFl into flat file F2 in a well-known manner. The output of the EDI translator/mapper software is flat file F2. Thus, the purchase order information represented by flat file F2 is in a format that enterprise E2 can use.
The prior art techniques that store metadata in flat files suffer from number of problems. In particular, the structure of the flat files used to store metadata may be quite complex. Thus, special, non-standard, proprietary translator/mapper software is required to access and use the metadata. In some circumstances, such as when metadata is used to describe delimited-block and/or general-block files, such software is not known to the inventors as having been developed.
Further, data repositories (e.g. , 118 and 146) that store the metadata may not be persistent. Thus, the time taken to access the information in the data repositories may be long, which may hinder performance.
Still further, the data repositories that store the metadata are not always secure. Therefore, the security of the information contained in these data repositories may be severely compromised.
In view of the foregoing, what is needed is a method and system that stores and permits quick and secure access to metadata that describes even the most complex flat files.
SUMMARY OF TILE INVENTION
A first aspect of the present invention is directed to a method for storing metadata in a relational database, wherein the metadata describes data in a flat file. The metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record. The metadata also has a relation specification associated therewith indicating a relation between the plurality of types of records.
The method comprises creating a relational database and creating data structures for storing data representing the header specification, the record specification, the field specification, and the relation specification in the relational database. The method also includes storing data representing the header specification, the record specification, the field specification, and the relation specification in the data structures. A second aspect of this invention relates to a method for storing metadata in a relational database, wherein the metadata describes data in a flat file. The metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record. The metadata has a relation specification associated therewith indicating a relation between the plurality of types of records.
The method includes creating a first data structure for storing data representing the header specification and storing the data representing the header specification in the first data structure. The method also includes creating a second data structure for storing data representing the record specification and storing the data representing the record specification in the second data structure. The method further includes creating a third data structure for storing data representing the field specification and storing the data representing the field specification in the third data structure. Still
further, the method includes creating a fourth data structure for storing data representing the relation specification and storing the data representing the relation specification in the fourth data structure.
A third aspect of the present invention is directed to a method for storing metadata in a relational database, wherein the metadata describes data in a flat file. The metadata includes (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record. The metadata has a relation specification associated therewith indicating a relation between the plurality of types of records, and the relational database has a database engine associated therewith.
The method includes obtaining data representing the header specification, constructing a first database query relating to the data representing the header specification, and passing the first database query to the database engine for execution so that the header specification can be stored in the relational database. The method also includes obtaining data representing the' record specification, constructing a second database query relating to the data representing the record specification, and passing the second database query to the database engine for execution so that the record specification can be stored in the relational database. The method further includes obtaining data representing the field specification, constructing a third database query relating to the data representing the field specification, and passing the third database query to the database engine for execution so that the field specification can be stored in the relational database. Still further, the method includes obtaining data representing the relation specification, constructing a fourth database query relating to the data representing the relation specification, and passing the fourth database query to the database engine for execution so that the relation specification can be stored in the relational database.
A fourth aspect of the present invention pertains to a method for translating a first file containing data in a first format into a second file containing data in a second format, wherein the first format differs from the second format. The method includes populating a relational database with metadata relating to the first file. The method also includes using translating software to access the data in the first file and the metadata in the relational database and translate the first file into the second file using the metadata and the data in the first file.
A fifth aspect of this invention is directed to a system for storing metadata in a relational database, wherein the metadata describes data in a flat file. The metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record. The metadata has a relation specification associated therewith indicating a relation between the plurality of types of records. The system includes a relational database and a processor in communication with the relational database. The processor is programmed to create data structures for storing data representing the header specification, the record specification, the field specification, and the relation specification in the relational database. The processor is also programmed to store data representing the header specification, the record specification, the field specification, and. the relation specification in the data structures.
BRIEF DESCRIPTION OF THE FIGURES
Representative embodiments of the present invention will be described with reference to the following figures:
FIG. 1 illustrates an exemplary general-block flat file. FIG. 2 is a block diagram illustrating a prior art e-commerce architecture.
FIG. 3 is a block diagram illustrating an e-commerce architecture in which the present invention may be used.
FIG. 4 illustrates metadata that describes the structure and content of the general-block flat file shown in FIG. 1.
FIG. 5 illustrates a relational database for storing the metadata illustrated in FIG. 4 and relation specifications.
FIG. 5A is a flowchart illustrating a process for creating data structures for storing metadata and relation specifications. FIG. 6 is a flowchart illustrating a process for processing and storing metadata in a relational database.
FIG. 7 is a flowchart illustrating a process for processing and storing a header specification in a relational database.
FIG. 8 is a flowchart illustrating a process for processing and storing a record specification in a relational database.
FIG. 9 is a flowchart illustrating a process for processing and storing a field specification in a relational database
FIG. 10 is a flowchart illustrating a process for processing and storing a relation specification in a relational database.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Reference is now made to the accompanying Figures for the purpose of describing, in detail, the preferred embodiments of the present invention. The Figures and accompanying detailed description are provided as examples of the invention and are not intended to limit the scope of the claims appended hereto.
The present invention, as defined by the claims, provides a novel and unique method and system for storing metadata in, and using metadata from, a relational database, which metadata describes data stored in a flat file. In so doing, the method and system permits standard database software to be used to securely and quickly access metadata that describes even the most complex flat files.
Referring now to the figures wherein like reference numerals identify similar elements, FIG. 3 is a block diagram illustrating an e-commerce architecture in which the present invention may be used. In particular, a first enterprise El may transfer flat file business documents to an enterprise E2. Enterprise El operates a computer 1 14 that is in communication with a computer network 130 over a communication channel
128. Computer 1 14 communicates with translator/mapper software running on a computer 106 having one or more processors as is well known. Enterprise E2 operates a computer 134 that is in communication with computer network 130 over a communication channel 132. Computer 134 communicates with translator/mapper software running on a computer 140 having one or more processors as is well known.
In one embodiment, the translator/mapper software running on computers
106 and/or 140 is well-known Electronic Document Interchange ("EDI") translator/mapper software such as IBM Corp.'s (Armonk, NY) "Data Interchange/MVS" and/or "Data Interchange/MVS-CICS" software. Of course, other translator/mapper software may be chosen as desired to suit particular file formats used by enterprises El and E2.
While the particular type of computer network 130 is not important to the present invention, according to one embodiment, computer network 130 is the Internet. However, computer network 130 may be a local area network ("LAN"), a wide area network ("WAN"), an Intranet, or some other communications network that allows enterprise El to communicate with enterprise E2. Communication channels 128 and 132 may be wired or wireless.
Enterprise El maintains a flat file FI , which is to be transferred to enterprise E2 for processing. In one embodiment, flat file FI represents a purchase order business document indicating goods and/or services that enterprise El desires to purchase from enterprise E2. In alternate embodiments, flat file FI may represent any other document or business document to be used by enterprise E2 for any purpose.
Enterprise El stores flat file FI in a data repository. The purchase order represented by flat file FI is stored in one particular format that is described by metadata Ml . Enterprise E2 will maintain the purchase order as a flat file F2, but in a format that differs from the format in which enterprise El maintains the purchase order. The format in which enterprise E2 maintains the purchase order is described by metadata M2, which differs from metadata M l . In an alternative embodiment, metadata M l and M2 may be similar. In order for enterprise E2 to identify and parse the contents of the purchase order represented by flat file FI , flat file FI is translated into flat file F2. The translation process uses an intermediate file format, which, in the present embodiment, is an Electronic Data Interchange ("EDI") format. Of course, other intermediate file formats may be used as desired. Thus, flat file FI will be translated into an EDI flat file EFl , which will be sent to enterprise E2 over computer network 130. Enterprise E2 will translate EDT flat file EFl into flat file F2, which enterprise E2 can parse.
The metadata that describes EDI flat file EFl is described by EDI metadata EDI_M, which is maintained by enterprises E l and E2. Enterprise El stores metadata M l and EDI metadata EDI_M in a relational database 500. Similarly, enterprise E2 stores metadata M2 and EDI metadata EDI_M in a relational database 210. The storage of metadata is facilitated by computer software 204 as will be explained in more detail below.
The translation of flat file FI into EDI flat file EFl is performed by translator/mapper software running on computer 106, which accesses flat file FI , metadata Ml , and EDI metadata EDI_M. Unlike the prior art, the metadata can be accessed from relational database 500 using standard database queries. The translator/mapper software uses flat file FI , metadata M l , and metadata EDI_M to translate flat file F I into EDI flat file EFl in a well-known manner. Computer 1 14 receives EDI flat file EFl from computer 106 and transmits EDI flat file EFl, over the computer network 130, to computer 134 residing in enterprise E2. The translator/mapper software running on computer 140 receives EDI flat file EFl from computer 134. This software accesses metadata M2 and EDI metadata EDI_M from relational database 210. Because the metadata is stored in relational database 210, it may be accessed using standard database queries. The translator/mapper software uses flat file EFl , metadata M2, and metadata EDI_M to translate flat file EFl into flat file F2 in a well-known manner. The output of the translator/mapper software is flat file F2. Thus, the purchase order represented by flat file F2, which corresponds to the purchase order represented by flat file FI , is in a format that enterprise E2 can parse and use.
FIG. 4 illustrates metadata that describes the structure and content of a flat file. In this example, the metadata of FIG. 4 describes the general-block flat file 300
(FIG. 1 ). Of course, metadata can be used to describe other flat files. General block flat file 300 may be considered as all or part of a purchase order requested by that file F I .
The metadata comprises a header specification 400 and block specifications 405, 410, and 415.
Header specification 400 describes the general content of a flat file, here, flat file 300. As shown in row Rl , header specification 400 includes a HeaderName and a Description. In this example, the HeaderName is "Companylnformation" and the description is "Company, Products, and Owner Information" as shown in row R2. Thus, header specification 400 indicates that the flat file contains information pertaining to companies, products that the companies manufacture, and the owner(s) of the companies.
A block specification is maintained for each type of record that is contained in a flat file. In the example shown in FIG. 4, block specifications 405, 410, and 415 each represent one of three different types of records in a flat file -- that is, block specifications 405, 410, and 415 represent a first type of delimited record, a first type of fixed- width record, and a second type of delimited record, respectively.
Each block specification comprises a record specification and a field specification. In this exemplary embodiment, block specifications 405, 410, and 415 comprise record/field specifications 405A/405B, 410A/410B, and 415A/415B, respectively.
A record specification describes the characteristics of a record. In the present embodiment, record specifications 405 A, 410A, and 415A describe the characteristics of records represented by block specifications 405, 410, and 41 .
In this example, each record specification 405A, 410A, and 410B includes a RecordName, Type, FicldDelimiter, and RowDeiimiter as shown in rows R3, R13, and R21 , respectively. The RecordNames of record specifications 405 A, 410A, and 415A contain the "Company" (R4), "Product" (R14), and "Owner" (R22), respectively. This indicates that the "Company" record type contains information pertaining to a given company, the "Product" record type contains information pertaining to a given product of
the Company, and the "Owner" record type contains information pertaining to a given owner of the company.
The Type of a record specification indicates a type of record represented by the record specification. Each Type is unique and need not be present when the flat file described by the metadata is a fixed-width or delimited flat file because they contain only one type of record. In the example shown in FIG. 4, record specifications 405 A, 410A, and 415A contain the Types "0100" (R4), "0200" (R14), and "0300" (R22), respectively.
The FieldDelimiter of a record specification represents a character that separates two consecutive fields in the record. In fixed-width records, this character need not be present. In this example, records specifications 405 A and 41 A have a comma "," and a semicolon ";" as respective field delimiters as indicated in rows R4 and R22. Row R14 of record specification 410A does not have a field delimiter because it is a fixed- width record. The RowDeiimiter of a record specification represents the character that separates two records that have the same record specification. In this example, "NEWLINE" is a RowDeiimiter for record specifications 405A and 410A as indicated in rows R4 and R 14, respectively. "NEWLINE" is used to indicate that any two consecutive records, which have the same record specification, appear on two consecutive lines in the file. A vertical bar "|" (R22) is shown as the exemplary RowDeiimiter for record specification 415A.
A field specification describes the characteristics of the fields of a record. In this embodiment, field specifications 405B, 410B, 415B describe the characteristics of the fields of records described by record specifications 405A, 410A, and 415A, respectively. Each field specification 405B, 410B, and 415B includes a FieldNa e,
DataType, Size, Position, Offset, and Null indicator as shown in rows R5, R15, and R23, respectively.
FieldName identifies the name of a field. DataType indicates the type of data that can be stored in the field. For example, DataTypes "A," "N," "AN," and "DT" indicate that alpha, numeric, alphanumeric, and date values can be stored in the field, respectively. Size indicates the maximum size of a value that may be contained in the field. Position indicates a logical position of the field in a record. Offset indicates that the value contained in the field begins in a certain column in the record (that is, the physical position in the record from where the value of a field begins). Null indicates whether or not the value of the field can be null.
As stated above, the storage of metadata in a relational database is facilitated by computer software 204, which controls one or more processor(s) of an appropriate computer in order to execute processes described herein. This is now described in more detail.
FIG. 5A illustrates a process for creating data structures in a relational database, which data structures will be used to store one or more header specifications, record specifications, field specifications, and relation specifications. As is well-known, a relation specification indicates a relation between two or more types of records. In one embodiment, a relation specification indicates a relation between two fields across two different record types in a fiat file. As is well-known, the relation specification typically does not appear explicitly as part of the metadata, but may appear as input from a user. At step 550, software 204 creates. a relational database 500 (FIG. 5) in a
Relational Database Management System (RDBMS). Any RDBMS software can be used as desired. This step 550 is executed in a well-known manner.
At step 552, software 204 creates a table 505 (FIG. 5) to store data representing header specification 400. In this embodiment, the columns in table 505 are the HcaderlD, HeaderName, and Description. The HeaderlD is the primary key column that identifies each row in table 505 uniquely. The manner in which the value contained in the HeaderlD column is generated is specific to the RDBMS. The HeaderName and
Description columns of table 505 will store the HeaderName and Description contained in row R2 of header specification 400, respectively.
At step 554, software 204 creates a table 510 (FIG. 5) to store data representing record specifications. In this embodiment, a row is created for each record specification, here, 405 A, 410A, 415A. The columns of table 510 include the RecordlD, HeaderlD, RecordName, Type, FieldDelimiter, and RowDeiimiter. The RecordlD is the primary key column that identifies each row in the table uniquely. The manner in which the value contained in the RecordlD column is generated is specific to the RDBMS.
The HeaderlD is the foreign key column that links table 510 to its parent table 505. This link establishes a relation between record specifications 405 A, 410A, and 415A and header specification 400. The value contained in this column must be present in the HeaderlD column of parent table 505. The RecordName, Type, FieldDelimiter and RowDeiimiter columns of rows in table 510 having RecordlDs 1/2/3 will store the RecordName, Type, FicldDelimiter and RowDeiimiter contained in rows R4 R14/R22 of record specifications 405A/410A/415A, respectively.
Software 204 creates a table 515 (FIG. 5) to store field specifications at step 556. In this embodiment, a row is created in table 515 for each field specification 405B, 410B, 415B. The columns of table 515 include FieldID, RecordlD, FieldName, DataType, Size, Position, Offset, and Null. The FieldID is the primary key column that identifies each row in the table uniquely. The manner in which the value contained in this column is generated is specific to the type of the RDBMS. The RecordlD is the foreign key column that links this table 515 to its parent table 510. This link establishes a relation between field specifications 405B, 410B, 15B and its corresponding record specification 405A, 410A, 415A. The value contained in this column must be present in the Recordld column of parent table 510.
The FieldName, DataType, Size, Position, Offset, and Null columns of rows in table 515 having FieldlDs 1 - 16 will store the FieldName, DataType, Size, Position, Offset, and Null values contained in rows R6-R12/R16-R20 R24-27 of field specifications 405B/410B/415B, respectively. The "VARCHAR," "DATE," and "NUMERIC" data represents alphanumeric, numeric, and date data types in table 515. The values may vary depending on the RDBMS.
At step 558, software 204 creates table 520 to store a relation specification. The columns in table 520 include a RelationID, ParentCoIumnld, and ChildColumnlD. The RelationID is the primary key column that identifies each row in table 520 uniquely. The manner in which the value contained in this column is generated is specific to the type of the RDBMS.
The ParentColumnID is the foreign key column that indicates the parent field in a relation specification. For example, the row in table 520 having RelationID 1 indicates that the Recldentifier (FieldID 1 in table 515) is the parent column. This Recldentifier field belongs to the "Company" record specification as indicated by
RecordlD 1 in table 510. The value contained in the ParentColumnID column must be present in the parent table 515.
The ChildColumnlD is the foreign key column that indicates the child field in a relation specification. For example, the row in table 520 having RelationID 1 indicates that the Recldentifier field (FieldID 8 in table 515) is the child column. This Recldentifier field belongs to the Product record specification (RecordlD 2 in table 510). The value contained in the ChildColumnlD column must be present in the parent table 515.
The relation just described indicates that a single parent record (e.g., type "0100" of row R4 of FIG. 4) could have one or more child records of a different type
(e.g., "0200" of row R14 of FIG. 4). This is similar to a primary-foreign key relation in a
relational database. Therefore, record type "0100" (R4) shares a one-to-many relation with record type "0200" (R14).
The data representing the header specification, the record specifications, the field specifications, and the relation specification is stored in the data structures. The flowchart of FIG. 6 illustrates a preferred method, which begins at step 600.
At step 605, header specification 400 is input from a user to software 204. At step 610, software 204 processes and stores header specification 400 in table 505. The flowchart in FIG. 7, which begins at step 700, illustrates step 610 in more detail.
At step 710, software 204 constructs a program data-structure containing the dates shown in row R2 of header specification 400 (FIG. 4). At step 715, software 204 stores row R2 of header specification in the program data-structure created at step 710. At step 720, software 204 constructs a Structured Query Language (SQL) query to insert the data contained in the program data-structure into a row in table 505. At step 725, software 204 passes the SQL query to a database engine, which is part of the RDBMS. At step 730, the database engine executes the SQL query and inserts the data contained in the program data-structure into an appropriate row in table 505.
Refc ing again to FIG. 6, processing proceeds to step 615 where software 204 may check whether all record specifications 405A, 410A, 415A for the current header specification 400 have been stored in relational database 500. If all such record specifications have been stored, then processing continues at step 645.
If there are record specifications to store, then the next row R4/R14/R22 of record specification 405A/410A/415A are input from a user to software 204 at step 620. At step 625, the next record specification is stored in relational database 500.
The flowchart of FIG. 8 illustrates step 625 in detail. At step 810, software 204 constructs a program data-structure containing current record specification 405 A/41 OA/415 A. At step 815, software 204 stores the current record specification 405A/410A/415A in the program data-structure created at step 810. At step 820,
software 204 constructs a Structured Query Language (SQL) query to insert the data contained in the program data-structure into a row in table 5 10. At step 825, software 204 passes the SQL query to the database engine for execution. At step 830, the database engine executes the SQL query and inserts the data contained in the current program data- structure into an appropriate row in table 510.
Referring again to FIG. 6, at step 630, software 204 may check whether all field specifications for the current record specification have been stored in table 515. If all field specifications for the current record specification have been stored in table 515, then processing returns to step 615 where the next record specification is processed. If all field specifications for the current record specification have not been stored in table 515, then the next field specification (i.e., R6-R12/R16-R20/R24-R27 of record specifications 405B/410B/415B) is input from the user to software 204 at step 635. At step 640, software 204 processes and stores the current field specification in table 515. The flowchart in FIG. 9 illustrates this process in more detail. At step 910, software 204 constructs a program data-structure containing current field specification 405B/410B/415B. At step 915, software 204 stores the current field specification the program data-structure created at step 910. At step 920, software 204 constructs a Structured Query Language (SQL) query to insert the data contained in the program data-structure into a row in table 515. At step 925, software 204 passes the SQL query to the database engine for execution.. At step 930, the database engine executes the SQL query and inserts the data contained in the program data structure into an appropriate row in table 515.
At step 645, software 204 may check whether all of the relation specifications for the current header specification 400 have been stored in table 520. If all of the relation specifications for the current header specification 400 have been stored in table 520, then processing is done at step 660. At that point, metadata for the flat file
has been stored in the relational database 500. It is noted that the metadata for other flat files can be stored in a relational database in a similar manner.
If all of the relation specifications for the current header specification 400 have not been stored in table 520, then processing continues at step 655 where software 204 processes and stores the relation specification in table 520. The flowchart of FIG. 10 illustrates the method in more detail. At step 1010, software 204 constructs a program data-structure containing the relation specification, which includes a Relationld, ParentColumnID, and the ChildColumnlD. The values of these columns may be obtained from the FieldID column in table 515 and input by a user to the software 204 at step 650. At step 1015, software 204 stores the ParentColumnID and the
ChildColumnlD in the program data-structure created at step 1010. At step 1020, software 204 constructs a Structured Query Language SQL query to insert the data contained in the program data-structure as a row in table 520. At step 1025, software 204 passes the SQL query to the database engine for execution. At step 1030, the database engine executes the SQL query and inserts the data contained in the program data structure into an appropriate row in table 520. Processing returns to step 645 and may end at step 660. The metadata and relation specification stored in relational database 500 may then be used by translator/mapper software as described above.
While the foregoing described the storage of the metadata of FIG. 4 in relational database 500, it should be appreciated, that the processes of FIGS. 5 A- 10 can be used to store any metadata in an appropriate relational database.
It is noted that while the foregoing description has referred to specific individual databases, formats, structures, records, and fields, those skilled in the art will readily appreciate that various modifications and substitutions may be made thereto without departing from the spirit and scope of the present invention.
In view of the foregoing, it is apparent that the present invention offers advantages over the prior art. In particular, with the present invention, the metadata is
stored in a standard format in relational databases. Therefore, the metadata may be accessed using industry-standard, non-proprietary database queries such as those offered in Structured Query Language (SQL). Moreover, relational databases have a standard security protocol built into the system, which ensures that the metadata remains secure. Still further, existing software tools enable storage of metadata in a variety of data repositories, although not in relational databases. Further, they mostly cater to only fixed-width and delimited files. While a small number cater to fixed-block files, such tools do not handle metadata that describe very generic types of flat files such as delimited-block files or general-block files. From that respect, the present invention is highly generic and therefore powerful because it is not restricted to storing metadata for any one single type of a flat file just described. It facilitates the storage of metadata that describe very general and complex files containing two or more different types of records that may be fixed-width, delimited, or both.
Although the particular embodiments shown and described above are useful in many applications relating to the arts to which the present invention pertains, further modifications of the present invention herein disclosed will occur to persons skilled in the art. All such modifications are deemed to be within the scope and spirit of the present invention.
Claims
What is claimed is: 1. A method for storing metadata in a relational database, wherein the metadata describes data in a flat file, wherein the metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record, wherein the metadata has a relation specification associated therewith indicating a relation between the plurality of types of records, and wherein the method comprises: (a) creating a relational database; (b) creating data structures for storing data representing the header specification, the record specification, the field specification, and the relation specification in the relational database; and (c) storing data representing the header specification, the record specification, the field specification, and the relation specification in the data structures.
2. The method of Claim 1, wherein the step of creating the data structures comprises: (a) creating a first table for storing the header specification; (b) creating a second table for storing the record specification; (c) creating a third table for storing the field specification; and (d) creating a fourth table for storing the relation specification.
3. The method of Claim 1 , wherein the step of storing data comprises: (a) storing data representing the header specification in a first table; (b) storing data representing the record specification in a second table; (c) storing data representing the field specification in a third table; (d) storing data representing the relation specification in a fourth table.
4. The method of Claim 1 , further comprising the step of providing data indicating structures of the header specification, the record specification, the field specification, and the relation specification to database software, and wherein the step of creating the data structures comprises causing the database software to create the data structures based on the provided data indicating the structures.
5. The method of Claim 4, wherein the step of providing comprises providing data indicating the record specification that describes characteristics of fixed-width and delimited records stored in the flat file.
6. The method of Claim 4, wherein the step of providing comprises providing data indicating the record specification that describes characteristics of fixed-width records stored in the flat file.
7. The method of Claim 4, wherein the step of providing comprises providing data indicating the record specification that describes characteristics of delimited records stored in the flat file.
8. A method for storing metadata in a relational database, wherein the metadata describes data in a flat file, wherein the metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record, wherein the metadata has a relation specification associated therewith indicating a relation between the plurality of types of records, and wherein the method comprises: (a) creating a first data structure for storing data representing the header specification; (b) storing the data representing the header specification in the first data structure; (c) creating a second data structure for storing data representing the record specification; (d) storing the data representing the record specification in the second data structure; (e) creating a third data structure for storing data representing the field specification; (f) storing the data representing the field specification in the third data structure; (g) creating a fourth data structure for storing data representing the relation specification; and (h) storing the data representing the relation specification in the fourth data structure.
9. The method of Claim 8, further comprising the step of providing data indicating structures of the header specification, the record specification, the field specification, and the relation specification to database software, and wherein the steps of creating the first, second, third, and fourth data structures comprises causing the database software to create the first, second, third, and fourth data structures based on the data indicating the structures.
10. The method of Claim 9, wherein the step of creating the second data structure comprises creating a data structure for storing data representing fixed-width and delimited records stored in the flat file.
1 1. The method of Claim 9, wherein the step of creating the second data structure comprises creating a data structure for storing data representing fixed-width records stored in the flat file.
12. The method of Claim 9, wherein the step of creating the second data structure comprises creating a data structure for storing data representing delimited records stored in the flat file.
13. A method for storing metadata in a relational database, wherein the metadata describes data in a flat file, wherein the metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record, wherein the metadata has a relation specification associated therewith indicating a relation between the plurality of types of records, wherein the relational database has a database engine associated therewith, and wherein the method comprises: (a) obtaining data representing the header specification; (b) constructing a first database query relating to the data representing the header specification; (c) passing the first database query to the database engine for execution so that the header specification can be stored in the relational database; (d) obtaining data representing the record specification; (e) constructing a second database query relating to the data representing the record specification; (f) passing the second database query to the database engine for execution so that the record specification can be stored in the relational database; (g) obtaining data representing the field specification; (h) constructing a third database query relating to the data representing the field specification; (i) passing the third database query to the database engine for execution so that the field specification can be stored in the relational database; (j) obtaining data representing the relation specification; (k) constructing a fourth database query relating to the data representing the relation specification; and (1) passing the forth database query to the database engine for execution so that the relation specification can be stored in the relational database.
14. The method of Claim 13, wherein steps (b), (e), (h), and (k) comprise constructing a SQL query.
15. A method for translating a first file containing data in a first format into a second file containing data in a second format, wherein the first format differs from the second format, and wherein the method comprises: (a) populating a relational database with metadata relating to the first file; and (b) using translating software, (i) accessing the data in the first file and the metadata in the relational database; and (ii) translating the first file into the second file using the metadata and the data in the first file.
16. The method of Claim 15, wherein the step of translating comprises translating the first file into the second file having an electronic document interchange format.
17. A system for storing metadata in a relational database, wherein the metadata describes data in a flat file, wherein the metadata comprises (i) a header specification describing a general content of the data stored in the flat file, (ii) a record specification describing characteristics of a plurality of types of records contained in the flat file, and (iii) a field specification describing characteristics of fields of a record, wherein the metadata has a relation specification associated therewith indicating a relation between the plurality of types of records, and wherein the system comprises: (a) a relational database; and (b) a processor in communication with the relational database, wherein the processor is programmed to (i) create data structures for storing data representing the header specification, the record specification, the field specification, and the relation specification in the relational database; and (ii) store data representing the header specification, the record specification, the field specification, and the relation specification in the data structures.
18. The system of Claim 17, wherein the processor is programmed to (a) create a first table for storing the header specification; (b) create a second table for storing- the record specification; (c) create a third table for storing the field specification; and (d) create a fourth table for storing the relation specification.
19. The system of Claim 17, wherein the processor is programmed to (a) store data representing the header specification in a first table; (b) store data representing the record specification in a second table; (c) store data representing the field specification in a third table; (d) store data representing the relation specification in a fourth table.
20. The system of Claim 17, wherein the processor is further programmed to provide data indicating structures of the header specification, the record specification, the field specification, and the relation specification to database software, and wherein the processor is programmed- to cause the database software to create the data structures based on the data indicating the structures.
21. The system of Claim 17, wherein the data indicating the structures of the record specification comprises data indicating characteristics of fixed-width and delimited records stored in the flat file.
22. The system of Claim 17, wherein the data indicating the structures of the record specification comprises data indicating characteristics of fixed-width records stored in the flat file.
23. The system of Claim 17, wherein the data indicating the structures of the record specification comprises data indicating characteristics of delimited records stored in the flat file.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US8371598P | 1998-04-30 | 1998-04-30 | |
US60/083,715 | 1998-04-30 | ||
US20392598A | 1998-12-02 | 1998-12-02 | |
US09/203,925 | 1998-12-02 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1999056230A2 true WO1999056230A2 (en) | 1999-11-04 |
WO1999056230A3 WO1999056230A3 (en) | 2000-03-09 |
Family
ID=26769633
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1999/009442 WO1999056230A2 (en) | 1998-04-30 | 1999-04-29 | Method and system for storing metadata in a relational database |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1999056230A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001052118A2 (en) * | 2000-01-14 | 2001-07-19 | Saba Software, Inc. | Information server |
WO2003027904A2 (en) * | 2001-09-27 | 2003-04-03 | Qualcomm Incorporated | Method and system for providing a unified data exchange and storage format |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5021995A (en) * | 1985-12-17 | 1991-06-04 | Wang Laboratories, Inc. | Data exchange apparatus and methods |
WO1993022734A1 (en) * | 1992-05-01 | 1993-11-11 | Wang Laboratories, Inc. | Computer method and apparatus for a table driven file parser |
-
1999
- 1999-04-29 WO PCT/US1999/009442 patent/WO1999056230A2/en active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5021995A (en) * | 1985-12-17 | 1991-06-04 | Wang Laboratories, Inc. | Data exchange apparatus and methods |
WO1993022734A1 (en) * | 1992-05-01 | 1993-11-11 | Wang Laboratories, Inc. | Computer method and apparatus for a table driven file parser |
Non-Patent Citations (2)
Title |
---|
P. DREW ET AL : "A Metadata archictecture for multi-system interoperation" 1ST IEEE METADATA CONFERENCE, 16 - 18 April 1996, pages 1-16, XP002126079 Silver Spring, Maryland, USA * |
YOW T G ET AL: "Managing data warehouse metadata using the Web: a Web-based DBA maintenance tool suite" PROCEEDINGS OF THE THIRTY-FIRST HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (CAT. NO.98TB100216), PROCEEDINGS OF THE THIRTY-FIRST HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, KOHALA COAST, HI, USA, 6-9 JAN. 1998, pages 49-54 vol.6, XP002126080 1998, Los Alamitos, CA, USA, IEEE Comput. Soc, USA ISBN: 0-8186-8255-8 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001052118A2 (en) * | 2000-01-14 | 2001-07-19 | Saba Software, Inc. | Information server |
WO2001052118A3 (en) * | 2000-01-14 | 2003-12-18 | Saba Software Inc | Information server |
WO2003027904A2 (en) * | 2001-09-27 | 2003-04-03 | Qualcomm Incorporated | Method and system for providing a unified data exchange and storage format |
WO2003027904A3 (en) * | 2001-09-27 | 2003-11-06 | Qualcomm Inc | Method and system for providing a unified data exchange and storage format |
US7089257B2 (en) | 2001-09-27 | 2006-08-08 | Qualcomm, Inc. | Method and system for providing a unified data exchange and storage format |
Also Published As
Publication number | Publication date |
---|---|
WO1999056230A3 (en) | 2000-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8051094B2 (en) | Common interface to access catalog information from heterogeneous databases | |
US8886617B2 (en) | Query-based searching using a virtual table | |
US6834287B1 (en) | Classification engine for managing attribute-based data | |
US6799184B2 (en) | Relational database system providing XML query support | |
US6959416B2 (en) | Method, system, program, and data structures for managing structured documents in a database | |
EP1639503B1 (en) | A data processing method and system | |
US6636855B2 (en) | Method, system, and program for accessing stored procedures in a message broker | |
US6615206B1 (en) | Techniques for eliminating database table joins based on a join index | |
US7076567B1 (en) | Simplified application object data synchronization for optimized data storage | |
US7647311B2 (en) | Content enhancement for analyzing data in a database | |
CN1761962B (en) | Real-time aggregation of unstructured data into structured data for SQL processing by a relational database engine | |
US7194457B1 (en) | Method and system for business intelligence over network using XML | |
US7440963B1 (en) | Rewriting a query to use a set of materialized views and database objects | |
US20120265744A1 (en) | Knowledge-based e-catalog procurement system and method | |
US20020138353A1 (en) | Method and system for analysis of database records having fields with sets | |
CA2417763A1 (en) | System and method for comparing heterogeneous data sources | |
US9537977B2 (en) | Referencing change(s) in data utilizing a network resource locator | |
US7496556B2 (en) | UPIKM database system with integrated XML interface | |
CA2347785A1 (en) | Method for maintaining exception tables for a check utility | |
US7668807B2 (en) | Query rebinding for high-availability database systems | |
WO1999056230A2 (en) | Method and system for storing metadata in a relational database | |
WO2007021254A2 (en) | Systems and methods for integrating from data sources to data target locations | |
Harrison et al. | Development of a Prototype PML Server for an Auto-ID Enabled Robotic Manufacturing Environment | |
US7093233B1 (en) | Computer-implemented automatic classification of product description information | |
US20060161589A1 (en) | Simplifying Movement of Data to Different Desired Storage Portions Depending on the State of the Corresponding Transaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): CN IN JP RU |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): CN IN JP RU |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
122 | Ep: pct application non-entry in european phase |