US20070260575A1 - System and method for managing records through establishing semantic coherence of related digital components including the identification of the digital components using templates - Google Patents

Info

Publication number
US20070260575A1
Authority
US
United States
Prior art keywords
record
data file
records
data
electronic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/797,644
Inventor
Fred Y. Robinson
Rodney J. Ripley
Roy S. Rogers
Matthew J. McKennirey
Mark J. Evans
Gregory S. Hunter
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lockheed Martin Corp
Fenestra Tech Corp
Hunter Information Management Services Inc
Tessella Inc
Original Assignee
Lockheed Martin Corp
Fenestra Tech Corp
Hunter Information Management Services Inc
Tessella Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lockheed Martin Corp, Fenestra Tech Corp, Hunter Information Management Services Inc, Tessella Inc
Priority to US11/797,644
Assigned to TESSELLA INC. Assignment of assignors interest (see document for details). Assignors: EVANS, MARK J.
Assigned to FENESTRA TECHNOLOGIES CORPORATION. Assignment of assignors interest (see document for details). Assignors: MCKENNIREY, MATTHEW J., ROGERS, ROY S., IV
Assigned to HUNTER INFORMATION MANAGEMENT SERVICES, INC. Assignment of assignors interest (see document for details). Assignors: HUNTER, GREGORY S.
Assigned to LOCKHEED MARTIN CORPORATION. Assignment of assignors interest (see document for details). Assignors: RIPLEY, RODNEY J., ROBINSON, FRED Y.
Publication of US20070260575A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/912 Applications of a database
    • Y10S707/944 Business related
    • Y10S707/948 Product or catalog
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99951 File or database maintenance
    • Y10S707/99952 Coherency, e.g. same view to multiple users
    • Y10S707/99953 Recoverability

Definitions

  • the example embodiments disclosed herein relate to systems and methods for managing records through establishing semantic coherence of related digital components including the identification of the digital components using templates.
  • One aspect of the invention is directed to an architecture that will support operational, functional, physical, and interface changes as they occur.
  • a suite of commercial off-the-shelf (COTS) hardware and software products has been selected to implement and deploy an embodiment of the invention in the ERA, but the inventive architecture is not limited to these products.
  • COTS: commercial off-the-shelf
  • the architecture facilitates seamless COTS product replacement without negatively impacting the ERA system.
  • Another aspect of the ERA is to preserve and to provide ready access to authentic electronic records of enduring value.
  • the ERA supports and flows from NARA's mission to ensure “for the Citizen and the Public Servant, for the President and the Congress and the Courts, ready access to essential evidence.” This mission facilitates the exchange of vital ideas and information that sustains the United States of America.
  • NARA is responsible to the American people as the custodian of a diverse and expanding array of evidence of America's culture and heritage, of the actions taken by public servants on behalf of American citizens, and of the rights of American citizens.
  • the core of NARA's mission is that this essential evidence must be identified, preserved, and made available for as long as authentic records are needed—regardless of form.
  • An aspect of the invention involves an integrated ERA solution supporting NARA's evolving business processes to identify, preserve, and make available authentic, electronic records of enduring value—for as long as they are needed.
  • the ERA can be used to store, process, and/or disseminate a private institution's records. That is, in an embodiment, the ERA may store records pertaining to a private institution or association, and/or the ERA may be used by a first entity to store the records of a second entity.
  • System solutions, no matter how elegant, may be integrated with the institutional culture and organizational processes of the users.
  • Since 1934, NARA has developed effective and innovative processes to manage the records created or received, maintained or used, and destroyed or preserved in the course of public business transacted throughout the Federal Government. NARA played a role in developing this records lifecycle concept and related business processes to ensure long-term preservation of, and access to, authentic archival records. NARA also has been instrumental in developing the archival concept of an authentic record that consists of four fundamental attributes: content, structure, context, and presentation.
  • NARA has been managing electronic records of archival value since 1968, longer than almost anyone in the world. Despite this long history, the diverse formats and expanding volume of current electronic records pose new challenges and opportunities for NARA as it seeks to identify records of enduring value, preserve these records as vital evidence of our nation's past, and make these records accessible to citizens and public servants in accordance with statutory requirements.
  • the ERA should support, and may affect, the institution's (e.g., NARA's) evolving business processes. These business processes mirror the records lifecycle and are embodied in the agency's statutory authority:
  • the ERA solution provides an integrated and automated capability to manage electronic records from: the identification and capture of records of enduring value; through the storage, preservation, and description of the records; to access control and retrieval functions.
  • the archival mission is to identify, preserve, and make available records of enduring value, regardless of form.
  • This three-part archival mission is the core of the Open Archival Information System (OAIS) Reference Model, expressed as ingest, archival storage, and access.
  • OAIS: Open Archival Information System
  • one ERA solution is built around the generic OAIS Reference Model (presented in FIG. 1), which supports these core archival functions through data management, administration, and preservation planning.
  • the ERA may coordinate with the front-end activities of the creation, use, and maintenance of electronic records by Federal officials. This may be accomplished through the implementation of disposition agreements for electronic records and the development of templates or schemas that define the content, context, structure, and presentation of electronic records along with lifecycle data referring to these records.
  • the ERA solution may complement NARA's other activities and priorities, e.g., by improving the interaction between NARA staff and their customers (in the areas of scheduling, transfer, accessioning, verification, preservation, review and redaction, and/or ultimately the ease of finding and retrieving electronic records).
  • As with NARA itself, the scope of ERA includes the management of electronic and non-electronic records, permanent and temporary records, and records transferred from Federal entities as well as those donated by individuals or organizations outside of the government. Each type of record is described and/or defined below.
  • ERA and Non-Electronic Records: Although the focus of ERA is on preserving and providing access to authentic electronic records of enduring value, the system's scope also includes, for example, management of specific lifecycle activities for non-electronic records. ERA will support a set of lifecycle management processes (such as those used for NARA) for appraisal, scheduling, disposition, transfer, accessioning, and description of both electronic and non-electronic records. A common systems approach to appraisal and scheduling through ERA will improve the efficiency of such tasks for non-electronic records and help ensure that permanent electronic records are identified as early as possible within the records lifecycle. This same common approach will automate aspects of the disposition, transfer, accessioning, and description processes for all types of records, which will result in significant workflow efficiencies.
  • Archivists, researchers, and other users may realize benefits by having descriptions of both electronic and non-electronic records available together in a powerful, universal catalog of holdings.
  • some of ERA's capabilities regarding non-electronic records may come from subsuming the functionality of legacy systems such as the Archival Research Catalog (ARC).
  • ARC: Archival Research Catalog
  • ERA also may maintain data interchange with (but not subsume) other legacy systems and likely future systems related to non-electronic records.
  • Permanent and Temporary Records: There is a fundamental archival distinction between records of enduring historic value, such as those that NARA must retain forever (e.g., permanent records) and those records that a government must retain for a finite period of time to conduct ongoing business, meet statutory and regulatory requirements, or protect rights and interests (e.g., temporary records).
  • NARA identifies these distinctions during the record appraisal and scheduling processes and they are reflected in NARA-approved disposition agreements and instructions. Specific records are actually categorized as permanent or temporary during the disposition and accessioning processes. NARA takes physical custody of all permanent records and some temporary records, in accordance with approved disposition agreements and instructions. While all temporary records are eventually destroyed, NARA ultimately acquires legal (in addition to physical) custody over all permanent records.
  • ERA may address the distinction between permanent and temporary records at various stages of the records life-cycle. ERA may facilitate an organization's records appraisal and scheduling processes where archivists and transferring entities may use the system to clearly identify records as either permanent or temporary in connection with the development and approval of disposition agreements and instructions. The ERA may use this disposition information in association with the templates to recognize the distinctions between permanent and temporary records upon ingest and manage these records within the system accordingly.
  • NARA's Records Center Program (RCP) is exploring offering its customers an ERA service to ingest and store long-term temporary records in persistent formats. To the degree that the RCP opts to facilitate their customers' access to the ERA for appropriate preservation of long-term temporary electronic records, this same coordination relationship with transferring entities through the RCP will allow NARA to effectively capture permanent electronic records earlier in the records lifecycle. In the end, ERA may also provide for the ultimate destruction of temporary electronic records.
  • ERA and Donated Materials: In addition to federal records, NARA also receives and accesses donated archival materials. Such donated collections comprise a significant percentage of NARA's Presidential Library holdings, for example. ERA may manage donated electronic records in accordance with deeds of gift or deposit agreements which, when associated with templates, may ensure that these records are properly preserved and made available to users. Although donated materials may involve unusual disposition instructions or access restrictions, ERA should be flexible enough to adapt to these requirements. Since individuals or institutions donating materials to NARA are likely to be less familiar with ERA than federal transferring entities, the system may also include guidance and tools to help donors and the NARA appraisal staff working with them ensure proper ingest, preservation, and dissemination of donated materials.
  • Systems are designed to facilitate the work of users, and not the other way around.
  • One or more of the following illustrative classes of users may interact with the ERA: transferring entity; appraiser; records processor; preserver; access reviewer; consumer; administrative user; and/or a manager.
  • the ERA may take into account data security, business process re-engineering, and/or systems development and integration.
  • the ERA solution also may provide easy access to the tools the users need to process and use electronic records holdings efficiently.
  • NARA must meet challenges relating to archival of massive amounts of information, or the American people risk losing essential evidence that is only available in the form of electronic federal records. But beyond mitigating substantial risks, the ERA affords such opportunities as:
  • a system for ingesting, storing, and/or disseminating information may include an ingest module, a storage module, and a dissemination module that may be accessed by a user via one or more portals.
  • a system and method for automatically identifying, preserving, and disseminating archived materials may include extreme scale archive storage architecture with redundancy or at least survivability, suitable for the evolution from terabytes to exabytes, etc.
  • an electronic records archives comprising an ingest module to accept a file and/or a record, a storage module to associate the file or record with information and/or instructions for disposition, and an access or dissemination module to allow selected access to the file or record.
  • the ingest module may include structure and/or a program to create a template to capture content, context, structure, and/or presentation of the record or file.
  • the storage module may include structure or a program to preserve authenticity of the file or record over time, and/or to preserve the physical access to the record or file over time.
  • the access module may include structure and/or a program to provide a user with ability to view/render the record or file over time, to control access to restricted records, to redact restricted or classified records, and/or to provide access to an increasing number of users anywhere at any time.
  • the ingest module may include structure or a program to auto-generate a description of the file or record.
  • Each record may be transformed, e.g., using a framework that wraps and computerizes the record in a self-describing format with appropriate metadata to represent information in the template.
  • the ingest module may include structure or a program to process a Submission Information Package (SIP), and/or an Archival Information Package (AIP).
  • the access module may include structure or a program to process a Dissemination Information Package (DIP).
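As a rough illustration of wrapping a record in a self-describing format with metadata that represents the template, the following Python sketch builds a small XML package carrying descriptive metadata alongside the record's bytes. The element names, the `wrap_record` and `unwrap_record` functions, and the SHA-256 digest are assumptions made for illustration; the source does not specify a package layout.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_record(record_id: str, payload: bytes, metadata: dict) -> str:
    """Wrap a record's bytes and metadata in one XML package (hypothetical layout)."""
    package = ET.Element("package", {"id": record_id})
    meta = ET.SubElement(package, "metadata")
    # One child element per template attribute, e.g. content, context, structure.
    for name in sorted(metadata):
        ET.SubElement(meta, name).text = metadata[name]
    data = ET.SubElement(package, "payload", {"encoding": "base64"})
    data.text = base64.b64encode(payload).decode("ascii")
    # Assumed choice: a digest lets a later reader check the payload is unchanged.
    data.set("sha256", hashlib.sha256(payload).hexdigest())
    return ET.tostring(package, encoding="unicode")


def unwrap_record(xml_text: str) -> bytes:
    """Recover the payload and verify it against the stored digest."""
    data = ET.fromstring(xml_text).find("payload")
    payload = base64.b64decode(data.text)
    if hashlib.sha256(payload).hexdigest() != data.get("sha256"):
        raise ValueError("payload digest mismatch")
    return payload
```

Keeping the metadata and the payload in one document means the package can be interpreted without consulting an external catalog, which is one plausible reading of "self-describing" here.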
  • Independent aspects of the invention may include the ingest module alone or one or more aspects thereof, the storage module alone or one or more aspects thereof, and/or the access module alone or one or more aspects thereof.
  • Still further aspects of the invention relate to methods for carrying out one or more functions of the ERA or components thereof (ingest module, storage module, and/or access module).
  • an ERA may be provided to address some or all of the more general problems.
  • archives systems exist for storing and preserving electronic assets, which are stored as digital data. Typically, these assets are preserved for a period of time (retention time) and then deleted.
  • metadata may include one or more of the following:
  • archives systems are designed to retain data for years or sometimes decades, but not longer. As retention times of assets become very long or indefinite, longevity of the archives system itself, as well as the assets archived, is needed because an archives system's basic requirement is to preserve assets.
  • Archives systems today are built on top of underlying storage systems based on commercial products that are typically comprised of file systems (e.g., Sun's ZFS file system) or relational databases (e.g., Oracle), and sometimes proprietary systems (e.g., EMC Centera). All of these storage systems have limitations in terms of scale (though sometimes the limits can be quite high). In some cases, there may be no products that can make use of the full scale of available file systems. Few of these systems can scale to trillions of entries (e.g., files). Limitations arise for different reasons but can be related to one or more of the following factors, alone or in combination:
  • Relational databases can scale only to 10 billion objects per instance. Relational DBs also generally do not perform as well as file systems for simple search and retrieval function tasks because they tend to introduce additional overhead to meet other requirements such as fine-grained transactional integrity. There is also no viable product that integrates multiple file systems in a way that provides both extreme scaling and longevity suitable for an archives file system.
  • a method for managing electronic records.
  • Each electronic record comprises a data file, a plurality of data files, a portion of a data file, or portions of a plurality of data files.
  • the electronic records comprise a plurality of record types and data file types.
  • the method comprises forming a data file set comprising one or more logically related data files; identifying attributes of each record type in a record type template; identifying specifications of each data file type in a data file type template; and extracting digital components from the data file set, wherein the extracted digital components relate to the attributes in each record type template and the specifications in each data file type template and comprise an individual record.
  • an electronic record archive for managing electronic records.
  • Each electronic record comprises a data file, a plurality of data files, a portion of a data file, or portions of a plurality of data files.
  • the electronic records comprise a plurality of record types and data file types.
  • the electronic record archive comprises a data file set comprising one or more logically related data files; a record type template for each record type, each record type template identifying attributes of each record type; a data file type template for each data file type, each data file type template identifying specifications of each data file type; and a digital component extractor configured to extract digital components from the data file set.
  • the extracted digital components relate to the attributes in each record type template and the specifications in each data file type template and comprise an individual record.
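The template-driven extraction described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class names, the use of a file extension as the only "specification" of a data file type, and the `extract_record` function are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class DataFileTypeTemplate:
    """Specifications of one data file type (here reduced to a file extension)."""
    name: str
    extension: str


@dataclass
class RecordTypeTemplate:
    """Attributes of one record type, each mapped to the data file type supplying it."""
    name: str
    attributes: dict  # attribute name -> data file type name


@dataclass
class Record:
    """An individual record assembled from extracted digital components."""
    record_type: str
    components: dict = field(default_factory=dict)  # attribute name -> file name


def extract_record(data_file_set, record_template, file_templates):
    """For each attribute in the record template, pick the first file of the matching type."""
    by_name = {t.name: t for t in file_templates}
    record = Record(record_type=record_template.name)
    for attribute, file_type in record_template.attributes.items():
        spec = by_name[file_type]
        matches = [f for f in data_file_set if f.lower().endswith(spec.extension)]
        if matches:
            record.components[attribute] = matches[0]
    return record
```

For example, a data file set of `["memo.txt", "photo.tif"]` with a record type whose `body` attribute is a text file and whose `attachment` attribute is an image would yield one record with both components attached.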
  • FIG. 1 is a reference model of an overall archives system
  • FIG. 2 is a chart demonstrating challenges and solutions related to certain illustrative aspects of the present invention
  • FIG. 3 illustrates the notional life cycle of records as they move through the ERA system, in accordance with an example embodiment
  • FIG. 4 illustrates the ERA System Functional Architecture from a notional perspective, delineating the system-level packages and external system entities, in accordance with an example embodiment
  • FIG. 5 illustrates a digital component extractor model according to the present invention
  • FIG. 6 illustrates an XML Schema as a template for content and structure of a record
  • FIG. 7 illustrates an instance of the template of FIG. 6.
  • FIG. 8 illustrates an XSL template for defining the presentation of the instance of FIG. 7.
  • NARA: U.S. National Archives and Records Administration
  • the implementations described for storage, processing, and/or access to information can also apply to any institution that requires and/or desires automated archiving and/or preservation of its information, e.g., documents, email, corporate IP/knowledge, etc.
  • the term “institution” includes at least government agencies or entities, private companies, publicly traded corporations, universities and colleges, charitable or non-profit organizations, etc.
  • electronic records archive (ERA) is intended to encompass a storage, processing, and/or access archives for any institution, regardless of nature or size.
  • FIG. 2 relates specific electronic records challenges to the components of the OAIS Reference Model (ingest, archival storage, access, and data management/administration), and summarizes selected relevant research areas.
  • the ERA needs to identify and capture all components of the record that are necessary for effective storage and dissemination (e.g., content, context, structure, and presentation). This can be especially challenging for records with dynamic content (e.g., websites or databases).
  • NARA will not fulfill its mission simply by storing electronic records of archival value. Through the ERA, these records will be used by researchers long after the associated application software, operating system, and hardware all have become obsolete. The ERA also may apply and enforce access restrictions to sensitive information while at the same time ensuring that the public interest is served by consistently removing access restrictions that are no longer required by statute or regulation.
  • Migrations are potentially lossy transformations, so techniques are needed to detect and measure any actual loss.
  • the system may reduce the likelihood of such loss by applying statistical sampling, based on human judgment for example, backed up with appropriate software tools, and/or institutionalized in a semi-automatic monitoring process.
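One way to picture the statistical sampling mentioned above is the Python sketch below, which draws a random sample of migrated records and flags those that a comparison function judges to have lost information. The `estimate_loss` function, its parameters, and the idea of passing the comparison in as a callable (standing in for human judgment or a software tool) are assumptions for illustration only.

```python
import random


def estimate_loss(originals, migrated, is_equivalent, sample_size, seed=0):
    """Estimate the share of migrated records with information loss from a random sample.

    originals and migrated map record ids to content; is_equivalent(a, b) is a
    caller-supplied check (hypothetical stand-in for review by a person or tool).
    Returns (estimated loss rate, list of flagged record ids).
    """
    ids = sorted(originals)
    if not ids or sample_size <= 0:
        return 0.0, []
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    chosen = rng.sample(ids, min(sample_size, len(ids)))
    flagged = [i for i in chosen if not is_equivalent(originals[i], migrated[i])]
    return len(flagged) / len(chosen), flagged
```

The flagged ids could then be routed to a reviewer, which fits the semi-automatic monitoring process the text describes.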
  • Table 1 summarizes the “lessons learned” by the Applicants from experience with migrating different types of records to a Persistent Object Format (POF).
  • POF: Persistent Object Format
  • PDF: Portable Document Format
  • ISO are currently developing, with assistance from NARA, a standard version of PDF specifically designed for archival purposes (PDF/A). This format has the benefit that it forces some ambiguities in the original to be removed.
  • Adobe and Microsoft are evolving towards using native XML for their document formats.
  • Images: TIFF is a widely accepted open standard format for raster images and is a good candidate in the short to medium term for a POF.
  • the XML-based Scalable Vector Graphics format is an attractive option, particularly as it is a W3C open standard.
  • Databases: The contents of a database should be converted to a POF rather than being maintained in the vendor's proprietary format. Migration of the contents of relational database tables to an XML or flat file format is relatively straightforward. However, in some cases, it is also desirable to represent and/or preserve the structure of the database.
  • BLOB: Binary Large Object
  • a further challenge with database preservation is that of preserving not only the data, but the way that the users created and viewed the data. In some cases this may depend on stored queries and stored procedures forming the database; in others it may depend on external applications interacting with the database. To preserve such “executable” aspects of the database “as a system” is an area of ongoing research.
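To make the table-to-XML migration discussed above concrete, the following Python sketch exports one SQLite table to XML and records the column names and declared types so that some of the table's structure is preserved with its contents. SQLite, the `table_to_xml` function, and the element names are assumptions chosen for a self-contained example; the source does not name a database product or an XML layout.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Export a table's column definitions and rows as XML (hypothetical layout).

    The table name is interpolated into SQL, so it must come from a trusted source.
    """
    root = ET.Element("table", {"name": table})
    columns = ET.SubElement(root, "columns")
    names = []
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    for _cid, name, col_type, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
        names.append(name)
        ET.SubElement(columns, "column", {"name": name, "type": col_type})
    rows = ET.SubElement(root, "rows")
    for values in conn.execute(f'SELECT * FROM "{table}"').fetchall():
        row = ET.SubElement(rows, "row")
        for name, value in zip(names, values):
            cell = ET.SubElement(row, "value", {"column": name})
            if value is None:
                cell.set("null", "true")  # keep NULL distinct from an empty string
            else:
                cell.text = str(value)
    return ET.tostring(root, encoding="unicode")
```

This covers only table contents and column definitions; stored queries, stored procedures, and application behavior are outside what such an export captures.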
  • the structural relationship between the different files in a web-site should be maintained.
  • the fact that most web-sites include external as well as internal links should be managed in designing a POF for web-sites.
  • the boundary of the domain to be archived should be defined and an approach decided on for how to deal with links to files outside of that domain.
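Defining the boundary of the archived domain, as described above, amounts to deciding for each link whether its target falls inside or outside that boundary. The Python sketch below does this by host name; the `classify_links` function and the choice of a host-name set as the boundary are illustrative assumptions, and a real policy might also consider paths or link depth.

```python
from urllib.parse import urljoin, urlparse


def classify_links(page_url, hrefs, archive_hosts):
    """Split a page's links into those inside and outside the archived domain.

    archive_hosts is the set of host names treated as inside the boundary
    (an assumed way of expressing the boundary).
    """
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page
        host = urlparse(absolute).hostname or ""
        (internal if host in archive_hosts else external).append(absolute)
    return internal, external
```

Internal links can then be rewritten to point at archived copies, while external links are handled by whatever approach the archive has decided on, such as recording the target URL without following it.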
  • Many modern web sites are actually applications where the navigation and formatting are generated dynamically from executed pages (e.g., Active Server Pages or Java Server Pages).
  • the actual content, including the user's preferences on what content is to be presented, is managed in a database. In this case, there are no simple web pages to archive, as different users may be presented with different material at different times. This situation overlaps with our discussion above of databases and the applications which interact with them.
  • the WAV and AVI formats are the de facto standards and therefore a likely basis for POFs.
  • For video there are a number of MPEG formats in general use, with varying degrees of compression. While it is desirable that only lossless compression techniques are used for archiving, if lossy compression was used in the original format, the information it discarded cannot be recaptured in a POF.
  • One aspect is to encourage the evolution and enhancement of third-party migration software products by providing a framework into which such commercial off-the-shelf (COTS) software products could become part of the ERA if they meet appropriate tests.
  • the format may need to be migrated to a non-permanent but more modern, proprietary format (this is known as Enhanced Preservation). Even POFs are not static, since they still need executable software to interpret them, and future POFs may need to be created that have less feature loss than an older format. Thus, the ERA may allow migrated files to be migrated again into a new and more robust format in the future. Through the Dutch Testbed Project, the Applicants have found that it is normally better to return to the original file(s) whenever such a re-migration occurs.
  • certain example embodiments may revert to an original version of the document and migrate it to a POF accordingly, whereas certain other example embodiments may not be able to migrate the original document (e.g., because it is unavailable, in an unsupported format, etc.) and thus may instead, or in addition, migrate the already-migrated file.
  • a new version of a record may be derived from an original version of the record if it is available or, if the original is not available, the new version may be derived from any other already existing derivative version (e.g., of the original).
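The source-selection rule above (prefer the original, otherwise fall back to an existing derivative) can be sketched in Python as follows. The `Version` class and `choose_migration_source` function are hypothetical, and picking the most recently created derivative as the fallback is an assumption; the text only says that any existing derivative may be used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    """One stored version of a record (hypothetical model)."""
    label: str
    is_original: bool
    available: bool
    created: int  # ordering key, e.g. a sequence number


def choose_migration_source(versions: List[Version]) -> Optional[Version]:
    """Return the version to migrate from, or None if nothing is usable."""
    usable = [v for v in versions if v.available]
    originals = [v for v in usable if v.is_original]
    if originals:
        return originals[0]  # returning to the original is preferred
    # Assumed fallback policy: the newest available derivative.
    return max(usable, key=lambda v: v.created, default=None)
```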
  • an extensible POF for certain example embodiments may be provided.
  • the ERA may comprise an ingest module to accept a file and/or a record, a storage module to associate the file or record with information and/or instructions for disposition, and an access or dissemination module to allow selected access to the file or record.
  • the ingest module may include structure and/or a program to create a template to capture content, context, structure, and/or presentation of the record or file.
  • the storage module may include structure and/or a program to preserve authenticity of the file or record over time, and/or to preserve the physical access to the record or file over time.
  • the access module may include structure or a program to provide a user with ability to view/render the record or file over time, to control access to restricted records, to redact restricted or classified records, and/or to provide access to an increasing number of users anywhere at any time.
  • FIG. 3 illustrates the notional life cycle of records as they move through the ERA system, in accordance with an example embodiment. Records flow from producers, who are persons or client systems that provide the information to be preserved, and end up with consumers, who are persons or client systems that interact with the ERA to find preserved information of interest and to access that information in detail. The Producer also may be a “Transferring Entity.”
  • Disposition Agreement contains disposition instructions, and also a related Preservation and Service Plan.
  • Producers submit records to the ERA System in a SIP.
  • the transfer occurs under a pre-defined Disposition Agreement and Transfer Agreement.
  • the ERA System validates the transferred SIP by scanning for viruses, ensuring the security access restrictions are appropriate, and checking the records against templates.
  • the ERA System informs the Producer of any potential problems, and extracts metadata (including descriptive data, described in greater detail below), creates an Archival Information Package (or AIP, also described in greater detail below), and places the AIP into Archival Storage.
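The ingest flow described above (validate the SIP, report problems to the Producer, extract metadata, create an AIP, place it in Archival Storage) might be sketched in Python as below. All names are hypothetical. The checks shown are simple stand-ins rather than a real virus scanner or template engine, and refusing to store a SIP that has problems is an assumption the source does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SIP:
    """Submission Information Package as received from a Producer."""
    producer: str
    files: Dict[str, bytes]
    metadata: Dict[str, str]


@dataclass
class AIP:
    """Archival Information Package placed into Archival Storage."""
    producer: str
    files: Dict[str, bytes]
    metadata: Dict[str, str]


Check = Callable[[SIP], List[str]]  # each check returns a list of problem messages


def template_check(required_fields: List[str]) -> Check:
    """Stand-in for checking records against templates: require metadata fields."""
    def check(sip: SIP) -> List[str]:
        return [f"missing metadata field: {name}"
                for name in required_fields if name not in sip.metadata]
    return check


def no_empty_files(sip: SIP) -> List[str]:
    """Placeholder where a virus scan or access-restriction check would plug in."""
    return [f"empty file: {name}" for name, data in sip.files.items() if not data]


def ingest(sip: SIP, checks: List[Check], storage: List[AIP]) -> List[str]:
    """Validate a SIP; if clean, build an AIP and store it. Returns problems found."""
    problems = [msg for check in checks for msg in check(sip)]
    if problems:
        return problems  # assumption: a SIP with problems is reported, not stored
    metadata = dict(sip.metadata, file_count=str(len(sip.files)))  # extracted metadata
    storage.append(AIP(sip.producer, dict(sip.files), metadata))
    return []
```

Modeling each validation step as a callable keeps the pipeline open to new checks without changing `ingest` itself.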
  • archivists may perform Archival Processing, which includes developing arrangement, description, finding aids, and other metadata. These tasks will be assigned to archivists based on relevant policies, business rules, and management discretion. Archival processing supplements the Preservation Description Information metadata in the archives.
  • archivists may perform Preservation Processing, which includes transforming the records to authentically preserve them. Policies, business rules, Preservation and Service Plans, and management discretion will drive these tasks. Preservation processing supplements the Preservation Description Information metadata in the archives, and produces new (transformed) record versions.
  • archivists may perform Access Review and Redaction, which includes performing mediated searches, verifying the classification of records, and coordinating redaction of records where necessary. These tasks will be driven by policies, business rules, and access requests. Access Review and Redaction supplements the Preservation Description Information metadata in the archives, and produces new (redacted) record versions. Also, at any time after the AIP has been placed into Archival Storage, Consumers may search the archives to find records of interest.
  • FIG. 4 illustrates the ERA System Functional Architecture from a notional perspective, delineating the system-level packages and external system entities, in accordance with an example embodiment.
  • the rectangular boxes within the ERA System boundary represent the six system-level packages.
  • the ingest system-level package includes the means and mechanisms to receive the electronic records from the transferring entities and prepares those electronic records for storage within the ERA System, while the records management system-level package includes the services necessary to manage the archival properties and attributes of the electronic records and other assets within the ERA System as well as providing the ability to create and manage new versions of those assets.
  • Records Management includes the management functionality for disposition agreements, disposition instructions, appraisal, transfer agreements, templates, authority sources, records life cycle data, descriptions, and arrangements. In addition, access review, redaction, and selected archival management tasks for non-electronic records, such as the scheduling and appraisal functions, are also included within the Records Management service.
  • the Preservation system-level package includes the services necessary to manage the preservation of the electronic records to ensure their continued existence, accessibility, and authenticity over time.
  • the Preservation system-level service also provides the management functionality for preservation assessments, Preservation and Service Level plans, authenticity assessment and digital adaptation of electronic records.
  • the Archival Storage system-level package includes the functionality to abstract the details of mass storage from the rest of the system. This abstraction allows this service to be appropriately scaled and allows new technology to be introduced independently of the other system-level services according to business requirements.
  • the Dissemination system-level package includes the functionality to manage search and access requests for assets within the ERA System. Users have the capability to generate search criteria, execute searches, view search results, and select assets for output or presentation.
  • the architecture provides a framework to enable the use of multiple search engines offering a rich choice of searching capabilities across assets and their contents.
  • the Local Services and Control (LS&C) system-level package includes the functional infrastructure for the ERA Instance including a user interface portal, user workflow, security services, external interfaces to the archiving entity and other entities' systems, as well as the interfaces between ERA Instances. All external interfaces are depicted as flowing through LS&C, although the present invention is not so limited.
  • the ERA System contains a centralized monitoring and management capability called ERA Management.
  • the ERA Management hardware and/or software may be located at an ERA site.
  • the Systems Operations Center (SOC) provides the system and security administrators with access to the ERA management Virtual Local Area Network.
  • SOC manages one or more Federations of Instances based on the classification of the information contained in the Federation.
  • FIG. 5 is a federation of ERA instances, in accordance with an example embodiment.
  • the federation approach is described in greater detail below, although it is important to note here that the ERA and/or the asset catalog may be structured to work with and/or enable a federated approach.
  • the ERA's components may be structured to receive, manage, and process a large amount of assets and collections of assets. Because of the large amount of assets and collections of assets, it would be advantageous to provide an approach that scales to accommodate the same. Beyond the storage of the assets themselves, a way of understanding, accessing, and managing the assets may be provided to add meaning and functionality to the broader ERA. To serve these and/or other ends, an asset catalog including related, enabling features may be provided.
  • the asset catalog and storage system federator may address the following underlying problems, alone or in various combinations:
  • Electronic records are manifested, in some way, as electronic data files.
  • requirements for managing the relationship between electronic records and data files include, but are not limited to: 1) ensuring that all data files stored in the system are associated with the records they constitute; 2) specifying the relationship of each ingested data file with an electronic record; 3) specifying the relationship of each transformed data file to an electronic record; and 4) verifying the data files associated with electronic records contained in a transfer.
  • the present invention addresses this complexity through an intermediate layer called a digital component extractor, which establishes a bridge between electronic records and data files.
  • This bridge allows archivists and transferring entities to model the true semantic relationship between individual electronic records and data files.
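  • As a minimal sketch of the fourth requirement above (verifying the data files associated with the electronic records in a transfer), the following Python compares the files present in a transfer with the files its records claim. The function name, the dictionary layout, and the file names are hypothetical illustrations, not part of the ERA design.

```python
from typing import Dict, Iterable, List


def verify_transfer(data_files: Iterable[str],
                    record_to_files: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Compare the files present in a transfer with the files its records claim.

    Returns two sorted lists: files that no record references ("orphaned")
    and files that a record references but the transfer lacks ("missing").
    """
    present = set(data_files)
    claimed = {name for files in record_to_files.values() for name in files}
    return {
        "orphaned": sorted(present - claimed),
        "missing": sorted(claimed - present),
    }


# Hypothetical transfer: two records, three files actually delivered.
report = verify_transfer(
    ["ledger_balance.xls", "ledger_pl.xls", "notes.tmp"],
    {
        "financial-ledger": ["ledger_balance.xls", "ledger_pl.xls"],
        "baptismal-0042": ["baptism_0042.doc"],
    },
)
```

  • An empty report in both lists would indicate that every stored data file is associated with a record and that every record's files arrived.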
  • record means a unit of recorded information created, received, and maintained as evidence or information by an organization or person, in pursuance of legal obligations or the transaction of business.
  • a record can be said to exhibit a characteristic known as strong “semantic coherence,” which is implied by the “unit of recorded information” phrase in the definition of a record.
  • semantic coherence is defined as a conceptual meaning that is closely related through connections and consistency, and holds together firmly as parts of the same mass.
  • Semantic coherence covers a scale, from weak (no coherence) to strong (high coherence), and the exact point on the scale for any particular set of information will involve subjective (archival) judgment.
  • a record represents conceptual meaning that “sticks together” strongly enough on the semantic coherence scale to be considered an individual record.
  • strong semantic coherence is the characteristic that allows a distinction between one particular record and another particular record.
  • With paper records, archivists often do not identify individual records, due to time and resource constraints. Instead, archivists typically manage records in the aggregate. With electronic records, archivists may have the capability and desire to identify individual electronic records as standard practice.
  • Each individual record has an attribute that defines its particular “record type.”
  • record type refers to the abstract form of the records, such as letter, memo, greeting card, or portrait, etc.
  • each record type represents a distinctive class of electronic records defined by their form.
  • a record type represents a distinctive class of records defined by their function or use.
  • a parish church will typically maintain many different types of electronic records, including baptismal records, deeds to parish properties, ledgers of the parish financial accounts, minutes of parish meetings, and official parish correspondence. Each of these different record types has a distinct intellectual form. For example, baptismal records almost always list at least the name of the person baptized, the date and place of birth, and the date and place of the baptism. In contrast, financial account ledger records might include a chart of accounts with debit/credit entries. It would be rather surprising to find an infant's birth date in a financial ledger.
  • The abstract form of a record type is specified by a “record type template.”
  • a record type template is a template that identifies specific attributes for a specific type of record.
  • the record type template specifies the essential characteristics of the record, which are used to ensure authenticity.
  • FIG. 5 illustrates the relationship between a record and a record type template.
  • a record type template specifies the form of a record.
  • the Record Type Template also specifies the essential characteristics of the record, which are used to ensure authenticity as documented in co-pending, commonly assigned U.S. Application (Attorney Docket No 4870-25), entitled SYSTEM AND METHOD FOR PRESERVATION OF DIGITAL RECORDS.
  • Record aggregate means an intellectual aggregation of documentary material arising because they result from the same accumulation of filing process, the same function, or the same activity; have a particular form; or because of some other relationship arising out of their creation, receipt, or use; or because the aggregate was required for the purposes of archival arrangement.
  • Record aggregates may be composed of other record aggregates, or records.
  • Record aggregates can themselves be accumulated and organized into higher order record aggregates.
  • An archivist might place military service records into an aggregate for the branch of the military (e.g., Army) which itself is within an aggregate for the Department of Defense, which itself is within an aggregate for the Federal Government.
  • Record aggregates may follow standard levels: record groups, collections, series, file units, and items. Each record aggregate has name and title attributes which help identify it. Record aggregates may be composed of other record aggregates, or electronic records. FIG. 5 illustrates the relationship between electronic records and record aggregates.
  • Record aggregates may either be homogeneous, i.e., they contain electronic records of the same record type, or heterogeneous, i.e., they contain electronic records of different record types.
  • record aggregates have a degree of semantic coherence—they are organized according to principles of original order and provenance, which ensures that related electronic records are aggregated together.
  • the semantic coherence that binds together a record aggregate is somewhat weaker than the semantic coherence that binds together a particular individual record.
  • an individual record within an aggregate has an independent identity because its semantic coherence is “strong enough” to be considered a record.
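  • The nesting of record aggregates described above can be sketched as a small recursive structure. This is an illustration only: the class and method names are assumptions, and the sample titles are made up around the military service example given earlier.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Record:
    title: str
    record_type: str


@dataclass
class RecordAggregate:
    name: str
    members: List[Union["RecordAggregate", Record]] = field(default_factory=list)

    def records(self) -> Iterator[Record]:
        """Yield every record beneath this aggregate, at any depth."""
        for member in self.members:
            if isinstance(member, Record):
                yield member
            else:
                yield from member.records()

    def is_homogeneous(self) -> bool:
        """True when all contained records share one record type."""
        return len({r.record_type for r in self.records()}) <= 1


army = RecordAggregate("Army", [
    Record("Service record A", "military service record"),
    Record("Service record B", "military service record"),
])
defense = RecordAggregate("Department of Defense", [army])
federal = RecordAggregate("Federal Government", [defense])
```

  • Adding a record of a different type anywhere in the hierarchy would make every enclosing aggregate heterogeneous.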
  • data files represent the atomic unit of recorded information for computers. Where electronic records are conceptual in nature, data files are clearly physical.
  • data file means: 1) a collection of data that is stored together and treated as a unit by a computer software application; and 2) related data (e.g., numeric, textual, and/or graphic information) and fields that are organized in a strictly prescribed form and format. This definition includes two characteristics of data files, which are described in more detail below.
  • the first characteristic is that data files typically require interpretation by a computer software application, which the OAIS model calls “access software.”
  • the OAIS definition for “access software” is a type of software that presents part of or all of the information content of an Information Object in forms understandable to humans or systems.
  • Presentation processing is defined as the software processing algorithms (including transformation, consolidation, tabulation, formatting, rendering, querying, filtering, interpretation, etc.) which access software employs to present the information contained in data files in a form understandable to humans.
  • Presentation processing covers a scale, from low (little to no processing required) to high (complex processing required), and the exact point on the scale for any particular set of information will involve subjective judgment. Presentation processing often involves presenting data files visually, but could also include presenting data files audibly or through any other human sensory perception.
  • Some data files are “eye readable” with minimal presentation processing. “Eye readable” is defined as data files whose information is inherently understandable to humans through visual inspection using access software that supports minimal presentation processing.
  • a fixed-length tabular dataset might be composed of one data file that structures tabular data into a regular row/column format that can easily be read and understood by a person. In this case, using access software might be optional.
  • a single web page might be composed of dozens of individual data files.
  • the web page might include multiple Hyper-Text Markup Language (HTML) data files, multiple Cascading Style Sheet (CSS) data files, client-side JavaScript script files, and multiple image files in various formats, such as Graphics Interchange Format (GIF) and Portable Network Graphics (PNG).
  • the second characteristic is that data files have a prescribed form and format.
  • the above examples reference several data file formats, including Hyper-Text Markup Language (HTML) and Microsoft Word's native binary (DOC).
  • This prescribed form and format is specified by a “data file type template.”
  • data file type template means a set of specifications about a data type that governs its format and behaviors.
  • Data files are often aggregated to facilitate management and presentation processing.
  • the web page is composed of many individual data files, which is known as a “data file set.”
  • the term “data file set” means one or more data files that are logically related for purposes of presentation processing by access software.
  • Data file sets can either be “explicit,” or “implicit.” “Explicit” data file sets are defined by information contained in the data files, whereas “implicit” data file sets are defined through inscrutable software processing algorithms.
  • FIG. 5 illustrates the relationship between data files, data file type templates, data file sets, and access software.
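  • For an “explicit” data file set, the membership can be read from the data files themselves. As a hedged sketch, the Python below uses the standard library HTML parser to collect the files a web page refers to; the class name, function name, and sample page are hypothetical, and only `link`, `script`, and `img` references are followed.

```python
from html.parser import HTMLParser
from typing import List


class LinkedFileCollector(HTMLParser):
    """Collect the files an HTML page references through link, script, and img tags."""

    def __init__(self) -> None:
        super().__init__()
        self.files: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("href"):
            self.files.append(attributes["href"])
        elif tag in ("img", "script") and attributes.get("src"):
            self.files.append(attributes["src"])


def explicit_data_file_set(root_name: str, html_text: str) -> List[str]:
    """Return the root page followed by the files it references, in document order."""
    collector = LinkedFileCollector()
    collector.feed(html_text)
    return [root_name] + collector.files


sample_page = (
    '<html><head><link rel="stylesheet" href="site.css">'
    '<script src="menu.js"></script></head>'
    '<body><img src="logo.gif"><img src="photo.png"></body></html>'
)
file_set = explicit_data_file_set("index.html", sample_page)
```

  • An “implicit” data file set offers no such information inside the files, so this approach would not apply to it.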
  • Electronic records are conceptual and data files are physical. Electronic records are manifested in some way as electronic data files, but the manner in which the electronic records are manifested must first be determined.
  • An individual record may be composed of:
  • the parish church maintains each baptismal record as a separate word processing document data file, and its financial ledger as a separate spreadsheet data file. In this case, there is a one-to-one correspondence between a record and each data file.
  • the parish church maintains two separate spreadsheet data files for its financial ledger record, one spreadsheet for the balance statement and a second spreadsheet for the profit/loss statement.
  • one record is composed of multiple data files.
  • the parish church has a sophisticated content management software application to manage all of its documents.
  • the content management application stores all documents (including baptismal records, correspondence, financial ledgers, etc.) in one single database data file. In this case, one record is composed of a portion of one data file.
  • the parish church has a sophisticated content management software application to manage all of its documents.
  • the content management application stores all documents in one single database data file and all metadata about the documents in a separate database data file. In this case, one record is composed of portions of multiple data files.
  • In Examples 10-13, the intellectual form, content, and number of electronic records remain fixed, while the relationship of those electronic records to data files varies, depending on the particulars of how the parish church manages and uses its data files at a specific point in time.
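  • The four record-to-data-file relationships in the parish church examples can be modeled with one small structure. The sketch below is an assumption made for illustration: `FilePortion`, `describe_mapping`, the byte ranges, and the file names do not come from the ERA design.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FilePortion:
    """A data file, or a byte range within it (None means the whole file)."""
    data_file: str
    byte_range: Optional[Tuple[int, int]] = None


def describe_mapping(portions: List[FilePortion]) -> str:
    """Classify how one record maps onto data files."""
    files = {p.data_file for p in portions}
    partial = any(p.byte_range is not None for p in portions)
    if len(files) == 1:
        return "portion of one data file" if partial else "one data file"
    return "portions of multiple data files" if partial else "multiple data files"


baptismal = [FilePortion("baptism_0042.doc")]
ledger = [FilePortion("balance.xls"), FilePortion("profit_loss.xls")]
cms_document = [FilePortion("content.db", (1024, 4096))]
cms_document_with_metadata = [
    FilePortion("content.db", (1024, 4096)),
    FilePortion("metadata.db", (0, 256)),
]
```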
  • the present invention provides a solution to the gap between electronic records and data files by adding a logical view which transforms between the conceptual and physical views.
  • the present invention provides a “digital component extractor.”
  • digital component extractor is defined as a software component that extracts digital components from a data file set, guided by a set of instructions.
  • a “digital component” is defined herein as a set of digital information that exhibits strong semantic coherence and is expressed as a bit stream.
  • the purpose of the digital component extractor is to extract digital components from data files in a data file set that together comprise a record.
  • FIG. 5 illustrates the model, which bridges the gap between electronic records and data files.
  • Digital component extractors establish the map between data files and electronic records, and because this map is many-to-many, the exact method by which digital component extractors extract digital components varies.
  • the digital component extractor simply needs to return the specified data file as the digital component. For example, a digital component extractor for a record that corresponds to a single word processing document data file would simply return that data file as the digital component.
  • the digital component extractor includes an algorithm to extract portions of the specified data file. For example, a digital component extractor for a record that corresponds to an e-mail archive data file would extract individual e-mails as digital components.
  • the digital component extractor includes an algorithm to extract portions of the specified data files. For example, a digital component extractor for a record that corresponds to a document spread across multiple database tables (and data files) in a content management software application would perform appropriate queries on those database tables to extract the digital component.
  • digital component extractors contain the instructions necessary to extract digital components from data file sets.
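  • The three extraction cases above can be sketched as interchangeable functions over a data file set. This is a simplified illustration under stated assumptions: a data file set is modeled as a dictionary of file name to bytes, the mail archive uses the mbox convention of a leading "From " line, and JSON files stand in for database tables. None of these names or formats is prescribed by the text.

```python
import json
from typing import Callable, Dict, List

DataFileSet = Dict[str, bytes]
Extractor = Callable[[DataFileSet], List[bytes]]


def whole_file_extractor(name: str) -> Extractor:
    """Case 1: the record is exactly one data file."""
    def extract(files: DataFileSet) -> List[bytes]:
        return [files[name]]
    return extract


def mail_archive_extractor(name: str) -> Extractor:
    """Case 2: each digital component is a portion of one data file.

    Simplification: any line starting with b"From " begins a new message.
    """
    def extract(files: DataFileSet) -> List[bytes]:
        messages: List[bytes] = []
        current: List[bytes] = []
        for line in files[name].splitlines(keepends=True):
            if line.startswith(b"From ") and current:
                messages.append(b"".join(current))
                current = []
            current.append(line)
        if current:
            messages.append(b"".join(current))
        return messages
    return extract


def joined_extractor(content_name: str, metadata_name: str, doc_id: str) -> Extractor:
    """Case 3: the component is assembled from portions of several data files."""
    def extract(files: DataFileSet) -> List[bytes]:
        body = json.loads(files[content_name])[doc_id]
        meta = json.loads(files[metadata_name])[doc_id]
        component = {"body": body, "metadata": meta}
        return [json.dumps(component, sort_keys=True).encode("utf-8")]
    return extract


sample_files = {
    "baptism_0042.doc": b"...word processing bytes...",
    "inbox.mbox": b"From alice@example.org\nSubject: A\n\nHello\n"
                  b"From bob@example.org\nSubject: B\n\nHi\n",
    "content.json": b'{"doc-7": "Minutes of the parish meeting"}',
    "metadata.json": b'{"doc-7": {"type": "minutes"}}',
}
emails = mail_archive_extractor("inbox.mbox")(sample_files)
```

  • Because every extractor has the same call shape, the rest of a system could treat the resulting digital components uniformly regardless of how they were obtained.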
  • Table 2 documents the approaches for specifying digital component extractors, and their advantages and disadvantages.
  • Approach: The transferring entity defines the digital component extractors early in the records lifecycle, as the records are still in active use. Advantages: The transferring entity defines semantic coherence early, which ensures that the information contained in the data files is accessible. Disadvantages: Requires up-front planning and investment by the transferring entity, plus a change in how the transferring entity manages information.
  • Approach: The transferring entity (with assistance from the archivist) defines the digital component extractors after-the-fact, as part of preparing to transfer the electronic records to ERA. Advantages: The transferring entity (with assistance from the archivist) generally has the subject area domain knowledge and technical knowledge to properly define semantic coherence. Disadvantages: Requires a large time and resource investment at the exact point (records management offices) at which transferring entities are overburdened.
  • Approach: The ERA system itself imputes digital component extractors from record type templates and data type templates. Advantages: The system can make reasonable assumptions about the digital component extractors in an automated manner. Disadvantages: A human might make better assumptions than the automated ones, based on subjective judgment.
  • Approach: An archivist defines the digital component extractors after-the-fact, during archival processing. Advantages: The archivist generally has the subject area domain knowledge and technical knowledge to properly define semantic coherence. Disadvantages: Requires a large time and resource investment from the archivist, which may not scale to meet the electronic record archive's expected ingest volumes.
  • Approach: The electronic record archive system itself imputes semantic coherence, and therefore digital component extractors, from the data file content. Advantages: The system can apply linguistic and pattern matching algorithms to determine appropriate digital component extractors in an automated manner. Disadvantages: This is an area of on-going computer science research, and at this time this requires further development.
  • the record type template indicates a particular set of records is correspondence, and the data file template indicates the data file is in Microsoft Outlook (PST) format.
  • a reasonable set of digital component extractors can be imputed that extract individual e-mails into separate digital components. Each digital component represents an individual e-mail, which exhibits strong semantic coherence.
  • the record type template indicates a particular set of records is geospatial information, and the data file template is in an unknown proprietary format that is not human readable and not documented. ERA cannot impute a reasonable set of digital component extractors because it is not aware of the data type format.
  • the ERA of the present invention will create a default set of digital component extractors, known as “placeholder digital component extractors,” which are defined as a set of digital component extractors that assume each data file is a single digital component.
  • the levels of available preservation, access, and authenticity services that the ERA of the present invention can provide may be constrained for electronic records with placeholder digital component extractors, so these should be the exception rather than the norm.
  • placeholder digital component extractors are only consistent with the most basic level of service in ERA.
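  • One way to picture imputation with a placeholder fallback is a lookup keyed by record type and data file format. The sketch below is an assumption for illustration: the rule table, the function names, and the toy message separator are hypothetical, and real PST parsing would need a dedicated library.

```python
from typing import Callable, Dict, List, Tuple

DataFileSet = Dict[str, bytes]
Extractor = Callable[[DataFileSet], List[bytes]]


def placeholder_extractor(files: DataFileSet) -> List[bytes]:
    """Placeholder behavior: treat each data file as one digital component."""
    return [files[name] for name in sorted(files)]


def split_mail_archive(files: DataFileSet) -> List[bytes]:
    """Toy stand-in for a mail-archive extractor.

    Messages are separated by a made-up marker line so the example stays runnable.
    """
    messages: List[bytes] = []
    for name in sorted(files):
        messages.extend(m for m in files[name].split(b"\n--MSG--\n") if m)
    return messages


# Hypothetical rule table keyed by (record type, data file format).
IMPUTATION_RULES: Dict[Tuple[str, str], Extractor] = {
    ("correspondence", "PST"): split_mail_archive,
}


def impute_extractor(record_type: str, data_file_format: str) -> Extractor:
    """Pick a known extractor, or fall back to the placeholder."""
    return IMPUTATION_RULES.get((record_type, data_file_format), placeholder_extractor)


mail_extractor = impute_extractor("correspondence", "PST")
geo_extractor = impute_extractor("geospatial information", "unknown")
```

  • In this sketch the geospatial case falls through to the placeholder, mirroring the situation where the data type format is not known to the system.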
  • An approach to generating identifiers according to the present invention involves using a cryptographic hash algorithm (such as SHA-256) based on the initial content of the thing being identified. This approach meets the required constraints.
  • identity which is independent of its content.
  • identity of a record is independent of the content digital components and/or data files that make up any particular version of that record.
  • New versions of electronic records can arise from redaction and preservation activities, and each record version will have its own independent identifier that is related back to the record.
  • the identifier will be generated from the content of the entity when it is first created within ERA and immutable thereafter.
  • the identifier for electronic records would be generated and assigned when the record is created within ERA based on the content of the first version's digital components, and that identifier would be immutable thereafter.
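  • A minimal sketch of this identifier scheme, using SHA-256 from the Python standard library, is shown below. Several details are assumptions added for the example and are not stated in the text: the "record" and "version" tags that keep the two identifier spaces apart, the length prefix on each component, and the class and method names.

```python
import hashlib
from typing import List, Sequence


def content_identifier(kind: str, components: Sequence[bytes]) -> str:
    """SHA-256 over a kind tag followed by length-prefixed components."""
    digest = hashlib.sha256(kind.encode("utf-8"))
    for component in components:
        # The length prefix keeps component boundaries unambiguous.
        digest.update(len(component).to_bytes(8, "big"))
        digest.update(component)
    return digest.hexdigest()


class ElectronicRecord:
    def __init__(self, first_components: Sequence[bytes]) -> None:
        # Derived once from the first version's content, never recomputed.
        self._identifier = content_identifier("record", first_components)
        self.version_ids: List[str] = []
        self.add_version(first_components)

    @property
    def identifier(self) -> str:
        """Read-only record identifier."""
        return self._identifier

    def add_version(self, components: Sequence[bytes]) -> str:
        """Register a new version and return its own identifier."""
        version_id = content_identifier("version", components)
        self.version_ids.append(version_id)
        return version_id


record = ElectronicRecord([b"Baptismal record, full text"])
original_id = record.identifier
record.add_version([b"Baptismal record, [redacted] text"])
```

  • After the second version is added, the record identifier is unchanged while each version carries its own identifier.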
  • the notion of digital components and digital component extractors has some interesting implications for preservation.
  • the InterPARES I Preservation Task Force states “It is impossible to preserve an electronic record. It is only possible to preserve the ability to reproduce an electronic record.” (“Preserving Electronic Records”, Presentation on the work of the InterPARES I Preservation Task Force, Jun. 19, 2002.)
  • the preservation strategy of the present invention ensures the digital component extractors produce digital components that authentically represent the record. This means that digital component extractors must honor the essential characteristics associated with the record (and which are specified in the record type template).
  • the process of redaction involves deleting specific content from a record to produce a new version of the record, and the new version of the record typically has reduced access restrictions.
  • most redaction tools redact content from data files, so the present invention will support this approach. This means that redaction will occur against data files, which will produce a new version of the data files, and the digital component extractors will produce new digital components from these redacted data files. This process will result in a new version of the record, that is composed of redacted digital components that have been extracted from redacted data files.
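  • The redaction flow just described (redact the data files, then re-run the extractors to obtain a new record version) might be sketched as follows. The function names and sample content are hypothetical, and simple byte-string deletion stands in for whatever a real redaction tool would do.

```python
from typing import Dict, List

DataFileSet = Dict[str, bytes]


def redact_data_files(files: DataFileSet, sensitive: List[bytes]) -> DataFileSet:
    """Return new data files with each sensitive byte string deleted.

    The input is left untouched, so the unredacted version stays available.
    """
    redacted = {}
    for name, data in files.items():
        for phrase in sensitive:
            data = data.replace(phrase, b"")
        redacted[name] = data
    return redacted


def whole_file_components(files: DataFileSet) -> List[bytes]:
    """Stand-in extractor: one digital component per data file."""
    return [files[name] for name in sorted(files)]


original_files = {"letter.txt": b"Dear Sir, my SSN is 123-45-6789. Regards."}
redacted_files = redact_data_files(original_files, [b"123-45-6789"])

record_versions = [
    whole_file_components(original_files),  # version 1: restricted
    whole_file_components(redacted_files),  # version 2: redacted
]
```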
  • the digital component extractors of the present invention will be executed to produce a physical representation of a digital component.
  • a digital component would be a bit stream serialized as a managed file within the system.
  • the digital component extractors will be executed on-demand to produce a transient digital component, as needed.
  • a digital component would be a transient in-memory bit stream.
  • the present invention allows for both options, and the decision on which to use will be a matter of policy and design.
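  • The two options can be contrasted in a few lines of Python. This is only a sketch: the stand-in extractor, the file naming pattern, and the use of a temporary directory are assumptions made so the example runs on its own.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator, List

DataFileSet = Dict[str, bytes]
Extractor = Callable[[DataFileSet], List[bytes]]


def one_component_per_file(files: DataFileSet) -> List[bytes]:
    """Stand-in extractor used only for this illustration."""
    return [files[name] for name in sorted(files)]


def materialize(extractor: Extractor, files: DataFileSet, directory: str) -> List[Path]:
    """Option 1: run the extractor once and keep each component as a managed file."""
    paths = []
    for index, component in enumerate(extractor(files)):
        path = Path(directory) / f"component-{index:04d}.bin"
        path.write_bytes(component)
        paths.append(path)
    return paths


def transient(extractor: Extractor, files: DataFileSet) -> Iterator[bytes]:
    """Option 2: run the extractor on demand and hand back in-memory bit streams."""
    yield from extractor(files)


source_files = {"a.dat": b"alpha", "b.dat": b"beta"}
workdir = tempfile.mkdtemp()
stored_paths = materialize(one_component_per_file, source_files, workdir)
streamed = list(transient(one_component_per_file, source_files))
```

  • Materializing trades storage for faster repeated access, while the transient form keeps the data files as the only stored copy.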
  • Templates play a large part in NARA's vision of the ERA both as a means to manage electronic records, in respect to scheduling, and as a means to preserve records, in respect to defining preservation formats and processing.
  • the present invention utilizes a taxonomy of templates and the relationships between templates and instances of templates to identify and manage records.
  • the present invention also utilizes the relationship between hierarchical templates and hierarchical information using a matrix. Furthermore, the present invention provides for managing templates.
  • template may be associated with all of the following:
  • templates are being used according to the present invention to:
  • Using XML technologies as an example, the following describes templates, and instances of documents that conform to or are generated by those templates, that might be used in the preservation and presentation of a document displayed on a web page.
  • the first template is an XML schema that defines the structure of the record catalog which lists the digital objects that are part of the web page and their hierarchical relationships. An instance of that template is a selection from the record catalog for the page in question.
  • the next template might be an XML schema that defines the content and structure of the document that is to be displayed on the page.
  • Each data element in the document is defined.
  • the relationship(s) of each data element to other data elements are also defined.
  • an instance of the template of FIG. 6 is an XML document (the textual content of the document) that conforms to that schema and which includes the data elements and content of the type defined in the schema.
  • the instance has data elements described in the schema that hold values, which are also consistent with the schema.
  • the next template might be an XSL template that defines the presentation of that XML instance in HTML on the web page (or as in some other format such as PDF).
  • the XSL template may be a stylesheet, or other type of template, and can be used to describe how an XML instance that conforms to an XML schema will be presented or displayed, for example as HTML or a PDF file.
  • the template can also be used to transform an XML document into a variety of other formats, as well as into a different XML document.
  • templates may orchestrate a sequence of pages.
  • the instantiation of that template is the web page—which is the record that is being preserved.
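  • The chain from schema to instance to presentation can be imitated with the Python standard library. Note the substitutions: the standard library has no XSLT processor or XML Schema validator, so a list of required element names stands in for the schema and a plain function stands in for the XSL template. The element names and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stand-in for the XML schema: elements every instance must carry.
REQUIRED_ELEMENTS = ("title", "author", "body")


def conforms(document: ET.Element) -> bool:
    """Crude conformance check: every required child element is present."""
    return all(document.find(name) is not None for name in REQUIRED_ELEMENTS)


def present_as_html(document: ET.Element) -> str:
    """Stand-in for the XSL template: render the instance as an HTML page."""
    if not conforms(document):
        raise ValueError("instance does not conform to the template")
    title = escape(document.findtext("title", default=""))
    author = escape(document.findtext("author", default=""))
    body = escape(document.findtext("body", default=""))
    return (f"<html><head><title>{title}</title></head>"
            f"<body><h1>{title}</h1><p>By {author}</p><p>{body}</p></body></html>")


instance = ET.fromstring(
    "<letter><title>Parish Notice</title><author>J. Smith</author>"
    "<body>Meeting at 7 &amp; 8 pm.</body></letter>"
)
rendered_page = present_as_html(instance)
```

  • A production system would more likely apply a real XSL stylesheet with an XSLT processor; the point here is only the separation between the content instance and the presentation template.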
  • Additional templates may be involved in defining the behavior of a web application, including templates that define the work flow within the application, templates that define the orchestration of pages within the application and templates that describe the animation of items on a page.
  • Table 3 provides an overview of some of the types of templates that may occur in the ERA of the present invention. Although each example has been mapped to an appropriate XML syntax that might be used to create the template, it should be appreciated that the present invention is not limited to the use of any particular format. It should also be appreciated that the list of templates in Table 3 is not intended to be exhaustive. There are many possible applications for templates and there are other XML technologies, and non-XML technologies, which may be used.
  • System Components (an information component of the system, or description of a component of the system):
  • Template: Structure of Authority Sources and Thesauri. Syntax: XML Schema. Example: Authority Sources.
  • Template: Structure and content of Persistent Object Formats (POF) *(1). Syntax: XML Schema. Example: Persistent Formats where content is primarily words, numbers, vectors, etc.
  • Template: Structure and content of Persistent Object Formats (POF) *(1). Syntax: BSDL. Example: Persistent Formats where content is primarily images, sound, etc.
  • Template: Digital Adaptation processing templates (non-exhaustive list) *(2). Syntax: XSL/T. Example: Data type specific instructions to transform from one data type to another.
  • Template: Presentation of multimedia records. Syntax: SMIL. Example: Templates to define interactions between multiple digital items in multimedia presentations.
  • 5. System Metadata:
  • Template: Description and versioning of templates. Syntax: XML Schema. Example: Disposition Agreement template.
  • Templates may be used to define the relationships between records in the archives, such as defining the original order of records, the structure of the record catalog, and the structure of transfers to the archives or the delivery of copies to users (Submission Information Packages and Dissemination Information Packages).
  • Capturing the original order of a record represents a case where a template can be used within a template.
  • the structure of the Record Catalog can be described in a template that defines the information elements that make up an entry in the catalog.
  • the content of some of those information elements may be other templates, or they may become values in the instantiation of an object that conforms to another template.
  • Templates may be used to define the content and structure of records schedules and other Life Cycle Documents.
  • Templates may be used to define the structure of record description, and the elements of information that compose the metadata of records.
  • a template for Archival Metadata, which includes description and life cycle data, will define which elements of information must be present, what type of information they should contain, and how they are related to each other.
  • Templates may be used as inputs to processes that transform digital objects in the archive, including templates that may be used to define the presentation of assets to users.
  • the System component templates cover the widest variety of use of templates. This includes defining persistent object formats, defining the information needed by a processor to render those formats in a current format, defining the choreography and behaviors of objects in aggregate multimedia records, etc.
  • the System Components will be constantly evolving, adding new templates as new digital technologies evolve. Each type of system component will have its own family of templates.
  • Templates may be used to define the structure of component description.
  • the ERA system will archive itself and be self-describing. Templates will define elements of information needed for components to be self describing.
  • Templates may also be used to define the nature and rights of entities and the access restrictions on assets in the archive.
  • a records-centric access model will define restrictions and rights in relation to records using the internal structure of the records themselves. Templates will define the instructions on records and create the framework for aligning identity—role—authorization to protect the records.
  • Templates may further be used to describe system services and orchestrate services within work flow processes.
  • the Service Architecture describes the arrangement and delivery of services in the ERA system of the present invention, including the work flow processes and the functionality at each step in the process. Templates, expressed for example in Business Process Execution Language (BPEL), may be used to describe the orchestration of functional services, and at a lower level, describe the inputs and outputs to each individual functional services, using for example Web Services Description Language (WSDL).
  • a hierarchical scheme according to the present invention may be implemented for managing templates.
  • the introduction of hierarchy to the management of templates adds another level of abstraction.
  • a template abstracts from a specific instance to the general case. Such a template is associated to a single type of object.
  • Another layer of abstraction may be added that can be applied to any of: 1) the template, 2) the content which it controls, or 3) both.
  • The template becomes a mirror of the organization of objects into increasingly larger aggregate structures, which is a method of organization common to the ERA system of the present invention as a whole.
  • Templates can have a hierarchical connotation either because: (a) the template itself can only be instantiated with reference to a hierarchy of templates which collectively define its content, or (b) the object the template describes can only be instantiated with reference to a hierarchy of digital items or conceptual arrangements of digital items.
  • instantiating the template requires retrieving elements from within different templates within a hierarchy.
  • Life Cycle Data document templates Transfer Agreements, Disposition Agreements, etc
  • the template hierarchy might look like:
  • ERA.xsd elements common to the ERA, such as identifiers
  • This may be implemented by having each template in each child level of the template hierarchy begin with an <include/> instruction that incorporates in the child template all the data elements described in its parent, which in turn will <include/> all the data elements in its parent, etc.
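  • As a minimal sketch of the include chaining described above, the following Python fragment resolves the full element list of a child template by walking up through its parents. Plain dictionaries stand in for schema files. Apart from ERA.xsd, the template names, the element names, and the resolve_elements function are hypothetical and chosen only for illustration; a real implementation would parse XML Schema documents.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for a template hierarchy.
# "include" names the parent template (None for the root).
TEMPLATES: Dict[str, dict] = {
    "ERA.xsd": {"include": None, "elements": ["Identifier"]},
    "LifeCycleData.xsd": {"include": "ERA.xsd", "elements": ["LifeCycleStatus"]},
    "TransferAgreement.xsd": {
        "include": "LifeCycleData.xsd",
        "elements": ["TransferringEntity"],
    },
}


def resolve_elements(name: str, templates: Dict[str, dict] = TEMPLATES) -> List[str]:
    """Return every element visible in a template, ancestors' elements first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = name
    # Walk from the child up to the root, guarding against circular includes.
    while current is not None:
        if current in seen:
            raise ValueError(f"circular include at {current}")
        seen.add(current)
        chain.append(current)
        current = templates[current]["include"]
    # Emit root elements first so a child sees everything its parents define.
    elements: List[str] = []
    for template_name in reversed(chain):
        elements.extend(templates[template_name]["elements"])
    return elements
```

  • Under these assumptions, resolving the hypothetical TransferAgreement.xsd yields the elements of ERA.xsd first, then those of each intermediate level, then its own.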
  • The template for archival metadata may include elements of information, some of which are associated with a record catalog item that represents the concept of the entire record (the parent or root element of the record), while other elements of information are associated with individual digital items that are components of the record.
  • Metadata that describes the <Origin> of the record will likely be associated with the highest level in the record hierarchy, the “//Curie Collection” level, as the description of <Origin> applies to all the documents in that collection.
  • Metadata that describes the <Digital Object Type> of a specific document will be associated with a specific document, such as “//Curie Collection/Professional Papers/Research Activities/Reagents”.
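  • The association of metadata with different levels of a record hierarchy can be sketched as follows. This is an illustrative assumption rather than the described system's implementation: the path names come from the example above, while the metadata values, the METADATA_BY_PATH store, and the effective_metadata function are hypothetical.

```python
from typing import Dict

# Metadata attached at specific levels of the record hierarchy.
# Values are placeholders invented for illustration.
METADATA_BY_PATH: Dict[str, Dict[str, str]] = {
    "//Curie Collection": {"Origin": "example origin"},
    "//Curie Collection/Professional Papers/Research Activities/Reagents": {
        "Digital Object Type": "example type"
    },
}


def effective_metadata(
    path: str, store: Dict[str, Dict[str, str]] = METADATA_BY_PATH
) -> Dict[str, str]:
    """Merge metadata from the root of the hierarchy down to `path`."""
    prefix = "//"
    if not path.startswith(prefix):
        raise ValueError("path must start with '//'")
    parts = path[len(prefix):].split("/")
    merged: Dict[str, str] = {}
    # Visit each ancestor level in turn; deeper levels override shallower ones.
    for depth in range(1, len(parts) + 1):
        level = prefix + "/".join(parts[:depth])
        merged.update(store.get(level, {}))
    return merged
```

  • In this sketch a document inherits the collection-level Origin and adds its own Digital Object Type, which mirrors the need to traverse the record hierarchy to assemble a complete metadata document.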
  • templates and hierarchies can be presented in a matrix as shown in Table 4.
  • the templates either derived from a hierarchy or self-contained.
  • the conforming content again either derived from a hierarchy or self-contained.
  • the matrix below illustrates where some types of templates may fall in the matrix.
  • Template is Hierarchical: the template is an aggregation of elements from a hierarchy of templates. Document conformance cannot be tested without including elements from the hierarchy.
      • Conforming content is self-contained: Life Cycle Document templates, where the template is Life Cycle Document + generic Life Cycle template Elements.
      • Conforming content is hierarchical: Archival metadata. The schema for metadata may be instantiated by aggregating schemas within a hierarchy of metadata schemas, and the conforming metadata document may be created from the aggregation of all metadata elements traversing a record hierarchy.
  • Template is Self-Contained: the template is a self-contained object.
      • Conforming content is self-contained: System metadata, such as persistent format definitions; Service Architecture templates.
      • Conforming content is hierarchical: n/a.
  • each template is both a functional component of the system and a record in the system.
  • the template is treated the same as any other record, with its own metadata, life cycle management, and preservation.
  • the ERA system of the present invention may be regarded, therefore, as an aggregate record, with its own hierarchy of documents, so that part of our ERA record hierarchy might look like
  • Each instance of a system component, including templates, has its own archival metadata (metadata that describes a record). This latter metadata makes the component self-describing.
  • a WSDL file is an instance of the template for defining a service and a BPEL file is an instance of the template that defines a work flow.
  • the archival metadata of the WSDL file will include information such as;
  • the present invention may use another template, the Archival Metadata schema, as the template to describe the service as a component of the system.
  • Templates will evolve as ERA evolves. As such, templates, as records in ERA, will be versioned and managed. Life cycle data elements or records will include the version of the templates they use. Versioning will allow new templates to be introduced without creating problems with validation. Whether life cycle content that is subject to validation against templates should be updated as templates evolve will be a policy decision applied to each template.
  • Each process to update a template may be a standard work flow in the ERA, and described in its own template, which will include appropriate approval and authorization steps as determined in policy.
  • Templates, as records, will have their own fixity information to ensure their integrity, and the life cycle data of objects modified by templates will record which version of which template was used.
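  • Template versioning and fixity can be sketched as below. The text does not specify a fixity mechanism, so the use of SHA-256 digests here is an assumption, and the TemplateRegistry class, the life_cycle_entry function, and all field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TemplateRegistry:
    """Hypothetical store of template versions, each with a fixity digest."""

    _versions: Dict[str, List[Tuple[bytes, str]]] = field(default_factory=dict)

    def add_version(self, name: str, content: bytes) -> int:
        """Store a new version and return its number (numbering starts at 1)."""
        digest = hashlib.sha256(content).hexdigest()  # assumed fixity method
        self._versions.setdefault(name, []).append((content, digest))
        return len(self._versions[name])

    def verify(self, name: str, version: int) -> bool:
        """Recompute the digest and compare it with the stored fixity value."""
        content, digest = self._versions[name][version - 1]
        return hashlib.sha256(content).hexdigest() == digest


def life_cycle_entry(object_id: str, template: str, version: int) -> dict:
    """Record which version of which template was applied to an object."""
    return {"object": object_id, "template": template, "template_version": version}
```

  • Because earlier versions are retained rather than overwritten, content validated against version 1 can still be checked against version 1 after version 2 is introduced.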
  • the concept of managing templates can be extended to apply to every component of the system.
  • Each software component of the ERA system should be described and held in the ERA. This applies to platform applications, web application components, any client side components, as well as all the functionality wrapped in web services which can be managed within the concept of managing templates as described above.

Abstract

A method for managing electronic records is provided. Each electronic record includes a data file, a plurality of data files, a portion of a data file, or portions of a plurality of data files. The electronic records include a plurality of record types and data file types. The method includes forming a data file set comprising one or more logically related data files; identifying attributes of each record type in a record type template; identifying specifications of each data file type in a data file type template; and extracting digital components from the data file set. The extracted digital components relate to the attributes in each record type template and the specifications in each data file type template and compose an individual record. An electronic record archive includes record type and data file type templates and a digital component extractor.

Description

    CROSS-REFERENCES TO RELATED APPLICATIONS
  • This application claims the benefit of U.S. Applications 60/802,875, filed May 24, 2006, and 60/797,754, filed May 5, 2006, each of which is incorporated herein by reference in its entirety.
  • FIELD OF THE INVENTION
  • The example embodiments disclosed herein relate to systems and methods for managing records through establishing semantic coherence of related digital components including the identification of the digital components using templates.
  • BACKGROUND AND SUMMARY OF THE INVENTION
  • Since the earliest history, various institutions (e.g., governments and private companies alike) have recorded their actions and transactions. Subsequent generations have used these archival records to understand the history of the institution, the national heritage, and the human journey. These records may be essential to support the efficiency of the institution, to protect the rights of individuals and businesses, and/or to ensure that the private company or public corporation/company is accountable to its employees/shareholders and/or that the Government is accountable to its citizens.
  • With the advance of technology into a dynamic and unpredictable digital era, evidence of the acts and facts of institutions and the government and our national heritage are at risk of being irrecoverably lost. The challenge is pressing—as time moves forward and technologies become obsolete, the risks of loss increase. It will be appreciated that a need has developed in the art to develop an electronic records archives system and method especially, but not only, for the National Archives and Records Administration (NARA) in a system known as Electronic Records Archives (ERA), to resolve this growing problem, in a way that is substantially obsolescence-proof and policy neutral. While embodiments of the invention will be described with respect to its application for safeguarding government records, the described embodiments are not limited to archives systems applications nor to governmental applications and can also be applied to other large scale storage applications, in addition to archives systems, and for businesses, charitable (e.g., non-profit) and other institutions, and entities.
  • One aspect of the invention is directed to an architecture that will support operational, functional, physical, and interface changes as they occur. In one example, a suite of commercial off-the-shelf (COTS) hardware and software products has been selected to implement and deploy an embodiment of the invention in the ERA, but the inventive architecture is not limited to these products. The architecture facilitates seamless COTS product replacement without negatively impacting the ERA system.
  • Another aspect of the ERA is to preserve and to provide ready access to authentic electronic records of enduring value.
  • In one embodiment, the ERA supports and flows from NARA's mission to ensure “for the Citizen and the Public Servant, for the President and the Congress and the Courts, ready access to essential evidence.” This mission facilitates the exchange of vital ideas and information that sustains the United States of America. NARA is responsible to the American people as the custodian of a diverse and expanding array of evidence of America's culture and heritage, of the actions taken by public servants on behalf of American citizens, and of the rights of American citizens. The core of NARA's mission is that this essential evidence must be identified, preserved, and made available for as long as authentic records are needed—regardless of form.
  • The creation and use of an unprecedented and increasing volume of Federal electronic records—in a wide variety of formats, using evolving technologies—poses a problem that the ERA must solve. An aspect of the invention involves an integrated ERA solution supporting NARA's evolving business processes to identify, preserve, and make available authentic, electronic records of enduring value—for as long as they are needed.
  • In another embodiment, the ERA can be used to store, process, and/or disseminate a private institution's records. That is, in an embodiment, the ERA may store records pertaining to a private institution or association, and/or the ERA may be used by a first entity to store the records of a second entity. System solutions, no matter how elegant, may be integrated with the institutional culture and organizational processes of the users.
  • Since 1934, NARA has developed effective and innovative processes to manage the records created or received, maintained or used, and destroyed or preserved in the course of public business transacted throughout the Federal Government. NARA played a role in developing this records lifecycle concept and related business processes to ensure long-term preservation of, and access to, authentic archival records. NARA also has been instrumental in developing the archival concept of an authentic record that consists of four fundamental attributes: content, structure, context, and presentation.
  • NARA has been managing electronic records of archival value since 1968, longer than almost anyone in the world. Despite this long history, the diverse formats and expanding volume of current electronic records pose new challenges and opportunities for NARA as it seeks to identify records of enduring value, preserve these records as vital evidence of our nation's past, and make these records accessible to citizens and public servants in accordance with statutory requirements.
  • The ERA should support, and may affect, the institution's (e.g., NARA's) evolving business processes. These business processes mirror the records lifecycle and are embodied in the agency's statutory authority:
      • Providing guidance to Federal Agencies regarding records creation and records management;
      • Scheduling records for appropriate disposition;
      • Storing and preserving records of enduring value; and/or
      • Making records available in accordance with statutory and regulatory provisions.
  • Within this lifecycle framework, the ERA solution provides an integrated and automated capability to manage electronic records from: the identification and capture of records of enduring value; through the storage, preservation, and description of the records; to access control and retrieval functions.
  • Developing the ERA involves far more than just warehousing data. For example, the archival mission is to identify, preserve, and make available records of enduring value, regardless of form. This three-part archival mission is the core of the Open Archival Information System (OAIS) Reference Model, expressed as ingest, archival storage, and access. Thus, one ERA solution is built around the generic OAIS Reference Model (presented in FIG. 1), which supports these core archival functions through data management, administration, and preservation planning.
  • The ERA may coordinate with the front-end activities of the creation, use, and maintenance of electronic records by Federal officials. This may be accomplished through the implementation of disposition agreements for electronic records and the development of templates or schemas that define the content, context, structure, and presentation of electronic records along with lifecycle data referring to these records.
  • The ERA solution may complement NARA's other activities and priorities, e.g., by improving the interaction between NARA staff and their customers (in the areas of scheduling, transfer, accessioning, verification, preservation, review and redaction, and/or ultimately the ease of finding and retrieving electronic records).
  • Like NARA itself, the scope of ERA includes the management of electronic and non-electronic records, permanent and temporary records, and records transferred from Federal entities as well as those donated by individuals or organizations outside of the government. Each type of record is described and/or defined below.
  • ERA and Non-Electronic Records: Although the focus of ERA is on preserving and providing access to authentic electronic records of enduring value, the system's scope also includes, for example, management of specific lifecycle activities for non-electronic records. ERA will support a set of lifecycle management processes (such as those used for NARA) for appraisal, scheduling, disposition, transfer, accessioning, and description of both electronic and non-electronic records. A common systems approach to appraisal and scheduling through ERA will improve the efficiency of such tasks for non-electronic records and help ensure that permanent electronic records are identified as early as possible within the records lifecycle. This same common approach will automate aspects of the disposition, transfer, accessioning, and description processes for all types of records, which will result in significant workflow efficiencies. Archivists, researchers, and other users may realize benefits by having descriptions of both electronic and non-electronic records available together in a powerful, universal catalog of holdings. In an embodiment, some of ERA's capabilities regarding non-electronic records may come from subsuming the functionality of legacy systems such as the Archival Research Catalog (ARC). To effectively manage lifecycle data for all types of records, in certain embodiments, ERA also may maintain data interchange with (but not subsume) other legacy systems and likely future systems related to non-electronic records.
  • Permanent and Temporary Records: There is a fundamental archival distinction between records of enduring historic value, such as those that NARA must retain forever (e.g., permanent records) and those records that a government must retain for a finite period of time to conduct ongoing business, meet statutory and regulatory requirements, or protect rights and interests (e.g., temporary records).
  • For a particular record series from the US Federal Government, NARA identifies these distinctions during the record appraisal and scheduling processes and they are reflected in NARA-approved disposition agreements and instructions. Specific records are actually categorized as permanent or temporary during the disposition and accessioning processes. NARA takes physical custody of all permanent records and some temporary records, in accordance with approved disposition agreements and instructions. While all temporary records are eventually destroyed, NARA ultimately acquires legal (in addition to physical) custody over all permanent records.
  • ERA may address the distinction between permanent and temporary records at various stages of the records life-cycle. ERA may facilitate an organization's records appraisal and scheduling processes where archivists and transferring entities may use the system to clearly identify records as either permanent or temporary in connection with the development and approval of disposition agreements and instructions. The ERA may use this disposition information in association with the templates to recognize the distinctions between permanent and temporary records upon ingest and manage these records within the system accordingly.
  • For permanent records, this may involve transformation to persistent formats or use of enhanced preservation techniques to ensure their preservation and accessibility forever. For temporary records, NARA's Records Center Program (RCP) is exploring offering its customers an ERA service to ingest and store long-term temporary records in persistent formats. To the degree that the RCP opts to facilitate its customers' access to the ERA for appropriate preservation of long-term temporary electronic records, this same coordination relationship with transferring entities through the RCP will allow NARA to effectively capture permanent electronic records earlier in the records lifecycle. In the end, ERA may also provide for the ultimate destruction of temporary electronic records.
  • ERA and Donated Materials: In addition to federal records, NARA also receives and accesses donated archival materials. Such donated collections comprise a significant percentage of NARA's Presidential Library holdings, for example. ERA may manage donated electronic records in accordance with deeds of gift or deposit agreements which, when associated with templates, may ensure that these records are properly preserved and made available to users. Although donated materials may involve unusual disposition instructions or access restrictions, ERA should be flexible enough to adapt to these requirements. Since individuals or institutions donating materials to NARA are likely to be less familiar with ERA than federal transferring entities, the system may also include guidance and tools to help donors and the NARA appraisal staff working with them ensure proper ingest, preservation, and dissemination of donated materials.
  • Systems are designed to facilitate the work of users, and not the other way around. One or more of the following illustrative classes of users may interact with the ERA: transferring entity; appraiser; records processor; preserver; access reviewer; consumer; administrative user; and/or a manager. The ERA may take into account data security, business process re-engineering, and/or systems development and integration. The ERA solution also may provide easy access to the tools the users need to process and use electronic records holdings efficiently.
  • NARA must meet challenges relating to archival of massive amounts of information, or the American people risk losing essential evidence that is only available in the form of electronic federal records. But beyond mitigating substantial risks, the ERA affords such opportunities as:
      • Using digital communication tools, such as the Internet, to make electronic records holdings, such as NARA's, available beyond the research room walls in offices, schools, and homes throughout the country and around the world;
      • Allowing users to take advantage of the information-processing efficiencies and capabilities afforded by electronic records;
      • Increasing the return on the public's investment by demonstrating technological solutions to electronic records problems that will be applied throughout our digital society in a wide variety of institutional settings; and/or
      • Developing tools for archivists to perform their functions more efficiently.
  • According to one aspect of the invention, there is provided a system for ingesting, storing, and/or disseminating information. The system may include an ingest module, a storage module, and a dissemination module that may be accessed by a user via one or more portals.
  • In an aspect of certain embodiments, there is provided a system and method for automatically identifying, preserving, and disseminating archived materials. The system/method may include extreme scale archive storage architecture with redundancy or at least survivability, suitable for the evolution from terabytes to exabytes, etc.
  • In another aspect of certain embodiments, there is provided an electronic records archives (ERA), comprising an ingest module to accept a file and/or a record, a storage module to associate the file or record with information and/or instructions for disposition, and an access or dissemination module to allow selected access to the file or record. The ingest module may include structure and/or a program to create a template to capture content, context, structure, and/or presentation of the record or file. The storage module may include structure or a program to preserve authenticity of the file or record over time, and/or to preserve the physical access to the record or file over time. The access module may include structure and/or a program to provide a user with ability to view/render the record or file over time, to control access to restricted records, to redact restricted or classified records, and/or to provide access to an increasing number of users anywhere at any time.
  • The ingest module may include structure or a program to auto-generate a description of the file or record. Each record may be transformed, e.g., using a framework that wraps and computerizes the record in a self-describing format with appropriate metadata to represent information in the template.
  • The ingest module may include structure or a program to process a Submission Information Package (SIP) and/or an Archive Information Package (AIP). The access module may include structure or a program to process a Dissemination Information Package (DIP).
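  • The flow of packages through the ingest, storage, and access modules can be sketched as follows. This is a simplified illustration under stated assumptions, not the described architecture: the Archive class, its method names, and the restricted flag are hypothetical, and a single in-memory dictionary stands in for the storage module.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SIP:
    """Submission Information Package: what a transferring entity submits."""
    record_id: str
    content: bytes


@dataclass
class AIP:
    """Archive Information Package: the record plus disposition information."""
    record_id: str
    content: bytes
    disposition: str
    restricted: bool


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    record_id: str
    content: bytes


class Archive:
    def __init__(self) -> None:
        self._store: Dict[str, AIP] = {}  # stand-in for the storage module

    def ingest(self, sip: SIP, disposition: str, restricted: bool = False) -> AIP:
        """Accept a SIP and store it as an AIP with disposition instructions."""
        aip = AIP(sip.record_id, sip.content, disposition, restricted)
        self._store[aip.record_id] = aip
        return aip

    def access(self, record_id: str, user_cleared: bool = False) -> DIP:
        """Return a DIP, refusing restricted records to uncleared users."""
        aip = self._store[record_id]
        if aip.restricted and not user_cleared:
            raise PermissionError(f"record {record_id} is restricted")
        return DIP(aip.record_id, aip.content)
```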
  • Independent aspects of the invention may include the ingest module alone or one or more aspects thereof, the storage module alone or one or more aspects thereof, and/or the access module alone or one or more aspects thereof.
  • Still further aspects of the invention relate to methods for carrying out one or more functions of the ERA or components thereof (ingest module, storage module, and/or access module).
  • The challenges faced by NARA are typical of broader archival problems and reveal drawbacks associated with known solutions. Thus, in an embodiment, an ERA may be provided to address some or all of the more general problems. In particular, archives systems exist for storing and preserving electronic assets, which are stored as digital data. Typically, these assets are preserved for a period of time (retention time) and then deleted. These systems maintain metadata about the assets in asset catalogs to facilitate asset management. Such metadata may include one or more of the following:
      • Attributes to uniquely identify assets;
      • Attributes to describe assets;
      • Attributes to facilitate search through the archives;
      • Attributes to define asset structure and relationships to other assets;
      • Attributes to organize assets;
      • Attributes for asset protection;
      • Attributes to maintain information about asset authenticity; and/or
      • Status of the asset lifecycle (e.g., planning receipt of asset through eventual deletion).
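  • The kinds of attributes listed above can be pictured as a single catalog entry, as in the sketch below. The AssetCatalogEntry class, its field names, and the find_by_keyword helper are hypothetical. Only the endpoints of the lifecycle (planned receipt and eventual deletion) come from the text; the intermediate HELD state is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LifecycleStatus(Enum):
    PLANNED = "planned"   # receipt of the asset is planned
    HELD = "held"         # assumed intermediate state
    DELETED = "deleted"   # eventual deletion


@dataclass
class AssetCatalogEntry:
    asset_id: str                                                  # unique identification
    description: str = ""                                          # description
    search_keywords: List[str] = field(default_factory=list)       # search
    related_asset_ids: List[str] = field(default_factory=list)     # structure and relationships
    collection: Optional[str] = None                               # organization
    access_restrictions: List[str] = field(default_factory=list)   # protection
    fixity_digest: Optional[str] = None                            # authenticity
    status: LifecycleStatus = LifecycleStatus.PLANNED              # lifecycle status


def find_by_keyword(catalog: List[AssetCatalogEntry], keyword: str) -> List[str]:
    """Return the IDs of assets tagged with `keyword` that are not deleted."""
    return [
        entry.asset_id
        for entry in catalog
        if keyword in entry.search_keywords
        and entry.status is not LifecycleStatus.DELETED
    ]
```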
  • Unfortunately, these systems all suffer from several drawbacks. For example, there are limitations relating to the scale of the assets managed and, in particular, the size and number of all the assets maintained. These systems also have practical limitations in the duration in which they retain assets. Typically, archives systems are designed to retain data for years or sometimes decades, but not longer. As retention times of assets become very long or indefinite, longevity of the archives system itself, as well as the assets archived, is needed because an archives system's basic requirement is to preserve assets.
  • But indefinite longevity of an archives system and its assets poses challenges. For example, providing access to old electronic assets is complicated by obsolescence of the asset's format. Regular upgrades of the archives system itself, including migrations of asset data and/or metadata to new storage systems, are complicated by the extreme size of the assets managed. For example, if the metadata has to be redesigned to handle new required attributes or to handle an order of magnitude greater number of assets than supported by the old design, then the old metadata generally will have to be migrated to the new design, which could entail a great deal of migration. Extreme scale and longevity make archives systems impractical unless they are designed to accommodate unknown future changes and to reduce the impact of necessary change as much as possible.
  • Archives systems today are built on top of underlying storage systems based on commercial products that are typically comprised of file systems (e.g., Sun's ZFS file system) or relational databases (e.g., Oracle), and sometimes proprietary systems (e.g., EMC Centera). All of these storage systems have limitations in terms of scale (though sometimes the limits can be quite high). In some cases, there may be no products that can make use of the full scale of available file systems. Few of these systems can scale to trillions of entries (e.g., files). Limitations arise for different reasons but can be related to one or more of the following factors, alone or in combination:
      • Limitations of object or file identification schemes (e.g., uniqueness of identifiers. www.doi.org provides background on the state of the art for electronic/digital entity identifiers.);
      • Catalog limitations (e.g., number of entries, design bottlenecks);
      • The number of storage subsystems that can be integrated (sometimes termed horizontal scalability);
      • The capacity of underlying storage technologies;
      • Search and retrieval performance considerations (e.g., search can become impractical with extreme size);
      • The ability to distribute system components (e.g., systems can be difficult to distribute geographically); and/or
      • Limitations of system maintenance tasks that are a function of system size (e.g., systems can become impractical to administer with extreme size).
  • Currently, relational databases (DBs) can scale only to 10 billion objects per instance. Relational DBs also generally do not perform as well as file systems for simple search and retrieval function tasks because they tend to introduce additional overhead to meet other requirements such as fine-grained transactional integrity. There is also no viable product that integrates multiple file systems in a way that provides both extreme scaling and longevity suitable for an archives file system.
  • There clearly exists a need for a system and/or method for managing records that allows the records to be identified and managed without depending on the original hardware and/or software used to create them, which may have little or no records management function.
  • According to one embodiment of the present invention, a method is provided for managing electronic records. Each electronic record comprises a data file, a plurality of data files, a portion of a data file, or portions of a plurality of data files. The electronic records comprise a plurality of record types and data file types. The method comprises forming a data file set comprising one or more logically related data files; identifying attributes of each record type in a record type template; identifying specifications of each data file type in a data file type template; and extracting digital components from the data file set, wherein the extracted digital components relate to the attributes in each record type template and the specifications in each data file type template and comprise an individual record.
  • According to another embodiment of the present invention, an electronic record archive for managing electronic records is provided. Each electronic record comprises a data file, a plurality of data files, a portion of a data file, or portions of a plurality of data files. The electronic records comprise a plurality of record types and data file types. The electronic record archive comprises a data file set comprising one or more logically related data files; a record type template for each record type, each record type template identifying attributes of each record type; a data file type template for each data file type, each data file type template identifying specifications of each data file type; and a digital component extractor configured to extract digital components from the data file set. The extracted digital components relate to the attributes in each record type template and the specifications in each data file type template and comprise an individual record.
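  • The relationship among the data file set, the two kinds of templates, and the digital component extractor can be sketched as below. This is a deliberately simplified illustration under stated assumptions, not the claimed implementation: here a data file type template specifies only a field delimiter, a record type template lists only attribute names, and all class, function, and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DataFileTypeTemplate:
    """Specifications of a data file type (assumed: just a field delimiter)."""
    file_type: str
    delimiter: str


@dataclass(frozen=True)
class RecordTypeTemplate:
    """Attributes of a record type (assumed: just a tuple of attribute names)."""
    record_type: str
    attributes: Tuple[str, ...]


def extract_record(
    data_file_set: List[Tuple[str, str]],  # (file type, file text) pairs
    file_templates: Dict[str, DataFileTypeTemplate],
    record_template: RecordTypeTemplate,
) -> Dict[str, str]:
    """Extract the components named by the record template from related files."""
    record: Dict[str, str] = {}
    for file_type, text in data_file_set:
        spec = file_templates[file_type]  # how to read this type of file
        for line in text.splitlines():
            name, sep, value = line.partition(spec.delimiter)
            # Keep only components that correspond to a record attribute.
            if sep and name.strip() in record_template.attributes:
                record[name.strip()] = value.strip()
    return record
```

  • In this sketch an individual record is composed from portions of several logically related files, while components that match no attribute in the record type template are left out.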
  • It will be appreciated that the above-described embodiments, and the elements thereof, may be used alone or in various combinations to realize yet further embodiments.
  • Other aspects, features, and advantages of this invention will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, which are a part of this disclosure and which illustrate, by way of example, principles of this invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a reference model of an overall archives system;
  • FIG. 2 is a chart demonstrating challenges and solutions related to certain illustrative aspects of the present invention;
  • FIG. 3 illustrates the notional life cycle of records as they move through the ERA system, in accordance with an example embodiment;
  • FIG. 4 illustrates the ERA System Functional Architecture from a notional perspective, delineating the system-level packages and external system entities, in accordance with an example embodiment;
  • FIG. 5 illustrates a digital component extractor model according to the present invention;
  • FIG. 6 illustrates an XML Schema as a template for content and structure of a record;
  • FIG. 7 illustrates an instance of the template of FIG. 6; and
  • FIG. 8 illustrates an XSL template for defining the presentation of the instance of FIG. 7.
  • DETAILED DESCRIPTION
  • The following description includes several examples and/or embodiments of computer-driven systems and/or methods for carrying out automated information storage, processing and/or access. In particular, the examples and embodiments are focused on systems and/or methods oriented specifically for use with the U.S. National Archives and Records Administration (NARA). However, it will be recognized that, while one or more portions of the present specification may be limited in application to NARA's specific requirements, most if not all of the described systems and/or methods have broader application. For example, the implementations described for storage, processing, and/or access to information (also sometimes referred to as ingest, storage, and dissemination) can also apply to any institution that requires and/or desires automated archiving and/or preservation of its information, e.g., documents, email, corporate IP/knowledge, etc. The term “institution” includes at least government agencies or entities, private companies, publicly traded corporations, universities and colleges, charitable or non-profit organizations, etc. Moreover, the term “electronic records archive” (ERA) is intended to encompass a storage, processing, and/or access archives for any institution, regardless of nature or size.
  • As one example, NARA's continuing fulfillment of its mission in the area of electronic records presents new challenges and opportunities, and the embodiments described herein that relate to the ERA and/or asset catalog may help NARA fulfill its broadly defined mission. The underlying risk associated with failing to meet these challenges or to realize these opportunities is the loss of evidence that is essential to sustaining a government's or an institution's needs. FIG. 2 relates specific electronic records challenges to the components of the OAIS Reference Model (ingest, archival storage, access, and data management/administration), and summarizes selected relevant research areas.
  • At Ingest—the ERA needs to identify and capture all components of the record that are necessary for effective storage and dissemination (e.g., content, context, structure, and presentation). This can be especially challenging for records with dynamic content (e.g., websites or databases).
  • Archival Storage—Recognizing that in the electronic realm the logical record is independent of its media, the four illustrative attributes of the record (e.g., content, context, structure, and presentation) and their associated metadata, still must be preserved “for the life of the Republic.”
  • Access—NARA will not fulfill its mission simply by storing electronic records of archival value. Through the ERA, these records will be used by researchers long after the associated application software, operating system, and hardware all have become obsolete. The ERA also may apply and enforce access restrictions to sensitive information while at the same time ensuring that the public interest is served by consistently removing access restrictions that are no longer required by statute or regulation.
  • Data Management—The amount of data that needs to be managed in the ERA can be monumental, especially in the context of government agencies like NARA. Presented herewith are embodiments that are truly scalable solutions that can address a range of needs—from a small focused Instance through large Instances. In such embodiments, the system can be scaled easily so that capacity in both storage and processing power is added when required, and not so soon that large excess capacities exist. This will allow for the system to be scaled to meet demand and provide for maximum flexibility in cost and performance to the institution (e.g., NARA).
  • Satisfactorily maintaining authenticity through technology-based transformation and re-representation of records is extremely challenging over time. While there has been significant research about migration of electronic records and the use of persistent formats, there has been no previous attempt to create an ERA solution on the scale required by some institutions such as NARA.
  • Migrations are potentially lossy transformations, so techniques are needed to detect and measure any actual loss. The system may reduce the likelihood of such loss by applying statistical sampling, for example based on human judgment, backed up with appropriate software tools, and/or institutionalized in a semi-automatic monitoring process.
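  • The sampling idea above can be sketched as follows. This is a minimal illustration only, not the system's actual procedure: the function names, the use of simple random sampling, and the `has_loss` reviewer callback (standing in for a human judgment or a software comparison tool) are all assumptions introduced here.

```python
import random


def sample_for_review(record_ids, sample_size, seed=None):
    """Draw a simple random sample of migrated records to check for loss.

    Hypothetical helper: the source does not prescribe a sampling scheme.
    """
    rng = random.Random(seed)
    ids = list(record_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


def estimated_loss_rate(sample, has_loss):
    """Return the fraction of sampled records that the reviewer flags.

    `has_loss` is a hypothetical callback (human judgment or a tool)
    that returns True when a migrated record shows loss.
    """
    if not sample:
        return 0.0
    flagged = sum(1 for record_id in sample if has_loss(record_id))
    return flagged / len(sample)
```

    A monitoring process could, for instance, call `sample_for_review` after each migration batch and raise an alert when `estimated_loss_rate` exceeds a threshold chosen by the institution.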
  • Table 1 summarizes the “lessons learned” by the Applicants from experience with migrating different types of records to a Persistent Object Format (POF).
  • TABLE 1
    Type of record Current Migration Possibilities
    E-mail The Dutch Testbed project has shown that e-mail can be
    successfully migrated to a POF. An XML-based POF was
    designed by Tessella as part of this work. Because e-mail
    messages can contain attached files in any format, an e-mail record
    should be preserved as a series of linked objects: the core message,
    including header information and message text, and related objects
    representing attachments. These record relationships are stored in
    the Record Catalog. Thus, an appropriate preservation strategy can
    be chosen and applied to each file, according to its type.
    Word processing documents Simple documents can be migrated to a POF, although document
    appearance can be complex and may include record characteristics.
    Some documents can also include other embedded documents
    which, like e-mail attachments, can be in any format. Documents
    can also contain macros that affect “behavior” and are very
    difficult to deal with generically. Thus, complex documents
    currently require an enhanced preservation strategy.
    Adobe's Portable Document Format (PDF) often has been treated
    as a suitable POF for Word documents, as it preserves presentation
    information and content. The PDF specification is controlled by
    Adobe, but it is published, and PDF readers are widely available,
    both from Adobe and from third-parties. ISO are currently
    developing, with assistance from NARA, a standard version of
    PDF specifically designed for archival purposes (PDF/A). This
    format has the benefit that it forces some ambiguities in the original
    to be removed. However, both Adobe and Microsoft are evolving
    towards using native XML for their document formats.
    Images TIFF is a widely accepted open standard format for raster images
    and is a good candidate in the short to medium term for a POF. For
    vector images, the XML-based Scalable Vector Graphics format is
    an attractive option, particularly as it is a W3C open standard.
    Databases The contents of a database should be converted to a POF rather
    than being maintained in the vendor's proprietary format.
    Migration of the contents of relational database tables to an XML
    or flat file format is relatively straightforward. However, in some
    cases, it is also desirable to represent and/or preserve the structure
    of the database. In the Dutch Digital Preservation Testbed project,
    this was achieved using a separate XML document to define the
    data types of columns, constraints (e.g., whether the data values in
    a column must be unique), and foreign key relationships, which
    define the inter-relationships between tables. The Swiss Federal
    Archives took a similar approach with their SIARD tool, but used
    SQL statements to define the database structure.
    Major database software vendors have taken different approaches
    to implementing the SQL “standard” and add extra non-standard
    features of their own. This complicates the conversion to a POF.
    Another difficulty is the Binary Large Object (BLOB) datatype,
    which presents similar problems to those of e-mail attachments:
    any type of data can be stored in a BLOB and in many document-
    oriented databases, the majority of the important or relevant data
    may be in this form. In this case, separate preservation strategies
    may be applied according to the type of data held.
    A further challenge with database preservation is that of preserving
    not only the data, but the way that the users created and viewed the
    data. In some cases this may depend on stored queries and
    stored procedures forming the database; in others it may depend on
    external applications interacting with the database. To preserve
    such “executable” aspects of the database “as a system” is an area
    of ongoing research.
    Records with a high degree of “behavioral” properties (e.g., virtual reality models)
    For this type of record, it is difficult to separate the content from
    the application in which it was designed to operate. This makes
    these records time-consuming to migrate to any format. Emulation
    is one approach, but this approach is yet to be fully tested in an
    archival environment. Migration to a POF is another approach, and
    more research is required into developing templates to support this.
    Spreadsheets The Dutch Testbed project examined the preservation of
    spreadsheets and concluded that an XML-based POF was the best
    solution, though did not design the POF in detail. The structured
    nature of spreadsheet data means that it can be mapped reliably and
    effectively to an XML format. This approach can account for cell
    contents, the majority of appearance related issues (cell formatting,
    etc), and formulae used to calculate the contents of some cells.
    The Testbed project did not address how to deal with macros: most
    spreadsheet software products include a scripting or programming
    language to allow very complex macros to be developed (e.g.,
    Visual Basic for Applications as part of Microsoft Excel). This
    allows a spreadsheet file to contain a complex software application
    in addition to the data it holds. This is an area where further
    research is necessary, though it probably applies to only a small
    proportion of archival material.
    Web sites Most Web sites include documents in standardized formats (e.g.,
    HTML). However, it should be noted that there are a number of
    types of HTML documents, and many Web pages will include
    incorrectly formed HTML that nonetheless will be correctly
    displayed by current browsers. The structural relationship between
    the different files in a web-site should be maintained. The fact that
    most web-sites include external as well as internal links should be
    managed in designing a POF for web-sites. The boundary of the
    domain to be archived should be defined and an approach decided
    on for how to deal with links to files outside of that domain.
    Many modern web sites are actually applications where the
    navigation and formatting are generated dynamically from
    executed pages (e.g., Active Server Pages or Java Server Pages).
    The actual content, including the user's preferences on what
    content is to be presented, is managed in a database. In this case,
    there are no simple web pages to archive, as different users may be
    presented with different material at different times. This situation
    overlaps with our discussion above of databases and the
    applications which interact with them.
    Sound and video For audio streams, the WAV and AVI formats are the de facto
    standards and therefore a likely basis for POFs. For video, there
    are a number of MPEG formats in general use, with varying
    degrees of compression. While it is desirable that only lossless
    compression techniques are used for archiving, if lossy
    compression was used in the original format, the discarded information
    cannot be recovered in a POF.
    For video archives in particular, there is the potential for extremely
    large quantities of material. High quality uncompressed video
    streams can consume up to 100 GB per hour of video, so storage
    space is an issue for this record type.
  • It is currently not possible to migrate a number of file formats in a way that will be acceptable for archival purposes. One aspect is to encourage the evolution and enhancement of third-party migration software products by providing a framework into which such commercial off-the-shelf (COTS) software products could become part of the ERA if they meet appropriate tests.
  • When an appropriate POF cannot be identified to reduce the chances of obsolescence, the format may need to be migrated to a non-permanent but more modern, proprietary format (this is known as Enhanced Preservation). Even POFs are not static, since they still need executable software to interpret them, and future POFs may need to be created that have less feature loss than an older format. Thus, the ERA may allow migrated files to be migrated again into a new and more robust format in the future. Through the Dutch Testbed Project, the Applicants have found that it is normally better to return to the original file(s) whenever such a re-migration occurs. Thus, when updating a record, certain example embodiments may revert to an original version of the document and migrate it to a POF accordingly, whereas certain other example embodiments may not be able to migrate the original document (e.g., because it is unavailable, in an unsupported format, etc.) and thus may instead, or in addition, migrate the already-migrated file. Thus, in certain example embodiments, a new version of a record may be derived from an original version of the record if it is available or, if the original is not available, the new version may be derived from any other already existing derivative version (e.g., of the original). As such, an extensible POF for certain example embodiments may be provided.
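  • The source-selection rule for re-migration described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the `RecordVersion` type, its fields, and the `choose_migration_source` function are hypothetical names introduced here, and "available" is taken to mean that the file exists and is in a format that can still be migrated.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecordVersion:
    """Hypothetical description of one stored version of a record."""
    version_id: str
    is_original: bool
    available: bool  # False if the file is missing or in an unsupported format


def choose_migration_source(versions: Sequence[RecordVersion]) -> Optional[RecordVersion]:
    """Prefer the original version; otherwise fall back to a derivative.

    Returns None when no version can be migrated at all.
    """
    for version in versions:
        if version.is_original and version.available:
            return version
    for version in versions:
        if not version.is_original and version.available:
            return version
    return None
```

    The fallback here simply takes the first available derivative in the order given; an embodiment could equally rank derivatives by age or by feature loss.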
  • In view of the above aspects of the OAIS Reference Model, the ERA may comprise an ingest module to accept a file and/or a record, a storage module to associate the file or record with information and/or instructions for disposition, and an access or dissemination module to allow selected access to the file or record. The ingest module may include structure and/or a program to create a template to capture content, context, structure, and/or presentation of the record or file. The storage module may include structure and/or a program to preserve authenticity of the file or record over time, and/or to preserve the physical access to the record or file over time. The access module may include structure or a program to provide a user with ability to view/render the record or file over time, to control access to restricted records, to redact restricted or classified records, and/or to provide access to an increasing number of users anywhere at any time.
  • FIG. 3 illustrates the notional life cycle of records as they move through the ERA system, in accordance with an example embodiment. Records flow from producers, who are persons or client systems that provide the information to be preserved, and end up with consumers, who are persons or client systems that interact with the ERA to find preserved information of interest and to access that information in detail. The Producer also may be a “Transferring Entity.”
  • During the “Identify” stage, producers and archivists develop a Disposition Agreement to cover records. This Disposition Agreement contains disposition instructions, and also a related Preservation and Service Plan. Producers submit records to the ERA System in a SIP. The transfer occurs under a pre-defined Disposition Agreement and Transfer Agreement. The ERA System validates the transferred SIP by scanning for viruses, ensuring the security access restrictions are appropriate, and checking the records against templates. The ERA System informs the Producer of any potential problems, and extracts metadata (including descriptive data, described in greater detail below), creates an Archival Information Package (or AIP, also described in greater detail below), and places the AIP into Archival Storage. At any time after the AIP has been placed into Archival Storage, archivists may perform Archival Processing, which includes developing arrangement, description, finding aids, and other metadata. These tasks will be assigned to archivists based on relevant policies, business rules, and management discretion. Archival processing supplements the Preservation Description Information metadata in the archives.
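  • The validation step described above (virus scan, access restriction check, template check, then reporting problems to the Producer) can be pictured as a chain of checks that each return a list of problem messages. The sketch below is an assumption-laden illustration: the SIP is modeled as a plain dictionary, and every key and function name is hypothetical. The virus check in particular is a placeholder for a call to a real scanner.

```python
from typing import Callable, Dict, List

# Each check takes the SIP (a plain dict in this sketch) and returns problem messages.
Check = Callable[[Dict], List[str]]


def check_virus_scan(sip: Dict) -> List[str]:
    # Placeholder: a real system would invoke a virus scanner here.
    return ["virus scan flagged the package"] if sip.get("virus_flagged") else []


def check_access_restrictions(sip: Dict) -> List[str]:
    # Hypothetical set of allowed restriction labels.
    allowed = {"public", "restricted"}
    if sip.get("access_restriction") in allowed:
        return []
    return ["unknown access restriction"]


def check_against_template(sip: Dict) -> List[str]:
    # Every record must carry the fields named by the (hypothetical) template.
    required = set(sip.get("template_fields", []))
    problems = []
    for index, record in enumerate(sip.get("records", [])):
        missing = required - set(record)
        if missing:
            problems.append(f"record {index} is missing {sorted(missing)}")
    return problems


CHECKS: List[Check] = [check_virus_scan, check_access_restrictions, check_against_template]


def validate_sip(sip: Dict, checks: List[Check] = CHECKS) -> List[str]:
    """Run all checks and collect the problems to report to the Producer."""
    problems: List[str] = []
    for check in checks:
        problems.extend(check(sip))
    return problems
```

    An empty result would allow the flow to continue to metadata extraction and AIP creation; a non-empty result is what would be reported back to the Producer.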
  • At any time after the AIP has been placed into Archival Storage, archivists may perform Preservation Processing, which includes transforming the records to authentically preserve them. Policies, business rules, Preservation and Service Plans, and management discretion will drive these tasks. Preservation processing supplements the Preservation Description Information metadata in the archives, and produces new (transformed) record versions.
  • With respect to the “Make Available” phase, at any time after the AIP has been placed into Archival Storage, archivists may perform Access Review and Redaction, which includes performing mediated searches, verifying the classification of records, and coordinating redaction of records where necessary. These tasks will be driven by policies, business rules, and access requests. Access Review and Redaction supplements the Preservation Description Information metadata in the archives, and produces new (redacted) record versions. Also, at any time after the AIP has been placed into Archival Storage, Consumers may search the archives to find records of interest.
  • FIG. 4 illustrates the ERA System Functional Architecture from a notional perspective, delineating the system-level packages and external system entities, in accordance with an example embodiment. The rectangular boxes within the ERA System boundary represent the six system-level packages. The Ingest system-level package includes the means and mechanisms to receive the electronic records from the transferring entities and to prepare those electronic records for storage within the ERA System, while the Records Management system-level package includes the services necessary to manage the archival properties and attributes of the electronic records and other assets within the ERA System, as well as providing the ability to create and manage new versions of those assets. Records Management includes the management functionality for disposition agreements, disposition instructions, appraisal, transfer agreements, templates, authority sources, records life cycle data, descriptions, and arrangements. In addition, access review, redaction, and selected archival management tasks for non-electronic records, such as the scheduling and appraisal functions, are also included within the Records Management service.
  • The Preservation system-level package includes the services necessary to manage the preservation of the electronic records to ensure their continued existence, accessibility, and authenticity over time. The Preservation system-level service also provides the management functionality for preservation assessments, Preservation and Service Level plans, authenticity assessment, and digital adaptation of electronic records. The Archival Storage system-level package includes the functionality to abstract the details of mass storage from the rest of the system. This abstraction allows this service to be appropriately scaled and allows new technology to be introduced independently of the other system-level services according to business requirements. The Dissemination system-level package includes the functionality to manage search and access requests for assets within the ERA System. Users have the capability to generate search criteria, execute searches, view search results, and select assets for output or presentation. The architecture provides a framework to enable the use of multiple search engines offering a rich choice of searching capabilities across assets and their contents.
  • The Local Services and Control (LS&C) system-level package includes the functional infrastructure for the ERA Instance including a user interface portal, user workflow, security services, external interfaces to the archiving entity and other entities' systems, as well as the interfaces between ERA Instances. All external interfaces are depicted as flowing through LS&C, although the present invention is not so limited.
  • The ERA System contains a centralized monitoring and management capability called ERA Management. The ERA Management hardware and/or software may be located at an ERA site. The Systems Operations Center (SOC) provides the system and security administrators with access to the ERA management Virtual Local Area Network. Each SOC manages one or more Federations of Instances based on the classification of the information contained in the Federation.
  • Also shown are the three primary data stores for each Instance:
      • 1. Ingest Working Storage—Contains transfers that remain until they are verified and placed into the Electronic Archives;
      • 2. Electronic Archives—Contains all assets (e.g., disposition agreements, records, templates, descriptions, authority sources, arrangements, etc.); and
      • 3. Instance Data Storage—Contains a performance cache of all business assets, operational data and the ERA asset catalog.
  • This diagram provides a representative illustration of how a federated ERA system can be put together, though it will be appreciated that the same is given by way of example and without limitation. Also, the diagram describes a collection of Instances at the same security classification level and compartment that can communicate electronically via a WAN with one another, although the present invention is not so limited. For example, FIG. 5 is a federation of ERA instances, in accordance with an example embodiment. The federation approach is described in greater detail below, although it is important to note here that the ERA and/or the asset catalog may be structured to work with and/or enable a federated approach.
  • The ERA's components may be structured to receive, manage, and process a large amount of assets and collections of assets. Because of the large amount of assets and collections of assets, it would be advantageous to provide an approach that scales to accommodate the same. Beyond the storage of the assets themselves, a way of understanding, accessing, and managing the assets may be provided to add meaning and functionality to the broader ERA. To serve these and/or other ends, an asset catalog including related, enabling features may be provided.
  • In particular, to address the overall problems of scaling and longevity, the asset catalog and storage system federator may address the following underlying problems, alone or in various combinations:
      • Capturing business objects that relate to assets that are particular to the application storing the assets (e.g., in an archiving system, such business objects may include, for example, disposition and destruction information, receipt information, legal transfer information, appraisals and archive description, etc.), with each new business use of the design potentially defining unique business objects that are needed to control its assets and execute its business processes;
      • Maintaining arbitrary asset attributes to be flexible in accommodating unknown future attributes;
      • Employing asset and other identifiers that are immutable so that they remain useful indefinitely and, therefore, enable them to be referenced both within the archives and by external entities with a reduced concern for changes over time;
      • Supporting search and navigation through the extreme scale and diversity of assets archived;
      • Handling obsolescence of assets that develops over time;
      • Accommodating redacted and other derivative versions of assets appropriate for an archive system;
      • Federating (e.g., integrating independent parts to create a larger whole) multiple, potentially heterogeneous, distributed, and independent archives systems (e.g., instances) to provide a larger scale archive system;
      • Supporting a distributed implementation necessary for scaling, site independence, and disaster recovery considerations where the distribution of assets and associated catalogs may change over time but remain visible to all sites;
      • Employing a search architecture and catalog format that allows exploitation of multiple, possibly commercial search engines for differing asset data types and across instances of archives in a federation, as future needs may dictate;
      • Accommodating multiple, heterogeneous, commercial storage subsystems among and within the instances in a federation of archives to achieve extreme scaling and adapt to changes over time;
      • Supporting a variety of data handling requirements based on, for example, security level, handling restrictions and ownership, in a manner that performs well and remains manageable for an extremely large number of assets and catalog entries;
      • Supporting storage of any kind of electronic asset;
      • Supporting transparent data location and migration and storage subsystem upgrades/changes; and/or
      • Supporting reconstruction of the catalog and archives with little or no information other than the original catalog and archived bit streams (e.g., for the purposes of disaster recovery).
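  • Three of the items above (immutable identifiers, arbitrary asset attributes, and derivative versions such as redactions) can be illustrated with a small catalog entry type. This is a sketch only: `CatalogEntry`, `new_entry`, the choice of UUIDs as identifiers, and the `derived_from` link are assumptions made here for illustration, not the catalog format of the ERA.

```python
import uuid
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical asset catalog entry.

    frozen=True prevents reassigning fields, so the identifier cannot change.
    """
    asset_id: str
    attributes: Mapping[str, str]        # arbitrary, open-ended attribute names
    derived_from: Optional[str] = None   # identifier of the source asset, if any


def new_entry(attributes: Mapping[str, str], derived_from: Optional[str] = None) -> CatalogEntry:
    """Create an entry with a fresh identifier and a read-only attribute view."""
    return CatalogEntry(
        asset_id=str(uuid.uuid4()),
        attributes=MappingProxyType(dict(attributes)),
        derived_from=derived_from,
    )


# A redacted derivative refers back to the original by its identifier.
original = new_entry({"title": "Service record"})
redacted = new_entry({"title": "Service record (redacted)"}, derived_from=original.asset_id)
```

    Because attributes are kept as an open mapping rather than fixed fields, attribute names that are unknown today can be added to new entries without changing the entry type.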
  • Electronic records are manifested, in some way, as electronic data files. There are several requirements for managing the relationship between electronic records and data files. These requirements include, but are not limited to: 1) ensuring that all data files stored in the system are associated with the records they constitute; 2) specifying the relationship of each ingested data file with an electronic record; 3) specifying the relationship of each transformed data file to an electronic record; and 4) verifying the data files associated with electronic records contained in a transfer.
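  • Requirements 1) and 4) above amount to consistency checks between the set of stored data files and the record-to-file associations. A minimal sketch, assuming the associations are held as a mapping from a record identifier to a set of file names (both function names and the data layout are hypothetical):

```python
from typing import Dict, Iterable, Set


def unassociated_files(stored_files: Iterable[str],
                       record_to_files: Dict[str, Set[str]]) -> Set[str]:
    """Files in storage that no record refers to (violates requirement 1)."""
    referenced = set().union(*record_to_files.values())
    return set(stored_files) - referenced


def missing_files(stored_files: Iterable[str],
                  record_to_files: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each record, the referenced files that are absent from a transfer."""
    stored = set(stored_files)
    return {
        record_id: files - stored
        for record_id, files in record_to_files.items()
        if files - stored
    }
```

    Both results being empty indicates that every stored file belongs to at least one record and that every file a record refers to was actually received.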
  • The relationship between electronic records and data files appears simple at first glance, but is in reality somewhat complex, particularly when considering the relationship between an individual electronic record and data files, as is required by requirements 2) and 3) above. Although it is tempting to think of electronic records as being directly composed of data files, this is incorrect, as explained in more detail below.
  • The present invention addresses this complexity through an intermediate layer called a digital component extractor, which establishes a bridge between electronic records and data files. This bridge allows archivists and transferring entities to model the true semantic relationship between individual electronic records and data files.
  • The concept of a record originates in the archival and records management domains, where a record represents a “unit of recorded information”. As used herein, the term “record” means a unit of recorded information created, received, and maintained as evidence or information by an organization or person, in pursuance of legal obligations or the transaction of business.
  • This definition has a conceptual basis, in the sense that records are recognized and understood by humans to represent information. It is necessary when discussing electronic records to distinguish the archival and records management term “record” from the computer science concept of the same name. The computer science concept of “record” formally represents a matrix-tuple in linear algebra which is analogous to a row in a database table. The present invention uses the unqualified term “record” to indicate the archival and records management concept, and uses the qualifier “tuple record” to indicate the computer science concept. As used herein, the term “tuple record” means a matrix-tuple (defined by linear algebra), which is a finite function that maps field names to a certain value.
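  • The distinction can be made concrete with a short sketch. A tuple record is modeled below as a read-only mapping from field names to values, like one database row, while an archival record for one veteran is shown as possibly spanning several tuple records. All field names and values are made up for illustration, and grouping by a `veteran_id` field is only one assumed way in which an archival record might relate to tuple records.

```python
from types import MappingProxyType

# Tuple records: finite, read-only mappings from field names to values,
# analogous to rows in database tables. All data here is hypothetical.
service_row = MappingProxyType({"veteran_id": 17, "rank": "Sergeant", "start": "1968-03-01"})
benefit_row = MappingProxyType({"veteran_id": 17, "benefit": "Pension"})
other_row = MappingProxyType({"veteran_id": 42, "rank": "Private", "start": "1970-06-15"})


def tuple_records_for(veteran_id, rows):
    """Collect the tuple records that belong to one veteran's archival record."""
    return [row for row in rows if row["veteran_id"] == veteran_id]


# One archival record (in the records management sense) drawing on
# two tuple records from two different tables.
record_17 = tuple_records_for(17, [service_row, benefit_row, other_row])
```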
  • Archivists and records managers typically manage numerous records. The requirements discussed above require the system to manage not only records (in the plural), but also individual records (in the singular). The requirement to manage both individual and plural records presents several questions, including, but not limited to: 1) what defines the exact extent of an individual record? and 2) where precisely does an individual record start and where precisely does it end?
  • The answers to these questions must be precisely specified in the context of electronic records, where individual electronic records are managed independently.
  • Given the conceptual nature of records, a conceptual approach to defining the exact extent of a particular individual record is needed. A record can be said to exhibit a characteristic known as strong “semantic coherence,” which is implied by the “unit of recorded information” phrase in the definition of a record. As used herein, the term “semantic coherence” is defined as a conceptual meaning that is closely related through connections and consistency, and holds together firmly as parts of the same mass.
  • Semantic coherence covers a scale, from weak (no coherence) to strong (high coherence), and the exact point on the scale for any particular set of information will involve subjective (archival) judgment. A record represents conceptual meaning that “sticks together” strongly enough on the semantic coherence scale to be considered an individual record.
  • Consider the following examples of semantic coherence:
  • EXAMPLE 1
  • Consider a record of a particular veteran's military service. Information about that individual's service dates, ranks, and defined benefits is strongly logically connected. Is the same information for a different individual the same record? No, because the logical connection for information about one particular individual is very strong whereas the logical connection for information across individuals is weaker.
  • EXAMPLE 2
  • Consider again a record of a veteran's military service. Now consider information about a battle plan for a particular military engagement in which the individual participated. Is the battle plan part of the individual's military service record? No. While the battle plan is in itself a record (and is loosely connected to the individual's service record), its meaning is inconsistent with the service record, and it is therefore a separate record.
  • Put another way, strong semantic coherence is the characteristic that allows a distinction between one particular record and another particular record.
  • With paper records, archivists often do not identify individual records, due to time and resource constraints. Instead, archivists typically manage records in the aggregate. With electronic records, archivists may have the capability and desire to identify individual electronic records as standard practice.
  • Each individual record has an attribute that defines its particular “record type.” As used herein, the term “record type” refers to the abstract form of the records, such as letter, memo, greeting card, or portrait, etc. As such, each record type represents a distinctive class of electronic records defined by their form, function, or use. Consider the following example of record types:
  • EXAMPLE 3
  • A parish church will typically maintain many different types of electronic records, including baptismal records, deeds to parish properties, ledgers of the parish financial accounts, minutes of parish meetings, and official parish correspondence. Each of these different record types has a distinct intellectual form. For example, baptismal records almost always list at least the name of the person baptized, the date and place of birth, and the date and place of the baptism. In contrast, financial account ledger records might include a chart of accounts with debit/credit entries. It would be rather surprising to find an infant's birth date in a financial ledger.
  • The abstract form of a record type is specified by a “record type template.” As used herein, a “record type template” is a template that identifies specific attributes for a specific type of record. The record type template specifies the essential characteristics of the record, which are used to ensure authenticity.
  • Referring again to Example 3, the record type template for baptismal records would identify the information expected in that type of record, such as the name of the person baptized, date and place of birth, etc. FIG. 5 illustrates the relationship between a record and a record type template. A record type template specifies the form of a record.
  • The Record Type Template also specifies the essential characteristics of the record, which are used to ensure authenticity as documented in co-pending, commonly assigned U.S. Application (Attorney Docket No 4870-25), entitled SYSTEM AND METHOD FOR PRESERVATION OF DIGITAL RECORDS.
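  • As an illustration only, the relationship between a record and its record type template can be sketched in Python. The class name RecordTypeTemplate, its fields, the attribute names, and the sample values below are all hypothetical and are not taken from the specification; the sketch only shows a template listing the attributes expected of a baptismal record (Example 3) and checking whether a candidate record supplies them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordTypeTemplate:
    """Hypothetical model: a record type name plus the attributes expected in it."""
    name: str
    required_attributes: frozenset

    def missing_attributes(self, record: dict) -> set:
        """Return the expected attributes that the record lacks or leaves empty."""
        return {a for a in self.required_attributes if not record.get(a)}

    def conforms(self, record: dict) -> bool:
        """A record conforms when no expected attribute is missing."""
        return not self.missing_attributes(record)


# Attributes named in Example 3 for baptismal records (identifiers are assumptions).
BAPTISMAL = RecordTypeTemplate(
    name="baptismal record",
    required_attributes=frozenset(
        {"name", "birth_date", "birth_place", "baptism_date", "baptism_place"}
    ),
)

# Made-up sample data, for illustration only.
baptism = {
    "name": "Jane Doe",
    "birth_date": "1901-03-14",
    "birth_place": "Springfield",
    "baptism_date": "1901-04-07",
    "baptism_place": "St. Mary's Parish",
}
ledger_entry = {"account": "Building fund", "debit": 0, "credit": 125}

print(BAPTISMAL.conforms(baptism))        # True
print(BAPTISMAL.conforms(ledger_entry))   # False
```

  • In this sketch the ledger entry fails the check because it carries none of the attributes expected of a baptismal record, which mirrors the observation in Example 3 that different record types have distinct intellectual forms.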
  • Electronic records are accumulated and organized into “record aggregates” to facilitate organization and archival processing. As used herein, the term “record aggregate” means an intellectual aggregation of documentary material that arises because the material results from the same accumulation or filing process, the same function, or the same activity; has a particular form; or has some other relationship arising out of its creation, receipt, or use; or because the aggregate was required for the purposes of archival arrangement. Record aggregates may be composed of other record aggregates, or records.
  • Record aggregates can themselves be accumulated and organized into higher order record aggregates. Consider the following example of record aggregates:
  • EXAMPLE 4
  • An archivist might place military service records into an aggregate for the branch of the military (e.g., Army) which itself is within an aggregate for the Department of Defense, which itself is within an aggregate for the Federal Government.
  • Record aggregates may follow standard levels: record groups, collections, series, file units, and items. Each record aggregate has name and title attributes which help identify it. Record aggregates may be composed of other record aggregates, or electronic records. FIG. 5 illustrates the relationship between electronic records and record aggregates.
  • Record aggregates may either be homogeneous, i.e., they contain electronic records of the same record type, or heterogeneous, i.e., they contain electronic records of different record types.
  • Like electronic records, record aggregates have a degree of semantic coherence—they are organized according to principles of original order and provenance, which ensures that related electronic records are aggregated together. However, the semantic coherence that binds together a record aggregate is somewhat weaker than the semantic coherence that binds together a particular individual record. Put another way, an individual record within an aggregate has an independent identity because its semantic coherence is “strong enough” to be considered a record.
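  • The nesting of record aggregates, and the distinction between homogeneous and heterogeneous aggregates, can be sketched as a simple recursive data structure. This is a minimal sketch under stated assumptions: the class names Record and RecordAggregate, the method names, and the sample titles are hypothetical, and the battle plan placed alongside the Army aggregate is added only to produce a heterogeneous case.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Record:
    title: str
    record_type: str


@dataclass
class RecordAggregate:
    """Hypothetical aggregate with name and title attributes and nested members."""
    name: str
    title: str
    members: List[Union["RecordAggregate", Record]] = field(default_factory=list)

    def record_types(self) -> Set[str]:
        """Collect the record types of all records at any depth."""
        types: Set[str] = set()
        for member in self.members:
            if isinstance(member, Record):
                types.add(member.record_type)
            else:
                types |= member.record_types()
        return types

    def is_homogeneous(self) -> bool:
        """True when every contained record shares one record type.

        An empty aggregate is treated as homogeneous in this sketch.
        """
        return len(self.record_types()) <= 1


# Example 4: Army within Department of Defense within Federal Government.
army = RecordAggregate("army", "Department of the Army", [
    Record("Service record A", "military service record"),
    Record("Service record B", "military service record"),
])
dod = RecordAggregate("dod", "Department of Defense", [
    army,
    Record("Battle plan", "battle plan"),  # assumption: added to show heterogeneity
])
federal = RecordAggregate("federal", "Federal Government", [dod])

print(army.is_homogeneous())     # True
print(federal.is_homogeneous())  # False
```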
  • Computer software applications operate on data files, and data files represent the atomic unit of recorded information for computers. Where electronic records are conceptual in nature, data files are clearly physical. As used herein, the term “data file” means: 1) a collection of data that is stored together and treated as a unit by a computer software application; and 2) related data (e.g., numeric, textual, and/or graphic information) and fields that are organized in a strictly prescribed form and format. This definition includes two characteristics of data files, which are described in more detail below.
  • The first characteristic is that data files typically require interpretation by a computer software application, which the OAIS model calls “access software.” The OAIS definition for “access software” is a type of software that presents part of or all of the information content of an Information Object in forms understandable to humans or systems.
  • While it is conceivable that a person might look at all the individual bits of a data file to try to make sense of it, people generally use access software to present the information in some usable manner. The access software performs some kind of “presentation processing” to accomplish this. “Presentation processing” is defined as the software processing algorithms (including transformation, consolidation, tabulation, formatting, rendering, querying, filtering, interpretation, etc.) which access software employs to present the information contained in data files in a form understandable to humans.
  • Presentation processing covers a scale, from low (little to no processing required) to high (complex processing required), and the exact point on the scale for any particular set of information will involve subjective judgment. Presentation processing often involves presenting data files visually, but could also include presenting data files audibly or through any other human sensory perception.
  • Some data files are “eye readable” with minimal presentation processing. “Eye readable” is defined as data files whose information is inherently understandable to humans through visual inspection using access software that supports minimal presentation processing.
  • Only the simplest of data files are eye readable and most data files are completely unintelligible without a high degree of presentation processing. Using access software specifically suited to presenting a certain class of data files is necessary when the access software performs a high degree of software processing because without this access software, the information in the data files would be incomprehensible. Consider the following examples:
  • EXAMPLE 5
  • A fixed-length tabular dataset might be composed of one data file that structures tabular data into a regular row/column format that can easily be read and understood by a person. In this case, using access software might be optional.
  • EXAMPLE 6
  • A single web page might be composed of dozens of individual data files. For example, the web page might include multiple Hyper-Text Markup Language (HTML) data files, multiple Cascading Style Sheet (CSS) data files, client-side JavaScript script files, and multiple image files in various formats, such as Graphics Interchange Format (GIF) and Portable Network Graphics (PNG).
  • While a person could look through the individual bytes in each of these individual files, doing so would not provide an accurate sense of the data files' information content. This is because the access software, a web browser, actually performs a great deal of software processing to apply style sheets to transform and render content, more software processing to render images, and more software processing to render the behavior contained in the client-side scripts. This kind of software processing cannot easily be imagined or replicated by a person, so using access software is required.
  • EXAMPLE 7
  • Many data file formats are either undocumented, or are essentially incomprehensible to a person. For example, Microsoft Word's native binary (DOC) data file format is incompletely documented (due to the fact that it is proprietary) and is incomprehensible to a person who might look at the individual bytes within the data file. Using access software for these kinds of data files is required.
  • Historically, data files created in the earlier days of computing require low presentation processing, but as computers, software, data, and algorithms have continually increased in complexity over time, the amount of required presentation processing has also increased.
  • The second characteristic is that data files have a prescribed form and format. The above examples reference several data file formats, including Hyper-Text Markup Language (HTML) and Microsoft Word's native binary (DOC). This prescribed form and format is specified by a “data file type template.” As used herein, the term “data file type template” means a set of specifications about a data type that governs its format and behaviors.
  • The “specifications” in the above definition are essentially the instructions required by the access software to perform presentation processing.
  • Data files are often aggregated to facilitate management and presentation processing. In the web page example (Example 6), the web page is composed of many individual data files, which is known as a “data file set.” The term “data file set” means one or more data files that are logically related for purposes of presentation processing by access software.
  • Data file sets can either be “explicit,” or “implicit.” “Explicit” data file sets are defined by information contained in the data files, whereas “implicit” data file sets are defined through inscrutable software processing algorithms. Consider these examples:
  • EXAMPLE 8
  • Consider again the example of a web page. When an HTML data file refers to a CSS style sheet data file, it does so explicitly by data file name. This name can be resolved to find the CSS data file.
  • EXAMPLE 9
  • Consider an example of a set of database tables that include multiple data files for different kinds of information. One data file might contain simple data, another might contain binary data, and yet another data file might contain index information. The relationship between these data files is implicit, meaning it is not specified within the data files. Only the database application software defines these relationships as part of its presentation processing.
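  • To make the explicit case of Example 8 concrete, the following sketch resolves a data file set from references contained in an HTML data file, using Python's standard html.parser module. The class name LinkedFileCollector, the function explicit_data_file_set, the choice of tags inspected, and the file names are assumptions made for illustration. No comparable scan is possible for the implicit case of Example 9, because there the relationships exist only in the database application software.

```python
from html.parser import HTMLParser
from typing import List


class LinkedFileCollector(HTMLParser):
    """Collect the data file names that an HTML data file refers to by name."""

    def __init__(self) -> None:
        super().__init__()
        self.referenced: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Style sheets are referenced through <link href="...">.
        if tag == "link" and attributes.get("href"):
            self.referenced.append(attributes["href"])
        # Scripts and images are referenced through a src attribute.
        elif tag in ("script", "img") and attributes.get("src"):
            self.referenced.append(attributes["src"])


def explicit_data_file_set(html_name: str, html_text: str) -> List[str]:
    """Return the HTML data file followed by the files it explicitly names."""
    collector = LinkedFileCollector()
    collector.feed(html_text)
    return [html_name] + collector.referenced


page = (
    '<html><head><link rel="stylesheet" href="style.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="logo.png"></body></html>'
)
print(explicit_data_file_set("index.html", page))
# ['index.html', 'style.css', 'app.js', 'logo.png']
```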
  • FIG. 5 illustrates the relationship between data files, data file type templates, data file sets, and access software.
  • As discussed above, electronic records are conceptual and data files are physical. Electronic records are manifested in some way as electronic data files, but the manner in which the electronic records are manifested must first be determined.
  • First, the options to describe the relationship between electronic records and data files should be considered. An individual record may be composed of:
      • One entire data file
      • Multiple entire data files
      • A portion of one data file
      • Portions of multiple data files
  • All of these options may apply, as explained in the following examples, which extend the example of the parish church (Example 3).
  • EXAMPLE 10
  • The parish church maintains each baptismal record as a separate word processing document data file, and its financial ledger as a separate spreadsheet data file. In this case, there is a one-to-one correspondence between a record and each data file.
  • EXAMPLE 11
  • The parish church maintains two separate spreadsheet data files for its financial ledger record, one spreadsheet for the balance statement and a second spreadsheet for the profit/loss statement. In this case, one record is composed of multiple data files.
  • EXAMPLE 12
  • The parish church has a sophisticated content management software application to manage all of its documents. The content management application stores all documents (including baptismal records, correspondence, financial ledgers, etc.) in one single database data file. In this case, one record is composed of a portion of one data file.
  • EXAMPLE 13
  • Again, the parish church has a sophisticated content management software application to manage all of its documents. The content management application stores all documents in one single database data file and all metadata about the documents in a separate database data file. In this case, one record is composed of portions of multiple data files.
  • In Examples 10-13, the intellectual form, content, and number of electronic records remains fixed, while the relationship of those electronic records to data files varies, depending on the particulars of how the parish church manages and uses its data files at a specific point in time.
  • The reason that the relationship varies between a record and data files is that a record has strong semantic coherence, while data files may not have strong semantic coherence. A particular data file might contain many different kinds of information, or even bits and pieces of information, which sometimes cannot be eye readable without significant presentation processing and access software. In other words, semantic coherence is not a requirement for data files per se—the semantic coherence is realized by the presentation processing and access software and the human understanding gained through using that software.
  • The relationship between electronic records and data files, then, is potentially many-to-many at a portion level—a record might be composed of one or more portions of data files, and data files might contain one or more portions of electronic records.
  • Based on Examples 10-13, it should be appreciated that the gap between electronic records (conceptual view) and data files (physical view) must be bridged. As the InterPARES I Preservation Task Force concluded, “Digital data inscribed on a physical medium do not have the form of a record. It is necessary to transform the inscribed bits into the form of the record.” (“Preserving Electronic Records,” Presentation on the work of the InterPARES I Preservation Task Force, Jun. 19, 2002)
  • The present invention provides a solution to the gap between electronic records and data files by adding a logical view which transforms between the conceptual and physical views. To perform this task, the present invention provides a “digital component extractor.” As used herein, the term “digital component extractor” is defined as a software component that extracts digital components from a data file set, guided by a set of instructions. A “digital component” is defined herein as a set of digital information that exhibits strong semantic coherence and is expressed as a bit stream.
  • The purpose of the digital component extractor is to extract digital components from data files in a data file set that together comprise a record. FIG. 5 illustrates the model, which bridges the gap between electronic records and data files.
  • One implication of this model is that electronic records are composed of digital components (which exhibit strong semantic coherence) and not data files (which can exhibit any range of semantic coherence, including none whatsoever). Another implication is that digital component extractors are instructed as to how to extract digital components from data file sets.
  • Digital component extractors establish the map between data files and electronic records, and because this map is many-to-many, the exact method by which digital component extractors extract digital components varies. Consider the following examples:
  • EXAMPLE 14
  • If there is a one-to-one correspondence between a record and a data file, the digital component extractor simply needs to return the specified data file as the digital component. For example, a digital component extractor for a record that corresponds to a single word processing document data file would simply return that data file as the digital component.
  • EXAMPLE 15
  • If a record is composed of portions from one data file, the digital component extractor includes an algorithm to extract portions of the specified data file. For example, a digital component extractor for a record that corresponds to an e-mail archive data file would extract individual e-mails as digital components.
  • EXAMPLE 16
  • If a record is composed of portions from more than one data file, the digital component extractor includes an algorithm to extract portions of the specified data files. For example, a digital component extractor for a record that corresponds to a document spread across multiple database tables (and data files) in a content management software application would perform appropriate queries on those database tables to extract the digital component.
  • Put another way, digital component extractors contain the instructions necessary to extract digital components from data file sets.
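  • Examples 14 and 15 can be sketched as two extractors that share one interface: each takes a data file set and returns a list of bit streams. This is a hedged sketch, not the implementation of the invention. The type aliases, function names, and file names are hypothetical, and the e-mail archive is assumed to be in a simplified mbox-style layout in which every message begins with a line starting with "From "; real archive formats need more careful parsing. An extractor for Example 16 would follow the same interface but read from several data files.

```python
from typing import Callable, Dict, List

DataFileSet = Dict[str, bytes]                      # data file name -> raw bytes
Extractor = Callable[[DataFileSet], List[bytes]]    # returns digital components


def whole_file_extractor(file_name: str) -> Extractor:
    """Example 14: return the named data file unchanged as one digital component."""
    def extract(files: DataFileSet) -> List[bytes]:
        return [files[file_name]]
    return extract


def mbox_extractor(file_name: str) -> Extractor:
    """Example 15: split a simplified mbox-style archive into one component per e-mail."""
    def extract(files: DataFileSet) -> List[bytes]:
        components: List[bytes] = []
        current: List[bytes] = []
        for line in files[file_name].splitlines(keepends=True):
            # A new "From " line closes the message collected so far.
            if line.startswith(b"From ") and current:
                components.append(b"".join(current))
                current = []
            current.append(line)
        if current:
            components.append(b"".join(current))
        return components
    return extract


files: DataFileSet = {
    "letter.doc": b"\xd0\xcf\x11\xe0 opaque word processing bytes",
    "archive.mbox": (
        b"From a@example.org Mon\nSubject: one\n\nHello\n"
        b"From b@example.org Tue\nSubject: two\n\nBye\n"
    ),
}

letter = whole_file_extractor("letter.doc")(files)
emails = mbox_extractor("archive.mbox")(files)
print(len(letter), len(emails))  # 1 2
```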
  • Table 2 documents the approaches for specifying digital component extractors, and their advantages and disadvantages.
  • TABLE 2
    Approach: The transferring entity defines the digital component extractors early in the records lifecycle, as the records are still in active use
    Advantages: The transferring entity defines semantic coherence early, which ensures that the information contained in the data files is accessible
    Disadvantages: Requires up-front planning and investment by the transferring entity, plus a change in how the transferring entity manages information

    Approach: The transferring entity (with assistance from the archivist) defines the digital component extractors after-the-fact, as part of preparing to transfer the electronic records to ERA
    Advantages: The transferring entity (with assistance from the archivist) generally has the subject area domain knowledge and technical knowledge to properly define semantic coherence
    Disadvantages: Requires a large time and resource investment at the exact point (records management offices) at which transferring entities are overburdened

    Approach: The ERA system itself imputes digital component extractors from record type templates and data type templates
    Advantages: The system can make reasonable assumptions about the digital component extractors in an automated manner
    Disadvantages: A human might make better assumptions than the automated ones, based on subjective judgment. Also, the system might not always be able to perform this imputation (for example, if key information is missing)

    Approach: An archivist defines the digital component extractors after-the-fact, during archival processing
    Advantages: The archivist generally has the subject area domain knowledge and technical knowledge to properly define semantic coherence
    Disadvantages: Requires a large time and resource investment from the archivist, which may not scale to meet the electronic record archive's expected ingest volumes

    Approach: The electronic record archive system itself imputes semantic coherence and therefore digital component extractors from the data file content
    Advantages: The system can apply linguistic and pattern matching algorithms to determine appropriate digital component extractors in an automated manner
    Disadvantages: This is an area of on-going computer science research, and at this time this requires further development.
  • It would be efficient for transferring entities to establish intellectual control over the semantic coherence of their electronic records as they develop their information systems, but this will not always happen. It would also be efficient if transferring entities, with assistance from the archivist, at least defined their electronic records before the point of transfer, but again this will not always happen, because this is a burden on records officers. The system of the present invention imputes digital component extractors from templates as discussed below, and this generally will be acceptable. In the cases where none of these approaches work, the ERA must allow archivists to establish intellectual control over the electronic records at an item level through defining the digital component extractors.
  • Generally, ERA imputing the digital component extractors from the relevant templates will work quite well. Consider this example:
  • EXAMPLE 17
  • The record type template indicates a particular set of records is correspondence, and the data file template indicates the data file is in Microsoft Outlook (PST) format. A reasonable set of digital component extractors can be imputed that extract individual e-mails into separate digital components. Each digital component represents an individual e-mail, which exhibits strong semantic coherence.
  • In some rare cases, there may be no workable digital component extractors, because they are not defined by either the transferring entity or archivist, and the ERA system cannot impute reasonable alternatives. Consider this example:
  • EXAMPLE 18
  • The record type template indicates a particular set of records is geospatial information, and the data file template is in an unknown proprietary format that is not human readable and not documented. ERA cannot impute a reasonable set of digital component extractors because it is not aware of the data type format.
  • In the case where there are no workable digital component extractors, the ERA of the present invention will create a default set of digital component extractors, known as “placeholder digital component extractors,” which are defined as a set of digital component extractors that assume each data file is a single digital component.
  • The levels of available preservation, access, and authenticity services that the ERA of the present invention can provide may be constrained for electronic records with placeholder digital component extractors, so these should be the exception rather than the norm. In other words, placeholder digital component extractors are only consistent with the most basic level of service in ERA.
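  • The imputation described in Examples 17 and 18, together with the fallback to placeholder digital component extractors, can be sketched as a lookup keyed on the record type and the data file type. Everything named here is an assumption for illustration: the rule table, the function names, and in particular toy_email_extractor, which splits on a made-up delimiter and merely stands in for real parsing of an e-mail archive format such as PST.

```python
from typing import Callable, Dict, List, Tuple

DataFileSet = Dict[str, bytes]                      # data file name -> raw bytes
Extractor = Callable[[DataFileSet], List[bytes]]    # returns digital components


def placeholder_extractor(files: DataFileSet) -> List[bytes]:
    """Placeholder behaviour: each data file is assumed to be one digital component."""
    return [files[name] for name in sorted(files)]


def toy_email_extractor(files: DataFileSet) -> List[bytes]:
    """Stand-in for an e-mail archive parser; splits on a hypothetical delimiter."""
    parts: List[bytes] = []
    for name in sorted(files):
        parts.extend(p for p in files[name].split(b"\n---\n") if p)
    return parts


# Hypothetical rule table: (record type, data file type) -> imputed extractor.
IMPUTATION_RULES: Dict[Tuple[str, str], Extractor] = {
    ("correspondence", "PST"): toy_email_extractor,
}


def impute_extractor(record_type: str, data_file_type: str) -> Tuple[Extractor, bool]:
    """Return (extractor, is_placeholder) for the given pair of templates."""
    rule = IMPUTATION_RULES.get((record_type, data_file_type))
    if rule is None:
        return placeholder_extractor, True
    return rule, False


# Example 17: correspondence held in an e-mail archive.
extract_mail, mail_is_placeholder = impute_extractor("correspondence", "PST")
# Example 18: geospatial information in an unknown format.
extract_geo, geo_is_placeholder = impute_extractor("geospatial information", "unknown")

print(mail_is_placeholder, geo_is_placeholder)  # False True
```

  • Returning the is_placeholder flag alongside the extractor is a design choice of this sketch; it lets a caller record that only the most basic level of service applies to the resulting digital components.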
  • All of the entities modeled by the present invention, such as electronic records, record aggregates, digital components, data files, etc., must be identifiable and resolvable. An approach to identifiers is more fully documented in co-pending, commonly assigned U.S. Application (Attorney Docket 4870-9), filed Apr. 26, 2007, entitled SYSTEM AND METHOD FOR AN IMMUTABLE IDENTIFICATION SCHEME IN A LARGE SCALE COMPUTER SYSTEM.
  • All identifiers within the ERA must exhibit the following characteristics:
      • The identifier must resolve to the entity which it identifies
      • The identifier must be guaranteed unique across the ERA identifier namespace
      • The identifier for a particular entity must be immutable
      • The identifier system must scale to ten teraobjects
  • An approach to generating identifiers according to the present invention involves using a cryptographic hash algorithm (such as SHA-256) based on the initial content of the thing being identified. This approach meets the required constraints.
  • It should be noted that some entities have an identity which is independent of their content. For example, the identity of a record is independent of the content of the digital components and/or data files that make up any particular version of that record. New versions of electronic records can arise from redaction and preservation activities, and each record version will have its own independent identifier that is related back to the record.
  • In these cases, the identifier will be generated from the content of the entity when it is first created within ERA and immutable thereafter. Thus, the identifier for electronic records would be generated and assigned when the record is created within ERA based on the content of the first version's digital components, and that identifier would be immutable thereafter.
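  • A minimal sketch of this identifier approach follows, using SHA-256 from Python's standard hashlib module. The function and class names are hypothetical, and the way several digital components are combined into one hash (each component prefixed with its length) is an assumption of this sketch; the specification does not state how components are combined.

```python
import hashlib
from typing import Iterable, List


def content_identifier(initial_content: bytes) -> str:
    """Identifier derived from the initial content of a single entity."""
    return hashlib.sha256(initial_content).hexdigest()


def record_identifier(first_version_components: Iterable[bytes]) -> str:
    """Identifier derived from the first version's digital components, in order.

    Each component is length-prefixed so that component boundaries affect the
    result (an assumption of this sketch).
    """
    digest = hashlib.sha256()
    for component in first_version_components:
        digest.update(len(component).to_bytes(8, "big"))
        digest.update(component)
    return digest.hexdigest()


class ArchivedRecord:
    """Hypothetical record whose identifier is fixed when it is first created."""

    def __init__(self, components: Iterable[bytes]) -> None:
        first_version: List[bytes] = list(components)
        self.identifier = record_identifier(first_version)
        self.versions: List[List[bytes]] = [first_version]

    def add_version(self, components: Iterable[bytes]) -> None:
        # Later versions (e.g. redacted ones) never change self.identifier.
        self.versions.append(list(components))


record = ArchivedRecord([b"original text of the record"])
identifier_at_creation = record.identifier
record.add_version([b"redacted text of the record"])
print(record.identifier == identifier_at_creation)  # True
```

  • Note that under this sketch two entities with byte-identical initial content would hash to the same value; how such a case is handled is not addressed here.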
  • An approach to preservation and authenticity issues is more fully documented in co-pending, commonly assigned U.S. application (Attorney Docket 4870-25), entitled SYSTEM AND METHOD FOR PRESERVATION OF DIGITAL RECORDS.
  • The notion of digital components and digital component extractors has some interesting implications for preservation. The InterPARES I Preservation Task Force states “It is impossible to preserve an electronic record. It is only possible to preserve the ability to reproduce an electronic record.” (“Preserving Electronic Records”, Presentation on the work of the InterPARES I Preservation Task Force, Jun. 19, 2002.) A record's digital components, along with access software, allow reproduction of the electronic record. As such, the preservation strategy of the present invention ensures the digital component extractors produce digital components that authentically represent the record. This means that digital component extractors must honor the essential characteristics associated with the record (and which are specified in the record type template).
  • The process of redaction involves deleting specific content from a record to produce a new version of the record, and the new version of the record typically has reduced access restrictions.
  • In the electronic record context, digital content is contained in both data files and digital components, so in theory redaction (deleting digital content) could occur in either place. In practice, most redaction tools redact content from data files, so the present invention will support this approach. This means that redaction will occur against data files, which will produce a new version of the data files, and the digital component extractors will produce new digital components from these redacted data files. This process will result in a new version of the record that is composed of redacted digital components that have been extracted from redacted data files.
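  • The redaction flow just described (redact the data files, then re-run the digital component extractor on the redacted files) can be sketched as follows. The function names and sample content are hypothetical, and replacing the sensitive bytes with a visible mask rather than deleting them outright is a choice of this sketch.

```python
from typing import Callable, Dict, List

DataFileSet = Dict[str, bytes]                      # data file name -> raw bytes
Extractor = Callable[[DataFileSet], List[bytes]]    # returns digital components


def redact_data_files(
    files: DataFileSet, sensitive: bytes, mask: bytes = b"[REDACTED]"
) -> DataFileSet:
    """Return a new version of the data files; the originals are left untouched."""
    return {name: data.replace(sensitive, mask) for name, data in files.items()}


def whole_files(files: DataFileSet) -> List[bytes]:
    """Simple extractor: each data file is one digital component."""
    return [files[name] for name in sorted(files)]


def new_record_version(
    files: DataFileSet, extractor: Extractor, sensitive: bytes
) -> List[bytes]:
    """Redact the data files, then extract digital components from the result."""
    redacted_files = redact_data_files(files, sensitive)
    return extractor(redacted_files)


original = {"memo.txt": b"Unit moved to Hill 402 at dawn."}
version_2 = new_record_version(original, whole_files, b"Hill 402")
print(version_2)  # [b'Unit moved to [REDACTED] at dawn.']
```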
  • Like records, original order and arrangement are conceptual and not physical. Thus, order and arrangement both apply to records, but not data files. The order of data files is essentially arbitrary and meaningless from an archival context, since data files exhibit low semantic coherence.
  • It is possible that electronic records might have no meaningful original order, in the same way paper records might have no meaningful original order. In these cases, the present invention will follow the advice of Frank Boles in “Disrespecting Original Order” to maintain records in a state of simple usability. (Boles, F., “Disrespecting Original Order”, The American Archivist, Vol. 45 No. 1, pp. 26-32, 1982.) Simple usability for electronic records implies dynamic sorting, filtering, and querying capabilities.
  • It is possible that the digital component extractors of the present invention will be executed to produce a physical representation of a digital component. In this case, a digital component would be a bit stream serialized as a managed file within the system. It is also possible that the digital component extractors will be executed on-demand to produce a transient digital component, as needed. In this case, a digital component would be a transient in-memory bit stream. The present invention allows for both options, and the decision on which to use will be a matter of policy and design.
  • Templates play a large part in NARA's vision of the ERA both as a means to manage electronic records, in respect to scheduling, and as a means to preserve records, in respect to defining preservation formats and processing.
  • Because there are many potential applications of templates, and because templates are sometimes described by examples of documents that conform to the templates rather than the template itself, there is a need to define what templates are and how they are used.
  • As discussed in more detail below, the present invention utilizes a taxonomy of templates and the relationships between templates and instances of templates to identify and manage records. The present invention also utilizes the relationship between hierarchical templates and hierarchical information using a matrix. Furthermore, the present invention provides for managing templates.
  • It is helpful to begin with an example of templates and instances of templates, and to provide an illustrative listing of some kinds of templates that might be used within the ERA system of the present invention.
  • According to the present invention, the use of templates may be associated with all of the following:
      • To describe the structure and content of record life cycle documents that the system will help create and manage. This includes templates for Transfer Agreements, Disposition Agreements, Preservation Plans, etc.
      • To describe the presentation of documents.
      • To define the relationship between assets within the archive (such as the original order of records) and within transfers of records to the archive.
      • To describe the structure and content of archival metadata, the contextual information which, together with the digital objects it describes forms the records. This includes archival description elements and life cycle data elements.
      • To describe components and resources within the system itself. Instances of these templates include data type format templates, templates that describe digital adaptation processes, and resources such as Authorities Sources.
      • To describe the operation of the ERA system itself. Instances of these templates define operations such as work flow processes that orchestrate the use of ERA system services.
  • It can therefore be seen that templates are being used according to the present invention to:
      • Describe the content and structure of a document—what data elements it should contain and any relationships between those data elements
      • Describe the content and structure of the metadata that describes a document.
      • Describe how a document should be presented to a user, how would its content be laid out on a screen or a printed page, and when appropriate to describe the choreography of the presentation of different digital objects
      • Serve as a manifest to list all the documents contained within some collection of documents.
      • Serve as a catalog of documents describing the relationships between them.
      • Serve as components within the ERA system, providing processing instructions for operations that take place, such as the orchestration of work flows or digital adaptation processing.
      • Describe components of the ERA system, such as specific data type formats.
  • Some of these uses of templates have been described with reference to instantiations of the templates and some have been described with reference to the templates themselves. It is necessary to distinguish between templates and instances of templates.
  • Using XML technologies as an example, the following describes templates, and instances of documents that conform to or are generated by those templates, that might be used in the preservation and presentation of a document displayed on a web page.
  • The first template is an XML schema that defines the structure of the record catalog which lists the digital objects that are part of the web page and their hierarchical relationships. An instance of that template is a selection from the record catalog for the page in question.
  • Referring to FIG. 6, the next template might be an XML schema that defines the content and structure of the document that is to be displayed on the page. Each data element in the document is defined. The relationship(s) of each data element to other data elements are also defined.
  • Referring to FIG. 7, an instance of the template of FIG. 6 is an XML document (the textual content of the document) that conforms to that schema and which includes the data elements and content of the type defined in the schema. The instance has data elements described in the schema that hold values, which are also consistent with the schema.
  • Referring to FIG. 8, the next template might be an XSL template that defines the presentation of that XML instance in HTML on the web page (or in some other format such as PDF). The XSL template may be a stylesheet, or other type of template, and can be used to describe how an XML instance that conforms to an XML schema will be presented or displayed, for example as HTML or a PDF file. The template can also be used to transform an XML document into a variety of other formats, as well as into a different XML document.
  • Other types of templates may orchestrate a sequence of pages. The instantiation of that template is the web page, which is the record that is being preserved.
  • Additional templates may be involved in defining the behavior of a web application, including templates that define the work flow within the application, templates that define the orchestration of pages within the application and templates that describe the animation of items on a page.
  • Table 3 provides an overview of some of the types of templates that may occur in the ERA of the present invention. Although each example has been mapped to an appropriate XML syntax that might be used to create the template, it should be appreciated that the present invention is not limited to the use of any particular format. It should also be appreciated that the list of templates in Table 3 is not intended to be exhaustive. There are many possible applications for templates and there are other XML technologies, and non-XML technologies, which may be used.
  • TABLE 3
    Application of Template | Indicative XML Syntax | Examples
    1. Record Structure Templates
    Structure of Records; Record Catalog entries | XML Schema, METS | Record Catalog; Submission Information Package
    2. Lifecycle Documents
    Structure and content of Life Cycle documents | XML Schema | Transfer Agreement; Disposition Agreement; Preservation Plan
    Layout of documents on screen or paper | XSL, XSL-FO | Presentation of documents
    3. Archival Metadata (information specific to a record or a part of a record)
    Structure and content of Archival Description | XML Schema | Origin, Provenance, Content, Context, etc.
    Structure and content of Life cycle Data | XML Schema | Additions to life cycle data
    4. System Components (an information component of the system, or description of a component of the system)
    Structure of Authority Sources and Thesauri | XML Schema | Authority Sources
    Structure and content of Persistent Object Formats (POF) *(1) | XML Schema | Persistent Formats where content is primarily words, numbers, vectors etc.
    Structure and content of Persistent Object Formats (POF) *(1) | BSDL | Persistent Formats where content is primarily images, sound, etc.
    Digital Adaptation Instructions, non-exhaustive list *(2) | XSL/T | Data type specific processing templates to transform from one data type to another
    Presentation of multimedia records | SMIL | Templates to define interactions between multiple digital items in multimedia presentations
    5. System Metadata
    Description and versioning of templates | XML Schema | Disposition Agreement template
    6. Identity & Rights
    Structure and content of User Profiles | XML Schema | User profiles
    Authorization Requests/Responses | SAML | Authorization of users
    Access Restrictions & Rights | XACML | Definition of access privileges for specific records
    7. Service Architecture
    Work flow Processes | BPEL | Orchestration of services involved in business processes, such as managing a FOIA request
    Services | WSDL | Inputs and outputs of individual services
  • Templates may be used to define the relationships between records in the archives, such as defining the original order of records, the structure of the record catalog, and the structure of transfers to the archives or the delivery of copies to users (Submission Information Packages and Dissemination Information Packages).
  • Capturing the original order of a record represents a case where a template can be used within a template. The structure of the Record Catalog can be described in a template that defines the information elements that make up an entry in the catalog. The content of some of those information elements may be other templates, or they may become values in the instantiation of an object that conforms to another template.
  • Templates may be used to define the content and structure of records schedules and other Life Cycle Documents.
  • Templates may be used to define the structure of record description, and the elements of information that compose the metadata of records.
  • A template for Archival Metadata, which includes description and Life cycle data, will define which elements of information must be present, what type of information they should contain, and how they are related to each other.
  • Templates may be used as inputs to processes that transform digital objects in the archive, including templates that may be used to define the presentation of assets to users.
  • The System component templates cover the widest variety of use of templates. This includes defining persistent object formats, defining the information needed by a processor to render those formats in a current format, defining the choreography and behaviors of objects in aggregate multimedia records, etc.
  • The System Components will be constantly evolving, adding new templates as new digital technologies evolve. Each type of system component will have its own family of templates.
  • Templates may be used to define the structure of component description. The ERA system will archive itself and be self-describing. Templates will define elements of information needed for components to be self describing.
  • Templates may also be used to define the nature and rights of entities and the access restrictions on assets in the archive.
  • A records-centric access model will define restrictions and rights in relation to records using the internal structure of the records themselves. Templates will define the instructions on records and create the framework for aligning identity—role—authorization to protect the records.
  • Templates may further be used to describe system services and orchestrate services within work flow processes.
  • The Service Architecture describes the arrangement and delivery of services in the ERA system of the present invention, including the work flow processes and the functionality at each step in the process. Templates, expressed for example in Business Process Execution Language (BPEL), may be used to describe the orchestration of functional services and, at a lower level, to describe the inputs and outputs of each individual functional service, using for example Web Services Description Language (WSDL).
  • A hierarchical scheme according to the present invention may be implemented for managing templates. The introduction of hierarchy to the management of templates adds another level of abstraction. A template abstracts from a specific instance to the general case. Such a template is associated with a single type of object. With hierarchy, another layer of abstraction may be added that can be applied to any of: 1) the template, 2) the content which it controls, or 3) both.
  • As an object subject to a hierarchical arrangement, the template becomes a mirror of the organization of objects into increasingly larger aggregate structures, which is a method of organization common to the ERA system of the present invention as a whole.
  • Templates can have a hierarchical connotation either because: (a) the template itself can only be instantiated with reference to a hierarchy of templates which collectively define its content, or (b) the object the template describes can only be instantiated with reference to a hierarchy of digital items or conceptual arrangements of digital items.
  • In the first case (a), instantiating the template requires retrieving elements from within different templates within a hierarchy. For example, Life Cycle Data document templates (Transfer Agreements, Disposition Agreements, etc) will have their own specific information elements but will also likely share a set of information elements common to all Life Cycle Data documents.
  • The template hierarchy might look like:
  • ERA.xsd (elements common to the ERA, such as identifiers)
      • Life_Cycle_Documents.xsd (elements common to all Life Cycle documents)
        • Transfer_Agreement.xsd (e.g. SF-258 specific elements)
        • Disposition_Agreement.xsd (e.g. SF-115 specific elements)
        • Preservation_Plan.xsd (elements specific to this template).
  • In XML Schema, this may be implemented by having each template in each child level of the template hierarchy begin with an <include/> instruction that incorporates in the child template all the data elements described in its parent, which in turn will <include/> all the data elements in its parent, etc.
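  • The chain of includes described above can be sketched as follows. This is a minimal Python illustration, assuming heavily abbreviated schema documents held in memory; the element names `Identifier`, `Version`, and `TransferDate` are hypothetical and do not come from any actual ERA schema. The function walks the `schemaLocation` of each `xs:include` and gathers the element declarations of a child template together with those of all its ancestors.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, heavily abbreviated schema documents keyed by file name.
SCHEMAS = {
    "ERA.xsd": """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="Identifier" type="xs:string"/>
        </xs:schema>""",
    "Life_Cycle_Documents.xsd": """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:include schemaLocation="ERA.xsd"/>
          <xs:element name="Version" type="xs:string"/>
        </xs:schema>""",
    "Transfer_Agreement.xsd": """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:include schemaLocation="Life_Cycle_Documents.xsd"/>
          <xs:element name="TransferDate" type="xs:date"/>
        </xs:schema>""",
}


def collect_elements(name, seen=None):
    """Return element names declared by a schema and by everything it includes."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against circular includes
        return []
    seen.add(name)
    root = ET.fromstring(SCHEMAS[name])
    elements = []
    # Parent declarations come first so the result reads from general to specific.
    for inc in root.findall(f"{XS}include"):
        elements += collect_elements(inc.get("schemaLocation"), seen)
    elements += [el.get("name") for el in root.findall(f"{XS}element")]
    return elements


transfer_elements = collect_elements("Transfer_Agreement.xsd")
```

  • In this sketch the Transfer Agreement template ends up with the ERA-wide and Life Cycle elements in addition to its own, which is the effect the include chain is meant to achieve. A real schema processor would resolve includes itself; the explicit walk is shown only to make the inheritance visible.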
  • In the second case (b), to instantiate a document that conforms to a template requires retrieving elements of information from hierarchically organized assets within the archive.
  • For example, the template for archival metadata may include elements of information, some of which are associated with a record catalog item that represents the entire record as a concept (the parent or root element of the record), while other elements of information are associated with individual digital items that are components of the record.
  • To create a document that represents the archival metadata for a specific digital item, and which conforms to the archival metadata template, requires retrieving all the information elements from each level in the record's internal hierarchy from that digital item up to the record's “root”.
  • For example, suppose that the family of a noted physicist donates her personal papers to NARA. The record hierarchy might look like:
  • Curie Collection
      Family Papers
      Professional Papers
        Research Activities
          Reagents
  • Metadata that describes the <Origin> of the record will likely be associated with the highest level in the record hierarchy, the “//Curie Collection” level, as the description of <Origin> applies to all the documents in that collection.
  • Metadata that describes the <Digital Object Type> of a specific document will be associated with a specific document, such as “//Curie Collection/Professional Papers/Research Activities/Reagents”.
  • To create an instance of the metadata for the “//Reagents” document requires the accretion of the metadata for itself and all its ancestors as we traverse the record hierarchy up to the collection level.
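  • The accretion of metadata along the record hierarchy can be sketched in Python as below. This is an illustrative sketch only: the metadata values are invented placeholders, and the rule that a value set closer to the item overrides an inherited value is an assumption not stated in the text. The walk runs from the root down to the item, which gathers the same levels as traversing from the item up to the collection.

```python
# Hypothetical metadata held at individual levels of the record hierarchy.
LEVEL_METADATA = {
    "Curie Collection": {"Origin": "Donated by the family"},
    "Curie Collection/Professional Papers": {},
    "Curie Collection/Professional Papers/Research Activities": {},
    "Curie Collection/Professional Papers/Research Activities/Reagents": {
        "Digital Object Type": "Text document"
    },
}


def accrete_metadata(path, level_metadata):
    """Merge metadata from the record root down to the item at `path`.

    Walking root-first means a value set closer to the item overrides
    the same key inherited from an ancestor (an assumed policy).
    """
    parts = path.split("/")
    merged = {}
    for depth in range(1, len(parts) + 1):
        ancestor = "/".join(parts[:depth])
        merged.update(level_metadata.get(ancestor, {}))
    return merged


reagents = accrete_metadata(
    "Curie Collection/Professional Papers/Research Activities/Reagents",
    LEVEL_METADATA,
)
```

  • The resulting dictionary for the Reagents document carries both its own Digital Object Type and the Origin inherited from the collection level, so a conforming metadata document only exists once the whole path has been traversed.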
  • The possible intersections of templates and hierarchies can be presented in a matrix as shown in Table 4. Along one axis are the templates, either derived from a hierarchy or self-contained. Along the other axis is the conforming content, again either derived from a hierarchy or self-contained.
  • Table 4 illustrates where some types of templates may fall in the matrix.
  • TABLE 4
    Content Axis
    Content Self-Contained: An object that conforms to the template is a self-contained object in its own right and conformance can be tested without reference to the hierarchy to which it belongs.
    Content Hierarchical: The creation of an object that conforms to the template is achieved by retrieving all references to it from each layer in the hierarchy. The conforming object accretes its content as it traverses the hierarchical tree and is only conforming at the end of the accretion process.
    Template Axis
    Template is Hierarchical: The template is an aggregation of template elements from a hierarchy of templates. Document conformance cannot be tested without including elements from the hierarchy.
    Template is Self-Contained: The template is a self-contained object. Document conformance can be tested without reference to any other template.
    Template Axis | Content Self-Contained | Content Hierarchical
    Template is Hierarchical | Life Cycle Document templates, where template is Life Cycle Document + generic Life Cycle Elements | Archival metadata, the schema for metadata may be instantiated by aggregating schemas within a hierarchy of metadata schemas, and the conforming metadata document may be created from the aggregation of all metadata elements traversing a record hierarchy.
    Template is Self-Contained | System metadata, such as persistent format definitions. Service Architecture templates; both the hierarchy of BPEL managing WSDL, and within WSDL the aggregation of generic WSDL and the web service specific elements described in XML Schema | n/a
  • In a self-describing system, each template is both a functional component of the system and a record in the system. As a record in the system, the template is treated the same as any other record, with its own metadata, life cycle management, and preservation. The ERA system of the present invention may be regarded, therefore, as an aggregate record, with its own hierarchy of documents, so that part of the ERA record hierarchy might look like:
  •   ERA
        System
          Templates
            System
              Workflow
                DispositionWorkflow.bpel (instance of BPEL template)
                  AddDescriptionService.wsdl (instance of WSDL template)
  • Each instance of a system component, including templates, has its own archival metadata (metadata that describes a record). This latter metadata makes the component self describing.
  • For example, a WSDL file is an instance of the template for defining a service and a BPEL file is an instance of the template that defines a work flow.
  • The archival metadata of the WSDL file will include information such as:
      • What does it do?
      • What work flow does it belong to?
      • What version is this, is it the current version?
      • How does it work—inputs, outputs?
      • Where did the code originate?
      • Are there intellectual rights associated with this web service?
      • What is the actual code?
  • This sort of information could be included in the WSDL file as comments (or <Documentation/> elements), but it would not be very manageable as a result. The system's record management functionality is based on archival metadata held exterior to the digital object that the metadata describes, so the system would not be able to apply that functionality to its own templates.
  • To make description of the system components manageable, they should be described using the same archival metadata templates as for any record.
  • While there will be a defined template for a service in the ERA (such as the XML Schema for WSDL), the present invention may use another template, the Archival Metadata schema, as the template to describe the service as a component of the system.
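  • Holding the description of a system component outside the component itself can be sketched in Python as follows. This is a hypothetical illustration, not the Archival Metadata schema of the invention: the `ArchivalMetadata` and `Catalog` classes, their field names, and all values are assumptions chosen to echo the questions listed above for the WSDL file.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivalMetadata:
    """Hypothetical archival metadata held outside the object it describes."""
    object_id: str   # identifier of the described digital object
    purpose: str     # what does it do?
    workflow: str    # what work flow does it belong to?
    version: str     # which version is this?
    is_current: bool  # is it the current version?
    origin: str      # where did the code originate?
    rights: str = "none recorded"


@dataclass
class Catalog:
    """Keeps descriptions separate from content so both can be managed as records."""
    content: dict = field(default_factory=dict)       # object_id -> raw bytes
    descriptions: dict = field(default_factory=dict)  # object_id -> ArchivalMetadata

    def add(self, data: bytes, meta: ArchivalMetadata) -> None:
        self.content[meta.object_id] = data
        self.descriptions[meta.object_id] = meta

    def current_components(self, workflow: str) -> list:
        """Query by metadata alone, without opening any stored object."""
        return sorted(
            m.object_id for m in self.descriptions.values()
            if m.workflow == workflow and m.is_current
        )


catalog = Catalog()
catalog.add(
    b"<definitions/>",  # placeholder content, not a real WSDL document
    ArchivalMetadata(
        "AddDescriptionService.wsdl", "Adds a description to a record",
        "DispositionWorkflow", "1.0", True, "placeholder origin",
    ),
)
current = catalog.current_components("DispositionWorkflow")
```

  • Because the query runs over the external descriptions only, the same management functions can be applied to a WSDL file, a BPEL file, or any other record without parsing the stored object.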
  • As templates evolve, the life cycle data elements in their description capture that evolution, such as the version. When a change to a template changes the behavior of the system, the earlier version of the template is preserved as a record so that the previous behavior of the system can be understood.
  • Templates will evolve as ERA evolves. As such, templates, as records in ERA, will be versioned and managed. Life cycle data elements or records will include the version of the templates they use. Versioning will allow new templates to be introduced without creating problems with validation. Whether life cycle content that is subject to validation against templates should be updated as templates evolve will be a policy decision applied to each template.
  • Each process to update a template may be a standard work flow in the ERA, and described in its own template, which will include appropriate approval and authorization steps as determined in policy.
  • Templates, as records, will have their own fixity information to ensure their integrity and the life cycle data of objects modified by templates will record which version of which template was used.
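  • Versioning, fixity, and the recording of which template version was used can be sketched together in Python. This is a minimal sketch under stated assumptions: the `TemplateRegistry` class and `life_cycle_entry` function are hypothetical, the template contents are placeholders, and SHA-256 is assumed as the fixity digest because the text does not name a particular algorithm.

```python
import hashlib


class TemplateRegistry:
    """Hypothetical registry that versions templates and records fixity digests."""

    def __init__(self):
        self._versions = {}  # template name -> list of (content, sha256 hex digest)

    def register(self, name: str, content: bytes) -> int:
        """Store a new version; earlier versions are kept, never overwritten."""
        digest = hashlib.sha256(content).hexdigest()
        self._versions.setdefault(name, []).append((content, digest))
        return len(self._versions[name])  # version numbers start at 1

    def verify(self, name: str, version: int) -> bool:
        """Recompute the digest and compare it with the stored fixity value."""
        content, digest = self._versions[name][version - 1]
        return hashlib.sha256(content).hexdigest() == digest

    def latest(self, name: str) -> int:
        """Return the number of the most recently registered version."""
        return len(self._versions[name])


def life_cycle_entry(record_id: str, template: str, registry: TemplateRegistry) -> dict:
    """Record which version of which template was applied to a record."""
    return {
        "record": record_id,
        "template": template,
        "template_version": registry.latest(template),
    }


registry = TemplateRegistry()
registry.register("Transfer_Agreement.xsd", b"<schema>v1</schema>")
entry_v1 = life_cycle_entry("record-001", "Transfer_Agreement.xsd", registry)
registry.register("Transfer_Agreement.xsd", b"<schema>v2</schema>")
entry_v2 = life_cycle_entry("record-002", "Transfer_Agreement.xsd", registry)
```

  • Because earlier versions are retained alongside their digests, a record processed under version 1 can still be interpreted against that version after version 2 is introduced.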
  • The concept of managing templates can be extended to apply to every component of the system. Each software component of the ERA system should be described and held in the ERA. This applies to platform applications, web application components, any client side components, as well as all the functionality wrapped in web services which can be managed within the concept of managing templates as described above.
  • The concept of preserving original arrangement can also be extended to the system itself, so as to describe in Archival Metadata how all the components are structurally linked, creating in essence a schema for the ERA itself.
  • While the invention has been described in connection with what are presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the invention. Also, the various embodiments described above may be implemented in conjunction with other embodiments, e.g., aspects of one embodiment may be combined with aspects of another embodiment to realize yet other embodiments.

Claims (20)

1. A method for managing electronic records, each electronic record comprising a data file, a plurality of data files, a portion of a data file, or portions of a plurality of data files, the electronic records comprising a plurality of record types and data file types, the method comprising:
forming a data file set comprising one or more logically related data files;
identifying attributes of each record type in a record type template;
identifying specifications of each data file type in a data file type template;
extracting digital components from the data file set, wherein the extracted digital components relate to the attributes in each record type template and the specifications in each data file type template and comprise an individual record.
2. A method according to claim 1, further comprising:
specifying in each record type template characteristics of authenticity of each record type.
3. A method according to claim 1, wherein the data files of the data file set are logically related for purposes of accessing the extracted digital components.
4. A method according to claim 3, wherein accessing the extracted digital components comprises presenting the individual record in human understandable form.
5. A method according to claim 3, wherein accessing the individual record comprises transforming, consolidating, tabulating, formatting, rendering, querying, filtering, and/or interpreting the individual record.
6. A method according to claim 4, wherein presenting the individual record comprises presenting the record perceptible to human senses.
7. A method according to claim 1, wherein the data files of the data file set are logically related by a manner of presentation.
8. A method according to claim 3, wherein the specifications of each data file type comprise instructions for accessing the individual record.
9. A method according to claim 1, wherein the data files of the data file set are logically related by information contained in the data files.
10. A method according to claim 1, further comprising:
extracting default digital components from the data file set when attributes of a record type and/or specifications of a data file type are unavailable.
11. An electronic record archive for managing electronic records, each electronic record comprising a data file, a plurality of data files, a portion of a data file, or portions of a plurality of data files, the electronic records comprising a plurality of record types and data file types, the electronic record archive comprising:
a data file set comprising one or more logically related data files;
a record type template for each record type, each record type template identifying attributes of each record type;
a data file type template for each data file type, each data file type template identifying specifications of each data file type; and
a digital component extractor configured to extract digital components from the data file set, wherein the extracted digital components relate to the attributes in each record type template and the specifications in each data file type template and comprise an individual record.
12. An electronic record archive according to claim 11, wherein each record type template specifies characteristics of authenticity of each record type.
13. An electronic record archive according to claim 11, wherein the data files of the data file set are logically related for purposes of accessing the extracted digital components.
14. An electronic record archive according to claim 13, further comprising an accessing component configured to present the individual record in human understandable form.
15. An electronic record archive according to claim 13, further comprising an accessing component configured to access the individual record by transformation, consolidation, tabulation, formation, rendition, questioning, filtering, and/or interpretation of the individual record.
16. An electronic record archive according to claim 14, wherein the accessing component is configured to present the individual record perceptible to human senses.
17. An electronic record archive according to claim 11, wherein the data files of the data file set are logically related by a manner of presentation.
18. An electronic record archive according to claim 13, wherein the specifications of each data file type comprise instructions for accessing the individual record.
19. An electronic record archive according to claim 11, wherein the data files of the data file set are logically related by information contained in the data files.
20. An electronic record archive according to claim 11, wherein the digital component extractor is configured to extract default digital components from the data file set when attributes of a record type and/or specifications of a data file type are unavailable.
US11/797,644 2006-05-05 2007-05-04 System and method for managing records through establishing semantic coherence of related digital components including the identification of the digital components using templates Abandoned US20070260575A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/797,644 US20070260575A1 (en) 2006-05-05 2007-05-04 System and method for managing records through establishing semantic coherence of related digital components including the identification of the digital components using templates

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US79775406P 2006-05-05 2006-05-05
US80287506P 2006-05-24 2006-05-24
US11/797,644 US20070260575A1 (en) 2006-05-05 2007-05-04 System and method for managing records through establishing semantic coherence of related digital components including the identification of the digital components using templates

Publications (1)

Publication Number Publication Date
US20070260575A1 (en) 2007-11-08

Family

ID=38442289

Family Applications (8)

Application Number Title Priority Date Filing Date
US11/785,814 Abandoned US20080005194A1 (en) 2006-05-05 2007-04-20 System and method for immutably cataloging and storing electronic assets in a large scale computer system
US11/790,560 Expired - Fee Related US7711702B2 (en) 2006-05-05 2007-04-26 System and method for immutably cataloging electronic assets in a large-scale computer system
US11/790,562 Active 2028-03-28 US7783596B2 (en) 2006-05-05 2007-04-26 System and method for an immutable identification scheme in a large-scale computer system
US11/790,561 Expired - Fee Related US7711703B2 (en) 2006-05-05 2007-04-26 System and method for immutably storing electronic assets in a large-scale computer system
US11/797,278 Active 2028-08-01 US7792791B2 (en) 2006-05-05 2007-05-02 Systems and methods for establishing authenticity of electronic records in an archives system
US11/797,644 Abandoned US20070260575A1 (en) 2006-05-05 2007-05-04 System and method for managing records through establishing semantic coherence of related digital components including the identification of the digital components using templates
US11/797,643 Active 2030-12-19 US8726351B2 (en) 2006-05-05 2007-05-04 Systems and methods for controlling access to electronic records in an archives system
US11/797,567 Expired - Fee Related US8087063B2 (en) 2006-05-05 2007-05-04 System and method for preservation of digital records

Country Status (3)

Country Link
US (8) US20080005194A1 (en)
EP (8) EP1855218A3 (en)
CA (8) CA2587457A1 (en)

US20080005194A1 (en) * 2006-05-05 2008-01-03 Lockheed Martin Corporation System and method for immutably cataloging and storing electronic assets in a large scale computer system
US8762418B1 (en) * 2006-05-31 2014-06-24 Oracle America, Inc. Metadata that allows refiltering and data reclassification without accessing the data
US20070299828A1 (en) * 2006-06-05 2007-12-27 Digital Mountain, Inc. Method and Apparatus for Processing Heterogeneous Data
US7823064B1 (en) * 2007-08-30 2010-10-26 Adobe Systems Incorporated Interleaving compressed archives within a page description language file
US9135322B2 (en) 2006-09-18 2015-09-15 Emc Corporation Environment classification
US20080082960A1 (en) * 2006-09-29 2008-04-03 Mcdougal Monty D Method and System For Controlling The Release of Data For Multiple-Level Security Systems
JP4089742B2 (en) * 2006-10-13 2008-05-28 富士ゼロックス株式会社 Document management system and document disposal apparatus
US9183321B2 (en) * 2006-10-16 2015-11-10 Oracle International Corporation Managing compound XML documents in a repository
US7765195B2 (en) * 2006-11-07 2010-07-27 Microsoft Corporation Trimmed and merged search result sets in a versioned data environment
US7840537B2 (en) 2006-12-22 2010-11-23 Commvault Systems, Inc. System and method for storing redundant information
US20080172737A1 (en) * 2007-01-11 2008-07-17 Jinmei Shen Secure Electronic Medical Record Management Using Hierarchically Determined and Recursively Limited Authorized Access
US20080184131A1 (en) * 2007-01-31 2008-07-31 Solar Turbines Inc. Method for providing an asset criticality tool
US8615567B2 (en) * 2007-02-20 2013-12-24 International Business Machines Corporation Systems and methods for services exchange
JP4810469B2 (en) * 2007-03-02 2011-11-09 株式会社東芝 Search support device, program, and search support system
US9880980B2 (en) * 2007-03-05 2018-01-30 International Business Machines Corporation Document transformation performance via incremental fragment transformations
US8726235B2 (en) * 2007-03-29 2014-05-13 Verizon Patent And Licensing Inc. Telecom business-oriented taxonomy for reusable services
US20080270381A1 (en) * 2007-04-24 2008-10-30 Interse A/S Enterprise-Wide Information Management System for Enhancing Search Queries to Improve Search Result Quality
US7752207B2 (en) * 2007-05-01 2010-07-06 Oracle International Corporation Crawlable applications
US7664779B1 (en) * 2007-06-29 2010-02-16 Emc Corporation Processing of a generalized directed object graph for storage in a relational database
US8086646B2 (en) * 2007-07-20 2011-12-27 Sap Ag Scheme-based identifier
US8091138B2 (en) * 2007-09-06 2012-01-03 International Business Machines Corporation Method and apparatus for controlling the presentation of confidential content
US9461890B1 (en) 2007-09-28 2016-10-04 Emc Corporation Delegation of data management policy in an information management system
US9323901B1 (en) 2007-09-28 2016-04-26 Emc Corporation Data classification for digital rights management
US9134916B1 (en) * 2007-09-28 2015-09-15 Emc Corporation Managing content in a distributed system
US8868720B1 (en) 2007-09-28 2014-10-21 Emc Corporation Delegation of discovery functions in information management system
US8548964B1 (en) * 2007-09-28 2013-10-01 Emc Corporation Delegation of data classification using common language
US9141658B1 (en) 2007-09-28 2015-09-22 Emc Corporation Data classification and management for risk mitigation
JP5040580B2 (en) * 2007-10-18 2012-10-03 富士ゼロックス株式会社 Document management system and program
JP2009099073A (en) * 2007-10-19 2009-05-07 Fuji Xerox Co Ltd Document processing history management system, document processing history management device and program
US8832076B2 (en) 2007-10-19 2014-09-09 Oracle International Corporation Search server architecture using a search engine adapter
EP2270724A3 (en) * 2007-10-22 2013-01-16 Open Text S.A. Method and system for managing enterprise content
US8856313B2 (en) * 2007-11-13 2014-10-07 International Business Machines Corporation Systems and methods for using provenance information for data retention in stream-processing
US7991777B2 (en) 2007-12-03 2011-08-02 Microsoft International Holdings B.V. Method for improving search efficiency in enterprise search system
US7974965B2 (en) * 2007-12-17 2011-07-05 International Business Machines Corporation Federated pagination management
US7913167B2 (en) * 2007-12-19 2011-03-22 Microsoft Corporation Selective document redaction
US20090164528A1 (en) * 2007-12-21 2009-06-25 Dell Products L.P. Information Handling System Personalization
US9128946B2 (en) * 2007-12-31 2015-09-08 Mastercard International Incorporated Systems and methods for platform-independent data file transfers
US9589250B2 (en) * 2008-01-11 2017-03-07 Oracle International Corporation System and method for asset registration workflows utilizing an eventing infrastructure in a metadata repository
US9652346B2 (en) 2008-01-24 2017-05-16 Symcor Inc. Data consistency control method and software for a distributed replicated database system
US8387115B2 (en) * 2008-02-21 2013-02-26 Syracuse University Active access control system and method
US8849765B2 (en) * 2008-04-22 2014-09-30 Anne Marina Faggionato System and method for providing a permanent data record for a creative work
FR2932043B1 (en) * 2008-06-03 2010-07-30 Groupe Ecoles Telecomm METHOD FOR TRACEABILITY AND RESURGENCE OF PUSH-STARTED FLOWS ON COMMUNICATION NETWORKS, AND METHOD FOR TRANSMITTING INFORMATION FLOW TO SECURE DATA TRAFFIC AND ITS RECIPIENTS
US7996429B2 (en) * 2008-06-12 2011-08-09 Novell, Inc. Mechanisms to persist hierarchical object relations
US8214765B2 (en) * 2008-06-20 2012-07-03 Microsoft Corporation Canvas approach for analytics
US9172709B2 (en) * 2008-06-24 2015-10-27 Raytheon Company Secure network portal
US8001154B2 (en) * 2008-06-26 2011-08-16 Microsoft Corporation Library description of the user interface for federated search results
US8646027B2 (en) * 2008-06-27 2014-02-04 Microsoft Corporation Workflow based authorization for content access
US7996394B2 (en) 2008-07-17 2011-08-09 International Business Machines Corporation System and method for performing advanced search in service registry system
US7966320B2 (en) 2008-07-18 2011-06-21 International Business Machines Corporation System and method for improving non-exact matching search in service registry system with custom dictionary
US8359357B2 (en) * 2008-07-21 2013-01-22 Raytheon Company Secure E-mail messaging system
US8972463B2 (en) * 2008-07-25 2015-03-03 International Business Machines Corporation Method and apparatus for functional integration of metadata
US8943087B2 (en) * 2008-07-25 2015-01-27 International Business Machines Corporation Processing data from diverse databases
US9110970B2 (en) * 2008-07-25 2015-08-18 International Business Machines Corporation Destructuring and restructuring relational data
US9727628B2 (en) * 2008-08-11 2017-08-08 Innography, Inc. System and method of applying globally unique identifiers to relate distributed data sources
US8219572B2 (en) 2008-08-29 2012-07-10 Oracle International Corporation System and method for searching enterprise application data
US8296317B2 (en) 2008-09-15 2012-10-23 Oracle International Corporation Searchable object network
US8335778B2 (en) 2008-09-17 2012-12-18 Oracle International Corporation System and method for semantic search in an enterprise application
US8046337B2 (en) * 2008-10-15 2011-10-25 International Business Machines Corporation Preservation aware fixity in digital preservation
US8230228B2 (en) * 2008-10-31 2012-07-24 International Business Machines Corporation Support of tamper detection for a log of records
US8229775B2 (en) 2008-11-06 2012-07-24 International Business Machines Corporation Processing of provenance data for automatic discovery of enterprise process information
US8209204B2 (en) 2008-11-06 2012-06-26 International Business Machines Corporation Influencing behavior of enterprise operations during process enactment using provenance data
US9053437B2 (en) 2008-11-06 2015-06-09 International Business Machines Corporation Extracting enterprise information through analysis of provenance data
US8135723B2 (en) * 2008-11-12 2012-03-13 Microsoft Corporation Leveraging low-latency memory access
US9996567B2 (en) 2014-05-30 2018-06-12 Georgetown University Process and framework for facilitating data sharing using a distributed hypergraph
US11226945B2 (en) 2008-11-14 2022-01-18 Georgetown University Process and framework for facilitating information sharing using a distributed hypergraph
US9805123B2 (en) * 2008-11-18 2017-10-31 Excalibur Ip, Llc System and method for data privacy in URL based context queries
US8572110B2 (en) * 2008-12-04 2013-10-29 Microsoft Corporation Textual search for numerical properties
US8359641B2 (en) * 2008-12-05 2013-01-22 Raytheon Company Multi-level secure information retrieval system
US8386565B2 (en) * 2008-12-29 2013-02-26 International Business Machines Corporation Communication integration between users in a virtual universe
US8140556B2 (en) * 2009-01-20 2012-03-20 Oracle International Corporation Techniques for automated generation of queries for querying ontologies
US8090683B2 (en) * 2009-02-23 2012-01-03 Iron Mountain Incorporated Managing workflow communication in a distributed storage system
US8397051B2 (en) * 2009-02-23 2013-03-12 Autonomy, Inc. Hybrid hash tables
US8145598B2 (en) 2009-02-23 2012-03-27 Iron Mountain Incorporated Methods and systems for single instance storage of asset parts
US8214401B2 (en) * 2009-02-26 2012-07-03 Oracle International Corporation Techniques for automated generation of ontologies for enterprise applications
EP2394228A4 (en) 2009-03-10 2013-01-23 Ebrary Inc Method and apparatus for real time text analysis and text navigation
US8027960B2 (en) * 2009-03-11 2011-09-27 International Business Machines Corporation Intelligent deletion of elements to maintain referential integrity of dynamically assembled components in a content management system
US8843435B1 (en) * 2009-03-12 2014-09-23 Pegasystems Inc. Techniques for dynamic data processing
US8392375B2 (en) * 2009-03-23 2013-03-05 Microsoft Corporation Perpetual archival of data
US8572675B2 (en) * 2009-04-03 2013-10-29 The Boeing Company System and method for facilitating the provision of web services across different internet security domains
US20100262508A1 (en) * 2009-04-10 2010-10-14 Will Volnak Method and system for an online library marketplace
US9031914B2 (en) * 2009-04-22 2015-05-12 International Business Machines Corporation Tier-based data management
US8578120B2 (en) 2009-05-22 2013-11-05 Commvault Systems, Inc. Block-level single instancing
US9276935B2 (en) * 2009-05-27 2016-03-01 Microsoft Technology Licensing, Llc Domain manager for extending digital-media longevity
US8214336B2 (en) * 2009-06-16 2012-07-03 International Business Machines Corporation Preservation of digital content
KR20100138700A (en) * 2009-06-25 2010-12-31 삼성전자주식회사 Method and apparatus for processing virtual world
US9235579B1 (en) 2009-06-26 2016-01-12 Symantec Corporation Scalable enterprise data archiving system
US10169599B2 (en) * 2009-08-26 2019-01-01 International Business Machines Corporation Data access control with flexible data disclosure
WO2011031773A2 (en) * 2009-09-08 2011-03-17 Zoom Catalog, Llc System and method to research documents in online libraries
US8689188B2 (en) * 2009-09-11 2014-04-01 International Business Machines Corporation System and method for analyzing alternatives in test plans
US8527955B2 (en) 2009-09-11 2013-09-03 International Business Machines Corporation System and method to classify automated code inspection services defect output for defect analysis
US8566805B2 (en) * 2009-09-11 2013-10-22 International Business Machines Corporation System and method to provide continuous calibration estimation and improvement options across a software integration life cycle
US8352237B2 (en) 2009-09-11 2013-01-08 International Business Machines Corporation System and method for system integration test (SIT) planning
US8539438B2 (en) * 2009-09-11 2013-09-17 International Business Machines Corporation System and method for efficient creation and reconciliation of macro and micro level test plans
US10235269B2 (en) * 2009-09-11 2019-03-19 International Business Machines Corporation System and method to produce business case metrics based on defect analysis starter (DAS) results
US8495583B2 (en) * 2009-09-11 2013-07-23 International Business Machines Corporation System and method to determine defect risks in software solutions
US8893086B2 (en) 2009-09-11 2014-11-18 International Business Machines Corporation System and method for resource modeling and simulation in test planning
US8578341B2 (en) 2009-09-11 2013-11-05 International Business Machines Corporation System and method to map defect reduction data to organizational maturity profiles for defect projection modeling
US8667458B2 (en) * 2009-09-11 2014-03-04 International Business Machines Corporation System and method to produce business case metrics based on code inspection service results
US9224007B2 (en) * 2009-09-15 2015-12-29 International Business Machines Corporation Search engine with privacy protection
EP2302536A1 (en) 2009-09-21 2011-03-30 Thomson Licensing System and method for automatically verifying storage of redundant contents into communication equipments, by data comparison
US8350677B2 (en) * 2009-10-12 2013-01-08 Dell Products, Lp System and method for integrating asset tagging with a manufacturing process
EP2323084A1 (en) * 2009-10-23 2011-05-18 Alcatel Lucent Artifact management method
US8176061B2 (en) * 2009-10-29 2012-05-08 Eastman Kodak Company Tracking digital assets on a distributed network
US8156140B2 (en) 2009-11-24 2012-04-10 International Business Machines Corporation Service oriented architecture enterprise service bus with advanced virtualization
US8364745B2 (en) 2009-11-24 2013-01-29 International Business Machines Corporation Service oriented architecture enterprise service bus with universal ports
US8260813B2 (en) * 2009-12-04 2012-09-04 International Business Machines Corporation Flexible data archival using a model-driven approach
US20110137872A1 (en) * 2009-12-04 2011-06-09 International Business Machines Corporation Model-driven data archival system having automated components
US8589439B2 (en) * 2009-12-04 2013-11-19 International Business Machines Corporation Pattern-based and rule-based data archive manager
US8600996B2 (en) * 2009-12-08 2013-12-03 Tripwire, Inc. Use of inference techniques to facilitate categorization of system change information
US8996684B2 (en) * 2009-12-08 2015-03-31 Tripwire, Inc. Scoring and interpreting change data through inference by correlating with change catalogs
US9741017B2 (en) * 2009-12-08 2017-08-22 Tripwire, Inc. Interpreting categorized change information in order to build and maintain change catalogs
US8566358B2 (en) * 2009-12-17 2013-10-22 International Business Machines Corporation Framework to populate and maintain a service oriented architecture industry model repository
US9111004B2 (en) * 2009-12-17 2015-08-18 International Business Machines Corporation Temporal scope translation of meta-models using semantic web technologies
US9026412B2 (en) 2009-12-17 2015-05-05 International Business Machines Corporation Managing and maintaining scope in a service oriented architecture industry model repository
US9600134B2 (en) 2009-12-29 2017-03-21 International Business Machines Corporation Selecting portions of computer-accessible documents for post-selection processing
US9009135B2 (en) * 2010-01-29 2015-04-14 Oracle International Corporation Method and apparatus for satisfying a search request using multiple search engines
US20110191145A1 (en) * 2010-02-02 2011-08-04 Bank Of America Corporation Digital Records Management
US8566823B2 (en) 2010-02-05 2013-10-22 Tripwire, Inc. Systems and methods for triggering scripts based upon an alert within a virtual infrastructure
US8875129B2 (en) * 2010-02-05 2014-10-28 Tripwire, Inc. Systems and methods for monitoring and alerting events that virtual machine software produces in a virtual infrastructure
US8868987B2 (en) * 2010-02-05 2014-10-21 Tripwire, Inc. Systems and methods for visual correlation of log events, configuration changes and conditions producing alerts in a virtual infrastructure
US8990185B2 (en) 2010-02-19 2015-03-24 International Business Machines Corporation Evaluating reference based operations in shared nothing parallelism systems
US8401370B2 (en) * 2010-03-09 2013-03-19 Dolby Laboratories Licensing Corporation Application tracks in audio/video containers
US20110225550A1 (en) * 2010-03-12 2011-09-15 Creedon Michael S System and method for displaying and navigating library information with a virtual library collections browser
US8725767B1 (en) * 2010-03-31 2014-05-13 Emc Corporation Multi-dimensional object model for storage management
CA2702133A1 (en) 2010-05-21 2010-07-24 Ibm Canada Limited - Ibm Canada Limitee Redistribute native xml index key shipping
US8768973B2 (en) 2010-05-26 2014-07-01 Pivotal Software, Inc. Apparatus and method for expanding a shared-nothing system
KR101208814B1 (en) * 2010-07-09 2012-12-06 엔에이치엔(주) System and method for providing search service
US9164998B2 (en) * 2010-07-29 2015-10-20 Sap Se Archive-system-independent archive-type objects
US20120030577A1 (en) * 2010-07-30 2012-02-02 International Business Machines Corporation System and method for data-driven web page navigation control
US8539165B2 (en) 2010-09-29 2013-09-17 International Business Machines Corporation Methods for managing ownership of redundant data and systems thereof
US20120078931A1 (en) * 2010-09-29 2012-03-29 International Business Machines Corporation Methods for managing ownership of redundant data and systems thereof
US8539154B2 (en) 2010-09-29 2013-09-17 International Business Machines Corporation Methods for managing ownership of redundant data and systems thereof
US8612682B2 (en) 2010-09-29 2013-12-17 International Business Machines Corporation Methods for managing ownership of redundant data and systems thereof
US8645636B2 (en) 2010-09-29 2014-02-04 International Business Machines Corporation Methods for managing ownership of redundant data and systems thereof
US8935492B2 (en) 2010-09-30 2015-01-13 Commvault Systems, Inc. Archiving data objects using secondary copies
US8538826B1 (en) * 2010-10-25 2013-09-17 Amazon Technologies, Inc. Applying restrictions to items
US8352491B2 (en) 2010-11-12 2013-01-08 International Business Machines Corporation Service oriented architecture (SOA) service registry system with enhanced search capability
US8560566B2 (en) 2010-11-12 2013-10-15 International Business Machines Corporation Search capability enhancement in service oriented architecture (SOA) service registry system
US8505047B2 (en) * 2010-11-20 2013-08-06 Motorola Solutions, Inc. Method and system for policy-based re-broadcast video on demand service
US20120131189A1 (en) * 2010-11-24 2012-05-24 Raytheon Company Apparatus and method for information sharing and privacy assurance
US10073844B1 (en) * 2010-11-24 2018-09-11 Federal Home Loan Mortgage Corporation (Freddie Mac) Accelerated system and method for providing data correction
US8881240B1 (en) * 2010-12-06 2014-11-04 Adobe Systems Incorporated Method and apparatus for automatically administrating access rights for confidential information
US9384198B2 (en) * 2010-12-10 2016-07-05 Vertafore, Inc. Agency management system and content management system integration
US8694548B2 (en) * 2011-01-02 2014-04-08 Cisco Technology, Inc. Defense-in-depth security for bytecode executables
US8819064B2 (en) * 2011-02-07 2014-08-26 Yahoo! Inc. Method and system for data provenance management in multi-layer systems
US8452670B2 (en) * 2011-02-08 2013-05-28 Strategic Pharmaceutical Solutions, Inc. Computer-enabled method and system for facilitating veterinary pharmaceutical and other animal-related product catalog customization
US9225694B1 (en) * 2011-02-24 2015-12-29 Mpulse Mobile, Inc. Mobile application secure data exchange
US8479302B1 (en) * 2011-02-28 2013-07-02 Emc Corporation Access control via organization charts
US8478753B2 (en) 2011-03-03 2013-07-02 International Business Machines Corporation Prioritizing search for non-exact matching service description in service oriented architecture (SOA) service registry system with advanced search capability
US8453048B2 (en) 2011-03-07 2013-05-28 Microsoft Corporation Time-based viewing of electronic documents
US9607084B2 (en) * 2011-03-11 2017-03-28 Cox Communications, Inc. Assigning a single master identifier to all related content assets
US10185741B2 (en) 2011-03-14 2019-01-22 Verisign, Inc. Smart navigation services
US9781091B2 (en) 2011-03-14 2017-10-03 Verisign, Inc. Provisioning for smart navigation services
US9811599B2 (en) 2011-03-14 2017-11-07 Verisign, Inc. Methods and systems for providing content provider-specified URL keyword navigation
US9646100B2 (en) 2011-03-14 2017-05-09 Verisign, Inc. Methods and systems for providing content provider-specified URL keyword navigation
US8326800B2 (en) * 2011-03-18 2012-12-04 Microsoft Corporation Seamless upgrades in a distributed database system
WO2012135722A1 (en) * 2011-03-30 2012-10-04 Google Inc. Using an update feed to capture and store documents for litigation hold and legal discovery
US8566842B2 (en) 2011-04-01 2013-10-22 International Business Machines Corporation Identification of a protocol used in a message
US20120290447A1 (en) * 2011-05-15 2012-11-15 Mar Hershenson On line advertising and electronic catalog processes and apparatus
US8577993B2 (en) * 2011-05-20 2013-11-05 International Business Machines Corporation Caching provenance information
US10242208B2 (en) * 2011-06-27 2019-03-26 Xerox Corporation System and method of managing multiple levels of privacy in documents
US9495393B2 (en) 2011-07-27 2016-11-15 EMC IP Holding Company, LLC System and method for reviewing role definitions
US20130054607A1 (en) * 2011-08-27 2013-02-28 Henry Gladney Method and System for Preparing Digital Information for Long-Term Preservation
US8423575B1 (en) 2011-09-29 2013-04-16 International Business Machines Corporation Presenting information from heterogeneous and distributed data sources with real time updates
US9043311B1 (en) 2011-10-20 2015-05-26 Amazon Technologies, Inc. Indexing data updates associated with an electronic catalog system
US9292521B1 (en) * 2011-10-20 2016-03-22 Amazon Technologies, Inc. Archiving and querying data updates associated with an electronic catalog system
US8442951B1 (en) 2011-12-07 2013-05-14 International Business Machines Corporation Processing archive content based on hierarchical classification levels
US9286303B1 (en) * 2011-12-22 2016-03-15 Emc Corporation Unified catalog service
US8938444B2 (en) * 2011-12-29 2015-01-20 Teradata Us, Inc. Techniques for external application-directed data partitioning in data exporting from a database management system
US8712994B2 (en) 2011-12-29 2014-04-29 Teradata US, Inc. Techniques for accessing a parallel database system via external programs using vertical and/or horizontal partitioning
US9195936B1 (en) 2011-12-30 2015-11-24 Pegasystems Inc. System and method for updating or modifying an application without manual coding
US9195853B2 (en) 2012-01-15 2015-11-24 International Business Machines Corporation Automated document redaction
US9311623B2 (en) * 2012-02-09 2016-04-12 International Business Machines Corporation System to view and manipulate artifacts at a temporal reference point
US9678956B2 (en) * 2012-02-17 2017-06-13 Kno2 Llc Data capturing and structuring method and system
US9020890B2 (en) 2012-03-30 2015-04-28 Commvault Systems, Inc. Smart archiving and data previewing for mobile devices
US20130304717A1 (en) * 2012-05-08 2013-11-14 General Electric Company Control system asset management
US9892278B2 (en) 2012-11-14 2018-02-13 International Business Machines Corporation Focused personal identifying information redaction
US20140156558A1 (en) * 2012-12-04 2014-06-05 Risconsulting Group Llc, The Collateral Mechanisms
US9633022B2 (en) 2012-12-28 2017-04-25 Commvault Systems, Inc. Backup and restoration for a deduplicated file system
US9223840B2 (en) * 2012-12-31 2015-12-29 Futurewei Technologies, Inc. Fast object fingerprints
US20140257893A1 (en) * 2013-03-08 2014-09-11 Gerard Nicol Method and System for Certification
US20140289185A1 (en) * 2013-03-20 2014-09-25 Marklogic Corporation Apparatus and Method for Policy Based Rebalancing in a Distributed Document-Oriented Database
US10057207B2 (en) * 2013-04-07 2018-08-21 Verisign, Inc. Smart navigation for shortened URLs
WO2015065470A1 (en) * 2013-11-01 2015-05-07 Longsand Limited Asset browsing and restoration over a network using on demand staging
US10671491B2 (en) * 2013-11-01 2020-06-02 Micro Focus Llc Asset browsing and restoration over a network using pre-staging and directory storage
US9507814B2 (en) 2013-12-10 2016-11-29 Vertafore, Inc. Bit level comparator systems and methods
US9294480B2 (en) * 2014-01-15 2016-03-22 Cisco Technology, Inc. Tracking and tracing information theft from information systems
US10324897B2 (en) 2014-01-27 2019-06-18 Commvault Systems, Inc. Techniques for serving archived electronic mail
US10069914B1 (en) 2014-04-21 2018-09-04 David Lane Smith Distributed storage system for long term data storage
US10120855B2 (en) * 2014-05-22 2018-11-06 International Business Machines Corporation Consolidation of web contents between web content management systems and digital asset management systems
US9747556B2 (en) 2014-08-20 2017-08-29 Vertafore, Inc. Automated customized web portal template generation systems and methods
SG11201701613YA (en) * 2014-09-03 2017-03-30 Dun & Bradstreet Corp System and process for analyzing, qualifying and ingesting sources of unstructured data via empirical attribution
US10257274B2 (en) * 2014-09-15 2019-04-09 Foundation for Research and Technology—Hellas (FORTH) Tiered heterogeneous fast layer shared storage substrate apparatuses, methods, and systems
EP3204858B9 (en) * 2014-10-07 2021-03-31 Optum, Inc. Highly secure networked system and methods for storage, processing, and transmission of sensitive personal information
US10469396B2 (en) 2014-10-10 2019-11-05 Pegasystems, Inc. Event processing with enhanced throughput
US10387834B2 (en) * 2015-01-21 2019-08-20 Palantir Technologies Inc. Systems and methods for accessing and storing snapshots of a remote application in a document
US9609023B2 (en) 2015-02-10 2017-03-28 International Business Machines Corporation System and method for software defined deployment of security appliances using policy templates
US11386107B1 (en) 2015-02-13 2022-07-12 Omnicom Media Group Holdings Inc. Variable data source dynamic and automatic ingestion and auditing platform apparatuses, methods and systems
US9946752B2 (en) * 2015-04-27 2018-04-17 Microsoft Technology Licensing, Llc Low-latency query processor
US10095807B2 (en) * 2015-04-28 2018-10-09 Microsoft Technology Licensing, Llc Linked data processor for database storage
US11720539B1 (en) * 2015-05-13 2023-08-08 United States Of America As Represented By The Administrator Of Nasa System and method for providing a climate data intercomparison and analytics service application programming interface
US10324914B2 (en) 2015-05-20 2019-06-18 Commvault Systems, Inc. Handling user queries against production and archive storage systems, such as for enterprise customers having large and/or numerous files
US10002157B2 (en) * 2015-06-15 2018-06-19 International Business Machines Corporation Automatic conflict resolution during software catalog import
US10725922B2 (en) 2015-06-25 2020-07-28 Intel Corporation Technologies for predictive file caching and synchronization
KR20170010574A (en) * 2015-07-20 2017-02-01 삼성전자주식회사 Information processing apparatus, image processing apparatus and control methods thereof
US9600400B1 (en) 2015-10-29 2017-03-21 Vertafore, Inc. Performance testing of web application components using image differentiation
US10380070B2 (en) * 2015-11-12 2019-08-13 International Business Machines Corporation Reading and writing a header and record on tape
US10713431B2 (en) 2015-12-29 2020-07-14 Accenture Global Solutions Limited Digital document processing based on document source or document type
US20220164840A1 (en) 2016-04-01 2022-05-26 OneTrust, LLC Data processing systems and methods for integrating privacy information management systems with data loss prevention tools or other tools for privacy design
US20170300673A1 (en) * 2016-04-19 2017-10-19 Brillio LLC Information apparatus and method for authorizing user of augment reality apparatus
US10698599B2 (en) 2016-06-03 2020-06-30 Pegasystems, Inc. Connecting graphical shapes using gestures
US10740348B2 (en) 2016-06-06 2020-08-11 Georgetown University Application programming interface and hypergraph transfer protocol supporting a global hypergraph approach to reducing complexity for accelerated multi-disciplinary scientific discovery
US11354434B2 (en) 2016-06-10 2022-06-07 OneTrust, LLC Data processing systems for verification of consent and notice processing and related methods
US11392720B2 (en) 2016-06-10 2022-07-19 OneTrust, LLC Data processing systems for verification of consent and notice processing and related methods
US12052289B2 (en) 2016-06-10 2024-07-30 OneTrust, LLC Data processing systems for data-transfer risk identification, cross-border visualization generation, and related methods
US11727141B2 (en) 2016-06-10 2023-08-15 OneTrust, LLC Data processing systems and methods for synching privacy-related user consent across multiple computing devices
US10284604B2 (en) 2016-06-10 2019-05-07 OneTrust, LLC Data processing and scanning systems for generating and populating a data inventory
US11675929B2 (en) 2016-06-10 2023-06-13 OneTrust, LLC Data processing consent sharing systems and related methods
US11520928B2 (en) 2016-06-10 2022-12-06 OneTrust, LLC Data processing systems for generating personal data receipts and related methods
US11354435B2 (en) 2016-06-10 2022-06-07 OneTrust, LLC Data processing systems for data testing to confirm data deletion and related methods
US11544667B2 (en) 2016-06-10 2023-01-03 OneTrust, LLC Data processing systems for generating and populating a data inventory
US11188615B2 (en) 2016-06-10 2021-11-30 OneTrust, LLC Data processing consent capture systems and related methods
US11416798B2 (en) 2016-06-10 2022-08-16 OneTrust, LLC Data processing systems and methods for providing training in a vendor procurement process
US12118121B2 (en) 2016-06-10 2024-10-15 OneTrust, LLC Data subject access request processing systems and related methods
US11586700B2 (en) 2016-06-10 2023-02-21 OneTrust, LLC Data processing systems and methods for automatically blocking the use of tracking tools
US11461500B2 (en) 2016-06-10 2022-10-04 OneTrust, LLC Data processing systems for cookie compliance testing with website scanning and related methods
US10997318B2 (en) * 2016-06-10 2021-05-04 OneTrust, LLC Data processing systems for generating and populating a data inventory for processing data access requests
US11481710B2 (en) 2016-06-10 2022-10-25 OneTrust, LLC Privacy management systems and methods
US11636171B2 (en) 2016-06-10 2023-04-25 OneTrust, LLC Data processing user interface monitoring systems and related methods
US12136055B2 (en) 2016-06-10 2024-11-05 OneTrust, LLC Data processing systems for identifying, assessing, and remediating data processing risks using data modeling techniques
US12045266B2 (en) 2016-06-10 2024-07-23 OneTrust, LLC Data processing systems for generating and populating a data inventory
US11475136B2 (en) 2016-06-10 2022-10-18 OneTrust, LLC Data processing systems for data transfer risk identification and related methods
US11188862B2 (en) 2016-06-10 2021-11-30 OneTrust, LLC Privacy management systems and methods
US10698647B2 (en) 2016-07-11 2020-06-30 Pegasystems Inc. Selective sharing for collaborative application usage
CN106156356A (en) * 2016-07-27 2016-11-23 北京电子科技学院 OAIS Information encapsulation method and system based on XML
US20180114179A1 (en) * 2016-10-24 2018-04-26 Simmonds Precision Products, Inc. Product life cycle model storage architecture
TWI734735B (en) * 2017-01-24 2021-08-01 香港商阿里巴巴集團服務有限公司 Terminal authenticity verification method, device and system
US10783112B2 (en) * 2017-03-27 2020-09-22 International Business Machines Corporation High performance compliance mechanism for structured and unstructured objects in an enterprise
CN108694067A (en) * 2017-04-06 2018-10-23 群晖科技股份有限公司 For carrying out the method and apparatus of storage space management for multiple virtual machines
US10521740B2 (en) * 2017-05-09 2019-12-31 Accenture Global Solutions Limited Automated ECM process migrator
US10013577B1 (en) 2017-06-16 2018-07-03 OneTrust, LLC Data processing systems for identifying whether cookies contain personally identifying information
US10635999B2 (en) * 2017-07-12 2020-04-28 Accurate Group Holdings, Llc Methods and systems for controlling a display screen with graphical objects for scheduling
US11100152B2 (en) 2017-08-17 2021-08-24 Target Brands, Inc. Data portal
US11055269B2 (en) * 2017-08-28 2021-07-06 GroupBy Inc. Efficient ingest and search of access controlled records
US10635700B2 (en) 2017-11-09 2020-04-28 Cloudera, Inc. Design-time information based on run-time artifacts in transient cloud-based distributed computing clusters
US10514948B2 (en) * 2017-11-09 2019-12-24 Cloudera, Inc. Information based on run-time artifacts in a distributed computing cluster
US11550811B2 (en) * 2017-12-22 2023-01-10 Scripps Networks Interactive, Inc. Cloud hybrid application storage management (CHASM) system
US11068569B2 (en) * 2017-12-22 2021-07-20 Barracuda Networks, Inc. Method and apparatus for human activity tracking and authenticity verification of human-originated digital assets
CN108389118B (en) 2018-02-14 2020-05-29 阿里巴巴集团控股有限公司 Asset management system, method and device and electronic equipment
US10524096B2 (en) * 2018-03-07 2019-12-31 Electronics And Telecommunications Research Institute Method of identifying internet of things group service based on object identifier
US11150632B2 (en) * 2018-03-16 2021-10-19 Yokogawa Electric Corporation System and method for field device management using class parameter set
US10878115B1 (en) * 2018-07-11 2020-12-29 Veeva Systems Inc. Record relationship change control in a content management system
US11048488B2 (en) 2018-08-14 2021-06-29 Pegasystems, Inc. Software code optimizer and method
US10803202B2 (en) 2018-09-07 2020-10-13 OneTrust, LLC Data processing systems for orphaned data identification and deletion and related methods
US11544409B2 (en) 2018-09-07 2023-01-03 OneTrust, LLC Data processing systems and methods for automatically protecting sensitive data within privacy management systems
EP3874383A1 (en) 2018-11-01 2021-09-08 rewardStyle, Inc. System and method for improved searching across multiple databases
US10999077B2 (en) 2019-01-02 2021-05-04 Bank Of America Corporation Data protection using sporadically generated universal tags
US11212106B2 (en) 2019-01-02 2021-12-28 Bank Of America Corporation Data protection using universal tagging
CN109697891B (en) * 2019-01-21 2021-04-23 南京苏宁软件技术有限公司 Method and system for monitoring starting state of automatic ship identification system
US11262979B2 (en) 2019-09-18 2022-03-01 Bank Of America Corporation Machine learning webpage accessibility testing tool
US11194833B2 (en) * 2019-10-28 2021-12-07 Charbel Gerges El Gemayel Interchange data format system and method
US11677754B2 (en) * 2019-12-09 2023-06-13 Daniel Chien Access control systems and methods
WO2022011142A1 (en) 2020-07-08 2022-01-13 OneTrust, LLC Systems and methods for targeted data discovery
WO2022026564A1 (en) 2020-07-28 2022-02-03 OneTrust, LLC Systems and methods for automatically blocking the use of tracking tools
US11003880B1 (en) 2020-08-05 2021-05-11 Georgetown University Method and system for contact tracing
US11567945B1 (en) 2020-08-27 2023-01-31 Pegasystems Inc. Customized digital content generation systems and methods
US11436373B2 (en) 2020-09-15 2022-09-06 OneTrust, LLC Data processing systems and methods for detecting tools for the automatic blocking of consent requests
US11321285B2 (en) 2020-10-01 2022-05-03 Bank Of America Corporation Automatic database script generation for copying data between relational databases
WO2022099023A1 (en) 2020-11-06 2022-05-12 OneTrust, LLC Systems and methods for identifying data processing activities based on data discovery results
US11782623B2 (en) * 2020-12-15 2023-10-10 International Business Machines Corporation Transferring an operating image into a multi-tenant environment
US11687528B2 (en) * 2021-01-25 2023-06-27 OneTrust, LLC Systems and methods for discovery, classification, and indexing of data in a native computing system
US11775348B2 (en) 2021-02-17 2023-10-03 OneTrust, LLC Managing custom workflows for domain objects defined within microservices
US11546661B2 (en) 2021-02-18 2023-01-03 OneTrust, LLC Selective redaction of media content
US11533315B2 (en) 2021-03-08 2022-12-20 OneTrust, LLC Data transfer discovery and analysis systems and related methods
US11562078B2 (en) 2021-04-16 2023-01-24 OneTrust, LLC Assessing and managing computational risk involved with integrating third party computing functionality within a computing system
CN113377755B (en) * 2021-06-23 2022-12-16 黑龙江大学 Integrity detection and missing repair method for electric power spot data
CN113360482B (en) * 2021-08-10 2021-11-30 深圳市中科鼎创科技股份有限公司 SQL database-based online migration method
US12056021B2 (en) * 2021-10-12 2024-08-06 Bmc Software, Inc. Database archiving and access across updates to database metadata
US11762857B2 (en) * 2022-02-18 2023-09-19 Capital One Services, Llc Methods and systems for searching data exchanges that comprise information on assets with non-homogenous functionality and non-standardized data descriptions
US20230409521A1 (en) * 2022-05-26 2023-12-21 Preservica Ltd Automatic preservation
US11620142B1 (en) 2022-06-03 2023-04-04 OneTrust, LLC Generating and customizing user interfaces for demonstrating functions of interactive user environments
WO2024173883A1 (en) * 2023-02-17 2024-08-22 Capital One Services, Llc Systems and methods for validating non-homogenous assets and executing operations across data exchanges that comprise non-standardized data descriptions using dynamically generated validation rules
GB2629761A (en) * 2023-04-18 2024-11-13 Preservica Ltd Data preservation
CN118410226B (en) * 2024-06-28 2024-09-24 万村联网数字科技有限公司 AI-based bad asset case search analysis system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5813009A (en) * 1995-07-28 1998-09-22 Univirtual Corp. Computer based records management system method
US6021202A (en) * 1996-12-20 2000-02-01 Financial Services Technology Consortium Method and system for processing electronic documents
US6263330B1 (en) * 1998-02-24 2001-07-17 Luc Bessette Method and apparatus for the management of data files
US20030193994A1 (en) * 2001-03-21 2003-10-16 Patrick Stickler Method of managing media components
US20070050333A1 (en) * 2005-08-31 2007-03-01 Sap Ag Archive indexing engine
US7200627B2 (en) * 2001-03-21 2007-04-03 Nokia Corporation Method and apparatus for generating a directory structure
US7246104B2 (en) * 2001-03-21 2007-07-17 Nokia Corporation Method and apparatus for information delivery with archive containing metadata in predetermined language and semantics
US7254570B2 (en) * 2001-03-21 2007-08-07 Nokia Corporation Query resolution system and service
US7353236B2 (en) * 2001-03-21 2008-04-01 Nokia Corporation Archive system and data maintenance method

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5633875A (en) * 1995-06-07 1997-05-27 General Electric Company Protocol and mechanism for centralized asset tracking communications
US6181336B1 (en) * 1996-05-31 2001-01-30 Silicon Graphics, Inc. Database-independent, scalable, object-oriented architecture and API for managing digital multimedia assets
US6023765A (en) 1996-12-06 2000-02-08 The United States Of America As Represented By The Secretary Of Commerce Implementation of role-based access control in multi-level secure systems
CA2295634C (en) * 1997-07-10 2007-11-27 Siemens Aktiengesellschaft Conveyor device
US6202066B1 (en) 1997-11-19 2001-03-13 The United States Of America As Represented By The Secretary Of Commerce Implementation of role/group permission association using object access type
US6088679A (en) 1997-12-01 2000-07-11 The United States Of America As Represented By The Secretary Of Commerce Workflow management employing role-based access control
EP1203315A1 (en) * 1999-06-15 2002-05-08 Kanisa Inc. System and method for document management based on a plurality of knowledge taxonomies
JP2001297026A (en) 2000-04-11 2001-10-26 Hitachi Ltd Computer system with a plurality of database management systems
US6678700B1 (en) 2000-04-27 2004-01-13 General Atomics System of and method for transparent management of data objects in containers across distributed heterogenous resources
WO2001090951A2 (en) 2000-05-19 2001-11-29 The Board Of Trustee Of The Leland Stanford Junior University An internet-linked system for directory protocol based data storage, retrieval and analysis
US6718335B1 (en) * 2000-05-31 2004-04-06 International Business Machines Corporation Datawarehouse including a meta data catalog
US6604110B1 (en) * 2000-08-31 2003-08-05 Ascential Software, Inc. Automated software code generation from a metadata-based repository
US6965904B2 (en) * 2001-03-02 2005-11-15 Zantaz, Inc. Query Service for electronic documents archived in a multi-dimensional storage space
US7593968B2 (en) * 2001-06-05 2009-09-22 Silicon Graphics, Inc. Recovery and relocation of a distributed name service in a cluster filesystem
US6950833B2 (en) * 2001-06-05 2005-09-27 Silicon Graphics, Inc. Clustered filesystem
US7617292B2 (en) * 2001-06-05 2009-11-10 Silicon Graphics International Multi-class heterogeneous clients in a clustered filesystem
US7200801B2 (en) * 2002-05-17 2007-04-03 Sap Aktiengesellschaft Rich media information portals
US8090590B2 (en) * 2003-03-10 2012-01-03 Intuit Inc. Electronic personal health record system
US20060123232A1 (en) * 2004-12-08 2006-06-08 International Business Machines Corporation Method for protecting and managing retention of data on worm media
US20070011109A1 (en) * 2005-06-23 2007-01-11 Microsoft Corporation Immortal information storage and access platform
WO2007016787A2 (en) * 2005-08-09 2007-02-15 Nexsan Technologies Canada Inc. Data archiving system
US20070061567A1 (en) * 2005-09-10 2007-03-15 Glen Day Digital information protection system
US20080005194A1 (en) 2006-05-05 2008-01-03 Lockheed Martin Corporation System and method for immutably cataloging and storing electronic assets in a large scale computer system

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5813009A (en) * 1995-07-28 1998-09-22 Univirtual Corp. Computer based records management system method
US6021202A (en) * 1996-12-20 2000-02-01 Financial Services Technology Consortium Method and system for processing electronic documents
US6209095B1 (en) * 1996-12-20 2001-03-27 Financial Services Technology Consortium Method and system for processing electronic documents
US6263330B1 (en) * 1998-02-24 2001-07-17 Luc Bessette Method and apparatus for the management of data files
US20030193994A1 (en) * 2001-03-21 2003-10-16 Patrick Stickler Method of managing media components
US7200627B2 (en) * 2001-03-21 2007-04-03 Nokia Corporation Method and apparatus for generating a directory structure
US7246104B2 (en) * 2001-03-21 2007-07-17 Nokia Corporation Method and apparatus for information delivery with archive containing metadata in predetermined language and semantics
US7254570B2 (en) * 2001-03-21 2007-08-07 Nokia Corporation Query resolution system and service
US7353236B2 (en) * 2001-03-21 2008-04-01 Nokia Corporation Archive system and data maintenance method
US20070050333A1 (en) * 2005-08-31 2007-03-01 Sap Ag Archive indexing engine

Cited By (83)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7940916B2 (en) * 2006-08-28 2011-05-10 Avaya Inc. Orchestration engine as an intermediary between telephony functions and business processes
US20080065498A1 (en) * 2006-08-28 2008-03-13 Avaya Technology Llc Orchestration Engine as an Intermediary Between Telephony Functions and Business Processes
US20080281836A1 (en) * 2007-02-06 2008-11-13 Access Systems Americas, Inc. system and method for displaying and navigating content on a electronic device
US20100287187A1 (en) * 2007-02-14 2010-11-11 Donglin Wang Method for query based on layout information
US8386943B2 (en) * 2007-02-14 2013-02-26 Sursen Corp. Method for query based on layout information
US7631019B2 (en) * 2007-05-31 2009-12-08 Red Hat, Inc. Distributing data across different backing data stores
US20080301123A1 (en) * 2007-05-31 2008-12-04 Schneider James P Distributing data across different backing data stores
US9305030B2 (en) 2008-01-09 2016-04-05 Med-Legal Technologies, Llc Records management system and methods
US8301611B2 (en) * 2008-01-09 2012-10-30 Med-Legal Technologies, Llc Records management system and method
US8458155B2 (en) 2008-01-09 2013-06-04 Med-Legal Technologies, Llc Records management system and method with excerpts
US20110164820A1 (en) * 2008-01-09 2011-07-07 Stephen Schneider Records Management System and Method
US8875306B2 (en) 2008-02-12 2014-10-28 Oracle International Corporation Customization restrictions for multi-layer XML customization
US8788542B2 (en) 2008-02-12 2014-07-22 Oracle International Corporation Customization syntax for multi-layer XML customization
US8966465B2 (en) 2008-02-12 2015-02-24 Oracle International Corporation Customization creation and update for multi-layer XML customization
US8229976B2 (en) * 2008-03-27 2012-07-24 Microsoft Corporation Data binding for XML schemas
US20090248730A1 (en) * 2008-03-27 2009-10-01 Microsoft Corporation Data Binding for XML Schemas
US9454521B1 (en) * 2008-04-08 2016-09-27 United Services Automobile Association (Usaa) Systems and methods for creating documents from templates
US7933930B1 (en) * 2008-04-08 2011-04-26 United Services Automobile Association (Usaa) Systems and methods for creating documents from templates
US8516007B1 (en) * 2008-04-08 2013-08-20 United Services Automobile Association (Usaa) Systems and methods for creating documents from templates
US8037101B1 (en) * 2008-04-08 2011-10-11 United Services Automobile Association (Usaa) Systems and methods for creating documents from templates
US8051103B1 (en) * 2008-04-08 2011-11-01 United Services Automobile Association (Usaa) Systems and methods for creating documents from templates
US9780965B2 (en) 2008-05-27 2017-10-03 Glue Networks Methods and systems for communicating using a virtual private network
US20100011009A1 (en) * 2008-07-08 2010-01-14 Caterpillar Inc. System and method for monitoring document conformance
US8996658B2 (en) 2008-09-03 2015-03-31 Oracle International Corporation System and method for integration of browser-based thin client applications within desktop rich client architecture
US9606778B2 (en) 2008-09-03 2017-03-28 Oracle International Corporation System and method for meta-data driven, semi-automated generation of web services based on existing applications
US10296373B2 (en) 2008-09-17 2019-05-21 Oracle International Corporation Generic wait service: pausing and resuming a plurality of BPEL processes arranged in correlation sets by a central generic wait server
US9122520B2 (en) 2008-09-17 2015-09-01 Oracle International Corporation Generic wait service: pausing a BPEL process
US8799319B2 (en) 2008-09-19 2014-08-05 Oracle International Corporation System and method for meta-data driven, semi-automated generation of web services based on existing applications
US8032551B2 (en) 2009-05-11 2011-10-04 Red Hat, Inc. Searching documents for successive hashed keywords
US8032550B2 (en) 2009-05-11 2011-10-04 Red Hat, Inc. Federated document search by keywords
US20100287173A1 (en) * 2009-05-11 2010-11-11 Red Hat, Inc. Searching Documents for Successive Hashed Keywords
US20100287172A1 (en) * 2009-05-11 2010-11-11 Red Hat, Inc. Federated Document Search by Keywords
US20100287171A1 (en) * 2009-05-11 2010-11-11 Red Hat, Inc. Federated Indexing from Hashed Primary Key Slices
US8037076B2 (en) 2009-05-11 2011-10-11 Red Hat, Inc. Federated indexing from hashed primary key slices
US8739125B2 (en) 2009-06-16 2014-05-27 Red Hat, Inc. Automated and unattended process for testing software applications
US20100318969A1 (en) * 2009-06-16 2010-12-16 Lukas Petrovicky Mechanism for Automated and Unattended Process for Testing Software Applications
US8402064B2 (en) * 2010-02-01 2013-03-19 Oracle International Corporation Orchestration of business processes using templates
US20110191383A1 (en) * 2010-02-01 2011-08-04 Oracle International Corporation Orchestration of business processes using templates
US20110219218A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Distributed order orchestration system with rollback checkpoints for adjusting long running order management fulfillment processes
US9904898B2 (en) 2010-03-05 2018-02-27 Oracle International Corporation Distributed order orchestration system with rules engine
US10061464B2 (en) 2010-03-05 2018-08-28 Oracle International Corporation Distributed order orchestration system with rollback checkpoints for adjusting long running order management fulfillment processes
US20110218842A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Distributed order orchestration system with rules engine
US8793262B2 (en) 2010-03-05 2014-07-29 Oracle International Corporation Correlating and mapping original orders with new orders for adjusting long running order management fulfillment processes
US20110218923A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Task layer service patterns for adjusting long running order management fulfillment processes for a distributed order orchestration system
US20110218924A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Distributed order orchestration system for adjusting long running order management fulfillment processes with delta attributes
US20110218926A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Saving order process state for adjusting long running order management fulfillment processes in a distributed order orchestration system
US20110218921A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Notify/inquire fulfillment systems before processing change requests for adjusting long running order management fulfillment processes in a distributed order orchestration system
US20110218925A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Change management framework in distributed order orchestration system
US20110218922A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Cost of change for adjusting long running order management fulfillment processes for a distributed order orchestration sytem
US10395205B2 (en) 2010-03-05 2019-08-27 Oracle International Corporation Cost of change for adjusting long running order management fulfillment processes for a distributed order orchestration system
US10789562B2 (en) 2010-03-05 2020-09-29 Oracle International Corporation Compensation patterns for adjusting long running order management fulfillment processes in an distributed order orchestration system
US20110218927A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Compensation patterns for adjusting long running order management fulfillment processes in an distributed order orchestration system
US20110218813A1 (en) * 2010-03-05 2011-09-08 Oracle International Corporation Correlating and mapping original orders with new orders for adjusting long running order management fulfillment processes
US9269075B2 (en) 2010-03-05 2016-02-23 Oracle International Corporation Distributed order orchestration system for adjusting long running order management fulfillment processes with delta attributes
US9658901B2 (en) 2010-11-12 2017-05-23 Oracle International Corporation Event-based orchestration in distributed order orchestration system
US8438146B2 (en) * 2011-06-30 2013-05-07 International Business Machines Corporation Generating containers for electronic records based on configurable parameters
US8935228B2 (en) 2011-06-30 2015-01-13 International Business Machines Corporation Generating containers for electronic records based on configureable parameters
US9053109B1 (en) * 2011-09-15 2015-06-09 Symantec Corporation Systems and methods for efficient data storage for content management systems
US20130086568A1 (en) * 2011-09-30 2013-04-04 Oracle International Corporation Optimizations using a bpel compiler
US8954942B2 (en) * 2011-09-30 2015-02-10 Oracle International Corporation Optimizations using a BPEL compiler
US10552769B2 (en) 2012-01-27 2020-02-04 Oracle International Corporation Status management framework in a distributed order orchestration system
US8762322B2 (en) 2012-05-22 2014-06-24 Oracle International Corporation Distributed order orchestration system with extensible flex field support
US9672560B2 (en) 2012-06-28 2017-06-06 Oracle International Corporation Distributed order orchestration system that transforms sales products to fulfillment products
US20140164895A1 (en) * 2012-12-11 2014-06-12 SmartOrg. Inc. Systems and methods for managing spreadsheet models
US9575950B2 (en) * 2012-12-11 2017-02-21 Smartorg, Inc. Systems and methods for managing spreadsheet models
US9760528B1 (en) 2013-03-14 2017-09-12 Glue Networks, Inc. Methods and systems for creating a network
US9928082B1 (en) 2013-03-19 2018-03-27 Gluware, Inc. Methods and systems for remote device configuration
US20140344313A1 (en) * 2013-05-15 2014-11-20 Oracle International Corporation Migration of data objects
US9286330B2 (en) * 2013-05-15 2016-03-15 Oracle International Corporation Migration of data objects
US9497283B2 (en) * 2013-12-13 2016-11-15 Oracle International Corporation System and method for providing data interoperability in a distributed data grid
US20150172405A1 (en) * 2013-12-13 2015-06-18 Oracle International Corporation System and method for providing data interoperability in a distributed data grid
US9785412B1 (en) 2015-02-27 2017-10-10 Glue Networks, Inc. Methods and systems for object-oriented modeling of networks
US10838664B2 (en) * 2015-05-29 2020-11-17 Pure Storage, Inc. Determining a storage location according to legal requirements
US20170068691A1 (en) * 2015-05-29 2017-03-09 International Business Machines Corporation Determining a storage location according to legal requirements
US11550515B1 (en) 2015-05-29 2023-01-10 Pure Storage, Inc. Determining a storage location according to data retention policies
US11886752B2 (en) 2015-05-29 2024-01-30 Pure Storage, Inc. Method for determining the legal basis for transfer of a data object
US10235417B1 (en) * 2015-09-02 2019-03-19 Amazon Technologies, Inc. Partitioned search of log events
US10853359B1 (en) 2015-12-21 2020-12-01 Amazon Technologies, Inc. Data log stream processing using probabilistic data structures
US10719492B1 (en) 2016-12-07 2020-07-21 GrayMeta, Inc. Automatic reconciliation and consolidation of disparate repositories
US20190129697A1 (en) * 2017-10-31 2019-05-02 EMC IP Holding Company LLC Management of data using templates
US10977016B2 (en) * 2017-10-31 2021-04-13 EMC IP Holding Company LLC Management of data using templates
US11086901B2 (en) 2018-01-31 2021-08-10 EMC IP Holding Company LLC Method and system for efficient data replication in big data environment
US11621857B2 (en) 2020-09-03 2023-04-04 Seagate Technology Llc Fingerprint and provenance for movable storage devices

Also Published As

Publication number Publication date
CA2587757A1 (en) 2007-11-05
EP1852795A1 (en) 2007-11-07
EP1862925A3 (en) 2008-08-13
EP1862925A2 (en) 2007-12-05
US20070283417A1 (en) 2007-12-06
EP1855219A2 (en) 2007-11-14
CA2587459A1 (en) 2007-11-05
US20070271258A1 (en) 2007-11-22
CA2587462C (en) 2014-01-28
CA2587462A1 (en) 2007-11-05
CA2587759A1 (en) 2007-11-05
US20080005194A1 (en) 2008-01-03
US7792791B2 (en) 2010-09-07
EP1855218A3 (en) 2007-11-28
CA2587457A1 (en) 2007-11-05
CA2587454A1 (en) 2007-11-05
US20070260621A1 (en) 2007-11-08
US8087063B2 (en) 2011-12-27
EP1852815A1 (en) 2007-11-07
EP1852794A2 (en) 2007-11-07
EP1852793A2 (en) 2007-11-07
US7711702B2 (en) 2010-05-04
CA2587758A1 (en) 2007-11-05
US8726351B2 (en) 2014-05-13
EP1855220A3 (en) 2007-12-26
EP1852793A3 (en) 2007-11-28
CA2587397A1 (en) 2007-11-05
CA2587757C (en) 2014-04-22
US7783596B2 (en) 2010-08-24
EP1855220A2 (en) 2007-11-14
EP1852794A3 (en) 2007-11-28
US20070260476A1 (en) 2007-11-08
US20070260620A1 (en) 2007-11-08
US7711703B2 (en) 2010-05-04
CA2587759C (en) 2012-07-17
US20080072290A1 (en) 2008-03-20
EP1855218A2 (en) 2007-11-14
EP1855219A3 (en) 2007-11-28

Similar Documents

Publication Publication Date Title
CA2587757C (en) A system and method for managing electronic records
Lavoie The open archival information system reference model: Introductory guide
Beagrie et al. A strategic policy framework for creating and preserving digital collections: a report to the Digital Archiving Working Group
Zhang Original order in digital archives
Caplan The Florida Digital Archive and DAITSS: a working preservation repository based on format migration
Jantz et al. Digital archiving and preservation: Technologies and processes for a trusted repository
Hitchcock et al. Preservation for Institutional Repositories: practical and invisible
Sebastian et al. The Art of SQL Server FILESTREAM
Sathiadas et al. Document Management Techniques and Technologies
Shepherd et al. Are ISO 15489‐1: 2001 and ISAD (G) compatible? Part 2
Eisenberg et al. Building an electronic records archive at the National Archives and Records Administration: Recommendations for initial development
Aldeias Open Archival Information Systems for Database Preservation
James TRANSLATING THEORY TO PRACTICE
Triebsees et al. Controlled Migration in Digital Archives.
Bulatovic et al. eSciDoc-a service infrastructure for management of Cultural Heritage content
Hinrichs et al. Sustainability of Linguistic Data and Analysis in the Context of a Collaborative eScience Environment.
Semple Digital Archives Research Project: A report and recommendations
Zeng et al. Research and practice of electronic resources preservation in Tsinghua University Library
Abrams Harvard University Cambridge, Massachusetts USA David Seaman Digital Library Federation

Legal Events

Date Code Title Description
AS Assignment
Owner name: FENESTRA TECHNOLOGIES CORPORATION, MARYLAND
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ROGERS, ROY S., IV;MCKENNIREY, MATTHEW J.;REEL/FRAME:019401/0662
Effective date: 20070504

Owner name: TESSELLA INC., MASSACHUSETTS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EVANS, MARK J.;REEL/FRAME:019401/0525
Effective date: 20070504

Owner name: HUNTER INFORMATION MANAGEMENT SERVICES, INC., NEW
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HUNTER, GREGORY S.;REEL/FRAME:019401/0205
Effective date: 20070504

Owner name: LOCKHEED MARTIN CORPORATION, MARYLAND
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ROBINSON, FRED Y.;RIPLEY, RODNEY J.;REEL/FRAME:019401/0319
Effective date: 20070504

STCB Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION