US20110167129A1 - System, apparatus and method for encryption and decryption of data transmitted over a network - Google Patents

System, apparatus and method for encryption and decryption of data transmitted over a network

Info

Publication number
US20110167129A1
Authority
US
United States
Prior art keywords
text
processed
server
input text
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/982,690
Other versions
US9002976B2 (en)
Inventor
Ben Matzkel
Maayan Tal
Aviad Lahav
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cyberark Software Ltd
Original Assignee
Vaultive Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/IL2009/000901 (WO2010029559A1)
Application filed by Vaultive Ltd
Priority to US12/982,690 (US9002976B2)
Assigned to VAULTIVE LTD. Assignment of assignors interest (see document for details). Assignors: LAHAV, AVIAD; MATZKEL, BEN; TAL, MAAYAN
Publication of US20110167129A1
Application granted
Publication of US9002976B2
Assigned to CYBERARK SOFTWARE LTD. Assignment of assignors interest (see document for details). Assignors: VAULTIVE LTD.
Legal status: Active
Adjusted expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0471 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload applying encryption by an intermediary, e.g. receiving clear information at the intermediary and encrypting the received information at the intermediary before forwarding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/06 Message adaptation to terminal or network requirements
    • H04L51/066 Format adaptation, e.g. format conversion or compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/42 Mailbox-related aspects, e.g. synchronisation of mailboxes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/226 Delivery according to priorities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/045 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply hybrid encryption, i.e. combination of symmetric and asymmetric encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/126 Applying verification of the received information the source of the received data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Definitions

  • SaaS Software as a Service
  • a variety of encryption schemes are used to render data unintelligible to anyone who does not possess the appropriate decryption methods or keys.
  • application providers may enable and/or require an information owner to encrypt data in transit between a client and a host using secure socket layer (SSL) encryption or another method.
  • SSL secure socket layer
  • ISP internet service provider
  • The data is accordingly decrypted upon arrival at the hosted application, and the hosted application vendor may view and manipulate the owner's unencrypted data.
  • this method exposes the sensitive data at the hosted application vendor.
  • U.S. Pat. No. 7,165,175 describes an apparatus and method for selectively encrypting portions of data sent over a network between client and server.
  • The apparatus includes parsing means for separating a first portion of the data from a second portion of the data, encrypting means for encrypting only the first portion of the data, and combining means for combining the encrypted first portion of the data with the second portion of the data.
  • the apparatus further includes decrypting means installed at the client for decrypting the encrypted portion of the data.
  • PCT Patent Publication Number WO 01/047205 discloses enhanced computer network encryption using downloaded software objects.
  • This application describes a method and a system for securing highly sensitive financial and other data contained in transmissions over a public network, such as the World Wide Web, linking a web server computer to a remote client computer.
  • By determining a desired (usually strong) specific standard of encryption for all sensitive communications between web server and client and “pushing” the capability to encrypt to such standard to the client by automatically downloading from the web server to the client, and executing within the client's web browser, software objects to perform encryption/decryption tasks pursuant to the chosen standard, strong encryption is readily assured even if the client did not originally have such strong encryption capabilities.
  • embodiments of the invention may include a system and method for receiving input text at an intermediate device from the client device; processing the input text at the intermediate module to obtain processed text, wherein the processing comprises including bait in the processed text; transmitting the processed text to the server; upon request, receiving at the intermediate module transformed processed text from the server, the server having applied at least one of the plurality of transformations to the processed text to obtain said transformed processed text; and determining by the intermediate module at least one of the transformations applied by the server based on a comparison between the processed text and the transformed processed text.
  • Some embodiments of the invention may further include applying a reverse transformation on the processed text to obtain unprocessed input text; and modifying the unprocessed input text based on the at least one determined transformation. Some embodiments of the invention may yet further include sending the modified unprocessed input text to the client device.
  • At least one transformation of the plurality of transformations comprises replacement of at least one transformable character in the processed text with a matching replacement character or replacement character string, and including bait in the processed text comprises including the at least one transformable character in the processed text.
  • Some embodiments of the invention may yet further include applying a reverse transformation on the processed text to obtain unprocessed input text; and modifying the unprocessed input text by replacing the at least one transformable character in the unprocessed input text with the matching replacement character or replacement character string.
  • Some embodiments of the invention may yet further include sending the modified unprocessed input text to the client device.
  • At least one transformation of the plurality of transformations may comprise omitting HTML tags in the processed text, and including bait in said processed text comprises including an HTML tag in the processed text.
  • Some embodiments of the invention may further include applying a reverse transformation on the processed text to obtain unprocessed input text; modifying the unprocessed input text by omitting HTML tags contained therein; and sending the modified unprocessed input text to the client device.
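The bait mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the bait string `BAIT`, the `[[`/`]]` delimiters, and the function names are all hypothetical, and the only server transformations modeled are HTML escaping of transformable characters and omission of HTML tags.

```python
import html
import re
from typing import Tuple

# Hypothetical bait: one HTML tag pair wrapped around a transformable character.
BAIT = "<b>&</b>"
# Hypothetical delimiters that mark where the bait sits in the processed text.
MARK_OPEN, MARK_CLOSE = "[[", "]]"


def strip_tags(text: str) -> str:
    """Crude tag removal, standing in for a server that omits HTML tags."""
    return re.sub(r"<[^>]*>", "", text)


def add_bait(processed: str) -> str:
    """Append the bait to already-processed (e.g. encrypted) text."""
    return f"{processed}{MARK_OPEN}{BAIT}{MARK_CLOSE}"


def detect_transformation(returned: str) -> Tuple[str, str]:
    """Split returned text into payload and bait, then name what the server did."""
    start = returned.rindex(MARK_OPEN)
    end = returned.rindex(MARK_CLOSE)
    payload = returned[:start]
    bait = returned[start + len(MARK_OPEN):end]
    if bait == BAIT:
        return payload, "none"
    if bait == html.escape(BAIT):
        return payload, "html_escape"
    if bait == strip_tags(BAIT):
        return payload, "strip_tags"
    return payload, "unknown"


def apply_transformation(plaintext: str, name: str) -> str:
    """Re-apply the detected server transformation to the decrypted text."""
    if name == "html_escape":
        return html.escape(plaintext)
    if name == "strip_tags":
        return strip_tags(plaintext)
    return plaintext
```

In this sketch the intermediate module would call `add_bait` before sending, `detect_transformation` on whatever the server returns, decrypt the payload, and then call `apply_transformation` so that the client sees the text as the server intended to present it.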
  • FIG. 1 illustrates a system including an intermediate module and its environment according to an embodiment of the invention
  • FIG. 2 illustrates a flow of data from a client terminal to a network node, according to an embodiment of the invention
  • FIG. 3 illustrates a flow of data from a network node to a client terminal, according to an embodiment of the invention
  • FIG. 4 illustrates a method for encrypting data allowing server-side searching and indexing of encrypted data, according to an embodiment of the invention
  • FIG. 5 illustrates an example of a normalization process and an input text that includes a sentence
  • FIG. 6 illustrates an example for processing a word, according to an embodiment of the invention
  • FIG. 7 illustrates a method for encrypting data allowing server-side sorting of encrypted data, according to an embodiment of the invention
  • FIG. 8 illustrates a method of generating an order preserving function, according to an embodiment of the invention
  • FIG. 9 illustrates an example of three generated order-preserving encryption functions using three different keys, according to an embodiment of the invention.
  • FIG. 10 schematically illustrates a flow of data enabling searching of encrypted user data in an embodiment of the present invention.
  • the terms “plurality” and “a plurality” as used herein may include, for example, “multiple” or “two or more”.
  • the terms “plurality” or “a plurality” may be used throughout the specification to describe two or more components, devices, elements, units, parameters, or the like. Unless explicitly stated, the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed at the same point in time.
  • A computing system may be any suitable article, processor, chip, controller or computing device for processing data as described herein, as well as for controlling components in a device.
  • the task of data processing may be distributed among a number of controllers, processors or computing systems.
  • An intermediate module or processor associated therewith may include a controller that may be, for example, a central processing unit (CPU), a chip, or any suitable computing or computational device.
  • CPU central processing unit
  • Reference to memory may be or may include, for example, a Random Access Memory (RAM), a read only memory (ROM), a Dynamic RAM (DRAM), a Synchronous DRAM (SD-RAM), a double data rate (DDR) memory chip, a Flash memory, a volatile memory, a non-volatile memory, a cache memory, a buffer, a short term memory unit, a long term memory unit, or other suitable memory units or storage units.
  • RAM Random Access Memory
  • ROM read only memory
  • DRAM Dynamic RAM
  • SD-RAM Synchronous DRAM
  • DDR double data rate
  • Memory may be or may include a plurality of, possibly different memory units.
  • Reference to a data storage device may be or may include, for example, a hard disk drive, a floppy disk drive, a Compact Disk (CD) drive, a CD-Recordable (CD-R) drive, a universal serial bus (USB) device, a redundant array of independent disks (RAID), or any other suitable removable and/or fixed storage unit.
  • CD Compact Disk
  • CD-R CD-Recordable
  • USB universal serial bus
  • RAID redundant array of independent disks
  • Embodiments of the invention may include an article such as a computer or processor non-transitory readable medium, or a computer or processor non-transitory storage medium, such as for example a memory, a disk drive, or a USB flash memory, encoding, including or storing instructions, e.g., computer-executable instructions, which when executed by a processor or controller, carry out methods disclosed herein.
  • An intermediate module according to embodiments of the invention may include software, hardware, firmware or any combination thereof.
  • FIG. 1 illustrates a system including an intermediate module 200 and its environment according to an embodiment of the invention, as well as a flow of data from a client module at workstation 230 to an application service provider at network node 260.
  • Intermediate module 200 may include an interception module 210 and a data protection module 220 .
  • Intermediate module 200 may be operatively connected to a client terminal 230 , e.g., a trusted workstation, and to a network node 260 , e.g., an application service provider, via a network, such as public network 250 .
  • It will be understood that FIG. 1 is an exemplary embodiment of the invention, and that other network configurations are possible.
  • trusted workstation 230 and intermediate module 200 may be remote from each other, for example, operatively connected over a trusted network link.
  • trusted workstation 230 may be connected to a plurality of intermediate modules, including for a plurality of organizations, and intermediate their data traffic with one or more application service providers over a public network.
  • the module may reside on the client device, at a gateway server, e.g., on premises associated with the client device, or at a separate server or servers in communication with the trusted client device and the untrusted server.
  • the interception and/or data protection modules may be installed on the trusted workstation, possibly as a browser plug-in, possibly as an operating system driver or module, possibly as a software library and possibly as another software component.
  • the intermediate module may be positioned right in front of the untrusted application, where all accesses to the untrusted application pass through the intermediate module.
  • The intermediate module may be a separate server to which the client module transmits input data, and which in turn transmits the processed data to the untrusted server.
  • a trusted workstation 230 may be a client computer having installed thereon a client component 240 that may interact with the intermediate module.
  • Client component 240 may be a web application HTML form running in a web browser while network node 260 can be an HTTP web server of a SaaS vendor.
  • Client component 240 can include API client software and, additionally or alternatively, any other method of remotely accessing network node 260 .
  • End users can use client component 240 to enter, retrieve and manipulate data, intended to be passed to, or retrieved from, network node 260 .
  • End users may include human users utilizing a software agent (e.g. a web browser) and automated agents using a client API.
  • Interception module 210 of intermediate module 200 may intercept or otherwise receive input (unprocessed) text from trusted workstation 230 , and provide the input text to data protection module 220 for processing. Interception module 210 may intercept the data flowing between client component 240 and network node 260 , can modify it, and can interfere with the normal data flow. For example, the interception module may trigger an authentication session in order to determine that an end user can access data stored in network node 260 . Interception module 210 can be (or be executed by) a web proxy server.
  • Data protection module 220 may receive input text and process it selectively. Input text that is not selected to be processed may be transmitted as unprocessed text to network node 260 for manipulation and/or storage in storage system 270 substantially without processing, or with less processing than text selected for processing. For text to be processed, data protection module 220 may process the input text to provide processed text, which may be provided over public network 250 to untrusted application service provider 260 for storage, manipulation, etc. According to embodiments of the invention, therefore, application service provider 260 may thereby not receive the unprocessed text, but rather store and manipulate processed text. As described below, the processing may include applying a search- and/or sort-enabling encryption scheme, to thereby provide encrypted text data. According to embodiments of the invention, the processing may selectively encrypt text, selecting which input text to transmit to application service provider 260 in processed form, and which input text to transmit in unprocessed form.
  • intermediate module 200 may include one or more servers, one or more workstations, one or more personal computers, one or more laptop computers, one or more media players, one or more personal data accessories, one or more integrated circuits, and/or one or more printed circuit boards, dedicated hardware, or a combination thereof.
  • Intermediate module 200 may include or provide functionality additional to or unrelated to encryption and/or decryption, and may alter the normal message flow between the client trusted workstation 230 and the server untrusted application 260 . Such additional functionality may have the effect of compensating for server-side functionality lost due to encryption.
  • the intermediate module may receive input data from the client device, intercept said input data, e.g., prevent or otherwise not allow the input data to be transmitted to the server, and the intermediate module may provide the relevant function on the input data that the server would otherwise provide. For example, the intermediate module may generate at least one message to the client device based on a result of the function.
  • the intermediate module may obtain from said client device a response to the at least one message, based on the response, process the input text to obtain processed input text, and transmit the processed input text to the server.
  • a server may generally check the spelling of input text and provide the user with a feedback message, for example, indicating misspelled words and suggesting corrections.
  • the server may not be able to perform spell checking without decrypting the processed text. Therefore, in accordance with embodiments of the invention, the intermediate module may provide additional functionality, for example, spell-checking, on input text, and may provide the user with a feedback message, e.g., a result of the spell-checking function on the input data, such as an error message, a suggested spelling correction, or a message that no errors were detected.
  • such additional functionality may include replacing server-side search functionality, for example, by storing a copy of the user data (or a portion thereof) and searching it in the intermediate module in response to a search request made by the client.
  • such additional functionality may include triggering an authentication session between the client and the intermediate module before allowing user data to be encrypted and decrypted.
  • such additional functionality may include format-checking input data, and if appropriate, for example, if the input data is in a first format, requesting the client to send information in a second format, different from the first format.
  • Such received and/or requested formats may include, for example, (a) a delta-encoded format of the input text wherein only differences from a known version of the input text are transmitted, (b) a full version of the input text, (c) the input text contained in a specific document format, or a combination thereof.
  • input data may be received in a delta-encoded format, and the intermediate module may request the input data in a full input text format.
  • specific document formats include but are not limited to PDF, DOC, HTML, etc.
  • the processed text may be stored at network node 260 , for example, in storage system 270 , and manipulated remotely over public network 250 .
  • the processing may be such that searching and/or sorting may be enabled on the processed text, in such a manner as to be transparent or unseen by the trusted user and/or the untrusted server application, without decrypting the processed data at the application service provider.
  • storage system 270 is at times denoted a database; however, it will be recognized that storage system 270 may be any suitable digital storage architecture, and may be stored on any suitable hardware, e.g., a redundant array of independent disks (RAID), etc.
  • trusted workstation 230 may provide unprocessed input data such as “Acme Corp.” for use by application service provider 260 .
  • the input text may be intercepted at intermediate module 200 , for example, by interception module 210 , and processed by data protection module 220 .
  • Data protection module 220 may process the input text into one or more individual text units referred to as tokens, and control data, which may be encrypted, shown schematically as processed data “DHFOEFRGEJIC”, and send the processed data over network 250 to untrusted application service provider 260 , where it may be manipulated by users and/or stored in database 270 .
  • It will be understood that the processed data “DHFOEFRGEJIC” is schematic, and that any suitable encryption algorithm may be used, for example, resulting in any symbol set.
  • non-Latin characters or symbols may be used, for example, Korean or Chinese language symbols.
  • FIG. 2 illustrates a generalized flow of data from client terminal 230 to application service provider 260 , according to an embodiment of the invention.
  • the end user may provide input text that is not encrypted (clear text).
  • The input data may be transmitted from client terminal 230 towards network node 260 and be intercepted by interception module 210.
  • Interception module 210 may provide the input text to data protection module 220 that processes the input data to provide processed data, wherein the processing includes encrypting at least a portion of the input text.
  • the processed data may then be sent to interception module 210 , which in turn transmits it over public network 250 .
  • the processed data may be received by network node 260 for manipulation by an application, e.g., a SaaS application, and stored in database 270 .
  • the input data may be new or updated data to be stored in storage system 270 , or it may be any data provided to an SaaS application for real time manipulation, for example, one or more parameters of a command, e.g., a search command.
  • FIG. 3 illustrates a flow of data from network node 260 to client terminal 230 , according to an embodiment of the invention.
  • a process may be initiated by a user at workstation 230 by making a retrieval or search request.
  • the parameter of the request may be processed as described above in connection with FIG. 2 , and the application at network node 260 may search or sort the processed data, possibly based on the processed parameter provided.
  • Network node 260 may retrieve processed data, for example, in response to a search or retrieval request, where the processed data may include some encrypted portions.
  • the processed data may be sent over public network 250 towards client terminal 230 .
  • Interception module 210 may intercept the processed data and provide it to data protection module 220 to identify any encrypted data within the processed data. Any identified encrypted data may be decrypted, and provided to interception module 210 to resume data communication. Interception module 210 may forward the unprocessed data (decrypted plaintext data) to client component 240 for display to a user.
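The outbound flow of FIG. 2 and the inbound flow of FIG. 3 can be sketched together as a pair of functions. This is only an illustration under stated assumptions: the `ENC:` prefix, the field names, and the function names are hypothetical, and the keyed XOR keystream is a toy stand-in for whatever encryption scheme an embodiment actually uses (it reuses the same keystream for every value and is not authenticated, so it should not be used to protect real data).

```python
import base64
import hashlib
import hmac
from typing import Dict, Set

PREFIX = "ENC:"  # hypothetical marker that lets the module spot processed values


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from HMAC-SHA256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def process_outbound(key: bytes, message: Dict[str, str],
                     protected: Set[str]) -> Dict[str, str]:
    """Client to server: process only the selected fields, pass the rest through."""
    out = {}
    for field, value in message.items():
        if field in protected:
            cipher = _xor(key, value.encode("utf-8"))
            out[field] = PREFIX + base64.urlsafe_b64encode(cipher).decode("ascii")
        else:
            out[field] = value
    return out


def process_inbound(key: bytes, message: Dict[str, str]) -> Dict[str, str]:
    """Server to client: identify processed values and restore the plaintext."""
    out = {}
    for field, value in message.items():
        if value.startswith(PREFIX):
            cipher = base64.urlsafe_b64decode(value[len(PREFIX):])
            out[field] = _xor(key, cipher).decode("utf-8")
        else:
            out[field] = value
    return out
```

One design consequence of this sketch is that an unprocessed value that happens to begin with `ENC:` would be misread as processed data on the way back; a real module would need an unambiguous way to mark processed text.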
  • FIG. 10 schematically illustrates a flow of data enabling searching of encrypted user data in an embodiment of the present invention.
  • the client 240 may enter data and make several store requests to the untrusted application 260 passing through the intermediate module 200 .
  • The intermediate module encrypts user inputs such that every searchable word maps to exactly one corresponding encrypted searchable word. Searchable words may be normalized before encryption.
  • the words “the” and “a” are considered non-searchable and do not result in an individual encrypted searchable token. Conversely, the words “dog” and “cat” map into the encrypted searchable words “eeee” and “bbbb” respectively.
  • the information holding the case markers for the searchable words and the non-searchable words is contained in the encrypted tokens “ZZZytuv” and “ZZZabcd”.
  • FIG. 4 is a schematic illustration of a data processing method 100 designed to enable server-side searching and/or indexing of user textual data, according to an embodiment of the invention.
  • Method 100 may be applied by an intermediate module, for example, by a data protection module as described above. It will be understood that the method of receiving processed data and converting it to unprocessed data may be substantially the reverse of the described method.
  • Method 100 starts at stage 110 by receiving an input message, for example, by an intermediate module operatively connected between a client terminal and a network node.
  • an input message may include a First Name field, a Last Name field, and a Document Body field.
  • the method may iterate over all identified data units, first obtaining an unhandled data unit at stage 113 , then selecting whether or not to process the obtained data unit.
  • Processed data units may be processed individually or collectively.
  • the method may then determine whether to process the input data. Input data that are not modified are retained (stage 130 ). At stage 115 , the method may determine whether and/or what portions of the input data unit text should be processed. For example, portions of an input text not suitable for encryption may include search connectors such as “OR”, “AND”, or application-specific significant text markup such as “ ⁇ important ⁇ ” or “@location”, indicating a special kind of server processing to be carried out on the data.
  • the method proceeds to stage 116 , in which the input text is broken down into individual text units called tokens (the process of determining tokens from the input text is referred to herein as tokenization).
  • tokenization is optional, and method 100 may include (a) encrypting all input data together as a single token, (b) encrypting input data determined to be suitable for encryption separately, to provide a plurality of processed tokens, wherein each processed token represents a piece of input text, or (c) a combination thereof.
  • the method may then proceed to stage 117 , in which certain input tokens may be recognized as unsuitable for searching.
  • the criterion for determining each individual word may be a list of predefined words, a threshold word frequency in a word frequency list such as an English dictionary frequency list, the length of the word, or a combination thereof.
  • the method may extract information unimportant for searching from searchable input tokens, for example: letter case, letter diacritics, ligature breakup, Unicode character composition or decomposition (as defined by the Unicode standard).
  • the extracted information may be stored for later use in a separate location and may be placed in an output token called a control token.
  • the text tokens may be converted into a normalized form which does not contain the extracted information. This process is referred to herein as normalization. It will be recognized that normalization is optional, and may be done in any suitable manner.
  • the method may obtain bit representations of all information units to be encrypted, including searchable tokens, information extracted from searchable tokens, and other portions of the input, in order to encrypt it using a cryptographic cipher.
  • Information units may be classified as searchable or non-searchable. Non-searchable information units may be combined or broken up. The order of searchable tokens in the input text may be changed, and an indication of the original order may be added to the non-searchable information units.
  • the method may encrypt information units by using a cryptographic cipher, such as AES or DES.
  • the method may convert the encrypted bit representations into output text units consisting of a sequence of characters taken from a character set, for example, one or more predefined contiguous portions of Unicode, as described in further detail below.
  • This character set may be defined in advance to assist decrypting.
  • the input data unit in the input message may be replaced with the output text obtained at stage 121 .
  • the method may continue to apply stages 112 - 122 to all identified input units, and then transmit the processed message to the network node hosting the server application (stage 131 ).
  • the data processing method may involve tokenization, which in turn may involve a number of steps. It will be understood that some of the steps described in connection with the illustration of tokenization below are optional. Furthermore, it will be understood that de-tokenization, i.e., converting tokenized processed data into unprocessed data, may be substantially the reverse of the described method.
  • input texts may be broken into a number of segments in a process called tokenization. Segments holding individually searchable terms are called (unprocessed) input tokens, where input tokens are typically whole words. Input segments that are not tokens are added to an information set called a Non-Searchable Information Set. Such segments may include punctuation, space characters, and other characters.
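  • The split between searchable input tokens and the Non-Searchable Information Set can be sketched as follows. This is a minimal illustration only: the function names `tokenize` and `detokenize`, the regular expression, and the representation of the non-searchable set as (segment index, segment) pairs are assumptions made for this example and are not taken from the described embodiments.

```python
import re

def tokenize(text):
    """Split text into searchable tokens and a non-searchable information set."""
    tokens = []          # individually searchable terms (whole words)
    non_searchable = []  # (segment index, segment) for spaces, punctuation, etc.
    for index, match in enumerate(re.finditer(r"\w+|\W+", text)):
        segment = match.group()
        if re.fullmatch(r"\w+", segment):
            tokens.append(segment)
        else:
            non_searchable.append((index, segment))
    return tokens, non_searchable

def detokenize(tokens, non_searchable):
    """Reverse of tokenize: interleave tokens and non-searchable segments."""
    segments = dict(non_searchable)
    words = iter(tokens)
    total = len(tokens) + len(non_searchable)
    return "".join(segments[i] if i in segments else next(words) for i in range(total))

tokens, extra = tokenize("This sentence has FIVE words!")
restored = detokenize(tokens, extra)
```

  • Keeping the segment index alongside each non-searchable segment is one simple way to make the de-tokenization step an exact reverse of tokenization.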
  • words may be combined into a single token, or a single word may be broken into two or more constituent tokens.
  • compound words like “whiteboard” may be decomposed into individually searchable tokens “white” and “board”.
  • languages such as Chinese or Japanese do not usually use spaces or another distinct character to separate words in written text, and thus a single Chinese input text may be broken into several input tokens. The indication of such combination or breaking may be added to the non-searchable information set.
  • Tokenization may include detection of morphological variants of words, modifying the input token to a normalized form and adding an indication of the original input token to the non-searchable information set.
  • morphological variants of words may include plural versus singular noun forms (“word”, “words”), verb conjugation (“cry”, “cried”, “crying”), etc.
  • Tokenization may include detection of words unlikely to be searched for, and their removal from the set of searchable input tokens and addition to the non-searchable information set. For example, such detection may use (a) a predefined set of words, (b) a dictionary holding a word frequency list and a threshold frequency, where words with a frequency above the threshold frequency are considered unsearchable, (c) a minimum and/or maximum length for a searchable word, or (d) any combination thereof.
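  • The three detection criteria can be combined in a small predicate, sketched below. The word set, the frequency values, the threshold and the length bounds are all hypothetical values chosen for illustration; an actual embodiment would take them from its own configuration or dictionary.

```python
# (a) hypothetical predefined set of non-searchable words
STOP_WORDS = {"the", "a", "an", "of"}
# (b) hypothetical relative word frequencies and threshold
WORD_FREQUENCY = {"this": 0.004, "has": 0.002, "sentence": 0.00002}
FREQUENCY_THRESHOLD = 0.001
# (c) hypothetical length bounds for a searchable word
MIN_LENGTH, MAX_LENGTH = 3, 40

def is_searchable(word):
    """Return True if the word should stay in the set of searchable tokens."""
    w = word.lower()
    if w in STOP_WORDS:
        return False
    if WORD_FREQUENCY.get(w, 0.0) > FREQUENCY_THRESHOLD:
        return False  # too common to be worth searching for
    return MIN_LENGTH <= len(w) <= MAX_LENGTH

searchable = [w for w in ["This", "sentence", "has", "the", "ox", "dog"] if is_searchable(w)]
```

  • Words rejected by the predicate would be moved to the non-searchable information set rather than discarded.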
  • Tokenization may support server-side searching and/or indexing which ignore certain character properties, such as letter case, diacritics, ligatures or Unicode composition/decomposition. For example, searching for “ToKeN” and “tOkEn” may produce the same results, so that all strings containing a variant of the word “token” appear in the search results.
  • Supporting such property-insensitive searching may be performed by (1) converting every input character into a single canonical form, (2) producing an indication of the original character, and (3) adding this indication to the non-searchable information set.
  • tokenization may support case-insensitive searching on the server side by converting input token characters into a single letter case (e.g. lowercase) and adding an indication of the original letter case to the non-searchable information set.
  • diacritical marks may be ignored during searching, so that added, removed or modified diacritical marks do not affect matching, e.g., variants of the letter “E” that differ only in their diacritical marks.
  • a search for “cafe” will match user data such as “Café”, “CAFE”, “c ⁇ fe” or “çafe”.
  • the system may convert all these word instances into the normalized form “cafe” and add an indication of the original diacritics to the non-searchable information set.
  • the system may support ligature-insensitive searches (for example, dæmon and daemon).
  • the system may convert ligatures into normalized form such as converting “æ” to “ae”, produce an indication of the original ligature, and add it to the non-searchable information set.
  • FIG. 6 illustrates processing of the word “Café”.
  • the input text is stripped of the uppercase and diacritics, and converted to the token “cafe”.
  • the associated control token indicates that the first letter is uppercase, and that the fourth letter has an acute accent.
  • letters may be assumed to be lowercase with no diacritics, so that the control token need not indicate lowercase letters or absence of diacritics.
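  • The processing of “Café” can be sketched with Python's standard `unicodedata` module. The control-token layout used here, a list of (index, property, value) entries, is an assumption made for this example; lowercase letters without diacritics are treated as the default and produce no entry. Ligatures are not handled, and the sketch assumes that lowercasing a character yields a single character.

```python
import unicodedata

def normalize(token):
    """Return (normalized token, control entries) for one input token."""
    control = []  # only deviations from "lowercase, no diacritics" are recorded
    out = []
    for index, char in enumerate(unicodedata.normalize("NFC", token)):
        decomposed = unicodedata.normalize("NFD", char)
        marks = decomposed[1:]
        if marks and all(unicodedata.combining(m) for m in marks):
            base = decomposed[0]       # letter with its diacritics stripped
        else:
            base, marks = char, ""     # not a base letter plus combining marks
        if base.isupper():
            control.append((index, "upper", ""))
        if marks:
            control.append((index, "marks", marks))
        out.append(base.lower())
    return "".join(out), control

def denormalize(normalized, control):
    """Rebuild the original token from its normalized form and control entries."""
    chars = list(normalized)
    for index, prop, _ in control:
        if prop == "upper":
            chars[index] = chars[index].upper()
    for index, prop, value in control:
        if prop == "marks":
            chars[index] = unicodedata.normalize("NFC", chars[index] + value)
    return "".join(chars)

normalized, control = normalize("Caf\u00e9")
```

  • For “Café” the control entries mark index 0 as uppercase and index 3 (the fourth letter) as carrying a combining acute accent, which matches the control token described for FIG. 6.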
  • processing input text may include detection of at least one application-specific handling instruction in the text, and may either add such handling instructions to the non-deterministically transformed text or leave them in clear text in the processed text, so that the untrusted server may apply any kind of handling related to this text augmentation information.
  • HTML is a text augmentation which may add formatting information to user text by embedding HTML tags in the text.
  • the system may handle input HTML tags by at least one of: (1) adding HTML tags to the non-searchable information, (2) including input HTML tags in the output processed text without encryption to allow server-side handling, (3) treating HTML tags as normal text, e.g., applying any handling performed on non-HTML-tag input text to the HTML tags.
  • the intermediate module may decide not to transform said at least one handling instruction.
  • the intermediate module may decide to transform said at least one handling instruction non-deterministically.
  • the system may add context information to the non-searchable information set, such as the time, the user, or other information known to the system when producing processed text.
  • the system may add custom indications to the encrypted tokens such as “important” or “sensitive”, such that upon decryption these indications may be noticed, an event indicating the decryption of the input information may be generated, and this event handled, for example, by adding a record to a log file.
  • Processing the input text may include changing an order of input tokens within the processed text.
  • token order indication may be generated to indicate an order of the input tokens in the original input text, and may be added to the non-searchable information set.
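  • Changing the token order and recording a token order indication can be sketched as follows. The names `shuffle_tokens` and `restore_order` and the use of a plain index list as the order indication are assumptions for this example; in the described method the indication would be placed in the non-searchable information set and encrypted with it.

```python
import random

def shuffle_tokens(tokens, rng=None):
    """Return the tokens in a random order plus the order indication."""
    rng = rng or random.SystemRandom()
    order = list(range(len(tokens)))
    rng.shuffle(order)
    shuffled = [tokens[i] for i in order]
    return shuffled, order  # order[k] is the original index of shuffled[k]

def restore_order(shuffled, order):
    """Use the order indication to rebuild the original token sequence."""
    original = [None] * len(shuffled)
    for position, original_index in enumerate(order):
        original[original_index] = shuffled[position]
    return original

tokens = ["this", "sentence", "has", "five", "words"]
shuffled, order = shuffle_tokens(tokens, random.Random(0))
```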
  • Processing the input text may include generating at least one fake or decoy excess token to be included in the output text.
  • decoy tokens can make the encrypted text more robust to statistical analysis.
  • the excess decoy tokens may be added with an intended target statistical distribution in order to disguise decoy tokens and make decryption by statistical analysis yet more difficult.
  • the at least one excess token is distinguishable from other tokens included in the processed text only after gaining access to a secret key. For example, English-language word frequencies may be used as a model for the target distribution of decoy tokens.
  • the non-searchable information set may be arranged in one or more non-searchable tokens (also referred to herein as control tokens), which may be included in the processed output text.
  • the control tokens may be placed before the normalized set of input tokens, after the normalized set of input tokens, or within the normalized set of input tokens.
  • the non-searchable information set may be fully or partially encrypted, and then included in the processed output text.
  • bit representations of non-searchable information set and searchable tokens may be obtained. Obtaining such bit representations may include compressing and encoding input data in certain encoding and compression schemes.
  • An error detection indication may be generated and added to the non-searchable information set. For example, a checksum of the input text may be calculated and added to the non-searchable information set.
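  • A checksum of this kind can be sketched with CRC-32 from Python's standard `zlib` module. The choice of CRC-32, the dictionary representation of the non-searchable information set, and the key name `crc32` are assumptions for this example; the source does not name a particular checksum.

```python
import zlib

def add_checksum(non_searchable_info, input_text):
    """Return a copy of the information set with a checksum of the input text."""
    info = dict(non_searchable_info)
    info["crc32"] = zlib.crc32(input_text.encode("utf-8"))
    return info

def verify_checksum(info, recovered_text):
    """Check recovered text against the stored checksum after reverse processing."""
    return info.get("crc32") == zlib.crc32(recovered_text.encode("utf-8"))

info = add_checksum({"case": "upper@0"}, "Café is open")
```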
  • the obtained bit representations of input tokens and possibly the non-searchable information set may then be encrypted wholly or partially.
  • Encryption of searchable input tokens may provide a single encrypted form for every instance of a searchable input token.
  • Encryption of non-searchable information may provide a single or multiple encrypted forms for every instance of the same information set. Multiple encrypted forms may provide better security, but can render certain server-side operations difficult or impossible without decrypting the user data.
  • Multiple encrypted forms may use at least one bit of cryptographic salt embedded in the encrypted form.
  • the encrypted forms may then be converted into textual forms using a suitable encoding scheme.
  • a suitable encoding scheme may provide at least one of the following properties: (a) separation of encrypted tokens to allow an untrusted server application to determine searchable units within the processed text, (b) using a character set which does not cause an untrusted server application to split an encrypted token into several searchable units (for example, the character “+” may be used to separate words by an untrusted server application and therefore may not be suitable for encoding encrypted tokens; for example, using both English and Hebrew characters may cause an application to separate sequences of both sets), (c) providing a compact representation such that server-side length limitations are less likely to be met, and (d) using an efficient algorithm in the intermediate module for encoding and decoding.
  • processed text may comprise a string of characters selected from a predetermined character set, for example, a character set comprising at least one contiguous subset of the Unicode character set.
  • the at least one contiguous subset may include characters in the letter character category, the number character category, or both.
  • the characters selected for use in the processed text may be selected from among a plurality of contiguous subsets of the Unicode character set, for example, two, three, four, or five separate subsets of the Unicode character set may be selected.
  • the number of subsets may be more than one and less than or equal to ten subsets of the Unicode character set.
  • the subset of the Unicode character set may be one or more subsets selected from Korean Hangul, Chinese, Japanese and Korean (CJK) Unified Ideographs, and a combination thereof.
  • Korean language characters may be used for server applications storing user input using UTF-16 encoding.
  • Because Korean characters occupy a single range within the Unicode character set that contains only letter characters, they allow an efficient encoding and decoding implementation.
  • The Chinese character set may be used for the same reason and has a greater range than Korean; however, use of the Chinese character set may not be suitable in server applications that separately search and/or index every individual Chinese character.
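  • Encoding encrypted bytes as Korean characters can be sketched as a base conversion into the Hangul Syllables block, U+AC00 to U+D7A3 (11,172 code points, all letters). The big-integer conversion and the 0x01 guard byte are choices made for this example, not the encoding of the described embodiments, and a production encoder would more likely work on fixed-size bit groups.

```python
HANGUL_FIRST = 0xAC00   # first code point of the Hangul Syllables block
HANGUL_COUNT = 11172    # U+AC00..U+D7A3 inclusive

def encode_hangul(data: bytes) -> str:
    """Encode bytes as a string of Hangul syllables (base-11172 digits)."""
    # A leading 0x01 guard byte keeps leading zero bytes through the round trip.
    number = int.from_bytes(b"\x01" + data, "big")
    chars = []
    while number:
        number, digit = divmod(number, HANGUL_COUNT)
        chars.append(chr(HANGUL_FIRST + digit))
    return "".join(reversed(chars))

def decode_hangul(text: str) -> bytes:
    """Reverse of encode_hangul."""
    number = 0
    for char in text:
        digit = ord(char) - HANGUL_FIRST
        if not 0 <= digit < HANGUL_COUNT:
            raise ValueError("character outside the Hangul syllable range")
        number = number * HANGUL_COUNT + digit
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    if raw[:1] != b"\x01":
        raise ValueError("missing guard byte")
    return raw[1:]

encoded = encode_hangul(b"\x00\x10encrypted-bytes")
```

  • Each character carries a little over 13 bits, and because every character is a letter from one contiguous range, a server is unlikely to treat any of them as a word separator.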
  • BASE64 encoding may be used for server applications storing user input using UTF-8 encoding.
  • BASE64 encoding itself contains the characters “+” and “/” which may cause server applications to conclude that a single encrypted token has one or more encrypted words.
  • space characters may be used to separate encrypted tokens.
  • Another character such as a period “.” may be used to separate encrypted tokens where space characters are not expected, for example in email address fields.
  • Processed output text may be embedded in unencrypted text when it is sent from the untrusted server and received at the intermediate module.
  • the system may generate a statistically significant feature in processed text.
  • the system may include a rare character or combination of characters in the processed text, to be searched for when detecting encrypted text within unencrypted text.
  • processed output text may be arranged in more than one output token, such that output tokens do not exceed certain length limits.
  • a length limit of 50 characters may be applied to the first output token and a length limit of 1000 characters may be applied to subsequent output tokens.
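  • Arranging processed output text into length-limited output tokens can be sketched as follows, using the limits of 50 and 1000 characters from the example above. The function name `split_output` is an assumption for this example.

```python
def split_output(processed, first_limit=50, next_limit=1000):
    """Split processed text into output tokens that respect the length limits."""
    tokens = [processed[:first_limit]]
    rest = processed[first_limit:]
    while rest:
        tokens.append(rest[:next_limit])
        rest = rest[next_limit:]
    return tokens

parts = split_output("x" * 2100)
lengths = [len(p) for p in parts]
```

  • Concatenating the output tokens in order restores the processed text, so the split is reversible at the intermediate module.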
  • Some embodiments of the invention may use deterministic or non-deterministic transformations of input text, or a combination thereof.
  • Embodiments of the present invention may decide whether to transform input data (or portions thereof) deterministically or non-deterministically, or a combination thereof, then based on such decision, transform the input text deterministically or non-deterministically, or a combination thereof using at least one secret key to thereby obtain processed text, and transmit the processed text to the server.
  • a non-deterministic transformation to an input text is one whose result may be one of a plurality of possible outputs.
  • a deterministic transformation to an input text is one that may include only one possible output. Both kinds of transformations may typically use or depend on a secret key for determining the possible output or outputs.
  • deterministic token representations may be obtained, e.g., by applying reversible encryption depending on a secret key, or using an irreversible encryption using a secret key.
  • Non-deterministic token representations may be obtained, e.g., by applying a symmetric encryption algorithm using a secret key, or by applying an asymmetric encryption algorithm, using the private key of a public-private key pair as a secret key, or by other reversible transformation depending on a secret key.
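  • The difference between the two kinds of transformation can be sketched with the Python standard library alone. In this illustration a keyed hash (HMAC-SHA256) stands in for an irreversible deterministic transformation, and a salted keystream derived from HMAC stands in for reversible non-deterministic encryption. Both are stand-ins chosen so that the example is self-contained: the keystream construction is a toy that provides no integrity protection and is not a vetted cipher, and an actual embodiment would use a cipher such as AES from an established library.

```python
import hashlib
import hmac
import secrets

def deterministic_token(key: bytes, token: str) -> bytes:
    """Same key and token always give the same output (enables exact-match search)."""
    return hmac.new(key, token.encode("utf-8"), hashlib.sha256).digest()[:16]

def _keystream(key: bytes, salt: bytes, length: int) -> bytes:
    # Toy keystream: HMAC over salt and a block counter. Illustration only.
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hmac.new(key, salt + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return stream[:length]

def nondeterministic_encrypt(key: bytes, text: str) -> bytes:
    """A fresh random salt makes every encryption of the same text differ."""
    salt = secrets.token_bytes(16)
    plain = text.encode("utf-8")
    cipher = bytes(a ^ b for a, b in zip(plain, _keystream(key, salt, len(plain))))
    return salt + cipher  # the salt is embedded in the encrypted form

def nondeterministic_decrypt(key: bytes, blob: bytes) -> str:
    salt, cipher = blob[:16], blob[16:]
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, salt, len(cipher))))
    return plain.decode("utf-8")

key = b"k" * 32  # hypothetical secret key
first = nondeterministic_encrypt(key, "the")
second = nondeterministic_encrypt(key, "the")
```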
  • the server may provide search functionality over previously entered input texts.
  • the intermediate module may choose in such case to deterministically transform individual searchable tokens within the input text. Such deterministic transformation may allow future search queries containing processed searchable terms to be processed correctly at the server.
  • Portions of the input text may be transformed non-deterministically, for example, in order to provide enhanced security.
  • portions of input text may be transformed deterministically in order to allow server-side functions requiring exact matches between recurring instances of portions of input texts. For example, if a server may compare multiple revisions of an input text, wherein each revision is slightly different from its respective preceding revision, the server may provide a word-by-word or line-by-line difference analysis. Therefore, in such an example, deterministically transforming words or lines of input text allows such exact-match semantics on the server.
  • the step of processing input text in an embodiment of the invention may include (1) encrypting some or all of the input text into one or more processed tokens in a non-deterministic fashion, (2) generating processed tokens corresponding to some or all suitable input tokens of the input text (e.g., after tokenization, normalization of the input text, etc.) in a deterministic fashion, and (3) including both the non-deterministically and deterministically transformed processed data in the output processed text for transmission and storage at the network node.
  • the decision whether to transform the input text deterministically or non-deterministically, or a combination thereof, may be based on whether a word of the input text is a member of a set of words.
  • input tokens to be made available for searching may be transformed deterministically, thereby enabling a search on such words.
  • the processed input text which may include deterministically and non-deterministically transformed processed data may be returned as a search result.
  • input tokens not made available for searching need not be transformed deterministically.
  • the decision whether to transform the input text deterministically or non-deterministically, or a combination thereof may be based on the length of the word.
  • it may be decided to transform a word of the input text non-deterministically based on a length of said word.
  • short words, e.g., words containing fewer than three characters, may be transformed non-deterministically, while longer words, e.g., words having three or more characters, may be transformed deterministically.
  • short words having less than the minimum number of characters may not be searchable.
  • the non-deterministic transformation may be performed using a first key, and the deterministic transformation may be performed using a second key.
  • the first key and the second key may be identical. In other embodiments of the invention, the first and second keys may be different.
  • one or more deterministically generated tokens may be dropped or eliminated if the overall length of the output text exceeds a length limit. In some embodiments of the invention, the decision may be made not to transform at least a portion of the input text.
  • processed text may be received at the intermediate module, and a suitable reverse processing may be applied on the processed text to obtain original input text.
  • the original input text may be sent or otherwise provided to the client device, for example, to be displayed or provided to a user or application operating the client device.
  • Input text received at the intermediate module may be search queries including at least one search term to search for.
  • Search query input texts may be processed by the intermediate module in order to (a) facilitate correct search functionality at the network node, and (b) enable decryption of the search query at the intermediate module, if the network node sends it back to the client.
  • Search queries are generally processed at the network node in the same manner as other input texts are processed, and may apply further processing stages.
  • the step of transforming the input text may comprise deterministically transforming at least one search term in the search query using a first key to produce at least one deterministically transformed search term.
  • the step of transmitting the processed input text to the server may comprise transmitting the plurality of deterministically transformed search terms to the server.
  • a plurality of search terms in the search query may be treated and transformed separately.
  • the processed search query may include substantially only deterministically transformed search terms, wherein the deterministic transformation may be a reversible transformation.
  • the network node may search for the processed terms, and may return the result set to the client.
  • the intermediate module may use the processed search terms to obtain original input text.
  • transforming the search query may further comprise non-deterministically transforming substantially the entire search query using a second key to produce a non-deterministically transformed text, and combining the at least one deterministically transformed search term and the non-deterministically transformed text using a logical disjunction operator (e.g., the “OR” operator) to obtain a combined processed text, wherein transmitting the processed input text to the server comprises transmitting the combined processed text to the server.
  • the network node may search for the processed search terms and for the non-deterministically processed text in disjunction, obtaining (or failing to find) results based on the deterministically transformed search terms, and obtaining no results for the non-deterministically transformed text.
  • the result of the search may therefore be to return the result of the search on the processed search terms.
  • the intermediate module may receive from the network node the non-deterministically transformed text, from which it may then obtain the original input text of the search query.
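  • The combined processed query can be sketched as follows. The two transformations are passed in as functions, and the example supplies trivial placeholders for them (a `D_` prefix and a hex dump) that provide no security whatsoever. The parenthesized query syntax, the connector set and all function names are assumptions for this example, since query syntax is specific to the server application.

```python
CONNECTORS = {"AND", "OR"}  # search connectors left unprocessed

def build_processed_query(query, transform_term, transform_whole):
    """Deterministic terms, OR-ed with a non-deterministic blob of the whole query."""
    parts = [w if w in CONNECTORS else transform_term(w) for w in query.split()]
    return "(" + " ".join(parts) + ") OR " + transform_whole(query)

def recover_query(processed_query, reverse_whole):
    """Recover the original query from the trailing non-deterministic blob."""
    blob = processed_query.rsplit(" OR ", 1)[1]
    return reverse_whole(blob)

# Placeholder transformations for illustration only (NOT encryption).
def fake_det(term):
    return "D_" + term.lower()

def fake_nondet(text):
    return "N" + text.encode("utf-8").hex()

def fake_reverse(blob):
    return bytes.fromhex(blob[1:]).decode("utf-8")

processed = build_processed_query("dog AND cat", fake_det, fake_nondet)
original = recover_query(processed, fake_reverse)
```

  • Because the non-deterministic blob matches nothing stored at the server, the disjunction returns only the matches for the deterministically transformed terms, while the blob lets the intermediate module restore the query text if the server echoes it back.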
  • Some network node servers may return truncated search results in response to a query or other requests. For example, if the result of a search query is a 100 character field, the server may return only the first 20 characters of the field, and if the user selects the found record, the server will provide the full field. According to embodiments of the invention, the intermediate module should be able to work within such constraints. According to embodiments of the invention, where the server truncates units of the processed text, these units may be individual tokens within the processed text, the processed text as a whole, or both.
  • this problem may be solved, for example, by providing a repository of processed texts at the intermediate module, or at a storage device managed or otherwise controlled or accessible by the intermediate module.
  • the system may attempt to recover from such truncations before obtaining the original input text during the decryption stage, as follows: (1) the intermediate module may store unabridged processed text units at a trusted storage during the encryption stage, e.g., not via the untrusted server or its associated storage device, (2) when a truncated processed text is sent from the server and received at the intermediate module, the trusted storage unit is consulted to determine whether there exists therein one or more non-truncated processed text units matching or corresponding to the truncated processed text units, (3) if so, the intermediate module replaces the truncated processed text units with the corresponding unabridged processed text units to obtain a recovered processed text, (4) the recovered processed text is processed by a reverse processing method (e.g. decryption using a secret key) to obtain the original input text.
  • what is stored in the repository may be at least one unabridged processed element associated with the processed text.
  • the processed element may be said entire processed text or a word or other portion contained in the processed text.
  • system and method using the repository may be applied to any suitable request from the client device, including, for example, a search request, a record request, or a report request.
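  • A repository of unabridged processed texts can be sketched as follows. The class name and the matching rule are assumptions for this example: truncation is assumed to keep a prefix of the processed text (as in the 20-character example above), and an ambiguous or unknown prefix is reported as not recoverable.

```python
class ProcessedTextRepository:
    """Trusted store of unabridged processed texts, kept by the intermediate module."""

    def __init__(self):
        self._stored = set()

    def store(self, processed_text):
        """Called during the encryption stage, before sending to the server."""
        self._stored.add(processed_text)

    def recover(self, received):
        """Return the unabridged text for a possibly truncated one, or None."""
        if not received:
            return None
        if received in self._stored:
            return received  # nothing was truncated
        candidates = [t for t in self._stored if t.startswith(received)]
        return candidates[0] if len(candidates) == 1 else None

repo = ProcessedTextRepository()
repo.store("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
repo.store("ABCxyz")
recovered = repo.recover("ABCDEFGHIJ")
```

  • The recovered text is then passed to the reverse processing step in place of the truncated text received from the server.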
  • An untrusted server may often apply one or more of a multitude of transformations on instances of processed user data. Such transformations may be expected by a client component residing on the trusted workstation, but may not be known to the intermediate module described herein. According to embodiments of the invention, therefore, the intermediate module may utilize methods to infer the kind of transformation applied to processed user data.
  • the intermediate module may add excess information (referred to herein as bait) to encrypted user data in known locations.
  • Bait may be used when processed user data is received at the intermediate module in order to infer the kind of transformation applied to processed user data.
  • examples of transformations for which bait may be used include application of a certain character encoding scheme and HTML tag elimination.
  • an untrusted server may apply various and possibly combined encoding schemes to encrypted user data received thereat.
  • the encrypted text may be encoded in one of a multitude of encoding schemes used by an untrusted server application to communicate with the client component residing on the trusted workstation.
  • the encoding scheme may or may not be indicated in the message generated by the server.
  • the client component may typically be aware of the server component and may reliably know the encoding scheme used.
  • the intermediate module may not be aware of the specific encoding used in every instance of encrypted text.
  • the intermediate module when decrypting user data before providing decrypted user data to the client component, should be able to use the same encoding scheme applied in the server and expected by the client. That is, if the intermediate module does not know the encoding scheme used by the untrusted server and the trusted workstation, information may become lost or garbled in the processing and deprocessing by the intermediate module.
  • the intermediate module may add predetermined characters known as encoding bait to encrypted text.
  • the encoding bait may be encoded by the server along with the encrypted user data before providing to the client component.
  • when the intermediate module detects encrypted tokens, the encoding bait may be examined to infer the kind of encoding scheme being used for encoding an instance of encrypted text. Accordingly, the intermediate module may use the inferred encoding scheme to encode decrypted text in a processed message.
  • Non-limiting examples of encoding schemes include: (i) UTF-8 encoding, (ii) encoding using HTML escape sequence followed by UTF-8; and (iii) encoding using JavaScript escape sequences, then again using JavaScript escape sequences, and then performing Latin-1 encoding (AKA ISO-8859-1).
  • JavaScript escaping typically operates by replacing characters with a backslash and another character; for example, the newline character is replaced with a backslash and the character “n”, i.e. the sequence “\n”.
  • bait may be used to detect at least one transformation including replacement of at least one transformable character in the processed text with a matching replacement character or replacement character string, e.g., one or more escape characters.
  • the user may input the string “This ‘ is a quote”. This is encrypted, for example, into “QIFJDJNZOP”. During encryption, bait consisting of an angle bracket followed by a backslash is attached to the front of the encrypted token “QIFJDJNZOP”.
  • the server may receive the encrypted string, and send the string to the client in a JavaScript file. In a JavaScript file, the server needs only to escape the backslash, but not the angle bracket. Accordingly, the message sent to the client includes “QIFJDJNZOP” preceded by the angle bracket and two backslashes, the original backslash of the bait having been escaped using another backslash.
  • When the intermediate module detects the encrypted token in the message preceded by the original angle bracket and the escaped backslash, it may infer that the token is JavaScript-escaped. Thereupon, the intermediate module may decrypt the input QIFJDJNZOP into “This ‘ is a quote”. However, having inferred that the client is expecting a JavaScript-escaped text, the module may then use JavaScript escaping to encode the decrypted string, e.g., by escaping the quote to produce “This \’ is a quote”. The decrypted quote is thus encoded using the encoding rules inferred from the encoded bait. The decrypted and encoded string is then forwarded to the client.
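  • The bait-based inference in this example can be sketched as follows. The bait is described above as an angle bracket and a backslash; the specific choice of “<” as the angle bracket, the additional HTML-escape check, and all function names are assumptions made for this illustration.

```python
import html

BAIT = "<\\"  # assumed bait: '<' followed by a single backslash

def add_bait(encrypted_token):
    """Attach the bait in a known location, in front of the encrypted token."""
    return BAIT + encrypted_token

def infer_escaping(received, encrypted_token):
    """Infer which escaping the server applied by inspecting the bait."""
    if "<\\\\" + encrypted_token in received:
        return "javascript"  # backslash doubled, angle bracket untouched
    if "&lt;\\" + encrypted_token in received:
        return "html"        # angle bracket replaced by an HTML entity
    if BAIT + encrypted_token in received:
        return "none"
    return "unknown"

def js_escape(text):
    """Minimal JavaScript string escaping for the decrypted text."""
    return (text.replace("\\", "\\\\").replace("'", "\\'")
                .replace('"', '\\"').replace("\n", "\\n"))

def reencode(decrypted, escaping):
    """Encode decrypted text the way the client expects it."""
    if escaping == "javascript":
        return js_escape(decrypted)
    if escaping == "html":
        return html.escape(decrypted)
    return decrypted

# The server escaped the backslash of the bait inside a JavaScript string.
received = 'var note = "<\\\\QIFJDJNZOP";'
escaping = infer_escaping(received, "QIFJDJNZOP")
forwarded = reencode("This ' is a quote", escaping)
```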
  • Another kind of transformation for which bait may be used is an HTML transformation, of which HTML tag elimination is a special case.
  • An untrusted server may receive text augmented with HTML markup, generate instances of received text with all or some HTML tags removed, and may return these instances to the client component.
  • the intermediate module may include an HTML tag bait in processed user data.
  • the HTML tag bait may be removed by the intermediate module when receiving processed user data, and infer, from its existence or inexistence, whether HTML tags may be removed from decrypted user data, and may accordingly retain or remove decrypted HTML tags in a message returned to the client component.
  • multiple pieces of bait may be added to a processed text to detect a plurality of transformations or encoding schemes applied by the untrusted server.
  • a plurality of separate portions of the input text may be transformed in which at least one of the plurality of portions of said input text includes no more than a maximum number of characters, for example, by truncation of the respective portion. In some embodiments of the invention, a plurality of separate portions of the input text may be transformed in which each of the plurality of portions of said input text includes no more than a maximum number of characters, for example, by truncation of the respective portion.
  • FIG. 5 illustrates the normalization and tokenization of an input text that includes the sentence “This sentence has FIVE words!”
  • Input text 510 includes the sentence “This sentence has FIVE words!”
  • the sentence may be tokenized to the following input tokens “This”, “sentence”, “has”, “FIVE”, “words”, and “!”.
  • These input tokens may be normalized to provide normalized input tokens and metadata.
  • the normalized input tokens have the following format: “This”, “sentence”, “has”, “five”, “words”, and “!”.
  • the metadata associated with “sentence” is “lower case”.
  • the metadata associated with “FIVE” is “upper case”.
  • the metadata associated with “words” is “lower case” and “plural”.
  • the method may detect common input tokens, including the words “this”, “has” and the non-word “!”. These input tokens may be encrypted in a non-deterministic manner, e.g., they may be encrypted with salt (denoted “*”).
  • the method may detect uncommon input tokens “word”, “sentence” and “five”. These words may be encrypted in a deterministic manner.
  • the order of input tokens may be changed and order metadata may be generated accordingly.
  • the order metadata, the case metadata, and the plural metadata may be included in a control token 530 .
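  • The tokenization and normalization of the FIG. 5 example can be sketched as below. The common-token list, the plural lookup table, the “capitalized” case label and all function names are hypothetical; the sketch only classifies tokens and does not perform any encryption.

```python
import re

# Hypothetical list of common tokens, to be encrypted non-deterministically.
COMMON_TOKENS = {"this", "has", "!"}

# Hypothetical plural lookup; a real system would need proper morphology.
SINGULAR_OF = {"words": "word"}


def tokenize(text: str) -> list:
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(token: str):
    """Return the normalized token and the metadata needed to restore it."""
    metadata = {}
    if token.isalpha():
        if token.isupper():
            metadata["case"] = "upper"
        elif token[0].isupper():
            metadata["case"] = "capitalized"
        else:
            metadata["case"] = "lower"
    base = token.lower()
    if base in SINGULAR_OF:
        base = SINGULAR_OF[base]
        metadata["plural"] = True
    return base, metadata


def plan(text: str) -> list:
    """List (normalized token, metadata, encryption mode) for each input token."""
    result = []
    for token in tokenize(text):
        base, metadata = normalize(token)
        mode = "non-deterministic" if base in COMMON_TOKENS else "deterministic"
        result.append((base, metadata, mode))
    return result
```

  • Calling `plan("This sentence has FIVE words!")` marks “this”, “has” and “!” for non-deterministic encryption and “sentence”, “five” and “word” for deterministic encryption, with the case and plural metadata kept alongside each token.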
  • a text processing feature common in many SaaS applications is sorting records by lexicographic order of a particular field or other attribute. It may therefore be beneficial to provide processed text by an order-preserving encryption process.
  • order preservation can be obtained by any of the following methods: (i) maintaining a list of all records on the interception module and performing site-specific ordering when needed (this method requires nearly duplicating each server's functionality in both presentation and data management); (ii) providing an API for the server to query the sort order of a particular string; or (iii) creating a lexicographically sortable representation which preserves the real sort order without any modification in the network node.
  • An encryption method may preserve order of input text records by applying the following stages or a combination thereof: (1) converting input data into numeric values (if not already numeric), (2) applying an order-preserving transformation on the numeric values to obtain an output numeric value, (3) obtaining a lexicographically sortable representation from the output numeric value, and (4) using the lexicographically sortable representation in the processed output text, as either a prefix string (in textual data) or as the whole output data.
  • the order-preserving transformation may be a monotonically increasing function.
  • the order preserving function may use a private key that can be generated from a random source, in order to parameterize its functionality. A private key may be generated for every set of inputs sorted collectively as a set.
  • generating order information may include applying an order-preserving, secret-key-dependent function on the input text.
  • order information may be produced based on a truncated version of the input text.
  • the order information may be produced based on a plurality of truncated words in the input text, in the order in which they appear therein.
  • the intermediate module may process input text by applying an order-preserving transformation, wherein the order-preserving transformation comprises generating order information based on the input text, the order information indicative of a relative order of the input text within a set of possible input texts according to a collation rule, transforming the input text to obtain processed text, and transmitting the processed text to the server.
  • the order information may be sent to the server in association with said processed input text by adding the order information as a prefix to the processed input data and transmitting the combined order information and processed input data to the server.
  • the intermediate device may consider only a reduced portion of the input data when generating an order-preserved output. Reducing the input to obtain a reduced portion of the input data may include (a) ignoring certain words such as “the”, “a”, (b) ignoring all characters in every word occurring at a certain position within the word or later, e.g. ignoring the characters “ra” in “zebra”, (c) ignoring final words within the record, (d) contracting the input domain of the order-preserving function, (e) ignoring certain character properties such as letter case, or (f) a combination thereof.
  • FIG. 7 illustrates various stages of method 170 according to an embodiment of the invention that may be used to obtain an order-preserving representation of textual data to be included in processed text.
  • input text to be encrypted may be received.
  • certain words may be discarded from the input text.
  • certain character properties may be discarded, such as letter case, diacritics, ligatures or other character properties.
  • input words may be truncated according to a predetermined parameter of the encryption scheme, such that final characters from input words may be discarded.
  • In stage 175, certain final words of the input text may be discarded. Accordingly, performing one or more of optional stages 172, 173, 174, and 175 may produce a reduced input text.
  • the (optionally reduced) input text may be converted into a numeric value to obtain an input numeric value.
  • an order-preserving function may be applied to the input numeric value to obtain an output numeric value.
  • an order preserving representation may be obtained from the output numeric value.
  • the order preserving representation may be placed as either a prefix or the whole encrypted data of the processed text.
  • the input numeric value of input text “The Green Zebra” may be calculated as follows: (i) receiving a set of input tokens “The Green Zebra”; (ii) ignoring irrelevant input token “the” to provide relevant input tokens “Green Zebra”; (iii) normalizing the relevant input tokens to provide “green zebra”; (iv) selecting, for example, based on user definitions, only the first three letters of every input token, to provide six relevant characters: “gre zeb”; (v) calculating the numeric value as shown in Table 1 of each letter based on the weight of its location in the input token; and (vi) summing up the letter values to provide a numeric value of the set of input tokens which is 0.296199790068345.
  • the alphabet size is 26.
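  • The calculation above can be reproduced with the following sketch. Table 1 is not reproduced here, so the letter values are an assumption: letters are numbered a = 1 through z = 26, and the i-th of the six retained letters is weighted by 26 to the power −i. That assumption yields the value stated above; the ignored-word list and function names are hypothetical, and only lowercase English letters are handled.

```python
STOP_WORDS = {"the", "a"}   # hypothetical list of ignored words
PREFIX_LEN = 3              # keep only the first three letters of each token
ALPHABET_SIZE = 26


def reduce_text(text: str) -> list:
    """Lowercase the tokens, drop ignored words, and truncate each token."""
    tokens = [token.lower() for token in text.split()]
    return [token[:PREFIX_LEN] for token in tokens if token not in STOP_WORDS]


def numeric_value(text: str) -> float:
    """Sum letter values (a = 1 .. z = 26, assumed) weighted by 26**-position."""
    letters = "".join(reduce_text(text))
    return sum(
        (ord(char) - ord("a") + 1) / ALPHABET_SIZE ** position
        for position, char in enumerate(letters, start=1)
    )
```

  • With these assumptions, `numeric_value("The Green Zebra")` sums 7/26 + 18/26² + 5/26³ + 26/26⁴ + 5/26⁵ + 2/26⁶, which agrees with 0.296199790068345 to within floating-point precision.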
  • FIG. 8 illustrates a method 300 of generating an order-preserving function according to an embodiment of the invention, to be used, for example, in stage 177 of method 170 .
  • the domain (D_1, D_2) and range (R_1, R_2) of the function may be determined, for example, according to configuration by a user or program.
  • a private key K is obtained to be used in calculation of the order-preserving function output value.
  • an input value V_in is received (possibly from stage 176 of method 170).
  • the function range may be altered, so it starts and ends at key-dependent positions, lying within the original range.
  • the numeric input value V_in is checked to see whether it lies within the lower part (D_1, D_mid) or higher part (D_mid, D_2) of the current domain (D_1, D_2). If V_in lies within the lower part, then stage 188a is carried out, otherwise stage 188b is carried out.
  • In stages 188a and 188b, the function's domain (D_1, D_2) and range (R_1, R_2) are modified: in stage 188a, (D_1, D_2) is set to (D_1, D_mid) and (R_1, R_2) is set to (R_1, R_L); in stage 188b, (D_1, D_2) is set to (D_mid, D_2) and (R_1, R_2) is set to (R_H, R_2). Stages 185-188 may be repeated until a predetermined stop criterion is satisfied at stage 189.
  • the stop criterion may be, for example, a threshold size D_threshold being greater than the current domain size D_2 − D_1; or a threshold size R_threshold being greater than the current range size R_2 − R_1; or a combination thereof.
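  • One way to realize such a key-dependent bisection is sketched below. The description does not state how the key determines the range positions R_L and R_H, so this sketch assumes they are derived from an HMAC-SHA256 of the path of lower/higher decisions taken so far; the fraction bounds, the midpoint output and the function names are likewise assumptions, and the sketch is not a vetted order-preserving encryption scheme.

```python
import hashlib
import hmac


def _split_fractions(key: bytes, path: str):
    """Derive two key-dependent fractions with lo_frac <= 0.5 <= hi_frac."""
    digest = hmac.new(key, path.encode("ascii"), hashlib.sha256).digest()
    u1 = int.from_bytes(digest[:8], "big") / 2 ** 64
    u2 = int.from_bytes(digest[8:16], "big") / 2 ** 64
    return 0.3 + 0.2 * u1, 0.5 + 0.2 * u2


def order_preserving(v_in: float, key: bytes,
                     domain=(0.0, 1.0), value_range=(0.0, 1.0),
                     d_threshold: float = 1e-6) -> float:
    """Map v_in into value_range so that larger inputs never map lower."""
    d1, d2 = domain
    r1, r2 = value_range
    if not d1 <= v_in <= d2:
        raise ValueError("input value lies outside the domain")
    path = ""
    # Stop once the threshold exceeds the current domain size.
    while d2 - d1 >= d_threshold:
        lo_frac, hi_frac = _split_fractions(key, path)
        r_l = r1 + lo_frac * (r2 - r1)   # end of the lower sub-range
        r_h = r1 + hi_frac * (r2 - r1)   # start of the higher sub-range
        d_mid = (d1 + d2) / 2
        if v_in < d_mid:
            d2, r2 = d_mid, r_l          # lower part of the domain
            path += "0"
        else:
            d1, r1 = d_mid, r_h          # higher part of the domain
            path += "1"
    return (r1 + r2) / 2
```

  • Because the lower sub-range always ends at or before the point where the higher sub-range starts, two inputs that diverge at some bisection step keep their relative order in the output; inputs that fall within the same final domain cell (narrower than the threshold) map to the same output.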
  • the following example illustrates an encoding scheme which may be used in stage 178 of method 170. It is assumed that the transformed numeric value generated by an order preserving function is 0.344323947, and that the lexicographically sortable representation is ten characters long and includes only lowercase English letters. Table 2 illustrates the ten iterations of an arithmetic coding scheme that is applied to generate ten characters of the lexicographically sortable representation.
  • the lexicographically sortable representation is “hxsutgeslc”.
  • a physical computer readable medium can be provided. It stores instructions that when executed by a processor can cause the processor to implement method 100 or portions thereof.
  • the physical computer readable medium can be a disk, a diskette, a tape, a cassette, a disk on key, a flash memory unit, a volatile memory unit, and the like.

Abstract

A method and system for securing data transmitted between a client device and a server by obtaining input text at an intermediate module, processing the input text to obtain processed text, and transmitting the processed text to the server. According to one embodiment of the invention, the intermediate module may add excess information (referred to herein as bait) to encrypted user data in known locations. Such bait may be used when processed user data is received at the intermediate module in order to infer the kind of transformation applied to processed user data. Non-limiting examples of transformations for which bait may be used are application of a certain character encoding scheme and HTML tag elimination.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation-in-part application claiming priority from PCT/IL2009/000901, International Filing Date Sep. 15, 2009, which in turn claims priority from U.S. Provisional Patent Application Ser. No. 61/096,891 filed Sep. 15, 2008, the contents of which are incorporated herein by reference in their entirety.
  • This application also claims priority from U.S. Provisional Patent Application Ser. No. 61/291,398 filed Dec. 31, 2009, and from U.S. Provisional Patent Application Ser. No. 61/306,207 filed Feb. 19, 2010, the contents of which are incorporated herein by reference in their entirety.
    BACKGROUND OF THE INVENTION
  • The Internet and the World Wide Web allow companies and organizations to offer services in a document, such as a digital form of web applications, to businesses and individuals who may access and utilize these services with a personal computer and a web browser. Making such documents and particularly applications available over a network is typically referred to as Software as a Service (“SaaS”). Some examples of applications that may be provided in SaaS form are electronic mail, instant messaging, productivity tools, customer relationship management, enterprise resource planning, human resources applications, blogs, social networking sites, etc.
  • This model has inherent security risks. User data, such as messages, customer records, and company financials, are stored on remote servers beyond the control of the provider of the user data. Storing personal or corporate information on remote servers exposes the data owner to many risks, and implies that the information's owner must trust the entity that owns the computer systems hosting the information and the network connecting the information owner and the hosting systems.
  • For instance, commonly known accounting software solutions require their customers to post accounting information to be stored on the solution provider's servers. In such systems, the customer must entrust the solution provider with the accounting information, thereby relinquishing a certain measure of control over the privacy and integrity thereof.
  • In certain software applications, a variety of encryption schemes are used to render data unintelligible to anyone who does not possess the appropriate decryption methods or keys. For example, application providers may enable and/or require an information owner to encrypt data in transit between a client and a host using secure socket layer (SSL) encryption or another method. This prevents an internet service provider (ISP) and other potential eavesdroppers from seeing the data itself during transit. The data is accordingly decrypted upon arrival to the hosted application, and the hosted application vendor may view and manipulate the owner's unencrypted data. However, this method exposes the sensitive data at the hosted application vendor.
  • U.S. Pat. No. 7,165,175 describes an apparatus and method for selectively encrypting portions of data sent over a network between client and server. The apparatus includes parsing means for separating a first portion of the data from a second portion of the data, encrypting means for encrypting only the first portion of the data, and combining means for combining the encrypted first portion of the data with the second portion of the data. The apparatus further includes decrypting means installed at the client for decrypting the encrypted portion of the data.
  • PCT Patent Publication Number WO 01/047205 discloses enhanced computer network encryption using downloaded software objects. This application describes a method and a system for securing highly sensitive financial and other data contained in transmissions over a public network, such as the World Wide Web, linking a web server computer to a remote client computer. By determining a desired (usually strong) specific standard of encryption for all sensitive communications between web server and client, and “pushing” the capability to encrypt to such standard to the client by automatically downloading from the web server to the client, and executing within the client's web browser, software objects to perform encryption/decryption tasks pursuant to the chosen standard, strong encryption is readily assured even if the client did not originally have such strong encryption capabilities.
  • One problem with the application of these approaches to hosted SaaS applications is that such applications require that operating information, e.g., data made available for manipulation over the network, be unencrypted in order to allow manipulation of the information by the application provider, thereby exposing the data to the application provider, and otherwise rendering the data vulnerable to security concerns during manipulation.
    SUMMARY OF EMBODIMENTS OF THE INVENTION
  • In systems having a server and a client device, wherein the server is adapted to transform text received from the client device by applying at least one of a plurality of transformations, embodiments of the invention may include a system and method for receiving input text at an intermediate device from the client device; processing the input text at the intermediate module to obtain processed text, wherein the processing comprises including bait in the processed text; transmitting the processed text to the server; upon request, receiving at the intermediate module transformed processed text from the server, the server having applied at least one of the plurality of transformations to the processed text to obtain said transformed processed text; and determining by the intermediate module at least one of the transformations applied by the server based on a comparison between the processed text and the transformed processed text.
  • Some embodiments of the invention may further include applying a reverse transformation on the processed text to obtain unprocessed input text; and modifying the unprocessed input text based on the at least one determined transformation. Some embodiments of the invention may yet further include sending the modified unprocessed input text to the client device.
  • According to some embodiments of the invention, at least one transformation of the plurality of transformations comprises replacement of at least one transformable character in the processed text with a matching replacement character or replacement character string, and including bait in the processed text comprises including the at least one transformable character in the processed text. Some embodiments of the invention may yet further include applying a reverse transformation on the processed text to obtain unprocessed input text; and modifying the unprocessed input text by replacing the at least one transformable character in the unprocessed input text with the matching replacement character or replacement character string. Some embodiments of the invention may yet further include sending the modified unprocessed input text to the client device.
  • According to some embodiments of the invention, at least one transformation of the plurality of transformations may comprise omitting HTML tags in the processed text, and including bait in said processed text comprises including an HTML tag in the processed text. Some embodiments of the invention may further include applying a reverse transformation on the processed text to obtain unprocessed input text; modifying the unprocessed input text by omitting HTML tags contained therein; and sending the modified unprocessed input text to the client device.
    BRIEF DESCRIPTION OF THE DRAWINGS
  • The foregoing and other objects, features, and advantages of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, similar reference characters denote similar elements throughout the different views.
  • FIG. 1 illustrates a system including an intermediate module and its environment according to an embodiment of the invention;
  • FIG. 2 illustrates a flow of data from a client terminal to a network node, according to an embodiment of the invention;
  • FIG. 3 illustrates a flow of data from a network node to a client terminal, according to an embodiment of the invention;
  • FIG. 4 illustrates a method for encrypting data allowing server-side searching and indexing of encrypted data, according to an embodiment of the invention;
  • FIG. 5 illustrates an example of a normalization process and an input text that includes a sentence;
  • FIG. 6 illustrates an example for processing a word, according to an embodiment of the invention;
  • FIG. 7 illustrates a method for encrypting data allowing server-side sorting of encrypted data, according to an embodiment of the invention;
  • FIG. 8 illustrates a method of generating an order preserving function, according to an embodiment of the invention;
  • FIG. 9 illustrates an example of three generated order-preserving encryption functions using three different keys, according to an embodiment of the invention; and
  • FIG. 10 schematically illustrates a flow of data enabling searching of encrypted user data in an embodiment of the present invention.
    DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
  • In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present invention.
  • Although embodiments of the invention are not limited in this regard, discussions utilizing terms such as, for example, “processing,” “computing,” “calculating,” “determining,” “establishing”, “analyzing”, “checking”, or the like, may refer to operation(s) and/or process(es) of a computer, a computing platform, a computing system, or other electronic computing device, that manipulate and/or transform data represented as physical (e.g., electronic) quantities within the computer's registers and/or memories into other data similarly represented as physical quantities within the computer's registers and/or memories or other information non-transitory storage medium that may store instructions to perform operations and/or processes.
  • Although embodiments of the invention are not limited in this regard, the terms “plurality” and “a plurality” as used herein may include, for example, “multiple” or “two or more”. The terms “plurality” or “a plurality” may be used throughout the specification to describe two or more components, devices, elements, units, parameters, or the like. Unless explicitly stated, the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed at the same point in time.
  • A computing system according to embodiments of the invention may be any suitable article, processor, chip, controller or suitable computing device suitable for processing data as described herein as well as controlling components in a device. In some embodiments the task of data processing may be distributed among a number of controllers, processors or computing systems. An intermediate module or processor associated therewith may include a controller that may be, for example, a central processing unit processor (CPU), a chip, or any suitable computing or computational device.
  • Reference to memory may be or may include, for example, a Random Access Memory (RAM), a read only memory (ROM), a Dynamic RAM (DRAM), a Synchronous DRAM (SD-RAM), a double data rate (DDR) memory chip, a Flash memory, a volatile memory, a non-volatile memory, a cache memory, a buffer, a short term memory unit, a long term memory unit, or other suitable memory units or storage units. Memory may be or may include a plurality of, possibly different memory units.
  • Reference to a data storage device may be or may include, for example, a hard disk drive, a floppy disk drive, a Compact Disk (CD) drive, a CD-Recordable (CD-R) drive, a universal serial bus (USB) device, a redundant array of independent disks (RAID), or any other suitable removable and/or fixed storage unit.
  • Embodiments of the invention may include an article such as a computer or processor non-transitory readable medium, or a computer or processor non-transitory storage medium, such as for example a memory, a disk drive, or a USB flash memory, encoding, including or storing instructions, e.g., computer-executable instructions, which when executed by a processor or controller, carry out methods disclosed herein.
  • An intermediate module according to embodiments of the invention may include software, hardware, firmware or any combination thereof.
    General Data Flow
  • Reference is made to FIG. 1, which illustrates a system including an intermediate module 200 and its environment according to an embodiment of the invention, as well as a flow of data from client module at workstation 230 to application service provider at network node 260.
  • Intermediate module 200 may include an interception module 210 and a data protection module 220. Intermediate module 200 may be operatively connected to a client terminal 230, e.g., a trusted workstation, and to a network node 260, e.g., an application service provider, via a network, such as public network 250. It will be understood that FIG. 1 is an exemplary embodiment of the invention, and that other network configurations are possible. For example, trusted workstation 230 and intermediate module 200 may be remote from each other, for example, operatively connected over a trusted network link.
  • For example, trusted workstation 230 may be connected to a plurality of intermediate modules, including for a plurality of organizations, and intermediate their data traffic with one or more application service providers over a public network.
  • It will be recognized that reference is made throughout the present application to an intermediate module, however, the module may reside on the client device, at a gateway server, e.g., on premises associated with the client device, or at a separate server or servers in communication with the trusted client device and the untrusted server.
  • Thus, for example, the interception and/or data protection modules may be installed on the trusted workstation, possibly as a browser plug-in, possibly as an operating system driver or module, possibly as a software library and possibly as another software component.
  • In another example, the intermediate module may be positioned right in front of the untrusted application, where all accesses to the untrusted application pass through the intermediate module.
  • In yet another example, the intermediate module may be a separate server to which client module transmits input data, which in turn transmits the processed data to the untrusted server.
  • A trusted workstation 230 may be a client computer having installed thereon a client component 240 that may interact with the intermediate module. Client component 240 may be a web application HTML form running in a web browser while network node 260 can be an HTTP web server of a SaaS vendor. Client component 240 can include API client software and, additionally or alternatively, any other method of remotely accessing network node 260.
  • End users can use client component 240 to enter, retrieve and manipulate data, intended to be passed to, or retrieved from, network node 260. End users may include human users utilizing a software agent (e.g. a web browser) and automated agents using a client API.
  • Interception module 210 of intermediate module 200 may intercept or otherwise receive input (unprocessed) text from trusted workstation 230, and provide the input text to data protection module 220 for processing. Interception module 210 may intercept the data flowing between client component 240 and network node 260, can modify it, and can interfere with the normal data flow. For example, the interception module may trigger an authentication session in order to determine that an end user can access data stored in network node 260. Interception module 210 can be (or be executed by) a web proxy server.
  • Data protection module 220 may receive input text and process it selectively. Input text that is not selected to be processed may be transmitted as unprocessed text to network node 260 for manipulation and/or storage in storage system 270 substantially without processing, or with less processing than text selected for processing. For text to be processed, data protection module 220 may process the input text to provide processed text, which may be provided over public network 250 to untrusted application service provider 260 for storage, manipulation, etc. According to embodiments of the invention, therefore, application service provider 260 may thereby not receive the unprocessed text, but rather store and manipulate processed text. As described below, the processing may include applying a search- and/or sort-enabling encryption scheme, to thereby provide encrypted text data. According to embodiments of the invention, the processing may selectively encrypt text, selecting which input text to transmit to application service provider 260 in processed form, and which input text to transmit in unprocessed form.
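  • The selective processing described above can be sketched as follows. The field names, the `selected` set and the `protect` helper are hypothetical, and the Base64 encoding is only a reversible placeholder marking where a search- and/or sort-enabling encryption scheme would be applied; it provides no confidentiality.

```python
import base64


def protect(value: str) -> str:
    """Placeholder for encryption: reversible encoding, NOT secure."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


def process_fields(fields: dict, selected: set) -> dict:
    """Process only the selected fields; pass the others through unchanged."""
    return {
        name: protect(value) if name in selected else value
        for name, value in fields.items()
    }
```

  • For example, `process_fields({"company": "Acme Corp.", "page": "2"}, {"company"})` returns the `page` field unchanged and the `company` field in processed form, so the server would receive only the processed company name.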
  • It will be understood that intermediate module 200 may include one or more servers, one or more workstations, one or more personal computers, one or more laptop computers, one or more media players, one or more personal data accessories, one or more integrated circuits, and/or one or more printed circuit boards, dedicated hardware, or a combination thereof.
    Data Flow Intervention
  • Intermediate module 200 may include or provide functionality additional to or unrelated to encryption and/or decryption, and may alter the normal message flow between the client trusted workstation 230 and the server untrusted application 260. Such additional functionality may have the effect of compensating for server-side functionality lost due to encryption.
  • According to embodiments of the invention, the intermediate module may receive input data from the client device, intercept said input data, e.g., prevent or otherwise not allow the input data to be transmitted to the server, and the intermediate module may provide the relevant function on the input data that the server would otherwise provide. For example, the intermediate module may generate at least one message to the client device based on a result of the function.
  • According to some embodiments of the invention, the intermediate module may obtain from said client device a response to the at least one message and, based on the response, process the input text to obtain processed input text, and transmit the processed input text to the server.
  • For example, a server may generally check the spelling of input text and provide the user with a feedback message, for example, indicating misspelled words and suggesting corrections. However, when the text received by the server is encrypted, in accordance with embodiments of the present invention, the server may not be able to perform spell checking without decrypting the processed text. Therefore, in accordance with embodiments of the invention, the intermediate module may provide additional functionality, for example, spell-checking, on input text, and may provide the user with a feedback message, e.g., a result of the spell-checking function on the input data, such as an error message, a suggested spelling correction, or a message that no errors were detected.
  • In one embodiment of the invention, such additional functionality may include replacing server-side search functionality, for example, by storing a copy of the user data (or a portion thereof) and searching it in the intermediate module in response to a search request made by the client.
  • In an embodiment of the invention, such additional functionality may include triggering an authentication session between the client and the intermediate module before allowing user data to be encrypted and decrypted.
  • In an embodiment of the invention, such additional functionality may include format-checking input data, and if appropriate, for example, if the input data is in a first format, requesting the client to send information in a second format, different from the first format. Such received and/or requested formats may include, for example, (a) a delta-encoded format of the input text wherein only differences from a known version of the input text are transmitted, (b) a full version of the input text, (c) the input text contained in a specific document format, or a combination thereof. For example, input data may be received in a delta-encoded format, and the intermediate module may request the input data in a full input text format. Other examples of specific document formats include but are not limited to PDF, DOC, HTML, etc.
  • According to embodiments of the invention, the processed text may be stored at network node 260, for example, in storage system 270, and manipulated remotely over public network 250. As described below, the processing may be such that searching and/or sorting may be enabled on the processed text, in such a manner as to be transparent or unseen by the trusted user and/or the untrusted server application, without decrypting the processed data at the application service provider. In the below description, storage system 270 is at times denoted a database; however, it will be recognized that storage system 270 may be any suitable digital storage architecture, and may be stored on any suitable hardware, e.g., a redundant array of independent disks (RAID), etc.
  • Accordingly, as shown in the illustrative data flow in FIG. 1, trusted workstation 230 may provide unprocessed input data such as “Acme Corp.” for use by application service provider 260. The input text may be intercepted at intermediate module 200, for example, by interception module 210, and processed by data protection module 220. Data protection module 220 may process the input text into one or more individual text units referred to as tokens, and control data, which may be encrypted, shown schematically as processed data “DHFOEFRGEJIC”, and send the processed data over network 250 to untrusted application service provider 260, where it may be manipulated by users and/or stored in database 270. It will be understood that “DHFOEFRGEJIC” is schematic, and that any suitable encryption algorithm may be used, for example, resulting in any symbol set. As described below, according to one embodiment of the invention, non-Latin characters or symbols may be used, for example, Korean or Chinese language symbols.
  • Reference is made to FIG. 2, which illustrates a generalized flow of data from client terminal 230 to application service provider 260, according to an embodiment of the invention. The end user may provide input text that is not encrypted (clear text). The input data may be transmitted from client terminal 230 towards network node 260 and be intercepted by interception module 210. Interception module 210 may provide the input text to data protection module 220, which processes the input data to provide processed data, wherein the processing includes encrypting at least a portion of the input text. The processed data may then be sent to interception module 210, which in turn transmits it over public network 250. The processed data may be received by network node 260 for manipulation by an application, e.g., a SaaS application, and stored in database 270. It will be understood that the input data may be new or updated data to be stored in storage system 270, or it may be any data provided to a SaaS application for real time manipulation, for example, one or more parameters of a command, e.g., a search command.
  • Reference is made to FIG. 3, which illustrates a flow of data from network node 260 to client terminal 230, according to an embodiment of the invention. Such a process may be initiated by a user at workstation 230 by making a retrieval or search request. The parameter of the request, e.g., the terms to be searched for, may be processed as described above in connection with FIG. 2, and the application at network node 260 may search or sort the processed data, possibly based on the processed parameter provided. Network node 260 may retrieve processed data, for example, in response to a search or retrieval request, where the processed data may include some encrypted portions. The processed data may be sent over public network 250 towards client terminal 230. Interception module 210 may intercept the processed data and provide it to data protection module 220 to identify any encrypted data within the processed data. Any identified encrypted data may be decrypted, and provided to interception module 210 to resume data communication. Interception module 210 may forward the unprocessed data (decrypted plaintext data) to client component 240 for display to a user.
  • Tokenization and Normalization Generally
  • The application running on network node 260 may be requested to search stored data and return a result. FIG. 10 schematically illustrates a flow of data enabling searching of encrypted user data in an embodiment of the present invention.
  • First, the client 240 may enter data and make several store requests to the untrusted application 260 passing through the intermediate module 200. The intermediate module encrypts user inputs such that every searchable word is mapped onto an encrypted searchable word, and every input searchable word has exactly one corresponding encrypted searchable word. Searchable words may be normalized before encryption.
  • For example, in FIG. 10 the words “BAD”, “Bad” and “bad” are all encrypted into the encrypted word “cccc”, so searching for “bad” provides results containing “BAD” and “Bad”.
  • In FIG. 10 the words “the” and “a” are considered non-searchable and do not result in an individual encrypted searchable token. Conversely, the words “dog” and “cat” map into the encrypted searchable words “eeee” and “bbbb” respectively. The information holding the case markers for the searchable words and the non-searchable words is contained in the encrypted tokens “ZZZytuv” and “ZZZabcd”.
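  • As a hedged illustration of the one-to-one mapping described for FIG. 10, the following minimal sketch lowercases each word and derives a fixed token from it with a keyed hash. HMAC-SHA-256 is used here only as a stand-in for a deterministic keyed transformation; the key value, the function name `searchable_token` and the 16-character token length are assumptions made for this example and do not appear in the specification.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key-not-for-production"  # hypothetical key


def searchable_token(word: str, key: bytes = SECRET_KEY) -> str:
    """Map a word to one fixed token; letter case is normalized first."""
    normalized = word.lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability (an assumption)


# "BAD", "Bad" and "bad" collapse to a single token, so a search for
# "bad" can match records that originally contained any of the three.
tokens = {searchable_token(w) for w in ("BAD", "Bad", "bad")}
```

  • Because a keyed hash is not reversible, a scheme of this kind would rely on separately stored information to recover the original text; a reversible deterministic cipher would avoid that dependency.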
  • Reference is made to FIG. 4, which is a schematic illustration of a data processing method 100 designed to enable server-side searching and/or indexing of user textual data, according to an embodiment of the invention. Method 100 may be applied by an intermediate module, for example, by a data protection module as described above. It will be understood that the method of receiving processed data and converting it to unprocessed data may be substantially the reverse of the described method.
  • Method 100 starts at stage 110 by receiving an input message, for example, by an intermediate module operatively connected between a client terminal and a network node.
  • At stage 111, the method may identify individual data units within the input message to be handled. For example, an input message may include a First Name field, a Last Name field, and a Document Body field.
  • At stage 112, the method may iterate over all identified data units, first obtaining an unhandled data unit at stage 113, then selecting whether or not to process the obtained data unit. Processed data units may be processed individually or collectively.
  • At stage 114, the method may then determine whether to process the input data. Input data that are not modified are retained (stage 130). At stage 115, the method may determine whether and/or what portions of the input data unit text should be processed. For example, portions of an input text not suitable for encryption may include search connectors such as “OR”, “AND”, or application-specific significant text markup such as “{important}” or “@location”, indicating a special kind of server processing to be carried out on the data.
  • For input text to be processed, the method proceeds to stage 116, in which the input text is broken down into individual text units called tokens (the process of determining tokens from the input text is referred to herein as tokenization). It will be recognized that tokenization is optional, and method 100 may include (a) encrypting all input data together as a single token, (b) encrypting input data determined to be suitable for encryption separately, to provide a plurality of processed tokens, wherein each processed token represents a piece of input text, or (c) a combination thereof.
  • The method may then proceed to stage 117, in which certain input tokens may be recognized as unsuitable for searching. For example, the criterion for determining whether an individual word is unsuitable for searching may be a list of predefined words, a threshold word frequency in a word frequency list such as an English dictionary frequency list, the length of the word, or a combination thereof.
  • At stage 118, the method may extract information unimportant for searching from searchable input tokens, for example: letter case, letter diacritics, ligature breakup, Unicode character composition or decomposition (as defined by the Unicode standard). The extracted information may be stored for later use in a separate location and may be placed in an output token called a control token. The text tokens may be converted into a normalized form which does not contain the extracted information. This process is referred to herein as normalization. It will be recognized that normalization is optional, and may be done in any suitable manner.
  • At stage 119, the method may obtain bit representations of all information units to be encrypted, including searchable tokens, information extracted from searchable tokens, and other portions of the input, in order to encrypt it using a cryptographic cipher. Information units may be classified as searchable or non-searchable. Non-searchable information units may be combined or broken up. The order of searchable tokens in the input text may be changed, and an indication of the original order may be added to the non-searchable information units.
  • At stage 120, the method may encrypt information units by using a cryptographic cipher, such as AES or DES.
  • At stage 121, the method may convert the encrypted bit representations into output text units consisting of a sequence of characters taken from a character set, for example, one or more predefined contiguous portions of Unicode, as described in further detail below. This character set may be defined in advance to assist decrypting.
  • At stage 122, the input data unit in the input message may be replaced with the output text obtained at stage 121.
  • The method may continue to apply stages 112-122 to all identified input units, and then transmit the processed message to the network node hosting the server application (stage 131).
  • Tokenization
  • As described above, the data processing method may involve tokenization, which in turn may involve a number of steps. It will be understood that some of the steps described in connection with the illustration of tokenization below are optional. Furthermore, it will be understood that de-tokenization, i.e., converting tokenized processed data into unprocessed data, may be substantially the reverse of the described method.
  • In order to enable searching over encrypted user data, input texts may be broken into a number of segments in a process called tokenization. Segments holding individually searchable terms are called (unprocessed) input tokens, where input tokens are typically whole words. Input segments that are not tokens are added to an information set called a Non-Searchable Information Set. Such segments may include punctuation, space characters, and other characters.
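  • The segmentation described above can be sketched as follows. This is a simplified illustration that assumes whitespace- and punctuation-delimited text and treats every run of word characters as an input token; the function names `tokenize` and `detokenize`, and the use of segment indices to record where non-searchable segments belong, are assumptions made for the example.

```python
import re


def tokenize(text: str):
    """Split text into searchable tokens and a non-searchable information set."""
    tokens = []          # searchable input tokens, in original order
    non_searchable = []  # (segment index, segment) for punctuation, spaces, etc.
    for index, match in enumerate(re.finditer(r"(\w+)|(\W+)", text)):
        if match.group(1) is not None:
            tokens.append(match.group(1))
        else:
            non_searchable.append((index, match.group(2)))
    return tokens, non_searchable


def detokenize(tokens, non_searchable):
    """Rebuild the original text from the two sets (the reverse process)."""
    gaps = dict(non_searchable)
    words = iter(tokens)
    total = len(tokens) + len(non_searchable)
    return "".join(gaps[i] if i in gaps else next(words) for i in range(total))


tokens, extras = tokenize("Hello, world! 42")
```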
  • In connection with tokenization, several words may be combined into a single token, or a single word may be broken into two or more constituent tokens. For example, compound words like “whiteboard” may be decomposed into individually searchable tokens “white” and “board”. For example, languages such as Chinese or Japanese do not usually use spaces or another distinct character to separate words in written text, and thus a single Chinese input text may be broken into several input tokens. The indication of such combination or breaking may be added to the non-searchable information set.
  • Tokenization may include detection of morphological variants of words, modifying the input token to a normalized form and adding an indication of the original input token to the non-searchable information set. For example, morphological variants of words may include plural versus singular noun forms (“word”, “words”), verb conjugation (“cry”, “cried”, “crying”), etc.
  • Tokenization may include detection of words unlikely to be searched for, and their removal from the set of searchable input tokens and addition to the non-searchable information set. For example, such detection may use (a) a predefined set of words, (b) a dictionary holding word frequency list and a threshold frequency where words with frequency above the threshold frequency are considered unsearchable, (c) a minimum and/or maximum length for a searchable word, or (d) any combination thereof.
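  • A minimal sketch of criteria (a) through (d) follows. The stop-word set, the frequency values, the threshold and the length bounds are all placeholder values invented for this illustration, as is the function name `is_searchable`; a real deployment would substitute its own word list and frequency dictionary.

```python
# Placeholder data for illustration only; not taken from the specification.
STOP_WORDS = {"the", "a", "an", "of"}
WORD_FREQUENCY = {"and": 0.028, "dog": 0.0001}


def is_searchable(word, stop_words=STOP_WORDS, frequency=WORD_FREQUENCY,
                  max_frequency=0.01, min_length=3, max_length=40):
    """Return True if the word should become a searchable input token."""
    lowered = word.lower()
    if lowered in stop_words:                        # (a) predefined set of words
        return False
    if frequency.get(lowered, 0.0) > max_frequency:  # (b) too frequent to be useful
        return False
    return min_length <= len(lowered) <= max_length  # (c) length bounds


words = ["The", "dog", "and", "a", "cat"]
searchable = [w for w in words if is_searchable(w)]
```

  • Words rejected by this check would be added to the non-searchable information set rather than discarded, so that the original text can still be reconstructed.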
  • Tokenization may support server-side searching and/or indexing which ignore certain character properties, such as letter case, diacritics, ligatures or Unicode composition/decomposition. For example, searching for “ToKeN” and “tOkEn” may produce the same results, with all strings containing a variant of the word “token” appearing in the search results.
  • Supporting such property-insensitive searching may be performed by (1) converting every input character into a single canonical form, (2) producing an indication of the original character, and (3) adding this indication to the non-searchable information set. For example, tokenization may support case-insensitive searching on the server side by converting input token characters into a single letter case (e.g. lowercase) and adding an indication of the original letter case to the non-searchable information set.
  • For example, diacritical marks may be ignored during searching, so that added, removed or modified diacritical marks (e.g., variants of the letter “E” bearing different diacritical marks) do not affect matching. For example, a search for “cafe” will match user data such as “Café”, “CAFE”, “cÄfe” or “çafe”. The system may convert all these word instances into the normalized form “cafe” and add an indication of the original diacritics to the non-searchable information set.
  • For example, the system may support ligature-insensitive searches (for example, dæmon and daemon). The system may convert ligatures into a normalized form, such as converting “æ” to “ae”, produce an indication of the original ligature, and add it to the non-searchable information set.
  • Reference is made to FIG. 6, which illustrates processing of the word “Café”. The input text is stripped of the uppercase and diacritics, and converted to the token “cafe”. The associated control token indicates that the first letter is uppercase, and that the fourth letter has an acute accent. According to some embodiments of the invention, letters may be assumed to be lowercase with no diacritics, so that the control token need not indicate lowercase letters or absence of diacritics.
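  • The processing of “Café” described for FIG. 6 can be sketched with Python's standard `unicodedata` module. This is an illustrative sketch only: the control information is represented here as a dictionary from character position to an (uppercase flag, combining marks) pair, which is an assumed representation, and the function names are hypothetical. Ligatures such as “æ” are not handled, and unusual case mappings (for example “ß”) are outside the scope of the sketch.

```python
import unicodedata


def normalize_token(word: str):
    """Strip letter case and diacritics; return (normalized token, control info)."""
    base_chars = []
    control = {}  # position -> (was_uppercase, combining_marks); defaults omitted
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and base_chars:
            # A combining mark belongs to the preceding base character.
            position = len(base_chars) - 1
            upper, marks = control.get(position, (False, ""))
            control[position] = (upper, marks + ch)
        else:
            if ch != ch.lower():
                control[len(base_chars)] = (True, "")
            base_chars.append(ch.lower())
    return "".join(base_chars), control


def restore_token(normalized: str, control) -> str:
    """Reapply the stored case and diacritics to a normalized token."""
    out = []
    for position, ch in enumerate(normalized):
        upper, marks = control.get(position, (False, ""))
        out.append((ch.upper() if upper else ch) + marks)
    return unicodedata.normalize("NFC", "".join(out))


token, control = normalize_token("Caf\u00e9")  # "Café"
```

  • As in the description of FIG. 6, positions that are lowercase and carry no diacritics have no entry in the control information, so the common case costs nothing to store.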
  • Text Markup and Augmentation Information
  • According to an embodiment of the invention, processing input text may include detection of at least one application-specific handling instruction in the text, and may either add these handling instructions to the non-deterministically transformed text or leave this information in clear text in the processed text, so that the untrusted server may apply any kind of handling related to this text augmentation information. For example, HTML is a text augmentation which may add formatting information to user text by embedding HTML tags in the text. The system may handle input HTML tags by at least one of: (1) adding HTML tags to the non-searchable information, (2) including input HTML tags in the output processed text without encryption to allow server-side handling, (3) treating HTML tags as normal text, e.g., applying any handling performed on non-HTML-tag input text to the HTML tags.
  • According to some embodiments of the invention, upon detecting at least one handling instruction in input text, the intermediate module may decide not to transform said at least one handling instruction.
  • According to some embodiments of the invention, upon detecting at least one handling instruction in input text, the intermediate module may decide to transform said at least one handling instruction non-deterministically.
  • The system may add context information to the non-searchable information set, such as the time, the user, or other information known to the system when producing processed text.
  • For example, in accordance with embodiments of the invention, the system may add custom indications to the encrypted tokens such as “important” or “sensitive”, such that upon decryption these indications may be noticed, an event indicating the decryption of the input information may be generated, and this event handled, for example, by adding a record to a log file.
  • Token Ordering
  • Processing the input text may include changing an order of input tokens within the processed text. When an order is changed, token order indication may be generated to indicate an order of the input tokens in the original input text, and may be added to the non-searchable information set.
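  • One way to picture the token order indication is as a permutation recorded alongside the reordered tokens. The sketch below is a simplified assumption of how this could look: `shuffle_tokens` and `restore_order` are hypothetical names, and the indication is a plain list of original indices that would be placed (and encrypted) in the non-searchable information set.

```python
import random


def shuffle_tokens(tokens):
    """Return (reordered tokens, order indication).

    order[i] is the original index of the token now at position i.
    """
    order = list(range(len(tokens)))
    random.SystemRandom().shuffle(order)  # OS-provided randomness
    reordered = [tokens[i] for i in order]
    return reordered, order


def restore_order(reordered, order):
    """Use the order indication to rebuild the original token sequence."""
    original = [None] * len(reordered)
    for position, source_index in enumerate(order):
        original[source_index] = reordered[position]
    return original


original_tokens = ["the", "quick", "brown", "fox"]
reordered, order = shuffle_tokens(original_tokens)
recovered = restore_order(reordered, order)
```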
  • Excess Tokens
  • Processing the input text may include generating at least one fake or decoy excess token to be included in the output text. Such decoy tokens can make the encrypted text more robust to statistical analysis. The excess decoy tokens may be added with an intended target statistical distribution in order to disguise decoy tokens and make decryption by statistical analysis yet more difficult. The at least one excess token is distinguishable from other tokens included in the processed text only after gaining access to a secret key. For example, English-language word frequencies may be used as a model for the target distribution of decoy tokens.
  • Tokenization Process
  • The non-searchable information set may be arranged in one or more non-searchable tokens (also referred to herein as control tokens), which may be included in the processed output text. The control tokens may be placed before the normalized set of input tokens, after the normalized set of input tokens, or within the normalized set of input tokens. The non-searchable information set may be fully or partially encrypted, and then included in the processed output text.
  • Before encryption, bit representations of non-searchable information set and searchable tokens may be obtained. Obtaining such bit representations may include compressing and encoding input data in certain encoding and compression schemes.
  • An error detection indication may be generated and added to the non-searchable information set. For example, a checksum of the input text may be calculated and added to the non-searchable information set.
  • The obtained bit representations of input tokens and possibly the non-searchable information set may then be encrypted wholly or partially. Encryption of searchable input tokens may provide a single encrypted form for every instance of a searchable input token. Encryption of non-searchable information may provide a single or multiple encrypted forms for every instance of the same information set. Multiple encrypted forms may provide better security, but can render certain server-side operations difficult or impossible without decrypting the user data. Multiple encrypted forms may use at least one bit of cryptographic salt embedded in the encrypted form.
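  • The effect of a cryptographic salt can be illustrated with a toy construction. The sketch below is not the cipher of the specification (which names AES or DES as examples) and has not been vetted for production use: it derives a keystream from HMAC-SHA-256 over a random salt and XORs it with the plaintext, purely to show how one input can yield multiple encrypted forms that all decrypt to the same value. The function names, the 16-byte salt length and the key are assumptions, and the sketch provides no integrity protection.

```python
import hashlib
import hmac
import secrets

SALT_LENGTH = 16  # bytes; an assumption for this sketch


def _keystream(key: bytes, salt: bytes, length: int) -> bytes:
    """Expand (key, salt) into `length` pseudo-random bytes."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, salt + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def encrypt_salted(key: bytes, plaintext: bytes) -> bytes:
    """Non-deterministic: a fresh salt is embedded in every encrypted form."""
    salt = secrets.token_bytes(SALT_LENGTH)
    stream = _keystream(key, salt, len(plaintext))
    return salt + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt_salted(key: bytes, blob: bytes) -> bytes:
    salt, body = blob[:SALT_LENGTH], blob[SALT_LENGTH:]
    stream = _keystream(key, salt, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = b"k" * 32  # hypothetical secret key
first = encrypt_salted(key, b"Acme Corp.")
second = encrypt_salted(key, b"Acme Corp.")
```

  • Because `first` and `second` differ, a server cannot match them by equality, which is why this style of transformation suits the non-searchable information set rather than searchable tokens.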
  • The encrypted forms may then be converted into textual forms using a suitable encoding scheme. Such an encoding scheme may provide at least one of the following properties: (a) separation of encrypted tokens to allow an untrusted server application to determine searchable units within the processed text, (b) using a character set which does not cause an untrusted server application to break a single encrypted token into separate searchable units (for example, the character “+” may be used to separate words by an untrusted server application and therefore may not be suitable for encoding encrypted tokens; for example, using both English and Hebrew characters may cause an application to separate sequences of both sets), (c) providing a compact representation such that server-side length limitations are less likely to be met, and (d) using an efficient algorithm in the intermediate module for encoding and decoding.
  • According to some embodiments of the invention, processed text may comprise a string of characters selected from a predetermined character set, for example, a character set comprising at least one contiguous subset of the Unicode character set. In some embodiments, the at least one contiguous subset may include characters in the letter character category, the number character category, or both. In some embodiments, the characters selected for use in the processed text may be selected from among a plurality of contiguous subsets of the Unicode character set, for example, two, three, four, or five separate subsets of the Unicode character set may be selected. In some embodiments, the number of subsets may be more than one and less than or equal to ten subsets of the Unicode character set.
  • In some embodiments of the invention, the subset of the Unicode character set may be one or more subsets selected from Korean Hangul, Chinese, Japanese and Korean (CJK) Unified Ideographs, and a combination thereof. Accordingly, for example, Korean language characters may be used for server applications storing user input using UTF-16 encoding. As Korean characters represent a single range within the Unicode character set which contains only letter characters, they have an efficient encoding and decoding implementation. For example, the Chinese character set may be used for the same reason, and has a greater range than the Korean character set; however, use of the Chinese character set may not be suitable in server applications that separately search and/or index every individual Chinese character.
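  • To make the single-range property concrete, the following sketch encodes arbitrary bytes as characters of the Unicode Hangul Syllables block (U+AC00 through U+D7A3, 11,172 code points) and decodes them again. The base-11172 positional scheme and the leading sentinel byte used to preserve leading zero bytes are choices made for this illustration, not the encoding defined by the specification; the function names are hypothetical.

```python
HANGUL_BASE = 0xAC00   # first code point of the Hangul Syllables block
HANGUL_COUNT = 11172   # number of code points in the block


def encode_hangul(data: bytes) -> str:
    """Encode bytes as a string drawn only from the Hangul Syllables block."""
    # The 0x01 sentinel keeps leading zero bytes from being lost.
    number = int.from_bytes(b"\x01" + data, "big")
    chars = []
    while number:
        number, digit = divmod(number, HANGUL_COUNT)
        chars.append(chr(HANGUL_BASE + digit))
    return "".join(reversed(chars))


def decode_hangul(text: str) -> bytes:
    """Reverse encode_hangul."""
    number = 0
    for ch in text:
        number = number * HANGUL_COUNT + (ord(ch) - HANGUL_BASE)
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte


encoded = encode_hangul(b"\x00\x17ciphertext bytes")
```

  • Each output character carries a little over 13 bits, and every character is a letter, so a server that splits words on punctuation or digits would see each encoded token as one unbroken word.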
  • For example, a possibly modified BASE64 encoding may be used for server applications storing user input using UTF-8 encoding. BASE64 encoding itself contains the characters “+” and “/” which may cause server applications to conclude that a single encrypted token has one or more encrypted words.
  • For example, space characters may be used to separate encrypted tokens. Another character such as a period “.” may be used to separate encrypted tokens where space characters are not expected, for example in email address fields.
  • Processed output text may be embedded within unencrypted text when it is sent from the untrusted server and received at the intermediate module. In order to trigger decryption, the system may generate a statistically significant feature in processed text. For example, the system may include a rare character or combination of characters in the processed text, to be searched for when detecting encrypted text within unencrypted text.
  • According to some embodiments of the invention, processed output text may be arranged in more than one output token, such that output tokens do not exceed certain length limits. For example, a length limit of 50 characters may be applied to the first output token and a length limit of 1000 characters may be applied to subsequent output tokens.
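  • The length-limit example above can be sketched as a simple splitting routine. The function name `split_output` is hypothetical, and the sketch assumes that output tokens are plain consecutive slices of the processed text; how the pieces are marked and rejoined in practice is not specified here.

```python
def split_output(text: str, first_limit: int = 50, later_limit: int = 1000):
    """Split processed text so the first piece and later pieces respect limits."""
    if not text:
        return []
    pieces = [text[:first_limit]]
    rest = text[first_limit:]
    pieces.extend(rest[i:i + later_limit] for i in range(0, len(rest), later_limit))
    return pieces


pieces = split_output("x" * 2300)
lengths = [len(p) for p in pieces]
```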
  • Combining Deterministic and Non-Deterministic Encryption
  • Some embodiments of the invention may use deterministic or non-deterministic transformations of input text, or a combination thereof. Embodiments of the present invention may decide whether to transform input data (or portions thereof) deterministically or non-deterministically, or a combination thereof, then based on such decision, transform the input text deterministically or non-deterministically, or a combination thereof using at least one secret key to thereby obtain processed text, and transmit the processed text to the server.
  • As used herein, a non-deterministic transformation to an input text is one whose result may be one of a plurality of possible outputs. A deterministic transformation to an input text is one that may include only one possible output. Both kinds of transformations may typically use or depend on a secret key for determining the possible output or outputs.
  • According to embodiments of the invention, deterministic token representations may be obtained, e.g., by applying reversible encryption depending on a secret key, or using an irreversible encryption using a secret key. Non-deterministic token representations may be obtained, e.g., by applying a symmetric encryption algorithm using a secret key, or by applying an asymmetric encryption algorithm, using the private key of a public-private key pair as a secret key, or by other reversible transformation depending on a secret key.
  • In some embodiments of the invention the server may provide search functionality over previously entered input texts. The intermediate module may choose in such case to deterministically transform individual searchable tokens within the input text. Such deterministic transformation may allow future search queries containing processed searchable terms to be processed correctly at the server. Portions of the input text may be transformed non-deterministically, for example, in order to provide enhanced security. According to embodiments of the invention, portions of input text may be transformed deterministically in order to allow server-side functions requiring exact matches between recurring instances of portions of input texts. For example, if a server may compare multiple revisions of an input text, wherein each revision is slightly different from its respective preceding revision, the server may provide a word-by-word or line-by-line difference analysis. Therefore, in such an example, deterministically transforming words or lines of input text allows such exact-match semantics on the server.
  • For example, the step of processing input text in an embodiment of the invention may include (1) encrypting some or all of the input text into one or more processed tokens in a non-deterministic fashion, (2) generating processed tokens corresponding to some or all suitable input tokens of the input text (e.g., after tokenization, normalization of the input text, etc.) in a deterministic fashion, and (3) including both the non-deterministically and deterministically transformed processed data in the output processed text for transmission and storage at the network node.
  • According to some embodiments of the invention, the decision whether to transform the input text deterministically or non-deterministically, or a combination thereof may be based on whether said word is member of a set of words. In this fashion, for example, input tokens to be made available for searching may be transformed deterministically, thereby enabling a search on such words. Upon location of a record based on the search, the processed input text, which may include deterministically and non-deterministically transformed processed data may be returned as a search result. Conversely, input tokens not made available for searching need not be transformed deterministically.
  • In some embodiments of the invention, the decision whether to transform the input text deterministically or non-deterministically, or a combination thereof may be based on the length of the word. Thus, for example, it may be decided to transform a word of the input text non-deterministically based on a length of said word. Thus, for example, in an example of an embodiment of the invention, short words, e.g., words containing less than three characters, may be transformed non-deterministically, while longer words, e.g., words having three or more characters, may be deterministically transformed. Accordingly, in such a scheme, short words having less than the minimum number of characters may not be searchable.
  • In an embodiment of the invention, the non-deterministic transformation may be performed using a first key, and the deterministic transformation may be performed using a second key.
  • In some embodiments of the invention, the first key and the second key may be identical. In other embodiments of the invention, the first and second keys may be different.
  • In some embodiments of the invention, one or more deterministically generated tokens may be dropped or eliminated if the overall length of the output text exceeds a length limit. In some embodiments of the invention, the decision may be made not to transform at least a portion of the input text.
  • It will be recognized that the process of retrieving processed text according to embodiments of the invention may operate in substantially the reverse fashion. That is, processed text may be received at the intermediate module, and a suitable reverse processing may be applied on the processed text to obtain original input text. In some embodiments of the invention, the original input text may be sent or otherwise provided to the client device, for example, to be displayed or provided to a user or application operating the client device.
  • Processing of Search Queries
  • Input text received at the intermediate module may be search queries including at least one search term to search for. Search query input texts may be processed by the intermediate module in order to (a) facilitate correct search functionality at the network node, and (b) enable decryption of the search query at the intermediate module, if the network node sends it back to the client. Search queries are generally processed at the network node in the same manner as other input texts are processed, and may apply further processing stages.
  • In some embodiments of the invention, the step of transforming the input text may comprise deterministically transforming at least one search term in the search query using a first key to produce at least one deterministically transformed search term. Accordingly, the step of transmitting the processed input text to the server may comprise transmitting the at least one deterministically transformed search term to the server. In some embodiments of the invention, a plurality of search terms in the search query may be treated and transformed separately.
  • In some embodiments of the invention, the processed search query may include substantially only deterministically transformed search terms, wherein the deterministic transformation may be a reversible transformation. The network node may search for the processed terms, and may return the result set to the client. The intermediate module may use the processed search terms to obtain original input text.
  • In some embodiments of the invention, transforming the search query may further comprise non-deterministically transforming substantially the entire search query using a second key to produce a non-deterministically transformed text, and combining the at least one deterministically transformed search term and the non-deterministically transformed text using a logical disjunction operator (e.g., the “OR” operator) to obtain a combined processed text, wherein transmitting the processed input text to the server comprises transmitting the combined processed text to the server. The network node may search for the processed search terms and for the non-deterministically processed text in disjunction, obtaining (or failing to find) results based on the deterministically transformed search terms, and obtaining no results for the non-deterministically transformed text. The result of the search may therefore be to return the result of the search on the processed search terms. Using the above method according to an embodiment of the invention, the intermediate module may receive from the network node the non-deterministically transformed text, from which it may then obtain the original input text of the search query.
  • Repository of Processed Texts
  • Some network node servers may return truncated search results in response to a query or other requests. For example, if the result of a search query is a 100 character field, the server may return only the first 20 characters of the field, and if the user selects the found record, the server will provide the full field. According to embodiments of the invention, the intermediate module should be able to work within such constraints. According to embodiments of the invention, where the server truncates units of the processed text, these units may be individual tokens within the processed text, the processed text as a whole, or both.
  • According to embodiments of the invention, this problem may be solved, for example, by providing a repository of processed texts at the intermediate module, or at a storage device managed or otherwise controlled or accessible by the intermediate module. The system may attempt to recover from such truncations before obtaining the original input text during the decryption stage, as follows: (1) the intermediate module may store unabridged processed text units at a trusted storage during the encryption stage, e.g., not via the untrusted server or its associated storage device, (2) when a truncated processed text is sent from the server and received at the intermediate module, the trusted storage unit is consulted to determine whether there exists therein one or more non-truncated processed text units matching or corresponding to the truncated processed text units, (3) if so, the intermediate module replaces the truncated processed text units with the corresponding unabridged processed text units to obtain a recovered processed text, (4) the recovered processed text is processed by a reverse processing method (e.g. decryption using a secret key) to obtain the original input text. The original input text, or unprocessed text, may then be provided to the client device, if required.
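  • Steps (1) through (3) above can be sketched with a small in-memory repository. This is a simplified illustration under two stated assumptions: the server truncates by keeping a prefix of the processed text unit, and a truncated unit is replaced only when exactly one stored unit matches it. The class and method names are hypothetical, and a real repository would use persistent trusted storage.

```python
class ProcessedTextRepository:
    """Trusted-side store of unabridged processed text units."""

    def __init__(self):
        self._units = []

    def store(self, processed_unit: str) -> None:
        """Step (1): remember the full unit at encryption time."""
        if processed_unit not in self._units:
            self._units.append(processed_unit)

    def recover(self, received_unit: str) -> str:
        """Steps (2)-(3): swap a truncated unit for its unabridged form.

        Returns the received unit unchanged when no stored unit, or more
        than one stored unit, begins with it.
        """
        matches = [u for u in self._units if u.startswith(received_unit)]
        return matches[0] if len(matches) == 1 else received_unit


repository = ProcessedTextRepository()
repository.store("DHFOEFRGEJIC")
repository.store("DHFXYZQRSTUV")
recovered = repository.recover("DHFOE")  # server returned only a prefix
```

  • The recovered unit would then be passed to the reverse processing of step (4); ambiguous or unknown prefixes are passed through unchanged so that the caller can decide how to handle them.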
  • In some embodiments of the invention, what is stored in the repository may be at least one unabridged processed element associated with the processed text. For example, the processed element may be said entire processed text or a word or other portion contained in the processed text.
  • It will be recognized that the system and method using the repository may be applied to any suitable request from the client device, including, for example, a search request, a record request, or a report request.
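  • As a minimal sketch of the repository-based recovery described above, and assuming that truncation keeps a prefix of each processed unit, the lookup may be illustrated as follows. The class name ProcessedTextRepository, its methods, and the choice to leave ambiguous prefixes unchanged are hypothetical and chosen only for this example.

```python
class ProcessedTextRepository:
    """Trusted store of unabridged processed text units (in memory for brevity)."""

    def __init__(self):
        self._units = set()

    def store(self, unit: str) -> None:
        # Called during the encryption stage, before the text goes to the server.
        self._units.add(unit)

    def recover(self, truncated: str):
        """Return the unique stored unit starting with `truncated`, else None."""
        if truncated in self._units:
            return truncated  # the unit was not truncated at all
        matches = [u for u in self._units if u.startswith(truncated)]
        return matches[0] if len(matches) == 1 else None


def recover_text(received: str, repo: ProcessedTextRepository) -> str:
    """Replace each truncated token by its unabridged form when one is found."""
    return " ".join(repo.recover(tok) or tok for tok in received.split(" "))
```

  • The recovered processed text would then be passed to the reverse processing step; a prefix that matches several stored units is left as received, since the sketch has no basis for choosing among them.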
  • Detection of Untrusted Server Transformations Using Bait
  • An untrusted server may often apply one or more of a multitude of transformations on instances of processed user data. Such transformations may be expected by a client component residing on the trusted workstation, but may not be known to the intermediate module described herein. According to embodiments of the invention, therefore, the intermediate module may utilize methods to infer the kind of transformation applied to processed user data.
  • According to one embodiment of the invention, the intermediate module may add excess information (referred to herein as bait) to encrypted user data in known locations. Bait may be used when processed user data is received at the intermediate module in order to infer the kind of transformation applied to processed user data. Non-limiting examples of transformations for which bait may be used are application of a certain character encoding scheme and HTML tag elimination.
  • For example, an untrusted server may apply various and possibly combined encoding schemes to encrypted user data received thereat. When encrypted text is received at the intermediate module from the untrusted server, the encrypted text may be encoded in one of a multitude of encoding schemes used by an untrusted server application to communicate with the client component residing on the trusted workstation. The encoding scheme may or may not be indicated in the message generated by the server. The client component may typically be aware of the server component and may reliably know the encoding scheme used. However, the intermediate module may not be aware of the specific encoding used in every instance of encrypted text. Nevertheless, when decrypting user data before providing decrypted user data to the client component, the intermediate module according to embodiments of the invention should be able to use the same encoding scheme applied in the server and expected by the client. That is, if the intermediate module does not know the encoding scheme used by the untrusted server and the trusted workstation, information may become lost or garbled in the processing and deprocessing by the intermediate module.
  • To facilitate encoding scheme detection, the intermediate module may add predetermined characters known as encoding bait to encrypted text. The encoding bait may be encoded by the server along with the encrypted user data before providing to the client component. When the intermediate module detects encrypted tokens, the encoding bait may be examined to infer the kind of encoding scheme being used for encoding an instance of encrypted text. Accordingly, the intermediate module may use the inferred encoding scheme to encode decrypted text in a processed message. Non-limiting examples of encoding schemes include: (i) UTF-8 encoding, (ii) encoding using HTML escape sequence followed by UTF-8; and (iii) encoding using JavaScript escape sequences, then again using JavaScript escape sequences, and then performing Latin-1 encoding (AKA ISO-8859-1). For example, JavaScript escaping typically operates by replacing characters with a backslash and another character; for example, the newline character is replaced with a backslash and the character “n”, i.e. the sequence “\n”.
  • In some embodiments of the invention, bait may be used to detect at least one transformation including replacement of at least one transformable character in the processed text with a matching replacement character or replacement character string, e.g., one or more escape characters.
  • An example of using encoding bait composed of an angle bracket “<” and a backslash “\” is provided herein. The user may input the string “This ' is a quote”. This is encrypted, for example, into “QIFJDJNZOP”. During encryption, bait is attached to the encrypted token so that “QIFJDJNZOP” becomes “<\QIFJDJNZOP”, in which “<\” is the bait. The server may receive the encrypted string and send it to the client in a JavaScript file. In a JavaScript file, the server needs to escape only the backslash, not the angle bracket. Accordingly, the message sent to the client includes “<\\QIFJDJNZOP”, in which the original backslash of the bait is escaped using another backslash. When the intermediate module detects the encrypted token in the message preceded by the original angle bracket and the escaped backslash, it may infer that the token is JavaScript-escaped. Thereupon, the intermediate module may decrypt the token “QIFJDJNZOP” into “This ' is a quote”. Having inferred that the client is expecting JavaScript-escaped text, the module may then use JavaScript escaping to encode the decrypted string, e.g., by escaping the quote to produce “This \' is a quote”. The decrypted string is thus encoded using the rules inferred from the encoded bait. The decrypted and encoded string is then forwarded to the client.
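  • The encoding-bait example above may be sketched, purely for illustration, as follows. The function names and the three recognized schemes (“javascript”, “html”, “none”) are assumptions of this sketch; an actual intermediate module may need to recognize further and combined encodings.

```python
import html

BAIT = "<\\"  # an angle bracket followed by a single backslash


def add_bait(token: str) -> str:
    """Attach the bait in a known location (here: directly before the token)."""
    return BAIT + token


def infer_encoding(received: str, token: str) -> str:
    """Infer the server-side encoding from how the bait arrived."""
    idx = received.find(token)
    if idx < 0:
        return "unknown"
    prefix = received[:idx]
    if prefix.endswith("<\\\\"):    # backslash was doubled: JavaScript escaping
        return "javascript"
    if prefix.endswith("&lt;\\"):   # angle bracket became an entity: HTML escaping
        return "html"
    if prefix.endswith("<\\"):      # bait arrived unchanged
        return "none"
    return "unknown"


def encode_plaintext(text: str, scheme: str) -> str:
    """Re-encode decrypted text the way the client expects it."""
    if scheme == "javascript":
        return text.replace("\\", "\\\\").replace("'", "\\'")
    if scheme == "html":
        return html.escape(text)
    return text
```

  • For instance, a server that doubles the backslash of “<\QIFJDJNZOP” leads infer_encoding to report “javascript”, after which the decrypted string “This ' is a quote” is re-encoded with an escaped quote before being forwarded to the client.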
  • Another example for which bait may be used is HTML transformations, of which HTML tag elimination is a special case. An untrusted server may receive text augmented with HTML markup, generate instances of received text with all or some HTML tags removed, and may return these instances to the client component. In such cases, the intermediate module may include an HTML tag bait in processed user data. When receiving processed user data, the intermediate module may check for the HTML tag bait and remove it if present; from its presence or absence, the module may infer whether HTML tags should be removed from decrypted user data, and may accordingly retain or remove decrypted HTML tags in a message returned to the client component.
  • It will be recognized that in some embodiments, multiple pieces of bait may be added to a processed text to detect a plurality of transformations or encoding schemes applied by the untrusted server.
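  • The HTML tag bait described above may likewise be sketched as follows. The bait value “<b></b>”, the function names, and the simple regular expression used to strip tags are assumptions of this illustration; the decrypt argument stands for whatever reverse processing the intermediate module applies.

```python
import re

TAG_BAIT = "<b></b>"                 # hypothetical bait: an empty HTML element
TAG_PATTERN = re.compile(r"<[^>]*>")  # naive tag matcher, sufficient for the sketch


def add_tag_bait(processed: str) -> str:
    """Prefix the processed text with an HTML tag the server might remove."""
    return TAG_BAIT + processed


def server_strips_tags(received: str) -> bool:
    """If the bait is gone, the server evidently removed HTML tags."""
    return not received.startswith(TAG_BAIT)


def finish_decryption(received: str, decrypt) -> str:
    """Decrypt and mirror the server's tag handling in the plaintext."""
    stripped = server_strips_tags(received)
    body = received if stripped else received[len(TAG_BAIT):]
    plain = decrypt(body)
    if stripped:
        plain = TAG_PATTERN.sub("", plain)
    return plain
```

  • In use, decrypt may be any callable that maps processed text back to input text; the sketch only decides whether decrypted HTML tags are retained or removed.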
  • Length Limits
  • In some embodiments of the invention, a plurality of separate portions of the input text may be transformed in which at least one of the plurality of portions of said input text includes no more than a maximum number of characters, for example, by truncation of the respective portion. In some embodiments of the invention, a plurality of separate portions of the input text may be transformed in which each of the plurality of portions of said input text includes no more than a maximum number of characters, for example, by truncation of the respective portion.
  • Tokenization Example
  • Reference is made to FIG. 5, which illustrates the normalization and tokenization of input text 510, which includes the sentence “This sentence has FIVE words!” The sentence may be tokenized into the following input tokens: “This”, “sentence”, “has”, “FIVE”, “words”, and “!”. These input tokens may be normalized to provide normalized input tokens and metadata. The normalized input tokens are “this”, “sentence”, “has”, “five”, “word”, and “!”. The metadata associated with “sentence” is “lower case”. The metadata associated with “FIVE” is “upper case”. The metadata associated with “words” is “lower case” and “plural”.
  • Next, the method may detect common input tokens, including the words “this”, “has” and the non-word “!”. These input tokens may be encrypted in a non-deterministic manner, e.g., they may be encrypted with salt (denoted “*”).
  • The method may detect uncommon input tokens “word”, “sentence” and “five”. These words may be encrypted in a deterministic manner.
  • The order of input tokens may be changed and order metadata may be generated accordingly. The order metadata, the case metadata, and the plural metadata may be included in a control token 530.
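  • The tokenization example may be sketched as follows, for illustration only. The list of common tokens, the plural lookup table, and the “mixed case” label are hypothetical and stand in for whatever dictionaries and metadata an embodiment actually uses; the sketch only decides which transformation each token would receive and performs no encryption.

```python
import re

COMMON_TOKENS = {"this", "has", "!"}  # hypothetical list of common tokens
SINGULAR_OF = {"words": "word"}       # hypothetical plural lookup table


def tokenize(text: str):
    """Split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(token: str):
    """Return (normalized token, metadata needed to restore the original)."""
    metadata = []
    if token.isalpha():
        if token.isupper():
            metadata.append("upper case")
        elif token.islower():
            metadata.append("lower case")
        else:
            metadata.append("mixed case")  # e.g. "This"; label assumed for the sketch
    lowered = token.lower()
    if lowered in SINGULAR_OF:
        metadata.append("plural")
        lowered = SINGULAR_OF[lowered]
    return lowered, metadata


def plan(text: str):
    """Decide, per token, which kind of transformation would be applied."""
    result = []
    for tok in tokenize(text):
        norm, meta = normalize(tok)
        mode = "non-deterministic" if norm in COMMON_TOKENS else "deterministic"
        result.append((norm, meta, mode))
    return result
```

  • Applied to “This sentence has FIVE words!”, the sketch marks “sentence”, “five” and “word” for deterministic transformation and the remaining tokens for salted, non-deterministic transformation; the collected metadata corresponds to what a control token would carry.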
  • Sort Support
  • A text processing feature common in many SaaS applications is sorting records by lexicographic order of a particular field or other attribute. It may therefore be beneficial to provide processed text by an order-preserving encryption process.
  • Any of a number of order-preserving approaches may be implemented. For example, order preservation can be obtained by any of the following methods: (i) maintaining a list of all records on the interception module and performing site-specific ordering when needed, which requires duplicating nearly all of each server's functionality in both presentation and data management; (ii) providing an API for the server to query the sort order of a particular string; or (iii) creating a lexicographically sortable representation which preserves the real sort order without any modification in the network node.
  • An encryption method according to the present invention may preserve order of input text records by applying the following stages or a combination thereof: (1) converting input data into numeric values (if not already numeric), (2) applying an order-preserving transformation on the numeric values to obtain an output numeric value, (3) obtaining a lexicographically sortable representation from the output numeric value, and (4) using the lexicographically sortable representation in the processed output text, as either a prefix string (in textual data) or as the whole output data. The order-preserving transformation may be a monotonically increasing function. The order-preserving function may use a private key that can be generated from a random source, in order to parameterize its functionality. A private key may be generated for every set of inputs sorted collectively as a set. According to embodiments of the invention, generating order information, as described further below, may include applying an order-preserving, secret-key-dependent function on the input text.
  • According to some embodiments of the invention, order information may be produced based on a truncated version of the input text. According to yet further embodiments of the invention, the order information may be produced based on a plurality of truncated words in the input text, in the order in which they appear therein.
  • According to some embodiments of the invention, the intermediate module may process input text by applying an order-preserving transformation, wherein the order-preserving transformation comprises generating order information based on the input text, the order information indicative of a relative order of the input text within a set of possible input texts according to a collation rule, transforming the input text to obtain processed text, and transmitting the processed text to the server. According to some embodiments of the invention, the order information may be sent to the server in association with said processed input text by adding the order information as a prefix to the processed input data and transmitting the combined order information and processed input data to the server.
  • In order to reduce security risks associated with order-preserving encryption schemes, the intermediate device may consider only a reduced portion of the input data when generating an order-preserved output. Reducing the input to obtain a reduced portion of the input data may include (a) ignoring certain words such as “the” and “a”, (b) ignoring all characters in every word occurring at a certain position within the word or later, e.g., ignoring the characters “ra” in “zebra”, (c) ignoring final words within the record, (d) contracting the input domain of the order-preserving function, (e) ignoring certain character properties such as letter case, or (f) a combination thereof.
  • FIG. 7 illustrates various stages of method 170 according to an embodiment of the invention that may be used to obtain an order-preserving representation of textual data to be included in processed text. At stage 171, input text to be encrypted may be received. At stage 172 certain words may be discarded from the input text. At stage 173, certain character properties may be discarded, such as letter case, diacritics, ligatures or other character properties. At stage 174, input words may be truncated according to a predetermined parameter of the encryption scheme, such that final characters from input words may be discarded.
  • At stage 175, certain final words of the input text may be discarded. Accordingly, performing one or more of optional stages 172, 173, 174, and 175 may produce a reduced input text. At stage 176, the (optionally reduced) input text may be converted into a numeric value to obtain an input numeric value. At stage 177, an order-preserving function may be applied to the input numeric value to obtain an output numeric value. At stage 178, an order-preserving representation may be obtained from the output numeric value. Finally, at stage 179, the order-preserving representation may be placed as either a prefix or the whole encrypted data of the processed text.
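  • The optional reduction stages 172 to 175 may be sketched as follows, as an illustration only. The stop-word list and the limits of three characters per word and two words per record are assumed parameters; Unicode NFKD decomposition is one possible way to discard diacritics and ligatures and is a choice made for this sketch.

```python
import unicodedata

STOP_WORDS = frozenset({"the", "a"})  # hypothetical list of ignored words


def strip_properties(word: str) -> str:
    """Stage 173: drop letter case, diacritics and ligatures."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def reduce_input(text: str, max_chars: int = 3, max_words: int = 2):
    """Return the reduced words that feed the numeric conversion (stage 176)."""
    words = [w for w in text.split() if w.lower() not in STOP_WORDS]  # stage 172
    words = [strip_properties(w) for w in words]                      # stage 173
    words = [w[:max_chars] for w in words]                            # stage 174
    return words[:max_words]                                          # stage 175
```

  • With these assumed parameters, “The Green Zebra” reduces to the words “gre” and “zeb”.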
  • In the example below, illustrating an application of stages 172-176, the input numeric value of input text “The Green Zebra” may be calculated as follows: (i) receiving a set of input tokens “The Green Zebra”; (ii) ignoring irrelevant input token “the” to provide relevant input tokens “Green Zebra”; (iii) normalizing the relevant input tokens to provide “green zebra”; (iv) selecting, for example, based on user definitions, only the first three letters of every input token, to provide six relevant characters: “gre zeb”; (v) calculating the numeric value of each letter, as shown in Table 1, based on the weight of its location in the input token; and (vi) summing up the letter values to provide a numeric value of the set of input tokens, which is 0.296199790068345.
  • The weight W may represent the size of the alphabet A, raised to the negative power of the position of the character P, i.e., W = A^{-P}. For English text, the alphabet size is 26.
  • TABLE 1
    Letter  Alphabetic Value  Position (P)  Weight (W)              Weighted Value
    G       7                 1             0.03846153846153850000  0.2692307692307690000000000
    R       18                2             0.00147928994082840000  0.0266272189349112000000000
    E       5                 3             0.00005689576695493860  0.0002844788347746930000000
    Z       26                4             0.00000218829872903610  0.0000568957669549386000000
    E       5                 5             0.00000008416533573216  0.0000004208266786607880000
    B       2                 6             0.00000000323712829739  0.0000000064742565947813600
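  • The calculation summarized in Table 1 may be reproduced with the following short sketch, in which each letter value (a = 1 through z = 26) is multiplied by the alphabet size raised to the negative power of the letter's position and the products are summed. The function name numeric_value is an assumption of the sketch, and the input is assumed to consist of lowercase English letters only.

```python
def numeric_value(chars: str, alphabet_size: int = 26) -> float:
    """Sum of letter_value * alphabet_size ** -position over all characters."""
    total = 0.0
    for position, ch in enumerate(chars, start=1):
        letter_value = ord(ch) - ord("a") + 1      # a = 1, ..., z = 26
        weight = alphabet_size ** -position        # W = A^(-P)
        total += letter_value * weight
    return total


# The six relevant characters of "The Green Zebra" are "gre" followed by "zeb".
value = numeric_value("grezeb")
```

  • Evaluated in double-precision arithmetic, the result agrees with the value 0.296199790068345 stated above to within rounding.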
  • FIG. 8 illustrates a method 300 of generating an order-preserving function according to an embodiment of the invention, to be used, for example, in stage 177 of method 170. At stage 180, the domain (D1, D2) and range (R1, R2) of the function may be determined, for example, according to configuration by a user or program. At stage 181, a private key K is obtained to be used in calculation of the order-preserving function output value. At stage 182, an input value Vin is received (possibly from stage 176 of method 170). At stages 183 and 184, the function range may be altered, so that it starts and ends at key-dependent positions lying within the original range. At stage 185, a point Dmid lying inside the function's domain may be selected, wherein Dmid is dependent on the function's key K, such that Dmid=f1(D1, D2, K). At stage 186, points RL=f2(R1, R2, K, n) and RH=f3(R1, R2, K, n) may be selected, such that R1<RL<RH<R2, where RL and RH may depend on the function's key K and/or the iteration number n, where initially n=1. At stage 187, the numeric input value Vin is checked to see whether it lies within the lower part (D1, Dmid) or the higher part (Dmid, D2) of the current domain (D1, D2). If Vin lies within the lower part, then stage 188a is carried out; otherwise stage 188b is carried out. At stages 188a and 188b, the function's domain (D1, D2) and range (R1, R2) are modified: in stage 188a, (D1, D2) is set to (D1, Dmid) and (R1, R2) is set to (R1, RL); in stage 188b, (D1, D2) is set to (Dmid, D2) and (R1, R2) is set to (RH, R2). Stages 185-188 may be repeated until a predetermined stop criterion is satisfied at stage 189. The stop criterion may be, for example, a threshold size Dthreshold being greater than the current domain size |D|=D2−D1; or a threshold size Rthreshold being greater than the current range size |R|=R2−R1; or a combination thereof.
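  • A minimal sketch of the iterative procedure of stages 185 to 189 follows, under stated assumptions. The functions f1, f2 and f3 are not specified herein; the sketch derives them from HMAC-SHA256 of the key and the current interval, constrains Dmid to the middle half of the domain and RL and RH to fixed sub-bands of the range, omits the range adjustment of stages 183 and 184, and returns the midpoint of the final range. All of these choices, and the names _unit and order_preserving, are hypothetical.

```python
import hashlib
import hmac
import struct


def _unit(key: bytes, tag: bytes, *nums: float) -> float:
    """Key-dependent pseudo-random fraction in [0, 1] for the given state."""
    msg = tag + struct.pack(">%dd" % len(nums), *nums)
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64


def order_preserving(v_in: float, key: bytes,
                     domain=(0.0, 1.0), rng=(0.0, 1.0),
                     d_threshold: float = 1e-9,
                     r_threshold: float = 1e-12) -> float:
    d1, d2 = domain
    r1, r2 = rng
    if not d1 <= v_in < d2:
        raise ValueError("input value lies outside the domain")
    n = 1
    # Stage 189: stop once the domain or the range becomes smaller than its threshold.
    while (d2 - d1) >= d_threshold and (r2 - r1) >= r_threshold:
        # Stage 185: key-dependent split point inside the current domain.
        d_mid = d1 + (0.25 + 0.5 * _unit(key, b"mid", d1, d2)) * (d2 - d1)
        # Stage 186: key-dependent points with R1 < RL < RH < R2.
        span = r2 - r1
        r_lo = r1 + (0.30 + 0.15 * _unit(key, b"lo", r1, r2, float(n))) * span
        r_hi = r1 + (0.55 + 0.15 * _unit(key, b"hi", r1, r2, float(n))) * span
        # Stages 187 and 188a/188b: narrow the domain and the range together.
        if v_in < d_mid:
            d2, r2 = d_mid, r_lo
        else:
            d1, r1 = d_mid, r_hi
        n += 1
    return (r1 + r2) / 2
```

  • Because two inputs follow the same sequence of intervals until a split point separates them, and the lower input then receives a range lying entirely below that of the higher input, the outputs of this sketch never reverse the order of the inputs; inputs closer together than the thresholds may, however, map to the same output.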
  • The following example illustrates an encoding scheme which may be used in stage 178 of method 170. It is assumed that the transformed numeric value generated by an order-preserving function is 0.344323947, and that the lexicographically sortable representation is ten characters long and includes only lowercase English letters. Table 2 illustrates the ten iterations of an arithmetic coding scheme that is applied to generate the ten characters of the lexicographically sortable representation.
  • TABLE 2
    Letter number  Value_n      Letter value (Value_n × 26)  Rounded value  Output letter
    1              0.344323947  8.952422617                  8              h
    2              0.952422617  24.76298804                  24             x
    3              0.762988037  19.83768896                  19             s
    4              0.837688957  21.77991288                  21             u
    5              0.779912877  20.2777348                   20             t
    6              0.277734797  7.221104712                  7              g
    7              0.221104712  5.748722505                  5              e
    8              0.748722505  19.46678512                  19             s
    9              0.46678512   12.13641313                  12             l
    10             0.136413127  3.546741304                  3              c
    For n > 1, Value_n = 26 × (Value_{n−1} − Rounded_{n−1} ÷ 26).
  • As indicated by Table 2, the lexicographically sortable representation is “hxsutgeslc”.
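  • The iteration of Table 2 may be sketched as follows: at each step the value is multiplied by 26, the integer part selects the output letter (1 = a, 2 = b, and so on, as in the table), and the fractional part is carried to the next step. Table 2 does not show which character stands for an integer part of 0; the underscore used below, which sorts before “a” in ASCII, is an assumption of this sketch, as is the function name sortable_representation.

```python
import string

# Index 0 is a placeholder assumed for this sketch; indices 1..25 map to a..y.
SYMBOLS = "_" + string.ascii_lowercase[:25]


def sortable_representation(value: float, length: int = 10) -> str:
    """Emit `length` base-26 symbols of a value in [0, 1), most significant first."""
    if not 0.0 <= value < 1.0:
        raise ValueError("value must lie in [0, 1)")
    out = []
    for _ in range(length):
        scaled = value * 26
        digit = min(int(scaled), 25)   # integer part, guarded against rounding up
        out.append(SYMBOLS[digit])
        value = scaled - digit         # carry the fractional part forward
    return "".join(out)
```

  • Starting from the nine-decimal value 0.344323947 printed above, the sketch yields the leading letters “hxsutg”; the later letters depend on digits beyond those printed, since each step multiplies any rounding error by 26.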
  • A physical computer readable medium can be provided. It stores instructions that when executed by a processor can cause the processor to implement method 100 or portions thereof. The physical computer readable medium can be a disk, a diskette, a tape, a cassette, a disk on key, a flash memory unit, a volatile memory unit, and the like.
  • While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.

Claims (16)

1. In a system comprising a server and a client device, wherein the server is adapted to transform text received from said client device by applying at least one of a plurality of transformations, a method comprising:
receiving input text at an intermediate module from the client device;
processing said input text at the intermediate module to obtain processed text, wherein said processing comprises including bait in said processed text;
transmitting the processed text to the server;
upon request, receiving at said intermediate module transformed processed text from the server, said server having applied at least one of said plurality of transformations to said processed text to obtain said transformed processed text; and
determining by said intermediate module at least one of said transformations applied by said server based on a comparison between the processed text and the transformed processed text.
2. The method of claim 1, further comprising:
applying a reverse transformation on said processed text to obtain unprocessed input text; and
modifying said unprocessed input text based on said at least one determined transformation.
3. The method of claim 2, further comprising:
sending said modified unprocessed input text to said client device.
4. The method of claim 1,
wherein at least one transformation of said plurality of transformations comprises replacement of at least one transformable character in said processed text with a matching replacement character or replacement character string, and
wherein including bait in said processed text comprises including said at least one transformable character in said processed text.
5. The method of claim 4, further comprising:
applying a reverse transformation on said processed text to obtain unprocessed input text; and
modifying said unprocessed input text by replacing said at least one transformable character in said unprocessed input text with said matching replacement character or replacement character string.
6. The method of claim 5, further comprising:
sending said modified unprocessed input text to said client device.
7. The method of claim 1,
wherein at least one transformation of said plurality of transformations comprises omitting HTML tags in said processed text, and
wherein including bait in said processed text comprises including an HTML tag in said processed text.
8. The method of claim 7, further comprising:
applying a reverse transformation on said processed text to obtain unprocessed input text;
modifying said unprocessed input text by omitting HTML tags contained therein; and
sending said modified unprocessed input text to said client device.
9. A system for securing data transmitted between a client device and a server, wherein the server is adapted to transform text received from said client device by applying at least one of a plurality of transformations, said system comprising:
an intermediate module configured to:
receive input text;
process said input text to obtain processed text by including bait in said processed text;
transmit the processed text to the server;
upon request, receive transformed processed text from the server, said server having applied at least one of said plurality of transformations to said processed text to obtain said transformed processed text; and
determine at least one of said transformations applied by said server based on a comparison between the processed text and the transformed processed text.
10. The system of claim 9, wherein said intermediate module is further configured to:
apply a reverse transformation on said processed text to obtain unprocessed input text; and
modify said unprocessed input text based on said at least one determined transformation.
11. The system of claim 10, wherein said intermediate module is further configured to:
send said modified unprocessed input text to said client device.
12. The system of claim 9,
wherein at least one transformation of said plurality of transformations comprises replacement of at least one transformable character in said processed text with a matching replacement character or replacement character string, and
wherein said intermediate module is to process said input text to obtain processed text by including said at least one transformable character in said processed text.
13. The system of claim 12, wherein said intermediate module is further configured to:
apply a reverse transformation on said processed text to obtain unprocessed input text; and
modify said unprocessed input text by replacing said at least one transformable character in said unprocessed input text with said matching replacement character or replacement character string.
14. The system of claim 13, wherein said intermediate module is further to:
send said modified unprocessed input text to said client device.
15. The system of claim 9,
wherein at least one transformation of said plurality of transformations comprises omission of HTML tags in said processed text, and
wherein said intermediate module is to process said input text to obtain processed text by including an HTML tag in said processed text.
16. The system of claim 15, wherein said intermediate module is further to:
apply a reverse transformation on said processed text to obtain unprocessed input text;
modify said unprocessed input text by omitting HTML tags contained therein; and
send said modified unprocessed input text to said client device.
US12/982,690 2008-09-15 2010-12-30 System, apparatus and method for encryption and decryption of data transmitted over a network Active 2030-11-17 US9002976B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/982,690 US9002976B2 (en) 2008-09-15 2010-12-30 System, apparatus and method for encryption and decryption of data transmitted over a network

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US9689108P 2008-09-15 2008-09-15
PCT/IL2009/000901 WO2010029559A1 (en) 2008-09-15 2009-09-15 Method and system for secure use of services by untrusted storage providers
US29139809P 2009-12-31 2009-12-31
US30620710P 2010-02-19 2010-02-19
US12/982,690 US9002976B2 (en) 2008-09-15 2010-12-30 System, apparatus and method for encryption and decryption of data transmitted over a network

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2009/000901 Continuation-In-Part WO2010029559A1 (en) 2008-09-15 2009-09-15 Method and system for secure use of services by untrusted storage providers

Publications (2)

Publication Number Publication Date
US20110167129A1 true US20110167129A1 (en) 2011-07-07
US9002976B2 US9002976B2 (en) 2015-04-07

Family

ID=44225343

Family Applications (6)

Application Number Title Priority Date Filing Date
US12/982,695 Abandoned US20110167121A1 (en) 2008-09-15 2010-12-30 System, apparatus and method for encryption and decryption of data transmitted over a network
US12/982,690 Active 2030-11-17 US9002976B2 (en) 2008-09-15 2010-12-30 System, apparatus and method for encryption and decryption of data transmitted over a network
US12/982,688 Active 2030-12-14 US9444793B2 (en) 2008-09-15 2010-12-30 System, apparatus and method for encryption and decryption of data transmitted over a network
US12/982,694 Active 2030-03-10 US8738683B2 (en) 2008-09-15 2010-12-30 System, apparatus and method for encryption and decryption of data transmitted over a network
US12/982,691 Expired - Fee Related US9338139B2 (en) 2008-09-15 2010-12-30 System, apparatus and method for encryption and decryption of data transmitted over a network
US15/259,542 Active US10021078B2 (en) 2008-09-15 2016-09-08 System, apparatus and method for encryption and decryption of data transmitted over a network

Country Status (1)

Country Link
US (6) US20110167121A1 (en)

Citations (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5585793A (en) * 1994-06-10 1996-12-17 Digital Equipment Corporation Order preserving data translation
US5870084A (en) * 1996-11-12 1999-02-09 Thomson Consumer Electronics, Inc. System and method for efficiently storing and quickly retrieving glyphs for large character set languages in a set top box
US5958006A (en) * 1995-11-13 1999-09-28 Motorola, Inc. Method and apparatus for communicating summarized data
US20010044893A1 (en) * 2000-01-07 2001-11-22 Tropic Networks Inc. Distributed subscriber management system
US6334140B1 (en) * 1997-09-25 2001-12-25 Nec Corporation Electronic mail server in which electronic mail is processed
US20020046253A1 (en) * 2000-07-04 2002-04-18 Jiyunji Uchida Electronic file management system and method
US20020073099A1 (en) * 2000-12-08 2002-06-13 Gilbert Eric S. De-identification and linkage of data records
US6678694B1 (en) * 2000-11-08 2004-01-13 Frank Meik Indexed, extensible, interactive document retrieval system
US20050060372A1 (en) * 2003-08-27 2005-03-17 Debettencourt Jason Techniques for filtering data from a data stream of a web services application
US20050165623A1 (en) * 2003-03-12 2005-07-28 Landi William A. Systems and methods for encryption-based de-identification of protected health information
US20050198170A1 (en) * 2003-12-12 2005-09-08 Lemay Michael Secure electronic message transport protocol
US20050235163A1 (en) * 2004-04-15 2005-10-20 International Business Machines Corporation Method for selective encryption within documents
US20060101285A1 (en) * 2004-11-09 2006-05-11 Fortiva Inc. Secure and searchable storage system and method
US20060143237A1 (en) * 2000-03-09 2006-06-29 Pkware, Inc. System and method for manipulating and managing computer archive files
US20060170544A1 (en) * 2003-02-14 2006-08-03 Ulrich Hahn Electronic intermediate module
US20060251246A1 (en) * 2003-03-07 2006-11-09 Yoshinori Matsui Encryption device, decryption device, and data reproduction device
US7155442B2 (en) * 2002-06-28 2006-12-26 Microsoft Corporation Compressed normalized character comparison with inversion
US7165175B1 (en) * 2000-09-06 2007-01-16 Widevine Technologies, Inc. Apparatus, system and method for selectively encrypting different portions of data sent over a network
US20070100913A1 (en) * 2005-10-12 2007-05-03 Sumner Gary S Method and system for data backup
US20070130069A1 (en) * 2005-12-06 2007-06-07 Microsoft Corporation Encapsulating Address Components
US20070168656A1 (en) * 2005-12-29 2007-07-19 Paganetti Robert J Method for enabling a user to initiate a password protected backup of the user's credentials
US20070198206A1 (en) * 2002-09-12 2007-08-23 Leif Jagerbrand Device for indicating downloading of data items
US20080005247A9 (en) * 2002-09-18 2008-01-03 Advenix, Corp. (Ca Corporation) Enhancement of e-mail client user interfaces and e-mail message formats
US20080147816A1 (en) * 2002-03-01 2008-06-19 Tralix, L.L.C. System and methods for electronic mail message subject tracking
US20080276098A1 (en) * 2007-05-01 2008-11-06 Microsoft Corporation One-time password access to password-protected accounts
US20090327748A1 (en) * 2004-01-05 2009-12-31 International Business Machines Corp. System and method for fast querying of encrypted databases
US20100169665A1 (en) * 2006-10-04 2010-07-01 Kang Hee-Chang Method for indexing encrypted column
US20110072489A1 (en) * 2009-09-23 2011-03-24 Gilad Parann-Nissany Methods, devices, and media for securely utilizing a non-secured, distributed, virtualized network resource with applications to cloud-computing security and management

Family Cites Families (68)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5333266A (en) * 1992-03-27 1994-07-26 International Business Machines Corporation Method and apparatus for message handling in computer systems
US5537586A (en) * 1992-04-30 1996-07-16 Individual, Inc. Enhanced apparatus and methods for retrieving and selecting profiled textural information records from a database of defined category structures
US5659737A (en) 1995-08-01 1997-08-19 Oracle Corporation Methods and apparatus for data compression that preserves order by using failure greater than and failure less than tokens
IL121071A0 (en) 1997-03-27 1997-11-20 El Mar Software Ltd Automatic conversion server
US6128735A (en) * 1997-11-25 2000-10-03 Motorola, Inc. Method and system for securely transferring a data set in a data communications system
US20010037454A1 (en) 2000-05-01 2001-11-01 Botti John T. Computer networked system and method of digital file management and authentication
GB2349960A (en) 1999-05-08 2000-11-15 Ibm Secure password provision
US7146505B1 (en) 1999-06-01 2006-12-05 America Online, Inc. Secure data exchange between date processing systems
US6567857B1 (en) 1999-07-29 2003-05-20 Sun Microsystems, Inc. Method and apparatus for dynamic proxy insertion in network traffic flow
US6523063B1 (en) 1999-08-30 2003-02-18 Zaplet, Inc. Method system and program product for accessing a file using values from a redirect message string for each change of the link identifier
US6961849B1 (en) 1999-10-21 2005-11-01 International Business Machines Corporation Selective data encryption using style sheet processing for decryption by a group clerk
JP2001147934A (en) 1999-11-19 2001-05-29 Nippon Telegr & Teleph Corp <Ntt> Enciphered information distributing method and device capable of retrieving information
WO2001047205A2 (en) 1999-12-22 2001-06-28 Tashilon Ltd. Enhanced computer network encryption using downloaded software objects
US20020199096A1 (en) 2001-02-25 2002-12-26 Storymail, Inc. System and method for secure unidirectional messaging
US7178100B2 (en) 2000-12-15 2007-02-13 Call Charles G Methods and apparatus for storing and manipulating variable length and fixed length data elements as a sequence of fixed length integers
US6804677B2 (en) 2001-02-26 2004-10-12 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing
US20020184336A1 (en) 2001-03-01 2002-12-05 Rising Hawley K. Occurrence description schemes for multimedia content
GB2377514B (en) * 2001-07-05 2005-04-27 Hewlett Packard Co Document encryption
US20030033357A1 (en) 2001-08-13 2003-02-13 Luu Tran Client aware content selection and retrieval in a wireless portal system
US6492917B1 (en) 2001-10-31 2002-12-10 Hughes Electronics Corporation System and method for implementation of the YK lossless data compression algorithm using a modular computational architecture
US7171557B2 (en) 2001-10-31 2007-01-30 Hewlett-Packard Development Company, L.P. System for optimized key management with file groups
US7631084B2 (en) 2001-11-02 2009-12-08 Juniper Networks, Inc. Method and system for providing secure access to private networks with client redirection
AU2003210789A1 (en) * 2002-02-01 2003-09-02 John Fairweather A system and method for managing dataflows
US20030163691A1 (en) 2002-02-28 2003-08-28 Johnson Ted Christian System and method for authenticating sessions and other transactions
DE10239061A1 (en) 2002-08-26 2004-03-11 Siemens Ag Method for transferring user data objects
JP2004101905A (en) 2002-09-10 2004-04-02 Sharp Corp Information display device
US7925717B2 (en) 2002-12-20 2011-04-12 Avaya Inc. Secure interaction between a mobile client device and an enterprise application in a communication system
CN1729460B (en) 2002-12-20 2010-05-12 日本电信电话株式会社 Communication method, communication system, relay system, mail distribution system and method
US7296011B2 (en) * 2003-06-20 2007-11-13 Microsoft Corporation Efficient fuzzy match for evaluating data records
US7506070B2 (en) 2003-07-16 2009-03-17 Sun Microsystems, Inc. Method and system for storing and retrieving extensible multi-dimensional display property configurations
JP2005130352A (en) 2003-10-27 2005-05-19 Victor Co Of Japan Ltd Decoder
US7426752B2 (en) * 2004-01-05 2008-09-16 International Business Machines Corporation System and method for order-preserving encryption for numeric data
JP3998640B2 (en) 2004-01-16 2007-10-31 株式会社東芝 Encryption and signature method, apparatus and program
JP3945708B2 (en) * 2004-01-23 2007-07-18 インターナショナル・ビジネス・マシーンズ・コーポレーション Information processing system, conversion processing system, inverse conversion processing system, conversion method, conversion program, and recording medium
JP2005242740A (en) 2004-02-27 2005-09-08 Open Loop:Kk Program, storage medium and information processor in information security system
JP2005284915A (en) 2004-03-30 2005-10-13 Canon Inc Information retrieval device and method, information retrieval system, and control method for the same
JP3761557B2 (en) 2004-04-08 2006-03-29 株式会社日立製作所 Key distribution method and system for encrypted communication
US7519835B2 (en) * 2004-05-20 2009-04-14 Safenet, Inc. Encrypted table indexes and searching encrypted tables
US20060005017A1 (en) 2004-06-22 2006-01-05 Black Alistair D Method and apparatus for recognition and real time encryption of sensitive terms in documents
ATE413077T1 (en) 2004-06-25 2008-11-15 Telecom Italia Spa METHOD AND SYSTEM FOR PROTECTING INFORMATION EXCHANGED DURING COMMUNICATIONS BETWEEN USERS
JP2006079213A (en) 2004-09-07 2006-03-23 Ntt Docomo Inc Relay device, authentication server, and authentication method
US7509574B2 (en) 2005-02-11 2009-03-24 Fujitsu Limited Method and system for reducing delimiters
US20060224687A1 (en) 2005-03-31 2006-10-05 Popkin Laird A Method and apparatus for offline cooperative file distribution using cache nodes
US20070038579A1 (en) * 2005-08-12 2007-02-15 Tsys-Prepaid, Inc. System and method using order preserving hash
DE102005051577B4 (en) * 2005-10-21 2008-04-30 Engel Solutions Ag Method for encrypting or decrypting data packets of a data stream and signal sequence and data processing system for carrying out the method
US7979569B2 (en) 2005-12-01 2011-07-12 Firestar Software, Inc. System and method for exchanging information among exchange applications
US20100046755A1 (en) 2005-12-07 2010-02-25 Fiske Software Llc Cryptography related to keys with signature
ITUD20050209A1 (en) * 2005-12-09 2007-06-10 Eurotech Spa METHOD FOR THE FINDING OF AFFINITY BETWEEN SUBJECTS AND ITS APPARATUS
US20090132362A1 (en) 2007-11-21 2009-05-21 Mobile Candy Dish, Inc. Method and system for delivering information to a mobile communication device based on consumer transactions
US20070208732A1 (en) 2006-02-07 2007-09-06 Future Vistas, Inc. Telephonic information retrieval systems and methods
JP4561661B2 (en) 2006-03-09 2010-10-13 日本電気株式会社 Decoding method and decoding apparatus
JP4736877B2 (en) 2006-03-16 2011-07-27 日本電気株式会社 Demultiplexer and demultiplexer
US7895666B1 (en) 2006-09-01 2011-02-22 Hewlett-Packard Development Company, L.P. Data structure representation using hash-based directed acyclic graphs and related method
WO2008086189A2 (en) * 2007-01-04 2008-07-17 Wide Angle Llc Relevancy rating of tags
JP2008301335A (en) 2007-06-01 2008-12-11 Kddi R & D Laboratories Inc Video signal switching apparatus
FR2920559B1 (en) * 2007-08-30 2011-07-01 Xooloo DISTRIBUTED DATABASE
KR100903601B1 (en) * 2007-10-24 2009-06-18 한국전자통신연구원 Searching system for encrypted numeric data and searching method therefor
US8191117B2 (en) 2007-10-25 2012-05-29 Anchorfree, Inc. Location-targeted online services
EP2300966A4 (en) * 2008-05-01 2011-10-19 Peter Sweeney Method, system, and computer program for user-driven dynamic generation of semantic networks and media synthesis
WO2010026561A2 (en) 2008-09-08 2010-03-11 Confidato Security Solutions Ltd. An appliance, system, method and corresponding software components for encrypting and processing data
US20110167121A1 (en) 2008-09-15 2011-07-07 Ben Matzkel System, apparatus and method for encryption and decryption of data transmitted over a network
CA2736584C (en) 2008-09-15 2018-02-27 Vaultive Ltd. Method and system for secure use of services by untrusted storage providers
US20100146299A1 (en) * 2008-10-29 2010-06-10 Ashwin Swaminathan System and method for confidentiality-preserving rank-ordered search
KR101174058B1 (en) 2008-12-18 2012-08-16 한국전자통신연구원 Method for saving and searching of encrypted data on database
US8819451B2 (en) * 2009-05-28 2014-08-26 Microsoft Corporation Techniques for representing keywords in an encrypted search index to prevent histogram-based attacks
US8180785B2 (en) * 2009-06-30 2012-05-15 International Business Machines Corporation Method and system for searching numerical terms
CA2786058C (en) 2009-12-31 2017-03-28 Vaultive Ltd. System, apparatus and method for encryption and decryption of data transmitted over a network
EP2572493A1 (en) 2010-05-21 2013-03-27 Vaultive Ltd. System and method for controlling and monitoring access to data processing applications

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11184149B2 (en) * 2019-02-19 2021-11-23 International Business Machines Corporation Computing range queries over encrypted data

Also Published As

Publication number Publication date
US20110167121A1 (en) 2011-07-07
US8738683B2 (en) 2014-05-27
US9002976B2 (en) 2015-04-07
US20160380983A1 (en) 2016-12-29
US20110167255A1 (en) 2011-07-07
US9338139B2 (en) 2016-05-10
US9444793B2 (en) 2016-09-13
US20110167102A1 (en) 2011-07-07
US20110167107A1 (en) 2011-07-07
US10021078B2 (en) 2018-07-10

Similar Documents

Publication Publication Date Title
US10021078B2 (en) System, apparatus and method for encryption and decryption of data transmitted over a network
CA2786058C (en) System, apparatus and method for encryption and decryption of data transmitted over a network
US10013574B2 (en) Method and apparatus for secure storage and retrieval of encrypted files in public cloud-computing platforms
US9576005B2 (en) Search system
US9021135B2 (en) System and method for tokenization of data for storage in a cloud
US8166313B2 (en) Method and apparatus for dump and log anonymization (DALA)
US7373345B2 (en) Additional hash functions in content-based addressing
US8041719B2 (en) Personal computing device-based mechanism to detect preselected data
US20150088933A1 (en) Controlling disclosure of structured data
US10298401B1 (en) Network content search system and method
CA2499508A1 (en) Detection of preselected data
US20130246338A1 (en) System and method for indexing a capture system
EP2702723B1 (en) System and method for data obfuscation in interception of communication with a cloud
CN114580008B (en) Document access control based on document component layout
CN115879157A (en) Data security search method and device, equipment, medium and product thereof

Legal Events

Date Code Title Description
AS Assignment
Owner name: VAULTIVE LTD., ISRAEL
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MATZKEL, BEN;TAL, MAAYAN;LAHAV, AVIAD;REEL/FRAME:026408/0328
Effective date: 20110315

FEPP Fee payment procedure
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant
Free format text: PATENTED CASE

FEPP Fee payment procedure
Free format text: SURCHARGE FOR LATE PAYMENT, SMALL ENTITY (ORIGINAL EVENT CODE: M2554); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

MAFP Maintenance fee payment
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY
Year of fee payment: 4

AS Assignment
Owner name: CYBERARK SOFTWARE LTD., ISRAEL
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VAULTIVE LTD.;REEL/FRAME:048223/0758
Effective date: 20190121

FEPP Fee payment procedure
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 8