We want to create a privacy-first data clean room solution in which data movement from sources is possible and trustworthy enough for customers to use.
Source: Let $\tau$ be the set of all possible tables. A Source $S_c$ with an owner $c \isin C$ is a table containing a set of records such that $S_c \isin \tau$, where $C$ is the set of all Collaborators.
Sources can be seen as the raw data tables that a Collaborator provides during a Collaboration for data analysis or any other use case.
Transformation: A Transformation $t_c$ is a function with an owner $c \isin C$ and can be defined as:
where $\phi$ is a null transformation, $C$ is the set of all Collaborators, $S$ is the set of all Sources, $T$ is the set of all Transformations, and $\tau$ is the set of all tables.
Destination: A Destination with an owner $c \isin C$ can be defined as $d_c \subseteq D \times T$, where $C$ is the set of all Collaborators, $T$ is the set of all Transformations, and $D \subseteq \tau$, where $\tau$ is the set of all tables.
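The three definitions above can be sketched as plain data types. This is a minimal illustration, not the actual data model: the class names, the field names, the row-dictionary representation of a table, and the call signature of `fn` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Assumption: a table is a list of records, each record a column -> value dict.
Row = Dict[str, object]
Table = List[Row]


@dataclass
class Source:
    owner: str    # the collaborator c in C who owns the source
    name: str
    table: Table  # S_c, an element of tau


@dataclass
class Transformation:
    owner: str
    name: str
    # Assumption: a transformation maps a list of input tables to one table.
    fn: Callable[[List[Table]], Table]


@dataclass
class Destination:
    owner: str
    # d_c as a set of (destination table name, transformation name) pairs,
    # i.e. a subset of D x T with both sides referenced by name.
    pairs: FrozenSet[Tuple[str, str]]


def run(t: Transformation, inputs: List[Source]) -> Table:
    """Apply a transformation to the tables of the given sources."""
    return t.fn([s.table for s in inputs])


# Hypothetical example data.
visits = Source("alice", "visits", [{"user": "u1", "n": 3}, {"user": "u2", "n": 5}])
total = Transformation(
    "bob", "total_visits",
    lambda tables: [{"total": sum(row["n"] for row in tables[0])}],
)
dest = Destination("bob", frozenset({("bob_results", "total_visits")}))
result = run(total, [visits])
```

Here `dest` records that the output of `total_visits` is what lands in the hypothetical `bob_results` table, which mirrors the pairing of a destination table with a transformation in $d_c \subseteq D \times T$.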
Data Access Grants: Access Controls are the set of permissions, which can be:
transformations_allowed: a permission that a Source gives to a transformation owned by another collaborator, allowing that transformation to use the Source.
destinations_allowed: a tuple of (destination, transformation) that a Source gives to a destination owner. The transformation here means the output of the mentioned transformation.
destinations_allowed: a permission that allows a destination owner to use its output as a destination request.
A Source also gives permissions at column-level granularity:
These are the fields (non-exhaustive) that need to be specified for DP SQL Queries:
Trust Group: If the transformation is a Private SQL Query, then the Collaborators who give data access grants to the same type of transformation form a trust group. Only one member of the trust group is allowed to define the transformation_allowed/noise parameters and to define destinations for that transformation.
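A trust group can be derived by grouping grants by transformation type. The sketch below assumes that grants are available as (collaborator, transformation type) pairs and that the single member allowed to define parameters is recorded in a `designated` mapping. Both the pair format and the mapping are hypothetical, since how that member is chosen is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def trust_groups(grants: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group collaborators by the transformation type they granted access to."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for collaborator, transformation_type in grants:
        groups[transformation_type].add(collaborator)
    return dict(groups)


def may_define_parameters(
    groups: Dict[str, Set[str]],
    designated: Dict[str, str],
    transformation_type: str,
    collaborator: str,
) -> bool:
    """True only for the one designated member of the matching trust group."""
    members = groups.get(transformation_type, set())
    return collaborator in members and designated.get(transformation_type) == collaborator


# Hypothetical grants: alice and bob both grant access to a private SQL query.
groups = trust_groups([
    ("alice", "private_sql"),
    ("bob", "private_sql"),
    ("carol", "join"),
])
designated = {"private_sql": "alice"}  # assumption: chosen out of band
```

With this data, `may_define_parameters(groups, designated, "private_sql", "alice")` holds, while the same call for `"bob"` does not, even though both belong to the same trust group.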
Collaboration: A Collaboration happens when all collaborators send their packages to the clean room service in order to derive their analysis/insights through it. To start a collaboration, a folder is provided as input to the DCR App; this folder contains the collaboration packages of all the collaborators involved in the Collaboration.
Collaboration Package: A collaboration package consists of
that contains enough context metadata to create a collaboration event successfully. Each collaborator will have its own collaboration package.
Collaboration Graph: A Collaboration Graph is a Directed Acyclic Graph that defines the relationship between Source Tables, Transformations, and Destinations. The Graph has directed outgoing edges only from transformations to sources and from destinations to transformations.
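The edge rule and the acyclicity requirement can be checked with Python's standard `graphlib` module (Python 3.9+). This is a sketch under assumptions: the node names, the `kinds` mapping, and the `execution_order` helper are illustrative and not part of the design. Because edges point from a node to what it depends on, the resulting order lists sources first and destinations last.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set, Tuple

# Only these (tail kind, head kind) edges are permitted by the definition.
ALLOWED_EDGES = {("transformation", "source"), ("destination", "transformation")}


def execution_order(kinds: Dict[str, str], edges: List[Tuple[str, str]]) -> List[str]:
    """Validate edge directions and return nodes with dependencies first."""
    deps: Dict[str, Set[str]] = {node: set() for node in kinds}
    for tail, head in edges:
        if (kinds[tail], kinds[head]) not in ALLOWED_EDGES:
            raise ValueError(f"edge {tail} -> {head} is not allowed")
        deps[tail].add(head)  # tail depends on head
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # Unreachable with the two edge kinds above; kept as a safeguard
        # in case the set of allowed edges is ever extended.
        raise ValueError("collaboration graph contains a cycle") from exc


# Hypothetical collaboration: two sources feed one transformation.
kinds = {
    "visits": "source",
    "purchases": "source",
    "join_counts": "transformation",
    "bob_results": "destination",
}
edges = [
    ("join_counts", "visits"),
    ("join_counts", "purchases"),
    ("bob_results", "join_counts"),
]
order = execution_order(kinds, edges)
```

An edge in the wrong direction, such as `("visits", "join_counts")`, is rejected with a `ValueError` before any ordering is attempted.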
Clean Room Service: A service that operates on data from the Source Tables provided by the collaborators in the collaboration package. It extracts session data from where the source tables are located, applies the transformation to the data as defined in the package, and sends the results back. The kind of operation/query is defined in the Collaboration Package.
Collaboration Event: A collaboration event is created for every transformation that a collaborator asks to run. It carries the batch of data needed to run the transformation, generated from the parametric transformation used, and returns the computation results to the collaborators who have permission to see them.
Orchestrator: The Orchestrator creates a new clean room service session on successful validation of the collaboration package. It also generates/modifies the query that needs to be run inside the trusted environment, and it performs security checks to prevent any malicious query from executing inside that environment.