The Journal of Systems and Software 92 (2014) 20–31
Contents lists available at ScienceDirect
The Journal of Systems and Software
journal homepage: www.elsevier.com/locate/jss
Mosco: a privacy-aware middleware for mobile social computing夽
Dinh Tien Tuan Anh ∗ , Milind Ganjoo, Stefano Braghin, Anwitaman Datta
School of Computer Engineering, Nanyang Technological University, Singapore
a r t i c l e
i n f o
Article history:
Received 15 October 2012
Received in revised form
16 November 2013
Accepted 22 November 2013
Available online 8 December 2013
Keywords:
Fine-grained access control
Social computing
XACML
Google App Engine
a b s t r a c t
The proliferation of mobile devices coupled with Internet access is generating a tremendous amount of
highly personal and sensitive data. Applications such as location-based services and quantified self harness
such data to bring meaningful context to users’ behavior. As social applications are becoming prevalent,
there is a trend for users to share their mobile data. The nature of online social networking poses new
challenges for controlling access to private data, as compared to traditional enterprise systems. First,
the user may have a large number of friends, each associated with a unique access policy. Second, the
access control policies must be dynamic and fine-grained, i.e. they are content-based, as opposed to allor-nothing. In this paper, we investigate the challenges in sharing of mobile data in social applications.
We design and evaluate a middleware running on Google App Engine, named Mosco, that manages and
facilitates sharing of mobile data in a privacy-preserving manner. We use Mosco to develop a location
sharing and a health monitoring application. Mosco helps shorten the development process. Finally, we
perform benchmarking experiments with Mosco, the results of which indicate small overhead and high
scalability.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
The time of ubiquitous computing seems to have finally arrived.
As computing devices are getting increasingly smaller, cheaper,
more connected and more powerful, they gradually become indispensable to everyday life. In particular, smart phones equipped
with numerous sensory capabilities, always-on network connectivity and powerful CPU have enjoyed a remarkable growth during
the past few years. They can record user activities with their own
sensors (GPS, accelerometer, etc.) or act as a portal to receive data
from other devices (speedometer, heart-rate monitor, etc.) through
short-range wireless communication.
Social computing has successfully latched on this trend and
enjoyed a rapid growth. In the social computing paradigm, user
behavior in social context is collected and analyzed by computing
systems to derive new values to individuals, as well as new societal insights that benefit the community. While Facebook, Google+
, Twitter, LastFm bring individuals together using the off-line social
connections, applications such as FourSquare, PatientsLikeMe,
夽 This work was supported by A*Star TSRP pCloud project (grant no. 102 158 0038).
Dinh Tien Tuan Anh, Milind Ganjoo and Stefano Braghin contributed to this work
when they were at NTU Singapore.
∗ Corresponding author. Tel.: +65 96745847.
E-mail addresses:
[email protected] (D.T. Tuan Anh),
[email protected]
(M. Ganjoo),
[email protected] (S. Braghin),
[email protected]
(A. Datta).
0164-1212/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
https://dx.doi.org/10.1016/j.jss.2013.11.1110
Nike+ , exploit data from ubiquitous devices to add meaningful context to facilitate social interactions. Other applications like
PIER (Mun et al., 2009), CarTel (Hull et al., 2006) combine sensor
data from users to generate real-time pollution and traffic reports
which benefit the society as a whole.
An important premise to social computing is data sharing, either
amongst friends (social networks) or to third parties (for societal
services). However, sharing in social applications is challenging,
because the nature of data and of the applications demand a rigorous treatment of user privacy. In particular, controlling data access
in these settings is more troublesome than in traditional enterprise
systems. First, a user may have many friends and connections, each
associated with a unique access policy. Second, the policies are
highly dynamic and fine-grained, that is they are content-based
as opposed to the static, all-or-nothing policies. The vast amount
of data, combined with a large number of users and complex social
connections, add to the difficulty in designing privacy-aware social
applications.
For applications that depend on data generated from mobile
devices, user privacy must be addressed with foremost priority,
because the data is of highly personal and sensitive nature. Two
popular social applications that illustrate the needs of more finegrained access control are location sharing (Foursquare, Find my
friends) and quantified self (Quantified, Nike). In the former, a person may want to hide information from another based on their
proximity or the time of day. One may also want to blur the location
by concealing parts of the address, or to report only the statistics (number of check-ins at a particular place). In the latter, an
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
outpatient user may want to share his location and vital-sign readings to the doctors only when his heart rate exceeds a normal
threshold. To his friends, insurance companies or research institutes, only the average readings per hour are revealed.
While the abundance and availability of data induce more social
applications to appear, in many cases, different applications are created using different platforms and technologies. Even though cloud
computing (Armbrust et al., 2009; Amazon; Google) can deliver the
underlying computing infrastructure on-demand, these applications will need to be designed and written from scratch. We believe
that there is an immediate need for a middleware designed specifically for social computing applications. Besides being scalable in
handling large numbers of users and large amount of data, such
a middleware will shorten the development and deployment process, at the same time provide easy mechanisms for addressing
user privacy concerns. More specifically, it will come with easy-touse, extendible interfaces for specifying and enforcing fine-grained
access control with respect to other users of the system. Note that
an end-user may also want privacy from the underlying service
providers. The scenario of untrusted underlying service provider
is interesting and more challenging (see for instance Tuan Anh
and Datta, 2012 for a more comprehensive discussion on the
system/privacy design space), but it is beyond the scope of the
presented work.
In this paper, we present Mosco, a middleware designed for
privacy-preserving mobile social applications. Specifically, the
middleware runs on top of Google App Engine, thus the storage
and management of data are done automatically by the cloud in a
scalable manner. Mosco provides privacy with respect to the endusers, but it assumes that the service providing cloud platform
is trusted. Though there are lot of ongoing research on securing services against untrusted cloud service providers (Tuan Anh
and Datta, 2012), most real-life deployed applications are based
on trusted services, and Mosco’s aim is to augment such existing services with richer functionalities. Mosco accordingly makes
it easier for developers to avail themselves of the middleware’s
primitives to easily develop applications which would allow endusers to specify dynamic and fine-grained access policies, that
are efficiently enforced. It achieves this by extending the XACML
framework. It provides template implementation for a core set
of fined-grained access policies, so that new policies can be easily integrated. With data access being handled within Mosco, the
application developers can turn their focus to the data semantics
and application logics. Mosco provides an interface for data definition which can be readily extended for new applications. As a
consequence of these, the development and deployment process
are considerably shortened, while the resulting applications guarantee user privacy. These properties of Mosco are showcased in
our implementation of a location sharing and a health monitoring
application.
In summary, our contributions are as follows:
• We identify common scenarios for social computing applications
that necessitate fine-grained access control.
• We present the design and implementation of Mosco (source
code can be found at https://code.google.com/p/mosco), a
middleware for developing privacy-preserving mobile social
applications. To the best of our knowledge, Mosco is the first of
its kind. The middleware runs on Google App Engine and utilizes
XACML for specification and enforcement of fine-grained access
control policies.
• To demonstrate Mosco’s capabilities and flexibility, we implement two representative mobile applications using the middleware: a location sharing and a mobile health application. Mosco
provides storage and access to data in a scalable manner and
21
shortens the development and deployment process. Additionally,
it allows users to share data in a flexible, secure manner.
• We benchmark Mosco using both real and synthetic data. The
results suggest that it can scale gracefully with more users and
more data, and that the overhead introduced by the access control
mechanism is small.
The remainder of the paper is organized as follows. The next section describes motivating examples of mobile social applications,
and presents the core set of fine-grained access control policies.
Next, we detail the mechanism for defining and enforcing such
policies, especially the implementation of policies in XACML framework. Section 4 presents the design of Mosco. Section 5 follows with
the implementation details of two mobile social applications, and
results from the benchmark experiments with Mosco. Section 6
highlights related areas of our work. Finally, Section 7 concludes
and discusses avenues for future work.
2. Access control for mobile social computing
2.1. Motivating examples
There exists a plethora of mobile social applications, each providing a different social service either to the individual users or to
the ensemble community. They rely on users to share data generated from mobile devices, which gives rise to concerns about data
access control. In this section, we present some example applications which help identify and highlight the needs for highly
dynamic, flexible and content-based finer-grained access policies.
2.1.1. Location sharing
Existing location-based social networks such as Foursquare,
FindMyFriend, or check-in service (Facebook) employ all-ornothing sharing policy of user location. Given that one’s location
is a sensitive piece of data, it is important to be able to determine
not only to whom the data is shown, but also how much of the data
is shown.
Consider that Alice is on a night out on a weekend, and she would
rather avoid sharing her location with acquaintances at this time,
except for friends who happen to be nearby so that they could be
able to find her and meet up. This involves matching locations of
Alice and her friends to determine if they are in the same neighborhood or within a certain distance from each other. Allowing Alice to
set such a similarity metric allows her to dynamically differentiate
users who can and cannot see her location. During a workday, Alice
is at work and is willing to share her locations so that her colleagues
can find her during office hours. However, outside of office hours,
she would rather her colleagues do not know her whereabouts. This
can be achieved by defining a time window, so that certain friends
can see her location only when their requests are made within this
window. Alternatively, Alice may specify a set of locations as workplace and enable her friends to see her only when she checks in to
one of those locations. When traveling on vacation, she may want
to reveal the complete street address to her close friends and family, while sharing only the region or the country she is visiting to
her other friends. In this case, Alice must be able to specify the granularity at which her data is revealed, so that some friends may see
her exact locations while others only an approximate one.
2.1.2. Quantified self, or mobile health monitoring
The Quantified Self movement advocates the use of technology
to record and analyze users’ daily behavior. Its applications range from fitness (Nike), sleep pattern (Take), mood
change (Moodscope) to medical conditions (PatientsLikeMe,
Curetogether). At the current state of the art, users share their data
to their friends in a coarse-grained, all-or-nothing manner. This
22
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
practice might be acceptable for non-physiological data, but will
fail to meet privacy requirement for clinical health data.
Lets say Bob has been diagnosed with a chronic heart problem and seeks to better manage his condition using wearable
devices (Zephyr) to monitor his heart rate and blood pressure.
These data along with his location is collected every 10 min and
can be shared as it is to his family, who is deeply interested in the
state of his condition. Since personal activities can be inferred from
the data, Bob may only want to alert his physicians of abnormal
signs, i.e. when his heart rate is unusually high or the pressure is
unusually low. Thus, Bob needs to be able to set a threshold value,
that ensures data is revealed only if its value exceeds the threshold.
Bob also takes part in a clinical trial and undergoes some experimental treatments. The research institute overseeing the trial will
be interested in the improvement or degradation of his condition,
for which Bob would only want to share some statistical information such as average and 95th percentile readings on a daily basis, or
even add some noise to obfuscate the precise values before sharing
the data. To achieve this, Bob can specify a sliding window of oneday size which returns the appropriate information from within
the specified window. Alternatively, he can set a granularity level
which determines how much noise to add to the data. Finally, Bob
has to interact with his insurance company to claim back the medical costs. The insurance company would like to be able to verify
if his visits to hospitals were necessary. For this purpose, Bob may
only want to reveal the maximum (or minimum) readings, which
justifies his visits. To achieve this, Bob needs to restrict the insurance company to see only the statistics (max value, in this case) of
his data.
2.1.3. Participatory sensing
Applications of this kind are based on sensor data collected from
voluntary participants. In essence, participatory sensing pushes the
tasks of data collection (and even possibly some basic processing) to
the edge users, while mainly focusing on data analysis. PIER (Mun
et al., 2009) collects air quality measurements to generate pollution warnings of the unhealthy areas, and also to offer insights for
urban planning. Traffic monitoring and management likewise benefits from cars sharing their speeds and locations (Hull et al., 2006;
Hoh et al., 2008). These systems assume that users are readily willing to share their data, which is over-optimistic. Hoh et al. (2008)
and Cornelius et al. (2008) offer privacy to the participants by protecting their anonymity. However, it is not sufficient to hide user
identity, as exposing sensor data unnecessarily can still reveal sensitive information. For example, a user’s identity can be exposed by
identifying the most frequently traveled route as the route from his
house to work.
Suppose Alice commutes to work by driving. Her car is equipped
with a multi-sensor device that can record her location, speed,
energy consumption, air quality as well as road surface condition. She may want to share her speed and road conditions at
pre-defined, usually congested junctions to the transportation
authority, so that the latter can re-route the traffic in real-time or
plan to expand the roads. However, Alice would rather not reveal
her energy consumption which can be used to infer her car model.
In this case, Alice must define a filtering policy restricting access to
specific part of her data only when she comes within pre-defined
regions. If Alice takes part in a research on a new energy-efficient
fuel, the interesting information to share is how much fuel her car
spends over certain distance. For this, Alice would only want to
share her average fuel consumption per unit of distance traveled.
In particular, Alice may combine a filtering policy which hides the
other data fields, with a summary policy that shares only the statistics over each day. Finally, to support environmental initiatives,
Alice would like to share her commute routes with her colleagues
in order to identify opportunities for car-pooling. But she would like
to share her route only to those whose routes significantly overlap
with hers. To this end, Alice may want to define a similarity policy
granting access only to requesters who provide inputs similar to
her data.
2.2. Access control
The examples above illustrate that coarse-grained, all-ornothing data sharing is insufficient for many mobile social
applications. They further demonstrate that a white-list approach to
access control which focuses on the question of what to share and
with whom is preferable to the black-list approach that is mainly
concerned with whom and what not to share. In the following, we
delineate variables that characterize the access scenarios described
above: access policy (or policy), access subject (or subject) and policy
combination. An access control mechanism is secure if the subject
only gets access to the data defined by its associated policy.
For simplicity, we assume user data has the following schema:
userid A0 A1 . . .
where Ai ∈ A is the ith attribute whose values belong to an ordered
(possibly multi-dimensional) domain.
2.2.1. Fine-grained access policy
The set of all possible fine-grained access policies in social
mobile applications may be very large. In Mosco, we consider the
following set of four primitive policies. More complex policies can
be achieved by composing these primitives. We do not claim that
this set is complete, nevertheless it covers a wide range of access
policies.
• Filtering policy is defined as the tuple (F, D) where
F = {(ci , Ai )} specifies boolean functions ci (Ai ) for Ai ∈ A.
D ⊆ A is a set of attributes which are returned if all functions in F are evaluated to true. In the location sharing
example, Alice may create a policy ({ci (timestamp)},
{location}), where ci (timestamp):=timestamp>=9:00 am
Similarly,
in
the
health
AND timestamp<=05:00 pm.
monitoring example, Bob’s policy can be ({ci (blood
pressure)},
{blood pressure}),
where
ci (blood
pressure):=blood pressure > =t for a threshold value t.
• Granularity policy is defined as G = {(gi , Ai , Ti )} where gi ∈ N
represents granularity level (0 being the highest), and Ti is the
function that transforms values in Ai into a different level of granularity. This policy applies Ti to the attribute Ai using granularity
gi before returning Ai . When gi = Ti = null, the original value of
Ai is returned. In the location sharing example, Alice may set Ti
to return the street address for granularity level 1 and country
name for level 5, then assign gi = 1 and gi = 5 to her family and
other friends respectively. Note that the granularity levels need
to and can be defined based on the attribute semantics.
• Similarity policy is defined as (S, D) where S = {(ci , Ei , Ai ,
di )} specifies boolean functions ci (Ei , Ai , di ) returning true
if the distance between an external input Ei and value of
Ai is less than di . The policy returns attributes D ⊆ A if
all functions in S are evaluated to true. For instance, if
Alice wants only friends within 5 km radius to see her
location, her policy may be created with D = {location}
and ci (location,input,5):= dist(location, input) < =
5km where input is the friend’s location and dist computes the
geometric distance between the two locations. Similarly, when
sharing data for car-pool, Alice can hide her route information
unless it is more than 80% overlapping with her colleague’s,
by setting D = {route} and ci (route,input,0.8):= overlap(route, input) > = 0.8. Appropriate similarity functions
can be defined according to the attribute semantics.
23
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
• Summary policy is defined as W = {(fi , bi , wi , pi , Ai )} where fi
is a statistics function. bi , wi , pi together defines a sliding window starting after bi , with size wi and advancing pi steps every
window. This policy returns only the summary of Ai obtained by
applying fi over the sliding windows. In the mobile health example, Bob may share his blood pressure from 01/01/2011 with
the research institute by setting fi = Average, bi = 01/01/2011
00 : 00 am, wi = 24, pi = 24, Ai = bloodpressure. For Alice to share
her fuel consumption from the 01/01/2011 for the research
project, she may set fi = Sum, bi = 01/01/2011 00 : 00 am, wi = 24,
pi = 24, Ai ∈ {fuel, distance}.
One can compose multiple policies to define more complex scenarios. For example, Alice wanting to share the hourly summary
of her vital data during the day can specify a filtering policy (for
data generated in between 9 am and 5 pm) followed by a summary
policy (with window size of 1 h).
2.2.2. Access subject
Each policy described above must be associated with an entity
to whom the access is granted. At one extreme, a policy is applied
to anyone who is a friend of the user. At another, the policy is applicable to one specific user. In between, the user can define groups or
circles of friends and bind each policy to a specific group, so that the
same policy is applicable to any member of the group. An incoming
request will be evaluated against all the policies applicable to the
requester.
In the previous examples, both Alice and Bob can define at least
one circle for family member, one for close friends, and another
for work colleagues. When on holiday, Alice defines new granularity policies for these groups with different values of gi (smallest
value for family group and highest for colleagues). Bob may define
a single-user group containing his physician, to which he assigns
a filtering policy to only alert the subject of abnormal data. In the
participatory sensing example, Alice may specify a group containing the research institute staff and assign it a sliding window policy
which only reveals per-day total fuel consumption and travel distance.
2.2.3. Policy combination
It is not uncommon for a requester to be subject to multiple
access policies. For instance, David is both a work colleague and a
close friend of Alice, therefore belonging to two different circles.
When multiple policies returning different sets of data to the same
requester, the owner must be able to specify how to combine these
results together, i.e. a policy combining algorithm. A default returnall algorithm could have a serious privacy implication. For instance,
David and Alice have a few arguments and the latter decides to
move the former to her weak-acquaintance circle, with the intention of restricting his access to her data. Unfortunately, David also
belongs to Alice’s running-friend and university-friend circles, from
which Alice forgets or is not willing to remove David. A return-all
policy combining algorithm will not serve Alice’s purpose. However, she may define a most-restricted algorithm so that only the
weak-acquaintance policy is applied to David’s requests.
Notice that policy combination is not the same as policy composition. The former deals with how to derive result from outputs
of multiple policies. The latter concerns with evaluating policies
consisting of sequence of sub-policies: output of one sub-policy
becomes input of another.
1. user request
4. retrieve data
6. decision + processed
data
PEP
5. raw data
Data
3. authorization
decision + obligations
2. XACML request
PDP
Policy
Fig. 1. Extending XACML framework.
Mosco builds on an approach similar to what we recently proposed in Tuan Anh et al. (2012) which extends XACML—a popular
XML-based standard for specifying and enforcing policies.
3.1. XACML framework
At the high level, XACML consists of two components: a Policy
Enforcement Point (PEP) and a Policy Decision Point (PDP). Requests
first come to PEP, where they are marshaled into well-formed format before being forwarded to the PDP. The PDP maintains a set of
policies against which incoming requests are evaluated. The result
is an authorization decision and a set of obligations being returned
to the PEP. Finally, PEP processes the obligations before sending
data to the requester. In Mosco, this step involves accessing and
transforming user data. Fig. 1 highlights the process of accessing
data in XACML. This framework is flexible because PEP also handles
application semantics, that is it does not only return Permit/Deny
access decision but also a transformed version of the data. For
more detailed description of XACML, we refer keen readers to the
framework specification (Oasis). Here, we briefly explain the key
elements for making and fulfilling requests.
1 Subject, Resource: a subject requests access to data of the resource.
In Mosco, they are users of the social applications.
2 Request: consists of a series of attributes providing information
about the subject, resource and external inputs. These attributes
can be later extracted at PEP and PDP during request evaluation.
3 Policy and Policy set. a policy contains a target, a set of rules and
a set of obligations (optionally). Every policy is indexed by its
target element which contains a matching condition. The policy
is applicable to a request if the request attributes satisfy the target
matching condition. A rule element contains a boolean function,
which returns an authorization decision: either Permit or Deny if
the function is evaluated to true. When there are multiple rules,
a rule-combining algorithm must be specified to determine the
final decision. A policy set contains multiple child policies. When
more than one child policies are applicable, a policy-combining
algorithm is needed to determine the combined result.
4 Obligation. contains an operation that should be performed by
the PEP when it enforces an authorization decision. The most frequent use of obligations includes notifications and logging of data
access. In Mosco, obligations are vital for enforcing fine-grained
policies, since they specify what functions to be computed over
raw data.
3. Specification and enforcement of fine-grained access
control
3.2. Fine-grained access control using obligations
We have described the core set of access policies necessary
for ensuring user privacy in social applications. To support these,
Obligations vs. Rules XACML allows for customized access control policies by letting users define rules and obligations. A rule is
24
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
request contains his current location. PDP first evaluates the rule,
whose result is true. Next, PDP forwards the obligation to the
PEP, which reads Alice’s current location and computes distance
to Bob’s. If the distance is less than 5KM, PEP sends Alice’s location
to Bob. Otherwise, Bob receives empty data.
Fig. 2. An example policy (Section 2.1.1). Alice sharing her location to nearby friends.
3.3. XACML policy combination and composition
executed within PDP and returns a boolean value. One can implement a rule transforming or checking if a condition holds over the
data, but such a rule cannot be used to retrieve data. On the other
hand, an obligation is performed at the PEP and could return any
object. Thus, for any given function, one can define a corresponding
obligation ensuring only the results of that function are returned.
In our settings, we utilize both rules and obligations: the former
are to filter out requesters who are not friends or not in a specific
friend group of the data owner, and the latter are for returning only
the permitted data.
An obligation in XACML comprises an ID and a set of attributes.
Table 1 summarizes the obligations designed for the policies in
Section 2 (for simplicity, we assume that filtering and similarity policies return all data, i.e. D = A). These obligations can be
combined to specify complex access scenarios. Every obligation
contains an integer attribute col-idx representing the column to
which the obligation is applied (col-idx is in fact the index of Ai
as explained in Section 2.2.1).
• Filtering obligation: consists of an integer attribute filteringcond specifying a comparison operator comp, and a real-valued
attribute filtering-value containing the filtering value val.
The obligation returns the data when Ai comp val = true.
• Granularity obligation: consists of an integer attribute granlevel specifying the value gi , and an integer attribute
trans-func-id specifying the transformation function Ti .
• Similarity obligation: consists of an integer attribute sim-funcid specifying a distance function dist, a real-valued attribute
sim-range specifying the similarity distance between request
inputs and values of Ai . A value v ∈ Ai is returned if dist(in,
v) ≤ range for user inputs in.
• Summary obligation: defines a sliding window over Ai . The string
attribute window-start represents the starting timestamp. Integer attributes stat-func-id, window-size, window-advance
define the statistic function to be applied over each window, the
window size and advancing step respectively.
Notice that when obligation returns no data, the user receives
an empty set of result instead of a Deny decision. We remark that
returning Deny or an empty result both leak some information
about the data, but addressing such leakage is not within the scope
of our work.
Example. Fig. 2 illustrates how an access scenario described in
Section 2.2.1 is mapped into XACML’s rules and obligations. In particular, Alice wants to share her location only to friends when they
are within 5KM from her. The corresponding policy contains a rule
is-a-friend which returns true if the requester is a friend of
Alice. The obligation is of type similarity. Suppose Bob, a friend of
Alice, requests her location through the XACML framework. Bob’s
Letting a user to define a policy combining algorithm is another
dimension of fine-grained access control. A more privacy-conscious
user may want the most restricted policy to be selected, whereas
an indifferent user may wish his friend to see as much data as possible. In XACML, a combining algorithm is identified by an ID and
is included in the Policy element. The common options are:
• Deny-override or Permit-override: returns the Deny-policy
immediately if there is one Deny policy, or the first Permitpolicy. This algorithms are standards in XACML.
• Most-restricted: returns the policy with the most restriction over
the data. The semantics of this algorithm depends on the application. In location sharing applications, granularity with the highest
granularity value may be considered as most restricted, whereas
in participatory sensing applications, summary policies with the
largest window size may be the most restricted.
• Union: returns data from all policies. This is the most relaxed
algorithm, especially since results from different types of policies
may reveal extra information.
XACML does not support policy composition, as it is not possible to specify an order in which obligations are executed by
the PEP. However, one can work around this by defining complex obligations which capture the composed policy. For example,
a FilteringAndSlidingWindowObligation can be added to
Table 1 such that when returned, the PEP will first evaluate the
filtering condition and then apply sliding window over the output.
4. Middleware design
The goal of Mosco is to provide a middleware that requires
minimal effort from the developers to create new mobile social
applications with support for privacy-preserving capabilities, particularly by facilitating fine-grained access control. The resulting
applications are scalable with respect to the number of users and
sizes of data, while at the same time provide users with fine-grained
control over their data. To achieve the former, we build Mosco
on top a cloud platform, namely Google App Engine, which handles increases in system workload automatically and gracefully.
We accomplish the latter by extending XACML, as discussed in the
previous section.
4.1. System overview
Fig. 3 illustrates main components of an application built
using Mosco. Information about a user—including personal details,
friends and access policies—are stored within a data store. The data
generated by each user is also managed by the cloud at its data store.
Suppose a user A queries for data from his friends, the request first
Table 1
Obligations supporting fine-grained access policies.
Description
Id
Attributes
Filtering
Granularity
Similarity
Sliding window
FilteringObligation
GranularityObligation
SimilarityObligation
SlidingWindowObligation
filtering-cond, filtering-val, col-idx
gran-level, trans-func-id, col-idx
sim-func-id, sim-range, col-idx
window-start, window-size, window-advance, stat-func-id, col-idx
25
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
FriendData
1
1
1
AbstractData
n
1
ProfileData
AbstractPrivacyData
1
1
1
n
1
GroupData
Fig. 5. UML diagram representing main entities in the data model. The shaded
entities are to be extended when implementing a new application.
Fig. 3. System’s overview.
arrives at the application servlet which forwards it to the PEP module. For each friend of A, say B, an XACML request is sent to the PDP,
which retrieves B’s policies from the datastore and evaluates them
against the request. If evaluated to Permit, a set of obligations is
sent back to the PEP. Finally, PEP retrieves raw data from the data
store, performs the functions as specified in the obligations and
returns the result to A (via the servlet).
4.1.1. Trust model
Security of the access control mechanism depends upon the
access policies being evaluated correctly. In our case, the evaluation
is done at the cloud which we assume to be honest. In particular,
the cloud is honest in three respects: first, it carries out the access
control enforcement correctly; second, it is allowed to access user
data in clear; third, the cloud is secure from external attacks. We
acknowledge that this is a strong assumption. Nevertheless, we
argue that it is not unreasonable to expect the cloud to behave
honestly either due to the need to protect its reputation or to fulfill its Service Level Agreements and legal obligations, and many
existing systems do operate under similar assumptions. While it is
possible to achieve some levels of fine-grained access control with
semi-honest clouds (Tuan Anh and Datta, 2012), such approaches
require expensive cryptographic operations. On the other hand, by
assuming trusted cloud, we can design more scalable and efficient
systems supporting very high level of fine-grainedness in access
control with respect to other users.
4.1.2. Why google App engine
Being a platform-as-a-service cloud platform, Google App
Engine (GAE) offers a scalable infrastructure for deploying social
applications. It has been used for many large-scale social applications: BuddyPoke, Crystal, etc. Compared to the alternatives such as
Windows Azure or Amazon’s EC2, there are several advantages in
using GAE when it comes to social applications. First, even though
New Application
Mosco
Client
GAE runtime environment is restricted (no writing to files, no
socket API, etc.), it handles scaling of the resource seamlessly making the application respond better to sudden increases in demand
(in the presence of flash crowd, for instance). Second, with the large
user base, one can enjoy the Google authentication service for free.
This means the application users can be assumed to have already
been authenticated, eliminating also any long drawn registration
process. Third, the sand-boxing environment could indeed help
secure the applications from common security or denial of service
threats (such as side channel attacks or common software vulnerability) which are dealt with by Google underlying infrastructure.
Finally, GAE comes with rich ecosystem for creating new applications: comprehensive support for multiple SDKs, easy integration
with other Google’s products, and ease of rolling out the finalized
product since applications can be deployed to a real cloud directly
from the development mode without any change.
4.2. Middleware design
As seen in Fig. 3, a new social application consists of a client
and a server component. The latter will be running on Google’s
cloud infrastructure and serving requests from mobile clients via
HTTP. Mosco provides a set of API hooks so that new applications
can be developed with minimal effort. Fig. 4 highlights the main
modules that are to be extended when writing a new application.
Suppose the application deals with data types and policies that are
not already supported by Mosco. First, the data and policy definition
are extended to accommodate the new types. Next, the XACML policy builder is extended so that XACML policies can be built from the
AbstractPrivacyData instances. Finally, XACML obligation handle
is extended to process obligations embedded in the new policies.
4.2.1. Data and policy model
Fig. 5 shows five entities that make up the generic data
model supported by Mosco. They correspond to five virtual tables
Server
ProfileData,
FriendData,
AbstractData,
GroupData
AbstractPrivacyData
AbstractPolicyBuilder,
MoscoPolicyCombAlg
ObligationHandle
Data definition
Policy definition
XACML
PolicyBuilder
XACML
ObligationHandle
Fig. 4. Mosco design. Both client and server implementations extend four modules in Mosco: data definition, policy definition, policy builder and obligation handle.
26
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
maintained in Google datastore. Each data instance is indexed by a
GAE-generated key.
ProfileData represents a user. It contains the user’s email
address and a AbstractPrivacyData key representing the default
access policy applicable to all friends (as opposed to the group policies embedded in GroupData which are applicable to only group
members). Using this key and the email address, one can retrieve
all information pertaining to the specific user including the data.
AbstractPrivacyData represents a generic access policy. A
new policy must extend this entity. It contains a ProfileData key
pointing to the owner, and a GroupData key if this policy is associated with a group. Mosco comes with support for the four policies
discussed in Section 2 (for real-valued data). We overcome GAE’s
lack of support for inheritance by storing a variable indicating the
type of policy being stored, so that retrieval of a AbstractPrivacyData instance can be done by specifying the class name of the
policy type.
AbstractData represents a generic data instance. In our settings, all data types extend this entity. It contains email address of
the data owner, the data content and a timestamp variable. One can
retrieve the owner’s data by constructing a SQL-like query over the
AbstractData table for the matching email address.
FriendData represents a friend relationship. It contains email
address of the owner and of the friend, as well as a timestamp indicating the latest timestamp of AbstractData instance accessed by
the friend. For example, a FriendData instance represented by the
tuple (u1 , u2 , t) means that u1 is friend of u2 , and u2 most recent
access to u1 ’s data is at timestamp t. The timestamp variable is necessary to avoid Mosco returning duplicate data. For example, (u1 ,
u2 , t) means that u1 has accessed data of u2 up to timestamp t. Next
request from u1 to u2 will only return authorized data timestamped
at t′ where t′ > t. Similar to AbstractData, one can retrieve all the
friends of a certain user by constructing a SQL-like query.
GroupData represents a friend group (or circle) of a specific user.
It consists of the email address of the group owner, the group name,
and a list containing the members’ email addresses. Additionally,
it has a AbstractPrivacyData key pointing to the access policy
associated with the group.
4.2.2. Data dependencies
In our design of the data model, the ProfileData instance
contains no direct reference to other entities except to an
AbstractPrivacyData instance. To retrieve groups, friends, and
data associated with a user, one needs to execute a SQL-like query
with the matching user email address.
In our first design (we refer to this as the old data model),
each ProfileData instance maintains a key pointing to a FriendData instance which has a list of ProfileData keys pointing to
other users. This approach seems to enable easy access to a user’s
entire friend list by retrieving one specific FriendData instance
using its key. However, we later changed to the current design of
FriendData that is similar to that of AbstractData because of
the following reason. Adding or removing a friend relationship in
the old model requires synchronized access to two ProfileData
and two FriendData instances. For the latter, there needs to be
two read and two write access. Since friend update is likely to be
a high-frequency operation, and as the system scales, the cost of
multiple read/write access and of locking will become expensive.
In contrast, in the current Mosco model, updating friend require
one write to the datastore, and updates for the same user can be
done in parallel. We show in Section 5 that this indeed results in
better update performance.
4.2.3. Enforcing new policies
Having defined the data and policy model, Mosco can now
store the new data types and policy information in the cloud.
When requests come in, they must be evaluated against the
stored policies. To support evaluation against new policies, Mosco
must be extended to construct well-formed XACML policies
(for PDP evaluation) from the stored AbstractPrivacyData
instance.
AbstractPolicyBuilder provides a template for constructing
XACML policies from AbstractPrivacyData data instances. To
build a concrete policy, one must implement the createRules()
and createObligations() method, which defines Rule and Obligation elements of the resulting XACML policy. For each type of
AbstractPrivacyData, Mosco comes with implementation of one
policy builder.
ObligationHandle is an interface that must be extended for
every type of obligation. An instance of ObligationHandle is executed at the PEP. In the processObligation() method, one can
query raw data in AbstractData table and process it according to
the function defined by the obligation ID and attributes. Mosco has
four implementations of ObligationHandle, each corresponds to
an obligation type defined in Section 3.
MoscoPolicyCombAlg provides a generic policy combining
algorithm. Current version of Mosco supports Union algorithm.
Other alternatives as listed in Section 3 can be added by extending
MoscoPolicyCombAlg.
4.2.4. Datastore access and caching
For each data entity in Mosco there is a corresponding backend service handling storage, update and accessing of the data. In
particular, ProfileService, FriendService, GroupService,
DataService and PolicyService are singletons containing
methods dealing with ProfileData, FriendData, GroupData,
AbstractData and AbstractPrivacyData respectively.
Accessing the datastore is an expensive and billable operation (Google). Mosco provides a cache layer for all of these services.
Caching is useful in social applications since the same policy may
apply to many users (for example, a default policy applies to all
users in the friend list, or a group policy to all members in the
group), hence data need be retrieved only once and used many
times. It is particularly the case for non-AbstractData instances,
since they seldom change. The cache is purged whenever the data is
updated. For instance, when a new data is added, the current cache
for AbstractData is cleared. We demonstrate caching effectiveness in the next section.
4.2.5. Push or pull
In many social applications, data can be pushed to the
clients (Facebook, Twitter). Server pushing is an useful abstraction besides client pulling, which gives an impression of real-time
updates. Underneath, however, pushing is implemented by client
pulling periodically and by the server hanging on the HTTP request.
The current implementation of Mosco supports data pulling only.
It leaves it to the application to determine how often the client
should query the server. This makes sense for non-realtime applications (such as location sharing) since it imposes no overhead on the
server. Participatory sensing and mobile health applications which
may demand instant access to the latest data (for realtime decision
making) could benefit from the push interface.
5. Implementation and evaluation
We have implemented Mosco in Java (source code can be
found at https://code.google.com/p/mosco) which supports all four
types of access policies discussed in Section 3. In the following,
we describe the implementation of two mobile social applications using Mosco, and the experimental evaluation of our
middleware.
27
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
Fig. 6. Android client user interface.
5.1. Implementation
The goal of Mosco is to serve as a middleware for easy development of privacy-aware, scalable mobile social applications. To
demonstrate this, we implement a location sharing and a mobile
health sharing application for Android. The former allows user
to control who gets access to his location, and to display nearby
friends (who give him access to their locations) in a map (Fig. 6(a)).
The latter enables user to specify fine-grained control over his
physiological (or vital sign): namely the heart rate and chest volume (respiration force) which are collected during his sleep. Such
information is useful for sleep study and can be shared (compared) between friends (Fig. 6(b)). It is worth emphasizing that the
applications we discuss are not a contribution per se, and more
sophisticated applications may be essential for making compelling
real-life use cases. The purpose of developing these ‘toy’ applications was to demonstrate and test the efficacy and scalability of
the Mosco middleware, and to showcase how it eases the development and deployment process for new privacy-aware social mobile
applications.
To implement a new application with Mosco, one first defines
the data type and registers it to the IDs class. Second, a new policy model is specified by extending AbstractPrivacyPreference
class. Third, AbstractPolicyBuilder class is extended to support the new policy. Finally, obligation processing for the policy
is defined by extending ObligationHandle interface. The ObligationHandle object extracts variables embedded in the XACML
obligation and passes them as arguments to a user-defined function. This function implements application-specific processing of
data, and it is declared at the IDs class. For the location sharing
application, we define LocationSimilar and MostFrequentLoc
as application-specific functions for similarity and granularity policies. The mobile health application requires VitalSignSimilar
class for processing similarity obligation (other obligations are
standard functions over real-value numbers).
Mosco comes with policy definition, builder and obligation handle of the four policies listed in Table 1. It also provides standard
Table 2
Parameters used for the benchmarking experiments.
Parameters
Values
Environment
Application
Policy
Dataset
# Concurrent clients (nClients)
Client request
Single server, Google App Engine (GAE)
Location sharing, mobile health sharing
Similarity, filtering
SNAP, SantaFe
1–1600
Insert, delete, data query
functions for processing obligations over real-value data. The
implementation of the location and mobile health sharing applications take 413 and 292 lines of code (not including blank lines)
respectively. This illustrates the ease of developing a new social
application using our middleware. Including these two applications, Mosco amounts to 7019 lines of code.
5.2. Evaluation
We carry out experiments to evaluate how well applications developed using Mosco perform, especially when running
on Google App Engine (GAE) and under increased workloads.
We consider the following performance metrics: scalability and
processing time for updating application data and requesting user
data (or data query). We also want to compare the performance
between different applications.
5.2.1. Methodology
Table 2 summarizes the parameters used for the experiments.
For the location sharing application, we use the SNAP dataset which
contains real location information of over 5000 users. We experiment with similarity policies which grant access to location data
only when the subject is within a certain radius. For the mobile
health sharing application, we generate synthetic data based on
the SantaFe dataset which contains real physiological data from
a sleep study. For this application, we experiment with filtering
28
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
Fig. 7. Data update time.
5.3. Results
Single-server vs. GAE environment. We started multiple
clients that perform concurrent data insert. Each client inserts
1000 data tuples for each user. The single-server setting reaches its
capacity at 50 concurrent clients, i.e. the application crashed after
50 clients. In the GAE environment, the application continues to run
and scales gracefully. This result illustrates the differences of existing cloud platforms. More specifically, Amazon EC2 (as opposed to
GAE) provides raw infrastructure, but its lack of automatic scaling hinders applications’ availability when there is a sudden rise in
demand. The following results are obtained by running the experiments in GAE environment.
Insert and Delete. Fig. 7 shows the insert and delete performance of application data, with each client sending a stream of
1000 data updates for each user. Overall, it takes 23 min to upload
1.6 millions data items with 1600 concurrent clients, and 25 min to
delete them. The insert time per data item scales gracefully from
1
0.9
0.8
0.7
0.6
CDF
policies which grant access to heart rate data whenever it exceeds
a certain threshold.
We simulate a set of clients, each of which sends a stream of
requests to the application. Specifically, a client sends a request
(for a data insert, delete or query operation), waits for the response
from the application and repeats again. Application workloads are
simulated by varying the number of concurrent clients (between 1
and 1600). We note that a client represents an extreme user sending requests at maximal rate to the cloud. Hence, 1600 concurrent
clients translate to 1600 concurrent requests at any given time.
In practice, each mobile user issues requests at much lower rates,
therefore the workload of 1600 concurrent requests may correspond to a much higher number of concurrent users.
We perform experiments in two settings: single-server and GAE
environment. In the former, the application is deployed on one
server running the development-version of the GAE server. In the
latter, the application is running on the real Google App Engine
cloud. The reason for experimenting with these two settings is to
evaluate the limitation of running the application on a generic,
non-cloud environment versus the benefit of automatic scaling
of the GAE cloud platform. For the single-server setting, we hire
one large (high-CPU and high-memory) Amazon EC2 instance to
run the development-version GAE server, and 8 other medium EC2
instances that run the simulated clients. For the GAE environment,
we purchase the F4 instance class offered by Google.
The results presented in the following are averaged over 3 runs.
Unless stated otherwise, the graphs show results of the mobile
health sharing application.
0.5
0.4
0.3
0.2
0.1
old friend model
Mosco
0
0
0.5
1
1.5
2
2.5
3
3.5
4
time (sec)
Fig. 8. Comparing the old vs. Mosco data model for FriendData, for 50 concurrent
clients adding friend relationship from the SNAP dataset.
59 ms with 1 client to 386 ms with 1600 clients. Delete time scales
from 56 ms to 460 ms. The maximum number of GAE instances
launched during the experiments are shown in the graphs (they
correspond to the right y-axis). As more concurrent clients are
added, GAE spins new instances to deal with request: over 150
instances are active for 1600 clients.1
We have discussed in Section 4 the alternative approach for
designing the data model, especially with respect to the FriendData entity. Fig. 8 compares the distribution of insert time for
FriendData between the old data model and the Mosco model.
The old model incurs significant overhead per insert, as it requires
almost 1.5 s at median as opposed to 0.4 s for the Mosco model.
This demonstrates the benefit of loosely-coupled data model, especially when using a cloud platform such as Google which employs
key-value based storage.
Query. The experiments for data query time are run in the GAE
settings. The following results are for the mobile health sharing
application, with 1000 users and 20,000 data items. Each client
sends data requests of the form (u1 , u2 ) representing request originated from user u1 for the data of user u2 .
1
Compared with an old set of experiments (carried out in September 2012), we
observe that GAE launches more instances to deal with a given workload, resulting
in shorter upload time.
29
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
Fig. 9. Mean query time for varying workload, and query time distribution for 50 concurrent clients.
6. Related work
Our work concerns a middleware that facilitates sharing of
mobile data in the context of social computing. There exists other
systems such as SenseWeb, PatientsLikeMe, Foursquare whose
main focuses are also on data sharing. However, their access
control models are either coarse-grained (all-or-nothing sharing) or have little support for social settings. When it comes to
1
0.9
0.8
0.7
0.6
CDF
Fig. 9(a) shows how query time at GAE servers scales with
increased workloads. In particular, it takes 0.26 s for 1 client and
only rises to 1.5 s for 1600 concurrent clients. It can be observed that
the maximum number of GAE instances goes up to 35 instances to
accommodate higher workloads. The figure contains measurement
from a real Android client application (running on a HTC NexusOne
mobile phone with 2.3.6 Android OS, over a residential wireless
network). The response time observed at the client increases and
varies more with higher workloads. However, even with 1600 concurrent (simulated) clients, the average response time for the phone
user is still under 3.5 s. We believe this latency (which can also be
attributed to the network latency) is reasonable.
Fig. 9(b) shows the query time distribution for the workload
consisting of 50 concurrent clients. There are three important
observations that can be taken from this graph. First, the caching
mechanism described in Section 4 is effective as it improves the
query time up to 33% at median and 50% at the 90th percentile.
Second, the overhead of XACML (including policy management and
evaluation against requests) is negligible. It is illustrated as the time
taken for direct queries (obligation functions is executed directly
on receipt of the direct queries without going through the XACML
process) being close to the query time observed in Mosco. The final
observation comes from running a mixed workload. The result so far
is presented for normal workloads in which the requests (u1 , u2 ) are
constructed such that u1 and u2 are friends. In the mixed workload,
we let each client generate normal workload with the probability
of 0.5 and a random workload (where u1 and u2 are random users
who may not be friends) with the same probability. The query time
for this workload indicates faster response time from GAE servers,
which is as expected because many of the requests are rejected
without accessing the datastore for the data.
Comparing two applications. Finally, Fig. 10 compares the
update and query time between the location sharing and mobile
health sharing applications. It can be seen that the metrics are
almost the same for both applications. This is because the location
and vital sign data are roughly of the same size, and that the time
taken for processing the similarity and filtering obligation are also
similar.
0.5
0.4
0.3
0.2
location data insert
vital sign data insert
location sharing (similarity)
vital sign sharing (filtering)
0.1
0
0
0.2 0.4 0.6 0.8
1
1.2 1.4 1.6 1.8
2
time (sec)
Fig. 10. Comparing location sharing and mobile health sharing, for 50 concurrent
clients.
Internet-of-things applications, there are more in data sharing than
merely giving access to the data. Other systems deals with other
aspect of sharing, such as content matching (Guha et al., 2012),
location proximity matching (Zhong et al., 2007; Narayanan et al.,
2011), crowd-sourced sensing (Cornelius et al., 2008), or behavior
classification (Lane et al., 2011). Frameworks such as CarTel (Hull
et al., 2006), Virtual Trip (Hoh et al., 2008) address query and
computation issues for specific applications (traffic control, in particular), while assuming data has already been shared.
Mosco guarantees user privacy in terms of fine-grained access
control with respect to the end-users. We assume that the cloud
where Mosco is running is trusted, that is it will neither violate
data privacy nor collude with rogue users to do so. As competition amongst cloud providers are high, the need to maintain high
reputation is a strong incentives for them to be trustworthy. Systems such as Airavat (Roy et al., 2010) or eXACML (Tuan Anh et al.,
2012) builds on this assumption to provide differential privacy
or fine-grained access control for archival data. When a general
Service License Agreement (SLA) does not suffice, one must rely on
cryptography to protect data from the cloud. CryptDB (Popa et al.,
2011), Plutus (Kallahalla et al., 2003), CloudProof (Popa et al., 2011)
ensure data confidentiality using encryption, which is the same as
access control at a coarse-grained level. Recently, advanced encryption schemes such as Attribute-Based Encryption (Goyal et al.,
2006; Bethencourt et al., 2007) enable more fine-grained access
control. But these schemes incur high computational overhead.
30
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
Furthermore, policies that require transforming the data (the granularity policies, for instance) cannot be directly mapped to ABE. As
noted in Tuan Anh and Datta (2012), the design space for outsourcing access control to the cloud can be characterized along three
dimensions: trustworthiness of the cloud, fine-grainedness of policies and the work ratio between users and the cloud. CryptDB,
Plutus and other systems employing ABE trade fine-grainedness
and work ratio for a more relaxed trust assumption. In this design
space, Mosco occupies a unique spot with the highest level of finegrainedness and work ratio.
Mosco access control model can be considered as hierarchical,
in the sense that it consists of two level: user and group. A popular,
multi-level role-based access control (RBAC) model has been popular in enterprise systems, where delegation and dynamic group
membership are important. For the time being, we believe the simple two-level model used in Mosco is sufficient for many social
applications.
Finally, Mosco leverages Google App Engine, a platform-as-aservice cloud platform. There exists other services at the same level
of abstraction (Amazon, Windows), or even lower-level abstraction (infrastructure-as-a-service, Amazon, Rackspace Hosting). One
could implement Mosco using any of these services and enjoying
different trade-off (Li et al., 2010). While we maintain that Google
App Engine is a good choice for developing and deploying social
applications, we envisage that porting Mosco to another environment would not be particularly challenging.
7. Conclusion and future work
In this paper, we have presented Mosco, a privacy-aware
middleware for scalable mobile social computing. Mobile social
applications requires fine-grained access control while also being
able to scale gracefully with more users and data. We have designed
Mosco to ease the development of new social applications while
meeting both of these requirements. We have identified a core list
of access policies that are common in many social applications. In
Mosco, these policies are enforced by using an extension of the
XACML framework. Mosco runs on Google App Engine to leverage
the cloud’s plentiful and scalable resources. We have demonstrated
that Mosco shortens the development process for new applications.
In addition, the resulting application scales gracefully to accommodate increased workloads. Our experiments also indicate that the
overhead incurred by the access control mechanism is small.
Our immediate plan is to enhance the existing location sharing
and mobile health sharing application with more features (mostly
at the client side) in order to attract real users. Once having real
users, we will be able to carry out user study and gain more insights
into the performance of the application and of the middleware. The
current version of Mosco supports only the pull abstraction for data
retrieval. As discussed in Section 4, a time-sensitive application can
benefit from a push abstraction. We plan to incorporate this into the
future version of Mosco, which entails instrumenting the server to
wait on long-lived HTTP requests and send new data to the client
when it arrives. This extension is likely to incur overhead at the
server side.
We plan to investigate how to enhance the current access control model to the full role-based access control (RBAC) model.
Adding more hierarchy levels and delegation capability to the
access subject will improve the flexibility of Mosco and make it
more attractive to enterprise applications. We also intend to extend
the current XACML framework with support for policy composition,
which will increase its expressiveness as well as its chance to be
adopted in the stream database community. In fact, the problem of
composing simple policies into complex ones can be viewed in the
same light as a well-known problem in stream database research:
constructing query graph from query operators. As a consequence,
we could borrow techniques from the vast number of works in this
community when implementing our XACML extension.
Another interesting extension for Mosco is to raise the level of
data access abstraction. Current applications of Mosco support simple abstractions involving none or very simple computation on the
data, but higher-level abstractions requiring more complex computation may be desirable. One example is a policy that grants
access only to results of certain data mining algorithms or statistical function. Incorporating these abstraction to Mosco seems
straightforward, as one can define a new obligation for the required
computation. The challenges lie on identifying the different levels
of abstractions and implementing them on a cloud platform in an
efficient way.
Last but not least, we will like to investigate the challenges when
removing the trust assumption regarding the cloud. Access control enforcement will no longer be possible with XACML, instead
a cryptographic approach must be considered. Recent work (Tuan
Anh and Datta) shows that it is possible to support a number of
fine-grained access policies. But to support all the policies listed in
Section 3 in an untrusted environment remains a challenge.
References
Amazon Elastic Computing Cloud. https://aws.amazon.com/ec2.
Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R.H., Konwinski, A., Lee, G.,
Patterson, D.A., Rabkin, A., Stoica, I., Zaharia, M., 2009. Above the Clouds: A
Berkeley View of Cloud Computing. Technical Report UCB/EECS-2009-28. EECS
Department, UCB.
Bethencourt, J., Sahai, A., Waters, B., 2007. Ciphertext-policy attribute-based encryption. In: IEEE Symposium on Security and Privacy.
BuddyPoke! Express Yourself. https://www.buddypoke.com.
Chillingo Crystal. https://www.chillingo.com/crystal.
Cornelius, C., Kapadia, A., Kotz, D., Peebles, D., Shin, M., Trandopoulos, N., 2008.
Anonysense: privacy-aware people-centric sensing. In: MobiSys, pp. 211–224.
Curetogether. https://curetogether.com.
Facebook. https://www.facebook.com.
Find My Friends. https://itunes.apple.com/us/app/find-my-friends/id466122094?
mt=8.
Foursquare. https://foursquare.com.
Google App Engine. https://developers.google.com/appengine.
Google+. https://plus.google.com.
Goyal, V., Pandey, O., Sahai, A., Waters, B., 2006. Attribute-based encryption for finegrained access control of encrypted data. In: CCS.
Guha, S., Jain, M., Padmanabhan, V.N., 2012. Koi: a location-privacy platform for
smartphone apps. In: NSDI.
Hoh, B., Gruteser, M., Herring, R., Ban, J., Work, D., Herrera, J.-C., Bayen, A.M.,
Annavaram, M., Jacobson, Q., 2008. Virtual trip lines for distributed privacypreserving traffic monitoring. In: MobiSys, pp. 15–28.
Hull, B., Bychkovsky, V., Zhang, Y., Chen, K., Gorackzko, M., Miu, A., Shih, E., Balakrishnan, H., Madden, S., 2006. CarTel: a distributed mobile sensor computing system.
In: 4th International Conference on Embedded Networked Sensor Systems.
Kallahalla, M., Riedel, E., Swaminathan, R., Wang, Q., Fu, K., 2003. Plutus: scalable
secure file sharing on untrusted storage. In: FAST 2003.
Lane, N.D., Xu, Y., Lu, H., Hu, S., Choudhury, T., Campbell, A.T., Zhao, F., 2011. Enabling
large-scale human activity inference on smartphones using community similarity networks. In: UbiComp, pp. 355–364.
Last fm. https://lastfm.com
Li, A., Yang, X., Kandula, S., Zhang, M., 2010. Cloudcmp: shopping for a cloud made
easy. In: HotCloud.
Moodscope, With a Little Help from Your Friends. https://www.moodscope.com.
Mun, M., Reddy, S., Shilton, K., Yau, N., Burke, J., Estrin, D., Hansen, M.,
Howard, E., West, R., Boda, P., 2009. PEIR, the personal environmental impact
report, as a platform for participatory sensing system research. In: MobiSys,
pp. 55–68.
Narayanan, A., Thiagarajan, N., Lakhani, M., Hamburg, M., Boneh, D., 2011. Location
privacy via private proximity testing. In: NDSS.
Nike+. https://nikeplus.nike.com.
Oasis Extensible Access Control Markup Language (xacml). https://www.
oasis-open.org/committees/xacml/.
Patientslikeme. https://www.patientslikeme.com/.
Popa, R.A., Zeldovich, N., Balakrishnan, H., 2011. Cryptdb: A Practical Encrypted
Relational dbms. Technical Report MIT-CSAIL-TR-2011-005. CSAIL, MIT.
Popa, R.A., Lorch, J.R., Molnar, D., Wang, H.J., Zhuang, L., 2011. Enabling security in
cloud storage SLAs with CloudProof. In: USENIX Annual Technical Conference
2011.
Quantified Self: Self Knowledge Through Numbers. https://quantifiedself.com, 2012.
Rackspace Hosting and Cloud. https://www.rackspace.com.
D.T. Tuan Anh et al. / The Journal of Systems and Software 92 (2014) 20–31
Roy, I., Setty, S.T.V., Kilzer, A., Shmatikov, V., Witchel, E., 2010. Airavat: security and
privacy for mapreduce. In: NSDI 2010.
SenseWeb. https://research.microsoft.com/en-us/projects/senseweb/.
Stanford Network Analysis Project. https://snap.stanford.edu/data/loc-brightkite.
html.
Take Control of Your Sleep. https://www.myzeo.com/sleep.
The Santa Fe Time Series Competition Data. https://www-psych.stanford.edu/
andreas/Time-Series/SantaFe.html.
Tuan Anh, D.T., Datta, A., 2012. The blind enforcer: on fine-grained access control
enforcement on untrusted clouds. Data Engineering Bulletin.
31
Tuan Anh, D.T., Datta, A. Stream on the Sky: Outsourcing Access Control Enforcement
for Stream Data to the Cloud. https://arxiv.org/abs/1210.0660.
Tuan Anh, D.T., Wenqiang, W., Datta, A., 2012. City on the sky: extending xacml for
flexible, secure data sharing on the cloud. Journal of Grid Computing, 151–172.
Twitter. https://twitter.com
Windows Azure: Microsoft’s Cloud Platform. https://www.windowsazure.com.
Zephyr Measure Life Anywhere. https://www.zephyr-technology.com.
Zhong, G., Goldberg, I., Hengartner, U., 2007. Louis, Lester and Pierre: three protocols
for location privacy. In: PET, pp. 62–76.