Shortest path problem considering on-time arrival probability_...

By Rick Kelley,2014-05-27 14:59
10 views 0
Shortest path problem considering on-time arrival probability_...



     Timely Data Delivery in a Realistic Bus Network

     Utku Acer? , Paolo Giaccone? , David Hay? , Giovanni Neglia? , and Saed Tarapiah?

     Dipartimento ?

     Sophia Antipolis, France di Elettronica, Politecnico di Torino, Turin, Italy Dept. of Electrical Engineering, Columbia University, NY, USA


     Abstract?ªWiFi-enabled buses and stops may form the backbone of a metropolitan delay tolerant network, that exploits nearby communications, temporary storage at stops, and predictable bus mobility to deliver non-real time information. This paper studies the problem of how to route data from its source to its destination in order to maximize the delivery probability by a given deadline. We assume to know the bus schedule, but we take into account that randomness, due to road traf?c conditions or passengers boarding and alighting, affects bus mobility. In this sense, this paper is one of the ?rst to tackle quasi-deterministic mobility scenarios. We propose a simple stochastic model for bus arrivals at stops, supported by a study of real-life traces collected in a large urban network with 250 bus lines and about 7500 bus-stops. A succinct graph representation of this model allows us to devise an optimal (under our model) single-copy routing algorithm and then extend it to cases where several copies of the same data are permitted. Through an extensive simulation study, we compare the optimal routing algorithm with three other approaches: minimizing the expected traversal time over our graph, maximizing the delivery probability over an in?nite time-horizon, and a recently-proposed heuristic based on bus frequencies. We show that, in general, our optimal algorithm outperforms the three, but it essentially reduces to either minimizing the expected traversal time when transmissions are always successful, or maximizing the delivery probability over an in?nite time-horizon when transmissions fail frequently. For reliable transmissions and ??reasonable?? values of deadlines, the multi-copy extension requires only 10 copies to reach almost the performance of costly ?ooding approaches.

     Bus schedule Stop?line graph Routing algorithm Optimal routes

     Fig. 1.

     GPS bus trace Bus mobility model

     Synthetic mobility traces

     Performance evaluation

     High-level evaluation framework

     I. I NTRODUCTION We consider an opportunistic data network formed by (some) buses and bus stops equipped with wireless devices, e.g. based on WiFi technologies, like in DieselNet [9]. Most of the stops act as disconnected relay nodes (the throwboxes in [3]), and a few of them are also connected to the Internet. Data are delivered across town following the store-carryforward network paradigm [29], based on multi-hop communication in which two nodes may exchange data messages whenever they are within transmission range of each other. A bus-based network is a convenient solution as wireless backbone for delay tolerant applications in an urban scenario. In fact, a public transportation system provide access to a large set of users (e.g. the passengers themselves), and is already designed to guarantee a coverage of the urban area, taking into account human mobility patterns. Moreover, such a wireless backbone is not signi?cantly constrained by power and/or memory limitations: a throwbox can be easily placed on a bus and connected to its power supply, or be put in an appropriate place in bus stops, which are usually already

     connected to the power grid to provide lights and electronic displays, but also in any other places where power supply is available. Finally, travel times can be predicted from the transportation system time-table; even if the actual times are affected by varying road traf?c conditions and passengers?? boarding and alighting times, such a backbone still provides strong probabilistic guarantees on data delivery time that are not common in opportunistic networks. Indeed, this paper explores the basic question: ??how to route data over a bus-based network, from a given source to a given destination, so that the delivery probability by a given deadline is maximized???. We rely on the knowledge of bus schedule information and some stochastic characterization of bus mobility, supported by real data traces. We consider two classes of routing schemes over such a network. The ?rst class relies only on forwarding a single copy of the data is propagated along a single path. The second class takes advantage of multiple copies spread in the network to increase delivery probability and reduce delivery time, albeit with higher bandwidth usage. Another architectural choice is between exploiting only busbus contacts, only bus-stop contacts, or both types of contacts. While the latter case should provide better performance, the two kinds of transmission opportunities have very different characteristics, making it hard to model both of them together in a common framework. For example, a potential contact between two buses traveling along orthogonal trajectories can be completely avoided if there is even a slight delay of one of them. On the other hand, in case of a bus-stop communication, the contact always happens eventually, but may be delayed. Most prior art (see Sec. II) considered only bus-bus

    communications. Our approach arises from considering rather bus-stop communications (nevertheless, we brie?y discuss a bus-bus extension in Sec. VI). Fig. 1 depicts the high-level framework used in the paper to study routing in the proposed network. Our starting point is a simple mobility model for buses (described in Sec. III-A), that


     is supported by the statistical analysis of a set of real traces of the public transportation system of Turin in Italy, which serves an area of about 200 km2 through about 7500 stops and 1500 vehicles distributed among 250 lines. These traces include the complete schedule for a working day and the corresponding GPS traces with the positions of all the vehicles during the morning rush hour period (6 AM?C10 AM). This mobility model allows us to represent the transportation system appropriately in terms of a graph with independent random weights, that we call the stop-line graph (Sec. IV). Under this representation, our original optimization problem to identify routes maximizing the delivery probability by a given deadline (or maximizing the on-time delivery probability) becomes equivalent to a speci?c stochastic shortest path problem on the stop-line graph. We are able to ?nd an optimal algorithm, called O N -T IME, for the single-copy case (Sec. IV-A) and then to extend it for the multi-copy case through a greedy approach (Sec. IV-C). We compare the performance of these proposed algorithms with three other heuristics (Sec. IV-B) that also operate on the stop-line graph: an adaptation of the routing algorithm proposed in [28] for bus-bus communications (we refer to it as M IN -H EADWAY), and the two na??ve algorithms, M IN -D ELAY, that determines ? the path with the least expected weight, and M AX -P ROB, that maximizes the delivery probability on an in?nite time-horizon. Since the number of real-life traces we obtained is limited, the comparison (Sec. V) is based on simulations carried on a large set of synthetic traces generated on the basis of our bus mobility model and the schedule of Turin bus system. The paper has the following main contributions and conclusions. (1) Formulation of the original routing problem as a speci?c stochastic shortest problem on a particular stochastic graph, that is justi?ed by a statistical analysis of real transportation system traces. (2) Optimal (under our model) routing scheme for the single copy case. While this of?ine routing scheme has, in theory, an exponential worst-case time complexity, in practice it is able to ?nd the optimal route in reasonable time, allowing each node to store an optimal preselected routing plan. (3) Extensions to multi-copy case, based on greedy approaches applied to the single-copy scheme. We prove a tight bound of 1/k for the on-time delivery probability in comparison to an optimal (non-greedy) k-copy scheme. (4) Simulation analysis showing that the optimal algorithm outperforms the M IN -H EADWAY heuristic, but it

    performs as the M IN -D ELAY algorithm when the there is no packet loss, and as M AX -P ROB when packet losses are signi?cant across the network. We provide some explanation for these results. In this sense the conclusion is that na??ve heuristics ? like M IN -D ELAY or M AX -P ROB algorithms may be very good heuristics for routing over realistic bus transportation networks. (5) Simulations showing that only 10 copies are needed for a multi-copy greedy approach to reach performance close to ?ooding routing policies; the latter requires at least two order of magnitude more transmissions and copies for each single piece of data.

     II. R ELATED W ORK Employing a bus network as a mobile backbone for dense vehicular networks was ?rst proposed in [30], using standard routing protocols for mobile ad-hoc networks (e.g., DSR or AODV). More recently, buses employment in a disconnected scenario has been considered, e.g. in the seminal DieselNet project [9]. Since our paper addresses routing in such a network, in what follows we only mention work related to routing issues. Most of the research has focused on bus-bus communications [2], [8], [13], [14], [28] with the following routing approach: Each vehicle learns at run time about its meeting process; then, the vehicles exchange their local view with other vehicles and use the information collected to decide how to route data. The goals of the proposed algorithms were either to reduce the expected delivery time or to maximize the delivery probability. Unlike these studies, we mainly focus on bus to stop data transfers and derive a single-copy routing algorithm to maximize the delivery probability by a given deadline. We then extend the algorithm to address settings where several copies of the same data are permitted. On the other hand, we do not consider buffer or bandwidth constraints, as they are not a major concern in our settings: When the mobile devices are buses (as opposed, for example, to cellular phones), it is reasonable to assume that there is suf?cient storage available; in addition, since buses communicate with stops (as opposed to other moving buses), the amount of data transferable during a meeting is larger. The use of ?xed relay nodes was also considered in [3], [4]. In [4] an architecture is proposed where bus passengers may use the cellular network to require content that will be delivered to access points along the bus trajectory. This data can be replicated also on other buses, taking advantage of possible data transfers between vehicles. Their analysis considers only a simplistic single-street scenario and does not address routing issues. [3] reports that the performance of a vehicular network is improved by adding some infrastructure, like base stations connected to the Internet, a mesh wireless backbone, or ?xed relays (which are similar to our stops) . The most important results are (i) there are scenarios where a mesh or relay hybrid network is a better choice over a base

    station networks; (ii) deploying some infrastructure has a much more signi?cant effect on delivery delay than increasing the number of mobile nodes. These ?ndings, which were veri?ed both analytically and by experiments on DieselNet testbed, support our proposed architecture that relies on an opportunistic connectivity between vehicle nodes and ?xed relays. In order to provide low cost Internet connectivity to ?xed kiosks in rural areas of developing counties, KioskNet architecture has been proposed [15]. In this architecture, buses carry data between the kiosks and the gateways that are connected to the Internet. Routing of such data is achieved by simple ?ooding. On the other hand, gateways are delegated to a kiosk via a scheduling mechanism that considers the schedule of the


     buses which serve the kiosk. The routing algorithms proposed by [18]?C[21] are intrinsically more suited for bus to bus data transfers. [19] and [21] propose algorithms that take advantage of cyclic mobility patterns, according to which nodes meet periodically, albeit with some probability. Even if a given bus may meet multiple times the same stop, this approach does not ?t our scenario for three reasons. First, the bus-stop contact process is not necessarily periodic because vehicles may be assigned to different lines during one operation day. Second, even if a vehicle operates always on the same line, its frequency changes signi?cantly along the day. Third and more importantly, even when a period may be de?ned, its time duration ranges from 30 minutes to 2 hours (depending mainly on the length of the bus trajectory and on inactivity times at terminus), and it is then comparable with the deadlines we are targeting, so that it is not possible to take advantage of such long term periodicity. Other forms of long-term regularities in the contact process of the different nodes [20] are too general for our settings, since we have signi?cantly more information on the meetings that can be exploited to improve the performance. Finally, [18] proposes hierarchical routing for a deterministic network, whereas we consider non-deterministic mobility. Almost all the papers above have considered only small bus networks (40 buses for DieselNet, 16 buses on a cyclic path for MobTorrent [4]). Only [13] considers an urban setting with a public transportation system comparable to ours (70 different bus lines), but, differently from us, they do not use any real mobility trace and simulate bus movement assuming that the bus speed is chosen uniformly at random from a given interval. From the theoretical point of view, our optimization goal can be reformulated (under some assumptions) as a particular stochastic shortest path problem that deals with a graph G whose edge lengths (or equivalently, traversal times over the edges) are random variables. Several optimality criteria were considered in the past for routing in

    stochastic graphs. The most common one is the least expected traversal time, which can be generalized to any linear (or af?ne) utility function [27]. Other optimality criteria are deviance [5], monotonic quadratic utility functions [7] and prospect-theory?Cbased functions [16]. Recent and comprehensive surveys of the different utility functions and corresponding solutions appear in [6], [26]. Our paper deals with the reliability of the chosen path, namely, ?nding a path which maximizes the probability of on-time arrival (given some deadline). This problem was ?rst studied by Frank [12] and then was also investigated in [22]?C[24] and more recently in [10], [11], [25], [26]. Current state-of-the-art algorithms still have exponential worst-case time complexity, based on enumerating over some set of candidate paths [26]. Yet, our problem differs from Frank??s problem essentially in three aspects. First, we have considered a real transportation system and therefore we are not interested in the worst-case complexity of some general graphs. Second, our transportation model has two kinds of entities: stations and buses; we need to take into account waiting time at the stops and not only buses travel times, as explained in details in Sec. IV. Third,

     all the previous work considered a single-copy model, while our model deals also with multiple copies where the objective is that at least one of the copies arrives at the destination before the deadline. Finally, we observe that we use the bus network for data transfer as it is used for passenger transfer. Thus, one could expect that the same problem has already been addressed in the transportation literature (see [1] for more details). However, this is not the case: First, the possibility to exploit multi-copy is clearly absent in the transportation of people or merchandise. Second, the probability to miss a transfer opportunity is also not considered in transportation, while data transfer between two nodes may fail because of insuf?cient contact duration, channel noise or collisions. Third, even for single-copy routing, bus network passenger routes usually aim to minimize the expected traversal time (possibly limiting the maximum number of bus changes) and not to maximize the delivery probability by a given deadline, as we are doing. The fact that ?nally minimizing the expected traversal time may provide almost optimal performance in some scenarios is an a-priori unexpected ?nding of this research. In conclusion, to the best of our knowledge, this is the ?rst paper that proposes an optimal routing algorithm that takes advantage of bus schedule information as well as a stochastic characterization of bus mobility, supported by real data traces. III. M ODEL D EFINITIONS AND A SSUMPTIONS In this section, we formally de?ne the terms and notations we use to describe a transportation system, following the terminology used in transportation literature. A transportation system T has a set of stops, denoted by S, and a set

    of vehicles (buses), denoted by V, which travel between the stops according to a predetermined path and a predetermined schedule. For each vehicle v ?Ê V, the schedule allows us to determine its trajectory, denoted traj(v), which is the ordered sequence of stops the vehicle traverses: traj(v) = (s0 , s1 , . . . sn ). In addition, each vehicle v is associated with a trip, denoted trip(v), which is a time-stamped trajectory: trip(v) = ((s0 , ?Ó0 ), (s1 , ?Ó1 ), . . . (sn , ?Ón )), such that a vehicle v should arrive at stop si along its trajectory at time1 ?Ói = ?Ó (v, si ). We distinguish between the scheduled time ?Ói and the actual time ti = t(v, si ), which is a random variable depending on road traf?c ?uctuations, passengers boarding and alighting, etc. The difference between the actual arrival time t(v, si ) at a stop si and its corresponding scheduled arrival time ?Ó (v, si ) is the lateness l(v, si ) of the vehicle at stop si : l(v, si ) = t(v, si ) ? ?Ó (v, si ). Note that the lateness is negative when the vehicle arrives earlier that its scheduled arrival. The delay d(v, si , sj ) between the stops si and sj is the change in the lateness: d(v, si , sj ) = l(v, sj ) ? l(v, si ). The time difference between the arrivals of a vehicle at two different stops si and sj , is called the actual travel time between the two

     1 We do not introduce explicitly a departure time from the stop, because in our paper we do not take into account bandwidth constraints so that it is not important to specify the duration of the transmission opportunity between a bus and a stop. Moreover from our traces it is possible to determine the arrival time, but not the departure time.


     stops, tt(v, si , sj ) = t(v, sj ) ? t(v, si ). The scheduled travel time is simply the difference between the scheduled arrivals at the two stops. A key concept in bus networks is the notion of lines, which are basically different vehicles with the same trajectory (at different times). Let L denotes the set of lines. For each vehicle v ?Ê V we denote its corresponding line by line(v) = {v ?ä ?Ê V|traj(v) = traj(v ?ä )}. Note that lines introduce an important characteristic of a bus transportation system: if a passenger misses a speci?c vehicle v, she can still catch another vehicle v ?ä in line(v) and reach the same set of stops. The time between two consecutive arrivals of vehicles belonging to the same line at the same stop is called headway. In the sequel, we will refer to the transportation system T as the quintuple S, V, L, ?Ó (), t() , where the function ?Ó () is a way to represent the schedule and t() denotes a characterization of the stochastic process of vehicle arrivals at the stops. In the next section, we are going to start characterizing this stochastic process. A. Bus Mobility and Communication Models The problem of maximizing the delivery probability by a given deadline requires a realistic statistical

    characterization of bus mobility patterns, which is also useful to generate a large set of synthetic traces and evaluate the performance of our routing algorithms. Transportation literature does not provide a universally valid model for bus movements in an urban environment, since they are strongly affected by vehicular and passenger traf?c conditions, road organization (availability of separate lanes for buses), traf?c signal control management (priority may be given to the approaching buses over the other traf?c), company policies (penalties to the bus drivers for delays), and so on; details of our transportation literature survey are in [1]. Two extreme cases can be considered: 1) buses that are late at one stop can always recover their delay at the following stop (speeding up and reducing their travel times), 2) buses move almost in the same way, and they do not try to recover their delay. The ?rst case better describes lines with high headway, while the second is probably more adapt for lines with short headways, where buses try to respect a given frequency, rather than an exact schedule2 . In terms of the quantities we have de?ned above, in the ?rst case, latenesses at consecutive stops are almost independent, while in the second case they are highly correlated. We have performed a statistical analysis of a one day trace with actual bus arrivals at their stops provided to us by Turin??s public transportation company. Due to lack of space, the full details of this analysis appears only in the accompanied technical report [1]. Here, we only present the following two consequences of this analysis, and refer to them as Assumptions 1 and 2. These hypotheses are going to be kept for granted in the rest of

     2 This distinction is expressly advertised by Turin public transportation system, that label lines as frequency-based and schedule-based.

     the paper and will be fundamental to develop our routing algorithm. Assumption 1: Bus travel times at consecutive stops are independent (but not necessarily identically distributed; in particular, their distribution will depend on the corresponding scheduled value). Assumption 2: The distribution of the waiting time at a stop only depends on the stop and the characteristic of the departing bus line, not on the line of the arriving bus. We note that Assumption 2, which plays an important role in enabling a graph representation with additive edge weights, is partially a consequence of Assumption 1: Consider buses moving according to the schedule, and transferring from line ?1 to line ?2 at stop s. It is clear that the waiting time at the stop can be evaluated a-priori on the basis of the scheduled arrival time of the ?1 vehicle and the departure time of the following ?2 vehicle. But under Assumption 1, arrival times of ?1 buses at stop s are random variables and so are the corresponding waiting times. Intuitively, if the variability of ?1 arrival times is large in comparison to the

    headway3 of line ?2 , the waiting time will have almost the same distribution of the waiting time seen by a Poisson observer, thus it is independent of ?1 ??s schedule. This is also supported by our trace analysis [1]. Finally, in our scenario we assume that data transfer during a transmission opportunity can fail. This can be due to different causes: channel noise and collisions, but also nodes failing to discover the communication opportunity, or contact duration being insuf?cient to transfer the data. Our main assumption is the following: Assumption 3: The success probabilities for the message transmissions are independent. IV. ROUTING A LGORITHMS IN A B US N ETWORK As mentioned before, our routing algorithms aim to de?ne an off-line routing for the transportation system that maximizes data delivery probability by a given deadline: De?nition 1: Given a transportation system T = S, V, L, ?Ó (), t() , a source stop ss , a destination stop sd , a start time tstart , and a deadline tstop , the on-time delivery problem is to ?nd a route between ss and sd that starts after time tstart and maximizes the on-time delivery probability, i.e. Pr{data is delivered before time tstop }. We ?rst discuss how we represent the transportation system as a graph, considering the natural operation of a bus system with transfers from buses to stops and then to buses (i.e., involving only bus-stop communications). The following four issues lead to our ?nal representation: computational complexity, intrinsic properties of the bus transportation system (namely, the existence of lines), characteristic of the stochastic process t() (namely, waiting times in the stops depends on the departing line), and an advantage coming from working with additive edge weights. Speci?cally, in our representation, which we call stop-line graph Gsl = Vsl , Esl , the nodes are (s, ?) pairs where s is

     3 According to our model the variance of the lateness increases along the trajectory.


     1 CDF(traversal time)

     P1 P2 P3 P1+P2+P3


     0 30 35 40 45 Deadline [minutes] 50

     Fig. 2. (a) Example of bus network with S = {A, B, C, D, E, F } and L = {1, 2, 3, 4}: the node corresponds to a stop and the label on the edge represents the line connecting the two stops. (b) The corresponding line-stop graph Gsl . Dotted edges are travel edges, while dashed edges are travel-switch edges.

     a stop and ? is a line; (s, ?) ?Ê Vsl if and only if line ? ?Ê L arrives (or depart) at stop s ?Ê S. In addition, we add two nodes ss and sd which are connected to all nodes that correspond to the source and destination stops. The edges of Gsl are de?ned as follows: An edge

    between (s, ?) and (s?ä , ??ä ) corresponds to routes between stops s and s?ä with line ? that continue from stop s?ä on line ??ä . If ? = ??ä we call the edge a travel edge, while if ? = ??ä we call it a travel-switch edge. An example of Gsl appears in Fig. 2. We now de?ne the random variables associated to the edges in Esl . The random variable of a travel edge describes the corresponding travel time between two stops: formally, a travel edge e = ((s, ?), (s?ä , ?)) is associated with the random variable we = tt(?, s, s?ä ) describing the travel time of a line ? bus from stop s to stop s?ä . The random variable of a travel-switch edge includes the travel time between the corresponding stops and the waiting time for the next line, taking into account possible transmission failures. Formally, a travel-switch edge e = ((s, ?), (s?ä , ??ä )) is associated with the following random variable we . +?Þ tt(?, s, s?ä ) + wt(??ä , s?ä , k) with prob. pf with prob. (1 ? pf )2 pk?1 f

     Fig. 3. Delivery probability CDFs of three disjoint paths P1 , P2 and P3 , connecting a source and a destination with different traversal times and without transmission failures (pf = 0). Path P1 has the lowest expected traversal time; the variance of P2 is the smallest, while P3 ??s variance is the largest. P1 , P2 and P3 are respectively the optimal paths computed by O N T IME for deadlines between 34 and 43 minutes, larger than 43 minutes, and shorter than 34 minutes. The curve labeled P1 + P2 + P3 corresponds to the success probability obtained by a multi-copy approach exploiting all the three paths.

     as: tr(P) = e?ÊP we . Now, given the graph Gsl , the ontime delivery problem corresponds to ?nding a path P such that Pr{tr(P) ?Ü tstop ?tstart } is maximized. Note that, under this construction, our problem is similar to the problem de?ned by Frank [12], with the differences highlighted at the end of Sec. II. A. Single-Copy Routing Algorithm and Implementation We now turn to de?ne our routing algorithm, called O N T IME, which aims at solving the on-time delivery problem. O N -T IME ?nds, in general, different paths for different values of (relative) deadline tstop ? tstart . For example, Fig. 3 compares the Cumulative Distribution Functions (CDF) for the delivery times of 3 different paths, for a given sourcedestination pair and no transmission failures (pf = 0). In this case, O N -T IME chooses one of the three paths depending on the given deadline. Nevertheless, the larger the deadline, the larger the resulting on-time delivery probability is. O N -T IME works by ?rst determining a potentially good path between the source to the destination (for example, that with the minimum expected traversal time), and evaluating its ontime delivery probability. This can be done by performing a (numerical) convolution of the different random variables distributions along the path, yielding the end-to-end traversal time distribution. By this

Report this document

For any questions or suggestions please email