On the Optimal Path Length for Tor

本论文发布在 2016 年的隐私增强技术国际研讨会(Privacy Enhancing Technologies Symposium, PETS)
论文 PDF

主要对比了 Tor 在两跳、三跳在性能、安全性上的区别:

属性 两跳 三跳
不可用节点概率
电路不可用概率
重建电路概率
通信性能
恶意节点获取端点节点信息 容易 困难
匿名性

摘要

原文

Abstract. Choosing a path length for low latency anonymous networks that optimally balances security and performance is an open problem. Tor’s design decision to build paths with precisely three routers is thought to strike the correct balance. In this paper, we investigate this design decision by experimentally evaluating several of the key benefits and drawbacks of two-hop and three-hop paths. We find that (1) a three-hop design is slightly more vulnerable to endpoint compromise than a two-hop design in the presence of attackers who employ simple denial-of-service tactics; (2) two-hop paths trivially reveal entry guards to exit routers, but even with three-hop paths the exit can learn entry guards by deploying inexpensive middle-only routers; and (3) three-hop paths incur a performance penalty relative to two-hop paths. Looking forward, we identify and discuss a number of open issues related to path length.

本文研究了不同路径长度对性能和安全性的影响,主要有以下三点结果:

  1. 三跳比单跳更容易收到 DDoS 攻击
  2. 两跳时,出口节点可以很容易获得入口节点信息(但即使是三跳,仍然可以低成本获得入口节点信息)
  3. 相对于两跳,三跳有更大的性能损失

1 介绍

原文

1 Introduction

Design decisions made by low latency anonymizing networks frequently involve achieving a correct balance between security and performance. For example, Tor does not employ cover traffic or add intentional delays in order to ensure performance that is sufficient to support interactive applications such as web browsing. However, this decision has increased Tor’s vulnerability to end-to-end traffic correlation. Another key design decision is path length. Tor employs a decentralized architecture of precisely three routers to mitigate any single router’s ability to link a source and destination. However, three-hop paths have a performance cost. In this paper, we seek to better understand the security and performance trade-offs related to path length design decisions.

Tor’s design — like most low latency anonymizing networks — is vulnerable to end-to-end traffic correlation attacks. If the endpoints are compromised, an adversary can apply any one of many known traffic analysis attacks [1–8] to correlate the source and destination. Conventional wisdom indicates that three-hop paths achieve an appropriate balance between security and performance. However, two-hop paths may be attractive to users seeking improved performance, though it is unclear what security trade-offs two-hop paths may incur.

Through analysis, simulation, and experiments performed on the live Tor network, we critically evaluate the advantages and disadvantages of two-hop and three-hop paths from security and performance perspectives. In addition, we identify and discuss a variety of open issues related to the security and performance of different path length choices.

低延迟匿名网络往往是安全和性能的折衷选择。Tor 不会为了提抗关联而刻意增加延迟。同时, Tor 采用三跳的电路来减轻单个中继联通信源和信宿的能力。
由于不会增加额外的延迟,大多数的低延迟匿名网络都很容易收到端到端流量关联攻击(本文引用文献 1~8 都是流量关联攻击)。大多数情况下,三跳实现了安全和性能的平衡,但对于对性能有额外要求的用户,使用两跳则存在极大的诱惑性。
在目前的 Tor 网络上进行仿真实验,针对两跳和三跳进行了分析,并讨论了各种相关问题。

路径长度和安全性

原文

Path length and security. We consider an adversary who uses selective disruption tactics (as in [4, 9–11]) to force circuits to be rebuilt in the event that a malicious router participates in a circuit that is not compromised. Through simulation of Tor’s router selection algorithm fueled by real router data obtained from Tor’s trusted directory servers, we show that three-hop paths are up to 7% more vulnerable to path compromise than two-hop paths under the same attack.

One potential disadvantage of a two-hop design is that exit routers can trivially discover clients’ entry guards, since they communicate directly. We empirically demonstrate that malicious exit routers can identify clients’ entry guards even with three-hop paths by deploying middle-only routers that employ selective disruption. Our results show that an adversary with only ten malicious exit routers and 50 middle-only routers can learn the entry guards for nearly 80% of all circuits constructed. We also analyze the potential to identify clients uniquely through knowledge of their entry guards.

We lastly perform experiments on the real deployed Tor network to show that low cost timing-based traffic analysis techniques that link circuits by their circuit building messages can be highly successful in practice. On the live Tor network with a workload of real user traffic, we show that timing analysis can successfully link 97% of the traffic from clients that we control even before any data traffic is sent.

假设对手使用选择中断策略(Selective Disruption Tactics)来强迫电路重建,将恶意服务器插入到原本可信的电路中。在使用真实数据模拟中,发现在相同的攻击条件下,三跳路径比两跳更容易发生路径妥协。
两跳设计的缺点是:出口节点可以很容易得知入口节点,但使用该攻击手段,即使只有 10 个恶意出口节点和 50 个恶意中间节点,攻击者可以识别高达 80% 的入口节点。
在实际的 Tor 网络中,采用基于时间的流量分析可以成功分析出 97% 的流量。

路径长度和性能

原文

Path length and performance. In addition to an analysis of path length from a security perspective, we show that shorter paths offer better performance as perceived by end-users in terms of download time. We perform an analysis of typical web browsing behavior and demonstrate that users will see fewer circuit failures with two-hop paths, which results in faster web page loading and an improved user experience.

使用两跳的路径,用户更难以遇到电路故障,可以更快地建立电路并提升用户体验。

2 Tor:第二代洋葱路由设计

原文

2 Tor: The Second Generation Onion Routing Design

Tor is the second generation onion routing design providing a low latency anonymizing overlay network for TCP-based applications [12]. One of Tor’s primary design goals is to ensure low enough latency to facilitate interactive applications such as web browsing and instant messaging. Tor’s system architecture consists of Tor routers, which are volunteer-operated servers, directory servers that organize information about the Tor routers, and Tor proxies (or clients). Tor routers may be configured by their operators to allow connections only to other Tor routers, or to allow exit connections to arbitrary hosts on the Internet. Tor clients query one of the authoritative directory servers to obtain a signed list of the available Tor routers, their public keys, bandwidth advertisements, exit policies, uptime, and other flags indicating their entry guard status and other information.

To establish an anonymous virtual connection through the Tor network to a desired destination, the client must first choose a path (or circuit) of precisely three Tor routers and establish a shared symmetric key with each, using authenticated Diffie-Hellman and a telescoping key agreement procedure. Once the circuit has been created, the client encrypts their data in 512 byte units called cells with each key in a layered manner and forwards these cells to the first router in the circuit. Upon receiving a cell, each router removes its layer of encryption using its symmetric key shared with the client and forwards the cell to the next router in the circuit. Finally, after the exit router removes the final layer of encryption, it establishes a TCP connection with the destination and sends the client’s data. More details can be found in the Tor Protocol Specification [13].

Tor 是第二代洋葱路由设计,为基于 TCP 的应用提供了低延迟的匿名网络。通过查询目录服务器,可以得知 Tor 中继的列表、公钥、带宽等信息。客户将选择三个中继,并使用 Diffie-Helman 进行密钥交换构建电路。并且消息将会在三层加密后发送出去。

2.1 Tor 路由选择算法

原文

2.1 Tor’s Router Selection Algorithm

The manner in which Tor clients select their routers has serious implications for the network’s security properties. For example, if a client chooses malicious routers, then they may experience lost anonymity. At Tor’s inception, it was composed of only a few high-bandwidth routers and had few users, so it was sufficient to select routers uniformly at random. As the network grew in popularity and router bandwidth diversity, it became necessary to balance the traffic load over the available bandwidth resources, which can be achieved by selecting routers according to their bandwidth capacities. However, Tor routers self-advertise their bandwidth capacities. It has been shown that an adversary can falsely advertise high bandwidth claims to attract traffic and increase their ability to compromise circuits [4, 14].

Recent work has proposed methods to securely verify these self-reported bandwidth claims [15]. Active measurements have been integrated into the Tor network’s directory servers to verify routers’ bandwidth claims [16]. However, the security of these active measurements has yet to be evaluated.

Tor’s router selection algorithm [17] chooses routers with the following constraints:

  • A router may only be chosen once per path.
  • To prevent an adversary who controls a small network from deploying a large number of routers, each router on a path must be from a distinct /16 subnet (in CIDR notation).
  • Each router must be marked as Valid and Running by the authoritative directory servers.
  • For non-hidden service circuits, each router must be marked as Fast, indicating that the router has at least 100KB/s100KB/s of bandwidth or is within the top 7/87/8 of all routers ranked by bandwidth.
  • The first router on the path must be marked as a Guard by the authoritative directory servers. Clients select precisely three entry guards to use on their circuits, and choose new guards periodically.
  • The last router on the path must allow connections to the destination host and port.

For general purpose circuits, Tor’s path selection algorithm weighs router selection by each router’s perceived bandwidth capacity. In order to ensure that there is sufficient exit bandwidth available, the bandwidth of Exit routers is weighted differently depending on the fraction of bandwidth that is available from nonExit routers. Suppose that the total exit bandwidth is EE and the total bandwidth available is TT. If E<T/3E < T/3, then Exit routers are not considered for non-exit positions. Otherwise, their bandwidth is weighted by (E(T/3))/E(E − (T/3))/E [17].

Entry guards were introduced to Tor’s design to mitigate the threat of profiling and the predecessor attack [14]. Entry guard nodes have special uptime and bandwidth properties. A router is marked as a Guard by the authoritative directory servers only if its mean time between failures is above the median of all “familiar”3 routers and its bandwidth is greater than or equal to 250KB/s250KB/s [18]. By default, clients choose precisely three entry guards to use for their circuits. To ensure that there is sufficient guard bandwidth available, guard node selection is weighted by (G(T/3))/G(G − (T/3))/G, where GG is the amount of available guard bandwidth. If G<T/3G < T/3, then guard nodes are not considered for non-guard positions [17].

Tor 的路由选择将影响网络的安全属性。如果客户选择了恶意中继,那么其可能会失去匿名。在最初时,Tor 的中继都是高性能服务器,且用户很少,只需要随机分配即可。但随着网络普及和用户的增多,中继质量良莠不齐,对资源进行合理分配非常重要。通常,高性能的中继会有更大的概率被分配给用户。尽管攻击者可以伪装自己拥有极高的性能来提升自己被选择的概率,但目前已经可以通过主动测量来重新评测中继的真实性能。

Tor 的路由选择算法按照如下约束执行:

  • 每个中继只会被选择一次
  • 每条电路上的中继必须不在同一个 CIDR 的 16/ 子网,提升攻击的困难
  • 每个中继都应该被目录服务器中标记为可用
  • 对于非访问隐藏服务的电路,每个中继都应该是被标记为 Fast(至少拥有 100KB/s100KB/s 的带宽或速度排名在前 7/87/8
  • 入口节点需要被目录服务器标记为守卫(Guard)
  • 出口节点与目标服务器可联通

通常而言,Tor 的路由选择算法通过每个中继的带宽衡量。出口节点的带宽取决于其他中继带宽的加权和。设总出口节点带宽为 EE,总可用带宽为 TT,如果 E<T/3E < T/3,那么出口节点不会被用于非出口节点位置;否则带宽加权值由 (ET/3)/E(E-T/3)/E 计算。

而对于入口节点,为了减轻前驱攻击、时间攻击、距离攻击等攻击手段,引入了入口守卫的概念1。默认情况下,客户端会选择三个入口守卫节点,为了确保有足够的守卫带宽,守卫节点使用 G(T3)G\frac{G - (\frac{T}{3})}{G} 加权(其中 GG 是可用的守卫带宽)。如果 G<T3G<\frac{T}{3},那么守卫节点将不会用于非守卫位置

3 安全分析

原文

3 Security Analysis

In this section, we study the security implications of Tor’s path length. First, we evaluate how an adversary’s ability to compromise circuits varies between twohop and three-hop paths. Second, we explore how two-hop paths reveal circuits’ entry guards and discuss the potential for adaptive surveillance attacks. We also describe an attack where an adversary with few exit routers and comparatively many middle-only routers can identify the entry guards on a large fraction of circuits. Third, the amount of information about clients that is revealed by entry guard knowledge is analyzed. Finally, we evaluate a low cost traffic analysis technique that links circuits using only circuit building messages on the live Tor network. This attack’s success re-iterates the fact that three-hop paths provide no protection whatsoever against these attacks.

本节中,研究了 Tor 路径长度的具体含义。

  • 评估对手破坏电路的能力在不同长度间的变化
  • 探讨两跳如何揭示电路的入口守卫
  • 入口守卫
  • 一种低成本的流量分析技术

3.1 选择性中断和路径长度

原文

3.1 Selective Disruption and Path Length

To understand the relationship between path length and circuit compromise, we simulate Tor’s current router selection algorithm (described in Section 2.1) using router data collected from an authoritative irectory server.

Simulation setup. We adopt a simulation methodology similar to Murdoch and Watson [19] in which malicious routers are added to the network and circuit compromise statistics are computed. In particular, we simulate 1 000 clients who choose precisely three entry guards and each construct 100 circuits of length two and three that are suitable for transporting HTTP traffic (port 80).4 Next, a variable number of malicious routers between 5 and 50 are injected into the network. Each malicious router has 10 MB/s of bandwidth,5 is marked as a Guard,6 allows port 80 to exit, and is operated on a istinct /16 subnet.

A snapshot of all Tor routers was obtained from an authoritative directory server on January 6, 2010. This snapshot (summarized in Figure 1) consists of 1 735 total routers marked as Valid and Running. Note that the snapshot has sufficient entry guard and exit bandwidth such that both entry guards and exit routers may by used for any position of the circuit, rovided that they have the
appropriate flags.

Results. Figure 2 shows the fraction of circuits that are compromised as the number malicious routers and amount of adversary-controlled bandwidth increases. First, note that for attackers that do not apply selective disruption, the circuit compromise rate is directly proportional to the adversary’s resource investment: 50 attackers with 10 MB/s of bandwidth each control over half of the network’s bandwidth, but are able to compromise just over half of all circuits. Also, the compromise rate is the same regardless of whether three or two-hop paths are used for each malicious router configuration since the attacker gains no advantage from participating in circuits that are not compromised.

Now consider an adversary whose routers selectively disrupt circuits on which they’re chosen that they cannot compromise. Regardless of path length, if the client has the misfortune of choosing three malicious entry guards, then due to selective disruption, their circuits are always compromised. If a client chooses no malicious entry guards, then their circuits are never compromised. Clients that chose one or two malicious entry guards experience circuit compromise with a certain probability. For example, when there are 10 malicious routers and clients use three-hop paths, 38% of clients choose no malicious guards, 47% choose one malicious guard, 13% choose two malicious guards, and 1% choose three malicious guards. Of the clients that choose one or two malicious guards, their circuits are compromised 63% and 85% of the time, respectively. Note that entry guards offer some degree of protection against circuit compromise, since clients that choose no entry guards are safe and the threat increases with the selection of additional malicious entry guards.

As shown in Figure 2, across all malicious router configurations the fraction of circuits compromised is up to 7% higher for three-hop paths relative to twohop paths. With three-hop paths, when the client selects one or two malicious guards, their circuits are disrupted when they use a non-malicious guard and a malicious middle or exit router (or both). In this case, the circuit is rebuilt and the client may use a malicious guard. With two-hop paths, if the client does not use a malicious guard, then only the exit position can disrupt noncompromised circuits. Since three-hop paths have one additional position from which to disrupt circuits, they exhibit a slightly higher compromise rate relative to two-hops. However, it is unclear if this small increase in the risk of endpoint compromise is a sufficient danger to justify a hange in Tor’s default path length.

使用从权威目录服务器收集的路由数据来模拟 Tor 的路由选择算法

模拟设置
在这里,模拟了 10001000 个客户端,并且入口守卫的数目为 33,每个守卫构建 100100 个长度为 2233 的电路,这些电路可用于传输 HTTP 流量。同时,会在网络中注入 5505 \sim 50 个恶意节点,每个恶意路由器拥有 10MB/s10 \text{MB/s} 的带宽,并且被标记为不同子网下的守卫节点。

路由器的分布来自于 2020-01-06 的权威目录服务器快照,共有 17351735 个被标记为可用运行的路由,同时快照中具有足够数量允许被用于入口和出口的节点。节点具体分布如下图:

模拟中使用的路由器的分布模拟中使用的路由器的分布

结果
下图中是恶意节点带宽增加时,电路中混入恶意节点的比例。

妥协电路比例妥协电路比例

对于不采用选择性中断的攻击者,电路中混入恶意节点的概率与攻击者的资源成正比。当 5050 个恶意节点,每个节点的带宽为 10MB/s10\text{MB/s},那么半数的电路将被攻陷。并且无论是两跳还是三跳,概率都是相同的(图中的的黑实线红虚线)。
另一种则是可以选择性破坏电路的攻击者。无论用户不幸选择了三个恶意的入口守卫节点,那么由于选择性破坏的存在,这些电路总是被破坏(无论如何建立电路,都会经过恶意节点)。而如果客户没有选择恶意的入口守卫节点,那么电路则永远不会被破坏。如果有 1010 个恶意节点,客户端使用三路跳转时,38%38\% 的客户端不会选择到恶意守护节点,47%47\% 会选择到一个恶意守护节点,13%13\% 会选择到两个恶意守护节点,1%1\% 会选择到三个恶意守护节点。对于选择到一个或两个恶意入口守护节点时,电路将有 63%63\%85%85\% 的几率被破坏(体现到图中的蓝虚线绿虚线)。

由于三跳路径实际上增加了一次选中恶意节点的机会,所以实际上有更高的概率被破坏。

3.2 自适应监视攻击和路径长度

原文

3.2 Adaptive Surveillance Attacks and Path Length

In addition to the threat posed by compromised routers, Tor’s three-hop design is ostensibly vulnerable to attacks whereby a powerful ISP or government adversary can monitor a targeted circuit’s endpoints’ networks to identify the traffic’s source and destination. This attack is believed to be difficult with threehop paths because it relies on a circuit having the misfortune of choosing an entry and exit router that reside within monitored networks. Since Tor achieves network diversity in its route selection in practice [22], this attack would require collusion by many network operators.

However, with two-hop paths exit routers can directly observe the entry guards. Suppose that a client builds a circuit through an adversary-controlled exit router, but uses a non-malicious entry guard. Since the exit router knows the client’s entry guard, they could adaptively demand network logs from the entry guard’s network through legal channels or other forms of coercion. While this attack requires a powerful adversary and consequently may be unlikely, two-hop paths make the attack technically feasible which may encourage malicious exit routers (or their network operators) to implement it.

While two-hop paths enable adaptive surveillance attacks by leaking entry guards to the exit router, adaptive surveillance is possible even with Tor’s current three-hop design. If an adversary deploys both malicious exit routers and malicious middle-only routers, they can collude to identify the entry guards used for every circuit on which they are used for the middle and exit positions. We next show that an adversary who controls few exit routers and comparatively many malicious middle-only routers can identify the entry guard used for a large fraction of circuits.

Simulation setup. Experiments are conducted where an adversary controls only ten exit routers configured to exit HTTP (port 80) traffic and injects 50 and 75 middle-only routers to the Tor network summarized in Figure 1. All malicious routers have 10 MB/s of bandwidth and now disrupt circuits when they do not control both the middle and exit positions. We simulate 1 000 clients who each build 100 circuits.

This attack strategy has a low cost for the adversary, since they do not need to demonstrate router stability (as is necessary to obtain the guard flag). In addition, all malicious middle-only nodes could be deployed on the same /16 network and all malicious exit routers could be deployed on a second /16 network. Thus, the resources required to launch this attack are modest.

Results. For an attacker with 10 malicious exit routers and 50 middle-only routers, the adversary can identify the entry guard for 79% of all circuits constructed. When the attacker deploys 75 middle-only routers, they discover the client’s entry guard for 85% of all circuits. For these circuits, the adversary could apply pressure and potentially coerce the entry guard (or its network operator) to reveal the identity of the client.

Perhaps the most compelling argument in favor of three-hop paths for Tor is that the middle router hides the entry guards from exit routers. By using a middle router, a malicious exit typically knows only information about the client that is leaked by their applications. However, if malicious exits collude with middle routers who can observe the entire circuit, it becomes feasible for the exit to learn a large fraction of the total client population’s entry guards.

To make matters worse, deploying a relatively large number of middle-only routers causes a global change in Tor’s router selection process. In these experiments, when 50 middle-only routers are introduced, the aggregate entry guard bandwidth G and aggregate exit router bandwidth E no longer satisfies G ≥ T/3 and E ≥ T/3, respectively, where T is the total bandwidth. In this network configuration, exit routers may only be used for the exit position and entry guards may only be used for the guard position. This enables the adversary to focus their few exit routers toward occupying the exit position and maximize their ability to conduct adaptive surveillance.

强大的服务商和政府部门可以监控目标电路的流量来源和目的。当一个电路选择了恶意的入口节点和出口节点,那么其匿名性将受到挑战。由于 Tor 在路由选择中,会尽可能选择不同的不同网络下的运营商,因此执行攻击需要毒功而运营商的串通。
然而对于只有两跳的电路,恶意的出口节点可以直接获取入口节点的信息,并且通过各种方式要求可信的入口节点提供其流量日志,从而获得发送者信息。这是一种很难实现的攻击方式,但理论上完全可行。

模拟设置
在实验中,一个对手控制了 1010 个可作为出口节点的恶意服务器,同时向 Tor 中注入 507550 \sim 75 个中间节点。所有的恶意节点都有 10MB/s10\text{MB/s} 的带宽。模拟 10001000 个客户,每个客户构建 100100 个电路。
这种攻击策略对攻击者而言代价非常低,并不需要证明节点稳定性,并且中间节点可以处于同一个子网中,出口节点部署在另一个子网中。

结果
在上述的情况(1010 个恶意出口节点,5050 个恶意中间节点)下,攻击者可以识别 79%79\% 的入口守卫。而当攻击者部署 7575 个中间服务器时,将会发现 85%85\% 的入口守卫。尽管无法直接获得发送者的身份,但对于拥有足够能力的攻击者,可以要求这些入口守卫提供信息。
当引入 5050 个中间路由器时,入口带宽 GG 和出口带宽 EE 不再满足 GT3G \ge \frac{T}{3}ET3E \ge \frac{T}{3}。攻击者可以集中服务器占领出口节点,并最大化地监视电路。

3.3 入口守卫连接性

原文

3.3 Entry Guard Linkability

With a two-hop design, we know that malicious exit routers can discover clients’ entry guards. It is possible that clients’ entry guards may be uniquely identifying or place clients into small anonymity sets. To understand the extent to which knowledge of clients’ entry guards may be identifying, we next analyze publicly available data on entry guard usage from the Tor Metrics Project [23]. From this data, eleven entry guards provide information about the number of clients that they observe over time.7 Table 1 presents a statistical summary of the number of clients observed by each entry guard on a daily basis. With this data, we can estimate how much identifying information is leaked through knowledge of a client’s entry guard.

We apply the standard entropy metric from information theory [24] to measure how much information is revealed about a user by their entry guard selections. The total number of unique Tor users per day is currently estimated to be between 100 000 and 300 000 [25]. Thus, without any additional knowledge, 17.61 bits of information are necessary to uniquely identify a Tor user.8 Now suppose that a malicious exit router knows a particular client’s entry guard. On average, roughly 25 000 clients use the same entry guard, so this knowledge leaks only 2.96 bits of information about a user’s identity. Even in the worst case when a client shares a guard with as few as 680 other clients, only 8.20 bits are revealed (the full entropy results are shown in Table 1).

If an attacker knows all three of a particular client’s entry guards, the client may be more identifiable since a choice of three guards may be significantly more unique than a single guard. While it is usually difficult to link a client across multiple entry guards, if a client inadvertently identifies herself — perhaps by logging-in to a website or using an application that does not support SSL/TLS — over time her full set of entry guards could be leaked to a malicious exit router. Tor clients do, however, expire their entry guard selections periodically, which may help to protect users from this type of profiling.
We should also point out that, even with three-hop paths, linkability pitfalls still exist in Tor. First, a Tor circuit can be used by several connections, which can be trivially linked by the exit router. Second, the predecessor attack shows that the entry guards used by a client can be learned after O(1/(fmfe)) circuit constructions on average, where fm and fe are the probabilities that a malicious router will be chosen as the middle and exit router, respectively [26]. Selective disruption and other techniques [27] can be used to increase the speed of such attacks.

根据信息论的熵来确定一个用户的入口守卫措施可以透露的信息量。每天使用 Tor 的用户大概在 103010\text{万} \sim 30\text{万} 之间,在没有额外的信息的情况下,17.61bits17.61bits 的信息是标识一个用户最小的数据量。平均而言,大概 2500025000 个客户端使用同一个入口守卫,这将泄露 2.96bits2.96bits 信息。

使用公开的统计信息进行测试,下表记录了统计的客户端的入口守卫和熵

样本 最小值 最大值 中位数 95%置信区间
n=737n=737 680680 164000164000 84168416 (24104,27176)(24104,27176)
8.208.20 0.290.29 4.574.57 (3.05,2.88)(3.05,2.88)

即使在最坏的情况下,一个客户端与其他 680680 个客户端共享入口守卫,有 8.20bits8.20bits 被暴露。如果攻击者知晓一个客户端的所有三个入口守卫,那么客户端将更容易被识别出。

当用户使用非 SSL/TLS 时,通信可能会被泄露给恶意出口服务器,从而导致其匿名性泄露。但 Tor 会定期中止入口守护选择,有助于避免这类攻击。即使是三跳的电路,该缺陷仍然存在:

  1. 一条电路可能被多个连接使用;
  2. 前驱攻击表明,客户端使用入口守卫时,平均可以在构建 O(1fmfe)O(\frac{1}{f_mf_e}) 的电路后成功攻击,其中 fmf_mfef_e 分别是恶意节点被作为中间节点和出口节点的概率;

3.4 低资源流量分析

原文

3.4 Low Resource Traffic Analysis on the Live Tor Network

Prior work has shown that end-to-end traffic correlation attacks launched against low latency anonymous networks can achieve near perfect accuracy [2]. To support interactive or delay-sensitive applications, Tor does not explicitly delay or batch messages to help defend against end-to-end traffic correlation attacks.

Consequently, Tor’s design assumes that these attacks can achieve high accuracy in practice. In fact, such an attack has been proven effective against the live Tor network in 2006 [14]. Since then, a low resource traffic analysis technique has been proposed that uses only circuit construction messages to link a source and destination before any data is sent [4]. This approach allows low bandwidth attackers to maximize the number of circuits compromised, but this low cost attack has yet to be validated on the live Tor network. We next evaluate this traffic analysis approach on the live Tor network.

Experimental setup. We deploy two Tor routers9 hosted on a 100 Mb/s network link onto the live Tor network. Each router has a distinct configuration: (1) One Tor router is configured as a non-exit and after roughly ten days of uninterrupted operation, it obtained the Guard flag from the authoritative directory servers. (2) A second Tor router is configured with the default exit policy.10 During their operation, both routers sustained oughly 3 MB/s of traffic.

To evaluate the expected success of traffic analysis, we operate our own Tor clients and attempt to link their circuits to their destinations. Upon building a circuit, each client downloads www.google.com, tears down the circuit, and repeats this procedure. To preserve users’ privacy, we ignore traffic at the entry guard that is not produced by one of our clients.11 Note that we do not retain any linkable data nor do we attempt to deanonymize any other clients but our own.

Traffic analysis methodology. We apply a traffic analysis technique in which circuits are linked by their circuit building messages before the clients send any data cells. This approach leverages the fact that Tor’s circuit establishment procedure sends a fixed number of circuit building messages in an identifiable pattern.

Briefly, circuit linking via circuit building messages works as follows. First, our entry guard ensures that the circuit building request is from a client and not a Tor router. Next, it is necessary to ensure that the next router for our entry guard is the same as the previous router for our exit router (with a tight time difference). Finally, the circuit building messages for the entry, middle, and exit routers should occur in increasing chronological order. More details about our linking procedure can be found in [4].

Results. On the live Tor network, our clients build a total of 1 696 circuits that always use our entry guard. Of these 821 circuits use our exit router and 875 circuits use a different exit router.12 The middle routers are chosen according to Tor’s default selection algorithm. Through traffic analysis, we link their circuits with 97% accuracy, 0.6% false negatives (6 false negatives in total), and 6% false positives (52 false positives in total). We regard these results as a lower bound on attainable traffic analysis success, as it should be possible to increase the accuracy by also using data cells to link circuits. Also, we observe that circuits that use a popular (i.e., high bandwidth) middle router tend to be more prone to false positives. Thus, an attacker who sees a positive result with a low bandwidth middle router can be more confident in the result. Given the high accuracy and the relatively easy manner in which the traffic analysis was conducted, we confirm that three-hop paths offer no protection against low ost timing attacks.

对于低延迟匿名网络,端到端流量关联攻击可以达到近乎完美的精度。由于低延迟匿名网络中存在大量交互式和延迟敏感的程序,因此 Tor 无法显式对流量进行延迟传输,以防止流量关联攻击。

低资源的流量分析攻击是一种用于低带宽攻击者的攻击手段,在任何数据发送前,仅仅依赖于电路构造消息就可以将发送者和接收者连接起来。但该攻击手段仅在理论上可行,在论文发布前并没有在实际的 Tor 网络中的受到验证。因此在这里对其进行验证。

实验设置
首先在 Tor 网络中部署两个 100MB/s100\text{MB/s} 的节点,每个节点有着不同的配置:

  1. 一个被配置为不作为出口节点,大概需要 10 天,其被目录服务器标记为可作为守卫节点;
  2. 另一个被配置为默认的退怕出策略。

在操作期间,两个服务器将持续 3MB/s3\text{MB/s} 的流量。接着,运行 Tor 客户端,并尝试访问www.google.com

流量分析方法

  1. 入口守卫确保请求来自于客户端而非 Tor 中继节点;
  2. 确保下一跳节点与出口节点的上一跳相同;
  3. 入口节点和出口节点的收发流量的时间应该符合时间逻辑(入口节点先转发,并且极短时间内由出口节点转发)

结果
客户端共构建 19691969 条使用恶意入口守卫的电路,其中 821821 条使用恶意的出口节点,875875 条使用其他的出口节点,中间节点根据 Tor 默认路由选择算法选择。采用上述的流量分析,拥有 97%97\% 的概率成功连接入口节点和出口节点,0.6%0.6\% 的假阴性(共 66 例),6%6\% 的假阳性(共 5252 例)。
由于可以使用数据单元连接来提高准确性,因此该结果应该是流量分析成功率的下限。同时,根据观察发现,对于高带宽的中间节点,会有更大的假阳性误报,因此如果攻击者发现在低带宽中间服务器中继后,得出关联性存在的结果,那么可以认为结果是准确的。

由于三跳电路下,仍然能够很容易地进行流量分析,因此可以认为其对低成本流量分析没有保护作用。

4 性能分析

原文

4 Performance Analysis

We have already studied Tor’s path length from a security perspective. We next examine its performance implications. Since the vast majority of Tor traffic is interactive web browsing [20, 21], we investigate the performance benefits of a two-hop design from a web browsing end-user’s erspective.

Experimental setup. In order to understand Tor’s performance in a manner that reflects the quality of a user’s experience, we simulate real clients accessing the 15 most popular websites13 over Tor version 0.2.1.24 with Polipo version 1.0.4 and measure the download times. Experiments are conducted in February 2010 over the course of four days.14 Circuits are constructed according to Tor’s default router selection algorithm and the Firefox browser downloads one of the web pages.

In the event of a circuit failure, Firefox’s default behavior is to time-out after two minutes. However, real users may be impatient and explicitly force the browser to reload the page by pressing the “refresh” button. Prior work has found that users of low latency anonymous networks tend to tolerate no more than four seconds of latency [28]. Thus, in the event of a circuit failure, we assume that users wait not the full two minutes for their browsers to time-out, but precisely four seconds before explicitly eloading the page.

Results. A CDF of download times for twoand three-hop paths is shown in Figure 3. For three-hop paths, half of all web page downloads take longer than 12 seconds, while for two-hop paths, half complete in over 8 seconds. The mean download time for three-hop circuits is over 28 seconds, which is twice the expected download time for traffic over twohop circuits (14 seconds).

We observe that circuit failures tend to be a significant cause of the additional expected download times with three-hop circuits.15 21% of circuits fail with three-hop paths, but only 15% of circuits fail with two-hop paths. The observed unreliability of three-hop circuits may contribute to high download times, as some users may wait unnecessarily for their browser to time-out. In these experiments, we assume that the user can quickly identify that their session has stalled (i.e., by observing that no web content has loaded) and refresh the page after waiting four seconds for content to appear. However, some users may take significantly longer to launch another web request.

前面已经针对安全性对 Tor 的路径长度进行了分析,下面针对性能进行分析。

大多数的 Tor 流量都用于交互式 Web 服务,在这方面两跳拥有极大的性能优势。

实验设置
使用Tor version 0.2.1.24以及Polopo version 1.0.4模拟用户访问 1515 个最流行的网站,测试下载时间。实验运行于 2010 年 2 月,为期 4 天。
在电路故障的情况下,Firefox 默认在两分钟后超时。对于普通用户,往往在这个时间后会显得不耐烦,使用刷新按钮重新加载。在之前的研究中,证明低延迟网络往往不能忍受超过 44 秒的延迟。因此当 44 秒无法加载完成,即视为电路断开。

结果
下图显示了两跳和三跳情况下的下载时间

网页加载时间网页加载时间

对于三跳电路,有一半的页面下载时间超过 1212 秒;而对于两跳路径,一般的页面下载时间超过 88 秒。三跳电路的平均下载时间超过 2828 秒,是两跳路径 1414 秒的两倍。
其中,导致下载时间增加的主要原因是电路故障率的提升。21%21\% 的电路会在三跳下出现故障,只有 15%15\% 的电路在两跳下出现故障。

5 讨论

原文

5 Discussion

Having discussed Tor’s path length from security and performance perspectives, we next discuss a variety of open issues related to path length.

在讨论安全和性能问题后,下面讨论更多相关性问题

5.1 用户可配置路径长度

原文

5.1 User-Configurable Path Lengths

Since two-hop paths offer better performance, it may be tempting to allow users who value performance over security to use two-hop paths while users who need stronger security may use three-hop paths. Suppose that most users value performance and consequently, Tor chose a default path length of two hops. Securityconscious users could optionally use three hops to take advantage of the additional security that three-hop paths offer against adaptive surveillance. However, clients who choose to use longer paths may be identified as desiring additional security, which alone could draw an adversary’s attention. Furthermore, it has been argued that most users tend to keep default options, even when the defaults may not be optimally suited to their needs [30]. Allowing users to configure their own path lengths assumes that users understand the full security implications of their choice, which is unlikely, particularly for novice users. Thus, all users should be encouraged to use the same path length.

可以允许重视性能的用户使用两跳路径,而重视安全的用户使用三跳路径。大部分的用户可能更重视性能,因此默认可以设为两跳。但由于大部分用户可能并不能了解配置参数的含义,可能并不会根据自己的需要来修改默认参数。同时,如果用户使用更长的路径,可能会认为是一个明显的特点,更容易进行分析。因此应该鼓励所有用户使用相同的长度。

退出节点的潜在责任

原文

5.2 Potential Liabilities for Exit Routers

Beyond the potential risks of identifying users who desire stronger security by their path length choice, two-hop paths could be a liability for exit router operators. With three-hop paths, exit routers know nothing about clients other than what may be revealed by their traffic. However, with two-hop paths, exit routers are exposed to clients’ entry guards; thus, they are no longer agnostic with regard to the clients whose traffic they transport. Exit routers could be presented with subpoenas to reveal entry guard information to governments or law enforcement agents, which increases the risks associated with operating an exit router. Since Tor’s exit bandwidth is relatively scarce yet essential to the network’s ability to function properly, liabilities for exit router operators should be minimized to attract additional exit routers.

由于当出口节点被暴露后,其可能会受到合法的请求提供日志。由于 Tor 的出口节点至关重要,且相对较为稀缺。因此其应该尽可能避免受到运营商的影响。

5.3 安全带宽估计

原文

5.3 Secure Bandwidth Estimation

The attacks that we describe in Sections 3.1 and 3.2 are particularly dangerous in the absence of secure bandwidth verification, since malicious routers could otherwise inflate their perceived bandwidth to attract traffic. With secure bandwidth estimates in place, it will no longer be possible to carry out these attacks with few resources. However, it is important to remember that such attacks are still within reach of medium-to-large organizations, or even determined individuals: at current hosting rates, running a 10 MB/s node for one week (long enough for a node to be declared a guard) can cost less than $1 000;16 thus, the financial resources required to attack the network successfully are moderate at best. Additionally, attackers may be able to insert their own high-bandwidth nodes into the Tor network by compromising computers at ell-provisioned institutions.

攻击者可以通过伪装自己拥有更大的带宽来使用很少的资源实现攻击。而使用安全带宽估计,则可以对该手段进行防护。但即使真的使用高带宽服务器提供攻击,成本仍然是有限的。

5.4 两跳设计是否会浪费很多节点

原文

5.4 Does a Two-Hop Design Discard Many Routers?

Many Tor routers are not configured to allow exit traffic and are not fast and/or stable enough to be an entry guard. These routers are only used for the middle position. We next consider whether a two-hop design would discard a significant number of middle-only routers and heir collective bandwidth.

From the directory server snapshot analyzed in Section 3, we find that 639 routers may only be used for the middle position. These routers collectively contribute about 85 MB/s of bandwidth. To understand how bandwidth is distributed among non-exit and non-guard routers, Figure 4 shows a CDF of these routers’ bandwidth contributions. Half contribute less than 50.3 KB/s each and only 11% offer the 250 KB/s necessary to meet the bandwidth criterion for the guard flag. These higher bandwidth routers collectively contribute 54.3 of the 85 MB/s of middle-only bandwidth. If stable enough, they could eventually obtain the guard flag and be used for the entry position.

由于 Tor 对于入口和出口节点拥有额外的要求,因此大部分节点只会用于中间节点。但如图可见,大部分的节点实际上带宽贡献是少于 50.3KB/s50.3\text{KB/s},只有 11%11\% 提供了必要的 250KB/s250\text{KB/s} 带宽并标记为守卫。更高的带宽共贡献了 85MB/s85\text{MB/s} 的中间带宽。

中间节点的带宽中间节点的带宽

6 相关工作

低延迟匿名网络的安全性

原文

Security in low latency anonymous networks. An early security analysis of low-latency anonymous networks suggests that an anonymous path is compromised if its endpoints are controlled by an adversary; the expected success of such an attack is roughly (c/n)2, where there are c malicious routers, n total routers, and clients choose routers uniformly at random [31]. As networks such as Tor evolved, it became necessary to balance the traffic load over a diverse set of volunteer routers, making the task of analytically modeling path compromise more challenging. To meet this challenge, Murdoch and Watson propose that path compromise be analyzed empirically through faithful simulation of the underlying routing mechanism in the presence of different threat models [19]. We adopt a similar empirical approach to reasoning bout Tor’s security properties.

对于低延迟匿名网络的早期安全分析表明,如果匿名电路的断电被攻击者控制,那么匿名路径会被破坏。对于总计 nn 个节点中包括 cc 个恶意节点,这种攻击的预期成功率为 (c/n)2(c/n)^2

选择性中断攻击

原文

Selective disruption attacks. Selective disruption attacks are a form of denial-of-service (DoS) that allow an adversary to increase the number of circuits compromised. These attacks work as follows: a malicious router who uses selective disruption should refuse to forward traffic in the event that they participate in a circuit that is not compromised. This causes the circuit to fail and be rebuilt, providing an opportunity for malicious routers to compromise another circuit.

Bauer et al. show that an attacker with only six malicious Tor routers who utilizes the selective disruption strategy can compromise over 46% of all clients’ circuits in an experimental Tor network with 66 total routers [4]. Similarly, Borisov et al. demonstrate that an adversary who uses selective disruption experiences a significantly greater path compromise rate. Without selective disruption, an attacker who controls 50% of the network’s bandwidth can compromise 25% of all circuits, but with selective disruption they can compromise up to 66% of circuits [9]. However, their analysis was performed using a highly simplified model of Tor path selection. They also found that, for mix networks, increased path length results in greater susceptibility to selective disruption attacks, but did not analyze the effects of path length in Tor. We examine how selective disruption attacks are less effective with two-hop aths than three hops.

Given the danger of selective disruption, Danner et al. describe an algorithm for detecting selective disruption attacks that requires a number of probes that scales linearly with the network size [10]. However, such an active probing approach may introduce high load into the network. Also, active probing could be gamed by an intelligent attacker who can recognize the probes, or the adversary could disrupt circuits probabilistically to blend in with expected background circuit failures. Ultimately, the DoS strategy allows attackers to perform traffic analysis on a far greater number of circuits than ould otherwise be possible.

选择性中断攻击是拒绝服务(DoS)的一种形式。如果恶意节点存在于一个电路中,即使别的节点都是可信节点,这些节点仍然会由于恶意节点的选择性中断而促使电路重建。仅需要 66 个恶意 Tor 节点,即可破坏超过 46%46\% 的客户端电路。对于拥有 50%50\% 带宽的攻击者,如果不使用选择性中断,那么可以造成 25%25\% 的电路被破坏;而使用选择性中断后,则可以破坏 66%66\% 的电路。

端到端流量关联攻击

原文

End-to-end traffic correlation attacks. Prior work has shown that endto-end traffic correlation attacks are highly effective against low-latency anonymizing networks. Levine et al. demonstrate through simulation that the performance of timing-based traffic correlation attacks is dependent on network conditions, but they show that an adversary can correlate traffic with perfect accuracy when the packet drop rate is very low [2]. For circuit linking experiments carried out on a small experimental Tor network, Bauer et al. report only 12 false positives out of over ten thousand successful correlations [4]. However, their traffic load was light and uniform, which may have contributed to the extremely low false positive rate. Also, Syverson and Øverlier report a negligible false positive rate for a traffic correlation attack on a Tor hidden service [14]. In this paper, we verify that similarly high traffic correlation accuracies can be expected for low-resource traffic analysis attacks launched on the real Tor network.

端到端流量关联对于低延迟匿名网络非常有效,基于时间的流量关联性能取决于网络条件。当丢包率很低时,可以非常精确地进行流量关联。在小型的 Tor 网络中进行实验,1000010000 个成功关联中,只有 1212 个误报

备选路由器选择策略

原文

Alternate router selection strategies. Given the threat of malicious routers positioning themselves at circuit endpoints for a large number of circuits, Snader and Borisov propose that clients have the ability to “tune” the router selection process between security and performance [32]. Choosing routers more uniformly at random reduces the end-user’s risk of choosing malicious routers who inflate their bandwidth claims to attract traffic, however, at the potential cost of choosing low bandwidth routers and experiencing poor performance. In addition, Sherr et al. propose that link-based attributes (such as latency or jitter) be used to select routers rather than node-based attributes (like bandwidth) [33]. However, these proposed routing techniques have yet to be adopted in practice. Consequently, we only consider Tor’s current router selection algorithm in our subsequent analysis.

均匀、随机地进行路由选择可以降低终端用户选择恶意节点的风险,这些恶意服务器通过夸大自己的带宽来吸引流量。有人提出使用基于链接的属性(延迟、抖动)来选择路由,而非基于节点属性(带宽),但这些路由技术尚未投入使用。

前向性能分析

原文

Prior performance analyses. Beyond the security properties of low latency anonymizing networks, recent work has investigated their performance characteristics. It has been shown that users are more likely to use anonymous communication services that offer better performance [34]. In a performance study of Tor and AN.ON [35] (a mix cascade) from the end-user’s perspective, Wendolsky et al. find that Tor is subject to unpredictable performance and observe that users exhibit a four second tolerance to delay [28]. However, Tor users often experience significant delays beyond this user olerance threshold [29].

To help explain Tor’s poor observed performance, Reardon and Goldberg identify that because Tor multiplexes many streams over the same TCP connections, congestion control interference among different circuits is produced [36]. These unintended interactions often cause very high delays for end-users. TCPover-DTLS, an alternate transport design, is proposed to improve performance. Our work is complementary to these prior studies. Since end-users are sensitive to excessive delays, we quantify the performance improvement that can be expected with a two-hop design and argue that such a design may offer even more improvement in combination with TCP-over-DTLS.

用户通常更倾向于使用性能更好的通信服务,用户往往只能忍耐 44 分钟的延迟。Tor 在相同的 TCP 中,往往会传输多个流,不同电路之间的拥塞控制互相干扰,这种非预期的交互通常会导致较高的延迟。为了提高性能,提出了另一种传输设计——TCP-over-DTLS

7 结论

原文

7 Conclusion

We critically evaluate Tor’s path length and consider the advantages and disadvantages of a two-hop and three-hop design. We show that two-hop paths are slightly less vulnerable to circuit compromise attacks than three-hop paths, but two-hop paths are trivially vulnerable to adaptive surveillance and introduce potential liabilities for exit node operators. While performance is improved with shorter paths, we conclude that there is no strong argument for reducing Tor’s path length. However, we identify a number of open issues that could effect this decision. Our hope is that this paper encourages further investigation into the security and performance trades-offs of various path lengths.

本文批判性地评估了 Tor 的路径长度。两跳路径相对于三跳更容易受到攻击,但拥有更好的性能。但总的而言,减少 Tor 的路径长度并没有特别大的理由。

参考资料


  1. Locating Hidden Servers ↩︎