
Christian Huitema's Internet experience

January 06

Shall we rethink Internet Congestion Control?

I am witnessing an interesting discussion between Internet researchers on the "end to end" mailing list. It started with a complaint by Joe Touch about the deployment of "CUBIC" in Linux. CUBIC is a new congestion control algorithm developed for "high speed TCP" by Injong Rhee at NCSU. There is a description of that algorithm on their web site (http://netsrv.csc.ncsu.edu/twiki/bin/view/Main/BIC). This is one of several algorithms that are designed to improve the behavior of TCP by somehow relaxing congestion control when a lot of capacity is available. My colleagues at Microsoft have developed another algorithm to the same effect, Compound TCP or CTCP (http://research.microsoft.com/wn/ctcp.aspx). CUBIC is enabled by default in some distributions of Linux; CTCP is available in Windows Vista and will be enabled in Windows Server 2008. We could debate the relative merits of these two proposals, but that was not the point of the discussion, and is not the point of this blog entry. The debate was about the appropriateness of shipping "experimental algorithms" in "production servers".
 
On one side of the debate are some long-time members of the IRTF "end to end" research group, such as Joe Touch. They are essentially on the side of caution. Changing TCP algorithms changes the way the end systems react to congestion. If some hosts are more aggressive than the majority of others, they can gain an unfair share of the Internet bandwidth. Even worse, if the new algorithm behaves really improperly, there is a risk of destabilizing the entire Internet. These arguments are developed by Sally Floyd in RFC 2914 (http://www.ietf.org/rfc/rfc2914.txt). Joe and others thus argue for conservative behavior: do not deploy anything in production before it has been reviewed and proposed as a standard by the IETF.

I have two problems with that approach. First, I am not sure that an IETF group should be in a position to determine what should or should not be deployed on the Internet. There are obvious issues of checks and balances, since various constituencies will want to deploy new software for various reasons, for example business interests, and others may want to disallow it for opposite business reasons. IETF working groups are designed to develop technology, not to arbitrate between such interests. Then there is the issue of enforcement, which for the IETF can only rely on some form of moral authority, and thus will not be very effective. But my second issue is technical. I simply do not think that we should rely on software in the end systems to ensure stability and fairness in the Internet.

In RFC 2914, Sally Floyd argues that congestion control in the end systems has two principal purposes: preventing congestion collapse and establishing some degree of fairness. But this is equivalent to saying that the continued stability of the Internet depends on the benevolent cooperation of all Internet users. The implementation of slow start in TCP did indeed prevent the Internet from collapsing at a crucial time in its evolution. But that was then. I don't think we can extrapolate the 1988 fix into an everlasting principle, not with a billion hosts on the Internet.

If we cannot rely on the benevolent sum of individual behaviors, we need to build mechanisms in the network that help it guarantee its stability. In researching such "network based" mechanisms, we should accept that end systems will be primarily motivated by their self interest. They are certainly not motivated by a desire to be fair to others. The desire for fairness is a social contract, and I don't think we can assume such a contract when the Internet covers the entire world. If we could, that would indeed be a good thing; we would also have worldwide peace and all that kind of thing. So, we had better assume that end systems will try to maximize their individual satisfaction, rather than looking for the common good. If the network mechanisms are properly designed, they will ensure that the sum of individual optimizations results in the desired global behavior. Scott Shenker made that point in a SIGCOMM paper published in 1994, "Making greed work in networks: a game-theoretic analysis of switch service disciplines" (http://portal.acm.org/citation.cfm?id=219753.219804). That paper may be old, but it is not too late to revisit it!
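To make the idea of a network mechanism that tolerates greedy end systems a bit more concrete, here is a minimal Python sketch of a max-min fair ("water-filling") allocation at a single bottleneck. It is only an illustration under my own assumptions, not the model analyzed in the paper above: the function name, the flow names, and the capacity figures are all made up.

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` among flows so that no flow gets more than it asks for
    and leftover capacity is shared equally among the flows that want more.

    `demands` maps a (hypothetical) flow name to the rate it asks for.
    """
    allocation = {}
    remaining = float(capacity)
    # Serve the smallest demands first; whatever they leave unused is
    # redistributed among the flows that are still waiting.
    pending = sorted(demands.items(), key=lambda item: item[1])
    for index, (flow, demand) in enumerate(pending):
        equal_share = remaining / (len(pending) - index)
        granted = min(demand, equal_share)
        allocation[flow] = granted
        remaining -= granted
    return allocation


# Illustrative numbers only: a 10-unit link shared by three flows.
honest = max_min_fair_share(10, {"polite": 2, "moderate": 4, "greedy": 100})
greedier = max_min_fair_share(10, {"polite": 2, "moderate": 4, "greedy": 1000})
```

With such an allocator, the "greedy" flow receives the same share whether it asks for 100 or 1000 units, so inflating its demand buys it nothing. That is the kind of property one would want from a mechanism that does not depend on end systems behaving well.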

In fact, ISPs are already attempting to build such "stabilization tools" in their networks. We see various forms of traffic shaping implemented at bottleneck points. We see various tools used to perform "traffic engineering". ISPs need to do that if they have any hope of providing some kind of guarantees of service. Many networking researchers will find those tools crude, or possibly harmful. They may be correct, but their reaction cannot be to retreat into the ivory tower and sneer at those lowly network engineers. Instead of clinging to the illusion that we can entirely solve the problem in an end to end fashion, that all end systems will follow the dictates of the E2E group, maybe we should actually address the problem. What are the best mechanisms to deploy in the Internet to make it immune to variations in end to end algorithms?

-- Christian Huitema

January 05

Privacy, phishing and IE7

I have the privilege of working for Microsoft, and with that come a few interesting side-effects. Periodically, I receive e-mail from friends who complain about a bug, or maybe a feature, in one of our products. They ask me to use my influence, talk to the team in charge, and get it fixed. A tall order indeed, since in a large company like Microsoft your sphere of influence tends to be limited to the products you directly work on. But, hey, one can always try.

It happened again last month. A friend had read a report about the privacy implications of the "anti-phishing" feature of IE7. That particular report implied that, if the anti-phishing feature was selected, IE7 would contact Microsoft for each web page, asking whether the page's URL was or was not listed as a phishing site. My friend was quite incensed. He was concerned that every computer in his company would report to Microsoft a complete list of all URLs visited by its employees, providing Microsoft with a detailed view of this company's activity. He went on to mention the various counter-measures that the company could take, from blocking the access to the Microsoft anti-phishing server in their firewall to prohibiting use of IE7 or even Windows.

Privacy is actually something I care a lot about. In fact, Microsoft as a company cares a lot about it. There have been some incidents in the past in which products "called home" to the dismay of privacy advocates, but we now have well delineated internal procedures to protect the privacy of our customers. Products are only supposed to call home when they have a good reason to do so, e.g. to provide a service or collect statistics. The feature has to be opted in by the user. The collection of data should be limited to the strict minimum required to provide the feature. Any use of personal identifiers faces a lot of scrutiny. There must be procedures in place on the back end to protect the data, and dispose of it when it is not needed. In fact, some product managers believe that our protections are too strict and place us at a disadvantage relative to competitors who seem to have no second thoughts about archiving people's entire web history, but personally I like it that way.

So, when I received the mail about IE7 and privacy, I was a bit puzzled, and asked some colleagues in the IE7 team about it. It turns out that they had indeed followed our guidelines, and that the blog entry that my friend had read was unduly alarmist. By nature, the anti-phishing feature requires some contact with Microsoft, since there will sometimes be a need to check a URL against a database of phishing sites, and there will also be a need to report phishing sites for inclusion in the database. Given that constraint, the IE7 team tried to minimize the privacy impact. (They pointed me to an entry on this subject in their blog: http://blogs.msdn.com/ie/archive/2006/05/08/592677.aspx. It contains more information on the topic, including a pointer to a review by independent auditors.) Their design has multiple levels of protection:

  • The feature is opt-in, i.e. it is only activated if the user explicitly agrees to use it.
  • There is a way for IT managers to disable it on their company's PCs.
  • IE7 first applies a local filter, and only queries the database if the page fails that local filter.
  • The query is transmitted over SSL, so bad guys will not be able to look at the traffic.
  • The query only contains the "path" component of the URL, not the parameters entered in a form.
  • The database back-end does not receive any personal information about the user, not even the IP address of the computer.
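The "local filter first, then a parameter-free query" idea from the list above can be sketched in a few lines of Python. This is only a hedged illustration: the post does not describe how IE7's local filter or query format actually work, so the known-good host set and both function names here are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical local filter: hosts that never need a remote check.
# The real IE7 filter is not described in this post.
LOCAL_KNOWN_GOOD_HOSTS = {"www.example.com"}


def strip_parameters(url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def url_to_check(url):
    """Return the reduced URL to send to the lookup service,
    or None when the local filter already clears the page."""
    host = urlsplit(url).hostname
    if host in LOCAL_KNOWN_GOOD_HOSTS:
        return None  # handled locally, nothing leaves the machine
    return strip_parameters(url)
```

For example, `url_to_check("https://bank.example.net/login?user=alice&pin=1234")` yields only the scheme, host and path, so whatever the user typed into a form is never part of the query.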

In fact, I can easily picture the team going through brainstorms and reviews to make sure that their feature met the privacy guidelines. Frankly, I believe they did a very good job.

Of course, all that reasoning will not be worth much if people do not trust Microsoft. We can repeat as much as we want that we don't actually store the information, and that we are not trying to keep tabs on the user. There will always be someone who doesn't believe us. Oh well. The summary is that the feature is opt-in, can be controlled by IT administrators, and has been implemented to mitigate privacy risks as much as possible, given the basic need to occasionally check URLs against a centrally updated list of phishing sites. Users, and IT managers, will have to evaluate the trade-off: trust that Microsoft will not do anything stupid with the data, versus trust users to not do anything stupid when stumbling on a phishing site. Now, I can wait for the next question…

-- Christian Huitema

December 18

Hiding the SSID, or why too much security is counter-productive

The common wisdom is that, if you want to secure your Wi-Fi network, you should program your access point to hide the SSID. The common wisdom is wrong, and I wonder how long it will take to reverse that.

On the surface, it seems to make sense. If your access point does not broadcast the name of the network, the hacker in the parking lot should be dumbfounded. To connect to a network, the client must know the SSID. If it is unknown, the hacker should not be able to guess it, and thus should not connect. Right? Wrong, of course!

In the normal procedure, the access point sends "beacons" at regular intervals. The beacons include the name of the network: "Hello, I am an access point serving network EXAMPLE-1." The client hears it, recognizes that "EXAMPLE-1" is a network to which it wants to connect, and proceeds with establishing the connection.

Suppose now that the AP is programmed to not broadcast the name of the network. It will still send beacons at regular intervals, but these beacons will be somewhat cryptic: "Hello, I am an access point serving some network, but I won't tell you which one." The client hears that, and must now explicitly "probe" the access point, sending a message much like: "Do you happen to serve the network EXAMPLE-1?"

At this point, attackers have two options. First, they can lie in wait near an access point, and listen to the airwaves until a client shows up. When the client sends its probe, the hacker learns that the network could very well be named "EXAMPLE-1". Bingo, here comes the SSID. In fact, attackers can force clients to send a probe by messing with the 802.11 control packets, and effectively force an authorized client to restart the connection procedure. So, even if the access point does not broadcast the SSID, attackers can learn it from the clients, with minimal effort and minimal delay. Hiding the SSID in the access point is at best a speed bump in the attacker's way.
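The first option can be illustrated with a toy Python model of the airwaves. This is a sketch under simplifying assumptions: the `Frame` class and the helper functions are made up for the illustration and do not reflect real 802.11 frame formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Toy stand-in for a Wi-Fi management frame (not a real 802.11 layout)."""
    kind: str  # "beacon" or "probe_request"
    ssid: str  # empty string means the name is withheld


def hidden_ap_beacon():
    # An access point that hides its SSID still beacons, but without a name.
    return Frame("beacon", "")


def client_probe(ssid):
    # The client has to name the hidden network it is looking for.
    return Frame("probe_request", ssid)


def ssids_learned_by_listener(frames):
    """Everything a passive listener can read off the client probes."""
    return {f.ssid for f in frames if f.kind == "probe_request" and f.ssid}


airwaves = [hidden_ap_beacon(), hidden_ap_beacon(), client_probe("EXAMPLE-1")]
learned = ssids_learned_by_listener(airwaves)
```

In this model the beacons alone reveal nothing, but a single client probe is enough for the listener to recover "EXAMPLE-1".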

But let's look at the second option. Suppose that the attacker lies in wait in some public place, such as an airport lounge. The attacker will mimic an access point that hides its SSID. If a laptop is programmed to connect to a "hidden" network, it will send a probe: "Hello, are you the MUMBLE network?" Replace MUMBLE here with the name of your company's network, Microsoft, Boeing, etc. The attacker just learned the name of your company. That is already a significant information disclosure, placing your privacy at risk.

But it gets even worse. Suppose that the attacker programs its "mock access point" to just acknowledge any request: "Yes, I am indeed the MUMBLE network." The laptop will happily recognize a "good" network, and will establish the connection. All the ensuing traffic will flow through the attacker's computer. At this point, the attacker can play all kinds of games: spying on the traffic, redirecting queries to chosen sites, probing the laptop for open ports, etc. This is a very powerful attack.

The bad news is that, until recently, the Wi-Fi stack in Windows XP did not distinguish between hidden and public networks. If it heard an "empty beacon", it would automatically send probes for all its preferred networks, effectively empowering the attackers. We changed that in Vista. In Vista, the Wi-Fi stack will only send probes for the networks that have been explicitly programmed as "hidden". If there are no hidden networks at all, the stack will not send any probes, and the attack will be thwarted. Recently, we also back-ported the changes to XP; they are available through Windows Update.
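The difference between the two probing policies can be sketched as follows. This is a simplified Python model, not the actual Windows code: the `PreferredNetwork` class, the function names, and the network names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferredNetwork:
    ssid: str
    hidden: bool  # True if the user marked the network as not broadcasting


def probes_old_policy(preferred):
    """Older behavior as described above: on hearing an empty beacon,
    probe for every preferred network."""
    return [net.ssid for net in preferred]


def probes_hidden_only_policy(preferred):
    """Newer behavior as described above: probe only for networks
    explicitly configured as hidden."""
    return [net.ssid for net in preferred if net.hidden]


# Made-up profile list for illustration.
profiles = [
    PreferredNetwork("HOME", hidden=False),
    PreferredNetwork("COFFEE-SHOP", hidden=False),
    PreferredNetwork("CORP-HIDDEN", hidden=True),
]
```

With these profiles, the old policy announces all three names to anyone sending an empty beacon, while the hidden-only policy announces just "CORP-HIDDEN", and nothing at all once no profile is marked hidden.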

But, of course, if there is at least one hidden network, the stack will still send the probe messages. The alternative would be to require an explicit "manual" connection to any hidden network. Customers would not be happy. So, the only way to be safe is to not require these customers to connect to a hidden network. So, now, you are warned. You should not program your network to hide its SSID. It does not improve the network's security, since attackers can deduce the SSID value from client traffic. And it does expose clients to automatic attacks when they travel to public places.

October 09

Replacing ICANN by a P2P system?

The ICANN debate has been going on for at least 10 years now, if we count the initial discussions on alternative top-level domains. It has recently escalated into a full-fledged diplomatic show involving the US, the EU, and the UN.
 
We should really think of the consequences of our designs. At the root of the debate is a design decision made 20 years ago, when the Internet moved to the domain name system. The question was how to safely resolve names such as "mail.example.com" (there was no www back then). The accepted technology at the time was to have a big file with all the names and all the corresponding IP addresses. The design decision was to replace that with a hierarchical system. Whoever owns "example.com" would maintain a file with the names of the various hosts in the "example.com" domain, the managers of ".com" would maintain a file with all the registered names, and a root service would maintain the list of all top level domains. Simple, and in fact very practical as long as you trust the root service.
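A toy Python model shows why everything hinges on the root. It is a deliberately simplified sketch, not how real DNS servers or resolvers work: the tables, registry labels, and addresses (taken from documentation address ranges) are all invented for the illustration.

```python
# Toy delegation tables; every label and address here is made up.
ROOT = {"com": "com-registry"}
ZONES = {
    "com-registry": {"example.com": "example-hosts"},
    "example-hosts": {"mail.example.com": "192.0.2.25"},
    # A second, rogue hierarchy that a different root could point to.
    "rogue-registry": {"example.com": "rogue-hosts"},
    "rogue-hosts": {"mail.example.com": "198.51.100.66"},
}


def resolve(name, root=ROOT, zones=ZONES):
    """Follow the delegation chain: root -> top level domain -> domain -> host."""
    labels = name.split(".")
    top_level = labels[-1]            # e.g. "com"
    domain = ".".join(labels[-2:])    # e.g. "example.com"
    registry = root[top_level]        # the root names the TLD registry
    host_file = zones[registry][domain]  # the registry names the domain's file
    return zones[host_file][name]     # the domain's file holds the address


trusted_answer = resolve("mail.example.com")
rogue_answer = resolve("mail.example.com", root={"com": "rogue-registry"})
```

The same name resolves to a different address as soon as the root table is swapped, which is the sense in which the whole scheme is only as trustworthy as the root service.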
 
No need to rewrite history, but we know better now. Centralized systems attract all kinds of people, since controlling the center allows for power and revenues. ICANN may try to be a virtuous and neutral operator, but there will always be suspicions, not to mention power grabs. Rather than fighting politics with more politics, engineers should focus on technology, and we do have an available technology with P2P systems.
 
In a P2P system, you do not ask some central server to publish www.example.com, you just let www.example.com publish its own name. The systems use technologies like distributed hash tables to resolve names across networks. (An example of such systems is PNRP, which shipped in Windows XP/SP2.) No central servers, no point of control, no place for politicians to mess up the Internet. We should go for it immediately, shouldn't we?

Well, there is a little problem of security. How can we stop the wrong guys from pretending to be "example.com" as well? So far, there are few solutions.
 
One way to ensure "safe peer-to-peer naming" is to publish names that are self-verifying, e.g. hashes of the public key of the publisher. After resolving the name, it is easy to verify that the other end is the right one. The problem is that, instead of names like "example.com", you get names like "12AE-B456-CD78-9F03". There are applications where that works, but they clearly belong to the category of "finding again someone you already know".
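Here is a minimal Python sketch of such self-verifying names, under assumptions of my own: SHA-256 as the hash, a name truncated to four groups of four hexadecimal digits to match the look of the example above, and raw byte strings standing in for public keys. A real system would use a longer name and would also make the peer prove possession of the matching private key, which is not shown here.

```python
import hashlib


def self_verifying_name(public_key: bytes, groups: int = 4) -> str:
    """Derive a name from a hash of the publisher's public key.

    Truncating to `groups` * 4 hex digits is for readability only;
    it is far too short for real security.
    """
    digest = hashlib.sha256(public_key).hexdigest().upper()
    return "-".join(digest[i:i + 4] for i in range(0, groups * 4, 4))


def key_matches_name(name: str, presented_key: bytes) -> bool:
    """After resolution, check that the key offered by the other end
    hashes back to the name that was looked up."""
    groups = name.count("-") + 1
    return self_verifying_name(presented_key, groups) == name


# Placeholder byte strings standing in for real public keys.
alice_name = self_verifying_name(b"alice-public-key-bytes")
```

Anyone holding the name can run `key_matches_name(alice_name, key)` on the key presented by the peer, with no registry or authority involved. The price is exactly the one noted above: the name is a string of hex digits rather than something a person would remember.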
 
Another way is to publish something like "example.com", and to use some kind of X.509 certificate to verify the address after resolution. The problem there is that one needs to rely on a small set of "well known certification authorities" to sign the certificate. So, one essentially moves the problem of name ownership from registration in a top-level-domain database to registration in a certificate authority's database. If one wants differentiated controls, e.g. different authorities for ".com" and ".fr", then one needs to publish the equivalent of a root file, the list of certification authorities that are associated with various top-level domains.
 
I personally believe that a peer-to-peer system would be better than the current hierarchical design. It is potentially more robust, although teething problems are likely to be interesting. It cannot entirely do away with hierarchies and authorities if we want both "friendly names" and "security". But it does allow for decentralization, and it prevents any kind of "censorship at the root".
 
-- Christian Huitema
 
 