Vint Cerf - The Internet Today

[♪classical music♪] Singularity University - Preparing Humanity for Accelerating Technological Change

I wanted to give you an idea of what the internet-- a little bit about where it came from, but you all know most of that.

The real question is what is it doing today and where is it headed, and what are all the problems that are not solved? There are a whole bunch of areas where the internet didn't evolve, didn't develop, isn't satisfactory, things like that. And so let me start out-- We're just going to back up here. This is what the predecessor to the internet looked like in December of 1969, almost 40 years ago. There were only four nodes of the net. It was an experimental packet switching system. I was a graduate student at UCLA, and I wrote the software that connected a Sigma 7 computer to the first packet switch of the ARPANET. That computer is in a museum, and some people think I should be there along with it. [laughter] But this was a very successful demonstration and then it expanded fairly quickly.

By 1972 there were perhaps 35 or so nodes of the ARPANET scattered around the United States, and some were in planning for outside the U.S., particularly in Europe. By the time 1977 rolls around, I have moved from Stanford University where Bob Kahn-- Bob was at DARPA, I was at Stanford.

We did our original work in 1973. I came to DARPA in 1976 and ran the internet research program and by late 1977 I thought it would be very important to show that this new TCP/IP technology would actually allow multiple packet switch nets to be interconnected in a uniform way. We had done pairwise connections of these packet switch nets but not all three of them all connected at once. So on November 22 we took a mobile packet radio network, which is here in the San Francisco Bay area with repeaters up on the mountaintops and repeaters in vans and so on in other locations that were essentially radiating packets between 100 and 400 kilobits a second at-- Let's see. It was 1710 to 1850 megahertz up in the L band. We were radiating packets from this packet radio van and then pushing them through a gateway that went into the now expanded ARPANET. The ARPANET had expanded not only across the United States but also had an internal satellite hop, a synchronous satellite hop, to Norway and then by land line down to University College London. Then we had, in addition to that, a packet satellite network which had multiple ground stations which were sharing a single radio frequency on the satellite--Intelsat 4A-- so you radiate packets up and they would be re-radiated down on a different frequency and all the ground stations would pick them up.

So the idea was to transmit data from the SRI--Stanford Research Institute--packet radio van down to USC Information Sciences Institute in Marina del Rey about 400 miles to the south.

But the actual path the packets took was through the gateway all the way across the ARPANET through an internal satellite hop down to Norway and then to University College London, out of the ARPANET through another gateway into the packet satellite net, then up through another synchronous satellite hop through the Intelsat 4A satellite down to Etam, West Virginia, through another gateway back into the ARPANET and then all the way across the United States and down to USC. So the path of the packets was about 100,000 miles, even though the two endpoints were only 400 miles away, and it actually worked. And I remember leaping around saying, "It works! It works!" as if it couldn't possibly have worked. Anyone who understands software will appreciate that any time software works it's a miracle. So I actually drew this picture and had it framed and sent it to George Heilmeier, who was, at the time, the head of DARPA, and said, "Dear George, I don't think we wasted our money on this."

So if you fast-forward another 22 years, you see what the internet has become in 1999. This is actually an automatically generated image of the connectivity of the internet. A man named Bill Cheswick looked at the global routing tables of the internet that year, and each different color represents a different autonomous system, which is typically a network operated by a distinct network operator. So you get a sense that there were a lot of different operators.

Some of the smaller ones are out here on the periphery; some of the central, very large internet service providers with international connections were more towards the center. And if you looked at the 2009 version of this, it would look very similar. It would be bigger, it might be more colorful, it might be more connected, but it's basically just a very, very large and gigantic collaboration of internet service providers. So there is no central authority; no one is forced to join. The reason they connect is that it's a highly useful system if everyone is connected to it. Although Metcalfe's law says that the value of the network grows by the square of the number of people who are connected to it-- there is some argument about that, but it's at least n log n, which means connecting to systems like this is beneficial, increasingly beneficial, as the system becomes larger and has more participants in it. If we look at some other statistics, what we're seeing here is the number of machines that are visible on the network. There are--on the public internet you can see over 600 million machines. That's as of January this year. The number is undoubtedly bigger at this point--at midyear. That doesn't count all the machines that are on the net, however, because not all the machines are public or visible. As time has gone on and as enterprises have connected to the net and as governments have connected to the net, they've decided that they don't want all of their assets to be visible publicly, and so firewalls are in place to inhibit the visibility of many of the machines that are online, even though they can be reached if you go through the firewalls and authenticate yourself appropriately. Google, for example, has a large number of machines on the net, but only a small fraction of them are visible to you publicly with domain names and IP addresses. The others are on the other side of application layer gateways or firewalls.
The number of users is over one and a half billion, and interestingly enough, another phenomenon in the course of all this growth has been the rapid escalation of the number of mobiles that are in use. In fact, 3.5 billion is low; it's probably more like 4 billion now with possibly as many as a billion mobiles going into use every year. Not all of them are new users; many of them are replacements, but maybe half of them could be argued to be new. Those 3.5 to 4 billion mobiles--some of them are internet-enabled, and so for a company like Google, it's actually quite important because for many people in the world the first exposure that they have to the internet is going to be through a mobile, and for some of them the only exposure they will have. So if you're trying to present your products and services to these people, you have to take into account the varying data rates with which the mobiles can get access to the net, the limited display area that may be available, the limited kinds of keyboard or other touch screen interfaces. So it's actually quite a challenge to configure applications that will work through a large gamut of small display screen mobiles and then large format display desktops and laptops. So anyone who is interested in serving this disparate population has to take that into account.

This tells you a little bit about where the users are, and another very important conclusion can be reached from looking at this, and that is that the Asian population will be the dominant user population on the internet. There's no doubt about that at all anymore. Almost 660 million people in Asia--250 million in China as of some time at the beginning of this year-- are online on the internet, and that's with only 17% penetration of the Asian population. So as time goes on and more and more of the Asian population becomes online, the percentage goes up or the absolute numbers of users goes up dramatically into the 2.5 to 3 billion range. The others are as you see it. Europe is the next largest today. One thing that's rather peculiar is that Europe keeps adding countries, and so any prediction about Europe becomes difficult because you don't know what constitutes Europe anymore. North America is now number three on the list at 250 million. It will not get very much bigger. The populations of the U.S. and Canada are not growing dramatically, and so under no circumstance will the North Americans be the dominant force anymore on the internet, even though the internet started there and the heavy penetration, as you can see, is 74%. So what you see in the U.S. may be indicative of what it's like to have a large fraction of the population online but by no means will the North Americans dominate the content of the network or the application space over the long-term. It will just be part of the picture.

This is a chart which is important to motivate only one thing. The only line in here that's important is the one that's going down. That's not the Dow Jones average and it's not the GDP or anything; it's the number of internet addresses that are still left that can be allocated by the regional internet registries who allocate IP address space to the internet service providers. That address space is going to be exhausted sometime around the middle of 2010 or 2011. There are 4.3 billion unique addresses available in IP version 4, which is what you're running. That choice of 32 bits was made in 1977 when the project was only 4 years old, and a one-year debate had gone on about how large the address space should be. I was in charge of the project at DARPA at the time, and this one-year debate didn't resolve itself, and I said, "Well, we have to move on. We've got to get something done here." And that big demonstration was on my mind, so I said, "Look, you can't make up your minds." Some people wanted variable length addresses, and the programmers said that was a terrible idea because you'd waste cycles trying to find the fields in the variable length packets. So they didn't like that and we discarded that. The only other two options that were on the table were 32 bits and 128 bits, and at the time--remember, this is an experiment-- I had trouble believing there was any rational argument that would require you to have 128 bits of address space to do an experiment. So I said, "Okay. It's 32 bits, it's 4.3 billion addresses. It's enough to do an experiment. And if it works, then we will do the production version." Well, that was a good idea except the experiment didn't end, and so here we are in 2009; we're running out of address space. The solution to that problem is, in fact, to shift to IP version 6, which does have 128 bits of address space. It has 3.4 times 10 to the 38th addresses, a number that only the American Congress can appreciate.
[laughter] What I frankly hope is that this new address space will be adopted and implemented in parallel with IP version 4. I have to tell you it's been a very slow process. IPv6 was standardized around 1996, and very, very few people implemented it, and even now it's been a very, very slow process. Google spent the last year and a half implementing IP version 6 across all of its services, and we're not completely done, but almost everything is accessible through IPv6.
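
The address-space figures quoted above follow directly from the bit widths. As a minimal sketch (the variable names are illustrative, not from the talk), the arithmetic can be checked with Python's standard `ipaddress` module:

```python
import ipaddress

# Bit widths of the two address formats discussed in the talk.
IPV4_BITS = 32
IPV6_BITS = 128

# Number of distinct addresses each width can express.
ipv4_addresses = 2 ** IPV4_BITS
ipv6_addresses = 2 ** IPV6_BITS

# Cross-check against the standard library: the /0 network covers every address.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_addresses
assert ipaddress.ip_network("::/0").num_addresses == ipv6_addresses

print(f"IPv4: {ipv4_addresses:,} addresses")    # roughly 4.3 billion
print(f"IPv6: {ipv6_addresses:.1e} addresses")  # roughly 3.4 times 10 to the 38th
```

Not every one of these addresses is assignable to a host in practice, since some ranges are reserved, so the totals are upper bounds.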

The problem is that unlike IP version 4, which started out with a connected core-- first the ARPANET and the National Science Foundation that were together forming the U.S. backbones plus NASA Science Internet and the Department of Energy ESnet-- all four of those were major core networks in the United States-- they were all interconnected with each other, they were all using IP version 4, so when you connected to any of them using IP version 4, you were connected with everything else. IP version 6 is not coming up that way. It's coming up in little spots around the net, and so we have a spotty and possibly disconnected collection of IP version 6 implementations. You can tunnel between the networks, between the IPv6 nets using the IPv4 backbones, but it's a very brittle way of implementing. So I am pushing very hard to try to get as many people as possible to get the IPv6 implementations done and to adopt the liberal interconnection policy in order to have a fully connected IPv6 network. We think--I think anyway--that as the deadline approaches where the v4 address space is going to run out, more and more people will pay attention to the problem.

The trouble is that they will do it in a crisis, and I think engineering in a crisis is a very bad idea, so I continue to preach sermons about introducing IPv6 in parallel with IPv4 as soon as possible. I want to come back to the mobiles for just a second because they represent a very interesting feature to be introduced into the network that wasn't really there before. It has the property that you carry your information window around with you on your hip or in your purse, and it's clearly not just a phone. These are all programmable devices. They're all just sort of waiting for more downloads of software to run new applications. We're very excited about these at Google. In fact, our excitement extended to the point where we actually implemented a new operating system called Android, and we made it available free of charge in source code form. [applause] The whole idea was to put a platform out there that people could treat-- First of all, we hope it's more secure than some of the others are, but second, it's intended to allow people to download new applications and to freely run them. So that's all to the good. I have to say, though, that if you use your mobile to do anything, you're confronted with all of its various constraints and limitations. One thing that has occurred to me is that when you walk into a room like this, your mobile might become aware of devices like this high res projection unit, or you walk into a hotel room and there might be a Web TV keyboard or something. It would be kind of nice if there were standards so the mobile detected these devices and recognized there were other IO things available besides the mobile itself. And so those are the sorts of things that we might look forward to in time. The other thing which we can see at Google is that when people use their mobiles to interrogate the net, to do various searches, they frequently ask questions related to where they are. 
So this means that geographically indexed information is becoming increasingly valuable. And so people who go to the trouble associating geographical location with information can monetize that. I sort of understood this intellectually, but I didn't really viscerally appreciate it until my family went on a holiday in Page, Arizona.

There's a big lake called Lake Powell, and we decided to go and rent a houseboat and go out on the lake. So while we were driving into this small town of Page, Arizona, someone pointed out that there weren't any grocery stores on the lake and that we had to buy all of our food before we launched the boat or we wouldn't have anything to eat. So that turned the conversation to meals and what did people want to cook. Somebody said, "Why don't we make paella while we're on the boat." And I thought that was a really good idea, but I didn't know where I could find saffron. Where could I find saffron in Page, Arizona?

So I got my BlackBerry out, and I was getting a good data connection, so I went to the Google homepage and I typed, "Saffron; Page, Arizona; grocery store." And I got back three responses with telephone numbers and little maps showing how to get to the store. I clicked on one of the phone numbers, the phone rang, the voice answers, and I said, "May I speak to the spice department, please?" Now remember, this is a small store; it's probably the owner of the store, and he says, "This is the spice department." I said, "Do you have any saffron?" And he says, "I don't know, but I'll go check." And he runs off and he comes back and he says, "Yes." So I follow the map and I run into the store, and I'm buying $12.99 worth of saffron. And as I was walking out of the store, I realized what had just happened. In real time, as I needed it, I got exactly the information required to get that 0.06 ounces of saffron. I didn't get the answer, "It's 1,500 miles away in New York City." And so it made me a real believer in the utility and value of having good geographically indexed information which allows you to localize the kind of searching that you're doing. Well, as time goes on, I have seen a variety of things--appliances-- connected to the internet that I certainly never expected to see in 1973: things like refrigerators or picture frames or things that look like telephones but they're actually voice over IP computers, but I like the one in the middle. This is a guy in the Netherlands who built an internet-enabled surfboard. I don't know him, but I have this image in my head of the guy sitting on the water while he's waiting for the next wave, thinking, 'If I had a laptop in my surfboard, I could be surfing the internet while I'm waiting to surf.' [laughter] So he actually built a laptop into the surfboard, put a Wi-Fi server back on the rescue shack, and now he sells this as a product.
So if you're interested in surfing the internet while you're out doing the other kind of surfing, that's the product for you. So my belief right now is that we are going to see many billions of devices on the net, probably more devices than there are people. Some of them we already see today like Web TVs or personal digital assistants or computer games that people connect to the high speed internet and they talk to each other while they're shooting at each other. But there are some other little surprises. The refrigerator does exist. It's high end and it has a nice liquid crystal display, and you can go surfing the net while you're in the kitchen. You could use it to augment your family communication system, sending blogs to each other or e-mails or instant messages or flaming whatever. But I had this other thought. What would happen if everything you put inside the refrigerator had an RFID chip in it so the refrigerator, if it could sense that, would know what it has inside. So while you're off working it's surfing the net trying to find recipes that it could make with what it knows it has inside, and then when you get home, you'd see a set of choices for dinner coming up. You could extrapolate that. You could be on vacation and you might get an e-mail and it's from your refrigerator, and it says, "I don't know how much milk is left because you put it in there three weeks ago," "and it's going to crawl out on its own." [laughter] Or maybe you're shopping and your mobile goes off; it's your refrigerator again: "Don't forget the marinara sauce. I have everything else I need for spaghetti dinner tonight."

The bad part about these kinds of scenarios-- our Japanese friends have invented an internet-enabled bathroom scale. You step on the scale and it figures out which family member you are based on your weight and sends that information to the doctor to become part of your medical record. And on the surface I think that's okay. The trouble is that the refrigerator is on the same network, [laughter] and so if it gets the same information, it may just put up diet recipes on the display or maybe just refuse to open. [laughter] There are a lot of other things that I don't have time to really go into in detail, but I did want to mention another class of system which is increasingly common on the net and that's sensor networks, and I want to give you an example of one of them. I'm running an IPv6 wireless network in my house. It's used to instrument every room and every five minutes it collects temperature and humidity information and light levels, and that gets recorded every five minutes for every room in the house. My motivation for installing this-- It's a commercial product; this one happens to come from a company called Arch Rock in the Bay area. There's another company called Crossbow which makes similar kinds of equipment. So these are becoming commercially available. This is not a science project in the garage or anything; this is real commercial quality stuff. Part of my motivation for putting this thing together was to have a year's worth of data about the heating and cooling in the house so that at the end of the year I could go over this with a building engineer and talk about whether the heating and cooling system had done its job well and did I get reasonably uniform cooling or reasonably uniform heating or not, and if not, then what changes should be made to the way the system was working. But there is one room in the house which is particularly sensitive. It's the wine cellar. 
This needs to be kept below 60 degrees Fahrenheit and preferably 60% humidity so the corks don't dry out. So that one has been alarmed, and if the temperature goes above 60 degrees, I get an SMS on my mobile saying that we've breached the 60-degree mark. This actually happened to me when I was visiting Argonne National Laboratory last year. Just as I was walking into the building for a 3-day stay, my mobile went off; it was the wine cellar calling: "Your wine has just heated up beyond 60 degrees." And every five minutes for the next three days I got this little warning saying, "Your wine is warming up." Unfortunately, my wife was away on a holiday and so she couldn't go and re-power the cooling system. By the time I did get home, the wine had gotten up to 70 degrees Fahrenheit. It's not the end of the world, but it wasn't good news, either. So I called the Arch Rock guys, and I said, "Hey, do you guys make actuators as well as sensors?" And the answer was yes, that one of the projects is to install the actuator so I can remotely turn the cooling system back on again while I'm away. Now, that obviously has to be access controlled because I don't want the 15-year-old next door to turn my cooling system off while I'm away. I don't need that kind of help. So I thought there's other information that I can get from observing the data that's being collected from the wine cellar. For example, if somebody goes in and turns the lights on, I can actually detect that because the changing light level should be signaled. So I thought maybe that would warn me if someone had gotten in there that I hadn't anticipated. But it doesn't tell me anything about whether any wine had left the wine cellar without my permission. So the next step is to put RFID chips on all of the wine bottles and that way I'll be able to do instant inventory to determine whether the wine is still where it should be.
Of course, someone pointed out to me that you could go into the wine cellar and drink a bottle and then leave it there, and so you wouldn't be able to detect that it had been-- So now I need to put sensors in the corks to determine whether there's any wine left. And as long as I'm going to that much trouble, I should put some kind of analysis in there so that I can look at the esters that are developing in the wine to determine whether or not the wine is ready to drink. And then before you open the wine bottle you interrogate the cork, and if it turns out that's the bottle that got to 90 degrees because the cooling system went off, that's the bottle you give to somebody who doesn't know the difference. [laughter]
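
The wine cellar alarm described above amounts to a threshold check on periodic sensor readings. The following is only a rough sketch of that idea, not the Arch Rock product's behavior: the `Reading` type, the `alerts_for` function, and the sample values are all hypothetical, and only the 60-degree limit and the five-minute interval come from the talk.

```python
from dataclasses import dataclass

THRESHOLD_F = 60.0  # temperature limit quoted in the talk


@dataclass
class Reading:
    """One sensor sample (hypothetical structure for illustration)."""
    minute: int            # minutes since monitoring started
    temperature_f: float   # degrees Fahrenheit


def alerts_for(readings, threshold_f=THRESHOLD_F):
    """Return one alert message for every reading above the threshold."""
    return [
        f"t+{r.minute} min: wine cellar at {r.temperature_f:.1f} F, "
        f"above {threshold_f:.0f} F"
        for r in readings
        if r.temperature_f > threshold_f
    ]


# Made-up samples at five-minute intervals, as in the talk.
readings = [Reading(0, 58.5), Reading(5, 59.9), Reading(10, 60.4), Reading(15, 61.2)]
for message in alerts_for(readings):
    print(message)
```

Because this sketch alerts on every reading above the limit, it would reproduce the stream of warnings every five minutes; a real system might suppress repeats until the temperature recovers.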

It's a very helpful mechanism. Let me just say that the internet, even though its design was done in 1973, its initial roll-out was in 1983, 26 years ago. This year is probably one of the more momentous in terms of changes that are going on. The introduction of IPv6, the introduction of digitally signed entries in the domain name system in order to reduce certain kinds of spoofing, phishing attacks, and things like that or poisoning of the resolver caches is underway in a number of top level domains including--this is out of date now--dot gov in the U.S. is now being digitally signed as is dot mil. Then we're also introducing non Latin characters into the domain name system, and that's turned out to be much more complex and a harder thing to do than you might expect. But it's overdue because many people use languages that are not expressible in Latin characters. So if you speak Russian, you would use Cyrillic; if you speak Arabic or Urdu, you would use Arabic characters or Chinese or Korean and so on. So that's happening this year as well. The modification of the specification for the domain name system

to allow these other scripts to be used is a very big deal, and the consequence of that is that new country code top level domains will have to be introduced because if you happen to be in a country like India, which has 22 official languages, you want to be able to express the idea of India not as simply dot I-N in Latin characters but also Malayalam and many of the other scripts that are used in that country. So we're going to see a growth in top level domains of the internet simply because of the introduction of these new character sets, and there's also some demand simply to expand the number of generic top level domains that exist, things like today you have dot coop and dot travel and dot gov and dot org; people are going to be asking for other top level domains in the generic world as well. So this is quite a big year for the internet. Something else which has become common in the vocabulary is cloud communications or cloud computing. Google is a very big proponent of assembling large numbers of computers into big data centers and then sharing data across all the data centers for resiliency and redundancy, also allowing variable assignment of the computers to various computational problems. So it's a very efficient way of dynamically allocating computing capacity to a variety of different computational tasks. An interesting question for me right now is that as cloud computing systems are developing at Amazon, at Microsoft, IBM, at Google, at other enterprises, one of the questions that comes up in my mind is how you move data back and forth between those clouds of computers. Clearly, something is already possible because you can get to the cloud through the internet. But there's no formalism right now, so if I have data sitting in the Google cloud and for some reason it needs to be moved to the Amazon cloud or the IBM cloud or the Microsoft cloud, there isn't any vocabulary for that. 
You don't even know how to say, "Move it to this cloud," because you don't know how to express the term cloud; there's no vocabulary. This is not too different from the world of networking in 1973 when Bob Kahn and I began the work of TCP/IP because the networks at the time didn't have a way of referring to another net; there was no vocabulary for that, and the internet protocol was the way of saying, "Send this to a net that isn't you." And now we may need to have a way of saying, "Send this to a cloud that isn't you." By the way, if the data in the cloud is protected with access control, how do I send knowledge of that access control regime to the other cloud? What metadata do I have to send? How should I express it? How should I express the permissions that the parties who have access have to manipulate the data, to look at the data and the like, or share the data? So this is an open area for research. There are little bits and pieces of exploration of this idea happening today, but no coherent standards have emerged yet. Just to emphasize the fact that even though the internet's been around for quite a long time, operationally for 26 years now, there are a lot of things that are not satisfactory. For one thing it's not secure enough and it's not a question of simply encrypting data between the hosts on the net. There needs to be security introduced at all layers in the layered architecture. The classic example is that you could encrypt e-mail going from a source to the destination, and imagine that there's a virus attached to the e-mail. Well, you encrypt everything, it goes through this cast iron pipe through the net, nobody can see the virus, it gets decrypted at the other end and then, of course, does its damage on the other side. So cryptography does not solve that particular security problem. It may solve other problems like exposing higher level protocols to view, which abusers can use to mount certain kinds of attacks against the TCP layer, for example.

If you encrypt that end to end, you can't mount that particular attack. But there are all kinds of authentication issues that have not been addressed. We don't know, for example, statistically how the internet behaves. We have lots of measurements, but it's not like the phone system. Mr. Erlang examined the phone system 100 years ago and said the average telephone call is about three minutes and it has a nice bell shaped distribution curve, but the internet doesn't behave that way because people keep inventing new applications that have different statistics and so there isn't any such thing as the average statistical behavior of a task or an application on the net. And as the network's capacity increases, people come up with new applications that make very different kinds of demands. It's particularly weird if you're using the World Wide Web and you're clicking around by pointing at things, small amounts of data are going back and forth, and all of a sudden you click on something and you get a 100 megabit file transfer taking place. And so you have this wild variation at the edges of the net in terms of the capacity that's required. I'm not going to go through all of these, but a couple I wanted to highlight especially. We've done a very bad job of dealing with mobility in the internet. It's ironic because one of the first nets that I mentioned was this packet radio net which, in fact, allowed mobility but only within that network, and it didn't take into account the idea of moving from one place in the internet to another, maintaining a connection as you go from one internet access point to another because today that changes your IP address, and changing an IP address destroys the TCP connection. I just didn't recognize that at the time as a serious problem, partly because computers of the day filled up several rooms, cost several million dollars, and you didn't put them in your pocket. And so that scenario just didn't filter in. 
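
The mobility problem described above comes from the fact that a TCP connection is identified by the addresses and ports at both ends. The toy model below is a simplified illustration, not how any real TCP stack is implemented: the `ConnectionKey` type, the table, and the addresses (taken from documentation-only ranges) are assumptions made for the example.

```python
from typing import NamedTuple


class ConnectionKey(NamedTuple):
    """The four values that together identify one TCP connection."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


# Toy server-side table of established connections.
connections = {}

server = ("203.0.113.10", 443)
client_before = ("192.0.2.25", 50000)     # client at its first access point
client_after = ("198.51.100.77", 50000)   # same client after moving

# The connection is set up while the client has its first address.
key_before = ConnectionKey(*server, *client_before)
connections[key_before] = "ESTABLISHED"

# After the move, packets arrive from a new address, so the lookup key differs.
key_after = ConnectionKey(*server, *client_after)

print(connections.get(key_before))
print(connections.get(key_after, "no matching connection"))
```

Only the client's IP address changed, yet the server finds no entry for the new key, which is why the existing connection cannot simply continue.
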
But today it's a very different story, and things do move around and they do need to be accounted for. Multihoming is another example; people that connect to more than one internet service provider for redundancy end up getting two IP addresses, and that creates a real problem because you can't meld together traffic coming from multiple sources independent of the path and have them all be the same IP address; they actually look like distinct paths and that's not good. We don't use broadcast very well. We take a broadcast radio channel like the 802.11 Wi-Fi and we turn it into a point to point link. We don't actually recognize in the protocol sense that when you radiate over the radio, multiple sites may actually hear the same data. Or if you're radiating from a satellite, if you have the right protocols, you could take advantage of the fact that hundreds of thousands or millions of receivers get the same information. We just don't have protocols that do that, even though the underlying communication system is capable of delivering that service. So it's an area where we could do a lot better. So there are lots of other things that I don't want to take too much time on. I briefly mentioned authentication. This is something we do very badly in the net today. We don't have a strong way of assuring you that you've gotten to the correct host on the net. Even if you do a look-up in the domain name system and even if that's digitally signed, you still might want to have a 3-way handshake that confirms which machine you're actually talking to before you carry out a remaining transaction. There are certain instances at higher layers of protocol like e-mail where you're not quite sure where did the e-mail come from. It would be nice if you have a strong way of authenticating the party with whom you are transacting. And so introducing strong authentication at various places in the network architecture would allow us to do things that today are not very feasible. Gosh. 
There's all these other things that I don't want to spend too much time on. I'm going to come back to delay and disruption tolerance in a little while. I've already mentioned mobile operation; in fact, I'm going to skip over this slide because I'm going to get back to something else in a minute anyway. I'll come back to those topics. One thing that I want to strongly assert is that the notion of copyright as we think of it today is seriously impaired by the way the internet works or the way the World Wide Web works. Copyright says you can't copy this unless you have permission from the copyright holder, and that works well for material which is fixed in a medium; for example, paper or a DVD or a tape of some kind. But in the internet world, especially the World Wide Web, it's all about copying. If you think about the way a browser works, the browser goes out to a host and it copies a file and then it interprets it. So the internet is this giant copying engine and somehow that comes face-to-face with the notion of copyright, collides with it. So we don't have good models right now for adapting the rules of copyright to this environment where it's easy to duplicate and distribute digital content. And somehow we have to find a way to cope with it because people still want to be compensated--some of them-- for their intellectual property. Others may not and in fact, the avalanche of information that has come into the World Wide Web since the web was publicly introduced around 1994 tells you that a lot of people just want to share their information. That's why YouTube, for example, is now getting 20 hours of video per minute uploaded into our systems at Google. I have no idea who's watching all that stuff. It just amazes me that there would be that much new material being pumped into the system. It's like blogs. There's 200 million blogs in the world, and I wonder what the average readership is. It's probably 1.1--the guy that wrote it plus his dog. 
[laughter] So we have this incredible willingness and desire to pump information into the net for public sharing. Some of the parties would like to be compensated; other people simply want to know that their information was useful. And in some cases it's a very smart move to make the information freely available. Standards, for example, that people have free access to, source code that you want to make available because it's an open source environment, all contribute to the idea of people wanting to share their information without specific compensation. So we have this very broad range of motivations for sharing information, and somehow we have to find a way to accommodate all of them in some reasonable, legally enforceable way.
Tim Berners-Lee talks a lot about the semantic web, and I didn't fully understand what it was he was getting at until I managed to corral him earlier this year in Madrid at the World Wide Web Conference and I asked him, "What is this all about?" And as I understood it, Tim was talking about what he called data linking. What he's basically--the way I'm interpreting this-- he's saying that we have a lot of information in the internet that's contained in databases that are not made visible and discoverable through web crawling because the web crawlers mostly see things as HTML or XML and they index that. Certainly that's what we do at Google. But we don't dive down into the databases; in fact, it's very hard to do that because a lot of the databases' content only becomes visible when you initiate an SQL query, for example. I can't imagine trying to have a web crawler which finds the database and then sits there for days at a time asking every possible question and it doesn't even know what question to ask because it doesn't know what the contents of the database are. So we need some way of making the contents of the databases more visible. We also have to have some way of interpreting the data so that if we get some of it, we know what it means and what to do with it, how to process it, how to manipulate it, how to share it, how to join information from multiple databases. My understanding is that what Tim is trying to do is make the dark content of the internet visible and usable.
Every single day when you sit down and use your laptop or your desktop and you create documents, you create videos, you create complex simulations, you do a lot of things with software, the outcome of which is very complex files, and those complex files don't mean anything in the absence of the software that knows how to interpret them. So imagine for a moment the extreme situation.
It's the year 3000, you've just done a Google search, and let's even assume you're using Windows 3000-- I'm sorry. Kevin is looking very unhappy about that. This is a hypothetical, Kevin. You're using Windows 3000 and you just turned up a 1997 PowerPoint file, so now the question is does Windows 3000 know how to interpret that file? And the answer is probably not. But that's not an unfair dig at Microsoft. Even if it was an open source software thing, it's not clear that the stuff would survive and be maintained for over 1,000 years. So the question is how on earth do we continue to make the data that we're accumulating interpretable and meaningful over long periods of time? It's truly embarrassing to go to--let's say--one of these special libraries that have many manuscripts in them--vellum manuscripts from 1,000 A.D.-- that are still very readable; they're beautiful, in fact, and may be illuminated manuscripts. They've lasted 1,000 years and then you walk in there with your little DVD, and the librarian says, "And how long do you expect that to last?" And it's really two questions: How long will the medium survive? And then how long will you have a piece of equipment that can actually read it? And how long will you have software which can take the bits that have been read and interpret them successfully? Well, I've had many discussions with librarians on this point, and I remember sitting in a meeting where one young fellow got up and made the brash statement that this wasn't a problem and that the important information would be upgraded and rewritten in new applications so that it would survive. And the stuff that didn't get rewritten wasn't important and so nobody would care anyway. It took about a half an hour to get the librarians off the ceiling [laughter] because they pointed out--correctly in my view-- that sometimes you don't know what information is important for a hundred years or more. 
And at NASA in particular, I want to say that this can be very, very important for scientific reasons. Reanalyzing earlier data in the light of new understanding, new models, and new theories can be very, very powerful. The problem now is not only interpreting the bits and what values they meant, but also under what conditions were these data collected? What metadata needs to be present along with the actual measured material in order to make it possible to continue to understand and use and reanalyze the information? These are hard problems and they deserve some serious attention. If we don't pay attention to them, we end up with what I've been calling a bit rot problem where all the data just eventually evaporates because it isn't useful. And it doesn't take very long for that to happen. It's already happening now. As a small example, if I produce PowerPoint slides on my Mac and then I take the file over to a PC running Vista, for example, some of the imagery which might have been captured as TIFF files isn't interpreted successfully on the Microsoft PC version. Literally, I hope you believe me that I'm not sitting here trying to skewer Microsoft at all. It's just an example of the difficulty of maintaining the interpretability of different formats over a reasonable period of time. So this is something we really need to attend to.
Let me finish by bringing up a status report on a project that was started in 1998 at the Jet Propulsion Laboratory and continues to be pursued in a larger context now with NASA and MITRE, and Kevin Fall is deeply involved in this activity from Berkeley Intel, among others. It started out as kind of a hypothetical question: If we are, in fact, going to continue to go back to visit Mars-- remember, our first U.S. landings came in 1976 with the Viking Landers and subsequently a number of other missions failed and then the two Rovers got there in 2004 and the Phoenix Lander landed last year in May-- we were thinking at JPL what would happen over longer periods of time as more complex missions were required where multiple spacecraft, multiple devices on the surface, mobile equipment, spacecraft flying in tandem, what kind of communication would be useful for them to have, something richer than a point to point radio link which is mostly what we've been doing.
We've been using this Deep Space Network since 1964 to reach various spacecraft, some of which are very, very far away and the radio signals are very faint. These are 70-meter dishes located in 3 places around the surface of the earth in Madrid, Spain; Canberra, Australia; and Goldstone, California, and there are some other smaller 34-meter dishes that are in the same complex. So these things are being used to communicate with spacecraft that may be in orbit around the planet or may be down on the surface of Mars. The Rovers, interestingly enough, the ones over here to your left, landed over four years ago, and they were originally intended to transmit data straight back to Earth through the Deep Space Network through some high gain antennas. When they turned the radios on, they overheated, and the consequence of that was they had to reduce the duty cycle to avoid the radios hurting themselves. They were only scheduled to transmit 28 kilobits a second, and so the scientists were very unhappy that the data rate would have to be effectively reduced now even more because of this problem of overheating. So the JPL team said, "Well, we have some orbiters around Mars that were used to essentially map the surface of Mars to help decide where the Rovers should go, but they still have radios on board, they still have processing on board, they still have power available from their solar panels." And the Rovers could be reprogrammed because there was another radio-- it was an X band radio--which didn't have the same ability to communicate all the way back to Earth but it could get up to the orbiters. And so they reprogrammed the orbiters and the Rovers to transmit data up where the orbiter would hold on to the data and then when it got to the right place in its orbit, it would transmit the data at 128 kilobits a second back through the Deep Space Net to Earth.
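The relay arrangement described here can be sketched in a few lines of Python (an illustration only, not part of the talk and not the actual flight software). The class name `RelayOrbiter`, its methods, and the `earth_in_view` flag are all hypothetical. The point is simply that the relay keeps the data while no path to Earth exists and forwards it once one does.

```python
from collections import deque


class RelayOrbiter:
    """Hypothetical store-and-forward relay between a rover and Earth."""

    def __init__(self) -> None:
        # Data held on board, oldest first.
        self.buffer = deque()

    def receive_from_rover(self, data: bytes) -> None:
        """Store: accept data from the surface and hold on to it."""
        self.buffer.append(data)

    def forward_to_earth(self, earth_in_view: bool) -> list:
        """Forward: send everything held, but only when a path exists."""
        if not earth_in_view:
            return []  # keep holding; nothing is dropped
        sent = list(self.buffer)
        self.buffer.clear()
        return sent


orbiter = RelayOrbiter()
orbiter.receive_from_rover(b"image-1")
orbiter.receive_from_rover(b"image-2")

print(orbiter.forward_to_earth(earth_in_view=False))  # []
print(orbiter.forward_to_earth(earth_in_view=True))   # [b'image-1', b'image-2']
```

The design choice to notice is that the sender never needs an end-to-end path at any single moment; each hop only needs a path to the next one.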
So this store and forward operation emulates the way the internet works. This was the direction that we were heading in when we started this idea back in 1998 before the Rovers were in operation and, in fact, that particular solution worked so well that when they landed the Phoenix Lander in May of last year on the north pole of Mars, they implemented this store and forward design because there wasn't any direct path to go from the Phoenix Lander back to Earth. So we had two very visible working examples of store and forward networking in lieu of point to point radio links. So that got very exciting for us and we thought the scientists would appreciate having alternatives to that.
Just a brief commercial, by the way: If you have not already gone to Google Earth and clicked on the Saturn icon and picked up Google Mars, you really should try it because it's a very exciting piece of work. I'm not going to show it here, but you can fly around Mars like you can fly around Earth, you can zoom down and get some really high res pictures and steer around in them as if you were on the Rover looking, and you can zoom in to see what's going on, so it's pretty exciting.
When we started this project of making a more elaborate internet design for space applications, we actually thought we could use the TCP/IP protocols. That idea didn't last very long. Part of the reason is that we realized that the speed of light was too slow and the distance between the planets was astronomical, literally. So between Earth and Mars the distance is anywhere from 35 million to 235 million miles--changing as a function of their location in the orbits-- and the delay is anywhere from 3½ minutes to 20 minutes one way and, of course, double that round trip. So a lot of the TCP protocols are interactive. Flow control, for example, is very simple. You basically say to the other guy, "I've run out of buffer space. Please stop sending." Well, if you're just on the other side of planet Earth and it takes a few tens or maybe a hundred milliseconds to get that message over there, the flow control works pretty well. But if it's 20 minutes before the other guy hears you say, "Stop," he's transmitting at full speed, and the data falls on the floor. So flow control didn't work very well.
Then there's this problem of celestial motion. The planets have a habit of rotating, and we haven't figured out how to stop that. So if there's something on the surface of the planet and it rotates, you can't talk to it until it comes back around or maybe a satellite has a similar problem.
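As a rough check on those numbers (a worked sketch, not part of the talk), the one-way delay is just distance divided by the speed of light, and the amount of data a sender puts on the link before a "stop" message can reach it is rate times delay. The Python below assumes the distances quoted above and, purely for illustration, the 128 kilobit per second rate mentioned earlier for the orbiter link; the function names are hypothetical.

```python
# Speed of light in a vacuum, in miles per second.
SPEED_OF_LIGHT_MILES_PER_S = 186_282.397


def one_way_delay_minutes(distance_miles: float) -> float:
    """Time for a radio signal to cover the distance, in minutes."""
    return distance_miles / SPEED_OF_LIGHT_MILES_PER_S / 60.0


def bits_in_flight(rate_bits_per_s: float, one_way_delay_s: float) -> float:
    """Bits sent during one one-way delay, i.e. before 'stop' can arrive."""
    return rate_bits_per_s * one_way_delay_s


nearest = one_way_delay_minutes(35e6)     # Earth and Mars at their closest
farthest = one_way_delay_minutes(235e6)   # Earth and Mars at their farthest
print(f"one-way delay: {nearest:.1f} to {farthest:.1f} minutes")

# Illustrative assumption: a 128 kbit/s sender and a 20 minute one-way delay.
overrun_bits = bits_in_flight(128_000, 20 * 60)
print(f"sent before 'stop' arrives: {overrun_bits / 8 / 1e6:.1f} megabytes")
```

This gives about 3 and 21 minutes, roughly in line with the rounded figures quoted in the talk, and about 19 megabytes sent before the sender can hear "stop" at the assumed rate.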

So space communication is both variably delayed and also disrupted. And as we started working through those problems, we realized that we had a delay and disruption tolerance problem and we had to invent a new suite of protocols, which Kevin has been very helpful at developing called DTN protocols for delay and disruption tolerant protocols. So we've now implemented those and have been doing a lot of terrestrial testing. I want to go back to that for just a second. We did a lot of terrestrial testing. We have now begun doing testing at interplanetary distances, so last October NASA gave us permission to upload the DTN protocols to something called the Deep Impact spacecraft. That was the one that visited a comet a couple of years ago and launched a probe to find out what the interior of the comet looked like. So we uploaded the protocols in October and sent about 300 messages back and forth over October and November, and the protocols worked extremely well. We didn't lose any data, and there were actually a few power failures that we recovered from to show that the protocols were actually quite robust. Since that time, we've been given permission to upload the protocols to the international space station, so we're now doing testing with some equipment that's already on board the space station, and in August or September of this year we're going to reload the protocols again to the Deep Impact spacecraft, which has been renamed EPOXI because it's on its way up to visit a new comet. So we'll have a 3-way interplanetary network to test during the course of the latter months of this year. We're hoping at the end of the first quarter of 2010 to actually have a processor on board Intelsat 14 which can also have these new DTN protocols, so we'd have a 4-node system in order to test that and verify that the protocols work. 
Our purpose here is primarily to space qualify the protocols because once we demonstrate that they work we can offer them not only to NASA but to all the other spacefaring countries around the world, and if they, in fact, will adopt these protocols-- Okay. So the whole idea here is not that we should just build a big interplanetary backbone and hope that somebody will use it. What we're saying is that you standardize the protocol so that every spacecraft that gets launched has these standards on board, and if you want to use previous assets that are out there, they'll be compatible. So, in the same way that when you plug into the internet you can talk to the 600 million other machines because they're all using TCP/IP, in this case we hope they'll all be using the DTN protocols. And if that is acceptable to the spacefaring nations, then over time we will really grow an interplanetary backbone over many decades, perhaps. So it's pretty exciting to be in on the very beginnings of that but the end of it we won't see--I won't see in my lifetime, but I think many of you will and certainly your children will probably participate in it.
So that's all the formal remarks. I realize I went more than my 45 minutes, but I'm happy to do Q&A, and I understand dinner is not too far away from here, so we can spend a few minutes chatting and then we'll have the rest of it over dinner. Is that okay?

All right. Well, thank you for your time. [applause] [♪classical music♪] [♪♪]

Video Details

Duration: 52 minutes and 48 seconds
Country: United States
Language: English
Views: 230
Posted by: davidorban on Oct 6, 2009

During a lecture at Singularity University, Vint Cerf ('the father of the internet' and Google Chief Internet Evangelist) gives a comprehensive overview of the state of the internet today, and what issues are arising as it continues to evolve. Includes discussions about IPv6, the need for cloud computing standards, the growing Asian prominence online, and the interplanetary internet.

Copyright Notice: Creative Commons Attribution 3.0
Filmed on Canon EOS 5D Mark II and XH-A1 cameras.
