Friday, August 29, 2008

A Survey on Hash-Based Packet-Processing Algorithms

Sometime way back, Graham Cormode invited me to a DIMACS workshop on Algorithms for Next Generation Networks, where I talked the usual talk about hash tables, Bloom filters, and things. The organizers later had the wherewithal to put together a book of collected chapters on the topic, and asked us (us being, of course, me and my student Adam Kirsch) for a chapter related to the talk. Adam was just finishing his thesis, and was willing to take on modifying his thesis to form the basis for a survey chapter. I wrote up a section or two, and for good measure, we got George Varghese, who is the world expert -- and author of the book Network Algorithmics -- to join in as well. (In particular, we really wanted George's much more practical perspective on what works and what doesn't!) The result here is a hopefully pleasant, light read on current goings-on in the area of hash-based network algorithms, attempting to blend and show the connections between theory and practice.

There's still time for feedback for the final version; please mail me if you have any constructive suggestions.

Monday, August 25, 2008

And Now, a Word from Our Sponsors (Part 2)

Continuing from last post, my trip to CA.

I visited Google and gave the CAM talk there as well, where it seemed to find a receptive crowd. (Some challenging questions arose as to whether cuckoo hashing is the right approach for hashing in software, as opposed to hardware.) Visiting Google is now like visiting a college campus, or maybe a small city. I was greeted at the parking lot by someone who directed me to valet parking. (The valet didn't seem to be there, though, so I parked myself.) I passed the volleyball courts, cafes and other places to eat, and that dinosaur on the way in; saw the gym, laundry room, doctor's office, massage area, and many many coffee-soft drink-snack bar areas; and ate at the amazing cafeteria. (The sushi line was too long, so I had to skip it. However, they did have an entire freezer full of It's It ice cream sandwiches, packaged specially with the Google logo. It's It alone is worth coming to the Bay Area for.)

I find myself without envy for the Google campus and its well-publicized perks. My limited impression was that it's too crowded and busy for me; it seems like it would be a hard place for me to concentrate and get work done. I'd surely balloon up in size surrounded by open larders of food, even with the gym. I suppose I'm now just too old to enjoy the place properly, though I imagine it's fantastic for recent graduates!

The last stop on my tour was Yahoo! Research. It's always great to catch up with my "mentor" Andrei Broder, who these days always seems to be running at 110% or more. Their research group (like Google's) seems focused on Web-related algorithmics, machine learning, and this new subfield of computational advertising (I believe Andrei coined the term; in any case, I like it). I talked with people about some compression-related problems, and perhaps something further will come of that.

As usual, I find myself wishing these trips could last longer. There's always too much to do and too many people to see on these visits, although that's what makes the trip interesting and fun.

Saturday, August 23, 2008

And Now, a Word from Our Sponsors (Part 1)

I've just returned from a trip to Silicon Valley, where I visited Cisco, Google, and Yahoo -- all of whom have generously given me research money this year, and hence the CA visit. Besides thanking them in this blog, I thought I'd say a few things about the trip and what I saw on my brief stops at the various places. The purpose of these visits is mostly to see if I can get collaborations going, but I admit, some part of it is just giving face-time to the people and companies who have given me research money. They deserve some of my time, and I'd like to encourage them to keep providing funding!

The first stop on my trip was actually a visit to Microsoft Silicon Valley Research Lab (MSRSV). I still haven't figured out how to get research money from Microsoft, but MSRSV "started" when a lot of my colleagues at what had been DEC/Compaq/HP Systems Research Center moved en masse to Microsoft, so I have historical ties and recent collaborations with people there as well. Since my visit last year, MSRSV has moved into a very nice new building. Lots of open spaces and whiteboards everywhere. It seems wonderfully set up for group collaborations. (One very nice space for group-work, though, is a bit too close to a loud and frequently used coffee machine for my taste...) Besides catching up with everybody, Udi Wieder and others indulged me by talking about some of the many variations of trace reconstruction problems that are still open. Hopefully we'll get somewhere on some of them.

Cisco is a huge sprawling collection of buildings, and the visit itself felt similarly chaotic. They asked me to give two talks, which caused me a bit of stress the week before the trip as I reworked some slides. I ended up talking about my work with Salil Vadhan on Why Simple Hash Functions Work (talk, paper, blog post), and gave a mini-survey covering my work with Adam Kirsch (and part with Udi Wieder) on how to use CAMs to improve cuckoo hashing (talk, various papers on my papers page). [Actually, I have a new survey article covering this stuff that I'll put up shortly.] Cisco still seems very, very generally interested in hashing, and in applications of hashing to network measurement and monitoring in particular. I had about 40-50 people show up for the first talk, and the second mini-survey talk was broadcast and recorded for Cisco -- about 50 people showed, and apparently more than that were also listening remotely. (Just like when I teach, and my class is taped...) They have a pretty elaborate setup for these recorded technical talks, with a room set up for guests like Steve Wozniak (who was there a couple of weeks ago) rather than me. Besides the talks, there were a lot of high-level discussions about things going on at Cisco where I might be able to collaborate usefully with them.

One thing I noticed at Cisco was a much larger number of women than usual at my talks. Perhaps EE is turning out more female graduates than CS recently, or perhaps it's somehow reflective of Cisco's hiring practices.

Visiting Cisco is always very exciting. They're a lot more short-term focused than research labs, but there is this wonderful sense that what you're talking about could become a part of the fundamental network architecture. They keep me away from details, but multiple-choice hash tables and Bloom filters seem to be standard tools in their arsenal now. I'm hoping some form of cuckoo hashing might be as well someday.
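
Since cuckoo hashing keeps coming up, here's roughly the idea in code: each key gets two candidate slots, insertions kick out current residents when necessary, and a tiny stash plays the role of the CAM for keys that get bounced around too long. This is only a toy sketch -- the naming, sizing, and parameters are my own arbitrary choices, not the engineered variants from the papers:

import random

class CuckooHash:
    def __init__(self, size=1024, max_kicks=32, stash_size=4):
        self.size = size
        self.max_kicks = max_kicks
        self.stash_size = stash_size
        self.tables = [[None] * size, [None] * size]   # two subtables
        self.seeds = [random.random(), random.random()]
        self.stash = []   # the "CAM": a few slots checked on every lookup

    def _slot(self, key, i):
        return hash((self.seeds[i], key)) % self.size

    def insert(self, key):
        for _ in range(self.max_kicks):
            for i in (0, 1):   # try both candidate slots
                h = self._slot(key, i)
                if self.tables[i][h] is None:
                    self.tables[i][h] = key
                    return True
            # Both slots full: evict a random victim and re-insert it.
            i = random.randrange(2)
            h = self._slot(key, i)
            key, self.tables[i][h] = self.tables[i][h], key
        if len(self.stash) < self.stash_size:
            self.stash.append(key)   # park the stubborn key in the stash
            return True
        return False   # in practice, this would trigger a rehash

    def lookup(self, key):
        return key in self.stash or any(
            self.tables[i][self._slot(key, i)] == key for i in (0, 1))

The point of the stash/CAM is that the rare keys that would otherwise force an expensive rehash can simply sit in a few slots that hardware can search in parallel.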

Wednesday, August 20, 2008

NSF Expeditions, Complexity

I'm glad to hear of the news that Sanjeev Arora's team at Princeton was one of the winners for the NSF Expeditions grants, working on the general theme of complexity. I think it shows that some of the public relations work our community has been doing, especially with the NSF, is paying off in concrete ways. I also think that more money for theory generally just has to be a good thing -- it's $10 million more for theory than there was before.

That being said, I'll express two concerns:

1) It's odd to see so much money for theory concentrated in such a small geographic area. I realize that was the nature of the Expeditions program, and I don't fault the proposal for it. It just strikes me as strange, when the general budget for CS theory is so small, to earmark such a large sum of money for a single project. It feels like an over-concentration of resources in what's already a small community.

The solution to this, of course, is to get more money for the general CS theory program. And I'm sure a significant chunk of the Expeditions money will go to open DIMACS-style collaborations like workshops and other events, minimizing this concern.

2) I know it's just the nature of theory, but reading over the blurbs about the various funded Expeditions proposals, I can't help but notice that while the others seem to have clear statements of goals meant to take things in new directions ("hope to create a new field of computational sustainability"; "It aims to create an 'open' alternative to mobile ubiquitous computing and communication that can spur innovations, which will have a dramatic impact on the choices users will have in the way their data and information is computed, stored and communicated"; "The project aims to develop tools and theories for molecular programming -- such as programming languages and compilers -- that will enable systematic design and implementation of technological and biotechnological applications that require information processing and decision-making to be embedded within and carried out by chemical processes."), the complexity grant will "hope to better understand the boundary between the tractable and the intractable" and "attack some of the deepest and hardest problems in computer science". Doesn't that sound, I don't know, just like business as usual? My concern is that it's probably important to the theory community long-term for this Expedition to have some major concrete success attributed to it at the end of the day. I have no doubt that good things will come out of this, just based on the people, who already do good work -- but will the output be the sort of thing that in retrospect justifies this investment?

Tuesday, August 19, 2008

Book by FemaleScienceProfessor

I'm an occasional reader of the blog FemaleScienceProfessor. Often the blog is just about being a science professor, which is interesting and which I can relate to. And sometimes the blog is specifically about being a female science professor, which is also interesting, even if I relate to it less.

Well, FSP has re-worked past blog entries into an on-line book available at lulu.com. I haven't bought and downloaded it yet, but from the Table of Contents, it appears to be a particularly worthwhile book for graduate students thinking about a life in academia, and for new faculty. The bulk of the book seems gender-neutral, if that's a concern. I thought I'd give it a free plug.

Sunday, August 17, 2008

SIGCOMM 2008, Part 3

Here are a few more papers from SIGCOMM which should be of particular interest to a more theoretical audience. (Generally, SIGCOMM papers are interesting -- but again, I'm focusing here on papers that I think might be of special interest to theory people. It strikes me that I should, at some point, similarly summarize papers from a major theory conference -- like STOC 2009 -- that would be of special interest to networking people. Of course, SIGCOMM makes that easier, posting abstracts and all the papers online...)

There's a paper on analyzing BitTorrent in the game-theoretic, incentive-analysis sense. It will require a more careful reading from me, as I'm not a full-fledged game-theory/CS type researcher, but it sure looks interesting on first perusal. I'm naturally biased toward the idea that if all this current effort on game theory in computer science (and particularly in theory) is to pay off, real-world protocols must be considered and analyzed. So in that sense, this should be a really interesting paper.

While it doesn't appear particularly theoretical (it looks like what I like to joke is a standard networking paper -- lots of pictures and tables, no equations...) this paper on spamming botnets from Microsoft includes Rina Panigrahy (well known for his work in both theory and practice) as one of the co-authors. (I figure Rina had something to do with where I saw the words "entropy reduction", but that's just a guess...)

Saturday, August 16, 2008

SIGCOMM 2008, Part 2

The Accountable Internet Protocol (AIP) paper asks the question: what if we re-architected the Internet to start with self-certifying addresses, so that there was a layer of accountability -- you'd know where packets are coming from? This paper fits squarely in the mold of the NSF FIND program. The authors suggest what a self-certifying architecture would look like, show how routing would work with such an architecture, consider potential attacks on the proposed architecture, and discuss whether technology trends would make such an architecture feasible. Certainly interesting, although I admit to high-level, unsubstantiated concerns about the specific address architecture they propose. (I suppose as a "kid" I saw too many key-exchange-style protocol papers where a subtle flaw was exposed by a subsequent paper...)
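
The core trick, as I understand it, is pleasantly simple: an address is (essentially) a hash of a public key, so the binding between address and key can be checked by anyone, with no certificate infrastructure. A toy version -- with my own made-up truncation length and naming, and with the per-packet signature check elided -- might look like:

import hashlib

def self_certifying_address(public_key_bytes):
    # The address *is* a hash of the key, so no PKI is needed to bind them.
    return hashlib.sha256(public_key_bytes).hexdigest()[:16]

def verify_binding(address, claimed_public_key):
    # Anyone can recompute the hash; a forger would need a key hashing to
    # the victim's address. (A real system would additionally verify a
    # signature made with this key over the packet itself.)
    return self_certifying_address(claimed_public_key) == address

pk = b"alice's public key bytes"
addr = self_certifying_address(pk)
assert verify_binding(addr, pk)
assert not verify_binding(addr, b"mallory's public key bytes")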

I notice they used a Bloom filter in the paper without even giving a citation. Have Bloom filters now become so successfully widespread in the networking community that no citation is needed? What a nice thought! (Or maybe the authors just ran out of space for the citation.)
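
For anyone who hasn't seen the apparently citation-free data structure: a Bloom filter hashes each item to k bit positions, sets those bits on insertion, and reports an item as present if all k bits are set -- so you get false positives at some tunable rate, but never false negatives. A minimal sketch (sizes and hashing scheme are arbitrary choices of mine):

import hashlib

class BloomFilter:
    def __init__(self, m=1 << 16, k=5):   # m bits, k hash functions
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        for i in range(self.k):   # derive k "hash functions" from sha256
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter()
bf.add(b"10.0.0.1")
assert b"10.0.0.1" in bf   # no false negatives, ever
# b"10.0.0.2" in bf is almost surely False, but could be a false positive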

Another SIGCOMM paper continues on the path set out by, for example, Feigenbaum, Papadimitriou, Sami, and Shenker on using game theory to study the behavior of BGP. They propose a more realistic model (where, for example, Autonomous Systems can be paid for attracting traffic) which, naturally, leads to more negative results in terms of the truth-telling behavior of ASes. (Why is reality so often disappointing this way?)

Friday, August 15, 2008

SIGCOMM 2008, Part 1

About a year ago, I took a look at some SIGCOMM 2007 papers. I won't be attending SIGCOMM this week, unfortunately, so in the interest of self-education, I thought I'd look at some of the papers this year. (The site currently has the papers up. Wow, what a neat idea...)

Before getting into papers, I thought I'd mention that Don Towsley is being given the ACM SIGCOMM award. This is a great choice, and well deserved. And relevant to this site's audience, Don is, in my mind, primarily a theorist. Not a FOCS/STOC theorist to be sure, but a theorist nonetheless. As the award announcement states:
Towsley, who is Distinguished Professor of Computer Science, has made innovative and pioneering contributions in developing foundational modeling and analysis techniques that have enabled a better understanding of some of the most important aspects of today's computer networks, network protocols and networked applications.
Modeling, analysis, understanding... that's what theory is all about. It's people like Don that made networking an open and understanding place for people like me. Thanks! And hooray for Don!

Now for papers. As before, I'll give brief synopses (at the level of the posted abstracts :) ), as I'm just looking at these papers on the fly. The network coding crowd has attacked again with the MIXIT system, which seems to throw together a bunch of ideas in a clever fashion to improve performance on wireless mesh networks. Recall that the basic working definition of network coding is that intermediate nodes do more than store and forward; they can process the packets as they come through (creating encoded packet variations). Here, the basic unit is not taken to be a packet, but a symbol (a small collection of bits), with symbols being packed into a packet. This allows nodes to "take apart" packets; if a whole packet doesn't come in error-free, a node can take the symbols that appear to be right with high enough probability (based on information from the physical layer), re-package them (via linear combinations, a la "standard" network coding), and send on only those symbols. Because erroneous symbols might get through, an end-to-end error-correcting rateless code is also used. All of this appears to improve throughput.
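
To make the "linear combinations" step concrete, here's a stripped-down sketch of random linear coding over GF(2): each coded symbol is the XOR of a random subset of source symbols, and the receiver decodes by Gaussian elimination once it has enough independent combinations. (MIXIT's physical-layer confidence hints, rateless outer code, and actual field choice are all elided; this is just the algebra, with toy parameters of my own.)

import random

def encode(symbols):
    # One coded packet: a random 0/1 coefficient vector, plus the XOR
    # of the source symbols it selects.
    coeffs = [random.randint(0, 1) for _ in symbols]
    if not any(coeffs):
        coeffs[random.randrange(len(symbols))] = 1
    coded = 0
    for c, s in zip(coeffs, symbols):
        if c:
            coded ^= s
    return coeffs, coded

def decode(packets, n):
    # Gauss-Jordan elimination over GF(2); needs n independent packets.
    rows = [[c[:], v] for c, v in packets]
    for col in range(n):
        pivot = next((r for r in rows
                      if r[0][col] and not any(r[0][:col])), None)
        if pivot is None:
            return None   # rank-deficient: wait for more coded packets
        for r in rows:
            if r is not pivot and r[0][col]:
                r[0] = [a ^ b for a, b in zip(r[0], pivot[0])]
                r[1] ^= pivot[1]
    out = [None] * n
    for coeffs, value in rows:
        ones = [j for j, c in enumerate(coeffs) if c]
        if len(ones) == 1:   # fully reduced row pins down one symbol
            out[ones[0]] = value
    return out

symbols = [0xA3, 0x5C, 0xF0, 0x17]             # four 8-bit "symbols"
packets = [encode(symbols) for _ in range(8)]  # redundant coded packets
print(decode(packets, len(symbols)))           # original symbols, w.h.p.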

The paper seems interesting -- another proof-of-concept paper for network coding in wireless systems, which is where I suspect network coding will be able to make the most inroads over the next few years. I can't tell yet how practical this really seems (without a more detailed reading), but the idea of taking apart packets and sending only the good pieces in combination with multiple coding techniques seems quite nice.

As an aside, the pdf for this paper seems to contain a picture or something that crashes my poor Mac around the 8th or 9th page. Help!

Sunday, August 10, 2008

Security Issues in Cambridge

Harvard is getting new ID cards next year, thanks to an ambitious student who apparently figured out how to forge IDs (including a duplicate ID for University President Drew Faust). Because, really, how could using unencrypted ID numbers on the card, and giving undergraduate computer user assistants access to all ID numbers, ever lead to a problem? (The student also apparently made fake state driver's licenses as well. Who says Harvard students don't learn useful real-world talents?)

Of course, Harvard isn't the only institution in Cambridge where students can obtain skills in the security area. Some MIT students, working under the famous Ron Rivest (the R of RSA!), figured out several flaws with the new ticket system for the Boston subway system, including ways to rewrite tickets so that they have lots of money available on them. So, naturally, the subway system sued to keep them from talking about the flaws at a security conference.

In both cases, the systems seem easily breakable (well, at least the Harvard IDs were easy; I'm not sure about the subway) with a card writer that can be obtained for a couple hundred bucks.
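
The generic lesson is that data a card reader trusts can't just sit on the card unauthenticated; at a minimum it needs a keyed integrity check, or anyone with that couple-hundred-dollar writer can rewrite it. Here's a toy sketch using an HMAC -- the field layout and key handling are entirely hypothetical, not a description of either deployed system, and note that a MAC stops rewriting but not cloning a valid card:

import hmac, hashlib

SECRET_KEY = b"issuer-side secret, never stored on the card"

def write_card(card_id, balance_cents):
    payload = f"{card_id}|{balance_cents}".encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload + b"|" + tag.encode()

def read_card(stored):
    payload, _, tag = stored.rpartition(b"|")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), tag):
        raise ValueError("tampered or forged card")
    card_id, balance = payload.decode().split("|")
    return card_id, int(balance)

card = write_card("rider-42", 500)
print(read_card(card))                         # ('rider-42', 500)
forged = card.replace(b"|500|", b"|99999|")    # bump the balance...
try:
    read_card(forged)
except ValueError as e:
    print(e)                                   # tampered or forged card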

Of course, I'm not surprised, based on previous experience.

I wonder when organizations that want secure cards will realize that perhaps they ought to ask the students to try to break the system before they deploy it, rather than waiting for them to break it afterward.

Wednesday, August 06, 2008

On Simulations

I've been coding up some simulations for some Allerton papers that are due all too soon. Of late I've depended far too much on my (now former) student Adam Kirsch to take care of doing our simulations, but he's graduated, and they're needed, so off I go. (Adam's graduating is all to the good, but clearly, I'll be missing him, especially when it comes time to write simulations.)

I'm always amazed at how averse theory people seem to be to doing simulations. I find them useful for generating ideas and thinking about problems in the early stages -- cutting off wrong directions and giving insight into the right ones. If you don't like doing simulations for such purposes, because it doesn't work for you, or you're clever enough to not need data, I have no issue with that -- people work differently.

But I also use simulations as a way of checking my work. If I have a theorem that says that a random process will behave a certain way, and it's possible to code a simulation of the process, I'll check my theorem with code. If the theory and the code don't match up, my assumption is that something is wrong somewhere, and the result is not ready until the two match or I know why they don't. Surprisingly, I think it's about 50-50 as to which I end up finding is wrong, the code or the theorem. (By the same token, if I don't have a theorem, and there's more than one way to simulate a process, I'll code multiple simulations, and make sure they match!)
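
To give a concrete (if canned) example of the kind of check I mean, on a process where the theory is exact: the expected number of draws to collect all n coupon types is n times the n-th harmonic number. If the simulated average drifts from the formula, either the code or the algebra is wrong -- and finding out which is the whole point:

import random

def draws_to_collect_all(n):
    # Draw uniformly from n coupon types until every type has been seen.
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, trials = 50, 10000
simulated = sum(draws_to_collect_all(n) for _ in range(trials)) / trials
theory = n * sum(1.0 / i for i in range(1, n + 1))   # n * H_n, about 224.96
print(f"simulated {simulated:.2f} vs theory {theory:.2f}")
# If these disagree by more than sampling noise, something is wrong.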

Of course not all results can be checked by coding something up -- but many can, particularly in the study of random processes, which is my area. And it's clear to me that many researchers don't check by coding -- because I (or students working with me) have several times found mistakes by doing a simple implementation and finding that we get different numbers out than the paper gives. Generally the mistakes aren't "fatal" -- usually a constant is off somewhere, and often eventually the O notation will take care of it -- but of course it is grating as a reader when something in a paper is plain wrong and you're left to figure out why. When someone doesn't do a check-by-code, I must admit, it lowers my trust in, and my overall opinion of, the person's work. Sure, people make mistakes (myself included) -- but if you're ignoring a straightforward approach for checking your work, that doesn't inspire confidence.

I imagine some theory people are so out of practice coding that they "can't" do a simulation. (But hey, that's not really an excuse; that's what grad students are for...) And others probably just consider it a waste of time. If you really are good enough not to need to check your work this way, more power to you. Me, I'll get back to the (admitted) drudgery of coding my things up and seeing if they work the way I think they should....

Sunday, August 03, 2008

The Job Market, Post Analysis

When I was in graduate school, the academic/research lab job market was pretty soft. By the time I graduated, it was a little better, but not great; you could see things heading upward, though. (Of course, I should point out here the caveat that generally the job market always seems a bit softer in theory than in anything else...)

So, looking back this last year, what is everyone's take on the job market this past year (and the trend for next year)? It seemed to me that while it's not in a completely disastrous state, it's not great, and it's been trending downward the last year or two. The effects of the economy and the long-term exodus of CS majors is not helping in academia, and while there's some availability in research labs, there doesn't seem to be a lot of spare capacity. Google is providing a much-needed outlet, as are (to a lesser extent) Yahoo Research and the new Microsoft Cambridge lab, but it's not clear (to me) how all three will play out long term, or even in the next few years. (If it weren't for sponsored search, I hesitate to think where the theory job market would be today. And if Yahoo ever does get bought out, what will happen to research...?)

There still seem to be jobs available for the best people (or, depending on your point of view, the people with the best buzz), and we still don't seem as saturated as I always hear physics and math are. But the market seems weak, and it's something students should be aware of.

I'd be happy to hear more informed opinions, or disagreeing opinions, or especially insights on the job market from non-theory people...

Friday, August 01, 2008

Problematic Students

One thing they don't warn you about in graduate school -- unless some places have changed their "teaching preparation" classes to be somewhat more useful -- is that, every once in a while, you'll get a student who is, shall we politely say, "problematic". This is the student who takes up 80% of the time you spend interacting with students that semester, and in a negative way.

I've probably seen a few more of these students than average, because I've allowed my course to be offered through the Harvard extension school. There have definitely been many cases there of students who enter the class insufficiently prepared, and most of them quickly drop it. But occasionally there's one who misunderstands and thinks it's our fault (mine and the TAs') that they're failing a class they may not have had the necessary background for to begin with. (I've recently had to deal with such a student, which brought up this line of thinking.)

For sheer annoyance value, though, my most problematic student was a Harvard student. He or she (let's use "he" from here on) got a warning from me partway through the semester because he failed to turn in an assignment. I told him he had done fine on the assignments he had turned in, but if he didn't turn in one or more future assignments, his grade would suffer, and he could even fail the class. He said he understood.

After the midterm, he did not turn in another problem set. Which would be fine, except that he then made a rather large issue out of failing the class. He insisted on knowing the exact formula I used to assign grades, going over every question on the midterm and final with me, and so on. In short, he refused to take responsibility for the outcome, which is the hallmark of a problematic student.

I'm curious if other teachers have had similar experiences, and what advice they might have in dealing with such students. (My advice -- catch these students early, and document by e-mail what they have been told regarding their performance! And try to spend more time with more positive students.)