Friday, January 30, 2009

STOC PC Meeting : Part I

Over the next week, I'll present some thoughts related to the STOC PC meeting, which wrapped up earlier today. (Please don't send mail to ask about your paper(s). Messages will be sent out sometime next week.)

Today's post will be about basic logistics, which you wouldn't normally think about unless you suddenly found yourself running the meeting. We started with roughly 150-160 papers left to consider. At 5 minutes on average per paper, you're already at about 13 hours. And if you slip to 6 minutes on average per paper, well, now you're at 15 to 16 hours; once you add in overhead time (computer setup, lunch, the occasional rest break), the difference between that 5 and 6 becomes painfully sharp when you know people have flights to catch at the end of day 2. I think the hardest part of the meeting, from my end, was simply time management.

We started at 9 am on day 1, with plans to break at 6. It became clear that we were falling behind schedule, so we changed the dinner reservation and kept going until 7. We still seemed behind -- and with most of the easier accept and reject decisions behind us -- so we agreed to meet at 8:30 am on day 2. The mumblings and rumblings over dinner of "We're never going to finish on time" were worrisome, and I stayed up late looking over and organizing the remaining papers, so we could go through a bunch very quickly on day 2. I was definitely aided in this goal by the rest of the committee, who had clearly also done additional prepping post-dinner and were ready when they came in the door. In the first hour we made up our deficit in the schedule, and finished our decisions around 2:30, actually leaving us a comfortable amount of time to talk about other organizing issues before people started heading to the airport for their flights around 3.

While it all ended up OK, perhaps I could have done better day 1.

Also, the lesson for future PC members -- be ready to talk when your papers come up, and be prepared to make your points quickly. The seconds add up! I thank the committee both for putting in the extra-long hours and for being well prepared, so we could finish on time. And everyone who submitted papers should be thanking them as well.

Another piece of "chair advice" I was given that I can't emphasize enough -- have someone whose job it is to take notes and record the status of papers as you go. During the meeting I vaguely asked one of the PC members to keep track of things. At the end of the first night, trying to prepare for day 2, I would have been lost without his guide to where everything stood. As chair (particularly as a single chair -- we should have co-chairs...) I found it difficult to keep things running, enter things into the system, and track things so I could go back and really see where everything was; the notetaker essentially did the tracking job for me, making the whole meeting run much more smoothly, and allowing me to get some sleep before the second day.

And speaking of sleep, I'm very tired, and need at least a day without thinking of anything related to STOC.

Wednesday, January 28, 2009

Graduate Student Folders Part II

We're getting to the point where we have to make our decisions on graduate student admissions. In the theory group at Harvard we got a sizable number of folders that we've cut down to about 25. We'll probably be cutting those down to about 10 acceptances.

There's clearly an abundance of good people, who could become solid TCS researchers. And as we went over in my last post on this, it's difficult to choose based on an application who will become successful.

My guilt in having to choose from among these many talented people is assuaged mostly by my concern for the people we do admit to graduate school, especially in TCS. Where will the jobs be when these people graduate? I've heard the arguments that CS is still expanding (well, maybe not in America, but elsewhere in the world) and that PhDs can find other work besides being a professor. I've seen that most of the other students I knew as a graduate student at Berkeley have ended up with good positions -- mostly professors -- albeit some after a number of years of postdocs. But still, at the end of the day, when I do the math, it just doesn't add up. I expect to be in my position for at least 30 years (barring the current curse of Harvard CS faculty being moved into administration); how many professors (or scientists at research labs) can I produce over my lifetime?

The financial meltdown has only raised my concerns that in CS generally (and theory in particular) we've been living through a bubble, and we'll all be shocked when that bubble pops, though we shouldn't be.

I'm not entirely pessimistic -- CS PhDs, as far as I can see, should have no trouble getting pretty good jobs. But how many will get the jobs they think they're getting the PhD for? (Note the internal mental bias here -- professors remember mostly the students who did go on to get faculty/research jobs...) We'll see.

Tuesday, January 27, 2009

SIGCOMM PC starting... STOC PC ending...

I guess there's another week until the "official deadline" for SIGCOMM, but the deadline for registration of abstracts/titles has already passed, and I was asked to "bid" on papers I was interested in. It's hard to tell from the limited information, but it seems like there are plenty of interesting papers. Mostly I was looking for papers covering algorithmic engineering or just plain algorithmic problems, the econ/CS interface, and coding (network coding seems to be getting bigger), but there were certainly papers outside those areas that caught my eye. Out of the 300-400 submissions, there were more than 50 that looked interesting enough for me to request them; I was only asked to choose at least 30 to start.

Of course, since the whole process is double-blind, I don't know who submitted the papers, and hopefully the authors won't be able to figure out which reviews I end up writing.

This would all be great fun, but there's of course STOC to finish up. We've rejected about half the papers so far (we're still in the pre-meeting stage). Of course, that's the "easy" part. Now things get harder.

Sunday, January 25, 2009

More on Algorithms and Implementation

In the comments on my last post on algorithms and implementation, someone asked where to publish when you sit in the middle, and how a graduate student can succeed by being in the middle. The comments seemed to suggest that publishing was hard and succeeding as a graduate student this way was largely impossible.

Sadly, and disconcertingly, I find myself largely agreeing. I think the theory community has developed to the point where, almost universally, if you want to earn a reputation as a top theorist in graduate school, you have to publish FOCS/STOC/SODA papers. And really this can't be done by focusing on practical issues. More practical papers can be published in other venues -- WWW, INFOCOM, even SPAA and PODC -- but to the theory community these papers don't carry as much weight. (Note: yes, obviously there are RARE exceptions.)

Strangely, I still think there's a reasonable job market for practically-minded theorists, both in research labs, and in smaller universities, where they'd much prefer a theorist who would interact with other groups (like networking or AI) because they really aren't big enough to have theorists who don't have "outside interests". And I think evidence of being able to work with practitioners helps most any theorist looking for a job -- you just can't build your reputation on it. But it can be hard to get that balance of theory and practice "right" as a graduate student.

I see a fair number of people who are theoretical but practically minded simply going into other fields, either as graduate students or after. You can be a "theoretical" networking person or AI person and succeed, and a number of graduate students who could be theorists figure out they'll be better off in these areas. You can start out as a FOCS/STOC/SODA person and then branch out to other areas. Go look at Karger's or Kleinberg's DBLP pages and see how much more they've been doing in the recent past outside the confines of FOCS/STOC/SODA compared to earlier in their careers. (Or look at John Byers and Soumen Chakrabarti, who worked primarily in theory as graduate students, but switched to other areas.)

Perhaps this would make a good conference panel someday. (Offhand I'd recommend getting Mikkel Thorup and Muthu Muthukrishnan to participate -- they seem to me like role models for people who want to sit in the middle -- though I can think of many others as well...) Heck, come to think of it, I'm the STOC chair, maybe it would make a good panel coming up real soon. Please let me know if you'd like a panel on this issue -- something like "How to Succeed doing Practical Theory?" -- at STOC.

Friday, January 23, 2009

Algorithms and Implementation

I was leafing (again) through George Varghese's text, Network Algorithmics, and this caught my eye:
In summary, the uncritical use of standard algorithms can miss implementation breakthroughs because of inappropriate measures (e.g., for packet filters such as BPF, the insertion of a new classifier can afford to take more time than search), inappropriate models (e.g., ignoring the effects of cache lines in software or parallelism in hardware), and inappropriate analysis (e.g., order-of-complexity results that hide constant factors crucial in ensuring wire speed forwarding).

Thus another purpose of this book is to persuade implementors that insight into algorithms and the use of fundamental algorithmic techniques such as divide-and-conquer and randomization is important to master.
These words succinctly express feelings I commonly have. Standard algorithms theory often fails to live up to its promise in practice, untethered as it has become from applications. At the same time, properly designed algorithms can yield immense breakthroughs in performance, so implementors cannot do without them. I believe we need more people willing to sit in the middle, but perhaps first we need more creative ways to bring theory and practice, or theorists and practitioners, together.

Thursday, January 22, 2009

Georgia Tech visit

I had the pleasure of visiting Georgia Tech Wednesday, giving a theory seminar, in this case my survey talk on Deletion Channels.

A fun day filled with conversations about research, kids, and other goings-on. Highlights included getting to catch up with my academic older sister Dana Randall and younger brother Eric Vigoda. And on this trip, as on seemingly every trip I've taken to Georgia Tech, my host Vijay Vazirani exceeded expectations with his choice of dinner location. (And by now, I have high expectations for meals in Atlanta!)

The department has really grown into a theory powerhouse (and has a very strong networking group as well). For some time it's been on my list of schools undergraduates should think about for graduate school (but might not have realized they should be thinking about). The visit was a strong reminder of why.

Saturday, January 17, 2009

Summer Funding

I should have blogged about this before, but didn't notice over break that Yale had "agreed to pay $7.6 million to resolve allegations" regarding how it was handling research grants. My understanding is that this primarily related to summer funding -- the government apparently objects to professors saying they're working 100% on their research over the summer and then doing other stuff. (You know, pesky things like writing letters of recommendation or preparing for next semester's classes.) Anyone with more info, please speak up or link in the comments.

Now, of course, this all just boils down to accounting. The university (nominally) pays 9 months of salary, but strangely, we seem to be expected to be working on our government-funded research during the academic year as well. One would think there might be some understanding that there's some appropriate blurring of time spent on various tasks. But Harvard, responsive as always when it sees another school getting fined, is changing its process to avoid the same sort of investigation. I don't quite get how this works, but apparently my June summer funding for '09 will be paid out over the spring semester, and the rest of my summer funding will be paid out over the fall semester. I'm assuming that when I eventually have to fill out forms accounting for my time, they'll be modified so that research time during the year is counted some way as well.

As one colleague said to me, the new system is stupid, but so was the old system, and sometime down the road, some bureaucrat will decide this is wrong, and we'll switch to a different stupid system. What I find most odd about the new system is it seems at least part of the time I'm getting paid from a government grant for work I will only do in the future. That's OK with the government? I imagine if for whatever reason I later decided not to work over June, they'd pull the money back out of future paychecks. I also suppose this provides some understanding of why grant overhead rates, which I generally complain about, are so ridiculously high; if there's really this much paperwork (and computer accounting software setup) involved in grant management, maybe the overhead really is necessary.

Wednesday, January 14, 2009

Graduate Student Folders

When I'm not looking over STOC papers the next few weeks, I'll also be looking over graduate student admissions folders. (At Harvard, we're small enough that the theory folks just split up all the folders and then meet to discuss them.) Even though I've been doing this for a while, I'd love to hear from all of you -- what should I -- or we as a community -- be looking for in selecting graduate students?

I think that on the theory side we tend to look for pure brainpower over other skills. Arguably more so in theory than in other subjects, raw intelligence matters most. (I'm not saying I agree with that argument; I'm just saying it's an argument.) I think we also know that grades don't necessarily provide the information we need to judge this, however, and so we look for evidence that one has the ability to do research in the form of actual research projects accomplished or letters from faculty members we know (and believe) who suggest that there's intellectual power and creativity there.

I do think most faculty keep in mind the other skills that can make for a successful student. Communication skills, the ability to work well with others, a sense of what they want to accomplish, the mindset to deal with the one-step-forward-two-steps-back nature of research -- all of these also come into play in decision-making. These can be much harder to judge, however, unless they're clearly marked by their absence in some way.

I wonder if the competitive landscape for graduate slots at the top schools causes us to miss out on some promising people. Over at the FemaleScienceProfessor blog, she discusses why she'd rather take the motivated B student over a lot of A students for undergraduate research projects. I'm curious, having no evidence one way or another: do people think we're doing a good job getting B undergraduates who could become great researchers into the pipeline early enough that they can construct a good application for graduate school? Do we accept enough of these students when they apply?

Monday, January 12, 2009

Deletion Channel Paper

There's a new paper with bounds on the capacity of the deletion channel on the arxiv by Fertonani and Duman. The main results are some new upper bounds which generally are better than the upper bounds found in a paper by Diggavi, Mitzenmacher, and Pfister, which was the first look at the problem in a long while.

The bounds achieved are very nice. They use a simpler approach than our earlier paper, essentially dividing the input or output into fixed sized blocks and numerically calculating the capacity per block. With clever manipulation, they can extend the bounds in various ways -- the nicest result (in my mind) being that, in the limit as the deletion probability d goes to 1, the capacity is upper bounded by 0.49(1-d). [Our previous result gave a bound of 0.79(1-d), and some of my work on lower bounds showed that the capacity is at least 0.11(1-d).]

It seems to me their work raises some interesting questions. One question is the following: at some point, they have to rely on numerical calculations to determine their upper bounds, using the Blahut-Arimoto algorithm to calculate the capacity of a fixed finite channel. Their approach is limited by this calculation; for instance, if one "chops" the input into blocks of 10 bits, then there are 1024 possible input sequences and 2048 possible output sequences for the deletion channel, and essentially they have to compute the capacity of this fixed channel in a "brute force" way, which requires keeping track of the probability that each input results in each output. Bigger blocks means better bounds but bigger calculations. (Technically one can find ways to avoid keeping the entire 1024 by 2048 probability matrix, since it's fairly sparse, but the calculations still grow big very fast.)
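For readers who haven't played with it, here's a minimal sketch (in Python, entirely my own, not taken from the paper) of the standard Blahut-Arimoto iteration for a generic finite channel given its full transition matrix W. For the block-deletion calculation described above, W would be the (roughly) 1024-by-2048 matrix of block transition probabilities, which is exactly where the time and space costs blow up.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=100_000):
    """Capacity (in bits) of a discrete memoryless channel.

    W[x, y] = P(output y | input x); each row of W must sum to 1.
    Returns (capacity estimate, capacity-achieving input distribution).
    """
    n_inputs = W.shape[0]
    r = np.full(n_inputs, 1.0 / n_inputs)        # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        q = r @ W                                 # output distribution induced by r
        # d[x] = D( W(.|x) || q ) in bits, treating 0 log 0 as 0
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(W > 0, np.log2(W / q), 0.0)
        d = np.sum(W * log_ratio, axis=1)
        lower = np.log2(np.sum(r * np.exp2(d)))   # valid lower bound on capacity
        upper = np.max(d)                         # valid upper bound on capacity
        if upper - lower < tol:
            break
        r = r * np.exp2(d)                        # Blahut-Arimoto update
        r /= r.sum()
    return lower, r

# Tiny sanity check: a binary symmetric channel with crossover 0.1
# should give capacity 1 - H(0.1), about 0.531 bits.
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
print(blahut_arimoto(bsc)[0])
```

Each iteration yields valid lower and upper bounds on the capacity, so you can stop once the gap is small enough; the sparsity trick mentioned above would amount to replacing the dense matrix operations here with sparse ones.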

Is there any hope for an approximate version of the Blahut-Arimoto algorithm that would give provable upper bounds on the capacity but could take smaller time/space? It would seem there might be other uses for an approximate expectation-maximization procedure, although perhaps generally one wants such good approximations that there's no real benefit in practice. Any thoughts?

Sunday, January 11, 2009

Co-Chairs

Although it's far from over, it's already clear to me that one of my recommendations for the future of FOCS/STOC conferences will be to start having co-chairs instead of a single chair.

The reasons for having co-chairs that I can see include:
1) Chairing involves a lot of administrative work, often in bursts over short intervals. Being able to share that kind of work seems to have a huge upside.
2) Although I don't think it's come up for STOC yet, having a co-chair watch your back can help avoid silly mistakes.
3) Particularly for FOCS/STOC, having co-chairs from two different subareas (one from a more algorithmic side and one from the complexity side) would seem to offer better coverage.

Many conferences in other areas of comparable size to FOCS/STOC have co-chairs (e.g., NSDI and SIGCOMM, that I know of), and in my experience as a PC member it's always gone well.

What are the possible downsides of co-chairs?
1) Whenever two people are co-in-charge, they have to work together well. I hope it wouldn't be too hard to find people who could capably co-chair for these conferences.
2) I imagine it can be hard finding people willing to chair; now you'd have to find 2! Although I actually think it might make it easier to get people to accept if they can co-chair -- it will seem like less work, and if one person is more experienced, you could have a somewhat less experienced person co-chair.

Thursday, January 08, 2009

So, how's STOC going?

I suggested that I'd blog about my experiences as STOC PC chair, so I thought I'd give an update.

We're nearing the point where all the "first round" reviews are supposed to be in. All papers had 3 PC members assigned to them, and my goal was to have 3 reviews for each paper a few weeks before the PC meeting. So far, it seems to be going well; most PC members have been putting in reviews. A few who seem to be working with an offline scorecard haven't uploaded what they've completed -- which makes things a bit more difficult for me, naturally :( -- and I expect some papers won't have 3 reviews by the deadline for various reasons, the most likely being that a subreviewer hasn't finished. Subreviewers, please get things in! (And thank you.)

After that point, I hope to reject a slew of papers outright. Recall that I'm trying a new scoring scale: 1 = bottom 50%, 5 = top 10%, other scores in between. Anything with three 1's seems to be an easy reject. Arguably, if there's no score of 3 or above in the 3 reviews, a paper should be rejected. To be clear, I won't make those decisions unilaterally, but will set up votes for the PC.

The other work at this point is finding more reviewers for papers with widely disparate scores (where the problem does not seem to be "this paper has a bug" that one reviewer noticed but others didn't, but rather a wide disagreement on the quality/importance of the result) or for papers that seem like they'll be on the borderline. Here I'm looking for people both on and off the PC -- so don't be too surprised if you get an e-mail asking for a quick review -- with the emphasis on trying to find relevant experts in the area of the paper.

How many papers will have both a 1 and a 5 score? (Or 2/5s, or 1/4s?) Seems like a statistic for the business meeting roundup. But so far, the answer seems to be more than you might think...
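Just to be concrete about what I mean by triage and disparity, here's a toy sketch in Python; the `scores` dictionary is made-up data standing in for the real reviews, and the spread threshold is only one possible way to flag disagreement.

```python
# Hypothetical review data: paper id -> list of scores on the 1-5 scale.
scores = {
    "paper_017": [1, 1, 1],
    "paper_042": [1, 3, 5],
    "paper_108": [2, 2, 5],
    "paper_211": [4, 4, 5],
}

easy_rejects = [p for p, s in scores.items() if all(x == 1 for x in s)]
arguable_rejects = [p for p, s in scores.items() if max(s) < 3]
disparate = [p for p, s in scores.items() if max(s) - min(s) >= 3]  # e.g., a 1 and a 4 or 5

print("Easy rejects (three 1's):", easy_rejects)
print("No score of 3 or above:", arguable_rejects)
print("Widely disparate scores:", disparate)
```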

Tuesday, January 06, 2009

Preparing for Class

It's that time of year when I have to get my next class ready. I could use your help.

For various odd reasons, in the coming semester I'll be teaching both my undergraduate Algorithms (and Data Structures) class [it's not parenthesized like that in the course catalog, but it probably should be] and my graduate "big data" class, dubbed Algorithms at the End of the Wire. For the graduate class, I cover a variety of topics -- last time, the main areas were search engine algorithms, compression, stream-based algorithms and data structures, and coding (network coding and otherwise) -- and for essentially each class the students are supposed to come in having read a couple of papers. One of the goals of the class is to present algorithmic theory and practice, so I emphasize trying to find practical papers that really make use of underlying theory, or theory papers that really try to impact how a practical problem is thought about. Another goal of the class is to introduce students to the ways of research, by having them read both ancient classics (like, say, Kleinberg's HITS paper) and more recent papers (e.g., from the last year or two).

I'd be interested in any suggestions for good papers for students to read -- particularly recent papers. Don't feel tied to the topics above -- I should be trying to keep up with good papers bridging the algorithm/theory divide in all areas!

Friday, January 02, 2009

Consistency in Scoring

Having just finished serving on the NSDI PC, and being hard at work on the STOC committee, I've had one issue on my mind: consistency in scoring papers. Ideally, when submitting to a conference, it shouldn't matter WHO reads your paper; your score should be intrinsic to your work. Of course, we don't expect -- or even necessarily want -- the complete ideal; there will be differences of opinion that occur for natural reasons. Arguably, for example, "visionary" papers are hard to judge and tend to lead to differences of opinion. So you can take solace that if you get widely varying reviews and your paper is rejected, your paper was probably just too visionary. The more cynical among us might instead think that large variances in opinion have to do with how close the reviewer is to the subfield of the paper. On the theory side, my take is that referees within a subarea generally give higher marks to papers in that subarea. (At NSDI, I found myself with a different impression; if anything, I thought referees working in a subarea tended to be a bit harsher towards papers in that subarea!)

My impression historically has been that there's more variance on the theory side of the world than on the systems side, particularly for the big conferences like FOCS/STOC/SODA. I was amazed, on the whole, at the consistency of NSDI reviews for the papers I was assigned. STOC reviews are coming in, and I'm looking forward to seeing how consistent things are there, but my early impression is that there's not the same consistency, and it's interesting to hypothesize why. [Perhaps, after it's all over, I'll attempt to gather some real quantitative data on the issue, to see if my impressions match reality.]
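If I do gather that data, the comparison I have in mind is simple: something like per-paper score spread or standard deviation, averaged over each conference's submissions. A quick sketch in Python, with made-up numbers standing in for the real review scores:

```python
import statistics

# Hypothetical data: per-paper review scores for two (anonymized) conferences.
reviews = {
    "Conference A": [[3, 4, 3], [2, 2, 3], [5, 4, 5]],
    "Conference B": [[1, 4, 5], [2, 5, 2], [3, 1, 5]],
}

for conf, papers in reviews.items():
    spreads = [max(s) - min(s) for s in papers]          # max disagreement per paper
    stdevs = [statistics.pstdev(s) for s in papers]      # per-paper score standard deviation
    print(f"{conf}: mean spread = {statistics.mean(spreads):.2f}, "
          f"mean per-paper stdev = {statistics.mean(stdevs):.2f}")
```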

I'll throw out a straw man for people to discuss (and attack). In theory, all (reasonable) papers start with the same basic grounding -- you have proofs of some theorems. After that point, though, things get a bit hard to judge. How "nice" are your theorems, in terms of technique/mathematical beauty/novelty? How "big" is your improvement over previous work -- for example, is improving an O(log n) competitive ratio to O(log n/log log n) "important" enough? How practical is your solution (hard to say, when nobody actually implements algorithms or gives performance results...)? A lot of this seems subjective, so naturally there are differing opinions.

In networking or other more practical fields, the same issues -- technique/beauty/novelty, quantity of improvement, practicality -- all come into play. But there's a lot more focus on the bottom line of "Could this lead to a substantial improvement for an important real-world problem?" Perhaps this consistency in the criterion for what makes a good paper leads to more consistency in reviews?