A person made of money is worth $1,000,000

After watching the Geico commercial about a man made of money, I asked my wife: “If a person were actually made of $100 bills, how much would that be?” Neither of us knew the answer.

So I set out to find out! First, getting the volume of a US bill was easy through Google. According to that source, it’s 6.14 inches × 2.61 inches × 0.0043 inches = 0.06890922 cubic inches.

Now, what is the volume of an adult human? After doing some searches that gave me “volumes of urns for burnt human remains” and “blood volumes”, I happened across a nutrition paper on variations in body volume.

Human Body Density and Fat of an Adult Male Population as Measured by Water Displacement, H. J. KRZYWICKI and K. S. K. CHINN
Am J Clin Nutr April 1967 vol. 20 no. 4 305-310

Okay, now we are in business. The volume of an adult ranges between 42 and 75 L. I figured 60L was a reasonable midpoint. Thus, an adult made of money is worth:

  1. $53,000 (using $1 bills)
  2. $1,060,000 (using $20 bills)
  3. $5,300,000 (using $100 bills)
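For the curious, here’s the back-of-the-envelope arithmetic as a few lines of Python (my own sketch, assuming the 60 L body volume and the bill dimensions quoted above):

```python
# Rough check of the figures above: how many bills fit in a 60 L adult?
bill = 6.14 * 2.61 * 0.0043        # volume of one US bill in cubic inches (~0.0689)
body = 60 * 1000 / 2.54**3         # 60 liters converted to cubic inches (~3661)
n_bills = body / bill              # roughly 53,000 bills

for denom in (1, 20, 100):
    print("$%d bills: about $%d" % (denom, n_bills * denom))
# prints roughly $53,000, $1.06 million, and $5.3 million
```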

I used the $20-bill figure (roughly $1M) in the title, since the $20 is the most common high-denomination bill.

So you can now tell your friends when you are worth more than if you were made of money.


More thoughts on posters

I wanted to write about what I think works and fails with posters. There was a poster session at Purdue coming up to celebrate our 50th anniversary as a CS department, so I figured I should try one more experiment—with my students—to see if I still ran into the same problems.

I helped them prepare three posters to show off some of the work that we’ve been doing with tensor computations, MapReduce, and fast algorithms for cliques. The posters were placed throughout our department’s building and conceptually grouped into a few areas, although this grouping wasn’t perfect.

Also, I insisted that we have a handout to accompany each poster. It’s a two-page abstract of the work that hits some high points. For two of the posters there wasn’t a formal paper ready yet, so we couldn’t hand out papers, and I think this is key.

So what worked?

Well, the localized placement wasn’t ideal. There are regions of our building that aren’t good for circulation, and there wasn’t enough of a draw from the posters to get too many people to see them all. But one of the fundamental challenges of posters is establishing a compromise between a high-level overview of the ideas, which is what you want for a casual browser, and some of the deeper technical points, which is what you’d want for an expert. Getting this tension right in a talk is hard enough, but with the equivalent of 4-8 slides of material, it’s really hard.

So let me take a stab at what I think a poster session should look like at a SIAM conference, with an eye toward addressing the “too-many-minis” problem. This list gets a little rambly, but hey, it’s a blog post!

Bring out the big guns

You need well-known people at the poster session, presenting posters, in order to draw in the attendees. This suggests that each poster area ought to have some type of headline poster — something like a plenary session. Or heck! Why not insist the plenary speakers also give a poster? This will certainly provide a draw and a great chance to interact with the speakers.

Make posters first-class citizens

Rather than getting a poster board (something cheap and easy for the conference organizers), I think poster presenters ought to have a 40-60″ LCD display and a whiteboard! Mini-symposium speakers get a projector, so why should poster presenters be limited by “flat-page” technology?

I’ve seen people pin up iPads to the screen, but that’s … just a hack.

This also helps to fix one of the problems with scaling the depth of the presentation. Everyone should always have backup slides with those technical nuggets in case they get asked. The same now goes for posters too!

Organize, organize, organize!

In some of the Twitter comments on my entreaty about not having posters, Jason Riedy noted that posters work well in focused groups. I couldn’t agree more! However, this only works when there is something to get people in the room in the first place (see above).

So, the idea would be to propose groups of themed posters — just like mini-symposia — that will all be co-located. And these groups should have some type of headline presentation or poster to provide a draw.

Have a take-away.

Each poster should have a 2-page handout or a card that will allow people to get more information on the poster.

Mini-symposia -> Mini-poster-conference.

Wouldn’t it be cool if the organizer of a mini-symposium got to present in front of the entire conference, telling you why you should go to their mini-symposium? Given the number of mini-symposia, that wouldn’t scale, but if we could organize mini-poster-conferences, then maybe, just maybe, the organizer could get 5 minutes in something like a plenary session.

Video intros!

Wouldn’t it be cool if every poster had a 2-3 minute video intro, posted on YouTube, that you could watch before the session? Cool, you say, but who would actually watch them? (They tried showing these during a plenary session at KDD2012, and it wasn’t an overwhelming success.) So maybe have a few of those “video-display” units with private audio where people could watch them within a session.

In conclusion, I think there is a huge opportunity for elevating poster presentations beyond the current standardized session! But please, no vanilla poster sessions.


NSF Annual Reports on research.gov

I just filled out my first NSF annual report.  In addition to chatting with some of the Purdue faculty about it, I found this link helpful:

http://blog.computationalcomplexity.org/2009/06/nsf-annual-report.html

in particular, the two examples listed there were great!

Along those lines, it’d be great to see updated examples for the new research.gov site!

I’d offer to share mine, but I’d like to learn a bit more about the process first.


Please, no posters!

I got back from SIAM CSE 2013 last Friday.  I was there for three days and there were around 20 parallel mini-symposia, each with 4 talks, three times a day.  Eek!  This stream-of-consciousness post is meant to lay out a few thoughts in a disjointed fashion. Notes in italics are my own reflections.

In total over the five days, there were 274 mini-symposia, or 1096 talks.  If we assume that each only had 3.5 talks due to last-minute cancellations, that’s still 959 talks. There were 12 contributed talk sessions, each with 6 talks = 72 talks.

The speaker index had 3435 entries (counting repetitions). There were 195 speaker entries for the poster session, 173 entries for contributed talks, 592 organizer entries, and 1560 co-author entries. Of the co-authors, 124 were for posters. The index also listed 178 cancellations. This means a very rough estimate of the number of folks speaking was:

3435 – 178 – 592 – 1560 = 1105

Which seems about right, as the previous estimate was 959 talks + 72 contributed + (195 – 124) posters = 1102 “things”.
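If you want to check the arithmetic, here it is as a few lines of Python (nothing fancy; these are just the counts quoted from the speaker index above):

```python
# Estimate 1: speaker-index entries minus cancellations, organizers, and co-authors.
entries, cancelled, organizers, coauthors = 3435, 178, 592, 1560
print(entries - cancelled - organizers - coauthors)     # 1105

# Estimate 2: mini talks + contributed talks + posters.
mini_talks = 274 * 3.5                                  # assume 3.5 talks per mini
contributed = 12 * 6
posters = 195 - 124                                     # poster entries minus poster co-authors
print(mini_talks + contributed + posters)               # 1102.0
```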

That’s a big meeting!

I heard a rumor the attendance was around 1200. Twitter now says 1300! (As of 9:11am EST 2013-03-05, Twitter now says 1400!)  If true, should that be worrisome?
attendance = c * speakers
for some small c ~ 1, which means that almost everyone is coming to speak!

Is this a good thing? At least it’s something to keep in mind. This brings me to my next point.

I heard that the organizers realize that having 20 parallel sessions is a problem. Supposedly, this was discussed at the business meeting, which I didn’t attend because it conflicted with the dinner Paul and I organized with our mini-symposia. I understand that increasing the number of posters is one of the possible solutions. Now, I’ve mentioned this privately to a few folks, but let me say on the record how much I dislike posters.

I dislike preparing them, I dislike giving them, and I dislike attending the sessions.

Why?

  1. Posters never fit all the info I want. Maybe that’s okay; slides don’t either.
  2. I can’t layer a story into a poster and keep any technical info (or it’s just too time-consuming to do this …); and no one else seems to do any better.
  3. Whenever I walk around a poster session, the good ones are always crowded, and because of (2), it’s impossible to follow a poster unless the author is walking you through it.
  4. Too many people just print out slides and make them a poster.
  5. Posters are hard to travel with. At KDD2012, the conference printed the poster for us! That was nice. Also, the format was standardized so all the posters looked the same.

I could go on. The advantage everyone always gives for posters is that they let you interact with the author. This is an advantage! However, I find it so hard to learn about areas I don’t already understand via posters that I’m not sure it’s as helpful as claimed. Why aren’t people studying this? It seems like the NSF ought to be funding research on how scientific research is presented, given how much science they fund! Do they already?

Let me get back to the SIAM conference for a second and discuss a major difference between mini-symposia and posters. This has to do with endorsement and topicality.

When I organize a mini-symposium, I try and vet the work in some sense. Thus, I’m endorsing the talks at my mini. Now, I don’t have a senior reputation in the field yet, but I’ve gone to watch entire mini-symposia just because of who organized them, even if I wasn’t entirely interested in the topic. If the goal is to move away from mini-symposia, then it seems like this is a critical feature to retain!

The second is topicality. It’s nice to attend a mini because all the people who work in the same area tend to be there too. This also means it’s a good place to meet students interested in the area.

Neither of these is present in a standard poster session. But they could be!

On a more positive note, I’ll outline some thoughts on what a modern poster session should look like soon.


Thoughts on Social Media usage in Science, Math, CS, etc.

I’m giving an interview soon on the topic of Social Media in science, collaboration, applied math, etc. at the SIAM CSE conference.

I got the prompts for the questions ahead of time and thought I’d throw my thoughts up on the blog, post it on Twitter, and see if I get any good feedback.

There were 5 prompts:

1. What would you say are the best uses of social media with regard to scientific discussion and information dissemination?

I feel there are two answers to this question that depend on the type of social media. Blogs are a great place to have in-depth technical discussions or exposés.  Twitter is a great place to post short announcements, news, and have quick back-and-forth idea exchanges.  Email is a much better platform for long-term and detailed technical collaboration (yes, email is a type of social media as far as I’m concerned).

So the way I think of social media is basically by way of analogy to the activities at a conference.

blogging <=> presenting a paper

twitter <=> quick conversations with people

email <=> private 1-1 technical discussions

So the best uses of social media are no different than the best uses of socializing at a conference.  They are to learn about what’s happening with the field, what people are working on, and what they are thinking about.  The biggest difference is that you can do all of these things without attending a conference or even knowing the people you interact with!  

There are some other forms of social media that are changing the landscape slightly.  For instance, 

stack-exchange <=> focused workshop of experts

The kind of question I’d put on Stack Exchange is a particular sticking point in a broader research agenda, e.g. here is a technical lemma I’d like to be able to show, but I’m not sure it’s possible. Rather than asking friends at a conference, try posting it to Stack Exchange!

2. What are the advantages of research-related conversations carried out online as opposed to one-on-one conversations, especially at conferences or other scientific events?

I draw a big distinction between research carried out online — such as the polymath projects — and research discussions carried out online.  

I’m not sure that research discussions are best conducted in a public forum, unless the goal is to involve others, as in the polymath projects. For instance, if you run into a technical problem with an approach and think someone else might be able to solve it, I think that’d make a great little blog post or Stack Exchange question.  These are the kinds of things you’d hear about talking with people at conferences but that don’t make it into papers because the approaches usually don’t work.

What we gain from having these materials online is that we keep a record of some of the research products that don’t make it into final drafts, but might greatly help out others! Wouldn’t it be fantastic to be able to do a Google search and find that your current approach won’t work?

3. Have social media and online networks helped you in finding potential collaborators or researchers doing similar types of work, and if so, how?

Yes, by virtue of the friends-of-friends phenomenon, I learn about new individuals doing great work in areas I’m interested in.  Twitter is especially good for these types of activities.

4. Do you see social media as a potential platform to spread awareness about the value of applied mathematics to the younger generation and/or the general public?

I think social media is a great way to interact with people in nearby disciplines or in related lines of work. Many of the individuals I interact with on Twitter are involved in startups and other types of development jobs.  These folks are phenomenally talented but aren’t involved in the research community. I don’t go to the same conferences that they do, but I find it useful to keep track of what problems they are encountering. While this hasn’t happened yet, I’m hoping that this will lead to some interesting applications or tech transfer opportunities.

One idea that made its way into a grant proposal I worked on, in terms of broader impact, was sparked by a Twitter exchange I had. Again, this really isn’t any different than getting ideas from talking to people at conferences; it just removes the physical and temporal co-location requirements.

5. Would you encourage mathematicians to use informal platforms, such as blogs, to share their research work and stimulate discussion and insights from readers, and if so, why?

See above. I’d encourage them to post things that they have abandoned along with their reasons why. I’ve seen many people start tackling problems in the same area and sometimes the ones that succeed manage to sidestep all of the issues that others ran into by making slightly different assumptions.  

I’d encourage people to write up their research informally on blogs.  This is usually how they end up presenting it, but not everyone can see the presentation. These informal introductions and explanations of the work are often crucial for inexperienced researchers to gain intuition about a problem, an approach, or an area.  Again, these aspects are usually removed from the paper, and I see these platforms as a way to publish that content for others.


FERPA isn’t so bad!

If you haven’t heard of FERPA, check Wikipedia.  Okay, done?

Purdue must comply with FERPA. I’ve been trained on it and know that I must keep academic records of students private from their parents, from other students, etc.

Currently, I’m teaching a class of 20 without a TA. I’d like to randomly assign pairs of students to grade each other’s assignments. I would have thought that this would be a violation of the FERPA rules, as the student records would no longer be private.

Turns out the government thought of that!

Comment: We received several comments supporting the proposed changes to the definition of education records that would exclude from the definition grades on peer-graded papers before they are collected and recorded by a teacher. These commenters expressed appreciation that this revision would be consistent with the U.S. Supreme Court’s decision on peer-graded papers in Owasso Independent School Dist. No. I-011 v. Falvo, 534 U.S. 426 (2002) (Owasso). Two commenters asked how the provision would be applied to the use of group projects and group grading within the classroom.


Discussion: The proposed changes to the definition of education records in paragraph (b)(6) are designed to implement the U.S. Supreme Court’s 2002 decision in Owasso, which held that peer grading does not violate FERPA. As noted in the NPRM, 73 FR 15576, the Court held in Owasso that peer grading does not violate FERPA because “the grades on students’ papers would not be covered under FERPA at least until the teacher has collected them and recorded them in his or her grade book.” 534 U.S. at 436.


As suggested by the Supreme Court in Owasso, 534 U.S. at 435, FERPA is not intended to interfere with a teacher’s ability to carry out customary practices, such as group grading of team assignments within the classroom. Just as FERPA does not prevent teachers from allowing students to grade a test or homework assignment of another student or from calling out that grade in class, even though the grade may eventually become an education record, FERPA does not prohibit the discussion of group or individual grades on classroom group projects, so long as those individual grades have not yet been recorded by the teacher. The process of assigning grades or grading papers falls outside the definition of education records in FERPA because the grades are not “maintained” by an educational agency or institution at least until the teacher has recorded the grades.


From the Department of Education

Awesome! We’ll determine if people want to do this soon 🙂


Networks & Matrices @ Supercomputing

I’ll be collecting some posts while I’m at Supercomputing here … given my background, I’ll be focusing on the network and matrix algorithms.

Direction-optimized BFS by Scott Beamer and the Berkeley ParLab

They use a combined push and pull BFS scheme to accelerate a BFS on a graph.  This is a very good idea — I’m surprised it hasn’t been used before, as I thought it was the standard approach for fast shortest-path queries.  That is, you grow from both end-points and then see where they meet.  Of course, they call it top-down vs. bottom-up.  The push-pull terminology I’m using comes from gossip algorithms.  To be clear, the idea is to push the frontier out initially, then to pull from the frontier in an interim region of the BFS expansion (when the frontier gets large, unvisited nodes are likely to have a frontier neighbor), then finally, push the BFS expansion out again to handle the stragglers.
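To make the push/pull switch concrete, here’s a rough Python sketch of the idea (my own simplification, not the authors’ code; the real heuristic switches based on the number of edges incident to the frontier, and the data structures are far more careful than a dict of neighbor lists):

```python
def hybrid_bfs(adj, source, alpha=14):
    """Direction-optimizing BFS sketch: push when the frontier is small,
    pull when it is large.  adj maps each vertex to a list of neighbors;
    alpha is a made-up switching threshold."""
    dist = {source: 0}
    frontier = {source}
    level = 0
    n = len(adj)
    while frontier:
        level += 1
        next_frontier = set()
        if len(frontier) * alpha < n:
            # push (top-down): expand outward from the frontier
            for u in frontier:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = level
                        next_frontier.add(v)
        else:
            # pull (bottom-up): each unvisited vertex looks for a frontier neighbor
            for v in adj:
                if v not in dist and any(u in frontier for u in adj[v]):
                    dist[v] = level
                    next_frontier.add(v)
        frontier = next_frontier
    return dist
```

The win comes in the middle levels of a small-world graph: the frontier touches most of the unvisited vertices, so asking each unvisited vertex “do you have a frontier neighbor?” is much cheaper than pushing along every edge out of the frontier.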

Graph 500

The new Graph 500 list is out.  Using one million cores, you can do a BFS at 15 trillion edges per second on a graph with one trillion vertices and 32 trillion edges — if I’ve understood correctly.

See Jason Riedy’s proposal for a new shortest-path benchmark for more info about a proposed future addition to the Graph 500 benchmark.

One issue with Graph500 is that algorithmic performance improvements are allowed.


Why MapReduce is successful: it’s the IO!

I’ve done quite a bit of work with Hadoop and MapReduce in the past year or so.  

MapReduce offers an attractive, easy computational model that is data scalable. From the Dean and Ghemawat paper:

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

The idea espoused here is that the simplicity of the programming model makes it easy for people to get up and running with MapReduce and do parallel data processing.  I have no doubt that this is true for what amounts to parallel log processing to build simple subsets.  However, MapReduce should also be able to solve more complicated problems such as Google’s PageRank computation over the web graph. When I’ve tried MapReduce for these tasks — or other things like computing TSQR factorizations, or even just data conversions — I find that it’s really slow.  We are talking about hours to process a few terabytes of data on 10s of machines.

I’ve also read and reviewed many academic papers that basically hack around MapReduce’s limited communication and computational framework in order to do something faster.  So this makes me think that MapReduce’s model is far too simple for most tasks.  Given the lengths that people will go to (myself included!) in order to implement algorithms in MapReduce, what problem does it really solve?

I claim it’s the distributed IO problem.

Let me back up and explain why this is a problem.  My educational training was all in standard HPC frameworks.  These are almost exclusively driven by MPI.  In 2004, I wrote a distributed MPI-based computation for Google’s PageRank vector on an MPI cluster that Yahoo! Research Labs had just purchased.  Conceptually, this was actually pretty easy.  I used the PETSc package from Argonne National Labs to do all the parallel communication; all I had to do was to load the graph into their internal storage format.

This single task, loading the data, took about 1000 lines of C code, and this was for reading from one machine and distributing to the entire cluster. It did not include any parallel IO!  In MapReduce, this same task is zero lines of code. For a variety of file formats, this task is already done for you.
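To make the “zero lines of code” point concrete, here’s roughly what a Hadoop Streaming job looks like in Python, say an out-degree count over an edge-list file (the file layout is made up for the example). Splitting the input across machines, feeding each mapper one record at a time on stdin, and sorting the intermediate keys before the reducer sees them is all the framework’s job:

```python
#!/usr/bin/env python
# mapper.py: the framework hands us one line of our input split at a
# time on stdin; each line is assumed to be a "src dst" edge.
import sys

for line in sys.stdin:
    parts = line.split()
    if len(parts) == 2:
        print("%s\t1" % parts[0])      # emit (source vertex, 1)
```

And the matching reducer, which sees the mapper output already grouped and sorted by key:

```python
#!/usr/bin/env python
# reducer.py: sum the 1s for each vertex in a single pass.
import sys

current, count = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current:
        if current is not None:
            print("%s\t%d" % (current, count))
        current, count = key, 0
    count += int(value)
if current is not None:
    print("%s\t%d" % (current, count))
```

You launch this with the streaming jar and its -input, -output, -mapper, and -reducer flags; compare that with the 1000 lines of C above.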

So my view on MapReduce is as follows:

The key success was using the map-stage to implement the parallel IO routine.  The whole shuffle and reduce business ought to be a pluggable component.  I would love to be able to “map” data from disk into an in-memory MPI job in order to blast through the problem on a small number of nodes.
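Here’s a toy sketch of what that could look like with mpi4py (my own illustration, not an existing system; the file name and line-oriented format are made up). Each MPI rank “maps” its own byte range of a shared file into memory, and then the job carries on as an ordinary in-memory MPI computation:

```python
from mpi4py import MPI
import os

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

path = "edges.txt"                        # hypothetical line-oriented input
nbytes = os.path.getsize(path)
start = rank * nbytes // size             # this rank's byte range
end = (rank + 1) * nbytes // size

local_edges = 0
with open(path, "rb") as f:
    if rank > 0:
        f.seek(start - 1)
        f.readline()                      # skip ahead to the next full line
    while f.tell() < end:                 # handle lines that start in our range
        line = f.readline()
        if not line:
            break
        local_edges += 1                  # the "map": parse one local record

# From here on it is an ordinary in-memory MPI computation.
total = comm.allreduce(local_edges, op=MPI.SUM)
if rank == 0:
    print("total edges: %d" % total)
```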

I’d love to see this paradigm developed into a hybrid MapReduce/MPI system that I think could enable some really interesting work.



A LaTeX template for handout cards for SIAM minisymposia

Ever been to a SIAM conference?  If you have, then you know there is a lot going on at them. Previously, I’ve helped write a guide on how to get the most out of a SIAM conference.  If you haven’t, let me give you the basic setup.  A SIAM conference has three types of sessions: plenary, mini-symposium, and contributed. Each is 1-1.5 hours. There are usually only one or two plenary talks at the same time, but there may be dozens of parallel mini-symposium sessions — commonly called “minis”. Each mini has 3-5 talks, which are 25 minutes each. Picking an interesting one is always a challenge.

Paul Constantine, one of my collaborators, came up with an interesting solution. If you are organizing a mini-symposium, take responsibility for advertising it at the conference.  His method of doing so was to prepare small, business card-like handouts and distribute them to people you run into at the conference, in line for coffee, etc.

The info on each card was:

  • (Front) The day-of-week, the time of the mini, the title, the organizer info, and the conference.
  • (Back) A list of the titles and speakers, along with their time-slot.

Once at the conference, you can just hand these out to people who might be interested in attending the mini; and you can give them to the speakers in the mini to distribute them as well!  This is a great way of trying to build attendance for your mini.

To help others prepare these, I created a LaTeX template file that generates a two-page sheet of business cards.

https://gist.github.com/2936930

The template has the info filled in from my next mini-symposium.

I think this can be improved! So as a challenge to any interested readers, please fork the gist, make your improvements and post them back here. If you do use it, please post a comment. I may improve it on my own too and that’ll help motivate me to keep in touch about it.


Want to use metis 5.0 with Matlab? Try the new metismex!

The underlying C API for Metis changed considerably in the new version 5.0 that was released within the last year.  I had previously updated the metismex interface to work with an alpha release of Metis 5.0 and posted it on GitHub.  However, the final release had a ton of new changes to the API.

So if you need to use Metis to partition graphs or meshes, or to compute nested dissection orderings, go check out my GitHub repo for metismex.  It’s still a little rough on how to use it, but I hope those who are motivated can figure it out with the help of the readme.
