## Get Matlab to email you when it’s done running!

This is a guest post by Kyle Kloster, one of the students I’m working with at Purdue University. Kyle has been doing some large, long-running experiments on big graphs. He writes:

The random-walk transition matrices for Twitter, LiveJournal, and Friendster can take several minutes just for Matlab to load. The actual experiments we carry out (computing columns of the matrix exponential) can take hours, and this week I got tired of checking in every 10 minutes to see if Matlab had finished (or crashed).

We asked around a bit on Twitter to see if anyone knew how to have Matlab email you when something finishes. Thanks to @jimfonseca for pointing us to the perfect Matlab documentation on sending email. (And also, this documentation on how to use gmail to send email in Matlab.) We used this documentation to build the matlabmail function.

MATLABMAIL( recipient, message, subject )

sends the character string stored in 'message' with subject
'subject' to the address in 'recipient', from the email
address stored in the file. This requires that the
sending address is a GMAIL email account.

View matlabmail.m

One troubling part is that you need to tell Matlab the password to the account that you want to use as “sender”. David suggested creating a dummy gmail account that the whole research group could use as a sender — this way we can use a randomly generated password for one shared account to spare each of us having to use personal login info.

Now one line of code at the end of a Matlab file alerts me when a trial is over!

matlabmail(recipient, message, 'subjectline here');

It’s even possible to attach files using these commands, in case you’d like Matlab to send you a JPEG containing the results from your latest experiments.

Note that the function works for gmail accounts only, and it won’t work for gmail accounts that have “2-step verification” enabled.

## The data computer I want

This post is pure speculation, but these are notes on something I’ve been thinking about for a while.

Here’s the computer I want to get for “medium data” matrix computations ~50TB.

• Quad 8-core Intel E5-48XX processors + motherboard ($10k)
• 1TB memory ($30k)
• 2x Mushkin Scorpion Deluxe 2TB PCIe Flash Drives ($5k)
• 45x 4TB Hard Drives + SATA expanders ($10k); 80TB after RAID 10.
• 2x Nvidia GPU or Xeon Phi for extra horsepower ($5k), as the CPUs are a little light.

In total, that’d be about $60k. The machine would be pretty hefty at processing data. It’d run circles around a Hadoop cluster with 30 nodes (8 TB / node in 4x disks = 80 TB after replication) for any non-trivial task. The IO bandwidth from the flash cards is about 2GB/second each (byte, not bit). From the hard-drive array, you should be able to get a little bit less; each drive puts out about 100 MB/sec.

I’m not sure how to configure the RAID array. It seems like RAID-5+ is frowned upon for arrays with large drives as the rebuild time is too long. So this would be RAID-10, I guess. This config is the part I’m least sure about.

Why not Hadoop? Hadoop is great for 1PB+ ETL tasks, parallel grep!, or anything else that looks like a “read enormous file and output small data” task. It’d also be a good way to pipe data into this mini computer where you could work on 50TB chunks and do something real with each of them.

That said, I still think MapReduce would be a great way to program that single machine. Something like Phoenix++ would make it pretty easy to take advantage of all that IO power and optimize it across all the cores and NUMA regions.

Why this system? Jim Demmel is right that communication is the dominant bottleneck of modern systems. This computer is designed to optimize the IO pathway to get as much data from ~50TB of secondary storage to main memory as quickly as possible. What to do once it’s there is up to you… You even have a small 4GB/sec write cache for intermediate results.
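To put the IO argument in numbers, here is a back-of-the-envelope sketch of how long one pass over ~50TB would take. It assumes the bandwidth figures quoted in this post are sustained sequential rates and uses decimal units; the function name `scan_hours` is just for illustration, and real throughput would depend on the RAID layout and controllers.

```python
def scan_hours(data_tb, bandwidth_gb_per_s):
    """Hours needed to stream data_tb terabytes at a sustained rate in GB/s (decimal units)."""
    seconds = data_tb * 1000.0 / bandwidth_gb_per_s
    return seconds / 3600.0

# Figures quoted in the post; treating them as sustained rates is an assumption.
flash_gb_per_s = 2 * 2.0   # two PCIe flash cards at about 2 GB/s each
disk_gb_per_s = 45 * 0.1   # 45 drives at about 100 MB/s each (a ceiling that ignores RAID overhead)

hours_at_flash_rate = scan_hours(50, flash_gb_per_s)
hours_at_disk_rate = scan_hours(50, disk_gb_per_s)
print(f"50 TB at {flash_gb_per_s:.1f} GB/s: {hours_at_flash_rate:.1f} hours")
print(f"50 TB at {disk_gb_per_s:.1f} GB/s: {hours_at_disk_rate:.1f} hours")
```

Either way, a full scan is a matter of hours, which is why the storage pathway rather than the CPUs sets the pace for this kind of machine.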

I’ve seen a few systems like this. John Canny has one in his BID data program. Guy Blelloch has another at his group.

## SVD on the Netflix matrix (Part 2)

About a week ago, we saw some basic performance stats on computing the SVD of the Netflix matrix using Matlab’s internal routines and the Propack software.

One of the comments suggested trying ipython, scipy, and numpy. So we did!

netflix_svd.py is up on the gist. Please see the previous post for more detail on what this is doing.

Thanks to Yangyang Hou for running these so quickly! In this case, we found that propack returned the correct singular values. Not sure what is going on with the matlab interface there!

# PROPACK
k    seconds
10   36.9860050678
25   78.114607811
50  150.511465788
100 328.731420994
150 500.544333935
200 719.040390968

# ARPACK (A^T A)
k    seconds
10 51.504776001
25 103.392450094
50 182.359881163
100 436.23743701
150 590.644889116
200 821.440295219

# SVDLIBC
k    seconds
10 74.2832891941
25 134.539175034
50 235.082634926
100 477.956938028
150 732.327076912
200 988.136811972

## SVD on the Netflix matrix

Here, we consider three implementations of computing the SVD of the netflix matrix.

Just to recap, the matrix has 17770 rows, 480189 columns, and 100480507 non-zeros. We are also considering the sparse SVD that treats the missing entries as 0, not the matrix-completion SVD that treats the missing ratings as missing. (It’s unfortunate that these two, very different, problems are often confused.)

I’m using Matlab R2011a on a dual Intel Xeon e5-2670 computer with 256GB of RAM. Computing a rank 200 SVD takes about 2.34GB of memory (~760 MB for vectors, ~1.5GB for matrix). Given the way the algorithms work, there is usually a bit of overallocation, so let’s say 3GB of memory is reasonable.
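As a rough check on where those memory numbers come from, the sketch below counts bytes under two assumptions that are not stated in the post: double-precision values, and compressed-column sparse storage with 8-byte indices. The helper name `svd_memory_bytes` is made up for this example. Under these assumptions the two pieces land close to the ~760 MB and ~1.5GB quoted above; the sum need not match the quoted total exactly, since that depends on what else is counted.

```python
def svd_memory_bytes(m, n, nnz, k, value_bytes=8, index_bytes=8):
    """Rough storage for a rank-k SVD of an m-by-n sparse matrix with nnz non-zeros."""
    # Dense singular vectors: U is m-by-k and V is n-by-k.
    vectors = (m + n) * k * value_bytes
    # Compressed-column storage: one value and one row index per non-zero,
    # plus n + 1 column pointers.
    matrix = nnz * (value_bytes + index_bytes) + (n + 1) * index_bytes
    return vectors, matrix

vectors, matrix = svd_memory_bytes(m=17770, n=480189, nnz=100480507, k=200)
print(f"vectors: {vectors / 2**20:.0f} MiB")
print(f"matrix:  {matrix / 2**30:.2f} GiB")
print(f"total:   {(vectors + matrix) / 2**30:.2f} GiB")
```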

(See Part 2 for info on using ipython and numpy and scipy)

If we just use Matlab’s svds

[U,S,V] = svds(A,k);

Then we get the results:

k = 10 -> Elapsed time is 95.075653 seconds.
k = 25 -> Elapsed time is 151.247499 seconds.
k = 50 -> Elapsed time is 262.132427 seconds.
k = 100 -> Elapsed time is 589.469476 seconds.
k = 150 -> Elapsed time is 983.575712 seconds.
k = 200 -> Elapsed time is 1538.977824 seconds.

What Matlab’s svds routine does internally is compute the extremal eigenvectors of the matrix $\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$ using the ARPACK software. There are a few steps in this that exploit parallel computations.

We can alternatively compute the largest eigenvalues and vectors of the matrix $A A^T$, which squares the condition number and is usually a no-no in numerical analysis, but if we are solely interested in performance, this could be better. My adviser called this the “dreaded normal equations.” To do this, we use the Matlab eigs routine with a function

f = @(x) A*(A'*x);

So we don’t need to actually FORM the matrix $A A^T$. Again, this routine uses the ARPACK code, now via the function “eigs”:

f = @(x) A*(A'*x); m = size(A,1);
[V D]=eigs(f,m,k,'LA',struct('issym',1,'disp',0));

With this approach, we’d need a bit more post-processing to get the other set of singular vectors (eigs on $A A^T$ only returns the left ones), and the elements of D are the squares of the singular values.

k = 10 -> Elapsed time is 26.425276 seconds.
k = 25 -> Elapsed time is 47.842963 seconds.
k = 50 -> Elapsed time is 84.456961 seconds.
k = 100 -> Elapsed time is 166.463371 seconds.
k = 150 -> Elapsed time is 250.260487 seconds.
k = 200 -> Elapsed time is 335.170137 seconds.

But it’s much faster!
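For readers following along in Python rather than Matlab, here is a minimal sketch of the same $A A^T$ idea, including the post-processing step. It is not the code used for the timings above: it runs on a small random sparse matrix, the function name `svd_from_aat` is invented for the example, and it relies on SciPy’s `eigsh` (an ARPACK wrapper) with a `LinearOperator` so that $A A^T$ is never formed.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

def svd_from_aat(A, k):
    """Top-k singular triplets of A from the k largest eigenpairs of A A^T."""
    m, n = A.shape
    # Apply A A^T as two sparse matrix-vector products; the product is never formed.
    aat = LinearOperator((m, m), matvec=lambda x: A @ (A.T @ x), dtype=np.float64)
    evals, U = eigsh(aat, k=k, which="LA")
    # eigsh returns eigenvalues in ascending order; flip to descending.
    order = np.argsort(evals)[::-1]
    evals, U = evals[order], U[:, order]
    # Eigenvalues of A A^T are the squared singular values.
    sigma = np.sqrt(np.maximum(evals, 0.0))
    # Post-processing: recover the right singular vectors, v_i = A^T u_i / sigma_i.
    V = (A.T @ U) / sigma
    return U, sigma, V

# Small stand-in matrix (an assumption for the demo, not the Netflix data).
A = sp.random(40, 200, density=0.1, format="csr", random_state=0)
U, sigma, V = svd_from_aat(A, k=5)
sigma_ref = np.linalg.svd(A.toarray(), compute_uv=False)[:5]
print("max difference from dense SVD:", np.max(np.abs(sigma - sigma_ref)))
```

Dividing by `sigma` is only safe for singular values well above zero, which is one place where squaring the condition number can bite.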

Finally, there is a customized routine that does what Matlab’s svds routine does, but using the Golub-Kahan bidiagonalization procedure that implicitly is doing the Lanczos procedure on $\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$ but without forming that matrix or storing extra work. For this, we turn to the PROPACK software.

Ax=@(x) A*x; Atx=@(x) A'*x; [m n] = size(A);
[UD D VD]=lansvd(Ax,Atx,m,n,k(i),'L');
k = 10 -> Elapsed time is 10.205532 seconds.
k = 25 -> Elapsed time is 26.290835 seconds.
k = 50 -> Elapsed time is 44.544767 seconds.
k = 100 -> Elapsed time is 94.061496 seconds.
k = 150 -> Elapsed time is 152.148860 seconds.
k = 200 -> Elapsed time is 216.596219 seconds.

Faster still! Although, when we were looking at some of the singular values, they didn’t seem to match.

The last 10 singular values returned from ARPACK (either) and PROPACK are

ARPACK   PROPACK
834.8761 799.9475
834.7372 796.6092
834.3883 794.4793
833.5185 792.0514
832.5988 789.1563
831.0431 787.2585
829.8794 783.6587
829.5437 782.2349
828.1831 778.7559
827.0634 776.5972
825.3958 773.8257

This suggests we might need to study the tolerance used in PROPACK for an updated test.
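To make the Golub-Kahan idea concrete, here is a textbook-style sketch of the bidiagonalization in Python. It is not PROPACK’s implementation (PROPACK uses partial reorthogonalization and is far more careful); this version uses full reorthogonalization, a random starting vector, and a tiny dense test matrix, all of which are assumptions for illustration. The function name `golub_kahan_bidiag` is made up for the example.

```python
import numpy as np

def golub_kahan_bidiag(A, k, seed=0):
    """k steps of Golub-Kahan bidiagonalization with full reorthogonalization.

    Returns U (m x k), B (k x k, upper bidiagonal), V (n x k) with A @ V ~= U @ B.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = np.zeros((m, k))
    V = np.zeros((n, k))
    alphas = np.zeros(k)
    betas = np.zeros(max(k - 1, 0))

    v = rng.standard_normal(n)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        # alpha_j u_j = A v_j - beta_{j-1} u_{j-1}
        u = A @ V[:, j]
        if j > 0:
            u -= betas[j - 1] * U[:, j - 1]
        u -= U[:, :j] @ (U[:, :j].T @ u)  # reorthogonalize against earlier u's
        alphas[j] = np.linalg.norm(u)
        U[:, j] = u / alphas[j]
        if j + 1 < k:
            # beta_j v_{j+1} = A^T u_j - alpha_j v_j
            w = A.T @ U[:, j] - alphas[j] * V[:, j]
            w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)  # reorthogonalize against earlier v's
            betas[j] = np.linalg.norm(w)
            V[:, j + 1] = w / betas[j]
    B = np.diag(alphas) + np.diag(betas, 1)
    return U, B, V

# Tiny dense example: with k equal to the number of columns, the singular
# values of B should agree with those of A up to rounding.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
U, B, V = golub_kahan_bidiag(A, k=5)
sv_gk = np.linalg.svd(B, compute_uv=False)
sv_ref = np.linalg.svd(A, compute_uv=False)
print("max difference:", np.max(np.abs(sv_gk - sv_ref)))
```

Only products with A and A^T are needed, so the same loop works with a sparse matrix. With k much smaller than the matrix dimensions, the singular values of B only approximate the largest singular values of A, and how well depends on the stopping tolerance.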

The testing code is on github: netflix_svd.m.

My tremendous thanks to Yangyang Hou for helping with the experiments in this post and to Burak Bayramli for suggesting things that led to it.


## A call to update the Lake Arrowhead graph!

We’d like to produce a new version of this matrix with information from 2013. Once I have that, I’ll run a few matrix-based link-prediction techniques that Gene would have enjoyed and report the results. (You can see the preliminary results on the linked page.)

These have already borne fruit as they revealed a missing link between Boley and Golub (“A modified method for reconstructing periodic Jacobi matrices” Math. Comp. 42(165):143-150, 1984.)

Some other new links I found by looking at Gene’s CV

• Golub and Benzi
• Golub and Pan (is JY Pan the same?)
• Golub and Funderlic
• Golub and Strakos
• Golub and Chandrasekaran
• Golub and Moler
• Golub and Park (assuming Haesun)
• Golub and Hansen
• Golub and Gragg
• Golub and Starke

Some of these were predicted by the metrics! Awesome!

And these links were missing from the original!

• Golub and Boley
• Golub and Smith (assuming it was L. Smith at the meeting)
• Golub and Stewart (assuming it was G. W. at the meeting)

Key questions

• Who was the “Ng” at the meeting? Gene and Michael Ng authored a paper together, so that’d be another link if it was the same one — but this is a pretty common last name.
• Was it L. Smith at the 1993 meeting?
• Was it Pete Stewart at the meeting?

While I have a copy of Gene’s CV, the key issue is finding all the links between the other 103 people in the graph! So we need some help.

The full list of authors is

1. Golub
2. Wilkinson
3. TChan
4. He
5. Varah
6. Kenney
7. Ashby
8. LeBorne
9. Modersitzki
10. Overton
11. Ernst
12. Borges
13. Kincaid
14. Crevelli
15. Boley
16. Anjos
17. Byers
18. Benzi
19. Kaufman
20. Gu
21. Fierro
22. Nagy
23. Harrod
24. Pan
25. Funderlic
26. Edelman
27. Cullum
28. Strakos
29. Saied
30. Ong
31. Wold
32. VanLoan
33. Chandrasekaran
34. Saunders
35. Bojanczyk
36. Dubrulle
37. Marek
38. Kuo
39. Bai
40. Tong
41. George
42. Moler
43. Gilbert
44. Schreiber
45. Pothen
46. NTrefethen
47. Nachtigal
48. Kahan
49. Varga
50. Young
51. Kagstrom
52. Barlow
53. Widlund
55. OLeary
56. NHigham
57. Boman
58. Bjorck
59. Eisenstat
60. Zha
61. VanHuffel
62. Park
63. Arioli
64. MuntheKaas
65. Ng
66. VanDooren
67. Liu
68. Smith
69. Duff
70. Henrici
71. Tang
72. Reichel
73. Luk
74. Hammarling
75. Szyld
76. Fischer
77. Stewart
78. Bunch
79. Gutknecht
80. Laub
81. Heath
82. Ipsen
83. Greenbaum
84. Ruhe
85. ATrefethen
86. Plemmons
87. Hansen
88. Elden
89. BunseGerstner
90. Gragg
91. Berry
92. Sameh
93. Ammar
94. Warner
95. Davis
96. Meyer
97. Nichols
98. Paige
99. Gill
100. Jessup
101. Mathias
102. Hochbruck
103. Starke
104. Demmel

If you are one of these authors and you’ve co-authored a paper with someone after the original Lake Arrowhead matrix from 1993, can you post a note either on my blog or on Cleve’s blog with any additional co-authors?

Alternatively, if you’d like to email one of us your CV, we’ll do all the work for you! I reported the updates from Golub’s CV already.

Update: I created a GitHub repository for these edits. If you are feeling brave, head to the arrowhead-graph repository and click on “arrowhead-new” (if the paper appeared in 1993 or afterwards) or “arrowhead-missing” (if it was missing from the original, i.e. a pre-1993 reference). Then click “edit”. You’ll have to sign up for an account (which is good to have anyway!), but then it’ll automatically edit the file and tell me your changes.


## Numerical Linear Algebra in Machine Learning

Notes from the Numerical Linear Algebra in Machine Learning Workshop

Here’s a quick summary of some highlights from my notes about the NLA in ML workshop at ICML. First, it was fantastic in terms of speakers and audience. There were lots of great questions that the audience interjected into the talks to clarify the ideas and all of the talks were about important topics.

See the workshop web-page for more about the ideas behind the workshop. Without further ado, my top 4 highlights:

• Peder Olsen talked about the “box product”, a variation on the Kronecker product that arises in his new, and useful, treatment of matrix calculus. I wish I had these notes for the last time I gave my lecture on matrix calculus! In brief, the box product is the Kronecker product after a perfect shuffle or stride permutation. I’ll give an example since
• Zeyuan Allen Zhu gave an overview of their new “almost linear time” method to solve Ax=b with a Laplacian matrix from a graph. It’s a really neat algorithm and closely exploits the relationship between a positive definite. Look up their paper and spend some time with it if you work with these systems. (My internet is bad at the moment and so I can’t look up the refs for the rest of the post. Things are googleable, I believe.)
• Nicolas Gillis spoke about recent work with NMF. I hadn’t seen the LPs before that people use to find NMFs under this separability condition. These are actually quite similar to some of the problems Paul Constantine looks at. See the Hottopixx paper.
• Michael Mahoney spoke about their 60-page revisitation of the Nystrom method with all sorts of goodies that are important in actually using these methods.

There were a ton of other great gems and I wish I had time to list them all. If your talk isn’t here, it’s because if I wrote all I wanted to, this note would never be done (or at least not in bounded time)! And thanks again to the organizers for a great session.

## How to use Matlab in a command line script or Makefile

This post is really for me as I managed to forget, misplace, or lose my notes on the last time I did this activity.

If you wish to run a matlab m-file from the command line or from a Makefile, here is the best way I’ve found to do it:

#!/bin/sh
# This version shows additional error output
script=$(basename "$1" .m) # this will return $1 unless it ends with .m
if [ "$script" = "$1" ]; then
  cmdterm=, # no .m given: treat as a statement and show its output
else
  cmdterm=\; # .m given: run the script and suppress the "ans = " output
fi
shift # remove the first argument
args="$@"
argsq=$(echo $args | tr '"' "'") # Matlab wants single-quoted strings
matlab -nodisplay -r "disp('BEGIN>>'); try, $script $argsq $cmdterm catch me, fprintf(2,getReport(me)); exit(1); end, exit(0)" -nosplash | sed -e '1,/^BEGIN>>$/ d'

This extends the notes from the Informatics Bridging Team in a way to show information on the error.

Here are some quick examples with the example files at the matlabcmd gist

dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd load_file.m mydata.txt
10
dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd load_file mydata.txt
10
ans =

10

dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd "randn(5)" # will display the output
ans =

0.5377 -1.3077 -1.3499 -0.2050 0.6715
1.8339 -0.4336 3.0349 -0.1241 -1.2075
-2.2588 0.3426 0.7254 1.4897 0.7172
0.8622 3.5784 -0.0631 1.4090 1.6302
0.3188 2.7694 0.7147 1.4172 0.4889

dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd with_error.m # shows the error
Error using +
Matrix dimensions must agree.

Error in with_error (line 4)
c = a + b;
dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd "randn(5) + randn(6)" # shows the error
Error using +
Matrix dimensions must agree.

Here’s a list of things that could make it better.

• Optional output: right now, this happens if you remove “.m” from the first file name, in which case it interprets your command as a matlab statement directly. In this case, we show the output with the “ans = ” intact.
• Error handling? I think it does a good job here, but I haven’t tested extensively.
• Others? Are there big issues with this command?
• Any better solutions out there already? (I looked briefly and didn’t find any!)

## Creating high-quality graphics in MATLAB for papers and presentations

This blog post is a bit different. Tammy Kolda originally drafted the following material to explain how to generate good looking Matlab figures. It was based on some notes I had sent her. After discussing it further, we decided to write up a mini-tutorial on this and post it to the blog here. We actually wanted the blog format in order to encourage discussion about it! So please, suggest improvements! We’ve also made the post available as a Matlab file, and github repository for further collaboration and improvements.

(*) Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

## A simple figure that is hard to view

Here we show a normal image from MATLAB. This example has been adapted from YAGTOM (http://code.google.com/p/yagtom/), an excellent MATLAB resource.
f = @(x) x.^2;
g = @(x) 5*sin(x)+5;
dmn = -pi:0.001:pi;
xeq = dmn(abs(f(dmn) - g(dmn)) < 0.002);
figure(1);
plot(dmn,f(dmn),'b-',dmn,g(dmn),'r--',xeq,f(xeq),'g*');
xlim([-pi pi]);
legend('f(x)', 'g(x)', 'f(x)=g(x)', 'Location', 'SouthEast');
xlabel('x');
title('Example Figure');
print('example', '-dpng', '-r300'); %<-Save as PNG with 300 DPI

The default MATLAB figure does not render well for papers or slides. For instance, suppose we resize the image to 300 pixels high and display in HTML using the following HTML code:

<img src="example.png" height=300>

The image renders as shown below and is not easy to read.

## Step 1: Choose parameters (line width, font size, picture size, etc.)

There are a few parameters that can be used to modify a figure so that it prints or displays well. In the table below, we give some suggested values for papers and presentations. Typically, some trial and error is needed to find values that work well for a particular scenario. It’s a good idea to test the final version in its final place (e.g., as a figure in a LaTeX report or an image in a PowerPoint presentation) to make sure the sizes are acceptable.

| Parameter | Default | Paper | Presentation |
|---|---|---|---|
| Width (inches) | 5.6 | varies | varies |
| Height (inches) | 4.2 | varies | varies |
| AxesLineWidth | 0.5 | 0.75 | 1 |
| FontSize | 10 | 8 | 14 |
| LineWidth | 0.5 | 1.5 | 2 |
| MarkerSize | 6 | 8 | 12 |

% Defaults for this blog post
width = 3; % Width in inches
height = 3; % Height in inches
alw = 0.75; % AxesLineWidth
fsz = 11; % Fontsize
lw = 1.5; % LineWidth
msz = 8; % MarkerSize

## Step 2: Creating a figure with manually modified properties

Create a new figure. Set its size via the ‘Position’ setting. These commands assume 100 dpi for the sake of on-screen viewing, but this does not impact the resolution of the saved image. For the current axes, set the default fontsize and axes linewidth (different from the plot linewidth). For plotting the results, manually specify the line width and marker sizes as part of the plot command itself.
The font sizes for the legend, axes labels, and title are inherited from the settings for the current axes.

figure(2);
pos = get(gcf, 'Position');
set(gcf, 'Position', [pos(1) pos(2) width*100, height*100]); %<- Set size
set(gca, 'FontSize', fsz, 'LineWidth', alw); %<- Set properties
plot(dmn,f(dmn),'b-',dmn, g(dmn),'r--',xeq,f(xeq),'g*','LineWidth',lw,'MarkerSize',msz); %<- Specify plot properties
xlim([-pi pi]);
legend('f(x)', 'g(x)', 'f(x)=g(x)', 'Location', 'SouthEast');
xlabel('x');
title('Improved Example Figure');

## Step 3: Save the figure to a file and view the final results

Now that you’ve created this fantastic figure, you want to save it to file. There are two caveats:

1. Depending on the size of the figure, MATLAB may or may not choose tick marks to your liking. These can change again when the figure is saved. Therefore, it’s best to manually specify the tick marks so that they are correctly preserved in both display and saving.
2. The size needs to be preserved in the saved (i.e., printed) version. To do this, we have to specify the correct position on the paper.

% Set Tick Marks
set(gca,'XTick',-3:3);
set(gca,'YTick',0:10);
% Here we preserve the size of the image when we save it.
set(gcf,'InvertHardcopy','on');
set(gcf,'PaperUnits', 'inches');
papersize = get(gcf, 'PaperSize');
left = (papersize(1)- width)/2;
bottom = (papersize(2)- height)/2;
myfiguresize = [left, bottom, width, height];
set(gcf,'PaperPosition', myfiguresize);
% Save the file as PNG
print('improvedExample','-dpng','-r300');

## EPS versus PNG

An interesting feature of MATLAB is that the rendering in EPS is not the same as in PNG. To illustrate the point, we save the image as EPS, convert it to PNG, and then show it here. The EPS version is cropped differently. Additionally, the dashed line looks more like the original image in the EPS version than in the PNG version.
print('improvedExample','-depsc2','-r300');
if ispc % Use Windows ghostscript call
  system('gswin64c -o -q -sDEVICE=png256 -dEPSCrop -r300 -oimprovedExample_eps.png improvedExample.eps');
else % Use Unix/OSX ghostscript call
  system('gs -o -q -sDEVICE=png256 -dEPSCrop -r300 -oimprovedExample_eps.png improvedExample.eps');
end

GPL Ghostscript 9.07 (2013-02-14)
Copyright (C) 2012 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Loading NimbusSanL-Regu font from %rom%Resource/Font/NimbusSanL-Regu... 3339336 1916509 8109568 6818233 3 done.

(Images, left to right: Original, Improved, Improved EPS->PNG)

## Automating the example

There is a way to make this process easier, especially if you are generating many figures that will have the same settings. It involves changing Matlab’s default settings for the current session. Note that these changes apply only on a per-session basis; if you restart Matlab, these changes are forgotten! Recently, the Undocumented Matlab Blog had a great post about these hidden defaults: http://undocumentedmatlab.com/blog/getting-default-hg-property-values/. There are many other properties that can potentially be changed as well.

% The new defaults will not take effect if there are any open figures. To
% use them, we close all figures, and then repeat the first example.
close all;
% The properties we've been using in the figures
set(0,'defaultLineLineWidth',lw); % set the default line width to lw
set(0,'defaultLineMarkerSize',msz); % set the default line marker size to msz
% Set the default Size for display
defpos = get(0,'defaultFigurePosition');
set(0,'defaultFigurePosition', [defpos(1) defpos(2) width*100, height*100]);
% Set the defaults for saving/printing to a file
set(0,'defaultFigureInvertHardcopy','on'); % This is the default anyway
set(0,'defaultFigurePaperUnits','inches'); % This is the default anyway
defsize = get(gcf, 'PaperSize');
left = (defsize(1)- width)/2;
bottom = (defsize(2)- height)/2;
defsize = [left, bottom, width, height];
set(0, 'defaultFigurePaperPosition', defsize);
% Now we repeat the first example but do not need to include anything
% special beyond manually specifying the tick marks.
figure(1); clf;
plot(dmn,f(dmn),'b-',dmn,g(dmn),'r--',xeq,f(xeq),'g*');
xlim([-pi pi]);
legend('f(x)', 'g(x)', 'f(x)=g(x)', 'Location', 'SouthEast');
xlabel('x');
title('Automatic Example Figure');
set(gca,'XTick',-3:3); %<- Still need to manually specify tick marks
set(gca,'YTick',0:10); %<- Still need to manually specify tick marks
print('autoExample', '-dpng', '-r300');

And here is the saved version rendered via the HTML command

<img src="autoExample.png" height=300>

## A person made of money is worth $1,000,000

After watching the Geico commercial about a man made of money, I asked my wife, “If a person were actually made of $100 bills, how much would that be?” Neither of us knew the answer. So I set out to find out!

First, getting the volume of a US bill was easy through Google. According to that source, it’s 6.14 inches × 2.61 inches × 0.0043 inches = 0.06890922 cubic inches.

Now, what is the volume of an adult human? After doing some searches that gave me “volumes of urns for burnt human remains” and “blood volumes”, I happened across a nutrition paper on variations in body volume.

Human Body Density and Fat of an Adult Male Population as Measured by Water Displacement, H. J. KRZYWICKI and K. S. K. CHINN, Am J Clin Nutr April 1967 vol. 20 no. 4 305-310

Okay, now we are in business. The volume of an adult ranges between 42 and 75 L. I figured 60L was a reasonable midpoint. Thus, an adult made of money is worth:

1. $53,000 (using $1 bills)
2. $1,060,000 (using $20 bills)
3. $5,300,000 (using $100 bills)

I used $1M in the title since the $20 is the most common high-denomination bill.
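The arithmetic is easy to redo. The sketch below uses the bill dimensions and the 60 L body volume from this post; the only added ingredient is the standard liters-to-cubic-inches conversion, and the function name `bills_in_volume` is made up for the example. It ignores any gaps between bills, so treat the result as an upper bound on a well-packed person.

```python
CUBIC_INCHES_PER_LITER = 61.0237  # 1 L = 1000 cm^3 and 1 in = 2.54 cm

def bills_in_volume(liters, bill_dims_in=(6.14, 2.61, 0.0043)):
    """Number of bills (possibly fractional) that fill the given volume with no gaps."""
    length, width, thickness = bill_dims_in
    bill_volume = length * width * thickness  # cubic inches per bill
    return liters * CUBIC_INCHES_PER_LITER / bill_volume

n_bills = bills_in_volume(60)
print(f"bills in 60 L: {n_bills:,.0f}")
for denomination in (1, 20, 100):
    print(f"${denomination} bills: ${n_bills * denomination:,.0f}")
```

Swapping in 42 or 75 liters gives the range for smaller and larger adults.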

So you can now tell your friends when you are worth more than if you were made of money.


## More thoughts on posters

I wanted to write about what I think works and fails with posters. There was a poster session at Purdue coming up to celebrate our 50th anniversary as a CS department so I figured I should try one more experience—with my students—to see if I still ran into the same problems.

I helped them prepare three posters to show off some of the work that we’ve been doing with tensor computations, MapReduce, and fast algorithms for cliques. The posters were placed throughout our department’s building and conceptually grouped into a few areas, although this grouping wasn’t perfect.

Also, I insisted that we have a handout to accompany each poster. It’s a two page abstract of the work that hits some high points. For two of the posters there wasn’t a formal paper ready yet, so we couldn’t hand out papers and I think this is key.

So what worked?

Well, the localized placement wasn’t ideal. There are regions of our building that aren’t good for circulation and there wasn’t enough of a draw from the posters to get too many people to see them all. But one of the fundamental challenges of posters is establishing a compromise between a high-level overview of the ideas, which is what you want for casual browsers, and some of the deeper technical points, which is what you’d want for an expert. Getting this tension right in a talk is hard enough, but with the equivalent of 4-8 slides of material, it’s really hard.

So let me take a stab at what I think a poster session should look like at a SIAM conference with the view of addressing the “too-many-minis” problem. This list gets a little rambly, but hey, it’s a blog post!

## Bring out the big guns

You need well-known people at the poster session, presenting posters, in order to draw in the attendees. This would suggest that each poster area ought to have some type of head-line-like poster, something like a plenary-like session. Or heck! Why not insist the plenary speakers also give a poster? This will certainly provide a draw and a great chance to interact with the speakers.

## Make posters a first class citizen

Rather than getting a poster board (something cheap and easy for the conference organizers), I think poster presentations ought to have a 40-60″ LCD display and a white-board! Mini-symposia speakers get a projector, so why should poster presenters be limited by “flat-page” technology?

I’ve seen people pin up iPads to the board, but that’s … just a hack.

This also helps to fix one of the problems with scaling the depth of presentation. Everyone always should have backup slides with those technical nuggets in case you get asked. Same thing with posters now too!

## Organize, organize, organize!

In some of the Twitter comments to my entreaty about not having posters, Jason Riedy noted that posters work well in focused groups. I couldn’t agree more! However, this only works when there is something to get people in the room in the first place (see above).

So, the idea would be to propose groups of themed posters — just like mini-symposia — that will all be co-located. And these groups should have some type of headline presentation or poster to provide a draw.

## Have a take-away.

Each poster should have a 2-page handout or a card that will allow people to get more information on the poster.

## Mini-symposia -> Mini-poster-conference.

Wouldn’t it be cool if the organizer of a mini-symposium got to present in front of the entire conference telling you about why you should go to their mini-symposium? Given the number of mini-symposia, that wouldn’t scale, but if we could organize mini-poster-conferences, then maybe, just maybe, the organizer could get 5 minutes in something like a plenary session.

## Video intros!

Wouldn’t it be cool if every poster had a 2-3 minute video intro, posted on YouTube, that you could watch before the session? Cool, you say, but who would actually watch them? (They tried showing these during a plenary session at KDD2012, and it wasn’t an overwhelming success.) So maybe have a few of those “video-display” units with private audio where people could watch them within a session.

In conclusion, I think there is a huge opportunity for elevating poster presentations beyond the current standardized session! But please, no vanilla poster sessions.