Fuzzy & Pixelated PDF Copy & Paste from macOS Preview

Too long, don’t want to read.

  • Symptom. Cutting and pasting sections of PDF files from macOS / OSX Preview results in fuzzy and pixelated images where you were expecting vector PDF data to be copied and pasted.
    • Correlated symptom. You will be able to get vector data if you copy and paste an entire page instead of a selected region of a page.
  • Diagnosis. There was an app hijacking the “com.adobe.pdf” uniform type identifier (UTI). This resulted in many equivalent types not being recognized as valid PDF data on the clipboard.
  • Current fix. Identify and uninstall the app by looking through the UTI registration database with lsregister. To identify the app …
    •  find the line-number of the lsregister -dump that has the currently active mapping for “com.adobe.pdf”
    • /System/Library/Frameworks/CoreServices.framework/Frameworks/LaunchServices.framework/Support/\
      lsregister -dump | grep -n -A 2 "uti:           com.adobe.pdf" 
    • For me, this showed (with some wordpress induced spacing issues)
      36795: uti: com.adobe.pdf 
      36796- description: Portable Document Format (PDF)
      36797- flags: imported inactive core apple-internal trusted 
      --
      49934: uti: com.adobe.pdf
      49935- description: PDF
      49936- flags: exported active trusted # this is the problem!  
      --
      49990: uti: com.adobe.pdf
      49991- description: PDF
      49992- flags: imported inactive trusted 
      --
      74076: uti: com.adobe.pdf
      74077- description: (null)
      74078- flags: imported inactive trusted 
      --
      77680: uti: com.adobe.pdf
      77681- description: PDF Data
      77682- flags: imported inactive untrusted
    • This indicates it’s line 49934 that has the active mapping, whereas we wanted it to be on line 36795, which is the internal apple mapping with all the appropriate metadata.  So now we need to guess at the file that this implicates. Here is my current way to do this. This isn’t entirely reliable, but I think it should work in most cases.
      # This command shows the last 10 "apps" before the offending 
      # uti type definition, the last one should be the app 
      # that is causing the problem. 
      /System/Library/Frameworks/CoreServices.framework/Frameworks/LaunchServices.framework/Support/\
      lsregister -dump | head -n 49934 | grep "CFBundleExecutable = " | tail -n 10
      
       CFBundleExecutable = "Pass Viewer";
       CFBundleExecutable = "Google Chrome Helper";
       CFBundleExecutable = "Photo Library Migration Utility";
       CFBundleExecutable = GarageBand;
       CFBundleExecutable = PIPAgent;
       CFBundleExecutable = PowerChime;
       CFBundleExecutable = "Problem Reporter";
       CFBundleExecutable = gephi;
       CFBundleExecutable = rcd;
       CFBundleExecutable = Notability;
    • So the offending app is likely to be Notability.
    • I uninstalled Notability and everything resumed its normal behavior regarding copy & paste after I reset the UTI database.
      /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/LaunchServices.framework/Versions/A/Support/\
      lsregister -kill -seed -r
    • (Then I reinstalled Notability and everything was bad again!)
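The manual search above can be sketched in Python. This is a minimal sketch, assuming the dump text looks like the excerpts in this post (a `uti:` line followed within two lines by a `flags:` line, and `CFBundleExecutable = ...;` lines in the bundle records). The function name `find_hijacker` and the sample text are hypothetical, and like the grep approach it is only a heuristic.

```python
import re

def find_hijacker(dump_text, uti="com.adobe.pdf"):
    """Return the last CFBundleExecutable seen before the active claim on `uti`.

    Heuristic only: assumes the dump layout shown in this post.
    """
    lines = dump_text.splitlines()
    last_executable = None
    for i, line in enumerate(lines):
        m = re.search(r'CFBundleExecutable = "?([^";]+)"?;', line)
        if m:
            last_executable = m.group(1)
            continue
        fields = line.split()
        if fields[:2] == ["uti:", uti]:
            # Look at the next couple of lines for the flags of this claim.
            for follow in lines[i + 1:i + 3]:
                flags = follow.split()
                # Compare whole tokens so that "inactive" does not count as "active".
                if flags[:1] == ["flags:"] and "active" in flags[1:]:
                    return last_executable
    return None

# Hypothetical sample, shaped like the excerpts above.
sample = """\
    CFBundleExecutable = "Problem Reporter";
uti:           com.adobe.pdf
description:   Portable Document Format (PDF)
flags:         imported inactive core apple-internal trusted
    CFBundleExecutable = Notability;
uti:           com.adobe.pdf
description:   PDF
flags:         exported active trusted
"""
print(find_hijacker(sample))
```

To try it on a real machine, the output of `lsregister -dump` could be saved to a file and passed in as `dump_text`.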


Posted in Uncategorized | Leave a comment

PageRank beyond the Web

I just completed a survey article about uses of PageRank outside of web-ranking. The paper has been submitted to a journal, and I also posted the manuscript to arXiv.

David F. Gleich. PageRank beyond the Web. arXiv. cs.SI:1407.5107, 2014.

The goal with this paper was to enumerate and discuss how frequently PageRank is used for applications broadly throughout science and engineering. For instance, I discuss:

  • PageRank in Biology – GeneRank, ProteinRank, IsoRank
  • PageRank in Neuroscience
  • PageRank in Chemistry
  • PageRank in Engineered systems – MonitorRank
  • PageRank in Mathematical systems
  • PageRank in Sports
  • PageRank in Literature – BookRank
  • PageRank in Bibliometrics – TimedPageRank, CiteRank, AuthorRank, PopRank
  • PageRank in Knowledge systems – FactRank, ObjectRank, FolkRank
  • PageRank in Recommender systems – ItemRank
  • PageRank in Social networks – BuddyRank, TwitterRank
  • PageRank in the web & Wikipedia – TrustRank, BadRank, VisualRank

In order to make this a tractable survey, I had to strictly restrict the definition of PageRank I used (or the literature would have easily doubled!). The one I use defines PageRank as the solution of the linear system

(I - \alpha P) x = (1-\alpha) v

where P is a column stochastic transition matrix between the nodes of a graph, \alpha is the teleportation parameter in PageRank, and v is the teleportation distribution. This system yields the stationary distribution of a random process that, 

  • with probability \alpha, follows a transition according to the matrix P
  • with probability 1-\alpha, jumps according to the distribution v.

This system, with application specific constructions of P and v, suffices to reproduce the PageRank rank information in all of the above applications and makes it easy to identify similarities and differences. For instance, when v is a sparse vector, then the resulting PageRank scores highlight a localized region in a large graph or system. (In the past, this has been called a personalized PageRank system, but I think localized PageRank is a better term for its wide, modern usage.) When v is nearly uniform, then the resulting PageRank scores give a global measure of the importance of that feature in the graph or system. 
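To make the linear system concrete, here is a minimal sketch that solves it on a tiny graph using the fixed-point iteration x ← αPx + (1-α)v. The three-node graph, the value α = 0.85, and the function name `pagerank` are illustrative assumptions, not taken from the paper.

```python
def pagerank(P, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Solve (I - alpha P) x = (1 - alpha) v by iterating x <- alpha P x + (1 - alpha) v.

    P is a dense column-stochastic matrix stored as a list of rows, so P[i][j]
    is the probability of moving from node j to node i.
    """
    n = len(v)
    x = list(v)
    for _ in range(max_iter):
        x_new = [alpha * sum(P[i][j] * x[j] for j in range(n)) + (1 - alpha) * v[i]
                 for i in range(n)]
        if sum(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical graph with edges 0->1, 0->2, 1->2, 2->0 (each column sums to 1).
P = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]

x_global = pagerank(P, [1 / 3, 1 / 3, 1 / 3])  # uniform v: global importance
x_local = pagerank(P, [1.0, 0.0, 0.0])         # sparse v: localized around node 0
print(x_global)
print(x_local)
```

Because P is column stochastic and v sums to one, the iterates stay probability distributions, and the error shrinks by a factor of α at every step.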

But there’s another system that is even more common, which we call pseudo-PageRank (inspired by Boldi and Vigna’s “PseudoRank”). The only difference is that P need not be column stochastic and that v need not be a distribution, but there are still a few restrictions on them. This modification yields an incredibly flexible system that is mathematically equivalent to the PageRank construction above. 

Anyway, I won’t spoil the entire paper — take a read and please post a comment or send an email if you spot anything that looks like it could be corrected!

Some random notes

  1. I spoke for a while with Sebastiano Vigna about whether to call my definition PseudoRank (as they had), but because there is a subtle, but important, difference, I decided on pseudo-PageRank.
  2. New papers kept appearing as I was putting this together. For instance, the recent Wikipedia paper by Eom, and the MonitorRank paper.
  3. It was interesting to read papers that didn’t easily let me tell if they were talking about directed or undirected applications! This makes a big difference for PageRank. Although the math works out just fine either way, the solution with an undirected graph has some properties that make it worth knowing.
  4. I wish I had more time to talk about the amazing connections between localized PageRank and conductances — but that would lead down a large rabbit hole.

Get Matlab to email you when it’s done running!

This is a guest post by Kyle Kloster, one of the students I’m working with at Purdue University. Kyle has been doing some large, long running experiments on big graphs. He writes:

The random-walk transition matrices for Twitter, LiveJournal, and Friendster can take several minutes just for matlab to load. The actual experiments we carry out (computing columns of the matrix exponential) can take hours, and this week I got tired of checking in every 10 minutes to see if Matlab had finished (or crashed).

We asked around a bit on Twitter to see if anyone knew how to have Matlab email you when something finishes. Thanks to @jimfonseca for pointing us to the perfect Matlab documentation on sending email. (And also, this documentation on how to use gmail to send email in Matlab.) We used this documentation to build the matlabmail function.

MATLABMAIL( recipient, message, subject )

  sends the character string stored in 'message' with subject
  'subject' to the address in 'recipient', from the email 
  address stored in the file. This requires that the 
  sending address is a GMAIL email account.

View matlabmail.m

One troubling part is that you need to tell Matlab the password to the account that you want to use as “sender”. David suggested creating a dummy gmail account that the whole research group could use as a sender — this way we can use a randomly generated password for one shared account to spare each of us having to use personal login info.

Now having one line of code at the end of a matlab file alerts me when a trial is over!

matlabmail('XXXXXX@purdue.edu', 'twitter experiment done', ...
           'subjectline here');

It’s even possible to attach files using these commands, in case you’d like Matlab to send you a JPEG containing the results from your latest experiments.

Note that the function works for gmail accounts only, and it won’t work for gmail accounts that have “2-step verification” enabled.

Posted in Uncategorized | 2 Comments

The data computer I want

This post is pure speculation, but these are notes on something I’ve been thinking about for a while.

Here’s the computer I want to get for “medium data” matrix computations ~50TB.

  • Quad 8-core Intel E5-48XX processors + motherboard ($10k)
  • 1TB memory. ($30k)
  • 2x Mushkin Scorpion Deluxe 2TB PCIe Flash Drives ($5k)
  • 45x 4TB Hard Drives + SATA expanders ($10k); 80TB after RAID 10.
  • 2x Nvidia GPU or Xeon-Phi for extra horse power ($5k) as the CPUs are a little light.

In total, that’d be about $60k. The machine would be pretty hefty at processing data. It’d run circles around a Hadoop cluster with 30 nodes (8 TB / node in 4x disks = 80 TB after replication) for any non-trivial task. The IO bandwidth from the flash cards is about 2GB/second each (byte, not bit). From the hard drive array, you should be able to get a little bit less. Each hard drive puts out about 100 MB/sec.
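The back-of-the-envelope numbers can be checked with a few lines of Python. This is only a sketch: the 3x replication factor and the idea that all 45 drives stream perfectly in parallel are my assumptions, so the disk figures are upper bounds rather than predictions.

```python
TB = 1e12  # decimal terabyte, in bytes
GB = 1e9
MB = 1e6

# Hadoop comparison: 30 nodes x 8 TB each, assuming 3x replication.
hadoop_usable_tb = 30 * 8 / 3

# Flash: two cards at about 2 GB/s each.
flash_bandwidth = 2 * 2 * GB

# Disks: 45 drives at about 100 MB/s each, assuming ideal parallel streaming.
disk_bandwidth = 45 * 100 * MB

# Ideal time to stream 50 TB from the disks into memory.
hours_to_scan = 50 * TB / disk_bandwidth / 3600

print(f"Hadoop usable capacity: {hadoop_usable_tb:.0f} TB")
print(f"Flash bandwidth: {flash_bandwidth / GB:.0f} GB/s")
print(f"Ideal disk bandwidth: {disk_bandwidth / GB:.1f} GB/s")
print(f"Ideal time to scan 50 TB: {hours_to_scan:.1f} hours")
```

Controllers, SATA expanders, and the RAID layout will all eat into the ideal disk number, so the real time to scan the data would be longer.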

I’m not sure how to configure the RAID array. It seems like RAID-5+ is frowned upon for arrays with large drives as the rebuild time is too long. So this would be RAID-10, I guess. This config is the part I’m least sure about.

Why not Hadoop? Hadoop is great for 1PB+ ETL tasks, parallel grep!, or anything else that looks like a “read enormous file and output small data” task. It’d also be a good way to pipe data into this mini computer where you could work on 50TB chunks and do something real with each of them.

That said, I still think MapReduce would be a great way to program that single machine. Something like Phoenix++ would make it pretty easy to take advantage of all that IO power and optimize it across all the cores and NUMA regions.

Why this system? Jim Demmel is right that communication is the dominant bottleneck of modern systems. This computer is designed to optimize the IO pathway to get as much data from ~50TB of secondary storage to main memory as quickly as possible. What to do once it’s there is up to you… You even have a small 4GB/sec write cache for intermediate results.

I’ve seen a few systems like this. John Canny has one in his BID data program. Guy Blelloch has another at his group.


SVD on the Netflix matrix (Part 2)

About a week ago, we saw some basic performance stats on computing the SVD of the netflix matrix using Matlab’s internal routines and the Propack software.

One of the comments suggested trying ipython, scipy, and numpy. So we did!

netflix_svd.py is up on the gist. Please see the previous post for more detail on what this is doing.

Thanks to Yangyang Hou for running these so quickly! In this case, we found that propack returned the correct singular values. Not sure what is going on with the matlab interface there!

# PROPACK
k    seconds
10   36.9860050678
25   78.114607811
50  150.511465788
100 328.731420994
150 500.544333935
200 719.040390968

# ARPACK (A^T A)
k    seconds
10 51.504776001
25 103.392450094
50 182.359881163
100 436.23743701
150 590.644889116
200 821.440295219

# SVDLIBC
k    seconds
10 74.2832891941
25 134.539175034
50 235.082634926
100 477.956938028
150 732.327076912
200 988.136811972
Posted in Uncategorized | 1 Comment

SVD on the Netflix matrix

Here, we consider three implementations of computing the SVD of the netflix matrix.

Just to recap, the matrix has 17770 rows, 480189 columns, and 100480507 non-zeros. We are also considering the sparse SVD that treats the missing entries as 0, not the matrix-completion SVD that treats the missing ratings as missing. (It’s unfortunate that these two, very different, problems are often confused.)

I’m using Matlab R2011a on a dual Intel Xeon e5-2670 computer with 256GB of RAM. Computing a rank 200 SVD takes about 2.34GB of memory (~760 MB for vectors, ~1.5GB for matrix). Given the way the algorithms work, there is usually a bit of overallocation, so let’s say 3GB of memory is reasonable.
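The memory estimate can be reproduced approximately with a little arithmetic. The sketch below assumes 8-byte doubles for every stored value and one 8-byte index per non-zero in the sparse matrix; the column-pointer array is ignored because it is tiny by comparison.

```python
m, n, nnz = 17770, 480189, 100480507
k = 200

# Singular vectors: k left vectors of length m and k right vectors of length n,
# stored as 8-byte doubles.
vector_bytes = (m + n) * k * 8

# Sparse matrix: per non-zero, one 8-byte value plus one 8-byte index (assumption).
matrix_bytes = nnz * (8 + 8)

print(f"vectors: {vector_bytes / 2**20:.0f} MiB")
print(f"matrix:  {matrix_bytes / 2**30:.2f} GiB")
```

Both numbers land close to the figures quoted above; the solver's workspace comes on top of this, which is why rounding up to about 3GB is sensible.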

(See Part 2 for info on using ipython and numpy and scipy)

If we just use Matlab’s svds

[U,S,V] = svds(A,k);

Then we get the results:

k = 10 -> Elapsed time is 95.075653 seconds.
k = 25 -> Elapsed time is 151.247499 seconds.
k = 50 -> Elapsed time is 262.132427 seconds.
k = 100 -> Elapsed time is 589.469476 seconds.
k = 150 -> Elapsed time is 983.575712 seconds.
k = 200 -> Elapsed time is 1538.977824 seconds.

What Matlab’s svds routine does internally is compute the extremal eigenvectors of the matrix \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} using the ARPACK software. There are a few steps in this that exploit parallel computations.

We can alternatively compute the largest eigenvalues and vectors of the matrix A A^T, which squares the condition number and is usually a no-no in numerical analysis, but if we are solely interested in performance, this could be better. My adviser called this the “dreaded normal equations.” To do this, we use the Matlab eigs routine with a function

f = @(x) A*(A'*x);

So we don’t need to actually FORM the matrix A A^T. Again, this routine uses the ARPACK code, this time via the function “eigs”:

f = @(x) A*(A'*x); m = size(A,1);
[V D]=eigs(f,m,k,'LA',struct('issym',1,'disp',0));

What happens here is that eigs returns the left singular vectors (the eigenvectors of A A^T), so we’d need a bit more post-processing to get the right singular vectors, and the elements of D are the squares of the singular values.

k = 10 -> Elapsed time is 26.425276 seconds.
k = 25 -> Elapsed time is 47.842963 seconds.
k = 50 -> Elapsed time is 84.456961 seconds.
k = 100 -> Elapsed time is 166.463371 seconds.
k = 150 -> Elapsed time is 250.260487 seconds.
k = 200 -> Elapsed time is 335.170137 seconds.

But it’s much faster!
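To illustrate the idea on a toy problem, here is a sketch that finds the largest singular triplet of a small dense matrix through the operator x -> A (A^T x). It uses plain power iteration rather than ARPACK's restarted Lanczos method, and the 2-by-3 matrix and all function names are made up for the example.

```python
import math

def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Compute A^T y without forming A^T."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def top_singular_triplet(A, iters=500):
    """Power iteration on x -> A (A^T x), never forming A A^T."""
    u = [1.0] * len(A)
    for _ in range(iters):
        w = matvec(A, rmatvec(A, u))
        nw = norm(w)
        u = [t / nw for t in w]
    # Rayleigh quotient: the largest eigenvalue of A A^T is sigma squared.
    lam = sum(a * b for a, b in zip(u, matvec(A, rmatvec(A, u))))
    sigma = math.sqrt(lam)
    # Post-processing: recover the other singular vector as A^T u / sigma.
    v = [t / sigma for t in rmatvec(A, u)]
    return u, sigma, v

A = [[2.0, 0.0, 1.0],
     [0.0, 3.0, 1.0]]
u, sigma, v = top_singular_triplet(A)
print(sigma)
```

The square root at the end is where the squared condition number bites: small singular values lose accuracy when they are recovered this way.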

Finally, there is a customized routine that does what Matlab’s svds routine does, but using the Golub-Kahan bidiagonalization procedure that implicitly is doing the Lanczos procedure on \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} but without forming that matrix or storing extra work. For this, we turn to the PROPACK software.

Ax=@(x) A*x; Atx=@(x) A'*x; [m n] = size(A);
[UD D VD]=lansvd(Ax,Atx,m,n,k(i),'L');

k = 10 -> Elapsed time is 10.205532 seconds.
k = 25 -> Elapsed time is 26.290835 seconds.
k = 50 -> Elapsed time is 44.544767 seconds.
k = 100 -> Elapsed time is 94.061496 seconds.
k = 150 -> Elapsed time is 152.148860 seconds.
k = 200 -> Elapsed time is 216.596219 seconds.

Faster still! Although, when we were looking at some of the singular values, they didn’t seem to match.

The last 11 singular values returned from ARPACK (either) and PROPACK are

 ARPACK   PROPACK
 834.8761 799.9475
 834.7372 796.6092
 834.3883 794.4793
 833.5185 792.0514
 832.5988 789.1563
 831.0431 787.2585
 829.8794 783.6587
 829.5437 782.2349
 828.1831 778.7559
 827.0634 776.5972
 825.3958 773.8257

This suggests we might need to study the tolerance used in PROPACK for an updated test.

The testing code is on github: netflix_svd.m.

My tremendous thanks to Yangyang Hou for helping with the experiments in this post and to Burak Bayramli for suggesting things that led to it.

Posted in Uncategorized | 2 Comments

A call to update the Lake Arrowhead graph!

Please read Cleve Moler’s blog post about the Lake Arrowhead matrix too and look at the comments. That’s what got this all started!

We’d like to produce a new version of this matrix with information from 2013. Once I have that, I’ll run a few matrix-based link-prediction techniques that Gene would have enjoyed and report the results. (You can see the preliminary results on the linked page.)

These have already borne fruit as they revealed a missing link between Boley and Golub (“A modified method for reconstructing periodic Jacobi matrices” Math. Comp. 42(165):143-150, 1984.)

Some other new links I found by looking at Gene’s CV

  • Golub and Benzi
  • Golub and Pan (is JY Pan the same?)
  • Golub and Funderlic
  • Golub and Strakos
  • Golub and Chandrasekaran
  • Golub and Bai (was this Bai Zhaojun?)
  • Golub and Moler
  • Golub and Park (assuming Haesun)
  • Golub and Hansen
  • Golub and Gragg
  • Golub and Starke

Some of these were predicted by the metrics! Awesome!

And these links were missing from the original!

  • Golub and Boley
  • Golub and Smith (assuming it was L. Smith at the meeting)
  • Golub and Stewart (assuming it was G. W. at the meeting)

Key questions

  • Who was the “Ng” at the meeting? Gene and Michael Ng authored a paper together, so that’d be another link if it was the same one — but this is a pretty common last name.
  • Was it L. Smith at the 1993 meeting?
  • Was it Pete Stewart at the meeting?

While I have a copy of Gene’s CV, the key issue is finding all the links between the other 103 people in the graph! So we need some help.

The full list of authors is

  1. Golub
  2. Wilkinson
  3. TChan
  4. He
  5. Varah
  6. Kenney
  7. Ashby
  8. LeBorne
  9. Modersitzki
  10. Overton
  11. Ernst
  12. Borges
  13. Kincaid
  14. Crevelli
  15. Boley
  16. Anjos
  17. Byers
  18. Benzi
  19. Kaufman
  20. Gu
  21. Fierro
  22. Nagy
  23. Harrod
  24. Pan
  25. Funderlic
  26. Edelman
  27. Cullum
  28. Strakos
  29. Saied
  30. Ong
  31. Wold
  32. VanLoan
  33. Chandrasekaran
  34. Saunders
  35. Bojanczyk
  36. Dubrulle
  37. Marek
  38. Kuo
  39. Bai
  40. Tong
  41. George
  42. Moler
  43. Gilbert
  44. Schreiber
  45. Pothen
  46. NTrefethen
  47. Nachtigal
  48. Kahan
  49. Varga
  50. Young
  51. Kagstrom
  52. Barlow
  53. Widlund
  54. Bjorstad
  55. OLeary
  56. NHigham
  57. Boman
  58. Bjorck
  59. Eisenstat
  60. Zha
  61. VanHuffel
  62. Park
  63. Arioli
  64. MuntheKaas
  65. Ng
  66. VanDooren
  67. Liu
  68. Smith
  69. Duff
  70. Henrici
  71. Tang
  72. Reichel
  73. Luk
  74. Hammarling
  75. Szyld
  76. Fischer
  77. Stewart
  78. Bunch
  79. Gutknecht
  80. Laub
  81. Heath
  82. Ipsen
  83. Greenbaum
  84. Ruhe
  85. ATrefethen
  86. Plemmons
  87. Hansen
  88. Elden
  89. BunseGerstner
  90. Gragg
  91. Berry
  92. Sameh
  93. Ammar
  94. Warner
  95. Davis
  96. Meyer
  97. Nichols
  98. Paige
  99. Gill
  100. Jessup
  101. Mathias
  102. Hochbruck
  103. Starke
  104. Demmel

If you are one of these authors and you’ve co-authored a paper with someone after the original Lake Arrowhead matrix from 1993, can you post a note either on my blog or on Cleve’s blog with any additional co-authors?

Alternatively, if you’d like to email one of us your CV, we’ll do all the work for you! I reported the updates from Golub’s CV already.

Update: I created a GitHub repository for these edits. If you are feeling brave, head to the arrowhead-graph repository and click on “arrowhead-new” (if the paper appeared in 1993 or afterwards) or “arrowhead-missing” (if it was missing from the original, i.e. a pre-1993 reference). Then click “edit”. You’ll have to sign up for an account (which is good to have anyway!), but then it’ll automatically edit the file and tell me your changes.

Posted in Uncategorized | 5 Comments

Numerical Linear Algebra in Machine Learning

Notes from the Numerical Linear Algebra in Machine Learning Workshop

Here’s a quick summary of some highlights from my notes about the NLA in ML workshop at ICML. First, it was fantastic in terms of speakers and audience. There were lots of great questions that the audience interjected into the talks to clarify the ideas and all of the talks were about important topics.

See the workshop web-page for more about the ideas behind the workshop. Without further ado, my top 4 highlights:

  • Peder Olsen talked about the “box product”, a variation on the Kronecker product that arises in his new, and useful, treatment of matrix calculus. I wish I had these notes for the last time I gave my lecture on matrix calculus! In brief, the box product is the Kronecker product after a perfect shuffle or stride permutation. I’ll give an example since
  • Zeyuan Allen Zhu gave an overview of their new “almost linear time” method to solve Ax=b with a Laplacian matrix from a graph. It’s a really neat algorithm and closely exploits the relationship between a positive definite. Look up their paper and spend some time with it if you work with these systems. (My internet is bad at the moment and so I can’t look up the refs for the rest of the post. Things are googleable, I believe.)
  • Nicolas Gillis spoke about recent work with NMF. I hadn’t seen the LPs before that people use to find NMFs under this separability condition. These are actually quite similar to some of the problems Paul Constantine looks at. See the Hottopixx paper.
  • Michael Mahoney spoke about their 60-page revisitation of the Nystrom method with all sorts of goodies that are important in actually using these methods.

There were a ton of other great gems and I wish I had time to list them all. If you aren’t here, it’s because if I wrote all I wanted to, this note would never be done (or at least not in bounded time)! And thanks again to the organizers for a great session.


How to use Matlab in a command line script or Makefile

This post is really for me as I managed to forget, misplace, or lose my notes on the last time I did this activity.

If you wish to run a matlab m-file from the command line or from a Makefile, here is the best way I’ve found to do it:

#!/bin/sh
# This version shows additional error output
script=`basename "$1" .m` # strips a trailing .m; otherwise returns $1 unchanged
if [ "$script" = "$1" ]; then
  cmdterm=,   # no .m suffix: run as a statement and display its output
else
  cmdterm=\;  # m-file: suppress the "ans = " output
fi
shift # remove the first argument
args="$@"
argsq=`echo $args | tr '"' "'"` # turn double quotes into Matlab single quotes
matlab -nodisplay -r "disp('BEGIN>>'); try, $script $argsq $cmdterm catch me, fprintf(2,getReport(me)); exit(1); end, exit(0)" -nosplash | sed -e '1,/^BEGIN>>$/ d'

This extends the notes from the Informatics Bridging Team to show information about the error.
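For anyone driving Matlab from Python instead of a shell script, the string-building logic of the script could be mirrored roughly as follows. This is a sketch only: `build_matlab_command` is a hypothetical helper, and it only constructs the argument for `matlab -r` without launching Matlab.

```python
import os

def build_matlab_command(first, *args):
    """Build the string passed to `matlab -r`, following the shell script's logic."""
    script = os.path.basename(first)
    if script.endswith(".m"):
        script = script[:-2]
    # Like the script: "," (show output) if stripping changed nothing, ";" otherwise.
    cmdterm = "," if script == first else ";"
    # Matlab strings use single quotes, so convert any double quotes.
    argsq = " ".join(args).replace('"', "'")
    return ("disp('BEGIN>>'); try, {} {} {} catch me, "
            "fprintf(2,getReport(me)); exit(1); end, exit(0)").format(script, argsq, cmdterm)

print(build_matlab_command("load_file.m", "mydata.txt"))
print(build_matlab_command("randn(5)"))
```

The resulting string could then be handed to `subprocess.run(["matlab", "-nodisplay", "-nosplash", "-r", cmd])`, with the lines up to `BEGIN>>` dropped from the captured output, as the `sed` call does above.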

Here are some quick examples with the example files at the matlabcmd gist

dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd load_file.m mydata.txt
10
dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd load_file mydata.txt
10
ans =

10

dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd "randn(5)" # will display the output

ans =
0.5377 -1.3077 -1.3499 -0.2050 0.6715
1.8339 -0.4336 3.0349 -0.1241 -1.2075
-2.2588 0.3426 0.7254 1.4897 0.7172
0.8622 3.5784 -0.0631 1.4090 1.6302
0.3188 2.7694 0.7147 1.4172 0.4889
dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd with_error.m # shows the error
Error using +
Matrix dimensions must agree.

Error in with_error (line 4)
c = a + b;
dgleich@recurrent:~/Dropbox/research/2013/06-18-matlab-command$ ./matlabcmd "randn(5) + randn(6)" # shows the error
Error using +
Matrix dimensions must agree.

Here’s a list of things that could make it better.

  • Optional output — right now, this happens if you remove “.m” from the first file name, in which case it interprets your command as a matlab statement directly. In this case, we show the output with the “ans = ” intact. 
  • Error handling? I think it does a good job here, but I haven’t tested extensively.
  • Others? Are there big issues with this command?
  • Any better solutions out there already? (I looked briefly and didn’t find any!)

Creating high-quality graphics in MATLAB for papers and presentations

This blog post is a bit different. Tammy Kolda originally drafted the following material to explain how to generate good looking Matlab figures. It was based on some notes I had sent her. After discussing it further, we decided to write up a mini-tutorial on this and post it to the blog here. We actually wanted the blog format in order to encourage discussion about it! So please, suggest improvements! We’ve also made the post available as a Matlab file, and github repository for further collaboration and improvements:

Creating high-quality graphics in MATLAB for papers and presentations

(*) Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

A simple figure that is hard to view

Here we show a normal image from MATLAB. This example has been adapted from YAGTOM (http://code.google.com/p/yagtom/), an excellent MATLAB resource.

f = @(x) x.^2;
g = @(x) 5*sin(x)+5;
dmn = -pi:0.001:pi;
xeq = dmn(abs(f(dmn) - g(dmn)) < 0.002);
figure(1);
plot(dmn,f(dmn),'b-',dmn,g(dmn),'r--',xeq,f(xeq),'g*');
xlim([-pi pi]);
legend('f(x)', 'g(x)', 'f(x)=g(x)', 'Location', 'SouthEast');
xlabel('x');
title('Example Figure');
print('example', '-dpng', '-r300'); %<-Save as PNG with 300 DPI

highQualityFigs_01

The default MATLAB figure does not render well for papers or slides. For instance, suppose we resize the image to 300 pixels high and display in HTML using the following HTML code:

<img src="example.png" height=300>

The image renders as shown below and is not easy to read.
example

Step 1: Choose parameters (line width, font size, picture size, etc.)

There are a few parameters that can be used to modify a figure so that it prints or displays well. In the table below, we give some suggested values for papers and presentations. Typically, some trial and error is needed to find values that work well for a particular scenario. It’s a good idea to test the final version in its final place (e.g., as a figure in a LaTeX report or an image in a PowerPoint presentation) to make sure the sizes are acceptable.

Property        Default   Paper    Presentation
Width           5.6       varies   varies
Height          4.2       varies   varies
AxesLineWidth   0.5       0.75     1
FontSize        10        8        14
LineWidth       0.5       1.5      2
MarkerSize      6         8        12

% Defaults for this blog post
width = 3;     % Width in inches
height = 3;    % Height in inches
alw = 0.75;    % AxesLineWidth
fsz = 11;      % Fontsize
lw = 1.5;      % LineWidth
msz = 8;       % MarkerSize

Step 2: Creating a figure with manually modified properties

Create a new figure. Set its size via the ‘Position’ setting. These commands assume 100 dpi for the sake of on-screen viewing, but this does not impact the resolution of the saved image. For the current axes, set the default fontsize and axes linewidth (different from the plot linewidth). For plotting the results, manually specify the line width and marker sizes as part of the plot command itself. The font size for the legend, axes labels, and title are inherited from the settings for the current axes.

figure(2);
pos = get(gcf, 'Position');
set(gcf, 'Position', [pos(1) pos(2) width*100, height*100]); %<- Set size
set(gca, 'FontSize', fsz, 'LineWidth', alw); %<- Set properties
plot(dmn,f(dmn),'b-',dmn, g(dmn),'r--',xeq,f(xeq),'g*','LineWidth',lw,'MarkerSize',msz); %<- Specify plot properties
xlim([-pi pi]);
legend('f(x)', 'g(x)', 'f(x)=g(x)', 'Location', 'SouthEast');
xlabel('x');
title('Improved Example Figure');

highQualityFigs_02

Step 3: Save the figure to a file and view the final results

Now that you’ve created this fantastic figure, you want to save it to file. There are two caveats:

  1. Depending on the size of figure, MATLAB may or may not choose tick marks to your liking. These can change again when the figure is saved. Therefore, it’s best to manually specify the tick marks so that they are correctly preserved in both display and saving.
  2. The size needs to be preserved in the saved (i.e., printed) version. To do this, we have to specify the correct position on the paper.

% Set Tick Marks
set(gca,'XTick',-3:3);
set(gca,'YTick',0:10);

% Here we preserve the size of the image when we save it.
set(gcf,'InvertHardcopy','on');
set(gcf,'PaperUnits', 'inches');
papersize = get(gcf, 'PaperSize');
left = (papersize(1)- width)/2;
bottom = (papersize(2)- height)/2;
myfiguresize = [left, bottom, width, height];
set(gcf,'PaperPosition', myfiguresize);

% Save the file as PNG
print('improvedExample','-dpng','-r300');
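The centering arithmetic in the code above is easy to check by hand. Here is a small sketch of it in Python, assuming US Letter paper (8.5 x 11 inches) as the value of `PaperSize`; the helper names are hypothetical.

```python
def paper_position(paper_size, width, height):
    """Center a width x height figure on the page; all quantities in inches."""
    left = (paper_size[0] - width) / 2
    bottom = (paper_size[1] - height) / 2
    return [left, bottom, width, height]

def pixel_size(width, height, dpi):
    """Approximate pixel dimensions of the saved image at a given resolution."""
    return round(width * dpi), round(height * dpi)

pos = paper_position((8.5, 11.0), 3, 3)  # assumes US Letter paper
pixels = pixel_size(3, 3, 300)           # matches the -r300 flag
print(pos, pixels)
```

So a 3 inch by 3 inch figure saved with `-r300` should come out at roughly 900 by 900 pixels, which is plenty for a figure displayed 300 pixels high.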

highQualityFigs_03

EPS versus PNG

An interesting feature of MATLAB is that the rendering in EPS is not the same as in PNG. To illustrate the point, we save the image as EPS, convert it to PNG, and then show it here. The EPS version is cropped differently. Additionally, the dashed line looks more like the original image in the EPS version than in the PNG version.

print('improvedExample','-depsc2','-r300');
if ispc % Use Windows ghostscript call
  system('gswin64c -o -q -sDEVICE=png256 -dEPSCrop -r300 -oimprovedExample_eps.png improvedExample.eps');
else % Use Unix/OSX ghostscript call
  system('gs -o -q -sDEVICE=png256 -dEPSCrop -r300 -oimprovedExample_eps.png improvedExample.eps');
end
GPL Ghostscript 9.07 (2013-02-14)
Copyright (C) 2012 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Loading NimbusSanL-Regu font from %rom%Resource/Font/NimbusSanL-Regu... 3339336 1916509 8109568 6818233 3 done.

Original
example

Improved
improvedExample

Improved EPS->PNG
improvedExample_eps

Automating the example

There is a way to make this process easier, especially if you are generating many figures that will have the same settings. It involves changing Matlab’s default settings for the current session. Note that these changes apply only on a per-session basis; if you restart Matlab, these changes are forgotten! Recently, the Undocumented Matlab Blog had a great post about these hidden defaults: http://undocumentedmatlab.com/blog/getting-default-hg-property-values/. There are many other properties that can potentially be changed as well.

% The new defaults will not take effect if there are any open figures. To
% use them, we close all figures, and then repeat the first example.
close all;

% The properties we've been using in the figures
set(0,'defaultLineLineWidth',lw);   % set the default line width to lw
set(0,'defaultLineMarkerSize',msz); % set the default line marker size to msz
set(0,'defaultAxesLineWidth',alw);  % set the default axes line width to alw
set(0,'defaultAxesFontSize',fsz);   % set the default axes font size to fsz

% Set the default Size for display
defpos = get(0,'defaultFigurePosition');
set(0,'defaultFigurePosition', [defpos(1) defpos(2) width*100, height*100]);

% Set the defaults for saving/printing to a file
set(0,'defaultFigureInvertHardcopy','on'); % This is the default anyway
set(0,'defaultFigurePaperUnits','inches'); % This is the default anyway
defsize = get(0, 'defaultFigurePaperSize'); % query the default; gcf would open a figure too early
left = (defsize(1)- width)/2;
bottom = (defsize(2)- height)/2;
defsize = [left, bottom, width, height];
set(0, 'defaultFigurePaperPosition', defsize);

% Now we repeat the first example but do not need to include anything
% special beyond manually specifying the tick marks.
figure(1); clf;
plot(dmn,f(dmn),'b-',dmn,g(dmn),'r--',xeq,f(xeq),'g*');
xlim([-pi pi]);
legend('f(x)', 'g(x)', 'f(x)=g(x)', 'Location', 'SouthEast');
xlabel('x');
title('Automatic Example Figure');
set(gca,'XTick',-3:3); %<- Still need to manually specify tick marks
set(gca,'YTick',0:10); %<- Still need to manually specify tick marks

highQualityFigs_04

print('autoExample', '-dpng', '-r300');

And here is the saved version rendered via the HTML command

<img src="autoExample.png" height=300>

autoExample

Posted in Code | 12 Comments