domingo, 23 de setembro de 2012

38th International Conference on Very Large Data Bases

In the last month, I've attended the 2012 edition of the VLDB conference in Istanbul, Turkey. VLDB is   a top-tier conference in computer science with a focus on data management. I consider SIGMOD (ACM) and ICDE (IEEE) to be "sisters" of VLDB in the computer science conference family. My paper entitled Mining attribute-structure correlated patterns in large attributed graphs, written in collaboration with Wagner Meira Jr. and Mohammed Zaki,  was presented as part of the Graph Statistics and Summarization Session of VLDB. Moreover, I had the opportunity to watch many other talks, interact with the data management research community in general, and get to know the beautiful city of Istanbul.

About my talk, I think it was one of the best ones I've given so far. In fact, I was a little bit nervous about this talk, since it was my first one in a major CS conference. To make things even harder, my current advisor and my two past advisors were in the audience -- some people feel more comfortable with a familiar audience, but in my case it works exactly the other way around. I've put a good amount of work on the preparation of the slides, but I still have to improve my slide design skills a lot. My main goal was to combine textless slides with a clear speech. As expected, the speech part was the most challenging one. In the end, I finished my talk on time and also answered one question about it. Moreover, the overall feedback I've received after the talk was mostly positive.

Regarding the other presentations, I was able to watch some of them. However, choosing which session to attend among 7 parallel sessions was difficult. I hope someday they will develop some kind of session recommendation for these big conferences, so that you upload some of your papers and then get a list of sessions/talks aligned with your interests. I've attended all they keynote talks, but did not really pay attention to them. In my experience, keynote talks are usually boring and I've never been able to get something useful out of them. On the other hand, I've enjoyed the research sessions a lot. Here is the list of the talks I've attended:

  1. Size-I Object Summaries for Relational Keyword Search;
  2. Injecting Uncertainty in Graphs for Identify Obfuscation;
  3. Measuring Two-Event Structural Correlations on Graphs;
  4. Efficient Reachability Query Evaluation in Large Spatiotemporal Contact Datasets;
  5. Boosting Moving Object Indexing through Velocity Partitioning;
  6. Multiple Location Profiling for Users and Relationships from Social Network and Content;
  7. Fast and Exact Top-k Search for Random Walk with Restart;
  8. The Filter-Placement Problem and Its Applications to Minimizing Information Multiplicity;
  9. Labeling Workflow Views with Fine-Grained Dependencies;
  10. gSketch: On Query Estimation in Graph Streams;
  11. Answering Top-k Queries Over a Mixture of Attractive and Repulsive Dimensions;
  12. Bayesian Locality Sensitive Hashing for Fast Similarity Search
  13. Challenging the Long Tail Recommendation;
  14. Ranking Large Temporal Data;
  15. Indexing the Earth Mover's Distance Using Normal Distributions;
  16. Stochastic Databases Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores;
  17. A Moving Object Index for Efficient Query Processing with Peer-Wise Location Privacy;
  18. A Data-Based Approach to Social Influence Maximization;
  19. The Complexity of Social Coordination;
  20. Who Tags What? An Analysis Framework;
  21. Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services;

As soon as I set my reading pace to normal levels again, I may summarize some of these papers here.

Visiting Istanbul was an amazing experience. The city has a great history -- former capital of the Byzantine and the Ottoman Empires -- that can still be explored through museums, mosques, churches, castles, etc. Also, the city's topography and hydrography -- the city is divided by the Bosphorus Strait and the Golden Horn -- provides great views from several parts of the city. The picture included in this post was taken during a boat tour through the Bosphorus Strait. it shows the Yoros Castle and the Turkish flag. Because VLDB took place during the Victory Day, which is a very important national holiday to remember the independence of Turkey, we got used to see Turkish flags all over the city.

quinta-feira, 20 de setembro de 2012

How to succeed as a Ph.D student?

How can you maximize your chances of succeeding as a Ph.D student? This is the question I've been thinking about since I received the results of my applications for grad school earlier this year. Nothing more appropriate than applying the scientific method to identify a set of powerful guidelines for acing graduate school. After reading a good amount of material on graduate school, research methodology, and personal management, and also applying some of the proposed techniques, I think it is time to summarize some ideas here.

What is a Ph.D?

The most important question regarding a Ph.D is what is a Ph.D exactly? According to Wikipedia, a Ph.D (or Doctor of Philosophy) is a postgraduate academic degree given by universities. The term 'Philosophy' means 'love of wisdom' and is not necessarily related to Philosophy as a field of study. In practice, a Ph.D is a qualification to conduct individual research in industry, academia, government, etc.

A Ph.D degree usually requires some courses, qualifying exams, teaching experience, and research work presented in the form of a dissertation and oral defense. While working on research, the Ph.D candidate is supervised by a more experienced researcher, usually a professor. The dissertation and its oral defense are evaluated by a committee of experts in the topic of the research done by the Ph.D candidate. In case the committee recognizes the scientific merit of the dissertation, then the Ph.D degree is granted to the candidate.

 Though composed of a set of requirements, the main component in a course of a Ph.D candidate towards his or her degree is the development of original research.  Matt Might describes this research and contextualizes it in terms of the human knowledge using pictures in his Illustrated Guide to Ph.D.


The second most important question when you think about a Ph.D is: Why would you get a P.hD in the first place? Motivation plays a key role in the success of a Ph.D student. As a consequence, it must be very clear in the student's mind what a Ph.D can add to his life and career.

Whether getting a Ph.D is worthy or not is a controversial topic. In practical terms, the main goal of a graduate school is to train a new generation of university professors and researchers. I believe that in order to be objective about the reasons to pursue a Ph.D it is interesting to compare it against a job in industry, which is the most likely alternative to someone who wants to go to grad school. According to Matt Welsh, things that can be learned doing a Ph.D but usually not in industry include:
  • Reading and writing research papers;
  • Running experiments;
  • Giving talks;
  • Figuring out what problems need to be solved;
  • Becoming a leading expert in an area;
  • Exploring the boundary of human knowledge;
  • Dealing with lots of freedom and a lack of structure;
  • Working for yourself;
  • Working on the hardest problems.
Peter Bentley gives broader reasons to pursue a Ph.D:
  • To achieve something significant (except money);
  • To discover or learn something new;
  • To improve yourself and your life;
  • It fits you.
Jamie Lawrence offers a more pessimistic view on the motivations for going to graduate school -- after quitting his Ph.D. He mentions the following ones:
  • Qualification;
  • Dr.
  • Curious;
  • etc.
Regarding professional prospects, a Ph.D degree may increase someone's opportunities to find a satisfying job. However, as Thomas Benton (a pseudonym) describes fiercely, the market for Ph.Ds, specially in humanities, is disappointing. The truth is that there are too many Ph.Ds for too few tenure positions and graduate students end up being the cheap fuel that keeps the higher education engine running. David Goodstein, from Caltech, has a more apocalyptic view on  the oversupply of Ph.Ds as a result of the exponential growth of science in the past 300 years. He makes a very interesting analogy between the evolution of science and the universe based on the Big Bang-Big Crunch theory. Basically, an average university professor advises about 15 Ph.Ds during his career, but only one of them will be another university professor because the society does not support such a exponential growth in science anymore.

Moreover, the cost of a Ph.D degree is definitely high. The great majority of Ph.D students do not actually pay tuition but are supported during grad school instead. However, because this support is usually much lower than salaries in the job market, a Ph.D costs approximately the price of a house. To make things even worse, the earning premium for a Ph.D is, on average, 26%, while the one for a master's degree is 23%. What means that the tough years in graduate school do not actually lead to a big paycheck. In fact, a consequence of the current scenario is that the most brilliant American students have been avoiding graduate school in the past 40 years. However, these missing students have been replaced by foreign students, including myself, that are more willing to tolerate poor conditions and prospects.

Dan Pink, a motivational guru, argues that there are three factors that motivate people to work: purpose,  mastering, and autonomy. Ph.D students deal with these factors in a very interesting fashion. Regarding purpose, Ph.D students are expected to solve relevant problems as part of their dissertations. Moreover, a Ph.D degree is essentially about mastering a topic. Finally, Ph.D students have a high degree of autonomy, since they are developing the capacity of  conducting individual research. Back to the purpose aspect,  I found this amazing quote from Albert Szent-Gyorgi about it:

                "If a student comes to me and says he wants to be useful to mankind and go into 
                research to alleviate human suffering, I advise him to go into charity instead.
               Research wants real egotists who seek their own pleasure and satisfaction, 
                but find it in solving the puzzles of nature."

The reason I like this somehow selfish quote so much is that it removes a great part of the weight carried by science in general. It is hard to imagine that someone is paying a fair amount of money for people to "solve puzzles", but nobody said that this puzzle cannot be curing cancer or abolishing world hunger.


Taking some courses is usually part of the qualifying requirements in graduate school. There is a common agreement that graduate school is more about depth than breadth, being different from undergraduate school. In fact, in graduate school you are expected to learn more from outside classes. Ronald Azuma gives a very good definition for learning in graduate school:

                "It's like teaching swimming by tossing students into the deep end of the pool
                and seeing who makes it to the other end alive and who drowns"

Grad students have more freedom and flexibility than undergrads to choose the courses they take. However, this freedom and flexibility must be handled wisely. It is recommended to ask your advisor and other students about which courses to take. Dianne O'Leary emphasizes that two priorities of a graduate student should be: (1) taking elementary courses to get prepared for the advanced graduate level courses, and (2) fulfilling the course requirements and exams defined by the department. Another good recommendation (from Frank Valid) is taking the easier classes first, since the beginning is a critical part in graduate school.

I like to see courses and qualification in graduate school as part of the broader goal of becoming a world expert in a given topic. As a consequence, graduate level courses require more initiative and preparation hours (from 3 to 4 hours for each hour in class). However, more than learning the topics, it is important to remember what you have learnt. Periodic reviewing is the only way to hold knowledge, but it takes even more discipline than learning itself. 

Another important aspect of graduate-level courses  is learning advanced topics.  As we get closer to the frontier of human knowledge, mastering complex techniques becomes more important. Calvin Newport once mentioned the case of  a highly successful young researcher whose secret has been learning new promising topics and then applying them to solve hard problems. One good strategy for mastering complex topics I have already experimented with is the so called Feynman Technique, which is basically learning as you were supposed to explain the topic to a new student using a top-down approach, examples, analogies, and simple language. In this post, I'm applying some ideas from the Feynman Technique to learn about graduate school.

Being a good student in graduate courses may be a good way to get close to faculty members that may be future advisors, research collaborators, and also write recommendation letters for you. Also, advanced projects developed in these courses may lead to publications and even be included as part of your dissertation, thus keep in mind that an extra effort may be rewarding.

A more controversial view on graduate courses is given by Matt Might, who said that focusing on grades or coursework is one of the easy ways to fail a Ph.D. Straight As do not give you a Ph.D and after you finish, nobody really cares about your grades. Stephen Stearns goes even beyond and says that lectures are an inefficient way to learn for Ph.D students, who should apply more active learning strategies, such as focused seminars. 


According to the guide  How To Do Research In the MIT AI Lab, the choice of thesis advisor is the most important decision to be made in graduate school. Advisors are often the ones who apply for funding, meet weekly to talk about progress, review papers, recommend topics, introduce students to other professors and researchers,  promote the student's work, etc. Nevertheless, advising is a highly subjective matter, which may lead to all sorts of conflicts.

Students should consider prospective advisors as one of the criteria in the selection of graduate schools to apply for. Moreover, searching for an advisor should be a top priority for first-year Ph.D students.  In general, good advisors are those whose students finish their Ph.Ds and have successful careers. Therefore, it is important to talk to previous and current students of a given professor in order to gather information about this professor as a advisor. It is also relevant to know whether the advisor is tenured -- tenure means a guaranteed position as a professor -- or not. Non-tenured professors are generally more available and motivated. The downside is that they have less funding available and are also less experienced in research.

Besides these more general guidelines, choosing an advisor also depends a lot on the specific student under consideration. In particular, the student's and professor's research interests and preferences (e.g., level of guidance required/given, individual or group projects) should be as much compatible as possible. Advising styles range from a strong master/apprentice style to a passive hands-off style. Gordon Davis recommends avoiding these extremes. Moreover, he makes a very interesting point regarding the students' evolution along their Ph.Ds and how different advising styles fit better with each one of the stages in this evolution. Matt Might also makes reference to this evolution, stating that the advisor should be hands on in the early stages, but, towards the end, the student should know more than the advisor about her topic. Finding the exact time for this inversion is a challenge and a bad timing is one of the main reasons to fail a Ph.D. Another very good recommendation by Gordon Davis is discussing a 10-year plan for the student and then working on a Ph.D that best fits this plan.

In my limited experience (two advisors), communication is a key aspect of a successful advisor-advisee relationship. Each involved part should be aware of what is expected from each other (like a contract). Miscommunication -- rather than misbehavior -- was the source of most of the conflicts between students and their advisors I have seen so far. In particular, I believe that many students do not realize that in graduate school they are the ones who take the duties and pressure. In fact, the typical advisor is a source of duties and pressure instead of someone who was supposed to alleviate them.

A very funny view on advising is given in the Thesis Prevention, which is a set of guidelines for a fictitious adversarial advisor. I'm still looking for a student who does recognize his or her advisor at  some point while reading this guide. My favorite guidelines are: 

  1. Do not prepare for supervisions;
  2. Don't concentrate during meetings;
  3. Focus on vague (philosophical) ideas.

Time management

One of the most important lessons I've learnt about how to develop successful research is the importance of time management. Navigating from long-term goals (e.g., becoming a successful researcher, getting a Ph.D, publishing a paper) to very short specific tasks (e.g., reading a paper, plotting a result) is maybe the greatest challenge in the life of a Ph.D student. Therefore, they should become experts in avoiding procrastination and making constant progress as means to increase their productivity and, as a consequence, their success in graduate school.

A Ph.D is a long-term complex project that depends on many people (student, collaborators, advisor, thesis committee, journal and conference committees), costs a good amount of money (enough to buy a house according to Matt Welsh), and involves high risks. Nevertheless, most of the Ph.D students I know refuse to make use of very simple time management techniques in graduate school. The lack of proper time management is hard to perceive, since it has indirect implications over several other aspects in graduate school. But don't blame yourself, "falling to plain is planning to fail".

My major sources of material about time management are the Calvin Newport's blog Study Hacks and Scott Young' blog Get More from Life. Calvin is a computer science professor who got his Ph.D from MIT and is somehow obsessed about productivity and impact. One of my favorite posts from Calvin is about how he achieves outstanding levels of productivity working 8-5 by applying what he calls fixed schedule productivity. The idea is first to fix your schedule and then work backwards to make everything fit. However, in order to make this method work it is necessary to be ruthlessly results oriented, focusing on what really matters. Frank Vahid makes a similar point saying that the key to balancing multiple tasks in graduate school is refusing. Moreover, having a regular schedule, setting priorities, keeping to-do lists, avoiding tasks of negligible importance (e.g., checking email and facebook,  instant messaging, reading news), working in blocks of uninterrupted time, and getting enough sleep are effective time management strategies. Scott Young shares a similar opinion, advocating that you can actually achieve higher productivity by working less if you are smart.

Matt Might disagrees with Cal Newport about the regular schedule, mentioning it as one of the main reasons Ph.D students fail. According to him, graduate school requires contemplative labor days, nights and weekends. Jamie Lawrence also questions the effectiveness of time management in graduate school, but for different reasons. He believes that complex tasks such as inventing and comprehending are simply unmanageable in practice. 

In my opinion proper regular schedule and time management techniques in general are indeed useful in graduate school. Therefore, I decided to apply them systematically. Some of the guidelines I want to follow are:
  • Setting yearly, monthly, weekly, and daily goals with their respective priorities;
  • Breaking goals into sets of specific tasks;
  • Working on 3-hour chunks with 30-minute breaks;
  • Tracking the number of hours of focused working;
  • Setting a regular schedule (8-9 hours a day, 5-6 days a week);
  • Avoiding any source of procrastination;
  • Working on one project per day;
  • Working on up to 3 projects at the same time.
Randy Pausch proposed a list of questions that may help in priority setting:
  • Why am I doing this?
  • What is the goal?
  • Why will I succeed?
  • What happens if I choose not to do it?

Philip Guo, in his Phd Grind, describes some of his productivity rages -- one of them consisting of 72 consecutive days programming with 5 days of break and a 10-hour work routine --  in which he did most of the work that led him to graduate.  Though this kind of "mode" may be useful, I truly believe it should be avoided as much as possible. A good motivation to avoid these rages is that they are not compatible with the kind of creative work that leads to relevant research.

One of the most interesting and polemical articles about time management in research I know was  published in the Nature magazine. It describes the daily routine of a "24/7 lab" -- the Brain Tumor Stem Cell Laboratory -- at Johns Hopkins University School of Medicine, directed by the neuroscientist and neurosurgeon Alfredo Quinones-Hinojosa. In this lab, students are supposed to work over holidays (including Christmas), meetings go until after 10:00 pm, and members receive calls at 6:00 am. Quinones-Hinojosa and his lab are very successful but their working routine has been condemned by the research community. Nevertheless, it is hard to argue against a person who attributes his transition from an illegal immigrant from Mexico to becoming a Berkeley and Harvard alumni before joining one the the US leading research hospitals to hard work. This same article mentions a study that have found that the average scientist works 50 hours a week and that more working hours tends to lead to more publications. However, there is also evidence that highly creative scientists have broader interests and more hobbies than less creative ones. The idea of a 24/7 lab was later challenged by Julie Overbaugh, who argued that this intense routine penalizes creativity and the flow of ideas in research.


Money does not seem to be a big problem in the life of a graduate student. I have already mentioned that a Ph.D is an expensive degree, but this cost actually comes from the low value of the stipends in graduate school compared to salaries in the job market. As Marie desJardins said "Although nobody ever got rich being a graduate student, you probably won't starve either".

The typical sources of money for Ph.D candidates are:
  • Fellowships: Based on merit and given by government (e.g., NSF, DOE) or private (Google, Microsoft, Yahoo, Samsumg) organizations;
  • Research Assistantships: Part of a professor's grants associated to a research project;
  • Teaching Assistantship: Given by the department to students that help professors in course-related activities (e.g., grading, tutoring).
TAships are usually given to first-year students that are looking for advisors and research topics. Around the second year, students move to RAships in order to start research. Fellowships require applications and have deadlines along the year. While a student with a fellowship is free to work with any professor -- and as consequence on any topic --, it is harder for them to interact with professors, in comparison to TAs and RAs. In the particular case of a RA, it is important to notice that once funded by a given project, the student's research is constrained to the topics associated to this project. An interesting analysis of the difference between a RA and a fellowship in the life of a graduate student is given in the Phd Grind, in which the fellowship did not worked in the student's favor due to the lack of motivations for him to work on a particular problem and collaborate with a specific professor.

Fellowships, TAs, and RAs usually cover tuition and also provide a stipend for the student. However, TAs and RAs are often available for nine months. During the summer, students can apply for internships, which pay about 5 times the graduate school stipend, as means to cover their expenses and also get some work experience. Moreover, students also need funding to attend conferences, which may come from travel fellowships or grants related to a project.

A general guideline for graduate students is to keep living expenses in check. Running into debt may move the focus of the student from graduate school to alternative ways to earn more money. This extra money will likely require some kind of effort somehow unrelated to finishing the Ph.D.


Besides supporting graduate students, being a teaching assistant is also an effective way to get teaching experience, which is useful to students who aim to become professors. As a side-effect, teaching assistants alleviate the burden of professors, who then have more time to do research. Today, there is recurrent complaint that TAships, and even graduate school in general, have as one of their main purposes reducing the budget of universities.   The idea is that TAs are a much cheaper replacement for professors, playing a key role in the economic viability of mass education. Following the same line of thought, RAs are cheap manpower for conducting research projects.

TAs usually receive some kind of training from the university or department before they start working. Diane O'Leary gives a list of good recommendations for TAs:
  • Be well prepared (read the textbook, know what was covered in class, solve homework problems);
  • Be fair in grading;
  • Treat students with respect;
  • Give quick feedback;
  • Be alert to any attempt of cheating.
Teaching is a personally rewarding experience, but being a good teaching assistant does not give you a Ph.D degree. Therefore, I like to include teaching as one of those tasks that may conflict with graduation if not managed sensibly.


Internships are a great opportunity for students to get some work experience outside academia. In particular, for students who do not like the idea of staying in academia, internships may become job opportunities after graduation.  The typical period for internships in graduate school is summer. Companies and other organizations select students based on applications and each internship lasts for a 3-month period.

Internships can be focused on research or development depending on the interests of the organization that provides the internship position. Philip Guo describes his internship at Google as development oriented, since he spent his time working on his own open-source project. On the other hand, while he was at Microsoft Research, he collaborated with a famous researcher and wrote 3 papers. 

Internships and teaching assistantships are similar in the sense that they do not necessarily help the student to graduate. Internships focused on research may lead to publications and become part of a dissertation, as happened to Jure Leskovec, a former Carnegie Mellon Ph.D student now at Stanford. Jure was an intern at Yahoo! Research, HP Labs, and Microsoft Research, publishing at least one paper in collaboration with researchers from each one of these laboratories. However, his case is an exception and not the rule. Calvin Newport describes that he once asked to one of his professors about a summer internship he was considering and his professors told him that spending those three months working on paper submissions was a better idea.

Research / Thesis

A Ph.D degree testifies that someone is able do conduct original research, extending the human knowledge regarding a given subject. The research developed during a Ph.D has a thesis as main product.  The function of the advisor is to assist the student in the development of his or her thesis, which will be evaluated by a committee of experts who decide whether the Ph.D degree should be given to the student or not.

Finding a topic/problem to work on is one of the greatest challenges in the life of a Ph.D student. According to Richard Hamming, working on important problems is a prerequisite for doing great research, because if you don't work on important problems, it is unlikely that you will do important work. Moreover, there is an agreement that the student must enjoy the subject and have fun solving the problem. Working on a problem you don't like for several years is an academic suicide. Richard Hamming goes beyond, saying that emotional commitment is a necessary condition for the development of first-class work.

One of my favorite pieces of advice about finding a research problem, given by Edger Dijkstra, is to always try to work on the boundary of your abilities, so that you avoid routine problems as much as you can but, at the same time, are realistic about your capabilities.  It is good to have a list of interesting open problems and then discuss them with your advisor and colleagues. Tracking technology trends and changes may also enable the discovery of interesting research problems.

A Ph.D has limited time, which constraints the space of problems a Ph.D student can solve during her Ph.D. For instance, it took 358 years of intense work of many great mathematicians to prove the Last Fermat's Theorem, therefore proving this theorem was certainly not a good research problem for a Ph.D student. On the other hand, derivative dissertations, in the sense that they solve a slightly twisted version of an existing problem and/or improve an existing solution for a problem by a negligible margin, are usually boring. Given this trade-off between difficulty and feasibility, Marie desJardins recommends taking a central problem that is solvable and acceptable and then extending it to riskier versions in order to make the thesis more exciting. As a general advice, she says that a good problem should require both solid theoretical work and good empirical evaluation. According to Matt Might, aiming too high or too low are some of the top reasons to fail a Ph.D. Uri Alon has an interesting scheme based on difficulty x interest to describe the search space of research problems.

The idea is that ideal problems are very relevant but also relatively easy to solve. As the researcher gets experience, he or she can navigate towards the harder problems. Following the same line, Calvin Newport says that we should avoid complexity when seeking problems. A very important lesson he learned from one of his professors was to always reduce problems to their purest, most simple form. Once you are able to understand the underlying math  of the problem, then add more complexity to it. On the other hand, he recommends seeking complexity in your technical skills. Once you get an important and complex skill, you can apply it to solve many relevant problems. Still on this topic, Lasse Pedersen says that a great paper is often has a result that everyone can understand and apply, but not everyone could have derived.

Michael Nielsen defines two interesting types of researcher that can be extremely successful: the problem-seeker and the problem-solver. The problem-seeker is always exploring new problems and then giving simple solutions for them while the problem-solver takes a well-known relevant problem and then applies his arsenal of techniques to solve it. Based on these definitions, Maurice Herlihy can be considered a problem seeker based on what he said:

               "I am opportunistic. I have a general area of interest, namely, concurrency, but I worry
               about becoming too specialized, so I make an effort to balance my attention between 
               more applied and more theoretical areas, because, often, that's where you have 
              the opportunity to write the first "easy" paper and let the ones who write the difficult
              papers cite your paper."

In his guide about how to publish in the SIGKDD conference, Eanon Keogh states that a good research problem is one that, if you can solve it, you can make money, or save lives, or help children learn a new language. It is important to get real data, make incremental progress, and evaluate your success using a clear metric. Furthermore, your research statement should be falsifiable.

A good way to find new problems and key results is attending/watching seminars and talks in general. I've never been good at getting useful information from talks, thus I've decided to apply an off-the-shelf method to get information from talks, which was proposed by Ravi Vakil and is known as "Three Things" Exercise. The idea is that if you can get three things out of a talk, then it is a successful talk. These things can be: 
  • A definition;
  • A theorem;
  • A key example;
  • A motivating problem;
  • A question you want to ask;
  • Anything else similar.
Once you think you have a relevant and well-defined research problem, it is time to review the literature related to this problem. According to Diane O'Leary, the subject of the problem should be timely and active. But the research area should not be too crowded to reduce your chances of being scooped. Richard Hamming mentions that one important characteristic of great scientists is that they tolerate ambiguity very well

"They believe the theory enough to go ahead; they doubt it enough to notice the errors and faults so they can step forward and create the new replacement theory."

From a problem-solver point of view, after understanding the problem and the key results related to it, then it is time to do the creative work. In my experience, most of the researchers do not spend a sufficient amount of time with a pen and a piece of paper, working on original research. Proper time management and a results-oriented approach should lead you to spend a good amount of time developing original research, since researchers are judged in terms of their problem-solving capacities. However, it does not mean that the problem must be solved from scratch, the idea is filling the gaps between the existing knowledge and the particular problem you are tackling. According to Richard Hamming, you should have a reasonable attack to your problem. Stefan Savage emphasizes that having a "secret weapon" or "unfair advantage" is desired.

John Steele gives a general algorithm for solving research problems that works as follows:
  1. Take a problem of interest;
  2. Become familiar with the key results;
  3. Strip away as much mumbo (e.g., fix definitions) jumbo as you can;
  4. Identify the core phenomena of the area;
  5. Pose simple questions that articulate concrete features of the phenomena of interest;
  6. Put a little work on this questions (keep notes);
  7. Whenever you get a concrete theoretical or empirical result, Latex it out.
A more general piece of advice from John Steele that I think to be very relieving is:

                  "You need to put in an honest days work, say three genuine research hours 
                  (pencil in hand or fingers on the keys) and two genuine reading hours. I have
                  never in thirty years known or heard of a student who did this five days a week
                  for two years who did not get a thesis."

In the research community, there is a famous algorithm that is said to be based on Richard Feynman's approach to solve problems that is more concise, albeit less realistic, that John Steele's one. For sake of completeness, here is Feynman's algorithm:
  1. Write down the problem;
  2. Think real hard;
  3. Write down the solution.
Paul Ronney also offers a set of guidelines -- somehow associated to the fail fast principle -- that are useful in the development of effective research:
  1. Design something simple;
  2. Build quickly;
  3. Test crudely;
  4. Modify;
  5. Improve;
  6. Repeat.
There is a particular topic that I consider to be paramount for a Ph.D student but that is not covered by most of the guides I've found that is whether is the right time to give up. Courage, persistence, and patience are important attributes of a successful researcher, but at some point it is expected that the student should give up on a specific problem because her or she is somehow unable to solve it. I'm my opinion, if after a reasonable amount of time, the student is not able to make any progress regarding the problem, he or she should at least put it in a latent mode. In fact, keeping a dozen research problems in latent mode is a good advice given by Gian-Carlo Rota. He said that this strategy was applied by Richard Feynman, who would check any new method he learned against his set of favorite unsolved problems. Matt Might says that, as a Ph.D student, you have to be willing to fail "from the moment you wake up to the moment your head hits the pillow". The point is that a Ph.D student should always avoid getting stuck for a long time.

One question that I often face regarding a researcher's self-development is how much time should be spent on learning new skills instead of doing actual researchSteven Weinberg argues that you should start doing research and then learn what is needed along the way. But this advice kind of contradicts some points made by Calvin Newport and Michael Nielsen about learning a rich and varied set of tools to attack research problems. Another interesting tip from Steven Weinberg is to "go for the messes", which means the subjects that are in an intense process of evolution, instead of those that are more stable. The reason is that you can make greater contributions when the topic is "boiling".

Martin Schwartz devotes a whole article about the importance of stupidity in research. The idea is that scientists are able to handle stupidity in a productive fashion, actively seeking out new opportunities to feel stupid. This idea agrees a lot with something said by Richard Dawkins:

"Science, as opposed to technology, does violence to common sense."

Stuart Firestein made a similar point when he defined science as:

                   "Form of ignorance characterised by knowing what you don't
                   know, and being able to ask the right questions".

A corollary of the stupidity assumption that I took from Terry Tao is that we should be skeptical about our own work. Moreover, he summarizes what it takes to solve mathematical problems, which a think can be generalized to other research problems, as:

  • Knowledge;
  • Experience;
  • Patience;
  • Hard work

One of the most useful recommendations I found when I started studying effective research methodologies was keeping a notebook (or journal). Things that should be recorded in this notebook include:
  • General research speculations (e.g., a problem that sounds interesting);
  • Remarks about a particular paper you are reading or talk you are watching;
  • List of tasks for the day, week, and month;
  • Outlines of publications and talks;
  • Experiments to be run;
  • Preliminary results and discussion;
  • Formulations, theorems, and math notes in general;
  • Papers to be read later;
  • Etc.
Another aspect that is known to increase the quality of research is collaboration. Collaborators are an important source of feedback, new ideas, and motivation. Furthermore, you may learn new skills from experienced collaborators and also connect with the research community with their help. It is important to keep in mind that the student should be the main responsible for the work included in the dissertation. However, this requirement still allows collaboration opportunities in the development of a dissertation. I found interesting that in Mathematics, researchers usually apply the so called Hardy-Littlewood rule, that states that authors should appear in alphabetical order and all authors share the same credit for publications. Nevertheless, this practice is not popular in other research fields.

A last tip about doing research as a Ph.D is to never loose sight of the big picture. In other words, the student should be able to:
  • Study the proposed problem at several levels and from different points of view;
  • Understand the broader implications of your work to the society;
  • Have a clear picture of how your work complements existing research in different fields;
  • Comprehend how your work may be a step towards more ambitious goals.

Reading in an essential part of the research process. An average researcher is expected to spend most than half his or her time reading. Matt Might estimates that a typical Ph.D student needs to read about 50 to 150 papers in order to defend the novelty of the proposed thesis.

The two main purposes of reading for a Ph.D student are:
  • Finding unsolved research problems (e.g., future work);
  • Reviewing the literature on a given problem.
While reading a paper, you should try to answer the following questions (by Dianne O'Leary):
  • From where did the author seems to draw the ideas?
  • What exactly was accomplished by this piece of work?
  • How does it seem to to relate to other work on this field?
  • What would be the reasonable next step to build upon this work?
  • What ideas from related fields might be brought to bear upon this subject?
Reading a paper carefully is a task that takes several hours. Michael Mitzenmacher gives some important guidelines for reading research papers:
  • Read critically
  • Read creatively
  • Make  notes
  • Try to summarize the paper in a few sentences
  • Compare the paper to other works

I found this very interesting quote from Paul Halmos about the kind of active reading required for academic papers:

                "Don’t just read it; fight it! Ask your own questions, look for your 
                own examples, discover your own proofs. Is the hypothesis necessary? 
               Is the converse true? What happens in the classical special case? 
               What about the degenerate cases? Where does the proof use the 

Write summaries on the papers you read, so that you do not need to read a paper again in case you forget its main ideas or need to write a related work section. Moreover, summaries help you to avoid cryptomnesia, which may lead you to claim credit for other people's ideas. I keep summaries of papers I've read in this blog.

It is important to notice that reading must be selective. There is not time to read many papers in full, so it  is important to read papers at different levels of detail. According to Michael Nielsen, there is a small number of papers that are worth reading in any field. Most papers should be skimmed and deep reading should be applied only to a few good papers.

Richard Hamming has a very peculiar, albeit interesting, view on the relationship between great research and reading:

             "Reading is important to know what is going on, but reading to get the 
              solutions is not the way to do great research."


Writing should be a routinely activity in graduate school. As I have already mentioned, a notebook is one of the most effective tools for researchers. However, different from this notebook, which is for your own consumption, at some point as a graduate student, you will write papers, book chapters and your dissertation, which are supposed to be read by someone else. Due to the "publish or perish" philosophy, writing is part of the job of any researcher, as emphasized by Richard Hamming:

           "But the fact is everyone is busy with their own work. You must present it
            so well that they will set aside what they are doing, look at what you've 
            done, read it, and come back and say, ``Yes, that was good.'' "

 A very common mistake of young researchers is to separate research time from writing time. The best strategy is to write along the way and the notebook helps a lot with that. Writing research ideas, experiments to be run, expected results, partial results, partial conclusions, paper outlines, sketches of figures and plots, etc. is a way to keep a detailed log of the research progress. This log may later become the starting point for drafts of papers and the dissertation. Moreover, it is also very useful to support the discussions while meeting with your advisor and other collaborators.

Writing well is difficult and the best way to improve your writing skills is practicing. Therefore, I decided to make writing a habit by creating this blog. At first, creating a blog seemed as a dumb idea, but, in fact, this blog turned out to be useful to me from several perspectives, such as:
  • Practice writing;
  • Keep summaries of papers I read;
  • Share ideas with colleagues;
  • Pressure myself to put good habits in practice.
One of the ways your writing skills will be put in check is by submitting peer-reviewed papers. Peer-review is, roughly speaking, an evaluation process where a committee of experts assess the scientific merit of your paper. It is a self-regulated method, in the sense that researchers evaluate the work of their colleagues and many researchers are likely to both submit and review papers. Peer-review is a established evaluation process in academia -- not only for papers but also for project proposals, awards, etc. -- but it has been victim of criticism in the last years. It is a common saying that committees are a lowest common denominator. The main shortcomings of peer-review are:
  • It is slow;
  • It favors elitism and the suppression of dissent ideas;
  • It is subject to conflicts of interest;
  • Results are not checked with rigor;
  • There are no incentives for good reviewing.
Due to these shortcomings and also the tough competition among researchers to have their papers published by prestigious venues, good writing has become even more decisive for paper acceptance. The idea is to assume the reviewer as a busy (or lazy) person and then reduce the effort needed for the reviewer to assess the merit of the paper to minimal levels. For instance, Eanon Keogh states the following:

             "If you can save the reviewer one minute of their time, by spending one 
              extra hour of your time, then you have an obligation to do so".

There is an agreement about the fact that the typical reviewer makes his impression based on the first pages of the paper and most of them will not "waste" their time reading your paper in full. As Daniel Lemire once wrote in his blog:

           "In science, everybody is busy promoting their own idea and gathering up
            evidence to convince others."

Another important thing I've learned about writing is that it requires a lot of feedback and several iterations. In fact, many people say that writing is re-writing, in the sense that will you need to improve your paper several times until it reaches a high-quality level. In graduate school, the burden of reviewing is given mainly to the advisor, but in practice the advisor may be to busy to contribute along the whole writing process. The solution I've applied quite successfully so far is getting feedback from colleagues first (especially those from your collaboration network) and then invoking the wisdom of the advisor only in the final stages. Actually, the wisdom will come in the final stages whether you are expecting it or not.

In the Researcher's Bible, the authors make several interesting recommendations about writing academic papers:
  • Learn to write well;
  • Your paper should have a message;
  • You must present it so that it cannot be misunderstood;
  • Do not try to say too much;
  • Basic framework: Claim/hipotheses and then evidence for it;
  • Keep a particular reader in mind;
  • Use examples;
  • State what is new and better about what you have done.
Besides the direct benefit of publishing your work, the guide  How To Do Research In the MIT AI Lab makes an interesting point about the benefits of writing for debugging ideas. Duane Bailey recommends stealing stylistic ideas from other papers, something that I also found very useful. In particular, it is a good strategy to take papers from the venue you are aiming for in order to inspire you in writing a paper that is compatible with this venue.

It is interesting to notice that publication patterns vary a lot according to the research area. While some fields (e.g., biology and physics) are focused on journals others (e.g. computer science) give more value  to conferences. Moreover, publication volumes vary not only among fields but also researchers. Because the number of publications is sometimes considered a measure of a researcher's success, many researchers publish a lot in order to get recognition. On the other hand, there are several well-succeeded  researchers that have a dozen papers or even less, as described by Daniel Lemire. I do agree with him that some people publish too much without caring about the impact of what they publish. It seems like the attributes that a paper must have in order to be published are not exactly the same ones it must have in order to achieve great impact.

                "Today, many assistant professors have published more than star researchers 
                like Claude Shannon, John  Nash or Richard Feynman have in their entire 
                life. Maybe we spend too much time on the publication process, and too 
                little time on the actual science?"

One of the most important pieces of advice regarding writing is the famous saying: "To catch a thief, you must think like a thief.". What is funny about it is that in peer-review everybody is a thief and everybody is trying to catch a thief as well. Therefore, it should be easy for researchers to review and improve their papers from a reviewer's perspective.

I believe that most of the recommendations for effective writing given here also apply to a dissertation. However, the dissertation has some specific properties that should be treated accordingly. First of all, the typical structure of a dissertation is:

  • Introduction;
  • Technical Contribution 1;
  • Technical Contribution 2;
  • ...
  • Technical Contribution N;
  • Conclusions.

Though it is not necessarily true, I like to see each chapter of a dissertation as the work developed during one year of the Ph.D. Moreover, it is a good idea to submit each chapter of the dissertation as a journal paper, so that the contributions are testified by the research community. Each of these journal papers should be extensions of one or two conference papers. I've seem these integrated view on dissertation writing and publishing in practice and it works very well. By spending at least 3 years focused on research, the student can publish about 3 journal papers and 6 conference papers. 

Giving talks

Part of selling your work, and yourself, as a student and researcher is giving talks. In fact, talks are a great opportunity to interact with the research community. So, leave a good impressionRonald Azuma points out that if you can speak well, you will earn recognition and distinguish yourself from the other students. I have to confess that I do not like most of the research talks and lectures I've seen so far, but I have found some inspiring ones. Sean Carrol summarizes well the importance of giving talks in graduate school:

                 "I’ve actually heard some students say that they love science, but
                     don’t like writing papers or giving talks. That’s like saying you love
           being a butcher, just aren’t very fond of cutting up animals."

The paper How to give a good research talk (Simon Jones et al.) gives useful directives for giving effective talks. Some of these directives that should be emphasized are:

  • A talk is a taster, not an in-depth treatment on a subject;
  • Use motivating examples, in particular, start your presentation with one;
  • Treat some aspects with more detail than others;
  • Say enough without saying too much;
  • Don't hide the problems;
  • Don't read the slides;
  • For long talks, have slides to orient the audience;
  • Make eye contact with the audience;
  • Don't overrun, it is selfish and rude;
  • Save 2/3 minutes for each slide;
  • Know your audience;
  • Have in mind that people will remember only one or two things from your talk, choose what should be remembered.

My favorite guide on giving research talks is entitled Small guide for giving presentations and was written by Markus Puschel. In this guide, the author applies several useful strategies in his own slides, which makes his points much clearer. These are the pieces of advice given by him that I think are the most relevant for a graduate student:

  • Preparation is key;
  • Speak clearly, not too fast;
  • Don't put your hands in your pockets or cross your arms;
  • Avoid text. Your brain cannot process reading and listening at the same time, but listening and images are fine;
  • Avoid equations, they are understood only by experts;
  • Use images:
    • Diagrams;
    • Plots;
    • etc.
  • Use humor if you can;
  • Start your presentation by thanking your co-authors;
  • Try not to loose the audience, use slides to catch people back;
  • Strive for quality, people remember good presentations;
  • Start with an interesting slide;
  • Don't forget to explain every plot:
    • axis;
    • scales;
    • higher/lower is better;
    • example point.
  • Use contrasts (colors, fonts, sizes) wisely and be consistent;
  • Watch critically other presentations.

In the following figure, Markus Puschel shows how a presentation is judged considering both its visual quality and technical content.

Most of the guides about giving research talks focus on formal talks, but informal talks are also important. Another general recommendation is that you should be able to summarize your research into one-sentence, one-paragraph, and five-minute answers. Discussing your research with your colleagues is a very effective way to get early feedback on your ideas. But you should also give feedback to other people's research, so that you create a network for research collaboration. This network also works for exchanging interesting references, reviewing drafts of your papers, and sharing research experience in general. This idea is based on the Secret Paper Passing Network, described in the guide How To Do Research In the MIT AI Lab.

A very important topic related to giving research talks is how to deal with criticism and questions. As a researcher, you should always be open to criticism and questions. But be sure that you understand them well. For instance, it is a common mistake to answer a question that is different from the one asked by the audience. Furthermore, it is a good strategy to answer the question in a way that is comfortable for both you and the person who asked the question. Try to make everybody happy.

It is expected that you will be nervous before and during your talk. Therefore, you should learn how to deal with your nerves. First of all, practice a lot, so that you become familiar with your talk. Matt Might recommends listening to music and doing physical exercises (e.g. push-ups) as means to relax. I've watched some of the Matt Might's talks available online and they are impressive. Another great recommendation he gives in his blog is about anticipating the questions an having backup slides that address them. In fact, he says to be able to predict the person who will ask the question sometimes.

Final advice:

Begin with the end in mind!


What is a Ph.D?

Ronald Azuma. So long and thanks for the P.hD.
David Goodstein. The Big Crunch.
The Economist. The disposable academic: Why doing a PhD is often a waste of time. 
Dan Pink. The Surprising Science of Motivation.

Duane Bailey. A letter to research students.
Total Drek. Unhelpful hints.
Jamie Lawrence. Things I learnt during and about my PhD.
Philip Guo. The PhD grind.
Ronald Azuma. So long and thanks for the P.hD.

Time Management
Matt Might. 10 easy ways to fail a P.hD.
Alan Bundy et al. The researcher's bible. 
Cal Newport. Time management: How an MIT postdoc writes 3 books, a PhD defense, and 6+ peer-reviewed papers - and finishes by 5:30pm.
Calvin Newport. Some Thoughts on Graduate School.
Julie Overbaugh. 24/7 isn't the only way: A healty work-life balance can enhance research.
Scott Young. Work less to get more done.
Jamie Lawrence. Things I learnt during and about my PhD.
Philip Guo. The PhD grind.
Richard Hamming. Your and your research.
Randy Pauch. Time management tips.
Heidi Ledford. The 24/7 lab.

Philip Guo. The PhD grind.
Marie desJardins. How to be a good graduate student.
Cal Newport. Time management: How an MIT postdoc writes 3 books, a PhD defense, and 6+ peer-reviewed papers - and finishes by 5:30pm.
Dianne O'Leary. Graduate study in the computer and mathematical sciences: a surviving manual.

Dianne O'Leary. Graduate study in the computer and mathematical sciences: a surviving manual.
Cal Newport. Time management: How an MIT postdoc writes 3 books, a PhD defense, and 6+ peer-reviewed papers - and finishes by 5:30pm.
Randy Pauch. Time management tips.
Richard Hamming. Your and your research.
David Goodstein. The Big Crunch.

Calvin Newport. Some More Thoughts on Graduate School.
Philip Guo. The PhD grind.

The Thesis
Eanon Keogh. How to do great research, get it published in SIGKDD and get it cited! 
Daniel Lemire. How to write a good research paper.
Uri Alon. How to choose a good scientific problem?
Edger Dijkstra. The three golden rules for successful scientific research.
Richard Hamming. Your and your research.
Cal Newport. Decoding the impact instinct.
Marie desJardins. How to be a good graduate student.
Randy Pauch. Time management tips.
Richard Hamming. Your and your research.
Timothy Novikoff et al. Education of a model student.
Total Drek. Unhelpful hints.
Jamie Lawrence. Things I learnt during and about my PhD.
Edward Wilson. Advice to young scientists.
Philip Guo. The PhD grind.
Stephen Stearns. Some modest advice for graduate students.
Scott Young. Work less to get more done.
Scott Young. Learn Faster with the Feynman Technique.
Ravi Vakil. The "Three Things" Exercise for Getting Things out of Talks.

Michael Mitzenmacher. How to read a research paper?
David Chapman. How to do research at the MIT AI lab.
Matt Might. 10 easy ways to fail a P.hD.
Michael Nielsen. Principles of Effective Research.
Lance Fortnow. Finding Problems to Work on.

Apoorva Mandavilli. Peer review: Trial by Twitter.
David Chapman. How to do research at the MIT AI lab.
Philip Guo. The PhD grind.
Daniel Lemire. How to write a good research paper.
Eanon Keogh. How to do great research, get it published in SIGKDD and get it cited! 
Alan Bundy et al. The researcher's bible. 

Giving Talks
Simon Jones et al. How to give a good research talk.
Markus Puschel. Small guide to giving presentations.
David Chapman. How to do research at the MIT AI lab.