How to do a PhD in Computer Science
CC-BY
Fabian M. Suchanek
Overview
• Doing a PhD
• Making a publication
• Finding a venue
• Writing a paper
• Writing a review
• Coping with the PhD
• Wrapping up
Disclaimer
These are my personal views and recommendations.
Your requirements may differ
• by university
• by country
• by advisor
• by your own preferences
Following these recommendations does not guarantee success with your thesis.
Not following these recommendations may still yield an excellent thesis.
I cannot take responsibility for the correctness or completeness of these slides.
What does “PhD thesis” mean?
Master / Diploma
+ 3-5 years of research work at university (= “writing a thesis”)
= doctorate, PhD, Dr., Doctor of Philosophy
Why would you want to do that?
• doing research is fun
• you get the opportunity to work on what you want
• you can investigate something for 3 months undisturbed
• a thesis can open up job opportunities
  • in countries where the PhD is an important qualification
    (Germany: 1/3 of the CEOs of DAX companies have a PhD, and the entry salary is 20% higher;
    France: 94% of doctors have a “cadre” job)
  • in academia (see bonus slide)
• you help to advance science
Science is mankind’s way of approaching truth.
Scientific theories are unbiased descriptions of reality,
which are useful for industry and society.
With a PhD thesis, you contribute your bit to this grand endeavor.
How do I start a PhD thesis?
The start of a PhD thesis depends on the fortunate encounter of
• a student who wants to do a PhD thesis (= you?)
• an advisor (professor) who wants to advise the student
• a grant that pays the student’s salary
• a topic that is of interest to all three of them
Good ways to induce such an encounter are:
• participating in courses that you like
• making contact with a professor you like
• doing an internship with that professor
• following PhD thesis offers (e.g., on forums or mailing lists)
What is a PhD thesis?
thesis = { p | p is a publication }
A thesis is a set of publications.
The publications can be to some degree independent.
What is a publication?
A publication is a written article (usually around 10 pages) that treats
one particular scientific problem and that has been published at a scientific venue.
Synonyms: paper, article
A paper can treat
• a theoretical problem (Which subset of First Order Logic is decidable?)
• a practical problem (How can databases be merged?)
• in the best case: both
Making a Publication
1. Find a problem
2. Survey related work
3. Find a solution
4. Run experiments
5. Make it a paper
This order
• avoids re-inventing the wheel
• builds on what is already there
Alternative order:
1. Find a problem
2. Find a solution
3. Survey related work
4. Run experiments
5. Make it a paper
This order helps you to
• not be biased by what you read
• find intuitive, easy solutions
“Spend more time thinking and imagining than reading and learning.”
(Hatem Abdelghani's blog)
Alternative order:
1. Find a solution
2. Find a problem
3. Survey related work
4. Run experiments
5. Make it a paper
This order
• fosters inspiration
• builds on an idea
How to find a suitable problem
Find a problem
• that interests you (and your advisor and grant giver)
• where you have knowledge (possibly even an idea)
• that is of broad relevance to science
• that is likely to stay relevant
• that does not have an obvious or known good solution
Your advisor will help you find a good problem.
Bad problems for a publication:
• How to search in log(n) time
• How to connect two computers under Windows Vista
• P=NP?
• How can my dataset be cleaned?
In the best case, the problem appears naturally in your area of interest.
It can also come from reading other publications.
Two types of problems are good for solving:
1) Problems that have a solution, but that you can solve better
(faster, more automatically, more easily)
2) Problems that do not have a solution
It’s easier to do something new
than to improve on existing work.
Surveying Related Work
How to find the papers
• ask your advisor
• search on Google Scholar
• read the related work of the papers you found
• iterate
How to get the papers
• find the PDF online
• often, your university has an agreement to make papers accessible
• otherwise, write an email to the authors
How to deal with a paper
• read it
• write a short summary (even if the paper turns out to be irrelevant)
• store the summary for use in related work
If you thought the paper was relevant, a reviewer may think the same!
What to learn from papers
• What problems are hot right now?
• What solutions do people propose?
• Which related work do people cite? (this gives us further material to read)
• Which conferences do people cite?
• Who are the main players in the field?
• Which problems are still open?
• How do people sell their stuff?
  (e.g., “The key idea to overcome this dilemma, pursued in this paper,
  is to leverage the existing ontology for its own growth.”)
If some citation seems strange, be sure to check out the original paper!
Incremental work
Incremental work improves over previous work on the same (or a similar) problem
by marginally improving performance, typically by a slightly modified method, a combination
of existing methods, more training data, different training data, augmented training data, etc.
Incremental work
• risks being a coincidence of the parameters and the dataset rather than a better method
• will be superseded by other incremental work in no time
• will have no impact
Training on more data etc. is the work of an engineer, not a scientist!
Incremental work is not a good basis for a PhD!
You want your work to be a fundamentally new method
that inspires researchers, applications, and publications!
Good solutions
In the ideal case, the solution should be
• general (applicable to many problem instances)
• elegant (a smart idea, well exploited)
• theoretically well founded (not ad hoc)
• implemented (i.e., proven to work)
A solution should not be more complicated than necessary.
Make Experiments
In the experiments
• use standard benchmark datasets wherever possible (see related work)
• compare against the best solutions that exist (“state of the art”)
An experiment is irreproachable when you use exactly the same dataset
and the same metric as your competitor, and you produce better numbers.
Side remarks on Experiments
Nobody will verify whether you report the numbers that you really obtained in the experiments.
Still, it is out of the question to manipulate the results.
It can be very useful to run the experiments, even just for your own system, during implementation.
Periodically make sure that every “improvement” of the algorithm really improves the results,
but be aware of overfitting to some benchmark!
It is OK if your solution delivers better results only in some cases,
as long as you can determine upfront what these cases are.
Experiments are often the bottleneck of producing a paper.
Do them early on.
Backtrack
If you cannot find a solution, or if you are not happy with the problem,
consider giving up on it! Try something else!
“Better an end with horror than horror without end.” (Sophie Scholl)
Do not insist on the solution you found.
Even if you (or your advisor) think it is good, it might not be the right thing to do.
Try alternatives. Try even changing the problem.
Finding a solution may require reinventing the question. (Goran Frehse)
Finding a venue
A paper goes to a “venue”, i.e., an institution that
• checks the paper for quality
• publishes the paper
In computer science, typical venues are
• conferences (main publishing avenue)
• demos (system demonstrations)
• poster papers (“smaller” papers)
• workshops (like smaller conferences)
• journals (usually detailed versions of published conference papers)
All of these (except journals) are physical meetings of scientists.
What is a good venue?
All venues (conferences, workshops, journals, etc.) exist in different “qualities”.
A good venue has papers that
• are well written
• are up to date with respect to related work
• treat a problem in depth
• provide a well-founded solution
Good venues are international, i.e., they are in English.
This means:
• more people can contribute papers
• more people can verify and check the papers
• more people can use the papers
Quality of venue
It is easier to get a paper into a bad venue.
Yet, one can be a lot less proud of it.
It is very hard to get a paper into a good venue.
In return, if someone gets a paper into a good venue,
the paper is usually of greater value.
Rather than publishing at a bad conference,
consider publishing a demo or workshop paper at a good conference.
How to find good venues
• your advisor will know
• search for “conference ranking” in your area on the Web
  (see, e.g., the CORE ranking or Google Scholar)
Typical rankings are
• “Rank 1”, “Rank 2”, etc.
• A*, A, B, C
... but rankings are subjective (and a self-fulfilling prophecy).
Writing a paper
It is common practice to use LaTeX for papers
(and you will spend as much time taming LaTeX as writing the actual text).
Scientific papers usually follow the same structure:
• title
• abstract
• introduction
• related work
• preliminaries
• approach (solution)
• experiments
• conclusion
Title and Abstract
The title of your paper should say what the paper is about.
• Try using all important keywords
• If you wish, invent an acronym for your approach
(it helps people remember your solution)
Example: “PARIS: Probabilistic Alignment of Relations, Instances, and Schema”
The abstract should describe in about 10 lines
• what exactly the problem is (input / output)
• how it is solved
• that your approach is better than the others
Introduction
The introduction should contain Jennifer Widom’s “Stanford 5”:
• what is the problem?
• why is this problem important?
• why is it hard?
• why hasn’t it been solved already?
• what is our solution?
After having read the introduction, the reader
should know what EXACTLY the problem is:
• what is the input (= what is given)
• what is the desired output
“If the reviewer gets beyond the first page without
getting convinced, then he will never get convinced.”
(Hatem Abdelghani's blog)
Related Work
For every (vaguely) related approach, the paper should say
• what the approach does
• why it does not solve the problem
  • either because the approach solves a different problem
  • or because the solution is imperfect
“The XYZ system [42] also addresses the problem of database merging.
Their approach [blah blah]. Yet, the approach has a crucial drawback:
It relies on manual work by underpaid PhD students.”
You cannot be too generous in your coverage of related work.
Cover a paper not only if it is relevant,
but also if a reviewer could think it is relevant (use Web search to find such papers).
Your Solution
The main part of the paper should explain your solution.
• be very explicit and clear
• explain every design choice you made
“We will now explain our approach.
As input, we require [...].
Our goal is to [...].
Our approach proceeds in 3 stages: ...”
Show that you found a certain path,
and that this path is the best one to take.
Experiments
Experiments should show that your method works best.
• run on different datasets (at least 3)
• run against different competitors (at least 1, better 3)
• run your system with different parameter settings
• explain all datasets, metrics, and settings explicitly
• discuss reasons for good and bad performance
Be fair to your competitors! That will convince people.
[Figure: running time vs. size of dataset, for our system and their system]
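A plot like the one on this slide (running time against dataset size) can come from a small benchmark harness. The Python sketch below is only an illustration: `our_system` and `their_system` are hypothetical stand-ins (two ways of removing duplicates from a list), and `measure` is a made-up helper, not a library function. The harness gives both systems the same data, checks that their outputs agree before timing them, and reports the best of several runs per dataset size.

```python
import random
import time
from typing import Callable


# Hypothetical stand-ins for "our system" and "their system":
# both remove duplicates from a list while keeping the first occurrence.
def our_system(items: list[int]) -> list[int]:
    seen: set[int] = set()
    result = []
    for x in items:
        if x not in seen:  # set lookup: roughly constant time
            seen.add(x)
            result.append(x)
    return result


def their_system(items: list[int]) -> list[int]:
    result: list[int] = []
    for x in items:
        if x not in result:  # list scan: quadratic time overall
            result.append(x)
    return result


def measure(system: Callable[[list[int]], list[int]], sizes: list[int],
            repeats: int = 3, seed: int = 0) -> dict[int, float]:
    """Best-of-`repeats` running time in seconds for each dataset size.

    The fixed seed gives every system exactly the same input data.
    """
    rng = random.Random(seed)
    timings: dict[int, float] = {}
    for n in sizes:
        data = [rng.randrange(n) for _ in range(n)]
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            system(data)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


sizes = [500, 1000, 2000, 4000]
# Sanity check before timing: both systems must produce the same output.
check_rng = random.Random(1)
sample = [check_rng.randrange(50) for _ in range(200)]
assert our_system(sample) == their_system(sample)

ours = measure(our_system, sizes)
theirs = measure(their_system, sizes)
for n in sizes:
    print(f"{n:>5} items: our system {ours[n]:.5f} s, their system {theirs[n]:.5f} s")
```

Seeding the data generation and checking outputs before timing are two simple ways of being fair to the competitor. For a real paper, the stand-ins would be replaced by the actual systems and the synthetic lists by the benchmark datasets.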
Conclusion
Write a conclusion (roughly the same thing as the abstract, in past tense).
Include discussion and future work.
“In this paper, we have addressed the problem of blah. We have shown that blub.
...
Our approach still leaves a number of challenges to be solved: ...”
Be frank about the limitations of your approach
(but make clear that these are for future work).
Polishing
The paper has to be well written
• faultless English (use a spell checker)
• short and concise sentences
Bad: “In this paper, what we want to do is after the Web, which has grown so large
in recent times has become even larger is dealing with the problem that has
been bothering many people (not just researchers), namely that Web search
often does not deliver the results.”
Good: “In this paper, we are dealing with Web search. Web search is the problem of finding
relevant Web documents for a given set of keywords. ...”
Faulty English is a reason for rejection!
Many good papers are good because
they describe a clever solution
• in depth
• in a form that is pleasant to read
The solution does not have to be brilliant
if it is thought through and well presented.
Authors
Decide who should be an author
• only people who contributed to the work should be authors
• all people who contributed to the work should be authors
Do not take someone as co-author just because
(s)he offers help (everybody can offer help).
Take someone as co-author if (s)he is indispensable.
In some communities, the order of authors is the order of importance
• whoever had the main idea and did the main work goes first
• other authors come next
• the advisor usually goes last
All of these are tricky issues; discuss them with your advisor.
Submitting
A paper is submitted online to the venue
(= uploaded to the conference Web page).
Venues usually have a deadline
• the deadline is precise to the minute
• it is usually in the Hawaiian time zone (around 5am-11am European time)
• you can submit your paper several times; the last version counts
Writing a paper ALWAYS takes until the last second.
Be prepared to work 10 hours a day in the week before the deadline (including weekends),
and the entire night before the deadline.
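To double-check what a Hawaiian-time deadline means for you, it helps to convert it explicitly. A minimal Python sketch, assuming a hypothetical deadline of 23:59 on 15 March 2024 and Central European winter time (UTC+1). Hawaii stays at UTC-10 all year because it does not observe daylight saving time, so fixed offsets suffice here; `deadline_in_europe` is an illustrative helper name.

```python
from datetime import datetime, timedelta, timezone

# Hawaii Standard Time is UTC-10 all year (no daylight saving time).
HST = timezone(timedelta(hours=-10), "HST")
# Assumption: Central European winter time, UTC+1 (use UTC+2 in summer).
CET = timezone(timedelta(hours=1), "CET")


def deadline_in_europe(deadline_hst: datetime) -> datetime:
    """Convert a deadline given in Hawaiian time to Central European time."""
    return deadline_hst.astimezone(CET)


# Hypothetical deadline: 15 March 2024, 23:59 Hawaiian time.
deadline = datetime(2024, 3, 15, 23, 59, tzinfo=HST)
print(deadline_in_europe(deadline).strftime("%Y-%m-%d %H:%M %Z"))
# prints 2024-03-16 10:59 CET
```

Note that the deadline falls on the next calendar day in Europe. In summer (CEST, UTC+2), the same deadline is one hour later, at 11:59.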
Reviews
The paper is then reviewed by 3 anonymous experts.
This usually takes around 6 weeks.
You may not submit the paper anywhere else during this time.
The reviewers will decide whether to accept or reject your paper.
You have to accept their decision. They usually provide reasons and suggestions.
Don’t waste the time waiting for their decision!
Use it to fix all the weaknesses of your paper
that you discovered while writing it, either for this venue or for the next!
The Purpose of Reviews
Science is humanity’s way of approaching truth (see Wikipedia/Science).
The scientific state of the art is the corpus of all theories
that are known to make correct predictions, plus their predictions.
The purpose of the reviewing process is two-fold:
1) determine whether your contribution should be part of science
2) help you improve your contribution
This is how science advances:
Every idea is checked as objectively as possible by experts.
Only if the idea is convincing does it become part of science.
This model is, at the same time, also one of the problems of academia;
see Antoine Amarilli’s summary.
Ethical Expectations
Reviewers are bound by ethical expectations:
• They have to keep the paper confidential.
• They are not allowed to use the contributions for their own research.
• They are requested to be polite, even if they are anonymous.
• Most importantly: they have to work with utmost rigor,
because their decision impacts a person and their career.
Blindness
There are different ways of reviewing:
• non-blind submission:
- the authors know who the reviewers are
- the reviewers know who the authors are
• (single-)blind submission (standard):
- the authors do not know the reviewers
- the reviewers know the authors
• double-blind submission (provably improves reviewer neutrality):
- the authors do not know the reviewers
- the reviewers do not know the authors
=> all references to the authors have to be removed from the paper
Common structure of reviews
• Brief summary of the paper to show that the reviewer understood it
“This paper treats the problem of [...].
The main idea is to [...]”
• General valuation: Is the problem relevant/interesting/hot? Is the
solution smart/interesting? Are the writing and presentation good?
“The paper is well written. It treats an important problem, because [...]”
“I like the approach because [...]”
• Main part: discussion of problems
“I see the following problems with this submission...”
“I am worried that...”
“My main concern is that...”
Problems with the approach
• the approach does not deliver what was promised
• the approach does not cover all cases
• the approach is faulty
• proofs/arguments are faulty
• there is an easier/better way of doing it
Problems with related work
• relevant related work was not discussed
• the approach solves a problem that has already been solved, and does not do it better
• the approach is a minor modification of existing work
Give references to the papers you have in mind!
Problems with experiments (rough rules of thumb)
• no baseline (if there is a naive solution, it has to appear as a baseline)
• no competitors (the more the better; at least 1 if the problem is known)
• bad performance (the approach should be at least 5% better than the baseline)
• datasets too small
• too few datasets (at least 2)
• unrealistic datasets (at least 1 real-world dataset)
• small improvements without a measure of statistical significance
• experiments do not test all aspects of the approach
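One common way to attach a significance measure to a small improvement, when both systems are evaluated on the same test items, is a paired permutation (sign-flip) test. The Python sketch below assumes that per-item scores are available for both systems (e.g., 1 if an item is handled correctly, 0 otherwise) and compares their sums; `paired_permutation_test` is a hypothetical helper, not a library function, and the example scores are made up.

```python
import random


def paired_permutation_test(ours: list[float], theirs: list[float],
                            trials: int = 10_000, seed: int = 0) -> float:
    """Two-sided paired permutation (sign-flip) test.

    `ours[i]` and `theirs[i]` are the scores of the two systems on the
    same test item i. Returns an estimated p-value for the null hypothesis
    that neither system is better.
    """
    if len(ours) != len(theirs) or not ours:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(ours, theirs)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    for _ in range(trials):
        # Under the null hypothesis, each per-item difference is equally
        # likely to have either sign, so we flip each sign at random.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            at_least_as_extreme += 1
    # Counting the observed assignment itself keeps the estimate above zero.
    return (at_least_as_extreme + 1) / (trials + 1)


# Hypothetical per-item results (1 = correct, 0 = wrong) on the same 10 items.
scores_ours = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
scores_theirs = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1]
print(f"p = {paired_permutation_test(scores_ours, scores_theirs):.3f}")
```

A small p-value (e.g., below 0.05) suggests that the difference is unlikely to be due to chance on this test set. It does not show that the improvement carries over to other datasets.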
Problems of presentation
• written entirely in incomprehensible English (give examples)
• authors reveal themselves in a double-blind submission
Minor problems of presentation:
• unreadable figures
• inconsistent structure (propose a better one)
• typos
• syntactic/idiomatic problems
• missing references (shown as question marks)
These minor problems should be mentioned,
but only as an appendix to the review.
They should not be a reason for rejection.
Proposals for improvement
Every criticism of the paper should be accompanied by proposals for improvement:
“This problem could be addressed by...”
“I would have liked to see/I would propose...”
“I believe the paper would need (1)..., (2)..., (3)...”
Every reviewer is sometimes an author, and most authors will at some point be reviewers.
We are all colleagues with the common goal of advancing science.
So be kind and helpful!
Bad reviews
Bad rejecting reviews
• misunderstand the paper
• reject the paper because of minor formalities
• make unreasonable demands for experiments
• reject the paper because the method is too simple (even though it
works better than existing work)
• give feedback that cannot be used to improve the paper
(“some parts of the paper are unclear”: which parts?)
Bad accepting reviews
• miss related work
• miss problems in the paper
• do not trace the approach
• do not check the soundness of the approach
• ask mainly to cite the reviewer’s own work
Evaluation
The final evaluation usually offers the following choices:
• strong accept: I will fight for this paper with the other reviewers
• weak accept: I am OK with it being accepted
• borderline: I cannot make up my mind (avoid if possible)
• weak reject: I am OK with it being rejected
• strong reject: I will fight for rejection
The majority of papers are usually rejected.
Hence, the default choice is often “weak reject” (an unconvincing paper).
Decision
After all reviewers have made their choices
• the reviewers discuss their reviews online
• the venue might allow for author rebuttals
(the reviews are shared with the authors, who have 3 days to answer the concerns)
• the chairs take a decision
Often, the papers are ranked by their averaged review score,
and then a cut is made at some sensible level
(e.g., “average ≥ borderline”, or “no reject”).
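The ranking-and-cut step can be sketched in Python. This is a minimal illustration, assuming a hypothetical mapping of the five verdicts to the scores -2 to +2 and three made-up submissions; real venues use their own scales, and the final decision rests with the chairs. The names `SCORE`, `average_score`, and `decide` are illustrative only.

```python
from statistics import mean

# Hypothetical numeric mapping of the five-point scale;
# real venues define their own scales.
SCORE = {
    "strong reject": -2,
    "weak reject": -1,
    "borderline": 0,
    "weak accept": 1,
    "strong accept": 2,
}


def average_score(verdicts: list[str]) -> float:
    """Average the numeric scores of one paper's reviews."""
    return mean(SCORE[v] for v in verdicts)


def decide(reviews: dict[str, list[str]],
           rule: str = "average") -> list[tuple[str, float, bool]]:
    """Rank papers by averaged score, then apply a cut.

    rule="average":   accept if the average is >= borderline
    rule="no reject": accept only if no reviewer chose a reject verdict
    """
    ranked = sorted(reviews, key=lambda p: average_score(reviews[p]), reverse=True)
    result = []
    for paper in ranked:
        verdicts = reviews[paper]
        if rule == "average":
            ok = average_score(verdicts) >= SCORE["borderline"]
        elif rule == "no reject":
            ok = all(SCORE[v] >= SCORE["borderline"] for v in verdicts)
        else:
            raise ValueError(f"unknown rule: {rule}")
        result.append((paper, average_score(verdicts), ok))
    return result


# Hypothetical submissions with three reviews each.
reviews = {
    "paper A": ["weak accept", "strong accept", "weak reject"],
    "paper B": ["borderline", "weak reject", "weak reject"],
    "paper C": ["weak accept", "borderline", "weak accept"],
}
for rule in ("average", "no reject"):
    for paper, avg, ok in decide(reviews, rule):
        print(f"[{rule}] {paper}: {avg:+.2f} -> {'accept' if ok else 'reject'}")
```

With these made-up reviews, the “average ≥ borderline” cut accepts papers A and C, while the stricter “no reject” cut accepts only paper C, because paper A received a weak reject.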
If rejected...
Do not worry if your paper gets marginally rejected.
The majority of papers are rejected at first.
The usual acceptance ratio at good conferences is around 10-30%.
After the deadline is before the deadline.
(You will try again at the next conference.)
If the paper is accepted,
you will go to the conference
and present your work.
The conference is usually
• in a very cool place (Hawaii, Singapore, ...)
• with good food
• and interesting talks and people.