Semantic Culturomics
Fabian M. Suchanek
(Télécom Paristech University, France)
Nicoleta Preda
(University of Versailles, France)
Google Books
13m books
Wikipedia/Google Books
2004-2014
Scanner
Database
Google Ngram Viewer
Culturomics
“Quantitative Analysis of Culture
Using Millions of Digitized Books”
... and dozens of authors
(Science Magazine 2010)
http://www.culturomics.org/
Culturomics++
Google Ngram Viewer + gender (male, female), nationality, married
[Figure: Ratio of women among politicians by year. x-axis: 1950 to 1980; y-axis: 0 to 0.15.]
“Mining History
with Le Monde”
(AKBC 2013)
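A curve such as the ratio of women among politicians can be computed once the names in dated articles are linked to a knowledge base. The following is only a minimal sketch under that assumption: the toy KB, the mention list, and the function name `ratio_of_women` are hypothetical and are not taken from the cited work.

```python
from collections import defaultdict

# Toy knowledge base (hypothetical data): entity -> properties.
KB = {
    "Alice": {"gender": "female", "occupation": "politician"},
    "Bob": {"gender": "male", "occupation": "politician"},
    "Carol": {"gender": "female", "occupation": "singer"},
    "Dave": {"gender": "male", "occupation": "politician"},
}

# Entity mentions already extracted from dated articles: (year, entity).
MENTIONS = [
    (1950, "Bob"), (1950, "Dave"), (1950, "Bob"), (1950, "Alice"),
    (1980, "Alice"), (1980, "Bob"), (1980, "Carol"), (1980, "Alice"),
]


def ratio_of_women(mentions, kb, occupation):
    """Per year, the share of mentions of people with `occupation` that refer to women."""
    total = defaultdict(int)
    women = defaultdict(int)
    for year, entity in mentions:
        props = kb.get(entity)
        if props is None or props["occupation"] != occupation:
            continue  # entity unknown to the KB, or not in the group of interest
        total[year] += 1
        if props["gender"] == "female":
            women[year] += 1
    return {year: women[year] / total[year] for year in sorted(total)}


ratios = ratio_of_women(MENTIONS, KB, "politician")
print(ratios)
```

The text alone gives the years and the names; the KB alone gives gender and occupation. Only the join of both yields the curve.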
[Figure: Age by year for four groups: All, Politicians, Musicians, Singers. x-axis: 1950 to 1980; y-axis: 35.0 to 70.0.]
Semantic Culturomics
Goal 1: Discover (not just confirm) the patterns of life
“IF you are a singer AND you are a drug addict, THEN you will die at the age of 27”
Amy Winehouse picture from listal.com
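To discover rather than merely confirm such an IF ... AND ... THEN pattern, a candidate rule needs a score. The sketch below uses support and confidence, two standard measures from association rule mining; the slides do not say which measures are intended, and the people and facts in `PEOPLE` are invented for illustration.

```python
# Toy KB (hypothetical data): one set of facts per person.
PEOPLE = {
    "p1": {"singer", "drug_addict", "died_at_27"},
    "p2": {"singer", "drug_addict", "died_at_27"},
    "p3": {"singer", "drug_addict"},
    "p4": {"singer"},
    "p5": {"politician", "died_at_27"},
}


def rule_stats(people, body, head):
    """Support and confidence of the rule 'IF all body facts hold THEN head holds'."""
    matches_body = [facts for facts in people.values() if body <= facts]
    support = sum(1 for facts in matches_body if head in facts)
    confidence = support / len(matches_body) if matches_body else 0.0
    return support, confidence


support, confidence = rule_stats(PEOPLE, {"singer", "drug_addict"}, "died_at_27")
print(support, confidence)
```

A rule miner would enumerate many candidate bodies and heads and keep those whose scores pass a threshold; this sketch scores a single given rule.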
Goal 2: Explain events by explicit rules
• Why did Amy Winehouse die at 27?
• Because she was a singer and she was a drug addict
Goal 3: Predict events with concrete actors
Justin Bieber picture from Wikipedia/Flickr/Joe Bilawa
Def: Semantic Culturomics
Semantic Culturomics is the large-scale analysis
of text documents with the help of knowledge bases,
with the goal of discovering, explaining, and predicting
the trends and events in history and society.
Examples of questions we want to answer:
• How long do celebrities usually take to marry?
• What are the factors that lead to an armed conflict?
• Which species are likely to migrate due to global warming?
• What will happen to this politician next?
=> requires the combination of text and KB
Challenge 1: Knowledge Rep.
Devise a unified model for these tasks:
• Given a rule, replace all facts that match the antecedent by the succedent
• Represent the meanings of a phrase under different disambiguations
• Find all phrases that contain names that could refer to entities with a given property
• Find all entities that appear in a phrase that can express a certain relationship
There is a large body of work on knowledge representation, but combining all of it is hard.
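One of the tasks above, finding all phrases that contain names that could refer to entities with a given property, can be sketched with two lookup tables. This is an illustration only, not the unified model the slide asks for: the alias table, the property table, the entity identifiers, and the naive substring matching are all assumptions.

```python
# Hypothetical toy data: surface names -> candidate entities.
ALIASES = {
    "The King": {"Elvis_Presley", "Some_Monarch"},
    "Priscilla": {"Priscilla_Presley"},
    "Georgia": {"Georgia_Country", "Georgia_US_State"},
}

# Hypothetical toy data: entity -> properties.
PROPERTIES = {
    "Elvis_Presley": {"person", "singer"},
    "Some_Monarch": {"person"},
    "Priscilla_Presley": {"person"},
    "Georgia_Country": {"country"},
    "Georgia_US_State": {"state"},
}

PHRASES = [
    "The King married in 1969.",
    "His wife is Priscilla.",
    "Georgia borders the sea.",
]


def phrases_with_property(phrases, aliases, properties, wanted):
    """Return (phrase, name, entities) where some reading of `name` has property `wanted`."""
    hits = []
    for phrase in phrases:
        for name, candidates in aliases.items():
            if name not in phrase:  # naive substring match, for illustration only
                continue
            matching = sorted(e for e in candidates if wanted in properties[e])
            if matching:
                hits.append((phrase, name, matching))
    return hits


singer_hits = phrases_with_property(PHRASES, ALIASES, PROPERTIES, "singer")
print(singer_hits)
```

The ambiguity is kept on purpose: a name stays attached to every candidate entity, and the query selects the readings that satisfy the property.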
Challenge 2: Event Mining
Find a way
• to map entity names to entities
• to map event descriptions to events
The King married in 1969.
His wife is Priscilla.
There is a large body of work on information extraction and disambiguation, but:
Find a way
• to cluster unknown entity names to entities
• to create events from descriptions
BusjBusj ruled Molvanîa.
The King was known as
“Bu-Bu” to his wife.
[Diagram: the names BusjBusj, The King, and Bu-Bu grouped as one entity, linked by "married" to Bu-Bu's wife]
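When the entity is not in the knowledge base, its names can only be grouped with each other. Below is a minimal sketch of that grouping with a union-find structure. It assumes that some earlier step has already produced pairwise "same entity" evidence (the hypothetical `EVIDENCE` list); how to obtain that evidence is the actual challenge and is not addressed here.

```python
def cluster_names(names, same_entity_pairs):
    """Group names into clusters, given pairwise 'same entity' evidence (union-find)."""
    parent = {name: name for name in names}

    def find(name):
        # Walk up to the cluster representative, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in same_entity_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


NAMES = ["BusjBusj", "The King", "Bu-Bu", "Bu-Bu's wife"]
# Hypothetical evidence, e.g. from coreference within the two example sentences.
EVIDENCE = [("BusjBusj", "The King"), ("The King", "Bu-Bu")]

clusters = cluster_names(NAMES, EVIDENCE)
print(clusters)
```

Each resulting cluster stands for one new entity; "Bu-Bu's wife" remains a separate entity that a "married" fact can then connect to the first cluster.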
Challenge 3: Rule Mining
• Logical rules (kind of OK)
  A person is born before he dies.
• Numerical rules (avoid spurious correlations)
  The spread between the imports and the exports of a country correlates with its current account deficit.
• Temporal rules (requires temporal KR)
  An election is followed by the inauguration of a president.
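A rule such as "a person is born before he dies" can in principle be found by counting how often one dated relation precedes another. The sketch below mines such precedence rules from a toy KB; the relation names, the data, and the two thresholds are hypothetical, and the thresholds reduce but do not eliminate spurious rules.

```python
from itertools import permutations

# Toy KB (hypothetical data): entity -> {relation: year}.
DATES = {
    "p1": {"bornIn": 1900, "marriedIn": 1925, "diedIn": 1970},
    "p2": {"bornIn": 1940, "diedIn": 2001},
    "p3": {"bornIn": 1955, "marriedIn": 1980},
    "p4": {"bornIn": 1930, "marriedIn": 1990, "diedIn": 1985},  # noisy fact
}


def mine_precedence(dates, min_support=2, min_confidence=0.9):
    """Find rules 'first happens before second' that hold for most entities having both."""
    relations = sorted({rel for facts in dates.values() for rel in facts})
    rules = []
    for first, second in permutations(relations, 2):
        both = [facts for facts in dates.values() if first in facts and second in facts]
        if len(both) < min_support:
            continue  # too few examples to say anything
        holds = sum(1 for facts in both if facts[first] < facts[second])
        confidence = holds / len(both)
        if confidence >= min_confidence:
            rules.append((first, second, len(both), confidence))
    return rules


rules = mine_precedence(DATES)
for first, second, support, confidence in rules:
    print(f"{first} before {second}: support={support}, confidence={confidence:.2f}")
```

On this toy data the noisy marriage date of `p4` keeps "marriedIn before diedIn" below the confidence threshold, while the two rules starting with `bornIn` survive.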
• Rules with negation (violates OWA)
  People marry only if they are not yet married.
• Existential rules (requires several head atoms)
  If you are a person, then there exists a person who is your father, and is both male and adult.
• Rules with textual features (requires soft reasoning)
  If you are often mentioned in a positive context with your party, you may become the next presidential candidate.
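The open world assumption (OWA) matters already when scoring a rule: a missing fact is unknown, not false. The sketch below contrasts standard confidence, which treats every missing head fact as a counterexample, with confidence under the partial completeness assumption as used in rule miners such as AMIE, which counts a counterexample only if the KB knows some head fact for that subject. This technique is not named on the slides and is brought in here only as an illustration; the facts and relation names are invented.

```python
# Toy KB (hypothetical facts) as (subject, relation, object) triples.
FACTS = {
    ("a", "livesWith", "b"), ("a", "marriedTo", "b"),
    ("c", "livesWith", "d"), ("c", "marriedTo", "d"),
    ("e", "livesWith", "f"), ("e", "marriedTo", "z"),
    ("g", "livesWith", "h"),  # no marriage fact for g: unknown, not false
}


def confidences(facts, body_rel, head_rel):
    """Standard and PCA confidence of the rule body_rel(x, y) => head_rel(x, y)."""
    body = {(s, o) for s, rel, o in facts if rel == body_rel}
    head = {(s, o) for s, rel, o in facts if rel == head_rel}
    support = len(body & head)

    # Standard confidence: every body pair without the head fact is a counterexample.
    standard = support / len(body) if body else 0.0

    # PCA confidence: only subjects with at least one known head fact can
    # contribute counterexamples.
    subjects_with_head = {s for s, _ in head}
    pca_body = {(s, o) for s, o in body if s in subjects_with_head}
    pca = support / len(pca_body) if pca_body else 0.0
    return standard, pca


standard, pca = confidences(FACTS, "livesWith", "marriedTo")
print(standard, pca)
```

Neither score handles a rule whose body itself contains a negation, such as "not yet married"; the sketch only shows why absence of a fact cannot simply be read as its negation.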
Good news: We are not alone
Culturomics,
Digital Humanities,
GDELT project
from Nathan Kallus' WWW 2014 paper
Event prediction,
Kira Radinsky's work,
Recorded Future