Language Models and Knowledge Bases
Figure: given the blue dots, a language model generalizes them to a line, whereas a knowledge base stores exactly the two dots.
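The figure's contrast can be mimicked in a few lines. This is a hypothetical toy sketch, not how either technology is actually implemented. A least-squares line stands in for the "model" that generalizes, and a plain dictionary stands in for the "knowledge base" that memorizes.

```python
# Toy contrast (hypothetical, for illustration only):
# a "model" that generalizes vs. a "knowledge base" that memorizes.

def fit_line(points):
    """Least-squares fit of y = a*x + b; returns the fitted function."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return lambda x: a * x + b


blue_dots = [(1.0, 2.0), (3.0, 6.0)]

model = fit_line(blue_dots)  # generalizes: answers for any x
kb = dict(blue_dots)         # memorizes: answers only for stored x

print(model(2.0))   # 4.0, interpolated between the two dots
print(kb.get(2.0))  # None, no such fact stored
print(kb[3.0])      # 6.0, exactly the stored value
```

The model answers for inputs it has never seen, which is both its strength and its risk. The dictionary answers only what was stored, but it answers that exactly.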
Language Models
• designed to generalize
• probabilistic
• opaque
• costly inference
• creative / flexible
=> ideal for natural language processing
(applies to machine learning models in general)
Knowledge Bases
• designed to memorize
• deterministic
• transparent
• cheap inference
• “dumb” / rigid
=> ideal for storing crisp data
(applies to structured data in general)
Databases and Knowledge Bases
Databases
• entities are stored in a table and have exactly the attributes of the table
• fast on entity‐centric queries
• have reasoning via constraints
• not made for taxonomies
=> ideal for complete data and “flat” data
Knowledge Bases
• entities are stored in a graph and can have (nearly) any attributes
• need many joins for entity‐centric queries
• have good reasoning support
• have a taxonomy with inherited properties
=> ideal for incomplete data and data with a taxonomic hierarchy
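To make the table/graph contrast concrete, here is a minimal sketch in plain Python. The column names, predicate names (`bornIn`, `birthYear`, `locatedIn`), and entity identifiers are illustrative assumptions, not taken from any particular database or KB. The same entity‐centric query is a single row fetch in the table, but a collection of separate triples in the graph.

```python
# Hypothetical mini-example contrasting table storage and graph storage.

# Table: every row has exactly the columns of the table.
PERSON_COLUMNS = ("name", "birth_year", "birth_place")
person_table = [
    ("David_Hume", 1711, "Edinburgh"),
]

# Graph: (subject, predicate, object) triples. Any entity can have any
# attribute, and a missing fact is simply an absent triple.
triples = {
    ("David_Hume", "bornIn", "Edinburgh"),
    ("David_Hume", "birthYear", 1711),
    ("David_Hume", "wrote", "An_Enquiry_Concerning_Human_Understanding"),
    ("Edinburgh", "locatedIn", "Scotland"),  # no "person" schema needed
}


def row_for(name):
    """Entity-centric query on the table: one row holds all attributes."""
    for row in person_table:
        if row[0] == name:
            return dict(zip(PERSON_COLUMNS, row))
    return None


def facts_for(entity):
    """Same query on the graph: collect every triple about the entity.
    If the triples sit in one relational table, each attribute wanted
    side by side costs one more self-join."""
    facts = {}
    for s, p, o in triples:
        if s == entity:
            facts.setdefault(p, []).append(o)
    return facts


print(row_for("David_Hume"))
print(facts_for("Edinburgh"))  # {'locatedIn': ['Scotland']}
```

Note that `Edinburgh` cannot be stored in `person_table` at all, while the graph accepts it without any schema change.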
Public Knowledge Bases
A large number of knowledge bases are publicly available:
• Huge KB, created by volunteers
• Current reference KB, used by Apple Siri
• “cleaned‐up version” of Wikidata + schema.org
• Huge multilingual KB from several sources
...plus thousands of others.
[Weikum, Dong, Razniewski, Suchanek: Machine Knowledge, Found. & Trends in Databases, 2021]
Try it out!
KBs are linked on the Semantic Web according to the Linked Open Data principles.
[LOD Cloud]
Each bubble is a public KB. Each link means that the entities of one KB have equivalents in another KB.
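Because equivalence links chain across KBs (an entity in KB A is equivalent to one in KB B, which is equivalent to one in KB C), a consumer of Linked Open Data typically needs the transitive closure of those links. The sketch below computes it with a small union-find. The `kbA:`/`kbB:`/`kbC:` identifiers are hypothetical; in real LOD data such links are commonly expressed with `owl:sameAs` between URIs.

```python
# Hypothetical equivalence links between entities of three KBs.
same_as_links = [
    ("kbA:David_Hume", "kbB:Hume_D"),
    ("kbB:Hume_D", "kbC:hume"),
    ("kbA:Edinburgh", "kbC:edinburgh"),
]


def equivalence_classes(links):
    """Group identifiers that are (transitively) linked as equivalent."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the two sides of every link.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect members under their representative.
    classes = {}
    for x in parent:
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())


for cls in equivalence_classes(same_as_links):
    print(sorted(cls))
```

`kbA:David_Hume` and `kbC:hume` end up in the same class even though no single link connects them directly.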
Knowledge Bases in Industry
[Noy et al.: Industry-Scale Knowledge Graphs: Lessons and Challenges, CACM, 2019]
= where I gave a talk
[Weikum, Dong, Razniewski, Suchanek: Machine Knowledge, Found. & Trends in Databases, 2021]
Knowledge Bases in Industry: Applications
[Weikum, Dong, Razniewski, Suchanek: Machine Knowledge, Found. & Trends in Databases, 2021]
• Search
  • KBs help with auto‐completion
  • a huge fraction of searches is about named entities (-> from the KB)
  • in 50% of Google searches, users are satisfied with the knowledge card
• Question answering
  • some queries ask for properties (e.g., a birth year)
  • some queries ask for entities by properties (“Stanford computer science professors”)
  • some queries search for products by properties (“Vanilla Flavored Decaf Portside Coffee”)
  • some queries need complex knowledge (“Who is the richest tech person?”)
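Three of the uses above (auto‐completion, asking for a property, asking for entities by properties) can be sketched over a tiny in-memory KB. Everything here is a hypothetical illustration: the `kb` dictionary, the property names `type` and `birthYear`, and the helper names are assumptions, and a production system would use a proper index rather than a sorted list.

```python
from bisect import bisect_left

# Hypothetical mini-KB: entity -> {property: value}
kb = {
    "David_Hume": {"type": "Philosopher", "birthYear": 1711},
    "David_Bowie": {"type": "Musician", "birthYear": 1947},
    "Immanuel_Kant": {"type": "Philosopher", "birthYear": 1724},
}
labels = sorted(kb)  # sorted entity names enable prefix search


def complete(prefix, limit=5):
    """Auto-completion: entity names that start with the typed prefix."""
    i = bisect_left(labels, prefix)
    out = []
    while i < len(labels) and labels[i].startswith(prefix) and len(out) < limit:
        out.append(labels[i])
        i += 1
    return out


def property_of(entity, prop):
    """Query asking for a property, e.g. a birth year."""
    return kb.get(entity, {}).get(prop)


def entities_with(**constraints):
    """Query asking for entities by properties."""
    return sorted(
        e for e, props in kb.items()
        if all(props.get(k) == v for k, v in constraints.items())
    )


print(complete("David"))                        # ['David_Bowie', 'David_Hume']
print(property_of("David_Hume", "birthYear"))   # 1711
print(entities_with(type="Philosopher"))        # ['David_Hume', 'Immanuel_Kant']
```

The sorted list keeps the sketch short: all names sharing a prefix are contiguous, so the search can stop at the first name that no longer matches.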
Google Knowledge Graph
Google builds its knowledge base from Wikipedia, licensed data, and medical providers.
It contains 5 billion entities and 500 billion facts, and it is used in the knowledge panels in search.
[Google: A reintroduction to our Knowledge Graph and knowledge panels]
Microsoft Satori
(Information is very hard to find, but Satori appears to be used in Bing search results.)
Bing.com: “David Hume”
[Microsoft: Bing Entity Search API]
Amazon Knowledge Graph
Amazon bought a knowledge base technology called “Evi”, and uses knowledge bases for its personal assistant Alexa and for its product search.
Luna Dong, who built the Amazon Knowledge Graph (co-author of “Machine Knowledge”)
Apple
Screenshot from SiriFunny: Apple’s Siri software
[Business Insider, 2017-10-05]
Using Wikidata?
Apple appears to use a knowledge base for Siri.
Where do Knowledge Bases come from?
• manual population
  e.g., for bootstrapping, or for small KBs
• consolidating other databases
  e.g., for merging data from different sources with different schemas
• extracting from natural language text such as “David Hume was...”
  e.g., from reports, Web pages, news, etc.
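The third route, extraction from natural language text, can be illustrated with the simplest possible technique: a single hand-written pattern that turns sentences into triples. This is a deliberately naive sketch; the pattern, the `bornIn` relation name, and the underscore naming of entities are assumptions for illustration, and real extractors must handle paraphrases, ambiguity, and disambiguation to KB entities.

```python
import re

# Hypothetical pattern: "<Capitalized Name> was born in <Place>"
BORN_IN = re.compile(r"([A-Z]\w+(?: [A-Z]\w+)*) was born in ([A-Z]\w+)")


def extract_born_in(text):
    """Return (subject, 'bornIn', object) triples found by the pattern."""
    return [
        (m.group(1).replace(" ", "_"), "bornIn", m.group(2))
        for m in BORN_IN.finditer(text)
    ]


text = "David Hume was born in Edinburgh. Immanuel Kant was born in Königsberg."
for triple in extract_born_in(text):
    print(triple)
```

A sentence phrased differently (for example, "Edinburgh is the birthplace of David Hume") is missed entirely, which is why pattern-based extraction is usually only a starting point.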
Summary: Knowledge Bases
Knowledge Bases are a particular type of structured data:
- data is stored as labeled binary relations between entities
- industry players build and use knowledge bases for their products
- plenty of free knowledge bases are available as well
- knowledge bases can be built by information extraction
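Figure: an example knowledge base as a graph of labeled binary relations
David_Hume_(philosopher) type Philosopher
Philosopher subclass Human
Human subclass Entity
David_Hume_(philosopher) wrote “An Enquiry Concerning Human Understanding”
“An Enquiry Concerning Human Understanding” type Book
Book subclass CreativeWork
CreativeWork subclass Entity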
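The example graph can be written down directly as a set of triples, and the taxonomy then lets us infer types that are never stated explicitly. In this minimal sketch, the edge directions are our reading of the figure (an assumption), and the helper names `superclasses` and `types_of` are hypothetical. The inference simply follows `subclass` edges upward from every directly stated `type`.

```python
# The example KB as (subject, relation, object) triples.
# Edge directions are an assumed reading of the figure.
triples = {
    ("David_Hume_(philosopher)", "type", "Philosopher"),
    ("Philosopher", "subclass", "Human"),
    ("Human", "subclass", "Entity"),
    ("David_Hume_(philosopher)", "wrote", "An_Enquiry_Concerning_Human_Understanding"),
    ("An_Enquiry_Concerning_Human_Understanding", "type", "Book"),
    ("Book", "subclass", "CreativeWork"),
    ("CreativeWork", "subclass", "Entity"),
}


def superclasses(cls):
    """All classes reachable from cls via subclass edges, including cls."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(o for s, p, o in triples if s == c and p == "subclass")
    return seen


def types_of(entity):
    """Stated types of an entity plus all types inherited via subclass."""
    direct = {o for s, p, o in triples if s == entity and p == "type"}
    return set().union(*(superclasses(c) for c in direct))


print(sorted(types_of("David_Hume_(philosopher)")))
# ['Entity', 'Human', 'Philosopher']
```

Only `type Philosopher` is stored for David Hume; `Human` and `Entity` are inherited through the taxonomy.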