Knowledge Bases CC-BY Fabian M. Suchanek 13
2 We need to bridge language and structured data
Natural language: “how to say it” (Language Model)
Structured data: “what to say”
Between the two: querying/question answering, information retrieval, fact checking, augmenting/verifying
Example question: Is this screw part of the airplane?
Part        Plane       Weight
Screw6677   Airbus717   10g
Screw6678   Airbus717   8g
...
3 Database
A (relational) database is a set of tables (also called relations), each of which stores data in columns and rows.
Given Name   Family Name   Profession
David        Hume          philosopher
Olympe       de Gouges     playwright
Thomas       Paine         politician
...
4 Database
A (relational) database is a set of tables (also called relations), each of which stores data in columns and rows.
Given Name   Family Name   Profession    Birth place
David        Hume          philosopher   ?
Olympe       de Gouges     playwright    ?
Thomas       Paine         politician    Thetford
...
Advantages:
•  very established data model
•  great software support
Disadvantages:
•  rigid schema, inconvenient for incomplete data
•  no taxonomy
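The "rigid schema" point can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The table and column names are assumptions chosen for illustration; they are not part of the slides.

```python
import sqlite3

# Hypothetical table modeled on the slide's example; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (given_name TEXT, family_name TEXT, "
    "profession TEXT, birth_place TEXT)"
)
rows = [
    ("David", "Hume", "philosopher", None),       # birth place unknown -> NULL
    ("Olympe", "de Gouges", "playwright", None),  # birth place unknown -> NULL
]
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", rows)

# Every row carries every column, so incomplete data shows up as NULLs.
missing = conn.execute(
    "SELECT COUNT(*) FROM person WHERE birth_place IS NULL"
).fetchone()[0]

# A new attribute requires a schema change that affects all rows.
conn.execute("ALTER TABLE person ADD COLUMN spouse TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]
print(missing, columns)
```

Each unknown value occupies a NULL cell, and adding one attribute for a single person widens the table for everyone.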
5 Knowledge Base: Inheritance
Relations (properties, attributes) are declared for a class, and inherited by the subclasses.
Diagram: David_Hume_(philosopher) -type-> Philosopher -subclass-> Human -subclass-> Entity
Relations shown in the diagram:
- label
- URL
- image
- birth date
- birth place
- spouse
- work
- awards
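A minimal sketch of how inherited relations could be computed. Which class declares which relation is an assumption made here for illustration; the extracted slide only lists the relations and the class chain.

```python
# Class chain from the slide: Philosopher -> Human -> Entity.
SUBCLASS_OF = {"Philosopher": "Human", "Human": "Entity"}

# ASSUMPTION: the assignment of relations to classes is illustrative only.
DECLARED = {
    "Entity": ["label", "URL", "image"],
    "Human": ["birth date", "birth place", "spouse"],
    "Philosopher": ["work", "awards"],
}
TYPE_OF = {"David_Hume_(philosopher)": "Philosopher"}


def relations_of_class(cls):
    """Collect relations declared on cls and on all its superclasses."""
    relations = []
    while cls is not None:
        relations = DECLARED.get(cls, []) + relations
        cls = SUBCLASS_OF.get(cls)
    return relations


def relations_of_instance(entity):
    """An instance may use every relation of its class, inherited ones included."""
    return relations_of_class(TYPE_OF[entity])


print(relations_of_instance("David_Hume_(philosopher)"))
```

Under these assumptions, the instance can use all eight relations although only two of them are declared directly on Philosopher.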
6 Knowledge Base: Terminology
A Knowledge Base (KB, also: Knowledge graph, KG) is a directed labeled graph, where the nodes are entities and the edges are relations between these entities.
Taxonomy of classes: Philosopher -subclass-> Human -subclass-> Entity; Book -subclass-> CreativeWork -subclass-> Entity
Relation definitions:
- label
- URL
- image
Instances with facts: David_Hume -type-> Philosopher; Hume_Book_1 -type-> Book; David_Hume -wrote-> Hume_Book_1
Labels: “Hume”; “An enquiry of human understanding”; “Исследование о человеческом познании”
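One simple way to hold such a directed labeled graph in code is a list of (subject, relation, object) triples. The sketch below uses the entities from the diagram; the helper names `objects` and `classes_of` are hypothetical, and attaching the labels to David_Hume and Hume_Book_1 is read off the diagram.

```python
# Each edge of the graph is one (subject, relation, object) triple.
TRIPLES = [
    ("Human", "subclass", "Entity"),
    ("Philosopher", "subclass", "Human"),
    ("CreativeWork", "subclass", "Entity"),
    ("Book", "subclass", "CreativeWork"),
    ("David_Hume", "type", "Philosopher"),
    ("Hume_Book_1", "type", "Book"),
    ("David_Hume", "wrote", "Hume_Book_1"),
    ("David_Hume", "label", "Hume"),
    ("Hume_Book_1", "label", "An enquiry of human understanding"),
    ("Hume_Book_1", "label", "Исследование о человеческом познании"),
]


def objects(subject, relation):
    """All objects reachable from subject over edges with the given label."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def classes_of(entity):
    """Direct types of an entity plus all their superclasses."""
    result = []
    frontier = objects(entity, "type")
    while frontier:
        cls = frontier.pop(0)
        if cls not in result:
            result.append(cls)
            frontier.extend(objects(cls, "subclass"))
    return result


print(objects("David_Hume", "wrote"))
print(classes_of("David_Hume"))
```

Following the type edge and then the subclass edges shows that David_Hume is a Philosopher, a Human, and an Entity.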
7 Knowledge Base: Terminology
A Knowledge Base (KB, also: Knowledge graph, KG) is a directed labeled graph, where the nodes are entities and the edges are relations between these entities.
Taxonomy of classes: Philosopher -subclass-> Human -subclass-> Entity; Book -subclass-> CreativeWork -subclass-> Entity
Relation definitions:
- label
- URL
- image
Constraints: Human ⊓ CreativeWork ≡ ⊥
Instances with facts: David_Hume -type-> Philosopher; Hume_Book_1 -type-> Book; David_Hume -wrote-> Hume_Book_1
Labels: “Hume”; “An enquiry of human understanding”; “Исследование о человеческом познании”
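The constraint Human ⊓ CreativeWork ≡ ⊥ says that nothing can be both a Human and a CreativeWork. A minimal sketch of checking it, assuming the taxonomy from the diagram; the function names and the faulty example KB are hypothetical.

```python
SUBCLASS_OF = {
    "Philosopher": "Human", "Human": "Entity",
    "Book": "CreativeWork", "CreativeWork": "Entity",
}
# Pairs of classes that must not share an instance.
DISJOINT = [("Human", "CreativeWork")]


def ancestors(cls):
    """Return cls together with all its superclasses."""
    result = set()
    while cls is not None:
        result.add(cls)
        cls = SUBCLASS_OF.get(cls)
    return result


def violations(types):
    """types maps each instance to the list of its direct classes."""
    found = []
    for entity, direct in types.items():
        all_classes = set()
        for cls in direct:
            all_classes |= ancestors(cls)
        for a, b in DISJOINT:
            if a in all_classes and b in all_classes:
                found.append((entity, a, b))
    return found


ok = {"David_Hume": ["Philosopher"], "Hume_Book_1": ["Book"]}
bad = {"David_Hume": ["Philosopher", "Book"]}  # hypothetical faulty KB
print(violations(ok), violations(bad))
```

The check has to look at inherited classes as well: the faulty KB never states that David_Hume is a CreativeWork, but this follows from the type Book.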
8 Knowledge Base: Terminology
A Knowledge Base (KB, also: Knowledge graph, KG) is a directed labeled graph.
Schema/ontology:
- Taxonomy of classes: Philosopher -subclass-> Human -subclass-> Entity; Book -subclass-> CreativeWork -subclass-> Entity
- Relation definitions: label, URL, image
- Constraints: Human ⊓ CreativeWork ≡ ⊥
Instances with facts: David_Hume -type-> Philosopher; Hume_Book_1 -type-> Book; David_Hume -wrote-> Hume_Book_1
Labels: “Hume”; “Исследование о человеческом познании”
9 Knowledge Bases: Example
[YAGO: David Hume]
Classes: schema:Thing, schema:Person, yago:Human
Entity: yago:David_Hume
schema:deathPlace: yago:Edinburgh
schema:hasOccupation: yago:Economist (+ 4)
schema:knowsLanguage: yago:English_language
schema:nationality: yago:Kingdom_of_Great_Britain
schema:birthPlace: yago:Edinburgh
schema:familyName: yago:Hume_(surname)
schema:givenName: yago:David_(name)
schema:memberOf: yago:Royal_Society_of_Edinburgh
schema:deathDate: "1776-08-25"^^xsd:date
schema:image: http://commons.wikimedia.org/wiki/Special:FilePath/Allan%20Ramsay%20-%20David%20Hume%2C%201711%20-%201776.%20Historian%20and%20philosopher%20-%20Google%20Art%20Project.jpg
schema:gender: yago:male_Q6581097
schema:alternateName: "David Hume"@ar (+ 73)
schema:sameAs: https://af.wikipedia.org/wiki/David_Hume (+ 98)
rdfs:comment: "Brits filosoof"@nl (+ 23)
rdfs:label: "David Hume"@af (+ 103)
owl:sameAs: wd:Q37160 (+ 2)
10 Language Models and Knowledge Bases
Language Models
•   designed to generalize (applies to machine learning models in general)
Knowledge Bases
•   designed to memorize (applies to structured data in general)
11 Language Models and Knowledge Bases
You give the blue dots:
Language Models
•   designed to generalize (applies to machine learning models in general)
•   it generalizes to a line
Knowledge Bases
•   designed to memorize (applies to structured data in general)
•   it stores the two dots
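The contrast can be sketched in a few lines of Python. The coordinates of the two dots are made up for illustration, and fitting a straight line stands in for "a machine learning model" only in the loosest sense.

```python
# ASSUMPTION: two made-up data points playing the role of the "blue dots".
dots = [(1.0, 2.0), (3.0, 6.0)]

# Generalizing: fit the line y = a*x + b through the two points.
(x1, y1), (x2, y2) = dots
a = (y2 - y1) / (x2 - x1)
b = y1 - a * x1


def generalize(x):
    """Answers for any x, including values that were never observed."""
    return a * x + b


# Memorizing: store exactly the two points.
stored = dict(dots)


def lookup(x):
    """Answers only for stored x; returns None for anything else."""
    return stored.get(x)


print(generalize(2.0), lookup(2.0), lookup(1.0))
```

The fitted line produces an answer for x = 2.0 even though that point was never given, while the lookup has no answer for it and returns the stored value unchanged for x = 1.0.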
12 Language Models and Knowledge Bases
Language Models (applies to machine learning models in general)
•   designed to generalize
•   probabilistic
•   opaque
•   costly inference
•   creative/flexible
=> ideal for natural language processing
Knowledge Bases (applies to structured data in general)
•   designed to memorize
•   deterministic
•   transparent
•   cheap inference
•   “dumb”/rigid
=> ideal for storing crisp data
13 Databases and Knowledge Bases
Databases
•   entities are stored in a table and have exactly the attributes of the table
•   fast on entity‐centric queries
•   have reasoning via constraints
•   not made for taxonomies
=> ideal for complete data and “flat” data
Knowledge Bases
•   entities are stored in a graph and can have (nearly) any attributes
•   need lots of joins for entity‐centric queries
•   have good reasoning support
•   have a taxonomy, inherited properties
=> ideal for incomplete data and data with taxonomic hierarchy
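The point about joins can be illustrated with a small sketch in Python's built-in sqlite3 module. The table layouts, the identifier 'hume', and the relation names are assumptions for illustration; storing a graph as a three-column table of edges is one common layout, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table layout: one row per entity, with a fixed set of attributes.
conn.execute("CREATE TABLE person (id TEXT, given_name TEXT, profession TEXT)")
conn.execute("INSERT INTO person VALUES ('hume', 'David', 'philosopher')")

# Graph layout: one row per edge, so an entity can have any relations.
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", [
    ("hume", "givenName", "David"),
    ("hume", "profession", "philosopher"),
    ("hume", "memberOf", "Royal_Society_of_Edinburgh"),  # no schema change needed
])

# Entity-centric query on the table: a single row lookup.
table_row = conn.execute(
    "SELECT given_name, profession FROM person WHERE id = 'hume'"
).fetchone()

# The same query on the graph needs one self-join per additional attribute.
graph_row = conn.execute(
    "SELECT t1.o, t2.o FROM triple t1 JOIN triple t2 ON t1.s = t2.s "
    "WHERE t1.s = 'hume' AND t1.p = 'givenName' AND t2.p = 'profession'"
).fetchone()
print(table_row, graph_row)
```

Both queries return the same answer, but the graph layout pays for its flexibility with a join for every attribute requested.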
14 Public Knowledge Bases
A large number of knowledge bases are publicly available:
•  Huge KB, created by volunteers
•  Current reference KB, used by Apple Siri
•  “cleaned‐up version” of Wikidata + schema.org
•  Huge multilingual KB from several sources
...plus thousands of others.
[Weikum, Dong, Razniewski, Suchanek: Machine Knowledge, Found. & Trends in Databases, 2021]
15 Public Knowledge Bases
KBs are linked in the Semantic Web by Linked Open Data principles.
[LOD Cloud] Each bubble is a public KB. Each link means that the entities of one KB have an equivalent in another KB.
16 Knowledge Bases in Industry
= where I gave a talk
[Noy et al.: Industry-Scale Knowledge Graphs: Lessons and Challenges, CACM 2019]
[Weikum, Dong, Razniewski, Suchanek: Machine Knowledge, Found. & Trends in Databases, 2021]
17 Knowledge Bases in Industry: Applications
•  Search
   •  KBs help with auto‐completion
   •  a huge fraction of searches is about named entities (-> from KB)
   •  in 50% of Google searches, users are happy with the knowledge card
•  Question answering
   •  some queries ask for properties (birth year)
   •  some queries ask for entities by properties (“Stanford computer science professors”)
   •  some queries search products by properties (“Vanilla Flavored Decaf Portside Coffee”)
   •  some queries need complex knowledge (“Who is the richest tech person?”)
[Weikum, Dong, Razniewski, Suchanek: Machine Knowledge, Found. & Trends in Databases, 2021]
18 Google Knowledge Graph
Google builds its knowledge base from Wikipedia, licensed data, and medical providers. It contains 5 billion entities and 500 billion facts. It is used in knowledge panels in search.
[A reintroduction to our Knowledge Graph and knowledge panels]
19 Microsoft Satori
(very hard to find information, but it appears to be used in Bing search results)
Bing.com: “David Hume”
[Microsoft: Bing Entity Search API]
20 Amazon Knowledge Graph
Amazon bought a knowledge base technology called “Evi”, and uses knowledge bases for its personal assistant Alexa and its product search.
Luna Dong, who built the Amazon Knowledge Graph
[Machine Knowledge]
21 Apple
Apple appears to use a knowledge base for Siri. Using WikiData?
Screenshot from SiriFunny
Apple’s Siri Software
Business Insider, 2017-10-05
22 Where do Knowledge Bases come from?
•  manual population
   e.g., for bootstrapping, or for small KBs
•  consolidating other databases
   e.g., for merging data from different sources with different schemas
•  extracting from natural language text (“David Hume was...”)
   e.g., from reports, Web pages, news, etc.
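To give a feel for the third option, here is a toy sketch of pattern-based extraction in Python. The input sentences, the regular expression, and the choice of hasOccupation as the target relation are all assumptions for illustration; real extraction systems are far more involved than a single pattern.

```python
import re

# ASSUMPTION: made-up input text; the slide only shows "David Hume was...".
text = "David Hume was a philosopher. Olympe de Gouges was a playwright."

# Toy pattern: "<capitalized name> was a/an <lowercase noun>".
PATTERN = re.compile(r"([A-Z][a-z]+(?: (?:de )?[A-Z][a-z]+)*) was an? ([a-z]+)")

# Turn every match into a (subject, relation, object) fact.
facts = [
    (m.group(1).replace(" ", "_"), "hasOccupation", m.group(2))
    for m in PATTERN.finditer(text)
]
print(facts)
```

A pattern like this breaks on almost any variation in wording, which is one reason why extraction from text is treated as a topic of its own.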
23 Summary: Knowledge Bases
Knowledge Bases are a particular type of structured data
- data is stored as labeled binary relations between entities
- industry players build and use knowledge bases for their products
- plenty of free knowledge bases are available as well
- knowledge bases can be built by information extraction
Diagram: David_Hume_(philosopher) -type-> Philosopher -subclass-> Human -subclass-> Entity; David_Hume_(philosopher) -wrote-> “An Enquiry Of Human Understanding” -type-> Book -subclass-> CreativeWork -subclass-> Entity
->Knowledge-representation ->Semantic-web ->Fact-extraction