Amazon principal applied scientist Jens Lehmann and three coauthors recently received the Semantic Web journal 10-year award for their paper “LinkedGeoData: A Core for a Web of Spatial Open Data.”
The Semantic Web journal is a leading journal for knowledge graphs and web technologies. Its editorial board, including editors-in-chief Pascal Hitzler and Krzysztof Janowicz, selected the award-winning paper from a pool of papers published in 2012.
SWJ 10-year award 2022: Claus Stadler, Jens Lehmann, Konrad Höffner, and Sören Auer, “LinkedGeoData: A Core for a Web of Spatial Open Data,” Semantic Web 3 (4), 2012, 333-354.
Posted by Semantic Web Journal (@SW_Journal), October 25, 2022
Lehmann’s coauthors are Claus Stadler, a researcher at the Institute for Applied Informatics at the University of Leipzig; Konrad Höffner, a researcher at the Institute for Medical Informatics, Statistics and Epidemiology (IMISE); and Sören Auer, director of Leibniz Information Centre for Science and Technology and professor at Leibniz University Hannover.
The paper helped demonstrate the feasibility of large-scale virtual knowledge graphs by describing a dataset derived from OpenStreetMap, a collaborative project in which a community of mappers contributes and maintains data for a free geographic database of the world. It significantly expanded on an earlier version published at the International Semantic Web Conference (ISWC) in 2009, where it attracted attention from stakeholders including Tim Berners-Lee, the inventor of the World Wide Web.
“There were several reasons why the dataset got attention in the scientific community,” Lehmann said. “One reason was the scale of several billion facts when performing a full extraction, which means it was one of the largest datasets at the time. Another reason was the lightweight ontology layer we put on top of it, which simplified querying the dataset and the development of applications.”
Furthermore, connecting the data to DBpedia — a crowdsourced effort to extract structured content from various Wikimedia projects — and other datasets allowed users to fuse information from multiple sources, Lehmann said.
To publish and query the dataset, the authors used a rewriting approach that translates incoming queries, written in the SPARQL query language, into SQL queries over the underlying OpenStreetMap database. This made it possible to publish a virtual knowledge graph (VKG) over a relational database without requiring any change to the database itself. It also allowed live synchronization of the knowledge graph, which was important because OpenStreetMap can receive thousands of changes per minute.
The main challenge for VKGs is to allow efficient querying without changing the underlying structure. LinkedGeoData uses the Sparqlify approach (or its distributed version, Sparklify) to address this challenge. More recently, LinkedGeoData added support for the Ontop rewriter as an alternative. Both specifically support querying using spatial predicates.
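The rewriting idea can be illustrated with a toy sketch. It is not the Sparqlify or Ontop algorithm, both of which handle full SPARQL and much richer mappings; it only shows how a single triple pattern could be turned into a SQL query over an unchanged relational table. The `nodes` table, the `ex:` predicates, the mapping dictionary, and the sample rows are all hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical mapping from RDF predicates to relational columns:
# predicate -> (table, subject column, object column). Illustrative only.
MAPPINGS = {
    "ex:name": ("nodes", "id", "name"),
    "ex:amenity": ("nodes", "id", "amenity"),
}


def rewrite_triple_pattern(predicate, obj=None):
    """Rewrite the pattern `?s <predicate> ?o` into a parameterized SQL query.

    If `obj` is given, the object position is treated as a constant.
    """
    table, subj_col, obj_col = MAPPINGS[predicate]
    sql = (
        f"SELECT {subj_col} AS s, {obj_col} AS o FROM {table} "
        f"WHERE {obj_col} IS NOT NULL"
    )
    params = []
    if obj is not None:
        sql += f" AND {obj_col} = ?"
        params.append(obj)
    return sql + f" ORDER BY {subj_col}", params


def demo():
    # Toy stand-in for the relational database; the rows are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, amenity TEXT)"
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [(1, "Cafe Central", "cafe"), (2, "City Library", "library")],
    )
    sql, params = rewrite_triple_pattern("ex:amenity", "cafe")
    before = conn.execute(sql, params).fetchall()

    # The graph is virtual, so a change to the table is visible to the
    # next query without re-exporting any triples.
    conn.execute("INSERT INTO nodes VALUES (3, 'Corner Cafe', 'cafe')")
    after = conn.execute(sql, params).fetchall()
    conn.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

Because the query is answered directly from the table, the second result set already includes the newly inserted row, which is the property that makes live synchronization possible in a virtual knowledge graph.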
The full dataset contains around 8 billion entities and is several terabytes in size, a challenging amount of data to handle. A dump that was extracted around the time the paper was published in 2012 contained 27 billion facts, surpassing the size of the Google Knowledge Graph at that time. For users who wish to work on smaller subsets, such as particular regions or certain kinds of spatial elements, there are filtering strategies to obtain snapshots for particular use cases.
LinkedGeoData uses the Resource Description Framework (RDF) data model and includes links to other knowledge graphs. It has served as a resource for spatial entity linking, entity alignment, and topological relation discovery. The query logs of LinkedGeoData have also been used for various analytical tasks. Using the latest rewriter, researchers can build data snapshots for their own use cases and query them efficiently using SPARQL and its extension, OGC GeoSPARQL.
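As a rough illustration of what a region-restricted GeoSPARQL query might look like, the sketch below builds a query string that selects features whose geometry lies within a bounding box. The prefixes, `geo:hasGeometry`, `geo:asWKT`, `geo:wktLiteral`, and `geof:sfWithin` come from the OGC GeoSPARQL vocabulary; whether a given LinkedGeoData snapshot exposes its geometries through exactly these properties is an assumption, and the helper names `bbox_to_wkt` and `build_region_query` are hypothetical. The code only constructs the query text and does not contact any endpoint.

```python
PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
)


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon (longitude latitude order) for a bounding box."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def build_region_query(min_lon, min_lat, max_lon, max_lat, limit=100):
    """Build a GeoSPARQL query for features inside the given bounding box."""
    wkt = bbox_to_wkt(min_lon, min_lat, max_lon, max_lat)
    return (
        PREFIXES
        + "SELECT ?feature ?wkt WHERE {\n"
        + "  ?feature geo:hasGeometry ?geom .\n"
        + "  ?geom geo:asWKT ?wkt .\n"
        + f'  FILTER(geof:sfWithin(?wkt, "{wkt}"^^geo:wktLiteral))\n'
        + "}\n"
        + f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Arbitrary example coordinates, chosen only for illustration.
    print(build_region_query(12, 51, 13, 52, limit=5))
```

The resulting string could be sent to any SPARQL endpoint that supports GeoSPARQL functions, with the property names adjusted to match how the snapshot actually models its geometries.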
Lehmann joined Amazon in June 2022 as an Alexa AI principal scientist. Based at Amazon’s office in Dresden, Germany, Lehmann works on incorporating knowledge graphs into machine learning and on approaches to building generalized intelligence that make Alexa more competent and natural for customers.
Apart from LinkedGeoData, Lehmann has cofounded and contributed to further knowledge graph projects, such as DBpedia. He has won 15 other best-paper awards for various contributions in artificial intelligence.
“I am interested in building intelligent systems combining knowledge graphs and machine learning,” Lehmann said. “Doing this at scale and ensuring that it is done in a way that is beneficial for Amazon customers is a big motivation for my work at Alexa AI.”