research-article

LinkingPark: An automatic semantic table interpretation system

Published: 01 October 2022

Abstract

In this paper, we present LinkingPark, an automatic semantic annotation system for matching tabular data to knowledge graphs. LinkingPark is designed as a modular framework that handles Cell-Entity Annotation (CEA), Column-Type Annotation (CTA), and Columns-Property Annotation (CPA) jointly. It builds on our previous SemTab 2020 system, which won second prize among 28 teams after four rounds of evaluation. Moreover, the system is unsupervised, stand-alone, and flexible enough to support multiple languages. Its backend offers an efficient RESTful API for programmatic access, and an Excel Add-in is provided for ease of use. Users can interact with LinkingPark in near real-time, further demonstrating its efficiency.
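To make the three annotation tasks concrete, the following is a minimal sketch of what CEA, CTA, and CPA outputs could look like for a toy two-column table annotated against Wikidata. It is not LinkingPark's API or algorithm: the class names, the `annotate_cells` helper, and the hand-written lookup table are all hypothetical, and the CTA and CPA results are written by hand rather than inferred.

```python
from dataclasses import dataclass

# Toy table: one header row plus two data rows (illustrative only).
TABLE = [
    ["Country", "Capital"],
    ["France", "Paris"],
    ["Germany", "Berlin"],
]

# Hand-written label -> Wikidata ID lookup, standing in for a real entity linker.
ENTITY_LOOKUP = {
    "France": "Q142",
    "Germany": "Q183",
    "Paris": "Q90",
    "Berlin": "Q64",
}


@dataclass(frozen=True)
class CellEntity:
    """CEA: a single cell linked to a knowledge-graph entity."""
    row: int
    col: int
    entity: str


@dataclass(frozen=True)
class ColumnType:
    """CTA: a column assigned a knowledge-graph type."""
    col: int
    type_: str


@dataclass(frozen=True)
class ColumnsProperty:
    """CPA: a pair of columns linked by a knowledge-graph property."""
    subject_col: int
    object_col: int
    prop: str


def annotate_cells(table):
    """Return a CEA annotation for every data cell found in the lookup."""
    return [
        CellEntity(r, c, ENTITY_LOOKUP[text])
        for r, row in enumerate(table[1:], start=1)  # skip the header row
        for c, text in enumerate(row)
        if text in ENTITY_LOOKUP
    ]


cea = annotate_cells(TABLE)
# Hand-specified for illustration: Q6256 = country, Q515 = city, P36 = capital.
cta = [ColumnType(0, "Q6256"), ColumnType(1, "Q515")]
cpa = [ColumnsProperty(0, 1, "P36")]

for annotation in cea + cta + cpa:
    print(annotation)
```

A real system would replace the dictionary lookup with candidate retrieval and disambiguation, and would derive the column types and the column-pair property from the linked entities instead of listing them manually.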

Published In

Web Semantics: Science, Services and Agents on the World Wide Web, Volume 74, Issue C
Oct 2022
153 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. Semantic table interpretation
2. Entity linking
3. Tabular data
4. Knowledge graph
