MTab4DBpedia: Semantic Annotation for Tabular Data with DBpedia
- Table Annotation: https://dbpedia.mtab.app/mtab
- Entity Search: https://dbpedia.mtab.app/search
Search for relevant entities in DBpedia (2016-10)
- q: search query. This parameter is required.
- limit: maximum number of relevant entities to return. The value should be from 1 to 1000. The default value is 20.
- m: search mode, one of three values [b, f, a]. The default value is a.
- b: keywords search with BM25 (hyper-parameters: b=0.75, k1=1.2).
- f: fuzzy search with an edit-distance (Damerau–Levenshtein distance).
- a: the weighted aggregation of keyword search and fuzzy search.
Example: search for the query "Tokyo" and get the 20 most relevant entities.
Command:
curl --request POST --header "Content-Type: application/json" --data '{"q":"Tokyo", "limit":20}' https://dbpedia.mtab.app/api/v1/search
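The same request can be sent from Python. Below is a minimal sketch using only the standard library; the endpoint and parameters come from the curl command above, and the helper names are my own:

```python
import json
import urllib.request

SEARCH_ENDPOINT = "https://dbpedia.mtab.app/api/v1/search"

def build_search_payload(query, limit=20, mode="a"):
    """Serialize the search parameters (q, limit, m) to a JSON request body."""
    return json.dumps({"q": query, "limit": limit, "m": mode}).encode("utf-8")

def search_entities(query, limit=20, mode="a"):
    """POST a query to the entity search API and return the parsed JSON response."""
    req = urllib.request.Request(
        SEARCH_ENDPOINT,
        data=build_search_payload(query, limit, mode),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Requires network access:
# results = search_entities("Tokyo", limit=20)
```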
Get entity information from DBpedia (2016-10). The response object includes the DBpedia title, mappings to Wikidata and Wikipedia, the label, aliases, types, PageRank score, entity statements, and literal statements.
- q: entity name. This parameter is required.
Example: get information about the entity Hideaki Takeda.
Command:
curl --request POST --header "Content-Type: application/json" --data '{"q":"Hideaki Takeda"}' https://dbpedia.mtab.app/api/v1/info
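A Python equivalent of the call above, using only the standard library (the helper names are my own):

```python
import json
import urllib.request

INFO_ENDPOINT = "https://dbpedia.mtab.app/api/v1/info"

def build_info_payload(name):
    """Serialize the entity name to the JSON body expected by the info API."""
    return json.dumps({"q": name}).encode("utf-8")

def get_entity_info(name):
    """POST an entity name to the info API and return the parsed JSON response."""
    req = urllib.request.Request(
        INFO_ENDPOINT,
        data=build_info_payload(name),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Requires network access:
# info = get_entity_info("Hideaki Takeda")
```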
Table annotation with MTab4DBpedia.
- table: the table content. This parameter is required.
- predict_target: True or False. If True, MTab predicts the matching targets.
- tar_cea: cell-entity targets
- tar_cta: column-type targets
- tar_cpa: relation-property targets
- round_id: a value from 1 to 5. Values 1-4 are the four rounds of SemTab 2019; 5 is the Tough Tables dataset.
- search_mode: "b" uses BM25 entity search, "f" uses fuzzy entity search, and "a" uses the aggregation of "b" and "f".
Please refer to m_main.py for usage examples.
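As a sketch, a request body with these parameters could be assembled as below. The exact endpoint and table format are not listed here, so both are assumptions — check m_main.py for the real ones:

```python
import json

def build_annotation_request(table, predict_target=True, round_id=1, search_mode="a",
                             tar_cea=None, tar_cta=None, tar_cpa=None):
    """Assemble a JSON body from the annotation parameters listed above.

    The field names follow the parameter list in this README; verify them
    against m_main.py before use."""
    body = {"table": table, "predict_target": predict_target,
            "round_id": round_id, "search_mode": search_mode}
    # Assumption: explicit matching targets are only sent when MTab
    # is not asked to predict them itself.
    if not predict_target:
        body["tar_cea"] = tar_cea
        body["tar_cta"] = tar_cta
        body["tar_cpa"] = tar_cpa
    return json.dumps(body)
```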
Submit annotation files (CEA, CTA, CPA), then get the evaluation results.
- round_id: a value from 1 to 5. Values 1-4 are the four rounds of SemTab 2019; 5 is the Tough Tables dataset.
- res_cea: cell-entity results
- res_cta: column-type results
- res_cpa: relation-property results
Please refer to m_main.py for usage examples.
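A sketch of assembling the submission body; the field names are taken from the list above, while the endpoint and the exact result formats live in m_main.py:

```python
import json

def build_submission(round_id, res_cea=None, res_cta=None, res_cpa=None):
    """Assemble a JSON submission body; only the provided result sets are included.

    Field names follow the parameter list in this README; verify against m_main.py."""
    body = {"round_id": round_id}
    for key, value in (("res_cea", res_cea), ("res_cta", res_cta), ("res_cpa", res_cpa)):
        if value is not None:
            body[key] = value
    return json.dumps(body)
```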
Annotate numerical columns of tables with Knowledge Graph properties
- values: numerical values
- limit: maximum number of relevant results to return.
- get_prop_class: also return semantic labels in the format property||class.
Command:
curl --request POST --header "Content-Type: application/json" --data '{"values":[1.50, 1.51, 1.52, 1.53, 1.54], "limit": 5}' https://dbpedia.mtab.app/api/v1/num
Please refer to m_main.py for other examples.
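The curl call above can also be reproduced in Python with the standard library (the helper names are my own):

```python
import json
import urllib.request

NUM_ENDPOINT = "https://dbpedia.mtab.app/api/v1/num"

def build_num_payload(values, limit=20, get_prop_class=False):
    """Serialize the numerical-column parameters to a JSON request body."""
    body = {"values": list(values), "limit": limit}
    if get_prop_class:
        body["get_prop_class"] = True
    return json.dumps(body).encode("utf-8")

def annotate_numbers(values, limit=20, get_prop_class=False):
    """POST numerical values to the num API and return the parsed JSON response."""
    req = urllib.request.Request(
        NUM_ENDPOINT,
        data=build_num_payload(values, limit, get_prop_class),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Requires network access:
# labels = annotate_numbers([1.50, 1.51, 1.52, 1.53, 1.54], limit=5)
```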
- Clone MTab4DBpedia and open the project
git clone https://github.com/phucty/mtab4dbpedia.git
cd mtab4dbpedia
- Create a conda environment, activate it, and install the requirements
conda create -n mtab4dbpedia python=3.6
conda activate mtab4dbpedia
pip install -r requirements.txt
- Other setup:
- Change DIR_ROOT in m_setting.py to your project directory. The current value (the directory on my laptop) is:
DIR_ROOT = "/Users/phuc/git/mtab4dbpedia"
- Decompress data files
data/semtab_2019_dbpedia_2016-10.tar.bz2
data/semtab_org.tar.bz2
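The archives can be unpacked with any bzip2-capable tool; a small Python sketch (the helper name is my own):

```python
import tarfile
from pathlib import Path

def extract_archives(archives, dest="data"):
    """Unpack .tar.bz2 archives into dest, creating the directory if needed."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    for name in archives:
        with tarfile.open(name, "r:bz2") as tar:
            tar.extractall(path=dest)

# From the project root:
# extract_archives(["data/semtab_2019_dbpedia_2016-10.tar.bz2",
#                   "data/semtab_org.tar.bz2"])
```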
- Run the experiments on
- 5 datasets: the four rounds of SemTab 2019, plus Tough Tables (round 5)
- 2 dataset versions: the original SemTab 2019, and SemTab 2019 adapted to DBpedia 2016-10
python exp_semtab.py
- Original version of SemTab 2019 and Tough Tables
- SemTab 2019 and Tough Tables adapted to DBpedia 2016-10

Note:
- Why do we need the adapted version?
To make a fair evaluation, it is important to target the same DBpedia version, because DBpedia changes over time. In addition, up-to-date resources could yield higher performance, since newer data is more complete than older versions; comparing against previous studies that used an older version of DBpedia would therefore be unfair.
- How to adapt the dataset to DBpedia 2016-10?
- Open resources for reproducibility:
- Classes, properties, and their equivalents
- Entity dump (all information about entities), or use the Get Entity Information API
- Entity search API based on DBpedia entity labels and aliases (multilingual)
- Adapt Ground Truth:
- Make the targets and ground truth consistent.
- Remove invalid entities, types, and properties.
- Add redirects and equivalent entities, types, and properties.
- Remove the prefix to avoid redirect issues (the page is shown instead of the resource).
- Phuc Nguyen, Hideaki Takeda, MTab: Tabular Data Annotation, NII Open House, June 2021. [video]
- Phuc Nguyen, Ikuya Yamada, Hideaki Takeda, MTabES: Entity Search with Keyword Search, Fuzzy Search, and Entity Popularities, In The 35th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), 2021. [video]
- Phuc Nguyen, Ikuya Yamada, Natthawut Kertkeidkachorn, Ryutaro Ichise, Hideaki Takeda, MTab4Wikidata at SemTab 2020: Tabular Data Annotation with Wikidata, In SemTab@ISWC, 2020. [video]
- Phuc Nguyen, Natthawut Kertkeidkachorn, Ryutaro Ichise, Hideaki Takeda, MTab: Matching Tabular Data to Knowledge Graph using Probability Models, In SemTab@ISWC, 2019. [slides]
- 1st prize at SemTab 2020 (tabular data to Wikidata matching). Results
- 1st prize at SemTab 2019 (tabular data to DBpedia matching). Results
If you find the MTab4DBpedia tool useful in your work and want to cite it, please use the following reference:
@article{Nguyen2022MTab4DSA,
title={MTab4D: Semantic annotation of tabular data with DBpedia},
author={Phuc Tri Nguyen and Natthawut Kertkeidkachorn and Ryutaro Ichise and Hideaki Takeda},
journal={Semantic Web},
year={2022}
}
Phuc Nguyen ([email protected])