Recently, the amount of data available on the Web has grown rapidly. This data is produced by different communities working in a wide range of domains and using a variety of techniques, which results in a large volume of data in different formats and languages. The heterogeneous and multilingual nature of this data is an obstacle to reuse: incompatible data formats and the language gap keep data sources from reaching the communities that could use them. For instance, most open-domain question answering systems are designed to work effectively when data is represented in RDF. They cannot operate on data stored in the very common CSV format or presented in unstructured form. In addition, the data they draw on is usually in English, which leaves them unable to answer questions in other languages, e.g. Spanish. Conversely, NLP applications for Spanish cannot make use of a knowledge graph in English. Different communities have different requirements in terms of data representation and modeling. Making data interoperable is therefore crucial to making it accessible to a variety of applications.
With a larger pool of data accessible in the right formats and languages, applications could draw on more background knowledge. For knowledge graph completion, integration of heterogeneous data is fundamental.
However, the community currently lacks both tools for transforming data and data sets that are available in a variety of formats and languages. Many communities could benefit from such tools and data sets, which would make data interoperable and allow it to be included in their applications. Interoperability is one of the core principles of the FAIR framework, which sets standards for improving the reuse of data. Such data sets and tools can also facilitate the future reuse of the data by different applications.
Papers due: July 9, 2018
Notification of accepted papers: July 30, 2019