Research data is essential to scientific progress, yet many valuable datasets are hidden on websites and in small repositories, or are hard to find because of insufficient metadata. Only a fraction of researchers proactively share dataset metadata through public portals, and curating such metadata collections is costly. Unknown Data will provide means to automatically discover, extract, and publish metadata about research data that is hidden on the Web or in scholarly publications. The project’s goal is thus to improve the findability and reusability of research data by (a) improving metadata quality, in particular with respect to the authority and use of existing datasets, and (b) uncovering datasets that are not yet reflected in public data repositories and registries.

Our approach

  1. utilises data citations from scholarly articles and web pages to collect metadata about relevant datasets,
  2. discovers datasets and their context by crawling web pages,
  3. consolidates metadata by linking information from domain-specific databases,
  4. facilitates high metadata quality by establishing a discipline-specific curation process, and
  5. ensures long-term availability of original data sources by archiving relevant web pages.


  1. Funding: German Research Foundation (DFG) funding programme e-Research Technologies (project number 460676019).
  2. Duration: October 2021 – September 2024.

Further websites:

  1. at GESIS
  2. at LZI
