What is Metadata?

Metadata refers to data that provides information about other data. In publishing, metadata refers to various details about an article or a book, such as the title, author(s), DOI, publication year, abstract, keywords, and sources. It serves as a key component, complementing the content itself. Properly curated metadata is essential for facilitating data discovery, comprehension, and organization of collected journal articles. Additionally, it simplifies communication with authors or affiliated institutions, thereby enhancing the research process for scholars.

What is MetaFetch?

In many cases, researchers collect files containing published articles but may lack essential metadata fields such as all the authors, publication years, abstracts, or keywords. To fill in these gaps typically, one might resort to manually searching for each article on various publishing platforms to gather the missing information. However, this process becomes tedious and error-prone, especially with large datasets.

An alternative method involves leveraging public APIs offered by many publishing companies to programmatically fetch the missing metadata. This approach automates and streamlines the process of retrieving metadata for multiple articles. Inspired by this streamlined approach, Metafetch specifically automates the retrieval of metadata for journal articles from multiple sources.

Currently the tool uses the APIs to retrieve the metadata from Elservier’s Scopus, PubMed and CrossRef.

How does MetaFetch work?

Given a file of journal articles which may contain some gaps in metadata, the tool uses APIs from journal sources to retrieve the missing data. The MetaFetch interface is easy to navigate and has instructions the user can follow. Here is an overview of the steps you can take to use the tool.

  1. Prepare the input file by using the same field names as specified in the provided sample files. The input file is required to have the article title or doi to be used as the search subject, one of these must be present in the file.








  2. Upload the file thus prepared using the interface. You can select a specific API representing the source. If you are not sure of which one to use, select the ‘ALL’ option.
  3. After the file selection, a summary of the relevant gaps in the input will be presented.
  4. Click the ‘Fetch Metadata’ to get the data.
  5. Download returned data files by clicking the buttons;
    1. ‘Merged file’ contains input file data merged with the returned data.
    2. ‘Raw file’ contains only the returned data by the API’s.

Key Features

  1. User-Friendly Interface: The UI is easy to navigate through and gets you started instantly.
  2. Efficiency: Can fetch and process a batch of 100 articles in under 5 minutes.
  3. Accuracy: Designed with accuracy in mind. Only 100% matches are returned.
  4. Comprehensiveness: The tool uses three of the richest publishing databases namely Elservier’s Scopus, Pubmed and CrossRef to ensure wide coverage.

Where do we go from here?

While acknowledging the capabilities of the tool, it's important to consider more questions it could address. For instance, can we improve its comprehensiveness by integrating additional APIs from various sources? Furthermore, is it possible to extend its functionality beyond mere metadata retrieval to include information extraction capabilities?

These inquiries serve as the foundation for advancing the tool. By incorporating more APIs, we can enrich the tool's capabilities with a wider range of data sources. Additionally, by including information extraction functionalities, we can empower users to get more insights from the content itself.