Materials Growth & Measurement Laboratory

Czech open research infrastructure

Data Policy and FAIR principles

Our mission is to support the broad user community in its scientific experiments and to serve as a supporting facility. There are growing efforts to open up science by making the data acquisition process, including the data reduction and evaluation procedures, more transparent and more widely available. A strong foundation for this is the set of FAIR principles, which build on the Findability, Accessibility, Interoperability, and Reuse of digital assets.

The purpose of data collection at MGML is to store all relevant information connected with the experiments performed. This includes not only the measured data but also detailed documentation of the entire course of the measurement: the electronic logbook, a detailed history of all available sensors during the measurement, photo and video documentation, user remarks, and user scripts. In addition, we provide infrastructure for data reduction and evaluation. By collecting all of this data, MGML helps scientists perform reliable and reproducible experiments that follow the FAIR principles.

This approach will ensure that all data produced at MGML follow the FAIR principles:

    Each dataset will be accurately described with metadata, and all instrument-related data will be sorted into the respective folders. Our technical solution will ensure that all MGML datasets are indexed by data search engines such as Google Dataset Search. Automatically assigned persistent identifiers will make the data citable, and our data policy will require all users to cite their datasets.
    After an embargo period of five years from the completion of the proposal, all related data will be fully opened and published under the CC0 license (public domain). The CC0 license facilitates the discovery, reuse, and citation of the data in perpetuity.
    MGML operates several dozen instruments that produce data in various formats. All instruments developed in-house use open-source software that generates standardized and well-described data formats. All commercial instruments generate standardized data formats that are interchangeable between researchers and institutions.
    The strict CC0 license will allow unlimited reuse of all datasets produced at MGML. The five-year embargo period will give researchers enough time to benefit from exclusive access to their data and to publish all their results properly. The PI of each proposal can shorten or prolong the embargo period, or publish the data immediately.
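
As an illustration of the metadata, license, and embargo rules above, a dataset description could be assembled roughly as in the following Python sketch. It is not MGML's actual implementation. It assumes the schema.org Dataset vocabulary (the vocabulary read by dataset search engines), and the function names, the use of datePublished for the date of open release, and the example identifier are hypothetical choices made only for this example.

```python
import json
from datetime import date

CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"
EMBARGO_YEARS = 5  # default embargo length stated in the policy


def embargo_end(completed: date, years: int = EMBARGO_YEARS) -> date:
    """Return the date on which the embargo expires (hypothetical helper)."""
    try:
        return completed.replace(year=completed.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        return completed.replace(year=completed.year + years, day=28)


def dataset_record(identifier, name, description, completed, years=EMBARGO_YEARS):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Assumption: datePublished is used here for the end of the embargo,
    i.e. the date on which the data become openly available.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "identifier": identifier,  # persistent identifier assigned by the system
        "name": name,
        "description": description,
        "license": CC0_URL,
        "datePublished": embargo_end(completed, years).isoformat(),
    }


if __name__ == "__main__":
    record = dataset_record(
        identifier="example-persistent-id-0001",  # placeholder, not a real identifier
        name="Example measurement",
        description="Illustrative dataset description.",
        completed=date(2021, 3, 15),
    )
    print(json.dumps(record, indent=2))
```

Passing years=0 corresponds to a PI who chooses to publish the data immediately, and a larger value corresponds to a prolonged embargo.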

In addition to the FAIR principles, MGML will guarantee one supplementary aspect:

    The technical design of MGML instruments will ensure that collected raw data have not been modified by the user or anyone else. Our system will generate control checksums for every dataset, and these will be openly published immediately after the experiment. It will therefore be possible to check the consistency of the data at any time. Our technical solution also guarantees that no data files have been deleted from the published datasets. The idea behind trustworthy data is explained in the diagram below.

Data trustworthiness diagram

Diagram showing the advantages of trustworthy data in terms of open science principles. During the peer review process, the manuscript is reviewed, and the evaluation scripts are checked for correctness. However, there is no way to check whether the provided raw data are correct. This task needs to be ensured by the large research infrastructure.
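
The checksum idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MGML's actual system: SHA-256 is chosen here as the checksum algorithm, and the function names and the manifest layout (relative file path mapped to hex digest) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with a previously published manifest."""
    current = build_manifest(root)
    return {
        # Files whose content no longer matches the published checksum.
        "modified": sorted(n for n in manifest if n in current and current[n] != manifest[n]),
        # Files listed in the manifest that are no longer present.
        "missing": sorted(n for n in manifest if n not in current),
        # Files that were not part of the published manifest.
        "added": sorted(n for n in current if n not in manifest),
    }
```

If the manifest is published immediately after the experiment, anyone holding a copy of the dataset can later run verify against it: an empty report means that no file was changed, removed, or added since the checksums were generated.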