LURD
Datasheet
Motivation
For what purpose was the dataset created?
Who created the dataset and on behalf of which entity?
Who funded the creation of the dataset?
Composition
What do the instances that comprise the dataset represent?
How many instances are there in total?
Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?
What data does each instance consist of?
Is there a label or target associated with each instance?
Is any information missing from individual instances?
Are relationships between individual instances made explicit?
Are there recommended data splits?
Are there any errors, sources of noise, or redundancies in the dataset?
Is the dataset self-contained, or does it link to or otherwise rely on external resources?
Does the dataset contain data that might be considered confidential?
Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety?
Collection
How was the data associated with each instance acquired?
What mechanisms or procedures were used to collect the data?
If the dataset is a sample from a larger set, what was the sampling strategy?
Who was involved in the data collection process and how were they compensated?
Over what timeframe was the data collected?
Were any ethical review processes conducted?
Did you collect the data from the individuals in question directly, or obtain it via third parties or other sources (e.g., websites)?
Were the individuals in question notified about the data collection?
Did the individuals in question consent to the collection and use of their data?
If consent was obtained, were the consenting individuals provided with a mechanism to revoke their consent in the future or for certain uses?
Has an analysis of the potential impact of the dataset and its use on data subjects (e.g., a data protection impact analysis) been conducted?
Preprocessing
Was any preprocessing/cleaning/labeling of the data done?
Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data?
Is the software that was used to preprocess/clean/label the data available?
Uses
Has the dataset been used for any tasks already?
Is there a repository that links to any or all papers or systems that use the dataset?
What (other) tasks could the dataset be used for?
Is there anything about the composition of the dataset or the way it was collected and preprocessed/cleaned/labeled that might impact future uses?
Are there tasks for which the dataset should not be used?
Distribution
Will the dataset be distributed to third parties outside of the entity on behalf of which the dataset was created?
How will the dataset be distributed?
Will the dataset be distributed under a copyright or other intellectual property (IP) license, and/or under applicable terms of use (ToU)?
Have any third parties imposed IP-based or other restrictions on the data associated with the instances?
Do any export controls or other regulatory restrictions apply to the dataset or to individual instances?
Copyright © 2024