What questions do you still have about the model and the associated data?

  1. Licensing & usage rights: What license covers each dataset used? Do any license restrictions affect how the model can be trained or how it can be used in projects?
  2. Consent & privacy: How was each dataset collected, and where did the data come from? Was consent obtained from the data owners or from the authors who published the content?
  3. Bias & training process: How do the researchers try to reduce bias during training? What specific methods do they use?

Are there elements you would propose including in the biography?

  1. Datasheet summary for each dataset: to clarify selection biases and ethical constraints.
  2. Detailed dataset information: how to access each dataset used.
  3. Attributions & contributions: to acknowledge those who provided the data, helped train the model, cleaned the data, etc.

How does understanding the provenance of the model and its data inform your creative process?