Research Data Management with DataLad

The aim of this workshop is to present a set of good practices in research data management, to explain version control, and to introduce the core features of DataLad, a data management multitool that can assist in handling the entire life cycle of digital objects. We will tackle issues ranging from “what is a good filename” to “how to publish a version-controlled dataset”.

The exercises will provide an opportunity to gain hands-on experience using DataLad to create a basic dataset, track its changes over time, and publish its contents. You will also learn how to use data created by others and how to collaborate on datasets. The exercises are built around toy examples, but all operations apply equally to real-world data management.

Prerequisites

The workshop is based around code-along exercises; however, no prior experience with programming, the command line, or version control is required.

The GIN (G-Node Infrastructure) platform will be used for dataset publication during the Remote collaboration module. To fully complete the exercises, you will need a GIN account. Signing up requires a username, a password, and a valid e-mail address (an institutional e-mail address is recommended to benefit from the full set of features).

Schedule

Setup   Download files required for the lesson
00:00   1. Content tracking with DataLad
          What does version control mean for datasets?
          How to create a DataLad dataset?
01:30   2. Structuring data
          What is a good filename?
          How to keep data neatly structured?
03:00   3. Remote collaboration
          How to create a DataLad dataset collaboratively?
          How to publish a DataLad dataset?
          How to consume a DataLad dataset?
04:30   4. Dataset management
          How to manage data on a dataset level?
          How to link two datasets?
          When can multi-level datasets be useful?
06:00   5. Extras: The Basics of Branching
          What are branches, and why do you need them?
07:00   6. Extras: Removing datasets and files
          How can I remove files or datasets?
07:15   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.