Participate without installation: Jupyter Hub
If you are participating in an organised workshop, the organisers may have provided you with access to a Jupyter Hub. In this case you will be working on a remote server, with all required software, through a web browser interface. This interface, called Jupyter Lab, gives you access to a command line, a basic file browser and a basic text editor. For workshop organisers/instructors, more information on setting up a cloud server with Jupyter Hub can be found here.
Participate without installation: use Binder
If you don’t have access to a premade environment (such as the Jupyter Hub above) and can’t or don’t want to install anything on your own machine, you can follow all exercises through Binder. The link opens a Jupyter Lab interface in your browser (see above). The Binder environment includes the most important software needed during the workshop. However, it has two limitations:
- it is not persistent (all content will be removed after you close it)
- it does not allow outgoing ssh connections (meaning that during the lesson about collaboration you won’t be able to publish all example data).
Participate with own computer: install software
If you want to follow the examples on your own machine, you will need to install DataLad and some additional software which we will use during the walkthrough. Note that Linux or MacOS is strongly recommended for this workshop; although DataLad works on all main operating systems, on Windows there are some caveats which may complicate the presented workflow.
Datalad
For the installation of DataLad, follow the instructions from the DataLad handbook.
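For reference, one common route on Linux and MacOS is installing DataLad with pip (DataLad also needs git-annex, typically installed with your package manager or conda); the commands below are only an illustrative sketch, and the handbook remains the authoritative source for your system:
pip install datalad
datalad --version   # confirm that the installation succeeded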
Tig
Tig (text mode interface for Git) is a small command line program which we will use to view dataset history. On Linux you can install it with your package manager (e.g. apt install tig on Debian and Ubuntu), and on MacOS it’s best to install it through homebrew (brew install tig). Detailed instructions for different systems are given here.
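As a quick check after installing Tig, you can print its version and open it inside any Git repository (the repository path below is only a placeholder):
tig --version
cd /path/to/a/git/repository
tig   # browse the history; press q to quit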
Python and modules
During the workshop, we will use photos and comma-separated (CSV) files to represent data, and custom Python scripts will serve as a model of data processing. In addition to Python, you will need the following libraries:
- pillow (processing images - examples in Modules 1 and 3)
- pandas and seaborn (tabular data, plots - examples in Module 4)
The best way is to create a virtual environment and install the packages there. One way to do it is with virtualenv and pip:
virtualenv --system-site-packages --python=python3 ~/.venvs/rdm-workshop
source ~/.venvs/rdm-workshop/bin/activate
pip install pillow
pip install pandas seaborn
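As a quick sanity check (not part of the workshop itself), you can confirm that the packages import correctly from the activated environment; note that pillow is imported under the name PIL:
python -c "import PIL, pandas, seaborn; print('all packages found')"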
Pandoc
Pandoc is a tool for converting files from one markup format into another. We will use it in one of the examples in Module 4. Like with Tig, you can install it with your package manager on Linux (e.g. apt install pandoc) or with homebrew on MacOS (brew install pandoc), and you can read about all installation methods and systems here.
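To verify the Pandoc installation, you can print its version or convert a small file; input.md and output.html below are just example file names, not workshop materials:
pandoc --version
pandoc input.md -o output.html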
Register a GIN account
GIN is a data hosting / management platform of the German Neuroinformatics Node. In the module on remote collaboration we will be using GIN to demonstrate data publishing. If you want to follow the entire walkthrough, you will need to register a GIN account here. From the registration page:
For Registration we require only username, password, and a valid email address, but adding your name and affiliation is recommended. Please use an institutional email address for registration to benefit from the full set of features of GIN.