The hub environment needs to be flexible: scientists should be able to run custom analyses. This requires command-line access, so that they can run their own scripts.
Security measures are sometimes at odds with ease of use. With the hub, we aim to avoid this conflict as much as possible. This includes easy syncing of files.
The analysis is secure. The hub is ISO 27001 compliant, uses 2-factor authentication, and does not allow direct data transfer to other machines on the internet. Fine-grained permissions govern access to the different datasets.
Our goal is to provide harmonized pre-processed data. With this, quality control and harmonization efforts can be shared across the various projects.
The hub consists of a user interface machine and a variable number of worker nodes. A firewall prevents access to and from the internet. The hub has access to an online storage system (CephFS) and a tape storage system (dCache). SSH access is only allowed through a so-called doornode.
Access to the hub is provided through one of the bastion services. The doornode prevents file transfer but allows SSH access (with 2-factor authentication) through a two-step log-in process, acting as a stepping stone to the AGH user interface machine.
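The two-step log-in described above can be pictured as an SSH command that hops through the doornode. The sketch below is illustrative only: the host names `doornode.example.org` and `agh-ui.example.org` and the user name are hypothetical placeholders, and it assumes the doornode permits OpenSSH's `-J` (ProxyJump) option, which is not confirmed here. If it does not, the same two steps are done by hand: log in to the doornode first, then run `ssh` from there to the user interface machine.

```python
import shlex

# Hypothetical placeholders; the real host names are not given in this document.
DOORNODE = "doornode.example.org"
UI_MACHINE = "agh-ui.example.org"


def jump_command(user: str) -> list[str]:
    """Build an ssh command that reaches the UI machine via the doornode.

    Assumes the doornode allows OpenSSH's -J (ProxyJump) option.
    """
    return ["ssh", "-J", f"{user}@{DOORNODE}", f"{user}@{UI_MACHINE}"]


if __name__ == "__main__":
    # Print the command instead of running it, since the hosts are placeholders.
    print(shlex.join(jump_command("alice")))
```

The command is only printed, not executed. Each hop would still ask for its own authentication.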
The system uses SRAM, a self-service authentication system that links to your personal institute account. Authentication to the AGH requires 2-factor authentication: a username and password plus an authentication token.
Users need to be able to download summary tables and plots, and to upload software and (annotation) datasets. This is handled by a data transfer node. This node logs all upload and download actions and keeps a copy of all data that is downloaded. It also blocks the large-scale data transfers that would be needed to download raw data or the full processed dataset.
Data is pre-processed on Snellius, the Dutch National Supercomputer, by the Hub team. Access to Snellius is safeguarded through 2-factor authentication.
This is the standard online storage system as used by Spider (CephFS). This file system is mounted on the user interface machine and the worker nodes. This data storage is not accessible to users outside of the AGH clone environment.
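The behaviour of the data transfer node can be summarised as three rules: log every action, keep a copy of every download, and refuse transfers above some size. The following is a minimal sketch of those rules, not the actual implementation. The class name `TransferNode`, its methods, and the 100 MiB limit are all hypothetical; the real threshold is not stated here.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("transfer_node")

# Hypothetical limit; the actual threshold is not stated in this document.
MAX_DOWNLOAD_BYTES = 100 * 1024 * 1024


@dataclass
class TransferNode:
    """Toy model of the transfer rules: audit everything, cap download size."""

    max_download_bytes: int = MAX_DOWNLOAD_BYTES
    audit_log: list = field(default_factory=list)   # one entry per action
    retained: dict = field(default_factory=dict)    # copies of downloaded data

    def _record(self, action: str, user: str, name: str, size: int, allowed: bool) -> None:
        self.audit_log.append((action, user, name, size, allowed))
        logger.info("%s of %s by %s (%d bytes, allowed=%s)", action, name, user, size, allowed)

    def upload(self, user: str, name: str, payload: bytes) -> None:
        # Uploads (software, annotation datasets) are logged but not size-capped here.
        self._record("upload", user, name, len(payload), True)

    def download(self, user: str, name: str, payload: bytes) -> bool:
        # Refuse anything larger than the cap; log the attempt either way.
        allowed = len(payload) <= self.max_download_bytes
        self._record("download", user, name, len(payload), allowed)
        if allowed:
            # Keep a copy of everything that leaves the hub.
            self.retained[(user, name)] = payload
        return allowed
```

A refused download still leaves an audit entry, so attempts to move large files remain visible even when they fail.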
This is the tape storage system. Data providers can directly upload their data to this system through an upload-only token. After pre-processing, data at rest on tape is stored in an encrypted format.
The storage system uses a fine-grained ACL permission system. Users gain access to the datasets and projects for which the data providers have granted them permission.
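The permission model can be read as a default-deny lookup: a user can reach a dataset only if a data provider has granted access to it. The sketch below illustrates that logic under stated assumptions; the `AccessControl` class and the dataset and user names are hypothetical, and this is not how the storage system itself implements its ACLs.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    """Toy default-deny access list: dataset name -> users granted access."""

    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, dataset: str, user: str) -> None:
        # Called on behalf of the data provider that owns the dataset.
        self.grants.setdefault(dataset, set()).add(user)

    def revoke(self, dataset: str, user: str) -> None:
        # Removing a user who was never granted access is a no-op.
        self.grants.get(dataset, set()).discard(user)

    def can_read(self, dataset: str, user: str) -> bool:
        # Unknown datasets and unlisted users are denied by default.
        return user in self.grants.get(dataset, set())


if __name__ == "__main__":
    acl = AccessControl()
    acl.grant("cohort_a", "alice")  # hypothetical dataset and user names
    print(acl.can_read("cohort_a", "alice"))
    print(acl.can_read("cohort_b", "alice"))
```

Keeping the grants per dataset mirrors the idea that each data provider decides who may see its own data.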
