National Data Node
We offer help with many aspects of data management and data stewardship:
- promotion of FAIR data management in the life sciences,
- preparation of data management plans (DMPs),
- monitoring of DMP execution,
- data management, including data analysis, for any kind of research data,
- FAIRification of data,
- RDM and DMP training.
All activities are in line with open science strategy and FAIR principles.
Research data management (RDM) expert group initiative
- for data from various fields of life sciences
- in collaboration with other complementary infrastructures
- HPC consisting of 256-core CPU nodes and NVIDIA A100 80G GPU nodes
European Genome-phenome Archive (EGA)
The EGA is a service for permanent archiving and sharing of personally identifiable genetic and phenotypic data. It is a distributed solution for data sharing and exchange across national borders.
There are Local / Federated EGAs in all participating countries, as well as the Central / Main EGA that connects the local (national) EGAs.
Some Local-EGAs are already in production (e.g. in the Nordic countries). ELIXIR-SI is setting up a national Local-EGA for Slovenia, funded by the ELIXIR-SI RI-SI-2 national infrastructure project. Full operation is expected in 2024, with a first version tentatively available by the end of 2023.
A Local-EGA is a national collection of sensitive human -omics data. The Local-EGA will allow you to deposit sensitive human data locally (and comply with national guidelines for storing that data) but enable data reuse across national boundaries by sharing the metadata of the nationally archived data with the Central / Main EGA.
The shared metadata can be searched through the main EGA portal, so you can use the EGA search engine to find data stored in every participating country. You can also search and retrieve information from the Local-EGA through the Local-EGA API, which lets you build your own services on top of the available data.
European Nucleotide Archive (ENA)
The European Nucleotide Archive (ENA), developed and maintained by the EMBL-EBI, is an archive for experimental workflows that are based around nucleotide sequencing. A typical workflow includes the sample isolation and preparation, production of sequencing data with a sequencing machine, and a subsequent bioinformatic analysis. The ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation).
Access to ENA data is provided through the browser with search tools, through large scale file download and through the API.
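As a rough illustration of the API route, the sketch below only builds a search URL and does not send any request. It assumes the ENA Portal API search endpoint and its `result`, `query`, `fields`, `format` and `limit` parameters; the helper name, the default fields and the example query are illustrative choices and should be checked against the current ENA documentation.

```python
from urllib.parse import urlencode

# Assumed base endpoint of the ENA Portal API search service; verify it
# against the current ENA documentation before relying on it.
ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_ena_search_url(query, result="read_run",
                         fields=("run_accession", "fastq_ftp"),
                         limit=10):
    """Return a search URL for the ENA Portal API (no request is sent).

    The helper name and the default values are illustrative assumptions.
    """
    params = {
        "result": result,            # type of record to search
        "query": query,              # search expression
        "fields": ",".join(fields),  # columns to return
        "format": "tsv",             # tab-separated output
        "limit": limit,              # maximum number of rows
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query: sequencing runs for taxon 9606 (Homo sapiens).
    print(build_ena_search_url("tax_eq(9606)"))
```

The resulting URL can be opened in a browser or passed to any HTTP client; keeping URL construction separate from the download makes the query easy to inspect before any data are transferred.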
The ENA is part of the ELIXIR infrastructure and an ELIXIR Core Data Resource.
Data sources for the ENA include submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centres, and information exchange with the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data is now mandatory for publication of research findings. It is also required by funding bodies. There are many data classes and formats for ENA submissions. Latest developments and changes to services are announced on the ENA news page and through the ENA mailing list.
2. Research equipment
ELIXIR-SI RI-SI-2 research equipment
This is a medium-performance 1U server (2x Xeon 4215R, 96 GB RAM) with an additional GeForce RTX 2080 Ti graphics card. Data is stored on hard disk drives in two JBOD enclosures (gross capacity 840 GB). The server allows the storage of a large amount of data from a cryoelectron microscope and, in some cases, the simultaneous analysis of this data. CryoSPARC and RELION are the main software packages used for data analysis; both are modern and widely used in single particle analysis (SPA). We also use other software, including IMOD, EMAN, crYOLO, Topaz and Phenix. The operating system is Ubuntu Linux. Fast network connections also allow data to be sent to other suitable equipment outside the KI, which enables even faster analysis.
FlowJo software, intended primarily for flow cytometry, will operate within ELIXIR-SI and enable the analysis and visualization of experimental data.
How to Cite FlowJo for Publication
To cite FlowJo™ in your paper, please use the following information (per AMA style guidelines):
In running text
“As found by our FlowJo™ Software, 89 percent of cells contained the trait (1)” or
“The flow cytometry results were analyzed using FlowJo™ v10.8 Software (BD Life Sciences)”
- FlowJo™ Software (when applicable add—for Windows or for Mac) [software application] Version XXX. Ashland, OR: Becton, Dickinson and Company; 2021.
The high-performance 4U server (2x AMD EPYC Rome, 8x A100 NVIDIA GPU, 2 TB RAM, 64 TB gross SSD) enables fast and parallel analysis of images from a cryoelectron microscope. The image data are stored on other equipment, so fast connections and sufficient temporary SSD working space on the server itself are very important.
The medium-capacity 2U server, a Lenovo ThinkSystem SR650 (Xeon SP Gen 2), has two 24-core Intel Xeon Gold 6248R processors (3.0 GHz), 256 GB of RAM (TruDDR4 2933 MHz RDIMM), and an IBM FlashSystem 5030 Hybrid Flash System 64G-C storage system with 100 TB net capacity. The hardware has been installed and incorporated into the KIS IT system. The necessary physical and logical connections were made between the existing system and the ELIXIR system to give users safe access. Because several users with different requirements use the system simultaneously, the server is accessed through virtualization software, which was set up with our own KIS resources.
The server is intended for intensive numerical and graphical data processing on the Linux platform. The applications involve the implementation of various bioinformatics pipelines based on the analysis of large amounts of data. The server allows user-defined installation of bioinformatics pipelines using open-source bioinformatics software packages. Typical applications are in the fields of genetics, genomics (raw sequencing data processing, genome assembly and annotation), precision agriculture and hyperspectral data processing (image processing, machine learning model calculation and data classification).
A Cisco Catalyst 9500 network switch with 24 data ports (1/10/25G) and a 4x 40/100G network module will enable high-speed connectivity to and from the server for large data transfers.
The ProLiant DL385 Gen10 Plus server has an AMD EPYC 7452 processor (64 cores), 640 GB RAM, 52 TB disk space (RAID6) and an NVIDIA Quadro RTX 4000 graphics card.
The currently installed programs mainly support sequence analysis and the analysis of next-generation sequencing transcriptomics data (mainly Illumina and Nanopore). The set of programs and capabilities is updated continually, based on researchers’ needs.
A Cisco Catalyst 9200C network switch with 24 data ports and a 4×10 G network module will provide reliable, secure and high-speed connectivity to the server after a planned network upgrade.
The HPE ProLiant DL385 Gen10 Plus Server has two AMD EPYC 7702 processors with 64 cores each, 1 TB of working memory (DDR4-3200), and 64 TB disk storage that provides access to up to 48 TB of storage in a RAID5 configuration. In 2021, the server was upgraded with two NVIDIA Quadro RTX 6000 GPUs to meet the needs of GPU-accelerated bioinformatics applications. The GPUs were funded by the P4-0077 programme group.
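The gap between raw and usable capacity follows from the parity overhead of the RAID level. The arithmetic can be sketched as below; the drive count and drive size are not given in the text, so the 4 x 16 TB layout is only an assumption that happens to be consistent with the stated 64 TB and 48 TB figures.

```python
def usable_capacity_tb(n_disks, disk_tb, parity_disks):
    """Usable capacity of a parity RAID array.

    RAID5 reserves the equivalent of one disk for parity, RAID6 two.
    """
    if n_disks <= parity_disks:
        raise ValueError("array needs more disks than parity disks")
    return (n_disks - parity_disks) * disk_tb


# Assumed layout (not stated in the text): four drives of 16 TB each.
raw_tb = 4 * 16
raid5_tb = usable_capacity_tb(n_disks=4, disk_tb=16, parity_disks=1)
print(f"raw: {raw_tb} TB, usable in RAID5: {raid5_tb} TB")
```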
The server is available to interested researchers. The Qiagen CLC Genomics Server software is installed on the server, allowing the centralized execution of complex bioinformatics pipelines on the server via two licenses of the Qiagen CLC Genomics Workbench package (purchased with funding from programme groups P4-0077 and P4-0220). In addition, standard open source bioinformatics software packages are installed on the host.
UL MF (LJ) and UL VF (MB)
1. The central network RIKT equipment (Mellanox 3700 switches) provides a fast, redundant 100 Gbps connection between the two locations (Ljubljana and Maribor) of the central RIKT equipment of the ELIXIR-SI node. The connection runs over the Arnes national network backbone, which also connects to supercomputer infrastructure at home and abroad (European ELIXIR network).
2. The central repository and the cluster form the central part of the national ELIXIR-SI data node.
The repository is intended for the storage of research data in the field of life sciences, in accordance with the FAIR principles (Research Data Archive). Other data services compliant with the European ELIXIR network are also being set up in the repository, such as an archive for the Federated EGA (FEGA). The storage consists of eight nodes evenly distributed across both locations (currently approx. 2 PB gross of “slower” disk space and 0.5 PB of fast disk space).
The computational cluster is intended primarily for testing and developing bioinformatics and machine learning algorithms that analyse data in the form of workflows, and for analysing and processing data generated on laboratory equipment of ELIXIR-SI partners and other researchers in the field of life sciences. The cluster consists of CPU nodes (128 cores / 256 threads and 1 TB of RAM each) and a GPU platform with Nvidia A100 80 GB cards and fast NVMe disks. The servers are distributed across both locations of the central RIKT equipment.
Package 18 ARIS project research equipment
- Startup computer cluster and disk array for UL MF Research Data Archive (ARM)
3. Dry lab includes:
- Data management
- Data stewardship
- Data archive
- Data science
- Data analytics and data analysis
- Bioinformatics, biostatistics, computational biology
- Certification agency for life science research
a) General activities
- JSI-DKT is a founding member of the Centre of excellence for Integrated Approaches in Chemistry and Biology of Proteins (CIPKeBiP).
- Through CIPKeBiP, we have access to a high-performance computing cluster, named Lutetia.
- The cluster is composed of two master servers and forty-four computing servers (with a total of 1000 cores and 8 TB of RAM), a disk subsystem (with a total of 24 TB of hard disk drives), and ancillary equipment.
- The cluster is extensively used for research related to ELIXIR topics.
- We also use the resources offered by the Slovenian national supercomputing network (SLING).
- The department owns a number of smaller storage/computing servers, with 128 cores and 64 cores+4 GPUs, which are also partly used for research related to ELIXIR topics.
- HPC bioinformatics infrastructure (genomic analytical pipelines, data visualisation, analytical pipeline development).
- CLC Genomics Workbench (two floating licenses)
- CLC Genomics Server for bioinformatics analyses
- Blast2GO command line
- CodonCode Aligner
- Statistical and bioinformatics analyses of genomics and transcriptomics data
UL MF IBMI
- data management (safe and secure data storage following FAIR standards, dynamic archiving)
- planning, building and maintenance of research and clinical registries
- access to (virtual) tools and services for data analysis
- planning and execution of bioinformatics analyses, especially automation of analyses and data standardisation for advanced analyses
- development of methods, tools and services for e-learning
- coordination of listed tools and services for research and infrastructure projects where UL MF is partner or coordinator
- Building National Archives of Research Data
- Building a National Genomic Research Infrastructure
UL MF CFGBC
- statistical and bioinformatics analyses of transcriptome data
- experimental design for omics experiments
- system for laboratory sample labeling and tracing (QR code creation, printing and reading) and sample data storage (relational databases)
- BioNumerics 7.6 (Applied Maths) for analysing genotyping results
- Geneious for bioinformatics analyses
- computer-assisted drug design
- inverse molecular docking of natural compounds
- molecular dynamics simulation
- free-energy calculations
b) Research equipment, tools & services
Biomine Explorer is a web application for link discovery and interactive exploration in biological databases. It was built on top of Biomine, a system which integrates cross-references from several biological databases into a large heterogeneous probabilistic network. Biomine Explorer offers user-friendly interfaces for search, visualization, exploration and manipulation as well as public and private storage of discovered subnetworks with permanent links suitable for inclusion into scientific publications.
Link to service: Biomine Explorer
dictyExpress is a web-based application for retrieval and analysis of gene expression data collected from the social amoeba Dictyostelium. It is the largest and most used gene expression repository for Dictyostelium and is thoroughly referenced from dictyBase, the organism’s genome home page.
Link to service: dictyExpress
DiNAR is a user-friendly application for revealing hidden patterns of plant signalling dynamics using differential network analysis. It comes with additional sub-applications for network preparation and pre-processing. DiNAR is a Shiny app with three main functionalities: dynamic visualisation of complex multi-conditional experiments, identification of strong differential interactions, and recall of latent effects that are present in multi-conditional experiments. Its primary purpose is to reveal hidden patterns of plant signalling dynamics, but it can be extended to any user-defined network in combination with experimental datasets (transcriptomics, proteomics, metabolomics, …).
Link to service: DiNAR
ELIXIR-SI expands and enriches its collection of dry-lab and wet-lab tools and services in collaboration with other national and international ELIXIR partners.
The ELIXIR-SI e-learning team leads the e-learning activity within the ELIXIR Training Platform:
- Development of the ELIXIR-SI eLearning Platform (EeLP), which has hosted highly regarded ELIXIR courses in recent years:
- Unix/Linux tutorial for beginners (run several times);
- ELIXIR-EXCELERATE HPC Train-the-Researcher course;
- Genome Assembly and Annotation (run several times);
- Data management / data stewardship (run several times).
- Development of high-quality e-learning-based training courses in collaboration with other ELIXIR nodes and RD-Connect in the area of rare diseases.
- Development of a connection between Cloud and HPC resources and EeLP.
- Linux terminal embedded in EeLP is now routinely used in ELIXIR e-learning courses.
- Containerised Galaxy servers for students’ training (we initiated Galaxy for training in Slovenia), metagenomics and genome assembly and annotation support.
GoMapMan is an open, web-accessible resource for gene functional annotations in plant sciences. Three sub-applications exist: the protein (protein.gomapman.org), metabolite (meta.gomapman.org) and small RNA (srna.gomapman.org) GoMapMan. In all three, protein-coding genes, metabolites and small RNAs are described using the MapMan plant ontology.
Grohar is an open-source computational tool for the analysis, visualisation and alignment of genome-scale metabolic models (GEMs). Grohar provides an easy-to-use graphical interface, which allows the user to perform different types of COBRA analyses without any programming skills. Moreover, Grohar allows automatic identification of individual metabolic pathways (from KEGG or SBML files) within the GEM network.
Orange Data Mining is an open source machine learning and data visualization tool that features interactive visualisations and visual programming for construction of data analysis workflows. It includes a large toolbox that supports data preprocessing, clustering, classification, correlation analysis, network construction and analysis, and model evaluation. A number of its add-ons have been specifically designed for biomedical data analysis and access to public biomedical data repositories.
pISA-tree is a data management solution developed to contribute to the reproducibility of research and analyses. A hierarchical set of batch files is used to create a standard project directory tree for research projects on Windows. It follows the ISA-tab framework and is meant as a support system for reproducible research. The tree structure with standardized nested directories can be generated on the fly as the project develops and grows. pISA-tree can support small to moderate projects and is a step towards the FAIR data guiding principles. In addition, two related R packages are being actively developed in-house: pisar (R support for pISA-tree) and seekr (R interface to the SEEK API). The latter connects the project folders with FAIRDOMHub, an externally maintained and developed open-source web platform for sharing scientific research assets.
Link to service: pISA-tree
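The idea of a standardized nested project tree can be sketched in a few lines. The example below is a platform-independent illustration only: the level names follow the project, investigation, study and assay hierarchy of the ISA framework, but the directory prefixes, the `metadata.txt` stub and the `create_branch` helper are hypothetical and do not reproduce the naming scheme or files of the actual pISA-tree batch scripts.

```python
import tempfile
from pathlib import Path

# Hypothetical level prefixes for illustration; the real pISA-tree batch
# files define their own naming scheme and metadata files.
LEVELS = ("project", "investigation", "study", "assay")


def create_branch(root, names):
    """Create one nested branch: project/investigation/study/assay.

    `names` gives one directory name per level, outermost first. A small
    plain-text metadata stub is written at every level unless it exists.
    """
    if len(names) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} names, got {len(names)}")
    path = Path(root)
    for level, name in zip(LEVELS, names):
        path = path / f"{level}_{name}"
        path.mkdir(parents=True, exist_ok=True)
        meta = path / "metadata.txt"
        if not meta.exists():
            meta.write_text(f"level\t{level}\nname\t{name}\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Build two assays under the same study inside a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        create_branch(tmp, ["demo", "drought", "leaf", "rnaseq"])
        assay = create_branch(tmp, ["demo", "drought", "leaf", "qpcr"])
        print(assay.relative_to(tmp))
```

Because existing directories and metadata files are left untouched, the function can be called repeatedly as a project grows, which mirrors the "on the fly" generation described above.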
quantGenius is an open, web-accessible resource for data organization and analysis for various applications of the qPCR method. It is designed as a workflow that guides the user through quality control and calculation steps. The built-in quality control-based decision support system enables robust quantification of nucleic acids.
Link to service: quantGenius
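To give a feel for the kind of calculation and quality-control step involved, the sketch below fits a qPCR standard curve, derives the amplification efficiency from its slope, and quantifies a sample only if the curve passes a simple check. It assumes a standard-curve approach; the dilution series is made up, and the acceptance window is a common rule of thumb rather than the decision logic implemented in quantGenius.

```python
import math


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency from the curve slope; 1.0 means doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def quantify(cq_value, slope, intercept):
    """Invert the standard curve to estimate a sample's copy number."""
    return 10 ** ((cq_value - intercept) / slope)


# Made-up dilution series, for illustration only.
standards = [1e6, 1e5, 1e4, 1e3, 1e2]
cq_values = [15.1, 18.4, 21.8, 25.1, 28.5]

slope, intercept = fit_standard_curve(standards, cq_values)
efficiency = amplification_efficiency(slope)

# Simple quality gate; the 0.90-1.10 window is an assumed rule of thumb.
if 0.90 <= efficiency <= 1.10:
    print(f"estimated copies at Cq 23.0: {quantify(23.0, slope, intercept):.0f}")
else:
    print("standard curve rejected: efficiency out of range")
```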
Single Cell Orange is a derivative of the Orange data mining toolbox specifically designed for scRNA data analysis. In addition to standard data analytics, it features components for multi-sample data loading, gene marker discovery, marker-based cell scoring, batch effect removal, data alignment, cluster analysis, and analysis of gene set enrichment. All visualisations in scOrange are interactive and support explorative data analysis that does not require any knowledge of programming.
ViDis is a web-based platform for the construction and sharing of diagnostic and other clinical algorithms. ViDis provides an easy-to-use web interface, which makes its functionalities available to a wide range of users: doctors and clinicians, researchers and, finally, patients.
BITOLA – Biomedical Discovery Support System
BITOLA is an interactive literature-based biomedical discovery support system. Its purpose is to help biomedical researchers make new discoveries by identifying potentially new relations between biomedical concepts. The set of concepts currently contains MeSH (Medical Subject Headings), which is used to index Medline, and human genes from HUGO. The potentially new relations are discovered by mining the Medline database.
Link to service: Bitola
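The general idea behind literature-based discovery can be sketched with a toy co-occurrence search: starting from one concept, find concepts that share intermediate concepts with it but never appear together with it directly. This is only an illustration of the principle under simplifying assumptions; the concept pairs are invented for the example, and the function names and the ranking by number of shared intermediates are not taken from BITOLA.

```python
from collections import defaultdict


def build_cooccurrence(pairs):
    """Map each concept to the set of concepts it co-occurs with."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def propose_relations(start, neighbours):
    """Rank concepts Z that share intermediates Y with `start` but never
    co-occur with it directly (candidate new relations)."""
    known = neighbours.get(start, set())
    support = defaultdict(set)
    for y in known:
        for z in neighbours.get(y, set()):
            if z != start and z not in known:
                support[z].add(y)
    # More shared intermediates first; ties broken alphabetically.
    return sorted(((z, sorted(ys)) for z, ys in support.items()),
                  key=lambda item: (-len(item[1]), item[0]))


# Toy co-occurrence pairs, invented for illustration.
pairs = [
    ("Raynaud disease", "blood viscosity"),
    ("Raynaud disease", "platelet aggregation"),
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("aspirin", "platelet aggregation"),
]

for concept, via in propose_relations("Raynaud disease", build_cooccurrence(pairs)):
    print(concept, "via", ", ".join(via))
```

In a real system the pairs would come from indexed literature and the ranking would use statistical measures of association rather than a simple count.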
ClowdFlows is an open-source, cloud-based platform for composition, execution, and sharing of interactive machine learning and data mining workflows. It is based on the principles of service-oriented knowledge discovery and features interactive scientific workflows. In contrast to comparable data mining platforms, ClowdFlows runs in all major Web browsers and platforms. ClowdFlows provides researchers with an easy way to expose and share their work and results, as only an Internet connection and a Web browser are required to access the workflow from anywhere. Practitioners can use ClowdFlows to seamlessly integrate and join different implementations of algorithms, tools and Web services into a coherent workflow that can be executed in a cloud-based application. ClowdFlows is also easily extensible during run-time by importing Web services and using them as new workflow components.
Link to service: ClowdFlows
ENZO is a web tool for easy construction and quick testing of kinetic models of enzyme catalyzed reactions.
The tool can be utilized by any interested researcher for efficient testing and evaluation of various kinetic models for a given enzyme catalyzed reaction.
No installation or registration is required. It works on any operating system with a modern web browser.
Link to service: ENZO
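As an indication of what testing a kinetic model involves, the sketch below numerically integrates the simplest enzyme scheme, E + S <-> ES -> E + P, written as mass-action rate equations. It is a minimal illustration with an explicit Euler solver and arbitrary rate constants; the function name and parameters are assumptions, and it does not reflect how ENZO builds or solves its models.

```python
def simulate(e0, s0, k1, k_minus1, k2, dt=1e-3, steps=20000):
    """Integrate E + S <-> ES -> E + P with the explicit Euler method.

    Returns the final concentrations (E, S, ES, P) after steps * dt
    time units.
    """
    e, s, es, p = e0, s0, 0.0, 0.0
    for _ in range(steps):
        bind = k1 * e * s        # rate of E + S -> ES
        unbind = k_minus1 * es   # rate of ES -> E + S
        cat = k2 * es            # rate of ES -> E + P
        d_es = (bind - unbind - cat) * dt
        s += (unbind - bind) * dt
        p += cat * dt
        es += d_es
        e -= d_es                # free enzyme mirrors the complex
    return e, s, es, p


# Arbitrary illustrative parameters (concentration and time units omitted).
e, s, es, p = simulate(e0=1.0, s0=10.0, k1=1.0, k_minus1=0.5, k2=1.0)
print(f"E={e:.3f} S={s:.3f} ES={es:.3f} P={p:.3f}")
```

Comparing such simulated progress curves with measured ones, for several candidate reaction schemes, is the usual way of deciding which kinetic model describes an enzyme best.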
SegMine is a powerful methodology for semantic analysis of transcriptomic data. It offers improved hypothesis generation and data interpretation for life scientists. SegMine employs semantic subgroup discovery to construct elaborate rules which identify enriched gene sets, and link discovery for the creation and visualization of new biological hypotheses.
Link to service: SegMine
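The notion of an enriched gene set can be illustrated with a generic over-representation test. The sketch below computes a one-sided hypergeometric p-value for the overlap between a selected gene list and an annotated gene set; it is a textbook calculation with made-up numbers, not the semantic subgroup discovery procedure that SegMine itself applies.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of genes measured
    annotated:  genes in the gene set of interest
    selected:   genes in the user's list (e.g. differentially expressed)
    overlap:    genes that are both selected and annotated
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(comb(annotated, i) * comb(population - annotated, selected - i)
               for i in range(overlap, upper + 1))
    return tail / total


# Made-up numbers, for illustration only.
p_value = enrichment_p_value(population=20000, annotated=200,
                             selected=100, overlap=8)
print(f"enrichment p-value: {p_value:.3g}")
```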