National Archive of Research Data
- for data from various fields of life sciences
- in coordination with other complementary infrastructures.
European Genome-phenome Archive (EGA)
The EGA is a service for permanent archiving and sharing of personally identifiable genetic and phenotypic data. It is a distributed solution for data sharing and exchange across national borders.
There are Local / Federated EGAs in all participating countries, as well as the Central / Main EGA that connects the local (national) EGAs.
Some Local-EGAs are already in production (e.g. in the Nordic countries). ELIXIR-SI is setting up a national Local-EGA for Slovenia, funded by the ELIXIR-SI RI-SI-2 National infrastructure project. Full operation is expected by mid-2021, with a first version available by the end of 2020.
A Local-EGA is a national collection of sensitive human -omics data. The Local-EGA will allow you to deposit sensitive human data locally (and so comply with national guidelines for storing that data) while still enabling data reuse across national boundaries, because the metadata of the nationally archived data is shared with the Central / Main EGA.
The shared metadata can be searched through the main EGA portal, so you can use the EGA search engine to find data stored in every participating country. You can also search and retrieve information from the Local-EGA through the Local-EGA API, which lets you build your own services on top of the available data.
European Nucleotide Archive (ENA)
The European Nucleotide Archive (ENA), developed and maintained by the EMBL-EBI, is an archive for experimental workflows that are based around nucleotide sequencing. A typical workflow includes the sample isolation and preparation, production of sequencing data with a sequencing machine, and a subsequent bioinformatic analysis. The ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation).
Access to ENA data is provided through the browser with search tools, through large scale file download and through the API.
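As a rough illustration of the API route, the sketch below builds a search request and parses a tab-separated response. It assumes the publicly documented ENA Portal API search endpoint and its `result`, `query`, `fields`, `format` and `limit` parameters. The helper names `build_search_url` and `parse_tsv` are made up for this example, and the default field list is only a guess at what a user might want.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API; check the ENA documentation before relying on it.
ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(query, result="read_run",
                     fields=("run_accession", "fastq_ftp"), limit=10):
    """Return a search URL that asks for tab-separated records."""
    params = {
        "result": result,            # type of record to search
        "query": query,              # search expression, passed through unchanged
        "fields": ",".join(fields),  # columns to return
        "format": "tsv",
        "limit": limit,
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


def parse_tsv(text):
    """Turn a TSV response body into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
```

The URL returned by `build_search_url("tax_eq(9606)")` could then be fetched with `urllib.request.urlopen` and the decoded body passed to `parse_tsv`. The network call is left out here so that the sketch runs offline.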
The ENA is part of the ELIXIR infrastructure and an ELIXIR Core Data Resource.
Data sources for the ENA include submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centres, and information exchange with the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data is now mandatory for publication of research findings. It is also required by funding bodies. There are many data classes and formats for ENA submissions. Latest developments and changes to services are announced on the ENA news page and through the ENA mailing list.
Dry lab includes:
- Data management
- Data stewardship
- Data archive
- Data science
- Data analytics and data analysis
- Bioinformatics, biostatistics, computational biology
- Certification agency for life science research
1. General activities
- JSI-DKT is a founding member of the Centre of excellence for Integrated Approaches in Chemistry and Biology of Proteins (CIPKeBiP).
- Through CIPKeBiP, we have access to a high-performance computing cluster, named Lutetia.
- The cluster is composed of two master servers and forty-four computing servers (with a total of 1000 cores and 8 TB of RAM), a disk subsystem (with a total of 24 TB of hard disk drives), and ancillary equipment.
- The cluster is extensively used for research related to ELIXIR topics.
- We also use the resources offered by the Slovenian national supercomputing network (SLING).
- The department owns a number of smaller storage/computing servers, with 128 cores and 64 cores+4 GPUs, which are also partly used for research related to ELIXIR topics.
- HPC bioinformatics infrastructure (genomic analytical pipelines, data visualisation, analytical pipeline development).
- CLC Genomics Workbench (two floating licenses)
- CLC Genomics Server for bioinformatical analyses
- BLAST2GO command line
- CodonCode Aligner
- Statistical and bioinformatical analyses of genomics and transcriptomics data
UL MF IBMI
- data management (safe and secure data storage following FAIR standards, dynamic archiving)
- planning, building and maintenance of research and clinical registries
- access to (virtual) tools and services for data analysis
- planning and execution of bioinformatics analyses, especially analysis automation and data standardisation for advanced analyses
- development of methods, tools and services for e-learning
- coordination of listed tools and services for research and infrastructure projects where UL MF is partner or coordinator
- Building National Archives of Research Data
- Building a National Genomic Research Infrastructure
UL MF CFGBC
- statistical and bioinformatical analyses of transcriptome data
- experimental design for omics experiments
- system for laboratory sample labeling and tracing (QR code creation, printing and reading) and sample data storage (relational databases)
- BioNumerics 7.6 (Applied Maths) for analysing genotyping results
- Geneious for bioinformatical analyses
- computer-assisted drug design
- inverse molecular docking of natural compounds
- molecular dynamics simulation
- free-energy calculations
2. Research equipment, tools & services
Biomine Explorer is a web application for link discovery and interactive exploration in biological databases. It was built on top of Biomine, a system which integrates cross-references from several biological databases into a large heterogeneous probabilistic network. Biomine Explorer offers user-friendly interfaces for search, visualization, exploration and manipulation as well as public and private storage of discovered subnetworks with permanent links suitable for inclusion into scientific publications.
Link to service: Biomine Explorer
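To give a feel for link discovery in a probabilistic network, here is a minimal sketch that finds the most reliable path between two nodes when every edge carries a probability. It illustrates the general idea only and is not Biomine's actual algorithm or API. The function name, the edge format and the node names in the usage note are all assumptions made for the example.

```python
import heapq
import math


def most_probable_path(edges, source, target):
    """Return (probability, path) for the most reliable path, or (0.0, []) if none.

    `edges` maps a pair of node names to a probability in (0, 1].
    Maximising a product of probabilities is the same as minimising the sum
    of -log(p), so Dijkstra's algorithm can be used.
    """
    graph = {}
    for (a, b), p in edges.items():
        if p <= 0:
            continue  # an impossible link cannot be part of any path
        cost = -math.log(p)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))  # links treated as undirected

    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))

    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return math.exp(-best[target]), path[::-1]
```

With hypothetical edges `{("geneA", "proteinB"): 0.9, ("proteinB", "diseaseC"): 0.8, ("geneA", "diseaseC"): 0.5}`, the indirect route through `proteinB` is preferred over the direct link, because 0.9 × 0.8 is larger than 0.5.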
dictyExpress is a web-based application for retrieval and analysis of gene expression data collected from the social amoeba Dictyostelium. It is Dictyostelium’s largest and most used gene expression repository and is thoroughly referenced from dictyBase, the organism’s genome home page.
Link to service: dictyExpress
DiNAR is a user-friendly application for revealing hidden patterns of plant signalling dynamics using Differential Network Analysis. It comes with additional sub-applications for network preparation and pre-processing. DiNAR is a Shiny App and has three main functionalities: dynamic visualisation of complex multi-conditional experiments, identification of strong differential interactions, and recall of latent effects that are present in multi-conditional experiments. Its primary purpose is to reveal hidden patterns of plant signalling dynamics, but it can be extended to any user-defined network in combination with experimental datasets (transcriptomics, proteomics, metabolomics, …).
Link to service: DiNAR
ELIXIR-SI expands and enriches its collection of dry-lab and wet-lab tools and services in collaboration with other national and international ELIXIR partners.
The ELIXIR-SI e-learning team leads the e-learning activity within the ELIXIR Training platform:
- Development of the ELIXIR-SI eLearning Platform (EeLP), which in recent years has hosted highly recognized ELIXIR courses:
- Unix/Linux tutorial for beginners (in several repetitions);
- ELIXIR-EXCELERATE HPC Train-the-Researcher course;
- Genome Assembly and Annotation (in several repetitions);
- Data management / data stewardship (in several repetitions).
- Development of high-quality e-learning-based training courses in the area of Rare Diseases, in collaboration with other ELIXIR nodes and RD-Connect.
- Development of a connection between Cloud and HPC resources and EeLP.
- Linux terminal embedded in EeLP is now routinely used in ELIXIR e-learning courses.
- Containerised Galaxy servers for students’ training (we initiated Galaxy for training in Slovenia), metagenomics and genome assembly and annotation support.
GoMapMan is an open, web-accessible resource for gene functional annotations in plant sciences. Three sub-applications exist: the protein (protein.gomapman.org), metabolite (meta.gomapman.org) and small RNA (srna.gomapman.org) GoMapMan. In all three, protein-coding genes, metabolites and small RNAs are described using the MapMan plant ontology.
Grohar is an open-source computational tool focused on the analysis, visualisation and alignment of genome-scale metabolic models (GEMs). Grohar provides an easy-to-use graphical interface, which allows the user to perform different types of COBRA analyses without any programming skills. Moreover, Grohar allows automatic identification of individual metabolic pathways (from KEGG or SBML files) within the GEM network.
Orange Data Mining is an open source machine learning and data visualization tool that features interactive visualisations and visual programming for construction of data analysis workflows. It includes a large toolbox that supports data preprocessing, clustering, classification, correlation analysis, network construction and analysis, and model evaluation. A number of its add-ons have been specifically designed for biomedical data analysis and access to public biomedical data repositories.
pISA-tree is a data management solution developed to contribute to the reproducibility of research and analyses. A hierarchical set of batch files is used to create a standard project directory tree for research projects on Windows. It follows the ISA-tab framework and is meant as a support system for reproducible research. The tree structure, with standardized nested directories, can be generated on the fly as the project develops and grows. pISA-tree can support small to moderate projects and is a step towards the FAIR data guiding principles. In addition, two related in-house R packages are being actively developed: pisar (R support for pISA-tree) and seekr (R interface with the SEEK API). The latter connects the project folders with FAIRDOMHub, an externally maintained and developed open-source web platform for sharing scientific research assets.
Link to service: pISA-tree
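The idea of a standardized, nested project tree can be sketched in a few lines. pISA-tree itself uses Windows batch files, so the Python below is only an illustration of the concept and not the tool's implementation. The level prefixes (`_p_`, `_I_`, `_S_`, `_A_`), the `metadata.txt` file name and the function name are assumptions chosen for the example.

```python
from pathlib import Path


def create_project_tree(root, project, investigation, study, assay):
    """Create nested project/investigation/study/assay directories under `root`.

    Returns the list of created directories, outermost first.
    Prefixes and the metadata file name are illustrative assumptions.
    """
    levels = [
        ("_p_", project),
        ("_I_", investigation),
        ("_S_", study),
        ("_A_", assay),
    ]
    current = Path(root)
    created = []
    for prefix, name in levels:
        current = current / f"{prefix}{name}"
        current.mkdir(parents=True, exist_ok=True)
        # One plain-text metadata stub per level; an existing file is left untouched.
        metadata = current / "metadata.txt"
        if not metadata.exists():
            metadata.write_text(f"Name:\t{name}\n", encoding="utf-8")
        created.append(current)
    return created
```

Calling the function again with the same names changes nothing, so the tree can be extended safely as a project grows, for example by adding a second assay under an existing study.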
quantGenius is an open, web-accessible resource for data organization and analysis for various applications of the qPCR method. It is designed as a workflow that guides the user through quality control and calculation steps. The built-in quality control-based decision support system enables robust quantification of nucleic acids.
Link to service: quantGenius
Single Cell Orange is a derivative of the Orange data mining toolbox specifically designed for scRNA data analysis. In addition to standard data analytics, it features components for multi-sample data loading, gene marker discovery, marker-based cell scoring, batch effect removal, data alignment, cluster analysis, and analysis of gene set enrichment. All visualisations in scOrange are interactive and support explorative data analysis that does not require any knowledge of programming.
ViDis is a web-based platform for construction and sharing of diagnostic and other clinical algorithms. ViDis provides an easy-to-use web interface, which makes its functionalities available to a wide range of users: doctors and clinicians, researchers, and patients.