Pan-European infrastructure for ocean & marine data management

Newsletter

3 April 2018

Introduction

SeaDataNet is a major operational infrastructure for managing, indexing and providing access to ocean and marine data sets and data products acquired by European organisations from research cruises and other observational activities in European coastal waters, regional seas and the global ocean. It develops, governs and promotes common standards for metadata and data formats, common vocabularies, quality flags, and software tools and services for marine data management, which are widely adopted and used. SeaDataNet's core partners are the National Oceanographic Data Centres (NODCs) and major marine research institutes in Europe. It has established a large European and international network, working closely with the operational oceanography, marine research and marine environmental monitoring communities, as well as with other marine data management infrastructures.

SeaDataNet is also a major partner in the development and operation of the European Marine Observation and Data Network (EMODnet), which supports the EU Marine Knowledge 2020 and Blue Growth initiatives and the Marine Strategy Framework Directive (MSFD). Since the mid-1990s SeaDataNet has expanded and matured, and it now provides federated discovery of and access to 110 data centres for physics, chemistry, geology, bathymetry and biology.

SeaDataNet is further developing its discovery, access, ingestion, publishing and visualisation services in the EU HORIZON 2020 SeaDataCloud project. This project aims to upgrade and expand the SeaDataNet architecture and services by making use of cloud services, taking into account the European Open Science Cloud (EOSC) challenge. The major objectives of the SeaDataCloud project are:

  • Improve discovery and access services for users and data providers
  • Optimise the connection of data providers, their data centres and their data streams to the infrastructure
  • Improve interoperability with other European and international networks to give users an overview of and access to additional data sources
  • Develop a Virtual Research Environment with tools for analysing data and for generating and publishing data products.
This is the second edition of the newsletter in the framework of the SeaDataCloud project. It reports on the progress of a number of SeaDataCloud developments and related activities. We hope you will enjoy this newsletter and will be inspired to visit the SeaDataNet portal to try out its services and follow its evolution. We aim to reach as many people as possible, so please forward it to anyone who may be interested.

IMDIS 2018 Conference, Barcelona, Spain: deadline for abstracts approaching!

The 6th edition of the International Conference on Marine Data and Information Systems (IMDIS) is organised within the framework of the SeaDataCloud project and will take place from 5 to 7 November 2018. The IMDIS series of conferences aims to provide an overview of existing information systems serving different users in ocean science. It also shows progress in the development of efficient infrastructures for managing large and diverse data sets, standards, interoperable information systems, and services and tools for education. The conference will present different systems for online access to data, metadata and products, communication standards, and adapted technology to ensure platform interoperability. Sessions will focus on infrastructures, technologies and services for different users: environmental authorities, research, schools, universities, etc.

The conference is co-organised by CSIC (Consejo Superior de Investigaciones Científicas, Spain) and IFREMER, jointly with OGS and IOC/IODE. It will be held in the auditorium of the PRBB (Barcelona Biomedical Research Park), on the seafront next to the CSIC office.

Visit the conference website and do not forget to submit your abstract before the deadline of 30 April 2018.

Progress with population of SeaDataNet directories

The SeaDataNet infrastructure comprises a network of interconnected data centres that perform marine data management at national and local levels and together make their information and data resources discoverable and accessible in a harmonised way. The SeaDataNet directory services provide overviews of marine organisations in Europe and their engagement in marine research projects, in managing large datasets, and in data acquisition by research vessels and monitoring programmes for the European seas and global oceans:

  • European Directory of Marine Organisations (EDMO) (> 3,900 entries)
  • European Directory of Marine Environmental Data (EDMED) (> 4,100 entries)
  • European Directory of Marine Environmental Research Projects (EDMERP) (> 3,000 entries)
  • European Directory of Cruise Summary Reports (CSR) (> 55,000 entries)
  • European Directory of the Ocean Observing Systems (EDIOS) (> 10,000 entries)
  • Common Data Index Data Discovery and Access service (CDI) (> 2.1 million entries)
Figure: the monthly progress of each of the directories since September 2012.  

Users can follow this monthly progress at the SeaDataNet portal.

The Common Data Index (CDI) Data Discovery and Access service provides users with unified online discovery of and access to the vast resources of marine and ocean datasets managed by the distributed data centres. It gives users highly detailed insight into the geographical coverage and other metadata features of data across the different data centres. Users can request access to identified datasets in a harmonised way using a shopping basket, follow the processing of their requests via an online transaction register, and download datasets in the SeaDataNet standard formats. Through cooperation with many EU projects and its active role in the development of EMODnet, the number of connected data centres has steadily risen to more than 110 at present. As a result, the CDI service provides metadata for and access to more than 2.1 million data sets, originating from more than 600 organisations in Europe, covering physical, geological, chemical, biological and geophysical data acquired in European waters and the global ocean.

Figure: Overview of CDI entries as of March 2018: > 2.1 million data sets from 600+ originators and 110+ connected data centres

Figure: Number of CDI entries per discipline as of March 2018

SeaDataNet Common Vocabularies, expansion and extra services

In information science, controlled vocabularies are carefully selected lists of words and phrases used to tag units of information (a document or work) so that they can be more easily retrieved by a search. SeaDataNet adopted this principle and started building and using controlled vocabularies at an early stage in order to mark up metadata, data and data products in a consistent and coherent way. Common Vocabulary services were set up and are populated by SeaDataNet. They are technically managed and hosted by the British Oceanographic Data Centre (BODC) by means of the NERC Vocabulary Server (NVS2.0).

NVS publishes its vocabularies using SKOS and is fit for Linked Data because every concept has a unique HTTP URI. At present NVS maintains and provides 239 different vocabularies, of which 55 are relevant to and governed as SeaDataNet vocabularies. The number of terms (concepts) in these vocabulary collections has steadily increased over time, in dialogue with research communities and driven by the many projects adopting the vocabularies for their data management. This adoption included mapping activities for marking up metadata and data, which resulted in many requests for new terms. At present NVS contains nearly 160,000 terms spread over the different vocabulary collections; for instance, the P01 Parameter Usage Vocabulary currently contains more than 37,000 terms.

Besides vocabularies, NVS also includes mappings between vocabularies, both internal (NVS to NVS) and external (NVS to well-established external vocabularies). An illustrative example of internal mapping is the hierarchical mapping P08 (SDN Parameter Disciplines) => P03 (SDN Agreed Parameter Groups) => P02 (SDN Parameter Discovery Vocabulary) => P01 (SDN Parameter Usage Vocabulary), which is used to ease discovery services and to classify the parameters of measurements. In the CDI Data Discovery and Access service, P02 (with its P03 and P08 broader relations) is used in the CDI metadata, while P01 is used in the data. Many other SDN vocabularies are also used in the metadata (depending on the SDN directory) and in the data, for example L05 for device categories, L06 for platform categories and L22 for devices (measurement instruments). Good examples of external mapping are the mappings from NVS to the World Register of Marine Species (WoRMS) and to the Global Change Master Directory (GCMD).
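The P08 => P03 => P02 => P01 hierarchy can be read as a chain of "broader" relations. The minimal Python sketch below walks up from a P01 concept and separates the term that goes into the data file from the terms that go into the CDI metadata. It assumes each concept has exactly one broader concept, and the codes are hypothetical placeholders rather than real NVS identifiers; real NVS mappings can be many-to-many.

```python
# Hypothetical broader-concept relations (child -> broader), one per concept.
# Placeholder codes only; real NVS mappings can link a concept to several terms.
BROADER: dict[str, str] = {
    "P01:PARAM_A": "P02:GROUP_B",
    "P02:GROUP_B": "P03:AGREED_C",
    "P03:AGREED_C": "P08:DISC_D",
}


def discovery_chain(code: str) -> list[str]:
    """Follow broader relations from `code` up to the top of the hierarchy."""
    chain = [code]
    while chain[-1] in BROADER:
        nxt = BROADER[chain[-1]]
        if nxt in chain:  # guard against accidental cycles in the mapping
            raise ValueError(f"cycle detected at {nxt}")
        chain.append(nxt)
    return chain


def split_for_cdi(code: str) -> tuple[str, list[str]]:
    """Return (term used in the data file, terms used in the CDI metadata)."""
    chain = discovery_chain(code)
    metadata = [c for c in chain if c.split(":", 1)[0] in {"P02", "P03", "P08"}]
    return chain[0], metadata


data_term, metadata_terms = split_for_cdi("P01:PARAM_A")
print(data_term, metadata_terms)
```

Keeping the relations as a plain lookup table mirrors how a discovery service can expand a search on a broad P08 discipline down to every P01 parameter beneath it, simply by inverting the same table.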
As part of SeaDataCloud, the vocabularies have been further populated, and a series of new vocabularies (W01 – W10) has been initiated and populated to mark up SensorML and Observations & Measurements (O&M) profiles. This is part of the development and implementation of Sensor Web Enablement (SWE) standards for streamlining the data flow from sensors and platforms to data centres and their real-time publishing.

The vocabularies are made available as web services for machine-to-machine exchange.

In addition, the vocabularies are published with client interfaces for end users. These can be found at the BODC website and allow users to query the vocabularies and receive results as XML documents. There are also user interfaces at the SeaDataNet portal, which give an overview of all SeaDataNet common vocabularies and allow users to explore them.

Figure: User Interface at SeaDataNet portal to oversee and query all SeaDataNet common vocabularies

The SeaDataNet list user interface was built by MARIS on top of the SOAP web service provided by NVS. Each vocabulary has a user interface for querying, retrieving, browsing and exporting terms as CSV. A number of vocabularies are hierarchical and include a thesaurus button for browsing them hierarchically.

Great progress has been made with the P01 Parameter Usage Vocabulary, which is used in the SeaDataNet data files (ODV and NetCDF (CF)) to indicate which parameters have been observed. Data providers use P01 intensively when mapping local data sets to the SeaDataNet target data formats. However, identifying the right P01 terms during mapping is also quite a challenge, as P01 currently counts more than 37,000 terms and each P01 concept is built up of a number of elements following a semantic model. In particular, when mapping complex terms, as in the case of chemistry, it takes considerable effort to locate the right terms or to identify missing terms that should be added. An example of a P01 term (MMUSDTBT) is: ‘Concentration of tributyltin cation {tributylstannyl TBT+ CAS 36643-28-4} per unit dry weight of biota {Mytilus galloprovincialis (ITIS: 79456: WoRMS 140481) [Subcomponent: flesh]}’. This term consists of components for a measurement property, a chemical substance, a measurement matrix relationship and a matrix, and these components themselves also have parts. BODC now exposes the components of this P01 semantic model through web services, and on top of these MARIS has deployed a dedicated P01 facet search user interface. This so-called ‘one-armed bandit’ can be reached from the overall SeaDataNet vocabularies interface by clicking on the ‘magnifying glass’ button.
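To make the component structure concrete, here is a minimal Python sketch that rebuilds the MMUSDTBT label quoted above from its four top-level components. The class and field names are hypothetical, and the simple "join in order" rule is an assumption that happens to reproduce this example; the real BODC semantic model has more elements and its own assembly rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class P01Components:
    """Hypothetical container for the four top-level parts of a P01 term."""

    measured_property: str  # e.g. "Concentration"
    substance: str          # chemical substance, with its own sub-parts in braces
    relationship: str       # measurement matrix relationship
    matrix: str             # matrix, with its own sub-parts in braces

    def label(self) -> str:
        # Assemble the parts in the order they appear in the example term.
        return f"{self.measured_property} of {self.substance} {self.relationship} {self.matrix}"


tbt_in_mussels = P01Components(
    measured_property="Concentration",
    substance="tributyltin cation {tributylstannyl TBT+ CAS 36643-28-4}",
    relationship="per unit dry weight of",
    matrix="biota {Mytilus galloprovincialis (ITIS: 79456: WoRMS 140481) [Subcomponent: flesh]}",
)
print(tbt_in_mussels.label())
```

Storing the parts separately, rather than only the full label, is what makes facet search possible: each field becomes a facet a user can filter on.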

Figure: How to open the P01 semantic model facet search from the SeaDataNet vocabularies list user interface.

This search tool makes it much easier to find relevant P01 terms by combining the facets with multiple free-text search terms, and the P01 results can be exported as a CSV list for use in local mapping. In addition, BODC has developed and deployed a P01 vocabulary builder tool that helps data providers compose and submit requests for new P01 terms. New terms can be built from the semantic components. The following image illustrates how the P01 vocabulary builder can be reached.
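The idea behind the facet search (exact matches on components, combined with several free-text words, then CSV export) can be sketched in a few lines of Python. This is not the MARIS implementation; the records, codes and field names are hypothetical and heavily simplified.

```python
import csv
import io

# Hypothetical, simplified P01 records; codes and labels are illustrative only.
TERMS = [
    {"code": "HYPO0001", "substance": "tributyltin cation", "matrix": "biota",
     "label": "Concentration of tributyltin cation per unit dry weight of biota"},
    {"code": "HYPO0002", "substance": "tributyltin cation", "matrix": "sediment",
     "label": "Concentration of tributyltin cation per unit dry weight of sediment"},
    {"code": "HYPO0003", "substance": "cadmium", "matrix": "biota",
     "label": "Concentration of cadmium per unit wet weight of biota"},
]


def facet_search(terms, facets=None, words=()):
    """Keep terms whose facet values all match and whose label contains every word."""
    facets = facets or {}
    return [
        t for t in terms
        if all(t.get(k) == v for k, v in facets.items())
        and all(w.lower() in t["label"].lower() for w in words)
    ]


def to_csv(terms) -> str:
    """Export the code and label columns, as a user might for local mapping."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["code", "label"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(terms)
    return buf.getvalue()


hits = facet_search(TERMS, {"matrix": "biota"}, ["tributyltin", "dry"])
print(to_csv(hits))
```

Each extra facet or word can only narrow the result set, which is why combining them quickly cuts a list of tens of thousands of terms down to a handful of candidates.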

Figure: How to open the P01 vocabulary builder to compose new P01 terms and to submit these as addition requests.

Implementing the Linked Data concept

“Linked Data” is a method for publishing structured data on the World Wide Web. It was first proposed by Sir Tim Berners-Lee in 2006 as a pathway to creating a web of data, analogous to the web of documents that was becoming prevalent at that time and with which we are now familiar. The four key principles of Linked Data for publishing data online are:

  1. Use Uniform Resource Identifiers (URIs) to identify data objects. A URI is simply a string of characters used to identify a resource
  2. Use HyperText Transfer Protocol (HTTP) URIs so that web browsers and other web-enabled software can look up the data objects
  3. When a data object is looked up, provide useful information about it using open standards, in particular the Resource Description Framework (RDF)
  4. Provide connections to other data objects by referencing them using their HTTP URIs
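Principles 2 and 3 in practice: a client looks up an HTTP URI and uses content negotiation to ask for RDF instead of a web page. The minimal Python sketch below builds such a request for the MMUSDTBT concept mentioned earlier. It assumes the NVS URI pattern `http://vocab.nerc.ac.uk/collection/<collection>/current/<code>/` and that the server returns RDF/XML for `Accept: application/rdf+xml`; the network call is left commented out.

```python
import urllib.request

# Assumed NVS URI pattern: <base>/<collection>/current/<code>/
NVS_COLLECTIONS = "http://vocab.nerc.ac.uk/collection"


def concept_uri(collection: str, code: str) -> str:
    """Build the HTTP URI that identifies a vocabulary concept (principles 1 and 2)."""
    return f"{NVS_COLLECTIONS}/{collection}/current/{code}/"


def rdf_request(uri: str) -> urllib.request.Request:
    """Ask for an RDF description rather than an HTML page (principle 3)."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request(concept_uri("P01", "MMUSDTBT"))
# Dereferencing needs network access, so it is left commented out here:
# with urllib.request.urlopen(req) as resp:
#     rdf_xml = resp.read().decode("utf-8")
```

Principle 4 shows up in the returned RDF itself: relations such as broader terms or mappings are expressed as further HTTP URIs that a client can look up in the same way.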
By developing Linked Data capability for the SeaDataNet infrastructure in the SeaDataCloud project, we expect the following benefits:
  • Improved use of the SeaDataNet vocabularies within the metadata in the infrastructure (the vocabulary terms have already been available as Linked Data for several years)
  • Improved connections between the catalogues
  • Improved search, as a result of the improved connectivity between the catalogues and connections to other portals, including the European Open Data Portal
  • Alignment with INSPIRE, which is considering accepting Linked Data as a submission format
  • The ability to use the technology behind “rich snippets” in Google search, which Google is currently developing further for datasets
Each of the SeaDataNet metadata catalogues has been modelled in Linked Data terms by the Marine Institute using existing patterns, with the exception of the Cruise Summary Reports, for which no relevant existing pattern could be adopted. For EDMO, EDMED, EDMERP and CDI, recommendations from the World Wide Web Consortium (W3C) were followed; in particular, for EDMED and CDI the same profile as used in the European Open Data Portal was chosen. For EDIOS, the INSPIRE Environmental Monitoring Facility model was followed.

BODC and MARIS have made progress in publishing a number of the catalogues (EDMED and EDMO) as Linked Data resources by means of SPARQL endpoints and RDF triplestores, which are currently being tested. Their approach is documented and will be adopted for implementing the other catalogues as Linked Data resources, which is planned for the coming months. As an extra activity, IODE has asked for support in mapping the OceanExpert directory to Linked Data. This will also address a gap identified in the SeaDataCloud work: the lack of “people” as data entities in the Linked Data space.
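A SPARQL endpoint is queried over plain HTTP. The minimal Python sketch below builds a SPARQL 1.1 Protocol GET request and flattens the standard SPARQL JSON results format into simple rows. The endpoint URL is hypothetical, and the generic `rdfs:label` query is an assumption: the actual EDMO and EDMED models may use other classes and properties.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; substitute the real EDMO or EDMED SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Generic query over any labelled resource; the catalogue models may differ.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE { ?s rdfs:label ?label } LIMIT 10
"""


def sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into {variable: value} dicts."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    return [
        {n: b[n]["value"] for n in names if n in b}
        for b in doc["results"]["bindings"]
    ]


req = sparql_request(ENDPOINT, QUERY)
# with urllib.request.urlopen(req) as resp:  # network call, not run here
#     print(rows(resp.read().decode("utf-8")))
```

Unbound variables are simply missing from a binding, which is why `rows` only copies the variables that are present.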

A next challenge will be to translate the catalogue data models into the specific flavour of Linked Data understood by the most common search engines, known as Schema.org. This approach will establish full support for “rich snippets” and help ensure that the SeaDataNet directories are picked up by the major search engines. Moreover, it promises to make it possible to enrich the metadata provided to human users of the SeaDataNet data discovery and access service when searching, browsing and using retrieved data sets.
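As an illustration of what such a translation could produce, the minimal Python sketch below maps a catalogue record to a schema.org `Dataset` in JSON-LD, the form search engines read from a `<script type="application/ld+json">` element. The input record and its field names are hypothetical, and this is not the SeaDataNet mapping; only standard schema.org properties are used.

```python
import json


def to_schema_org(record: dict) -> str:
    """Map a hypothetical catalogue record to a schema.org Dataset in JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["abstract"],
        "identifier": record["id"],
        "url": record["landing_page"],
        "keywords": record.get("keywords", []),
        "creator": {"@type": "Organization", "name": record["originator"]},
    }
    if "period" in record:  # ISO 8601 interval for temporalCoverage
        start, end = record["period"]
        doc["temporalCoverage"] = f"{start}/{end}"
    return json.dumps(doc, indent=2, ensure_ascii=False)


example = {  # hypothetical record, not a real EDMED entry
    "id": "edmed-example-0001",
    "title": "Example CTD profiles, North Sea",
    "abstract": "Illustrative record used only to show the mapping.",
    "landing_page": "https://example.org/edmed/0001",
    "originator": "Example Marine Institute",
    "keywords": ["temperature", "salinity"],
    "period": ("2010-01-01", "2012-12-31"),
}
# The output would be embedded in the landing page of the record.
print(to_schema_org(example))
```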

New and updated SeaDataNet tools

SeaDataNet has developed and maintains a set of tools for use by each data centre, freely available from the SeaDataNet portal. The set includes documentation and common software tools for metadata and data, for statistical analysis and grid interpolation, and a versatile software package for data analysis, QA/QC and presentation. As part of the SeaDataCloud project, these tools are being upgraded to meet new requirements. The current software versions are as follows:

MIKADO, developed by IFREMER, is used to generate the XML metadata entries for the CDI, CSR, EDMED, EDMERP and EDIOS SeaDataNet catalogues. The latest version (3.5) was released in March 2018. It includes the latest versions of the CSR and CDI XSD schemas (v10.0.1 for CDI and v3.0.1 for CSR), bug fixes and add-ons such as a report when updating vocabularies and instructions about the date format in CDI and CSR (automatic mode).

NEMO, developed by IFREMER, converts ASCII files of vertical profiles, time series or trajectories to SeaDataNet format files, which can be the text-based Ocean Data View (ODV) and MedAtlas formats or the binary NetCDF format. The latest version (1.6.6) was released in November 2017. The next version (1.6.7) is under development and will include bug fixes and conversion of ASCII files to the other SeaDataNet formats designed specifically for biological data, microlitter data and flow cytometry data.

OCTOPUS, developed by IFREMER, is a format conversion tool and also a format checker for SeaDataNet ODV, NetCDF and MedAtlas files. OCTOPUS can split a multi-station SeaDataNet file into several mono-station SeaDataNet files and extract stations from a multi-station file. Finally, OCTOPUS is designed to convert specific Magnetism, Gravimetry and Depth (MGD) data files to SeaDataNet ODV format. The latest version (1.4.0) was released in March 2018 and includes:

  • a new conversion from NetCDF to ODV,
  • a new format checker for SeaDataNet NetCDF files,
  • format checkers that can now run in batch mode.
The next version of OCTOPUS will focus on developing conversions to, and checks of, the other SeaDataNet formats designed specifically for biological data, microlitter data and flow cytometry data.
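To illustrate what splitting a multi-station file involves, the minimal Python sketch below works on a simplified, tab-separated ODV-like layout. It assumes comment lines start with `//`, that the header has a `Station` column, and that station metadata cells are filled only on the first row of each station. This is a hypothetical illustration, not OCTOPUS code, and it ignores most of the SeaDataNet ODV specification.

```python
def split_stations(odv_text: str) -> dict[str, str]:
    """Split a simplified, tab-separated ODV-like file into one text per station.

    Assumes a header row with a "Station" column and that the station
    metadata cells are filled only on the first row of each station.
    """
    lines = odv_text.splitlines()
    comments = [ln for ln in lines if ln.startswith("//")]
    table = [ln for ln in lines if ln and not ln.startswith("//")]
    header, rows = table[0], table[1:]
    col = header.split("\t").index("Station")

    groups: dict[str, list[str]] = {}
    current = None
    for row in rows:
        cell = row.split("\t")[col]
        if cell:  # a filled Station cell starts a new station
            current = cell
            groups.setdefault(current, [])
        if current is None:
            raise ValueError("first data row has no station name")
        groups[current].append(row)

    # Each mono-station file keeps the comment lines and the header.
    return {st: "\n".join(comments + [header] + body) for st, body in groups.items()}


sample = "\n".join([
    "//hypothetical comment line",
    "Cruise\tStation\tType\tDepth [m]\tTEMP [degC]",
    "C1\tS1\tC\t0\t15.2",
    "\t\t\t10\t14.8",
    "C1\tS2\tC\t0\t15.9",
])
for station, text in split_stations(sample).items():
    print(station, len(text.splitlines()) - 2, "data rows")
```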

EndsAndBends, developed by IFREMER, is used to generate spatial objects from raw navigation data (ship routes). Typical navigation log files record more than one position every 10 seconds (e.g. GPS output), and their size makes them impractical to manage or visualise using standard GIS software or services (WMS, WFS and GML). EndsAndBends subsets the navigation files, keeping the geographical shape of the vessel route while significantly reducing the number of positions, which preserves response time. The latest version of EndsAndBends (2.1.0) was released in April 2014; no new version has been developed since.
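The algorithm used by EndsAndBends is not described here. A classic way to thin a track while keeping its shape is the Ramer-Douglas-Peucker algorithm, shown below as a swapped-in technique in a minimal Python sketch. It treats longitude/latitude as planar coordinates, which is an assumption that only holds roughly for short segments, and its recursion could become deep for very long tracks.

```python
import math


def _dist_to_segment(p, a, b):
    """Distance from point p to segment a-b, treating coordinates as planar."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == dy == 0:
        return math.hypot(x - x1, y - y1)
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def simplify(track, tolerance):
    """Ramer-Douglas-Peucker: keep the endpoints, recurse on the farthest point."""
    if len(track) < 3:
        return list(track)
    idx, dmax = 0, -1.0
    for i in range(1, len(track) - 1):
        d = _dist_to_segment(track[i], track[0], track[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tolerance:  # every intermediate point is close enough: drop them
        return [track[0], track[-1]]
    left = simplify(track[: idx + 1], tolerance)
    right = simplify(track[idx:], tolerance)
    return left[:-1] + right  # avoid duplicating the split point


# Hypothetical (lon, lat) positions: a straight leg followed by a turn.
track = [(-4.00, 48.00), (-3.99, 48.00), (-3.98, 48.00), (-3.98, 48.01), (-3.98, 48.02)]
print(simplify(track, tolerance=0.001))
```

The tolerance plays the role of a reduction factor: larger values drop more positions at the cost of a coarser route shape.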

The SeaDataNet Reduce service, under development by MARIS, is an online service with functionality comparable to that of the EndsAndBends stand-alone software. It is being developed in response to data providers who want to reduce their navigation and survey tracks in batch for inclusion in their CDI metadata entries. The Reduce service expects GML input and uses fuzzy logic to determine the best-fit reduction factor. It also provides a map visualisation that compares the original track with tracks reduced by different factors, including the advised one. The service is currently being tested.

The Download Manager (DM), developed by IFREMER, connects data centres to the SeaDataNet infrastructure. It handles all communication between the data centre system and the CDI RSM service, and ensures that requested files are made ready for users to download via their personal download pages at the data centre. The latest version (1.4.7) was released in October 2017 and includes:
  • Automatic replacement of deprecated vocabulary terms
  • Updated technical bases and dependencies
  • A revised installation that is simpler and shorter
This is the second Download Manager version that ENEA has embedded in a virtual appliance, which will ease installation, configuration and version updates for many data centres. The virtual appliance is currently deployed at a few test data centres and will be distributed soon. It will also be used for installing and configuring the new SeaDataCloud Replication Manager software, once ready. The set-up and functionality of the Download Manager are currently being updated so that it becomes the Replication Manager, which can interact with the local data centre configuration, the planned CDI Import Manager (IM) software and the data cloud.

The Replication Manager (RM), under development by IFREMER, handles the replication of the local data sets managed at a SeaDataNet data centre into the central cloud. The first version of the Replication Manager (1.0) will be released in April 2018 for connecting a few specific data centres that have agreed to be beta testers. As a follow-up, a wider deployment is planned at all relevant SeaDataNet nodes, starting at the SeaDataCloud Training Workshops planned for June 2018. The actual implementation is planned for the period from the Training Workshops until the end of 2018, as part of upgrading the CDI Data Discovery and Access service to adopt cloud services. The Replication Manager is a web application that will allow data providers to submit new and updated CDI files to the new CDI Import Manager at MARIS and, as a next step, to replicate the data sets for accepted CDIs to the central cloud at EUDAT.
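The submit, accept and replicate sequence can be pictured as a small state machine. The Python sketch below is purely hypothetical: the states and transitions are our own illustration of the rule that data sets are replicated only for accepted CDIs, not the actual Replication Manager design.

```python
from enum import Enum, auto


class CdiState(Enum):
    """Hypothetical life cycle of one CDI entry and its data set."""

    LOCAL = auto()       # held only at the data centre
    SUBMITTED = auto()   # CDI file sent to the central import manager
    ACCEPTED = auto()    # CDI accepted centrally
    REJECTED = auto()    # CDI refused; needs correction
    REPLICATED = auto()  # data set copied to the central cloud


# Replication is only reachable from ACCEPTED, mirroring the rule that
# data sets are replicated for accepted CDIs only.
ALLOWED = {
    CdiState.LOCAL: {CdiState.SUBMITTED},
    CdiState.SUBMITTED: {CdiState.ACCEPTED, CdiState.REJECTED},
    CdiState.REJECTED: {CdiState.SUBMITTED},
    CdiState.ACCEPTED: {CdiState.REPLICATED, CdiState.SUBMITTED},
    CdiState.REPLICATED: {CdiState.SUBMITTED},  # an updated CDI is resubmitted
}


def advance(state: CdiState, target: CdiState) -> CdiState:
    """Move to `target` if the transition is allowed, else raise ValueError."""
    if target not in ALLOWED[state]:
        raise ValueError(f"cannot go from {state.name} to {target.name}")
    return target


s = CdiState.LOCAL
for step in (CdiState.SUBMITTED, CdiState.ACCEPTED, CdiState.REPLICATED):
    s = advance(s, step)
print(s.name)
```

Making the allowed transitions explicit in one table keeps the "accept before replicate" rule in a single place, where it is easy to check.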

SeaDataNet SWE Ingestion service for real-time data

SeaDataNet strives for a common, standardised approach for describing marine data from different disciplines and providing discovery of and access to them. So far SeaDataNet has focused mostly on delayed-mode data sets that are already managed and stored at data centres; those data centres are asked to ‘translate’ their local metadata and data to the SeaDataNet standards for formats and vocabularies for inclusion in the federated CDI Data Discovery and Access service. In SeaDataCloud, the standardisation challenge is extended to real-time data streams collected by operational sensors and platforms. For that purpose, SeaDataNet is adopting the Sensor Web Enablement (SWE) framework of standards of the Open Geospatial Consortium (OGC). The OGC SWE architecture comprises several specifications that facilitate sharing observation data and metadata via the Web. Important building blocks are standards for observation data models, for the corresponding metadata about measurement processes, and interfaces for providing sensor-related functionality (e.g. data access) via the World Wide Web.

Within the SeaDataCloud project two different aspects of Sensor Web technology are addressed, led by 52°North:

  • Developing an SWE Ingestion Service for facilitating the publication of real-time sensor data and their historical time-series through interoperable standards
  • Advancing the open source Sensor Web viewer “Helgoland”.
The SeaDataNet SWE Ingestion Service aims to support sensor operators, researchers and data owners in streaming and publishing marine observation data collected by sensors on platforms. This means that the collected data and corresponding metadata will be managed in a data repository that complies with the OGC Sensor Observation Service (SOS) standard. Furthermore, the collected observation data will be made discoverable through the SeaDataNet infrastructure, in particular the CDI Data Discovery and Access service. As a result, the CDI will also offer real-time data streams and their historical time series as an additional resource for researchers. To make this happen, the SWE Ingestion Service has to cover a range of functionalities for the data publication process:
  • A common metadata format based on the OGC Sensor Model Language (SensorML) standard; this work also contributes to the definition of common Sensor Web Enablement standard profiles for marine applications as undertaken together with other projects and published at GitHub: https://odip.github.io/MarineProfilesForSWE/
  • A tool helping users to describe the data sets they are publishing
  • A harvesting/upload component taking metadata descriptions as an input to automatically harvest available observation data sets
  • A feedback tool helping to monitor the success of the data publication workflow
The specification of the SWE Ingestion Service was completed in fall 2017 and the implementation is currently ongoing. A first running version is expected soon.
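Once data sit in an SOS-compliant repository, a client can retrieve a time window of observations with a standard request. The minimal Python sketch below builds an SOS 2.0 GetObservation URL using the key-value-pair (KVP) binding. The endpoint, offering and observed property URIs are hypothetical, and we assume the server supports the KVP binding; this is not the Ingestion Service's own API.

```python
import urllib.parse


def get_observation_url(endpoint: str, offering: str, observed_property: str,
                        start: str, end: str) -> str:
    """Build an SOS 2.0 KVP GetObservation URL for an ISO 8601 time window."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # Filter on the time at which the phenomenon was observed.
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
    }
    return f"{endpoint}?{urllib.parse.urlencode(params)}"


url = get_observation_url(
    "https://example.org/sos/service",                  # hypothetical endpoint
    "http://example.org/offering/buoy-1",                # hypothetical offering
    "http://example.org/property/sea_water_temperature", # hypothetical property
    "2018-03-01T00:00:00Z",
    "2018-03-02T00:00:00Z",
)
print(url)
```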

Figure: Overview of the SeaDataCloud SWE Ingestion Architecture

The Sensor Web Viewer “Helgoland”, developed and maintained by 52°North, addresses the exploration and visualisation of observation data available through Sensor Web servers. “Helgoland” is a Web-based application which offers functionality to determine which data sets are offered by a Sensor Web server and to subsequently download this data. Based on these data sets, “Helgoland” offers different means for data visualisation covering different types of observation data sets, for example stationary time-series data, profile measurements and data measured along a trajectory.

Within the SeaDataCloud project, the Helgoland viewer will be further extended in close cooperation with users. This includes, in particular, support for additional data types as well as usability improvements.

Figure: Screenshot of the Sensor Web Viewer "Helgoland"

Data providers as well as scientists using real-time observation data can expect from the SeaDataCloud project a range of Sensor Web tools that will facilitate the publication, discovery, and usage of observation data through interoperable standards. All of the Sensor Web software developed as part of SeaDataCloud will be published as open source software.

Vocabularies for handling flow cytometer data

Planktonic microbial communities play a major role in the functioning of the global ecosystem. In marine waters, they are the major producers and mineralizers of the organic matter. Thus, they are key actors of biogeochemical processes and can be considered as important indicators of marine health.

Flow cytometry (FCM) measures the optical properties of single cells carried in a liquid sheath stream as they pass through an excitation light source (one or several lasers). This technology was applied to the marine field in the early 1980s, leading to major discoveries such as the Prochlorococcus and Ostreococcus genera. Applied to marine or freshwater environments, flow cytometry enables small phytoplankton cells to be sorted and counted into several functional groups based on their light scattering and natural fluorescence properties. As the analysed volume can be measured, abundances can be provided for each group of cells. Nucleic-acid-specific fluorescent dyes may be used to enable the detection of heterotrophic prokaryotes (Bacteria and Archaea), heterotrophic eukaryotes (e.g. pico- and nanoflagellates) and viruses.
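The step from counts to abundances is a simple division: the number of events in each group divided by the volume analysed. The minimal Python sketch below applies it to hypothetical counts; it assumes the analysed volume is known accurately and ignores instrument-specific corrections.

```python
def abundances(counts: dict[str, int], volume_ml: float) -> dict[str, float]:
    """Cells per millilitre for each cytometric group: count / volume analysed."""
    if volume_ml <= 0:
        raise ValueError("analysed volume must be positive")
    return {group: n / volume_ml for group, n in counts.items()}


# Hypothetical counts from one analysis of 0.25 mL of seawater.
counts = {"Synechococcus": 4200, "Prochlorococcus": 9800, "Eukaryote Picophytoplankton": 610}
print(abundances(counts, 0.25))
```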

Flow cytometry datasets are most often acquired using conventional benchtop flow cytometers in the laboratory, at low temporal and/or spatial frequency, and require an operator to handle the instrument and the samples. Some instruments can be deployed in situ in water bodies and pre-set to run analyses automatically several times per hour. These instruments record various optical properties, such as several fluorescence signals and light scatter, for every single particle analysed. Some specific instruments can also take a picture of single cells as they flow, allowing additional taxonomic identification of cells larger than 20 µm (below this size, the resolution is not sufficient).

Within SeaDataCloud, for the first time, special attention is given to these data in order to make them interoperable with the SeaDataNet system. Preparatory work for building a common, standardised flow cytometry vocabulary is being undertaken by CNRS/MIO, BODC and VLIZ.

A methodology was established for defining a common set of terms that could be used by a worldwide community of flow cytometry users. First, we analysed the existing parameter codes held in the BODC Parameter Usage Vocabulary (P01). Then, we worked with a small European FCM community to identify the commonly captured parameters. In parallel, a literature review covering the period from the beginnings of the flow cytometry technique in the 1980s to 2017 made it possible to identify the most commonly captured parameters and the names given to the groups of cells defined by flow cytometry. Finally, we designed a questionnaire to gather more information about methodology and terminology from flow cytometry plankton specialists worldwide, and to ask for feedback on the proposed common vocabularies.

There are currently 34 parameter codes related to flow cytometry in the P01 vocabulary. These have been created over the past 30 years to mark up datasets received at BODC. Most were created to reflect the terminology used at the source but remodelled to fit the BODC semantic model for biological parameter codes. The collection has grown and diversified over the years as flow cytometry spread across marine laboratories and as terminology shifted in response to new experimental applications, greater instrument performance and new scientific understanding. As a result, many of these codes have become ambiguous, poorly defined or redundant. All 34 existing P01 codes will be reviewed once the new common standard vocabulary is finalised. This situation shows how timely it is to agree on a set of common vocabularies and their definitions, so that FCM datasets can be widely shared and made interoperable with one another.

In order to upgrade these codes for both automated and conventional flow cytometry with broad agreement among FCM users, we worked closely with some of the JericoNext partners (CNRS/MIO, Rijkswaterstaat (RWS), the University of Littoral Côte d’Opale (ULCO), VLIZ and the Centre for Environment, Fisheries and Aquaculture Science (Cefas)) on a common exercise to identify their FCM data management methods and the parameters they capture after analysis processing. The result below shows the common and unique parameters captured by each partner.

Figure: Synthesis of captured parameters per partner

Combining all these parameters gives a total of 73 captured parameters (metadata and data). Since we are focusing on the parameter usage vocabulary, our choice was limited to the 12 common data variables found in this exercise.
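Arriving at a total and a common subset is a union and an intersection of the per-partner parameter sets. The minimal Python sketch below shows the operations on hypothetical parameter names (not the 73 actual parameters), and it takes "common" to mean captured by every partner, which is an assumption.

```python
# Hypothetical per-partner parameter sets (illustrative names only).
captured = {
    "Partner A": {"sample_volume", "abundance", "mean_red_fluorescence", "flow_rate"},
    "Partner B": {"abundance", "mean_red_fluorescence", "mean_forward_scatter"},
    "Partner C": {"abundance", "mean_red_fluorescence", "mean_orange_fluorescence"},
}

all_params = set().union(*captured.values())          # every parameter captured anywhere
common = set.intersection(*captured.values())         # captured by every partner


def unique_to(partner: str) -> set[str]:
    """Parameters captured by `partner` and by no other partner."""
    others = [s for name, s in captured.items() if name != partner]
    return captured[partner] - set().union(*others)


print(len(all_params), sorted(common), sorted(unique_to("Partner A")))
```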

Figure: Common data parameters added to P01 list

These common FCM data parameters, which give information on biological as well as non-biological groups of particles (i.e. standard fluorescent microspheres used as an internal standard for quantitative and qualitative comparisons), have been mapped to the BODC semantic model.

Based on the literature review, the most common groups defined by flow cytometry in marine waters have been identified as Prochlorococcus, Synechococcus, Eukaryote Picophytoplankton, Eukaryote Nanophytoplankton, Cryptophytes, Coccolithophores, Microphytoplankton and Heterotrophic prokaryotes. Standard beads (non-biological entities) are also considered a separate group, because they are routinely included in every FCM analysis for quality control (monitoring instrument performance) and as an internal standard for standardising the results (signal intensities).

These clusters and their definitions are managed in a separate vocabulary collection recently created by the BODC. The list entitled “SeaDataCloud Flow Cytometry Standardised Cluster Names (F02)” is also available from the SeaDataNet vocabulary list: http://seadatanet.maris2.nl/v_bodc_vocab_v2/welcome.asp.

To update these vocabulary codes with broad consensus among FCM users, a questionnaire has been created and sent to more than 150 FCM users worldwide. It covers four main parts: FCM group names and definitions, FCM metadata, sample metadata and FCM data. Its results will help mature the standard FCM vocabulary. Future work will consist of integrating specific data from the wide variety of benchtop flow cytometers. Images, which only some flow cytometers collect, will follow an already established route for taxonomic collection and management (automated image recognition tools and established diversity databases).

We hope that this collective effort to identify and define these standards will help improve the storage and use of FCM datasets in international, interoperable databases.

SeaDataNet VRE, developing a Virtual Research Environment

The Virtual Research Environment (VRE) under development within SeaDataCloud is an online environment that enables collaborative and individual research on marine datasets. It includes the following basic features:

  1. Users should be able to handle, analyse and process ocean and marine data into value-added data products, which can be integrated, visualised and published using high-level visualisation services.
  2. Users should be able to combine data with subsets from other data resources, such as ingested collections.
  3. The VRE should have a high capacity and performance for big data processing and state-of-the-art web visualisation services.
  4. It must respect users' privacy and differences in data policies, with differentiated users having different levels of access to data and data products.
  5. It should be possible to configure virtual workspaces for individuals or groups working on specific projects, including setting up dedicated data pools.
  6. The VRE should allow producers to decide whether their outcomes will be shared in the public domain or stay private.
  7. The SDC VRE will be based and hosted on EUDAT’s infrastructure.
The work in SeaDataCloud first focused on a clear specification of the VRE and on creating a common understanding among the partners involved. The specification includes an analysis of existing VREs, performed in synergy with the Ocean Data Interoperability Platform (ODIP) project, in which VREs from Australia and the USA were presented and discussed in detail alongside European VREs. It also includes an overview of the use cases targeted during development, the requirements of these use cases, a high-level architecture of the system (see figure below) and a description of the main components to be developed to run the VRE.

Figure: High-level architecture of the main VRE components

The VRE will allow users to work in groups or individually with services and data available in the cloud. The services will be integrated in the “VRE portal”, the front-end layer that gives users access to the services and data that are part of the VRE. The portal offers user interfaces in various “flavours”, from highly technical/scientific ones (Jupyter Notebooks, virtual labs) to dedicated interfaces for less technical users.

The EUDAT HTTP-API is an important part of the service layer: all calls from the interfaces to services and data can go through it in a standardised way. The HTTP-API can be seen as the glue between the VRE portal and the EUDAT services B2SAFE and B2DROP, where the data reside. More generally, it will connect the data to all services and processes running in the EUDAT B2HOST service (when WebDAV is not used). The services will include “service functions” of well-known SeaDataNet applications such as Ocean Data View (ODV) and DIVA, as well as a set of transformation, quality control and visualisation services for working with the datasets.
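The “glue” role of a single API layer can be illustrated with a small routing sketch. It assumes two request categories (data and service) and a simple per-user access list; the class and method names are hypothetical and this is not the EUDAT HTTP-API itself. Only the back-end names (B2SAFE/B2DROP for data, B2HOST for services) come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    user: str
    kind: str    # assumed categories: "data" or "service"
    target: str  # dataset name or service name, e.g. "DIVA"


@dataclass
class GatewaySketch:
    """Hypothetical single entry point for all portal calls.

    Data requests go to storage (B2SAFE/B2DROP), processing requests to
    B2HOST. The access-control model is an assumption, reflecting the
    requirement for differentiated access to data and products.
    """
    # user -> set of datasets that user may read (assumed access model)
    acl: dict = field(default_factory=dict)
    # audit trail of (user, kind, target, backend) for every routed call
    log: list = field(default_factory=list)

    def route(self, req: Request) -> str:
        if req.kind == "data":
            if req.target not in self.acl.get(req.user, set()):
                raise PermissionError(f"{req.user} may not access {req.target}")
            backend = "B2SAFE/B2DROP"
        elif req.kind == "service":
            backend = "B2HOST"
        else:
            raise ValueError(f"unknown request kind: {req.kind!r}")
        self.log.append((req.user, req.kind, req.target, backend))
        return backend
```

Routing every call through one object keeps access checks and logging in a single place, which is the main argument for a standardised API layer between many front-ends and several back-ends.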

The aim is to deliver, by the end of 2018, a first version of the VRE supporting the SeaDataCloud product group that generates the SeaDataNet Temperature and Salinity Climatology. Many parallel actions have started since the technical development phase kicked off in January 2018. The VRE team works in four-month development cycles, each ending in a code sprint and the demonstration of an actual product. The first cycle focuses on the VRE foundation and began by identifying the main components needed to create a first “VRE product”, e.g. a design of the VRE GUI, a service running in a Docker container that accesses a user dataset, and the visualisation of an output dataset. All development takes place within the EUDAT cloud infrastructure. The SeaDataCloud data product working group will be closely involved in the development by communicating its requirements and through testing and feedback loops.

Use case: how can SeaDataNet output (CDI + ODV) be transformed into INSPIRE-compliant datasets to support MSFD implementation

SeaDataCloud strives for full INSPIRE compliance and plans to deploy central transformation services for converting SeaDataNet data sets (CDI and ODV/NetCDF formats) into the relevant INSPIRE application schemas, depending on the type of data. To this end, BODC and SYKE have analysed the feasibility of transforming SeaDataNet formats into INSPIRE data standards, following the INSPIRE data implementation rules, with a positive outcome.

The INSPIRE Directive aims to create a European Union (EU) spatial data infrastructure to enable the sharing of environmental spatial information among public sector organisations and better facilitate public access to this data across Europe. Implementation of the INSPIRE Directive is based on harmonised common data models and standardised ways to share the data. Of primary concern for SeaDataCloud are the INSPIRE Themes ‘Environmental Monitoring Facilities (EF)’ and ‘Oceanographic Features (OF)’, which have both been defined based upon the OGC Observations & Measurements (O&M) model. In addition to the EF and OF data specifications, a SeaDataCloud technical guideline document has been composed detailing the requirements for the sharing of observations and measurements data.

EMODnet Chemistry has adopted the SeaDataNet standards and services as its backbone for gathering marine chemistry datasets and providing discovery of and access to them. EMODnet Chemistry aims to generate and provide data products and services that are fit for purpose for supporting the implementation of the Marine Strategy Framework Directive (MSFD).

The MSFD (Directive 2008/56/EC) defines obligations for Member States (MS) to implement strategies for achieving or maintaining good environmental status (GES) in the marine environment. One of these obligations, described in Article 19(3), prescribes that MS shall make the data resulting from Articles 8 and 11 available in accordance with Directive 2007/2/EC (INSPIRE). In this context, the Technical Group on Marine Data (TG-DATA), formed in 2012, has taken action to improve the implementation of MSFD Article 19(3) and has provided recommendations for publishing datasets under it. These guidelines propose examples and best practices.

EMODnet Chemistry participates in TG-DATA and was asked to work out, together with the MEDCIS project, an INSPIRE use case for nutrient data in the Mediterranean Sea, concerning MSFD Criterion D5C1 “Nutrients concentrations in water”. The use case successfully applied the results of the SeaDataCloud transformation analysis. Test data were provided by IOF (Croatia), a partner in both SeaDataCloud and EMODnet Chemistry, with metadata in the SeaDataNet CDI format and data in the SeaDataNet ODV format. The solution developed and proposed in the SeaDataCloud project for delivering data in an INSPIRE-compliant way was adopted and adapted.

The classes used in this work are:

  • Environmental Monitoring Facility (EMF);
  • Feature of Interest (FoI);
  • Procedure (Proc) and
  • Observed Property (Obs).
The resulting mapping between SeaDataCloud formats and INSPIRE elements can be found at:
http://nodc.ogs.trieste.it/INSPIRE_compliant/INSPIREmatching_MEDCIS.xlsx

This mapping has been developed using the matching tables for the EF theme, as improved by SeaDataCloud and uploaded to the INSPIRE Thematic Clusters platform:
https://themes.jrc.ec.europa.eu/file/view/170503/inspire-ef-matching-table

A complete set of XML files can be downloaded from the following link:
http://nodc.ogs.trieste.it/INSPIRE_compliant

where an example of nutrient data acquired in the Mediterranean is encoded according to INSPIRE standards.
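To show how one record is distributed over the four classes used in the use case, here is a minimal sketch that splits CDI metadata and a single ODV data row into EMF, FoI, Procedure and Observed Property objects. All dictionary keys and field names are hypothetical, and treating the FoI as the sampled point is an assumption made for simplicity; the authoritative field-level mapping is the MEDCIS matching table linked above, and the real output is INSPIRE XML rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class EnvironmentalMonitoringFacility:  # EMF: station or platform
    identifier: str
    name: str


@dataclass
class FeatureOfInterest:  # FoI: here assumed to be the sampled point
    latitude: float
    longitude: float
    depth_m: float


@dataclass
class Procedure:  # Proc: how the value was obtained
    method: str


@dataclass
class ObservedProperty:  # Obs: what was measured, and in which unit
    parameter_code: str
    unit: str


@dataclass
class Observation:
    facility: EnvironmentalMonitoringFacility
    feature: FeatureOfInterest
    procedure: Procedure
    observed_property: ObservedProperty
    result: float


def to_observation(cdi: dict, row: dict) -> Observation:
    """Split CDI metadata and one ODV data row across the four classes.

    All dictionary keys are hypothetical placeholders.
    """
    return Observation(
        facility=EnvironmentalMonitoringFacility(cdi["station_id"], cdi["station_name"]),
        feature=FeatureOfInterest(row["lat"], row["lon"], row["depth_m"]),
        procedure=Procedure(cdi["method"]),
        observed_property=ObservedProperty(row["parameter"], row["unit"]),
        result=row["value"],
    )
```

The split follows the O&M pattern: station-level metadata (facility, method) comes from the CDI record, while position, parameter, unit and value come from each ODV data row.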

The exercise demonstrates the completeness of SeaDataNet / EMODnet Chemistry metadata with respect to INSPIRE requirements and the feasibility of mapping SeaDataNet / EMODnet Chemistry data to INSPIRE models. It also demonstrates that the EMODnet Chemistry platform, powered by SeaDataNet, could be used by Member States to expose monitoring data following Article 19(3), i.e. in compliance with INSPIRE, through a centralised format-conversion service that is still to be developed. This use case, as reported to TG-DATA, has major potential to pave the way for Member States to adopt SeaDataNet standards for part of their monitoring data and to use EMODnet Chemistry as a distribution platform providing references to their data in INSPIRE format for the MSFD, without having to undertake extra INSPIRE efforts themselves.

SeaDataCloud Training Workshops for uptake of upgraded CDI service

Currently more than 110 Data Centres from 34 countries around European seas are connected to the SeaDataNet infrastructure and make (part of) their marine data resources discoverable and accessible through the SeaDataNet CDI Data Discovery and Access service. This important service serves several EMODnet thematic portals (chemistry, bathymetry, physics, biology, geology) and various EU and international projects and related portals (e.g. GEOSS and Ocean Data Portal). CDI references are also included in several established and published data products, such as the SeaDataNet Temperature & Salinity Climatology and the EMODnet Bathymetry Digital Terrain Model, which are widely used and referenced by science, government and industry. In this way, the content of the SeaDataNet CDI service reaches a large user community, benefiting from the distributed networks and promotion efforts of each of the associated projects, portals and data products.

Good progress is being made in the SeaDataCloud project with upgrading the CDI service. The upgrade involves adopting a central cloud, hosted by EUDAT, that functions as a data cache for unrestricted data from the connected Data Centres. Exchange will take place by dynamic replication, for which the currently installed local Download Manager will be replaced by a Replication Manager tool. A new CDI data import and quality control process, the Import Manager, will be made available to Data Centres to manage, validate and oversee their new and updated submissions. The locally installed Replication Manager software will interact with the Import Manager process and the EUDAT cloud. This new set-up will considerably improve the overall functioning and consistency of the input side of the CDI service. Moreover, the discovery and shopping interfaces and related processes are being upgraded to make the service easier to use, with much improved performance.
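The replication logic described above can be sketched as a single pass over a Data Centre's submissions. This is a minimal illustration under stated assumptions: the `Submission` fields, the integer version number and the rule that restricted data stay at the Data Centre are hypothetical simplifications, not the actual behaviour of the Replication Manager or Import Manager.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    cdi_id: str
    version: int      # assumed: higher number means a newer update
    restricted: bool
    passed_qc: bool   # outcome of the import/QC step (assumed boolean)


def replicate(submissions, cloud_cache: dict):
    """Hypothetical replication pass for one Data Centre.

    Returns (replicated, kept_local, rejected) lists of CDI ids.
    - entries failing QC are rejected and go back to the submitter;
    - restricted entries are assumed to stay at the Data Centre;
    - unrestricted entries that are new, or newer than the cached copy,
      are written to the central cloud cache;
    - unrestricted entries whose cached copy is already current are skipped.
    """
    replicated, kept_local, rejected = [], [], []
    for sub in submissions:
        if not sub.passed_qc:
            rejected.append(sub.cdi_id)
        elif sub.restricted:
            kept_local.append(sub.cdi_id)
        else:
            cached = cloud_cache.get(sub.cdi_id)
            if cached is None or sub.version > cached.version:
                cloud_cache[sub.cdi_id] = sub
                replicated.append(sub.cdi_id)
    return replicated, kept_local, rejected
```

Comparing versions before copying means that repeated runs only transfer new or updated entries, which is what makes replication "dynamic" rather than a full re-upload each time.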

The new service is planned to be tested at the end of April 2018 and finalised during May 2018. Thereafter, an implementation period will start in which the new set-up is deployed at all connected Data Centres and the current set-up is phased out, while the overall CDI service remains operational throughout.

To kick off the implementation, a Training Workshop is being organised to inform technicians and data managers at the connected Data Centres about the new set-up and the new procedures for updating and submitting data entries, and to give instructions on how to install, configure and test the Replication Manager.

Given the large number of connected Data Centres, two similar sessions of the Training Workshop are planned, both at the IOC Project Office for IODE in Ostend (Belgium), on:

  • 20 – 22 June 2018
  • 25 – 27 June 2018
Technicians and/or data managers working at connected Data Centres are kindly invited to attend the session of their choice. The SeaDataCloud project will fund all travel, accommodation and subsistence costs for all invited data centres (note: SeaDataCloud partners have a provision in their budget and their participation is mandatory, while non-SeaDataCloud organisations will be reimbursed). By attending the workshop, you will learn about the upgrade and the new developments of cutting-edge SeaDataNet tools, at no cost to you. The target audience is the staff who currently maintain the operational SeaDataNet services (CDIs, Download Manager, data deliveries). Note: the Training Workshop is not a “SeaDataNet for beginners” session.

Invited colleagues should have received a pre-registration form asking them to choose one of the two sessions. The logistical organisation of the Training Workshops, including reimbursement, is handled by RBINS. If you represent a connected Data Centre and have questions, or did not receive the earlier invitation, please email the RBINS contact.

We strongly encourage technicians and/or data managers working at connected Data Centres that are not SeaDataCloud partners to participate in this Training Workshop, as it is a unique opportunity to learn more about the upgrade and about the important function of the CDI service for many applications. It will provide key information about the local upgrade that is expected in the second half of 2018. Moreover, you will meet many other colleagues involved in marine data management and related projects, which should lead to very useful discussions. Finally, we emphasise that your travel, accommodation and subsistence will be funded by the SeaDataCloud project. We look forward to meeting you again.


Remark: To learn more about how to prepare and check CDI metadata and SeaDataNet data files (ODV and NetCDF), a useful catalogue of all technical documentation of the current SeaDataNet tools (including user manuals, videos and presentations from previous training workshops) has been set up and is published on the SeaDataCloud website.

SeaDataCloud and EMODnet abstracts for the INSPIRE Conference, 18-21 September 2018, Antwerp - Belgium

The motto of the 2018 edition of the INSPIRE Conference is “INSPIRE users: Make it work together!”. The call for abstracts closes on 15 April 2018 and is organised in three strands:

INSPIRE the Users: In this strand special attention goes to inspiring the usage and users of INSPIRE. Both public and private parties are invited to share their experiences with creating applications that benefit from INSPIRE.

Doing it Together: This strand is dedicated to 'cooperation'. We are looking for examples where the cooperation between public and/or private sector organisations and programmes has been successfully developed to support the implementation of INSPIRE.

Making it work: This strand addresses a more technical approach to ‘make it work’. We are calling for submissions where implementation issues have been identified and where the source of the problems has been, or needs to be, addressed (e.g. encoding simplifications, automation, quality assessment).

SeaDataCloud and EMODnet will submit a number of abstracts, including at least:

  • SeaDataCloud and ODIP: SWE activities for the marine domain
  • EMODnet Chemistry: use case of mapping nutrient data sets in SeaDataNet format (CDI + ODV) to the INSPIRE data model following the SeaDataCloud analysis.
SeaDataCloud partners are requested to consider submitting additional abstracts so that the marine data community is well represented.

SeaDataCloud Workshop at EUDAT Conference, 23 January 2018, Porto - Portugal

The EUDAT conference “Putting the EOSC vision into practice”, held in Porto (Portugal), focused on sharing and preserving research data across disciplines and borders. It brought together data infrastructure users and providers and policy makers from across Europe:

  • To showcase the latest trends in data infrastructure and data management solutions for research;
  • To demo the solutions developed by EUDAT to address researchers and research communities’ data management needs, through concrete pilots;
  • To inform about the concrete opportunities offered by the newly established EUDAT Collaborative Data Infrastructure (CDI) to the service providers and research communities;
  • To discuss the progress of the European Open Science Cloud (EOSC) and the European Data Infrastructure (EDI) and the contribution of EUDAT as well as other infrastructures to these two initiatives.
The main outcome of the last item has been summarized in a document which you can find at:
https://eudat.eu/news/charting-the-course-towards-a-concrete-european-open-science-cloud-outcomes-of-the-eudat

During the EUDAT Conference, the SeaDataCloud project was given the opportunity to organise a half-day SeaDataCloud Workshop, which gathered circa 30 participants. Colleagues from MARIS, Marine Institute, 52North, SYKE, BODC, INGV, AWI, ULiege, OGS and DKRZ gave a series of presentations, divided over two sessions with room for discussion:
  • Current status of the SeaDataNet infrastructure and activities for upgrading standards and services.
  • How to derive data products from the large data resources and new opportunities provided by the cloud.
The programme and presentations can be found at the SeaDataNet portal at: https://www.seadatanet.org/Events/Others/EUDAT-SeaDataCloud-workshop

The presentations gave a good overview of a number of activities underway for improving and expanding the services of the SeaDataNet infrastructure for marine and ocean data management. Presenters made a good effort to explain the background, the marine context and the reasons for the planned upgrade of services, as well as the current status of developments. They also highlighted where cooperation and synergy with EUDAT and its services are planned and under development.

Towards a Blue Cloud as part of the European Open Science Cloud

The European Open Science Cloud (EOSC) is an initiative launched by the European Commission in 2016, as part of the European Cloud Initiative. The objective of the EOSC, as set out by the EC Communication published in April 2016, is to provide a virtual environment with free at the point of use, open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines, leveraging and federating the existing data infrastructures. In short the aim of EOSC is to create a global, federated environment for managing and sharing scientific data, for science.

The EC has started to fund a number of projects that are expected to support the development of the EOSC. The EOSC-Pilot, which brings together a large consortium of 48 organisations (33 beneficiaries and 15 Third Parties), kicked off in January 2017 and will run until the end of 2018. Its main objectives are to propose and trial the governance framework for the EOSC and to develop a number of demonstrators functioning as high-profile pilots that integrate services and infrastructures to show interoperability and its benefits in a number of scientific domains. Another important project, which started in January 2018, is EOSC-Hub. It leverages the EGI and EUDAT infrastructures and consortia and includes more than 100 participants, with a total budget of €30M over three years. EOSC-Hub essentially deals with setting up the technical infrastructure, in particular from the building blocks provided by EGI, EUDAT and INDIGO-DataCloud.

Further projects are expected to be funded as part of the Research Infrastructures (including e-Infrastructures) work programme 2018-19 to consolidate these initial activities, and ensure the integration of research infrastructures and commercial service providers to the EOSC.

EOSC offers a great opportunity for the SeaDataCloud infrastructure and its network of partners to promote their expertise and science. SDC actors are already involved in the two EOSC-related initiatives, in particular through EUDAT partners in both projects and through the development of a pilot Marine Competence Centre in EOSC-Hub, involving IFREMER, MARIS, ULiege, CSC and CINECA as SDC partners. Through this engagement, SeaDataNet has the opportunity to influence the overall EOSC agenda by proposing solutions and technologies that have been tested and have shown their benefits in a real, cross-border environment. These can be re-used and serve as a basis for the development of a thematic cloud focusing on ocean and marine data, commonly referred to as the Blue Cloud.

EMODnet Data Ingestion: Wake up your data!

The EMODnet Data Ingestion portal was launched in February 2017. It aims to reach out to organisations from the research, public and private sectors that hold marine datasets and are not yet connected to, or contributing to, the existing marine data management infrastructures. Through a combination of central and national marketing activities, potential data providers are identified, encouraged, motivated and supported to release their datasets through the EMODnet Ingestion portal. The portal provides services that help data holders submit their marine data sets for validation, safeguarding and publishing by qualified data centres, and for subsequent distribution through European marine data infrastructures.

Figure: Homepage (partly) of EMODnet Ingestion

The Submission Service facilitates the submission of data files. Two phases are distinguished in the life cycle of a data submission:

  • Phase I:  from data submission to publishing ‘as is’  
  • Phase II: further elaboration and integration (of subsets) in national, European and EMODnet thematic portals.
Figure: Data submission workflow

The submission workflow is illustrated above and has the following steps:
  • Step 1: The data submitter (possibly helped by an EMODnet ‘ambassador’) completes a number of key fields of the submission form and uploads a zip file with the datasets and related documentation;
  • Step 2: A Data Centre, assigned by data theme and country, reviews and completes the submission form for publishing ‘as is’ in the Summary Service;
  • Step 3: Where possible, the Data Centre further elaborates the data sets, making them available in standard formats in its own data centre portal, in European portals such as SeaDataNet and EurOBIS, and in the thematic EMODnet portals.
Figure: Screen of the Submission Service for completing the submission form
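The three-step workflow can be modelled as a small forward-only state machine. In this sketch the status names, the `(theme, country)` registry and the function names are hypothetical; they only illustrate the ordering of steps and the theme/country assignment rule described above, not the portal's internal implementation.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"              # Step 1: form fields filled, zip uploaded
    PUBLISHED_AS_IS = "published as is"  # Step 2: Data Centre completes form (end of Phase I)
    ELABORATED = "elaborated"            # Step 3: standard formats in portals (Phase II)


# Forward-only transitions. Step 3 happens only "where possible", so a
# submission may legitimately remain PUBLISHED_AS_IS.
NEXT = {
    Status.SUBMITTED: Status.PUBLISHED_AS_IS,
    Status.PUBLISHED_AS_IS: Status.ELABORATED,
}


def assign_data_centre(theme: str, country: str, registry: dict) -> str:
    """Step 2: pick the reviewing Data Centre by (theme, country).

    `registry` is a hypothetical lookup table; the portal's actual
    assignment rules are not specified in the text.
    """
    return registry[(theme, country)]


def advance(status: Status) -> Status:
    """Move a submission to its next workflow status."""
    if status not in NEXT:
        raise ValueError(f"{status.name} is a final status")
    return NEXT[status]
```

Keeping the transitions in one table makes it impossible to skip from submission straight to elaboration, which matches the rule that data are first published ‘as is’ before any further integration.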

The EMODnet network for validating and processing data submissions is recruited from the EMODnet Ingestion and EMODnet thematic portal consortia and at present comprises circa 50 qualified data centres for marine chemistry, physics, geology, bathymetry, biology, seabed habitats, and human activities data.

The marketing and promotion of EMODnet Ingestion is done through a number of central activities, such as promotion on the EMODnet central portal and each of the thematic portals, at conferences like EGU, AGU, Oceanology International, European Maritime Days and IMDIS, and at events like EMODnet Hackathons, EU project meetings and various workshops. National marketing is undertaken by members of the EMODnet Ingestion and thematic consortia, who act as EMODnet ambassadors to identify, motivate and support potential data providers to submit their marine datasets. This is done by using local networks of contacts, organising a national EMODnet day for relations, involving other institute departments, presenting at project meetings, etc., supported by a promotion kit.

EMODnet generic products and services are increasingly discovered and taken up by research, government and industry as useful input for their scientific, management and economic activities. This is an important argument for convincing data providers: their data submissions can improve the coverage and quality of the EMODnet products, which benefits the providers themselves as potential users of these products. The argument is all the stronger given that providers of a specific data type are likely to have a real interest in services and products for that data type; otherwise, why would they collect or hold such data? Moreover, EMODnet Ingestion acknowledges data submitters and data originators in its publishing service and, for scientific purposes, can also include DOIs for data citation.

Marketing and promotion are gaining momentum: by the end of March 2018, more than 100 dataset submissions had been completed and published ‘as is’ on the Ingestion portal, a number of which have also already been elaborated and incorporated into SeaDataNet and EMODnet.

Figure: Statistics of submissions per theme end March 2018

The majority of these submissions come from scientific institutes, while 20 come from industry and relate to offshore developments such as surveys for new pipelines, harbours and wind farms.

The scope of EMODnet Ingestion covers not only historical datasets, but also operational oceanography data collected by fixed and moving platforms such as fixed stations, moorings, buoys, tide gauges, surface drifters, FerryBoxes, Argo floats, gliders, HF radars and other platforms. The portal explains how operators of such platforms can get connected and provides guidance. This includes joining the European operational oceanography NRT data exchange operated by CMEMS-INSTAC (Copernicus) and the EuroGOOS ROOSs, and archiving validated long time series in SeaDataNet. Their datasets then also become available on the EMODnet Physics portal, which is driven by these three pillars (CMEMS-INSTAC, EuroGOOS and SeaDataNet). Several new operators and stations have already been added through the intermediation of EMODnet Ingestion ambassadors and their activities.

Wake up your data! Set them free for Blue Society! Make use of the EMODnet Ingestion portal.

Acronyms as used in this Newsletter

This newsletter contains many acronyms which are described in the following list:  

API: Application Programming Interface
CDI: Common Data Index
CF: Climate and Forecast
CMEMS: Copernicus Marine Environment Monitoring Service
CSR: Cruise Summary Reports
CSV: Comma Separated Values
CS-W: Catalogue Service for the Web
DIVA: Data-Interpolating Variational Analysis software
DOI: Digital Object Identifier
DTM: Digital Terrain Model
EDIOS: European Directory of Oceanographic Observing Systems
EDMED: European Directory of Marine Environmental Data
EDMERP: European Directory of Marine Environmental Research Projects
EDMO: European Directory of Marine Organisations
EMODnet: European Marine Observation and Data Network
EOSC: European Open Science Cloud
EuroGOOS: European Global Ocean Observing System
GEBCO: General Bathymetric Chart of the Oceans
GEOSS: Global Earth Observation System of Systems
HPC: High Performance Computing
ICES: International Council for the Exploration of the Sea
ICT: Information and Communication Technologies
IMDIS: International Conference on Marine Data and Information Systems
IOC: Intergovernmental Oceanographic Commission
IODE: International Oceanographic Data and Information Exchange
ISO: International Organization for Standardization
JSON: JavaScript Object Notation
MSFD: Marine Strategy Framework Directive
NetCDF: Network Common Data Form
NODC: National Oceanographic Data Centre
NRT: Near Real Time
NVS: NERC Vocabulary Server
OBIS: Ocean Biogeographic Information System
ODIP: Ocean Data Interoperability Platform
ODV: Ocean Data View software
ODSBP: Ocean Data Standards and Best Practices project
OGC: Open Geospatial Consortium
O&M: Observations & Measurements
OWL: Web Ontology Language
QA: Quality Assurance
QC: Quality Control
RDA: Research Data Alliance
RDF: Resource Description Framework
RSM: Request Status Manager
RTD: Research and Technological Development
SDB: Satellite Derived Bathymetry
SDC: SeaDataCloud
SDN: SeaDataNet
SWE: OGC Sensor Web Enablement
URL: Uniform Resource Locator
VRE: Virtual Research Environment
W3C: World Wide Web Consortium
WCS: Web Coverage Service
WFS: Web Feature Service
WMS: Web Map Service
XML: Extensible Markup Language