Zenodo is hosted by CERN, which has existed since 1954 and currently has an experimental programme defined for the next 20+ years. CERN is a memory institution for High Energy Physics and is renowned for its pioneering work in Open Access. Organisationally, Zenodo is embedded in the IT Department, Collaboration Devices and Applications Group, Digital Repositories Section (IT-CDA-DR).
Zenodo is offered by CERN as part of its mission to make available the results of its work (CERN Convention, Article II, §1).
Zenodo is funded by:
Zenodo is developed and supported as a marginal activity, and hosted on top of existing infrastructure and services at CERN, in order to reduce operational costs and rely on existing efforts for High Energy Physics. CERN has some of the world’s top experts in running large-scale research data infrastructures and digital repositories, and we rely on them to deliver a trusted digital repository.
Zenodo is operated currently by:
Zenodo is, however, embedded in a much larger team, headed by Jose Benito Gonzalez Lopez, which runs services such as CERN Document Server, CERN Open Data and CERN Analysis Preservation. We rely heavily on co-developing features via the Invenio digital library framework.
CERN is an active member of the following organisations and international bodies (non-exhaustive):
We are partners in multiple European Commission funded projects, amongst others:
Zenodo servers are managed via OpenStack and the Puppet configuration management system, which ensures that our servers always have the latest security patches applied. Servers are monitored via CERN’s monitoring infrastructure, which is based on Flume, Elasticsearch, Kibana and Hadoop. Application errors are logged and aggregated in a local Sentry instance. Traffic to Zenodo frontend servers is load balanced via a combination of DNS load balancing and HAProxy load balancers.
We are furthermore running two independent systems: one production system and one quality assurance system. This ensures that all changes, whether at infrastructure level or source code level, can be tested and validated on our quality assurance system prior to being applied to our production system.
Zenodo frontend servers are responsible for running the Invenio repository platform application, which is based on Python and the Flask web development framework. The frontend servers run the nginx HTTP server and the uWSGI application server in front of the application; nginx is also in charge of serving static content.
All files uploaded to Zenodo are stored in CERN’s EOS service in an 18-petabyte disk cluster. Each file copy has two replicas located on different disk servers.
For each file we store two independent MD5 checksums. One checksum is stored by Invenio, and used to detect changes to files made from outside of Invenio. The other checksum is stored by EOS, and used for automatic detection and recovery of file corruption on disks.
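The application-side check described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not Invenio's or EOS's actual code; the function names `md5_of_file` and `verify_file` and the chunk size are assumptions made for the example.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Hypothetical helper for illustration; chunked reading keeps memory use
    constant even for very large files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, recorded_checksum):
    """Return True if the file on disk still matches the recorded checksum."""
    return md5_of_file(path) == recorded_checksum
```

A periodic job could call `verify_file` with the checksum recorded at upload time; a `False` result would indicate that the file was changed outside the application.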
Zenodo may, depending on access patterns in the future, move the archival and/or the online copy to CERN’s offline long-term tape storage system CASTOR in order to minimize long-term storage costs.
EOS is the primary low-latency storage infrastructure for physics data from the Large Hadron Collider (LHC), and CERN currently operates multiple instances totalling 150+ petabytes of data with expected growth rates of 30-50 petabytes per year. CERN’s CASTOR system currently manages 100+ petabytes of LHC data, which are regularly checked for data corruption.
Invenio provides an object-store-like file management layer on top of EOS, which is in charge of, for example, version changes to files.
Metadata and persistent identifiers in Zenodo are stored in a PostgreSQL instance operated on CERN’s Database on Demand infrastructure, with a 12-hourly backup cycle and one backup sent to tape storage once a week. Metadata is in addition indexed in an Elasticsearch cluster for fast and powerful searching. Metadata is stored in JSON format in PostgreSQL in a structure described by versioned JSONSchemas. All changes to metadata records on Zenodo are versioned and happen inside database transactions.
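To illustrate the idea of versioned JSON metadata written inside database transactions, here is a minimal sketch. It deliberately swaps in Python's built-in `sqlite3` module for PostgreSQL so that it runs anywhere; the table name `record_versions`, the helper functions and the `$schema` URL are all hypothetical and do not reflect Zenodo's or Invenio's real schema.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory store; sqlite3 stands in for PostgreSQL here."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record_versions ("
        "record_id TEXT NOT NULL, version INTEGER NOT NULL, "
        "metadata TEXT NOT NULL, PRIMARY KEY (record_id, version))"
    )
    return conn


def save_version(conn, record_id, metadata):
    """Append a new version of a record; earlier versions are never overwritten."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM record_versions "
            "WHERE record_id = ?",
            (record_id,),
        ).fetchone()
        version = row[0] + 1
        conn.execute(
            "INSERT INTO record_versions VALUES (?, ?, ?)",
            (record_id, version, json.dumps(metadata, sort_keys=True)),
        )
    return version


def load_version(conn, record_id, version):
    """Return the metadata stored for one version, or None if it does not exist."""
    row = conn.execute(
        "SELECT metadata FROM record_versions WHERE record_id = ? AND version = ?",
        (record_id, version),
    ).fetchone()
    return json.loads(row[0]) if row else None


if __name__ == "__main__":
    conn = open_store()
    # The "$schema" value is a made-up placeholder for a versioned JSONSchema.
    schema = "https://example.org/schemas/record-v1.0.0.json"
    save_version(conn, "rec-1", {"$schema": schema, "title": "Draft title"})
    save_version(conn, "rec-1", {"$schema": schema, "title": "Final title"})
    print(load_version(conn, "rec-1", 1)["title"])
    print(load_version(conn, "rec-1", 2)["title"])
```

Storing each change as a new row keeps the full history of a record available, and wrapping the read of the latest version number and the insert in one transaction means a failed write leaves no partial version behind.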
In addition to the metadata and data storage, Zenodo relies on Redis for caching, and on RabbitMQ and the Python library Celery for distributed background jobs.
We take security very seriously and do our best to protect your data.
Special note on closed access data
Zenodo allows users to upload files under closed access. Closed access means that zenodo.org users will not be able to access the files you uploaded. The files are however stored unencrypted and may be viewed by Zenodo operational staff under specific conditions. This means that “closed access” on Zenodo is not suitable for secret or confidential data.