The main goal of the project is to develop libraries and software tools based on the MPEG-G standard for genomic data compression, transport and access. The performance of existing genome analysis platforms will be improved by a factor of ten to one hundred in terms of processing speed and data access speed, and new functionality will be supported. Processing and storage platforms based on MPEG-G technology will substantially reduce the costs of data storage and transfer, costs that are rising dramatically because of the continued use of ill-suited legacy technologies.
The main result of the project will be the deployment of the first genome analysis pipeline able to manipulate genome sequencing data compressed in compliance with MPEG-G, the new ISO standard for genomic information representation. This will enable functionality such as genomic data streaming, advanced selective access and controlled access for privacy protection, none of which is possible today with existing formats. Genome analysts will gain a wider range of functionality for accessing and manipulating genomic data stored both locally and remotely, and at the same time will experience faster responses to database queries and shorter data processing times.
MPEG and GENOMICS
MEGAPIPELINE will develop the first software able to compress and transport genomic data encoded in compliance with ISO/IEC 23092 (MPEG-G, http://mpeg-g.org, https://mpeg.chiariglione.org/standards/mpeg-g).
In this respect the product is completely new: a breakthrough in terms of compression performance, data access speed, transport capability and new “digital media-like” functionality. MPEG-G brings to genomics the same kind of innovation that MPEG-2, MP3 and JPEG brought to TV, audio and still pictures.
The developed solutions will reach the market right after the publication of the standard in mid-2019. The first prototypes for field trials might be available even earlier.
The new MPEG-G libraries will be integrated into genome analysis pipelines already deployed at The Pirbright Institute, Imegen and other partners’ premises.
MPEG-G provides a normative specification for transcoding a file into an equivalent packetized representation that can be transferred over a packet network, thereby supporting genomic data streaming.
The type of streaming supported by MPEG-G ensures that, at any point in time, the partially transferred file stored on disk is consistent with the syntax specification and can therefore be parsed and processed. This enables use cases in which genomic data analysis starts at a remote location even before the sequencing run has been completed.
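The streaming property described above can be illustrated with a minimal Python sketch. It does not implement the MPEG-G syntax: the 4-byte length prefix, the function names (`packetize`, `parse_complete_units`, `simulate_transfer`) and the sample payloads are all assumptions made for illustration only. The sketch shows the general idea that, when data arrives as self-delimiting packets, a receiver can parse whatever has been stored so far without waiting for the end of the transfer.

```python
import io
import struct

# Hypothetical framing: a 4-byte big-endian length followed by the payload.
# This is NOT the MPEG-G packet syntax, only a stand-in for illustration.
HEADER = struct.Struct(">I")


def packetize(units):
    """Turn each unit (a bytes object) into a self-delimiting packet."""
    for payload in units:
        yield HEADER.pack(len(payload)) + payload


def parse_complete_units(stream):
    """Return every unit fully present in the stream, ignoring a truncated tail."""
    units = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # no further complete header available yet
        (length,) = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            break  # payload still in transit
        units.append(payload)
    return units


def simulate_transfer(units):
    """Append packets one by one to an in-memory file and parse after each arrival."""
    received = io.BytesIO()
    snapshots = []
    for packet in packetize(units):
        received.seek(0, io.SEEK_END)  # append the newly arrived packet
        received.write(packet)
        received.seek(0)  # re-read the partially transferred file from the start
        snapshots.append(parse_complete_units(received))
    return snapshots


if __name__ == "__main__":
    for step, parsed in enumerate(simulate_transfer([b"ACGT", b"TTGA", b"GG"]), 1):
        print(f"after packet {step}: {len(parsed)} unit(s) available for analysis")
```

In this sketch an analysis step could begin as soon as the first unit is available, which mirrors the use case of starting remote analysis before the sequencing run has finished.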