MedFormat Open Source Library

The MED Open Source Library includes fully functional C code to create, open, read, and write MED compressed data files and structures on Linux and macOS. The MED format is fully documented.

For access to open source code and documentation:

The Multiscale Electrophysiology Data (MED) Format is an open source data format developed to manage big data in electrophysiology and facilitate data sharing.



Projects Using MED Format:



To add your project to this list, contact us.


Developers:

If you are interested in contributing to the evolution of MED, or sharing your own MED-based source code, please contact us at contact@medformat.org. You will receive proper attribution and we will include links to your software here at your discretion.


Users:

If you would like to suggest future features or report bugs, please contact us at contact@medformat.org.




Modern electrophysiology generates enormous volumes of data in both the clinical and research realms. The tools necessary to manage this data have not kept pace with the technical ability to acquire it. The Multiscale Electrophysiology Data Format (MED) is an open source format developed to address modern issues with big data acquisition, analysis, and transmission. Additionally, data sharing is often hindered by data size, proprietary formats, HIPAA regulation, operating system incompatibilities, and institutional IRB requirements. The MED format is designed to ameliorate many of these difficulties with three cardinal features: open source code, data compression, and stratified encryption.

Feature Characteristics
Format
  • One directory per channel
  • Channels are segmented in time (single segment channels are supported)
  • Extensible channel types (currently, time series & video)
  • Time series channel:
    1. 32 bit resolution (integer)
    2. Independent channel sampling frequencies
    3. Any time series data can be encoded (e.g. transforms of original data)
Time Series Compression
  • Decreased data storage
  • Increased network transfer and read/write speeds
  • Variable block sizes
  • Channel-specific sampling rates reduce data volume
  • Adaptive lossless or lossy compression
  • Improved compression ratio with decreased signal variance (e.g. filtering)
  • Independent blocks allow parallel compression / decompression
  • Variable sampling rates supported
  • Algorithm optimized for hardware implementation
  • Block headers contain information necessary to facilitate data transmission, including data loss detection and asynchronous transmission
Encryption
  • 128-bit AES content encryption with SHA-256 password hashes, exceeding HIPAA requirements
  • Sharing of human data does not require de-identification procedures
  • Dual-tiered, single-password, encryption scheme allowing differential access to the same file
  • Secure password recovery mechanism
  • Unauthorized copies have no access to creator-determined file regions: technical metadata, subject-identifying metadata, specific records, time series data
  • Times are optionally offset, preserving true time of day, but obscuring actual recording date and time zone
  • Encryption is not required
Access
  • Rapid random access via indices files
  • Field alignment facilitates direct variable access after data read
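The two access features above can be sketched together. This is only an illustration: the `example_index_entry` layout, its field names, and both functions are hypothetical and are not the real MED indices format or medlib API. It assumes the index bytes are stored in the host's byte order and are sorted by start time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical index entry, NOT the real MED layout. All fields are
   8 bytes wide, so the struct has no padding and its bytes on disk can
   be copied straight into memory. */
typedef struct {
    int64_t file_offset;   /* byte offset of a block in the data file */
    int64_t start_time;    /* block start time */
    int64_t start_sample;  /* number of the first sample in the block */
} example_index_entry;

_Static_assert(sizeof(example_index_entry) == 24, "unexpected padding");
_Static_assert(offsetof(example_index_entry, start_time) == 8, "misaligned");

/* Fetch entry i from raw index bytes with one copy, no field parsing. */
example_index_entry read_index_entry(const uint8_t *raw, size_t i)
{
    example_index_entry e;
    memcpy(&e, raw + i * sizeof e, sizeof e);
    return e;
}

/* Binary search for the last entry whose start_time is <= t.
   Returns 0 when t precedes the first entry. */
size_t find_block(const uint8_t *raw, size_t n_entries, int64_t t)
{
    assert(n_entries > 0);
    size_t lo = 0, hi = n_entries;          /* answer lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (read_index_entry(raw, mid).start_time <= t)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

A reader would call `find_block()` with the requested time, then seek to the returned entry's `file_offset` in the data file.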
Analysis
  • Separate directory for each channel to facilitate parallel processing
  • Independence of time series blocks supports asynchronous and parallel processing
  • Multiple precalculated fields facilitate various common analyses
Real-time
  • The structure of MED files allows real-time reading and writing.
  • Catastrophic failure during an acquisition will leave an intact valid MED structure
Redundancy & Damage Mitigation
  • 32-bit CRC checksums for detection of file, individual record, & time series block corruption
  • Parity channel for targeted damage repair
  • Time Series Channels:
    1. Block independence limits extent of data loss if damage occurs
    2. Block alignment facilitates file recovery
    3. Critical fields duplicated in block header and indices files
    4. Entire indices file can be reconstructed from data file
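To illustrate how a parity channel can support targeted repair, here is a minimal sketch. It assumes byte-wise XOR parity across equally sized channel buffers, as in RAID-style schemes. The description above does not say how the MED parity channel is actually computed, so the scheme and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed scheme: parity[i] is the XOR of byte i of every channel. */
void parity_build(uint8_t *const *channels, size_t n_channels,
                  size_t len, uint8_t *parity)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t x = 0;
        for (size_t c = 0; c < n_channels; c++)
            x ^= channels[c][i];
        parity[i] = x;
    }
}

/* Rebuild bytes [start, start + count) of one damaged channel from the
   parity bytes and the same region of all undamaged channels. */
void parity_repair(uint8_t *const *channels, size_t n_channels,
                   size_t damaged, size_t start, size_t count,
                   const uint8_t *parity)
{
    assert(damaged < n_channels);
    for (size_t i = start; i < start + count; i++) {
        uint8_t x = parity[i];
        for (size_t c = 0; c < n_channels; c++)
            if (c != damaged)
                x ^= channels[c][i];
        channels[damaged][i] = x;
    }
}
```

In this sketch a checksum failure would identify the damaged region, and only that region is rebuilt. XOR parity can recover one damaged channel per byte position.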
Time
  • Time discontinuities supported and indexed
  • µUTC time provides globally accurate date & time of day to microsecond resolution
  • µUTC time is easily converted to UTC time for use with standard Unix / Posix time functions
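The µUTC to UTC conversion can be sketched as follows. The sketch assumes a µUTC value is a signed 64-bit count of microseconds since the Unix epoch. The `uutc_t` type and both function names are illustrative and are not taken from medlib.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Assumption: a uUTC value is a signed 64-bit count of microseconds
   since 1 January 1970 00:00:00 UTC. Names here are hypothetical. */
typedef int64_t uutc_t;

/* Split a non-negative uUTC value into whole seconds, usable with
   gmtime() or strftime(), and the leftover microseconds. */
time_t uutc_to_utc(uutc_t uutc, int32_t *microseconds)
{
    assert(uutc >= 0);
    if (microseconds != NULL)
        *microseconds = (int32_t)(uutc % 1000000);
    return (time_t)(uutc / 1000000);
}

/* Inverse: combine seconds and microseconds into a uUTC value. */
uutc_t utc_to_uutc(time_t seconds, int32_t microseconds)
{
    return (uutc_t)seconds * 1000000 + microseconds;
}
```

The returned `time_t` can be passed to `gmtime()` and `strftime()`. The microsecond remainder is kept separately because standard `time_t` functions work at second resolution.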
International Support
  • All strings and passwords support international character sets (UTF-8)
  • All times in global µUTC time
  • Global timezones are fully incorporated and supported
Events
  • Stored in binary records file
  • User-defined event types readily accommodated by records format
Video
  • Video channels are explicitly supported
Support
  • Open source (GNU public license, with commercial provisions)
  • Code library in C with API documentation
  • Full format specification documentation
  • Example code

MED Format file hierarchy & naming conventions

MED is the next evolution of the Multiscale Electrophysiology Format, version 3 (MEF3). MEF was developed at Mayo Clinic starting in 2006. A summary of the development history follows:

MEF1 (2006):

  • Data Compression: Range Encoded Derivatives (RED) was created to be a random access and efficient compression scheme for neurophysiological data.
  • Data Structure Alignment: This design principle allows memory mapping of MEF file structures. That is, data can be read from disk directly into an intact MEF structure hierarchy in memory, without parsing or assignment. The converse is also true for writing.
  • Data blocks and discontinuity indexing for rapid random access.
  • Channel independence: Channels are stored in separate files and can have their own sampling frequencies. This facilitates parallelized analysis, and allows data reduction by tuning the sampling frequency to the information content of each channel.
  • Global outlook: MEF was designed to be a global format with times represented as Micro Coordinated Universal Time (µUTC), and strings, including filenames and passwords, represented in Unicode (UTF-8) supporting all international character sets.
  • Intuitive file system organization, facilitating, and robust to, direct user manipulation.
  • User extensible record-types.
  • Reserved discretionary space in all sections of the format for user customization.
  • Reserved protected space in all sections of the format for potential future extensions.
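The "derivatives" in Range Encoded Derivatives can be illustrated with a first-difference transform. The sketch below covers only that differencing step and its inverse. It is not the RED codec, it omits range encoding entirely, and its function names are hypothetical. Slowly varying signals yield small differences, which an entropy coder can store in fewer bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replace each sample with its difference from the previous sample.
   The first difference is taken against 0, so diffs[0] == samples[0].
   Unsigned arithmetic avoids signed-overflow undefined behaviour; the
   conversion back to int32_t wraps on mainstream compilers. */
void first_differences(const int32_t *samples, size_t n, int32_t *diffs)
{
    assert(n == 0 || (samples != NULL && diffs != NULL));
    int32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        diffs[i] = (int32_t)((uint32_t)samples[i] - (uint32_t)prev);
        prev = samples[i];
    }
}

/* Inverse transform: a running sum restores the original samples. */
void integrate_differences(const int32_t *diffs, size_t n, int32_t *samples)
{
    assert(n == 0 || (samples != NULL && diffs != NULL));
    int32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev = (int32_t)((uint32_t)prev + (uint32_t)diffs[i]);
        samples[i] = prev;
    }
}
```

Because each block would start its running sum from its own first sample, blocks encoded this way can be decoded independently, which is what random access requires.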

MEF2 (2008):

  • Tiered encryption scheme: Differential access to the same file depending on password. Defined levels are termed “Level 1 & 2” with level 2 access decrypting all file information, including potentially patient identifying information, while level 1 access provides only technical recording details necessary for analysis. (The schema is actually more flexible than described, but this is the most common use-case.)
  • Time obfuscation: Times & timezones can be offset such that the recording start day appears to be 1 January 1970 (“The Epoch” in Unix parlance), but the true times of day are preserved. The recording start timezone is likewise shown as Greenwich Mean Time (GMT or UTC). Level 2 access displays times with true dates and timezones.
  • Damage detection schema: Cyclic Redundancy Codes (CRCs) are judiciously distributed throughout the format to detect and localize file damage.
  • Support for video channels was explicitly added. Other channel types, yet to be defined, are implicitly supported.
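The arithmetic behind the time obfuscation described above can be sketched as follows. The sketch assumes times are non-negative microseconds since the epoch and ignores the timezone offset that the format also applies. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_DAY INT64_C(86400000000)

/* Hypothetical helper: the offset that moves a recording's start onto
   1 January 1970 while keeping its time of day. Only whole days are
   removed, so hours, minutes, seconds, and microseconds survive. */
int64_t recording_time_offset(int64_t start_uutc)
{
    assert(start_uutc >= 0);
    return start_uutc - (start_uutc % USEC_PER_DAY);
}

/* Apply the offset to any time stamp in the recording. */
int64_t obscure_time(int64_t uutc, int64_t offset)
{
    return uutc - offset;
}

/* Restore the true time, as a reader with full access would see it. */
int64_t restore_time(int64_t obscured, int64_t offset)
{
    return obscured + offset;
}
```

The same offset is subtracted from every time stamp, so intervals between events are unchanged. In this sketch the offset itself would be stored in a region that only the higher access level can decrypt.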

MEF3 (2016):

  • Re-design of the MEF hierarchy to:
  • Ameliorate failure during data acquisition (e.g. power outage)
  • Facilitate realtime analysis:
    1. Allow data segmentation
    2. Variable amplitude compression mode (lossy)
    3. Improved RED codec performance

MED (Jan 2020):

  • Predictive RED (PRED) compression
  • Minimal Bit Encoding (MBE) fall-through compression for degenerate data blocks
  • Variable frequency compression mode (lossy)
  • Multiple derivative compression mode (lossless)
  • Online block statistics calculation
  • Parity channel for data recovery: In combination with the MED CRCs, this channel permits immediate targeted micro-repair, rather than protracted, full-scale restoration from backup.
  • Secure password recovery mechanism
  • Mechanism to display times in the timezone in which they were recorded, accounting for Daylight Saving Time (with appropriate user access)
  • Support for geotagging
  • Enhanced, expanded open source library code base and documentation

Publications:

  • Stead M, Halford JJ. A Proposal for a Standard Format for Neurophysiology Data Recording and Exchange. J Clin Neurophysiol, Jan 2016.
  • Brinkmann BH, Bower MR, Stengel KA, Worrell GA, Stead M. Large-scale Electrophysiology: Acquisition, Compression, Encryption, and Storage of Big Data. J Neurosci Meth. 2009. May 30;180(1):185-92.
  • Brinkmann BH, Bower MR, Stengel KA, Worrell GA, Stead M. Multiscale electrophysiology format: an open-source electrophysiology format using data compression, encryption, and cyclic redundancy check. Annu Int Conf IEEE Eng Med Biol Soc. 2009;2009:7083-6.
  • Bower MR, Stead M, Brinkmann BH, Dufendach K, Worrell GA. Metadata and annotations for multi-scale electrophysiological data. Annu Int Conf IEEE Eng Med Biol Soc. 2009;2009:2811-4.

Multiscale Electrophysiology Data (MED) Format Software Library, Version 1.0

Written by Matt Stead


MED library source code (medlib) is copyrighted by Dark Horse Neuro Inc, 2021 (Matt Stead & Casey Stengel)


Medlib is free software:

You can redistribute it and/or modify it under the terms of the GNU General Public License (GNU GPL), version 3, or any later version (as published by the Free Software Foundation).

The GNU GPL requires that any object code built and distributed using this software is accompanied by the FULL SOURCE CODE used to generate the object code.


This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;

without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the GNU GPL for more details.


If you did not receive a copy of the GNU GPL along with this code, you can find it on the GNU web site.

You can also obtain a copy by writing to the Free Software Foundation, Inc. at:

51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.


We kindly ask you to acknowledge medlib in any program or publication in which you use it, but you are not required to do so.


Commercial versions of medlib may be licensed from Dark Horse Neuro Inc, Bozeman, MT, USA.

Commercially licensed copies do not require object code using medlib to be accompanied by its corresponding full source code.

Users interested in a commercial license may contact Dark Horse Neuro, Inc.

MED derives from the Multiscale Electrophysiology Format (MEF), versions 1-3.

Many people contributed to the MEF effort, but special mention is owed to

Greg Worrell, Casey Stengel, Andy Gardner, Mark Bower, Vince Vasoli, Ben Brinkmann,

Dan Crepeau, Jan Cimbálnik, Jon Lange, and Jon Halford for their contributions

in design, coding, testing, implementation, and adoption.


The encryption / decryption algorithm is the 128-bit AES standard (http://www.csrc.nist.gov/publications/fips/fips197/fips-197.pdf).

AES routines (128 bit only) are included in the library, with attribution, for convenience.


The hash algorithm is the SHA-256 standard (http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf).

Basic SHA-256 routines are included in the library, with attribution, for convenience.


Strings are encoded in UTF-8, an encoding of the Universal Character Set standard, ISO/IEC 10646:2012 (http://standards.iso.org/ittf/PubliclyAvailableStandards/c056921_ISO_IEC_10646_2012.zip).

Basic UTF-8 manipulation routines are included in the library, with attribution, for convenience.


Error detection is implemented with 32-bit cyclic redundancy checksums (CRCs).

Basic CRC-32 manipulation routines are included in the library, with attribution, for convenience.
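For readers unfamiliar with CRCs, the following is a minimal bitwise CRC-32 sketch. It assumes the widely used reflected polynomial 0xEDB88320 with an all-ones initial value and final inversion (the zlib and IEEE 802.3 convention). The polynomial and conventions that medlib actually uses may differ, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Update a running CRC-32 with len bytes of data. Pass 0 as crc for the
   first call, then pass the previous result to continue over more data.
   Assumed parameters: reflected polynomial 0xEDB88320, initial value and
   final XOR of 0xFFFFFFFF. */
uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    assert(data != NULL || len == 0);
    const uint8_t *p = data;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Return 1 if the bytes still match the checksum stored with them. */
int block_is_intact(const void *block, size_t len, uint32_t stored_crc)
{
    return crc32_update(0, block, len) == stored_crc;
}
```

A table-driven version is faster in practice. The bitwise form is shown because it is short enough to check by hand.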

Version 1.0
Copyright © MEDformat.org, 2021. All Rights Reserved

contact@medformat.org