libgunshotmatch.datafile

Represents a parsed datafile.

Classes:

Datafile(name, original_filename, …[, …])

Represents a single datafile in a project.

FileType(value)

Represents the input datafile types supported by PyMassSpec.

GCMSDataInfo(rt_range, time_step, …)

Represents information about a GCMS_data object returned by get_info_from_gcms_data().

Repeat(datafile, peaks[, qualified_peaks, …])

Represents a repeat sample in a project, constructed from a datafile.

Functions:

get_info_from_gcms_data(gcms_data)

Returns information about the data in a pyms.GCMS.Class.GCMS_data object.

class Datafile(name, original_filename, original_filetype, description='', intensity_matrix=None, user=getpass.getuser(), device=socket.gethostname(), date_created=datetime.datetime.now(), date_modified=datetime.datetime.now(), version=1)[source]

Represents a single datafile in a project.

Parameters
  • name (str) – The name of the Datafile.

  • original_filename (str) – The filename of the file the Datafile was created from.

  • original_filetype (int) – The filetype of the file the Datafile was created from.

  • description (str) – A description of the Datafile. Default ''.

  • intensity_matrix (Optional[IntensityMatrix]) – PyMassSpec IntensityMatrix object. Default None.

  • user (str) – The user who created the Datafile. Default taken from the currently logged-in user.

  • device (str) – The device that created the Datafile. Default taken from the current device’s hostname.

  • date_created (datetime) – The date and time the Datafile was created. Default is the current date and time.

  • date_modified (datetime) – The date and time the Datafile was last modified. Default is the current date and time.

  • version (int) – File format version. Default 1.

Attributes:

date_created

The date and time the Datafile was created.

date_modified

The date and time the Datafile was last modified.

description

A description of the Datafile.

device

The device that created the Datafile.

intensity_matrix

PyMassSpec IntensityMatrix object.

name

The name of the Datafile.

original_filename

The filename of the file the Datafile was created from.

original_filetype

The filetype of the file the Datafile was created from.

user

The user who created the Datafile.

version

File format version.

Methods:

export(output_dir)

Export as a .gsmd file and return the output filename.

from_dict(d)

Construct a Datafile from a dictionary.

from_file(filename)

Parse a gsmd file.

load_gcms_data([filename])

Load GC-MS data from the datafile.

new(name, filename)

Construct a new Datafile from a file.

prepare_intensity_matrix(gcms_data[, …])

Build an IntensityMatrix for the datafile.

to_dict()

Returns a dictionary representation of this Datafile.

date_created

Type:    datetime

The date and time the Datafile was created.

date_modified

Type:    datetime

The date and time the Datafile was last modified.

description

Type:    str

A description of the Datafile.

device

Type:    str

The device that created the Datafile.

export(output_dir)[source]

Export as a .gsmd file and return the output filename.

Return type

str

classmethod from_dict(d)[source]

Construct a Datafile from a dictionary.

Parameters

d (Mapping[str, Any])

Return type

Datafile

classmethod from_file(filename)[source]

Parse a gsmd file.

Parameters

filename (Union[str, Path, PathLike]) – The input filename.

Return type

Datafile

intensity_matrix

Type:    Optional[IntensityMatrix]

PyMassSpec IntensityMatrix object.

load_gcms_data(filename=None)[source]

Load GC-MS data from the datafile.

Parameters

filename (Union[str, Path, PathLike, None]) – Alternative filename to load the data from. Useful if the file has moved since the Datafile was created. Default None.

Changed in version 0.4.0: Added the filename argument.

Return type

GCMS_data

name

Type:    str

The name of the Datafile.

classmethod new(name, filename)[source]

Construct a new Datafile from a file.

Parameters
  • name

  • filename

Return type

Datafile

original_filename

Type:    str

The filename of the file the Datafile was created from.

original_filetype

Type:    FileType

The filetype of the file the Datafile was created from.

prepare_intensity_matrix(gcms_data, savitzky_golay=True, tophat=True, tophat_structure_size='1.5m', crop_mass_range=None)[source]

Build an IntensityMatrix for the datafile.

Parameters
  • gcms_data (GCMS_data)

  • savitzky_golay (Union[bool, SavitzkyGolayMethod]) – Whether to perform Savitzky-Golay smoothing. Default True.

  • tophat (bool) – Whether to perform Tophat baseline correction. Default True.

  • tophat_structure_size (str) – The structure size for Tophat baseline correction. Default '1.5m'.

  • crop_mass_range (Optional[Tuple[float, float]]) – The range of masses to which the GC-MS data should be limited. Default None.

Return type

IntensityMatrix
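The methods documented above can be chained into a simple processing workflow. The following is a minimal sketch that uses only the signatures listed on this page; the function name process_datafile, its arguments, and the crop_mass_range values are hypothetical, and the library import is placed inside the function so the definition can be loaded even where libgunshotmatch is not installed.

```python
from typing import Any, Tuple


def process_datafile(name: str, filename: str, output_dir: str) -> Tuple[Any, str]:
    """Hypothetical helper chaining the documented Datafile methods."""
    # Imported lazily; requires libgunshotmatch to be installed when called.
    from libgunshotmatch.datafile import Datafile

    # Construct a new Datafile from an input file.
    datafile = Datafile.new(name, filename)

    # Load the raw GC-MS data referenced by the Datafile.
    gcms_data = datafile.load_gcms_data()

    # Build the IntensityMatrix with the documented defaults spelled out.
    # The mass range below is an illustrative value, not a recommendation.
    intensity_matrix = datafile.prepare_intensity_matrix(
        gcms_data,
        savitzky_golay=True,
        tophat=True,
        tophat_structure_size="1.5m",
        crop_mass_range=(50.0, 500.0),
    )

    # Export as a .gsmd file; the output filename is returned.
    exported_filename = datafile.export(output_dir)
    return intensity_matrix, exported_filename
```

If the input file has moved since the Datafile was created, the new location can be passed to load_gcms_data() through its filename argument.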

to_dict()[source]

Returns a dictionary representation of this Datafile.

All keys are native, JSON-serializable Python objects.

Return type

Dict[str, Any]
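A JSON-serializable dictionary can be written with the standard json module and later passed back to from_dict(). The sketch below illustrates that round trip with a hypothetical stand-in class, Record, rather than the real Datafile; in particular, storing datetimes as ISO 8601 strings is an assumption made for the example and may not match the library's own encoding.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Mapping


@dataclass
class Record:
    """Hypothetical stand-in for a class offering to_dict() and from_dict()."""

    name: str
    date_created: datetime
    version: int = 1

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: datetimes become ISO 8601 strings so the result is JSON-safe.
        return {
            "name": self.name,
            "date_created": self.date_created.isoformat(),
            "version": self.version,
        }

    @classmethod
    def from_dict(cls, d: Mapping[str, Any]) -> "Record":
        # Reverse the conversion performed in to_dict().
        return cls(
            name=d["name"],
            date_created=datetime.fromisoformat(d["date_created"]),
            version=d["version"],
        )


original = Record(name="sample-1", date_created=datetime(2024, 1, 2, 3, 4, 5))
as_json = json.dumps(original.to_dict())
restored = Record.from_dict(json.loads(as_json))
```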

user

Type:    str

The user who created the Datafile.

version

Type:    int

File format version.

enum FileType(value)[source]

Bases: enum_tools.custom_enums.IntEnum

Represents the input datafile types supported by PyMassSpec.

Member Type

int

Valid values are as follows:

JDX = <FileType.JDX: 0>
MZML = <FileType.MZML: 1>
ANDI = <FileType.ANDI: 2>
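
Because FileType is an integer enumeration, its members can be looked up by value or by name and compare equal to plain integers. The sketch below demonstrates this with a stand-in built on the standard library's enum.IntEnum using the three values listed above; the real class derives from enum_tools.custom_enums.IntEnum, so this illustrates the behaviour rather than reproducing the library's definition.

```python
from enum import IntEnum


class FileType(IntEnum):
    """Stand-in mirroring the documented members; not the library's class."""

    JDX = 0
    MZML = 1
    ANDI = 2


# Look a member up by its integer value or by its name.
by_value = FileType(1)
by_name = FileType["ANDI"]

# IntEnum members compare equal to plain integers.
is_int_like = FileType.JDX == 0
```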

namedtuple GCMSDataInfo(rt_range, time_step, time_step_stdev, n_scans, mz_range, num_mz_mean, num_mz_median)[source]

Bases: NamedTuple

Represents information about a GCMS_data object returned by get_info_from_gcms_data().

Fields
  1.  rt_range (Tuple[float, float]) – The minimum and maximum retention times.

  2.  time_step (float) – The average time step between scans.

  3.  time_step_stdev (float) – The standard deviation of the time steps between scans.

  4.  n_scans (int) – The total number of scans.

  5.  mz_range (Tuple[float, float]) – The minimum and maximum mass (m/z) values.

  6.  num_mz_mean (float) – The mean number of masses per scan.

  7.  num_mz_median (float) – The median number of masses per scan.
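
To make the meaning of each field concrete, the sketch below derives the same seven statistics from plain Python lists. GCMSDataInfo here is a stand-in with the documented field names, and summarise_scans is a hypothetical helper; neither is the library's implementation, and the use of the population standard deviation for time_step_stdev is an assumption.

```python
from statistics import mean, median, pstdev
from typing import NamedTuple, Sequence, Tuple


class GCMSDataInfo(NamedTuple):
    """Stand-in with the documented fields; not the library's class."""

    rt_range: Tuple[float, float]
    time_step: float
    time_step_stdev: float
    n_scans: int
    mz_range: Tuple[float, float]
    num_mz_mean: float
    num_mz_median: float


def summarise_scans(
    times: Sequence[float],
    scans: Sequence[Sequence[float]],
) -> GCMSDataInfo:
    """Hypothetical helper: summarise retention times and per-scan m/z lists."""
    # Differences between consecutive retention times.
    steps = [later - earlier for earlier, later in zip(times, times[1:])]
    all_masses = [mz for scan in scans for mz in scan]
    masses_per_scan = [len(scan) for scan in scans]
    return GCMSDataInfo(
        rt_range=(min(times), max(times)),
        time_step=mean(steps),
        time_step_stdev=pstdev(steps),  # assumption: population standard deviation
        n_scans=len(scans),
        mz_range=(min(all_masses), max(all_masses)),
        num_mz_mean=mean(masses_per_scan),
        num_mz_median=median(masses_per_scan),
    )


info = summarise_scans(
    times=[0.0, 1.0, 2.0, 3.0],
    scans=[[50.0, 51.0], [50.0, 52.0, 60.0], [55.0], [50.0, 70.0]],
)
```

Because the result is a named tuple, fields can be read by name (info.n_scans) or unpacked positionally in the order listed above.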

class Repeat(datafile, peaks, qualified_peaks=None, user=getpass.getuser(), device=socket.gethostname(), date_created=datetime.datetime.now(), date_modified=datetime.datetime.now(), version=1)[source]

Represents a repeat sample in a project, constructed from a datafile.

Parameters
  • datafile (Datafile) – The Datafile for this repeat.

  • peaks (List[Peak])

  • qualified_peaks (Optional[List[QualifiedPeak]]) – Peaks containing identities from library search. This is usually populated after peak alignment. Default None.

  • user (str) – The user who created the Repeat. Default taken from the currently logged-in user.

  • device (str) – The device that created the Repeat. Default taken from the current device’s hostname.

  • date_created (datetime) – The date and time the Repeat was created. Default is the current date and time.

  • date_modified (datetime) – The date and time the Repeat was last modified. Default is the current date and time.

  • version (int) – File format version. Default 1.

Attributes:

datafile

The Datafile for this repeat.

date_created

The date and time the Repeat was created.

date_modified

The date and time the Repeat was last modified.

device

The device that created the Repeat.

name

The name of the Datafile.

peaks

qualified_peaks

Peaks containing identities from library search.

user

The user who created the Repeat.

version

File format version.

Methods:

export(output_dir)

Export as a .gsmr file and return the output filename.

from_dict(d)

Construct a Repeat from a dictionary.

from_file(filename)

Parse a gsmr file.

to_dict()

Returns a dictionary representation of this Repeat.

datafile

Type:    Datafile

The Datafile for this repeat.

date_created

Type:    datetime

The date and time the Repeat was created.

date_modified

Type:    datetime

The date and time the Repeat was last modified.

device

Type:    str

The device that created the Repeat.

export(output_dir)[source]

Export as a .gsmr file and return the output filename.

New in version 0.4.0.

Return type

str

classmethod from_dict(d)[source]

Construct a Repeat from a dictionary.

Parameters

d (Mapping[str, Any])

Return type

Repeat

classmethod from_file(filename)[source]

Parse a gsmr file.

Parameters

filename (Union[str, Path, PathLike]) – The input filename.

Return type

Repeat

New in version 0.4.0.

property name

The name of the Datafile.

Return type

str

New in version 0.4.0.

peaks

Type:    PeakList

qualified_peaks

Type:    Optional[List[QualifiedPeak]]

Peaks containing identities from library search. This is usually populated after peak alignment.

to_dict()[source]

Returns a dictionary representation of this Repeat.

All keys are native, JSON-serializable Python objects.

Return type

Dict[str, Any]

user

Type:    str

The user who created the Repeat.

version

Type:    int

File format version.

get_info_from_gcms_data(gcms_data)[source]

Returns information about the data in a pyms.GCMS.Class.GCMS_data object.

Return type

GCMSDataInfo