libgunshotmatch.consolidate

Functions for combining peak identifications across aligned peaks into a single set of results.

Classes:

ConsolidatedPeak(rt_list, area_list, ms_list, *)

A Peak that has been produced by consolidating the properties and search results of several qualified peaks.

ConsolidatedPeakFilter([name_filter, …])

Class to filter a list of consolidated peaks to exclude peaks by hit name, match factor etc.

ConsolidatedSearchResult(name, cas[, …])

Represents a candidate compound for a peak.

InvertedFilter([name_filter, …])

Inverted version of ConsolidatedPeakFilter.

Functions:

combine_spectra(peak)

Sum the intensities across all mass spectra in the given peak.

match_counter(engine, peak_numbers, …)

Find the most likely compound for each peak.

pairwise_ms_comparisons(alignment[, parallel])

Pairwise comparison of mass spectra between samples, for each set of aligned peaks.
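As a rough illustration of what combine_spectra (listed above) describes, the sketch below sums intensities per m/z across several spectra. It is not the library's implementation: the helper name sum_intensities, the dict-based spectrum representation, and the choice to skip missing spectra are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional

# Hypothetical stand-in for a mass spectrum: m/z -> intensity.
Spectrum = Dict[int, float]


def sum_intensities(spectra: Iterable[Optional[Spectrum]]) -> Spectrum:
    """Sum intensities per m/z across spectra, skipping missing (None) entries."""
    totals: Dict[int, float] = defaultdict(float)
    for spectrum in spectra:
        if spectrum is None:
            continue  # assumption: a missing spectrum contributes nothing
        for mz, intensity in spectrum.items():
            totals[mz] += intensity
    # Return a plain dict ordered by m/z.
    return dict(sorted(totals.items()))


combined = sum_intensities([{50: 10.0, 51: 5.0}, None, {50: 2.5, 77: 1.0}])
print(combined)
```

The None entry mirrors the fact that ms_list is typed to allow missing spectra; how the real function treats them is not stated in this reference.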

class ConsolidatedPeak(rt_list, area_list, ms_list, *, minutes=False, hits=None, ms_comparison=None, meta=None)[source]

A Peak that has been produced by consolidating the properties and search results of several qualified peaks.

Parameters

Methods:

__len__()

How many instances of the peak make up this ConsolidatedPeak.

from_dict(d)

Construct a ConsolidatedPeak from a dictionary.

to_dict()

Returns a dictionary representation of this ConsolidatedPeak.

Attributes:

area

The average peak area across the aligned peaks.

area_list

List of peak areas for the aligned peaks.

area_stdev

The standard deviation of the peak area across the aligned peaks.

average_ms_comparison

The average of the pairwise mass spectral comparison scores.

hits

Optional list of possible identities for this peak.

meta

Optional dictionary for storing e.g. peak number or whether the peak should be hidden.

ms_comparison

Pairwise mass spectral comparison scores.

ms_comparison_stdev

The standard deviation of the pairwise mass spectral comparison scores.

ms_list

List of mass spectra for the aligned peaks.

rt

The average retention time across the aligned peaks.

rt_list

List of retention times of the aligned peaks.

rt_stdev

The standard deviation of the retention time across the aligned peaks.

__len__()[source]

How many instances of the peak make up this ConsolidatedPeak.

Return type

int

property area

The average peak area across the aligned peaks.

Return type

float

area_list

Type:    List[float]

List of peak areas for the aligned peaks.

property area_stdev

The standard deviation of the peak area across the aligned peaks.

Return type

float

property average_ms_comparison

The average of the pairwise mass spectral comparison scores.

Return type

float

classmethod from_dict(d)[source]

Construct a ConsolidatedPeak from a dictionary.

Parameters

d (Mapping[str, Any])

Return type

ConsolidatedPeak

hits

Type:    List[ConsolidatedSearchResult]

Optional list of possible identities for this peak.

meta

Type:    Dict[str, Any]

Optional dictionary for storing e.g. peak number or whether the peak should be hidden.

ms_comparison

Type:    Series

Pairwise mass spectral comparison scores.

property ms_comparison_stdev

The standard deviation of the pairwise mass spectral comparison scores.

Return type

float

ms_list

Type:    MutableSequence[Optional[MassSpectrum]]

List of mass spectra for the aligned peaks.

property rt

The average retention time across the aligned peaks.

Return type

float

rt_list

Type:    List[float]

List of retention times of the aligned peaks.

property rt_stdev

The standard deviation of the retention time across the aligned peaks.

Return type

float
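The averaged properties above (rt, area and their standard deviations) can be pictured with plain lists, as in the sketch below. The values are invented, and the use of the population standard deviation is an assumption; this reference does not say which form the library uses.

```python
import statistics
from typing import List

# Invented example data for three aligned peaks.
rt_list: List[float] = [10.02, 10.05, 9.98]         # retention times
area_list: List[float] = [1200.0, 1350.0, 1275.0]   # matching peak areas

rt = statistics.mean(rt_list)
rt_stdev = statistics.pstdev(rt_list)  # assumption: population standard deviation
area = statistics.mean(area_list)
area_stdev = statistics.pstdev(area_list)

# Analogous to len(consolidated_peak): one entry per aligned peak.
n_instances = len(rt_list)

print(f"rt = {rt:.3f} +/- {rt_stdev:.3f}, area = {area:.1f} +/- {area_stdev:.1f}, n = {n_instances}")
```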

to_dict()[source]

Returns a dictionary representation of this ConsolidatedPeak.

All keys are native, JSON-serializable Python objects.

Return type

Dict[str, Any]
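To show why a dictionary of native Python objects is convenient, the sketch below round-trips a small stand-in class through JSON. MiniPeak and its dictionary keys are hypothetical and are not the layout used by ConsolidatedPeak.to_dict(); only the to_dict/from_dict pattern is taken from this reference.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass
class MiniPeak:
    """Hypothetical stand-in for a consolidated peak; not the library class."""

    rt_list: List[float]
    area_list: List[float]
    meta: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # Only built-in types are used, so json.dumps needs no custom encoder.
        return {
            "rt_list": list(self.rt_list),
            "area_list": list(self.area_list),
            "meta": dict(self.meta),
        }

    @classmethod
    def from_dict(cls, d: Mapping[str, Any]) -> "MiniPeak":
        return cls(
            rt_list=list(d["rt_list"]),
            area_list=list(d["area_list"]),
            meta=dict(d.get("meta", {})),
        )


peak = MiniPeak([10.02, 10.05], [1200.0, 1350.0], {"peak_number": 4})
restored = MiniPeak.from_dict(json.loads(json.dumps(peak.to_dict())))
print(restored == peak)
```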

class ConsolidatedPeakFilter(name_filter=[], min_match_factor=600, min_appearances=-1, verbose=False)[source]

Class to filter a list of consolidated peaks to exclude peaks by hit name, match factor etc.

New in version 0.2.0.

Parameters
  • name_filter (Iterable[str]) – List of glob-style matches for compound names. Consolidated peaks matching any of these will be excluded. Default [].

  • min_match_factor (int) – Minimum average match factor. Consolidated peaks with an average match factor below this will be excluded. Default 600.

  • min_appearances (int) – Number of times the hit must appear across the individual aligned peaks. Consolidated peaks where the most common hit appears fewer times than this will be excluded. If set to -1, the number of instances of the peak in the project is used. Default -1.

  • verbose (bool) – If True, details of excluded peaks will be printed. Default False.

Methods:

filter(consolidated_peaks)

Filter a list of consolidated peaks.

from_method(method)

Construct a ConsolidatedPeakFilter from a ConsolidateMethod.

print_skip_reason(peak, reason)

Print the reason for skipping a peak, if ConsolidatedPeakFilter.verbose is True.

should_filter_peak(peak)

Returns True if the peak should be excluded based on the current filter options.

Attributes:

min_appearances

Number of times the hit must appear across the individual aligned peaks.

min_match_factor

Minimum average match factor.

name_filter

List of glob-style matches for compound names.

verbose

If True, details of excluded peaks will be printed.

filter(consolidated_peaks)[source]

Filter a list of consolidated peaks.

Parameters

consolidated_peaks (List[ConsolidatedPeak])

Return type

List[ConsolidatedPeak]

classmethod from_method(method)[source]

Construct a ConsolidatedPeakFilter from a ConsolidateMethod.

Parameters

method (ConsolidateMethod)

Return type

ConsolidatedPeakFilter

min_appearances

Type:    int

Number of times the hit must appear across the individual aligned peaks.

Consolidated peaks where the most common hit appears fewer times than this will be excluded.

If set to -1, the number of instances of the peak in the project is used.

min_match_factor

Type:    int

Minimum average match factor.

Consolidated peaks with an average match factor below this will be excluded.

name_filter

Type:    List[str]

List of glob-style matches for compound names.

Consolidated peaks matching any of these will be excluded.

print_skip_reason(peak, reason)[source]

Print the reason for skipping a peak, if ConsolidatedPeakFilter.verbose is True.

Parameters
  • peak (ConsolidatedPeak) – The peak being skipped.

  • reason (str) – The reason for skipping the peak.

should_filter_peak(peak)[source]

Returns True if the peak should be excluded based on the current filter options.

Parameters

peak (ConsolidatedPeak)

Return type

bool

verbose

Type:    bool

If True, details of excluded peaks will be printed.
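The three filter options can be read as three independent exclusion tests. The sketch below is one way to express them, under stated assumptions: Candidate and should_exclude are hypothetical names, only the most common hit is considered, and glob matching is done case-sensitively with the standard library. The real should_filter_peak may differ in each of these respects.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical summary of a peak's most common hit; not a library type."""

    name: str
    match_factor: float  # average match factor of the hit
    appearances: int     # aligned peaks in which the hit appeared
    n_instances: int     # aligned peaks making up the consolidated peak


def should_exclude(
    c: Candidate,
    name_filter: List[str],
    min_match_factor: int = 600,
    min_appearances: int = -1,
) -> bool:
    # 1. Glob-style name filter (case sensitivity is an assumption).
    if any(fnmatchcase(c.name, pattern) for pattern in name_filter):
        return True
    # 2. Average match factor below the threshold.
    if c.match_factor < min_match_factor:
        return True
    # 3. Too few appearances; -1 means "must appear in every instance".
    required = c.n_instances if min_appearances == -1 else min_appearances
    return c.appearances < required


candidates = [
    Candidate("Diphenylamine", 850.0, 5, 5),
    Candidate("Cyclotrisiloxane, hexamethyl-", 900.0, 5, 5),
    Candidate("Ethyl centralite", 550.0, 5, 5),
    Candidate("Nitroglycerin", 800.0, 3, 5),
]
kept = [c.name for c in candidates if not should_exclude(c, ["*siloxane*"])]
print(kept)
```

With min_appearances left at -1, the last candidate is dropped because it appears in only 3 of 5 instances; passing min_appearances=3 would keep it.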

class InvertedFilter(name_filter=[], min_match_factor=600, min_appearances=-1, verbose=False)[source]

Bases: libgunshotmatch.consolidate.ConsolidatedPeakFilter

Inverted version of ConsolidatedPeakFilter.

Returns peaks which would be excluded by a ConsolidatedPeakFilter.

New in version 0.10.0.

Parameters
  • name_filter (Iterable[str]) – List of glob-style matches for compound names. Consolidated peaks matching any of these will be excluded. Default [].

  • min_match_factor (int) – Minimum average match factor. Consolidated peaks with an average match factor below this will be excluded. Default 600.

  • min_appearances (int) – Number of times the hit must appear across the individual aligned peaks. Consolidated peaks where the most common hit appears fewer times than this will be excluded. If set to -1, the number of instances of the peak in the project is used. Default -1.

  • verbose (bool) – If True, details of excluded peaks will be printed. Default False.

Methods:

filter(consolidated_peaks)

Filter a list of consolidated peaks.

print_skip_reason(peak, reason)

Print the reason for skipping a peak, if ConsolidatedPeakFilter.verbose is True.

filter(consolidated_peaks)[source]

Filter a list of consolidated peaks.

Parameters

consolidated_peaks (List[ConsolidatedPeak])

Return type

List[ConsolidatedPeak]

print_skip_reason(peak, reason)[source]

Print the reason for skipping a peak, if ConsolidatedPeakFilter.verbose is True.

Parameters
  • peak (ConsolidatedPeak) – The peak being skipped.

  • reason (str) – The reason for skipping the peak.
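The relationship between the two filter classes can be thought of as a partition: the regular filter returns the peaks that pass, and the inverted one returns the rest. The sketch below shows that idea with a generic predicate; split_by_filter is a hypothetical helper, and plain match factors stand in for consolidated peaks.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def split_by_filter(
    items: List[T],
    should_filter: Callable[[T], bool],
) -> Tuple[List[T], List[T]]:
    """Return (kept, excluded); 'excluded' is what an inverted filter would return."""
    kept = [item for item in items if not should_filter(item)]
    excluded = [item for item in items if should_filter(item)]
    return kept, excluded


# Plain numbers stand in for peaks; the predicate mimics min_match_factor=600.
match_factors = [850, 550, 720, 400]
kept, excluded = split_by_filter(match_factors, lambda mf: mf < 600)
print(kept, excluded)
```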

class ConsolidatedSearchResult(name, cas, mf_list=[], rmf_list=[], hit_numbers=[], reference_data=None)[source]

Represents a candidate compound for a peak.

This is determined from a set of SearchResults for a set of aligned peaks.

Parameters
  • name (str) – The name of the candidate compound.

  • cas (str) – The CAS number of the compound.

  • mf_list (List[int]) – List of Match Factors comparing the mass spectrum of the peak with the reference spectrum in each aligned peak. Will contain NaN where the compound was not in the hit list for a peak. Default [].

  • rmf_list (List[int]) – List of Reverse Match Factors comparing the reference spectrum with the spectrum for each aligned peak. Will contain NaN where the compound was not in the hit list for a peak. Default [].

  • hit_numbers (List[int]) – List of “hit” numbers from NIST MS Search. Lower is better. Will contain NaN where the compound was not in the hit list for a peak. Default [].

  • reference_data (Union[Dict, ReferenceData, None]) – The reference mass spectrum for the compound from the NIST library. Default None.

Methods:

__len__()

The number of aligned peaks the compound appeared in the hit list for.

from_dict(d)

Construct a ConsolidatedSearchResult from a dictionary.

to_dict()

Returns a dictionary representation of this ConsolidatedSearchResult.

Attributes:

average_hit_number

The average hit number.

cas

The CAS number of the compound.

hit_number_stdev

The standard deviation of the hit numbers.

hit_numbers

List of “hit” numbers from NIST MS Search.

match_factor

The average match factor.

match_factor_stdev

The standard deviation of the match factors.

mf_list

List of Match Factors comparing the mass spectrum of the peak with the reference spectrum in each aligned peak.

name

The name of the candidate compound.

reference_data

The reference mass spectrum for the compound from the NIST library.

reverse_match_factor

The average reverse match factor.

reverse_match_factor_stdev

The standard deviation of the reverse match factors.

rmf_list

List of Reverse Match Factors comparing the reference spectrum with the spectrum for each aligned peak.

__len__()[source]

The number of aligned peaks the compound appeared in the hit list for.

Return type

int

property average_hit_number

The average hit number.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type

float

cas

Type:    str

The CAS number of the compound.

classmethod from_dict(d)[source]

Construct a ConsolidatedSearchResult from a dictionary.

Parameters

d (Mapping[str, Any])

Return type

ConsolidatedSearchResult

property hit_number_stdev

The standard deviation of the hit numbers.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type

float

hit_numbers

Type:    List[int]

List of “hit” numbers from NIST MS Search.

Lower is better. Will contain NaN where the compound was not in the hit list for a peak.

property match_factor

The average match factor.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type

float

property match_factor_stdev

The standard deviation of the match factors.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type

float

mf_list

Type:    List[int]

List of Match Factors comparing the mass spectrum of the peak with the reference spectrum in each aligned peak.

Will contain NaN where the compound was not in the hit list for a peak.

name

Type:    str

The name of the candidate compound.

reference_data

Type:    Optional[ReferenceData]

The reference mass spectrum for the compound from the NIST library.

property reverse_match_factor

The average reverse match factor.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type

float

property reverse_match_factor_stdev

The standard deviation of the reverse match factors.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type

float

rmf_list

Type:    List[int]

List of Reverse Match Factors comparing the reference spectrum with the spectrum for each aligned peak.

Will contain NaN where the compound was not in the hit list for a peak.

to_dict()[source]

Returns a dictionary representation of this ConsolidatedSearchResult.

All keys are native, JSON-serializable Python objects.

Return type

Dict[str, Any]
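The averaged properties of ConsolidatedSearchResult exclude missing values. A minimal sketch of that behaviour with the standard library follows; the data are invented, the population standard deviation is an assumption, and counting non-NaN entries is only one plausible reading of what __len__ reports.

```python
import math
import statistics
from typing import List

# Invented match factors for four aligned peaks; NaN marks a peak where
# the compound was not in the hit list.
mf_list: List[float] = [850.0, float("nan"), 870.0, 860.0]

# Drop the missing values before computing any statistic.
present = [mf for mf in mf_list if not math.isnan(mf)]

match_factor = statistics.mean(present)
match_factor_stdev = statistics.pstdev(present)  # assumption: population form
appearances = len(present)  # peaks in which the compound was actually a hit

print(match_factor, round(match_factor_stdev, 3), appearances)
```

Averaging the raw list without this step would give NaN, since any arithmetic involving NaN propagates it.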

match_counter(engine, peak_numbers, qualified_peaks, ms_comp_data)[source]

Find the most likely compound for each peak.

Parameters
  • engine (Engine)

  • peak_numbers (List[int]) – List of peak numbers to process.

  • qualified_peaks (List[List[QualifiedPeak]]) – List of lists of qualified aligned peaks for each repeat.

  • ms_comp_data (DataFrame) – Dataframe giving pairwise mass spectrum comparisons for each set of aligned peaks.

Return type

List[ConsolidatedPeak]
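One simple way to think about "most likely compound" is to count how many repeats list each candidate. The sketch below does only that, with invented hit lists; it is not the algorithm used by match_counter, which also takes an engine and pairwise comparison data whose roles are not described in this reference.

```python
from collections import Counter
from typing import List

# Invented hit lists (best hit first) for one peak across three repeats.
hit_lists: List[List[str]] = [
    ["Diphenylamine", "Carbazole"],
    ["Diphenylamine", "Acridine"],
    ["Carbazole", "Diphenylamine"],
]

# Count each compound at most once per repeat.
appearances = Counter(name for hits in hit_lists for name in set(hits))

# The compound that appears in the most repeats is taken as the best candidate.
most_likely, count = appearances.most_common(1)[0]
print(most_likely, count)
```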

pairwise_ms_comparisons(alignment, parallel=True)[source]

Pairwise comparison of mass spectra between samples, for each set of aligned peaks.

Parameters
Return type

DataFrame

Returns

pandas.DataFrame where the columns are pairwise spectrum similarity scores and the rows are the peaks.
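The shape of the returned table (one row per peak, one column per pair of samples) can be sketched without pandas. Everything below is illustrative: the nested-dict layout, the column labels, and the use of cosine similarity as the score are assumptions, since this reference does not state which similarity measure the library uses.

```python
import math
from itertools import combinations
from typing import Dict

Spectrum = Dict[int, float]  # hypothetical stand-in: m/z -> intensity


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Cosine similarity of two spectra; an assumed, swapped-in metric."""
    dot = sum(a.get(mz, 0.0) * b.get(mz, 0.0) for mz in set(a) | set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented data: peak number -> {sample name -> spectrum}.
aligned: Dict[int, Dict[str, Spectrum]] = {
    1: {"A": {50: 1.0, 51: 2.0}, "B": {50: 1.0, 51: 2.0}, "C": {50: 2.0, 51: 4.0}},
}

# Rows are peaks; columns are unordered pairs of samples.
table = {
    peak_no: {
        f"{s1} & {s2}": cosine_similarity(spectra[s1], spectra[s2])
        for s1, s2 in combinations(sorted(spectra), 2)
    }
    for peak_no, spectra in aligned.items()
}
print(table)
```

For n samples each row has n(n-1)/2 scores, which is the set that average_ms_comparison and ms_comparison_stdev on ConsolidatedPeak would summarise.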