`libgunshotmatch.consolidate`

Functions for combining peak identifications across aligned peaks into a single set of results.

Classes:

`ConsolidatedPeak`(rt_list, area_list, ms_list, *)	A Peak that has been produced by consolidating the properties and search results of several qualified peaks.
`ConsolidatedPeakFilter`([name_filter, …])	Class to filter a list of consolidated peaks to exclude peaks by hit name, match factor etc.
`ConsolidatedSearchResult`(name, cas[, …])	Represents a candidate compound for a peak.
`InvertedFilter`([name_filter, …])	Inverted version of `ConsolidatedPeakFilter`.

Functions:

`combine_spectra`(peak)	Sum the intensities across all mass spectra in the given peak.
`match_counter`(engine, peak_numbers, …)	Find the most likely compound for each peak.
`pairwise_ms_comparisons`(alignment[, parallel])	Compute pairwise mass spectral similarity scores between samples for each aligned peak.
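`combine_spectra` is only listed here, without further detail. As a rough illustration of summing intensities across the mass spectra of a peak, the sketch below represents each spectrum as a plain mapping of m/z to intensity. The `sum_spectra` helper and this representation are hypothetical and not part of libgunshotmatch, which works with `MassSpectrum` objects; skipping `None` entries (mirroring the `Optional[MassSpectrum]` items allowed in `ms_list`) is also an assumption.

```python
from typing import Dict, Iterable, Optional

# Hypothetical representation: each spectrum maps an m/z value to its intensity.
Spectrum = Dict[int, float]


def sum_spectra(spectra: Iterable[Optional[Spectrum]]) -> Spectrum:
    """Sum the intensity at each m/z across several spectra."""
    combined: Spectrum = {}
    for spectrum in spectra:
        if spectrum is None:
            # Assumption: a missing spectrum for an aligned peak contributes nothing.
            continue
        for mz, intensity in spectrum.items():
            combined[mz] = combined.get(mz, 0.0) + intensity
    # Return the m/z values in ascending order.
    return dict(sorted(combined.items()))


spectra = [{41: 100.0, 43: 50.0}, None, {43: 25.0, 57: 10.0}]
print(sum_spectra(spectra))  # {41: 100.0, 43: 75.0, 57: 10.0}
```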

class ConsolidatedPeak(rt_list, area_list, ms_list, *, minutes=False, hits=None, ms_comparison=None, meta=None)

A Peak that has been produced by consolidating the properties and search results of several qualified peaks.

Parameters

rt_list (List[float]) – List of retention times of the aligned peaks.
area_list (List[float]) – List of peak areas for the aligned peaks.
ms_list (MutableSequence[Optional[MassSpectrum]]) – List of mass spectra for the aligned peaks.
minutes (bool) – Retention time units flag. If True, retention time is in minutes; if False, it is in seconds. Default False.
hits (Optional[List[ConsolidatedSearchResult]]) – Optional list of possible identities for this peak. Default None.
ms_comparison (Union[Mapping[str, float], Series, None]) – Mapping or Pandas Series giving pairwise mass spectral comparison scores. Default None.
meta (Optional[Dict[str, Any]]) – Optional dictionary for storing e.g. peak number or whether the peak should be hidden. Default None.

Methods:

`__len__`()	How many instances of the peak make up this `ConsolidatedPeak`.
`from_dict`(d)	Construct a `ConsolidatedPeak` from a dictionary.
`to_dict`()	Returns a dictionary representation of this `ConsolidatedPeak`.

Attributes:

`area`	The average peak area across the aligned peaks.
`area_list`	List of peak areas for the aligned peaks.
`area_stdev`	The standard deviation of the peak area across the aligned peaks.
`average_ms_comparison`	The average of the pairwise mass spectral comparison scores.
`hits`	Optional list of possible identities for this peak.
`meta`	Optional dictionary for storing e.g. peak number or whether the peak should be hidden.
`ms_comparison`	Pairwise mass spectral comparison scores.
`ms_comparison_stdev`	The standard deviation of the pairwise mass spectral comparison scores.
`ms_list`	List of mass spectra for the aligned peaks.
`rt`	The average retention time across the aligned peaks.
`rt_list`	List of retention times of the aligned peaks.
`rt_stdev`	The standard deviation of the retention time across the aligned peaks.

__len__()

How many instances of the peak make up this ConsolidatedPeak.

Return type: int

property area

The average peak area across the aligned peaks.

Return type: float

area_list

Type: List[float]

List of peak areas for the aligned peaks.

property area_stdev

The standard deviation of the peak area across the aligned peaks.

Return type: float

property average_ms_comparison

The average of the pairwise mass spectral comparison scores.

Return type: float

classmethod from_dict(d)

Construct a ConsolidatedPeak from a dictionary.

Parameters: d (Mapping[str, Any])
Return type: ConsolidatedPeak

hits

Type: List[ConsolidatedSearchResult]

Optional list of possible identities for this peak.

meta

Type: Dict[str, Any]

Optional dictionary for storing e.g. peak number or whether the peak should be hidden.

ms_comparison

Type: Series

Pairwise mass spectral comparison scores.

property ms_comparison_stdev

The standard deviation of the pairwise mass spectral comparison scores.

Return type: float

ms_list

Type: MutableSequence[Optional[MassSpectrum]]

List of mass spectra for the aligned peaks.

property rt

The average retention time across the aligned peaks.

Return type: float

rt_list

Type: List[float]

List of retention times of the aligned peaks.

property rt_stdev

The standard deviation of the retention time across the aligned peaks.

Return type: float
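The `rt`, `rt_stdev`, `area` and `area_stdev` properties summarise `rt_list` and `area_list`. A minimal sketch of that summary using the standard-library `statistics` module follows; it assumes the population standard deviation (the estimator the library actually uses is not stated here) and that no values are missing. `mean_and_stdev` is a hypothetical helper.

```python
import statistics
from typing import List, Tuple


def mean_and_stdev(values: List[float]) -> Tuple[float, float]:
    """Return the mean and standard deviation of ``values``."""
    # Assumption: population standard deviation (ddof=0); the library may use another estimator.
    return statistics.mean(values), statistics.pstdev(values)


rt_list = [600.0, 602.0, 604.0]  # illustrative retention times, in seconds
area_list = [1000.0, 1200.0, 1100.0]  # illustrative peak areas

rt, rt_stdev = mean_and_stdev(rt_list)
area, area_stdev = mean_and_stdev(area_list)
print(rt, round(rt_stdev, 3))  # 602.0 1.633
```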

to_dict()

Returns a dictionary representation of this ConsolidatedPeak.

All keys are native, JSON-serializable, Python objects.

Return type: Dict[str, Any]
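To illustrate the round trip that `to_dict` and `from_dict` make possible, the sketch below uses a hypothetical `PeakRecord` dataclass holding only three of the fields. The real dictionary layout, including how `hits`, `ms_list` and `ms_comparison` are encoded, is not documented here. Because this `to_dict` returns only builtin types, its result can pass through `json.dumps` and back.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass
class PeakRecord:
    """Hypothetical stand-in for a consolidated peak, holding only a few of its fields."""

    rt_list: List[float]
    area_list: List[float]
    meta: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # Only builtin types (lists, dicts, floats, strings), so json.dumps() accepts the result.
        return {"rt_list": list(self.rt_list), "area_list": list(self.area_list), "meta": dict(self.meta)}

    @classmethod
    def from_dict(cls, d: Mapping[str, Any]) -> "PeakRecord":
        return cls(rt_list=list(d["rt_list"]), area_list=list(d["area_list"]), meta=dict(d.get("meta", {})))


peak = PeakRecord([600.0, 602.0], [1000.0, 1200.0], meta={"peak_number": 3})
restored = PeakRecord.from_dict(json.loads(json.dumps(peak.to_dict())))
print(restored == peak)  # True
```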

class ConsolidatedPeakFilter(name_filter=[], min_match_factor=600, min_appearances=-1, verbose=False)

Class to filter a list of consolidated peaks to exclude peaks by hit name, match factor etc.

New in version 0.2.0.

Parameters

name_filter (Iterable[str]) – List of glob-style matches for compound names. Consolidated peaks matching any of these will be excluded. Default [].
min_match_factor (int) – Minimum average match factor. Consolidated peaks with an average match factor below this will be excluded. Default 600.
min_appearances (int) – Number of times the hit must appear across the individual aligned peaks. Consolidated peaks where the most common hit appears fewer times than this will be excluded. If set to -1, the number of instances of the peak in the project is used. Default -1.
verbose (bool) – If True, details of excluded peaks will be printed. Default False.

Methods:

`filter`(consolidated_peaks)	Filter a list of consolidated peaks.
`from_method`(method)	Construct a `ConsolidatedPeakFilter` from a `ConsolidateMethod`.
`print_skip_reason`(peak, reason)	Print the reason for skipping a peak, if `ConsolidatedPeakFilter.verbose` is `True`.
`should_filter_peak`(peak)	Returns `True` if the peak should be excluded based on the current filter options.

Attributes:

`min_appearances`	Number of times the hit must appear across the individual aligned peaks.
`min_match_factor`	Minimum average match factor.
`name_filter`	List of glob-style matches for compound names.
`verbose`	If `True`, details of excluded peaks will be printed.

filter(consolidated_peaks)

Filter a list of consolidated peaks.

Parameters: consolidated_peaks (List[ConsolidatedPeak])
Return type: List[ConsolidatedPeak]

classmethod from_method(method)

Construct a ConsolidatedPeakFilter from a ConsolidateMethod.

Parameters: method (ConsolidateMethod)
Return type: ConsolidatedPeakFilter

min_appearances

Type: int

Number of times the hit must appear across the individual aligned peaks.

Consolidated peaks where the most common hit appears fewer times than this will be excluded.

If set to -1, the number of instances of the peak in the project is used.

min_match_factor

Type: int

Minimum average match factor.

Consolidated peaks with an average match factor below this will be excluded.

name_filter

Type: List[str]

List of glob-style matches for compound names.

Consolidated peaks matching any of these will be excluded.

print_skip_reason(peak, reason)

Print the reason for skipping a peak, if ConsolidatedPeakFilter.verbose is True.

Parameters

peak (ConsolidatedPeak) – The peak being skipped.
reason (str) – The reason for skipping the peak.

should_filter_peak(peak)

Returns True if the peak should be excluded based on the current filter options.

Parameters: peak (ConsolidatedPeak)
Return type: bool
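The three filter criteria can be combined into a single predicate, sketched below on a hypothetical `PeakSummary` record rather than a real `ConsolidatedPeak`. The sketch assumes that the peak's most common hit is the one checked, that names are matched case-sensitively with `fnmatch.fnmatchcase`, and that peaks with no hits are excluded; none of these details is confirmed by the reference above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class PeakSummary:
    """Hypothetical summary of a consolidated peak and its most common hit."""

    n_instances: int  # number of aligned peaks (repeats) making up the peak
    hit_name: Optional[str] = None  # None if the peak has no hits
    match_factor: float = 0.0  # average match factor of the hit
    appearances: int = 0  # aligned peaks in which the hit appeared


def should_filter_peak(
    peak: PeakSummary,
    name_filter: List[str],
    min_match_factor: int = 600,
    min_appearances: int = -1,
) -> bool:
    """Return True if the peak should be excluded under the three criteria."""
    if peak.hit_name is None:
        return True  # Assumption: peaks without any hits are excluded.
    if any(fnmatchcase(peak.hit_name, pattern) for pattern in name_filter):
        return True
    if peak.match_factor < min_match_factor:
        return True
    # -1 means "every instance of the peak", as described for min_appearances.
    required = peak.n_instances if min_appearances == -1 else min_appearances
    return peak.appearances < required


def filter_peaks(peaks: List[PeakSummary], name_filter: List[str]) -> List[PeakSummary]:
    # Keep the peaks that should_filter_peak() does not exclude.
    return [p for p in peaks if not should_filter_peak(p, name_filter)]


peaks = [
    PeakSummary(3, "Diphenylamine", 860.0, 3),
    PeakSummary(3, "Diphenylamine", 855.0, 2),  # hit seen in only 2 of 3 repeats
    PeakSummary(3, "Cyclotetrasiloxane, octamethyl-", 905.0, 3),  # excluded by name
    PeakSummary(3, "Unknown compound", 510.0, 3),  # below min_match_factor
]
kept = filter_peaks(peaks, name_filter=["*siloxane*"])
print([p.hit_name for p in kept])  # ['Diphenylamine']
```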

verbose

Type: bool

If True, details of excluded peaks will be printed.

class InvertedFilter(name_filter=[], min_match_factor=600, min_appearances=-1, verbose=False)

Bases: libgunshotmatch.consolidate.ConsolidatedPeakFilter

Inverted version of ConsolidatedPeakFilter.

Returns peaks which would be excluded by a ConsolidatedPeakFilter.

New in version 0.10.0.

Parameters

name_filter (Iterable[str]) – List of glob-style matches for compound names. Consolidated peaks matching any of these will be excluded. Default [].
min_match_factor (int) – Minimum average match factor. Consolidated peaks with an average match factor below this will be excluded. Default 600.
min_appearances (int) – Number of times the hit must appear across the individual aligned peaks. Consolidated peaks where the most common hit appears fewer times than this will be excluded. If set to -1, the number of instances of the peak in the project is used. Default -1.
verbose (bool) – If True, details of excluded peaks will be printed. Default False.

Methods:

`filter`(consolidated_peaks)	Filter a list of consolidated peaks.
`print_skip_reason`(peak, reason)	Print the reason for skipping a peak, if `ConsolidatedPeakFilter.verbose` is `True`.

filter(consolidated_peaks)

Filter a list of consolidated peaks.

Parameters: consolidated_peaks (List[ConsolidatedPeak])
Return type: List[ConsolidatedPeak]
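Because `InvertedFilter` returns the peaks that a `ConsolidatedPeakFilter` with the same options would exclude, the two results together partition the input. The hypothetical `split_by_filter` helper below shows that relationship with a generic predicate; it is not part of libgunshotmatch, and preserving the input order is an assumption of the sketch.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def split_by_filter(items: List[T], should_exclude: Callable[[T], bool]) -> Tuple[List[T], List[T]]:
    """Return (kept, excluded): what a filter keeps and what its inverted version keeps."""
    # Hypothetical helper; both lists keep the original input order.
    kept = [item for item in items if not should_exclude(item)]
    excluded = [item for item in items if should_exclude(item)]
    return kept, excluded


# Illustrative stand-in: each "peak" is represented only by its average match factor.
match_factors = [860.0, 510.0, 905.0, 590.0]
kept, excluded = split_by_filter(match_factors, lambda mf: mf < 600)
print(kept)  # [860.0, 905.0]
print(excluded)  # [510.0, 590.0]
```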

print_skip_reason(peak, reason)

Print the reason for skipping a peak, if ConsolidatedPeakFilter.verbose is True.

Parameters

peak (ConsolidatedPeak) – The peak being skipped.
reason (str) – The reason for skipping the peak.

class ConsolidatedSearchResult(name, cas, mf_list=[], rmf_list=[], hit_numbers=[], reference_data=None)

Represents a candidate compound for a peak.

This is determined from a set of SearchResults for a set of aligned peaks.

Parameters

name (str) – The name of the candidate compound.
cas (str) – The CAS number of the compound.
mf_list (List[int]) – List of Match Factors comparing the mass spectrum of the peak with the reference spectrum in each aligned peak. Will contain NaN where the compound was not in the hit list for a peak. Default [].
rmf_list (List[int]) – List of Reverse Match Factors comparing the reference spectrum with the spectrum for each aligned peak. Will contain NaN where the compound was not in the hit list for a peak. Default [].
hit_numbers (List[int]) – List of “hit” numbers from NIST MS Search. Lower is better. Will contain NaN where the compound was not in the hit list for a peak. Default [].
reference_data (Union[Dict, ReferenceData, None]) – The reference mass spectrum for the compound from the NIST library. Default None.

Methods:

`__len__`()	The number of aligned peaks the compound appeared in the hit list for.
`from_dict`(d)	Construct a `ConsolidatedSearchResult` from a dictionary.
`to_dict`()	Returns a dictionary representation of this `ConsolidatedSearchResult`.

Attributes:

`average_hit_number`	The average hit number.
`cas`	The CAS number of the compound.
`hit_number_stdev`	The standard deviation of the hit numbers.
`hit_numbers`	List of “hit” numbers from NIST MS Search.
`match_factor`	The average match factor.
`match_factor_stdev`	The standard deviation of the match factors.
`mf_list`	List of Match Factors comparing the mass spectrum of the peak with the reference spectrum in each aligned peak.
`name`	The name of the candidate compound.
`reference_data`	The reference mass spectrum for the compound from the NIST library.
`reverse_match_factor`	The average reverse match factor.
`reverse_match_factor_stdev`	The standard deviation of the reverse match factors.
`rmf_list`	List of Reverse Match Factors comparing the reference spectrum with the spectrum for each aligned peak.

__len__()

The number of aligned peaks the compound appeared in the hit list for.

Return type: int

property average_hit_number

The average hit number.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type: float

cas

Type: str

The CAS number of the compound.

classmethod from_dict(d)

Construct a ConsolidatedSearchResult from a dictionary.

Parameters: d (Mapping[str, Any])
Return type: ConsolidatedSearchResult

property hit_number_stdev

The standard deviation of the hit numbers.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type: float

hit_numbers

Type: List[int]

List of “hit” numbers from NIST MS Search.

Lower is better. Will contain NaN where the compound was not in the hit list for a peak.

property match_factor

The average match factor.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type: float

property match_factor_stdev

The standard deviation of the match factors.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type: float
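`match_factor`, `match_factor_stdev` and `average_hit_number` exclude the NaN placeholders for aligned peaks where the compound was not in the hit list, and `__len__` counts the aligned peaks where it did appear. A minimal sketch of that behaviour on plain lists follows; deriving the length from the non-NaN match factors and using the population standard deviation are both assumptions, and `present` is a hypothetical helper.

```python
import math
import statistics
from typing import List


def present(values: List[float]) -> List[float]:
    """Drop NaN placeholders for aligned peaks where the compound was not a hit."""
    return [v for v in values if not math.isnan(v)]


mf_list = [850.0, math.nan, 870.0, 860.0]  # illustrative; NaN = absent from that peak's hit list
hit_numbers = [1.0, math.nan, 2.0, 1.0]

appearances = len(present(mf_list))  # analogous to len(result)
match_factor = statistics.mean(present(mf_list))
match_factor_stdev = statistics.pstdev(present(mf_list))  # assumption: population stdev
average_hit_number = statistics.mean(present(hit_numbers))
print(appearances, match_factor, round(average_hit_number, 3))  # 3 860.0 1.333
```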

mf_list

Type: List[int]

List of Match Factors comparing the mass spectrum of the peak with the reference spectrum in each aligned peak.

Will contain NaN where the compound was not in the hit list for a peak.

name

Type: str

The name of the candidate compound.

reference_data

Type: Optional[ReferenceData]

The reference mass spectrum for the compound from the NIST library.

property reverse_match_factor

The average reverse match factor.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type: float

property reverse_match_factor_stdev

The standard deviation of the reverse match factors.

Missing values (where the compound was not in the hit list for a peak) are excluded from the calculation.

Return type: float

rmf_list

Type: List[int]

List of Reverse Match Factors comparing the reference spectrum with the spectrum for each aligned peak.

Will contain NaN where the compound was not in the hit list for a peak.

to_dict()

Returns a dictionary representation of this ConsolidatedSearchResult.

All keys are native, JSON-serializable, Python objects.

Return type: Dict[str, Any]

match_counter(engine, peak_numbers, qualified_peaks, ms_comp_data)

Find the most likely compound for each peak.

Parameters

engine (Engine)
peak_numbers (List[int]) – List of peak numbers to process.
qualified_peaks (List[List[QualifiedPeak]]) – List of lists of qualified aligned peaks for each repeat.
ms_comp_data (DataFrame) – Dataframe giving pairwise mass spectrum comparisons for each set of aligned peaks.

Return type

List[ConsolidatedPeak]
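The reference above does not say how `match_counter` decides which compound is most likely. One plausible approach, sketched here purely as an assumption, is to count how many repeats list each compound for a given aligned peak and break ties by the mean match factor; `rank_candidates` and its input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: for one aligned peak, each repeat's hit list as (compound name, match factor).
HitList = List[Tuple[str, int]]


def rank_candidates(hit_lists: List[HitList]) -> List[Tuple[str, int, float]]:
    """Hypothetical ranking: by number of repeats listing the compound, then mean match factor."""
    scores: Dict[str, List[int]] = defaultdict(list)
    for hits in hit_lists:
        for name, mf in hits:
            scores[name].append(mf)
    ranked = [(name, len(mfs), sum(mfs) / len(mfs)) for name, mfs in scores.items()]
    ranked.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return ranked


repeats = [
    [("Diphenylamine", 870), ("Carbazole", 700)],
    [("Diphenylamine", 850), ("Ethyl centralite", 720)],
    [("Carbazole", 760), ("Diphenylamine", 860)],
]
print(rank_candidates(repeats)[0])  # ('Diphenylamine', 3, 860.0)
```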

pairwise_ms_comparisons(alignment, parallel=True)

Between-samples spectra comparison: compute pairwise mass spectral similarity scores between samples for each aligned peak.

Parameters

alignment (Alignment)
parallel (bool) – Set to False to disable parallelisation. Default True.

Return type

DataFrame

Returns

pandas.DataFrame where the columns are pairwise spectrum similarity scores and the rows are the peaks.
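As an illustration of the peaks-by-pairs layout described above, the sketch below scores every pair of samples for each aligned peak. It substitutes cosine similarity on intensity vectors sampled on a shared m/z grid for whatever spectral similarity measure the library uses, and the column names are hypothetical. Passing the rows to `pandas.DataFrame` would give a table with one row per peak and one column per pair.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in score: cosine similarity of two intensity vectors on the same m/z grid."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else math.nan


def pairwise_scores(peaks: List[Dict[str, Sequence[float]]]) -> List[Dict[str, float]]:
    """One row per aligned peak; one column per pair of samples (hypothetical column names)."""
    rows = []
    for spectra in peaks:
        row: Dict[str, float] = {}
        for left, right in combinations(sorted(spectra), 2):
            row[f"{left}-{right}"] = cosine_similarity(spectra[left], spectra[right])
        rows.append(row)
    return rows


aligned_peaks = [
    {"r1": [100.0, 50.0, 0.0], "r2": [100.0, 50.0, 0.0], "r3": [0.0, 0.0, 10.0]},
]
row = pairwise_scores(aligned_peaks)[0]
print({pair: round(score, 3) for pair, score in row.items()})
# {'r1-r2': 1.0, 'r1-r3': 0.0, 'r2-r3': 0.0}
```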
