Transform

transform.py

This module contains functions that transform matrix inputs into other forms for use by the larger functions that call them. These functions focus mainly on overlapping repeated structures and annotation markers.

The module contains the following functions:

remove_overlaps
Removes every group of repeats that share a length and annotation marker if at least one pair of repeats in that group overlaps in time.
__create_anno_remove_overlaps
Turns rows of repeats into marked rows with annotation markers for the start indices and zeroes otherwise. After removing the annotations that have overlaps, the function creates separate arrays for annotations with overlaps and annotations without overlaps. Finally, the annotation markers are checked and fixed if necessary.
__separate_anno_markers
Expands a vector of non-overlapping repeats into a matrix representation. The matrix representation is a visual record of where all of the repeats in a song start and end.

repytah.transform.remove_overlaps(input_mat, song_length)

Removes every combination of repeat length and annotation marker for which at least one pair of repeats overlaps in time.

Parameters

input_mat (np.ndarray[int]) – List of pairs of repeats with annotations marked. The first two columns refer to the first repeat of the pair, the second two refer to the second repeat of the pair, the fifth column refers to the length of the repeats, and the sixth column contains the annotation markers.
song_length (int) – Number of audio shingles.
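
As an illustration of the column layout described above, the following is a minimal sketch with made-up values. It assumes 1-based time steps and inclusive end points (so end - start + 1 equals the repeat length); both conventions are assumptions made for this example, not something the parameter description states.

```python
import numpy as np

# Hypothetical input: each row is one pair of repeats.
# Columns: start_1, end_1, start_2, end_2, length, annotation marker
input_mat = np.array([
    [1, 4, 11, 14, 4, 1],
    [4, 7, 14, 17, 4, 1],
    [2, 3, 12, 13, 2, 1],
])
song_length = 20  # number of audio shingles (made-up value)

first_repeat = input_mat[:, 0:2]   # columns 1-2: first repeat of each pair
second_repeat = input_mat[:, 2:4]  # columns 3-4: second repeat of each pair
lengths = input_mat[:, 4]          # column 5: repeat length
annotations = input_mat[:, 5]      # column 6: annotation marker

# Under the assumed inclusive convention, each span matches its length.
spans_ok = bool(
    np.all(first_repeat[:, 1] - first_repeat[:, 0] + 1 == lengths)
    and np.all(second_repeat[:, 1] - second_repeat[:, 0] + 1 == lengths)
)
```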

Returns

A tuple (lst_no_overlaps, matrix_no_overlaps, key_no_overlaps, annotations_no_overlaps, all_overlap_lst). All variables have data type np.ndarray[int].

lst_no_overlaps is a list of pairs of repeats with annotations marked where all the repeats of a given length and with a specific annotation marker do not overlap in time.

matrix_no_overlaps is a matrix representation of lst_no_overlaps with one row for each group of repeats.

key_no_overlaps is a vector containing the lengths of the repeats encoded in each row of matrix_no_overlaps.

annotations_no_overlaps is a vector containing the annotation markers of the repeats encoded in each row of matrix_no_overlaps.

all_overlap_lst is a list of pairs of repeats where, for each combination of repeat length and annotation marker, at least one pair of repeats overlaps in time.
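
The split between lst_no_overlaps and all_overlap_lst can be sketched as follows. This is a simplified illustration, not the library's implementation: the helper names group_has_overlap and split_by_overlap are hypothetical, the sketch assumes inclusive end points, and it does not rebuild the matrix, key, or annotation outputs.

```python
import numpy as np


def group_has_overlap(rows):
    """Return True if any two repeats in one (length, annotation) group overlap."""
    length = rows[0, 4]
    # Collect the distinct start indices of every repeat in the group.
    starts = np.unique(np.concatenate([rows[:, 0], rows[:, 2]]))
    # Repeats of equal length overlap when two starts are closer than the length.
    return bool(np.any(np.diff(starts) < length))


def split_by_overlap(input_mat):
    """Split rows into (no_overlaps, overlaps), grouping by columns 5 and 6."""
    keep, drop = [], []
    for length, anno in np.unique(input_mat[:, 4:6], axis=0):
        mask = (input_mat[:, 4] == length) & (input_mat[:, 5] == anno)
        rows = input_mat[mask]
        (drop if group_has_overlap(rows) else keep).append(rows)
    empty = np.empty((0, input_mat.shape[1]), dtype=input_mat.dtype)
    no_overlaps = np.vstack(keep) if keep else empty
    overlaps = np.vstack(drop) if drop else empty
    return no_overlaps, overlaps


# Made-up example: the two length-4 pairs share time step 4, so that group overlaps.
input_mat = np.array([
    [1, 4, 11, 14, 4, 1],
    [4, 7, 14, 17, 4, 1],
    [2, 3, 12, 13, 2, 1],
])
no_overlaps, overlaps = split_by_overlap(input_mat)
```

In this example the whole length-4 group lands in overlaps, because a single overlapping pair is enough to remove every repeat with that length and annotation marker.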

repytah.transform.__create_anno_remove_overlaps(k_mat, song_length, band_width)

Turns k_mat into marked rows with annotation markers for the start indices and zeroes otherwise. After removing the annotations that have overlaps, the function outputs k_lst_out, which contains only rows that have no overlaps, and places the rows whose annotations do have overlaps in overlap_lst. Lastly, it checks whether the proper sequence of annotation markers was given and fixes it if necessary.

Parameters

k_mat (np.ndarray) – List of pairs of repeats of length band_width with annotations marked. The first two columns refer to the first repeat of the pair, the second two refer to the second repeat of the pair, the fifth column refers to the length of the repeats, and the sixth column contains the annotation markers.
song_length (int) – Number of audio shingles.
band_width (int) – Length of repeats encoded in k_mat.

Returns

A tuple (pattern_row, k_lst_out, overlap_lst) where all variables have data type np.ndarray.

pattern_row marks where non-overlapping repeats occur, marking start indices with annotation markers and 0’s otherwise.

k_lst_out is a list of pairs of repeats of length band_width that contain no overlapping repeats with annotations marked.

overlap_lst is a list of pairs of repeats of length band_width that contain overlapping repeats with annotations marked.
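
To make the shape of pattern_row concrete, here is a minimal sketch that marks start indices with annotation markers. The helper name mark_starts is hypothetical, the input values are made up, and the conversion from 1-based time steps to 0-based array positions is an assumption; the sketch skips the overlap removal and marker-repair steps described above.

```python
import numpy as np


def mark_starts(k_lst, song_length):
    """Build a row vector with annotation markers at repeat starts, 0 elsewhere."""
    pattern_row = np.zeros(song_length, dtype=int)
    for start_1, _, start_2, _, _, anno in k_lst:
        # Assumed convention: time steps are 1-based, array positions 0-based.
        pattern_row[start_1 - 1] = anno
        pattern_row[start_2 - 1] = anno
    return pattern_row


# Made-up non-overlapping pairs of length 2 with annotation markers 1 and 2.
k_lst = np.array([
    [2, 3, 12, 13, 2, 1],
    [5, 6, 16, 17, 2, 2],
])
pattern_row = mark_starts(k_lst, song_length=20)
marked_positions = np.flatnonzero(pattern_row)
```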

repytah.transform.__separate_anno_markers(k_mat, song_length, band_width, pattern_row)

Expands pattern_row, a row vector that marks where non-overlapping repeats occur, into a matrix representation (an np.ndarray). The array has twice as many rows as there are pairs of repeats and song_length columns. k_mat provides a list of annotation markers that is used to separate the repeats of length band_width into individual rows. Each row marks the start and end time steps of a repeat with 1’s and has 0’s otherwise. The array is a visual record of where all of the repeats in a song start and end.

Parameters

k_mat (np.ndarray) – List of pairs of repeats of length band_width with annotations marked. The first two columns refer to the start and end time steps of the first repeat of the pair, the second two refer to the start and end time steps of the second repeat of the pair, the fifth column refers to the length of the repeats, and the sixth column contains the annotation markers. The sixth column is indexed to obtain the list of annotation markers.
song_length (int) – Number of audio shingles.
band_width (int) – Length of repeats encoded in k_mat.
pattern_row (np.ndarray) – Row vector of the length of the song that marks where non-overlapping repeats occur with the repeats’ corresponding annotation markers and 0’s otherwise.

Returns

A tuple (pattern_mat, pattern_key, anno_id_lst) where all variables have data type np.ndarray.

pattern_mat is a matrix representation where each row contains a group of repeats marked.

pattern_key is a column vector containing the lengths of the repeats encoded in each row of pattern_mat.

anno_id_lst is a column vector containing the annotation markers of the repeats encoded in each row of pattern_mat.
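
The expansion from a marked row vector into a matrix can be sketched as below. This is an illustration under stated assumptions rather than the library's method: the helper name separate_markers is hypothetical, the sketch builds one row per annotation marker (following the "group of repeats" wording in the Returns section), and it marks only start indices, so the exact row count and end-step marking of the real function are not reproduced.

```python
import numpy as np


def separate_markers(pattern_row, band_width):
    """Expand a marked row vector into one binary row per annotation marker."""
    markers = np.unique(pattern_row[pattern_row > 0])
    pattern_mat = np.zeros((markers.size, pattern_row.size), dtype=int)
    for i, marker in enumerate(markers):
        # Put a 1 wherever this marker appears in the row vector.
        pattern_mat[i, pattern_row == marker] = 1
    # Column vectors: repeat length and annotation marker for each row.
    pattern_key = np.full((markers.size, 1), band_width, dtype=int)
    anno_id_lst = markers.reshape(-1, 1)
    return pattern_mat, pattern_key, anno_id_lst


# Made-up row vector for a song of 20 shingles with markers 1 and 2.
pattern_row = np.zeros(20, dtype=int)
pattern_row[[1, 11]] = 1
pattern_row[[4, 15]] = 2

pattern_mat, pattern_key, anno_id_lst = separate_markers(pattern_row, band_width=2)
```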