Search
search.py
This module holds the functions that find and record the diagonals in the thresholded matrix, T, and prepare those diagonals for the later transformation and assembly steps. The module contains the following functions:
- find_complete_list
Finds all smaller diagonals (and the associated pairs of repeats) that are contained in pair_list, which is composed of larger diagonals found in find_initial_repeats.
- __find_add_rows
Finds pairs of repeated structures, represented as diagonals of a certain length, k, that start at the same time step, end at the same time step, or neither start nor end at the same time step as previously found pairs of repeated structures of the same length.
- find_all_repeats
Finds all the diagonals present in thresh_mat. This function is nearly identical to find_initial_repeats except for two crucial differences. First, we do not remove diagonals after we find them. Second, there is no smallest bandwidth size as we are looking for all diagonals.
- find_complete_list_anno_only
Finds annotations for all pairs of repeats found in find_all_repeats. The resulting list contains all the pairs of repeated structures with their starting/ending indices and lengths.
- repytah.search.find_complete_list(pair_list, song_length)
Finds all smaller diagonals (and the associated pairs of repeats) that are contained in pair_list, which is composed of larger diagonals found in find_initial_repeats.
- Parameters
pair_list (np.ndarray) – List of pairs of repeats found in earlier steps (bandwidths MUST be in ascending order). If pair_list was produced by find_initial_repeats, it is already ordered correctly.
song_length (int) – Song length, which is the number of audio shingles.
- Returns
List of pairs of repeats with smaller repeats added.
- Return type
final_lst (np.ndarray)
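The ordering requirement on pair_list can be checked, and restored if needed, before calling find_complete_list. The following is a minimal sketch, not part of repytah: it assumes each row has the form [start1, end1, start2, end2, length] with the bandwidth in the last column, uses plain Python lists in place of np.ndarray, and the helper names is_ascending and sort_by_bandwidth are hypothetical.

```python
def is_ascending(pair_list):
    """Return True if the bandwidths (last column) never decrease."""
    lengths = [row[-1] for row in pair_list]
    return all(a <= b for a, b in zip(lengths, lengths[1:]))


def sort_by_bandwidth(pair_list):
    """Return a new list of rows ordered by ascending bandwidth.

    sorted() is stable, so rows with equal bandwidth keep their
    original relative order.
    """
    return sorted(pair_list, key=lambda row: row[-1])


# Hypothetical rows: [start1, end1, start2, end2, length]
pairs = [
    [6, 8, 16, 18, 3],
    [2, 2, 9, 9, 1],
    [1, 2, 11, 12, 2],
]

if not is_ascending(pairs):
    pairs = sort_by_bandwidth(pairs)
```

With an actual np.ndarray the same check would be applied to the last column of the array; the column layout remains an assumption here.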
- repytah.search.__find_add_rows(lst_no_anno, check_inds, k)
Finds pairs of repeated structures, represented as diagonals of a certain length, k, that start at the same time step, end at the same time step, or neither start nor end at the same time step as previously found pairs of repeated structures of the same length.
- Parameters
lst_no_anno (np.ndarray) – List of pairs of repeats.
check_inds (np.ndarray) – List of starting indices for repeats of length k that we use to check lst_no_anno for more repeats of length k.
k (int) – Length of repeats that we are looking for.
- Returns
List of newly found pairs of repeats of length k that are contained in larger repeats in lst_no_anno.
- Return type
add_rows (np.ndarray)
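The three cases named in the description (same start, same end, neither) can be illustrated with a small sketch. This is not the library's implementation: it assumes a row layout of [start1, end1, start2, end2, length], handles a single larger pair and a single starting index, and the function name contained_pair is hypothetical.

```python
def contained_pair(large_pair, check_ind, k):
    """Locate a length-k repeat inside a larger pair of repeats.

    large_pair is assumed to be [start1, end1, start2, end2, length].
    check_ind is the starting index of a known repeat of length k.
    Returns (case, row), or None if the length-k repeat does not fit
    strictly inside the first copy of the larger repeat.
    """
    s1, e1, s2, e2, length = large_pair
    offset = check_ind - s1

    # The smaller repeat must be shorter and lie within the larger one.
    if k >= length or offset < 0 or offset + k > length:
        return None

    if offset == 0:
        case = "same start"
    elif offset + k == length:
        case = "same end"
    else:
        case = "neither"

    # Shift the second copy by the same offset as the first.
    row = [check_ind, check_ind + k - 1,
           s2 + offset, s2 + offset + k - 1, k]
    return case, row


large = [1, 10, 21, 30, 10]
starts_together = contained_pair(large, 1, 3)
ends_together = contained_pair(large, 8, 3)
in_the_middle = contained_pair(large, 4, 3)
```

The sketch only derives the length-k pair itself; how any leftover pieces of the larger repeat are recorded is left out.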
- repytah.search.find_all_repeats(thresh_mat, bw_vec)
Finds all the diagonals present in thresh_mat. This function is nearly identical to find_initial_repeats, with two crucial differences. First, we do not remove diagonals after we find them. Second, there is no smallest bandwidth size as we are looking for all diagonals.
- Parameters
thresh_mat (np.ndarray) – Thresholded matrix that we extract diagonals from.
bw_vec (np.ndarray) – Vector of lengths of diagonals to be found. Should be 1, 2, 3, …, n, where n is the number of time steps.
- Returns
Pairs of repeats that correspond to diagonals in thresh_mat.
- Return type
all_lst (np.ndarray)
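The idea of collecting every diagonal of every length, without removing diagonals once they are found, can be sketched in plain Python. This is an illustrative brute-force version under stated assumptions, not the repytah implementation: it searches only the strict upper triangle, uses 0-based indices, represents the matrix as nested lists, emits rows as [start1, end1, start2, end2, length], and the function name find_diagonals is hypothetical.

```python
def find_diagonals(thresh_mat, bw_vec):
    """Collect every run of k consecutive ones along a diagonal.

    thresh_mat is a square binary matrix given as nested lists.
    bw_vec lists the diagonal lengths k to look for.
    Nothing is removed from thresh_mat, so a long diagonal also
    contributes all of its shorter sub-diagonals.
    """
    n = len(thresh_mat)
    rows = []
    for k in bw_vec:
        for i in range(n - k + 1):
            # j > i restricts the search to the strict upper triangle.
            for j in range(i + 1, n - k + 1):
                if all(thresh_mat[i + d][j + d] for d in range(k)):
                    rows.append([i, i + k - 1, j, j + k - 1, k])
    return rows


thresh_mat = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
n = len(thresh_mat)
bw_vec = list(range(1, n + 1))  # 1, 2, ..., n as described above

all_rows = find_diagonals(thresh_mat, bw_vec)
# all_rows == [[0, 0, 2, 2, 1], [1, 1, 3, 3, 1], [0, 1, 2, 3, 2]]
```

Because no diagonal is removed, the length-2 diagonal starting at (0, 2) appears alongside the two length-1 entries it contains.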
- repytah.search.find_complete_list_anno_only(pair_list, song_length)
Finds annotations for all pairs of repeats found in find_all_repeats. The resulting list contains all the pairs of repeated structures with their starting/ending indices and lengths.
- Parameters
pair_list (np.ndarray) – List of pairs of repeats. WARNING: Bandwidths must be in ascending order.
song_length (int) – Number of audio shingles in song.
- Returns
List of pairs of repeats with smaller repeats added and with annotation markers.
- Return type
out_lst (np.ndarray)