Search

search.py

This module holds functions used to find and record the diagonals in the thresholded matrix, T. These functions prepare the diagonals for later transformation and assembly. The module contains the following functions:

  • find_complete_list

    Finds all smaller diagonals (and the associated pairs of repeats) that are contained in pair_list, which is composed of larger diagonals found in find_initial_repeats.

  • __find_add_rows

    Finds pairs of repeated structures, represented as diagonals of a certain length, k, that start at the same time step as, end at the same time step as, or neither start nor end at the same time step as previously found pairs of repeated structures of the same length.

  • find_all_repeats

    Finds all the diagonals present in thresh_mat. This function is nearly identical to find_initial_repeats except for two crucial differences. First, we do not remove diagonals after we find them. Second, there is no smallest bandwidth size as we are looking for all diagonals.

  • find_complete_list_anno_only

    Finds annotations for all pairs of repeats found in find_all_repeats. The resulting list contains all the pairs of repeated structures with their starting/ending indices and lengths.

repytah.search.find_complete_list(pair_list, song_length)

Finds all smaller diagonals (and the associated pairs of repeats) that are contained in pair_list, which is composed of larger diagonals found in find_initial_repeats.

Parameters
  • pair_list (np.ndarray) – List of pairs of repeats found in earlier steps (bandwidths MUST be in ascending order). If you have run find_initial_repeats before this function, then pair_list will already be ordered correctly.

  • song_length (int) – Song length, which is the number of audio shingles.

Returns

List of pairs of repeats with smaller repeats added.

Return type

final_lst (np.ndarray)
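Because find_complete_list requires bandwidths in ascending order, it can help to check or enforce that ordering before calling it. The sketch below assumes that each row of pair_list has the layout [start_i, end_i, start_j, end_j, length], so the bandwidth sits in column 4; that layout and the helper names sort_by_bandwidth and bandwidths_ascending are assumptions for illustration, not part of the documented API.

```python
import numpy as np


def sort_by_bandwidth(pair_list: np.ndarray) -> np.ndarray:
    """Return pair_list with rows ordered by ascending bandwidth.

    Hypothetical helper. Assumes column 4 holds the repeat length,
    i.e. rows look like [start_i, end_i, start_j, end_j, length].
    """
    # A stable sort keeps rows of equal length in their original order.
    order = np.argsort(pair_list[:, 4], kind="stable")
    return pair_list[order]


def bandwidths_ascending(pair_list: np.ndarray) -> bool:
    """Check the ordering precondition without modifying the array."""
    return bool(np.all(np.diff(pair_list[:, 4]) >= 0))


# Made-up example rows, deliberately out of order.
pairs = np.array([
    [1, 4, 9, 12, 4],
    [2, 3, 6, 7, 2],
    [1, 1, 5, 5, 1],
])
print(bandwidths_ascending(pairs))         # False
sorted_pairs = sort_by_bandwidth(pairs)
print(sorted_pairs[:, 4])                  # [1 2 4]
print(bandwidths_ascending(sorted_pairs))  # True
```

A stable sort is used so that rows sharing a bandwidth keep whatever relative order earlier steps gave them.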

repytah.search.__find_add_rows(lst_no_anno, check_inds, k)

Finds pairs of repeated structures, represented as diagonals of a certain length, k, that start at the same time step as, end at the same time step as, or neither start nor end at the same time step as previously found pairs of repeated structures of the same length.

Parameters
  • lst_no_anno (np.ndarray) – List of pairs of repeats.

  • check_inds (np.ndarray) – List of starting indices for repeats of length k that we use to check lst_no_anno for more repeats of length k.

  • k (int) – Length of repeats that we are looking for.

Returns

List of newly found pairs of repeats of length k that are contained in larger repeats in lst_no_anno.

Return type

add_rows (np.ndarray)
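The three cases in the description above (the length-k piece starts where the larger repeat starts, ends where it ends, or sits strictly inside it) can be illustrated by splitting a single larger pair of repeats. The sketch below is a hypothetical helper, split_pair, and not repytah's implementation; it assumes rows of the form [start_i, end_i, start_j, end_j, length] with 1-indexed, inclusive time steps, and it returns the length-k piece together with any leftover pieces on either side.

```python
import numpy as np


def split_pair(row, start, k):
    """Split one pair of repeats around a length-k sub-repeat.

    Hypothetical helper: `row` is assumed to be
    [start_i, end_i, start_j, end_j, length] with 1-indexed, inclusive
    time steps, and `start` is where the length-k piece begins inside
    repeat i. Returns the pieces as rows in the same layout.
    """
    s_i, e_i, s_j, e_j, length = (int(x) for x in row)
    if k >= length or not (s_i <= start and start + k - 1 <= e_i):
        raise ValueError("piece must lie strictly inside the larger repeat")
    offset = start - s_i  # the same shift applies to repeat j
    end = start + k - 1
    pieces = []
    if start > s_i:  # left remainder: piece does not start at s_i
        pieces.append([s_i, start - 1, s_j, s_j + offset - 1, offset])
    # The length-k piece itself, mirrored into repeat j.
    pieces.append([start, end, s_j + offset, s_j + offset + k - 1, k])
    if end < e_i:  # right remainder: piece does not end at e_i
        pieces.append([end + 1, e_i, s_j + offset + k, e_j, e_i - end])
    return np.array(pieces)


row = np.array([1, 10, 21, 30, 10])
print(split_pair(row, 1, 3).tolist())  # starts together: [[1, 3, 21, 23, 3], [4, 10, 24, 30, 7]]
print(split_pair(row, 8, 3).tolist())  # ends together: [[1, 7, 21, 27, 7], [8, 10, 28, 30, 3]]
print(split_pair(row, 4, 3).tolist())  # middle: [[1, 3, 21, 23, 3], [4, 6, 24, 26, 3], [7, 10, 27, 30, 4]]
```

Every piece shifts repeat j by the same offset as repeat i, which keeps each piece on the same diagonal as the larger repeat it came from.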

repytah.search.find_all_repeats(thresh_mat, bw_vec)

Finds all the diagonals present in thresh_mat. This function is nearly identical to find_initial_repeats, with two crucial differences. First, we do not remove diagonals after we find them. Second, there is no smallest bandwidth size as we are looking for all diagonals.

Parameters
  • thresh_mat (np.ndarray) – Thresholded matrix that we extract diagonals from.

  • bw_vec (np.ndarray) – Vector of lengths of diagonals to be found. Should be 1, 2, 3, …, n, where n is the number of time steps.

Returns

Pairs of repeats that correspond to diagonals in thresh_mat.

Return type

all_lst (np.ndarray)
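To make "finding diagonals without removing them" concrete, the sketch below scans every diagonal above the main diagonal of a binary thresholded matrix and reports each maximal run of 1s as a pair of repeats. It is a simplified stand-in, not necessarily how find_all_repeats works internally (the documented function is driven by bw_vec, one bandwidth at a time). The helper name diagonal_runs and the output layout [start_i, end_i, start_j, end_j, length] with 1-indexed, inclusive time steps are assumptions.

```python
import numpy as np


def diagonal_runs(thresh_mat: np.ndarray) -> np.ndarray:
    """List maximal runs of 1s along the diagonals above the main diagonal.

    Hypothetical, simplified stand-in for diagonal extraction. Each run is
    reported as [start_i, end_i, start_j, end_j, length] with 1-indexed,
    inclusive time steps. Nothing is removed from the matrix after a run
    is found.
    """
    n = thresh_mat.shape[0]
    runs = []
    for offset in range(1, n):  # skip the main diagonal (self-matches)
        diag = np.diagonal(thresh_mat, offset=offset)
        # Pad with zeros so every run has a detectable start and end.
        padded = np.concatenate(([0], (diag > 0).astype(int), [0]))
        edges = np.diff(padded)
        starts = np.flatnonzero(edges == 1)     # 0-indexed position on diag
        ends = np.flatnonzero(edges == -1) - 1  # inclusive end position
        for s, e in zip(starts, ends):
            # Position p on this diagonal is cell (p, p + offset).
            runs.append([s + 1, e + 1, s + offset + 1, e + offset + 1, e - s + 1])
    return np.array(runs, dtype=int).reshape(-1, 5)


# Made-up symmetric 6 x 6 matrix with two off-diagonal runs.
T = np.eye(6, dtype=int)
for p in range(3):
    T[p, p + 3] = T[p + 3, p] = 1  # length-3 run at offset 3
T[3, 4] = T[4, 3] = 1              # length-1 run at offset 1

bw_vec = np.arange(1, T.shape[0] + 1)  # 1, 2, ..., n as described for bw_vec
print(bw_vec)                          # [1 2 3 4 5 6]
print(diagonal_runs(T).tolist())       # [[4, 4, 5, 5, 1], [1, 3, 4, 6, 3]]
```

Only the upper triangle is scanned because a thresholded self-similarity matrix is typically symmetric, so each diagonal below the main diagonal mirrors one above it.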

repytah.search.find_complete_list_anno_only(pair_list, song_length)

Finds annotations for all pairs of repeats found in find_all_repeats. The resulting list contains all the pairs of repeated structures with their starting/ending indices and lengths.

Parameters
  • pair_list (np.ndarray) – List of pairs of repeats. WARNING: Bandwidths must be in ascending order.

  • song_length (int) – Number of audio shingles in song.

Returns

List of pairs of repeats with smaller repeats added and with annotation markers.

Return type

out_lst (np.ndarray)
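When inspecting out_lst by hand, it can be easier to read each row as a named record. The sketch below assumes each row has six columns, [start_i, end_i, start_j, end_j, length, anno], with the annotation marker last; that layout, the RepeatPair field names, and the helpers to_records and group_by_length are assumptions for illustration. The example array is made up, and its annotation values are arbitrary placeholders.

```python
from typing import Dict, List, NamedTuple

import numpy as np


class RepeatPair(NamedTuple):
    """One row of an annotated pair list (hypothetical field names)."""
    start_i: int
    end_i: int
    start_j: int
    end_j: int
    length: int
    anno: int


def to_records(out_lst: np.ndarray) -> List[RepeatPair]:
    """Convert an (m, 6) array into named records for readable inspection."""
    if out_lst.ndim != 2 or out_lst.shape[1] != 6:
        raise ValueError("expected a 2-D array with six columns")
    return [RepeatPair(*(int(v) for v in row)) for row in out_lst]


def group_by_length(records: List[RepeatPair]) -> Dict[int, List[RepeatPair]]:
    """Group records by repeat length, preserving their order within each group."""
    groups: Dict[int, List[RepeatPair]] = {}
    for r in records:
        groups.setdefault(r.length, []).append(r)
    return groups


out = np.array([
    [1, 2, 5, 6, 2, 1],
    [3, 4, 7, 8, 2, 2],
    [1, 4, 5, 8, 4, 1],
])
records = to_records(out)
print(records[2].length)                 # 4
print(sorted(group_by_length(records)))  # [2, 4]
```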