Assemble

assemble.py

This module finds and forms essential structure components, which are the smallest building blocks that form every repeat in the song.

These functions ensure that each time step of a song is contained in at most one of the song’s essential structure components by checking that no repeats overlap in time. When repeats overlap, they are divided repeatedly until only non-overlapping pieces remain.

The module contains the following functions:

  • breakup_overlaps_by_intersect

    Distills the repeats encoded in input_pattern_obj, which holds the starting indices of the repeats, into the essential structure components using bw_vec, which holds the length of each repeat.

  • check_overlaps

    Compares every pair of groups, determining if there are any repeats in any pairs of the groups that overlap.

  • __compare_and_cut

    Compares two rows of repeats labeled RED and BLUE, and determines if there are any overlaps in time between them. If there are overlaps, we cut the repeats in RED and BLUE into up to 3 pieces.

  • __num_of_parts

    Determines the number of blocks of consecutive time steps in a list of time steps. A block of consecutive time steps represents a distilled section of a repeat.

  • __inds_to_rows

    Expands a vector containing the starting indices of a piece or two of a repeat into a matrix representation recording when these pieces occur in the song with 1’s. All remaining entries are marked with 0’s.

  • __merge_based_on_length

    Merges repeats that are the same length, as set by full_bw, and are repeats of the same piece of structure.

  • __merge_rows

    Merges rows that have at least one common repeat. These common repeat(s) must occur at the same time step and be of a common length.

  • hierarchical_structure

    Distills the repeats encoded in matrix_no_overlaps (and key_no_overlaps) to the essential structure components and then builds the hierarchical representation. Optionally outputs visualizations of the hierarchical representations.

repytah.assemble.breakup_overlaps_by_intersect(input_pattern_obj, bw_vec, thresh_bw)

Distills the repeats encoded in input_pattern_obj, which holds the starting indices of the repeats, into the essential structure components using bw_vec, which holds the length of each repeat. The essential structure components are the smallest building blocks that form every repeat in the song.

Parameters
  • input_pattern_obj (np.ndarray) – Binary matrix with 1’s where repeats begin and 0’s otherwise.

  • bw_vec (np.ndarray) – Vector containing the lengths of the repeats encoded in input_pattern_obj.

  • thresh_bw (int) – One less than the smallest allowable repeat length.

Returns

A tuple (pattern_no_overlaps, pattern_no_overlaps_key) where all variables have data type np.ndarray.

pattern_no_overlaps is a binary matrix with 1’s where repeats of essential structure components begin.

pattern_no_overlaps_key is a vector containing the lengths of the repeats of essential structure components in pattern_no_overlaps.
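
To make the input encoding concrete, the following is a minimal NumPy sketch that counts how many repeats cover each time step, given a start matrix and a length vector in the format described above. It does not call repytah; coverage_per_step is a hypothetical helper, and the sketch assumes 0-based start indices and a one-dimensional bw_vec.

```python
import numpy as np

def coverage_per_step(pattern, bw_vec):
    """Hypothetical helper: count how many repeats cover each time step."""
    pattern = np.asarray(pattern)
    coverage = np.zeros(pattern.shape[1], dtype=int)
    for row, length in zip(pattern, bw_vec):
        # Each 1 in the row marks where a repeat of this length begins.
        for start in np.flatnonzero(row):
            coverage[start:start + length] += 1
    return coverage

input_pattern_obj = np.array([
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],   # repeats of length 4
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],   # repeats of length 3
])
bw_vec = np.array([4, 3])

coverage = coverage_per_step(input_pattern_obj, bw_vec)
has_overlaps = bool(np.any(coverage > 1))
```

Any time step with a count above 1 belongs to more than one repeat, which is the situation that the breakup into essential structure components is meant to resolve.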

repytah.assemble.check_overlaps(input_mat)

Compares every pair of repeat groups and determines if there are any repeats in any pairs of the groups that overlap.

Parameters

input_mat (np.ndarray) – Binary matrix with blocks of 1’s equal to the length of repeats to be checked for overlaps.

Returns

Logical array where (i,j) = 1 if row i of input_mat and row j of input_mat overlap and (i,j) = 0 elsewhere.

Return type

overlap_mat (np.ndarray)
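
The pairwise comparison can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the library's implementation: pairwise_overlaps is a hypothetical helper, and reporting each pair once in the strict upper triangle is a choice made for this example.

```python
import numpy as np

def pairwise_overlaps(input_mat):
    """Hypothetical helper: flag pairs of rows that share a time step."""
    mat = np.asarray(input_mat, dtype=int)
    # Entry (i, j) counts the time steps where rows i and j are both 1.
    shared = mat @ mat.T
    # Keep i < j only, so each pair appears once (an assumption of this sketch).
    return np.triu(shared > 0, k=1)

blocks = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],   # repeat covering steps 0-2
    [0, 0, 1, 1, 0, 0, 0, 0],   # overlaps row 0 at step 2
    [0, 0, 0, 0, 0, 1, 1, 0],   # overlaps nothing
])
overlap_mat = pairwise_overlaps(blocks)
```

Here only the pair (0, 1) is flagged, because rows 0 and 1 are both active at time step 2.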

repytah.assemble.__compare_and_cut(red, red_len, blue, blue_len)

Compares two rows of repeats labeled RED and BLUE, and determines if there are any overlaps in time between them. If there are, the repeats in RED and BLUE are cut into up to 3 pieces.

Parameters
  • red (np.ndarray) – Binary row vector encoding a set of repeats with 1’s where each repeat starts and 0’s otherwise.

  • red_len (np.ndarray) – Length of repeats encoded in red.

  • blue (np.ndarray) – Binary row vector encoding a set of repeats with 1’s where each repeat starts and 0’s otherwise.

  • blue_len (np.ndarray) – Length of repeats encoded in blue.

Returns

A tuple (union_mat, union_length) where all variables have data type np.ndarray.

union_mat is a binary matrix representation of up to three rows encoding non-overlapping repeats cut from red and blue.

union_length is a vector containing the lengths of the repeats encoded in union_mat.
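
The idea of cutting into up to three pieces can be sketched for a single pair of repeats using set operations on time steps. This is an illustration only: cut_pair is a hypothetical helper, it handles one repeat per color rather than full rows, and it assumes 0-based start indices.

```python
import numpy as np

def cut_pair(red_start, red_len, blue_start, blue_len):
    """Hypothetical helper: split two repeats into non-overlapping pieces."""
    red = np.arange(red_start, red_start + red_len)
    blue = np.arange(blue_start, blue_start + blue_len)
    pieces = [
        np.setdiff1d(red, blue),    # time steps only in red
        np.intersect1d(red, blue),  # time steps shared by both
        np.setdiff1d(blue, red),    # time steps only in blue
    ]
    # Drop empty pieces, so the result has at most three entries.
    return [p for p in pieces if p.size > 0]

pieces = cut_pair(red_start=0, red_len=5, blue_start=3, blue_len=4)
```

When one repeat sits strictly inside the other, the "only in" piece of the longer repeat consists of two separate blocks, which is why the number of consecutive blocks in a piece has to be determined afterwards.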

repytah.assemble.__num_of_parts(input_vec, input_start, input_all_starts)

Determines the number of blocks of consecutive time steps in a list of time steps. A block of consecutive time steps represents a distilled section of a repeat. This distilled section will be replicated and the starting indices of the repeats within it will be returned.

Parameters
  • input_vec (np.ndarray) – Vector containing one or two parts of a repeat that overlap in time and may need to be replicated.

  • input_start (np.ndarray) – Starting index for the part to be replicated.

  • input_all_starts (np.ndarray) – Starting indices for replication.

Returns

A tuple (start_mat, length_vec) where all variables have data type np.ndarray.

start_mat is an array of one or two rows containing the starting indices of the replicated repeats.

length_vec is a column vector containing the lengths of the replicated parts.
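
Counting blocks of consecutive time steps comes down to locating the gaps in a sorted list. The sketch below shows only that step, not the replication; count_consecutive_blocks is a hypothetical helper and assumes a non-empty, sorted input.

```python
import numpy as np

def count_consecutive_blocks(time_steps):
    """Hypothetical helper: split sorted time steps into consecutive runs."""
    steps = np.asarray(time_steps)
    # A new block starts wherever the gap to the previous step exceeds 1.
    breaks = np.flatnonzero(np.diff(steps) > 1) + 1
    blocks = np.split(steps, breaks)
    return len(blocks), [len(b) for b in blocks]

num_blocks, block_lengths = count_consecutive_blocks([3, 4, 5, 9, 10])
```

The time steps 3, 4, 5 and 9, 10 form two blocks, of lengths 3 and 2.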

repytah.assemble.__inds_to_rows(start_mat, row_length)

Expands a vector containing the starting indices of a piece or two of a repeat into a matrix representation recording when these pieces occur in the song with 1’s. All remaining entries are marked with 0’s.

Parameters
  • start_mat (np.ndarray) – Matrix of one or two rows, containing the starting indices.

  • row_length (int) – Length of the rows.

Returns

Binary matrix of one or two rows, with 1’s at the starting indices and 0’s otherwise.

Return type

new_mat (np.ndarray)
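
The expansion from starting indices to binary rows can be sketched as follows. starts_to_rows is a hypothetical helper, and the 0-based indexing is an assumption of this example rather than a statement about the library's convention.

```python
import numpy as np

def starts_to_rows(start_mat, row_length):
    """Hypothetical helper: mark each start index with a 1 in its row."""
    start_mat = np.atleast_2d(start_mat)
    new_mat = np.zeros((start_mat.shape[0], row_length), dtype=int)
    for row, starts in enumerate(start_mat):
        new_mat[row, starts] = 1   # assumes 0-based indices
    return new_mat

rows = starts_to_rows(np.array([[0, 4], [2, 6]]), row_length=8)
```

Each row of the input becomes one binary row of length row_length, with 1's only at that row's start indices.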

repytah.assemble.__merge_based_on_length(full_mat, full_bw, target_bw)

Merges repeats that are the same length, as set by full_bw, and are repeats of the same piece of structure.

Parameters
  • full_mat (np.ndarray) – Binary matrix with ones where repeats start and zeroes otherwise.

  • full_bw (np.ndarray) – Length of repeats encoded in full_mat.

  • target_bw (np.ndarray) – Lengths of repeats that we seek to merge.

Returns

A tuple (out_mat, one_length_vec) where all variables have data type np.ndarray.

out_mat is a binary matrix with 1’s where repeats start and 0’s otherwise with rows of full_mat merged if appropriate.

one_length_vec is a vector that contains the length of repeats encoded in out_mat.

repytah.assemble.__merge_rows(input_mat, input_width)

Merges rows that have at least one common repeat; said common repeat(s) must occur at the same time step and be of common length.

Parameters
  • input_mat (np.ndarray) – Binary matrix with ones where repeats start and zeroes otherwise.

  • input_width (int) – Length of repeats encoded in input_mat.

Returns

Binary matrix with ones where repeats start and zeroes otherwise.

Return type

merge_mat (np.ndarray)
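
One way to picture the merge is as a repeated union of rows that share a start. The sketch below is an illustration under that assumption, not the library's algorithm; merge_shared_rows is a hypothetical helper, and because all rows encode repeats of the same length, sharing a start is enough to identify a common repeat.

```python
import numpy as np

def merge_shared_rows(input_mat):
    """Hypothetical helper: union rows that share at least one start."""
    remaining = [row.astype(bool) for row in np.asarray(input_mat)]
    merged = []
    while remaining:
        current = remaining.pop(0)
        changed = True
        # Keep absorbing rows until no remaining row shares a start.
        while changed:
            changed = False
            for i in range(len(remaining) - 1, -1, -1):
                if np.any(current & remaining[i]):
                    current = current | remaining.pop(i)
                    changed = True
        merged.append(current.astype(int))
    return np.array(merged)

starts = np.array([
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 1],   # shares the start at step 3 with row 0
    [0, 1, 0, 0, 0, 0],   # shares nothing
])
merge_mat = merge_shared_rows(starts)
```

Rows 0 and 1 collapse into a single row with starts at steps 0, 3, and 5, while row 2 is left unchanged.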

repytah.assemble.hierarchical_structure(matrix_no_overlaps, key_no_overlaps, sn, vis=False)

Distills the repeats encoded in matrix_no_overlaps (and key_no_overlaps) to the essential structure components and then builds the hierarchical representation. Optionally shows visualizations of the hierarchical structure via the vis argument.

Parameters
  • matrix_no_overlaps (np.ndarray) – Binary matrix with 1’s where repeats begin and 0’s otherwise.

  • key_no_overlaps (np.ndarray) – Vector containing the lengths of the repeats encoded in matrix_no_overlaps.

  • sn (int) – Song length, which is the number of audio shingles.

  • vis (bool) – Shows visualizations if True (default = False).

Returns

A tuple (full_visualization, full_key, full_matrix_no_overlaps, full_anno_lst) where all variables have data type np.ndarray.

full_visualization is a binary matrix representation for full_matrix_no_overlaps with blocks of 1’s equal to the lengths prescribed in full_key.

full_key is a vector containing the lengths of the hierarchical structure encoded in full_matrix_no_overlaps.

full_matrix_no_overlaps is a binary matrix with 1’s where hierarchical structure begins and 0’s otherwise.

full_anno_lst is a vector containing the annotation markers of the hierarchical structure encoded in each row of full_matrix_no_overlaps.
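
The relationship between a start matrix, its key, and the block visualization can be sketched in a few lines. starts_to_blocks is a hypothetical helper written for this illustration, assuming 0-based start indices; it is not how the library necessarily builds full_visualization.

```python
import numpy as np

def starts_to_blocks(start_mat, key):
    """Hypothetical helper: paint a block of 1's after each start."""
    start_mat = np.asarray(start_mat)
    vis = np.zeros_like(start_mat)
    for row, length in enumerate(key):
        for start in np.flatnonzero(start_mat[row]):
            # Fill `length` time steps beginning at this start.
            vis[row, start:start + length] = 1
    return vis

start_mat = np.array([
    [1, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
])
key = np.array([3, 2])
full_vis = starts_to_blocks(start_mat, key)
```

Each 1 in the start matrix expands into a run of 1's whose length is given by the matching entry of the key.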