Skip to content

metadata_utils

Metadata Helpers Module.

Provides utility functions for parsing, cleaning, and processing music track metadata. These functions are designed to be independent of specific service instances and can be used across different parts of the application.

Functions:

Name Description
- parse_tracks

Parses raw AppleScript output into structured track dictionaries.

- group_tracks_by_artist

Groups track dictionaries by artist name.

- determine_dominant_genre_for_artist

Determines the most likely genre for an artist.

- remove_parentheses_with_keywords

Removes specified parenthetical content from strings.

- clean_names

Applies cleaning rules to track and album names.

- is_music_app_running

Checks if Music.app is currently running.

AppleScriptFieldIndex

Bases: IntEnum

Field positions in AppleScript output.

These are indices into the field array returned by AppleScript, NOT semantic names for TrackDict fields. The naming reflects what AppleScript actually provides at each position.

AppleScript field order (from fetch_tracks.applescript): {track_id, track_name, track_artist, album_artist, track_album, track_genre, date_added, modification_date, track_status, track_year, release_year, ""}

reset_cleaning_exceptions_log

reset_cleaning_exceptions_log()

Reset the set of logged cleaning exceptions (call at start of new run).

Source code in src/core/models/metadata_utils.py
def reset_cleaning_exceptions_log() -> None:
    """Reset the set of logged cleaning exceptions (call at start of new run)."""
    _logged_cleaning_exceptions.clear()

parse_tracks

parse_tracks(raw_data, error_logger)

Parse raw AppleScript output into a list of track dictionaries.

Uses the Record Separator (U+001E) as the field delimiter and Group Separator (U+001D) as the line delimiter.

Parameters:

Name Type Description Default
raw_data str

Raw string data from AppleScript.

required
error_logger Logger

Logger for error output.

required

Returns:

Type Description
list[TrackDict]

List of TrackDict instances parsed from the input data.

Source code in src/core/models/metadata_utils.py
def parse_tracks(raw_data: str, error_logger: logging.Logger) -> list[TrackDict]:
    """Parse raw AppleScript output into a list of track dictionaries.

    Uses the Record Separator (U+001E) as the field delimiter and
    Group Separator (U+001D) as the line delimiter.

    Args:
        raw_data: Raw string data from AppleScript.
        error_logger: Logger for error output.

    Returns:
        List of TrackDict instances parsed from the input data.

    """
    field_separator = FIELD_SEPARATOR if FIELD_SEPARATOR in raw_data else "\t"

    if not raw_data:
        error_logger.error("No data fetched from AppleScript.")
        return []

    error_logger.debug(
        "parse_tracks: Input raw_data (first 500 chars): %s...",
        raw_data[:500],
    )
    error_logger.debug(
        "parse_tracks: Using field separator %r, line separator %r",
        field_separator,
        LINE_SEPARATOR if field_separator == FIELD_SEPARATOR else "newline",
    )

    tracks: list[TrackDict] = []
    stripped = raw_data.strip()
    rows = split_applescript_rows(stripped, field_separator)

    for row in rows:
        if not row:  # Skip empty rows
            continue

        fields = row.split(field_separator)

        if len(fields) >= MIN_REQUIRED_FIELDS:
            track = _create_track_from_fields(fields)
            tracks.append(track)
        else:
            error_logger.warning("Malformed track data row skipped: %s", row)

    return tracks

group_tracks_by_artist

group_tracks_by_artist(tracks)

Group tracks by album_artist or normalized artist.

Uses album_artist when available to properly handle collaboration tracks. Falls back to normalized artist when album_artist is empty.

Parameters:

Name Type Description Default
tracks list[TrackDict]

List of track dictionaries to group.

required

Returns:

Type Description
dict[str, list[TrackDict]]

Dictionary mapping normalized artist names to lists of their tracks.

Source code in src/core/models/metadata_utils.py
def group_tracks_by_artist(
    tracks: list[TrackDict],
) -> dict[str, list[TrackDict]]:
    """Group tracks by album_artist or normalized artist.

    Uses album_artist when available to properly handle collaboration tracks.
    Falls back to normalized artist when album_artist is empty.

    Args:
        tracks: List of track dictionaries to group.

    Returns:
        Dictionary mapping normalized artist names to lists of their tracks.

    """
    artists: defaultdict[str, list[TrackDict]] = defaultdict(list)
    for track in tracks:
        # Prefer album_artist for grouping (handles collaborations correctly)
        album_artist = track.get("album_artist", "")

        # Fallback to normalized artist if album_artist is empty
        if not album_artist or not str(album_artist).strip():
            raw_artist = str(track.get("artist", "Unknown"))
            album_artist = normalize_collaboration_artist(raw_artist)

        if album_artist and isinstance(album_artist, str) and album_artist.strip():
            artist_key = normalize_for_matching(album_artist)
            artists[artist_key].append(track)
    return dict(artists)

determine_dominant_genre_for_artist

determine_dominant_genre_for_artist(
    artist_tracks, error_logger
)

Determine the dominant genre for an artist based on earliest album.

Algorithm
  1. Find the earliest track for each album (by date_added).
  2. Determine the earliest album (by date_added of its earliest track).
  3. Use the genre of the earliest track in that album as dominant.

Parameters:

Name Type Description Default
artist_tracks Sequence[TrackDict]

List of track dictionaries for a single artist.

required
error_logger Logger

Logger for error output.

required

Returns:

Type Description
str

The dominant genre string, or "Unknown" if no tracks or on error.

Source code in src/core/models/metadata_utils.py
def determine_dominant_genre_for_artist(
    artist_tracks: Sequence[TrackDict],
    error_logger: logging.Logger,
) -> str:
    """Determine the dominant genre for an artist based on earliest album.

    Algorithm:
        1. Find the earliest track for each album (by date_added).
        2. Determine the earliest album (by date_added of its earliest track).
        3. Use the genre of the earliest track in that album as dominant.

    Args:
        artist_tracks: List of track dictionaries for a single artist.
        error_logger: Logger for error output.

    Returns:
        The dominant genre string, or "Unknown" if no tracks or on error.

    """
    if not artist_tracks:
        return "Unknown"
    try:
        if album_earliest := _get_earliest_track_per_album(artist_tracks, error_logger):
            return (
                _get_genre_from_track(earliest_album_track)
                if (earliest_album_track := _get_earliest_track_across_albums(album_earliest, error_logger))
                else "Unknown"
            )
        return "Unknown"

    except (KeyError, TypeError, AttributeError) as e:  # Catch data structure errors
        error_logger.exception(
            "Error in determine_dominant_genre_for_artist: %s",
            e,
        )
        return "Unknown"

remove_parentheses_with_keywords

remove_parentheses_with_keywords(
    name, keywords, console_logger, error_logger
)

Remove any parenthetical segment that contains at least one of the given keywords.

This implementation uses balanced parentheses parsing to properly handle nested parentheses/brackets (e.g., "Album (Reissue (2024))" -> "Album"). It scans for opening brackets, finds the matching closing bracket, and removes the entire segment if it contains any of the keywords (case-insensitive).

Parameters:

Name Type Description Default
name str

The string to process

required
keywords list[str]

List of keywords to search for in parenthetical segments

required
console_logger Logger

Logger for debug output

required
error_logger Logger

Logger for error output

required

Returns:

Type Description
str

The cleaned string with matching parenthetical segments removed

Source code in src/core/models/metadata_utils.py
def remove_parentheses_with_keywords(
    name: str,
    keywords: list[str],
    console_logger: logging.Logger,
    error_logger: logging.Logger,
) -> str:
    """Remove any parenthetical segment that contains at least one of the given keywords.

    This implementation uses balanced parentheses parsing to properly handle nested
    parentheses/brackets (e.g., "Album (Reissue (2024))" -> "Album"). It scans for
    opening brackets, finds the matching closing bracket, and removes the entire
    segment if it contains any of the keywords (case-insensitive).

    Args:
        name: The string to process
        keywords: List of keywords to search for in parenthetical segments
        console_logger: Logger for debug output
        error_logger: Logger for error output

    Returns:
        The cleaned string with matching parenthetical segments removed

    """
    if not name or not keywords:
        return name

    try:
        return _clean_text_segments(name, keywords, console_logger)
    except (ValueError, TypeError, AttributeError) as e:
        error_logger.exception(
            "Error processing '%s' with keywords %s: %s",
            name,
            keywords,
            e,
        )
        return name  # Return original on error

clean_names

clean_names(
    artist,
    track_name,
    album_name,
    *,
    config,
    console_logger,
    error_logger
)

Clean track and album names per config: remove remaster segments and suffixes.

Skips artist+album pairs in exceptions list. Collapses excess whitespace.

Parameters:

Name Type Description Default
artist str

Artist name (used for exception checking).

required
track_name str

Raw track title to clean.

required
album_name str

Raw album title to clean.

required
config AppConfig

Typed application configuration.

required
console_logger Logger

Logger for debug/info output.

required
error_logger Logger

Logger for error reporting.

required

Returns:

Type Description
tuple[str, str]

Tuple of (cleaned_track_name, cleaned_album_name).

Source code in src/core/models/metadata_utils.py
def clean_names(
    artist: str,
    track_name: str,
    album_name: str,
    *,
    config: AppConfig,
    console_logger: logging.Logger,
    error_logger: logging.Logger,
) -> tuple[str, str]:
    """Clean track and album names per config: remove remaster segments and suffixes.

    Skips artist+album pairs in exceptions list. Collapses excess whitespace.

    Args:
        artist: Artist name (used for exception checking).
        track_name: Raw track title to clean.
        album_name: Raw album title to clean.
        config: Typed application configuration.
        console_logger: Logger for debug/info output.
        error_logger: Logger for error reporting.

    Returns:
        Tuple of (cleaned_track_name, cleaned_album_name).

    """
    if track_name or album_name:
        console_logger.debug(
            "clean_names called with: artist='%s', track_name='%s', album_name='%s'",
            artist,
            track_name,
            album_name,
        )

    exceptions_list = [{"artist": exc.artist, "album": exc.album} for exc in config.exceptions.track_cleaning]
    if _is_cleaning_exception(artist, album_name, exceptions_list):
        # Log only once per artist/album combination
        key = (artist.lower(), album_name.lower())
        if key not in _logged_cleaning_exceptions:
            _logged_cleaning_exceptions.add(key)
            console_logger.info(
                "No cleaning applied due to exceptions for artist '%s', album '%s'.",
                artist,
                album_name,
            )
        return track_name.strip(), album_name.strip()

    remaster_keywords = list(config.cleaning.remaster_keywords)
    album_suffixes_raw: list[str] = list(config.cleaning.album_suffixes_to_remove)
    compiled_suffixes = _compile_suffix_patterns(album_suffixes_raw)

    # Helper function for cleaning strings using remove_parentheses_with_keywords
    def clean_string(val: str, keywords: list[str]) -> str:
        """Clean a string by removing parenthetical segments containing keywords.

        Args:
            val: The string to clean
            keywords: List of keywords to search for in parenthetical segments

        Returns:
            The cleaned string with matching parenthetical segments removed

        """
        # Use the utility function defined above, passing loggers
        new_val = remove_parentheses_with_keywords(
            val,
            keywords,
            console_logger,
            error_logger,
        )
        # Clean up multiple spaces and strip whitespace
        new_val = re.sub(r"\s+", " ", new_val).strip()
        return new_val or ""  # Return an empty string if the result is empty after cleaning

    original_track = track_name
    original_album = album_name

    # Apply cleaning to track name and album name
    cleaned_track = clean_string(track_name, remaster_keywords)
    cleaned_album = clean_string(album_name, remaster_keywords)

    # Ensure cleaned_album is always a string (type hint for linters)
    cleaned_album = str(cleaned_album)

    # Remove specified album suffixes in a case-insensitive manner
    cleaned_album = _strip_album_suffixes(cleaned_album, compiled_suffixes, console_logger)

    # Log the cleaning results only if something changed
    if cleaned_track != original_track:
        console_logger.debug(
            "Cleaned track name: '%s' -> '%s'",
            original_track,
            cleaned_track,
        )
    if cleaned_album != original_album:
        console_logger.debug(
            "Cleaned album name: '%s' -> '%s'",
            original_album,
            cleaned_album,
        )

    return cleaned_track, cleaned_album

is_music_app_running

is_music_app_running(error_logger)

Check if the Music.app is currently running using subprocess.

Parameters:

Name Type Description Default
error_logger Logger

Logger for error output.

required

Returns:

Type Description
bool

True if Music.app is running, False otherwise.

Source code in src/core/models/metadata_utils.py
def is_music_app_running(error_logger: logging.Logger) -> bool:
    """Check if the Music.app is currently running using subprocess.

    Args:
        error_logger: Logger for error output.

    Returns:
        True if Music.app is running, False otherwise.

    """
    try:
        return _check_osascript_availability(error_logger)
    except subprocess.TimeoutExpired:
        error_logger.warning("Music.app status check timed out after 10 seconds. Assuming Music.app is available.")
        # Optimistic: assume running to avoid false-negative blocking.
        # If Music.app isn't actually running, AppleScript commands will
        # fail later with clear error messages.
        return True
    except (subprocess.SubprocessError, OSError) as e:
        error_logger.exception("Unable to check Music.app status: %s", e)
        return False  # Assume not running on error
    except (ValueError, KeyError, AttributeError) as e:  # Catch data processing errors
        error_logger.exception(
            "Unexpected error checking Music.app status: %s",
            e,
        )
        return False