year_scoring

Release scoring system for music metadata evaluation.

This module contains the core scoring algorithm that evaluates music releases to determine the most likely original release year. The scoring system considers multiple factors including artist/album matching, release characteristics, contextual information, and source reliability.

The module was extracted from the legacy external API service so that the scoring logic can be reused unchanged across different API providers.

ArtistPeriodContext

Bases: TypedDict

Context about an artist's active period.

ReleaseScorer

ReleaseScorer(
    scoring_config=None,
    min_valid_year=1900,
    definitive_score_threshold=85,
    console_logger=None,
    remaster_keywords=None,
    major_market_codes=None,
)

Release scoring system for evaluating music metadata quality.

This class scores releases from different sources to determine which one is most likely the original release. Each score combines name matching, release characteristics, contextual checks, and source reliability, with every weight taken from the scoring configuration so that results stay consistent across providers.

Attributes:

scoring_config (ScoringConfig): Configuration dictionary with scoring parameters
min_valid_year: Minimum valid year for releases (default: 1900)
current_year: Current year for validation (default: current system year)
definitive_score_threshold: Threshold for considering a score definitive
artist_period_context (ArtistPeriodContext | None): Optional context about artist's active period
console_logger: Logger for debug output

Initialize the release scorer.

Parameters:

scoring_config (ScoringConfig | None): Typed scoring parameters (uses defaults if None). Default: None
min_valid_year (int): Minimum valid year for releases. Default: 1900
definitive_score_threshold (int): Threshold for definitive scoring. Default: 85
console_logger (Logger | None): Optional logger for debug output. Default: None
remaster_keywords (list[str] | None): Keywords to identify edition suffixes. Default: None
major_market_codes (list[str] | None): Country codes for major market bonus. Default: None
Source code in src/services/api/year_scoring.py
def __init__(
    self,
    scoring_config: ScoringConfig | None = None,
    min_valid_year: int = 1900,
    definitive_score_threshold: int = 85,
    console_logger: logging.Logger | None = None,
    remaster_keywords: list[str] | None = None,
    major_market_codes: list[str] | None = None,
) -> None:
    """Initialize the release scorer.

    Args:
        scoring_config: Typed scoring parameters (uses defaults if None)
        min_valid_year: Minimum valid year for releases
        definitive_score_threshold: Threshold for definitive scoring
        console_logger: Optional logger for debug output
        remaster_keywords: Keywords to identify edition suffixes
        major_market_codes: Country codes for major market bonus

    """
    self.scoring_config: ScoringConfig = scoring_config or ScoringConfig(
        base_score=10,
        artist_exact_match_bonus=20,
        album_exact_match_bonus=25,
        perfect_match_bonus=10,
        album_variation_bonus=10,
        album_substring_penalty=-15,
        album_unrelated_penalty=-40,
        mb_release_group_match_bonus=50,
        type_album_bonus=15,
        type_ep_single_penalty=-10,
        type_compilation_live_penalty=-25,
        status_official_bonus=10,
        status_bootleg_penalty=-50,
        status_promo_penalty=-20,
        reissue_penalty=-30,
        year_diff_penalty_scale=-5,
        year_diff_max_penalty=-40,
        year_before_start_penalty=-25,
        year_after_end_penalty=-20,
        year_near_start_bonus=20,
        country_artist_match_bonus=10,
        country_major_market_bonus=5,
        source_mb_bonus=5,
        source_discogs_bonus=2,
        source_itunes_bonus=4,
    )
    self.min_valid_year = min_valid_year
    self.current_year = dt.now(UTC).year
    self.definitive_score_threshold = definitive_score_threshold
    self.artist_period_context: ArtistPeriodContext | None = None
    self.console_logger = console_logger or logging.getLogger(__name__)
    self.remaster_keywords = remaster_keywords or []
    self.major_market_codes = major_market_codes or self._DEFAULT_MARKET_CODES

    # Constants from the original implementation
    self.YEAR_LENGTH = 4
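To get a feel for how the default weights relate to the default threshold of 85, here is a small worked example. It assumes each listed weight applies once and that the weights are simply summed, which the helper methods not shown on this page may refine. Whether the threshold comparison is strict is also not shown, so `>=` here is an assumption.

```python
# Default weights copied from the constructor above.
defaults = {
    "base_score": 10,
    "artist_exact_match_bonus": 20,
    "album_exact_match_bonus": 25,
    "perfect_match_bonus": 10,
    "type_album_bonus": 15,
    "status_official_bonus": 10,
    "source_mb_bonus": 5,
    "reissue_penalty": -30,
}
DEFINITIVE_SCORE_THRESHOLD = 85

# Assumption: an exact-match, official album from MusicBrainz earns every bonus once.
original = sum(value for key, value in defaults.items() if key != "reissue_penalty")
# The same release flagged as a reissue additionally takes the reissue penalty.
reissue = original + defaults["reissue_penalty"]

print(original, original >= DEFINITIVE_SCORE_THRESHOLD)  # 95 True
print(reissue, reissue >= DEFINITIVE_SCORE_THRESHOLD)  # 65 False
```

Under these assumptions a single reissue penalty is enough to move an otherwise perfect match below the definitive threshold.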

set_artist_period_context

set_artist_period_context(context)

Set the artist activity period context for scoring.

Parameters:

context (ArtistPeriodContext | None): Dictionary with start_year and end_year information. Required
Source code in src/services/api/year_scoring.py
def set_artist_period_context(self, context: ArtistPeriodContext | None) -> None:
    """Set the artist activity period context for scoring.

    Args:
        context: Dictionary with start_year and end_year information

    """
    self.artist_period_context = context

clear_artist_period_context

clear_artist_period_context()

Clear the artist activity period context.

Source code in src/services/api/year_scoring.py
def clear_artist_period_context(self) -> None:
    """Clear the artist activity period context."""
    self.artist_period_context = None

score_original_release

score_original_release(
    release,
    artist_norm,
    album_norm,
    *,
    artist_region,
    source="unknown",
    album_orig=None
)

REVISED scoring function prioritizing original release indicators (v3).

This is the core scoring algorithm that evaluates a release against multiple criteria to determine how likely it is to be the original release of an album.

The scoring considers:

1. Core match quality (artist/album name matching)
2. Release characteristics (type, status, reissue indicators)
3. Contextual factors (year validation, artist activity period)
4. Source reliability (MusicBrainz > iTunes > Discogs)
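These factor groups are added onto a base score, each contribution is recorded for the debug log, and the total is clamped at zero. A minimal sketch of that accumulation pattern follows. The function name `combine_score` and the component labels are illustrative and not taken from the module.

```python
def combine_score(base_score: int, adjustments: list[tuple[str, int]]) -> tuple[int, list[str]]:
    """Sum signed adjustments onto a base score and keep a readable log."""
    score = base_score
    components = [f"Base: {base_score}"]
    for label, delta in adjustments:
        score += delta
        if delta != 0:  # skip no-op entries to keep the log short
            components.append(f"{label}: {delta:+d}")
    # Negative totals are clamped, so the lowest possible score is 0.
    return max(0, score), components


final_score, log = combine_score(
    10,
    [("Artist Exact Match", 20), ("Album Unrelated", -40), ("Status Bootleg", -50)],
)
print(final_score)  # 0, because 10 + 20 - 40 - 50 is negative
print(log)
```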

Parameters:

release (dict[str, Any]): Dictionary containing release metadata. Required
artist_norm (str): Normalized artist name for matching. Required
album_norm (str): Normalized album name for matching. Required
artist_region (str | None): Artist's region/country for bonus scoring. Required
source (str): Source of the release data (musicbrainz, discogs, itunes). Default: 'unknown'
album_orig (str | None): Original album name with parentheses for edition stripping. Default: None

Returns:

int: Integer score (0-100+) indicating release quality/originality
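The album_orig parameter exists so that edition suffixes can be stripped while the parentheses are still present. The private helper that does this is not shown on this page. The sketch below is one plausible rule, assuming a suffix is a trailing parenthesised or bracketed group that mentions one of the remaster keywords. The function name and the regular expression are hypothetical.

```python
import re


def strip_edition_suffix(title: str, keywords: list[str]) -> str:
    """Drop trailing (...) or [...] groups that mention any keyword."""
    if not keywords:
        return title
    keyword_pattern = "|".join(re.escape(keyword) for keyword in keywords)
    # A trailing bracketed group whose text contains a word starting with a keyword.
    suffix = re.compile(
        rf"\s*[\(\[][^\)\]]*\b(?:{keyword_pattern})[^\)\]]*[\)\]]\s*$",
        re.IGNORECASE,
    )
    previous = None
    while previous != title:  # repeat so stacked suffixes are peeled one by one
        previous = title
        title = suffix.sub("", title)
    return title


keywords = ["remaster", "deluxe"]
print(strip_edition_suffix("OK Computer (Remastered 2009)", keywords))
print(strip_edition_suffix("Abbey Road (Deluxe Edition) [2019 Remaster]", keywords))
print(strip_edition_suffix("Live (In Paris)", keywords))
```

Groups without a keyword, such as "(In Paris)", are left alone, so genuine parts of a title are not removed.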

Source code in src/services/api/year_scoring.py
def score_original_release(
    self,
    release: dict[str, Any],
    artist_norm: str,
    album_norm: str,
    *,
    artist_region: str | None,
    source: str = "unknown",
    album_orig: str | None = None,
) -> int:
    """REVISED scoring function prioritizing original release indicators (v3).

    This is the core scoring algorithm that evaluates a release against multiple
    criteria to determine how likely it is to be the original release of an album.

    The scoring considers:
    1. Core match quality (artist/album name matching)
    2. Release characteristics (type, status, reissue indicators)
    3. Contextual factors (year validation, artist activity period)
    4. Source reliability (MusicBrainz > iTunes > Discogs)

    Args:
        release: Dictionary containing release metadata
        artist_norm: Normalized artist name for matching
        album_norm: Normalized album name for matching
        artist_region: Artist's region/country for bonus scoring
        source: Source of the release data (musicbrainz, discogs, itunes)
        album_orig: Original album name with parentheses for edition stripping

    Returns:
        Integer score (0-100+) indicating release quality/originality

    """
    cfg = self.scoring_config
    score: int = cfg.base_score
    score_components: list[str] = []

    # Extract key fields
    release_title_orig = release.get("title", "") or ""
    release_artist_orig = release.get("artist", "") or ""
    year_str = release.get("year", "") or ""
    source = release.get("source", source)

    # Strip edition suffixes BEFORE normalization (preserves parentheses for stripping)
    release_title_stripped = self._strip_edition_suffix(release_title_orig)
    release_title_norm = self._normalize_name(release_title_stripped)
    release_artist_norm = self._normalize_name(release_artist_orig)

    # If original album name provided, strip editions from it too
    if album_orig:
        album_stripped = self._strip_edition_suffix(album_orig)
        album_norm = self._normalize_name(album_stripped)

    # Validate year first (early return if invalid)
    validated_year, is_valid = self._validate_year(year_str, score_components)
    if not is_valid or validated_year is None:
        return 0

    # At this point, validated_year is guaranteed to be int
    year: int = validated_year

    # Apply penalties for current and future year releases
    if year > self.current_year:
        # Future years are suspicious (likely incorrect data)
        future_penalty = cfg.future_year_penalty
        score += future_penalty
        score_components.append(f"Future Year ({year}): {future_penalty}")
    elif year == self.current_year:
        # Current year: small penalty to prefer earlier releases when ambiguous
        current_year_penalty = cfg.current_year_penalty
        score += current_year_penalty
        if current_year_penalty != 0:
            score_components.append(f"Current Year ({year}): {current_year_penalty}")

    # Calculate score components
    score += self._calculate_match_score(
        release_artist_norm,
        artist_norm,
        release_title_norm,
        album_norm,
        score_components=score_components,
    )

    # Soundtrack compensation: if target is "Various Artists" etc., but album matches exactly
    # and API confirms it's a soundtrack, compensate for the artist mismatch penalty
    score += self._calculate_soundtrack_compensation(
        target_artist_norm=artist_norm,
        release_title_norm=release_title_norm,
        target_album_norm=album_norm,
        release_genre=release.get("genre", ""),
        score_components=score_components,
    )

    char_score, rg_first_year = self._calculate_release_characteristics_score(release, year_str, source, score_components)
    score += char_score

    score += self._calculate_contextual_score(year, rg_first_year, score_components)
    score += self._calculate_country_score(release, artist_region, score_components)
    score += self._calculate_source_score(source, score_components)

    final_score = max(0, score)

    # Debug logging for significant scores
    if final_score > self.definitive_score_threshold - 20 or any("penalty" in comp.lower() for comp in score_components):
        debug_log_msg = f"Score Calculation for '{release_title_orig}' ({year_str}) [{source}]:\n"
        debug_log_msg += "\n".join([f"  - {comp}" for comp in score_components])
        debug_log_msg += f"\n  ==> Final Score: {final_score}"
        self.console_logger.debug(debug_log_msg)

    return final_score

create_release_scorer

create_release_scorer(
    scoring_config=None,
    min_valid_year=1900,
    definitive_score_threshold=85,
    console_logger=None,
    remaster_keywords=None,
    major_market_codes=None,
)

Create a configured ReleaseScorer instance.

Parameters:

scoring_config (ScoringConfig | None): Typed scoring parameters (uses defaults if None). Default: None
min_valid_year (int): Minimum valid year for releases. Default: 1900
definitive_score_threshold (int): Threshold for definitive scoring. Default: 85
console_logger (Logger | None): Optional logger for debug output. Default: None
remaster_keywords (list[str] | None): Keywords to identify edition suffixes. Default: None
major_market_codes (list[str] | None): Country codes for major market bonus. Default: None

Returns:

ReleaseScorer: Configured ReleaseScorer instance

Source code in src/services/api/year_scoring.py
def create_release_scorer(
    scoring_config: ScoringConfig | None = None,
    min_valid_year: int = 1900,
    definitive_score_threshold: int = 85,
    console_logger: logging.Logger | None = None,
    remaster_keywords: list[str] | None = None,
    major_market_codes: list[str] | None = None,
) -> ReleaseScorer:
    """Create a configured ReleaseScorer instance.

    Args:
        scoring_config: Typed scoring parameters (uses defaults if None)
        min_valid_year: Minimum valid year for releases
        definitive_score_threshold: Threshold for definitive scoring
        console_logger: Optional logger for debug output
        remaster_keywords: Keywords to identify edition suffixes
        major_market_codes: Country codes for major market bonus

    Returns:
        Configured ReleaseScorer instance

    """
    return ReleaseScorer(
        scoring_config=scoring_config,
        min_valid_year=min_valid_year,
        definitive_score_threshold=definitive_score_threshold,
        console_logger=console_logger,
        remaster_keywords=remaster_keywords,
        major_market_codes=major_market_codes,
    )