script_detection
Script detection utilities for text analysis.
ScriptType
Bases: StrEnum
Script type enumeration for text analysis.
has_arabic
Check if the text contains Arabic characters.
Covers:
- Arabic (U+0600-U+06FF) - Arabic, Persian, Urdu
- Arabic Supplement (U+0750-U+077F) - Additional Arabic letters
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Arabic characters are found, False otherwise |
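The range check described under "Covers" can be sketched as follows. This is a minimal illustration, not the module's actual implementation, which is not shown here and may work differently (for instance with a regular expression). `ARABIC_RANGES` and `contains_codepoint_in` are hypothetical names introduced only for this example.

```python
# Hypothetical sketch: inclusive code point ranges taken from the "Covers" list.
ARABIC_RANGES = (
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
)


def contains_codepoint_in(text, ranges):
    """Return True if any character's code point lies in one of the ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)


# "\u0645\u0631\u062d\u0628\u0627" is the Arabic word for "hello".
print(contains_codepoint_in("\u0645\u0631\u062d\u0628\u0627", ARABIC_RANGES))  # True
print(contains_codepoint_in("Pink Floyd", ARABIC_RANGES))  # False
```

The same helper would work for any of the single-block scripts on this page by swapping in that script's ranges.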
Source code in src/core/models/script_detection.py
has_chinese
Check if the text contains Chinese characters.
Covers:
- CJK Unified Ideographs (U+4E00-U+9FFF) - Han characters
- CJK Extension A (U+3400-U+4DBF) - Additional Han characters
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Chinese characters are found, False otherwise |
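Because the CJK Unified Ideographs block is shared with Japanese kanji, a check based only on these ranges cannot tell Chinese text from kanji-only Japanese text. The sketch below illustrates that overlap with two regular expressions; `HAN_RE`, `KANA_RE`, `has_han`, and `has_kana` are hypothetical names, and nothing here is taken from the module's source.

```python
import re

# Han ranges from the "Covers" list above (Extension A plus the unified block).
HAN_RE = re.compile(r"[\u3400-\u4DBF\u4E00-\u9FFF]")
# Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) are contiguous.
KANA_RE = re.compile(r"[\u3040-\u30FF]")


def has_han(text):
    """True if the text contains at least one Han ideograph."""
    return HAN_RE.search(text) is not None


def has_kana(text):
    """True if the text contains at least one hiragana or katakana character."""
    return KANA_RE.search(text) is not None


# Chinese word, kanji-only Japanese word, hiragana-only Japanese word.
for sample in ("中文", "音楽", "ひらがな"):
    print(sample, has_han(sample), has_kana(sample))
```

Only the presence of kana separates the second sample from the first, which is why kanji-only strings are ambiguous under a pure range test.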
has_cyrillic
Check if the text contains Cyrillic characters.
Covers:
- Basic Cyrillic (U+0400-U+04FF) - Russian, Ukrainian, Serbian, Bulgarian, etc.
- Cyrillic Supplement (U+0500-U+052F) - Additional characters
- Cyrillic Extended-A (U+2DE0-U+2DFF) - Historic letters
- Cyrillic Extended-B (U+A640-U+A69F) - Additional historic letters
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Cyrillic characters are found, False otherwise |
Examples:
>>> has_cyrillic("МУР")
True
>>> has_cyrillic("Pink Floyd")
False
>>> has_cyrillic("діти інженерів")
True
has_devanagari
Check if the text contains Devanagari characters.
Covers:
- Devanagari (U+0900-U+097F) - Hindi, Marathi, Sanskrit, Nepali
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Devanagari characters are found, False otherwise |
has_greek
Check if the text contains Greek characters.
Covers:
- Greek and Coptic (U+0370-U+03FF) - Modern and ancient Greek
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Greek characters are found, False otherwise |
has_hebrew
Check if the text contains Hebrew characters.
Covers:
- Hebrew (U+0590-U+05FF) - Hebrew alphabet
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Hebrew characters are found, False otherwise |
has_japanese
Check if the text contains Japanese characters.
Covers:
- Hiragana (U+3040-U+309F) - Japanese syllabary
- Katakana (U+30A0-U+30FF) - Japanese syllabary
- CJK Unified Ideographs (U+4E00-U+9FFF) - Kanji (shared with Chinese)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Japanese characters are found, False otherwise |
Examples:
>>> has_japanese("音楽")
True
>>> has_japanese("ひらがな")
True
>>> has_japanese("カタカナ")
True
>>> has_japanese("Pink Floyd")
False
has_korean
Check if the text contains Korean characters.
Covers:
- Hangul Syllables (U+AC00-U+D7AF) - Korean alphabet
- Hangul Jamo (U+1100-U+11FF) - Korean alphabet components
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Korean characters are found, False otherwise |
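Both ranges matter because Korean text can arrive either as precomposed syllables or as decomposed Jamo. The sketch below uses the standard `unicodedata` module to show NFD normalization turning Hangul Syllables into Hangul Jamo. It illustrates the two blocks only and does not reproduce has_korean; `in_range` and the two range constants are hypothetical names for this example.

```python
import unicodedata

HANGUL_SYLLABLES = (0xAC00, 0xD7AF)  # precomposed syllable blocks
HANGUL_JAMO = (0x1100, 0x11FF)       # individual consonant and vowel components


def in_range(text, bounds):
    """True if the text is non-empty and every character lies within bounds."""
    lo, hi = bounds
    return bool(text) and all(lo <= ord(ch) <= hi for ch in text)


# "\ud55c\uad6d\uc5b4" is the word for "Korean language", written as three syllables.
composed = "\ud55c\uad6d\uc5b4"
decomposed = unicodedata.normalize("NFD", composed)

print(in_range(composed, HANGUL_SYLLABLES))  # True
print(in_range(decomposed, HANGUL_JAMO))     # True
print(len(composed), len(decomposed))        # 3 8
```

A detector that checked only the Hangul Syllables block would miss the decomposed form, even though both strings render the same.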
has_latin
Check if the text contains Latin alphabetic characters.
Covers:
- Basic Latin (U+0041-U+005A, U+0061-U+007A) - A-Z, a-z
- Latin-1 Supplement (U+0080-U+00FF) - Accented letters
- Latin Extended-A (U+0100-U+017F) - Eastern European
- Latin Extended-B (U+0180-U+024F) - African languages, phonetic
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Latin alphabetic characters are found, False otherwise |
Examples:
>>> has_latin("Pink Floyd")
True
>>> has_latin("Café")
True
>>> has_latin("МУР") # noqa: RUF002
False
>>> has_latin("123")
False
>>> has_latin("!!!")
False
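The examples above show digits and punctuation returning False, so the check is about alphabetic characters rather than anything in the listed blocks. One way to get that behaviour, sketched here under the assumption that a range test is combined with `str.isalpha()`, is shown below. How the module actually excludes non-letters is not documented on this page; `LATIN_RANGES` and `looks_latin` are hypothetical names.

```python
# Inclusive ranges from the "Covers" list; the three supplement/extended
# blocks are contiguous, so they are merged into one range here.
LATIN_RANGES = (
    (0x0041, 0x005A),  # A-Z
    (0x0061, 0x007A),  # a-z
    (0x0080, 0x024F),  # Latin-1 Supplement through Latin Extended-B
)


def looks_latin(text):
    """True if any character is alphabetic and falls in a Latin range."""
    return any(
        ch.isalpha() and any(lo <= ord(ch) <= hi for lo, hi in LATIN_RANGES)
        for ch in text
    )


print(looks_latin("Caf\u00e9"))  # True
print(looks_latin("123"))        # False
print(looks_latin("\u00a9"))     # False: the copyright sign is in Latin-1 but is not a letter
```

The `isalpha()` filter is what keeps symbols in the Latin-1 Supplement block, such as the copyright sign, from counting as Latin text.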
has_thai
Check if the text contains Thai characters.
Covers:
- Thai (U+0E00-U+0E7F) - Thai alphabet
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if Thai characters are found, False otherwise |
get_all_scripts
Get all scripts detected in text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| list[ScriptType] | List of detected script types |
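A function of this shape can be pictured as running every per-script check and collecting the ones that match. The sketch below is only an illustration under that assumption: the `Script` enum is a two-member stand-in for ScriptType with made-up values, `DETECTORS` and `all_scripts` are hypothetical names, and the order of the real function's result is not documented here.

```python
from enum import Enum


class Script(str, Enum):
    """Hypothetical stand-in for ScriptType; member values are assumptions."""
    LATIN = "latin"
    CYRILLIC = "cyrillic"


def _any_in(text, ranges):
    """True if any character's code point lies in one of the inclusive ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)


# One detector per script; Latin is simplified to A-Z and a-z for brevity.
DETECTORS = {
    Script.LATIN: lambda t: _any_in(t, ((0x41, 0x5A), (0x61, 0x7A))),
    Script.CYRILLIC: lambda t: _any_in(t, ((0x0400, 0x04FF), (0x0500, 0x052F))),
}


def all_scripts(text):
    """Return every script whose detector matches, in DETECTORS order."""
    return [script for script, detect in DETECTORS.items() if detect(text)]


print(all_scripts("\u041c\u0423\u0420 feat. John"))
print(all_scripts("123"))
```

Keeping the detectors in a table means adding a script is a one-line change, which is one plausible reason to structure such a module this way.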
detect_primary_script
Detect the primary script used in text.
Returns the dominant script type found in the text. When multiple scripts are present, dominance is determined by counting the characters that belong to each script.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| ScriptType | Primary script type |
Examples:
>>> detect_primary_script("МУР") # noqa: RUF002
ScriptType.CYRILLIC
>>> detect_primary_script("Pink Floyd")
ScriptType.LATIN
>>> detect_primary_script("МУР featuring John") # noqa: RUF002
ScriptType.MIXED
>>> detect_primary_script("音楽")
ScriptType.JAPANESE
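Character counting of this kind can be sketched as below. The exact dominance rule is not stated on this page, so everything specific here is an assumption: the `dominance` threshold of 0.9 is arbitrary (picked so that a Cyrillic name followed by "featuring John" comes out as mixed), the `UNKNOWN` member is invented for text with no letters, and only Latin and Cyrillic are counted. `Script`, `classify`, and `primary_script` are hypothetical names.

```python
from collections import Counter
from enum import Enum


class Script(str, Enum):
    """Hypothetical stand-in for ScriptType; values and UNKNOWN are assumptions."""
    LATIN = "latin"
    CYRILLIC = "cyrillic"
    MIXED = "mixed"
    UNKNOWN = "unknown"


RANGES = {
    Script.LATIN: ((0x0041, 0x005A), (0x0061, 0x007A), (0x0080, 0x024F)),
    Script.CYRILLIC: ((0x0400, 0x04FF), (0x0500, 0x052F)),
}


def classify(ch):
    """Map one character to a script, or None for non-letters and other scripts."""
    if not ch.isalpha():
        return None
    cp = ord(ch)
    for script, ranges in RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None


def primary_script(text, dominance=0.9):
    """Return the script holding at least `dominance` of the counted letters."""
    counts = Counter(s for s in map(classify, text) if s is not None)
    if not counts:
        return Script.UNKNOWN
    script, top = counts.most_common(1)[0]
    total = sum(counts.values())
    return script if top / total >= dominance else Script.MIXED


print(primary_script("\u041c\u0423\u0420"))                 # Script.CYRILLIC
print(primary_script("Pink Floyd"))                         # Script.LATIN
print(primary_script("\u041c\u0423\u0420 featuring John"))  # Script.MIXED
```

In the last call 13 of the 16 counted letters are Latin (about 81%), which falls below the assumed threshold and therefore yields the mixed result.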
is_script_type
Check if text contains a specific script type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
| script_type | ScriptType | Script type to check for | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the script type is detected in the text |
Examples:
>>> is_script_type("МУР", ScriptType.CYRILLIC) # noqa: RUF002
True
>>> is_script_type("Pink Floyd", ScriptType.LATIN)
True
>>> is_script_type("音楽", ScriptType.JAPANESE)
True
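A lookup keyed by script type is one simple way to route a call like this to the right check. The sketch below assumes a table of compiled regular expressions, using the Greek, Hebrew, and Thai ranges listed earlier on this page. Whether ScriptType has members with these names is not confirmed here, so `Script`, `PATTERNS`, and `text_has_script` are all hypothetical.

```python
import re
from enum import Enum


class Script(str, Enum):
    """Hypothetical stand-in for ScriptType; member names are assumptions."""
    GREEK = "greek"
    HEBREW = "hebrew"
    THAI = "thai"


# One character class per script, built from the documented Unicode blocks.
PATTERNS = {
    Script.GREEK: re.compile(r"[\u0370-\u03FF]"),   # Greek and Coptic
    Script.HEBREW: re.compile(r"[\u0590-\u05FF]"),  # Hebrew
    Script.THAI: re.compile(r"[\u0E00-\u0E7F]"),    # Thai
}


def text_has_script(text, script):
    """True if the text contains at least one character of the given script."""
    return PATTERNS[script].search(text) is not None


print(text_has_script("Ελληνικά", Script.GREEK))   # True
print(text_has_script("שלום", Script.HEBREW))      # True
print(text_has_script("Pink Floyd", Script.THAI))  # False
```

An unknown key raises `KeyError` in this sketch; a production version would need to decide how to handle script types that have no detector.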
is_primarily_cyrillic
Check if text is primarily in Cyrillic script.
Legacy compatibility function for existing API prioritization logic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to analyze | required |
Returns:
| Type | Description |
|---|---|
| bool | True if text is primarily Cyrillic (pure or mixed with Cyrillic dominance) |
Examples:
>>> is_primarily_cyrillic("МУР") # noqa: RUF002
True
>>> is_primarily_cyrillic("МУР feat. John") # noqa: RUF002
True
>>> is_primarily_cyrillic("Pink Floyd")
False