
[EN Currency] ISO codes not recognised as currency prefixes; crore/lakh multipliers missing from currency context #3211

@nikitabuxy


**Describe the bug**

When an ISO currency code is concatenated directly to a number (no space between
them), the currency model returns a wrong numeric value (and no unit) with high
confidence and no review flag. This is silent data corruption: the caller has no
indication that the returned value is incorrect.

**To Reproduce**

Install: `pip install recognizers-number-with-unit`

```python
from recognizers_text import Culture
from recognizers_number_with_unit import NumberWithUnitRecognizer

model = NumberWithUnitRecognizer(Culture.English).get_currency_model()

# parse() returns a list of ModelResult objects; print their resolutions
print([r.resolution for r in model.parse("USD34.6 million")])
# Returns: [{'value': '6000000', 'unit': None}]

print([r.resolution for r in model.parse("VND4,927 billion")])
# Returns: [{'value': '927000000000', 'unit': None}]

print([r.resolution for r in model.parse("USD0.92 Million")])
# Returns: [{'value': '92000000', 'unit': None}]
```

**Expected behavior**

```python
print([r.resolution for r in model.parse("USD34.6 million")])
# Expected: [{'value': '34600000', 'unit': 'United States dollar'}]

print([r.resolution for r in model.parse("VND4,927 billion")])
# Expected: [{'value': '4927000000000', 'unit': 'Vietnamese dong'}]

print([r.resolution for r in model.parse("USD0.92 Million")])
# Expected: [{'value': '920000', 'unit': 'United States dollar'}]
```


**Sample input/output**

| Input | Actual value | Expected value |
|---|---|---|
| USD34.6 million | 6000000 | 34600000 |
| USD4.1 Million | 1000000 | 4100000 |
| USD0.92 Million | 92000000 | 920000 |
| AUD1.2 million | 2000000 | 1200000 |
| VND4,927 billion | 927000000000 | 4927000000000 |
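The Expected column is just amount × scale word. A minimal sketch for recomputing it exactly, assuming English short-scale multipliers (`million` = 10^6, `billion` = 10^9); `MULTIPLIERS` and `expected_value` are hypothetical helper names, not part of the library:

```python
from decimal import Decimal

# English short-scale multipliers for the scale words used in the table above
MULTIPLIERS = {"million": 10**6, "billion": 10**9}


def expected_value(amount: str, scale: str) -> int:
    """Return amount x scale as an exact integer.

    The "," thousands separator is stripped first. Decimal keeps the
    multiplication exact, which float arithmetic does not guarantee.
    """
    value = Decimal(amount.replace(",", "")) * MULTIPLIERS[scale.lower()]
    return int(value)


rows = [
    ("34.6", "million"),
    ("4.1", "Million"),
    ("0.92", "Million"),
    ("1.2", "million"),
    ("4,927", "billion"),
]
for amount, scale in rows:
    print(amount, scale, expected_value(amount, scale))
```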

Root cause: `QueryProcessor.preprocess()` lowercases the query before
extraction. The internal `EnglishNumberExtractor` (Unit mode) then
misreads the lowercased form. In `usd34.6 million`, the decimal point in
`34.6` is treated as a sentence-ending period, so only `6 million` is
extracted. Similarly, in `vnd4,927 billion`, the `4,` is absorbed into the
non-numeric prefix, leaving only `927 billion`. In both cases the model
returns a high-confidence result with no flag.
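The truncation pattern in the table can be reproduced with a toy extractor. This is a hypothetical sketch: `NUMBER` and `naive_extract` are illustrative names, and the pattern is a simplified stand-in, not the library's actual regex. It assumes a left boundary that rejects a number immediately preceded by a letter or digit, so the match cannot start at `34` and instead begins after the `.` or `,`:

```python
import re

# Hypothetical simplified number pattern (NOT the library's regex):
# a number may not start right after a letter or digit; it may contain
# "." or "," groups and may be followed by a scale word.
NUMBER = re.compile(r"(?<![a-z\d])\d+(?:[.,]\d+)*(?:\s+(?:million|billion))?")


def naive_extract(query: str) -> list[str]:
    """Lowercase the query, as preprocessing is described to do, then find number spans."""
    return NUMBER.findall(query.lower())


for q in ["USD34.6 million", "VND4,927 billion", "USD0.92 Million", "USD 34.6 million"]:
    print(q, naive_extract(q))
```

With the space present the whole `34.6 million` is matched; without it, only the digits after the last separator survive, which matches the Actual column above.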


**Platform**
- Platform: Python
- Environment: pip package (recognizers-number-with-unit)
- Version: 1.0.1
- Culture: English


**Additional context**
The spaced variant (`USD 34.6 million`) currently returns no result at all,
because ISO codes are absent from `CurrencyPrefixList`; that is tracked
as a separate enhancement request.
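Until this is fixed, a caller could at least flag suspicious results instead of accepting them silently. A minimal heuristic sketch; `looks_truncated` is a hypothetical helper, and it assumes the match's start offset indexes into the original query (with the real model, that would be each result's `start` field):

```python
def looks_truncated(query: str, start: int) -> bool:
    """Return True if a match starting at `start` appears to begin mid-number.

    Heuristic (an assumption, not library behaviour): if the character just
    before the match is "." or "," and the one before that is a digit, the
    leading part of a decimal or grouped number was probably dropped.
    """
    if start < 2:
        return False
    return query[start - 1] in ".," and query[start - 2].isdigit()


# Offsets of the spans the model actually returned for each query
print(looks_truncated("USD34.6 million", "USD34.6 million".index("6 million")))
print(looks_truncated("USD 34.6 million", "USD 34.6 million".index("34.6 million")))
```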
