Currently, the hash field set in schemas/hash.yml defines fields for common hashes (MD5, SHA1, etc.) but does not explicitly advise on the format of the hex-encoded hash values themselves. While the examples provided in the schema are lowercase, there is no normative text suggesting that users should normalize these values.
This ambiguity allows users to populate these fields with uppercase, lowercase, or mixed-case values. If the underlying storage or query engine is case-sensitive (e.g. Elasticsearch keyword fields 😉), an exact-match lookup for a lowercase threat indicator will miss events that store the same hash in uppercase, and vice versa.
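To make the failure mode concrete, here is a minimal sketch that imitates a case-sensitive exact match in plain Python. It is only an analogy for a term query on a keyword field without a normalizer, not Elasticsearch itself; the event list, `exact_match`, and `normalized_match` are hypothetical names used for illustration. The sample value is the MD5 digest of empty input, stored once in uppercase and once in lowercase.

```python
# Illustrative only: a case-sensitive exact match, similar in spirit to a
# term query against a keyword field that has no normalizer.
events = [
    {"event_id": 1, "file.hash.md5": "D41D8CD98F00B204E9800998ECF8427E"},
    {"event_id": 2, "file.hash.md5": "d41d8cd98f00b204e9800998ecf8427e"},
]

def exact_match(docs, field, value):
    """Return ids of docs whose field equals value exactly (case-sensitive)."""
    return [d["event_id"] for d in docs if d.get(field) == value]

def normalized_match(docs, field, value):
    """Same lookup, but both sides are lowercased before comparing."""
    target = value.lower()
    return [d["event_id"] for d in docs
            if isinstance(d.get(field), str) and d[field].lower() == target]

indicator = "d41d8cd98f00b204e9800998ecf8427e"  # MD5 of empty input
print(exact_match(events, "file.hash.md5", indicator))       # [2]
print(normalized_match(events, "file.hash.md5", indicator))  # [1, 2]
```

The exact match silently drops event 1 even though it carries the same hash; lowercasing both sides recovers it.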
Proposal
Update the description in schemas/hash.yml to explicitly recommend that hash values be normalized to lowercase.
Current Description:
The hash fields represent different bitwise hash algorithms and their values.
Proposed Addition:
The hash fields represent different bitwise hash algorithms and their values. Field values should be normalized to lowercase (e.g. efd6... instead of EFD6...).
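On the producer side, the recommendation could be followed with a small helper applied before events are shipped. This is a sketch under assumptions, not part of ECS: the function name `normalize_hex_hash` and the set of algorithms it accepts are hypothetical choices. It trims whitespace, checks that the value is hex of the expected digest length, and lowercases it.

```python
import re

# Hex-digest lengths (in characters) for the algorithms this sketch handles.
HEX_DIGEST_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64, "sha384": 96, "sha512": 128}
HEX_RE = re.compile(r"[0-9a-fA-F]+")

def normalize_hex_hash(algorithm: str, value: str) -> str:
    """Strip surrounding whitespace and lowercase a hex digest.

    Raises ValueError for algorithms outside the table (non-hex hashes must
    not be lowercased), for non-hex characters, or for an unexpected length.
    """
    expected = HEX_DIGEST_LENGTHS.get(algorithm.lower())
    if expected is None:
        raise ValueError(f"not a known hex-digest algorithm: {algorithm!r}")
    cleaned = value.strip()
    if not HEX_RE.fullmatch(cleaned):
        raise ValueError(f"not a hex string: {value!r}")
    if len(cleaned) != expected:
        raise ValueError(f"{algorithm} digest should be {expected} hex chars, got {len(cleaned)}")
    return cleaned.lower()

print(normalize_hex_hash("md5", " D41D8CD98F00B204E9800998ECF8427E\n"))
# d41d8cd98f00b204e9800998ecf8427e
```

The helper deliberately refuses algorithms it does not know, because lowercasing only makes sense for hex-encoded digests; fuzzy hashes such as ssdeep use a case-sensitive base64 alphabet and would be corrupted by blind lowercasing.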
Future Consideration
Ideally, a lowercase normalizer should be applied to these fields in the generated Elasticsearch templates so this is handled automatically at index and query time.
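As a rough sketch of what the generated templates might contain, the snippet below builds the settings and mappings portion of an index body that defines a custom normalizer using Elasticsearch's `lowercase` token filter and attaches it to keyword hash fields. The normalizer name `hash_lowercase`, the choice of `file.hash.*` as the example location, and the field list are assumptions for illustration; the ECS generator may structure this differently, and in a composable index template these keys would sit under `template`.

```python
import json

# Assumption: only hex-digest fields get the normalizer (not fuzzy hashes).
HEX_HASH_FIELDS = ["md5", "sha1", "sha256", "sha384", "sha512"]

def hash_index_body(normalizer_name: str = "hash_lowercase") -> dict:
    """Return settings + mappings with a lowercase normalizer on hash keywords."""
    hash_props = {
        name: {"type": "keyword", "normalizer": normalizer_name}
        for name in HEX_HASH_FIELDS
    }
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    normalizer_name: {"type": "custom", "filter": ["lowercase"]}
                }
            }
        },
        "mappings": {
            "properties": {
                "file": {"properties": {"hash": {"properties": hash_props}}}
            }
        },
    }

print(json.dumps(hash_index_body(), indent=2))
```

With a normalizer on a keyword field, Elasticsearch lowercases the value before indexing and also applies the normalizer to term-level and match queries against that field, so an uppercase indicator and a lowercase stored value meet in the middle. The original casing is still preserved in `_source`.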
Related PR/discussion from 2019 for additional context: https://github.com/elastic/ecs/pull/426/changes#r276024283