TAXII Collector bot and STIX Parser bot #2611

Open
wants to merge 7 commits into
base: develop

laciKE

@laciKE laciKE commented Apr 29, 2025

At a minimum, the TAXII Collector currently collects only objects of type indicator. These objects describe indicators and their detection patterns, e.g. in stix, pcre, sigma, snort, suricata or yara format. The pattern, pattern_type and valid_from properties are required, while confidence, description and labels are optional. However, the optional properties are present in several TAXII feeds and could be used to determine classification.taxonomy and classification.type even without processing the relationships of the indicators (e.g. indicator indicates malware).
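For illustration, a minimal sketch of the filtering described above: keep only objects of type indicator that carry the three required properties. The function name `select_indicators` and the sample envelope are hypothetical and are not taken from the bot's code; the sketch only assumes that the collected objects arrive under an `objects` key.

```python
import json

# Properties the description above lists as required for indicator objects.
REQUIRED_PROPERTIES = ("pattern", "pattern_type", "valid_from")


def select_indicators(envelope_json):
    """Return indicator objects that have all required properties.

    Hypothetical helper: it assumes the objects are listed under the
    "objects" key of the JSON document.
    """
    envelope = json.loads(envelope_json)
    indicators = []
    for obj in envelope.get("objects", []):
        if obj.get("type") != "indicator":
            continue  # other object types are not collected
        if all(key in obj for key in REQUIRED_PROPERTIES):
            indicators.append(obj)
    return indicators


# Made-up sample data, for illustration only.
SAMPLE_ENVELOPE = json.dumps({
    "objects": [
        {
            "type": "indicator",
            "pattern": "[ipv4-addr:value = '198.51.100.7']",
            "pattern_type": "stix",
            "valid_from": "2025-04-29T00:00:00Z",
            "labels": ["malicious-activity"],
        },
        {"type": "malware", "name": "example"},
        {"type": "indicator", "pattern": "[url:value = 'http://example.com/']"},
    ]
})

selected = select_indicators(SAMPLE_ENVELOPE)
```

Here only the first object is selected: the second is not an indicator and the third lacks pattern_type and valid_from.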

The STIX Parser can currently parse objects of type indicator (usually retrieved by the TAXII Collector). From each indicator object it extracts the detection pattern (currently only single Observation Expressions in STIX format are supported). It supports IP addresses, domains and URLs as indicator values. The parser also attempts to extract some optional properties of STIX objects, such as description and labels, which can be useful for further classification of the event by expert bots.
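As a rough sketch of what parsing a single Observation Expression with one comparison can look like, the snippet below uses a regular expression. This is an illustration only and not the parser's actual implementation; the names `SIMPLE_PATTERN`, `FIELD_BY_PATH` and `parse_simple_pattern`, as well as the choice of `source.*` event fields, are assumptions.

```python
import re

# Matches patterns of the form [<object-type>:value = '<literal>'] only.
SIMPLE_PATTERN = re.compile(
    r"^\[\s*(?P<path>[a-z0-9-]+:value)\s*=\s*'(?P<value>[^']*)'\s*\]$"
)

# Assumed mapping from STIX object paths to IntelMQ event fields.
FIELD_BY_PATH = {
    "ipv4-addr:value": "source.ip",
    "ipv6-addr:value": "source.ip",
    "domain-name:value": "source.fqdn",
    "url:value": "source.url",
}


def parse_simple_pattern(pattern):
    """Return (field, value) for a supported simple pattern, else None."""
    match = SIMPLE_PATTERN.match(pattern.strip())
    if match is None:
        return None  # compound or otherwise unsupported expression
    field = FIELD_BY_PATH.get(match.group("path"))
    if field is None:
        return None  # object type is not an IP address, domain or URL
    return field, match.group("value")
```

Anything more complex than one equality comparison, for example an expression joined with AND, is rejected by returning None rather than being parsed partially.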

TAXII Collector tests for missing parameters, with a mocked simple TAXII server providing a minimal collection with a simple indicator object
STIX Parser tests for indicator pattern parsing
Improvements based on @sebix's comments; collection title used as feed.code
Fix code style in TAXII and STIX bots
Fix Python 3.8 support in STIX Parser bot
@laciKE
Author

laciKE commented Apr 29, 2025

The TAXII and STIX bots are currently tested with the ESET Threat Intelligence (ETI) feeds.
Recently, ETI added several new feeds which are available only via TAXII/STIX 2.1, and older ESETCollectorBot and ESETParserBot cannot handle them.

I am working on an expert bot for classifying events from ETI, and I would like to publish it when it is ready, together with the feeds in feeds.yaml.

@laciKE
Author

laciKE commented May 1, 2025

Hello, I have a question regarding the proposal from the last commit.

I created ESETExpertBot which can add the proper classification.type and malware.name (if possible) to the events produced by StixParserBot. Ref: https://github.com/laciKE/intelmq/blob/eset/intelmq/bots/experts/eset/expert.py

When I tried to add the ESET Threat Intelligence TAXII feeds to feeds.yaml together with the expert bot, the tests failed, apparently because expert bots are not allowed in feeds.yaml.

Especially with the TAXII feeds, three bots will be needed to ingest those feeds:

  • Collect STIX objects from TAXII server (generic TAXII Collector)
  • Parse generic STIX indicator objects (generic STIX Parser)
  • Apply vendor-specific enrichment of events based on optional STIX properties used by the particular vendor (vendor-specific Expert bot).

As far as I understand, two parsers cannot be chained in the pipeline (because the input is a Report and the output is an Event).
What is the suggested way to do a three-step ingestion in such cases? Should there be one generic parser bot for the given format, with all vendor-specific bots inheriting from it?

@sebix
Member

sebix commented May 2, 2025

From what I understand from reading the code, the ESET expert fixes the classification for all events coming from the ESET feed. That logic should be in the parser instead. Or is the code of the ESET expert also useful for sources other than ESET?

@laciKE
Author

laciKE commented May 2, 2025

Thank you for your answer. You are right, the expert bot fixes the classification and it is ESET-specific. I will change it to a parser bot that inherits from the StixParserBot in this pull request. After that, I will add the commits with "EsetStixParserBot" to this pull request.

@sebix
Member

sebix commented May 2, 2025

Ah, I see. That parser also works for multiple sources, other than ESET?

@laciKE
Author

laciKE commented May 2, 2025

Yes, this StixParserBot should work for any source which provides threat intelligence data in STIX 2.1 format. I created it from scratch by reading the STIX 2.1 documentation, and it is able to parse Indicator objects with simple patterns.

StixParserBot (and TaxiiCollectorBot) should be usable with any TAXII/STIX 2.1 feed. General parsing of indicators works, but for correct classification a vendor-specific bot is needed. This is why I asked about the proper way to do it.

So far I have tested TaxiiCollectorBot + StixParserBot only with the ESET Threat Intelligence TAXII feeds, because I do not have access to other TAXII 2.1 feeds. For correct classification I created the ESETExpertBot, which I am going to change to ESETStixParserBot (it will be a child class of the generic StixParserBot).

laciKE added 2 commits May 3, 2025 01:43
Parser bot for enriching events from ESET Threat Intelligence which were collected by TaxiiCollectorBot. It inherits from the generic StixParserBot and implements vendor-specific parsing. The ESET STIX Parser bot analyzes the comment (based on the original description of the STIX Indicator object), chooses the proper classification type and, if possible, also fills in malware.name in the event.
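The inheritance approach discussed above can be sketched roughly as follows. This is a simplified stand-in using plain Python classes, not the IntelMQ bot classes from this pull request: the class names, the `parse_indicator` method and the keyword table are all hypothetical, and the real keywords used for ETI descriptions are not reproduced here. Filling in malware.name is left out of the sketch.

```python
class GenericStixParser:
    """Stand-in for a generic parser: copies optional properties into the event."""

    def parse_indicator(self, indicator):
        event = {}
        if "description" in indicator:
            # The description of the indicator becomes the event comment.
            event["comment"] = indicator["description"]
        return event


class VendorStixParser(GenericStixParser):
    """Stand-in for a vendor-specific child class that adds classification."""

    # Hypothetical keyword table: (substring of the comment, classification.type).
    KEYWORDS = (
        ("phishing", "phishing"),
        ("c&c", "c2-server"),
        ("malware", "malware-distribution"),
    )

    def parse_indicator(self, indicator):
        # Reuse the generic parsing, then refine the result.
        event = super().parse_indicator(indicator)
        comment = event.get("comment", "").lower()
        for keyword, classification_type in self.KEYWORDS:
            if keyword in comment:
                event["classification.type"] = classification_type
                break
        else:
            # No keyword matched, so fall back to a neutral type.
            event["classification.type"] = "undetermined"
        return event


example_event = VendorStixParser().parse_indicator(
    {"type": "indicator", "description": "Host used as C&C server"}
)
```

With this layout the pipeline still contains a single parser per feed, and the vendor-specific logic lives in the overriding method of the child class.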
ETI feeds with URLs, domains and IP addresses, which can be collected by TaxiiCollectorBot and parsed by ESETStixParserBot
@laciKE laciKE force-pushed the taxii branch 2 times, most recently from 0760f22 to c177068 Compare May 3, 2025 01:04
@laciKE laciKE marked this pull request as draft May 23, 2025 21:48
@laciKE
Author

laciKE commented May 23, 2025

I will try to do better parsing for STIX2 patterns.

Also, ESET Threat Intelligence sometimes reports domains in the URL feed and IP addresses in the Domain feed, which causes InvalidValue exceptions in the produced events. I will try to address this, at least by discarding those indicators without raising exceptions (raise_failure=False).
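To illustrate the idea of discarding mismatched values instead of raising, here is a small standalone sketch. It does not use IntelMQ itself: the helper `add_if_valid` and its simplified checks are hypothetical and only imitate the effect of adding a value with raise_failure=False, where a failed validation yields False rather than an exception.

```python
import ipaddress


def is_ip_address(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def add_if_valid(event, field, value):
    """Add value to the event dict only if it plausibly fits the field.

    Hypothetical, simplified checks; returns False instead of raising.
    """
    if field == "source.ip" and not is_ip_address(value):
        return False
    if field == "source.fqdn" and is_ip_address(value):
        return False  # an IP address reported in a domain feed
    if field == "source.url" and "://" not in value:
        return False  # a bare domain reported in a URL feed
    event[field] = value
    return True


event = {}
kept = add_if_valid(event, "source.fqdn", "192.0.2.10")
```

In this example `kept` is False and the event stays empty, so the caller can drop the indicator without any exception being raised.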

laciKE added 3 commits May 29, 2025 00:04
Use the official STIX2 Pattern Validator to get the comparison expressions and extract simple IoCs from them. Support for URLs, domains, IPv4, IPv6, and MD5, SHA-1 and SHA-256 hashes. Small fixes and workarounds address certain anomalies in STIX data provided by some vendors (e.g. ETI): the SHA1 and SHA256 keywords are accepted, and invalid objects reported as domains or URLs are dropped without raising exceptions.
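A minimal sketch of how already extracted comparison expressions could be mapped to event fields, including the tolerant handling of the SHA1 and SHA256 keywords. The input shape (object type, path as a list, value) is an assumption made for this example and does not reproduce the validator's actual output; the function name and the target field names are likewise assumptions.

```python
# Accept both the hyphenated hash names and the SHA1/SHA256 variants
# seen in some vendor data.
HASH_FIELDS = {
    "MD5": "malware.hash.md5",
    "SHA-1": "malware.hash.sha1",
    "SHA1": "malware.hash.sha1",
    "SHA-256": "malware.hash.sha256",
    "SHA256": "malware.hash.sha256",
}

VALUE_FIELDS = {
    "ipv4-addr": "source.ip",
    "ipv6-addr": "source.ip",
    "domain-name": "source.fqdn",
    "url": "source.url",
}


def comparison_to_field(object_type, path, value):
    """Map one comparison to (field, value), or None if unsupported.

    Assumed input: object_type such as "file", path such as
    ["hashes", "SHA-256"] or ["value"], and the unquoted literal value.
    """
    if object_type == "file" and len(path) == 2 and path[0] == "hashes":
        field = HASH_FIELDS.get(path[1].upper())
        return (field, value.lower()) if field else None
    if path == ["value"]:
        field = VALUE_FIELDS.get(object_type)
        return (field, value) if field else None
    return None
```

Unsupported object types and hash algorithms yield None, which lets the caller skip them quietly.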
@laciKE laciKE marked this pull request as ready for review May 28, 2025 23:57
@laciKE
Author

laciKE commented May 29, 2025

Better parsing of STIX2 patterns is ready; the STIX parser bot can now also extract hashes.

The above-mentioned issues with ESET Threat Intelligence are fixed.

From my side, the PR is ready for review. If I should change something or if I forgot something, please let me know; this is my first PR to IntelMQ.

2 participants