Skip to content

Release: ab_sniffer_v4. New frameworks & global ESLint enforcement

Add a heading

We're excited to announce the release of ab_sniffer_v4, a new challenge template that introduces additional evaluation frameworks and raises the baseline quality standards for all submissions.

Release Date

ab_sniffer_v4 will be released on: October 16, 2025 10:00 AM UTC.
Please treat that as the authoritative go-live time for the new challenge.

What’s included in ab_sniffer_v4

The v4 release adds three new frameworks for miner submissions: - botasaurus: automaton framework for scripted bot evaluations.
- pydoll: Python-based evaluation harness for flexible test scenarios.
- human: real device interaction mode.

The human framework uses a real device interacting with the miner's injected web page. This allows evaluation of miner scripts under realistic user-driven conditions (touch/keyboard interactions, timing variability, and real browser/device subtleties). In other words, human tests how your script behaves when a real person interacts with the page, not a synthetic emulator.

ESLint: now enforced across all challenges

To raise baseline code quality and prevent rule bypassing, ESLint checks are now integrated into all challenges, not just ab_sniffer_v4. The shared ESLint configuration sets:

"noInlineConfig": true

This prevents miners from disabling rules via inline comments such as /* eslint-disable */ or // eslint-disable-line. If a rule flags a problem, the correct approach is to address the underlying issue or propose a change to the shared config — not to silence the linter inline.

Inline ESLint disables will be ignored and flagged. Please update code to conform to the rules or submit a change request for the shared config if you have a legitimate need.

Scoring results (scoring_result)

Miners can now view their validator scoring results in JSON form (available from the Hugging Face view for the submission). The scoring_result JSON contains per-run validator outputs — for example, predictions, detection flags, driver/framework used, and execution time.

Short example of scoring_result:

{
    "0": [
        {
            "driver": "human",
            "predicted": "HeadlessBot",
            "detected": false,
            "execution_time": 15.676664113998413
        }
    ],
    "1": [
        {
            "detected": false,
            "predicted": "Chrome",
            "driver": "seleniumbase",
            "execution_time": 15.242670774459839
        }
    ],
    "2": [
        {
            "driver": "human",
            "predicted": "HeadlessBot",
            "detected": false,
            "execution_time": 12.766318559646606
        }
    ],
    "3": [
        {
            "detected": false,
            "predicted": "Chrome",
            "driver": "seleniumbase",
            "execution_time": 15.284128427505493
        }
    ],
    "4": [
        {
            "driver": "human",
            "predicted": "HeadlessBot",
            "detected": false,
            "execution_time": 12.783197164535522
        }
    ],
    "5": [
        {
            "detected": false,
            "predicted": "Chrome",
            "driver": "seleniumbase",
            "execution_time": 15.305853605270386
        }
    ]
}

Fields explained:

  • driver: the evaluation framework / driver used (e.g., human, seleniumbase, botasaurus, pydoll).
  • predicted: result of a submission for that session.
  • detected: boolean indicating that marks match of pridicted value with the driver.
  • execution_time: the execution time of script in that session in seconds (float).

Where to find it

scoring_result JSON is available in the validators' Hugging Face view for your submission . Check the submission page to download or inspect the JSON for each validator run.

Why these changes

The new frameworks (including the real-device human mode) and the global ESLint enforcement improve the realism, fairness, and maintainability of the RedTeam evaluation ecosystem. They help ensure that miners are judged on robust, reproducible behavior and on sound implementation practices.