
Beginner Guide

Let's get started learning how to use WarpParse.

Preparation

Installation

curl  -sSf https://get.warpparse.ai/setup.sh | bash
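After the install completes, a quick sanity check (this assumes the setup script places the wparse, wpgen, and wproj binaries on your PATH):

command -v wparse wpgen wproj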

Latest releases page

macOS note

  • Allow the downloaded binary to run (lift the installation restriction)

Download the learning examples

git clone https://github.com/wp-labs/wp-examples.git

Goal 1: Get WarpParse running

Initialize a project

wproj init

mkdir ${HOME}/wp-space
cd ${HOME}/wp-space

wproj init -m full
  • Project structure
tree -L 2
.
├── conf
│   ├── wparse.toml
│   └── wpgen.toml
├── connectors
│   ├── sink.d
│   └── source.d
├── data
│   ├── in_dat
│   ├── logs
│   ├── out_dat
│   └── rescue
├── models
│   ├── knowledge
│   ├── oml
│   └── wpl
└── topology
    ├── sinks
    └── sources

Generate test data

The first example is the simplest case: parsing from a file into a file.

wpgen sample -n 3000 

Parse the data with the engine

wparse batch --stat 2 -p 

Result:

============================ total stat ==============================

+-------+------------+-----------------+---------+-------+---------+----------+--------+
| stage | name       | target          | collect | total | success | suc-rate | speed  |
+======================================================================================+
| Parse | parse_stat | /nginx//example |         | 3000  | 3000    | 100.0%   | 3.12   |
|-------+------------+-----------------+---------+-------+---------+----------+--------|
| Pick  | pick_stat  | file_1          |         | 3000  | 3000    | 100.0%   | 150.00 |
|-------+------------+-----------------+---------+-------+---------+----------+--------|
| Sink  | sink_stat  | demo/json       |         | 1280  | 1280    | 100.0%   | 3.56   |
+-------+------------+-----------------+---------+-------+---------+----------+--------+

Data statistics

wproj data stat
  • Output
== Sources ==
| Key    | Enabled | Lines | Path                    | Error |
|--------|---------|-------|-------------------------|-------|
| file_1 |    Y    |  3000 | .../data/in_dat/gen.dat | -     |
Total enabled lines: 3000

== Sinks ==
| Scope    | Sink        | Path                         | Lines |
|----------|-------------|------------------------------|-------|
| business | demo/json   | .../data/out_dat/demo.json   |  3000 |
| infra    | default/[0] | .../data/out_dat/default.dat |     0 |
| infra    | error/[0]   | .../data/out_dat/error.dat   |     0 |
| infra    | miss/[0]    | .../data/out_dat/miss.dat    |     0 |
| infra    | monitor/[0] | .../data/out_dat/monitor.dat |     0 |
| infra    | residue/[0] | .../data/out_dat/residue.dat |     0 |

Note: only file-based sources and sinks can be counted with wproj.

Goal 2: Parse your own logs

Learn to parse logs with WPL

Sample: a Linux system log line

Oct 10 08:30:15 server systemd[1]: Started Apache HTTP Server.
  • Open the editor tool
  • Look at a few of the editor examples
  • Learn the basics of WPL

Add it to the WarpParse project

mkdir ./models/wpl/my_sys

Place the sample at:

./models/wpl/my_sys/sample.dat

Place the WPL at:

./models/wpl/my_sys/parse.wpl
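Putting those placement steps together as shell commands (a minimal sketch; my_syslog.sample is a hypothetical file holding log lines like the one above, and the rule body in parse.wpl is yours to write):

mkdir -p ./models/wpl/my_sys
cp my_syslog.sample ./models/wpl/my_sys/sample.dat   # hypothetical sample file
${EDITOR:-vi} ./models/wpl/my_sys/parse.wpl          # write the parsing rule here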

Generate data

Use wpgen

Batch parsing

Use wparse

Docker

Pull the latest version

docker pull ghcr.io/wp-labs/warp-parse:latest
sudo docker run -it --rm --user root --entrypoint /bin/bash <image-id>

getting_started

This use case covers "quick initialization + baseline verification". It contains a minimal source file, the basic sink groups (default/miss/residue/intercept/error/monitor), and example business routes (where applicable).

Directory structure

core/getting_started/
├── README.md                    # This document
├── run.sh                       # One-shot run script
├── conf/                        # Configuration directory
│   ├── wparse.toml             # Main WarpParse configuration
│   └── wpgen.toml              # Data generator configuration
├── models/                      # Model definitions
│   ├── wpl/                    # WPL (WarpParse Language) model definitions
│   ├── oml/                    # OML (Object Mapping Language) model definitions
│   │   └── benchmark.oml       # Benchmark rule
│   ├── sources/                # Data source configuration
│   │   └── wpsrc.toml          # Source definitions (file / syslog)
│   └── sinks/                  # Data sink configuration
│       ├── defaults.toml       # Default settings
│       ├── infra.d/            # Infrastructure sinks
│       │   ├── default.toml    # Default sink
│       │   ├── miss.toml       # Missing-data handling
│       │   ├── residue.toml    # Residue-data handling
│       │   ├── error.toml      # Error-data handling
│       │   └── monitor.toml    # Monitoring-data handling
│       └── business.d/         # Business sinks
│           ├── business.toml   # Business-data handling
│           └── example/        # Example business handling
│               └── simple.toml # Simple example
├── data/                        # Data directory
│   ├── in_dat/                  # Input data
│   │   └── gen.dat             # Generated test data
│   ├── out_dat/                 # Output data
│   │   ├── default.dat         # Default output
│   │   ├── miss.dat            # Missing-data output
│   │   ├── residue.dat         # Residue output
│   │   ├── error.dat           # Error output
│   │   ├── monitor.dat         # Monitoring output
│   │   └── business.dat        # Business output
│   ├── logs/                    # Log files
│   │   ├── wparse.log          # WarpParse runtime log
│   │   └── wpgen.log           # Data generator log
│   └── rescue/                  # Rescue data directory
│       └── *.rescue            # Rescue data files
├── .run/                        # Runtime data
│   ├── authority.sqlite        # Authority database
│   └── rule_mapping.dat        # Rule-mapping data
├── sink.d/                      # Symlink to the sinks directory
└── source.d/                    # Symlink to the sources directory

Quick start

Runtime requirements

  • WarpParse engine (must be on the system PATH)
  • Bash shell environment
  • Basic system tools (awk, grep, wc, etc.)

Run commands

# Enter the use-case directory
cd usecase/core/getting_started

# Run the full flow (generates 3000 test records by default)
./run.sh

# Or specify how many records to generate
./run.sh 5000

Run options

The run.sh script accepts the following arguments:

  • No argument: use the default (generate 3000 records)
  • A numeric argument: the number of records to generate (e.g. ./run.sh 5000)

Execution logic

Flow overview

The run.sh script performs the following major steps:

  1. Environment initialization

    • Keep the required configuration files (wparse.toml, wpgen.toml)
    • Clean up data from previous runs
    • Create the required directory structure
    • Set up symbolic links (sink.d, source.d)
  2. WarpParse service initialization

    • Initialize the service with wparse init
    • Create the authority database and rule mapping
  3. Configuration and data cleanup

    • Empty the input/output data directories
    • Reset the log files
    • Clean the rescue data directory
  4. Test data generation

    • Generate test data with wpgen according to the configuration
    • Generate 3000 benchmark log records by default
    • Save the data to data/in_dat/gen.dat
  5. Input data verification

    • Check the number of generated records
    • Make sure the data format is correct
  6. Data processing

    • Start the WarpParse engine
    • Load the WPL/OML models
    • Process the input data and dispatch it to the sinks
  7. Output verification

    • Check each sink's output file
    • Verify processing completeness
    • Aggregate the processing results
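For reference, a minimal manual equivalent of the scripted flow, using only commands that appear in this guide (a sketch; run it from the use-case directory):

wparse init              # initialize the service
wpgen sample -n 3000     # generate test data into data/in_dat/gen.dat
wparse batch --stat 2 -p # parse in batch mode
wproj data stat          # inspect the results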

Data flow

Input data (data/in_dat/gen.dat)
    ↓
WarpParse engine
    ↓
┌─────────────┬─────────────┬─────────────┐
│  default    │    miss     │   residue   │
│    sink     │    sink     │    sink     │
└─────────────┴─────────────┴─────────────┘
┌─────────────┬─────────────┬─────────────┐
│    error    │   monitor   │   business  │
│    sink     │    sink     │    sink     │
└─────────────┴─────────────┴─────────────┘

Key processing stages

  1. Source handling

    • File source: reads log data from gen.dat
    • Syslog source: receives system logs in real time (disabled in this example)
  2. OML rule matching

    • The /benchmark* rule matches logs in the expected format
    • Matched data is extracted and processed automatically
  3. Sink dispatch

    • default: normally processed data
    • miss: data not matched by any rule
    • residue: leftover data after processing
    • error: errors raised during processing
    • monitor: performance and status metrics
    • business: business-level results

Configuration

Main configuration file (conf/wparse.toml)

version = "1.0"
robust = "normal"

[models]
wpl = "./models/wpl"      # WPL model directory
oml = "./models/oml"      # OML model directory
sources = "./models/sources"  # Source configuration directory
sinks = "./models/sinks"      # Sink configuration directory

[performance]
rate_limit_rps = 10000    # Rate limit (records/second)
parse_workers = 2         # Number of parse worker threads

[rescue]
path = "./data/rescue"    # Rescue data path

[log_conf]
level = "warn,ctrl=info"
output = "File"          # Log output mode

[stat]
window_sec = 60          # Statistics window (seconds)

Data generator configuration (conf/wpgen.toml)

[generator]
mode = "rule"           # Generation mode: rule or random
count = 1000           # Number of records to generate
speed = 1000           # Generation speed (records/second)
parallel = 1           # Parallelism

[output]
connect = "file_raw_sink"  # Output connector

[output.params]
base = "data/in_dat"       # Output base path
file = "gen.dat"          # Output file name

OML rule example (models/oml/benchmark.oml)

name : /oml/benchmark
rule : /benchmark*
---
* : auto = take() ;

This rule defines:

  • name: the rule's unique identifier
  • rule: matches logs whose path begins with /benchmark
  • Action: take() extracts and processes the matched data

Sink configuration structure

Each sink configuration file contains:

version = "2.0"

[sink]
name = "default"              # Sink name
type = "file_raw_sink"        # Sink type
connect = "default_sink"      # Connector name

[sink.params]
base = "data/out_dat"         # Output base path
file = "default.dat"          # Output file name

Source configuration (models/sources/wpsrc.toml)

Two kinds of sources are supported:

  1. File source (enabled by default)

    • Reads log data from local files
    • Suited to batch processing
  2. Syslog source (disabled by default)

    • Receives system logs in real time
    • Suited to streaming scenarios

Verification and troubleshooting

Verifying a successful run

After the run completes, you can verify success in the following ways:

  1. Output file statistics
wproj data stat 
== Sources ==
| Key       | Enabled | Lines | Path                  | Error |
|-----------|---------|-------|-----------------------|-------|
| demo_file |    Y    |  3000 | ./data/in_dat/gen.dat |   -   |
Total enabled lines: 3000

== Sinks ==
business   | business/out_kv           | ././data/out_dat/demo.kv                      | 2000
business   | /example//proto           | ././data/out_dat/simple.dat                   | 1000
business   | /example//kv              | ././data/out_dat/simple.kv                    | 1000
business   | /example//json            | ././data/out_dat/simple.json                  | 1000
infras     | default/[0]               | ././data/out_dat/default.dat                  | 0
infras     | error/[0]                 | ././data/out_dat/error.dat                    | 0
infras     | miss/[0]                  | ././data/out_dat/miss.dat                     | 0
infras     | monitor/[0]               | ././data/out_dat/monitor.dat                  | 0
infras     | residue/[0]               | ././data/out_dat/residue.dat                  | 0
  2. Check the run logs

    # WarpParse runtime log
    tail -f data/logs/wparse.log
    
    # Data generator log
    tail -f data/logs/wpgen.log
    
  3. Validate data completeness

wproj data validate
validate: PASS
Total input: 3000 (source=override)

| Sink            | Actual | Expected  | Lines/Denom | Verdict |
|-----------------|--------|-----------|-------------|---------|
| /example//proto |  0.333 | 0.33±0.02 |  1000/3000  |    OK   |
| /example//kv    |  0.333 | 0.33±0.02 |  1000/3000  |    OK   |
| /example//json  |  0.333 | 0.33±0.02 |  1000/3000  |    OK   |
| default/[0]     |    0   |   0±0.02  |    0/3000   |    OK   |
| error/[0]       |    0   | 0.01±0.02 |    0/3000   |    OK   |
| miss/[0]        |    0   | [0 ~ 0.1] |    0/3000   |    OK   |
| monitor/[0]     |    0   |  [0 ~ 1]  |    0/3000   |    OK   |
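To gate a script or CI job on this check, one possible pattern (it assumes, as the output above shows, that a passing run prints "validate: PASS"):

wproj data validate | tee validate.out
grep -q 'validate: PASS' validate.out || exit 1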

Common problems and solutions

1. WarpParse command not found

Error: wparse: command not found

Solutions:

  • Make sure WarpParse is installed correctly
  • Add WarpParse to the system PATH
  • Or run it using its full path

2. Insufficient permissions

Error: Permission denied

Solution:

chmod +x run.sh
chmod -R 755 data/

3. Data generation fails

Possible causes:

  • Misconfigured wpgen
  • Insufficient disk space
  • Parallelism set too high

Solutions:

  • Check the conf/wpgen.toml configuration
  • Free up disk space
  • Lower the parallel parameter

Assessment

Task 1

Requirements:

  • 1. Run the binary directly; the input is a file and the output is a file
  • 2. The output must match exactly, including both field names and field types

Sample:

[20/Feb/2018:12:12:14 +0800] 112.195.209.90 - -  "GET / HTTP/1.1" 200 190 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36" "-"

Expected result:

{
  "remote_addr": "112.195.209.90",
  "time_local": "2018-02-20 12:12:14",
  "request": "GET / HTTP/1.1",
  "status": "200",
  "body_bytes_sent": 190,
  "http_referer": "-",
  "http_user_agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36",
  "http_x_forwarded_for": "-"
}
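One way to approach this task with the commands from Goal 1 (a sketch; the model directory name and the WPL/OML contents are placeholders you must fill in):

mkdir -p models/wpl/nginx_access                      # hypothetical model name
cp access.sample models/wpl/nginx_access/sample.dat   # the sample line above
# write models/wpl/nginx_access/parse.wpl plus a matching OML model, then:
wparse batch --stat 2 -p
wproj data stat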

T2

Requirements:

  • Run the binary directly; the input is TCP and the output is a file
  • Send the data with the wpgen command
  • The output must match exactly, including both field names and field types

Sample:

http 2018-11-30T22:23:00.186641Z app/my-lb 192.168.1.10:2000 10.0.0.15:8080 0.01 0.02 0.01 200 200 100 200 "POST https://api.example.com/u?p=1&sid=2&t=3 HTTP/1.1" "Mozilla/5.0 (Win) Chrome/90" "ECDHE" "TLSv1.3" arn:aws:elb:us:123:tg "Root=1-test" "api.example.com" "arn:aws:acm:us:123:cert/short" 1 2018-11-30T22:22:48.364000Z "forward" "https://auth.example.com/r" "err" "10.0.0.1:80" "200" "cls" "rsn" TID_x1

Expected result:

{
  "log_type": "http",
  "timestamp": 1543616580186,
  "elb": "app/my-lb",
  "client_host": "192.168.1.10:2000",
  "target_host": "10.0.0.15:8080",
  "request_processing_time": 0.01,
  "target_processing_time": 0.02,
  "response_processing_time": 0.01,
  "elb_status_code": "200",
  "target_status_code": "200",
  "received_bytes": 100,
  "sent_bytes": 200,
  "request_method": "POST",
  "request_url": "https://api.example.com/u?p=1&sid=2&t=3",
  "request_protocol": "HTTP/1.1",
  "user_agent": "Mozilla/5.0 (Win) Chrome/90",
  "ssl_cipher": "ECDHE",
  "ssl_protocol": "TLSv1.3",
  "target_group_arn": "arn:aws:elb:us:123:tg",
  "trace_id": "Root=1-test",
  "domain": "api.example.com",
  "chosen_cert_arn": "arn:aws:acm:us:123:cert/short",
  "matched_rule_priority": "1",
  "request_creation_time": "2018-11-30 22:22:48.364",
  "actions_executed": "forward",
  "redirect_url": "https://auth.example.com/r",
  "error_reason": "err",
  "target_port_list": "10.0.0.1:80",
  "target_status_code_list": "200",
  "classification": "cls",
  "classification_reason": "rsn",
  "traceability_id": "TID_x1"
}
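A sketch of the TCP flow, following the daemon/generator pattern used elsewhere in these docs (it assumes a TCP source is enabled in models/sources/wpsrc.toml and that wpgen is configured with a TCP sink connector):

wparse daemon --stat 5 &          # parser listening on the configured TCP port
sleep 3                           # give the service a moment to come up
wpgen sample -n 10000 --stat 5    # push samples over TCP
wproj data stat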

T3

Requirements:

  • Start wparse with Docker; the input is TCP and the output is a file
  • Send the TCP data with the wpgen command
  • The output must match exactly, including both field names and field types
  • Determine the internal/external network zones from sip and dip

Input:

{"update_time":"2024-12-03 10:23:22","access_ip":"0.0.0.0","packet_data":"dXNlcjphZG1pbiBwYXNzd29yZDoxMjM0NTY=","ip":"1.1.1.1:1111->10.0.0.1:2222","attack_result": "1","log_type":"flow_ty_attack"}

Expected result:

{
  "log_type": "flow_ty_attack",
  "access_ip": "0.0.0.0",
  "sip": "1.1.1.1",
  "sport": 1111,
  "dip": "10.0.0.1",
  "dport": 2222,
  "user_name": "admin",
  "password": "123456",
  "update_time": "2024-12-03 10:23:22",
  "attack_result": "成功",
  "src_zone": "External",
  "dst_zone": "Internal"
}
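A sketch of running the parser in Docker for this task (the mount point, working directory, and TCP port are hypothetical; align the port with your source configuration):

docker run -it --rm -v "$PWD:/work" -w /work -p 9000:9000 \
  --entrypoint /bin/bash ghcr.io/wp-labs/warp-parse:latest
# inside the container:
wparse daemon --stat 5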

T4

Requirements:

  • Start wparse with Docker; the input is Kafka and the output is MySQL
  • Send the Kafka data with the wpgen command
  • The output must match exactly, including both field names and field types
  • Complete the OML transformation as hinted

Input:

222.133.52.20 simple_chars 80 192.168.1.10 select_one left 2025-12-29 12:00:00 {"msg":"hello"} "" aGVsbG8gd29ybGQ= ["val1","val2","val3"] /home/user/file.txt  http://example.com/path/to/resource?foo=1&bar=2  [{"one":{"two":"nested"}}] foo bar baz qux 500 ext_value_1 ext_value_2  &lt;script&gt;  hello"world 12345

Expected result:

{
    "direct_chars": "13", //直接赋值
    "direct_digit": 13,
    "simple_chars": "simple_chars",//直接赋值
    "simple_port": 80,
    "simple_ip": "192.168.1.10",
    "ip_ip4_to_int": 3232235786, //ip转int(192.168.1.10)
    "html_unescape": "<script>", //html转码
    "html_escape": "&lt;script&gt;",//html转码
    "str_escape": "hello\\\"world",//转义
    "select_chars": "select_one", //使用option
    "match_chars": "1",			// left为1,right为2
    "time": "2026-01-12 19:38:43.452811", //当前标准时间
    "date": 20260112, //当前时间(YYYYMMDD格式)
    "hour": 2026011219, //当前时间(YYYYMMDDHH格式)
    "timestamp": 1766980800000, //北京时间毫秒级时间戳(使用日志中的时间2025-12-29 12:00:00)
    "timestamp1": 1766980800, //秒(使用日志中的时间2025-12-29 12:00:00)
    "timestamp2": 1766980800000, //毫秒(使用日志中的时间2025-12-29 12:00:00)
    "timestamp3": 1766980800000000, //微秒(使用日志中的时间2025-12-29 12:00:00)
    "timestamp4": 1768246851, //UTC+8秒(使用当前时间)
    "timestamp5": 1768246851009, //UTC+8 毫秒(使用当前时间)
    "timestamp6": 1768246851009507, //UTC+8 微秒(使用当前时间0)
    "base64_en": "aGVsbG8=", //base64
    "base64_de": "hello word",
    "array_get": "val1", //数组取值
    "array_str": "[\"val1\",\"val2\",\"val3\"]", //数组转json
    "name": "file.txt", //文件路径取值
    "path": "/path/to/resource", // 从全路径中获取目录路径
    "domain": "example.com", //http取值
    "host": "example.com",
    "uri": "/path/to/resource?foo=1&bar=2",
    "params": "foo=1&bar=2",
    "obj": "nested", //多层取值
    "splice": "foo:bar|baz:qux", //字符拼接
    "num_range": "大于 0 小于 1000", //范围判断
    "extends": { //扩展字段
        "extend1": "ext_value_1",
        "extend2": "ext_value_2"
    }
}

Core Examples

This directory contains core end-to-end examples and scenario-based configurations for quickly validating parsing, routing, filtering, metrics, and verification capabilities.

Case List

| Case | Purpose | Validated Features |
|------|---------|--------------------|
| confvars_case | Configuration variables usage | Variable substitution, environment overrides |
| error_reporting | Error data reporting with multi-format output | Error routing, JSON/KV output, OML transformation |
| file_source | File-based data source ingestion | File source, batch processing |
| knowdb_case | Knowledge database queries and data association | SQL-like OML queries, CSV knowledge bases, dynamic lookup |
| oml_examples | Comprehensive OML transformation | Conditional matching, range matching, tuple matching, knowledge base queries |
| prometheus_metrics | Prometheus metrics export | HTTP /metrics endpoint, counters, gauges, histograms |
| sink_filter | Sink-level filtering and data splitting | Filter rules, multi-path routing, expectation validation |
| sink_recovery | Sink failure handling and data recovery | Rescue files, interruption/recovery cycle, replay pipeline |
| syslog_udp | UDP Syslog source integration | UDP syslog reception, parsing, routing |
| tcp_roundtrip | TCP input/output end-to-end link | TCP source/sink, data flow validation |
| wpl_missing | WPL field missing and fault tolerance | Optional fields, miss group handling, data completeness |
| wpl_pipe | WPL pipeline preprocessing | Base64 decoding, unquote/unescape, trim operations |
| wpl_success | Successful full-chain WPL parsing | Multi-rule parsing, data_type tags, routing validation |
| stat_test | Statistical testing | Statistics validation, test scenarios |

Common Directory Structure

Each case typically follows this structure:

case_name/
├── README.md                 # Documentation
├── run.sh                    # Execution script
├── conf/
│   ├── wparse.toml          # Main WarpParse configuration
│   └── wpgen.toml           # Data generator configuration (optional)
├── models/
│   ├── wpl/                 # WPL parsing rules
│   ├── oml/                 # OML transformation models
│   ├── knowledge/           # Knowledge base data (CSV/SQL)
│   └── sinks/               # Sink routing configuration
├── data/
│   ├── in_dat/              # Input data
│   ├── out_dat/             # Output data
│   └── logs/                # Processing logs
└── topology/                # Alternative structure for some cases

Quick Start

# Enter case directory
cd core/<case_name>

# Run the case
./run.sh

# Check statistics
wproj data stat

# Validate output
wproj data validate

FAQ

  • Filter not working:
    • Paths are resolved relative to current working directory; ensure filter.conf is accessible from sink_root
    • Expressions must be parseable by TCondParser; test with simple expressions first
  • Prometheus not started:
    • Without configuring Prometheus connector and switching monitor group to it, no /metrics endpoint will be available
  • Parameter override failed:
    • params keys must be in the connector’s allow_override whitelist

Convention over configuration: Always explicitly set name for each sink to get stable full_name and more readable validation reports; for filter-based cases, put filter conditions in filter.conf for reuse and review.



Configuration Variables

This example demonstrates how to use configuration variables for dynamic configuration management.

Purpose

Validate the ability to:

  • Define and use configuration variables in TOML files
  • Override variables via environment variables
  • Apply variables across sources, sinks, and model configurations

Features Validated

| Feature | Description |
|---------|-------------|
| Variable Substitution | Using ${VAR} syntax in configuration files |
| Environment Overrides | Overriding config values via environment variables |
| Default Values | Setting fallback values for undefined variables |
| Cross-Config References | Using variables across multiple configuration files |

Quick Start

cd core/confvars_case

# Run with default variables
./run.sh

# Run with custom environment variables
LINE_CNT=5000 STAT_SEC=5 ./run.sh

Directory Structure

confvars_case/
├── conf/wparse.toml       # Main config with variable references
├── models/
│   ├── wpl/               # Parsing rules
│   ├── oml/               # Transformation models
│   └── sinks/             # Sink routing
└── data/                  # Runtime data

Example Usage

# In wparse.toml
[performance]
rate_limit_rps = ${RATE_LIMIT:-500000}  # Default: 500000

[log_conf]
level = "${LOG_LEVEL:-info}"  # Default: info


Error Reporting

This example demonstrates error data reporting with multi-format output for system monitoring logs.

Purpose

Validate the ability to:

  • Parse skyeye_stat system monitoring logs
  • Transform data using OML models
  • Output to multiple formats (JSON, KV)
  • Collect and analyze error data for reporting

Features Validated

| Feature | Description |
|---------|-------------|
| WPL Parsing | Parsing system monitoring metrics (CPU, memory, disk) |
| OML Transformation | Data enrichment with take(), Time::now(), fmt(), object{} |
| Multi-format Output | JSON and KV format outputs |
| Field Collection | Using collect for dynamic field gathering |
| Pipeline Processing | Base64 encoding via the pipe operator with base64_en |

OML Features Demonstrated

  • take(): Extract fields from parsed results
  • Time::now(): Get current timestamp
  • fmt(): String formatting with template
  • object {}: Create nested objects
  • collect: Collect matching fields dynamically
  • pipe ... | base64_en: Pipeline processing with Base64 encoding

Quick Start

cd core/error_reporting
./run.sh

error_reporting (details)

This use case demonstrates "error data reporting with multi-format output": it parses skyeye_stat system monitoring logs, transforms the data with OML, and supports several output formats (JSON, KV, and so on). Suited to collecting, analyzing, and reporting on error data.

Directory structure

  • conf/: configuration directory
    • conf/wparse.toml: main configuration
    • conf/wpgen.toml: data generator configuration
  • models/: rules and routing
    • models/wpl/skyeye_stat/: skyeye_stat parsing rules
    • models/wpl/example/simple/: example parsing rules
    • models/oml/skyeye_stat.oml: OML transformation model
    • models/sinks/business.d/: business routing
    • models/sinks/infra.d/: infrastructure groups
    • models/sources/wpsrc.toml: source configuration
  • data/: runtime data directory

WPL parsing rules

skyeye_stat rule (models/wpl/skyeye_stat/parse.wpl)

Parses skyeye system monitoring logs, extracting CPU, memory, disk, and other system metrics:

package skyeye_stat {
   #[copy_raw(name:"raw_msg")]
   rule case1 {
        (digit<<,>>, digit, symbol(skyeye), _,
         time:updatetime\|\!, ip:sip\|\!, chars:log_type\|\![),
         some_of (
             json( symbol(空闲CPU百分比)@name, @value:cpu_free),
             json( symbol(空闲内存kB)@name, @value:memory_free),
             json( symbol(1分钟平均CPU负载)@name, @value:cpu_used_by_one_min),
             // ... more metrics
         )\,
    }
}

OML 转换模型

skyeye_stat.oml

对解析后的数据进行二次转换与增强:

name : skyeye_stat
rule : skyeye_stat/*
---
src_key     : chars =  take() ;
recv_time   : time  = Time::now() ;
pos_sn      : chars =  take() ;
updatetime  : time  =  take() ;
sip         : chars =  take() ;
log_type    : chars =  take() ;
cust_tag    : chars = fmt("[{pos_sn}-{sip}]", @pos_sn, @sip ) ;

value  : obj = object {
  process,memory_free : float =  take() ;
  cpu_free,cpu_used_by_one_min, cpu_used_by_fifty_min : float =  take() ;
  disk_free,disk_used,disk_used_by_one_min, disk_used_by_fify_min : float =  take() ;
} ;

time_all : array = collect  take( keys : [ *time* ] ) ;
raw_msg : chars =  pipe take() | base64_en ;

Quick usage

Build the project

cargo build --workspace --all-features

Run the use case

cd core/error_reporting
./run.sh

The script performs the following steps:

  1. Initialize the environment and configuration (conf is preserved so custom configuration is loaded)
  2. Generate sample data with wpgen
  3. Run wparse batch parsing
  4. Validate the output statistics

Manual execution

# Initialize the configuration
wproj check
wproj data clean
wpgen data clean

# Generate sample data
wpgen sample -n 3000 --stat 3

# Run batch parsing
wparse batch --stat 3 -p -n 3000

# Validate the output
wproj data stat
wproj data validate

Tunable parameters

Override via environment variables:

  • LINE_CNT: number of lines to generate (default 3000)
  • STAT_SEC: statistics interval in seconds (default 3)

Business group configuration

skyeye_stat business group (models/sinks/business.d/skyeye_stat.toml)

Several output formats are supported:

  • JSON output
  • KV output

OML features demonstrated

This use case showcases the following OML features:

  • take(): extract fields from the parse result
  • Time::now(): get the current time
  • fmt(): string formatting
  • object {}: create nested objects
  • collect: collect matching fields
  • pipe ... | base64_en: pipeline processing with Base64 encoding

Common problems

Q1: Parsing fails

  • Confirm the input data format matches the WPL rule definitions
  • Check data/logs/wparse.log for detailed errors
  • Check whether the optional fields inside some_of match correctly

Q2: OML transformation fails

  • Confirm the rule in the OML matches the WPL package path
  • Confirm the field names match the WPL parse results


File Source

This example demonstrates file-based data source ingestion for batch processing scenarios.

Purpose

Validate the ability to:

  • Read input data from local files
  • Process data through WPL parsing rules
  • Route parsed data to configured sinks
  • Handle file-based batch processing workflows

Features Validated

| Feature | Description |
|---------|-------------|
| File Source | Reading data from the local file system |
| Batch Processing | Processing files in batch mode |
| Data Routing | Routing parsed data to business/infra sinks |

Quick Start

cd core/file_source
./run.sh

Directory Structure

file_source/
├── conf/wparse.toml       # Main configuration
├── models/
│   ├── wpl/               # Parsing rules
│   ├── sinks/             # Sink routing
│   └── sources/           # File source config
└── data/
    ├── in_dat/            # Input files
    └── out_dat/           # Output files


KnowDB Case

This example demonstrates knowledge database (KnowDB) queries and data association for enriching parsed logs with business data.

Purpose

Validate the ability to:

  • Load CSV-based knowledge bases
  • Query knowledge bases using SQL-like OML syntax
  • Dynamically associate parsed log data with business data
  • Enrich parsed results with external lookups

Features Validated

| Feature | Description |
|---------|-------------|
| CSV Knowledge Base | Loading data from CSV files |
| SQL-like Queries | select ... from ... where in OML |
| Dynamic Lookup | Runtime data association |
| Table Configuration | knowdb.toml for table mapping |

Knowledge Base Structure

models/knowledge/example/
├── create.sql     # Table schema definition
├── data.csv       # Data file
└── insert.sql     # Data insertion statements

OML Query Example

# Query math score from example_score table
math_score = select math from example_score where id = read(sid);

Quick Start

cd core/knowdb_case
./run.sh

knowdb_case (details)

This use case demonstrates "knowledge database (KnowDB) queries and data association": after logs are parsed by WPL rules, OML select ... from ... where statements query associated data from the knowledge base, dynamically joining parsed logs with business data.

Directory structure

  • conf/: configuration directory
    • conf/wpgen.toml: data generator configuration (UDP syslog output)
  • models/: rules and routing
    • models/wpl/example/: WPL parsing rules (nginx log parsing)
    • models/knowledge/example/: knowledge base data (CSV format)
    • models/sinks/business.d/: business routing
    • models/sinks/infra.d/: infrastructure groups
    • models/sources/wpsrc.toml: source configuration (UDP syslog)
  • data/: runtime data directory
    • data/in_dat/: input data
    • data/out_dat/: sink output
    • data/logs/: log directory

Knowledge base configuration

Knowledge base table definition (models/knowledge/example/)

models/knowledge/example/
├── create.sql     # Table schema definition
├── data.csv       # Data file
└── insert.sql     # Data insertion statements

Sample data (data.csv):

name,pinying
令狐冲,linghuchong
任盈盈,renyingying

Knowledge base configuration (models/knowledge/knowdb.toml)

Defines the knowledge base load paths and table mappings.

WPL parsing rules

Parsing rule (models/wpl/example/parse.wpl)

package /example {
  #[tag(from_dc: "warplog/cs/nginx")]
  rule nginx {
    (ip:sip,2*_,time<[,]>,http/request",http/status,digit,chars",http/agent",_")
  }
}

OML Examples

This example provides comprehensive OML (Object Modeling Language) transformation demonstrations covering data transformation, field mapping, conditional matching, and knowledge base queries.

Purpose

Validate the ability to:

  • Transform parsed data using OML models
  • Perform conditional matching with match expressions
  • Query knowledge bases with SQL-like syntax
  • Create nested objects and collect fields dynamically

Features Validated

| Feature | Description |
|---------|-------------|
| Conditional Matching | match ... {} with pattern matching |
| Range Matching | in(digit(1), digit(3)) for value ranges |
| Tuple Matching | (read(city), read(count)) for multi-field conditions |
| Negation Matching | !chars(warp) for exclusion patterns |
| Optional Fields | take(option:[severity]) for optional handling |
| Knowledge Base Query | select ... from ... where SQL-like queries |
| Wildcard Collection | * = take() for passthrough fields |
| Object Creation | object { ... } for nested structures |
| Pipeline Processing | pipe with base64_en for transformations |

OML Models Included

| Model | Description |
|-------|-------------|
| csv_example.oml | CSV data processing with conditional matching |
| skyeye_stat.oml | System monitoring data transformation |
| work_case.oml | Business case data processing |

Quick Start

cd core/oml_examples
./run.sh

oml_examples (details)

This use case demonstrates a variety of "OML (Object Modeling Language) transformation" scenarios: a rich set of OML examples covering data transformation, field mapping, conditional matching, knowledge base queries, and other advanced features. Suitable for learning OML syntax and best practices.

Directory structure

core/oml_examples/
├── README.md                    # This document
├── run.sh                       # One-shot run script
├── conf/                        # Configuration directory
│   ├── wparse.toml             # Main WarpParse configuration
│   └── wpgen.toml              # Data generator configuration
├── models/                      # Rules and models
│   ├── oml/                    # OML transformation models
│   │   ├── csv_example.oml     # CSV processing with conditional matching
│   │   ├── skyeye_stat.oml     # System monitoring data transformation
│   │   └── work_case.oml       # Business case data processing
│   ├── knowledge/              # Knowledge base data
│   │   ├── knowdb.toml         # Main knowledge base configuration
│   │   ├── address/            # Address knowledge base
│   │   │   ├── create.sql      # Table creation statement
│   │   │   ├── data.csv        # Address data
│   │   │   └── insert.sql      # Insert statements
│   │   ├── example/            # Example knowledge base
│   │   │   ├── create.sql      # Table creation statement
│   │   │   ├── data.csv        # Example data
│   │   │   └── insert.sql      # Insert statements
│   │   └── example_score/      # Score knowledge base
│   │       ├── create.sql      # Table creation statement
│   │       ├── data.csv        # Score data
│   │       └── insert.sql      # Insert statements
│   ├── sinks/                  # Sink configuration
│   │   ├── defaults.toml       # Default settings
│   │   ├── business.d/         # Business routing
│   │   │   ├── csv_example.toml # CSV example output
│   │   │   ├── skyeye_stat.toml # System monitoring output
│   │   │   └── work_case.toml   # Business case output
│   │   └── infra.d/            # Infrastructure configuration
│   │       ├── default.toml    # Default sink
│   │       ├── error.toml      # Error-data handling
│   │       ├── miss.toml       # Missing-data handling
│   │       ├── monitor.toml    # Monitoring-data handling
│   │       └── residue.toml    # Residue-data handling
│   ├── sources/                # Source configuration (empty)
│   └── wpl/                    # WPL parsing rules (empty)
├── data/                        # Runtime data
│   ├── in_dat/                  # Input data
│   ├── out_dat/                 # Output data
│   │   ├── csv_example.dat     # CSV processing results
│   │   ├── skyeye_adm.json     # SkyEye ADM output
│   │   ├── skyeye_pdm.json     # SkyEye PDM output
│   │   └── work_case.json      # Business case output
│   └── logs/                    # Log files
│       ├── gen.dat             # Generated sample data
│       └── *.log               # Assorted log files
└── .run/                        # Runtime data directory

Quick start

Runtime requirements

  • WarpParse engine (must be on the system PATH)
  • Bash shell environment

Run commands

# Enter the use-case directory
cd core/oml_examples

# Run the full flow (generates 3000 test records by default)
./run.sh

Execution logic

Flow overview

The run.sh script performs the following major steps:

  1. Environment initialization

    • Keep the required configuration files (wparse.toml, wpgen.toml)
    • Clean up data from previous runs
    • Create the required directory structure
    • Initialize the knowledge base data
  2. Service check

    • Verify the configuration and models with wproj check
    • Make sure all dependencies are healthy
  3. Data cleanup

    • Empty the input/output data directories
    • Reset the log files
    • Clean the generator cache
  4. Sample data generation

    • Generate test data with wpgen sample
    • The data covers CSV, JSON, and other formats
    • Simulates realistic business data
  5. Data parsing

    • Start WarpParse in batch mode
    • Load the OML models to transform the data
    • Apply knowledge base queries to enrich the data
  6. Output verification

    • Count the records in each output file
    • Verify the transformations are correct
    • Check the knowledge base query results

Data flow

Generated data (data/logs/gen.dat)
    ↓
WarpParse OML engine
    ↓
┌─────────────┬─────────────┬─────────────┐
│ csv_example │ skyeye_stat │ work_case   │
│  processing │  transform  │   parsing   │
└─────────────┴─────────────┴─────────────┘
    ↓             ↓             ↓
┌─────────────┬─────────────┬─────────────┐
│csv_example  │skyeye_adm   │work_case    │
│    .dat     │  .json      │   .json     │
└─────────────┴─────────────┴─────────────┘
              └─────────────┘
                skyeye_pdm
                  .json

OML examples in detail

1. csv_example.oml - CSV processing with conditional matching

name: csv_example
rule : csv_example/*
---
occur_time  =  Time::now() ;
year  = take();
sid   = digit(10);

quart   = match  read(month) {
    in ( digit(1) , digit(3) )    => chars(Q1);
    in ( digit(4) , digit(6) )    => chars(Q2);
    in ( digit(7) , digit(9) )    => chars(Q3);
    in ( digit(10) , digit(12) )  => chars(Q4);
    _ => chars(Q5);
};

level  = match  ( read(city) , read(count) ) {
   ( chars(cs) , in ( digit(81) ,  digit(200) ) )    => chars(GOOD);
   ( chars(cs) , in ( digit(0) ,   digit(80) )  )   => chars(BAD);
   ( chars(bj) , in ( digit(101) , digit(200) ) )   => chars(GOOD);
   ( chars(bj) , in ( digit(0) ,   digit(100) ) )   => chars(BAD);
    _ => chars(NOR);
};

vender = match  read(product)  {
    chars(warp)   =>   chars(warp-rd)
    !chars(warp)   =>  chars(other)
};

severity: chars =  match take(option:[severity]) {
    digit(0) => chars(未知);
    digit(1) => chars(信息);
    digit(2) => chars(低危);
    digit(3) => chars(高危(漏洞));
};

math_score = select math  from example_score where id = read(sid) ;
*  = take() ;

Features demonstrated

  • Timestamp generation: Time::now() gets the current time
  • Range matching: in (digit(1), digit(3)) matches a numeric range
  • Tuple matching: (read(city), read(count)) matches multiple fields jointly
  • Negation matching: !chars(warp) excludes a specific value
  • Optional fields: take(option:[severity]) handles fields that may be absent
  • Knowledge base query: select ... from ... where queries data dynamically
  • Passthrough fields: * = take() keeps all otherwise unhandled fields

2. skyeye_stat.oml - System monitoring data transformation

name : skyeye_stat
rule : skyeye_stat/*
---
vendor      = chars(wparse) ;
v_ip        = ip(127.0.0.1) ;
recv_time   = Time::now() ;
cust_tag    = fmt("[{pos_sn}-{sip}]", @pos_sn, @sip ) ;

value  = object {
  process,memory_free : float =  take() ;
  cpu_free,cpu_used_by_one_min, cpu_used_by_fifty_min : float =  take() ;
  disk_free,disk_used,disk_used_by_one_min, disk_used_by_fify_min : float =  take() ;
} ;

time_all  = collect  take( keys : [ *time* ] ) ;
raw_msg  =  pipe take() | base64_en ;

Features demonstrated

  • Constant assignment: chars(wparse) and ip(127.0.0.1) fixed values
  • String formatting: fmt() template strings
  • Nested objects: object { ... } creates structured data
  • Field collection: collect take(keys: [*time*]) gathers fields dynamically
  • Pipeline processing: pipe ... | base64_en streams data through transformations

3. work_case.oml - Business case data processing

name : work_case
rule : work_case/*
---
agent_id   = take() ;
symbol     = take() ;
botnet     = take() ;

# Conditional branching
symbol: chars = match read(symbol) {
    chars(web_cve) => chars(Web漏洞);
    chars(os_cve)  => chars(系统漏洞);
    _              => read(symbol);
};

# Dynamic path collection
path_list = collect read(keys: [details*path]) ;

# Pass through remaining fields
* = take() ;

Features demonstrated

  • Simple field extraction: basic data extraction
  • Conditional replacement: value-based field mapping
  • Wildcard collection: [details*path] fuzzy-match collection
  • Remaining-field handling: keep all fields not explicitly defined

Configuration

Main configuration file (conf/wparse.toml)

version = "1.0"
robust = "normal"

[models]
wpl = "./models/wpl"
oml = "./models/oml"
sources = "./models/sources"
sinks = "./models/sinks"

[performance]
rate_limit_rps = 500000   # High rate limit, suitable for batch processing
parse_workers = 2         # Number of parse worker threads

[rescue]
path = "./data/rescue"

[log_conf]
level = "warn,ctrl=info,launch=info,source=info,sink=info,stat=info,runtime=warn,oml=warn,wpl=warn,klib=warn,orion_error=error,orion_sens=warn"
output = "File"

[stat]
window_sec = 60

Data generator configuration (conf/wpgen.toml)

[generator]
mode = "sample"          # Use predefined samples
count = 1000            # Number of records to generate
speed = 1000            # Generation speed (records/second)
parallel = 1            # Parallelism

[output]
connect = "file_raw_sink"

[output.params]
base = "data/in_dat"
file = "gen.dat"

[log_conf]
level = "info"
output = "File"

Knowledge base configuration (models/knowledge/knowdb.toml)

version = "2.0"

[default]
transaction = true       # Enable transactions
batch_size = 2000       # Batch size

[csv]
has_header = true       # CSV files include a header row
delimiter = ","         # Delimiter
encoding = "utf-8"      # Encoding

[table.example_score]
mapping = "header"      # Map columns by header
range = "5,110"         # Data row range

[table.address]
mapping = "index"       # Map columns by index
range = "5,110"         # Data row range

Business sink configuration example (models/sinks/business.d/csv_example.toml)

version = "2.0"

[sink]
name = "csv_example"
type = "file_csv_sink"
connect = "csv_sink"

[sink.params]
base = "data/out_dat"
file = "csv_example.dat"

[sink.expect]
basis = "total_input"
ratio = 0.7              # Expect 70% of the data to reach this sink
deviation = 0.01         # Allow 1% deviation

Verification

Verifying a successful run

  1. Output file statistics

    wproj data stat
    
  2. Validate data completeness

    wproj data validate
    
  3. Inspect sample output

    # CSV output
    head -20 data/out_dat/csv_example.dat
    
    # JSON output
    jq . data/out_dat/skyeye_adm.json | head -40
    

Using different output formats

Configure different sink types:

  • JSON output: type = "file_json_sink"
  • CSV output: type = "file_csv_sink"
  • KV output: type = "file_kv_sink"
  • Raw output: type = "file_raw_sink"

Last updated: 2025-12-16

Prometheus Metrics

This example demonstrates Prometheus metrics export for monitoring system integration and performance observation.

Purpose

Validate the ability to:

  • Export internal WarpParse metrics via Prometheus sink
  • Expose HTTP /metrics endpoint for scraping
  • Support Grafana integration for visualization

Features Validated

| Feature | Description |
|---------|-------------|
| Prometheus Sink | Exporting metrics via the Prometheus protocol |
| HTTP Endpoint | /metrics endpoint on a configurable port |
| Counter Metrics | Input/output counts, error counts |
| Gauge Metrics | Queue depth, active connections |
| Histogram Metrics | Parse duration, processing latency |

Metrics Types

| Type | Examples |
|------|----------|
| Counter | wparse_input_total, wparse_output_total |
| Gauge | Queue depth, active connections |
| Histogram | wparse_parse_duration_seconds |

Quick Start

cd core/prometheus_metrics
./run.sh

# Fetch metrics
curl -s http://localhost:35666/metrics
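To spot-check a single series instead of reading the full dump (metric names from the table above):

curl -s http://localhost:35666/metrics | grep wparse_input_total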

Grafana Integration

# Input rate
rate(wparse_input_total[1m])

# Output distribution by sink
sum by (sink) (wparse_output_total)

# P99 latency
histogram_quantile(0.99, wparse_parse_duration_seconds_bucket)

prometheus_metrics (details)

This use case demonstrates "Prometheus metrics export": warp-flow's internal runtime metrics are exported through a Prometheus sink and can be scraped from an HTTP /metrics endpoint. Suited to monitoring-system integration and performance observation.

Directory structure

  • conf/: configuration directory
    • conf/wparse.toml: main configuration (includes the Prometheus export settings)
  • models/: rules and routing
    • models/wpl/: WPL parsing rules
    • models/oml/: OML transformation models
    • models/sinks/business.d/: business routing
    • models/sinks/infra.d/: infrastructure groups (includes the monitor group)
    • models/sources/wpsrc.toml: source configuration (UDP syslog)
  • data/: runtime data directory

Prometheus configuration

Monitor group configuration

Configure the Prometheus sink in models/sinks/infra.d/monitor.toml:

[[sink_group]]
name = "/sink/infra/monitor"
connect = "prometheus_sink"

[[sink_group.sinks]]
name = "prometheus_exporter"

Prometheus connector

Add the Prometheus connector under connectors/sink.d/:

[connector]
id = "prometheus_sink"
type = "prometheus"

[connector.params]
endpoint = "127.0.0.1:35666"

Quick usage

Build the project

cargo build --workspace --all-features

Run the use case

cd core/prometheus_metrics
./run.sh

The script performs the following steps:

  1. Initialize the environment and configuration
  2. Start wparse (syslog reception + Prometheus export)
  3. Wait for the service to come up (about 3 seconds)
  4. Generate and send sample data with wpgen
  5. Stop the service and validate the output

Manual execution

# Initialize the configuration
wproj data clean

# Start the parsing service (in the background)
wparse daemon --stat 2 -p &

# Wait for the service to come up
sleep 3

# Generate and send samples
wpgen sample -n 1000 --stat 1 -p

# Pull the Prometheus metrics
curl -s http://localhost:35666/metrics

# Stop the service
kill $(cat ./.run/wparse.pid)

# Validate the output
wproj data stat
wproj data validate --input-cnt 1000

Tunable parameters

  • LINE_CNT: number of lines to generate (default 1000)
  • STAT_SEC: statistics interval in seconds (default 2)

Prometheus metrics

Sample metrics

# HELP wparse_input_total Total number of input records
# TYPE wparse_input_total counter
wparse_input_total 1000

# HELP wparse_output_total Total number of output records by sink
# TYPE wparse_output_total counter
wparse_output_total{sink="benchmark"} 950
wparse_output_total{sink="default"} 50

# HELP wparse_parse_duration_seconds Parse duration histogram
# TYPE wparse_parse_duration_seconds histogram
wparse_parse_duration_seconds_bucket{le="0.001"} 800
wparse_parse_duration_seconds_bucket{le="0.01"} 990
wparse_parse_duration_seconds_bucket{le="+Inf"} 1000

Metric types

  • Counter: input/output counts, error counts
  • Gauge: queue depth, active connections
  • Histogram: parse latency, processing duration

Grafana integration

Adding the data source

  1. Add a Prometheus data source in Grafana
  2. URL: http://localhost:35666

Sample queries

# Input rate
rate(wparse_input_total[1m])

# Output distribution
sum by (sink) (wparse_output_total)

# P99 latency
histogram_quantile(0.99, wparse_parse_duration_seconds_bucket)

Common problems

Q1: The Prometheus endpoint does not respond

  • Confirm Prometheus export is enabled in conf/wparse.toml
  • Confirm the monitor group connector is set to prometheus_sink
  • Check whether the port is in use: lsof -i :35666

Q2: Metrics are empty

  • Confirm wparse has received data
  • Check data/logs/wparse.log for errors
  • Confirm the sink routing is correct

Q3: Metrics are delayed

  • Prometheus metrics typically lag collection by a few seconds
  • Make sure scrape_interval is set to a reasonable value


Sink Filter

This example demonstrates sink-level filtering and data splitting based on business rules.

Purpose

Validate the ability to:

  • Route data to different sink paths (all/safe/residue) based on filter rules
  • Configure filter expressions for conditional routing
  • Validate output ratios using defaults.expect and per-sink expect settings
  • Ensure filter logic correctness and expected data distribution

Features Validated

| Feature | Description |
|---------|-------------|
| Filter Rules | Conditional data routing via filter.conf |
| Multi-path Routing | Splitting data to all/safe/residue paths |
| Expectation Validation | Ratio validation with expect settings |
| Default Expectations | Group-level defaults in defaults.toml |
| Tolerance Settings | ratio, tol, min, max constraints |

Filter Configuration

# sink/defaults.toml
[defaults.expect]
basis = "total_input"  # Denominator for ratio calculation
min_samples = 1
mode = "warn"

Quick Start

cd core/sink_filter
./run.sh

# Validate output ratios
wproj validate sink-file -v

sink_filter (details)

This use case demonstrates "per-sink filtering and splitting": input samples are dispatched to different sink paths (all/safe/residue, etc.) according to business rules, and the output ratios are validated offline via defaults.expect and per-sink expect settings. Suited to verifying filter correctness and whether the residue/error path ratios match expectations.

Directory structure

  • conf/wparse.toml (main configuration), wpsrc.toml (v2 source configuration)
  • connectors/source.d/: file source connectors (add more as needed)
  • sink/ (serves as the sink_root)
    • infra.d/: infrastructure groups (default/miss/residue/intercept/error/monitor)
    • business.d/: filter-style business routing (example: filter/*.toml)
    • defaults.toml: default expectations [defaults.expect]
  • wpl/ and oml/: rules and object models
  • data/: runtime output (data/out_dat/, data/logs/, etc.; created on initialization)
  • case_verify.sh: end-to-end verification script

Default expectations (defaults.expect)

Set the default group-level expectations in sink/defaults.toml (example):

[defaults.expect]
basis = "total_input"  # Validate ratios against total input as the denominator
min_samples = 1
mode = "warn"

  • The fixed groups (default/miss/residue/intercept/error) and the monitor group inherit this default unless they explicitly set [group.expect].
  • A group that needs custom expectations can declare [group.expect] to override the default.
  • Per-sink constraints are configured under [group.sinks.expect]:
    • Target band: ratio + tol (meaning ratio ± tol)
    • Bounds: min/max

Filter-style business group (example)

  • Business group definition: sink/filter/sink.toml
  • Filter rules: sink/filter/filter.conf (match conditions)
  • Common practice:
    • The main path sink (e.g. all.dat) sets no ratio; only set max upper bounds on the other paths, or use sum_tol to constrain the sum of several ratios
    • The safe path sink (e.g. safe.dat) sets a relatively high min to ensure most samples take the safe path
    • The error/residue paths set max to avoid excessive ratios

Quick usage

Build from the repository root:

cargo build --workspace --all-features

Collect statistics and validate in the use-case directory:

cd usecase/core/sink_filter
# Source and sink statistics (text/JSON)
../../../target/debug/wproj stat file
../../../target/debug/wproj stat file --json

# Offline validation (text/JSON)
../../../target/debug/wproj validate sink-file
../../../target/debug/wproj validate sink-file --json

Validation warning policy (WARN)

  • When a group's denominator is 0 (no samples) or below min_samples, validation ignores that group but prints a WARN; only ERROR/PANIC cause a FAIL.

End-to-end script (optional)

To run the full generate/filter/validate flow:

./case_verify.sh

The script builds, generates samples, starts the service, and validates. If you only need offline validation and statistics, use wproj stat/validate directly.


Tip: when adding business groups, maintain [defaults.expect] centrally in sink/defaults.toml and override group-level expect only where a group genuinely differs from the default; put per-sink ratio constraints under [group.sinks.expect].

Sink Recovery

This example demonstrates sink failure handling and data recovery workflows.

Purpose

Validate the ability to:

  • Handle sink write failures gracefully
  • Store failed data in rescue files (rescue/ directory)
  • Recover and replay rescue files to original target sinks
  • Manage .lock and .dat file lifecycle

Features Validated

| Feature | Description |
|---------|-------------|
| Rescue File Generation | Creating <sink>-YYYY-MM-DD_HH:MM:SS.dat.lock on failure |
| File Lock Management | .lock suffix during write, removed on completion |
| Recovery Daemon | wprescue daemon for replaying rescue files |
| Checkpoint Tracking | rescue/recover.lock for resumption |
| Ordered Replay | Time-sorted file processing |

Recovery Workflow

  1. Interruption Phase: Failed writes create rescue files
  2. Recovery Phase: wprescue daemon replays rescue files to sinks
  3. Cleanup: Successfully processed files are deleted
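A quick way to see what is eligible for replay (per the lock semantics above, only .dat files count; .lock files are still being written):

ls data/rescue/*.dat 2>/dev/null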

Quick Start

cd core/sink_recovery

# Phase 1: Generate rescue files (interrupt simulation)
./case_interrupt.sh

# Phase 2: Recover and replay
./case_recovery.sh

# Validate
wproj data stat
wproj validate sink-file -v

Interruption and recovery (details)

This use case demonstrates the sink interruption and recovery flow: when writes to a business sink fail, the data falls into the rescue/ directory; a recovery pass (wprescue daemon) then replays the rescue files to the original target sink.

Key points:

  • The business sink uses test_rescue as its backend, which toggles availability periodically (roughly every 2 seconds).
  • Rescue files produced during the interruption phase are named <sink_name>-YYYY-MM-DD_HH:MM:SS.dat.lock; only once the handle is released and the .lock suffix is dropped can a file participate in recovery.
  • The recovery phase scans rescue/*.dat (excluding .lock), replays the newest file in time order, and deletes it on success.

Directory structure (key items)

  • usecase/core/sink_recovery/conf/wparse.toml: working directory, rate, logging, command channel, and other base configuration (rescue_root = "./data/rescue").
  • usecase/core/sink_recovery/conf/wpsrc.toml: file source (v2) reading ./data/in_dat/gen.dat (default).
  • usecase/core/sink_recovery/sink/infra.d/: infrastructure groups (default/miss/residue/intercept/error/monitor).
  • usecase/core/sink_recovery/sink/business.d/benchmark.toml: the benchmark business group targeting test_rescue, used to trigger interruptions and rescue files.
  • usecase/core/sink_recovery/case_interrupt.sh: interruption-phase e2e script (generate data -> start parsing -> watch rescue).
  • usecase/core/sink_recovery/case_recovery.sh: recovery-phase e2e script (start wprescue work -> replay rescue -> validate).

Quick start

  1. Interruption phase (produce rescue files)
  • Enter the use-case directory and run:
    usecase/core/sink_recovery/case_interrupt.sh
    
  • The script will:
    • Pre-build and initialize the configuration;
    • Generate 10000 sample lines into ./data/in_dat/gen.dat via wpgen sample;
    • Start wparse daemon; the benchmark group uses the test_rescue backend, whose periodic interruptions trigger rescue writes;
    • On exit, print the .dat files under rescue/ and a wproj stat file summary.
  • Expectation: at least one benchmark_file_sink-*.dat file appears under rescue/.
  2. Recovery phase (replay rescue files)
  • In the same directory, run:
    usecase/core/sink_recovery/case_recovery.sh
    
  • The script will:
    • Start wprescue daemon --stat 100 in recovery mode;
    • Wait briefly, then send a USR1 signal for a graceful shutdown;
    • List rescue/ and print wproj stat file plus the consistency check wproj validate sink-file -v.
  • Expectations:
    • The .dat files in rescue/ are consumed and deleted;
    • The corresponding sinks v2 file counts increase (e.g. data/out_dat/benchmark.dat, data/out_dat/default.dat, etc.).

Reference log (logs/wprescue.log):

recover begin
recover file: ./data/rescue/benchmark_file_sink-2025-10-04_01:59:24.dat
recover begin! file : ./data/rescue/benchmark_file_sink-2025-10-04_01:59:24.dat
recover end! clean file : ./data/rescue/benchmark_file_sink-2025-10-04_01:59:24.dat
recover end

How it works

  • Interrupted writes and rescue files

    • The business sink backend is test_rescue (see usecase/core/sink_recovery/sink/benchmark/sink.toml), which toggles health through a proxy on a timer;
    • On write failure, SinkRuntime switches to the backup writer (rescue), creates rescue/<sink>-YYYY-MM-DD_HH:MM:SS.dat.lock, and renames it to .dat once the handle is released;
    • Relevant implementation:
      • Backup switching: src/sinks/runtime/manager.rs:120 (use_back_sink/swap_back);
      • File lock/unlock: src/sinks/backends/file.rs (the .lock suffix is removed on Drop/stop).
  • Recovery replay

    • ActCovPicker periodically scans rescue/*.dat and picks the newest file by the timestamp in its name;
    • The sink name is parsed from the file-name prefix (get_sink_name), and the matching sink channel is found via SinkRouteAgent.get_sink_agent;
    • Each line is sent to that sink as Raw data; database backends (MySQL/ClickHouse/Elasticsearch) go through to_tdc (currently a TODO example);
    • Once a file has been read successfully, the .dat file is deleted and the checkpoint is updated (resume point recorded in rescue/recover.lock).
    • Relevant implementation: src/services/collector/recovery/mod.rs.

Troubleshooting

  • Statistics show 0 or nothing is written:
    • Confirm rescue/ contains .dat (not .lock) files;
    • Confirm wprescue daemon runs with the same working directory as the use case (rescue_root in conf/wparse.toml is ./data/rescue);
    • Confirm the business sink name matches the rescue-file prefix (benchmark_file_sink-*.dat corresponds to [[sink_group.sinks]].name = "benchmark_file_sink");
    • For recovery-flow detail, grep logs/wprescue.log for the recover keyword.

Tunable parameters

  • conf/wparse.toml:
    • speed_limit controls the recovery read rate (maximum lines per second).
    • rescue_root controls the rescue directory.
  • Script environment variables:
    • LINE_CNT and STAT_SEC can be overridden via exports (see the defaults in the scripts).

Related files and commands

  • Run scripts:
    • usecase/core/sink_recovery/case_interrupt.sh
    • usecase/core/sink_recovery/case_recovery.sh
  • Key log: usecase/core/sink_recovery/logs/wprescue.log
  • Verification tools:
    • wproj stat file counts sink output lines
    • wproj validate sink-file -v validates the expectation configuration/ratios

Test completeness and robustness recommendations

To keep the recovery pipeline stable across environments and edge cases, consider adding the following cases and checks:

  • Scenario coverage

    • Ordered replay of multiple rescue files: create several <sink>-YYYY-MM-DD_HH:MM:SS.dat files under rescue/ and confirm that only the newest is replayed in time order, and that the file is deleted on success.
    • .lock and .dat coexisting: confirm .lock files are ignored and only .dat files participate in recovery; a .lock left behind by a force-killed writer process must not block recovery.
    • Empty files/blank lines: recovery currently sends line by line; keep rescue files free of blank lines (consistent with the code comments), and have the tooling explicitly skip or warn on blank lines.
    • Name mismatch: when a file prefix does not match a sink name (as parsed by get_sink_name), it should be logged as an error and skipped; add this negative case.
  • Checkpoint resumption and idempotency

    • Abort wprescue work midway (e.g. kill -USR1 or SIGINT); on restart it should resume from the rescue/recover.lock checkpoint without replaying already-processed files.
    • Running case_recovery.sh repeatedly must not grow the target sink's line count without bound (no duplicate consumption).
  • Performance and load

    • Vary speed_limit in conf/wparse.toml: test low (e.g. 10), high (e.g. 1e6), and default values; watch throughput, CPU, and I/O.
    • Large-file recovery: prepare rescue files of 10^5 to 10^6 lines and verify memory usage, metric emission, and timely deletion of finished files.
  • Fault injection and recovery

    • Shift the test_rescue phases: lengthen or shorten the toggle period and check that backup switching and ActMaintainer reconnection behave as expected (warn_sink! reconnect logs).
    • Target sink briefly unavailable: cut writes during recovery (e.g. read-only file permissions / unwritable directory) and confirm failure accounting, retries, and the final fallback match the robustness policy (Throw/Tolerant/FixRetry, etc.).
  • Expectation validation (expect)

    • Set min_samples/sum_tol/others_max and similar parameters under [defaults.expect] in sink/defaults.toml, and check that wproj validate sink-file -v produces clear evidence (denom/ratio/lines).
    • Configure [[sinks]].expect (e.g. ratio/tol) on an individual sink in the business sink.toml and verify the actual ratio falls within tolerance.
  • Observability and logs

    • Temporarily raise log_conf.level to debug and grep for recover begin|recover file|recover end|reconnect success to build a triage baseline.
    • Check that the monitor metrics change with recovery progress (SinkStat/pick_stat).
  • Compatibility and paths

    • Confirm rescue_root, sink_root, out/, and other directories have no permission or path-separator issues across platforms (Linux/macOS).
    • Keep business sink names strictly consistent with rescue-file prefixes (e.g. benchmark_file_sink) to avoid routing failures.
  • Follow-up improvements (suggested)

    • to_tdc (TDC conversion for database backends) is currently a TODO; once implemented, add unit/integration tests for the SQL/bulk-write logic.
    • Expose the test_rescue phase durations as environment variables so CI can construct deterministic timing.
    • Run case_interrupt.sh and case_recovery.sh serially in CI and collect wprescue.log and the wproj validate results as artifacts.

TCP Roundtrip

This example demonstrates TCP input/output end-to-end data flow.

Purpose

Validate the ability to:

  • Push data via TCP sink from generator
  • Receive data via TCP source in parser
  • Process and output to file sinks
  • Verify data integrity through the TCP pipeline

Features Validated

| Feature | Description |
|---------|-------------|
| TCP Sink | Pushing data to a TCP endpoint |
| TCP Source | Receiving data from a TCP port |
| End-to-End Flow | Complete data path validation |
| Data Integrity | Input/output count verification |

Quick Start

cd core/tcp_roundtrip
./run.sh

Steps

  1. Start the parser
wparse daemon --stat 5
  2. Generate data (push to TCP)
wpgen sample -n 10000 --stat 5
  3. Stop and validate
wproj stat sink-file
wproj validate sink-file -v --input-cnt 10000

TCP Roundtrip (details)

Goal: demonstrate a generic TCP input/output end-to-end link.

  • wpgen: pushes sample data to a local port via connect = "tcp_sink"
  • wparse: enables the tcp_src source listening on the same port and lands the data in file sinks

Steps

  1. Start the parser
wparse daemon --stat 5
  2. Generate data (push to TCP)
wpgen sample -n 10000 --stat 5
  3. Stop the parser and validate
wproj stat sink-file
wproj validate sink-file -v --input-cnt 10000

Key files

  • conf/wparse.toml: main configuration (sources/sinks/model paths)
  • models/sources/wpsrc.toml: source list (includes tcp_src)
  • conf/wpgen.toml: generator configuration (outputs via tcp_sink to a local port)

WPL Missing

This example demonstrates WPL field missing tolerance and fault handling.

Purpose

Validate the ability to:

  • Handle missing required fields in input data
  • Process optional fields with \| syntax
  • Route incomplete data to the miss infrastructure group
  • Validate WPL rule fault tolerance

Features Validated

| Feature | Description |
|---------|-------------|
| Optional Field Syntax | Using \| to mark optional fields |
| Miss Group Routing | Routing records with missing required fields |
| Fault Tolerance | Continuing parsing when optional fields are absent |
| Data Completeness | Validating expected miss/success ratios |

Field Handling

| Field Type | Behavior When Missing |
|------------|-----------------------|
| Required (no \|) | Record routes to miss group |
| Optional (with \|) | Parsing continues, field is empty |
| Parse Error | Record routes to residue or error group |

WPL Syntax Example

package /benchmark {
    rule benchmark_1 {
        (digit:id, digit:len, time, sn, chars:dev_name\|, ...)
        # dev_name is optional (marked with \|)
    }
}

Quick Start

cd core/wpl_missing
./run.sh
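After the run, you can check where incomplete records ended up (the miss sink path follows the common directory layout used across these cases):

wc -l data/out_dat/miss.dat   # records whose required fields were missing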

wpl_missing (details)

This use case demonstrates "WPL missing-field tolerance": how the system handles fields that are absent or fail to parse, and routes such data to the corresponding infrastructure group (miss). Suited to verifying WPL rule fault tolerance and data-completeness validation.

Directory structure

  • conf/: configuration directory
    • conf/wparse.toml: main configuration
  • models/: rules and routing
    • models/wpl/benchmark/: WPL parsing rules
      • parse.wpl: parsing rules
      • gen_rule.wpl: generation rules (samples with missing fields)
    • models/sinks/business.d/: business routing
    • models/sinks/infra.d/: infrastructure groups (includes the miss group)
    • models/sources/wpsrc.toml: source configuration
  • data/: runtime data directory

WPL fault tolerance

Optional field syntax

In WPL rules, mark optional fields with \|:

package /benchmark {
    rule benchmark_1 {
        (digit:id, digit:len, time, sn, chars:dev_name\|, ...)
    }
}

Missing-field handling

  • Required field missing: the whole record is routed to the miss infrastructure group
  • Optional field missing: parsing continues and the missing field is left empty
  • Parse failure: the record is routed to the residue or error infrastructure group

Parsing rule example

parse.wpl

package /benchmark {
    rule benchmark_1 {
        (digit:id, digit:len, time, sn, chars:dev_name, time, kv, sn,
         chars:dev_name, time, time, ip, kv, chars, kv, kv, chars, kv, kv,
         chars, chars, ip, chars, http/request<[,]>, http/agent")\,
    }
    rule benchmark_2 {
        (ip:src_ip, digit:port, chars:dev_name, ip:dst_ip, digit:port,
         time", kv, kv, sn, kv, ip, kv, chars, kv, sn, kv, kv, time, chars,
         time, sn, kv, chars, chars, ip, chars, http/request", http/agent")\,
    }
}

Quick usage

Build the project

cargo build --workspace --all-features

Run the use case

cd core/wpl_missing
./run.sh

The script performs the following steps:

  1. Initialize the environment and configuration
  2. Generate sample data with missing fields using wpgen
  3. Run wparse batch parsing
  4. Validate the output statistics (verify the miss group received data)

Manual execution

# Initialize the configuration
wproj check
wproj data clean
wpgen data clean

# Generate sample data
wpgen sample -n 1000

# Run batch parsing
wparse batch --stat 2 -p

# Validate the output
wproj data stat
wproj data validate

Tunable parameters

  • LINE_CNT: number of lines to generate (default 1000)
  • STAT_SEC: statistics interval in seconds (default 2)

Infrastructure groups

| Group | Purpose | Expected behavior |
|-------|---------|-------------------|
| default | Default output | Data not routed to any business group |
| miss | Missing fields | Data whose required fields are missing |
| residue | Residue data | Data that only partially parsed |
| error | Error data | Data whose processing failed |
| monitor | Monitoring metrics | System metric output |

Expectation configuration

Set the expectations in models/sinks/defaults.toml:

[defaults.expect]
basis = "total_input"
min_samples = 1
mode = "warn"

For the miss group, a sensible upper bound can be set:

# models/sinks/infra.d/miss.toml
[[sink_group]]
name = "/sink/infra/miss"

[sink_group.expect]
max = 0.1  # No more than 10% missing data

Common problems

Q1: Too much data in the miss group

  • Check whether the WPL rules match the input data format
  • Confirm the required fields actually exist in the input data
  • Consider making some fields optional (add the \| marker)

Q2: How to distinguish miss from residue

  • miss: a required field is absent, so the rule cannot match
  • residue: the rule partially matches but leftover content remains

Q3: Default values for optional fields

  • A missing optional field defaults to an empty value
  • Handle it in OML with take(option:[field])


WPL Pipe

This example demonstrates WPL pipeline preprocessing for handling encoded or escaped data before parsing.

Purpose

Validate the ability to:

  • Preprocess input data with pipeline operations before parsing
  • Decode Base64 encoded content
  • Handle quoted and escaped strings
  • Chain multiple preprocessing operations

Features Validated

| Feature | Description |
|---------|-------------|
| Pipeline Syntax | The \|operation\| prefix for preprocessing |
| Base64 Decoding | The decode/base64 operation |
| Unquote/Unescape | unquote/unescape for quoted strings |
| Trim | trim for whitespace removal |
| Operation Chaining | Left-to-right operation execution |

Pipeline Operations

| Operation | Input | Output |
|-----------|-------|--------|
| decode/base64 | eyJhIjogMX0= | {"a": 1} |
| unquote/unescape | "{ \"a\": 1 }" | { "a": 1 } |
| trim | " data " | data |
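The Base64 step behaves like the familiar shell tool, e.g. using the sample value from the table (GNU coreutils syntax; older BSD/macOS base64 uses -D):

echo 'eyJhIjogMX0=' | base64 -d   # prints {"a": 1}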

WPL Syntax Example

package /pipe_demo {
    rule fmt_from_base64 {
        // 1) Base64 decode
        // 2) Unquote and unescape
        // 3) Parse JSON
        |decode/base64|unquote/unescape|(json(_@_origin))
    }
}

Quick Start

cd core/wpl_pipe
./run.sh

wpl_pipe (details)

This use case demonstrates "WPL pipeline preprocessing": before the actual parse, pipeline operations preprocess the input (e.g. Base64 decoding, quote/escape restoration), after which JSON or other formats are parsed. Suited to logs with encoded or escaped payloads.

Directory structure

  • conf/: configuration directory
    • conf/wparse.toml: main configuration
  • models/: rules and routing
    • models/wpl/pipe_demo/: pipeline processing rules
      • parse.wpl: parsing rules (with pipeline preprocessing)
    • models/sinks/business.d/: business routing
    • models/sinks/infra.d/: infrastructure groups
    • models/sources/wpsrc.toml: source configuration
  • data/: runtime data directory

WPL pipeline syntax

Pipeline operators

Define the preprocessing pipeline at the start of a rule with |operation|:

rule example {
    |operation1|operation2|...|( parse rule )
}

Common pipeline operations

  • decode/base64: Base64 decoding
  • unquote/unescape: strip the outer quotes and restore escaped characters
  • trim: strip leading and trailing whitespace

Parsing rule example

parse.wpl

package /pipe_demo {
   #[copy_raw(name : "_origin")]
   rule fmt_from_quote {
        // Input: "{ \"a\": 1, \"b\": \"foo\" }"
        // 1) Strip the outer quotes and restore the escaped inner quotes
        // 2) Parse the root-level JSON into fields
        |unquote/unescape|(json(_@_origin))
   }

   #[copy_raw(name : "_origin")]
    rule fmt_from_base64 {
        // Input: base64("{ \"a\": 2, \"b\": \"bar\" }")
        // 1) Base64 decode
        // 2) Strip the outer quotes and restore escapes
        // 3) Parse the JSON into fields
        |decode/base64|unquote/unescape|(json(_@_origin))
    }
}

Processing flow

  1. Input: "eyJhIjogMX0=" (Base64-encoded {"a": 1})
  2. decode/base64: {"a": 1}
  3. unquote/unescape: {"a": 1} (outer quotes removed if present)
  4. json parse: extracts a=1

Quick usage

Build the project

cargo build --workspace --all-features

Run the use case

cd core/wpl_pipe
./run.sh

The script performs the following steps:

  1. Initialize the environment and configuration
  2. Generate Base64/escaped-format sample data
  3. Run wparse batch parsing
  4. Validate the output statistics

Manual execution

# Initialize the configuration
wproj check
wproj data clean
wpgen data clean

# Generate sample data (Base64 + quote-escaped JSON)
wpgen sample -n 1000 --stat 2

# Run batch parsing
wparse batch --stat 2 -S 1 -p -n 1000

# Validate the output
wproj data stat
wproj data validate

Tunable parameters

  • LINE_CNT: number of lines to generate (default 1000)
  • STAT_SEC: statistics interval in seconds (default 2)

Pipeline operations in detail

decode/base64

Decodes a Base64-encoded string back to its original content:

Input: eyJhIjogMX0=
Output: {"a": 1}

unquote/unescape

Handles quoted and escaped strings:

Input: "{ \"a\": 1, \"b\": \"hello\" }"
Output: { "a": 1, "b": "hello" }

Escape handling:

  • \" → "
  • \\ → \
  • \n → newline
  • \t → tab

Combined usage

Multiple pipeline operations run from left to right:

|decode/base64|unquote/unescape|(...)

Annotations

copy_raw

The #[copy_raw(name : "_origin")] annotation preserves the original input:

  • Copies the raw input into the _origin field
  • Useful for later auditing or debugging

Common problems

Q1: Base64 decoding fails

  • Confirm the input is valid Base64
  • Check for URL-safe Base64 (which needs a different decoder)
  • Check data/logs/wparse.log for errors

Q2: Escape restoration is incomplete

  • Confirm the escape format matches what unquote/unescape supports
  • Some special escapes may need custom handling

Q3: Wrong pipeline order

  • Pipelines run from left to right
  • Decode first (decode), then remove quotes (unquote)

WPL Success

This example validates successful full-chain WPL parsing with multiple security alert log formats.

Purpose

Validate the ability to:

  • Parse multiple security alert types successfully
  • Apply data_type tags via rule annotations
  • Route parsed data to appropriate business sinks
  • Achieve high parsing success rates

Features Validated

| Feature | Description |
|---------|-------------|
| Multi-Rule Parsing | Parsing 6+ security alert types |
| Rule Annotations | #[tag(data_type: "...")] for tagging |
| Data Type Routing | Routing based on data_type tags |
| Success Rate Validation | Verifying high parsing success rates |

Supported Alert Types

| Alert Type | data_type Tag | Description |
|------------|----------------|-------------|
| webids_alert | webids_alert | Web intrusion detection alerts |
| webshell_alert | webshell_alert | Webshell detection alerts |
| ips_alert | ips_alert | Intrusion prevention system alerts |
| ioc_alert | ioc_alert | Threat intelligence alerts |
| system | system | System logs |
| audit | audit | Audit logs |

WPL Syntax Example

package /qty {
    #[tag(data_type: "webids_alert")]
    rule webids_alert {
        (symbol(webids_alert), chars:serialno, chars:rule_id, ...)
    }

    #[tag(data_type: "audit")]
    rule audit {
        (kv(chars@username), kv(chars@serialno), ...)
    }
}

Quick Start

cd core/wpl_success
./run.sh

wpl_success (中文)

本用例演示“WPL 成功解析全链路”的场景:验证 WPL 规则能够成功解析多种安全告警日志格式(webids_alert、webshell_alert、ips_alert、ioc_alert、system、audit 等),并正确路由到业务 sink。适用于验证解析规则的正确性与完整性。

目录结构

  • conf/:配置目录
    • conf/wparse.toml:主配置
  • models/:规则与路由
    • models/wpl/qty_alert/:安全告警解析规则
      • parse.wpl:多种告警类型的解析规则
    • models/oml/:OML 转换模型
    • models/sinks/business.d/:业务路由
    • models/sinks/infra.d/:基础组
    • models/sources/wpsrc.toml:源配置
  • data/:运行数据目录
  • out/:输出目录

支持的告警类型

解析规则 (models/wpl/qty_alert/parse.wpl)

本用例支持以下安全告警类型:

| 告警类型 | data_type 标签 | 说明 |
|----------|----------------|------|
| webids_alert | webids_alert | Web 入侵检测告警 |
| webshell_alert | webshell_alert | Webshell 检测告警 |
| ips_alert | ips_alert | 入侵防护系统告警 |
| ioc_alert | ioc_alert | 威胁情报告警 |
| system | system | 系统日志 |
| audit | audit | 审计日志 |

规则示例

package /qty {
    #[tag(data_type: "webids_alert")]
    rule webids_alert {
        (symbol(webids_alert), chars:serialno, chars:rule_id, chars:rule_name,
         time_timestamp:write_date, chars:vuln_type, ip:sip, digit:sport,
         ip:dip, digit:dport, digit:severity, chars:host, chars:parameter,
         chars:uri, chars:filename, chars:referer, chars:method, chars:vuln_desc,
         time:public_date, chars:vuln_harm, chars:solution, chars:confidence,
         chars:victim_type, chars:attack_flag, chars:attacker, chars:victim,
         digit:attack_result, chars:kill_chain, chars:code_language,
         time:loop_public_date, chars:rule_version, chars:xff, chars:vlan_id,
         chars:vxlan_id)\|\!
    }

    #[tag(data_type: "audit")]
    rule audit {
        (kv(chars@username), kv(chars@serialno), kv(chars@submod),
         kv(chars@detail), kv(time@updatetime), kv(ip@ip), kv(chars@sub2),
         kv(chars@log_type), kv(chars@module), kv(digit@sub_type))\|\!
    }
    // ... 更多规则
}

规则注解

  • #[tag(data_type: "...")]:为解析结果添加数据类型标签,用于后续路由分发
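
下面给出一个按 data_type 标签路由业务 sink 的最小示意(纯属假设:文件名、sink_group 名称与 rule 匹配路径均为举例,实际匹配语法与 connect 取值以本用例 models/sinks/business.d/ 下的配置为准):

# models/sinks/business.d/webids.toml(示意)
[[sink_group]]
name = "webids"
rule = ["/qty/webids_alert"]

[[sink_group.sinks]]
name = "main"
connect = "file_proto_sink"
params = { file = "webids.dat" }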

快速使用

构建项目

cargo build --workspace --all-features

运行用例

cd core/wpl_success
./run.sh

脚本执行流程:

  1. 初始化环境与配置
  2. 使用 wpgen 生成多种告警类型的样本数据
  3. 验证输入文件存在
  4. 运行 wparse 批处理解析
  5. 校验输出统计与期望

手动执行

# 初始化配置
wproj check
wproj data clean
wpgen data clean

# 生成样本数据
wpgen sample -n 3000 --stat 3

# 验证输入文件
test -s "./data/in_dat/gen.dat" || echo "missing input file"

# 运行批处理解析
wparse batch --stat 3 -p -n 3000

# 校验输出
wproj data stat
wproj data validate

可调参数

  • LINE_CNT:生成行数(默认 3000)
  • STAT_SEC:统计间隔秒数(默认 3)

期望配置

成功解析期望

本用例的目标是验证解析成功率,期望配置示例:

[defaults.expect]
basis = "total_input"
min_samples = 100
mode = "warn"

# 业务组期望高成功率
[sink_group.expect]
ratio = 0.95    # 期望 95% 的数据成功解析
tol = 0.05      # 容差 ±5%

基础组期望

# miss 组期望较低
[sink_group.expect]
max = 0.05      # 缺失数据不超过 5%

# error 组期望接近 0
[sink_group.expect]
max = 0.01      # 错误数据不超过 1%

字段提取说明

必需字段(无 \| 标记)

缺失时整条记录进入 miss 组

可选字段(带 \|! 标记)

缺失时记录继续解析,字段为空

KV 格式解析

kv(chars@username)  # 解析 key=value 格式,提取 username
kv(time@updatetime) # 解析 key=value 格式,提取时间类型的 updatetime
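
与之对应的输入行形如下例(字段值为假设的示意样本,并非真实数据):

username=alice serialno=SN-1001 submod=login detail=ok updatetime=2024-01-01T08:00:00 ip=10.0.0.8 sub2=a log_type=audit module=web sub_type=3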

常见问题

Q1: 解析成功率低

  • 检查样本数据格式是否与 WPL 规则匹配
  • 确认字段分隔符与规则定义一致
  • 查看 data/logs/wparse.log 中的解析错误

Q2: 某种告警类型未匹配

  • 确认告警类型前缀正确(如 symbol(webids_alert))
  • 检查字段数量与顺序是否一致

Q3: 数据标签未生效

  • 确认规则注解格式正确:#[tag(data_type: "...")]
  • 确认 OML/Sink 路由配置使用了对应标签

Extension Connectors

This directory contains extension connector examples demonstrating WarpParse integration with various external systems (databases, message queues, log storage, monitoring).

Case List

| Case | Purpose | Validated Features |
|------|---------|--------------------|
| doris | File Source → Doris Stream Load pipeline | Doris sink, Stream Load, batch processing |
| kafka | Kafka Source/Sink integration | Kafka consumer/producer, topic routing |
| practice | Real-world multi-source monitoring scenario | Multi-source collection, Fluent-bit, Kafka, VictoriaLogs, Grafana |
| tcp_mysql | TCP Source → MySQL Sink pipeline | TCP source, MySQL sink, data persistence |
| tcp_victorialogs | TCP Source → VictoriaLogs Sink pipeline | TCP source, VictoriaLogs sink, log storage |
| victoriametrics | Internal metrics push to VictoriaMetrics | VictoriaMetrics sink, metrics export, monitoring |

Common Structure

case_name/
├── conf/
│   ├── wparse.toml          # Engine configuration
│   └── wpgen.toml           # Data generator configuration
├── topology/
│   ├── sources/             # Source definitions
│   └── sinks/               # Sink definitions
├── models/
│   ├── wpl/                 # WPL parsing rules
│   └── oml/                 # OML transformation models
├── data/                    # Runtime data
├── docker-compose.yml       # Container orchestration
└── run.sh                   # Execution script

Quick Start

# Enter case directory
cd extensions/<case_name>

# Start dependent services
docker compose up -d

# Run the case
./run.sh

扩展连接器 (中文)

本目录包含扩展连接器示例,演示 WarpParse 与各种外部系统(数据库、消息队列、日志存储、监控)的集成。

用例清单

| 用例 | 目的 | 验证特性 |
|------|------|----------|
| doris | 文件源 → Doris Stream Load 管道 | Doris sink、Stream Load、批处理 |
| kafka | Kafka 源/汇集成 | Kafka 消费者/生产者、topic 路由 |
| practice | 实战多源监控场景 | 多源采集、Fluent-bit、Kafka、VictoriaLogs、Grafana |
| tcp_mysql | TCP 源 → MySQL Sink 管道 | TCP 源、MySQL sink、数据持久化 |
| tcp_victorialogs | TCP 源 → VictoriaLogs Sink 管道 | TCP 源、VictoriaLogs sink、日志存储 |
| victoriametrics | 内部指标推送到 VictoriaMetrics | VictoriaMetrics sink、指标导出、监控 |

通用结构

case_name/
├── conf/
│   ├── wparse.toml          # 引擎配置
│   └── wpgen.toml           # 数据生成器配置
├── topology/
│   ├── sources/             # 源定义
│   └── sinks/               # 汇定义
├── models/
│   ├── wpl/                 # WPL 解析规则
│   └── oml/                 # OML 转换模型
├── data/                    # 运行时数据
├── docker-compose.yml       # 容器编排
└── run.sh                   # 执行脚本

快速开始

# 进入用例目录
cd extensions/<case_name>

# 启动依赖服务
docker compose up -d

# 运行用例
./run.sh

doris 使用说明

前提

  1. 启动 docker compose:docker compose up -d
  2. 等待 Doris 启动后,创建 test.sql 中的库表
  3. 查看内容:
  • 打开 http://localhost:8030/Playground/result/wp_test-events_parsed 页面
  • 执行查询语句:select * from events_parsed
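
除 Playground 外,也可以通过 MySQL 协议直接连接 Doris FE 验证入库结果(9030 为 Doris FE 的默认查询端口;若环境不同请自行调整):

mysql -h 127.0.0.1 -P 9030 -uroot -e "select count(*) from test_db.events_parsed;"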

配置介绍

[connectors.params]

# Stream Load API 配置(新版)
endpoint = "http://localhost:8040"  # 使用 BE 的 HTTP 端口(推荐)
user = "root"   #用户名
password = ""   #密码
database = "test_db"    #数据库
table = "events_parsed" # 表名
timeout_secs = 30       # 超时时间
max_retries = 3         # 重试次数
batch_size = 100_0000   # 通用参数 批量大小

# 可选:自定义 Stream Load 参数 
[connectors.params.headers]
strip_outer_array = "false"
max_filter_ratio = "0.1"
strict_mode = "false"
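
如需在引擎之外单独验证 Stream Load 通路,可参考如下 curl 示意(sample.json 为假设的样例文件,header 取值与上方配置保持一致):

curl -u root: -X PUT -H "Expect:100-continue" \
  -H "format: json" -H "strip_outer_array: false" \
  -T sample.json \
  http://localhost:8040/api/test_db/events_parsed/_stream_load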

Kafka

This directory provides an end-to-end Kafka validation case to verify the unified Kafka Source/Sink connectors work as expected.

  • Producer: wpgen writes sample data to Kafka (input topic, default wp.testcase.events.raw)
  • Engine: wparse reads from Kafka input, parses and routes to multiple sinks (including Kafka output topic, default wp.testcase.events.parsed, and a file sink)
  • Optional verification: Use wpkit kafka consume to verify messages on the Kafka output topic

Data Flow

The diagram below shows the data flow and key components (wp.testcase.events.raw/wp.testcase.events.parsed).

flowchart LR
    subgraph Producer
      WPGEN["wpgen sample<br/>(writes to Kafka per wpgen.toml)"]
    end
    subgraph Kafka
      KAFKA_IN[(KAFKA_INPUT_TOPIC)]
      KAFKA_OUT[(KAFKA_OUTPUT_TOPIC)]
    end
    subgraph Engine
      WPARSE["wparse batch<br/>(-n limits count, auto-exits)"]
      SINKS{{"Sink Group<br/>(models/sink)"}}
      OML[OML mapping/masking]
    end
    subgraph Verifier
      FILE["file sink: events.parsed.prototext"]
      CONSUME["wpkit kafka consume<br/>(optional verification)"]
    end

    WPGEN -- produce --> KAFKA_IN
    WPARSE -- consume --> KAFKA_IN
    WPARSE -- route --> OML --> SINKS
    SINKS -- write --> FILE
    SINKS -- produce --> KAFKA_OUT
    CONSUME -- verify --> KAFKA_OUT

If Mermaid is not supported, refer to the ASCII version:

wpgen(sample) --> Kafka(KAFKA_INPUT_TOPIC) --> wparse(batch) --> [OML/route] --> sinks{file,kafka}
    sinks --> file: data/out_dat/events.parsed.prototext
    sinks --> Kafka(KAFKA_OUTPUT_TOPIC) --> (optional) wpkit kafka consume verification

Directory Structure

  • conf/
    • wparse.toml: Engine main config (directories/concurrency/logging, etc.)
    • wpgen.toml: Data generator config (points to Kafka sink, overrides input topic)
  • topology/source/wpsrc.toml: Source routing (contains two [[sources]]: kafka_input subscribes to input topic; kafka_output_tap subscribes to output topic for self-testing/demo, can be disabled as needed)
  • topology/sink/business.d/example.toml: Business sink routing (contains a file sink and a Kafka sink)
  • models/oml/...: OML models (result field mapping/masking)
  • case_verify.sh: One-click verification script (starts wparse → wpgen sends → verification)

Note: Source and Sink connector IDs reference definitions in the repository root connectors/ directory:

  • connectors/source.d/30-kafka.toml: id=kafka_src (allows overriding topic/group_id/config)
  • connectors/sink.d/30-kafka.toml: id=kafka_sink (allows overriding topic/config/num_partitions/replication/brokers/fmt)
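
As a sketch, overriding the input topic in topology/source/wpsrc.toml could look like the following (the [[sources]] keys mirror the name/connect/params pattern used by other cases in this repository and are assumptions, not the authoritative schema):

[[sources]]
name = "kafka_input"
connect = "kafka_src"
params = { topic = "wp.testcase.events.raw", group_id = "wp-testcase" }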

Prerequisites

  • Kafka running locally, default address localhost:9092 (or override via environment variables, see below)

Quick Start

Enter the case directory and run the script (default debug):

cd extensions/wp-connectors/testcase
./case_verify.sh            # or ./case_verify.sh release

Main script steps:

  1. Clean run directory (preserving conf/ templates), build binaries to target/<profile>, add to PATH
  2. wpkit conf check for config self-check; clean data directory
  3. Start wparse in background (-n limits processing count, auto-exits on completion)
  4. Run wpgen sample to generate sample data and write to Kafka input topic
  5. Wait for wparse to exit and perform file sink verification (optional)

Parameters

The script supports the following optional environment variables:

  • PROFILE: Build and run profile (debug|release), default debug
  • LINE_CNT: Number of sample records to generate/process, default 3000
  • STAT_SEC: Statistics print interval (seconds), default 3
  • KAFKA_BOOTSTRAP_SERVERS: Kafka address, default localhost:9092
  • KAFKA_INPUT_TOPIC: Input topic (wpgen writes, wparse consumes), default wp.testcase.events.raw
  • KAFKA_OUTPUT_TOPIC: Output topic (wparse Kafka sink writes), default wp.testcase.events.parsed

Example:

KAFKA_BOOTSTRAP_SERVERS=127.0.0.1:9092 KAFKA_INPUT_TOPIC=my_in KAFKA_OUTPUT_TOPIC=my_out ./case_verify.sh

Result Verification

  • File Sink: The script runs wpkit stat file and wpkit validate sink-file -v; output is available in data/out_dat/ as events.parsed.prototext (per models/sink/business.d/example.toml file sink config)
  • Kafka Output: Optionally run the following command to view output topic (recommend using a fresh group to avoid messages being consumed by other consumers)
wpkit kafka consume --brokers ${KAFKA_BOOTSTRAP_SERVERS:-localhost:9092} \
  --group wpkit-consume-$$ \
  --topic "${KAFKA_OUTPUT_TOPIC:-wp.testcase.events.parsed}"

Kafka (中文)

本目录提供一套基于 Kafka 的端到端校验用例,验证统一 Kafka Source/Sink 连接器是否按预期工作。

  • 发送端:wpgen 将样例数据写入 Kafka(输入 topic,默认 wp.testcase.events.raw
  • 引擎端:wparse 读取 Kafka 输入,解析并路由到多个 Sink(其中包含 Kafka 输出 topic,默认 wp.testcase.events.parsed,以及一个文件型 Sink)
  • 可选校验:使用 wpkit kafka consume 验证 Kafka 输出 topic 的消息

数据流图

下图展示 testcase 的数据流与关键环节(wp.testcase.events.raw/wp.testcase.events.parsed)。

flowchart LR
    subgraph Producer
      WPGEN["wpgen sample<br/>(按 wpgen.toml 写 Kafka)"]
    end
    subgraph Kafka
      KAFKA_IN[(KAFKA_INPUT_TOPIC)]
      KAFKA_OUT[(KAFKA_OUTPUT_TOPIC)]
    end
    subgraph Engine
      WPARSE["wparse batch<br/>(-n 限制条数自动退出)"]
      SINKS{{"Sink Group<br/>(models/sink)"}}
      OML[OML 映射/脱敏]
    end
    subgraph Verifier
      FILE["file sink: events.parsed.prototext"]
      CONSUME["wpkit kafka consume<br/>(可选验证)"]
    end

    WPGEN -- produce --> KAFKA_IN
    WPARSE -- consume --> KAFKA_IN
    WPARSE -- route --> OML --> SINKS
    SINKS -- write --> FILE
    SINKS -- produce --> KAFKA_OUT
    CONSUME -- verify --> KAFKA_OUT

如渲染不支持 Mermaid,可参考 ASCII 版:

wpgen(sample) --> Kafka(KAFKA_INPUT_TOPIC) --> wparse(batch) --> [OML/route] --> sinks{file,kafka}
    sinks --> file: data/out_dat/events.parsed.prototext
    sinks --> Kafka(KAFKA_OUTPUT_TOPIC) --> (optional) wpkit kafka consume 验证

目录结构

  • conf/
    • wparse.toml:引擎主配置(目录/并发/日志等)
    • wpgen.toml:数据生成器配置(已指向 Kafka sink,并覆写输入 topic)
  • topology/source/wpsrc.toml:Source 路由(包含两个 [[sources]]kafka_input 订阅输入 topic;kafka_output_tap 订阅输出 topic,用于自测/演示,可按需关闭)
  • topology/sink/business.d/example.toml:业务 Sink 路由(包含一个文件型 sink 与一个 Kafka sink)
  • models/oml/...:OML 模型(结果字段映射/脱敏)
  • case_verify.sh:一键校验脚本(启动 wparse → wpgen 发送 → 校验)

说明:Source 与 Sink 连接器 id 引用仓库根目录 connectors/ 下的定义:

  • connectors/source.d/30-kafka.toml:id=kafka_src(允许覆写 topic/group_id/config
  • connectors/sink.d/30-kafka.toml:id=kafka_sink(允许覆写 topic/config/num_partitions/replication/brokers/fmt

前置要求

  • 本机已启动 Kafka,默认地址 localhost:9092(或通过环境变量覆盖,见下文)

快速开始

进入用例目录并运行脚本(默认 debug):

cd extensions/wp-connectors/testcase
./case_verify.sh            # 或 ./case_verify.sh release

脚本主要步骤:

  1. 清理运行目录(保留 conf/ 模板)并构建二进制到 target/<profile>,加入 PATH
  2. wpkit conf check 进行配置自检;清理数据目录
  3. 后台启动 wparse-n 限制处理条数,完成后自动退出)
  4. 执行 wpgen sample 生成样例数据并写入 Kafka 输入 topic
  5. 等待 wparse 退出并进行文件型 sink 校验(可选)

运行参数

脚本支持以下可选环境变量:

  • PROFILE:构建与运行的 profile(debug|release),默认 debug
  • LINE_CNT:生成/处理的样例条数,默认 3000
  • STAT_SEC:统计打印间隔(秒),默认 3
  • KAFKA_BOOTSTRAP_SERVERS:Kafka 地址,默认 localhost:9092
  • KAFKA_INPUT_TOPIC:输入 topic(wpgen 写入、wparse 消费),默认 wp.testcase.events.raw
  • KAFKA_OUTPUT_TOPIC:输出 topic(wparse 的 Kafka sink 写入),默认 wp.testcase.events.parsed

示例:

KAFKA_BOOTSTRAP_SERVERS=127.0.0.1:9092 KAFKA_INPUT_TOPIC=my_in KAFKA_OUTPUT_TOPIC=my_out ./case_verify.sh

结果验证

  • 文件型 Sink:脚本会执行 wpkit stat filewpkit validate sink-file -v,在 data/out_dat/ 下可见 events.parsed.prototext(按 models/sink/business.d/example.toml 的文件型 sink 配置)
  • Kafka 输出:可选执行以下命令查看输出 topic(建议使用全新 group,以免被其他消费者读走)
wpkit kafka consume --brokers ${KAFKA_BOOTSTRAP_SERVERS:-localhost:9092} \
  --group wpkit-consume-$$ \
  --topic "${KAFKA_OUTPUT_TOPIC:-wp.testcase.events.parsed}"

Practice - Real-World Multi-Source Monitoring

Overview

This example demonstrates a real-world scenario that includes log collection, parsing, writing to multiple backends, and monitoring. The architecture diagram below illustrates the use case.

(figure: use-case architecture)

  • wpgen periodically sends data to files, Fluent-bit TCP, and Kafka, simulating log sources.
  • Fluent-bit collects file logs and listens on port 5170, forwarding logs to wparse for parsing.
  • wparse listens on TCP and Kafka, receives various logs for parsing and classification, and routes them by log type (currently nginx only) to different outputs such as Kafka, files, and VictoriaLogs.

Deployment Architecture

(figure: deployment architecture)

You can use VictoriaLogs to view the various logs (VLogs UI).

(figure: querying logs from the Kafka input)

(figure: querying logs from the TCP input)

Monitoring

This example uses VictoriaMetrics to monitor key components including Fluent-bit, Kafka, and wparse. External monitoring is available via Grafana.

(figure: Grafana monitoring overview)

wp-monitor

wp-monitor is a proprietary metrics visualization component organized around the system architecture. Link: http://106.55.164.250:25816/wp-monitor.

Grafana Monitoring

Credentials:

  • Grafana username: admin
  • Grafana password: admin

Currently we provide dashboards for Kafka and wparse.

Usage

Prerequisites: Docker images referenced in docker-compose must be pullable and match your CPU architecture.

Method 1: Manual Start (recommended on macOS)

  • Enter working directory: cd wp-examples/extensions/practice
  • Update the output address in fluent-bit.yml to the actual wparse address
  • Start Docker components: docker compose up -d
  • Enter wparse working directory: cd parse-work
  • Start wparse: wparse daemon --stat 10 -p > data/logs/wparse-info.log 2>&1 &
  • Start data generators:
    • wpgen sample -c wpgen-kafka.toml --stat 10 -p > data/logs/wpgen-kafka.log 2>&1 &
    • wpgen sample -c wpgen-tcp.toml --stat 10 -p > data/logs/wpgen-tcp.log 2>&1 &
    • wpgen sample -c wpgen-file.toml --stat 10 -p > data/logs/wpgen-file.log 2>&1 &

Method 2: One-Click Start Script

The script depends on nohup, which is typically available on Linux systems.

  • Enter working directory: cd wp-examples/extensions/practice
  • Update the output address in fluent-bit.yml to the actual wparse address
  • Start Docker components: docker compose up -d
  • Start: ./run.sh
  • Stop: ./stop.sh

Practice - 实战多源监控场景 (中文)

示例介绍

本示例是一个基于实践场景的用例,涵盖日志收集、解析、写入多个库与监控,下方为该用例的示意图。

(图:用例架构示意)

  • wpgen定期向文件、fluent-bit TCP、Kafka发送数据,模拟日志来源。
  • fluent-bit收集文件日志、监听5170端口,将日志转发到wparse做解析。
  • wparse 监听 TCP 和 Kafka,接收各类日志进行解析和分类,并根据日志类型(目前只有 nginx)转发到不同的输出源,如 Kafka、文件、VictoriaLogs。

部署结构

(图:部署结构示意)

可以使用 VictoriaLogs 查看各类日志(VLogs 地址)。

(图:查询 Kafka 输入的日志)

(图:查询 TCP 输入的日志)

监控

本例子使用 VictoriaMetrics 对 fluent-bit、Kafka、wparse 几个关键组件进行监控,外部可以使用 Grafana 查看。

(图:Grafana 监控概览)

wp-monitor监控

wp-monitor 是我司自研的一款基于系统结构的指标展示组件,链接:http://106.55.164.250:25816/wp-monitor。

grafana监控

账号:

  • grafana账号:admin
  • grafana密码:admin

目前我们提供了 Kafka 和 wparse 的仪表盘。

项目使用

前置条件:可以拉取 docker-compose 中引用的镜像,且镜像与本机 CPU 架构一致。

方式 1:手工启动(macOS 推荐)

  • 进入工作目录:cd wp-examples/extensions/practice

  • 将 fluent-bit.yml 中输出地址信息改为实际的 wparse 地址

  • 启动docker相关组件:docker compose up -d

  • 进入wparse工作目录:cd parse-work

  • 启动 wparse:wparse daemon --stat 10 -p > data/logs/wparse-info.log 2>&1 &

  • 启动发送相关工具:

    • wpgen sample -c wpgen-kafka.toml --stat 10 -p > data/logs/wpgen-kafka.log 2>&1 &
    • wpgen sample -c wpgen-tcp.toml --stat 10 -p > data/logs/wpgen-tcp.log 2>&1 &
    • wpgen sample -c wpgen-file.toml --stat 10 -p > data/logs/wpgen-file.log 2>&1 &

方式 2:一键化启动脚本

脚本依赖 nohup,Linux 系统一般自带此工具。

  • 进入工作目录:cd wp-examples/extensions/practice
  • 将 fluent-bit.yml 中输出地址信息改为实际的 wparse 地址
  • 启动docker相关组件:docker compose up -d
  • 执行:./run.sh
  • 停止:./stop.sh

MySQL

This directory provides an end-to-end TCP-based MySQL ingestion case to verify the unified TCP Source and MySQL Sink connectors work as expected.

  • Producer: wpgen sends sample data via TCP protocol to a specified port (default 19001)
  • Engine: wparse listens on TCP port to receive data, parses and routes to MySQL Sink for database ingestion
  • Verification: Verify data is correctly ingested by querying MySQL

Data Flow

The diagram below shows the tcp_mysql data flow and key components.

flowchart LR
    subgraph Producer
      WPGEN["wpgen sample<br/>(sends TCP data per wpgen.toml)"]
    end
    subgraph TCP
      TCP_PORT[(TCP Port<br/>19001)]
    end
    subgraph Engine
      WPARSE["wparse daemon<br/>(listens on TCP port)"]
      SINKS{{"Sink Group<br/>(models/sink)"}}
      OML[OML Mapping]
    end
    subgraph Database
      MYSQL[(MySQL<br/>nginx_logs table)]
    end

    WPGEN -- produce --> TCP_PORT
    TCP_PORT -- consume --> WPARSE
    WPARSE -- route --> OML --> SINKS
    SINKS -- write --> MYSQL

If Mermaid is not supported, refer to the ASCII version:

wpgen(sample) --> TCP(TCP:19001) --> wparse(daemon) --> [OML/route] --> sinks{mysql}
    sinks --> MySQL: nginx_logs table

Directory Structure

  • conf/
    • wparse.toml: Engine main config (directories/concurrency/logging, etc.)
    • wpgen.toml: Data generator config (points to TCP sink, configured with port)
  • topology/sources/wpsrc.toml: Source routing (contains tcp_1 listening on port 19001)
  • topology/sinks/business.d/all.toml: Business sink routing (contains MySQL sink with column configuration)
  • models/oml/nginx.oml: OML model (result field mapping/masking)
  • models/wpl/nginx/: WPL parsing rules and sample data
  • preparatory_work.sql: MySQL table schema definition
  • run.sh: One-click run script

Note: Source and Sink connector IDs reference definitions in the repository root connectors/ directory:

  • connectors/source.d/20-tcp.toml: id=tcp_src (allows overriding port/prefer_newline)
  • connectors/sink.d/20-mysql.toml: id=mysql_sink (allows overriding table/columns/dsn, etc.)

Prerequisites

  • MySQL running locally, default address 127.0.0.1:3306 (or override via environment variables, see below)
  • Ensure the nginx_logs table is created in the target database (execute preparatory_work.sql)
  • Note: Custom database tables require a mandatory wp_event_id field as primary key with BIGINT type

Quick Start

Enter the case directory and run the script (default debug):

cd extensions/tcp_mysql
./run.sh            # or ./run.sh release

Main script steps:

  1. wproj check for config self-check, clean data directory
  2. Start wparse daemon in background (listening on TCP port 19001)
  3. Run wpgen sample to generate sample data and send via TCP
  4. Wait for data ingestion, stop wparse
  5. Run wproj data stat and wproj data validate for verification

Parameters

The script supports the following optional environment variables:

  • LINE_CNT: Number of sample records to generate/process, default 100
  • SPEED_MAX: Maximum send rate (records/sec), default 5000

Example:

LINE_CNT=1000 SPEED_MAX=10000 ./run.sh

Configuration

wpgen.toml (Data Generator Config)

[generator]
mode = "sample"
count = 1000        # Number of samples to generate
speed = 0           # Send rate limit, 0 means unlimited
parallel = 4        # Concurrency

[output]
name = "gen_out"
connect = "tcp_sink"
params = { port = 19001 }

wparse.toml (Engine Config)

[models]
wpl = "./models/wpl"
oml = "./models/oml"

[topology]
sources = "./topology/sources"
sinks = "./topology/sinks"

[performance]
parse_workers = 2   # Parse concurrency
rate_limit_rps = 0  # Rate limit, 0 means unlimited

topology/sinks/business.d/all.toml (MySQL Sink Config)

[sink_group]
name = "all"
rule = ["/*"]
parallel = 8

[[sink_group.sinks]]
name = "main"
connect = "mysql_sink"
params.columns = ["sip", "timestamp", "http/request", "status", "size", "referer", "http/agent", "wp_event_id"]

Database Setup

Execute the following SQL to create the target table:

mysql -h 127.0.0.1 -u root -p wparse < preparatory_work.sql

Or copy the contents of preparatory_work.sql directly into the MySQL client.
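
For reference, a minimal sketch of the kind of schema preparatory_work.sql defines (column names follow the columns list above; the types here are assumptions, so treat the file shipped with this case as authoritative):

CREATE TABLE IF NOT EXISTS nginx_logs (
  wp_event_id    BIGINT PRIMARY KEY,  -- mandatory BIGINT primary key (see Prerequisites)
  sip            VARCHAR(64),
  `timestamp`    VARCHAR(64),
  `http/request` TEXT,
  status         INT,
  size           BIGINT,
  referer        TEXT,
  `http/agent`   TEXT
);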

Result Verification

  • MySQL ingestion verification: Connect to the database and query the nginx_logs table to confirm record count and data
mysql -h 127.0.0.1 -u root -p your_database -e "SELECT COUNT(*) FROM nginx_logs; SELECT * FROM nginx_logs LIMIT 100;"
  • Data statistics: wproj data stat outputs processing statistics for each stage
  • Data validation: wproj data validate verifies input/output data consistency

FAQ

  • Connection failed: Confirm MySQL service is running, user has access to the target database, and the table has been created
  • Port conflict: Ensure port 19001 is not in use, or modify the port in topology/sources/wpsrc.toml
  • No data ingested: Check log files under data/logs/ to confirm TCP connection and parsing are working
  • Field mismatch: Verify that columns in topology/sinks/business.d/all.toml matches the preparatory_work.sql table structure

MySQL (中文)

本目录提供一套基于 TCP 传输的端到端 MySQL 入库用例,验证统一 TCP Source 与 MySQL Sink 连接器是否按预期工作。

  • 发送端:wpgen 将样例数据通过 TCP 协议发送到指定端口(默认 19001)
  • 引擎端:wparse 监听 TCP 端口接收数据,解析并路由到 MySQL Sink 完成入库
  • 验证端:通过 MySQL 查询验证数据是否正确入库

数据流图

下图展示 tcp_mysql 的数据流与关键环节。

flowchart LR
    subgraph Producer
      WPGEN["wpgen sample<br/>(按 wpgen.toml 发送 TCP 数据)"]
    end
    subgraph TCP
      TCP_PORT[(TCP 端口<br/>19001)]
    end
    subgraph Engine
      WPARSE["wparse daemon<br/>(监听 TCP 端口)"]
      SINKS{{"Sink Group<br/>(models/sink)"}}
      OML[OML 映射/脱敏]
    end
    subgraph Database
      MYSQL[(MySQL<br/>nginx_logs 表)]
    end

    WPGEN -- produce --> TCP_PORT
    TCP_PORT -- consume --> WPARSE
    WPARSE -- route --> OML --> SINKS
    SINKS -- write --> MYSQL

如渲染不支持 Mermaid,可参考 ASCII 版:

wpgen(sample) --> TCP(TCP:19001) --> wparse(daemon) --> [OML/route] --> sinks{mysql}
    sinks --> MySQL: nginx_logs 表

目录结构

  • conf/
    • wparse.toml:引擎主配置(目录/并发/日志等)
    • wpgen.toml:数据生成器配置(已指向 TCP sink,并配置端口)
  • topology/sources/wpsrc.toml:Source 路由(包含 tcp_1 监听 19001 端口)
  • topology/sinks/business.d/all.toml:业务 Sink 路由(包含 MySQL sink,配置入库字段)
  • models/oml/nginx.oml:OML 模型(结果字段映射/脱敏)
  • models/wpl/nginx/:WPL 解析规则与样例数据
  • preparatory_work.sql:MySQL 表结构定义
  • run.sh:一键运行脚本

说明:Source 与 Sink 连接器 id 引用仓库根目录 connectors/ 下的定义:

  • connectors/source.d/20-tcp.toml:id=tcp_src(允许覆写 port/prefer_newline
  • connectors/sink.d/20-mysql.toml:id=mysql_sink(允许覆写 table/columns/dsn 等)

前置要求

  • 本机已启动 MySQL,默认地址 127.0.0.1:3306(或通过环境变量覆盖,见下文)
  • 确保目标数据库中已创建 nginx_logs 表(执行 preparatory_work.sql)
  • 注意:自定义数据库表必须包含 wp_event_id 字段,作为主键且类型为 BIGINT

快速开始

进入用例目录并运行脚本(默认 debug):

cd extensions/tcp_mysql
./run.sh            # 或 ./run.sh release

脚本主要步骤:

  1. wproj check 进行配置自检,清理数据目录
  2. 后台启动 wparse daemon(监听 TCP 19001 端口)
  3. 执行 wpgen sample 生成样例数据并通过 TCP 发送
  4. 等待数据入库,停止 wparse
  5. 执行 wproj data statwproj data validate 进行校验

运行参数

脚本支持以下可选环境变量:

  • LINE_CNT:生成/处理的样例条数,默认 100
  • SPEED_MAX:最大发送速率(条/秒),默认 5000

示例:

LINE_CNT=1000 SPEED_MAX=10000 ./run.sh

配置说明

wpgen.toml(数据生成器配置)

[generator]
mode = "sample"
count = 1000        # 生成样例数量
speed = 0           # 发送速率限制,0 表示不限速
parallel = 4        # 并发数

[output]
name = "gen_out"
connect = "tcp_sink"
params = { port = 19001 }

wparse.toml(引擎配置)

[models]
wpl = "./models/wpl"
oml = "./models/oml"

[topology]
sources = "./topology/sources"
sinks = "./topology/sinks"

[performance]
parse_workers = 2   # 解析并发数
rate_limit_rps = 0  # 限速,0 表示不限速

topology/sinks/business.d/all.toml(MySQL Sink 配置)

[sink_group]
name = "all"
rule = ["/*"]
parallel = 8

[[sink_group.sinks]]
name = "main"
connect = "mysql_sink"
params.columns = ["sip", "timestamp", "http/request", "status", "size", "referer", "http/agent", "wp_event_id"]

数据库准备

执行以下 SQL 创建目标表:

mysql -h 127.0.0.1 -u root -p wparse < preparatory_work.sql

或直接复制 preparatory_work.sql 内容到 MySQL 客户端执行。

结果验证

  • MySQL 入库验证:连接数据库查询 nginx_logs 表,确认记录数与数据内容
mysql -h 127.0.0.1 -u root -p your_database -e "SELECT COUNT(*) FROM nginx_logs; SELECT * FROM nginx_logs LIMIT 100;"
  • 数据统计:wproj data stat 会输出各阶段处理统计
  • 数据校验:wproj data validate 会校验输入输出数据一致性

常见问题排查

  • 连接失败:确认 MySQL 服务已启动,用户有目标数据库访问权限,表已创建
  • 端口冲突:确保 19001 端口未被占用,或修改 topology/sources/wpsrc.toml 中的端口配置
  • 无数据入库:检查 data/logs/ 下的日志文件,确认 TCP 连接与解析是否正常
  • 字段不匹配:确认 topology/sinks/business.d/all.toml 中的 columns 与 preparatory_work.sql 表结构一致

TCP to VictoriaLogs

This directory provides an end-to-end TCP-based VictoriaLogs ingestion case to verify the unified TCP Source and VictoriaLogs Sink connectors work as expected.

  • Producer: wpgen sends sample data via TCP protocol to a specified port (default 19001)
  • Engine: wparse listens on TCP port to receive data, parses and routes to VictoriaLogs Sink for log storage
  • Verification: Verify data is correctly ingested via VictoriaLogs HTTP API queries

Data Flow

The diagram below shows the tcp_victorialogs data flow and key components.

graph LR
    subgraph P[Producer]
        A[wpgen sample]
    end
    subgraph T[TCP]
        B["TCP 19001"]
    end
    subgraph E[Engine]
        C[wparse daemon]
        D[Sinks]
    end
    subgraph DB[Database]
        E1[VictoriaLogs]
        E2[File backup]
    end

    A --> B
    B --> C
    C --> D
    D --> E1
    D --> E2

If Mermaid is not supported, refer to the ASCII version:

wpgen(sample) --> TCP:19001 --> wparse(daemon) --> Sinks
                                              ├──> VictoriaLogs
                                              └──> all.dat

Directory Structure

  • conf/
    • wparse.toml: Engine main config (directories/concurrency/logging, etc.)
    • wpgen.toml: Data generator config (points to TCP sink, configured with port)
  • topology/sources/wpsrc.toml: Source routing (contains tcp_1 listening on port 19001)
  • topology/sinks/business.d/all.toml: Business sink routing (contains VictoriaLogs sink and file sink)
  • models/oml/nginx.oml: OML model (result field mapping/masking)
  • models/wpl/nginx/: WPL parsing rules and sample data
  • run.sh: One-click run script

Note: Source and Sink connector IDs reference definitions in the repository root connectors/ directory:

  • connectors/source.d/20-tcp.toml: id=tcp_src (allows overriding port/prefer_newline)
  • connectors/sink.d/40-victorialogs.toml: id=victorialogs_sink (allows overriding endpoint/insert_path/flush_interval_secs/create_time_field)

Field descriptions:

  • endpoint: VictoriaLogs HTTP address, default http://127.0.0.1:9428
  • insert_path: Insert path, default /insert/jsonline
  • flush_interval_secs: Interval for pushing parsed logs, default 3
  • create_time_field: Time field stored in VictoriaLogs (note: the field must be parsed using time functions in WPL), defaults to VictoriaLogs insertion timestamp when empty
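
For create_time_field = "timestamp" to work, the field must come out of WPL as a time value. A minimal sketch (the package and field list are hypothetical; time_timestamp is the time type used by other rules in this repository):

package /nginx {
    rule access {
        (ip:sip, time_timestamp:timestamp, chars:request)
    }
}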

Prerequisites

  • VictoriaLogs running locally, default address http://127.0.0.1:9428 (or override via environment variables, see below)
  • Ensure VictoriaLogs HTTP service is accessible
  • You can use the docker-compose.yml provided in this project to start VictoriaLogs; note that the provided docker-compose.yml limits queryable data to the last 400 days

Quick Start

Enter the case directory and run the script (default debug):

cd extensions/tcp_victorialogs
./run.sh            # or ./run.sh release

Main script steps:

  1. wproj check for config self-check, clean data directory
  2. Start wparse daemon in background (listening on TCP port 19001)
  3. Run wpgen sample to generate sample data and send via TCP
  4. Wait for data ingestion, stop wparse
  5. Run wproj data stat and wproj data validate for verification

Parameters

The script supports the following optional environment variables:

  • LINE_CNT: Number of sample records to generate/process, default 100
  • SPEED_MAX: Maximum send rate (records/sec), default 5000

Example:

LINE_CNT=1000 SPEED_MAX=10000 ./run.sh

Configuration

wpgen.toml (Data Generator Config)

[generator]
mode = "sample"
count = 1000        # Number of samples to generate
speed = 0           # Send rate limit, 0 means unlimited
parallel = 4        # Concurrency

[output]
name = "gen_out"
connect = "tcp_sink"
params = { port = 19001 }

wparse.toml (Engine Config)

[models]
wpl = "./models/wpl"
oml = "./models/oml"

[topology]
sources = "./topology/sources"
sinks = "./topology/sinks"

[performance]
parse_workers = 2   # Parse concurrency
rate_limit_rps = 0  # Rate limit, 0 means unlimited

topology/sinks/business.d/all.toml (Sink Config)

[sink_group]
name = "all"
rule = ["/*"]
parallel = 8

[[sink_group.sinks]]
name = "victorialogs_output"
connect = "victorialogs_sink"
params.endpoint = "http://127.0.0.1:9428"
params.insert_path = "/insert/jsonline"
params.flush_interval_secs = 3
params.create_time_field = "timestamp"

[[sink_group.sinks]]
name = "main"
connect = "file_proto_sink"
params = { file = "all.dat" }

Result Verification

  • VictoriaLogs ingestion verification: Query data via HTTP API
curl 'http://127.0.0.1:9428/api/ui/query?query={_any="*"}&limit=10'
  • Data statistics: wproj data stat outputs processing statistics for each stage
  • Data validation: wproj data validate verifies input/output data consistency
  • File backup: all.dat file saves the original parsed data

VictoriaLogs Query Examples

Query by time range (limit 100 records):

curl --location --request POST 'http://localhost:9428/select/logsql/query' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Accept: */*' \
--header 'Host: localhost:9428' \
--header 'Connection: keep-alive' \
--data-urlencode 'query=*' \
--data-urlencode 'start=2025-12-27T09:30:20.251Z' \
--data-urlencode 'end=2025-12-29T09:30:20.251Z' \
--data-urlencode 'limit=100'

FAQ

  • Connection failed: Confirm VictoriaLogs service is running and HTTP port (default 9428) is accessible
  • Port conflict: Ensure port 19001 is not in use, or modify the port in topology/sources/wpsrc.toml
  • No data ingested: Check log files under data/logs/ to confirm TCP connection and parsing are working
  • No query results: Verify victorialogs_sink endpoint and insert_path are correctly configured
  • Time field issues: For time-series functionality, confirm create_time_field is set to an existing time field name in the data

TCP to VictoriaLogs (中文)

本目录提供一套基于 TCP 传输的端到端 VictoriaLogs 入库用例,验证统一 TCP Source 与 VictoriaLogs Sink 连接器是否按预期工作。

  • 发送端:wpgen 将样例数据通过 TCP 协议发送到指定端口(默认 19001)
  • 引擎端:wparse 监听 TCP 端口接收数据,解析并路由到 VictoriaLogs Sink 完成入库
  • 验证端:通过 VictoriaLogs HTTP API 查询验证数据是否正确入库

数据流图

下图展示 tcp_victorialogs 的数据流与关键环节。

graph LR
    subgraph P[Producer]
        A[wpgen sample]
    end
    subgraph T[TCP]
        B["TCP 19001"]
    end
    subgraph E[Engine]
        C[wparse daemon]
        D[Sinks]
    end
    subgraph DB[Database]
        E1[VictoriaLogs]
        E2[File backup]
    end

    A --> B
    B --> C
    C --> D
    D --> E1
    D --> E2

如渲染不支持 Mermaid,可参考 ASCII 版:

wpgen(sample) --> TCP:19001 --> wparse(daemon) --> Sinks
                                              ├──> VictoriaLogs
                                              └──> all.dat

目录结构

  • conf/
    • wparse.toml:引擎主配置(目录/并发/日志等)
    • wpgen.toml:数据生成器配置(已指向 TCP sink,并配置端口)
  • topology/sources/wpsrc.toml:Source 路由(包含 tcp_1 监听 19001 端口)
  • topology/sinks/business.d/all.toml:业务 Sink 路由(包含 VictoriaLogs sink 与文件 sink)
  • models/oml/nginx.oml:OML 模型(结果字段映射/脱敏)
  • models/wpl/nginx/:WPL 解析规则与样例数据
  • run.sh:一键运行脚本

说明:Source 与 Sink 连接器 id 引用仓库根目录 connectors/ 下的定义:

  • connectors/source.d/20-tcp.toml:id=tcp_src(允许覆写 port/prefer_newline
  • connectors/sink.d/40-victorialogs.toml:id=victorialogs_sink(允许覆写 endpoint/insert_path/flush_interval_secs/create_time_field) 字段解释如下
    • endpoint:VictoriaLogs HTTP 地址,默认 http://127.0.0.1:9428
    • insert_path:插入路径,默认 /insert/jsonline
    • flush_interval_secs: 解析出的日志推送时间间隔,默认 3
    • create_time_field:存入 VictoriaLogs 中的时间字段(注意其中的字段是需要在WPL中使用时间函数解析),默认为空则采用插入 VictoriaLogs 的时间戳

前置要求

  • 本机已启动 VictoriaLogs,默认地址 http://127.0.0.1:9428(或通过环境变量覆盖,见下文)
  • 确保 VictoriaLogs 的 HTTP 服务可访问
  • 可以直接使用本项目提供的 docker-compose.yml 启动 VictoriaLogs,但注意其配置将可查询的数据范围限定为最近 400 天

快速开始

进入用例目录并运行脚本(默认 debug):

cd extensions/tcp_victorialogs
./run.sh            # 或 ./run.sh release

脚本主要步骤:

  1. wproj check 进行配置自检,清理数据目录
  2. 后台启动 wparse daemon(监听 TCP 19001 端口)
  3. 执行 wpgen sample 生成样例数据并通过 TCP 发送
  4. 等待数据入库,停止 wparse
  5. 执行 wproj data statwproj data validate 进行校验

运行参数

脚本支持以下可选环境变量:

  • LINE_CNT:生成/处理的样例条数,默认 100
  • SPEED_MAX:最大发送速率(条/秒),默认 5000

示例:

LINE_CNT=1000 SPEED_MAX=10000 ./run.sh

配置说明

wpgen.toml(数据生成器配置)

[generator]
mode = "sample"
count = 1000        # 生成样例数量
speed = 0           # 发送速率限制,0 表示不限速
parallel = 4        # 并发数

[output]
name = "gen_out"
connect = "tcp_sink"
params = { port = 19001 }

wparse.toml(引擎配置)

[models]
wpl = "./models/wpl"
oml = "./models/oml"

[topology]
sources = "./topology/sources"
sinks = "./topology/sinks"

[performance]
parse_workers = 2   # 解析并发数
rate_limit_rps = 0  # 限速,0 表示不限速

topology/sinks/business.d/all.toml(Sink 配置)

[sink_group]
name = "all"
rule = ["/*"]
parallel = 8

[[sink_group.sinks]]
name = "victorialogs_output"
connect = "victorialogs_sink"
params.endpoint = "http://127.0.0.1:9428"
params.insert_path = "/insert/jsonline"
params.flush_interval_secs = 3
params.create_time_field = "timestamp"

[[sink_group.sinks]]
name = "main"
connect = "file_proto_sink"
params = { file = "all.dat" }

结果验证

  • VictoriaLogs 入库验证:通过 HTTP API 查询数据
curl 'http://127.0.0.1:9428/api/ui/query?query={_any="*"}&limit=10'
  • 数据统计:wproj data stat 会输出各阶段处理统计
  • 数据校验:wproj data validate 会校验输入输出数据一致性
  • 文件备份:all.dat 文件会保存原始解析后的数据

VictoriaLogs 查询示例

按时间范围查询(限制 100 条):

curl --location --request POST 'http://localhost:9428/select/logsql/query' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Accept: */*' \
--header 'Host: localhost:9428' \
--header 'Connection: keep-alive' \
--data-urlencode 'query=*' \
--data-urlencode 'start=2025-12-27T09:30:20.251Z' \
--data-urlencode 'end=2025-12-29T09:30:20.251Z' \
--data-urlencode 'limit=100'

常见问题排查

  • 连接失败:确认 VictoriaLogs 服务已启动,HTTP 端口(默认 9428)可访问
  • 端口冲突:确保 19001 端口未被占用,或修改 topology/sources/wpsrc.toml 中的端口配置
  • 无数据入库:检查 data/logs/ 下的日志文件,确认 TCP 连接与解析是否正常
  • 查询无结果:确认 victorialogs_sinkendpointinsert_path 配置正确
  • 时间字段问题:如需使用时间序列功能,确认 create_time_field 配置为数据中存在的时间字段名

Benchmark Guide

The benchmark directory contains performance test cases based on benchmark/benchmark_common.sh. Test cases are organized by data source type, with each case containing different processing scenarios. This document explains the overall structure, common parameters, and the purpose of each test scenario.

Prerequisites

  1. All scripts run in release profile by default and depend on wparse/wpgen/wproj being in PATH (see the snippet below).
  2. Run scripts from the benchmark directory or specific test directory.
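
One way to satisfy the PATH requirement after a release build (the relative path is an assumption about your checkout layout):

export PATH="$PWD/target/release:$PATH"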

Directory Structure

benchmark/
├── benchmark_common.sh      # Common function library (parameter parsing, environment initialization)
├── check_run.sh            # Batch test script (runs all tests with -m parameter)
├── models/                 # Shared model files
│   ├── wpl/               # WPL rule sets (nginx, sysmon, apt, aws, etc.)
│   └── oml/               # OML transformation models
├── sinks/                 # Shared sink configurations
│   ├── parse_to_blackhole/
│   ├── parse_to_file/
│   ├── trans_to_blackhole/
│   └── trans_to_file/
├── case_tcp/              # TCP data source test scenarios
├── case_file/             # File data source test scenarios
├── case_syslog_tcp/       # Syslog TCP test scenarios
├── case_syslog_udp/       # Syslog UDP test scenarios
└── wpgen_test/            # wpgen performance test

Test Scenarios

Processing Modes:

  • parse: Parse mode - pure WPL parsing only
  • trans: Transform mode - WPL parsing + OML transformation

Output Targets:

  • blackhole: Discards data, tests pure parsing/forwarding performance
  • file: Outputs to files, tests complete processing pipeline

Test Case Matrix

| Case | Mode | Input | Output | Purpose |
|------|------|-------|--------|---------|
| case_tcp/parse_to_blackhole | Parse | TCP (daemon) | Blackhole | TCP reception + pure parsing performance |
| case_tcp/parse_to_file | Parse | TCP (daemon) | File | Full TCP-to-file parsing pipeline |
| case_tcp/trans_to_blackhole | Trans | TCP (daemon) | Blackhole | Parsing + OML transformation performance |
| case_tcp/trans_to_file | Trans | TCP (daemon) | File | Full transformation pipeline |
| case_file/parse_to_blackhole | Parse | File (batch) | Blackhole | Pure parsing throughput |
| case_file/parse_to_file | Parse | File (batch) | File | File-to-file parsing |
| case_file/trans_to_blackhole | Trans | File (batch) | Blackhole | Parsing + OML transformation throughput |
| case_file/trans_to_file | Trans | File (batch) | File | Full transformation pipeline |
| case_syslog_tcp/parse_to_blackhole | Parse | Syslog TCP | Blackhole | Syslog TCP pure parsing |
| case_syslog_tcp/trans_to_blackhole | Trans | Syslog TCP | Blackhole | Syslog TCP parsing + transformation |
| case_syslog_udp/parse_to_blackhole | Parse | Syslog UDP | Blackhole | Syslog UDP pure parsing |
| case_syslog_udp/trans_to_blackhole | Trans | Syslog UDP | Blackhole | Syslog UDP parsing + transformation |

Common Options

All test scripts share the following parameters (parsed by benchmark_parse_args in benchmark_common.sh):

| Parameter | Description | Default |
|-----------|-------------|---------|
| -m | Use medium dataset (200K lines) | 20M lines |
| -f | Force regenerate input data | Smart detection |
| -c <cnt> | Specify line count (overrides -m) | - |
| -w <cnt> | Specify wparse worker count | 6 (daemon) / 10 (batch) |
| wpl_dir | WPL rule set directory (positional) | nginx |
| speed | Generation rate limit (lines/sec, 0 = unlimited) | 0 |

Run ./run.sh -h to see supported options for a specific test script.

Quick Start

Run Single Test

cd benchmark

# Default config (nginx rules, large dataset)
./case_tcp/parse_to_blackhole/run.sh

# Medium dataset
./case_tcp/parse_to_blackhole/run.sh -m

# Use sysmon rules, 12 workers, 1M lines/sec rate limit
./case_tcp/parse_to_blackhole/run.sh -w 12 sysmon 1000000

Run All Tests

cd benchmark
./check_run.sh

FAQ

  • WPL model load failure: Check wpl path in wparse.toml is correct relative to test directory
  • Data generation failure: Check disk space and wpgen.toml configuration
  • Slow test execution: Use -m parameter to reduce data size, adjust -w for worker count

Benchmark 用例指南 (中文)

benchmark 目录收录了基于 benchmark/benchmark_common.sh 的性能测试用例。测试用例按数据源类型组织为多个 case 目录,每个 case 下包含不同的处理场景。本文档说明整体结构、通用参数与各测试场景的用途。

前置准备

  1. 所有脚本默认在 release profile 下运行,并依赖 wparse/wpgen/wproj,确保它们位于 PATH 中。
  2. 从 benchmark 目录或具体测试目录运行脚本。

目录结构

benchmark/
├── benchmark_common.sh      # 公共函数库(参数解析、环境初始化等)
├── check_run.sh            # 批量测试脚本(使用 -m 参数运行所有测试)
├── models/                 # 共享的模型文件
│   ├── wpl/               # WPL 规则集(nginx、sysmon、apt、aws 等)
│   └── oml/               # OML 转换模型
├── sinks/                 # 共享的 sink 配置
│   ├── parse_to_blackhole/
│   ├── parse_to_file/
│   ├── trans_to_blackhole/
│   └── trans_to_file/
├── case_tcp/              # TCP 数据源测试场景
├── case_file/             # File 数据源测试场景
├── case_syslog_tcp/       # Syslog TCP 测试场景
├── case_syslog_udp/       # Syslog UDP 测试场景
└── wpgen_test/            # wpgen 性能测试

测试场景说明

处理模式:

  • parse: 解析模式 - 纯 WPL 解析
  • trans: 转换模式 - WPL 解析 + OML 转换

输出目标:

  • blackhole: 黑洞输出 - 丢弃数据,用于测试纯解析/转发性能
  • file: 文件输出 - 输出到文件,测试完整的处理链路

测试用例矩阵

| 用例 | 模式 | 输入 | 输出 | 用途 |
|------|------|------|------|------|
| case_tcp/parse_to_blackhole | 解析 | TCP (daemon) | 黑洞 | TCP 接收 + 纯解析性能 |
| case_tcp/parse_to_file | 解析 | TCP (daemon) | 文件 | 完整 TCP 到文件解析管道 |
| case_tcp/trans_to_blackhole | 转换 | TCP (daemon) | 黑洞 | 解析 + OML 转换性能 |
| case_tcp/trans_to_file | 转换 | TCP (daemon) | 文件 | 完整转换管道 |
| case_file/parse_to_blackhole | 解析 | 文件 (batch) | 黑洞 | 纯解析吞吐量 |
| case_file/parse_to_file | 解析 | 文件 (batch) | 文件 | 文件到文件解析 |
| case_file/trans_to_blackhole | 转换 | 文件 (batch) | 黑洞 | 解析 + OML 转换吞吐量 |
| case_file/trans_to_file | 转换 | 文件 (batch) | 文件 | 完整转换管道 |
| case_syslog_tcp/parse_to_blackhole | 解析 | Syslog TCP | 黑洞 | Syslog TCP 纯解析 |
| case_syslog_tcp/trans_to_blackhole | 转换 | Syslog TCP | 黑洞 | Syslog TCP 解析 + 转换 |
| case_syslog_udp/parse_to_blackhole | 解析 | Syslog UDP | 黑洞 | Syslog UDP 纯解析 |
| case_syslog_udp/trans_to_blackhole | 转换 | Syslog UDP | 黑洞 | Syslog UDP 解析 + 转换 |

通用选项

所有测试脚本共享以下参数(由 benchmark_common.sh 中的 benchmark_parse_args 解析):

| 参数 | 说明 | 默认值 |
|------|------|--------|
| -m | 使用中等规模数据集(20 万行) | 2000 万行 |
| -f | 强制重新生成输入数据 | 智能检测 |
| -c <cnt> | 指定数据条数(与 -m 互斥,优先级更高) | - |
| -w <cnt> | 指定 wparse worker 数量 | 6 (daemon) / 10 (batch) |
| wpl_dir | WPL 规则集目录(位置参数) | nginx |
| speed | 样本生成限速(行/秒,0 表示不限速) | 0 |

执行 ./run.sh -h 可查看某个测试脚本支持的选项组合。

快速开始

运行单个测试

cd benchmark

# 使用默认配置(nginx 规则,大规模数据集)
./case_tcp/parse_to_blackhole/run.sh

# 使用中等规模数据集
./case_tcp/parse_to_blackhole/run.sh -m

# 使用 sysmon 规则,12 个 worker,限速 1M 行/秒
./case_tcp/parse_to_blackhole/run.sh -w 12 sysmon 1000000

批量测试所有场景

cd benchmark
./check_run.sh

常见问题

  • WPL 模型加载失败:检查 wparse.toml 中的 wpl 路径是否相对于测试目录正确
  • 数据生成失败:检查磁盘空间是否充足,确认 wpgen.toml 配置正确
  • 测试运行缓慢:使用 -m 参数减小数据规模,调整 -w 参数优化 worker 数

WarpParse、Vector、Logstash 性能基准测试报告

1. 技术概述与测试背景

1.1 测试背景

本报告记录在 Linux 平台完成的单机基准测试结果,覆盖从轻量级 Web 日志到复杂安全威胁日志的典型场景,用于形成阶段性 benchmark 基线,便于后续版本或方案之间的横向与纵向对比。本文仅描述测试方法与结果,不对生产环境性能上限作外推。

1.2 被测对象

  • WarpParse: 大禹安全公司研发的 ETL 核心引擎,采用 Rust 构建。
  • Vector: 开源可观测性数据管道工具,采用 Rust 构建。
    • Vector-VRL:基于 VRL 的 parse_regex 进行正则解析。
    • Vector-Fixed:尽量使用内置解析(如 nginx/aws 内置函数;sysmon 直接 JSON 解析;APT 无专用手段仍使用正则)。
  • Logstash: Elastic 生态的日志处理引擎,采用 JVM 运行时。

1.3 测试对象与版本说明

本次测试使用版本如下:

  • WarpParse:0.12.0
  • Vector:0.49.0
  • Logstash:9.2.3

构建与来源信息:

  • WarpParse:构建来源/commit/tag = GitHub tag v0.12.0-alpha (commit: 2ba6e55);构建参数 = 官方 release 构建产物(zip/tar.gz),未修改构建选项
  • Vector:构建来源/commit/tag = v0.49.0 (commit: dc7e792);构建参数 = 官方发布的 release 二进制,未修改构建选项
  • Logstash:构建来源/commit/tag = GitHub tag v9.2.3 (commit: 4eb0f3f);构建参数 = 官方发行包(zip / tar.gz,bundled JDK),未进行源码级构建

本报告已记录版本与构建来源;复现时仍需确保引擎运行参数、系统配置与数据集参数一致。

1.4 报告定位

本文档定位为阶段性 benchmark 报告,侧重方法与数据的可复现性与长期可比性,不作为最终性能结论或生产容量承诺。

2. 测试环境与方法

2.1 测试环境(Test Environment)

平台信息(Platform)

  • 平台类型:AWS EC2
  • 实例规格(Instance Type):c5a.2xlarge
  • 操作系统:Ubuntu 24.04 LTS
  • 系统架构:x86_64
  • 网络环境:本机回环(127.0.0.1,Loopback)

计算资源(Compute)

  • CPU:8 vCPU
  • CPU 型号:AMD EPYC 7R32
  • 内存:16 GiB

存储配置(Storage)

  • 存储类型:Amazon EBS
  • 卷类型:通用型 SSD(gp3)
  • 卷大小:128 GiB
  • IOPS:30,000
  • 吞吐量:200 MiB/s

说明(Notes)

  • gp3 卷支持 IOPS 与吞吐量独立配置,用于避免容量与性能强绑定
  • 当前配置提供较高的随机 I/O 能力(IOPS),并具备中等顺序 I/O 吞吐能力
  • 网络带宽/网卡能力:
    • 本报告中的 TCP 场景均基于本机 loopback(127.0.0.1)进行数据发送与接收;
    • 测试流量不经过物理网卡或云实例网络链路,不受实例网络带宽或 ENI 性能限制;
    • TCP 场景主要反映内核 TCP 协议栈开销与引擎自身的解析、调度与 I/O 处理能力。

2.2 测试范畴 (Scope)

  • 日志类型:
    • Nginx Access Log (239B): 典型 Web 访问日志,高吞吐场景。
    • AWS ELB Log (411B): 云设施负载均衡日志,中等复杂度。
    • Firewall Log (1K): 终端安全监控日志,JSON 结构,字段较多。
    • APT Threat Log (3K): 模拟的高级持续性威胁日志,大体积、长文本。
    • Mixed Log (886B): 上述四类日志混合形成的日志类型。
  • 数据拓扑:
    • File -> BlackHole: 测算引擎极限 I/O 读取与处理能力 (基准)。
    • TCP -> BlackHole: 测算网络接收与处理能力。
    • TCP -> File: 测算端到端完整落地能力。
  • 测试能力:
    • 解析 (Parse): 仅进行正则提取/KV解析与字段标准化。
    • 解析+转换 (Parse+Transform): 在解析基础上增加字段映射、富化、类型转换等逻辑。

2.3 评估指标

  • EPS (Events Per Second): 每秒处理事件数(核心吞吐指标)。
  • MPS (MiB/s): 每秒处理数据量。
  • CPU (Avg/Peak): 测试进程 CPU 使用率的平均值与峰值。
  • MEM (Avg/Peak): 测试进程内存占用的平均值与峰值。
  • Rule Size: 规则配置文件体积,评估分发与维护成本。
  • 性能倍数: 在同一日志类型 + 同一拓扑下,以 Vector-VRL 的 EPS 为 1.0x 进行归一化。

说明:

  • CPU 为多核累计百分比(例如 800% ≈ 8 个逻辑核满载),统计对象为测试进程本身(非系统总 CPU),由外部监控脚本按固定采样周期采集并计算 Avg/Peak。
  • MPS 换算公式:MPS = EPS × AvgLogSize(B) / 1024 / 1024(换算示例见本节末尾)
  • 采样来源与采样口径说明:
    • EPS:统一基于各引擎原生可观测性或统计接口获取。
      • WarpParse / Vector:使用引擎内置的吞吐统计能力。
      • Logstash:通过自动化脚本定期采集其官方 Monitoring API / 运行时统计信息。
    • CPU / MEM:通过外部监控脚本采集测试进程的资源使用情况(基于 shell 的周期性采样),用于跨引擎对比。
    • MPS:基于测得的 EPS 与对应日志的平均大小进行换算计算,用于辅助衡量实际数据吞吐规模。
    • 规则大小统计前对配置进行了统一去注释/去空行处理,仅保留有效表达部分,降低格式差异影响。
    • 各指标在不同引擎中的采集实现方式可能不同,但统计口径保持一致,结果以各指标最权威来源为准。
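
以表 3.1.1-1 中 WarpParse(File -> BlackHole)为例验算 MPS 换算:EPS = 810,100,Nginx 日志平均 239 B,则 MPS = 810,100 × 239 / 1,048,576 ≈ 184.65 MiB/s,与表中数值一致。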

2.4 测试方法与执行方式

测试在单机环境中按日志类型与拓扑逐项执行。输入数据由本仓库提供的 benchmark 脚本生成或回放, 测试过程中各引擎独立运行,避免相互干扰。 输出目标根据测试拓扑配置为 BlackHole 或 File,以分别评估纯处理能力与包含 I/O 的端到端性能。

测试执行流程、脚本入口及通用参数说明见 benchmark/README.md。

2.4.1 最小复现清单(Minimal Repro Checklist)

  • 引擎版本与来源:

    • WarpParse / Vector / Logstash 的版本、tag、commit 及构建方式见 1.3。
  • Benchmark 工具链版本:

    • benchmark 仓库以 wp-examples 仓库的最新提交(repo HEAD)为准。
    • 复现实验时建议记录具体 commit hash 以保证结果可追溯。
  • 数据规模与事件数量:

    • 本报告中“数据集规模”与“事件数量”为同一概念,均以处理的事件总数作为规模定义。
    • 在 WarpParse 的 benchmark 执行脚本中,通过参数 -c 指定事件总数; 该参数用于明确数据集规模,但并非要求所有引擎具备相同参数形式。
    • 对于 Vector 与 Logstash,测试数据集规模与 WarpParse 使用相同的事件数量, 通过等量输入数据实现规模对齐,而非依赖统一的启动参数。
    • 因此,-c 可视为本 benchmark 中“统一事件规模定义”的符号化表示, 而非跨引擎通用的命令行参数。
  • 结束条件:

    • 所有测试均以处理完成等量事件作为结束条件。
    • 不采用按固定运行时长结束的方式, 以避免不同引擎在启动、预热与稳定阶段差异带来的统计偏差。
  • Warmup 与采样窗口:

    • WarpParse 与 Vector:引擎启动后快速进入稳定状态,未单独区分 warmup 阶段。

    • Logstash:由于 JVM/JIT 与 pipeline 初始化特性,测试前先进行 warmup 运行; 在确认吞吐进入稳定区间后,开始采集 EPS / 资源指标。

    • 重复次数与取值规则:

      • 默认单次运行。
      • 如需更严格统计,建议重复 N=3 次并取 median 作为最终结果。

2.5 默认配置与调优说明

除非表格或备注中明确说明,本报告结果基于各引擎默认配置,未开启专项性能调优或非默认参数。

3. 详细吞吐量性能对比分析

3.0 测试结果汇总表

下表为结果索引,用于定位不同日志类型与测试能力的明细表格。

| 日志类型 | 解析(Parse Only) | 解析 + 转换(Parse + Transform) |
|----------|--------------------|----------------------------------|
| Nginx Access Log (239B) | 见 3.1.1 | 见 3.2.1 |
| AWS ELB Log (411B) | 见 3.1.2 | 见 3.2.2 |
| Firewall Log (1K) | 见 3.1.3 | 见 3.2.3 |
| APT Threat Log (3K) | 见 3.1.4 | 见 3.2.4 |
| Mixed Log (平均日志大小:867B) | 见 3.1.5 | 见 3.2.5 |

3.1 日志解析能力 (Parse Only)

本节给出纯解析场景的测试结果。

3.1.1 Nginx Access Log (239B)

表 3.1.1-1:Nginx Access Log(Parse Only;File -> BlackHole / TCP -> BlackHole / TCP -> File)

| 引擎 | 拓扑 | EPS | MPS (MiB/s) | CPU (Avg/Peak) | MEM (Avg/Peak) | 性能倍数 |
|------|------|-----|-------------|----------------|----------------|----------|
| WarpParse | File -> BlackHole | 810,100 | 184.65 | 626% / 639% | 115 MB / 314 MB | 3.83x |
| Vector-VRL | File -> BlackHole | 211,250 | 48.15 | 292% / 305% | 148 MB / 153 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 170,666 | 38.90 | 431% / 451% | 141 MB / 151 MB | 0.81x |
| Logstash | File -> BlackHole | 106,382 | 24.25 | 436% / 461% | 1144 MB / 1175 MB | 0.50x |
| WarpParse | TCP -> BlackHole | 765,800 | 174.55 | 574% / 628% | 245 MB / 366 MB | 1.56x |
| Vector-VRL | TCP -> BlackHole | 492,200 | 112.19 | 501% / 510% | 155 MB / 159 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 255,500 | 58.24 | 480% / 533% | 138 MB / 145 MB | 0.52x |
| Logstash | TCP -> BlackHole | 161,290 | 36.76 | 462% / 475% | 1174 MB / 1224 MB | 0.33x |
| WarpParse | TCP -> File | 377,600 | 86.07 | 645% / 673% | 221 MB / 444 MB | 20.30x |
| Vector-VRL | TCP -> File | 18,600 | 4.24 | 133% / 135% | 122 MB / 126 MB | 1.0x |
| Vector-Fixed | TCP -> File | 17,300 | 3.94 | 148% / 156% | 115 MB / 119 MB | 0.93x |
| Logstash | TCP -> File | 147,058 | 33.52 | 465% / 476% | 1148 MB / 1186 MB | 7.91x |

解析规则大小:

  • WarpParse:174B
  • Vector-VRL:217B
  • Vector-Fixed:86B
  • Logstash:248B

在同一日志类型 + 同一拓扑下,以 Vector-VRL 的 EPS 作为统一基准(1.0x),对所有引擎进行归一化对比。

3.1.2 AWS ELB Log (411B)

表 3.1.2-1:AWS ELB Log(Parse Only;File -> BlackHole / TCP -> BlackHole / TCP -> File)

| 引擎 | 拓扑 | EPS | MPS (MiB/s) | CPU (Avg/Peak) | MEM (Avg/Peak) | 性能倍数 |
|------|------|-----|-------------|----------------|----------------|----------|
| WarpParse | File -> BlackHole | 398,800 | 156.31 | 698% / 756% | 194 MB / 366 MB | 2.82x |
| Vector-VRL | File -> BlackHole | 141,600 | 55.50 | 423% / 437% | 166 MB / 170 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 161,944 | 63.47 | 496% / 515% | 174 MB / 179 MB | 1.14x |
| Logstash | File -> BlackHole | 87,719 | 34.38 | 514% / 532% | 1145 MB / 1170 MB | 0.62x |
| WarpParse | TCP -> BlackHole | 369,900 | 144.98 | 669% / 724% | 178 MB / 461 MB | 2.49x |
| Vector-VRL | TCP -> BlackHole | 148,400 | 58.16 | 456% / 486% | 178 MB / 185 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 176,600 | 69.22 | 417% / 435% | 169 MB / 176 MB | 1.19x |
| Logstash | TCP -> BlackHole | 125,000 | 49.00 | 557% / 625% | 1181 MB / 1217 MB | 0.84x |
| WarpParse | TCP -> File | 169,900 | 66.59 | 686% / 699% | 191 MB / 251 MB | 9.71x |
| Vector-VRL | TCP -> File | 17,500 | 6.86 | 169% / 176% | 166 MB / 171 MB | 1.0x |
| Vector-Fixed | TCP -> File | 16,600 | 6.51 | 159% / 171% | 157 MB / 164 MB | 0.95x |
| Logstash | TCP -> File | 121,951 | 47.80 | 559% / 621% | 1283 MB / 1359 MB | 6.97x |

解析规则大小:

  • WarpParse:1153B
  • Vector-VRL:921B
  • Vector-Fixed:64B
  • Logstash:876B

在同一日志类型 + 同一拓扑下,以 Vector-VRL 的 EPS 作为统一基准(1.0x),对所有引擎进行归一化对比。

3.1.3 Firewall Log (1K)

表 3.1.3-1: Firewall Log(Parse Only;File -> BlackHole / TCP -> BlackHole / TCP -> File)

| 引擎 | 拓扑 | EPS | MPS (MiB/s) | CPU (Avg/Peak) | MEM (Avg/Peak) | 性能倍数 |
|------|------|-----|-------------|----------------|----------------|----------|
| WarpParse | File -> BlackHole | 163,700 | 175.16 | 672% / 761% | 272 MB / 441 MB | 2.88x |
| Vector | File -> BlackHole | 56,760 | 60.73 | 513% / 693% | 176 MB / 204 MB | 1.00x |
| Logstash | File -> BlackHole | 17,391 | 18.61 | 675% / 724% | 1201 MB / 1228 MB | 0.31x |
| WarpParse | TCP -> BlackHole | 154,900 | 165.75 | 665% / 735% | 128 MB / 353 MB | 2.38x |
| Vector | TCP -> BlackHole | 65,200 | 69.77 | 648% / 768% | 240 MB / 253 MB | 1.00x |
| Logstash | TCP -> BlackHole | 19,157 | 20.50 | 722% / 745% | 1283 MB / 1298 MB | 0.29x |
| WarpParse | TCP -> File | 99,700 | 106.68 | 659% / 746% | 89 MB / 280 MB | 5.09x |
| Vector | TCP -> File | 19,600 | 20.97 | 293% / 328% | 243 MB / 253 MB | 1.00x |
| Logstash | TCP -> File | 18,018 | 19.28 | 654% / 709% | 1292 MB / 1364 MB | 0.92x |

解析规则大小:

  • WarpParse:1552B
  • Vector-Fixed:1852B
  • Logstash:2406B

在同一日志类型 + 同一拓扑下,以 Vector 的 EPS 作为统一基准(1.0x),对所有引擎进行归一化对比。

3.1.4 APT Threat Log (3K)

表 3.1.4-1:APT Threat Log(Parse Only;File -> BlackHole / TCP -> BlackHole / TCP -> File)

| 引擎 | 拓扑 | EPS | MPS (MiB/s) | CPU (Avg/Peak) | MEM (Avg/Peak) | 性能倍数 |
|------|------|-----|-------------|----------------|----------------|----------|
| WarpParse | File -> BlackHole | 129,700 | 438.71 | 535% / 543% | 273 MB / 295 MB | 7.67x |
| Vector | File -> BlackHole | 16,901 | 57.17 | 692% / 730% | 175 MB / 180 MB | 1.0x |
| Logstash | File -> BlackHole | 9,009 | 30.47 | 684% / 736% | 1211 MB / 1229 MB | 0.53x |
| WarpParse | TCP -> BlackHole | 129,600 | 438.37 | 499% / 558% | 265 MB / 389 MB | 6.86x |
| Vector | TCP -> BlackHole | 18,900 | 63.93 | 774% / 794% | 229 MB / 243 MB | 1.0x |
| Logstash | TCP -> BlackHole | 10,183 | 34.45 | 733% / 757% | 1294 MB / 1308 MB | 0.54x |
| WarpParse | TCP -> File | 55,000 | 186.04 | 362% / 368% | 197 MB / 224 MB | 5.91x |
| Vector | TCP -> File | 9,300 | 31.46 | 412% / 450% | 211 MB / 218 MB | 1.0x |
| Logstash | TCP -> File | 8,928 | 30.20 | 672% / 726% | 1305 MB / 1369 MB | 0.96x |

解析规则大小:

  • WarpParse:985B
  • Vector:873B
  • Logstash:1027B

在同一日志类型 + 同一拓扑下,以 Vector 的 EPS 作为统一基准(1.0x),对所有引擎进行归一化对比。

3.1.5 Mixed Log (平均日志大小:886B)

表 3.1.5-1:Mixed Log(Parse Only;File -> BlackHole / TCP -> BlackHole / TCP -> File)

| 引擎 | 拓扑 | EPS | MPS (MiB/s) | CPU (Avg/Peak) | MEM (Avg/Peak) | 性能倍数 |
|------|------|-----|-------------|----------------|----------------|----------|
| WarpParse | File -> BlackHole | 286,000 | 241.66 | 632% / 736% | 271 MB / 374 MB | 5.56x |
| Vector-VRL | File -> BlackHole | 51,446 | 43.47 | 494% / 692% | 228 MB / 249 MB | 1.00x |
| Vector-Fixed | File -> BlackHole | 52,530 | 44.39 | 500% / 696% | 182 MB / 201 MB | 1.02x |
| Logstash | File -> BlackHole | 21,505 | 18.17 | 400% / 444% | 1136 MB / 1163 MB | 0.42x |
| WarpParse | TCP -> BlackHole | 258,600 | 218.51 | 532% / 709% | 189 MB / 483 MB | 3.10x |
| Vector-VRL | TCP -> BlackHole | 83,300 | 70.38 | 516% / 781% | 208 MB / 222 MB | 1.00x |
| Vector-Fixed | TCP -> BlackHole | 81,700 | 69.03 | 518% / 784% | 181 MB / 191 MB | 0.98x |
| Logstash | TCP -> BlackHole | 35,087 | 29.65 | 629% / 697% | 1222 MB / 1282 MB | 0.42x |
| WarpParse | TCP -> File | 149,400 | 126.24 | 546% / 623% | 111 MB / 221 MB | 7.82x |
| Vector-VRL | TCP -> File | 19,100 | 16.14 | 315% / 332% | 275 MB / 287 MB | 1.00x |
| Vector-Fixed | TCP -> File | 19,200 | 16.22 | 276% / 293% | 190 MB / 195 MB | 1.01x |
| Logstash | TCP -> File | 32,786 | 27.70 | 593% / 670% | 1317 MB / 1428 MB | 1.72x |

解析规则大小:

  • WarpParse:3864B
  • Vector-VRL:3960B
  • Vector-Fixed:4725B
  • Logstash:5396B

在同一日志类型 + 同一拓扑下,以 Vector-VRL 的 EPS 作为统一基准(1.0x),对所有引擎进行归一化对比。

规则大小可能受格式/换行/注释/路径等影响,体积差异不影响性能口径;规则逻辑保持一致。

混合日志规则:

  • 4类日志按照3:2:1:1混合

3.2 解析 + 转换能力 (Parse + Transform)

本节给出解析 + 转换场景的测试结果。

3.2.1 Nginx Access Log(239B)

表 3.2.1-1:Nginx Access Log(Parse + Transform;File -> BlackHole / TCP -> BlackHole / TCP -> File)

| 引擎 | 拓扑 | EPS | MPS (MiB/s) | CPU (Avg/Peak) | MEM (Avg/Peak) | 性能倍数 |
|------|------|-----|-------------|----------------|----------------|----------|
| WarpParse | File -> BlackHole | 656,800 | 149.71 | 688% / 768% | 220 MB / 357 MB | 3.27x |
| Vector-VRL | File -> BlackHole | 201,000 | 45.81 | 339% / 350% | 167 MB / 175 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 153,333 | 34.95 | 466% / 481% | 159 MB / 168 MB | 0.76x |
| Logstash | File -> BlackHole | 76,923 | 17.53 | 470% / 483% | 1126 MB / 1160 MB | 0.38x |
| WarpParse | TCP -> BlackHole | 524,800 | 119.62 | 608% / 637% | 189 MB / 410 MB | 1.34x |
| Vector-VRL | TCP -> BlackHole | 392,200 | 89.39 | 472% / 512% | 162 MB / 166 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 208,900 | 47.61 | 502% / 537% | 146 MB / 151 MB | 0.53x |
| Logstash | TCP -> BlackHole | 107,142 | 24.42 | 520% / 552% | 1163 MB / 1243 MB | 0.27x |
| WarpParse | TCP -> File | 297,100 | 67.72 | 645% / 664% | 238 MB / 317 MB | 17.90x |
| Vector-VRL | TCP -> File | 16,600 | 3.78 | 138% / 143% | 138 MB / 143 MB | 1.0x |
| Vector-Fixed | TCP -> File | 17,200 | 3.92 | 156% / 166% | 128 MB / 133 MB | 1.04x |
| Logstash | TCP -> File | 95,238 | 21.71 | 510% / 551% | 1141 MB / 1217 MB | 5.74x |

解析+转换规则大小:

  • WarpParse:521B
  • Vector-VRL:519B
  • Vector-Fixed:500B
  • Logstash:712B

在同一日志类型 + 同一拓扑下,以 Vector-VRL 的 EPS 作为统一基准(1.0x),对所有引擎进行归一化对比。

3.2.2 AWS ELB Log(411B)

表 3.2.2-1:AWS ELB Log(Parse + Transform;File -> BlackHole / TCP -> BlackHole / TCP -> File)

| 引擎 | 拓扑 | EPS | MPS (MiB/s) | CPU (Avg/Peak) | MEM (Avg/Peak) | 性能倍数 |
|------|------|-----|-------------|----------------|----------------|----------|
| WarpParse | File -> BlackHole | 275,900 | 108.14 | 649% / 719% | 236 MB / 327 MB | 2.22x |
| Vector-VRL | File -> BlackHole | 124,333 | 48.73 | 523% / 560% | 190 MB / 199 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 141,818 | 55.59 | 514% / 529% | 179 MB / 191 MB | 1.14x |
| Logstash | File -> BlackHole | 54,054 | 21.19 | 582% / 653% | 1155 MB / 1217 MB | 0.43x |
| WarpParse | TCP -> BlackHole | 259,900 | 101.87 | 682% / 697% | 139 MB / 275 MB | 1.99x |
| Vector-VRL | TCP -> BlackHole | 130,600 | 51.19 | 446% / 500% | 191 MB / 195 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 146,000 | 57.23 | 413% / 441% | 181 MB / 184 MB | 1.12x |
| Logstash | TCP -> BlackHole | 78,125 | 30.62 | 624% / 696% | 1212 MB / 1272 MB | 0.60x |
| WarpParse | TCP -> File | 139,800 | 54.80 | 717% / 738% | 139 MB / 296 MB | 7.99x |
| Vector-VRL | TCP -> File | 17,500 | 6.86 | 177% / 194% | 181 MB / 187 MB | 1.0x |
| Vector-Fixed | TCP -> File | 17,600 | 6.90 | 164% / 182% | 173 MB / 180 MB | 1.01x |
| Logstash | TCP -> File | 69,444 | 27.22 | 636% / 690% | 1192 MB / 1232 MB | 3.97x |

解析+转换规则大小:

  • WarpParse:1694B
  • Vector-VRL:1259B
  • Vector-Fixed:570B
  • Logstash:2019B

在同一日志类型 + 同一拓扑下,以 Vector-VRL 的 EPS 作为统一基准(1.0x),对所有引擎进行归一化对比。

3.2.3 Firewall Log (1K)

表 3.2.3-1:Firewall Log(Parse + Transform;File -> BlackHole / TCP -> BlackHole / TCP -> File)

| 引擎 | 拓扑 | EPS | MPS (MiB/s) | CPU (Avg/Peak) | MEM (Avg/Peak) | 性能倍数 |
|------|------|-----|-------------|----------------|----------------|----------|
| WarpParse | File -> BlackHole | 132,500 | 141.78 | 693% / 786% | 348 MB / 430 MB | 2.49x |
| Vector | File -> BlackHole | 53,258 | 56.99 | 482% / 692% | 198 MB / 225 MB | 1.00x |
| Logstash | File -> BlackHole | 15,873 | 16.98 | 579% / 680% | 1111 MB / 1127 MB | 0.30x |
| WarpParse | TCP -> BlackHole | 125,500 | 134.29 | 614% / 747% | 130 MB / 304 MB | 2.22x |
| Vector | TCP -> BlackHole | 56,600 | 60.56 | 630% / 702% | 257 MB / 272 MB | 1.00x |
| Logstash | TCP -> BlackHole | 16,528 | 17.69 | 664% / 720% | 1228 MB / 1262 MB | 0.29x |
| WarpParse | TCP -> File | 69,800 | 74.69 | 620% / 760% | 129 MB / 175 MB | 3.31x |
| Vector | TCP -> File | 21,100 | 22.58 | 315% / 332% | 275 MB / 287 MB | 1.00x |
| Logstash | TCP -> File | 16,393 | 17.54 | 660% / 716% | 1210 MB / 1236 MB | 0.78x |

解析+转换规则大小:

  • WarpParse:2249B
  • Vector:2344B
  • Logstash:3453B

在同一日志类型 + 同一拓扑下,以 Vector 的 EPS 作为统一基准(1.0x),对所有引擎进行归一化对比。

3.2.4 APT Threat Log (3K)

表 3.2.4-1:APT Threat Log(Parse + Transform;File -> BlackHole / TCP -> BlackHole / TCP -> File)

| 引擎 | 拓扑 | EPS | MPS (MiB/s) | CPU (Avg/Peak) | MEM (Avg/Peak) | 性能倍数 |
|------|------|-----|-------------|----------------|----------------|----------|
| WarpParse | File -> BlackHole | 123,100 | 416.38 | 599% / 607% | 199 MB / 265 MB | 7.65x |
| Vector | File -> BlackHole | 16,093 | 54.43 | 674% / 742% | 188 MB / 199 MB | 1.0x |
| Logstash | File -> BlackHole | 7,633 | 25.82 | 657% / 732% | 1174 MB / 1197 MB | 0.47x |
| WarpParse | TCP -> BlackHole | 114,200 | 386.28 | 508% / 532% | 228 MB / 248 MB | 6.14x |
| Vector | TCP -> BlackHole | 18,600 | 62.91 | 769% / 790% | 243 MB / 252 MB | 1.0x |
| Logstash | TCP -> BlackHole | 9,852 | 33.33 | 704% / 748% | 1283 MB / 1304 MB | 0.53x |
| WarpParse | TCP -> File | 54,800 | 185.36 | 441% / 447% | 196 MB / 215 MB | 5.89x |
| Vector-VRL | TCP -> File | 9,300 | 31.46 | 345% / 479% | 217 MB / 227 MB | 1.0x |
| Logstash | TCP -> File | 8,620 | 29.16 | 671% / 729% | 1229 MB / 1251 MB | 0.93x |

Parse + transform rule sizes:

  • WarpParse: 1638B
  • Vector-VRL: 1382B
  • Logstash: 2041B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

3.2.5 Mixed Log (average log size: 886B)

Table 3.2.5-1: Mixed Log (Parse + Transform; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 204,400 | 172.71 | 566% / 663% | 196 MB / 265 MB | 4.45x |
| Vector-VRL | File -> BlackHole | 45,909 | 38.79 | 469% / 683% | 204 MB / 225 MB | 1.00x |
| Vector-Fixed | File -> BlackHole | 48,484 | 40.97 | 541% / 714% | 178 MB / 209 MB | 1.06x |
| Logstash | File -> BlackHole | 32,967 | 27.86 | 573% / 685% | 1150 MB / 1172 MB | 0.72x |
| WarpParse | TCP -> BlackHole | 190,300 | 160.80 | 603% / 623% | 119 MB / 190 MB | 2.40x |
| Vector-VRL | TCP -> BlackHole | 79,400 | 67.09 | 776% / 782% | 204 MB / 211 MB | 1.00x |
| Vector-Fixed | TCP -> BlackHole | 76,500 | 64.64 | 776% / 781% | 190 MB / 203 MB | 0.96x |
| Logstash | TCP -> BlackHole | 30,303 | 25.60 | 656% / 719% | 1258 MB / 1287 MB | 0.38x |
| WarpParse | TCP -> File | 120,100 | 101.48 | 648% / 727% | 121 MB / 183 MB | 6.19x |
| Vector-VRL | TCP -> File | 19,400 | 16.39 | 268% / 300% | 201 MB / 216 MB | 1.00x |
| Vector-Fixed | TCP -> File | 20,000 | 16.90 | 273% / 303% | 195 MB / 207 MB | 1.03x |
| Logstash | TCP -> File | 26,666 | 22.53 | 612% / 689% | 1218 MB / 1253 MB | 1.37x |

Parse + transform rule sizes:

  • WarpParse: 3864B
  • Vector-VRL: 4723B
  • Vector-Fixed: 1733B
  • Logstash: 3984B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

Rule size can be affected by formatting, line breaks, comments, paths, and similar factors; size differences do not affect the performance methodology, and the rule logic is kept semantically identical.

Mixed log rules:

  • The 4 log types are mixed at a 3:2:1:1 ratio

4 Fixed-Rate Resource Usage Test

4.1 Mixed Log (average log size: 867B)

Table 4.1.1-1: Mixed Log (Parse Only; TCP -> BlackHole)

| Engine | Topology | CPU (Avg/Peak) | MEM (Avg/Peak) |
|---|---|---|---|
| WarpParse | TCP -> BlackHole | 54% / 56% | 60 MB / 66 MB |
| Vector-VRL | TCP -> BlackHole | 173% / 180% | 162 MB / 166 MB |
| Vector-Fixed | TCP -> BlackHole | 171% / 177% | 128 MB / 134 MB |
| Logstash | TCP -> BlackHole | 276% / 396% | 1190 MB / 1223 MB |

  • Resource consumption at a fixed 20,000 EPS input rate (a rate-pacing sketch follows)
  • Logstash metrics were collected after warmup
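
The load driver is not included in this report; purely as an illustration (hypothetical helper, port, and tick size), a fixed-rate TCP sender can be approximated by pacing sends in small ticks:

# Minimal fixed-rate TCP sender sketch (illustrative; not the benchmark's actual driver).
import socket
import time

def send_fixed_rate(lines, host="127.0.0.1", port=9000, eps=20_000, seconds=60):
    """Push `eps` events/second to an engine's TCP source, paced in 10 ms ticks."""
    batch = max(1, eps // 100)                  # events per 10 ms tick
    with socket.create_connection((host, port)) as sock:
        deadline = time.monotonic() + seconds
        i = 0
        while time.monotonic() < deadline:
            tick_end = time.monotonic() + 0.01
            for _ in range(batch):
                sock.sendall(lines[i % len(lines)].encode() + b"\n")
                i += 1
            rest = tick_end - time.monotonic()  # sleep off the remainder of the tick
            if rest > 0:
                time.sleep(rest)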

5. Interpreting the Results

5.1 Throughput and Resource Behavior

Summary of results:

  1. In the Linux-platform tests, WarpParse's EPS multiples relative to Vector-VRL range from 1.56x to 20.30x for parse-only and 1.34x to 17.90x for parse + transform; the TCP -> File topology shows the higher end of the range.
  2. CPU utilization in the WarpParse runs is generally higher than Vector/Logstash (see the tables); the throughput gains come together with higher CPU usage.
  3. In the APT (3K) scenario, WarpParse sustains a high MPS, while Vector's EPS/MPS in the same scenario is comparatively lower (see 3.1.4/3.2.4).
5.2 Rules and Expressiveness Highlights

  • Rule size reflects not only configuration distribution and maintenance cost; it can also serve as a reference for how much descriptive complexity an engine needs to express the same log semantics. For identical parse and transform semantics, a smaller rule size usually indicates higher-level built-in capabilities or stronger expressive abstractions.

  • Rule-size differences per log type and topology are listed in the "rule sizes" notes under each table, as an aid for assessing expressiveness, rule readability, and maintenance complexity across engines.

  • The Vector tests cover two strategies, VRL and Fixed:

    • VRL leans toward general-purpose expressiveness and is more flexible for complex semantics;
    • Fixed prefers built-in parsing capabilities and has the edge in rule size and maintenance complexity. The expressiveness/performance trade-off between the two is as reflected in the table data.
  • Under most log types, the TCP -> File topology shows a higher range of performance multiples (see the corresponding tables in 3.1 / 3.2); this observation holds across rule-complexity levels.

5.3 Stability

  • This report does not include dedicated back-pressure or queue-depth metrics; stability is judged only from throughput and resource observations during the runs.
  • Caveat: in large-event TCP -> File scenarios (e.g. APT), memory rises with throughput (roughly 224-389 MB); factor this into capacity planning.

6. Interim Summary and Recommendations

The following are interim observations limited to the scope of this report, not production selection conclusions; actual adoption should weigh business traffic, architectural constraints, and operational capabilities.

| Decision dimension | Suggested option | What to focus on | Supporting basis |
|---|---|---|---|
| Throughput-first | WarpParse | The EPS multiple ranges in this report | Parse 1.56x-20.30x, parse + transform 1.34x-17.90x; higher in the TCP -> File topology. |
| Resource-constrained environments | WarpParse | The CPU/memory trade-off | Vector-VRL's CPU/MEM is lower than WarpParse's in most scenarios; Logstash memory usage is markedly higher (see tables). |
| Edge/agent deployment | WarpParse | Rule size and single-node throughput | Rule sizes vary across log types; throughput is higher in this report; see each section's "rule sizes" notes and table data. |
| General ecosystem compatibility | WarpParse | Ecosystem and extensibility | Ecosystem compatibility is not quantified in this report; evaluate against your existing ecosystem and plugin adaptation cost. |

Interim conclusion: based on the data in this report, WarpParse's EPS multiples vs Vector-VRL are 1.56x-20.30x for pure parsing and 1.34x-17.90x for parse + transform, with higher multiples end-to-end (TCP -> File). These results can serve as an interim baseline for comparable scenarios; in large-event TCP -> File scenarios, watch memory growth with throughput (roughly 224-389 MB).

7. Known Limitations and Caveats

  • This is a single-node test; multi-node setups, HA (High Availability), persistence tuning, and production load variability are not covered.
  • The scope is limited to five log types and three topologies; more complex input/output chains are not covered.
  • Results depend on the specific hardware, OS, and storage configuration; compare across environments with caution.

WarpParse, Vector, Logstash Performance Benchmark Report

1. Technical Overview and Test Background

1.1 Test Background

This report documents single-node benchmark results completed on the Mac platform, covering typical scenarios from lightweight web logs to complex security threat logs. It establishes an interim benchmark baseline for later horizontal and vertical comparisons across versions or approaches. This document only describes the test method and results; it does not extrapolate production performance ceilings.

1.2 Systems Under Test

  • WarpParse: the ETL core engine developed by 大禹安全, built in Rust.
  • Vector: an open-source observability data pipeline, built in Rust.
    • Vector-VRL: regex parsing via VRL parse_regex (Firewall uses KV parsing; APT still uses regex parsing).
    • Vector-Fixed: prefers built-in parsers (e.g. the nginx/aws built-in functions).
  • Logstash: the log processing engine of the Elastic ecosystem, running on the JVM.

1.3 Versions Under Test

Versions used in this round of testing:

  • WarpParse: 0.12.0
  • Vector: 0.49.0
  • Logstash: 9.2.3

Build and provenance information:

  • WarpParse: source/commit/tag = GitHub tag v0.12.0-alpha (commit: 2ba6e55); build = official release artifact (zip/tar.gz), no modified build options
  • Vector: source/commit/tag = v0.49.0 (commit: dc7e792); build = official release binary, no modified build options
  • Logstash: source/commit/tag = GitHub tag v9.2.3 (commit: 4eb0f3f); build = official distribution (zip / tar.gz, bundled JDK), no source-level build

Versions and build provenance are recorded in this report; reproduction still requires matching engine runtime parameters, system configuration, and dataset parameters.

1.4 Positioning of This Report

This document is an interim benchmark report focused on reproducibility of method and data and on long-term comparability; it is not a final performance conclusion or a production capacity commitment.

2. Test Environment and Method

2.1 Test Environment

Platform

  • Platform type: Mac mini (Apple M4)
  • Operating system: macOS 15.5
  • Architecture: arm64
  • Network: local loopback (127.0.0.1)

Compute

  • CPU: 10-core
  • Memory: 16 GiB
  • Background tasks / performance mode: unnecessary background tasks were stopped during testing; no extra system tuning was applied

Storage

  • Medium: internal SSD
  • File system: APFS
  • Volume size: 256G

2.2 Scope

  • Log types:
    • Nginx Access Log (239B): typical web access logs, high-throughput scenario.
    • AWS ELB Log (411B): cloud load-balancer logs, medium complexity.
    • Firewall Log (1K): endpoint security monitoring logs, structured with many fields.
    • APT Threat Log (3K): simulated advanced persistent threat logs; large events with long text.
    • Mixed Log (886B): a mixture of the four types above.
  • Data topologies:
    • File -> BlackHole: measures the engine's raw I/O read and processing limit (baseline).
    • TCP -> BlackHole: measures network ingestion and processing capability.
    • TCP -> File: measures complete end-to-end delivery capability.
  • Capabilities tested:
    • Parse: regex/KV extraction and field normalization only.
    • Parse + Transform: parsing plus field mapping, enrichment, type conversion, and similar logic.

2.3 Evaluation Metrics

  • EPS (Events Per Second): events processed per second (the primary throughput metric).
  • MPS (MiB/s): data volume processed per second.
  • CPU (Avg/Peak): average and peak CPU utilization of the test process.
  • MEM (Avg/Peak): average and peak memory usage of the test process.
  • Rule Size: size of the rule configuration file; a proxy for how much descriptive complexity is needed to express the same log semantics, and an aid for assessing configuration distribution, readability, and long-term maintenance cost.
  • Multiple: within the same log type and topology, EPS normalized to Vector-VRL's EPS as 1.0x.

Notes:

  • CPU is the cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded), measured for the test process itself (not system-wide CPU), sampled at a fixed period by an external monitoring script that computes Avg/Peak.
  • MPS conversion: MPS = EPS × AvgLogSize (bytes) / 1024 / 1024
  • Sampling sources and definitions (a sampling/conversion sketch follows this list):
    • EPS: taken from each engine's native observability or statistics interface.
      • WarpParse / Vector: built-in throughput statistics.
      • Logstash: periodically collected from its official Monitoring API / runtime statistics by an automation script.
    • CPU / MEM: collected for the test process by an external monitoring script (periodic shell-based sampling), for cross-engine comparison.
    • MPS: computed from the measured EPS and the average log size, as an auxiliary measure of actual data throughput.
    • Rule sizes were measured after uniformly stripping comments and blank lines from configs, keeping only the effective expression and reducing formatting noise.
    • Collection mechanics may differ per engine, but the statistical definitions are kept consistent; each metric defers to its most authoritative source.
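
As an illustration of the sampling and conversion described above, here is a minimal sketch (hypothetical helper names; the report's actual shell-based script is not included) that samples a process via ps and applies the MPS formula:

# Illustrative sampling/conversion sketch; not the report's actual monitoring script.
import subprocess
import time

def mps(eps, avg_log_size_bytes):
    """MPS (MiB/s) = EPS x AvgLogSize / 1024 / 1024."""
    return eps * avg_log_size_bytes / 1024 / 1024

def sample_proc(pid, period=1.0, samples=60):
    """Sample %CPU (multi-core cumulative) and RSS via `ps`; return (avg, peak) pairs."""
    cpu, rss = [], []
    for _ in range(samples):
        out = subprocess.check_output(
            ["ps", "-o", "%cpu=,rss=", "-p", str(pid)], text=True
        ).split()
        cpu.append(float(out[0]))
        rss.append(int(out[1]) / 1024)  # ps reports RSS in KiB; convert to MiB
        time.sleep(period)
    return (sum(cpu) / len(cpu), max(cpu)), (sum(rss) / len(rss), max(rss))

# Cross-check against Table 3.1.1-1 (WarpParse, File -> BlackHole, 239B events):
print(round(mps(2_789_800, 239), 2))  # ~635.87, consistent with the table's 635.86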

2.4 Test Method and Execution

Tests were executed item by item per log type and topology on a single machine. Input data was generated or replayed by the benchmark scripts provided in this repository. Engines ran independently to avoid mutual interference. The output target was configured as BlackHole or File per topology, to measure pure processing capability and end-to-end performance including I/O, respectively.

Execution flow, script entry points, and common parameters are described in benchmark/README.md.

2.4.1 Minimal Repro Checklist

  • Engine versions and provenance:

    • Versions, tags, commits, and build methods for WarpParse / Vector / Logstash are given in 1.3.
  • Benchmark toolchain version:

    • The benchmark repository tracks the latest commit (repo HEAD) of the wp-example repository.
    • When reproducing, record the exact commit hash so results stay traceable.
  • Data scale and event counts:

    • In this report, "dataset size" and "event count" are the same concept; scale is defined by the total number of events processed.
    • In WarpParse's benchmark execution script, the -c parameter specifies the total event count; it defines the dataset scale, but other engines are not required to expose the same parameter form.
    • For Vector and Logstash, the dataset uses the same event count as WarpParse; scale is aligned by feeding equal input data, not by a shared startup parameter.
    • -c can therefore be read as a symbolic notation for the benchmark's uniform event-scale definition, not as a cross-engine CLI flag.
  • Termination condition:

    • All tests end once the same number of events has been processed.
    • Fixed-duration runs are deliberately avoided, so that differences in startup, warmup, and stabilization across engines do not skew the statistics.
  • Warmup and sampling window:

    • WarpParse and Vector reach a steady state quickly after startup; no separate warmup phase was used.

    • Logstash, due to JVM/JIT and pipeline initialization, gets a warmup run first; EPS and resource sampling start once throughput is confirmed stable.

    • Repetitions and value selection (a minimal aggregation sketch follows):

      • Single run by default.
      • For stricter statistics, repeat N=3 and take the median as the final result.
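
A minimal sketch of that N=3 aggregation, assuming the per-run EPS values have already been extracted (the numbers below are hypothetical):

# Median-of-3 aggregation as suggested above (illustrative values).
from statistics import median

runs_eps = [2_789_800, 2_751_200, 2_803_400]  # three hypothetical repeated runs
print(median(runs_eps))                       # -> 2789800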

2.5 Default Configuration and Tuning Notes

Unless a table or note states otherwise, the results in this report are based on each engine's default configuration, with no dedicated performance tuning or non-default parameters.

3. Detailed Performance Comparison

3.0 Results Index

The table below indexes the detailed tables per log type and tested capability.

| Log type | Parse Only | Parse + Transform |
|---|---|---|
| Nginx Access Log (239B) | see 3.1.1 | see 3.2.1 |
| AWS ELB Log (411B) | see 3.1.2 | see 3.2.2 |
| Firewall Log (1K) | see 3.1.3 | see 3.2.3 |
| APT Threat Log (3K) | see 3.1.4 | see 3.2.4 |
| Mixed Log (average log size: 886B) | see 3.1.5 | see 3.2.5 |

3.1 Log Parsing Capability (Parse Only)

This section presents the test results for the parse-only scenario.

3.1.1 Nginx Access Log (239B)

Table 3.1.1-1: Nginx Access Log (Parse Only; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 2,789,800 | 635.86 | 768% / 858% | 126 MB / 130 MB | 4.88x |
| Vector-VRL | File -> BlackHole | 572,076 | 130.39 | 298% / 320% | 222 MB / 241 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 513,181 | 116.97 | 466% / 538% | 232 MB / 245 MB | 0.90x |
| Logstash | File -> BlackHole | 270,270 | 61.60 | 308% / 418% | 1092 MB / 1115 MB | 0.47x |
| WarpParse | TCP -> BlackHole | 1,657,500 | 377.80 | 530% / 580% | 307 MB / 320 MB | 1.42x |
| Vector-VRL | TCP -> BlackHole | 1,163,700 | 265.24 | 540% / 598% | 218 MB / 224 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 730,700 | 166.55 | 592% / 658% | 212 MB / 220 MB | 0.63x |
| Logstash | TCP -> BlackHole | 541,403 | 123.40 | 465% / 667% | 1161 MB / 1234 MB | 0.47x |
| WarpParse | TCP -> File | 789,000 | 179.84 | 445% / 470% | 315 MB / 353 MB | 8.78x |
| Vector-VRL | TCP -> File | 89,900 | 20.49 | 165% / 170% | 213 MB / 221 MB | 1.0x |
| Vector-Fixed | TCP -> File | 92,300 | 21.04 | 201% / 214% | 195 MB / 208 MB | 1.03x |
| Logstash | TCP -> File | 507,975 | 115.78 | 515% / 762% | 1153 MB / 1184 MB | 5.65x |

Parse rule sizes:

  • WarpParse: 150B
  • Vector-VRL: 217B
  • Vector-Fixed: 86B
  • Logstash: 248B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

3.1.2 AWS ELB Log (411B)

Table 3.1.2-1: AWS ELB Log (Parse Only; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 1,124,500 | 440.79 | 787% / 824% | 314 MB / 320 MB | 2.89x |
| Vector-VRL | File -> BlackHole | 389,000 | 152.47 | 597% / 658% | 280 MB / 297 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 491,739 | 192.74 | 514% / 537% | 259 MB / 284 MB | 1.26x |
| Logstash | File -> BlackHole | 208,333 | 81.66 | 394% / 506% | 983 MB / 1141 MB | 0.54x |
| WarpParse | TCP -> BlackHole | 947,300 | 371.33 | 625% / 664% | 357 MB / 362 MB | 2.40x |
| Vector-VRL | TCP -> BlackHole | 394,600 | 154.67 | 546% / 620% | 275 MB / 286 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 555,500 | 217.73 | 465% / 523% | 250 MB / 255 MB | 1.41x |
| Logstash | TCP -> BlackHole | 425,531 | 166.79 | 817% / 879% | 1257 MB / 1287 MB | 1.08x |
| WarpParse | TCP -> File | 349,700 | 137.07 | 496% / 537% | 333 MB / 432 MB | 4.12x |
| Vector-VRL | TCP -> File | 84,700 | 33.20 | 240% / 256% | 268 MB / 275 MB | 1.0x |
| Vector-Fixed | TCP -> File | 86,900 | 34.06 | 199% / 208% | 252 MB / 264 MB | 1.03x |
| Logstash | TCP -> File | 350,877 | 137.53 | 679% / 891% | 1288 MB / 1327 MB | 4.14x |

Parse rule sizes:

  • WarpParse: 1153B
  • Vector-VRL: 921B
  • Vector-Fixed: 64B
  • Logstash: 876B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

3.1.3 Firewall Log (1K)

Table 3.1.3-1: Firewall Log (Parse Only; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 459,900 | 492.10 | 887% / 923% | 229 MB / 234 MB | 4.24x |
| Vector-VRL | File -> BlackHole | 115,322 | 123.40 | 456% / 504% | 254 MB / 275 MB | 1.00x |
| Logstash | File -> BlackHole | 50,505 | 54.04 | 881% / 929% | 1139 MB / 1192 MB | 0.47x |
| WarpParse | TCP -> BlackHole | 406,400 | 434.86 | 761% / 787% | 424 MB / 484 MB | 2.15x |
| Vector-VRL | TCP -> BlackHole | 188,800 | 186.50 | 691% / 790% | 373 MB / 393 MB | 1.00x |
| Logstash | TCP -> BlackHole | 54,347 | 58.15 | 874% / 934% | 1223 MB / 1260 MB | 0.31x |
| WarpParse | TCP -> File | 251,100 | 268.68 | 677% / 712% | 237 MB / 247 MB | 3.45x |
| Vector-VRL | TCP -> File | 72,700 | 77.79 | 368% / 413% | 403 MB / 407 MB | 1.00x |
| Logstash | TCP -> File | 54,945 | 58.79 | 894% / 950% | 1192 MB / 1223 MB | 0.76x |

Parse rule sizes:

  • WarpParse: 137B
  • Vector-VRL: 317B
  • Logstash: 527B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

3.1.4 APT Threat Log (3K)

Table 3.1.4-1: APT Threat Log (Parse Only; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 328,000 | 1109.53 | 743% / 829% | 183 MB / 184 MB | 8.68x |
| Vector-VRL | File -> BlackHole | 37,777 | 127.79 | 578% / 657% | 255 MB / 265 MB | 1.0x |
| Logstash | File -> BlackHole | 29,940 | 101.28 | 847% / 915% | 944 MB / 1152 MB | 0.79x |
| WarpParse | TCP -> BlackHole | 299,700 | 1013.80 | 718% / 743% | 335 MB / 351 MB | 5.88x |
| Vector-VRL | TCP -> BlackHole | 51,000 | 172.52 | 834% / 887% | 385 MB / 413 MB | 1.0x |
| Logstash | TCP -> BlackHole | 31,446 | 106.37 | 843% / 892% | 1218 MB / 1313 MB | 0.62x |
| WarpParse | TCP -> File | 99,900 | 337.94 | 336% / 352% | 333 MB / 508 MB | 2.69x |
| Vector-VRL | TCP -> File | 37,200 | 125.84 | 652% / 837% | 411 MB / 424 MB | 1.0x |
| Logstash | TCP -> File | 30,120 | 101.89 | 840% / 897% | 1060 MB / 1232 MB | 0.81x |

Parse rule sizes:

  • WarpParse: 985B
  • Vector-VRL: 873B
  • Logstash: 1027B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

3.1.5 Mixed Log (average log size: 886B)

Table 3.1.5-1: Mixed Log (Parse Only; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 715,000 | 604.14 | 860% / 868% | 246 MB / 254 MB | 3.76x |
| Vector-VRL | File -> BlackHole | 190,000 | 160.54 | 827% / 880% | 281 MB / 329 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 197,073 | 166.52 | 825% / 903% | 237 MB / 250 MB | 1.04x |
| Logstash | File -> BlackHole | 109,890 | 86.43 | 746% / 955% | 1271 MB / 1292 MB | 0.62x |
| WarpParse | TCP -> BlackHole | 586,900 | 495.90 | 697% / 706% | 299 MB / 322 MB | 2.69x |
| Vector-VRL | TCP -> BlackHole | 218,600 | 184.71 | 891% / 930% | 351 MB / 369 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 220,100 | 185.98 | 894% / 935% | 293 MB / 312 MB | 1.01x |
| Logstash | TCP -> BlackHole | 128,205 | 108.33 | 893% / 957% | 1258 MB / 1289 MB | 0.66x |
| WarpParse | TCP -> File | 308,400 | 260.58 | 537% / 560% | 177 MB / 251 MB | 3.90x |
| Vector-VRL | TCP -> File | 79,000 | 66.75 | 383% / 415% | 393 MB / 396 MB | 1.0x |
| Vector-Fixed | TCP -> File | 79,500 | 67.17 | 384% / 407% | 331 MB / 355 MB | 1.01x |
| Logstash | TCP -> File | 126,582 | 106.96 | 879% / 972% | 1278 MB / 1296 MB | 1.60x |

Parse rule sizes:

  • WarpParse: 3864B
  • Vector-VRL: 4723B
  • Vector-Fixed: 1733B
  • Logstash: 3984B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

Mixed log rules:

  • The 4 log types are mixed at a 3:2:1:1 ratio

3.1.6 Mixed Log (average log size: 867B)

Table 3.1.6-1: Mixed Log (Parse Only; TCP -> BlackHole)

| Engine | Topology | CPU (Avg/Peak) | MEM (Avg/Peak) |
|---|---|---|---|
| WarpParse | TCP -> BlackHole | 44% / 57% | 97 MB / 100 MB |
| Vector-VRL | TCP -> BlackHole | 116% / 143% | 191 MB / 194 MB |
| Vector-Fixed | TCP -> BlackHole | 125% / 146% | 153 MB / 156 MB |
| Logstash | TCP -> BlackHole | 159% / 192% | 1119 MB / 1191 MB |

  • Resource consumption at a fixed 20,000 EPS input rate
  • Logstash metrics were collected after warmup

3.2 Parse + Transform Capability (Parse + Transform)

This section presents the test results for the parse + transform scenario.

3.2.1 Nginx Access Log (239B)

Table 3.2.1-1: Nginx Access Log (Parse + Transform; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 2,162,500 | 492.91 | 821% / 911% | 209 MB / 222 MB | 3.77x |
| Vector-VRL | File -> BlackHole | 572,941 | 130.59 | 344% / 378% | 274 MB / 286 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 482,000 | 109.86 | 554% / 612% | 252 MB / 261 MB | 0.84x |
| Logstash | File -> BlackHole | 227,272 | 51.80 | 359% / 548% | 1109 MB / 1143 MB | 0.40x |
| WarpParse | TCP -> BlackHole | 1,382,800 | 315.19 | 602% / 656% | 279 MB / 369 MB | 1.35x |
| Vector-VRL | TCP -> BlackHole | 1,024,300 | 233.47 | 534% / 618% | 232 MB / 235 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 595,800 | 135.80 | 543% / 651% | 214 MB / 219 MB | 0.58x |
| Logstash | TCP -> BlackHole | 357,142 | 81.40 | 685% / 861% | 1219 MB / 1258 MB | 0.35x |
| WarpParse | TCP -> File | 788,900 | 179.82 | 574% / 587% | 249 MB / 253 MB | 8.44x |
| Vector-VRL | TCP -> File | 93,500 | 21.31 | 171% / 184% | 203 MB / 211 MB | 1.0x |
| Vector-Fixed | TCP -> File | 87,500 | 19.94 | 208% / 223% | 197 MB / 212 MB | 0.94x |
| Logstash | TCP -> File | 344,827 | 78.60 | 661% / 883% | 1202 MB / 1230 MB | 3.69x |

Parse + transform rule sizes:

  • WarpParse: 521B
  • Vector-VRL: 519B
  • Vector-Fixed: 500B
  • Logstash: 712B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

3.2.2 AWS ELB Log (411B)

Table 3.2.2-1: AWS ELB Log (Parse + Transform; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 913,300 | 358.00 | 880% / 942% | 228 MB / 248 MB | 2.64x |
| Vector-VRL | File -> BlackHole | 345,500 | 135.42 | 548% / 649% | 291 MB / 309 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 446,111 | 174.86 | 506% / 597% | 276 MB / 295 MB | 1.29x |
| Logstash | File -> BlackHole | 147,058 | 57.64 | 525% / 701% | 1121 MB / 1170 MB | 0.43x |
| WarpParse | TCP -> BlackHole | 757,600 | 296.97 | 714% / 758% | 270 MB / 360 MB | 2.04x |
| Vector-VRL | TCP -> BlackHole | 370,900 | 145.38 | 561% / 607% | 284 MB / 293 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 481,700 | 188.81 | 466% / 536% | 265 MB / 272 MB | 1.30x |
| Logstash | TCP -> BlackHole | 222,222 | 87.10 | 795% / 889% | 1336 MB / 1377 MB | 0.60x |
| WarpParse | TCP -> File | 319,900 | 125.39 | 540% / 600% | 321 MB / 432 MB | 3.87x |
| Vector-VRL | TCP -> File | 82,700 | 32.42 | 242% / 257% | 272 MB / 288 MB | 1.0x |
| Vector-Fixed | TCP -> File | 83,600 | 32.77 | 211% / 220% | 260 MB / 274 MB | 1.01x |
| Logstash | TCP -> File | 200,000 | 78.39 | 750% / 881% | 1289 MB / 1325 MB | 2.42x |

Parse + transform rule sizes:

  • WarpParse: 1694B
  • Vector-VRL: 1259B
  • Vector-Fixed: 570B
  • Logstash: 2019B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

3.2.3 Firewall Log (1K)

Table 3.2.3-1: Firewall Log (Parse + Transform; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 382,500 | 409.28 | 912% / 960% | 181 MB / 194 MB | 3.44x |
| Vector-VRL | File -> BlackHole | 111,081 | 118.86 | 450% / 530% | 295 MB / 320 MB | 1.0x |
| Logstash | File -> BlackHole | 49,019 | 52.45 | 894% / 927% | 1180 MB / 1219 MB | 0.37x |
| WarpParse | TCP -> BlackHole | 288,300 | 308.49 | 679% / 696% | 238 MB / 242 MB | 1.77x |
| Vector-VRL | TCP -> BlackHole | 163,300 | 174.73 | 683% / 757% | 416 MB / 432 MB | 1.0x |
| Logstash | TCP -> BlackHole | 51,546 | 55.16 | 879% / 922% | 1253 MB / 1281 MB | 0.32x |
| WarpParse | TCP -> File | 224,500 | 240.22 | 798% / 818% | 481 MB / 488 MB | 3.04x |
| Vector-VRL | TCP -> File | 73,900 | 79.07 | 378% / 442% | 412 MB / 426 MB | 1.00x |
| Logstash | TCP -> File | 50,000 | 53.50 | 884% / 934% | 1256 MB / 1289 MB | 0.68x |

Parse + transform rule sizes:

  • WarpParse: 2249B
  • Vector-VRL: 2344B
  • Logstash: 3453B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

3.2.4 APT Threat Log (3K)

Table 3.2.4-1: APT Threat Log (Parse + Transform; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 299,400 | 1012.79 | 763% / 855% | 155 MB / 162 MB | 8.12x |
| Vector-VRL | File -> BlackHole | 36,857 | 124.68 | 567% / 654% | 268 MB / 286 MB | 1.0x |
| Logstash | File -> BlackHole | 26,315 | 89.02 | 852% / 901% | 1256 MB / 1305 MB | 0.71x |
| WarpParse | TCP -> BlackHole | 279,700 | 946.14 | 762% / 784% | 335 MB / 345 MB | 5.38x |
| Vector-VRL | TCP -> BlackHole | 52,000 | 175.90 | 862% / 907% | 400 MB / 416 MB | 1.0x |
| Logstash | TCP -> BlackHole | 27,027 | 91.42 | 846% / 926% | 1379 MB / 1413 MB | 0.52x |
| WarpParse | TCP -> File | 89,900 | 304.11 | 355% / 377% | 300 MB / 324 MB | 2.41x |
| Vector-VRL | TCP -> File | 37,300 | 126.18 | 664% / 750% | 392 MB / 411 MB | 1.0x |
| Logstash | TCP -> File | 25,641 | 86.74 | 819% / 936% | 1300 MB / 1356 MB | 0.69x |

Parse + transform rule sizes:

  • WarpParse: 1638B
  • Vector-VRL: 1382B
  • Logstash: 2041B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

3.2.5 Mixed Log (average log size: 886B)

Table 3.2.5-1: Mixed Log (Parse + Transform; File -> BlackHole / TCP -> BlackHole / TCP -> File)

| Engine | Topology | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Multiple |
|---|---|---|---|---|---|---|
| WarpParse | File -> BlackHole | 659,700 | 557.42 | 889% / 940% | 170 MB / 184 MB | 3.80x |
| Vector-VRL | File -> BlackHole | 173,750 | 146.81 | 784% / 860% | 278 MB / 299 MB | 1.0x |
| Vector-Fixed | File -> BlackHole | 178,261 | 150.62 | 772% / 836% | 273 MB / 298 MB | 1.03x |
| Logstash | File -> BlackHole | 50,505 | 42.67 | 911% / 939% | 1249 MB / 1276 MB | 0.29x |
| WarpParse | TCP -> BlackHole | 543,100 | 458.90 | 799% / 824% | 394 MB / 479 MB | 2.61x |
| Vector-VRL | TCP -> BlackHole | 208,200 | 175.92 | 878% / 925% | 319 MB / 334 MB | 1.0x |
| Vector-Fixed | TCP -> BlackHole | 206,600 | 174.57 | 919% / 936% | 296 MB / 321 MB | 0.99x |
| Logstash | TCP -> BlackHole | 94,339 | 79.71 | 878% / 941% | 1285 MB / 1318 MB | 0.45x |
| WarpParse | TCP -> File | 299,900 | 253.40 | 616% / 754% | 332 MB / 493 MB | 3.86x |
| Vector-VRL | TCP -> File | 77,600 | 65.57 | 397% / 421% | 363 MB / 374 MB | 1.0x |
| Vector-Fixed | TCP -> File | 78,100 | 65.99 | 400% / 421% | 337 MB / 358 MB | 1.01x |
| Logstash | TCP -> File | 93,153 | 78.71 | 859% / 957% | 1274 MB / 1308 MB | 1.20x |

Parse + transform rule sizes:

  • WarpParse: 3864B
  • Vector-VRL: 4723B
  • Vector-Fixed: 1733B
  • Logstash: 3984B

Within the same log type and topology, all engines are normalized against Vector-VRL's EPS as the common baseline (1.0x).

Rule size can be affected by formatting, line breaks, comments, paths, and similar factors; size differences do not affect the performance methodology, and the rule logic is kept semantically identical.

Mixed log rules:

  • The 4 log types are mixed at a 3:2:1:1 ratio

3.2.6 Mixed Log (average log size: 867B)

Table 3.2.6-1: Mixed Log (Parse + Transform; TCP -> BlackHole)

| Engine | Topology | CPU (Avg/Peak) | MEM (Avg/Peak) |
|---|---|---|---|
| WarpParse | TCP -> BlackHole | 61% / 82% | 101 MB / 106 MB |
| Vector-VRL | TCP -> BlackHole | 116% / 143% | 191 MB / 194 MB |
| Vector-Fixed | TCP -> BlackHole | 125% / 146% | 153 MB / 156 MB |
| Logstash | TCP -> BlackHole | 159% / 192% | 1119 MB / 1191 MB |

  • Resource consumption at a fixed 20,000 EPS input rate
  • Logstash metrics were collected after warmup

4. Interpreting the Results

4.1 Throughput and Resource Behavior

Summary of results:

  1. In the Mac-platform tests, WarpParse's EPS multiples relative to Vector-VRL range from 1.42x to 8.78x for parse-only and 1.35x to 8.44x for parse + transform; the peak appears in the Nginx TCP -> File topology.
  2. At equal event volumes, CPU utilization in the WarpParse runs is generally higher than Vector/Logstash (see the tables); the throughput gains come together with higher CPU usage.
  3. In the APT (3K) scenario, WarpParse's highest MPS is 1109.53 MiB/s (File -> BlackHole); Vector's EPS/MPS in the same scenario is comparatively lower (see 3.1.4/3.2.4).

4.2 Rules and Expressiveness Highlights

  • Rule size reflects not only configuration distribution and maintenance cost; it can also serve as a reference for how much descriptive complexity an engine needs to express the same log semantics. For identical parse and transform semantics, a smaller rule size usually indicates higher-level built-in capabilities or stronger expressive abstractions.

  • Rule-size differences per log type and topology are listed in the "rule sizes" notes under each table, as an aid for assessing expressiveness, rule readability, and maintenance complexity across engines.

  • The Vector tests cover two strategies, VRL and Fixed:

    • VRL leans toward general-purpose expressiveness and is more flexible for complex semantics;
    • Fixed prefers built-in parsing capabilities and has the edge in rule size and maintenance complexity. The expressiveness/performance trade-off between the two is as reflected in the table data.
  • Under most log types, the TCP -> File topology shows a higher range of performance multiples (see the corresponding tables in 3.1 / 3.2); this observation holds across rule-complexity levels.

4.3 Stability

  • This report does not include dedicated back-pressure or queue-depth metrics; stability is judged only from throughput and resource observations during the runs.
  • Caveat: in large-event TCP -> File scenarios, memory rises with throughput (APT peaks around 508 MB; Mixed around 493 MB); factor this into capacity planning.

5. Interim Summary and Recommendations

The following are interim observations limited to the scope of this report, not production selection conclusions; actual adoption should weigh business traffic, architectural constraints, and operational capabilities.

| Decision dimension | Suggested option | What to focus on | Supporting basis |
|---|---|---|---|
| Throughput-first | WarpParse | The EPS multiple ranges in this report | Parse 1.42x-8.78x, parse + transform 1.35x-8.44x; higher in the TCP -> File topology. |
| Resource-constrained environments | WarpParse | The CPU/memory trade-off | Peak CPU is higher, but total CPU time to finish the same data volume is lower; memory control is excellent in small-event scenarios. |
| Edge/agent deployment | WarpParse | Rule size and single-node throughput | Rule sizes vary across log types; throughput is higher in this report; see each section's "rule sizes" notes and table data. |
| General ecosystem compatibility | WarpParse | Ecosystem and extensibility | Offers developer-facing APIs and a plugin extension mechanism for quickly building custom input/output modules; combines the required performance with good ecosystem extensibility. |

Interim conclusion: based on the data in this report, WarpParse's EPS multiples vs Vector-VRL are 1.42x-8.78x for pure parsing and 1.35x-8.44x for parse + transform, with higher multiples end-to-end (TCP -> File). These results can serve as an interim baseline for comparable scenarios; in large-event TCP -> File scenarios, watch memory growth with throughput (roughly 490-510 MB).

6. 已知限制与注意事项

  • 本报告为单机测试,未覆盖多节点、HA(High Availability,高可用)、持久化优化或生产负载波动等因素。
  • 测试范围限定为五类日志与三种拓扑,未覆盖更复杂的输入/输出链路。
  • 结果依赖具体硬件、操作系统与存储配置,跨环境对比需谨慎。

Rule Definitions

This document collects the parse and parse + transform rules used in the tests, organized by log type and engine.

1. Nginx Access Log (239B)

WarpParse

  • Parse config (WPL)
package /nginx/ {
   rule nginx {
        (ip:sip,2*_,chars:timestamp<[,]>,http/request:http_request",chars:status,chars:size,chars:referer",http/agent:http_agent",_")
   }
}
  • Parse + transform config (WPL + OML)
package /nginx/ {
   rule nginx {
        (ip:sip,2*_,chars:timestamp<[,]>,http/request:http_request",chars:status,chars:size,chars:referer",http/agent:http_agent",_")
   }
}
name : nginx
rule : /nginx/*
---
size : digit = take(size);
status : digit = take(status);
str_status = match read(option:[status]) {
    digit(500) => chars(Internal Server Error);
    digit(404) => chars(Not Found); 
};
match_chars = match read(option:[wp_src_ip]) {
    ip(127.0.0.1) => chars(localhost); 
    !ip(127.0.0.1) => chars(attack_ip); 
};
* : auto = read();
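
For readers new to OML, the transform above can be read roughly as the following Python rendering (hypothetical and illustrative only; this is not WarpParse code, and an unmatched match arm is shown here as None):

# Rough Python rendering of the nginx OML transform above (illustrative only).
def transform(event: dict) -> dict:
    out = {}
    out["size"] = int(event.pop("size"))      # size : digit = take(size)
    out["status"] = int(event.pop("status"))  # status : digit = take(status)
    # str_status = match read(option:[status]) { 500 => ...; 404 => ... }
    out["str_status"] = {500: "Internal Server Error", 404: "Not Found"}.get(out["status"])
    # match_chars = match read(option:[wp_src_ip]) { ip(127.0.0.1) => localhost; else attack_ip }
    out["match_chars"] = "localhost" if event.get("wp_src_ip") == "127.0.0.1" else "attack_ip"
    out.update(event)                         # * : auto = read()  (pass remaining fields through)
    return out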

Vector-VRL

  • Parse config
source = '''
  . |= parse_regex!(.message, r'^(?P<sip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] "(?P<http_request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<http_agent>[^"]*)"')
  del(.message)
'''
  • Parse + transform config
source = '''
  . |= parse_regex!(.message, r'^(?P<sip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] "(?P<http_request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<http_agent>[^"]*)"')
  del(.message)
  .status = to_int!(.status)
  .size = to_int!(.size)
if .host == "127.0.0.1" {
    .match_chars = "localhost"
} else if .host != "127.0.0.1" {
    .match_chars = "attack_ip"
}  
if .status == 500 {
    .str_status = "Internal Server Error"
} else if .status == 404 {
    .str_status = "Not Found"
}  
'''

Vector-Fixed

  • Parse config
source = '''
  . |= parse_nginx_log!(.message, format: "combined")
  del(.message)
'''
  • Parse + transform config
source = '''
  . |= parse_nginx_log!(.message, format: "combined")
  .http_agent = .agent
  del(.agent)
  .http_request = .request
  del(.request)
  .sip = .client
  del(.client)
  .status = to_int!(.status)
  .size = to_int!(.size)
if .host == "127.0.0.1" {
    .match_chars = "localhost"
} else if .host != "127.0.0.1" {
    .match_chars = "attack_ip"
}  
if .status == 500 {
    .str_status = "Internal Server Error"
} else if .status == 404 {
    .str_status = "Not Found"
}  
  del(.message)
'''

Logstash

  • Parse config
filter {
  dissect {
  mapping => {
    "message" => '%{sip} - - [%{timestamp}] "%{http_request}" %{status} %{size} "%{referer}" "%{http_agent}"'
  }
}
  mutate {
  remove_field => ["message","@timestamp","@version","event","[event][original]"]
}
}
  • Parse + transform config
filter {
  dissect {
    mapping => {
      "message" => '%{sip} - - [%{timestamp}] "%{http_request}" %{status} %{size} "%{referer}" "%{http_agent}"'
    }
  }

  mutate {
    convert => {
      "status" => "integer"
      "size"   => "integer"
    }
  }
  
  if [src_ip] == "127.0.0.1" {
    mutate { add_field => { "match_chars" => "localhost" } }
  } else {
    mutate { add_field => { "match_chars" => "attack_ip" } }
  }

  if [status] == 200 {
    mutate { add_field => { "str_status" => "OK" } }
  } else if [status] == 500 {
    mutate { add_field => { "str_status" => "Internal Server Error" } }
  } 

  mutate {
    remove_field => ["message", "@timestamp", "@version", "event", "[event][original]"]
  }
}

2. AWS ELB Log (411B)

WarpParse

  • Parse config (WPL)
package /aws/ {
   rule aws {
        (
            symbol(http),
            chars:timestamp,
            chars:elb,
            chars:client_host,
            chars:target_host,
            chars:request_processing_time,
            chars:target_processing_time,
            chars:response_processing_time,
            chars:elb_status_code,
            chars:target_status_code,
            chars:received_bytes,
            chars:sent_bytes,
            chars:request | (chars:request_method, chars:request_url, chars:request_protocol),
            chars:user_agent,
            chars:ssl_cipher,
            chars:ssl_protocol,
            chars:target_group_arn,
            chars:trace_id,
            chars:domain_name,
            chars:chosen_cert_arn,
            chars:matched_rule_priority,
            chars:request_creation_time,
            chars:actions_executed,
            chars:redirect_url,
            chars:error_reason,
            chars:target_port_list,
            chars:target_status_code_list,
            chars:classification,
            chars:classification_reason,
            chars:traceability_id,
        )
   }
   }
  • Parse + transform config (WPL + OML)
package /aws/ {
   rule aws {
        (
            symbol(http),
            chars:timestamp,
            chars:elb,
            chars:client_host,
            chars:target_host,
            chars:request_processing_time,
            chars:target_processing_time,
            chars:response_processing_time,
            chars:elb_status_code,
            chars:target_status_code,
            chars:received_bytes,
            chars:sent_bytes,
            chars:request | (chars:request_method, chars:request_url, chars:request_protocol),
            chars:user_agent,
            chars:ssl_cipher,
            chars:ssl_protocol,
            chars:target_group_arn,
            chars:trace_id,
            chars:domain_name,
            chars:chosen_cert_arn,
            chars:matched_rule_priority,
            chars:request_creation_time,
            chars:actions_executed,
            chars:redirect_url,
            chars:error_reason,
            chars:target_port_list,
            chars:target_status_code_list,
            chars:classification,
            chars:classification_reason,
            chars:traceability_id,
        )
   }
   }
name : aws
rule : /aws/*
---
sent_bytes:digit = take(sent_bytes) ;
target_status_code:digit = take(target_status_code) ;
elb_status_code:digit = take(elb_status_code) ;
extends : obj = object {
    ssl_cipher = read(ssl_cipher);
    ssl_protocol = read(ssl_protocol);
};
match_chars = match read(option:[wp_src_ip]) {
    ip(127.0.0.1) => chars(localhost); 
    !ip(127.0.0.1) => chars(attack_ip); 
};
str_elb_status = match read(option:[elb_status_code]) {
    digit(200) => chars(ok);
    digit(404) => chars(error); 
};
* : auto = read();

Vector-VRL

  • Parse config (VRL)
source = '''
  . |= parse_regex!(.message, r'^(?P<symbol>\S+) (?P<timestamp>\S+) (?P<elb>\S+) (?P<client_host>\S+) (?P<target_host>\S+) (?P<request_processing_time>[-\d\.]+) (?P<target_processing_time>[-\d\.]+) (?P<response_processing_time>[-\d\.]+) (?P<elb_status_code>\S+) (?P<target_status_code>\S+) (?P<received_bytes>\d+) (?P<sent_bytes>\d+) "(?P<request_method>\S+) (?P<request_url>[^ ]+) (?P<request_protocol>[^"]+)" "(?P<user_agent>[^"]*)" "(?P<ssl_cipher>[^"]*)" "(?P<ssl_protocol>[^"]*)" (?P<target_group_arn>\S+) "(?P<trace_id>[^"]*)" "(?P<domain_name>[^"]*)" "(?P<chosen_cert_arn>[^"]*)" (?P<matched_rule_priority>\S+) (?P<request_creation_time>\S+) "(?P<actions_executed>[^"]*)" "(?P<redirect_url>[^"]*)" "(?P<error_reason>[^"]*)" "(?P<target_port_list>[^"]*)" "(?P<target_status_code_list>[^"]*)" "(?P<classification>[^"]*)" "(?P<classification_reason>[^"]*)" (?P<traceability_id>\S+)$')
  del(.message)
'''
  • Parse + transform config (VRL)
source = '''
  . |= parse_regex!(.message, r'^(?P<symbol>\S+) (?P<timestamp>\S+) (?P<elb>\S+) (?P<client_host>\S+) (?P<target_host>\S+) (?P<request_processing_time>[-\d\.]+) (?P<target_processing_time>[-\d\.]+) (?P<response_processing_time>[-\d\.]+) (?P<elb_status_code>\S+) (?P<target_status_code>\S+) (?P<received_bytes>\d+) (?P<sent_bytes>\d+) "(?P<request_method>\S+) (?P<request_url>[^ ]+) (?P<request_protocol>[^"]+)" "(?P<user_agent>[^"]*)" "(?P<ssl_cipher>[^"]*)" "(?P<ssl_protocol>[^"]*)" (?P<target_group_arn>\S+) "(?P<trace_id>[^"]*)" "(?P<domain_name>[^"]*)" "(?P<chosen_cert_arn>[^"]*)" (?P<matched_rule_priority>\S+) (?P<request_creation_time>\S+) "(?P<actions_executed>[^"]*)" "(?P<redirect_url>[^"]*)" "(?P<error_reason>[^"]*)" "(?P<target_port_list>[^"]*)" "(?P<target_status_code_list>[^"]*)" "(?P<classification>[^"]*)" "(?P<classification_reason>[^"]*)" (?P<traceability_id>\S+)$')
  del(.message)
if .host == "127.0.0.1" {
    .match_chars = "localhost"
} else if .host != "127.0.0.1" {
    .match_chars = "attack_ip"
}   
if .elb_status_code == "200" {
    .str_elb_status = "ok"
} else if .elb_status_code == "404" {
    .str_elb_status = "error"
}
  .extends = {
    "ssl_cipher": .ssl_cipher,
    "ssl_protocol": .ssl_protocol,
}
'''

Vector-Fixed

  • Parse config
source = '''
. |= parse_aws_alb_log!(.message)
del(.message)
'''
  • Parse + transform config
source = '''
  . |= parse_aws_alb_log!(.message)
  del(.message)
  .symbol = .type
  del(.type)
  .elb_status_code    = to_int!(.elb_status_code)
  .target_status_code = to_int!(.target_status_code)
  .sent_bytes         = to_int!(.sent_bytes)

  if .host == "127.0.0.1" {
    .match_chars = "localhost"
  } else {
    .match_chars = "attack_ip"
  }

  if .elb_status_code == 200 {
    .str_elb_status = "ok"
  } else if .elb_status_code == 404 {
    .str_elb_status = "error"
  }

  .extends = {
    "ssl_cipher": .ssl_cipher,
    "ssl_protocol": .ssl_protocol,
  }
'''

Logstash

  • Parse config
filter {
  dissect {
    mapping => {
      "message" => '%{symbol} %{timestamp} %{elb} %{client_host} %{target_host} %{request_processing_time} %{target_processing_time} %{response_processing_time} %{elb_status_code} %{target_status_code} %{received_bytes} %{sent_bytes} "%{raw_request}" "%{user_agent}" "%{ssl_cipher}" "%{ssl_protocol}" %{target_group_arn} "%{trace_id}" "%{domain_name}" "%{chosen_cert_arn}" %{matched_rule_priority} %{request_creation_time} "%{actions_executed}" "%{redirect_url}" "%{error_reason}" "%{target_port_list}" "%{target_status_code_list}" "%{classification}" "%{classification_reason}" %{traceability_id}'
    }
  }

  dissect {
    mapping => {
      "raw_request" => "%{request_method} %{request_url} %{request_protocol}"
    }
  }

  mutate {
  remove_field => ["message","@timestamp","@version","event","[event][original]","raw_request"]
}
}
  • Parse + transform config
filter {
   dissect {
    mapping => {
      "message" => '%{symbol} %{timestamp} %{elb} %{client_host} %{target_host} %{request_processing_time} %{target_processing_time} %{response_processing_time} %{elb_status_code} %{target_status_code} %{received_bytes} %{sent_bytes} "%{raw_request}" "%{user_agent}" "%{ssl_cipher}" "%{ssl_protocol}" %{target_group_arn} "%{trace_id}" "%{domain_name}" "%{chosen_cert_arn}" %{matched_rule_priority} %{request_creation_time} "%{actions_executed}" "%{redirect_url}" "%{error_reason}" "%{target_port_list}" "%{target_status_code_list}" "%{classification}" "%{classification_reason}" %{traceability_id}'
    }
  }

  dissect {
    mapping => {
      "raw_request" => "%{request_method} %{request_url} %{request_protocol}"
    }
    tag_on_failure => ["aws_raw_request_dissect_failure"]
  }

    mutate {
    convert => {
      "client_port"              => "integer"
      "target_port"              => "integer"
      "request_processing_time"  => "float"
      "target_processing_time"   => "float"
      "response_processing_time" => "float"
      "elb_status_code"          => "integer"
      "target_status_code"       => "integer"
      "received_bytes"           => "integer"
      "sent_bytes"               => "integer"
      "matched_rule_priority"    => "integer"
    }
  }
  mutate {
    add_field => {
      "[extends][ssl_cipher]"   => "%{ssl_cipher}"
      "[extends][ssl_protocol]" => "%{ssl_protocol}"
    }
  }

  if [src_ip] == "127.0.0.1" {
    mutate { add_field => { "match_chars" => "localhost" } }
  } else {
    mutate { add_field => { "match_chars" => "attack_ip" } }
  }

  if [elb_status_code] == 200 {
    mutate { add_field => { "str_elb_status" => "ok" } }
  } else if [elb_status_code] == 404 {
    mutate { add_field => { "str_elb_status" => "not_found" } }
  } else {
    mutate { add_field => { "str_elb_status" => "error" } }
  }

  mutate {
    remove_field => ["message","raw_request","@timestamp","@version","event","[event][original]"]
  }
}

3. Firewall Log (1K, KV)

WarpParse

  • Parse config (WPL)
package /firewall/{
    rule firewall{
        (
          chars:timestamp\S,
          2*_,
          kv()| (*kv()\|),
        )
    }
}
  • Parse + transform config (WPL + OML)
package /firewall/{
    rule firewall{
        (
          chars:timestamp\S,
          2*_,
          kv()| (*kv()\|),
        )
    }
}
name : /oml/firewall
rule : /firewall/*
---
ipVersion:digit = take(ipVersion) ;
packetCount:digit = take(packetCount) ;
enrich_level = match read(option:[proto]) {
    chars(UDP) => chars(0);
    chars(TCP) => chars(1);
};
extends : obj = object {
    srcIP = read(srcIP);
    srcPort = read(srcPort);
};
extends_dir = object {
    url = read(url);
    urlCategory = read(urlCategory);
};
match_chars = match read(option:[srcIP]) {
    chars(10.17.34.12) => chars(internal); 
    !chars(10.17.34.12) => chars(external); 
};
num_range = match read(option:[ipVersion]) {
    in ( digit(0), digit(1000) ) => read(ipVersion) ;
    _ => digit(0) ;
};
* : auto = read();

Vector-VRL

  • Parse config (VRL)
source = '''
raw = to_string!(.message)
m = parse_regex!(raw,r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<severity>\S+)\s+(?P<tz>[+-]\d{2}:\d{2})\s+(?P<label>Block|Allow|Drop|Reject):\s+(?P<kv>.*)$')
.timestamp = m.ts
. |= parse_key_value!(m.kv,field_delimiter: "|",key_value_delimiter: "=")
del(.message)
'''
  • Parse + transform config (VRL)
source = '''
raw = to_string!(.message)
m = parse_regex!(raw,r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<severity>\S+)\s+(?P<tz>[+-]\d{2}:\d{2})\s+(?P<label>Block|Allow|Drop|Reject):\s+(?P<kv>.*)$')
.timestamp = m.ts
. |= parse_key_value!(m.kv,field_delimiter: "|",key_value_delimiter: "=")

.ipVersion = to_int!(.ipVersion)
.packetCount = to_int!(.packetCount)           
if .proto == "UDP" {
    .enrich_level = "0"
} else if .proto == "TCP" {
    .enrich_level = "1"
}   
.extends = {
    "srcIP": .srcIP,
    "srcPort": .srcPort,
}
.extends_dir = {
    "url": .url,
    "urlCategory": .urlCategory,
}
if .srcIP == "10.17.34.12" {
    .match_chars = "internal"
} else if .srcIP != "10.17.34.12"{
    .match_chars = "external"
} 
.num_range = if .ipVersion >= 0 && .ipVersion <= 1000 {
    .ipVersion
} else {
    0
}

del(.message)
'''

Logstash

  • Parse config
filter {
  grok {
    match => {
      "message" => [
        "^(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) %{WORD:severity} (?<tz>[+-]\d{2}:\d{2}) %{WORD:label}: %{GREEDYDATA:body}$"
      ]
    }
  }

  kv {
    source => "body"
    field_split => "|"
    value_split => "="
    trim_key => " "
    trim_value => " "
    allow_empty_values => true
  }

  mutate {
    remove_field => [
      "severity","tz","label","body","message","@version","host","event"
    ]
  }
}
  • Parse + transform config
filter {
  grok {
    match => {
      "message" => [
        "^(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) %{WORD:severity} (?<tz>[+-]\d{2}:\d{2}) %{WORD:label}: %{GREEDYDATA:body}$"
      ]
    }
  }

  kv {
    source => "body"
    field_split => "|"
    value_split => "="
    trim_key => " "
    trim_value => " "
    allow_empty_values => true
  }
  mutate {
  convert => {
    "ipVersion" => "integer"
    "packetCount" => "integer"

  }
}
if [proto] == "UDP"  {
  mutate { add_field => { "enrich_level" => "0" } }
} else if [proto] == "TCP" {
  mutate { add_field => { "enrich_level" => "1" } }
}
mutate {
  add_field => {
    "[extends][srcIP]"  => "%{srcIP}"
    "[extends][srcPort]" => "%{srcPort}"
  }
}
mutate {
  add_field => {
    "[extends_dir][url]" => "%{url}"
    "[extends_dir][urlCategory]" => "%{urlCategory}"
  }
}
if [srcIP] == "10.17.34.12" {
  mutate { add_field => { "match_chars" => "internal" } }
} else {
  mutate { add_field => { "match_chars" => "external" } }
}
if [ipVersion] and [ipVersion] >= 0 and [ipVersion] <= 1000 {
  mutate { add_field => { "num_range" => "%{ipVersion}" } }
} else {
  mutate { add_field => { "num_range" => "0" } }
}
  mutate {
    remove_field => [
      "severity",
      "tz",
      "label",
      "body",
      "message",
      "@version",
      "host",
      "event"
    ]
  }
}

4. APT Threat Log (3K)

WarpParse

  • Parse config (WPL)
package /apt/ {
   rule apt {
        (
            _\#,
            time:timestamp,
            _,
            chars:Hostname,
            _\%\%, 
            chars:ModuleName\/,
            chars:SeverityHeader\/,
            symbol(ANTI-APT)\(,
            chars:type\),
            chars:Count<[,]>,
            _\:,
            chars:Content\(,
        ),
        (
            kv(chars@SyslogId),
            kv(chars@VSys),
            kv(chars@Policy),
            kv(chars@SrcIp),
            kv(chars@DstIp),
            kv(chars@SrcPort),
            kv(chars@DstPort),
            kv(chars@SrcZone),
            kv(chars@DstZone),
            kv(chars@User),
            kv(chars@Protocol),
            kv(chars@Application),
            kv(chars@Profile),
            kv(chars@Direction),
            kv(chars@ThreatType),
            kv(chars@ThreatName),
            kv(chars@Action),
            kv(chars@FileType),
            kv(chars@Hash)\),
        )\,
    }
   }
  • Parse + transform config (WPL + OML)
package /apt/ {
   rule apt {
        (
            _\#,
            time:timestamp,
            _,
            chars:Hostname,
            _\%\%, 
            chars:ModuleName\/,
            chars:SeverityHeader\/,
            symbol(ANTI-APT)\(,
            chars:type\),
            chars:Count<[,]>,
            _\:,
            chars:Content\(,
        ),
        (
            kv(chars@SyslogId),
            kv(chars@VSys),
            kv(chars@Policy),
            kv(chars@SrcIp),
            kv(chars@DstIp),
            kv(chars@SrcPort),
            kv(chars@DstPort),
            kv(chars@SrcZone),
            kv(chars@DstZone),
            kv(chars@User),
            kv(chars@Protocol),
            kv(chars@Application),
            kv(chars@Profile),
            kv(chars@Direction),
            kv(chars@ThreatType),
            kv(chars@ThreatName),
            kv(chars@Action),
            kv(chars@FileType),
            kv(chars@Hash)\),
        )\,
    }
   }
name : apt
rule : /apt/*
---
count:digit = take(Count) ;
severity:digit = take(SeverityHeader) ;
match_chars = match read(option:[wp_src_ip]) {
    ip(127.0.0.1) => chars(localhost); 
    !ip(127.0.0.1) => chars(attack_ip); 
};
num_range = match read(option:[count]) {
    in ( digit(0), digit(1000) ) => read(count) ;
    _ => digit(0) ;
};
src_system_log_type = match read(option:[type]) {
    chars(l) => chars(日志信息);
    chars(s) => chars(安全日志信息);
};
extends_ip : obj = object {
    DstIp = read(DstIp);
    SrcIp = read(SrcIp);
};
extends_info : obj = object {
    hostname = read(Hostname);
    source_type = read(wp_src_key)
};
* : auto = read();

Vector-VRL

  • Parse config (VRL)
source = '''
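  # (?s) enables dot-all matching so '.' can cross embedded newlines in large APT events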
  . |= parse_regex!(.message, r'(?s)^#(?P<timestamp>\w+\s+\d+\s+\d{4}\s+\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+%%(?P<ModuleName>\d+[^/]+)/(?P<SeverityHeader>\d+)/(?P<symbol>[^(]+)\((?P<type>[^)]+)\)\[(?P<Count>\d+)\]:\s*(?P<Content>[^()]+?)\s*\(SyslogId=(?P<SyslogId>[^,]+),\s+VSys="(?P<VSys>[^"]+)",\s+Policy="(?P<Policy>[^"]+)",\s+SrcIp=(?P<SrcIp>[^,]+),\s+DstIp=(?P<DstIp>[^,]+),\s+SrcPort=(?P<SrcPort>[^,]+),\s+DstPort=(?P<DstPort>[^,]+),\s+SrcZone=(?P<SrcZone>[^,]+),\s+DstZone=(?P<DstZone>[^,]+),\s+User="(?P<User>[^"]+)",\s+Protocol=(?P<Protocol>[^,]+),\s+Application="(?P<Application>[^"]+)",\s+Profile="(?P<Profile>[^"]+)",\s+Direction=(?P<Direction>[^,]+),\s+ThreatType=(?P<ThreatType>[^,]+),\s+ThreatName=(?P<ThreatName>[^,]+),\s+Action=(?P<Action>[^,]+),\s+FileType=(?P<FileType>[^,]+),\s+Hash=(?P<Hash>.*)\)$')
  del(.message)
'''
  • Parse + transform config (VRL)
source = '''
  . |= parse_regex!(.message, r'(?s)^#(?P<timestamp>\w+\s+\d+\s+\d{4}\s+\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+%%(?P<ModuleName>\d+[^/]+)/(?P<SeverityHeader>\d+)/(?P<symbol>[^(]+)\((?P<type>[^)]+)\)\[(?P<Count>\d+)\]:\s*(?P<Content>[^()]+?)\s*\(SyslogId=(?P<SyslogId>[^,]+),\s+VSys="(?P<VSys>[^"]+)",\s+Policy="(?P<Policy>[^"]+)",\s+SrcIp=(?P<SrcIp>[^,]+),\s+DstIp=(?P<DstIp>[^,]+),\s+SrcPort=(?P<SrcPort>[^,]+),\s+DstPort=(?P<DstPort>[^,]+),\s+SrcZone=(?P<SrcZone>[^,]+),\s+DstZone=(?P<DstZone>[^,]+),\s+User="(?P<User>[^"]+)",\s+Protocol=(?P<Protocol>[^,]+),\s+Application="(?P<Application>[^"]+)",\s+Profile="(?P<Profile>[^"]+)",\s+Direction=(?P<Direction>[^,]+),\s+ThreatType=(?P<ThreatType>[^,]+),\s+ThreatName=(?P<ThreatName>[^,]+),\s+Action=(?P<Action>[^,]+),\s+FileType=(?P<FileType>[^,]+),\s+Hash=(?P<Hash>.*)\)$')
  del(.message)
.severity = to_int!(.SeverityHeader)
.Count = to_int!(.Count)
if .host == "127.0.0.1" {
    .match_chars = "localhost"
} else if .host != "127.0.0.1" {
    .match_chars = "attack_ip"
}  
if .type == "l" {
.src_system_log_type = "日志信息"
} else if .type == "s" {
.src_system_log_type = "安全日志信息"
}
.extends_ip = {
    "DstIp": .DstIp,
    "SrcIp": .SrcIp,
}
.extends_info = {
    "hostname": .Hostname,
    "source_type": .source_type,
}
.num_range = if .Count >= 0 && .Count <= 1000 {
    .Count
} else {
    0
}
'''

Logstash

  • Parse config
filter {

 mutate { copy => { "message" => "raw" } }

  mutate {
    gsub => [ "raw", "^#", "" ]
  }

  grok {
    match => {
      "raw" => [
        "^(?<timestamp>[A-Za-z]{3}\s+\d{1,2}\s+\d{4}\s+\d{2}:\d{2}:\d{2}\+\d{2}:\d{2})\s+(?<Hostname>\S+)\s+%%(?<ModuleName>[^/]+)/(?<SeverityHeader>\d+)/(?<symbol>[^(]+)\((?<type>[^)]+)\)\[(?<Count>\d+)\]:\s+(?<Content>.*?)\s+\((?<kv_pairs>.*)\)\s*$"
      ]
    }
    tag_on_failure => ["_grokfailure"]
  }

  kv {
    source => "kv_pairs"
    target => ""
    value_split => "="
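    # split only on ", " that is followed by a new Key= token, so commas inside values survive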
    field_split_pattern => ", (?=[A-Za-z][A-Za-z0-9_]*=)"
    trim_key => " "
    trim_value => " \""
    remove_char_value => "\""
  }

  if [ExtraInfo] and [Hash] {

  mutate {
    gsub => [
      "ExtraInfo", "\\\\\"", "\""
    ]
  }

  mutate {
    replace => { "Hash" => "%{Hash}, ExtraInfo=\"%{ExtraInfo}\"" }
    remove_field => ["ExtraInfo"]
  }
}

  mutate {
    remove_field => ["raw", "kv_pairs"]
  }

  mutate {
    remove_field => ["@timestamp", "@version", "[event]","message"]
  }

}


output {
  file { path => "/dev/null" codec => "json_lines" }
}
  • Parse + transform config
filter {
 mutate { copy => { "message" => "raw" } }

  mutate {
    gsub => [ "raw", "^#", "" ]
  }

  grok {
    match => {
      "raw" => [
        "^(?<timestamp>[A-Za-z]{3}\s+\d{1,2}\s+\d{4}\s+\d{2}:\d{2}:\d{2}\+\d{2}:\d{2})\s+(?<Hostname>\S+)\s+%%(?<ModuleName>[^/]+)/(?<SeverityHeader>\d+)/(?<symbol>[^(]+)\((?<type>[^)]+)\)\[(?<Count>\d+)\]:\s+(?<Content>.*?)\s+\((?<kv_pairs>.*)\)\s*$"
      ]
    }
    tag_on_failure => ["_grokfailure"]
  }

  kv {
    source => "kv_pairs"
    target => ""
    value_split => "="
    field_split_pattern => ", (?=[A-Za-z][A-Za-z0-9_]*=)"
    trim_key => " "
    trim_value => " \""
    remove_char_value => "\""
  }

  if [ExtraInfo] and [Hash] {
  mutate {
    gsub => [
      "ExtraInfo", "\\\\\"", "\""
    ]
  }

  mutate {
    replace => { "Hash" => "%{Hash}, ExtraInfo=\"%{ExtraInfo}\"" }
    remove_field => ["ExtraInfo"]
  }
}

  mutate {
    remove_field => ["raw", "kv_pairs"]
  }

mutate {
  convert => {
    "SeverityHeader" => "integer"
    "Count"          => "integer"
  }
}

mutate {
  add_field => { "severity" => "%{SeverityHeader}" }
}
mutate { convert => { "severity" => "integer" } }

if [src_ip] == "127.0.0.1" {
  mutate { add_field => { "match_chars" => "localhost" } }
} else {
  mutate { add_field => { "match_chars" => "attack_ip" } }
}

if [type] == "l" {
  mutate { add_field => { "src_system_log_type" => "日志信息" } }
} else if [type] == "s" {
  mutate { add_field => { "src_system_log_type" => "安全日志信息" } }
}

mutate {
  add_field => {
    "[extends_ip][DstIp]" => "%{DstIp}"
    "[extends_ip][SrcIp]" => "%{SrcIp}"
  }
}

mutate {
  add_field => {
    "[extends_info][hostname]"    => "%{Hostname}"
    "[extends_info][source_type]" => "%{source_type}"
  }
}

if [Count] and [Count] >= 0 and [Count] <= 1000 {
  mutate { add_field => { "num_range" => "%{Count}" } }
} else {
  mutate { add_field => { "num_range" => "0" } }
}
mutate { convert => { "num_range" => "integer" } }

  mutate {
    remove_field => ["@timestamp", "@version", "[event]","message"]
  }
}

output {
  file { path => "/dev/null" codec => "json_lines" }
}

Test Data Samples

This document presents output data samples (Output Samples) for four log types after parsing and transformation.

1. Nginx Access Log Samples

Sample data

180.57.30.148 - - [21/Jan/2025:01:40:02 +0800] "GET /nginx-logo.png HTTP/1.1" 500 368 "http://207.131.38.110/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36" "-"

Parse result (Parsed):

  • WarpParse:
{
	"wp_event_id": 1764645169882925000,
	"sip": "180.57.30.148",
	"timestamp": "21/Jan/2025:01:40:02 +0800",
	"http_request": "GET /nginx-logo.png HTTP/1.1",
	"status": "500",
	"size": "368",
	"referer": "http://207.131.38.110/",
	"http_agent": "Mozilla/5.0(Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 ",
	"wp_src_key": "socket",
	"wp_src_ip": "127.0.0.1"
}
  • Vector:
{
	"host": "127.0.0.1",
	"http_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
	"http_request": "GET /nginx-logo.png HTTP/1.1",
	"port": 58102,
	"referer": "http://207.131.38.110/",
	"sip": "180.57.30.148",
	"size": "368",
	"source_type": "socket",
	"status": "500",
	"timestamp": "21/Jan/2025:01:40:02 +0800"
}

Parse + transform result (Transformed):

  • WarpParse:
{
	"host": "127.0.0.1",
	"http_agent": "Mozilla/5.0(Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 ",
	"http_request": "GET /nginx-logo.png HTTP/1.1",
	"match_chars": "localhost",
	"referer": "http://207.131.38.110/",
	"sip": "180.57.30.148",
	"size": 368,
	"source_type": "socket",
	"status": 500,
	"str_status": "Internal Server Error",
	"timestamp": "21/Jan/2025:01:40:02 +0800"
}
  • Vector:
{
	"host": "127.0.0.1",
	"http_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
	"http_request": "GET /nginx-logo.png HTTP/1.1",
	"match_chars": "localhost",
	"port": 53894,
	"referer": "http://207.131.38.110/",
	"sip": "180.57.30.148",
	"size": 368,
	"source_type": "socket",
	"status": 500,
	"str_status": "Internal Server Error",
	"timestamp": "21/Jan/2025:01:40:02 +0800"
}

2. AWS ELB Log Samples

Sample data

http 2018-11-30T22:23:00.186641Z app/my-lb 192.168.1.10:2000 10.0.0.15:8080 0.01 0.02 0.01 200 200 100 200 "POST https://api.example.com/u?p=1&sid=2&t=3 HTTP/1.1" "Mozilla/5.0 (Win) Chrome/90" "ECDHE" "TLSv1.3" arn:aws:elb:us:123:tg "Root=1-test" "api.example.com" "arn:aws:acm:us:123:cert/short" 1 2018-11-30T22:22:48.364000Z "forward" "https://auth.example.com/r" "err" "10.0.0.1:80" "200" "cls" "rsn" TID_x1

Parse result (Parsed):

  • WarpParse:
{
	"wp_event_id": 1764646097464011000,
	"symbol": "http",
	"timestamp": "2018-11-30T22:23:00.186641Z",
	"elb": "app/my-lb",
	"client_host": "192.168.1.10:2000",
	"target_host": "10.0.0.15:8080",
	"request_processing_time": "0.01",
	"target_processing_time": "0.02",
	"response_processing_time": "0.01",
	"elb_status_code": "200",
	"target_status_code": "200",
	"received_bytes": "100",
	"sent_bytes": "200",
	"request_method": "POST",
	"request_url": "https://api.example.com/u?p=1&sid=2&t=3",
	"request_protocol": "HTTP/1.1",
	"user_agent": "Mozilla/5.0 (Win) Chrome/90",
	"ssl_cipher": "ECDHE",
	"ssl_protocol": "TLSv1.3",
	"target_group_arn": "arn:aws:elb:us:123:tg",
	"trace_id": "Root=1-test",
	"domain_name": "api.example.com",
	"chosen_cert_arn": "arn:aws:acm:us:123:cert/short",
	"matched_rule_priority": "1",
	"request_creation_time": "2018-11-30T22:22:48.364000Z",
	"actions_executed": "forward",
	"redirect_url": "https://auth.example.com/r",
	"error_reason": "err",
	"target_port_list": "10.0.0.1:80",
	"target_status_code_list": "200",
	"classification": "cls",
	"classification_reason": "rsn",
	"traceability_id": "TID_x1",
	"wp_src_key": "socket",
	"wp_src_ip": "127.0.0.1"
}
  • Vector:
{
	"actions_executed": "forward",
	"chosen_cert_arn": "arn:aws:acm:us:123:cert/short",
	"classification": "cls",
	"classification_reason": "rsn",
	"client_host": "192.168.1.10:2000",
	"domain_name": "api.example.com",
	"elb": "app/my-lb",
	"elb_status_code": "200",
	"error_reason": "err",
	"host": "127.0.0.1",
	"matched_rule_priority": "1",
	"port": 58786,
	"received_bytes": "100",
	"redirect_url": "https://auth.example.com/r",
	"request_creation_time": "2018-11-30T22:22:48.364000Z",
	"request_method": "POST",
	"request_processing_time": "0.01",
	"request_protocol": "HTTP/1.1",
	"request_url": "https://api.example.com/u?p=1&sid=2&t=3",
	"response_processing_time": "0.01",
	"sent_bytes": "200",
	"source_type": "socket",
	"ssl_cipher": "ECDHE",
	"ssl_protocol": "TLSv1.3",
	"symbol": "http",
	"target_group_arn": "arn:aws:elb:us:123:tg",
	"target_host": "10.0.0.15:8080",
	"target_port_list": "10.0.0.1:80",
	"target_processing_time": "0.02",
	"target_status_code": "200",
	"target_status_code_list": "200",
	"timestamp": "2018-11-30T22:23:00.186641Z",
	"trace_id": "Root=1-test",
	"traceability_id": "TID_x1",
	"user_agent": "Mozilla/5.0 (Win) Chrome/90"
}

Parse + transform result (Transformed):

  • WarpParse:
{
	"timestamp": "2018-11-30T22:23:00.186641Z",
	"actions_executed": "forward",
	"chosen_cert_arn": "arn:aws:acm:us:123:cert/short",
	"classification": "cls",
	"classification_reason": "rsn",
	"client_host": "192.168.1.10:2000",
	"domain_name": "api.example.com",
	"elb": "app/my-lb",
	"elb_status_code": 200,
	"error_reason": "err",
	"extends": {
		"ssl_cipher": "ECDHE",
		"ssl_protocol": "TLSv1.3"
	},
	"host": "127.0.0.1",
	"match_chars": "localhost",
	"matched_rule_priority": "1",
	"received_bytes": "100",
	"redirect_url": "https://auth.example.com/r",
	"request_creation_time": "2018-11-30T22:22:48.364000Z",
	"request_method": "POST",
	"request_processing_time": "0.01",
	"request_protocol": "HTTP/1.1",
	"request_url": "https://api.example.com/u?p=1&sid=2&t=3",
	"response_processing_time": "0.01",
	"sent_bytes": 200,
	"source_type": "socket",
	"ssl_cipher": "ECDHE",
	"ssl_protocol": "TLSv1.3",
	"str_elb_status": "ok",
	"target_group_arn": "arn:aws:elb:us:123:tg",
	"target_host": "10.0.0.15:8080",
	"target_port_list": "10.0.0.1:80",
	"target_processing_time": "0.02",
	"target_status_code": 200,
	"target_status_code_list": "200",
	"trace_id": "Root=1-test",
	"traceability_id": "TID_x1",
	"user_agent": "Mozilla/5.0 (Win) Chrome/90"
}
  • Vector:
{
	"actions_executed": "forward",
	"chosen_cert_arn": "arn:aws:acm:us:123:cert/short",
	"classification": "cls",
	"classification_reason": "rsn",
	"client_host": "192.168.1.10:2000",
	"domain_name": "api.example.com",
	"elb": "app/my-lb",
	"elb_status_code": 200,
	"error_reason": "err",
	"extends": {
		"ssl_cipher": "ECDHE",
		"ssl_protocol": "TLSv1.3"
	},
	"host": "127.0.0.1",
	"match_chars": "localhost",
	"matched_rule_priority": "1",
	"port": 53995,
	"received_bytes": "100",
	"redirect_url": "https://auth.example.com/r",
	"request_creation_time": "2018-11-30T22:22:48.364000Z",
	"request_method": "POST",
	"request_processing_time": "0.01",
	"request_protocol": "HTTP/1.1",
	"request_url": "https://api.example.com/u?p=1&sid=2&t=3",
	"response_processing_time": "0.01",
	"sent_bytes": 200,
	"source_type": "socket",
	"ssl_cipher": "ECDHE",
	"ssl_protocol": "TLSv1.3",
	"str_elb_status": "ok",
	"symbol": "http",
	"target_group_arn": "arn:aws:elb:us:123:tg",
	"target_host": "10.0.0.15:8080",
	"target_port_list": "10.0.0.1:80",
	"target_processing_time": "0.02",
	"target_status_code": 200,
	"target_status_code_list": "200",
	"timestamp": "2018-11-30T22:23:00.186641Z",
	"trace_id": "Root=1-test",
	"traceability_id": "TID_x1",
	"user_agent": "Mozilla/5.0 (Win) Chrome/90"
}

3. Sysmon Log Samples

Sample data

<14>Apr 09 18:37:27 10.77.32.19 Microsoft-Windows-Sysmon:{"Id":1,"Version":1,"Level":4,"Task":1,"Opcode":0,"Keywords":0,"RecordId":null,"ProviderName":"P","ProviderId":"PID","LogName":"L","ProcessId":1,"ThreadId":1,"MachineName":"A","TimeCreated":"2025-04-10T14:17:28.693228+08:00","ActivityId":null,"RelatedActivityId":null,"Qualifiers":null,"LevelDisplayName":"信息","OpcodeDisplayName":"信息","TaskDisplayName":"Process Create","Description":{"RuleName":"R","UtcTime":"2025-04-10 06:17:28.503","ProcessGuid":"{G}","ProcessId":"1","Image":"C:\\Windows\\a.exe","FileVersion":"1","Description":"D","Product":"P","Company":"C","OriginalFileName":"a.exe","CommandLine":"a.exe","CurrentDirectory":"C:\\","User":"U","LogonGuid":"{LG}","LogonId":"1","TerminalSessionId":"1","IntegrityLevel":"M","Hashes":"H","ParentProcessGuid":"{PG}","ParentProcessId":"1","ParentImage":"C:\\Windows\\b.exe","ParentCommandLine":"b.exe","ParentUser":"U"},"DescriptionRawMessage":"Process Create\r\nRuleName: R"}

Parse result (Parsed):

  • WarpParse:
{
	"wp_event_id": 1764657738662604000,
	"cmd_line": "a.exe",
	"product_company": "C",
	"current_dir": "C:\\\\",
	"process_desc": "D",
	"file_version": "1",
	"Hashes": "H",
	"process_path": "C:\\\\Windows\\\\a.exe",
	"integrity_level": "M",
	"logon_guid": "{LG}",
	"logon_id": "1",
	"origin_file_name": "a.exe",
	"parent_cmd_line": "b.exe",
	"parent_process_path": "C:\\\\Windows\\\\b.exe",
	"parent_process_guid": "{PG}",
	"parent_process_id": "1",
	"parent_process_user": "U",
	"process_guid": "{G}",
	"process_id": "1",
	"product_name": "P",
	"rule_name": "R",
	"terminal_session_id": "1",
	"user_name": "U",
	"occur_time": "2025-04-10 06:17:28.503",
	"DescriptionRawMessage": "Process Create\\r\\nRuleName: R",
	"id": "1",
	"keywords": "0",
	"severity": "4",
	"LevelDisplayName": "信息",
	"LogName": "L",
	"MachineName": "A",
	"Opcode": "0",
	"OpcodeDisplayName": "信息",
	"ProcessId": "1",
	"ProviderId": "PID",
	"ProviderName": "P",
	"Task": "1",
	"TaskDisplayName": "Process Create",
	"ThreadId": "1",
	"TimeCreated": "2025-04-10T14:17:28.693228+08:00",
	"Version": "1",
	"wp_src_key": "socket",
	"wp_src_ip": "127.0.0.1"
}
  • Vector:
{
	"DescriptionRawMessage": "Process Create\\r\\nRuleName: R",
	"Hashes": "H",
	"LevelDisplayName": "信息",
	"LogName": "L",
	"MachineName": "A",
	"Opcode": "0",
	"OpcodeDisplayName": "信息",
	"ProcessId": "1",
	"ProviderId": "PID",
	"ProviderName": "P",
	"Task": "1",
	"TaskDisplayName": "Process Create",
	"ThreadId": "1",
	"TimeCreated": "2025-04-10T14:17:28.693228+08:00",
	"Version": "1",
	"cmd_line": "a.exe",
	"current_dir": "C:\\\\",
	"file_version": "1",
	"host": "127.0.0.1",
	"id": "1",
	"integrity_level": "M",
	"keywords": "0",
	"logon_guid": "{LG}",
	"logon_id": "1",
	"occur_time": "2025-04-10 06:17:28.503",
	"origin_file_name": "a.exe",
	"parent_cmd_line": "b.exe",
	"parent_process_guid": "{PG}",
	"parent_process_id": "1",
	"parent_process_path": "C:\\\\Windows\\\\b.exe",
	"parent_process_user": "U",
	"port": 50558,
	"process_desc": "D",
	"process_guid": "{G}",
	"process_id": "1",
	"process_path": "C:\\\\Windows\\\\a.exe",
	"product_company": "C",
	"product_name": "P",
	"rule_name": "R",
	"severity": "4",
	"source_type": "socket",
	"terminal_session_id": "1",
	"timestamp": "2025-12-02T06:33:53.716258Z",
	"user_name": "U"
}

Parse + transform result (Transformed):

  • WarpParse:
{
	"Id": 1,
	"LogonId": 1,
	"enrich_level": "severity",
	"extends": {
		"OriginalFileName": "a.exe",
		"ParentCommandLine": "b.exe"
	},
	"extends_dir": {
		"ParentProcessPath": "C:\\\\Windows\\\\b.exe",
		"Process_path": "C:\\\\Windows\\\\a.exe"
	},
	"match_chars": "localhost",
	"num_range": 1,
	"wp_event_id": 1764813339134818000,
	"cmd_line": "a.exe",
	"product_company": "C",
	"current_dir": "C:\\\\",
	"process_desc": "D",
	"file_version": "1",
	"Hashes": "H",
	"process_path": "C:\\\\Windows\\\\a.exe",
	"integrity_level": "M",
	"logon_guid": "{LG}",
	"origin_file_name": "a.exe",
	"parent_cmd_line": "b.exe",
	"parent_process_path": "C:\\\\Windows\\\\b.exe",
	"parent_process_guid": "{PG}",
	"parent_process_id": "1",
	"parent_process_user": "U",
	"process_guid": "{G}",
	"process_id": "1",
	"product_name": "P",
	"rule_name": "R",
	"terminal_session_id": "1",
	"user_name": "U",
	"occur_time": "2025-04-10 06:17:28.503",
	"DescriptionRawMessage": "Process Create\\\\r\\\\nRuleName: R",
	"keywords": "0",
	"severity": "4",
	"LevelDisplayName": "信息",
	"LogName": "L",
	"MachineName": "A",
	"Opcode": "0",
	"OpcodeDisplayName": "信息",
	"ProcessId": "1",
	"ProviderId": "PID",
	"ProviderName": "P",
	"Task": "1",
	"TaskDisplayName": "Process Create",
	"ThreadId": "1",
	"TimeCreated": "2025-04-10T14:17:28.693228+08:00",
	"Version": "1",
	"wp_src_key": "socket",
	"wp_src_ip": "127.0.0.1"
}
  • Vector:
{
	"DescriptionRawMessage": "Process Create\\\\r\\\\nRuleName: R",
	"Hashes": "H",
	"Id": 1,
	"LevelDisplayName": "信息",
	"LogName": "L",
	"LogonId": 1,
	"MachineName": "A",
	"Opcode": "0",
	"OpcodeDisplayName": "信息",
	"ProcessId": "1",
	"ProviderId": "PID",
	"ProviderName": "P",
	"Task": "1",
	"TaskDisplayName": "Process Create",
	"ThreadId": "1",
	"TimeCreated": "2025-04-10T14:17:28.693228+08:00",
	"Version": "1",
	"cmd_line": "a.exe",
	"current_dir": "C:\\\\",
	"enrich_level": "severity",
	"extends": {
		"OriginalFileName": "a.exe",
		"ParentCommandLine": "b.exe"
	},
	"extends_dir": {
		"ParentProcessPath": "C:\\\\Windows\\\\b.exe",
		"Process_path": "C:\\\\Windows\\\\a.exe"
	},
	"file_version": "1",
	"host": "127.0.0.1",
	"integrity_level": "M",
	"keywords": "0",
	"logon_guid": "{LG}",
	"match_chars": "localhost",
	"num_range": 1,
	"occur_time": "2025-04-10 06:17:28.503",
	"origin_file_name": "a.exe",
	"parent_cmd_line": "b.exe",
	"parent_process_guid": "{PG}",
	"parent_process_id": "1",
	"parent_process_path": "C:\\\\Windows\\\\b.exe",
	"parent_process_user": "U",
	"port": 49838,
	"process_desc": "D",
	"process_guid": "{G}",
	"process_id": "1",
	"process_path": "C:\\\\Windows\\\\a.exe",
	"product_company": "C",
	"product_name": "P",
	"rule_name": "R",
	"severity": "4",
	"source_type": "socket",
	"terminal_session_id": "1",
	"timestamp": "2025-12-04T02:04:24.686378Z",
	"user_name": "U"
}
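
The WarpParse and Vector events above carry largely the same fields under different envelope conventions. A quick way to diff two such outputs key-by-key, assuming each JSON object has been saved to its own file and jq is installed (the file names here are placeholders):

jq -S . warpparse_event.json > a.json   # -S sorts keys so field order doesn't matter
jq -S . vector_event.json > b.json
diff a.json b.json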

4. APT Threat Log Samples

Sample data

#Feb  7 2025 15:07:18+08:00 USG1000E %%01ANTI-APT/4/ANTI-APT(l)[29]: An advanced persistent threat was detected. (SyslogId=1, VSys="public-long-virtual-system-name-for-testing-extra-large-value-to-simulate-enterprise-scenario", Policy="trust-untrust-high-risk-policy-with-deep-inspection-and-layer7-protection-enabled-for-advanced-threat-detection", SrcIp=192.168.1.123, DstIp=182.150.63.102, SrcPort=51784, DstPort=10781, SrcZone=trust-zone-with-multiple-segments-for-internal-security-domains-and-access-control, DstZone=untrust-wide-area-network-zone-with-external-facing-interfaces-and-honeynet-integration, User="unknown-long-user-field-used-for-simulation-purpose-with-extra-description-and-tags-[tag1][tag2][tag3]-to-reach-required-size", Protocol=TCP, Application="HTTP-long-application-signature-identification-with-multiple-behavior-patterns-and-deep-packet-inspection-enabled", Profile="IPS_default_advanced_extended_profile_with_ml_detection-long", Direction=aaa-long-direction-field-used-to-extend-size-with-additional-info-about-traffic-orientation-from-client-to-server, ThreatType=File Reputation with additional descriptive context of multi-layer analysis engine including sandbox-behavioral-signature-ml-static-analysis-and-network-correlation-modules-working-together, ThreatName=bbb-advanced-threat-campaign-with-code-name-operation-shadow-storm-and-related-IOCs-collected-over-multiple-incidents-in-the-wild-attached-metadata-[phase1][phase2][phase3], Action=ccc-block-and-alert-with-deep-scan-followed-by-quarantine-and-forensic-dump-generation-for-further-investigation, FileType=ddd-executable-binary-with-multiple-packed-layers-suspicious-import-table-behavior-and-evasion-techniques, Hash=eee1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef-long-hash-value-used-for-testing-purpose-extended-with-multiple-hash-representations-[MD5:aaa111bbb222ccc333]-[SHA1:bbb222ccc333ddd444]-[SHA256:ccc333ddd444eee555]-[SSDEEP:ddd444eee555fff666]-end-of-hash-section, Extr... [truncated]

Parsed result:

  • WarpParse:
{
	"wp_event_id": 1764661811871722000,
	"timestamp": "2025-02-07 15:07:18",
	"Hostname": "USG1000E",
	"ModuleName": "01ANTI-APT",
	"SeverityHeader": "4",
	"symbol": "ANTI-APT",
	"type": "l",
	"Count": "29",
	"Content": "An advanced persistent threat was detected.",
	"SyslogId": "1",
	"VSys": "public-long-virtual-system-name-for-testing-extra-large-value-to-simulate-enterprise-scenario",
	"Policy": "trust-untrust-high-risk-policy-with-deep-inspection-and-layer7-protection-enabled-for-advanced-threat-detection",
	"SrcIp": "192.168.1.123",
	"DstIp": "182.150.63.102",
	"SrcPort": "51784",
	"DstPort": "10781",
	"SrcZone": "trust-zone-with-multiple-segments-for-internal-security-domains-and-access-control",
	"DstZone": "untrust-wide-area-network-zone-with-external-facing-interfaces-and-honeynet-integration",
	"User": "unknown-long-user-field-used-for-simulation-purpose-with-extra-description-and-tags-[tag1][tag2][tag3]-to-reach-required-size",
	"Protocol": "TCP",
	"Application": "HTTP-long-application-signature-identification-with-multiple-behavior-patterns-and-deep-packet-inspection-enabled",
	"Profile": "IPS_default_advanced_extended_profile_with_ml_detection-long",
	"Direction": "aaa-long-direction-field-used-to-extend-size-with-additional-info-about-traffic-orientation-from-client-to-server",
	"ThreatType": "File Reputation with additional descriptive context of multi-layer analysis engine including sandbox-behavioral-signature-ml-static-analysis-and-network-correlation-modules-working-together",
	"ThreatName": "bbb-advanced-threat-campaign-with-code-name-operation-shadow-storm-and-related-IOCs-collected-over-multiple-incidents-in-the-wild-attached-metadata-[phase1][phase2][phase3]",
	"Action": "ccc-block-and-alert-with-deep-scan-followed-by-quarantine-and-forensic-dump-generation-for-further-investigation",
	"FileType": "ddd-executable-binary-with-multiple-packed-layers-suspicious-import-table-behavior-and-evasion-techniques",
	"Hash": "eee1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef-long-hash-value-used-for-testing-purpose-extended-with-multiple-hash-representations-[MD5:aaa111bbb222ccc333]-[SHA1:bbb222ccc333ddd444]-[SHA256:ccc333ddd444eee555]-[SSDEEP:ddd444eee555fff666]-end-of-hash-section, ExtraInfo=\"This is additional extended information purposely added to inflate the total log size for stress testing of log ingestion engines such as Vector, Fluent Bit, self-developed ETL pipelines, and any high-throughput log processing systems. It contains repeated segments to simulate realistic verbose threat intelligence attachment blocks. [SEG-A-BEGIN] The threat was part of a coordinated multi-vector campaign observed across various geographic regions targeting enterprise networks with spear-phishing, watering-hole attacks, and supply-chain compromise vectors. Enriched indicators include C2 domains, malware families, behavioral clusters, sandbox detonation traces, and network telemetry correlation. [SEG-A-END] [SEG-B-BEGIN] Further analysis revealed that the payload exhibited persistence techniques including registry autoruns, scheduled tasks, masquerading, process injection, and lateral movement attempts leveraging remote service creation and stolen credentials. The binary contains multiple obfuscation layers, anti-debugging, anti-VM checks, and unusual API call sequences. [SEG-B-END] [SEG-C-BEGIN] IOC Bundle: Domains=malicious-domain-example-01.com,malicious-domain-example-02.net,malicious-update-service.info; IPs=103.21.244.0,198.51.100.55,203.0.113.77; FileNames=update_service.exe,winlog_service.dll,mscore_update.bin; RegistryKeys=HKCU\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run,HKLM\\\\System\\\\Services\\\\FakeService; Mutex=Global\\\\A1B2C3D4E5F6G7H8; YARA Matches=[rule1,rule2,rule3]. [SEG-C-END] EndOfExtraInfo\",
	"wp_src_key": "socket",
	"wp_src_ip": "127.0.0.1"
}
  • Vector:
{
	"Action": "ccc-block-and-alert-with-deep-scan-followed-by-quarantine-and-forensic-dump-generation-for-further-investigation",
	"Application": "HTTP-long-application-signature-identification-with-multiple-behavior-patterns-and-deep-packet-inspection-enabled",
	"Content": "An advanced persistent threat was detected.",
	"Count": "29",
	"Direction": "aaa-long-direction-field-used-to-extend-size-with-additional-info-about-traffic-orientation-from-client-to-server",
	"DstIp": "182.150.63.102",
	"DstPort": "10781",
	"DstZone": "untrust-wide-area-network-zone-with-external-facing-interfaces-and-honeynet-integration",
	"FileType": "ddd-executable-binary-with-multiple-packed-layers-suspicious-import-table-behavior-and-evasion-techniques",
	"Hash": "eee1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef-long-hash-value-used-for-testing-purpose-extended-with-multiple-hash-representations-[MD5:aaa111bbb222ccc333]-[SHA1:bbb222ccc333ddd444]-[SHA256:ccc333ddd444eee555]-[SSDEEP:ddd444eee555fff666]-end-of-hash-section, ExtraInfo=\"This is additional extended information purposely added to inflate the total log size for stress testing of log ingestion engines such as Vector, Fluent Bit, self-developed ETL pipelines, and any high-throughput log processing systems. It contains repeated segments to simulate realistic verbose threat intelligence attachment blocks. [SEG-A-BEGIN] The threat was part of a coordinated multi-vector campaign observed across various geographic regions targeting enterprise networks with spear-phishing, watering-hole attacks, and supply-chain compromise vectors. Enriched indicators include C2 domains, malware families, behavioral clusters, sandbox detonation traces, and network telemetry correlation. [SEG-A-END] [SEG-B-BEGIN] Further analysis revealed that the payload exhibited persistence techniques including registry autoruns, scheduled tasks, masquerading, process injection, and lateral movement attempts leveraging remote service creation and stolen credentials. The binary contains multiple obfuscation layers, anti-debugging, anti-VM checks, and unusual API call sequences. [SEG-B-END] [SEG-C-BEGIN] IOC Bundle: Domains=malicious-domain-example-01.com,malicious-domain-example-02.net,malicious-update-service.info; IPs=103.21.244.0,198.51.100.55,203.0.113.77; FileNames=update_service.exe,winlog_service.dll,mscore_update.bin; RegistryKeys=HKCU\\\\Software\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Run,HKLM\\\\System\\\\Services\\\\FakeService; Mutex=Global\\\\A1B2C3D4E5F6G7H8; YARA Matches=[rule1,rule2,rule3]. [SEG-C-END] EndOfExtraInfo\",
	"Hostname": "USG1000E",
	"ModuleName": "01ANTI-APT",
	"Policy": "trust-untrust-high-risk-policy-with-deep-inspection-and-layer7-protection-enabled-for-advanced-threat-detection",
	"Profile": "IPS_default_advanced_extended_profile_with_ml_detection-long",
	"Protocol": "TCP",
	"SeverityHeader": "4",
	"SrcIp": "192.168.1.123",
	"SrcPort": "51784",
	"SrcZone": "trust-zone-with-multiple-segments-for-internal-security-domains-and-access-control",
	"SyslogId": "1",
	"ThreatName": "bbb-advanced-threat-campaign-with-code-name-operation-shadow-storm-and-related-IOCs-collected-over-multiple-incidents-in-the-wild-attached-metadata-[phase1][phase2][phase3]",
	"ThreatType": "File Reputation with additional descriptive context of multi-layer analysis engine including sandbox-behavioral-signature-ml-static-analysis-and-network-correlation-modules-working-together",
	"User": "unknown-long-user-field-used-for-simulation-purpose-with-extra-description-and-tags-[tag1][tag2][tag3]-to-reach-required-size",
	"VSys": "public-long-virtual-system-name-for-testing-extra-large-value-to-simulate-enterprise-scenario",
	"host": "127.0.0.1",
	"port": 55771,
	"source_type": "socket",
	"symbol": "ANTI-APT",
	"timestamp": "Feb  7 2025 15:07:18+08:00",
	"type": "l"
}

Parsed + transformed result:

  • WarpParse:
{
	"count": 29,
	"severity": 4,
	"match_chars": "localhost",
	"num_range": 29,
	"src_system_log_type": "日志信息",
	"extends_ip": {
		"DstIp": "182.150.63.102",
		"SrcIp": "192.168.1.123"
	},
	"extends_info": {
		"hostname": "USG1000E",
		"source_type": "socket"
	},
	"wp_event_id": 1764815397395451000,
	"timestamp": "2025-02-07 15:07:18",
	"Hostname": "USG1000E",
	"ModuleName": "01ANTI-APT",
	"symbol": "ANTI-APT",
	"type": "l",
	"Content": "An advanced persistent threat was detected.",
	"SyslogId": "1",
	"VSys": "public-long-virtual-system-name-for-testing-extra-large-value-to-simulate-enterprise-scenario",
	"Policy": "trust-untrust-high-risk-policy-with-deep-inspection-and-layer7-protection-enabled-for-advanced-threat-detection",
	"SrcIp": "192.168.1.123",
	"DstIp": "182.150.63.102",
	"SrcPort": "51784",
	"DstPort": "10781",
	"SrcZone": "trust-zone-with-multiple-segments-for-internal-security-domains-and-access-control",
	"DstZone": "untrust-wide-area-network-zone-with-external-facing-interfaces-and-honeynet-integration",
	"User": "unknown-long-user-field-used-for-simulation-purpose-with-extra-description-and-tags-[tag1][tag2][tag3]-to-reach-required-size",
	"Protocol": "TCP",
	"Application": "HTTP-long-application-signature-identification-with-multiple-behavior-patterns-and-deep-packet-inspection-enabled",
	"Profile": "IPS_default_advanced_extended_profile_with_ml_detection-long",
	"Direction": "aaa-long-direction-field-used-to-extend-size-with-additional-info-about-traffic-orientation-from-client-to-server",
	"ThreatType": "File Reputation with additional descriptive context of multi-layer analysis engine including sandbox-behavioral-signature-ml-static-analysis-and-network-correlation-modules-working-together",
	"ThreatName": "bbb-advanced-threat-campaign-with-code-name-operation-shadow-storm-and-related-IOCs-collected-over-multiple-incidents-in-the-wild-attached-metadata-[phase1][phase2][phase3]",
	"Action": "ccc-block-and-alert-with-deep-scan-followed-by-quarantine-and-forensic-dump-generation-for-further-investigation",
	"FileType": "ddd-executable-binary-with-multiple-packed-layers-suspicious-import-table-behavior-and-evasion-techniques",
	"Hash": "eee1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef-long-hash-value-used-for-testing-purpose-extended-with-multiple-hash-representations-[MD5:aaa111bbb222ccc333]-[SHA1:bbb222ccc333ddd444]-[SHA256:ccc333ddd444eee555]-[SSDEEP:ddd444eee555fff666]-end-of-hash-section, ExtraInfo=\"This is additional extended information purposely added to inflate the total log size for stress testing of log ingestion engines such as Vector, Fluent Bit, self-developed ETL pipelines, and any high-throughput log processing systems. It contains repeated segments to simulate realistic verbose threat intelligence attachment blocks. [SEG-A-BEGIN] The threat was part of a coordinated multi-vector campaign observed across various geographic regions targeting enterprise networks with spear-phishing, watering-hole attacks, and supply-chain compromise vectors. Enriched indicators include C2 domains, malware families, behavioral clusters, sandbox detonation traces, and network telemetry correlation. [SEG-A-END] [SEG-B-BEGIN] Further analysis revealed that the payload exhibited persistence techniques including registry autoruns, scheduled tasks, masquerading, process injection, and lateral movement attempts leveraging remote service creation and stolen credentials. The binary contains multiple obfuscation layers, anti-debugging, anti-VM checks, and unusual API call sequences. [SEG-B-END] [SEG-C-BEGIN] IOC Bundle: Domains=malicious-domain-example-01.com,malicious-domain-example-02.net,malicious-update-service.info; IPs=103.21.244.0,198.51.100.55,203.0.113.77; FileNames=update_service.exe,winlog_service.dll,mscore_update.bin; RegistryKeys=HKCU\\\\\\\\Software\\\\\\\\Microsoft\\\\\\\\Windows\\\\\\\\CurrentVersion\\\\\\\\Run,HKLM\\\\\\\\System\\\\\\\\Services\\\\\\\\FakeService; Mutex=Global\\\\\\\\A1B2C3D4E5F6G7H8; YARA Matches=[rule1,rule2,rule3]. [SEG-C-END] EndOfExtraInfo\",
	"wp_src_key": "socket",
	"wp_src_ip": "127.0.0.1"
}
  • Vector:
{
	"Action": "ccc-block-and-alert-with-deep-scan-followed-by-quarantine-and-forensic-dump-generation-for-further-investigation",
	"Application": "HTTP-long-application-signature-identification-with-multiple-behavior-patterns-and-deep-packet-inspection-enabled",
	"Content": "An advanced persistent threat was detected.",
	"Direction": "aaa-long-direction-field-used-to-extend-size-with-additional-info-about-traffic-orientation-from-client-to-server",
	"DstIp": "182.150.63.102",
	"DstPort": "10781",
	"DstZone": "untrust-wide-area-network-zone-with-external-facing-interfaces-and-honeynet-integration",
	"FileType": "ddd-executable-binary-with-multiple-packed-layers-suspicious-import-table-behavior-and-evasion-techniques",
	"Hash": "eee1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef-long-hash-value-used-for-testing-purpose-extended-with-multiple-hash-representations-[MD5:aaa111bbb222ccc333]-[SHA1:bbb222ccc333ddd444]-[SHA256:ccc333ddd444eee555]-[SSDEEP:ddd444eee555fff666]-end-of-hash-section, ExtraInfo=\"This is additional extended information purposely added to inflate the total log size for stress testing of log ingestion engines such as Vector, Fluent Bit, self-developed ETL pipelines, and any high-throughput log processing systems. It contains repeated segments to simulate realistic verbose threat intelligence attachment blocks. [SEG-A-BEGIN] The threat was part of a coordinated multi-vector campaign observed across various geographic regions targeting enterprise networks with spear-phishing, watering-hole attacks, and supply-chain compromise vectors. Enriched indicators include C2 domains, malware families, behavioral clusters, sandbox detonation traces, and network telemetry correlation. [SEG-A-END] [SEG-B-BEGIN] Further analysis revealed that the payload exhibited persistence techniques including registry autoruns, scheduled tasks, masquerading, process injection, and lateral movement attempts leveraging remote service creation and stolen credentials. The binary contains multiple obfuscation layers, anti-debugging, anti-VM checks, and unusual API call sequences. [SEG-B-END] [SEG-C-BEGIN] IOC Bundle: Domains=malicious-domain-example-01.com,malicious-domain-example-02.net,malicious-update-service.info; IPs=103.21.244.0,198.51.100.55,203.0.113.77; FileNames=update_service.exe,winlog_service.dll,mscore_update.bin; RegistryKeys=HKCU\\\\\\\\Software\\\\\\\\Microsoft\\\\\\\\Windows\\\\\\\\CurrentVersion\\\\\\\\Run,HKLM\\\\\\\\System\\\\\\\\Services\\\\\\\\FakeService; Mutex=Global\\\\\\\\A1B2C3D4E5F6G7H8; YARA Matches=[rule1,rule2,rule3]. [SEG-C-END] EndOfExtraInfo\",
	"Hostname": "USG1000E",
	"ModuleName": "01ANTI-APT",
	"Policy": "trust-untrust-high-risk-policy-with-deep-inspection-and-layer7-protection-enabled-for-advanced-threat-detection",
	"Profile": "IPS_default_advanced_extended_profile_with_ml_detection-long",
	"Protocol": "TCP",
	"SeverityHeader": "4",
	"SrcIp": "192.168.1.123",
	"SrcPort": "51784",
	"SrcZone": "trust-zone-with-multiple-segments-for-internal-security-domains-and-access-control",
	"SyslogId": "1",
	"ThreatName": "bbb-advanced-threat-campaign-with-code-name-operation-shadow-storm-and-related-IOCs-collected-over-multiple-incidents-in-the-wild-attached-metadata-[phase1][phase2][phase3]",
	"ThreatType": "File Reputation with additional descriptive context of multi-layer analysis engine including sandbox-behavioral-signature-ml-static-analysis-and-network-correlation-modules-working-together",
	"User": "unknown-long-user-field-used-for-simulation-purpose-with-extra-description-and-tags-[tag1][tag2][tag3]-to-reach-required-size",
	"VSys": "public-long-virtual-system-name-for-testing-extra-large-value-to-simulate-enterprise-scenario",
	"count": 29,
	"extends_info": {
		"hostname": "USG1000E",
		"source_type": "socket"
	},
	"extends_ip": {
		"DstIp": "182.150.63.102",
		"SrcIp": "192.168.1.123"
	},
	"host": "127.0.0.1",
	"match_chars": "localhost",
	"num_range": 29,
	"port": 51272,
	"severity": 4,
	"source_type": "socket",
	"src_system_log_type": "日志信息",
	"symbol": "ANTI-APT",
	"timestamp": "Feb  7 2025 15:07:18+08:00",
	"type": "l"
}

File Source Benchmarks

Performance benchmarks for file-based data source scenarios using batch processing mode.

Purpose

Test file I/O and parsing performance with pre-generated data files.

Test Scenarios

| Scenario | Description | Validated Features |
|----------|-------------|--------------------|
| parse_to_blackhole | File → Parse → Discard | File reading + pure parsing throughput |
| parse_to_file | File → Parse → File | Complete file-to-file parsing pipeline |
| trans_to_blackhole | File → Parse+Transform → Discard | Parsing + OML transformation throughput |
| trans_to_file | File → Parse+Transform → File | Complete transformation pipeline |

Quick Start

cd benchmark

# Parse to blackhole (default: 20M lines, 6 workers)
./case_file/parse_to_blackhole/run.sh

# Medium dataset (200K lines)
./case_file/parse_to_blackhole/run.sh -m

# Custom configuration
./case_file/parse_to_file/run.sh -w 8 nginx

Data Flow

wpgen → gen.dat → wparse batch → sink (blackhole/file)
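
Concretely, that flow maps onto the two CLIs used throughout these cases (a minimal sketch; the sink choice comes from each case's configuration, and run.sh wraps these steps for you):

wpgen sample -n 200000     # generate test events into data/in_dat/gen.dat
wparse batch --stat 2 -p   # batch-parse them and route to the configured sink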

apt_file_to_blackhole

Case Metadata

  • Case ID: apt_file_to_blackhole
  • Category: file
  • Capability: parse_only
  • Topology: file -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: APT Threat Log (3K)
  • Average size: 3K
  • Capability: parse_only
  • Input/Output: File -> BlackHole
  • Description: APT Threat Log scenario; File input to BlackHole output, exercising the log parsing capability.

Dataset Contract

  • Input data: benchmark/case_file/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_file/parse_to_blackhole/conf/wpgen.toml (generator config)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator config (see the example below)
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable
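
For example, under the flag semantics above (the exact -c syntax is assumed from this contract, not verified), the case can be driven at either size from the benchmark directory:

./case_file/parse_to_blackhole/run.sh -m           # medium scale (200K events)
./case_file/parse_to_blackhole/run.sh -c 1000000   # explicit event count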

Configuration Binding

  • WarpParse: benchmark/case_file/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/apt; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/apt_file_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/apt_file_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_parse/apt_file_to_blackhole.conf

Execution Contract

  • Stop condition: all generated events consumed (or bounded by dataset size); add a note here if time-based termination is needed
  • Concurrency/Workers: defaults (wparse parse_workers as set in the config)
  • Repetitions: single run by default; N=3 runs with the median reported is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024 (see the worked check below)
  • CPU: cumulative multi-core percentage (e.g., 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration
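
As a quick sanity check of the formula: plugging the Linux WarpParse row below into it with the nominal 3 KiB event size gives about 380 MiB/s, the same ballpark as the measured 438.71 MiB/s (the generated APT events average slightly more than 3 KiB):

awk -v eps=129700 -v size=3072 \
    'BEGIN { printf "MPS ≈ %.2f MiB/s\n", eps * size / 1024 / 1024 }'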

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|-----------|---------|--------|----------------|-------------------|---------|
| WarpParse | 129,700 | 438.71 | 535% / 543% | 273 MB / 295 MB | 7.67x |
| Vector | 16,901 | 57.17 | 692% / 730% | 175 MB / 180 MB | 1.0x |
| Logstash | 9,009 | 30.47 | 684% / 736% | 1211 MB / 1229 MB | 0.53x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|------------|---------|---------|----------------|-------------------|---------|
| WarpParse | 328,000 | 1109.53 | 743% / 829% | 183 MB / 184 MB | 8.68x |
| Vector-VRL | 37,777 | 127.79 | 578% / 657% | 255 MB / 265 MB | 1.0x |
| Logstash | 29,940 | 101.28 | 847% / 915% | 944 MB / 1152 MB | 0.79x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the golden output (see the example below)
  • Output path convention: benchmark/case_file/parse_to_file/data/out_dat/ (switch to file output when verification is needed)
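
A minimal spot-check along those lines, assuming the layout above (the golden-field alignment itself is defined in benchmark/report/test_sample.md):

./case_file/parse_to_file/run.sh -m                    # switch to the file-output variant
head -n 3 case_file/parse_to_file/data/out_dat/*.dat   # sample records for comparison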

Notes

  • Instance spec: a TBD value does not affect the file-scenario methodology, but filling it in helps reproducibility
  • Limitations: single-machine test; distributed/HA not covered

References

  • Mac report: benchmark/report/report_mac.md#314-apt-threat-log-3K (section 3.1.4)
  • Linux report: benchmark/report/report_linux.md#314-apt-threat-log-3K (section 3.1.4)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

aws_file_to_blackhole

Case Metadata

  • Case ID: aws_file_to_blackhole
  • Category: file
  • Capability: parse_only
  • Topology: file -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: AWS ELB Log (411B)
  • Average size: 411B
  • Capability: parse_only
  • Input/Output: File -> BlackHole
  • Description: AWS ELB Log scenario; File input to BlackHole output, exercising the log parsing capability.

Dataset Contract

  • Input data: benchmark/case_file/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_file/parse_to_blackhole/conf/wpgen.toml (generator config)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator config
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_file/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/aws; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/aws_file_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/aws_file_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_parse/aws_file_to_blackhole.conf

Execution Contract

  • Stop condition: all generated events consumed (or bounded by dataset size); add a note here if time-based termination is needed
  • Concurrency/Workers: defaults (wparse parse_workers as set in the config)
  • Repetitions: single run by default; N=3 runs with the median reported is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g., 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|---------|--------|----------------|-------------------|---------|
| WarpParse | 398,800 | 156.31 | 698% / 756% | 194 MB / 366 MB | 2.82x |
| Vector-VRL | 141,600 | 55.50 | 423% / 437% | 166 MB / 170 MB | 1.0x |
| Vector-Fixed | 161,944 | 63.47 | 496% / 515% | 174 MB / 179 MB | 1.14x |
| Logstash | 87,719 | 34.38 | 514% / 532% | 1145 MB / 1170 MB | 0.62x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|-----------|--------|----------------|-------------------|---------|
| WarpParse | 1,124,500 | 440.79 | 787% / 824% | 314 MB / 320 MB | 2.89x |
| Vector-VRL | 389,000 | 152.47 | 597% / 658% | 280 MB / 297 MB | 1.0x |
| Vector-Fixed | 491,739 | 192.74 | 514% / 537% | 259 MB / 284 MB | 1.26x |
| Logstash | 208,333 | 81.66 | 394% / 506% | 983 MB / 1141 MB | 0.54x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the golden output
  • Output path convention: benchmark/case_file/parse_to_file/data/out_dat/ (switch to file output when verification is needed)

Notes

  • Instance spec: a TBD value does not affect the file-scenario methodology, but filling it in helps reproducibility
  • Limitations: single-machine test; distributed/HA not covered

References

  • Mac report: benchmark/report/report_mac.md#312-aws-elb-log-411b (section 3.1.2)
  • Linux report: benchmark/report/report_linux.md#312-aws-elb-log-411b (section 3.1.2)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

mixed_file_to_blackhole

Case Metadata

  • Case ID: mixed_file_to_blackhole
  • Category: file
  • Capability: parse_only
  • Topology: file -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Mixed Log (average size: 886B)
  • Average size: 886B
  • Capability: parse_only
  • Input/Output: File -> BlackHole
  • Description: Mixed Log scenario; File input to BlackHole output, exercising the log parsing capability.

Dataset Contract

  • Input data: benchmark/case_file/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_file/parse_to_blackhole/conf/wpgen.toml (generator config)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator config
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: 3:2:1:1 (nginx:aws:firewall:apt)

Configuration Binding

  • WarpParse: benchmark/case_file/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/mixed; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/mixed_file_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/mixed_file_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_parse/mixed_file_to_blackhole.conf

Execution Contract

  • Stop condition: all generated events consumed (or bounded by dataset size); add a note here if time-based termination is needed
  • Concurrency/Workers: defaults (wparse parse_workers as set in the config)
  • Repetitions: single run by default; N=3 runs with the median reported is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g., 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|---------|--------|----------------|-------------------|---------|
| WarpParse | 286,000 | 241.66 | 632% / 736% | 271 MB / 374 MB | 5.56x |
| Vector-VRL | 51,446 | 43.47 | 494% / 692% | 228 MB / 249 MB | 1.00x |
| Vector-Fixed | 52,530 | 44.39 | 500% / 696% | 182 MB / 201 MB | 1.02x |
| Logstash | 21,505 | 18.17 | 400% / 444% | 1136 MB / 1163 MB | 0.42x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|---------|--------|----------------|-------------------|---------|
| WarpParse | 715,000 | 604.14 | 860% / 868% | 246 MB / 254 MB | 3.76x |
| Vector-VRL | 190,000 | 160.54 | 827% / 880% | 281 MB / 329 MB | 1.0x |
| Vector-Fixed | 197,073 | 166.52 | 825% / 903% | 237 MB / 250 MB | 1.04x |
| Logstash | 109,890 | 86.43 | 746% / 955% | 1271 MB / 1292 MB | 0.62x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the golden output
  • Output path convention: benchmark/case_file/parse_to_file/data/out_dat/ (switch to file output when verification is needed)

Notes

  • Instance spec: a TBD value does not affect the file-scenario methodology, but filling it in helps reproducibility
  • Limitations: single-machine test; distributed/HA not covered

References

  • Mac report: benchmark/report/report_mac.md#315-mixed-log-平均日志大小886b (section 3.1.5)
  • Linux report: benchmark/report/report_linux.md#315-mixed-log-平均日志大小886b (section 3.1.5)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

nginx_file_to_blackhole

Case Metadata

  • Case ID: nginx_file_to_blackhole
  • Category: file
  • Capability: parse_only
  • Topology: file -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Nginx Access Log (239B)
  • Average size: 239B
  • Capability: parse_only
  • Input/Output: File -> BlackHole
  • Description: Nginx Access Log scenario; File input to BlackHole output, exercising the log parsing capability.

Dataset Contract

  • Input data: benchmark/case_file/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_file/parse_to_blackhole/conf/wpgen.toml (generator config)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator config
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_file/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/nginx; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/nginx_file_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/nginx_file_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_parse/nginx_file_to_blackhole.conf

Execution Contract

  • Stop condition: all generated events consumed (or bounded by dataset size)
  • Concurrency/Workers: defaults (wparse parse_workers as set in the config)
  • Repetitions: single run by default; N=3 runs with the median reported is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g., 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|---------|--------|----------------|-------------------|---------|
| WarpParse | 810,100 | 184.65 | 626% / 639% | 115 MB / 314 MB | 3.83x |
| Vector-VRL | 211,250 | 48.15 | 292% / 305% | 148 MB / 153 MB | 1.0x |
| Vector-Fixed | 170,666 | 38.90 | 431% / 451% | 141 MB / 151 MB | 0.81x |
| Logstash | 106,382 | 24.25 | 436% / 461% | 1144 MB / 1175 MB | 0.50x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|-----------|--------|----------------|-------------------|---------|
| WarpParse | 2,789,800 | 635.86 | 768% / 858% | 126 MB / 130 MB | 4.88x |
| Vector-VRL | 572,076 | 130.39 | 298% / 320% | 222 MB / 241 MB | 1.0x |
| Vector-Fixed | 513,181 | 116.97 | 466% / 538% | 232 MB / 245 MB | 0.90x |
| Logstash | 270,270 | 61.60 | 308% / 418% | 1092 MB / 1115 MB | 0.47x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the golden output
  • Output path convention: benchmark/case_file/parse_to_file/data/out_dat/ (switch to file output when verification is needed)

Notes

  • Limitations: single-machine test; distributed/HA not covered

References

  • Mac report: benchmark/report/report_mac.md#311-nginx-access-log-239b (section 3.1.1)
  • Linux report: benchmark/report/report_linux.md#311-nginx-access-log-239b (section 3.1.1)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

sysmon_file_to_blackhole

This test validates engine performance in the sysmon_file_to_blackhole scenario (Mac M4 Mini, log parsing).

Scenario

In short: Sysmon 1K JSON logs, File input to BlackHole output, exercising the log parsing capability.

The test evaluates how the WarpParse and Vector engines perform on Sysmon 1K JSON log processing.

Design

  1. Goal: compare the two engines' throughput, latency, and CPU/memory usage on this log type.
  2. Input: File source, covering high-concurrency / high-throughput stress scenarios.
  3. Output: BlackHole sink, used to validate end-to-end pipeline throughput.
  4. Expected behavior:
  • High-frequency logs are consumed steadily, with no loss or reordering, and fields are extracted correctly.
  • The pipeline keeps the source saturated on the given input/output path without noticeable backpressure.
  • Monitoring metrics are collected normally for later comparison.

Results (Mac M4 Mini)

| Engine | Input | Output | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) |
|-----------|------|-----------|---------|--------|---------------------|------------------------|
| WarpParse | File | BlackHole | 440,000 | 413.74 | 852.01% / 943.50% | 223.52 MB / 338.05 MB |
| Vector | File | BlackHole | 76,717 | 72.14 | 462.81% / 563.70% | 294.87 MB / 312.77 MB |

Conclusions

Comparing WarpParse and Vector in this scenario (Sysmon 1K JSON logs, File input to BlackHole output, log parsing):

  1. Throughput: WarpParse shows a clear advantage.

    • It reaches 440,000 EPS, roughly 5.74x Vector's 76,717 EPS.
    • On the same hardware, WarpParse can therefore absorb a much larger data volume.
  2. Resource usage:

    • CPU: WarpParse uses more CPU (852.01% vs 462.81%).
    • Memory: WarpParse uses less memory (223.52 MB vs 294.87 MB).

Summary: WarpParse delivers outstanding throughput in this scenario (about 5.74x faster) while keeping a lower memory footprint. For throughput-oriented log processing, WarpParse is the better choice.

apt_file_to_blackhole

Case Metadata

  • Case ID: apt_file_to_blackhole
  • Category: file
  • Capability: parse_trans
  • Topology: file -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: APT Threat Log (3K)
  • Average size: 3K
  • Capability: parse_trans
  • Input/Output: File -> BlackHole
  • Description: APT Threat Log scenario; File input to BlackHole output, exercising the log parsing + transformation capability.

Dataset Contract

  • Input data: benchmark/case_file/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_file/parse_to_blackhole/conf/wpgen.toml (generator config)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator config
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_file/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/apt; parse + transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/apt_file_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/apt_file_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_trans/apt_file_to_blackhole.conf

Execution Contract

  • Stop condition: all generated events consumed (or bounded by dataset size); add a note here if time-based termination is needed
  • Concurrency/Workers: defaults (wparse parse_workers as set in the config)
  • Repetitions: single run by default; N=3 runs with the median reported is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g., 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|-----------|---------|--------|----------------|-------------------|---------|
| WarpParse | 123,100 | 416.38 | 599% / 607% | 199 MB / 265 MB | 7.65x |
| Vector | 16,093 | 54.43 | 674% / 742% | 188 MB / 199 MB | 1.0x |
| Logstash | 7,633 | 25.82 | 657% / 732% | 1174 MB / 1197 MB | 0.47x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|------------|---------|---------|----------------|-------------------|---------|
| WarpParse | 299,400 | 1012.79 | 763% / 855% | 155 MB / 162 MB | 8.12x |
| Vector-VRL | 36,857 | 124.68 | 567% / 654% | 268 MB / 286 MB | 1.0x |
| Logstash | 26,315 | 89.02 | 852% / 901% | 1256 MB / 1305 MB | 0.71x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the golden output
  • Output path convention: benchmark/case_file/parse_to_file/data/out_dat/ (switch to file output when verification is needed)

Notes

  • Instance spec: a TBD value does not affect the file-scenario methodology, but filling it in helps reproducibility
  • Limitations: single-machine test; distributed/HA not covered

References

  • Mac report: benchmark/report/report_mac.md#324-apt-threat-log-3K (section 3.2.4)
  • Linux report: benchmark/report/report_linux.md#324-apt-threat-log-3K (section 3.2.4)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

aws_file_to_blackhole

Case Metadata

  • Case ID: aws_file_to_blackhole
  • Category: file
  • Capability: parse_trans
  • Topology: file -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: AWS ELB Log (411B)
  • Average size: 411B
  • Capability: parse_trans
  • Input/Output: File -> BlackHole
  • Description: AWS ELB Log scenario; File input to BlackHole output, exercising the log parsing + transformation capability.

Dataset Contract

  • Input data: benchmark/case_file/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_file/parse_to_blackhole/conf/wpgen.toml (generator config)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator config
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_file/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/aws; parse + transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/aws_file_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/aws_file_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_trans/aws_file_to_blackhole.conf

Execution Contract

  • Stop condition: all generated events consumed (or bounded by dataset size); add a note here if time-based termination is needed
  • Concurrency/Workers: defaults (wparse parse_workers as set in the config)
  • Repetitions: single run by default; N=3 runs with the median reported is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g., 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|---------|--------|----------------|-------------------|---------|
| WarpParse | 275,900 | 108.14 | 649% / 719% | 236 MB / 327 MB | 2.22x |
| Vector-VRL | 124,333 | 48.73 | 523% / 560% | 190 MB / 199 MB | 1.0x |
| Vector-Fixed | 141,818 | 55.59 | 514% / 529% | 179 MB / 191 MB | 1.14x |
| Logstash | 54,054 | 21.19 | 582% / 653% | 1155 MB / 1217 MB | 0.43x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|---------|--------|----------------|-------------------|---------|
| WarpParse | 913,300 | 358.00 | 880% / 942% | 228 MB / 248 MB | 2.64x |
| Vector-VRL | 345,500 | 135.42 | 548% / 649% | 291 MB / 309 MB | 1.0x |
| Vector-Fixed | 446,111 | 174.86 | 506% / 597% | 276 MB / 295 MB | 1.29x |
| Logstash | 147,058 | 57.64 | 525% / 701% | 1121 MB / 1170 MB | 0.43x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the golden output
  • Output path convention: benchmark/case_file/parse_to_file/data/out_dat/ (switch to file output when verification is needed)

Notes

  • Instance spec: a TBD value does not affect the file-scenario methodology, but filling it in helps reproducibility
  • Limitations: single-machine test; distributed/HA not covered

References

  • Mac report: benchmark/report/report_mac.md#322-aws-elb-log-411b (section 3.2.2)
  • Linux report: benchmark/report/report_linux.md#322-aws-elb-log-411b (section 3.2.2)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

mixed_file_to_blackhole

Case Metadata

  • Case ID: mixed_file_to_blackhole
  • Category: file
  • Capability: parse_trans
  • Topology: file -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Mixed Log (average size: 886B)
  • Average size: 886B
  • Capability: parse_trans
  • Input/Output: File -> BlackHole
  • Description: Mixed Log scenario; File input to BlackHole output, exercising the log parsing + transformation capability.

Dataset Contract

  • Input data: benchmark/case_file/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_file/parse_to_blackhole/conf/wpgen.toml (generator config)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator config
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: 3:2:1:1 (nginx:aws:firewall:apt)

Configuration Binding

  • WarpParse: benchmark/case_file/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/mixed; parse + transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/mixed_file_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/mixed_file_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_trans/mixed_file_to_blackhole.conf

Execution Contract

  • Stop condition: all generated events consumed (or bounded by dataset size); add a note here if time-based termination is needed
  • Concurrency/Workers: defaults (wparse parse_workers as set in the config)
  • Repetitions: single run by default; N=3 runs with the median reported is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g., 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|---------|--------|----------------|-------------------|---------|
| WarpParse | 204,400 | 172.71 | 566% / 663% | 196 MB / 265 MB | 4.45x |
| Vector-VRL | 45,909 | 38.79 | 469% / 683% | 204 MB / 225 MB | 1.00x |
| Vector-Fixed | 48,484 | 40.97 | 541% / 714% | 178 MB / 209 MB | 1.06x |
| Logstash | 32,967 | 27.86 | 573% / 685% | 1150 MB / 1172 MB | 0.72x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|---------|--------|----------------|-------------------|---------|
| WarpParse | 659,700 | 557.42 | 889% / 940% | 170 MB / 184 MB | 3.80x |
| Vector-VRL | 173,750 | 146.81 | 784% / 860% | 278 MB / 299 MB | 1.0x |
| Vector-Fixed | 178,261 | 150.62 | 772% / 836% | 273 MB / 298 MB | 1.03x |
| Logstash | 50,505 | 42.67 | 911% / 939% | 1249 MB / 1276 MB | 0.29x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the golden output
  • Output path convention: benchmark/case_file/parse_to_file/data/out_dat/ (switch to file output when verification is needed)

Notes

  • Instance spec: a TBD value does not affect the file-scenario methodology, but filling it in helps reproducibility
  • Limitations: single-machine test; distributed/HA not covered

References

  • Mac report: benchmark/report/report_mac.md#325-mixed-log-平均日志大小886b (section 3.2.5)
  • Linux report: benchmark/report/report_linux.md#325-mixed-log-平均日志大小886b (section 3.2.5)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

nginx_file_to_blackhole

Case Metadata

  • Case ID: nginx_file_to_blackhole
  • Category: file
  • Capability: parse_trans
  • Topology: file -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Nginx Access Log (239B)
  • Average size: 239B
  • Capability: parse_trans
  • Input/Output: File -> BlackHole
  • Description: Nginx Access Log scenario; File input to BlackHole output, exercising the log parsing + transformation capability.

Dataset Contract

  • Input data: benchmark/case_file/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_file/parse_to_blackhole/conf/wpgen.toml (generator config)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator config
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_file/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/nginx; parse + transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/nginx_file_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/nginx_file_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_trans/nginx_file_to_blackhole.conf

Execution Contract

  • Stop condition: all generated events consumed (or bounded by dataset size); add a note here if time-based termination is needed
  • Concurrency/Workers: defaults (wparse parse_workers as set in the config)
  • Repetitions: single run by default; N=3 runs with the median reported is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g., 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|---------|--------|----------------|-------------------|---------|
| WarpParse | 656,800 | 149.71 | 688% / 768% | 220 MB / 357 MB | 3.27x |
| Vector-VRL | 201,000 | 45.81 | 339% / 350% | 167 MB / 175 MB | 1.0x |
| Vector-Fixed | 153,333 | 34.95 | 466% / 481% | 159 MB / 168 MB | 0.76x |
| Logstash | 76,923 | 17.53 | 470% / 483% | 1126 MB / 1160 MB | 0.38x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------------|-----------|--------|----------------|-------------------|---------|
| WarpParse | 2,162,500 | 492.91 | 821% / 911% | 209 MB / 222 MB | 3.77x |
| Vector-VRL | 572,941 | 130.59 | 344% / 378% | 274 MB / 286 MB | 1.0x |
| Vector-Fixed | 482,000 | 109.86 | 554% / 612% | 252 MB / 261 MB | 0.84x |
| Logstash | 227,272 | 51.80 | 359% / 548% | 1109 MB / 1143 MB | 0.40x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the golden output
  • Output path convention: benchmark/case_file/parse_to_file/data/out_dat/ (switch to file output when verification is needed)

Notes

  • Instance spec: a TBD value does not affect the file-scenario methodology, but filling it in helps reproducibility
  • Limitations: single-machine test; distributed/HA not covered

References

  • Mac report: benchmark/report/report_mac.md#321-nginx-access-log-239b (section 3.2.1)
  • Linux report: benchmark/report/report_linux.md#321-nginx-access-log-239b (section 3.2.1)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

sysmon_file_to_blackhole

File Parse to Blackhole

Benchmark for “File Source → Blackhole Sink” batch processing scenario: uses wpgen to generate test data files, wparse reads and parses in batch mode, outputs to blackhole to test pure parsing throughput performance.

Purpose

Validate the ability to:

  • Read data from files in batch mode
  • Apply WPL parsing rules
  • Measure pure parsing throughput (no I/O overhead from output)

Features Validated

| Feature | Description |
|---------|-------------|
| File Source | Reading from pre-generated data files |
| Batch Processing | wparse batch mode execution |
| Blackhole Sink | Discarding output to measure pure parsing |
| Multi-file Input | Dual file sources (gen.dat, gen1.dat) |

Quick Start

cd benchmark/case_file/parse_to_blackhole

# Default test (20M lines, 6 workers)
./run.sh

# Medium dataset (200K lines)
./run.sh -m

# Custom configuration
./run.sh -w 8 -f nginx

Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| -m | Medium dataset | 20M → 200K lines |
| -f | Force regenerate data | Smart detection |
| -w <cnt> | Worker count | 6 |
| wpl_dir | WPL rule directory | nginx |
| speed | Generation rate limit | 0 (unlimited) |

Data Flow

wpgen 1 → gen.dat  ─┐
                    ├→ wparse batch → blackhole sink
wpgen 2 → gen1.dat ─┘

file_blackhole Notes

This case demonstrates the "File Source → Blackhole Sink" batch-processing benchmark: wpgen generates test data files, wparse reads and parses them in batch mode, and output goes to a blackhole sink to measure pure parsing throughput.

Directory Structure

benchmark/file_blackhole/
├── README.md                    # this document
├── run.sh                       # benchmark run script
├── conf/                        # configuration directory
│   ├── wparse.toml             # WarpParse main config
│   ├── wpgen.toml              # first file-generator config
│   └── wpgen1.toml             # second file-generator config
├── data/                        # runtime data directory
│   ├── in_dat/                 # input data
│   │   ├── gen.dat            # first generated stream
│   │   └── gen1.dat           # second generated stream
│   ├── out_dat/                # output data
│   │   ├── error.dat          # error records
│   │   ├── miss.dat           # missed records
│   │   ├── monitor.dat        # monitoring records
│   │   └── residue.dat        # residue records
│   ├── logs/                   # log files
│   └── rescue/                 # rescue data
└── .run/                        # runtime state
    └── rule_mapping.dat        # rule-mapping data

Quick Start

Requirements

  • WarpParse engine (on the system PATH)
  • Bash shell
  • Recommended systems:
    • Linux: best performance; all optimizations supported
    • macOS: good performance; some optimizations limited

Commands

# enter the benchmark directory
cd benchmark/file_blackhole

# default large-scale run (20M lines)
./run.sh

# medium-scale run (200K lines)
./run.sh -m

# force data regeneration (even if files exist)
./run.sh -f

# set the worker count
./run.sh -w 8

# use a specific WPL rule set
./run.sh nginx

# combined
./run.sh -m -w 8 -f nginx

Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| -m | Use the medium dataset | 20M → 200K lines |
| -f | Force data regeneration | Smart detection |
| -w <cnt> | Worker count | 6 |
| wpl_dir | WPL rule directory | nginx |
| speed | Generator rate limit (lines/s) | 0 (unlimited) |

Test Options

  • Default run: 20M lines, dual file sources, 6 workers
  • Medium run: 200K lines, good for quick validation
  • Forced regeneration: -f regenerates the test data
  • Custom WPL: nginx, apache, sysmon, and other rule sets are supported

Execution Logic

Overview

The run.sh script performs the following main steps (sketched in the shell outline after this list):

  1. Environment preparation

    • Load the shared benchmark function library
    • Parse command-line arguments
    • Set defaults (large: 20M lines; medium: 200K lines)
  2. Data-generation check

    • Check whether data/in_dat/gen.dat and data/in_dat/gen1.dat exist
    • Generate new data if they are missing or -f was given
  3. Data generation (when needed)

    • Start wpgen to write the first stream to gen.dat
    • Start wpgen to write the second stream to gen1.dat
    • Generators run concurrently for speed
  4. Batch execution

    • Read the files with wparse batch
    • Parse with the WPL rules
    • Send output to the blackhole sink (discarded)
  5. Performance statistics

    • Show live progress
    • Record throughput, elapsed time, and other metrics
    • Print the final performance report
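
A minimal shell sketch of that flow, under the assumptions in the Parameters table above (this is an outline, not the shipped run.sh; the real script also loads the shared benchmark library, and the worker count takes effect through conf/wparse.toml rather than a wparse flag):

#!/usr/bin/env bash
set -euo pipefail
LINES=20000000; FORCE=0; WORKERS=6; WPL_DIR=nginx
while getopts "mfw:" opt; do
  case "$opt" in
    m) LINES=200000 ;;        # -m: medium dataset
    f) FORCE=1 ;;             # -f: force data regeneration
    w) WORKERS="$OPTARG" ;;   # -w <cnt>: worker count
  esac
done
shift $((OPTIND - 1)); [ "$#" -gt 0 ] && WPL_DIR="$1"

# steps 2-3: regenerate both inputs when missing or forced
if [ "$FORCE" -eq 1 ] || [ ! -s data/in_dat/gen.dat ] || [ ! -s data/in_dat/gen1.dat ]; then
  wpgen sample -n "$LINES" &   # first stream  -> gen.dat  (conf/wpgen.toml)
  wpgen sample -n "$LINES" &   # second stream -> gen1.dat (conf/wpgen1.toml)
  wait
fi

# steps 4-5: batch-parse to the blackhole sink, printing stats every 2 s
wparse batch --stat 2 -p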

Data Flow

wpgen generator 1     wpgen generator 2
       ↓                    ↓
   gen.dat              gen1.dat
       ↓                    ↓
┌────────────────────────────────┐
│         wparse batch           │
│  - read file data              │
│  - apply WPL parsing rules     │
│  - dispatch to sinks           │
└────────────────────────────────┘
    ↓
┌─────────────┬─────────────┐
│  blackhole  │   monitor   │
│    sink     │    sink     │
│ (discards)  │  (stats)    │
└─────────────┴─────────────┘

Verification & Troubleshooting

Verifying a Successful Run

  1. Check the performance output

    • Watch the live statistics in the terminal
    • Focus on the "Throughput" metric
    • Confirm there are no errors or anomalies
  2. Verify the output files

    # check the monitoring data
    ls -la data/out_dat/monitor.dat

    # confirm the other files are empty (no errors)
    ls -la data/out_dat/{error,miss,residue}.dat


Performance

Tuning

  1. System-level tuning

    Linux:

    # pin CPU affinity
    taskset -c 0-5 ./run.sh -w 6

    # I/O scheduler tuning (SSD)
    echo noop | sudo tee /sys/block/sdX/queue/scheduler

    # raise the file-descriptor limit
    ulimit -n 65536


    macOS:

    # raise the file-descriptor limit
    ulimit -n 65536

    # adjust kernel parameters (requires admin privileges)
    sudo sysctl -w kern.maxfiles=65536
    sudo sysctl -w kern.maxfilesperproc=65536

  2. Application-level tuning

    • Increase the worker count: -w 12 (no more than the CPU core count)
    • Use faster WPL rules (e.g., nginx)
    • Pre-generate and cache the data
  3. Storage tuning

    • Use SSD storage
    • Use RAID 0 for higher read/write throughput
    • Consider an in-memory filesystem

Influencing Factors

  1. WPL rule complexity

    • nginx: simple regexes; best performance
    • apache: medium complexity
    • sysmon: complex rules; lower performance
  2. Data characteristics

    • Log line length
    • Regex matching complexity
    • Number of extracted fields
  3. System configuration

    • CPU core count and frequency
    • Memory size and speed
    • Disk I/O performance (the key factor)

Choosing a Case

  • Choose file_blackhole when

    • You want pure parsing performance
    • You process data in batches
    • You are chasing maximum throughput
  • Choose tcp_blackhole when

    • You need reliable network transport
    • You process data in real time
    • You want to simulate a TCP source
  • Choose syslog_blackhole when

    • You integrate with traditional syslog
    • You run extreme performance tests
    • You target log-collection scenarios

Last updated: 2025-12-16

File Parse to File

Benchmark for “File Source → File Sink” batch processing scenario: uses wpgen to generate test data files, wparse reads and parses in batch mode, outputs to files to test complete data processing pipeline performance.

Purpose

Validate the ability to:

  • Read data from files in batch mode
  • Apply WPL parsing rules
  • Write parsed output to files
  • Measure complete file-to-file pipeline throughput

Features Validated

| Feature | Description |
|---------|-------------|
| File Source | Reading from pre-generated data files |
| Batch Processing | wparse batch mode execution |
| File Sink | Writing parsed output to files |
| Complete Pipeline | Full file-to-file transformation |

Quick Start

cd benchmark/case_file/parse_to_file

# Default test (20M lines, 2 workers)
./run.sh

# Medium dataset (200K lines)
./run.sh -m

# Custom configuration
./run.sh -w 4 -f nginx

Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| -m | Medium dataset | 20M → 200K lines |
| -f | Force regenerate data | Smart detection |
| -w <cnt> | Worker count | 2 |
| wpl_dir | WPL rule directory | nginx |
| speed | Generation rate limit | 0 (unlimited) |

Data Flow

wpgen → gen.dat → wparse batch → all.dat (output file)

file_file Notes

This case demonstrates the "File Source → File Sink" benchmark: wpgen generates test data files, wparse reads and parses them in batch mode, and output is written to files to measure the complete processing pipeline.

Directory Structure

benchmark/file_file/
├── README.md                    # this document
├── run.sh                       # benchmark run script
├── record.md                    # run-record document
├── conf/                        # configuration directory
│   ├── wparse.toml             # WarpParse main config
│   └── wpgen.toml              # file-generator config
├── models/                      # model configuration directory
│   ├── sinks/                  # sink configuration
│   │   ├── defaults.toml       # default settings
│   │   ├── business.d/         # business group configs
│   │   │   └── all.toml        # file sink group config
│   │   └── infra.d/            # infrastructure group configs
│   │       ├── default.toml    # default sink
│   │       ├── error.toml      # error-data handling
│   │       ├── miss.toml       # missed-data handling
│   │       ├── monitor.toml    # monitoring-data handling
│   │       └── residue.toml    # residue-data handling
│   ├── sources/                # source configuration
│   │   └── wpsrc.toml          # file source config
│   ├── wpl/                    # WPL parsing rules
│   │   ├── nginx/              # Nginx log rules
│   │   ├── apache/             # Apache log rules
│   │   └── sysmon/             # system-monitoring rules
│   ├── oml/                    # OML transformation rules (empty)
│   └── knowledge/              # knowledge base (empty)
├── data/                        # runtime data directory
│   ├── in_dat/                 # input data
│   │   └── gen.dat            # generated data file
│   ├── out_dat/                # output data
│   │   ├── all.dat            # main output file
│   │   ├── error.dat          # error records
│   │   ├── miss.dat           # missed records
│   │   ├── monitor.dat        # monitoring records
│   │   └── residue.dat        # residue records
│   ├── logs/                   # log files
│   └── rescue/                 # rescue data
└── .run/                       # runtime state
    └── rule_mapping.dat        # rule-mapping data

Quick Start

Requirements

  • WarpParse engine (on the system PATH)
  • Bash shell
  • Recommended systems:
    • Linux: best performance; all optimizations supported
    • macOS: good performance; some optimizations limited

Commands

# enter the benchmark directory
cd benchmark/file_file

# default large-scale run (20M lines)
./run.sh

# medium-scale run (200K lines)
./run.sh -m

# force data regeneration (even if files exist)
./run.sh -f

# set the worker count
./run.sh -w 8

# use a specific WPL rule set
./run.sh nginx

# combined
./run.sh -m -w 8 -f nginx

Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| -m | Use the medium dataset | 20M → 200K lines |
| -f | Force data regeneration | Smart detection |
| -w <cnt> | Worker count | 2 |
| wpl_dir | WPL rule directory | nginx |
| speed | Generator rate limit (lines/s) | 0 (unlimited) |

Test Options

  • Default run: 20M lines, single file source, 2 workers
  • Medium run: 200K lines, good for quick validation
  • Forced regeneration: -f regenerates the test data
  • Custom WPL: nginx, apache, sysmon, and other rule sets are supported

Execution Logic

Overview

The run.sh script performs the following main steps:

  1. Environment preparation

    • Load the shared benchmark function library
    • Parse command-line arguments
    • Set defaults (large: 20M lines; medium: 200K lines)
  2. Environment initialization

    • Initialize the release-mode environment
    • Validate the given WPL rule path
    • Clean up old data and logs
  3. Data-generation check

    • Check whether data/in_dat/gen.dat exists
    • Generate new data if it is missing or -f was given
  4. Data generation (when needed)

    • Start wpgen to write data to gen.dat
    • Concurrent, batched generation is supported
  5. Batch execution

    • Read the file with wparse batch
    • Parse with the WPL rules
    • Write output to the file sink (all.dat)
  6. Performance statistics

    • Show live progress
    • Record throughput, elapsed time, and other metrics
    • Print the final performance report

Data Flow

wpgen generator
       ↓
   gen.dat (input file)
       ↓
┌────────────────────────────────┐
│         wparse batch           │
│  - read file data              │
│  - apply WPL parsing rules     │
│  - dispatch to sinks           │
└────────────────────────────────┘
    ↓
┌─────────────┬─────────────┐
│   all.dat   │   monitor   │
│ (main out)  │    sink     │
│             │  (stats)    │
└─────────────┴─────────────┘
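
Because this case writes real output (unlike the blackhole variant), a quick post-run check is possible; paths follow the directory layout above:

wc -l data/out_dat/all.dat                      # record count should match the generated input
head -n 1 data/out_dat/all.dat                  # eyeball one parsed record
ls -la data/out_dat/{error,miss,residue}.dat    # should stay empty on a clean run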

Choosing a Case

  • Choose file_file when

    • You want to test the complete processing pipeline
    • You need to verify output format and content
    • You target file-to-file transformation
  • Choose file_blackhole when

    • You only care about parsing performance
    • You process data in batches
    • You are chasing maximum throughput
  • Choose tcp_blackhole when

    • You have a network data source
    • You process data in real time
    • You need reliable transport
  • Choose syslog_blackhole when

    • You integrate logs via syslog
    • You receive high-frequency data
    • You target traditional syslog scenarios

Contributing & Feedback

If you find problems or have performance-tuning suggestions, please:

  1. Submit detailed environment information (CPU, memory, storage configuration)
  2. Include the full performance report and I/O statistics
  3. Share storage-tuning experience and best practices
  4. Provide performance comparisons across file formats

Last updated: 2025-12-16

Record

File Trans to Blackhole

Benchmark for “File Source → Blackhole Sink” transformation scenario: uses wpgen to generate test data files, wparse reads in batch mode with WPL parsing and OML transformation, outputs to blackhole to test parsing + transformation throughput.

Purpose

Validate the ability to:

  • Read data from files in batch mode
  • Apply WPL parsing rules
  • Apply OML transformation models
  • Measure parsing + transformation throughput

Features Validated

| Feature | Description |
|---------|-------------|
| File Source | Reading from pre-generated data files |
| WPL Parsing | Applying parsing rules |
| OML Transformation | Applying transformation models |
| Blackhole Sink | Discarding output to measure throughput |

Quick Start

cd benchmark/case_file/trans_to_blackhole

# Default test (20M lines, 6 workers)
./run.sh

# Medium dataset (200K lines)
./run.sh -m

Data Flow

wpgen → gen.dat → wparse batch (parse + OML) → blackhole sink

file_trans_blackhole Notes

This case demonstrates the "File Source → Blackhole Sink" transformation benchmark: wpgen generates test data files, wparse reads them in batch mode, applies WPL parsing and OML transformation, and sends output to a blackhole sink to measure parse + transform throughput.

Directory Structure

benchmark/file_blackhole/
├── README.md                    # this document
├── run.sh                       # benchmark run script
├── conf/                        # configuration directory
│   ├── wparse.toml             # WarpParse main config
│   ├── wpgen.toml              # first file-generator config
│   └── wpgen1.toml             # second file-generator config
├── models/                      # model configuration directory
│   ├── sinks/                  # sink configuration
│   │   ├── defaults.toml       # default settings
│   │   ├── business.d/         # business group configs
│   │   │   └── all.toml        # blackhole sink group config
│   │   └── infra.d/            # infrastructure group configs
│   │       ├── default.toml    # default sink
│   │       ├── error.toml      # error-data handling
│   │       ├── miss.toml       # missed-data handling
│   │       ├── monitor.toml    # monitoring-data handling
│   │       └── residue.toml    # residue-data handling
│   ├── sources/                # source configuration
│   │   └── wpsrc.toml          # file source config
│   ├── wpl/                    # WPL parsing rules
│   │   ├── nginx/              # Nginx log rules
│   │   ├── apache/             # Apache log rules
│   │   └── sysmon/             # system-monitoring rules
│   ├── oml/                    # OML transformation rules (empty)
│   └── knowledge/              # knowledge base (empty)
├── data/                        # runtime data directory
│   ├── in_dat/                 # input data
│   │   ├── gen.dat            # first generated stream
│   │   └── gen1.dat           # second generated stream
│   ├── out_dat/                # output data
│   │   ├── error.dat          # error records
│   │   ├── miss.dat           # missed records
│   │   ├── monitor.dat        # monitoring records
│   │   └── residue.dat        # residue records
│   ├── logs/                   # log files
│   └── rescue/                 # rescue data
├── out/                         # output directory
└── .run/                        # runtime state
    └── rule_mapping.dat        # rule-mapping data

Quick Start

Requirements

  • WarpParse engine (on the system PATH)
  • Bash shell
  • Recommended systems:
    • Linux: best performance; all optimizations supported
    • macOS: good performance; some optimizations limited

Commands

# enter the benchmark directory
cd benchmark/file_blackhole

# default large-scale run (20M lines)
./run.sh

# medium-scale run (200K lines)
./run.sh -m

# force data regeneration (even if files exist)
./run.sh -f

# set the worker count
./run.sh -w 8

# use a specific WPL rule set
./run.sh nginx

# combined
./run.sh -m -w 8 -f nginx

Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| -m | Use the medium dataset | 20M → 200K lines |
| -f | Force data regeneration | Smart detection |
| -w <cnt> | Worker count | 6 |
| wpl_dir | WPL rule directory | nginx |
| speed | Generator rate limit (lines/s) | 0 (unlimited) |

Test Options

  • Default run: 20M lines, dual file sources, 6 workers
  • Medium run: 200K lines, good for quick validation
  • Forced regeneration: -f regenerates the test data
  • Custom WPL: nginx, apache, sysmon, and other rule sets are supported

Execution Logic

Overview

The run.sh script performs the following main steps:

  1. Environment preparation

    • Load the shared benchmark function library
    • Parse command-line arguments
    • Set defaults (large: 20M lines; medium: 200K lines)
  2. Data-generation check

    • Check whether data/in_dat/gen.dat and data/in_dat/gen1.dat exist
    • Generate new data if they are missing or -f was given
  3. Data generation (when needed)

    • Start wpgen to write the first stream to gen.dat
    • Start wpgen to write the second stream to gen1.dat
    • Generators run concurrently for speed
  4. Batch execution

    • Read the files with wparse batch
    • Parse with the WPL rules and apply the OML transformations
    • Send output to the blackhole sink (discarded)
  5. Performance statistics

    • Show live progress
    • Record throughput, elapsed time, and other metrics
    • Print the final performance report

Data Flow

wpgen generator 1     wpgen generator 2
       ↓                    ↓
   gen.dat              gen1.dat
       ↓                    ↓
┌────────────────────────────────┐
│         wparse batch           │
│  - read file data              │
│  - apply WPL parsing rules     │
│  - dispatch to sinks           │
└────────────────────────────────┘
    ↓
┌─────────────┬─────────────┐
│  blackhole  │   monitor   │
│    sink     │    sink     │
│ (discards)  │  (stats)    │
└─────────────┴─────────────┘

Verification & Troubleshooting

Verifying a Successful Run

  1. Check the performance output

    • Watch the live statistics in the terminal
    • Focus on the "Throughput" metric
    • Confirm there are no errors or anomalies
  2. Verify the output files

    # check the monitoring data
    ls -la data/out_dat/monitor.dat

    # confirm the other files are empty (no errors)
    ls -la data/out_dat/{error,miss,residue}.dat


Performance

Tuning

  1. System-level tuning

    Linux:

    # pin CPU affinity
    taskset -c 0-5 ./run.sh -w 6

    # I/O scheduler tuning (SSD)
    echo noop | sudo tee /sys/block/sdX/queue/scheduler

    # raise the file-descriptor limit
    ulimit -n 65536


    macOS:

    # raise the file-descriptor limit
    ulimit -n 65536

    # adjust kernel parameters (requires admin privileges)
    sudo sysctl -w kern.maxfiles=65536
    sudo sysctl -w kern.maxfilesperproc=65536

  2. Application-level tuning

    • Increase the worker count: -w 12 (no more than the CPU core count)
    • Use faster WPL rules (e.g., nginx)
    • Pre-generate and cache the data
  3. Storage tuning

    • Use SSD storage
    • Use RAID 0 for higher read/write throughput
    • Consider an in-memory filesystem

Influencing Factors

  1. WPL rule complexity

    • nginx: simple regexes; best performance
    • apache: medium complexity
    • sysmon: complex rules; lower performance
  2. Data characteristics

    • Log line length
    • Regex matching complexity
    • Number of extracted fields
  3. System configuration

    • CPU core count and frequency
    • Memory size and speed
    • Disk I/O performance (the key factor)

Choosing a Case

  • Choose file_blackhole when

    • You want pure parsing performance
    • You process data in batches
    • You are chasing maximum throughput
  • Choose tcp_blackhole when

    • You need reliable network transport
    • You process data in real time
    • You want to simulate a TCP source
  • Choose syslog_blackhole when

    • You integrate with traditional syslog
    • You run extreme performance tests
    • You target log-collection scenarios

Last updated: 2025-12-16

File Trans to File

Benchmark for “File Source → File Sink” transformation scenario: uses wpgen to generate test data files, wparse reads in batch mode with WPL parsing and OML transformation, outputs to files to test complete transformation pipeline performance.

Purpose

Validate the ability to:

  • Read data from files in batch mode
  • Apply WPL parsing rules
  • Apply OML transformation models
  • Write transformed output to files
  • Measure complete transformation pipeline throughput

Features Validated

| Feature | Description |
|---------|-------------|
| File Source | Reading from pre-generated data files |
| WPL Parsing | Applying parsing rules |
| OML Transformation | Applying transformation models |
| File Sink | Writing transformed output to files |
| Complete Pipeline | Full file-to-file transformation |

Quick Start

cd benchmark/case_file/trans_to_file

# Default test (20M lines, 2 workers)
./run.sh

# Medium dataset (200K lines)
./run.sh -m

Data Flow

wpgen → gen.dat → wparse batch (parse + OML) → all.dat (output file)
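
A quick sanity check that the pipeline emitted one output record per input event (a sketch; it assumes the default paths shown in the directory layout below):

wc -l data/in_dat/gen.dat data/out_dat/all.dat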

file_trans_file Notes

This case demonstrates the "file source → file sink" transformation benchmark: wpgen generates a test data file, wparse reads it in batch mode, applies WPL parsing and OML transformation, and writes the result to a file to measure complete pipeline performance.

Directory layout

benchmark/file_file/
├── README.md                    # This document
├── run.sh                       # Benchmark run script
├── record.md                    # Run log
├── conf/                        # Configuration files
│   ├── wparse.toml             # WarpParse main configuration
│   └── wpgen.toml              # Generator configuration
├── data/                        # Runtime data
│   ├── in_dat/                 # Input data
│   │   └── gen.dat            # Generated data file
│   ├── out_dat/                # Output data
│   │   ├── all.dat            # Main output file
│   │   ├── error.dat          # Error output
│   │   ├── miss.dat           # Miss output
│   │   ├── monitor.dat        # Monitoring output
│   │   └── residue.dat        # Residue output
│   ├── logs/                   # Log files
│   └── rescue/                 # Rescue data
└── .run/                       # Runtime state
    └── rule_mapping.dat        # Rule-mapping data

Quick Start

Runtime requirements

  • WarpParse engine (must be on the system PATH)
  • Bash shell environment
  • Recommended systems:
    • Linux: best performance; all optimizations supported
    • macOS: good performance; some optimizations unavailable

Run commands

# Enter the benchmark directory
cd benchmark/file_file

# Default large-scale performance test (20M lines)
./run.sh

# Medium-scale test (200K lines)
./run.sh -m

# Force data regeneration (even if data already exists)
./run.sh -f

# Specify the worker count
./run.sh -w 8

# Use a specific WPL rule set
./run.sh nginx

# Combined usage
./run.sh -m -w 8 -f nginx

Run parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| -m | Use the medium dataset | 20M → 200K lines |
| -f | Force data regeneration | auto-detect |
| -w <cnt> | Worker count | 2 |
| wpl_dir | WPL rule directory name | nginx |
| speed | Generator rate limit (lines/s) | 0 (unlimited) |

Test options

  • Default test: 20M lines, single file source, 2 workers
  • Medium test: 200K lines, suitable for quick verification
  • Forced generation: the -f flag regenerates the test data
  • Custom WPL: nginx, apache, sysmon, and other rule sets are supported

Execution Logic

Flow overview

The run.sh script performs the following steps:

  1. Environment preparation

    • Load the shared benchmark function library
    • Parse command-line arguments
    • Set defaults (large scale: 20M lines; medium: 200K lines)
  2. Environment initialization

    • Initialize the release-mode environment
    • Validate the requested WPL rule path
    • Clean up old data and logs
  3. Data-generation check

    • Check whether data/in_dat/gen.dat exists
    • Generate new data if it is missing or -f was given
  4. Data generation (if needed)

    • Launch wpgen to write data to gen.dat
    • Concurrent and batched generation are supported
  5. Batch execution

    • Read the file with wparse batch
    • Parse with the WPL rules
    • Write output to a file (all.dat)
  6. Performance statistics

    • Show processing progress in real time
    • Record throughput, processing time, and other metrics
    • Print the final performance report

Data flow

wpgen generator
       ↓
   gen.dat (input file)
       ↓
┌────────────────────────────────┐
│         wparse batch           │
│   - read file data             │
│   - parse with WPL rules       │
│   - dispatch to sinks          │
└────────────────────────────────┘
    ↓
┌─────────────┬─────────────┐
│   all.dat   │   monitor   │
│ (main out)  │    sink     │
│             │ (stats)     │
└─────────────┴─────────────┘
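
After a run, the transformed records can be eyeballed directly (paths per the layout above):

# Peek at the first transformed records in the main output
head -n 3 data/out_dat/all.dat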

Recommendations

  • Choose file_file when you:

    • need to test the complete processing pipeline
    • want to verify output format and content
    • have file-to-file transformation scenarios
  • Choose file_blackhole when you:

    • only care about parsing performance
    • process data in batches
    • want maximum throughput
  • Choose tcp_blackhole when you:

    • have a network data source
    • need real-time processing
    • require reliable transport
  • Choose syslog_blackhole when you:

    • integrate logs via syslog
    • receive data at high frequency
    • have traditional syslog scenarios

Contributing and Feedback

If you find a problem or have performance-tuning suggestions, please:

  1. Submit detailed test-environment information (CPU, memory, storage configuration)
  2. Include the full performance report and I/O statistics
  3. Share storage-tuning experience and best practices
  4. Provide performance comparisons across different file formats

Last updated: 2025-12-16

syslog source

syslog_blackhole Notes

TCP Source Benchmarks

Performance benchmarks for TCP-based data source scenarios using daemon mode.

Purpose

Test TCP network reception and parsing performance with reliable transport.

Test Scenarios

| Scenario | Description | Validated Features |
|----------|-------------|--------------------|
| parse_to_blackhole | TCP → Parse → Discard | TCP reception + pure parsing throughput |
| parse_to_file | TCP → Parse → File | Complete TCP-to-file parsing pipeline |
| trans_to_blackhole | TCP → Parse+Transform → Discard | Parsing + OML transformation throughput |
| trans_to_file | TCP → Parse+Transform → File | Complete transformation pipeline |

Quick Start

cd benchmark

# Parse to blackhole (default: 20M lines, 6 workers)
./case_tcp/parse_to_blackhole/run.sh

# Medium dataset (200K lines)
./case_tcp/parse_to_blackhole/run.sh -m

# Custom configuration
./case_tcp/parse_to_file/run.sh -w 8 sysmon 500000

Data Flow

wpgen → TCP (port 19001) → wparse daemon → sink (blackhole/file)
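
Before launching a generator, it can be useful to confirm the daemon is actually listening on the default port shown under Configuration below (a sketch using standard tooling):

# Probe the listener; exit status 0 when the port accepts connections
nc -vz 127.0.0.1 19001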

Configuration

  • Default Port: 19001
  • Default Workers: 6
  • Protocol: TCP (reliable transport)


apt_tcp_to_blackhole

Case Metadata

  • Case ID: apt_tcp_to_blackhole
  • Category: tcp
  • Capability: parse_only
  • Topology: tcp -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: APT Threat Log (3K)
  • Average size: 3K
  • Capability: parse_only
  • Input/Output: TCP -> BlackHole
  • Description: APT Threat Log scenario; TCP input, BlackHole output, exercising log parsing.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_blackhole/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/apt; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/apt_tcp_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/apt_tcp_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_parse/apt_tcp_to_blackhole.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration
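
As a worked example of the MPS formula (a sketch; the ~3,546 B average event size is back-derived from the WarpParse Linux row below, since "3K" is approximate):

# MPS = EPS * AvgEventSize / 1024 / 1024
echo "scale=2; 129600 * 3546 / 1024 / 1024" | bc   # ≈ 438.27 MiB/s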

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 129,600 | 438.37 | 499% / 558% | 265 MB / 389 MB | 6.86x |
| Vector | 18,900 | 63.93 | 774% / 794% | 229 MB / 243 MB | 1.0x |
| Logstash | 10,183 | 34.45 | 733% / 757% | 1294 MB / 1308 MB | 0.54x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 299,700 | 1013.80 | 718% / 743% | 335 MB / 351 MB | 5.88x |
| Vector-VRL | 51,000 | 172.52 | 834% / 887% | 385 MB / 413 MB | 1.0x |
| Logstash | 31,446 | 106.37 | 843% / 892% | 1218 MB / 1313 MB | 0.62x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#314-apt-threat-log-3K (section 3.1.4)
  • Linux report: benchmark/report/report_linux.md#314-apt-threat-log-3K (section 3.1.4)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

apt_tcp_to_file

Case Metadata

  • Case ID: apt_tcp_to_file
  • Category: tcp
  • Capability: parse_only
  • Topology: tcp -> file
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: APT Threat Log (3K)
  • Average size: 3K
  • Capability: parse_only
  • Input/Output: TCP -> File
  • Description: APT Threat Log scenario; TCP input, File output, exercising log parsing.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_file/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_file/conf/wparse.toml (rule directory: benchmark/models/wpl/apt; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/apt_tcp_to_file.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/apt_tcp_to_file.toml
  • Logstash: benchmark/logstash/logstash_parse/apt_tcp_to_file.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 55,000 | 186.04 | 362% / 368% | 197 MB / 224 MB | 5.91x |
| Vector | 9,300 | 31.46 | 412% / 450% | 211 MB / 218 MB | 1.0x |
| Logstash | 8,928 | 30.20 | 672% / 726% | 1305 MB / 1369 MB | 0.96x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 99,900 | 337.94 | 336% / 352% | 333 MB / 508 MB | 2.69x |
| Vector-VRL | 37,200 | 125.84 | 652% / 837% | 411 MB / 424 MB | 1.0x |
| Logstash | 30,120 | 101.89 | 840% / 897% | 1060 MB / 1232 MB | 0.81x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#314-apt-threat-log-3K (section 3.1.4)
  • Linux report: benchmark/report/report_linux.md#314-apt-threat-log-3K (section 3.1.4)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

aws_tcp_to_blackhole

Case Metadata

  • Case ID: aws_tcp_to_blackhole
  • Category: tcp
  • Capability: parse_only
  • Topology: tcp -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: AWS ELB Log (411B)
  • Average size: 411B
  • Capability: parse_only
  • Input/Output: TCP -> BlackHole
  • Description: AWS ELB Log scenario; TCP input, BlackHole output, exercising log parsing.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_blackhole/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/aws; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/aws_tcp_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/aws_tcp_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_parse/aws_tcp_to_blackhole.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 369,900 | 144.98 | 669% / 724% | 178 MB / 461 MB | 2.49x |
| Vector-VRL | 148,400 | 58.16 | 456% / 486% | 178 MB / 185 MB | 1.0x |
| Vector-Fixed | 176,600 | 69.22 | 417% / 435% | 169 MB / 176 MB | 1.19x |
| Logstash | 125,000 | 49.00 | 557% / 625% | 1181 MB / 1217 MB | 0.84x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 947,300 | 371.33 | 625% / 664% | 357 MB / 362 MB | 2.40x |
| Vector-VRL | 394,600 | 154.67 | 546% / 620% | 275 MB / 286 MB | 1.0x |
| Vector-Fixed | 555,500 | 217.73 | 465% / 523% | 250 MB / 255 MB | 1.41x |
| Logstash | 425,531 | 166.79 | 817% / 879% | 1257 MB / 1287 MB | 1.08x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#312-aws-elb-log-411b (section 3.1.2)
  • Linux report: benchmark/report/report_linux.md#312-aws-elb-log-411b (section 3.1.2)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

aws_tcp_to_file

Case Metadata

  • Case ID: aws_tcp_to_file
  • Category: tcp
  • Capability: parse_only
  • Topology: tcp -> file
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: AWS ELB Log (411B)
  • Average size: 411B
  • Capability: parse_only
  • Input/Output: TCP -> File
  • Description: AWS ELB Log scenario; TCP input, File output, exercising log parsing.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_file/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_file/conf/wparse.toml (rule directory: benchmark/models/wpl/aws; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/aws_tcp_to_file.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/aws_tcp_to_file.toml
  • Logstash: benchmark/logstash/logstash_parse/aws_tcp_to_file.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 169,900 | 66.59 | 686% / 699% | 191 MB / 251 MB | 9.71x |
| Vector-VRL | 17,500 | 6.86 | 169% / 176% | 166 MB / 171 MB | 1.0x |
| Vector-Fixed | 16,600 | 6.51 | 159% / 171% | 157 MB / 164 MB | 0.95x |
| Logstash | 121,951 | 47.80 | 559% / 621% | 1283 MB / 1359 MB | 6.97x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 349,700 | 137.07 | 496% / 537% | 333 MB / 432 MB | 4.12x |
| Vector-VRL | 84,700 | 33.20 | 240% / 256% | 268 MB / 275 MB | 1.0x |
| Vector-Fixed | 86,900 | 34.06 | 199% / 208% | 252 MB / 264 MB | 1.03x |
| Logstash | 350,877 | 137.53 | 679% / 891% | 1288 MB / 1327 MB | 4.14x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#312-aws-elb-log-411b (section 3.1.2)
  • Linux report: benchmark/report/report_linux.md#312-aws-elb-log-411b (section 3.1.2)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

mixed_tcp_to_blackhole

Case Metadata

  • Case ID: mixed_tcp_to_blackhole
  • Category: tcp
  • Capability: parse_only
  • Topology: tcp -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Mixed Log (average size: 886B)
  • Average size: 886B
  • Capability: parse_only
  • Input/Output: TCP -> BlackHole
  • Description: Mixed Log scenario; TCP input, BlackHole output, exercising log parsing.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_blackhole/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: 3:2:1:1 (nginx:aws:firewall:apt)

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/mixed; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/mixed_tcp_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/mixed_tcp_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_parse/mixed_tcp_to_blackhole.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 258,600 | 218.51 | 532% / 709% | 189 MB / 483 MB | 3.10x |
| Vector-VRL | 83,300 | 70.38 | 516% / 781% | 208 MB / 222 MB | 1.00x |
| Vector-Fixed | 81,700 | 69.03 | 518% / 784% | 181 MB / 191 MB | 0.98x |
| Logstash | 35,087 | 29.65 | 629% / 697% | 1222 MB / 1282 MB | 0.42x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 586,900 | 495.90 | 697% / 706% | 299 MB / 322 MB | 2.69x |
| Vector-VRL | 218,600 | 184.71 | 891% / 930% | 351 MB / 369 MB | 1.0x |
| Vector-Fixed | 220,100 | 185.98 | 894% / 935% | 293 MB / 312 MB | 1.01x |
| Logstash | 128,205 | 108.33 | 893% / 957% | 1258 MB / 1289 MB | 0.66x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#315-mixed-log-平均日志大小886b (section 3.1.5)
  • Linux report: benchmark/report/report_linux.md#315-mixed-log-平均日志大小886b (section 3.1.5)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

mixed_tcp_to_file

Case Metadata

  • Case ID: mixed_tcp_to_file
  • Category: tcp
  • Capability: parse_only
  • Topology: tcp -> file
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Mixed Log (average size: 886B)
  • Average size: 886B
  • Capability: parse_only
  • Input/Output: TCP -> File
  • Description: Mixed Log scenario; TCP input, File output, exercising log parsing.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_file/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: 3:2:1:1 (nginx:aws:firewall:apt)

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_file/conf/wparse.toml (rule directory: benchmark/models/wpl/mixed; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/mixed_tcp_to_file.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/mixed_tcp_to_file.toml
  • Logstash: benchmark/logstash/logstash_parse/mixed_tcp_to_file.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 149,400 | 126.24 | 546% / 623% | 111 MB / 221 MB | 7.82x |
| Vector-VRL | 19,100 | 16.14 | 315% / 332% | 275 MB / 287 MB | 1.00x |
| Vector-Fixed | 19,200 | 16.22 | 276% / 293% | 190 MB / 195 MB | 1.01x |
| Logstash | 32,786 | 27.70 | 593% / 670% | 1317 MB / 1428 MB | 1.72x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 308,400 | 260.58 | 537% / 560% | 177 MB / 251 MB | 3.90x |
| Vector-VRL | 79,000 | 66.75 | 383% / 415% | 393 MB / 396 MB | 1.0x |
| Vector-Fixed | 79,500 | 67.17 | 384% / 407% | 331 MB / 355 MB | 1.01x |
| Logstash | 126,582 | 106.96 | 879% / 972% | 1278 MB / 1296 MB | 1.60x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#315-mixed-log-平均日志大小886b (section 3.1.5)
  • Linux report: benchmark/report/report_linux.md#315-mixed-log-平均日志大小886b (section 3.1.5)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

nginx_tcp_to_blackhole

Case Metadata

  • Case ID: nginx_tcp_to_blackhole
  • Category: tcp
  • Capability: parse_only
  • Topology: tcp -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Nginx Access Log (239B)
  • Average size: 239B
  • Capability: parse_only
  • Input/Output: TCP -> BlackHole
  • Description: Nginx Access Log scenario; TCP input, BlackHole output, exercising log parsing.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_blackhole/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/nginx; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/nginx_tcp_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/nginx_tcp_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_parse/nginx_tcp_to_blackhole.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 765,800 | 174.55 | 574% / 628% | 245 MB / 366 MB | 1.56x |
| Vector-VRL | 492,200 | 112.19 | 501% / 510% | 155 MB / 159 MB | 1.0x |
| Vector-Fixed | 255,500 | 58.24 | 480% / 533% | 138 MB / 145 MB | 0.52x |
| Logstash | 161,290 | 36.76 | 462% / 475% | 1174 MB / 1224 MB | 0.33x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 1,657,500 | 377.80 | 530% / 580% | 307 MB / 320 MB | 1.42x |
| Vector-VRL | 1,163,700 | 265.24 | 540% / 598% | 218 MB / 224 MB | 1.0x |
| Vector-Fixed | 730,700 | 166.55 | 592% / 658% | 212 MB / 220 MB | 0.63x |
| Logstash | 541,403 | 123.40 | 465% / 667% | 1161 MB / 1234 MB | 0.47x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#311-nginx-access-log-239b (section 3.1.1)
  • Linux report: benchmark/report/report_linux.md#311-nginx-access-log-239b (section 3.1.1)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

nginx_tcp_to_file

Case Metadata

  • Case ID: nginx_tcp_to_file
  • Category: tcp
  • Capability: parse_only
  • Topology: tcp -> file
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Nginx Access Log (239B)
  • Average size: 239B
  • Capability: parse_only
  • Input/Output: TCP -> File
  • Description: Nginx Access Log scenario; TCP input, File output, exercising log parsing.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_file/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_file/conf/wparse.toml (rule directory: benchmark/models/wpl/nginx; OML is not enabled in parse-only scenarios)
  • Vector-VRL: benchmark/vector/vector-vrl/nginx_tcp_to_file.toml
  • Vector-Fixed: benchmark/vector/vector-fixed/nginx_tcp_to_file.toml
  • Logstash: benchmark/logstash/logstash_parse/nginx_tcp_to_file.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 377,600 | 86.07 | 645% / 673% | 221 MB / 444 MB | 20.30x |
| Vector-VRL | 18,600 | 4.24 | 133% / 135% | 122 MB / 126 MB | 1.0x |
| Vector-Fixed | 17,300 | 3.94 | 148% / 156% | 115 MB / 119 MB | 0.93x |
| Logstash | 147,058 | 33.52 | 465% / 476% | 1148 MB / 1186 MB | 7.91x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 789,000 | 179.84 | 445% / 470% | 315 MB / 353 MB | 8.78x |
| Vector-VRL | 89,900 | 20.49 | 165% / 170% | 213 MB / 221 MB | 1.0x |
| Vector-Fixed | 92,300 | 21.04 | 201% / 214% | 195 MB / 208 MB | 1.03x |
| Logstash | 507,975 | 115.78 | 515% / 762% | 1153 MB / 1184 MB | 5.65x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#311-nginx-access-log-239b (section 3.1.1)
  • Linux report: benchmark/report/report_linux.md#311-nginx-access-log-239b (section 3.1.1)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

sysmon_tcp_to_blackhole

sysmon_tcp_to_file

apt_tcp_to_blackhole

Case Metadata

  • Case ID: apt_tcp_to_blackhole
  • Category: tcp
  • Capability: parse_trans
  • Topology: tcp -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: APT Threat Log (3K)
  • Average size: 3K
  • Capability: parse_trans
  • Input/Output: TCP -> BlackHole
  • Description: APT Threat Log scenario; TCP input, BlackHole output, exercising log parsing + transformation.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_blackhole/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/apt; parse+transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/apt_tcp_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/apt_tcp_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_trans/apt_tcp_to_blackhole.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 114,200 | 386.28 | 508% / 532% | 228 MB / 248 MB | 6.14x |
| Vector | 18,600 | 62.91 | 769% / 790% | 243 MB / 252 MB | 1.0x |
| Logstash | 9,852 | 33.33 | 704% / 748% | 1283 MB / 1304 MB | 0.53x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 279,700 | 946.14 | 762% / 784% | 335 MB / 345 MB | 5.38x |
| Vector-VRL | 52,000 | 175.90 | 862% / 907% | 400 MB / 416 MB | 1.0x |
| Logstash | 27,027 | 91.42 | 846% / 926% | 1379 MB / 1413 MB | 0.52x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#324-apt-threat-log-3K (section 3.2.4)
  • Linux report: benchmark/report/report_linux.md#324-apt-threat-log-3K (section 3.2.4)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

apt_tcp_to_file

Case Metadata

  • Case ID: apt_tcp_to_file
  • Category: tcp
  • Capability: parse_trans
  • Topology: tcp -> file
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: APT Threat Log (3K)
  • Average size: 3K
  • Capability: parse_trans
  • Input/Output: TCP -> File
  • Description: APT Threat Log scenario; TCP input, File output, exercising log parsing + transformation.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_file/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_file/conf/wparse.toml (rule directory: benchmark/models/wpl/apt; parse+transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/apt_tcp_to_file.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/apt_tcp_to_file.toml
  • Logstash: benchmark/logstash/logstash_trans/apt_tcp_to_file.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 54,800 | 185.36 | 441% / 447% | 196 MB / 215 MB | 5.89x |
| Vector-VRL | 9,300 | 31.46 | 345% / 479% | 217 MB / 227 MB | 1.0x |
| Logstash | 8,620 | 29.16 | 671% / 729% | 1229 MB / 1251 MB | 0.93x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 89,900 | 304.11 | 355% / 377% | 300 MB / 324 MB | 2.41x |
| Vector-VRL | 37,300 | 126.18 | 664% / 750% | 392 MB / 411 MB | 1.0x |
| Logstash | 25,641 | 86.74 | 819% / 936% | 1300 MB / 1356 MB | 0.69x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#324-apt-threat-log-3K (section 3.2.4)
  • Linux report: benchmark/report/report_linux.md#324-apt-threat-log-3K (section 3.2.4)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

aws_tcp_to_blackhole

Case Metadata

  • Case ID: aws_tcp_to_blackhole
  • Category: tcp
  • Capability: parse_trans
  • Topology: tcp -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: AWS ELB Log (411B)
  • Average size: 411B
  • Capability: parse_trans
  • Input/Output: TCP -> BlackHole
  • Description: AWS ELB Log scenario; TCP input, BlackHole output, exercising log parsing + transformation.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_blackhole/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/aws; parse+transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/aws_tcp_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/aws_tcp_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_trans/aws_tcp_to_blackhole.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 259,900 | 101.87 | 682% / 697% | 139 MB / 275 MB | 1.99x |
| Vector-VRL | 130,600 | 51.19 | 446% / 500% | 191 MB / 195 MB | 1.0x |
| Vector-Fixed | 146,000 | 57.23 | 413% / 441% | 181 MB / 184 MB | 1.12x |
| Logstash | 78,125 | 30.62 | 624% / 696% | 1212 MB / 1272 MB | 0.60x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 757,600 | 296.97 | 714% / 758% | 270 MB / 360 MB | 2.04x |
| Vector-VRL | 370,900 | 145.38 | 561% / 607% | 284 MB / 293 MB | 1.0x |
| Vector-Fixed | 481,700 | 188.81 | 466% / 536% | 265 MB / 272 MB | 1.30x |
| Logstash | 222,222 | 87.10 | 795% / 889% | 1336 MB / 1377 MB | 0.60x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#322-aws-elb-log-411b (section 3.2.2)
  • Linux report: benchmark/report/report_linux.md#322-aws-elb-log-411b (section 3.2.2)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

aws_tcp_to_file

Case Metadata

  • Case ID: aws_tcp_to_file
  • Category: tcp
  • Capability: parse_trans
  • Topology: tcp -> file
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: AWS ELB Log (411B)
  • Average size: 411B
  • Capability: parse_trans
  • Input/Output: TCP -> File
  • Description: AWS ELB Log scenario; TCP input, File output, exercising log parsing + transformation.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_file/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_file/conf/wparse.toml (rule directory: benchmark/models/wpl/aws; parse+transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/aws_tcp_to_file.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/aws_tcp_to_file.toml
  • Logstash: benchmark/logstash/logstash_trans/aws_tcp_to_file.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 139,800 | 54.80 | 717% / 738% | 139 MB / 296 MB | 7.99x |
| Vector-VRL | 17,500 | 6.86 | 177% / 194% | 181 MB / 187 MB | 1.0x |
| Vector-Fixed | 17,600 | 6.90 | 164% / 182% | 173 MB / 180 MB | 1.01x |
| Logstash | 69,444 | 27.22 | 636% / 690% | 1192 MB / 1232 MB | 3.97x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 319,900 | 125.39 | 540% / 600% | 321 MB / 432 MB | 3.87x |
| Vector-VRL | 82,700 | 32.42 | 242% / 257% | 272 MB / 288 MB | 1.0x |
| Vector-Fixed | 83,600 | 32.77 | 211% / 220% | 260 MB / 274 MB | 1.01x |
| Logstash | 200,000 | 78.39 | 750% / 881% | 1289 MB / 1325 MB | 2.42x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#322-aws-elb-log-411b (section 3.2.2)
  • Linux report: benchmark/report/report_linux.md#322-aws-elb-log-411b (section 3.2.2)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

mixed_tcp_to_blackhole

Case Metadata

  • Case ID: mixed_tcp_to_blackhole
  • Category: tcp
  • Capability: parse_trans
  • Topology: tcp -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Mixed Log (average size: 886B)
  • Average size: 886B
  • Capability: parse_trans
  • Input/Output: TCP -> BlackHole
  • Description: Mixed Log scenario; TCP input, BlackHole output, exercising log parsing + transformation.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_blackhole/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: 3:2:1:1 (nginx:aws:firewall:apt)

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/mixed; parse+transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/mixed_tcp_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/mixed_tcp_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_trans/mixed_tcp_to_blackhole.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 190,300 | 160.80 | 603% / 623% | 119 MB / 190 MB | 2.40x |
| Vector-VRL | 79,400 | 67.09 | 776% / 782% | 204 MB / 211 MB | 1.00x |
| Vector-Fixed | 76,500 | 64.64 | 776% / 781% | 190 MB / 203 MB | 0.96x |
| Logstash | 30,303 | 25.60 | 656% / 719% | 1258 MB / 1287 MB | 0.38x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 543,100 | 458.90 | 799% / 824% | 394 MB / 479 MB | 2.61x |
| Vector-VRL | 208,200 | 175.92 | 878% / 925% | 319 MB / 334 MB | 1.0x |
| Vector-Fixed | 206,600 | 174.57 | 919% / 936% | 296 MB / 321 MB | 0.99x |
| Logstash | 94,339 | 79.71 | 878% / 941% | 1285 MB / 1318 MB | 0.45x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#325-mixed-log-平均日志大小886b (section 3.2.5)
  • Linux report: benchmark/report/report_linux.md#325-mixed-log-平均日志大小886b (section 3.2.5)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

mixed_tcp_to_file

Case Metadata

  • Case ID: mixed_tcp_to_file
  • Category: tcp
  • Capability: parse_trans
  • Topology: tcp -> file
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Mixed Log (average size: 886B)
  • Average size: 886B
  • Capability: parse_trans
  • Input/Output: TCP -> File
  • Description: Mixed Log scenario; TCP input, File output, exercising log parsing + transformation.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_file/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: 3:2:1:1 (nginx:aws:firewall:apt)

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_file/conf/wparse.toml (rule directory: benchmark/models/wpl/mixed; parse+transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/mixed_tcp_to_file.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/mixed_tcp_to_file.toml
  • Logstash: benchmark/logstash/logstash_trans/mixed_tcp_to_file.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 120,100 | 101.48 | 648% / 727% | 121 MB / 183 MB | 6.19x |
| Vector-VRL | 19,400 | 16.39 | 268% / 300% | 201 MB / 216 MB | 1.00x |
| Vector-Fixed | 20,000 | 16.90 | 273% / 303% | 195 MB / 207 MB | 1.03x |
| Logstash | 26,666 | 22.53 | 612% / 689% | 1218 MB / 1253 MB | 1.37x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 299,900 | 253.40 | 616% / 754% | 332 MB / 493 MB | 3.86x |
| Vector-VRL | 77,600 | 65.57 | 397% / 421% | 363 MB / 374 MB | 1.0x |
| Vector-Fixed | 78,100 | 65.99 | 400% / 421% | 337 MB / 358 MB | 1.01x |
| Logstash | 93,153 | 78.71 | 859% / 957% | 1274 MB / 1308 MB | 1.20x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#325-mixed-log-平均日志大小886b (section 3.2.5)
  • Linux report: benchmark/report/report_linux.md#325-mixed-log-平均日志大小886b (section 3.2.5)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

nginx_tcp_to_blackhole

Case Metadata

  • Case ID: nginx_tcp_to_blackhole
  • Category: tcp
  • Capability: parse_trans
  • Topology: tcp -> blackhole
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Nginx Access Log (239B)
  • Average size: 239B
  • Capability: parse_trans
  • Input/Output: TCP -> BlackHole
  • Description: Nginx Access Log scenario; TCP input, BlackHole output, exercising log parsing + transformation.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_blackhole/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_blackhole/conf/wparse.toml (rule directory: benchmark/models/wpl/nginx; parse+transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/nginx_tcp_to_blackhole.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/nginx_tcp_to_blackhole.toml
  • Logstash: benchmark/logstash/logstash_trans/nginx_tcp_to_blackhole.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 524,800 | 119.62 | 608% / 637% | 189 MB / 410 MB | 1.34x |
| Vector-VRL | 392,200 | 89.39 | 472% / 512% | 162 MB / 166 MB | 1.0x |
| Vector-Fixed | 208,900 | 47.61 | 502% / 537% | 146 MB / 151 MB | 0.53x |
| Logstash | 107,142 | 24.42 | 520% / 552% | 1163 MB / 1243 MB | 0.27x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 1,382,800 | 315.19 | 602% / 656% | 279 MB / 369 MB | 1.35x |
| Vector-VRL | 1,024,300 | 233.47 | 534% / 618% | 232 MB / 235 MB | 1.0x |
| Vector-Fixed | 595,800 | 135.80 | 543% / 651% | 214 MB / 219 MB | 0.58x |
| Logstash | 357,142 | 81.40 | 685% / 861% | 1219 MB / 1258 MB | 0.35x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#321-nginx-access-log-239b (section 3.2.1)
  • Linux report: benchmark/report/report_linux.md#321-nginx-access-log-239b (section 3.2.1)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

nginx_tcp_to_file

Case Metadata

  • Case ID: nginx_tcp_to_file
  • Category: tcp
  • Capability: parse_trans
  • Topology: tcp -> file
  • Platforms: Mac M4 Mini / Linux (AWS EC2)

Scenario Definition

  • Log type: Nginx Access Log (239B)
  • Average size: 239B
  • Capability: parse_trans
  • Input/Output: TCP -> File
  • Description: Nginx Access Log scenario; TCP input, File output, exercising log parsing + transformation.

Dataset Contract

  • Input data: benchmark/case_tcp/parse_to_blackhole/data/in_dat/gen.dat (data file) / benchmark/case_tcp/parse_to_file/conf/wpgen.toml (generator configuration)
  • Event count: supports -m (medium scale) and -c (explicit count); event semantics follow the generator configuration
  • Encoding/Delimiter: UTF-8 / LF
  • Mix ratio: not applicable

Configuration Binding

  • WarpParse: benchmark/case_tcp/parse_to_file/conf/wparse.toml (rule directory: benchmark/models/wpl/nginx; parse+transform uses benchmark/models/oml)
  • Vector-VRL: benchmark/vector/vector-vrl_transform/nginx_tcp_to_file.toml
  • Vector-Fixed: benchmark/vector/vector-fixed_transform/nginx_tcp_to_file.toml
  • Logstash: benchmark/logstash/logstash_trans/nginx_tcp_to_file.conf

Execution Contract

  • End condition: run until the full event volume is consumed (per dataset size); add a time-based cutoff if needed
  • Concurrency/Workers: default configuration (wparse parse_workers as configured)
  • Repetitions: single run by default; N=3 with the median is recommended

Metrics

  • EPS: Events Per Second
  • MPS: MiB/s, computed as MPS = EPS * AvgEventSize / 1024 / 1024
  • CPU: cumulative multi-core percentage (e.g. 800% ≈ 8 logical cores fully loaded)
  • MEM: process memory usage (Avg/Peak)
  • Rule Size: size of the rule configuration

Performance Data

Linux (AWS EC2)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 297,100 | 67.72 | 645% / 664% | 238 MB / 317 MB | 17.90x |
| Vector-VRL | 16,600 | 3.78 | 138% / 143% | 138 MB / 143 MB | 1.0x |
| Vector-Fixed | 17,200 | 3.92 | 156% / 166% | 128 MB / 133 MB | 1.04x |
| Logstash | 95,238 | 21.71 | 510% / 551% | 1141 MB / 1217 MB | 5.74x |

macOS (Mac M4 Mini)

| Engine | EPS | MPS | CPU (Avg/Peak) | MEM (Avg/Peak) | Speedup |
|--------|-----|-----|----------------|----------------|---------|
| WarpParse | 788,900 | 179.82 | 574% / 587% | 249 MB / 253 MB | 8.44x |
| Vector-VRL | 93,500 | 21.31 | 171% / 184% | 203 MB / 211 MB | 1.0x |
| Vector-Fixed | 87,500 | 19.94 | 208% / 223% | 197 MB / 212 MB | 0.94x |
| Logstash | 344,827 | 78.60 | 661% / 883% | 1202 MB / 1230 MB | 3.69x |

Correctness Check

  • Alignment notes: see benchmark/report/test_sample.md
  • Sampling method: run the file-output pipeline and spot-check that key fields match the Golden output
  • Output path convention: benchmark/case_tcp/parse_to_file/data/out_dat/ (switch to the file sink when verification is needed)

Notes

  • Loopback TCP: all TCP scenarios use the 127.0.0.1 loopback and are not limited by the physical NIC
  • Instance specs: where marked TBD, loopback-TCP results are unaffected by instance bandwidth/ENI limits
  • Limitations: single-node tests; distributed/HA is not covered

References

  • Mac report: benchmark/report/report_mac.md#321-nginx-access-log-239b (section 3.2.1)
  • Linux report: benchmark/report/report_linux.md#321-nginx-access-log-239b (section 3.2.1)
  • Rule notes: benchmark/report/test_rule.md
  • Sample alignment: benchmark/report/test_sample.md

sysmon_tcp_to_blackhole

sysmon_tcp_to_file

TCP Parse to Blackhole

Benchmark for “TCP Source → Blackhole Sink” daemon mode scenario: uses wpgen to send data via TCP, wparse receives in daemon mode and parses, outputs to blackhole to test TCP reception and parsing combined performance.

Purpose

Validate the ability to:

  • Receive data via TCP in daemon mode
  • Apply WPL parsing rules to network data
  • Measure TCP + parsing throughput

Features Validated

| Feature | Description |
|---------|-------------|
| TCP Source | Receiving data via TCP (port 19001) |
| Daemon Mode | wparse daemon mode execution |
| Blackhole Sink | Discarding output to measure throughput |
| Network + Parse | Combined TCP reception and parsing |

Quick Start

cd benchmark/case_tcp/parse_to_blackhole

# Default test (20M lines, 6 workers)
./run.sh

# Medium dataset (200K lines)
./run.sh -m

# Custom configuration
./run.sh -w 8 sysmon 500000

Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| -m | Medium dataset | 20M → 200K lines |
| -w <cnt> | Worker count | 6 |
| wpl_dir | WPL rule directory | nginx |
| speed | Generation rate limit | 0 (unlimited) |

Data Flow

wpgen → TCP (port 19001) → wparse daemon → blackhole sink
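
The same flow can be exercised by hand (a sketch: it assumes the daemon is already running and listening on the default port, and that wpgen writes data/in_dat/gen.dat as in the file-based cases):

# Generate a small sample, then replay it to the daemon over loopback TCP
wpgen sample -n 1000
nc 127.0.0.1 19001 < data/in_dat/gen.dat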

tcp_blackhole Notes

This case demonstrates the "TCP source → blackhole sink" benchmark: wpgen sends data over TCP, wparse receives and processes it in daemon mode, and output goes to blackhole to measure the combined performance of reliable transport plus parsing.

Directory layout

benchmark/tcp_blackhole/
├── README.md                    # This document
├── run.sh                       # Benchmark run script
├── conf/                        # Configuration files
│   ├── wparse.toml             # WarpParse main configuration
│   ├── wpgen.toml              # First TCP generator configuration
│   └── wpgen2.toml             # Second TCP generator configuration (optional)
├── models/                      # Model configuration
│   ├── sinks/                  # Sink configuration
│   │   ├── defaults.toml       # Defaults
│   │   ├── business.d/         # Business group
│   │   │   └── sink.toml       # Business sink-group configuration
│   │   └── infra.d/            # Infrastructure group
│   │       ├── default.toml    # Default sink
│   │       ├── error.toml      # Error handling
│   │       ├── miss.toml       # Miss handling
│   │       ├── monitor.toml    # Monitoring
│   │       └── residue.toml    # Residue handling
│   ├── sources/                # Source configuration
│   │   └── wpsrc.toml          # TCP source configuration
│   ├── wpl/                    # WPL parsing rules
│   │   ├── nginx/              # Nginx log rules
│   │   ├── apache/             # Apache log rules
│   │   └── sysmon/             # System-monitoring rules
│   ├── oml/                    # OML transformation rules (empty)
│   └── knowledge/              # Knowledge base (empty)
├── data/                        # Runtime data
│   ├── out_dat/                 # Output data
│   │   ├── error.dat           # Error output
│   │   ├── miss.dat            # Miss output
│   │   ├── monitor.dat         # Monitoring output
│   │   └── residue.dat         # Residue output
│   └── logs/                    # Log files
└── .run/                        # Runtime state
    └── rule_mapping.dat        # Rule-mapping data

Quick Start

Runtime requirements

  • WarpParse engine (must be on the system PATH)
  • Bash shell environment
  • A system with TCP networking available
  • Recommended systems:
    • Linux: best performance; all optimizations supported
    • macOS: good performance; some optimizations unavailable

Run commands

# Enter the benchmark directory
cd benchmark

# Default large-scale performance test (20M lines)
./tcp_blackhole/run.sh

# Medium-scale test (200K lines)
./tcp_blackhole/run.sh -m

# Use the sysmon rules, rate-limited to 1M lines/s
./tcp_blackhole/run.sh sysmon 1000000

# Custom test parameters
./tcp_blackhole/run.sh -w 8 nginx 500000

./tcp_blackhole/run.sh -w 8 sysmon 500000

Run parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| -m | Use the medium dataset | 20M → 200K lines |
| -w <cnt> | Worker count | 6 |
| wpl_dir | WPL rule directory name | nginx |
| speed | Generator rate limit (lines/s) | 0 (unlimited) |

Test options

  • Default test: 20M lines of Nginx logs, unlimited rate, 6 workers
  • Medium test: 200K lines, suitable for quick verification
  • Custom WPL: nginx, apache, sysmon, and other rule sets are supported
  • Rate limiting: a generation rate can be set to test flow control

Execution Logic

Flow overview

The run.sh script performs the following steps:

  1. Environment preparation

    • Load the shared benchmark function library
    • Parse command-line arguments
    • Set defaults (large scale: 20M lines; medium: 200K lines)
  2. Environment initialization

    • Initialize the release-mode environment
    • Validate the requested WPL rule path
    • Clean up old data and logs
  3. Daemon start-up

    • Start wparse daemon listening on the TCP port
    • Load the requested WPL rules
    • Wait for TCP connections
  4. Data generation and sending

    • Start wpgen to generate test data
    • Send it to the wparse daemon over TCP
    • Single- or dual-channel concurrent sending is supported
  5. Performance monitoring

    • Monitor processing speed in real time
    • Record throughput, latency, and other metrics
    • Collect error and exception statistics
  6. Result reporting

    • Stop the daemon process
    • Print the performance report
    • Verify data integrity

Data flow

wpgen generator
    ↓ TCP connection (port 19001)
┌────────────────────────────────┐
│        wparse daemon           │
│    - receive TCP data          │
│    - parse with WPL rules      │
│    - dispatch to sinks         │
└────────────────────────────────┘
    ↓
┌─────────────┬─────────────┐
│  blackhole  │   monitor   │
│    sink     │    sink     │
│ (discards)  │ (stats)     │
└─────────────┴─────────────┘

Verification and Troubleshooting

Verifying a successful run

  1. Check the performance output

    • Watch the real-time statistics printed to the terminal
    • Look at the "Throughput" metric
    • Confirm there are no errors or exceptions
  2. Verify the output files

    # Check the monitoring data
    ls -la data/out_dat/monitor.dat

    # Confirm the other files are empty (no errors)
    ls -la data/out_dat/{error,miss,residue}.dat
    

Common Problems and Solutions

1. TCP connection refused

Error message: Connection refused

Solutions:

  • Confirm the wparse daemon started successfully
  • Check whether port 19001 is already in use
  • Check firewall settings

# Check port usage
netstat -tlnp | grep 19001

# Or use ss
ss -tlnp | grep 19001
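
On macOS, where ss is unavailable, lsof covers the same check:

lsof -nP -iTCP:19001 -sTCP:LISTEN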

2. Daemon process did not exit cleanly

Solutions:

# Find and kill the wparse process
ps aux | grep wparse
kill -9 <PID>

# Clean up any lingering port usage
sudo lsof -i :19001

Performance

Tuning

  1. System-level tuning

    Linux:

    # Pin CPUs
    taskset -c 0-5 ./run.sh -w 6

    # Real-time priority (requires root)
    sudo chrt -f 99 ./run.sh

    # Enlarge the TCP buffers (requires root)
    sudo sysctl -w net.core.wmem_max=26214400
    sudo sysctl -w net.core.rmem_max=26214400

Influencing factors

  1. WPL rule complexity

    • nginx: simple patterns, best performance
    • apache: medium complexity
    • sysmon: complex rules, lower performance
  2. Data characteristics

    • Log line length
    • Regex matching complexity
    • Number of extracted fields
  3. System configuration

    • CPU core count and clock speed
    • Memory size and speed
    • Disk I/O (log writes)

Last updated: 2025-12-16

TCP Trans to Blackhole

Benchmark for “TCP Source → Blackhole Sink” transformation daemon mode scenario: uses wpgen to send data via TCP, wparse receives in daemon mode with WPL parsing and OML transformation, outputs to blackhole to test TCP + parsing + transformation performance.

Purpose

Validate the ability to:

  • Receive data via TCP in daemon mode
  • Apply WPL parsing rules to network data
  • Apply OML transformation models
  • Measure TCP + parsing + transformation throughput

Features Validated

| Feature | Description |
|---------|-------------|
| TCP Source | Receiving data via TCP (port 19001) |
| Daemon Mode | wparse daemon mode execution |
| WPL Parsing | Applying parsing rules |
| OML Transformation | Applying transformation models |
| Blackhole Sink | Discarding output to measure throughput |

Quick Start

cd benchmark/case_tcp/trans_to_blackhole

# Default test (20M lines, 6 workers)
./run.sh

# Medium dataset (200K lines)
./run.sh -m

Data Flow

wpgen → TCP (port 19001) → wparse daemon (parse + OML) → blackhole sink

tcp_trans_blackhole Notes

This case demonstrates the "TCP source → blackhole sink" transformation benchmark: wpgen sends data over TCP, wparse receives it in daemon mode, applies WPL parsing and OML transformation, and outputs to blackhole to measure the combined TCP + parsing + transformation performance.

Directory layout, run commands, and tuning

The directory layout, runtime requirements, run commands and parameters, execution flow, verification steps, troubleshooting, and performance guidance are identical to the tcp_blackhole case above; the only difference is that the daemon also applies the OML transformation models while processing.

Last updated: 2025-12-16

wpgen Performance Test

This case tests the wpgen data generator performance independently, without starting wparse.

Purpose

Validate the ability to:

  • Test wpgen generation capability in isolation
  • Evaluate different WPL rule sets for generation speed
  • Measure rate-limited vs unlimited generation performance
  • Prepare data for other benchmark tests

Features Validated

| Feature | Description |
|---------|-------------|
| Pure Generation | Testing wpgen without wparse overhead |
| Multi-Rule Sets | nginx and benchmark rule sets |
| Rate Limiting | Testing different speed limits |
| Large Scale | Default 8M lines + 6K lines samples |

Quick Start

cd benchmark

# Default test (nginx + benchmark rules)
./wpgen_test/run.sh

# Specify profile (release/debug)
./wpgen_test/run.sh release
./wpgen_test/run.sh debug

Test Configuration

# High-speed generation test
LINE_CNT=8000000
SPEED_MAX=2000000
wpgen sample -n $LINE_CNT -s $SPEED_MAX --stat 2 -p --wpl ./models/wpl/nginx
wpgen sample -n $LINE_CNT -s $SPEED_MAX --stat 2 -p --wpl ./models/wpl/benchmark

# Low-speed generation test
LINE_CNT=6000
SPEED_MAX=1000
wpgen sample -n $LINE_CNT -s $SPEED_MAX --stat 2 -p --wpl ./models/wpl/nginx
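
To turn one of these runs into an effective-rate measurement, a small wrapper suffices (a sketch using only standard shell tools):

LINE_CNT=8000000
start=$(date +%s)
wpgen sample -n "$LINE_CNT" -s 2000000 --stat 2 -p --wpl ./models/wpl/nginx
elapsed=$(( $(date +%s) - start ))
[ "$elapsed" -gt 0 ] || elapsed=1   # guard against sub-second runs
echo "effective rate: $(( LINE_CNT / elapsed )) lines/s"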

Performance Factors

| Factor | Impact |
|--------|--------|
| CPU Performance | Rule complexity affects generation speed |
| Disk I/O | File writing bottleneck |
| Rule Complexity | Field count and types |
