Configuration File Reference

Complete reference for dbslice YAML configuration files.

Overview

dbslice supports YAML configuration files for managing complex extraction scenarios. Configuration files are useful for:

  • Repeatable Extractions: Save extraction settings for consistent results
  • Team Sharing: Share extraction configs with team members
  • Complex Configurations: Manage multi-seed, multi-table extractions
  • CI/CD Integration: Version-controlled extraction configurations
  • Security: Keep sensitive settings (database URLs) out of command history

File Location

Default Locations

dbslice looks for configuration files in these locations (in order):

  1. File specified with --config flag
  2. dbslice.yaml in current directory
  3. .dbslice.yaml in current directory
  4. ~/.config/dbslice/config.yaml in user home directory
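
For example, discovery is automatic when a dbslice.yaml sits in the current directory, while an explicit --config path always takes precedence (paths here are illustrative):

# Picks up ./dbslice.yaml automatically
dbslice extract --seed "orders.id=1"

# An explicit path wins over discovery
dbslice extract --config configs/staging.yaml --seed "orders.id=1"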

Generating Configuration Files

# Generate default configuration
dbslice init postgresql://localhost/mydb

# Generate to specific location
dbslice init postgresql://localhost/mydb -f config/production.yaml

# Generate without sensitive field detection
dbslice init postgresql://localhost/mydb --no-detect-sensitive

Configuration Schema

The configuration file uses YAML format with the following top-level structure:

version: "1.0"           # Optional config version tag (informational)
database:                # Database connection settings
extraction:              # Extraction behavior settings
anonymization:           # Anonymization configuration
compliance:              # Compliance profiles and audit manifest (optional)
output:                  # Output format settings
tables:                  # Per-table configuration (optional)
performance:             # Performance tuning (optional)

Sections

version

Type: String. Required: No. Default: unset.

Optional schema/version tag for your own tracking. dbslice currently treats this as informational metadata.

version: "1.0"

database

Database connection configuration.

Schema

database:
  url: string              # Database connection URL (required)
  schema: string           # Schema name (optional, default: "public" for PostgreSQL)
  options: object          # Optional URL query options (key/value)

Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | String | Yes | - | Database connection URL |
| schema | String | No | "public" | Schema name for PostgreSQL |
| options | Object | No | {} | Extra connection options merged into URL query params |

Examples

# Basic PostgreSQL connection
database:
  url: postgresql://user:pass@localhost:5432/mydb

# With schema specification
database:
  url: postgresql://user:pass@localhost:5432/mydb
  schema: public

# Add query options via config
database:
  url: postgresql://user:pass@localhost:5432/mydb?sslmode=disable
  options:
    sslmode: require
    application_name: dbslice

# Environment variable (recommended for security)
database:
  url: ${DATABASE_URL}

# Read from file
database:
  url: ${DATABASE_URL_FILE}

database.url placeholder behavior:

  • Exact-match placeholders only: the full value must be ${VAR} or ${VAR_FILE}.
  • ${VAR}: uses the value of environment variable VAR.
  • ${VAR_FILE}: reads a file path from environment variable VAR_FILE, then uses the trimmed file contents.
  • A missing env var or an unreadable _FILE target causes a config-load validation failure.
  • Partial-string interpolation is not supported.
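
For example, the two placeholder forms pair with environment setup like this (paths and values are illustrative):

# ${DATABASE_URL}: the variable holds the URL itself
export DATABASE_URL="postgresql://user:pass@localhost:5432/mydb"

# ${DATABASE_URL_FILE}: the variable holds a path; dbslice reads the file's trimmed contents
export DATABASE_URL_FILE="/run/secrets/db_url"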

database.options precedence:

  • Applied only when the URL comes from the config (database.url).
  • If the CLI provides the database URL, config database.options are ignored.


extraction

Extraction behavior configuration.

Schema

extraction:
  default_depth: integer           # Default traversal depth
  direction: string                # Traversal direction (up/down/both)
  exclude_tables: list[string]     # Tables to exclude
  validate: boolean                # Enable validation
  fail_on_validation_error: boolean  # Stop on validation errors
  max_rows_per_table: integer      # Optional global row soft-cap
  allow_unsafe_where: boolean      # Allow subqueries in seed WHERE clauses (trusted input only)

Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| default_depth | Integer | No | 3 | Maximum FK traversal depth |
| direction | String | No | "both" | Traversal direction: up, down, or both |
| exclude_tables | List[String] | No | [] | Tables to exclude from extraction |
| validate | Boolean | No | true | Validate extraction for referential integrity |
| fail_on_validation_error | Boolean | No | false | Stop execution if validation finds issues |
| max_rows_per_table | Integer | No | unlimited | Global per-table soft-cap with integrity closure |
| allow_unsafe_where | Boolean | No | false | Allow seed subqueries like IN (SELECT ...) for trusted inputs |

max_rows_per_table is deterministic and integrity-first:

  • dbslice first caps each table deterministically by primary-key sort.
  • It then adds required parent rows so FK integrity is preserved.
  • Parent closure may therefore exceed the configured cap.
  • If any row limit is configured, streaming mode is disabled automatically.

A configuration sketch appears at the end of the Examples below.

allow_unsafe_where notes:

  • Default is false for security.
  • When true, subqueries in seed WHERE clauses are allowed (for advanced filtering and join-style selection).
  • Dangerous operations (DROP, DELETE, comments, stacked queries, etc.) are still blocked.

Examples

# Basic extraction config
extraction:
  default_depth: 3
  direction: both

# Exclude audit tables
extraction:
  default_depth: 5
  direction: both
  exclude_tables:
    - audit_logs
    - sessions
    - temp_data
    - migration_history

# With validation
extraction:
  default_depth: 3
  direction: both
  validate: true
  fail_on_validation_error: false

# Parents only (dependencies)
extraction:
  default_depth: 10
  direction: up
  validate: true

# Trusted advanced WHERE filters (subqueries)
extraction:
  allow_unsafe_where: true
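
A row-cap sketch to accompany the max_rows_per_table note above (the cap value is illustrative; parent closure may exceed it):

# Soft-cap rows per table
extraction:
  default_depth: 3
  max_rows_per_table: 5000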

anonymization

Anonymization and data redaction configuration.

Schema

anonymization:
  enabled: boolean              # Enable anonymization
  seed: string                  # Deterministic seed
  fields: object                # Exact table.column -> provider
  patterns: object              # Wildcard table.column glob -> provider
  security_null_fields: list    # Wildcard table.column globs to force NULL
  deterministic: boolean        # Use deterministic anonymization (default: true)

Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | Boolean | No | false | Enable automatic anonymization |
| seed | String | No | Generated | Deterministic seed for consistent anonymization |
| fields | Object | No | {} | Exact map of table.column to Faker method |
| patterns | Object | No | {} | Wildcard map of table.column glob to Faker method |
| security_null_fields | List[String] | No | [] | Wildcard table.column globs to force NULL |
| deterministic | Boolean | No | true | Deterministic mode (same input = same output). Set false for non-deterministic anonymization with stronger privacy guarantees |

Notes:

  • fields keys must be exact table.column entries (no wildcards).
  • patterns and security_null_fields use shell-style globs (*, ?) on table.column.
  • Provider names are validated at config-load time; invalid Faker providers fail fast.
  • Rule precedence: exact fields > wildcard patterns > built-in pattern matching.
  • If multiple wildcard patterns match, the most specific wins (ties use first-defined order).
  • Foreign-key columns are never anonymized or nulled.
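
A minimal sketch of the precedence rules above (column names are illustrative):

anonymization:
  enabled: true
  fields:
    users.email: email        # exact rule: always wins for users.email
  patterns:
    "users.*": name           # wildcard: applies to other matching users columns
  security_null_fields:
    - "*.api_key"             # forced NULL for matching columns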

Field Anonymization Methods

Common Faker methods for the fields mapping:

| Method | Description | Example Output |
| --- | --- | --- |
| email | Email address | john@example.com |
| phone_number | Phone number | +1-555-0123 |
| first_name | First name | John |
| last_name | Last name | Doe |
| name | Full name | John Doe |
| address | Street address | 123 Main St |
| city | City name | New York |
| zipcode | ZIP/postal code | 12345 |
| ssn | Social Security Number | 123-45-6789 |
| credit_card_number | Credit card number | 4532-1234-5678-9010 |
| ipv4 | IPv4 address | 192.168.1.1 |
| company | Company name | Acme Corp |
| url | URL | https://example.com |

See the Faker documentation for the complete list of providers.
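
If the Python faker package is installed locally, you can preview what a provider produces before committing it to config (a quick sanity check, not a dbslice command):

python -c "from faker import Faker; f = Faker(); print(f.email(), f.phone_number())"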

Examples

# Basic anonymization
anonymization:
  enabled: true

# With custom seed (for deterministic output)
anonymization:
  enabled: true
  seed: "my-secret-seed-12345"

# Field-specific anonymization
anonymization:
  enabled: true
  fields:
    users.email: email
    users.phone: phone_number
    users.first_name: first_name
    users.last_name: last_name
    users.ssn: ssn
    customers.company: company
    payments.card_number: credit_card_number
    logs.ip_address: ipv4

# Wildcard anonymization + forced NULL rules
anonymization:
  enabled: true
  patterns:
    users.*_name: name
    "*.phone*": phone_number
  security_null_fields:
    - users.password*
    - "*.api_key"

# Complete anonymization config
anonymization:
  enabled: true
  seed: "production-to-dev-2023"
  fields:
    # User PII
    users.email: email
    users.phone: phone_number
    users.first_name: first_name
    users.last_name: last_name
    users.date_of_birth: date_of_birth

    # Identity documents
    users.ssn: ssn
    users.passport: passport_number
    users.driver_license: license_plate

    # Financial data
    payments.card_number: credit_card_number
    payments.routing_number: aba
    payments.account_number: bban

    # Contact information
    customers.company: company
    customers.address: address
    customers.city: city
    customers.postal_code: postcode

    # Network data
    logs.ip_address: ipv4
    sessions.user_agent: user_agent

# Disable anonymization for specific environments
anonymization:
  enabled: false  # For non-production to non-production transfers

compliance

Compliance profile and audit manifest configuration.

Schema

compliance:
  profiles: list[string]          # Compliance profiles to apply
  strict: boolean                 # Fail if uncovered PII detected
  generate_manifest: boolean      # Generate audit manifest
  policy_mode: string             # Runtime policy gates: off|standard|strict
  allow_url_patterns: list[string]# Regex allow-list for source DB URL
  deny_url_patterns: list[string] # Regex deny-list for source DB URL
  required_sslmode: string        # Required sslmode query value in DB URL
  require_ci: boolean             # Require CI=true environment
  sign_manifest: boolean          # HMAC-sign manifest when key is available
  manifest_key_env: string        # Env var name containing signing key

Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| profiles | List[String] | No | [] | Compliance profiles: gdpr, hipaa, pci-dss |
| strict | Boolean | No | false | Fail extraction if value-based PII scanning detects unmasked PII |
| generate_manifest | Boolean | No | false | Generate a JSON audit manifest alongside output (auto-enabled when profiles are active) |
| policy_mode | String | No | "off" | Compliance policy gates: off, standard, strict |
| allow_url_patterns | List[String] | No | [] | Source DB URL must match one of these regex patterns (if set) |
| deny_url_patterns | List[String] | No | [] | Source DB URL must not match any of these regex patterns |
| required_sslmode | String | No | - | Required PostgreSQL sslmode query parameter value |
| require_ci | Boolean | No | false | Fail when running outside CI (CI=true expected) |
| sign_manifest | Boolean | No | false | Sign manifest with HMAC-SHA256 (tamper detection, not non-repudiation) |
| manifest_key_env | String | No | "DBSLICE_MANIFEST_SIGNING_KEY" | Env var containing HMAC signing key (shared secret) |

Compliance Profiles

| Profile | Description | Key Coverage |
| --- | --- | --- |
| gdpr | EU General Data Protection Regulation | Names, email, phone, address, IP, DOB, SSN, financial IDs |
| hipaa | HIPAA Safe Harbor (18 identifiers) | All 18 Safe Harbor identifiers including medical record numbers, device IDs, dates |
| pci-dss | PCI-DSS v4.0 | PAN, cardholder name, expiration, CVV/PIN (NULLed) |

When a compliance profile is active:

  • Anonymization is auto-enabled (no need for anonymization.enabled: true)
  • Profile-defined column patterns are merged as fallback wildcard rules (user exact fields > user patterns > profile patterns > built-ins)
  • Value-based scanning runs in two phases:
      - a coverage scan (pre-mask) to detect PII presence
      - a residual scan (post-mask) on unprotected columns only (strict mode fails only here)
  • Free-text columns (notes, comments, descriptions) are flagged as warnings
  • An audit manifest is generated by default

Policy Modes

policy_mode adds runtime guardrails when compliance profiles are active. These are CLI-level checks that prevent accidental misconfiguration — they are not a security boundary.

  • off: No policy gates (default).
  • standard / strict: Block risky defaults — stdout output, --allow-unsafe-where, and non-masked extraction are rejected unless overridden with --allow-raw. Both modes currently apply the same gates; strict is reserved for future tightening.

Breakglass override: --allow-raw --breakglass-reason "..." --ticket-id "...". The reason and ticket ID are recorded in the manifest for audit purposes.
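
A sketch combining a policy gate with the documented breakglass flags (profile, seed, reason, and ticket values are illustrative):

compliance:
  profiles: [gdpr]
  policy_mode: standard

# Emergency raw extraction; reason and ticket are recorded in the manifest
dbslice extract --config dbslice.yaml --seed "orders.id=1" \
  --allow-raw \
  --breakglass-reason "Vendor debugging per incident review" \
  --ticket-id "OPS-1234"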

Important: Pseudonymization vs Anonymization

dbslice's anonymization is technically pseudonymization under GDPR (deterministic mode: same input = same output, reversible with seed knowledge). For stronger privacy guarantees, use anonymization.deterministic: false (non-deterministic mode), which uses random seeds per value but loses cross-table consistency.

True GDPR anonymization (where re-identification is "not reasonably possible") may require additional measures beyond what dbslice provides (k-anonymity, data generalization, etc.).

Audit Manifest

When generate_manifest is enabled, dbslice writes a *.manifest.json file alongside the output containing:

  • Extraction metadata (timestamp, version, seed hash)
  • Per-table breakdown of masked, NULLed, FK-preserved, and unmasked fields
  • Residual PII scan results from value-based scanning
  • Compliance warnings (e.g., free-text columns that may contain embedded PII)
  • Output file hash set (sha256) for produced artifacts
  • Optional breakglass metadata (reason + ticket) when override is used
  • Optional HMAC-SHA256 signature for tamper detection (symmetric key — integrity checking, not non-repudiation)

This manifest provides structured evidence for audit reviews. For non-repudiation (provable origin), sign the manifest externally with cosign or GPG in your CI pipeline.
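
For example, a detached GPG signature step in CI might look like this (standard GPG flags; adapt to your own signing setup):

# Sign the manifest with a CI-held key
gpg --armor --detach-sign patient_subset.manifest.json

# Later: verify the signature against the manifest
gpg --verify patient_subset.manifest.json.asc patient_subset.manifest.json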

Examples

# HIPAA-compliant extraction
compliance:
  profiles: [hipaa]
  strict: true
  generate_manifest: true

anonymization:
  enabled: true
  seed: "hipaa-compliant-seed-2024"

# Multiple compliance profiles
compliance:
  profiles: [gdpr, pci-dss]
  strict: false
  generate_manifest: true

# Non-deterministic mode for stronger privacy
compliance:
  profiles: [gdpr]
  strict: true

anonymization:
  enabled: true
  deterministic: false  # Random output each run

output

Output format and generation configuration.

Schema

output:
  format: string                   # Output format (sql/json/csv)
  include_transaction: boolean     # Wrap in BEGIN/COMMIT
  include_truncate: boolean        # Include TRUNCATE TABLE statements
  disable_fk_checks: boolean       # Disable FK checks during import
  file_mode: string                # Output file permissions (octal, e.g. "600")
  json_mode: string                # JSON mode (single/per-table)
  json_pretty: boolean             # Pretty-print JSON

Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| format | String | No | "sql" | Output format: sql, json, or csv |
| include_transaction | Boolean | No | true | Wrap SQL in BEGIN/COMMIT |
| include_truncate | Boolean | No | false | Include TRUNCATE TABLE ... CASCADE before inserts |
| disable_fk_checks | Boolean | No | false | For PostgreSQL SQL output, emits deferred-constraint statements and enables non-nullable cycle fallback when FKs are DEFERRABLE |
| file_mode | String (octal) | No | "600" | File permissions for generated outputs |
| json_mode | String | No | "single" | JSON mode: single or per-table |
| json_pretty | Boolean | No | true | Pretty-print JSON output |

Examples

# Basic SQL output
output:
  format: sql

# SQL with transactions
output:
  format: sql
  include_transaction: true
  include_truncate: false

# SQL for test fixtures (destructive)
output:
  format: sql
  include_transaction: true
  include_truncate: true     # Truncates tables before inserting
  disable_fk_checks: true    # Disables FK checks during import

include_drop_tables is still accepted as a backward-compatible alias for include_truncate, but is deprecated.

Cycle note for PostgreSQL SQL imports:

  • When cycles have no nullable FK, dbslice can still generate SQL if disable_fk_checks: true and the cycle FKs are DEFERRABLE.
  • If the cycle FKs are not deferrable, extraction fails with a clear error.
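
Illustrative shape of a deferred-constraint import, assuming standard PostgreSQL syntax (the exact statements dbslice emits may differ):

BEGIN;
SET CONSTRAINTS ALL DEFERRED;  -- postpone FK checks until COMMIT
-- INSERT statements for the tables in the cycle ...
COMMIT;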

# JSON output (single file)
output:
  format: json
  json_mode: single
  json_pretty: true

# JSON output (per-table files)
output:
  format: json
  json_mode: per-table
  json_pretty: true

# Compact JSON for APIs
output:
  format: json
  json_mode: single
  json_pretty: false
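
file_mode is not demonstrated above; a minimal sketch (the octal value here is illustrative):

# Group-readable output files
output:
  format: sql
  file_mode: "640"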

tables

Per-table configuration (optional advanced feature).

Schema

tables:
  table_name:
    skip: boolean                # Skip table entirely
    depth: integer               # Per-table DOWN depth override
    direction: string            # Per-table direction override: up/down/both
    max_rows: integer            # Per-table row soft-cap (overrides global)
    anonymize_fields: object     # Deprecated alias: column -> faker provider
    exclude: boolean             # Deprecated alias for skip

Examples

# Per-table overrides
tables:
  sessions:
    skip: true
  audit_logs:
    skip: true

  orders:
    depth: 2
    direction: up

  users:
    max_rows: 100
    anonymize_fields:
      phone: phone_number

Legacy aliases:

  • tables.<name>.exclude is accepted as a deprecated alias of skip.
  • tables.<name>.anonymize_fields is accepted as a deprecated alias; prefer anonymization.fields.
  • If both anonymization.fields and tables.<name>.anonymize_fields set the same table.column, anonymization.fields wins.
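
A conflict sketch for the alias precedence rule (column and providers are illustrative; msisdn is a standard Faker provider):

anonymization:
  fields:
    users.phone: phone_number   # wins for users.phone

tables:
  users:
    anonymize_fields:
      phone: msisdn             # deprecated alias; overridden by anonymization.fields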

performance

Performance tuning configuration (optional).

Schema

performance:
  profile: boolean                    # Enable query profiling
  batch_size: integer                 # Adapter query batch size
  streaming:
    enabled: boolean                  # Force streaming mode
    threshold: integer                # Auto-enable threshold (rows)
    chunk_size: integer               # Rows per chunk

Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| profile | Boolean | No | false | Enable query profiling |
| batch_size | Integer | No | adapter default | Query parameter batch size for PostgreSQL adapter |
| streaming.enabled | Boolean | No | false | Force streaming mode |
| streaming.threshold | Integer | No | 50000 | Auto-enable streaming above this row count |
| streaming.chunk_size | Integer | No | 1000 | Rows per chunk in streaming mode |

Examples

# Basic performance config
performance:
  profile: true

# Streaming configuration
performance:
  streaming:
    enabled: false           # Auto-enable based on threshold
    threshold: 100000        # Enable streaming at 100K rows
    chunk_size: 1000         # Process 1K rows at a time

# Aggressive performance tuning
performance:
  profile: true
  batch_size: 2000
  streaming:
    enabled: false
    threshold: 50000
    chunk_size: 2000

# Memory-constrained environment
performance:
  streaming:
    enabled: true            # Always stream
    threshold: 10000         # Low threshold
    chunk_size: 500          # Small chunks

CLI Override Behavior

Command-line arguments take precedence over configuration file settings. This allows you to:

  • Use a base configuration file
  • Override specific settings via CLI for one-off extractions

Override Rules

  1. CLI always wins: CLI arguments override config file settings
  2. Merge behavior: Some options (like anonymization field mappings + CLI --redact) are merged
  3. Complete replacement: Others (like depth, direction, exclude tables) are replaced

Override Examples

Config file (dbslice.yaml):

version: "1.0"
database:
  url: postgresql://localhost/mydb
extraction:
  default_depth: 3
  direction: both
  exclude_tables:
    - audit_logs
    - sessions
anonymization:
  enabled: true

CLI overrides:

# Override depth
dbslice extract --config dbslice.yaml --seed "orders.id=1" --depth 5
# Result: depth=5 (CLI wins)

# Override direction
dbslice extract --config dbslice.yaml --seed "orders.id=1" --direction up
# Result: direction=up (CLI wins)

# Override excluded tables
dbslice extract --config dbslice.yaml --seed "orders.id=1" --exclude temp_data
# Result: exclude_tables = [temp_data] (CLI replacement)

# Disable anonymization
dbslice extract --config dbslice.yaml --seed "orders.id=1" --no-anonymize
# Result: anonymization disabled (CLI wins)

# Override database URL
dbslice extract postgresql://other-host/db --config dbslice.yaml --seed "orders.id=1"
# Result: Uses postgresql://other-host/db (CLI wins)

Validation Rules

Configuration files are validated when loaded. Common validation errors:

Schema Validation

# ✅ Valid: version is optional
version: "1.0"
database:
  url: postgresql://localhost/mydb

Database URL Validation

# ❌ Invalid: Unsupported protocol
database:
  url: mysql://localhost/mydb  # MySQL not yet supported

# ❌ Invalid: Malformed URL
database:
  url: not-a-valid-url

# ✅ Valid: PostgreSQL URL
database:
  url: postgresql://localhost/mydb

Direction Validation

# ❌ Invalid: Unknown direction
extraction:
  direction: sideways

# ✅ Valid: Known directions
extraction:
  direction: up     # or "down", "both"

Depth Validation

# ❌ Invalid: Negative depth
extraction:
  default_depth: -1

# ❌ Invalid: Zero depth
extraction:
  default_depth: 0

# ✅ Valid: Positive depth
extraction:
  default_depth: 3

Output Format Validation

# ❌ Invalid: Unknown format
output:
  format: xml

# ✅ Valid: Supported formats
output:
  format: sql   # or "json", "csv"

Complete Examples

Development Environment

config/development.yaml:

version: "1.0"

database:
  url: postgresql://localhost:5432/myapp_dev

extraction:
  default_depth: 3
  direction: both
  exclude_tables:
    - audit_logs
    - sessions
    - temp_data
  validate: true
  fail_on_validation_error: false

anonymization:
  enabled: false  # No need to anonymize dev-to-dev

output:
  format: sql
  include_transaction: true
  include_truncate: false

performance:
  profile: false
  streaming:
    enabled: false
    threshold: 50000

Usage:

dbslice extract --config config/development.yaml --seed "orders.id=12345"


Production to Staging

config/prod_to_staging.yaml:

version: "1.0"

database:
  url: ${PRODUCTION_DATABASE_URL}  # From environment

extraction:
  default_depth: 5
  direction: both
  exclude_tables:
    - audit_logs
    - sessions
    - analytics_events
    - email_logs
  validate: true
  fail_on_validation_error: true

anonymization:
  enabled: true
  seed: "prod-to-staging-2024"
  fields:
    # User PII
    users.email: email
    users.phone: phone_number
    users.first_name: first_name
    users.last_name: last_name
    users.ssn: ssn
    users.passport: passport_number

    # Financial data
    payments.card_number: credit_card_number
    payments.routing_number: aba
    payments.cvv: random_int

    # Contact info
    customers.company: company
    customers.address: address
    customers.city: city

output:
  format: sql
  include_transaction: true
  include_truncate: false

performance:
  profile: true
  streaming:
    enabled: false
    threshold: 100000
    chunk_size: 1000

Usage:

export PRODUCTION_DATABASE_URL="postgresql://prod.example.com/myapp"

dbslice extract \
  --config config/prod_to_staging.yaml \
  --seed "users:created_at >= '2024-01-01' AND status='active'" \
  --out-file staging_subset.sql \
  --verbose


HIPAA-Compliant Extraction

config/hipaa_compliant.yaml:

version: "1.0"

database:
  url: ${MEDICAL_DATABASE_URL}

extraction:
  default_depth: 3
  direction: both
  exclude_tables:
    - audit_logs
    - system_events
  validate: true
  fail_on_validation_error: true

compliance:
  profiles: [hipaa]
  strict: true              # Fail if PII detected in output
  generate_manifest: true   # Generate audit trail

anonymization:
  enabled: true
  seed: "hipaa-compliant-extraction-2024"
  deterministic: false      # Non-deterministic for stronger privacy

Usage:

export MEDICAL_DATABASE_URL="postgresql://medical-db.example.com/ehr"

dbslice extract \
  --config config/hipaa_compliant.yaml \
  --seed "patients.id=12345" \
  --out-file patient_subset.sql

# Output:
#   patient_subset.sql              (anonymized data)
#   patient_subset.manifest.json    (audit manifest for compliance team)


Test Fixture Generation

config/test_fixtures.yaml:

version: "1.0"

database:
  url: postgresql://localhost/myapp_dev

extraction:
  default_depth: 10  # Deep traversal for complete fixtures
  direction: both
  validate: true
  fail_on_validation_error: true

anonymization:
  enabled: true
  seed: "test-fixtures-stable"  # Stable seed for reproducible tests
  fields:
    users.email: email
    users.phone: phone_number

output:
  format: sql
  include_transaction: true
  include_truncate: true      # Destructive - for test DB
  disable_fk_checks: false        # Keep FK validation

performance:
  profile: false
  streaming:
    enabled: false

Usage:

dbslice extract \
  --config config/test_fixtures.yaml \
  --seed "users.email='test@example.com'" \
  --seed "products:is_test_product=true" \
  --out-file tests/fixtures/baseline.sql


CI/CD Integration

config/ci.yaml:

version: "1.0"

database:
  url: ${CI_DATABASE_URL}

extraction:
  default_depth: 3
  direction: both
  exclude_tables:
    - audit_logs
    - sessions
  validate: true
  fail_on_validation_error: true  # Fail CI on validation errors

anonymization:
  enabled: true
  seed: ${CI_ANONYMIZATION_SEED}  # From CI secrets
  fields:
    users.email: email
    users.ssn: ssn

output:
  format: sql
  include_transaction: true

performance:
  profile: false
  streaming:
    enabled: false
    threshold: 10000  # Lower threshold for CI

CI Pipeline:

# .github/workflows/test.yml
steps:
  - name: Generate test data
    env:
      CI_DATABASE_URL: ${{ secrets.TEST_DB_URL }}
      CI_ANONYMIZATION_SEED: ${{ secrets.ANONYMIZATION_SEED }}
    run: |
      dbslice extract \
        --config config/ci.yaml \
        --seed "users:is_test_user=true" \
        --out-file test_data.sql

  - name: Load test data
    run: |
      psql $CI_DATABASE_URL < test_data.sql


Large Dataset Migration

config/migration.yaml:

version: "1.0"

database:
  url: ${SOURCE_DATABASE_URL}

extraction:
  default_depth: 3
  direction: both
  validate: true
  fail_on_validation_error: false  # Don't fail on orphaned records

anonymization:
  enabled: false  # Disable for migration

output:
  format: sql
  include_transaction: true
  include_truncate: false

performance:
  profile: true
  streaming:
    enabled: true           # Always stream
    threshold: 10000        # Low threshold
    chunk_size: 1000

Usage:

export SOURCE_DATABASE_URL="postgresql://source.example.com/myapp"

dbslice extract \
  --config config/migration.yaml \
  --seed "orders:created_at >= '2024-01-01'" \
  --out-file migration_2024.sql \
  --verbose


Best Practices

1. Version Control Configuration Files

# Commit config files to version control
git add config/*.yaml
git commit -m "Add dbslice extraction configs"

# Use .gitignore for environment-specific files
echo "config/local.yaml" >> .gitignore

2. Use Environment Variables for Secrets

# ❌ Bad: Hardcoded credentials
database:
  url: postgresql://user:password123@prod.example.com/myapp

# ✅ Good: Environment variable
database:
  url: ${DATABASE_URL}

3. Document Configuration Files

version: "1.0"

# Production to Staging configuration
# Purpose: Extract anonymized subset for staging environment
# Updated: 2024-01-15
# Owner: DevOps Team

database:
  url: ${PRODUCTION_DATABASE_URL}

extraction:
  # Depth of 5 captures full order history
  default_depth: 5
  direction: both

  # Exclude high-volume tables
  exclude_tables:
    - audit_logs      # 500M+ rows
    - analytics_events  # 1B+ rows

4. Separate Configs by Environment

config/
├── development.yaml      # Local development
├── staging.yaml          # Staging environment
├── production.yaml       # Production reads
├── ci.yaml              # CI/CD pipeline
└── migration.yaml       # Data migration

5. Test Configuration Files

# Validate config file
dbslice extract --config config/production.yaml --dry-run --seed "orders.id=1"

# Test with small dataset first
dbslice extract --config config/production.yaml --seed "orders.id=12345" --depth 1

6. Use Profiles for Different Scenarios

# Base configuration
version: "1.0"

database:
  url: ${DATABASE_URL}

extraction:
  default_depth: 3
  direction: both

# Override for specific scenarios via CLI
# Bug reproduction: --depth 10 --profile
# Quick test: --depth 1 --no-validate
# Large dataset: --stream --stream-threshold 10000

See Also