Schema Inference
Vague can reverse-engineer schemas from existing JSON or CSV data, detecting types, ranges, patterns, and relationships.
Basic Usage
```bash
# Infer from JSON
vague --infer data.json -o schema.vague

# Infer from CSV
vague --infer data.csv --collection-name employees -o schema.vague
```
What Gets Detected
Types
| Data Pattern | Inferred Type |
|---|---|
| 123 | int |
| 12.34 | decimal |
| "text" | string |
| true/false | boolean |
| "2024-01-15" | date |
| "2024-01-15T10:30:00Z" | datetime() |
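The table works as a decision procedure applied to each JSON value. The sketch below is a hypothetical illustration and not Vague's inference code. The `inferType` helper and its ISO 8601 regular expressions are assumptions, and for the last row it returns the plain name `datetime`.

```typescript
// Hypothetical classifier mirroring the table above; not Vague's actual implementation.
type InferredType = "int" | "decimal" | "string" | "boolean" | "date" | "datetime" | "null";

// Assumed patterns: ISO 8601 calendar dates, and date-times with a Z or ±hh:mm offset.
const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;
const DATETIME_RE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function inferType(value: unknown): InferredType {
  if (value === null) return "null";
  if (typeof value === "boolean") return "boolean";
  // JSON.parse turns 12.0 into 12, so whole-number decimals look like ints here.
  if (typeof value === "number") return Number.isInteger(value) ? "int" : "decimal";
  if (typeof value === "string") {
    if (DATETIME_RE.test(value)) return "datetime";
    if (DATE_RE.test(value)) return "date";
    return "string";
  }
  throw new Error(`unsupported value: ${JSON.stringify(value)}`);
}

const samples: unknown[] = [123, 12.34, "text", true, "2024-01-15", "2024-01-15T10:30:00Z"];
console.log(JSON.stringify(samples.map(inferType)));
// ["int","decimal","string","boolean","date","datetime"]
```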
Formats
| Pattern | Inferred Generator |
|---|---|
| UUID | uuid() |
| Email | email() |
| URL | faker.internet.url() |
| Phone | phone() |
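Format detection can be thought of as testing string samples against one pattern per generator and keeping the first pattern that every sample satisfies. This is a minimal sketch under stated assumptions. `detectFormat` is hypothetical, the regexes are simplified stand-ins rather than Vague's own patterns, and the phone pattern only accepts E.164-style numbers such as `+14155552671`.

```typescript
// Hypothetical format detection; these regexes are simplified assumptions,
// not the patterns Vague uses internally.
const FORMAT_PATTERNS: Array<[string, RegExp]> = [
  ["uuid()", /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i],
  ["email()", /^[^\s@]+@[^\s@]+\.[^\s@]+$/],
  ["faker.internet.url()", /^https?:\/\/\S+$/],
  // E.164-style numbers only, e.g. +14155552671
  ["phone()", /^\+[1-9]\d{6,14}$/],
];

// Return the generator whose pattern matches every sample, or undefined.
function detectFormat(samples: string[]): string | undefined {
  if (samples.length === 0) return undefined;
  for (const [generator, pattern] of FORMAT_PATTERNS) {
    if (samples.every((s) => pattern.test(s))) return generator;
  }
  return undefined;
}

console.log(detectFormat(["a@example.com", "b@example.org"])); // email()
```

Requiring every sample to match is a conservative choice. A single non-matching value leaves the field as a plain `string`.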
Ranges
```
// Input
[
  { "age": 25 },
  { "age": 42 },
  { "age": 31 }
]

// Inferred
schema Record {
  age: int in 25..42
}
```
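In the example, the range is simply the smallest and largest observed values. A hypothetical helper, `inferRange`, that produces the same clause is sketched below. Treating any non-integer sample as making the field `decimal` is an assumption of this sketch.

```typescript
// Hypothetical helper: turn observed numbers into a range clause.
function inferRange(field: string, values: number[]): string {
  if (values.length === 0) throw new Error(`no values for ${field}`);
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Use int only when every observed value is a whole number.
  const type = values.every((v) => Number.isInteger(v)) ? "int" : "decimal";
  return `${field}: ${type} in ${min}..${max}`;
}

console.log(inferRange("age", [25, 42, 31])); // age: int in 25..42
```

Because the bounds come only from the sample, generated values never fall outside what was observed. This is one reason to widen ranges by hand when reviewing an inferred schema.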
Enums
```
// Input
[
  { "status": "active" },
  { "status": "active" },
  { "status": "pending" }
]

// Inferred with weights
schema Record {
  status: 0.67: "active" | 0.33: "pending"
}
```
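The weights are the relative frequency of each value: 2 of 3 records are `"active"` (0.67) and 1 of 3 is `"pending"` (0.33). The sketch below reproduces that arithmetic with a hypothetical `inferWeightedEnum` helper. The two-decimal rounding and most-frequent-first ordering are assumptions chosen to match the example output.

```typescript
// Hypothetical helper: derive weighted enum options from observed frequencies.
function inferWeightedEnum(field: string, values: string[]): string {
  const counts = new Map<string, number>();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  const options = Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .map(([v, n]) => `${(n / values.length).toFixed(2)}: ${JSON.stringify(v)}`);
  return `${field}: ${options.join(" | ")}`;
}

console.log(inferWeightedEnum("status", ["active", "active", "pending"]));
// status: 0.67: "active" | 0.33: "pending"
```

Rounding means the weights do not always sum to exactly 1. For example, three equally common values each get 0.33.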
Nullable Fields
```
// Input
[
  { "name": "John", "nickname": "Johnny" },
  { "name": "Jane", "nickname": null }
]

// Inferred
schema Record {
  name: string,
  nickname: string?
}
```
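A field becomes optional (`?`) when at least one record has no usable value for it. The hypothetical `nullableFields` helper below sketches this check. Treating a missing key the same as an explicit `null` is an assumption. The example above only shows the explicit `null` case.

```typescript
// Hypothetical helper: list fields that are null (or missing) in at least one record.
// Treating a missing key like null is an assumption of this sketch.
type Row = Record<string, unknown>;

function nullableFields(rows: Row[]): string[] {
  const fields: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) if (!fields.includes(key)) fields.push(key);
  }
  return fields.filter((f) => rows.some((row) => row[f] === null || row[f] === undefined));
}

const people: Row[] = [
  { name: "John", nickname: "Johnny" },
  { name: "Jane", nickname: null },
];
console.log(JSON.stringify(nullableFields(people))); // ["nickname"]
```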
Unique Fields
```
// Input
[
  { "id": 1, "code": "ABC" },
  { "id": 2, "code": "DEF" },
  { "id": 3, "code": "GHI" }
]

// Inferred
schema Record {
  id: unique int in 1..3,
  code: unique "ABC" | "DEF" | "GHI"
}
```
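A field is a candidate for `unique` when no value repeats across the sample. The hypothetical `isUniqueCandidate` helper below sketches this check. Skipping nulls and comparing values by their JSON encoding, so that `1` and `"1"` stay distinct, are assumptions of this sketch.

```typescript
// Hypothetical helper: a field is a `unique` candidate when no non-null value repeats.
function isUniqueCandidate(values: unknown[]): boolean {
  const seen = new Set<string>();
  for (const v of values) {
    if (v === null || v === undefined) continue;
    const key = JSON.stringify(v); // distinguishes 1 from "1"
    if (seen.has(key)) return false;
    seen.add(key);
  }
  return true;
}

console.log(isUniqueCandidate([1, 2, 3])); // true
console.log(isUniqueCandidate(["active", "active", "pending"])); // false
```

With only a few records, almost any field passes this test. Small samples therefore tend to over-report uniqueness.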
Advanced Detection
Derived Fields
Detects computed relationships:
```json
[
  { "qty": 2, "price": 10, "total": 20 },
  { "qty": 3, "price": 15, "total": 45 }
]
```

```
schema Record {
  qty: int in 2..3,
  price: int in 10..15,
  total: qty * price  // Detected multiplication
}
```
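One way to detect a relationship like `total: qty * price` is brute force. For each numeric field, try every pair of the other fields under each operator and keep the combinations that hold in every record. The sketch below does this for `+` and `*` only. `findDerivations` is hypothetical and is not Vague's algorithm.

```typescript
// Hypothetical search for `target = a op b` relationships; not Vague's algorithm.
type NumRow = Record<string, number>;

const OPS: Array<[string, (x: number, y: number) => number]> = [
  ["+", (x, y) => x + y],
  ["*", (x, y) => x * y],
];

function findDerivations(rows: NumRow[]): string[] {
  if (rows.length === 0) return [];
  const fields = Object.keys(rows[0]);
  const found: string[] = [];
  for (const target of fields) {
    const inputs = fields.filter((f) => f !== target);
    // Each unordered pair once, since + and * are commutative.
    for (let i = 0; i < inputs.length; i++) {
      for (let j = i + 1; j < inputs.length; j++) {
        const a = inputs[i];
        const b = inputs[j];
        for (const [symbol, op] of OPS) {
          // Must hold in every record; the tolerance absorbs floating-point noise.
          if (rows.every((r) => Math.abs(op(r[a], r[b]) - r[target]) < 1e-9)) {
            found.push(`${target}: ${a} ${symbol} ${b}`);
          }
        }
      }
    }
  }
  return found;
}

const orders: NumRow[] = [
  { qty: 2, price: 10, total: 20 },
  { qty: 3, price: 15, total: 45 },
];
console.log(JSON.stringify(findDerivations(orders))); // ["total: qty * price"]
```

With only two records, a relationship can hold by coincidence. More data makes detected formulas more trustworthy.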
Ordering Constraints
Detects field ordering:
```json
[
  { "start": "2024-01-01", "end": "2024-01-15" },
  { "start": "2024-02-01", "end": "2024-03-01" }
]
```

```
schema Record {
  start: date,
  end: date,
  assume end >= start
}
```
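An ordering constraint can be proposed whenever one field is greater than or equal to another in every record. ISO 8601 `YYYY-MM-DD` strings sort lexicographically in chronological order, so this sketch compares them as plain strings. The `findOrderings` helper is hypothetical.

```typescript
// Hypothetical helper: propose `assume b >= a` when it holds in every record.
// ISO dates (YYYY-MM-DD) compare correctly as strings, so no parsing is needed.
function findOrderings(rows: Record<string, string>[]): string[] {
  if (rows.length === 0) return [];
  const fields = Object.keys(rows[0]);
  const found: string[] = [];
  for (const a of fields) {
    for (const b of fields) {
      if (a !== b && rows.every((r) => r[b] >= r[a])) found.push(`assume ${b} >= ${a}`);
    }
  }
  return found;
}

const periods = [
  { start: "2024-01-01", end: "2024-01-15" },
  { start: "2024-02-01", end: "2024-03-01" },
];
console.log(JSON.stringify(findOrderings(periods))); // ["assume end >= start"]
```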
Conditional Constraints
Detects conditional patterns:
```json
[
  { "type": "premium", "discount": 20 },
  { "type": "premium", "discount": 25 },
  { "type": "basic", "discount": 0 },
  { "type": "basic", "discount": 0 }
]
```

```
schema Record {
  type: "premium" | "basic",
  discount: int in 0..25,
  assume if type == "basic" { discount == 0 }
}
```
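The conditional rule comes from grouping records by a categorical field and checking whether another field is constant within a group. In the example, `"basic"` always has discount 0, while `"premium"` varies. The sketch below is hypothetical (`findConditionals` is not a Vague API). Requiring at least two records per group is an assumption that keeps a single row from turning into a rule.

```typescript
// Hypothetical helper: find values of `by` for which `field` is constant.
type Row = Record<string, string | number>;

function findConditionals(rows: Row[], by: string, field: string): string[] {
  const groups = new Map<string | number, Array<string | number>>();
  for (const row of rows) {
    const list = groups.get(row[by]) ?? [];
    list.push(row[field]);
    groups.set(row[by], list);
  }
  const rules: string[] = [];
  groups.forEach((values, key) => {
    // Require two or more records so a single row does not look like a rule.
    if (values.length >= 2 && values.every((v) => v === values[0])) {
      rules.push(`assume if ${by} == ${JSON.stringify(key)} { ${field} == ${JSON.stringify(values[0])} }`);
    }
  });
  return rules;
}

const accounts: Row[] = [
  { type: "premium", discount: 20 },
  { type: "premium", discount: 25 },
  { type: "basic", discount: 0 },
  { type: "basic", discount: 0 },
];
console.log(findConditionals(accounts, "type", "discount")[0]);
// assume if type == "basic" { discount == 0 }
```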
CSV Inference
Basic CSV
```bash
vague --infer employees.csv --collection-name employees
```
CSV Options
```bash
# Custom delimiter
vague --infer data.csv --infer-delimiter ";" --collection-name records

# Custom dataset name
vague --infer data.csv --collection-name users --dataset-name TestData
```
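The `--infer-delimiter` option tells the parser which character separates columns before type inference runs. The naive sketch below only illustrates that role. `parseDelimited` is hypothetical, does not handle quoted fields or escaped delimiters, and converts numeric-looking cells to numbers as an assumption.

```typescript
// Naive delimited-text parser for illustration only: no quoting or escaping support.
type Cell = string | number;

function parseDelimited(text: string, delimiter = ","): Record<string, Cell>[] {
  const [header, ...lines] = text.trim().split(/\r?\n/);
  const columns = header.split(delimiter);
  return lines.map((line) => {
    const cells = line.split(delimiter);
    const row: Record<string, Cell> = {};
    columns.forEach((column, i) => {
      const cell = (cells[i] ?? "").trim();
      // Assumption: numeric-looking cells become numbers so type inference can see them.
      row[column] = cell !== "" && !Number.isNaN(Number(cell)) ? Number(cell) : cell;
    });
    return row;
  });
}

console.log(JSON.stringify(parseDelimited("name;age\nJohn;30\nJane;25", ";")));
// [{"name":"John","age":30},{"name":"Jane","age":25}]
```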
Programmatic API
```typescript
import { inferSchema } from 'vague-lang';

const data = [
  { name: 'John', age: 30 },
  { name: 'Jane', age: 25 }
];

const schema = inferSchema(data, {
  collectionName: 'users',
  datasetName: 'Inferred'
});

console.log(schema);
```
Practical Examples
Migration Workflow
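```bash
# 1. Export existing data (pg_dump has no JSON output format, so use psql
#    to aggregate the rows into one JSON array; "mydb" is a placeholder)
psql -d mydb -t -A -c "SELECT json_agg(u) FROM users u" > users.json

# 2. Infer schema
vague --infer users.json -o users.vague

# 3. Review and adjust
# Edit users.vague to add constraints and relationships

# 4. Generate new test data
vague users.vague -o test-users.json
```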
API Contract Discovery
```bash
# 1. Capture API responses
curl https://api.example.com/products > products.json

# 2. Infer schema
vague --infer products.json -o products.vague

# 3. Generate mock data
vague products.vague -o mock-products.json -s 42
```
Database Seeding
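```bash
# 1. Export sample data (--jsonArray writes a single JSON array instead of
#    one document per line; "mydb" is a placeholder database name)
mongoexport --db=mydb --collection=orders --jsonArray --out=orders.json

# 2. Infer schema
vague --infer orders.json -o orders.vague

# 3. Generate scaled dataset
# Edit orders.vague to increase counts
vague orders.vague -o seed-data.json
```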
TypeScript Generation
Generate TypeScript types alongside schemas:
```bash
# Schema + TypeScript
vague --infer data.json --typescript -o schema.vague

# TypeScript only
vague --infer data.json --ts-only
```
Output:

```typescript
// schema.d.ts
export interface User {
  id: string;
  name: string;
  age: number;
  email: string;
  status: 'active' | 'pending' | 'inactive';
}
```
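The output above suggests a straightforward mapping: numeric types become `number`, and enum values become a union of string literals. The sketch below renders an interface that way. `toInterface` and the type table are hypothetical, and mapping `date`/`datetime` to `string` is an assumption based on JSON carrying dates as strings.

```typescript
// Hypothetical mapping from inferred field types to TypeScript; not Vague's generator.
type FieldType = string | string[]; // a type name, or a list of enum values

const TS_TYPES: Record<string, string> = {
  int: "number",
  decimal: "number",
  string: "string",
  boolean: "boolean",
  date: "string", // assumption: JSON dates stay ISO strings
  datetime: "string",
};

function toInterface(name: string, fields: Record<string, FieldType>): string {
  const members = Object.entries(fields).map(([field, type]) => {
    const ts = Array.isArray(type)
      ? type.map((v) => `'${v}'`).join(" | ") // enum values become a string-literal union
      : TS_TYPES[type] ?? "unknown";
    return `  ${field}: ${ts};`;
  });
  return `export interface ${name} {\n${members.join("\n")}\n}`;
}

console.log(toInterface("User", { id: "string", age: "int", status: ["active", "pending", "inactive"] }));
```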
Limitations
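- Sample size matters: larger samples produce more accurate ranges, enum weights, and detected constraints
- Edge cases: values that are rare or absent in the sample may not be detected
- Complex relationships: references across records are not detected automatically
- Nested objects: deeply nested structures may need manual adjustment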
Best Practices
- Use representative data: include edge cases in your samples
- Review inferred schemas: adjust ranges and constraints as needed
- Add relationships: manually add `any of` references
- Test generation: verify that the generated output matches expectations
See Also
- CLI Reference for all inference options
- TypeScript API for programmatic use