Cross-References

Vague allows records to reference other records in the dataset, creating realistic relational data.

Basic References

Use any of to reference a random item from a collection:

schema Customer {
  id: uuid(),
  name: fullName()
}

schema Invoice {
  customer: any of customers  // References a customer
}

dataset Data {
  customers: 50 of Customer,
  invoices: 200 of Invoice
}

Each invoice gets a random customer from the customers collection.

Filtered References

Add where to filter which records can be referenced:

schema Invoice {
  // Only reference active customers
  customer: any of customers where .status == "active"
}

The . prefix refers to fields on the candidate record being tested.

Multiple Conditions

Combine conditions with and and or:

schema Payment {
  // Only unpaid invoices with positive balance
  invoice: any of invoices where .status != "paid" and .balance > 0
}

schema Assignment {
  // Active OR premium users
  user: any of users where .active == true or .tier == "premium"
}

Parent References

Use ^ to reference fields from the parent schema (for nested collections):

schema LineItem {
  currency: ^base_currency,  // Inherits from parent Invoice
  amount: decimal in 10..500
}

schema Invoice {
  base_currency: "USD" | "EUR" | "GBP",
  line_items: 1..5 of LineItem
}

All line items in an invoice share the invoice's base_currency.

Deep Parent References

Parent references work at any nesting level:

schema Product {
  sku: ^product_sku  // From grandparent Order
}

schema LineItem {
  products: 1..3 of Product
}

schema Order {
  product_sku: string,
  items: 1..5 of LineItem
}

Reference Values

References resolve to the full record object. Access fields with dot notation:

schema Payment {
  invoice: any of invoices,

  // Access invoice fields
  invoice_id: invoice.id,
  original_amount: invoice.total
}

Circular References

Be careful with circular dependencies. Vague processes schemas in dataset order:

// Works: customers exist before invoices
dataset Data {
  customers: 50 of Customer,
  invoices: 200 of Invoice  // Can reference customers
}

// Fails: invoices referenced before they exist
dataset Bad {
  payments: 100 of Payment,  // Tries to reference invoices
  invoices: 200 of Invoice   // But invoices don't exist yet!
}

Empty Collection Handling

If a where clause matches no records, generation fails. Ensure your filters can always find matches:

schema Payment {
  // Risky: what if no invoices are pending?
  invoice: any of invoices where .status == "pending"
}

Solutions:

Ensure the dataset has enough matching records
Use broader filters
Generate dependent records in the right order

Practical Example

schema Department {
  id: uuid(),
  name: "Engineering" | "Sales" | "Marketing" | "HR"
}

schema Employee {
  id: uuid(),
  name: fullName(),
  department: any of departments,
  manager: any of employees where .department.name == department.name
}

schema Project {
  id: uuid(),
  name: faker.company.buzzPhrase(),
  lead: any of employees where .department.name == "Engineering",
  members: 2..5 of ProjectMember
}

schema ProjectMember {
  employee: any of employees,
  role: "developer" | "designer" | "analyst"
}

dataset Company {
  departments: 4 of Department,
  employees: 50 of Employee,
  projects: 10 of Project
}

Basic References​

Filtered References​

Multiple Conditions​

Parent References​

Deep Parent References​

Reference Values​

Circular References​

Empty Collection Handling​

Practical Example​