Core Data Schema

Context

We need a robust database structure to store legislative documents, their vector embeddings for AI analysis, and voting records. The schema must be optimized for PostgreSQL with the pgvector extension.
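Since the design depends on pgvector, the database must have the extension enabled before any of the tables below can be created. A minimal setup step (assuming pgvector is installed on the server):

```sql
-- Enable the pgvector extension, which provides the vector column type
-- and the distance operators used for semantic search.
CREATE EXTENSION IF NOT EXISTS vector;
```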

Proposed Schema

1. Table: legislative_docs

Stores the raw and processed text of laws/proposals.

  • id: UUID (Primary Key)
  • title: String
  • content: Text (The full law text)
  • url: String (Source link)
  • status: Enum (Draft, Active, Revoked)
  • created_at: Timestamp
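The column list above could be expressed as DDL roughly as follows. This is a sketch, not final migration code; the enum label casing and the use of `gen_random_uuid()` (built into PostgreSQL 13+) are assumptions:

```sql
CREATE TYPE doc_status AS ENUM ('draft', 'active', 'revoked');

CREATE TABLE legislative_docs (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title      TEXT NOT NULL,
    content    TEXT NOT NULL,                       -- the full law text
    url        TEXT,                                -- source link
    status     doc_status NOT NULL DEFAULT 'draft',
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```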

2. Table: doc_chunks_embeddings

This is where the AI/LangChain will live. We split large laws into "chunks" for better RAG performance.

  • id: UUID
  • doc_id: ForeignKey(legislative_docs)
  • chunk_content: Text (A portion of the law)
  • embedding: Vector(1536) (1536 dimensions, the output size of OpenAI's text-embedding models; many open-source models can be configured to match)
  • metadata: JSONB (Page number, section title)
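A possible DDL sketch for the chunks table, including an approximate-nearest-neighbor index. The HNSW index type requires pgvector 0.5+; the `ON DELETE CASCADE` behavior is an assumption (chunks have no meaning without their parent document):

```sql
CREATE TABLE doc_chunks_embeddings (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    doc_id        UUID NOT NULL REFERENCES legislative_docs(id) ON DELETE CASCADE,
    chunk_content TEXT NOT NULL,                 -- a portion of the law
    embedding     vector(1536),                  -- pgvector column type
    metadata      JSONB NOT NULL DEFAULT '{}'::jsonb  -- page number, section title, etc.
);

-- Approximate-nearest-neighbor index for cosine similarity (HNSW, pgvector >= 0.5).
CREATE INDEX ON doc_chunks_embeddings USING hnsw (embedding vector_cosine_ops);
```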

3. Table: votes

The integrity core of the system: each vote carries a hash so tampering can be detected after the fact.

  • id: UUID
  • doc_id: ForeignKey(legislative_docs)
  • user_id: UUID
  • decision: Enum (Yes, No, Abstain)
  • hash: String (SHA-256 of the vote data for integrity)
  • timestamp: Timestamp (Immutable)
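One way the votes table could look in DDL. The uniqueness constraint (one vote per user per document) and the hex-encoded 64-character hash column are assumptions not stated above; "immutable" would additionally require revoking UPDATE/DELETE privileges or a trigger, which is out of scope here:

```sql
CREATE TYPE vote_decision AS ENUM ('yes', 'no', 'abstain');

CREATE TABLE votes (
    id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    doc_id    UUID NOT NULL REFERENCES legislative_docs(id),
    user_id   UUID NOT NULL,
    decision  vote_decision NOT NULL,
    hash      CHAR(64) NOT NULL,                 -- hex-encoded SHA-256 of the vote data
    timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (doc_id, user_id)                     -- assumption: one vote per user per document
);
```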

Rationale

  • Chunking Strategy: Large laws often exceed an LLM's context window, so they cannot be sent in a single request. Storing chunks with their embeddings allows precise semantic search that retrieves only the relevant passages.
  • JSONB Metadata: Allows flexibility for different types of legislation without schema migrations.
  • Vector(1536): The output dimension of OpenAI's text-embedding-ada-002 and text-embedding-3-small models; many open-source embedding models use comparable sizes.
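To illustrate how the semantic search in the rationale would work against this schema, here is a sketch of a retrieval query using pgvector's cosine-distance operator (`<=>`). The `:query_embedding` placeholder stands for a 1536-dimension vector the application computes from the user's question:

```sql
-- Retrieve the five chunks closest to the query embedding,
-- joined back to their parent document for context.
SELECT d.title,
       dc.chunk_content,
       dc.embedding <=> :query_embedding AS distance
FROM doc_chunks_embeddings dc
JOIN legislative_docs d ON d.id = dc.doc_id
ORDER BY dc.embedding <=> :query_embedding
LIMIT 5;
```

Ordering by the same `<=>` expression used in the SELECT lets the HNSW index on `embedding` serve the query without a full scan.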