ChartDB’s AI-powered migration feature uses large language models to intelligently convert database schemas between different SQL dialects, handling complex type conversions and dialect-specific features that traditional converters struggle with.

Overview

AI migration leverages OpenAI’s language models to provide context-aware database schema translation, going beyond simple type mapping to understand the semantic meaning of your schema and generate optimized DDL for the target database.
AI migration is available as an enhanced alternative to ChartDB’s built-in deterministic cross-dialect conversion. Use it when migrating between databases that don’t have deterministic conversion support, or when you need more sophisticated transformations.

When to Use AI Migration

Complex Schemas

Schemas with database-specific features and optimizations

Unsupported Paths

Migration paths not covered by deterministic conversion

Custom Types

Complex custom types, domains, and user-defined types

Optimization

Generating database-specific performance optimizations

Configuration

API Setup

AI migration requires OpenAI API configuration:
// Environment variables
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const OPENAI_API_ENDPOINT = process.env.OPENAI_API_ENDPOINT || 'https://api.openai.com/v1';
const LLM_MODEL_NAME = process.env.LLM_MODEL_NAME || 'gpt-4';
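These settings might be wired into a request like the following sketch. This is illustrative only: `buildRequest` is a hypothetical helper, not part of ChartDB's actual client code, and no network call is made.

```typescript
// Illustrative sketch: builds (but does not send) a chat-completions request
// from the environment settings above. Not ChartDB's actual implementation.
const buildRequest = (prompt: string) => {
    const endpoint = process.env.OPENAI_API_ENDPOINT || 'https://api.openai.com/v1';
    const model = process.env.LLM_MODEL_NAME || 'gpt-4';
    return {
        url: `${endpoint}/chat/completions`,
        init: {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                Authorization: `Bearer ${process.env.OPENAI_API_KEY ?? ''}`,
            },
            body: JSON.stringify({
                model,
                messages: [{ role: 'user', content: prompt }],
            }),
        },
    };
};
```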

Model Selection

ChartDB supports various OpenAI models:
Model | Best For | Cost
gpt-4 | Complex migrations, high accuracy | Higher
gpt-4-turbo | Fast migrations, good accuracy | Medium
gpt-3.5-turbo | Simple migrations, budget-friendly | Lower
For production migrations, use gpt-4 for the most accurate and reliable results. For development and testing, gpt-3.5-turbo is often sufficient.

How It Works

The AI migration process follows these steps:
1. Schema Analysis

The source schema is analyzed to identify:
  • Database-specific features
  • Custom types and constraints
  • Index strategies
  • Performance optimizations
2. Context Building

A detailed prompt is constructed with:
  • Source database type and version
  • Target database type and version
  • Schema complexity indicators
  • Special requirements
3. LLM Translation

The language model processes the schema and generates:
  • Type-converted table definitions
  • Target-appropriate constraints
  • Optimized indexes
  • Migration notes and warnings
4. Validation

Generated SQL is validated for:
  • Syntax correctness
  • Referential integrity
  • Feature compatibility
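The steps above can be sketched in code. Everything here is illustrative: the helper names and the naive keyword scan stand in for ChartDB's real analysis and prompt-building logic, and the LLM call and validation steps are omitted.

```typescript
type DatabaseType = 'postgresql' | 'mysql' | 'oracle' | 'sqlserver';

// Step 1 (sketch): naive keyword scan standing in for real schema analysis
const analyzeSchema = (sql: string): string[] =>
    ['SERIAL', 'JSONB', 'TEXT[]', 'USING GIN'].filter((f) => sql.includes(f));

// Step 2 (sketch): fold the findings into the prompt context
const buildContext = (
    sql: string,
    source: DatabaseType,
    target: DatabaseType
): string => {
    const features = analyzeSchema(sql);
    return (
        `Convert from ${source} to ${target}.\n` +
        `Detected features: ${features.join(', ') || 'none'}\n\n` +
        sql
    );
};

// Steps 3 and 4 (the LLM call and validation) happen after this
const context = buildContext(
    'CREATE TABLE t (id SERIAL PRIMARY KEY, data JSONB);',
    'postgresql',
    'mysql'
);
```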

AI Migration Examples

PostgreSQL to MySQL

-- PostgreSQL with array types and JSONB
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    tags TEXT[],
    settings JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- GIN index for array search
CREATE INDEX idx_users_tags ON users USING GIN(tags);

-- PostgreSQL-specific regex check
ALTER TABLE users 
ADD CONSTRAINT check_email 
CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}$');
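For reference, one plausible MySQL 8 translation the AI might produce for the schema above. This output is illustrative, not fixed; actual responses vary between runs.

```sql
-- Possible MySQL 8 result (illustrative)
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,              -- SERIAL -> AUTO_INCREMENT
    email VARCHAR(255) NOT NULL UNIQUE,
    tags JSON,                                      -- TEXT[] has no MySQL equivalent; arrays become JSON
    settings JSON,                                  -- JSONB -> JSON (no binary storage format)
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT check_email
        CHECK (email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$')
);

-- MySQL 8.0.17+ multi-valued index stands in for the GIN array index
CREATE INDEX idx_users_tags ON users ((CAST(tags AS CHAR(64) ARRAY)));
```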

Oracle to PostgreSQL

-- Oracle with sequences and specific types
CREATE SEQUENCE emp_seq START WITH 1000 INCREMENT BY 1;

CREATE TABLE employees (
    employee_id NUMBER PRIMARY KEY,
    email VARCHAR2(255) NOT NULL,
    hire_date DATE DEFAULT SYSDATE,
    salary NUMBER(10,2) CHECK (salary > 0),
    department_id NUMBER,
    CONSTRAINT fk_dept 
        FOREIGN KEY (department_id) 
        REFERENCES departments(department_id)
);

CREATE INDEX idx_emp_email ON employees(email);
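One plausible PostgreSQL translation for the Oracle schema above (illustrative; an identity column replaces the standalone sequence):

```sql
-- Possible PostgreSQL result (illustrative)
CREATE TABLE employees (
    employee_id BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1000) PRIMARY KEY,
    email VARCHAR(255) NOT NULL,                    -- VARCHAR2 -> VARCHAR
    hire_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- Oracle DATE stores time too; SYSDATE -> CURRENT_TIMESTAMP
    salary NUMERIC(10,2) CHECK (salary > 0),        -- NUMBER(10,2) -> NUMERIC(10,2)
    department_id BIGINT,
    CONSTRAINT fk_dept
        FOREIGN KEY (department_id)
        REFERENCES departments(department_id)
);

CREATE INDEX idx_emp_email ON employees(email);
```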

SQL Server to PostgreSQL

-- SQL Server with T-SQL specific features
CREATE TABLE [dbo].[Users] (
    [Id] INT IDENTITY(1,1) PRIMARY KEY,
    [Email] NVARCHAR(255) NOT NULL,
    [UniqueId] UNIQUEIDENTIFIER DEFAULT NEWID(),
    [CreatedAt] DATETIME2 DEFAULT GETDATE(),
    [Data] NVARCHAR(MAX),
    CONSTRAINT [UQ_Email] UNIQUE ([Email])
);

CREATE NONCLUSTERED INDEX [IX_Users_Email]
ON [dbo].[Users] ([Email])
INCLUDE ([UniqueId], [CreatedAt]);
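One plausible PostgreSQL translation for the SQL Server schema above (illustrative; identifiers are folded to snake_case, a common AI choice that should be reviewed against your naming conventions):

```sql
-- Possible PostgreSQL result (illustrative)
CREATE TABLE users (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- IDENTITY(1,1)
    email VARCHAR(255) NOT NULL,                      -- NVARCHAR; PostgreSQL is natively UTF-8
    unique_id UUID DEFAULT gen_random_uuid(),         -- UNIQUEIDENTIFIER/NEWID(), PostgreSQL 13+
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,   -- DATETIME2/GETDATE()
    data TEXT,                                        -- NVARCHAR(MAX)
    CONSTRAINT uq_email UNIQUE (email)
);

-- Covering index (PostgreSQL 11+) replaces the nonclustered INCLUDE index
CREATE INDEX ix_users_email ON users (email) INCLUDE (unique_id, created_at);
```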

Advanced AI Capabilities

Semantic Understanding

The AI understands the purpose of your schema:
-- AI recognizes this is a timestamp field
-- and suggests appropriate defaults

Source: created_date DATE
Target: created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP

Feature Translation

Complex database features are intelligently translated:
-- PostgreSQL range partitioning
CREATE TABLE measurements (
    id SERIAL,
    measured_at TIMESTAMP,
    value NUMERIC
) PARTITION BY RANGE (measured_at);

-- AI converts to MySQL partitioning syntax
CREATE TABLE measurements (
    id INT AUTO_INCREMENT,
    measured_at TIMESTAMP,
    value DECIMAL(10,2),
    PRIMARY KEY (id, measured_at)
) PARTITION BY RANGE (YEAR(measured_at)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);
-- PostgreSQL generated column
CREATE TABLE products (
    price NUMERIC,
    tax_rate NUMERIC,
    total NUMERIC GENERATED ALWAYS AS (price * (1 + tax_rate)) STORED
);

-- AI converts to MySQL generated column
CREATE TABLE products (
    price DECIMAL(10,2),
    tax_rate DECIMAL(4,3),
    total DECIMAL(10,2) AS (price * (1 + tax_rate)) STORED
);
-- Complex view with window functions
-- AI ensures syntax compatibility across databases
CREATE VIEW ranked_products AS
SELECT 
    product_id,
    name,
    price,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank -- "rank" is reserved in MySQL 8
FROM products;
-- PostgreSQL trigger
CREATE OR REPLACE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.modified_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- AI converts to MySQL trigger
DELIMITER $$
CREATE TRIGGER update_modified_column
BEFORE UPDATE ON table_name
FOR EACH ROW
BEGIN
    SET NEW.modified_at = NOW();
END$$
DELIMITER ;

Prompt Engineering

ChartDB uses carefully crafted prompts for optimal results:
const generateMigrationPrompt = (
    sourceSQL: string,
    sourceDB: DatabaseType,
    targetDB: DatabaseType
): string => {
    return `
You are a database migration expert. Convert this ${sourceDB} schema to ${targetDB}.

Requirements:
1. Preserve all data integrity constraints
2. Use native ${targetDB} features where appropriate
3. Optimize for ${targetDB} performance best practices
4. Include migration notes for any non-trivial conversions
5. Add comments explaining type conversions
6. Ensure referential integrity is maintained

Source SQL (${sourceDB}):
${sourceSQL}

Generate optimized ${targetDB} DDL with:
- Type conversions explained
- Index strategy for ${targetDB}
- Any ${targetDB}-specific optimizations
- Migration warnings if applicable

Return only valid ${targetDB} SQL DDL.
`;
};

Response Processing

The AI response is processed and validated:
interface AIMigrationResult {
    sql: string;
    warnings: string[];
    notes: string[];
    confidence: number;
}

const processAIResponse = async (
    response: string,
    targetDB: DatabaseType
): Promise<AIMigrationResult> => {
    // Extract SQL from response
    const sql = extractSQL(response);
    
    // Extract migration notes
    const notes = extractNotes(response);
    
    // Validate syntax
    const isValid = await validateSQL(sql, targetDB);
    
    // Extract warnings; flag SQL that failed validation
    const warnings = extractWarnings(response);
    if (!isValid) {
        warnings.push(`Generated SQL failed ${targetDB} syntax validation`);
    }
    
    return {
        sql,
        warnings,
        notes,
        confidence: calculateConfidence(response),
    };
};
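The helpers used above (`extractSQL`, `extractNotes`, `extractWarnings`, `validateSQL`, `calculateConfidence`) are defined elsewhere in the pipeline. As an illustration, `extractSQL` could be implemented with a simple fenced-block scan. This is a sketch, not ChartDB's actual implementation:

```typescript
// Sketch: pull fenced SQL code blocks out of a model response; if none are
// found, assume the whole response is raw SQL.
const extractSQL = (response: string): string => {
    const matches = [...response.matchAll(/```sql\n([\s\S]*?)```/g)];
    if (matches.length === 0) {
        return response.trim();
    }
    return matches.map((m) => m[1].trim()).join('\n\n');
};
```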

Cost Estimation

Estimate migration cost before running:
interface MigrationCost {
    estimatedTokens: number;
    estimatedCost: number;
    currency: string;
}

const estimateMigrationCost = (
    sourceSQL: string,
    model: string
): MigrationCost => {
    // Rough token estimation (4 chars ≈ 1 token)
    const inputTokens = Math.ceil(sourceSQL.length / 4);
    const estimatedOutputTokens = inputTokens * 1.5; // Response usually 1.5x input
    const totalTokens = inputTokens + estimatedOutputTokens;
    
    // Model pricing (USD per 1K tokens)
    const pricing: Record<string, { input: number; output: number }> = {
        'gpt-4': { input: 0.03, output: 0.06 },
        'gpt-4-turbo': { input: 0.01, output: 0.03 },
        'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
    };
    
    const rate = pricing[model] ?? pricing['gpt-4'];
    const cost = 
        (inputTokens / 1000 * rate.input) +
        (estimatedOutputTokens / 1000 * rate.output);
    
    return {
        estimatedTokens: totalTokens,
        estimatedCost: cost,
        currency: 'USD',
    };
};
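As a rough worked example with the gpt-4 rates above: a 2,000-character schema is about 500 input tokens, giving roughly 750 output tokens and a cost of around six cents.

```typescript
// Worked example of the estimate above (gpt-4 rates assumed from the table)
const inputTokens = Math.ceil(2000 / 4);        // 500 tokens
const outputTokens = inputTokens * 1.5;         // 750 tokens
const cost =
    (inputTokens / 1000) * 0.03 +               // input:  $0.015
    (outputTokens / 1000) * 0.06;               // output: $0.045
// cost ≈ 0.06 USD
```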

Error Handling

Robust error handling for AI migrations:
AI migration can fail due to API limits, token limits, or invalid responses. ChartDB implements automatic retry with exponential backoff and falls back to deterministic conversion when available.
const sleep = (ms: number): Promise<void> =>
    new Promise((resolve) => setTimeout(resolve, ms));

const migrateWithRetry = async (
    sourceSQL: string,
    sourceDB: DatabaseType,
    targetDB: DatabaseType,
    maxRetries: number = 3
): Promise<string> => {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            const result = await callOpenAI(sourceSQL, targetDB);
            return result.sql;
        } catch (error) {
            if (attempt === maxRetries) {
                // Fall back to deterministic conversion if available
                if (hasCrossDialectSupport(sourceDB, targetDB)) {
                    return deterministicConversion(sourceSQL, targetDB);
                }
                throw new Error('AI migration failed after all retries');
            }
            
            // Exponential backoff: 2s, 4s, 8s, ...
            await sleep(Math.pow(2, attempt) * 1000);
        }
    }
    throw new Error('unreachable');
};

Best Practices

For successful AI migrations:
  1. Review Output: Always review AI-generated SQL before deployment
  2. Test Thoroughly: Test migrations on staging data first
  3. Compare Results: Compare with deterministic conversion when available
  4. Document Changes: Keep AI migration notes for reference
  5. Validate Constraints: Ensure all constraints are properly converted
  6. Check Performance: Review generated indexes and optimizations
  7. Monitor Costs: Track API usage for budget planning

Limitations

  • Very large schemas may exceed model token limits; split them into smaller batches if needed.
  • AI responses may vary between runs; prefer deterministic conversion for production when available.
  • Extremely database-specific features may require manual review and adjustment.
  • Large schemas can incur significant API costs; estimate before running.

Comparison: AI vs Deterministic

Aspect | AI Migration | Deterministic Conversion
Accuracy | High, context-aware | Very high, tested
Speed | API-dependent (seconds) | Instant
Cost | Per-use API cost | Free
Consistency | May vary | Always same result
Optimization | Database-specific suggestions | Standard mapping
Complex Features | Better handling | Limited support
Unsupported Paths | Works for all combinations | Limited paths
Use deterministic conversion when available for production migrations. Use AI migration for unsupported paths or when you need additional optimization suggestions.

Security Considerations

Your schema is sent to OpenAI’s API. Do not include sensitive data, credentials, or proprietary information in table names, column names, or comments during AI migration.
Security best practices:
  • Sanitize schema before migration
  • Use environment variables for API keys
  • Review OpenAI’s data usage policies
  • Consider self-hosted LLM alternatives for sensitive schemas

Next Steps

Export Options

Learn about all export formats

Import Methods

Explore different import options