Snowflake Semantic View Best Practices Guide

What Are Semantic Views?

The Semantic Layer for Cortex Analyst & Snowflake Intelligence

A Semantic View is a YAML-based metadata layer that sits between your raw Snowflake tables and Cortex Analyst — Snowflake's natural-language-to-SQL engine that powers Snowflake Intelligence.

What is Snowflake Intelligence? Snowflake Intelligence is Snowflake's conversational AI platform powered by Cortex Agents. It enables business users to ask questions in natural language and get instant answers from their data. Cortex Analyst is the core engine that translates these questions into SQL queries using semantic views as its guide.

How They Work Together

Semantic views fix the mismatch between how business users describe data ("revenue by region") and how it's actually stored in database schemas (`TBL_SALES.AMT_TOTAL`). They define the business meaning of your data by providing:

  • Business context — Descriptions that explain what each table and column represents in business terms
  • Relationships — How tables connect to each other through joins
  • Metrics — Pre-defined calculations like Net Revenue = SUM(gross_revenue * (1 - discount))
  • Filters — Common data slices like "active customers" or "last quarter"
  • Examples — Sample question-answer pairs (Verified Queries) that guide the AI
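
As a concrete illustration, the metric and filter bullets above might translate into YAML like this (table and column names are hypothetical, not from any real schema):

```yaml
metrics:
  - name: NET_REVENUE
    description: "Revenue after discounts"
    expr: SUM(gross_revenue * (1 - discount))
    access_modifier: public_access

filters:
  - name: ACTIVE_CUSTOMERS
    description: "Customers whose status is currently active"
    expr: status = 'ACTIVE'
```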

Cortex Analyst uses these semantic views to generate accurate SQL from natural language questions. Instead of guessing column names and join paths, it understands your business concepts and generates reliable queries.

Snowflake Intelligence brings it all together: a Cortex Agent orchestrates multiple tools (Cortex Analyst for structured data, Cortex Search for documents, custom tools) to deliver end-to-end natural-language analytics inside Snowflake.

Example workflow: User asks "What were our top 10 products by revenue last quarter?" → Cortex Agent routes to Cortex Analyst → Cortex Analyst reads the semantic view → Generates SQL against physical tables → Returns answer with charts
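
For the workflow above, the generated SQL might look roughly like this — illustrative only, reusing the hypothetical tbl_sales table from this guide's examples:

```sql
-- Sketch of the kind of SQL Cortex Analyst could generate for
-- "top 10 products by revenue last quarter"
SELECT product_name, SUM(revenue_usd) AS total_revenue
FROM prod_db.analytics.tbl_sales
WHERE sale_date >= DATE_TRUNC('quarter', DATEADD(quarter, -1, CURRENT_DATE()))
  AND sale_date <  DATE_TRUNC('quarter', CURRENT_DATE())
GROUP BY product_name
ORDER BY total_revenue DESC
LIMIT 10;
```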

OSI Standard Alignment: Snowflake semantic views follow the Open Semantic Interchange (OSI) v1.0 standard, which defines a common specification for semantic layers across the data industry. This ensures portability, interoperability, and future-proofing as the standard evolves.

Key Benefits & Added Value

🎯 Self-Service Analytics for Everyone

Business users query data in natural language without SQL knowledge or understanding complex schemas. Break down technical barriers and empower true self-service.

🤖 Foundation for AI & Agentic Analytics

Semantic views provide the business context LLMs need to answer real-world questions reliably. Essential for Cortex Analyst and Snowflake Intelligence to generate accurate SQL.

🎓 Single Source of Truth

Define metrics and business rules once in Snowflake, reuse everywhere. Eliminate "multiple versions of truth" where revenue means different things in different reports.

🔒 Built-in Governance & Security

Native Snowflake objects inherit role-based access control, sharing, and lineage automatically. No separate semantic server to secure.

💰 Lower TCO vs External Semantic Layers

Avoid separate semantic servers, sync jobs, and duplicated logic. Change once in Snowflake, propagate everywhere—fewer moving parts, lower ops overhead.

🚀 Faster Time-to-Value

New BI tools and AI agents plug into existing semantic views immediately. Autopilot and visual editor accelerate modeling from raw data to business-ready.

🤝 Better Team Collaboration

Shared contract between business and data teams. Names, definitions, and relationships are explicit and documented in one place.

🔓 No Vendor Lock-In

BI-tool agnostic—works with Sigma, Omni, Honeydew, Power BI, and more. Standardize semantics without locking into a single tool.

Why YAML, Not SQL DDL?

Semantic views use YAML because it supports features that SQL DDL cannot express:

  • Verified Queries (VQRs) — example question+SQL pairs for few-shot learning
  • Sample values — representative data values for filter generation
  • Custom instructions — natural language guidance for the LLM
  • Rich descriptions — business context for every column and table
  • Metrics — pre-defined aggregation logic (SUM, COUNT, AVG)
  • Named filters — reusable WHERE clause patterns

YAML format also enables version control via Git CI/CD pipelines, making it easy to track changes and collaborate across teams.

Using the semantic-view Skill in Cortex Code

What Is the semantic-view Skill?

The semantic-view skill is a bundled skill available in both Cortex Code environments:

  • Cortex Code in Snowsight — AI assistant integrated directly into the Snowflake web UI
  • Cortex Code CLI — Command-line AI assistant for local development workflows

It is the single entry point for all semantic view operations — creation, optimization, auditing, debugging, VQR generation, and more. Every best practice described in this guide is implemented as a workflow inside this skill.

You do not need to install anything. The skill ships with Cortex Code and is automatically invoked when you mention semantic views in your prompt.

Getting Started with Cortex Code

📱 Cortex Code in Snowsight

Use Cortex Code directly in your browser within Snowflake's web UI. No installation required.

📚 Documentation:
Snowflake Docs - Cortex Code in Snowsight

💻 Cortex Code CLI

Install the command-line version for terminal-based workflows and local development.

📚 Documentation:
Snowflake Docs - Cortex Code CLI

How to Invoke the Skill

Simply describe what you want in natural language. The skill triggers automatically on keywords like create, optimize, debug, audit, improve, suggest VQRs, suggest metrics, or import Tableau.

Example prompts (these work in both Snowsight and the CLI):
# Create a new semantic view
"Create a semantic view for our sales tables in ANALYTICS_DB.SEMANTIC_MODELS"

# Optimize an existing one
"Optimize the semantic view PROD_DB.MODELS.SALES_SV"

# Debug SQL generation issues
"Debug why 'top 10 customers by revenue' generates wrong SQL on MY_DB.SCHEMA.MY_VIEW"

# Run a best practices audit
"Audit ANALYTICS_DB.MODELS.FINANCE_SV for best practices"

# Generate VQR suggestions from usage history
"Suggest VQRs for PROD_DB.MODELS.SALES_SV"

# Suggest metrics and filters from query history
"Suggest metrics and filters for PROD_DB.MODELS.SALES_SV"

# Import a Tableau workbook
"Import the Tableau workbook at @MY_STAGE/dashboard.twbx as a semantic view"

Skill Workflow Routing

After initialization, the skill automatically routes to the right workflow based on your intent:

Your prompt → semantic-view skill → routed workflow:

| Intent | Workflow | What Happens |
|---|---|---|
| Create new SV | Creation Mode | FastGen API generates YAML from your tables and SQL queries, then validates |
| Import Tableau file | Tableau Import | Analyzes .twb/.twbx/.tds/.tdsx, exports to semantic view YAML |
| Audit / check quality | Audit Mode | VQR testing, best practices verification, inconsistency & duplicate detection |
| Fix SQL generation | Debug Mode | Diagnosis → Root Cause Analysis → Apply targeted optimizations |
| Suggest VQRs | VQR Suggestions | Mines CA request history + query history in parallel, returns ranked suggestions |
| Suggest metrics / filters | Filters & Metrics Suggestions | Uses SYSTEM$CORTEX_ANALYST_SVA_TOOL to suggest metrics, filters, and computed facts from real usage |

Prerequisites

  • Cortex Code CLI installed and connected to Snowflake
  • Fully qualified semantic view name for existing views: DATABASE.SCHEMA.VIEW_NAME
  • Python managed via uv — the skill uses uv run python for all scripts
  • Dependencies: tomli, urllib3, requests, pyyaml, snowflake-connector-python (auto-managed by uv)

The skill always tests locally first. Changes are applied to a local YAML file. Upload to Snowflake only happens when you explicitly ask for deployment.

Skill Capabilities at a Glance

  • 6 core workflows
  • 18 Python scripts
  • 42 guidance documents
  • 5 optimization patterns

YAML Anatomy — Structure of a Semantic View

Complete YAML Skeleton

Every semantic view follows this structure. Elements are categorized as Physical (mapped to real columns) or Computed (added logic).

name: my_analytics_model
description: "Sales analytics for the North America region"

# ---- Tables ----
tables:
  - name: SALES                  # Logical name (for humans)
    description: "Daily sales transactions"
    base_table:
      database: PROD_DB
      schema: ANALYTICS
      table: TBL_SALES           # Physical table name

    # Physical columns (CANNOT add new ones)
    dimensions:                     # Physical categorical
      - name: REGION
        description: "Sales region (NA, EMEA, APAC)"
        expr: region
        data_type: TEXT
        sample_values: ["NA", "EMEA", "APAC"]

    time_dimensions:                # Physical date/time
      - name: SALE_DATE
        description: "Date the transaction occurred"
        expr: sale_date
        data_type: DATE

    facts:                          # Physical numeric
      - name: REVENUE
        description: "Transaction revenue in USD"
        expr: revenue_usd
        data_type: NUMBER

    # Computed elements (CAN add new ones)
    metrics:                        # Computed aggregations
      - name: TOTAL_REVENUE
        description: "Sum of all revenue"
        expr: SUM(revenue_usd)
        access_modifier: public_access

    filters:                        # Computed WHERE clauses
      - name: LAST_30_DAYS
        description: "Records from the last 30 days"
        expr: sale_date >= DATEADD(day, -30, CURRENT_DATE())

# ---- Relationships ----
relationships:                      # Computed joins
  - name: sales_to_customers
    left_table: SALES
    right_table: CUSTOMERS
    relationship_columns:
      - left_column: CUSTOMER_ID
        right_column: CUSTOMER_ID
    relationship_type: many_to_one
    join_type: left_outer

# ---- Verified Queries ----
verified_queries:
  - name: vqr_0
    question: "What is the total revenue by region?"
    sql: |
      SELECT region, SUM(revenue_usd) AS total_revenue
      FROM prod_db.analytics.tbl_sales
      GROUP BY region ORDER BY total_revenue DESC

# ---- Custom Instructions ----
module_custom_instructions:         # Computed LLM guidance
  sql_generation: |
    Use table aliases for readability.
  question_categorization: |
    Treat trend questions as UNAMBIGUOUS_SQL.

Key Concepts — What You Can & Cannot Do

Physical vs Computed Elements

| Category | Elements | Can Add New? | Can Enhance? |
|---|---|---|---|
| Physical | Dimensions, Facts, Time Dimensions | No — mapped to real columns | Yes — descriptions, synonyms, sample_values |
| Computed | Metrics, Filters, Relationships, Custom Instructions | Yes | Yes |

Logical vs Physical Table Names

YAML Definition (Logical):

tables:
  - name: SALES_FACT   # Logical
    base_table:
      table: TBL_SALES  # Physical

Generated SQL (Physical):

SELECT * FROM prod.sales.tbl_sales
-- Uses physical name. This is CORRECT!

Key insight: Cortex Analyst always generates SQL with physical base_table references, never logical names. This is expected and correct behavior.

Deprecated Fields — Never Use These

| Deprecated Field | Replacement | Reason |
|---|---|---|
| measures | facts | Renamed in current spec |
| default_aggregation | Use metrics instead | Deprecated from proto |
| one_to_many | many_to_one (swap tables) | Deprecated relationship type |
| many_to_many | Not supported | Requires PK on at least one side |
| full_outer / cross / right_outer | inner or left_outer | Deprecated join types |
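
A before/after sketch of migrating off the deprecated fields (field shapes and column names here are illustrative, not an exact copy of the legacy spec):

```yaml
# Before — deprecated: measures + default_aggregation
measures:
  - name: REVENUE
    expr: revenue_usd
    default_aggregation: sum

# After — current spec: a fact plus an explicit metric
facts:
  - name: REVENUE
    description: "Transaction revenue in USD"
    expr: revenue_usd
    data_type: NUMBER
metrics:
  - name: TOTAL_REVENUE
    description: "Sum of all revenue"
    expr: SUM(revenue_usd)
    access_modifier: public_access
```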

Creation Workflow

Step-by-Step Process

1. Name & Context
2. Explore Schema
3. Generate YAML
4. Validate
5. Deploy

Context Gathering (Step 1)

  • Name: simple identifier or FQN (DATABASE.SCHEMA.NAME)
  • Target: database + schema for the semantic view
  • Context: SQL queries, table references, and business context — all in one request

Two Creation Paths

FastGen API (Automated)
Extracts metadata, infers PKs and relationships, generates dimensions/facts/metrics/VQRs automatically. Recommended for most use cases.
Manual Creation
Hand-craft the YAML using DESCRIBE TABLE results and business knowledge. Fallback when FastGen fails.

Deployment Options

Option 1: Database Object (Recommended)

CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML(
  'DATABASE.SCHEMA',
  '<yaml_content>',
  FALSE  -- FALSE = create
);

Creates a managed Snowflake object. No stage needed.

Option 2: Stage Upload

CREATE STAGE IF NOT EXISTS my_stage;
PUT file://model.yaml @my_stage/;

Reference via stage path: @my_stage/model.yaml

Validation — Always Validate Before Deploying

-- In Cortex Code:
reflect_semantic_model(semantic_model_file="path/to/model.yaml")

-- Via SQL (validate only, don't create):
CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML(
  'DB.SCHEMA', '<yaml>', TRUE  -- TRUE = validate only
);

Always validate after every change. Common errors: invalid data types, missing required fields, incorrect YAML syntax, using deprecated fields.

Sizing & Architecture

Size Limits

  • 32K — token limit (excluding VQRs)
  • 20K — performance sweet spot
  • 50-100 — columns per SV
  • 100 — table hard limit

Performance degrades above 20K tokens. If your semantic view exceeds this, consider splitting by domain.

Domain-Based Splitting

When a single semantic view grows too large, split by business domain:

  • One SV per business domain (Sales, Finance, Marketing, etc.)
  • 50-100 columns each — focused on domain-specific questions
  • Duplicate shared dimensions (e.g., date, region) across SVs
  • Use Cortex Analyst routing to direct questions to the right SV (up to 40 SVs)
User Question → Router (up to 40 SVs) → Sales SV / Finance SV / Marketing SV

Descriptions & Metadata — The Biggest Accuracy Lever

Descriptions Have the Biggest Impact

Rich, contextual descriptions are the single most impactful improvement you can make for accuracy. They outperform synonyms in English-language environments.

- name: ACCOUNT_ID
  description: "Account ID"
  expr: account_id
  data_type: NUMBER

Minimal — just repeats the column name. Gives the LLM nothing to work with.

- name: ACCOUNT_ID
  description: "Unique identifier for
    Snowflake accounts. Used to
    track usage and link with
    customer information."
  expr: account_id
  data_type: NUMBER
  unique: true

Rich context: what it is, how it's used, business relationships. Plus unique: true to prevent unnecessary DISTINCT.

Description Best Practices

Every description should cover:

  • What the column represents
  • How it's commonly used in queries
  • Relationships to other data (foreign keys, hierarchies)
  • Business context (what domain it belongs to)

Sample Values & Cortex Search

| Distinct Values | Approach | Example |
|---|---|---|
| Fewer than 10 | Use sample_values | sample_values: ["NA", "EMEA", "APAC"] |
| More than 10 | Use Cortex Search | Auto-updating, low cost, scales to thousands |

Tip: For boolean/enum-like columns, also set sample_values to help the LLM generate correct filter values.
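
For a boolean-like column, that tip might look like this (column name hypothetical):

```yaml
dimensions:
  - name: IS_ACTIVE
    description: "Whether the customer account is currently active"
    expr: is_active
    data_type: BOOLEAN
    sample_values: ["TRUE", "FALSE"]
```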

Enhancement Strategies

| Enhancement | When to Use | Impact |
|---|---|---|
| Descriptions | Always — every column and table | Highest impact |
| Sample values | Categorical columns with few values | High — better filter generation |
| Unique flag | Primary key columns | Medium — prevents unnecessary DISTINCT |
| Synonyms | Non-English environments or strong aliases | Lower — descriptions are usually enough |

Metrics, Filters & Relationships

Optimization Priority

When fixing SQL generation issues, apply fixes in this order:

  1. Enhance Dimensions & Facts
    Improve descriptions, add synonyms, sample_values. Lowest risk, highest impact.
  2. Add Metrics
    Define common aggregations (SUM, COUNT, AVG) to prevent wrong aggregation logic.
  3. Add Filters
    Named WHERE clauses for business logic (active customers, last 30 days, etc.).
  4. Add/Fix Relationships
    Table joins. Requires primary key on at least one side.
  5. Custom Instructions
    Last resort. Only when semantic model elements cannot express the logic.

Metrics

Facts are raw numeric columns. Metrics are computed aggregations you can add.

metrics:
  - name: TOTAL_REVENUE
    description: "Sum of all order amounts"
    expr: SUM(order_amount)
    access_modifier: public_access

  - name: UNIQUE_CUSTOMERS
    description: "Count of distinct customers"
    expr: COUNT(DISTINCT customer_id)
    access_modifier: public_access

  - name: AVG_ORDER_VALUE
    description: "Average value across all orders"
    expr: AVG(order_amount)
    access_modifier: public_access

Always include access_modifier: public_access. Missing it is a common error.

Filters

Named WHERE clauses that must be nested under the table they apply to.

tables:
  - name: CUSTOMERS
    filters:
      - name: ACTIVE_CUSTOMERS
        description: "Customers with active status and recent purchases"
        expr: status = 'ACTIVE' AND last_purchase_date >= DATEADD(month, -6, CURRENT_DATE())

      - name: LAST_30_DAYS
        description: "Records from the last 30 days"
        expr: created_date >= DATEADD(day, -30, CURRENT_DATE())

Do:
  • Filters nested under tables
  • Simple WHERE-style expressions
  • Clear descriptions of when to use

Don't:
  • Filters at the top level
  • HAVING clauses (use metrics instead)
  • Aggregation-dependent filters
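
Because filters cannot depend on aggregations, HAVING-style logic belongs in a metric instead. A sketch (names hypothetical):

```yaml
# Anti-pattern — an aggregation-dependent filter such as
#   expr: SUM(order_amount) > 1000
# is not allowed. Define the aggregation as a metric instead:
metrics:
  - name: TOTAL_SPEND
    description: "Total order amount; use for questions like 'customers who spent over 1000'"
    expr: SUM(order_amount)
    access_modifier: public_access
```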

Relationships (Joins)

| Property | Allowed Values | Notes |
|---|---|---|
| relationship_type | one_to_one, many_to_one | Must have PK on at least one side |
| join_type | inner, left_outer | Use left_outer unless reference data must exist |

relationships:
  - name: orders_to_customers
    left_table: ORDERS
    right_table: CUSTOMERS
    relationship_columns:
      - left_column: CUSTOMER_ID
        right_column: CUSTOMER_ID
    relationship_type: many_to_one
    join_type: left_outer

Primary Key Required: At least one table in the relationship must have a primary key on the join columns. Verify with infer_primary_keys.py (95% uniqueness threshold) before adding relationships.
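
A quick manual check in the spirit of that 95% threshold — plain SQL you can run before adding the relationship (table and column names hypothetical):

```sql
-- Uniqueness ratio of the candidate join key; values near 1.0 suggest a usable PK
SELECT COUNT(DISTINCT customer_id) / COUNT(*) AS uniqueness_ratio
FROM prod_db.analytics.customers;
```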

Verified Queries (VQRs) — 40-60% Accuracy Boost

What Are VQRs?

Verified queries (VQRs) are question + SQL pairs that serve as few-shot examples for Cortex Analyst. When a user asks a question, the system finds the most similar VQR via cosine similarity and uses it as a template.

  • 40-60% — accuracy improvement
  • #1 — the top accuracy lever available

VQRs are excluded from the 32K token limit. You can add many VQRs without hitting size constraints.

VQR Format & Naming

verified_queries:
  - name: vqr_0                            # Sequential: vqr_0, vqr_1, ...
    question: "What are the daily active users?"  # Natural language
    sql: |                                      # Pipe syntax for multi-line
      SELECT
        DATE_TRUNC('day', logged_at)::date AS ds,
        COUNT(DISTINCT user_id) AS active_users
      FROM analytics.logs.usage_logs
      WHERE logged_at >= '2024-01-01'
      GROUP BY ds
      ORDER BY ds

  - name: vqr_1
    question: "Top 10 customers by revenue?"
    sql: |
      SELECT customer_name, SUM(revenue) AS total_revenue
      FROM sales.transactions
      GROUP BY customer_name
      ORDER BY total_revenue DESC
      LIMIT 10

VQR Best Practices

Do:
  • Use sequential naming: vqr_0, vqr_1, ...
  • Use pipe syntax | for multi-line SQL
  • Write natural language questions
  • Reserve VQRs for complex/ambiguous cases
  • Include complete, valid SQL
  • Use physical table references in SQL

Don't:
  • Duplicate similar questions
  • Use inline SQL (escaping issues)
  • Write SQL that references logical table names
  • Add VQRs for trivially simple queries
  • Include VQRs with broken SQL
  • Skip testing VQRs before deployment

Get AI-generated VQR suggestions: Use the semantic-view skill's VQR Suggestions workflow. It runs both ca_requests_based and query_history_based modes in parallel.

Custom Instructions — Last Resort

When to Use Custom Instructions

Custom instructions are natural language guidance for the LLM. Only use them after trying all other optimization levers:

Descriptions → Metrics → Relationships → Filters → Custom Instructions

Two Types

Recommended: module_custom_instructions

Targeted guidance for specific pipeline components.

module_custom_instructions:
  sql_generation: |
    Use table aliases for readability.
    For fiscal quarters, add 11 months
    before extracting quarter.
  question_categorization: |
    Treat trend questions as
    UNAMBIGUOUS_SQL.

Legacy: custom_instructions

General instructions for SQL generation.

custom_instructions: |-
  Unless explicitly asked for
  historical analysis, ALWAYS filter
  for snapshot_flag = 'Current'.

  "Biggest X" means frequency
  (COUNT), not value (SUM).

If starting fresh, prefer module_custom_instructions. If the SV already has custom_instructions, keep using that pattern for consistency.

Writing Good Instructions

Generalized (Good)
"Biggest [dimension]" means most
frequent:
- GROUP BY dimension
- COUNT(DISTINCT primary_key)
- Filter out NULLs

Applies to any "biggest X" question.

Overfitting (Bad)
For "Who are biggest partners?",
use COUNT(DISTINCT id)
GROUP BY partner.

Only works for one exact question.

Maximum ~10 effective instructions. Beyond that, instructions start conflicting or getting ignored. Keep them focused and non-redundant.

Audit & Quality Assurance

Three Audit Types

VQR Testing

Tests each VQR without hints and identifies which queries fail, to measure completeness.

Best Practices

Documentation, naming, metadata, type safety, inconsistencies, duplicates, missing relationships.

Custom Criteria

User-defined validation rules for domain-specific requirements.

Best Practices Checks

| Check Category | What It Verifies | Severity |
|---|---|---|
| Documentation | All tables and columns have descriptions; quality/clarity | HIGH |
| Naming | No special characters; consistent conventions | MEDIUM |
| Metadata | Data types defined; synonyms used appropriately | MEDIUM |
| Type Safety | Correct dimension vs fact classification; time dimension types | HIGH |

Inconsistency Detection

| Type | Example | Severity |
|---|---|---|
| Conflicting descriptions | order_date described differently across tables | MEDIUM |
| Data type mismatch | customer_id is NUMBER in one table, VARCHAR in another | CRITICAL |
| Orphaned relationships | Relationship references non-existent table or column | CRITICAL |
| Circular dependencies | Table A → B → C → A | HIGH |
| Contradictory filters | Filter requires both status = 'active' AND status = 'inactive' | CRITICAL |

Duplicate Detection

Identifies when custom instructions redundantly repeat information already in model elements:

| Duplicate Type | Match Level | Resolution |
|---|---|---|
| Exact duplicate | 100% | Remove from instructions |
| High similarity | >85% | Remove from instructions |
| Partial overlap | 50-85% | Review and consolidate |

Keep information in its natural location: Descriptions in descriptions, filter logic in filters, metric logic in metrics. Custom instructions should only contain unique guidance not already in the model.

Missing Relationships Thresholds

| Tables in SV | Minimum Expected Relationships | Flag If |
|---|---|---|
| 2-3 | 1 | 0 relationships |
| 4-6 | 2 | 1 or fewer |
| 7+ | 3 | 2 or fewer |

Only flagged when multiple tables share FK-like columns or dimension tables have no relationships.

Debug Workflow — Fixing SQL Generation Issues

Three-Step Debug Process

1. Diagnosis
2. Root Cause Analysis
3. Apply Optimizations

Step 1: Diagnosis

  • Get the problematic natural language question from the user
  • Generate SQL using Cortex Analyst with the current semantic view
  • Execute the SQL and present results

Step 2: Root Cause Analysis

  • Compare generated SQL against expected output
  • Identify gaps in the semantic model
  • Present findings and recommended fixes
  • Wait for user approval before applying changes

Step 3: Apply Optimizations

  • Apply approved fixes using the optimization priority order
  • Validate with reflect_semantic_model
  • Re-test the problematic question
  • Exact match required — no "close enough"

Issue-to-Gap Mapping

| SQL Issue | Likely Semantic Model Gap | Fix Type |
|---|---|---|
| Missing or wrong column | Dimension/fact not properly described or missing synonyms | Enhance |
| Wrong aggregation | Missing metric definition or unclear fact descriptions | Add Metric |
| Missing JOIN | Missing relationship between tables | Add Relationship |
| Incorrect filter | Missing named filter or unclear dimension values | Add Filter |
| Wrong business logic | Complex rules not expressible via model elements | Custom Instruction |

Improvement & Optimization Loop

Three Improvement Approaches

VQR Suggestions

AI-generated verified query suggestions based on usage patterns. Modes: ca_requests_based or query_history_based.

Agentic Optimization

Automated AI analysis. Runs async (minutes). Suggests VQRs, descriptions, and structural changes.

Manual Evaluation

Test existing VQRs, identify failures, fix iteratively with the debug workflow.

Agentic Optimization Workflow

Check Existing → Create Optimization → Poll Status → Review Results → Apply Changes

SQL Commands for Agentic Optimization:
-- Start optimization
CALL SYSTEM$CORTEX_ANALYST_CREATE_AGENTIC_OPTIMIZATION('{
  "semantic_model": {
    "semantic_view": "DB.SCHEMA.VIEW_NAME"
  },
  "warehouse": "MY_WH",
  "experimental": "{}"
}');

-- Check status
CALL SYSTEM$CORTEX_ANALYST_GET_AGENTIC_OPTIMIZATION('<optimization_name>');

-- List all optimizations
CALL SYSTEM$CORTEX_ANALYST_LIST_AGENTIC_OPTIMIZATIONS('{
  "semantic_model": {
    "semantic_view": "DB.SCHEMA.VIEW_NAME"
  },
  "experimental": "{}"
}');

-- Cancel if needed
CALL SYSTEM$CORTEX_ANALYST_CANCEL_AGENTIC_OPTIMIZATION('<optimization_name>');

Iterative Improvement Cycle

Small incremental changes are better than bulk updates. After each change: validate → test → verify. This makes it easy to identify which change caused a regression.
Audit → Identify Issues → Apply Fix → Validate → Test → Deploy → (back to Audit)

Re-Deployment After Changes

If the semantic view was exported from a database object, re-deploy after making changes:

-- Export current definition
SELECT SYSTEM$READ_YAML_FROM_SEMANTIC_VIEW('DB.SCHEMA.VIEW_NAME');

-- After editing locally, re-deploy
CALL SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML(
  'DB.SCHEMA',
  '<updated_yaml_content>',
  FALSE
);

Common Pitfalls & Anti-Patterns

Top 10 Mistakes to Avoid

| # | Pitfall | Impact | Fix |
|---|---|---|---|
| 1 | Too many columns (>100) | Performance degrades, token limit exceeded | Split by domain, keep 50-100 per SV |
| 2 | No VQRs | Missing 40-60% accuracy gain | Add VQRs for complex/ambiguous questions |
| 3 | Missing relationships in multi-table SVs | JOINs not generated, data incomplete | Add relationships with verified PKs |
| 4 | Duplicate instructions | Confusion, conflicting guidance, wasted tokens | Keep info in its natural location; audit for dupes |
| 5 | Deprecated fields (measures, default_aggregation) | Validation errors or silent failures | Use facts and metrics instead |
| 6 | Minimal descriptions | LLM can't understand column purpose | Write rich descriptions with business context |
| 7 | Custom instructions as first resort | Brittle, hard to maintain, limited to ~10 | Follow priority order: descriptions → metrics → filters → relationships → instructions |
| 8 | NOT IN conditions in generated SQL | Unexpected NULL handling | Use explicit filters or custom instructions to guide SQL patterns |
| 9 | No testing before deployment | Broken queries in production | Always validate + run VQR testing audit |
| 10 | Timestamp filtering bugs | Wrong date ranges in results | Use explicit time_dimensions; add date filters with DATEADD |

The Semantic View Maturity Model

Phase 1: Basic

  • IT-driven creation
  • Minimal descriptions
  • No VQRs
  • Manual deployment

Phase 2: Optimized

  • Rich metadata
  • VQRs from usage patterns
  • Regular audits
  • Git-based version control

Phase 3: Business-Owned

  • Business teams maintain SVs
  • CI/CD pipelines
  • Automated testing
  • Agentic optimization

Quick Reference Checklist

  • Every table has a description
  • Every column has a rich description
  • Using facts not measures
  • VQRs added for complex queries
  • Relationships defined between tables
  • Primary keys verified before relationships
  • Fewer than 100 columns per SV
  • Under 32K tokens (excl. VQRs)
  • Metrics have access_modifier
  • Filters nested under tables
  • No deprecated fields used
  • Validated with reflect_semantic_model
  • VQR testing audit passed
  • No duplicate instructions

Snowflake Semantic View Best Practices — Built from Cortex Code CLI bundled skills & internal research
Sources: Cortex Code CLI bundled semantic-view skill