Rules · FastSSV

warning Ambiguous Column Reference anti_patterns.ambiguous_column_reference anti_patterns

Detects unqualified column references (e.g., person_id instead of co.person_id) in multi-table queries where the column likely appears in multiple OMOP tables. This can lead to SQL errors or incorrect results.

What this rule detects

Columns like person_id, visit_occurrence_id, provider_id, and care_site_id appear in many OMOP tables. In a multi-table join, an unqualified reference is either a hard error (ambiguous column) or a silent bug where the parser resolves it to the first matching table — which may not be the one the author intended. Always qualify every column in a multi-table query with its table alias.

Example that trips the rule

SELECT person_id
FROM condition_occurrence co
JOIN person p ON co.person_id = p.person_id
WHERE person_id = 1;

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN person p ON co.person_id = p.person_id
WHERE co.person_id = 1;

error Attribute Definition Invalid Join anti_patterns.attribute_definition_invalid_join anti_patterns

attribute_definition is a legacy table with no foreign key relationships to other OMOP tables. It cannot be meaningfully joined and should only be queried standalone.

What this rule detects

attribute_definition is a legacy OMOP table with no foreign-key relationships to any other CDM table. Joining it to clinical or vocabulary tables produces either a Cartesian product or spurious matches on coincidentally-equal integer values. Query it on its own when you need attribute metadata, and drop any JOINs against it.

Example that trips the rule

SELECT *
FROM person p
JOIN attribute_definition ad ON p.person_id = ad.attribute_definition_id;

Corrected version

SELECT *
FROM attribute_definition;

error Comma-Separated Cross Join anti_patterns.comma_separated_cross_join anti_patterns

Comma-separated FROM between clinical tables with no join predicate, produces a Cartesian product.

What this rule detects

Comma-join syntax (FROM a, b) predates the explicit JOIN...ON form introduced in SQL-92 and is still common in analysts who come from SAS, SPSS, or older SQL dialects where it was idiomatic. In OMOP the mistake is rarely about performance first: even on a small dataset the cross-joined query returns row combinations that don't correspond to any real clinical event, every condition paired with every drug for every patient, so the results are semantically wrong before the query size becomes catastrophic. The natural join column between two clinical tables is almost always person_id; occasionally visit_occurrence_id when both sides sit inside the same encounter. If you genuinely want a Cartesian product (test-matrix generation, sparse-grid filling), write CROSS JOIN explicitly; this rule fires only on the implicit comma form, so an explicit CROSS JOIN documents the intent and stays silent.

Example that trips the rule

SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co, drug_exposure de
WHERE co.condition_concept_id = 201820;

Corrected version

-- Fix A: explicit JOIN ... ON (preferred for readability)
SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co
JOIN drug_exposure de
  ON co.person_id = de.person_id
WHERE co.condition_concept_id = 201820;

-- Fix B: keep the comma syntax, add the join predicate to WHERE
SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co, drug_exposure de
WHERE co.person_id = de.person_id
  AND co.condition_concept_id = 201820;

warning Concept Ancestor Mixed with Concept Relationship Redundantly anti_patterns.concept_ancestor_mixed_with_concept_relationship_redundantly anti_patterns

concept_ancestor already includes all transitive hierarchical paths. Using concept_relationship with hierarchical filters in the same query is redundant.

What this rule detects

concept_ancestor already encodes every transitive hierarchical path (ancestor → descendant, including indirect ones). Joining concept_relationship to it for hierarchical relationships like 'Subsumes' or 'Is a' is redundant and typically changes the semantics — the combined filter returns the intersection rather than the hierarchy you intended. Use concept_ancestor for hierarchy and reserve concept_relationship for non-hierarchical links like 'Maps to', 'Has indication', 'Contains'.

Example that trips the rule

SELECT ca.descendant_concept_id
FROM concept_ancestor ca
JOIN concept_relationship cr
  ON ca.descendant_concept_id = cr.concept_id_1
WHERE ca.ancestor_concept_id = 201820
  AND cr.relationship_id = 'Subsumes';

Corrected version

SELECT ca.descendant_concept_id
FROM concept_ancestor ca
WHERE ca.ancestor_concept_id = 201820;

warning Concept Code Requires Vocabulary ID anti_patterns.concept_code_requires_vocabulary_id anti_patterns

concept_code is unique only within a vocabulary. Any filter on concept_code should include a vocabulary_id filter in the same scope to avoid ambiguous cross-vocabulary matches.

What this rule detects

concept_code values are unique only within a vocabulary: 'E11' exists in ICD10CM (diabetes), ICD10 (diabetes), potentially custom vocabularies — and the same code string can mean entirely different things. Filtering on concept_code without also filtering on vocabulary_id silently matches unintended concepts from other vocabularies. Always pair the two predicates in the same scope.

Example that trips the rule

SELECT c.concept_id
FROM concept c
WHERE c.concept_code = 'E11.9';

Corrected version

SELECT c.concept_id
FROM concept c
WHERE c.concept_code = 'E11.9'
  AND c.vocabulary_id = 'ICD10CM';

warning Concept Name Lookup Anti-pattern anti_patterns.concept_name_lookup anti_patterns

Warns when queries filter by concept_name instead of using concept_code + vocabulary_id or concept_id. Concept names are not guaranteed to be unique or stable across versions.

What this rule detects

concept_name is neither unique nor stable: multiple concepts can share a display name, and names change across vocabulary releases (capitalisation, abbreviations, rewording). Filtering by concept_name makes queries silently unreliable across sites and breaks after each vocabulary refresh. Prefer filtering by concept_code + vocabulary_id, or by concept_id directly if you have it.

Example that trips the rule

SELECT c.concept_id
FROM concept c
WHERE c.concept_name = 'Type 2 diabetes mellitus';

Corrected version

SELECT c.concept_id
FROM concept c
WHERE c.concept_code = 'E11'
  AND c.vocabulary_id = 'ICD10CM';

warning Concept Relationship Transitive Misuse anti_patterns.concept_relationship_transitive_misuse anti_patterns

concept_relationship stores only direct relationships. Chaining multiple self-joins attempts transitive closure and is fragile. Use concept_ancestor instead.

What this rule detects

concept_relationship stores only direct (one-hop) relationships. Chaining multiple self-joins on it to simulate transitive closure ('Subsumes' through three levels of hierarchy) is fragile: it only captures descendants at the exact depth you chained for, misses concepts with multiple inheritance paths, and gets slower with each additional self-join. concept_ancestor pre-computes all transitive hierarchical paths and is the correct table for 'everything beneath this concept' queries.

Example that trips the rule

SELECT cr3.concept_id_2
FROM concept_relationship cr1
JOIN concept_relationship cr2 ON cr1.concept_id_2 = cr2.concept_id_1
JOIN concept_relationship cr3 ON cr2.concept_id_2 = cr3.concept_id_1
WHERE cr1.concept_id_1 = 201820
  AND cr1.relationship_id = 'Subsumes'
  AND cr2.relationship_id = 'Subsumes'
  AND cr3.relationship_id = 'Subsumes';

Corrected version

SELECT descendant_concept_id AS concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820;

error Destructive Operations on Clinical Tables anti_patterns.destructive_operations_on_clinical_tables anti_patterns

Detects destructive SQL operations (DELETE, UPDATE, INSERT, TRUNCATE, DROP, ALTER) on protected clinical tables. Analytical SQL should be read-only.

What this rule detects

Analytical SQL against OMOP clinical tables should be strictly read-only. DELETE, UPDATE, INSERT, TRUNCATE, DROP, and ALTER statements against person, condition_occurrence, drug_exposure, etc. either corrupt cohort history or break reproducibility for anyone else using the same warehouse. Any data modification belongs in a governed ETL pipeline, not in a cohort-exploration query.

Example that trips the rule

DELETE FROM condition_occurrence
WHERE person_id = 1;

Corrected version

SELECT *
FROM condition_occurrence
WHERE person_id = 1;

warning Duplicate Column Alias anti_patterns.duplicate_column_alias anti_patterns

Detects when the same expression is selected multiple times with different aliases. This is usually a copy-paste error and creates redundant data in the result set.

What this rule detects

Selecting the same expression twice with different aliases is almost always the result of copy-paste during iteration. It produces result-set columns that carry the same value row-for-row but different names, doubling the output width without adding information. Remove the duplicate, or if you genuinely need the same value projected twice, make the second projection compute something different (or name them differently via a view).

Example that trips the rule

SELECT person_id AS pid_a, person_id AS pid_b
FROM person;

Corrected version

SELECT person_id
FROM person;

error Having Without Group By anti_patterns.having_without_group_by anti_patterns

HAVING clauses without GROUP BY are almost always a logic error. Add GROUP BY clause or use WHERE instead.

What this rule detects

HAVING filters aggregate groups; it is designed to run after GROUP BY collapses rows. Without a GROUP BY clause, HAVING filters over an implicit single-group aggregate, which some dialects accept with surprising semantics and others reject. In OMOP analytics this is almost always a mistake — the author wanted WHERE (for row-level filtering) or forgot to add the GROUP BY for their aggregates.

Example that trips the rule

SELECT person_id
FROM person
HAVING person_id > 1;

Corrected version

SELECT person_id
FROM person
WHERE person_id > 1;

error Join Key Validation anti_patterns.join_key_validation anti_patterns

Detects incorrect or suspicious JOIN conditions. Joining incompatible keys (e.g., person_id = concept_id) produces invalid results.

What this rule detects

OMOP uses distinct ID namespaces for different entities: person_id, visit_occurrence_id, provider_id, care_site_id, concept_id, and so on. Joining incompatible ID columns (e.g. `person.person_id = concept.concept_id`) rarely errors, because they're both integers — but every match is a numeric coincidence, not a real relationship. The result is garbage rows. Pair keys with their canonical foreign-key counterparts.

Example that trips the rule

SELECT *
FROM person p
JOIN concept c ON p.person_id = c.concept_id;

Corrected version

SELECT p.person_id, c.concept_name AS gender
FROM person p
JOIN concept c ON p.gender_concept_id = c.concept_id;

warning LIMIT Without ORDER BY anti_patterns.limit_without_order_by anti_patterns

A SELECT with LIMIT, TOP, or FETCH FIRST but no ORDER BY returns non-deterministic rows. Different engines and even successive runs may return different rows. Add ORDER BY to make the result reproducible.

What this rule detects

Most SQL engines make no guarantee about row order in the absence of an explicit ``ORDER BY`` clause. Pairing ``LIMIT N`` (or ``TOP N``, or ``FETCH FIRST N ROWS ONLY``) with no ordering produces non-deterministic results: a query that asks for "100 patients" may return a different 100 each run, depending on plan choice, parallel-execution interleaving, or storage layout. The same shape breaks pagination (page 2 may overlap page 1) and CI tests that assert on specific rows. The fix is always the same: add a stable ``ORDER BY`` (typically the table's primary key or a composite key that uniquely identifies a row) before the row-limiting clause.

Example that trips the rule

SELECT person_id, condition_concept_id
FROM condition_occurrence
LIMIT 100;

Corrected version

SELECT person_id, condition_concept_id
FROM condition_occurrence
ORDER BY condition_occurrence_id
LIMIT 100;

warning No DISTINCT on Primary Key Column anti_patterns.no_distinct_on_primary_key_column anti_patterns

Detects redundant use of DISTINCT on primary key columns, which are unique by definition. This may indicate a misunderstanding of the data or missing join conditions.

What this rule detects

DISTINCT on a primary-key column is a tautology: primary keys are unique by definition, so the de-duplication never actually removes a row. Its appearance usually signals one of two things: the author doesn't realise the column is a primary key (worth double-checking the table model), or they suspect a JOIN is causing duplicates and are papering over it with DISTINCT instead of fixing the join. Remove the DISTINCT and, if duplicates do appear, tighten the join predicate.

Example that trips the rule

SELECT DISTINCT person_id
FROM person;

Corrected version

SELECT person_id
FROM person;

error No String Identification anti_patterns.no_string_identification anti_patterns

Prevents using string matching (LIKE, =, IN) on *_source_value columns to identify clinical concepts. Use *_concept_id instead.

What this rule detects

`*_source_value` columns store the raw text pulled from the source system before OMOP mapping; they are provenance, not analytical fields. Using LIKE / equality / IN on them to find clinical concepts silently under- or over-matches as source values vary by site, clinician, and ETL version. Use the paired *_concept_id (standard) or *_source_concept_id (source-mapped) column for all concept-level filtering.

Example that trips the rule

SELECT person_id
FROM condition_occurrence
WHERE condition_source_value LIKE '%diabetes%';

Corrected version

SELECT person_id
FROM condition_occurrence
WHERE condition_concept_id = 201820;

error NULL Comparison Must Use IS NULL / IS NOT NULL anti_patterns.null_comparison_operator anti_patterns

Using comparison operators (=, <>, !=, >, <, >=, <=) with NULL is incorrect SQL. NULL comparisons always return UNKNOWN, causing logic errors. Use IS NULL or IS NOT NULL instead.

What this rule detects

Comparing a column to NULL with `=`, `<>`, `!=`, `<`, `>`, `<=`, or `>=` always evaluates to UNKNOWN, which behaves as FALSE in WHERE and HAVING. A predicate like `WHERE death_date = NULL` returns zero rows regardless of the data, silently. Use the dedicated predicates `IS NULL` and `IS NOT NULL` — they are the only correct way to test nullability in SQL.

Example that trips the rule

SELECT person_id
FROM death
WHERE death_date = NULL;

Corrected version

SELECT person_id
FROM death
WHERE death_date IS NULL;

error Singleton Metadata Joined to Clinical Table anti_patterns.singleton_metadata_clinical_join anti_patterns

Singleton CDM-instance metadata tables (cdm_source, metadata) have no foreign keys to clinical data. Joining them to clinical tables creates meaningless cartesian products.

What this rule detects

cdm_source and metadata describe the CDM instance itself (release version, ETL provenance, data characterization). Neither has a primary key or foreign keys into clinical data. Joining them to clinical tables multiplies every clinical row by the metadata row count, yielding a Cartesian product. Read these tables on their own or via a scalar subquery; never through a JOIN.

Example that trips the rule

SELECT *
FROM person p, cdm_source cs;

Corrected version

SELECT *,
       (SELECT cdm_version FROM cdm_source) AS cdm_version
FROM person;

warning Standard Concept OR with Classification anti_patterns.standard_concept_or_with_classification anti_patterns

Flags queries that mix standard ('S') and classification ('C') concepts. Classification concepts should not be used in clinical queries.

What this rule detects

Standard concepts ('S') are meant to represent specific clinical events; classification concepts ('C') are vocabulary-level grouping nodes (e.g. MedDRA categories). Mixing them in a cohort query — `standard_concept = 'S' OR standard_concept = 'C'` — inflates counts with classification nodes that don't correspond to any clinical event. Keep clinical queries to 'S'; reach for 'C' only when you're specifically exploring the vocabulary hierarchy.

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.standard_concept = 'S' OR c.standard_concept = 'C';

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.standard_concept = 'S';

warning TOP/LIMIT for Synthetic Data Generation anti_patterns.top_as_synthetic_data anti_patterns

Detects when TOP N or LIMIT N is used with ROW_NUMBER() to generate synthetic lookup data from actual tables. This is fragile, non-portable, and depends on data distribution. Use explicit VALUES, generate_series(), or recursive CTEs instead.

What this rule detects

Using `SELECT TOP N ROW_NUMBER() OVER (...) FROM some_table` (or the `LIMIT N` equivalent) to generate the integers 1..N from an arbitrary clinical table is an anti-pattern: the result depends on the table having at least N rows, it isn't portable across dialects, and it silently breaks when the reference table changes size. Use a real constant-generator — an explicit VALUES list, generate_series() in PostgreSQL, or a recursive CTE — so the synthetic data is deterministic and dialect-agnostic.

Example that trips the rule

SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY person_id) AS n
FROM person;

Corrected version

SELECT n
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS t(n);

warning Type Concept ID Domain Filter anti_patterns.type_concept_id_domain_filter anti_patterns

Type concept columns (*_type_concept_id) reference concepts with domain_id = 'Type Concept'. Filtering by clinical domains returns no results.

What this rule detects

The `*_type_concept_id` columns resolve to concepts whose domain_id is 'Type Concept' — they describe record provenance (EHR, claim, patient-reported), not clinical entities. Joining `*_type_concept_id` to concept and then filtering domain_id = 'Condition' (or 'Drug', etc.) always returns zero rows because type concepts don't live in clinical domains. Use domain_id = 'Type Concept' if you need the filter, or drop the domain_id predicate entirely.

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_type_concept_id = c.concept_id
WHERE c.domain_id = 'Condition';

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_type_concept_id = c.concept_id
WHERE c.domain_id = 'Type Concept';

error Type Concept ID Not For Clinical Filtering anti_patterns.type_concept_id_misuse anti_patterns

The *_type_concept_id columns represent record provenance (EHR, claim, etc.), not clinical categories. Do not use them for cohort definition or clinical filtering in WHERE/HAVING clauses. Using them in GROUP BY, SELECT, or JOIN for labeling is acceptable.

What this rule detects

`*_type_concept_id` columns carry provenance (EHR, claim, patient-reported, registry) — information about *where* the record came from, not what the record means clinically. Using them in WHERE / GROUP BY / HAVING as if they were clinical concepts yields cohorts defined by data-source type, not disease. For cohort logic use the primary `*_concept_id` column; keep `*_type_concept_id` for data-quality and provenance audits.

Example that trips the rule

SELECT person_id
FROM condition_occurrence
WHERE condition_type_concept_id = 32020;

Corrected version

SELECT person_id
FROM condition_occurrence
WHERE condition_concept_id = 201820;

error Concept Ancestor Cross-Domain Validation concept_standardization.concept_ancestor_cross_domain concept_standardization

Ensures concept_ancestor hierarchies are used within the correct domain. Hierarchies are domain-specific and do not cross domains.

What this rule detects

concept_ancestor hierarchies are built within a single domain: a Condition ancestor has only Condition descendants, a Drug ancestor has only Drug descendants, and so on. Filtering descendants by a different domain_id always yields zero rows, which is almost never what the author intended. Cross-domain lookups belong in concept_relationship (e.g. 'Has indication' from Drug to Condition), not in concept_ancestor.

Example that trips the rule

SELECT ca.descendant_concept_id
FROM concept_ancestor ca
JOIN concept c ON ca.descendant_concept_id = c.concept_id
WHERE ca.ancestor_concept_id = 201820
  AND c.domain_id = 'Drug';

Corrected version

SELECT ca.descendant_concept_id
FROM concept_ancestor ca
JOIN concept c ON ca.descendant_concept_id = c.concept_id
WHERE ca.ancestor_concept_id = 201820
  AND c.domain_id = 'Condition';

error Concept Ancestor Max Levels Misuse concept_standardization.concept_ancestor_max_levels_misuse concept_standardization

Detects incorrect usage of max_levels_of_separation for identifying direct relationships. Due to multiple hierarchy paths, exact equality on max_levels_of_separation is unreliable.

What this rule detects

A concept pair can appear multiple times in concept_ancestor via different hierarchy paths, each with its own max_levels_of_separation. Exact equality on that column only matches pairs whose longest path is exactly that value and misses pairs where a shorter path also exists. Use max_levels_of_separation <= N for depth limits, or min_levels_of_separation = 1 when you specifically want direct parent-child relationships.

Example that trips the rule

SELECT descendant_concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820
  AND max_levels_of_separation = 1;

Corrected version

SELECT descendant_concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820
  AND max_levels_of_separation <= 1;

error Concept Ancestor Rollup Direction concept_standardization.concept_ancestor_rollup_direction concept_standardization

Ensures correct usage of concept_ancestor for hierarchical rollups: join clinical concept_id to descendant_concept_id and filter on ancestor_concept_id.

What this rule detects

concept_ancestor is directional: ancestor_concept_id is the broader category (e.g. 201820 = 'Diabetes mellitus') and descendant_concept_id is every more specific concept beneath it. To roll a clinical event up to a parent concept, join the event's *_concept_id to descendant_concept_id and filter on ancestor_concept_id. Swapping the columns silently returns the wrong cohort, usually either empty or a small unrelated subset, and is a common source of hard-to-debug analytical errors.

Example that trips the rule

SELECT co.condition_occurrence_id
FROM condition_occurrence co
JOIN concept_ancestor ca
  ON co.condition_concept_id = ca.ancestor_concept_id
WHERE ca.descendant_concept_id = 201820;

Corrected version

SELECT co.condition_occurrence_id
FROM condition_occurrence co
JOIN concept_ancestor ca
  ON co.condition_concept_id = ca.descendant_concept_id
WHERE ca.ancestor_concept_id = 201820;

warning Concept Ancestor Self-Include Redundancy concept_standardization.concept_ancestor_self_include_redundancy concept_standardization

concept_ancestor includes self (min_levels_of_separation = 0). Explicit inclusion of anchor concepts causes duplication.

What this rule detects

concept_ancestor already includes every concept as its own descendant (rows where min_levels_of_separation = 0). Adding a UNION or OR that explicitly includes the anchor concept_id duplicates the anchor row in the result. The fix is to trust concept_ancestor's self-inclusion and drop the explicit branch, unless the anchor is deliberately being flagged and tracked separately (in which case use UNION ALL with an explicit marker column so the duplication is intentional).

Example that trips the rule

SELECT descendant_concept_id AS concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820
UNION
SELECT 201820 AS concept_id
FROM concept
WHERE concept_id = 201820;

Corrected version

SELECT descendant_concept_id AS concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820;

warning Concept Class ID Ingredient for Drug Grouping concept_standardization.concept_class_id_ingredient_for_drug_grouping concept_standardization

Grouping drug data by ingredient requires concept_class_id = 'Ingredient'. Using product-level classes groups at formulation level instead.

What this rule detects

Drug concepts in OMOP are stored at several granularities: Ingredient (metformin), Clinical Drug Form (metformin tablet), Clinical Drug (metformin 500 mg oral tablet), Branded Drug, and so on. Labelling a result column 'ingredient' while grouping by a more specific concept_class (like Clinical Drug) reports formulation-level counts under an ingredient name, which fragments and undercounts real ingredient usage. Either filter directly to concept_class_id = 'Ingredient', or roll products up to their ingredient via concept_ancestor.

Example that trips the rule

SELECT c.concept_name AS ingredient, COUNT(*) AS n
FROM drug_exposure de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.concept_class_id = 'Clinical Drug'
GROUP BY c.concept_name;

Corrected version

SELECT ing.concept_name AS ingredient, COUNT(*) AS n
FROM drug_exposure de
JOIN concept_ancestor ca ON de.drug_concept_id = ca.descendant_concept_id
JOIN concept ing ON ca.ancestor_concept_id = ing.concept_id
WHERE ing.concept_class_id = 'Ingredient'
GROUP BY ing.concept_name;

warning Concept Domain ID Matches Target Table concept_standardization.concept_domain_validation concept_standardization

Validates that concept.domain_id matches the expected domain for each *_concept_id column.

What this rule detects

Each OMOP *_concept_id column has a canonical domain_id: condition_occurrence.condition_concept_id is always 'Condition', drug_exposure.drug_concept_id is always 'Drug', and so on. Filtering a concept join with the wrong domain (e.g. c.domain_id = 'Drug' while joining on condition_concept_id) silently returns zero rows. The fix is to match the domain to the column.

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.domain_id = 'Drug';

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.domain_id = 'Condition';

warning Concept Synonym Language Concept ID concept_standardization.concept_synonym_language_concept_id concept_standardization

concept_synonym stores synonyms in multiple languages. When searching by concept_synonym_name, filter by language_concept_id.

What this rule detects

The concept_synonym table stores synonym names in many languages: English (language_concept_id = 4180186), Spanish, German, French, Dutch, etc. Searching concept_synonym_name with a free-text match (LIKE / equality) without filtering by language inadvertently returns concepts whose non-English synonyms contain the query string, which is almost never the intent in English-speaking deployments. Add an explicit language filter, or use IN to allow a deliberate multilingual scope.

Example that trips the rule

SELECT cs.concept_id
FROM concept_synonym cs
WHERE cs.concept_synonym_name LIKE '%diabetes%';

Corrected version

SELECT cs.concept_id
FROM concept_synonym cs
WHERE cs.concept_synonym_name LIKE '%diabetes%'
  AND cs.language_concept_id = 4180186;  -- English

warning Domain Vocabulary Validation (VOCAB_022-025) concept_standardization.domain_vocabulary_validation concept_standardization

Standard *_concept_id columns should align with domain-specific standard vocabularies. Filtering by source vocabularies is likely incorrect.

What this rule detects

Each OMOP domain has a canonical standard vocabulary: Condition → SNOMED, Drug → RxNorm/RxNorm Extension, Procedure → SNOMED / CPT4 / HCPCS depending on site, Measurement → LOINC, Unit → UCUM. Filtering a standard *_concept_id column by a *source* vocabulary like ICD10CM, ICD9CM, or NDC returns zero rows because those codes live on the source side. Use the domain's standard vocabulary, or switch to the *_source_concept_id column if you genuinely want to filter on the originating code system.

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.vocabulary_id = 'ICD10CM';

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.vocabulary_id = 'SNOMED';

error Era Tables Use Standard Concepts Only concept_standardization.era_table_standard_concepts concept_standardization

Era tables contain only standard concepts. Filtering for non-standard concepts will always return 0 rows.

What this rule detects

OMOP era tables (condition_era, drug_era, dose_era) are derived from their underlying occurrence tables by rolling up standard concepts over contiguous time windows. By construction every row references a standard concept, so filtering them with standard_concept IN ('C', NULL) or <> 'S' always returns zero rows. Either drop the filter entirely or filter for 'S' explicitly to make the intent readable.

Example that trips the rule

SELECT de.person_id
FROM drug_era de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.standard_concept = 'C';

Corrected version

SELECT de.person_id
FROM drug_era de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.standard_concept = 'S';

warning Invalid Reason Enforcement (strict mode) concept_standardization.invalid_reason_enforcement concept_standardization

Strict-mode-only. When enabled, ensures queries on vocabulary/concept tables filter by invalid_reason so retired or upgraded concepts are not silently included. Silent in default mode.

What this rule detects

Concept tables (concept, concept_relationship, concept_ancestor) carry an invalid_reason column that marks retired or replaced entries: 'D' for deprecated, 'U' for upgraded to a newer concept, NULL for currently valid. Queries that omit an `invalid_reason IS NULL` predicate silently include retired rows, which can pull in concept_ids that your site no longer records or that map onward to a better successor. **Gated behind strict mode.** Real-world OMOP queries on the concept table almost always omit this filter — firing in default mode produced a warning on essentially every realistic query and diluted the signal of every other rule. Enable with ``--strict`` (CLI) or ``strict=True`` (API) when you want the vocabulary-hygiene check applied; the rule is silent otherwise.

Example that trips the rule

SELECT concept_id, concept_name
FROM concept
WHERE vocabulary_id = 'SNOMED';

Corrected version

SELECT concept_id, concept_name
FROM concept
WHERE vocabulary_id = 'SNOMED'
  AND invalid_reason IS NULL;

warning Maps To Target Standard Validation concept_standardization.maps_to_target_standard_validation concept_standardization

When using concept_relationship with relationship_id = 'Maps to', the target (concept_id_2) should be validated as a standard concept via a join to concept table with standard_concept = 'S'. Without this, queries may return deprecated or intermediate non-standard concepts. This is a best practice recommendation - the query will execute correctly but may include non-standard targets.

What this rule detects

concept_relationship with relationship_id = 'Maps to' is the canonical mapping from source codes (ICD10CM, NDC, etc.) to OMOP standard concepts. But the relationship table also contains historical and intermediate mappings, so concept_id_2 is not guaranteed to be standard just because the relationship is 'Maps to'. Without a join to concept with standard_concept = 'S', queries can pick up deprecated or non-standard targets that break downstream cohort logic.

Example that trips the rule

SELECT cr.concept_id_2 AS target_concept_id
FROM concept_relationship cr
WHERE cr.concept_id_1 = 44831230
  AND cr.relationship_id = 'Maps to';

Corrected version

SELECT cr.concept_id_2 AS target_concept_id
FROM concept_relationship cr
JOIN concept c ON cr.concept_id_2 = c.concept_id
WHERE cr.concept_id_1 = 44831230
  AND cr.relationship_id = 'Maps to'
  AND c.standard_concept = 'S';

warning Multiple Maps To Targets Not Handled concept_standardization.multiple_maps_to_targets concept_standardization

A source concept can map to multiple standard concepts via 'Maps to'. Queries assuming 1:1 mapping may produce incorrect results.

What this rule detects

Not every source concept maps 1:1 to a standard concept: one ICD-10 code can split into multiple SNOMED concepts, and one NDC can fan out to several RxNorm ingredients. Querying concept_relationship for 'Maps to' without DISTINCT (or a GROUP BY / aggregate handling the fan-out) duplicates source rows whenever a one-to-many mapping exists, inflating downstream counts. Decide deliberately: DISTINCT to collapse, GROUP BY with ARRAY_AGG to preserve all targets, or a subquery that picks one target per source.

Example that trips the rule

SELECT cr.concept_id_1, cr.concept_id_2
FROM concept_relationship cr
WHERE cr.relationship_id = 'Maps to';

Corrected version

SELECT DISTINCT cr.concept_id_1, cr.concept_id_2
FROM concept_relationship cr
WHERE cr.relationship_id = 'Maps to';

warning Source Concept ID Should Not Filter Standard Concepts concept_standardization.source_concept_id_standard_filter concept_standardization

Joining *_source_concept_id to concept with standard_concept = 'S' is semantically incorrect. Source concepts are non-standard.

What this rule detects

Every OMOP clinical table has two concept columns: *_concept_id (the standard vocabulary mapping, e.g. SNOMED for conditions) and *_source_concept_id (the code from the originating system, e.g. ICD10CM). Source concepts are by definition non-standard, so joining *_source_concept_id to concept and then filtering standard_concept = 'S' always returns zero rows. Either join against *_concept_id (for standard-code analytics) or drop the standard filter (to inspect source codes).

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_source_concept_id = c.concept_id
WHERE c.standard_concept = 'S';

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.standard_concept = 'S';

warning Source Concept ID Not For Analytical Filtering concept_standardization.source_concept_id_warning concept_standardization

Avoid using *_source_concept_id for cohort definition or analytical filtering. Use standard *_concept_id instead.

What this rule detects

*_source_concept_id stores the original, unmapped code from the source system (ICD10CM, NDC, CPT, etc.) and is intended for audit and ETL provenance, not for cohort analytics. Filtering cohorts on source concepts produces results that do not federate across sites (each warehouse ETL maps slightly differently) and misses patients whose underlying data was mapped from a different source vocabulary. Use the paired *_concept_id (standard) column for any cohort or analytical query.

Example that trips the rule

SELECT person_id
FROM condition_occurrence
WHERE condition_source_concept_id = 44831230;

Corrected version

SELECT person_id
FROM condition_occurrence
WHERE condition_concept_id = 201820;

warning Source to Concept Map Validation concept_standardization.source_to_concept_map_validation concept_standardization

Requires source_vocabulary_id when filtering source_code to avoid ambiguity across vocabularies.

What this rule detects

source_to_concept_map is the per-site translation table from source codes to OMOP standard concepts. Source codes are not globally unique: 'R51' can exist in ICD10CM (headache), ICD9CM, and some local billing vocabularies. Filtering on source_code alone can pick up matches from unrelated vocabularies, returning incorrect target_concept_ids. Always pair source_code with source_vocabulary_id so the match is unambiguous.

Example that trips the rule

SELECT target_concept_id
FROM source_to_concept_map
WHERE source_code = 'R51';

Corrected version

SELECT target_concept_id
FROM source_to_concept_map
WHERE source_code = 'R51'
  AND source_vocabulary_id = 'ICD10CM';

warning Standard Concept Enforcement concept_standardization.standard_concept_enforcement concept_standardization

Ensures queries using STANDARD concept fields enforce standard concepts via concept.standard_concept = 'S' or concept_relationship 'Maps to'

What this rule detects

Standard OMOP *_concept_id columns can point to non-standard or deprecated concepts unless the query explicitly enforces standard_concept = 'S'. Without that filter, cohort queries silently mix in classification-only concepts ('C'), invalid entries, or legacy mappings that never should have persisted, producing over-counts or non-reproducible results across sites. Era tables (condition_era, drug_era) and a handful of other columns are already guaranteed-standard by spec and are excluded from this rule.

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.vocabulary_id = 'SNOMED';

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.vocabulary_id = 'SNOMED'
  AND c.standard_concept = 'S';

error Standard Concept Value Validation concept_standardization.standard_concept_value_validation concept_standardization

Ensures standard_concept uses only valid values: 'S', 'C', or NULL.

What this rule detects

The concept.standard_concept column has exactly three valid values: 'S' (standard), 'C' (classification), and NULL (neither, typically source vocabularies). Any other literal, like 'X', 'Y' or an empty string, matches no rows and is almost always a typo from a developer who remembered there was some letter value. The fix is usually 'S'; reach for 'C' only when you specifically want classification-level concepts such as MedDRA grouping.

Example that trips the rule

SELECT concept_id
FROM concept
WHERE standard_concept = 'X';

Corrected version

SELECT concept_id
FROM concept
WHERE standard_concept = 'S';

warning Unit Vocabulary Validation concept_standardization.unit_vocabulary_validation concept_standardization

Standard unit concepts use vocabulary_id = 'UCUM'. Filtering unit_concept_id with other vocabulary IDs returns non-standard units or zero results.

What this rule detects

OMOP unit concepts live in the UCUM vocabulary (Unified Code for Units of Measure). Filtering unit_concept_id joins with vocabulary_id = 'LOINC' (or any other non-UCUM vocabulary) returns zero rows because units are not stored there. Use UCUM for unit lookups, or drop the vocabulary filter if the join already constrains to unit_concept_id.

Example that trips the rule

SELECT m.person_id
FROM measurement m
JOIN concept c ON m.unit_concept_id = c.concept_id
WHERE c.vocabulary_id = 'LOINC';

Corrected version

SELECT m.person_id
FROM measurement m
JOIN concept c ON m.unit_concept_id = c.concept_id
WHERE c.vocabulary_id = 'UCUM';

error Canonical Vocabulary String Value Validation data_quality.canonical_string_value_validation data_quality

Filters on concept_class_id, domain_id, and vocabulary_id must use canonical OMOP casing (e.g. 'Ingredient', 'Condition', 'SNOMED'). Case-sensitive comparison silently returns zero rows when the casing is wrong; vocabulary_id values must also be hyphen-free.

What this rule detects

Three OMOP vocabulary string columns are case-sensitive and follow a fixed canonical casing: concept_class_id ('Ingredient', 'Clinical Drug', …), domain_id ('Condition', 'Drug', …), and vocabulary_id ('SNOMED', 'RxNorm', 'ICD10CM', …). Filtering with the wrong casing ('ingredient', 'condition', 'snomed') silently returns zero rows. vocabulary_id values are additionally hyphen-free in OMOP — 'ICD-10-CM' is invalid; the canonical form is 'ICD10CM'. Match the canonical form exactly.

Example that trips the rule

SELECT concept_id
FROM concept
WHERE vocabulary_id = 'snomed';

Corrected version

SELECT concept_id
FROM concept
WHERE vocabulary_id = 'SNOMED';

warning Clinical Event Date Should Not Be Before 1900 data_quality.clinical_event_date_before_1900_validation data_quality

Detects filtering logic that targets implausible historical dates (<1900) in clinical event tables.

What this rule detects

Dates before 1900 in OMOP clinical tables are almost always placeholders for missing or unparseable source dates, not real clinical events. Filtering for these dates typically surfaces data-quality artefacts rather than clinical signal. Restrict queries to >= 1900-01-01 unless the intent is explicitly a data-quality audit of the placeholder rows.

Example that trips the rule

SELECT *
FROM condition_occurrence
WHERE condition_start_date < '1900-01-01';

Corrected version

SELECT *
FROM condition_occurrence
WHERE condition_start_date >= '1900-01-01';

error Column Type Validation (SCHEMA Layer) data_quality.column_type_validation data_quality

Ensures compatible data types in JOIN conditions and WHERE filters. Detects mismatches such as integer-to-varchar joins or invalid literal comparisons. Type mismatches are schema-level errors that cause query failures or wrong results.

What this rule detects

OMOP columns have specific types: *_concept_id and primary-key *_id columns are INTEGER, source_value and name columns are VARCHAR. Joining or comparing across incompatible types (e.g. INTEGER = VARCHAR) forces implicit casting and in some dialects silently returns zero rows or raises a type error. Use compatible columns or CAST explicitly when you need cross-type comparison.

Example that trips the rule

SELECT *
FROM person p
JOIN condition_occurrence co ON p.person_id = co.condition_source_value;

Corrected version

SELECT *
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id;

warning Concept ID String Comparison data_quality.concept_id_string_comparison data_quality

All *_concept_id columns are INTEGER type. Comparing them with string literals requires implicit casting and may lead to incorrect results.

What this rule detects

Every *_concept_id column in OMOP is an INTEGER. Comparing it to a string literal forces the database to implicitly cast one side per row. In PostgreSQL this usually works but bypasses the concept_id index; in SQL Server and BigQuery it can raise a conversion error or, worse, silently evaluate to FALSE for every row. Writing the literal as a plain integer keeps index usage intact and the semantics identical across every supported dialect.

Example that trips the rule

SELECT person_id
FROM person
WHERE gender_concept_id = '8532';

Corrected version

SELECT person_id
FROM person
WHERE gender_concept_id = 8532;

warning Concept Name Whitespace data_quality.concept_name_whitespace data_quality

concept_name values may contain trailing whitespace. Exact equality without TRIM may silently fail.

What this rule detects

concept.concept_name values are sometimes loaded with trailing whitespace or non-breaking characters from vocabulary source files. Exact equality matching against a trimmed literal can silently fail against those rows. Use TRIM/RTRIM on the column side, or use LIKE with explicit wildcards, so the query is resilient to whitespace artefacts.

Example that trips the rule

SELECT concept_id
FROM concept
WHERE concept_name = 'Type 2 diabetes mellitus';

Corrected version

SELECT concept_id
FROM concept
WHERE TRIM(concept_name) = 'Type 2 diabetes mellitus';

error Episode Requires Concept Filter data_quality.episode_requires_concept_filter data_quality

Episode queries should filter by episode_concept_id to ensure semantic clarity and optimal query performance.

What this rule detects

The episode table aggregates care over time (disease episodes, treatment episodes) across many episode types identified by episode_concept_id. Querying episode without a concept filter returns a mixed bag of unrelated episode types, which is rarely the intent. Add an episode_concept_id predicate or join to concept with an explicit filter to select a single episode type.

Example that trips the rule

SELECT person_id
FROM episode;

Corrected version

SELECT person_id
FROM episode
WHERE episode_concept_id = 32528;

warning Fact Relationship No Self-Reference data_quality.fact_relationship_no_self_reference data_quality

Detects patterns where fact_relationship queries filter for self-referential relationships (fact_id_1 = fact_id_2). Self-linking relationships should be rare and may indicate data quality issues or incorrect query logic.

What this rule detects

fact_relationship links two facts together; a row where fact_id_1 = fact_id_2 represents a fact related to itself, which is almost never a real clinical relationship. The predicate `WHERE fact_id_1 = fact_id_2` typically comes from a copy-paste error where an inequality was flipped. Use `<>` if you want distinct facts, and add a relationship_concept_id filter so the query picks out one specific relationship type.

Example that trips the rule

SELECT fact_id_1
FROM fact_relationship
WHERE fact_id_1 = fact_id_2;

Corrected version

SELECT fact_id_1, fact_id_2
FROM fact_relationship
WHERE fact_id_1 <> fact_id_2
  AND relationship_concept_id = 44818859;

error Fact Relationship Requires Relationship Concept Filter data_quality.fact_relationship_requires_relationship_concept_filter data_quality

Fact relationship queries should filter by relationship_concept_id to ensure semantic clarity and optimal query performance.

What this rule detects

fact_relationship encodes every kind of linkage between OMOP facts — visits to procedures, measurements to their ordering visits, parent-child condition episodes, etc. — each identified by relationship_concept_id. Querying fact_relationship without that filter combines unrelated link types into one result set. Always restrict to the specific relationship you need.

Example that trips the rule

SELECT fact_id_1, fact_id_2
FROM fact_relationship;

Corrected version

SELECT fact_id_1, fact_id_2
FROM fact_relationship
WHERE relationship_concept_id = 44818859;

warning Fact Relationship Valid Concepts data_quality.fact_relationship_valid_concepts data_quality

Ensures that when fact_relationship joins to the concept table (for domain_concept_id_1, domain_concept_id_2, or relationship_concept_id), the query filters by invalid_reason to ensure only valid concepts are used.

What this rule detects

When fact_relationship is joined to concept (via relationship_concept_id, domain_concept_id_1, or domain_concept_id_2), the concept side can include deprecated concepts unless filtered. That means historical or retired relationship types sneak into the result. Add an invalid_reason IS NULL predicate on each concept alias so deprecated concepts are excluded.

Example that trips the rule

SELECT fr.fact_id_1
FROM fact_relationship fr
JOIN concept c ON fr.relationship_concept_id = c.concept_id;

Corrected version

SELECT fr.fact_id_1
FROM fact_relationship fr
JOIN concept c ON fr.relationship_concept_id = c.concept_id
WHERE c.invalid_reason IS NULL;

error Free-Text Column Misuse data_quality.free_text_column_misuse data_quality

Detects when a free-text VARCHAR column (e.g. condition_occurrence.stop_reason, drug_exposure.lot_number, note_nlp.term_modifiers, location.state, location.zip) is joined to the concept table or compared to a numeric literal. These columns have no concept mapping and no numeric semantics.

What this rule detects

Several OMOP CDM tables expose VARCHAR columns that store free-text metadata: condition_occurrence.stop_reason, drug_exposure.lot_number, note_nlp.term_modifiers, location.state, location.zip. These have no mapping into the concept table and no numeric semantics. Joining them to concept (on concept_name, concept_code, or any concept column) yields zero or coincidental matches; comparing them numerically forces an implicit cast that quietly returns no rows. Use these columns only for text filtering/display, and use the row's standard *_concept_id column for concept lookups.

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.stop_reason = c.concept_name;

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id;

error Incorrect Percentile Calculation data_quality.incorrect_percentile_calculation data_quality

Detects the copy-paste bug where percentile_25, median, and percentile_75 (or similarly-named percentile columns) all reuse the same threshold value in a CASE WHEN order_nr < threshold * population_size pattern, silently producing identical results across all percentile columns.

What this rule detects

Hand-rolled percentile calculations using ROW_NUMBER() + population_size + CASE WHEN are easy to copy-paste and easy to get wrong: the classic bug is percentile_25, median, and percentile_75 all re-using the 0.25 threshold, so three columns report the same number. If you spot three aliased percentile columns sharing a threshold, either set them to 0.25/0.50/0.75 respectively or replace the manual logic with NTILE(4) or PERCENTILE_CONT() from the SQL standard.

Example that trips the rule

SELECT
  MAX(CASE WHEN order_nr < 25 * population_size THEN value END) AS percentile_25,
  MAX(CASE WHEN order_nr < 25 * population_size THEN value END) AS median,
  MAX(CASE WHEN order_nr < 25 * population_size THEN value END) AS percentile_75
FROM (
  SELECT value,
         ROW_NUMBER() OVER (ORDER BY value) AS order_nr,
         COUNT(*) OVER () AS population_size
  FROM measurement
) t;

Corrected version

SELECT
  MAX(CASE WHEN order_nr < 0.25 * population_size THEN value END) AS percentile_25,
  MAX(CASE WHEN order_nr < 0.50 * population_size THEN value END) AS median,
  MAX(CASE WHEN order_nr < 0.75 * population_size THEN value END) AS percentile_75
FROM (
  SELECT value,
         ROW_NUMBER() OVER (ORDER BY value) AS order_nr,
         COUNT(*) OVER () AS population_size
  FROM measurement
) t;

error Negative Concept ID Validation data_quality.negative_concept_id_validation data_quality

Concept IDs must be non-negative integers (>= 0).

What this rule detects

Every concept_id in OMOP is a non-negative integer (>= 0); 0 represents unmapped records, positive values are real concepts. Negative literals never match any concept and are usually the result of either a sign error or a confusion with ROW_NUMBER-style synthetic IDs from another system. Replace the literal with the correct non-negative concept_id.

Example that trips the rule

SELECT *
FROM condition_occurrence
WHERE condition_concept_id = -1;

Corrected version

SELECT *
FROM condition_occurrence
WHERE condition_concept_id = 201820;

warning Non-Standard Date Literal Format data_quality.non_standard_date_literal_format data_quality

Detects date literals in non-standard formats (e.g., '01-jan-2011', '12/31/2020') that may be ambiguous or not portable across databases. ISO 8601 format (YYYY-MM-DD) is recommended for clarity and portability.

What this rule detects

Date literals like '01/31/2020', '31-Jan-2020', or '01-jan-2011' are ambiguous: '01/02/2020' means January 2 in the US and February 1 in Europe, and engines disagree about which to pick. ISO 8601 (YYYY-MM-DD) is unambiguous and portable across every supported dialect. Rewrite literals to the ISO form, optionally wrapping in DATE '...' to make the type explicit.

Example that trips the rule

SELECT *
FROM condition_occurrence
WHERE condition_start_date = '01/31/2020';

Corrected version

SELECT *
FROM condition_occurrence
WHERE condition_start_date = DATE '2020-01-31';

warning Note NLP nlp_date for Temporal Filtering data_quality.note_nlp_nlp_date_for_temporal_filtering data_quality

note_nlp.nlp_date is the NLP processing time, not the clinical event date. Use note.note_date for temporal filtering.

What this rule detects

note_nlp.nlp_date records when the NLP pipeline *processed* the source note, which can be months or years after the clinical event itself. For temporal cohort logic you need the clinical date, not the processing date — that lives on note.note_date. Join note_nlp back to note and filter on note_date instead.

Example that trips the rule

SELECT note_nlp_id
FROM note_nlp
WHERE nlp_date >= DATE '2023-01-01';

Corrected version

SELECT nnlp.note_nlp_id
FROM note_nlp nnlp
JOIN note n ON nnlp.note_id = n.note_id
WHERE n.note_date >= DATE '2023-01-01';

warning Note NLP Offset is Character Position data_quality.note_nlp_offset_is_character_position data_quality

note_nlp.offset is VARCHAR storing character positions and must not be used in joins or numeric operations without explicit casting.

What this rule detects

note_nlp.offset is a VARCHAR column that stores the character position of the NLP-extracted term within the note text. Despite the numeric content, the column type is string — using it in a numeric comparison or JOIN without an explicit CAST leads to implicit-cast errors on strict dialects and silent mis-sorts on lenient ones. CAST to INT when you need numeric semantics.

Example that trips the rule

SELECT *
FROM note_nlp
WHERE offset > 100;

Corrected version

SELECT *
FROM note_nlp
WHERE CAST(offset AS INT) > 100;

error OMOP Schema Validation data_quality.schema_validation data_quality

Validates that all referenced tables and columns exist in OMOP CDM 5.4 schema. Schema violations indicate queries that will fail at runtime or produce incorrect results. Only validates physical table references - excludes CTEs, subqueries, and derived expressions.

What this rule detects

Every table and column referenced in the query must exist in the OMOP CDM 5.4 specification. This rule catches typos (e.g. 'cohort_result' instead of 'cohort'), tables from other schemas or vocabulary extensions that aren't part of CDM 5.4, and non-existent columns on otherwise-valid tables. It operates only on physical references; CTEs, subquery aliases, and computed expressions are deliberately ignored so compound queries don't raise false positives.

Example that trips the rule

SELECT person_id, cohort_start_date
FROM cohort_result
WHERE cohort_definition_id = 1;

Corrected version

SELECT condition_occurrence_id, person_id, condition_start_date
FROM condition_occurrence
WHERE condition_concept_id = 201820;

warning Source Value Field Usage data_quality.source_value_field_usage data_quality

Warns when *_source_value fields (unstandardized source codes) are used for GROUP BY or analytical aggregation instead of standard concept fields

What this rule detects

*_source_value columns store the raw, unstandardised source strings from the originating system and are meant for audit or provenance. Aggregating or grouping by source_value produces results that aren't comparable across sites (different ETLs map differently) and won't federate in multi-site studies. Aggregate on the paired *_concept_id for portability.

Example that trips the rule

SELECT condition_source_value, COUNT(*) AS n
FROM condition_occurrence
GROUP BY condition_source_value;

Corrected version

SELECT condition_concept_id, COUNT(*) AS n
FROM condition_occurrence
GROUP BY condition_concept_id;

warning Standard Concept NULL Handling data_quality.standard_concept_null_handling data_quality

Ensures correct handling of concept.standard_concept values. Non-standard concepts are represented by NULL, not 'N' or ''. Invalid comparisons may return zero rows.

What this rule detects

In OMOP vocabulary, non-standard concepts are represented by standard_concept IS NULL — not 'N', not an empty string, not 'n'. A filter like `standard_concept = 'N'` returns zero rows because no concept actually carries that literal value. To select non-standard concepts, use `IS NULL`; to select standard or classification, use the explicit literals 'S' or 'C'.

Example that trips the rule

SELECT concept_id
FROM concept
WHERE standard_concept = 'N';

Corrected version

SELECT concept_id
FROM concept
WHERE standard_concept IS NULL;

warning Union Concept ID Domain Indicator data_quality.union_concept_id_domain_indicator data_quality

UNION queries combining concept_id values from multiple domains must include a domain indicator column to avoid ambiguity.

What this rule detects

UNION-ing concept_id columns from multiple domains into a single output column loses the semantic difference — the downstream consumer cannot tell whether a given concept_id came from the Condition or Drug side. Always add an explicit domain-indicator column so each row's origin is preserved.

Example that trips the rule

SELECT condition_concept_id AS concept_id FROM condition_occurrence
UNION
SELECT drug_concept_id AS concept_id FROM drug_exposure;

Corrected version

SELECT 'Condition' AS domain, condition_concept_id AS concept_id FROM condition_occurrence
UNION ALL
SELECT 'Drug'      AS domain, drug_concept_id      AS concept_id FROM drug_exposure;

warning UNION vs UNION ALL for Clinical Events data_quality.union_vs_union_all_clinical_events data_quality

Detects UNION (without ALL) when combining clinical event data. UNION removes duplicates, but identical-looking rows may represent distinct clinical events. Use UNION ALL to preserve all events.

What this rule detects

UNION de-duplicates rows; UNION ALL preserves them. Two distinct clinical events can share identical values on the selected columns (same person, same date, different visit) — de-duplicating them via UNION silently drops legitimate events and under-counts the cohort. Use UNION ALL for clinical-event combinations; use UNION only when you genuinely want to collapse duplicates and can justify why.

Example that trips the rule

SELECT person_id, condition_start_date AS event_date
FROM condition_occurrence
UNION
SELECT person_id, drug_exposure_start_date
FROM drug_exposure;

Corrected version

SELECT person_id, condition_start_date AS event_date
FROM condition_occurrence
UNION ALL
SELECT person_id, drug_exposure_start_date
FROM drug_exposure;

warning Unmapped Concept Handling data_quality.unmapped_concept_handling data_quality

Warns when filtering clinical tables by specific *_concept_id values without explicitly handling concept_id = 0 (unmapped records)

A longer write-up and example for this rule are still being authored.

error Vocabulary Table Protection data_quality.vocabulary_table_protection data_quality

Prevents DELETE, UPDATE, INSERT, TRUNCATE, MERGE, or DROP TABLE operations on OMOP vocabulary tables. These are reference datasets managed by OHDSI and must remain read-only in analytical workflows.

What this rule detects

OMOP vocabulary tables (concept, concept_ancestor, concept_relationship, vocabulary, domain, concept_class) are reference data distributed and versioned by OHDSI. Any local modification breaks reproducibility, invalidates cross-site comparisons, and typically conflicts with the next vocabulary release. Treat them as strictly read-only; vocabulary updates must be performed via official OHDSI releases.

Example that trips the rule

DELETE FROM concept
WHERE concept_id = 0;

Corrected version

SELECT concept_id
FROM concept
WHERE concept_id = 0;

error CDM v5.3 to v5.4 Column Renames domain_specific.cdm_v53_to_v54_column_renames domain_specific

Detects usage of deprecated OMOP CDM v5.3 column names that were renamed in v5.4.

Example that trips the rule

SELECT visit_occurrence_id, admitting_source_concept_id
FROM visit_occurrence;

Corrected version

SELECT visit_occurrence_id, admitted_from_concept_id
FROM visit_occurrence;

error Cohort Definition Syntax Not Executable SQL domain_specific.cohort_definition_syntax_not_executable_sql domain_specific

cohort_definition_syntax stores cohort definition metadata (JSON/OHDSI format), not executable SQL. Do not filter it using SQL keywords or OMOP table names.

Example that trips the rule

SELECT cohort_definition_id FROM cohort_definition
WHERE cohort_definition_syntax LIKE '%SELECT%';

Corrected version

SELECT cohort_definition_id, cohort_definition_syntax
FROM cohort_definition;

warning Condition Occurrence Cardinality Risk domain_specific.condition_occurrence_cardinality_validation domain_specific

Joining person to condition_occurrence without aggregation can produce multiple rows per person. This may lead to incorrect counts if a person has multiple condition records.

Example that trips the rule

SELECT p.person_id, co.condition_concept_id
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id;

Corrected version

SELECT p.person_id, COUNT(DISTINCT co.condition_concept_id) AS n_conditions
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id
GROUP BY p.person_id;

error Condition Occurrence Visit Hierarchy Validation domain_specific.condition_visit_hierarchy_validation domain_specific

When condition_occurrence joins to visit_detail, any reference to visit_occurrence columns requires a proper join to visit_occurrence through visit_detail using visit_occurrence_id.

Example that trips the rule

SELECT co.condition_concept_id, vo.visit_start_date
FROM condition_occurrence co
JOIN visit_detail vd ON co.visit_detail_id = vd.visit_detail_id,
     visit_occurrence vo;

Corrected version

SELECT co.condition_concept_id, vo.visit_start_date
FROM condition_occurrence co
JOIN visit_detail vd ON co.visit_detail_id = vd.visit_detail_id
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id;

warning Cost Currency Concept ID For Multi-Currency domain_specific.cost_currency_concept_id domain_specific

Aggregating cost amount columns (total_paid, total_charge, etc.) without filtering or grouping by currency_concept_id mixes records from different currencies, producing incorrect financial totals.

What this rule detects

Records in the cost table carry a currency_concept_id indicating which currency their amounts are denominated in. Summing total_paid or total_charge across mixed currencies, without filtering or grouping by currency_concept_id, produces a meaningless total (literally GBP + USD + EUR). Either restrict the query to a single currency, or aggregate per currency so downstream code can convert before rolling the figures up further.

Example that trips the rule

SELECT SUM(total_paid) AS paid_total
FROM cost;

Corrected version

SELECT SUM(total_paid) AS paid_total
FROM cost
WHERE currency_concept_id = 44818668;  -- US Dollar

error Cost Event ID Polymorphic Resolution domain_specific.cost_event_id_polymorphic_resolution domain_specific

cost.cost_event_id is a polymorphic FK whose target table is identified by cost.cost_domain_id. Queries that join or filter on cost_event_id without a cost_domain_id restriction mix IDs from disjoint sequences.

What this rule detects

cost.cost_event_id has no single foreign-key target; it points into a different table depending on cost.cost_domain_id. With cost_domain_id = 'Drug' it points into drug_exposure; with 'Visit' it points into visit_occurrence; and so on. Joining cost_event_id without filtering cost_domain_id either matches nothing (the value lives in a different sequence than the joined table's PK) or matches by coincidence (two unrelated tables happen to use overlapping integer ranges). Either outcome is a silent bug. The mirror rule for location_history.entity_id requires the same discipline.

Example that trips the rule

SELECT c.cost_id, de.drug_concept_id
FROM cost c
JOIN drug_exposure de ON c.cost_event_id = de.drug_exposure_id;

Corrected version

SELECT c.cost_id, de.drug_concept_id
FROM cost c
JOIN drug_exposure de ON c.cost_event_id = de.drug_exposure_id
WHERE c.cost_domain_id = 'Drug';

warning Cost Drug-Specific Columns Require Domain Filter domain_specific.cost_paid_ingredient_cost_drug_specific domain_specific

cost.paid_ingredient_cost and cost.paid_dispensing_fee are pharmacy-specific columns that are NULL or meaningless for non-drug costs. Filter by cost_domain_id = 'Drug' when using these columns.

Example that trips the rule

SELECT cost_id, paid_ingredient_cost FROM cost;

Corrected version

SELECT cost_id, paid_ingredient_cost FROM cost
WHERE cost_domain_id = 'Drug';

error Cost Payer Plan Period ID Join domain_specific.cost_payer_plan_period_id_join domain_specific

cost.payer_plan_period_id is a FK to payer_plan_period.payer_plan_period_id. Joins must use this column pair.

Example that trips the rule

SELECT c.cost_id FROM cost c
JOIN payer_plan_period p ON c.payer_plan_period_id = p.person_id;

Corrected version

SELECT c.cost_id FROM cost c
JOIN payer_plan_period p ON c.payer_plan_period_id = p.payer_plan_period_id;

error Death Cause Source Concept Not For Analytical Filtering domain_specific.death_cause_source_concept_validation domain_specific

Avoid using death.cause_source_concept_id for analytical filtering. Use death.cause_concept_id instead.

Example that trips the rule

SELECT person_id FROM death
WHERE cause_source_concept_id = 4316491;

Corrected version

SELECT person_id FROM death
WHERE cause_concept_id = 4316491;

error Death Join to Person Not to Clinical Event domain_specific.death_join_to_person_not_to_clinical_event domain_specific

The death table joins to person on person_id and has no foreign keys to clinical event tables. Joins between death and clinical tables must include person_id.

Example that trips the rule

SELECT d.person_id FROM death d
JOIN condition_occurrence co ON d.death_date = co.condition_start_date;

Corrected version

SELECT d.person_id FROM death d
JOIN condition_occurrence co ON d.person_id = co.person_id;

warning Dose Era Cross-Unit Comparison domain_specific.dose_era_cross_unit_comparison domain_specific

Aggregating dose_era.dose_value without constraining unit_concept_id mixes incompatible drug dose units (mg, mcg, IU, mL, …) and produces meaningless averages.

What this rule detects

``dose_era.dose_value`` is stored alongside ``unit_concept_id``; the same drug ingredient may have rows in mg, mcg, IU, mL, mEq, etc. across sites and patients. Computing AVG / SUM / MIN / MAX without filtering or grouping by unit_concept_id mixes these into a single number that has no clinical meaning. The same discipline applies as for ``measurement.value_as_number``: pick one unit, or report per-unit aggregates.

Example that trips the rule

SELECT AVG(dose_value) AS avg_dose
FROM dose_era
WHERE drug_concept_id = 1124300;

Corrected version

SELECT AVG(dose_value) AS avg_dose_mg
FROM dose_era
WHERE drug_concept_id = 1124300
  AND unit_concept_id = 8576;

warning Drug Days Supply Validation domain_specific.drug_days_supply_validation domain_specific

Validates that drug_exposure.days_supply is in a plausible range (1 to 365 days). Values outside this range indicate data quality issues or query logic errors.

Example that trips the rule

SELECT person_id FROM drug_exposure
WHERE days_supply = 400;

Corrected version

SELECT person_id FROM drug_exposure
WHERE days_supply BETWEEN 1 AND 365;

error Drug Era Concept Class Validation domain_specific.drug_era_concept_class_validation domain_specific

Ensures drug_era is filtered only on Ingredient-level concepts.

Example that trips the rule

SELECT de.person_id FROM drug_era de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.concept_class_id = 'Branded Drug';

Corrected version

SELECT de.person_id FROM drug_era de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.concept_class_id = 'Ingredient';

warning Drug Exposure Cardinality Awareness domain_specific.drug_exposure_cardinality_validation domain_specific

Counting drug_exposure rows may overcount patients due to multiple exposures (e.g., refills, restarts).

Example that trips the rule

SELECT drug_concept_id, COUNT(*) AS patient_count
FROM drug_exposure
GROUP BY drug_concept_id;

Corrected version

SELECT drug_concept_id, COUNT(DISTINCT person_id) AS patient_count
FROM drug_exposure
GROUP BY drug_concept_id;

warning Drug Exposure Quantity Misuse domain_specific.drug_exposure_quantity_misuse domain_specific

Detects use of drug_exposure.quantity as duration in date logic.

Example that trips the rule

SELECT person_id,
       drug_exposure_start_date + quantity AS end_date
FROM drug_exposure;

Corrected version

SELECT person_id,
       drug_exposure_start_date + days_supply AS end_date
FROM drug_exposure;

warning Drug Exposure Sig Parsing domain_specific.drug_exposure_sig_parsing domain_specific

Detects string parsing of drug_exposure.sig to extract structured dose information. The sig field is free-text and not standardized.

Example that trips the rule

SELECT CAST(SUBSTRING(sig, 1, 3) AS INT) AS dose
FROM drug_exposure;

Corrected version

SELECT de.person_id, ds.amount_value, ds.amount_unit_concept_id
FROM drug_exposure de
JOIN drug_strength ds ON de.drug_concept_id = ds.drug_concept_id
WHERE ds.invalid_reason IS NULL;

warning Drug Quantity Validation domain_specific.drug_quantity_validation domain_specific

Ensures drug_exposure.quantity is non-negative. Negative values indicate data quality or query logic issues.

Example that trips the rule

SELECT person_id FROM drug_exposure
WHERE quantity < 0;

Corrected version

SELECT person_id FROM drug_exposure
WHERE quantity >= 0;

warning Drug Strength Completeness (Amount vs Concentration) domain_specific.drug_strength_numerator_denominator_for_concentration domain_specific

drug_strength stores solid formulations in amount_value and liquid/injectable formulations in numerator_value/denominator_value. Queries using only amount_value exclude concentration-based drugs.

Example that trips the rule

SELECT drug_concept_id, amount_value FROM drug_strength
WHERE amount_value > 500;

Corrected version

SELECT drug_concept_id,
       COALESCE(amount_value, numerator_value / denominator_value) AS dose
FROM drug_strength;

warning Drug Strength Validity Filter domain_specific.drug_strength_validity_filter domain_specific

drug_strength is time-versioned. Queries must filter for currently valid records.

Example that trips the rule

SELECT drug_concept_id, amount_value FROM drug_strength;

Corrected version

SELECT drug_concept_id, amount_value FROM drug_strength
WHERE invalid_reason IS NULL
  AND CURRENT_DATE BETWEEN valid_start_date AND valid_end_date;

error Episode Event No Person ID domain_specific.episode_event_no_person_id domain_specific

episode_event table has no person_id column. To access person data, join episode_event to episode, then episode to person.

Example that trips the rule

SELECT person_id FROM episode_event;

Corrected version

SELECT e.person_id FROM episode_event ee
JOIN episode e ON ee.episode_id = e.episode_id;

error Episode Parent ID Self Join domain_specific.episode_parent_id_self_join domain_specific

episode.episode_parent_id is a self-referential FK to episode.episode_id. It must only join to episode.episode_id.

Example that trips the rule

SELECT e.episode_id FROM episode e
JOIN episode parent ON e.episode_parent_id = parent.person_id;

Corrected version

SELECT e.episode_id FROM episode e
JOIN episode parent ON e.episode_parent_id = parent.episode_id;

warning Event Cardinality Risk domain_specific.event_cardinality_validation domain_specific

Joining person to observation, or visit_occurrence to visit_detail, without aggregation produces multiple rows per parent record. Silent fan-out distorts downstream counts.

What this rule detects

OMOP allows multiple ``observation`` rows per person (a single visit can produce dozens of observations) and multiple ``visit_detail`` rows per ``visit_occurrence`` (different units, transfers, sub-encounters). Plain joins from the parent table to either of these without aggregation produce row-level fan-out: a downstream ``COUNT(*) AS patients`` returns observation rows, not patients; ``COUNT(*) AS visits`` returns visit-detail rows, not visits. Use GROUP BY on the parent key, ``DISTINCT``, or explicit aggregation. Sibling rules already enforce this pattern for ``condition_occurrence``, ``drug_exposure``, and ``measurement``.

Example that trips the rule

SELECT p.person_id, o.observation_concept_id
FROM person p
JOIN observation o ON p.person_id = o.person_id;

Corrected version

SELECT p.person_id, COUNT(DISTINCT o.observation_concept_id) AS n_obs
FROM person p
JOIN observation o ON p.person_id = o.person_id
GROUP BY p.person_id;

error Event Date Column Correctness domain_specific.event_date_column_correctness domain_specific

Certain OMOP clinical event tables (procedure_occurrence, measurement, observation, specimen, note) use simplified date column names without '_start' suffix. Referencing non-existent *_start_date columns will cause errors.

Example that trips the rule

SELECT person_id, procedure_start_date
FROM procedure_occurrence;

Corrected version

SELECT person_id, procedure_date
FROM procedure_occurrence;

error Event-Field Polymorphic Resolution domain_specific.event_field_polymorphic_resolution domain_specific

OMOP v5.4 polymorphic FKs (note.note_event_id, observation.observation_event_id, measurement.measurement_event_id, episode_event.event_id) require their sibling *_event_field_concept_id to be filtered. Without the filter the join mixes IDs from disjoint sequences.

What this rule detects

Each ``*_event_id`` column in OMOP v5.4 is a polymorphic foreign key — an INTEGER that points at a row in a different clinical table depending on the value of the sibling ``*_event_field_concept_id`` column. With no filter on the field-concept side, the analyst joins integer IDs from disjoint sequences: a ``measurement_event_id`` of 1234 might happen to equal a ``visit_occurrence_id`` of 1234 in the joined table, but the row doesn't refer to that visit at all. Filter the field-concept column first to identify which target table the event_id addresses, then join. The same discipline is enforced for ``cost.cost_event_id`` and ``location_history.entity_id``.

Example that trips the rule

SELECT m.measurement_id, vo.visit_concept_id
FROM measurement m
JOIN visit_occurrence vo ON m.measurement_event_id = vo.visit_occurrence_id;

Corrected version

SELECT m.measurement_id, vo.visit_concept_id
FROM measurement m
JOIN visit_occurrence vo ON m.measurement_event_id = vo.visit_occurrence_id
WHERE m.meas_event_field_concept_id IS NOT NULL;

error Location History Entity ID Requires Domain ID domain_specific.location_history_entity_id_requires_domain_id domain_specific

location_history.entity_id is a polymorphic FK identified by domain_id. Joins must filter domain_id to match the target table.

Example that trips the rule

SELECT lh.location_history_id FROM location_history lh
JOIN person p ON lh.entity_id = p.person_id;

Corrected version

SELECT lh.location_history_id FROM location_history lh
JOIN person p ON lh.entity_id = p.person_id
WHERE lh.domain_id = 'Person';

warning Measurement Cross-Unit Comparison domain_specific.measurement_cross_unit_comparison domain_specific

Aggregating measurement.value_as_number without constraining unit_concept_id mixes incompatible units and produces meaningless results.

Example that trips the rule

SELECT AVG(m.value_as_number) FROM measurement m
WHERE m.measurement_concept_id = 3004249;

Corrected version

SELECT AVG(m.value_as_number) FROM measurement m
WHERE m.measurement_concept_id = 3004249
  AND m.unit_concept_id = 8876;

warning Measurement Duplicate Detection domain_specific.measurement_duplicate_detection domain_specific

Detects aggregation on measurement data without handling duplicates. Duplicate measurement records can exist and affect results.

Example that trips the rule

SELECT AVG(value_as_number) FROM measurement;

Corrected version

SELECT person_id, measurement_date, measurement_concept_id,
       AVG(value_as_number) AS avg_value
FROM measurement
GROUP BY person_id, measurement_date, measurement_concept_id;

error Measurement Operator Concept Validation domain_specific.measurement_operator_concept_validation domain_specific

Ensures measurement.operator_concept_id uses only valid operator concepts.

Example that trips the rule

SELECT person_id FROM measurement
WHERE operator_concept_id = 99999;

Corrected version

SELECT person_id FROM measurement
WHERE operator_concept_id IN (4172703, 4172704, 4171756, 4171754, 4171755);

error Measurement Range Low/High Validation domain_specific.measurement_range_low_high_validation domain_specific

Detects logically impossible constraints where range_low > range_high.

Example that trips the rule

SELECT person_id FROM measurement
WHERE range_low > range_high;

Corrected version

SELECT person_id FROM measurement
WHERE range_low <= range_high;

warning Measurement Unit Validation domain_specific.measurement_unit_validation domain_specific

Detects queries that filter measurement.value_as_number against a numeric threshold without also constraining unit_concept_id. The same measurement concept can be stored in different units across sites (e.g. glucose in mmol/L vs mg/dL). A numeric threshold applied without a unit filter silently mixes patients measured in different unit conventions.

Example that trips the rule

SELECT m.person_id FROM measurement m
WHERE m.measurement_concept_id = 3004249
  AND m.value_as_number > 140;

Corrected version

SELECT m.person_id FROM measurement m
WHERE m.measurement_concept_id = 3004249
  AND m.value_as_number > 140
  AND m.unit_concept_id = 8876;

error Measurement Value Representation Consistency domain_specific.measurement_value_as_number_and_concept_validation domain_specific

Detects when value_as_number and value_as_concept_id are both filtered with AND, which may indicate inconsistent use of quantitative and qualitative representations.

Example that trips the rule

SELECT person_id FROM measurement
WHERE value_as_number > 140 AND value_as_concept_id = 4126681;

Corrected version

SELECT person_id FROM measurement
WHERE value_as_number > 140;

warning Note NLP Snippet Misuse domain_specific.note_nlp_snippet_misuse domain_specific

note_nlp.snippet and lexical_variant are free text for context, not structured data. Use note_nlp_concept_id instead.

Example that trips the rule

SELECT note_nlp_id FROM note_nlp
WHERE snippet LIKE '%diabetes%';

Corrected version

SELECT note_nlp_id FROM note_nlp
WHERE note_nlp_concept_id = 201820;

warning Observation Value As Columns Mutually Contextual domain_specific.observation_value_as_columns_mutually_contextual domain_specific

value_as_number, value_as_string, and value_as_concept_id represent the same value in different formats. Only one is typically populated per row.

Example that trips the rule

SELECT person_id FROM observation
WHERE value_as_number > 5 AND value_as_string = 'positive';

Corrected version

SELECT person_id FROM observation
WHERE value_as_number > 5;

error Observation Value As Concept Confusion domain_specific.observation_value_as_concept_confusion domain_specific

Detects when the same concept_id is used for both observation_concept_id and value_as_concept_id within the same logical condition.

Example that trips the rule

SELECT person_id FROM observation
WHERE observation_concept_id = 4083587
  AND value_as_concept_id = 4083587;

Corrected version

SELECT person_id FROM observation
WHERE observation_concept_id = 4083587
  AND value_as_concept_id = 4188539;

error Observation Value As String Numeric Comparison domain_specific.observation_value_as_string_numeric_comparison domain_specific

Detects numeric comparisons on observation.value_as_string (VARCHAR). This can lead to incorrect results.

Example that trips the rule

SELECT person_id FROM observation
WHERE value_as_string > 100;

Corrected version

SELECT person_id FROM observation
WHERE value_as_number > 100;

error Person Birth Field Validation domain_specific.person_birth_field_validation domain_specific

Ensures person birth fields (year_of_birth, month_of_birth, day_of_birth) use plausible values within accepted ranges.

Example that trips the rule

SELECT person_id FROM person
WHERE year_of_birth = 1800;

Corrected version

SELECT person_id FROM person
WHERE year_of_birth BETWEEN 1900 AND 2024;

warning Procedure Occurrence Quantity Semantics domain_specific.procedure_occurrence_quantity_semantics domain_specific

Ensures procedure_occurrence.quantity is not confused with record counts. quantity represents units per procedure event, not the number of procedure records.

Example that trips the rule

SELECT procedure_concept_id, SUM(quantity) AS procedure_count
FROM procedure_occurrence
GROUP BY procedure_concept_id;

Corrected version

SELECT procedure_concept_id, COUNT(*) AS procedure_count
FROM procedure_occurrence
GROUP BY procedure_concept_id;

error Specimen Source ID Not Specimen ID domain_specific.specimen_source_id_not_specimen_id domain_specific

specimen_source_id is a free-text identifier from the source system, not an OMOP foreign key. Use specimen.specimen_id for joins.

Example that trips the rule

SELECT s.specimen_id FROM specimen s
JOIN measurement m ON s.specimen_source_id = m.measurement_source_value;

Corrected version

SELECT s.specimen_id FROM specimen s
JOIN measurement m ON s.specimen_id = m.specimen_id;

warning Visit Detail Admitted/Discharged Domain Validation domain_specific.visit_detail_admitted_discharged_domain domain_specific

visit_detail.admitted_from_concept_id and discharged_to_concept_id must reference concepts from Visit or Place of Service domains. Hardcoded concept IDs should always be validated using concept.domain_id.

Example that trips the rule

SELECT visit_detail_id FROM visit_detail
WHERE admitted_from_concept_id = 8870;

Corrected version

SELECT vd.visit_detail_id FROM visit_detail vd
JOIN concept c ON vd.admitted_from_concept_id = c.concept_id
WHERE c.domain_id IN ('Visit', 'Place of Service');

warning Visit Detail Dates Within Parent Visit domain_specific.visit_detail_dates_within_parent_visit domain_specific

Detects conditions where visit_detail dates fall outside the parent visit_occurrence range.

Example that trips the rule

SELECT vd.visit_detail_id FROM visit_detail vd
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id
WHERE vd.visit_detail_start_date < vo.visit_start_date;

Corrected version

SELECT vd.visit_detail_id FROM visit_detail vd
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id
WHERE vd.visit_detail_start_date >= vo.visit_start_date
  AND vd.visit_detail_end_date <= vo.visit_end_date;

error Visit Detail Has No Preceding Visit Occurrence ID domain_specific.visit_detail_has_no_preceding_visit_occurrence_id domain_specific

visit_detail table has no preceding_visit_occurrence_id column. The temporal chain in visit_detail uses preceding_visit_detail_id. Use preceding_visit_occurrence_id only with visit_occurrence table.

Example that trips the rule

SELECT visit_detail_id, preceding_visit_occurrence_id
FROM visit_detail;

Corrected version

SELECT visit_detail_id, preceding_visit_detail_id
FROM visit_detail;

error Visit Detail Visit Occurrence Reference domain_specific.visit_detail_visit_occurrence_reference domain_specific

Detects potential issues when using visit_detail without proper visit_occurrence context or join.

Example that trips the rule

SELECT visit_detail_id, visit_detail_concept_id
FROM visit_detail;

Corrected version

SELECT vd.visit_detail_id, vd.visit_detail_concept_id, vo.visit_concept_id
FROM visit_detail vd
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id;

warning Visit Event Temporal Validation domain_specific.visit_event_temporal_validation domain_specific

Detects when clinical events are filtered to occur before visit_start_date, which may indicate a join mismatch or temporal inconsistency.

Example that trips the rule

SELECT co.person_id FROM condition_occurrence co
JOIN visit_occurrence vo ON co.visit_occurrence_id = vo.visit_occurrence_id
WHERE co.condition_start_date < vo.visit_start_date;

Corrected version

SELECT co.person_id FROM condition_occurrence co
JOIN visit_occurrence vo ON co.visit_occurrence_id = vo.visit_occurrence_id
WHERE co.condition_start_date >= vo.visit_start_date;

warning Visit Length-of-Stay Arithmetic domain_specific.visit_length_of_stay_arithmetic domain_specific

Computing visit length-of-stay (visit_end_date - visit_start_date or DATEDIFF) without restricting to inpatient-like visit_concept_ids mixes outpatient same-day visits (LOS = 0) into the calculation.

What this rule detects

Length-of-stay arithmetic (`visit_end_date - visit_start_date` or `DATEDIFF(day, visit_start_date, visit_end_date)`) is meaningful only for inpatient-like visit types where the end-date can exceed the start-date. Outpatient visits (`visit_concept_id = 9202`) by spec have `visit_end_date = visit_start_date`, so a mixed query averages in a sea of zeros and understates inpatient LOS. Restrict to inpatient concept ids (9201, 9203, 262, 32037, …) before computing. Also note that `visit_end_date` is nullable for ongoing stays — consider `COALESCE(visit_end_date, CURRENT_DATE)` if the query is computing on live data.

Example that trips the rule

SELECT AVG(visit_end_date - visit_start_date) AS avg_los
FROM visit_occurrence;

Corrected version

SELECT AVG(visit_end_date - visit_start_date) AS avg_los
FROM visit_occurrence
WHERE visit_concept_id = 9201
  AND visit_end_date IS NOT NULL;

error Visit Occurrence Type Concept Domain Validation domain_specific.visit_occurrence_type_domain domain_specific

visit_type_concept_id must reference a concept belonging to the Type Concept domain. Queries joining visit_type_concept_id to concept must filter by domain_id = 'Type Concept'.

Example that trips the rule

SELECT vo.visit_occurrence_id FROM visit_occurrence vo
JOIN concept c ON vo.visit_type_concept_id = c.concept_id;

Corrected version

SELECT vo.visit_occurrence_id FROM visit_occurrence vo
JOIN concept c ON vo.visit_type_concept_id = c.concept_id
WHERE c.domain_id = 'Type Concept';

warning Visit Outpatient Same-Day Validation domain_specific.visit_outpatient_same_day_validation domain_specific

Detects queries that filter outpatient visits (visit_concept_id = 9202) with multi-day date range logic, which may indicate confusion with inpatient logic.

Example that trips the rule

SELECT person_id FROM visit_occurrence
WHERE visit_concept_id = 9202
  AND DATEDIFF(day, visit_start_date, visit_end_date) > 30;

Corrected version

SELECT person_id FROM visit_occurrence
WHERE visit_concept_id = 9202
  AND DATEDIFF(day, visit_start_date, visit_end_date) <= 1;

error Relationship Boolean Comparison domain_specific.vocabulary_relationship_boolean_comparison domain_specific

relationship.is_hierarchical and defines_ancestry must be compared with valid boolean values (0,1,TRUE,FALSE), not strings.

Example that trips the rule

SELECT relationship_id FROM relationship
WHERE is_hierarchical = 'yes';

Corrected version

SELECT relationship_id FROM relationship
WHERE is_hierarchical = 1;

warning Person Year-of-Birth Age Arithmetic domain_specific.year_of_birth_age_arithmetic domain_specific

Computing age as `<year_expression> - person.year_of_birth` rounds the result by up to a year and silently ignores birth_datetime / month_of_birth / day_of_birth. Prefer full-date arithmetic when the data carries it.

What this rule detects

Many OMOP cohort definitions compute age at index by subtracting `person.year_of_birth` from the event year: `EXTRACT(YEAR FROM co.condition_start_date) - p.year_of_birth`. This drops the month and day entirely, so a person born 1959-12-31 with an event on 2024-01-01 evaluates as 65 even though they are barely 64. For age cutoffs (`>= 65`) this introduces systematic off-by-one errors. The OMOP person table also carries `birth_datetime` (and `month_of_birth` / `day_of_birth`) for sites that captured them; prefer full-date arithmetic when available.

Example that trips the rule

SELECT co.person_id
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id
WHERE EXTRACT(YEAR FROM co.condition_start_date) - p.year_of_birth >= 65;

Corrected version

SELECT co.person_id
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id
WHERE FLOOR((co.condition_start_date - p.birth_datetime) / 365.25) >= 65;

error Care Site ID Join Validation joins.care_site_id_join_validation joins

Tables containing care_site_id must join to care_site using care_site_id. Joining on location_id, provider_id, or other columns is incorrect.

Example that trips the rule

SELECT * FROM person p JOIN care_site cs ON p.location_id = cs.care_site_id;

Corrected version

SELECT * FROM person p JOIN care_site cs ON p.care_site_id = cs.care_site_id;

warning Care Site Join Path Validation joins.care_site_join_validation joins

Ensures clinical tables join to location via care_site.

A longer write-up and example for this rule are still being authored.

error Care Site to Location Join Validation joins.care_site_location_join_validation joins

care_site must join to location via location_id. Joining on other columns is incorrect.

Example that trips the rule

SELECT * FROM care_site cs JOIN location l ON cs.care_site_id = l.location_id;

Corrected version

SELECT * FROM care_site cs JOIN location l ON cs.location_id = l.location_id;

error Clinical Tables Require Person ID Linkage joins.clinical_person_id_linkage_validation joins

Clinical tables must be connected via person_id to ensure patient-level correctness. CTEs are treated as validated units and checked separately.

Example that trips the rule

SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co
JOIN drug_exposure de ON co.visit_occurrence_id = de.visit_occurrence_id;

Corrected version

SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co
JOIN drug_exposure de ON co.person_id = de.person_id;

error Clinical Primary Key Join Validation joins.clinical_pk_cross_join_validation joins

Ensures clinical event primary keys are not incorrectly used in joins. Each clinical event table has an independent primary key sequence with no semantic relationship to other tables.

Example that trips the rule

SELECT * FROM condition_occurrence co
JOIN drug_exposure de ON co.condition_occurrence_id = de.drug_exposure_id;

Corrected version

SELECT * FROM condition_occurrence co
JOIN drug_exposure de ON co.person_id = de.person_id;

error Clinical to Visit Detail Join Validation joins.clinical_visit_detail_join_validation joins

When joining clinical tables to visit_detail, visit_detail_id should be used. Joining visit_occurrence_id to visit_detail_id (or vice versa) is an ID type mismatch.

Example that trips the rule

SELECT * FROM condition_occurrence co
JOIN visit_detail vd ON co.visit_occurrence_id = vd.visit_detail_id;

Corrected version

SELECT * FROM condition_occurrence co
JOIN visit_detail vd ON co.visit_detail_id = vd.visit_detail_id;

error Cohort to Clinical Table Join Validation joins.cohort_clinical_join_validation joins

Ensures cohort joins to clinical tables using subject_id = person_id, either directly or via person table.

Example that trips the rule

SELECT * FROM cohort c
JOIN condition_occurrence co ON c.subject_id = co.visit_occurrence_id;

Corrected version

SELECT * FROM cohort c
JOIN condition_occurrence co ON c.subject_id = co.person_id;

error Concept Alias Reuse Validation joins.concept_alias_reuse_validation joins

Ensures each concept join uses a distinct alias when joining multiple concept_id columns (primary, source, type). Prevents ambiguous joins, semantic confusion, and incorrect results.

Example that trips the rule

SELECT * FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
JOIN concept c ON co.condition_source_concept_id = c.concept_id;

Corrected version

SELECT * FROM condition_occurrence co
JOIN concept c1 ON co.condition_concept_id = c1.concept_id
JOIN concept c2 ON co.condition_source_concept_id = c2.concept_id;

error Concept Ancestor Name Resolution Validation joins.concept_ancestor_name_resolution joins

Ensures concept_ancestor joins align with semantic intent (ancestor vs descendant) inferred from column aliases.

A longer write-up and example for this rule are still being authored.

error Concept to Concept Class Join Validation joins.concept_concept_class_join_validation joins

If concept is joined to concept_class, the relationship should use concept.concept_class_id = concept_class.concept_class_id.

Example that trips the rule

SELECT c.concept_id FROM concept c
JOIN concept_class cc ON c.vocabulary_id = cc.concept_class_id;

Corrected version

SELECT c.concept_id FROM concept c
JOIN concept_class cc ON c.concept_class_id = cc.concept_class_id;

error Concept to Domain Join Validation joins.concept_domain_join_validation joins

If concept is joined to domain, the relationship should use concept.domain_id = domain.domain_id.

Example that trips the rule

SELECT c.concept_id FROM concept c
JOIN domain d ON c.vocabulary_id = d.domain_id;

Corrected version

SELECT c.concept_id FROM concept c
JOIN domain d ON c.domain_id = d.domain_id;

error Concept Join Validation joins.concept_join_validation joins

Ensures correct joins to the OMOP concept table. Columns ending with '_concept_id' must join to concept.concept_id. Vocabulary-based joins (e.g., concept_code) should include vocabulary_id.

Example that trips the rule

SELECT co.person_id FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_name;

Corrected version

SELECT co.person_id FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id;

error Concept Relationship to Concept Join Validation joins.concept_relationship_concept_join_validation joins

Ensures concept_relationship joins to concept align with semantic intent (source vs target) inferred from aliases.

Example that trips the rule

SELECT c_source.concept_name
FROM concept_relationship cr
JOIN concept c_source ON cr.concept_id_2 = c_source.concept_id;

Corrected version

SELECT c_source.concept_name
FROM concept_relationship cr
JOIN concept c_source ON cr.concept_id_1 = c_source.concept_id;

error Concept Relationship to Relationship Join Validation joins.concept_relationship_relationship_join_validation joins

When concept_relationship is joined with relationship, it must use relationship_id on both sides.

Example that trips the rule

SELECT * FROM concept_relationship cr
JOIN relationship r ON cr.concept_id_1 = r.relationship_id;

Corrected version

SELECT * FROM concept_relationship cr
JOIN relationship r ON cr.relationship_id = r.relationship_id;

warning Concept Relationship Requires Relationship ID Filter joins.concept_relationship_requires_relationship_id joins

Queries using concept_relationship should typically filter on relationship_id to avoid cross-product joins. Exploratory/analytical queries may intentionally omit this filter to analyze all relationships.

What this rule detects

concept_relationship is a many-to-many table: a single source concept typically has dozens of relationships ('Maps to', 'Is a', 'Subsumes', 'Has brand name', …). Joining to concept_relationship without filtering on relationship_id pulls them all, blowing up the row count and mixing semantically different relationships into one result set. For cohort definitions, always restrict to a single relationship type (typically relationship_id = 'Maps to'); exploratory queries should at least GROUP BY it so the direction of each match is visible in the output.

Example that trips the rule

SELECT c1.concept_name, c2.concept_name
FROM concept c1
JOIN concept_relationship cr
  ON c1.concept_id = cr.concept_id_1
JOIN concept c2
  ON cr.concept_id_2 = c2.concept_id
WHERE c1.concept_id = 201820;

Corrected version

SELECT c1.concept_name, c2.concept_name
FROM concept c1
JOIN concept_relationship cr
  ON c1.concept_id = cr.concept_id_1
  AND cr.relationship_id = 'Maps to'
JOIN concept c2
  ON cr.concept_id_2 = c2.concept_id
WHERE c1.concept_id = 201820;

error Concept Synonym Join Validation joins.concept_synonym_join_validation joins

Ensures concept_synonym joins to concept via concept_id. Joining on names or other columns is unreliable because names are not unique.

Example that trips the rule

SELECT cs.concept_synonym_name
FROM concept c
JOIN concept_synonym cs ON c.concept_name = cs.concept_synonym_name;

Corrected version

SELECT cs.concept_synonym_name
FROM concept c
JOIN concept_synonym cs ON c.concept_id = cs.concept_id;

error Concept to Vocabulary Join Validation joins.concept_vocabulary_join_validation joins

If concept is joined to vocabulary, the join should use vocabulary_id. Other joins may be incorrect or ambiguous.

Example that trips the rule

SELECT c.concept_id FROM concept c
JOIN vocabulary v ON c.domain_id = v.vocabulary_id;

Corrected version

SELECT c.concept_id FROM concept c
JOIN vocabulary v ON c.vocabulary_id = v.vocabulary_id;

warning Cost Table Domain Validation joins.cost_table_domain_validation joins

Ensures cost joins use correct cost_domain_id to disambiguate polymorphic keys.

Example that trips the rule

SELECT * FROM cost c
JOIN drug_exposure de ON c.cost_event_id = de.drug_exposure_id;

Corrected version

SELECT * FROM cost c
JOIN drug_exposure de ON c.cost_event_id = de.drug_exposure_id
WHERE c.cost_domain_id = 'Drug';

error Death to Visit Occurrence Join Validation joins.death_visit_occurrence_join_validation joins

Ensures death joins to visit_occurrence using person_id. Flags missing or invalid joins.

Example that trips the rule

SELECT * FROM death d
JOIN visit_occurrence vo ON d.death_date = vo.visit_start_date;

Corrected version

SELECT * FROM death d
JOIN visit_occurrence vo ON d.person_id = vo.person_id;

error Drug Exposure to Drug Strength Join Validation joins.drug_exposure_drug_strength_join_validation joins

Ensures drug_exposure joins to drug_strength using drug_concept_id. Flags missing or non-standard joins.

Example that trips the rule

SELECT * FROM drug_exposure de
JOIN drug_strength ds ON de.drug_source_concept_id = ds.drug_concept_id;

Corrected version

SELECT * FROM drug_exposure de
JOIN drug_strength ds ON de.drug_concept_id = ds.drug_concept_id;

error Era Table Forbidden Join Validation joins.era_forbidden_join_validation joins

Ensures era tables (condition_era, drug_era, dose_era) are not joined to visit_occurrence, visit_detail, provider, or care_site tables, either directly or through intermediate tables.

Example that trips the rule

SELECT * FROM drug_era de
JOIN visit_occurrence vo ON de.person_id = vo.person_id;

Corrected version

SELECT * FROM drug_era de;

error Fact Relationship Polymorphic Join Validation joins.fact_relationship_join_validation joins

Ensures fact_relationship joins include correct domain_concept_id filtering. fact_id columns are polymorphic and must be disambiguated using domain_concept_id.

Example that trips the rule

SELECT fr.fact_id_1 FROM fact_relationship fr
JOIN visit_occurrence vo ON fr.fact_id_1 = vo.visit_occurrence_id;

Corrected version

SELECT fr.fact_id_1 FROM fact_relationship fr
JOIN visit_occurrence vo ON fr.fact_id_1 = vo.visit_occurrence_id
WHERE fr.domain_concept_id_1 = 8;

warning Join Path Validation joins.join_path_validation joins

Verifies that concept or concept_relationship tables are properly joined to clinical tables using standard concept fields

Example that trips the rule

SELECT co.person_id FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.domain_id;

Corrected version

SELECT co.person_id FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id;

warning Left Join Then Where On Right Table joins.left_join_then_where_on_right_table joins

Filtering right table columns in WHERE after LEFT JOIN turns it effectively into INNER JOIN.

Example that trips the rule

SELECT p.person_id FROM person p
LEFT JOIN visit_occurrence vo ON p.person_id = vo.person_id
WHERE vo.visit_start_date >= DATE '2023-01-01';

Corrected version

SELECT p.person_id FROM person p
LEFT JOIN visit_occurrence vo
  ON p.person_id = vo.person_id
  AND vo.visit_start_date >= DATE '2023-01-01';

warning Maps To Direction joins.maps_to_direction joins

Verifies that 'Maps to' relationship is used in the correct direction: concept_id_1 for source, concept_id_2 for standard concept

A longer write-up and example for this rule are still being authored.

error Note NLP to Note Join Validation joins.note_nlp_note_join_validation joins

Ensures note_nlp joins to note using note_id. Flags missing or invalid joins.

Example that trips the rule

SELECT * FROM note_nlp nnlp JOIN note n ON nnlp.note_nlp_id = n.note_id;

Corrected version

SELECT * FROM note_nlp nnlp JOIN note n ON nnlp.note_id = n.note_id;

warning Observation Period Join Requires Date Overlap joins.observation_period_join_validation joins

observation_period joins to clinical tables must include date overlap constraints.

Example that trips the rule

SELECT * FROM condition_occurrence co
JOIN observation_period op ON co.person_id = op.person_id;

Corrected version

SELECT * FROM condition_occurrence co
JOIN observation_period op
  ON co.person_id = op.person_id
  AND co.condition_start_date BETWEEN op.observation_period_start_date
                                  AND op.observation_period_end_date;

warning Payer Plan Period Join Validation joins.payer_plan_period_join_validation joins

Ensures payer_plan_period joins to clinical tables include proper date overlap conditions, not just person_id.

Example that trips the rule

SELECT * FROM condition_occurrence co
JOIN payer_plan_period ppp ON co.person_id = ppp.person_id;

Corrected version

SELECT * FROM condition_occurrence co
JOIN payer_plan_period ppp
  ON co.person_id = ppp.person_id
  AND co.condition_start_date BETWEEN ppp.payer_plan_period_start_date
                                  AND ppp.payer_plan_period_end_date;

error Person ID Join Validation joins.person_id_join_validation joins

Ensures person_id columns only join to other person_id columns. Prevents incorrect joins to unrelated primary keys such as visit_occurrence_id or condition_occurrence_id.

Example that trips the rule

SELECT * FROM person p
JOIN visit_occurrence vo ON p.person_id = vo.visit_occurrence_id;

Corrected version

SELECT * FROM person p
JOIN visit_occurrence vo ON p.person_id = vo.person_id;

error Person to Location Join Validation joins.person_location_join_validation joins

person must join to location via location_id to get patient address. Joining on person_id, person_source_value, or other columns is incorrect.

Example that trips the rule

SELECT * FROM person p JOIN location l ON p.person_id = l.location_id;

Corrected version

SELECT * FROM person p JOIN location l ON p.location_id = l.location_id;

error Preceding Visit Occurrence Validation joins.preceding_visit_occurrence_validation joins

preceding_visit_occurrence_id must reference visit_occurrence.visit_occurrence_id via a proper self-join with person_id and temporal constraints.

Example that trips the rule

SELECT vo.visit_occurrence_id FROM visit_occurrence vo
JOIN visit_occurrence prev ON vo.preceding_visit_occurrence_id = prev.person_id;

Corrected version

SELECT vo.visit_occurrence_id, vo.preceding_visit_occurrence_id
FROM visit_occurrence vo
JOIN visit_occurrence prev
  ON vo.preceding_visit_occurrence_id = prev.visit_occurrence_id
  AND vo.person_id = prev.person_id
  AND prev.visit_end_date <= vo.visit_start_date;

error Provider to Care Site Join Validation joins.provider_care_site_join_validation joins

provider must join to care_site via care_site_id to get practice location. Joining on provider_id, specialty_concept_id, or other columns is incorrect.

Example that trips the rule

SELECT * FROM provider p JOIN care_site cs ON p.provider_id = cs.care_site_id;

Corrected version

SELECT * FROM provider p JOIN care_site cs ON p.care_site_id = cs.care_site_id;

error Provider Join Validation joins.provider_join_validation joins

Clinical event tables must join to the provider table via provider_id. Joining on person_id, care_site_id, or other columns is incorrect.

Example that trips the rule

SELECT * FROM condition_occurrence co
JOIN provider p ON co.person_id = p.provider_id;

Corrected version

SELECT * FROM condition_occurrence co
JOIN provider p ON co.provider_id = p.provider_id;

warning Visit Detail Join Validation joins.visit_detail_join_validation joins

Ensures visit_detail joins to visit_occurrence using visit_occurrence_id. Joining only on person_id can produce incorrect results.

Example that trips the rule

SELECT * FROM visit_detail vd
JOIN visit_occurrence vo ON vd.person_id = vo.person_id;

Corrected version

SELECT * FROM visit_detail vd
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id;

error Visit Occurrence ID Join Validation joins.visit_occurrence_id_join_validation joins

Ensures visit_occurrence_id columns only join to other visit_occurrence_id columns. Prevents incorrect joins to unrelated identifiers such as person_id or condition_occurrence_id.

Example that trips the rule

SELECT * FROM visit_occurrence vo
JOIN condition_occurrence co ON vo.visit_occurrence_id = co.condition_occurrence_id;

Corrected version

SELECT * FROM visit_occurrence vo
JOIN condition_occurrence co ON vo.visit_occurrence_id = co.visit_occurrence_id;

warning Visit Occurrence INNER JOIN Validation joins.visit_occurrence_inner_join_validation joins

Detects INNER JOINs to visit_occurrence that may exclude events with NULL visit_occurrence_id.

Example that trips the rule

SELECT co.person_id FROM condition_occurrence co
JOIN visit_occurrence vo ON co.visit_occurrence_id = vo.visit_occurrence_id;

Corrected version

SELECT co.person_id FROM condition_occurrence co
LEFT JOIN visit_occurrence vo ON co.visit_occurrence_id = vo.visit_occurrence_id;

warning Clinical Event Date Should Not Be In Future temporal.clinical_event_date_in_future_validation temporal

Detects filtering logic that implies clinical event dates occur in the future. Future event dates may indicate data quality issues or incorrect query logic.

What this rule detects

Queries that filter condition_start_date > CURRENT_DATE (or the same inequality against any clinical event date column) describe events that have not happened yet. In production OMOP data this is almost always a data-quality audit (trying to find future-dated events to clean up) and not a real cohort selection, so the rule warns rather than errors. If you genuinely need to audit, document that intent in a comment; if the predicate is accidentally reversed, flip the operator.

Example that trips the rule

SELECT * FROM condition_occurrence
WHERE condition_start_date > CURRENT_DATE;

Corrected version

SELECT * FROM condition_occurrence
WHERE condition_start_date <= CURRENT_DATE;

warning Datetime BETWEEN with Date Literal temporal.datetime_between_date_literal temporal

BETWEEN on datetime columns using date-only literals causes data loss because the end date excludes non-midnight times.

What this rule detects

BETWEEN '2023-01-01' AND '2023-12-31' on a *_datetime column quietly drops every record from 2023-12-31 after 00:00:00, because the date literal is cast to midnight on that day. The problem is silent: the query returns a superset of 2023 data minus the last day's non-midnight measurements, which rarely shows up in spot-checks. Use half-open intervals (>= '2023-01-01' AND < '2024-01-01'), cast both sides to timestamps, or switch to the *_date column when sub-day precision is not needed.

Example that trips the rule

SELECT *
FROM measurement
WHERE measurement_datetime BETWEEN DATE '2023-01-01' AND DATE '2023-12-31';

Corrected version

SELECT *
FROM measurement
WHERE measurement_datetime >= TIMESTAMP '2023-01-01 00:00:00'
  AND measurement_datetime <  TIMESTAMP '2024-01-01 00:00:00';

error Death Date Before Birth Validation temporal.death_date_before_birth_validation temporal

Detects impossible temporal conditions where death occurs before birth.

What this rule detects

A predicate that allows death_date < birth_datetime to evaluate TRUE encodes an impossibility, no patient can die before they are born. When this condition appears in a WHERE or JOIN clause, it is almost always a column-swap bug: the tables on either side of the comparison have been reversed, or start/end columns from a template have been mixed up. The rule flags the impossible comparison so the author can restore the intended temporal ordering before the query silently returns zero rows.

Example that trips the rule

SELECT p.person_id
FROM person p
JOIN death d ON p.person_id = d.person_id
WHERE d.death_date < p.birth_datetime;

Corrected version

SELECT p.person_id
FROM person p
JOIN death d ON p.person_id = d.person_id
WHERE d.death_date > p.birth_datetime;

warning Death Date In Future Validation temporal.death_date_in_future_validation temporal

Detects queries filtering for future death dates, indicating data or logic issues.

What this rule detects

Filtering for death_date > CURRENT_DATE describes patients who die in the future, which is not possible. Like the clinical-event-in-future rule, this is almost always either a data-quality audit (looking for impossible deaths to clean up) or an accidentally reversed comparison. The severity is WARNING rather than ERROR to allow legitimate audit queries.

Example that trips the rule

SELECT * FROM death
WHERE death_date > CURRENT_DATE;

Corrected version

SELECT * FROM death
WHERE death_date <= CURRENT_DATE;

error End Before Start Validation temporal.end_before_start_validation temporal

Detects impossible constraints where start_date > end_date

What this rule detects

Filtering for *_start_date > *_end_date (or the equivalent *_end_date < *_start_date) describes an impossible time range. In OMOP tables like visit_occurrence and drug_exposure every record has start <= end by specification, so a predicate that returns only records where start > end selects zero rows. Useful as a data-quality probe, but usually the operator was reversed by accident.

Example that trips the rule

SELECT *
FROM visit_occurrence
WHERE visit_start_date > visit_end_date;

Corrected version

SELECT *
FROM visit_occurrence
WHERE visit_start_date <= visit_end_date;

warning Unbounded Follow-up Window (Future Information Leakage) temporal.future_information_leakage temporal

Detects queries that compare dates across different clinical event tables (e.g. condition_start_date > drug_exposure_start_date) without bounding the later event against observation_period_end_date. The later event can fall outside the patient's observed follow-up window, introducing immortal-time bias and similar follow-up-window errors. Suppressed when observation_period is not joined at all — the observation_period_anchoring rule covers that case with a coherent fix.

What this rule detects

When a query compares event dates across two clinical tables (e.g. 'condition started before first drug exposure'), the future-facing event must be bounded against observation_period_end_date. Without that bound, the comparison reaches into events that occurred AFTER the person's observation ended, silently pulling data that should not be in scope. The leakage is subtle because the join still returns rows; they are just rows representing out-of-bounds data. Add an explicit observation_period bound on the future-facing side.

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co
JOIN drug_exposure de ON co.person_id = de.person_id
WHERE co.condition_start_date < de.drug_exposure_start_date;

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN drug_exposure de ON co.person_id = de.person_id
JOIN observation_period op ON co.person_id = op.person_id
WHERE co.condition_start_date < de.drug_exposure_start_date
  AND de.drug_exposure_start_date <= op.observation_period_end_date;

warning Nullable End Date NULL Handling temporal.nullable_end_date_null_handling temporal

Ensures nullable end_date columns are properly handled when used in functions, arithmetic, or comparisons to avoid NULL propagation issues.

What this rule detects

Several OMOP end-date columns are nullable (drug_exposure_end_date, condition_end_date, device_exposure_end_date). Passing them directly into DATEDIFF, arithmetic, or a comparison propagates NULL through the expression, which then evaluates to NULL in WHERE and silently excludes the row. Wrap the nullable column in COALESCE with a sensible fallback (often the paired start_date), or filter IS NOT NULL explicitly, so the row-exclusion is deliberate rather than accidental.

Example that trips the rule

SELECT person_id,
       DATEDIFF(day, drug_exposure_start_date, drug_exposure_end_date) AS days_supply
FROM drug_exposure;

Corrected version

SELECT person_id,
       DATEDIFF(day, drug_exposure_start_date,
                COALESCE(drug_exposure_end_date, drug_exposure_start_date)) AS days_supply
FROM drug_exposure;

warning Observation Period Anchoring temporal.observation_period_anchoring temporal

When observation_period is included in a query, ensures it is properly joined on person_id to clinical tables. Does not flag queries that simply don't use observation_period (valid design choice for descriptive queries).

What this rule detects

When a query touches observation_period alongside a clinical table, OMOP semantics require the two to be linked by person_id, otherwise the join produces every (person, observation_period) combination, mixing one patient's observation window with another's clinical events. The rule fires when observation_period appears in the FROM list without a person_id equality linking it to the clinical side. If you only need observation_period's date columns without tying them to a specific patient, that is usually a query-design smell; restructure to make the person_id join explicit.

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co, observation_period op
WHERE co.condition_start_date BETWEEN op.observation_period_start_date
  AND op.observation_period_end_date;

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN observation_period op ON co.person_id = op.person_id
WHERE co.condition_start_date BETWEEN op.observation_period_start_date
  AND op.observation_period_end_date;

error Observation Period Date Range Logic temporal.observation_period_date_range_logic temporal

Ensures clinical event dates are tested within observation_period bounds. Detects reversed logic where observation_period dates are incorrectly used as values.

What this rule detects

The conventional OMOP invariant is event_date BETWEEN observation_period_start_date AND observation_period_end_date: the event is the value being tested, the observation window is the range. Reversed forms like observation_period_start_date BETWEEN event_start_date AND event_end_date produce rows only when the observation window happens to fall inside a single event, which is almost never what was intended. It is a classic copy-paste bug from swapping the BETWEEN operands.

Example that trips the rule

SELECT co.person_id
FROM condition_occurrence co
JOIN observation_period op ON co.person_id = op.person_id
WHERE op.observation_period_start_date
      BETWEEN co.condition_start_date AND co.condition_end_date;

Corrected version

SELECT co.person_id
FROM condition_occurrence co
JOIN observation_period op ON co.person_id = op.person_id
WHERE co.condition_start_date
      BETWEEN op.observation_period_start_date AND op.observation_period_end_date;

warning Required Date Column Validation temporal.required_date_column_validation temporal

Temporal queries on clinical tables should use required (NOT NULL) date columns instead of nullable columns to avoid silently excluding records.

What this rule detects

Each OMOP clinical table has at least one required (NOT NULL) date column and one or more optional end-date columns. Filtering on a nullable column (e.g. drug_exposure_end_date instead of drug_exposure_start_date) silently excludes every row where that column happens to be null, often a sizeable fraction of the data. Prefer the required column for the primary temporal filter; reach for the nullable ones only when their meaning is specifically what you need (duration calculations, end-of-period cohorts).

Example that trips the rule

SELECT person_id
FROM drug_exposure
WHERE drug_exposure_end_date >= DATE '2023-01-01';

Corrected version

SELECT person_id
FROM drug_exposure
WHERE drug_exposure_start_date >= DATE '2023-01-01';