FastSSV ships 154 static rules covering OMOP CDM v5.4, grouped into 6 categories. Click any rule to see a longer write-up and the SQL pattern that trips it.
warning
Ambiguous Column Reference
anti_patterns.ambiguous_column_reference
anti_patterns
Detects unqualified column references (e.g., person_id instead of co.person_id) in multi-table queries where the column likely appears in multiple OMOP tables. This can lead to SQL errors or incorrect results.
Columns like person_id, visit_occurrence_id, provider_id, and care_site_id appear in many OMOP tables. In a multi-table join, an unqualified reference is either a hard error (ambiguous column) or a silent bug where the parser resolves it to the first matching table — which may not be the one the author intended. Always qualify every column in a multi-table query with its table alias.
SELECT person_id
FROM condition_occurrence co
JOIN person p ON co.person_id = p.person_id
WHERE person_id = 1;
SELECT co.person_id
FROM condition_occurrence co
JOIN person p ON co.person_id = p.person_id
WHERE co.person_id = 1;
error
Attribute Definition Invalid Join
anti_patterns.attribute_definition_invalid_join
anti_patterns
attribute_definition is a legacy table with no foreign key relationships to other OMOP tables. It cannot be meaningfully joined and should only be queried standalone.
attribute_definition is a legacy OMOP table with no foreign-key relationships to any other CDM table. Joining it to clinical or vocabulary tables produces either a Cartesian product or spurious matches on coincidentally-equal integer values. Query it on its own when you need attribute metadata, and drop any JOINs against it.
SELECT *
FROM person p
JOIN attribute_definition ad ON p.person_id = ad.attribute_definition_id;
SELECT *
FROM attribute_definition;
error
Comma-Separated Cross Join
anti_patterns.comma_separated_cross_join
anti_patterns
Comma-separated FROM between clinical tables with no join predicate, produces a Cartesian product.
Comma-join syntax (FROM a, b) predates the explicit JOIN...ON form introduced in SQL-92 and is still common in analysts who come from SAS, SPSS, or older SQL dialects where it was idiomatic. In OMOP the mistake is rarely about performance first: even on a small dataset the cross-joined query returns row combinations that don't correspond to any real clinical event, every condition paired with every drug for every patient, so the results are semantically wrong before the query size becomes catastrophic. The natural join column between two clinical tables is almost always person_id; occasionally visit_occurrence_id when both sides sit inside the same encounter. If you genuinely want a Cartesian product (test-matrix generation, sparse-grid filling), write CROSS JOIN explicitly; this rule fires only on the implicit comma form, so an explicit CROSS JOIN documents the intent and stays silent.
SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co, drug_exposure de
WHERE co.condition_concept_id = 201820;
-- Fix A: explicit JOIN ... ON (preferred for readability)
SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co
JOIN drug_exposure de
ON co.person_id = de.person_id
WHERE co.condition_concept_id = 201820;
-- Fix B: keep the comma syntax, add the join predicate to WHERE
SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co, drug_exposure de
WHERE co.person_id = de.person_id
AND co.condition_concept_id = 201820;
warning
Concept Ancestor Mixed with Concept Relationship Redundantly
anti_patterns.concept_ancestor_mixed_with_concept_relationship_redundantly
anti_patterns
concept_ancestor already includes all transitive hierarchical paths. Using concept_relationship with hierarchical filters in the same query is redundant.
concept_ancestor already encodes every transitive hierarchical path (ancestor → descendant, including indirect ones). Joining concept_relationship to it for hierarchical relationships like 'Subsumes' or 'Is a' is redundant and typically changes the semantics — the combined filter returns the intersection rather than the hierarchy you intended. Use concept_ancestor for hierarchy and reserve concept_relationship for non-hierarchical links like 'Maps to', 'Has indication', 'Contains'.
SELECT ca.descendant_concept_id
FROM concept_ancestor ca
JOIN concept_relationship cr
ON ca.descendant_concept_id = cr.concept_id_1
WHERE ca.ancestor_concept_id = 201820
AND cr.relationship_id = 'Subsumes';
SELECT ca.descendant_concept_id
FROM concept_ancestor ca
WHERE ca.ancestor_concept_id = 201820;
warning
Concept Code Requires Vocabulary ID
anti_patterns.concept_code_requires_vocabulary_id
anti_patterns
concept_code is unique only within a vocabulary. Any filter on concept_code should include a vocabulary_id filter in the same scope to avoid ambiguous cross-vocabulary matches.
concept_code values are unique only within a vocabulary: 'E11' exists in ICD10CM (diabetes), ICD10 (diabetes), potentially custom vocabularies — and the same code string can mean entirely different things. Filtering on concept_code without also filtering on vocabulary_id silently matches unintended concepts from other vocabularies. Always pair the two predicates in the same scope.
SELECT c.concept_id
FROM concept c
WHERE c.concept_code = 'E11.9';
SELECT c.concept_id
FROM concept c
WHERE c.concept_code = 'E11.9'
AND c.vocabulary_id = 'ICD10CM';
warning
Concept Name Lookup Anti-pattern
anti_patterns.concept_name_lookup
anti_patterns
Warns when queries filter by concept_name instead of using concept_code + vocabulary_id or concept_id. Concept names are not guaranteed to be unique or stable across versions.
concept_name is neither unique nor stable: multiple concepts can share a display name, and names change across vocabulary releases (capitalisation, abbreviations, rewording). Filtering by concept_name makes queries silently unreliable across sites and breaks after each vocabulary refresh. Prefer filtering by concept_code + vocabulary_id, or by concept_id directly if you have it.
SELECT c.concept_id
FROM concept c
WHERE c.concept_name = 'Type 2 diabetes mellitus';
SELECT c.concept_id
FROM concept c
WHERE c.concept_code = 'E11'
AND c.vocabulary_id = 'ICD10CM';
warning
Concept Relationship Transitive Misuse
anti_patterns.concept_relationship_transitive_misuse
anti_patterns
concept_relationship stores only direct relationships. Chaining multiple self-joins attempts transitive closure and is fragile. Use concept_ancestor instead.
concept_relationship stores only direct (one-hop) relationships. Chaining multiple self-joins on it to simulate transitive closure ('Subsumes' through three levels of hierarchy) is fragile: it only captures descendants at the exact depth you chained for, misses concepts with multiple inheritance paths, and gets slower with each additional self-join. concept_ancestor pre-computes all transitive hierarchical paths and is the correct table for 'everything beneath this concept' queries.
SELECT cr3.concept_id_2
FROM concept_relationship cr1
JOIN concept_relationship cr2 ON cr1.concept_id_2 = cr2.concept_id_1
JOIN concept_relationship cr3 ON cr2.concept_id_2 = cr3.concept_id_1
WHERE cr1.concept_id_1 = 201820
AND cr1.relationship_id = 'Subsumes'
AND cr2.relationship_id = 'Subsumes'
AND cr3.relationship_id = 'Subsumes';
SELECT descendant_concept_id AS concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820;
error
Destructive Operations on Clinical Tables
anti_patterns.destructive_operations_on_clinical_tables
anti_patterns
Detects destructive SQL operations (DELETE, UPDATE, INSERT, TRUNCATE, DROP, ALTER) on protected clinical tables. Analytical SQL should be read-only.
Analytical SQL against OMOP clinical tables should be strictly read-only. DELETE, UPDATE, INSERT, TRUNCATE, DROP, and ALTER statements against person, condition_occurrence, drug_exposure, etc. either corrupt cohort history or break reproducibility for anyone else using the same warehouse. Any data modification belongs in a governed ETL pipeline, not in a cohort-exploration query.
DELETE FROM condition_occurrence
WHERE person_id = 1;
SELECT *
FROM condition_occurrence
WHERE person_id = 1;
warning
Duplicate Column Alias
anti_patterns.duplicate_column_alias
anti_patterns
Detects when the same expression is selected multiple times with different aliases. This is usually a copy-paste error and creates redundant data in the result set.
Selecting the same expression twice with different aliases is almost always the result of copy-paste during iteration. It produces result-set columns that carry the same value row-for-row but different names, doubling the output width without adding information. Remove the duplicate, or if you genuinely need the same value projected twice, make the second projection compute something different (or name them differently via a view).
SELECT person_id AS pid_a, person_id AS pid_b
FROM person;
SELECT person_id
FROM person;
error
Having Without Group By
anti_patterns.having_without_group_by
anti_patterns
HAVING clauses without GROUP BY are almost always a logic error. Add GROUP BY clause or use WHERE instead.
HAVING filters aggregate groups; it is designed to run after GROUP BY collapses rows. Without a GROUP BY clause, HAVING filters over an implicit single-group aggregate, which some dialects accept with surprising semantics and others reject. In OMOP analytics this is almost always a mistake — the author wanted WHERE (for row-level filtering) or forgot to add the GROUP BY for their aggregates.
SELECT person_id
FROM person
HAVING person_id > 1;
SELECT person_id
FROM person
WHERE person_id > 1;
error
Join Key Validation
anti_patterns.join_key_validation
anti_patterns
Detects incorrect or suspicious JOIN conditions. Joining incompatible keys (e.g., person_id = concept_id) produces invalid results.
OMOP uses distinct ID namespaces for different entities: person_id, visit_occurrence_id, provider_id, care_site_id, concept_id, and so on. Joining incompatible ID columns (e.g. `person.person_id = concept.concept_id`) rarely errors, because they're both integers — but every match is a numeric coincidence, not a real relationship. The result is garbage rows. Pair keys with their canonical foreign-key counterparts.
SELECT *
FROM person p
JOIN concept c ON p.person_id = c.concept_id;
SELECT p.person_id, c.concept_name AS gender
FROM person p
JOIN concept c ON p.gender_concept_id = c.concept_id;
warning
LIMIT Without ORDER BY
anti_patterns.limit_without_order_by
anti_patterns
A SELECT with LIMIT, TOP, or FETCH FIRST but no ORDER BY returns non-deterministic rows. Different engines and even successive runs may return different rows. Add ORDER BY to make the result reproducible.
Most SQL engines make no guarantee about row order in the absence of an explicit ``ORDER BY`` clause. Pairing ``LIMIT N`` (or ``TOP N``, or ``FETCH FIRST N ROWS ONLY``) with no ordering produces non-deterministic results: a query that asks for "100 patients" may return a different 100 each run, depending on plan choice, parallel-execution interleaving, or storage layout. The same shape breaks pagination (page 2 may overlap page 1) and CI tests that assert on specific rows. The fix is always the same: add a stable ``ORDER BY`` (typically the table's primary key or a composite key that uniquely identifies a row) before the row-limiting clause.
SELECT person_id, condition_concept_id
FROM condition_occurrence
LIMIT 100;
SELECT person_id, condition_concept_id
FROM condition_occurrence
ORDER BY condition_occurrence_id
LIMIT 100;
warning
No DISTINCT on Primary Key Column
anti_patterns.no_distinct_on_primary_key_column
anti_patterns
Detects redundant use of DISTINCT on primary key columns, which are unique by definition. This may indicate a misunderstanding of the data or missing join conditions.
DISTINCT on a primary-key column is a tautology: primary keys are unique by definition, so the de-duplication never actually removes a row. Its appearance usually signals one of two things: the author doesn't realise the column is a primary key (worth double-checking the table model), or they suspect a JOIN is causing duplicates and are papering over it with DISTINCT instead of fixing the join. Remove the DISTINCT and, if duplicates do appear, tighten the join predicate.
SELECT DISTINCT person_id
FROM person;
SELECT person_id
FROM person;
error
No String Identification
anti_patterns.no_string_identification
anti_patterns
Prevents using string matching (LIKE, =, IN) on *_source_value columns to identify clinical concepts. Use *_concept_id instead.
`*_source_value` columns store the raw text pulled from the source system before OMOP mapping; they are provenance, not analytical fields. Using LIKE / equality / IN on them to find clinical concepts silently under- or over-matches as source values vary by site, clinician, and ETL version. Use the paired *_concept_id (standard) or *_source_concept_id (source-mapped) column for all concept-level filtering.
SELECT person_id
FROM condition_occurrence
WHERE condition_source_value LIKE '%diabetes%';
SELECT person_id
FROM condition_occurrence
WHERE condition_concept_id = 201820;
error
NULL Comparison Must Use IS NULL / IS NOT NULL
anti_patterns.null_comparison_operator
anti_patterns
Using comparison operators (=, <>, !=, >, <, >=, <=) with NULL is incorrect SQL. NULL comparisons always return UNKNOWN, causing logic errors. Use IS NULL or IS NOT NULL instead.
Comparing a column to NULL with `=`, `<>`, `!=`, `<`, `>`, `<=`, or `>=` always evaluates to UNKNOWN, which behaves as FALSE in WHERE and HAVING. A predicate like `WHERE death_date = NULL` returns zero rows regardless of the data, silently. Use the dedicated predicates `IS NULL` and `IS NOT NULL` — they are the only correct way to test nullability in SQL.
SELECT person_id
FROM death
WHERE death_date = NULL;
SELECT person_id
FROM death
WHERE death_date IS NULL;
error
Singleton Metadata Joined to Clinical Table
anti_patterns.singleton_metadata_clinical_join
anti_patterns
Singleton CDM-instance metadata tables (cdm_source, metadata) have no foreign keys to clinical data. Joining them to clinical tables creates meaningless cartesian products.
cdm_source and metadata describe the CDM instance itself (release version, ETL provenance, data characterization). Neither has a primary key or foreign keys into clinical data. Joining them to clinical tables multiplies every clinical row by the metadata row count, yielding a Cartesian product. Read these tables on their own or via a scalar subquery; never through a JOIN.
SELECT *
FROM person p, cdm_source cs;
SELECT *,
(SELECT cdm_version FROM cdm_source) AS cdm_version
FROM person;
warning
Standard Concept OR with Classification
anti_patterns.standard_concept_or_with_classification
anti_patterns
Flags queries that mix standard ('S') and classification ('C') concepts. Classification concepts should not be used in clinical queries.
Standard concepts ('S') are meant to represent specific clinical events; classification concepts ('C') are vocabulary-level grouping nodes (e.g. MedDRA categories). Mixing them in a cohort query — `standard_concept = 'S' OR standard_concept = 'C'` — inflates counts with classification nodes that don't correspond to any clinical event. Keep clinical queries to 'S'; reach for 'C' only when you're specifically exploring the vocabulary hierarchy.
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.standard_concept = 'S' OR c.standard_concept = 'C';
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.standard_concept = 'S';
warning
TOP/LIMIT for Synthetic Data Generation
anti_patterns.top_as_synthetic_data
anti_patterns
Detects when TOP N or LIMIT N is used with ROW_NUMBER() to generate synthetic lookup data from actual tables. This is fragile, non-portable, and depends on data distribution. Use explicit VALUES, generate_series(), or recursive CTEs instead.
Using `SELECT TOP N ROW_NUMBER() OVER (...) FROM some_table` (or the `LIMIT N` equivalent) to generate the integers 1..N from an arbitrary clinical table is an anti-pattern: the result depends on the table having at least N rows, it isn't portable across dialects, and it silently breaks when the reference table changes size. Use a real constant-generator — an explicit VALUES list, generate_series() in PostgreSQL, or a recursive CTE — so the synthetic data is deterministic and dialect-agnostic.
SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY person_id) AS n
FROM person;
SELECT n
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS t(n);
warning
Type Concept ID Domain Filter
anti_patterns.type_concept_id_domain_filter
anti_patterns
Type concept columns (*_type_concept_id) reference concepts with domain_id = 'Type Concept'. Filtering by clinical domains returns no results.
The `*_type_concept_id` columns resolve to concepts whose domain_id is 'Type Concept' — they describe record provenance (EHR, claim, patient-reported), not clinical entities. Joining `*_type_concept_id` to concept and then filtering domain_id = 'Condition' (or 'Drug', etc.) always returns zero rows because type concepts don't live in clinical domains. Use domain_id = 'Type Concept' if you need the filter, or drop the domain_id predicate entirely.
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_type_concept_id = c.concept_id
WHERE c.domain_id = 'Condition';
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_type_concept_id = c.concept_id
WHERE c.domain_id = 'Type Concept';
error
Type Concept ID Not For Clinical Filtering
anti_patterns.type_concept_id_misuse
anti_patterns
The *_type_concept_id columns represent record provenance (EHR, claim, etc.), not clinical categories. Do not use them for cohort definition or clinical filtering in WHERE/HAVING clauses. Using them in GROUP BY, SELECT, or JOIN for labeling is acceptable.
`*_type_concept_id` columns carry provenance (EHR, claim, patient-reported, registry) — information about *where* the record came from, not what the record means clinically. Using them in WHERE / GROUP BY / HAVING as if they were clinical concepts yields cohorts defined by data-source type, not disease. For cohort logic use the primary `*_concept_id` column; keep `*_type_concept_id` for data-quality and provenance audits.
SELECT person_id
FROM condition_occurrence
WHERE condition_type_concept_id = 32020;
SELECT person_id
FROM condition_occurrence
WHERE condition_concept_id = 201820;
error
Concept Ancestor Cross-Domain Validation
concept_standardization.concept_ancestor_cross_domain
concept_standardization
Ensures concept_ancestor hierarchies are used within the correct domain. Hierarchies are domain-specific and do not cross domains.
concept_ancestor hierarchies are built within a single domain: a Condition ancestor has only Condition descendants, a Drug ancestor has only Drug descendants, and so on. Filtering descendants by a different domain_id always yields zero rows, which is almost never what the author intended. Cross-domain lookups belong in concept_relationship (e.g. 'Has indication' from Drug to Condition), not in concept_ancestor.
SELECT ca.descendant_concept_id
FROM concept_ancestor ca
JOIN concept c ON ca.descendant_concept_id = c.concept_id
WHERE ca.ancestor_concept_id = 201820
AND c.domain_id = 'Drug';
SELECT ca.descendant_concept_id
FROM concept_ancestor ca
JOIN concept c ON ca.descendant_concept_id = c.concept_id
WHERE ca.ancestor_concept_id = 201820
AND c.domain_id = 'Condition';
error
Concept Ancestor Max Levels Misuse
concept_standardization.concept_ancestor_max_levels_misuse
concept_standardization
Detects incorrect usage of max_levels_of_separation for identifying direct relationships. Due to multiple hierarchy paths, exact equality on max_levels_of_separation is unreliable.
A concept pair can appear multiple times in concept_ancestor via different hierarchy paths, each with its own max_levels_of_separation. Exact equality on that column only matches pairs whose longest path is exactly that value and misses pairs where a shorter path also exists. Use max_levels_of_separation <= N for depth limits, or min_levels_of_separation = 1 when you specifically want direct parent-child relationships.
SELECT descendant_concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820
AND max_levels_of_separation = 1;
SELECT descendant_concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820
AND max_levels_of_separation <= 1;
error
Concept Ancestor Rollup Direction
concept_standardization.concept_ancestor_rollup_direction
concept_standardization
Ensures correct usage of concept_ancestor for hierarchical rollups: join clinical concept_id to descendant_concept_id and filter on ancestor_concept_id.
concept_ancestor is directional: ancestor_concept_id is the broader category (e.g. 201820 = 'Diabetes mellitus') and descendant_concept_id is every more specific concept beneath it. To roll a clinical event up to a parent concept, join the event's *_concept_id to descendant_concept_id and filter on ancestor_concept_id. Swapping the columns silently returns the wrong cohort, usually either empty or a small unrelated subset, and is a common source of hard-to-debug analytical errors.
SELECT co.condition_occurrence_id
FROM condition_occurrence co
JOIN concept_ancestor ca
ON co.condition_concept_id = ca.ancestor_concept_id
WHERE ca.descendant_concept_id = 201820;
SELECT co.condition_occurrence_id
FROM condition_occurrence co
JOIN concept_ancestor ca
ON co.condition_concept_id = ca.descendant_concept_id
WHERE ca.ancestor_concept_id = 201820;
warning
Concept Ancestor Self-Include Redundancy
concept_standardization.concept_ancestor_self_include_redundancy
concept_standardization
concept_ancestor includes self (min_levels_of_separation = 0). Explicit inclusion of anchor concepts causes duplication.
concept_ancestor already includes every concept as its own descendant (rows where min_levels_of_separation = 0). Adding a UNION or OR that explicitly includes the anchor concept_id duplicates the anchor row in the result. The fix is to trust concept_ancestor's self-inclusion and drop the explicit branch, unless the anchor is deliberately being flagged and tracked separately (in which case use UNION ALL with an explicit marker column so the duplication is intentional).
SELECT descendant_concept_id AS concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820
UNION
SELECT 201820 AS concept_id
FROM concept
WHERE concept_id = 201820;
SELECT descendant_concept_id AS concept_id
FROM concept_ancestor
WHERE ancestor_concept_id = 201820;
warning
Concept Class ID Ingredient for Drug Grouping
concept_standardization.concept_class_id_ingredient_for_drug_grouping
concept_standardization
Grouping drug data by ingredient requires concept_class_id = 'Ingredient'. Using product-level classes groups at formulation level instead.
Drug concepts in OMOP are stored at several granularities: Ingredient (metformin), Clinical Drug Form (metformin tablet), Clinical Drug (metformin 500 mg oral tablet), Branded Drug, and so on. Labelling a result column 'ingredient' while grouping by a more specific concept_class (like Clinical Drug) reports formulation-level counts under an ingredient name, which fragments and undercounts real ingredient usage. Either filter directly to concept_class_id = 'Ingredient', or roll products up to their ingredient via concept_ancestor.
SELECT c.concept_name AS ingredient, COUNT(*) AS n
FROM drug_exposure de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.concept_class_id = 'Clinical Drug'
GROUP BY c.concept_name;
SELECT ing.concept_name AS ingredient, COUNT(*) AS n
FROM drug_exposure de
JOIN concept_ancestor ca ON de.drug_concept_id = ca.descendant_concept_id
JOIN concept ing ON ca.ancestor_concept_id = ing.concept_id
WHERE ing.concept_class_id = 'Ingredient'
GROUP BY ing.concept_name;
warning
Concept Domain ID Matches Target Table
concept_standardization.concept_domain_validation
concept_standardization
Validates that concept.domain_id matches the expected domain for each *_concept_id column.
Each OMOP *_concept_id column has a canonical domain_id: condition_occurrence.condition_concept_id is always 'Condition', drug_exposure.drug_concept_id is always 'Drug', and so on. Filtering a concept join with the wrong domain (e.g. c.domain_id = 'Drug' while joining on condition_concept_id) silently returns zero rows. The fix is to match the domain to the column.
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.domain_id = 'Drug';
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.domain_id = 'Condition';
warning
Concept Synonym Language Concept ID
concept_standardization.concept_synonym_language_concept_id
concept_standardization
concept_synonym stores synonyms in multiple languages. When searching by concept_synonym_name, filter by language_concept_id.
The concept_synonym table stores synonym names in many languages: English (language_concept_id = 4180186), Spanish, German, French, Dutch, etc. Searching concept_synonym_name with a free-text match (LIKE / equality) without filtering by language inadvertently returns concepts whose non-English synonyms contain the query string, which is almost never the intent in English-speaking deployments. Add an explicit language filter, or use IN to allow a deliberate multilingual scope.
SELECT cs.concept_id
FROM concept_synonym cs
WHERE cs.concept_synonym_name LIKE '%diabetes%';
SELECT cs.concept_id
FROM concept_synonym cs
WHERE cs.concept_synonym_name LIKE '%diabetes%'
AND cs.language_concept_id = 4180186; -- English
warning
Domain Vocabulary Validation (VOCAB_022-025)
concept_standardization.domain_vocabulary_validation
concept_standardization
Standard *_concept_id columns should align with domain-specific standard vocabularies. Filtering by source vocabularies is likely incorrect.
Each OMOP domain has a canonical standard vocabulary: Condition → SNOMED, Drug → RxNorm/RxNorm Extension, Procedure → SNOMED / CPT4 / HCPCS depending on site, Measurement → LOINC, Unit → UCUM. Filtering a standard *_concept_id column by a *source* vocabulary like ICD10CM, ICD9CM, or NDC returns zero rows because those codes live on the source side. Use the domain's standard vocabulary, or switch to the *_source_concept_id column if you genuinely want to filter on the originating code system.
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.vocabulary_id = 'ICD10CM';
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.vocabulary_id = 'SNOMED';
error
Era Tables Use Standard Concepts Only
concept_standardization.era_table_standard_concepts
concept_standardization
Era tables contain only standard concepts. Filtering for non-standard concepts will always return 0 rows.
OMOP era tables (condition_era, drug_era, dose_era) are derived from their underlying occurrence tables by rolling up standard concepts over contiguous time windows. By construction every row references a standard concept, so filtering them with standard_concept IN ('C', NULL) or <> 'S' always returns zero rows. Either drop the filter entirely or filter for 'S' explicitly to make the intent readable.
SELECT de.person_id
FROM drug_era de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.standard_concept = 'C';
SELECT de.person_id
FROM drug_era de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.standard_concept = 'S';
warning
Invalid Reason Enforcement (strict mode)
concept_standardization.invalid_reason_enforcement
concept_standardization
Strict-mode-only. When enabled, ensures queries on vocabulary/concept tables filter by invalid_reason so retired or upgraded concepts are not silently included. Silent in default mode.
Concept tables (concept, concept_relationship, concept_ancestor) carry an invalid_reason column that marks retired or replaced entries: 'D' for deprecated, 'U' for upgraded to a newer concept, NULL for currently valid. Queries that omit an `invalid_reason IS NULL` predicate silently include retired rows, which can pull in concept_ids that your site no longer records or that map onward to a better successor. **Gated behind strict mode.** Real-world OMOP queries on the concept table almost always omit this filter — firing in default mode produced a warning on essentially every realistic query and diluted the signal of every other rule. Enable with ``--strict`` (CLI) or ``strict=True`` (API) when you want the vocabulary-hygiene check applied; the rule is silent otherwise.
SELECT concept_id, concept_name
FROM concept
WHERE vocabulary_id = 'SNOMED';
SELECT concept_id, concept_name
FROM concept
WHERE vocabulary_id = 'SNOMED'
AND invalid_reason IS NULL;
warning
Maps To Target Standard Validation
concept_standardization.maps_to_target_standard_validation
concept_standardization
When using concept_relationship with relationship_id = 'Maps to', the target (concept_id_2) should be validated as a standard concept via a join to concept table with standard_concept = 'S'. Without this, queries may return deprecated or intermediate non-standard concepts. This is a best practice recommendation - the query will execute correctly but may include non-standard targets.
concept_relationship with relationship_id = 'Maps to' is the canonical mapping from source codes (ICD10CM, NDC, etc.) to OMOP standard concepts. But the relationship table also contains historical and intermediate mappings, so concept_id_2 is not guaranteed to be standard just because the relationship is 'Maps to'. Without a join to concept with standard_concept = 'S', queries can pick up deprecated or non-standard targets that break downstream cohort logic.
SELECT cr.concept_id_2 AS target_concept_id
FROM concept_relationship cr
WHERE cr.concept_id_1 = 44831230
AND cr.relationship_id = 'Maps to';
SELECT cr.concept_id_2 AS target_concept_id
FROM concept_relationship cr
JOIN concept c ON cr.concept_id_2 = c.concept_id
WHERE cr.concept_id_1 = 44831230
AND cr.relationship_id = 'Maps to'
AND c.standard_concept = 'S';
warning
Multiple Maps To Targets Not Handled
concept_standardization.multiple_maps_to_targets
concept_standardization
A source concept can map to multiple standard concepts via 'Maps to'. Queries assuming 1:1 mapping may produce incorrect results.
Not every source concept maps 1:1 to a standard concept: one ICD-10 code can split into multiple SNOMED concepts, and one NDC can fan out to several RxNorm ingredients. Querying concept_relationship for 'Maps to' without DISTINCT (or a GROUP BY / aggregate handling the fan-out) duplicates source rows whenever a one-to-many mapping exists, inflating downstream counts. Decide deliberately: DISTINCT to collapse, GROUP BY with ARRAY_AGG to preserve all targets, or a subquery that picks one target per source.
SELECT cr.concept_id_1, cr.concept_id_2
FROM concept_relationship cr
WHERE cr.relationship_id = 'Maps to';
SELECT DISTINCT cr.concept_id_1, cr.concept_id_2
FROM concept_relationship cr
WHERE cr.relationship_id = 'Maps to';
warning
Source Concept ID Should Not Filter Standard Concepts
concept_standardization.source_concept_id_standard_filter
concept_standardization
Joining *_source_concept_id to concept with standard_concept = 'S' is semantically incorrect. Source concepts are non-standard.
Every OMOP clinical table has two concept columns: *_concept_id (the standard vocabulary mapping, e.g. SNOMED for conditions) and *_source_concept_id (the code from the originating system, e.g. ICD10CM). Source concepts are by definition non-standard, so joining *_source_concept_id to concept and then filtering standard_concept = 'S' always returns zero rows. Either join against *_concept_id (for standard-code analytics) or drop the standard filter (to inspect source codes).
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_source_concept_id = c.concept_id
WHERE c.standard_concept = 'S';
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.standard_concept = 'S';
warning
Source Concept ID Not For Analytical Filtering
concept_standardization.source_concept_id_warning
concept_standardization
Avoid using *_source_concept_id for cohort definition or analytical filtering. Use standard *_concept_id instead.
*_source_concept_id stores the original, unmapped code from the source system (ICD10CM, NDC, CPT, etc.) and is intended for audit and ETL provenance, not for cohort analytics. Filtering cohorts on source concepts produces results that do not federate across sites (each warehouse ETL maps slightly differently) and misses patients whose underlying data was mapped from a different source vocabulary. Use the paired *_concept_id (standard) column for any cohort or analytical query.
SELECT person_id
FROM condition_occurrence
WHERE condition_source_concept_id = 44831230;
SELECT person_id
FROM condition_occurrence
WHERE condition_concept_id = 201820;
warning
Source to Concept Map Validation
concept_standardization.source_to_concept_map_validation
concept_standardization
Requires source_vocabulary_id when filtering source_code to avoid ambiguity across vocabularies.
source_to_concept_map is the per-site translation table from source codes to OMOP standard concepts. Source codes are not globally unique: 'R51' can exist in ICD10CM (headache), ICD9CM, and some local billing vocabularies. Filtering on source_code alone can pick up matches from unrelated vocabularies, returning incorrect target_concept_ids. Always pair source_code with source_vocabulary_id so the match is unambiguous.
SELECT target_concept_id
FROM source_to_concept_map
WHERE source_code = 'R51';
SELECT target_concept_id
FROM source_to_concept_map
WHERE source_code = 'R51'
AND source_vocabulary_id = 'ICD10CM';
warning
Standard Concept Enforcement
concept_standardization.standard_concept_enforcement
concept_standardization
Ensures queries using STANDARD concept fields enforce standard concepts via concept.standard_concept = 'S' or concept_relationship 'Maps to'
Standard OMOP *_concept_id columns can point to non-standard or deprecated concepts unless the query explicitly enforces standard_concept = 'S'. Without that filter, cohort queries silently mix in classification-only concepts ('C'), invalid entries, or legacy mappings that never should have persisted, producing over-counts or non-reproducible results across sites. Era tables (condition_era, drug_era) and a handful of other columns are already guaranteed-standard by spec and are excluded from this rule.
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.vocabulary_id = 'SNOMED';
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
WHERE c.vocabulary_id = 'SNOMED'
AND c.standard_concept = 'S';
error
Standard Concept Value Validation
concept_standardization.standard_concept_value_validation
concept_standardization
Ensures standard_concept uses only valid values: 'S', 'C', or NULL.
The concept.standard_concept column has exactly three valid values: 'S' (standard), 'C' (classification), and NULL (neither, typically source vocabularies). Any other literal, like 'X', 'Y' or an empty string, matches no rows and is almost always a typo from a developer who remembered there was some letter value. The fix is usually 'S'; reach for 'C' only when you specifically want classification-level concepts such as MedDRA grouping.
SELECT concept_id
FROM concept
WHERE standard_concept = 'X';
SELECT concept_id
FROM concept
WHERE standard_concept = 'S';
warning
Unit Vocabulary Validation
concept_standardization.unit_vocabulary_validation
concept_standardization
Standard unit concepts use vocabulary_id = 'UCUM'. Filtering unit_concept_id with other vocabulary IDs returns non-standard units or zero results.
OMOP unit concepts live in the UCUM vocabulary (Unified Code for Units of Measure). Filtering unit_concept_id joins with vocabulary_id = 'LOINC' (or any other non-UCUM vocabulary) returns zero rows because units are not stored there. Use UCUM for unit lookups, or drop the vocabulary filter if the join already constrains to unit_concept_id.
SELECT m.person_id
FROM measurement m
JOIN concept c ON m.unit_concept_id = c.concept_id
WHERE c.vocabulary_id = 'LOINC';
SELECT m.person_id
FROM measurement m
JOIN concept c ON m.unit_concept_id = c.concept_id
WHERE c.vocabulary_id = 'UCUM';
error
Canonical Vocabulary String Value Validation
data_quality.canonical_string_value_validation
data_quality
Filters on concept_class_id, domain_id, and vocabulary_id must use canonical OMOP casing (e.g. 'Ingredient', 'Condition', 'SNOMED'). Case-sensitive comparison silently returns zero rows when the casing is wrong; vocabulary_id values must also be hyphen-free.
Three OMOP vocabulary string columns are case-sensitive and follow a fixed canonical casing: concept_class_id ('Ingredient', 'Clinical Drug', …), domain_id ('Condition', 'Drug', …), and vocabulary_id ('SNOMED', 'RxNorm', 'ICD10CM', …). Filtering with the wrong casing ('ingredient', 'condition', 'snomed') silently returns zero rows. vocabulary_id values are additionally hyphen-free in OMOP — 'ICD-10-CM' is invalid; the canonical form is 'ICD10CM'. Match the canonical form exactly.
SELECT concept_id
FROM concept
WHERE vocabulary_id = 'snomed';
SELECT concept_id
FROM concept
WHERE vocabulary_id = 'SNOMED';
warning
Clinical Event Date Should Not Be Before 1900
data_quality.clinical_event_date_before_1900_validation
data_quality
Detects filtering logic that targets implausible historical dates (<1900) in clinical event tables.
Dates before 1900 in OMOP clinical tables are almost always placeholders for missing or unparseable source dates, not real clinical events. Filtering for these dates typically surfaces data-quality artefacts rather than clinical signal. Restrict queries to >= 1900-01-01 unless the intent is explicitly a data-quality audit of the placeholder rows.
SELECT *
FROM condition_occurrence
WHERE condition_start_date < '1900-01-01';
SELECT *
FROM condition_occurrence
WHERE condition_start_date >= '1900-01-01';
error
Column Type Validation (SCHEMA Layer)
data_quality.column_type_validation
data_quality
Ensures compatible data types in JOIN conditions and WHERE filters. Detects mismatches such as integer-to-varchar joins or invalid literal comparisons. Type mismatches are schema-level errors that cause query failures or wrong results.
OMOP columns have specific types: *_concept_id and primary-key *_id columns are INTEGER, source_value and name columns are VARCHAR. Joining or comparing across incompatible types (e.g. INTEGER = VARCHAR) forces implicit casting and in some dialects silently returns zero rows or raises a type error. Use compatible columns or CAST explicitly when you need cross-type comparison.
SELECT *
FROM person p
JOIN condition_occurrence co ON p.person_id = co.condition_source_value;
SELECT *
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id;
warning
Concept ID String Comparison
data_quality.concept_id_string_comparison
data_quality
All *_concept_id columns are INTEGER type. Comparing them with string literals requires implicit casting and may lead to incorrect results.
Every *_concept_id column in OMOP is an INTEGER. Comparing it to a string literal forces the database to implicitly cast one side per row. In PostgreSQL this usually works but bypasses the concept_id index; in SQL Server and BigQuery it can raise a conversion error or, worse, silently evaluate to FALSE for every row. Writing the literal as a plain integer keeps index usage intact and the semantics identical across every supported dialect.
SELECT person_id
FROM person
WHERE gender_concept_id = '8532';
SELECT person_id
FROM person
WHERE gender_concept_id = 8532;
warning
Concept Name Whitespace
data_quality.concept_name_whitespace
data_quality
concept_name values may contain trailing whitespace. Exact equality without TRIM may silently fail.
concept.concept_name values are sometimes loaded with trailing whitespace or non-breaking characters from vocabulary source files. Exact equality matching against a trimmed literal can silently fail against those rows. Use TRIM/RTRIM on the column side, or use LIKE with explicit wildcards, so the query is resilient to whitespace artefacts.
SELECT concept_id
FROM concept
WHERE concept_name = 'Type 2 diabetes mellitus';
SELECT concept_id
FROM concept
WHERE TRIM(concept_name) = 'Type 2 diabetes mellitus';
error
Episode Requires Concept Filter
data_quality.episode_requires_concept_filter
data_quality
Episode queries should filter by episode_concept_id to ensure semantic clarity and optimal query performance.
The episode table aggregates care over time (disease episodes, treatment episodes) across many episode types identified by episode_concept_id. Querying episode without a concept filter returns a mixed bag of unrelated episode types, which is rarely the intent. Add an episode_concept_id predicate or join to concept with an explicit filter to select a single episode type.
SELECT person_id
FROM episode;
SELECT person_id
FROM episode
WHERE episode_concept_id = 32528;
warning
Fact Relationship No Self-Reference
data_quality.fact_relationship_no_self_reference
data_quality
Detects patterns where fact_relationship queries filter for self-referential relationships (fact_id_1 = fact_id_2). Self-linking relationships should be rare and may indicate data quality issues or incorrect query logic.
fact_relationship links two facts together; a row where fact_id_1 = fact_id_2 represents a fact related to itself, which is almost never a real clinical relationship. The predicate `WHERE fact_id_1 = fact_id_2` typically comes from a copy-paste error where an inequality was flipped. Use `<>` if you want distinct facts, and add a relationship_concept_id filter so the query picks out one specific relationship type.
SELECT fact_id_1
FROM fact_relationship
WHERE fact_id_1 = fact_id_2;
SELECT fact_id_1, fact_id_2
FROM fact_relationship
WHERE fact_id_1 <> fact_id_2
AND relationship_concept_id = 44818859;
error
Fact Relationship Requires Relationship Concept Filter
data_quality.fact_relationship_requires_relationship_concept_filter
data_quality
Fact relationship queries should filter by relationship_concept_id to ensure semantic clarity and optimal query performance.
fact_relationship encodes every kind of linkage between OMOP facts — visits to procedures, measurements to their ordering visits, parent-child condition episodes, etc. — each identified by relationship_concept_id. Querying fact_relationship without that filter combines unrelated link types into one result set. Always restrict to the specific relationship you need.
SELECT fact_id_1, fact_id_2
FROM fact_relationship;
SELECT fact_id_1, fact_id_2
FROM fact_relationship
WHERE relationship_concept_id = 44818859;
warning
Fact Relationship Valid Concepts
data_quality.fact_relationship_valid_concepts
data_quality
Ensures that when fact_relationship joins to the concept table (for domain_concept_id_1, domain_concept_id_2, or relationship_concept_id), the query filters by invalid_reason to ensure only valid concepts are used.
When fact_relationship is joined to concept (via relationship_concept_id, domain_concept_id_1, or domain_concept_id_2), the concept side can include deprecated concepts unless filtered. That means historical or retired relationship types sneak into the result. Add an invalid_reason IS NULL predicate on each concept alias so deprecated concepts are excluded.
SELECT fr.fact_id_1
FROM fact_relationship fr
JOIN concept c ON fr.relationship_concept_id = c.concept_id;
SELECT fr.fact_id_1
FROM fact_relationship fr
JOIN concept c ON fr.relationship_concept_id = c.concept_id
WHERE c.invalid_reason IS NULL;
error
Free-Text Column Misuse
data_quality.free_text_column_misuse
data_quality
Detects when a free-text VARCHAR column (e.g. condition_occurrence.stop_reason, drug_exposure.lot_number, note_nlp.term_modifiers, location.state, location.zip) is joined to the concept table or compared to a numeric literal. These columns have no concept mapping and no numeric semantics.
Several OMOP CDM tables expose VARCHAR columns that store free-text metadata: condition_occurrence.stop_reason, drug_exposure.lot_number, note_nlp.term_modifiers, location.state, location.zip. These have no mapping into the concept table and no numeric semantics. Joining them to concept (on concept_name, concept_code, or any concept column) yields zero or coincidental matches; comparing them numerically forces an implicit cast that quietly returns no rows. Use these columns only for text filtering/display, and use the row's standard *_concept_id column for concept lookups.
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.stop_reason = c.concept_name;
SELECT co.person_id
FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id;
error
Incorrect Percentile Calculation
data_quality.incorrect_percentile_calculation
data_quality
Detects the copy-paste bug where percentile_25, median, and percentile_75 (or similarly-named percentile columns) all reuse the same threshold value in a CASE WHEN order_nr < threshold * population_size pattern, silently producing identical results across all percentile columns.
Hand-rolled percentile calculations using ROW_NUMBER() + population_size + CASE WHEN are easy to copy-paste and easy to get wrong: the classic bug is percentile_25, median, and percentile_75 all re-using the 0.25 threshold, so three columns report the same number. If you spot three aliased percentile columns sharing a threshold, either set them to 0.25/0.50/0.75 respectively or replace the manual logic with NTILE(4) or PERCENTILE_CONT() from the SQL standard.
SELECT
MAX(CASE WHEN order_nr < 25 * population_size THEN value END) AS percentile_25,
MAX(CASE WHEN order_nr < 25 * population_size THEN value END) AS median,
MAX(CASE WHEN order_nr < 25 * population_size THEN value END) AS percentile_75
FROM (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value) AS order_nr,
COUNT(*) OVER () AS population_size
FROM measurement
) t;
SELECT
MAX(CASE WHEN order_nr < 0.25 * population_size THEN value END) AS percentile_25,
MAX(CASE WHEN order_nr < 0.50 * population_size THEN value END) AS median,
MAX(CASE WHEN order_nr < 0.75 * population_size THEN value END) AS percentile_75
FROM (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value) AS order_nr,
COUNT(*) OVER () AS population_size
FROM measurement
) t;
error
Negative Concept ID Validation
data_quality.negative_concept_id_validation
data_quality
Concept IDs must be non-negative integers (>= 0).
Every concept_id in OMOP is a non-negative integer (>= 0); 0 represents unmapped records, positive values are real concepts. Negative literals never match any concept and are usually the result of either a sign error or a confusion with ROW_NUMBER-style synthetic IDs from another system. Replace the literal with the correct non-negative concept_id.
SELECT *
FROM condition_occurrence
WHERE condition_concept_id = -1;
SELECT *
FROM condition_occurrence
WHERE condition_concept_id = 201820;
warning
Non-Standard Date Literal Format
data_quality.non_standard_date_literal_format
data_quality
Detects date literals in non-standard formats (e.g., '01-jan-2011', '12/31/2020') that may be ambiguous or not portable across databases. ISO 8601 format (YYYY-MM-DD) is recommended for clarity and portability.
Date literals like '01/31/2020', '31-Jan-2020', or '01-jan-2011' are ambiguous: '01/02/2020' means January 2 in the US and February 1 in Europe, and engines disagree about which to pick. ISO 8601 (YYYY-MM-DD) is unambiguous and portable across every supported dialect. Rewrite literals to the ISO form, optionally wrapping in DATE '...' to make the type explicit.
SELECT *
FROM condition_occurrence
WHERE condition_start_date = '01/31/2020';
SELECT *
FROM condition_occurrence
WHERE condition_start_date = DATE '2020-01-31';
warning
Note NLP nlp_date for Temporal Filtering
data_quality.note_nlp_nlp_date_for_temporal_filtering
data_quality
note_nlp.nlp_date is the NLP processing time, not the clinical event date. Use note.note_date for temporal filtering.
note_nlp.nlp_date records when the NLP pipeline *processed* the source note, which can be months or years after the clinical event itself. For temporal cohort logic you need the clinical date, not the processing date — that lives on note.note_date. Join note_nlp back to note and filter on note_date instead.
SELECT note_nlp_id
FROM note_nlp
WHERE nlp_date >= DATE '2023-01-01';
SELECT nnlp.note_nlp_id
FROM note_nlp nnlp
JOIN note n ON nnlp.note_id = n.note_id
WHERE n.note_date >= DATE '2023-01-01';
warning
Note NLP Offset is Character Position
data_quality.note_nlp_offset_is_character_position
data_quality
note_nlp.offset is VARCHAR storing character positions and must not be used in joins or numeric operations without explicit casting.
note_nlp.offset is a VARCHAR column that stores the character position of the NLP-extracted term within the note text. Despite the numeric content, the column type is string — using it in a numeric comparison or JOIN without an explicit CAST leads to implicit-cast errors on strict dialects and silent mis-sorts on lenient ones. CAST to INT when you need numeric semantics.
SELECT *
FROM note_nlp
WHERE offset > 100;
SELECT *
FROM note_nlp
WHERE CAST(offset AS INT) > 100;
error
OMOP Schema Validation
data_quality.schema_validation
data_quality
Validates that all referenced tables and columns exist in OMOP CDM 5.4 schema. Schema violations indicate queries that will fail at runtime or produce incorrect results. Only validates physical table references - excludes CTEs, subqueries, and derived expressions.
Every table and column referenced in the query must exist in the OMOP CDM 5.4 specification. This rule catches typos (e.g. 'cohort_result' instead of 'cohort'), tables from other schemas or vocabulary extensions that aren't part of CDM 5.4, and non-existent columns on otherwise-valid tables. It operates only on physical references; CTEs, subquery aliases, and computed expressions are deliberately ignored so compound queries don't raise false positives.
SELECT person_id, cohort_start_date
FROM cohort_result
WHERE cohort_definition_id = 1;
SELECT condition_occurrence_id, person_id, condition_start_date
FROM condition_occurrence
WHERE condition_concept_id = 201820;
warning
Source Value Field Usage
data_quality.source_value_field_usage
data_quality
Warns when *_source_value fields (unstandardized source codes) are used for GROUP BY or analytical aggregation instead of standard concept fields
*_source_value columns store the raw, unstandardised source strings from the originating system and are meant for audit or provenance. Aggregating or grouping by source_value produces results that aren't comparable across sites (different ETLs map differently) and won't federate in multi-site studies. Aggregate on the paired *_concept_id for portability.
SELECT condition_source_value, COUNT(*) AS n
FROM condition_occurrence
GROUP BY condition_source_value;
SELECT condition_concept_id, COUNT(*) AS n
FROM condition_occurrence
GROUP BY condition_concept_id;
warning
Standard Concept NULL Handling
data_quality.standard_concept_null_handling
data_quality
Ensures correct handling of concept.standard_concept values. Non-standard concepts are represented by NULL, not 'N' or ''. Invalid comparisons may return zero rows.
In OMOP vocabulary, non-standard concepts are represented by standard_concept IS NULL — not 'N', not an empty string, not 'n'. A filter like `standard_concept = 'N'` returns zero rows because no concept actually carries that literal value. To select non-standard concepts, use `IS NULL`; to select standard or classification, use the explicit literals 'S' or 'C'.
SELECT concept_id
FROM concept
WHERE standard_concept = 'N';
SELECT concept_id
FROM concept
WHERE standard_concept IS NULL;
warning
Union Concept ID Domain Indicator
data_quality.union_concept_id_domain_indicator
data_quality
UNION queries combining concept_id values from multiple domains must include a domain indicator column to avoid ambiguity.
UNION-ing concept_id columns from multiple domains into a single output column loses the semantic difference — the downstream consumer cannot tell whether a given concept_id came from the Condition or Drug side. Always add an explicit domain-indicator column so each row's origin is preserved.
SELECT condition_concept_id AS concept_id FROM condition_occurrence
UNION
SELECT drug_concept_id AS concept_id FROM drug_exposure;
SELECT 'Condition' AS domain, condition_concept_id AS concept_id FROM condition_occurrence
UNION ALL
SELECT 'Drug' AS domain, drug_concept_id AS concept_id FROM drug_exposure;
warning
UNION vs UNION ALL for Clinical Events
data_quality.union_vs_union_all_clinical_events
data_quality
Detects UNION (without ALL) when combining clinical event data. UNION removes duplicates, but identical-looking rows may represent distinct clinical events. Use UNION ALL to preserve all events.
UNION de-duplicates rows; UNION ALL preserves them. Two distinct clinical events can share identical values on the selected columns (same person, same date, different visit) — de-duplicating them via UNION silently drops legitimate events and under-counts the cohort. Use UNION ALL for clinical-event combinations; use UNION only when you genuinely want to collapse duplicates and can justify why.
SELECT person_id, condition_start_date AS event_date
FROM condition_occurrence
UNION
SELECT person_id, drug_exposure_start_date
FROM drug_exposure;
SELECT person_id, condition_start_date AS event_date
FROM condition_occurrence
UNION ALL
SELECT person_id, drug_exposure_start_date
FROM drug_exposure;
warning
Unmapped Concept Handling
data_quality.unmapped_concept_handling
data_quality
Warns when filtering clinical tables by specific *_concept_id values without explicitly handling concept_id = 0 (unmapped records)
A longer write-up and example for this rule are still being authored.
error
Vocabulary Table Protection
data_quality.vocabulary_table_protection
data_quality
Prevents DELETE, UPDATE, INSERT, TRUNCATE, MERGE, or DROP TABLE operations on OMOP vocabulary tables. These are reference datasets managed by OHDSI and must remain read-only in analytical workflows.
OMOP vocabulary tables (concept, concept_ancestor, concept_relationship, vocabulary, domain, concept_class) are reference data distributed and versioned by OHDSI. Any local modification breaks reproducibility, invalidates cross-site comparisons, and typically conflicts with the next vocabulary release. Treat them as strictly read-only; vocabulary updates must be performed via official OHDSI releases.
DELETE FROM concept
WHERE concept_id = 0;
SELECT concept_id
FROM concept
WHERE concept_id = 0;
error
CDM v5.3 to v5.4 Column Renames
domain_specific.cdm_v53_to_v54_column_renames
domain_specific
Detects usage of deprecated OMOP CDM v5.3 column names that were renamed in v5.4.
SELECT visit_occurrence_id, admitting_source_concept_id
FROM visit_occurrence;
SELECT visit_occurrence_id, admitted_from_concept_id
FROM visit_occurrence;
error
Cohort Definition Syntax Not Executable SQL
domain_specific.cohort_definition_syntax_not_executable_sql
domain_specific
cohort_definition_syntax stores cohort definition metadata (JSON/OHDSI format), not executable SQL. Do not filter it using SQL keywords or OMOP table names.
SELECT cohort_definition_id FROM cohort_definition
WHERE cohort_definition_syntax LIKE '%SELECT%';
SELECT cohort_definition_id, cohort_definition_syntax
FROM cohort_definition;
warning
Condition Occurrence Cardinality Risk
domain_specific.condition_occurrence_cardinality_validation
domain_specific
Joining person to condition_occurrence without aggregation can produce multiple rows per person. This may lead to incorrect counts if a person has multiple condition records.
SELECT p.person_id, co.condition_concept_id
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id;
SELECT p.person_id, COUNT(DISTINCT co.condition_concept_id) AS n_conditions
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id
GROUP BY p.person_id;
error
Condition Occurrence Visit Hierarchy Validation
domain_specific.condition_visit_hierarchy_validation
domain_specific
When condition_occurrence joins to visit_detail, any reference to visit_occurrence columns requires a proper join to visit_occurrence through visit_detail using visit_occurrence_id.
SELECT co.condition_concept_id, vo.visit_start_date
FROM condition_occurrence co
JOIN visit_detail vd ON co.visit_detail_id = vd.visit_detail_id,
visit_occurrence vo;
SELECT co.condition_concept_id, vo.visit_start_date
FROM condition_occurrence co
JOIN visit_detail vd ON co.visit_detail_id = vd.visit_detail_id
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id;
warning
Cost Currency Concept ID For Multi-Currency
domain_specific.cost_currency_concept_id
domain_specific
Aggregating cost amount columns (total_paid, total_charge, etc.) without filtering or grouping by currency_concept_id mixes records from different currencies, producing incorrect financial totals.
Records in the cost table carry a currency_concept_id indicating which currency their amounts are denominated in. Summing total_paid or total_charge across mixed currencies, without filtering or grouping by currency_concept_id, produces a meaningless total (literally GBP + USD + EUR). Either restrict the query to a single currency, or aggregate per currency so downstream code can convert before rolling the figures up further.
SELECT SUM(total_paid) AS paid_total
FROM cost;
SELECT SUM(total_paid) AS paid_total
FROM cost
WHERE currency_concept_id = 44818668; -- US Dollar
error
Cost Event ID Polymorphic Resolution
domain_specific.cost_event_id_polymorphic_resolution
domain_specific
cost.cost_event_id is a polymorphic FK whose target table is identified by cost.cost_domain_id. Queries that join or filter on cost_event_id without a cost_domain_id restriction mix IDs from disjoint sequences.
cost.cost_event_id has no single foreign-key target; it points into a different table depending on cost.cost_domain_id. With cost_domain_id = 'Drug' it points into drug_exposure; with 'Visit' it points into visit_occurrence; and so on. Joining cost_event_id without filtering cost_domain_id either matches nothing (the value lives in a different sequence than the joined table's PK) or matches by coincidence (two unrelated tables happen to use overlapping integer ranges). Either outcome is a silent bug. The mirror rule for location_history.entity_id requires the same discipline.
SELECT c.cost_id, de.drug_concept_id
FROM cost c
JOIN drug_exposure de ON c.cost_event_id = de.drug_exposure_id;
SELECT c.cost_id, de.drug_concept_id
FROM cost c
JOIN drug_exposure de ON c.cost_event_id = de.drug_exposure_id
WHERE c.cost_domain_id = 'Drug';
warning
Cost Drug-Specific Columns Require Domain Filter
domain_specific.cost_paid_ingredient_cost_drug_specific
domain_specific
cost.paid_ingredient_cost and cost.paid_dispensing_fee are pharmacy-specific columns that are NULL or meaningless for non-drug costs. Filter by cost_domain_id = 'Drug' when using these columns.
SELECT cost_id, paid_ingredient_cost FROM cost;
SELECT cost_id, paid_ingredient_cost FROM cost
WHERE cost_domain_id = 'Drug';
error
Cost Payer Plan Period ID Join
domain_specific.cost_payer_plan_period_id_join
domain_specific
cost.payer_plan_period_id is a FK to payer_plan_period.payer_plan_period_id. Joins must use this column pair.
SELECT c.cost_id FROM cost c
JOIN payer_plan_period p ON c.payer_plan_period_id = p.person_id;
SELECT c.cost_id FROM cost c
JOIN payer_plan_period p ON c.payer_plan_period_id = p.payer_plan_period_id;
error
Death Cause Source Concept Not For Analytical Filtering
domain_specific.death_cause_source_concept_validation
domain_specific
Avoid using death.cause_source_concept_id for analytical filtering. Use death.cause_concept_id instead.
SELECT person_id FROM death
WHERE cause_source_concept_id = 4316491;
SELECT person_id FROM death
WHERE cause_concept_id = 4316491;
error
Death Join to Person Not to Clinical Event
domain_specific.death_join_to_person_not_to_clinical_event
domain_specific
The death table joins to person on person_id and has no foreign keys to clinical event tables. Joins between death and clinical tables must include person_id.
SELECT d.person_id FROM death d
JOIN condition_occurrence co ON d.death_date = co.condition_start_date;
SELECT d.person_id FROM death d
JOIN condition_occurrence co ON d.person_id = co.person_id;
warning
Dose Era Cross-Unit Comparison
domain_specific.dose_era_cross_unit_comparison
domain_specific
Aggregating dose_era.dose_value without constraining unit_concept_id mixes incompatible drug dose units (mg, mcg, IU, mL, …) and produces meaningless averages.
``dose_era.dose_value`` is stored alongside ``unit_concept_id``; the same drug ingredient may have rows in mg, mcg, IU, mL, mEq, etc. across sites and patients. Computing AVG / SUM / MIN / MAX without filtering or grouping by unit_concept_id mixes these into a single number that has no clinical meaning. The same discipline applies as for ``measurement.value_as_number``: pick one unit, or report per-unit aggregates.
SELECT AVG(dose_value) AS avg_dose
FROM dose_era
WHERE drug_concept_id = 1124300;
SELECT AVG(dose_value) AS avg_dose_mg
FROM dose_era
WHERE drug_concept_id = 1124300
AND unit_concept_id = 8576;
warning
Drug Days Supply Validation
domain_specific.drug_days_supply_validation
domain_specific
Validates that drug_exposure.days_supply is in a plausible range (1 to 365 days). Values outside this range indicate data quality issues or query logic errors.
SELECT person_id FROM drug_exposure
WHERE days_supply = 400;
SELECT person_id FROM drug_exposure
WHERE days_supply BETWEEN 1 AND 365;
error
Drug Era Concept Class Validation
domain_specific.drug_era_concept_class_validation
domain_specific
Ensures drug_era is filtered only on Ingredient-level concepts.
SELECT de.person_id FROM drug_era de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.concept_class_id = 'Branded Drug';
SELECT de.person_id FROM drug_era de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.concept_class_id = 'Ingredient';
warning
Drug Exposure Cardinality Awareness
domain_specific.drug_exposure_cardinality_validation
domain_specific
Counting drug_exposure rows may overcount patients due to multiple exposures (e.g., refills, restarts).
SELECT drug_concept_id, COUNT(*) AS patient_count
FROM drug_exposure
GROUP BY drug_concept_id;
SELECT drug_concept_id, COUNT(DISTINCT person_id) AS patient_count
FROM drug_exposure
GROUP BY drug_concept_id;
warning
Drug Exposure Quantity Misuse
domain_specific.drug_exposure_quantity_misuse
domain_specific
Detects use of drug_exposure.quantity as duration in date logic.
SELECT person_id,
drug_exposure_start_date + quantity AS end_date
FROM drug_exposure;
SELECT person_id,
drug_exposure_start_date + days_supply AS end_date
FROM drug_exposure;
warning
Drug Exposure Sig Parsing
domain_specific.drug_exposure_sig_parsing
domain_specific
Detects string parsing of drug_exposure.sig to extract structured dose information. The sig field is free-text and not standardized.
SELECT CAST(SUBSTRING(sig, 1, 3) AS INT) AS dose
FROM drug_exposure;
SELECT de.person_id, ds.amount_value, ds.amount_unit_concept_id
FROM drug_exposure de
JOIN drug_strength ds ON de.drug_concept_id = ds.drug_concept_id
WHERE ds.invalid_reason IS NULL;
warning
Drug Quantity Validation
domain_specific.drug_quantity_validation
domain_specific
Ensures drug_exposure.quantity is non-negative. Negative values indicate data quality or query logic issues.
SELECT person_id FROM drug_exposure
WHERE quantity < 0;
SELECT person_id FROM drug_exposure
WHERE quantity >= 0;
warning
Drug Strength Completeness (Amount vs Concentration)
domain_specific.drug_strength_numerator_denominator_for_concentration
domain_specific
drug_strength stores solid formulations in amount_value and liquid/injectable formulations in numerator_value/denominator_value. Queries using only amount_value exclude concentration-based drugs.
SELECT drug_concept_id, amount_value FROM drug_strength
WHERE amount_value > 500;
SELECT drug_concept_id,
COALESCE(amount_value, numerator_value / denominator_value) AS dose
FROM drug_strength;
warning
Drug Strength Validity Filter
domain_specific.drug_strength_validity_filter
domain_specific
drug_strength is time-versioned. Queries must filter for currently valid records.
SELECT drug_concept_id, amount_value FROM drug_strength;
SELECT drug_concept_id, amount_value FROM drug_strength
WHERE invalid_reason IS NULL
AND CURRENT_DATE BETWEEN valid_start_date AND valid_end_date;
error
Episode Event No Person ID
domain_specific.episode_event_no_person_id
domain_specific
episode_event table has no person_id column. To access person data, join episode_event to episode, then episode to person.
SELECT person_id FROM episode_event;
SELECT e.person_id FROM episode_event ee
JOIN episode e ON ee.episode_id = e.episode_id;
error
Episode Parent ID Self Join
domain_specific.episode_parent_id_self_join
domain_specific
episode.episode_parent_id is a self-referential FK to episode.episode_id. It must only join to episode.episode_id.
SELECT e.episode_id FROM episode e
JOIN episode parent ON e.episode_parent_id = parent.person_id;
SELECT e.episode_id FROM episode e
JOIN episode parent ON e.episode_parent_id = parent.episode_id;
warning
Event Cardinality Risk
domain_specific.event_cardinality_validation
domain_specific
Joining person to observation, or visit_occurrence to visit_detail, without aggregation produces multiple rows per parent record. Silent fan-out distorts downstream counts.
OMOP allows multiple ``observation`` rows per person (a single visit can produce dozens of observations) and multiple ``visit_detail`` rows per ``visit_occurrence`` (different units, transfers, sub-encounters). Plain joins from the parent table to either of these without aggregation produce row-level fan-out: a downstream ``COUNT(*) AS patients`` returns observation rows, not patients; ``COUNT(*) AS visits`` returns visit-detail rows, not visits. Use GROUP BY on the parent key, ``DISTINCT``, or explicit aggregation. Sibling rules already enforce this pattern for ``condition_occurrence``, ``drug_exposure``, and ``measurement``.
SELECT p.person_id, o.observation_concept_id
FROM person p
JOIN observation o ON p.person_id = o.person_id;
SELECT p.person_id, COUNT(DISTINCT o.observation_concept_id) AS n_obs
FROM person p
JOIN observation o ON p.person_id = o.person_id
GROUP BY p.person_id;
error
Event Date Column Correctness
domain_specific.event_date_column_correctness
domain_specific
Certain OMOP clinical event tables (procedure_occurrence, measurement, observation, specimen, note) use simplified date column names without '_start' suffix. Referencing non-existent *_start_date columns will cause errors.
SELECT person_id, procedure_start_date
FROM procedure_occurrence;
SELECT person_id, procedure_date
FROM procedure_occurrence;
error
Event-Field Polymorphic Resolution
domain_specific.event_field_polymorphic_resolution
domain_specific
OMOP v5.4 polymorphic FKs (note.note_event_id, observation.observation_event_id, measurement.measurement_event_id, episode_event.event_id) require their sibling *_event_field_concept_id to be filtered. Without the filter the join mixes IDs from disjoint sequences.
Each ``*_event_id`` column in OMOP v5.4 is a polymorphic foreign key — an INTEGER that points at a row in a different clinical table depending on the value of the sibling ``*_event_field_concept_id`` column. With no filter on the field-concept side, the analyst joins integer IDs from disjoint sequences: a ``measurement_event_id`` of 1234 might happen to equal a ``visit_occurrence_id`` of 1234 in the joined table, but the row doesn't refer to that visit at all. Filter the field-concept column first to identify which target table the event_id addresses, then join. The same discipline is enforced for ``cost.cost_event_id`` and ``location_history.entity_id``.
SELECT m.measurement_id, vo.visit_concept_id
FROM measurement m
JOIN visit_occurrence vo ON m.measurement_event_id = vo.visit_occurrence_id;
SELECT m.measurement_id, vo.visit_concept_id
FROM measurement m
JOIN visit_occurrence vo ON m.measurement_event_id = vo.visit_occurrence_id
WHERE m.meas_event_field_concept_id IS NOT NULL;
error
Location History Entity ID Requires Domain ID
domain_specific.location_history_entity_id_requires_domain_id
domain_specific
location_history.entity_id is a polymorphic FK identified by domain_id. Joins must filter domain_id to match the target table.
SELECT lh.location_history_id FROM location_history lh
JOIN person p ON lh.entity_id = p.person_id;
SELECT lh.location_history_id FROM location_history lh
JOIN person p ON lh.entity_id = p.person_id
WHERE lh.domain_id = 'Person';
warning
Measurement Cross-Unit Comparison
domain_specific.measurement_cross_unit_comparison
domain_specific
Aggregating measurement.value_as_number without constraining unit_concept_id mixes incompatible units and produces meaningless results.
SELECT AVG(m.value_as_number) FROM measurement m
WHERE m.measurement_concept_id = 3004249;
SELECT AVG(m.value_as_number) FROM measurement m
WHERE m.measurement_concept_id = 3004249
AND m.unit_concept_id = 8876;
warning
Measurement Duplicate Detection
domain_specific.measurement_duplicate_detection
domain_specific
Detects aggregation on measurement data without handling duplicates. Duplicate measurement records can exist and affect results.
SELECT AVG(value_as_number) FROM measurement;
SELECT person_id, measurement_date, measurement_concept_id,
AVG(value_as_number) AS avg_value
FROM measurement
GROUP BY person_id, measurement_date, measurement_concept_id;
error
Measurement Operator Concept Validation
domain_specific.measurement_operator_concept_validation
domain_specific
Ensures measurement.operator_concept_id uses only valid operator concepts.
SELECT person_id FROM measurement
WHERE operator_concept_id = 99999;
SELECT person_id FROM measurement
WHERE operator_concept_id IN (4172703, 4172704, 4171756, 4171754, 4171755);
error
Measurement Range Low/High Validation
domain_specific.measurement_range_low_high_validation
domain_specific
Detects logically impossible constraints where range_low > range_high.
SELECT person_id FROM measurement
WHERE range_low > range_high;
SELECT person_id FROM measurement
WHERE range_low <= range_high;
warning
Measurement Unit Validation
domain_specific.measurement_unit_validation
domain_specific
Detects queries that filter measurement.value_as_number against a numeric threshold without also constraining unit_concept_id. The same measurement concept can be stored in different units across sites (e.g. glucose in mmol/L vs mg/dL). A numeric threshold applied without a unit filter silently mixes patients measured in different unit conventions.
SELECT m.person_id FROM measurement m
WHERE m.measurement_concept_id = 3004249
AND m.value_as_number > 140;
SELECT m.person_id FROM measurement m
WHERE m.measurement_concept_id = 3004249
AND m.value_as_number > 140
AND m.unit_concept_id = 8876;
error
Measurement Value Representation Consistency
domain_specific.measurement_value_as_number_and_concept_validation
domain_specific
Detects when value_as_number and value_as_concept_id are both filtered with AND, which may indicate inconsistent use of quantitative and qualitative representations.
SELECT person_id FROM measurement
WHERE value_as_number > 140 AND value_as_concept_id = 4126681;
SELECT person_id FROM measurement
WHERE value_as_number > 140;
warning
Note NLP Snippet Misuse
domain_specific.note_nlp_snippet_misuse
domain_specific
note_nlp.snippet and lexical_variant are free text for context, not structured data. Use note_nlp_concept_id instead.
SELECT note_nlp_id FROM note_nlp
WHERE snippet LIKE '%diabetes%';
SELECT note_nlp_id FROM note_nlp
WHERE note_nlp_concept_id = 201820;
warning
Observation Value As Columns Mutually Contextual
domain_specific.observation_value_as_columns_mutually_contextual
domain_specific
value_as_number, value_as_string, and value_as_concept_id represent the same value in different formats. Only one is typically populated per row.
SELECT person_id FROM observation
WHERE value_as_number > 5 AND value_as_string = 'positive';
SELECT person_id FROM observation
WHERE value_as_number > 5;
error
Observation Value As Concept Confusion
domain_specific.observation_value_as_concept_confusion
domain_specific
Detects when the same concept_id is used for both observation_concept_id and value_as_concept_id within the same logical condition.
SELECT person_id FROM observation
WHERE observation_concept_id = 4083587
AND value_as_concept_id = 4083587;
SELECT person_id FROM observation
WHERE observation_concept_id = 4083587
AND value_as_concept_id = 4188539;
error
Observation Value As String Numeric Comparison
domain_specific.observation_value_as_string_numeric_comparison
domain_specific
Detects numeric comparisons on observation.value_as_string (VARCHAR). This can lead to incorrect results.
SELECT person_id FROM observation
WHERE value_as_string > 100;
SELECT person_id FROM observation
WHERE value_as_number > 100;
error
Person Birth Field Validation
domain_specific.person_birth_field_validation
domain_specific
Ensures person birth fields (year_of_birth, month_of_birth, day_of_birth) use plausible values within accepted ranges.
SELECT person_id FROM person
WHERE year_of_birth = 1800;
SELECT person_id FROM person
WHERE year_of_birth BETWEEN 1900 AND 2024;
warning
Procedure Occurrence Quantity Semantics
domain_specific.procedure_occurrence_quantity_semantics
domain_specific
Ensures procedure_occurrence.quantity is not confused with record counts. quantity represents units per procedure event, not the number of procedure records.
SELECT procedure_concept_id, SUM(quantity) AS procedure_count
FROM procedure_occurrence
GROUP BY procedure_concept_id;
SELECT procedure_concept_id, COUNT(*) AS procedure_count
FROM procedure_occurrence
GROUP BY procedure_concept_id;
error
Specimen Source ID Not Specimen ID
domain_specific.specimen_source_id_not_specimen_id
domain_specific
specimen_source_id is a free-text identifier from the source system, not an OMOP foreign key. Use specimen.specimen_id for joins.
SELECT s.specimen_id FROM specimen s
JOIN measurement m ON s.specimen_source_id = m.measurement_source_value;
SELECT s.specimen_id FROM specimen s
JOIN measurement m ON s.specimen_id = m.specimen_id;
warning
Visit Detail Admitted/Discharged Domain Validation
domain_specific.visit_detail_admitted_discharged_domain
domain_specific
visit_detail.admitted_from_concept_id and discharged_to_concept_id must reference concepts from Visit or Place of Service domains. Hardcoded concept IDs should always be validated using concept.domain_id.
SELECT visit_detail_id FROM visit_detail
WHERE admitted_from_concept_id = 8870;
SELECT vd.visit_detail_id FROM visit_detail vd
JOIN concept c ON vd.admitted_from_concept_id = c.concept_id
WHERE c.domain_id IN ('Visit', 'Place of Service');
warning
Visit Detail Dates Within Parent Visit
domain_specific.visit_detail_dates_within_parent_visit
domain_specific
Detects conditions where visit_detail dates fall outside the parent visit_occurrence range.
SELECT vd.visit_detail_id FROM visit_detail vd
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id
WHERE vd.visit_detail_start_date < vo.visit_start_date;
SELECT vd.visit_detail_id FROM visit_detail vd
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id
WHERE vd.visit_detail_start_date >= vo.visit_start_date
AND vd.visit_detail_end_date <= vo.visit_end_date;
error
Visit Detail Has No Preceding Visit Occurrence ID
domain_specific.visit_detail_has_no_preceding_visit_occurrence_id
domain_specific
visit_detail table has no preceding_visit_occurrence_id column. The temporal chain in visit_detail uses preceding_visit_detail_id. Use preceding_visit_occurrence_id only with visit_occurrence table.
SELECT visit_detail_id, preceding_visit_occurrence_id
FROM visit_detail;
SELECT visit_detail_id, preceding_visit_detail_id
FROM visit_detail;
error
Visit Detail Visit Occurrence Reference
domain_specific.visit_detail_visit_occurrence_reference
domain_specific
Detects potential issues when using visit_detail without proper visit_occurrence context or join.
SELECT visit_detail_id, visit_detail_concept_id
FROM visit_detail;
SELECT vd.visit_detail_id, vd.visit_detail_concept_id, vo.visit_concept_id
FROM visit_detail vd
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id;
warning
Visit Event Temporal Validation
domain_specific.visit_event_temporal_validation
domain_specific
Detects when clinical events are filtered to occur before visit_start_date, which may indicate a join mismatch or temporal inconsistency.
SELECT co.person_id FROM condition_occurrence co
JOIN visit_occurrence vo ON co.visit_occurrence_id = vo.visit_occurrence_id
WHERE co.condition_start_date < vo.visit_start_date;
SELECT co.person_id FROM condition_occurrence co
JOIN visit_occurrence vo ON co.visit_occurrence_id = vo.visit_occurrence_id
WHERE co.condition_start_date >= vo.visit_start_date;
warning
Visit Length-of-Stay Arithmetic
domain_specific.visit_length_of_stay_arithmetic
domain_specific
Computing visit length-of-stay (visit_end_date - visit_start_date or DATEDIFF) without restricting to inpatient-like visit_concept_ids mixes outpatient same-day visits (LOS = 0) into the calculation.
Length-of-stay arithmetic (`visit_end_date - visit_start_date` or `DATEDIFF(day, visit_start_date, visit_end_date)`) is meaningful only for inpatient-like visit types where the end-date can exceed the start-date. Outpatient visits (`visit_concept_id = 9202`) by spec have `visit_end_date = visit_start_date`, so a mixed query averages in a sea of zeros and understates inpatient LOS. Restrict to inpatient concept ids (9201, 9203, 262, 32037, …) before computing. Also note that `visit_end_date` is nullable for ongoing stays — consider `COALESCE(visit_end_date, CURRENT_DATE)` if the query is computing on live data.
SELECT AVG(visit_end_date - visit_start_date) AS avg_los
FROM visit_occurrence;
SELECT AVG(visit_end_date - visit_start_date) AS avg_los
FROM visit_occurrence
WHERE visit_concept_id = 9201
AND visit_end_date IS NOT NULL;
error
Visit Occurrence Type Concept Domain Validation
domain_specific.visit_occurrence_type_domain
domain_specific
visit_type_concept_id must reference a concept belonging to the Type Concept domain. Queries joining visit_type_concept_id to concept must filter by domain_id = 'Type Concept'.
SELECT vo.visit_occurrence_id FROM visit_occurrence vo
JOIN concept c ON vo.visit_type_concept_id = c.concept_id;
SELECT vo.visit_occurrence_id FROM visit_occurrence vo
JOIN concept c ON vo.visit_type_concept_id = c.concept_id
WHERE c.domain_id = 'Type Concept';
warning
Visit Outpatient Same-Day Validation
domain_specific.visit_outpatient_same_day_validation
domain_specific
Detects queries that filter outpatient visits (visit_concept_id = 9202) with multi-day date range logic, which may indicate confusion with inpatient logic.
SELECT person_id FROM visit_occurrence
WHERE visit_concept_id = 9202
AND DATEDIFF(day, visit_start_date, visit_end_date) > 30;
SELECT person_id FROM visit_occurrence
WHERE visit_concept_id = 9202
AND DATEDIFF(day, visit_start_date, visit_end_date) <= 1;
error
Relationship Boolean Comparison
domain_specific.vocabulary_relationship_boolean_comparison
domain_specific
relationship.is_hierarchical and defines_ancestry must be compared with valid boolean values (0,1,TRUE,FALSE), not strings.
SELECT relationship_id FROM relationship
WHERE is_hierarchical = 'yes';
SELECT relationship_id FROM relationship
WHERE is_hierarchical = 1;
warning
Person Year-of-Birth Age Arithmetic
domain_specific.year_of_birth_age_arithmetic
domain_specific
Computing age as `<year_expression> - person.year_of_birth` rounds the result by up to a year and silently ignores birth_datetime / month_of_birth / day_of_birth. Prefer full-date arithmetic when the data carries it.
Many OMOP cohort definitions compute age at index by subtracting `person.year_of_birth` from the event year: `EXTRACT(YEAR FROM co.condition_start_date) - p.year_of_birth`. This drops the month and day entirely, so a person born 1959-12-31 with an event on 2024-01-01 evaluates as 65 even though they are barely 64. For age cutoffs (`>= 65`) this introduces systematic off-by-one errors. The OMOP person table also carries `birth_datetime` (and `month_of_birth` / `day_of_birth`) for sites that captured them; prefer full-date arithmetic when available.
SELECT co.person_id
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id
WHERE EXTRACT(YEAR FROM co.condition_start_date) - p.year_of_birth >= 65;
SELECT co.person_id
FROM person p
JOIN condition_occurrence co ON p.person_id = co.person_id
WHERE FLOOR((co.condition_start_date - p.birth_datetime) / 365.25) >= 65;
error
Care Site ID Join Validation
joins.care_site_id_join_validation
joins
Tables containing care_site_id must join to care_site using care_site_id. Joining on location_id, provider_id, or other columns is incorrect.
SELECT * FROM person p JOIN care_site cs ON p.location_id = cs.care_site_id;
SELECT * FROM person p JOIN care_site cs ON p.care_site_id = cs.care_site_id;
warning
Care Site Join Path Validation
joins.care_site_join_validation
joins
Ensures clinical tables join to location via care_site.
A longer write-up and example for this rule are still being authored.
error
Care Site to Location Join Validation
joins.care_site_location_join_validation
joins
care_site must join to location via location_id. Joining on other columns is incorrect.
SELECT * FROM care_site cs JOIN location l ON cs.care_site_id = l.location_id;
SELECT * FROM care_site cs JOIN location l ON cs.location_id = l.location_id;
error
Clinical Tables Require Person ID Linkage
joins.clinical_person_id_linkage_validation
joins
Clinical tables must be connected via person_id to ensure patient-level correctness. CTEs are treated as validated units and checked separately.
SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co
JOIN drug_exposure de ON co.visit_occurrence_id = de.visit_occurrence_id;
SELECT co.condition_occurrence_id, de.drug_exposure_id
FROM condition_occurrence co
JOIN drug_exposure de ON co.person_id = de.person_id;
error
Clinical Primary Key Join Validation
joins.clinical_pk_cross_join_validation
joins
Ensures clinical event primary keys are not incorrectly used in joins. Each clinical event table has an independent primary key sequence with no semantic relationship to other tables.
SELECT * FROM condition_occurrence co
JOIN drug_exposure de ON co.condition_occurrence_id = de.drug_exposure_id;
SELECT * FROM condition_occurrence co
JOIN drug_exposure de ON co.person_id = de.person_id;
error
Clinical to Visit Detail Join Validation
joins.clinical_visit_detail_join_validation
joins
When joining clinical tables to visit_detail, visit_detail_id should be used. Joining visit_occurrence_id to visit_detail_id (or vice versa) is an ID type mismatch.
SELECT * FROM condition_occurrence co
JOIN visit_detail vd ON co.visit_occurrence_id = vd.visit_detail_id;
SELECT * FROM condition_occurrence co
JOIN visit_detail vd ON co.visit_detail_id = vd.visit_detail_id;
error
Cohort to Clinical Table Join Validation
joins.cohort_clinical_join_validation
joins
Ensures cohort joins to clinical tables using subject_id = person_id, either directly or via person table.
SELECT * FROM cohort c
JOIN condition_occurrence co ON c.subject_id = co.visit_occurrence_id;
SELECT * FROM cohort c
JOIN condition_occurrence co ON c.subject_id = co.person_id;
error
Concept Alias Reuse Validation
joins.concept_alias_reuse_validation
joins
Ensures each concept join uses a distinct alias when joining multiple concept_id columns (primary, source, type). Prevents ambiguous joins, semantic confusion, and incorrect results.
SELECT * FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id
JOIN concept c ON co.condition_source_concept_id = c.concept_id;
SELECT * FROM condition_occurrence co
JOIN concept c1 ON co.condition_concept_id = c1.concept_id
JOIN concept c2 ON co.condition_source_concept_id = c2.concept_id;
error
Concept Ancestor Name Resolution Validation
joins.concept_ancestor_name_resolution
joins
Ensures concept_ancestor joins align with semantic intent (ancestor vs descendant) inferred from column aliases.
A longer write-up and example for this rule are still being authored.
error
Concept to Concept Class Join Validation
joins.concept_concept_class_join_validation
joins
If concept is joined to concept_class, the relationship should use concept.concept_class_id = concept_class.concept_class_id.
SELECT c.concept_id FROM concept c
JOIN concept_class cc ON c.vocabulary_id = cc.concept_class_id;
SELECT c.concept_id FROM concept c
JOIN concept_class cc ON c.concept_class_id = cc.concept_class_id;
error
Concept to Domain Join Validation
joins.concept_domain_join_validation
joins
If concept is joined to domain, the relationship should use concept.domain_id = domain.domain_id.
SELECT c.concept_id FROM concept c
JOIN domain d ON c.vocabulary_id = d.domain_id;
SELECT c.concept_id FROM concept c
JOIN domain d ON c.domain_id = d.domain_id;
error
Concept Join Validation
joins.concept_join_validation
joins
Ensures correct joins to the OMOP concept table. Columns ending with '_concept_id' must join to concept.concept_id. Vocabulary-based joins (e.g., concept_code) should include vocabulary_id.
SELECT co.person_id FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_name;
SELECT co.person_id FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id;
error
Concept Relationship to Concept Join Validation
joins.concept_relationship_concept_join_validation
joins
Ensures concept_relationship joins to concept align with semantic intent (source vs target) inferred from aliases.
SELECT c_source.concept_name
FROM concept_relationship cr
JOIN concept c_source ON cr.concept_id_2 = c_source.concept_id;
SELECT c_source.concept_name
FROM concept_relationship cr
JOIN concept c_source ON cr.concept_id_1 = c_source.concept_id;
error
Concept Relationship to Relationship Join Validation
joins.concept_relationship_relationship_join_validation
joins
When concept_relationship is joined with relationship, it must use relationship_id on both sides.
SELECT * FROM concept_relationship cr
JOIN relationship r ON cr.concept_id_1 = r.relationship_id;
SELECT * FROM concept_relationship cr
JOIN relationship r ON cr.relationship_id = r.relationship_id;
warning
Concept Relationship Requires Relationship ID Filter
joins.concept_relationship_requires_relationship_id
joins
Queries using concept_relationship should typically filter on relationship_id to avoid cross-product joins. Exploratory/analytical queries may intentionally omit this filter to analyze all relationships.
concept_relationship is a many-to-many table: a single source concept typically has dozens of relationships ('Maps to', 'Is a', 'Subsumes', 'Has brand name', …). Joining to concept_relationship without filtering on relationship_id pulls them all, blowing up the row count and mixing semantically different relationships into one result set. For cohort definitions, always restrict to a single relationship type (typically relationship_id = 'Maps to'); exploratory queries should at least GROUP BY it so the direction of each match is visible in the output.
SELECT c1.concept_name, c2.concept_name
FROM concept c1
JOIN concept_relationship cr
ON c1.concept_id = cr.concept_id_1
JOIN concept c2
ON cr.concept_id_2 = c2.concept_id
WHERE c1.concept_id = 201820;
SELECT c1.concept_name, c2.concept_name
FROM concept c1
JOIN concept_relationship cr
ON c1.concept_id = cr.concept_id_1
AND cr.relationship_id = 'Maps to'
JOIN concept c2
ON cr.concept_id_2 = c2.concept_id
WHERE c1.concept_id = 201820;
error
Concept Synonym Join Validation
joins.concept_synonym_join_validation
joins
Ensures concept_synonym joins to concept via concept_id. Joining on names or other columns is unreliable because names are not unique.
SELECT cs.concept_synonym_name
FROM concept c
JOIN concept_synonym cs ON c.concept_name = cs.concept_synonym_name;
SELECT cs.concept_synonym_name
FROM concept c
JOIN concept_synonym cs ON c.concept_id = cs.concept_id;
error
Concept to Vocabulary Join Validation
joins.concept_vocabulary_join_validation
joins
If concept is joined to vocabulary, the join should use vocabulary_id. Other joins may be incorrect or ambiguous.
SELECT c.concept_id FROM concept c
JOIN vocabulary v ON c.domain_id = v.vocabulary_id;
SELECT c.concept_id FROM concept c
JOIN vocabulary v ON c.vocabulary_id = v.vocabulary_id;
warning
Cost Table Domain Validation
joins.cost_table_domain_validation
joins
Ensures cost joins use correct cost_domain_id to disambiguate polymorphic keys.
SELECT * FROM cost c
JOIN drug_exposure de ON c.cost_event_id = de.drug_exposure_id;
SELECT * FROM cost c
JOIN drug_exposure de ON c.cost_event_id = de.drug_exposure_id
WHERE c.cost_domain_id = 'Drug';
error
Death to Visit Occurrence Join Validation
joins.death_visit_occurrence_join_validation
joins
Ensures death joins to visit_occurrence using person_id. Flags missing or invalid joins.
SELECT * FROM death d
JOIN visit_occurrence vo ON d.death_date = vo.visit_start_date;
SELECT * FROM death d
JOIN visit_occurrence vo ON d.person_id = vo.person_id;
error
Drug Exposure to Drug Strength Join Validation
joins.drug_exposure_drug_strength_join_validation
joins
Ensures drug_exposure joins to drug_strength using drug_concept_id. Flags missing or non-standard joins.
SELECT * FROM drug_exposure de
JOIN drug_strength ds ON de.drug_source_concept_id = ds.drug_concept_id;
SELECT * FROM drug_exposure de
JOIN drug_strength ds ON de.drug_concept_id = ds.drug_concept_id;
error
Era Table Forbidden Join Validation
joins.era_forbidden_join_validation
joins
Ensures era tables (condition_era, drug_era, dose_era) are not joined to visit_occurrence, visit_detail, provider, or care_site tables, either directly or through intermediate tables.
SELECT * FROM drug_era de
JOIN visit_occurrence vo ON de.person_id = vo.person_id;
SELECT * FROM drug_era de;
error
Fact Relationship Polymorphic Join Validation
joins.fact_relationship_join_validation
joins
Ensures fact_relationship joins include correct domain_concept_id filtering. fact_id columns are polymorphic and must be disambiguated using domain_concept_id.
SELECT fr.fact_id_1 FROM fact_relationship fr
JOIN visit_occurrence vo ON fr.fact_id_1 = vo.visit_occurrence_id;
SELECT fr.fact_id_1 FROM fact_relationship fr
JOIN visit_occurrence vo ON fr.fact_id_1 = vo.visit_occurrence_id
WHERE fr.domain_concept_id_1 = 8;
warning
Join Path Validation
joins.join_path_validation
joins
Verifies that concept or concept_relationship tables are properly joined to clinical tables using standard concept fields
SELECT co.person_id FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.domain_id;
SELECT co.person_id FROM condition_occurrence co
JOIN concept c ON co.condition_concept_id = c.concept_id;
warning
Left Join Then Where On Right Table
joins.left_join_then_where_on_right_table
joins
Filtering right table columns in WHERE after LEFT JOIN turns it effectively into INNER JOIN.
SELECT p.person_id FROM person p
LEFT JOIN visit_occurrence vo ON p.person_id = vo.person_id
WHERE vo.visit_start_date >= DATE '2023-01-01';
SELECT p.person_id FROM person p
LEFT JOIN visit_occurrence vo
ON p.person_id = vo.person_id
AND vo.visit_start_date >= DATE '2023-01-01';
warning
Maps To Direction
joins.maps_to_direction
joins
Verifies that 'Maps to' relationship is used in the correct direction: concept_id_1 for source, concept_id_2 for standard concept
A longer write-up and example for this rule are still being authored.
error
Note NLP to Note Join Validation
joins.note_nlp_note_join_validation
joins
Ensures note_nlp joins to note using note_id. Flags missing or invalid joins.
SELECT * FROM note_nlp nnlp JOIN note n ON nnlp.note_nlp_id = n.note_id;
SELECT * FROM note_nlp nnlp JOIN note n ON nnlp.note_id = n.note_id;
warning
Observation Period Join Requires Date Overlap
joins.observation_period_join_validation
joins
observation_period joins to clinical tables must include date overlap constraints.
SELECT * FROM condition_occurrence co
JOIN observation_period op ON co.person_id = op.person_id;
SELECT * FROM condition_occurrence co
JOIN observation_period op
ON co.person_id = op.person_id
AND co.condition_start_date BETWEEN op.observation_period_start_date
AND op.observation_period_end_date;
warning
Payer Plan Period Join Validation
joins.payer_plan_period_join_validation
joins
Ensures payer_plan_period joins to clinical tables include proper date overlap conditions, not just person_id.
SELECT * FROM condition_occurrence co
JOIN payer_plan_period ppp ON co.person_id = ppp.person_id;
SELECT * FROM condition_occurrence co
JOIN payer_plan_period ppp
ON co.person_id = ppp.person_id
AND co.condition_start_date BETWEEN ppp.payer_plan_period_start_date
AND ppp.payer_plan_period_end_date;
error
Person ID Join Validation
joins.person_id_join_validation
joins
Ensures person_id columns only join to other person_id columns. Prevents incorrect joins to unrelated primary keys such as visit_occurrence_id or condition_occurrence_id.
SELECT * FROM person p
JOIN visit_occurrence vo ON p.person_id = vo.visit_occurrence_id;
SELECT * FROM person p
JOIN visit_occurrence vo ON p.person_id = vo.person_id;
error
Person to Location Join Validation
joins.person_location_join_validation
joins
person must join to location via location_id to get patient address. Joining on person_id, person_source_value, or other columns is incorrect.
SELECT * FROM person p JOIN location l ON p.person_id = l.location_id;
SELECT * FROM person p JOIN location l ON p.location_id = l.location_id;
error
Preceding Visit Occurrence Validation
joins.preceding_visit_occurrence_validation
joins
preceding_visit_occurrence_id must reference visit_occurrence.visit_occurrence_id via a proper self-join with person_id and temporal constraints.
SELECT vo.visit_occurrence_id FROM visit_occurrence vo
JOIN visit_occurrence prev ON vo.preceding_visit_occurrence_id = prev.person_id;
SELECT vo.visit_occurrence_id, vo.preceding_visit_occurrence_id
FROM visit_occurrence vo
JOIN visit_occurrence prev
ON vo.preceding_visit_occurrence_id = prev.visit_occurrence_id
AND vo.person_id = prev.person_id
AND prev.visit_end_date <= vo.visit_start_date;
error
Provider to Care Site Join Validation
joins.provider_care_site_join_validation
joins
provider must join to care_site via care_site_id to get practice location. Joining on provider_id, specialty_concept_id, or other columns is incorrect.
SELECT * FROM provider p JOIN care_site cs ON p.provider_id = cs.care_site_id;
SELECT * FROM provider p JOIN care_site cs ON p.care_site_id = cs.care_site_id;
error
Provider Join Validation
joins.provider_join_validation
joins
Clinical event tables must join to the provider table via provider_id. Joining on person_id, care_site_id, or other columns is incorrect.
SELECT * FROM condition_occurrence co
JOIN provider p ON co.person_id = p.provider_id;
SELECT * FROM condition_occurrence co
JOIN provider p ON co.provider_id = p.provider_id;
warning
Visit Detail Join Validation
joins.visit_detail_join_validation
joins
Ensures visit_detail joins to visit_occurrence using visit_occurrence_id. Joining only on person_id can produce incorrect results.
SELECT * FROM visit_detail vd
JOIN visit_occurrence vo ON vd.person_id = vo.person_id;
SELECT * FROM visit_detail vd
JOIN visit_occurrence vo ON vd.visit_occurrence_id = vo.visit_occurrence_id;
error
Visit Occurrence ID Join Validation
joins.visit_occurrence_id_join_validation
joins
Ensures visit_occurrence_id columns only join to other visit_occurrence_id columns. Prevents incorrect joins to unrelated identifiers such as person_id or condition_occurrence_id.
SELECT * FROM visit_occurrence vo
JOIN condition_occurrence co ON vo.visit_occurrence_id = co.condition_occurrence_id;
SELECT * FROM visit_occurrence vo
JOIN condition_occurrence co ON vo.visit_occurrence_id = co.visit_occurrence_id;
warning
Visit Occurrence INNER JOIN Validation
joins.visit_occurrence_inner_join_validation
joins
Detects INNER JOINs to visit_occurrence that may exclude events with NULL visit_occurrence_id.
SELECT co.person_id FROM condition_occurrence co
JOIN visit_occurrence vo ON co.visit_occurrence_id = vo.visit_occurrence_id;
SELECT co.person_id FROM condition_occurrence co
LEFT JOIN visit_occurrence vo ON co.visit_occurrence_id = vo.visit_occurrence_id;
warning
Clinical Event Date Should Not Be In Future
temporal.clinical_event_date_in_future_validation
temporal
Detects filtering logic that implies clinical event dates occur in the future. Future event dates may indicate data quality issues or incorrect query logic.
Queries that filter condition_start_date > CURRENT_DATE (or the same inequality against any clinical event date column) describe events that have not happened yet. In production OMOP data this is almost always a data-quality audit (trying to find future-dated events to clean up) and not a real cohort selection, so the rule warns rather than errors. If you genuinely need to audit, document that intent in a comment; if the predicate is accidentally reversed, flip the operator.
SELECT * FROM condition_occurrence
WHERE condition_start_date > CURRENT_DATE;
SELECT * FROM condition_occurrence
WHERE condition_start_date <= CURRENT_DATE;
warning
Datetime BETWEEN with Date Literal
temporal.datetime_between_date_literal
temporal
BETWEEN on datetime columns using date-only literals causes data loss because the end date excludes non-midnight times.
BETWEEN '2023-01-01' AND '2023-12-31' on a *_datetime column quietly drops every record from 2023-12-31 after 00:00:00, because the date literal is cast to midnight on that day. The problem is silent: the query returns a superset of 2023 data minus the last day's non-midnight measurements, which rarely shows up in spot-checks. Use half-open intervals (>= '2023-01-01' AND < '2024-01-01'), cast both sides to timestamps, or switch to the *_date column when sub-day precision is not needed.
SELECT *
FROM measurement
WHERE measurement_datetime BETWEEN DATE '2023-01-01' AND DATE '2023-12-31';
SELECT *
FROM measurement
WHERE measurement_datetime >= TIMESTAMP '2023-01-01 00:00:00'
AND measurement_datetime < TIMESTAMP '2024-01-01 00:00:00';
error
Death Date Before Birth Validation
temporal.death_date_before_birth_validation
temporal
Detects impossible temporal conditions where death occurs before birth.
A predicate that allows death_date < birth_datetime to evaluate TRUE encodes an impossibility, no patient can die before they are born. When this condition appears in a WHERE or JOIN clause, it is almost always a column-swap bug: the tables on either side of the comparison have been reversed, or start/end columns from a template have been mixed up. The rule flags the impossible comparison so the author can restore the intended temporal ordering before the query silently returns zero rows.
SELECT p.person_id
FROM person p
JOIN death d ON p.person_id = d.person_id
WHERE d.death_date < p.birth_datetime;
SELECT p.person_id
FROM person p
JOIN death d ON p.person_id = d.person_id
WHERE d.death_date > p.birth_datetime;
warning
Death Date In Future Validation
temporal.death_date_in_future_validation
temporal
Detects queries filtering for future death dates, indicating data or logic issues.
Filtering for death_date > CURRENT_DATE describes patients who die in the future, which is not possible. Like the clinical-event-in-future rule, this is almost always either a data-quality audit (looking for impossible deaths to clean up) or an accidentally reversed comparison. The severity is WARNING rather than ERROR to allow legitimate audit queries.
SELECT * FROM death
WHERE death_date > CURRENT_DATE;
SELECT * FROM death
WHERE death_date <= CURRENT_DATE;
error
End Before Start Validation
temporal.end_before_start_validation
temporal
Detects impossible constraints where start_date > end_date
Filtering for *_start_date > *_end_date (or the equivalent *_end_date < *_start_date) describes an impossible time range. In OMOP tables like visit_occurrence and drug_exposure every record has start <= end by specification, so a predicate that returns only records where start > end selects zero rows. Useful as a data-quality probe, but usually the operator was reversed by accident.
SELECT *
FROM visit_occurrence
WHERE visit_start_date > visit_end_date;
SELECT *
FROM visit_occurrence
WHERE visit_start_date <= visit_end_date;
warning
Unbounded Follow-up Window (Future Information Leakage)
temporal.future_information_leakage
temporal
Detects queries that compare dates across different clinical event tables (e.g. condition_start_date > drug_exposure_start_date) without bounding the later event against observation_period_end_date. The later event can fall outside the patient's observed follow-up window, introducing immortal-time bias and similar follow-up-window errors. Suppressed when observation_period is not joined at all — the observation_period_anchoring rule covers that case with a coherent fix.
When a query compares event dates across two clinical tables (e.g. 'condition started before first drug exposure'), the future-facing event must be bounded against observation_period_end_date. Without that bound, the comparison reaches into events that occurred AFTER the person's observation ended, silently pulling data that should not be in scope. The leakage is subtle because the join still returns rows; they are just rows representing out-of-bounds data. Add an explicit observation_period bound on the future-facing side.
SELECT co.person_id
FROM condition_occurrence co
JOIN drug_exposure de ON co.person_id = de.person_id
WHERE co.condition_start_date < de.drug_exposure_start_date;
SELECT co.person_id
FROM condition_occurrence co
JOIN drug_exposure de ON co.person_id = de.person_id
JOIN observation_period op ON co.person_id = op.person_id
WHERE co.condition_start_date < de.drug_exposure_start_date
AND de.drug_exposure_start_date <= op.observation_period_end_date;
warning
Nullable End Date NULL Handling
temporal.nullable_end_date_null_handling
temporal
Ensures nullable end_date columns are properly handled when used in functions, arithmetic, or comparisons to avoid NULL propagation issues.
Several OMOP end-date columns are nullable (drug_exposure_end_date, condition_end_date, device_exposure_end_date). Passing them directly into DATEDIFF, arithmetic, or a comparison propagates NULL through the expression, which then evaluates to NULL in WHERE and silently excludes the row. Wrap the nullable column in COALESCE with a sensible fallback (often the paired start_date), or filter IS NOT NULL explicitly, so the row-exclusion is deliberate rather than accidental.
SELECT person_id,
DATEDIFF(day, drug_exposure_start_date, drug_exposure_end_date) AS days_supply
FROM drug_exposure;
SELECT person_id,
DATEDIFF(day, drug_exposure_start_date,
COALESCE(drug_exposure_end_date, drug_exposure_start_date)) AS days_supply
FROM drug_exposure;
warning
Observation Period Anchoring
temporal.observation_period_anchoring
temporal
When observation_period is included in a query, ensures it is properly joined on person_id to clinical tables. Does not flag queries that simply don't use observation_period (valid design choice for descriptive queries).
When a query touches observation_period alongside a clinical table, OMOP semantics require the two to be linked by person_id, otherwise the join produces every (person, observation_period) combination, mixing one patient's observation window with another's clinical events. The rule fires when observation_period appears in the FROM list without a person_id equality linking it to the clinical side. If you only need observation_period's date columns without tying them to a specific patient, that is usually a query-design smell; restructure to make the person_id join explicit.
SELECT co.person_id
FROM condition_occurrence co, observation_period op
WHERE co.condition_start_date BETWEEN op.observation_period_start_date
AND op.observation_period_end_date;
SELECT co.person_id
FROM condition_occurrence co
JOIN observation_period op ON co.person_id = op.person_id
WHERE co.condition_start_date BETWEEN op.observation_period_start_date
AND op.observation_period_end_date;
error
Observation Period Date Range Logic
temporal.observation_period_date_range_logic
temporal
Ensures clinical event dates are tested within observation_period bounds. Detects reversed logic where observation_period dates are incorrectly used as values.
The conventional OMOP invariant is event_date BETWEEN observation_period_start_date AND observation_period_end_date: the event is the value being tested, the observation window is the range. Reversed forms like observation_period_start_date BETWEEN event_start_date AND event_end_date produce rows only when the observation window happens to fall inside a single event, which is almost never what was intended. It is a classic copy-paste bug from swapping the BETWEEN operands.
SELECT co.person_id
FROM condition_occurrence co
JOIN observation_period op ON co.person_id = op.person_id
WHERE op.observation_period_start_date
BETWEEN co.condition_start_date AND co.condition_end_date;
SELECT co.person_id
FROM condition_occurrence co
JOIN observation_period op ON co.person_id = op.person_id
WHERE co.condition_start_date
BETWEEN op.observation_period_start_date AND op.observation_period_end_date;
warning
Required Date Column Validation
temporal.required_date_column_validation
temporal
Temporal queries on clinical tables should use required (NOT NULL) date columns instead of nullable columns to avoid silently excluding records.
Each OMOP clinical table has at least one required (NOT NULL) date column and one or more optional end-date columns. Filtering on a nullable column (e.g. drug_exposure_end_date instead of drug_exposure_start_date) silently excludes every row where that column happens to be null, often a sizeable fraction of the data. Prefer the required column for the primary temporal filter; reach for the nullable ones only when their meaning is specifically what you need (duration calculations, end-of-period cohorts).
SELECT person_id
FROM drug_exposure
WHERE drug_exposure_end_date >= DATE '2023-01-01';
SELECT person_id
FROM drug_exposure
WHERE drug_exposure_start_date >= DATE '2023-01-01';