Skip to content
Venki Notes
Go back

Troubleshooting Data Dictionary Issues in MySQL and Aurora

After years of working as a Database Architect and handling escalations around MySQL environments on AWS, I noticed one topic kept coming back: Data Dictionary inconsistencies. Teams would panic, escalate, and often the fix was something they could have handled themselves — if they understood what was actually happening under the hood.

This post is my attempt to save you that escalation. Let’s break it down.

Table of Contents

Open Table of Contents

What Is the Data Dictionary?

Every relational database engine maintains a Data Dictionary — a collection of metadata that tracks objects like tables, indexes, columns, and constraints. Think of it as the database’s internal map of itself.

Here’s where it gets interesting: the implementation changed significantly between MySQL versions.

MySQL 5.7 (and Aurora MySQL v2, RDS MySQL 5.7)

MySQL 5.7 maintains two separate dictionaries that must stay in sync:

┌─────────────────────────────────────────────────┐
│              MySQL Server Layer                  │
│                                                 │
│   Filesystem-based Dictionary                   │
│   ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐            │
│   │.FRM │ │.TRN │ │.TRG │ │.OPT │            │
│   └─────┘ └─────┘ └─────┘ └─────┘            │
│                                                 │
├─────────────────────────────────────────────────┤
│              InnoDB Engine Layer                 │
│                                                 │
│   InnoDB Internal Dictionary                    │
│   (inside system tablespace)                    │
│                                                 │
└─────────────────────────────────────────────────┘

Both must agree for a table to be accessible.

MySQL 8.0 (and Aurora MySQL v3, RDS MySQL 8.0)

MySQL 8.0 introduced a single, unified, transactional data dictionary stored entirely in InnoDB tables. This was a major architectural improvement:

┌─────────────────────────────────────────────────┐
│         MySQL 8.0 Unified Dictionary            │
│                                                 │
│   ┌─────────────────────────────────────────┐   │
│   │  Single InnoDB-based Data Dictionary    │   │
│   │  • Atomic DDL support                   │   │
│   │  • Crash-safe schema changes            │   │
│   │  • No more .FRM files                   │   │
│   └─────────────────────────────────────────┘   │
│                                                 │
└─────────────────────────────────────────────────┘

DDL operations are now atomic — they either fully
succeed or fully roll back.

Why this matters: On MySQL 8.0 / Aurora v3, data dictionary inconsistencies are significantly less common because DDL operations are atomic. They still can happen, but the window is much narrower.

Why Does the Data Dictionary Become Inconsistent?

Let’s be precise with language here. We say “inconsistency” rather than “corruption.” True data corruption is rare and involves damaged data pages. Dictionary inconsistency means the metadata layers disagree about what exists — and that’s a design consequence, not a storage failure.

The Root Cause (MySQL 5.7 / Aurora v2)

Here’s the fundamental problem:

MySQL 5.7 does not guarantee ACID properties for schema changes. If something goes wrong mid-DDL, you can end up with one dictionary knowing about a table and the other not.

Note on Aurora MySQL v2: While Aurora’s distributed storage layer provides superior crash safety for data, the MySQL-layer dictionary operations still follow the same non-transactional DDL behavior as community MySQL 5.7.

The Root Cause (MySQL 8.0 / Aurora v3)

MySQL 8.0 introduced Atomic DDL, which means DDL statements are now crash-safe — they either complete fully or roll back. This dramatically reduces inconsistency risk, but doesn’t eliminate it entirely (edge cases around foreign_key_checks still apply).

Common Symptoms

When your data dictionary is inconsistent, you’ll typically see:

User-reported symptoms:

Error log messages:

ERROR 1146 (42S02): Table 'mydb.orders' doesn't exist

ERROR 1050 (42S01): Table 'mydb.orders' already exists

ERROR 1051 (42S02): Unknown table 'mydb.orders'
[Warning] InnoDB: Cannot open table mydb/orders from the internal
data dictionary of InnoDB though the .frm file for the table exists.

[ERROR] InnoDB: Trying to load index GEN_CLUST_INDEX for table
mydb.orders, but the index tree has been freed!

The paradox of a table both “existing” and “not existing” is the hallmark of a dictionary inconsistency — one layer sees it, the other doesn’t.

What Triggers These Inconsistencies?

1. Foreign Key Checks Disabled During DDL

This is the most common cause I’ve seen in production.

-- Developer does this to speed up migrations:
SET foreign_key_checks = 0;

-- Then runs DDL that changes table structure:
ALTER TABLE orders DROP INDEX idx_customer_id;
-- This index was required by a foreign key!

SET foreign_key_checks = 1;
-- Now BOTH tables (parent and child) "disappear"

The MySQL documentation explicitly warns about this:

“With foreign_key_checks=0, dropping an index required by a foreign key constraint places the table in an inconsistent state and causes the foreign key check that occurs at table load to fail.”

When MySQL detects the FK relationship is broken, it refuses to load both tables involved in the relationship.

This applies to both RDS MySQL and Aurora MySQL, all versions.

2. Crash During DDL Operations

If MySQL crashes (OOM, storage full, hardware failure, or even a MySQL bug) while a DDL is in progress:

VersionRisk LevelWhy
MySQL 5.7 / Aurora v2HighDDL is non-transactional. Crash = partial state.
MySQL 8.0 / Aurora v3LowAtomic DDL rolls back incomplete operations.

On MySQL 5.7, a crash mid-DDL can leave behind:

Aurora-specific note: Aurora’s storage layer is inherently crash-safe for data writes, which means storage-level issues causing mid-DDL crashes are rarer than on RDS MySQL or self-hosted. However, MySQL-layer crashes (OOM, bugs) can still trigger the same dictionary inconsistencies.

Diagnosis and Recovery Flowchart

┌─────────────────────────────────────┐
│   Table inaccessible / missing?     │
└──────────────┬──────────────────────┘


┌─────────────────────────────────────┐
│  Does the table use Foreign Keys?   │
└──────┬───────────────────────┬──────┘
       │ YES                   │ NO
       ▼                       ▼
┌──────────────────┐   ┌──────────────────────────┐
│ SET SESSION      │   │ Was there a recent crash  │
│ foreign_key_     │   │ or DDL operation?         │
│ checks = 0;     │   └────────┬─────────────────┘
│                  │            │
│ Can you access   │            ▼
│ the table now?   │   ┌──────────────────────────┐
└───┬──────────┬───┘   │ Consider PITR to before   │
    │YES       │NO     │ the DDL/crash event       │
    ▼          ▼       └──────────────────────────┘
┌────────┐  ┌──────────────────────────┐
│ Fix FK │  │ Check error logs for      │
│ defs & │  │ orphaned tablespace or    │
│indexes │  │ .frm mismatch messages    │
└────────┘  └──────────────────────────┘

Step-by-Step Recovery

Step 1: Check Foreign Keys First

This resolves the majority of cases I’ve seen:

-- In a session, disable FK checks
SET SESSION foreign_key_checks = 0;

-- Try to access the "missing" table
SELECT COUNT(*) FROM mydb.orders;

If the table comes back: The issue is a broken FK relationship. Fix it by:

  1. Reviewing the MySQL Foreign Key documentation — specifically the “Conditions and Restrictions” section
  2. Ensuring both parent and child tables have the required indexes
  3. Ensuring column definitions match exactly between FK columns

Prevention: Never run DDL statements that modify indexes or columns involved in foreign key relationships while foreign_key_checks = 0.

Step 2: Assess the Situation

Before attempting any recovery, answer these questions:

QuestionWhy It Matters
Is this a production database?Determines urgency and risk tolerance
Do you have recent backups/snapshots?Determines recovery options
When did the issue start?Helps identify the triggering event
What DDL ran before the issue?Points to root cause

Step 3: Protect Your Recovery Options

Do this immediately, regardless of which recovery path you choose:

# Extend backup retention to maximum (35 days)
# This preserves your PITR window

aws rds modify-db-instance \
  --db-instance-identifier your-instance \
  --backup-retention-period 35 \
  --apply-immediately

# For Aurora clusters:
aws rds modify-db-cluster \
  --db-cluster-identifier your-cluster \
  --backup-retention-period 35 \
  --apply-immediately

Step 4: Choose Your Recovery Path

Option A — Point-in-Time Recovery (Best option when available)

Restore to a point just before the problematic DDL:

# RDS MySQL
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier your-instance \
  --target-db-instance-identifier your-instance-recovered \
  --restore-time "2026-05-18T10:00:00Z"

# Aurora MySQL
aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier your-cluster \
  --db-cluster-identifier your-cluster-recovered \
  --restore-to-time "2026-05-18T10:00:00Z"

Option B — Restore from Snapshot

If you have a snapshot from before the issue:

# Create a new instance from the good snapshot
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier your-instance-recovered \
  --db-snapshot-identifier your-good-snapshot

Option C — Manual Recovery (Advanced, MySQL 5.7 only)

For orphaned tablespace issues on MySQL 5.7, you may be able to:

  1. Identify the orphaned .ibd file
  2. Use ALTER TABLE ... DISCARD TABLESPACE and IMPORT TABLESPACE to reattach

This is risky and should only be attempted on non-production instances or copies.

Prevention Checklist

PracticeApplies To
Never run structural DDL with foreign_key_checks = 0All versions
Monitor disk space — avoid storage-full during DDLRDS MySQL especially
Set appropriate innodb_flush_log_at_trx_commit = 1RDS MySQL 5.7
Set sync_binlog = 1 if using binlogRDS MySQL 5.7
Upgrade to MySQL 8.0 / Aurora v3 for Atomic DDLAll
Test DDL operations in non-production firstAll
Maintain adequate backup retentionAll
Monitor error logs for early warning signsAll

RDS MySQL vs Aurora MySQL: Key Differences

AspectRDS MySQLAurora MySQL
Storage crash safetyEBS-based, relies on MySQL crash recoveryDistributed storage with built-in crash safety
DDL crash riskHigher — storage-level issues more likelyLower — storage layer is more resilient
sync_binlog importanceCritical for durabilityLess critical — Aurora handles durability at storage layer
Dictionary inconsistency riskStandard MySQL risk profileSlightly lower due to storage architecture
Recovery optionsPITR, snapshots, manual tablespace recoveryPITR, snapshots, backtrack (Aurora-specific)
Atomic DDL (8.0)SupportedSupported (Aurora v3)

Key Takeaways

  1. Data dictionary inconsistency ≠ data corruption. Your data is likely fine — the metadata just disagrees with itself.

  2. Foreign keys are the #1 culprit. Always check foreign_key_checks = 0 as your first diagnostic step.

  3. MySQL 8.0 / Aurora v3 dramatically reduces risk through Atomic DDL. If you’re still on 5.7, this is one more reason to upgrade.

  4. Protect your PITR window immediately when you discover an issue. Extend backup retention before doing anything else.

  5. These issues stem from MySQL’s architecture, not from the managed platform. Understanding the “why” helps you prevent recurrence.


Special thanks to Valter for contributing ideas and review to this article.


Share this post on:

Previous Post
Why Is My PostgreSQL Major Version Upgrade So Slow?
Next Post
Hello, I'm Venki — and welcome to my notes