Inconsistencies between system catalog and DocDB schema during DDL operations
Product | Affected Versions | Related Issues | Fixed In |
---|---|---|---|
YSQL | v2.14.0.0, v2.16.0.0, v2.18.0.0, v2.20.0.0 | #22802, #16712 | v2024.1 |
Description
When DDL operations are executed, the database writes metadata to both the PostgreSQL system catalog and DocDB schema. If a DDL fails, in certain specific scenarios, the PostgreSQL system catalog changes are rolled back but the DocDB schema changes are not, causing inconsistency between the PostgreSQL system catalog and DocDB schema.
Issues arising from inconsistency between the PostgreSQL system catalog and DocDB schema may include the following:
-
Inconsistent backup
When ALTER TABLE operations cause inconsistency between the PostgreSQL system catalog and DocDB schema, subsequent backups can't be restored. DML operations on the table are unaffected. Restoring backups taken after the failed DDL fail with the error "YSQL table not found: <table_name>_temp_old".
-
Failure to add column
Failed ALTER TABLE ADD COLUMN operations roll back changes in the PostgreSQL system catalog but do not roll back the DocDB schema. This can result in subsequent ALTER TABLE ADD COLUMN operations with the same column name failing with the error "The column already exists".
-
Orphan tables
Failed DDL operations can cause tables to be present in DocDB but not in the PostgreSQL system catalog. These tables are not accessible using the YSQL API but do not interfere with any future operations. Affected tables are listed in the YugabyteDB Anywhere UI, and can only be deleted manually.
Mitigation
Detection
You can use the following script (available for Python 2 and 3) to check for potential issues. Copy the script to the YB-Master leader node and execute it locally on the same node.
-
Sign in to the YB-Master leader node and use one of the following commands to download the appropriate version of the script.
Python 3:
wget https://raw.githubusercontent.com/yugabyte/yb-tools/main/ddl_atomicity/ddl_atomicity_check_script_python3.py
Python 2:
wget https://raw.githubusercontent.com/yugabyte/yb-tools/main/ddl_atomicity/ddl_atomicity_check_script_python2.py
-
Execute the script using one of the following commands.
Python 3:
python3 ddl_atomicity_check_script_python3.py
Python 2:
python2 ddl_atomicity_check_script_python2.py
The script takes the following optional arguments:
Argument Description -h, --help Command line help. --ysqlsh_path YSQLSH_PATH Path to ysqlsh executable. --master_conf_path MASTER_CONF_PATH Path to master configuration file. --master_interface_address MASTER_INTERFACE_ADDRESS Master UI interface IP. If not provided, it will be read from master_conf_path. --master_interface_port MASTER_INTERFACE_PORT Port for the master UI interface. --ysql_host YSQL_HOST Host for ysqlsh. --master_leader_only Check if the node is the master leader and exit if it is not. --curl_path CURL_PATH Path to curl executable. --grep_path GREP_PATH Path to grep executable. --awk_path AWK_PATH Path to awk executable.
The script returns one of the following messages when it detects an issue:
-
Inconsistent backup
Table <Table_name> with oid <Table_OID> and uuid <Table_UUID> exists in <database_name> but has a mismatched table name - TABLE NAME NEEDS TO BE FIXED
-
Failure to add column
Column <colunm_name> does not exist in table <Table_name> in database <database_name> - ORPHANED COLUMN NEEDS TO BE DROPPED
-
Orphan tables
Table <Table_name> with oid <Table_OID> and uuid <Table_UUID> does not exist in database <database_name> - ORPHANED TABLE NEEDS TO BE DROPPED
Issue mitigaton
Inconsistent backup
Option 1: Take a new backup using the following steps:
-
Rename the table for which restore is failing as follows:
ALTER TABLE <table_name> RENAME TO <table_name>_temp_old; ALTER TABLE <table_name>_temp_old RENAME TO <table_name>;
The first RENAME is required to align the PostgreSQL system catalog entries with the DocDB schema. The second RENAME renames the table to the original intended name. This step must be repeated for each table for which the issue has been reported.
-
Take a new backup. Restores of subsequent backups should be successful.
Option 2: Modify an already taken backup if an existing backup requires a fix as follows:
-
Modify the schema dump taken by ysql_dump by changing all references to
<table_name>
with<table_name>_temp_old
for all the tables for which failed restore was previously reported. -
After restoring the backup, rename the table to its intended name.
ALTER TABLE <table_name>_temp_old RENAME TO <table_name>;
Contact Yugabyte Support if you need additional help fixing the backup schema file.
Failure to Add Column
Contact Yugabyte Support to resolve the issue.
Orphan tables
Contact Yugabyte Support to resolve the issue.
Details
ALTER TABLE operations such as adding primary key (ALTER TABLE ADD PRIMARY KEY), or changing column type (ALTER TABLE ALTER COLUMN TYPE) require rewriting the table. To do this, ALTER TABLE operations rename the table to <table_name>_temp_old
, create a new table, and then drop the old one.
If this operation fails after the rename, the PostgreSQL system catalog will be rolled back; however, the associated DocDB table will have the name <table_name>_temp_old
.
If a backup is taken on a database in this state, when that backup is restored, the PostgreSQL system catalog refers to the table using the original name, while the DocDB schema refers to the table using the name <table_name>_temp_old
, causing inconsistency and restore operations to fail with the error "YSQL table not found: <table_name>_temp_old".
DML operations on the table continue normally as DML operations refer to the table by OID, not by name.
When ADD COLUMN operations fail, the DocDB schema for newly added columns is not rolled back, so subsequent ADD COLUMN operations fail with the error message "The column already exists".