UPDATE Statement (Impala 2.8 or higher only)
Updates an arbitrary number of rows in a Kudu table. This statement only works for Impala tables that use the Kudu storage engine.
Syntax:
UPDATE [database_name.]table_name SET col = val [, col = val ... ]
[ FROM joined_table_refs ]
[ WHERE where_conditions ]
Usage notes:
None of the columns that make up the primary key can be updated by the SET
clause.
The conditions in the WHERE
clause are the same ones allowed for the SELECT
statement. See SELECT Statement for details.
If the WHERE
clause is omitted, all rows in the table are updated.
The conditions in the WHERE
clause can refer to any combination of primary key columns or other columns. Referring to primary key columns in the WHERE
clause is more efficient than referring to non-primary key columns.
Because Kudu currently does not enforce strong consistency during concurrent DML operations, be aware that the results after this statement finishes might be different than you intuitively expect:
If some rows cannot be updated because their some primary key columns are not found, due to their being deleted by a concurrent
DELETE
operation, the statement succeeds but returns a warning.An
UPDATE
statement might also overlap withINSERT
,UPDATE
, orUPSERT
statements running concurrently on the same table. After the statement finishes, there might be more or fewer matching rows than expected in the table because it is undefined whether theUPDATE
applies to rows that are inserted or updated while theUPDATE
is in progress.
The number of affected rows is reported in an impala-shell message and in the query profile.
The optional FROM
clause lets you restrict the updates to only the rows in the specified table that are part of the result set for a join query. The join clauses can include non-Kudu tables, but the table from which the rows are deleted must be a Kudu table.
Statement type: DML
Important: After adding or replacing data in a table used in performance-critical queries, issue a COMPUTE STATS
statement to make sure all statistics are up-to-date. Consider updating statistics for a table after any INSERT
, LOAD DATA
, or CREATE TABLE AS SELECT
statement in Impala, or after loading data through Hive and doing a REFRESH table_name
in Impala. This technique is especially important for tables that are very large, used in join queries, or both.
Examples:
The following examples show how to perform a simple update on a table, with or without a WHERE
clause:
-- Set all rows to the same value for column c3.
-- In this case, c1 and c2 are primary key columns
-- and so cannot be updated.
UPDATE kudu_table SET c3 = 'not applicable';
-- Update only the rows that match the condition.
UPDATE kudu_table SET c3 = NULL WHERE c1 > 100 AND c3 IS NULL;
-- Does not update any rows, because the WHERE condition is always false.
UPDATE kudu_table SET c3 = 'impossible' WHERE 1 = 0;
-- Change the values of multiple columns in a single UPDATE statement.
UPDATE kudu_table SET c3 = upper(c3), c4 = FALSE, c5 = 0 WHERE c6 = TRUE;
The following examples show how to perform an update using the FROM
keyword with a join clause:
-- Uppercase a column value, only for rows that have
-- an ID that matches the value from another table.
UPDATE kudu_table SET c3 = upper(c3)
FROM kudu_table JOIN non_kudu_table
ON kudu_table.id = non_kudu_table.id;
-- Same effect as previous statement.
-- Assign table aliases in FROM clause, then refer to
-- short names elsewhere in the statement.
UPDATE t1 SET c3 = upper(c3)
FROM kudu_table t1 JOIN non_kudu_table t2
ON t1.id = t2.id;
-- Same effect as previous statements, but more efficient.
-- Use WHERE clause to skip updating values that are
-- already uppercase.
UPDATE t1 SET c3 = upper(c3)
FROM kudu_table t1 JOIN non_kudu_table t2
ON t1.id = t2.id
WHERE c3 != upper(c3);
Related information:
Using Impala to Query Kudu Tables, INSERT Statement, DELETE Statement (Impala 2.8 or higher only), UPSERT Statement (Impala 2.8 or higher only)
Parent topic: Impala SQL Statements