Databricks — Mastering Delta Tables: Unleashing the Power of Column Names with Spaces

A Step-by-Step Guide to Creating Delta Tables with Spaces in Column Names

Kuharan Bhowmik
3 min readJun 4, 2023

--

Photo by Tushar Rathour on Unsplash

Navigating the Spaces —

Delta tables are a powerful data storage format that provides reliability, performance, and advanced features for big data processing. However, one common challenge is working with column names that contain spaces. In this blog post, we will explore how to create Delta tables with spaces in column names and perform operations on them.

Step 1: Creating a Table with No Space in Column Names

To begin, let’s create a Delta table without spaces in column names. We’ll assume that you have a database and a table called “tmp” that you want to work with. First, let’s drop the table if it already exists to ensure a clean start:

sql(f"DROP TABLE IF EXISTS tmp")

Next, we can create the table with two columns, “col1” and “col2”, using the Delta format:

sql(f"""
CREATE OR REPLACE TABLE tmp (
col1 STRING,
col2 STRING
)
USING DELTA
LOCATION '/tables/tmp'
""")

By executing this code, you will have a Delta table named “tmp” with the specified columns.

Step 2: Upgrading the protocol version

Now, let’s say you want to rename the column “col1” to “col 1” to include a space. According to the SQL specification, column names must not contain spaces or any special characters other than underscore (_), dollar sign ($), or a combination of letters, digits, underscores, and dollar signs.

Unfortunately, directly renaming the column using the ALTER TABLE statement will not work in this case. Databricks will throw an error at this step. However, there is a workaround.

To achieve column renaming with spaces, you need to change the protocol version to 2.5. Execute the following code:

%sql
ALTER TABLE delta.`/tables/tmp`
SET tblproperties (
'delta.columnMapping.mode' = 'name',
'delta.minReaderVersion' = '2',
'delta.minWriterVersion' = '5'
)

By setting these Delta table properties, you enable the renaming of columns with spaces.

Step 3: Renaming Columns with Spaces

After executing this code, you can proceed to rename the column as desired:

%sql
ALTER TABLE delta.`/tables/tmp`
RENAME COLUMN col1 TO `col 1`

Step 4: Selecting Columns with Spaces

When querying Delta tables with column names containing spaces, you need to use the “``” notation to select the columns. Here’s an example of selecting all columns from the “tmp” table:

%sql
SELECT `col 1`, col2
FROM delta.`/tables/tmp`

Make sure to use the “``” notation around the column name with a space. This ensures that the query correctly identifies and retrieves the desired column.

Conquering Delta Tables with Column Names having spaces —

Creating Delta tables with spaces in column names requires a few additional steps compared to tables with regular column names. By adjusting the Delta table’s protocol version and using the “``” notation in queries, you can work seamlessly with columns that have spaces.

To access the complete code and example in a Databricks notebook, you can find it in the following GitHub repository:

Find the data bricks notebook here —

For more information on protocol versioning :

https://docs.delta.io/latest/versioning.html

Proceed with Caution: Potential Pitfalls and Warnings —

Warning: This method is new and some features are still in experimental mode and moving to production can lead to data loss. Make sure you carry out proper testing because protocol versioning is irreversible! Also upgrading protocol versioning may lead to a few undesired results like the below:

Enabling column mapping also enables random file prefixes, which removes the ability to explore data using Hive-style partitioning. See Do Delta Lake and Parquet share partitioning strategies?

Enabling column mapping on tables might break downstream operations that rely on Delta change data feed. See Change data feed limitations for tables with column mapping enabled.

Enabling column mapping on tables might break streaming read from the Delta table as a source, including in Delta Live Tables. See the Table streaming reads and writes

More here — https://learn.microsoft.com/en-us/azure/databricks/delta/delta-column-mapping

Happy data bricks-ing!

--

--