Updated: Mar 30
Azure Databricks lets you work with several languages like Python and SQL, but if you are migrating a project from a different system, you would probably want to continue with the same language. Recently I am working on a migration project from SQL server to Azure Databricks, and while migrating stored procedures to Databricks notebooks, I learned some stuff that I want to share with you.
Databricks SQL is a different dialect than SQL server TSQL, and although most of the commands work well without alteration, in some cases you need to make modifications.
I will list here the issues that I came across. I would love it if you can find more and help me broaden this list. Just leave a comment below.
Temp tables with select
select * into #temp from table
CREATE or replace TEMPORARY VIEW temp_table as select * from table
IIF checks of a logical condition, in Databricks its called IFF, the rest of the syntax is the same
To get the first 10 rows only
select top (10) * from table
select * from table limit 10
format lets you display dates and numeric values as specific formats.
for example a date like that: 28/03/2023.
This is an interesting case, because there is an actual difference between the two dialects.
ISNULL(value) in Databricks returns True\False if the value is null, where in SQL server this function return the second parameter if the value is null i.e. - ISNULL(value, -99). To get the same functionality in Databricks, use coalesce instead.
select 'Chen' + ' ' + 'Hirsh'
select 'Chen' || ' ' || 'Hirsh'
In SQL server you can use this syntax
select column1 as 'col1'
In Databricks you would get an error. Just loose the Apostrophes
I left the annoying think for last. In SQL server, whether your code need to be case sensitive depends on you collation. I haven't seen many database that work with case sensitive collation, though.
In Databricks, your value are case sensitive, meaning that there is a difference between
and one of them will not return data.
Another difference, thanks to @ben Hazan: when counting rows in a table with a condition:
This will work in both SQL and Databricks:
SELECT count(*) FROM users_dirty WHERE email IS NULL
But Databricks have another sytax:
SELECT count_if(email IS NULL) FROM users_dirty
I will try to add more differences if an when I find them, please feel free to suggest in the comments. And if you read all this post about syntax, you should really pat yourself on the back, and maybe get a hobby :-) .