SQLCell 2.0: Redesigning SQLCell for JupyterLab
I've been meaning to revisit <a href="https://github.com/tmthyjames/SQLCell">SQLCell</a> for some time. Besides the codebase being a product of my early days of learning how to program, and contributions being cumbersome due to the poor design of the API, the real force behind the refactor was that I've been using Jupyter Lab a lot lately and I still hate most SQL interfaces. I like the freedom of being able to use Jupyter as a pseudo-SQL app—there's not as much functionality as a fully fledged SQL interface, but for most of my needs I just need to write queries; I don't need the same level of utility as a DBA would.
Therefore, I'm refactoring SQLCell completely to work with both Jupyter Notebook and Jupyter Lab. The major differences in this release are:
1) No UI buttons or graphs (yet)
2) The API is a lot cleaner
Back in the day of Jupyter Notebooks, you could publish a notebook that injected malicious JavaScript onto a user's computer. Now, <a href="https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/JavaScript%20Notebook%20Extensions.html">that is no longer (publicly) possible</a>. I say publicly because there's no public JavaScript API for Lab like there is for Notebook. This comes with its benefits and costs, as is reflected in many GitHub conversations (<a href="https://github.com/jupyterlab/jupyterlab/issues/3118">here</a> and <a href="https://github.com/jupyterlab/jupyterlab/issues/3748">here</a> and <a href="https://github.com/jupyterlab/jupyterlab/pull/2595">here</a>). That said, I'm still looking into building graphs and UI features into a future version. Hopefully this release will attract more contributors, as the API is much cleaner than version 1's.
Here are the features I'm starting with:
>### [Engine Persistence](#Engine-Persistence)
>### [Python Variables as Query Parameters](#Python-Variables-as-Query-Parameters)
>### [Engine Aliasing](#Engine-Aliasing)
>### [Defining and Persisting Hooks](#Defining-and-Persisting-Hooks)
>### [Write SQL Output to Session Variable](#Write-SQL-Output-to-Session-Variable)
>### [Background Queries](#Background-Queries)
To install SQLCell use `pip`:
```
pip install sqlcell
```
Then load it in your Notebook/Lab by using the `%load_ext` command:
```
%load_ext sqlcell.sqlcell
```
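If you'd rather not run `%load_ext` by hand in every session, IPython's standard configuration file can auto-load extensions at startup. This is plain IPython machinery, not something SQLCell provides, so treat it as an optional convenience:

```python
# In ~/.ipython/profile_default/ipython_config.py (standard IPython config,
# not SQLCell-specific): load the extension automatically at startup.
c = get_config()
c.InteractiveShellApp.extensions = ['sqlcell.sqlcell']
```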
## Engine Persistence
The first time you use `%%sql`, you'll need to pass your connection string as an `--engine` argument.
```
%%sql --engine=mssql+pymssql://SomeUserName:SomePassword@SomeHostName/tdobbins
SELECT TOP 2 N1.*
FROM NashvilleHousingMSSQL N1
JOIN NashvilleHousingMSSQL N2 ON N2.ParcelID = N1.ParcelID
ORDER BY Address DESC
```
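The `--engine` value is a SQLAlchemy-style connection URL of the form `dialect+driver://username:password@host:port/database`, so swap in the dialect, driver, and credentials for your own database (I'm inferring the format from the examples here rather than from an exhaustive compatibility list).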
After that, SQLCell creates a few aliases so you can easily refer to the engine you want to run against. Now you can pass the entire connection string, the hostname, or the database name, and `%%sql` will use the same engine for all three. Note: if you have two databases with the same name, consider just using the hostname.
```
%%sql --engine=SomeHostName
SELECT TOP 2 *
FROM NashvilleHousingMSSQL
```
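Passing the database name works the same way (per the note above; `tdobbins` here is just the database from the connection string in the first example, so substitute your own):

```
%%sql --engine=tdobbins
SELECT TOP 2 *
FROM NashvilleHousingMSSQL
```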
`%%sql` will also remember the last used engine so you don't have to refer to it every time you run a query.
```
%%sql
SELECT TOP 2 *
FROM NashvilleHousingMSSQL
```
## Python Variables as Query Parameters
If you want to pass Python variables to your query, you can use the colon (`foo = :foo`) syntax (similar to SQLAlchemy's usage).
```
foo = 'NASHVILLE'
bar = 132000
baz = ('SINGLE FAMILY', 'RESIDENTIAL CONDO')
```

```
%%sql --engine=SomeHostName
select *
from NashvilleHousingMSSQL
where PropertyCity = :foo
and SalePrice like :bar
and LandUse in :baz
```
And the same with a Postgres dialect:
```
foo = 'NASHVILLE'
bar = 132000
baz = ('SINGLE FAMILY', 'RESIDENTIAL CONDO')
```

```
%%sql --engine=PG
select n2.*
from "NashvilleHousingPostgres" n1
JOIN "NashvilleHousingPostgres" n2 on n2."ParcelID" = n1."ParcelID"
where n2."PropertyCity" = :foo
and n2."SalePrice" = :bar
and n2."LandUse" in :baz
```
## Engine Aliasing
Similar to <a href="https://github.com/tmthyjames/SQLCell#declare-engines">version 1</a>, you can also alias engine strings. Unlike version 1, there are no UI buttons (yet). Using the following syntax:
```
%%sql --engines
PG=postgresql://username:@localhost/db
MSS=mssql+pymssql://SomeUserName:SomePassword@180.36.3.76/tdobbins
LC=mssql+pymssql://username:password@180.36.2.90/DB1
```
we can now use `%%sql` with an alias as our `--engine` parameter:
```
%%sql --engine=MSS
select top 2 * from NashvilleHousingMSSQL
```
To view all your engines, use the `list` argument:
```
%%sql --engines
list
```
To clear your engines, use the `--refresh` argument with `engines` as the cell content:
```
%%sql --refresh
engines
```
## Defining and Persisting Hooks
One thing I tried to do with version 1 was mimic `psql` commands (since I was using Postgres exclusively), but doing that forced limitations on users of other databases. With this version, I'm removing those limitations by allowing users to register their own often-run shortcut queries like so:
```
%%sql --hook
tables MSS = USE {0};
SELECT * FROM sys.Tables;
columns_mss MSS = EXEC sp_columns {0};
columns_db1 DB1 = EXEC sp_columns {0};
views mssql+pymssql://username:password@hostname/DB = USE {0};
SELECT * FROM sys.Views;
tables_db1 DB1 = USE {0};
SELECT * FROM sys.Tables;
views_l mssql+pymssql://:@localhost/DB = USE {0};
SELECT * FROM sys.Views;
databases LC = SELECT name FROM master.sys.databases;
```
The syntax for defining hooks is:

> ```
> <name> <engine|alias> = <sql command>
> ```

Take this example:

> ```
> tables MSS = USE {0};
> SELECT * FROM sys.Tables;
> ```

where `tables` is our command shortcut, `MSS` is an engine alias we have already defined with `--engines`, and

```
USE {0};
SELECT * FROM sys.Tables;
```

is the command we want to run when we issue our shortcut.

To use your newly-defined shortcuts, prepend `~` to your shortcut name:
```
%%sql
~columns_mss NashvilleHousingMSSQL
```
Since we defined `columns_mss` as having an argument, `{0}`, we can pass a table name to it.
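The same pattern should apply to any hook with a `{0}` placeholder; for example, assuming arguments are passed the same way, the `tables` hook defined above would take a database name (`tdobbins` here is just the database from the earlier connection string):

```
%%sql
~tables tdobbins
```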
To view your hooks, use `list`:
```
%%sql --hook
list
```
And to clear your hook list, just run `--refresh` with `hooks` as the cell content, like we did for engines:
```
%%sql --refresh
hooks
```
## Write SQL Output to Session Variable
To write the output of your query to a Python variable, use the `--var` argument.
```
%%sql --engine=LC --var=df
select 1 as some_lc_column
```

```
df
```
## Background Queries
To run queries in the background, use the `--background` flag:
```
%%sql --engine=LC --var=another_df --background
select 1 as some_lc_column
```
Notice here that `another_df` is undefined since the query is still running:
```
another_df
```
As soon as the query is done, you can call `another_df` successfully.
```
another_df
```
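Until a built-in notification exists, a crude way to block until the background query lands is to poll the notebook namespace for the variable. This is just ordinary Python, not part of SQLCell's API:

```python
import time

# Poll until the background query has written `another_df` into the session.
while 'another_df' not in globals():
    time.sleep(1)

another_df  # safe to reference once the loop exits
```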
I'm working on a way to notify the user when a query finishes in the background. I'm also working on adding UI buttons using ipywidgets. Enjoy!