SQLCell 2.0: Redesigning SQLCell for JupyterLab
I've been meaning to revisit <a href="https://github.com/tmthyjames/SQLCell">SQLCell</a> for some time. Besides the codebase being a product of my early days of learning to program, and contributions being cumbersome because of the poor design of the API, the real force behind the refactor was that I've been using JupyterLab a lot lately and I still hate most SQL interfaces. I like the freedom of being able to use Jupyter as a pseudo-SQL app—there's not as much functionality as a fully fledged SQL interface, but for most of my needs I just need to write queries; I don't need the same level of utility as a DBA would.
Therefore, I'm refactoring SQLCell completely to work with both Jupyter Notebook and Jupyter Lab. The major differences in this release are:
1) No UI buttons or graphs (yet)
2) The API is a lot cleaner
Back in the day of Jupyter Notebooks, you could publish a notebook that injected malicious Javascript onto a user's computer. Now, <a href="https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/JavaScript%20Notebook%20Extensions.html">that is no longer (publicly) possible</a>. I say publicly because there's not a public Javascript API for Lab like there is for Notebook. This comes with its benefits and costs, as is reflected in many GitHub conversations (<a href="https://github.com/jupyterlab/jupyterlab/issues/3118">here</a>, <a href="https://github.com/jupyterlab/jupyterlab/issues/3748">here</a>, and <a href="https://github.com/jupyterlab/jupyterlab/pull/2595">here</a>). With that said, I am still looking into building graphs and UI features into a future version. And hopefully this release will attract more contributors, as the API is much cleaner than version 1's.
Here are the features I'm starting with:
- [Engine Persistence](#Engine-Persistence)
- [Python Variables as Query Parameters](#Python-Variables-as-Query-Parameters)
- [Engine Aliasing](#Engine-Aliasing)
- [Defining and Persisting Hooks](#Defining-and-Persisting-Hooks)
- [Write SQL Output to Session Variable](#Write-SQL-Output-to-Session-Variable)
- [Background Queries](#Background-Queries)
To install SQLCell, use `pip`:
```
pip install sqlcell
```

Then load it in your Notebook/Lab by using the `%load_ext` command:
```
%load_ext sqlcell.sqlcell
```

## Engine Persistence
The first time you use `%%sql`, you'll need to pass your connection string as an `--engine` argument.
```
%%sql --engine=mssql+pymssql://SomeUserName:SomePassword@SomeHostName/tdobbins
SELECT TOP 2 N1.*
FROM NashvilleHousingMSSQL N1
JOIN NashvilleHousingMSSQL N2 ON N2.ParcelID = N1.ParcelID
ORDER BY Address DESC
```

After that, SQLCell will create a few aliases so you can easily refer to the engine you want to run against. Now you can pass the entire connection string, the hostname, or the database name, and `%%sql` will use the same engine for all three. Note: if you have two databases with the same name, consider just using the hostname.
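To make that behavior concrete, here's a minimal sketch of how connection strings can be keyed by host and database name with SQLAlchemy's URL parser. This is my own illustration, not SQLCell's actual code, and it also shows why two databases with the same name would collide on the database-name key:

```python
# Illustrative only -- not SQLCell's actual implementation.
# A minimal sketch of aliasing an engine by its full URL, hostname, and database name.
from sqlalchemy import create_engine
from sqlalchemy.engine.url import make_url

ENGINES = {}  # alias -> engine

def register_engine(conn_string):
    """Create an engine and make it reachable by full URL, hostname, and db name."""
    url = make_url(conn_string)
    engine = create_engine(conn_string)
    for alias in (conn_string, url.host, url.database):
        if alias:  # two dbs with the same name would overwrite each other here
            ENGINES[alias] = engine
    return engine

def lookup_engine(alias):
    """Return the engine registered under a full URL, hostname, or db name."""
    return ENGINES[alias]
```

With that in mind, passing just the hostname works as the engine reference: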
```
%%sql --engine=SomeHostName
SELECT TOP 2 * FROM NashvilleHousingMSSQL
```

`%%sql` will also remember the last used engine so you don't have to refer to it every time you run a query.
```
%%sql
SELECT TOP 2 * FROM NashvilleHousingMSSQL
```

## Python Variables as Query Parameters
If you want to pass Python variables to your query, you can use the colon (`foo = :foo`) syntax (similar to SQLAlchemy's usage).
```
foo = 'NASHVILLE'
bar = 132000
baz = ('SINGLE FAMILY', 'RESIDENTIAL CONDO')
```

```
%%sql --engine=SomeHostName
select * from NashvilleHousingMSSQL
where PropertyCity = :foo and SalePrice like :bar and LandUse in :baz
```

And the same with a Postgres dialect:
```
foo = 'NASHVILLE'
bar = 132000
baz = ('SINGLE FAMILY', 'RESIDENTIAL CONDO')
```

```
%%sql --engine=PG
select n2.* from "NashvilleHousingPostgres" n1
JOIN "NashvilleHousingPostgres" n2 on n2."ParcelID" = n1."ParcelID"
where n2."PropertyCity" = :foo and n2."SalePrice" = :bar and n2."LandUse" in :baz
```
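For anyone curious what that colon syntax roughly corresponds to outside of SQLCell, here's a sketch of the plain-SQLAlchemy equivalent of the Postgres query above. The connection URL is just a placeholder, and the `expanding` bind parameter is how SQLAlchemy handles the tuple for the `IN` clause:

```python
# A rough plain-SQLAlchemy equivalent of the parameterized query above.
# The connection URL is a placeholder; adjust it for your own database.
import pandas as pd
from sqlalchemy import create_engine, text, bindparam

engine = create_engine("postgresql://username:@localhost/db")

query = text(
    'select n2.* from "NashvilleHousingPostgres" n1 '
    'JOIN "NashvilleHousingPostgres" n2 on n2."ParcelID" = n1."ParcelID" '
    'where n2."PropertyCity" = :foo and n2."SalePrice" = :bar and n2."LandUse" in :baz'
).bindparams(bindparam("baz", expanding=True))  # expand the tuple for the IN clause

params = {
    "foo": "NASHVILLE",
    "bar": 132000,
    "baz": ["SINGLE FAMILY", "RESIDENTIAL CONDO"],
}

with engine.connect() as conn:
    result = conn.execute(query, params)
    df = pd.DataFrame(result.fetchall(), columns=result.keys())
```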
## Engine Aliasing

Similar to <a href="https://github.com/tmthyjames/SQLCell#declare-engines">version 1</a>, you can also alias engine strings. Unlike version 1, there are no UI buttons (yet). Using the following syntax:
```
%%sql --engines
PG=postgresql://username:@localhost/db
MSS=mssql+pymssql://SomeUserName:SomePassword@180.36.3.76/tdobbins
LC=mssql+pymssql://username:password@180.36.2.90/DB1
```

we can now use `%%sql` with an alias as our `--engine` parameter:
```
%%sql --engine=MSS
select top 2 * from NashvilleHousingMSSQL
```

To view all your engines, use the `list` argument:
```
%%sql --engines
list
```

To clear your engines, use the `--refresh` command like so:
```
%%sql --refresh
engines
```

## Defining and Persisting Hooks
One thing I tried to do with version 1 was to mimic `psql` commands (since I was using Postgres exclusively), but doing that forced limitations on users who used other DBs. With this version, I'm removing those limitations by allowing users to register their own often-run shortcut queries like so:
```
%%sql --hook
tables MSS = USE {0};SELECT * FROM sys.Tables;
columns_mss MSS = EXEC sp_columns {0};
columns_db1 DB1 = EXEC sp_columns {0};
views mssql+pymssql://username:password@hostname/DB = USE {0};SELECT * FROM sys.Views;
tables_db1 DB1 = USE {0};SELECT * FROM sys.Tables;
views_l mssql+pymssql://:@localhost/DB = USE {0};SELECT * FROM sys.Views;
databases LC = SELECT name FROM master.sys.databases;
```

The syntax for defining hooks is:

> `<name> <engine|alias> = <sql command>`

Take this example:

> `tables MSS = USE {0};SELECT * FROM sys.Tables;`

where

- `tables` is our command shortcut,
- `MSS` is an engine alias we have already defined with `--engines`, and
- `USE {0};SELECT * FROM sys.Tables;` is the command we want to run when we issue our shortcut.

To use your newly-defined shortcuts, prepend `~` to your shortcut name:
```
%%sql
~columns_mss NashvilleHousingMSSQL
```

Since we defined `columns_mss` as having an argument, `{0}`, we can pass a table name to it.
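Here's a minimal sketch of how a hook definition like that can be parsed and its `{0}` placeholder filled in when the shortcut is invoked. It's my own illustration of the idea, not SQLCell's actual implementation:

```python
# Illustrative only -- not SQLCell's actual code. A minimal sketch of parsing a
# hook definition of the form "<name> <engine|alias> = <sql command>" and
# filling in its {0}-style placeholders when the shortcut is invoked.
HOOKS = {}  # name -> (engine_or_alias, sql_template)

def register_hook(line):
    """Parse one hook definition, e.g. 'columns_mss MSS = EXEC sp_columns {0};'."""
    left, sql_template = line.split("=", 1)
    name, engine_or_alias = left.split()
    HOOKS[name] = (engine_or_alias, sql_template.strip())

def expand_hook(invocation):
    """Turn '~columns_mss NashvilleHousingMSSQL' into its engine alias and final SQL."""
    name, *args = invocation.lstrip("~").split()
    engine_or_alias, sql_template = HOOKS[name]
    return engine_or_alias, sql_template.format(*args)

register_hook("columns_mss MSS = EXEC sp_columns {0};")
print(expand_hook("~columns_mss NashvilleHousingMSSQL"))
# ('MSS', 'EXEC sp_columns NashvilleHousingMSSQL;')
```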
To view your hooks, use `list`:
```
%%sql --hook
list
```

And to clear your hook list, just run `--refresh` with `hooks` as the cell content, like we did for engines:
```
%%sql --refresh
hooks
```

## Write SQL Output to Session Variable
To write the output of your query to a Python variable, use the `--var` argument.
```
%%sql --engine=LC --var=df
select 1 as some_lc_column
```

```
df
```

## Background Queries
To run queries in the background, use the `--background` flag:
```
%%sql --engine=LC --var=another_df --background
select 1 as some_lc_column
```

Notice here that `another_df` is undefined since the query is still running:
```
another_df
```

As soon as the query is done, you can call `another_df` successfully.

```
another_df
```
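For a rough idea of how this kind of backgrounding can work, here's a minimal sketch that runs the query on a worker thread and binds the result into the caller's namespace when it finishes. This is my own illustration, not SQLCell's actual implementation, and the connection string is a placeholder:

```python
# Illustrative only -- not SQLCell's actual implementation. A minimal sketch of
# running a query on a worker thread and binding the result to a variable name
# in the caller's namespace once it finishes.
import threading

import pandas as pd
from sqlalchemy import create_engine, text

def run_in_background(conn_string, query, namespace, var_name):
    """Execute `query` in a background thread; store the result as `namespace[var_name]`."""
    def worker():
        engine = create_engine(conn_string)  # placeholder connection string
        with engine.connect() as conn:
            result = conn.execute(text(query))
            namespace[var_name] = pd.DataFrame(result.fetchall(), columns=result.keys())

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

# In an IPython/Jupyter session the namespace would be something like
# get_ipython().user_ns; until the thread finishes, var_name simply isn't defined there.
```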
I'm working on a way to notify the user when a query finishes in the background. I'm also working on adding UI buttons using ipywidgets. Enjoy!