SQLCell 2.0: Redesigning SQLCell for JupyterLab
I've been meaning to revisit <a href="https://github.com/tmthyjames/SQLCell">SQLCell</a> for some time. Besides the codebase being a product of my early days of learning to program, and contributions being cumbersome because of the poor design of the API, the real force behind the refactor was that I've been using JupyterLab a lot lately and I still hate most SQL interfaces. I like the freedom of being able to use Jupyter as a pseudo-SQL app—there's not as much functionality as a fully fledged SQL interface, but for most of my needs I just need to write queries; I don't need the same level of utility as a DBA would.
Therefore, I'm refactoring SQLCell completely to work with both Jupyter Notebook and Jupyter Lab. The major differences in this release are:
1) No UI buttons or graphs (yet)
2) The API is a lot cleaner
Back in the day of Jupyter Notebooks, you could publish a notebook that injected malicious Javascript onto a user's computer. Now, <a href="https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/JavaScript%20Notebook%20Extensions.html">that is no longer (publicly) possible</a>. I say publicly because there's not a public Javascript API for Lab like there is for Notebook. This comes with its benefits and costs, as is reflected in many GitHub conversations (<a href="https://github.com/jupyterlab/jupyterlab/issues/3118">here</a>, <a href="https://github.com/jupyterlab/jupyterlab/issues/3748">here</a>, and <a href="https://github.com/jupyterlab/jupyterlab/pull/2595">here</a>). With that said, I am still looking into building graphs and UI features into a future version. And hopefully this release will attract more contributors, as the API is much cleaner than version 1's.
Here are the features I'm starting with:
- [Engine Persistence](#Engine-Persistence)
- [Python Variables as Query Parameters](#Python-Variables-as-Query-Parameters)
- [Engine Aliasing](#Engine-Aliasing)
- [Defining and Persisting Hooks](#Defining-and-Persisting-Hooks)
- [Write SQL Output to Session Variable](#Write-SQL-Output-to-Session-Variable)
- [Background Queries](#Background-Queries)
To install SQLCell, use `pip`:
```
pip install sqlcell
```

Then load it in your Notebook/Lab by using the `%load_ext` command:
```
%load_ext sqlcell.sqlcell
```

## Engine Persistence
The first time you use `%%sql`, you'll need to pass your connection string as an `--engine` argument.
```
%%sql --engine=mssql+pymssql://SomeUserName:SomePassword@SomeHostName/tdobbins
SELECT TOP 2 N1.*
FROM NashvilleHousingMSSQL N1
JOIN NashvilleHousingMSSQL N2 ON N2.ParcelID = N1.ParcelID
ORDER BY Address DESC
```

After that, SQLCell will create a few aliases so you can easily refer to the engine you want to run against. Now you can pass the entire connection string, the hostname, or the database name, and `%%sql` will use the same engine for all three. Note: if you have two databases with the same name, consider just using the hostname.
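To make that behavior concrete, here's a minimal sketch of how connection strings can be keyed by host and database name with SQLAlchemy's URL parser. This is my own illustration, not SQLCell's actual code, and it also shows why two databases with the same name would collide on the database-name key:

```python
# Illustrative only -- not SQLCell's actual implementation.
# A minimal sketch of aliasing an engine by its full URL, hostname, and database name.
from sqlalchemy import create_engine
from sqlalchemy.engine.url import make_url

ENGINES = {}  # alias -> engine

def register_engine(conn_string):
    """Create an engine and make it reachable by full URL, hostname, and db name."""
    url = make_url(conn_string)
    engine = create_engine(conn_string)
    for alias in (conn_string, url.host, url.database):
        if alias:  # two dbs with the same name would overwrite each other here
            ENGINES[alias] = engine
    return engine

def lookup_engine(alias):
    """Return the engine registered under a full URL, hostname, or db name."""
    return ENGINES[alias]
```

With that in mind, passing just the hostname works as the engine reference: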
```
%%sql --engine=SomeHostName
SELECT TOP 2 * FROM NashvilleHousingMSSQL
```

`%%sql` will also remember the last used engine so you don't have to refer to it every time you run a query.
```
%%sql
SELECT TOP 2 * FROM NashvilleHousingMSSQL
```

## Python Variables as Query Parameters
If you want to pass Python variables to your query, you can use the colon (`foo = :foo`) syntax (similar to SQLAlchemy's usage).
```
foo = 'NASHVILLE'
bar = 132000
baz = ('SINGLE FAMILY', 'RESIDENTIAL CONDO')
```

```
%%sql --engine=SomeHostName
select * from NashvilleHousingMSSQL
where PropertyCity = :foo and SalePrice like :bar and LandUse in :baz
```

And the same with a Postgres dialect:
```
foo = 'NASHVILLE'
bar = 132000
baz = ('SINGLE FAMILY', 'RESIDENTIAL CONDO')
```

```
%%sql --engine=PG
select n2.* from "NashvilleHousingPostgres" n1
JOIN "NashvilleHousingPostgres" n2 on n2."ParcelID" = n1."ParcelID"
where n2."PropertyCity" = :foo and n2."SalePrice" = :bar and n2."LandUse" in :baz
```
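For anyone curious what that colon syntax roughly corresponds to outside of SQLCell, here's a sketch of the plain-SQLAlchemy equivalent of the Postgres query above. The connection URL is just a placeholder, and the `expanding` bind parameter is how SQLAlchemy handles the tuple for the `IN` clause:

```python
# A rough plain-SQLAlchemy equivalent of the parameterized query above.
# The connection URL is a placeholder; adjust it for your own database.
import pandas as pd
from sqlalchemy import create_engine, text, bindparam

engine = create_engine("postgresql://username:@localhost/db")

query = text(
    'select n2.* from "NashvilleHousingPostgres" n1 '
    'JOIN "NashvilleHousingPostgres" n2 on n2."ParcelID" = n1."ParcelID" '
    'where n2."PropertyCity" = :foo and n2."SalePrice" = :bar and n2."LandUse" in :baz'
).bindparams(bindparam("baz", expanding=True))  # expand the tuple for the IN clause

params = {
    "foo": "NASHVILLE",
    "bar": 132000,
    "baz": ["SINGLE FAMILY", "RESIDENTIAL CONDO"],
}

with engine.connect() as conn:
    result = conn.execute(query, params)
    df = pd.DataFrame(result.fetchall(), columns=result.keys())
```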
## Engine Aliasing

Similar to <a href="https://github.com/tmthyjames/SQLCell#declare-engines">version 1</a>, you can also alias engine strings. Unlike version 1, there are no UI buttons (yet). Using the following syntax:
```
%%sql --engines
PG=postgresql://username:@localhost/db
MSS=mssql+pymssql://SomeUserName:SomePassword@180.36.3.76/tdobbins
LC=mssql+pymssql://username:password@180.36.2.90/DB1
```

we can now use `%%sql` with an alias as our `--engine` parameter:
```
%%sql --engine=MSS
select top 2 * from NashvilleHousingMSSQL
```

To view all your engines, use the `list` argument:
```
%%sql --engines
list
```

To clear your engines, use the `--refresh` command like so:
```
%%sql --refresh
engines
```

## Defining and Persisting Hooks
One thing I tried to do with version 1 was to mimic `psql` commands (since I was using Postgres exclusively), but doing that forced limitations on users who used other DBs. With this version, I'm removing those limitations by allowing users to register their own often-run shortcut queries like so:
```
%%sql --hook
tables MSS = USE {0};SELECT * FROM sys.Tables;
columns_mss MSS = EXEC sp_columns {0};
columns_db1 DB1 = EXEC sp_columns {0};
views mssql+pymssql://username:password@hostname/DB = USE {0};SELECT * FROM sys.Views;
tables_db1 DB1 = USE {0};SELECT * FROM sys.Tables;
views_l mssql+pymssql://:@localhost/DB = USE {0};SELECT * FROM sys.Views;
databases LC = SELECT name FROM master.sys.databases;
```

The syntax for defining hooks is:

> `<name> <engine|alias> = <sql command>`

Take this example:

> `tables MSS = USE {0};SELECT * FROM sys.Tables;`

where

- `tables` is our command shortcut,
- `MSS` is an engine alias we have already defined with `--engines`, and
- `USE {0};SELECT * FROM sys.Tables;` is the command we want to run when we issue our shortcut.

To use your newly-defined shortcuts, prepend `~` to your shortcut name:
```
%%sql
~columns_mss NashvilleHousingMSSQL
```

Since we defined `columns_mss` as having an argument, `{0}`, we can pass a table name to it.
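Here's a minimal sketch of how a hook definition like that can be parsed and its `{0}` placeholder filled in when the shortcut is invoked. It's my own illustration of the idea, not SQLCell's actual implementation:

```python
# Illustrative only -- not SQLCell's actual code. A minimal sketch of parsing a
# hook definition of the form "<name> <engine|alias> = <sql command>" and
# filling in its {0}-style placeholders when the shortcut is invoked.
HOOKS = {}  # name -> (engine_or_alias, sql_template)

def register_hook(line):
    """Parse one hook definition, e.g. 'columns_mss MSS = EXEC sp_columns {0};'."""
    left, sql_template = line.split("=", 1)
    name, engine_or_alias = left.split()
    HOOKS[name] = (engine_or_alias, sql_template.strip())

def expand_hook(invocation):
    """Turn '~columns_mss NashvilleHousingMSSQL' into its engine alias and final SQL."""
    name, *args = invocation.lstrip("~").split()
    engine_or_alias, sql_template = HOOKS[name]
    return engine_or_alias, sql_template.format(*args)

register_hook("columns_mss MSS = EXEC sp_columns {0};")
print(expand_hook("~columns_mss NashvilleHousingMSSQL"))
# ('MSS', 'EXEC sp_columns NashvilleHousingMSSQL;')
```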
To view your hooks, use `list`:
```
%%sql --hook
list
```

And to clear your hook list, just run `--refresh` with `hooks` as the cell content, like we did for engines:
```
%%sql --refresh
hooks
```

## Write SQL Output to Session Variable
To write the output of your query to a Python variable, use the `--var` argument.
```
%%sql --engine=LC --var=df
select 1 as some_lc_column
```

```
df
```

## Background Queries
To run queries in the background, use the `--background` flag:
```
%%sql --engine=LC --var=another_df --background
select 1 as some_lc_column
```

Notice here that `another_df` is undefined since the query is still running:
```
another_df
```

As soon as the query is done, you can call `another_df` successfully.

```
another_df
```
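For a rough idea of how this kind of backgrounding can work, here's a minimal sketch that runs the query on a worker thread and binds the result into the caller's namespace when it finishes. This is my own illustration, not SQLCell's actual implementation, and the connection string is a placeholder:

```python
# Illustrative only -- not SQLCell's actual implementation. A minimal sketch of
# running a query on a worker thread and binding the result to a variable name
# in the caller's namespace once it finishes.
import threading

import pandas as pd
from sqlalchemy import create_engine, text

def run_in_background(conn_string, query, namespace, var_name):
    """Execute `query` in a background thread; store the result as `namespace[var_name]`."""
    def worker():
        engine = create_engine(conn_string)  # placeholder connection string
        with engine.connect() as conn:
            result = conn.execute(text(query))
            namespace[var_name] = pd.DataFrame(result.fetchall(), columns=result.keys())

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

# In an IPython/Jupyter session the namespace would be something like
# get_ipython().user_ns; until the thread finishes, var_name simply isn't defined there.
```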
I'm working on a way to notify the user when a query finishes in the background. I'm also working on adding UI buttons using ipywidgets. Enjoy!