SQLCell Part I: The Basics

11 minute read

* [Description](#Description)
* [Installation](#Installation)
* [Buttons](#Buttons)
* [Declaring Engines](#Declaring-Engines)
* [Passing variables from Python to SQL](#Passing-variables-from-Python-to-SQL)
* [Pass output from SQL to Python](#Pass-output-from-SQL-to-Python)
* [Query plan as color coded table or graph](#Query-plan-as-color-coded-table-or-graph)
## Description
This project came about after I got tired of importing SQLAlchemy every time I wanted to run a query. You should just be able to query your database directly from Jupyter, right? That's what this project accomplishes, along with many other features that make it more like a SQL GUI than a Jupyter plugin.
First, we need to pull down a copy of the project from [here](https://github.com/tmthyjames/SQLCell).

## Installation

Just `git clone` the repo and `cp` the `sqlcell_app.py` file to Jupyter's startup directory (on my computer, the directory is `~/.ipython/profile_default/startup`, but may be different depending on your OS and version of IPython/Jupyter) like so:

$ cd .ipython/profile_default/startup # your start up directory here
$ git clone https://github.com/tmthyjames/SQLCell.git
$ cp SQLCell/sqlcell_app.py sqlcell_app.py # cp sqlcell_app.py to the startup folder so it will get executed
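
If you're not sure where your startup directory lives, IPython can tell you: the `startup` folder sits inside the profile directory that the `locate` subcommand prints.

$ ipython locate profile default # prints something like /Users/you/.ipython/profile_default
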
Now, just use the `ENGINE` parameter along with your connection string and give it a go.

In [5]:
%%sql ENGINE='postgresql://tdobbins:tdobbins@localhost:5432/sports'
select * from nba limit 5

To execute: 0.01 sec | Rows: 5 | DB: sports | Host: localhost | READ MODE

| # | dateof | team | opp | pts | fg | fg_att | ft | ft_att | fg3 | fg3_att | off_rebounds | def_rebounds | asst | blks | fouls | stls | turnovers |
|---|--------|------|-----|-----|----|--------|----|--------|-----|---------|--------------|--------------|------|------|-------|------|-----------|
| 1 | 2015-10-27 | DET | ATL | 106 | 37 | 96 | 20 | 26 | 12 | 29 | 23 | 36 | 23 | 3 | 15 | 5 | 15 |
| 2 | 2015-10-27 | ATL | DET | 94 | 37 | 82 | 12 | 15 | 8 | 27 | 7 | 33 | 22 | 4 | 25 | 9 | 15 |
| 3 | 2015-10-27 | CLE | CHI | 95 | 38 | 94 | 10 | 17 | 9 | 29 | 11 | 39 | 26 | 7 | 21 | 5 | 11 |
| 4 | 2015-10-27 | CHI | CLE | 97 | 37 | 87 | 16 | 23 | 7 | 19 | 7 | 40 | 13 | 10 | 22 | 6 | 13 |
| 5 | 2015-10-27 | NO | GS | 95 | 35 | 83 | 19 | 27 | 6 | 18 | 8 | 25 | 21 | 3 | 26 | 9 | 19 |

## Buttons

As you can see, this outputs a few things. You get your data for one, but you also get access to a lot of other functionality. For instance, the button group on the left allows you to:
<br/>
&emsp;&emsp;<button title="Explain Analyze Graph" type="button" class="btn btn-info btn-sm button-sect"><p class="fa fa-code-fork fa-rotate-270"></p></button> Run `EXPLAIN ANALYZE` query and return D3.js sankey chart for analysis.
&emsp;&emsp;<button title="Explain Analyze" type="button" class="btn btn-info btn-sm button-sect"><p class="fa fa-info-circle"></p></button> Run `EXPLAIN ANALYZE` query and return color-coded table for analysis.
&emsp;&emsp;<button type="button" title="Execute" class="btn btn-success btn-sm button-sect"><p class="fa fa-play"></p></button> Execute your query.
&emsp;&emsp;<button type="button" title="Execute and Return Data as Variable" class="btn btn-success btn-sm button-sect"><p class="">var</p></button> Save the output of your query to a pandas dataframe.
&emsp;&emsp;<button title="Save" class="btn btn-success btn-sm button-sect" type="button"><p class="fa fa-save"></p></button> Save the results of your query to a TSV.
&emsp;&emsp;<button title="Cancel Query" class="button-sect btn btn-danger btn-sm" type="button"><p class="fa fa-stop"></p></button> Halt your query.

The button on the right is an execute button. Instead of using `shift + enter` to run your query, you can just press that button. This turns into a group of buttons when you declare more than one engine. More on this in the next section.

## Declaring Engines

You can save the connection string with the `declare_engines` flag. This will allow you to run queries without specifying the `ENGINE` parameter as in our first query.

In [6]:
%%sql --declare_engines new
LOCAL=postgresql://tdobbins:tdobbins@localhost:5432/
new engines created
To declare multiple engines, you can use the `append` argument, or you can specify multiple in your initial engine declaration. Let's use the `append` argument here. I have two other Postgres instances to declare, one on RDS and one on EC2. The left side of each assignment is what the button text will be; the right side is the connection string.

In [8]:
%%sql --declare_engines append
RDS=postgresql://username:password@orionpax.cvb2tnvqksy5.us-east-1.rds.amazonaws.com:5432/
EC2=postgresql://username:password@ec2-34-204-5-97.compute-1.amazonaws.com:5432/
new engines created
Now you can easily switch between different connection engines by pushing the engine's button.

## Passing variables from Python to SQL

This is my favorite part of SQLCell. The ability to pass variables from Python to SQL makes dynamic queries easier to write, which is especially handy for web developers using a wrapper such as SQLAlchemy or something similar. Let's test it out.

In [10]:
# in this cell, define your python variables
some_int = 30
some_float = .3
some_str = 'GS'
some_tuple = ('WAS', 'MIN')
Now that we've defined our Python variables, we can construct a dynamic query.

In [11]:
%%sql DB=sports
SELECT * 
FROM nba 
WHERE pts > %(some_int)s
    AND fg/fg_att::FLOAT > %(some_float)s
    AND team = %(some_str)s
    AND opp IN %(some_tuple)s

To execute: 0.027 sec | Rows: 6 | DB: sports | Host: localhost | READ MODE

| # | dateof | team | opp | pts | fg | fg_att | ft | ft_att | fg3 | fg3_att | off_rebounds | def_rebounds | asst | blks | fouls | stls | turnovers |
|---|--------|------|-----|-----|----|--------|----|--------|-----|---------|--------------|--------------|------|------|-------|------|-----------|
| 1 | 2015-11-12 | GS | MIN | 129 | 48 | 89 | 15 | 16 | 18 | 38 | 6 | 27 | 34 | 7 | 25 | 8 | 11 |
| 2 | 2016-02-03 | GS | WAS | 134 | 49 | 93 | 16 | 29 | 20 | 42 | 11 | 38 | 35 | 5 | 24 | 10 | 18 |
| 3 | 2016-03-21 | GS | MIN | 109 | 43 | 87 | 15 | 24 | 8 | 24 | 13 | 31 | 31 | 10 | 20 | 13 | 17 |
| 4 | 2016-03-29 | GS | WAS | 102 | 38 | 87 | 15 | 23 | 11 | 25 | 7 | 48 | 27 | 6 | 23 | 11 | 16 |
| 5 | 2016-04-05 | GS | MIN | 117 | 49 | 98 | 7 | 8 | 12 | 35 | 9 | 37 | 35 | 5 | 29 | 9 | 24 |
| 6 | 2016-11-26 | GS | MIN | 115 | 45 | 81 | 14 | 17 | 11 | 22 | 12 | 33 | 25 | 8 | 14 | 9 | 18 |

You can also use the colon syntax. Some may find this cleaner and more readable.

In [12]:
%%sql DB=sports
SELECT * 
FROM nba 
WHERE pts > :some_int
    AND fg/fg_att::FLOAT > :some_float
    AND team = :some_str
    AND opp IN :some_tuple

To execute: 0.014 sec | Rows: 6 | DB: sports | Host: localhost | READ MODE

| # | dateof | team | opp | pts | fg | fg_att | ft | ft_att | fg3 | fg3_att | off_rebounds | def_rebounds | asst | blks | fouls | stls | turnovers |
|---|--------|------|-----|-----|----|--------|----|--------|-----|---------|--------------|--------------|------|------|-------|------|-----------|
| 1 | 2015-11-12 | GS | MIN | 129 | 48 | 89 | 15 | 16 | 18 | 38 | 6 | 27 | 34 | 7 | 25 | 8 | 11 |
| 2 | 2016-02-03 | GS | WAS | 134 | 49 | 93 | 16 | 29 | 20 | 42 | 11 | 38 | 35 | 5 | 24 | 10 | 18 |
| 3 | 2016-03-21 | GS | MIN | 109 | 43 | 87 | 15 | 24 | 8 | 24 | 13 | 31 | 31 | 10 | 20 | 13 | 17 |
| 4 | 2016-03-29 | GS | WAS | 102 | 38 | 87 | 15 | 23 | 11 | 25 | 7 | 48 | 27 | 6 | 23 | 11 | 16 |
| 5 | 2016-04-05 | GS | MIN | 117 | 49 | 98 | 7 | 8 | 12 | 35 | 9 | 37 | 35 | 5 | 29 | 9 | 24 |
| 6 | 2016-11-26 | GS | MIN | 115 | 45 | 81 | 14 | 17 | 11 | 22 | 12 | 33 | 25 | 8 | 14 | 9 | 18 |

## Pass output from SQL to Python

To return the output as a Python variable, use the `var` button, or you can use the `MAKE_GLOBAL` parameter. Let's do it using the parameter.

In [13]:
%%sql DB=sports MAKE_GLOBAL=df
SELECT * 
FROM nba 
WHERE pts > :some_int
    AND fg/fg_att::FLOAT > :some_float
    AND team = :some_str
    AND opp IN :some_tuple

To execute: 0.013 sec | Rows: 6 | DB: sports | Host: localhost | READ MODE

| # | dateof | team | opp | pts | fg | fg_att | ft | ft_att | fg3 | fg3_att | off_rebounds | def_rebounds | asst | blks | fouls | stls | turnovers |
|---|--------|------|-----|-----|----|--------|----|--------|-----|---------|--------------|--------------|------|------|-------|------|-----------|
| 1 | 2015-11-12 | GS | MIN | 129 | 48 | 89 | 15 | 16 | 18 | 38 | 6 | 27 | 34 | 7 | 25 | 8 | 11 |
| 2 | 2016-02-03 | GS | WAS | 134 | 49 | 93 | 16 | 29 | 20 | 42 | 11 | 38 | 35 | 5 | 24 | 10 | 18 |
| 3 | 2016-03-21 | GS | MIN | 109 | 43 | 87 | 15 | 24 | 8 | 24 | 13 | 31 | 31 | 10 | 20 | 13 | 17 |
| 4 | 2016-03-29 | GS | WAS | 102 | 38 | 87 | 15 | 23 | 11 | 25 | 7 | 48 | 27 | 6 | 23 | 11 | 16 |
| 5 | 2016-04-05 | GS | MIN | 117 | 49 | 98 | 7 | 8 | 12 | 35 | 9 | 37 | 35 | 5 | 29 | 9 | 24 |
| 6 | 2016-11-26 | GS | MIN | 115 | 45 | 81 | 14 | 17 | 11 | 22 | 12 | 33 | 25 | 8 | 14 | 9 | 18 |

Now, `df` contains the dataframe that gets returned from the query.

In [14]:
df
Out[14]:
| | dateof | team | opp | pts | fg | fg_att | ft | ft_att | fg3 | fg3_att | off_rebounds | def_rebounds | asst | blks | fouls | stls | turnovers |
|---|--------|------|-----|-----|----|--------|----|--------|-----|---------|--------------|--------------|------|------|-------|------|-----------|
| 0 | 2015-11-12 | GS | MIN | 129 | 48 | 89 | 15 | 16 | 18 | 38 | 6 | 27 | 34 | 7 | 25 | 8 | 11 |
| 1 | 2016-02-03 | GS | WAS | 134 | 49 | 93 | 16 | 29 | 20 | 42 | 11 | 38 | 35 | 5 | 24 | 10 | 18 |
| 2 | 2016-03-21 | GS | MIN | 109 | 43 | 87 | 15 | 24 | 8 | 24 | 13 | 31 | 31 | 10 | 20 | 13 | 17 |
| 3 | 2016-03-29 | GS | WAS | 102 | 38 | 87 | 15 | 23 | 11 | 25 | 7 | 48 | 27 | 6 | 23 | 11 | 16 |
| 4 | 2016-04-05 | GS | MIN | 117 | 49 | 98 | 7 | 8 | 12 | 35 | 9 | 37 | 35 | 5 | 29 | 9 | 24 |
| 5 | 2016-11-26 | GS | MIN | 115 | 45 | 81 | 14 | 17 | 11 | 22 | 12 | 33 | 25 | 8 | 14 | 9 | 18 |
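
Since `df` is an ordinary pandas DataFrame, the full pandas API applies from here on. A couple of quick, purely illustrative aggregations:

df['pts'].mean() # average points across the six games
df.groupby('opp')['pts'].max() # highest point total against each opponent
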
You can also pass the output of the query to Python as the raw result set using the `RAW` parameter.

In [15]:
%%sql DB=sports MAKE_GLOBAL=df RAW=True
SELECT * 
FROM nba 
WHERE pts > :some_int
    AND fg/fg_att::FLOAT > :some_float
    AND team = :some_str
    AND opp IN :some_tuple

To execute: 0.014 sec | Rows: 6 | DB: sports | Host: localhost | READ MODE

| # | dateof | team | opp | pts | fg | fg_att | ft | ft_att | fg3 | fg3_att | off_rebounds | def_rebounds | asst | blks | fouls | stls | turnovers |
|---|--------|------|-----|-----|----|--------|----|--------|-----|---------|--------------|--------------|------|------|-------|------|-----------|
| 1 | 2015-11-12 | GS | MIN | 129 | 48 | 89 | 15 | 16 | 18 | 38 | 6 | 27 | 34 | 7 | 25 | 8 | 11 |
| 2 | 2016-02-03 | GS | WAS | 134 | 49 | 93 | 16 | 29 | 20 | 42 | 11 | 38 | 35 | 5 | 24 | 10 | 18 |
| 3 | 2016-03-21 | GS | MIN | 109 | 43 | 87 | 15 | 24 | 8 | 24 | 13 | 31 | 31 | 10 | 20 | 13 | 17 |
| 4 | 2016-03-29 | GS | WAS | 102 | 38 | 87 | 15 | 23 | 11 | 25 | 7 | 48 | 27 | 6 | 23 | 11 | 16 |
| 5 | 2016-04-05 | GS | MIN | 117 | 49 | 98 | 7 | 8 | 12 | 35 | 9 | 37 | 35 | 5 | 29 | 9 | 24 |
| 6 | 2016-11-26 | GS | MIN | 115 | 45 | 81 | 14 | 17 | 11 | 22 | 12 | 33 | 25 | 8 | 14 | 9 | 18 |

Now, `df` is the raw result set returned by SQLAlchemy.

In [16]:
df
Out[16]:
[(datetime.date(2015, 11, 12), u'GS', u'MIN', 129L, 48, 89, 15, 16, 18, 38, 6, 27, 34, 7, 25, 8, 11),
 (datetime.date(2016, 2, 3), u'GS', u'WAS', 134L, 49, 93, 16, 29, 20, 42, 11, 38, 35, 5, 24, 10, 18),
 (datetime.date(2016, 3, 21), u'GS', u'MIN', 109L, 43, 87, 15, 24, 8, 24, 13, 31, 31, 10, 20, 13, 17),
 (datetime.date(2016, 3, 29), u'GS', u'WAS', 102L, 38, 87, 15, 23, 11, 25, 7, 48, 27, 6, 23, 11, 16),
 (datetime.date(2016, 4, 5), u'GS', u'MIN', 117L, 49, 98, 7, 8, 12, 35, 9, 37, 35, 5, 29, 9, 24),
 (datetime.date(2016, 11, 26), u'GS', u'MIN', 115L, 45, 81, 14, 17, 11, 22, 12, 33, 25, 8, 14, 9, 18)]
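
If you need a DataFrame again later, pandas will happily take this list of tuples; you just have to supply the column names yourself. A minimal sketch, using the column order shown above (`df_from_raw` is just an illustrative name):

import pandas as pd

# the raw rows carry no header, so the column names come from the table shown earlier
columns = ['dateof', 'team', 'opp', 'pts', 'fg', 'fg_att', 'ft', 'ft_att',
           'fg3', 'fg3_att', 'off_rebounds', 'def_rebounds', 'asst', 'blks',
           'fouls', 'stls', 'turnovers']
df_from_raw = pd.DataFrame(df, columns=columns)  # df holds the raw tuples here
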
## Query plan as color coded table or graph

This feature helps point out slow spots in your queries. First, let's look at the query plan table. At the moment, it only works with PostgreSQL.

In [19]:
%%sql DB=econ
select average_family_size,
    average_home_value,
    average_household_income,
    average_household_size,
    county
from block_data bd
join block_shapes bs on bs.id = bd.block_code 

To execute: 0.011 sec | Rows: 8 | DB: econ | Host: localhost | READ MODE

QUERY PLAN
Hash Join  (cost=33.60..89.81 rows=472 width=48) (actual time=0.499..0.977 rows=472 loops=1)
  Hash Cond: (bs.id = bd.block_code)
  ->  Seq Scan on block_shapes bs  (cost=0.00..49.72 rows=472 width=8) (actual time=0.007..0.182 rows=472 loops=1)
  ->  Hash  (cost=27.71..27.71 rows=471 width=56) (actual time=0.470..0.470 rows=471 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 49kB
        ->  Seq Scan on block_data bd  (cost=0.00..27.71 rows=471 width=56) (actual time=0.008..0.352 rows=471 loops=1)
Planning time: 0.192 ms
Execution time: 1.054 ms
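
Under the hood, this color-coded table is PostgreSQL's `EXPLAIN ANALYZE` output with SQLCell's styling layered on top. If you want the raw plan, you should be able to run the same thing yourself in a plain `%%sql` cell (or any SQL client):

%%sql DB=econ
EXPLAIN ANALYZE
select average_family_size,
    average_home_value,
    average_household_income,
    average_household_size,
    county
from block_data bd
join block_shapes bs on bs.id = bd.block_code
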
The darker the color, the slower the node. Here, we can see that most of the time was spent in a hash join that returned 472 rows. Let's look at the query plan graph, built with D3.js. It's a sankey graph that shows the hierarchy of the nodes and their respective cost in terms of time and rows.

In [1]:
%%sql DB=econ
SELECT *, 
    CASE 
        WHEN num_of_sb > 0 THEN true
        ELSE false
    end AS has_starbucks
FROM (
    SELECT
        ST_Area(geom::geography) / 1609.34^2 AS sq_miles,
        "HC03_VC96",
        "HC01_VC03",
        "HC01_VC103",
        "HC03_VC67",
        "HC03_VC68",
        "HC03_VC88",
        "HC03_VC49",
        "HC01_VC88",
        "HC03_VC135",
        "HC03_VC136",
        "HC03_VC92",
        z."HC03_VC12",
        z."HC03_VC41",
        "HC03_VC43",
        "HC03_VC161",
        "HC01_VC118",
        "HC03_VC87",
        z."HC03_VC171",
        zn.*,
        z.starbucks_count,
        sqrt("HC01_VC03") as "HC01_VC03_sqrt",
        count(sb."zipcode") over (
            partition by sb.zipcode
        ) as num_of_sb,
        sb."City",
        sb."State",
        sb."Latitude",
        sb."Longitude"
    from zip_data_sb z
    join zip_data2 z2 on z2."Id" = z."Id"
    join zip_data3 z3 on z3."Id" = z."Id"
    left join sb_stores sb on sb."zipcode" = z."Id2"
    join cb_2015_us_zcta510_500k c on c.geoid10::bigint = z."Id2"
    join (
        select zip,
            sum(case when naics ~ '^445' then est end)::bigint as "grocery_stores",
            sum(case when naics ~ '^722' then est end)::bigint as "restaurants",
            sum(case when naics ~ '^311' then est end)::bigint as "manufacturing",
            sum(case when naics ~ '^54171' or naics = '518210' then est end)::bigint as "tech",
            sum(case when naics ~ '^611' then est end)::bigint as "colleges",
            sum(case when naics = '------' then est end)::bigint as "total",
            sum(case when naics = '531120' then est end)::bigint as "shopping_centers",
            sum(case when naics ~ '^23611' then est end)::bigint as "new_homes",
            sum(case when naics ~ '^5311' then est end)::bigint as "hotels",
            sum(case when naics ~ '^4411' then est end)::bigint as "car_dealers",
            sum(case when naics ~ '^4853' then est end)::bigint as "taxis",
            sum(case when naics = '481111' 
                or naics in ('485999', '488119', '488190') then est end)::bigint as "airport_related"
        from zip_to_naics 
        group by zip
    ) zn on zn.zip = z."Id2"
) s

To execute: 48.948 sec | Rows: 1 | DB: econ | Host: localhost

*(Interactive D3.js sankey chart of the query plan. Each node is one plan operation (Seq Scan, Hash, Hash Join, Aggregate, Sort, WindowAgg, Subquery Scan), with tooltips giving its total cost, timing, and parent/child row counts. The total plan cost is about 1,107,480, and most of the roughly 49 seconds is spent in the aggregate over zip_to_naics.)*