# SQLCell Part I: The Basics
* [Description](#Description)
* [Installation](#Installation)
* [Buttons](#Buttons)
* [Declaring Engines](#Declaring-Engines)
* [Passing variables from Python to SQL](#Passing-variables-from-Python-to-SQL)
* [Pass output from SQL to Python](#Pass-output-from-SQL-to-Python)
* [Query plan as color coded table or graph](#Query-plan-as-color-coded-table-or-graph)
## Description
This project came about after I got tired of importing SQLAlchemy every time I wanted to run a query. You should just be able to query your database directly from Jupyter, right? That's what this project accomplishes—along with many other features that make it more like a SQL GUI than a Jupyter plugin.
First, we need to pull down a copy of the project from [here](https://github.com/tmthyjames/SQLCell).
## Installation
`git clone` the repo and `cp` the `sqlcell_app.py` file to Jupyter's startup directory (on my computer, the directory is `~/.ipython/profile_default/startup`, but may be different depending on your OS and version of IPython/Jupyter) like so:
```shell
$ cd .ipython/profile_default/startup # your startup directory here
$ git clone https://github.com/tmthyjames/SQLCell.git
$ cp SQLCell/sqlcell_app.py sqlcell_app.py # cp sqlcell_app.py to the startup folder so it will get executed
```
Now, just use the `ENGINE` parameter along with your connection string and give it a go.
```sql
%%sql ENGINE='postgresql://tdobbins:tdobbins@localhost:5432/sports'
select * from nba limit 5
```
## Buttons
As you can see, this outputs a few things. You get your data for one, but you also get access to a lot of other functionality. For instance, the button group on the left allows you to:
<br/>
  <button title="Explain Analyze Graph" type="button" class="btn btn-info btn-sm button-sect"><p class="fa fa-code-fork fa-rotate-270"></p></button> Run `EXPLAIN ANALYZE` query and return D3.js sankey chart for analysis.
  <button title="Explain Analyze" type="button" class="btn btn-info btn-sm button-sect"><p class="fa fa-info-circle"></p></button> Run `EXPLAIN ANALYZE` query and return color-coded table for analysis.
  <button type="button" title="Execute" class="btn btn-success btn-sm button-sect"><p class="fa fa-play"></p></button> Execute your query.
  <button type="button" title="Execute and Return Data as Variable" class="btn btn-success btn-sm button-sect"><p class="">var</p></button> Save the output of your query to a pandas dataframe.
  <button title="Save" class="btn btn-success btn-sm button-sect" type="button"><p class="fa fa-save"></p></button> Save the results of your query to a TSV.
  <button title="Cancel Query" class="button-sect btn btn-danger btn-sm" type="button"><p class="fa fa-stop"></p></button> Halt your query.
The button on the right is an execute button. Instead of using `shift + enter` to run your query, you can just press that button. This turns into a group of buttons when you declare more than one engine. More on this in the next section.
## Declaring Engines
You can save the connection string with the `declare_engines` flag. This will allow you to run queries without specifying the `ENGINE` parameter as in our first query.
```sql
%%sql --declare_engines new
LOCAL=postgresql://tdobbins:tdobbins@localhost:5432/
```
To declare multiple engines, you can use the `append` argument, or you can specify multiple in your initial engine declaration. Let's use the `append` argument here. I have two other postgres instances I will declare, one RDS and one EC2. The right side of the equation is what the button text will be; the left side is the connection string.
```sql
%%sql --declare_engines append
RDS=postgresql://username:password@orionpax.cvb2tnvqksy5.us-east-1.rds.amazonaws.com:5432/
EC2=postgresql://username:password@ec2-34-204-5-97.compute-1.amazonaws.com:5432/
```
Now you can easily switch between different connection engines by pushing the engine's button.
## Passing variables from Python to SQL
This is my favorite part of SQLCell. The ability to pass variables from Python to SQL makes dynamic queries easier to write, which is especially handy for web developers using a wrapper such as SQLAlchemy or something similar. Let's test it out.
```python
# in this cell, define your python variables
some_int = 30
some_float = .3
some_str = 'GS'
some_tuple = ('WAS', 'MIN')
```
Now that we've defined our Python variables, we can construct a dynamic query.
```sql
%%sql DB=sports
SELECT *
FROM nba
WHERE pts > %(some_int)s
AND fg/fg_att::FLOAT > %(some_float)s
AND team = %(some_str)s
AND opp IN %(some_tuple)s
```
You can also use the colon syntax. Some may find this cleaner and more readable.
```sql
%%sql DB=sports
SELECT *
FROM nba
WHERE pts > :some_int
AND fg/fg_att::FLOAT > :some_float
AND team = :some_str
AND opp IN :some_tuple
```
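Both syntaxes hand your values to the database driver for binding rather than pasting them into the SQL string, which is what keeps dynamic queries safe. The mechanism is the same named-parameter binding any Python DB-API driver offers; here is a minimal sketch using the stdlib `sqlite3` module as a stand-in for the Postgres setup above (table contents are made up for illustration):

```python
import sqlite3

# build a tiny stand-in for the nba table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nba (team TEXT, opp TEXT, pts INTEGER)")
conn.executemany(
    "INSERT INTO nba VALUES (?, ?, ?)",
    [("GS", "WAS", 118), ("GS", "MIN", 95), ("CLE", "WAS", 110)],
)

# colon-style named parameters: the driver binds the values safely
some_int = 100
some_str = "GS"
rows = conn.execute(
    "SELECT team, opp, pts FROM nba WHERE pts > :some_int AND team = :some_str",
    {"some_int": some_int, "some_str": some_str},
).fetchall()
print(rows)  # [('GS', 'WAS', 118)]
```

Note that expanding a tuple into an `IN (...)` clause, as SQLCell does for `some_tuple`, is extra work the extension handles for you; plain DB-API drivers don't do it automatically.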
## Pass output from SQL to Python
To return the output as a Python variable, use the `var` button, or you can use the `MAKE_GLOBAL` parameter. Let's do it using the parameter.
```sql
%%sql DB=sports MAKE_GLOBAL=df
SELECT *
FROM nba
WHERE pts > :some_int
AND fg/fg_att::FLOAT > :some_float
AND team = :some_str
AND opp IN :some_tuple
```
Now, `df` contains the dataframe that gets returned from the query.
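Once the cell has run, `df` behaves like any other pandas DataFrame in the rest of your notebook. A quick sketch of what you might do next (the DataFrame here is constructed by hand, with column names assumed from the `nba` table above):

```python
import pandas as pd

# stand-in for the DataFrame that MAKE_GLOBAL would return
df = pd.DataFrame({
    "team": ["GS", "GS", "WAS"],
    "opp":  ["WAS", "MIN", "GS"],
    "pts":  [118, 104, 95],
})

# the usual pandas workflow now applies
print(df.shape)                        # (3, 3)
print(df.groupby("team")["pts"].mean())
```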
You can also pass the output of the query to python as the raw result set using the `RAW` parameter.
```sql
%%sql DB=sports MAKE_GLOBAL=df RAW=True
SELECT *
FROM nba
WHERE pts > :some_int
AND fg/fg_att::FLOAT > :some_float
AND team = :some_str
AND opp IN :some_tuple
```
Now, `df` is the raw result set returned by SQLAlchemy.
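With `RAW=True`, `df` is the driver-level result set rather than a DataFrame. SQLAlchemy result rows are tuple-like, so you can index, unpack, or iterate them; a rough sketch, with plain tuples standing in for SQLAlchemy's row objects:

```python
# stand-in rows, shaped like a raw result set from the query above
rows = [("GS", "WAS", 118), ("GS", "MIN", 104)]

# tuple-like access: index, unpack, or iterate
first_team = rows[0][0]
totals = {team: 0 for team, _, _ in rows}
for team, opp, pts in rows:
    totals[team] += pts
print(first_team, totals)  # GS {'GS': 222}
```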
## Query plan as color coded table or graph
This feature helps to point out slow spots in your queries. First let's look at the query plan table. At the moment, it only works with PostgreSQL.
```sql
%%sql DB=econ
select average_family_size,
average_home_value,
average_household_income,
average_household_size,
county
from block_data bd
join block_shapes bs on bs.id = bd.block_code
```
The darker the color, the slower the node. Here, we can see that most of the time was used by a hash join that returned 472 rows. Let's look at the query plan graph, built with d3.js. It's a sankey graph that shows the hierarchy of the nodes and their respective cost in terms of time and rows.
```sql
%%sql DB=econ
SELECT *,
CASE
WHEN num_of_sb > 0 THEN true
ELSE false
end AS has_starbucks
FROM (
SELECT
ST_Area(geom::geography) / 1609.34^2 AS sq_miles,
"HC03_VC96",
"HC01_VC03",
"HC01_VC103",
"HC03_VC67",
"HC03_VC68",
"HC03_VC88",
"HC03_VC49",
"HC01_VC88",
"HC03_VC135",
"HC03_VC136",
"HC03_VC92",
z."HC03_VC12",
z."HC03_VC41",
"HC03_VC43",
"HC03_VC161",
"HC01_VC118",
"HC03_VC87",
z."HC03_VC171",
zn.*,
z.starbucks_count,
sqrt("HC01_VC03") as "HC01_VC03_sqrt",
count(sb."zipcode") over (
partition by sb.zipcode
) as num_of_sb,
sb."City",
sb."State",
sb."Latitude",
sb."Longitude"
from zip_data_sb z
join zip_data2 z2 on z2."Id" = z."Id"
join zip_data3 z3 on z3."Id" = z."Id"
left join sb_stores sb on sb."zipcode" = z."Id2"
join cb_2015_us_zcta510_500k c on c.geoid10::bigint = z."Id2"
join (
select zip,
sum(case when naics ~ '^445' then est end)::bigint as "grocery_stores",
sum(case when naics ~ '^722' then est end)::bigint as "restaurants",
sum(case when naics ~ '^311' then est end)::bigint as "manufacturing",
sum(case when naics ~ '^54171' or naics = '518210' then est end)::bigint as "tech",
sum(case when naics ~ '^611' then est end)::bigint as "colleges",
sum(case when naics = '------' then est end)::bigint as "total",
sum(case when naics = '531120' then est end)::bigint as "shopping_centers",
sum(case when naics ~ '^23611' then est end)::bigint as "new_homes",
sum(case when naics ~ '^5311' then est end)::bigint as "hotels",
sum(case when naics ~ '^4411' then est end)::bigint as "car_dealers",
sum(case when naics ~ '^4853' then est end)::bigint as "taxis",
sum(case when naics = '481111'
or naics in ('485999', '488119', '488190') then est end)::bigint as "airport_related"
from zip_to_naics
group by zip
) zn on zn.zip = z."Id2"
) s
```