# SQLCell Part I: The Basics
* [Description](#Description)
* [Installation](#Installation)
* [Buttons](#Buttons)
* [Declaring Engines](#Declaring-Engines)
* [Passing variables from Python to SQL](#Passing-variables-from-Python-to-SQL)
* [Pass output from SQL to Python](#Pass-output-from-SQL-to-Python)
* [Query plan as color coded table or graph](#Query-plan-as-color-coded-table-or-graph)

## Description

This project came about after I got tired of importing SQLAlchemy every time I wanted to run a query. You should just be able to query your database directly from Jupyter, right? That's what this project accomplishes, along with many other features that make it more like a SQL GUI than a Jupyter plugin.

First, we need to pull down a copy of the project from [here](https://github.com/tmthyjames/SQLCell).
## Installation
Just `git clone` the repo and `cp` the `sqlcell_app.py` file to Jupyter's startup directory (on my computer, the directory is `~/.ipython/profile_default/startup`, but it may be different depending on your OS and version of IPython/Jupyter) like so:
```
$ cd .ipython/profile_default/startup # your startup directory here
$ git clone https://github.com/tmthyjames/SQLCell.git
$ cp SQLCell/sqlcell_app.py sqlcell_app.py # cp sqlcell_app.py to the startup folder so it will get executed
```

Now, just use the `ENGINE` parameter along with your connection string and give it a go.
```
%%sql ENGINE='postgresql://tdobbins:tdobbins@localhost:5432/sports'
select * from nba limit 5
```

## Buttons
As you can see, this outputs a few things. You get your data for one, but you also get access to a lot of other functionality. For instance, the button group on the left allows you to:

* Explain Analyze Graph: run an `EXPLAIN ANALYZE` query and return a D3.js sankey chart for analysis.
* Explain Analyze: run an `EXPLAIN ANALYZE` query and return a color-coded table for analysis.
* Execute: execute your query.
* Execute and Return Data as Variable (the `var` button): save the output of your query to a pandas dataframe.
* Save: save the results of your query to a TSV.
* Cancel Query: halt your query.
The button on the right is an execute button. Instead of using `shift + enter` to run your query, you can just press that button. This turns into a group of buttons when you declare more than one engine. More on this in the next section.
## Declaring Engines
You can save the connection string with the `declare_engines` flag. This will allow you to run queries without specifying the `ENGINE` parameter as in our first query.
```
%%sql --declare_engines new
LOCAL=postgresql://tdobbins:tdobbins@localhost:5432/
```

To declare multiple engines, you can use the `append` argument, or you can specify multiple engines in your initial declaration. Let's use the `append` argument here. I have two other Postgres instances I will declare, one RDS and one EC2. The left side of the equation is what the button text will be; the right side is the connection string.
```
%%sql --declare_engines append
RDS=postgresql://username:password@orionpax.cvb2tnvqksy5.us-east-1.rds.amazonaws.com:5432/
EC2=postgresql://username:password@ec2-34-204-5-97.compute-1.amazonaws.com:5432/
```

Now you can easily switch between different connection engines by pushing the engine's button.
## Passing variables from Python to SQL
This is my favorite part of SQLCell. The ability to pass variables from Python to SQL makes dynamic queries easier to write, which is especially handy for web developers using a wrapper such as SQLAlchemy or something similar. Let's test it out.
```
# in this cell, define your python variables
some_int = 30
some_float = .3
some_str = 'GS'
some_tuple = ('WAS', 'MIN')
```

Now that we've defined our Python variables, we can construct a dynamic query.
```
%%sql DB=sports
SELECT *
FROM nba
WHERE pts > %(some_int)s
    AND fg/fg_att::FLOAT > %(some_float)s
    AND team = %(some_str)s
    AND opp IN %(some_tuple)s
```

You can also use the colon syntax. Some may find this cleaner and more readable.
```
%%sql DB=sports
SELECT *
FROM nba
WHERE pts > :some_int
    AND fg/fg_att::FLOAT > :some_float
    AND team = :some_str
    AND opp IN :some_tuple
```
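For comparison, here is roughly the boilerplate a web developer would otherwise write by hand with SQLAlchemy to run the same parameterized query. This is only a sketch of the non-SQLCell route, reusing the connection string and values from the cells above; it is not something SQLCell requires.

```
# Hypothetical sketch: the same parameterized query via plain SQLAlchemy,
# reusing the connection string and variable values from the cells above.
from sqlalchemy import bindparam, create_engine, text

engine = create_engine('postgresql://tdobbins:tdobbins@localhost:5432/sports')

stmt = text("""
    SELECT *
    FROM nba
    WHERE pts > :some_int
      AND fg/fg_att::FLOAT > :some_float
      AND team = :some_str
      AND opp IN :some_tuple
""").bindparams(bindparam('some_tuple', expanding=True))  # expand the tuple for the IN clause

with engine.connect() as conn:
    rows = conn.execute(stmt, {
        'some_int': 30,
        'some_float': .3,
        'some_str': 'GS',
        'some_tuple': ('WAS', 'MIN'),
    }).fetchall()
```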
## Pass output from SQL to Python

To return the output as a Python variable, use the `var` button, or you can use the `MAKE_GLOBAL` parameter. Let's do it using the parameter.
```
%%sql DB=sports MAKE_GLOBAL=df
SELECT *
FROM nba
WHERE pts > :some_int
    AND fg/fg_att::FLOAT > :some_float
    AND team = :some_str
    AND opp IN :some_tuple
```

Now, `df` contains the dataframe that gets returned from the query.
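Since `df` is an ordinary pandas DataFrame, the usual pandas API applies from here. A minimal sketch, with the `team` and `pts` columns assumed from the `nba` queries above:

```
# `df` was injected into the notebook namespace by MAKE_GLOBAL=df above.
df.head()                         # peek at the first few rows
df.groupby('team')['pts'].mean()  # average points per team (columns assumed from the nba table)
```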
You can also pass the output of the query to Python as the raw result set using the `RAW` parameter.
```
%%sql DB=sports MAKE_GLOBAL=df RAW=True
SELECT *
FROM nba
WHERE pts > :some_int
    AND fg/fg_att::FLOAT > :some_float
    AND team = :some_str
    AND opp IN :some_tuple
```

Now, `df` is the raw result set returned by SQLAlchemy.
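A minimal sketch of working with the raw result, assuming it behaves like a sequence of SQLAlchemy row objects that index like tuples:

```
# Assumption: the RAW result iterates as row objects rather than a DataFrame.
rows = list(df)
print(rows[0])      # first row, displayed as a tuple of column values
print(rows[0][0])   # first column of the first row
```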
## Query plan as color coded table or graph
This feature helps to point out slow spots in your queries. First, let's look at the query plan table. At the moment, it only works with PostgreSQL.
```
%%sql DB=econ
select average_family_size, average_home_value, average_household_income, average_household_size, county
from block_data bd
join block_shapes bs on bs.id = bd.block_code
```

The darker the color, the slower the node. Here, we can see that most of the time was used by a hash join that returned 472 rows. Let's look at the query plan graph, built with D3.js. It's a sankey graph that shows the hierarchy of the nodes and their respective cost in terms of time and rows.
```
%%sql DB=econ
SELECT *,
       CASE WHEN num_of_sb > 0 THEN true ELSE false end AS has_starbucks
FROM (
    SELECT ST_Area(geom::geography) / 1609.34^2 AS sq_miles,
           "HC03_VC96", "HC01_VC03", "HC01_VC103", "HC03_VC67", "HC03_VC68",
           "HC03_VC88", "HC03_VC49", "HC01_VC88", "HC03_VC135", "HC03_VC136",
           "HC03_VC92", z."HC03_VC12", z."HC03_VC41", "HC03_VC43", "HC03_VC161",
           "HC01_VC118", "HC03_VC87", z."HC03_VC171", zn.*, z.starbucks_count,
           sqrt("HC01_VC03") as "HC01_VC03_sqrt",
           count(sb."zipcode") over (partition by sb.zipcode) as num_of_sb,
           sb."City", sb."State", sb."Latitude", sb."Longitude"
    from zip_data_sb z
    join zip_data2 z2 on z2."Id" = z."Id"
    join zip_data3 z3 on z3."Id" = z."Id"
    left join sb_stores sb on sb."zipcode" = z."Id2"
    join cb_2015_us_zcta510_500k c on c.geoid10::bigint = z."Id2"
    join (
        select zip,
               sum(case when naics ~ '^445' then est end)::bigint as "grocery_stores",
               sum(case when naics ~ '^722' then est end)::bigint as "restaurants",
               sum(case when naics ~ '^311' then est end)::bigint as "manufacturing",
               sum(case when naics ~ '^54171' or naics = '518210' then est end)::bigint as "tech",
               sum(case when naics ~ '^611' then est end)::bigint as "colleges",
               sum(case when naics = '------' then est end)::bigint as "total",
               sum(case when naics = '531120' then est end)::bigint as "shopping_centers",
               sum(case when naics ~ '^23611' then est end)::bigint as "new_homes",
               sum(case when naics ~ '^5311' then est end)::bigint as "hotels",
               sum(case when naics ~ '^4411' then est end)::bigint as "car_dealers",
               sum(case when naics ~ '^4853' then est end)::bigint as "taxis",
               sum(case when naics = '481111' or naics in ('485999', '488119', '488190') then est end)::bigint as "airport_related"
        from zip_to_naics
        group by zip
    ) zn on zn.zip = z."Id2"
) s
```
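Both the table and the graph are built from PostgreSQL's `EXPLAIN ANALYZE` output. If you ever want the raw plan rows instead of the rendered views, you can run `EXPLAIN ANALYZE` yourself in a cell; it is a standard PostgreSQL statement that simply returns the plan as rows of text, shown here as a sketch against the earlier econ query:

```
%%sql DB=econ
EXPLAIN ANALYZE
select average_family_size, average_home_value, average_household_income, average_household_size, county
from block_data bd
join block_shapes bs on bs.id = bd.block_code
```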