
Auto-complete table and column names in Jupyter with SQLAlchemy

SQL is pretty fantastic. It’s easy to learn and with the right training it can do very powerful things. Even in the wrong hands, it can do powerful things! And yet, as a programmer-turned-data scientist, I’m annoyed every time that I have to write a query. Many times, my irritation comes from not having a good SQL editor that auto-completes table and column names. Other times, it’s from writing the same SQL clause over and over again, copying and pasting and setting up hazards for my future self.

Since I work mostly in Jupyter for my analysis contracts, I wanted to share with others how I use SQLAlchemy to DRY up my queries and get auto-complete for table and column names.

Getting started

First off, I take advantage of SQLAlchemy’s reflection functionality. I’m often working with tables that have dozens of columns, and I don’t want to go through the pain of specifying every single column and column type. Rather, I write something like this to load a SQLAlchemy Table object based upon a table in the database:

from sqlalchemy import create_engine, MetaData, Table

conn_string = "postgresql://postgres:@localhost:5432/postgres"
engine = create_engine(conn_string)
meta = MetaData()
# Reflect the existing 'names' table from the database
t_names = Table('names', meta, autoload_with=engine)
[Screenshot: the names table, with three records]

For the names table shown above, the t_names object gets loaded with this:

[Screenshot: the reflected SQLAlchemy Table object displayed in Jupyter]

Now that we have tables represented as Python objects, I can start doing some fun things, like parameterizing queries in loops and then kicking them off as I walk away to get a pastry while the data gets fetched. Below, I show a simple query that gets run in a for loop, with the results saved as pandas DataFrames.

The first loop uses string substitution to parameterize a query. The second loop runs the same exact query, but uses more SQLAlchemy features.
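Here’s a minimal sketch of what those two loops can look like. The group_id column and its values are hypothetical stand-ins I’ve added for illustration; only first_name actually appears in the names table above:

import pandas as pd
from sqlalchemy import select

groups = ['a', 'b', 'c']  # hypothetical parameter values

# Loop 1: parameterize the query with string substitution
raw_results = {}
for g in groups:
    query = f"SELECT first_name FROM names WHERE group_id = '{g}'"
    raw_results[g] = pd.read_sql(query, engine)

# Loop 2: the exact same query, built from SQLAlchemy constructs
sa_results = {}
for g in groups:
    query = select(t_names.c.first_name).where(t_names.c.group_id == g)
    sa_results[g] = pd.read_sql(query, engine)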

For really simple or one-off queries, I generally prefer just writing the raw SQL. For more complicated ones, and especially if I have many queries that are subtly different, I like using the more programmatic SQLAlchemy way.

Auto-complete column names

In the second loop above, the one with the more SQLAlchemy-ish query, you’ll see that I was able to access the first_name column of the names table by writing t_names.c.first_name. I originally thought that Jupyter’s built-in auto-complete of defined variables and functions would also extend to whatever was held in this mysterious c collection.

I was wrong. When I type t_names.c into a Jupyter cell and then hit Tab, I get nothing. So instead, with a little meta-programming magic, I wrote a loop that goes through a SQLAlchemy table object’s columns, explicitly defines each one as an attribute back on the object, and gives it the name c_<column name>.

Dynamically defining column attributes
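Here’s a minimal sketch of that loop, assuming t_names was reflected as shown earlier:

# Define each column as an attribute named c_<column name> on the
# Table object itself, so Jupyter's Tab completion can see it
for col in t_names.c:
    setattr(t_names, 'c_' + col.name, col)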

After running this loop, I can type t_names.c_ and then hit Tab to let Jupyter do its normal auto-complete thing, and we see a list of available column names. Tada!

Auto-complete table names

While the code I’ve shown so far will reflect a table and then tack on attributes for each column, you still need to reflect each table individually. One simple fix is to use SQLAlchemy’s Inspector object to grab a list of all the table names in a schema and then go through and reflect them individually. I tried that and found it incredibly annoying to have to wait for possibly hundreds of tables to finish reflecting before I could run anything.
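For reference, that eager approach might look something like this sketch, reusing the engine and meta from above:

from sqlalchemy import inspect

inspector = inspect(engine)
# Eagerly reflect every table in the schema (slow for big schemas)
all_tables = {
    name: Table(name, meta, autoload_with=engine)
    for name in inspector.get_table_names(schema='public')
}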

Instead, I like doing it on-demand. Essentially, for a schema named ‘public’, I create an empty class. I then use an Inspector to loop through all the table names, writing a little lambda for each one and tacking each on as a property of the Public class.
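The full notebook linked below has the real implementation; the sketch here captures the idea, using a small closure in place of the lambda so the getter can cache the reflected table and reuse the c_<column name> trick from earlier:

from sqlalchemy import inspect

inspector = inspect(engine)

def make_table_getter(table_name):
    def getter(self):
        # Reuse the cached Table if this one was already reflected
        if table_name in meta.tables:
            return meta.tables[table_name]
        table = Table(table_name, meta, autoload_with=engine)
        # Tack on the c_<column name> attributes for column auto-complete
        for col in table.c:
            setattr(table, 'c_' + col.name, col)
        return table
    return property(getter)

class Public:
    pass

for name in inspector.get_table_names(schema='public'):
    setattr(Public, name, make_table_getter(name))

public = Public()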

I’ve included the full Jupyter notebook for doing this magic below. The punchline is that in my Jupyter notebooks I can type public. and hit Tab to get a list of table names. Accessing public.names will reflect the table from the database on demand. And after said reflection, I then get auto-completion of column names, too.

Well, I hope the above was useful. If none of it made sense, I apologize for wasting your time. If your interest is piqued, you can see the full Jupyter code here. It will connect to a local Postgres instance and assume that a table called names exists in the public schema.
