Uploading Data to a Table with SQLAlchemy

Let’s say that there is a table that needs to be filled or updated with new rows and values. How is this possible using SQLAlchemy? Well, luckily, with the help of good old Python dictionaries and the .to_sql() method, we can manipulate our tables!

Let’s take a look at an example below:

First, let’s create a connection to our Postgres database

import pandas as pd
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv('.env')

user = os.environ['login']
password = os.environ['password']
dbname = os.environ['dbname']
host = os.environ['host']
port = os.environ['port']


engine = create_engine(
    (
        f"postgresql://{user}:"
        f"{password}"
        f"@{host}:{port}/{dbname}"
    )
)

connection = engine.connect()
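For reference, the `.env` file loaded above might look something like this. The key names just have to match what the code reads from os.environ; all of the values below are placeholders, not real credentials:

```shell
login=my_user
password=my_password
dbname=my_database
host=localhost
port=5432
```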

Let’s query the “test” table in our public schema

pd.read_sql_query("SELECT * FROM public.test", connection)

Let’s say this is our output, and we know that ‘abc’ and ‘def’ are columns in the test table.

	abc	def
    ---    ---

Now, one crucial thing to consider is the data type of each field. We can look those up with a query against information_schema.columns:

test_data_types_query = """
    SELECT column_name, data_type, table_schema, table_name
    FROM information_schema.columns
    WHERE table_name = 'test'
"""
pd.read_sql_query(test_data_types_query, connection)

Output:

  column_name          data_type  table_schema  table_name
0         abc  character varying        public        test
1         def  character varying        public        test

Ok, so both fields are strings.

test_data = pd.DataFrame(
    {
        'abc': ['a','b','c'],
        'def': ['1','2','3']  # these must be strings to match the character varying columns
    }
)
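Before uploading, it can help to confirm that the DataFrame’s dtypes line up with the table’s column types. pandas stores plain Python strings under the 'object' dtype, which maps to a text type such as character varying:

```python
import pandas as pd

test_data = pd.DataFrame(
    {
        'abc': ['a', 'b', 'c'],
        'def': ['1', '2', '3']
    }
)

# Both columns hold Python strings, so pandas reports the 'object' dtype for each.
print(test_data.dtypes)
```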

Now let’s send the data to the table and check the table.

test_data.to_sql(
    name='test',
    con=connection,
    if_exists='append',
    index=False
)

connection.commit()  # persist the insert; SQLAlchemy 2.x connections are transactional

pd.read_sql_query("SELECT * FROM public.test", connection)

The table has been updated!

    abc	def
0	a	1
1	b	2
2	c	3

Ok, well that was quite a lot to take in, but the general idea is to use the .to_sql() method with the right parameters to make changes to the Postgres database. The parameters used here are:

  • ‘name’: The name of the SQL table
  • ‘con’: The database connection; in this case, the ‘connection’ variable we declared earlier
  • ‘index’: Whether to write the DataFrame index as a column in the table
    • ‘True’ (the default, so omitting the parameter has the same effect) will write the DataFrame index as a column in the SQL table, while ‘False’ will exclude it.
  • ‘if_exists’: If the table exists and has some data, we can specify what to do. Options include ‘fail’, ‘replace’, and ‘append’.
    • Using ‘fail’ will raise an error if the table already exists, which can be used to prevent accidental data loss.
    • Using ‘replace’ will drop the existing table and recreate it with only the rows we just passed to .to_sql().
    • Using ‘append’ will add the new rows to the table if it already exists (and create it if it doesn’t).
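The three ‘if_exists’ modes can be seen side by side in a small sketch. This uses an in-memory SQLite engine purely for illustration (so it runs without a Postgres server), but the behaviour is the same when ‘con’ points at our Postgres connection:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the Postgres database here.
engine = create_engine("sqlite://")

df = pd.DataFrame({'abc': ['a', 'b'], 'def': ['1', '2']})

# 'fail' only raises if the table already exists; here it doesn't, so the table is created.
df.to_sql(name='test', con=engine, if_exists='fail', index=False)

# 'append' adds the same two rows again: the table now holds 4 rows.
df.to_sql(name='test', con=engine, if_exists='append', index=False)

# 'replace' drops and recreates the table, leaving only the 2 rows just written.
df.to_sql(name='test', con=engine, if_exists='replace', index=False)

count = pd.read_sql_query("SELECT COUNT(*) AS n FROM test", engine)['n'][0]
print(count)  # 2
```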