Let’s say that there is a table that needs to be filled or updated with new rows and values. How is this possible using SQLAlchemy? Well, luckily, with the use of good old Python dictionaries and the .to_sql() method, we can manipulate our tables!
Let’s take a look at an example below:
First, let’s create a connection to our Postgres database
```python
import os

import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine

# Load the database credentials from a .env file
load_dotenv('.env')

user = os.environ['login']
password = os.environ['password']
dbname = os.environ['dbname']
host = os.environ['host']
port = os.environ['port']

engine = create_engine(
    f"postgresql://{user}:{password}@{host}:{port}/{dbname}"
)
connection = engine.connect()
```
Next, let’s query the “test” table in our public schema:
```python
pd.read_sql_query("SELECT * FROM public.test", connection)
```
Let’s say this is our output, and we know that ‘abc’ and ‘def’ are columns in the test table.
```
abc  def
---  ---
```
Now, one crucial thing to consider is the data type of each field. You can find them with:
```python
test_data_types_query = """
SELECT column_name, data_type, table_schema, table_name
FROM information_schema.columns
WHERE table_name = 'test'
"""
pd.read_sql_query(test_data_types_query, connection)
```
Output:
```
  column_name          data_type table_schema table_name
0         abc  character varying       public       test
1         def  character varying       public       test
```
Ok, so both fields are strings. Let’s build a matching DataFrame:
```python
test_data = pd.DataFrame(
    {
        'abc': ['a', 'b', 'c'],
        'def': ['1', '2', '3']  # values must be strings to match the column types
    }
)
```
Now let’s send the data to the table and check the table.
```python
test_data.to_sql(
    name='test',
    con=connection,
    if_exists='append',
    index=False
)
connection.commit()

pd.read_sql_query("SELECT * FROM public.test", connection)
```
The table has been updated!
```
  abc def
0   a   1
1   b   2
2   c   3
```
Ok, well that was quite a lot to take in, but the general idea is to use the .to_sql() method with the specified parameters to make changes to the Postgres database. The parameters used here are:
- ‘name’: The name of the SQL table
- ‘con’: The database connection, in this case, we already declared the ‘connection’ variable
- ‘index’: Whether to write the DataFrame index as a column in the table
- Passing ‘True’ (or omitting the parameter, since ‘True’ is the default) will write the DataFrame index as a column in the SQL table, while ‘False’ will exclude it.
- ‘if_exists’: If the table exists and has some data, we can specify what to do. Options include ‘fail’, ‘replace’, and ‘append’.
- Using ‘fail’ will raise an error if the table already exists, which can be used to prevent accidental data loss.
- Using ‘replace’ will drop the existing table, recreate it, and insert whatever we just passed to .to_sql().
- Using ‘append’ will add the new rows to the table if it does exist.
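The three ‘if_exists’ modes and the ‘index’ flag can be seen side by side in a small sketch. Again, an in-memory SQLite engine and the table name ‘test_with_index’ are assumptions made purely for illustration; ‘if_exists’ behaves the same way against the Postgres connection:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical in-memory SQLite engine, standing in for Postgres
engine = create_engine("sqlite://")
df = pd.DataFrame({"abc": ["a"], "def": ["1"]})

with engine.connect() as connection:
    df.to_sql("test", connection, index=False)                       # table doesn't exist yet: created
    df.to_sql("test", connection, if_exists="append", index=False)   # adds a second row
    df.to_sql("test", connection, if_exists="replace", index=False)  # drops and recreates: one row again

    try:
        df.to_sql("test", connection, if_exists="fail", index=False)
    except ValueError as err:
        print(err)  # 'fail' is the default: refuse to touch an existing table

    n_rows = len(pd.read_sql_query("SELECT * FROM test", connection))

    # index=True (the default) writes the DataFrame index as an extra column
    df.to_sql("test_with_index", connection, index=True)
    cols = list(pd.read_sql_query("SELECT * FROM test_with_index", connection).columns)
```

After the ‘replace’ call the table holds a single row, and the ‘fail’ call raises ValueError instead of writing anything, which is why it is a safe default.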