The current project I'm working on has database tables that are over 2 million records which I'm populating from an external source. The external source provides a postgres dump file to help make data imports easier. My current approach is to restore the backup to a temporary database and use the PostgreSQL COPY command to export the data to CSV files. The import works great, I've built a django management comand to read the data from a CSV file, hydrate the data models and then bulk load the data using Django's bulk upload. This all works smoothly when loading the data the first time.
Some of the datasets from the data load are big so I have to resort to using Django's Raw query feature to perform my import process. When I tried to run a raw delete command during the import process I got the following error:
update or delete on table violates foreign key constraint on table DETAIL: Key (id) is still referenced from table.
After doing some digging I noticed the Django ORM does not actually create ON CASACADE DELETE at the database level. Django actually iterates over the records via the ORMs delete method to perform the cascade. This is a huge problem because of the size of the tables. I need to be able to drop down to raw SQL when deleting the table.
In order to run the raw SQL delete query I needed to add ON CASCADE DELETE to the appropriate foreign key fields. I also need to tell the django ORM to create the ON CASCADE DELETE at the database level. This poses a problem because the django ORM does not support CASCADE ON DELETE at the database level. To fix this I need to disable constraints on appropriate foriegn key fields and manually add the constraint using raw SQL. The first thing I did was remove the current constraints being created by the django ORM. I did this by passing the db_constraint argument.
Class YourModel(models.Model): your_field = models.ForeignKey(YourOtherModel, db_constraint=False)
This will tell your database engine to create the foreign key field but it will not add a constraint to the field. Once I disabled all constraints on the necessary ForeignKey fields, I created a custom migration using djangos run sql migration command. This allowed me to add ON CASCADE DELETE to the database column. Since I disabled the constraint on the ORM I did not have to look for existing constraints and delete them, I just manually added the constraints using the migrations framework.
class Migration(migrations.Migration): dependencies = [ ('yourapp', '0001_initial'), ] operations = [ migrations.RunSQL( "ALTER TABLE yourapp_yourmodel ADD CONSTRAINT your_model_target_id_fk_id foreign key (target_id) references yourapp_yourothermodel(id) ON DELETE CASCADE", reverse_sql="ALTER TABLE yourapp_yourmodel DROP CONSTRAINT your_model_target_id_fk_id") ]
Obviously this can get pretty cumbersome and difficult to manage if you have a bunch of tables with foreign key constraints. This also leaves a bit of cognitive overhead as you need to remember where you manually created a custom migration or parts of your code will break. In order to address this I'm going to build a custom ForeignKey field that will build the ON DELETE CASCADE constraints. This field may not work on all databases but I'm heavily invested in PostgreSQL so I'm ok with that trade-off. I'll post an update once I have built the custom field.