Analysing and sharing genetic data with Python
We give an overview of how Python is used in bioinformatics, focusing on high-throughput DNA sequencing and clinical genetic variant detection and classification.
We show our Django application, VariantGrid, developed by SA Health, which has also been used as a base for other projects, including the Australian Genomics project Shariant – the first platform used across Australian and New Zealand diagnostic labs for sharing variant classifications.
See this talk and many more by getting your ticket to PyCon AU now!
Python is one of the most popular languages in bioinformatics, a discipline at the intersection of biology and computing. Software has become critical to delivering modern healthcare because of the enormous amounts of data that must be managed. The latest DNA sequencers produce over a terabyte of data per day.
We give a brief overview of how DNA sequencing is performed clinically, and how Python fits into software processing (workflow systems like Snakemake acting as a ‘glue’ for High Performance Computing, as well as a quick prototyping language for custom research questions).
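As a rough illustration of the quick-prototyping role, here is a minimal sketch that counts passing variants per chromosome. It assumes variants arrive as VCF-style tab-separated lines (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the sample records and the function name are hypothetical and are not taken from the talk.

```python
from collections import Counter

# Hypothetical VCF-style data lines (made-up positions, for illustration only).
# Columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
vcf_lines = [
    "##fileformat=VCFv4.2",
    "chr21\t100\t.\tG\tA\t50\tPASS\t.",
    "chr21\t200\t.\tC\tT\t99\tPASS\t.",
    "chr17\t300\t.\tT\tG\t20\tLowQual\t.",
]


def passing_variants_per_chrom(lines):
    """Count variants per chromosome, keeping only records whose FILTER is PASS."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip header and metadata lines
            continue
        chrom, _pos, _id, _ref, _alt, _qual, filt = line.split("\t")[:7]
        if filt == "PASS":
            counts[chrom] += 1
    return counts


counts = passing_variants_per_chrom(vcf_lines)
print(dict(counts))
```

A question like this can be answered in a few lines without any dedicated tooling, which is the kind of ad hoc analysis a prototyping language suits.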
We show a technical overview of VariantGrid, a Django application developed by SA Health for storing, analysing and classifying genetic variants, which processes around 6000 patients a year.
An analysis may involve multiple patients with a million variants each, and VariantGrid allows users to perform custom analyses by building variant filters via an interactive drag and drop node-graph interface. This is implemented as a Directed Acyclic Graph of Django Q objects, which are logically combined to generate a query.
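The idea of combining filter nodes along a graph can be sketched in plain Python. This is not VariantGrid's code: `FilterNode`, the variant dictionaries and the field names are hypothetical, and plain predicate functions stand in for Django Q objects. In a Django setting each node would presumably hold a Q object, the combination would use `&` (or `|`), and the result would go to a single queryset `filter()` call.

```python
from dataclasses import dataclass, field
from typing import Callable

Variant = dict  # hypothetical variant record
Predicate = Callable[[Variant], bool]


@dataclass
class FilterNode:
    """One node in the analysis graph; parents are its upstream filters."""
    name: str
    predicate: Predicate
    parents: list["FilterNode"] = field(default_factory=list)

    def combined(self) -> Predicate:
        """AND this node's predicate with the combined predicates of all parents."""
        parent_preds = [p.combined() for p in self.parents]
        return lambda v: self.predicate(v) and all(p(v) for p in parent_preds)


# Hypothetical data and a small diamond-shaped graph:
# all -> rare, all -> damaging, and both feed into candidates.
variants = [
    {"gene": "RUNX1", "af": 0.001, "impact": "HIGH"},
    {"gene": "RUNX1", "af": 0.2, "impact": "HIGH"},
    {"gene": "BRCA1", "af": 0.0005, "impact": "LOW"},
]
all_variants = FilterNode("all", lambda v: True)
rare = FilterNode("rare", lambda v: v["af"] < 0.01, [all_variants])
damaging = FilterNode("damaging", lambda v: v["impact"] == "HIGH", [all_variants])
candidates = FilterNode("candidates", lambda v: True, [rare, damaging])

query = candidates.combined()
result = [v for v in variants if query(v)]
print(result)
```

Because the graph is acyclic, the recursive walk over parents always terminates, and a node with several parents yields the intersection of their filters.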
VariantGrid has been released as open and free for research use on GitHub: https://github.com/SACGF/variantgrid.
The system is also used as a technology platform for other variant database projects such as RUNX1 - an international collaboration for a rare form of leukemia, and Shariant - an Australian Genomics project for sharing variant knowledge between Australian and New Zealand diagnostic labs. Shariant has shared more than 23,000 variant submissions from 13 labs to date (Tudini et al. 2022, PMID: 36332611), enabling laboratories to provide faster and more accurate diagnoses and care for patients.
David Lawrence worked as a mostly-Java programmer for 9 years before becoming a bioinformatician in 2011, when he soon fell in love with Python. He works for the ACRF Cancer Genomics Facility at SA Pathology and UniSA, doing translational research for SA Health.
Google Scholar: https://scholar.google.com.au/citations?user=mnHz4gluptkC&hl=en