$ whoami --role ms-bioinformatics

Zahera (Fathima) Khatoon

MS Bioinformatics Candidate | NGS & ML Pipelines | Computational Biology

Building reproducible NGS pipelines and machine learning models for biological data — focused on automation, measurable impact, and translational applications.

Austin, TX · Open to co-op & full-time roles
Program: MS · Bioinformatics
Focus: NGS · ML
Stack: Py · R · Bash
01 About

Building the bridge between wet lab insight and computational scale.

I'm an MS Bioinformatics candidate at Northeastern University (expected May 2027), with prior training in bioinformatics and microbiology from Osmania University. My graduate work sits at the intersection of structural biology, NGS genomics, and software engineering, with hands-on experience running SLURM jobs on HPC clusters and implementing core algorithms from scratch in Python.

Based in Austin, TX — actively seeking co-op and entry-level roles in computational biology, NGS analysis, ML for biology, and translational genomics.

  • Computational Biology

    NGS analysis, variant calling, structural modeling, and docking.

  • Machine Learning

    Deep learning for image segmentation and biological feature extraction.

  • Pipeline Engineering

    Reproducible, automated workflows with Git, Bash, and Python tooling.

02 Academic Projects

Graduate coursework & research projects.

A collection of bioinformatics projects from Northeastern and Osmania — spanning algorithms implemented from scratch, NGS pipelines on HPC, software packaging, and structural biology.

  • P-01 · Hackathon · Deep Learning
    Apr 2026

    Cell Segmentation ML Model

    BioHack 2026 — Northeastern University

    Developed an ML pipeline for automated cell segmentation within a 48-hour hackathon, applying image processing techniques to segment cell boundaries from microscopy data.

    • Delivered a working end-to-end segmentation pipeline in 48 hours
    • Applied classical image processing + ML for cell boundary detection
    • Benchmarked segmentation quality against expert-annotated microscopy images
    Stack
    • Python
    • OpenCV
    • NumPy
    • scikit-image
    View on GitHub
  • P-02 · Algorithms · Supervised ML
    Private · Mar 2026

    Phylogenetic Tree (UPGMA) & Sequence Classifier

    Northeastern University — BINF6400

    Implemented UPGMA from scratch in Python for phylogenetic tree construction; built a supervised classifier for coding vs. non-coding sequences using feature engineering.

    • UPGMA implemented from scratch — no external tree-building libraries
    • Feature-engineered coding vs. non-coding classifier on DNA sequences
    • Produced publication-style dendrogram visualizations
    Stack
    • Python
    • NumPy
    • scikit-learn
    • Matplotlib
    Private repo — available on request
  • P-03 · Genomics · HPC
    Private · Feb 2026

    NGS Quality Control Pipeline — E. coli & Rat

    Northeastern University — BINF6400

    Built an end-to-end paired-end QC pipeline on the Explorer HPC cluster (SLURM), achieving >95% read retention after trimming with automated MultiQC reporting.

    • >95% read retention across paired-end E. coli and rat datasets
    • SLURM job orchestration on Explorer HPC for batch processing
    • Automated MultiQC reports for reproducible QC gates
    Stack
    • Bash
    • FastQC
    • Trimmomatic
    • BBDuk
    • MultiQC
    • SLURM
    Private repo — available on request
  • P-04 · Genomics · Algorithms
    Private · Feb 2026

    Genome Assembly & ORF Identification

    Northeastern University — BINF6400

    Implemented a greedy overlap-based genome assembler with coverage visualization, plus a PWM motif-first ORF pipeline that detected 20 high-confidence ORFs across 20 input sequences.

    • Greedy overlap assembler with per-base coverage plots
    • PWM motif-first ORF discovery across 20 input sequences
    • Detected 20 high-confidence open reading frames
    Stack
    • Python
    • Biopython
    • NumPy
    • Matplotlib
    Private repo — available on request
  • P-05 · Software Engineering · OOP
    Private · Sep – Dec 2025

    Bioinformatics Python Package — CCDS Analysis

    Northeastern University — BINF6200

    Engineered a modular CCDS parsing package using OOP, decorators, and list comprehensions — achieving 96% test coverage and a perfect pylint score.

    • 96% unit test coverage with pytest
    • Perfect pylint score — 10.0 / 10.0
    • Modular OOP design with decorators and clean separation of concerns
    Stack
    • Python
    • pytest
    • pylint
    • OOP
    Private repo — available on request
  • P-06 · Algorithms · Regulatory Genomics
    Private · Sep – Dec 2025

    Sequence Alignment & Motif Analysis

    Northeastern University — BINF6200

    Implemented the Smith-Waterman and Needleman-Wunsch alignment algorithms; scored transcription factor binding motifs with PSSMs and correlated motif strength with gene expression data.

    • Dynamic programming: Smith-Waterman (local) & Needleman-Wunsch (global)
    • PSSM-based TF binding motif scoring
    • Correlated motif strength with gene expression signals
    Stack
    • Python
    • NumPy
    • Pandas
    • Biopython
    Private repo — available on request
  • P-07 · Structural Biology · Drug Discovery
    2022 – 2023

    Homology Modeling & Molecular Docking — GLT6D1

    Osmania University

    Structural analysis of GLT6D1 via BLAST, 3D homology modeling (MODELLER), and molecular docking (AutoDock); validated model quality with Ramachandran plot and DOPE scoring, identifying key binding-site interactions.

    • 3D homology model built with MODELLER and validated via Ramachandran analysis
    • DOPE-score model refinement and quality validation
    • AutoDock-based docking identified key binding-site interactions
    Stack
    • MODELLER
    • AutoDock
    • PyMOL
    • BLAST
    View on GitHub
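The P-01 card above mentions classical image processing for cell boundary detection. The hackathon code is not reproduced here; as a rough illustration of only the classical step (the ML component is omitted), this pure-Python sketch applies Otsu thresholding and then labels 4-connected foreground regions in a small grayscale image stored as a list of lists. The names otsu_threshold and label_cells are hypothetical; a real pipeline would more likely call scikit-image's threshold_otsu (skimage.filters) and label (skimage.measure) on NumPy arrays.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # 8-bit grayscale pixels, row-major (illustrative format)


def otsu_threshold(img: Image) -> int:
    """Gray level that maximizes between-class variance (Otsu's method).

    Pixels <= threshold are background; pixels > threshold are foreground.
    """
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue          # no background pixels yet
        if w_fg == 0:
            break             # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def label_cells(img: Image) -> List[List[int]]:
    """Threshold the image, then give each 4-connected foreground region its own label."""
    t = otsu_threshold(img)
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]  # 0 = background
    current = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] <= t or labels[r][c]:
                continue
            current += 1                   # new region: flood-fill it with BFS
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] > t and not labels[ny][nx]:
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels
```

On real microscopy, touching cells usually need an extra splitting step (for example a watershed) after this labeling, which the sketch does not attempt.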
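The P-02 card describes UPGMA implemented from scratch. Since that repository is private, the following is not the project's code but a minimal pure-Python sketch of the algorithm, assuming the distance matrix is given as a dict keyed by frozenset pairs of taxon names. At each step it merges the closest pair of clusters, places the new node at half their distance (so the tree is ultrametric), and recomputes distances as size-weighted averages. The function name upgma and the toy distances are illustrative only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def upgma(labels: List[str], dist: Dict[FrozenSet[str], float]) -> str:
    """Cluster taxa with UPGMA and return a rooted Newick string.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    Labels must be unique and free of Newick special characters.
    """
    dist = dict(dist)                          # copy; merged clusters are added below
    size = {lab: 1 for lab in labels}          # leaves under each cluster
    height = {lab: 0.0 for lab in labels}      # node heights (ultrametric tree)
    clusters = list(labels)

    while len(clusters) > 1:
        # Closest pair of current clusters (ties go to the first pair found).
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        h = dist[frozenset((a, b))] / 2.0      # merged node sits halfway
        # Cluster names are their own Newick subtrees, so merging is string building.
        merged = f"({a}:{h - height[a]:g},{b}:{h - height[b]:g})"
        size[merged] = size[a] + size[b]
        height[merged] = h
        # Size-weighted average distance from the new cluster to every other cluster.
        for c in clusters:
            if c not in (a, b):
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / size[merged]
        clusters = [c for c in clusters if c not in (a, b)] + [merged]

    return clusters[0] + ";"


if __name__ == "__main__":
    pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
             ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
    matrix = {frozenset(p): d for p, d in pairs.items()}
    print(upgma(["A", "B", "C", "D"], matrix))  # (D:5,(C:3,(A:1,B:1):2):2);
```

Representing each cluster by its Newick string keeps the sketch short; a fuller version would build explicit node objects so the tree can also be drawn as a dendrogram.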
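The P-03 card reports >95% read retention after trimming. The project's exact definition is not stated; assuming retention means surviving read pairs divided by input read pairs, it can be computed by counting 4-line FASTQ records in the raw R1 file and in the trimmer's paired R1 output (both mates are kept together in paired output, so counting R1 is enough). The function names and the 95% gate below are hypothetical, shown only to illustrate an automated QC check.

```python
import gzip
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming the standard 4-line layout per record."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; truncated FASTQ?")
    return n_lines // 4


def count_fastq_file(path: str) -> int:
    """Open a plain or gzip-compressed FASTQ file and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)


def retention_percent(raw_pairs: int, kept_pairs: int) -> float:
    """Percent of input read pairs that survive trimming with both mates kept."""
    if raw_pairs == 0:
        raise ValueError("no input reads")
    return 100.0 * kept_pairs / raw_pairs


def passes_qc_gate(raw_pairs: int, kept_pairs: int, min_percent: float = 95.0) -> bool:
    """Hypothetical QC gate: flag samples whose retention falls below a cutoff."""
    return retention_percent(raw_pairs, kept_pairs) >= min_percent
```

In a SLURM batch, each array task could call count_fastq_file on its sample's raw and trimmed R1 files (paths are placeholders) and write the result next to the MultiQC report.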
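The P-04 card mentions a greedy overlap-based assembler with per-base coverage plots. As a hedged sketch, not the project's implementation, the code below repeatedly merges the pair of sequences with the longest exact suffix/prefix overlap (at least min_len bases, an assumed parameter) until no overlaps remain, then computes per-base coverage by placing each read at its first exact match in the contig. Real reads contain sequencing errors and repeats, which this exact-match version ignores.

```python
from typing import List, Tuple


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair with the longest overlap until none remain."""
    unique = list(dict.fromkeys(reads))  # dedupe, keep input order
    # Reads contained in a longer read add no new sequence.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while True:
        best: Tuple[int, int, int] = (0, -1, -1)  # (overlap, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs               # no overlaps left
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]


def per_base_coverage(contig: str, reads: List[str]) -> List[int]:
    """Reads covering each base, placing each read at its first exact match."""
    cov = [0] * len(contig)
    for read in reads:
        start = contig.find(read)
        if start >= 0:
            for pos in range(start, start + len(read)):
                cov[pos] += 1
    return cov
```

For example, the four toy reads ATTAGACCTG, CCTGCCGGAA, AGACCTGCCG, and GCCGGAATAC assemble into the single contig ATTAGACCTGCCGGAATAC. The all-pairs search is quadratic per merge, which is fine for coursework-sized inputs but not for real genomes.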
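The P-05 card names OOP, decorators, and list comprehensions as the package's design tools. The package itself is private, so this is only a hypothetical sketch of that pattern: a dataclass for one record, a decorator that validates required fields before a method runs, and comprehensions for parsing. The 4-column tab-separated layout, the "[start-end, start-end]" location format, inclusive coordinates, and the example IDs (CCDS0001.1, GENEX) are all assumptions for illustration; a real CCDS file's columns and coordinate convention should be checked against its header and documentation.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, List, Tuple


def requires(*fields: str) -> Callable:
    """Decorator factory: raise ValueError if any named attribute is empty."""
    def decorator(method: Callable) -> Callable:
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            missing = [f for f in fields if not getattr(self, f)]
            if missing:
                raise ValueError(f"{type(self).__name__}: missing {', '.join(missing)}")
            return method(self, *args, **kwargs)
        return wrapper
    return decorator


@dataclass
class CCDSRecord:
    """One CCDS entry (hypothetical 4-column layout used for illustration)."""
    ccds_id: str
    gene: str
    strand: str
    locations: str  # assumed format: "[start-end, start-end, ...]"

    @classmethod
    def from_line(cls, line: str) -> "CCDSRecord":
        """Build a record from one tab-separated line."""
        ccds_id, gene, strand, locations = line.rstrip("\n").split("\t")
        return cls(ccds_id, gene, strand, locations)

    @requires("locations")
    def exons(self) -> List[Tuple[int, int]]:
        """Parse the location string into (start, end) integer pairs."""
        parts = self.locations.strip("[]").split(",")
        return [(int(s), int(e)) for s, e in (p.split("-") for p in parts)]

    @requires("locations")
    def cds_length(self) -> int:
        """Total coding length, assuming inclusive start and end coordinates."""
        return sum(end - start + 1 for start, end in self.exons())
```

With this layout, CCDSRecord.from_line("CCDS0001.1\tGENEX\t+\t[100-199, 300-449]") yields exons [(100, 199), (300, 449)] and a CDS length of 250. Keeping validation in a decorator lets every method that depends on a field share one check instead of repeating it.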
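The P-06 card covers Smith-Waterman (local) and Needleman-Wunsch (global) alignment. The two differ in only a few places in the dynamic-programming recurrence, which this minimal sketch makes explicit with a local flag. It returns the optimal score only (no traceback) and assumes a linear gap penalty with illustrative scores of +1 match, -1 mismatch, and -2 per gap; the project's actual scoring scheme is not stated.

```python
from typing import List


def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score with a linear gap penalty.

    local=False -> Needleman-Wunsch (global); local=True -> Smith-Waterman (local).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized along the first row/column.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                score = max(score, 0)      # local: restart instead of going negative
                best = max(best, score)    # local: best cell anywhere in the matrix
            H[i][j] = score
    return best if local else H[rows - 1][cols - 1]
```

For instance, ACGT scores 4 locally against TTACGTTT (the embedded exact match) but negative globally, because the global alignment must also pay for the four flanking bases as gaps.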
03 Education & Experience

My academic & professional timeline.

A combined timeline of graduate study at Northeastern, prior training at Osmania University, and direct patient-care experience — shaping my practical, cross-disciplinary approach to bioinformatics.

  1. E-01 · Education

    MS, Bioinformatics

    Northeastern University · Boston, MA · Remote

    Sep 2025 – May 2027 (Expected)
    • Coursework: BINF 6200 (Bioinformatics Programming) and BINF 6400 (Genomics & Computational Biology).
    • Focus: NGS pipelines on HPC (SLURM), algorithmic bioinformatics, and ML for biology.
  2. E-02 · Experience

    Medical Laboratory Technician

    Sunflower Memory Care Center · Cedar Park, TX

    Apr 2024 – May 2024
    • Monitored and recorded daily vitals for 10+ residents with Alzheimer's disease and other dementias.
    • Provided direct care support for 10+ cognitively impaired residents per shift, reducing incident response time.
    • Collaborated with 5+ clinical staff, documenting 100% of daily observations to support care continuity.
  3. E-03 · Education

    Postgraduate Diploma, Bioinformatics

    Osmania University · India

    Jun 2021 – Aug 2022
    • Structural biology focus: homology modeling, molecular docking, and Ramachandran validation.
    • Capstone: structural analysis of GLT6D1 via BLAST, MODELLER, and AutoDock.
  4. E-04 · Education

    BS, Microbiology, Chemistry & Bioinformatics

    Osmania University · India

    Jun 2005 – Aug 2008
    • Triple-major foundation spanning molecular microbiology, chemistry, and computational biology.
04 Technical Skills

A toolkit built for reproducible science.

Languages, bioinformatics tools, and HPC platforms I use across coursework and research — from raw FASTQ to annotated VCF and beyond.

Languages (6)
  • Python
  • R
  • Bash / Shell
  • SQL
  • C
  • C++

Bioinformatics Tools (14)
  • FastQC
  • Trimmomatic
  • BBDuk
  • MultiQC
  • BWA
  • Bowtie2
  • SAMtools
  • GATK
  • IGV
  • BLAST
  • MUSCLE
  • MODELLER
  • AutoDock
  • PyMOL

NGS & Genomics (8)
  • Paired-end QC
  • Genome Assembly
  • ORF Identification
  • Variant Calling
  • Phylogenetics (UPGMA)
  • Smith-Waterman
  • Needleman-Wunsch
  • PWM / PSSM Motifs

Platforms & Formats (9)
  • Linux / Unix
  • HPC (SLURM)
  • Git / GitHub
  • Jupyter
  • Conda
  • FASTQ
  • FASTA
  • BAM
  • VCF
05 Contact

Let's build something meaningful together.

Open to roles and collaborations in computational biology, ML for biology, and translational genomics. Based in Austin, TX — remote friendly.

Open to opportunities in Austin, TX.

Curriculum Vitae

Full résumé with publications, projects, and technical proficiencies.

View CV (print to PDF)