Insights with Ian: Genomics meets aquaculture

Genomics has transformed every corner of the life sciences over the past three decades. Aquaculture has been no exception. In this Insights with Ian post, we explore how this revolution unfolded, and what it means for modern breeding programmes.

A short history

Our story begins with the launch of the Human Genome Project (HGP) in 1990. Using a map-based Sanger sequencing strategy, it took an international consortium of scientists until February 2001 to publish a working draft. Even then, only around 90% of euchromatic DNA was covered, and there were numerous gaps and errors. The decision to release data as it was obtained proved a major success, fostering much technological innovation and a rapid expansion of genome science.

A proprietary human genome project was subsequently launched in 1998 with the formation of Celera Genomics. Increased computing power, the adoption of whole-genome shotgun sequencing and access to the HGP data enabled the rival Celera draft to be published in just three years, at the same time as the HGP paper. The HGP officially ended in 2003 with the completion of a much more accurate reference sequence with an error rate of less than 1 in 10,000 bases. By then, the project was estimated to have cost US$2.7 billion. Around 8% of the genome remained unresolved, largely composed of heterochromatic regions (chromosomal regions with tightly packed histones and repetitive DNA sequences), which are often found at the centre and ends of chromosomes. It took the development of new long-read sequencing technologies (PacBio/Nanopore) to resolve these inaccessible regions. The first truly gap-free human genome sequence was not published until 2022, some 32 years after the start of the HGP.

Sequencing aquaculture genomes

One interesting side consequence of the HGP was the sequencing of the Japanese pufferfish (Takifugu rubripes) genome (Fig. 1) with a paper published in 1993. This species was not chosen for its culinary appeal or importance in aquaculture, but because its genome was 7-8 times smaller than that of humans (365 million base pairs versus 2.85 billion base pairs) while containing a similar number of protein-coding genes.

Fig. 1. The Tiger pufferfish (Takifugu rubripes), the first fish with a sequenced genome.

By 2010 an initial sequence had been obtained for the much larger genome of Atlantic salmon (Salmo salar), the most commercially important aquaculture species. This achievement involved the combined efforts of researchers from Canada, Norway and Chile.

Fast forward to 2026 and hybrid, multi-platform approaches using a combination of long-read and short-read (Illumina) sequencing plus advanced scaffolding techniques have made chromosome-level genome assemblies at 30x coverage both fast and cheap to produce (30x coverage is suitable for most scientific and practical applications). The Vertebrate Genomes Project aims to finish high-quality genome assemblies for all 70,000 vertebrate species within 10 years. Currently, roughly 1-3 days of pipeline work are needed for each 30x genome, at a cost of approximately US$1,000-1,500. Even high-end long-read sequencing (PacBio/Nanopore) remains orders of magnitude cheaper than the HGP, at US$3,000–6,000 per genome.
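To give a feel for what "30x coverage" means in practice, here is a minimal sketch of the usual back-of-envelope relationship: mean coverage is the total number of sequenced bases divided by the genome size. The genome sizes are the pufferfish and human figures quoted earlier in this post. The 150 bp read length is an assumption for illustration only, and the function names are hypothetical.

```python
import math

def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean depth = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp

def reads_needed(genome_size_bp: int, coverage: float, read_length_bp: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)

# Genome sizes quoted in this post; the read length is an assumed value.
GENOMES_BP = {"pufferfish": 365_000_000, "human": 2_850_000_000}
READ_LENGTH_BP = 150  # assumption: typical short-read length

for name, size_bp in GENOMES_BP.items():
    n = reads_needed(size_bp, 30, READ_LENGTH_BP)
    print(f"{name}: {n:,} reads of {READ_LENGTH_BP} bp for 30x coverage")
```

This ignores uneven coverage, duplicates and reads that fail quality filters, so real projects usually sequence somewhat more than the bare minimum.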

The result is an exponential decrease in the time and cost required to sequence an individual genome. Numerous companies provide commercial services for genome sequencing and assembly, enabling scores of genomes from commercially important aquaculture species to be sequenced. Genomics has therefore become truly democratised, meaning the sequencing of any genome of aquaculture interest is now easily within the reach of individual research institutes and companies.

Genome annotation

Chromosome-level genome sequences are simply long lists of nucleotides and require annotation to discover the location of protein-coding genes and their various regulatory elements dispersed across the genome. These include proximal promoter sequences, which initiate gene transcription, as well as enhancers or silencers that influence how transcription factors bind to promoters to modulate gene expression.

Still other areas of the genome are involved in the expression of non-coding RNAs, e.g., microRNAs and long non-coding RNAs, which have important regulatory roles. Various assays and experiments have been devised for the functional annotation of genomes. The task is so large that it requires whole communities of scientists working with standardised methods. A good example from aquaculture is the EU Horizon 2020 project AQUA-FAANG “Advancing European Aquaculture by Genome Functional Annotation”. This project involved leading research groups and companies from 9 European countries, with the goal of functionally annotating the genomes of six commercially important fish species. The main findings of the project and a roadmap for future scientific exploration and exploitation of the data have been published in an Industry White Paper. The data is hosted on the Ensembl genome browser based at the European Bioinformatics Institute (EBI) located at the Wellcome Genome Campus in Hinxton, UK. Ensembl plays a pivotal role in making high quality annotated genomes publicly available and it has developed tools to visualise and interrogate increasingly rich layers of data. The latest version of Ensembl now provides access to over 4,700 annotated genomes and several pangenomes (e.g., 27 pig breeds).

Genomic selection of candidate broodstock

Genomic prediction models allow more accurate estimates of the genetic merit than traditional methods based on pedigree information alone (as discussed in an earlier post). Consequently, and as the cost of high-density genotyping has fallen, there has been a rapid and widespread adoption of genomic selection for aquaculture breeding.

Single-step Genomic BLUP (Best Linear Unbiased Prediction) methods assume that all SNP effects are drawn from the same distribution with a common variance, and are suitable for traits controlled by many genes distributed across the genome, as is often the case (e.g., for growth traits). They have the advantage that some animals with only phenotype information can be included in the training set required for the prediction model, which increases selection accuracy. Bayesian models of various types have been proposed which may outperform linear models for traits where one or more sequence regions are responsible for a large proportion of the phenotypic variation (QTLs, Quantitative Trait Loci). Genome Wide Association Studies (GWAS), which require thousands of animals to achieve statistical power, can be used to identify QTLs for use in direct selection or to weight genomic prediction models.
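For readers who like to see the moving parts, the sketch below builds a genomic relationship matrix (G) from SNP genotypes coded as 0, 1 or 2 copies of an allele, using the widely cited VanRaden (2008) first method. It is an illustration, not the method used by any particular breeding programme. It leaves out the single-step part, where G is blended with the pedigree relationship matrix, and the toy genotypes are invented for the example.

```python
from typing import List

Matrix = List[List[float]]

def allele_frequencies(genotypes: List[List[int]]) -> List[float]:
    """Frequency of the counted allele at each SNP (genotypes coded 0/1/2)."""
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snps)]

def genomic_relationship_matrix(genotypes: List[List[int]]) -> Matrix:
    """G = Z Z' / (2 * sum(p * (1 - p))), with Z = genotypes centred by 2p."""
    p = allele_frequencies(genotypes)
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    z = [[g - 2 * pj for g, pj in zip(row, p)] for row in genotypes]
    n = len(z)
    # Every SNP gets the same weight, mirroring the common-variance assumption.
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]

# Invented toy data: 4 animals (rows) x 4 SNPs (columns).
TOY_GENOTYPES = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [0, 2, 1, 0],
]

G = genomic_relationship_matrix(TOY_GENOTYPES)
for row in G:
    print(["%.3f" % value for value in row])
```

In a real analysis G would be computed for thousands of animals and tens of thousands of SNPs with dedicated software, then used in the mixed-model equations to predict breeding values.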

The genomes of oysters and other molluscs deserve a special mention for some genetic peculiarities impacting selective breeding strategies (reviewed here). Not least is their very high level of genetic diversity compared to shrimp or fish. This is perhaps related to much higher mutation rates because of the sheer number of meiosis events needed to produce tens of millions of eggs per mature individual over the spawning season. The high degree of polymorphism found in molluscan genomes presents challenges both for their assembly and for the general applicability of scientific results in breeding applications.

Genome structural variation (SV)

Most genomic selection models only consider single nucleotide polymorphisms or haplotypes (groups of linked SNPs) when making breeding decisions. This is not ideal because it ignores the various types of structural variation that may contribute to the additive genetic variance responsible for differences in phenotypes. The next section considers how understanding genome structural variation might contribute to improved breeding decisions. There are three main classes of genome structural variation, namely chromosomal inversions (CI), indels (insertions/deletions) and copy number variation (CNV).

Fig. 2. Classes of structural variation in genomes.

The AQUA-FAANG project reported genome-wide surveys of structural variants (SVs) including inversions, deletions and duplications in two important aquaculture species. A combination of short-read sequencing, bioinformatic tools and manual curation was used to produce a map of 15,483 high-confidence SVs in Atlantic salmon with a true positive SV presence/absence plus genotype call rate of 81% based on validation by PCR and long-read sequencing (MinION) at 50x coverage. Some 21,428 SVs were identified in European sea bass. In both species, many SVs were found near or overlapping functional genomic features, including protein-coding genes, long non-coding RNA genes, and pseudogenes. Significant differences in SV allele frequency were found between farmed and wild populations, particularly in regions linked to brain, nervous system function and behaviour. This is of interest because domestication is known to alter many aspects of fish behaviour including surface feeding and a diminution of the startle response.

Significance of SVs for aquaculture breeding

Chromosomal inversions

A chromosomal inversion (CI) is a structural rearrangement in which a segment of chromosome breaks in two places, flips 180° and reinserts in the reverse orientation. CIs, which can span several megabases of sequence, suppress recombination within the inverted region so that genes inside the inversion are inherited together as a block. This is thought to allow co-adapted gene complexes to contribute to local adaptation notwithstanding the presence of gene flow between populations. The main methods for identifying CIs are described in Table 1. Once identified, CIs can be conveniently genotyped by PCR-based methods.

Table 1: Strengths and weaknesses of main methods for identifying CIs.

A cool example of the importance of CIs was provided in the March 2026 issue of the journal Science. The paper studied Atlantic silverside (Menidia menidia), which are distributed along an extreme latitudinal temperature gradient on the Eastern Seaboard of North America. Three large CIs on multiple chromosomes were associated with growth rate, body shape, lipid content and vertebral number. Cross-breeding experiments established that each inversion acted as a genetic switch, with contrasting selection patterns across latitudes enabling fine-tuned responses to temperature despite gene flow. Three large putative CIs (12-22 Mb) in the King scallop (Pecten maximus) are also strongly associated with sea surface temperature and may explain why differences in reproductive timing at relatively small spatial scales are maintained across King scallop populations (see here).

The genomes of rainbow trout (Oncorhynchus mykiss) contain a large 55 Mb inversion on chromosome 5 and a smaller 14 Mb inversion on chromosome 20. The 55 Mb inversion, containing more than 1,000 genes, is strongly associated with anadromy vs residency (i.e. steelhead vs resident trout) and consequently with differences in developmental timing, smoltification and sexual maturation. The frequency of the derived inversion increases at high latitudes, but it is not present in all populations, such as those in Southeast Alaska.

CIs are currently underutilised in aquaculture breeding and would benefit from more research. Programmes could tailor selection strategies to the inversion frequencies present in their own broodstock. For example, selection for the anadromy-associated CI would be expected to improve growth and survival in rainbow trout transferred to sea water net pens. Selection for an increased frequency of CIs associated with warmer temperatures could also prove worthwhile in species impacted by climate change. A consideration of CIs may also strengthen breeding decisions more generally by improving prediction accuracy in genomic selection. CI genotypes could be treated as fixed or random effects, or encoded as haplotypes. Alternatively, multi-kernel models could be used in which SNPs inside inversion regions form a separate relationship matrix, or CIs could be incorporated into a reaction-norm framework.
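As a rough sketch of the data preparation these options imply, the code below does two things. It codes each animal's inversion karyotype as a 0/1/2 dosage that could enter a model as a fixed or random effect. It also splits SNPs into those inside and outside the inversion so that each set could feed its own relationship matrix in a multi-kernel model. The karyotype labels ("N" for the standard arrangement, "I" for inverted), the coordinates and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical labels: N = standard arrangement, I = inverted arrangement.
KARYOTYPE_DOSAGE: Dict[str, int] = {"NN": 0, "NI": 1, "IN": 1, "II": 2}

def inversion_dosage(karyotypes: List[str]) -> List[int]:
    """Number of inverted copies carried by each animal (0, 1 or 2)."""
    return [KARYOTYPE_DOSAGE[k] for k in karyotypes]

def split_snps_by_inversion(positions_bp: List[int], inv_start_bp: int,
                            inv_end_bp: int) -> Tuple[List[int], List[int]]:
    """Return (inside, outside) SNP column indices for one chromosome."""
    inside, outside = [], []
    for idx, pos in enumerate(positions_bp):
        (inside if inv_start_bp <= pos <= inv_end_bp else outside).append(idx)
    return inside, outside

def subset_columns(genotypes: List[List[int]], cols: List[int]) -> List[List[int]]:
    """Keep only the chosen SNP columns for every animal."""
    return [[row[c] for c in cols] for row in genotypes]

# Invented example: five SNPs on one chromosome, inversion from 10 Mb to 65 Mb.
snp_positions = [1_000_000, 12_000_000, 30_000_000, 58_000_000, 70_000_000]
genotypes = [[0, 1, 2, 1, 0], [2, 1, 0, 1, 2], [1, 1, 1, 0, 2]]

inside, outside = split_snps_by_inversion(snp_positions, 10_000_000, 65_000_000)
print("dosage:", inversion_dosage(["NN", "NI", "II"]))
print("SNPs inside inversion:", subset_columns(genotypes, inside))
print("SNPs outside inversion:", subset_columns(genotypes, outside))
```

Which of these encodings actually improves prediction accuracy is an empirical question for each population and trait.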

Indels and Copy Number Variation (CNVs)

Indels that increase or decrease the number of copies of a gene (CNVs) can change the amount of gene product produced. Because many aquaculture species rely heavily on innate immunity and rapid growth, these dosage changes can have large phenotypic effects. A strong example is the NOD-like receptor (NLR) gene family studied in the model fish species zebrafish (Danio rerio), which has 1,500 copies of NLR genes. Copy number varies widely between individuals and populations, with only 4% of the genes present in 80% of the individuals. NLR genes are thought to help mediate innate immune responses, acting as pattern recognition receptors or components of inflammasomes. Around 50% of the copies in zebrafish were monomorphic and the rest showed low genetic diversity. Variation in NLR copy numbers may therefore influence disease resistance, which is one of the most important breeding targets in finfish and shellfish. In another example, evidence was obtained that a sub-class of NLR gene in Pacific whiteleg shrimp (Litopenaeus vannamei) functions in viral recognition and host immunity following infection with White Spot Syndrome Virus (WSSV), one of the most devastating diseases in this species.

Short tandem repeats (STRs), a form of repetitive indel variation, are particularly abundant in penaeid shrimp genomes. In several shrimp species including L. vannamei, STRs make up 26–32% of the genome. A study identified 84 STR loci significantly associated with body weight, with some showing a direct linear correlation between repeat number and growth phenotype. STRs are therefore an interesting target that could be explored for genetic selection in shrimp.
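To make "repeat number" concrete, here is a small sketch that counts the longest uninterrupted run of a repeat motif in a DNA sequence. The sequence and motif are invented, and real STR genotyping relies on specialised tools working from aligned reads rather than a simple pattern match like this.

```python
import re

def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    motif = motif.upper()
    # Match one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)

# Invented example: a (CA)7 repeat followed later by a shorter (CA)3 repeat.
example = "ACGT" + "CA" * 7 + "GGT" + "CA" * 3 + "TTG"
print(longest_tandem_run(example, "CA"))
```

Two animals carrying different repeat numbers at the same locus would return different values here, and it is that number which the study related to body weight.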

Role of microbiome

Fish guts contain a complex microbial community of bacteria, archaea, yeast and fungi, collectively known as the microbiome. Co-operation between the intestinal microbiome and host genes is important for nutrient absorption, immune function and growth (see here). Phenotypic traits heavily influenced by the composition of the gut microbiome include those related to behaviour and metabolic regulation (reviewed here). Genetic variation in the host genome is thought to shape the composition and function of the microbiome, presumably via the production of various metabolites and signalling molecules. Gilthead sea bream selected for fast growth had a distinct microbiome composition and a greater tolerance of plant material in the diet compared to fish selected for intermediate or slow growth. The fast growth line also coped better following infection by intestinal parasites. Rainbow trout lines selected over multiple generations for either fast or slow growth showed a different representation of microbial taxa. Trout bred for fast growth were enriched in cellulose- and amino acid–fermenting taxa whereas fish bred for slow growth contained higher proportions of opportunistic pathogens (see here). Selection for cold tolerance in whiteleg shrimp and Nile tilapia improved low-temperature performance of both host and microbiome, indicating they may function as a single selection unit (the hologenome theory of evolution).

Understanding the mechanisms behind host–microbe interactions, and the genetic factors that shape them, is an active and important area of research. As this field develops, microbiome‑informed breeding strategies could become a valuable tool for improving aquaculture species.

Future thoughts

Historical investments in genomics research and related technologies (engineering, software development, and sequencing chemistry) have returned very handsome dividends. The structure and function of aquaculture genomes are now better understood, although much remains to be discovered, particularly given the enormous range of species under commercial cultivation. Genomic selection models are improving aquaculture breeding outcomes for many species. As knowledge expands, more sophisticated GS models will likely appear, offering the prospect of still more accurate predictions and faster rates of genetic gain. Rigorous bioeconomic modelling of the benefits and additional costs associated with any future developments will be required to establish their commercial viability.

Prof. Ian Johnston, Co-founder and consultant to Xelect
