Open-access databases such as the European Nucleotide Archive (ENA) contain more than 2.4 million bacterial genomes, and this number continues to grow rapidly. Until now, searching these vast ...
Sequence alignment algorithms form the foundation of comparative genomics, structural biology and evolutionary studies by identifying regions of similarity among nucleotide or protein sequences.
We trained DEDAL, an algorithm based on deep-learning language models, to generate pairwise alignments of protein sequences taking into account the sequence-specific context of amino acid ...