Welcome to the Center for Microbial Genome Data Mining and Bioinformatics!
Bacterial Genomic Sequence Data Mining & Bioinformatic Analysis

Rapid Analysis of Genetic Relationship among Achromobacter spp. Strains by Batch-ANIm

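Batch-ANIm consists of the ANI calculation software Jspecies and several helper scripts. It downloads large numbers of genome sequences at high speed via Aspera, launches multiple Jspecies instances in parallel, and automatically pairs the genomes and imports their sequences, so that pairwise ANI values for a large set of genomes can be calculated quickly. For usage details, see https://www.microbialgenomic.cn/Batch-ANIm.html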

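If you use Batch-ANIm, please cite the following reference: Rapid Analysis of Genetic Relationship among Achromobacter spp. Strains by Batch-ANIm. Genomics and Applied Biology, 2021, 40(2):695-704.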

Batch-ANIm download: Batch-ANIm.tar.gz
 
Usage instructions
1. Installation and configuration of required programs and Perl modules
===========================================================================

 

Before running Batch-ANIm, install Java, Aspera, MUMmer, the required Perl modules, and the R packages listed below. In addition, the working directory and the path to the MUMmer executable must be set in Jspecies.

Perl modules:
(1) threads; (2) Getopt::Long;

R software and packages:
(1) ape; (2) plotrix;

MUMmer installation:
See http://mummer.sourceforge.net/

Jspecies settings:
Before using Jspecies, set the absolute path of its working directory (using the ANI_direction directory is recommended) and the path to the nucmer executable. For details, see the Jspecies usage notes at http://imedea.uib-csic.es/jspecies/index.html.

Aspera installation:
Aspera can be downloaded from https://downloads.asperasoft.com/
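
Before starting, it can help to confirm that the required programs are reachable. The following is a minimal Python sketch, not part of Batch-ANIm, that looks each tool up on PATH. The executable names are assumptions about a typical installation; in particular, ascp is often installed under ~/.aspera/connect/bin and may not be on PATH, in which case a "NOT FOUND" result only means the absolute path has to be given in the script instead.

```python
import shutil

# Executable names are assumptions about a typical installation;
# adjust them if your tools are installed under different names.
REQUIRED_TOOLS = ["perl", "java", "Rscript", "nucmer", "ascp"]


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its absolute path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found):
    """Return the names whose PATH lookup failed, in sorted order."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name:8s} {path or 'NOT FOUND'}")
    missing = missing_tools(found)
    if missing:
        print("Missing:", ", ".join(missing))
```

The Perl modules and R packages listed above are not covered by this check and still need to be verified from within Perl and R.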

2. Running Batch-ANIm
=========================================

 

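(1) genome_batch_retrive.pl downloads genome sequence data at high speed using Aspera. Input file: a text file in which each line contains the FTP download address of one genome.

Download format: depending on need, users can download the whole-genome annotation file, the whole-genome nucleotide sequence file, the nucleotide sequences of all coding genes, or the amino acid sequences of all coding genes.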

For example:

genomes/all/GCF/001/571/245/GCF_001571245.1_ASM157124v1/GCF_001571245.1_ASM157124v1_genomic.fna.gz downloads only the whole-genome nucleotide sequence file.

genomes/all/GCF/001/571/245/GCF_001571245.1_ASM157124v1 downloads all data files for that genome.
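
The two path forms above differ only in the trailing file name, which repeats the assembly directory name. As an illustration, the Python sketch below derives the genomic FASTA path from an assembly directory path and writes a list file with one path per line. The function names are hypothetical helpers and are not part of Batch-ANIm.

```python
import posixpath


def genomic_fasta_path(assembly_dir):
    """Append '<assembly>_genomic.fna.gz' to an assembly directory path."""
    assembly_dir = assembly_dir.rstrip("/")
    # The last path component is the assembly name,
    # e.g. GCF_001571245.1_ASM157124v1
    assembly = posixpath.basename(assembly_dir)
    return posixpath.join(assembly_dir, assembly + "_genomic.fna.gz")


def write_download_list(assembly_dirs, list_path):
    """Write one genomic FASTA path per line to list_path."""
    with open(list_path, "w") as handle:
        for assembly_dir in assembly_dirs:
            handle.write(genomic_fasta_path(assembly_dir) + "\n")


if __name__ == "__main__":
    print(genomic_fasta_path(
        "genomes/all/GCF/001/571/245/GCF_001571245.1_ASM157124v1"))
```

The resulting file can then be passed to genome_batch_retrive.pl with the -f option when only the nucleotide sequences are needed.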

# Before use, install Aspera and set absolute paths for the following variables, for example:
my $ascp = '/home/xiangyang/.aspera/connect/bin/ascp'; # absolute path to the Aspera executable
my $asperaweb_id_dsa_openssh = '/home/xiangyang/.aspera/connect/etc/asperaweb_id_dsa.openssh'; # absolute path to the asperaweb_id_dsa.openssh file
my $aspera_tokenauth_id_rsa = '/home/xiangyang/.aspera/connect/etc/aspera_tokenauth_id_rsa'; # absolute path to the aspera_tokenauth_id_rsa file

Usage:
perl genome_batch_retrive.pl -f FTP_download_site_list -dir genome_out_dir

Example:
perl /home/xiangyang/Batch-ANIm/genome_batch_retrive.pl -f /home/xiangyang/Batch-ANIm/Achromobacter_list_20191220 -dir /home/xiangyang/Batch-ANIm/Achromobacter_20191220

Parameters
=======================
-f, --FTP_download_site_list
A text file in which each line contains the FTP download address of one genome. A batch of genome download addresses can be obtained from http://www.ncbi.nlm.nih.gov/genome/browse/.

File formats for the data of an individual genome (example)
---------------------------------------------------------------------------------------------------------------
File                                                                           Description
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GCF_000429385.1_ASM42938v1/GCF_000429385.1_ASM42938v1_cds_from_genomic.fna.gz  Nucleotide sequences of coding genes
GCF_000429385.1_ASM42938v1/GCF_000429385.1_ASM42938v1_genomic.fna.gz           Genome sequence in FASTA format
GCF_000429385.1_ASM42938v1/GCF_000429385.1_ASM42938v1_genomic.gbff.gz          Complete sequence with annotation
GCF_000429385.1_ASM42938v1/GCF_000429385.1_ASM42938v1_protein.faa.gz           Protein sequences of coding genes
---------------------------------------------------------------------------------------------------------------

-dir, --genome_out_dir
A directory to hold the downloaded genome data.
-h, --help
Show this help message.

(2) Batch_import_conf.pl generates configuration files from the strain name list, launches multiple Jspecies instances in parallel, and automatically imports the genome sequences and selects the genome pairs.

Before running Batch_import_conf.pl, start Jspecies and set the absolute path of its working directory (using the directory passed as the ANI_direction parameter of Batch_import_conf.pl is recommended) and the path to the nucmer executable. For details, see the Jspecies usage notes at http://imedea.uib-csic.es/jspecies/index.html.

Usage:
perl batch_import_conf.pl -l list_file -f fasta_file_dir -d ANI_direction [-a int]
Example:
perl /home/xiangyang/Batch-ANIm/batch_import_conf.pl -l /home/xiangyang/Batch-ANIm/test_data/Achromobacter_name -f /home/xiangyang/Batch-ANIm/test_data/genome_fasta -d /home/xiangyang/Batch-ANIm/ani_A -a 3

Parameters
=======================
REQUIRED ARGUMENTS:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-l, --list_file
A list of strain names, one genome name per line; ANI values are calculated for these genomes.
-f, --fasta_file_dir
A directory containing the nucleotide sequence file of each genome.
-d, --ANI_direction
A directory that serves as the working directory for the Jspecies instances running in parallel.

OPTIONAL ARGUMENTS:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-m, --multiple_threads
Number of Jspecies instances to run in parallel; more instances shorten the total run time (default: 1).
-h, --help
Show this help message.


***After the Jspecies instances have been launched in parallel, and if ANI_direction has been set as the Jspecies working directory, users only need to select ANI_0, ANI_1, ANI_2, ... manually as the input group in the respective instances and click OK. Jspecies then begins calculating ANIm. For details, see the Jspecies usage notes at http://imedea.uib-csic.es/jspecies/index.html.
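
To illustrate why running several instances helps, the Python sketch below enumerates all unordered genome pairs and distributes them round-robin over a number of groups. This is only one plausible partitioning scheme, stated here as an assumption; how Batch_import_conf.pl actually assigns genomes to ANI_0, ANI_1, ... is not described on this page. The function name split_pairs is hypothetical.

```python
from itertools import combinations


def split_pairs(names, n_groups):
    """Distribute all unordered genome pairs round-robin over n_groups lists."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups = [[] for _ in range(n_groups)]
    # n genomes yield n * (n - 1) / 2 unordered pairs.
    for index, pair in enumerate(combinations(names, 2)):
        groups[index % n_groups].append(pair)
    return groups


if __name__ == "__main__":
    # Group labels mimic the ANI_0, ANI_1, ... naming for illustration only.
    for i, group in enumerate(split_pairs(["A", "B", "C", "D"], 3)):
        print(f"ANI_{i}: {group}")
```

Under this scheme, 109 genomes give 5886 pairs, so three groups would each hold 1962 comparisons.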

***After all Jspecies instances have finished the ANIm calculation, users need to save the results of each parallel run manually and put all result files in the same directory, which is used as the input for text_to_matrix.pl.

(3) text_to_matrix.pl converts the result files produced by the parallel Jspecies runs into a single complete ANIm matrix file. Input: a directory containing the ANIm result files.

Usage:
perl text_to_matrix.pl -l list_file -dir jspecies_out_dir -ani ANIm_out -out matrix_out

Example:
perl /home/xiangyang/Batch-ANIm/text_to_matrix.pl -l /home/xiangyang/Batch-ANIm/109.list -dir /home/xiangyang/Batch-ANIm/ani_109 -ani /home/xiangyang/Batch-ANIm/108_result/109.out -out /home/xiangyang/Batch-ANIm/109.out_distance

Parameters:
=======================
-l, --list_file
A list of strain names, one genome name per line; ANI values are calculated for these genomes.
-dir, --jspecies_out_dir
A directory containing the ANIm result files saved after the parallel Jspecies runs have finished.
-ani, --ANIm_out
Output file for the ANIm matrix, which summarizes all ANI results from the parallel Jspecies runs.
-out, --metrix_out
Output file for the "100-ANIm" matrix, in which each entry is 100 minus the corresponding ANIm value from the parallel Jspecies runs.
-h, --help
Show this help message.
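
The relationship between the two output matrices can be sketched in Python as follows. This is an illustration under stated assumptions, not the logic of text_to_matrix.pl: the input is assumed to be already parsed into (genome_a, genome_b, ANIm) triples, each value is assumed to apply in both directions, and self-comparisons are set to 100. The Jspecies result file format is not parsed here, and both function names are hypothetical.

```python
def build_matrices(names, pair_values):
    """Build symmetric ANIm and 100-ANIm matrices from (a, b, ani) triples."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    # Assumption: self-comparisons are 100% identical; unseen pairs stay None.
    ani = [[100.0 if i == j else None for j in range(n)] for i in range(n)]
    for a, b, value in pair_values:
        i, j = index[a], index[b]
        ani[i][j] = ani[j][i] = float(value)
    # Distance is 100 minus ANIm, rounded to avoid floating-point noise.
    distance = [[None if v is None else round(100.0 - v, 4) for v in row]
                for row in ani]
    return ani, distance


def format_matrix(names, matrix):
    """Render a matrix as tab-separated text with a header row; NA marks gaps."""
    lines = ["\t".join([""] + list(names))]
    for name, row in zip(names, matrix):
        cells = ["NA" if v is None else f"{v:g}" for v in row]
        lines.append("\t".join([name] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    names = ["s1", "s2", "s3"]
    ani, dist = build_matrices(names, [("s1", "s2", 98.5), ("s2", "s3", 90.25)])
    print(format_matrix(names, dist))
```

Merging the results of several parallel runs then amounts to concatenating their triples before calling build_matrices; any pair left as NA indicates a run whose results were not saved.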

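(4) ANIm_matrix_cluster.R performs a cluster analysis of the strains based on the "100-ANIm" evolutionary distance values.

Parameters:
=======================

(1) out_file: absolute path of the PDF file for the clustering output

(2) matrix: absolute path of the "100-ANIm" distance matrix file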


Usage:

Rscript ANIm_matrix_cluster.R out_file matrix
Example:

Rscript /home/xiangyang/Batch-ANIm/ANIm_matrix_cluster.R /home/xiangyang/Batch-ANIm/1027_genome.out /home/xiangyang/Batch-ANIm/1027_test.out_distance

 

Dr. Xiangyang Li (E-mail: lixiangyang@fudan.edu.cn, lixiangyang1984@gmail.com), Fudan University; Kaili University; Bacterial Genome Data Mining & Bioinformatic Analysis (https://www.microbialgenomic.cn/).

Copyright 2019-2025, Xiangyang Li. All Rights Reserved.