Module:Bio::Tools::Run::RemoteBlast

From BioPerl

Purpose

Submit BLAST jobs to, and retrieve results from, an NCBI-style CGI script (WebBlast).

Status

  • This module generally works, but it defaults to plain text BLAST output, which NCBI has warned may stop being reliably parseable at some point. See NCBI_Blast_email and mailing list post for warnings.
  • Saving XML output is now supported. The default parser is expected to switch to XML in the future for maintainability, because parsing BLAST text output has been a constant source of problems. Text parsing is still supported and works with BLAST text output as of 7/21/06.
  • An experimental option has been added to modify the URL base address for BLAST queries. However, I must emphasize that we don't plan to support this option actively. All NCBI BLAST interfaces apparently defer to the DEFAULT CGI interface; therefore, changing the URL base address to any of the other BLAST interfaces on the NCBI website will not work. We recommend that you change the database parameter instead. The following URL gives a list of the BLAST databases available for remote access, including those for WGS, SNP, and CDD (thanks to Malcolm Cook for pointing this out).
  • As worked out by Malcolm Cook, you can obtain results from NCBI's Conserved Domain search as follows:
# 1) set your database like this:
 -database => 'cdsearch/cdd', # c.f. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html for other cdd database options
# 2) add this line before submitting the job:
$Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';

--Chris Fields 12:17, 21 July 2006 (EDT)
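
The Conserved Domain settings above can be put together roughly as follows. This is a minimal sketch, not a tested recipe: the helper names cd_search_args and cd_search_factory are made up for illustration, 'blastp' is an assumed program choice for a protein query, the input file name is hypothetical, and the database is passed as -data, the spelling used in the constructor example elsewhere on this page.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Constructor arguments for a Conserved Domain search.
# 'blastp' is an assumption; the database string comes from the notes above.
sub cd_search_args {
    return (
        '-prog'       => 'blastp',
        '-data'       => 'cdsearch/cdd',
        '-readmethod' => 'SearchIO',
    );
}

# Build a factory and switch the service to rpsblast before any submission.
sub cd_search_factory {
    # Loaded lazily so cd_search_args() can be used without BioPerl installed.
    require Bio::Tools::Run::RemoteBlast;
    my $factory = Bio::Tools::Run::RemoteBlast->new( cd_search_args() );

    # %HEADER is a package variable, so this change applies to every
    # RemoteBlast object in the process, not only to $factory.
    no warnings 'once';
    $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
    return $factory;
}

# Usage: perl cd_search.pl proteins.fa   (the file name is hypothetical)
if (@ARGV) {
    my $factory = cd_search_factory();
    $factory->submit_blast( $ARGV[0] );
    print "Submitted; pending request IDs: ",
        join( ', ', $factory->each_rid ), "\n";
}
```

Because the SERVICE setting lives in a package-level hash, a script that mixes Conserved Domain searches with ordinary BLAST searches would probably need to reset or delete that key between submissions.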

Comments

Future plans are to update RemoteBlast and StandAloneBlast to implement methods from a unified abstract base class, Module:Bio::Tools::Run::Blast. This will allow for more RemoteBlast-like interfaces to be added as needed.

A new module, RemoteNetBlast, will be created so that individuals setting up webservices may rely on remote blast functionality. Since NCBI updates their executables whenever they change their server setup, the new module will wrap netblast/blastcl3. The module will be based on the current StandAloneBlast code, which includes many of the same data structures already. Unlike RemoteNCBI, this module would require the installation of additional software. The user would be required to update that software separately from bioperl. Since the bioperl release cycle is not tied to the NCBI blast release cycle, inevitable periods of broken code will remain unpredictable; however, parsing issues will likely be sorted out within a week or two after they are reported.

RemoteNCBI will remain a "pure perl" implementation, but will default to returning XML-formatted results and will use XML-based parsing. The current equivalent would be:

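 use Bio::Tools::Run::RemoteBlast;
 # Example values only; substitute your own program, database, and E-value cutoff.
 my ($v, $prog, $db, $e_val) = (1, 'blastp', 'swissprot', '1e-10');
 my $remote_blastxml = Bio::Tools::Run::RemoteBlast->new
    ('-verbose'    => $v,
     '-prog'       => $prog,
     '-data'       => $db,
     '-readmethod' => 'xml',  # tells the parser to use blastxml format for parsing
     '-expect'     => $e_val,
     );
 $remote_blastxml->retrieve_parameter('FORMAT_TYPE', 'XML'); # tells NCBI to send XML back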
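
For context, here is a sketch of how an XML-configured factory might be used end to end: submit a FASTA file, poll for each request ID, and walk the parsed results. The helper names xml_factory_args and run_remote_blast, the default program, database, and E-value, and the five-second polling interval are all assumptions made for illustration; only the constructor options and the retrieve_parameter call are taken from the snippet above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Constructor arguments for a factory that parses XML reports.
# The defaults are placeholder values for illustration only.
sub xml_factory_args {
    my (%opt) = @_;
    return (
        '-prog'       => $opt{prog}   // 'blastp',
        '-data'       => $opt{db}     // 'swissprot',
        '-expect'     => $opt{expect} // '1e-10',
        '-readmethod' => 'xml',    # parse results as blastxml
    );
}

# Submit every sequence in a FASTA file, then poll until all jobs finish.
sub run_remote_blast {
    my ($fasta_file) = @_;

    # Loaded lazily; needs BioPerl and XML::SAX at run time.
    require Bio::Tools::Run::RemoteBlast;
    my $factory = Bio::Tools::Run::RemoteBlast->new( xml_factory_args() );
    $factory->retrieve_parameter( 'FORMAT_TYPE', 'XML' );  # ask NCBI for XML

    $factory->submit_blast($fasta_file);

    while ( my @rids = $factory->each_rid ) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( !ref $rc ) {
                # Non-reference return: a negative value means the job
                # failed; anything else means it is still running.
                $factory->remove_rid($rid) if $rc < 0;
                sleep 5;
                next;
            }
            $factory->remove_rid($rid);
            while ( my $result = $rc->next_result ) {
                print "Query: ", $result->query_name, "\n";
                while ( my $hit = $result->next_hit ) {
                    print "\t", $hit->name, "\t", $hit->significance, "\n";
                }
            }
        }
    }
    return;
}

# Usage: perl remote_blast_xml.pl query.fa   (the file name is hypothetical)
run_remote_blast( $ARGV[0] ) if @ARGV;
```

Keeping the argument list in its own function means the same settings can be inspected or overridden, for example xml_factory_args( db => 'nr' ), without contacting NCBI.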

While this does little to connect the development cycles, it does ensure that minor variations in NCBI's output will have no practical effect on parsing, and it is expected to reduce the time required for future enhancements.

Note that XML::SAX is now required for XML parsing through RemoteBlast, and XML::SAX::ExpatXS and expat are recommended if you want to speed up parsing.

The modules in the Bio::Tools::Run::Blast family may include:

 --> StandAlone     (support for both WU-BLAST and NCBI-BLAST local binaries and 
                     eventually MPI-BLAST too)
 --> RemoteNCBI     (currently the RemoteBlast server)
 --> RemoteEBISOAP  (EBI has a nice SOAP interface that works quite well, but 
                     may not provide all the same databases as what people expect 
                     from NCBI)
 --> RemoteNetBlast (blastcl3 or netblast local executable)

Authors

As of 060126, ownership of this module was accepted by Roger Hall, who then promptly had a heart attack.

He is back on the job as of 070815, with some help from Chris Fields.


Thanks always to the original author, Jason Stajich.

Bugs

Bugs could be linked here from Bugzilla. --jason stajich 16:58, 4 November 2005 (EST)

Using the XML format shown above did not work for me (the object did not return XML). However, I was able to grab the most recent Bio::SearchIO::blast.pm attachment from this bugzilla bug, and my RemoteBlast app now works. There is also a fix for the XML problem.

--User:Alanraetz 24 January 2006

Well, it should return a Bio::SearchIO object anyway, but I guess the fix works for people, so it should be checked in. --jason stajich 10:54, 25 January 2006 (EST)

The two bug fixes were simple changes to pattern matching that broke with the latest change in NCBI's WebBlast interface. I thought the option of saving XML would be nice even though it's not as clean as text. There are other changes in the new WebBlast text output (rearranging the location of 'Query=' and 'Database:', adding the RID to the header, adding 'ALIGNMENTS' to the alignment section, and so on) that don't seem to have an effect on parsing. --Chris Fields 10:23, 26 January 2006 (EST)

I plan to check these in as soon as I have tested them as suggested. --Rogerhall 15:57, 10 February 2006 (EST)
