In-class assignment: Internet

1. Using LWP::Simple, download the file "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Protein&term=AAG60095&doptcmdl=GenPept". It is the GenBank record for an Arabidopsis expansin protein. Extract the DEFINITION line and the ACCESSION line, and combine them into a FASTA header of the form: >ACCESSION | DEFINITION. Then extract the polypeptide sequence, which lies between the ORIGIN line and a line containing nothing but //. Remove the spaces and numbers from the sequence and print it below the FASTA header. That is, make a FASTA-formatted file from the information on this web page.
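
One possible shape for the parsing step in question 1 is sketched below. It is a starting point under stated assumptions, not a model answer: the sub names (fetch_record, genpept_to_fasta) and the sample record are invented for illustration, and the sample is made-up text, not the real AAG60095 entry. The sketch assumes the downloaded text keeps the usual flat-file layout, with keywords such as DEFINITION at the start of a line and wrapped lines indented. LWP::Simple is loaded only when a URL is given, so the parsing can be tried offline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Download a page with LWP::Simple. The module is loaded lazily so the
# parsing code can be tested on a local string without network access.
sub fetch_record {
    my ($url) = @_;
    require LWP::Simple;
    my $text = LWP::Simple::get($url);
    die "Could not download $url\n" unless defined $text;
    return $text;
}

# Convert GenBank/GenPept flat-file text into one FASTA record.
sub genpept_to_fasta {
    my ($record) = @_;

    # DEFINITION may wrap onto indented lines, so capture up to the
    # next line that starts with a non-space character.
    my ($definition) = $record =~ /^DEFINITION\s+(.*?)(?=^\S)/ms;
    my ($accession)  = $record =~ /^ACCESSION\s+(\S+)/m;

    # The sequence sits between the ORIGIN line and the // line.
    my ($sequence)   = $record =~ /^ORIGIN.*?\n(.*?)^\/\/\s*$/ms;

    die "Record is missing DEFINITION, ACCESSION or ORIGIN\n"
        unless defined $definition && defined $accession && defined $sequence;

    $definition =~ s/\s+/ /g;      # join wrapped lines
    $definition =~ s/\s+$//;       # drop trailing whitespace
    $sequence   =~ s/[\s\d]+//g;   # remove spaces, newlines and position numbers

    return ">$accession | $definition\n$sequence\n";
}

# Made-up sample record, for illustration only.
my $sample = <<'END_RECORD';
LOCUS       XX000001   20 aa
DEFINITION  made-up example protein
            [Example organism].
ACCESSION   XX000001
ORIGIN
        1 mktayiakqr qisfvkshfs
//
END_RECORD

# With a URL on the command line the record is downloaded;
# otherwise the sample above is used.
my $record = @ARGV ? fetch_record($ARGV[0]) : $sample;
print genpept_to_fasta($record);
```

Run without arguments, the script converts the sample record; run with the assignment URL as its only argument (quoted, because of the & characters), it attempts the download first.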

2. Using HTML::TokeParser, make a list of every link (the "href" attribute of <a> tags) and its name (the text between the <a> and </a> tags) from the above file. Ignore <a> tags that don't have an href attribute (they aren't external links).
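
A minimal sketch of the token loop for question 2 follows. The sub name extract_links and the sample HTML string are assumptions made for this example; in the assignment the HTML would be the page downloaded in question 1. The sketch uses get_tag to jump to each <a> start tag and get_trimmed_text to collect the text up to the matching </a>.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a list of [href, name] pairs for every <a> tag that has an href.
sub extract_links {
    my ($html) = @_;

    # Passing a reference to a string parses that string, not a file.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Could not create parser\n";

    my @links;
    while (my $tag = $parser->get_tag('a')) {
        # For a start tag, element 1 is a hash of its attributes.
        my $href = $tag->[1]{href};
        next unless defined $href;    # skip <a> tags without an href

        # Text up to the closing </a>, with whitespace trimmed.
        my $name = $parser->get_trimmed_text('/a');
        push @links, [$href, $name];
    }
    return @links;
}

# Small sample page, invented for illustration.
my $sample_html = <<'END_HTML';
<p><a name="top">Top of page</a>
<a href="http://www.ncbi.nlm.nih.gov/">NCBI home</a>
<a href="/entrez/">Entrez</a></p>
END_HTML

for my $link (extract_links($sample_html)) {
    my ($href, $name) = @$link;
    print "$name\t$href\n";
}
```

The first <a> tag in the sample has only a name attribute, so it is skipped and two links are printed.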

3. Using the LWP::UserAgent and HTTP::Request::Common modules, write a program that supplies a value for every input of the http://biolinx.bios.niu.edu/cgi-bin/bios546/hello5.cgi program and prints its response. See the http://biolinx.bios.niu.edu/bios546/start_hello5.html web page first to get a feel for it.
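
The general pattern for question 3 might look like the sketch below. The field names and values (name, color) are hypothetical placeholders: the real ones have to be read from the <input> tags of the start_hello5.html form. The sketch also assumes the form is submitted with POST; if the form uses GET, the GET function of HTTP::Request::Common would be the one to use instead. The request is only built and printed by default, and is sent only when the script is run with --send.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

my $url = 'http://biolinx.bios.niu.edu/cgi-bin/bios546/hello5.cgi';

# HYPOTHETICAL field names and values: replace them with the input
# names found in the start_hello5.html form.
my @form = (
    name  => 'Ada',
    color => 'blue',
);

# Build a form-encoded POST request from a list of name/value pairs.
sub build_request {
    my ($target, $fields) = @_;
    return POST($target, $fields);
}

# Send the request with LWP::UserAgent and return the page body.
# The module is loaded lazily so the request can be inspected offline.
sub send_request {
    my ($request) = @_;
    require LWP::UserAgent;
    my $ua       = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->request($request);
    die 'Request failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

my $request = build_request($url, \@form);

# Show what would be sent, then send it only when asked to.
print $request->as_string, "\n";
print send_request($request) if @ARGV && $ARGV[0] eq '--send';
```

Printing the request before sending it makes it easy to compare the encoded fields against what a browser submits from the start page.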