In-class assignment: Internet
1. Using LWP::Simple, download the file "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Protein&term=AAG60095&doptcmdl=GenPept". It is the GenBank record for an Arabidopsis expansin protein. Extract the DEFINITION line and the ACCESSION line, and combine them into a FASTA header in the form: >ACCESSION | DEFINITION. Then extract the polypeptide sequence (it lies between the ORIGIN line and a line with nothing but // on it). Remove the spaces and numbers from the sequence and print it below the FASTA header. That is, make a FASTA-formatted file from the information on this web page.
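One possible shape for the parsing part of task 1 is sketched below. It is only a sketch under stated assumptions: the function name genbank_to_fasta and the sample record are made up for illustration, and the code assumes the downloaded text contains the flat-file record as plain text, with DEFINITION, ACCESSION, ORIGIN, and // each starting in the first column.

```perl
use strict;
use warnings;

# Convert the text of a GenPept record into a FASTA-formatted string.
# (genbank_to_fasta is a hypothetical helper name, not part of any module.)
sub genbank_to_fasta {
    my ($record) = @_;

    # DEFINITION may wrap onto indented continuation lines, so capture
    # everything up to the next line that starts in the first column.
    my ($definition) = $record =~ /^DEFINITION\s+(.*?)(?=^\S)/ms;
    my ($accession)  = $record =~ /^ACCESSION\s+(\S+)/m;

    # The sequence lies between the ORIGIN line and the // line.
    my ($sequence) = $record =~ m{^ORIGIN[^\n]*\n(.*?)^//\s*$}ms;

    return unless defined $definition
               && defined $accession
               && defined $sequence;

    $definition =~ s/\s+/ /g;     # join wrapped lines into one
    $definition =~ s/\s+$//;      # drop trailing whitespace
    $sequence   =~ s/[\s\d]//g;   # drop spaces, newlines, position numbers

    return ">$accession | $definition\n$sequence\n";
}

# Made-up sample record, used only to exercise the function offline.
my $sample = <<'END_RECORD';
LOCUS       EXAMPLE01                 12 aa            linear   PLN
DEFINITION  hypothetical example protein
            [Arabidopsis thaliana].
ACCESSION   EXAMPLE01
ORIGIN
        1 mgtskllhva sf
//
END_RECORD

print genbank_to_fasta($sample);
```

For the real record, the text would come from LWP::Simple instead of the sample: get($url) returns the page content, or undef if the download fails, so the result should be checked before it is passed to the function.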
2. Using HTML::TokeParser, make a list of every link (the "href" attribute of <a> tags) and its name (the text between the <a> and </a> tags) from the above file. Ignore <a> tags that don't have an href attribute (they aren't external links).
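A minimal sketch of the link extraction in task 2 follows. The helper name extract_links and the inline HTML string are assumptions made for illustration; in the assignment the HTML would be the page downloaded in task 1.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a list of [href, link text] pairs, one per <a> tag with an href.
# (extract_links is a hypothetical helper name.)
sub extract_links {
    my ($html) = @_;

    # A reference to a scalar tells the parser to read from that string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser\n";

    my @links;
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};      # $tag->[1] is the attribute hash
        next unless defined $href;       # skip <a> tags without an href
        # Collect the text up to </a>, ignoring nested tags such as <b>.
        my $text = $parser->get_trimmed_text('/a');
        push @links, [ $href, $text ];
    }
    return @links;
}

# Made-up HTML used only to exercise the function offline.
my $html = '<p><a name="top">Top</a> '
         . '<a href="http://example.org/a">First  link</a> and '
         . '<a href="/b"><b>Second</b></a></p>';

for my $link (extract_links($html)) {
    print "$link->[0]\t$link->[1]\n";
}
```

get_trimmed_text is used rather than get_text so that runs of whitespace inside a link name collapse to a single space.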
3. Using the LWP::UserAgent and HTTP::Request::Common modules, write a program that responds to all of the inputs in the http://biolinx.bios.niu.edu/cgi-bin/bios546/hello5.cgi program. See the http://biolinx.bios.niu.edu/bios546/start_hello5.html web page first to get a feel for it.
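The general pattern for task 3 might look like the sketch below. The field names and values (name, color) are purely hypothetical placeholders, since the real ones have to be read from the form on the start_hello5.html page. The sketch also assumes the form is submitted with POST. The request is only sent when the script is run with --send, so the request can be inspected offline first.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

my $url = 'http://biolinx.bios.niu.edu/cgi-bin/bios546/hello5.cgi';

# HYPOTHETICAL field names and values: replace them with the names of
# the actual inputs found in the start_hello5.html form.
my @fields = (
    name  => 'Ada',
    color => 'blue',
);

# Build a form submission (application/x-www-form-urlencoded).
# An array reference keeps the fields in the order listed above.
my $request = POST($url, \@fields);

# Show exactly what would be sent.
print $request->as_string, "\n";

# Send the request and return the body of the reply.
sub submit {
    my ($req) = @_;
    my $ua = LWP::UserAgent->new(timeout => 15);
    my $response = $ua->request($req);
    die 'Request failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

print submit($request) if @ARGV && $ARGV[0] eq '--send';
```

If the form turns out to use GET instead, the same fields would be appended to the URL as a query string and sent with the user agent's get method.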