BIOS 546 In-class exercises: Regular Expressions

 

1.  Open the file /home/bios546/ASHV.seq.  Skip the initial comment line (it starts with ">", which is standard for FASTA format).  Then remove all numbers and spaces from the remaining lines (the actual sequence), convert them all to upper case, and concatenate them into one sequence.  Print it out.
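
One possible approach, sketched in Python (the exercise does not prescribe a language, and the function name clean_fasta and the sample lines are made up for illustration). It skips any line starting with ">", removes digits and whitespace with a regular expression, and joins the upper-cased pieces.

```python
import re

def clean_fasta(lines):
    """Join the sequence lines of a FASTA record into one upper-case string."""
    parts = []
    for line in lines:
        if line.startswith(">"):
            # Comment (header) line: not part of the sequence.
            continue
        # [\d\s]+ matches runs of digits and whitespace (spaces, newlines).
        parts.append(re.sub(r"[\d\s]+", "", line).upper())
    return "".join(parts)

# Made-up sample lines standing in for the contents of a sequence file.
example = [
    ">gi|000|example record (made up)\n",
    "        1 gaattcacgt acgtacgtac\n",
    "       21 ggccgaattc\n",
]
print(clean_fasta(example))  # prints GAATTCACGTACGTACGTACGGCCGAATTC
```

To run it on the real file, pass an open file handle instead of the list, since iterating over a file yields its lines: with open("/home/bios546/ASHV.seq") as fh: print(clean_fasta(fh)).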

 

2. Find all instances of the EcoRI restriction site (GAATTC) in the above sequence.  Report the position of each site.
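
A minimal sketch of one way to do this in Python, assuming positions are reported 1-based (the exercise does not say which convention to use). The function name find_sites and the short test sequence are illustrative only.

```python
import re

def find_sites(sequence, site="GAATTC"):
    """Return the 1-based start position of every occurrence of site."""
    # re.finditer yields one match object per non-overlapping match;
    # m.start() is 0-based, so add 1 for a 1-based position.
    return [m.start() + 1 for m in re.finditer(re.escape(site), sequence)]

# Made-up sequence with a site at each end.
seq = "GAATTCACGTACGTACGTACGGCCGAATTC"
print(find_sites(seq))  # prints [1, 25]
```

GAATTC cannot overlap with a copy of itself, so non-overlapping matching finds every site. A pattern that can overlap itself would need a lookahead such as (?=...) to report all positions.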

 

3. Write a program that prompts the user for an input sequence, then tests HFV.seq for the presence of that sequence and prints out an appropriate response.  Make it case-insensitive.
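
A hedged sketch in Python. The helper names (contains, report, main) are invented for illustration, and the user's input is treated as literal text rather than as a pattern, which is an assumption about what the exercise intends. The interactive part is wrapped in main() so that the matching logic can be tried on its own.

```python
import re

def contains(sequence, query):
    """True if query occurs anywhere in sequence, ignoring case."""
    # re.escape makes the user's input match literally;
    # re.IGNORECASE makes the comparison case-insensitive.
    return re.search(re.escape(query), sequence, re.IGNORECASE) is not None

def report(sequence, query):
    """Build the message to show the user."""
    if contains(sequence, query):
        return f"Found '{query}' in the sequence."
    return f"'{query}' was not found in the sequence."

def main(path):
    """Read the sequence file at path, prompt for a query, print the result."""
    with open(path) as fh:
        sequence = "".join(
            re.sub(r"[\d\s]+", "", line) for line in fh if not line.startswith(">")
        )
    query = input("Enter a sequence to search for: ").strip()
    print(report(sequence, query))

# Non-interactive demonstration on a made-up sequence.
print(report("gaattcACGT", "AATTc"))  # prints Found 'AATTc' in the sequence.
```

Calling main("HFV.seq") from the directory that holds the file runs the full prompt-and-search version.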

 

4. Using the same file, extract the comment line and split it at "|" characters into an array.  Then join the array with "\n" characters and print it out.
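
A small Python sketch of the split-and-join step, using a made-up comment line (the real header's fields are not shown here). The point to note is that "|" means alternation in a regular expression, so it has to be escaped to match a literal bar.

```python
import re

def split_header(line):
    """Split a FASTA comment line into fields at each literal '|'."""
    # r"\|" matches a literal bar; an unescaped | would mean "or".
    return re.split(r"\|", line.rstrip("\n"))

# Made-up comment line for illustration.
header = ">gi|000000|gb|X00000.0|made-up example record\n"
fields = split_header(header)
print("\n".join(fields))
```

This prints each field on its own line, starting with ">gi". In Python the plain string method header.split("|") would give the same result, because str.split does not interpret its argument as a pattern.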

 

5. Count the number of A's, C's, G's and T's in the sequence.  Also calculate the percentage of G+C.
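
One way this could look in Python, as a sketch. It assumes the G+C percentage is taken over the total number of A, C, G and T, so any other characters (for example N) are ignored; the function names and the test sequence are illustrative.

```python
def base_counts(sequence):
    """Count A, C, G and T in sequence, ignoring case."""
    seq = sequence.upper()
    return {base: seq.count(base) for base in "ACGT"}

def gc_percent(sequence):
    """Percentage of G+C among the A, C, G and T in sequence."""
    counts = base_counts(sequence)
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero on an empty sequence.
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / total

# Made-up sequence: 2 A, 3 C, 3 G, 2 T.
seq = "GAATTCGGCC"
print(base_counts(seq))          # prints {'A': 2, 'C': 3, 'G': 3, 'T': 2}
print(f"{gc_percent(seq):.1f}")  # prints 60.0
```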