Exam questions

Final exam Di Lena. Module 3 version 9

Università degli studi di Bologna bioinformatics 2020

What it covers

  • Context and Purpose: This document presents a shell scripting problem set for a Bioinformatics course (Programming for Bioinformatics, Module 3, AY 2019-2020). It is designed to assess students' ability to use command-line utilities for manipulating biological data.
  • Provided Data: The exercises are based on a `fastq.txt` file, which contains data in the standard FASTQ format, including sequence identifiers (starting with '@'), DNA sequences, a '+' separator line, and quality scores.
  • Exercise 1: Data Extraction and Counting with `cut`, `sort`, `tail`, `wc`:
    • Part (a): Students must determine the output of `cat fastq.txt | cut -c 1-3 | sort -u | tail -5 | wc -l`. The pipeline extracts the first three characters of each line, sorts them and discards duplicates, keeps the last five of the distinct prefixes, and counts the resulting lines. The answer is therefore 5 whenever the file has at least five distinct three-character prefixes, and the number of distinct prefixes otherwise.
    • Part (b): Students must predict the output of `cat fastq.txt | cut -c 1-3 | tail -5 | sort -u | wc -l`. This variation shows that command order matters: `tail -5` now runs before `sort -u`, so only the prefixes of the file's last five lines are deduplicated, and the answer is the number of distinct prefixes among those five lines (at most 5).
  • Exercise 2: Pattern Matching and Counting with `grep` and `wc`: This question asks for the output of `cat fastq.txt | grep "^..B" | wc -l`. The `grep` command keeps only the lines whose third character is an uppercase 'B' (any two characters at the start of the line, then 'B'), and `wc -l` counts the matching lines. The exercise tests basic regular expression understanding.
  • Exercise 3: Advanced Pattern Matching and Counting with `awk`: Students must predict the output of `cat fastq.txt | awk 'BEGIN{n=0} $1 ~ /^..B/{if($1 ~ /@/) n++}END{print n}'`. The script initializes a counter `n` to 0, so that 0 rather than an empty line is printed when nothing matches. For each line, it checks whether the first whitespace-separated field (`$1`) matches the pattern `^..B`, that is, whether its third character is 'B'. If so, it further checks whether the same field contains an '@' symbol, and increments `n` only when both conditions hold. After the last line, the script prints `n`. This exercise tests conditional logic and field-specific pattern matching in `awk`.
  • Exercise 4: Line Deletion and Counting with `sed` and `wc`: The final task is to predict the output of `cat fastq.txt | sed -e '/^..B/d' | sed -e '/@/d' | wc -l`. The pipeline uses `sed` twice: first to delete the lines that match `^..B` (lines whose third character is 'B'), then to delete any remaining lines that contain the '@' symbol. `wc -l` counts the lines that survive both deletions, which demonstrates sequential filtering.
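
The four pipelines can be tried on a small stand-in file. The sketch below is illustrative only: the three-record FASTQ sample is made up for this example (the exam's actual `fastq.txt` is not reproduced here), so the counts in the comments apply to the toy data and are not the exam answers. The `tmpdir` and `ex...` variable names and the `LC_ALL=C` setting are additions for the demonstration, not part of the exam text.

```shell
#!/bin/sh
# Assumption: a made-up FASTQ sample, NOT the exam's fastq.txt.
# LC_ALL=C is added here only to make the sort order reproducible.
export LC_ALL=C

tmpdir=$(mktemp -d)
cat > "$tmpdir/fastq.txt" <<'EOF'
@SEQ_1
GATTACAGATTA
+
IIBIIII@IIII
@SB_2
ACGTACGTACGT
+
GGBGGGGGGGGG
@SEQ_3
TTGACCATGCAA
+
GGBIIIIIIIII
EOF
cd "$tmpdir" || exit 1

# Exercise 1(a): deduplicate first, then keep the last five distinct prefixes.
# The sample has 8 distinct prefixes, so the count is 5.
ex1a=$(cat fastq.txt | cut -c 1-3 | sort -u | tail -5 | wc -l)

# Exercise 1(b): keep the last five lines first, then deduplicate.
# Their prefixes are GGB, @SE, TTG, +, GGB, so the count is 4.
ex1b=$(cat fastq.txt | cut -c 1-3 | tail -5 | sort -u | wc -l)

# Exercise 2: lines whose third character is 'B' (lines 4, 5, 8, 12): 4.
ex2=$(cat fastq.txt | grep "^..B" | wc -l)

# Exercise 3: of those, lines whose first field also contains '@'
# (lines 4 and 5): 2.
ex3=$(cat fastq.txt | awk 'BEGIN{n=0} $1 ~ /^..B/{if($1 ~ /@/) n++}END{print n}')

# Exercise 4: drop the 4 lines matching ^..B, then the 2 remaining
# lines containing '@' (lines 1 and 9): 12 - 4 - 2 = 6.
ex4=$(cat fastq.txt | sed -e '/^..B/d' | sed -e '/@/d' | wc -l)

printf '1(a)=%s 1(b)=%s 2=%s 3=%s 4=%s\n' "$ex1a" "$ex1b" "$ex2" "$ex3" "$ex4"

# Clean up the temporary directory.
cd - > /dev/null || exit 1
rm -rf "$tmpdir"
```

On this sample, parts (a) and (b) of Exercise 1 give different counts (5 and 4) because the last five lines contain a repeated prefix, which is the effect of command order that the exercise targets.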

