Exam questions (verified)

Final exam Di Lena. Module 3 version 9

Università degli studi di Bologna bioinformatics 2020

What it covers

Context and Purpose: This document presents a shell scripting problem set for a Bioinformatics course (Programming for Bioinformatics, Module 3, AY 2019-2020). It is designed to assess students' ability to use command-line utilities for manipulating biological data.
Provided Data: The exercises are based on a `fastq.txt` file in the standard FASTQ format. Each record consists of a sequence identifier line (starting with '@'), a DNA sequence, a '+' separator line, and a quality-score line.
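To make the record layout concrete, the sketch below builds a small three-record sample and counts its records. The sample contents and the temporary file are assumptions for illustration, not the exam's `fastq.txt`, and the count assumes every record occupies exactly four lines.

```shell
# Hypothetical three-record sample; NOT the exam's fastq.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
@AB01
GATTACAGATTACA
+
@@BIIIIIIIIIII
@AB02
ACGTACGTACGTAC
+
IIBIIIIIIIIIII
@CD03
TTGACCATTGACCA
+
IIBHHHGGGFFFEE
EOF

# Assuming four lines per record, the record count is the line count / 4.
records=$(awk 'END { print NR / 4 }' "$f")

# Counting lines that start with '@' is not equivalent: '@' is also a
# valid quality character, so a quality line may begin with it.
at_lines=$(grep -c '^@' "$f")

echo "records=$records at_lines=$at_lines"
rm -f "$f"
```

On this sample the two counts differ (3 records, 4 lines starting with '@') because the first record's quality line begins with '@'.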
Exercise 1: Data Extraction and Counting with `cut`, `sort`, `tail`, `wc`:
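- Part (a): Students must determine the output of `cat fastq.txt | cut -c 1-3 | sort -u | tail -5 | wc -l`. The pipeline extracts the first three characters of each line (lines shorter than three characters pass through whole), sorts and deduplicates them, keeps the last five of the distinct prefixes, and counts the resulting lines. The answer is therefore 5 whenever the file has at least five distinct prefixes, and the number of distinct prefixes otherwise.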
- Part (b): Students need to predict the output of `cat fastq.txt | cut -c 1-3 | tail -5 | sort -u | wc -l`. This variation highlights the importance of command order: because `tail -5` runs before `sort -u`, only the last five lines of the file are deduplicated, so the answer is the number of distinct prefixes among those five lines (at most 5).
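The effect of swapping `tail -5` and `sort -u` can be checked on a small input. This is a minimal sketch on a hypothetical three-record sample (an assumption, not the exam's `fastq.txt`), so the numbers it prints are not the exam's answers.

```shell
# Hypothetical three-record sample; NOT the exam's fastq.txt.
export LC_ALL=C   # byte-wise collation so sort -u behaves the same everywhere
f=$(mktemp)
cat > "$f" <<'EOF'
@AB01
GATTACAGATTACA
+
@@BIIIIIIIIIII
@AB02
ACGTACGTACGTAC
+
IIBIIIIIIIIIII
@CD03
TTGACCATTGACCA
+
IIBHHHGGGFFFEE
EOF

# Part (a): deduplicate all prefixes first, then keep at most five of them.
count_a=$(cat "$f" | cut -c 1-3 | sort -u | tail -5 | wc -l)

# Part (b): keep the last five lines first, then deduplicate only those.
count_b=$(cat "$f" | cut -c 1-3 | tail -5 | sort -u | wc -l)

# Some wc implementations pad the number with spaces; normalize to integers.
count_a=$((count_a))
count_b=$((count_b))

echo "a=$count_a b=$count_b"
rm -f "$f"
```

The sample has eight distinct prefixes, so part (a) yields 5. Its last five lines start with `IIB`, `@CD`, `TTG`, `+`, `IIB`, so part (b) yields 4.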
Exercise 2: Pattern Matching and Counting with `grep` and `wc`: This question asks for the output of `cat fastq.txt | grep "^..B" | wc -l`. The `grep` command keeps the lines whose third character is 'B' (`^` anchors the match at the start of the line and each `.` matches any single character), and `wc -l` counts them, testing basic regular expression understanding.
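The pattern can be tried on a small input as follows. This is a sketch on a hypothetical three-record sample (an assumption, not the exam's `fastq.txt`); it also shows `grep -c`, which counts matching lines without a separate `wc -l`.

```shell
# Hypothetical three-record sample; NOT the exam's fastq.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
@AB01
GATTACAGATTACA
+
@@BIIIIIIIIIII
@AB02
ACGTACGTACGTAC
+
IIBIIIIIIIIIII
@CD03
TTGACCATTGACCA
+
IIBHHHGGGFFFEE
EOF

# The exam's pipeline: keep lines whose third character is 'B', then count.
count=$(cat "$f" | grep "^..B" | wc -l)
count=$((count))   # strip any padding that wc may add

# Equivalent count with grep alone.
direct=$(grep -c "^..B" "$f")

echo "pipeline=$count grep_c=$direct"
rm -f "$f"
```

Both counts are 5 here: two identifier lines (`@AB01`, `@AB02`) and three quality lines have 'B' in third position, while the one-character `+` lines are too short to match.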
Exercise 3: Advanced Pattern Matching and Counting with `awk`: Students are tasked with predicting the output of `cat fastq.txt | awk 'BEGIN{n=0} $1 ~ /^..B/{if($1 ~ /@/) n++}END{print n}'`. This `awk` script initializes a counter `n`. For each line, it checks whether the first whitespace-separated field (`$1`) matches the pattern `^..B`. If so, it further checks whether the same field contains an '@' anywhere. If both conditions hold, `n` is incremented. Finally, the `END` block prints the total count `n`. This evaluates conditional logic and field-specific pattern matching within `awk`.
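The two nested conditions can be traced on a small input. This sketch runs the exam's script on a hypothetical three-record sample (an assumption, not the exam's `fastq.txt`) and compares it with a condensed form that joins both tests with `&&`.

```shell
# Hypothetical three-record sample; NOT the exam's fastq.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
@AB01
GATTACAGATTACA
+
@@BIIIIIIIIIII
@AB02
ACGTACGTACGTAC
+
IIBIIIIIIIIIII
@CD03
TTGACCATTGACCA
+
IIBHHHGGGFFFEE
EOF

# The exam's script: count lines whose first field has 'B' as its third
# character and also contains '@' somewhere.
n_exam=$(cat "$f" | awk 'BEGIN{n=0} $1 ~ /^..B/{if($1 ~ /@/) n++}END{print n}')

# Condensed equivalent: both tests in one pattern; n + 0 prints 0 if unset.
n_short=$(awk '$1 ~ /^..B/ && $1 ~ /@/ { n++ } END { print n + 0 }' "$f")

echo "exam=$n_exam short=$n_short"
rm -f "$f"
```

Both forms print 3 on this sample: the identifiers `@AB01` and `@AB02`, plus the quality line `@@BIIIIIIIIIII`. The last match shows that the '@' test alone does not single out identifier lines.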
Exercise 4: Line Deletion and Counting with `sed` and `wc`: The final task involves predicting the output of `cat fastq.txt | sed -e '/^..B/d' | sed -e '/@/d' | wc -l`. This command pipeline uses `sed` twice: first to delete lines matching `^..B` (those whose third character is 'B'), and then to delete any remaining lines that contain an '@' anywhere. `wc -l` then counts the lines that survived both `sed` commands, demonstrating sequential filtering and deletion.
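The sequential deletion can be checked on a small input. This sketch uses a hypothetical three-record sample (an assumption, not the exam's `fastq.txt`) and compares the exam's two `sed` processes with a single `sed` invocation carrying both delete commands.

```shell
# Hypothetical three-record sample; NOT the exam's fastq.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
@AB01
GATTACAGATTACA
+
@@BIIIIIIIIIII
@AB02
ACGTACGTACGTAC
+
IIBIIIIIIIIIII
@CD03
TTGACCATTGACCA
+
IIBHHHGGGFFFEE
EOF

# The exam's pipeline: drop lines with 'B' in third position, then drop
# lines containing '@', then count what is left.
left_two=$(cat "$f" | sed -e '/^..B/d' | sed -e '/@/d' | wc -l)

# Same filtering in one sed process with two delete commands.
left_one=$(sed -e '/^..B/d' -e '/@/d' "$f" | wc -l)

# Some wc implementations pad the number with spaces; normalize to integers.
left_two=$((left_two))
left_one=$((left_one))

echo "two_seds=$left_two one_sed=$left_one"
rm -f "$f"
```

Of the 12 sample lines, the first filter removes 5 and the second removes only `@CD03`, so both variants report 6.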
