Exam questions (verified)

Final exam Di Lena. Module 3 version 7

Università degli studi di Bologna bioinformatics 2020

What it covers

  • The document presents a shell scripting exam focused on bioinformatics applications.
  • It includes a sample dataset named fastq.txt, structured similarly to a FASTQ file, containing sequence identifiers, nucleotide sequences, and quality scores. This dataset is central to all the practical exercises.
  • Question 1 (2 points) involves analyzing two slightly different pipelines of commands:
    • cat fastq.txt | sort | cut -c 1-11 | tail -4 | uniq | wc -l
    • cat fastq.txt | cut -c 1-11 | sort | tail -4 | uniq | wc -l
    • This question specifically tests the impact of command order (sort before cut vs. cut before sort) on the final output, as well as the functionality of tail, uniq, and wc -l for data manipulation and counting.
  • Question 2 (2 points) requires determining the output of the command: cat fastq.txt | grep "@.*6" | wc -l. This problem evaluates the ability to use grep with a regular expression (selecting lines that contain '@' followed, after zero or more characters, by '6') and then to count the matching lines with wc -l.
  • Question 3 (3 points) focuses on interpreting a more complex awk script: cat fastq.txt | awk '$1 ~ /^@/{if(!a[$1]) {a[$1]=1; n++;}}END{print n}'. For every line whose first field starts with '@', the script records that field in the array a the first time it appears and increments n, so the END block prints the number of distinct sequence identifiers in fastq.txt. The exercise demonstrates proficiency in using awk for pattern matching, conditional logic, and associative arrays.
  • Question 4 (3 points) asks for the output of: cat fastq.txt | sed '/^@/d' | wc -l. This exercise assesses the understanding of sed for deleting specific lines (those starting with '@') and then using wc -l to count the total number of remaining lines after the deletion.
  • Collectively, these exercises are designed to evaluate a student's practical skills in using common Unix shell commands and scripting for tasks relevant to bioinformatics data processing, such as filtering, sorting, extracting, and counting specific elements within large text-based datasets.
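
The four commands above can be tried on a small stand-in file. The sketch below is only illustrative: its fastq.txt is a hypothetical three-record sample (not the exam's dataset), so the counts it produces are not the exam answers. It assumes the intended forms wc -l and $1 ~ /^@/, and it sets LC_ALL=C so that sort uses byte-wise ordering.

```shell
#!/bin/sh
# Assumption: byte-wise collation, so the sort order is reproducible.
export LC_ALL=C

# Work in a scratch directory so no existing fastq.txt is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for fastq.txt (NOT the exam's data):
# three records, with the first identifier repeated once.
cat > fastq.txt <<'EOF'
@SEQ_0001 len=6
ACGTAC
+
IIIIII
@SEQ_0002 len=8
TTGACCGT
+
IIIIHHHH
@SEQ_0001 len=6
ACGTAC
+
IIIIII
EOF

# Question 1: sort before cut vs. cut before sort.
q1a=$(cat fastq.txt | sort | cut -c 1-11 | tail -4 | uniq | wc -l)
q1b=$(cat fastq.txt | cut -c 1-11 | sort | tail -4 | uniq | wc -l)

# Question 2: lines containing '@' followed somewhere later by '6'.
q2=$(cat fastq.txt | grep "@.*6" | wc -l)

# Question 3: distinct first fields among lines whose first field starts with '@'.
q3=$(cat fastq.txt | awk '$1 ~ /^@/ {if (!a[$1]) {a[$1]=1; n++}} END {print n}')

# Question 4: lines left after deleting those that start with '@'.
q4=$(cat fastq.txt | sed '/^@/d' | wc -l)

echo "Q1a=$q1a Q1b=$q1b Q2=$q2 Q3=$q3 Q4=$q4"
```

On this toy file the counts are 3 and 3 for the two Question 1 pipelines, 2 for Question 2, 2 for Question 3, and 9 for Question 4. The two Question 1 pipelines agree here because, under byte-wise collation, truncating lines that are already sorted leaves them in sorted order; other locale settings may order the lines differently.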

Other notes for PROGRAMMING FOR BIOINFORMATICS [code 69442]
