Two considerations arise when tuning a position-specific scoring matrix (PSSM) so that it adequately represents the training sequences. Which of the following does not describe one of them?
(a) If a given column in 20 sequences contains only isoleucine, it is not very likely that a different amino acid will be found in other sequences with that motif, because the residue is probably important for function
(b) If a given column in 20 sequences contains only isoleucine, it is very likely that a different amino acid will be found in other sequences with that motif, because the residue is probably important for function
(c) If the number of sequences with the found motif is large and reasonably diverse, the sequences represent a good statistical sampling of all sequences that are ever likely to be found with that same motif
(d) Another column in the motif from the 20 sequences may have several amino acids, and some amino acids may not be represented at all
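To make the two situations in the options concrete, here is a minimal Python sketch that scores a single alignment column as log-odds values. It is an illustration under stated assumptions, not a standard tool's method: the function name `column_log_odds`, the additive pseudocount of 1, the uniform background frequencies, and the two example columns are all hypothetical choices made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_log_odds(column, pseudocount=1.0, background=None):
    """Return base-2 log-odds scores for one alignment column.

    Assumptions (not from the question): additive pseudocounts and,
    by default, a uniform background of 1/20 per amino acid.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(column)
    # Each of the 20 amino acids receives the same pseudocount.
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    scores = {}
    for aa in AMINO_ACIDS:
        freq = (counts.get(aa, 0) + pseudocount) / total
        scores[aa] = math.log2(freq / background[aa])
    return scores

# A column from 20 sequences that contains only isoleucine.
conserved = column_log_odds("I" * 20)

# A column from 20 sequences with several amino acids; most are absent.
variable = column_log_odds("IIIIILLLLLVVVVVMMMAA")

print(round(conserved["I"], 2), round(conserved["L"], 2))
print(round(variable["I"], 2), round(variable["A"], 2), round(variable["W"], 2))
```

In this sketch the conserved column gives isoleucine a strongly positive score, while every unobserved residue still gets a finite negative score rather than being ruled out entirely. The pseudocount is the tuning knob: a smaller value trusts the training sequences more, a larger value allows more for residues that were simply not sampled.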
This question is from the Position-Specific Scoring Matrices topic in the Multiple Sequence Alignment section of Bioinformatics.