SequenceAnalysis

Sequence Analysis (in Progress)¶

This "Part 1" is a collection of Python-based example codes focussing on sequence analysis on a very basic level (see TOC). It is motivated by the many jobs descriptions currently online for genome analysis, next-gen-sequencing (NGS) advances and my fascination towards life science in general. Apologies for the code often being a little sloppy and "easy-going". Due to a limited time frame, priorities were set to "progress & visualization" instead of "finding the most efficient solution" :)
Whenever a function or method is developed to fulfill the goals, it is outsources into the subfolder "ModulesOwn", take a look and have fun. Check out my website (www.bio-century.net) for demos, literature recommendations, links to data bases and further inspirations.

# export image gallery yes or no
saveGallery = True
# saveGallery = False

0. Reference Table ¶

Tag or Variable	Meaning
mySequence	Target Sequence
mySOI	Sequence of Interest
ssDNA	single-stranded DNA

1. First Steps in Sequence Analysis (Groundwork)¶

1.1 Color Strings in Terminal Output ¶

Goal:

Load a target DNA-sequence ("mySequence") from external file
split the DNA at first incidence of a dedicated subpart ("mySOI") and
colorprint a for visualization purposes.

# sources
# - https://www.bioinformatics.org/sms2/random_dna.html
# - https://pkg.go.dev/github.com/whitedevops/colors


# include packages
from ModulesExternal.TerminalColors import TerminalColors


# initialize and define variables and parameter
mySOI="CGCCAAAAA"

# load file line-by-line
with open('./ModulesOwn/A_Groundwork_Data/01_01_SeqOfInterest.txt') as f:
    myLines = f.readlines()

# compute and visualize results
# find mySO, split and color it
for ii, mySequence in enumerate(myLines):
    SOI_POS = mySequence.upper().rfind(mySOI)
    print(mySequence[:SOI_POS - 1]
          + f"{TerminalColors.Green}"
          + mySequence[SOI_POS:SOI_POS + len(mySOI)].upper()
          + f"{TerminalColors.Default}", end = '\n')
    print(" "*(SOI_POS - 1)
          + f"{TerminalColors.LightBlue}"
          + mySequence[SOI_POS:SOI_POS + len(mySOI)].upper()
          + f"{TerminalColors.Default}" + mySequence[SOI_POS + 1:], end = '')

ctgggactctagctgatccacccgcctagggcagcacacataggacgtagctCGCCAAAAA
                                                    CGCCAAAAAgccaaaaagacgaacccaccatgcccagacgcatctggctaagctc

1.2 Computation of Complement, Reverse and Reverse Complement of ssDNA ¶

Goal:
To determine all modalities of a DNA-single-strand sequence, i.e.

the complement-,
the reverse- as well as
the reverse-complement-strand

and visualize them.

# sources
# - https://www.bioinformatics.org/sms2/random_dna.html


# include modules
from ModulesExternal.TerminalColors import TerminalColors
import ModulesOwn.A_Groundwork as Groundwork
from tkinter import *
from tkinter.ttk import *


# initialize and define variables and parameter
mySequenceInput      = "ctgggactctagctgatccacccgcctagggcagcacacataggacgtagctgcgccaaaaagacgaacccaccatgcccagacgcatctggctaagctc"
mySequence           = mySequenceInput.upper()
myNumberOfSOIs       = 3
mySOI                = [""] * myNumberOfSOIs
myColors             = [""] * myNumberOfSOIs
mySequenceColored    = [""] * myNumberOfSOIs
myColorsDNA          = [TerminalColors.Green, TerminalColors.Yellow, TerminalColors.Blue, TerminalColors.Magenta]
myDefaultColor       = TerminalColors.Default


# compute and visualize results
returnSeq = Groundwork.ComplRev(mySequence,"Sequence")
mySequenceColoredDNA = Groundwork.ColorDNA(returnSeq, myColorsDNA, myDefaultColor)
print("Sequence: \t \t", mySequenceColoredDNA, end = '\n')
print("\t \t \t", ''.join('|' for i in returnSeq), end = '\n')

returnSeq = Groundwork.ComplRev(mySequence,"SequenceComplement")
mySequenceColoredDNA = Groundwork.ColorDNA(returnSeq, myColorsDNA, myDefaultColor)
print("Complement: \t \t", mySequenceColoredDNA, end = '\n')
print("")

returnSeq = Groundwork.ComplRev(mySequence,"SequenceReverse")
mySequenceColoredDNA = Groundwork.ColorDNA(returnSeq, myColorsDNA, myDefaultColor)
print("Reverse: \t \t", mySequenceColoredDNA, end = '\n')
print("\t \t \t", ''.join('|' for i in returnSeq), end = '\n')

returnSeq = Groundwork.ComplRev(mySequence,"SequenceReverseComplement")
mySequenceColoredDNA = Groundwork.ColorDNA(returnSeq, myColorsDNA, myDefaultColor)
print("Reverse Complement: \t", mySequenceColoredDNA, end = '\n')

Sequence: 	 	 CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGGACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
	 	 	 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Complement: 	 	 GACCCTGAGATCGACTAGGTGGGCGGATCCCGTCGTGTGTATCCTGCATCGACGCGGTTTTTCTGCTTGGGTGGTACGGGTCTGCGTAGACCGATTCGAG

Reverse: 	 	 CTCGAATCGGTCTACGCAGACCCGTACCACCCAAGCAGAAAAACCGCGTCGATGCAGGATACACACGACGGGATCCGCCCACCTAGTCGATCTCAGGGTC
	 	 	 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Reverse Complement: 	 GAGCTTAGCCAGATGCGTCTGGGCATGGTGGGTTCGTCTTTTTGGCGCAGCTACGTCCTATGTGTGCTGCCCTAGGCGGGTGGATCAGCTAGAGTCCCAG

1.3 Identification Sequences of Interest within a Target Sequence ¶

Goal: To identify and mark the the position of a SOI (mySOI) in a given target sequence (mySequence). The return "positionsSOI" shall be

1 if the respective base at position ii of the sequence "mySequence" is part of mySOI
>1 if two mySOI-regions overlap and
0 otherwise.

# sources
# - https://www.bioinformatics.org/sms2/random_dna.html


# include modules
from ModulesExternal.TerminalColors import TerminalColors
import ModulesOwn.A_Groundwork as Groundwork


# initialize and define variables and parameter
mySequenceInput      = "ctgggactctagctgatccacccgcctagggcagcacacataggacgtagctgcgccaaaaagacgaacccaccatgcccagacgcatctggctaagctc"
mySequence           = mySequenceInput.upper()
mySOI                = "GG"
myCount, mySOIPositions = Groundwork.SOIPositions(mySequence, mySOI, "countchain")
myColor                 = TerminalColors.Green
myDefaultColor          = TerminalColors.Default
mySequenceColored       = Groundwork.ColorTheSeq(mySequence, mySOIPositions, myColor, myDefaultColor)
mySOIPositionsColored   = Groundwork.ColorTheSeq(mySOIPositions, mySOIPositions, myColor, myDefaultColor)


# compute and visualize results
print("mySequence: \t " + mySequenceColored, end = "\n")
print("mySOIPositions:  " + mySOIPositionsColored, end = '\n')
labelNum, labelX = Groundwork.SeqNumberGen(mySequence, 10)
print("\t\t " + labelNum, end = '\n')
print("\t\t " + labelX, end = '\n')
myCount, mySOIPositionsIndices = Groundwork.SOIPositions(mySequence, mySOI, "indices")
print("Indices: \t", mySOIPositionsIndices, end = "\n")

mySequence: 	 CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGGACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
mySOIPositions:  0012100000000000000000000000121000000000001100000000000000000000000000000000000000000000001100000000
		 1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
		 ---------|---------|---------|---------|---------|---------|---------|---------|---------|---------x
Indices: 	 [2, 3, 28, 29, 42, 90]

1.4 Identification of Multiple Sequences of Interest within a Target Sequence ¶

Goal:

To generalize the algorithm above. The positions of 3 given SOIs (mySOIs) shall be

identified in a target sequence (mySequence),
labelled individually (green, blue and yellow) and
a character of the target sequence shall be colored in red, if there is an overlap between the same or different SOIs.

# include modules
from ModulesExternal.TerminalColors import TerminalColors
import ModulesOwn.A_Groundwork as Groundwork


# initialize and define variables and parameter
dictTargetSequence                  = {}
dictSOIs                            = {}
mySequenceInput                     = "ctgggactctagctgatccacccgcctagggcagcacacataggacgtagctgcgccaaaaagacgaacccaccatgcccagacgcatctggctaagctc"
dictTargetSequence["mySequence"]    = mySequenceInput.upper()
dictSOIs["mySOI01"]                 = "AGG"
dictSOIs["mySOI02"]                 = "GAC"
dictSOIs["mySOI03"]                 = "CTAG"
SOICount                            = [0]  * len(dictSOIs)
SOIPositions                        = [0]  * len(dictSOIs)
SOIPositionsIndices                 = []
mySequenceColored                   = [""] * len(dictSOIs)
myColors                            = [TerminalColors.Green, TerminalColors.Blue, TerminalColors.Yellow]
myDefaultColor                      = TerminalColors.Default
myColorWarning                      = TerminalColors.Red
dictSOIPositions                    = {}
dictSOIPositionsIndices             = {}


# compute and visualize results
labelNum, labelX = Groundwork.SeqNumberGen(dictTargetSequence["mySequence"], 10)
print("\t\t\t  " + labelNum, end = '\n')
print("\t\t\t  " + labelX, end = '\n')

for count, value in enumerate(dictSOIs):
    SOICount[count], dictSOIPositions[value]        = Groundwork.SOIPositions(dictTargetSequence["mySequence"], dictSOIs[value], "countchain")
    SOICount[count], dictSOIPositionsIndices[value] = Groundwork.SOIPositions(dictTargetSequence["mySequence"], dictSOIs[value], "indices")
    mySequenceColored[count]                        = Groundwork.ColorTheSeq(dictTargetSequence["mySequence"], dictSOIPositions[value], myColors[count], myDefaultColor)
    SOIPositions[count]                             = dictSOIPositions[value]
    print(value + ": \t \t ", mySequenceColored[count], end = '\n')

mySequenceColored, mySOIPositionsTotalColored, mySOIPositionsTotal = Groundwork.ColorTheSeqMerge(dictTargetSequence["mySequence"],
                                                                                                 SOIPositions,
                                                                                                 myColors,
                                                                                                 myDefaultColor,
                                                                                                 myColorWarning)

print("-" * 126, end = '\n')
print("mySOIs (total): \t ", mySequenceColored, end = '\n')
print("mySOIPositions (total):  ", mySOIPositionsTotalColored, end = '\n')
print("-" * 126, end = '\n')
print("mySOI01 Indices:\t ", dictSOIPositionsIndices["mySOI01"])
print("mySOI02 Indices:\t ", dictSOIPositionsIndices["mySOI02"])
print("mySOI03 Indices:\t ", dictSOIPositionsIndices["mySOI03"])

			  1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
			  ---------|---------|---------|---------|---------|---------|---------|---------|---------|---------x
mySOI01: 	 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGGACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
mySOI02: 	 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGGACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
mySOI03: 	 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGGACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
------------------------------------------------------------------------------------------------------------------------------
mySOIs (total): 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGGACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
mySOIPositions (total):   0000111011110000000000000112210000000000011211000000000000000011100000000000000001110000000000000000
------------------------------------------------------------------------------------------------------------------------------
mySOI01 Indices:	  [27, 41]
mySOI02 Indices:	  [4, 43, 62, 81]
mySOI03 Indices:	  [8, 25]

1.5 Data Grouping and Storage ¶

1.5.1 Collection / Namedtuples ¶

For further sequence analysis it might be useful to establish data containers in order to group different variables and parameter that belong together. One possibility in Python is to set up a collection (namedtuples). Please keep in mind, that long target sequences should be excluded / stored externally and then should be assigned by a nametag identifyer.

# sources
# - https://stackoverflow.com/questions/9713352/python-creating-a-list-with-a-unique-automatically-generated-name
# - https://www.pythonpool.com/python-string-to-variable-name/
# - https://www.pythonforbeginners.com/basics/convert-string-to-variable-name-in-python


# include modules
from collections import namedtuple
from ModulesExternal.TerminalColors import TerminalColors
import ModulesOwn.A_Groundwork as Groundwork


# initialize and define variables and parameter
dictTargetSequence                  = {}
dictSOIs                            = {}
mySOI                               = []
dictTargetSequence["mySequence01"]  = "ctgggactctagctgatccacccgcctagggcagcacacatagg".upper()
dictTargetSequence["mySequence02"]  = "acgtagctgcgccaaaaagacgaacccaccatgcccagacgcatctggctaagctc".upper()
dictSOIs["mySOI01"]                 = "AGG"
dictSOIs["mySOI02"]                 = "GAC"
dictSOIs["mySOI03"]                 = "CTAG"
SOICount                            = [0]  * len(dictSOIs)
SOIPositions                        = [0]  * len(dictSOIs)
dictIndices                         = {}
mySequenceColored                   = [""] * len(dictSOIs)
myColors                            = [TerminalColors.Green, TerminalColors.Blue, TerminalColors.Yellow]
myColorsStr                         = ["Green", "Blue", "Yellow","Green", "Blue", "Yellow"] # Terminal Colors are tricky to store. Using identifiers instead
myDefaultColor                      = TerminalColors.Default
myColorWarning                      = TerminalColors.Red
mySOIsAndIndices                    = namedtuple("myDataCollection","mySequenceKey mySequenceValue mySOIKey mySOIValue myColors myIndices")
mySOIsAndIndices_array              = []
mySeqPos                            = []
dictSOIPositions                    = {}
dictSOIPositionsIndices             = {}


# compute and visualize results
for mySeqsKey in dictTargetSequence:
    kk = 0
    for kk, myCurrentKey in enumerate(dictSOIs):
        mySOIsAndIndices_array.append(mySOIsAndIndices(mySeqsKey, dictTargetSequence[mySeqsKey], myCurrentKey, dictSOIs[myCurrentKey], myColorsStr[kk], []))

for countTargetSeq, valueTargetSeq in enumerate(dictTargetSequence):
    print(valueTargetSeq)
    labelNum, labelX = Groundwork.SeqNumberGen(dictTargetSequence[valueTargetSeq], 10)
    print("\t\t\t  " + labelNum, end = '\n')
    print("\t\t\t  " + labelX, end = '\n')
    for count, value in enumerate(dictSOIs):
        SOICount[count], dictSOIPositions[value]        = Groundwork.SOIPositions(dictTargetSequence[valueTargetSeq], dictSOIs[value], "countchain")
        SOICount[count], dictSOIPositionsIndices[value] = Groundwork.SOIPositions(dictTargetSequence[valueTargetSeq], dictSOIs[value], "indices")
        mySequenceColored[count]                        = Groundwork.ColorTheSeq(dictTargetSequence[valueTargetSeq], dictSOIPositions[value], myColors[count], myDefaultColor)
        SOIPositions[count]                             = dictSOIPositions[value]
        print(value + ": \t \t ", mySequenceColored[count], end = '\n')
    print("\n")

for SOIs in mySOIsAndIndices_array:
    for ii in range(len(SOIs.mySequenceValue) - len(SOIs.mySOIValue)):
        if SOIs.mySequenceValue[ii:ii + len(SOIs.mySOIValue)] == SOIs.mySOIValue:
            SOIs.myIndices.append(ii)
    print(SOIs)

mySequence01
			  12345678901234567890123456789012345678901234
			  ---------|---------|---------|---------|----
mySOI01: 	 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG
mySOI02: 	 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG
mySOI03: 	 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG


mySequence02
			  12345678901234567890123456789012345678901234567890123456
			  ---------|---------|---------|---------|---------|------
mySOI01: 	 	  ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
mySOI02: 	 	  ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
mySOI03: 	 	  ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC


myDataCollection(mySequenceKey='mySequence01', mySequenceValue='CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG', mySOIKey='mySOI01', mySOIValue='AGG', myColors='Green', myIndices=[27])
myDataCollection(mySequenceKey='mySequence01', mySequenceValue='CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG', mySOIKey='mySOI02', mySOIValue='GAC', myColors='Blue', myIndices=[4])
myDataCollection(mySequenceKey='mySequence01', mySequenceValue='CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG', mySOIKey='mySOI03', mySOIValue='CTAG', myColors='Yellow', myIndices=[8, 25])
myDataCollection(mySequenceKey='mySequence02', mySequenceValue='ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC', mySOIKey='mySOI01', mySOIValue='AGG', myColors='Green', myIndices=[])
myDataCollection(mySequenceKey='mySequence02', mySequenceValue='ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC', mySOIKey='mySOI02', mySOIValue='GAC', myColors='Blue', myIndices=[18, 37])
myDataCollection(mySequenceKey='mySequence02', mySequenceValue='ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC', mySOIKey='mySOI03', mySOIValue='CTAG', myColors='Yellow', myIndices=[])

The accessibility to the info here stored as values can be a little tricky though. Here are two possibilities to do that:

# initialize and define variables and parameter
#            TargetSeq       SOI        Item2Look4
SEARCH_ITEM=["mySequence02", "mySOI02", "myColors"]


# compute and visualize results
INDEX_ARRAY=(list(dictTargetSequence.keys()).index(SEARCH_ITEM[0])) * len(dictSOIs) + (list(dictSOIs.keys()).index(SEARCH_ITEM[1]))
print("You are asking for the value of \"" + SEARCH_ITEM[2] + "\" for the sequence of interest \"" + SEARCH_ITEM[1] + "\" within the target sequence \"" + SEARCH_ITEM[0] + "\".")
print("-" * 124, end = '\n')
res = mySOIsAndIndices_array.__getitem__(INDEX_ARRAY).__getitem__(mySOIsAndIndices._fields.index(SEARCH_ITEM[2]))
print("The value is as follows: \t ", res)

You are asking for the value of "myColors" for the sequence of interest "mySOI02" within the target sequence "mySequence02".
----------------------------------------------------------------------------------------------------------------------------
The value is as follows: 	  Blue

# initialize and define variables and parameter
# Data-pair2look4:
#           key          value
SEARCH_ITEM=["mySOIKey", "mySOI02"]


# compute and visualize results
print("All items fulfilling the requirements:")
print("-" * 197, end = '\n')
for SOIs in mySOIsAndIndices_array:
    test_data = SOIs.__getitem__(mySOIsAndIndices._fields.index(SEARCH_ITEM[0]))
    if test_data == SEARCH_ITEM[1]:
        res = SOIs.__getitem__(mySOIsAndIndices._fields.index(SEARCH_ITEM[0]))
        print(SOIs)

All items fulfilling the requirements:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
myDataCollection(mySequenceKey='mySequence01', mySequenceValue='CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG', mySOIKey='mySOI02', mySOIValue='GAC', myColors='Blue', myIndices=[4])
myDataCollection(mySequenceKey='mySequence02', mySequenceValue='ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC', mySOIKey='mySOI02', mySOIValue='GAC', myColors='Blue', myIndices=[18, 37])

1.5.2 Store in JSON-Format ¶

Probably more convient, common and thus advisory is to use a dictionary / JSON-format combination.

Step 1: Dicitonary Setup and Test-Readout

# sources
# - https://python.plainenglish.io/how-to-read-and-write-to-json-file-in-python-ef35460aaeb5


# initialize and define variables and parameter
# Dicitonary Setup
dictSeqGroup =  {
                "SOIs": {
                        "mySOI01": {
                        "Seq": "AGG",
                        "myColor": "Green",
                        },
                        "mySOI02": {
                        "Seq": "GAC",
                        "myColor": "Blue",
                        },
                        "mySOI03": {
                        "Seq": "CTAG",
                        "myColor": "Yellow",
                        }
                },
                "targetSeq": {
                        "mySequence01": "CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG",
                        "mySequence02": "ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC"
                }
                }


# compute and visualize results
# Test-Readout
print(dictSeqGroup["targetSeq"]["mySequence02"])     # <-- 1. query
print(dictSeqGroup["SOIs"]["mySOI03"]["myColor"])    # <-- 2. query

ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
Yellow

Step 2: Data Export into JSON-Format

# sources
# - https://pynative.com/python-prettyprint-json-data/
# - https://stackoverflow.com/questions/73800772/returning-jsondecodeerror-expecting-value


# include modules
import json

with open("./ModulesOwn/A_Groundwork_Data/01_05_02_DictSeqGroup.json", "w") as outfile:
    json.dump(dictSeqGroup, outfile, indent = 4, sort_keys = True)

Step 3: Run Operations on Data and Save Modifications. Here: We determine and add the SOIPositions-Indices to each tarSeq-mySOI-pair, include the data in the master-dictionary and save everything in JSON-format.

# initialize and define variables and parameter
myColors                            = [TerminalColors.Green, TerminalColors.Blue, TerminalColors.Yellow]
myDefaultColor                      = TerminalColors.Default
myColorWarning                      = TerminalColors.Red

# (Re-)Open JSON file
with open('./ModulesOwn/A_Groundwork_Data/01_05_02_DictSeqGroup.json', 'r') as openfile:
    # Reading from json file
    dictSeqGroup = json.load(openfile)


# compute and visualize results
# Determine and print indices in terminal
for countTargetSeq, valueTargetSeq in enumerate(dictSeqGroup["targetSeq"].keys()):    
    labelNum, labelX = Groundwork.SeqNumberGen(dictSeqGroup["targetSeq"][valueTargetSeq], 10)
    print("\t\t\t  " + labelNum, end = '\n')
    print("\t\t\t  " + labelX, end = '\n')
    for count, value in enumerate(dictSeqGroup["SOIs"].keys()):
        SOICount[count], dictSOIPositions[value]        = Groundwork.SOIPositions(dictSeqGroup["targetSeq"][valueTargetSeq], dictSeqGroup["SOIs"][value]["Seq"], "countchain")
        SOICount[count], dictSOIPositionsIndices[value] = Groundwork.SOIPositions(dictSeqGroup["targetSeq"][valueTargetSeq], dictSeqGroup["SOIs"][value]["Seq"], "indices")
        mySequenceColored[count]                        = Groundwork.ColorTheSeq(dictSeqGroup["targetSeq"][valueTargetSeq], dictSOIPositions[value], myColors[count], myDefaultColor)
        SOIPositions[count]                             = dictSOIPositions[value]
        print(value + ": \t \t ", mySequenceColored[count], end = '\n')
        if "myIndices" in dictSeqGroup["SOIs"][value]:
            pass
        else:
            dictSeqGroup["SOIs"][value]["myIndices"] = {}
        dictSeqGroup["SOIs"][value]["myIndices"][valueTargetSeq] = dictSOIPositionsIndices[value]

    for count, value in enumerate(dictSeqGroup["SOIs"].keys()):
        print("\t\t\t ", dictSOIPositionsIndices[value], end = '\n')
    print("\n")

# Write info into json file
with open("./ModulesOwn/A_Groundwork_Data/01_05_02_DictSeqGroup.json", "w") as outfile:
    json.dump(dictSeqGroup, outfile, indent = 4, sort_keys = True)

			  12345678901234567890123456789012345678901234
			  ---------|---------|---------|---------|----
mySOI01: 	 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG
mySOI02: 	 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG
mySOI03: 	 	  CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG
			  [27, 41]
			  [4]
			  [8, 25]


			  12345678901234567890123456789012345678901234567890123456
			  ---------|---------|---------|---------|---------|------
mySOI01: 	 	  ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
mySOI02: 	 	  ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
mySOI03: 	 	  ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC
			  []
			  [18, 37]
			  []

Step 4: Reload JSON-file and print the content. The indices are now added to each targetSeq-mySOI-pair.

# sources
# - https://www.geeksforgeeks.org/read-a-file-line-by-line-in-python/


# Using readlines()
file1 = open('./ModulesOwn/A_Groundwork_Data/01_05_02_DictSeqGroup.json', 'r')
Lines = file1.readlines()
  
 # compute and visualize results
count = 0
# Strips the newline character
for line in Lines:
    count += 1
    print(line, end = '')

{
    "SOIs": {
        "mySOI01": {
            "Seq": "AGG",
            "myColor": "Green",
            "myIndices": {
                "mySequence01": [
                    27,
                    41
                ],
                "mySequence02": []
            }
        },
        "mySOI02": {
            "Seq": "GAC",
            "myColor": "Blue",
            "myIndices": {
                "mySequence01": [
                    4
                ],
                "mySequence02": [
                    18,
                    37
                ]
            }
        },
        "mySOI03": {
            "Seq": "CTAG",
            "myColor": "Yellow",
            "myIndices": {
                "mySequence01": [
                    8,
                    25
                ],
                "mySequence02": []
            }
        }
    },
    "targetSeq": {
        "mySequence01": "CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACACATAGG",
        "mySequence02": "ACGTAGCTGCGCCAAAAAGACGAACCCACCATGCCCAGACGCATCTGGCTAAGCTC"
    }
}

2. Set Up a simple Graphical User Interface (GUI)¶

Goal:

Set up a simple first draft of a Grahical User Interface (GUI) for future applications. The GUI should

analyse a given DNA-Sequence with respect to 3 mySOIs and color code the respective positions in the target sequence
save the results in an external file
look fancy
provide space for a next analysis step. Prefered is a tab-design of the GUI.

# sources
# - https://www.delftstack.com/howto/python-tkinter/tkinter-tabs/
# - https://github.com/rdbende/Azure-ttk-theme
# - https://www.javatpoint.com/python-tkinter-text
# - https://learn.microsoft.com/en-us/windows/win32/api/dwmapi/ne-dwmapi-dwmwindowattribute
# - https://www.plus2net.com/python/tkinter-grid.php
# - https://www.pythontutorial.net/tkinter/tkinter-validation/
# - https://stackoverflow.com/questions/17251016/python-tkinter-how-to-change-the-windows-border-color
# - https://stackoverflow.com/questions/18052395/array-of-buttons-in-python
# - https://stackoverflow.com/questions/53461066/python-showing-next-tab-by-pressing-a-button
# - https://stackoverflow.com/questions/70224682/is-there-a-way-in-tkinter-to-left-align-tabs-notebook
# - https://www.tutorialspoint.com/how-to-save-the-contents-of-a-textbox-in-tkinter
# - https://www.tutorialspoint.com/how-to-change-the-color-of-certain-words-in-a-tkinter-text-widget


# include modules
from IPython.display import Image as ImageIPython
import json
import TKinterModernThemes as TKMT
import ModulesOwn.A_Groundwork as Groundwork
import ModulesOwn.B_SimpleTabbedGUI as SimpleTabbedGUI
from ModulesExternal.TerminalColors import TerminalColors
import numpy as np
from PIL import Image, ImageTk
import sys
from tkinter import Text, END
from tkinter import *
from tkinter.ttk import *
import tkinter as tk
from tkinter import ttk


def main():
    # convert BCN-image to .ico in order to make image applicable as a logo for the window frame
    from PIL import Image
    logo = Image.open("./icons/logoBCNicon.png")
    logo.save("./icons/logoBCNicon.ico", format = 'ICO')


    # define colors and constants
    myForegroundColor          = "#4c87c8"
    myBackgroundColor          = "#000a34"
    dictSequence               = {}
    mySOI                      = []
    myNumberLabelBlank         = 30
    myLabelBlank               = [""] * myNumberLabelBlank
    myDefaultColor             = TerminalColors.Default
    myColorWarning             = TerminalColors.Red
    myDefaultColorTextBox      = "white"
    myColorWarningTextBox      = "red"
    myInput                    = [""] * 4
    myColorsTextBox = []
    myColors                   = [TerminalColors.Green, TerminalColors.Blue, TerminalColors.Yellow]


    # initialize GUI
    root = tk.Tk()

    # set graphical appearance
    root.configure(background = myBackgroundColor)
    Groundwork.DarkTitleBar(root)
    root.title('Tabbed GUI')
    root.tk.call("source", "./_themes/Sun-Valley-ttk-theme-main/sv.tcl")
    root.tk.call("set_theme", "dark")
    root.iconbitmap("./icons/logoBCNicon.ico")
    ttk.Style().configure("RB.TButton", foreground = myForegroundColor)

    # create a tab control that manages multiple tabs
    tabsystem = ttk.Notebook(root)

    # create new tabs using frame widget
    tab1 = Frame(tabsystem)
    tab2 = Frame(tabsystem)

    # initialize some blank labels used for GUI design
    for ii in range(0, len(myLabelBlank)):
        myLabelBlank[ii]   = ttk.Label(tab1, text = "   ", background = myBackgroundColor)

    # set style
    s = ttk.Style(root)
    s.configure("TFrame",        background = myBackgroundColor, foreground = myBackgroundColor)
    s.configure("TNotebook",     background = myBackgroundColor, foreground = myBackgroundColor)
    s.configure("TNotebook.Tab", background = myBackgroundColor, foreground = "white")


    # load (previous) dataset
    with open("./ModulesOwn/B_SimpleTabbedGUI_Data/02_Sequence.json", "r") as openfile:
        dictSequence = json.load(openfile)


    for ii in range(len(dictSequence["SOIs"])):
        myColorsTextBox.append(dictSequence["SOIs"]["mySOI""{:02d}".format(ii+1)]["myColor"])


    # set top label
    columnCount = 0; blankCount = 0
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); blankCount  += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 2); blankCount  += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 4); blankCount  += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 6); columnCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 8); columnCount += 1

    myLabel = tk.Label(tab1, text = "Sequence Analysis", font = ('Arial', 14, 'bold'), bg = myBackgroundColor)
    myLabel.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5, sticky = tk.W + tk.E)
    columnCount += 1
    blankCount  += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0)
    myValidation                    = [] * 4


    # set Input textboxes and labels
    for ii in range(len(dictSequence["SOIs"]) + 1):
        blankCount += 1
        myLabelBlank[blankCount+ii].grid(row = columnCount, column = 2)
        columnCount += 1
        if ii == 0:
            currentKey = "mySequence"
            currentValue = dictSequence["mySequence"]
            myLabel     = ttk.Label(tab1, text = currentKey + ": ", anchor="e", justify = "left",
                                    font = ('Courier New', 12, 'bold'), 
                                    foreground = "white", background = myBackgroundColor)
            myLabel.grid(row = columnCount, column = 1, sticky = tk.W + tk.E)
            myInput[ii] = ttk.Entry(tab1, width = 40, style = "EntryStyle.TEntry",
                                        font = ('Courier New', 12, 'bold'),
                                        foreground = myForegroundColor, background = myBackgroundColor)
            myInput[ii].insert("end", currentValue)
        else:
            currentKey = "mySOI""{:02d}".format(ii)
            currentValue = dictSequence["SOIs"][currentKey]["Seq"]
            myLabel     = ttk.Label(tab1, text = currentKey + ": ", anchor = "e", justify = "left",
                                    font = ('Courier New', 12, 'bold'), background = myBackgroundColor,
                                    foreground = myColorsTextBox[ii - 1])
            myLabel.grid(row = columnCount, column = 1, sticky = tk.W + tk.E)
            myInput[ii] = ttk.Entry(tab1, width = 40, style = "EntryStyle.TEntry",
                                        font = ('Courier New', 12, 'bold'),
                                        foreground = myForegroundColor, background = myBackgroundColor)
            myInput[ii].insert("end", currentValue)
        myInput[ii].grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5)
        blankCount += 1; columnCount += 1
        myLabelBlank[blankCount + ii].grid(row = columnCount, column = 2)


    columnCount += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); columnCount += 1


    # determine colored result for textfield of GUI
    myNumberOfSOIs       = len(dictSequence["SOIs"])
    myCount              = [0] * myNumberOfSOIs
    mySOIPositions       = [0] * myNumberOfSOIs
    mySequenceColored    = [""] * myNumberOfSOIs
    mySequence           = dictSequence["mySequence"]

    columnCount  += 1
    for ii in range(len(dictSequence["SOIs"])):
        currentKey = "mySOI""{:02d}".format(ii + 1)
        currentValue = dictSequence["SOIs"][currentKey]["Seq"]
        myCount[ii], mySOIPositions[ii] = Groundwork.SOIPositions(mySequence, currentValue, "countchain")
        mySequenceColored[ii]           = Groundwork.ColorTheSeq(mySequence, mySOIPositions[ii], myColors[ii], myDefaultColor)

    mySequenceColored, mySOIPositionsTotalColored, mySOIPositionsTotal = Groundwork.ColorTheSeqMerge(mySequence, mySOIPositions, myColors,
                                                                                            myDefaultColor, myColorWarning)
    print("mySOIs (total): \t \t", mySequenceColored, end = '\n')
    print("mySOIPositions (total) : \t", mySOIPositionsTotalColored, end = '\n')
    myLabel = ttk.Label(text = mySequence, font=('Arial', 14, 'bold'))


    # write colored result in textfield of GUI
    myText = Text(tab1, height = 3, width = 35, background = myBackgroundColor, highlightthickness = 0,
                borderwidth = 0, font = ('Courier New', 12, 'bold'))
    SimpleTabbedGUI.AnalyseSeq(myText, myNumberOfSOIs, mySequence, mySOIPositions, myColorsTextBox,
            myDefaultColorTextBox, myColorWarningTextBox, myBackgroundColor, tab1, columnCount, myInput)
    myText.delete
    myTextnew= Text(tab1, height = 2, width = 35, background = myBackgroundColor, highlightthickness = 0,
                    borderwidth = 0, font = ('Courier New', 12, 'bold'))
    myTextnew = SimpleTabbedGUI.ColorTheSeqMergeGUI(myTextnew, dictSequence["mySequence"], mySOIPositions, myColorsTextBox,
                                                myDefaultColorTextBox, myColorWarningTextBox)
    myTextnew.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5, sticky = tk.W + tk.E)
    myColumnCountAnalysisStatic = columnCount; columnCount += 1


    # imprint user manual
    columnCount  += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); columnCount += 1
    myLabelLegend  = ttk.Label(tab1, text = "Sequences of Interest (SOIs): Green, blue and yellow", anchor = "w", justify = "left",
                                font = ('Courier New', 10, 'bold'), background = myBackgroundColor, foreground = "white")
    myLabelLegend.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5, sticky = tk.W + tk.E); columnCount += 1
    myLabelLegend  = ttk.Label(tab1, text = "Overlaps: red", anchor = "w", justify = "left", font = ('Courier New', 10, 'bold'),
                                background = myBackgroundColor, foreground = "white")
    myLabelLegend.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5, sticky = tk.W + tk.E)
    columnCount  += 1; columnCount += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); columnCount += 1
    myLabelMan     = ttk.Label(tab1,text = "Manual: ", anchor = "w", justify = "left", font = ('Courier New', 10, 'bold'),
                            background = myBackgroundColor, foreground = "white")
    myLabelMan.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5, sticky = tk.W + tk.E); columnCount += 1
    myLabelMan     = ttk.Label(tab1,text = "1. Modify Sequences", anchor = "w", justify="left", font = ('Courier New', 10, 'bold'),
                            background = myBackgroundColor, foreground = "white")
    myLabelMan.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5, sticky = tk.W + tk.E); columnCount += 1
    myLabelMan     = ttk.Label(tab1,text = "2. Click \"Analyze\"", anchor = "w", justify = "left", font = ('Courier New', 10, 'bold'),
                            background = myBackgroundColor, foreground = "white")
    myLabelMan.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5, sticky = tk.W + tk.E); columnCount += 1
    myLabelMan     = ttk.Label(tab1,text = "3. Click \"Save\"", anchor = "w", justify = "left", font = ('Courier New', 10, 'bold'),
                            background = myBackgroundColor, foreground = "white")
    myLabelMan.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5, sticky = tk.W + tk.E)
    columnCount  += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); columnCount += 1


    # Set control buttons and define actions
    myButtonAnalysis = ttk.Button(tab1,text = "Analyze", style = "RB.TButton", command = lambda: SimpleTabbedGUI.AnalyseSeq(myText, myNumberOfSOIs,
                                                                                                            mySequence, mySOIPositions,
                                                                                                            myColorsTextBox, myDefaultColorTextBox,
                                                                                                            myColorWarningTextBox, myBackgroundColor,
                                                                                                            tab1, myColumnCountAnalysisStatic, myInput))
    myButtonAnalysis.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 3, sticky = tk.W + tk.E)
    myButtonSave  = ttk.Button(tab1, text = "Save", style = "RB.TButton", command = lambda: SimpleTabbedGUI.SaveSequence(dictSequence, myInput))
    myButtonSave.grid(row = columnCount, column = 7, sticky = tk.W + tk.E)

    columnCount += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); columnCount += 1
    myButton      = ttk.Button(tab1, text = "Restore", style = "RB.TButton", command = lambda: SimpleTabbedGUI.LoadDataset(myInput))
    myButton.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 3, sticky = tk.W + tk.E)
    myButtonClose = ttk.Button(tab1, text = "Close", style = "RB.TButton", command = root.destroy)
    myButtonClose.grid(row = columnCount, column = 7, sticky = tk.W + tk.E)

    columnCount += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); columnCount += 1
    myButtonClear = ttk.Button(tab1, text = "Clear", style = "RB.TButton", command = lambda: SimpleTabbedGUI.ClearTextandFields(dictSequence, myInput))
    myButtonClear.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 3, sticky = tk.W + tk.E)
    columnCount += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); columnCount += 1


    # set logo bio-century.net
    myImage        = Image.open('./icons/logoBCN.png')
    imageResized = myImage.resize((288, 47))
    myIcon=ImageTk.PhotoImage(imageResized)
    columnCount  += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0)
    myLogoLabel    = ttk.Label(tab1, image = myIcon, padding = 5, background = myBackgroundColor)
    myLogoLabel.grid(row = columnCount, column = 3, rowspan = 1, columnspan = 5)

    columnCount += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); columnCount += 1
    myButtonNext   = ttk.Button(tab1, text="Next Tab", command = lambda: tabsystem.select(tab2))
    myButtonNext.grid(row = columnCount, column=7, sticky = tk.W + tk.E); columnCount += 1
    columnCount  += 1; blankCount += 1
    myLabelBlank[blankCount].grid(row = columnCount, column = 0); columnCount += 1


    # set "Next Idea" tab
    label2nd = Label(tab2, text = "Here is some space for a new idea!", background = myBackgroundColor)
    label2nd.grid(column = 1, row = 1, padx = 170,pady = 250)
    tabsystem.add(tab1, text = 'Sequence Analysis')
    tabsystem.add(tab2, text = 'Next Idea')
    tabsystem.grid()

    display(ImageIPython(filename = './figures/TabbedGUI.png'))
    root.mainloop()


if __name__ == "__main__":
    main()

mySOIs (total): 	 	 CTGGGACTCTAGCTGATCCACCCGCCTAGGGCAGCACA
mySOIPositions (total) : 	 00001110111100000000000001122100000000

3. Mutations, Insertions and Deletions ¶

3.1 Point Mutations ¶

3.1.1 Hamming Distance ¶

# include modules
from ModulesExternal.TerminalColors import TerminalColors
import ModulesOwn.A_Groundwork as Groundwork
import ModulesOwn.C_Mutations as Mutations


# initialize and define variables and parameter
dictTargetSequence                  = {}
#                                 123 456 789 012 345 678 901 234 567 890
#                                 M   L   G   L   W   L   L   A   D   *
dictTargetSequence["mySeq01"]  = "atg ctg gga ctc tgg ctg ctg gct gat taa".replace(" ","").upper()
dictTargetSequence["mySeq02"]  = "atg ctg gTa ctc tAA ctA ctg gAt gat taa".replace(" ","").upper()


# compute hamming distance
hammingDistance, mismatchDNA = Mutations.hammingDistance(dictTargetSequence["mySeq01"], dictTargetSequence["mySeq02"])


# print results
print("-" * 50)
print(" " * 5 + "Results Comparison mySeq01 And mySeq02")
print("-" * 50)
print(" " * 2 + "Hamming Distance: \t" + str(hammingDistance))
print(" " * 2 + "At Position: \t \t" + str(mismatchDNA))
print("-" * 50)
print()

Mutations.printSeqDifferences(dictTargetSequence["mySeq01"], dictTargetSequence["mySeq02"], mismatchDNA)

--------------------------------------------------
     Results Comparison mySeq01 And mySeq02
--------------------------------------------------
  Hamming Distance: 	5
  At Position: 	 	[7, 13, 14, 17, 22]
--------------------------------------------------

mySeq01: 	ATGCTGGGACTCTGGCTGCTGGCTGATTAA
		||||||| |||||  || |||| |||||||
mySeq02: 	ATGCTGGTACTCTAACTACTGGATGATTAA

3.1.2 Classify Mutations ¶

# sources
# - http://biopython.org/DIST/docs/tutorial/Tutorial.html

# data source
# - https://www.ncbi.nlm.nih.gov/datasets/taxonomy/2697049/
# Severe acute respiratory syndrome coronavirus 2


# include modules
from Bio.Seq import Seq
import ModulesOwn.C_Mutations as Mutations


dictTargetSequence["mySeq01Translated"] = Seq(dictTargetSequence["mySeq01"]).translate()
dictTargetSequence["mySeq02Translated"] = Seq(dictTargetSequence["mySeq02"]).translate()

hammingDistance, mismatchAA = Mutations.hammingDistance(dictTargetSequence["mySeq01Translated"], dictTargetSequence["mySeq02Translated"])

labelNum, labelX = Groundwork.SeqNumberGen(dictTargetSequence["mySeq01Translated"], 10)
print("\t\t" + labelNum, end = '\n')
Mutations.printSeqDifferences(dictTargetSequence["mySeq01Translated"], dictTargetSequence["mySeq02Translated"], mismatchAA)
print("\n")

# print results
print("-" * 50)
print(" " * 5 + "Results Comparison mySeq01 And mySeq02")
print("-" * 50)
print(" " * 2 + "Hamming Distance: \t" + str(hammingDistance))
print(" " * 2 + "At (Position -1): \t" + str(mismatchAA))
print("-" * 50)
print("")

Mutations.printMutationCharakterAminoAcidSeqs(dictTargetSequence["mySeq01"], dictTargetSequence["mySeq02"], mismatchDNA)

		1234567890
mySeq01: 	MLGLWLLAD*
		|| | || ||
mySeq02: 	MLVL*LLDD*

--------------------------------------------------
     Results Comparison mySeq01 And mySeq02
--------------------------------------------------
  Hamming Distance: 	3
  At (Position -1): 	[2, 4, 7]
--------------------------------------------------

G -> V at position 3: 	 Missense Conservative Mutation
W -> * at position 5: 	 Nonsense Mutation
L -> L at position 6: 	 Silent Mutation
A -> D at position 8: 	 Missense Non-Conservative Mutation

3.1.3 Reading Frames and Gene Maps (SARS-CoV-2)¶

Goal:
Create a DNA map depicting the genes of Sars-CoV-2 reversly-transcribed genome (Sars-CoV-2 has an RNA-genome). Show a close-up of the relatively small envelope protein E.

Step 01: Load in genome from fna-file. Show DNA-sequence and the translated result.

# sources:
# https://www.nature.com/articles/s41392-021-00653-w/figures/1
# https://datagy.io/python-find-index-substring/
# - https://en.wikipedia.org/wiki/Conservative_replacement

# data source:
# https://www.ncbi.nlm.nih.gov/datasets/taxonomy/2697049/


# include modules
import re
from Bio.Seq import Seq
from ModulesExternal.TerminalColors import TerminalColors
import warnings
warnings.filterwarnings('ignore')


def findGenes(mySeq, startPos, endPos):
    mySeqColoredList               = []
    mySeqTmpTranslatedIndicesM     = []
    mySeqTmpTranslatedIndicesStars = []

    for ii in range(3):
        mySeqTmp = mySeq[startPos + ii : endPos]
        mySeqTmpTranslated = Seq(mySeqTmp).translate()
        if mySeqTmpTranslated[0] == "M" and mySeqTmpTranslated[-1] == "*":
            mySeqColored = TerminalColors.Green + mySeqTmpTranslated + TerminalColors.Default
        else:
            mySeqColored = TerminalColors.Red + mySeqTmpTranslated + TerminalColors.Default
        mySeqColoredList.append(mySeqColored)

        indices_object = re.finditer(pattern='M', string = str(mySeqTmpTranslated))
        mySeqTmpTranslatedIndicesM.append([index.start() for index in indices_object])

        indices_object = re.finditer(pattern='\*', string = str(mySeqTmpTranslated))
        mySeqTmpTranslatedIndicesStars.append([index.start() for index in indices_object])

    return mySeqColoredList, mySeqTmpTranslatedIndicesM, mySeqTmpTranslatedIndicesStars


# initialize and define variables and parameter
dictGenes         = {}
counterGenes      = 0
mySeq             = ""
myPositioningFile = open('./Data/cds.fna', 'r')
mySeqFile         = open('./Data/genomic.fna', 'r')
lines             = myPositioningFile.readlines()

for ii, line in enumerate(lines):
    if line.find(">NC_045512.2:") > -1:
        lineCropped = line[len(">NC_045512.2:"):line.index("[") - 1]
        position = lineCropped[:lineCropped.index(" ")]
        positionStart = position[:position.index("-")]
        positionEnd = position[position.index("-")+1:]
        geneName = lineCropped[lineCropped.index(" ") + 1:]
        dictGenes[counterGenes] = {}
        dictGenes[counterGenes]["Name"] = geneName
        dictGenes[counterGenes]["positionStart"] = positionStart
        dictGenes[counterGenes]["positionEnd"] = positionEnd
        counterGenes += 1
myPositioningFile.close()

lines        = mySeqFile.readlines()
for ii, line in enumerate(lines):
    if ii > 0:
        mySeq = mySeq + line
mySeqFile.close()
mySeq = mySeq.replace("\n","")

# protein E envelope: print all three reading frames
startPos = 26242
endPos   = 26472
mySeqColoredList, mySeqIndicesM, mySeqIndicesStars = findGenes(mySeq, startPos, endPos)


dashlineLength = 101
indentHeadlines = 45
ii = 0
print("-" * dashlineLength)
print(" " * indentHeadlines + "Reading Frame 01".upper())
print("-" * dashlineLength)
print("Sequence translated: \t " + str(mySeqColoredList[ii]),  end = '\n')
print("Gene Start Position: \t " + str(mySeqIndicesM[ii]),     end = '\n')
print("Gene Stop Position: \t "  + str(mySeqIndicesStars[ii]), end = '\n')
print("-" * dashlineLength)
print("")
ii += 1
print("-" * dashlineLength)
print(" " * indentHeadlines + "Reading Frame 02".upper())
print("-" * dashlineLength)
print("Sequence translated: \t " + str(mySeqColoredList[ii]),  end = '\n')
print("Gene Start Position: \t " + str(mySeqIndicesM[ii]),     end = '\n')
print("Gene Stop Position: \t "  + str(mySeqIndicesStars[ii]), end = '\n')
print("-" * dashlineLength)
print("")
ii += 1
print("-" * dashlineLength)
print(" " * indentHeadlines + "Reading Frame 03".upper())
print("-" * dashlineLength)
print("Sequence translated: \t " + str(mySeqColoredList[ii]),  end = '\n')
print("Gene Start Position: \t " + str(mySeqIndicesM[ii]),     end = '\n')
print("Gene Stop Position: \t "  + str(mySeqIndicesStars[ii]), end = '\n')
print("-" * dashlineLength)

-----------------------------------------------------------------------------------------------------
                                             READING FRAME 01
-----------------------------------------------------------------------------------------------------
Sequence translated: 	 LCTHSFRKRQVR**LIAYFFFLLSWYSC*LH*PSLLRFDCVRTAAILLT*VL*NLLFTFTLVLKI*ILLEFLIFWS
Gene Start Position: 	 []
Gene Stop Position: 	 [12, 13, 28, 31, 49, 52, 65]
-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------
                                             READING FRAME 02
-----------------------------------------------------------------------------------------------------
Sequence translated: 	 YVLIRFGRDRYVNS**RTSFSCFRGILASYTSHPYCASIVCVLLQYC*RESCKTFFLRLLSC*KSEFF*SS*SSGL
Gene Start Position: 	 []
Gene Stop Position: 	 [14, 15, 47, 62, 68, 71]
-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------
                                             READING FRAME 03
-----------------------------------------------------------------------------------------------------
Sequence translated: 	 MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV*
Gene Start Position: 	 [0]
Gene Stop Position: 	 [75]
-----------------------------------------------------------------------------------------------------

# sources:
# https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer/blob/master/examples/overview_and_detail.py
# https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer


# include modules
from dna_features_viewer import GraphicFeature, GraphicRecord
from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt

featuresList = []
whitelist = [0,1,2,3,4,5,7,8,9,10,11,12,13,14,17,18,19,20,21,22,23,24,25,26,27]

for ii in whitelist: #range(len(dictGenes)):
    if ii == 14:
        featuresList.append(GraphicFeature(start = int(dictGenes[ii]["positionStart"]), end = int(dictGenes[ii]["positionEnd"]), strand =+ 0, color = "#ffaaaa", label = dictGenes[ii]["Name"], alpha=0.0))
    else:
        featuresList.append(GraphicFeature(start = int(dictGenes[ii]["positionStart"]), end = int(dictGenes[ii]["positionEnd"]), strand =+ 0, color = "#ccccff", label = dictGenes[ii]["Name"], alpha=0.15))

record = GraphicRecord(sequence = mySeq,features=featuresList)
# record = GraphicRecord(sequence_length=32000,features=featuresList)
# ax1, _ = record.plot(figsize=(50, 2))#figure_width = 14)
# fig, (ax1) = plt.subplots(1, 1, figsize=(50, 2))
ax1, _ = record.plot(figure_width = 40)
ax1.set_title("Sequence detail", loc='left', weight='bold')
ax1.fill_between((startPos, endPos), +1000, -1000, color="#ccffcc",  alpha=0.65)

fig, (ax2) = plt.subplots(1, 1, figsize=(40, 2))
# fig, (ax2) = plt.subplots(1, 1, figure_width = 14)
zoom_start, zoom_end = 398, 428  # coordinates of the "detail"
cropped_record = record.crop((startPos-4, endPos+2))
cropped_record.plot_translation(ax=ax2, location=(startPos-4, endPos+2), fontdict={'weight': 'bold'})
cropped_record.plot(ax=ax2, plot_sequence=True)
ax2.set_title("Sequence detail", loc='left', weight='bold')

ax1.figure.savefig('./Figures/sequence_and_translation.png', bbox_inches='tight')

if saveGallery == True:
    ax1.figure.savefig('./../GALLERY/Sequence Analysis/03_01_03_Gene_Map01.png', bbox_inches='tight')
    ax2.figure.savefig('./../GALLERY/Sequence Analysis/03_01_03_Gene_Map02.png', bbox_inches='tight')
plt.show()

3.2 Needleman Wunsch Algorithm ¶

# https://stackoverflow.com/questions/68140575/styling-the-background-color-of-pandas-index-cell
# https://stackoverflow.com/questions/6667201/how-to-define-a-two-dimensional-array
# https://www.youtube.com/watch?v=of3B02hZGS0&t=1583s
# https://www.youtube.com/watch?v=b6xBvl0yPAY


# include modules
import numpy as np
import pandas as pd


# initialize and define variables and parameter
dictTargetSequence             = {}
# dictTargetSequence["mySeq01"]  = "tgg tg".replace(" ","").upper() # x
# dictTargetSequence["mySeq02"]  = "atc gt".replace(" ","").upper() # y
# scoringWeight   = [1, -1, -2]
# dictTargetSequence["mySeq01"]  = "cag cta".replace(" ","").upper() # x
# dictTargetSequence["mySeq02"]  = "cac ata".replace(" ","").upper() # y
# scoringWeight   = [1, -1, -1]
dictTargetSequence["mySeq01"]  = "atg ctg g".replace(" ","").upper() # x
dictTargetSequence["mySeq02"]  = "atg GAg A".replace(" ","").upper() # y
scoringWeight   = [2, -2, -1]


# scoreMatrix = np.zeros((len(dictTargetSequence["mySeq02"]) + 1, len(dictTargetSequence["mySeq01"]) + 1))
w, h = len(dictTargetSequence["mySeq01"]) + 3, len(dictTargetSequence["mySeq02"]) + 3
scoreMatrixAllValues = [[[] for x in range(1,w)] for y in range(1,h)]
scoreMatrixMaxValues = [[[] for x in range(1,w)] for y in range(1,h)]
scoreMatrixArrows    = [[[] for x in range(1,w)] for y in range(1,h)]
scoreMatrixTraceback = [[[] for x in range(1,w)] for y in range(1,h)]

for x in range(2, len(dictTargetSequence["mySeq01"]) + 2):
  scoreMatrixAllValues[0][x] = dictTargetSequence["mySeq01"][x - 2]
  scoreMatrixMaxValues[0][x] = dictTargetSequence["mySeq01"][x - 2]
  scoreMatrixArrows[0][x] = dictTargetSequence["mySeq01"][x - 2]
  scoreMatrixTraceback[0][x] = dictTargetSequence["mySeq01"][x - 2]
for y in range(2, len(dictTargetSequence["mySeq02"]) + 2):
  scoreMatrixAllValues[y][0] = dictTargetSequence["mySeq02"][y - 2]
  scoreMatrixMaxValues[y][0] = dictTargetSequence["mySeq02"][y - 2]
  scoreMatrixArrows[y][0] = dictTargetSequence["mySeq02"][y - 2]
  scoreMatrixTraceback[y][0] = dictTargetSequence["mySeq02"][y - 2]

scoreMatrixAllValues[1][1] = 0
scoreMatrixMaxValues[1][1] = 0

for x in range(1, len(dictTargetSequence["mySeq01"]) + 2):
  scoreMatrixAllValues[1][x] = int((x - 1) * scoringWeight[2])
  scoreMatrixMaxValues[1][x] = int((x - 1) * scoringWeight[2])
  scoreMatrixArrows[1][x]    = [0]
for y in range(1, len(dictTargetSequence["mySeq02"]) + 2):
  scoreMatrixAllValues[y][1] = int((y - 1) * scoringWeight[2])
  scoreMatrixMaxValues[y][1] = int((y - 1) * scoringWeight[2])
  scoreMatrixArrows[y][1]    = [2]
scoreMatrixArrows[1][1]      = "-"
scoreMatrixTraceback[1][1]   = "-"



# determine score matrix
for jj in range(2,len(dictTargetSequence["mySeq02"]) + 2):    # y-axis
  for ii in range(2,len(dictTargetSequence["mySeq01"]) + 2):  # x-axis
    # if ii > 0 and jj > 0:
      T = []
      if dictTargetSequence["mySeq02"][jj - 2] == dictTargetSequence["mySeq01"][ii - 2]:
        scoreTmp = scoringWeight[0]
      elif dictTargetSequence["mySeq02"][jj - 2] != dictTargetSequence["mySeq01"][ii - 2]:
        scoreTmp = scoringWeight[1]
      T.append(scoreMatrixMaxValues[jj][ii-1]   + scoringWeight[2])  # left
      T.append(scoreMatrixMaxValues[jj-1][ii-1] + scoreTmp)          # diagonal
      T.append(scoreMatrixMaxValues[jj-1][ii]   + scoringWeight[2])  # up
      scoreMatrixAllValues[jj][ii] = T
      scoreMatrixMaxValues[jj][ii] = max(T)
      scoreMatrixArrows[jj][ii]    = [i for i, j in enumerate(T) if j == max(T)]


dashlineLength = 140
print("-" * dashlineLength)
print(" " * 40 + "Score matrix with all value-triplets".upper())
print(" " * 43 + "(0: left, 1: diagonal, 2: up)".upper())
print("-" * dashlineLength)
# print("\n")
for jj in range(len(scoreMatrixAllValues)):
  print("\t", end='')
  for ii in range(len(scoreMatrixAllValues[jj])):
    currentValue = scoreMatrixAllValues[jj][ii]
    if isinstance(currentValue, list):
      print(currentValue, "\t", end='')
    if isinstance(currentValue, str):
      if jj == 0:
        print(currentValue, (15 - len(currentValue)) * " ", end = '')
      else:
        print(currentValue, "\t", end = '')
    if isinstance(currentValue, int):
      if jj == 0:
        print(currentValue, (5-len(str(currentValue))) * " ", end = '')
      elif jj == 1 and ii > 1:
        print(currentValue, "\t\t", end='')
      else:
        print(currentValue, "\t", end = '')
  if jj < (len(scoreMatrixAllValues) - 1):
    print("\n")
print("")
print("-" * dashlineLength)
print("\n")


print("-" * dashlineLength)
print(" " * 40 + "Score matrix with highest values".upper())
print("-" * dashlineLength)
# print("\n")
for jj in range(len(scoreMatrixMaxValues)):
  print("\t\t\t\t", end = '')
  for ii in range(len(scoreMatrixMaxValues[jj])):
    currentValue = scoreMatrixMaxValues[jj][ii]
    if isinstance(currentValue, list):
      print(currentValue, "\t", end = '')
    if isinstance(currentValue, str):
      if jj == 0:
        print(currentValue, "\t", end = '')
      else:
        print(currentValue, "\t", end = '')
    if isinstance(currentValue, int):
      if jj == 0:
        print(currentValue, (5 - len(str(currentValue))) * " ", end = '')
      elif jj == 1 and ii > 1:
        print(currentValue, "\t", end = '')
      else:
        print(currentValue, "\t", end = '')
  if jj < (len(scoreMatrixMaxValues) - 1):
    print("\n")
print("")
print("-" * dashlineLength)
print("\n")


print("-" * dashlineLength)
print(" " * 40 + "Score martrix showing the Arrowhead Orientation".upper())
print(" " * 43 + "(0: left, 1: diagonal, 2: up)".upper())
print("-" * dashlineLength)
# print("\n")
for jj in range(len(scoreMatrixArrows)):
  print("\t\t\t\t", end='')
  for ii in range(len(scoreMatrixArrows[jj])):
    currentValue = scoreMatrixArrows[jj][ii]
    if isinstance(currentValue, list):
      print(currentValue, "\t", end='')
    if isinstance(currentValue, str):
      if jj == 0:
        print(currentValue, "\t", end='')
      else:
        print(currentValue, "\t", end='')
    if isinstance(currentValue, int):
      if jj == 0:
        print(currentValue, (5-len(str(currentValue))) * " ", end='')
      elif jj == 1 and ii>1:
        print(currentValue, "\t", end = '')
      else:
        print(currentValue, "\t", end = '')
  if jj < (len(scoreMatrixArrows) - 1):
    print("\n")
print("")
print("-" * dashlineLength)
print("\n")

--------------------------------------------------------------------------------------------------------------------------------------------
                                        SCORE MATRIX WITH ALL VALUE-TRIPLETS
                                           (0: LEFT, 1: DIAGONAL, 2: UP)
--------------------------------------------------------------------------------------------------------------------------------------------
	[] 	[] 	A               T               G               C               T               G               G               

	[] 	0 	-1 		-2 		-3 		-4 		-5 		-6 		-7 		

	A 	-1 	[-2, 2, -2] 	[1, -3, -3] 	[0, -4, -4] 	[-1, -5, -5] 	[-2, -6, -6] 	[-3, -7, -7] 	[-4, -8, -8] 	

	T 	-2 	[-3, -3, 1] 	[0, 4, 0] 	[3, -1, -1] 	[2, -2, -2] 	[1, 1, -3] 	[0, -4, -4] 	[-1, -5, -5] 	

	G 	-3 	[-4, -4, 0] 	[-1, -1, 3] 	[2, 6, 2] 	[5, 1, 1] 	[4, 0, 0] 	[3, 3, -1] 	[2, 2, -2] 	

	G 	-4 	[-5, -5, -1] 	[-2, -2, 2] 	[1, 5, 5] 	[4, 4, 4] 	[3, 3, 3] 	[2, 6, 2] 	[5, 5, 1] 	

	A 	-5 	[-6, -2, -2] 	[-3, -3, 1] 	[0, 0, 4] 	[3, 3, 3] 	[2, 2, 2] 	[1, 1, 5] 	[4, 4, 4] 	

	G 	-6 	[-7, -7, -3] 	[-4, -4, 0] 	[-1, 3, 3] 	[2, 2, 2] 	[1, 1, 1] 	[0, 4, 4] 	[3, 7, 3] 	

	A 	-7 	[-8, -4, -4] 	[-5, -5, -1] 	[-2, -2, 2] 	[1, 1, 1] 	[0, 0, 0] 	[-1, -1, 3] 	[2, 2, 6] 	
--------------------------------------------------------------------------------------------------------------------------------------------


--------------------------------------------------------------------------------------------------------------------------------------------
                                        SCORE MATRIX WITH HIGHEST VALUES
--------------------------------------------------------------------------------------------------------------------------------------------
				[] 	[] 	A 	T 	G 	C 	T 	G 	G 	

				[] 	0 	-1 	-2 	-3 	-4 	-5 	-6 	-7 	

				A 	-1 	2 	1 	0 	-1 	-2 	-3 	-4 	

				T 	-2 	1 	4 	3 	2 	1 	0 	-1 	

				G 	-3 	0 	3 	6 	5 	4 	3 	2 	

				G 	-4 	-1 	2 	5 	4 	3 	6 	5 	

				A 	-5 	-2 	1 	4 	3 	2 	5 	4 	

				G 	-6 	-3 	0 	3 	2 	1 	4 	7 	

				A 	-7 	-4 	-1 	2 	1 	0 	3 	6 	
--------------------------------------------------------------------------------------------------------------------------------------------


--------------------------------------------------------------------------------------------------------------------------------------------
                                        SCORE MARTRIX SHOWING THE ARROWHEAD ORIENTATION
                                           (0: LEFT, 1: DIAGONAL, 2: UP)
--------------------------------------------------------------------------------------------------------------------------------------------
				[] 	[] 	A 	T 	G 	C 	T 	G 	G 	

				[] 	- 	[0] 	[0] 	[0] 	[0] 	[0] 	[0] 	[0] 	

				A 	[2] 	[1] 	[0] 	[0] 	[0] 	[0] 	[0] 	[0] 	

				T 	[2] 	[2] 	[1] 	[0] 	[0] 	[0, 1] 	[0] 	[0] 	

				G 	[2] 	[2] 	[2] 	[1] 	[0] 	[0] 	[0, 1] 	[0, 1] 	

				G 	[2] 	[2] 	[2] 	[1, 2] 	[0, 1, 2] 	[0, 1, 2] 	[1] 	[0, 1] 	

				A 	[2] 	[1, 2] 	[2] 	[2] 	[0, 1, 2] 	[0, 1, 2] 	[2] 	[0, 1, 2] 	

				G 	[2] 	[2] 	[2] 	[1, 2] 	[0, 1, 2] 	[0, 1, 2] 	[1, 2] 	[1] 	

				A 	[2] 	[1, 2] 	[2] 	[2] 	[0, 1, 2] 	[0, 1, 2] 	[2] 	[2] 	
--------------------------------------------------------------------------------------------------------------------------------------------

# determine traceback path
xpos = w - 2
ypos = h - 2
scoreMatrixTraceback[ypos][xpos]   = 1
pathList = []

while xpos > 0 and ypos > 0:
  # print(ypos, xpos)
  # print(scoreMatrixArrows[ypos][xpos])
  if isinstance(scoreMatrixArrows[ypos][xpos], list):
    if 1 in scoreMatrixArrows[ypos][xpos]:
      scoreMatrixTraceback[ypos][xpos] = 1
      xpos = xpos - 1
      ypos = ypos - 1
      pathList.append("D")
    elif 0 in scoreMatrixArrows[ypos][xpos]:
      scoreMatrixTraceback[ypos][xpos] = 1
      xpos = xpos - 1
      ypos = ypos
      pathList.append("L")
    elif 2 in scoreMatrixArrows[ypos][xpos]:
      scoreMatrixTraceback[ypos][xpos] = 1
      xpos = xpos
      ypos = ypos - 1
      pathList.append("U")
  elif scoreMatrixArrows[ypos][xpos] == "-":
    scoreMatrixTraceback[ypos][xpos] = 1
    xpos = 0
    ypos = 0
if xpos != 0 and  ypos != 0:
  print("ERROR: starting position not reached")


print("-" * dashlineLength)
print(" " * 40 + "Score matrix with highest values")
print(" " * 43 + "(0: left, 1: diagonal, 2: up)")
print("-" * dashlineLength)
for jj in range(len(scoreMatrixTraceback)):
  print("\t\t\t\t", end='')
  for ii in range(len(scoreMatrixTraceback[jj])):
    currentValue = scoreMatrixTraceback[jj][ii]
    if isinstance(currentValue, list):
      print(currentValue, "\t", end='')
    if isinstance(currentValue, str):
      if jj == 0:
        print(currentValue, "\t", end='')
      else:
        print(currentValue, "\t", end='')
    if isinstance(currentValue, int):
      if jj == 0:
        print(currentValue, (5-len(str(currentValue))) * " ", end='')
      elif jj == 1 and ii>1:
        print(currentValue, "\t", end='')
      else:
        print(currentValue, "\t", end='')
  if jj < (len(scoreMatrixTraceback)-1):
    print("\n")
print("")
print("-" * dashlineLength)
print("\n")

--------------------------------------------------------------------------------------------------------------------------------------------
                                        Score matrix with highest values
                                           (0: left, 1: diagonal, 2: up)
--------------------------------------------------------------------------------------------------------------------------------------------
				[] 	[] 	A 	T 	G 	C 	T 	G 	G 	

				[] 	1 	[] 	[] 	[] 	[] 	[] 	[] 	[] 	

				A 	[] 	1 	[] 	[] 	[] 	[] 	[] 	[] 	

				T 	[] 	[] 	1 	[] 	[] 	[] 	[] 	[] 	

				G 	[] 	[] 	[] 	1 	1 	1 	[] 	[] 	

				G 	[] 	[] 	[] 	[] 	[] 	[] 	1 	[] 	

				A 	[] 	[] 	[] 	[] 	[] 	[] 	1 	[] 	

				G 	[] 	[] 	[] 	[] 	[] 	[] 	[] 	1 	

				A 	[] 	[] 	[] 	[] 	[] 	[] 	[] 	1 	
--------------------------------------------------------------------------------------------------------------------------------------------

# https://stackoverflow.com/questions/72751096/plot-a-grid-with-arrows-on-it

import numpy as np
import matplotlib.pyplot as plt
import matplotlib

mySeq02Reverse = list(reversed(dictTargetSequence["mySeq02"]))
scale = 0.2
fig, ax = plt.subplots(figsize=(7, 7))
ax.set_xlim(-0.5, len(["",*dictTargetSequence["mySeq01"]]) + 0.5)
ax.set_ylim(-0.5, len(["",*dictTargetSequence["mySeq02"]]) + 0.5)



ax.set_xticks(range(len(["",*dictTargetSequence["mySeq01"]])+1),["","",*dictTargetSequence["mySeq01"]], weight = 'bold')
ax.set_yticks(range(len([*mySeq02Reverse])+2),[*mySeq02Reverse,"",""], weight = 'bold')
ax.tick_params(top=True, labeltop=True, bottom=False, labelbottom=False)
ax.set_title('mySeq01', fontsize=16, weight = 'bold')
ax.set_ylabel('mySeq02', fontsize=16, weight = 'bold')

ii=0.1
jj=0.1
# plt.text(ii,jj, "x", horizontalalignment='center', verticalalignment='center', transform=ax.transAxes)

for jj in range(0,len(dictTargetSequence["mySeq02"]) + 2):    # y-axis
  plt.axhline(y = jj - 0.5, linestyle = '--', alpha = 0.5,linewidth = 0.5)
  for ii in range(0,len(dictTargetSequence["mySeq01"]) + 2):  # x-axis
        if jj == 0:
            plt.axvline(x = ii-0.5, linestyle = '--', alpha = 0.5,linewidth = 0.5)
        
        if ii > 1 and jj > 0:
            if scoreMatrixTraceback[jj][ii] == 1:
                arrowcolor="green"
            else:
                arrowcolor="blue"
            if 1 in scoreMatrixArrows[jj][ii]:
                plt.arrow(ii,len(dictTargetSequence["mySeq02"]) + 2 - jj-1, scale*(-1), scale*(1), head_width=0.1, color=arrowcolor)
            if 0 in scoreMatrixArrows[jj][ii]:
                plt.arrow(ii,len(dictTargetSequence["mySeq02"]) + 2 - jj-1, scale*(-1), scale*(0), head_width=0.1, color=arrowcolor)
            if 2 in scoreMatrixArrows[jj][ii]:
                plt.arrow(ii,len(dictTargetSequence["mySeq02"]) + 2 - jj-1, scale*(0), scale*(1), head_width=0.1, color=arrowcolor)
        if ii==1 and jj > 1:
            if scoreMatrixTraceback[jj][ii] == 1:
                arrowcolor="green"
            else:
                arrowcolor="blue"
            if 1 in scoreMatrixArrows[jj][ii]:
                plt.arrow(ii,len(dictTargetSequence["mySeq02"]) + 2 - jj - 1, scale*(-1), scale*(1), head_width=0.1, color=arrowcolor)
            if 0 in scoreMatrixArrows[jj][ii]:
                plt.arrow(ii,len(dictTargetSequence["mySeq02"]) + 2 - jj - 1, scale*(-1), scale*(0), head_width=0.1, color=arrowcolor)
            if 2 in scoreMatrixArrows[jj][ii]:
                plt.arrow(ii,len(dictTargetSequence["mySeq02"]) + 2 - jj - 1, scale*(0), scale*(1), head_width=0.1, color=arrowcolor)
        if jj==1 and ii==1:
            # plt.Circle((ii,len(dictTargetSequence["mySeq02"]) + 2 - jj-1), 0.2, color='r')
            circle1 = plt.Circle((ii,len(dictTargetSequence["mySeq02"]) + 2 - jj-1), 0.05, color='g')
            ax.add_patch(circle1)
        if jj < 1 or ii < 1:
            circle1 = plt.Circle((ii,len(dictTargetSequence["mySeq02"]) + 2 - jj-1), 0.05, color='r')
            ax.add_patch(circle1)

if saveGallery == True:
    plt.savefig('./../GALLERY/Sequence Analysis/03_02_Needleman_Wunsch_Algorithm.png', bbox_inches='tight')
plt.show()


dashlineLength = 40
indentSpace = 3
charSpacing = 3
print("-" * dashlineLength)
print(" " * 7 + "PATH-RESULT DERIVED FROM PLOT")
print(" " * 7 + "(L: left, D: diagonal, U: up)")
print("-" * dashlineLength)
print(" "*indentSpace,end='')
for ii in range(len(pathList)):
    print(pathList[ii],end='')
    print(" " * charSpacing, end = '')
print("")
print("-" * dashlineLength)
pathListReverse = list(reversed(pathList))

----------------------------------------
       PATH-RESULT DERIVED FROM PLOT
       (L: left, D: diagonal, U: up)
----------------------------------------
   U   D   U   D   L   L   D   D   D   
----------------------------------------

alignedTargetSequence = {}
alignedTargetSequence["mySeq01"] = []
alignedTargetSequence["mySeq02"] = []
mySeq01Pos = 0
mySeq02Pos = 0

for ii in range(len(pathList)):
    if pathListReverse[ii] == "D":
        alignedTargetSequence["mySeq01"].append(dictTargetSequence["mySeq01"][mySeq01Pos])
        alignedTargetSequence["mySeq02"].append(dictTargetSequence["mySeq02"][mySeq02Pos])
        mySeq01Pos += 1
        mySeq02Pos += 1
    elif pathListReverse[ii] == "L":
        alignedTargetSequence["mySeq01"].append(dictTargetSequence["mySeq01"][mySeq01Pos])
        alignedTargetSequence["mySeq02"].append("-")
        mySeq01Pos += 1
        # mySeq02Pos += 1
    elif pathListReverse[ii] == "U":
        alignedTargetSequence["mySeq01"].append("-")
        alignedTargetSequence["mySeq02"].append(dictTargetSequence["mySeq02"][mySeq02Pos])
        # mySeq01Pos += 1
        mySeq02Pos += 1

dashlineLength = 70
indentSpace = 20
charSpacing = 3
print("-" * dashlineLength)
print(" " * 7 + "ALIGNMENT-RESULT BASED ON NEEDLEMAN-WUNSCH-ALGORITHM")
print("-" * dashlineLength)
print("   mySeq01: ",end='')
print(" "*(indentSpace-len("   mySeq01: ")),end='')
for ii in range(len(alignedTargetSequence["mySeq01"])):
    print(alignedTargetSequence["mySeq01"][ii],end='')
    print(" " * charSpacing, end = '')
print("")
print(" "*indentSpace,end='')
for ii in range(len(alignedTargetSequence["mySeq02"])):
    if alignedTargetSequence["mySeq01"][ii] == alignedTargetSequence["mySeq02"][ii]:
        print("|",end='')
    else:
        print(" ",end='')
    print(" " * charSpacing, end = '')
print("")
print("   mySeq02: ",end='')
print(" "*(indentSpace-len("   mySeq02: ")),end='')

for ii in range(len(alignedTargetSequence["mySeq02"])):
    print(alignedTargetSequence["mySeq02"][ii],end='')
    print(" " * charSpacing, end = '')
print("")
print("-" * dashlineLength)
print("\n")

----------------------------------------------------------------------
       ALIGNMENT-RESULT BASED ON NEEDLEMAN-WUNSCH-ALGORITHM
----------------------------------------------------------------------
   mySeq01:         A   T   G   C   T   G   -   G   -   
                    |   |   |           |       |       
   mySeq02:         A   T   G   -   -   G   A   G   A   
----------------------------------------------------------------------

%%capture
# https://www.blog.pythonlibrary.org/2021/02/02/drawing-text-on-images-with-pillow-and-python/

import ModulesOwn.A_Groundwork as Groundwork
import ModulesOwn.C_Mutations as Mutations
from IPython.display import clear_output
import time
import PIL
from ModulesExternal.TerminalColors import TerminalColors
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image, ImageDraw, ImageFont
# matplotlib.use('Agg')

# https://thesmithfam.org/blog/2012/10/25/temporarily-suppress-console-output-in-python/
from contextlib import contextmanager
import sys, os


colorRed          = "\033[31m"
colorGreen        = "\033[32m"
colorDefault      = "\033[39m"
cgreen = [0,255,0]
cred = [255,0,0]
cdefaucgreen= [0,0,0]

alignedTargetSequence={}
scoreAllWindow=[]
dictTargetSequence             = {}
dictTargetSequence["mySeq01"]  = "Gtg Atg gTt gcA gga Ttg ctg gaA tgc tgg".replace(" ","").upper() # x
dictTargetSequence["mySeq02"]  = "atg GAg A".replace(" ","").upper() # y
scoringWeight   = [2, -5, -1]
mySeq01StrColor=""
mySeq02StrColor=""

indentCount = 20

for kk in range(len(dictTargetSequence["mySeq01"])-len(dictTargetSequence["mySeq02"])+1):
    
    scoreAll=0
    scoreMatrixAllValues, scoreMatrixMaxValues, scoreMatrixArrows      = Mutations.needlemanWunschAlgorithm(dictTargetSequence["mySeq01"][kk:kk+len(dictTargetSequence["mySeq02"])], dictTargetSequence["mySeq02"], scoringWeight)
    scoreMatrixTraceback, pathList                                     = Mutations.needlemanWunschTracebackPathList(dictTargetSequence["mySeq01"][kk:kk+len(dictTargetSequence["mySeq02"])], dictTargetSequence["mySeq02"], scoreMatrixArrows)
    alignedTargetSequence["mySeq01"], alignedTargetSequence["mySeq02"] = Mutations.needlemanWunschComputeResult(dictTargetSequence["mySeq01"][kk:kk+len(dictTargetSequence["mySeq02"])], dictTargetSequence["mySeq02"], pathList)
    
    for jj in range(2,len(scoreMatrixTraceback)):
        for ii in range(2,len(scoreMatrixTraceback[jj])):
            if scoreMatrixTraceback[jj][ii] == 1:
                scoreAll = scoreAll + scoreMatrixTraceback[jj][ii] * scoreMatrixMaxValues[jj][ii]
    
    scoreAllWindow.append(scoreAll)
    
    mySeq01Str = ''.join(alignedTargetSequence["mySeq01"])
    mySeq02Str = ''.join(alignedTargetSequence["mySeq02"])

    mySeq01StrBeginning = dictTargetSequence["mySeq01"][:kk].lower()
    mySeq01StrEnd       = dictTargetSequence["mySeq01"][kk+len(dictTargetSequence["mySeq02"]):].lower()
    
    xpos = 30
    ypos = 30
    image = Mutations.text_color(xpos, ypos, mySeq01StrBeginning, mySeq01Str, mySeq01StrEnd, mySeq02Str, scoreAllWindow, scoreMatrixTraceback, scoreMatrixArrows, dictTargetSequence["mySeq01"], dictTargetSequence["mySeq02"],kk)
    image.save("./../NeedlemanWunschSeq/"+str(f"{kk:02d}")+".png")
    
    time.sleep(0.1)

import numpy as np
from PIL import Image
from PIL import Image
import math
import imageio
import random
import os

frames_All=[]
for kk in range(23):
    
    img = Image.open("./../NeedlemanWunschSeq/"+str(f"{kk:02d}")+".png")
    img=img.resize((600,300))
    # print("hello")
    frames_All.append(img)

frames_All[0].save('./gifs/NeedlemanWunschSeq.gif',
            save_all = True, append_images = frames_All[:], 
            optimize = True, loop = 0, duration = 420, format='GIF', disposal=2)

35204

# 2BeWorkedOn

4. K-mer-Analysis of a Sequence ¶

This section is about the analysis of a target sequence in view of it's k-mer subcomponents. It is a direct preparation to put together small DNA-fragments with overlap in order to get the full picture, as it is required for NGS-data exploration of very large DNA sequences.

4.1 Analysis of tRNA Using K-Fragmentation ¶

# data sources
# RNA-Sequence: S. cerevisiae initiator methionyl tRNAs
# Kolitz SE, Lorsch JR. Eukaryotic initiator tRNA: finely tuned and ready for action.
# FEBS Lett. 2010 Jan 21;584(2):396-404. doi: 10.1016/j.febslet.2009.11.047.
# PMID: 19925799; PMCID: PMC2795131.
# - https://pubmed.ncbi.nlm.nih.gov/19925799/
# 
# Tertiary structure scheme of t-RNA published by user Yikrazuul under
# the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license
# - https://en.wikipedia.org/wiki/Transfer_RNA#/media/File:TRNA-Phe_yeast_1ehz.png
# adapted in terms of labeling
# 
# https://www.ncbi.nlm.nih.gov/nuccore/S70505.1
# AGCAGAGUGGCGCAGCGGAAGCGCGCUGGGCCCAUAACCCAGAGGDCGAZGGAUCGAAACCAUCCUCUFCUACCA
# dictTargetSequence = {}
# - https://de.wikipedia.org/wiki/TRNA


# include modules
from Bio.Seq import Seq
from IPython.display import Image as ImageIPython
import json
from ModulesExternal.TerminalColors import TerminalColors
import ModulesOwn.A_Groundwork as Groundwork
import ModulesOwn.D_KmerAnalysis as KmerAnalysis
import os
from sty import fg, bg, ef, rs


# initialize and define variables and parameter
# Acceptor stem: 1 and 6
# D arm        : 2
# Anticodon arm: 3
# Variable loop: 4
# T arm        : 5
# CCA tail     : 7
#           1        2                3                  4     5                 6       7
Seq_tmp0 = "AGCGCCGU GGCGCAGDGGAAGCGC GCAGGGCUCAUAACCCUG AUGDC CUCGGAUCGAAACCGAG CGGCGCU ACCA"
Seq_tmp  = Seq_tmp0.replace(" ","")
dictSeqGroup = {"targetSeq": {
                    "mySequence": Seq_tmp,
                }
            }
myColors                   = [TerminalColors.Green, TerminalColors.Blue, TerminalColors.Yellow]
myDefaultColor             = TerminalColors.Default
myColorWarning             = TerminalColors.Red


# load data
if os.path.exists("./Part01Data/04_01_DictSeqGroup.json"):
  os.remove("./Part01Data/04_01_DictSeqGroup.json")


# compute and visualize results
sequence_colored = fg(128,0,200) + "AGCGCCGU" + fg(128,0,0) + "GGCGCAGDGGAAGCGC" + fg(0,94,254) + "GCAGGGCU" + fg(100,100,100) 
sequence_colored = sequence_colored + "CAU" + fg(0,94,254) + "AACCCUG" + fg(211,85,0) + "AUGDC" + fg(32,118,0) + "CUCGGAUCGAAACCGAG" + fg(128,0,200)
sequence_colored = sequence_colored + "CGGCGCU" + f"{TerminalColors.Yellow}" + "ACCA" + f"{TerminalColors.Default}"
print("-" * 99, end='\n')
print("\t\t\tTertiary structure scheme of a tRNA molecule", end='\n')
print("-" * 99, end='\n')
print("Acceptor stem: \t " + fg(128,0,200) + "purple \n" + f"{TerminalColors.Default}" +
      "D arm: \t\t " + fg(128,0,0) + "red \n" + f"{TerminalColors.Default}" +
      "Anticodon arm: \t " + fg(0,94,254) + "blue \n" + f"{TerminalColors.Default}" +
      "Anticodon: \t " + fg(100,100,100) + "black \n" + f"{TerminalColors.Default}" +
      "CCA tail: \t " + f"{TerminalColors.Yellow}" + "yellow \n" + f"{TerminalColors.Default}" +
      "Variable loop: \t " + fg(211,85,0) + "orange \n" + f"{TerminalColors.Default}" +
      "T arm: \t\t " + fg(32,118,0) + "green" + f"{TerminalColors.Default}")
print("")
print("-" * 99, end='\n')
print("Target Sequence: \t" + sequence_colored)
print("RNA-Sequence: S. cerevisiae initiator methionyl tRNAs")
print("Source: Kolitz SE, Lorsch JR. Eukaryotic initiator tRNA: finely tuned and ready for action.")
print("FEBS Lett. 2010 Jan 21;584(2):396-404. doi: 10.1016/j.febslet.2009.11.047.")
print("PMID: 19925799; PMCID: PMC2795131.")
print("https://pubmed.ncbi.nlm.nih.gov/19925799/")
print("-" * 99, end='\n')
display(ImageIPython(filename = './Figures_scientific/TRNA-Phe_yeast.png', width=300))
print("Tertiary structure scheme of t-RNA published by user Yikrazuul under")
print("the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.")
print("https://en.wikipedia.org/wiki/Transfer_RNA#/media/File:TRNA-Phe_yeast_1ehz.png")

print("")
print("-" * 99, end='\n')
print("\t\t\t\tPalindroms as identified by algorithm")
print("-" * 99, end='\n')
print(f"{myDefaultColor}" + "Forward Sequence: \t " + f"{myColors[0]}" + "green" + f"{myDefaultColor}")
print("Reverse Complement: \t " + f"{myColors[1]}" + "blue" + f"{myDefaultColor}")
print("Overlays: \t\t " + f"{myColors[2]}" + "yellow" + f"{myDefaultColor}")
print("")

print("KLENGTH = 5")
print("-" * 99, end='\n')
klength = 5
dictSeqGroup = KmerAnalysis.FindPalindroms(dictSeqGroup, klength)
print("Target Sequence: \t" + sequence_colored)
print("-" * 99, end='\n')
with open("./ModulesOwn/D_KmerAnalysis/04_01_DictSeqGroup.json", "w") as outfile:
    json.dump(dictSeqGroup, outfile, indent = 4, sort_keys = True)
file1 = open('./ModulesOwn/D_KmerAnalysis/04_01_DictSeqGroup.json', 'r')
Lines = file1.readlines()
count = 0
for line in Lines:
    count += 1
file1.close()
seqstr=Groundwork.List2String(dictSeqGroup["targetSeq"]["mySequence"])
stringlist = KmerAnalysis.PrintPalindroms(dictSeqGroup, klength, myColors, myDefaultColor)

for ii in range(len(stringlist)):
    print("Palindroms "+"{:02d}".format(ii+1) + ": \t\t" +stringlist[ii])

print("")
print("KLENGTH = 7")
klength = 7
dictSeqGroup = KmerAnalysis.FindPalindroms(dictSeqGroup, klength)
print("-" * 99, end='\n')
print("Target Sequence: \t" + sequence_colored)
print("-" * 99, end='\n')
with open("./ModulesOwn/D_KmerAnalysis/04_01_DictSeqGroup.json", "w") as outfile:
    json.dump(dictSeqGroup, outfile, indent = 4, sort_keys = True)
file1 = open('./ModulesOwn/D_KmerAnalysis/04_01_DictSeqGroup.json', 'r')
Lines = file1.readlines()
count = 0
for line in Lines:
    count += 1
file1.close()
seqstr = Groundwork.List2String(dictSeqGroup["targetSeq"]["mySequence"])
stringlist = KmerAnalysis.PrintPalindroms(dictSeqGroup, klength, myColors, myDefaultColor)

for ii in range(len(stringlist)):
    print("Palindroms "+"{:02d}".format(ii+1) + ": \t\t" +stringlist[ii])

---------------------------------------------------------------------------------------------------
			Tertiary structure scheme of a tRNA molecule
---------------------------------------------------------------------------------------------------
Acceptor stem: 	 purple 
D arm: 		 red 
Anticodon arm: 	 blue 
Anticodon: 	 black 
CCA tail: 	 yellow 
Variable loop: 	 orange 
T arm: 		 green

---------------------------------------------------------------------------------------------------
Target Sequence: 	AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
RNA-Sequence: S. cerevisiae initiator methionyl tRNAs
Source: Kolitz SE, Lorsch JR. Eukaryotic initiator tRNA: finely tuned and ready for action.
FEBS Lett. 2010 Jan 21;584(2):396-404. doi: 10.1016/j.febslet.2009.11.047.
PMID: 19925799; PMCID: PMC2795131.
https://pubmed.ncbi.nlm.nih.gov/19925799/
---------------------------------------------------------------------------------------------------

Tertiary structure scheme of t-RNA published by user Yikrazuul under
the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.
https://en.wikipedia.org/wiki/Transfer_RNA#/media/File:TRNA-Phe_yeast_1ehz.png

---------------------------------------------------------------------------------------------------
				Palindroms as identified by algorithm
---------------------------------------------------------------------------------------------------
Forward Sequence: 	 green
Reverse Complement: 	 blue
Overlays: 		 yellow

KLENGTH = 5
---------------------------------------------------------------------------------------------------
Target Sequence: 	AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
---------------------------------------------------------------------------------------------------
Palindroms 01: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 02: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 03: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 04: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 05: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 06: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 07: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA

KLENGTH = 7
---------------------------------------------------------------------------------------------------
Target Sequence: 	AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
---------------------------------------------------------------------------------------------------
Palindroms 01: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 02: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 03: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 04: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 05: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 06: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA
Palindroms 07: 		AGCGCCGUGGCGCAGDGGAAGCGCGCAGGGCUCAUAACCCUGAUGDCCUCGGAUCGAAACCGAGCGGCGCUACCA

5. Data Visualization ¶

5.1 Phred-Quality-Score-Depiciton for short FastQ-Sequences ¶

Goal: To visualize the quality scores of a short FastQ sequence.

# sources
# - https://www.biostars.org/p/317524/
# - https://gatk.broadinstitute.org/hc/en-us/articles/360035531872-Phred-scaled-quality-scores
# - https://www.geeksforgeeks.org/plot-a-vertical-line-in-matplotlib/
# - https://en.wikipedia.org/wiki/FASTQ_format
# - https://stackoverflow.com/questions/21465988/python-equivalent-to-hold-on-in-matlab
# - https://stackoverflow.com/questions/63470319/pyplot-add-single-text-to-xaxis-like-a-tick


# data sources
# Target Sequence taken from the example of the wikipedia article:
# - https://en.wikipedia.org/wiki/FASTQ_format


# include modules
import matplotlib.pyplot as plt
from ModulesExternal.TerminalColors import TerminalColors
import ModulesOwn.A_Groundwork as Groundwork
import numpy as np


# initialize and define variables and parameter
myDefaultColor                = TerminalColors.Default
dictTargetSequence               = {}
dictTargetSequence["mySequence"] = "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAAT"

dictTargetSequence["Quality"]    = "!''*((((***+))%%%++)(%%%%).1***-+*''))"
labelNum, labelX = Groundwork.SeqNumberGen(dictTargetSequence["mySequence"], 10)


# compute and visualize results
def phredQualityTranform(mySeqQuality):
    plotvalues = []
    for ii in range(len(mySeqQuality)):
        plotvalues.append(ord(mySeqQuality[ii]))
    return plotvalues

plotvalues = phredQualityTranform(dictTargetSequence["Quality"])
x = np.arange(0, len(dictTargetSequence["Quality"]))
mcolor=[]
for ii in range(len(x)-1):
    if dictTargetSequence["mySequence"][ii] == "G":
        mcolor.append(TerminalColors.Green)
    elif dictTargetSequence["mySequence"][ii] == "C":
        mcolor.append(TerminalColors.Red)
    else:
        mcolor.append(TerminalColors.Blue)

plt.figure(figsize=(12,5))
ax = plt.axes()
plt.axis([0, len(plotvalues), min(plotvalues), max(plotvalues)])
for ii in range(len(x) - 1):
    if plotvalues[ii] < plotvalues[ii + 1]:
        mcolor="green"
    elif plotvalues[ii] > plotvalues[ii + 1]:
        mcolor="red"
    else:
        mcolor="blue"
    plt.plot(x[ii:ii + 2],plotvalues[ii:ii + 2], color = mcolor, linestyle = 'dashed', linewidth = 3) #, marker='o', markerfacecolor = mcolor, markersize = 10
    plt.text(ii - 0.25, min(plotvalues) - 1, dictTargetSequence["mySequence"][ii], color = "black", weight = "bold",fontsize = 12)
    plt.axvline(x = ii + 1, color = "black", linestyle='dashed', linewidth = 0.2, label = 'axvline - full height')
plt.text(ii + 0.75, min(plotvalues) - 1, dictTargetSequence["mySequence"][ii + 1], color = "black", weight = "bold",fontsize = 12)
plt.tick_params(axis = 'x', which = 'both', bottom = False, top = False, labelbottom = False)
plt.title("FastQ", weight = "bold", fontsize = 24)
plt.ylabel('Phred Score',weight="bold",fontsize = 16) 
plt.xlim(0, max(x))
plt.xticks(x, dictTargetSequence["mySequence"])
plt.yticks(np.arange(min(plotvalues), max(plotvalues) + 1, 1.0), weight = "bold", fontsize = 12)
plt.ylim(min(plotvalues), max(plotvalues))
if saveGallery == True:
    plt.savefig('./../GALLERY/Sequence Analysis/05_01_Phred_Quality_Score.png', bbox_inches='tight')
plt.show()

6. Appendix ¶

6.1 .html-Export of Code ¶

# export code to html
import os
os.system("jupyter nbconvert SequenceAnalysis.ipynb --to html")

fin = open("SequenceAnalysis.html", "rt", encoding="utf8")
fout = open("SequenceAnalysisPart01_mod.html", "wt", encoding="utf8")
for line in fin:
    if line.find("<ol>") > -1:
        line = line.replace("<ol>", "<ol type=\"1\">")
    elif line.find("<li>") > -1:
        line = line.replace("<li>", "<li type=\"1\">")
    fout.write(line)
fin.close()
fout.close()


import os
if os.path.isfile("SequenceAnalysis.html"):
    os.remove("SequenceAnalysis.html")
    os.rename("SequenceAnalysisPart01_mod.html", "SequenceAnalysis.html")

Sequence Analysis (in Progress)¶

Table of Contents¶

0. Reference Table¶

1. First Steps in Sequence Analysis (Groundwork)¶

1.1 Color Strings in Terminal Output¶

1.2 Computation of Complement, Reverse and Reverse Complement of ssDNA¶

1.3 Identification Sequences of Interest within a Target Sequence¶

1.4 Identification of Multiple Sequences of Interest within a Target Sequence¶

1.5 Data Grouping and Storage¶

1.5.1 Collection / Namedtuples¶

1.5.2 Store in JSON-Format¶

2. Set Up a simple Graphical User Interface (GUI)¶

3. Mutations, Insertions and Deletions¶

3.1 Point Mutations¶

3.1.1 Hamming Distance¶

3.1.2 Classify Mutations¶

3.1.3 Reading Frames and Gene Maps (SARS-CoV-2)¶

3.2 Needleman Wunsch Algorithm¶

4. K-mer-Analysis of a Sequence¶

4.1 Analysis of tRNA Using K-Fragmentation¶

5. Data Visualization¶

5.1 Phred-Quality-Score-Depiciton for short FastQ-Sequences¶

6. Appendix¶

6.1 .html-Export of Code¶

0. Reference Table ¶

1.1 Color Strings in Terminal Output ¶

1.2 Computation of Complement, Reverse and Reverse Complement of ssDNA ¶

1.3 Identification Sequences of Interest within a Target Sequence ¶

1.4 Identification of Multiple Sequences of Interest within a Target Sequence ¶

1.5 Data Grouping and Storage ¶

1.5.1 Collection / Namedtuples ¶

1.5.2 Store in JSON-Format ¶

3. Mutations, Insertions and Deletions ¶

3.1 Point Mutations ¶

3.1.1 Hamming Distance ¶

3.1.2 Classify Mutations ¶

3.2 Needleman Wunsch Algorithm ¶

4. K-mer-Analysis of a Sequence ¶

4.1 Analysis of tRNA Using K-Fragmentation ¶

5. Data Visualization ¶

5.1 Phred-Quality-Score-Depiciton for short FastQ-Sequences ¶

6. Appendix ¶

6.1 .html-Export of Code ¶