CG_ju_001_StringBasics



In [1]:
"A"


Out[1]:
"A"

In [2]:
"ACGT"


Out[2]:
"ACGT"

In [3]:
st = "ACGT"


Out[3]:
"ACGT"

In [4]:
length(st) # getting the length of a string


Out[4]:
4

In [5]:
"" # empty string (epsilon)


Out[5]:
""

In [6]:
length("")


Out[6]:
0

In [7]:
"ACGT"[rand(1:4)] # generating a random nucleotide


Out[7]:
'G': ASCII/Unicode U+0047 (category Lu: Letter, uppercase)

In [8]:
"ACGT"[rand(1:4)] # repeated invocations might yield different nucleotides


Out[8]:
'C': ASCII/Unicode U+0043 (category Lu: Letter, uppercase)

In [9]:
"ACGT"[rand(1:4)] # repeated invocations might yield different nucleotides


Out[9]:
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

In [10]:
"ACGT"[rand(1:4)] # repeated invocations might yield different nucleotides


Out[10]:
'G': ASCII/Unicode U+0047 (category Lu: Letter, uppercase)

In [11]:
"ACGT"[rand(1:4)] # repeated invocations might yield different nucleotides


Out[11]:
'C': ASCII/Unicode U+0043 (category Lu: Letter, uppercase)

In [12]:
# now I'll make a random nucleotide string by concatenating random nucleotides
st = join(["ACGT"[rand(1:4)] for _ in 1:40])


Out[12]:
"ATCGGATTCGTAAGTTAGTCGGCAGGTACGCCTTTCGCTG"
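
The same kind of random string can probably be built more directly. The sketch below assumes Julia 1.x and the `Random` standard library; the names `nt` and `dna` are illustrative and do not appear elsewhere in this notebook. No output is shown because the result differs on every run.

```julia
using Random  # standard library; provides randstring

# Draw a single nucleotide directly from the alphabet string.
nt = rand("ACGT")

# Draw 40 nucleotides at once; 40 mirrors the length used in the cell above.
dna = randstring("ACGT", 40)
```

Both calls sample uniformly from the characters of `"ACGT"`, which matches what indexing with `rand(1:4)` does.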

In [13]:
st[2:4] # substring from position 2 through position 4 (inclusive)


Out[13]:
"TCG"

In Julia, string indices start at 1, unlike in C, Java, Python, and Go, but like in R.

Also, ranges like 2:4 are inclusive at both ends: 2:4 asks for the part of the string from position 2 through position 4, which is 3 characters.
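
A minimal sketch of these two rules, using a short fixed string so the results are predictable. The names `s`, `sub`, `n`, and `zero_index_fails` are illustrative only.

```julia
s = "ACGTAC"             # fixed example string, not the random one above

first_char = s[1]        # 'A': the first character sits at index 1
sub = s[2:4]             # "CGT": positions 2, 3, and 4
n = length(2:4)          # 3: a range a:b covers b - a + 1 positions
last_char = s[end]       # 'C': end is the last valid index

# Index 0 is out of bounds and raises a BoundsError.
zero_index_fails = try
    s[0]
    false
catch err
    err isa BoundsError
end
```

Someone coming from Python would write `s[1:4]` for the same three characters, so the off-by-one risk runs in both directions when porting code.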


In [14]:
st[1:3] # prefix of length 3


Out[14]:
"ATC"

In [15]:
st[length(st)-2:length(st)] # suffix of length 3


Out[15]:
"CTG"

In [16]:
st[end-2:end] # another way of getting the suffix of length 3


Out[16]:
"CTG"

In [17]:
st1, st2 = "CAT", "ATAC"


Out[17]:
("CAT", "ATAC")

In [18]:
st1


Out[18]:
"CAT"

In [19]:
st2


Out[19]:
"ATAC"

In [20]:
st1 * st2 # concatenation of 2 strings


Out[20]:
"CATATAC"
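
Julia uses `*` rather than `+` for string concatenation. As a rough sketch, a few equivalent spellings are shown below; the variable names are illustrative, and `a` and `b` simply reuse the values of `st1` and `st2`.

```julia
a, b = "CAT", "ATAC"             # same values as st1 and st2 above

joined_star   = a * b            # "CATATAC"
joined_string = string(a, b)     # "CATATAC"
joined_interp = "$a$b"           # "CATATAC", via string interpolation
repeated      = a ^ 2            # "CATCAT": ^ repeats a string
```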