* removed dead code in io.go.
* merged Sequence structure into AnnotatedSequence and renamed AnnotatedSequence Sequence.
* added new mvp fasta io.
* rewrote FASTA IO to be sturdier.
* renamed feature.ParentAnnotatedSequence to feature.ParentSequence.
* removed whitespace constants and replaced with whitespace function.
* made .AddFeature() method public.
* added comment to .AddFeature() method.
* fixed comments in FASTA IO.
docs/library-hashing.md (4 additions & 4 deletions)
@@ -12,7 +12,7 @@ Hashes make incredibly powerful unique identifiers and with a wide array of hash
The golang team is currently figuring out the best way to implement blake3 into the standard library but in the meantime `poly` provides this special function and method wrapper to hash sequences using blake3. This will eventually be deprecated in favor of only using the `GenericSequenceHash()` function and `.Hash()` method wrapper.
@@ -33,7 +33,7 @@ Again, this will be deprecated in favor of using generic hashing with blake3 in
`poly` also provides a generic hashing function and method wrapper for hashing sequences with arbitrary hashing functions that use the golang standard library's hash function interface. Check out this switch statement in the [hash command source code](https://github.com/TimothyStiles/poly/blob/f51ec1c08820394d7cab89a5a4af92d9b803f0a4/commands.go#L261) to see all that `poly` provides in the command line utility alone.
docs/library-io.md (26 additions & 26 deletions)
@@ -3,37 +3,37 @@ id: library-io
title: Sequence Input Output
---

-At the center of `poly`'s annotated sequence support is the `AnnotatedSequence` struct. Structs are kind of Go's answer to objects in other languages. They provide a way of making custom datatypes and methods for developers to use. More on that [here](https://tour.golang.org/moretypes/2), [here](https://gobyexample.com/methods), and [here](https://www.golang-book.com/books/intro/9).
+At the center of `poly`'s annotated sequence support is the `Sequence` struct. Structs are kind of Go's answer to objects in other languages. They provide a way of making custom datatypes and methods for developers to use. More on that [here](https://tour.golang.org/moretypes/2), [here](https://gobyexample.com/methods), and [here](https://www.golang-book.com/books/intro/9).

-Anywho. `poly` centers around reading in various annotated sequence formats like genbank, or gff and parsing them into an `AnnotatedSequence` to do stuff with them. Whether that's being written out to JSON or being used by `poly` itself. Here are some examples.
+Anywho. `poly` centers around reading in various annotated sequence formats like genbank or gff and parsing them into a `Sequence` to do stuff with them, whether that's being written out to JSON or being used by `poly` itself. Here are some examples.

## Readers

-For all supported file formats `poly` supports a reader. A reader is a function literally named `ReadJSON(path)`, `ReadGbk(path)`, or `ReadGff(path)` that takes one argument - a filepath where your file is located, and returns an `AnnotatedSequence` struct.
+For all supported file formats `poly` supports a reader. A reader is a function literally named `ReadJSON(path)`, `ReadGbk(path)`, or `ReadGff(path)` that takes one argument, a filepath where your file is located, and returns a `Sequence` struct.
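To make the reader idea concrete, here is a minimal sketch of the pattern: read the file at a path, hand its contents to a string parser, and return a struct. Everything in it is hypothetical. `record`, `parseFasta`, and `readFasta` are stand-ins written for this example rather than `poly`'s types or functions, the parser only handles a single-record FASTA file, and unlike the readers named above it also returns an error.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// record is a hypothetical stand-in for a parsed sequence struct.
type record struct {
	Description string
	Sequence    string
}

// parseFasta parses a single-record FASTA string without touching the file system.
func parseFasta(contents string) record {
	var rec record
	var seq strings.Builder
	for _, line := range strings.Split(contents, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "":
			continue // skip blank lines
		case strings.HasPrefix(line, ">"):
			rec.Description = strings.TrimPrefix(line, ">")
		default:
			seq.WriteString(line)
		}
	}
	rec.Sequence = seq.String()
	return rec
}

// readFasta does the system IO and delegates the parsing to parseFasta.
func readFasta(path string) (record, error) {
	contents, err := os.ReadFile(path)
	if err != nil {
		return record{}, err
	}
	return parseFasta(string(contents)), nil
}

func main() {
	// Write a tiny FASTA file to a temporary location so the example is self-contained.
	file, err := os.CreateTemp("", "example-*.fasta")
	if err != nil {
		panic(err)
	}
	defer os.Remove(file.Name())
	file.WriteString(">toy sequence\nATGC\nGGTA\n")
	file.Close()

	rec, err := readFasta(file.Name())
	if err != nil {
		panic(err)
	}
	fmt.Println(rec.Description, rec.Sequence) // prints "toy sequence ATGCGGTA"
}
```

Keeping the parsing step free of file IO means the same parser can be reused on strings that come from somewhere other than disk.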
-These AnnotatedSequence structs contain all sorts of goodies but can be broken down into three sub main structs. `AnnotatedSequence.Meta`, `AnnotatedSequence.Features`, and `AnnotatedSequence.Sequence`.
+These Sequence structs contain all sorts of goodies but can be broken down into three main sub structs: `Sequence.Meta`, `Sequence.Features`, and `Sequence.Sequence`.

> Before we move on with the rest of IO I think it'd be good to go over these sub structs in the next section but of course you can skip to [writers](#writers) if you'd like.

-## AnnotatedSequence structs
+## Sequence structs

-Like I just said these AnnotatedSequence structs contain all sorts of goodies but can be broken down into three main sub structs:
+Like I just said, these Sequence structs contain all sorts of goodies but can be broken down into three main sub structs:
-Here's how the AnnotatedSequence struct is actually implemented as of [commit c4fc7e](https://github.com/TimothyStiles/poly/blob/c4fc7e6f6cdbd9e5ed2d8ffdbeb206d1d5a8d720/io.go#L108).
+Here's how the Sequence struct is actually implemented as of [commit c4fc7e](https://github.com/TimothyStiles/poly/blob/c4fc7e6f6cdbd9e5ed2d8ffdbeb206d1d5a8d720/io.go#L108).

```go
-//AnnotatedSequence holds all sequence information in a single struct.
-type AnnotatedSequence struct {
+//Sequence holds all sequence information in a single struct.
+type Sequence struct {
	Meta     Meta
	Features []Feature
	Sequence Sequence
@@ -42,11 +42,11 @@ Here's how the AnnotatedSequence struct is actually implemented as of [commit c4
> You can check out the original implementation [here](https://github.com/TimothyStiles/poly/blob/c4fc7e6f6cdbd9e5ed2d8ffdbeb206d1d5a8d720/io.go#L108) but I warn you that this is a snapshot and likely has been updated since last writing.

-### AnnotatedSequence.Meta
+### Sequence.Meta

The Meta substruct contains various meta information about whatever record was parsed. Things like name, version, genbank references, etc.

-So if I wanted to get something like the Genbank Accession number for a AnnotatedSequence I'd get it like this:
+So if I wanted to get something like the Genbank Accession number for a Sequence I'd get it like this:

```go
bsubAnnotatedSequence := ReadGbk("data/bsub.gbk")
@@ -68,7 +68,7 @@ Same goes for a lot of other stuff:
Here's how the Meta struct is actually implemented in [commit c4fc7e](https://github.com/TimothyStiles/poly/blob/c4fc7e6f6cdbd9e5ed2d8ffdbeb206d1d5a8d720/io.go#L34) which is the latest as of writing.

```go
-// Meta Holds all the meta information of an AnnotatedSequence struct.
+// Meta Holds all the meta information of a Sequence struct.
type Meta struct {
	Name       string
	GffVersion string
@@ -93,9 +93,9 @@ Here's how the Meta struct is actually implemented in [commit c4fc7e](https://gi
You'll notice that there are actually three more substructs towards the bottom. They hold extra genbank specific information that's handy to have grouped together. More about how genbank files are structured can be found [here](https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html).

-### AnnotatedSequence.Features
+### Sequence.Features

-The `Features` substruct is actually a slice (golang term for what is essentially a dynamic length list) of `Feature` structs that can be iterated through. For example if you wanted to iterate through an `AnnotatedSequence`'s features and get their name (i.e GFP) and type (i.e CDS) you'd do it like this.
+The `Features` substruct is actually a slice (golang term for what is essentially a dynamic length list) of `Feature` structs that can be iterated through. For example, if you wanted to iterate through a `Sequence`'s features and get their name (e.g. GFP) and type (e.g. CDS) you'd do it like this.

```go
bsubAnnotatedSequence := ReadGbk("data/bsub.gbk")
@@ -106,12 +106,12 @@ The `Features` substruct is actually a slice (golang term for what is essentiall
The `Feature` struct has about 10 or so fields which you can learn more about from this section in [commit c4fc7e](https://github.com/TimothyStiles/poly/blob/c4fc7e6f6cdbd9e5ed2d8ffdbeb206d1d5a8d720/io.go#L80).
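
Because `Features` is an ordinary slice, the usual Go slice idioms apply to it. The sketch below filters a slice of features by type. The `feature` struct and its `Name` and `Type` fields are simplified stand-ins assumed for this example, not the full `Feature` struct linked above, and `featuresOfType` is a hypothetical helper.

```go
package main

import "fmt"

// feature is a simplified, hypothetical stand-in with only two fields.
type feature struct {
	Name string
	Type string
}

// featuresOfType returns the features whose Type matches featureType,
// preserving their original order.
func featuresOfType(features []feature, featureType string) []feature {
	var matches []feature
	for _, f := range features {
		if f.Type == featureType {
			matches = append(matches, f)
		}
	}
	return matches
}

func main() {
	features := []feature{
		{Name: "GFP", Type: "CDS"},
		{Name: "lac promoter", Type: "promoter"},
		{Name: "AmpR", Type: "CDS"},
	}
	// Print only the coding sequences.
	for _, f := range featuresOfType(features, "CDS") {
		fmt.Println(f.Name, f.Type)
	}
}
```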

-### AnnotatedSequence.Sequence
+### Sequence.Sequence

-The AnnotatedSequence Sequence substruct is by far the most basic and critical. Without it well, you ain't go no DNA. The substruct itself has 4 simple fields.
+The Sequence Sequence substruct is by far the most basic and critical. Without it, well, you ain't got no DNA. The substruct itself has 4 simple fields.

```go
-// Sequence holds raw sequence information in an AnnotatedSequence struct.
+// Sequence holds raw sequence information in a Sequence struct.
type Sequence struct {
	Description string
	Hash        string
@@ -122,7 +122,7 @@ The AnnotatedSequence Sequence substruct is by far the most basic and critical.
The `Description`, `Hash`, and `HashFunction` are all identifying fields of the Sequence string. The `Description` is the same kind of short description you'd find in a `fasta` or `fastq` file. The `Hash` and `HashFunction` are used to create a unique identifier specific to the sequence string which you'll learn more about in the next chapter on sequence hashing.

-To get an AnnotatedSequence sequence you can address it like so:
+To get a Sequence sequence you can address it like so:

```go
bsubAnnotatedSequence := ReadGbk("data/bsub.gbk")
@@ -133,10 +133,10 @@ To get an AnnotatedSequence sequence you can address it like so:
`poly` tries to supply a writer for all supported file formats that have a reader.

-Writers take two arguments. The first is an AnnotatedSequence struct, the second is a path to write out to.
+Writers take two arguments. The first is a Sequence struct, the second is a path to write out to.

```go
-// getting AnnotatedSequence(s) to write out again.
+// getting Sequence(s) to write out again.
bsubAnnotatedSequence := ReadGbk("data/bsub.gbk")

// writing out gbk file input as json.
@@ -154,7 +154,7 @@ To get an AnnotatedSequence sequence you can address it like so:
## Parsers

-`poly` parsers are what actually parse input files from a string without any of the system IO. This is particularly useful if you're like me and have an old database holding genbank files as strings. You can take those strings from a database or whatever and just pass them to `ParseGbk()`, or `ParseGff()` and they'll convert them into AnnotatedSequence structs.
+`poly` parsers are what actually parse input files from a string without any of the system IO. This is particularly useful if you're like me and have an old database holding genbank files as strings. You can take those strings from a database or whatever and just pass them to `ParseGbk()`, or `ParseGff()` and they'll convert them into Sequence structs.

```go
puc19AnnotatedSequence := ParseGbk("imagine this is actually a gbk in string format.")
@@ -164,10 +164,10 @@ That's it. The reason we don't have a `ParseJSON()` is that golang, like almost
## Builders

-`poly` builders take AnnotatedSequence structs and use them to build strings for different file formats.
+`poly` builders take Sequence structs and use them to build strings for different file formats.
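
A builder in this sense is just a function from a struct to a string, with no file IO involved. The sketch below builds a FASTA-formatted string. `record`, `buildFasta`, and the fallback line width of 60 characters are assumptions made for illustration rather than `poly`'s own builder.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical stand-in for a sequence struct.
type record struct {
	Description string
	Sequence    string
}

// buildFasta formats rec as a FASTA string, wrapping the sequence at
// width characters per line. A non-positive width falls back to 60.
func buildFasta(rec record, width int) string {
	if width <= 0 {
		width = 60
	}
	var builder strings.Builder
	builder.WriteString(">" + rec.Description + "\n")
	for start := 0; start < len(rec.Sequence); start += width {
		end := start + width
		if end > len(rec.Sequence) {
			end = len(rec.Sequence)
		}
		builder.WriteString(rec.Sequence[start:end] + "\n")
	}
	return builder.String()
}

func main() {
	rec := record{Description: "toy sequence", Sequence: "ATGCGGTA"}
	fmt.Print(buildFasta(rec, 4))
}
```

Since the result is a plain string, it can be written to a file, stored in a database, or sent over the network without the builder needing to know which.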

```go
-// generating an AnnotatedSequence struct from a gff file.