It looks like it's about 11 o'clock, so we're going to get started. Thank you all for joining us today for our webinar. My name is Jackie Carville and I'll be coordinating the webinar today. I'm here with Matthew Kaiser, who will be going over sequence editing and annotation within DNASTAR's Lasergene Genomics Suite. You may have noticed that your phone has been muted; however, we do encourage you to ask questions along the way. To ask a question, just type it into the chat dialog and select "Send to Host." I will then direct these questions to Matt to be answered for the whole group. If you need any assistance or have any questions during the webinar, you can send a chat message to me, email me at webinars@dnastar.com, or tweet us at the Twitter handle @DNASTARInc. And with that, I'll go ahead and turn it over to Matt.

Thank you, Jackie. I'll switch over to my desktop here, just a moment. Okay, well, welcome everybody, and thanks for the intro, Jackie. Today I'd like to do a webinar that discusses some of the more advanced tools that we have in our SeqMan Pro software that involve editing at different levels, and also some different annotation options. I've been working with customers here for almost 10 years now, so I have a pretty good idea of some of the features that I think are really useful and that, in some cases, customers might miss or might not be aware exist in our software. Hopefully we can cover some of those topics, and of course I'll take questions at the end, so if there are things that I missed or didn't explain clearly enough, you'll have a chance to get your questions answered.

Before we start, as we like to do, we'd like to give a little background on DNASTAR as a company. I know there are some of you that are familiar with our company, and some of you are new and this might be your first webinar. We are located in Madison, Wisconsin. I can assure you the weather looks absolutely nothing like this today. It is snowing out and
there's over two feet of ice on Lake Monona. Hopefully in about a month and a half, though, it'll look like this, so fingers are crossed on that front.

Our company was founded by Dr. Fred Blattner. Fred and his lab were genomics pioneers and sequenced the E. coli genome back in the 1990s. The software that they developed in the lab was commercialized, and ever since it's really been a pioneer, used by our customers for large projects like genome assemblies as well as smaller projects like assembling PCR amplicons. SeqMan Pro in particular was a very important tool for putting together these genomic sequences. SeqMan Pro has a lot of different options. Those of you that work with SeqMan Pro and look at some of the menu options know that there are many different ways to edit sequences and different ways to assemble sequences, and it can be a little bit overwhelming. So the hope is that today I'll outline the functionality of SeqMan Pro and we can go through some different workflows.

This software, then, was really designed out of a necessity for genomic sequencing. When it was commercialized, we were able to have it run on desktop computers, and more recently up on the cloud, hopefully making these very powerful tools accessible to all of you. Our software is research grade. It is the most cited bioinformatics software in the literature, and you can see that right now we're a little bit over 9,000 citations, and we hope that trend continues.

The support that we provide with the software is wide ranging. I spend most of my time working with customers, both customers who have purchased our software and have questions and customers who are evaluating the software. My focus has been next-gen sequencing technologies, so some of the data that we'll look at today will be NGS, but we'll also spend some
time with some Sanger data. The support also includes things like our website, which has numerous training videos. This is just a screenshot to show all the different types of workflows; they're short videos, between one and five minutes long, on all different topics. You can see that under next-gen workflows there's de novo assembly and reference-guided genome assembly, and we'll be talking about those workflows a bit today and looking at some of the tools. There are also, of course, monthly webinars like the one that we're doing right now. We try to do them once a month on all different topics.

These are by different people here; it looks like Jackie did the last three, so December, January, and February, on three different topics. If you have additional workflows that you'd like to learn more about, you can certainly watch these up on YouTube and learn more about our software.

What we're going to be talking about today is SeqMan. SeqMan Pro is where all the editing functions will be, but we'll also look a little bit at SeqMan NGen. These are programs that are part of the Lasergene Genomics Suite, and the suite includes the assembly and downstream editing and analysis tools. Of course, DNASTAR offers other packages that include things like phylogenetic alignments, structural biology, and protein folding, so depending on your needs you may be interested in a sub-package or in the entire DNASTAR suite that includes all these sub-packages.

The topic for today is really SeqMan Pro and the functionality that it provides. I'm going to try to explain the two different modes that SeqMan Pro is used in. It can be a bit confusing, because there is one interface that does two quite different things. One mode is the editable SeqMan files, the .sqd files, and the other mode is the uneditable BAM .assembly files. Depending on the type of project that you have, you may have to use one workflow or the other, or you may have the option to produce both types of files from, say, a SeqMan NGen assembly.

To describe it in a little more detail: the editable SeqMan .sqd mode allows the user to do things like assemble Sanger data. You can drop Sanger data right into SeqMan Pro and align the data, trim the Sanger files, view the chromatograms, do micro editing, generate a consensus, edit the consensus, and export the consensus from SeqMan. A common workflow would be a de novo genome assembly of a bacterium, where
you want that to be editable, so you can edit both the sequence data that's aligned and the contigs that are formed. Then there's a higher level of editing: rather than single sequences, you can assemble and edit contigs, which are clusters of sequences. I can break them apart, and I can edit the ends of the contigs and improve them from the assembly. And then there's another level of editing, and that's the scaffolding. I can take contigs and order them into scaffolds, and I also have the ability to edit the scaffolds. So there are different levels of editing possible in this SeqMan file that allow us to do things like genome closing. There are additional alignment algorithms that allow us to close gaps, and also to add annotations, and we'll look at that as well. So part of the workflow, for example for de novo genome assemblies, is to do all the editing, create scaffolds, close gaps, and add annotations.

Now, there are limitations to this workflow. In our software the limit now is about 15 million reads. That covers bacteria and the smallest eukaryotes, but if you have larger projects, small to mid-size eukaryotes, you wouldn't necessarily be able to load the whole genome in at one time. I have customers that might work on one chromosome at a time that fits within that size. So it's something to be aware of: there is a size limit here, and you also get some performance issues, which we can talk about, as these files get close to the limit. Certain algorithms and certain ways of editing fall more in line with how the software functions than some of the others.

Now, the other mode is the BAM mode. We've got lots of recorded webinars on our website that focus on this newer BAM file. The idea with BAM files is that SeqMan becomes really more of a viewer, a way to look at assembled data, rather than an editor. A common example would be if I have an RNA-seq or a genome
alignment to a human genome that I've done in our SeqMan NGen software. That might produce a file that has chromosomes that are over a hundred megabases long and may contain hundreds of millions, even billions, of sequence reads. They're gigantic files, and it's not possible to edit those. All the edit points that you have in a fully editable SeqMan file add quite a bit of overhead to file sizes and efficiency. So BAM files are great in that there's really no limit to the size, but they're not editable. SNP and variant analysis is usually done in this kind of workflow, as is RNA-seq. And again, on capacity, we've had data sets with close to 10 billion reads in an assembly. So we'll think of this as two different modes, and I'll show this a little bit more in SeqMan NGen.

We're going to focus most of today on the editable version of the SeqMan file. The editing tools, and we'll try to go through these in the software, are: micro edits, which are edits to individual bases within individual aligned sequences; micro editing in the consensus, again making human corrections on bases; and then more wide-scale editing of contigs and scaffolds, and merging contigs together. There are lots of different options within these workflows.

Then we'll move on to some of the annotation options. Annotations can be created in SeqMan Pro four different ways, maybe even five, but we'll talk about four. One is manually creating features. Another way is to transfer features from a reference sequence to a consensus. A common example would be that I use a reference genome to guide the assembly of my data, the reference genome is annotated, and en masse I can transfer those annotations over to the consensus for my new organism. SeqMan will also auto-create features based on coverage information, so different metrics from the coverage table. For example, for regions that have high coverage or low coverage, or regions where there's a contig, features are auto-created and added to the consensus, and we can look at those in our SeqBuilder. And then another way to bring in features is to BLAST regions, collect features from BLAST hits, and add those to the consensus. So again, a number of nice ways to bring annotations into your project.

We'll jump out of the PowerPoint here. So that's kind of an overview of
what I'd like to do today, and we're going to start with a small project. I'm going to open SeqMan Pro. The small project is actually the demo data set that is provided with all the SeqMan demos, and it is 14 trace files, ABI files. Usually I drag and drop these in, and they go into this Unassembled Sequences window. I can go and look at a trace file, so I just double-click there, and I can expand the trace file and take a look at it. You can see the electropherogram peaks. There are some options here: I can zoom in and out, so I have some nice control there. I can also look at quality scores, and if I zoom out too far then we lose room for them and we lose them. If I go back, the quality scores expand again, so I can look at all the peaks. It's a nice view of the peaks.

You see this vertical bar. This bar is actually the trim point. Right now there's no trimming applied to these sequences, and you can see the limit on the sequence goes all the way to 741. I can manually trim sequences, so one way that some folks use SeqMan is just to manually trim sequences. I just grab the bar and drag it, and you can see that the trimmed region is a little muted in color. That means I added a trim point, and it will be updated in the limits range. I could also go down to the back end, eyeball the quality scores, and select the trim point. Of course, that's nice if you have a project where you're trying to get that last couple of base pairs to get something to align, but you don't want to do that for a lot of sequences, so we won't do that.

There are automated trimmers, and the reason I'm looking at trimmers now is that if you have a project that's going to require editing, then before you do any editing you want to be very certain that you've done an assembly at the right stringency, one that limits the amount of manual editing that you might have to do. Trimming and editing really go hand in hand, and so you want to be very
familiar with the trimming tools that are available to you. SeqMan Pro can assemble ABI files, and it has its own set of options for cleaning up the data prior to assembly. We'll also look at SeqMan NGen, which is used for assembling next-gen data, and it has its own set of tools for cleaning the data prior to assembly. So you want to be very familiar with how to trim.

Now, if I just assemble as is, without specifying any trimming options, I get a report file that tells me how things went together, and you see that it went together very quickly. Here's the contig, and I can see the ABI files here. I'm just going to scroll through, and you can see these red areas of mismatch. There are quite a few of them, and they seem to be in the five-prime ends or three-prime ends of some of the reads. This would be a five-prime end, and you can see there's quite a bit of mismatch there. It's not affecting the consensus that's generated. We have the aligned reads here, and the consensus is at the top. SeqMan has a great consensus caller. It's really robust; I think it's the best on the market, and it usually can get it right even with noisy data. But you don't want to have errors if you can prevent them.

I'm also going to show a strategy view. This is more like a genome-browser type of view that shows us where the reads are aligning, and it gives us a depth-of-coverage histogram. I can see that where they all overlap each other we've got 12x coverage or something, and out on the ends we just have one sequence. I can double-click and go right to that point in the assembly. Now, if I want to look at the trace data, I can click the twisty triangles and see the trace data. If I hold Control+Alt down and click, I can do all of them at once. I can also zoom in so I can see the peaks a little bit better, and I can again show quality scores. So I can go in and look at this data quite well.

Now, those errors. One of the questions you'll have is whether the data is clean enough to give an accurate consensus sequence, or whether you're going to have some problems in the consensus. There are ways to navigate the assembly other than just scrolling through like this. There's a little search window, which is easily missed, down in the left corner here, and if I
click right on "Unspecified Search," I get a window that allows me to search through this project for potentially interesting areas. One might be conflict, so I'm searching now for a conflict in the consensus between the aligned reads. I'm going to search for anything where the consensus caller isn't quite sure what to call at that location. I can see here a lowercase t, and that means some of the reads have a strong T and in some there's just a gap, it's missing, and so the caller is not entirely confident. Now, I could decide that this is some kind of sequencing error, so one thing I could do is edit one sequence and add a T there. I don't highly recommend editing the actual chromatogram. Editing in the project doesn't actually change the original ABI files; it just changes what's reported in SeqMan. So that's one kind of micro edit: directly editing the ABI read. In some cases you can look at the peak and figure out why there's some confusion at a certain position.

Another option is to edit the consensus itself. If I see we have an ambiguity here, then even though I see Cs and Gs, if I decide that it is a G, I might override and just type a G in where I've selected. But now I'm noticing, boy, there are a lot of errors here. I go to conflict, and I'm just clicking on these chevrons that move me to the next conflict, and I'm like, wow, that's a lot of errors. So I would stop at this point and say, you know what, let's try to get a better assembly here. I'm going to collapse this down again so I can see things a little bit better, and I'm going to make my quality scores go away; that's how I prefer it. I can look at that assembly and say there's something here. So I might BLAST this. I might highlight a sequence, ask what's in my data here that's creating a problem, and BLAST it. I know what the problem is, but
I'll BLAST that, and it's going to hit vector sequence. So this is a case where these trace files had some vector sequence that was untrimmed. We also see this in NGS projects. It's not so much the Illumina genomic data; where we see these kinds of end contaminants is in things like transcriptome data, where there may have been a cDNA library kit or something, and some primers, linkers, or adapters are untrimmed on the ends of the sequences. They show up just like this, and having millions of reads with linkers like this causes a more inefficient assembly, lower quality, and all that sort of thing. So you want to identify problems and trim them, and then go through all the steps of editing.

I'm going to just close this. I'm going to hit Delete now, and what happens is that the contig dissolves, and here are my sequences again. I know that this data has something called the Janus vector. You can see there's a little question mark, which means we haven't scanned for it yet. I can go under Options and scan all the sequences. What it did there was a trimming step: I said look for Janus, and if it's there, trim it. The check mark means that the vector was found on every read in this assembly. And again, there are ways to use SeqMan just as a trimmer. You might have an instance where you really don't want to assemble; you just want to clean up the data and send the sequences somewhere. I can do a contig export of sequences into multiple files or a single file and send out the trimmed sequences. That's a workflow that occasionally comes up.

Now there are some options here under Trim Ends. I have different stringencies, and if I hover over them I can see that the stringency is a quality threshold: it's looking in a window along the sequence and trimming according to the quality. I can manually define window sizes and all that sort of thing. I want to leave some differences in, just for the webinar, so I'm going to go with a vector trim but just low stringency, and scan them later. Now when I hit Assemble, it'll apply that little bit of a change in the quality trim, and it's doing the vector trim. So now when I look at my contig and look at these ends, I don't see as many mismatches as I scroll through. We just reduced the amount of mismatch by a lot. At this point I might be ready; I might say, well, let's go back and do a high-stringency assembly and see if we can remove some of these ends, but we'll go with this for the webinar's sake. Okay, so I'm going to expand these out again and I'm going to go
back to my search. Now I'm going to look for conflict in the consensus, and there I find a conflict. It looks like there is a C in one read and possibly some slippage in another. I'm looking at the peaks, and there's a little bit of slippage, so I think that the consensus is right. What's really nice about SeqMan is that the consensus caller factors in the quality scores. It's going to look at the quality and say, we've got a high-quality C here and nothing here; we're still going to call it C, but we'll make it lowercase in case you want to look at it. And then I can go to the next conflict.

Another thing I can do: you see that the ends of these sequences have triangles on them. What's really nice about SeqMan is this. Before I worked at DNASTAR, I did a lot of manual editing like this with a program where, when you trimmed, the data was lost for good, and it really was a hindrance because I couldn't go back in and extend trimmed ends. Now when I extend this, look what happens: all this mismatch, which is probably vector, though in some cases it's just low-quality sequence. See what happens here. Now we're getting a bunch of Ns there. But sometimes, even with a crummy sequence like that, you can get just a little bit more data to confirm your consensus. That one doesn't work there, so let's look for another one.

You can see that's the only conflict in the consensus, so let's change our search terms and look for any differences. This is where SeqMan makes a call based on quality, but other methods, like the majority method, show ambiguity. Here we can look at some cases where there's some ambiguity but SeqMan is making a call, and I can build some faith in whether SeqMan is making the right call. I can look at the peaks. We have a little valley there with a T, so I'm just looking at those, and it looks like there's enough there that it's a T. Let's extend one.
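
As an aside, the difference between a quality-aware call and a majority call can be pictured with a small Python sketch. This is not SeqMan Pro's actual consensus algorithm, which is not described in this webinar. The function names, the rule of summing Phred-style scores per base, and the idea that a gap carries a quality value of its own are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

GAP = "-"  # hypothetical symbol for a missing base in an aligned read


def majority_call(column):
    """Call the most frequent base in a column; return 'N' on a tie.

    `column` is a list of (base, quality) pairs, one per aligned read.
    Quality is ignored here, which is why ties become ambiguous.
    """
    counts = Counter(base for base, _ in column)
    top_count = max(counts.values())
    winners = [base for base, n in counts.items() if n == top_count]
    return winners[0] if len(winners) == 1 else "N"


def quality_weighted_call(column):
    """Call the base with the highest summed quality (an assumed rule).

    The call is uppercase when every read agrees and lowercase when any
    read disagrees, mirroring the lowercase 'look at this' flag described
    in the talk.
    """
    totals = defaultdict(int)
    for base, quality in column:
        totals[base] += quality
    best = max(totals, key=totals.get)
    all_agree = len(totals) == 1
    return best.upper() if all_agree else best.lower()


# One read has a high-quality C, the other has a low-confidence gap.
conflict_column = [("C", 40), (GAP, 10)]
clean_column = [("T", 30), ("T", 35), ("T", 20)]

print(majority_call(conflict_column), quality_weighted_call(conflict_column))
print(majority_call(clean_column), quality_weighted_call(clean_column))
```

In this toy version the majority rule cannot decide between one C and one gap, while the quality-weighted rule still calls the C and lowercases it to mark the disagreement.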
See, we have there another one of those valleys. I think it's probably C, so I'm good with that, but I might decide to make a change there. I can again go ahead to the next one, and we have again a valley with a C call, and some ends there. So it's a nice way to move through the assembly. Maybe I can extend here. Nope, I start getting more mismatch. I keep running into that vector, so I haven't found a good spot where I can make an extension again.

And again, if there are any changes that you need to make, if I decide that I really think that's a G, I can go in and highlight it and make that change in the consensus. So this is the micro editing capability, for both the trace files and the consensus, that is really nice, especially when you're working with ABI files.

Once I'm done making the edits that I want to make, I can send this consensus different places. One thing I can do is save the consensus out to a single file. This dialog has changed, so for those of you that have used SeqMan before, please make note of some of the options that are here. I can save as a number of different sequence file formats, and I can save either the entire consensus or a range of the consensus. I can also do things like include gaps. And if I have a reference sequence present, I can check this option. This is a really important piece that gets overlooked fairly frequently. If I had a reference sequence that I aligned to, by default SeqMan will use that reference as the consensus, and it has something to do with the SNP calling. So if I want to export something that is not the same as my reference, I check this box and hit Save, and it'll recompute a consensus without the reference bias. It's an important little checkbox to know about.

Okay, so those are some of the micro edits on an ABI file. Now let's go and look at what an NGS project might look like. If I've got a bigger project, with Illumina data for instance, I'm not going to assemble that in SeqMan Pro. I need a more powerful alignment program, SeqMan NGen, which is designed to handle NGS data. Again, there are webinars on specifically this workflow, so I'm going to move through this part quickly just to get to the project. If I'm going to do a de novo assembly or a genome assembly, I can pick the de novo option, or
if I have a template, we can pick a templated option, and I'll show you an example of a templated assembly as well. NGen is set up so that you can make different choices in your workflows. I'll name the project; this is going to be an E. coli K-12 strain using MiSeq Illumina data. You'll notice the output options here, and it's important, depending on your project, to select the right output option. If you have a project that has under ten million reads, maybe stretching a little to 15 million, so microbial in size, you can create this editable project to work with. If you have a project that is larger and it's de novo, then the BAM file, even a de novo BAM file, is the option that you're going to want to choose. ACE files we don't run into too often; that's also an editable format, but it's not very efficient with next-gen data. So I'm going to pick the SeqMan Pro format that's editable, pick my read files and the technology, and then I can load unpaired or paired data. I loaded two FASTQ files here with a very tiny insert; actually, there's not even an insert on many of the reads. They're nice, long MiSeq reads.

Here's the screen I really want to get to. When I do a de novo assembly in particular, if I'm going to be doing a lot of editing and I want to build a consensus for a new strain that's as accurate as possible, I want to make sure that I trim those reads as accurately as possible before they get assembled. SeqMan NGen provides a whole host of different trimming tools. For instance, if there is an adapter or vector that's contaminating my data, I can load in a vector file, and SeqMan NGen will do a vector/adapter scan. It is mer based: it identifies those contaminating sequences and trims them off prior to assembly. Of course, there's also quality trimming. These are quality score ranges set for Illumina data; if I loaded a different platform, these are going to
change a little bit, as will the window sizes. We can do fixed end trimming. There's quality-score-independent trimming, called trim to mer, which is a good option when the quality scores aren't as accurate as they need to be. So there are a lot of different trimming options here to really clean that data up before you start doing any kind of manual editing. Then we say OK, and I'm just going to breeze through this, and it's ready to assemble. What we get is a .sqd file, which can be a lot bigger than the ABI project that we were just looking at. I'm just going to quit this and open up the project. I think this is it here.

So here's the de novo assembly. When you open these de novo assemblies, by default what you'll have is a report file that summarizes the assembly. It tells me how long it took, which version of SeqMan NGen I used, how many contigs I have, how many contigs reached the genome length, the contig N50, and the average coverage, so I get all the metrics here. Also, when I make edits, the edits will be added to the end of this report file, and that's what we'll look at a bit today.

We can sort the contigs different ways. This is a big editable project, so I can sort by length, and I see we have some really nice big contigs. I can double-click on them and scroll through the contigs. There are no traces here. There are still quality scores, but we don't have traces. I can look at these in a strategy view as well, and the strategy view now has a lot more in it than in our first project. We can see these peaks and valleys. The pair consistency is usually green when we have consistent distances. With this data set there's very little pair distance, so pairs are called inconsistent; they actually overlap each other. That's not really a concern; it's just the nature of this particular data set.

I can go through and look at coverage maps. When I do a de novo assembly, I'm really concerned about whether this is an accurate assembly and whether we have areas in these contigs that look like maybe they weren't assembled properly. I don't want to have to look through all the contigs, but if I have a hundred contigs that I'm trying to put this genome together from, I may spend the time to look at each contig and then do some macro editing or some end editing. For example, on this contig I get these two sequences extending off the end, and there's a little bit of mismatch. They're not too big of a problem, but if there is more mismatch, especially at the join
where it joins a thicker area of coverage, I might decide that it might be a chimeric read that's contaminating, which does occur. So I might say, let's just highlight those areas, hit Delete, and trim them. One thing I'll do is manually go through some of these contigs and do this kind of micro editing. It's not necessarily one sequence at a time; I might highlight a group of them. I call them the toenails, and I trim those off like that. Now, what I just trimmed I probably wouldn't have trimmed normally, because I don't see mismatch. Here's one that is okay. Here's one where it's a thinner area. These thin extensions are sometimes actually a chimeric read, and I like to trim those. So that's another way to edit contigs.

Now, there might be errors in the middle of contigs. With Illumina MiSeq data, it's really good data, so there aren't going to be a lot of errors in the assembly, and NGen is very stringent. But if you look, you're going to find a couple of places in each assembly where you might not want a contig in one piece; you might want to break it apart. I know there's one in here, and I have to sort by name, so I'm going to sort by name and just keep it in order here. We're going to go to Contig 257, and I'm going to go to the strategy view. There's an area that would be concerning, and you can see it in the coverage map: this red line is one read that is joining two areas of coverage. It looks like there's quite a bit of a match, but it's small, and I might not trust that. I might say, well, let's break that apart and maybe try to confirm it some way. One way to confirm is to manually go in, just like we did with the Sanger data, and extend ends, and I can see that actually matches
pretty well, though. What happened is that there are some low quality scores there that caused some of these reads to be trimmed back, and so it may just be that. So I can extend. Now, if I extended it and saw a lot of mismatch, like that one, which has more mismatch, I'm not so sure. I might look at a region like this and say, well, let's go to the other side. Here we see a little bit more mismatch in there, so we can see why it trimmed back. Something's going on there, so I might decide to just break this contig apart.

This is where we move from micro edits to macro edits. I'm going to split this contig. You'll notice it's taking a little while. This is a pretty strong computer, and it took a little longer there than it did with the ABI files. The reason is that there are many, many more sequences in this contig than there were with the ABI files. So it's something to be aware of: when you start editing large files with lots of sequences, you might not want to do that on a laptop. You might want a little sturdier desktop so it can do those functions more robustly. I'm going to go back to the project report, and I can see what happened. We split Contig 257 at position 20 at 62 and created a new Contig 276, so we have two contigs now where we had one. I just edited that contig.

Now what I can demonstrate is this. Those two contigs we think may go together, and there are different ways we can confirm this. Here's a step in editing that is key when you have projects like this with a lot of contigs in them. When we go to the Contig menu, there are things where I can align and force join and do things to the project. You want to resist clicking Align Contigs unless you're sure you want to align certain contigs together. This data has already been aligned by SeqMan NGen. SeqMan Pro has its own aligning algorithms, and you don't want to have to realign all these contigs together with SeqMan Pro, because SeqMan NGen has already done that. What I want to do is a more focused kind of alignment. If I want to join these contigs together, what I want to make sure to do is lock this project down. So what I'm going to do here, and I think I can just go up here, is lock everything first. When you lock contigs, you see this lock symbol, and that means that these contigs cannot be realigned without unlocking
them first. So, for example, if I wanted to figure out whether those two contigs really should have gone together, I’m going to focus on them and unlock 257 and 276. There’s more than one way to do this, but I could use some of the tools here under the contig menu. You can see I can use something called extend contig ends. Extend contig ends is an algorithm that does what I had just done manually: I was manually extending contig ends by grabbing the little black triangle and extending off the end, but there’s an algorithm that can do that en masse. So I can extend ends; you can see they’re highlighted in yellow. Go to the other one, and there’s another extended end, so it’s doing some of that automatically. So I can extend those ends, and now I can ask: do these align together? At this point I might be doing some gap closure. I might have decided that area was thin, so maybe I did a PCR amplification and added another sequence to the project. There’s a whole webinar dedicated just to gap closure, so in an area like this I might actually have added another sequence. I’m just demonstrating now that I can break contigs apart and I can put contigs back together. So I go back to the contig menu, and I can align a couple of different ways. If I align contigs, it’s only going to look at those two, and it’s safe to align them. The reason it would be unsafe to align all of them together is that it’s possible that area is an IS element that occurs, you know, 50 to 60 times in the E. coli genome, and there may be other contigs that share homology with those ends. So if I unlocked everything and said reassemble, I may get some reassortment that I don’t want. I just want to try to put these two back together. So I can align these contigs, and in my report it now says it’s merging them and gives the percent match. So again, with contig editing I can put
them together. Now, another way to do this: I’m going to go back into the strategy view and split. So let’s go back here, and I’m going to split my contig one more time.
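The focused merge described above (lock everything, unlock only the two contigs of interest, then test whether their ends overlap and report a percent match) can be sketched roughly as follows. This is a minimal illustration only, not SeqMan Pro’s alignment algorithm: the `Contig` class, the function names, the ungapped end comparison, and the default thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical stand-in for a contig: a name, a consensus, a lock flag."""
    name: str
    seq: str
    locked: bool = True  # assume everything starts locked, as in the demo


def best_end_overlap(left, right, min_len=4, min_match=0.8):
    """Find the longest ungapped overlap between the end of `left` and the
    start of `right` whose identity is at least `min_match`.

    Returns (overlap_length, percent_match) or None if nothing qualifies.
    """
    longest = min(len(left), len(right))
    for n in range(longest, min_len - 1, -1):
        tail, head = left[-n:], right[:n]
        matches = sum(a == b for a, b in zip(tail, head))
        if matches / n >= min_match:
            return n, round(100.0 * matches / n, 1)
    return None


def merge_if_unlocked(a, b, **thresholds):
    """Merge `a` and `b` only when both are unlocked and their ends overlap.

    Locked contigs are never touched, which is what keeps the rest of the
    project from being realigned by accident.
    """
    if a.locked or b.locked:
        return None
    hit = best_end_overlap(a.seq, b.seq, **thresholds)
    if hit is None:
        return None
    n, percent = hit
    merged = Contig(f"{a.name}+{b.name}", a.seq + b.seq[n:], locked=False)
    return merged, percent
```

Restricting the comparison to the unlocked pair is the point of the sketch: a repeated element shared by many contig ends cannot pull in contigs that are still locked.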

Okay, so there’s 257 and 277. Now, another thing to do is to make another order above the contig level, at the scaffold level. So I could say, let’s make a new scaffold, and I think there’s another sequence that I could add to a scaffold. So I can create scaffolds here and add more sequences to a scaffold. There are multiple ways to do this, and again, this is covered in some of our de novo genome work: if you have mate pair data, we can auto-create scaffolds. We have an automated genome workflow that will automatically create scaffolds and put all these contigs in a putative order, or I can do it manually like I’ve just done. The scaffold has another order of information called position. I can add a position, you know, one, and I’m just going to change that too. However I get this information, I might think that they are in position one, two, three, and I can sort the position column and I can align. So there’s another way to try to merge contigs. Now, I know that I added contig 264, which probably isn’t going to overlap, but I can say, let’s see if there’s overlap in these scaffolds. There’s another algorithm; it’s similar to align contigs, but it’s align contigs in scaffold, and it looks for overlap between adjoining contigs in this structure. So again, it’s another way to merge contigs safely. You can see that the one that I just kind of randomly picked, it didn’t align, but it did align the two that I had split into one contig. Again, I can edit a scaffold by moving: I click and I can drag and drop that contig. Well, that one didn’t fit in that scaffold; maybe there’s another sequence that I can add in. So with contig editing, then, we can build scaffolds, we can edit scaffolds as well, and so forth. That’s a lot of the different editing tools. Now, in the last couple of minutes here, I want to show you some feature handling tools, so I’m going to go to a different project, and there’s not as much
content here. That’s not that project; this one. Okay, so this is again the same data set, but aligned to a genomic template, and this is a template that I’ve modified a bit. We’re looking in the strategy view, and you can see that instead of a bunch of contigs we have one contig; it’s the full genome length. I can see all these little hash marks, and if I hover, I can see that this is a gene feature. So we have upper and lower strand feature annotations associated with the first part of this genome. There are a couple of different ways I can create features. One way is to highlight a region, so I might just select a region, go to the feature menu, and create a new consensus feature. We’ll call it Matt, and I can give it a description, I can define what the feature is, and we’ll pick gene. Okay, so I’ve just manually created the feature. Oh, and there’s a test feature I did earlier. So right now there are only two features on my consensus, which is largely unannotated, and this is a table of consensus features. I’m going to close this window and go back to my strategy view. Okay, so here are all the reference features. Let’s see how I get them to come back here for me, and

I’m not getting them to pop back for me here. Well, I’ll move to the next annotation. I can also bring in features by selecting a range and BLASTing it, and when I get the BLAST hits back, I can collect the features. This takes several minutes to run, and I can BLAST ranges up to tens of thousands of Kb, somewhere in that range. I can look at the BLAST hits; I want to make that a little bit bigger, so we’ll go to font size and make that more readable. So I can look at my different BLAST hits, and if there are features contained within them, I can collect those features. What it’s saying is that, in the database, all these features are contained in the region that matched the sequence I selected, so I can add those directly to my consensus as features. It’s a way to BLAST and annotate. Now, another way that annotations are brought in is through the coverage report, so I’ll go back here. The coverage report lists areas that are deeper or thinner than expected, and they might be interesting. Depending on what your analysis is, you may want to know where there’s insufficient coverage or excessive coverage. I can go to project parameters and control what’s considered maximum coverage or minimum coverage. I can set that parameter and save the file, and when I export the consensus to our SeqBuilder program, these locations will be annotated as feature types. Okay, so that’s manual addition, BLAST, and coverage features, and then there’s the transfer from the reference. I’m going to try that again. Oh, I think I know what’s going on here: this project has an unmarked reference right now. All right, so I just marked my reference, and I’m going to show the feature table. What I want to show is a feature table of all the reference features. It might be building here just a little bit. Come on,
feature table. Okay, so these are the features from my reference. I had to re-mark it; I’d forgotten that I had unmarked it. So here are all the reference features. Now I can go to the reference features, select a gene, and I have a couple of different options. I can either take all the features here and move them to my consensus, or I can do just subtypes. With bacteria, of course, CDS features overlap genes in almost a perfect correlation, so I may not want to transfer all of them. So I can select all of type gene and say, okay, take all the gene features and transfer them over, and I can copy these selected features over to the consensus sequence. So now my consensus contains a mix of different features from a mix of different sources: those from the reference sequence, manually created features, BLAST-collected features, and then what we don’t actually see in this table are the coverage features. Now, there are multiple different things you can do. I’m going to have to select my contig here, and I can save the consensus out into a single annotated file. Again, I have a reference in this assembly, so I want to make sure to do an unforced consensus or unmark the reference, like I had done before accidentally. So I can save this annotated .seq file out, or a GenBank file. Another thing I can do is just send the consensus out to our SeqBuilder program. SeqBuilder is our sequence editing program; it

allows us to make circular maps; what we have visible here is a linear map. There are different feature types in this file now, and I’m just going to show you. Again, we started with an unannotated consensus, and now we have different features that we can apply. So, for example, here are gene features. I’m zoomed way out, so I could zoom in, and there’s Matt. Of course, SeqBuilder has the ability to move features around and do things like attach labels, and I can set default colors and styles for all the features. There are separate videos that cover what SeqBuilder can do, but we can see all these features. In the front half, these are from the reference; here’s the one that I manually created; and I believe these, off in this little group here, are the BLAST features that I collected from BLAST hits. And then, of course, we have the coverage features. I’m going to make the gene features go away so they pop a little bit better. These are coverage regions that exceed my 250X threshold, and there I can see some coverage features. There are little blips here that I can zoom in on; we have some little ranges where we had some peaks, probably over IS elements. We could turn the gene features back on, and maybe it’s a ribosomal gene. I’m not sure, but yeah, it looks like a ribosomal gene where we had some peaks of coverage. So we can interpret what some of those peaks in coverage are. That’s all the material I intended to cover today. I know I went through a lot of stuff kind of fast, and I’m sure there’ll be some questions, so I will stick around for the next few minutes and answer any questions that have come up. If I don’t get to your question today in the webinar, I’ll certainly answer you through email. So again, thanks for your attendance, and I’ll take any questions.
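For anyone who wants to reproduce the coverage features outside the software, the idea of turning minimum and maximum coverage thresholds into annotated ranges can be sketched like this. It is a rough illustration under stated assumptions, not how SeqMan Pro or SeqBuilder computes them: the function name, the feature labels, and the per-base depth list are hypothetical, and only the 250X maximum comes from the session.

```python
def coverage_features(depth, min_cov, max_cov):
    """Turn a per-base depth list into coverage feature ranges.

    Returns a list of (start, end, kind) tuples with 1-based inclusive
    coordinates, where kind is "low_coverage" (depth < min_cov) or
    "high_coverage" (depth > max_cov). Labels are hypothetical.
    """
    features = []
    start = None
    kind = None
    for pos, d in enumerate(depth, start=1):
        if d < min_cov:
            current = "low_coverage"
        elif d > max_cov:
            current = "high_coverage"
        else:
            current = None
        if current != kind:
            # Close the run that just ended, if it was a flagged run.
            if kind is not None:
                features.append((start, pos - 1, kind))
            start, kind = pos, current
    # Close a flagged run that extends to the end of the sequence.
    if kind is not None:
        features.append((start, len(depth), kind))
    return features


# Example with a made-up depth profile; 250 is the maximum used in the
# session, 10 is an assumed minimum.
example = coverage_features([30, 30, 5, 4, 30, 300, 310, 30], 10, 250)
```

Each returned range corresponds to one feature that could be written into an annotated file alongside the gene and BLAST-derived features.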