– [Hans] Okay Hello, everyone, and welcome to this collaborative webinar hosted by Integrated DNA Technologies and presented by PacBio on characterizing Alzheimer’s disease candidate genes and transcripts using custom target capture probes and single-molecule long-read sequencing My name is Dr. Hans Packer, and I will be serving as the moderator for today’s presentation The presentation today will be given by Dr. Jenny Gu Dr. Gu is a strategic business development manager at PacBio where she works to develop effective partnerships for advancing genomic applications with long-read single-molecule sequencing And as part of that collaboration, Dr. Gu has been working with IDT to develop NGS solutions to fully capture genomic regions, transcript diversity, and regulatory regions of interest That’s what this presentation is about, looking at this Alzheimer’s disease panel The presentation today should last about 40 minutes, and following that presentation, Dr. Gu will answer as many questions as possible from you, the attendees You can ask your questions or make comments at any time by typing them into the questions box, which is located at the right hand side of your screen in the GoToWebinar control panel If you just click on the little up arrow symbol on the questions bar there, it’ll pop out into a separate window and you can grab the corner and make it a little bigger, so it’s easier to type into Ask your questions at any time At the end of the presentation, I will forward as many of those on to her as possible, and hopefully we’ll get through all the questions But if we don’t, we will get you a response to your questions following the webinar As attendees, you’ve been muted today, but we do encourage you to ask your questions at any time The webinar is also being recorded, and we will make the recording available to you following the webinar We post those on our Vimeo and YouTube channels, which are shown on the 
screen right now You don’t need to remember these addresses We will also send you the links But if you go to those, you will find that we also have quite a bit of other NGS content We also have stuff on CRISPR genome editing, qPCR, SNP genotyping We have a lot of other applications and molecular biology content on there, so it’s a great place to go for a lot of resources We will also be sharing the slide deck for today’s presentation, and that will be posted on our SlideShare site, which is also shown on the screen Similarly, we post all of our webinar presentations there, and there’s, again, a lot of great content So, if you want to look at something, spend a little more time with a concept, go to these places The SlideShare deck will be up probably later today, and the video will be up within a couple of days You don’t need to remember any of this Again, we’ll be sending you a follow up email and we’ll get you all the links that you need to access this content later So, with all that housekeeping stuff in order, I’m going to hand this over to Dr. 
Gu so she can get started – [Jenny] Well, Hans, thank you for the invitation I’m really excited to be sharing the work that we’ve done together with IDT It’s been a really great collaboration as we try to advance our new long capture solutions for the research community And I want to thank our listeners for tuning in I appreciate your time And this is really a fast changing landscape, so I’ll probably give you some updates and previews of what we might expect down the road So for today, I wanted to first quickly review our sequencing platform, the Sequel, which gives you SMRT sequencing, single-molecule reads Primarily, we worked with the IDT capture workflow that we recommend for long-reads, for both genomic and transcriptomic captures, and I’ll touch on some best practices that will be helpful to keep in mind to make sure you’re maximizing on the data recovery And in particular, I’ll speak about this in the context of our Alzheimer’s disease work that we’ve done together with IDT So, Alzheimer’s disease, it’s a pretty common form of neurodegenerative disease, and it causes dementia in the people who are impacted by it We expect the number of people impacted to grow in the coming years, especially as the elderly population gets bigger The clinical characterization of Alzheimer’s

involves progressive loss of memory, deficits in thinking and problem solving, and an impact on language abilities Neuropathologically, you can characterize the disease by progressive cortical atrophy due to neuronal loss and characteristic intracellular and extracellular deposits, insoluble tau and amyloid beta proteins So this is typically how the disease manifests itself Now, genetically, it’s actually very interesting You can divide the patients into two different groups, early onset and late onset For early onset, it seems that there’s a small set of genes that can account for early onset pretty well, but it’s only a small fraction, so five to 10 percent of the patient cases that you see Of the patients that have these symptoms, two to 10 percent first display them in their 20s or their 30s But late onset Alzheimer’s tends to manifest itself after the age of 65 Late onset is pretty complex in that it’s multifactorial, which means multiple genes could be involved and contributing to the disease, and we’ve found there’s a pretty strong genetic predisposition If you have a relative with Alzheimer’s disease, your relative risk for Alzheimer’s is about 3.5 to 7.5 times higher, and we’ve seen that about 30 to 48 percent of patients with Alzheimer’s tend to have a first degree relative that is also affected So in trying to pinpoint which genetic risks are contributing to the disease, a number of GWAS studies have been conducted, and through those studies, we’ve identified as a community 20 plus genetic risk loci, but these loci have small odds ratios, so none of them are really clear contenders to contribute to the disease These include both common functional variants, and also rare and structural variants So, looking at the history and the progression of the research to better understand Alzheimer’s is quite interesting APOE is actually the strongest genetic risk factor for late onset Alzheimer’s And this was 
identified in 1992, so 25 years ago So this is a relatively new effort, I think, where we’re just starting to understand what could possibly be contributing to Alzheimer’s In the early 2000s, the mid 2000s, and even a couple years ago, a number of GWAS have been conducted to try to identify some of these other genetic loci These GWAS studies cumulatively involved about 75,000 patients And so a number of candidates have emerged The problem is many of these SNPs tend to fall in regions of the genome that are dense in genes, so the candidates that are specifically involved in the disease remain unclear And typically, the risk loci are not necessarily the causative genes So, following the GWAS studies, a number of follow up studies are required to further characterize the candidate disease genes This involves DNA sequencing, transcriptome sequencing, proteomic studies, and methylomic studies A number of variants have been identified, for example, copy number variations in the CR1 gene or even loss of function mutations Splice variants have been identified, insertions and deletions And so functional studies are required to further pinpoint the genes that may be causing the disease Here we wanna offer you another approach to further characterize and identify these candidate variants and prepare for the functional studies So our Sequel System is a long-read sequencing platform It delivers average read lengths ranging from 10 to 18 kilobases We can achieve high consensus accuracy, up to QV50 This means 99.999% accuracy So it allows you to characterize

both the SNPs and also the structural variants The throughput per cell is about five to eight gigabases For each run, we can use up to 16 SMRT cells For the movie length time, you can adjust the data collection from 30 minutes to 10 hours, and for those 10 hour ones, you’re basically maximizing your data collection to obtain those long-read lengths I wanted to take a moment and talk about the typical data that comes off our platform and what it means when you’re collecting a 10 hour movie, for example So with a 10 hour movie, this is the typical data that comes off the sequencing platform, off the Sequel And you can see that we can collect read lengths up to greater than 60 kb, with the top five percent of the reads greater than 35 kilobases Half of the data tends to fall in the greater than 20 kilobase range So this is helpful in understanding that the data coming off our platform is not a fixed length In fact, you get a distribution of read lengths, and this will allow you to characterize a range of structural variation, which is an important contributor to human diversity and to disease So I think we all can appreciate that variants can come in different shapes and forms Structural variants in particular can be difficult to characterize Some examples of the types of structural variants that have been observed are repeats, so copy number variants that repeat themselves, or tandem repeats You can also see duplication and triplication events, and within those duplication and triplication events, you might see an inversion in one of the copies So there is a range of structural variants that can be seen Now, when combining our platform with targeted enrichment sequencing, such as IDT xGen Lockdown Probes, this allows the scientist to directly characterize complete genes, fully sequencing introns along with exons You can phase heterozygous SNPs That allows you to really call out different allelic haplotypes You can sequence and characterize repetitive 
regions, which might be helpful for the repeat expansion disorders You can characterize sequences upstream and downstream of the gene to really get at the question of the regulatory regions that are involved in splicing or in gene expression We capture insertions and deletions, as well as copy number variants With targeted enrichment studies, you can obtain high coverage for your specific genes or regions of interest across multiple samples As for the type of genetic variation that you can capture with a long-read, if we have a 10 kb read or up to 20 kb with the Sequel platform, it spans most of the variants in a single read So we can capture the SNPs, insertions, and deletions It can phase heterozygous SNPs and map them to their respective alleles We can characterize repeat expansions, and even get at copy number variants, as well as identifying the inversions and the translocations But going beyond 20 kilobases or more, you can start assembling those reads based on the heterozygous SNPs and really reconstruct a larger fragment of your genome to call haplotypes and large structural variations out into the megabase range of size So that’s one helpful advantage of working with long-reads Another benefit of long-reads for the transcriptomic space is that we can sequence the transcript in its entirety That is, from the 5 prime end all the way through the poly(A) tail So we can sequence the mRNA from the beginning to the end without having to assemble the transcript And this is especially helpful in getting the direct evidence of the splice isoform so you know exactly which isoform is being expressed in your sample, in your tissue, (mumbles) This is important because proteins and their functions are not only impacted by variants in their exonic regions Variants in their intronic regions, or the regulatory regions, such as enhancers and promoters, will impact the expression levels, as well as the splice isoforms
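The phasing idea described here, sorting long reads into alleles by the bases they show at heterozygous SNP positions, can be sketched in a few lines of Python The SNP positions, bases, and read names below are invented for illustration; this is a toy sketch, not the actual PacBio analysis pipeline

```python
# Toy sketch of phasing reads by heterozygous SNPs (illustrative data only).
# Each read reports the base it shows at known het SNP positions; reads
# matching allele 1's bases are binned to haplotype 1, allele 2's to
# haplotype 2, by majority vote.

# Hypothetical het SNP positions -> (allele 1 base, allele 2 base)
HET_SNPS = {1042: ("G", "A"), 5310: ("T", "C")}

def phase_read(read_bases):
    """Assign a read to a haplotype by majority vote over het SNPs it covers"""
    votes = {1: 0, 2: 0}
    for pos, base in read_bases.items():
        if pos not in HET_SNPS:
            continue  # position is not an informative het SNP
        a1, a2 = HET_SNPS[pos]
        if base == a1:
            votes[1] += 1
        elif base == a2:
            votes[2] += 1
    if votes[1] == votes[2]:
        return None  # ambiguous: no informative SNPs, or a tie
    return 1 if votes[1] > votes[2] else 2

reads = {
    "read_a": {1042: "G", 5310: "T"},   # matches allele 1 at both SNPs
    "read_b": {1042: "A", 5310: "C"},   # matches allele 2 at both SNPs
    "read_c": {2000: "G"},              # covers no het SNP: unphased
}
bins = {name: phase_read(bases) for name, bases in reads.items()}
print(bins)  # {'read_a': 1, 'read_b': 2, 'read_c': None}
```

The same logic is why longer reads phase better: a read spanning two or more heterozygous SNPs casts multiple consistent votes, while a short read covering none stays unassigned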

So we do know that there is high transcript isoform diversity arising from alternative splicing I think we’re only starting to understand how diverse that could be In my analysis of transcriptomes, I would make a conservative estimate that we could expect up to 15 isoforms per gene So to study those isoforms, I think it’s especially helpful when you can sequence the full length And finally, another benefit of working with long-reads is that they allow for improved mappability A longer read will map better to the respective allele So if one allele, for example, had a heterozygous SNP profile that differs from the second copy, as we know there are two copies in our diploid genome, then the long-reads are able to phase between the heterozygous SNPs and really map back to the correct allele So that’s one of the benefits on the genomic side Now, as these get transcribed into mRNA and spliced, some of those heterozygous SNPs are retained in the spliced isoforms So, for example, in this case, this is color coded so it’s easy to visually see it mapping back, but when you’re analyzing the data, you have only the bases So in this case, if we take a heterozygous SNP of G and A, which is in the first copy of the allele but not the second, we can specifically trace the transcript back to the first copy So I just wanna take a moment now to dive into how these results look in a real case scenario of an Alzheimer’s disease experiment We wanted to combine both the genomic and the transcriptomic together And through the combined data, we believe that this will provide better insight on how gene expression could be affected by the variants or how the protein function could also be impacted by the variants So we designed a gene panel together with IDT to target 35 candidate genes that have been identified in past GWAS studies and applied this panel to two patients with Alzheimer’s In this study, we captured genomic fragment sizes of roughly 
six kilobases And then we were also able to recover full length transcripts, ranging from less than one kilobase all the way up to 10 kilobases The subjects used in this study were a male and a female We sourced the genomic DNA from brain tissue, as well as skeletal muscle just for comparison And the total RNA, we sourced from brain tissues for the two subjects These were the 35 genes that were included in the panels Some of these are suspects and some of them are intriguing candidates For the genomic DNA capture workflow, we start from genomic DNA And I just wanted to make a quick comment If you’re already performing capture studies, you don’t need to redesign your probes You can just use the same probe set and apply it to this workflow What we highly recommend is to start from high molecular weight DNA, and then you shear it to a seven kb fragment In this step, we actually include the barcode adapters early on for multiplexing So these are ligated to the genomic fragments, which are amplified to mass and size selected The reason for this size selection is to recover the five to nine kb fragments and really focus the sequencing around the longer fragments There is work at PacBio to improve this and see if we can eliminate the size selection step while still retaining all the benefits of the long range sequencing So that will be coming up soon The multiplexing actually happens before the capture In doing this, you’re able to capture multiple samples in a single reaction and save on the capture costs With the eluted product from the capture, you amplify to generate the SMRTbell library, which is again size selected to focus the sequencing on the longer fragments
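The cost argument for barcoding before capture, pooled samples share one capture reaction, is simple arithmetic, and can be sketched as below The dollar figure is a made-up placeholder, not actual IDT or PacBio pricing; only the 12-sample multiplex level comes from the talk

```python
# Back-of-the-envelope sketch of why barcoding before capture saves money:
# samples pooled into one capture reaction split that reaction's cost.
# COST_PER_CAPTURE is a hypothetical placeholder value.

COST_PER_CAPTURE = 500.0   # hypothetical cost of one capture reaction
N_SAMPLES = 12             # samples multiplexed per capture (current max)

unpooled = N_SAMPLES * COST_PER_CAPTURE   # one capture reaction per sample
pooled = COST_PER_CAPTURE                 # one capture reaction for the pool

print(f"per-sample capture cost: {pooled / N_SAMPLES:.2f}")   # 41.67
print(f"savings vs unpooled: {unpooled - pooled:.2f}")        # 5500.00
```

The same arithmetic is what lets you trade the savings for a bigger panel or more samples, as discussed later in the talk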

And that goes down the machine and out comes the data for analysis We recently released a new GitHub software analysis workflow to analyze this data and to help phase the capture data into the different alleles How this works is that you take the reads off the platform and you map them to your reference genome, in this case a human reference genome And we can phase the data, the reads, based on the heterozygous SNPs with SAMtools and bin the reads into separate haplotypes From here, we generate the consensus sequence for each of the alleles, and you have a FASTA file that can be ported over for tertiary analysis So, I wanted to walk through a few trace files just so you can appreciate some of the key points that I like to keep in mind for sample prep as best practices We recommend starting with high molecular weight gDNA That’s two micrograms starting material And we shear to 10 kb, and in this case, the shear came down to around 8.7 kilobases for the brain tissue and about 8.2 for the skeletal muscle tissue So we like to start a little higher And then, after working through the workflow, you will end up with a size selected SMRTbell library This is the template that we use for our sequencing platform And that ended up being roughly around five kilobases long For the sequencing result, when you map the data, we ended up with about 7.4 gigabases of data for the skeletal muscle and about 8.4 gigabases for the brain genomic samples This is roughly two million reads each So at the high level, some best practices that I can recommend are that you can save on project costs by multiplexing, and one thing that I’ll talk about shortly is you can design probes such that they’re spaced up to one kilobase apart Currently, you can multiplex up to 12 samples And down the road, we’re working towards 24 samples We highly recommend using our PacBio linear barcoded adapters so that you can de-multiplex the samples High molecular weight DNA is required Finally, we highly recommend size 
selection to maximize on the long-read recovery In terms of how much to sequence, we suggest aiming for about 100 fold coverage of the targeted panel size to fully capture your genes The transcriptomic capture workflow is very similar We start with our mRNA, high quality preferred, and we convert this to a cDNA library At this step, for the second strand synthesis, this is where we incorporate the barcodes That way, we can multiplex the samples early on These are amplified We size select, and this is optional, depending on your research interest And the size selected samples are captured, washed, eluted We amplify them, make the SMRTbell libraries, sequence on the platform, and then from the data, we can analyze it The analysis workflow is a little bit more simplified We take the raw data and we put it into SMRT Link, the user interface software that we provide with the platform, and here it will process the data and generate a consensus sequence for each transcript isoform which you can then port over for your tertiary analysis Here is a BioAnalyzer plot for what a good sample would look like for this study We recommend an RNA integrity number (RIN) of greater than six, which is measured from these two peaks In this case, the two samples were at eight, so they were really great samples to work with Here is an example profile of the distribution of transcript sizes that you might see from a whole transcriptome So this is not with capture
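The "100 fold coverage of the targeted panel size" guideline for genomic captures can be turned into a quick sequencing-budget estimate The panel footprint and on-target rate below are assumed example values, not figures from the study; the 5 to 8 Gb per-cell yield and 12-sample multiplexing come from earlier in the talk

```python
import math

# Rough planning sketch for the ~100x panel-coverage guideline.
# PANEL_BP and ON_TARGET_RATE are hypothetical placeholders: plug in your
# own panel footprint and an on-target estimate from a pilot run.

PANEL_BP = 2_000_000      # hypothetical captured footprint: 2 Mb
TARGET_COVERAGE = 100     # recommended fold coverage of the panel
N_SAMPLES = 12            # samples multiplexed per capture
ON_TARGET_RATE = 0.5      # assumed fraction of bases landing on target

needed_bp = PANEL_BP * TARGET_COVERAGE * N_SAMPLES / ON_TARGET_RATE
cell_yield_bp = 5e9       # conservative end of the 5-8 Gb per SMRT Cell
cells = math.ceil(needed_bp / cell_yield_bp)

print(f"{needed_bp / 1e9:.1f} Gb needed -> {cells} SMRT Cell(s)")
```

For transcript captures, the same style of estimate applies, except the coverage target is per anticipated splice isoform rather than per base of panel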

And I wanted to make a commentary about the size selection So I mentioned this was optional, and the reason why it’s optional is, as you can see, the fragment distribution tends to peak around two kb, and that’s the average transcript range for the transcriptome But suppose you had prior knowledge of your genes of interest and the splice products and you know that they tend to be five kb long, or if you’re interested in a 10 kb transcript, that’s when we highly recommend doing size selection so you eliminate the shorter fragments and then you can focus your capture around those five kb or longer So for the best practices to keep in mind, it’s great to start from high quality RNA transcripts Size selection is optional, but very helpful for specific fractions We recommend this solution to characterize splice isoforms It’s not recommended for characterizing gene expression levels at this time That’s something that we are working towards So far, the preliminary data suggests that some correlations can be made to the gene expression levels When you’re designing your experiment and trying to determine how much to sequence, we suggest aiming for a minimum of 30 fold coverage per anticipated splice isoform I think that could be difficult to quantify if you don’t have prior knowledge, so a good rule of thumb that I like to use is for each gene I might anticipate 15 splice isoforms, and I use that as a guide The probes to capture these transcripts can be designed to the exons only And you can also include the introns, especially if you anticipate intron retention in the samples Now, in designing the capture panel, I mentioned if you’re already running captures, you could use the same probes and just apply them to your samples with our workflow and recover those long fragments So the key benefit of working with the xGen Lockdown Probes is that it’s flexible in its design What’s great about this is because we’re capturing the 5 kb fragments, (audio cuts out) only need one 
probe for each fragment So for example, in this case, we can space the probes such that they are about 1,000 base pairs apart, and it would still recover those long fragments So you would not have to space them zero bases apart You don’t have to tile them end to end So when you’re dropping out probes, you might realize some cost savings, which would be great, or you could also consider expanding your panel to capture additional genes, redistributing those probes to expand the study You could also expand the study by including more samples because you’re reducing the capture cost The probes that you design for your genomic capture can also be used for your cDNA capture Now, to dive into the results of the Alzheimer’s panel sequencing, at the high level, we detected a broad range of genomic variants from SNPs to structural variants For the structural variants, we identified 31 unique structural variants ranging from 65 base pairs up to several kilobases in size For deletions, we identified at least 15 events greater than 50 base pairs, and these fell into 10 unique genes We also identified 16 insertion events greater than 50 base pairs, and these fell into eight genes which had these structural variants On the transcriptomic side, we identified 500 plus isoforms from the patients that were used in this study Patient one had 515, patient two had 507, and when you look at the overlap, it’s actually a small overlap that was shared among the patients and also those reported in Gencode, so only 39 isoforms So all in all, 88% of the spliced isoforms we identified were novel For Gencode, we took the conservative measure of using only the transcripts that had full length mRNA evidence

supporting the splice variants To take a deeper dive into a couple of the genes that were in the panel and the type of data you could see, one thing I didn’t mention earlier is that this particular gene panel had a mixed design So some of the genes were covered in full length, from 5 prime all the way through to 3 prime with the one kb spacing, but some of the genes were much larger, such as RIN3 RIN3 is a RAS and RAB interactor protein It interacts with GTPases, and it is actually 178 kilobases long So rather than designing probes across the entire gene, in this scenario we designed them to the exons only This is an example of what your exon only probe design will capture As you can see, here’s the exon, but with the long-read, we can actually span quite far into the intronic region So with a seven kb fragment shear, you could span seven kb downstream and seven kb upstream of that exon In this particular case, we’ve also spotted a 50 base pair insertion This insertion was in the intronic region upstream of exon six And this is just a zoomed in view We used IGB to view the data And this is evidence that multiple reads support this insertion event, which builds up the confidence that the insertion is actually there The second gene that I wanted to zoom in on is a zinc finger of CW-type It also contains a PWWP domain, so it’s a transcription factor And there’s actually not a whole lot of literature around this gene There was one paper published in 2010 It’s an animal study And we know that it’s involved with chromatin remodeling and methylation state So it’s somehow involved with epigenetics, but I don’t think a whole lot is quite known about this gene Some studies have also shown that it’s an expression quantitative trait locus, so it could influence (mumbles) genes downstream which might have an impact on the disease state So, we detected through this capture study a 750 base pair deletion in both patient one and patient 
two And this is just immediately downstream of exon 13, so it’s between exon 12 and exon 13 And that’s the zoomed in view, so there are multiple reads that support evidence of this deletion BACE1 is a beta-secretase It’s a peptidase, which means that it catalyzes the first step in the formation of the amyloid beta peptide from the amyloid precursor protein So in this example, I wanted to show that here the probes are designed to space across the entire gene As you can see, we had a long intronic region here And so we tried to pepper in a few more probes, but certainly, once you hit a really low complexity region, it might be difficult to place a probe But nevertheless, we tried to recover that region And from here, you can see that we are able to phase this gene based on the SNP profile of each of the respective alleles Now, if the long-read spans across at least two heterozygous SNPs, then we can separate out the reads So in this case, this SNP and these SNP locations were used to parse out the reads and generate the separate alleles BIN1, I wanted to show this as another example of how we could phase across a larger gene This gene is roughly 63 kilobases in size BIN1 is very interesting It has several isoforms It’s an adaptor protein It might be involved in synaptic vesicle endocytosis There are many isoforms of this gene, actually It has been implicated in a variety of functions depending on the tissue in which it’s expressed So here, we’re showing that, based on the SNP profile, we can separate out the alleles,

and in a minute, I’m gonna start showing some examples of how you might want to link the transcripts back to the specific alleles MAPT is a microtubule associated protein And this is the result for one of the patients where we were able to separate out the genomic DNA into allele one and allele two And what we found is there was a heterozygous deletion in this allele up here Now, this gene has been transcribed into 21 isoforms, but in this allele, we found that there were only five isoforms which mapped specifically back to this copy In this copy, there were a couple of novel exons that were found in the transcript isoforms And finally, I want to go back to the original gene that I was talking about, the transcription factor, the zinc finger So I mentioned that the 750 base pair deletion occurred around exon 13 in the intronic region And from the transcripts of patient one and patient two, you could see that one patient had retained some of the introns and some novel exons had popped up, and so it would be very interesting to link this back to the genomic variant What was also interesting, as we looked in the NCBI database for this gene, is that there have been transcripts where exon 13 was skipped And so it would be curious to know if the skipping of the exon came from the genomic variant, which changed the regulatory spacing, or if it came about during the splicing So in conclusion, Alzheimer’s has a very wide economic impact on the global society Today, from the GWAS studies, there are over 20 potential genetic risk variants which have been mapped But these are only risk variants, and these SNPs are not typically the true causative variants Combining gDNA and cDNA data can give you more information in really understanding the impact of a variant, whether it’s occurring on a genomic level or on the transcriptomic level, and give you some clues for follow on functional studies to really pinpoint the cause and the mechanisms of Alzheimer’s So 
with custom IDT probes, you get design flexibility to scale your project, and it allows you to phase and really trace the variants back to their alleles So in the end, we do feel that this is a powerful approach, and that structural variants can be more informative for disease diagnostics, for disease management, and for potential treatments down the road than current SNP mapping and exonic sequencing alone And I’d like to thank you for your time I want to recognize some of the individuals that were involved in this work Kevin Eng had done the genomic captures Ting is the lead in developing transcriptomic captures Liz and Aaron are bioinformaticians that helped with the (mumbles) analysis and the structural variant analysis, and they also developed the tools for this analysis We recently released a new tool on GitHub, and this was done by Billy, to phase those capture sequences into the different alleles And we also released the genomic data set from the capture You can find this on our blog if you’re curious to dive into the data yourself and to try out the tools and understand what type of variants have been picked up And for all our future updates on targeted sequencing, please do tune into our blog and the targeted sequencing application page I also wanna thank our collaborators at IDT, Kristina, Jiashi, and Mirna They’ve been very instrumental in the brainstorming and really figuring out what would be the best solutions for the research community And feel free to reach out to me directly with your questions, and I’m happy to give some suggestions or even entertain some of the ideas that you may have for your research – [Hans] Okay

So, with that, thank you very much, Dr Gu, for your presentation And we have time now for some questions So if you haven’t already done so, type those into the questions box in the GoToWebinar control panel The little square with the up arrow in it, if you click that, you can pop the box out and make it a little easier to type into I also wanna take a moment here to introduce a panelist we have with us, too, Dr. Nick Downey Dr. Downey is an application support scientist here at IDT And he’ll be addressing any specific questions about IDT products So we do have a couple of questions, and I’m just gonna jump into those here Jenny, the first question is basically about how well you think this sort of approach could work for something that has a GC repeat expansion disorder The example that’s given is C9 FTD/ALS I’m assuming that the FTD is frontotemporal dementia and ALS – [Jenny] Yeah So, GC repeats We can characterize repeat expansion disorders, and we have done that either through this approach or through a new approach that has recently been published in Nature, I think in the past couple weeks There are amplification steps involved in this process, and if we are not able to amplify through those regions, it might be a challenge, and we do see some dropouts, but we have been able to sequence through some of the GC rich regions If that didn’t work, we do have a new solution in the works for Cas9 captures, which eliminates the need for amplification And that Nature paper has actually characterized repeat expansions in ataxia, and there we can directly sequence the captured sequence and our polymerase should not have an issue with it – [Hans] Okay The next question is, what kind of clinical impact does knowing what allele a variant maps to, using the phasing technology, have? 
– [Jenny] Yeah, thanks for the question What I didn’t mention is, for example, APOE is the strongest genetic risk factor, and when you have one copy of it, it increases your risk by threefold, whereas if you have two copies, it increases your risk by eight to 10 fold So the importance of understanding the allele is, say, for example, if you know two SNPs are contributing to that disease, it could be that those two SNPs have a synergistic effect on the disease state if they happen to be on the same strand If they are on different strands, in some cases, it might not have as large of an impact So having that context really helps improve your assessment of the impact the genetic (audio cuts out) may have on a disease – [Hans] Oh, that’s a very interesting answer Cool The next question is, is the Alzheimer’s disease panel available somewhere? – [Jenny] Is that a question for me or for Nick? – [Hans] You could answer it if you know the answer Otherwise, Nick can weigh in on that – [Jenny] Yeah, so commercially, there’s not a product for the Alzheimer’s disease panel, but we are happy to share the BED file that was used in the study so you can order the probes directly from IDT We released that BED file along with the data set release and it can be found through our blog If you want the BED file directly, feel free to email me and I’ll send it your way – [Hans] Nick, did you wanna weigh in at all on custom panels or anything? – [Nick] We can order those through the website, or you can contact applicationsupport@idtdna.com and we can help make sure that that’s organized The web tool requires the sequences, so if Jenny sends you the BED file, we can help extract those out and make sure the sequences are in the correct format – [Hans] Okay Great So I think that this is a similar question to what you answered before, which is, is the long-read sequencing

more robust for low complexity regions, Jenny? – [Jenny] Yeah The polymerase that we use in our sequencing is reversed for those repetitive element I think the trick here is that during the simple prop workflow, we do involve PCR Sometimes we may get dropout if the application is not able to sequence through those regions So yeah, it’s very similar to the earlier question We do have a workaround in the works, and that is for using Cas9 to directly pull down the genomic regions of interest that may have this repetitive element And we’ve successfully sequenced repetitive elements in FMR1 through ataxia (mumbles) So we’ve done several examples Filaggrin is another one where we have been able to sequence through the repetitive region And I think that one actually involved PCR, but nevertheless, we were able to sequence through them – [Hans] Okay I do wanna point out to people, if you look in the chat box right now, you’ll see that we sent out the link now The slide deck is uploaded to our SlideShare site so you can go check out the slides and comb through those later and study any of the details that you may have missed or wanna look at a little bit further There’s another question for us here, Jenny So is there any difference in knowing that a deletion happens in the genomic DNA or whether it’s from just alternative splicing? 
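Once a long read spans a repetitive element end to end, as in the repeat-expansion examples discussed above, sizing the expansion can be as simple as counting motif copies in the read sequence. A small sketch, using the C9orf72 GGGGCC hexanucleotide as the motif (the motif choice and the example read are illustrative, not real data):

```python
# Estimate a repeat-expansion size from a single spanning long read by
# finding the longest run of consecutive copies of the repeat motif
import re

def longest_repeat_run(seq, motif="GGGGCC"):
    """Return the largest number of consecutive motif copies in seq"""
    runs = re.findall(f"(?:{motif})+", seq)
    return max((len(r) // len(motif) for r in runs), default=0)

# Simulated read with flanking sequence around a 12-copy expansion
read = "ACGT" + "GGGGCC" * 12 + "TTAA"
print(longest_repeat_run(read))  # 12
```

In practice an expansion estimate would come from many reads (and sequencing errors can interrupt a run), so a real analysis aggregates per-read counts rather than trusting one read, but the per-read counting step looks essentially like this.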
– [Jenny] Yes I think if you are, for example, developing a genetic test So if, say, the deletion happened in the genomic region, then you know you can specifically test for that But if the deletion that impacted a function did not happen in the genomic region but happened in transcriptomic space, then your genetic test would not necessarily pick that up So you may need to design a different diagnostic to really test the variants that are being transcribed, spliced, and translated in transcriptomic and protein space, which may not be retained in genomic space – [Hans] Okay So we don’t have any other questions at the moment I’ll give people a minute or two to enter any last questions I do really wanna thank Dr. Gu for presenting today and PacBio for supporting this We’ve had quite a few discussions in the last month to sort this all out And also to thank Dr. Downey for sitting in on this today, too Dr. Gu, you had some closing remarks that you wanted to make earlier Did you have anything that you wanted to add?
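The genomic-versus-transcriptomic distinction in the answer above can be made concrete: an exon skipped by alternative splicing looks like a deletion in the transcript even though the genomic DNA is intact, so a DNA-based test would miss it. A minimal sketch (exon names and the isoform are made up for illustration):

```python
# A "deletion" seen only in the transcript: compare the exons annotated
# for a gene against the exons actually present in an observed isoform
def skipped_exons(annotated_exons, observed_exons):
    """Exons in the gene annotation but absent from an observed transcript
    isoform, i.e. spliced out rather than deleted from the genome"""
    observed = set(observed_exons)
    return [e for e in annotated_exons if e not in observed]

gene_exons = ["exon1", "exon2", "exon3", "exon4"]
isoform = ["exon1", "exon2", "exon4"]  # long-read isoform, exon3 missing

print(skipped_exons(gene_exons, isoform))  # ['exon3']
```

A genomic assay over exon3 would report it present; only a transcript-level readout, such as the full-length isoform sequencing described in the talk, reveals that it is spliced out.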
– [Jenny] Yeah, sure We presented this example in the Alzheimer’s disease space, in the neuroscience space, but certainly, this can be applied to other disciplines, for cancer research, for example The long reads have improved mappability, which allows you to really tease apart, in that case, pseudogenes from genes, and we have some preliminary data that show we can differentiate between those So there are certainly a lot of potential opportunities and new questions that can be explored for complex diseases here We are certainly happy to help answer questions, so don’t hesitate to reach out to me And I really do wanna thank IDT for hosting this webinar It’s really been a great pleasure to share our work and our collaboration The landscape is changing very fast We have new solutions coming down the pipeline, so I do encourage those that are interested to stay tuned And thank you, Hans – [Hans] You’re very welcome Hey, one more question for you, which is, where can people find the protocols for this panel? – [Jenny] Yes So the protocols can be found on our website I’m also happy to shoot them over But I believe we will also have an outbound email after this webinar, and I’ll be sure to include the resources there – [Hans] Okay, great Okay So we don’t have any other questions at the moment and we’re nearing the end of the hour, so I think we can wrap this up Thank you, everybody, for attending today and for participating in the Q and A Again, thank you very much, Dr. Gu and Dr. Downey for participating That’s all I have So we will be sending you links to the recording and the slide deck by email

Also, you have the slide deck available to you in the chat box until I end the webinar, which is going to happen fairly soon So click the link now, or like I said, it’s coming to you You can also check out our SlideShare site, slideshare.net/idtdna and find this slide deck and others So with that being said, thanks again, everyone Thank you, Jenny, and thank you, Nick, and we’ll be talking to you soon