Hello and welcome to the Amazon Web Services partner webinar series. Our topic today features how to capitalize on big data and data warehousing with Amazon Redshift. I'm your host, Jerry Sullivan, here with the partner marketing team in Seattle, and I'd like to welcome you to today's webcast. A couple of housekeeping items before we begin: I'd like to remind folks that we are recording today's session, and while your lines have been muted, we invite you to submit questions at any time during this webcast by simply using the Q&A widget in your browser. I'd also like to let folks know that we make our content available on the AWS SlideShare channel as well as YouTube, and you'll find our webinar recordings there. We will take time at the end for questions, so we do encourage you to submit questions at any time during the webcast.

I'm pleased to be joined today by DISYS, one of our AWS consulting partners. We'll begin with an overview of AWS and Amazon Redshift, and then we'll hear from our partners at DISYS on big data challenges and opportunities, a customer use case, and how to maximize your business impact with big data and Amazon Redshift. So without further delay, I'd like to welcome our guests today: Amir Abbas, who is the vice president of global services at DISYS, as well as Sankar Nagarajan, who is a cloud solutions architect at DISYS, and we have Matt Yanchyshyn, who is our ecosystem solutions architect here at Amazon Web Services. With that, take it away, Matt.

Thanks, Jerry. Before we get started and actually talk about Redshift, I just wanted to speak briefly about the Amazon Web Services platform. Amazon Web Services is a scalable, secure, and cost-effective platform for running your cloud computing workloads. This is a simple view of the set of services that we offer: the compute, storage, and data services are at the heart of the offering. We then surround these services with a range of supporting components like management tools, networking services, and application augmentation services. All of this is hosted within our global data center footprint, which allows you to consume services without having to build out facilities or equipment. There's a lot of equipment powering these cloud services; in fact, every day AWS adds enough server capacity to power Amazon.com in 2003, when it was a five billion dollar enterprise.

Database services are a foundation of AWS. Database service customers are offered a broad choice for their database platform, enabling them to deploy the best data store for their use case. All database services are fully managed, which means they provide automated management of low-level administrative functions. This enables developers and administrators to focus on application functionality and performance rather than on patching and backup maintenance. ElastiCache provides a managed Memcached service for accelerating access to frequently read data by caching it in memory. RDS is the Relational Database Service; RDS provides a choice of database engines (MySQL, Oracle, and SQL Server) in a high-performance, durable configuration that can be provisioned in minutes. Multi-AZ replication is a feature of this service designed for production instances; Multi-AZ provides one-click DR to a remote data center for fast failover and maximum data protection. DynamoDB is the default choice for new application development; DynamoDB is a NoSQL data store designed for fast, predictable performance at any scale. A feature of DynamoDB is the ability to dial provisioned throughput up and down in order to meet changes in the workload.
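(To make that last point concrete, here is a minimal sketch of dialing a DynamoDB table's provisioned throughput up from code; the table name and capacity values are placeholders for illustration, not anything from the webinar.)

```python
import boto3

# Hypothetical table and capacity values, shown only to illustrate the
# "dial throughput up and down" point; adjust to your own workload.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.update_table(
    TableName="page-views",
    ProvisionedThroughput={
        "ReadCapacityUnits": 2000,   # scale reads up for a traffic spike
        "WriteCapacityUnits": 500,   # writes stay where they were
    },
)
```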
Finally, we get to Redshift, which is where we will be spending this session. It's a data warehouse that has been specifically enhanced for the cloud and was developed after many customers requested a fully managed, high-scale option for analytical workloads. Redshift was conceived with a simple set of objectives: to be faster, cheaper, and easier to use than any other data warehouse alternative. To achieve these goals, every aspect of the service is optimized for data warehouse workloads, with extensive use of automation and a focus on I/O efficiency. I/O efficiency is especially important for performance, and eliminating unnecessary I/O has been one of the great focuses of the Redshift team.

Here are some common customer use cases we've seen so far. Redshift is a relatively young service, having launched in early 2013, but already we're seeing widespread adoption in a variety of use cases. First and foremost, customers are able to reduce their costs by extending their data warehouse to Redshift rather than adding hardware to their traditional warehouse. They're migrating away from their existing data warehouse systems, in some cases completely, and they're able to respond faster. It all comes down to agility, cost, and the ability to grow your capacity with little or no risk. You can scale data warehouse capacity as demand grows, or scale it back when you don't need it. You can reduce hardware and software costs dramatically, by an order of magnitude, and we'll talk about this more in a bit.

Importantly, your query performance can be improved by an order of magnitude as well by migrating to Redshift, and you can make more data available for analysis. Here's an example of some customers who are already publicly using Redshift. Airbnb has been quite vocal about both our managed Hadoop platform, EMR, and Redshift; Airbnb has seen a 5 to 20 times reduction in query times and a 4 times cost reduction over the Hive on Hadoop setup they were using previously. Accordant Media has seen similar, in fact even greater, reductions in query times, and an enterprise customer, Nokia, has seen a fifty percent reduction in costs along with a notable improvement in query times. So again, you're seeing both cost reductions and actual performance improvements when you switch to Redshift.

So what is Redshift under the hood? There are several ways in which we reduce I/O in Redshift. First of all, the storage is directly attached to the Redshift servers, and large block sizes of one megabyte are used. We also use zone maps to skip entire blocks when the requested data is not contained in a block. Another big factor in Redshift's high performance is the use of columnar storage: it's a columnar database, which is optimized for scan operations. Columnar storage is really different from row storage, which is what traditional relational databases use. Instead of locating data row by row in the same location on disk, Redshift keeps column data physically adjacent. So when performing an aggregation over hundreds of millions of rows in a single query, such as calculating the average age of everyone in a database, Redshift is able to return the results really, really quickly, because it only has to read the blocks for that one column.

One of the features that a lot of the customers I work with are really excited about with Redshift is point-and-click resize. Any of us who have worked with data warehouses will know that the resize experience can be, well, sometimes a several-week or several-month procurement process before you even get the hardware to resize your data warehouse. With Redshift, the resize experience is really easy and really fast. Planning and preparation, as I said, often take weeks, but with Redshift you can scale up and scale down in a single click. By selecting the resize button in the AWS Management Console, the user is prompted to enter the new desired cluster details, basically the instance type and the number of nodes you require, and that's it. Your cluster goes into read-only mode while it resizes, and importantly, the cluster stays online, so you can resize a petabyte-size data warehouse while it remains online in read-only mode the entire time. The way this works is that a new target cluster is provisioned in the background, and you're only charged for the source cluster while this is all happening. It's fully automated, so the data is automatically redistributed to the new nodes after you resize. As I mentioned, the cluster goes into read-only mode during the resize, and all of the data copying happens in parallel, node to node; it doesn't have to go through the leader node. Basically what that means is it resizes a lot faster, it's more efficient, and you're not charged for anything that you're not using. Once the resize is complete we do a quick DNS flip, and that way your customers see no downtime and your end applications have a fully resized cluster to use immediately.
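(The same resize Matt describes in the console can also be triggered through the API. A minimal sketch with boto3 follows; the cluster identifier, node type, and node count are placeholders.)

```python
import boto3
import time

redshift = boto3.client("redshift", region_name="us-east-1")

# Kick off a resize by asking for a new node type and node count.
# During the resize the cluster stays online in read-only mode.
redshift.modify_cluster(
    ClusterIdentifier="analytics-dw",   # placeholder cluster name
    NodeType="dc2.large",               # placeholder node type
    NumberOfNodes=8,
)

# Poll until the cluster reports that it is available again.
while True:
    cluster = redshift.describe_clusters(
        ClusterIdentifier="analytics-dw"
    )["Clusters"][0]
    if cluster["ClusterStatus"] == "available":
        break
    time.sleep(60)
```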
Importantly, Amazon Redshift has security built in. You can encrypt your source data in S3 using either server-side encryption or your own encryption keys. When you transmit that data to Redshift in parallel using the COPY command, all of that data can be secured using SSL in transit. And most importantly, when your data is at rest in Redshift, we use hardware-accelerated AES 256-bit encryption to encrypt every piece of data, so all blocks on disk in Redshift and on Amazon S3 can be encrypted end to end, at rest and in transit. Also, there's no direct access to the compute nodes, so you can shield your data from any rogue operators that might be involved in a security event. Lastly, Redshift supports Amazon VPC, and recently we also announced support for resource-level controls, so you can control very specifically which IAM users and groups have access to which Redshift resources on AWS.

A big part of security, beyond data encryption, is backup and DR: how do you recover if your data gets corrupted or something happens and you need to get back to a previous state? What's great about Redshift is that we're constantly replicating; the replication within the cluster and to S3 is happening all the time. So your backups are continuous, automatic, and incremental, and S3, the service we're backing up to, is designed for eleven nines of durability. We also continuously monitor the cluster. For example, if you were running a data warehouse yourself and a node or a drive failed, you would have to replace it and potentially take your cluster offline. With Redshift all of this is transparent: we automatically replace cluster nodes and failed drives in the background, stream the data back from S3 as the cluster recovers, and add new nodes to replace the failed capacity.
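(Going back to the loading path for a moment: a minimal sketch of the parallel COPY from S3 over an SSL connection, as described above, is below. The endpoint, table, bucket, and credentials are placeholders, and the exact COPY options depend on your file format.)

```python
import psycopg2

# Redshift speaks the PostgreSQL wire protocol, so a standard driver works.
# sslmode="require" keeps the client connection encrypted in transit.
conn = psycopg2.connect(
    host="analytics-dw.xxxxxxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="admin",
    password="********",
    sslmode="require",
)
cur = conn.cursor()

# COPY pulls the files straight from S3 into the cluster in parallel;
# splitting the input into multiple files lets every slice load at once.
cur.execute("""
    COPY clickstream.events
    FROM 's3://my-etl-bucket/events/2013-10-01/'
    CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
    GZIP
    DELIMITER '|';
""")
conn.commit()
cur.close()
conn.close()
```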

There's one more point to make about the current state of data analysis, and this is where I think Redshift is really a game changer. Various analysts have attempted to quantify the gap between the data generated by applications and the data that makes its way into an analytical environment. The general trend is that the gap is large and growing: people simply can't capture and process the volume of data they're generating. Redshift is specifically designed to eliminate two of the major barriers to broad adoption of analytics: effort and cost. Operational data often does not find its way into the data warehouse, or any kind of analytical environment, because the cost of the data warehouse prevents smaller, less proven datasets from getting entry. On top of this, the effort required to bring data into the warehouse is substantial relative to its perceived value. Redshift has been built and priced to change this; it's allowing customers to close the gap and start analyzing all of their data.

Redshift is built with a focus on performance, price, and ease of use. The XL nodes provide two terabytes of storage and cost less than one dollar an hour to get started. If you need more than a single node, the single-node price is simply multiplied by the number of nodes provisioned, so pricing is linear and you only need to remember one price. You can also purchase reserved instances for one or three years and save over forty percent off the on-demand price with a one-year reservation and over seventy percent with a three-year reservation. All of this translates to less than one thousand dollars per terabyte to keep all your data online for a full year.
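(A quick back-of-the-envelope check of that figure, assuming the on-demand XL rate at the time was roughly $0.85 per hour and that the three-year reservation discount works out to roughly 73 percent; both of those inputs are assumptions for illustration, not quotes from the webinar.)

```latex
% On-demand cost per terabyte-year for a 2 TB XL node (assumed $0.85/hr rate):
\[
\frac{\$0.85/\text{hr} \times 8{,}760\ \text{hr/yr}}{2\ \text{TB}}
\approx \$3{,}723\ \text{per TB per year (on demand)}
\]
% Applying an assumed ~73% three-year reservation discount:
\[
\$3{,}723 \times (1 - 0.73) \approx \$1{,}005\ \text{per TB per year}
\]
```

That lands right around the quoted one-thousand-dollar-per-terabyte-per-year figure; the exact number depends on the real rates and discount, which is why both inputs above are flagged as assumptions.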
Redshift as a service also integrates with multiple data services, so it does not stand alone; as you saw at the beginning, we have a rich, broad, and deep ecosystem of services as part of the Amazon Web Services platform. It has the parallel COPY command I mentioned, which loads directly from S3 and also from DynamoDB. You can also integrate it with other services running on our cloud computing platform: EC2, EMR, and Redshift are often complementary, and a lot of people are doing, for example, ETL work on a MapReduce cluster and then pushing that data into Redshift via S3. I've mentioned S3 a lot: it's the Simple Storage Service, designed for eleven nines of durability, and it's a great place not only to back up your data but also to stage your data, for example between an ETL process and a load into Redshift, or for longer-term backups before you possibly migrate that data to Glacier for long-term storage. Lastly, all of our services are designed to work in a hybrid environment as well. If you have some or all of your data, or some applications, running in a corporate data center, you can use services such as Direct Connect and AWS Storage Gateway to migrate that data or to leverage it in a cloud environment like AWS, and Redshift benefits from that too: accelerated transfers through Storage Gateway to S3, for example, or Direct Connect providing a bridge between your on-premises applications and your data warehouse sitting in the cloud on AWS. So I hope that was a useful overview of AWS and Redshift, and with that I'd like to pass it on to our partner DISYS.

Thank you, Matt. Good morning and good afternoon, my name is Amir Abbas, and I am the vice president for global services at DISYS. I'll walk you through some of the context around big data and data warehousing, what we're seeing in the market and in our discussions with our customers, and then my colleague Sankar will talk a little bit about the technical impact of Redshift and how we've architected some of our engagements.

A quick introduction to DISYS: we've been around since the mid-90s, and our core focus is working with companies to provide technology solutions and services, with a strong emphasis on optimizing their operating costs and expenses. We're constantly looking for innovative ways to help our customers reduce their costs, and that has led us into a strong partnership with Amazon. We are an Amazon consulting partner and have been for several years, primarily because the solution set helps us provide a pretty strong value proposition to customers: we can help them optimize not only their operating expenses for technology but really help them speed up and accelerate the applications that can impact their business. We are about four thousand employees distributed around the world, with a fairly large footprint in the United States and then abroad in South America in Brazil, in India, Singapore, Malaysia, and Thailand, and then over in Europe in Switzerland and Hungary.

Our focus around big data services is really about helping customers through the life cycle of their data analytics process. We help customers with the initial technology assessment and evaluation, look at appropriate use cases, and define a proof of concept; develop an Amazon-based infrastructure, which could include technologies such as Redshift and Hadoop on EMR; look at optimal ways to leverage S3; and then get deeper into the actual application the customer is most interested in for big data optimization, getting into the details of query design and optimization and developing the analytics framework as well as the visualization framework. We typically build our solutions around the core Amazon platform as well as some hybrid solutions, and we are tool agnostic, so we work very closely with a number of the visualization and analytics providers in the marketplace today. We have also invested a fair amount of energy specifically in the public cloud from Amazon and developed a cloud enablement platform that helps us and our customers leverage the various Amazon offerings quickly; an example of where people are using it is when they want to optimize and orchestrate a complete deployment of Redshift and EMR clusters, where the tool comes in and helps accelerate that process.

The reason we have spent a fair amount of time and energy investing in big data and the surrounding technologies is primarily that we think there is a huge opportunity within our customer base not only to help them unlock the value of the data they have, but to take that and impact their bottom line as well. A couple of substantial points: obviously we are seeing data grow at an exponential rate. Some of the public numbers out there show Facebook generating enormous volumes of data every day, we're seeing academic institutions with massive amounts of data and large-scale research projects, and transactional businesses such as the New York Stock Exchange also generate a tremendous volume per day. There are also sources of data embedded within the organization that enterprises are sometimes not fully leveraging, what we call dark data: historical email and historical unstructured data that, if there were a way to mine it, companies could actually leverage. We're also seeing a lot of data being generated from sensors; we work with a couple of customers that generate tremendous amounts of data from smart meters and water meters around the world, and that needs almost real-time processing. And we're seeing a lot of public data sets being published; you see a lot of that in the health care and bio industries, where public health data sets are being hosted on Amazon and other repositories and made available for use by different organizations. There was a study probably about 18 months ago where McKinsey looked at big data across industries, and what they found was that pretty much every sector had a tremendous amount of data: roughly 200 terabytes of stored data
was pretty standard across most industries, and data volumes of a petabyte or more were not unusual. That was roughly 18 months ago, so those numbers are probably even bigger today than what they found. But what was actually more interesting in that study was that they defined a big data value potential index, which essentially says that if you can unlock the value of the big data in these various sectors, you can actually see a dramatic improvement

in productivity within your enterprise as well as within some of those large sectors. So in summary, the output of a lot of this research was that everybody has tremendous amounts of data and there is a tremendous amount of potential in unlocking the value of that data.

Some of the use cases that are out there: as Matt mentioned, we spend a lot of time with customers in the financial sector, and I'll talk a little bit about that later in this presentation, where they are getting lots and lots of data that needs processing very quickly. We are also engaged with more traditional e-commerce platforms, where website clickstream analytics are being mined together with unstructured data that customers may have from the social networks or other sources, so we see a lot of opportunity and activity in that space. And in the telecom space, we see a lot of ways people are using telecom data for fraud detection, which may or may not be directly telecom-related; it could be other sectors as well, for example an organization using telecom data to detect fraud on the insurance or healthcare side.

The challenge that has historically existed is that there have just been tremendous bottlenecks in processing big data. Whether that was raw processing capacity, or parallelism that did not really solve the problem of getting the data processed, and when you did distribute the work, the failure rates that occur in distributed computing were pretty high and you could not recover from them efficiently. And, as mentioned earlier, the amount of data is not static; it only increases, and you don't necessarily sunset older data, so you're typically doing analysis across both real-time data and historical data, and scalability has become a big issue as well. Again, the problem with the traditional models was that you were very much processor bound, and when you wanted to relieve the restriction of the processor you went to distributed systems, but then you ran into a different problem: how quickly can I get the data from one location to another, how do I effectively distribute it, how do I keep track of the data sitting on my distributed nodes, and how do I have the appropriate feedback loops in place? Some of those issues have historically kept us from unlocking the value of data. The other aspect, in a data warehousing context, was that when you wanted analytics on top of the warehouse, it was typically a batch-oriented process: the number of reports you were running was maybe in the hundreds, perhaps with some ad hoc reports, and the user base was actually pretty small. Fast forward to the more modern model: people want to be able to continuously move data into their data warehouses, do it extremely fast, and then make it available to the end users very, very quickly. At the same time, you want to make sure that the business intelligence and analytics applications are completely integrated with the data warehousing platform, so that the customers and end users can leverage that data in real time and make dynamic business decisions. So the core question is how do you speed up getting
the data into the data warehouse, while at the same time making sure that you are actually able to process it and then get it packaged and delivered to your customers and your constituents, with the end result being that you really want to slash the decision-making time as well. We've been doing this work for various customers; as I mentioned earlier, we walk them through the various challenges with aggregating data and then getting it assembled and analyzed fairly quickly.

One of the very interesting engagements we had the fortunate opportunity to work on looked at one of the core issues that led to the financial distress we had a couple of years ago around the mortgage industry. There is an incredible amount of data associated with each mortgage that is issued to an end user, and there is a whole life cycle associated with keeping track of that mortgage, and with keeping track of it as it gets securitized, packaged, and sold to investors. One of the reasons, broadly, for some of the distress we had in the mortgage markets was the lack of visibility into the data behind each mortgage-backed security. We worked with one of our customers to do a fairly complex proof of concept where we took extremely large data sets associated with the mortgage industry and got them loaded into a data warehouse. What's really interesting about this data warehouse is that it is truly dynamic, so as somebody's credit rating changes out in the world, we can map the effects of that onto each individual mortgage-backed security, and you get a near real-time view of what's really happening with those assets. It was pretty interesting work, and we were glad to have the opportunity to work closely with our friends at Amazon as well as our end customer on it. We developed a high-level framework and deployment architecture based on pretty much all the different components of Amazon: EC2, DynamoDB, S3, obviously Redshift, and we also utilized the Jaspersoft integration that exists with Redshift for visualization. In this case a tremendous amount of data was pulled into the system, encrypted, put into Redshift, and then very complex analytics were run against it, and at the same time, as we were bringing in newer data or changes to existing data sets, the overall analytics were being regenerated. So it's a fairly interesting use case, and we've had the opportunity to work on similar ones that are more time-critical in e-commerce and related spaces. With that, I'd like to turn this over to Sankar, who can talk in a little more technical detail about this particular use case that we implemented. Sankar, please take over.

Thanks, Amir. Good morning and good afternoon, everyone. I'm going to touch on some of the technical aspects of Redshift and how we helped our customers succeed in their data processing with Redshift and the BI tools. As you will have heard by now, Redshift helps you design and build very large scale data warehouses on demand, right from gigabytes all the way to petabytes. It has high performance, it's faster, cheaper, and easier, and the best thing is that it is fully managed, which means it doesn't require a lot of resources to set up and provision the hardware and software. One of the things Redshift solves is heavy-lift, large-scale data processing with a lot of flexibility: for example, you can choose anything from a two-terabyte single node up to a 1.6-petabyte cluster, and you can resize that depending on changes to the workload.
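(Provisioning a warehouse of whatever size you need, with encryption turned on and inside a VPC as in the deployment described above, is a single API call. A minimal sketch follows; the identifiers, node type, node count, and credentials are placeholders.)

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Placeholder names and sizes; pick the node type and count that match
# the data volume you actually need, and resize later if it changes.
redshift.create_cluster(
    ClusterIdentifier="mbs-analytics",
    DBName="mortgages",
    NodeType="dc2.large",                            # placeholder node type
    NumberOfNodes=4,
    MasterUsername="admin",
    MasterUserPassword="ReplaceMe1234",
    Encrypted=True,                                  # AES-256 encryption at rest
    ClusterSubnetGroupName="analytics-vpc-subnets",  # launch inside a VPC
)
```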

It looks like we lost Sankar's connection, so let me continue with the presentation. As Sankar mentioned, in this particular case we actually utilized some of the higher-end clustering available to us and deployed the system on the larger 8XL nodes as well as the XL nodes, with fairly large storage assigned to them. The advantage for us here was that, as we were getting in lots of different data that was changing dynamically, we were able to resize the systems in real time as well. In certain cases we started out with two or three mortgage servicers' data coming in and then modeled what would happen if we had the full complement of mortgage servicers providing data to us, whether that was starting with two and then extending it to about seven or eight, and the ability to dynamically grow the footprint really was pretty effective in showing the value of Redshift to us. And again, the data set was extremely dynamic: as you can imagine, roughly five million or so mortgages are issued in the United States alone every year, and the typical life cycle of a mortgage is about seven years before it recycles, so five million mortgages a year times seven years means on the order of 35 million mortgages being modeled and assembled at the same time. The scalability component of that becomes very, very important, and as we proved in this particular engagement, Redshift scaled to address the issues of large-scale, high-volume processing that we encountered.

The other important aspect was cost: it's great to be able to do a lot of this fancy stuff, but can you actually do it fairly cost-effectively? This is where the advantage of working in an on-demand environment typically comes in. Matt mentioned that you can get a terabyte's worth of data warehousing for about a thousand dollars a year, and that was a really good benchmark to work from. What was really interesting was that all the other associated components and machines we needed, whether it was EC2, the S3 buckets I mentioned, or DynamoDB, were used fairly dynamically for this processing exercise and then shut down, so the overall cost was contained for this particular initiative. Our estimate was that if we had done something similar using the traditional model, the cost would have been about 19 or 20 times higher than what we spent by utilizing the on-demand solutions from Amazon. The other benefit was that, because we could resize the clusters pretty dynamically, we were able to do a lot more analysis as well. There was a fair amount of algorithmic work done on the back end, with constant tweaking of the algorithms to figure out the appropriate risk for a particular mortgage-backed security, and a lot of those algorithms, being fairly complex, required different types of systems and computing capacity, and that was also handled pretty seamlessly within the Redshift environment.

We were also able not only to process all that data but to make it available in a simple visual form, giving our customers the ability to easily visualize it. In this particular case we used the visualization available from Jaspersoft, which has integration with Amazon Redshift, and it was really good to be able to show some of the end-user constituents, who might be sitting at a desk looking at and utilizing this data in their day-to-day job, that they now have the ability to slice and dice it in many, many different ways, and to do it without having to submit requests for reports to their IT department. So we transferred a fair amount of capability to utilize this data to the end users and constituents, beyond what they had access to in the past.

Finally, from an overall summary perspective, we talked a lot about capex savings in this particular case: because of the integrated licensing, both on the hardware side and for the BI tooling that was available, there was really no capex investment; it was entirely opex-oriented. And on the opex side, because we could scale up and scale down pretty much on demand, that was kept at a minimum as well. From a productivity perspective, we were able not only to assess very complex risk models associated with mortgage-backed securities, but to pass that information along to the end-user community within the financial services organization, which could then leverage and benefit from it. It's still a little bit early, as this is still being built out completely, but from what we know and what the early indications are, it is having a direct impact on what the future of mortgage-backed securities could look like, at least in the United States if not in other parts of the world.

So let me close the session and open it up for questions with this final slide. We'd love to have the ability to chat with you folks one-on-one, so please reach out to us and we can walk you through in a lot more detail what we have learned in this process and give you some more details as well. We'd also like to welcome you to visit us at the AWS re:Invent conference coming up in early November; we are at booth 78, a lot of the people on this presentation will be there as well, and we would look forward to speaking with you. As Jerry mentioned, this presentation will be available, and if you do have any questions, again, please reach out to us directly. So with that, let me pass it back to Jerry.

Thank you, Amir, and thank you, Sankar and Matt, for providing us with insights into how to maximize business impact with big data and Amazon Redshift. Folks, if you joined us midstream during this webcast, we do invite you to submit questions, and we've had some coming in, so I'd like to open it up and start. The first question, which I will point to Matt or to Amir, and Sankar, if you're back with us, feel free to chime in as well: is the leader node involved in the data replication and failover of the underlying compute nodes? Jerry, I can take that, this is Matt. The leader node is there for receiving client queries and for executing the query plans.
It's actually the Redshift managed service that takes care of the data replication for your data warehouse cluster, the backups, and so on. It's also worth noting that you don't pay for the leader node; it comes free as part of the service, and you only pay for the compute nodes in your cluster that actually store data.
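(If you want to see that split for yourself, the cluster description returned by the API labels each node's role; a minimal sketch with a placeholder cluster name is below. Single-node clusters report one shared node instead.)

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

cluster = redshift.describe_clusters(
    ClusterIdentifier="analytics-dw"   # placeholder cluster name
)["Clusters"][0]

# Multi-node clusters list one LEADER node plus COMPUTE-0, COMPUTE-1, ...
for node in cluster["ClusterNodes"]:
    print(node["NodeRole"], node["PrivateIPAddress"])
```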

So to answer your question: replication is a service-level feature, not directly tied to your leader node. Great, thanks, Matt. Amir, I'll point this question to you: does DISYS provide DevOps for cloud computing? Yes, we work with our customers to provide managed services and solutions, so to help customers leverage AWS we also provide 24x7 monitoring, management, and support. Okay, great, thank you. Another question coming in, and this is a question for DISYS, so I'll point this to you, Amir: can you help with the initial migration as people are moving to the cloud? Is that something DISYS can provide services for? Yes, as I mentioned earlier, we can certainly help our customers: we work with them to understand the use cases and then initiate the migration plan and execute it as well. Perfect, okay, great.

Matt, I'm going to point this question back to you. When you started your presentation you were showing a diagram of the database services; can you explain how SAP HANA works with the database services on Amazon? Sure. The slide I showed with RDS, Redshift, DynamoDB, and some other services showed our managed database offerings. You're free to run pretty much any database you want on EC2, which is our VM service, our cloud computing service, including SAP HANA, and we actually offer SAP HANA by the hour through the HANA One product for 99 cents an hour. There are also a number of other SAP-certified products on the platform. We don't offer a managed SAP HANA offering at this time, so when you run it, you run it and manage it yourself on the EC2 service. Perfect, thanks, Matt.

Matt, can I also point this question back to you, and Amir, feel free to chime in: how does Redshift's cost compare to what the cost would have been if you built out a more traditional on-premises data center? Can you maybe allude to the pricing for Redshift? Sure. I think the Amazon pricing at one thousand dollars per terabyte per year is very, very competitive and quite disruptive; it would be difficult to replicate this in a traditional sense using other data warehouse alternatives. We worked very hard not to sacrifice performance, to deliver great performance, great scalability, and a strong feature set, while at the same time keeping the cost very low. Right. Another question coming in, Matt, is around scalability: can you use Redshift for smaller workloads? Absolutely. That's why we offer a selection of node types; we have the XL node type and the 8XL. It depends on your definition of small, of course, but we certainly have customers running smaller clusters with smaller data sets, and it's still definitely very cost effective for small workloads as well. There is always a cutover point, and that's why we offer a variety of database services. We find that some customers may start with RDS, for example, running an Oracle or MySQL or SQL Server database, and may outgrow that and then move to Redshift, but the same could be true in the other direction: you may start with Redshift and then realize, because of the nature of your data or the size of your data store, that perhaps it makes sense to move to a relational model. That's where we work with partners like DISYS and the AWS solutions architecture team to help guide customers to make the right decision. Right, thanks. Another question, Matt, that I'll also point to you: is there a test area where folks can check out their data and how it converts into Redshift? Maybe this is something DISYS can share as well; it might be a POC project, but it might just be a test.
Right. Well, the great thing about Redshift and the Amazon Web Services platform in general is that you pay only for what you need and pay as you go, so you're free to build a data warehouse to test in a sandbox-style environment. You can lock down that test environment by running it in a VPC so that it's fully secure, and at the same time you don't have to commit to any long-term pricing or costs; you can run a test for even an hour, see how it goes, and do some experimentation. We also have a lot of customers running a cluster in production and then a parallel cluster where they're trying new schemas or new queries so it doesn't affect the production workload, and that type of flexibility is great on Redshift: not only the ability to resize, but to run whole parallel data warehouses where you learn how to structure your data so that it operates best in production once you're ready.
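(One way, though not the only one, to stand up that kind of throwaway parallel environment is to restore a test cluster from a recent snapshot of production and delete it when you're done; the sketch below uses placeholder cluster names.)

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Find the most recent automated snapshot of the production cluster.
snapshots = redshift.describe_cluster_snapshots(
    ClusterIdentifier="analytics-dw",    # placeholder production cluster
    SnapshotType="automated",
)["Snapshots"]
latest = max(snapshots, key=lambda s: s["SnapshotCreateTime"])

# Stand up a disposable copy to experiment against.
redshift.restore_from_cluster_snapshot(
    ClusterIdentifier="analytics-dw-sandbox",
    SnapshotIdentifier=latest["SnapshotIdentifier"],
)

# ...run the experiments, then tear it down; the sandbox is disposable,
# so there is no need for a final snapshot.
redshift.delete_cluster(
    ClusterIdentifier="analytics-dw-sandbox",
    SkipFinalClusterSnapshot=True,
)
```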

Another question coming in: given that Redshift expects table structures to be modeled and ETL changes done before being able to copy source files with new attributes, what are the best practices to work around this limitation and bring down latency? That's a great question. Redshift has an ever-increasing set of functionality to do some ELT in the data warehouse itself; however, it's also compatible with a number of ETL tools, pretty much anything that can work with JDBC or ODBC, or ideally natively with S3 so you can take advantage of the parallel COPY. If you do, for example, load your data and then realize you need to do some transformations after the fact, that's no problem: there are commands such as CREATE TABLE AS, or you can essentially create temporary tables with the new schema or new formats and data types and then merge or replace the existing data with no downtime. So again, there are a number of strategies, whether it's a new load from S3 with newly transformed data or an in-data-warehouse transformation, and depending on how much data we're talking about, what kind of transformations we're talking about, and how much load it might put on the cluster, maybe it's more efficient to do it outside the cluster; that's something partners like DISYS and solutions architects on the AWS team can help with. Thanks.
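(A minimal sketch of the staging-table approach Matt describes: load the incoming files into a temporary table, then replace the matching rows in the target inside one transaction. The table, column, bucket, and credential values are placeholders.)

```python
import psycopg2

# Placeholder connection details.
conn = psycopg2.connect(
    host="analytics-dw.xxxxxxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="********",
    sslmode="require",
)
cur = conn.cursor()

# Stage the incoming files in a temporary table with the target's layout.
cur.execute("CREATE TEMP TABLE events_stage (LIKE clickstream.events);")
cur.execute("""
    COPY events_stage
    FROM 's3://my-etl-bucket/events/incremental/'
    CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
    GZIP DELIMITER '|';
""")

# Merge: delete the rows being replaced, insert the staged rows, and commit
# once, so readers of the target table never see a partial state.
cur.execute("""
    DELETE FROM clickstream.events
    USING events_stage
    WHERE clickstream.events.event_id = events_stage.event_id;
""")
cur.execute("INSERT INTO clickstream.events SELECT * FROM events_stage;")
conn.commit()
cur.close()
conn.close()
```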
We have time for just a few more questions here, but do keep your questions coming in; if we don't get to them during this webcast, be assured that we will follow up with you directly. The next question, Amir, I'll point back to you: does DISYS offer a support model for cloud environments? Yes, we have several different types of support models. It can be a monthly, ongoing, recurring relationship where our teams are fully integrated and manage the solution for you, or it can be a more on-demand model where you interact with us when you need help and we provide that on an on-demand basis. Great, and to that point, Amir, can you also explain to folks how you support teams in multiple locations across the globe? So currently we do have a follow-the-sun model. Our teams are located in the U.S., in Asia-Pacific, in South America, and in Eastern Europe, and they either all work on a 24x7 basis or we run a true follow-the-sun model where our teams around the world work with your counterparts or your teams in Asia-Pacific, and then as the day ends there the workload is passed on to Europe and onward, and the cycle starts all over again. Right.

Matt, I'll point this question back to you: can you elaborate on EMR and talk a little bit more about that service? Did I lose you, Matt? Sorry, I was talking on mute; it's a classic mistake. So EMR is our managed MapReduce service, and it supports a number of different Hadoop frameworks at this time, such as Apache Hadoop and a couple of versions of MapR. Just like Redshift takes away the pain of managing a very large data warehouse cluster, EMR takes away the pain of running, for example, a large Hadoop cluster. You can also use some more forward-looking technologies such as Spark and Shark; you can run those on EMR as well. It supports Hive and Pig and all the classic tools you'd expect to see in a Hadoop deployment, and it's very complementary to Redshift; a lot of people use them together because, for example, using MapReduce for large-scale ETL can be very effective.
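(As a rough illustration of that ETL pattern, the sketch below launches a transient EMR cluster that runs a single transformation step and then terminates; every name, version, instance type, and S3 path is a placeholder, and the job script itself is assumed to exist.)

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="nightly-etl-for-redshift",          # placeholder job name
    ReleaseLabel="emr-6.15.0",                # placeholder EMR release
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "workers", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate when the step finishes
    },
    Steps=[{
        "Name": "transform-raw-logs",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # A Spark job (assumed to exist) that reads raw logs from S3 and
            # writes delimited output back to S3, ready for a Redshift COPY.
            "Args": ["spark-submit",
                     "s3://my-etl-bucket/jobs/transform_logs.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started cluster:", response["JobFlowId"])
```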

Because EMR integrates directly with S3, just like Redshift does, it makes a great tool for managing and transforming data at scale. It's also used in a variety of other contexts, of course, such as genomics, advertising, and many other areas, so I see EMR, and Hadoop in general, and Redshift as highly complementary on the AWS platform.

Well, with that, folks, I'd like to flip back real quick to a slide on how to connect with the folks at DISYS, and I'd like to take a moment to thank Amir Abbas and Sankar Nagarajan with DISYS, our consulting partner, along with Matt Yanchyshyn, our ecosystem solutions architect, for joining us today. I'd like to invite folks to please reach out to our partner DISYS and learn more about how to incorporate Redshift into your big data projects. And with that, before you jump off, we would appreciate it if you would take a moment for a very brief survey; we appreciate your feedback very much, and thank you again for joining this Amazon Web Services partner webinar series today. Thank you.