This is a talk that I gave several weeks ago in Atlanta at a conference called Enterprise Data World. The organizers had asked me to talk about graph databases and triple stores for an audience where 95 percent of the people were not very familiar with graph databases. I looked at the people that logged in, and about half of you I know for sure already know a lot about this topic, but you still might find some cool arguments that you can use in your own work environment. The organizers gave us this opportunity to have a webcast around the same talk that I gave there.

OK, so the talk is about graph databases, triple stores, and how people use them. I'll talk for about one minute about Franz, then give a kind of holistic overview of what a graph database is, and then a typical example. Then I'll compare a graph database to a triple store, and then I get into the meat of the presentation: where do people use graph databases and triple stores? I'll give three examples, and once we've gone through those we can get back to the question of why people use graph databases. And then one thing that is fairly important for people in the enterprise is how you actually get a graph out of your relational database. So this is roughly the overview of my talk; I'll try to keep it within 30 minutes.

OK, so Franz Inc.: company founded in 1984. Most of you might know us from Franz Lisp. For the last six or seven years we've been totally focused on semantic technology, on our triple store, and on our professional services around semantic technology. We come out of Berkeley and we're currently located in Oakland.

OK, then: what is a graph database? You can try to explain that in a very complicated way, but I kind of assume that almost everyone in the audience has a computer science background, and people know about graph structures; they know about nodes and edges and properties. So I usually give the simple example of politicians and the
committees they're on, and how you can look at those people and those committees as a graph. Most of the time we also give a demo around this, but we'll see if we get to that today. There's a whole bunch of graph database projects out there; AllegroGraph is one, but most of the audience will know about the other ones too.

The first thing is: how does a graph database actually differ from a relational database? A picture that I've been using for a long time now is one where I explain that if you want to store information about people in a relational database, and you want to represent that a person has been to multiple schools, might have had multiple spouses in their life, multiple professions, and multiple places where they lived, then before you know it you need about eight tables to represent that. Here is the same information in a graph database, for the same person you saw in the last picture, person one, and you see how you can represent the exact same information directly as nodes and links and properties. Fairly obvious to the people here. By the way, on the left-hand side you see the name of the predicate; I'll get into this user interface a little bit later in this talk.

So how is a graph database different from a relational database? Here are my four ways to express that. In the first place, there's no schema in a graph database: you can say whatever you want to say. Obviously, if you like a little bit of order in your life, you might want to use an ontology to be more systematic about it. Second, one-to-many relationships are directly encoded, so you don't need any link tables, which in many cases makes life easier. Third, in graph databases and triple stores you don't have any indexing choices to make: everything gets indexed for you anyway. Again, this makes life easier, and a lot of queries will work fast straight out of the box. And then finally, a graph database is a very low-level representation: you
can express any other type of database in a graph database. Whether it's a Hadoop big data database, a triple store, or a relational database, it can all be modeled as nodes and links. So what we see is that we take in a lot of rows and columns from relational databases, we do a lot of transformations from XML into graphs, obviously we deal with RDF and OWL, and finally half of our work consists of working with text, where we take entities out of text, turn those entities into triples, and store them in a graph database. So that's the very short version of the difference between the two.

Now, how is a triple store different from a graph database? I like to explain that by saying that in a graph database you have something that's mostly very local to the application or the database that you have. You have nodes and links, but they only mean something within the context of that graph. Triple stores and RDF are different because you express the graph as triples, with a subject, a predicate, an object, and optionally a fourth element, and where each element is a persistent URI. Here you see one triple, "Jans talks in Atlanta". Each of them is a persistent URI, and you could have more information somewhere on the web about each of these three URIs. This is extremely powerful, because it allows you to very easily link data sets together just by virtue of having persistent URIs. The whole web of data, or the Linked Open Data cloud, is based on this principle.

The other thing that sets triple stores apart from graph databases is that there's a whole body of standards surrounding them, all done by the W3C. We have RDFS, which puts an object layer on top of the triples, we have OWL, which adds logic, and we have a query language called SPARQL.

Another difference between a triple store and a graph database is that most graph databases live in memory; as soon as you get out of memory, performance drops. Triple stores are a little bit more hybrid. There's much more time spent on the query optimizers, so that you can get roughly the same performance as a relational database for certain types of aggregate queries. So this is how a triple store differs from a graph database.

What I've learned by now is that instead of talking about triples and talking about graph databases, it's so much
more illuminating to talk about examples and to give a demo. This is a demo that some of the people in the audience have seen already, but I'd like to do it anyway. It is based on the Linked Open Data cloud. I'm kind of assuming that everyone in the audience is familiar with this new movement where people put databases on the web and publish them as triples. This is the picture of how, in 2007, we already had several databases out there expressed as triples: for example DBpedia, the RDF version of Wikipedia, and GeoNames, a database of seven million places on earth with latitudes and longitudes. Everything is linked together, so a city like Oakland will have one triple in DBpedia saying "Oakland has GeoNames ID" and then a number, and then GeoNames will describe all the geospatial characteristics of Oakland.

So this was 2007. Then this is 2010, and it got way bigger. Now, in 2012, there are billions and billions of triples: a lot from the pharmaceutical area and the life sciences, a lot in the area of publications, this is all multimedia data, this is government data. This whole thing is probably around 40 to 50 billion triples right now.

The demo that I usually give is about five databases that I downloaded from the Linked Open Data cloud, five life sciences databases. One is a database of four thousand diseases, another is a database of the medicines that you buy in the pharmacy, there are about a hundred thousand clinical trials, there are eighteen hundred FDA-approved drugs in a database called DrugBank, and then there is the database of side effects. You can all download them freely from the web and import them into any of your graph databases or triple stores. Let me show you how we work with that.

So here is the GUI for our triple store. It's called Gruff, and it's freely downloadable. Let me take this away. Usually I'm in front of an audience and I ask people, "Give me your favorite drug," but given that you're all muted, let me come up with a drug. Yes, so
let's say I do something like this.

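As a rough illustration of the triple idea described above (subject, predicate, object, with data sets linked through shared URIs), a minimal Python sketch might look like this. The URIs, predicate names, and coordinate values are made up for illustration; they are not the real DBpedia or GeoNames identifiers, and this is not how any particular triple store is implemented.

```python
# Hypothetical namespaces, invented for this sketch.
DBPEDIA = "http://example.org/dbpedia/"
GEONAMES = "http://example.org/geonames/"

# Each triple is (subject, predicate, object).
dbpedia_triples = [
    (DBPEDIA + "Oakland", DBPEDIA + "label", "Oakland"),
    (DBPEDIA + "Oakland", DBPEDIA + "hasGeonamesId", GEONAMES + "12345"),
]
geonames_triples = [
    (GEONAMES + "12345", GEONAMES + "latitude", 37.8),
    (GEONAMES + "12345", GEONAMES + "longitude", -122.27),
]


def objects(triples, subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Because both data sets use the same URI for the place, "linking" them
# is nothing more than putting the triples in one collection.
merged = dbpedia_triples + geonames_triples

# Follow the link from the DBpedia-style record into the GeoNames-style one.
place = objects(merged, DBPEDIA + "Oakland", DBPEDIA + "hasGeonamesId")[0]
latitude = objects(merged, place, GEONAMES + "latitude")[0]
print(place, latitude)
```

The point of the sketch is only that no link table or schema change is needed: the shared identifier does the joining.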
Basically, what happens is that I can create a new triple store, and then I can load triples. I can load them from the web: I can type in a URI and the triples will be downloaded straight into this tool. But I have already done that for you, and everything is free-text indexed. So let me look for something like "cancer" and "ibuprofen", and I can look at some clinical trials. Here you see three of the trials that discuss both ibuprofen and cancer. I can double-click one, and now I see the table view of my triples. You see that this particular trial [number unclear] discusses one disease, which is breast cancer, discusses a whole bunch of drugs, and then a whole bunch of side effects. So this is one triple: the trial discusses the drug, say, codeine. I can click on codeine, and now I have jumped from the clinical trial database into the drug database. Here we see the chemical formula and the IUPAC name, we can see the mechanism of action, the pharmacology, etc. We can look at other clinical trials that discuss codeine, and we can go on and on and on.

But let me get back to the graph screen, and let me take a few steps back to get to this point. Another way to explore a graph database or triple store is to choose some of the predicates that you want to explore on screen. What you see here is all the outgoing and incoming predicates for everything that was on the screen so far, but we just choose a few of them: diseases, drugs, side effects, and targets. I can click on one here and choose a few things I want to look at, and I see that all the clinical trials are already connected to some extent. Then I could click on another one, and I could go on and on and get more and more information on the screen. I can right-click on something and ask, with it as a subject, for, say, the official title. So I can explore the graph visually. I can also
ask for the connections between elements. So let me go back. Here I was, and I can say: give me all the links between this trial and this trial through the four predicates that I chose earlier. So I can look at this. Then I can take any other drug or any other phenomenon. Let's take a drug like MDMA, which is just another name for ecstasy. First I can look at the triples: there are some drugs, some side effects, some diseases where people use this. And I can ask: how does this trial relate to a trial about ibuprofen and cancer? I let the system think, and it finds about forty-one thousand paths between them. I can then look on the screen to find some shortest paths, which could go through cocaine, methamphetamine, diabetes, etc. So basically what I did now is let the database find links between two completely different types of clinical trials through a set of predicates that I find important at some point in time.

So this is the graph view. The Semantic Web crowd also has a special query language called SPARQL; again, most people are not deeply familiar with that yet. Here we see a SPARQL query where we try to find every drug, side effect, trial, and title where there's a drug with the name [unclear] and a side effect with the name "type 2 diabetes", and then: give me every trial that discusses both this drug and this side effect. I can run the query, and like in a relational database I get my results back, but because this is a graph database I can also show the results as a graph. Here we see the graph representation of the results of this particular query. And then one thing you can do is you can also look for similarity between objects.
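To make the kind of query described above a bit more concrete (find every trial that discusses both a given drug and a given side effect, and return its title), here is a minimal sketch of conjunctive triple-pattern matching in plain Python. It only imitates the basic pattern-matching idea behind a SPARQL query; the trial identifiers, predicate names, and data are invented for illustration and do not come from the demo data sets.

```python
# Toy data; all identifiers and predicate names are hypothetical.
triples = {
    ("trial:1", "discussesDrug", "drug:ibuprofen"),
    ("trial:1", "discussesSideEffect", "effect:nausea"),
    ("trial:1", "title", "Trial one"),
    ("trial:2", "discussesDrug", "drug:ibuprofen"),
    ("trial:2", "title", "Trial two"),
    ("trial:3", "discussesDrug", "drug:codeine"),
    ("trial:3", "discussesSideEffect", "effect:nausea"),
    ("trial:3", "title", "Trial three"),
}


def match(pattern, triple, bindings):
    """Try to unify one pattern with one triple.

    Terms starting with '?' are variables. Returns the extended bindings,
    or None if the triple does not fit the pattern.
    """
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant does not match
    return extended


def query(patterns, data):
    """Return all variable bindings that satisfy every pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in data:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


results = query(
    [
        ("?trial", "discussesDrug", "drug:ibuprofen"),
        ("?trial", "discussesSideEffect", "effect:nausea"),
        ("?trial", "title", "?title"),
    ],
    triples,
)
titles = sorted(b["?title"] for b in results)
print(titles)
```

A real triple store would use indexes and a query optimizer rather than scanning every triple for every pattern; the nested loops here are only meant to show what the query asks for.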

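One simple way to think about similarity between objects in a graph is to count how many (predicate, object) pairs two subjects share over a chosen set of predicates. The sketch below assumes that interpretation; it is not necessarily how the tool in the demo computes similarity, and the trial names and predicates are invented.

```python
from collections import defaultdict

# Toy data; identifiers and predicate names are hypothetical.
# A set is used so that duplicate triples cannot be counted twice.
triples = {
    ("trial:A", "disease", "breast cancer"),
    ("trial:A", "drug", "codeine"),
    ("trial:A", "sideEffect", "nausea"),
    ("trial:A", "title", "Some title"),
    ("trial:B", "disease", "breast cancer"),
    ("trial:B", "drug", "codeine"),
    ("trial:C", "drug", "codeine"),
    ("trial:C", "title", "Some title"),
}


def features(data, subject, predicates):
    """Collect the (predicate, object) pairs of a subject for chosen predicates."""
    return {(p, o) for s, p, o in data if s == subject and p in predicates}


def most_similar(data, target, predicates):
    """Rank other subjects by how many features they share with the target."""
    target_features = features(data, target, predicates)
    shared = defaultdict(int)
    for s, p, o in data:
        if s != target and (p, o) in target_features:
            shared[s] += 1
    # Highest overlap first; ties broken by name for a stable order.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))


ranking = most_similar(triples, "trial:A", {"disease", "drug", "sideEffect", "target"})
print(ranking)
```

Note that the shared title is ignored because "title" is not among the chosen predicates, which mirrors the idea of comparing trials only on diseases, drugs, side effects, and targets.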
So I can, for example, say: given this particular clinical trial, find all the other clinical trials that have roughly the same diseases, side effects, drugs, and targets. This is kind of a cluster analysis. I can run the query and I get some results: I find that one trial has eleven things in common with another [trial numbers unclear]. I can look at the graph, and then I can ask how they are actually related, and I can just take any two of them. So I can keep going.

Now, writing these queries by hand is kind of hard, so we also give you a completely visual way to do this. Say I want to find clinical trials that talk about morphine and a gene called cytochrome C 3 [name unclear]. Then I could do it this way. I can say: let's start at the point "morphine" and let's get a clinical trial. The database of course already knows all the predicates that link to the object morphine, so it gives me all the predicates in the system that point to it. So I say, well, I'm going to look at a drug, and then I can look at, say, a target or gene, and I can say: give me all the links that point from this to this target. I can choose to create a SPARQL or Prolog query, whether I want distinct results, and how many results I want. Then I can run the query, and here we see the query that was created, and here you see the results; there are very few of them. So you see the result of the query that I just built visually on the screen.

It's kind of interesting to see what people don't realize. In a typical graph database you get to see the name of the objects here, but for us, please look at the bottom of the screen and see the full resource. What we try to do when we display something on the screen is make it as user friendly as possible. I can hit the key Control-F8, and then you see what is actually in the triple store. You get all these long URLs, and notice that it's not even the name of a drug in this case but just a
number. So what we also do, instead of showing the URL or the number, is look up the label for this particular instance and show that. So we do a lot of things under the hood to make it easier for you to see your graph. OK, so this is a quick demo of a graph database, and I showed how this is a triple store by looking at the actual triples and the actual names of the predicates. But again, that is too painful for human beings, so we make it very easy for you. All right, any questions so far? OK, then let's continue.

So where do people use this stuff? What you see is that the intelligence agencies and the DoD are very interested in this. These are customers of Franz, but I know that all the other graph databases are also completely engaged in the intelligence community, so there's a big interest in that area. The commercial world is now also getting deep into RDF and graph databases. We've done projects for each of these companies here, a lot of them top-500 companies. There's the pharmaceutical industry, there are hospitals like Mayo Clinic and MD Anderson, there are media companies like Kodak and Adobe, etc. So all over the place there's a need for graph databases, and I'll get into the why at the end of this talk.

I'll give three examples of where people use a graph database or triple store. The first thing that I wanted to show to the people in Atlanta, who were enterprise architects, was that you can actually use this in the enterprise for business intelligence. So I talked about our customer Amdocs, which built a telco platform that knows almost everything about every customer in real time and that has now proved to save about 20 percent of the total cost of customer care operations. I talked about a new thing that we're building at Franz, a supply chain management system that warns early about disruptions in the supply chain. And we did a project with our partner TopQuadrant that built a reporting platform
for 31 oil companies, and I'm going to say a little bit about each of them.

The first one is the system we built at Amdocs, where we took information from more than 40 different databases and turned that into high-level knowledge about customers. Normally there's an enormous amount of information about a person in a telecom's databases. They know the call detail records, they know the downloads, they know whether the equipment is working, they know where we are at any point in time, they know our bills. But if you call the call center because you've got a problem, these poor people have to go through an enormous number of screens just to get an overview of you. So what Amdocs wanted was to create a system where, with one push of a button, you have a total high-level overview of a customer, the way business people like to think about you.

So we have a system where we have about five to ten thousand triples per customer. We know your social connections, and for each social connection, by the way, we know how often you call that particular person. We know that you like science fiction movies, and that you're angry, so we'd better give you a video download for free. We know the margin, we know whether you're a good payer and whether that's going up or down, we know what your mood is and whether it's going up or down. We can predict whether you will go to another company and take another subscription, yes or no. We know what kind of plans you have and how they fit together, and we know where you are [rest of phrase unclear]. So it's an amazing overview of a single person in real time. I usually show this picture where we say: instead of IT facts, we store subjective information, patterns, trends, geospatial things, temporal effects, probabilities, absence of recurrence, etc.

This is the architecture. At SemTech in two weeks I'll talk extensively about this architecture and how we now apply this architecture to
multiple other industries. We start with the relational databases, we turn all the events into triples and put them into an event collector, and then we have a rule-based system where we literally apply hundreds of business rules to create a higher level of knowledge that describes the state of every single customer. OK, so this is the Amdocs use case. We've turned it into an AllegroGraph product [name unclear] for semantic entity tracking, which we can apply to IT asset management, business visitors into the US, ships entering the Bay Area, credit cards, insurance cards, a unified view on bank customers, and many more opportunities.

Another system that we're building internally is a tool to help companies with risk management in the supply chain. If you build a complicated product like a car or a computer, then you want to make sure that events that happen elsewhere in the world don't affect your supply chain. Nowadays most companies have these just-in-time logistics systems, where you try to get the stuff in as late as possible so you don't have to hold a lot of inventory, but of course that brings a risk. What you want to know is which parts produced by a sub-sub-vendor will be less available due to a flood in China, or which of our products will be affected by political unrest in Thailand. And what happens a lot, by the way, and I never realized it, is that in the supply chain competitors can also disrupt your production process, for example by buying up all the chips that you need. So there are a lot of things that happen in the supply chain that can affect you. This is just a picture of how this all works together, but let me just click past this.

OK, so for supply chain management we needed to get three kinds of graphs to come together. The first one, and graph databases are really good for this, is to take a bill of materials and put the bill of materials into a graph. And the reason why graph
databases are good for it is because a bill of materials is a recursive data structure. It's not just one list; it's lists of end products that contain parts that contain sub-parts that contain sub-parts, so it can go deep. Then for each part you want to have the first-tier vendors that provide you with the parts. OK, so I guess that's fairly obvious.

Then you want the supply chain for the first-tier vendors. As a producer or manufacturer you buy from a vendor, the vendor will buy from sub-vendors, and the sub-vendors will buy from their vendors, etc. By the way, this is a really hard thing to do, because most vendors don't like to talk about where they get their parts from. In some areas you do know. When we do this in the military domain, say for a nuclear engine, then the DoD knows for every part ultimately even the mill where the steel came from. But if you talk about something like a car, then that's much more difficult.

And then finally, what we needed to do is take all the parts and the businesses and scrape the web for information about these parts and vendors. We look at country information, we get all the news from the web, we apply entity extractors, and we wrote some of our own rules to find natural disasters and political unrest in newspaper articles. Then we can relate that to all the other parts in the other graphs. So this is a particular little graph that links all the way from a part, through a producer that is located in a particular country, to a place where there is a flood: Bangkok. So we can write rules that say: warn that a vendor will have a problem with a part if there are danger words in a particular text about a particular country, where this country has a place, where this place has a particular producer that produces a particular part, and we buy from this particular vendor. This is the kind of rule that we use, and it will warn you: hey, wait a second, deep down in our supply chain is someone that might be affected by this, so you'd better go check whether your vendor really can deliver, or maybe you want to buy up everything just before your
competitor buys up a particular part. So this is another example.

And then finally, we did a project with TopQuadrant, a company specialized in professional services and ontology building. We worked for a nonprofit organization in Norway that combines more than 31 oil companies that all have oil rigs in the North Sea. All these platforms by law need to report every day about oil production and things that happen on the oil rigs. Of course each oil company has its own IT company, so all these things are completely different. We built a reporting platform where we took the XML [format name unclear], the Excel spreadsheets, or the relational databases, we built mappers for them, we stored that in a relational database, and then we related all of that to an ontology. So now we have a unified view on all the information coming from the oil rigs, and we have a set of templates so we can export it again as XML, Excel, HTML, or JSON. I presented this to the people in Atlanta because this is something that is known to enterprise people: they know it's a problem to integrate data sources. What I'm trying to explain is that it's very straightforward to do data integration using graph databases and triple stores. I won't go too deep, but this was completely ontology driven, all the way from the bottom up.

OK, so I described the use cases. Now let's go back to when you want to use a graph database or triple store. I've talked about this before. This is actually a trend view of how it's becoming less sexy to write about relational databases, while NoSQL and graph databases are currently coming up. So life for people in the enterprise is getting really complicated, because what do you need to use? Do you need to use a relational database for your project, do you need a big data database like Hadoop or Cassandra, or do you need a graph database or triple
store? Actually, at SemTech in two weeks I have a full presentation just about this topic. It is more about the marketing terms, big data, fast data, and complex data, and I give a whole talk about it, so I'll just summarize it here.

If you have a regular enterprise application, then you're kind of dumb if you don't use a relational database, because there are thirty years of experience and thirty years of robustness, and why would you even use something else? Now, if you have a billion objects, say half a billion or seven hundred million Facebook pages, then it's not really going to work to have a relational database. Then you need something that is super, super fast at getting a single object back, where that object can be a blob that basically contains a little tree of objects; it's kind of flexible but not too flexible. In that case you want to use a big data solution like Hadoop.

And then you have applications where you have very, very complex data. I showed you the graph with pharmaceutical data. Just imagine that you want to represent the human body, the ontology of the human body. If you tried to stick that into a relational database or Hadoop, you wouldn't get anywhere. Or take the supply chain and all the risk factors there; you wouldn't even think about putting that into a relational database or Hadoop. That's where things get so complex that you actually want a graph database or triple store. Now of course all the vendors claim that they can do everything else too, but that's just how we are as vendors.

So again, summarizing: when do you use a graph database or triple store? This is the kind of slide I like to show. It is when you need ultimate flexibility: if you model knowledge and assets, if you have hundreds of thousands of classes with different features, if you add new classes and new features every day, or if you need to do a lot of rules or reasoning, then you'd better consider a triple store or graph database. It is also when you need ultimate linkability
and all the other databases won't help you and only this semantic technology will help you. Or when you need pattern recognition and network analysis, then again you might want to consider a graph database. And finally, and this is something that I've been doing at Franz for several years now, it is when you need to do event processing like we do with Amdocs, where we need to use spatial-temporal reasoning and social network analysis combined with flexible metadata. OK, so these are my most important reasons why you want to use a graph database.

Then, just to stress it a little bit more, I show some queries. I'm there in a room with enterprise architects, and of course they all know that in SQL you can write complex queries. So I say: suppose you want to do a query like "Frank sends money to Craig and Craig sends money back to Frank", which could be a case of trying to boost each other's revenues. You could do a query where you say: find every A, B, C, D, E where Frank sends money to A, A sends money to B, B sends money to C, and so on to Craig and then back, and these are not the same. Well, I can imagine that most of you can express this in SQL. But then you get to queries where you want to look for paths of indeterminate length. Then you get something like: find path one and path two, where there's a path from Frank to Craig using "sends money" that is longer than two, and one in reverse, and the intersection of these paths is empty. Would you know how to write this in SQL? Maybe a percent of a percent of the SQL people know how to write something like this, but for us, in a graph database, it is very easy to express. So again, it's hard in SQL because of all the self-joins; just try to write it as a SQL query. And why is this hard in a distributed key-value store? Well, again, it's very hard to write this there. The one expression that I showed you can be written as a MapReduce expression; it's doable, but it's going to be a really huge
problem to get it done.

And then finally, what I discussed at this meeting in Atlanta was dealing with events. I tell people about the event ontology that we use, where you can express almost everything in the world as events, whether it's a hospital visit, a financial transaction, a telephone call, or a meeting, etc. So you have events that have a type. An event always has a list of actors; sometimes there is only one, but mostly there are more. Events usually happen somewhere, there's always a start time and an end time for an event, and finally there's a lot of other metadata about events. What we've done in our graph database is add full libraries for social network analysis: how far is one person from another, how strong are the relationships, to what group does this person belong, how important is this person in the group, etc. We do geospatial search, and then finally we have a lot of primitives to help you reason about time. All of that helps you to finally do the queries that you would never do in a relational database or in a Hadoop database, where you say: find all the meetings that happened in November [location constraint unclear], attended by the most important person in Jans's friends, their friends, and their friends. In the whole query you see here, you link social network analysis, database lookups in RDF, and temporal and spatial reasoning, all in one query paradigm. Very, very powerful, and only doable in a technology that supports complexity.

And then finally, I've gone well over my half hour: how do I get my triples out of my database? Again, I'm going to talk about it at SemTech, but we work with partners that make it very easy. We're working with both Mule and Talend, where we can do the orchestration and where they can use R2RML mapping, another thing that we'll talk about at SemTech. Or we use other mapping technologies that are friendlier to use and easier to make. For example, we use a tool where you can actually visually look at the columns in your comma-separated file or relational database, and where you ask the system to generate templates. Then you get a template of how this example of a column in your particular table gets turned into triples, you can edit every part of this mapping, and ultimately you can send the data straight to AllegroGraph from this mapping tool.

That concluded my talk in Atlanta, and that concludes my talk here. Let's see if we have questions. One question is: where can I get the linked data you demoed? Send me an email and I'll send you back a link. We have an FTP site, and that link is on our FTP site. The next question is: can you
reiterate the difference between a graph database and a triple store? Yes. I would say every triple store is a graph database, but not the other way around, because most graph databases have a lot of trouble dealing with all the strings that are used in the Semantic Web, plus they don't have all the things to deal with ontologies and with reasoning. But I guess there is a deeper thing behind it, and that is what people mostly think of when they ask this question: graph databases are created to be extremely fast with graph algorithms, and triple stores are more for traditional SPARQL queries and not so much for that. Well, we at Franz have created, and we have had this for nearly four years now, a whole graph library in our triple store that does all the social network analysis but also the classical graph algorithms. We created several techniques to make it very easy and straightforward to take complex relationships and turn them into temporary adjacency lists, the heart of all graph algorithms. It is very flexible, and when we do that, we are suddenly an extremely fast graph database. What I'm going to talk about at SemTech is our new AllegroGraph [word unclear], where we have a very compressed graph database with very fast access times and where we can apply parallel Prolog to use all the processors in our system. I will actually be there at the conference with a laptop with a billion triples that fit in memory. But anyway, come to SemTech and I'll talk to you about it.

Another question: is Gruff extensible, could I embed it inside of another project? Well, for people that are interested, we'll make the code available.

We have open source too, but on the other hand, if you want to use it in another project, we'll make it easy for you to do that.

How much of the work in the Amdocs case involved writing ontologies? Well, a lot, because when you do a project like that, you actually start with: what are all the things that we ultimately want to know about a customer? So you begin with all the questions you want to answer in a system like that, and you come up with a large set of things that you want to be able to know and answer about a customer. Then you ask: what are the supporting ontologies? So there was a huge effort, in this case using TopBraid Composer, to create the ontologies that represent the knowledge about a customer. And then of course once you have that knowledge about the customer, you have to reason back: how is the data in my databases going to support it? So we also wrote ontologies that describe the data in the original source databases, and ontologies for how you want to represent events happening in your source databases as triples in your event queue. So lots and lots of ontologies all around.

OK, the last question: do you provide consulting around supply chain? Yes, absolutely. We provide consulting around supply chain projects. We help you with the scraping for products, we help you with the ontology, etc. So just send us an email and we'll help you out.
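For readers who want a feel for the supply chain rule described earlier in the talk (a recursive bill of materials, vendors per part, and news about places linked into the same graph), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the predicate names, the product, the vendors, and the news item are invented, and a production system would express this as rules in the triple store rather than as Python loops.

```python
# Toy graph; all identifiers and predicate names are hypothetical.
triples = {
    ("product:car", "containsPart", "part:engine"),
    ("part:engine", "containsPart", "part:chip"),
    ("part:engine", "containsPart", "part:piston"),
    ("part:chip", "producedBy", "vendor:acme"),
    ("part:piston", "producedBy", "vendor:other"),
    ("vendor:acme", "locatedIn", "place:bangkok"),
    ("vendor:other", "locatedIn", "place:oslo"),
    ("news:1", "mentionsPlace", "place:bangkok"),
    ("news:1", "mentionsDanger", "flood"),
}


def objects(subject, predicate):
    """All objects for a subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects that point at an object through a predicate."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def all_parts(item, seen=None):
    """Walk the bill of materials recursively, to any depth."""
    seen = set() if seen is None else seen
    for part in objects(item, "containsPart"):
        if part not in seen:
            seen.add(part)
            all_parts(part, seen)
    return seen


def warnings(product):
    """Yield (part, vendor, place, danger) when news reports a danger
    at the place where the vendor of some sub-part is located."""
    for part in all_parts(product):
        for vendor in objects(part, "producedBy"):
            for place in objects(vendor, "locatedIn"):
                for news in subjects("mentionsPlace", place):
                    for danger in objects(news, "mentionsDanger"):
                        yield (part, vendor, place, danger)


alerts = sorted(warnings("product:car"))
print(alerts)
```

The recursion in `all_parts` is the part that is awkward with fixed-depth joins, which is the argument made in the talk for keeping a bill of materials in a graph.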