– Good morning Thank you for turning up so early I’m very excited and surprised to see that you all managed to get out of bed I’m gonna start by admitting a little secret to you One of my very first jobs I got early on in my career was as a junior verification and modeling engineer I had no idea what this was I knew two things: I knew it was programming, and I knew they were gonna pay me I knew absolutely nothing else about this job Would anyone else like to admit that they’ve done the same thing? Fantastic (laughing) That makes me feel a lot better So what this job was actually about was building silicon chips, not processors They were actually for networking But what’s really interesting about this process is that these designs start off looking a lot like software You write some syntax in some text files and, through a very long and drawn out process, they get manufactured into these silicon wafers which then become your chip This process is extremely expensive A conservative estimate might put it at $5 or $10 million, and that’s really being conservative about it So this is where my job came in My job, as it turned out, was to write tests for these designs in C++ because you want to get the design right the first time That rarely happens, but that’s the idea And so we would incorporate simulations of these designs into C++ code and write tests for them Because this process of making these chips is so expensive and so drawn out, when you get a bug in the chip, you really want to make sure that you can fix it and work out what’s going on without having to change the chip And so you don’t have a lot of the niceties that you have when you’re building software You don’t have any debugger in the same sense There’s no running a debug build There’s no just adding some extra logs What you do have, though, is a huge amount of statistics in the chip Huge amount Little registers that you can read numbers out of, and these numbers are accumulating various bits of 
information about what’s going on inside And this is all you have to debug anything that might go wrong in the chip And you have to make sure it’s in the chip design from the start There’s no adding it in afterwards This is not possible So let’s compare this to how we build C++ We write a similar set of code, and the equivalent to manufacturing, I guess, would be putting it into production: compiling it, testing it, deploying it, shipping it But how often do we think about how much this actually costs? It could cost a very small amount of money, or it could actually be quite expensive How easy is it to actually change code that we’ve put into production? This is gonna depend heavily on what domain you’re in, but it’s worth thinking about, because I think even if it’s really easy for you to update your code in production, you want to do it as little as possible So what do we mean by production ready? We talk about this a lot, especially in code reviews, for example Is this code production ready? I wouldn’t do this in production Things like that Maybe if it compiles, it’s production ready? I think we’ve all worked at places where this is true

If it’s tested, is it production ready? Well, one thing that production ready does not mean is bug free You always have to account for the fact that something might go wrong in production So how about this: observable Think back to that chip We made sure when we were designing the chip that it was observable, so we could work out what goes wrong once it has been manufactured So this is what I want to talk about: how can we make our software observable, so that when something goes wrong in production, we can easily work out what went wrong and how we can fix it So let’s talk about instrumentation It’s in the title of the talk Let’s get a definition Should always start with a definition, right? Wikipedia says instrumentation is a collective term for measuring instruments used for indicating, measuring and recording physical quantities Fantastic, that’s quite useful to know And even better, it says in the context of computer programming, instrumentation refers to an ability to monitor or measure the level of a product’s performance, to diagnose errors and to write trace information This is exactly what it is But we should get a second opinion, so let’s ask Urban Dictionary Great source of all technical knowledge Urban Dictionary has its little example at the bottom: “Hey John, look at that expensive instrumentation.” And what Sue’s actually talking about here is the fact that if you add instrumentation to your software, it may have overheads, so it’s actually expensive So what I’m telling you is Urban Dictionary is a great source for C++ programmers So instrumentation allows us to observe deployed software We can monitor that it’s working, we can measure its performance as it runs, and we can diagnose errors when they occur and trace them back to what caused them Specifically in this talk, we’re gonna talk about source instrumentation So this is adding code to your source, adding features to your code to instrument it And this is built in to your production releases I really 
want to emphasize this The instrumentation is not something you just do in your debug build It’s something that’s in your product, in your software Your software becomes instrumented and runs in that form And there are some alternatives to doing this, but this is not what we’re gonna talk about today The best form of instrumentation: printf I mean logging Always gotta log So this is what logging is for At some point early in the morning, you will get a message saying your software crashed This will happen, prepare for it And you will ask, what’s in the log? And what they will tell you is std::bad_alloc And you then have to work out why And you might then question why you became a software developer Maybe you start looking on Stack Overflow for new jobs, but a few hours later you’ll probably ask them to reboot the machine and maybe it will fix everything Okay Let’s get a bit technical now So I have this little example program that we’re gonna work through in the talk Just a basic shell of a C++ program: it does some processing, and any exception that occurs in the processing is printed out We’ve probably all written little snippets of code like this But what we’re gonna focus on is this error that’s coming out of the code, and we can assume

that it’s coming out of this process file function we’ve written, which maybe reads a file and processes the contents of the file So what we really need is a lot more information about this error This permission denied message is not nearly enough information for us to work out what’s gone wrong, so we could add some context And this is where we start talking about logging Maybe we use cout just to print some extra information out, and in this case wouldn’t it be useful to know which file we were accessing when the permission was denied Maybe in addition, we want some context about why this operation is even occurring: what user caused the operation, what connection the request came from Of course, this will vary depending on the sort of software you’re working on And then on top of this we can clean up this error message that we’re sending to the user through the exception, because the user doesn’t care that permission for something has been denied What they care about is that their request failed, and maybe we tell them a little bit of information We don’t expose too much of the internals of our application, but this is a good start Now we have a lot more information and maybe we can work out what went wrong So with these little bits of writing stuff to the screen, at some point we’ll think, okay, we’re definitely logging now, so we should use a logging library, be professional about it So maybe we use something off the Internet, maybe we get something through a package manager, maybe we write one ourselves, maybe the company has their own logging library Whatever it is, you know it’s gonna look a bit like this Maybe you get some ability to use format strings, you can specify severity levels, and you have lots of useful features for writing log files and maintaining them But this really isn’t what’s interesting, I think, about logging The problem with logging is us, the humans We’ve made log files human readable so that we can read them and understand quickly how 
to fix a problem And this is fine if you’ve got a little log file or a couple of files to read The problem gets increasingly worse as your software grows or more people use your software You gradually become less and less happy, because we don’t scale: we can’t read thousands of log files It’s not possible And so we start to imagine this extra layer between us and our logs, which for the purposes of this talk, we’ll just call magic And this processes all this log data we’ve got and puts it into a nicer form for us to be able to read I’m not gonna go into too much detail about this magic, but roughly speaking, what it refers to is a growing ecosystem of software and services that let you process and understand your logs better There’s a huge amount of software that does this for you, and if you’re running your applications on the cloud, then you can probably just pipe all of your logs into some service and it will sort them out for you So what this gives you, typically, is some ability to search through your logs for particular errors or a particular time It might do some reporting for you, produce some metrics, tell you how many errors occurred on a particular host, for example, and maybe it’ll give you some alerting

so you’re only notified when really important things happen You’re not constantly reading through these logs So this is a very rough overview of these systems The problem comes when we want to use these systems: all we have to feed them is this human readable text that we munged together from all this information So the first thing a lot of these systems do is process this data into a much nicer format, give some structure to it Fill out, for example, the username and the IP address, so we can search on these things and we can index all of our logs and all of our errors by user and say which errors occurred for this user These are typically implemented in various ways, but the most common is this mess of regular expressions you end up writing to parse the data out of your logs But this is insane We already had those bits of data in a nicely structured form in our application, and we merged them into a text format, which we then parsed back into a structured format What we really want to do is just output the data that we had in a structured form It’s still human readable if we need it to be, but it makes it so much easier for machines to process And this is what we want to do: we want to automate the processing of this huge mess of log data that we’ve acquired This is becoming very popular in other languages It doesn’t seem to have caught on so much in C++ yet, but I think it’s something that is worth mentioning because it allows us to eliminate all of this unnecessary work And this idea of structuring our data brings us on to the more interesting topic, in my opinion, which is tracing Tracing is basically logging, but it gives a bit more information about what your logs mean So a trace is typically something that has a start and something that has an end So, an operation that takes some amount of time And the questions we’re interested in asking and answering with tracing are: what caused the error? We want to build up a history of what happened in our system so 
that we can trace back the source of the error And the other thing we’re interested in doing is looking at performance, so how long did something take A good example of this is strace, a brilliant little utility which instruments some program and logs out the system calls that are used by the program So if we look at this example of catting a file out, it will tell us that open was called because we need to read the file And later on, we call another system call to read the data out of the file The first thing this tells us is what actually happened The system call, the arguments and even the error message, the error code, sorry, which is very useful in itself, but we also learn the time that this occurred and how long it took, so we can start looking for bottlenecks in the code So the way this tends to evolve

is you start with having some logging in your application Maybe you have this process file function that we touched on earlier It opens a file, and then reads the data and processes each line of the file in some way And we log it: we log which file we read and we log which user did it Useful Wouldn’t it be nice if we knew when it ended, because then we know how long it takes We know when our code is finished with a file and it closes it off So eventually we realize, okay, now we’re doing tracing, we’re not doing logging anymore, so we’ll use a library for that And typically you get these little utilities that look a bit like your log code, but they’ll produce some instance of some trace object for you, and then you can end it when your operation is finished And with many of these tools, if you don’t do it explicitly, the destructor will do it for you So if your function finishes, then the end trace gets written And you can add the same information as we did with the log Any sort of arbitrary information you think might be interesting, filename, user, IP address, you can add that as well So let’s look at an example of how this might be output, because this data doesn’t have to be output as logs, but it’s quite common to do so Maybe we get these file read events with some metadata, and then we get the end In our application, it opens up a couple of files and processes them Well, what’s common to do, kind of like we looked at with the strace example, is to only emit one event instead of these two events, so we emit the event when the operation ends, we emit all of the metadata, and then instead of the start, we just record the time that the event started Alternatively, we could record the duration You get the same information regardless of which you put out Where tracing becomes really interesting is how you build up relationships between operations So if we have some connect operation, some user starts a connection to our service, we can trace that Then 
inside that connection, maybe we have to read some files, so we have another operation within an operation And so what we can do is actually link these together through some scheme or another, some incrementing integer or some sort of UUID, and we can tag the inner trace with the ID of the outer trace So now we get this relationship between the traces You typically don’t have to worry about this, because if you’re using a sufficiently good library, then it will have support for building these relationships So if you create your trace, and you create your inner trace from the first trace, then they will get linked together through some identifier What’s even more interesting is that you can often write logs to traces, so now you can associate errors with particular operations, and this cascades So you have an error at the very bottom of your application, you failed to write a file You can then trace it all the way up to where the client invoked the operation that ultimately failed And generally, this is where the name tracing comes from, ’cause you get to trace the origin of your errors Let’s look a little bit more at these relationships We can trace errors from client to root cause, but we can also see, within an operation, where bottlenecks are Is one operation taking longer than another

And this is actually essential for systems with any sort of concurrency in them Because imagine two concurrent streams of processing going on here, the interleaved bits of one happen, then bits of the other happen, then bits of the next one happen So by building these relationships, we actually know the correct flow and we don’t have to infer it from just reading through the log file This log file isn’t ordered, it’s just a complete mess of all the things that are concurrently happening So, pretty pictures time A really nice way to visualize these traces is with little timelines We can, for example, display all of the traces for a particular user Every time a user causes a file to be read, we can display that, but what we can also do is then display the connection for that user and we can draw the links between them So we start getting this sort of, it looks a little bit like a call graph: the connection triggers the file read, and we can add the other data in the file alongside it as well if we’re interested in correlating the two together This becomes even more interesting when you think about software that is broken up into multiple address spaces For example, multiple processes, or if you have a distributed application where parts of it are running on different nodes You can then link operations which occur in completely different processes and tie those together, and you can say the cause of this was actually something that happened somewhere else, not in my address space So this has got a lot of benefits, just from taking our logs and adding a little bit more structure to them That’s all we’re doing So in a C++ application, we care a lot about performance, so we need to be very careful about the overheads which we incur One thing we’re particularly interested in is being able to disable our tracing, and this is the same concept as what you might do for your logs You want to be able to turn them off, and 
when you turn them off, you ideally want them to have no overhead, ideally You can’t always get it this low, but with some systems you can And then what we care about is, of course, the overhead when the tracing’s turned on, because when it’s turned on is when it’s valuable And there’s a few things we can do A very common technique is to sample your traces, so only actually emit information for every 10th operation or every 100th operation We could aggregate our trace data, so instead of outputting all the details of every trace, we just output the count of things that happened and the total time that all of the operations took This still has a lot of value, because we can then see, for example, the average time that our operation took And if you’re thinking this sounds a lot like what I get from my profiler, then you’d be right In a sense, a profiler that you might use at development time is like a very specialized type of tracer for finding performance bottlenecks And then we can also build tracing systems that don’t actually use logging but a much more efficient version than formatting and writing out huge blobs of text to files We can imagine much better optimized binary formats

If you think about the tool tcpdump, this is a tracing tool, fundamentally It’s tracing your network activity in and out of a node, and it has a very specific format for storing the traces, the network packets, in a file And if we go and look at the Linux kernel, they have a tracing file format as well Linux has this very elaborate mechanism of adding trace points into the kernel so you can see what’s happening The big problem with any sort of tracing and any sort of logging is that the overhead grows as your application does more things If your application does 1,000 things, then you have to trace 1,000 times If it does 10,000, you have to trace 10,000 times, unless you’re doing some sampling, of course So the overhead always grows as your application starts to do more things or you want to trace more parts of your application, which leads us on to why we want to talk about metrics What are metrics? They’re just numbers, interesting numbers polled periodically And a really good example of this is htop Have you ever seen a screen like this? 
It’s a little utility you can run and get on most operating systems This is htop in particular, but fundamentally these tools are all the same You get a lot of really useful information about your system You get your CPU usage, you get the amount of memory that you’re using, the number of processes that are running, the uptime of your server Extremely useful But notice the wide array of different types of numbers We’ve got durations, we’ve got counters, we’ve got absolute values like memory, and we’ve got relative values like CPU usage This is really useful And typically what we want from a metric is the history of it So if we have the memory usage of a node, when our bad_alloc occurs in our production system, we’re able to look at this memory metric and see, well, something happened here, we should probably take a look at that And then we can build alerts on top of this So say, if our memory increases over a certain threshold, then tell us, and maybe if we’re clever enough, we can find out why it broke and everything goes back to normal The typical workflow for system metrics, at least, is very commonplace This is a technique that’s been around a long, long, long time You collect the metrics from your servers, you store them somewhere, and then you analyze them And there are a huge number of systems that will do this for you, and again, if you’re using some sort of cloud provider, well, they also have a metrics collection system you can use What we want to do when we’re developing software is hook into this We want to expose our own metrics and have them collected and have them analyzed and alert on them, and we want to get all the same benefits that we get when we’re monitoring our infrastructure and our servers, our temperatures and CPUs Let’s look at what a metric is made of We give it a name: temperature, or some sort of count, number of things that happened And we tag it if we have multiple versions of that metric If we have multiple hosts, each of which has some temperature 
sensor, then we can tag it and say specifically which one this is a measurement for And of course the value And then the timestamp at which we took the measurement

This example happens to be OpenMetrics It’s an evolving open standard for passing metric data between systems Let’s get back to some code That’s why we’re here Same example as earlier, the little process file function Well, with a metrics library, it will be best to use the library that your infrastructure recommends you use So if you’re using a particular type of monitoring software for your infrastructure, then they will probably have a C++ client that you can use to expose your own metrics What we can do is start building metrics in For example, maybe we want to count something Maybe we want to count the number of times we read a file So we can add a little counter, we can add some tags and some metadata to it, and then we can increment it every time we read a file The counter itself will typically look like this It will have some integer inside it, probably an atomic integer, a function to increment the count, and a function to obtain the count Not very complicated The idea behind this counter, though, is to keep that increment as lightweight as possible We want it to do as little as possible, so that when we add it into our code, every time we read a file, the overhead we’re adding to our application is negligible, and a lot cheaper than if we were to put a log in the same place This will vary depending on the library and the infrastructure you’re using, but typically the way you collect these counters is through some thread that’s running periodically and picking up each value from each counter You have some registry somewhere in the library of all the counters and all the metrics, you run through each of them, pull out the value and publish them This is quite heavyweight work Maybe we’re formatting the metrics into text, maybe we’re sending them over some network socket or writing them to a file, but it doesn’t matter, because we’re only doing it every time we poll the data We’re only doing it every five 
seconds or every 10 seconds That increment, the thing that we actually put in the critical path of our code, is still extremely cheap So having these counters doesn’t incur too much overhead I’m sure some of you are thinking, though, I can definitely do better than this I know you’re thinking it It sounds like a really interesting problem to really optimize that increment, to get it as fast as possible Well, yeah, other people have thought about it too, and they thought it was really interesting, and they wrote papers on it and they tried to standardize it So if you’re really interested in how you can write really efficient counters, there’s a good paper for you to go and read Let’s look at a different example What about if we wanted to count the number of times we read a line from a file? This is a much more frequent occurrence than just reading a file in its entirety So this loop is critical performance wise, but it’s still fairly heavyweight We’re pulling out a line from a file, processing it in some way, parsing it, so on and so on This makes it a good candidate for adding a metric to, because the increment, relative to what you’re doing, is still fairly lightweight However, it’s still a nonzero overhead

and there will be situations where the cost of incrementing a counter is still an overhead to your operation, so we have to think about whether the information we’re getting is valuable enough to warrant slowing down the code Let’s look at the data that we might get out of this counter Say we’re polling it every five seconds As our application starts up, the counter’s at zero, nothing’s happening, and then something starts happening and we start processing data, the numbers start going up This is meaningless You can look at that and really infer nothing other than the number went up a bit What we really want to do is visualize it, of course, and now we get to see some really interesting things We can see roughly when processing starts and when the processing finishes These flat areas are where nothing’s going on And we can see roughly the number of lines that we’ve read through each file, where this leveling off occurs What’s even more interesting is when we post-process the output from this metric and, for example, graph the rate that the counter is increasing at Now it becomes even more obvious when our operation is starting and stopping We can see the first one roughly takes 40 seconds, and we can see very clearly that two operations occurred, and we can see the throughput, so we can actually see the performance of our processing loop on the graph And even better, we can see if the performance changes throughout the processing, and this is something you wouldn’t typically see if all you do is collect the time your operation took and the number of lines you processed You would get an average over the whole operation What you see with this counter is whether the rate changes So this excites me a lot If we tag our metrics in a nice way, we can correlate what’s happening along different dimensions in our system If we have multiple users, we can see that one of the users in our system, when they begin requesting, affects the performance of the 
other one And we can do other processing on these numbers as well We can graph the sum of the rate for all the users in our system, and now we learn even more We learn that there’s some sort of slower performance at startup, and then after some time, we see the performance increase, so maybe this is an effect of file caching Once you’ve read the file once, it gets stored in memory and so it’s faster to process it a second time And you can see that there’s some sort of limit So even though we have two users running requests in parallel, there’s some sort of ceiling to our performance And all of this information comes from adding just one counter to the loop that’s doing your interesting processing I think this is pretty cool And we can go one step further with our metrics and our pretty graph We can put this side by side with the tracing data that we found earlier, and now we start to fill in even more gaps in our knowledge Just looking at the metric, we don’t necessarily know whether this is two distinct operations or whether it’s one operation that happened to dip in performance very drastically, but if we look at the tracing data alongside it, we can verify that, in fact, it was two distinct operations, and from the metrics, we can see the performance

within the operations I think this is a really valuable lesson to take away Just very simple additions to your code can tell you so much information, but you do have to put a bit of thought into it It’s not effortless So, we’re at the end Nearly time for some coffee I’m desperate for some coffee What am I trying to say? Develop observable software Think back to that chip I talked about at the start We had to make sure that chip was observable so we could work out what happened when it went wrong And I think there is a huge advantage in doing this with software as well Try to debug in development as you would in production, so that when something goes wrong in production, you don’t need to install your debugger on your production server You don’t need to install tracing tools or some other form of instrumentation Your software is monitorable and observable Your software becomes its own debugger But we have to take into account that while there is a lot of information to be had, there are overheads in doing it But it doesn’t have to be expensive, as long as you’re mindful of where you use different techniques We can log, and we log errors because we have to and we should We can trace things at a very coarse granularity that happen infrequently, and then in our hotter loops, we can think about adding some metrics instead so that there is less overhead incurred There are always trade offs, and the techniques here are very complementary You use them together, and this includes other types of instrumentation Just because you’re adding some instrumentation to your code, to your software, doesn’t mean you can’t still use your debugger or tools like strace or other instrumentation tools And it doesn’t mean that you can’t use your compiler to add instrumentation as well But with source instrumentation you can choose what information to expose that is useful to your particular domain So if you’re writing some sort of video processing framework, maybe you’re interested in 
counting things like the number of frames processed If you’re building a database, maybe you’re interested in the number of times a table is accessed, or a row in a table is accessed This is all possible when you actually think about what instrumentation and what information you want out of your code So with that, I’m gonna thank you for coming and I hope you enjoy the rest of your conference (audience applauding) We have 10 minutes for questions and there are microphones if anyone would like, or I am around until Friday so please feel free to come and talk to me Yes – [Man] So I’ve had to look into tracing quite a bit myself and the availability of C++ libraries is poor, I think – I’m sorry, could you just come a little bit closer – [Man] The availability of C++ libraries for tracing is poor at best, I think And I’m not really aware of any metrics libraries like this What’s your experience? Have you got any suggestions? – So the question was, are there any specific examples of metrics libraries and tracing libraries for C++ I left this out of the presentation sort of on purpose, because as the gentleman says, while there is choice, there’s no sort of de facto standard And this is even true for logging libraries There are thousands of logging libraries Every framework has its own logging library Every company I’ve worked in has their own internal logging library, there’s no standard for it

But to directly answer your question, the library, the infrastructure I’ve used the most is a piece of software called Prometheus, and it has a number of C++ clients which let you expose metrics to Prometheus The problem is that these libraries aren’t standard and they’re often specific to the tool you’re using, which is unfortunate And for tracing, there is an evolving open source project called OpenTracing, and that links into a piece of software called Jaeger, which is a distributed tracing system and that has C++ clients So those are two things you could look at But this is an interesting point I think we could do a lot better to try and evolve some libraries, maybe not necessarily in a standard context, but at least as a community, where there are tools which become the de facto standard So if we use a library from here and a library from there, we write some code, we can use a common library to introduce metrics and traces Yeah Yes sir – [Man] Yeah, I guess all three of us had the same question Can you say a little more about this Jaeger library? Does it work with just other processes across C++ or does it work cross language as well as cross process? 
So my company, we use a scripting language in addition to JavaScript, in addition to C++, and having visibility across all three would be great – So the question was specific to Jaeger, and I guess perhaps generally with tracing clients, are there any cross-language clients we could use so you can collect tracing information from different parts of your stack So the answer to that is yes, Jaeger specifically supports pretty much every language Jaeger is an implementation of the OpenTracing standard, and OpenTracing has lots of clients for all different languages, so you could put some traces in your C++ code that actually call operations in a different language and then still link those traces together So I think you mentioned JavaScript I’m not 100% sure, but I would be surprised if there wasn’t a JavaScript client There are definitely things like Python and Ruby and anything like that (audience member speaking unintelligibly) – [Man] Okay Alright I guess just as a comment, we effectively had to hand roll our own instrumentation library And for those who might use Intel’s TBB, they do have an enumerable thread-specific storage counter It kind of does the atomic thing you were talking about – Yeah – [Man] Yeah, so just as an FYI – Yeah – [Man] So thank you – Any other questions? Yes sir – [Man] Can you just spell Prometheus, how is it spelled so that I can look it up? – The Prometheus, what is – How do you spell Prometheus? 
P-R-O-M-etheus (laughing) It’s the same as the film, the alien film, Prometheus It’s a great piece of metrics collection software I quite like it myself Yes sir – [Man] Hey, I think I heard you say that you should be careful not to put instrumentation in everywhere – Yeah – [Man] And while it’s easy to agree with that, I think the recommendation should be instrumentation is a key functionality of your software You cannot operate the software without instrumentation and you should put it everywhere where it’s needed, just like basic functionality of the thing, right If your thing is supposed to compute a square root, you compute the square root – Right – [Man] If you’re supposed to run the software, you instrument it – So the comment was maybe the advice should be you instrument as much as you can and especially things that are important Is that right? – [Man] Well, in a former life I was an SRE, and the advice I would give any developer is you define the SLOs and SLAs for the software that you’re building and then you build the instrumentation that you need to monitor for those SLOs and SLAs Period, right And you want to instrument a little bit more than that

so you can troubleshoot and find out why you’re in violation of your SLOs, but you start from what are the properties of the software, or the system, that you want to maintain, and then you put in all the instrumentation you need to be able to maintain those properties, right If that means you have to buy an extra CPU in AWS or Google Compute Engine in my case, then go ahead and do it – So this is an extremely good point in that sometimes the instrumentation is critical or even a requirement of your software and so any overhead has to be acceptable and you just have to deal with– – [Man] It’s not overhead, right Overhead is stuff that you don’t need In this case, instrumentation is something you need or it’s hard to have the functionality of the system – So this talk is actually about half the size that it started out, and I had a section about exactly this It went into much more detail about the trade-offs between different types of instrumentation, how much overhead is acceptable, and when you might use different things When you need to use instrumentation because it’s a requirement from a customer or you have, for example, an SLA, then how can you make a trace as efficient as possible Unfortunately, that turned out to be far too much to cover in one go Thank you for your question Yes sir – [Man] I was gonna basically agree, but with logging what’s interesting, I think, is that often, at least in my experience, you put logging where you need it to solve a problem You don’t put it in for the future problems you’re going to have So it’s usually too late And I wonder whether tracing, I suppose your comment is you put it everywhere, or for the SLA or however, but it’s a very difficult problem knowing exactly beforehand where to put it I think it’s an interesting topic for people to look at So you could do it at kind of a business level So you say, right, okay, we’ll put logging, tracing and metrics on business terms if you like 
and not do it any lower And this also depends on what you’re using them for Are you using them to monitor or as a debugging tool Logging is often more a debugging tool whereas tracing is more monitoring, and metrics, I think – So the summary of that comment, I think, is that logging is often more geared towards things that you absolutely need to record, like errors And specifically, the gentleman said, logs are for the things that you know you need to record whereas traces are typically more for things you think you might need to record for future issues that you don’t know about And yeah, I completely agree I think the more you start to look at tracing as something distinct from logging, you realize that logging is really just for errors It’s about adding context to errors, maybe warnings as well, but that’s really all you should be using logging for And if you have some customer requirement that you produce a log file in a particular format, then of course, you need to do some logging But I think generally all this sort of info logging, the debug logging, we should be thinking about in a more structured way, perhaps looking at using some tracing instead And one of the ideas of this talk was to try and start a discussion, because I think in the C++ community, we overlook a lot of the problems of actually running C++ in production It’s quite hard work And we worry too much about angle brackets and syntax and the newest feature of lambdas So with that, my time is up, so thank you for coming I’ll see you around (audience applauding)