This blog post is about logging best practices for your web app, mobile app or pretty much any application, with a leaning towards security. I often get asked what should be logged. The short answer is 'everything'. However, throwing in a few lines of code to write log files is one thing; having those logs intelligently presented, prioritising the right information and having someone to actually deal with the alerts is another matter altogether. I'm going to share my thoughts on how to optimise this entire pipeline and get the most out of your application while keeping your users secure.
I'm writing this for a number of reasons, one of them being the number of databases I keep seeing pop up on hacker forums. A lot of startups, and unfortunately even established enterprises, still run applications that are highly vulnerable. That's acceptable; everything is vulnerable. What's unacceptable is not keeping an eye on what's going on: once an application starts being exploited, there is usually nobody at the other end noticing. Trust me, when someone exfiltrates terabytes of data from a database, there are definitely going to be anomalies in the logs! You'd think someone would be there to stop them, right... right? Right??? Anyone? Enough of a rant, let's get straight to it.
When you hand over a completed application to your client, there has to be an operations team in place to attend to security alerts. This could be as little as one guy (say Bob) in a pub, sipping a beer, with a mobile phone that receives alerts (hopefully he's sober enough to take action), or as much as a floor full of dedicated analysts; the same principle applies. A lot of the time, especially with smaller operations, there simply isn't anyone to deal with the logs and alerts. This is the first step. If it isn't in place, you can skip the rest of this post, because even the best logging practices will get you nowhere if there's no one (or nothing: it could be some crafty code you've written to block intruders) to deal with the logs.
So let's start with the basics.
Timestamps: one would think this goes without saying, but you'd be surprised how often the very basics are overlooked. All logs should be timestamped in a universal format. UTC (GMT) is recommended: it can always be converted to your local timezone later, or in the display settings of your log aggregation software. Using UTC everywhere makes correlating events much easier, especially if your servers are spread across multiple timezones.
Next, let's get to some of the HTTP components. The request-response cycle is what your application lives on. Application logs are not to be confused with your HTTP logs, but HTTP logs are immensely important as well and can be a powerful tool for diagnosing a wide range of issues in your app. It is fairly common, and good practice, to place your application behind some sort of reverse proxy. In such a setup, the reverse proxy does the heavy lifting of handling requests from the public, while your application server works in the sheltered neighbourhood behind it. Scalability and security are some of the added advantages, but that's a topic for another day. If you're using a reverse proxy, some data points, such as the source IP of incoming requests and the User-Agent, will have to be forwarded to your application by the proxy. For example, with a correctly configured reverse proxy, the originating IP of the request can be retrieved from the X-Forwarded-For header (which many frameworks expose as HTTP_X_FORWARDED_FOR). The name of the header may vary slightly depending on your reverse proxy of choice and how it's set up, so I won't go into the specifics; there's more than enough literature on the internet to get this working correctly. So, a quick recap: you'll want the IP address and the User-Agent of each incoming request available inside your application.
These two pieces of information should ideally sit alongside every single event that you log in your app.
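Here is a minimal Python sketch of wiring those three basics into the standard `logging` module. It assumes a single reverse proxy at a hypothetical address `10.0.0.5` that appends the client address to `X-Forwarded-For`; the helpers `client_ip` and `request_logger` and the field names `source_ip`/`user_agent` are illustrative, not part of any particular framework. The header is walked from the right, because the left-most entries are whatever the client chose to send, and timestamps are rendered in UTC.

```python
import io
import logging
import time
from typing import Mapping, Optional

# Hypothetical: the address(es) of your own reverse proxies.
TRUSTED_PROXIES = {"10.0.0.5"}


def client_ip(remote_addr: str, forwarded_for: Optional[str]) -> str:
    """Best-effort originating IP for a request that may have come through our proxy."""
    if remote_addr not in TRUSTED_PROXIES or not forwarded_for:
        # Not from our proxy (or no header): the header can't be trusted, use the socket address.
        return remote_addr
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    # Each proxy appends on the right, so walk right-to-left and return the
    # first address that isn't one of ours; left-most entries may be forged.
    for hop in reversed(hops):
        if hop not in TRUSTED_PROXIES:
            return hop
    return hops[0] if hops else remote_addr


class UTCFormatter(logging.Formatter):
    converter = time.gmtime  # render %(asctime)s in UTC rather than server local time


def request_logger(base: logging.Logger, remote_addr: str,
                   headers: Mapping[str, str]) -> logging.LoggerAdapter:
    """Attach source IP and User-Agent to every event logged for this request."""
    return logging.LoggerAdapter(base, {
        "source_ip": client_ip(remote_addr, headers.get("X-Forwarded-For")),
        "user_agent": headers.get("User-Agent", "-"),
    })


# Usage: a client sent a forged X-Forwarded-For, and our proxy appended the real address.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(UTCFormatter(
    "%(asctime)s.%(msecs)03dZ %(levelname)s ip=%(source_ip)s ua=%(user_agent)r %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
))
app_log = logging.getLogger("app")
app_log.addHandler(handler)
app_log.setLevel(logging.INFO)

log = request_logger(app_log, "10.0.0.5", {
    "X-Forwarded-For": "1.2.3.4, 203.0.113.7",
    "User-Agent": "curl/8.0",
})
log.info("profile viewed")
print(stream.getvalue(), end="")
```

The forged `1.2.3.4` is ignored and the logged line carries `ip=203.0.113.7`. Real frameworks usually give you case-insensitive header objects; a plain dict is used here only to keep the sketch self-contained.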
Now that we've got the basics (IP, User-Agent, timestamp), let's get down to some of the specifics. Depending on your type of application, some of these may not apply to you, but it's worth reading through all of them, because someday they might.
If your application has a login page or any other authentication mechanism, log every useful data point around it: the username, the email, whether authentication succeeded or failed, and if it failed, why (never the password itself, of course).
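As a rough illustration, a login-attempt logger might look like the sketch below. The `AuthFailure` reason codes and the `log_auth_attempt` helper are hypothetical; use whatever distinctions your own authentication layer can actually make. Note that the password is deliberately not a parameter, so it can't leak into the logs by accident.

```python
import json
import logging
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

auth_log = logging.getLogger("app.auth")


class AuthFailure(str, Enum):
    """Hypothetical failure reasons; use whatever your auth layer can tell apart."""
    UNKNOWN_USER = "unknown_user"
    BAD_PASSWORD = "bad_password"
    ACCOUNT_LOCKED = "account_locked"
    MFA_FAILED = "mfa_failed"


def log_auth_attempt(username: str, email: Optional[str], source_ip: str, user_agent: str,
                     success: bool, reason: Optional[AuthFailure] = None) -> dict:
    """Log one login attempt. The password is never an argument, so it can't end up in the logs."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "auth.login",
        "username": username,
        "email": email,
        "source_ip": source_ip,
        "user_agent": user_agent,
        "outcome": "success" if success else "failure",
        "failure_reason": None if success else (reason.value if reason else "unspecified"),
    }
    # Failures matter more for security, so they go out one level higher.
    auth_log.log(logging.INFO if success else logging.WARNING, json.dumps(event))
    return event


event = log_auth_attempt("alice", "alice@example.com", "203.0.113.7", "curl/8.0",
                         success=False, reason=AuthFailure.BAD_PASSWORD)
```

The detailed reason belongs in the log; the message shown to the user can stay generic.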
Cookies & Tokens
With the amount of data a modern web application generates, it's worth giving some thought to where all of it goes. If possible, log everything to cold storage, and send the relevant details to a platform for aggregating, alerting and searching. When things blow up, you can break open the cold storage and fish all the extra data out. Storage is pretty cheap nowadays; take advantage of it!
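One way to split 'everything' from 'what's relevant' with Python's standard `logging` is to attach two handlers with different levels to the same logger. In this sketch both sinks are in-memory `StringIO` stand-ins; in practice the first might be a rotating file shipped to cheap storage and the second a forwarder to your aggregation platform (both are assumptions, not prescriptions).

```python
import io
import logging

pipeline_log = logging.getLogger("app.pipeline")
pipeline_log.setLevel(logging.DEBUG)   # let every record through; the handlers decide
pipeline_log.propagate = False         # keep this demo's records away from root handlers

# Stand-ins for real sinks (hypothetical): cold storage and the aggregation platform.
cold_storage = io.StringIO()
aggregator = io.StringIO()

cold_handler = logging.StreamHandler(cold_storage)
cold_handler.setLevel(logging.DEBUG)      # cold storage keeps every detail

agg_handler = logging.StreamHandler(aggregator)
agg_handler.setLevel(logging.WARNING)     # only what's worth alerting on and searching

for h in (cold_handler, agg_handler):
    h.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    pipeline_log.addHandler(h)

pipeline_log.debug("request headers: %s", {"Accept": "text/html"})
pipeline_log.info("GET /profile 200")
pipeline_log.warning("5 failed logins for alice in 60s")
```

All three records land in cold storage, while only the warning reaches the aggregator.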
If your app has user roles, and users are identified by some sort of user ID in the database, it's quite common to have those IDs in URLs. Take, for example, an 'edit user' page at example.com/edit/?user=1. You've obviously (hopefully) built logic into the application that prevents user 1 from accessing example.com/edit/?user=2. However, whenever such a request is made, it should be logged at 'warning' level. If a few more such requests follow, a real-time alert (more on alerts below) should go out to operations personnel: this usually means someone is poking around your app trying to get at things they aren't authorised to access.
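The 'warn on each, alert on a few' rule can be sketched with a sliding window per user and IP. The threshold of 3 denials in 300 seconds, the `can_edit` rule and the `ForbiddenAccessMonitor` class are all hypothetical; tune and adapt them to your own app.

```python
import logging
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple

security_log = logging.getLogger("app.security")

# Hypothetical numbers: tune them against your own traffic.
ALERT_THRESHOLD = 3      # forbidden cross-user requests...
WINDOW_SECONDS = 300     # ...within this sliding window


def can_edit(current_user_id: int, target_user_id: int) -> bool:
    # Hypothetical authorisation rule: users may only edit their own record.
    return current_user_id == target_user_id


class ForbiddenAccessMonitor:
    """Warn on every denied cross-user request; alert when one user/IP racks up several."""

    def __init__(self, alert: Callable[[str], None]):
        self.alert = alert  # e.g. something that pages Bob
        self.denials: Dict[Tuple[int, str], Deque[float]] = defaultdict(deque)

    def record_denial(self, user_id: int, target_id: int, source_ip: str,
                      now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        security_log.warning("user %s denied access to user %s from %s",
                             user_id, target_id, source_ip)
        window = self.denials[(user_id, source_ip)]
        window.append(now)
        # Forget denials that have slid out of the window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= ALERT_THRESHOLD:
            self.alert(f"user {user_id} from {source_ip}: {len(window)} forbidden "
                       f"requests within {WINDOW_SECONDS}s")
            window.clear()  # don't re-page on every following request
            return True
        return False


alerts = []
monitor = ForbiddenAccessMonitor(alerts.append)
for i, target in enumerate([2, 3, 4]):          # user 1 walks through other users' IDs
    if not can_edit(1, target):
        monitor.record_denial(1, target, "203.0.113.7", now=1000.0 + i)
```

Each of the three denials is logged as a warning, and the third one triggers a single alert.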
Database query logs can get huge quickly, so decide what you really want to log. If you can afford it, log all database queries, but aggregate only what's necessary. Also keep an eye on the average volume of data going in and out of your database per unit time. Pretty basic stuff, but it can save a lot of trouble.
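If your database driver offers a hook, logging every query can take only a few lines. Python's built-in `sqlite3` module provides `Connection.set_trace_callback`, used here purely as an example; other drivers, ORMs and database servers have their own mechanisms. Depending on the Python and SQLite versions, the string passed to the callback may have bound parameter values expanded into it, so treat these logs as sensitive data.

```python
import logging
import sqlite3

db_log = logging.getLogger("app.db")
executed = []  # in-memory copy so the demo can inspect what was traced


def trace(statement: str) -> None:
    # Called by sqlite3 for every SQL statement the backend executes.
    executed.append(statement)
    db_log.debug("sql: %s", statement)


conn = sqlite3.connect(":memory:")
conn.set_trace_callback(trace)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))
conn.commit()
conn.close()
```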
Formats, schemas and nomenclature
It's best to use a standard format such as JSON that integrates easily with a variety of systems. As your system grows you'll likely want to aggregate your logs, and having them in JSON will make life a lot easier when building your ingestion pipelines. While schemas are optional, keeping a schema as consistent as possible makes things much easier for the people watching the logs. Field names should most certainly be consistent across all logs: source_ip should not turn into SourceIP or sourceIP elsewhere. Choose one and stick with it throughout, otherwise you'll have people pulling their (and likely your) hair out.
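A small JSON formatter for Python's `logging` is one way to enforce a single set of field names. The `CONTEXT_FIELDS` tuple below is a hypothetical naming choice; the point is that every service imports the same list rather than inventing its own spelling.

```python
import io
import json
import logging
from datetime import datetime, timezone

# One canonical set of context field names, shared by every service (hypothetical choice).
CONTEXT_FIELDS = ("source_ip", "user_agent", "user_id")


class JSONFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        doc = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Pick up context passed via `extra=`, always under the same key names.
        for field in CONTEXT_FIELDS:
            if hasattr(record, field):
                doc[field] = getattr(record, field)
        return json.dumps(doc, sort_keys=True)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JSONFormatter())
json_log = logging.getLogger("app.json")
json_log.addHandler(handler)
json_log.setLevel(logging.INFO)
json_log.propagate = False

json_log.info("password changed", extra={"source_ip": "203.0.113.7", "user_id": 42})
entry = json.loads(stream.getvalue())
```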
Plain text log files are great, and once you've got a command of grep, awk, sed, cut and the like, you'll have complete control over sorting and filtering. Eventually, though, you'll want all your logs in one place, on a more powerful platform. There are plenty of options in this space. If you're new to the game, take a look at the ELK stack (Elasticsearch, Logstash, Kibana), which is great for this sort of thing; go with the open source version if possible. There are other good options out there, such as Grafana Loki for logs, often paired with Prometheus for metrics, so pick your flavour. Aggregation is powerful because it lets you correlate a wide series of events from different logs using different search criteria or pivot points. For example, you could look at a specific window of time and have everything right in front of you: HTTP requests, login attempts and POST data for that window can all be easily accessed and searched.
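To make the 'window of time' idea concrete, here is a stdlib-only sketch of what an aggregation platform does for you at scale: merge JSON-lines logs from several sources and pull out everything for one IP within a time window. The `events_in_window` helper and the sample records are hypothetical; the consistent UTC timestamps and `source_ip` field are what make the merge possible.

```python
import json
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


def events_in_window(sources: Dict[str, Iterable[str]], start: datetime, end: datetime,
                     source_ip: Optional[str] = None) -> List[Tuple[datetime, str, dict]]:
    """Merge JSON-lines logs from several sources; return events in [start, end), oldest first."""
    hits = []
    for name, lines in sources.items():
        for line in lines:
            event = json.loads(line)
            ts = datetime.fromisoformat(event["timestamp"])  # timezone-aware UTC timestamps
            if start <= ts < end and (source_ip is None or event.get("source_ip") == source_ip):
                hits.append((ts, name, event))
    hits.sort(key=lambda hit: hit[0])
    return hits


http_log = [
    '{"timestamp": "2024-05-01T10:00:05+00:00", "source_ip": "203.0.113.7", "message": "POST /login 401"}',
    '{"timestamp": "2024-05-01T10:02:00+00:00", "source_ip": "198.51.100.9", "message": "GET / 200"}',
]
auth_log = [
    '{"timestamp": "2024-05-01T10:00:04+00:00", "source_ip": "203.0.113.7", "message": "login failed for alice: bad_password"}',
]
timeline = events_in_window(
    {"http": http_log, "auth": auth_log},
    start=datetime.fromisoformat("2024-05-01T10:00:00+00:00"),
    end=datetime.fromisoformat("2024-05-01T10:01:00+00:00"),
    source_ip="203.0.113.7",
)
for ts, name, event in timeline:
    print(ts.isoformat(), name, event["message"])
```

The failed login from the auth log and the 401 from the HTTP log come out interleaved in time order, which is exactly the kind of pivot a platform like Kibana lets you do interactively.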
Getting it all together
Aggregating those logs gives you more power. For example, you can list the number of HTTP requests per unit time from a given User-Agent and IP address combination, and plot it graphically in Kibana or the equivalent visualisation tool in other stacks. Watch your logs for a few days, then write simple alerts. An alert can be as simple as: if the volume of data in HTTP responses to a certain IP exceeds N bytes per unit time (a threshold you decide), raise an alert to an operator. You'd be surprised how far simple stuff like this can get you. Once you get a feel for what abuse of your application looks like (call your hacker buddies to have a go at your app, and be sure to pay them too), by all means go all out and write more advanced rules. You can also put a good open source Web Application Firewall such as ModSecurity in your reverse proxy for some additional protection and logging.
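The bytes-per-IP alert described above can be sketched in a few lines. The input shape `(unix_time, source_ip, response_bytes)`, the one-minute bucket and the 50 MB threshold are all hypothetical; pick the window and threshold after watching your own traffic for a few days.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical threshold: tune it after watching your own traffic.
BYTES_PER_MINUTE_THRESHOLD = 50_000_000


def heavy_downloaders(records: Iterable[Tuple[float, str, int]],
                      threshold: int = BYTES_PER_MINUTE_THRESHOLD) -> Dict[Tuple[str, int], int]:
    """Return {(source_ip, minute_start): total_bytes} for every minute an IP exceeded `threshold`."""
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for ts, ip, nbytes in records:
        minute = int(ts) - int(ts) % 60   # bucket into fixed one-minute windows
        totals[(ip, minute)] += nbytes
    return {key: total for key, total in totals.items() if total > threshold}


records = [
    (1714557600, "203.0.113.7", 30_000_000),
    (1714557630, "203.0.113.7", 30_000_000),
    (1714557640, "198.51.100.9", 2_000),
]
heavy = heavy_downloaders(records)
# → {('203.0.113.7', 1714557600): 60000000}
```

Each key returned is a candidate for an alert to Bob; fixed windows are the simplest choice, and a sliding window would catch bursts that straddle a minute boundary.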
Don't think that fending off evil hackers on the internet is a deep, mystical dark art, or that you need an office full of security analysts. It just takes a bit of understanding of how things work and some clever mechanisms in place. Yes, good old Bob sipping his pint at the bar can thwart an attack (assuming he's sober enough, again) that could otherwise cause your company, or your client's company, to close down. Devs, please advise your clients on this as well; it's in your direct business interest. If your clients get hacked and have to shut up shop, that's less business for you. If the application does critical things, like handling people's personal data or controlling equipment in the physical world, then you should probably have a dedicated security person (someone who really knows their stuff) on both the development team and the operations team.
Did I miss anything? Surely I must have. Go on, you know what to do: put it in the comments below and save someone their company.