
Deliver decompressed Amazon CloudWatch Logs to Amazon S3 and Splunk using Amazon Data Firehose


You can use Amazon Data Firehose to aggregate and deliver log events from your applications and services captured in Amazon CloudWatch Logs to your Amazon Simple Storage Service (Amazon S3) bucket and Splunk destinations, for use cases such as data analytics, security analysis, application troubleshooting, and so on. By default, CloudWatch Logs are delivered as gzip-compressed objects. You might want the data to be decompressed, or want logs to be delivered to Splunk, which requires decompressed data input, for application monitoring and auditing.

AWS released a feature to support decompression of CloudWatch Logs in Firehose. With this new feature, you can specify an option in Firehose to decompress CloudWatch Logs. You no longer have to perform additional processing using AWS Lambda or post-processing to get decompressed logs, and you can deliver decompressed data to Splunk. Additionally, you can use optional Firehose features such as record format conversion to convert CloudWatch Logs to Parquet or ORC, and dynamic partitioning to automatically group streaming records based on keys in the data (for example, by month) and deliver the grouped records to corresponding Amazon S3 prefixes.

In this post, we look at how to enable the decompression feature for Splunk and Amazon S3 destinations. We start with Splunk and then Amazon S3 for new streams, and then we address migration steps to take advantage of this feature and simplify your existing pipeline.

Decompress CloudWatch Logs for Splunk

You can use subscription filters in CloudWatch log groups to ingest data directly to Firehose or through Amazon Kinesis Data Streams.
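If you prefer to set up the subscription programmatically, the following sketch uses the AWS SDK for Python (Boto3). The log group name, filter name, stream ARN, and role ARN are placeholders for your own resources.

    import boto3

    logs = boto3.client("logs")

    # Subscribe a CloudWatch log group directly to a Firehose stream. The log group,
    # filter name, stream ARN, and role ARN below are placeholders; the role must
    # allow CloudWatch Logs to call firehose:PutRecord and firehose:PutRecordBatch.
    logs.put_subscription_filter(
        logGroupName="/aws/cloudtrail/example-log-group",  # hypothetical log group
        filterName="firehose-delivery",
        filterPattern="",  # an empty pattern forwards all log events
        destinationArn="arn:aws:firehose:us-east-1:111122223333:deliverystream/cw-logs-to-splunk",
        roleArn="arn:aws:iam::111122223333:role/CWLtoFirehoseRole",
    )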

Note: For the CloudWatch Logs decompression feature, you need an HTTP Event Collector (HEC) data input created in Splunk, with indexer acknowledgement enabled and the source type set. This is required to map to the right source type for the decompressed logs. When creating the HEC input, include the source type mapping (for example, aws:cloudtrail).

To create a Firehose delivery stream for the decompression feature, complete the following steps (a programmatic sketch follows the steps):

  1. Provide your destination settings and select Raw endpoint as the endpoint type.

You can use a raw endpoint for the decompression feature to ingest both raw and JSON-formatted event data into Splunk. For example, VPC Flow Logs data is raw data, and AWS CloudTrail data is in JSON format.

  2. Enter the HEC token for Authentication token.
  3. To enable the decompression feature, deselect Transform source records with AWS Lambda under Transform records.
  4. Select Turn on decompression and Turn on message extraction for Decompress source records from Amazon CloudWatch Logs.
  5. Select Turn on message extraction for the Splunk destination.
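If you are configuring the delivery stream with the API or SDK instead of the console, the following Boto3 sketch shows roughly how the same configuration could look. The HEC endpoint, token, role, and backup bucket are placeholders, and the Decompression and CloudWatchLogProcessing (message extraction) processor names and the DataMessageExtraction parameter reflect our reading of the Firehose API for this feature; verify them against the current API reference.

    import boto3

    firehose = boto3.client("firehose")

    # Minimal sketch of the equivalent API call. Endpoint, token, role, and bucket
    # values are placeholders. The processor type and parameter names are assumptions
    # about how the decompression feature is exposed in the API.
    firehose.create_delivery_stream(
        DeliveryStreamName="cw-logs-to-splunk",
        DeliveryStreamType="DirectPut",
        SplunkDestinationConfiguration={
            "HECEndpoint": "https://splunk.example.com:8088",    # hypothetical HEC endpoint
            "HECEndpointType": "Raw",                            # raw endpoint, as in step 1
            "HECToken": "00000000-0000-0000-0000-000000000000",  # placeholder HEC token
            "S3BackupMode": "FailedEventsOnly",
            "S3Configuration": {
                "RoleARN": "arn:aws:iam::111122223333:role/FirehoseBackupRole",
                "BucketARN": "arn:aws:s3:::example-firehose-backup-bucket",
            },
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [
                    {"Type": "Decompression", "Parameters": []},
                    {"Type": "CloudWatchLogProcessing",          # message extraction
                     "Parameters": [{"ParameterName": "DataMessageExtraction",
                                     "ParameterValue": "true"}]},
                ],
            },
        },
    )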

Message extraction feature

After decompression, CloudWatch Logs are in JSON format, as shown in the following figure. You can see that the decompressed data has metadata information such as logGroup, logStream, and subscriptionFilters, and the actual data is included within the message field under logEvents (the following example shows CloudTrail events in the CloudWatch Logs).

When you enable message extraction, Firehose extracts just the contents of the message fields and concatenates them with a new line between them, as shown in the following figure. With the CloudWatch Logs metadata filtered out by this feature, Splunk will successfully parse the actual log data and map it to the source type configured in the HEC token.
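To make the behavior concrete, the following standalone snippet (not Firehose code) mimics what message extraction does to a decompressed CloudWatch Logs record: it keeps only the logEvents message contents and joins them with new lines. The sample values are illustrative.

    # A decompressed CloudWatch Logs record: subscription metadata plus the actual
    # events under logEvents[].message (CloudTrail events would be JSON strings here).
    decompressed_record = {
        "messageType": "DATA_MESSAGE",
        "owner": "111122223333",
        "logGroup": "/aws/cloudtrail/example-log-group",
        "logStream": "111122223333_CloudTrail_us-east-1",
        "subscriptionFilters": ["firehose-delivery"],
        "logEvents": [
            {"id": "1", "timestamp": 1712050000000, "message": '{"eventName": "ConsoleLogin"}'},
            {"id": "2", "timestamp": 1712050001000, "message": '{"eventName": "AssumeRole"}'},
        ],
    }

    # What message extraction effectively produces: only the message contents,
    # newline-delimited, with the CloudWatch Logs metadata stripped away.
    extracted = "\n".join(event["message"] for event in decompressed_record["logEvents"])
    print(extracted)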

Additionally, if you want to deliver these CloudWatch events to your Splunk destination in real time, you can use zero buffering, a new feature that was recently launched in Firehose. You can set 0 seconds as the buffer interval, or any interval between 0–60 seconds, to deliver data to the Splunk destination within seconds.
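On an existing stream, you could adjust the buffer interval with UpdateDestination, as in the following sketch. Exposing zero buffering as BufferingHints on the Splunk destination update is our assumption of the API shape, so confirm it in the API reference before relying on it.

    import boto3

    firehose = boto3.client("firehose")

    # Look up the current version and destination IDs, then set the Splunk buffer
    # interval to 0 seconds. The BufferingHints field on the Splunk destination
    # update is an assumption here -- check the current API reference.
    desc = firehose.describe_delivery_stream(DeliveryStreamName="cw-logs-to-splunk")
    stream = desc["DeliveryStreamDescription"]

    firehose.update_destination(
        DeliveryStreamName="cw-logs-to-splunk",
        CurrentDeliveryStreamVersionId=stream["VersionId"],
        DestinationId=stream["Destinations"][0]["DestinationId"],
        SplunkDestinationUpdate={"BufferingHints": {"IntervalInSeconds": 0, "SizeInMBs": 1}},
    )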

With these settings, you can now seamlessly ingest decompressed CloudWatch log data into Splunk using Firehose.

Decompress CloudWatch Logs for Amazon S3

The CloudWatch Logs decompression feature for an Amazon S3 destination works similarly to Splunk: you turn off data transformation using Lambda and turn on the decompression and message extraction options. You can use the decompression feature to write the log data as a text file to the Amazon S3 destination, or use it with other Amazon S3 destination features such as record format conversion to Parquet or ORC, or dynamic partitioning to partition the data.
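The following Boto3 sketch shows an S3 destination configured this way; the bucket and role ARNs are placeholders, and the processor names carry the same caveat as the Splunk example.

    import boto3

    firehose = boto3.client("firehose")

    # Sketch of an S3 destination with decompression and message extraction turned
    # on, so objects land as plain text rather than gzip. Bucket and role ARNs are
    # placeholders, and the processor type names are assumptions to verify.
    firehose.create_delivery_stream(
        DeliveryStreamName="cw-logs-to-s3",
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::111122223333:role/FirehoseS3Role",
            "BucketARN": "arn:aws:s3:::example-decompressed-logs-bucket",
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [
                    {"Type": "Decompression", "Parameters": []},
                    {"Type": "CloudWatchLogProcessing",  # message extraction
                     "Parameters": [{"ParameterName": "DataMessageExtraction",
                                     "ParameterValue": "true"}]},
                ],
            },
        },
    )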

Dynamic partitioning with decompression

For the Amazon S3 destination, Firehose supports dynamic partitioning, which lets you continuously partition streaming data by using keys within the data, and then deliver the data grouped by these keys into corresponding Amazon S3 prefixes. This enables you to run high-performance, cost-efficient analytics on streaming data in Amazon S3 using services such as Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, and Amazon QuickSight. Partitioning your data minimizes the amount of data scanned, optimizes performance, and reduces the cost of your analytics queries on Amazon S3.

With the new decompression feature, you can perform dynamic partitioning without any Lambda function for mapping the partitioning keys on CloudWatch Logs. You can enable the Inline parsing for JSON option, scan the decompressed log data, and select the partitioning keys. The following screenshot shows an example where inline parsing is enabled for CloudTrail log data with a partitioning schema selected for account ID and AWS Region in the CloudTrail record.
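Expressed as configuration, the partitioning setup might look like the following fragment, where the MetadataExtraction processor performs the inline JSON parsing and the prefix references the extracted keys. The JQ query and prefix are illustrative and assume CloudTrail's recipientAccountId and awsRegion fields.

    # Fragment of an ExtendedS3DestinationConfiguration with dynamic partitioning
    # on the decompressed CloudTrail records. The JQ query and prefix below are
    # illustrative; adjust the key names to your own partitioning schema.
    extended_s3_config_fragment = {
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "Prefix": "accountId=!{partitionKeyFromQuery:accountId}/"
                  "region=!{partitionKeyFromQuery:region}/",
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {"Type": "Decompression", "Parameters": []},
                {"Type": "CloudWatchLogProcessing",
                 "Parameters": [{"ParameterName": "DataMessageExtraction",
                                 "ParameterValue": "true"}]},
                {"Type": "MetadataExtraction",  # inline parsing for JSON
                 "Parameters": [
                     {"ParameterName": "MetadataExtractionQuery",
                      "ParameterValue": "{accountId: .recipientAccountId, region: .awsRegion}"},
                     {"ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6"},
                 ]},
            ],
        },
    }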

Record format conversion with decompression

For CloudWatch Logs data, you can use the record format conversion feature on decompressed data for the Amazon S3 destination. Firehose can convert the input data format from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3. Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON. You can use the record format conversion features under the Transform and convert records settings to convert the CloudWatch log data to Parquet or ORC format. The following screenshot shows an example of record format conversion settings for Parquet format using an AWS Glue schema and table for CloudTrail log data. When the dynamic partitioning settings are configured, record format conversion works along with dynamic partitioning to create the files in the output format with a partition folder structure in the target S3 bucket.
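In API terms, record format conversion is expressed with a DataFormatConversionConfiguration block, as in the following fragment. The AWS Glue database, table, and role names are placeholders for resources you would create beforehand.

    # Fragment showing record format conversion to Parquet using a Glue table that
    # describes the CloudTrail record schema. Database, table, and role names are
    # placeholders for resources created beforehand.
    data_format_conversion_fragment = {
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "SchemaConfiguration": {
                "DatabaseName": "cloudtrail_db",   # hypothetical Glue database
                "TableName": "cloudtrail_logs",    # hypothetical Glue table
                "RoleARN": "arn:aws:iam::111122223333:role/FirehoseGlueRole",
                "Region": "us-east-1",
            },
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
        },
    }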

Migrate existing delivery streams for decompression

If you want to migrate an existing Firehose stream that uses Lambda for decompression to this new decompression feature of Firehose, refer to the steps outlined in Enabling and disabling decompression.

Pricing

The Firehose decompression feature decompresses the data and charges per GB of decompressed data. To understand decompression pricing, refer to Amazon Data Firehose pricing.

Clean up

To avoid incurring future charges, delete the resources you created in the following order (a scripted sketch follows the list):

  1. Delete the CloudWatch Logs subscription filter.
  2. Delete the Firehose delivery stream.
  3. Delete the S3 buckets.
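A scripted clean-up could look like the following sketch, reusing the placeholder names from the earlier examples.

    import boto3

    logs = boto3.client("logs")
    firehose = boto3.client("firehose")
    s3 = boto3.resource("s3")

    # 1. Remove the subscription filter from the log group.
    logs.delete_subscription_filter(
        logGroupName="/aws/cloudtrail/example-log-group",
        filterName="firehose-delivery",
    )

    # 2. Delete the Firehose delivery stream.
    firehose.delete_delivery_stream(DeliveryStreamName="cw-logs-to-splunk")

    # 3. Empty and delete the destination/backup S3 bucket.
    bucket = s3.Bucket("example-decompressed-logs-bucket")
    bucket.objects.all().delete()
    bucket.delete()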

Conclusion

The decompression and message extraction feature of Firehose simplifies delivery of CloudWatch Logs to Amazon S3 and Splunk destinations without requiring any code development or additional processing. For an Amazon S3 destination, you can use Parquet or ORC conversion and dynamic partitioning capabilities on decompressed data.

For more information, refer to the following resources:


About the Authors

Ranjit Kalidasan is a Senior Solutions Architect with Amazon Web Services based in Boston, Massachusetts. He is a Partner Solutions Architect helping security ISV partners co-build and co-market solutions with AWS. He brings over 25 years of experience in information technology helping global customers implement complex solutions for security and analytics. You can connect with Ranjit on LinkedIn.

Phaneendra Vuliyaragoli is a Product Management Lead for Amazon Data Firehose at AWS. In this role, Phaneendra leads the product and go-to-market strategy for Amazon Data Firehose.
