Category: Data, firewall

Let me start with an example use case for S3 Select. Suppose you need to analyze data stored in an S3 bucket in CSV or JSON format, and new data is uploaded every day as a new GZIP- or BZIP2-compressed CSV/JSON file. Without S3 Select, you would need to download, decompress, and process the entire file to get the data you need.

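With S3 Select, you can instead use a simple SQL expression to retrieve only the subset of an object's data that you need. This means you’re dealing with an order of magnitude less data, which in turn can dramatically improve the performance and reduce the cost of applications that need to access data in S3.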
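To make this concrete, here is a minimal sketch of such a query using boto3's `select_object_content`. The bucket name, object key, and column names (`name`, `city`) are hypothetical placeholders, and the sketch assumes a GZIP-compressed CSV with a header row. Building the request is kept separate from sending it, so the parameters can be inspected without AWS credentials.

```python
def build_select_request(bucket, key, expression, compression="GZIP"):
    """Build the keyword arguments for an S3 Select query.

    Assumes the object is a compressed CSV whose first line is a header row.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            # "USE" lets the SQL expression refer to columns by header name.
            "CSV": {"FileHeaderInfo": "USE"},
            # "GZIP" or "BZIP2", matching how the object was compressed.
            "CompressionType": compression,
        },
        "OutputSerialization": {"CSV": {}},
    }


def run_select(request):
    """Send the query and return the matching rows as one string."""
    import boto3  # imported here so building a request needs no AWS setup

    s3 = boto3.client("s3")
    response = s3.select_object_content(**request)
    chunks = []
    # The response payload is an event stream; only "Records" events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


# Hypothetical bucket, key, and columns, for illustration only.
request = build_select_request(
    bucket="my-bucket",
    key="data/2018-01-01.csv.gz",
    expression="SELECT s.name, s.city FROM S3Object s WHERE s.city = 'Lagos'",
)
# rows = run_select(request)  # requires AWS credentials and a real object
```

Only the rows and columns matched by the expression travel over the network; decompression and filtering happen on the S3 side.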

In AWS's words: “By reducing the volume of data that has to be loaded and processed by your applications, S3 Select can improve the performance of most applications that frequently access data from S3 by up to 400%.”

Here is what I find very interesting about S3 Select: you can specify the format in which you want your output using the OutputSerialization parameter. For the example above I specified CSV, but if I wanted the output in JSON format I could simply have changed that parameter.

The ability to retrieve only part of an object comes in very handy when building serverless applications with AWS Lambda.
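As a rough illustration of switching the output format, the sketch below builds the `OutputSerialization` value for either CSV or JSON and parses a JSON result. The helper names and the sample payload are made up for this example; with JSON output and a newline record delimiter, each matching record comes back as one JSON document per line.

```python
import json


def output_serialization(fmt):
    """Return an OutputSerialization value for select_object_content."""
    fmt = fmt.upper()
    if fmt == "CSV":
        return {"CSV": {}}
    if fmt == "JSON":
        # One JSON document per line in the result.
        return {"JSON": {"RecordDelimiter": "\n"}}
    raise ValueError("unsupported output format: " + fmt)


def parse_json_records(payload):
    """Turn newline-delimited JSON text into a list of dicts."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


# Made-up payload standing in for the decoded "Records" events of a real query.
sample_payload = '{"name": "Ada", "city": "Lagos"}\n{"name": "Bo", "city": "Lagos"}\n'
records = parse_json_records(sample_payload)
```

In a Lambda function this keeps the handler small: it receives already-filtered records as dictionaries instead of downloading and scanning the whole object.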
