In Out Code

How to List AWS S3 Bucket Names and Prefixes

June 19, 2020 by Daniel Andrews

If you need to find out which buckets and objects are in your S3 data lake, the quickest route is usually the AWS SDK or the AWS CLI.

In this post, we'll use boto3, the AWS SDK for Python, to achieve two goals:

  1. Find all bucket names and prefixes
  2. Find all bucket names and keys

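Prerequisites:

  1. Configure AWS credentials on your machine (for example, run aws configure after installing the AWS CLI); boto3 reads the same credentials
  2. Install Python 3 (the scripts use f-strings)
  3. Install boto3 (run: pip install boto3)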

Find All Bucket Names and Prefixes

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client("s3")
s3_resource = boto3.resource("s3")

# list_objects_v2 returns at most 1,000 entries per call, so use a paginator
paginator = s3_client.get_paginator("list_objects_v2")


def get_common_prefixes(bucket):
    """
    Generate all top-level CommonPrefixes entries in an S3 bucket.

    :param bucket: Name of the S3 bucket.
    """
    # Delimiter='/' makes S3 roll keys up into folder-like prefixes
    kwargs = {"Bucket": bucket, "Delimiter": "/"}

    for page in paginator.paginate(**kwargs):
        # A page can hold only objects (Contents) and no CommonPrefixes,
        # so skip such pages instead of stopping the whole listing
        for obj in page.get("CommonPrefixes", []):
            yield obj


def get_matching_s3_prefixes(bucket):
    """
    Retrieve just the Prefix string from each CommonPrefixes entry.

    :param bucket: Name of the S3 bucket.
    """
    for obj in get_common_prefixes(bucket):
        yield obj["Prefix"]


def main():
    for bucket in s3_resource.buckets.all():
        try:
            for prefix in get_matching_s3_prefixes(bucket.name):
                # Top-level prefixes end in "/", e.g. "raw/" -> "raw"
                prefix = prefix.rstrip("/")
                print(f"{bucket.name},{prefix}")
        except ClientError:
            # e.g. AccessDenied on buckets your credentials cannot list
            print(f"Cannot access bucket: {bucket.name}")


if __name__ == "__main__":
    main()
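
If the Delimiter parameter seems opaque, remember that S3 has a flat namespace: "folders" are just shared key prefixes. With Delimiter="/" and no Prefix, every key containing a "/" is rolled up into a CommonPrefixes entry made of everything up to and including the first "/", while keys without one come back individually under Contents. That is also why the script removes the trailing "/" before printing. The pure-Python sketch below mimics that grouping on a made-up list of keys; group_by_delimiter is a hypothetical helper for illustration only (S3 does this server-side, not boto3), and it ignores pagination.

```python
def group_by_delimiter(keys, prefix="", delimiter="/"):
    """Mimic how list_objects_v2 splits keys into Contents and CommonPrefixes.

    Hypothetical helper for illustration; the real grouping happens in S3.
    """
    contents = []
    common_prefixes = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # only keys under the requested Prefix are considered
        remainder = key[len(prefix):]
        idx = remainder.find(delimiter)
        if idx == -1:
            contents.append(key)  # no delimiter after the prefix: a plain object
        else:
            # Roll everything up to and including the first delimiter into one prefix
            common_prefixes.add(prefix + remainder[: idx + len(delimiter)])
    return sorted(contents), sorted(common_prefixes)


keys = [
    "readme.txt",
    "raw/2020/06/19/events.json",
    "raw/2020/06/20/events.json",
    "curated/sales.parquet",
]
contents, prefixes = group_by_delimiter(keys)
print(contents)  # ['readme.txt']
print(prefixes)  # ['curated/', 'raw/']

# Passing a Prefix drills down one level, like opening a folder
print(group_by_delimiter(keys, prefix="raw/"))  # ([], ['raw/2020/'])
```

To list deeper levels with the real API, the same idea applies: pass one of the returned prefixes as Prefix alongside Delimiter="/".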

Find All Bucket Names and Keys

import boto3

s3 = boto3.client("s3")
# list_objects_v2 returns at most 1,000 keys per call, so use a paginator
paginator = s3.get_paginator("list_objects_v2")


def get_matching_s3_objects(bucket):
    """
    Generate objects in an S3 bucket.

    :param bucket: Name of the S3 bucket.
    """
    kwargs = {"Bucket": bucket}

    for page in paginator.paginate(**kwargs):
        # An empty bucket returns a page with no "Contents" key
        for obj in page.get("Contents", []):
            yield obj


def get_matching_s3_keys(bucket):
    """
    Generate the keys in an S3 bucket.

    :param bucket: Name of the S3 bucket.
    """
    for obj in get_matching_s3_objects(bucket):
        yield obj["Key"]


def main():
    # Replace with the name of a bucket your credentials can read
    bucket = "voyager-demo-curated"
    for key in get_matching_s3_keys(bucket):
        print(f"{bucket}/{key}")


if __name__ == "__main__":
    main()
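
The script above reads a single hard-coded bucket, although the section heading promises keys for all buckets. A minimal sketch of the all-buckets version follows, assuming the same credentials setup. It uses the client's list_buckets call and the list_objects_v2 paginator, and writes rows with Python's csv module so that keys containing commas are quoted correctly (a plain f"{bucket},{key}" line would be ambiguous in that case). The function names iter_all_keys and write_keys_csv are hypothetical, and the client is passed in as a parameter so the logic can be exercised with a stand-in object.

```python
import csv
import sys


def iter_all_keys(s3_client):
    """Yield (bucket_name, key) for every object in every bucket from list_buckets."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for bucket in s3_client.list_buckets()["Buckets"]:
        name = bucket["Name"]
        for page in paginator.paginate(Bucket=name):
            # Pages for an empty bucket have no "Contents" key
            for obj in page.get("Contents", []):
                yield name, obj["Key"]


def write_keys_csv(rows, out=None):
    """Write (bucket, key) rows as CSV; csv.writer quotes keys containing commas."""
    writer = csv.writer(out if out is not None else sys.stdout)
    writer.writerow(["bucket", "key"])
    writer.writerows(rows)
```

With boto3 installed, write_keys_csv(iter_all_keys(boto3.client("s3"))) prints the full inventory to standard output. Unlike the prefix script, this sketch stops at the first bucket it cannot read; wrapping the inner loop in try/except botocore.exceptions.ClientError would skip such buckets instead.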

Category: AWS | Tags: AWS S3, boto3

About Daniel Andrews

Passionate about all things data and cloud. Specializing in Python, AWS and DevOps, with a Master's degree in Data Science from City University, London, and a BSc in Computer Science.
