
Sending MongoDB Atlas logs to Logz.io

Updated: Dec 26, 2022

Send MongoDB Atlas cluster logs to Logz.io periodically through a scheduled AWS Lambda function.



MongoDB Atlas retains 30 days of log activity for the mongod/mongos processes, as well as audit logs, for each cluster you own. These logs can be retrieved from the Atlas console or through the API.


I wanted to view my MongoDB logs in Logz.io, just like all the other logs we collect, so all our data can be analyzed and visualized together. I had already used an AWS Lambda function to send RDS logs from CloudWatch to Logz.io, but couldn't find a solution for retrieving logs from Atlas. Since I already had a solution for forwarding CloudWatch logs, all I had to do was figure out how to get the Atlas logs there; the rest was already covered.


I figured that I needed to do the following:

  1. Get credentials for the Atlas API and place them in a secure location in AWS.

  2. Write a Lambda function that periodically retrieves the logs from MongoDB Atlas and prints them to the Lambda's CloudWatch logs. I prefer this over an EC2 instance with a bash script and a cron job, for ease of deployment and operation.

  3. Ship all logs from that CloudWatch log group to Logz.io using their Lambda function, packaged as a CloudFormation template here.


I mapped out the architecture using a CloudSkew diagram:


Architecture diagram


I chose to write my Lambda as an AWS SAM application because I wanted to easily commit it to Git, configure a CI/CD pipeline if needed, and deploy the EventBridge schedule and all required dependencies as part of the app.

The following requirements must be met:

  1. Your AWS infrastructure is connected to MongoDB Atlas through a VPC peering connection, or the cluster has public IPs.

  2. You have the privileges required to create an API key in your MongoDB Atlas organization.

  3. You have installed and configured awscli, and installed aws-sam.

  4. You have sufficient permissions on your AWS account to create the resources in this guide.

When everything was ready I could start working.


Step 1: Generate Atlas API Credentials

Start by generating MongoDB Atlas API public and private keys. These credentials are secrets and should not be stored with our code, so I created a new SecureString parameter in AWS Systems Manager Parameter Store and placed them there, along with the cluster ID, in the following format:

{
  "cluster_id": "cluster id",
  "public_key": "public api key from atlas",
  "private_key": "private api key from atlas"
}

Our function will have permission to retrieve this parameter when needed.
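
If you want to script this step, here is a minimal sketch of creating that parameter with boto3, assuming the parameter path /secrets/atlas (the same path the Lambda code reads later) and your own key values:

import json
import boto3

ssm = boto3.client("ssm")

# Store the Atlas credentials as a single encrypted (SecureString) parameter.
# The path "/secrets/atlas" is an assumption that matches what the Lambda
# reads below; adjust it if you keep your secrets elsewhere.
ssm.put_parameter(
    Name="/secrets/atlas",
    Type="SecureString",
    Value=json.dumps({
        "cluster_id": "cluster id",
        "public_key": "public api key from atlas",
        "private_key": "private api key from atlas",
    }),
    Overwrite=True,
)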


Step 2: Create your SAM App

I ran the following command and configured a Python 3.8 app based on a Docker image.

sam init

After that I started editing the template.yaml file, which describes all the app resources and config, and did the following:

  1. Changed the function timeout to 60 seconds. The API call was taking around 10–15 seconds to retrieve the .gz file, and the default 5-second timeout wasn't enough. When retrieving more than an hour of logs, you might want to raise this number even higher.

  2. Changed the trigger event to a Schedule. This creates an EventBridge schedule instead of the default API Gateway. I added a cron schedule for every hour. You can change this schedule to be whatever you need.

  3. Added an inline policy to the function. This policy allowed my function to retrieve the secret parameter from SSM.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  python3.8

  Sample SAM Template for atlas-cloudwatch-logs

# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
  Function:
    Timeout: 60

Resources:
  AtlasCloudwatchFunction:
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
    Properties:
      PackageType: Image
      Events:
        AtlasCloudwatchScrapeSchedule:
          Type: Schedule
          Properties:
            Schedule: 'cron(0 * * * ? *)'
            Enabled: True
      Policies:
      - Statement:
        - Sid: SSMDescribeParametersPolicy
          Effect: Allow
          Action:
          - ssm:DescribeParameters
          Resource: '*'
        - Sid: SSMGetParameterPolicy
          Effect: Allow
          Action:
          - ssm:GetParameters
          - ssm:GetParameter
          - ssm:GetParametersByPath
          Resource: 'arn:aws:ssm:<region>:<account-id>:parameter/*'
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./atlas_cloudwatch
      DockerTag: python3.8-v1


Outputs:
  # ServerlessRestApi is an implicit API created out of Events key under Serverless::Function
  # Find out more about other implicit resources you can reference within SAM
  # https://github.com/awslabs/serverless-application-model/blob/master/docs/internals/generated_resources.rst#api
  AtlasCloudwatchFunction:
    Description: "atlas cloudwatch Lambda Function ARN"
    Value: !GetAtt AtlasCloudwatchFunction.Arn
  AtlasCloudwatchFunctionIamRole:
    Description: "Implicit IAM Role created for atlas cloudwatch function"
    Value: !GetAtt AtlasCloudwatchFunctionRole.Arn

After getting that out of the way, I could move on to the Python code.

  • I first wrote a function to retrieve the secrets from SSM.

  • I then defined the secrets as local variables, and defined the node names as well.

  • We will be fetching logs from each node, so all the nodes in your cluster must be listed here. They could also be fetched from the Atlas API instead of being hard-coded; see the sketch after the handler code below.

  • I computed the start and end times of the logs I wanted using time.time(), which returns seconds since the UNIX epoch; this is the format the Atlas API expects for its startDate and endDate params when fetching logs.

  • I looped through all nodes, and made a GET request to the Atlas API. More info about the API endpoint I used can be found here. I took the mongodb.gz log, because I found it to include all the logs I needed. You might choose to request a different log, or multiple logs.

  • I decompressed the gz file using the gzip library, and printed it out. This made the logs appear in the Lambda’s Cloudwatch log group.

  • Finally, I returned a 200 status code to signal the function has succeeded.

import json
import boto3
import requests
import gzip
import time
from requests.auth import HTTPDigestAuth

ssm = boto3.client("ssm")

# Function for retrieving secrets from a given path in ssm
def get_ssm_secret(parameter_name):
    return ssm.get_parameter(
        Name=parameter_name,
        WithDecryption=True
    )


def lambda_handler(event, context):

    # Retrieve secrets from ssm
    secrets_location = "/secrets/atlas"
    secret = get_ssm_secret(secrets_location)
    ssm_params = json.loads(secret.get("Parameter").get("Value"))

    # Define secrets as local vars
    CLUSTER_ID = ssm_params["cluster_id"]
    PUBLIC_KEY = ssm_params["public_key"]
    PRIVATE_KEY = ssm_params["private_key"]

    # Enter the nodes in your cluster for log collection
    nodes = ["xxxx-00-00.mlmu7.mongodb.net",
             "xxxx-00-01.mlmu7.mongodb.net",
             "xxxx-00-02.mlmu7.mongodb.net"]

    # Find seconds since unix epoch for atlas logs startTime and endTime
    now = int(time.time())
    hour_ago = now - 3600

    # Loop through nodes and receive it's audit logs
    # Other logs are available (mongos-audit-log.gz, mongos.gz, mongodb.gz), but data is irrelevant
    for node in nodes:
        url = f"https://cloud.mongodb.com/api/atlas/v1.0/groups/{CLUSTER_ID}/clusters/{node}/logs/mongodb.gz"
        response = requests.get(url, params={"startDate": hour_ago, "endDate": now}, headers={"Accept": "application/gzip"}, auth=HTTPDigestAuth(PUBLIC_KEY, PRIVATE_KEY))
        content = response.content
        print(content)

        # Logs are received as a gzip response, so decompress the logs and
        # print them to cloudwatch for collection
        text = gzip.decompress(content).decode('utf-8')
        print(text)

    return {
        "statusCode": 200,
        "body": json.dumps(
            {
                "message": "sent logs",
            }
        ),
    }
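
As mentioned above, the node hostnames don't have to be hard-coded. Here is a minimal sketch of fetching them from the Atlas v1.0 processes endpoint, using the same credentials (the helper name get_cluster_hostnames is mine, not part of the original function):

import requests
from requests.auth import HTTPDigestAuth

def get_cluster_hostnames(group_id, public_key, private_key):
    # List all processes in the Atlas project and return their unique hostnames.
    url = f"https://cloud.mongodb.com/api/atlas/v1.0/groups/{group_id}/processes"
    response = requests.get(url, auth=HTTPDigestAuth(public_key, private_key))
    response.raise_for_status()
    # Each process entry carries a "hostname" field; de-duplicate because a
    # host can run more than one process (e.g. mongod and mongos).
    return sorted({process["hostname"] for process in response.json()["results"]})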

After checking that my code worked and that aws-cli was configured locally, I was able to build and deploy the app by running the following commands:


sam build && sam deploy

After deploying the application successfully, I ran a few test invocations from the AWS Console to verify that the API credentials were working and that I was receiving logs. After that, I could move on to the final step.
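
If you prefer testing from code rather than the console, a small boto3 sketch like the following can trigger the deployed function (the function name below is a placeholder for whatever name your SAM stack produced):

import boto3

lambda_client = boto3.client("lambda")

# Synchronously invoke the deployed function; replace the placeholder name
# with the actual function name created by your SAM stack.
response = lambda_client.invoke(FunctionName="atlas-cloudwatch-logs-AtlasCloudwatchFunction")
print(response["StatusCode"], response["Payload"].read().decode())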



Step 3: Deploy the Logz.io Log Shipper Function

I followed this guide to deploy a CloudFormation template that configures a function listening to my SAM app's CloudWatch log group and ships those logs to Logz.io.


I then invoked my function again and saw that the log shipper function was triggered and pushed the logs to Logz.io.


Summary

Architecture diagram


I was quite surprised that Logz.io didn't have a built-in integration with MongoDB Atlas, but I was happy to find great documentation on both platforms, so developing this small app was quick and easy.

Happy Coding!


