AWS Chatbot custom message – solution

Most DevOps people who set up AWS Chatbot integrations with other AWS services eventually start wondering how to send custom messages through Chatbot.

At this point I would like to remind you that your life will be much easier if you give up on the idea and instead send your message directly to Slack using a webhook.
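
For comparison, here is roughly how small the webhook route is. This is only a sketch: it assumes you have already created a Slack incoming webhook, the URL is a placeholder, and the function names are made up for this example.

```python
import json
import urllib.request

# Placeholder: replace with the incoming webhook URL Slack generated for you.
WEBHOOK_URL = "https://hooks.slack.com/services/YOUR/WEBHOOK/PATH"


def build_slack_request(text, url=WEBHOOK_URL):
    """Build (but do not send) a POST request carrying a plain text message."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(text, url=WEBHOOK_URL):
    """Send the message and return the HTTP status code."""
    with urllib.request.urlopen(build_slack_request(text, url), timeout=10) as resp:
        return resp.status
```

Calling send_slack_message("Deploy finished") with a real webhook URL should post that text to the channel the webhook is bound to.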

But if you want to see this to the end:

Tom Stroobants documented the general SNS message format that Chatbot expects and it looks like this:

{
  "version": "0",
  "time": "1970-01-01T00:00:00Z",
  "id": "00000000-0000-0000-0000-000000000000",
  "account": "[your real account id]",
  "region": "[a real region]",
  "source": "aws.[a service prefix e.g. ec2]",
  "detail-type": "[you can use this field for your message]",
  "resources": [],
  "detail": {}
}

As long as these fields are present in the message, AWS Chatbot will forward it to Slack. It will not display anything beyond the text in the “detail-type” field, though, and it prints that text twice.

To make AWS Chatbot deliver a more detailed message, you have to format the message like one of the AWS events that Chatbot supports, which means the message has to carry a predefined “detail-type” and “source”.

To see examples of the AWS event formats, and to find one that we can co-opt for our purposes:

  1. Open the EventBridge console at https://console.aws.amazon.com/events/.
  2. In the navigation pane, choose Rules.
  3. Choose Create rule.
  4. Enter a name and description for the rule.
  5. For Define pattern, choose Rule with an event pattern.
  6. Hit Next.
  7. For Event source, leave it on AWS events.
  8. Now you can browse all available events under Sample Event / AWS events.

You will quickly notice that the event names are quite specific, and you might not want to use “VoiceId Batch Fraudster Registration Action” for your custom message.

I found that the “AWS Health Event” is innocent enough to be reusable, and now I am able to send free form paragraphs using the following:

{
    "version": "0",
    "id": "00000000-0000-0000-0000-000000000000",
    "account": "[my AWS account number]",
    "time": "1970-01-01T00:00:00Z",
    "region": "us-east-1",
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "resources": [],
    "detail": {
      "eventDescription": [{
        "language": "en_US",
        "latestDescription": "Long form message\nMore lines"
      }]
    }
}
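
If you would rather build this message in code, a minimal sketch could look like the one below. The function names are my own invention for this example, and it fills "id" and "time" with real values instead of the placeholders above, which is an assumption on my part. Publishing needs boto3 and an SNS topic that your Chatbot configuration is subscribed to.

```python
import json
from datetime import datetime, timezone
from uuid import uuid4


def build_chatbot_message(account_id, region, text):
    """Return the JSON string for an SNS message shaped like an AWS Health Event."""
    event = {
        "version": "0",
        "id": str(uuid4()),
        "account": account_id,
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "region": region,
        "source": "aws.health",
        "detail-type": "AWS Health Event",
        "resources": [],
        "detail": {
            "eventDescription": [
                {"language": "en_US", "latestDescription": text}
            ]
        },
    }
    return json.dumps(event)


def publish(topic_arn, message):
    """Publish to the SNS topic (needs boto3 and working AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    return boto3.client("sns").publish(TopicArn=topic_arn, Message=message)
```

Usage would be along the lines of publish(topic_arn, build_chatbot_message("123456789012", "us-east-1", "Long form message\nMore lines")), where the account number and topic ARN are yours.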

I hope somebody with good enough connections to the AWS Chatbot team can get more details out of them. Right now their official line is “AWS Chatbot only supports AWS Services”. Help?

HTH, imre


AWS Force MFA example policy doesn’t work on Administrators – Fix

Several example policies that claim to enforce MFA use, written by Amazon itself and by security vendors like Yubico, simply do not work on users who have the AdministratorAccess policy attached.

Here is an example policy written by Amazon that actually works: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_aws_my-sec-creds-self-manage.html


AWS SAM and the case of missing directories – Solution

I think serverless is the future, so I have been going through a Node.js Lambda tutorial to learn more about it. Amazon recommends SAM for provisioning Lambda functions and it’s pretty straightforward at first sight.

Later you will learn that AWS SAM is opinionated, and it keeps its opinions to itself. It doesn’t tell you what it does, how it does it, there are no options to change it, and not much of a debug function to see what goes wrong.

sam build did not copy a directory into my Node package. Documentation says nothing about this behavior. Debug shows no information about what gets copied. Googling led nowhere. Eventually I simply guessed that it reads my .gitignore file and it ignores everything that I want Git to ignore, and I was right. And I was grumpy.

TL;DR: If you have missing files or missing directories in your sam build package, look at your .gitignore.
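
To get a quick list of suspects, something like the sketch below can help. It is a rough approximation and not how sam build itself decides: it only checks top-level entries against simple .gitignore patterns, and it ignores negation (!), nested paths and **. The function names are made up for this example.

```python
import fnmatch
from pathlib import Path


def load_patterns(gitignore_path):
    """Read the non-empty, non-comment lines from a .gitignore file."""
    patterns = []
    for line in Path(gitignore_path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def ignored_top_level_entries(project_dir):
    """List top-level files and directories whose names match a simple pattern.

    Simplification: negation (!), nested paths and ** are not handled.
    """
    project = Path(project_dir)
    gitignore = project / ".gitignore"
    if not gitignore.exists():
        return []
    patterns = [p.strip("/") for p in load_patterns(gitignore) if not p.startswith("!")]
    return sorted(
        entry.name
        for entry in project.iterdir()
        if any(fnmatch.fnmatch(entry.name, p) for p in patterns)
    )
```

For an authoritative answer on a single path, git check-ignore -v followed by the path shows which .gitignore rule matches it.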


LambdaAccessDenied error in AWS Load Balancer – Solution

Permission handling between ELB and Lambda is somewhat magical. Some of the tools autoprovision permissions behind the scenes, and some of them sometimes mess up.

I had a Lambda that I was invoking from a load balancer and it simply did not work. The only hint was “LambdaAccessDenied” in the ALB logs.

I had everything configured correctly. I had added a Lambda permission for the entire elasticloadbalancing.amazonaws.com service to invoke my function. I had the proper target groups. I had even enabled AWS SAM to autoprovision the IAM roles. The Lambda function was firing correctly, and I had logs to show that it was executing.

But I kept getting “502 Bad Gateway” from the load balancer and the logs kept showing LambdaAccessDenied.

I removed all the custom stuff I created. I removed the alias. I removed and reprovisioned the entire Lambda function. I removed and recreated the target group.

Eventually I removed the target group and the permission I created, and provisioned an “Application Load Balancer” Trigger from the Lambda console. This created a new target group and a new resource-based policy under Permissions, and suddenly everything started working, even though the new entries looked exactly the same as the entries I created.
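
If you want to compare the entries more carefully than by eyeballing the console, a sketch like the one below lists the statements in a function's resource-based policy that let the load balancer service invoke it, together with the target group ARN each one is scoped to. The function name is made up, and the policy layout is my assumption based on what the console shows. It does not explain why my hand-made entries failed.

```python
import json

ELB_PRINCIPAL = "elasticloadbalancing.amazonaws.com"


def elb_invoke_statements(policy_json):
    """Return (Sid, SourceArn) pairs for statements that let ELB invoke the function."""
    found = []
    for stmt in json.loads(policy_json).get("Statement", []):
        principal = stmt.get("Principal", {})
        service = principal.get("Service") if isinstance(principal, dict) else principal
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if service != ELB_PRINCIPAL or "lambda:InvokeFunction" not in actions:
            continue
        # The target group ARN, if the permission is scoped to one.
        source_arn = None
        for operator in stmt.get("Condition", {}).values():
            for key, value in operator.items():
                if key.lower() == "aws:sourcearn":
                    source_arn = value
        found.append((stmt.get("Sid"), source_arn))
    return found
```

The policy document itself can be fetched with boto3.client("lambda").get_policy(FunctionName=...)["Policy"]. One thing worth checking with the result is whether the SourceArn matches the target group that the load balancer actually uses.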

Since there are only five entries on Google that even mention this error message, I figured you might want to save some time and learn from my experience.


How to back up and restore an Easy-RSA certificate authority

Easy-RSA is great, but the documentation doesn’t cover much about backup and restore, so this is a quick write-up on the topic.

If you want to back up your entire CA, save your easyrsa3/pki directory. You can simply restore this pki directory in a new install of easy-rsa and you will be back in business.

If you don’t need to back up your issued certificates, for example because you use your CA for VPN authentication (where all you need from them is the certificate serials for revocation, and those are in pki/index.txt), then you only need to save the following four files:

pki/ca.crt
pki/private/ca.key
pki/issued/server.crt
pki/private/server.key

These files don’t ever change, so you don’t need to back them up frequently.

When you want to restore your easy-rsa install, first create a skeleton pki directory with the ./easyrsa init-pki command, then put the four files from above back in their previous places.

easy-rsa will still complain about other missing files and directories, but it doesn’t expect any data in those, so we can simply create empty files and directories to fix this:

touch easy-rsa/easyrsa3/pki/serial
touch easy-rsa/easyrsa3/pki/index.txt
touch easy-rsa/easyrsa3/pki/index.txt.attr
mkdir easy-rsa/easyrsa3/pki/certs_by_serial

So if you see errors like:

Easy-RSA error:

Missing expected CA file: serial (perhaps you need to run build-ca?)

Then run the empty file creation commands above.
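
If you restore often enough to script it, the same steps can be sketched in Python. The function names are made up for this example, and the pki path is whatever your install uses (easy-rsa/easyrsa3/pki above).

```python
from pathlib import Path


def create_pki_placeholders(pki_dir):
    """Create the empty files and directory easy-rsa expects, keeping existing ones."""
    pki = Path(pki_dir)
    for name in ("serial", "index.txt", "index.txt.attr"):
        (pki / name).touch(exist_ok=True)
    (pki / "certs_by_serial").mkdir(exist_ok=True)


def missing_ca_files(pki_dir):
    """Return which of the four backed-up files are not yet back in place."""
    required = ("ca.crt", "private/ca.key", "issued/server.crt", "private/server.key")
    return [name for name in required if not (Path(pki_dir) / name).is_file()]
```

Run create_pki_placeholders after init-pki, then check that missing_ca_files returns an empty list before you start issuing or revoking anything.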

If you have any questions, your best bet is to reach me on Twitter at https://twitter.com/imreFitos


ELTE stunnel setup for Mac in 2021

ELTE is a great university but they don’t support Apple products well. If you are an ELTE student, use a Mac, and are trying to access ELTE resources from home during the lockdown, this is the tutorial you need.

You have to have a Caesar or IIG username and password for this to work.

Step 1: install the Homebrew package manager from https://brew.sh/

  • Click on Applications -> Utilities -> Terminal
  • Copy the following line into the Terminal window (this is one single line):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • When it asks you for your password, enter your computer’s password.

Please note: this can take 10-20 minutes to complete.

Step 2: install the stunnel package using Homebrew

  • in the same Terminal window, type the following line:
brew install stunnel

Step 3: put the ELTE stunnel.conf file in the stunnel directory

The following 7 lines are the configuration for stunnel. You need to save this into a file on your computer called /usr/local/etc/stunnel/stunnel.conf

foreground = yes
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
[proxys]
accept = 8080
connect = proxy.elte.hu:8443
client = yes

Step 4: Start up stunnel

brew services start stunnel

This will make sure that stunnel will always be running on your computer, even after rebooting.
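
If you want to confirm that stunnel is really listening before you touch the network settings, a small check like this one works. It is only a sketch with a made-up function name, and it assumes the port 8080 from the configuration above. It only tells you that something accepts connections on that port, not that the tunnel to ELTE works.

```python
import socket


def port_is_open(host="127.0.0.1", port=8080, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If port_is_open() returns False, brew services list should show whether the stunnel service has actually started.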

Step 5: Configure your computer to go through ELTE for web browsing

  • Go to Apple Icon -> System Preferences -> Network
  • Click on the “Advanced” button in the bottom right corner
  • Click on the “Proxies” tab on the top row
  • Select “Web Proxy (HTTP)”
  • Add 127.0.0.1 under Web Proxy Server
  • Add 8080 next to the 127.0.0.1 after the colon symbol
  • Enable the “Proxy server requires password” option
  • Enter your Caesar/IIG username and password
  • ALSO repeat this under “Secure Web Proxy (HTTPS)”

This is it! Your web browsers will start going through ELTE with all their traffic.

To test, start up a browser, and google the following phrase “what is my ip address”. If you did everything right, the IP address Google will report back will start with 157.181.

Step 6: Turn off the ELTE browser redirect when you don’t need it

The setup above will send all your web browsing through ELTE, including YouTube and Netflix traffic, so it will be slow for you and problematic for them. It’s better to turn it off when you don’t need it.

  • Go to Apple Icon -> System Preferences -> Network
  • Click on the “Advanced” button in the bottom right corner
  • Click on the “Proxies” tab on the top row
  • UNselect “Web Proxy (HTTP)”
  • UNselect “Secure Web Proxy (HTTPS)”

That’s it, you are all set.

imre


How to monitor and alert on the Sidekiq Retry Queue

Sidekiq is the most popular queue processing service for Ruby on Rails. It has many brilliant features; one of them is automatic retry when a queued job fails, to account for intermittent problems.

The retry system is automatic: by default Sidekiq retries a job 25 times before putting it on the Dead Job Queue. The retry delay grows with the fourth power of the retry count, so by the 25th retry a job would have spent three weeks in the Retry Queue!
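
The three weeks figure is easy to check. The sketch below assumes the delay formula from Sidekiq's documentation, roughly (retry_count ** 4) + 15 seconds plus a small random jitter, and leaves the jitter out. The exact formula may differ between Sidekiq versions.

```python
def retry_delay_seconds(retry_count):
    """Base delay before the next retry, ignoring Sidekiq's random jitter."""
    return retry_count ** 4 + 15


def total_retry_window_seconds(max_retries=25):
    """Sum of the base delays for retry counts 0 .. max_retries - 1."""
    return sum(retry_delay_seconds(count) for count in range(max_retries))


total = total_retry_window_seconds()
print(f"{total} seconds, about {total / 86400:.1f} days")
# prints: 1763395 seconds, about 20.4 days
```

The last few retries dominate the total: the final delay alone (24 ** 4 + 15 seconds) is almost four days.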

Of course, generally everybody has an alert system for when jobs fail. But the Sidekiq retry logic works well and most errors are transient, so people grow complacent and start ignoring the messages about the failed jobs.

This works well until it doesn’t. This was the point when I started looking into ways to properly monitor the Sidekiq Retry Queue.

I had the following questions:

  • How to alert on jobs that have failed too many times for comfort?
  • How to alert if a deluge of jobs fail?
  • How to make sure the alerts we send are actionable?
  • How to check if the alerting system is operational?

I took some time during Christmas and wrote a single-file Ruby app called sidekiq_retry_alert: https://github.com/imreFitos/sidekiq_retry_alert. This app queries a Sidekiq server’s Retry Queue and sends an alert to a Slack channel when a single job keeps failing repeatedly. If it finds a lot of failing jobs, it tallies them up into easily read Slack messages.

This is how it looks in Slack:

PRODUCTION ALARM: 2 NameOfTheImportantJobs on the Important queue have failed X+ times

The app remembers the previous state of the queue, so you only get messages when the queue’s state changes.
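
The tallying and the "only on change" idea can be sketched as below. This is an illustration in Python, not the code of the Ruby app, and the job dictionaries with "queue", "class" and "retry_count" keys are a simplified stand-in for what the Retry Queue holds.

```python
from collections import Counter


def tally_failing_jobs(retry_jobs, threshold):
    """Count jobs per (queue, class) that were retried at least `threshold` times."""
    return Counter(
        (job["queue"], job["class"])
        for job in retry_jobs
        if job["retry_count"] >= threshold
    )


def alert_lines(tally, previous_tally, threshold, env="PRODUCTION"):
    """Build one message per (queue, class) whose count changed since the last run."""
    lines = []
    for (queue, klass), count in sorted(tally.items()):
        if previous_tally.get((queue, klass)) != count:
            lines.append(
                f"{env} ALARM: {count} {klass}s on the {queue} queue "
                f"have failed {threshold}+ times"
            )
    return lines
```

Saving the tally between runs (for example in a small file) and passing it back as previous_tally is what keeps the channel quiet while nothing changes.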

To check if the alerting system works, I wrote a second script that simply sends a daily report to the Slack channel. If you don’t see the daily report, chances are your alert system has stopped working.

This is how the daily report looks in Slack:

Daily report on production sidekiq retries:
ImportantQueue: 2 NameOfTheImportantJobs are retried

I recommend running them from cron.

I hope this helps!

imre


How to edit an existing Certificate Revocation List

How can one edit a Certificate Revocation List aka CRL? If you use openssl or easy-rsa to manage client certificates, they already have the tools built in to generate a CRL based on the certificates that exist in your PKI.

What if you don’t have all the original PKI files? Fortunately easy-rsa is simpler under the hood than it looks. All you need is the original CA key and certificate, and you can dump the contents of the existing CRL back into the easy-rsa format, edit the human readable file of certificates to revoke, and generate an updated CRL.

The details: easy-rsa only really cares about the existence of pki/ca.crt and pki/private/ca.key. It will complain about missing directories and files, but feel free to create them as empty files and directories.

A CRL is a list of serial numbers of certificates, with the entire file signed by the CA, and saved in X509 format.

To add a certificate to the CRL, you don’t need the original key, you don’t need the certificate either, only the serial number of the certificate.

You can print the serial number of a certificate using this openssl command: openssl x509 -noout -serial -in CERTIFICATEFILE.crt

easy-rsa keeps the tally of the certificates it manages in the human readable pki/index.txt file. It’s a list of certificate serial numbers, their expiration dates, and their status (Valid, Expired, Revoked).

If you don’t have this file any more, it’s fine. The following command takes all the serials from an existing CRL file and prints them in the easy-rsa index.txt format:

openssl crl -in DOWNLOADED-CRL.pem -noout -text | grep "Serial Number:" | awk ' { print "R\t200330000000Z\t200330000000Z\t" $NF "\tunknown\t" } '

You can save this output in pki/index.txt.

The format is pretty simple: it’s tab-separated. The fields are:

– status (R for revoked)
– expiration datetime in ‘YYMMDDhhmmssZ’ format
– revocation datetime in ‘YYMMDDhhmmssZ’ format
– serial number
– name of file, interestingly it’s kept as ‘unknown’
– Subject Name of certificate, but it can be left empty
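
The same conversion can be sketched in Python, which makes the six fields explicit. The function names are made up for this example. Like the awk one-liner above, it uses one fixed placeholder date for both the expiration and the revocation time.

```python
import re

SERIAL_RE = re.compile(r"Serial Number:\s*([0-9A-Fa-f]+)")


def index_line(serial, expires="200330000000Z", revoked="200330000000Z", subject=""):
    """Format one revoked-certificate entry as a tab-separated index.txt line."""
    return "\t".join(["R", expires, revoked, serial.upper(), "unknown", subject])


def index_from_crl_text(crl_text):
    """Turn the output of `openssl crl -noout -text` into index.txt lines."""
    return [index_line(serial) for serial in SERIAL_RE.findall(crl_text)]
```

To revoke one more certificate, append index_line("SERIAL") for its serial number to the list before writing the lines to pki/index.txt.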

Now you have recreated your index.txt and you also know what data is in it. If you want to add a new certificate to revoke, add another line and enter the information above.

When you are satisfied, run ./easyrsa gen-crl and it will create an updated pki/crl.pem file containing your existing and newly revoked certificates.

If you use certificate based VPN systems like Amazon AWS VPC Client VPN, this can save your hide. HTH


How to print your AWS access key in Ruby? Solution

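Want to see what AWS credentials your Ruby code defaults to? Here you go:

require 'aws-sdk-core'

# Loads the default profile from the shared credentials file (~/.aws/credentials)
provider = Aws::SharedCredentials.new
pp provider.credentials.access_key_id
pp provider.credentials.secret_access_key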

If you want to see which credentials your command line AWS CLI uses, the following command will show you:

aws sts get-caller-identity
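
If you are not in Ruby, the shared credentials file is a plain INI file and can be read with nothing but the standard library. The sketch below uses a made-up function name and only looks at that one file, so it ignores environment variables, instance roles and the other places the SDKs check.

```python
import configparser
from pathlib import Path


def read_shared_credentials(path=None, profile="default"):
    """Return (access_key_id, secret_access_key) for a profile, or None if absent."""
    path = Path(path) if path else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file simply yields no sections
    if not parser.has_section(profile):
        return None
    section = parser[profile]
    return section.get("aws_access_key_id"), section.get("aws_secret_access_key")
```

Calling read_shared_credentials() with no arguments reads the default profile from ~/.aws/credentials.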


Problem and Solution: Bundle install failed with fatal: Could not parse object

If you specify a gem with a github url and branch in your Gemfile, you can occasionally run into the following problem:


Fetching https://github.com/DavyJonesLocker/client_side_validations
fatal: Could not parse object '261964fdec8051e5d55f85e9074ed77be555e8a5'.
Git error: command `git reset --hard 261964fdec8051e5d55f85e9074ed77be555e8a5` in directory
/.../vendor/bundle/ruby/2.3.0/bundler/gems/client_side_validations-261964fdec80
has failed.
If this error persists you could try removing the cache directory
'/.../vendor/bundle/ruby/2.3.0/cache/bundler/git/client_side_validations-e290eb7b61ac375e1849a12ab45a9444d029fd93'

You will scratch your head because the branch is definitely available on github, so what gives?

Removing the cache directory doesn’t change the outcome either.

Solution: The root of the problem is that Bundler saves the last commit ID of the branch in Gemfile.lock, and the next time you run bundle install it will try to pull the same commit ID.

If the repo owner has removed that commit, say by rebasing or squash-merging the branch, git won’t be able to pull it any more and gives up, blaming the local cache directory instead of the local Gemfile.lock.

To solve this, run ‘bundle update’, which ignores the pinned versions in Gemfile.lock and refreshes it. To limit the refresh to the affected gem, name it: ‘bundle update client_side_validations’.
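
To see which commit Bundler has pinned before you update, you can read it out of Gemfile.lock. The sketch below uses a made-up function name and assumes the usual lock file layout, where a GIT section lists remote: and revision: lines.

```python
def pinned_git_revisions(lockfile_text):
    """Map each git remote in a Gemfile.lock to the commit ID Bundler pinned for it."""
    pins = {}
    remote = None
    in_git_section = False
    for line in lockfile_text.splitlines():
        if line and not line.startswith(" "):
            # Section headers such as GIT, GEM, PLATFORMS start in column 0.
            in_git_section = line.strip() == "GIT"
            remote = None
            continue
        if not in_git_section:
            continue
        key, _, value = line.strip().partition(": ")
        if key == "remote":
            remote = value
        elif key == "revision" and remote:
            pins[remote] = value
    return pins
```

Comparing the pinned revision with the output of git ls-remote for the same remote and branch shows whether the branch has moved away from the commit in your lock file.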
