Sending e-mail notifications directly from Azure Databricks using SendGrid

Hi! Let me start by saying that Azure Databricks is one of the best tools in a data engineer’s arsenal, if used properly. Databricks (in my opinion) is best suited to run as a batch processing “facility” that lets your custom code, be it in Python, R, Scala or SQL, run in a distributed environment and perform complex operations.

Now, despite Azure being perfectly capable of logging elaborate metrics for its multitude of services, the inner workings of the code running in Databricks are still (in my opinion) best logged using custom code – and logging without notifications is only half the job, hence the purpose of this article.

Today I will show you how to send a basic email notification from an Azure Databricks Python notebook using the Azure – SendGrid integration. First, we are going to create a SendGrid account, then we are going to create an API key, and after that we will add the relevant code to our Databricks notebook. For this guide, I will assume that you already have a Databricks notebook and a corresponding cluster set up – I will not be going over those topics.

Setting up a SendGrid account

First, we will need a means for sending mail notifications from Databricks – and for that we are going to use the mail sending platform SendGrid. You can find out more details about the platform itself from the official site. Before we begin, I will point out that we are going to be using the free plan of SendGrid, which provides us with 25 000 mails per month forever.

In the Azure portal search window, type SendGrid and select “SendGrid accounts” in the dropdown.

In the next screen click on the Add button to begin the process of creating the account.

Setting up SendGrid

Once the new resource has been deployed to the resource group you have chosen, it is time to set up SendGrid.

Go to the “Manage” button to proceed.

After confirming your identity (a mandatory step to comply with the current Anti-Spam regulations – you are prompted to create your “Sender Identity” on your first login) you can finally proceed to the actual settings that we will need.

The process here is quite straightforward – head over to the Settings –> API key section and click on the “New API key” button:

The next screen requires you to choose an authorization level and then provide a name for the API key that is meaningful to you. Please note that for the purpose of this article I will be using the full authorizations that the SendGrid key provides. For your production environment I would strongly suggest that you use the custom setting and tweak it so that a proper level of security and access segregation is observed.

Once you click on the “Create & View” button you will get your only chance to view the actual key. The screen will not allow you to continue until you click on the key value to copy it to clipboard.
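Since the key is shown only once, it is worth deciding right away where it will live. Below is a minimal sketch, assuming you expose the key to the notebook through an environment variable; the variable name SENDGRID_API_KEY and the helper load_api_key are my own placeholders, not something SendGrid or Databricks mandates. A Databricks secret scope would be another option, which I am not covering here.

```python
import os


def load_api_key(env_var="SENDGRID_API_KEY"):
    """Return the API key stored in the environment variable `env_var`.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


# Demo only: in practice the variable would be set outside the notebook
# (for example in the cluster configuration), never in the code itself.
os.environ["SENDGRID_API_KEY"] = "your_api_key_goes_here"
api_key = load_api_key()
```

The returned value can then be passed wherever the examples in this article use the hardcoded 'your_api_key_goes_here' string.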

Now that we have our API key generated, we can head over to the Azure Databricks environment.

Setting up Databricks

After logging in to your Databricks account, head over to the “Clusters” section – we will be using Python for this example and for that we will need the SendGrid library added to the cluster.

Go to the cluster tab and start the cluster (for the purpose of the article, I assume that you have already created or have been provided with a cluster).

After the cluster has started, head over to the library section. Click on Libraries –> Install New –> PyPI. We will be installing the “sendgrid” package.

After you press Install, the PyPI package will be installed on your currently running cluster. Please keep in mind that if you are running a job cluster, you will need to install the package there as well.
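If you are not sure whether the package actually made it onto the cluster your notebook is attached to (a job cluster, for example), a quick check from a notebook cell can save some head scratching. This is just a sketch using the standard importlib module; the helper name is_installed is my own.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None


if not is_installed("sendgrid"):
    print("The sendgrid package is missing; install it on this cluster first")
```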

Now, finally, we are getting to the code. Since this is a demo, the code below is basically a set of hardcoded strings. In your real-world scenario you can easily substitute your own variables for these, or just leave them as strings and provide a meaningful message to the recipient:

from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

# control_flag must be set earlier in the notebook (1 = send a notification)
if control_flag == 1:
  message = Mail(
      from_email='sender@domain.com',
      to_emails='recipient@domain.com',
      subject='Something is going on with your databricks',
      html_content='Insert databricks content here'
  )
  sg = SendGridAPIClient('your_api_key_goes_here')
  # This call actually sends the mail
  response = sg.send(message)
  # The prints below are for your reference and diagnostic
  # purposes only, you can omit them if you don't face any issues
  print(response.status_code)
  print(response.body)
  print(response.headers)

Et voila! Whenever the control_flag is equal to 1, you will get a notification (friendly reminder – don’t forget to set control_flag=1 prior to testing this). Also, consider the possibility that the mail you have just sent might end up in the recipient’s spam folder.
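How you set the control_flag is up to you. One possible pattern, sketched here under my own assumptions, is to wrap a notebook step in a small helper that raises the flag when the step fails and prepares the HTML body for the mail at the same time. The names run_step and failing_step are hypothetical and exist only for this illustration.

```python
import html
import traceback


def run_step(step):
    """Run `step()` and return (control_flag, html_content).

    control_flag is 1 when the step raised an exception, 0 otherwise.
    """
    try:
        step()
    except Exception:
        # Escape the traceback so it is safe to embed in an HTML mail body
        details = html.escape(traceback.format_exc())
        return 1, "<p>Notebook step failed:</p><pre>" + details + "</pre>"
    return 0, ""


def failing_step():
    # Hypothetical stand-in for real notebook logic
    raise ValueError("input table is empty")


control_flag, html_content = run_step(failing_step)
print(control_flag)
```

The resulting html_content can be passed to Mail(...) in place of the hardcoded string, so the recipient sees what actually went wrong.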

Bonus content – sending e-mails from Databricks by using standard Python libraries

There are cases where you may not be allowed to install libraries on the cluster that was provided to you. For such cases you can use the standard Python smtplib and email libraries and connect to SendGrid via its SMTP interface. The connection to the SendGrid SMTP server in the example below is on port 465, as it provides SSL. Alternatively, you can connect on ports 25, 587 or 2525, which accept unencrypted connections (and, as far as I know from the SendGrid documentation, connections upgraded to TLS). The code is below:

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

mail_from = 'sender@domain.com'
mail_to = 'recipient@domain.com'

msg = MIMEMultipart()
msg['From'] = mail_from
msg['To'] = mail_to
msg['Subject'] = "Subject that is meaningful to the recipient"
mail_body = "Something happened in your databricks notebook"
msg.attach(MIMEText(mail_body))

try:
    # SMTP_SSL opens an encrypted connection; the "with" block closes it for us
    with smtplib.SMTP_SSL('smtp.sendgrid.net', 465) as server:
        server.ehlo()
        # The username is the literal string 'apikey', the password is your API key
        server.login('apikey', 'your_api_key_goes_here')
        server.sendmail(mail_from, mail_to, msg.as_string())
    print("Mail sent successfully to server")
except (smtplib.SMTPException, OSError) as err:
    # Report what went wrong instead of silently swallowing every error
    print("An issue with the sending has been encountered:", err)
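If port 465 is blocked in your environment, port 587 with STARTTLS may be worth a try. The following is only a sketch under that assumption, using the standard smtplib, ssl and email.message modules; the helper names build_message and send_via_starttls are mine, and the actual send is left commented out so that nothing goes out by accident.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(mail_from, mail_to, subject, body):
    """Build a plain-text mail message."""
    msg = EmailMessage()
    msg["From"] = mail_from
    msg["To"] = mail_to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_starttls(msg, api_key, host="smtp.sendgrid.net", port=587):
    """Connect in plain text, upgrade to TLS, then log in and send."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls(context=context)
        # Same credentials as before: literal 'apikey' plus your API key
        server.login("apikey", api_key)
        server.send_message(msg)


msg = build_message(
    "sender@domain.com",
    "recipient@domain.com",
    "Subject that is meaningful to the recipient",
    "Something happened in your databricks notebook",
)
# send_via_starttls(msg, "your_api_key_goes_here")  # uncomment to send
```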


Published by Martin Stoyanov

IT expert, Data engineer, SAP Consultant, Star Trek fanboy

One thought on “Sending e-mail notifications directly from Azure Databricks using SendGrid”

  1. Hi,

    Thanks for the details. If my Azure databricks runs in secure cluster connectivity (SCC) then is there any impact to sending email using sendgrid? Do i have to open any firewall/port to connect to sendgrid?

    –Babu
