The Ryft Spark plugin listens to Spark events in real time. These events provide detailed information about the execution of Spark jobs, and the plugin writes the resulting logs to a dedicated S3 bucket.
Spark Plugin Configuration
- Create an S3 bucket in your account that will store Spark event logs, or contact your Ryft representative if you prefer to use a Ryft-managed bucket.
It’s best to set a retention policy of at least 7 days.
- Add the Spark plugin dependency to your Spark application. How you do this depends on the deployment:
- Spark 3.5
- Spark 3.3
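As a sketch, the dependency coordinates follow this pattern (the io.ryft group id is an assumption - verify the exact coordinates on Maven Central; 0.3.6 is the version referenced later in this guide):

```
io.ryft:spark-plugin-3.5_2.13:0.3.6   (Spark 3.5, Scala 2.13; group id assumed)
io.ryft:spark-plugin-3.3_2.12:0.3.6   (Spark 3.3, Scala 2.12; group id assumed)
```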
- Register the Ryft plugin and set the spark.eventLog.ryft.dir config to the bucket defined above
- Spark 3.5
- Spark 3.3
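For example, with spark-submit the registration might look like the following sketch. The plugin class name io.ryft.spark.RyftSparkPlugin is a placeholder, not the real class - substitute the class name given for your Spark version:

```shell
# Register the Ryft plugin and point it at the event logs bucket.
# NOTE: the plugin class name below is a placeholder - use the real one.
spark-submit \
  --jars s3://your-bucket/jars/spark-plugin-3.5_2.13-0.3.6.jar \
  --conf spark.plugins=io.ryft.spark.RyftSparkPlugin \
  --conf spark.eventLog.ryft.dir=s3://your-event-logs-bucket/ \
  your_application.py
```

The same two --conf settings apply regardless of how the application is submitted.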
⚙️ Configuring AWS Glue Jobs
AWS Glue Spark Job Configuration
Adding the Plugin to Your Spark Session
Configure your Spark session with the Ryft plugin by adding the following configuration:
- Spark 3.5
- Spark 3.3
Glue jobs only support a single SparkSession - make sure only one is initialized. Initializing more than one SparkSession can prevent the plugin from being loaded.
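A minimal sketch of the session setup in a Glue Python job, assuming a hypothetical plugin class name io.ryft.spark.RyftSparkPlugin (substitute the real class from the configuration for your Spark version):

```python
from pyspark.sql import SparkSession

# Build the single SparkSession for the Glue job, loading the Ryft plugin.
# NOTE: the plugin class name is a placeholder - use the one from the Ryft docs.
spark = (
    SparkSession.builder
    .config("spark.plugins", "io.ryft.spark.RyftSparkPlugin")
    .config("spark.eventLog.ryft.dir", "s3://your-event-logs-bucket/")
    .getOrCreate()
)
```

Because the plugin is loaded at session creation, these settings must be applied before the first (and only) SparkSession is built.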
Uploading the Plugin JAR to S3
AWS Glue jobs require the plugin JAR to be available in S3. You can upload it directly from Maven Central using this command:
- Spark 3.5
- Spark 3.3
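One way to do this is to download the JAR from Maven Central and copy it to your bucket. The io/ryft group path below is an assumption - check the artifact's actual coordinates on Maven Central:

```shell
# Download the plugin JAR from Maven Central (group path assumed) and upload to S3.
curl -fL -o spark-plugin-3.5_2.13-0.3.6.jar \
  "https://repo1.maven.org/maven2/io/ryft/spark-plugin-3.5_2.13/0.3.6/spark-plugin-3.5_2.13-0.3.6.jar"
aws s3 cp spark-plugin-3.5_2.13-0.3.6.jar s3://YOUR_BUCKET_NAME/jars/
```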
Replace YOUR_BUCKET_NAME with your actual bucket name. Ensure your Glue job has the necessary IAM permissions to read from this S3 location.

Configuring the Extra JARs Parameter

The Glue job needs to include the plugin JAR using the --extra-jars parameter. This can be configured in several ways:
- AWS Glue Console
- AWS CLI
- Programmatic
- Spark 3.5
- Spark 3.3
- Navigate to AWS Glue → Jobs → Select your job
- Go to the Job details tab
- Scroll to Advanced properties
- In the Job parameters section, add:
- Key: --extra-jars
- Value: s3://your-bucket/jars/spark-plugin-3.5_2.13-0.3.6.jar
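For the AWS CLI route, a sketch using aws glue update-job (job name, role, script location, and bucket are placeholders; note that JobUpdate requires Role and Command to be restated):

```shell
# Attach the plugin JAR to an existing Glue job via the --extra-jars parameter.
aws glue update-job \
  --job-name your-glue-job \
  --job-update '{
    "Role": "your-glue-job-role",
    "Command": {"Name": "glueetl", "ScriptLocation": "s3://your-bucket/scripts/job.py"},
    "DefaultArguments": {
      "--extra-jars": "s3://your-bucket/jars/spark-plugin-3.5_2.13-0.3.6.jar"
    }
  }'
```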
📦 Using the plugin as a dependency
Another way to include the plugin in Java or Scala Spark applications is by packaging it directly into your application’s uberjar. This embeds the plugin as a dependency, removing the need to reference the plugin JAR separately at runtime.
Add the plugin dependency
- Spark 3.5
- Spark 3.3
Add this to your pom.xml:

Include in your uberjar

Configure the Maven Shade plugin to include the dependency:
- Spark 3.5
- Spark 3.3
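A sketch of the Spark 3.5 dependency declaration - the io.ryft group id is an assumption, so confirm the coordinates on Maven Central:

```xml
<dependency>
  <!-- group id assumed; verify on Maven Central -->
  <groupId>io.ryft</groupId>
  <artifactId>spark-plugin-3.5_2.13</artifactId>
  <version>0.3.6</version>
</dependency>
```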
This approach eliminates the need to configure external JARs in your Spark setup, which simplifies application bootstrap and gives you better control over version conflicts. However, it also means you’ll need to recompile your application whenever the plugin is updated.
🔧 Choosing the Right Plugin Version
Supported Artifacts and Compatibility
We publish two Spark plugin variants, each built for two Scala versions. Use the table below to select the one that matches your environment.

| Artifact | Java Version | Spark Version | Scala Version | Iceberg Version |
|---|---|---|---|---|
| spark-plugin-3.3_2.12 | Java 8+ | Spark 3.3 | 2.12 | 1.2.0+ |
| spark-plugin-3.3_2.13 | Java 8+ | Spark 3.3 | 2.13 | 1.2.0+ |
| spark-plugin-3.5_2.12 | Java 17+ | Spark 3.5 | 2.12 | 1.7.1+ |
| spark-plugin-3.5_2.13 | Java 17+ | Spark 3.5 | 2.13 | 1.7.1+ |
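For build scripts, the compatibility matrix can be captured in a small helper - a sketch using the artifact names from the table above:

```python
def ryft_artifact(spark: str, scala: str) -> str:
    """Return the Ryft plugin artifact name for a Spark/Scala combination."""
    supported = {
        ("3.3", "2.12"), ("3.3", "2.13"),  # Java 8+, Iceberg 1.2.0+
        ("3.5", "2.12"), ("3.5", "2.13"),  # Java 17+, Iceberg 1.7.1+
    }
    if (spark, scala) not in supported:
        raise ValueError(f"No Ryft plugin published for Spark {spark} / Scala {scala}")
    return f"spark-plugin-{spark}_{scala}"

print(ryft_artifact("3.5", "2.13"))  # spark-plugin-3.5_2.13
```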
📌 Notes
Java Compatibility
- 3.3_x plugins are compiled for Java 8 and run on Java 8+
- 3.5_x plugins require Java 17+
- Minimum supported version is Iceberg 1.2.0
- ⚠️ Using older versions is unsupported
- ✅ For best results, use the latest Iceberg version officially supported by your Spark distribution
- Each plugin is published for Scala 2.12 and 2.13 - match your Spark distribution’s Scala version
✅ Recommended Usage
- Use spark-plugin-3.3_x with Spark 3.3 and Iceberg 1.2.0+
- Use spark-plugin-3.5_x with Spark 3.5+ - this is the preferred and actively maintained version
IAM Permissions
Ensure your Glue job’s IAM role has permissions to access the S3 bucket containing the JAR file:
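A minimal policy sketch granting read access to the JAR location (bucket and path are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadRyftPluginJar",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::your-bucket/jars/*"
    }
  ]
}
```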
Customer managed bucket configuration
Set up S3 to SQS notifications
- Create a new SQS queue that will receive notifications on new files created in your S3 bucket.
- Add the following policy to the queue access policy to enable receiving notifications:
- Configure S3 event notifications on the event logs bucket (choose “All object create events”) to be sent to the newly created SQS queue.
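The queue access policy referenced above generally follows the standard S3-to-SQS pattern shown below; fill in the region, account id, queue, and bucket names:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3ToSendMessages",
      "Effect": "Allow",
      "Principal": {"Service": "s3.amazonaws.com"},
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:<region>:<account-id>:<queue-name>",
      "Condition": {
        "ArnEquals": {"aws:SourceArn": "arn:aws:s3:::<event-logs-bucket>"}
      }
    }
  ]
}
```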
Add Ryft access policy to S3 and SQS
Add the following access policy to the Ryft-ControlPlaneRole you already created, to allow reading notifications from the queue.

- IAM → Roles → Search for “Ryft-ControlPlaneRole” (or the name you used)
- Add permissions → Create inline policy → Select the JSON tab
- Add the following policy, fill in the bucket and the queue parameters
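The inline policy typically needs SQS read/delete permissions on the queue plus read access to the event logs bucket - a sketch with placeholder parameters (confirm the exact policy with Ryft):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadQueueNotifications",
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:<region>:<account-id>:<queue-name>"
    },
    {
      "Sid": "ReadEventLogs",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<event-logs-bucket>",
        "arn:aws:s3:::<event-logs-bucket>/*"
      ]
    }
  ]
}
```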
You are done! Locate the URL of the queue you just created and provide it to Ryft; we will finish setting up the integration.
The URL should look similar to:
https://sqs.us-east-1.amazonaws.com/<account-id>/<queue-name>