Guides
- Why mrjob?
- Fundamentals
- Concepts
- Writing jobs
- Runners
- Spark
  - Why use mrjob with Spark?
  - mrjob spark-submit
  - Writing your first Spark MRJob
  - Running on your Spark cluster
  - Using remote filesystems other than HDFS
  - Other ways to run on Spark
  - Passing in libraries
  - Command-line options
  - Uploading files to the working directory
  - Archives and directories
  - Multi-step jobs
  - External Spark scripts
  - Custom input and output formats
  - Running “classic” MRJobs on Spark
- Config file format and location
- Options available to all runners
- Hadoop-related options
- Spark runner options
- Configuration quick reference
- Cloud runner options
- Job Environment Setup Cookbook
- Hadoop Cookbook
- Testing jobs
- Cloud Dataproc
- Elastic MapReduce
- Python 2 vs. Python 3
- Contributing to mrjob