mrjob.parse - log parsing

Utilities for parsing errors, counters, and status messages.

mrjob.parse.is_s3_uri(uri)

Return True if uri can be parsed into an S3 URI, False otherwise.

mrjob.parse.is_uri(uri)

Return True if uri is a URI and contains :// (we only care about URIs that can describe files).
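
The check described above can be sketched with the standard library alone. This is a minimal illustration, not necessarily how mrjob implements it; the name is_uri_sketch is hypothetical.

```python
from urllib.parse import urlparse


def is_uri_sketch(uri):
    """Hypothetical stand-in: True only if uri has a scheme and contains ://."""
    return bool(urlparse(uri).scheme) and '://' in uri


print(is_uri_sketch('s3://walrus/tmp/'))            # True
print(is_uri_sketch('/tmp/data.txt'))               # False: no scheme
print(is_uri_sketch('mailto:someone@example.com'))  # False: scheme but no ://
```

Requiring :// is what excludes schemes such as mailto:, which cannot describe a file location.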

mrjob.parse.parse_mr_job_stderr(stderr, counters=None)

Parse counters and status messages out of MRJob output.

Parameters:
  • stderr – a filehandle, a list of lines (bytes), or bytes
  • counters – counters so far, to update; a map from group (string) to counter name (string) to count.

Returns a dictionary with the keys counters, statuses, other:

  • counters: counters so far; same format as above
  • statuses: a list of status messages encountered
  • other: lines (strings) that aren’t either counters or status messages
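
To make the return format concrete, here is a rough sketch of a parser that produces the same three keys. It assumes the Hadoop Streaming reporter conventions (reporter:counter:group,counter,amount and reporter:status:message written to stderr); the name parse_stderr_sketch and the sample input are hypothetical, and the real function may differ in details such as how it treats line endings.

```python
import re

# Hadoop Streaming reporter line formats, matched against bytes
COUNTER_RE = re.compile(rb'^reporter:counter:(.*?),(.*?),(-?\d+)$')
STATUS_RE = re.compile(rb'^reporter:status:(.*)$')


def parse_stderr_sketch(stderr, counters=None):
    """Hypothetical stand-in returning counters, statuses, and other lines."""
    # Accept a single bytes object as well as a list of lines or a filehandle
    if isinstance(stderr, bytes):
        stderr = stderr.splitlines()
    if counters is None:
        counters = {}
    statuses = []
    other = []

    for line in stderr:
        line = line.rstrip(b'\r\n')  # assumption: drop trailing newlines

        match = COUNTER_RE.match(line)
        if match:
            group, name, amount = (
                part.decode('utf-8', 'replace') for part in match.groups())
            group_counts = counters.setdefault(group, {})
            group_counts[name] = group_counts.get(name, 0) + int(amount)
            continue

        match = STATUS_RE.match(line)
        if match:
            statuses.append(match.group(1).decode('utf-8', 'replace'))
            continue

        other.append(line.decode('utf-8', 'replace'))

    return {'counters': counters, 'statuses': statuses, 'other': other}


sample = (b'reporter:counter:words,total,3\n'
          b'reporter:status:mapping\n'
          b'reporter:counter:words,total,2\n'
          b'Traceback (most recent call last):\n')
result = parse_stderr_sketch(sample)
print(result['counters'])  # {'words': {'total': 5}}
print(result['statuses'])  # ['mapping']
print(result['other'])     # ['Traceback (most recent call last):']
```

Passing the counters dictionary from one call into the next lets counts accumulate across several chunks of stderr.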

mrjob.parse.parse_s3_uri(uri)

Parse an S3 URI into (bucket, key).

>>> parse_s3_uri('s3://walrus/tmp/')
('walrus', 'tmp/')

If uri is not an S3 URI, raise a ValueError.
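
The behavior above can be approximated with urllib.parse, as in the following sketch. The names parse_s3_uri_sketch and is_s3_uri_sketch are hypothetical, and accepting only the s3 scheme is an assumption; the library itself may recognize other schemes.

```python
from urllib.parse import urlparse


def parse_s3_uri_sketch(uri):
    """Hypothetical stand-in: split an s3:// URI into (bucket, key)."""
    parts = urlparse(uri)
    # Assumption: only the 's3' scheme counts, and a '/' must follow the bucket
    if parts.scheme != 's3' or not parts.netloc or '/' not in parts.path:
        raise ValueError('Invalid S3 URI: %s' % uri)
    # netloc is the bucket; the key is the path without its leading slash
    return parts.netloc, parts.path[1:]


def is_s3_uri_sketch(uri):
    """Hypothetical boolean wrapper in the spirit of is_s3_uri."""
    try:
        parse_s3_uri_sketch(uri)
        return True
    except ValueError:
        return False


print(parse_s3_uri_sketch('s3://walrus/tmp/'))       # ('walrus', 'tmp/')
print(is_s3_uri_sketch('http://example.com/tmp/'))   # False
```

Building the predicate on top of the parser keeps the two in agreement about what counts as a valid S3 URI.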

mrjob.parse.to_uri(path_or_uri)

If path_or_uri is not a URI already, convert it to a file:/// URI.
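
As a rough illustration of this conversion, the sketch below leaves URIs untouched and turns local paths into absolute file:/// URIs using pathlib. The name to_uri_sketch is hypothetical, and resolving relative paths against the current working directory is an assumption about the intended behavior.

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def to_uri_sketch(path_or_uri):
    """Hypothetical stand-in: pass URIs through, convert paths to file:/// URIs."""
    # Same test as the is_uri description: a scheme plus ://
    if urlparse(path_or_uri).scheme and '://' in path_or_uri:
        return path_or_uri
    # Assumption: relative paths are made absolute before conversion,
    # because Path.as_uri() only accepts absolute paths
    return Path(os.path.abspath(path_or_uri)).as_uri()


print(to_uri_sketch('s3://walrus/tmp/'))  # s3://walrus/tmp/
print(to_uri_sketch('data/input.txt'))    # a file:/// URI under the current directory
```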