
[Stage 2:=============================> (1 + 1) / 2]17/06/09 06:48:54 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1496990795119_0001_02_000002 on host: slave1. Exit status: -1000. Diagnostics: Could not obtain block: BP-1759922210- file=/user/ubuntu/.sparkStaging/application_1496990795119_0001/py4j-0.10.4-src.zip org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1759922210- file=/user/ubuntu/.sparkStaging/application_1496990795119_0001/py4j-0.10.4-src.zip

The problem is more likely a mismatch between the memory Spark requests (driver memory + executor memory) and YARN's container sizing configuration. YARN settings determine the minimum and maximum container sizes, and should be based on the available physical memory, the number of nodes, and so on. As a rule of thumb, try setting the minimum YARN container size to 1.5 times the requested driver/executor memory (in this case, 1.5 GB).
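To make the rule of thumb concrete, here is a minimal sketch of the arithmetic in Python. It assumes the requested driver/executor memory is 1 GB (1024 MB), which is what the 1.5 GB figure implies, and it assumes the minimum container size is set through the `yarn.scheduler.minimum-allocation-mb` property in `yarn-site.xml`. The helper name `suggested_min_container_mb` is hypothetical and only for illustration.

```python
def suggested_min_container_mb(requested_mb: int, factor: float = 1.5) -> int:
    """Rule-of-thumb minimum YARN container size, in MB.

    requested_mb: driver/executor memory requested by Spark, in MB.
    factor: headroom multiplier (1.5 per the rule of thumb above).
    """
    return int(requested_mb * factor)


# Assumption: Spark is asking for 1 GB of driver/executor memory.
requested_mb = 1024
min_container_mb = suggested_min_container_mb(requested_mb)

# Value to try for the minimum container size in yarn-site.xml.
print(f"yarn.scheduler.minimum-allocation-mb = {min_container_mb}")
```

With a 1024 MB request this gives 1536 MB, i.e. the 1.5 GB mentioned above. Treat it as a starting point only; the right value still depends on the physical memory available on each node.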

