TEZ-4682: [Cloud] Tez AM docker image #456
Aggarwal-Raghav wants to merge 3 commits into apache:master
@abstractdog, I was able to start DAGAppMaster with ZK locally. Attaching the container logs: docker_logs.txt. But this PR has a lot of open items and I need some advice on the following:
very good, very good, let me check this in detail sometime this week; here are some pointers in the meantime, responding to your questions:
there is no YARN NodeManager in a k8s environment, so a reader of entrypoint.sh should see a clear distinction in the code between the env vars that are really needed and the legacy/backward-compatible env vars; that's what should be handled with care in my opinion
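One way to make that distinction visible in entrypoint.sh is to keep the two groups in separately labeled lists and validate only the required group. This is a minimal sketch, not the script from this PR: the function names are hypothetical, and the choice of variables (`TEZ_FRAMEWORK_MODE` and the NodeManager-style `NM_*` names with their placeholder values) is an assumption made for illustration.

```shell
# Sketch only: variable names and placeholder values are illustrative assumptions.

# Variables the container genuinely needs (hypothetical selection).
REQUIRED_VARS="HADOOP_USER_NAME TEZ_FRAMEWORK_MODE"

# Legacy variables kept only because code written for YARN expects them.
# There is no NodeManager on k8s, so they only get placeholder defaults.
LEGACY_DEFAULTS="NM_HOST=localhost NM_PORT=12345 NM_HTTP_PORT=8042"

# True when the variable named by $1 is set and non-empty.
is_set() {
  eval "[ -n \"\${$1:-}\" ]"
}

# Report every missing required variable; non-zero status if any is missing.
check_required_vars() {
  missing=0
  for name in $REQUIRED_VARS; do
    if ! is_set "$name"; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Export a placeholder for each legacy variable the caller did not supply.
apply_legacy_defaults() {
  for pair in $LEGACY_DEFAULTS; do
    name="${pair%%=*}"
    is_set "$name" || export "$pair"
  done
}
```

An entrypoint would then call `check_required_vars || exit 1` followed by `apply_legacy_defaults`, so the legacy block can be deleted in one piece once it is no longer needed.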
Thanks for the pointers @abstractdog.
A few additional things:
Using tez.local.mode=true solves this, as it will use
@abstractdog, can you please help with the review?
let me get back to this next week
sure
    mvn clean install -DskipTests -Pdocker,tools
    ```

    2. Install zookeeper in mac by:
can you add Ubuntu steps? we could be kind and make Linux users' lives easier

UPDATE: can we use a dockerized ZooKeeper instead? Installing ZK on the host machine seems to go against this whole cloud/docker initiative (also, in case of problems or messed-up ZK nodes, deleting and restarting a container feels easier and cleaner to me)
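A dockerized ZooKeeper for local testing could look roughly like the sketch below. It assumes the official `zookeeper` image from Docker Hub; the `3.9` tag, the container name `tez-zk`, and the helper function names are assumptions, not something this PR defines. The `DOCKER` variable is a hypothetical hook so the commands can be previewed without running them.

```shell
# Sketch only: image tag, container name, and function names are assumptions.
ZK_IMAGE="${ZK_IMAGE:-zookeeper:3.9}"
ZK_CONTAINER="${ZK_CONTAINER:-tez-zk}"
ZK_PORT="${ZK_PORT:-2181}"

# Start ZooKeeper in the background and publish its client port (2181).
# DOCKER can be overridden (for example DOCKER=echo) to preview the command.
zk_start() {
  "${DOCKER:-docker}" run -d --name "$ZK_CONTAINER" -p "$ZK_PORT:2181" "$ZK_IMAGE"
}

# Throw away all ZK state by removing the container, then start a clean one.
zk_reset() {
  "${DOCKER:-docker}" rm -f "$ZK_CONTAINER"
  zk_start
}
```

With this, a messed-up ZK is fixed with a single `zk_reset`, and the same steps work on macOS and Linux hosts alike.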
tez-dist/src/docker/tez.env
    # Tez AM Container Environment Configuration

    HADOOP_USER_NAME=tez
nitpicking: can you order the env vars here the same way they are ordered in the entrypoint script?
tez-dist/pom.xml
    <argument>${project.basedir}/src/docker/build-docker.sh</argument>
    <argument>-hadoop</argument>
    <argument>${hadoop.version}</argument>
    <argument>-tez</argument>
    <argument>${project.version}</argument>
    <argument>-repo</argument>
    <argument>apache</argument>
can you make it similar to what I can see in Hive? It's much less verbose, e.g.
<arguments>
  <argument>.... .sh</argument>
  <argument>-hadoop ${hadoop.version}</argument>
  <argument>-tez ${tez.version}</argument>
</arguments>
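One thing to keep in mind with the compact form: as far as I know, exec-maven-plugin passes each `<argument>` element to the script as a single argument, so `-hadoop ${hadoop.version}` would arrive as one string containing a space. The sketch below shows argument parsing that accepts both the split and the fused form. It is not the actual build-docker.sh from this PR; the function names are hypothetical and the `apache` default for the repo is an assumption taken from the snippet above.

```shell
# Sketch only: not the actual build-docker.sh; function names are hypothetical.

# Store the value for one known flag.
set_opt() {
  case "$1" in
    -hadoop) HADOOP_VERSION="$2" ;;
    -tez)    TEZ_VERSION="$2" ;;
    -repo)   REPO="$2" ;;
  esac
}

parse_build_args() {
  HADOOP_VERSION=""
  TEZ_VERSION=""
  REPO="apache"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      # Flag and value fused into one argument, e.g. "-hadoop 3.4.2".
      "-hadoop "*|"-tez "*|"-repo "*)
        set_opt "${1%% *}" "${1#* }"
        ;;
      # Flag and value passed as two separate arguments.
      -hadoop|-tez|-repo)
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        set_opt "$1" "$2"
        shift
        ;;
      *)
        echo "unknown argument: $1" >&2
        return 1
        ;;
    esac
    shift
  done
}
```

Calling `parse_build_args "$@"` at the top of the script would let the pom.xml use either style without further changes.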
Thanks for the thorough review @abstractdog. I'll address the review comments shortly; you can continue the review in the meantime. I hope you are able to build the image and start the Tez AM standalone process 😅 I still believe we can get rid of the Hadoop tarball dependency completely, as the required Hadoop jars are already part of the Tez tarball; the Hadoop tarball might unnecessarily increase the docker image size. Also, please suggest whether I should use eclipse-temurin:21.0.3_9-jre-ubi9-minimal or a jdk image: if we want to take a jstack or use other Java debugging tools, a jdk image is required.
tez-dist/src/docker/build-docker.sh
    # HADOOP FETCH LOGIC #
    ######################
    HADOOP_FILE_NAME="hadoop-$HADOOP_VERSION.tar.gz"
    HADOOP_URL=${HADOOP_URL:-"https://archive.apache.org/dist/hadoop/core/hadoop-$HADOOP_VERSION/$HADOOP_FILE_NAME"}
what about using this first:
https://dlcdn.apache.org/hadoop/common/hadoop-3.4.2/hadoop-3.4.2.tar.gz
and then falling back to the archive?
archive.apache.org is crazy slow for me at the moment (not the first time), so it may be worth trying dlcdn.apache.org first
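The "CDN first, archive as fallback" idea can be sketched as follows. This is only an illustration built from the two URLs mentioned above; the function names `fetch_url` and `fetch_hadoop` are hypothetical, and the default version is just the one from the example link. The fallback matters because dlcdn.apache.org typically carries only current releases, while older ones remain on the archive.

```shell
# Sketch only: function names are hypothetical; URLs follow the two patterns above.
HADOOP_VERSION="${HADOOP_VERSION:-3.4.2}"
HADOOP_FILE_NAME="hadoop-$HADOOP_VERSION.tar.gz"

# Download URL $1 to file $2; -f makes curl fail on HTTP errors such as 404.
fetch_url() {
  curl -fsSL --retry 2 -o "$2" "$1"
}

# Try the CDN first, then the archive; print the URL that worked.
fetch_hadoop() {
  dest="$1"
  for url in \
    "https://dlcdn.apache.org/hadoop/common/hadoop-$HADOOP_VERSION/$HADOOP_FILE_NAME" \
    "https://archive.apache.org/dist/hadoop/core/hadoop-$HADOOP_VERSION/$HADOOP_FILE_NAME"
  do
    if fetch_url "$url" "$dest"; then
      echo "$url"
      return 0
    fi
    echo "download failed, trying next source after: $url" >&2
  done
  return 1
}
```

A caller would run `fetch_hadoop "$HADOOP_FILE_NAME" || exit 1`; an explicit `HADOOP_URL` override, as in the current script, could simply be tried before either of the two defaults.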
tez-dist/src/docker/README.md
    docker run \
    -p 10001:10001 -p 8042:8042 \
    --name tez-am \
    apache/tez-am:1.0.0-SNAPSHOT
I would introduce a TEZ_VERSION env var beforehand and refer to it: this would make it clear what's going to become obsolete in this doc, and what's more permanent :)
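In the README that could look roughly like this sketch: the version is set once and every later command refers to it. `TEZ_VERSION` is the variable proposed in the comment; the helper functions and the `DOCKER` override are hypothetical additions so the command can be previewed without starting a container.

```shell
# Sketch only: helper names are hypothetical; the ports come from the README snippet.
TEZ_VERSION="${TEZ_VERSION:-1.0.0-SNAPSHOT}"

# The only place where the version ends up in an image reference.
tez_am_image() {
  echo "apache/tez-am:$TEZ_VERSION"
}

# DOCKER can be overridden (for example DOCKER=echo) to preview the command.
run_tez_am() {
  "${DOCKER:-docker}" run \
    -p 10001:10001 -p 8042:8042 \
    --name tez-am \
    "$(tez_am_image)"
}
```

When the version changes, only the first line of the doc needs an update, and the rest of the instructions stay valid.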
tez-dist/src/docker/README.md
    docker run \
    -p 10001:10001 -p 8042:8042 \
    --name tez-am \
    apache/tez-am:1.0.0-SNAPSHOT
I'm trying the steps here, and during docker run I get:
2026-02-25 15:08:52,107 ERROR app.DAGAppMaster: Error starting DAGAppMaster
java.io.FileNotFoundException: /opt/tez/tez-conf.pb (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(Unknown Source)
    at java.base/java.io.FileInputStream.<init>(Unknown Source)
    at org.apache.tez.common.TezUtilsInternal.readUserSpecifiedTezConfiguration(TezUtilsInternal.java:83)
    at org.apache.tez.frameworkplugins.yarn.YarnServerFrameworkService$YarnAMExtensions.loadConfigurationProto(YarnServerFrameworkService.java:73)
    at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2435)
also, earlier I get this:
/entrypoint.sh: line 34: hostname: command not found
I believe this happens in the entrypoint, so it should not be related to my host machine
can you advise what could possibly cause these? I mean, I can debug it for sure, but maybe it's more straightforward for you, given you're the author

For /opt/tez/tez-conf.pb: I removed it in the top commit because it was not failing for me. Please ensure you are using the same tez-site.xml as in the PR, or you can revert the last commit's change to entrypoint.sh.
The hostname one is a known issue: the command doesn't exist in the base docker image. I'll fix it, I forgot to remove it :-(
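If the entrypoint still needs the host name, one option that does not depend on the `hostname` binary is a small fallback chain. This is a sketch with a hypothetical function name, not the fix that went into the PR; it relies on `HOSTNAME` being set by bash and by Docker inside containers, and on `uname -n` as a last resort.

```shell
# Sketch only: resolve the container's host name without requiring the hostname binary.
resolve_hostname() {
  if command -v hostname >/dev/null 2>&1; then
    hostname
  elif [ -n "${HOSTNAME:-}" ]; then
    # bash sets HOSTNAME automatically; Docker also sets it inside containers.
    echo "$HOSTNAME"
  elif [ -r /etc/hostname ]; then
    cat /etc/hostname
  else
    # uname is part of POSIX and prints the node name with -n.
    uname -n
  fi
}
```

In entrypoint.sh the failing call could then be replaced by `"$(resolve_hostname)"`, which works on minimal base images as well as on full ones.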
cd tez-dist/src/docker and run the command:
docker run --rm \
  -p 10001:10001 -p 5005:5005 \
  --env-file tez.env \
  --name tez-am \
  apache/tez-am:1.0.0-SNAPSHOT
tez-conf.pb is required in YARN mode, not in ZK mode. Please pass -e STANDALONE_ZOOKEEPER or use tez.env, and ensure that Hadoop is also running, or remove the following property. I was testing this with TEZ-4686.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://host.docker.internal:9000</value>
</property>

Your stack trace also suggests it's going into the default YARN mode.

makes sense, but I think the docker run command should contain tez.env, right? I just copy-pasted the command, that was the problem; I can see only:
```bash
docker run \
  -p 10001:10001 -p 8042:8042 \
  --name tez-am \
  apache/tez-am:1.0.0-SNAPSHOT
```

Yes, the README.md needs some changes; I will update it. Please let me know if you face any more issues with the tez-am startup. I'm fully available until your tez-am image works :-)
I added tez.env in point 4 but didn't update the older headings.

absolutely, thanks! making the image work on my side is a crucial part of the review 🚀
💔 -1 overall
This message was automatically generated.
@Aggarwal-Raghav : getting closer and closer to the finish, nice job so far!
Let me check this in depth over this weekend and I'll post my analysis/PR under the JIRA. Hoping it's not too late 😅 Can you please check TEZ-4686 as well? With this tez-am docker image + a standalone program (or Tez master + Hive master), the stack trace is observed (see the attachment section). I have raised a draft PR for this.
I believe this patch can be merged without TEZ-4686: with the current WIP patch of TEZ-4686, there is an issue which has to be resolved on both the Hive and Tez sides instead; I'll describe it in detail there