Commit 891f3db

docs(README): Update documentation with redirection to trunkdataplatform website and typo corrections.
1 parent 213418b commit 891f3db

3 files changed

Lines changed: 39 additions & 126 deletions

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@
33

44
TDP is an open source project hosted on [GitHub](https://github.com/TOSIT-IO/TDP) administered by the [TOSIT association](https://tosit.fr/). It is released under the [Apache License 2.0](https://github.com/TOSIT-IO/TDP/blob/main/LICENSE).
55

6-
Learn more about how to [contribute to TDP ](https://www.trunkdataplatform.io/en/contribute/project/contributing) on our website.
6+
Learn more about how to [contribute to TDP](https://www.trunkdataplatform.io/en/contribute/project/contributing) on our website.

README.md

Lines changed: 37 additions & 124 deletions
@@ -2,146 +2,59 @@
22

33
![](static/tdp_logo.png)
44

5-
Trunk Data Platform is an Open Source, free, Hadoop distribution.
5+
[Trunk Data Platform](https://www.trunkdataplatform.io) is a free, Open Source Hadoop distribution built from the source code of Apache projects.
66

7-
This distribution is built by EDF (French electricity provider) & DGFIP (Tax Office by the French Ministry of Finance), through an association called TOSIT (The Open source I Trust).
7+
## Authors
88

9-
TDP is built from Apache projects source code.
9+
This distribution is built by EDF (the French electricity provider) and DGFIP (the tax office of the French Ministry of Finance) through an association called [TOSIT](https://tosit.fr/) (The Open source I Trust).
1010

11-
## TDP repositories
11+
## Local build environment for components
1212

13-
The TDP project is composed of multiple repositories:
14-
- [tdp-collection](https://github.com/TOSIT-IO/tdp-collection): main Ansible collection to deploy TDP core components.
15-
- [tdp-collection-extras](https://github.com/TOSIT-IO/tdp-collection-extras): Ansible collection to deploy extra components that are not part of TDP core.
16-
- [tdp-collection-prerequisites](https://github.com/TOSIT-IO/tdp-collection-prerequisites): Ansible collection to deploy prerequisite components to a TDP installation (i.e.: KDC, PostgreSQL, etc.).
17-
- [tdp-lib](https://github.com/TOSIT-IO/tdp-lib): Python library to configure, manage and deploy TDP.
18-
- [tdp-server](https://github.com/TOSIT-IO/tdp-server): REST API for tdp-lib orchestration.
19-
- [tdp-ui](https://github.com/TOSIT-IO/tdp-ui): Web UI for TDP clusters deployment and configuration, uses tdp-server.
20-
- [tdp-getting-started](https://github.com/TOSIT-IO/tdp-getting-started): A ready to deploy TDP virtual environment based of Vagrant showcasing how to use every component of TDP.
13+
Two distinct images are provided to build the TDP components:
2114

22-
Each component of TDP also has its own repository.
15+
- [tdp-builder](build-env/README.md) containing Maven, for the Java compilation of the Apache Hadoop ecosystem components.
16+
- [tdp-builder-python](build-env-python/README.md) containing a manylinux2014 image with several Python versions, for packaging JupyterHub, JupyterLab, Sparkmagic and Hue.
2317

24-
## Trunk Data Platform Release TDP-1.0 components version
18+
### tdp-builder
2519

26-
### TDP Core
20+
Run the following script to build the image and open a container ready to compile the components with Maven:
2721

28-
The following table shows the core components of TDP as well as the Apache branch they were based on and the TDP branch which serves as base for our releases.
22+
```sh
23+
./bin/start-build-env.sh
24+
```
2925

30-
| Component | Version | Apache Git branch | TDP Git Branch | TDP commits |
31-
| --------------------------- | ---------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
32-
| Apache ZooKeeper | 3.4.6 | release-3.4.6 | XXX | X.X.X |
33-
| Apache Hadoop | 3.1.1-0.0 | [rel/release-3.1.1](https://github.com/apache/hadoop/commits/branch-3.1.1) | [branch-3.1.1-TDP](https://github.com/TOSIT-IO/hadoop/commits/branch-3.1.1-TDP) | [compare](https://github.com/TOSIT-IO/hadoop/compare/branch-3.1.1...branch-3.1.1-TDP) |
34-
| Apache Hive | 3.1.3-1.0 | [branch-3.1](https://github.com/apache/hive/commits/branch-3.1) | [branch-3.1-TDP](https://github.com/TOSIT-IO/hive/commits/branch-3.1-TDP) | [compare](https://github.com/TOSIT-IO/hive/compare/branch-3.1...branch-3.1-TDP) |
35-
| Apache Hive 2 (for Spark 3) | 2.3.9-1.0 | [branch-2.3](https://github.com/apache/hive/commits/branch-2.3) | [branch-2.3-TDP](https://github.com/TOSIT-IO/hive/commits/branch-2.3-TDP) | [compare](https://github.com/TOSIT-IO/hive/compare/branch-2.3...branch-2.3-TDP) |
36-
| Apache Hive 1 (for Spark 2) | 1.2.3-1.0 | [branch-1.2](https://github.com/apache/hive/commits/branch-1.2) | [branch-1.2-TDP](https://github.com/TOSIT-IO/hive/commits/branch-1.2-TDP) | [compare](https://github.com/TOSIT-IO/hive/compare/branch-1.2...branch-1.2-TDP) |
37-
| Apache Tez | 0.9.1-1.0 | [branch-0.9.1](https://github.com/apache/tez/commits/branch-0.9.1) | [branch-0.9.1-TDP](https://github.com/TOSIT-IO/tez/commits/branch-0.9.1-TDP) | [compare](https://github.com/TOSIT-IO/tez/compare/branch-0.9.1...branch-0.9.1-TDP) |
38-
| Apache Spark | 2.3.4-1.0 | [branch-2.3](https://github.com/apache/spark/commits/branch-2.3) | [branch-2.3-TDP](https://github.com/TOSIT-IO/spark/commits/branch-2.3-TDP) | [compare](https://github.com/TOSIT-IO/spark/compare/branch-2.3...branch-2.3-TDP) |
39-
| Apache Spark 3 | 3.2.2-0.0 | [branch-3.2](https://github.com/apache/spark/commits/branch-3.2) | [branch-3.2-TDP](https://github.com/TOSIT-IO/spark/commits/branch-3.2-TDP) | [compare](https://github.com/TOSIT-IO/spark/compare/branch-3.2...branch-3.2-TDP) |
40-
| Apache Ranger | 2.0.0-1.0 | [ranger-2.0](https://github.com/apache/ranger/commits/ranger-2.0) | [ranger-2.0-TDP](https://github.com/TOSIT-IO/ranger/commits/ranger-2.0-TDP) | [compare](https://github.com/TOSIT-IO/ranger/compare/ranger-2.0...ranger-2.0-TDP) |
41-
| Apache Solr (for Ranger) | 7.7.3 | releases/lucene-solr/7.7.3 | XXX | X.X.X |
42-
| Apache HBase | 2.1.10-1.0 | [branch-2.1](https://github.com/apache/hbase/commits/branch-2.1) | [branch-2.1-TDP](https://github.com/TOSIT-IO/hbase/commits/branch-2.1-TDP) | [compare](https://github.com/TOSIT-IO/hbase/compare/branch-2.1...branch-2.1-TDP) |
43-
| Apache Phoenix | 5.1.3-1.0 | [5.1](https://github.com/apache/phoenix/commits/5.1) | [5.1.3-TDP](https://github.com/TOSIT-IO/phoenix/commits/5.1.3-TDP) | [compare](https://github.com/TOSIT-IO/phoenix/compare/5.1...5.1.3-TDP) |
44-
| Apache Phoenix Query Server | 6.0.0-0.0 | [6.0.0](https://github.com/apache/phoenix-queryserver/commits/6.0.0) | [6.0.0-TDP](https://github.com/TOSIT-IO/phoenix-queryserver/commits/6.0.0-TDP) | [compare](https://github.com/TOSIT-IO/phoenix-queryserver/compare/6.0.0...6.0.0-TDP) |
45-
| Apache Knox | 1.6.1-0.0 | [v1.6.1](https://github.com/apache/knox/commits/v1.6.1) | [v1.6.1-TDP](https://github.com/TOSIT-IO/knox/commits/v1.6.1-TDP) | [compare](https://github.com/TOSIT-IO/knox/compare/v1.6.1...v1.6.1-TDP) |
46-
| Apache HBase Connectors | 1.0.0-0.0 | [rel/1.0.0](https://github.com/apache/hbase-connectors/commits/rel/1.0.0) | [branch-2.3.4-1.0.0-TDP](https://github.com/TOSIT-IO/hbase-connectors/commits/branch-2.3.4-1.0.0-TDP) | [compare](https://github.com/TOSIT-IO/hbase-connectors/compare/1.0.0...branch-2.3.4-1.0.0-TDP) |
47-
| Apache HBase Operator tools | 1.1.0-0.0 | [rel/1.1.0](https://github.com/apache/hbase-operator-tools/commits/rel/1.1.0) | [branch-1.1.0-TDP](https://github.com/TOSIT-IO/hbase-operator-tools/commits/branch-1.1.0-TDP) | [compare](https://github.com/TOSIT-IO/hbase-operator-tools/compare/branch-1.1.0...branch-1.1.0-TDP) |
26+
The component versions and their repositories are listed in [tdp-core](https://www.trunkdataplatform.io/en/discover/stacks/tdp2-0). The components must be compiled in the following order:
4827

49-
Versions are approximately based on the [HDP 3.1.5 release](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/release-notes/content/hdp_relnotes.html).
28+
- ZooKeeper
29+
- Hadoop
30+
- Tez
31+
- Spark3
32+
- Hive
33+
- HBase
34+
- Ranger
35+
- Phoenix
36+
- Phoenix-queryserver
37+
- Knox
38+
- HBase Operator tools
39+
- Iceberg
5040

51-
**Note**: For some projects, the Apache foundation maintains sometimes a branch with this the components on which are backported fixes and features. We will be using these branches as much as possible if they are maintained and compatible.
41+
The Maven compilation commands of the different components can be found in the `tdp/README.md` file of each project.
5242

53-
### TDP Extras
43+
### tdp-builder-python
5444

55-
"TDP Extras" carries some projects that cannot be integrated to "TDP Core". There can be different reasons that keep the project outside of the core:
45+
Although the Python components use the same [tdp-builder-python](build-env-python/README.md) image, they must be packaged separately, in different containers, since each component needs its own environment:
5646

57-
- The project is not judged as a key component of the Hadoop ecosystem. This is the case of Airflow.
58-
- The project is not active enough. This is the case of Livy that has not been updated in 2 years.
59-
- The project has some incompatibilities with other "TDP Core" projects' releases. This is the case of Kafka 2.X that relies on ZooKeeper 3.5.X (and cannot use the ZooKeeper 3.4.6 of "TDP Core").
47+
- [JupyterHub](https://github.com/TOSIT-IO/jupyterhub/tree/branch-2.3.1-basic/tdp)
48+
- [JupyterLab](https://github.com/TOSIT-IO/jupyterlab/tree/branch-3.2.9-basic/tdp)
49+
- [Sparkmagic](https://github.com/TOSIT-IO/sparkmagic/tree/branch-0.21.0-basic/tdp)
50+
- [Hue](https://github.com/TOSIT-IO/hue/tree/branch-4.11.0-fix/tdp)
6051

61-
| Component | Version | Apache Git branch | TDP Git Branch | TDP commits |
62-
| ---------------------------------- | ------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------- |
63-
| Apache ZooKeeper 3.5.9 (for Kafka) | 3.5.9 | release-3.5.9 | XXX | X.X.X |
64-
| Apache Kafka | 2.8.2 | [2.8](https://github.com/TOSIT-IO/kafka/tree/2.8) | [2.8-TDP](https://github.com/TOSIT-IO/kafka/tree/2.8-TDP) | [compare](https://github.com/TOSIT-IO/kafka/compare/2.8...2.8-TDP) |
65-
| Apache Livy | 0.8.0 | [master](https://github.com/TOSIT-IO/incubator-livy/tree/master) | [branch-0.8.0-TDP](https://github.com/TOSIT-IO/incubator-livy/tree/branch-0.8.0-TDP) | [compare](https://github.com/TOSIT-IO/incubator-livy/compare/master...branch-0.8.0-TDP) |
66-
| Apache Airflow | 2.2.2 | 2.2.2 | XXX | X.X.X |
52+
### Special case for the Apache Livy compilation
6753

68-
**Note:** A project can graduate from "TDP Extras" to "TDP Core" if enough people are supporting it and/or if it is made compatible with all the other projects of the stack.
54+
Apache Incubator Livy has its own compilation environment and its own instructions, which can be found in the [Incubator Livy project](https://github.com/TOSIT-IO/incubator-livy/tree/branch-0.9.0-fix/tdp).
6955

70-
## Tested operating system (OS)
56+
## Contributing
7157

72-
Only bare metal and virtual machine deployment are tested. Container based OS may work but are not guaranteed.
58+
Contributions are always welcome!
7359

74-
- Centos 7
75-
- Rocky 8
76-
77-
Redhat like OS may work but are not guaranteed.
78-
79-
## TDP Components release
80-
81-
Every TDP initial release is built from a reference branch on the Apache Git repository according to the above tables. The main change from the original branches is the version declaration in the pom.xml files.
82-
83-
## Building / Testing environment
84-
85-
The builds / unit testing of the Maven Java projects of each component above can be run in Kubernetes pods which are scheduled by a Jenkins installation also running on Kubernetes.
86-
Kubernetes pods scheduling allows for **truly** reproducible and isolated builds. Jenkins' strong integration with the Java ecosystem is a perfect match to build the components of the distribution.
87-
88-
### Build order
89-
90-
- hadoop
91-
- tez
92-
- hive1
93-
- spark
94-
- hive2
95-
- spark3
96-
- hive
97-
- hbase
98-
- ranger
99-
- phoenix
100-
- phoenix-queryserver
101-
- knox
102-
- hbase-spark-connector
103-
- hbase-operator-tools
104-
105-
### Kubernetes
106-
107-
Kubernetes was installed on Ubuntu 20.04 Virtual Machines with [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/).
108-
109-
**Note:** It is strongly recommended to deploy a Storage Class in order to have persistence on the Kubernetes cluster (useful for Jenkins among others). In our case, we are using [Rook](https://rook.io/) on physical drives attached to the Kubernetes cluster's VMs.
110-
111-
### Jenkins
112-
113-
Jenkins is used to trigger the builds which is the same process for every component of the stack:
114-
115-
- Git clone the sources
116-
- Build the project
117-
- Run the unit tests
118-
- Publish the artifacts to a remote repository
119-
120-
Jenkins was installed on the Kubernetes cluster with the official [jenkinsci Helm chart](https://github.com/jenkinsci/helm-charts).
121-
122-
### Nexus / Docker registry
123-
124-
The building environment needs multiple registries:
125-
126-
- Maven to host the compiled Jars
127-
- Docker to host the images that we use to build the projects
128-
- File registry to host the .tar.gz files with the binaries and jars for every compiled projects.
129-
130-
Nexus Repository OSS can assume all three roles, is free and open source.
131-
132-
Nexus OSS was install on the Kubernetes cluster with the [helm chart](https://github.com/Oteemo/charts/tree/master/charts/sonatype-nexus) provided by [Oteemo](https://github.com/Oteemo).
133-
134-
## Local build environment
135-
136-
It is possible to run a local environment for building / small scale testing.
137-
138-
Prerequisite:
139-
140-
- Docker installed and available to your local user
141-
142-
You can start a local building environment with the `bin/start-build-env.sh` script.
143-
144-
**Note:** See `build-env/README.md` for details.
145-
146-
To build TDP component binaries, attach to the running `tdp-builder` container and `git clone` the TDP component repository to it. Each TDP component's `tdp/README.md` has custom instructions to launch the build process.
147-
Assign a directory path to the `TDP_HOME` variable in the `bin/start-build-env.sh` to control the local path of built TDP binaries.
60+
See [CONTRIBUTING.md](./CONTRIBUTING.md) for ways to get started.
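
The build order listed in the README can be captured in a small bash helper, for example to sanity-check the sequence in a script that drives the builds inside the `tdp-builder` container. This is a minimal sketch under stated assumptions: the array simply mirrors the README list, and the lowercase identifiers and the functions `build_position`, `builds_before` and `print_build_plan` are hypothetical, not part of the repository. The actual Maven commands remain in each project's `tdp/README.md`.

```shell
#!/usr/bin/env bash
# Sketch: the TDP core build order from the README, as a bash array.
# ASSUMPTION: these lowercase identifiers are illustrative labels and are
# not guaranteed to match the exact repository names.
TDP_BUILD_ORDER=(
  zookeeper hadoop tez spark3 hive hbase ranger
  phoenix phoenix-queryserver knox hbase-operator-tools iceberg
)

# Print the 0-based position of component $1 in the build order.
# Returns 1 (and prints nothing) if the component is not listed.
build_position() {
  local i
  for i in "${!TDP_BUILD_ORDER[@]}"; do
    if [[ "${TDP_BUILD_ORDER[$i]}" == "$1" ]]; then
      echo "$i"
      return 0
    fi
  done
  return 1
}

# Succeed (status 0) if component $1 must be compiled before component $2.
# Returns 2 if either component is unknown.
builds_before() {
  local a b
  a=$(build_position "$1") || return 2
  b=$(build_position "$2") || return 2
  (( a < b ))
}

# Print the build plan: one numbered line per component, in order.
print_build_plan() {
  local i
  for i in "${!TDP_BUILD_ORDER[@]}"; do
    echo "$((i + 1)). ${TDP_BUILD_ORDER[$i]} (see its tdp/README.md for the Maven command)"
  done
}
```

After sourcing the file, `builds_before hadoop hive && echo "compile hadoop first"` prints the message, while `builds_before knox hadoop` fails because Knox comes later in the list.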

build-env-python/README.md

Lines changed: 1 addition & 1 deletion
@@ -16,4 +16,4 @@ docker build . -t tdp-builder-python
1616

1717
Unlike the `tdp-builder` container, where components are compiled with Maven, which puts the JAR files in the `.m2` cache, no such cache is used here; therefore volumes, working directories and even users are different for each component.
1818

19-
Check the documentation of the concerned component for the command to start the container.
19+
Check the documentation of the concerned component for the command to start the container.
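
Because the Python components share the `tdp-builder-python` image but each needs its own environment, every component runs in a separate container. The dry-run helper below only prints commands and illustrates that one-container-per-component layout; it is not the documented procedure. The container names, the `./<component>` source directories, the `/tdp/<component>` mount points and the `print_packaging_commands` function are hypothetical. The `docker run` flags used (`--rm`, `-it`, `--name`, `-v`, `-w`) are standard Docker CLI options.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one `docker run` command per Python component.
# All commands reuse the same tdp-builder-python image, but each component
# gets its own container, source mount and working directory.
# ASSUMPTIONS: container names, ./<component> source directories and
# /tdp/<component> mount points are hypothetical; the real commands,
# including volumes and users, are in each component's tdp/ documentation.
IMAGE="tdp-builder-python"
PYTHON_COMPONENTS=(jupyterhub jupyterlab sparkmagic hue)

print_packaging_commands() {
  local c
  for c in "${PYTHON_COMPONENTS[@]}"; do
    # \$PWD is escaped so it is printed literally and expanded only when
    # the printed command is actually run.
    echo "docker run --rm -it --name ${IMAGE}-${c} -v \"\$PWD/${c}:/tdp/${c}\" -w /tdp/${c} ${IMAGE}"
  done
}
```

Running `print_packaging_commands` prints four commands, one per component, which can be reviewed before being adapted to the per-component instructions.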
