Commit a137a5f

Update makefile commands and contribution guide (#345)
* Update makefile commands
* Update makefile and CONTRIBUTING.md
* Update CONTRIBUTING.md
1 parent 43673f3 commit a137a5f

3 files changed: +122 -145 lines changed

CONTRIBUTING.md

Lines changed: 113 additions & 139 deletions
Original file line number | Diff line number | Diff line change
@@ -1,163 +1,137 @@
11
# Contributing to the Databricks Labs Data Generator
2-
We happily welcome contributions to *dbldatagen*.
3-
4-
We use GitHub Issues to track community reported issues and GitHub Pull Requests for accepting changes.
5-
6-
## License
7-
8-
When you contribute code, you affirm that the contribution is your original work and that you
9-
license the work to the project under the project's Databricks license. Whether or not you
10-
state this explicitly, by submitting any copyrighted material via pull request, email, or
11-
other means you agree to license the material under the project's Databricks license and
12-
warrant that you have the legal authority to do so.
13-
14-
# Development Setup
15-
16-
## Python Compatibility
17-
18-
The code supports Python 3.10+ and has been tested with Python 3.10 and later.
19-
20-
## Quick Start
21-
2+
While **dbldatagen** cannot accept direct contributions from external contributors, all users can create GitHub Issues to propose new functionality. The dbldatagen team will review and prioritize new features based on user feedback.
3+
4+
## Making a contribution
5+
6+
### Setup
7+
To set up your local environment:
8+
9+
1. Ensure any [Non-Python Dependencies](#non-python-dependencies) are installed locally.
10+
2. Clone the repository:
11+
```bash
12+
git clone "repository URL"
13+
```
14+
15+
3. Open the repository in your IDE. Run the following terminal command to create a local development environment:
16+
```bash
17+
make dev
18+
```
19+
20+
### Development
21+
When contributing new functionality:
22+
23+
1. Sync changes from the `master` branch:
24+
```bash
25+
git checkout master && git pull
26+
```
27+
2. Check out a new branch from `master`:
28+
```bash
29+
git checkout -b "branch name"
30+
```
31+
3. Add your functionality, tests, documentation, and examples.
32+
33+
### Formatting
34+
dbldatagen aims to follow [PEP8 standards](https://peps.python.org/pep-0008/). Code style should be checked for any new commits.
35+
36+
To validate code locally:
37+
38+
1. Run the following terminal command in your IDE:
39+
```bash
40+
make fmt
41+
```
42+
2. Fix any issues until no messages remain.
43+
44+
### Testing
45+
dbldatagen aims to have the highest possible test coverage. Code should be tested for any new commits.
46+
47+
To run unit tests locally:
48+
49+
1. Run the following terminal command in your IDE:
50+
```bash
51+
make test-coverage
52+
```
53+
2. Verify that all tests pass.
54+
3. Open the coverage report in your browser.
55+
4. Verify that all modified modules have full coverage.
56+
57+
### Submitting a PR
58+
To submit a pull request:
59+
60+
1. Squash all local commits in your branch.
61+
2. Push your changes:
62+
```bash
63+
git push
64+
```
65+
3. Navigate to the [Pull Requests](https://github.com/databrickslabs/dbldatagen/pulls) page and click **New pull request**.
66+
4. Complete the template.
67+
5. Submit your PR.
68+
69+
## Building the project locally
70+
71+
### Building the HTML documentation
72+
Documentation can be reviewed locally. To build and open the documentation in your browser, run the following terminal command:
2273
```bash
23-
# Install development dependencies
24-
make dev
25-
26-
# Format and lint code
27-
make fmt # Format with ruff and fix issues
28-
make lint # Check code quality
29-
30-
# Run tests
31-
make test # Run tests
32-
33-
# Build package
34-
make build # Build with modern build system
74+
make docs-serve
3575
```
3676

37-
## Development Tools
38-
39-
All development tools are configured in `pyproject.toml`.
40-
41-
## Dependencies
42-
43-
All dependencies are defined in `pyproject.toml`:
44-
45-
- `[project.dependencies]` lists dependencies necessary to run the `dbldatagen` library
46-
- `[tool.hatch.envs.default]` lists the default environment necessary to develop, test, and build the `dbldatagen` library
47-
48-
## Spark Dependencies
49-
50-
The builds have been tested against Spark 3.4.1+. This requires OpenJDK 1.8.56 or later version of Java 8.
51-
The Databricks runtimes use the Azul Zulu version of OpenJDK 8.
52-
These are not installed automatically by the build process.
53-
54-
## Creating the HTML documentation
55-
56-
Run `make docs` from the main project directory.
57-
58-
The main html document will be in the file (relative to the root of the build directory)
59-
`./docs/docs/build/html/index.html`
60-
61-
## Building the Python wheel
77+
### Building the Python wheel
78+
dbldatagen can be built locally as a Python wheel. To build the wheel, run the following terminal command:
6279

6380
```bash
64-
make build # Clean and build the package
81+
make build
6582
```
6683

67-
# Testing
68-
69-
## Developing new tests
70-
New tests should be created using PyTest with classes combining multiple `Pytest` tests.
71-
72-
Existing test code contains tests based on Python's `unittest` framework but these are
73-
run on `pytest` rather than `unittest`.
84+
## Prerequisites
7485

75-
To get a `spark` instance for test purposes, use the following code:
86+
### Python Compatibility
87+
dbldatagen supports Python 3.10+ and is tested with Python 3.10 and later.
7688

77-
```python
78-
import dbldatagen as dg
79-
80-
spark = dg.SparkSingleton.getLocalInstance("<name to flag spark instance>")
81-
```
82-
83-
The name used to flag the spark instance should be the test module or test class name.
84-
85-
## Running Tests
86-
87-
```bash
88-
# Run all tests
89-
make test
90-
91-
If using an environment with multiple Python versions, make sure to use virtual env or similar to pick up correct python versions.
92-
93-
If necessary, set `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to correct versions of Python.
94-
95-
# Using the Databricks Labs data generator
96-
The recommended method for installation is to install from the PyPi package
97-
98-
You can install the library as a notebook scoped library when working within the Databricks
99-
notebook environment through the use of a `%pip` cell in your notebook.
100-
101-
To install as a notebook-scoped library, create and execute a notebook cell with the following text:
102-
103-
> `%pip install dbldatagen`
104-
105-
This installs from the PyPi package
106-
107-
You can also install from release binaries or directly from the Github sources.
108-
109-
The release binaries can be accessed at:
110-
- Databricks Labs Github Data Generator releases - https://github.com/databrickslabs/dbldatagen/releases
111-
112-
113-
The `%pip install` method also works on the Databricks Community Edition.
114-
115-
Alternatively, you can download a wheel file and use the Databricks install mechanism to install a wheel-based
116-
library into your workspace.
117-
118-
The `%pip install` method can also download a specific binary release.
119-
For example, the following code downloads the release V0.2.1
120-
121-
> '%pip install https://github.com/databrickslabs/dbldatagen/releases/download/v021/dbldatagen-0.2.1-py3-none-any.whl'
89+
### Development Tools
90+
All development tools are configured in `pyproject.toml`.
12291

123-
# Code Quality and Style
92+
### Python Dependencies
93+
All Python dependencies are defined in `pyproject.toml`:
12494

125-
## Automated Formatting
95+
1. `[project.dependencies]` lists dependencies installed with the `dbldatagen` library
96+
2. `[tool.hatch.envs.default]` lists the default environment necessary to develop, test, and build the `dbldatagen` library
12697

127-
Code can be automatically formatted and linted with the following commands:
98+
### Non-Python Dependencies
99+
dbldatagen is tested against Databricks Runtime version 13.3 LTS and OpenJDK 17.
128100

129-
```bash
130-
# Format code and fix issues automatically
131-
make fmt
101+
Spark and Java dependencies are not installed automatically by the build process and should be installed manually to develop and run dbldatagen locally.
132102

133-
# Check code quality without making changes
134-
make lint
135-
```
103+
## Development standards
136104

137-
## Coding Conventions
105+
### Code style
106+
All code should adhere to the following standards:
138107

139-
The code follows PySpark coding conventions:
140-
- Python PEP8 standards with some PySpark-specific adaptations
141-
- Method and argument names use mixed case starting with lowercase (following PySpark conventions)
142-
- Line length limit of 120 characters
108+
1. **Formatted and linted** to PEP8 standards.
109+
2. **Type-validated** using [mypy](https://mypy-lang.org/).
110+
3. **Clearly named** variables, classes, and methods.
111+
4. **Documented with docstrings** that detail functionality and usage.
143112

144-
See the [Python PEP8 Guide](https://peps.python.org/pep-0008/) for general Python style guidelines.
113+
### Testing
114+
All tests should use [pytest](https://docs.pytest.org/en/stable/) with fixtures and parameterization where appropriate. This includes:
145115

146-
# Github expectations
147-
When running the unit tests on GitHub, the environment should use the same environment as the latest Databricks
148-
runtime latest LTS release. While compatibility is preserved on LTS releases from Databricks runtime 13.3 onwards,
149-
unit tests will be run on the environment corresponding to the latest LTS release.
116+
1. **Unit tests** cover functionality which does not require a Databricks workspace and should always be preferred to integration tests when possible.
117+
2. **Integration tests** cover functionality which requires Databricks compute, Unity Catalog, or other workspace features.
150118

151-
Libraries will use the same versions as the earliest supported LTS release - currently 13.3 LTS
119+
### Branches
120+
All local development should branch from `master` and adhere to the following naming convention:
152121

153-
This means for the current build:
122+
1. `feat_<feature_name>` for new functionality
123+
2. `fix_<issue_number>_<fix_name>` for bugfixes
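
The branch naming convention above could be checked mechanically. The following is a minimal sketch, not part of the repository: the helper name `is_valid_branch_name` is hypothetical, and the allowed character set (letters, digits, underscores) is an assumption, since the guide only gives the `feat_<feature_name>` and `fix_<issue_number>_<fix_name>` templates.

```python
import re

# Patterns inferred from the naming convention in the guide. The exact
# character set allowed in names is an assumption (letters, digits, underscores).
FEATURE_BRANCH = re.compile(r"feat_[A-Za-z0-9_]+")
FIX_BRANCH = re.compile(r"fix_\d+_[A-Za-z0-9_]+")


def is_valid_branch_name(name: str) -> bool:
    """Return True if the name matches either branch naming template."""
    return bool(FEATURE_BRANCH.fullmatch(name) or FIX_BRANCH.fullmatch(name))
```

Under these assumptions, `feat_text_templates` and `fix_345_makefile_commands` would pass, while `fix_makefile` would be rejected because it lacks an issue number.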
154124

155-
- Use of Ubuntu 22.04 for the test runner
156-
- Use of Java 8
157-
- Use of Python 3.10.12 when testing / building the image
125+
### Pull requests
126+
All pull requests should adhere to the following standards:
158127

159-
See the following resources for more information
160-
- https://docs.databricks.com/en/release-notes/runtime/15.4lts.html
161-
- https://docs.databricks.com/en/release-notes/runtime/11.3lts.html
162-
- https://github.com/actions/runner-images/issues/10636
128+
1. Pull requests should be scoped to a single repository issue.
129+
2. Local commits should be squashed on your branch before opening a pull request.
130+
3. All pull requests should include functionality, tests, documentation, and examples.
163131

132+
## License
133+
When you contribute code, you affirm that the contribution is your original work and that you
134+
license the work to the project under the project's Databricks license. Whether or not you
135+
state this explicitly, by submitting any copyrighted material via pull request, email, or
136+
other means you agree to license the material under the project's Databricks license and
137+
warrant that you have the legal authority to do so.

docs/source/conf.py

Lines changed: 2 additions & 5 deletions
@@ -13,12 +13,9 @@
1313
import os
1414
import sys
1515

16-
PACKAGE_DIR = "../../dbldatagen"
16+
PACKAGE_DIR = "../.."
1717

18-
sys.path.insert(0, os.path.abspath(f"{PACKAGE_DIR}"))
19-
sys.path.insert(0, os.path.abspath(f"{PACKAGE_DIR}/constraints"))
20-
sys.path.insert(0, os.path.abspath(f"{PACKAGE_DIR}/datasets"))
21-
sys.path.insert(0, os.path.abspath(f"{PACKAGE_DIR}/distributions"))
18+
sys.path.insert(0, os.path.abspath(PACKAGE_DIR))
2219

2320
from dbldatagen import *
2421
from dbldatagen.distributions import *
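
The `sys.path` simplification in this file relies on standard Python import behavior: once the directory that contains a package is on `sys.path`, the package and its subpackages resolve by dotted name, so separate entries for each subdirectory are unnecessary. A minimal sketch of that behavior, using a throwaway layout (the names `demo_pkg` and its `distributions` subpackage are hypothetical stand-ins, not the real dbldatagen sources):

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway layout: <root>/demo_pkg/__init__.py and
# <root>/demo_pkg/distributions/__init__.py (hypothetical stand-ins).
root = tempfile.mkdtemp()
sub_dir = os.path.join(root, "demo_pkg", "distributions")
os.makedirs(sub_dir)
with open(os.path.join(root, "demo_pkg", "__init__.py"), "w") as f:
    f.write("NAME = 'demo_pkg'\n")
with open(os.path.join(sub_dir, "__init__.py"), "w") as f:
    f.write("KIND = 'distributions'\n")

# Only the directory that CONTAINS the package goes on sys.path,
# mirroring sys.path.insert(0, os.path.abspath(PACKAGE_DIR)).
sys.path.insert(0, os.path.abspath(root))
importlib.invalidate_caches()

# Both the package and its subpackage resolve from the single path entry.
pkg = importlib.import_module("demo_pkg")
sub = importlib.import_module("demo_pkg.distributions")
print(pkg.NAME, sub.KIND)
```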

makefile

Lines changed: 7 additions & 1 deletion
@@ -22,8 +22,14 @@ fmt:
2222
test:
2323
hatch run test
2424

25+
test-coverage:
26+
make test && open htmlcov/index.html
27+
2528
build:
2629
hatch build
2730

2831
docs:
29-
cd docs && make docs
32+
cd docs && make docs
33+
34+
docs-serve:
35+
cd docs && make docs && open build/html/index.html
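
The `test-coverage` and `docs-serve` targets call `open`, which is the macOS file launcher; Linux desktops typically use `xdg-open` instead. If portability matters, one option is Python's standard `webbrowser` module. This is only a sketch under that assumption: the helper names `report_uri` and `open_report` are hypothetical and do not exist in the repository.

```python
import webbrowser
from pathlib import Path


def report_uri(report_path: str) -> str:
    """Convert a local report path into a file:// URI a browser can load."""
    return Path(report_path).resolve().as_uri()


def open_report(report_path: str) -> bool:
    """Open the report in the default browser; True if a browser was launched."""
    return webbrowser.open(report_uri(report_path))


# Example (not run here): open_report("htmlcov/index.html")
```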
