
Commit f0f1850

Merge pull request #63 from shcherbak-ai/dev
build: v0.15.0
2 parents cb133e4 + 5c13290 commit f0f1850


81 files changed: 12107 additions, 4815 deletions

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -5,6 +5,17 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 
 - **Refactor**: Code reorganization that doesn't change functionality but improves structure or maintainability
 
+## [0.15.0](https://github.com/shcherbak-ai/contextgem/releases/tag/v0.15.0) - 2025-08-14
+### Added
+- Auto-pricing for LLMs: enable via `auto_pricing=True` to automatically estimate costs using pydantic's `genai-prices`; optional `auto_pricing_refresh=True` refreshes cached price data at runtime.
+
+### Refactor
+- Public API made more consistent and stable: user-facing classes are now thin, well-documented facades over internal implementations. No behavior changes.
+- Internal reorganization for maintainability and future-proofing.
+
+### Docs
+- Added guidance for configuring auto-pricing for LLMs.
+
 ## [0.14.4](https://github.com/shcherbak-ai/contextgem/releases/tag/v0.14.4) - 2025-08-08
 ### Fixed
 - Suppressed noisy LiteLLM proxy missing-dependency error logs (prompting to install `litellm[proxy]`) emitted by `litellm>=1.75.2` during LLM API calls. ContextGem does not require LiteLLM proxy features. Suppression is scoped to LiteLLM loggers.
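
The auto-pricing entry estimates LLM costs from `genai-prices` data. Whatever the data source, a per-call estimate boils down to token counts multiplied by per-token prices. The following is a minimal stdlib sketch of that arithmetic only, not ContextGem's implementation: it does not call `genai-prices`, it assumes prices are quoted per million tokens, and `estimate_cost`, `COST_QUANT`, and the example prices are hypothetical. The internal package does export a `_COST_QUANT` constant, but its value is not shown in this diff.

```python
from decimal import ROUND_HALF_UP, Decimal

# Hypothetical quantum: round estimated costs to 5 decimal places.
COST_QUANT = Decimal("0.00001")


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: Decimal,
    output_price_per_m: Decimal,
) -> Decimal:
    """Estimate one LLM call's cost, assuming prices per million tokens."""
    million = Decimal(1_000_000)
    raw = (
        Decimal(input_tokens) * input_price_per_m / million
        + Decimal(output_tokens) * output_price_per_m / million
    )
    # Decimal avoids binary float rounding error; quantize to a fixed precision.
    return raw.quantize(COST_QUANT, rounding=ROUND_HALF_UP)


# Made-up prices for illustration, not taken from genai-prices.
cost = estimate_cost(12_000, 800, Decimal("0.15"), Decimal("0.60"))
print(cost)  # 0.00228
```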

CONTRIBUTING.md

Lines changed: 19 additions & 14 deletions
@@ -67,19 +67,24 @@ contextgem/
 
 ├── contextgem/
 │ │
-│ ├── public/ # 🎯 User-facing API (start here for new features)
-│ │ ├── concepts.py # - Concepts API
-│ │ ├── aspects.py # - Aspects API
-│ │ ├── documents.py # - Document processing
-│ │ ├── pipelines.py # - Document data extraction pipelines
-│ │ ├── llms.py # - LLM extraction functionality
-│ │ └── ... # - More public modules
+│ ├── internal/ # 🔧 Core implementation (start here for new features)
+│ │ ├── base/ # - Core abstractions & business logic
+│ │ │ ├── concepts.py # - Internal concept implementations
+│ │ │ ├── aspects.py # - Internal aspect implementations
+│ │ │ ├── documents.py # - Internal document processing
+│ │ │ ├── llms.py # - Internal LLM functionality
+│ │ │ └── ... # - More internal implementations
+│ │ ├── prompts/ # - LLM prompt templates
+│ │ ├── typings/ # - Type definitions
+│ │ └── ... # - More internal modules
 │ │
-│ └── internal/ # 🔧 Internal implementation
-│ ├── base/ # - Core abstractions
-│ ├── prompts/ # - LLM prompt templates
-│ ├── typings/ # - Type definitions
-│ └── ... # - More internal modules
+│ └── public/ # 🎯 User-facing API (thin facades exposing internals)
+│ ├── concepts.py # - Public concept facades
+│ ├── aspects.py # - Public aspect facades
+│ ├── documents.py # - Public document facades
+│ ├── pipelines.py # - Public pipeline facades
+│ ├── llms.py # - Public LLM facades
+│ └── ... # - More public modules
 
 ├── tests/
 │ ├── cassettes/ # 📼 VCR recordings (auto-generated)
@@ -102,12 +107,12 @@ contextgem/
 ```
 
 **🎯 Quick Start for Your Contribution:**
-- **Adding new functionality?** → Start in `contextgem/public/`, often requires `internal/` changes too
+- **Adding new functionality?** → Implement in `contextgem/internal/` (core logic). Then expose via a thin public facade in `contextgem/public/` using the registry.
 - **Writing tests?** → Add to `tests/test_all.py::TestAll`
 - **Updating docs?** → Edit files in `docs/source/` or `dev/`
 - **Fixing README?** → Edit `dev/readme.template.md`
 
-> **💡 Note:** New public features typically require supporting changes in internal modules (base classes, prompt templates, type definitions, etc.). Think of `public/` as the user interface and `internal/` as the engine that powers it.
+> **💡 Note:** Implement functionality in `internal/` (base classes, validation, serialization, typing). Use `public/` to expose thin, documented facades that inherit from internal classes and are registered with `@_expose_in_registry` decorator to ensure deserialization and instance creation utils return public types. Do not import public classes in internal modules; use the registry for type resolution and publicization.
 
 
 ---

NOTICE

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ This software includes the following third-party components:
 
 Core Dependencies:
 - aiolimiter: Rate limiting for asynchronous operations
+- genai-prices: LLM pricing data and utilities (by Pydantic) to automatically estimate costs
 - Jinja2: Templating engine
 - litellm: LLM interface library (this software uses only MIT-licensed portions of LiteLLM and does not utilize any components from the enterprise/ directory)
 - loguru: Logging utility

README.md

Lines changed: 1 addition & 0 deletions
@@ -484,6 +484,7 @@ This project is automatically scanned for security vulnerabilities using multipl
 ContextGem relies on these excellent open-source packages:
 
 - [aiolimiter](https://github.com/mjpieters/aiolimiter): Powerful rate limiting for async operations
+- [genai-prices](https://github.com/pydantic/genai-prices): LLM pricing data and utilities (by Pydantic) to automatically estimate costs
 - [Jinja2](https://github.com/pallets/jinja): Fast, expressive, extensible templating engine used for prompt rendering
 - [litellm](https://github.com/BerriAI/litellm): Unified interface to multiple LLM providers with seamless provider switching
 - [loguru](https://github.com/Delgan/loguru): Simple yet powerful logging that enhances debugging and observability

contextgem/__init__.py

Lines changed: 3 additions & 3 deletions
@@ -20,7 +20,7 @@
 ContextGem - Effortless LLM extraction from documents
 """
 
-__version__ = "0.14.4"
+__version__ = "0.15.0"
 __author__ = "Shcherbak AI AS"
 
 from contextgem.public import (
@@ -52,7 +52,7 @@
 )
 
 
-__all__ = [
+__all__ = (
     # Aspects
     "Aspect",
     # Concepts
@@ -90,4 +90,4 @@
     "JsonObjectClassStruct",
     # Converters
     "DocxConverter",
-]
+)
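
This commit switches `__all__` from a list to a tuple in several modules. The commit does not say why; one plausible benefit is immutability. Star-imports accept any sequence, so import behavior is unchanged, while accidental in-place mutation now raises. A stdlib-only demonstration on a throwaway in-memory module (the name `demo_exports` is hypothetical):

```python
import sys
import types

# Build a throwaway module in memory (hypothetical name `demo_exports`).
demo = types.ModuleType("demo_exports")
exec(
    "def public_fn():\n    return 'public'\n\ndef helper():\n    return 'helper'\n",
    demo.__dict__,
)
demo.__all__ = ("public_fn",)  # a tuple, as in this commit
sys.modules["demo_exports"] = demo

# Star-imports accept any sequence, so a tuple behaves like the old list.
namespace: dict = {}
exec("from demo_exports import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)  # ['public_fn']

# Unlike a list, the tuple cannot be mutated in place by accident.
try:
    demo.__all__.append("helper")
    append_failed = False
except AttributeError:
    append_failed = True
print(append_failed)  # True

sys.modules.pop("demo_exports", None)  # clean up the fake module
```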

contextgem/internal/__init__.py

Lines changed: 56 additions & 25 deletions
@@ -17,17 +17,29 @@
 #
 
 from contextgem.internal.base import (
-    _AssignedAspectsProcessor,
-    _AssignedConceptsProcessor,
-    _AssignedInstancesProcessor,
-    _Concept,
-    _ExtractedItem,
-    _ExtractedItemsAttributeProcessor,
-    _InstanceBase,
-    _MarkdownTextAttributesProcessor,
-    _ParasAndSentsBase,
-    _PostInitCollectorMixin,
-    _RefParasAndSentsAttrituteProcessor,
+    _COST_QUANT,
+    _LOCAL_MODEL_PROVIDERS,
+    _Aspect,
+    _BooleanConcept,
+    _DateConcept,
+    _Document,
+    _DocumentLLM,
+    _DocumentLLMGroup,
+    _DocumentPipeline,
+    _ExtractionPipeline,
+    _Image,
+    _JsonObjectClassStruct,
+    _JsonObjectConcept,
+    _JsonObjectExample,
+    _LabelConcept,
+    _LLMPricing,
+    _NumericalConcept,
+    _Paragraph,
+    _RatingConcept,
+    _RatingScale,
+    _Sentence,
+    _StringConcept,
+    _StringExample,
 )
 from contextgem.internal.converters import _DocxConverterBase, _DocxPackage
 from contextgem.internal.data_models import (
@@ -37,7 +49,12 @@
     _LLMUsage,
     _LLMUsageOutputContainer,
 )
-from contextgem.internal.decorators import _post_init_method, _timer_decorator
+from contextgem.internal.decorators import (
+    _disable_direct_initialization,
+    _expose_in_registry,
+    _post_init_method,
+    _timer_decorator,
+)
 from contextgem.internal.exceptions import (
     DocxConverterError,
     LLMAPIError,
@@ -111,19 +128,31 @@
 )
 
 
-__all__ = [
+__all__ = (
     # Base
-    "_InstanceBase",
-    "_AssignedAspectsProcessor",
-    "_AssignedConceptsProcessor",
-    "_AssignedInstancesProcessor",
-    "_ExtractedItemsAttributeProcessor",
-    "_RefParasAndSentsAttrituteProcessor",
-    "_PostInitCollectorMixin",
-    "_Concept",
-    "_ExtractedItem",
-    "_ParasAndSentsBase",
-    "_MarkdownTextAttributesProcessor",
+    "_COST_QUANT",
+    "_LOCAL_MODEL_PROVIDERS",
+    "_Aspect",
+    "_BooleanConcept",
+    "_DateConcept",
+    "_Document",
+    "_DocumentLLM",
+    "_DocumentLLMGroup",
+    "_DocumentPipeline",
+    "_ExtractionPipeline",
+    "_Image",
+    "_JsonObjectClassStruct",
+    "_JsonObjectConcept",
+    "_JsonObjectExample",
+    "_LabelConcept",
+    "_LLMPricing",
+    "_NumericalConcept",
+    "_Paragraph",
+    "_RatingConcept",
+    "_RatingScale",
+    "_Sentence",
+    "_StringConcept",
+    "_StringExample",
     # LLM output structs
     "_get_aspect_extraction_output_struct",
     "_get_concept_extraction_output_struct",
@@ -166,6 +195,8 @@
     # Decorators
     "_post_init_method",
     "_timer_decorator",
+    "_disable_direct_initialization",
+    "_expose_in_registry",
     # Extracted items
     "_StringItem",
     "_IntegerItem",
@@ -204,4 +235,4 @@
     "LLMExtractionError",
     "LLMAPIError",
     "DocxConverterError",
-]
+)

contextgem/internal/base/__init__.py

Lines changed: 56 additions & 30 deletions
@@ -16,38 +16,64 @@
 # limitations under the License.
 #
 
-from contextgem.internal.base.attrs import (
-    _AssignedAspectsProcessor,
-    _AssignedConceptsProcessor,
-    _AssignedInstancesProcessor,
-    _ExtractedItemsAttributeProcessor,
-    _RefParasAndSentsAttrituteProcessor,
+
+from contextgem.internal.base.aspects import _Aspect
+from contextgem.internal.base.concepts import (
+    _BooleanConcept,
+    _DateConcept,
+    _JsonObjectConcept,
+    _LabelConcept,
+    _NumericalConcept,
+    _RatingConcept,
+    _StringConcept,
+)
+from contextgem.internal.base.data_models import _LLMPricing, _RatingScale
+from contextgem.internal.base.documents import _Document
+from contextgem.internal.base.examples import _JsonObjectExample, _StringExample
+from contextgem.internal.base.images import _Image
+from contextgem.internal.base.llms import (
+    _COST_QUANT,
+    _LOCAL_MODEL_PROVIDERS,
+    _DocumentLLM,
+    _DocumentLLMGroup,
 )
-from contextgem.internal.base.concepts import _Concept
-from contextgem.internal.base.instances import _InstanceBase
-from contextgem.internal.base.items import _ExtractedItem
-from contextgem.internal.base.md_text import _MarkdownTextAttributesProcessor
-from contextgem.internal.base.mixins import _PostInitCollectorMixin
-from contextgem.internal.base.paras_and_sents import _ParasAndSentsBase
+from contextgem.internal.base.paras_and_sents import _Paragraph, _Sentence
+from contextgem.internal.base.pipelines import _DocumentPipeline, _ExtractionPipeline
+from contextgem.internal.base.utils import _JsonObjectClassStruct
 
 
-__all__ = [
-    # Instances
-    "_InstanceBase",
-    # Attrs processors
-    "_AssignedAspectsProcessor",
-    "_AssignedConceptsProcessor",
-    "_AssignedInstancesProcessor",
-    "_ExtractedItemsAttributeProcessor",
-    "_RefParasAndSentsAttrituteProcessor",
-    # Mixins
-    "_PostInitCollectorMixin",
+__all__ = (
+    # Aspects
+    "_Aspect",
     # Concepts
-    "_Concept",
-    # Extracted items
-    "_ExtractedItem",
+    "_BooleanConcept",
+    "_DateConcept",
+    "_JsonObjectConcept",
+    "_LabelConcept",
+    "_NumericalConcept",
+    "_RatingConcept",
+    "_StringConcept",
+    # Data models (base)
+    "_LLMPricing",
+    "_RatingScale",
+    # Documents
+    "_Document",
+    # Examples
+    "_JsonObjectExample",
+    "_StringExample",
+    # Images
+    "_Image",
+    # LLMs
+    "_COST_QUANT",
+    "_LOCAL_MODEL_PROVIDERS",
+    "_DocumentLLM",
+    "_DocumentLLMGroup",
     # Paragraphs and sentences
-    "_ParasAndSentsBase",
-    # Markdown text
-    "_MarkdownTextAttributesProcessor",
-]
+    "_Paragraph",
+    "_Sentence",
+    # Pipelines
+    "_DocumentPipeline",
+    "_ExtractionPipeline",
+    # Utils (base)
+    "_JsonObjectClassStruct",
+)
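
The `__all__` in this file is rebuilt almost entirely, as is the one in `contextgem/internal/__init__.py`. When re-export lists change this much, a quick consistency check can catch a name that is listed in `__all__` but never imported, or one that is listed twice. A minimal sketch follows; `check_all_exports` is a hypothetical helper, not part of ContextGem. It is demonstrated on the stdlib `json` module and a throwaway module, and pointing it at `contextgem.internal.base` would require the package to be installed.

```python
import json
import types
from collections import Counter


def check_all_exports(module: types.ModuleType) -> list[str]:
    """Return problems found in a module's __all__ (an empty list means OK)."""
    exported = getattr(module, "__all__", None)
    if exported is None:
        return [f"{module.__name__} defines no __all__"]
    problems = []
    # Every exported name must actually be bound in the module.
    for name in exported:
        if not hasattr(module, name):
            problems.append(f"missing attribute: {name}")
    # A name listed twice usually signals a copy-paste slip during refactoring.
    for name, count in Counter(exported).items():
        if count > 1:
            problems.append(f"duplicate export: {name}")
    return problems


print(check_all_exports(json))  # []

broken = types.ModuleType("broken")
broken.__all__ = ("present", "absent", "present")
broken.present = object()
print(check_all_exports(broken))  # ['missing attribute: absent', 'duplicate export: present']
```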

0 commit comments
