-
Notifications
You must be signed in to change notification settings - Fork 0
Enhance faketelemetry with batch generation, phase shift, noise types, and data export #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot wasn't able to review any files in this pull request.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…d data export utilities Co-authored-by: adkvi <[email protected]>
Co-authored-by: adkvi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 23 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| :param output: Optional file path or file-like object to write to. | ||
| If None, returns the JSON string. |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output parameter accepts both file paths (str/Path) and file-like objects (TextIO), but the behavior when writing to file-like objects doesn't close the file, while when using a file path, the file is automatically closed after writing. This asymmetry could lead to confusion.
Consider documenting this behavior clearly: "When output is a file path, the file is opened, written, and closed automatically. When output is a file-like object, it remains open and the caller is responsible for closing it."
| records = [ | ||
| {"timestamp": ts.strftime(datetime_format), "value": val} for ts, val in data | ||
| ] |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] In the batch generation loop, datetime.strftime() is called for every single data point. For large datasets (e.g., 10Hz sampling over hours), this could result in thousands of string formatting operations, which can be slow.
Consider optimizing by:
- Pre-formatting datetime objects only when actually needed (lazy evaluation)
- Using a more efficient datetime serialization method for large batches
- Documenting the performance characteristics for large datasets
| elif self.waveform == WaveformType.RANDOM_NOISE: | ||
| value = self.amplitude * random.gauss(0, 1) + self.offset |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RANDOM_NOISE waveform doesn't apply the phase shift parameter, while other waveforms do. This creates inconsistent behavior when a user sets a phase value.
Since random noise is non-deterministic, phase might not be meaningful here, but consider documenting that phase is not applicable for this waveform type or ignoring it explicitly with a comment.
| def to_json( | ||
| data: List[Tuple[datetime, float]], | ||
| output: Optional[Union[str, Path, TextIO]] = None, | ||
| indent: int = 2, | ||
| datetime_format: str = "%Y-%m-%dT%H:%M:%S.%f", | ||
| ) -> str: | ||
| """ | ||
| Export telemetry data to JSON format. | ||
| :param data: List of (datetime, value) tuples | ||
| :param output: Optional file path or file-like object to write to. | ||
| If None, returns the JSON string. | ||
| :param indent: JSON indentation level (default 2) | ||
| :param datetime_format: Format string for datetime serialization | ||
| :return: JSON string representation of the data | ||
| """ | ||
| records = [ | ||
| {"timestamp": ts.strftime(datetime_format), "value": val} for ts, val in data | ||
| ] | ||
|
|
||
| json_str = json.dumps(records, indent=indent) | ||
|
|
||
| if output is not None: | ||
| if isinstance(output, (str, Path)): | ||
| with open(output, "w", encoding="utf-8") as f: | ||
| f.write(json_str) | ||
| else: | ||
| output.write(json_str) | ||
|
|
||
| return json_str |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The to_json() function lacks test coverage for error conditions, particularly when writing to invalid file paths or when file I/O fails. The new export functionality should have tests that verify error handling when file operations fail.
Consider adding tests for scenarios like:
- Writing to a read-only location
- Invalid file paths
- Handling file write errors gracefully
| def from_json( | ||
| source: Union[str, Path, TextIO], | ||
| datetime_format: str = "%Y-%m-%dT%H:%M:%S.%f", | ||
| ) -> List[Tuple[datetime, float]]: | ||
| """ | ||
| Import telemetry data from JSON format. | ||
| :param source: JSON string, file path, or file-like object | ||
| :param datetime_format: Format string for datetime parsing | ||
| :return: List of (datetime, value) tuples | ||
| """ | ||
| if isinstance(source, (str, Path)): | ||
| path = Path(source) | ||
| if path.exists(): | ||
| with open(path, "r", encoding="utf-8") as f: | ||
| records = json.load(f) | ||
| else: | ||
| # Assume it's a JSON string | ||
| records = json.loads(str(source)) | ||
| else: | ||
| records = json.load(source) | ||
|
|
||
| return [ | ||
| (datetime.strptime(r["timestamp"], datetime_format), float(r["value"])) | ||
| for r in records | ||
| ] | ||
|
|
||
|
|
||
| def from_csv( | ||
| source: Union[str, Path, TextIO], | ||
| datetime_format: str = "%Y-%m-%dT%H:%M:%S.%f", | ||
| has_header: bool = True, | ||
| ) -> List[Tuple[datetime, float]]: | ||
| """ | ||
| Import telemetry data from CSV format. | ||
| :param source: CSV string, file path, or file-like object | ||
| :param datetime_format: Format string for datetime parsing | ||
| :param has_header: Whether the CSV has a header row (default True) | ||
| :return: List of (datetime, value) tuples | ||
| """ | ||
| import io | ||
|
|
||
| if isinstance(source, (str, Path)): | ||
| path = Path(source) | ||
| if path.exists(): | ||
| with open(path, "r", encoding="utf-8") as f: | ||
| content = f.read() | ||
| else: | ||
| # Assume it's a CSV string | ||
| content = str(source) | ||
| reader = csv.reader(io.StringIO(content)) | ||
| else: | ||
| reader = csv.reader(source) | ||
|
|
||
| if has_header: | ||
| next(reader, None) # Skip header | ||
|
|
||
| return [ | ||
| (datetime.strptime(row[0], datetime_format), float(row[1])) | ||
| for row in reader | ||
| if row | ||
| ] |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The from_json() and from_csv() functions lack test coverage for malformed data scenarios. There are no tests to verify behavior when:
- JSON/CSV data has missing fields (e.g., no "value" field)
- Data has incorrect types (e.g., non-numeric values)
- Datetime strings don't match the expected format
- Empty data is provided
These error conditions should be tested to ensure graceful error handling or appropriate error messages.
|
|
||
| def from_csv( | ||
| source: Union[str, Path, TextIO], | ||
| datetime_format: str = "%Y-%m-%dT%H:%M:%S.%f", |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The datetime format string "%Y-%m-%dT%H:%M:%S.%f" includes microseconds (.%f), but when parsing, this assumes all timestamps will have microseconds. If timestamps are generated with varying precision or serialized without microseconds, from_csv will fail with a ValueError.
Consider using a more robust parsing approach or documenting this requirement clearly.
| # Exponential decay: frequency acts as decay rate | ||
| value = self.amplitude * math.exp(-self.frequency * t) + self.offset |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EXPONENTIAL_DECAY waveform doesn't apply the phase shift parameter, while other waveforms do. This creates inconsistent behavior when a user sets a phase value.
Consider applying phase to exponential decay (e.g., by using t - self.phase / (2 * math.pi * self.frequency) or a similar adjustment) or documenting that phase is not supported for this waveform type.
| # Exponential decay: frequency acts as decay rate | |
| value = self.amplitude * math.exp(-self.frequency * t) + self.offset | |
| # Exponential decay: frequency acts as decay rate, apply phase as time shift | |
| t_adjusted = ( | |
| t + self.phase / (2 * math.pi * self.frequency) | |
| if self.frequency > 0 | |
| else t | |
| ) | |
| value = self.amplitude * math.exp(-self.frequency * t_adjusted) + self.offset |
| signature: (t: float) -> float | ||
| """ | ||
| if frequency < 0: | ||
| raise ValueError("Frequency must be non-negative.") |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation describes the frequency parameter as "must be >= 0", but when frequency is 0, several waveforms (SAWTOOTH, TRIANGLE, PULSE) will result in calculations involving t * self.frequency which equals 0, potentially causing unexpected behavior or division by zero in floor operations.
Additionally, for EXPONENTIAL_DECAY, frequency acts as the decay rate (per the comment on line 131), so frequency=0 would mean no decay at all (exp(0) = 1 always). The validation on line 40 allows frequency=0, but the behavior may not be meaningful for all waveform types.
Consider either: 1) Requiring frequency > 0 for specific waveforms, 2) Adding special handling for frequency=0, or 3) Documenting the expected behavior when frequency=0 for each waveform type.
| raise ValueError("Frequency must be non-negative.") | |
| raise ValueError("Frequency must be non-negative.") | |
| # For SAWTOOTH, TRIANGLE, and PULSE, frequency must be strictly positive | |
| if waveform in ( | |
| WaveformType.SAWTOOTH, | |
| WaveformType.TRIANGLE, | |
| WaveformType.PULSE, | |
| ) and frequency == 0: | |
| raise ValueError( | |
| f"Frequency must be positive for {waveform.name} waveform." | |
| ) |
| lines = csv_str.strip().split("\n") | ||
| self.assertEqual(len(lines), 4) # Header + 3 data rows | ||
| # Handle both Unix and Windows line endings | ||
| self.assertTrue(lines[0].strip() == "timestamp,value") |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertTrue(a == b) cannot provide an informative message. Using assertEqual(a, b) instead will give more informative messages.
| self.assertTrue(lines[0].strip() == "timestamp,value") | |
| self.assertEqual(lines[0].strip(), "timestamp,value") |
| from typing import Optional | ||
|
|
||
|
|
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'Optional' is not used.
| from typing import Optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 13 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| class TestDataExport(unittest.TestCase): | ||
| """Tests for data export utilities.""" | ||
|
|
||
| def setUp(self): | ||
| self.sample_data = [ | ||
| (datetime(2025, 1, 1, 12, 0, 0), 1.5), | ||
| (datetime(2025, 1, 1, 12, 0, 1), 2.5), | ||
| (datetime(2025, 1, 1, 12, 0, 2), 3.5), | ||
| ] | ||
|
|
||
| def test_to_json(self): | ||
| import json | ||
|
|
||
| json_str = to_json(self.sample_data) | ||
| parsed = json.loads(json_str) | ||
| self.assertEqual(len(parsed), 3) | ||
| self.assertEqual(parsed[0]["value"], 1.5) | ||
| self.assertIn("timestamp", parsed[0]) | ||
|
|
||
| def test_to_csv(self): | ||
| csv_str = to_csv(self.sample_data) | ||
| lines = csv_str.strip().split("\n") | ||
| self.assertEqual(len(lines), 4) # Header + 3 data rows | ||
| # Handle both Unix and Windows line endings | ||
| self.assertTrue(lines[0].strip() == "timestamp,value") | ||
|
|
||
| def test_to_dict(self): | ||
| result = to_dict(self.sample_data) | ||
| self.assertEqual(len(result), 3) | ||
| self.assertIsInstance(result[0]["timestamp"], datetime) | ||
| self.assertEqual(result[0]["value"], 1.5) | ||
|
|
||
| def test_to_dict_with_format(self): | ||
| result = to_dict(self.sample_data, datetime_format="%Y-%m-%d") | ||
| self.assertEqual(result[0]["timestamp"], "2025-01-01") | ||
|
|
||
| def test_json_roundtrip(self): | ||
| json_str = to_json(self.sample_data) | ||
| restored = from_json(json_str) | ||
| self.assertEqual(len(restored), 3) | ||
| self.assertAlmostEqual(restored[0][1], 1.5) | ||
|
|
||
| def test_csv_roundtrip(self): | ||
| csv_str = to_csv(self.sample_data) | ||
| restored = from_csv(csv_str) | ||
| self.assertEqual(len(restored), 3) | ||
| self.assertAlmostEqual(restored[0][1], 1.5) | ||
|
|
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test coverage for file I/O operations. The tests only verify string-based export/import (e.g., to_json(data) returning a string), but don't test writing to actual files or reading from files (e.g., to_json(data, "file.json") or from_json("file.json")). Add tests that write to temporary files and read them back to ensure file operations work correctly.
| # Sawtooth with phase: adjust the time reference | ||
| t_adjusted = ( | ||
| t + self.phase / (2 * math.pi * self.frequency) | ||
| if self.frequency > 0 | ||
| else t | ||
| ) |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] When frequency is 0, the phase shift calculation creates a time offset but may not produce meaningful results for waveforms. For frequency=0, the sawtooth waveform would become constant (t*0 = 0), so the phase shift doesn't affect the output. Consider documenting this edge case or raising a warning when phase is used with frequency=0.
| raise ValueError("Duration must be positive.") | ||
|
|
||
| interval = 1.0 / sampling_rate | ||
| num_samples = int(duration * sampling_rate) |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculation num_samples = int(duration * sampling_rate) truncates instead of rounding, which may result in slightly fewer samples than expected. For example, with duration=1.0 and sampling_rate=3.0, you'd expect 3 samples, but with floating-point arithmetic you might get 2 if there's any precision loss. Consider using round() instead: num_samples = round(duration * sampling_rate) to ensure more predictable behavior.
| num_samples = int(duration * sampling_rate) | |
| num_samples = round(duration * sampling_rate) |
| :param sampling_rate: Samples per second (Hz) | ||
| :param duration: Optional duration in seconds. If None, streams forever. | ||
| :yields: Tuple of (datetime, float) representing timestamp and value | ||
| """ |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent validation: generate_batch validates that sampling_rate > 0 and duration > 0, but the stream method doesn't have the same validation. The stream method should also validate these parameters to prevent infinite loops or division by zero errors. Add validation at the beginning of the stream method.
| """ | |
| """ | |
| if sampling_rate <= 0: | |
| raise ValueError("Sampling rate must be positive.") | |
| if duration is not None and duration <= 0: | |
| raise ValueError("Duration must be positive if specified.") |
| :param sampling_rate: Samples per second (Hz) | ||
| :param duration: Optional duration in seconds. If None, streams forever. | ||
| :yields: Dictionary mapping channel index to (timestamp, value) tuple | ||
| """ |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent validation: generate_batch validates that sampling_rate > 0 and duration > 0, but the stream method doesn't have the same validation. The stream method should also validate these parameters to prevent infinite loops or division by zero errors. Add validation at the beginning of the stream method.
| """ | |
| """ | |
| if sampling_rate <= 0: | |
| raise ValueError("Sampling rate must be positive.") | |
| if duration is not None and duration <= 0: | |
| raise ValueError("Duration must be positive if specified.") |
| if isinstance(source, (str, Path)): | ||
| path = Path(source) | ||
| if path.exists(): | ||
| with open(path, "r", encoding="utf-8") as f: | ||
| records = json.load(f) | ||
| else: | ||
| # Assume it's a JSON string | ||
| records = json.loads(str(source)) | ||
| else: | ||
| records = json.load(source) |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The from_json function attempts to distinguish between a file path and a JSON string by checking if the path exists. However, this creates ambiguity: if a JSON string starts with text that matches an existing file path, it will incorrectly try to read that file. Consider using a more explicit approach, such as trying to parse as JSON first and only reading from file if that fails with a specific check for file-like strings, or requiring users to explicitly pass a Path object for files.
| if isinstance(source, (str, Path)): | |
| path = Path(source) | |
| if path.exists(): | |
| with open(path, "r", encoding="utf-8") as f: | |
| records = json.load(f) | |
| else: | |
| # Assume it's a JSON string | |
| records = json.loads(str(source)) | |
| else: | |
| records = json.load(source) | |
| if isinstance(source, Path): | |
| with open(source, "r", encoding="utf-8") as f: | |
| records = json.load(f) | |
| elif hasattr(source, "read"): | |
| # file-like object | |
| records = json.load(source) | |
| elif isinstance(source, str): | |
| try: | |
| records = json.loads(source) | |
| except json.JSONDecodeError: | |
| # Try to open as file path | |
| try: | |
| with open(source, "r", encoding="utf-8") as f: | |
| records = json.load(f) | |
| except FileNotFoundError: | |
| raise ValueError("Input string is neither valid JSON nor a valid file path: %r" % source) | |
| else: | |
| raise TypeError("Unsupported input type for from_json: %r" % type(source)) |
| def test_json_roundtrip(self): | ||
| json_str = to_json(self.sample_data) | ||
| restored = from_json(json_str) | ||
| self.assertEqual(len(restored), 3) | ||
| self.assertAlmostEqual(restored[0][1], 1.5) | ||
|
|
||
| def test_csv_roundtrip(self): | ||
| csv_str = to_csv(self.sample_data) | ||
| restored = from_csv(csv_str) | ||
| self.assertEqual(len(restored), 3) | ||
| self.assertAlmostEqual(restored[0][1], 1.5) | ||
|
|
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test coverage for the error handling paths in from_json and from_csv. Consider adding tests for malformed JSON/CSV data (e.g., missing fields, invalid data types, invalid datetime formats) to ensure the functions handle errors gracefully.
| gen = TelemetryGenerator( | ||
| WaveformType.SINE, | ||
| frequency=1.0, | ||
| phase=math.pi/2 |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Inconsistent trailing comma: Line 72 has a trailing comma after phase=math.pi/2 but this is inside a function call with only that one parameter on the line. While Python allows this, it's inconsistent with the style used elsewhere in the codebase (e.g., line 17 in examples/simple_sinewave.py has a trailing comma for consistency in multi-line parameter lists). This should either have no trailing comma or be formatted consistently.
| phase=math.pi/2 | |
| phase=math.pi/2, |
| raise ValueError("Duration must be positive.") | ||
|
|
||
| interval = 1.0 / sampling_rate | ||
| num_samples = int(duration * sampling_rate) |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculation num_samples = int(duration * sampling_rate) truncates instead of rounding, which may result in slightly fewer samples than expected. For example, with duration=1.0 and sampling_rate=3.0, you'd expect 3 samples, but with floating-point arithmetic you might get 2 if there's any precision loss. Consider using round() instead: num_samples = round(duration * sampling_rate) to ensure more predictable behavior.
| num_samples = int(duration * sampling_rate) | |
| num_samples = round(duration * sampling_rate) |
| # With 50% probability, expect roughly half to have impulses | ||
| self.assertTrue(20 < impulse_count < 80) |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion range (20 < impulse_count < 80) is very broad for testing. With 100 samples and 50% probability, the expected count is 50, and the range should be tighter. A 95% confidence interval would be approximately 40-60 (using binomial distribution). The current range (20-80) is too permissive and might not catch actual bugs in the impulse noise implementation.
Adds commonly requested signal generation features and data export capabilities to the faketelemetry package.
New Features
phaseparam (radians) onTelemetryGeneratorfor out-of-phase signalsgenerate_batch()returns all samples immediately without real-time delaysWaveformType.EXPONENTIAL_DECAYwhere frequency controls decay rateNoiseTypeenum withGAUSSIAN,UNIFORM, andIMPULSEoptionsto_json(),to_csv(),from_json(),from_csv(),to_dict()utilitiesExample
Other Changes
pyproject.tomlwith full metadataOriginal prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.