
Commit 8d623df

Adib234 authored and facebook-github-bot committed
Augment audio to a video (#133)
Summary:
## Related Issue
Fixes #130

- [X] I have read CONTRIBUTING.md to understand how to contribute to this repository :)

<Please summarize what you are trying to achieve, what changes you made, and how they achieve the desired result.>

I'm trying to extract the audio to a temporary file and then apply some audio augmentation. After that I swap the current audio in the video with the augmented audio. I also gather metadata for the augmented audio and video. It either returns the output path (if it was specified) or the video path. The only file that was changed was `/AugLy/augly/video/functional.py`. A question I have is: what should the description for the `audio_aug_function` param be?

## Unit Tests
If your changes touch the `audio` module, please run all of the `audio` tests and paste the output here. Likewise for `image`, `text`, & `video`. If your changes could affect behavior in multiple modules, please run the tests for all potentially affected modules. If you are unsure of which modules might be affected by your changes, please just run all the unit tests.

### Video
```bash
python -m unittest discover -s augly/tests/video_tests/ -p "*"
```

Output of test suite for video
```
...../Users/admin/AugLy/augly/image/utils/utils.py:51: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/admin/AugLy/augly/assets/screenshot_templates/bboxes.json'>
  bbox = json.load(open(local_bbox_path, "rb"))[template_key]
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/admin/AugLy/augly/video/helpers/metadata.py:357: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/admin/AugLy/augly/assets/screenshot_templates/web.png'>
  metadata[-1]["intensity"] = getattr(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
./Users/admin/AugLy/augly/image/utils/utils.py:184: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  A = np.matrix(matrix, dtype=np.float)
/Users/admin/AugLy/augly/image/utils/utils.py:184: PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
  A = np.matrix(matrix, dtype=np.float)
/Users/admin/opt/anaconda3/envs/augly/lib/python3.9/site-packages/numpy/matrixlib/defmatrix.py:69: PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
  return matrix(data, dtype=dtype, copy=False)
..
======================================================================
FAIL: test_ReplaceWithBackground (transforms.composite_test.TransformsVideoUnitTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/admin/AugLy/augly/tests/video_tests/transforms/composite_test.py", line 75, in test_ReplaceWithBackground
    self.evaluate_class(
  File "/Users/admin/AugLy/augly/tests/video_tests/base_unit_test.py", line 122, in evaluate_class
    self.assertTrue(
AssertionError: False is not true

----------------------------------------------------------------------
Ran 63 tests in 813.840s

FAILED (failures=1)
```

## Other testing
N/A only looking for some feedback on the work I did so far

If applicable, test your changes and paste the output here. For example, if your changes affect the requirements/installation, then test installing augly in a fresh conda env, then make sure you are able to import augly & run the unit test

Pull Request resolved: #133

Reviewed By: zpapakipos

Differential Revision: D31368459

Pulled By: jbitton

fbshipit-source-id: 9c76057d1a2057ed25b317f2eb3fd1808c92bd8d
1 parent 0dab20c commit 8d623df
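
The summary above describes a load, augment, save round trip on a temporary audio file. A minimal sketch of that round trip follows, using only the Python standard library. The helper names (`write_wav`, `read_wav`, `augment_wav_in_place`, `halve_volume`) are hypothetical and are not part of AugLy; the video-specific steps (extracting the track from the video and swapping it back in) are left out because they need the real library and ffmpeg.

```python
import os
import struct
import tempfile
import wave
from typing import Callable, List, Tuple

Samples = List[int]  # 16-bit mono PCM samples (an assumption of this sketch)


def write_wav(path: str, samples: Samples, sample_rate: int) -> None:
    """Write 16-bit mono PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


def read_wav(path: str) -> Tuple[Samples, int]:
    """Read a 16-bit mono WAV file back into samples and a sample rate."""
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
        samples = list(struct.unpack(f"<{len(frames) // 2}h", frames))
        return samples, wav.getframerate()


def augment_wav_in_place(
    path: str,
    aug_fn: Callable[..., Tuple[Samples, int]],
    **aug_kwargs,
) -> None:
    """Load the audio, apply the augmentation, and overwrite the same file."""
    samples, sample_rate = read_wav(path)
    aug_samples, aug_rate = aug_fn(samples, sample_rate=sample_rate, **aug_kwargs)
    write_wav(path, aug_samples, aug_rate)


def halve_volume(samples: Samples, sample_rate: int, **kwargs) -> Tuple[Samples, int]:
    """Toy augmentation: returns (augmented samples, sample rate)."""
    return [s // 2 for s in samples], sample_rate


with tempfile.TemporaryDirectory() as tmpdir:
    wav_path = os.path.join(tmpdir, "track.wav")
    write_wav(wav_path, [0, 1000, -1000, 2000], 48000)
    augment_wav_in_place(wav_path, halve_volume)
    result_samples, result_rate = read_wav(wav_path)
```

The augmentation callable returns both the samples and a sample rate, mirroring the `Callable[..., Tuple[np.ndarray, int]]` annotation used in this commit, so an augmentation that resamples can report the new rate.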

7 files changed: +215 -9 lines changed

augly/tests/video_tests/transforms/composite_test.py

Lines changed: 10 additions & 0 deletions
@@ -5,6 +5,7 @@
 import random
 import unittest
 
+import augly.audio as audaugs
 import augly.video as vidaugs
 from augly.tests.base_configs import VideoAugConfig
 from augly.tests.video_tests.base_unit_test import BaseVideoUnitTest
@@ -23,6 +24,14 @@ def setUpClass(cls):
     def test_ApplyLambda(self):
         self.evaluate_class(vidaugs.ApplyLambda(), fname="apply_lambda")
 
+    def test_AugmentAudio(self):
+        self.evaluate_class(
+            vidaugs.AugmentAudio(
+                audio_aug_function=audaugs.PitchShift(),
+            ),
+            fname="augment_audio",
+        )
+
     def test_InsertInBackground(self):
         self.evaluate_class(
             vidaugs.InsertInBackground(offset_factor=0.25),
@@ -79,6 +88,7 @@ def test_ReplaceWithBackground(self):
                 source_percentage=0.7,
             ),
             fname="replace_with_background",
+            metadata_exclude_keys=["dst_duration", "dst_fps"],
         )
 
     def test_ReplaceWithColorFrames(self):

augly/utils/expected_output/video_tests/expected_metadata.json

Lines changed: 54 additions & 0 deletions
@@ -80,6 +80,60 @@
             "src_width": 1920
         }
     ],
+    "augment_audio": [
+        {
+            "audio_aug_function": "PitchShift",
+            "audio_aug_kwargs": {},
+            "audio_metadata": [
+                {
+                    "dst_duration": 10.005333333333333,
+                    "dst_num_channels": 1,
+                    "dst_sample_rate": 48000,
+                    "dst_segments": [
+                        {
+                            "end": 10.005333333333333,
+                            "start": 0
+                        }
+                    ],
+                    "intensity": 1.1904761904761905,
+                    "n_steps": 1,
+                    "name": "pitch_shift",
+                    "output_path": null,
+                    "src_duration": 10.005333333333333,
+                    "src_num_channels": 1,
+                    "src_sample_rate": 48000,
+                    "src_segments": [
+                        {
+                            "end": 10.005333333333333,
+                            "start": 0
+                        }
+                    ]
+                }
+            ],
+            "dst_duration": 10.027855,
+            "dst_fps": 29.916666666666668,
+            "dst_height": 1080,
+            "dst_segments": [
+                {
+                    "end": 10.027855,
+                    "start": 0
+                }
+            ],
+            "dst_width": 1920,
+            "intensity": 1.1904761904761905,
+            "name": "augment_audio",
+            "src_duration": 10.027855,
+            "src_fps": 29.916666666666668,
+            "src_height": 1080,
+            "src_segments": [
+                {
+                    "end": 10.027855,
+                    "start": 0
+                }
+            ],
+            "src_width": 1920
+        }
+    ],
     "blend_videos": [
         {
             "dst_duration": 10.027855,

augly/video/__init__.py

Lines changed: 6 additions & 2 deletions
@@ -6,6 +6,7 @@
     add_noise,
     apply_lambda,
     audio_swap,
+    augment_audio,
     blend_videos,
     blur,
     brightness,
@@ -21,20 +22,20 @@
     hflip,
     hstack,
     insert_in_background,
-    replace_with_background,
     loop,
     meme_format,
     overlay,
     overlay_dots,
     overlay_emoji,
-    overlay_onto_screenshot,
     overlay_onto_background_video,
+    overlay_onto_screenshot,
     overlay_shapes,
     overlay_text,
     pad,
     perspective_transform_and_shake,
     pixelization,
     remove_audio,
+    replace_with_background,
     replace_with_color_frames,
     resize,
     rotate,
@@ -50,6 +51,7 @@
     AddNoise,
     ApplyLambda,
     AudioSwap,
+    AugmentAudio,
     BlendVideos,
     Blur,
     Brightness,
@@ -106,6 +108,7 @@
     "AddNoise",
     "ApplyLambda",
     "AudioSwap",
+    "AugmentAudio",
     "BlendVideos",
     "Blur",
     "Brightness",
@@ -161,6 +164,7 @@
     "add_noise",
     "apply_lambda",
     "audio_swap",
+    "augment_audio",
     "blend_videos",
     "blur",
     "brightness",

augly/video/functional.py

Lines changed: 65 additions & 0 deletions
@@ -8,6 +8,8 @@
 import tempfile
 from typing import Any, Callable, Dict, List, Optional, Tuple, Union
 
+import augly.audio as audaugs
+import augly.audio.utils as audutils
 import augly.image as imaugs
 import augly.utils as utils
 import augly.video.augmenters.cv2 as ac
@@ -138,6 +140,69 @@ def audio_swap(
     return output_path or video_path
 
 
+def augment_audio(
+    video_path: str,
+    output_path: Optional[str] = None,
+    audio_aug_function: Callable[..., Tuple[np.ndarray, int]] = audaugs.apply_lambda,
+    metadata: Optional[List[Dict[str, Any]]] = None,
+    **audio_aug_kwargs,
+) -> str:
+    """
+    Augments the audio track of the input video using a given AugLy audio augmentation
+
+    @param video_path: the path to the video to be augmented
+
+    @param output_path: the path in which the resulting video will be stored.
+        If not passed in, the original video file will be overwritten
+
+    @param audio_aug_function: the augmentation function to be applied onto the video's
+        audio track. Should have the standard API of an AugLy audio augmentation, i.e.
+        expect input audio as a numpy array or path & output path as input, and output
+        the augmented audio to the output path
+
+    @param metadata: if set to be a list, metadata about the function execution
+        including its name, the source & dest duration, fps, etc. will be appended
+        to the inputted list. If set to None, no metadata will be appended or returned
+
+    @param audio_aug_kwargs: the input attributes to be passed into `audio_aug`
+
+    @returns: the path to the augmented video
+    """
+    assert callable(audio_aug_function), (
+        repr(type(audio_aug_function).__name__) + " object is not callable"
+    )
+
+    func_kwargs = helpers.get_func_kwargs(
+        metadata, locals(), video_path, audio_aug_function=audio_aug_function
+    )
+
+    if audio_aug_function is not None:
+        try:
+            func_kwargs["audio_aug_function"] = audio_aug_function.__name__
+        except AttributeError:
+            func_kwargs["audio_aug_function"] = type(audio_aug_function).__name__
+
+    audio_metadata = []
+    with tempfile.NamedTemporaryFile(suffix=".wav") as tmpfile:
+        helpers.extract_audio_to_file(video_path, tmpfile.name)
+        audio, sr = audutils.validate_and_load_audio(tmpfile.name)
+        aug_audio, aug_sr = audio_aug_function(
+            audio, sample_rate=sr, metadata=audio_metadata, **audio_aug_kwargs
+        )
+        audutils.ret_and_save_audio(aug_audio, tmpfile.name, aug_sr)
+        audio_swap(video_path, tmpfile.name, output_path=output_path or video_path)
+
+    if metadata is not None:
+        helpers.get_metadata(
+            metadata=metadata,
+            audio_metadata=audio_metadata,
+            function_name="augment_audio",
+            **func_kwargs,
+        )
+
+    return output_path or video_path
+
+
 def blend_videos(
     video_path: str,
     overlay_path: str,
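
The `try`/`except AttributeError` block in `augment_audio` decides which name is recorded in the metadata. The sketch below isolates that lookup to show why a plain function and a callable class instance are recorded differently, which seems consistent with the expected metadata listing `"PitchShift"` for the test that passes `audaugs.PitchShift()`. `resolve_aug_name`, `pitch_shift`, and `PitchShift` here are hypothetical stand-ins defined for illustration only, not AugLy's implementations.

```python
from typing import Any, Callable


def resolve_aug_name(aug: Callable[..., Any]) -> str:
    """Return the function's own name, or the class name for callable objects."""
    try:
        return aug.__name__
    except AttributeError:
        # Instances of classes that define __call__ have no __name__ attribute.
        return type(aug).__name__


def pitch_shift(audio, sample_rate, **kwargs):
    """Stand-in functional augmentation (does nothing)."""
    return audio, sample_rate


class PitchShift:
    """Stand-in class-based augmentation (does nothing)."""

    def __call__(self, audio, sample_rate, **kwargs):
        return audio, sample_rate


function_name = resolve_aug_name(pitch_shift)
instance_name = resolve_aug_name(PitchShift())
```

With these stand-ins, the function resolves to `pitch_shift` and the instance resolves to `PitchShift`.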

augly/video/helpers/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@
     add_noise_intensity,
     apply_lambda_intensity,
     audio_swap_intensity,
+    augment_audio_intensity,
     blend_videos_intensity,
     blur_intensity,
     brightness_intensity,
@@ -85,6 +86,7 @@
     "add_noise_intensity",
     "apply_lambda_intensity",
     "audio_swap_intensity",
+    "augment_audio_intensity",
     "blend_videos_intensity",
     "blur_intensity",
     "brightness_intensity",

augly/video/helpers/intensity.py

Lines changed: 6 additions & 4 deletions
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 # Copyright (c) Facebook, Inc. and its affiliates.
 
-from typing import Any, Dict, Optional, Tuple
+from typing import Any, Dict, List, Optional, Tuple
 
 import augly.image.intensity as imint
 import augly.image.utils as imutils
@@ -42,6 +42,10 @@ def audio_swap_intensity(offset: float, **kwargs) -> float:
     return (1.0 - offset) * 100.0
 
 
+def augment_audio_intensity(audio_metadata: List[Dict[str, Any]], **kwargs) -> float:
+    return audio_metadata[0]["intensity"]
+
+
 def blend_videos_intensity(opacity: float, overlay_size: float, **kwargs) -> float:
     return imint.overlay_media_intensity_helper(opacity, overlay_size)
 
@@ -209,9 +213,7 @@ def overlay_emoji_intensity(
 
 
 def overlay_onto_background_video_intensity(
-    overlay_size: Optional[float],
-    metadata: Dict[str, Any],
-    **kwargs,
+    overlay_size: Optional[float], metadata: Dict[str, Any], **kwargs
 ) -> float:
     if overlay_size is not None:
         return (1 - overlay_size ** 2) * 100.0
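
The new `augment_audio_intensity` helper reuses the intensity already computed by the audio augmentation. The sketch below copies the one-line function from this diff and feeds it a trimmed version of the `audio_metadata` entry from the expected metadata; the `video_path` keyword is only there to show that extra keyword arguments are ignored, and the file name is made up.

```python
from typing import Any, Dict, List


def augment_audio_intensity(audio_metadata: List[Dict[str, Any]], **kwargs) -> float:
    # Same body as in the diff: take the intensity of the first audio entry.
    return audio_metadata[0]["intensity"]


# Trimmed copy of the audio entry from expected_metadata.json.
audio_metadata = [
    {"name": "pitch_shift", "n_steps": 1, "intensity": 1.1904761904761905}
]

video_intensity = augment_audio_intensity(audio_metadata, video_path="unused.mp4")
```

This matches the expected metadata, where the video-level `intensity` equals the audio-level one. Note that the helper indexes the first element, so an empty `audio_metadata` list would raise `IndexError`.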

augly/video/transforms.py

Lines changed: 72 additions & 3 deletions
@@ -5,8 +5,10 @@
 import random
 from typing import Any, Callable, Dict, List, Optional, Tuple
 
+import augly.audio as audaugs
 import augly.utils as utils
 import augly.video.functional as F
+import numpy as np
 from augly.video.helpers import identity_function
 
 
@@ -270,6 +272,60 @@ def apply_transform(
         )
 
 
+class AugmentAudio(BaseTransform):
+    def __init__(
+        self,
+        audio_aug_function: Callable[
+            ..., Tuple[np.ndarray, int]
+        ] = audaugs.apply_lambda,
+        p: float = 1.0,
+        **audio_aug_kwargs,
+    ):
+        """
+        @param audio_aug_function: the augmentation function to be applied onto the
+            video's audio track. Should have the standard API of an AugLy audio
+            augmentation, i.e. expect input audio as a numpy array or path & output
+            path as input, and output the augmented audio to the output path
+
+        @param p: the probability of the transform being applied; default value is 1.0
+
+        @param audio_aug_kwargs: the input attributes to be passed into `audio_aug`
+        """
+        super().__init__(p)
+        self.audio_aug_function = audio_aug_function
+        self.audio_aug_kwargs = audio_aug_kwargs
+
+    def apply_transform(
+        self,
+        video_path: str,
+        output_path: str,
+        metadata: Optional[List[Dict[str, Any]]] = None,
+    ) -> str:
+        """
+        Augments the audio track of the input video using a given AugLy audio
+        augmentation
+
+        @param video_path: the path to the video to be augmented
+
+        @param output_path: the path in which the resulting video will be stored.
+            If not passed in, the original video file will be overwritten
+
+        @param metadata: if set to be a list, metadata about the function execution
+            including its name, the source & dest duration, fps, etc. will be appended
+            to the inputted list. If set to None, no metadata will be appended or
+            returned
+
+        @returns: the path to the augmented video
+        """
+        return F.augment_audio(
+            video_path=video_path,
+            audio_aug_function=self.audio_aug_function,
+            output_path=output_path,
+            metadata=metadata,
+            **self.audio_aug_kwargs,
+        )
+
+
 class BlendVideos(BaseTransform):
     def __init__(
         self,
@@ -405,7 +461,12 @@ def apply_transform(
 
         @returns: the path to the augmented video
         """
-        return F.brightness(video_path, output_path, self.level, metadata=metadata)
+        return F.brightness(
+            video_path,
+            output_path,
+            level=self.level,
+            metadata=metadata,
+        )
 
 
 class ChangeAspectRatio(BaseTransform):
@@ -713,7 +774,10 @@ def apply_transform(
         @returns: the path to the augmented video
         """
         return F.encoding_quality(
-            video_path, output_path, self.quality, metadata=metadata
+            video_path,
+            output_path,
+            quality=int(self.quality),
+            metadata=metadata,
         )
 
 
@@ -1622,7 +1686,12 @@ def apply_transform(
 
         @returns: the path to the augmented video
         """
-        return F.pixelization(video_path, output_path, self.ratio, metadata=metadata)
+        return F.pixelization(
+            video_path,
+            output_path,
+            ratio=self.ratio,
+            metadata=metadata,
+        )
 
 
 class RemoveAudio(BaseTransform):
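
The `AugmentAudio` class added in this file stores the augmentation and its keyword arguments at construction time, then forwards them to the functional API when applied. The sketch below imitates that wrapper pattern with stand-ins so it runs without AugLy. `SketchTransform`, `SketchAugmentAudio`, `fake_augment_audio`, and `identity` are all hypothetical, and the probability check in `SketchTransform.__call__` is an assumption about how a base transform might use `p`, not a copy of `BaseTransform`.

```python
import random
from typing import Any, Callable, Dict, List, Optional

calls: List[Dict[str, Any]] = []  # records what the stand-in functional receives


def fake_augment_audio(
    video_path: str,
    output_path: Optional[str] = None,
    audio_aug_function: Optional[Callable[..., Any]] = None,
    metadata: Optional[List[Dict[str, Any]]] = None,
    **audio_aug_kwargs,
) -> str:
    """Hypothetical stand-in for F.augment_audio; it only records its arguments."""
    calls.append({"function": audio_aug_function, "kwargs": audio_aug_kwargs})
    return output_path or video_path


class SketchTransform:
    """Simplified stand-in for a base transform (assumed behavior)."""

    def __init__(self, p: float = 1.0):
        self.p = p

    def __call__(self, video_path, output_path=None, metadata=None):
        if random.random() > self.p:
            return video_path  # transform skipped with probability 1 - p
        return self.apply_transform(video_path, output_path or video_path, metadata)


class SketchAugmentAudio(SketchTransform):
    def __init__(self, audio_aug_function, p: float = 1.0, **audio_aug_kwargs):
        super().__init__(p)
        self.audio_aug_function = audio_aug_function
        self.audio_aug_kwargs = audio_aug_kwargs

    def apply_transform(self, video_path, output_path, metadata=None):
        # Forward the stored function and kwargs, as the real class does with F.
        return fake_augment_audio(
            video_path=video_path,
            audio_aug_function=self.audio_aug_function,
            output_path=output_path,
            metadata=metadata,
            **self.audio_aug_kwargs,
        )


def identity(audio, sample_rate, **kwargs):
    """Toy augmentation that returns its input unchanged."""
    return audio, sample_rate


transform = SketchAugmentAudio(identity, p=1.0, n_steps=2.0)
result_path = transform("in.mp4", "out.mp4")
```

Keeping the keyword arguments on the instance lets one configured transform be reused across many videos, while the functional form takes them per call.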
