Skip to content

Fix a few open source tests #616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

allenwang28
Copy link
Contributor

@allenwang28 allenwang28 commented Jul 22, 2025

Differential Revision: D78747980

Fixes:

  • test_python_actors
  • test_allocator
  • test_rdma

Changes:

  • test_python_actors was failing due to rust bindings. This moves RDMA manager creation into a classmethod
    test_allocator was failing due to an aggressive regex. Relaxed this.
  • test_rdma had a typo in needs_rdma
  • in test_mailbox.py, for whatever reason the cast was taking a really long time to complete. The logic is the same without the cast.

This doesn't fix all of the tests. Remaining failures:

  • python/tests/builtins/test_log.py
  • python/tests/builtins/test_random.py
  • python/tests/test_actor_error.py
  • python/tests/test_controller.py

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 22, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78747980

allenwang28 added a commit to allenwang28/monarch-1 that referenced this pull request Jul 22, 2025
Summary:

Changes:
- For rust bindings, move RDMA manager creation into a classmethod
- test_rdma is failing because `needs_rdma` typo

Rollback Plan:

Differential Revision: D78747980
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78747980

allenwang28 added a commit to allenwang28/monarch-1 that referenced this pull request Jul 23, 2025
Summary:

Fixes:
- test_python_actors
- test_allocator 
- test_rdma

Changes:
- test_python_actors was failing due to rust bindings. This moves RDMA manager creation into a classmethod
- test_allocator was failing due to an aggressive regex. Relaxed this.
- test_rdma had a typo in `needs_rdma`
- in test_mailbox.py, for whatever reason the cast was taking a really long time to complete. The logic is the same without the cast.

This doesn't fix all of the tests

Rollback Plan:

Differential Revision: D78747980
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78747980

allenwang28 added a commit to allenwang28/monarch-1 that referenced this pull request Jul 23, 2025
Summary:

Fixes:
- test_python_actors
- test_allocator
- test_rdma

Changes:
- test_python_actors was failing due to rust bindings. This moves RDMA manager creation into a classmethod
- test_allocator was failing due to an aggressive regex. Relaxed this.
- test_rdma had a typo in `needs_rdma`
- in test_mailbox.py, for whatever reason the cast was taking a really long time to complete. The logic is the same without the cast.

This doesn't fix all of the tests. Remaining failures:
- `python/tests/builtins/test_log.py`
- `python/tests/builtins/test_random.py`
- `python/tests/test_actor_error.py`
- `python/tests/test_controller.py`

Differential Revision: D78747980
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78747980

allenwang28 added a commit to allenwang28/monarch-1 that referenced this pull request Jul 23, 2025
Summary:


Changes:
- test_python_actors was failing due to rust bindings. This moves RDMA manager creation into a classmethod
- test_allocator was failing due to an aggressive regex. Relaxed this.
- test_rdma had a typo in `needs_rdma`
- in test_mailbox.py, for whatever reason the cast was taking a really long time to complete. The logic is the same without the cast.
- fixes some logic in `HAS_TENSOR_ENGINE` for deciding on whether or not to start RDMA manager actor.
- Disable `test_actor_mesh.py` tests. For some reason, after D78292490, this test stopped cleaning up properly which affects downstream tests when running sequentially. Specifically, `builtins/test_log.py` and `builtins/test_random.py`. Syncing with pzhan9, there's currently an issue where unclean shutdown can lead to fatal error that's being actively worked on. We should re-enable these tests once that is resolved.

Differential Revision: D78747980
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78747980

allenwang28 added a commit to allenwang28/monarch-1 that referenced this pull request Jul 23, 2025
Summary:


Changes:
- test_python_actors was failing due to rust bindings. This moves RDMA manager creation into a classmethod
- test_allocator was failing due to an aggressive regex. Relaxed this.
- test_rdma had a typo in `needs_rdma`
- in test_mailbox.py, for whatever reason the cast was taking a really long time to complete. The logic is the same without the cast.
- fixes some logic in `HAS_TENSOR_ENGINE` for deciding on whether or not to start RDMA manager actor.
- Disable `test_actor_mesh.py` tests. For some reason, after D78292490, this test stopped cleaning up properly which affects downstream tests when running sequentially. Specifically, `builtins/test_log.py` and `builtins/test_random.py`. Syncing with pzhan9, there's currently an issue where unclean shutdown can lead to fatal error that's being actively worked on. We should re-enable these tests once that is resolved.

remaining failures:
- test_actor_error::test_supervigion_with_sending_error
- test_controller:: test_timeout_warning

Differential Revision: D78747980
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78747980

allenwang28 added a commit to allenwang28/monarch-1 that referenced this pull request Jul 23, 2025
Summary:


Changes:
- test_python_actors was failing due to rust bindings. This moves RDMA manager creation into a classmethod
- test_allocator was failing due to an aggressive regex. Relaxed this.
- test_rdma had a typo in `needs_rdma`
- in test_mailbox.py, for whatever reason the cast was taking a really long time to complete. The logic is the same without the cast.
- fixes some logic in `HAS_TENSOR_ENGINE` for deciding on whether or not to start RDMA manager actor.
- Disable `test_actor_mesh.py` tests. For some reason, after D78292490, this test stopped cleaning up properly which affects downstream tests when running sequentially. Specifically, `builtins/test_log.py` and `builtins/test_random.py`. Syncing with pzhan9, there's currently an issue where unclean shutdown can lead to fatal error that's being actively worked on. We should re-enable these tests once that is resolved.

remaining failures that are marked as skipped in OSS:
- test_actor_error::test_supervision_with_sending_error
- test_controller:: test_timeout_warning


These tests seem to fail when run in isolation, but fail when run sequentially.

Differential Revision: D78747980
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78747980

allenwang28 added a commit to allenwang28/monarch-1 that referenced this pull request Jul 23, 2025
Summary:


Changes:
- test_python_actors was failing due to rust bindings. This moves RDMA manager creation into a classmethod
- test_allocator was failing due to an aggressive regex. Relaxed this.
- test_rdma had a typo in `needs_rdma`
- in test_mailbox.py, for whatever reason the cast was taking a really long time to complete. The logic is the same without the cast.
- fixes some logic in `HAS_TENSOR_ENGINE` for deciding on whether or not to start RDMA manager actor.
- Disable `test_actor_mesh.py` tests. For some reason, after D78292490, this test stopped cleaning up properly which affects downstream tests when running sequentially. Specifically, `builtins/test_log.py` and `builtins/test_random.py`. Syncing with pzhan9, there's currently an issue where unclean shutdown can lead to fatal error that's being actively worked on. We should re-enable these tests once that is resolved.

remaining failures that are marked as skipped in OSS:
- test_actor_error::test_supervision_with_sending_error
- test_controller:: test_timeout_warning


These tests seem to fail when run in isolation, but fail when run sequentially.

Differential Revision: D78747980
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78747980

Summary:


Changes:
- test_python_actors was failing due to rust bindings. This moves RDMA manager creation into a classmethod
- test_allocator was failing due to an aggressive regex. Relaxed this.
- test_rdma had a typo in `needs_rdma`
- in test_mailbox.py, for whatever reason the cast was taking a really long time to complete. The logic is the same without the cast.
- fixes some logic in `HAS_TENSOR_ENGINE` for deciding on whether or not to start RDMA manager actor.
- Disable `test_actor_mesh.py` tests. For some reason, after D78292490, this test stopped cleaning up properly which affects downstream tests when running sequentially. Specifically, `builtins/test_log.py` and `builtins/test_random.py`. Syncing with pzhan9, there's currently an issue where unclean shutdown can lead to fatal error that's being actively worked on. We should re-enable these tests once that is resolved.

remaining failures that are marked as skipped in OSS:
- test_actor_error::test_supervision_with_sending_error
- test_controller:: test_timeout_warning


These tests seem to fail when run in isolation, but fail when run sequentially.

Differential Revision: D78747980
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78747980

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in b3fcd71.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants