-
Notifications
You must be signed in to change notification settings - Fork 4.2k
feat: Use openedx_catalog app, backfill its CourseRuns [FC-0117] #38023
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
bradenmacdonald
wants to merge
12
commits into
openedx:master
Choose a base branch
from
open-craft:braden/openedx-catalog
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+383
−5
Open
Changes from all commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
97e1631
temp: use WIP version of openedx-core
bradenmacdonald 73f2ad1
feat: Use openedx_catalog app, backfill it with all known courses
bradenmacdonald 133d4b0
feat: properly set "created" timestamp on course runs during backfill
bradenmacdonald 24c5120
docs: correct field name of "short_name"
bradenmacdonald 1970173
fix: better normalization of language codes
bradenmacdonald 953637d
feat: keep courses in sync with CourseRun/CatalogCourse
bradenmacdonald c569920
test: tests to confirm courses stay in sync with CatalogCourse/CourseRun
bradenmacdonald 84e3acf
feat: delete CourseRun/CatalogCourse when deleting a course
bradenmacdonald 2902fe8
test: update import linter
bradenmacdonald be726d3
test: fix issue with the tests running on CI
bradenmacdonald a54bbaa
test: remove duplicate test
bradenmacdonald 8c0e9e0
refactor: course_id -> course_key, run -> run_code
bradenmacdonald File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
150 changes: 150 additions & 0 deletions
150
...re/djangoapps/content/course_overviews/migrations/0030_backfill_new_catalog_courseruns.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,150 @@ | ||
| """ | ||
| Data migration to populate the new CourseRun and CatalogCourse models. | ||
| """ | ||
|
|
||
| # Generated by Django 5.2.11 on 2026-02-13 21:47 | ||
| import logging | ||
|
|
||
| from django.conf import settings | ||
| from django.db import migrations | ||
| from organizations.api import ensure_organization, exceptions as org_exceptions | ||
|
|
||
| log = logging.getLogger(__name__) | ||
|
|
||
| # https://github.com/openedx/openedx-platform/issues/38036 | ||
| NORMALIZE_LANGUAGE_CODES = { | ||
| "zh-hans": "zh-cn", | ||
| "zh-hant": "zh-hk", | ||
| "ca@valencia": "ca-es-valencia", | ||
| } | ||
|
|
||
|
|
||
| def backfill_openedx_catalog(apps, schema_editor): | ||
| """ | ||
| Populate the new CourseRun and CatalogCourse models. | ||
| """ | ||
| # CourseOverview is a cache model derived from modulestore; modulestore is the source of truth for courses, so we'll | ||
| # use it to get the list of "all courses on the system" to populate the new CourseRun and CatalogCourse models. | ||
| CourseIndex = apps.get_model("split_modulestore_django", "SplitModulestoreCourseIndex") | ||
| CourseOverview = apps.get_model("course_overviews", "CourseOverview") | ||
| CatalogCourse = apps.get_model("openedx_catalog", "CatalogCourse") | ||
| CourseRun = apps.get_model("openedx_catalog", "CourseRun") | ||
|
|
||
| created_catalog_course_ids: set[int] = set() | ||
| all_course_runs = CourseIndex.objects.filter(base_store="mongodb", library_version="").order_by("course_id") | ||
| for course_idx in all_course_runs: | ||
| org_code: str = course_idx.course_id.org | ||
| course_code: str = course_idx.course_id.course | ||
| run_code: str = course_idx.course_id.run | ||
|
|
||
| # Ensure that the Organization exists. | ||
| try: | ||
| org_data = ensure_organization(org_code) | ||
| except org_exceptions.InvalidOrganizationException as exc: | ||
| # Note: IFF the org exists among the modulestore courses but not in the Organizations database table, | ||
| # and if auto-create is disabled (it's enabled by default), this will raise InvalidOrganizationException. It | ||
| # would be up to the operator to decide how they want to resolve that. | ||
| raise ValueError( | ||
| f'The organization short code "{org_code}" exists in modulestore ({course_idx.course_id}) but ' | ||
| "not the Organizations table, and auto-creating organizations is disabled. You can resolve this by " | ||
| "creating the Organization manually (e.g. from the Django admin) or turning on auto-creation. " | ||
| "You can set active=False to prevent this Organization from being used other than for historical data. " | ||
| ) from exc | ||
| if org_data["short_name"] != org_code: | ||
| # On most installations, the 'short_name' database column is case insensitive (unfortunately) | ||
| log.warning( | ||
| 'The course with ID "%s" does not match its Organization.short_name "%s"', | ||
| course_idx.course_id, | ||
| org_data["short_name"], | ||
| ) | ||
|
|
||
| # Fetch the CourseOverview if it exists | ||
| try: | ||
| course_overview = CourseOverview.objects.get(id=course_idx.course_id) | ||
| except CourseOverview.DoesNotExist: | ||
| course_overview = None # Course exists in modulestore but details aren't cached into CourseOverview yet | ||
| display_name: str = (course_overview.display_name if course_overview else None) or course_code | ||
|
|
||
| # Determine the course language. | ||
| # Note that in Studio, the options for course language generally came from the ALL_LANGUAGES setting, which is | ||
| # mostly two-letter language codes with no locale, except it uses "zh_HANS" for Mandarin and "zh_HANT" for | ||
| # Cantonese. We normalize those to "zh-cn" and "zh-hk" for consistency with our platform UI languages / | ||
| # Transifex, but you can still access the "old" version using the CatalogCourse.language_short | ||
| # getter/setter for backwards compatbility. See https://github.com/openedx/openedx-platform/issues/38036 | ||
| language = settings.LANGUAGE_CODE | ||
| if course_overview and course_overview.language: | ||
| language = course_overview.language.lower() | ||
| language = language.replace("_", "-") # Ensure we use hyphens for consistency (`en-us` not `en_us`) | ||
| # Normalize this language code. The previous/non-normalized code will still be available via the | ||
| # "language_short" property for backwards compatibility. | ||
| language = NORMALIZE_LANGUAGE_CODES.get(language, language) | ||
| if len(language) > 2 and language[2] != "-": | ||
| # This seems like an invalid value; revert to the default: | ||
| log.warning( | ||
| 'The course with ID "%s" has invalid language "%s" - using default language "%s" instead.', | ||
| course_idx.course_id, | ||
| language, | ||
| settings.LANGUAGE_CODE, | ||
| ) | ||
| language = settings.LANGUAGE_CODE | ||
|
|
||
| # Ensure that the CatalogCourse exists. | ||
| cc, cc_created = CatalogCourse.objects.get_or_create( | ||
| org_id=org_data["id"], | ||
| course_code=course_code, | ||
| defaults={ | ||
| "display_name": display_name, | ||
| "language": language, | ||
| }, | ||
| ) | ||
| if cc_created: | ||
| created_catalog_course_ids.add(cc.pk) | ||
| elif cc.pk in created_catalog_course_ids: | ||
| # This CatalogCourse was previously created during this same migration | ||
| # Check if all the runs have the same display_name: | ||
| if ( | ||
| course_overview | ||
| and course_overview.display_name | ||
| and course_overview.display_name != cc.display_name | ||
| and cc.display_name != course_code | ||
| ): | ||
| # The runs have different names, so just use the course code as the common catalog course name. | ||
| cc.display_name = course_code | ||
| cc.save(update_fields=["display_name"]) | ||
|
|
||
| if cc.course_code != course_code: | ||
| raise ValueError( | ||
| f"The course {course_idx.course_id} exists in modulestore with a different capitalization of its " | ||
| f'course code compared to other instances of the same run ("{course_code}" vs "{cc.course_code}"). ' | ||
| "This really should not happen. To fix it, delete the inconsistent course runs (!). " | ||
| ) | ||
|
|
||
| # Create the CourseRun | ||
| new_run, run_created = CourseRun.objects.get_or_create( | ||
| catalog_course=cc, | ||
| run_code=run_code, | ||
| course_key=course_idx.course_id, | ||
| defaults={"display_name": display_name}, | ||
| ) | ||
|
|
||
| # Correct the "created" timestamp. Since it has auto_now_add=True, we can't set its value except using update() | ||
| # The CourseOverview should have the "created" date unless it's missing or the course was created before | ||
| # the CourseOverview model existed. In any case, it should be good enough. Otherwise use the default (now). | ||
| if course_overview: | ||
| if course_overview.created < cc.created and cc.pk in created_catalog_course_ids: | ||
| # Use the 'created' date from the oldest course run that we process. | ||
| CatalogCourse.objects.filter(pk=cc.pk).update(created=course_overview.created) | ||
| if run_created: | ||
| CourseRun.objects.filter(pk=new_run.pk).update(created=course_overview.created) | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
| dependencies = [ | ||
| ("openedx_catalog", "0001_initial"), | ||
| ("course_overviews", "0029_alter_historicalcourseoverview_options"), | ||
| ("split_modulestore_django", "0003_alter_historicalsplitmodulestorecourseindex_options"), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.RunPython(backfill_openedx_catalog, reverse_code=migrations.RunPython.noop), | ||
| ] | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My instinct here would be to backfill using the latest CourseRun's title rather than the course code. My thinking is this: in our new model, the CatalogCourse title represents the "default" CourseRun title, i.e. the CatalogCourse title is used when no CourseRun is specified. Thus, the CatalogCourse title should be backfilled to something that an instructor would actually want to use as the title when they're re-running the course, and I think that would be the latest CourseRun's title rather than the course code.
But, that assumes that course authors typically do not reflect the run into their course title (e.g., "Physics 101 - Spring 2027"), which I think is a safe assumption, but I could be wrong. What do you think @ormsbee @bradenmacdonald ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking the opposite here, that authors sometimes do incorporate the run into their title in exactly that manner, because I'm pretty sure I've seen that, on instances which have lots of very similar runs (e.g. CCX use case with dozens of identical CCX variants of a course). But I just checked a few real instances and didn't see any examples of this, so maybe it's uncommon enough that we should just use the title as the default.