
Rethink how to handle files in the judging process #152

Open

@lw

Description

This text has grown longer than I had expected. If you already know the current situation and its issues, feel free to jump to the Proposal section and save yourself some time.

Premise

The judging process is the sequence of operations performed by CMS on a submission to obtain a score and all intermediate products (executables and evaluations). It comprises compilation, evaluation and scoring, and usually requires access to data associated with the task and the dataset. The first two steps (i.e. compilation and evaluation) handle files and are managed by the TaskType, whereas the third one (i.e. scoring) uses the outcome and text of the evaluation and is managed by the ScoreType.

The files required by this process (we could call them "input", if this name weren't already in use for something else) are the files submitted by the user, the managers provided by the task/dataset, and the input and the (official) output of the testcase. (Note that managers may play a role both in compilation (graders, stubs, etc.) and in evaluation (comparators, etc.).) Other files can be generated and then re-used during the process: executables and outputs produced by the user's program (to be compared with the official one).

It would be tempting to think that the only files provided by the user are those of the first kind, but:

  • the other ones are provided by admins, who can be considered "users" as well;
  • we have to support usertests, in which the contestants themselves provide (some) managers and input files.

Problems

At the moment we use the submission_format attribute of tasks, which is basically a list of strings, to specify the names of the files the user has to provide. This is limiting because:

  • There's no way, for example, to give a different size limit for each file: the size limit is specified in cms.conf and is the same for all files of all tasks (while the files submitted for output-only tasks, which are actually outputs, may be much larger than those for batch tasks, which are source files).
  • All languages need to have the same number of files, with approximately the same structure and names: we only allow a %l placeholder to insert a language-specific file extension. This has worked well so far, but I think it won't anymore as soon as we add support for other languages, in particular interpreted scripting languages.
  • There's no way to customize the set of files required for usertests. The current solution was quickly written before IOI and, in my opinion, looks like a workaround: it has the TaskType list these files via the get_user_managers and get_auto_managers methods. It's not flexible, and it's again impossible to impose size limits. There's also no way to give a name to the input file (which CWS hard-codes as "input") or to specify its size (except in cms.conf, as before).

Finally, our testcases are designed to hold one input and one output file. This has worked well up to now, but I see cases where tasks may require different settings (see task C, "Garbled Email", from Round 1B of Google Code Jam 2013: I think the dictionary should be represented as an additional input file, even though, since it is always the same, a manager would have worked as well in that case).

For the sake of flexibility I propose to change some of these things.

Proposal

I have not made up my mind about the details yet, but here are some ideas.

The items of submission_format (that is, SubmissionFormatElements) should be extended with more information. In particular I suggest using different elements for files of different languages. Hence:

  • filename would contain the filename with the exact extension, no wildcards (e.g. "foo.cpp");
  • a new field, language, would contain a string identifying the language ("c", "cpp", "pas", or None if language-independent, as in output-only tasks);
  • a description field would contain a short human-readable text describing the file, to be shown, for example, in CWS;
  • a codename field (which I think is needed) would contain values like "source", "encoder", "decoder", etc., and would allow TaskTypes to get the files they need without knowing the exact filename (which now also depends on the language!); it would be hidden from the user and only used internally;
  • finally, a max_size field would give the maximum file size (with None meaning no limit?).
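To make the shape concrete, here is a minimal sketch of what such an extended element could look like (written as a plain Python dataclass purely for illustration; the actual CMS classes are SQLAlchemy-mapped, and every field name here is tentative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubmissionFormatElement:
    codename: str             # internal handle, e.g. "source", "encoder"; hidden from the user
    filename: str             # exact filename, no wildcards, e.g. "foo.cpp"
    language: Optional[str]   # "c", "cpp", "pas", or None if language-independent
    description: str          # short human-readable text shown in CWS
    max_size: Optional[int]   # maximum size in bytes, None meaning no limit


# A two-file task offered in C++ and Pascal would then list four elements:
submission_format = [
    SubmissionFormatElement("encoder", "encoder.cpp", "cpp", "Encoder source file", 100 * 1024),
    SubmissionFormatElement("decoder", "decoder.cpp", "cpp", "Decoder source file", 100 * 1024),
    SubmissionFormatElement("encoder", "encoder.pas", "pas", "Encoder source file", 100 * 1024),
    SubmissionFormatElement("decoder", "decoder.pas", "pas", "Decoder source file", 100 * 1024),
]
```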

An almost identical structure would then be introduced for managers, to be used for usertest submissions. Two additional fields would be added: submission_digest (not nullable) and user_test_digest (nullable). The first gives the digest of the file to use for this manager when judging submissions; the second gives the one to use in usertests, or None to ask the user to provide it. This approach would replace the current manager implementation. This is the part I'm less convinced about: I think it makes sense to split the format definition (which goes in the task) from the actual files (which go in the datasets).
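Again as a rough illustration only (field names tentative, and glossing over where exactly the format definition and the actual files would live), the manager counterpart could look like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagerFormatElement:
    codename: str                    # e.g. "checker", "grader"
    filename: str                    # exact filename, e.g. "grader.cpp"
    language: Optional[str]          # None if language-independent
    max_size: Optional[int]          # size limit when the user has to provide it
    submission_digest: str           # digest of the file used when judging submissions
    user_test_digest: Optional[str]  # digest used for usertests, or None to ask the user
```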

I fear a similar thing then has to be added for input files too (to specify filename, codename and size), and perhaps even for outputs (only to pair filename with codename). This seems like overkill and I'd better think about it some more.

All of this would also require some changes in Job and related classes. I was thinking of having EvaluationJob represent a single testcase instead of a whole list, and of grouping them together into something like a "JobGroup" (a dict mapping keywords to Jobs). A single Job would then simply list all submitted files, managers, inputs and outputs indexed by their codename. The attached managers, for example, would be all (and only) those for the language of the submission, plus the language-independent ones.
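A very rough sketch of how the regrouped jobs could be laid out, assuming the codename-indexed dicts described above (class and field names are placeholders, not the current Job classes):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Job:
    language: Optional[str]
    files: Dict[str, str] = field(default_factory=dict)     # codename -> digest of submitted files
    managers: Dict[str, str] = field(default_factory=dict)  # codename -> digest, already filtered by language
    inputs: Dict[str, str] = field(default_factory=dict)    # codename -> digest of testcase inputs
    outputs: Dict[str, str] = field(default_factory=dict)   # codename -> digest of official outputs


@dataclass
class JobGroup:
    jobs: Dict[str, Job] = field(default_factory=dict)      # keyword (e.g. testcase name) -> Job
```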

I was also thinking of having TaskTypes "suggest" the file formats they need, for example by declaring which codenames they understand, whether they're optional, etc., using methods like the current get_user_managers.
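For instance, a hypothetical hook along these lines (the method name is made up, in the spirit of the current get_user_managers) could let each TaskType declare its codenames:

```python
from typing import Dict


class TaskType:
    def get_expected_codenames(self) -> Dict[str, bool]:
        """Map each codename this TaskType understands to whether it is required."""
        raise NotImplementedError


class Communication(TaskType):
    def get_expected_codenames(self):
        # Illustrative only: one submitted source plus a manager, with an optional stub.
        return {"source": True, "manager": True, "stub": False}
```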

What do you think of this proposal? I'd like to read comments and feedback before working out the details further.
