83 commits
ea482d5
added a feature to remove user both as worskpace member and workspace…
ishangujarathi Apr 19, 2024
8219ef7
disabling user from daily emails when user is marked inactive at orga…
ishangujarathi Apr 23, 2024
7b05c4d
stopping_task_pr_1
ch20b063 Apr 25, 2024
e8c8edc
fixed the bug regarding status not getting changed in the Active stat…
ishangujarathi Apr 29, 2024
74b55b9
Merge branch 'dev' into stopping_task2
ishvindersethi22 Apr 30, 2024
3faec98
updated the email for user analytics report
Pursottam6003 May 2, 2024
99254eb
Merge branch 'dev' of https://github.com/ai4Bharat/Shoonya-Backend in…
Pursottam6003 May 2, 2024
5f44c3f
updated the black linting
Pursottam6003 May 2, 2024
2b6d819
deleting and resuming task
ch20b063 May 3, 2024
7caee97
Merge branch 'stopping_task2' of https://github.com/AI4Bharat/Shoonya…
ch20b063 May 3, 2024
76ae455
Merge branch 'dev' into stopping_task2
KunalTiwary May 6, 2024
37dab45
updated the email template for the backend code
Pursottam6003 May 6, 2024
42966df
changed the time formatting
Pursottam6003 May 6, 2024
bf70ea0
Merge branch 'dev' of https://github.com/ai4Bharat/Shoonya-Backend in…
Pursottam6003 May 6, 2024
d64f55b
Merge pull request #1062 from AI4Bharat/remove_user_workspaces
aparna-aa May 7, 2024
40a96be
added a feature to remove user from frozen users list when user is ma…
ishangujarathi May 8, 2024
7cd1089
Merge pull request #1074 from AI4Bharat/user_marked_active
aparna-aa May 11, 2024
f0d05bc
updated the changes for new project creation
Pursottam6003 May 13, 2024
ef047a7
made changes to create project
Pursottam6003 May 18, 2024
5bd41d8
Merge branch 'dev' of https://github.com/AI4Bharat/Shoonya-Backend in…
KunalTiwary May 21, 2024
bcd73eb
Merge branch 'new_proj_creation' of https://github.com/AI4Bharat/Shoo…
KunalTiwary May 21, 2024
cf8c386
added changes for StandardizedTranscriptionEditing project type
KunalTiwary May 27, 2024
9dfdb52
black linting
KunalTiwary May 27, 2024
3b93f2d
black changes
KunalTiwary May 27, 2024
86967b6
modified the download endpoint
KunalTiwary May 27, 2024
3216c3e
removed commented lines
KunalTiwary May 27, 2024
8926ff3
export changes
KunalTiwary May 30, 2024
19386b2
modified the flow
KunalTiwary Jun 4, 2024
a91547f
added total duration
KunalTiwary Jun 5, 2024
ba7a4f6
changes for transcribed_json
KunalTiwary Jun 6, 2024
c8a3072
added ac enabled stage
KunalTiwary Jun 6, 2024
bef8b14
minor fix
KunalTiwary Jun 6, 2024
0b4d126
Merge branch 'dev' into invitation_email_formatting
ishvindersethi22 Jun 7, 2024
0c10aae
Merge pull request #1068 from AI4Bharat/invitation_email_formatting
ishvindersethi22 Jun 7, 2024
57f53dd
updated the code for creating new logging functionality
Pursottam6003 Jun 18, 2024
f00db18
Merge branch 'dev' of https://github.com/AI4Bharat/Shoonya-Backend in…
KunalTiwary Jun 19, 2024
57778db
Initial changes
KunalTiwary Jun 19, 2024
fe34f7b
Merge branch 'dev' into StandardizedTranscriptionEditing
KunalTiwary Jun 20, 2024
7784f38
Merge pull request #1082 from AI4Bharat/StandardizedTranscriptionEditing
ishvindersethi22 Jun 20, 2024
cf0b1c9
Merge branch 'dev' of https://github.com/AI4Bharat/Shoonya-Backend in…
KunalTiwary Jun 20, 2024
8528e8b
Merge branch 'dev' of https://github.com/AI4Bharat/Shoonya-Backend in…
KunalTiwary Jun 20, 2024
45d5746
added changes to download and export
KunalTiwary Jun 21, 2024
c37a807
ressolve the blocking api issue part 2
Pursottam6003 Jun 22, 2024
a8de12c
formatted the code with black
Pursottam6003 Jun 22, 2024
c648508
Merge branch 'dev' into new-logging-branch
Pursottam6003 Jun 22, 2024
c8d27fd
ressolved the logging error
Pursottam6003 Jun 22, 2024
19b1aba
updated the code for backend to create a new log file with respect to…
Pursottam6003 Jun 24, 2024
fec9f3f
small_fix
KunalTiwary Jun 27, 2024
f056eb0
added duplicate annotation message
KunalTiwary Jun 27, 2024
e2b47a7
Merge pull request #1088 from AI4Bharat/duplicate_fix
ishvindersethi22 Jun 28, 2024
7dae3f2
Merge branch 'dev' into OCRSegmentCategorisationRelationMappingEditing
ishvindersethi22 Jun 28, 2024
6649625
Merge pull request #1087 from AI4Bharat/OCRSegmentCategorisationRelat…
ishvindersethi22 Jun 28, 2024
3215774
Merge branch 'dev' into logging-functionality
ishvindersethi22 Jun 28, 2024
0e5007b
Merge pull request #1084 from AI4Bharat/logging-functionality
ishvindersethi22 Jun 28, 2024
959e404
Merge branch 'dev' into new-logging-branch
ishvindersethi22 Jun 28, 2024
dfb5a93
Merge pull request #1085 from AI4Bharat/new-logging-branch
ishvindersethi22 Jun 28, 2024
759e0da
celery fix
KunalTiwary Jun 28, 2024
45cc3a9
Merge pull request #1089 from AI4Bharat/celery_fix
ishvindersethi22 Jul 1, 2024
5ccc43c
changes for setting back parent result to revised/rejected task result
kartikvirendrar Jul 8, 2024
4bc843c
changes for setting back parent result to revised/rejected task result
kartikvirendrar Jul 8, 2024
807b3c5
Merge pull request #1094 from AI4Bharat/revised-rejected
aparna-aa Jul 9, 2024
38a2133
substituted lang_choices in constants
KunalTiwary Jul 9, 2024
5d51fcb
Merge pull request #1095 from AI4Bharat/lang_fix
aparna-aa Jul 9, 2024
ec10755
Merge branch 'dev' into master
KunalTiwary Jul 26, 2024
c6ed2fe
Merge pull request #1099 from AI4Bharat/master
aparna-aa Jul 26, 2024
da3bb7d
added filtering for datasets
KunalTiwary Aug 21, 2024
4e83bcd
Merge pull request #1109 from AI4Bharat/dataset_filtering
ishvindersethi22 Aug 21, 2024
c103054
sup_cumulative_tasks_count
KunalTiwary Sep 6, 2024
f851b15
Update .env.example
ishvindersethi22 Sep 19, 2024
1726670
Merge branch 'master' into cumulative_fix
KunalTiwary Sep 20, 2024
64a9b1b
Merge pull request #1111 from AI4Bharat/cumulative_fix
ishvindersethi22 Sep 23, 2024
bac533f
added count fix
KunalTiwary Sep 23, 2024
36ae80b
Merge pull request #1115 from AI4Bharat/sup_cumulative_changes
ishvindersethi22 Sep 23, 2024
5504122
Merge pull request #1112 from AI4Bharat/env-update
ishvindersethi22 Sep 23, 2024
5a55d01
added changes for azure keys
KunalTiwary Sep 23, 2024
037cc20
Merge branch 'master' into azure__key_fix
ishvindersethi22 Sep 23, 2024
7ea175c
Merge pull request #1116 from AI4Bharat/azure__key_fix
ishvindersethi22 Sep 23, 2024
8443826
resolved key error
KunalTiwary Sep 23, 2024
fb6b0c4
Merge branch 'master' into azure_key_fix_master
ishvindersethi22 Sep 23, 2024
87c0ffd
Merge pull request #1117 from AI4Bharat/azure_key_fix_master
ishvindersethi22 Sep 23, 2024
748a6ef
minor_fix in acoustic_enabled_stage
KunalTiwary Oct 25, 2024
6227184
Merge pull request #1124 from AI4Bharat/minor_fix_ac_en_stage
aparna-aa Oct 25, 2024
f33852b
bulk_add_members_to_projects
munishmangla98 Jun 3, 2025
56 changes: 44 additions & 12 deletions .env.example
@@ -1,21 +1,53 @@
 SECRET_KEY='<-- YOUR SECRET KEY HERE -->'
 
-DB_NAME='postgres' # Insert your database name here
-DB_USER='postgres' # Insert your PostgreSQL username here
-DB_PASSWORD='password' #Insert your PostgreSQL password here.
-DB_HOST='db'
-DB_PORT='5432'
+DB_NAME='citus' # Insert your database name here
+DB_USER='citus' # Insert your PostgreSQL username here
+DB_PASSWORD='' #Insert your PostgreSQL password here.
+DB_HOST=''
 
-SMTP_USERNAME = ""
-SMTP_PASSWORD = ""
-API_URL=''
+API_URL='http://localhost:8000'
 
-LOGGING='false'
-LOG_LEVEL='INFO'
-ENV='dev'
+DB_PORT='5432'
 
-FRONTEND_URL=''
+ENV='dev'
+DEFAULT_FROM_EMAIL=""
+EMAIL_HOST=""
+SMTP_USERNAME=""
+SMTP_PASSWORD=""
+
+INDIC_TRANS_V2_KEY=''
+INDIC_TRANS_V2_URL=''
+
+LOGGING='true'
+LOG_LEVEL='WARNING'
+
+GOOGLE_APPLICATION_CREDENTIALS = ''
+
+FRONTEND_URL_FOR_RESET_PASSWORD = 'https://dev.shoonya.ai4bharat.org'
+SECRET_KEY_RESET_PASSWORD = ''
+
+ASR_DHRUVA_URL = ''
+ASR_DHRUVA_AUTHORIZATION = ''
+
+INDEX_NAME= 'django_logs_dev'
+ELASTICSEARCH_URL=''
+
+AZURE_CONNECTION_STRING = ''
+
+STORAGE_ACCOUNT_CONNECTION_STRING=''
+
+CONTAINER_NAME_FOR_DOWNLOAD_ALL_PROJECTS=''
+
+LOGS_CONTAINER_NAME='logs'
+
+
+FLOWER_ADDRESS="localhost"
+FLOWER_PORT="5555"
+FLOWER_USERNAME="shoonya"
+FLOWER_PASSWORD="flower123"
+FRONTEND_URL=''
+CELERY_BROKER_URL="redis://redis:6379"
+REDIS_HOST="127.0.0.1"
+REDIS_PORT="6379"
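The updated `.env.example` above introduces several new settings (Citus credentials, Flower, Redis, Azure storage). As a rough illustration of how such keys are typically consumed in Django settings, here is a hedged sketch; the `env` helper is hypothetical (the real code may read `os.environ` directly), and only the key names and defaults are taken from the example file:

```python
import os


def env(name, default=None, required=False):
    """Read one .env-style setting from the process environment."""
    value = os.environ.get(name, default)
    if required and not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Key names and defaults taken from the .env.example in this diff
DB_NAME = env("DB_NAME", "citus")
CELERY_BROKER_URL = env("CELERY_BROKER_URL", "redis://redis:6379")
FLOWER_PORT = int(env("FLOWER_PORT", "5555"))
```

Settings left empty in the example (e.g. `DB_PASSWORD`, `AZURE_CONNECTION_STRING`) would be candidates for `required=True` in a real deployment.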
22 changes: 22 additions & 0 deletions backend/dataset/migrations/0047_speechconversation_final_transcribed_json.py
@@ -0,0 +1,22 @@
# Generated by Django 3.2.14 on 2024-05-21 06:02

from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [
        ("dataset", "0046_merge_20240416_2233"),
    ]

    operations = [
        migrations.AddField(
            model_name="speechconversation",
            name="final_transcribed_json",
            field=models.JSONField(
                blank=True,
                help_text="Field where data from this standardised_transcription_editing type will be exported.",
                null=True,
                verbose_name="final_transcribed_json",
            ),
        ),
    ]
19 changes: 19 additions & 0 deletions backend/dataset/migrations/0048_ocrdocument_bboxes_relation_prediction_json.py
@@ -0,0 +1,19 @@
# Generated by Django 3.2.14 on 2024-06-19 11:15

from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [
        ("dataset", "0047_speechconversation_final_transcribed_json"),
    ]

    operations = [
        migrations.AddField(
            model_name="ocrdocument",
            name="bboxes_relation_prediction_json",
            field=models.JSONField(
                blank=True, null=True, verbose_name="bboxes_relation_prediction_json"
            ),
        ),
    ]
12 changes: 12 additions & 0 deletions backend/dataset/models.py
@@ -311,6 +311,10 @@ class OCRDocument(DatasetBase):
        verbose_name="bboxes_relation_json", null=True, blank=True
    )

    bboxes_relation_prediction_json = models.JSONField(
        verbose_name="bboxes_relation_prediction_json", null=True, blank=True
    )

    annotated_document_details_json = models.JSONField(
        verbose_name="annotated_document_details_json", null=True, blank=True
    )
@@ -484,6 +488,14 @@ class SpeechConversation(DatasetBase):
        blank=True,
        help_text=("Prepopulated prediction for the implemented models"),
    )
    final_transcribed_json = models.JSONField(
        verbose_name="final_transcribed_json",
        null=True,
        blank=True,
        help_text=(
            "Field where data from this standardised_transcription_editing type will be exported."
        ),
    )

    def __str__(self):
        return str(self.id)
10 changes: 4 additions & 6 deletions backend/dataset/tasks.py
@@ -12,11 +12,9 @@
 #### CELERY SHARED TASKS
 
 
-@shared_task(
-    bind=True,
-)
+@shared_task(queue="default")
 def upload_data_to_data_instance(
-    self, dataset_string, pk, dataset_type, content_type, deduplicate=False
+    dataset_string, pk, dataset_type, content_type, deduplicate=False
 ):
     # sourcery skip: raise-specific-error
     """Celery background task to upload the data to the dataset instance through file upload.
@@ -102,8 +100,8 @@ def upload_data_to_data_instance(
     raise Exception(f"Upload failed for lines: {failed_rows}")
 
 
-@shared_task(bind=True)
-def deduplicate_dataset_instance_items(self, pk, deduplicate_field_list):
+@shared_task(queue="default")
+def deduplicate_dataset_instance_items(pk, deduplicate_field_list):
     if len(deduplicate_field_list) == 0:
         return "Field list cannot be empty"
     try:
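Both tasks in this file switch from `@shared_task(bind=True)` to `@shared_task(queue="default")`; with `bind=True`, Celery passes the task instance as the first argument, which is why the `self` parameter also disappears from both signatures. The stub below is purely illustrative (it is not Celery's implementation) and just mimics that calling-convention difference:

```python
# Illustrative stand-in for celery.shared_task, showing why dropping
# bind=True also removes `self` from the task signature.
def shared_task(bind=False, queue="celery"):
    def decorator(fn):
        class Task:
            name = fn.__name__

            def __call__(self, *args, **kwargs):
                if bind:
                    # A bound task receives the task instance first.
                    return fn(self, *args, **kwargs)
                return fn(*args, **kwargs)

        return Task()

    return decorator


@shared_task(bind=True)
def bound_upload(self, pk):
    return (type(self).__name__, pk)


@shared_task(queue="default")
def unbound_upload(pk):
    return ("no-self", pk)
```

Calling `bound_upload(7)` yields `("Task", 7)` while `unbound_upload(7)` yields `("no-self", 7)`, mirroring how the refactored tasks no longer receive `self`.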
112 changes: 105 additions & 7 deletions backend/functions/tasks.py
@@ -29,7 +29,7 @@
ANNOTATED,
)
from tasks.views import SentenceOperationViewSet
-from users.models import User, LANG_CHOICES
+from users.models import User
from django.core.mail import EmailMessage

from utils.blob_functions import (
@@ -56,7 +56,11 @@
import tempfile

from shoonya_backend.locks import Lock

from utils.constants import LANG_CHOICES
from projects.tasks import filter_data_items
from projects.models import BATCH
from dataset import models as dataset_models
from projects.registry_helper import ProjectRegistry
import logging

logger = logging.getLogger(__name__)
@@ -72,6 +76,10 @@ def sentence_text_translate_and_save_translation_pairs(
    input_dataset_instance_id,
    output_dataset_instance_id,
    batch_size,
    filter_string,
    sampling_mode,
    sampling_parameters,
    variable_parameters,
    api_type="indic-trans-v2",
    checks_for_particular_languages=False,
    automate_missing_data_items=True,
@@ -87,6 +95,10 @@
Allowed - [indic-trans, google, indic-trans-v2, azure, blank]
checks_for_particular_languages (bool): If True, checks for the particular languages in the translations.
automate_missing_data_items (bool): If True, consider only those data items that are missing in the target dataset instance.
filter_string (str): string to filter input data.
sampling_mode (str): can be batch or full.
sampling_parameters (json): a json that contains the batch number and batch size.

"""
task_name = "sentence_text_translate_and_save_translation_pairs"
output_sentences = list(
@@ -113,6 +125,14 @@
"metadata_json",
)
)
    if filter_string and sampling_mode and sampling_parameters:
        input_sentences = get_filtered_items(
            "SentenceText",
            input_dataset_instance_id,
            filter_string,
            sampling_mode,
            sampling_parameters,
        )

# Convert the input_sentences list into a dataframe
input_sentences_complete_df = pd.DataFrame(
@@ -403,7 +423,15 @@

@shared_task(bind=True)
def generate_ocr_prediction_json(
-    self, dataset_instance_id, user_id, api_type, automate_missing_data_items
+    self,
+    dataset_instance_id,
+    user_id,
+    api_type,
+    automate_missing_data_items,
+    filter_string,
+    sampling_mode,
+    sampling_parameters,
+    variable_parameters,
 ):
"""Function to generate OCR prediction data and to save to the same data item.
Args:
@@ -436,7 +464,14 @@ def generate_ocr_prediction_json(
)
except Exception as e:
ocr_data_items = []

    if filter_string and sampling_mode and sampling_parameters:
        ocr_data_items = get_filtered_items(
            "OCRDocument",
            dataset_instance_id,
            filter_string,
            sampling_mode,
            sampling_parameters,
        )
# converting the dataset_instance to pandas dataframe.
ocr_data_items_df = pd.DataFrame(
ocr_data_items,
@@ -555,7 +590,15 @@ def generate_ocr_prediction_json(

@shared_task(bind=True)
def generate_asr_prediction_json(
-    self, dataset_instance_id, user_id, api_type, automate_missing_data_items
+    self,
+    dataset_instance_id,
+    user_id,
+    api_type,
+    automate_missing_data_items,
+    filter_string,
+    sampling_mode,
+    sampling_parameters,
+    variable_parameters,
 ):
"""Function to generate ASR prediction data and to save to the same data item.
Args:
@@ -589,7 +632,14 @@ def generate_asr_prediction_json(
)
except Exception as e:
asr_data_items = []

    if filter_string and sampling_mode and sampling_parameters:
        asr_data_items = get_filtered_items(
            "SpeechConversation",
            dataset_instance_id,
            filter_string,
            sampling_mode,
            sampling_parameters,
        )
# converting the dataset_instance to pandas dataframe.
asr_data_items_df = pd.DataFrame(
asr_data_items,
@@ -703,7 +753,16 @@ def generate_asr_prediction_json(


@shared_task(bind=True)
-def populate_draft_data_json(self, pk, user_id, fields_list):
+def populate_draft_data_json(
+    self,
+    pk,
+    user_id,
+    fields_list,
+    filter_string,
+    sampling_mode,
+    sampling_parameters,
+    variable_parameters,
+):
task_name = "populate_draft_data_json"
try:
dataset_instance = DatasetInstance.objects.get(pk=pk)
@@ -712,6 +771,10 @@ def populate_draft_data_json(
dataset_type = dataset_instance.dataset_type
dataset_model = apps.get_model("dataset", dataset_type)
dataset_items = dataset_model.objects.filter(instance_id=dataset_instance)
    if filter_string and sampling_mode and sampling_parameters:
        dataset_items = get_filtered_items(
            dataset_type, pk, filter_string, sampling_mode, sampling_parameters
        )
cnt = 0
for dataset_item in dataset_items:
new_draft_data_json = {}
@@ -1695,3 +1758,38 @@ def upload_all_projects_to_blob_and_get_url(csv_files_directory):
return "Error in generating url"
blob_url = f"https://{account_name}.blob.{endpoint_suffix}/{CONTAINER_NAME_FOR_DOWNLOAD_ALL_PROJECTS}/{blob_client.blob_name}?{sas_token}"
return blob_url


def get_filtered_items(
    dataset_model,
    dataset_instance_id,
    filter_string,
    sampling_mode,
    sampling_parameters,
):
    registry_helper = ProjectRegistry.get_instance()
    project_type = registry_helper.get_project_name_from_dataset(dataset_model)
    if not isinstance(dataset_instance_id, list):
        dataset_instance_id = [dataset_instance_id]
    filtered_items = filter_data_items(
        project_type=project_type,
        dataset_instance_ids=dataset_instance_id,
        filter_string=filter_string,
    )
    # Apply sampling
    if sampling_mode == BATCH:
        batch_size = sampling_parameters["batch_size"]
        try:
            batch_number = sampling_parameters["batch_number"]
            if len(batch_number) == 0:
                batch_number = [1]
        except KeyError:
            batch_number = [1]
        sampled_items = []
        for batch_num in batch_number:
            sampled_items += filtered_items[
                batch_size * (batch_num - 1) : batch_size * batch_num
            ]
    else:
        sampled_items = filtered_items
    return sampled_items
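The `BATCH` branch of `get_filtered_items` concatenates the 1-indexed batches requested in `batch_number`, falling back to batch 1 when the key is absent or the list is empty. That slicing logic can be sketched in isolation (pure Python, no Django or project dependencies; `sample_batches` is a hypothetical name for illustration):

```python
def sample_batches(items, batch_size, batch_number=None):
    """Mirror of the BATCH sampling in get_filtered_items: concatenate
    the 1-indexed batches listed in batch_number (default: batch 1)."""
    if not batch_number:  # covers both a missing key and an empty list
        batch_number = [1]
    sampled = []
    for n in batch_number:
        # Batch n spans indices [batch_size*(n-1), batch_size*n)
        sampled += items[batch_size * (n - 1) : batch_size * n]
    return sampled
```

For example, `sample_batches(list(range(10)), 3, [1, 3])` picks items 0-2 and 6-8; out-of-range batch numbers simply contribute empty slices, as with the original list slicing.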