Skip to content

Conversation

@robertatakenaka
Copy link
Member

📝 Descrição

Este PR introduz uma mudança estrutural na forma como os históricos de instituições (Owner, Publisher, Sponsor, Copyright) lidam com dados brutos (raw data). A principal alteração é a inversão da ordem de herança para priorizar o RawOrganizationMixin, permitindo que o sistema valide campos de texto original antes de recorrer ao modelo legado de instituições.

🛠️ Alterações Técnicas

1. Ordem de Resolução de Métodos (MRO)

  • Alterada a assinatura das classes OwnerHistory, PublisherHistory, SponsorHistory e CopyrightHolderHistory.
  • Nova ordem: (..., RawOrganizationMixin, BaseHistoryItem).
  • Impacto: Chamadas ao método .get() agora priorizam a lógica de raw_text, resolvendo o conflito de assinaturas onde a BaseHistoryItem era limitada a institution.

2. Core & Mixins (core/models.py)

  • Implementação robusta de get, get_or_create e create no RawOrganizationMixin.
  • Adição da propriedade institution_name com lógica de fallback: raw_institution_name > raw_text > institution.name.
  • Adição da propriedade data para abstrair a recuperação de geolocalização (country_acronym, etc).

3. Refatoração de Tasks e Sources

  • journal/tasks.py: Substituição de lógicas manuais de bulk_update por chamadas aos métodos add_* do modelo Journal, garantindo que a lógica de negócio esteja centralizada no model e não na task.
  • journal/models.py: Limpeza de código nas propriedades owner_names, publisher_names, etc., utilizando as novas propriedades computadas.

🧪 Como Testar?

  1. Migrações: Certifique-se de que não há migrações pendentes (as mudanças foram majoritariamente lógicas em métodos e propriedades).
  2. Criação de Histórico:
# O teste deve validar que o Mixin agora captura o original_data corretamente
journal.add_publisher(original_data="Nome da Editora Bruto", ...)
  1. Verificação de Fallback:
  • Crie um item de histórico apenas com institution.
  • Crie outro apenas com raw_text.
  • Ambos devem retornar o nome correto ao acessar .institution_name.

📌 Notas Adicionais

  • A remoção do parâmetro raw_text em favor de original_data nos métodos do Journal visa unificar a linguagem ubíqua do projeto, evitando confusão entre o dado que chega da fonte (AM) e o dado persistido.
  • O uso de cls.objects.get(**params) dentro do Mixin garante que a busca seja feita na classe final (ex: OwnerHistory), respeitando o polimorfismo do Django.

- Adiciona métodos @classmethod get, get_or_create e create.
- Implementa priorização de 'raw_text' sobre 'institution' para compatibilidade legada.
- Adiciona propriedades 'institution_name' e 'data' para abstração de campos de localização.
…e data

- Adiciona método 'create' em BaseHistoryItem para padronização.
- Implementa propriedade 'data' para retornar metadados geográficos da instituição.
…cesso a dados

- Altera ordem de herança (RawOrganizationMixin antes de BaseHistoryItem) para priorizar novos campos.
- Simplifica propriedades owner_names, publisher_names e sponsors usando a nova API.
- Remove parâmetro redundante 'raw_text' em métodos add_* em favor de 'original_data'.
…portação

- Otimiza a extração de país, estado e cidade para evitar chamadas redundantes.
- Garante que os dados brutos sejam repassados corretamente para add_publisher e add_owner.
- Substitui o uso de bulk_update por chamadas aos métodos add_* do modelo Journal.
- Melhora a legibilidade do loop de processamento de AMJournal.
- Garante que patrocinadores e detentores de copyright também sejam processados pelo novo fluxo.
Copilot AI review requested due to automatic review settings February 5, 2026 22:45
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the history model classes (OwnerHistory, PublisherHistory, SponsorHistory, CopyrightHolderHistory) to prioritize raw organization data over legacy institution fields. The main change is inverting the inheritance order to place RawOrganizationMixin before BaseHistoryItem in the Method Resolution Order (MRO), which affects how these classes resolve method calls and property access.

Changes:

  • Changed MRO for history classes to prioritize RawOrganizationMixin methods and properties over BaseHistoryItem
  • Implemented get, get_or_create, and create methods in RawOrganizationMixin with raw_text as the primary identifier
  • Added institution_name and data properties to RawOrganizationMixin with fallback logic
  • Refactored task_replace_institution_by_raw_institution to use Journal.add_* methods instead of bulk_update
  • Removed raw_text parameter from Journal.add_* methods in favor of original_data
  • Simplified property implementations in Journal model to use the new institution_name property

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
core/models.py Implements get/get_or_create/create methods and properties in RawOrganizationMixin to handle raw organization data
journal/models.py Changes MRO for history classes, removes raw_text parameter, and simplifies property implementations
journal/tasks.py Refactors task to use Journal.add_* methods instead of bulk_update for updating history records
journal/sources/am_to_core.py Extracts variable assignments outside the if block for better code organization
institution/models.py Refactors BaseHistoryItem.get_or_create to use new create method and adds data property

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.



class OwnerHistory(Orderable, ClusterableModel, BaseHistoryItem, RawOrganizationMixin):
class OwnerHistory(Orderable, ClusterableModel, RawOrganizationMixin, BaseHistoryItem):
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MRO change places RawOrganizationMixin before BaseHistoryItem, which means RawOrganizationMixin.institution_name property will override BaseHistoryItem.institution_name. While this appears intentional based on the PR description, there's a risk of breaking existing code that relies on the institution field. The fallback logic in RawOrganizationMixin.institution_name checks for raw_institution_name, then raw_text, then institution. However, if all these fields are None, this could result in unexpected None values where previously institution.institution_name was guaranteed to exist. Consider ensuring that either raw_text or institution is always populated when creating history records.

Copilot uses AI. Check for mistakes.
Comment on lines 1211 to 1216
initial_date=None,
final_date=None,
location=None,
raw_text=None,
raw_institution_name=None,
raw_country_name=None,
raw_country_code=None,
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test inconsistency: The test at journal/tests.py:355 passes raw_text parameter to add_copyright_holder, but the raw_text parameter has been removed from the method signature in this PR. This test will fail with a TypeError for unexpected keyword argument. The test should be updated to remove the raw_text parameter or verify that raw_text is automatically set from original_data.

Copilot uses AI. Check for mistakes.
core/models.py Outdated
final_date=final_date,
institution=institution,
)
except cls.DoesNotExist:
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect indentation in the except clause. The line has an extra space before "except" which will cause a SyntaxError. The "except" keyword should align with the "try" keyword.

Suggested change
except cls.DoesNotExist:
except cls.DoesNotExist:

Copilot uses AI. Check for mistakes.
Comment on lines +230 to 254
raw_country_name=extract_value(publisher_country)
raw_state_name=extract_value(publisher_state)
raw_city_name=extract_value(publisher_city)

# Usa novo método add_publisher ao invés de PublisherHistory
journal.add_publisher(
user=user,
original_data=p,
location=location,
raw_institution_name=p,
raw_country_name=extract_value(publisher_country),
raw_state_name=extract_value(publisher_state),
raw_city_name=extract_value(publisher_city),
raw_country_name=raw_country_name,
raw_state_name=raw_state_name,
raw_city_name=raw_city_name,
)
# Usa novo método add_owner ao invés de OwnerHistory
journal.add_owner(
user=user,
original_data=p,
location=location,
raw_institution_name=p,
raw_country_name=extract_value(publisher_country),
raw_state_name=extract_value(publisher_state),
raw_city_name=extract_value(publisher_city),
raw_country_name=raw_country_name,
raw_state_name=raw_state_name,
raw_city_name=raw_city_name,
)
journal.contact_name = p
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic issue: The variables raw_country_name, raw_state_name, and raw_city_name are being assigned inside the loop after the "if p:" check, but they are then used in both add_publisher and add_owner calls even when the if condition is False (when p is empty/None). The variable assignments should be moved outside the loop, or the calls to add_publisher and add_owner should be indented to be inside the "if p:" block.

Copilot uses AI. Check for mistakes.
Prioriza os campos raw_* sobre o campo institution (deprecated).
"""
_data = {}
_data["country_acronym"] = self.raw_country_code
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency in data property implementations. The RawOrganizationMixin.data property (lines 429-432) returns raw_country_code for "country_acronym", while BaseHistoryItem.data property (institution/models.py:556) returns institution_country_acronym. Since RawOrganizationMixin now comes first in the MRO, the RawOrganizationMixin version will be used, potentially breaking existing behavior. The field mapping should be consistent: raw_country_code is the ISO code, not an acronym. Consider whether "country_acronym" should map to raw_country_code or if a different field name should be used.

Suggested change
_data["country_acronym"] = self.raw_country_code
# Mantém compatibilidade: "country_acronym" continua sendo um acrônimo,
# conforme BaseHistoryItem.data (institution_country_acronym).
_data["country_acronym"] = getattr(self, "institution_country_acronym", None)
# Exponha o código ISO em uma chave separada e semanticamente correta.
_data["country_iso_code"] = getattr(self, "raw_country_code", None)

Copilot uses AI. Check for mistakes.
Comment on lines 759 to 762
owners = list(
self.owner_history.select_related(
"institution__institution", "institution__institution__location"
).all()
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency: The owner_data property still uses select_related optimization (lines 760-762) while the other properties (publisher_names, sponsors, copyright_holders) had their select_related removed. This is inconsistent. Either all should use select_related for performance optimization, or none should. Since owner_data accesses the data property which may need related objects, keeping select_related here makes sense, but consider whether the other properties might also need it if they access institution fields as a fallback.

Copilot uses AI. Check for mistakes.


class OwnerHistory(Orderable, ClusterableModel, BaseHistoryItem, RawOrganizationMixin):
class OwnerHistory(Orderable, ClusterableModel, RawOrganizationMixin, BaseHistoryItem):
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base classes have conflicting values for attribute 'get': classmethod() and classmethod().
Base classes have conflicting values for attribute 'data': Property data and Property data.
Base classes have conflicting values for attribute 'Meta': class Meta and class Meta.
Base classes have conflicting values for attribute 'create': classmethod() and classmethod().
Base classes have conflicting values for attribute 'get_or_create': classmethod() and classmethod().
Base classes have conflicting values for attribute 'institution_name': Property institution_name and Property institution_name.

Copilot uses AI. Check for mistakes.


class PublisherHistory(Orderable, ClusterableModel, BaseHistoryItem, RawOrganizationMixin):
class PublisherHistory(Orderable, ClusterableModel, RawOrganizationMixin, BaseHistoryItem):
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base classes have conflicting values for attribute 'get': classmethod() and classmethod().
Base classes have conflicting values for attribute 'data': Property data and Property data.
Base classes have conflicting values for attribute 'Meta': class Meta and class Meta.
Base classes have conflicting values for attribute 'create': classmethod() and classmethod().
Base classes have conflicting values for attribute 'get_or_create': classmethod() and classmethod().
Base classes have conflicting values for attribute 'institution_name': Property institution_name and Property institution_name.

Copilot uses AI. Check for mistakes.


class SponsorHistory(Orderable, ClusterableModel, BaseHistoryItem, RawOrganizationMixin):
class SponsorHistory(Orderable, ClusterableModel, RawOrganizationMixin, BaseHistoryItem):
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base classes have conflicting values for attribute 'get': classmethod() and classmethod().
Base classes have conflicting values for attribute 'data': Property data and Property data.
Base classes have conflicting values for attribute 'Meta': class Meta and class Meta.
Base classes have conflicting values for attribute 'create': classmethod() and classmethod().
Base classes have conflicting values for attribute 'get_or_create': classmethod() and classmethod().
Base classes have conflicting values for attribute 'institution_name': Property institution_name and Property institution_name.

Copilot uses AI. Check for mistakes.


class CopyrightHolderHistory(Orderable, ClusterableModel, BaseHistoryItem, RawOrganizationMixin):
class CopyrightHolderHistory(Orderable, ClusterableModel, RawOrganizationMixin, BaseHistoryItem):
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base classes have conflicting values for attribute 'get': classmethod() and classmethod().
Base classes have conflicting values for attribute 'data': Property data and Property data.
Base classes have conflicting values for attribute 'Meta': class Meta and class Meta.
Base classes have conflicting values for attribute 'create': classmethod() and classmethod().
Base classes have conflicting values for attribute 'get_or_create': classmethod() and classmethod().
Base classes have conflicting values for attribute 'institution_name': Property institution_name and Property institution_name.

Copilot uses AI. Check for mistakes.
@robertatakenaka robertatakenaka merged commit b550e51 into scieloorg:main Feb 5, 2026
7 of 9 checks passed
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 16 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +498 to +503
journal.add_copyright_holder(
user=user,
original_data=_copyright_holder,
location=None,
raw_institution_name=_copyright_holder,
)
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code calls journal.add_copyright_holder() for each copyright holder, which will create new CopyrightHolderHistory records instead of updating existing ones. The original code used bulk_update to update the raw_institution_name field on existing records. This change could lead to duplicate copyright holder history records.

Copilot uses AI. Check for mistakes.
Comment on lines +556 to +558
_data["country_acronym"] = self.institution_country_acronym
_data["state_acronym"] = self.institution_state_acronym
_data["city_name"] = self.institution_city_name
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both RawOrganizationMixin (in core/models.py) and BaseHistoryItem (in institution/models.py) define a data property. With the new MRO where RawOrganizationMixin comes before BaseHistoryItem, the RawOrganizationMixin.data property will take precedence. However, these two implementations have different behaviors:

  • RawOrganizationMixin.data returns raw_country_code, raw_state_acron, raw_city_name
  • BaseHistoryItem.data returns institution_country_acronym, institution_state_acronym, institution_city_name

This change in MRO will alter the behavior of existing code that calls the data property on history instances. Verify that this is the intended behavior and that all consumers of this property can handle the new data structure.

Suggested change
_data["country_acronym"] = self.institution_country_acronym
_data["state_acronym"] = self.institution_state_acronym
_data["city_name"] = self.institution_city_name
# Prioriza campos raw_* se existirem; caso contrário, usa os campos
# derivados de institution_* para manter compatibilidade legada.
country_acronym = getattr(self, "raw_country_code", None)
if country_acronym is None:
country_acronym = self.institution_country_acronym
state_acronym = getattr(self, "raw_state_acron", None)
if state_acronym is None:
state_acronym = self.institution_state_acronym
city_name = getattr(self, "raw_city_name", None)
if city_name is None:
city_name = self.institution_city_name
_data["country_acronym"] = country_acronym
_data["state_acronym"] = state_acronym
_data["city_name"] = city_name

Copilot uses AI. Check for mistakes.
Comment on lines +1096 to 1099
institution_history.raw_text = original_data
institution_history.raw_institution_name = raw_institution_name
institution_history.raw_country_name = raw_country_name
institution_history.raw_country_code = raw_country_code
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code now sets raw_text = original_data on the history instance after calling get_or_create(). However, there's a logical issue: the get_or_create() at line 1086-1091 (outside this diff region) calls the method with institution parameter but not raw_text. With the new MRO where RawOrganizationMixin.get_or_create() takes precedence, it will query using institution parameter only, potentially missing existing records that have matching raw_text (now being set to original_data).

This could lead to duplicate records being created. Consider updating the get_or_create() call to include raw_text=original_data parameter so the query matches on the same fields that are being set.

Copilot uses AI. Check for mistakes.
Comment on lines +393 to +403
@classmethod
def create(cls, raw_text, initial_date=None, final_date=None, user=None, institution=None):
history = cls()
history.raw_text = raw_text
history.creator = user
history.initial_date = initial_date
history.final_date = final_date
if institution:
history.institution = institution
history.save()
return history
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The create() method in RawOrganizationMixin requires raw_text as a mandatory parameter (not Optional), but it's perfectly valid to create a history record with only institution and no raw_text (for legacy support). This will cause errors when calling create() with raw_text=None.

Consider making raw_text optional in the method signature and adding validation to ensure at least one of raw_text or institution is provided, similar to what's done in the get() method.

Copilot uses AI. Check for mistakes.
Comment on lines +427 to +432
Prioriza os campos raw_* sobre o campo institution (deprecated).
"""
_data = {}
_data["country_acronym"] = self.raw_country_code
_data["state_acronym"] = self.raw_state_acron
_data["city_name"] = self.raw_city_name
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data property in RawOrganizationMixin uses inconsistent field names compared to BaseHistoryItem.data:

  • Uses raw_country_code instead of country_acronym (country code vs acronym are different concepts)
  • Uses raw_state_acron (note: typo "acron" instead of "acronym") instead of state_acronym

This inconsistency could break consumers of this property that expect the field names to match those from BaseHistoryItem.data. Consider using the same field naming convention or clearly documenting the change.

Suggested change
Prioriza os campos raw_* sobre o campo institution (deprecated).
"""
_data = {}
_data["country_acronym"] = self.raw_country_code
_data["state_acronym"] = self.raw_state_acron
_data["city_name"] = self.raw_city_name
Prioriza os campos raw_* sobre o campo institution (deprecated).
Para manter compatibilidade com BaseHistoryItem.data, expõe os
seguintes campos normalizados:
- country_acronym
- state_acronym
- city_name
Também expõe explicitamente os campos brutos correspondentes:
- raw_country_code
- raw_state_acron
- raw_city_name
"""
_data = {}
# Campos normalizados, compatíveis com BaseHistoryItem.data
_data["country_acronym"] = self.raw_country_code
_data["state_acronym"] = self.raw_state_acron
_data["city_name"] = self.raw_city_name
# Campos brutos, para quem precisar de acesso direto aos valores raw_*
_data["raw_country_code"] = self.raw_country_code
_data["raw_state_acron"] = self.raw_state_acron
_data["raw_city_name"] = self.raw_city_name

Copilot uses AI. Check for mistakes.


class PublisherHistory(Orderable, ClusterableModel, BaseHistoryItem, RawOrganizationMixin):
class PublisherHistory(Orderable, ClusterableModel, RawOrganizationMixin, BaseHistoryItem):
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same MRO change as in OwnerHistory. See comment on line 1333 for implications.

Copilot uses AI. Check for mistakes.
Comment on lines +461 to +476
return cls.create(
institution=institution,
initial_date=initial_date,
final_date=final_date,
user=user,
)

@classmethod
def create(cls, institution, initial_date=None, final_date=None, user=None):
history = cls()
history.institution = institution
history.creator = user
history.initial_date = initial_date
history.final_date = final_date
history.save()
return history
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_or_create() method has been refactored to call the new create() method instead of inlining the creation logic. This is a good refactoring for code reusability. However, ensure that any subclasses that override get_or_create() are aware of this change in behavior, as they now need to ensure the create() method is properly implemented.

Copilot uses AI. Check for mistakes.
Comment on lines +456 to +473
journal.add_publisher(
user=user,
original_data=_publisher,
location=None,
raw_institution_name=_publisher,
raw_country_name=publisher_country,
raw_state_name=publisher_state,
raw_city_name=publisher_city,
)
journal.add_owner(
user=user,
original_data=_publisher,
location=None,
raw_institution_name=_publisher,
raw_country_name=publisher_country,
raw_state_name=publisher_state,
raw_city_name=publisher_city,
)
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactored code calls journal.add_publisher() and journal.add_owner() for each publisher, which will create new history records instead of updating existing ones. The original code used bulk_update to update the raw_* fields on existing PublisherHistory and OwnerHistory records that matched the institution name. This change in behavior could lead to duplicate history records being created every time this task runs.

Consider using get_or_create logic to find existing history records first, or add a check to see if a record with the same institution already exists before creating a new one.

Copilot uses AI. Check for mistakes.
Comment on lines 758 to +766
def owner_data(self):
owner_data = {}
owners = list(
self.owner_history.select_related(
"institution__institution", "institution__institution__location"
).all()
)
for p in owners:
owner_data["country_acronym"] = p.institution_country_acronym
owner_data["state_acronym"] = p.institution_state_acronym
owner_data["city_name"] = p.institution_city_name
break # Usa apenas o primeiro owner
return owner_data
return p.data
return {}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The owner_data property has been simplified from creating and populating a dictionary with multiple owners' data to just returning the first owner's data property. However, the old implementation had a break statement that only used the first owner anyway. The new implementation is cleaner, but there's a subtle behavior change: the old code initialized an empty dictionary and would return it even if no owners existed, while the new code returns an empty dictionary only after checking all owners.

This is a minor improvement, but verify that all consumers of owner_data can handle both cases correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +785 to +786
for item in self.copyright_holder_history.all():
items.append(item.institution_name)
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The select_related() call has been removed from this query as well. If any copyright holder records still use the legacy institution field instead of the raw fields, this could cause N+1 query issues when accessing item.institution_name.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant