Why do we skip cases where the student and teacher operate on the same view? If they are operating on different views, why should they produce similar results to calculate the cross-entropy loss? #267

jinghere11 · 2023-12-12T03:17:30Z

    total_loss = 0
    n_loss_terms = 0
    for iq, q in enumerate(teacher_out):
        for v in range(len(student_out)):
            if v == iq:
                # we skip cases where student and teacher operate on the same view
                continue
            loss = torch.sum(-q * F.log_softmax(student_out[v], dim=-1), dim=-1)
            total_loss += loss.mean()
            n_loss_terms += 1

fbliman · 2024-03-06T12:29:01Z

I am not an expert, but my intuition is that feeding both networks the same view would lead to a very small loss and hence an insignificant training signal, so it would be wasted resources.

But that's only a guess.
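A small sketch may help check this guess. It wraps the loop from the question in a hypothetical helper, `cross_view_loss` (the name, the averaging by `n_loss_terms`, and the toy shapes are assumptions, not taken from the repository), and then looks at a single same-view term in the idealized case where the teacher output is exactly the softmax of the student logits. In that case the term equals the entropy of `q`, which is not necessarily small, but its gradient with respect to the student logits is `softmax(s) - q = 0`, so it would contribute nothing to training. The real teacher may differ from the student (for example through different weights or output post-processing), so treat this as an illustration of the intuition rather than the authors' stated reason.

```python
import torch
import torch.nn.functional as F


def cross_view_loss(teacher_out, student_out):
    """Cross-entropy between every teacher view and every *other* student view.

    Assumptions for this sketch: teacher_out is a list of probability tensors,
    student_out is a list of logit tensors, and view i in teacher_out is the
    same image crop as view i in student_out.
    """
    total_loss = 0.0
    n_loss_terms = 0
    for iq, q in enumerate(teacher_out):
        for v in range(len(student_out)):
            if v == iq:
                # skip the pair where both networks saw the same view
                continue
            loss = torch.sum(-q * F.log_softmax(student_out[v], dim=-1), dim=-1)
            total_loss = total_loss + loss.mean()
            n_loss_terms += 1
    # averaging over the number of terms is an assumption of this sketch
    return total_loss / n_loss_terms, n_loss_terms


torch.manual_seed(0)

# Toy setup: 2 teacher views, 6 student views, batch of 4, 8 output dimensions.
teacher_out = [F.softmax(torch.randn(4, 8), dim=-1) for _ in range(2)]
student_out = [torch.randn(4, 8) for _ in range(6)]
avg_loss, n_terms = cross_view_loss(teacher_out, student_out)

# Idealized same-view term: teacher probabilities equal softmax(student logits).
logits = torch.randn(4, 8, requires_grad=True)
q = F.softmax(logits.detach(), dim=-1)
same_view_term = torch.sum(-q * F.log_softmax(logits, dim=-1), dim=-1).mean()
same_view_term.backward()
max_abs_grad = logits.grad.abs().max().item()  # expected to be ~0 up to float error
```

With 2 teacher views and 6 student views, the loop keeps 2 * 6 - 2 = 10 cross-view terms. The different-view terms are the ones that carry a gradient here: they push the student's output on one crop toward the teacher's output on another crop of the same image.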
