Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add modEventDivertBlobs event to send blobs to abyss #2238

Merged
merged 32 commits into from
Mar 12, 2024
Merged

Conversation

foysalit
Copy link
Contributor

@foysalit foysalit commented Feb 29, 2024

This PR adds a new mod event that sends all linked blobs to a record to an external service for performing further analysis on the blobs.

The new event expects an array of subjectBlobCids that will be forwarded to the external service.

A couple of open questions

  1. Do we need to validate the event for subjectBlobCids or default to ALL blobs in the record if none are specified?
  2. What's the "right way" of getting the blob?

}

private async getBlob(opts: { did: string; cid: string }) {
// TODO: Is this safe to do or should we be reaching out to the pds instead?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some help here figuring out the right way to get the blob.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devinivy might be able to fill in with typescript-specific tricks... maybe there is a helper for this already? You could look in the appview code, in the auto-labeler section.

The general flow is to resolve the DID doc (probably from a local cache, or even just passing the full doc through to this method from elsewhere); get the account's PDS from there; make an un-authenticated request to the getBlobs endpoint, with appropriate HTTP retries and timeout. For correctness, then verify the hash of the content bytes against the blob CID.

status: number,
data: Record<string, boolean>,
) =>
nock(BLOB_DIVERT_SERVICE_HOST)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use this pattern of intercepting http reqs anywhere in the codebase so wondering if it's ok to introduce this here?

Copy link
Collaborator

@devinivy devinivy Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I often avoid these kinds of "live" mocking libraries because they tend to create some weird interactions. I think we can give it a whirl and if it shows any signs of weirdness, not have any qualms about ripping it out.

It seems like we could pretty easily mock it in the application by placing a method on BlobDiverter responsible for getting this response, and instead just mock-out that method. Node now comes with pretty awesome built-in mocking which makes this quite easy to do: https://nodejs.org/api/test.html#class-mocktracker

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool ended up using the jest.fn mocks.

@bnewbold
Copy link
Collaborator

Some high-level comments:

  • I think it could be just modEventDivert, which could in theory make it more generic in the future? but i'm also fine with modEventDivertBlobs
  • I think modEventDivertBlobs should take a list of blob CIDs, instead of fetching and sending all. similar to how we do blob purges by selecting. a different (in UI) will be having it default to "all" images.
  • I kind of assumed that we would also purge the blob CIDs at the same time, but I guess to make it work better with un-takedown (if the content is reviewed and found to be ok), we would want to revert. I guess we could either make the "un-takedown" event work with content that had been diverted; or have the UI action first a divert, then a second takedown (with the same blobs purged). Doing it all in the divert action feels more robust to me in some ways, but splitting in to multiple events has also become something of a pattern for ozone API.

@foysalit
Copy link
Contributor Author

I think it could be just modEventDivert, which could in theory make it more generic in the future? but i'm also fine with modEventDivertBlobs

I'm also fine with modEventDivert and agree that the generic wording might be helpful down the line.

I think modEventDivertBlobs should take a list of blob CIDs, instead of fetching and sending all. similar to how we do blob purges by selecting. a different (in UI) will be having it default to "all" images.

Yep, that's what happens right now, we expect the event to send us blobCids but I was asking if we should validate that there in fact, at least 1 blobCid and if not, should we fail or assume that all blobs linked to the record should be diverted?

Doing it all in the divert action feels more robust to me in some ways, but splitting in to multiple events has also become something of a pattern for ozone API.

Yeah I'd go for multiple event approach but just let the backend emit a consecutive takedown event when divert is received.

Copy link
Collaborator

@bnewbold bnewbold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a high level, looking great!

subjectDid: string
subjectBlobCid: string
subjectUri: string
divertedAt: Date | null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can name this confirmedAt to keep it consistent with repo_push_event, record_push_event, blob_push_event. The idea is that once the event has been accepted by the other service it's "confirmed."

return false
}

const url = `${this.serviceConfig.url}?did=${subjectDid}&uri=${subjectUri}`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will want to ensure these parts get %-encoded. The most reliable way to do it is probably to build a URL.:

const url = new URL(this.serviceConfig.url)
url.searchParams.set('did', subjectDid)
url.searchParams.set('uri', subjectUri)

method: 'POST',
data: imageStream,
headers: {
Authorization: this.serviceConfig.authToken,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we wouldn't need to configure the whole authorization header including the auth scheme. Should we be using service auth here, a bearer token, or something else? (CC @bnewbold)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abyss wants HTTP Basic auth, with username admin, and the password as configured "token".

I don't think we have service-auth implemented generically in golang yet, so not super quick to add that (there are other golang clients for this service as well, so need both client+server impls). If we can keep it HTTP Basic that is best from my perspective. I don't know how the axios HTTP library works, maybe we could even encode the password in the URL so there is a single var to configure? abyss is generally not really set up as a regular atproto service, it is most like cardyb and other microservices (one diff is that abyss has pseudo-lexicon endpoints, though the golang side doesn't actually use the lexicon codegen or validation).

Comment on lines 148 to 150
if (!this.serviceConfig.authToken || !this.serviceConfig.url) {
throw new Error('Blob divert service not configured')
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no service config, can we avoid creating a BlobDiverter at all? It is a bit awkward to have to check both serviceConfig.authToken and serviceConfig.url in each method and opens the question about how each method should behave if they aren't present.

Comment on lines 77 to 80
export type BlobReportServiceConfig = {
url?: string
authToken?: string
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like both url and authToken are required in order to use the blob report service. Could we make these both required here, and update OzoneConfig to allow it not to be configured:

export type OzoneConfig = {
  // ...
  blobReportService: BlobReportServiceConfig | null
}

@@ -0,0 +1,229 @@
import {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to toss out an idea— the blob_push_event has an eventType. Could we make blob diversions a new event type and avoid needing to add a new table, new part of the daemon that needs the polling machinery, etc.? I think the blob_push_event is already setup decently well for this situation, may just need to handle the eventType. I could see most of the BlobDiverter class sticking around to encapsulate the specifics about how the blob is pushed out, but not need to handle the polling, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I like that approach better, refactored the code to use that table instead.

Comment on lines 330 to 335
if (
isModEventDivert(event) &&
subjectInfo.subjectUri &&
subjectInfo.subjectBlobCids?.length
) {
const blobDiverts = await Promise.all(
Copy link
Collaborator

@devinivy devinivy Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other event types we don't take the effect of the event in logEvent(). Instead there's a method that handles it, which gets called within com.atproto.admin.emitModerationEvent after logging the event. For consistency maybe we should move this into that pattern rather than have this outlier.

Comment on lines 168 to 173
const uploadResult = await retryHttp(() =>
this.uploadBlob(
{ imageStream, contentType },
{ subjectDid, subjectUri },
),
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imageStream is stateful so it's not going to be super retryable if it gets partially used-up. It's possible that this.getBlob() and this.uploadBlob() should go in the same retry block.

Base automatically changed from nullable-review-state to main March 6, 2024 20:30
import { subjectFromInput } from '../../mod-service/subject'
import { ModerationLangService } from '../../mod-service/lang'
import { retryHttp } from '../../util'
import { ModeratorOutput, RoleOutput } from '../../auth-verifier'

const handleModerationEvent = async ({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored this into a function so that we can execute the entire flow again after divert event.

content,
recipientDid: subject.did,
}),
if (isModEventDivert(event) && subject.isRecord()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sync divert blob sequence here before logging the event.
this way, the moderator will immediately know that the diversion failed and the event won't be logged and won't be takendown.

})

// On divert events, we need to automatically take down the blobs
if (isModEventDivert(input.body.event)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emit takedown event following divert event.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One event expanding into two events on the backend is a little funky. If we find ourselves needing to do this again it might be worth revisiting, seeing if there's a nice way to think about this.

)
.returning(['id', 'subjectDid', 'subjectBlobCid', 'eventType'])
.execute()
const blobEvts = await this.eventPusher.logBlobPushEvent(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated to the current implementation but felt nice to keep this refactoring.

subjectDid: subject.did,
subjectUri: subject.uri || null,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model seems to expect a subjectUri but it wasn't being passed here.

@devinivy devinivy merged commit 2802880 into main Mar 12, 2024
11 checks passed
@devinivy devinivy deleted the divert-blobs branch March 12, 2024 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants