Determine format of information for crawler #59

Open
9 tasks
codehag opened this issue Jun 6, 2018 · 10 comments
Labels
discussion Idea / Issue that needs to be discussed first

Comments

@codehag
Collaborator

codehag commented Jun 6, 2018

We are planning on having a crawler collect information from proposals for display. At the moment the following are necessary:

  • title : text string
  • short description (max 150 characters?) : text string
  • champion : array of strings
  • author : array of strings
  • test : URL
  • specification : URL
  • code sample : code block (text string)
  • resources (github repo, etc) : array of urls
  • proposal group (if applicable) : proposal group name

Is anything missing from this list?

<edited to add proposal groups, code samples, and types for the information>
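As a rough illustration of the list above, the fields could be written down as a typed record with a small validator. This is only a sketch: the names `ProposalRecord`, `validateProposalRecord`, `MAX_DESCRIPTION_LENGTH` and `URL_PATTERN` are made up for this example, and the 150-character limit is the tentative figure from the list, not a settled rule.

```typescript
// Hypothetical record shape for one crawled proposal (names are illustrative).
interface ProposalRecord {
  title: string;
  description: string; // short description, tentatively max 150 characters
  champion: string[];
  author: string[];
  test: string | null; // URL
  specification: string | null; // URL
  codeSample: string | null; // code block as plain text
  resources: string[]; // URLs (GitHub repo, etc.)
  proposalGroup: string | null; // only if applicable
}

const MAX_DESCRIPTION_LENGTH = 150;

// Deliberately loose check: anything starting with http(s):// and no spaces.
const URL_PATTERN = /^https?:\/\/\S+$/;

// Returns a list of human-readable problems; an empty list means the record looks fine.
function validateProposalRecord(record: ProposalRecord): string[] {
  const problems: string[] = [];
  if (record.title.trim() === "") {
    problems.push("title is empty");
  }
  if (record.description.length > MAX_DESCRIPTION_LENGTH) {
    problems.push("description is longer than " + MAX_DESCRIPTION_LENGTH + " characters");
  }
  const urls: Array<[string, string | null]> = [
    ["test", record.test],
    ["specification", record.specification],
  ];
  for (const resource of record.resources) {
    urls.push(["resources", resource]);
  }
  for (const [field, value] of urls) {
    if (value !== null && !URL_PATTERN.test(value)) {
      problems.push(field + " is not a URL: " + value);
    }
  }
  return problems;
}
```

Optional fields are modelled as `null` rather than left out, so a missing value is always explicit in the crawled data.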

@codehag codehag added the discussion Idea / Issue that needs to be discussed first label Jun 6, 2018
@keithamus
Member

  • stage the proposal is currently at (perhaps with dates of when it hit certain stages?)

@zoepage
Contributor

zoepage commented Jun 6, 2018

Which of these fields (tests, specs...) imply a URL?

@jasonwilliams
Member

(Nice To Have) - Example code

@codehag
Collaborator Author

codehag commented Jun 6, 2018

For now this is only for stage 3, as we do not have plans for the other stages. If we do want to cover other stages, what would people like to see?

@jasonwilliams
Member

Looks like example code can be pulled from #66

@keithamus
Member

keithamus commented Jul 4, 2018

Proposal

If we combine GitHub's metadata, and some well-known paths, we can get all of the data we need.

GitHub Metadata

  • name and github url
  • description
  • direct collaborators (these can be maintained so that the first one is the author, the others are champions)
  • homepage url (stage 3 specs can set this to their rendered spec text)

Well-known paths in repo:

├── README.md
├── sample.js // this is where authors can put sample implementations
├── spec
│   ├── index.html // this seems to be the de facto standard for spec

The query

GitHub's v4 GraphQL API can get us all of this data in one request:

{
  organization(login: "tc39") {
    repositories(first: 20, isFork: false, privacy: PUBLIC) {
      nodes {
        name
        description: shortDescriptionHTML(limit: 150)
        githubUrl: url
        specUrl: homepageUrl
        collaborators(first: 10, affiliation: DIRECT) {
          nodes {
            username: login
            fullName: name
            company
          }
        }
        readme: object(expression: "master:README.md") {
          ... on Blob {
            text
          }
        }
        codeSample: object(expression: "master:sample.js") {
          ... on Blob {
            text
          }
        }
      }
    }
  }
}

You can try out this query yourselves in the GraphQL explorer - the only part that might not work is collaborators, which requires push access to view.
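To make the mapping concrete, here is a minimal sketch of turning one repository node from the query above into a flat record for the website. The interfaces mirror the aliases used in the query, but they are hand-written for this example rather than GitHub's official types. `CrawledProposal`, `displayName` and `toCrawledProposal` are hypothetical names. It follows the convention suggested above that the first direct collaborator is the author and the rest are champions.

```typescript
// Shapes mirror the aliases in the query; a sketch, not GitHub's official types.
interface Collaborator {
  username: string;
  fullName: string | null;
  company: string | null;
}

interface RepositoryNode {
  name: string;
  description: string;
  githubUrl: string;
  specUrl: string | null;
  collaborators: { nodes: Collaborator[] } | null; // may be unavailable without push access
  readme: { text: string } | null;
  codeSample: { text: string } | null;
}

// Hypothetical flat record for display; field names are illustrative.
interface CrawledProposal {
  title: string;
  description: string;
  githubUrl: string;
  specUrl: string | null;
  author: string | null;
  champions: string[];
  readme: string | null;
  codeSample: string | null;
}

// Prefer the full name, falling back to the login when it is missing or empty.
function displayName(person: Collaborator): string {
  return person.fullName || person.username;
}

function toCrawledProposal(node: RepositoryNode): CrawledProposal {
  const people = node.collaborators ? node.collaborators.nodes : [];
  return {
    title: node.name,
    description: node.description,
    githubUrl: node.githubUrl,
    specUrl: node.specUrl,
    // Assumed convention: first collaborator is the author, the others are champions.
    author: people.length > 0 ? displayName(people[0]) : null,
    champions: people.slice(1).map(displayName),
    readme: node.readme ? node.readme.text : null,
    codeSample: node.codeSample ? node.codeSample.text : null,
  };
}
```

Keeping the mapping as a pure function means it can be checked against saved query results without calling the API.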

Any missing info in this query will just return null, which would be a great opportunity for us to raise warnings or possibly automatically file issues against those repos.
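A minimal sketch of that null check, assuming the query result has already been parsed into plain objects. The `CrawledNode` type, the `missingFieldWarnings` function and the warning wording are all illustrative. Filing issues automatically would be a separate step built on this list.

```typescript
// Only the nullable parts of a repository node matter for this check.
interface CrawledNode {
  name: string;
  specUrl: string | null;
  readme: { text: string } | null;
  codeSample: { text: string } | null;
}

// Returns one warning per missing piece of information; empty when nothing is missing.
function missingFieldWarnings(node: CrawledNode): string[] {
  const warnings: string[] = [];
  if (node.specUrl === null || node.specUrl === "") {
    warnings.push(node.name + ": no homepage URL set for the rendered spec");
  }
  if (node.readme === null) {
    warnings.push(node.name + ": README.md not found");
  }
  if (node.codeSample === null) {
    warnings.push(node.name + ": sample.js not found");
  }
  return warnings;
}
```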

We could also make use of GitHub's topics feature, which allows you to assign arbitrary tags to a repo. We could include those in the query to get small snippets of info, or to filter repos by them. For example, if all proposals added a topic of proposal, we could filter by that, and repos could also add stage-1/stage-2/etc. to designate what stage they've reached.
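The topic convention could be read with something like the sketch below. It takes topic names as plain strings, however they end up being fetched. The helper names `isProposal` and `stageFromTopics` are hypothetical, and taking the highest stage when a repo carries several stage topics is an assumption, not something decided in this thread.

```typescript
const PROPOSAL_TOPIC = "proposal";
// Matches topics such as "stage-1" or "stage-3"; the 0-4 range is an assumption.
const STAGE_TOPIC = /^stage-([0-4])$/;

// True when the repo has opted in via the "proposal" topic.
function isProposal(topics: string[]): boolean {
  return topics.indexOf(PROPOSAL_TOPIC) !== -1;
}

// Returns the stage named by a "stage-N" topic, or null when there is none.
// Assumption: if several stage topics are present, the highest one wins.
function stageFromTopics(topics: string[]): number | null {
  let stage: number | null = null;
  for (const topic of topics) {
    const match = STAGE_TOPIC.exec(topic);
    if (match === null) {
      continue;
    }
    const value = Number(match[1]);
    if (stage === null || value > stage) {
      stage = value;
    }
  }
  return stage;
}
```

Stage 3 proposals could then be selected with `isProposal(topics) && stageFromTopics(topics) === 3`.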

@ljharb
Member

ljharb commented Jul 4, 2018

Topics can’t be changed by pull request tho, so there’s no way for non admins (or collabs, I’m not sure which) to make or explicitly request a change - unless topics are somehow accessible in the repo?

@keithamus
Member

Well, we could use https://github.com/probot/settings, which lets a GitHub repository's settings be kept in sync with a YAML file that users can then make pull requests against. Of course, if we're adding an app to manage a YAML file, we may want to just have that data surfaced via a well-known file anyway, perhaps package.json.

@ljharb
Member

ljharb commented Jul 5, 2018

Using a bot isn't particularly ergonomic tho - it'd be really nice if that feature was built into github directly :-)

@codehag
Collaborator Author

codehag commented Oct 4, 2018

So, as a way to move forward here, would it make sense to use the metadata for now? Then if the issue that ljharb brought up is common enough, we can look into automating that as well?

5 participants