traceypooh/textAV

Archive TV and Captions

at textAV, NYU
July 2017

by [traceypooh](https://twitter.com/tracey_pooh)
git clone https://github.com/traceypooh/textAV; open textAV/index.html
Press ? for keyboard shortcuts

archive.org/tv

  • recording 50 - 100 channels
    • 24 x 7
    • around the world
    • since 2000
  • 2 million news shows
  • search captions

Demo Time!


The Third Eye

  • reading the "lower thirds"
  • compare networks
    • editorial?
    • angle?

Comey vs. Sessions

http://archive.org/~tracey/tv/comey.htm
http://archive.org/~tracey/tv/sessions.htm


Lower third tech

  • crop the lower third of the frame once per second
  • tesseract (OCR)
  • simhash
    • similarity hash
    • phrases nearly equal?
  • group near-repeated instances
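
The simhash and grouping steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind the slides: the character-trigram features, the MD5-based 64-bit hash, the `max_distance` threshold of 8, and the function names are all assumptions chosen for the example, and the OCR step is assumed to have already produced one text line per second.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumption)


def features(text, n=3):
    """Character n-grams of the normalized text (case and punctuation ignored)."""
    norm = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    if len(norm) <= n:
        return [norm]
    return [norm[i:i + n] for i in range(len(norm) - n + 1)]


def simhash(text):
    """Similar texts share most features, so their fingerprints differ in few bits."""
    counts = [0] * BITS
    for feat in features(text):
        digest = hashlib.md5(feat.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if counts[bit] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def group_repeats(lines, max_distance=8):
    """Merge consecutive OCR lines whose fingerprints are within max_distance bits."""
    groups = []
    for line in lines:
        h = simhash(line)
        if groups and hamming(groups[-1]["hash"], h) <= max_distance:
            groups[-1]["count"] += 1
        else:
            groups.append({"text": line, "hash": h, "count": 1})
    return groups
```

Each group's `count` then approximates how many seconds a given lower third stayed on screen. The threshold would need tuning against real OCR output.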

tweetybot

https://twitter.com/tvThirdEye

  • CNN now
  • expand to MSNBC, FOXNEWS, BBCNEWS
  • launching soon

BBCNEWS

  • ccextractor
    • OCR caption glyphs (European DVB)
    • tesseract
  • avoid repeated / rolling caption windows
    • compare two consecutive caption images:
      • how to cook
      • how to cook for humans
    • dedupe, using simhash for near-matches
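
One way to collapse a rolling window like the "how to cook" example is to keep only the longest version of each growing line. The sketch below assumes the OCR text is clean enough for an exact word-prefix comparison. Noisy output would need a fuzzy comparison such as simhash instead, and the function name is hypothetical.

```python
def collapse_rolling(lines):
    """Keep the longest version of each caption line that grows word by word."""
    kept = []  # each entry is a list of words
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank OCR results
        last = kept[-1] if kept else None
        if last is not None and words[:len(last)] == last:
            kept[-1] = words  # same line, extended (or repeated verbatim)
        elif last is not None and last[:len(words)] == words:
            continue  # shorter re-read of a line we already have
        else:
            kept.append(words)  # a genuinely new line
    return [" ".join(words) for words in kept]


captions = [
    "how to cook",
    "how to cook for humans",
    "how to cook for humans",
    "forty humans",
]
print(collapse_rolling(captions))
```

Comparing word lists rather than raw strings avoids treating "how to cook" as a prefix of an unrelated line such as "how to cookies".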

congressional archives

  • coming next week
  • Trump administration, too
  • allow CC (closed caption) search within subsets
  • easier browsing
  • find most watched or cited pieces

clips

  • little JSON annotations
  • arbitrary start/end
  • auto-expands each clip into a "synthetic" document
    • indexed into Elasticsearch
  • JSON Patch for changes
  • track play counts, some referrers
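
To illustrate the JSON Patch idea, here is a small sketch that applies a subset of RFC 6902 operations (`add`, `replace`, `remove`) to a clip annotation object. It is not the archive.org implementation: the `apply_patch` helper is hypothetical, whole-document paths and the `move`/`copy`/`test` operations are left out, and the annotation values are borrowed from the clip slide.

```python
import copy


def pointer_tokens(pointer):
    """Split a JSON Pointer (RFC 6901) into unescaped reference tokens."""
    if not pointer.startswith("/"):
        raise ValueError("this sketch only supports paths below the root")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_patch(doc, patch):
    """Return a patched copy of doc; supports add, replace and remove only."""
    doc = copy.deepcopy(doc)
    for op in patch:
        *parents, last = pointer_tokens(op["path"])
        target = doc
        for tok in parents:
            target = target[int(tok)] if isinstance(target, list) else target[tok]
        kind = op["op"]
        if isinstance(target, list):
            index = len(target) if last == "-" else int(last)  # "-" means append
            if kind == "add":
                target.insert(index, op["value"])
            elif kind == "replace":
                target[index] = op["value"]
            elif kind == "remove":
                del target[index]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "replace" and last not in target:
                raise KeyError(last)
            if kind in ("add", "replace"):
                target[last] = op["value"]
            elif kind == "remove":
                del target[last]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return doc


clips = {"268.1|269.1": {"subject": ["Criminal Activity"]}}
patch = [
    {"op": "add", "path": "/268.1|269.1/subject/-", "value": "Crime"},
    {"op": "add", "path": "/268.1|269.1/collection", "value": ["nancy_pelosi_archive"]},
]
print(apply_patch(clips, patch))
```

Storing changes as patches keeps each edit small and makes it possible to replay or audit the history of an annotation.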

clip

{
    "268.1|269.1": {
        "subject": [
            "Criminal Activity",
            "Crime"
        ],
        "factcheck": [
            "http://www.factcheck.org/2016/07/factchecking-trumps-big-speech/"
        ]
    },
    "266.7|267.2": {
        "ad_id": "PolAd_DonaldTrump_d9dsn",
        "type": "campaign",
        "race": "PRES",
        "cycle": "2016",
        "message": "pro",
        "sponsor": [
            "Republican National Cmte"
        ],
        "sponsor_type": "PAC",
        "subject": [
            "Job Accomplishments"
        ],
        "person": [
            "Donald Trump"
        ]
    },
    "268.1|269.1": {
        "collection": [
            "nancy_pelosi_archive"
        ],
        "subject": [
            "Voting"
        ]
    }
}
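
A rough sketch of how annotations shaped like the JSON above might be expanded into per-clip "synthetic" documents ready for a search index. The `start|end` reading of the keys follows the "arbitrary start/end" bullet. The output field names (`id`, `parent`, `start`, `end`), the `SHOW_ID` identifier, and the `expand_clips` helper are assumptions for illustration, and the actual indexing call is left out.

```python
def expand_clips(item_id, annotations):
    """Turn {"start|end": {...fields}} into a list of flat documents."""
    docs = []
    for key, fields in annotations.items():
        start_text, end_text = key.split("|")
        start, end = float(start_text), float(end_text)
        if end <= start:
            raise ValueError(f"clip {key!r} ends before it starts")
        doc = {
            "id": f"{item_id}/{key}",  # hypothetical id scheme
            "parent": item_id,
            "start": start,
            "end": end,
        }
        doc.update(fields)  # subject, sponsor, person, ... become searchable fields
        docs.append(doc)
    return sorted(docs, key=lambda d: d["start"])


annotations = {
    "268.1|269.1": {"subject": ["Voting"]},
    "266.7|267.2": {"type": "campaign", "person": ["Donald Trump"]},
}
for doc in expand_clips("SHOW_ID", annotations):
    print(doc)
```

Because JSON object keys must be unique, two annotations for the same start/end pair would need to be merged under a single key before expansion.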

other


THE END

About

archive.org TV and captions slides for NYU textAV
