***** Current status

Currently the pipeline runs all the way through, but database.json ends up
with every members[] list empty. Most entries are skipped with "Suspect
title"; some fail with ".pageText not found".

It currently only works on Linux; OS X (or anything else) will need minor
path changes.

You will need a reasonably modern node.js installed: 0.5.9 is too old,
0.8.8 is not.

I needed to add my own DumpRenderTree_resources/missingImage.gif, for some
reason.

For the reasons above, we're currently just using the checked-in
database.json from Feb 2012, but it has some bogus entries. In particular,
the one for UnknownElement would inject irrelevant German text into our
docs, so a hack in apidoc.dart (_mdnTypeNamesToSkip) works around this.

***** Overview

Here's a rough walkthrough of how this works. The ultimate output file is
database.filtered.json. full_run.sh executes all of the scripts in the
correct order. (Illustrative sketches of a few of these steps appear in the
appendix at the end.)

search.js
- read data/domTypes.json
- for each dom type:
  - search for the type's page via www.googleapis.com
  - write search results to output/search/<type>.json
    (a list of search results and urls to pages)

crawl.js
- read data/domTypes.json
- for each dom type:
  - for each result in output/search/<type>.json:
    - try to scrape that cached MDN page from
      webcache.googleusercontent.com
    - write the MDN page to output/crawl/<type><index of result>.html
- write output/crawl/cache.json
  (it maps types -> search result page urls and titles)

extract.sh
- compile extract.dart to js
- run extractRunner.js:
  - read data/domTypes.json
  - read output/crawl/cache.json
  - read data/dartIdl.json
  - for each scraped search result page:
    - create a cleaned-up html page in output/extract/<type><index>.html
      that contains the scraped content plus a script tag that includes
      extract.dart.js
    - create an args file in output/extract/<type><index>.html.json with
      some data on how that file should be processed
    - invoke DumpRenderTree on that file
    - when that returns, parse the console output and add it to
      database.json
    - add any errors to output/errors.json
  - save output/database.json

extract.dart
- xhr output/extract/<type><index>.html.json
- all sorts of shenanigans to actually pull the content out of the html
- build a JSON object with the results
- do a postMessage with that object so extractRunner.js can pull it out

postProcess.dart (run last)
- go through the results for each type looking for the best match
- write output/database.html
- write output/examples.html
- write output/obsolete.html
- write output/database.filtered.json, which contains the best matches

***** Process for updating database.json using these scripts

TODO(eub): write this up when I get the scripts to work all the way
through.
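***** Appendix: illustrative sketches

The sketches below are not the real scripts. They are minimal Dart
illustrations of the moving parts described in the walkthrough; any file
layout, field name, or helper that doesn't appear above is an assumption.

The per-page loop that extractRunner.js drives could look roughly like
this. The real script is node.js; this Dart transliteration only shows
the args-file + DumpRenderTree handshake:

  // Illustrative only: the real extractRunner.js is a node script.
  import 'dart:convert';
  import 'dart:io';

  Future<void> processPage(String type, int index) async {
    final page = 'output/extract/$type$index.html';
    // Args file telling extract.dart how this page should be processed;
    // the {'type': type} payload is an assumed placeholder.
    File('$page.json').writeAsStringSync(jsonEncode({'type': type}));
    // DumpRenderTree loads the page, which runs extract.dart.js; the
    // extraction result comes back in the console output.
    final result = await Process.run('DumpRenderTree', [page]);
    stdout.write(result.stdout); // parsed and merged into database.json
  }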
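On the extract.dart side, the handoff back to extractRunner.js is a
postMessage of the serialized results. A minimal sketch, assuming a
result map with placeholder fields:

  import 'dart:convert';
  import 'dart:html';

  void reportResult(Map<String, dynamic> result) {
    // The harness captures this message and folds it into database.json.
    window.postMessage(jsonEncode(result), '*');
  }

  void main() {
    reportResult({
      'type': 'Element',     // the DOM type this page documents
      'members': <Object>[], // extracted member docs (empty on failure)
      'errors': <String>[],  // extraction problems, if any
    });
  }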
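postProcess.dart's "best match" pass, conceptually: score each scraped
candidate for a type and keep the winner. The entry fields and the scoring
heuristic below are guesses for illustration, not the script's actual
logic:

  import 'dart:convert';
  import 'dart:io';

  Map<String, dynamic>? bestMatch(List<dynamic> candidates, String type) {
    Map<String, dynamic>? best;
    var bestScore = -1;
    for (final c in candidates.cast<Map<String, dynamic>>()) {
      var score = 0;
      // Prefer pages whose title mentions the type (pages that don't are
      // the "Suspect title" skips mentioned in the status section).
      if ((c['title'] as String? ?? '').contains(type)) score += 2;
      // Prefer pages that yielded member documentation.
      score += (c['members'] as List? ?? const []).length;
      if (score > bestScore) {
        bestScore = score;
        best = c;
      }
    }
    return best;
  }

  void main() {
    final db = jsonDecode(File('output/database.json').readAsStringSync())
        as Map<String, dynamic>;
    final filtered = <String, dynamic>{};
    db.forEach((type, candidates) {
      final match = bestMatch(candidates as List, type);
      if (match != null) filtered[type] = match;
    });
    File('output/database.filtered.json')
        .writeAsStringSync(jsonEncode(filtered));
  }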
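And the apidoc.dart workaround from the status section boils down to a
skip list consulted before MDN content is merged into the generated docs.
The set name and the UnknownElement entry come from this README; the
surrounding code is illustrative:

  // Types whose database.json entries are bogus and must not be used.
  const _mdnTypeNamesToSkip = {
    'UnknownElement', // its entry would inject irrelevant German text
  };

  bool useMdnContentFor(String typeName) =>
      !_mdnTypeNamesToSkip.contains(typeName);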