Chart.js: How do I scrape the data inside a <canvas> element with Python or JavaScript?

niknxzdl, posted on 2023-01-26 in Chart.js

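I want to scrape data from a site like this one (a stats page for a game I play), where interactive charts are rendered in <canvas> elements and none of the data is exposed as scrapable HTML elements. Inspecting the HTML, the page appears to use Chart.js.
Help in Python is preferred, but if I really need to use some JavaScript, that is fine too.
Also, I would like to avoid methods that require extra files, such as PhantomJS, but if that is the only way, please do share it.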


fae0ux8s  #1

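One way to solve this is to inspect the page's <script> around line 1050 of the page source, which is where the charts are actually initialized. The initialization follows a recurring pattern: each canvas element is queried in turn for its context, and the variables that supply that chart's labels and statistics follow.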
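
To make that recurring pattern concrete, here is a minimal, dependency-free sketch. The page's real script is not reproduced here, so `sampleScript`, the canvas IDs, the variable names and the numbers are all hypothetical, and the order (values first, then labels) is an assumption. It also swaps in a plain regular expression instead of an AST parser, purely so the pairing idea can be run on its own:

```javascript
// Hypothetical excerpt: an ASSUMED shape of the chart initialization script.
// IDs, variable names and numbers are made up for illustration.
const sampleScript = `
  var ctx = document.getElementById('weaponKills').getContext('2d');
  var values = [12, 7, 3];
  var labels = ['Rifle', 'Pistol', 'Knife'];
  var ctx = document.getElementById('vehicleKills').getContext('2d');
  var values = [5, 1];
  var labels = ['Tank', 'Jeep'];
`;

function extractCharts(source) {
  const charts = {};
  // Splitting on the canvas lookup keeps each captured canvas ID in the result:
  // parts = [before, id1, after1, id2, after2, ...]
  const parts = source.split(/getElementById\('([^']+)'\)/);
  for (let i = 1; i < parts.length; i += 2) {
    const id = parts[i];
    // Take the first two array literals that follow this canvas lookup.
    const arrays = parts[i + 1].match(/\[[^\]]*\]/g) || [];
    if (arrays.length < 2) continue;
    // Assumption: values come first, labels second; the literals are simple
    // enough to be read as JSON once single quotes are swapped for double.
    const [values, labels] = arrays
      .slice(0, 2)
      .map(text => JSON.parse(text.replace(/'/g, '"')));
    // Pair every label with the value at the same index.
    charts[id] = Object.fromEntries(labels.map((label, k) => [label, values[k]]));
  }
  return charts;
}

console.log(extractCharts(sampleScript));
```

A regular expression like this is brittle against real-world scripts (nested arrays, reformatted code), which is a good reason to prefer walking a proper syntax tree.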
This solution uses Node.js (a reasonably recent version) together with the following modules:

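  • cheerio, to query elements in the DOM
  • axios, to send the HTTP request that fetches the page source
  • abstract-syntax-tree, to get a JavaScript object tree representation of the script we want to scrape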

Here is the solution with its source code:

    const cheerio = require('cheerio');
    const axios = require('axios');
    const { parse, each, find } = require('abstract-syntax-tree');

    async function main() {
      // get the page source
      const { data } = await axios.get(
        'https://stats.warbrokers.io/players/i/5d2ead35d142affb05757778'
      );
      // load the page source with cheerio to query the elements
      const $ = cheerio.load(data);
      // get the script tag that contains the string 'Chart.defaults'
      const contents = $('script')
        .toArray()
        .map(script => $(script).html() || '')
        .find(text => text.includes('Chart.defaults'));
      // stop early if the page layout changed and no such script exists
      if (!contents) {
        throw new Error("No <script> containing 'Chart.defaults' was found");
      }
      // convert the script content to an AST
      const ast = parse(contents);
      // we'll put all declarations in this object
      const declarations = {};
      // current key
      let key = null;
      // iterate over all variable declarations inside the script
      each(ast, 'VariableDeclaration', node => {
        // iterate over possible declarations, e.g. comma separated
        node.declarations.forEach(item => {
          // the key that holds the statistics and their labels
          // is the ID of the canvas itself in this case
          if (item.id.name === 'ctx') { // is this a canvas context variable?
            // get the only string literal that is not '2d'
            const literal = find(item, 'Literal').find(v => v.value !== '2d');
            if (literal) { // do we have a non-'2d' string literal?
              // then assign it as the current key
              key = literal.value;
            }
          }
          // ensure that the variable we're getting is an array expression
          if (key && item.init && item.init.type === 'ArrayExpression') {
            // get the array expression
            const array = item.init.elements.map(v => v.value);
            // did we already store the statistics values for this key?
            if (declarations[key]) {
              // zip the two arrays to associate keys and values properly
              const result = {};
              for (let index = 0; index < array.length; index++) {
                result[array[index]] = declarations[key][index];
              }
              declarations[key] = result;
              // reset the key to avoid picking up
              // unrelated array expressions
              key = null;
            } else {
              // store the values
              declarations[key] = array;
            }
          }
        });
      });
      // logging it here, it's up to you how you deal with the data itself
      console.log(declarations);
    }

    // report network or parsing failures instead of an unhandled rejection
    main().catch(console.error);
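
Since what happens to the scraped data is left open, one possible follow-up is to flatten the result into CSV. This is only a sketch: the `declarations` object below is a hypothetical stand-in shaped like the one the script logs (canvas ID mapped to label/value pairs), and `toCsv` is an illustrative helper, not part of any of the libraries used above.

```javascript
// Hypothetical result, shaped like the `declarations` object logged by the scraper.
const declarations = {
  weaponKills: { Rifle: 12, Pistol: 7 },
  vehicleKills: { Tank: 5 },
};

// Illustrative helper: one CSV row per (chart, label, value) triple.
function toCsv(charts) {
  // Quote every field and double any embedded quotes, per common CSV practice.
  const quote = value => `"${String(value).replace(/"/g, '""')}"`;
  const rows = [['chart', 'label', 'value']];
  for (const [chart, stats] of Object.entries(charts)) {
    for (const [label, value] of Object.entries(stats)) {
      rows.push([chart, label, value]);
    }
  }
  return rows.map(row => row.map(quote).join(',')).join('\n');
}

console.log(toCsv(declarations));
```

The resulting string can be written to disk with `fs.writeFileSync` and then loaded from Python (for example with the `csv` module), which may suit a workflow that prefers Python for the analysis.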