Showing content from https://phabricator.wikimedia.org/p/Sumit/ below:

♟ Sumit

Thanks @Halfak for the invite to this task! I am interested in understanding how long-standing editor practices can best be encoded in the editing interface, both to help editors do better work and to let us build robust models from structured editing data that can automatically flag outstanding issues for editors to fix. One of the problems with building effective automated content flaw detection to help Wikipedians is the lack of precise information about historical edits (for example, what exactly was improved in a given edit?).

I'm curious what kind of features the Wikidata topic models API is using. Are they the same features as in the original topic model developed for English Wikipedia, or something different?

Please test run your solutions locally. If the code runs and gives the expected results, submit a PR so it can be reviewed; if it doesn't, ask for help with the error. A screenshot of code doesn't give much context to comment on.

I have double checked the path, but the error persists.

The get_sections() method is specified in the Wikicode class in wikicode.py in the mwparserfromhell repo.

Please let me know where I am going wrong here.

ROC_AUC:

roc_auc (micro=0.943, macro=0.948):
        -------------------------------------------  -----
        Geography.Maps                               0.971
        Geography.Europe                             0.929
        Culture.Media                                0.951
        STEM.Physics                                 0.975
        Geography.Oceania                            0.966
        STEM.Meteorology                             0.987
        Culture.Internet culture                     0.969
        History_And_Society.Military and warfare     0.968
        Culture.Performing arts                      0.982
        STEM.Engineering                             0.954
        Culture.Language and literature              0.949
        STEM.Space                                   0.987
        STEM.Geosciences                             0.972
        STEM.Technology                              0.942
        Geography.Landforms                          0.987
        STEM.Biology                                 0.956
        Culture.Broadcasting                         0.973
        Culture.Sports                               0.977
        STEM.Chemistry                               0.98 
        Assistance.Maintenance                       0.838
        Culture.Visual arts                          0.969
        Culture.Plastic arts                         0.966
        History_And_Society.Transportation           0.977
        STEM.Mathematics                             0.98 
        Culture.Entertainment                        0.971 
        STEM.Medicine                                0.974
        STEM.Information science                     0.969 
        STEM.Time                                    0.973
        History_And_Society.Education                0.969
        History_And_Society.Politics and government  0.941
        Culture.Food and drink                       0.975
        Assistance.Contents systems                  0.95 
        History_And_Society.Business and economics   0.948
        Assistance.Article improvement and grading   0.684
        Geography.Countries                          0.893
        History_And_Society.History and society      0.868
        Culture.Philosophy and religion              0.936
        Assistance.Files                             0.773
        STEM.Science                                 0.935
        Geography.Cities                             0.969
        Culture.Crafts and hobbies                   0.965
        Culture.Arts                                 0.985
        Geography.Bodies of water                    0.987
Sumit renamed T193789: [Discuss] Storage of model training/testing datasets from "aodaaaaaaa" to "[Discuss] Random sampling by PAWS vs API requests".

counts (n=84480):
                        label                                              n          TP    FP    FN     TN
                        ---------------------------------------------  -----  ---  -----  ----  ----  -----
                        'STEM.Mathematics'                              1454  -->    938   516    98  82928                                                              
                        'Assistance.Files'                               350  -->     28   322   111  84019                                                              
                        'Culture.Food and drink'                        2264  -->   1559   705   156  82060                                                              
                        'STEM.Biology'                                  3134  -->   1772  1362   266  81080                                                              
                        'History_And_Society.Business and economics'    6075  -->   2993  3082   834  77571                                                              
                        'Assistance.Contents systems'                   1953  -->    686  1267   142  82385                                                              
                        'Culture.Language and literature'              19588  -->  14199  5389  2390  62502                                                              
                        'Culture.Media'                                 2039  -->    596  1443   261  82180
                        'Culture.Philosophy and religion'               3840  -->   1693  2147   451  80189
                        'STEM.Physics'                                  2376  -->   1259  1117   360  81744
                        'STEM.Chemistry'                                2083  -->   1287   796   265  82132
                        'History_And_Society.Military and warfare'      3921  -->   2453  1468   392  80167
                        'Geography.Europe'                             15349  -->   8930  6419  2580  66551
                        'History_And_Society.Education'                 2633  -->   1603  1030   252  81595
                        'Geography.Landforms'                           2148  -->   1710   438   139  82193
                        'Assistance.Article improvement and grading'      67  -->     16    51  3082  81331
                        'Culture.Plastic arts'                          3717  -->   2116  1601   404  80359
                        'STEM.Space'                                    2117  -->   1731   386   102  82261
                        'Geography.Maps'                                2421  -->   1370  1051    69  81990
                        'Culture.Performing arts'                       4180  -->   3313   867   389  79911
                        'Geography.Cities'                               791  -->    493   298   111  83578
                        'Culture.Broadcasting'                          2807  -->   1586  1221   434  81239
                        'STEM.Engineering'                              2133  -->    768  1365   267  82080
                        'Assistance.Maintenance'                        5028  -->   1112  3916   244  79208
                        'History_And_Society.History and society'       7010  -->   1371  5639   520  76950
                        'STEM.Time'                                     2216  -->   1520   696   102  82162
                        'Culture.Sports'                                4844  -->   3970   874   369  79267
                        'Culture.Crafts and hobbies'                    1988  -->   1138   850    64  82428
                        'STEM.Information science'                      2037  -->   1148   889   117  82326
                        'History_And_Society.Politics and government'   4047  -->   1572  2475   508  79925
                        'History_And_Society.Transportation'            3680  -->   2508  1172   341  80459
                        'Culture.Arts'                                  1999  -->   1488   511   101  82380
                        'Geography.Countries'                          24068  -->  14352  9716  4136  56276
                        'Geography.Bodies of water'                     2232  -->   1732   500   154  82094
                        'STEM.Meteorology'                              1753  -->   1360   393    72  82655
                        'Geography.Oceania'                             4025  -->   2479  1546   213  80242
                        'STEM.Medicine'                                 1951  -->   1116   835   266  82263
                        'Culture.Visual arts'                           4563  -->   2594  1969   544  79373
                        'STEM.Science'                                  2133  -->    545  1588   160  82187
                        'Culture.Internet culture'                      1839  -->    922   917   222  82419
                        'STEM.Technology'                               3825  -->   1330  2495   597  80058
                        'Culture.Entertainment'                         5529  -->   3597  1932   577  78374
                        'STEM.Geosciences'                              1987  -->   1183   804   125  82368

pr_auc (micro=0.761, macro=0.724):
        -------------------------------------------  -----
        Culture.Arts                                 0.911                                                                                            
        Culture.Internet culture                     0.685                                                                                                       
        Culture.Language and literature              0.871                                                                                                       
        Culture.Performing arts                      0.912                                                                                                       
        History_And_Society.Transportation           0.858                                                                                                       
        Assistance.Files                             0.042                                                                                                       
        STEM.Science                                 0.498                                                                                                       
        STEM.Medicine                                0.743                                                                                                       
        Culture.Crafts and hobbies                   0.813                                         
        History_And_Society.Military and warfare     0.812                                                                                                       
        STEM.Technology                              0.56                                                                                                        
        STEM.Meteorology                             0.919                                                                                                       
        Assistance.Maintenance                       0.458                                                                                                       
        Culture.Philosophy and religion              0.633                                                                                                       
        STEM.Engineering                             0.578                                                                                                       
        Culture.Entertainment                        0.84                                                                                                        
        History_And_Society.Business and economics   0.7                                           
        Geography.Landforms                          0.927                                                                                              
        STEM.Biology                                 0.748                                                                                
        Assistance.Contents systems                  0.611                                                                                            
        Geography.Maps                               0.835                                                                                                       
        STEM.Geosciences                             0.8                                                                                                         
        History_And_Society.Education                0.777                                                                                                       
        Geography.Bodies of water                    0.914                                                                                                       
        STEM.Mathematics                             0.845                                                                                                       
        History_And_Society.Politics and government  0.615
        Geography.Europe                             0.763
        STEM.Physics                                 0.717
        Assistance.Article improvement and grading   0.004
        STEM.Space                                   0.938
        History_And_Society.History and society      0.486
        Geography.Oceania                            0.838
        Geography.Countries                          0.779
        STEM.Time                                    0.86
        STEM.Chemistry                               0.779
        Geography.Cities                             0.73
        Culture.Food and drink                       0.856
        Culture.Broadcasting                         0.735
        STEM.Information science                     0.79
        Culture.Sports                               0.914
        Culture.Media                                0.497
        Culture.Visual arts                          0.776
        Culture.Plastic arts                         0.774
        -------------------------------------------  -----

Looks like the issue is that wordvectors returns [[0]] for an empty string '' instead of the usual null vector of shape (300,).
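To illustrate the behaviour expected here, the following is a minimal sketch in plain Python, not the actual wordvectors code. The function name `mean_vector` and the dict-based `toy_lookup` are hypothetical stand-ins for the real vector store; the only detail taken from the comment above is that empty input should yield a flat zero vector of length 300 rather than a nested value like [[0]].

```python
VECTOR_DIM = 300  # dimensionality mentioned in the comment above


def mean_vector(words, lookup, dim=VECTOR_DIM):
    """Average the vectors of the known words; fall back to a zero vector.

    `lookup` is a hypothetical dict mapping word -> list of `dim` floats,
    standing in for the real word-vector store.
    """
    vectors = [lookup[w] for w in words if w in lookup]
    if not vectors:
        # Empty string or only unknown words: return a flat zero vector
        # of length `dim`, never a nested value such as [[0]].
        return [0.0] * dim
    # zip(*vectors) walks the vectors column by column.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy data for illustration only.
toy_lookup = {"cat": [1.0] * VECTOR_DIM, "dog": [3.0] * VECTOR_DIM}
empty_result = mean_vector("".split(), toy_lookup)
known_result = mean_vector("cat dog".split(), toy_lookup)
```

With a guard like this, downstream aggregators always receive a vector of the same length, whether or not the text contained any known words.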

Sumit raised the priority of T191214: Edittypes repo setup from Lowest to Medium.

Sumit renamed T190288: Investigate runtime of tune with high number of estimators from "Drafttopic estimators take very less time to train but tune hangs up forever" to "Investigate runtime of tune with high number of estimators".

I wonder if you could figure out where the hangup is happening by adding "--debug" to the tune utility call.

Sumit renamed T190288: Investigate runtime of tune with high number of estimators from "Drafttopic estimators take very less time but tune hangs up forever" to "Drafttopic estimators take very less time to train but tune hangs up forever".

Yeah, we'll need scipy >= 0.18.1, but I see that for revscoring scipy is already set as: scipy >= 0.13.3, < 1.0.999
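As a rough check of what the quoted range implies, here is a small sketch using only tuple comparison. The helper names and the "tightened" range are hypothetical, and the simple parser only handles plain dotted numeric versions. It shows that the existing range already admits 0.18.1 but does not rule out older releases.

```python
def parse_version(text):
    """Turn '0.18.1' into (0, 18, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower, upper):
    """Return True when lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# The range quoted above for revscoring: scipy >= 0.13.3, < 1.0.999
CURRENT_RANGE = ("0.13.3", "1.0.999")
# A hypothetical tightened range that would enforce the newer minimum.
TIGHTENED_RANGE = ("0.18.1", "1.0.999")

needed_allowed = satisfies("0.18.1", *CURRENT_RANGE)
older_allowed = satisfies("0.17.0", *CURRENT_RANGE)
older_allowed_when_tightened = satisfies("0.17.0", *TIGHTENED_RANGE)
```

In other words, an environment that satisfies the current range could still carry a scipy older than 0.18.1 unless the lower bound is raised.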

The recommended order for review should be - 18, 20, 19

Final resolution done by using a wrapper function - https://github.com/wiki-ai/revscoring/pull/394

I made a demo of this problem to try to see if I could reproduce it in isolation. See https://github.com/halfak/demo_shared_memory

TL;DR: it didn't work. I get the exact same output for both strategies!

@Ragesoss there's ongoing work around topic modeling for English Wikipedia using WikiProject topics as bases. If Education Program Dashboard has some similar categorization of articles around pre-defined topics, a similar model can be built to predict topics as well as recommend them. Let me know if you wanna talk more about it.

@Jayprakash12345 Could I take this up?

@Sumit please link to the code changes you're making that seem to improve memory sharing.

Refer to the gist in the first comment for the code changes that make it multiprocessing friendly.

Test code for benchmarking using word2vec as an external module contained in english_vectors:

from multiprocessing import Pool, cpu_count
import functools
from revscoring.dependencies import solve
from revscoring.datasources.meta import vectorizers
from revscoring.features.meta import aggregators
from revscoring.languages import english
from revscoring.languages.english_vectors import google_news_kvs
from revscoring.datasources import revision_oriented

With the wordvectors blockers now cleared, I'm building the drafttopic model on ores-stat-01.

Working on the Debian packaging here: https://phabricator.wikimedia.org/source/word2vec/

@Sumit Is the gensim package able to read the gzipped file, or should we decompress during installation?

A common use case of fetch_text is augmenting the dataset with X info from Y api. This will address:

The binary *was* on ores-misc-01, which is now nuked. I'll upload it again from my system to ores-staging-01, from where it can be put somewhere public.

I've backed up the tuning reports and the GradientBoosting and RandomForest models.

@Sumit, please move to the "done" column before closing tasks. We need this in order to consistently report what has been "done".

Looks like we don't include the top level category names yet. @Sumit said he'd like to do that in a separate PR.

Could free up 2.2G more...

Removed 800MB of my stuff which included cached models and datasets.

