You can check out more about working with Stack Overflow data and BigQuery here and here.

These are the most active Stack Overflow tags since 2018, and there are a lot of them. In this picture I only have 240 tags - how would you group and categorize 4,000+ of them?

    FROM `fh-bigquery.stackoverflow_archive.201906_posts_questions`,
    ORDER BY 2 DESC

Top Stack Overflow tags by number of questions.

Let’s find tags that usually go together:

Co-occurring tags on Stack Overflow questions

So I’ll take these relationships and save them in an auxiliary table, plus a percentage of how frequently a relationship happens for each tag.

    CREATE OR REPLACE TABLE `deleting.stack_overflow_tag_co_ocurrence`
    FROM `fh-bigquery.stackoverflow_archive.201906_posts_questions`
    SELECT *, questions/questions_tag1 percent
    SELECT *, MAX(questions) OVER(PARTITION BY tag1) questions_tag1
    FROM data, UNNEST(SPLIT(tags, '|')) tag1, UNNEST(SPLIT(tags, '|')) tag2
    WHERE tag1 IN (SELECT tag FROM active_tags)
    AND tag2 IN (SELECT tag FROM active_tags)

BigQuery ML does a good job of one-hot encoding strings, but it doesn’t handle arrays as I wish it did (stay tuned). So I’m going to create a string first that will define all the columns where I want to find co-occurrence. Then I can use that string to get a huge table, with a 1 for every time a tag co-occurs with the main one at least a certain % of the time.

Let’s first see a subset of these results. What you see here is a co-occurrence matrix:

‘javascript’ shows a relation to ‘php’, ‘html’, ‘css’, ‘node.js’, and ‘jquery’.
‘machine-learning’ shows a relation to ‘python’, but not the other way around.
‘multi-threading’ shows a relation to ‘python’, ‘java’, ‘c#’, and ‘android’.
‘unit-testing’ shows a relation to almost every column here, except to ‘php’, ‘html’, ‘css’, and ‘jquery’.

You can reduce or increase the sensitivity of these relations with the percent threshold:

    ,IFNULL(ANY_VALUE(IF(tag2='javascript',1,null)),0) Xjavascript
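To make the percent and threshold idea concrete, here is a minimal Python sketch of the same co-occurrence logic outside BigQuery. It is not the query from the post: the function names, the sample questions, and the 0.25 threshold are all assumptions made for illustration. It assumes tags arrive as pipe-separated strings and that a tag appears at most once per question.

```python
from collections import Counter
from itertools import product

# Hypothetical sample data: one pipe-separated tag string per question.
SAMPLE_QUESTIONS = [
    "javascript|jquery",
    "javascript|html",
    "javascript|jquery|html",
    "python|machine-learning",
    "python",
    "python",
    "python",
    "python",
]


def co_occurrence_percent(questions):
    """Return {(tag1, tag2): share of tag1's questions that also carry tag2}."""
    pair_counts = Counter()
    for tags in questions:
        tag_list = tags.split("|")
        # Cross join of the tag list with itself, like joining two UNNEST calls.
        for tag1, tag2 in product(tag_list, repeat=2):
            pair_counts[(tag1, tag2)] += 1

    # Largest pair count per tag1; this is the (tag1, tag1) pair, i.e. the
    # number of questions carrying tag1.
    max_per_tag1 = {}
    for (tag1, _), count in pair_counts.items():
        max_per_tag1[tag1] = max(max_per_tag1.get(tag1, 0), count)

    return {
        pair: count / max_per_tag1[pair[0]]
        for pair, count in pair_counts.items()
    }


def co_occurrence_matrix(percents, columns, threshold):
    """One row per tag1, with a 1 where the percent exceeds the threshold."""
    rows = sorted({tag1 for tag1, _ in percents})
    return {
        tag1: [
            1 if percents.get((tag1, column), 0) > threshold else 0
            for column in columns
        ]
        for tag1 in rows
    }


if __name__ == "__main__":
    percents = co_occurrence_percent(SAMPLE_QUESTIONS)
    matrix = co_occurrence_matrix(percents, ["python", "javascript"], 0.25)
    for tag, row in matrix.items():
        print(tag, row)
```

In this toy data every ‘machine-learning’ question is also tagged ‘python’, while only one in five ‘python’ questions is tagged ‘machine-learning’, so the relation shows up in one direction only. Raising or lowering the threshold changes how many 1s appear in each row.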
How would you group more than 4,000 active Stack Overflow tags into meaningful groups? This is a perfect task for unsupervised learning and k-means clustering - and now you can do all this inside BigQuery.

Visualizing a universe of clustered tags.

Felipe Hoffa is a Developer Advocate for Google Cloud. In this post he works with BigQuery, Google’s serverless data warehouse, to run k-means clustering over Stack Overflow’s published dataset, which is refreshed and uploaded to Google’s Cloud once a quarter.

Stack Implementations in Python, Java, C, and C++

The most common stack implementation uses arrays, but a stack can also be implemented using lists.

    Utility function to return the size of the stack
    printf("Item popped= %d", s->items) ;

For the array-based implementation of a stack, the push and pop operations take constant time, i.e. O(1).

Although a stack is a simple data structure to implement, it is very powerful.

To reverse a word - Put all the letters in a stack and pop them out. Because of the LIFO order of a stack, you will get the letters in reverse order.
In compilers - Compilers use the stack to calculate the value of expressions like 2 + 4 / 5 * (7 - 9) by converting the expression to prefix or postfix form.
In browsers - The back button in a browser relies on a stack of all the URLs you have visited previously. Each time you visit a new page, it is added on top of the stack. When you press the back button, the current URL is removed from the stack, and the previous URL is accessed.
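A minimal Python sketch of an array-backed stack, with two of the uses mentioned here (reversing a word and evaluating a postfix expression), might look like the following. The class and function names are illustrative choices, not taken from the original implementations, and a Python list stands in for the array.

```python
class Stack:
    """Array-backed LIFO stack; a Python list plays the role of the array."""

    def __init__(self):
        self._items = []

    def push(self, item):
        # Appending to the end of a list is amortized O(1).
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        # Removing the last element is O(1).
        return self._items.pop()

    def is_empty(self):
        return not self._items

    def size(self):
        return len(self._items)


def reverse_word(word):
    """Push every letter, then pop them all: LIFO order reverses the word."""
    stack = Stack()
    for letter in word:
        stack.push(letter)
    letters = []
    while not stack.is_empty():
        letters.append(stack.pop())
    return "".join(letters)


def eval_postfix(tokens):
    """Evaluate a postfix expression given as a list of string tokens."""
    operations = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = Stack()
    for token in tokens:
        if token in operations:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.push(operations[token](left, right))
        else:
            stack.push(float(token))
    return stack.pop()


if __name__ == "__main__":
    print(reverse_word("stack"))
    # Postfix form of 2 + 4 / 5 * (7 - 9)
    print(eval_postfix("2 4 5 / 7 9 - * +".split()))
```

The postfix evaluator only consumes an expression that is already in postfix form; converting from infix, as a compiler would, is a separate step not shown here.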