A group which is very active in studying gender recognition (among other traits) on the basis of text is that around Moshe Koppel. In (Koppel et al. 2002) they report gender recognition on formal written texts taken from the British National Corpus (and also give a good overview of previous work), reaching about 80% correct attributions using function words and parts of speech. Later, in 2004, the group collected a Blog Authorship Corpus (BAC; (Schler et al. 2006)), containing about 700,000 posts (in total about 140 million words) by almost 20,000 bloggers. For each blogger, metadata is present, including the blogger's self-provided gender, age, industry and astrological sign. The creators themselves used it for various classification tasks, including gender recognition (Koppel et al.). Slightly more information seems to be coming from content (75.1% accuracy) than from style (72.0% accuracy). We see the women focusing on personal matters, leading to important content words like love and boyfriend, and important style words like I and other personal pronouns. The men, on the other hand, seem to be more interested in computers, leading to important content words like software and game, and correspondingly more determiners and prepositions.
We then experimented with several author profiling techniques, namely Support Vector Regression (as provided by LIBSVM; (Chang and Lin 2011)), Linguistic Profiling (LP; (van Halteren 2004)), and TiMBL (Daelemans et al.